Matiana, Shahbuland. Biderman, Stella. Smith, JR. Teehan, Ryan. Castricato, Louis. Gao, Leo. Frazier, Spencer. "Cut the CARP: Fishing for zero-shot story evaluation"
2021. Preprint. Available here.
Recent advances in large-scale language models (Raffel et al., 2019; Brown et al., 2020) have brought significant qualitative and quantitative improvements in machine-driven text generation. Despite this, generation and evaluation of machine-generated narrative text remains a challenging problem. Objective evaluation of computationally-generated stories may be prohibitively expensive, require meticulously annotated datasets, or may not adequately measure the logical coherence of a generated story's narratological structure.
Informed by recent advances in contrastive learning (Radford et al., 2021), we present Contrastive Authoring and Reviewing Pairing (CARP): a scalable, efficient method for performing qualitatively superior, zero-shot evaluation of stories. We show a strong correlation between human evaluations of stories and those of CARP. CARP's outputs correlate more strongly with the corresponding human input than those of language-model-based methods that rely on finetuning or prompt engineering. We also present and analyze the Story-Critique Dataset, a new corpus of 1.3 million aligned story-critique pairs derived from over 80,000 stories. We expect this corpus to be of interest to NLP researchers.
Castricato, Louis. Frazier, Spencer. Balloch, Jonathan. Tarakad, Nitya. Riedl, Mark. "Tell Me A Story Like I’m Five: Story Generation via Question Answering"
Neural language-model based approaches to automated story generation suffer from two important limitations. First, language-model based story generators generally do not work toward a given goal or ending. Second, they often lose coherence as the story gets longer. We propose a novel approach to automated story generation that treats the problem as one of generative question-answering. Our proposed story generation system starts with sentences encapsulating the final event of the story. The system then iteratively (1) analyzes the text describing the most recent event, (2) generates a question about "why" a character is doing the thing they are doing in the event, and then (3) attempts to generate another, preceding event by answering this question. We show that the coherency of a story can be measured as the relative entropy over the distribution of responses to claims about said story’s events. Using a within-subjects human evaluation, we measure this coherency entropy over the responses to sets of True-False statements for multiple stories generated by our model and each baseline. The evaluation shows that our system generates stories that are on average 15.9% more coherent than those generated by the BART language model fine-tuned on a story corpus to generate sentences in reversed order to more closely match our process.
Castricato, Louis. Biderman, Stella. Thue, David. Cardona-Rivera, Rogelio. "Towards a Model-theoretic View of Narratives"
In this paper, we propose the beginnings of a formal framework for modeling narrative qua narrative. Our framework affords the ability to discuss key qualities of stories and their communication, including the flow of information from a Narrator to a Reader, the evolution of a Reader's story model over time, and Reader uncertainty. We demonstrate its applicability to computational narratology by giving explicit algorithms for measuring the accuracy with which information was conveyed to the Reader and two novel measurements of story coherence.
Castricato, Louis. Frazier, Spencer. Balloch, Jonathan. Riedl, Mark. "Fabula Entropy Indexing: Objective Measures of Story Coherence."
Automated story generation remains a difficult area of research because it lacks strong objective measures. Generated stories may be linguistically sound, but in many cases lack the narrative coherence required for a compelling, logically sound story. To address this, we present Fabula Entropy Indexing (FEI), an evaluation method to assess story coherence by measuring the degree to which human participants agree with each other when answering true/false questions about stories. We devise two theoretically grounded measures of reader question-answering entropy, the entropy of world coherence (EWC) and the entropy of transitional coherence (ETC), focusing on global and local coherence, respectively. We evaluate these metrics by testing them on human-written stories and comparing against the same stories that have been corrupted to introduce incoherencies. We show that in these controlled studies, our entropy indices provide a reliable objective measure of story coherence.
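To make the agreement-as-entropy idea concrete, the following is a minimal sketch and not the paper's EWC or ETC definitions, which the abstract does not spell out. It assumes each true/false question is scored by the Shannon entropy of its readers' answers and that a story's score is the mean over questions; the function names `answer_entropy` and `mean_answer_entropy` are hypothetical.

```python
import math
from typing import Sequence


def answer_entropy(answers: Sequence[bool]) -> float:
    """Shannon entropy, in bits, of the true/false answers to one question.

    0.0 means every reader gave the same answer; 1.0 means an even split.
    Illustrative assumption only, not the published EWC/ETC formulas.
    """
    if not answers:
        raise ValueError("need at least one answer")
    p_true = sum(answers) / len(answers)
    entropy = 0.0
    for p in (p_true, 1.0 - p_true):
        if p > 0.0:  # treat 0 * log(0) as 0
            entropy -= p * math.log2(p)
    return entropy


def mean_answer_entropy(responses: Sequence[Sequence[bool]]) -> float:
    """Average per-question entropy over all questions asked about a story."""
    if not responses:
        raise ValueError("need at least one question")
    return sum(answer_entropy(a) for a in responses) / len(responses)


# Hypothetical responses: readers agree on question 1 and split on question 2.
story_responses = [
    [True, True, True, True],
    [True, False, True, False],
]
score = mean_answer_entropy(story_responses)
```

Under these assumptions a lower score indicates stronger reader agreement, which the abstract treats as evidence of a more coherent story.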
Castricato, Louis. Fitz, Stephen. Shin, Gary. "Parameter-Efficient Neural Question Answering Models via Graph-Enriched Document Representations."
2020. Preprint. Available here.
As the computational footprint of modern NLP systems grows, it becomes increasingly important to arrive at more efficient models. We show that by employing graph convolutional document representation, we can arrive at a question answering system that performs comparably to, and in some cases exceeds, the SOTA solutions, while using less than 5% of their resources in terms of trainable parameters. As it currently stands, a major issue in applying GCNs to NLP is document representation. In this paper, we show that a GCN-enriched document representation greatly improves the results seen in HotPotQA, even when using a trivial topology. Our model (gQA) performs admirably when compared to the current SOTA, and requires little to no preprocessing. In "Is graph structure necessary for multi-hop reasoning?," the authors suggest that graph networks are not necessary for good performance in multi-hop QA. In this paper, we suggest that large language models are not necessary for good performance by showing that a naïve implementation of a GCN performs comparably to SOTA models based on pretrained language models.
Orchard, Jeff. Castricato, Louis. "Combating Adversarial Inputs Using a Predictive-Estimator Network."
2017. ICONIP. Best paper award. Available here.
Deep classification networks have shown great accuracy in classifying inputs. However, they fall prey to adversarial inputs, random inputs chosen to yield a classification with a high confidence. But perception is a two-way process, involving the interplay between feedforward sensory input and feedback expectations. In this paper, we construct a predictive estimator (PE) network, incorporating generative (predictive) feedback, and show that the PE network is less susceptible to adversarial inputs. We also demonstrate some other properties of the PE network.