On the Embedding of Narratives, and How it Pertains to Computational Neuroscience
Updated: Mar 9, 2020
A common question about the last article I posted was "Why did you reference [Riemann17]? Aren't narrative embeddings identical to sentence embeddings?"
For this post, I will consider narrative embeddings w.r.t. the ROCStories dataset. Firstly, we need to precisely define narrative embeddings. For this we need a few definitions.
Fabula vs. Syuzhet
Let's clarify the difference between a fabula and a syuzhet. A fabula is the "raw story", which is often represented as a graph linking the world/latent states of a story together. An example is as follows. Consider the following story from the ROCStories dataset. We shall call this story "Karen's New Friend."
Karen was assigned a roommate her first year of college. Her roommate asked her to go to a nearby city for a concert. Karen agreed happily. The show was absolutely exhilarating. Karen became good friends with her roommate.
We will separate this story into disjoint sentences, numbered 1 to 5. Every sentence has a corresponding world/story state, namely:
Karen has a roommate in her first year of college.
Her roommate asks her to go to a nearby city.
Karen and her roommate enjoy the show.
Karen is now friends with her roommate.
Fabulas have an infinite number of representations, but we can get back to that later. For now, if this is how we are representing the internal state of the story, then what exactly are we reading? That is the syuzhet.
Often when someone refers to some narrative embedding, they are referring to a latent representation of a fabula. Otherwise, it would be more sensible to simply refer to a text embedding. The latter of course would be the syuzhet embedding. Note that a fabula may have an infinite number of equivalent syuzhet embeddings. Switching between these embeddings is referred to as syuzhet reprojection.
The syuzhet captures the literary devices used in explaining the narrative. It [could] quite literally be the text; however, as we shall discuss, it doesn't have to be text! A syuzhet could be a video game, it could be music, it could be poetry. The syuzhet is the "creative" component of storytelling.
The great part about separating the syuzhet and fabula is that we can now directly perform our transformations and operations on the raw story and then reproject to any syuzhet we want. Sadly, for the time being, syuzhet reprojection is an open problem that:
1. Very few ML practitioners are aware of.
2. Is notoriously difficult.
We are mostly interested in (2). This problem is so difficult because it requires a highly detailed representation of the fabula, which is nearly impossible to extract from written text.
Why are fabulas so difficult to represent?
Consider the case where we have complete knowledge of the world we are operating in. An amazing example of this, and one where most of my research to this date has been focused, is in multiagent simulations. We will define the following simulation.
Assume we have N agents, each with M desires. At time step t, agent i decides which of its M desires has the highest priority. This will be called desire j. Given some prior, every agent has access to its own decision pool, say Di.
The prior is very useful here. An agent that is in the kitchen will have access to cooking, unlike an agent that is at work. Specifying the prior requires that agents have some capability of planning. This, in turn, is what actually allows for storytelling. It is an essential feature of any fabula representation (i.e. the ability to plan over some latent variables). We shall call agent i's set of plans Pi.
So at time step t, agent i will choose a decision from decision pool Di that maximizes desire j while minimizing the loss to all other M-1 desires and without conflicting with any of the plans contained by Pi.
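The decision rule above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the actual simulation code: the names `Decision`, `Agent`, `blocked_by_plans` and `choose_decision` are hypothetical, "conflicting with a plan in Pi" is reduced to a simple blocklist of decision names, and the trade-off is scored as gain to desire j minus the summed losses to the other desires.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    name: str
    effects: dict  # desire name -> change in satisfaction (negative = harm)


@dataclass
class Agent:
    desires: dict  # desire name -> current priority (higher = more urgent)
    # Assumption: plans are reduced to the set of decision names they forbid.
    blocked_by_plans: set = field(default_factory=set)


def choose_decision(agent, decision_pool):
    """Pick the decision that best serves the top-priority desire j while
    penalising harm to every other desire and skipping anything that
    conflicts with the agent's plans. Returns None if nothing is allowed."""
    top = max(agent.desires, key=agent.desires.get)  # desire j
    allowed = [d for d in decision_pool if d.name not in agent.blocked_by_plans]
    if not allowed:
        return None

    def score(decision):
        gain = decision.effects.get(top, 0.0)
        # Only negative effects on the other M-1 desires count as loss.
        loss = sum(max(0.0, -delta)
                   for name, delta in decision.effects.items()
                   if name != top)
        return gain - loss

    return max(allowed, key=score)


agent = Agent(desires={"hunger": 0.9, "rest": 0.4, "social": 0.2},
              blocked_by_plans={"nap"})
pool = [
    Decision("cook", {"hunger": 0.8, "rest": -0.1}),
    Decision("order_takeout", {"hunger": 0.8, "rest": -0.1, "social": -0.3}),
    Decision("nap", {"rest": 0.9}),
]
best = choose_decision(agent, pool)
```

Both food options satisfy hunger equally, so the choice comes down to which one costs the other desires less. A real planner would of course need a much richer notion of conflict than a blocklist.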
This still is not a very interesting story though. Without restricting the world further, the only plans that agents would create would be to maintain vital desires and move between locations, which is roughly as complex as a single-celled organism. At this point there are a number of further restrictions we can place on the world.
Perhaps the simplest is just to limit the number of resources at every location. For a sufficiently intelligent agent, this would result in agents trying to maintain control of certain locations. Once resources at that location ran out, agents would move to the next location.
OK, this is starting to get interesting, but a good story not only has competition, it has to have collaboration as well. This is a great schism in multiagent research. Collaboration and competition are on opposite ends of the spectrum. Having agents choose whether to compete or collaborate is an open problem, not only for multiagent researchers but for politicians alike. Namely, it is often difficult even for humans to decide whether to compete or collaborate.
As this representation gets more and more complex to capture even the simplest of stories, hopefully I have shed some light on why a generalized representation of fabulas is most likely impossible for any real-world dataset. This is entirely disregarding the difficulty of extracting emergent narratives from text.
Story generation is a problem that is already actively being pursued. How is this possible?
If you have seen my curated list of narratology papers, then you know that syuzhet generation is already a problem that numerous researchers are attempting to tackle, from colleagues at Georgia Tech [Ammanabrolu2019] to more mainstream groups like FAIR [Fan2019]. Obviously there must be some current solution to representing fabulas then!
"But Louis," you say, "this seems to be an issue of feature engineering! This, quite literally, is the perfect application of deep learning." Note, I am referring to deep learning in the sense of learning hierarchical feature representations, not in the sense of neural networks.
My current favourite representation of fabulas comes from [REDACTED] and is currently under peer review. As such, I cannot link the paper. Once it is available I will update this blog post. The idea is rather simple though.
Firstly, given a five-sentence story, we provide a model with N randomly selected sentences. Its objective is to produce a fabula equivalent to the analogous syuzhet of the original 5 sentences. This is simply referred to as narratologically analogous. The (final) evaluation is performed by humans.
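As a rough sketch of how the model input could be built, the snippet below keeps N sentences of a five-sentence story chosen at random. The helper name `select_sentences` is hypothetical, and keeping the chosen sentences in their original order (together with their positions) is my assumption; the text above only says the sentences are randomly selected.

```python
import random


def select_sentences(story, n, rng):
    """Return n sentences of the story, chosen uniformly at random, as
    (position, sentence) pairs kept in their original order."""
    if not 0 < n <= len(story):
        raise ValueError("n must be between 1 and the number of sentences")
    kept = sorted(rng.sample(range(len(story)), n))
    return [(i, story[i]) for i in kept]


story = [
    "Karen was assigned a roommate her first year of college.",
    "Her roommate asked her to go to a nearby city for a concert.",
    "Karen agreed happily.",
    "The show was absolutely exhilarating.",
    "Karen became good friends with her roommate.",
]
rng = random.Random(0)  # seeded only so the example is reproducible
given = select_sentences(story, 3, rng)
```

The positions are returned alongside the text because, in ROCStories, each sentence corresponds to a different plot point, so the model needs to know which plot points it was given.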
Namely, given the story where the 5th sentence is generated from the latent fabula representation, the original 5-sentence story, and various other baselines, all stories are presented in randomly selected pairs (without replacement) to Mechanical Turk workers. The turkers are then asked if the stories are equivalent in terms of content.
During training, we train the network to match the original text of the story and to maximize exact match. At a high level, since the loss is computed over the syuzhet rather than the fabula, this loss is quite viable. Note that it is nearly impossible to compute a loss over a latent fabula directly in almost all circumstances. An exception is training neurocontrollers for agent control in multiagent simulations. By this token, a lot of deep RL can be seen as training over a fabula. Perhaps in a follow-up I can discuss the application of reinforcement learning to narratology.
In the case of ROCStories, since every sentence corresponds to a different plot point, we begin by initializing some N vectors that represent our fabula at each of the corresponding given N sentences. This is done by training a VAE to determine a plot point representation given the sentence embedding.
This vector, let's call it v of dimension L, is concatenated with a vector w of length l < L, where w is a function of v. Vector v is called the narrative prior while vector w is called the primitive coordinate vector. As the name implies, w is a set of coordinates over a basis of primitive SLDS (switching linear dynamical system) matrices. These matrices determine the transition from one plot point to the next, over the narrative prior.
The important thing to note here is that the latent vector has two components. The first set of components specifies how we got to the current world state, whatever that might entail. The second set of components specifies the trajectory the narrative might take.
In the case of the original literature, this second component was stored as an embedding for a corresponding discrete state. This discrete state, in the original literature, specified the sentiment of a given sentence. Using these discrete states allows for training of the VAE. Without them, training the VAE would be significantly harder. However, more recent work will allow for the removal of this discrete state, where the coordinate vector is simply a function over the narrative prior.
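To make the two-component latent a little more concrete, here is a minimal NumPy sketch of one plot-point transition. It is a sketch under my own assumptions, not the method from the paper under review: the function names, the projection `W_proj`, and the choice of a softmax over a linear map as "a function of v" are all hypothetical stand-ins. The only parts taken from the description above are that w has length l < L, that w gives coordinates over a basis of primitive transition matrices, and that the blended matrix moves the narrative prior v to the next plot point.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def coordinates_from_prior(v, W_proj):
    """Hypothetical choice of w = f(v): a linear map followed by a softmax.
    v has shape (L,), W_proj has shape (l, L), the result has shape (l,)."""
    return softmax(W_proj @ v)


def transition_matrix(w, primitives):
    """Blend the l primitive matrices, shape (l, L, L), using w as coordinates."""
    return np.tensordot(w, primitives, axes=1)  # shape (L, L)


def next_plot_point(v, w, primitives):
    """Advance the narrative prior by one plot point (noise-free for clarity)."""
    return transition_matrix(w, primitives) @ v


L, l = 4, 2
v = np.ones(L)                                       # narrative prior
primitives = np.stack([np.eye(L), 2.0 * np.eye(L)])  # two toy primitive matrices
W_proj = np.zeros((l, L))                            # toy projection: uniform w
w = coordinates_from_prior(v, W_proj)                # primitive coordinate vector
latent = np.concatenate([v, w])                      # full latent, length L + l
v_next = next_plot_point(v, w, primitives)
```

With the toy values above, w weights the two primitives equally, so the blended transition simply scales v. In a trained model both the primitives and the map from v to w would be learned.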
If the restriction of our dataset was not a giveaway, this approach only really works when the plot points are semantically sequential. How can we expand on this point? It would be nice if we could use a recurrent switching dynamic instead of simply a linear one. Well, we are in luck! Precisely this work came out of Liam Paninski's lab in late 2016 [Linderman2016]. I strongly recommend this paper; the work is truly fascinating. As it turns out, though, recurrent switching dynamics are also amazing at other things, namely modelling biological neural networks.
This is where [Riemann17]'s work really shines. We can for instance assume that every recurrent switching dynamic can be represented as a population of neurons. The more complex the switching dynamics required to model a plot, the more complex the plot has to be. As such, since Riemann gives us a metric to measure the complexity of biological neural networks, the hope is that this also gives us a metric to measure the complexity of narratives.
Since a linear switching dynamic is simply a less general case of a recurrent switching dynamic (i.e. a recurrent switching dynamic can model everything a linear one can), this approach can be initialized by training on the same dataset and then fine-tuned on more complex, but less numerous, narratives.
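The "less general case" claim can be illustrated with a small NumPy sketch. In a recurrent switching model, the probability of the next discrete state depends on the previous continuous state as well as the previous discrete state; setting the recurrence weights to zero leaves an ordinary Markov switch, which is the non-recurrent case. The names `switch_probs`, `step`, `R` and `r` are my own, the dynamics are noise-free, and I use a plain softmax for the switch, which is a simplification and, as far as I understand, not the construction used in [Linderman2016].

```python
import numpy as np


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def switch_probs(z_prev, x_prev, R, r):
    """P(z_t | z_{t-1}, x_{t-1}).
    R: (K, K, D) recurrence weights, r: (K, K) Markov log-weights."""
    return softmax(R[z_prev] @ x_prev + r[z_prev])


def step(z_prev, x_prev, A, R, r, rng):
    """Sample the next discrete state, then apply its linear dynamics A[z]."""
    p = switch_probs(z_prev, x_prev, R, r)
    z = rng.choice(len(p), p=p)
    return z, A[z] @ x_prev


K, D = 2, 2
r = np.log(np.array([[0.9, 0.1],
                     [0.2, 0.8]]))                 # plain Markov transition table
R_zero = np.zeros((K, K, D))                       # no recurrence: ordinary SLDS
R_rec = np.stack([np.eye(D), -np.eye(D)])          # toy recurrence weights
A = np.stack([0.9 * np.eye(D), 1.1 * np.eye(D)])   # per-state linear dynamics

x_a, x_b = np.array([2.0, 0.0]), np.array([0.0, 2.0])
p_plain_a = switch_probs(0, x_a, R_zero, r)
p_plain_b = switch_probs(0, x_b, R_zero, r)
p_rec_a = switch_probs(0, x_a, R_rec, r)
p_rec_b = switch_probs(0, x_b, R_rec, r)

z, x = step(0, x_a, A, R_rec, r, np.random.default_rng(0))
```

With `R_zero` the switch probabilities are the same for any continuous state and equal the first row of the Markov table, whereas with `R_rec` they move with the continuous state. That is why weights learned for the non-recurrent model can serve as an initialization: start with the recurrence weights at zero and let fine-tuning move them.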
VERY IMPORTANT NOTE: Not enough work has been done on the link between Riemann17 and computational narratology to formally confirm this. We simply lack the computational resources. More advancements on the front of computational homology are required before it can be verified. The above must be taken with a grain of salt. As before, rebuttals are more than welcome!
Written by Louis C.
Co-written by Arthur R.