So, I’ve been thinking lately about repetition. For a while I was trying to figure out at what point repetition might stop acting as simple duplication and become a prescription of the duplicating act itself. Take “A rose is a rose is a rose,” for example: Stein herself has used it in other places with more than three repetitions. Seen through the lens of a two-step Markov chain, the phrase “a rose” has a 66% chance of being followed by “is a” and a 33% chance of being the final phrase. In a three-step chain, “# A rose” has a 100% chance of being followed by “is a rose”, where the number sign marks the beginning or end of the line, and “is a rose” has an equal chance of being either the final phrase or repeating itself. In the first example, the single phrase “# A rose #” could be produced as well as any number of repetitions up to infinity, whereas in the second, the smallest possible phrase would be “# A rose is a rose #”, and the number of possible repetitions could again reach up to infinity, though this is even less likely than in the two-step chain.
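As a rough sketch of how those percentages fall out, the Python below counts, for each n-word phrase in the line, which phrase comes directly after it. The function names (`successor_counts`, `probabilities`) and the choice to read the chain in non-overlapping n-word chunks are my own assumptions for illustration, not a standard Markov library.

```python
from collections import Counter, defaultdict

def successor_counts(tokens, n):
    """Map each n-word phrase to a Counter of the phrases that follow it."""
    follows = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        phrase = tuple(tokens[i:i + n])
        # The following chunk is shorter than n when it runs into the line end.
        nxt = tuple(tokens[i + n:i + 2 * n])
        if nxt:
            follows[phrase][nxt] += 1
    return follows

def probabilities(counter):
    """Turn raw successor counts into relative frequencies."""
    total = sum(counter.values())
    return {phrase: count / total for phrase, count in counter.items()}

# '#' marks a line boundary, as in the post.
two_step = successor_counts("a rose is a rose is a rose #".split(), 2)
three_step = successor_counts("# a rose is a rose is a rose #".split(), 3)

print(probabilities(two_step[("a", "rose")]))
print(probabilities(three_step[("is", "a", "rose")]))
```

With the two-step reading, “a rose” is followed by “is a” in two of its three occurrences and by the end marker in the third; with the three-step reading, “is a rose” splits evenly between repeating and ending.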
What has stalled me in this thinking, though, is where it could be useful besides theorizing other possible texts. I have since shifted the focus slightly and have instead been thinking about the idea of minimum covering sets (what is necessary to produce the text) rather than procedural chains (what the patterns of the text could also produce). A basic covering set would simply be the list of words used in a text: “a”, “rose”, and “is” are sufficient to write Stein’s aphorism, but this is not a very useful description, as it gives little other information about the structure of the text. The same list of words is also the minimum covering set of lines such as “A is rose”, “Is a rose a rose?” or “A rose rose”. Wordlists, normally combined with a word frequency index, are often used as an analytical measure of texts (see the downloadable word frequency list of Wallace Stevens’s collected poems at the Wallace Stevens Online Concordance, for example), and wordlists have also been known to be used by postmodern poets as replacements for texts.
But what happens if instead of using words we look at all word connections, at pairs of successive words, in the text? For Gertrude Stein’s aphorism, we have the minimum set of:
{(#, a), (a, rose), (rose, is), (is, a), (rose, #)}, where # again signifies the end or beginning of a line.
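The pair set can be reproduced mechanically. This is only a minimal sketch: `covering_set` is a name I made up, and it assumes lowercased, whitespace-split words with a single “#” symbol standing for both line edges.

```python
def covering_set(line, n):
    """Distinct runs of n successive tokens, with '#' marking both line edges."""
    tokens = ["#"] + line.lower().split() + ["#"]
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

pairs = covering_set("A rose is a rose is a rose", 2)
print(sorted(pairs))
```

Because the result is a plain set, it records which connections occur but not how often, which is the sense in which the pairs carry no probabilities.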
The difference between this and Markov analysis is that the minimum covering set does not give the statistical chances of each connection being decided upon. The pairs lie flat in the set: they are purely a set of connections rather than decisions. If, as above, we expanded this to triplets, we’d end up with {(#, a, rose), (a, rose, is), (is, a, rose), (a, rose, #)}.
The statistical figures for the aphorism would be as follows:
- 8 words long, 10 if the beginning and end of the line are included as textual markers.
- 3 distinct words, 2 textual markers, thus 5 members of a one-word minimum covering set (MCS)
- 5 members of a two-word MCS
- 4 members of a three-word MCS
- 5 members of a four-word MCS
- 5 members of a five-word MCS
- 5 members of a six-word MCS
- 4 members of a seven-word MCS
- 3 members of an eight-word MCS
- 2 members of a nine-word MCS
- 1 member of a ten-word MCS
The covering sets don’t start exhausting themselves (start becoming trivial) until the six-word set, and the three-word set is the most efficient of the non-trivial.
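For anyone who wants to check the figures, here is a hedged sketch that slides a window of each size across the marked line and compares the number of windows with the number of distinct windows. The helper `window_counts` and the word “exhausted” for the point where the two numbers meet are my own labels. One caveat: a strict sliding window also picks up (rose, is, a) among the triplets, so it reports 5 rather than 4 for the three-word set; the other figures from two words upward match the list.

```python
def window_counts(line, n):
    """Return (number of n-token windows, number of distinct windows)."""
    tokens = ["#"] + line.lower().split() + ["#"]
    windows = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(windows), len(set(windows))

line = "A rose is a rose is a rose"
for n in range(2, 11):
    total, distinct = window_counts(line, n)
    # Once every window is unique, the set is as long as the text it covers.
    note = "exhausted" if total == distinct else ""
    print(n, total, distinct, note)
```

Under these assumptions the six-word window is the first size at which every window is unique, which agrees with where the sets start becoming trivial.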
Whether this is a measure of nonsense rather than redundancy is what’s at stake, but I imagine that, whereas a wordlist can produce both sensical and nonsensical texts of the exact same size, there is most likely a very consistent (with a reasonable margin) relationship between the number of unique relations between words and the size of the text for what we’d call “normal” writing. I imagine that the number of unique relationships in nonsensical texts like those of Stein and Beckett wouldn’t start becoming trivial until the relationships were composed of a much higher number of terms than at the same moment in normal texts. I also imagine that texts which started exhausting themselves in low-term covering sets would similarly be described as “meaningless” rather than “nonsensical.”
Let’s look at an 8-word “normal” sentence, using the same methods as above. I realize 8-word sentences aren’t long enough to prove much, but humor me. For the sentence “A rose is a popular kind of flower,” we have 7 distinct words (only “a” repeats), and so while the two-term covering set will tell us how to put the sentence together, it is otherwise no more efficient than the word list itself. To better mimic the structure of Stein’s sentence, we could try to make her sentence “meaningful” with minimum effort: “This rose is a rose and it is rose (colored).” We get 9 total words, 6 distinct words, and no assistance from covering sets. “This is a rose and that is a rose” (9 total words, 6 distinct) gets results similar to the actual Stein sentence, having a greater efficiency at the 3-term MCS than anywhere else because, just as before, the most repeatable section (“is a rose”) is three words long. But even this sentence, though (possibly) informative and decidedly not “experimental”, has something fishy about it.
In a longer work, repeated phrases would not be so consistently sized, and so the number of terms in the most efficient regularly sized covering set would be a compromise among all of them. If we posited a sort of smarter idea of the minimum covering set, one that deduced the most repeated phrases from the text as arbitrarily sized units, we would see something else. For Stein the smarter minimum covering set (SMCS) might be something like:
{(#, a), (a, rose, is), (is, a, rose), (rose, #)}
which again has 4 members. For a non-repetitive sentence, however, there would only be one member: the entire sentence itself. In a different way, this is what Coetzee found odd in Beckett’s Lessness: that the SMCS for the text was not composed of large enough members. What differs in this picture from Coetzee’s analysis is that, like a list of words, a list of non-overlapping phrases leaves out crucial information of the text’s construction.
This post has already gone on too long, so I will save the rest for a later one, but I think a clear definition of covering sets could help give an abstracted view of the important role nonsense, as opposed to “common sense”, plays in culture as well as show the adaptability of this concept to computer text-generation where the idea of culture is perhaps less applicable.