Appendix B
The John Cocke Theory of Dreams
There are such things as optimal methods of encoding. In a computer, a fragment of textual information is usually represented by one 8-bit byte for each ASCII character. More modern computer systems often use 16 bits (2 bytes) for each Unicode character (which might be a character from any one of a great many human languages). Thus, a short story of a few thousand words might take 6 characters per word times 8 bits per character times 3,000 words, or 144,000 bits. There are various methods of encoding that reduce the number of bits needed to represent such a story. Encoding schemes fall into two broad classes: lossless and lossy. A message encoded by a lossless scheme can always be reconstructed exactly. A message encoded by a lossy scheme can only be reconstructed approximately. Lossy encoding schemes are commonly used to encode pictures (JPEG), video (MPEG), and speech or music (MP3). A lossy encoding of a textual message might discard all information about font, formatting, and capitalization, replace some words with synonyms, and change some word order. Further, a really intelligent lossy encoding scheme might turn a story into the same story as retold by another person from memory, rather than a verbatim copy.
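The size estimate and the lossless/lossy distinction can be checked with a few lines of Python. This is only a rough sketch: zlib stands in for "a lossless encoding scheme", and the helper names (lossless_roundtrip, lossy_normalize) are made up for illustration. The lossy step shown here merely discards capitalization and spacing; it is far cruder than the synonym-swapping scheme described above.

```python
import zlib

# Figures taken from the text: 6 characters per word, 8 bits per
# character, 3,000 words.
CHARS_PER_WORD = 6
BITS_PER_CHAR = 8
WORDS = 3_000

raw_bits = CHARS_PER_WORD * BITS_PER_CHAR * WORDS


def lossless_roundtrip(text: str) -> str:
    """Compress and decompress; the result is always the exact original."""
    packed = zlib.compress(text.encode("utf-8"))
    return zlib.decompress(packed).decode("utf-8")


def lossy_normalize(text: str) -> str:
    """Hypothetical lossy step: discard capitalization and extra spacing.

    The original capitalization and spacing cannot be recovered afterwards.
    """
    return " ".join(text.lower().split())


if __name__ == "__main__":
    print(raw_bits)
    print(lossless_roundtrip("Once upon a time"))
    print(lossy_normalize("Once   Upon a TIME"))
```

The lossless round trip returns the input unchanged, while the normalized text only approximates it, which is the trade a lossy scheme makes to save bits.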
The idea is to reduce the number of bits required as much as possible. Let us imagine that we somehow have at our disposal an optimal lossy encoding scheme. The encoded text might require only a few percent of the bits used in a Unicode representation. Normally, the encoded text would resemble a random string of bits. In fact, it should be able to pass a test for randomness. The encoding method would undoubtedly involve various global steps such as defining the context, the time frame, the participants, and the nature of the text. Fragments of the text might already be in memory. For example, if we are encoding a fairy tale, the beginning of the encoded version might correspond to a more compressed version of the following: “fairy tale, s1s…” Given that it’s a fairy tale, s1s might stand for “standard 1st sentence fragment”, meaning in this case “Once upon a time…”. As in a play, the characters and some facts about them could be listed at the beginning, along with the shorthand nicknames that would refer to them in the encoded version.
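The "s1s" shorthand can be pictured as a lookup in a table that the decoder already holds in memory and that is selected by the declared context. The sketch below is a toy under that assumption. The table contents, the SHORTHAND name, and the expand function are hypothetical, and a real scheme would pack the tokens into bits rather than leave them as readable strings.

```python
# Hypothetical shorthand tables, one per kind of text. The decoder is
# assumed to hold these in memory already, so they cost no bits to send.
SHORTHAND = {
    "fairy tale": {"s1s": "Once upon a time"},
}


def expand(genre: str, tokens: list[str]) -> str:
    """Expand shorthand tokens using the table for the declared genre.

    Tokens that are not in the table pass through unchanged.
    """
    table = SHORTHAND.get(genre, {})
    return " ".join(table.get(token, token) for token in tokens)


if __name__ == "__main__":
    print(expand("fairy tale", ["s1s", "there", "was", "a", "dog."]))
```

The same token means nothing outside its context: with a genre that has no table, "s1s" is passed through as-is, which is why the global steps that define the context have to come first.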
The reason that an optimally encoded story would look like a random sequence of bits is that any lack of randomness would usually mean a less than optimal encoding. When reconstructing the original from a lossy, optimally encoded story, the meaning of the bits in the middle of the story is totally dependent on information from earlier in the story. What is fascinating about a truly optimal story encoding scheme is that the decoding of every possible random sequence of input bits must yield a reasonable story! If not, then it wasn’t an optimal encoding. Of course, “reasonable” allows for improbable. In any case, the decoded story would have continuity. In the middle of decoding a story, the next events would almost always have continuity with the prior events, as a natural consequence of an optimal encoding for a person’s memories of various time intervals. Clearly, the optimal decoder has multiple inputs: sensory inputs and various sources of memories. This makes sense in the design of a mental process that would help ensure the survival of any creature with a brain, such as a person or a dog.
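The claim that every bit sequence must decode to a reasonable story can be illustrated with a toy decoder. The sketch below is only an illustration, not the scheme the text has in mind: the story graph, its state names, and its sentences are all hypothetical. Each state offers exactly two continuations, so one input bit picks the next event and no bit pattern is invalid or wasted.

```python
# Hypothetical story graph. Every state has exactly two successors, so a
# single bit always selects a valid next event.
NEXT = {
    "start": ("yard", "house"),
    "yard": ("bark", "play"),
    "house": ("bark", "door"),
    "bark": ("play", "door"),
    "play": ("yard", "bark"),
    "door": ("yard", "house"),
}

TEXT = {
    "start": "Once upon a time there was a dog.",
    "yard": "It was a hot day in the yard.",
    "house": "The house was too warm.",
    "bark": "A dog barked in the distance.",
    "play": "Children played nearby.",
    "door": "She ran to the back door.",
}


def decode(bits: list[int]) -> str:
    """Decode any sequence of 0/1 bits into a story.

    The meaning of each bit depends on the current state, that is, on
    everything decoded so far.
    """
    state = "start"
    sentences = [TEXT[state]]
    for bit in bits:
        state = NEXT[state][bit]
        sentences.append(TEXT[state])
    return " ".join(sentences)


if __name__ == "__main__":
    print(decode([0, 1, 1, 0]))
```

Because the two successors of each state differ, distinct bit strings of the same length give distinct stories, and a given bit means different things depending on what came before it. That is the sense in which the bits in the middle of a story depend on the earlier part.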
The reason that the decoder needs to be awakened first whenever any sensory input occurs is obvious: the creature needs to take the sensory input to memory and retrieve information that might be vital to survival. If the sounds are those of a predator sneaking up, the information retrieved has to signal the creature to wake up and flee or fight. Thus there is good reason for the decoder both to control the level of consciousness and to feed information to the higher levels of semi-consciousness or consciousness.
Now, let us assume that a person or a dog is asleep, in a state when a dream is likely, and that the machinery for optimal decoding is operational. But there is nothing to drive the input other than random noise combined with some sensory input, such as the distant sound of a dog barking and the sensation of being too warm. A dream starts up in a human. He is at a barbecue, it’s a hot sunny day, and children are playing with a dog that is barking. The scene and events are familiar but evolve down random choices of reasonable paths; the human is having a dream. For the dog, her dream might be of being in an overheated house and, hearing the bark of the neighbor’s dog, wanting to run outside. The dog tries to run to see if the back door is open, and while dreaming, the dog’s legs actually go through abbreviated running motions and little muffled barking sounds are actually made as the dog tries, in her dream, to attract someone who might open the door.
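The mixing of sensory input with noise can be sketched in the same toy spirit. Everything here is a made-up illustration: the scene tables, the dream function, and the choice to decode the sensory cues before the noise are assumptions, not part of the theory as stated. The noise comes from a seeded pseudo-random generator so that a run can be repeated.

```python
import random

# Hypothetical scenes that the decoder associates with sensory cues.
CUE_SCENES = {
    "dog barking": "a dog is barking while children play",
    "too warm": "it is a hot sunny day",
}

# Hypothetical scenes reachable from noise alone. There are four, so two
# noise bits select one and every bit pattern is usable.
NOISE_SCENES = [
    "he is at a barbecue",
    "someone opens the back door",
    "a neighbor waves",
    "smoke drifts across the yard",
]


def dream(cues: list[str], noise_bits: int, seed: int) -> str:
    """Decode the sensory cues first, then let random bits pick the rest."""
    rng = random.Random(seed)
    scenes = [CUE_SCENES[cue] for cue in cues]
    for _ in range(noise_bits // 2):
        scenes.append(NOISE_SCENES[rng.getrandbits(2)])
    return "; ".join(scenes)


if __name__ == "__main__":
    print(dream(["too warm", "dog barking"], noise_bits=6, seed=1))
```

Changing the seed changes the path the dream wanders down, while the cues keep it anchored to the heat and the barking.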
The John Cocke Theory of Dreams was told to me on the phone, late one night back in the early 1960s. John’s complete description was contained in a very short conversation, approximately as follows:
“Hey Ed. You know about optimal encoding, right?”
“Yup.”
“Say the way we remember things is using a lossy optimal encoding scheme; you’d get efficient use of memory, huh?”
“Uh huh.”
“Well the decoding could take into account recent memories and sensory inputs, like sounds being heard, right?”
“Sure!”
“Well, if when you’re asleep, the decoder is decoding random bits (digital noise) mixed in with a few sensory inputs and taking into account recent memories and stuff like that, the output of the decoder would be a dream; huh?”
I was stunned.
In this paper, we use “soul” to mean the common, current definition, and “soul” when we mean the new definition as given in this paper.