Hierarchical Sequence Memory

Author: Oleg A. Serebrennikov

This work relies on the concept of the mutual occurrence of objects in sequences as a manifestation of the laws of the outer world, which allow objects to follow one another only in the order predetermined by these laws, and on Jerome Lettvin's hypothesis that objects are encoded by individual neurons of the cerebral cortex, the so-called "grandmother cells".

Within this research, a mathematical model for a hierarchical memory of unnumbered sequences was proposed and studied. The architecture of a new generation of neural network, together with designs of artificial neurons for strong artificial intelligence based on the hierarchical sequence memory approach, was proposed and considered. The study shows that such a neural network can not only learn but also retrieve unnumbered (non-indexed) sequences from memory. It is also capable of predicting the appearance of subsequent objects, which corresponds to the "extended Turing test" first formulated by Jeff Hawkins. The work also shows that predictions are generated by interference resembling the interference of neuron excitation waves observed by researchers in the cerebral cortex. Changes in the context of sequences were studied, and a model was proposed and studied for cyclically compressing the original sequence into a sequence of contexts, which serves as a sequence of the next level in the sequence memory hierarchy. The designed neural network is capable of simultaneously learning, storing and retrieving sequences of objects of any nature (audible, visible, tactile and any other sensory data sequences), and of linking such sequences with each other through synchronization in any measure: in time, in space or on any other grading scale, including emotional and ethical ones. Synchronizing consecutive contexts by approximate formation point and length on a particular grading scale allows sequences of different nature to be merged (e.g. associating the visual sequence of a cat's appearance with the audible sequence of the cat meowing). The work also suggests a solution for monitoring and controlling the state of consciousness of robots and its deviations, making it possible to enforce set emotional and ethical standards on robots.

The following is a brief introduction to the study.

The sequence memory

The concept

The brain is a black box that receives sequences of sensory data corresponding to sequences of objects and events observed in the external world, and the laws of the external world are manifested in the frequency and order in which such objects and events appear together in sequences. A higher frequency of co-occurrence indicates a relatively higher probability of the objects appearing together in the future, while order indicates a causal relation: the cause precedes the effect.

Thus, the sequence memory is a model of the regularities of sequence construction that takes into account both the joint appearance of individual objects in sequences and the direction of such occurrence, that is, the cause-effect relationships between the objects. After learning from examples of observed sequences, the sequence memory must be able to predict the possible continuation of a sequence based on the known part of that sequence and on the statistics of the occurrence of objects in previously observed sequences.

The starting task of the research was to develop a sequence memory structure in which the frequency and order of co-occurrence drive predictions.

The sequence memory model

The brain is able to work with sequences of any kind, so we can examine the operation of sequence memory using text sequences as an example.

As a starting point, let's recall Jerome Lettvin's hypothesis that individual objects of reality are coded by individual neurons of the cortex (the so-called "grandmother cells"). According to the hypothesis, various objects of reality are represented by separate "grandmother cells" (cortical neurons), and thus the occurrence of one particular object together with other objects should be represented by the dendrites and axons that connect a specific "grandmother cell" with any other "grandmother cell" encoding another specific object. The weight of the joint occurrence of such a pair of objects in sequences will correspond to the "thickness" of the dendrite or axon connecting these "grandmother cells", depending on the direction of occurrence: direct or reverse.

Let's now build the "grandmother's cell" model for the word "tiger". To do this, let's place the word "TIGER" in the center (the focal object), then we will find all sentences containing the word "tiger" and write these sentences through the word "tiger" with some angular displacement so that the sentences form a "circle":

Figure 1
As you can see, the sequences going through the word "tiger" resemble the structure of a neuron. Since dendrites play the role of the "inputs" of a neuron and axons play the role of its "outputs", the connections of the word "tiger" with all the words preceding it will play the role of dendrites, and the connections with all the words that follow it will play the role of axons. Considering the model of connections built for the word "tiger", we will soon discover that words such as "cat", "forest" or "striped orange fur" occur more often than other words, so the weight of the joint occurrence of "tiger" with such frequent words will be higher than with others. This means that the word "tiger" is associated with them more strongly than with other words of the language. Returning to the manifestation of causal relationships in observed sequences (e.g. visual or audio), it can be argued that, for example, the attack of a tiger on a victim in the observed sequences will always be preceded by the appearance of the victim and the tiger.

Unique representation of objects in sequence memory

If each word (a frequent word) of the language is assigned a weight of joint appearance in sequences with a focal word, then the set of weights for all unique frequent words would uniquely identify the focal word. The sign "+" or "-" before a weight may indicate the direction of the connection, where "+" means the future (the frequent word is located after the focal word) and "-" means the past (the frequent word is located before the focal word). Hence each focal word C_F in a sequence memory can be uniquely represented and identified by a cluster (past and future clusters) of weights ω_i, where each weight ω_i is related to a particular unique "frequent" word C_i of the group of frequent words.
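The signed clusters described above can be illustrated with a minimal Python sketch. The function name `focal_clusters`, the toy corpus and the decision to count every word rather than only "frequent" ones are assumptions made for illustration; they are not part of the original study.

```python
from collections import Counter


def focal_clusters(sentences, focal):
    """Return (past, future) clusters of signed co-occurrence weights.

    Past weights are negative and future weights are positive,
    mirroring the "-" / "+" sign convention described in the text.
    Assumption: every word counts, not only "frequent" words.
    """
    past, future = Counter(), Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for pos, word in enumerate(words):
            if word != focal:
                continue
            # Words before the focal word belong to the past cluster,
            # words after it to the future cluster.
            past.update(w for w in words[:pos] if w != focal)
            future.update(w for w in words[pos + 1:] if w != focal)
    signed_past = {w: -count for w, count in past.items()}
    signed_future = {w: count for w, count in future.items()}
    return signed_past, signed_future


# Hypothetical toy corpus.
corpus = [
    "the tiger is a striped cat",
    "a cat fears the tiger",
    "the tiger hunts in the forest",
]
past, future = focal_clusters(corpus, "tiger")
```

In this toy corpus "cat" appears both before and after "tiger", so it receives a negative weight in the past cluster and a positive one in the future cluster.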

As we can see, the sequence memory model has an advantage over the perceptron model (the artificial neuron of modern neural networks): sequence memory consists of identified "grandmother cells", and the neural link weights in the sequence memory model correspond to the statistical weights of co-occurrences, ensuring "reverse compliance" with reality, that is, the ability to restore a sequence from memory. By contrast, in the perceptron learning model there are no identified neurons encoding a particular object or action, and the neural link weights are assigned by the backpropagation method. The neural weights therefore have no obvious relation to reality, which guarantees neither reverse compliance with reality nor complete predictability of the results produced by the neural network.

Needless to say, creating a text sequence memory as a statistical model of how the rules for composing text sentences are applied would make it possible to compose sentences correctly and to predict their possible continuation. To model the real world as perceived by the human senses, we would need to build a separate segment of sequence memory for each sense available to a human and synchronize the segments with each other.

The complexity of analysis in sequence memory

To study the weights of the mutual occurrence of each pair of words in a language, you can use a table of size N × N = N^2, where N is the number of all unique words in the language. On the diagonal of such a table there will be the weights of mutual occurrence in sequences of each word with itself, and in the cells symmetrical with respect to the diagonal there will be the weights of the direct ω_1 and inverse ω_2 mutual occurrences of each pair of words in the language. The ratio ω_1 / ω_2 determines the inversion of mutual occurrence: the further this ratio is from 1, in either direction, the greater the probability that one of the words precedes the other in sequences. This can be considered a causal relationship for the rules of composing sentences in the language under consideration.
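As an illustration of the pair table and the inversion ratio ω_1 / ω_2, here is a minimal Python sketch. It stores the table sparsely as a dictionary keyed by ordered word pairs. The names `ordered_pair_weights` and `inversion`, the toy sentences, and the choice to count every ordered pair within a sentence regardless of distance are assumptions for illustration only.

```python
import math
from collections import Counter
from itertools import combinations


def ordered_pair_weights(sentences):
    """Sparse N x N table: table[(a, b)] counts how often a precedes b."""
    table = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # combinations() preserves positional order, so `first`
        # always occurs earlier in the sentence than `second`.
        for first, second in combinations(words, 2):
            table[(first, second)] += 1
    return table


def inversion(table, a, b):
    """Ratio of direct weight (a before b) to inverse weight (b before a)."""
    direct, inverse = table[(a, b)], table[(b, a)]
    if inverse == 0:
        # No inverse evidence: infinite ratio if a ever preceded b,
        # undefined (None) if the pair never co-occurred at all.
        return math.inf if direct else None
    return direct / inverse


# Hypothetical toy corpus.
sentences = [
    "lightning precedes thunder",
    "lightning then thunder",
    "thunder follows lightning",
]
table = ordered_pair_weights(sentences)
ratio = inversion(table, "lightning", "thunder")
```

Here "lightning" precedes "thunder" twice and follows it once, so the ratio is above 1, which under the reading above suggests that "lightning" tends to come first.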

If we wanted to study the relationship in each group of three unique words for all words of the language, we would need a cube of size N × N × N = N^3, and so on. Thus the complexity of studying the mutual relationships in each group of R words for all N words of the language would be proportional to N to the power R, that is, N^R.

By contrast, the computational complexity of studying the relationships in each group of R words for all N words of the language in sequence memory is linear in R.

Full and rank clusters of focal object

Let's now truncate all the sentences (Figure 1) going through the focal word, so that only R words of each sentence are left before and after the focal word: -R before and +R after the focal word (Figure 2).

Figure 2

As we can see in Figure 2, we now have two semicircles of radius R: a semicircle of the "past" (-R) and a semicircle of the "future" (+R). The presented circle is a pie chart for a set of linear text segments of the same size, each containing the focal word "tiger" at its centre (Figure 1). Moving to a 3D representation, we can speak of a sphere consisting of two hemispheres: the hemisphere of the "past" and the hemisphere of the "future" (US9679002B2). Let's call the set of weights of frequent words contained in the hemisphere of the "past", K_{tiger,-R}, the R-cluster of the "past", and accordingly the set of weights of frequent words contained in the hemisphere of the "future", K_{tiger,R}, the R-cluster of the "future". Each full R-cluster of the future or past of a focal object contains the weights of joint occurrences of the focal object with the corresponding unique frequent objects of the "future" or "past" located within the R-sphere.

Each R-cluster K_{C,R} for any focal object C_F can be represented as the sum of all rank clusters of object C_F with rank from 1 to R:

K_{C,R} = \sum_{i=1}^{R} K_{C,i}

where each rank-i cluster K_{C,i} includes only those weights from the R-cluster that belong to the frequent objects at distance i from the focal object, i.e. those separated from the focal object by (i - 1) objects of a sequence.
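The decomposition of an R-cluster into rank clusters can be sketched in a few lines of Python. This is an illustrative reading only: the function names `rank_cluster` and `r_cluster`, the use of raw counts as weights, and the convention that a negative rank denotes the past are assumptions, not details taken from the study.

```python
from collections import Counter


def rank_cluster(sequences, focal, rank):
    """Weights of objects exactly |rank| positions away from `focal`.

    rank > 0 looks into the future, rank < 0 into the past.
    """
    cluster = Counter()
    for seq in sequences:
        for pos, obj in enumerate(seq):
            if obj != focal:
                continue
            target = pos + rank
            if 0 <= target < len(seq):
                cluster[seq[target]] += 1
    return cluster


def r_cluster(sequences, focal, radius):
    """Sum of the rank clusters with rank 1..R (or -1..-R for the past)."""
    step = 1 if radius > 0 else -1
    total = Counter()
    for rank in range(step, radius + step, step):
        total += rank_cluster(sequences, focal, rank)
    return total


# Hypothetical toy corpus, already split into objects (words).
sequences = [
    "the striped tiger hunts in the forest".split(),
    "a tiger hunts at night".split(),
]
future_2 = r_cluster(sequences, "tiger", 2)   # ranks +1 and +2
past_2 = r_cluster(sequences, "tiger", -2)    # ranks -1 and -2
```

Each occurrence of the focal object contributes one lookup per rank, so building an R-cluster this way costs a number of lookups proportional to R per occurrence.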

The state of consciousness

Let's now define the 1st rank cluster for each unique object C_F of all N unique objects of sequences, put all 1st rank clusters in a set of N clusters and refer to the resulting set below as the "base set". It does not matter whether the base set consists of 1st rank clusters of only the past or of only the future, because the two base sets would differ only in sign.

It is now easy to show that any R-cluster, whether of the future K_{C,R} or of the past K_{C,-R}, for any unique object C_F can be represented as a composition of 1st rank clusters from the base set.

Let's now have the robot's mind learn harmful behaviours, with the aim of defining the signature subsets of the base set that correspond to each harmful behaviour the AI has learned. This opens a way to predict potentially harmful changes in the robot's consciousness, making it possible to control its mind completely and to reset it to the "safe" state of consciousness at any time if a signature of harmful learning is discovered.

Prediction technique

The technique of interference of rank clusters is an analog of the interference of excitation waves in the cerebral cortex, which allows us to model the wave processes of thinking in the cerebral cortex.

It is not difficult to create a method for predicting the appearance of new objects in a sequence using the interference of coherent i-clusters of the objects from the known part of the sequence. The interference of parallel clusters allows us to find parallel meanings, and rank cluster analysis supports further operations of the same kind.
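One possible reading of prediction by interference of rank clusters is sketched below in Python. The text does not specify the interference operation, so this sketch assumes simple additive superposition: the object that ended the known part d positions ago contributes its future cluster of rank d, and the candidates are ranked by summed weight. The names `predict_next` and `max_rank` and the toy corpus are hypothetical.

```python
from collections import Counter


def rank_cluster(sequences, focal, rank):
    """Weights of objects exactly `rank` positions after `focal`."""
    cluster = Counter()
    for seq in sequences:
        for pos, obj in enumerate(seq):
            target = pos + rank
            if obj == focal and 0 <= target < len(seq):
                cluster[seq[target]] += 1
    return cluster


def predict_next(sequences, known, max_rank=3):
    """Rank candidates for the next object after the `known` prefix.

    Assumption: "interference" is modelled as the plain sum of the
    rank clusters that all point at the same future position.
    """
    scores = Counter()
    for distance, obj in enumerate(reversed(known), start=1):
        if distance > max_rank:
            break
        # The object `distance` steps back "sees" the next position
        # through its future cluster of rank `distance`.
        scores += rank_cluster(sequences, obj, distance)
    return scores.most_common()


# Hypothetical training sequences.
training = [s.split() for s in (
    "the tiger hunts in the forest",
    "the tiger sleeps in the forest",
    "the cat hunts in the garden",
    "the tiger hunts at night",
)]
candidates = predict_next(training, ["the", "tiger"])
```

With this toy data the rank-1 cluster of "tiger" and the rank-2 cluster of "the" both favour "hunts", so their superposition puts it first in the candidate list.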

The hierarchical sequence memory

The research examines the oscillations of the overall weight of frequent objects as a sequence is entered, and shows that a maximum of the total weight coincides with the moment when the current context of the sequence changes. These successive total-weight peaks form a sequence representing the next level of sequence memory. Creating an artificial unique object and assigning to it the cluster corresponding to a peak makes it possible to move to a sequence of artificial objects representing the next tier of abstraction in sequence memory, thus forming a complex hierarchy of knowledge in the sequence memory one tier after another.
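The step from peaks to a next-level sequence can be sketched as follows. How the total weight is computed is not specified here, so the sketch takes the per-step total weights as a given input and only shows peak detection and segmentation. The function names, the local-maximum criterion and the sample numbers are assumptions for illustration.

```python
def context_boundaries(total_weights):
    """Indices at which the total weight has a local maximum."""
    return [
        i
        for i in range(1, len(total_weights) - 1)
        if total_weights[i - 1] < total_weights[i] >= total_weights[i + 1]
    ]


def compress(sequence, total_weights):
    """Split `sequence` at weight peaks.

    Each returned segment stands for one artificial object of the
    next level of the hierarchy (an assumption of this sketch).
    """
    segments, start = [], 0
    for peak in context_boundaries(total_weights):
        segments.append(tuple(sequence[start:peak + 1]))
        start = peak + 1
    if start < len(sequence):
        segments.append(tuple(sequence[start:]))
    return segments


# Hypothetical input: one total weight per entered object.
sequence = list("abcdefg")
weights = [1, 3, 5, 2, 4, 6, 1]
next_level = compress(sequence, weights)
```

Applying `compress` again to the next-level sequence, with its own total weights, would give the following tier of the hierarchy.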

Back to cerebral cortex

Drawing an analogy between the proposed model of sequence memory and the cerebral cortex, it can be suggested that the objects of the original sequences may be encoded by the "grandmother cells" located in the inner granular layer, while the artificial objects may be encoded by the cells of the outer granular layer. Each pyramidal neuron in the outer layer of pyramidal neurons may presumably play the role of a weight adder for a particular set of "grandmother cells" located in the inner granular layer; if activated, it sends a spike to a particular neuron located in the outer granular layer. A column of neurons consisting of these two interconnected neurons of the outer pyramidal layer and the outer granular layer may represent the "silent cells" (so named by Vyacheslav Shvyrkov in 1986) connected to a cluster of "grandmother cells" located in the inner granular layer. This pair of "silent cells" is activated only if the total activity of the "grandmother cells" cluster exceeds the activation threshold of the pyramidal neuron adder.
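The threshold behaviour of the adder can be expressed as a very small Python sketch. It assumes that "total activity" is a weighted sum over the cluster; the function name, the weights and the threshold value are hypothetical.

```python
def silent_cell_fires(activity, cluster_weights, threshold):
    """True if the weighted activity of the cluster exceeds the threshold.

    activity:        {cell name: current activity level}
    cluster_weights: {cell name: link weight into the adder}
    Assumption: the adder computes a plain weighted sum.
    """
    total = sum(
        weight * activity.get(cell, 0.0)
        for cell, weight in cluster_weights.items()
    )
    return total > threshold


# Hypothetical cluster of "grandmother cells" feeding one adder.
tiger_cluster = {"cat": 0.6, "striped": 0.3, "forest": 0.1}

strong = silent_cell_fires({"cat": 1.0, "striped": 1.0}, tiger_cluster, 0.8)
weak = silent_cell_fires({"forest": 1.0}, tiger_cluster, 0.8)
```

In this example the adder fires when "cat" and "striped" are active together, but not when only "forest" is active.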

Neuromorphic chip design

As part of the research and development, an original neurochip architecture for hierarchical sequence memory has been drafted; its energy efficiency is expected to exceed that of existing neurochips by orders of magnitude. The chip can work simultaneously with sequences of objects of different nature and is capable of linking all incoming sequences, regardless of their nature, through synchronization of measures including but not limited to emotional, ethical, linear and angular measures, time, field strength etc. Because of this, the chip can store and retrieve sequences of any nature that have comparable rounded length and are synchronized in at least one measure.

The findings

At any time, the base set represents the "state of consciousness" of the sequence memory. It is therefore possible to control deviations of the "state of consciousness" from a reference state and to correct the consciousness of robots built on the sequence memory approach.

Performing set-theoretic and linear-algebra operations on rank clusters makes it possible to predict the appearance of new objects in sequences, define synonymy, and more.

The provided presentation of the base set ultimately confirms the correctness of the hypothesis about the so-called "grandmother's cells" (ref. "Bill Clinton's neuron", "Jennifer Aniston's neuron"). It also proves that "grandmother cells" are capable of forming sequence memory in the cerebral cortex.

The weights of joint appearance of pairs of unique objects in sequences are statistical in nature and should represent the weights of neural links in artificial neural networks. The backpropagation technique, widely used for defining the weights of neural links in existing neural networks, including convolutional ones, cannot support "grandmother cell" encoding for building sequence memory; hence existing neural networks may not perform as anticipated in every particular case.

The process of producing conclusions seems to be a process of searching for causal relationships at different levels of the sequence memory hierarchy.

The process of teaching the sequence memory is the process of updating the weights of the mutual occurrences of each pair of unique objects (neurons). The sequence memory is therefore effectively unlimited, although it is subject to a saturation effect, which people tend to refer to as "wisdom."

Studying the mutual relationships in each subset of R unique objects out of all N unique objects of sequences using traditional methods may have a calculation complexity proportional to N^R, while the sequence memory affords a calculation complexity proportional to R.

The findings also suggest that the current design of the artificial neuron in fully connected layers might be incomplete, because it lacks a back coupling of the neuron to the previous layer of neurons in the hierarchy of sequence memory.

Related works

[1] "On Intelligence", Jeff Hawkins & Sandra Blakeslee, ISBN 0-8050-7456-2

[2] "Hierarchical Temporal Memory", Jeff Hawkins & Dileep George, Numenta Corp.

[3] "Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation" (https://arxiv.org/abs/1611.04558; Russian version: https://m.geektimes.ru/post/282976/)

[4] "Why your brain has a Jennifer Aniston cell" (https://www.newscientist.com/article/dn7567-why-your-brain-has-a-jennifer-aniston-cell/)

[5] "Compression and Reflection of Visually Evoked Cortical Waves" (https://www.researchgate.net/publication/6226590_Compression_and_Reflection_of_Visually_Evoked_Cortical_Waves)

[6] Christopher Olah, 2014, (http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/)

[7] Luong et al. (2013) (https://nlp.stanford.edu/~lmthang/data/papers/conll13_morpho.pdf)


[9] Alexey Redozubov, "The Logic of Emotions" (in Russian) (http://www.aboutbrain.ru/wp-content/plugins/download-monitor/download.php?id=6)

About the author

Oleg Serebrennikov (https://www.linkedin.com/in/serebrennikov/), serial entrepreneur and inventor in fintech, internet, AI.

The related early patents are US9679002B2 and RU2459242; a new PCT patent application is pending.

The Hierarchical Sequence Memory is a game-changing approach in the development of next gen neural networks and AI that opens stunning opportunities for creation of General AI. If you are excited about it as much as I am and see your role in this development as a professional team member or investor please do not hesitate to contact me with your suggestions and inquiries. Thank you.