
From artificial self-awareness to artificial semantic-awareness

Note: This article aims to explain semantic-awareness in artificial agents using non-technical terminology

Self-awareness is the ability by which an agent, whether biological or artificial, recognizes that it is experiencing something new and builds and memorizes models of that experience. These models should be both generative and discriminative. Generative models help the agent predict the future state of itself and of the environment in which it is acting, based on what it is experiencing now (the observation). Discriminative models, on the other hand, help the agent match the current experience against the models already in its memory.

If live observations are taken as evidence of different agent/environment states, then Bayesian inference (where inference covers both prediction, the generative side, and discrimination) is the right mathematical tool for modeling them. It is built upon Bayes' theorem, which in its simplest form assigns probability values to states according to the available evidence, much as a doctor observes symptoms and, drawing on experience, tries to figure out the most probable disease behind them (a worked example follows the list below). This brings us to the introduction of self-awareness as a major contributor to one of the five major schools of AI:

  • Bayesians who believe in a self-aware approach modeled by Bayesian inference, in which the self-aware agent takes observations as evidence and compares them to its existing models in order to decide whether to create a new Bayesian model or update the existing ones.
  • Symbolists who believe that knowledge could be obtained by operating on symbols and deriving rules from them.
  • Connectionists who believe in solving problems by breaking them into small pieces such that each piece can be solved by an artificial neural network, in a way similar to how the brain works
  • Evolutionaries who believe in evolutionary rules based upon the survival of the fittest solution
  • Analogizers who believe in using kernel machines to recognize patterns in data
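
To make the doctor analogy concrete, here is a minimal sketch of Bayes' theorem in Python; all of the numbers are invented purely for illustration.

```python
# A "doctor" updating belief in a disease after observing a symptom.

# Prior: how common the disease is before seeing any symptom.
p_disease = 0.01

# Likelihoods: how often the symptom appears with and without the disease.
p_symptom_given_disease = 0.90
p_symptom_given_healthy = 0.05

# Total probability of observing the symptom (the evidence).
p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_healthy * (1 - p_disease))

# Bayes' theorem: posterior belief in the disease given the symptom.
p_disease_given_symptom = p_symptom_given_disease * p_disease / p_symptom
print(f"P(disease | symptom) = {p_disease_given_symptom:.3f}")  # ~0.154
```

Even with a very reliable symptom, the posterior stays modest because the disease is rare; this evidence-weighing behavior is exactly what a Bayesian self-aware agent relies on.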

Bayesians, building on the most general definition given at the beginning of this post, identify the following capabilities that an agent should incorporate to be considered a self-aware agent (a rough sketch of the resulting loop follows the list):

  • An initial/reference knowledge (a generative/discriminative model) about itself and the environment in which it is acting. With the generative side of this knowledge the agent can predict future states, and with its discriminative side it can compare new experiences with existing ones and, if necessary, extract the difference for further learning and model creation (Initialization)
  • Having the ability to recognize when a new experience differs from previous experiences (Abnormality detection)
  • Building a model from the data derived from the difference between the new experience and the previous experiences (Model creation)
  • Having the ability to store and retrieve models (Memorization)
  • Being able to establish a relation between an observation and a decision (Decision making, or acting through actuators in the case of intelligent robots).
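
Put together, these capabilities form a perception-to-action loop. The following is a minimal, hypothetical sketch of that loop; every name in it (the class, the model methods, the threshold) is invented here for illustration.

```python
class SelfAwareAgent:
    def __init__(self, reference_model):
        # Initialization: start from an initial/reference model.
        self.memory = [reference_model]

    def step(self, observation):
        # Discrimination: find the stored model that best explains
        # the observation (highest likelihood). Memorization is the
        # ability to keep and search this list of models.
        best = max(self.memory, key=lambda m: m.likelihood(observation))

        # Abnormality detection: a poorly explained observation
        # signals a new experience.
        if best.likelihood(observation) < best.threshold:
            # Model creation: learn from the difference between the
            # new experience and the closest previous one.
            self.memory.append(best.adapted_to(observation))

        # Prediction (the generative side) feeds decision making.
        return self.decide(best.predict(observation))

    def decide(self, predicted_state):
        # Decision making: map the predicted state to an action.
        ...
```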

Obviously, the first step in recognizing that a new experience is taking place is data perception, which is made possible by sensors. Sensors can perceive data both from inside an agent and from outside it (proprioceptive and exteroceptive sensors, respectively). A human can see what is happening outside through her eyes (an exteroceptive sensor). She can also sense her balance through the vestibular system of her inner ear (a proprioceptive sensor). Similarly, an Autonomous Vehicle (AV) can perceive what is happening outside through an external camera (exteroceptive) while it perceives its own speed through a speedometer (proprioceptive).

An example of Bayesian inference

Imagine an AV knows that at constant speed it can expect to cover equal distances in equal units of time. This can be considered its initial knowledge, with which it can predict the amount of time it takes to move from one point to another. But in a real experience it notices that the trip took longer than its generative model predicted (anomaly detection). It then goes back over both its proprioceptive and exteroceptive sensory data to find which piece of exteroceptive observation caused such proprioceptive data (causal modeling using Bayesian inference). The exteroceptive observation can be taken as the evidence that contributed to some proprioceptive state (the passive-self approach). As noted earlier, Bayesian inference is exactly the kind of statistical model that applies to this case. The whole thing can also be considered in reverse: internally perceived data can be taken as the evidence for the advent of a certain set of external states (the active-self approach). The passive-self approach models the effect of the external environment on the agent (a slope, the observation, causes a change in speed, the state), while the active-self approach models the effect of the agent's actions on the future states of the environment (a power increase, the observation, compensates for the speed loss on a slope). In this way, the causal effects between perceived proprioceptive and exteroceptive data and states can be modeled. But what about modeling the evolution of such causal effects over the course of time?
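
In code, the AV's passive-self reasoning is the same Bayes update as the doctor example, now over road conditions; this is a minimal sketch with invented probabilities.

```python
# Hidden causes of the slowdown the AV observed.
priors = {"flat road": 0.8, "slope": 0.2}      # initial knowledge
p_slow = {"flat road": 0.1, "slope": 0.7}      # P(trip took longer | cause)

# Posterior over causes after observing the slowdown (Bayes' theorem).
evidence = sum(p_slow[c] * priors[c] for c in priors)
posterior = {c: p_slow[c] * priors[c] / evidence for c in priors}
print(posterior)  # the slope becomes the most probable explanation
```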

Contextual, temporal data and dispositional, dynamic Bayesian inference

Exteroceptive and proprioceptive data (E/P data) in an AV can be put together contextually, i.e. two pieces of exteroceptive data surround one piece of proprioceptive data, or vice versa (passive and active self modeling), to form states or observations; a model can then learn the evolution of such states over the course of time. Such models are known as dynamic models ("dynamic" standing for evolution over time), and Dynamic Bayesian Networks (DBNs) are examples of them. This yields a temporal-causal model for the evolution of an agent's states. Contextualized data can also be represented in a more compact but more complex format, in which each piece of data hierarchically contains another piece of data, down to leaves holding the contextualized data format described above. Hierarchical dynamic Bayesian networks offer a way to model the evolution of this kind of data over the course of time.
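
The simplest dynamic model of this kind is a two-state hidden Markov model, which captures the core DBN idea of predicting one step ahead and then correcting with new evidence. This sketch uses invented numbers and reuses the flat/slope states from above.

```python
import numpy as np

# Hidden states: 0 = flat road, 1 = slope.
transition = np.array([[0.9, 0.1],    # P(next state | current state)
                       [0.3, 0.7]])
emission = np.array([[0.8, 0.2],      # P(observation | state), where
                     [0.2, 0.8]])     # obs 0 = on time, 1 = running slow

belief = np.array([0.8, 0.2])         # initial belief over states

# Bayesian filtering: predict with the transition model (generative),
# then update with each observation (discriminative).
for obs in [1, 1, 0]:
    belief = transition.T @ belief        # predict one step ahead
    belief = emission[:, obs] * belief    # weight by the evidence
    belief /= belief.sum()                # normalize to a distribution
    print(belief)
```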

From perception to semantics

The data received by an agent such as a human or a robot is continuous in nature and changes continuously over time. Studying the continuous evolution of data is more difficult than studying discrete data: it is both computationally complex and less interpretable, since we can assign labels (names in natural language) to discrete entities but not to continuous data. Moreover, many competent mathematical models exist for discrete-state dynamical systems, mostly derived from the area of statistics called stochastic processes, particularly Markov stochastic processes.

Many solutions have been proposed to segment the data perceived over time into meaningful pieces and then study the evolution of those meaningful states with the aforementioned models. For example, the movement of an AV driving clockwise along the perimeter of a rectangle can be described by eight consecutive members of a sequence built from just two states (four right turns and four straight segments). The agent can thus model what it expects over time, i.e. it can predict that after each straight stretch there is a right turn, using a discrete but meaningful evolution of states together with dynamic models such as those discussed in the previous section.
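
The rectangle example reduces to a tiny two-state Markov model; here is a sketch (state names invented) in which each state deterministically predicts the next.

```python
# Transition probabilities: a straight segment is always followed by a
# right turn and vice versa, as on the rectangle's perimeter.
transitions = {
    "straight":   {"straight": 0.0, "right_turn": 1.0},
    "right_turn": {"straight": 1.0, "right_turn": 0.0},
}

def predict_next(state):
    # Generative prediction: the most probable next state.
    return max(transitions[state], key=transitions[state].get)

state = "straight"
for _ in range(8):        # one full lap: four straights, four turns
    print(state)
    state = predict_next(state)
```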

Now suppose a pedestrian steps into the AV's path along the perimeter: the AV should learn to perform an extra right turn followed immediately by a left turn to avoid a collision. The normal/reference task for which the agent was trained was to follow each stretch of straight movement with a right turn. But the new experience, collision avoidance, should be learned by the AV using the previously introduced semantics (two opposite turns in a row). In the reference task, a right-turn state means the AV has finished one straight path and is about to experience the next straight path, perpendicular to the previous one. In the new situation, a right turn may instead be followed by a consequent left turn and a return to the same straight path. Thus, from the moment the AV learns this new experience, a right turn has two meanings, and the exact meaning is determined only when we (and perhaps other artificial agents in the system) know the context, the previous and next actions, in which it happened. This is similar to the foundation of a branch of science called distributional semantics in human natural language.
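
Context-dependent meaning can be made concrete with a toy disambiguation function; the action names and the rule itself are invented for illustration.

```python
def meaning_of_right_turn(prev_action, next_action):
    # The same "right_turn" token takes its meaning from its context,
    # i.e. from the actions surrounding it.
    if prev_action == "straight" and next_action == "straight":
        return "corner of the rectangle"
    if next_action == "left_turn":
        return "collision avoidance maneuver"
    return "unknown"

print(meaning_of_right_turn("straight", "straight"))   # corner
print(meaning_of_right_turn("straight", "left_turn"))  # avoidance
```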

Emergence of semantic-awareness

Imagine a group of AVs that can dispatch their contextual actions to each other and are able to evaluate each other's actions according to their previous experiences and the context in which those actions happened. Such a group, thanks to its mutual semantic understanding, forms a society of agents. As said at the beginning of this post, if a self-aware intelligent agent is expected to build generative and discriminative models out of new experiences, then a society of such agents is expected to build similar models of their collective behavior through this semantic understanding ability. For example, when a society of AVs has reached a level of semantic-awareness at which each AV can distinguish a left turn made to pull out of a parking spot from one made to overtake another AV, then in the first case the AV in front is expected to move a bit forward to open up space, while in the second case it is expected to keep its speed constant or reduce it to make the overtaking more comfortable. In such a society, better traffic circulation can be expected.

Future expectations

For human natural language, companies such as Google have introduced solutions such as Word2Vec and BERT to evaluate semantics from a distributional point of view, in which the meaning of different parts of natural language, such as words, depends on the context in which they appear. In the case of dynamic systems (such as robots), however, despite the very early efforts now being made, there is still a long way to go, not only in building such an understanding of semantics into intelligent agents, but also in offering a comprehensive framework that maps this understanding onto appropriate actions.
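
As a closing illustration of the analogy, one can feed sequences of AV actions to Word2Vec as if they were sentences; this toy sketch assumes the gensim library is installed, and the action corpus is invented.

```python
from gensim.models import Word2Vec

lap = ["straight", "right_turn"] * 4                 # normal lap
avoidance = ["straight", "right_turn", "left_turn",  # pedestrian case
             "straight", "right_turn"]
corpus = [lap] * 50 + [avoidance] * 50

# Train context-window embeddings over the action "sentences".
model = Word2Vec(corpus, vector_size=16, window=2, min_count=1, seed=1)

# Actions that occur in similar contexts end up with similar vectors.
print(model.wv.most_similar("right_turn"))
```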