# manyagents.ai

In this article I draft a protocol for the emergence of communication in a collective of Bayesian agents $A$. A fundamental assumption is that agents can associate random binary strings (messages) with rewards from events in the environment and by chance eventually converge on a jargon1 under which those messages convey information. The objective is to (i) increase the convergence probability and to (ii) provide tools to analyze the system.

I shall refer to this protocol as c11np. The protocol consists of several conditioned procedures called reflexes over which the agents have no control. An agent $a_n$ communicates via channel $C^n$ with two handles. $C_{out}^n$ samples messages (e.g. output of NN) and relays them to recipients. $C_{in}^n$ allows the agent to observe incoming messages (e.g. input to NN). Let’s keep the term “message” somewhat loose and instead define what specifically can be transmitted over $C^n$ using different term.

## Words

A binary string $s$ of length $n$ with $|s^1|$ on bits (i.e. count of 1’s) is a word if and only if $|s^1| \in [w_{min}, w_{max}]$. During sampling $C_{out}$ discards any $s$ which isn’t a word. The cardinality of the set of all words is given by2:

$$|W| = \sum_{i=w_{min}}^{w_{max}} {n \choose i}$$

The probability that $s \in W$ is given by:

$$p=\frac{|W|}{2^n}$$

For example if $n = 32$, $w_{min} = 7$ and $w_{max} = 9$ then $|W| = 41932956$ and $p \approx 0.01$.

Larger $n$ demands that the $w_{min}$ and $w_{max}$ are increased as well in order to keep $p$ practical. Although in the following sections I define other incentives for communication, random search of possible messages is still the fundamental assumption. Nevertheless, a word is the smallest constituent and agents can compose them. An evaluation which compares different word lengths might not be illuminating as the results would be task and algorithm specific. I suspect that one-size-fit-all value such as in the example above is sufficient.

A common trait of MAS research is a continuous transmission of random messages during training. The sparsity interval defined above controls the initial churn.

## Immediate feedback

$C_{out}^n$ transmits $w$ to all handles in the set $C_{in}^A$. Since $a_n \in A$, $w$ is effectively immediately fed back into $C_{in}^n$. The immediate feedback reflex is biologically inspired and intuitively well suited for inference. Its role in c11np is to support convergence to common vocabulary by making the sender also a recipient. On the other end of the wire a related reflex is employed to reinforce patterns.

## Word repetition

Word repetition is a reflex which bypasses the agent and hijacks $C_{out}^n$ sampling procedure. We transmit received $w$ from $C_{in}^n$ immediately back to $C_{in}^A$. This reflex reinforces collective familiarity with specific $w$ and contributes to the channel activity. Channel active in situations where information exchange means more efficient reward collection will more likely nudge agents to opportunistically form vocabulary.

We can tune this reflex with respect to the task at hand with a binary decision function $\delta_r(\cdot)$. When $C_{in}^n$ receives $w$ it decides according to $\delta_r(\cdot)$ whether to repeat it. If $w$ was repeated unconditionally then the channel would saturate because of an exponential growth of total replicas $Z_w$. This is in fact the limiting factor which helps us evaluate suitable $\delta_r(\cdot)$. We model this system as a branching process34.

$|A|$ is the number of recipients of a single transmitted replica because of immediate feedback. Let $p_w^n$ be the probability that a word $w$ is repeated by agent $a_n$. Let $X$ be a random variable of number of replicas (offspring) of any $w$ in one generation, i.e. how many replicas of $w$ are produced by all agents together. In one generation, $w$ can be repeated up to $|A|$ times or not repeated at all, there are therefore $|A| + 1$ possible outcomes of $X$.

If $X$ is i.i.d, i.e. $\forall (i, j) : p_w^i = p_w^j$, then $X$ follows the binomial distribution $X \sim \operatorname{B}(|A|, p_w)$. Since $p_w \in ({0}, {1})$ and $|A| \ge 2$, the distribution is well approximated5 by Poisson distribution with $\lambda = |A| p_w$. Let $e^{\star}$ be the ultimate extinction probability of $w$ in the branching process. To avoid saturation we need $e^* = 1$. It has been shown6 that $e^* = 1$ for $\lambda < 1$.

\begin{align} |A| p_w &< 1 \notag \\ p_w &< |A|^{-1} \label{hom_decision_bounds} \end{align}

\eqref{hom_decision_bounds} gives us an upper bound on $p_w$. $Z_w$ can be estimated with the Borel’s distribution7 where $\mu = \lambda$. Therefore the homogeneous system can be modeled using known properties of the Borel’s distribution.

If $X$ is independent but non-identically distributed, i.e. $\exists (i, j) : p_w^i \neq p_w^j$, then $X$ follows the Poisson binomial distribution8 $X \sim \operatorname{PoisB}(|A|, p_w)$. Let’s model this with p.g.f for Galtonâ€“Watson process $\{e_t\}_{t=0}^\infty$9:

\begin{align} e_t &= G(e_{t - 1}) = P(z_t = 0 | z_0 = 1) \notag \\ \lim_{t \to \infty} e_t &= e^* \notag \\ G(e_{t + 1}) &= \sum_{i=0}^{|A|} p(x_i) e_t^i \notag \end{align}

We find10 the mean of $X$ (also known as the basic reproduction number $R_0$). The larger $E(X)$ is the longer $w$ lingers on the channel.

$$E(X) = G’(1) = \sum_{i=1}^{|A|} i p(x_i)$$

\begin{align}\label{het_decision_bounds} \begin{cases} e^{*} = 1, &\text{if } E(X) < 1 \\ e^{*} < 1, &\text{otherwise} \end{cases} \end{align}

We show that bound \eqref{het_decision_bounds} is a generalized form of \eqref{hom_decision_bounds} by leveraging the fact that $\operatorname{B}(\cdot)$ is a special case of $\operatorname{PoisB}(\cdot)$11:

\begin{align*} \lambda &= \sum*{i=1}^{|A|} i {|A| \choose i} (p_w)^i (1 - p_w)^{|A| - i} \\ |A| p_w &= E*{\operatorname{B}(|A|, p_w)}(Z_1) \\ |A| p_w &= |A| p_w \end{align*}

The channel won’t become permanently saturated with $w$ as long as $\delta_r(\cdot)$ makes decisions which satisfy \eqref{het_decision_bounds}. Branching processes are often useful models for the word repetition reflex. However, while in principle one could always use $\delta_r(\cdot)$ which makes decision stochastically approximated before the system evolves, there are interesting $\delta_r(\cdot)$ for which some of the assumptions don’t hold (e.g. replication independence or knowing $|A|$ in advance.)

There are other practical ways of avoiding saturation such as runtime probabilistic counting with count-min sketch and similar algorithms.

## Sensory deprivation

It seems self-evident that saturating with noise or withdrawing an agent’s access to observations degrades its performance. Sensory deprivation affects reward without the need to account for c11np in the Bellman equations.

## Seeding

With the tools above we approximate how many new words are discovered and then model their lifetime in the system. We can adjust the rate of new words entering the system by publishing them artificially. This can be tuned with respect to the events in the environment.

## To be continued…

After I completely describe c11np it remains to be experimentally tested. The objective is to bootstrap communication and fade out the reflexes as agents converge to a jargon. Therefore the reflexes focus on keeping channel active and giving the designer control.