Entropy as a Quality Measure of Correlations between n Information Sources in Multi-agent Systems

G. Enee^1 and J. Collonge^2
^1 ISEA - EA 7484, UNC, Campus de Nouville, Noumea, New Caledonia
^2 Atout Plus Groupe, Noumea, New Caledonia

Keywords: Entropy, Multi-agent Systems, Agent Communication Languages.

Abstract: Shannon's entropy has been widely used across different fields of science, for example to measure the quantity of information found in a message coming from a source. In real-world applications, we need to measure the quality of several crossed information sources. In the specific case of language creation within multi-agent systems, we need to measure the correlation between words and their meanings to evaluate the quality of that language. When sources of information are numerous, we want to measure correlations between those different sources. Considering that those n sources of information are gathered in a matrix with n dimensions, we propose in this paper to extend Shannon's entropy to measure information quality in R^2+ and then in R^n+.
1 INTRODUCTION
Entropy, introduced by (Shannon and Weaver, 1949), can be used to measure uncertainty or randomness in a flux coming from a source. The more uncertain a source is, the more novelty it brings and thus the higher its entropy. Conversely, the more a source repeats the same pattern, the less new information it brings and the lower its entropy. The main focus of our article is to adapt entropy to multiple sources of information. We will first describe the measure itself as it was presented by (Shannon and Weaver, 1949). Then we will demonstrate how to deal with two sources of information. To enhance interpretation, we propose a new measure of information quality, sustained by an example of the emergence of a language within a multi-agent system. In the fourth part, we generalize the entropy measure to n sources of information, and finally we conclude our work.
2 ENTROPY TO MEASURE
UNCERTAINTY
Information theory, and thus entropy, has been used in computer science mainly to optimize the transmission of data through a communication medium. Entropy gives the precise bit size to use to transmit a particular series of data. It also measures the uncertainty in a flux coming from a source, giving an evaluation of transmission error. We focus here on the description of the measure itself applied to information transmission, and then shortly describe the behavior of the measure depending on whether the data bring uncertainty or not.
2.1 Entropy for T, a Transmitted Message
Let’s M be the 1 dimension matrix describing the
transmitted message T . The message contains n dif-
ferent values. Each of the n boxes of matrix M is
filled with the number of times each symbol of T ap-
pears. Since the transmitted message is one informa-
tion source, the matrix M is mono-dimensional too.
Thus p
i
represents the probability of having the i
th
particular symbol among n others and is calculated as
follow:
p
i
=
M
i
j
M
j
(2.1)
Thanks to p_i, we are now able to measure the quantity of uncertainty in the transmitted message T:

H = -\sum_i p_i \times \log_2(p_i)    (2.2)
Finally, that measure brings useful information about the transmitted message:
- the least number of bits needed to transmit that message M over a perfect communication medium;
- the uncertainty level in the transmitted message T.
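As an illustration, here is a minimal sketch in Python of this computation; the helper name entropy is our own, and the message is simply a string of symbols:

import math
from collections import Counter

def entropy(message):
    """Entropy H = -sum_i p_i * log2(p_i) of a transmitted message (eq. 2.1 and 2.2)."""
    counts = Counter(message)         # one-dimensional matrix M: symbol -> number of occurrences
    total = sum(counts.values())      # sum_j M_j
    h = 0.0
    for m_i in counts.values():
        p_i = m_i / total             # eq. 2.1
        h -= p_i * math.log2(p_i)     # eq. 2.2 (absent symbols contribute 0, as 0*log2(0) is taken as 0)
    return h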
In fact, entropy can be easily bounded in order to evaluate the distance of the result to maximal uncertainty or to maximal certainty.
2.2 Bounding Entropy
The measure of entropy is naturally bounded. Uncertainty will be maximum when entropy is maximal too. Maximum uncertainty happens when every p_i reaches uniformity, i.e. when \forall i, M_i = c and thus p_i = \frac{c}{c \times n} = \frac{1}{n}. Entropy will then be:

H_{uniformity} = -\sum_{i=1}^{n} \frac{1}{n} \times \log_2\left(\frac{1}{n}\right)
              = -n \times \frac{1}{n} \times \log_2\left(\frac{1}{n}\right)
              = -(\log_2(1) - \log_2(n))
              = \log_2(n)    (2.3)
While uncertainty is maximum when H reaches the H_{uniformity} value, certainty will be maximal when only one p_i has a non-zero value, i.e. when p_i = 1 or, equivalently, when \exists i, M_i = c and \forall j \neq i, M_j = 0, so that p_i = \frac{c}{c} = 1. That situation describes a message containing a repeated sequence of the same symbol while n different symbols were expected. The message exists but finally brings no information. Using the convention \lim_{x \to 0} x \times \log_2(x) = \lim_{x \to +\infty} -\frac{\log_2(x)}{x} = 0, entropy will be:

H_{certainty} = -1 \times \log_2(1) - (n - 1) \times 0 \times \log_2(0)
             = 0    (2.4)

So H tends towards 0 when the message brings certainty.
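A short usage of the earlier sketch illustrates these two bounds (assuming the entropy helper defined above):

# Uniform message over n = 8 distinct symbols: H = log2(8) = 3 bits (eq. 2.3).
print(entropy("abcdefgh"))   # 3.0
# Repeated symbol: the message brings certainty and no information, H = 0 (eq. 2.4).
print(entropy("aaaaaaaa"))   # 0.0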
Now that we have described how to measure uncertainty with a one-dimensional source of information, that is, a transmitted message, let's see how we can deal with two sources of information.
3 ENTROPY IN R^2+
Entropy can be useful when we deal with two different sources of information and we want to demonstrate correlations between those sources. To picture what we are dealing with, we propose here to sustain our demonstration with the formation of a lexicon matrix (MacLennan and Burghardt, 1994) that emerges from agents or groups of agents communicating. That matrix, which we call M, contains on one hand the words used to communicate and on the other hand the meaning of each word when it is used. We are thus willing to show whether, in a lexicon matrix, each word has a unique meaning or not. First, let's describe how to adapt the entropy measure to two sources of information.
3.1 Entropy with Two Sources of
Information
When we deal with two sources of information, the transmitted information T_1 indicates, for each value it can take, the direct correlation with information source T_2 in the matrix M. The two-dimensional matrix M thus measures the quantity of correspondence (i.e. correlation) between those two sources of information. Entropy will help us measure the quality of the lexicon, i.e. its level of certainty.

Let's suppose that the source of information T_1 produces n different values and that the source of information T_2 produces m different values: the matrix M will then be of size (n, m) to capture any correlation between T_1 and T_2.

To measure the reality of a correlation between the two sources of information, we must adapt the calculus of p, where p_{ij} is now the weight of M_{ij} compared to all values in the same column and in the same line:

p_{ij} = \frac{M_{ij}}{\sum_k M_{kj} + \sum_l M_{il} - M_{ij}}    (3.1)
Entropy is evaluated the same way:

H = -\sum_{ij} p_{ij} \times \log_2(p_{ij})    (3.2)
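A minimal sketch of this two-source measure, assuming NumPy and our own helper name entropy_2d:

import numpy as np

def entropy_2d(M):
    """Cross-source entropy of a 2-D count matrix M, following eq. 3.1 and 3.2."""
    M = np.asarray(M, dtype=float)
    col_sums = M.sum(axis=0, keepdims=True)     # sum_k M_kj for each column j
    row_sums = M.sum(axis=1, keepdims=True)     # sum_l M_il for each row i
    denom = col_sums + row_sums - M             # sum_k M_kj + sum_l M_il - M_ij
    p = np.where(denom > 0, M / np.where(denom > 0, denom, 1), 0.0)   # eq. 3.1
    terms = np.where(p > 0, p * np.log2(np.where(p > 0, p, 1)), 0.0)  # 0*log2(0) taken as 0
    return -terms.sum()                         # eq. 3.2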
From the discussion started in Section 2.2, we can see that maximal entropy occurs when every p_{ij} has the same value, i.e. when \forall i, j, M_{ij} = c. As a consequence, H_{uniformity} will be:
H_{uniformity} = -\sum_{ij} \frac{c}{c \times (n + m - 1)} \times \log_2\left(\frac{c}{c \times (n + m - 1)}\right)
              = -(n \times m) \times \frac{1}{n + m - 1} \times \log_2\left(\frac{1}{n + m - 1}\right)
              = -\frac{n \times m}{n + m - 1} \times (\log_2(1) - \log_2(n + m - 1))
              = \frac{n \times m}{n + m - 1} \times \log_2(n + m - 1)    (3.3)
Thanks to the choice of p_{ij}, we can state that entropy will be minimal when there is only one positive value of p_{ij} in each row and each column of matrix M. We can thus have at most q such p_{ij}, with q = \min(m, n). As a consequence, H_{certainty} will be:

H_{certainty} = -q \times 1 \times \log_2(1) - (n \times m - q) \times 0 \times \log_2(0)
             = -q \times \log_2(1)
             = 0    (3.4)
Note that, from a technical point of view, the maximal entropy is given by

H_{max} = \frac{n \times m}{\exp(1) \times \ln(2)}    (3.5)

However it occurs if and only if p_{ij} = \frac{1}{e} for all i, j, which is impossible in our setting since the n \times m elements M_{ij} are integers. For the reader's convenience, let us show that the real number \frac{n \times m}{\exp(1) \times \ln(2)} is an upper bound of our entropy function.
Let n, m \in \mathbb{N} \setminus \{0\}. Since the \mathbb{R}-vector space of matrices of size n \times m and the set (\mathbb{R}^+ \setminus \{0\})^{n \times m} are isomorphic, the mapping H defined above by H(P) = -\sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij} \log_2(p_{ij}) for any matrix P = (p_{ij}) \in M_{n,m}(\mathbb{R}^+ \setminus \{0\}) can be identified with the mapping

\tilde{H} : (\mathbb{R}^+ \setminus \{0\})^{n \times m} \to \mathbb{R}, \quad x \mapsto -\sum_{k=1}^{n \times m} \frac{x_k \ln(x_k)}{\ln(2)}

where x_k = p_{r, k-(r-1)m} for each k \in \{(r-1)m + 1, (r-1)m + 2, \ldots, rm\} when r \in \{1, 2, \ldots, n\}, and p_{\cdot,\cdot} has been introduced above. This is the reason why we write H instead of \tilde{H} hereafter. We recall that any proper real-valued function which is coercive and strictly concave admits a unique global maximizer.

On one hand, it is clear that the mapping H is proper (the set \{x \in (\mathbb{R}^+ \setminus \{0\})^{n \times m} \mid H(x) > -\infty\} is non-empty) and coercive (\lim_{\|x\| \to +\infty} H(x) = -\infty, where \|\cdot\| is an arbitrary norm on \mathbb{R}^{n \times m}).

On the other hand, the Hessian matrix \nabla^2 H of H is the (n \times m) \times (n \times m) negative definite matrix whose components are given, for all x \in (\mathbb{R}^+ \setminus \{0\})^{n \times m}, by \nabla^2 H(x)_{k,k} = -\frac{1}{\ln(2) \, x_k} and \nabla^2 H(x)_{k,k'} = 0 for k \neq k', which proves H strictly concave on (\mathbb{R}^+ \setminus \{0\})^{n \times m}.

To conclude the proof, we compute

\nabla H : (\mathbb{R}^+ \setminus \{0\})^{n \times m} \to \mathbb{R}^{n \times m}, \quad x \mapsto \nabla H(x) = -\frac{1}{\ln(2)} \left( \ln(x_1) + 1, \ldots, \ln(x_{nm}) + 1 \right)

and we apply Fermat's rule for concave functions. We get

H(x) \leq \frac{n \times m}{\exp(1) \times \ln(2)} \quad \text{for all } x \in (\mathbb{R}^+ \setminus \{0\})^{n \times m}    (3.6)
We can observe that if (n + m - 1) = e, which is impossible since n and m are integers:

H_{uniformity} = \frac{n \times m}{n + m - 1} \times \log_2(n + m - 1)
              = \frac{n \times m}{\ln(2)} \times \frac{\ln(e)}{e}
              = \frac{n \times m}{e \times \ln(2)}
              = H_{max}
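As a quick numerical sanity check of the bound in equation 3.6 (a sketch reusing our entropy_2d helper from above), one may draw random count matrices:

import numpy as np

n, m = 8, 8
h_uniformity = (n * m) / (n + m - 1) * np.log2(n + m - 1)   # eq. 3.3, about 16.67 bits
h_max = (n * m) / (np.exp(1) * np.log(2))                   # eq. 3.5, about 33.97 bits
rng = np.random.default_rng(0)
samples = [entropy_2d(rng.integers(0, 10, size=(n, m))) for _ in range(1000)]
assert max(samples) <= h_max        # the bound of eq. 3.6 always holds
print(max(samples), h_uniformity)   # compare observed entropies with H_uniformity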
Now that we have bounded the entropy measure dealing with two different sources of information, let's show through an example how it can be used efficiently.
3.2 Application to Lexicon Quality Evaluation

As described in the introduction, we will now focus our attention on an example (Enee and Escazut, 2002) to sustain our demonstration. As a first step to study a language structure, we can fill a lexicon matrix that
Table 1: Matrix of a perfect language.

Word (T_1) \ Meaning (T_2)   1  2  3  4  5  6  7  8
1                            c  0  0  0  0  0  0  0
2                            0  c  0  0  0  0  0  0
3                            0  0  c  0  0  0  0  0
4                            0  0  0  c  0  0  0  0
5                            0  0  0  0  c  0  0  0
6                            0  0  0  0  0  c  0  0
7                            0  0  0  0  0  0  c  0
8                            0  0  0  0  0  0  0  c
will indicate, for each word (i.e. T_1), its meaning (i.e. T_2). Thus, each time a word i is used, we add one in the matrix to the corresponding meaning j: M_{ij} = M_{ij} + 1. While the original entropy can capture the redundancy of words or of meanings, it is not able to capture whether a language is well shaped. Let's describe a simple lexicon composed of 8 words and 8 meanings. For ease of comprehension, the matrix M will be diagonally filled and meanings and words will be symbolized by numbers. A perfect language will thus have a matrix looking like table 1.

For the sake of simplification, we consider that we have the same value c for each unique word / meaning correspondence. If we compare the original measure of entropy H_{origins} (cf. equation 2.1) and the new calculus H_{new} (cf. equation 3.1), we find:
H_{origins} = -\sum_{ij} p_{ij} \log_2(p_{ij})
           = -8 \times \frac{c}{8 \times c} \log_2\left(\frac{c}{8 \times c}\right) - 56 \times \frac{0}{8 \times c} \log_2\left(\frac{0}{8 \times c}\right)
           = -\log_2\left(\frac{1}{8}\right)
           = \log_2(8)
           = 3 \times \log_2(2)
           = 3    (3.7)
H_{new} = -\sum_{ij} p_{ij} \log_2(p_{ij})
       = -8 \times \frac{c}{c} \log_2\left(\frac{c}{c}\right) - 56 \times \frac{0}{2 \times c} \log_2\left(\frac{0}{2 \times c}\right)
       = -8 \times \log_2(1)
       = 0    (3.8)
It is obvious that the original measure is unable to take into account the two-dimensional aspect of a lexicon formation. It indicates that the matrix contains uncertainty, while the new measure describes the matrix as perfectly weighted.

There exists another matrix configuration where H_{origins} gives confusing results (see table 2).
The two measures will then be:

H_{origins} = -\sum_{ij} p_{ij} \log_2(p_{ij})
           = -8 \times \frac{c}{8 \times c} \log_2\left(\frac{c}{8 \times c}\right) - 56 \times \frac{0}{8 \times c} \log_2\left(\frac{0}{8 \times c}\right)
           = -\log_2\left(\frac{1}{8}\right)
           = \log_2(8)
           = 3 \times \log_2(2)
           = 3    (3.9)
H_{new} = -\sum_{ij} p_{ij} \log_2(p_{ij})
       = -8 \times \frac{c}{8 \times c} \log_2\left(\frac{c}{8 \times c}\right) - 56 \times \frac{0}{c} \log_2\left(\frac{0}{c}\right)
       = -\log_2\left(\frac{1}{8}\right)
       = 3    (3.10)
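The comparison above can be reproduced with a short sketch; entropy_flat (our own helper) computes the original measure H_origins over the flattened counts, while H_new reuses the entropy_2d helper of Section 3.1:

import numpy as np

def entropy_flat(M):
    """Original entropy H_origins: the matrix is treated as a single flat source (eq. 2.1 and 2.2)."""
    M = np.asarray(M, dtype=float)
    p = M / M.sum()
    terms = np.where(p > 0, p * np.log2(np.where(p > 0, p, 1)), 0.0)
    return -terms.sum()

c = 5                                        # any positive count works; the result does not depend on c
perfect = c * np.eye(8, dtype=int)           # Table 1: each word has a unique meaning
confusing = np.zeros((8, 8), dtype=int)
confusing[:, 0] = c                          # Table 2: every word maps to meaning 1
print(entropy_flat(perfect), entropy_2d(perfect))      # H_origins = 3, H_new = 0 (eq. 3.7 and 3.8)
print(entropy_flat(confusing), entropy_2d(confusing))  # H_origins = 3, H_new = 3 (eq. 3.9 and 3.10)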
Equations 3.9 and 3.10 show the same result. H_{new} proves this matrix is confusing regarding language understanding, while H_{origins} indicates that this matrix is as confusing as the perfectly weighted one. We conclude that changing the p_{ij} calculus in entropy is the key to measuring correlations between two sources of information, since the new calculus takes into account the two-dimensional aspect of the data.

As entropy is now well understood in R^2+, we propose to introduce a new way to measure correlations between different sources of information thanks to entropy.
Table 2: Matrix of a fully confusing language.

Word (T_1) \ Meaning (T_2)   1  2  3  4  5  6  7  8
1                            c  0  0  0  0  0  0  0
2                            c  0  0  0  0  0  0  0
3                            c  0  0  0  0  0  0  0
4                            c  0  0  0  0  0  0  0
5                            c  0  0  0  0  0  0  0
6                            c  0  0  0  0  0  0  0
7                            c  0  0  0  0  0  0  0
8                            c  0  0  0  0  0  0  0
4 INTRODUCING A MEASURE
OF QUALITY
The maximal value of entropy is named H_{uniformity} and the minimal value is named H_{certainty}. Equation 3.4 proves that H_{certainty} always equals 0 in R^2+. We propose to introduce a new, simple calculus of the linear distance, called dH, between H_{calculated} and the ideal matrix:

dH = \frac{H_{calculated}}{H_{uniformity} - H_{certainty}} = \frac{H_{calculated}}{H_{uniformity}}    (4.1)

The maximal value of dH is therefore normalized to 1, since H_{certainty} \leq H_{calculated} \leq H_{uniformity}.

We can now evaluate the quality of an entropical matrix by introducing Q_H, a percentage of quality of the matrix for the measured entropy:

Q_H = (1 - dH) \times 100    (4.2)

Q_H is 0% when H_{calculated} equals H_{uniformity}. Conversely, Q_H is 100% when H_{calculated} equals H_{certainty}. Quality thus reflects the lack of diversity in the matrix and, as a consequence, indicates the strength of the correlation between information sources.
Results for table 1 and table 2 are respectively:
Q_H = \left(1 - \frac{H_{calculated}}{H_{uniformity}}\right) \times 100
    = \left(1 - \frac{0}{\frac{8 \times 8}{8 + 8 - 1} \log_2(8 + 8 - 1)}\right) \times 100
    = 100\%    (4.3)

Q_H = \left(1 - \frac{H_{calculated}}{H_{uniformity}}\right) \times 100
    = \left(1 - \frac{3}{\frac{8 \times 8}{8 + 8 - 1} \log_2(8 + 8 - 1)}\right) \times 100
    = \left(1 - \frac{3}{\frac{64}{15} \log_2(15)}\right) \times 100
    = \left(1 - \frac{45}{64 \log_2(15)}\right) \times 100
    \approx 82\%    (4.4)
We find that the perfect lexicon matrix has 100% quality while the confusing lexicon matrix has an 82% level of quality.
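These quality levels can be reproduced with a small sketch (the helper name quality is our own; perfect, confusing and entropy_2d come from the previous sketches):

import numpy as np

def quality(M):
    """Q_H = (1 - H_calculated / H_uniformity) * 100, following eq. 4.1 and 4.2."""
    n, m = np.asarray(M).shape
    h_uniformity = (n * m) / (n + m - 1) * np.log2(n + m - 1)   # eq. 3.3
    return (1 - entropy_2d(M) / h_uniformity) * 100

print(quality(perfect))     # 100   (eq. 4.3)
print(quality(confusing))   # about 82 (eq. 4.4)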
That last level of quality should arouse the researcher's curiosity and lead to a closer analysis of the confusing lexicon matrix. We can observe in the second lexicon matrix that all words have the same meaning: they are all synonyms. To measure homonymy and synonymy in a lexicon matrix, we only have to slightly adapt the p_{ij} calculus in H:

p_{ij} = \frac{M_{ij}}{\sum_l M_{lj}} \quad \text{for synonymy}

p_{ij} = \frac{M_{ij}}{\sum_k M_{ik}} \quad \text{for homonymy}

Synonymy is thus measured within each meaning column (several words for one meaning) and homonymy within each word row (several meanings for one word): studying synonymy or homonymy amounts to studying each dimension of the matrix separately.

We propose to use Q_H conversely, as a level of noise L_H. Using the fitted p_{ij}, L_H will be:
L_{H_{synonymy}} = dH \times 100
                = \frac{-8 \times \frac{c}{8 \times c} \log_2\left(\frac{c}{8 \times c}\right)}{8 \times \log_2(8)} \times 100
                = \frac{-\log_2\left(\frac{1}{8}\right)}{8 \times 3 \times \log_2(2)} \times 100
                = \frac{3 \times \log_2(2)}{24} \times 100
                = \frac{1}{8} \times 100
                = 12.5\%    (4.5)

L_{H_{homonymy}} = dH \times 100
                = \frac{-8 \times \frac{c}{c} \log_2\left(\frac{c}{c}\right)}{8 \times \log_2(8)} \times 100
                = \frac{0}{24} \times 100
                = 0\%    (4.6)
Analyzing the L_H measure reveals that it corresponds perfectly to the matrix, as there is one column filled with noisy values: synonyms. That column represents \frac{1}{8} of the whole matrix, or 12.5%. On the other hand, the level of homonymy is 0%, as it should be, since the matrix contains no word having several different meanings.
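These noise levels can be reproduced with a sketch of the per-dimension measures; the helper names are our own and the confusing matrix is the one built earlier:

import numpy as np

def _noise(M, axis):
    """L_H = dH * 100 with p_ij normalized along one dimension only (Section 4)."""
    M = np.asarray(M, dtype=float)
    sums = M.sum(axis=axis, keepdims=True)
    p = np.where(sums > 0, M / np.where(sums > 0, sums, 1), 0.0)      # fitted p_ij, empty lines give 0
    terms = np.where(p > 0, p * np.log2(np.where(p > 0, p, 1)), 0.0)
    h_uniformity = M.size / M.shape[axis] * np.log2(M.shape[axis])    # e.g. 8 * log2(8) = 24 for an 8x8 matrix
    return -terms.sum() / h_uniformity * 100

def noise_synonymy(M):   # normalize within each meaning column (sum over words)
    return _noise(M, axis=0)

def noise_homonymy(M):   # normalize within each word row (sum over meanings)
    return _noise(M, axis=1)

print(noise_synonymy(confusing))   # 12.5 (eq. 4.5)
print(noise_homonymy(confusing))   # 0    (eq. 4.6)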
Q_H and L_H offer two ways to analyze a correlation matrix containing different sources of information. We now propose to generalize our work to n sources of information, thus to matrices in R^n+.
5 ENTROPY IN R^n+
While working with n different sources of information to correlate, entropy will be extracted from an n-dimensional matrix. The p calculus is then modified as follows:

p_{a_1 \ldots a_n} = \frac{M_{a_1 \ldots a_n}}{\left(\sum_{j=1}^{n} \sum_{k=1}^{s_j} M_{a_1 \ldots k \ldots a_n}\right) - (n - 1) \times M_{a_1 \ldots a_n}}    (5.1)

where s_j is the size of the j-th dimension of the matrix and a_i is the index in the matrix along the i-th dimension (in the j-th term of the outer sum, the index k runs along the j-th dimension). The entropy calculus remains the same:
H_{calculated} = -\sum_{a_1 \ldots a_n} p_{a_1 \ldots a_n} \times \log_2(p_{a_1 \ldots a_n})    (5.2)
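A sketch of the n-dimensional measure, assuming NumPy and our own helper name entropy_nd:

import numpy as np

def entropy_nd(M):
    """Cross-source entropy of an n-dimensional count array M, following eq. 5.1 and 5.2."""
    M = np.asarray(M, dtype=float)
    n = M.ndim
    # Denominator of eq. 5.1: sum of the n axis-wise sums through each cell, minus (n-1) times the cell.
    denom = sum(M.sum(axis=j, keepdims=True) for j in range(n)) - (n - 1) * M
    p = np.where(denom > 0, M / np.where(denom > 0, denom, 1), 0.0)   # eq. 5.1
    terms = np.where(p > 0, p * np.log2(np.where(p > 0, p, 1)), 0.0)  # 0*log2(0) taken as 0
    return -terms.sum()                                               # eq. 5.2

For a two-dimensional array, this reduces to the entropy_2d sketch of Section 3.1.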
Maximum entropy is reached when \forall a_1, \ldots, a_n, M_{a_1, \ldots, a_n} = c. To simplify the calculus, we note that a matrix filled with the same value c \in \mathbb{R} brings the same information quality as a matrix filled with 1, i.e. with every single value divided by c. Every p_{a_1 \ldots a_n} then equals \frac{1}{s_1 + \ldots + s_n - (n - 1)}, so that:

H_{uniformity} = -\sum_{a_1, \ldots, a_n} \frac{1}{s_1 + \ldots + s_n - (n - 1)} \times \log_2\left(\frac{1}{s_1 + \ldots + s_n - (n - 1)}\right)
              = -(s_1 \times \ldots \times s_n) \times \frac{1}{s_1 + \ldots + s_n - (n - 1)} \times \log_2\left(\frac{1}{s_1 + \ldots + s_n - (n - 1)}\right)
              = -\frac{s_1 \times \ldots \times s_n}{s_1 + \ldots + s_n - (n - 1)} \times (\log_2(1) - \log_2(s_1 + \ldots + s_n - (n - 1)))
              = \frac{s_1 \times \ldots \times s_n}{s_1 + \ldots + s_n - (n - 1)} \times \log_2(s_1 + \ldots + s_n - (n - 1))    (5.3)
Entropy reaches its minimum when the biggest identity square pattern is represented in the whole matrix, i.e. when we have p_{a_1 \ldots a_n} = 1 for each unique "a" position. If we set q = \min(s_1, \ldots, s_n), in order to get the biggest square identity pattern, the calculus of minimal entropy becomes:

H_{certainty} = -q \times 1 \times \log_2(1) - (s_1 \times \ldots \times s_n - q) \times 0 \times \log_2(0)
             = -q \times \log_2(1)
             = 0    (5.4)
The dH measure remains the same, as do the calculi of Q_H and L_H. L_H would be extended to find out why Q_H does not reach 100%. The principle for adapting L_H to an n-dimensional matrix is still the same: fix one or more dimensions in the p_{a_1 \ldots a_n} calculus and then give a meaning to the L_H value, as we did in Section 4. This last assertion concludes our work.
6 CONCLUSION AND FURTHER
WORK
Entropy has been used for decades in computer science and beyond. While it offers a clear evaluation of
the quality of the transmission of a piece of information, until now it was not used to correlate different sources of information in a simple way. The modified entropy proposed here offers a clear and efficient measure to correlate interactions between agents in multi-agent systems.

The next step is to implement an algorithm computing H over an n-dimensional matrix with a reasonable complexity.
REFERENCES
Shannon, C. E. and Weaver, W. (1949). The Mathematical Theory of Communication. University of Illinois Press, Urbana, Ill.

MacLennan, B. J. and Burghardt, G. M. (1994). Synthetic ethology and the evolution of cooperative communication. Adaptive Behavior, 2(2), pp. 161-188. MIT Press.

Enee, G. and Escazut, C. (2002). A Minimal Model of Communication for a Multi-Agent Classifier System. In Advances in Learning Classifier Systems, LNAI 2321 (Lecture Notes in Artificial Intelligence), Lanzi, P. L., Stolzmann, W., and Wilson, S. W. (Eds.). Springer-Verlag, Berlin Heidelberg.