
H(Y|X) only considers the information flow from X to Y. Let us circle back to the stock market scenario. If H(Y|X) = 0, then company C follows the global trend to the highest degree. More generally, the lower the value of H(Y|X), the more strongly C depends on the global trend.
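For reference, the two extreme cases above follow from the standard definition of conditional entropy (written here in bits) and its bounds:

```latex
H(Y \mid X) = -\sum_{x}\sum_{y} p(x, y)\,\log_2 p(y \mid x),
\qquad 0 \le H(Y \mid X) \le H(Y).
```

The lower bound is attained exactly when Y is (almost surely) a function of X, and the upper bound exactly when X and Y are independent.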
In practice, calculating the conditional entropy exactly is infeasible, as it requires complete knowledge of the joint and marginal probability distributions of X and Y. Hence, it is often estimated from observations of X and Y (Pham, 2004). Such data-driven estimators are often sophisticated, yet they can suffer from significant estimation bias (Beirlant et al., 1997).
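To illustrate what a data-driven estimator looks like, the simplest choice is the plug-in (maximum-likelihood) estimator, which replaces the unknown probabilities with empirical frequencies. The Python sketch below is our own illustration, not the estimator of Pham (2004); it assumes discrete, hashable observations and reports the result in bits. The function name is hypothetical.

```python
import math
from collections import Counter

def plugin_conditional_entropy(pairs):
    """Plug-in estimate of H(Y|X) in bits from a list of (x, y) observations.

    Empirical frequencies stand in for p(x, y) and p(y | x).
    """
    n = len(pairs)
    joint = Counter(pairs)                     # counts of each (x, y)
    marginal_x = Counter(x for x, _ in pairs)  # counts of each x
    h = 0.0
    for (x, y), c_xy in joint.items():
        p_xy = c_xy / n                        # empirical p(x, y)
        p_y_given_x = c_xy / marginal_x[x]     # empirical p(y | x)
        h -= p_xy * math.log2(p_y_given_x)
    return h

# Y fully determined by X: no remaining uncertainty.
print(plugin_conditional_entropy([(0, 0), (1, 1), (0, 0), (1, 1)]))  # 0.0
# Y independent of X and balanced: one full bit remains.
print(plugin_conditional_entropy([(0, 0), (0, 1), (1, 0), (1, 1)]))  # 1.0
```

Estimates of this kind tend to underestimate H(Y|X) on average, most noticeably when X takes many distinct values relative to the number of observations.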
In the field of computer security, a side-channel attack is an attack based on extra information that can be gathered because of the fundamental way in which a computer algorithm is implemented (Golder et al., 2019). To prevent side-channel attacks, security analysts must determine how much data, once collected, suffices to compromise a computer. Let X be the random variable associated with the gathered data, i.e., the “side-channel information”. Let Y be the two-valued random variable that equals 1 if security is compromised and 0 otherwise. Then H(Y|X) quantifies how much uncertainty about Y remains once X is observed, and hence how useful the collected side-channel information is for accurately determining Y. Again, calculating H(Y|X) exactly is infeasible. Machine learning has recently been used to sidestep this difficulty (Drees et al., 2021; Gupta et al., 2022). However, these works are preliminary and empirical, without formal backing. Moreover, they do not attempt to estimate conditional entropy or draw information-theoretic conclusions.
Our Contributions. We present an easy-to-implement, machine-learning-based method to estimate the conditional entropy H(Y|X), where Y is a binary random variable and X is discrete-valued. This estimate is used to measure the degree of dependency of Y on X. Given observations of X and Y, we discuss how to transform the problem of estimating the conditional entropy into a supervised learning problem. We then use the prediction accuracy of the supervised learning algorithm to estimate the conditional entropy. We present sufficient conditions on the accuracy of the learning algorithm that guarantee information flow from X to Y. We support our ideas through formal arguments and experiments on real datasets.
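One standard way to turn prediction accuracy into a statement about H(Y|X) is Fano's inequality; we use it here only as an illustration and do not claim it is the exact condition derived in this paper. For binary Y, any predictor f(X) with error probability P_e = P(f(X) ≠ Y) satisfies H(Y|X) ≤ h_b(P_e), where h_b is the binary entropy. Because H(Y) = h_b(p1) with p1 = P(Y = 1), a bound strictly below H(Y) certifies information flow from X to Y. The function names below are hypothetical.

```python
import math

def binary_entropy(p):
    """Binary entropy h_b(p) in bits, using the convention 0 * log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def fano_upper_bound(accuracy):
    """Fano's inequality for binary Y: H(Y|X) <= h_b(P_e), with P_e = 1 - accuracy."""
    return binary_entropy(1.0 - accuracy)

def implies_information_flow(accuracy, p1):
    """Sufficient condition: the Fano bound on H(Y|X) lies strictly below H(Y) = h_b(p1)."""
    return fano_upper_bound(accuracy) < binary_entropy(p1)

# With p1 = 0.3 the majority-class baseline is 0.7:
print(implies_information_flow(0.8, 0.3))  # True  (beats the baseline)
print(implies_information_flow(0.6, 0.3))  # False (does not beat the baseline)
```

Since h_b is increasing on [0, 1/2], for a classifier with P_e ≤ 1/2 this condition reduces to accuracy > max(p1, 1 - p1), i.e., beating the majority-class baseline. The accuracy here is the true one; a held-out estimate needs a confidence margin before the conclusion is justified.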
1.1 Conditional Entropy in Sociology: A Data-Driven Approach
Companies and organizations aim to ensure that their male and female employees are equally represented and that their policies are not skewed towards either gender. Conditional entropy plays an essential role when analyzing the current state of affairs with respect to gender equality in these organizations. Here, we present an ML-based approach for such an analysis. First, the employee database is used to create a supervised learning (classification) dataset. There is one input instance corresponding to each employee. It contains only gender-neutral information such as date of birth, salary and position. Gender-revealing information, such as name and gender, is excluded. The input instances are labeled by gender: 0 for female and 1 for male. We are therefore in the setting of binary classification. We use X to represent the random variable associated with the input, and Y to represent the class random variable.
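The dataset construction described above can be sketched as follows. The column names (`name`, `gender`, `salary`, `position`) and the helper name are hypothetical; a real employee table will have its own schema and may need more fields dropped. Each input is stored as a sorted tuple of (field, value) items so that it is hashable and can be treated as one discrete value of X.

```python
# Hypothetical column names; adapt to the actual employee table.
GENDER_REVEALING = {"name", "gender"}
LABELS = {"female": 0, "male": 1}  # 0 for female, 1 for male, as in the text

def to_classification_dataset(records):
    """Turn employee records (dicts) into (x, y) pairs for binary classification.

    x keeps only gender-neutral fields; y is the gender label.
    """
    pairs = []
    for rec in records:
        x = tuple(sorted((k, v) for k, v in rec.items() if k not in GENDER_REVEALING))
        y = LABELS[rec["gender"]]  # KeyError for values outside the binary setting
        pairs.append((x, y))
    return pairs

records = [
    {"name": "A", "gender": "female", "salary": 50000, "position": "engineer"},
    {"name": "B", "gender": "male", "salary": 52000, "position": "engineer"},
]
print(to_classification_dataset(records)[0])
# ((('position', 'engineer'), ('salary', 50000)), 0)
```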
If gender does not play a role in hiring and subsequent career development, then one cannot reliably predict the gender Y merely from the gender-neutral information X, such as title and salary. In the parlance of information theory, H(Y|X) = H(Y). On the other hand, if there is gender bias, then H(Y|X) < H(Y). In this paper, we call a dataset gender biased, i.e., one in which the genders are unequally represented (gender plays a role in hiring, career development, etc.), when H(Y|X) < H(Y). As explained earlier, we present a supervised learning approach to estimating H(Y|X) and checking whether H(Y|X) < H(Y).
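A minimal end-to-end sketch of such a check is shown below, under explicit assumptions that are ours rather than the paper's: the class proportion p1 = P(Y = 1) is known (e.g., computed from the whole employee table), the training and test sets are disjoint samples, and the classifier is a deliberately simple hypothetical stand-in (majority label per distinct input) that any other classifier could replace. A Hoeffding margin turns the held-out accuracy into a lower confidence bound; if that bound exceeds max(p1, 1 - p1), then with probability at least 1 - delta the true error is below min(p1, 1 - p1), and Fano's inequality gives H(Y|X) ≤ h_b(P_e) < H(Y).

```python
import math
from collections import Counter, defaultdict

def fit_majority_per_value(train):
    """Hypothetical stand-in classifier: predict the most frequent label seen for each x."""
    by_x = defaultdict(Counter)
    for x, y in train:
        by_x[x][y] += 1
    fallback = Counter(y for _, y in train).most_common(1)[0][0]  # for unseen x
    table = {x: counts.most_common(1)[0][0] for x, counts in by_x.items()}
    return lambda x: table.get(x, fallback)

def accuracy(predict, test):
    """Fraction of held-out (x, y) pairs predicted correctly."""
    return sum(predict(x) == y for x, y in test) / len(test)

def bias_detected(train, test, p1, delta=0.05):
    """Flag H(Y|X) < H(Y) when a Hoeffding lower confidence bound on the held-out
    accuracy exceeds the majority-class baseline max(p1, 1 - p1)."""
    acc = accuracy(fit_majority_per_value(train), test)
    margin = math.sqrt(math.log(1 / delta) / (2 * len(test)))
    return acc - margin > max(p1, 1 - p1)

# Label determined by the input: the check fires.
train = [(x, x % 2) for x in range(10)] * 20
test = [(x, x % 2) for x in range(10)] * 100
print(bias_detected(train, test, p1=0.5))  # True
```

The condition is only sufficient: with a small test set the margin dominates and the check stays silent, so a negative result is not evidence that H(Y|X) = H(Y).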
The gender inequality problem is well studied in the literature, see, e.g., (Heiberger, 2022). ML tools have also previously been used to study problems in sociology empirically, e.g., (Zajko, 2022; Molina and Garip, 2019). To the best of our knowledge, this is the first work in the literature in which ML is used within the framework of statistical inference to answer a sociological question, in particular a gender-bias question. Further, the framework is backed by formal theory.
Here is another scenario where our methodology is useful. Suppose that a vast region is flooded. After emergency relief operations, the responsible government committee must formulate a plan to allocate resources and funds for the long-term rebuilding process. For simplicity, suppose that the region can be divided into neighborhoods, each consisting of groups of houses, one or more community centers, commercial buildings, schools and hospitals. The committee must allocate resources at the “neighborhood level”. Resource allocation must be fair and equitable, proportional to the losses incurred by each neighborhood. Further, the associated timelines, e.g., for releasing funds, must facilitate immediate and equitable relief. The plan must not be influenced by the political orientation, racial identity or affluence of any given neighborhood.
Information Theoretic Deductions Using Machine Learning with an Application in Sociology