Table 1: The training time and test accuracy of a baseline RN and three HRN architectures at 30 and 200 epochs.

Architecture     Training time (30 epochs)   Test accuracy (30 epochs)   Training time (200 epochs)   Test accuracy (200 epochs)
RN (baseline)    2,080 s                     90.3%                       13,349 s                     92.7%
SCHRN               81 s                     88.9%                          549 s                     93.3%
UCHRN               67 s                     90.4%                          449 s                     94.1%
FDHRN               90 s                     87.8%                          545 s                     89.6%
The intuition behind this approach is that instead
of pre-assigning each object to a definite category, we
treat each object as belonging, say, 95% to category
one, 3% to category five, and so on. This transition
from hard category assignments, akin to Aristotelian
logic, to soft assignments, akin to fuzzy logic, both
allows for differentiability and increases the fitting
power of the network. For small m, when many objects
would fall "on the fence" between two classes, the
increased fitting power yields greater flexibility and
improved performance at the cost of training more
parameters.
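The soft assignment described above can be sketched as a softmax over per-category scores. This is a minimal illustration of the idea, not the paper's implementation; the function name and score values are ours.

```python
import numpy as np

def soft_assign(scores):
    """Convert per-category scores into a soft (fuzzy) membership
    distribution via a softmax; every object belongs fractionally
    to every category, which keeps the mapping differentiable."""
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# One object whose scores favor category 0 but leave some mass elsewhere
scores = np.array([3.0, -0.5, 0.1])
membership = soft_assign(scores)
print(membership)  # a distribution over 3 categories, summing to 1
```

A hard assignment would instead be `scores.argmax()`, which is not differentiable; the softmax is the standard smooth relaxation.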
An important feature of the FDHRN architecture
is the use of trainable embeddings for the object cat-
egories. Imagine a zoo at which we ask only questions
about features that have little to do with animal
taxonomy, such as color or size ("do brown animals
tend to be bigger than black animals?"). If we were
to group objects by more domain-based categories,
such as "amphibian", "mammal", etc., there would
be a mismatch between the relational questions we ask
and the given categories, which would decrease the
learning efficiency. Trainable embeddings allow the
FDHRN to derive categories that are more specific to
the reasoning task at hand.
Although advantageous, inferring categories from
data can lead to increased model complexity and
make the network more prone to overfitting.
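One way such trainable category embeddings could induce task-specific soft categories is by scoring each object against a learned embedding per category. The following NumPy sketch is illustrative only; the dot-product similarity, shapes, and names are our assumptions, not the FDHRN's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_objects, feat_dim, m_categories = 25, 16, 8

# Object feature vectors (inputs) and category embeddings; in the
# FDHRN setting the latter would be trained by gradient descent.
objects = rng.normal(size=(n_objects, feat_dim))
cat_embed = rng.normal(size=(m_categories, feat_dim))

def memberships(objects, cat_embed):
    """Soft category assignments from object-category similarity."""
    logits = objects @ cat_embed.T               # (n, m) similarity scores
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

M = memberships(objects, cat_embed)
print(M.shape)  # (25, 8): one membership distribution per object
```

Because the categories are learned rather than fixed, gradients flowing into `cat_embed` can reshape the categories to fit the relational questions, which is also where the extra parameters and overfitting risk come from.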
4 RESULTS
4.1 The Zoo Animals Dataset
We used a dataset containing information about
101 animals with various attributes for each animal
(Learning, 2020).
We tested all three aforementioned HRN architec-
tures in tasks of relational queries on subsets of the
101 animals. An example query might be: "Among
these animals, are those with lungs more likely to be
aquatic than those with fins?" The possible answers
(balanced to occur with about the same frequency) are
"yes," "no," and "about equally likely."
Each data point consists of a single task: a subset
of animals and a query to which the network gener-
ates an answer. During training, the correct answer is
also provided as a label. The models are scored based
on the percentage of correctly performed tasks, with
a random guess baseline achieving around 33% accu-
racy.
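A ground-truth label for such a query can be computed by comparing conditional frequencies over the animal subset. This is a sketch under our own assumptions: the attribute names, the dictionary representation, and the tolerance used for "about equally likely" are illustrative, not taken from the dataset's actual labeling code.

```python
def answer_query(animals, cond_a, cond_b, target, tol=0.05):
    """Among `animals` (dicts of binary attributes), compare
    P(target | cond_a) with P(target | cond_b)."""
    def rate(cond):
        group = [a for a in animals if a[cond]]
        return sum(a[target] for a in group) / len(group) if group else 0.0
    pa, pb = rate(cond_a), rate(cond_b)
    if abs(pa - pb) <= tol:
        return "about equally likely"
    return "yes" if pa > pb else "no"

# Toy subset: are animals with lungs more likely to be aquatic
# than animals with fins?
zoo = [
    {"lungs": 1, "fins": 0, "aquatic": 0},
    {"lungs": 1, "fins": 1, "aquatic": 1},
    {"lungs": 0, "fins": 1, "aquatic": 1},
    {"lungs": 1, "fins": 0, "aquatic": 0},
]
print(answer_query(zoo, "lungs", "fins", "aquatic"))  # "no"
```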
4.2 Testing Specifications
We use a version of the Animals dataset that contains
about 19,000 questions on different subsets of
animals. Half the questions are used for training while
the other half is reserved for testing. Each subset of
animals consists of 25 animals. The choice of 25 is
arbitrary, and purely made to ensure that the baseline
RN completes training in reasonable time. Even so,
the RN model takes several hours to train while the
three HRN architectures take a few minutes each.
All models have the same fully-connected layer
depth and width, as well as the same batch size (64)
and learning rate (0.0005). The tests ran in Python 3
on macOS with GPU support.
4.3 Comparison of Performance
We evaluated four models on the Animals dataset:
the original RN used as a baseline, an SCHRN with
7 categories, a UCHRN with 8 categories, and the
FDHRN with 8 categories. We compared both the
training times and the test accuracies achieved. The results
are summarized in Table 1, with values averaged over
several runs and rounded. All models converge by
200 epochs.
Even for merely 25 objects, the training time of
the baseline RN is one or two orders of magnitude
higher than that of the HRN models. This is expected
as, in general, the time complexity of the HRNs scales
with m^2 instead of n^2, where m is the number of
categories and n is the number of objects.
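The gap can be made concrete by counting the pairwise comparisons each model performs, assuming both consider all ordered pairs; the figures 25 and 8 match the object-set size and category count used in our experiments.

```python
def pairs(k):
    """Number of ordered pairs among k items: the O(k^2) term."""
    return k * k

n, m = 25, 8  # objects vs. categories
print(pairs(n), pairs(m))  # 625 vs. 64, roughly a 10x reduction
```

Since training time is dominated by this quadratic term, shrinking the quadratic factor from n to m accounts for most of the speedup in Table 1.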
We see that, out of all the models tested, the
UCHRN achieves superior accuracy at the end of
training and also takes the shortest time to finish train-
ing. This is likely because its assumptions are best
satisfied by the Animals dataset (the questions were
posed on all animal attributes). The SCHRN performs
worse, as the fixed categories given by the dataset
(mammal, bug, etc.) are not particularly suited for the
Hierarchical Relation Networks: Exploiting Categorical Structure in Neural Relational Reasoning