Occupant Activity Recognition in IoT-Enabled Buildings:
A Temporal HTN Planning Approach
Ilche Georgievski
Service Computing Department, IAAS, University of Stuttgart, Universitaetsstrasse 38, Stuttgart, Germany
https://orcid.org/0000-0001-6745-0063
Keywords:
Activity Recognition, HTN Planning, Plan and Goal Recognition, Smart Buildings, Internet of Things.
Abstract:
Given that people spend most of their time indoors, it is imperative that buildings maintain optimal well-being
for occupants. To achieve this, research must prioritise occupants over buildings themselves. IoT-enabled
buildings can improve quality of life by understanding and responding to occupants' behaviour. This requires
recognising what occupants are doing based on IoT data, particularly by considering the objects they use in
specific building areas. Situated within the realm of plan and goal recognition as planning, we propose a
novel knowledge-engineering approach to occupant activity recognition leveraging temporal HTN planning.
Our approach consists of two primary processes: generating problem instances from IoT data and engineering
HTN domain models for activity recognition. The first ensures the representation of IoT data using planning
constructs, while the second integrates knowledge about occupant activities into HTN domain models. To
support our approach, we provide two HTN domain models tailored for workspaces and homes. Experimental
validation with the latter domain and a real-world dataset shows that the quality of our computed solutions
surpasses that of baseline data-driven approaches and is comparable to more advanced, hybrid approaches.
1 INTRODUCTION
The need for indoor environments that prioritise occu-
pant well-being has become critical, as people spend
the majority of their time indoors (Klepeis et al.,
2001; Matz et al., 2015). This demand is further
intensified by factors such as the growing emphasis
on sustainability like carbon awareness, demographic
shifts like an ageing population, and the rise of hy-
brid working models. In response, buildings, such
as homes and workspaces, are being equipped with
advanced technologies, including Internet of Things
(IoT) devices and Artificial Intelligence (AI) func-
tionality. These innovations enable real-time moni-
toring and dynamic control of buildings. However,
to truly enhance occupant well-being, the focus must
shift from merely optimising the buildings themselves
to supporting the occupants (Awada et al., 2021). This
also includes understanding occupants’ activities as
informed by buildings’ IoT data, introducing the chal-
lenge of occupant activity recognition.
Occupant activity recognition is essential for cre-
ating buildings responsive to the needs of occu-
pants (Nguyen and Aiello, 2013). However, existing approaches to this problem have significant lim-
itations. Data-driven methods, while powerful, de-
mand big data and often struggle with generalisation
to unseen scenarios (Chen et al., 2021). Conversely,
knowledge-based techniques, such as ontological rea-
soning, involve complex modelling (Uschold and
Gruninger, 1996), may lack adaptability and scalabil-
ity (Sirin et al., 2007), and have difficulty with tem-
poral reasoning (Riboni et al., 2011). Despite the va-
riety of available methods, a gap persists in having
approaches that generalise across buildings and offer
structured and adaptable integration of activities.
One promising avenue for addressing this chal-
lenge lies in the area of Plan and Goal Recognition
(PGR) (Sukthankar et al., 2014; Van-Horenbeke and
Peer, 2021). PGR involves inferring an agent’s plans
and goals based on observed actions, typically rely-
ing on a plan library or domain theory. Notably, the
approach known as PGR as planning (Ramírez and
Geffner, 2009) has shown potential by leveraging do-
main theory and AI planners to solve PGR problems.
This approach offers high expressiveness, support for
adaptable knowledge representations, the use of off-
the-shelf tools, and the ability to operate without big data.
Building on the strengths of PGR as planning, we
propose an approach to occupant activity recognition
based on PGR as Hierarchical Task Network (HTN)
planning (Höller et al., 2018). Unlike PGR as plan-
ning that uses action-based domain theory, our ap-
proach employs HTNs as a domain theory to capture
occupant activity recognition at multiple levels of ab-
straction in a structured fashion. HTN planning is also
known for its computational efficiency and scalability,
making it well-suited for the challenges of buildings.
Specifically, we propose a novel knowledge-
engineering approach to framing occupant activity
recognition as a temporal HTN planning problem.
This approach consists of two key processes. One in-
volves developing HTN domain models that encapsu-
late a hierarchical structure of tasks that describe oc-
cupant activities via involved objects and associated
IoT devices, and general mechanisms for handling
IoT data. The other outlines the integration of IoT
data as temporal facts in the initial state of problem
instances. The solution to this planning problem is a
set of activities, with start and end times, recognised
across different locations within a given building.
Our main contributions are as follows:
• We propose a novel framing of occupant activ-
ity recognition as a temporal HTN planning prob-
lem. This framing focuses on addressing the com-
plexity of building environments, including hi-
erarchical relationships, temporal properties, and
IoT data integration.
• We introduce a knowledge-engineering approach
that enables the application of HTN planning for
occupant activity recognition. This approach en-
sures guided transformation of domain knowledge
into a domain model and clear definition of goals
and states within problem instances, essential for
the success of planning (Bhatnagar et al., 2022).
These problem-formulation challenges are partic-
ularly pronounced in realistic applications, where
dynamic conditions require solutions to be gener-
ated in real time, as opposed to being derived from
simplified or pre-generated benchmarks.
• Our work extends the application of PGR as
HTN planning beyond its previous contexts, e.g.,
driver and office activity recognition (Fernandez-
Olivares and Perez, 2020; Georgievski, 2022), to
varied buildings. This also helps close the generalisation gap in the literature, making our work
unique in its focus on cross-domain application.
• By following our approach, we develop HTN do-
main models designed explicitly for occupant ac-
tivity recognition in two common building types:
workspaces and homes. Their design allows for
application in various settings, from case studies
to large-scale environments, offering scalability
and adaptability. These domain models can serve
as a basis for further research and development.
• We show the effectiveness of our approach using the
Homes domain, a real-world dataset, and an exist-
ing HTN planner. Our preliminary results show
improvements over baseline data-driven meth-
ods and comparable performance to hybrid ap-
proaches, highlighting the potential of our work.
The rest of the paper is organised as follows.
Section 2 describes the occupant activity recogni-
tion problem. Section 3 offers a brief overview of
temporal HTN planning. Section 4 introduces our
knowledge-engineering approach, and Section 5 gives
insights into the two domain models and the prelimi-
nary evaluation. Sections 6 and 7 present related work
and conclusions, respectively.
2 OCCUPANT ACTIVITY
RECOGNITION
We outline key concepts for occupant activity recog-
nition, focusing on their relevance to engineering
domain models and problem instances. While this
overview covers essential aspects, formalising occu-
pant activity recognition and its correspondence to
HTN planning is beyond the scope of this paper and can be found
elsewhere, e.g., (Georgievski, 2022).
We consider buildings equipped with IoT sensors
for real-time monitoring and data collection. Sen-
sors report readings whenever changes occur in the
observed space or at regular intervals via a pub-
lish/subscribe mechanism (Al-Masri et al., 2020).
The readings can be binary data (e.g., on/off states)
or numeric data (e.g., humidity values). To determine
whether numeric sensors are active or not, threshold
values are employed (Nguyen et al., 2014). Thresh-
olds can be absolute (the minimum detectable value),
or dynamic (varying based on spatial and usage char-
acteristics).
Buildings are organised into locations, each con-
taining IoT devices. Occupants perform various ac-
tivities within these locations, with an activity area
being a logical space where specific activities oc-
cur (Curry, 1996). Hence, spatial relationships emerge,
linking sensors or activities to locations. The type of
activity is often dictated by occupants’ location. That
is, the type of location and the objects with which occupants interact, explicitly and implicitly, can constrain their activities (Liao et al., 2005; Wu et al.,
2007). When these objects are associated with sen-
sors, each type of occupant activity can be defined in
terms of the relevant sensors deployed in a specific
activity area. For example, the definition of “Work-
ing with a computer” might involve sensors on a com-
puter, keyboard, mouse, and chair.
An occupant's mere presence in a location, with or without engaging in anything specific, represents the most fundamental occupant activity, called "Presence". In particular, performing a specific activity in some location also entails the "Presence" activity. For example, the activity "Working" performed in some location also requires being present.
“Working” can be further refined into sub-activities,
such as “Working with a computer” and “Reading”,
revealing hierarchical relationships among the activ-
ities. Conversely, understanding when people do not
occupy some location holds significance (e.g., energy
saving). We can interpret this situation as a com-
plementary type of activity called "Absence", which can also indicate that no activity takes place (Nguyen et al., 2014). "Absence" exists at the same hierarchical level as "Presence", indicating that an activity in a location can be either presence-based or "Absence".
Occupant activity recognition entails processing
temporal readings of sensors associated with ob-
jects that occupants interact with while performing
presence-based activities in specific locations at spe-
cific times. Recognised activities must conform to
a predefined set of relevant objects deployed in spe-
cific locations and readings of sensors associated with
those objects at specific time points. Thus, the occu-
pant activity recognition problem can be defined as
follows: Given a building with locations and sensors
characterised by thresholds and spatial properties,
and a set of possible occupant activities with hierar-
chical relationships and described by sensors linked
to relevant objects, determine the set of activities oc-
curring in all activity areas consistent with the prede-
fined activities and temporal readings.
3 BACKGROUND
We approach occupant activity recognition as state-
based HTN planning (Georgievski and Aiello, 2015).
A central construct is a task network, which is a hi-
erarchy of primitive and compound tasks. Primitive
tasks are executable actions, whereas compound tasks encapsulate domain knowledge that extends beyond actions. Compound tasks are decomposable into
subtasks using methods. An objective is modelled
as an initial task network that should be decomposed
starting from an initial state using the task hierarchy.
The solution is a course of action executable in the
initial state.
Building on this standard concept, temporal HTN
planning integrates time into the state and tasks. We
focus on this integration as captured by the Hierar-
chical Domain Definition Language (HPDL), as de-
signed for the SIADEX planner (Castillo et al., 2006).
HPDL builds upon the Planning Domain Definition
Language (PDDL) (McDermott et al., 1998), where
primitive and compound tasks along with their de-
composition methods are directly mapped into HPDL
elements. Beyond the standard HTN constructs and
derived predicates, HPDL incorporates a temporal di-
mension that aligns with PDDL 2.1 level 3 (e.g., tem-
poral facts). HPDL also introduces several special
features, which are instrumental in our knowledge-
engineering approach. These include the modelling of temporally constrained tasks, the incorporation of inference tasks, and the use of the bind predicate.
Temporal facts are timed initial literals or facts
with time points. For example, the fact
(timestamp "30/11/2023 09:31:32" 0) en-
codes a sensor reading’s timestamp with index 0.
A temporally constrained task is a task in a method's subtasks that comes with temporal constraints over its start and end points and its duration.
For this, three special variables are used, ?start,
?end and ?dur, and a logical expression with re-
lational operators for the constraints.
An inference task is a task that can appear in a method's subtasks to add facts to or remove facts from the current state without an explicit action invocation, or to capture information from the current state. This task is of the form (:inline <precondition> <effect>), where the precondition and effect are the usual logical expressions as defined for an action's preconditions and effects.
The bind predicate is used to bind a variable
by evaluating an expression. It is of the form
(bind <var> <expression>).
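To illustrate, the following fragment sketches how these constructs might appear inside a method's task network; the task and predicate names are illustrative assumptions rather than excerpts from our domain models:

    ;; inference task: capture the current timestamp and record an activity start
    (:inline (and (idx ?i) (timestamp ?t ?i))
             (startAct ?loc ?t))
    ;; temporally constrained subtask: it must start exactly at ?t
    ((and (= ?start ?t) (>= ?dur 0))
     (recognisedActivity ?loc ?a))
    ;; bind: evaluate an expression and assign the result to a variable
    (:inline (bind ?next (+ ?i 1)) ())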
4 THE APPROACH
Figure 1 illustrates our approach to occupant activ-
ity recognition, which consists of two main processes:
problem instance generation from sensor data and en-
gineering domain models for activity recognition. In
the first process, problem instances in HPDL are gen-
erated by converting raw sensor readings into indexed
temporal observations, which are then transformed
into indexed facts for the initial state of an HPDL
problem instance. The concept of indexing draws in-
spiration from (Fernandez-Olivares and Perez, 2020).
Figure 1: Flowchart representing the approach for occupant activity recognition as HTN planning.
The second process involves acquiring knowledge
about relevant activities and engineering it into an
HPDL domain model. This includes insights about
activities, their properties, and involved objects from
prior experience, existing activity models, and litera-
ture. We represent activities as primitive tasks. When
a specific activity occurs at some location, the corre-
sponding primitive task is added to the plan. Since
activities can last for specific periods, the plan’s tasks
are annotated with start and end times. This requires
aligning the temporal points of primitive tasks with
sensor timestamps (C1) and ensuring no temporal
conflicts between tasks in the plan, allowing for par-
allel activities at different locations (C2). The HTN
domain model is responsible for achieving condition
C1, while plan generation ensures C2.
These processes provide the inputs for the plan
generation, which is executed by a temporal HTN
planner, resulting in a plan of temporally annotated
activities recognised in locations within the building.
4.1 Indexed Facts Generation
We assume that raw sensor readings are gathered via
a standard IoT publish/subscribe mechanism, where
sensors send their readings to topics managed by a
message broker, see Figure 2. These readings are
stored in the order they are received, allowing for re-
trieval in the same sequence.
Each sensor reading includes the value and times-
tamp, which are translated into an indexed temporal
observation. An indexed temporal observation is a
tuple (sID, sVal, rTimestamp, idx), where sID is the
sensor’s unique identifier, sVal is the sensor reading
value, rTimestamp is the reading’s timestamp, and idx
is the reading's position in the sequence.

Figure 2: Process of transforming sensor readings to indexed temporal observations and generating HPDL facts.
Observing the correct order of sensor readings is
crucial for accurate activity recognition. If readings
are processed out of order, it can lead to incorrect in-
ferences about activities. For example, if the plan-
ner first processes that the PC is turned on at time t and then that the monitor is on at time t', it might incorrectly conclude that the occupant is working on the PC, even if the PC was turned off between t and t'. Indexing en-
enabling accurate reasoning.
To represent an indexed temporal observation,
we use standard and temporal facts in HPDL. Fig-
ure 2 illustrates how sensor readings are translated
into HPDL facts. For example, at time 30/11/2023
09:31:52, sensors s2 and s3 report readings simul-
taneously on topic y, so their observations share the
same timestamp and index. These are then translated
into standard predicates for each reading and a tem-
poral fact linking the timestamp to the reading index.
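For the simultaneous readings just mentioned, the generated fragment of the initial state would read as follows (a sketch using the predicates from Figure 2):

    (reading s2 250 1)
    (reading s3 125 1)
    (timestamp "30/11/2023 09:31:52" 1)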
4.2 Engineering Domain Models
We now outline the general process for engineering
HPDL domain models capable of performing occu-
pant activity recognition. This includes the represen-
tation of relevant objects, the modelling of mecha-
nisms for manipulating sensor readings, and the rep-
resentation of occupant activities.
4.2.1 Representing Sensors, Locations, and
Activities
The main types of objects within living spaces related
to occupant activities are sensors, locations, and activities.
These are represented as types at the highest level of
the typing hierarchy in HPDL. We can then define
subtypes using these types. Since sensors can be bi-
nary or numeric, we define their corresponding types
as subtypes of the sensor type. Each sensor can be
defined using these two types; for example, a moni-
tor is a binary sensor, and a chair pressure sensor is a
numeric one.
To represent sensor locations, sensor readings, and
sensor status, we define a set of predicates, some of
which are shown in Figure 2. Other predicates in-
clude (sensorAt ?s - sensor ?l - location)
and (active ?s - sensor).
To infer useful information from the basic pred-
icates, we define several derived predicated. Partic-
ularly, (isBinary ?s) and (isNumeric ?s) enable
determining the type of a sensor when the sensor su-
pertype is passed as a parameter. Also, we define
(existsActiveSensor ?l - location) to deter-
mine whether there is any active sensor at a location.
For the representation of the current ongo-
ing activity and its start time, we employ two
predicates. The first one takes the location and
name of the current activity as arguments, whereas
the second one takes the location and start time:
(currentAct ?l - location ?a - activity),
(startAct ?l - location ?time - number).
To infer location types, we propose a straight-
forward representation requiring the specification of
one predicate per location type in the form (is<RoomType> ?l - roomType). For example, if the location is a
working room, then the location type inferring predi-
cate is (isWorkingRoom ?l - workingRoom), with
workingRoom defined under the location type.
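As a sketch, the corresponding HPDL declarations might read as follows; the location subtype names are illustrative assumptions:

    (:types
      binary numeric - sensor                  ; sensor kinds
      workingRoom meetingRoom - location
      activity)                                ; activity names

    (:predicates
      (sensorAt ?s - sensor ?l - location)     ; sensor deployment
      (active ?s - sensor)                     ; sensor status
      (currentAct ?l - location ?a - activity)
      (startAct ?l - location ?time - number)
      (isWorkingRoom ?l - workingRoom))        ; one such predicate per location type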
4.2.2 Representing the Indexing Concept
To represent and infer the current index, we use
a derived predicate, (idx ?i - number). How-
ever, to get the last index in the sequence of
sensor readings in some location, we need an addi-
tional derived predicate (lastIdx ?i - number).
Since these predicates depend on numeric vari-
ables, we need functions to generate their values,
(currentIdx) and (lastIdx). To tie the de-
rived predicates and functions together, we employ
the bind predicate to link i to the value of the
current index in the sequence of sensor readings:
(:derived (idx ?i) (bind ?i (currentIdx))).
Similarly for the last index.
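A minimal sketch of both derived predicates, assuming (currentIdx) and (lastIdx) are declared as functions:

    (:derived (idx ?i - number) (bind ?i (currentIdx)))
    (:derived (lastIdx ?i - number) (bind ?i (lastIdx)))
    ;; note: the predicate and the function share the name lastIdx here,
    ;; following the text; a planner may require distinct names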
4.2.3 Representing the Changing Threshold
Concept
To address the concept of changing threshold, we use
a predicate and a function. The predicate, taking the
numeric sensor, the new threshold value, and the in-
dex as parameters, represents the updated value of
the threshold for a given sensor. When a sensor's threshold changes, this predicate needs to be inserted at or before the index of the sensor readings to which the new threshold applies. The function
(threshold ?s - numeric) is used to associate the
threshold of a numeric sensor with a numeric value.
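For illustration, with a hypothetical predicate name newThreshold, the declarations might look as follows:

    (:predicates
      (newThreshold ?s - numeric ?value - number ?i - number))
    (:functions
      (threshold ?s - numeric))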
4.2.4 Representing a Recognised Activity
For the representation of recognised activities,
we propose using one primitive task, specifically
the action recognisedActivity with two parameters for the activity location and type. The action has start and end points but no preconditions or effects. As this action is the only
primitive task, the activities recognised in a lo-
cation represent a temporally ordered sequence
of (:action recognisedActivity ?l ?a) in-
stances. Hence, for each a and a' at l_i, where a is added before a' in the sequence, it follows that end(a) ≤ start(a'). Consequently, a plan is a sequence of instances of this action ordered by location, i.e., the first subsequence of actions is for location l_i, the next subsequence for location l_{i+1}, and so on, aligning with the order of locations provided in the initial task network. In the final plan, for a at l_i and a' at l_j, no temporal constraints exist between a and a'.
4.2.5 Updating Sensor Status
To update the active status of sensors based on their
current readings, we introduce a dedicated compound
task, which distinguishes between binary and nu-
meric sensors and has two methods for each sen-
sor type. For binary sensors, the task activates or
deactivates a sensor when the current sensor read-
ing is 1 or 0, respectively. For numeric sensors,
the logic is more complex yet the outcome re-
mains the same. Specifically, if a numeric sensor’s
reading equals or exceeds its threshold, the sensor
is active. Otherwise, the sensor is inactive. In
all of the task's methods, we can employ inference tasks whose effects consist of either (active ?s) or (not (active ?s)).
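A sketch of the binary-sensor methods of such a task; the task name updateSensorStatus and the method names are assumptions:

    (:task updateSensorStatus
      :parameters (?s - sensor)
      (:method activate-binary
        :precondition (and (isBinary ?s) (idx ?i) (reading ?s 1 ?i))
        :tasks ((:inline () (active ?s))))
      (:method deactivate-binary
        :precondition (and (isBinary ?s) (idx ?i) (reading ?s 0 ?i))
        :tasks ((:inline () (not (active ?s))))))
    ;; the numeric-sensor methods compare (reading ?s ?v ?i) against
    ;; (threshold ?s) instead of matching binary values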
4.2.6 Processing Sensor Readings per Location
Recognising occupant activities per location
dictates how sensor readings should be pro-
cessed to identify specific activities. We ad-
dress this with a dedicated compound task,
(:task recogniseActivitiesIn ?l). This
task handles sensor readings at location ?l while
considering dynamic thresholds and temporal infor-
mation at the current index. Four methods realise
this task. The first captures new threshold values for
numeric sensors at the current index. The second
processes a new sensor reading by invoking the
compound task for updating the sensor status and
then marking the reading as processed using an
inference task. The third method recognises an
activity at a time point corresponding to the current
index, capturing the temporal information of the
current reading in its precondition and triggering
the compound task (recogniseActivity ?l ?t)
(detailed in the next subsection). Upon recognising
an activity, the method increments the current index,
informing the planner to look at sensor readings at the
next index. These three methods lead to different de-
compositions via their task networks but all end with
a recursive call to (recogniseActivitiesIn ?l).
A fourth method serves as the base case to end
the recursion, applied when the sequence of sensor
readings for that location is complete. This method’s
task network includes three tasks: an inference
task capturing the current activity at ?l (including
its start and end times), then the primitive task
(recognisedActivity ?l ?a), ensuring it fits
within the temporal constraints of the plan (if a
temporal violation occurs, the decomposition would
fail, leading to backtracking; however, since sensor
readings naturally progress in time, this issue can
be resolved by resetting the index), and an inference
task to reset the current index.
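The recursive structure of this task might be sketched as follows; the method names, the processed predicate, and the exact index bookkeeping are assumptions for illustration, and the threshold-capturing method is omitted:

    (:task recogniseActivitiesIn
      :parameters (?l - location)
      ;; process a new sensor reading at the current index
      (:method process-reading
        :precondition (and (idx ?i) (reading ?s ?v ?i) (sensorAt ?s ?l)
                           (not (processed ?s ?i)))
        :tasks ((updateSensorStatus ?s)
                (:inline () (processed ?s ?i))
                (recogniseActivitiesIn ?l)))
      ;; recognise an activity at the time point of the current index
      (:method recognise-at-index
        :precondition (and (idx ?i) (timestamp ?t ?i))
        :tasks ((recogniseActivity ?l ?t)
                (:inline () (increase (currentIdx) 1))
                (recogniseActivitiesIn ?l)))
      ;; base case: the reading sequence is exhausted
      (:method finish
        :precondition (and (idx ?i) (lastIdx ?k) (> ?i ?k))
        :tasks ((:inline (and (currentAct ?l ?a) (startAct ?l ?ts)) ())
                ((and (= ?start ?ts)) (recognisedActivity ?l ?a))
                (:inline () (assign (currentIdx) 0)))))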
4.2.7 Representing the Activity Hierarchy
To systematically recognise a specific activ-
ity, we propose employing a compound task
(recogniseActivity ?l ?t), where ?l is the
location and ?t is the timestamp. This task includes
methods for different types of locations, grouped
into two categories: presence-based activities and
absence. Each method for a presence-based activity decomposes into a compound task for recognising activities in a specific type of location. For example, in an office building with
three room types, there would be three methods, each
handling presence-based activities for a particular
room type. The "Absence" method checks whether all sensors in the location are inactive and then decomposes into the task for recognising "Absence".
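A sketch of this dispatch, assuming one presence-based method per location type (the method and subtask names are illustrative):

    (:task recogniseActivity
      :parameters (?l - location ?t - number)
      (:method in-working-room
        :precondition (isWorkingRoom ?l)
        :tasks ((recogniseWorkingRoomActivity ?l ?t)))
      (:method absence
        :precondition (not (existsActiveSensor ?l))
        :tasks ((recogniseAbsence ?l ?t))))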
4.2.8 Recognising Presence-Based Activities
We can represent the recognition of presence-based
activities at multiple levels of abstraction. At the
highest level, we propose using a compound task with
at least three methods: one for recognising the basic
"Presence" activity, another for continuous recogni-
tion (details in the last subsection), and one for recog-
nising specific activities. For instance, recognising a
meeting in an office might involve methods for pres-
ence, continuous presence, and conducting a meet-
ing. If the method for recognising a specific activ-
ity fails, the process can fall back to identifying pres-
ence. The method for recognising a specific activity
can also lead to further refinement if the activity can
be broken down hierarchically. For example, “Meet-
ing” might be refined into “Having a meeting” and
“Giving a presentation”, each represented by a com-
pound task with its own methods, plus one for contin-
uous recognition. This approach allows for encoding
complex hierarchical relationships as needed. Each
of these methods includes preconditions based on the
active status of relevant sensors at the location. For
example, “Giving a presentation” might require ac-
tive chair pressure, projector, and acoustic sensors. If
a method leads to further abstraction, it decomposes
into the appropriate compound task; otherwise, it de-
composes into a task that marks the beginning of the
current activity and the end of the previous one, en-
suring temporal constraints are met while executing
(recognisedActivity ?l ?a).
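For instance, a method for "Giving a presentation" might be sketched as follows, where the sensor variables and the subtask startNewActivity (marking the start of the new and the end of the previous activity) are assumptions:

    (:method giving-presentation
      :precondition (and (sensorAt ?chair ?l) (active ?chair)
                         (sensorAt ?projector ?l) (active ?projector)
                         (sensorAt ?acoustic ?l) (active ?acoustic))
      :tasks ((startNewActivity ?l givingPresentation ?t)))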
4.2.9 Recognising “Absence”
Recognising "Absence" operates under the assumption that none of the sensors deployed at a specific location are active. This process is encapsulated in a compound task consisting of two methods. One is responsible for continuous recognition, while the other recognises "Absence" at a given location by decomposing into the task that marks the beginning and end of activities.
4.2.10 Recognising Activities Continuously
For each type of activity, at different abstraction lev-
els, a method is needed to handle situations where
an activity is already in progress at a location. This
method, while sharing similar preconditions with oth-
ers, includes the predicate (currentAct ?l ?a).
This design choice allows the planning process to
continue by moving to the next index without adding
the ongoing activity to the plan. When an activity
?a is already in progress, it is stored in the predicate
(currentAct ?l ?a). If a new activity ?a' is recognised at the same location, the current activity ?a can then be added to the plan. The start time of ?a is already recorded in (startAct ?l ?timepoint), and its end time is set to when ?a' begins. This approach
allows the planning process to accurately assign start
and end times to activities.
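A continuous-recognition method can then be sketched as follows, assuming an activity constant working; its empty task network simply lets decomposition proceed to the next index:

    (:method continue-working
      :precondition (and (currentAct ?l working)
                         (existsActiveSensor ?l))  ; activity still ongoing
      :tasks ())  ; add nothing to the plan; the index advances elsewhere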
4.3 Problem Instances Generation
For the HPDL problem instance, we define sen-
sors and relevant locations as objects. In the ini-
tial state segment, we require initialising the sen-
sor locations, thresholds for numeric sensors, cur-
rent activity at each location to "Absence", sensor
readings and corresponding timestamps, current in-
dex to 0, and last index to k, where k is the last
index in the sensor readings sequence. In the ini-
tial task network’s segment, we specify the loca-
tions where activities should be recognised using the
(recogniseActivitiesIn ?l) task.
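A minimal problem instance might then be sketched as follows; the object names and the location type are illustrative, and the :tasks-goal section follows common HPDL usage (treat the exact keyword as an assumption):

    (define (problem day1)
      (:domain homes)
      (:objects
        s1 s2 - binary
        kitchen - kitchenRoom
        absence - activity)
      (:init
        (sensorAt s1 kitchen) (sensorAt s2 kitchen)
        (currentAct kitchen absence)
        (reading s1 1 0) (timestamp "30/11/2023 09:31:34" 0)
        (= (currentIdx) 0) (= (lastIdx) 0))  ; k = 0: a single reading
      (:tasks-goal
        :tasks ((recogniseActivitiesIn kitchen))))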
5 DOMAIN MODELS AND
PRELIMINARY EVALUATION
We now present our results. We first give insights into
the two domain models, followed by the experimental setup and results of our preliminary evaluation.
5.1 The Two Domain Models
Following our approach, we developed two HPDL
domain models: one for recognising activities in
workspaces and another for activities in homes. The
Workspace domain can recognise nine activity types
in three location types, informed by our experience
with living labs and existing knowledge of office ac-
tivities (Nguyen et al., 2014; Georgievski, 2022). The
Homes domain can recognise ten activity types in five
location types, drawing on our experience and exist-
ing knowledge of home activities (Naeem et al., 2007;
van Kasteren et al., 2011). Both domains are designed
to be generic and flexible, supporting typical activi-
ties in workspaces and homes while allowing for cus-
tomisation. The two domains with sample problem
instances are publicly available on GitHub (https://github.com/PlanX-Universe/domains).
5.2 Experimental Setup
Dataset. To test our approach with real-world data,
we used a publicly available dataset (Azkune and
Almeida, 2018) of a real home (Kasteren Home
A (van Kasteren et al., 2011)). The home, an apart-
ment of a 26-year-old male, has three rooms equipped
with 14 simple, non-obtrusive sensors. These sensors
collected 1319 readings over 25 different days. The
inhabitant annotated his activities using speech recog-
nition via a headset.
Planner. We use the publicly available version of SIADEX, a temporal HTN planner (https://github.com/UGR-IntelligentSystemsGroup/HPDL-Planner).
Modelling Considerations. We use our Homes do-
main model. Since the selected dataset includes only
binary sensor readings, we do not need to account
for numeric sensors or dynamic thresholds. Further-
more, determining when a sensor becomes inactive
(switches from 1 to 0) can be inconsistent, as some
sensors deactivate immediately after activation, while
others remain active for extended periods, making it
difficult to interpret the status change. Thus, for this
experiment, we exclude predicates related to sensor
deactivation and focus solely on activation readings.
Moreover, because this version of SIADEX does not
support timed initial literals, we use numeric fluents
to represent time (e.g., 93347 for 09:33:47). We also
split the dataset into separate problem instances for each of the 25 days because the time encoding lacks dates; this prevents temporal violations where the end of the last activity on one day would appear later than the start of the first activity on the following day.
Evaluation Metrics. We evaluate our approach us-
ing standard accuracy metrics: precision, recall, F-
score, and relative gap. To obtain ground truth for
comparison, we label each sensor reading in the
dataset with the activity inferred from the inhab-
itant’s annotations, treating “None” annotations as
"Absence" activities. Since our approach produces
a plan with recognised activities by location, a di-
rect comparison with the labelled ground truth is not
possible. Instead, we re-label the original dataset ac-
cording to the recognised activities in the plans. We
calculate the metrics for each day, with the following
assumptions: if the ground truth contains no activity
instances for a day and our results also show no ac-
tivity, we assign a score of 1 to precision, recall, and
F-score for that activity (i.e., "Absence"). Otherwise,
if our approach labels readings with activities that do
not match the ground truth, we assign a score of 0 to
each metric.
Comparison. We compare our approach with base-
line data-driven techniques, including Naive Bayes
(NB), Hidden Markov Model (HMM), Hidden Semi-
Markov Model (HSMM), and Conditional Random
Field (CRF) (van Kasteren et al., 2011). Addition-
ally, we evaluate it against hybrid or enhanced ap-
proaches that combine data-driven and knowledge-
based techniques, such as ARF (Ihianle et al.,
2018), HARS (Azkune and Almeida, 2018), and US-
MART (Ye et al., 2014).
5.3 Results
Daily precision and recall percentages offer limited
insights, as most activities show inconsistent oscilla-
tion with no clear pattern. The highest and most sta-
ble precision scores are observed for “Using toilet”
and “Preparing breakfast”, while the scores for other
activities fluctuate more widely. Precision for “Leav-
ing house” and “Going to bed” also oscillates, though
less extremely. This variability likely stems from the
fact that each of these activities is identified by a sin-
gle sensor, meaning every activation leads to recog-
nition of the corresponding activity. Irrelevant sensor
activations are common in the dataset, further adding
to these fluctuations. Precision for "Absence" goes
to the extreme, shifting between 1 and 0, indicating
that we either recognise all or none of the non-activity
readings. Recall scores also exhibit oscillation, with
our approach nearly always recognising instances of
“Going to bed” and “Preparing breakfast”. Other ac-
tivities fluctuate between 1 and 0, primarily due to our
convention of assigning 0 or 1 to metrics when they
are not applicable.
Figure 3 shows the average metric percentages.
The highest F-scores were for “Preparing breakfast”
(0.95), “Going to bed” (0.88), and “Leaving house”
(0.85). On the lower end, “Preparing dinner” and
“Getting drink” scored 0.61 and 0.6, respectively,
while "Taking shower" and "Absence" scored 0.51
and 0.5. Although “Using toilet” had a high preci-
sion of 0.97, its recall was only 59%, leading to an F-score of 0.73.
Our approach effectively distinguishes between
similar activities, such as “Preparing breakfast” and
“Preparing dinner”, due to their distinct times of
day, allowing the planner to check the relevance of
the current timestamp. However, it struggles when
time spans overlap, such as with “Getting drink” and
“Preparing dinner”. Although both activities have
similar F-scores, “Getting drink” has higher preci-
sion, while more instances of “Preparing dinner” are
recognised.
Table 1 shows the comparison between our ap-
proach, marked with an asterisk (*), and baseline
data-driven approaches, ordered by F-score. Our ap-
proach achieved 74.32% precision, 73.6% recall, and
a 73.95% F-score, demonstrating that our knowledge-
Figure 3: Average precision, recall, and F-score per activity.
Table 1: Accuracy comparison with baseline data-driven
approaches on the Kasteren House A dataset.
Approach Precision Recall F-score
NB 67.3% 64.8% 65.8%
CRF 73.5% 68.0% 70.4%
HMM 70.3% 74.3% 72.0%
HSMM 70.5% 75.0% 72.4%
* 74.3% 73.6% 73.9%
Table 2: Gap analysis with hybrid approaches using the
Kasteren House A dataset.
Approach USMART HARS ARF
Gap (%) 0.07 3.97 19.5
based approach outperforms traditional data-driven
baselines.
Table 2 shows the F-score gaps between our ap-
proach and more advanced, hybrid approaches. The
gap between our approach and USMART is minimal,
suggesting that our approach is on par with USMART,
despite USMART not accounting for the “None” ac-
tivity. The ability of our approach to recognise Ab-
sence” gives it an advantage in scenarios where dis-
tinguishing between presence and absence is critical.
The gap with HARS is slightly larger, at about 4%, indi-
cating that HARS benefits from its combination of
data-driven and knowledge-based techniques but only
marginally outperforms our approach. However, the
19% gap between our approach and ARF highlights
ARF’s superior performance, likely due to its effec-
tive integration of ontological reasoning with classi-
fication models. This gap points out to the potential
benefits of further enhancing our approach by incor-
porating data-driven strategies.
6 RELATED WORK
The research on this topic can be broadly categorised
into two main streams: activity recognition and plan
and goal recognition (Sukthankar et al., 2014; Van-
Horenbeke and Peer, 2021). Activity recognition
focuses on discovering meaningful human activities
from potentially noisy, low-level sensor data. This
problem has been addressed using both data-driven
and knowledge-based approaches. Data-driven meth-
ods rely on large datasets for training and feature
engineering while exhibiting variable prediction ac-
curacy and poor generalisation (Jobanputra et al.,
2019; Minh Dang et al., 2020; Chen et al., 2021).
Knowledge-based approaches, on the other hand, for-
malise activity models using prior knowledge and ap-
ply reasoning to detect activities, achieving reasonable accuracy but struggling with temporal constraints,
uncertainty, scalability, and adaptability (Chen et al.,
2008; Riboni et al., 2011; Chen et al., 2011; Nguyen
et al., 2014).
Plan and goal recognition focuses on identify-
ing high-level goals and the plans to reach them by observing agents' primitive actions. A common ap-
proach involves using plan libraries to represent pos-
sible plans to be recognised in various domains, such
as homes (Simpson et al., 2006; Bouchard et al.,
2006). Some studies also consider the temporal prop-
erties of recognised plans, e.g., (Levine and Williams,
2014). A notable approach is PGR as planning, where
an agent’s behaviour is represented in domain the-
ory (Ramírez and Geffner, 2009). Like our work, PGR as planning relies on domain models, typically STRIPS-based (Ramírez and Geffner, 2009; Sohrabi et al., 2016) and recently non-temporal HTNs (Höller
et al., 2018). Although domain-independent, these
proposals lack expressiveness needed to capture the
nuances of occupant activity recognition and do not tackle the knowledge-engineering aspect.
HTN planning has been applied to recognise and
label sequences of drivers' activities using event logs
according to hours of service regulations (Fernandez-
Olivares and Perez, 2020). In this approach, ini-
tial observations correspond to events in a driver’s
log, and service regulations are translated into a
temporal HTN domain to identify driving activi-
ties, resulting in plans as interpretable labelled event
logs. Also, HTN planning has been used for ac-
tivity recognition in offices, though without tackling
the domain-engineering problem and temporal prop-
erties (Georgievski, 2022).
Our work parallels runtime verification in plan-
ning, which studies whether observed plan execu-
tions meet intended plan semantics (Bensalem et al.,
2014), and conformance checking in process mining,
which studies whether business process executions,
as recorded in event logs, conform to process mod-
els (van der Aalst, 2011). We go beyond verifying
activities against sensor readings by identifying spe-
cific activities and generating plans that interpret sen-
sor readings as activities with start and end times.
7 CONCLUSIONS
Enabling buildings to support occupants and their
well-being requires placing occupants at the centre of building operation and recognising occupants' behaviour.
We propose a novel framing of occupant activity
recognition as temporal HTN planning and a new
domain-engineering approach for designing HTN do-
main models and problem instances that effectively
capture the essential aspects of occupant activities and
IoT data processing. We also developed HTN do-
main models for workspaces and homes, demonstrat-
ing both the applicability of our approach and its abil-
ity to address real-world challenges in occupant activ-
ity recognition. Our work emphasises the importance
of problem formulation in AI planning, extending be-
yond traditional benchmark problem generation. The
effectiveness of our proposal was validated using the
Homes domain on a real dataset, showing high preci-
sion and recall for specific activities, and competitive
performance compared to other approaches.
Future research will focus on refining HTN do-
main models to capture more nuanced activities, in-
corporating additional IoT data, and expanding our
approach to diverse environments, such as assisted
living facilities. Integrating machine learning tech-
niques and large language models could further en-
hance accuracy and adaptability to occupant be-
haviours. For example, the integration of HTN plan-
ning as a classifier into the training process of data-
driven approaches is an interesting direction for ex-
ploring performance benefits.
ACKNOWLEDGEMENTS
I thank Ivaylo Spasov for his contributions to the work
that forms the foundation of this paper.
REFERENCES
Al-Masri, E., Kalyanam, K. R., Batts, J., Kim, J., Singh,
S., Vo, T., and Yan, C. (2020). Investigating Messag-
ing Protocols for the Internet of Things (IoT). IEEE
Access, 8:94880–94911.
Awada, M., Becerik-Gerber, B., Hoque, S., O’Neill, Z.,
Pedrielli, G., Wen, J., and Wu, T. (2021). Ten ques-
tions concerning occupant health in buildings dur-
ing normal operations and extreme events including
the covid-19 pandemic. Building and Environment,
188:107480.
Azkune, G. and Almeida, A. (2018). A Scalable Hybrid
Activity Recognition Approach for Intelligent Envi-
ronments. IEEE Access, 6:41745–41759.
Bensalem, S., Havelund, K., and Orlandini, A. (2014). Ver-
ification and Validation Meet Planning and Schedul-
ing. International Journal on Software Tools for Tech-
nology Transfer, 16(1):1–12.
Bhatnagar, S., Mund, S., Scala, E., McCabe, K., Mc-
Cluskey, T. L., and Vallati, M. (2022). On-the-Fly
Knowledge Acquisition for Automated Planning Ap-
plications: Challenges and Lessons Learnt. In Inter-
national Conference on Agents and Artificial Intelli-
gence, pages 387–397.
Bouchard, B., Giroux, S., and Bouzouane, A. (2006).
A Smart Home Agent for Plan Recognition of
Cognitively-impaired Patients. Journal of Computers,
1(5):53–62.
Castillo, L., Fdez-Olivares, J., García-Pérez, O., and Palao,
F. (2006). Efficiently Handling Temporal Knowledge
in an HTN Planner. In International Conference on
Automated Planning and Scheduling, pages 63–72.
Chen, K., Zhang, D., Yao, L., Guo, B., Yu, Z., and Liu,
Y. (2021). Deep Learning for Sensor-Based Human
Activity Recognition: Overview, Challenges, and Op-
portunities. ACM Computing Surveys, 54(4).
Chen, L., Nugent, C. D., Mulvenna, M., Finlay, D., Hong,
X., and Poland, M. (2008). A Logical Framework
for Behaviour Reasoning and Assistance in a Smart
Home. International Journal of Assistive Robotics
and Mechatronics, 9(4):20–34.
Chen, L., Nugent, C. D., and Wang, H. (2011). A
Knowledge-driven Approach to Activity Recognition
in Smart Homes. IEEE Transactions on Knowledge
and Data Engineering, 24(6):961–974.
Curry, M. R. (1996). The Work in the World: Geographical
Practice and the Written Word. University of Min-
nesota Press.
Fernandez-Olivares, J. and Perez, R. (2020). Driver Activity
Recognition by Means of Temporal HTN Planning. In
International Conference on Automated Planning and
Scheduling, pages 375–383.
Georgievski, I. (2022). Office Activity Recognition Using
HTN Planning. In IEEE International Conference on
Signal Image Technology & Internet-Based Systems,
pages 70–77.
Georgievski, I. and Aiello, M. (2015). HTN Planning:
Overview, Comparison, and Beyond. Artificial Intel-
ligence, 222(0):124–156.
Höller, D., Behnke, G., Bercher, P., and Biundo, S. (2018).
Plan and Goal Recognition as HTN Planning. In IEEE
International Conference on Tools with Artificial In-
telligence, pages 466–473.
Ihianle, I. K., Naeem, U., Islam, S., and Tawil, A.-R.
(2018). A Hybrid Approach to Recognising Activities
of Daily Living from Object Use in the Home Envi-
ronment. Informatics, 5(1).
Jobanputra, C., Bavishi, J., and Doshi, N. (2019). Human
Activity Recognition: A Survey. Procedia Computer
Science, 155:698–703.
Klepeis, N. E., Nelson, W. C., Ott, W. R., Robinson, J. P.,
Tsang, A. M., Switzer, P., Behar, J. V., Hern, S. C.,
and Engelmann, W. H. (2001). The National Human
Activity Pattern Survey (NHAPS): a resource for as-
sessing exposure to environmental pollutants. Jour-
nal of exposure science & environmental epidemiol-
ogy, 11(3):231–252.
Levine, S. and Williams, B. (2014). Concurrent Plan Recog-
nition and Execution for Human-Robot Teams. In In-
ternational Conference on Automated Planning and
Scheduling, pages 490–498.
Liao, L., Fox, D., and Kautz, H. A. (2005). Location-Based
Activity Recognition using Relational Markov Net-
works. In International Joint Conference on Artificial
Intelligence, volume 5, pages 773–778.
Matz, C. J., Stieb, D. M., and Brion, O. (2015). Urban-
rural differences in daily time-activity patterns, occu-
pational activity and housing characteristics. Environ-
mental Health, 14:1–11.
McDermott, D., Ghallab, M., Howe, A., Knoblock, C.,
Ram, A., Veloso, M., Weld, D., and Wilkins, D.
(1998). PDDL - The planning domain definition lan-
guage. Tech. Rep. CVC TR-98-003/DCS TR-1165,
Yale Center for Computer Vision and Control.
Minh Dang, L., Min, K., Wang, H., Jalil Piran, M., Hee Lee,
C., and Moon, H. (2020). Sensor-Based and Vision-
Based Human Activity Recognition: A Comprehen-
sive Survey. Pattern Recognition, 108:107561.
Naeem, U., Bigham, J., and Wang, J. (2007). Recognis-
ing activities of daily life using hierarchical plans. In
European Conference on Smart Sensing and Context,
pages 175–189.
Nguyen, T. A. and Aiello, M. (2013). Energy intelligent
buildings based on user activity: A survey. Energy
and buildings, 56:244–257.
Nguyen, T. A., Raspitzu, A., and Aiello, M. (2014).
Ontology-Based Office Activity Recognition with Ap-
plications for Energy Savings. Journal of Ambient In-
telligence and Humanized Computing, 5(5):667–681.
Ramírez, M. and Geffner, H. (2009). Plan Recognition as
Planning. In International Joint Conference on Artifi-
cial Intelligence, pages 1778–1783.
Riboni, D., Pareschi, L., Radaelli, L., and Bettini, C. (2011).
Is Ontology-Based Activity Recognition Really Ef-
fective? In IEEE International Conference on Per-
vasive Computing and Communications Workshops,
pages 427–431.
Simpson, R., Schreckenghost, D., LoPresti, E. F., and
Kirsch, N. (2006). Plans and Planning in Smart
Homes. In Designing Smart Homes: The Role of AI,
pages 71–84. Springer.
Sirin, E., Parsia, B., Grau, B. C., Kalyanpur, A., and Katz,
Y. (2007). Pellet: A practical OWL-DL reasoner.
Journal of Web Semantics, 5(2):51–53.
Sohrabi, S., Riabov, A. V., and Udrea, O. (2016). Plan
Recognition as Planning Revisited. In International
Joint Conference on Artificial Intelligence, pages
3258–3264.
Sukthankar, G., Goldman, R. P., Geib, C., Pynadath, D.,
and Bui, H. (2014). An Introduction to Plan, Activity,
and Intent recognition. In Plan, Activity, and Intent
Recognition, pages 1–22. Elsevier.
Uschold, M. and Gruninger, M. (1996). Ontologies: Princi-
ples, methods and applications. The knowledge engi-
neering review, 11(2):93–136.
van der Aalst, W. M. P. (2011). Process Mining: Discov-
ery, Conformance and Enhancement of Business Pro-
cesses. Springer Publishing Company, Incorporated,
1st edition.
Van-Horenbeke, F. A. and Peer, A. (2021). Activity,
Plan, and Goal Recognition: A Review. Frontiers in
Robotics and AI, 8.
van Kasteren, T. L., Englebienne, G., and Kröse, B. J.
(2011). Human Activity Recognition from Wireless
Sensor Network Data: Benchmark and Software. In
Chen, L., Nugent, C. D., Biswas, J., and Hoey, J., edi-
tors, Activity Recognition in Pervasive Intelligent En-
vironments, pages 165–186. Atlantis Press.
Wu, J., Osuntogun, A., Choudhury, T., Philipose, M., and
Rehg, J. M. (2007). A Scalable Approach to Activity
Recognition Based on Object Use. In IEEE Interna-
tional Conference on Computer Vision, pages 1–8.
Ye, J., Stevenson, G., and Dobson, S. (2014). USMART: An
Unsupervised Semantic Mining Activity Recognition
Technique. ACM Transactions on Interactive Intelli-
gent Systems, 4(4).
APPENDIX
The acronyms mentioned in this paper are listed in
Table 3.
Table 3: List of acronyms.
Acronym Description
AI Artificial Intelligence
IoT Internet of Things
PGR Plan and Goal Recognition
HTN Hierarchical Task Network
HPDL Hierarchical Domain Definition Language
PDDL Planning Domain Definition Language
NB Naive Bayes
HMM Hidden Markov Model
HSMM Hidden Semi-Markov Model
CRF Conditional Random Field