to its actions. Furthermore, since the generic goal
is to learn the most opportune timings, time and, in
some cases, even the day of the observation are also
tracked. With the purpose of recording users’ reac-
tions to the actions of the agent, 3 options were de-
fined and associated with a value: 0, meaning that in
the last timestep, a notification was sent and ignored
or dismissed; 1, a notification was sent and positively
addressed; 3, a notification was not sent.
2.3 Reward Definition
The various reward types in our experiments are
structured in the following manner: when a notifica-
tion is sent and the user does not answer, the agent
receives reward a. However, if the user responds, the
received reward is b. If no notification is sent, the
agent receives c. Lastly, if the episode (in this context,
a day) ends without achieving the goal of one answer,
d is received. Thus, the rewards assume values in the set
R = {a, b, c, d}. We define the following alternatives
for the values of {a, b, c, d}: R1 = {−1, 2, −1, −2};
R3 = {−2, 2, −1, −3}; R5 = {−2, 2, 0, −3}; R6 =
{−3, 2, 0, −3}.
The general idea we wish to transmit to the agent
with these structures is that the goal is to get the user
to answer one notification without bothering them by
sending notifications that go unanswered.
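The reward structure above can be sketched as a small lookup. This is a minimal illustration, not the implementation used in the experiments: the names `Outcome`, `REWARD_SCHEMES`, and `step_reward` are hypothetical, and the sketch assumes that d replaces (rather than adds to) the reward of the final timestep when the day ends without an answer.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for what happened in one timestep."""
    SENT_NO_ANSWER = "sent_no_answer"
    SENT_ANSWERED = "sent_answered"
    NOT_SENT = "not_sent"


# (a, b, c, d) for each alternative defined in the text.
REWARD_SCHEMES = {
    "R1": (-1, 2, -1, -2),
    "R3": (-2, 2, -1, -3),
    "R5": (-2, 2, 0, -3),
    "R6": (-3, 2, 0, -3),
}


def step_reward(scheme, outcome, episode_ended=False, answered_today=False):
    """Return the reward for one timestep under the given scheme.

    Assumption: when the day ends without any answer, d is returned
    instead of the regular step reward.
    """
    a, b, c, d = REWARD_SCHEMES[scheme]
    if outcome is Outcome.SENT_ANSWERED:
        return b
    if episode_ended and not answered_today:
        return d
    return a if outcome is Outcome.SENT_NO_ANSWER else c
```

Under R5 and R6, staying silent costs nothing (c = 0), so the only penalties come from unanswered notifications and from ending the day without an answer.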
2.4 Environment Model
We assume that no difference exists between ignoring
a message and explicitly dismissing it, considering both
as “No Answer”. In this initial approach, we do not
wish to understand why a moment is less opportune
but simply that it is. Thus, the users’ answers, or
lack thereof, are registered and their motivations
disregarded. Furthermore, the user’s answer is considered
to be either immediate or non-existent.
2.4.1 Behavior Model
This model reflects a user’s routine, for example, the
activities performed, their duration, and the user’s lo-
cation. It mirrors the ExtraSensory dataset (Vaiz-
man et al., 2017), which aggregates daily traces of
60 participants. Measurements from smartphones and
smartwatches were collected, along with self-reported
labels. Since this data was collected in the wild, its
reliability is not perfect; after being processed and
cleaned, it considers 51 possible tags, shown in Appendix
A (15 locations, 8 primary activities, 28 secondary
ones). These include primary activities, which de-
scribe movement or posture and are mutually exclu-
sive, and secondary activities, which represent a more
specific context. For the latter, as for locations,
the user could apply several tags to a single instance
in time. In this simulator, the user’s state is represented
as the combination of one primary activity and
a set of up to 43 possible secondary tags, composed
of secondary activities and locations.
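One possible way to hold such a state is sketched below. The class name `UserState` and the example tag strings are hypothetical placeholders, not the labels listed in Appendix A; the sketch only mirrors the structure described above (one primary activity plus a set of secondary tags).

```python
from dataclasses import dataclass
from typing import FrozenSet

MAX_SECONDARY_TAGS = 43  # 28 secondary activities + 15 locations


@dataclass(frozen=True)
class UserState:
    """One primary activity and a set of secondary tags (illustrative)."""
    primary_activity: str
    secondary_tags: FrozenSet[str] = frozenset()

    def __post_init__(self):
        # Primary activities are mutually exclusive, so exactly one is stored;
        # secondary tags may co-occur, up to the size of the tag vocabulary.
        if len(self.secondary_tags) > MAX_SECONDARY_TAGS:
            raise ValueError("more secondary tags than the vocabulary allows")


# Placeholder tag names, chosen only for illustration.
state = UserState("sitting", frozenset({"at_home", "watching_tv"}))
```

Making the state immutable and hashable lets it serve directly as a dictionary key, which is convenient if a tabular agent is used.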
From the available data, three user traces were
chosen. These were selected according to two main
concerns: providing lifestyles as distinct as possible
while ensuring the availability of enough data to rep-
resent a week in these users’ lives.
2.4.2 Response Model
The response model simulates how a user responds
to a notification in any given context, producing the
observations that our agent receives. When implementing
this model, we considered a set of behaviors that
researchers in the literature agree users tend to
show (Mehrotra et al., 2015; Mehrotra et al.,
2016; Mehrotra and Musolesi, 2017).
Firstly, when the behavior model presents labels
such as sleeping, which imply an inability to answer,
the simulator does not respond to notifications. In the
case of tasks such as driving or being in a meeting,
which are usually associated with a low probability of
answering, the simulator tends not to respond. Secondly,
a randomness level is always associated with every
decision the simulator makes, except when the user
is sleeping. This level is intended to express the same
randomness a human would show in their daily life.
Thirdly, a component (β), defined as the exponential
decay in (4), is used to convey the diminishing desire
to use the app that most users would experience as the
number of daily notifications rises.
$$\beta(n_t) = P(\mathrm{Answer} \mid n_t) = e^{-\lambda n_t} \qquad (4)$$
Here, $n_t$ represents the number of messages already
sent during the current day. λ equals 0.3, chosen to
guarantee reasonable values are obtained.
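As a quick numerical check of (4), the decay factor can be computed directly. The function name `answer_decay` is hypothetical; only the formula and λ = 0.3 come from the text.

```python
import math

LAMBDA = 0.3  # decay rate stated in the text


def answer_decay(n_sent: int, lam: float = LAMBDA) -> float:
    """Return beta(n_t) = exp(-lambda * n_t) for n_t messages sent today."""
    if n_sent < 0:
        raise ValueError("n_sent must be non-negative")
    return math.exp(-lam * n_sent)


# Decay factor after 0, 1, 2 and 3 notifications in the same day.
decay_values = [answer_decay(n) for n in range(4)]
```

With λ = 0.3 the factor is 1 before any notification is sent and falls to roughly 0.74, 0.55, and 0.41 after one, two, and three notifications, respectively.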
Each user has a predefined prior probability of
answering, P(A), and of not answering, P(Ā). This value
represents a person’s predisposition to be on their
phone and regularly use an mHealth application. We
assume a fixed value for each simulated user.
Assuming statistical independence between labels
and following the Naive Bayes probability model (5),
the probability of the user answering or not is as fol-
lows: