be used with heuristics or have a large community of
users actively evaluating content (for example,
movies) suitable for collaborative filtering. More
research-oriented recommender systems use a
much wider range of methods that offer advantages
such as improved accuracy, combined with
constraints such as feedback requirements or
intrusive monitoring of user behavior over long
periods of time (Middleton, 2009).
An ontology is a powerful tool for profiling users.
It can contain many kinds of useful knowledge
about users and their interests, for example knowledge
related to scientific subjects, the technologies
underlying each subject area, the projects people
work on, and so on. This knowledge can be used to
reveal more interests than can be seen through
observation alone.
We use inference to improve user profiles.
Is-a relationships in the thematic ontology are used to
infer interest in the more general topics of the
superclass. This inference has the effect of
rounding out profiles, making them more inclusive and
tuning them to the broad interests of the user.
We apply time decay to the observed
behavior events when forming the main
profile. Inference is then used to enhance the
interest profile, with the 50% propagation rule applied
along all ontological relationships, up to the root class,
for each observed event.
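The profile-building step above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the topic names, the parent map representing the is-a hierarchy, and the half-life value are all hypothetical assumptions.

```python
# Hypothetical is-a hierarchy: each topic maps to its superclass,
# and None marks the root class.
IS_A = {"neural_networks": "machine_learning",
        "machine_learning": "computer_science",
        "computer_science": None}

def add_event(profile, topic, event_age_days, half_life_days=30.0):
    """Add one observed behavior event to an interest profile.

    The event's weight is time-decayed, then propagated up the is-a
    hierarchy, halving at each link (the 50% propagation rule).
    """
    weight = 0.5 ** (event_age_days / half_life_days)  # exponential time decay
    while topic is not None:
        profile[topic] = profile.get(topic, 0.0) + weight
        weight *= 0.5                 # 50% propagation per is-a relationship
        topic = IS_A[topic]
    return profile

profile = add_event({}, "neural_networks", event_age_days=0.0)
# A fresh event on "neural_networks" also contributes half that weight to
# "machine_learning" and a quarter to the root "computer_science".
```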
The events of interest were chosen to weight the
feedback in favor of explicitly provided feedback,
which is likely to be the most reliable. The value of
50% propagation was chosen to reflect the decrease in
confidence the further an inferred interest moves
from the observed behavior. Determining the
optimal values for these parameters will require
further empirical evaluation.
Recommendation systems suffer from a cold
start problem (Middleton, 2009), where the lack of
initial behavioral information significantly reduces
the accuracy of user profiles and therefore
recommendations. This low performance can discourage
users from adopting the system, which in turn
prevents the system from gathering more behavioral
data; it is possible that the recommendation
system will never be used enough to overcome its
cold start.
We apply an external ontology containing
domain-specific data. The knowledge stored in the
external ontology is used to bootstrap user profiles
in order to reduce the cold-start effect, and it
provides a solid initial basis for recommendation.
This knowledge is used to
determine historical interests for new users, and a
network analysis of ontological relationships is used
to identify similar users whose own interests can be
used to initialize a new user's profile.
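The similar-user bootstrap might look like the following sketch. The similarity measure (cosine over interest vectors), the averaging scheme, and all profile contents are illustrative assumptions, not details given in the text.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse interest profiles (dicts)."""
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def bootstrap_profile(ontology_interests, existing_profiles, k=2):
    """Initialize a new user's profile from the k most similar existing users."""
    ranked = sorted(existing_profiles,
                    key=lambda p: cosine(ontology_interests, p),
                    reverse=True)[:k]
    topics = {t for p in ranked for t in p}
    return {t: sum(p.get(t, 0.0) for p in ranked) / len(ranked)
            for t in topics}

# Hypothetical data: the new user's ontology-derived interests, and two
# existing user profiles.
new_user = {"databases": 1.0}
others = [{"databases": 0.9, "sql": 0.6}, {"graphics": 1.0}]
initial = bootstrap_profile(new_user, others, k=1)
```

Here the new user inherits the profile of the single most similar existing user; with k > 1 the inherited interests are averaged across neighbors.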
3.4 Recommendations with GRU
RNNs have been developed to model sequence data.
The main difference between RNNs and ordinary
feedforward deep models is the existence of an internal
hidden state. Standard RNNs update the hidden state $h_t$ with the
function
$$h_t = g(W x_t + U h_{t-1}),$$
where $g$ is the logistic sigmoid function. An RNN
outputs a probability distribution over the next
element of the sequence, given its current state $h_t$.
A Gated Recurrent Unit (GRU) is a more
advanced RNN model that aims to solve the
vanishing gradient problem. GRU gates essentially
learn when and by how much to update the hidden
state of the unit (Razvan, 2012). The activation $h_t$ of
the GRU is a linear interpolation between the
previous activation $h_{t-1}$ and the candidate activation $\hat{h}_t$:
$$h_t = (1 - z_t) h_{t-1} + z_t \hat{h}_t,$$
where the update gate $z_t$ is given by:
$$z_t = \sigma(W_z x_t + U_z h_{t-1}),$$
while the candidate activation $\hat{h}_t$ is
computed in a similar manner:
$$\hat{h}_t = \tanh(W x_t + U (r_t \odot h_{t-1})),$$
and finally the reset gate $r_t$ is given by:
$$r_t = \sigma(W_r x_t + U_r h_{t-1}).$$
We used the GRU-based RNN model for
recommendations. The input of the network is the
current session while the output is the item of the
next event in the session. The state of the session can
either be the item of the actual event or the events in
the session so far. In the former case 1-of-N
encoding is used, i.e. the input vector's length equals
the number of items, and only the coordinate
corresponding to the active item is one while the others
are zero. The latter setting uses a weighted sum of
these representations, in which earlier events are
discounted. For the sake of
stability, the input vector is then normalized. We
expect this to help because it reinforces the memory
effect: the reinforcement of very local ordering
constraints which are not well captured by the longer
memory of RNN. We also experimented with adding
an additional embedding layer, but the 1-of-N
encoding always performed better.
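The two input options can be sketched as follows. The item IDs, catalog size, and the geometric discount factor are illustrative assumptions; the text specifies only that earlier events are discounted and the vector normalized.

```python
import numpy as np

def one_hot(item, n_items):
    """1-of-N encoding: a single 1 at the active item's coordinate."""
    v = np.zeros(n_items)
    v[item] = 1.0
    return v

def session_input(session_items, n_items, discount=0.8):
    """Discounted, normalized sum of 1-of-N vectors for the events so far."""
    v = np.zeros(n_items)
    for age, item in enumerate(reversed(session_items)):  # age 0 = latest event
        v += (discount ** age) * one_hot(item, n_items)
    norm = np.linalg.norm(v)
    return v / norm if norm else v

# A toy session over a 4-item catalog: items 2, 0, then 2 again.
x = session_input([2, 0, 2], n_items=4)
```

The most recent item dominates the resulting vector, which is the intended memory effect: very local ordering gets the strongest signal.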
The core of the network is the GRU layer(s) and
additional feedforward layers can be added between
the last layer and the output. The output is the