be used with heuristics or have a large community of 
users actively evaluating content (for example, 
movies) suitable for collaborative filtering. More 
research-oriented recommender systems use a 
much wider range of methods that offer advantages 
such as improved accuracy, but at the cost of 
constraints such as feedback requirements or 
intrusive monitoring of user behavior over long 
periods of time (Middleton, 2009). 
An ontology is a powerful tool for profiling users. 
An ontology can contain many kinds of useful knowledge 
about users and their interests, for example knowledge 
of scientific subjects, the technologies underlying each 
subject area, the projects people work on, etc. 
This knowledge can be used to infer more 
interests than can be seen by observation alone. 
We use inference to improve user profiles. 
Is-a relationships in the thematic ontology are used to 
infer interest in the more general topics of the 
superclass. This inference has the effect of 
rounding out profiles, making them more inclusive and 
tuning them to the broader interests of the user. 
We apply time decay to the observed 
behavior events to form the main 
profile. Inference is then used to enhance the 
profile of interests, with a 50% propagation rule applied 
along all ontological relationships, up to the root class, 
for each observed event. 
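The two steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the topic hierarchy, the exponential half-life decay, and all names (TOPIC_PARENT, infer_profile) are assumptions made for the example.

```python
# Hypothetical topic hierarchy: child -> parent (is-a relationships up to the root).
TOPIC_PARENT = {
    "neural-networks": "machine-learning",
    "machine-learning": "computer-science",
    "computer-science": None,  # root class
}

def decayed_weight(event_time, now, half_life_days=30.0):
    """Exponential time decay: an event loses half its weight every half_life_days."""
    age_days = (now - event_time) / 86400.0
    return 0.5 ** (age_days / half_life_days)

def infer_profile(events, now):
    """Build an interest profile from (topic, timestamp) events, propagating
    50% of each interest value up the is-a hierarchy to the root class."""
    profile = {}
    for topic, event_time in events:
        weight = decayed_weight(event_time, now)
        # Walk up to the root class, halving the weight at each step.
        while topic is not None:
            profile[topic] = profile.get(topic, 0.0) + weight
            topic = TOPIC_PARENT.get(topic)
            weight *= 0.5
    return profile
```

Under this sketch a fresh "neural-networks" event contributes full weight to that topic, half to "machine-learning", and a quarter to the root "computer-science" class.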
The interest-event weights were chosen to bias the 
feedback in favor of explicitly provided feedback, 
which is likely to be the most reliable. The 
50% propagation value was chosen to reflect decreasing 
confidence the further a topic lies from the observed 
behavior. Determining the 
optimal values for these parameters will require 
further empirical evaluation. 
Recommendation systems suffer from a cold 
start problem (Middleton, 2009), where the lack of 
initial behavioral information significantly reduces 
the accuracy of user profiles and therefore of 
recommendations. This poor initial performance can deter 
users from adopting the system, which in turn 
prevents the system from gathering more behavioral 
data; the recommendation 
system may never be used enough to overcome its 
cold start. 
We apply an external ontology containing 
domain-specific data. The knowledge stored in the 
external ontology is used to bootstrap user profiles 
in order to reduce the cold-start effect. The external 
ontology provides a solid basis for integration. 
Knowledge stored in the external ontology is used to 
determine historical interests for new users, and 
network analysis of ontological relationships is used 
to identify similar users whose own interests can be 
used to initialize a new user's profile. 
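One simple way to realise the similar-user bootstrapping described above is cosine similarity over ontology-derived interest profiles. This is a hedged sketch, not the paper's method: the profile representation (topic → weight dicts) and the function names are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse interest profiles (topic -> weight dicts)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def bootstrap_profile(new_user_interests, existing_profiles, k=2):
    """Initialise a new user's profile from the k most similar existing users,
    adding their topic weights scaled by similarity."""
    most_similar = sorted(existing_profiles.values(),
                          key=lambda p: cosine(new_user_interests, p),
                          reverse=True)[:k]
    bootstrapped = dict(new_user_interests)
    for profile in most_similar:
        sim = cosine(new_user_interests, profile)
        for topic, weight in profile.items():
            bootstrapped[topic] = bootstrapped.get(topic, 0.0) + sim * weight
    return bootstrapped
```

A new user who has only declared an interest in one topic thereby inherits, with reduced weight, the other interests of the users closest to them.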
3.4  Recommendations with GRU 
RNNs have been developed to model sequence data. 
The main difference between RNNs and ordinary 
feedforward deep models is the existence of an internal 
hidden state in the unit. A standard RNN updates its 
hidden state h_t with the function 

h_t = g(W x_t + U h_{t-1}), 

where g is a smooth, bounded function such as the 
logistic sigmoid and x_t is the input of the unit at 
time t. An RNN outputs a probability distribution over 
the next element of the sequence, given its current 
state h_t. 
A Gated Recurrent Unit (GRU) is a more 
advanced variant of the RNN that aims to solve the 
vanishing gradient problem. GRU gates essentially 
learn when and by how much to update the hidden 
state of the unit (Razvan, 2012). The activation h_t of 
the GRU is a linear interpolation between the 
previous activation h_{t-1} and the candidate 
activation ĥ_t: 

h_t = (1 − z_t) h_{t-1} + z_t ĥ_t, 

where the update gate z_t is given by: 

z_t = σ(W_z x_t + U_z h_{t-1}), 

while the candidate activation ĥ_t is 
computed in a similar manner: 

ĥ_t = tanh(W x_t + U (r_t ⊙ h_{t-1})), 

and finally the reset gate r_t is given by: 

r_t = σ(W_r x_t + U_r h_{t-1}). 
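The gate equations can be checked with a minimal numpy GRU cell. This is an illustrative sketch only; the weight shapes, the absence of bias terms, and the random initialisation are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, W_z, U_z, W_r, U_r, W, U):
    """One GRU step following the equations above."""
    z = sigmoid(W_z @ x_t + U_z @ h_prev)        # update gate z_t
    r = sigmoid(W_r @ x_t + U_r @ h_prev)        # reset gate r_t
    h_hat = np.tanh(W @ x_t + U @ (r * h_prev))  # candidate activation
    return (1.0 - z) * h_prev + z * h_hat        # linear interpolation

# Tiny example with random weights (input size 4, hidden size 3).
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
params = [rng.standard_normal((d_h, d_in)) if i % 2 == 0
          else rng.standard_normal((d_h, d_h))
          for i in range(6)]  # W_z, U_z, W_r, U_r, W, U
h = np.zeros(d_h)
h = gru_cell(rng.standard_normal(d_in), h, *params)
```

Because z_t lies in (0, 1), each step moves the state only part of the way toward the candidate activation, which is what lets the gates control how much history is retained.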
We used the GRU-based RNN model for 
recommendations. The input of the network is the 
current session, while the output is the item of the 
next event in the session. The state of the session can 
either be the item of the actual event or the events in 
the session so far. In the former case, 1-of-N 
encoding is used, i.e. the input vector's length equals 
the number of items and only the coordinate 
corresponding to the active item is one, the others 
are zeros. The latter setting uses a weighted sum of 
these representations, in which events are discounted 
if they occurred earlier. For the sake of 
stability, the input vector is then normalized. We 
expect this to help because it reinforces the memory 
effect: it emphasizes very local ordering 
constraints that are not well captured by the longer 
memory of the RNN. We also experimented with adding 
an additional embedding layer, but the 1-of-N 
encoding always performed better.  
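The two session representations can be sketched as follows. The catalogue size and the geometric discount factor are assumptions for illustration; the paper does not specify the discounting scheme.

```python
import numpy as np

N_ITEMS = 5  # hypothetical catalogue size

def one_hot(item_id, n_items=N_ITEMS):
    """1-of-N encoding: all zeros except the coordinate of the active item."""
    v = np.zeros(n_items)
    v[item_id] = 1.0
    return v

def session_vector(item_ids, discount=0.8, n_items=N_ITEMS):
    """Discounted weighted sum of 1-of-N vectors for the session so far;
    earlier events get smaller weights, then the vector is L2-normalized."""
    v = np.zeros(n_items)
    for age, item in enumerate(reversed(item_ids)):  # age 0 = most recent event
        v += (discount ** age) * one_hot(item, n_items)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

For a session [2, 4] the most recent item 4 dominates the vector, while the earlier item 2 is down-weighted, capturing the local ordering the text describes.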
The core of the network is the GRU layer(s), and 
additional feedforward layers can be added between 
the last layer and the output. The output is the