task. For example, in Figure 1, the 3 shortest paths of
length 3 from MP3 to United Kingdom run through a
total of 4 different nodes: United States, Internet, Europe
and Ice Hockey. The maximum number of unique
nodes on 3 shortest paths of length 3 is 6 (3 times 2
unique intermediary nodes). Somewhat inspired by
betweenness centrality, we propose to divide the num-
ber of nodes on the actual shortest paths by the maxi-
mum possible number of intermediary nodes, a mea-
sure which we will call shortest paths uniqueness. In
our example this results in a score of 4/6 ≈ 0.67. We
will incorporate this measure along with the distance
in the difficulty classifier defined as:
$$d_{usp}(u,v) = d(u,v) + \beta \left( 1 - \frac{\log(\psi(u,v))}{\log(d(u,v) \times \sigma(u,v))} \right)$$
Here, ψ(u,v) is a function that returns the number of
distinct nodes on the shortest paths between u and v.
As before, logarithms are used because of
the distribution of the number of unique nodes on the
shortest paths, depicted by the various thin lines in
Figure 5. The parameter β ≥ 0 controls how much
weight is given to the number of distinct nodes over all
shortest paths; the best results were obtained for β = 1.75.
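To make the definition concrete, the following is a minimal Python sketch of how d(u,v), σ(u,v) and ψ(u,v) could be obtained with a single breadth-first search and then combined into the classifier. It is an illustration under stated assumptions, not the implementation used for the experiments: the function names `shortest_path_stats` and `d_usp` and the toy graph are hypothetical, ψ is assumed to count intermediary nodes only (matching the count of 4 in the example), and tasks with d(u,v) < 2 are assumed to fall back to the plain distance because they have no intermediary nodes. The sketch follows the formula as printed, which normalizes by log(d × σ), whereas the prose example normalizes by σ × (d − 1) intermediary nodes.

```python
import math
from collections import deque


def shortest_path_stats(adj, u, v):
    """Return (d, sigma, psi) for a directed graph given as {node: [successors]}.

    d     -- length of a shortest path from u to v
    sigma -- number of distinct shortest paths from u to v
    psi   -- number of distinct intermediary nodes on those paths
             (assumption: the endpoints u and v are not counted)
    Returns None if v is unreachable from u.
    """
    dist, count, preds = {u: 0}, {u: 1}, {u: []}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            break  # every predecessor of v has already been expanded
        for y in adj.get(x, ()):
            if y not in dist:  # first visit: y lies one level below x
                dist[y], count[y], preds[y] = dist[x] + 1, 0, []
                queue.append(y)
            if dist[y] == dist[x] + 1:  # edge x -> y lies on a shortest path
                count[y] += count[x]
                preds[y].append(x)
    if v not in dist:
        return None
    # Walk predecessor links back from v to collect all nodes on shortest paths.
    seen, stack = set(), [v]
    while stack:
        for p in preds[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    seen.discard(u)  # u is an endpoint, not an intermediary node
    return dist[v], count[v], len(seen)


def d_usp(d, sigma, psi, beta=1.75):
    """Distance refined by shortest paths uniqueness, following the printed formula."""
    if d < 2 or psi < 1:
        return float(d)  # assumption: no intermediary nodes, so no refinement
    return d + beta * (1 - math.log(psi) / math.log(d * sigma))


# Hypothetical toy graph (not the edges of Figure 1): three shortest paths of
# length 3 from "s" to "t" that together use four intermediary nodes.
toy = {"s": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": ["t"], "d": ["t"]}
d, sigma, psi = shortest_path_stats(toy, "s", "t")
uniqueness = psi / (sigma * (d - 1))  # 4 / 6, as in the example in the text
score = d_usp(d, sigma, psi)
```

On this toy graph the sketch yields d = 3, σ = 3 and ψ = 4, reproducing the uniqueness score of 4/6 from the example. The breadth-first search touches each edge at most once, which is in line with the O(m) complexity per task listed for the global classifiers.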
The performance of the measure is shown by
the dotted line in Figure 6, with a correlation of
c = −0.924 and a rank correlation of r_c = −0.925, indicating that
shortest paths uniqueness is a good refinement of the
global difficulty indicator based solely on distance.
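For readers who want to reproduce this kind of evaluation, the sketch below shows one way to compute a correlation coefficient and a rank correlation coefficient between difficulty classes and an observed difficulty score. It assumes that c is Pearson's coefficient and that the rank correlation is Spearman's (Pearson applied to ranks, with ties given their average rank); this excerpt does not name the exact estimators, and the data in the example is made up purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Rank correlation: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: six difficulty classes and a made-up success rate per class.
classes = [1, 2, 3, 4, 5, 6]
success_rate = [0.90, 0.80, 0.50, 0.45, 0.20, 0.10]
c = pearson(classes, success_rate)
r_c = spearman(classes, success_rate)
```

Because the made-up success rate decreases strictly with the class number, the rank correlation in this example is exactly −1 while the ordinary correlation stays slightly above −1. This is the same pattern as the row for d(u,v) in Table 2.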
6 CONCLUSIONS
Throughout this paper we have proposed and analyzed
a range of techniques for classifying path traversal
difficulty in information networks. The results
are summarized in Table 2. Local measures related
to the goal article, such as the reversed neighborhood
size, appear to be most effective, whereas local prop-
erties of the source article appear to be of little in-
fluence to path difficulty. Apparently, a user tends to
quickly find his way to a hub node, from where the ac-
tual search process starts. As for the global measures
considered in this work, the distance between two
articles, though limited in range, is a good measure
of difficulty. Incorporating the percentage of unique
nodes over all shortest paths results in a global clas-
sifier with slightly better performance, but due to the
higher complexity of global measures, one may favor
the local classifiers in a practical application such as
The Wiki Game, where the difficulty classifiers could
be used to allow users to select a difficulty level.
In future work we would like to include more
article-specific information, such as the article’s link
density, which loosely represents the branching factor.
We also want to analyze a user’s frequent subpaths,
which may help us to obtain a better understanding
of the search process of a certain user or
group of similar users, possibly allowing us to
personalize the difficulty indicators.

Table 2: Summary of correlation coefficients (c), rank correlation
coefficients (r_c) and complexity per task t = (u, v)
of the proposed difficulty classifiers for q difficulty classes.

Classifier     Complexity    q      c         r_c
indeg(v)       O(1)          100    0.850     0.960
outdeg(u)      O(1)          100    0.637     0.789
|N'_2(v)|      O(m/n)        100    0.915     0.978
|N_2(u)|       O(m/n)        100    0.397     0.492
d(u,v)         O(m)          6      −0.957    −1.000
d_sp(u,v)      O(m)          100    −0.895    −0.876
d_usp(u,v)     O(m)          100    −0.924    −0.925
ACKNOWLEDGEMENTS
This research is part of the NWO COMPASS project
(#612.065.926). We thank A. Clemesha for the data.
REFERENCES
Agarwal, R., Veer Arya, K., and Shekhar, S. (2010). An
architectural framework for web information retrieval
based on user’s navigational pattern. In Proceedings
of the 5th International Conference on Industrial and
Information Systems, pages 195–200.
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak,
R., and Ives, Z. (2007). DBpedia: A nucleus for a
web of open data. In Proceedings of 6th International
Semantic Web Conference, pages 722–735.
Bizer, C., Heath, T., and Berners-Lee, T. (2009). Linked
data - the story so far. International Journal on Semantic
Web and Information Systems, 5(3):1–22.
Brandes, U. (2001). A faster algorithm for between-
ness centrality. Journal of Mathematical Sociology,
25(2):163–177.
He, B., Patel, M., Zhang, Z., and Chang, K. (2007). Ac-
cessing the deep web. Communications of the ACM,
50(5):94–101.
Hsieh-Yee, I. (2001). Research on web search behavior.
Library & Information Science, 23(2):167–185.
Hu, J., Wang, G., Lochovsky, F., Sun, J., and Chen,
Z. (2009). Understanding user’s query intent with
Wikipedia. In Proceedings of the 18th International
World Wide Web Conference, pages 471–480.
Kentsch, A. M., Kosters, W., van der Putten, P., and
Takes, F. (2011). Exploratory recommendations us-
ing Wikipedia’s linking structure. In Proceedings of
the 20th Belgian Netherlands Conference on Machine
Learning, pages 61–68.