interactions, specifically, by offering a novel
solution to the problem of optimally ordering a set of
interrelated prediction tasks.
The partial-order preference component of our
formulation of the task ordering problem relates our
work to Cohen et al. (1999), who used a similar pair-wise
preference formulation for the problem of ranking
Web pages when user feedback is available in the
form of pair-wise preferences. However, their main
focus was on finding a good linear combination of
preference functions. The problem of finding the
optimal total order was addressed using a simple
greedy algorithm, which provides an approximate
solution within a factor of two of the optimal order.
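For concreteness, the greedy ordering procedure can be sketched as follows. This is a minimal illustration of our reading of the algorithm of Cohen et al. (1999), not their implementation: the function name `greedy_order`, the matrix layout (`pref[u][v]` is the strength of the preference for placing u before v), and the tie-breaking rule are assumptions made for this example.

```python
def greedy_order(pref):
    """Return item indices ordered from most to least preferred.

    pref[u][v] is assumed to lie in [0, 1] and to express how strongly
    u should be placed before v (hypothetical layout for this sketch).
    """
    n = len(pref)
    remaining = set(range(n))
    # Potential of v: preference mass for "v first" minus mass against it.
    potential = {
        v: sum(pref[v][u] for u in remaining) - sum(pref[u][v] for u in remaining)
        for v in remaining
    }
    order = []
    while remaining:
        # Take the item with the largest potential (lowest index on ties).
        best = max(sorted(remaining), key=lambda v: potential[v])
        order.append(best)
        remaining.remove(best)
        # Discount the removed item's contribution to the other potentials.
        for v in remaining:
            potential[v] += pref[best][v] - pref[v][best]
    return order


# Toy preference matrix over three items (illustrative values only).
pref = [
    [0.0, 0.9, 0.8],
    [0.1, 0.0, 0.7],
    [0.2, 0.3, 0.0],
]
print(greedy_order(pref))  # [0, 1, 2]
```

Each step looks only at the direct preferences of the items that remain, so the procedure does not propagate preferences through intermediate items.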
Notably, our solution goes beyond other
approximation algorithms for LOP: the link-analysis
based approach for heuristically optimizing the task
order lets us leverage the transitivity of partial-order
preferences, which other approximation algorithms
for LOP do not exploit. By modelling such
transitive relations explicitly, we hope to make our
method more robust to noisy or insufficient data,
since transitivity allows all pair-wise preferences to
be estimated more reliably. Combining the strengths
of our current method with those of other
approximation algorithms for LOP is an interesting
topic for future research.
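The role of transitivity can be illustrated with a small PageRank-style power iteration over a preference graph. The sketch below is only an illustration under stated assumptions, not the exact construction used in our experiments: the edge direction (task v endorses task u with weight `pref[u][v]` when u is preferred before v), the damping factor of 0.85, the fixed iteration count, and the name `pagerank_order` are all choices made for this example.

```python
def pagerank_order(pref, damping=0.85, iterations=100):
    """Order tasks by a PageRank-style score on the preference graph.

    pref[u][v] is the (assumed) strength of the preference for running
    task u before task v. Task v endorses task u with that weight, so a
    high score suggests that a task should come early in the sequence.
    """
    n = len(pref)
    # Total endorsement weight leaving each task v.
    out_weight = [sum(pref[u][v] for u in range(n) if u != v) for v in range(n)]
    score = [1.0 / n] * n
    for _ in range(iterations):
        # Tasks that endorse nobody spread their score uniformly.
        dangling = sum(score[v] for v in range(n) if out_weight[v] == 0)
        new_score = []
        for u in range(n):
            incoming = sum(
                score[v] * pref[u][v] / out_weight[v]
                for v in range(n)
                if v != u and out_weight[v] > 0
            )
            new_score.append((1 - damping) / n + damping * (incoming + dangling / n))
        score = new_score
    order = sorted(range(n), key=lambda u: -score[u])
    return order, score


# Only "0 before 1" and "1 before 2" are observed; the pair (0, 2) is not.
chain = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
order, score = pagerank_order(chain)
print(order)
```

In this toy chain no direct preference between tasks 0 and 2 is given, yet task 0 receives the highest score because the endorsement of task 1 by task 2 flows on to task 0. This is the kind of transitive evidence that a purely pair-wise greedy procedure does not use.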
In our proposed approach, the pair-wise task
order preferences are empirically estimated by
directly observing the performance of the classifiers
with respect to different task orders. This has the
benefit of allowing direct optimization of arbitrary
performance metrics, for instance, domain-specific
utility metrics that assign different costs to each
prediction task. It would be interesting to experiment
with different choices of the metric used to populate
the partial order matrix and the metric used to
evaluate the system, and to assess the effect of
matching vs. mismatching the two choices.
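One simple way to populate such a matrix from observed performance is sketched below. It is a hypothetical procedure written for illustration, not the estimation protocol used in our experiments: the callback `evaluate` stands in for training and scoring the classifiers under a given task order with whatever metric is chosen (for example macro F1 or a cost-weighted utility), and the random sampling of orders, the sample size, and the averaging rule are assumptions.

```python
import random


def estimate_preferences(n_tasks, evaluate, n_samples=200, seed=0):
    """Estimate pref[i][j] as the mean score of sampled orders with i before j.

    `evaluate(order)` is a hypothetical callback that returns the chosen
    performance metric for the system run with the given task order.
    """
    rng = random.Random(seed)
    total = [[0.0] * n_tasks for _ in range(n_tasks)]
    count = [[0] * n_tasks for _ in range(n_tasks)]
    for _ in range(n_samples):
        order = list(range(n_tasks))
        rng.shuffle(order)
        score = evaluate(order)
        # Credit the score to every pair (i, j) where i precedes j.
        for pos, i in enumerate(order):
            for j in order[pos + 1:]:
                total[i][j] += score
                count[i][j] += 1
    # Pairs that were never observed in that direction default to 0.0.
    return [
        [total[i][j] / count[i][j] if count[i][j] else 0.0 for j in range(n_tasks)]
        for i in range(n_tasks)
    ]


def toy_metric(order):
    # Stand-in metric: the system only does well if task 0 precedes task 1.
    return 1.0 if order.index(0) < order.index(1) else 0.0


pref_matrix = estimate_preferences(3, toy_metric)
```

Because the metric enters only through `evaluate`, the metric used to fill the matrix can be swapped independently of the metric used for the final evaluation, which is what the matching vs. mismatching comparison would require.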
7 CONCLUSIONS
This paper examines the task ordering problem for
prediction systems in a multi-step process. We
propose a formulation of the problem in terms of
pair-wise preferences of task orders that are learned
in a supervised fashion and represented using
directed graphs. Such a formulation naturally lends
itself to the application of link analysis approaches
like HITS and PageRank, which provide reasonable
heuristics for optimizing the overall utility of a
sequence of prediction tasks and, more importantly,
enable efficient computation of an optimal sequence for
applications with a large number of tasks.
Experiments on a real collection of structured
documents provide promising empirical evidence for
the effectiveness of the proposed methods: the
macro- and micro-averaged F1 scores of the
classifiers improved by 27% and 13%, respectively,
over random ordering, and were statistically
indistinguishable from those obtained with an
expert-suggested ordering of the tasks.
REFERENCES
Brin, S. and Page, L. The anatomy of a large-scale
hypertextual Web search engine. Proceedings of the
7th World-Wide Web Conference (1998)
Caruana, R. and Niculescu-Mizil, A. An empirical
comparison of supervised learning algorithms.
Proceedings of the 23rd international conference on
Machine learning (2006)
Charon, I. and Hudry, O. A branch-and-bound algorithm
to solve the linear ordering problem for weighted
tournaments. Discrete Applied Mathematics (2006)
Cohen, W.W. and Schapire, R.E. and Singer, Y. Learning
to order things. Journal of Artificial Intelligence
Research (1999)
Ingwersen, P. and Jarvelin, K. Information retrieval in
context: IRiX. ACM SIGIR Forum (2005)
Joachims, T. Text categorization with support vector
machines: Learning with many relevant features.
Proceedings of ECML-98, 10th European Conference
on Machine Learning (1998)
Kleinberg, J. Authoritative sources in a hyperlinked
environment. ACM-SIAM Symposium on Discrete
Algorithms (1998)
Lewis, D. Evaluating text categorization. Proceedings of
Speech and Natural Language Processing Workshop
(1991)
Ozmutlu, S. and Ozmutlu, H.C. and Spink, A.
Multitasking Web searching and implications for
design. Proceedings of the American Society for
Information Science and Technology (2003)
Reinelt, G. The Linear Ordering Problem: Algorithms and
Applications. Research and Exposition in
Mathematics (1985)
Rijsbergen, C.J. Information Retrieval. Butterworths
(1979)
Yang, Y. and Lad, A. and Lao, N. and Harpale, A. and
Kisiel, B. and Rogati, M. Utility-based information
distillation over temporally sequenced documents.
Proceedings of the 30th annual international ACM
SIGIR conference on Research and development in
information retrieval (2007)
GRAPH STRUCTURE LEARNING FOR TASK ORDERING