does not require explicitly defining legal issues and
constructing queries. Second, by providing meaning-
ful, hierarchically structured labels by way of our la-
beling algorithm for legal issues, we show that users
can effectively identify interesting and useful topics.
Third, the system is highly scalable and flexible:
it has handled on the order of 100 million
associations across different document types.
Our studies indicate that users, especially legal
researchers, often prefer to drill down and focus on
the key issues common to a document set rather than
obtain a high-level overview of the collection. The
chief characteristics of this issue-based
recommendation system are its attention to
fine-grained legal issues, its robustness, the
resulting high-quality document clusters, which are
topically homogeneous yet heterogeneous in content
type, and its scalability. It represents a powerful
research tool for the legal community.
Our future work will focus on improving the existing
topic segmentation algorithm for documents that
contain little metadata. We have been experimenting
with topic modeling algorithms, such as Latent
Dirichlet Allocation (LDA) (Blei et al., 2003) and
Non-negative Matrix Factorization (NMF) (Lee and
Seung, 1999), in other related projects, and have
observed very promising results. Producing
human-quality labels remains a challenge: until now,
substantial manual review by human experts has been
required to ensure quality. We are pursuing this as
another direction for future research.
We thank John Duprey, Helen Hsu, Debanjan Ghosh and
Dave Seaman for their help in developing software for
this work. We are also grateful to Julie Gleason and
her team of legal experts for their detailed quality
assessments and invaluable feedback. Finally, we thank
Khalid Al-Kofahi, Bill Keenan and Peter Jackson for
their ongoing feedback and support.
