Authors:
Chetan Verma (1); Michael Hart (2); Sandeep Bhatkar (2); Aleatha Parker-Wood (2) and Sujit Dey (1)
Affiliations:
(1) University of California San Diego, United States; (2) Symantec Research Labs, United States
Keyword(s):
Information Retrieval, Machine Learning, Enterprise, File Systems.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Biomedical Engineering; Collaborative and Social Interaction; Data Engineering; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Health Information Systems; Human-Computer Interaction; Information Systems Analysis and Specification; Knowledge Management; Model Driven Architectures and Engineering; Ontologies and the Semantic Web; Sensor Networks; Signal Processing; Society, e-Business and e-Government; Soft Computing; Web Information Systems and Technologies
Abstract:
The data that knowledge workers need to conduct their work is stored across an increasing number of
repositories and grows annually at a significant rate. It is therefore unreasonable to expect knowledge
workers to efficiently search for and identify what they need across a myriad of locations where
hundreds of thousands of items can be created daily. This paper describes a system that observes user
activity and trains models to predict which items a user will access, in order to help knowledge workers discover
content. We specifically investigate network file systems and determine how well we can predict future access
to newly created or modified content. Using file metadata to construct access prediction models, we show
how the performance of these models can be improved for shares that demonstrate high collaboration among their
users. Experiments on eight enterprise shares reveal that models based on file metadata can achieve F scores
upwards of 99%. Furthermore, on average, collaboration-aware models can correctly predict nearly half of
new file accesses by users while ensuring a precision of 75%, thus validating that the proposed system can be
used to help knowledge workers discover new or modified content.
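As a reading aid for the reported metrics, the following minimal sketch shows how precision, recall, and an F score relate when predictions are expressed as (user, file) pairs. It is not the authors' evaluation code. The function names, the toy data, and the choice of the balanced F1 variant are assumptions made for illustration; the abstract does not state which F variant is used, and "nearly half" is rounded to 0.5 here purely for the arithmetic.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # hypothetical (user, file) access pair


def precision_recall(predicted: Set[Pair], accessed: Set[Pair]) -> Tuple[float, float]:
    """Precision and recall of predicted accesses against observed accesses."""
    true_positives = len(predicted & accessed)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(accessed) if accessed else 0.0
    return precision, recall


def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy data (illustrative only): 4 predicted accesses, 6 actual accesses, 3 in common.
predicted = {("alice", "a"), ("alice", "b"), ("bob", "a"), ("bob", "c")}
accessed = {("alice", "a"), ("alice", "b"), ("bob", "a"),
            ("alice", "c"), ("bob", "b"), ("bob", "d")}

p, r = precision_recall(predicted, accessed)
print(f"precision={p:.2f} recall={r:.2f} F1={f_score(p, r):.2f}")
```

Under these assumptions, a precision of 0.75 combined with a recall of 0.5 corresponds to an F1 of 0.6. That operating point is distinct from the F scores upwards of 99% reported for the metadata-based models.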