SEARCHING KEYWORD-LACKING FILES BASED ON LATENT INTERFILE RELATIONSHIPS

Tetsutaro Watanabe, Takashi Kobayashi, Haruo Yokota

2010

Abstract

Modern file systems hold so many files that finding a desired file has become a major problem. Desktop search tools based on full-text search techniques have been developed to address it. However, files that do not contain the given keywords, such as picture files and the source data of experiments, cannot be found by full-text search tools even when they are related to those keywords. Such files are even harder to find when they are located in directories different from those of the files that contain the keywords. In this paper, we propose a method for searching for files that lack the keywords but are associated with them. The proposed method derives relationship information from file access logs on the file server, based on the idea that files opened by a user within a particular time period are related. We have implemented the proposed method and evaluated its effectiveness experimentally. The evaluation results indicate that the proposed method can find keyword-lacking files and achieves higher precision and recall than full-text and directory-search methods.
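The core idea in the abstract, that files opened by the same user within a particular time period are related, can be illustrated with a minimal sketch. This is not the authors' implementation: the log record layout `(user, path, timestamp)`, the 300-second window, the use of raw co-access counts as relationship weights, and the names `co_access_counts` and `related_files` are all assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical access-log records: (user, path, open time in seconds).
LOG = [
    ("alice", "/docs/report.tex", 0),
    ("alice", "/data/exp1.csv", 60),
    ("alice", "/img/plot.png", 120),
    ("bob", "/docs/notes.txt", 130),
    ("alice", "/docs/todo.txt", 5000),
]

WINDOW = 300  # assumed window length in seconds, not taken from the paper


def co_access_counts(log, window=WINDOW):
    """Count, per unordered file pair, how often one user opened both
    files within `window` seconds of each other."""
    by_user = defaultdict(list)
    for user, path, ts in log:
        by_user[user].append((ts, path))

    counts = defaultdict(int)
    for events in by_user.values():
        events.sort()  # chronological order, so t2 >= t1 below
        for (t1, p1), (t2, p2) in combinations(events, 2):
            if p1 != p2 and t2 - t1 <= window:
                counts[frozenset((p1, p2))] += 1
    return counts


def related_files(keyword_hits, counts):
    """Rank files that are NOT keyword hits by their total co-access
    count with files that are keyword hits."""
    scores = defaultdict(int)
    for pair, n in counts.items():
        a, b = tuple(pair)
        for hit, other in ((a, b), (b, a)):
            if hit in keyword_hits and other not in keyword_hits:
                scores[other] += n
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda p: (-scores[p], p))


if __name__ == "__main__":
    counts = co_access_counts(LOG)
    # Suppose a full-text search for some keyword matched only report.tex.
    print(related_files({"/docs/report.tex"}, counts))
```

In this toy log, a full-text hit on `/docs/report.tex` surfaces the CSV and PNG files that the same user opened shortly afterwards, even though neither contains the keyword and both live in other directories. A realistic system would likely need a more careful weighting than raw counts.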



Paper Citation


in Harvard Style

Watanabe T., Kobayashi T. and Yokota H. (2010). SEARCHING KEYWORD-LACKING FILES BASED ON LATENT INTERFILE RELATIONSHIPS. In Proceedings of the 5th International Conference on Software and Data Technologies - Volume 1: ICSOFT, ISBN 978-989-8425-22-5, pages 236-244. DOI: 10.5220/0002931002360244


in Bibtex Style

@conference{icsoft10,
author={Tetsutaro Watanabe and Takashi Kobayashi and Haruo Yokota},
title={SEARCHING KEYWORD-LACKING FILES BASED ON LATENT INTERFILE RELATIONSHIPS},
booktitle={Proceedings of the 5th International Conference on Software and Data Technologies - Volume 1: ICSOFT},
year={2010},
pages={236-244},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002931002360244},
isbn={978-989-8425-22-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Software and Data Technologies - Volume 1: ICSOFT
TI - SEARCHING KEYWORD-LACKING FILES BASED ON LATENT INTERFILE RELATIONSHIPS
SN - 978-989-8425-22-5
AU - Watanabe T.
AU - Kobayashi T.
AU - Yokota H.
PY - 2010
SP - 236
EP - 244
DO - 10.5220/0002931002360244