Create a Specialized Search Engine - The Case of an RSS Search Engine

Robert Viseur

Abstract

Several approaches are possible for creating specialized search engines. For example, one can use the API of an existing commercial search engine, or build an engine from scratch with reusable components such as an open source indexer. The RSS format is used for disseminating information from websites, creating new applications (mashups), or collecting information for competitive or technical watch. In this paper, we focus on the case study of developing an RSS search engine. We identify issues and propose ways to address them.



Paper Citation


in Harvard Style

Viseur R. (2012). Create a Specialized Search Engine - The Case of an RSS Search Engine. In Proceedings of the International Conference on Data Technologies and Applications - Volume 1: DATA, ISBN 978-989-8565-18-1, pages 245-248. DOI: 10.5220/0004051302450248


in Bibtex Style

@conference{data12,
author={Robert Viseur},
title={Create a Specialized Search Engine - The Case of an RSS Search Engine},
booktitle={Proceedings of the International Conference on Data Technologies and Applications - Volume 1: DATA},
year={2012},
pages={245-248},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004051302450248},
isbn={978-989-8565-18-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Data Technologies and Applications - Volume 1: DATA
TI - Create a Specialized Search Engine - The Case of an RSS Search Engine
SN - 978-989-8565-18-1
AU - Viseur R.
PY - 2012
SP - 245
EP - 248
DO - 10.5220/0004051302450248