PowerPoint document is then indexed with standard
methodologies to be available for a search engine.
Although nontrivial, this solution is only a first attempt to
solve this complex problem in a real case study. It has been
tested at the energy company Enel SpA, where users judged it
positively and the preliminary results are encouraging. The scenario
considered is a call center, where a technician has to
identify the right content in a few minutes during
telephone assistance. So far, we have interviewed
technicians who use the new service that implements
our solution, and their feedback has been positive. In
future work, we plan to validate this preliminary
feedback with a more robust analysis.
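As an illustration of the indexing step mentioned at the start of this section, the following minimal Python sketch builds a toy inverted index over slide texts and answers conjunctive keyword queries. It is not the system deployed in the case study: the slide contents, the function names (tokenize, build_index, search) and the simple tokenizer are all assumptions made for the example, and a production service would more likely rely on a search library such as Lucene.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(slides):
    """Map each token to the set of slide ids whose text contains it."""
    index = defaultdict(set)
    for slide_id, text in slides.items():
        for token in tokenize(text):
            index[token].add(slide_id)
    return index


def search(index, query):
    """Return the sorted ids of slides that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


# Hypothetical slide texts (title plus short description), not real data.
slides = {
    1: "Meter replacement: procedure for single-phase meters",
    2: "Meter reading errors and troubleshooting",
    3: "Billing: how to issue a corrected invoice",
}
index = build_index(slides)
print(search(index, "meter errors"))
```

In this sketch a query matches a slide only if all of its tokens occur in the slide text. Ranking, stemming and structure-aware weighting (for example, boosting titles) are deliberately left out.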
The work presented in this position paper could
open a debate on new methodologies for automatically
deriving a structure from documents made up of short
textual descriptions, titles, images, etc. The difficulty
is that the standard approaches presented in the
literature (based on text frequency analysis or on
LDA/LSA methods) are not effective for this specific
task. A discussion on how to improve the preliminary
attempt presented here, which relies on a specific
syntax to annotate this particular type of document,
could stimulate the definition of such methodologies.
ACKNOWLEDGEMENTS
This paper was written within the Enel SpA Project,
and we wish to thank all the people who worked
with us on the development of the software.
DATA 2015 - 4th International Conference on Data Management Technologies and Applications