Figure 2 shows two excerpts as examples of the
computational output. Both arbitrarily selected
previews contain identified and highlighted words.
The left preview (Sulak Sivaraksa) highlights in
intense green words related to three topics:
religion ("Buddhism"), social ("University",
"Social", "Interactual") and press ("Magazine",
"Journal", "Press"). The right preview (Social
Development Theory) highlights in intense green
words related to the topics technology
("Technology", "Knowledge", "Development",
"Effective") and social ("Social", "Organization").
In conclusion, words from different topics have been
identified that either match the meaning of the
given query input very well (social, technology) or
are generally significant for the specific document
(e.g. press and religion).
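The graded highlighting described above can be illustrated
with a minimal sketch. It assumes that each word already
carries a similarity score between 0 and 1 with respect to
the query; the function names green_for and highlight, the
threshold value and the example scores are hypothetical and
are not taken from the prototype.

```python
from html import escape


def green_for(score: float) -> str:
    """Map a similarity score in [0, 1] to a CSS background colour.

    A score of 0.0 gives white (no visible highlight); a score of
    1.0 gives a fully saturated green.
    """
    s = min(max(score, 0.0), 1.0)      # clamp out-of-range scores
    fade = round(255 * (1.0 - s))      # red and blue fade as s grows
    return f"rgb({fade}, 255, {fade})"


def highlight(words, scores, threshold=0.2):
    """Wrap every word whose score reaches the threshold in a <span>.

    `scores` maps lower-cased words to similarity scores; the
    threshold of 0.2 is an arbitrary illustrative value.
    """
    parts = []
    for word in words:
        score = scores.get(word.lower(), 0.0)
        if score >= threshold:
            style = f"background-color: {green_for(score)}"
            parts.append(f'<span style="{style}">{escape(word)}</span>')
        else:
            parts.append(escape(word))
    return " ".join(parts)


# Hypothetical scores for a few words of a preview
example_scores = {"buddhism": 1.0, "journal": 0.6}
print(highlight(["Buddhism", "and", "Journal"], example_scores))
```

Mapping the score to colour intensity rather than to a
binary mark keeps weakly related words visible without
letting them compete with the most relevant ones.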
Enhancing a semantic search engine with a word
similarity algorithm can thus be used to identify
semantically relevant sections and text passages
within a document. Even though the current
implementation is still at the prototype stage,
its benefits are already clearly visible,
especially with respect to the implemented visual
assignment.
5 OUTLOOK
As one of the next major milestones, an evaluation of
the approach in a field study is planned to validate
the concept at the operational level. This field study
will be conducted with potential key users from a
product development department in the aviation
sector. By choosing engineers, designers and
physicists for aerodynamics working in the area of
Aircraft Design as target users, and their document
repositories as a test bed, the authors want to
ensure the practicability and usability of the
proposed solution.
On a technical level, several improvements are
foreseen as well. In addition to LDA, other topic
models will be tested and compared with each other in
order to obtain an overview of how well the proposed
approach works with them. For instance, the authors
expect even better results from using the HMM-LDA
approach (Griffiths et al., 2005) instead of LDA,
because HMM-LDA not only considers the unstructured
"bag-of-words" input but also reflects the sequence
of words within the respective document. By using
HMM-LDA, the authors expect to arrive at a solution
in which the intensely highlighted words are even
more concentrated on a specific text passage. In
addition to these examinations, further optimizations
and refinements of the core algorithms for computing
the word similarity are foreseen in order to
continuously improve the output quality.
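The difference between a "bag-of-words" input and a
sequence-aware input can be made concrete with a small
sketch. It does not implement LDA or HMM-LDA; it only shows
which information a bag-of-words representation discards.
The two example sentences and the helper names bag_of_words
and bigrams are assumptions chosen for illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Count word occurrences; the order of the words is discarded."""
    return Counter(text.lower().split())


def bigrams(text):
    """Return adjacent word pairs, which preserve local word order."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


# Two hypothetical sentences built from exactly the same words
a = "the wing design improves the lift"
b = "the lift improves the wing design"

same_bag = bag_of_words(a) == bag_of_words(b)    # identical word counts
same_order = bigrams(a) == bigrams(b)            # different word sequence

print(same_bag, same_order)
```

Both sentences are indistinguishable to a model that only
sees word counts, whereas a model that also observes the
word sequence can tell them apart.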
ACKNOWLEDGEMENTS
This work has been done under the LuFo IV 2nd call
-HIGHER-TE- WP4200 research programme funded
by Airbus S.A.S. We wish to express our
gratitude and appreciation to all the project partners
for their contributions to the development of the
various ideas and concepts presented in this paper.
REFERENCES
Bahrs, J. et al., 2007. Wissensmanagement in der Praxis -
Ergebnisse einer empirischen Untersuchung:
Empirische Studien in der Wirtschaftsinformatik 1.
ed., Gito.
Blei, D.M., Ng, A.Y. & Jordan, M.I., 2003. Latent
Dirichlet allocation. The Journal of Machine Learning
Research, 3, 993-1022.
Deerwester, S. et al., 1990. Indexing by latent semantic
analysis. Journal of the American Society for
Information Science, 41(6), 391-407.
Dredze, M. et al., 2008. Generating summary keywords
for emails using topics. In Proceedings of the 13th
international conference on Intelligent user interfaces.
Gran Canaria, Spain, pp. 199-206.
Gong, Y. & Liu, X., 2001. Generic text summarization
using relevance measure and latent semantic analysis.
In Proceedings of the 24th annual international ACM
SIGIR conference on Research and development in
information retrieval. New Orleans, United States, pp.
19-25.
Griffiths, T.L. et al., 2005. Integrating topics and syntax.
Advances in Neural Information Processing Systems,
17, 537-544.
Hofmann, T., 1999. Probabilistic latent semantic
analysis. Proceedings of Uncertainty in Artificial
Intelligence, 289-296.
Kleiza, K. et al., 2010. Integrated Semantic Search in the
Product Development Phase. To be published in
Proceedings of the 16th International Conference on
Concurrent Enterprising.
Liu, S. et al., 2009. Interactive, topic-based visual text
summarization and analysis. In Proceedings of the 18th
ACM conference on Information and knowledge
management. Hong Kong, China, pp. 543-552.
Salton, G. & Buckley, C., 1988. Term-weighting
approaches in automatic text retrieval. Information
Processing & Management, 24(5), 513-523.
SEMANTIC IDENTIFICATION AND VISUALIZATION OF SIGNIFICANT WORDS WITHIN DOCUMENTS -
Approach to Visualize Relevant Words within Documents to a Search Query by Word Similarity Computation