6 CONCLUSIONS
In this paper, we discussed the problem of document
analysis from a social point of view. We followed
the guidelines of Zacklad and other researchers,
who argue that the study of authority is one of the main
relevance criteria (Zacklad, 2007; Rieh, 2002).
We also relied on the notion of document usage
type introduced by Aussenac-Gilles and
Condamines (2004). Using these notions, we
proposed an approach to extract the logical structure
of documents. The first step in this approach is a
social study that identifies the actors
involved in the documentarisation process, the
document usages and, finally, the types of fragments and
links to be identified for each document usage. This
leads, in a second step, to a document model that
groups all the required elements. The third step
consists in structuring the documents according to
this document model. In this step, we distinguished
two levels of logical structure: the macro-logical
level and the micro-logical level. This distinction
allowed us to develop reusable and interoperable
components that can be used across different document
types and styles.
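As a rough illustration of how the outcome of the social study could feed
a document model, the following minimal Python sketch may help. All class,
function and field names, as well as the example usage, are hypothetical
and chosen for illustration only; they do not reproduce the components
developed in our projects.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    """The two levels of logical structure distinguished in step three."""
    MACRO = "macro-logical"
    MICRO = "micro-logical"


@dataclass(frozen=True)
class FragmentType:
    """A type of fragment to be identified (hypothetical representation)."""
    name: str
    level: Level


@dataclass
class DocumentUsage:
    """One document usage found by the social study (step one)."""
    name: str
    actors: list[str]
    fragment_types: list[FragmentType] = field(default_factory=list)
    link_types: list[str] = field(default_factory=list)


@dataclass
class DocumentModel:
    """Groups all elements required by the identified usages (step two)."""
    usages: list[DocumentUsage] = field(default_factory=list)

    def fragment_types(self, level: Level) -> list[FragmentType]:
        """Return the distinct fragment types of one level, in first-seen order."""
        seen: list[FragmentType] = []
        for usage in self.usages:
            for fragment_type in usage.fragment_types:
                if fragment_type.level is level and fragment_type not in seen:
                    seen.append(fragment_type)
        return seen


def build_model(social_study: list[DocumentUsage]) -> DocumentModel:
    """Assumed step two: derive the document model from the social study."""
    return DocumentModel(usages=list(social_study))


# Hypothetical example: a single usage with invented fragment and link types.
usage = DocumentUsage(
    name="bibliographic_search",
    actors=["reader"],
    fragment_types=[
        FragmentType("section", Level.MACRO),
        FragmentType("reference", Level.MICRO),
        FragmentType("author_name", Level.MICRO),
    ],
    link_types=["cites"],
)
model = build_model([usage])
micro_names = [ft.name for ft in model.fragment_types(Level.MICRO)]
macro_names = [ft.name for ft in model.fragment_types(Level.MACRO)]
```

Keeping the level on each fragment type is one possible way to let
macro-logical and micro-logical components be selected and reused
independently of the document type.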
To illustrate our approach, we presented two
projects we are currently working on. We used
documents from different domains, of different types
(papers and books) and in different languages (Arabic
and French).
As future research, we plan to study knowledge
extraction and mapping, which includes conceptual
indexing and knowledge visualisation. Our goal is to
structure a document collection as a map linking
themes, fragments, concepts and actors. We envisage
an intelligent map that guides the user's navigation
according to the practices represented in the user's profile.
This navigation would also be guided by the authority
evaluation process.
REFERENCES
Zacklad, M., 2007. Processus de documentarisation dans
les Documents pour l’Action (DopA). Babel - edit -,
Le numérique: impact sur le cycle de vie du document,
ENSSIB.
Naumann, F. & Rolker, C., 2000. Assessment Methods for
Information Quality Criteria, In International
Conference on Information Quality (IQ),
Cambridge, MA.
Knight, S. & Burn, J., 2005. Developing a Framework for
Assessing Information Quality on the World Wide
Web, Informing Science Journal, vol. 8, pp. 59-73.
Rieh, S. Y., 2002. Judgment of Information Quality and
Cognitive Authority in the Web, Journal of the
American Society for Information Science and
Technology, vol. 53, no. 2, pp. 145-161.
Shaalan, K. & Raza, H., 2007. Person Name Entity
Recognition for Arabic. In ACL’07, Workshop on
Computational Approaches to Semitic Languages,
Prague, Czech Republic, pp. 17-24.
Viola, P. & Narasimhan, M., 2005. Learning to Extract
Information from Semi-structured Text using a
Discriminative Context Free Grammar. In 28th annual
international ACM SIGIR conference on Research and
development in information retrieval, Salvador, Bahia,
Brazil, pp. 330-337.
Rangoni, Y. & Belaïd, A., 2006. Document Logical
Structure Analysis Based on Perceptive Cycles. 7th
IAPR Workshop on Document Analysis Systems - DAS
2006, Springer Verlag (Ed.), pp. 117-128.
Dou, H., Hassanaly, P., Quoniam, L. & La Tela, A., 1990.
Technological watch and information: on bibliometric
analysis in information services, Documentaliste, vol.
27, no. 3, pp. 132-141.
Connan, J. & Omlin, C. W., 2000. Bibliography
Extraction with Hidden Markov Models, technical
report US-CS-TR-00-6, computer science department,
University of Stellenbosch.
Aussenac-Gilles, N. & Condamines, A., 2004. Documents
électroniques et constitution de ressources
terminologiques ou ontologiques,
Information-Interaction-Intelligence, vol. 4, no. 1, pp. 75-94.
Wenger, E., 1998. Communities of Practice: Learning,
Meaning and Identity, Cambridge University Press.
Zitouni, I., Sorensen, J., Luo, X. & Florian, R., 2005. The
Impact of Morphological Stemming on Arabic
Mention Detection and Coreference Resolution, In
ACL’05, Workshop on Computational Approaches to
Semitic Languages, 43rd Annual Meeting of the
Association for Computational Linguistics, Ann Arbor,
Michigan, USA, pp. 63-70.
Abuleil, S., 2004. Extracting Names from Arabic Text for
Question-Answering Systems, In RIAO’2004,
Coupling approaches, coupling media and coupling
languages for information retrieval, Avignon, France,
pp. 638-647.
KMIS 2009 - International Conference on Knowledge Management and Information Sharing