Author: Ebru Celikel

Affiliation: Ege University International Computer, Turkey

Keyword(s): Language identification, Statistical modelling, PPM

Related Ontology Subjects/Areas/Topics: Artificial Intelligence and Decision Support Systems ; Enterprise Information Systems ; Natural Language Interfaces to Intelligent Systems

Abstract: The problem of language discrimination arises when many texts in different source languages are at hand but it is not known which language each belongs to. This is often the case during information retrieval over the Internet. We propose a cryptographic solution to the language identification problem: employing the Prediction by Partial Matching (PPM) model, we generate a language model and then use this model to discriminate between languages. PPM is a cryptographic tool based on an adaptive statistical model. It achieves compression rates (measured in bits per character, bpc) far better than those of many other conventional lossless compression tools. Language identification results obtained on sample texts from corpora in five languages (English, French, Turkish, German and Spanish) are given. The success rates show that the performance of the system depends heavily on diversity, as well as on the sizes of the target text and training text files. The results also indicate that the PPM model is highly sensitive to the input language. From a cryptographic standpoint, if the training text itself is kept secret, our language identification system would provide a promising degree of security.
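
The idea in the abstract, training one statistical model per language and assigning a text to the language whose model encodes it in the fewest bits per character, can be illustrated with a minimal sketch. This is not the paper's implementation: it replaces PPM with a much simpler fixed-order character model with add-one smoothing, and the function names (`train`, `bits_per_char`, `identify`), the order of 2, the alphabet size of 256, and the toy training strings are all assumptions made for illustration.

```python
import math
from collections import Counter


def train(text, order=2):
    """Count (context, next char) pairs for a fixed-order character model.

    Simplified stand-in for PPM: one fixed context length, no escape mechanism.
    """
    pair_counts = Counter()
    context_counts = Counter()
    padded = " " * order + text  # pad so the first characters have a context
    for i in range(order, len(padded)):
        ctx = padded[i - order:i]
        pair_counts[(ctx, padded[i])] += 1
        context_counts[ctx] += 1
    return pair_counts, context_counts


def bits_per_char(model, text, order=2, alphabet_size=256):
    """Estimate the cost in bits per character (bpc) of coding `text` with `model`."""
    pair_counts, context_counts = model
    padded = " " * order + text
    total_bits = 0.0
    for i in range(order, len(padded)):
        ctx = padded[i - order:i]
        # Add-one smoothing keeps unseen characters at a nonzero probability.
        p = (pair_counts[(ctx, padded[i])] + 1) / (context_counts[ctx] + alphabet_size)
        total_bits -= math.log2(p)
    return total_bits / max(len(text), 1)


def identify(models, text):
    """Return the language whose model gives the lowest bpc for `text`."""
    return min(models, key=lambda lang: bits_per_char(models[lang], text))


# Toy training texts (hypothetical; real experiments would use large corpora).
models = {
    "english": train("the quick brown fox jumps over the lazy dog and the cat sat on the mat"),
    "spanish": train("el rapido zorro marron salta sobre el perro perezoso y el gato se sento"),
}
print(identify(models, "the cat and the dog"))
```

With training texts this small the bpc values are far higher than those of a real PPM coder; only their ordering across languages matters for the decision, which is consistent with the abstract's remark that performance depends on training and target text sizes.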

CC BY-NC-ND 4.0

Paper citation in several formats:
Celikel, E. (2005). A CRYPTOGRAPHIC APPROACH TO LANGUAGE IDENTIFICATION: PPM. In Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 2: ICEIS; ISBN 972-8865-19-8; ISSN 2184-4992, SciTePress, pages 213-219. DOI: 10.5220/0002556102130219

@conference{iceis05,
author={Ebru Celikel},
title={A CRYPTOGRAPHIC APPROACH TO LANGUAGE IDENTIFICATION: PPM},
booktitle={Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2005},
pages={213-219},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002556102130219},
isbn={972-8865-19-8},
issn={2184-4992},
}

TY - CONF

JO - Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - A CRYPTOGRAPHIC APPROACH TO LANGUAGE IDENTIFICATION: PPM
SN - 972-8865-19-8
IS - 2184-4992
AU - Celikel, E.
PY - 2005
SP - 213
EP - 219
DO - 10.5220/0002556102130219
PB - SciTePress