ABBIE: Attention-Based BI-Encoders for Predicting Where to Split Compound Sanskrit Words

Irfan Ali; Liliana Lo Presti; Igor Spano; Marco La Cascia

Topics: Deep Learning; Industrial Applications of AI; Natural Language Processing; Neural Networks

In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, 334-344, 2025, Porto, Portugal

Authors: Irfan Ali ¹ ; Liliana Lo Presti ¹ ; Igor Spano ² and Marco La Cascia ¹

Affiliations: ¹ Department of Engineering, University of Palermo, Palermo, Italy; ² Department of Cultures and Society, University of Palermo, Palermo, Italy

Keyword(s): Word Segmentation, Sanskrit Language, Sandhi Rule, Bi-Encoders, Attention.

Abstract: Sanskrit is a highly composite language, morphologically and phonetically complex. One of the major challenges in processing Sanskrit is splitting compound words whose components have been merged phonetically. Recognizing the exact location of the split in a compound word is difficult because several splits are possible, but only a few of them are semantically meaningful. This paper proposes a novel deep learning method that uses two bi-encoders and a multi-head attention module to predict the valid split location in Sanskrit compound words. The two bi-encoders process the input sequence in direct and reverse order, respectively. The model learns the character-level context in which the split occurs by exploiting the correlation between the direct and reverse dynamics of the character sequence. The proposed model is compared with a state-of-the-art technique that uses a bidirectional recurrent network to solve the same task. Experimental results show that the proposed model correctly identifies where the compound word should be split into its components in 89.27% of cases, outperforming the state-of-the-art technique. The paper also introduces a dataset built from the repository of the Digital Corpus of Sanskrit (DCS) and the University of Hyderabad (UoH) corpus.
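As a rough illustration of two ideas in the abstract, the sketch below builds the direct and reversed character views of a compound and computes the share of cases in which a predicted split location matches the reference. It is not the authors' implementation: the function names, the use of a single integer index as the split location, the exact-match scoring rule, and the ASCII-simplified example word are all assumptions made here for illustration.

```python
from typing import List, Sequence, Tuple


def character_views(compound: str) -> Tuple[List[str], List[str]]:
    """Return the compound's characters in direct and in reverse order.

    Hypothetical helper: the abstract says one encoder reads the sequence
    in direct order and the other in reverse order; how the characters are
    actually tokenized is not stated there.
    """
    direct = list(compound)
    return direct, direct[::-1]


def split_location_accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of compounds whose predicted split index equals the gold index.

    Assumption: a prediction counts as correct only on an exact index match.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one example is required")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # ASCII-simplified transliteration, used only as a toy input.
    direct, reverse = character_views("vidyalaya")
    print(direct)
    print(reverse)
    # Made-up predictions and gold indices for three compounds.
    print(split_location_accuracy([4, 2, 7], [4, 3, 7]))
```

Under these assumptions, a reported figure such as 89.27% would correspond to `split_location_accuracy` returning roughly 0.8927 over the test set.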

CC BY-NC-ND 4.0

Paper citation in several formats:

Ali, I., Lo Presti, L., Spano, I. and La Cascia, M. (2025). ABBIE: Attention-Based BI-Encoders for Predicting Where to Split Compound Sanskrit Words. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART; ISBN 978-989-758-737-5; ISSN 2184-433X, SciTePress, pages 334-344. DOI: 10.5220/0013155300003890

@conference{icaart25,
author={Irfan Ali and Liliana {Lo Presti} and Igor Spano and Marco {La Cascia}},
title={ABBIE: Attention-Based BI-Encoders for Predicting Where to Split Compound Sanskrit Words},
booktitle={Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2025},
pages={334-344},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013155300003890},
isbn={978-989-758-737-5},
issn={2184-433X},
}

TY - CONF
JO - Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - ABBIE: Attention-Based BI-Encoders for Predicting Where to Split Compound Sanskrit Words
SN - 978-989-758-737-5
IS - 2184-433X
AU - Ali, I.
AU - Lo Presti, L.
AU - Spano, I.
AU - La Cascia, M.
PY - 2025
SP - 334
EP - 344
DO - 10.5220/0013155300003890
PB - SciTePress