Building Roger: Technical Challenges While Developing a Bilingual Corpus Management and Query Platform

Cosmin Strilețchi, Mădălina Chitez, Karla Csürös

2022

Abstract

This paper presents an approach to a bilingual corpus query system. ROGER has been designed and implemented as a cross-platform distributed web application. The backend interface, available to authenticated administrators, provides digital tools for managing the texts stored in the database and their associated metadata, and also offers an extensive statistics mechanism that covers data composition and usage (words, characters, languages, study levels, genres, domains and n-grams). The frontend capabilities are offered to registered users, allowing them to search for specific keywords and to refine the results by applying a series of filters. Current platform features include searching for terms and phrases, n-gram distributions and statistical visualizations for performed queries. After entering a search term or phrase, the user may filter the available texts by: (i) language (English, Romanian); (ii) student genre (currently 20 genres); (iii) study year (1 through 4); (iv) level (BA, MA or PhD); (v) discipline (currently 8 disciplines) and (vi) gender (male, female or unknown). A series of solutions has been implemented to improve the response times of the computationally intensive procedures that manipulate large amounts of data.
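
The filtering and n-gram features described in the abstract can be illustrated with a minimal sketch. It is not ROGER's implementation: the `Text` record, the `filter_texts` and `ngram_counts` helpers, the whitespace tokenization and the sample genre and discipline values are all assumptions made for illustration only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    """Hypothetical record: one corpus text plus the metadata fields named in the abstract."""
    content: str
    language: str
    genre: str
    study_year: int
    level: str
    discipline: str
    gender: str


def filter_texts(texts, **criteria):
    """Keep only the texts whose metadata equals every given criterion."""
    return [
        t for t in texts
        if all(getattr(t, field) == value for field, value in criteria.items())
    ]


def ngram_counts(texts, n):
    """Count n-grams over lowercased, whitespace-split tokens (an assumed tokenization)."""
    counts = Counter()
    for t in texts:
        tokens = t.content.lower().split()
        # Zipping n shifted copies of the token list yields consecutive n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Made-up sample data; genre and discipline names are placeholders.
corpus = [
    Text("the corpus is bilingual", "English", "essay", 1, "BA", "Linguistics", "female"),
    Text("the corpus is large", "English", "report", 2, "MA", "Computer Science", "male"),
    Text("corpusul este bilingv", "Romanian", "essay", 1, "BA", "Linguistics", "unknown"),
]

english = filter_texts(corpus, language="English")
bigrams = ngram_counts(english, 2)
```

Several filters can be combined in one call, for example `filter_texts(corpus, language="Romanian", level="BA")`. A production system would more plausibly push such filters into database queries and cache n-gram statistics, which is the kind of response-time concern the abstract mentions.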

Paper Citation


in Harvard Style

Strilețchi C., Chitez M. and Csürös K. (2022). Building Roger: Technical Challenges While Developing a Bilingual Corpus Management and Query Platform. In Proceedings of the 17th International Conference on Software Technologies - Volume 1: ICSOFT, ISBN 978-989-758-588-3, pages 390-398. DOI: 10.5220/0011144500003266


in Bibtex Style

@conference{icsoft22,
author={Cosmin Strilețchi and Mădălina Chitez and Karla Csürös},
title={Building Roger: Technical Challenges While Developing a Bilingual Corpus Management and Query Platform},
booktitle={Proceedings of the 17th International Conference on Software Technologies - Volume 1: ICSOFT},
year={2022},
pages={390-398},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011144500003266},
isbn={978-989-758-588-3},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 17th International Conference on Software Technologies - Volume 1: ICSOFT
TI - Building Roger: Technical Challenges While Developing a Bilingual Corpus Management and Query Platform
SN - 978-989-758-588-3
AU - Strilețchi C.
AU - Chitez M.
AU - Csürös K.
PY - 2022
SP - 390
EP - 398
DO - 10.5220/0011144500003266