Authors:
Cosmin Strilețchi 1; Mădălina Chitez 2 and Karla Csürös 2
Affiliations:
1 Communications Department, Technical University of Cluj-Napoca, Cluj-Napoca, Romania; 2 Department of Modern Languages and Literatures, West University of Timisoara, Timisoara, Romania
Keyword(s):
Corpus Query Platform, Academic Writing, Bilingual Corpus, Big Data, ROGER.
Abstract:
This paper presents an approach to a bilingual corpus query system. ROGER has been designed and implemented as a cross-platform distributed web application. The backend interface, available to authenticated administrators, provides the digital tools for managing the texts stored in the database and their associated metadata. It also offers an extensive statistics mechanism that covers data composition and usage (words, characters, languages, study levels, genres, domains and n-grams). The frontend is offered to registered users, who can search for specific keywords and refine the results by applying a series of filters. Current platform features include searching for terms and phrases, n-gram distributions and statistical visualizations of the performed queries. After entering a search term or phrase, the user may filter the available texts by: (i) language (English, Romanian); (ii) student genre (currently 20 genres); (iii) study year (1 through 4); (iv) level (BA, MA or PhD); (v) discipline (currently 8 disciplines) and (vi) gender (male, female or unknown). A series of solutions has been implemented to improve the response times of the computationally intensive procedures that manipulate large amounts of data.
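The search-then-filter workflow described above can be illustrated with a minimal Python sketch. It is not ROGER's implementation: the `Text` record, the `search` and `ngram_counts` functions, and the whitespace tokenization are all assumptions made for illustration, with only the six filter dimensions taken from the abstract.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Text:
    """Hypothetical record for one corpus text and its metadata."""
    content: str
    language: str    # "English" or "Romanian"
    genre: str       # one of the student genres
    study_year: int  # 1 through 4
    level: str       # "BA", "MA" or "PhD"
    discipline: str
    gender: str      # "male", "female" or "unknown"


def search(texts, term, **filters):
    """Return texts containing `term` (case-insensitive) that match every filter.

    Each keyword argument names a Text field and the value it must equal.
    """
    term = term.lower()
    return [
        t for t in texts
        if term in t.content.lower()
        and all(getattr(t, field) == value for field, value in filters.items())
    ]


def ngram_counts(texts, n):
    """Count word n-grams over the given texts (naive whitespace tokens)."""
    counts = Counter()
    for t in texts:
        tokens = t.content.lower().split()
        # Zipping n shifted copies of the token list yields consecutive n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts
```

A call such as `search(corpus, "method", language="English", level="BA")` narrows the hits to English BA texts, and `ngram_counts(hits, 2)` then gives the bigram distribution over those hits. A production system handling large amounts of data would more plausibly rely on database indexes or precomputed statistics than on this linear scan.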