„Bilingual automatic term recognition (DVITAS)“, No. P-MIP-20-282
Project No. P-MIP-20-282
Project title: „Bilingual automatic term recognition“
Project duration: from 2019-05-01 to 2022-06-30
Project coordinator: Vytauto Didžiojo universitetas
Project manager in MRU: prof. dr. Sigita Rackevičienė.
Summary: The aim of the project is to develop a methodology for the automatic recognition of English and Lithuanian terms of a specialised field in parallel and comparable corpora, and to compile a publicly available online bilingual database of terms in that field, based on empirical data. The specific domain chosen for this study is cybersecurity terminology.
Research problem: The project addresses the automatic collection of terminographic material from bilingual data, i.e. from parallel and comparable corpora, when one of the languages is morphologically rich and has few linguistic resources. The project will develop an innovative methodology which, to our knowledge, has not yet been applied in Lithuania. In the course of its development, the plan is to explore the possibility of using state-of-the-art machine learning algorithms and neural networks for bilingual term recognition.
Cybersecurity (CS): The field of CS has been chosen because of its particular relevance in today’s information society. The field is particularly dynamic: new CS documents are constantly emerging and new concepts are being developed whose names are not yet established in Lithuanian and which are used in several variants, often in the original (English) language or as hybrids (combinations of English and Lithuanian lexical units). Therefore, a CS term database is currently in high demand among drafters and translators of legal and administrative acts, information technology professionals and the general public.
Result to be achieved: The project will create and make publicly available (in the CLARIN repository) bilingual (English-Lithuanian) cybersecurity corpora, both parallel and comparable. They will reflect the use of cybersecurity terms in different genres and types of texts in national and international settings. The terminographic material collected from the corpora will be published in an open database of English and Lithuanian cybersecurity terms. This database could serve as a model for the development of terminology databases in other fields, using state-of-the-art technologies to automate the collection of terminographic material.
The project is carried out under the Lithuanian Research Council (LRC) supported activity “Research Group Projects”.