• ELEXAI - EUROPEAN LEXICOGRAPHIC INFRASTRUCTURE FOR ARTIFICIAL INTELLIGENCE (ELEXAI)
The Client: (ELEXAI)
Project duration: 2026 - 2029
  • Description

The project proposes both a valuable enhancement of current Artificial Intelligence (AI) foundation models and a major upgrade of the European Lexicographic Infrastructure (ELEXIS), which resulted from a Horizon 2020 project of the same name (2018-2022). The upgrade represents a transformative reconstruction of the infrastructure, based on recent developments in AI, in particular those related to the emergence of Large Language Models (LLMs). Besides integrating national, regional, and institutional efforts in the field of lexicography, the new infrastructure will offer upgraded technology, language data, tools, and services that are crucial for improving transformer-based models with multilingual knowledge management originating from high-quality lexicographic resources. The advent of machine-readable knowledge representations (knowledge bases and graphs) that are linguistically sound and can be injected into LLMs enables a virtuous cycle: the relevance of the integrated linguistic knowledge is verified by better outputs from LLM applications, and those outputs can subsequently be used to improve the knowledge representations and language data. To verify the effect of incorporating linguistic knowledge in LLMs, the project foresees, as one of its results, the creation of reliable benchmarks and other means of evaluating machine-generated output. As such, the infrastructure design will contribute to the as yet unresolved tasks of Natural Language Understanding. The new virtual lexicographic infrastructure will be established by a broad and diverse consortium, including partners from all relevant fields: lexicography, computational linguistics, and AI. For long-term sustainability, the infrastructure will rely on several prominent infrastructural initiatives: CLARIN and DARIAH, two ESFRI Landmark infrastructures, and ALT-EDIC, the new pan-European initiative dedicated to the development of European open, massively multilingual language models.