Tsakonian Digital is a project aimed at supporting the preservation and revitalization of the Tsakonian language — the last surviving descendant of Ancient Doric Greek (see 1.1 General description) — by providing digital resources and tools to learn and use the language. The project is led by Jaime García Chaparro, a Senior Data Scientist based in Madrid, Spain, who focuses on the technical development (data collection, model training, website development), and Prof. Maxim Kisilier, based in St. Petersburg and one of the leading Tsakonian experts worldwide, who oversees the linguistic and institutional side. (García Chaparro, 2025)
For context on the revitalization efforts, see 1.2 History and Status. The orthographies supported by the dictionary are described in 3. Orthography.
Much of the material used in the project has been provided either by the Tsakonian Archives or by the local contributor Panos Marneris, whose efforts in preserving the language and supporting Tsakonian Digital are deeply appreciated. (García Chaparro, 2025)
Tools and Resources
Since its inception in July 2023, the project has developed the following tools and resources: (García Chaparro, 2025)
- Online Dictionary
- Neural Machine Translator (NMT)
- Bilingual Corpus
- Keyboard Extension: A Tsakonian keyboard extension that allows writing Tsakonian in Kostakis’ orthography on computers.
Online Dictionary
The Tsakonian Digital Dictionary, launched in 2023, is the first online dictionary for the language. It currently holds around 1,300 terms and supports bidirectional translations between Tsakonian and Greek, English, and Spanish. (García Chaparro, 2025)
| Feature | Details |
|---|---|
| Launch year | 2023 |
| Terms | ~1,300 |
| Languages | Greek, English, Spanish |
| Tech stack | Python / Django, SQLite backend |
| Orthographies | Kostakis, Nowakowski, Marneris |
| Access | tsakoniandigital.com |
The dictionary is built with Python and the Django framework, designed for easy extensibility and maintenance. It uses Kostakis’ orthography as the standard and includes an automated converter script to switch between orthographies on the fly. (García Chaparro, 2025)
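The converter script itself is not published; the following is a minimal sketch of how such on-the-fly conversion between orthographies might work. The character correspondences below are placeholders, not the actual Kostakis mappings used by the project:

```python
# Hypothetical orthography converter sketch. The mapping is illustrative
# only -- the real correspondences between Kostakis' system and the
# other supported orthographies are an assumption here.
KOSTAKIS_TO_PLAIN = {
    "κ̔": "κχ",   # aspirated kappa -> plain digraph (assumed correspondence)
    "π̔": "πφ",   # assumed
    "τ̔": "τθ",   # assumed
}

def convert(text: str, mapping: dict[str, str]) -> str:
    """Replace longer keys first so multi-character sequences are not split."""
    for src in sorted(mapping, key=len, reverse=True):
        text = text.replace(src, mapping[src])
    return text

print(convert("κ̔οντούμε", KOSTAKIS_TO_PLAIN))  # -> "κχοντούμε"
```

Applying replacements longest-first matters when one mapped sequence is a prefix of another; a real converter would also need to handle diacritic normalization (NFC/NFD) before matching.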
The primary source is Ioannis Kamvysis’ To preserve our language (Για να κ̔οντούμε τα γρούσσα νάμου, Gia na khondoúme ta groússa námou), supplemented by Thanasis Kostakis’ Dictionary of the Tsakonian dialect (1986) and selected volumes from the Tsakonian Chronicles (Χρωνικά των Τσακώνων, Chroniká ton Tsakónon). (García Chaparro, 2025)
Grammatical information is provided where available: (García Chaparro, 2025)
- Verbs: aorist in indicative and subjunctive, participle, present subjunctive (if different from present indicative).
- Nouns: gender (via the article), plural, and genitive form (if it exists).
- Adjectives: masculine, feminine, and neuter singular endings.
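The grammatical fields above suggest a simple entry schema. A minimal sketch using a plain dataclass — the dictionary’s actual Django models are not published, so all field names here are assumptions:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Entry:
    # Hypothetical schema mirroring the grammatical info listed above.
    headword: str                       # lemma in Kostakis' orthography
    pos: str                            # "verb", "noun", "adjective", ...
    translations: dict[str, str] = field(default_factory=dict)  # lang -> gloss
    gender: Optional[str] = None        # nouns: indicated via the article
    plural: Optional[str] = None
    genitive: Optional[str] = None      # only if it exists
    aorist_ind: Optional[str] = None    # verbs
    aorist_subj: Optional[str] = None
    participle: Optional[str] = None
    present_subj: Optional[str] = None  # only if different from indicative

# Example word taken from the Kamvysis title ("γρούσσα" = language).
entry = Entry(headword="γρούσσα", pos="noun",
              translations={"ell": "γλώσσα", "eng": "language"},
              gender="feminine")
print(entry.translations["eng"])  # -> "language"
```

Optional fields default to `None`, matching the source’s note that grammatical information is provided only “where available”.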
Features under development include sentence examples, dialect usage marking, verb conjugation tables, and expansion of the Grammar section with paradigm tables. (García Chaparro, 2025)
Neural Machine Translator (NMT)
The Tsakonian Digital NMT is the first neural machine translation model able to translate between Tsakonian and Greek in both directions. The model is open source and available on HuggingFace. (García Chaparro, 2025)
| Feature | Details |
|---|---|
| Base model | Gemma 2 9B |
| Fine-tuning method | QLoRA (Quantized Low-Rank Adaptation) |
| Training corpus | 1,600+ bilingual sentence pairs (Tsakonian–Greek) |
| Corpus split | 80% training, 10% validation, 10% test |
| Evaluation metrics | BLEU, ChrF++ |
| Training hardware | 1× A100 GPU (~1 hour per model) |
| Epochs | 2 |
| Learning rate | 5e-5 (cosine decay) |
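The settings in the table map onto a standard QLoRA setup with Hugging Face `transformers` and `peft`. A configuration fragment sketching that mapping — target modules and dropout are assumptions, since the exact training script is not published:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit quantization of the Gemma 2 9B base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Best-performing adapter configuration from the evaluation tables: r=16, alpha=8.
lora_config = LoraConfig(
    r=16,
    lora_alpha=8,
    lora_dropout=0.05,        # assumption; not stated in the source
    task_type="CAUSAL_LM",
)

# Training settings from the table would go into transformers.TrainingArguments:
# num_train_epochs=2, learning_rate=5e-5, lr_scheduler_type="cosine".
```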
Several LoRA rank (r) configurations were tested. BLEU (Bilingual Evaluation Understudy) and ChrF++ are standard automatic evaluation metrics for machine translation: BLEU measures n-gram overlap between the model’s output and reference translations (0–100, higher = better), while ChrF++ measures character-level similarity. The table below summarises scores across two evaluation sets: (García Chaparro, 2025)
Evaluation set 1 (27 sentences per direction; source: Kamvysis)
| Rank (r, α) | BLEU Tsd→Ell | BLEU Ell→Tsd | ChrF++ Tsd→Ell | ChrF++ Ell→Tsd |
|---|---|---|---|---|
| Base model | 0.37 | 0.16 | 10.40 | 5.55 |
| r=16, α=8 | 52.32 | 41.39 | 69.61 | 68.45 |
| r=32, α=16 | 49.89 | 44.60 | 67.44 | 69.66 |
| r=64, α=32 | 44.05 | 47.21 | 63.47 | 70.88 |
| r=128, α=64 | 44.71 | 41.61 | 61.99 | 66.56 |
Evaluation set 2 (25 sentences per direction; sources: Lysikatos, Marneris)
| Rank (r, α) | BLEU Tsd→Ell | BLEU Ell→Tsd | ChrF++ Tsd→Ell | ChrF++ Ell→Tsd |
|---|---|---|---|---|
| Base model | 1.46 | 0.34 | 20.11 | 11.16 |
| r=16, α=8 | 46.05 | 37.51 | 65.04 | 62.38 |
| r=32, α=16 | 43.88 | 34.28 | 63.23 | 59.87 |
| r=64, α=32 | 38.50 | 33.25 | 60.27 | 60.50 |
| r=128, α=64 | 34.12 | 33.37 | 54.18 | 57.41 |
The r=16 configuration yielded the most competitive results overall, with a notable exception in Greek→Tsakonian on evaluation set 1, where r=64 returned the best scores. The paper attributes this to lower-rank adapters striking an optimal balance between capturing linguistic patterns and avoiding overfitting on the limited dataset. (García Chaparro, 2025)
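For intuition on what the BLEU figures above measure, here is a toy pure-Python sketch of unsmoothed sentence-level BLEU. The published results were computed with standard tooling; this version is for illustration only:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Toy sentence BLEU: clipped n-gram precisions (n=1..4), geometric
    mean, brevity penalty. Single reference, no smoothing. Returns 0-100."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        if overlap == 0:
            return 0.0  # unsmoothed: any zero n-gram precision zeroes the score
        precisions.append(overlap / total)
    # Brevity penalty discourages overly short hypotheses.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the language is old", "the language is old"))  # -> 100.0
```

A perfect match scores 100; partial overlap scores lower, which is why the near-zero base-model scores in the tables indicate essentially no usable Tsakonian capability before fine-tuning.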
AI bridging pipeline: The NMT model also serves as an intermediary to enable Tsakonian interaction with commercial AI platforms (e.g., Google Translate, DeepL, GPT, Gemini). The pipeline translates Tsakonian input into Greek, passes the Greek text to the commercial platform, and translates the response back into Tsakonian. This bypasses the need to directly train a full Tsakonian-centric AI model on extremely scarce data. (García Chaparro, 2025)
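The bridging pipeline can be sketched as follows, with stub functions standing in for the NMT model and the commercial platform (all function names and signatures are assumptions for illustration):

```python
# Stubs standing in for the real components.
def nmt_tsd_to_ell(text: str) -> str:
    """Tsakonian -> Greek via the fine-tuned NMT model (stubbed)."""
    return f"<greek:{text}>"

def nmt_ell_to_tsd(text: str) -> str:
    """Greek -> Tsakonian via the fine-tuned NMT model (stubbed)."""
    return f"<tsakonian:{text}>"

def commercial_ai(greek_prompt: str) -> str:
    """Any Greek-capable platform, e.g. a chat model or Google Translate (stubbed)."""
    return f"<answer:{greek_prompt}>"

def bridge(tsakonian_input: str) -> str:
    greek_in = nmt_tsd_to_ell(tsakonian_input)   # 1. Tsakonian -> Greek
    greek_out = commercial_ai(greek_in)          # 2. Greek handled by the platform
    return nmt_ell_to_tsd(greek_out)             # 3. Greek -> Tsakonian

print(bridge("hello"))  # -> "<tsakonian:<answer:<greek:hello>>>"
```

Because Greek is well supported by commercial platforms, only the two NMT hops need Tsakonian-specific training, which is exactly what makes the approach viable on scarce data.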
Bilingual Corpus
The Tsakonian-Greek bilingual corpus, compiled as part of the project, contains more than 1,600 sentence pairs suitable for NMT training. Sources include published texts, fieldwork transcripts, and manual translations. All samples are stored in Kostakis’ orthography; a Python script converts samples written in other systems. (García Chaparro, 2025)
The corpus is split into training (80%), validation (10%), and test (10%) sets. Each pair is unfolded into two samples (Tsakonian→Greek and Greek→Tsakonian) to support bidirectional translation. (García Chaparro, 2025)
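The split and unfolding steps can be sketched as follows. This is a minimal illustration, not the project’s actual preprocessing script; in particular, splitting before unfolding (so both directions of a sentence stay in the same split) is an assumption:

```python
import random

def unfold(pairs):
    """Turn each (tsd, ell) pair into two directional training samples."""
    samples = []
    for tsd, ell in pairs:
        samples.append({"src": tsd, "tgt": ell, "direction": "tsd->ell"})
        samples.append({"src": ell, "tgt": tsd, "direction": "ell->tsd"})
    return samples

def split_80_10_10(pairs, seed=42):
    """Shuffle, then split 80/10/10 BEFORE unfolding, so both directions
    of a sentence land in the same split (assumed; avoids leakage)."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    train = pairs[: int(0.8 * n)]
    valid = pairs[int(0.8 * n): int(0.9 * n)]
    test = pairs[int(0.9 * n):]
    return unfold(train), unfold(valid), unfold(test)

corpus = [(f"tsd_{i}", f"ell_{i}") for i in range(100)]  # dummy pairs
train, valid, test = split_80_10_10(corpus)
print(len(train), len(valid), len(test))  # -> 160 20 20
```

Unfolding after splitting doubles each subset, so 100 pairs yield 160/20/20 directional samples.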
Two evaluation datasets were derived from the test set: (García Chaparro, 2025)
| Evaluation set | Sentences per direction | Main sources | Purpose |
|---|---|---|---|
| Set 1 | 27 | Kamvysis (published text) | Basic benchmark and progress tracking |
| Set 2 | 25 | Lysikatos, Marneris (chronicles, online texts) | Generalization to more complex sentences |
Project Stages
The project was roughly divided into three stages: (García Chaparro, 2025)
- Linguistics stage: deals with basic theoretical foundations for language study, research on the current language situation, available resources, and the creation of auxiliary materials like the digital dictionary.
- Data collection stage: aims to gather raw information to build a parallel corpus storing sentences in Tsakonian alongside Modern Standard Greek translations.
- AI building stage: focuses on training a Large Language Model (LLM) system capable of performing translation tasks and other language processing operations.
Milestones
Yearly advancements are presented in the closing session of the Tsakonian Summer School in Leonidio. (García Chaparro, 2025)
- 2024: First public release of the dictionary.
- 2025: Unveiling of the AI translation model and expansion of the dictionary to English and Spanish.
As of November 2025, the foundational stages of the project have been completed, with the team focusing on expanding the dictionary and improving the translation model. (García Chaparro, 2025)
References
- García Chaparro, J. (2025). About Tsakonian Digital.
- García Chaparro, J. (2025). Tsakonian Digital: Tsakonian’s journey towards Artificial Intelligence. Proceedings of MGDLT9.