Tsakonian Digital is a project that supports the preservation and revitalization of the Tsakonian language, the last surviving descendant of Ancient Doric Greek (see 1.1 General description), by providing digital resources and tools for learning and using the language. The project is led by two people. Jaime García Chaparro, a senior data scientist based in Madrid, Spain, handles technical development (data collection, model training, and website development). Prof. Maxim Kisilier, based in St. Petersburg and one of the world's leading experts on Tsakonian, oversees the linguistic and institutional side. (García Chaparro, 2025)

For context on revitalization efforts, see 1.2 History and Status. The orthographies supported by the dictionary are described in 3. Orthography.

Much of the material used in the project has been provided either by the Tsakonian Archives or by local resident Panos Marneris, whose efforts to preserve the language and support Tsakonian Digital are deeply appreciated. (García Chaparro, 2025)

Tools and Resources

Since its launch in July 2023, the project has developed the following tools and resources: (García Chaparro, 2025)

  1. Online Dictionary
  2. Neural Machine Translator (NMT)
  3. Bilingual Corpus
  4. Keyboard Extension: a Tsakonian keyboard extension that lets users type Tsakonian in Kostakis’ orthography on computers.

Online Dictionary

The Tsakonian Digital Dictionary, launched in 2023, is the first online dictionary for the language. It currently holds around 1,300 terms and supports bidirectional translations between Tsakonian and Greek, English, and Spanish. (García Chaparro, 2025)

Feature | Details
Launch year | 2023
Terms | ~1,300
Languages | Greek, English, Spanish
Tech stack | Python / Django, SQLite backend
Orthographies | Kostakis, Nowakowski, Marneris
Access | tsakoniandigital.com

The dictionary is built with Python and the Django framework, designed for easy extensibility and maintenance. It uses Kostakis’ orthography as the standard and includes an automated converter script to switch between orthographies on the fly. (García Chaparro, 2025)
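On-the-fly orthography conversion can be thought of as ordered string rewriting. The following is a minimal sketch that assumes conversion reduces to longest-match substitution rules. `build_converter` and `TOY_RULES` are hypothetical names. The rules are ASCII placeholders, not the actual correspondences between Kostakis’ orthography and the other systems.

```python
from collections.abc import Callable


def build_converter(rules: dict[str, str]) -> Callable[[str], str]:
    """Return a function that rewrites text using longest-match-first rules."""
    # Try longer patterns first, so multi-character sequences (e.g. digraphs)
    # take precedence over their single-character prefixes.
    patterns = sorted(rules, key=len, reverse=True)

    def convert(text: str) -> str:
        out = []
        i = 0
        while i < len(text):
            for pat in patterns:
                if text.startswith(pat, i):
                    out.append(rules[pat])
                    i += len(pat)
                    break
            else:
                # No rule matches at this position: copy the character unchanged.
                out.append(text[i])
                i += 1
        return "".join(out)

    return convert


# Hypothetical placeholder rules, for illustration only.
TOY_RULES = {"ch": "X", "c": "K", "h": "H"}
to_target = build_converter(TOY_RULES)
print(to_target("chac"))  # XaK
```

A real rule set would also need to handle letter case and combining diacritics, such as the mark in κ̔. One way to do this is to normalize input with `unicodedata.normalize` before matching.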

The primary source is Ioannis Kamvysis’ To preserve our language (Για να κ̔οντούμε τα γρούσσα νάμου) (Gia na khondoúme ta groússa námou), supplemented by Thanasis Kostakis’ Dictionary of the Tsakonian dialect (1986) and selected volumes from the Tsakonian Chronicles (Χρωνικά των Τσακώνων) (Chroniká ton Tsakónon). (García Chaparro, 2025)

Grammatical information is provided where available: (García Chaparro, 2025)

  • Verbs: aorist in indicative and subjunctive, participle, present subjunctive (if different from present indicative).
  • Nouns: gender (via the article), plural, and genitive form (if it exists).
  • Adjectives: masculine, feminine, and neuter singular endings.
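The grammatical fields listed above map naturally onto one record type per part of speech. The sketch below uses Python dataclasses and is not the project's actual Django models, which the source does not describe. All class and field names are hypothetical, the translation keys (`ell`, `eng`) are assumed ISO 639-3 codes, and the placeholder strings stand in for real Tsakonian forms.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional, Union


@dataclass
class VerbForms:
    aorist_indicative: Optional[str] = None
    aorist_subjunctive: Optional[str] = None
    participle: Optional[str] = None
    # Stored only when it differs from the present indicative.
    present_subjunctive: Optional[str] = None


@dataclass
class NounForms:
    article: Optional[str] = None   # the article encodes grammatical gender
    plural: Optional[str] = None
    genitive: Optional[str] = None  # left empty if no genitive form exists


@dataclass
class AdjectiveForms:
    masculine: Optional[str] = None
    feminine: Optional[str] = None
    neuter: Optional[str] = None


@dataclass
class Entry:
    lemma: str
    pos: str  # "verb", "noun" or "adjective"
    # Hypothetical layout: language code -> list of translations.
    translations: dict = field(default_factory=dict)
    forms: Optional[Union[VerbForms, NounForms, AdjectiveForms]] = None


def available_forms(entry: Entry) -> dict:
    """Return only the grammatical fields that are actually filled in."""
    if entry.forms is None:
        return {}
    return {k: v for k, v in asdict(entry.forms).items() if v is not None}


# Usage with placeholder strings instead of real Tsakonian words.
noun = Entry(
    lemma="<lemma>",
    pos="noun",
    translations={"ell": ["<greek>"], "eng": ["<english>"]},
    forms=NounForms(article="<article>", plural="<plural>"),
)
print(available_forms(noun))  # {'article': '<article>', 'plural': '<plural>'}
```

Leaving optional forms as `None` mirrors the "where available" and "if it exists" qualifications above. Only attested forms are shown.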

Features under development include sentence examples, dialect usage marking, verb conjugation tables, and expansion of the Grammar section with paradigm tables. (García Chaparro, 2025)

Neural Machine Translator (NMT)

The Tsakonian Digital NMT is the first neural machine translation model able to translate between Tsakonian and Greek in both directions. The model is open source and available on Hugging Face. (García Chaparro, 2025)

Feature | Details
Base model | Gemma 2 9B
Fine-tuning method | QLoRA (Quantized Low-Rank Adaptation)
Training corpus | 1,600+ bilingual sentence pairs (Tsakonian–Greek)
Corpus split | 80% training, 10% validation, 10% test
Evaluation metrics | BLEU, ChrF++
Training hardware | 1× A100 GPU (~1 hour per model)
Epochs | 2
Learning rate | 5e-5 (cosine decay)

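Several LoRA rank (r) configurations were tested, with the scaling factor α set to half the rank in each case. BLEU (Bilingual Evaluation Understudy) and ChrF++ are standard automatic evaluation metrics for machine translation. BLEU measures word n-gram overlap between the model’s output and reference translations. ChrF++ computes an F-score over character n-grams, supplemented with word unigrams and bigrams. Both are reported on a 0–100 scale, where higher is better. In the column headers, Tsd and Ell denote Tsakonian and Greek (ISO 639-3 codes tsd and ell). The tables below summarise scores across two evaluation sets: (García Chaparro, 2025)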

Evaluation set 1 (27 sentences per direction; source: Kamvysis)

Configuration (r, α) | BLEU Tsd→Ell | BLEU Ell→Tsd | ChrF++ Tsd→Ell | ChrF++ Ell→Tsd
Base model | 0.37 | 0.16 | 10.40 | 5.55
r=16, α=8 | 52.32 | 41.39 | 69.61 | 68.45
r=32, α=16 | 49.89 | 44.60 | 67.44 | 69.66
r=64, α=32 | 44.05 | 47.21 | 63.47 | 70.88
r=128, α=64 | 44.71 | 41.61 | 61.99 | 66.56

Evaluation set 2 (25 sentences per direction; sources: Lysikatos, Marneris)

Configuration (r, α) | BLEU Tsd→Ell | BLEU Ell→Tsd | ChrF++ Tsd→Ell | ChrF++ Ell→Tsd
Base model | 1.46 | 0.34 | 20.11 | 11.16
r=16, α=8 | 46.05 | 37.51 | 65.04 | 62.38
r=32, α=16 | 43.88 | 34.28 | 63.23 | 59.87
r=64, α=32 | 38.50 | 33.25 | 60.27 | 60.50
r=128, α=64 | 34.12 | 33.37 | 54.18 | 57.41

The r=16 configuration yielded the most competitive results overall. The notable exception is Greek→Tsakonian on evaluation set 1, where r=64 achieved the best BLEU and ChrF++ scores. The paper attributes the strength of the smaller (lower-rank) configurations to their striking an optimal balance between capturing linguistic patterns and avoiding overfitting on the limited dataset. (García Chaparro, 2025)
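To make the ChrF++ metric concrete, here is a simplified sentence-level version in plain Python. It averages precision and recall over character 1- to 6-grams (with whitespace removed) and word 1- and 2-grams, then combines them as an F-score with β = 2, which weights recall more heavily. This is an illustration with deliberate simplifications: no smoothing, no corpus-level aggregation, and no splitting of punctuation from words. It is not the implementation used in the paper, which the source does not name. Standard tools such as sacrebleu should be used for reportable numbers.

```python
from collections import Counter


def _char_ngrams(text: str, n: int) -> Counter:
    # chrF removes whitespace before extracting character n-grams.
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def _word_ngrams(text: str, n: int) -> Counter:
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def chrf_pp(hyp: str, ref: str, char_order: int = 6,
            word_order: int = 2, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF++ on a 0-100 scale."""
    stats = [(_char_ngrams(hyp, n), _char_ngrams(ref, n))
             for n in range(1, char_order + 1)]
    stats += [(_word_ngrams(hyp, n), _word_ngrams(ref, n))
              for n in range(1, word_order + 1)]

    prec_sum = rec_sum = 0.0
    orders = 0
    for hyp_counts, ref_counts in stats:
        n_hyp, n_ref = sum(hyp_counts.values()), sum(ref_counts.values())
        if n_hyp == 0 or n_ref == 0:
            continue  # skip n-gram orders longer than either text
        matches = sum((hyp_counts & ref_counts).values())  # clipped matches
        prec_sum += matches / n_hyp
        rec_sum += matches / n_ref
        orders += 1

    if orders == 0:
        return 0.0
    prec, rec = prec_sum / orders, rec_sum / orders
    if prec + rec == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * prec * rec / (b2 * prec + rec)


print(chrf_pp("a b c", "a b c"))  # 100.0
```

Character n-grams give partial credit when a hypothesis gets the stem right but the ending wrong. This matters for a morphologically rich language like Tsakonian, and it is one reason ChrF++ is often reported alongside BLEU.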

AI bridging pipeline: The NMT model also serves as an intermediary that lets users interact in Tsakonian with commercial AI platforms (e.g., Google Translate, DeepL, GPT, Gemini). The pipeline translates Tsakonian input into Greek, passes the Greek text to the commercial platform, and translates the response back into Tsakonian. This avoids the need to train a full Tsakonian-centric AI model directly on extremely scarce data. (García Chaparro, 2025)
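The three-step pipeline is essentially function composition. This minimal sketch assumes two hypothetical callables. `translate(text, src, tgt)` wraps the fine-tuned model, and `service(text)` wraps whichever commercial platform is used. Neither is a real API of the project or of those platforms.

```python
from typing import Callable

# Hypothetical signature: translate(text, source_lang, target_lang) -> text.
# "tsd" and "ell" are the ISO 639-3 codes for Tsakonian and Greek.
Translate = Callable[[str, str, str], str]
# Hypothetical signature for any commercial system that accepts Greek text.
GreekService = Callable[[str], str]


def bridge(tsakonian_input: str, translate: Translate,
           service: GreekService) -> str:
    """Route a Tsakonian request through a Greek-only service and back."""
    greek_input = translate(tsakonian_input, "tsd", "ell")  # Tsakonian -> Greek
    greek_output = service(greek_input)                     # commercial platform
    return translate(greek_output, "ell", "tsd")            # Greek -> Tsakonian


# Usage with stub functions that only tag the text, to show the data flow.
def fake_translate(text: str, src: str, tgt: str) -> str:
    return f"{tgt}({text})"


print(bridge("q", fake_translate, lambda s: f"ans[{s}]"))  # tsd(ans[ell(q)])
```

Each request passes through the NMT model twice, so translation errors in either direction can compound. The pipeline's quality is bounded by the Tsakonian↔Greek scores reported above.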

Bilingual Corpus

The Tsakonian–Greek bilingual corpus, compiled as part of the project, contains more than 1,600 sentence pairs suitable for NMT training. Sources include published texts, fieldwork transcripts, and manual translations. All samples are stored in Kostakis’ orthography, and a Python script converts samples written in other orthographies. (García Chaparro, 2025)

The corpus is split into training (80%), validation (10%), and test (10%) sets. Each pair is unfolded into two samples (Tsakonian→Greek and Greek→Tsakonian) to support bidirectional translation. (García Chaparro, 2025)
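The split and the bidirectional unfolding can be sketched in a few lines. The function names, the seed, and the sample dictionary layout are hypothetical. The sketch also assumes that splitting happens before unfolding, which the source does not state explicitly.

```python
import random


def split_corpus(pairs, seed=42, train=0.8, val=0.1):
    """Shuffle (tsakonian, greek) pairs and cut them into train/val/test."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n = len(shuffled)
    n_train = int(n * train)
    n_val = int(n * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])  # remainder (~10%) is the test set


def unfold(pairs):
    """Turn each pair into one Tsakonian->Greek and one Greek->Tsakonian sample."""
    samples = []
    for tsd, ell in pairs:
        samples.append({"src_lang": "tsd", "tgt_lang": "ell", "source": tsd, "target": ell})
        samples.append({"src_lang": "ell", "tgt_lang": "tsd", "source": ell, "target": tsd})
    return samples


pairs = [(f"tsd{i}", f"ell{i}") for i in range(10)]
train_pairs, val_pairs, test_pairs = split_corpus(pairs)
print(len(train_pairs), len(val_pairs), len(test_pairs))  # 8 1 1
print(len(unfold(train_pairs)))                           # 16
```

Splitting before unfolding keeps both directions of a pair in the same subset. If it were done the other way round, a test sentence could reach training in reverse direction and inflate the evaluation scores.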

Two evaluation datasets were derived from the test set: (García Chaparro, 2025)

Evaluation set | Sentences per direction | Main sources | Purpose
Set 1 | 27 | Kamvysis (published text) | Basic benchmark and progress tracking
Set 2 | 25 | Lysikatos, Marneris (chronicles, online texts) | Generalization to more complex sentences

Project Stages

The project was divided roughly into three stages: (García Chaparro, 2025)

  1. Linguistics stage: covers the theoretical foundations for studying the language, research into its current situation and available resources, and the creation of auxiliary materials such as the digital dictionary.
  2. Data collection stage: gathers raw material to build a parallel corpus of Tsakonian sentences alongside their Modern Standard Greek translations.
  3. AI building stage: trains a large language model (LLM) capable of performing translation and other language processing tasks.

Milestones

Each year's progress is presented at the closing session of the Tsakonian Summer School in Leonidio. (García Chaparro, 2025)

  • 2024: First public release of the dictionary.
  • 2025: Unveiling of the AI translation model and expansion of the dictionary to English and Spanish.

As of November 2025, the foundational stages of the project have been completed, with the team focusing on expanding the dictionary and improving the translation model. (García Chaparro, 2025)

References