Tsakonian Digital is a project aimed at supporting the preservation and revitalization of the Tsakonian language — the last surviving descendant of Ancient Doric Greek (see 1.1 General description) — by providing digital resources and tools to learn and use the language. The project is led by Jaime García Chaparro, a Senior Data Scientist based in Madrid, Spain, who focuses on the technical development (data collection, model training, website development), and Prof. Maxim Kisilier, based in St. Petersburg and one of the leading Tsakonian experts worldwide, who oversees the linguistic and institutional side. (García Chaparro, 2025)
For context on the revitalization efforts, see 1.2 History and Status. The orthographies supported by the dictionary are described in 3. Orthography.
Much of the material used in the project has been provided either by the Tsakonian Archives or by the local contributor Panos Marneris, whose efforts in preserving the language and supporting Tsakonian Digital are deeply appreciated. (García Chaparro, 2025)
Tools and Resources
Since its inception in July 2023, the project has developed the following tools and resources: (García Chaparro, 2025)
- Online Dictionary
- Neural Machine Translator (NMT)
- Bilingual Corpus
- Keyboard Extension: A Tsakonian keyboard extension that allows writing Tsakonian in Kostakis’ orthography on computers.
Online Dictionary
The Tsakonian Digital Dictionary, launched in 2023, is the first online dictionary for the language. It currently holds around 1,300 terms and supports bidirectional translations between Tsakonian and Greek, English, and Spanish. (García Chaparro, 2025)
| Feature | Details |
|---|---|
| Launch year | 2023 |
| Terms | ~1,300 |
| Languages | Greek, English, Spanish |
| Tech stack | Python / Django, SQLite backend |
| Orthographies | Kostakis, Nowakowski, Marneris |
| Access | tsakoniandigital.com |
The dictionary is built with Python and the Django framework, designed for easy extensibility and maintenance. It uses Kostakis’ orthography as the standard and includes an automated converter script to switch between orthographies on the fly. (García Chaparro, 2025)
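The converter script itself is not published; the following is a minimal sketch of how such on-the-fly conversion between orthographies might work. The character correspondences below are placeholders, not the actual Kostakis mappings used by the project:

```python
# Hypothetical orthography converter sketch. The mapping is illustrative
# only -- the real correspondences between Kostakis' system and the
# other supported orthographies are an assumption here.
KOSTAKIS_TO_PLAIN = {
    "κ̔": "κχ",   # aspirated kappa -> plain digraph (assumed correspondence)
    "π̔": "πφ",   # assumed
    "τ̔": "τθ",   # assumed
}

def convert(text: str, mapping: dict[str, str]) -> str:
    """Replace longer keys first so multi-character sequences are not split."""
    for src in sorted(mapping, key=len, reverse=True):
        text = text.replace(src, mapping[src])
    return text

print(convert("κ̔οντούμε", KOSTAKIS_TO_PLAIN))  # -> "κχοντούμε"
```

Applying replacements longest-first matters when one mapped sequence is a prefix of another; a real converter would also need to handle diacritic normalization (NFC/NFD) before matching.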
The primary source is Ioannis Kamvysis’ To preserve our language (Για να κ̔οντούμε τα γρούσσα νάμου, Gia na khondoúme ta groússa námou), supplemented by Thanasis Kostakis’ Dictionary of the Tsakonian dialect (1986) and selected volumes from the Tsakonian Chronicles (Χρωνικά των Τσακώνων, Chroniká ton Tsakónon). (García Chaparro, 2025)
Grammatical information is provided where available: (García Chaparro, 2025)
- Verbs: aorist in indicative and subjunctive, participle, present subjunctive (if different from present indicative).
- Nouns: gender (via the article), plural, and genitive form (if it exists).
- Adjectives: masculine, feminine, and neuter singular endings.
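The grammatical fields above suggest a simple entry schema. A minimal sketch using a plain dataclass — the dictionary’s actual Django models are not published, so all field names here are assumptions:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Entry:
    # Hypothetical schema mirroring the grammatical info listed above.
    headword: str                       # lemma in Kostakis' orthography
    pos: str                            # "verb", "noun", "adjective", ...
    translations: dict[str, str] = field(default_factory=dict)  # lang -> gloss
    gender: Optional[str] = None        # nouns: indicated via the article
    plural: Optional[str] = None
    genitive: Optional[str] = None      # only if it exists
    aorist_ind: Optional[str] = None    # verbs
    aorist_subj: Optional[str] = None
    participle: Optional[str] = None
    present_subj: Optional[str] = None  # only if different from indicative

# Example word taken from the Kamvysis title ("γρούσσα" = language).
entry = Entry(headword="γρούσσα", pos="noun",
              translations={"ell": "γλώσσα", "eng": "language"},
              gender="feminine")
print(entry.translations["eng"])  # -> "language"
```

Optional fields default to `None`, matching the source’s note that grammatical information is provided only “where available”.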
Features under development include sentence examples, dialect usage marking, verb conjugation tables, and expansion of the Grammar section with paradigm tables. (García Chaparro, 2025)
Neural Machine Translator (NMT)
The Tsakonian Digital NMT is the first neural machine translation model able to translate between Tsakonian and Greek in both directions. The model is open source and available on HuggingFace. (García Chaparro, 2025)
| Feature | Details |
|---|---|
| Base model | Gemma 2 9B |
| Fine-tuning method | QLoRA (Quantized Low-Rank Adaptation) |
| Training corpus | 1,600+ bilingual sentence pairs (Tsakonian–Greek) |
| Corpus split | 80% training, 10% validation, 10% test |
| Evaluation metrics | BLEU, ChrF++ |
| Training hardware | 1× A100 GPU (~1 hour per model) |
| Epochs | 2 |
| Learning rate | 5e-5 (cosine decay) |
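The settings in the table map onto a standard QLoRA setup with Hugging Face `transformers` and `peft`. A configuration fragment sketching that mapping — target modules and dropout are assumptions, since the exact training script is not published:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit quantization of the Gemma 2 9B base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Best-performing adapter configuration from the evaluation tables: r=16, alpha=8.
lora_config = LoraConfig(
    r=16,
    lora_alpha=8,
    lora_dropout=0.05,        # assumption; not stated in the source
    task_type="CAUSAL_LM",
)

# Training settings from the table would go into transformers.TrainingArguments:
# num_train_epochs=2, learning_rate=5e-5, lr_scheduler_type="cosine".
```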
Several LoRA rank (r) configurations were tested. BLEU (Bilingual Evaluation Understudy) and ChrF++ are standard automatic evaluation metrics for machine translation: BLEU measures n-gram overlap between the model’s output and reference translations (0–100, higher = better), while ChrF++ measures character-level similarity. The table below summarises scores across two evaluation sets: (García Chaparro, 2025)
Evaluation set 1 (27 sentences per direction; source: Kamvysis)
| Rank (r, α) | BLEU Tsd→Ell | BLEU Ell→Tsd | ChrF++ Tsd→Ell | ChrF++ Ell→Tsd |
|---|---|---|---|---|
| Base model | 0.37 | 0.16 | 10.40 | 5.55 |
| r=16, α=8 | 52.32 | 41.39 | 69.61 | 68.45 |
| r=32, α=16 | 49.89 | 44.60 | 67.44 | 69.66 |
| r=64, α=32 | 44.05 | 47.21 | 63.47 | 70.88 |
| r=128, α=64 | 44.71 | 41.61 | 61.99 | 66.56 |
Evaluation set 2 (25 sentences per direction; sources: Lysikatos, Marneris)
| Rank (r, α) | BLEU Tsd→Ell | BLEU Ell→Tsd | ChrF++ Tsd→Ell | ChrF++ Ell→Tsd |
|---|---|---|---|---|
| Base model | 1.46 | 0.34 | 20.11 | 11.16 |
| r=16, α=8 | 46.05 | 37.51 | 65.04 | 62.38 |
| r=32, α=16 | 43.88 | 34.28 | 63.23 | 59.87 |
| r=64, α=32 | 38.50 | 33.25 | 60.27 | 60.50 |
| r=128, α=64 | 34.12 | 33.37 | 54.18 | 57.41 |
The r=16 configuration yielded the most competitive results overall, with a notable exception in Greek→Tsakonian on evaluation set 1, where r=64 returned the best scores. The paper attributes this to lower-rank adapters striking an optimal balance between capturing linguistic patterns and avoiding overfitting on the limited dataset. (García Chaparro, 2025)
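For intuition on what the BLEU figures above measure, here is a toy pure-Python sketch of unsmoothed sentence-level BLEU. The published results were computed with standard tooling; this version is for illustration only:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Toy sentence BLEU: clipped n-gram precisions (n=1..4), geometric
    mean, brevity penalty. Single reference, no smoothing. Returns 0-100."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        if overlap == 0:
            return 0.0  # unsmoothed: any zero n-gram precision zeroes the score
        precisions.append(overlap / total)
    # Brevity penalty discourages overly short hypotheses.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the language is old", "the language is old"))  # -> 100.0
```

A perfect match scores 100; partial overlap scores lower, which is why the near-zero base-model scores in the tables indicate essentially no usable Tsakonian capability before fine-tuning.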
AI bridging pipeline: The NMT model also serves as an intermediary to enable Tsakonian interaction with commercial AI platforms (e.g., Google Translate, DeepL, GPT, Gemini). The pipeline translates Tsakonian input into Greek, passes the Greek text to the commercial platform, and translates the response back into Tsakonian. This bypasses the need to directly train a full Tsakonian-centric AI model on extremely scarce data. (García Chaparro, 2025)
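The bridging pipeline can be sketched as follows, with stub functions standing in for the NMT model and the commercial platform (all function names and signatures are assumptions for illustration):

```python
# Stubs standing in for the real components.
def nmt_tsd_to_ell(text: str) -> str:
    """Tsakonian -> Greek via the fine-tuned NMT model (stubbed)."""
    return f"<greek:{text}>"

def nmt_ell_to_tsd(text: str) -> str:
    """Greek -> Tsakonian via the fine-tuned NMT model (stubbed)."""
    return f"<tsakonian:{text}>"

def commercial_ai(greek_prompt: str) -> str:
    """Any Greek-capable platform, e.g. a chat model or Google Translate (stubbed)."""
    return f"<answer:{greek_prompt}>"

def bridge(tsakonian_input: str) -> str:
    greek_in = nmt_tsd_to_ell(tsakonian_input)   # 1. Tsakonian -> Greek
    greek_out = commercial_ai(greek_in)          # 2. Greek handled by the platform
    return nmt_ell_to_tsd(greek_out)             # 3. Greek -> Tsakonian

print(bridge("hello"))  # -> "<tsakonian:<answer:<greek:hello>>>"
```

Because Greek is well supported by commercial platforms, only the two NMT hops need Tsakonian-specific training, which is exactly what makes the approach viable on scarce data.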
Bilingual Corpus
The Tsakonian-Greek bilingual corpus, compiled as part of the project, contains more than 1,600 sentence pairs suitable for NMT training. Sources include published texts, fieldwork transcripts, and manual translations. All samples are stored in Kostakis’ orthography; a Python script converts samples written in other systems. (García Chaparro, 2025)
The corpus is split into training (80%), validation (10%), and test (10%) sets. Each pair is unfolded into two samples (Tsakonian→Greek and Greek→Tsakonian) to support bidirectional translation. (García Chaparro, 2025)
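The split and unfolding steps can be sketched as follows. This is a minimal illustration, not the project’s actual preprocessing script; in particular, splitting before unfolding (so both directions of a sentence stay in the same split) is an assumption:

```python
import random

def unfold(pairs):
    """Turn each (tsd, ell) pair into two directional training samples."""
    samples = []
    for tsd, ell in pairs:
        samples.append({"src": tsd, "tgt": ell, "direction": "tsd->ell"})
        samples.append({"src": ell, "tgt": tsd, "direction": "ell->tsd"})
    return samples

def split_80_10_10(pairs, seed=42):
    """Shuffle, then split 80/10/10 BEFORE unfolding, so both directions
    of a sentence land in the same split (assumed; avoids leakage)."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    train = pairs[: int(0.8 * n)]
    valid = pairs[int(0.8 * n): int(0.9 * n)]
    test = pairs[int(0.9 * n):]
    return unfold(train), unfold(valid), unfold(test)

corpus = [(f"tsd_{i}", f"ell_{i}") for i in range(100)]  # dummy pairs
train, valid, test = split_80_10_10(corpus)
print(len(train), len(valid), len(test))  # -> 160 20 20
```

Unfolding after splitting doubles each subset, so 100 pairs yield 160/20/20 directional samples.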
Two evaluation datasets were derived from the test set: (García Chaparro, 2025)
| Evaluation set | Sentences per direction | Main sources | Purpose |
|---|---|---|---|
| Set 1 | 27 | Kamvysis (published text) | Basic benchmark and progress tracking |
| Set 2 | 25 | Lysikatos, Marneris (chronicles, online texts) | Generalization to more complex sentences |
Project Stages
The project was roughly divided into three stages: (García Chaparro, 2025)
- Linguistics stage: deals with basic theoretical foundations for language study, research on the current language situation, available resources, and the creation of auxiliary materials like the digital dictionary.
- Data collection stage: aims to gather raw information to build a parallel corpus storing sentences in Tsakonian alongside Modern Standard Greek translations.
- AI building stage: focuses on training a Large Language Model (LLM) system capable of performing translation tasks and other language processing operations.
Milestones
Yearly advancements are presented in the closing session of the Tsakonian Summer School in Leonidio. (García Chaparro, 2025)
- 2024: First public release of the dictionary.
- 2025: Unveiling of the AI translation model and expansion of the dictionary to English and Spanish.
As of November 2025, the foundational stages of the project have been completed, with the team focusing on expanding the dictionary and improving the translation model. (García Chaparro, 2025)
References
- García Chaparro, J. (2025). About Tsakonian Digital.
- García Chaparro, J. (2025). Tsakonian Digital: Tsakonian’s journey towards Artificial Intelligence. Proceedings of MGDLT9.