Sources
Every text, dictionary, parser, and model the project depends on, with its license and the role it plays. The canonical, machine-parseable copy is ATTRIBUTION.md in the repository; this page mirrors it for readers.
Sanskrit primary sources
- GRETIL — Göttingen Register of Electronic Texts in Indian Languages
-
License: per-file, mostly CC-BY 4.0; some files carry stricter terms parsed from the header at ingestion.
Role: primary source for the bulk of the canon. Per-text source revision is recorded in texts.source_revision.
- Muktabodha Indological Research Institute (MIRI)
-
muktabodha.org · library: muktalib7.com
License: written redistribution permission pending. No Muktabodha-derived text is published in our dataset or upstreamed to Ambuda until permission lands. For reader display we operate under Muktabodha's scholarly-use terms.
Role: primary source for Kashmir Shaivism / Trika / Kaula texts not covered elsewhere.
- sanskritdocuments.org
-
License: non-commercial scholarly use; redistribution requires per-text permission.
Role: fallback source for texts not covered by GRETIL or Muktabodha. Per-text volunteer transcriber credits are preserved in texts.attribution_html.
- Wikisource (Sanskrit)
-
License: CC-BY-SA 4.0 + GFDL.
Role: license-clean fallback source.
- SARIT — South Asian Resources for Indic Texts
-
License: CC-BY-SA per text, with TEI-XML markup preserved upstream.
Role: reference TEI structure and selected texts.
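The licensing terms above imply a per-text gate on what may enter the published dataset. The following is a minimal sketch of such a gate, not the project's actual ingestion code: the `TextRecord` type, the `permission_granted` flag, the license identifiers, and the example slugs are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Assumption: licenses are normalised to short identifiers at ingestion.
# This allowlist is illustrative, not the project's real list.
OPEN_LICENSES = {"CC-BY-4.0", "CC-BY-SA-4.0"}


@dataclass(frozen=True)
class TextRecord:
    slug: str                         # hypothetical text identifier
    source: str                       # e.g. "GRETIL", "Muktabodha"
    license: str                      # normalised license identifier
    permission_granted: bool = False  # written per-text permission on file


def may_publish(text: TextRecord) -> bool:
    """A text is publishable if its license is open or permission was granted."""
    return text.license in OPEN_LICENSES or text.permission_granted


def publishable(texts: list[TextRecord]) -> list[str]:
    """Return the slugs of texts that pass the gate, in input order."""
    return [t.slug for t in texts if may_publish(t)]


if __name__ == "__main__":
    sample = [
        TextRecord("example-gretil-text", "GRETIL", "CC-BY-4.0"),
        TextRecord("example-muktabodha-text", "Muktabodha", "permission-pending"),
    ]
    print(publishable(sample))
```

Keeping the decision per text rather than per source matches the per-file licensing noted for GRETIL and the per-text permission noted for sanskritdocuments.org.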
Public-domain English translations
These are passed to the LLM-as-judge as reference signals only, not as ground truth. See methodology. All translators below either died before 1956 or are cited only for works that are unambiguously pre-1930 and in the US public domain as of 2026.
- John Woodroffe (Arthur Avalon)
-
The Great Liberation (Mahānirvāṇa Tantra, 1913); Principles of Tantra (1914–1916); Śakti and Śākta (1918); The Garland of Letters (1922); Karpūrādi Stotra.
- Ralph T. H. Griffith
-
Hymns of the Rigveda (1889–1896); White Yajurveda (1899); Sāmaveda (1893); Atharvaveda (1895–1896).
- George Thibaut
-
Brahma Sūtra with Śaṅkara Bhāṣya (Sacred Books of the East 34, 38; 1890–1896).
- Max Müller
-
Principal Upaniṣads (Sacred Books of the East 1, 15; 1879, 1884).
- W. D. Whitney
-
Atharvaveda (1905).
- J. H. Woods
-
Yoga Sūtras of Patañjali (Harvard Oriental Series 17, 1914).
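The eligibility rule stated at the top of this section (translator died before 1956, or the cited work predates 1930) can be written as a small check. This is a sketch under stated assumptions: the `ReferenceWork` type and its field names are hypothetical, and undated works or unknown death years are treated as failing the corresponding test.

```python
from dataclasses import dataclass
from typing import Optional

DEATH_CUTOFF = 1956        # translator must have died before this year
PUBLICATION_CUTOFF = 1930  # or the cited work must predate this year


@dataclass(frozen=True)
class ReferenceWork:
    translator: str
    title: str
    published: Optional[int]        # None when the work is undated
    translator_died: Optional[int]  # None when unknown


def is_eligible(work: ReferenceWork) -> bool:
    """Apply the two-branch rule; missing data never counts as a pass."""
    died_early = (
        work.translator_died is not None
        and work.translator_died < DEATH_CUTOFF
    )
    published_early = (
        work.published is not None
        and work.published < PUBLICATION_CUTOFF
    )
    return died_early or published_early


if __name__ == "__main__":
    # Publication year taken from the list above; death year left unset.
    woods = ReferenceWork(
        "J. H. Woods", "Yoga Sūtras of Patañjali",
        published=1914, translator_died=None,
    )
    print(is_eligible(woods))
```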
Computational resources
- Vidyut
-
github.com/ambuda-org/vidyut · Rust toolkit from the Ambuda project.
License: MIT / Apache-2.0.
Role: Sanskrit morphological segmentation — lemmas, case, number, gender, sandhi splits.
- DCS — Digital Corpus of Sanskrit
-
License: CC-BY 3.0. Cite Hellwig, Oliver (2010–present).
Role: lemmatised, POS-tagged corpus used where coverage exists.
- Cologne C-SALT Sanskrit Dictionaries
-
cceh.github.io/c-salt_sanskrit_data
License: CC-BY-SA.
Role: Monier-Williams, Apte, and 20+ other Sanskrit dictionaries via REST and GraphQL. Provides per-lemma glosses fed to the translator and the judge.
- Skrutable
-
github.com/tylergneill/skrutable
License: MIT.
Role: meter identification per verse (anuṣṭubh, triṣṭubh, etc.) feeding the meter tag in verses.meter.
- Sanscript.js
-
github.com/indic-transliteration/sanscript.js
License: MIT.
Role: client-side script conversion between Devanāgarī, IAST, SLP1, and other Indic scripts.
- Aksharamukha
-
License: MIT / AGPL per component.
Role: server-side script conversion (~120 scripts) for build-time transliterations.
Translation models
- Anthropic Claude Sonnet
-
Used for translation generation. The exact model and version are recorded per row in translations.model and translations.model_version; the prompt hash is recorded in translations.prompt_version.
- OpenAI text-embedding-3-large
-
Used to embed verses and glosses for the semantic-search index (search workstream, V1.x).
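As an illustration of the per-row provenance described for the translation model, here is a minimal sketch of how a prompt hash and a translations row might be assembled. Only the column names `model`, `model_version`, and `prompt_version` come from this page; the hash algorithm (SHA-256, truncated to 16 hex characters), the whitespace normalisation, the `verse_id` and `text` fields, and the example values are assumptions.

```python
import hashlib


def prompt_hash(prompt_template: str) -> str:
    """Hash the prompt template so any wording change yields a new version.

    Assumption: leading and trailing whitespace is ignored, and the digest
    is truncated to 16 hex characters for readability.
    """
    normalized = prompt_template.strip().encode("utf-8")
    return hashlib.sha256(normalized).hexdigest()[:16]


def translation_row(verse_id: str, text: str, model: str,
                    model_version: str, prompt_template: str) -> dict:
    """Build one row carrying the provenance columns named on this page."""
    return {
        "verse_id": verse_id,  # hypothetical key, for illustration only
        "text": text,          # hypothetical column for the translation
        "model": model,
        "model_version": model_version,
        "prompt_version": prompt_hash(prompt_template),
    }


if __name__ == "__main__":
    row = translation_row(
        verse_id="example-verse-1",
        text="(translation text)",
        model="example-model",            # placeholder, not a real model id
        model_version="example-version",  # placeholder
        prompt_template="Translate the following verse: {verse}",
    )
    print(row["prompt_version"])
```

Hashing the template rather than storing it inline keeps rows small while still letting two translations be compared for whether they came from the same prompt.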
Last revised: 2026-05-31 · Source: ATTRIBUTION.md.