Dataset
Every tagged release of sohamhamso publishes a dataset bundle to Zenodo. It contains the same verses, glosses, and translations the site renders — packaged for analysis, archival, and downstream reuse.
What's in it
- CSV: flat tables (texts.csv, verses.csv, word_glosses.csv, translations.csv). Easiest to load into pandas, DuckDB, or a spreadsheet.
- JSON shards: one file per text, with nested verse/word/translation structure preserved.
- TEI-XML: one document per text, structured to interoperate with SARIT and other Indological TEI corpora.
- checksums.sha256: SHA-256 digests of every file in the bundle, signed alongside the release.
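Where the sha256sum tool is not available, the checksums.sha256 manifest can also be checked from Python. The following is a minimal sketch, not part of the bundle: the function names verify_bundle and sha256_of are hypothetical, and it assumes the manifest uses the standard sha256sum line layout (hex digest, whitespace, relative path) with paths relative to the bundle directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large tables need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(bundle_dir, manifest_name="checksums.sha256"):
    """Return the listed paths that are missing or whose digest does not match."""
    root = Path(bundle_dir)
    failed = []
    for line in (root / manifest_name).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Assumed layout: "<hex digest>  <path>"; a leading "*" marks binary mode.
        expected, _, name = line.partition(" ")
        name = name.strip().lstrip("*")
        target = root / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            failed.append(name)
    return failed
```

Calling verify_bundle on the unpacked bundle directory returns an empty list when every listed file matches. This only checks the digests; it does not validate the release signature.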
Loading the data
The CSV bundle loads directly with pandas:
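import pandas as pd

# Load the flat tables from the unpacked bundle directory.
texts = pd.read_csv("texts.csv")
verses = pd.read_csv("verses.csv")
translations = pd.read_csv("translations.csv")

# Look up one text's id by its slug, then select that text's verses.
shiva_sutras_id = texts.loc[texts.slug == "shiva-sutras", "id"].iloc[0]
ss = verses[verses.text_id == shiva_sutras_id]
print(ss.head())
Versioning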
Release tags follow vYYYY.MM.DD. Schema changes are additive within a year; any breaking change is announced in the release notes and the changelog. The Translation Status Contract documents the stability guarantees for the badge-relevant provenance fields (ai_assisted, status, model, model_version, prompt_version, judge_score).
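Downstream code that pins a release may want to know whether a newer tag falls under the same additive-schema window. The sketch below parses vYYYY.MM.DD tags and compares their years. The helper names are hypothetical, and reading "within a year" as "same calendar year in the tag" is an assumption; the release notes and changelog remain the authoritative source for breaking changes.

```python
import re
from datetime import date

# Matches tags of the form vYYYY.MM.DD, e.g. "v2026.05.31".
TAG_PATTERN = re.compile(r"v(\d{4})\.(\d{2})\.(\d{2})")


def parse_release_tag(tag):
    """Convert a vYYYY.MM.DD tag into a datetime.date."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a vYYYY.MM.DD release tag: {tag!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() itself raises ValueError for impossible dates such as month 13.
    return date(year, month, day)


def same_schema_year(pinned_tag, candidate_tag):
    """True when both tags share a year (assumed additive-only window)."""
    return parse_release_tag(pinned_tag).year == parse_release_tag(candidate_tag).year
```

With illustrative tags, same_schema_year("v2026.01.15", "v2026.05.31") is True, while comparing a 2025 tag against a 2026 tag gives False, which signals that the release notes should be read before upgrading.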
Integrity
Every bundle ships with checksums.sha256. Verify before use:
sha256sum --check checksums.sha256
Citation
The Zenodo deposit issues a versioned DOI per release and a concept DOI that always resolves to the latest. Until the first release lands, the DOI is a placeholder.
@dataset{sohamhamso_vYYYY_MM_DD,
author = {sohamhamso contributors},
title = {sohamhamso: Tantric Sanskrit canon dataset},
year = {YYYY},
version = {vYYYY.MM.DD},
doi = {10.5281/zenodo.PLACEHOLDER},
publisher = {Zenodo},
url = {https://doi.org/10.5281/zenodo.PLACEHOLDER},
license = {CC-BY-SA-4.0}
}
Additional citation formats (Chicago, MLA, plain) live on /cite and are regenerated on each release.
License
The dataset is released under CC-BY-SA 4.0: you are free to share and adapt it, including commercially, provided you credit the source and license derivatives under the same terms. Where an upstream source carries stricter terms (e.g., Muktabodha pending-permission), the most restrictive applicable license governs that file. Full details are on the License page; per-source attribution is on Sources.
Where to get it
- Zenodo deposit (concept DOI): TBD on first release
- GitHub release tags: github.com/sohamhamso/sohamhamso/releases
Last revised: 2026-05-31.