Studying semantic shifts in Sanskrit

Tanay Agrawal


December 16, 2025

Sanskrit

humorous depiction of Pāṇini

कथमर्थं निषेधन्तु श्रुतयः स्मृतयोऽपि वा । यासामेकं पदमपि न चलत्यर्थतो विना ॥ [1]

How could meaning ever be denied by the Śrutis (the Vedas) or Smṛtis, when not even a single word can function without meaning?

In the Sanskrit intellectual tradition, meaning was never incidental. Not even a single word was thought to function without purpose or semantic weight. The śloka above testifies to this: language is treated as intentional, with precise regard for meaning.

This project begins from that same perspective and respect for meaning.

About Semantic Drift

Languages change. Words travel across centuries, genres, and communities, appearing in religious, legal, scientific, casual, and musical contexts, to name a few. As the ways people use them change, so do their meanings. This phenomenon, where a word’s meaning shifts due to changing patterns of usage, is known as semantic drift or semantic change.

A prime example of this is slang, especially in online language. The word cool once meant only moderately cold, but now also means something like interesting or stylish. To ghost someone is not to turn them into a specter, but rather to ignore their texts. (As 2025 draws to a close, I can only imagine historical linguists working overtime analyzing the hundreds of new slang words!)

humorous comic of meaning [2]

Traditional philology can detect such changes through meticulous reading and documentation, but this approach is necessarily limited in scale. In my case, Sanskrit literature spans thousands of texts across millennia, and it would take a human reader forever to track how every occurrence of a word behaves.

A Modern Lens

Computational methods can contribute as tools for semantic observation. One influential idea is distributional semantics, which can be summed up in the distributional hypothesis:

Words that occur in similar contexts tend to have similar meanings.

Word embedding models use this idea by representing words as vectors in a high-dimensional space based on their surrounding words. If the contexts change, the vector shifts. If two words are used similarly, their vectors lie close together. Check out this quick video to visualize this!
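To make the idea concrete, here is a minimal sketch that builds count-based context vectors from a tiny corpus and compares them with cosine similarity. It is only an illustration of the distributional idea, not the embedding model used in this project: the toy English sentences, the window size, and the function names `cooccurrence_vectors` and `cosine` are all assumptions made up for this example.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus, invented purely for illustration.
corpus = [
    "the king rules the land".split(),
    "the queen rules the land".split(),
    "the river floods the field".split(),
]

vecs = cooccurrence_vectors(corpus)
sim_king_queen = cosine(vecs["king"], vecs["queen"])  # identical contexts
sim_king_river = cosine(vecs["king"], vecs["river"])  # partly different contexts
print(sim_king_queen > sim_king_river)
```

Here "king" and "queen" appear in the same contexts, so their vectors point in the same direction, while "river" shares only some context words with "king" and scores lower. Trained embedding models learn dense vectors rather than raw counts, but the intuition carries over.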

How does this apply to semantic drift? We can construct separate vector spaces of word embeddings from different time periods, and then compare the vectors of the same word across periods to measure change. This methodology, introduced by Hamilton et al. [3], serves as the basis for my research.
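One practical wrinkle is that vector spaces trained separately have arbitrary axes, so the same word's raw vectors from two periods cannot be compared directly. The paper cited as [3] addresses this by aligning the spaces with orthogonal Procrustes. The sketch below swaps in a simpler, alignment-free stand-in: it describes a word by its similarities to a few shared anchor words within each period, then compares those similarity profiles. The 2-d vectors, the anchor words, and the names `similarity_profile` and `drift_score` are hypothetical, chosen only to illustrate the comparison step.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_profile(space, word, anchors):
    """Similarity of `word` to each anchor word inside one period's space."""
    return [cosine(space[word], space[a]) for a in anchors]


def drift_score(early, late, word, anchors):
    """1 - cosine between the word's similarity profiles in the two periods."""
    return 1.0 - cosine(
        similarity_profile(early, word, anchors),
        similarity_profile(late, word, anchors),
    )


# Hypothetical embeddings; the numbers are invented for illustration.
early = {
    "cold": [1.0, 0.0],
    "stylish": [0.0, 1.0],
    "cool": [0.9, 0.1],       # close to "cold"
    "chilly": [0.95, 0.05],   # close to "cold"
}
# The late space is rotated by 90 degrees to mimic arbitrary axes.
late = {
    "cold": [0.0, 1.0],
    "stylish": [-1.0, 0.0],
    "cool": [-0.8, 0.3],      # now close to "stylish"
    "chilly": [-0.05, 0.95],  # still close to "cold"
}

anchors = ["cold", "stylish"]
drift_cool = drift_score(early, late, "cool", anchors)
drift_chilly = drift_score(early, late, "chilly", anchors)
print(drift_cool > drift_chilly)
```

Because each profile is computed inside its own space, the rotation between the two periods does not matter: "chilly" keeps the same neighbors and scores near zero, while "cool" moves from "cold" toward "stylish" and scores much higher.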

Research Question(s)

I seek to answer the following questions:

  1. Can diachronic word embeddings capture measurable semantic shift in Sanskrit words across ancient texts?
  2. How do these computationally detected shifts compare with known / hypothesized word changes in Sanskrit literature?

What next?

This blog series will document my project step by step, walking you, the reader, through my process and what I learned along the way.

I'll cover how I constructed my corpus, including cleaning and normalization, some complex Sanskrit-specific considerations (foreshadowing), my modeling choices, and my results.

Follow me along this journey and, lastly, thank you for reading!

References

  1. Wisdom Library, Mahāsubhāṣitasaṃgraha (quote nr. 8477), https://www.wisdomlib.org/sanskrit/quote/mss/subhashita-8477

  2. PBF Comics, https://pbfcomics.com/comics/charged/

  3. William L. Hamilton, Jure Leskovec, and Dan Jurafsky, Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change (ACL 2016), https://aclanthology.org/P16-1141.pdf