Organizing (Clustering) the Indian Lexicon

Indian Lexicon

Organizing (Clustering) the lexemes of all Indian Languages

For a perspective on the philological issues, see:

An introduction: Discovering the language of India ca. 3000 B.C. (65 kb.)

This is a lexicon with a difference. It is a comparative semantic (sic) lexicon of synonyms from Indian languages. See the list of the languages of India surveyed in the Indian Lexicon.

Indian Lexicon goes beyond the concept of a comparative lexicon or an etymological dictionary.

It is a search for synonyms in ancient forms of Indian languages. Hence, it may be called a semantic lexicon rather than an etymological lexicon. This search took the compiler 20 years, working on average 4 hours daily and using a powerful computer processor to compile the database of semantic clusters.

The lexemes of all Indian languages are organized in two major sequences:

Alphabetical sequence

The Dravidian Etymological Dictionary (DEDR) uses the order of the Tamil alphabet to sequence etymological groups: it assumes Proto-Dravidian (PDr) phonemes in the reconstructed roots or stems involved and applies the order of the Tamil alphabet to these phonemes. Tamil phonemes do not, however, serve as PDr reconstructions in all cases. The Indian Lexicon is a first step towards the compilation of an Indian Etymological Dictionary; a reconstruction of the phonemes of Proto-Indic roots or stems has not been attempted. Lexemes from the Indian languages are clustered primarily on semantics and secondarily on phonetics. The Comparative Dictionary of the Indo-Aryan Languages (CDIAL) provides many reconstructions of Proto-Indo-Aryan roots or stems, marked with a prefixed asterisk (*), as has been done in other superb works of philology for the reconstruction of Indo-European etyma.
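Sequencing by the order of the Tamil alphabet amounts to a custom collation: words are compared letter by letter, but by each letter's position in the Tamil alphabet rather than in the English one. The following is a minimal Python sketch of such a sort key, not the procedure actually used by DEDR or the Indian Lexicon. It assumes a Latin transliteration with macrons for long vowels, dots for retroflexes and lines below for ḻ, ṟ, ṉ (which may differ from the Lexicon's own scheme), covers only the Tamil letters, and treats ai and au as single letters; the names `TAMIL_ORDER`, `tokenize` and `tamil_sort_key` are hypothetical.

```python
import unicodedata

# Tamil alphabetical order in an ASSUMED Latin transliteration:
# twelve vowels (ai and au as single letters), then eighteen consonants.
TAMIL_ORDER = [
    "a", "ā", "i", "ī", "u", "ū", "e", "ē", "ai", "o", "ō", "au",
    "k", "ṅ", "c", "ñ", "ṭ", "ṇ", "t", "n", "p", "m",
    "y", "r", "l", "v", "ḻ", "ḷ", "ṟ", "ṉ",
]
RANK = {letter: position for position, letter in enumerate(TAMIL_ORDER)}


def tokenize(word: str) -> list[str]:
    """Split a transliterated word into Tamil letters, trying digraphs first."""
    word = unicodedata.normalize("NFC", word)  # compose e.g. a + macron into ā
    letters, i = [], 0
    while i < len(word):
        if word[i:i + 2] in RANK:  # the digraph letters 'ai' and 'au'
            letters.append(word[i:i + 2])
            i += 2
        elif word[i] in RANK:
            letters.append(word[i])
            i += 1
        else:
            raise ValueError(f"unknown letter {word[i]!r} in {word!r}")
    return letters


def tamil_sort_key(word: str) -> list[int]:
    """Rank of each letter in the Tamil alphabet; lists compare lexicographically."""
    return [RANK[letter] for letter in tokenize(word)]


words = ["kal", "akam", "cevi", "āṟu", "maram", "ai"]
print(sorted(words, key=tamil_sort_key))
# ['akam', 'āṟu', 'ai', 'kal', 'cevi', 'maram']
```

Note that "kal" sorts before "cevi" because k precedes c in the Tamil alphabet, the reverse of English order. The greedy digraph rule is a simplification: a sequence such as a + i spanning a syllable boundary would be read as the single letter ai.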

Semantic sequence

The semantic problem has been handled vigorously, and the Indian Lexicon includes many borrowings among and between languages. This approach has resulted in clustering over 3000 etymological groups of DEDR with the comparative groups of CDIAL, together with thousands of lexemes of Santali, Mundarica and other languages of the Austro-Asiatic linguistic group. Linguists may well hold differing opinions on the semantic development of a language. It is assumed that there were homophones in a Proto-Indic language which was the lingua franca of the Sarasvati-Sindhu civilization, ca. 2500 B.C.; this assumption, coupled with the Mesopotamian links, provides some hope for deciphering the inscriptions of the Sarasvati (Indus) Script.

A note is appended which recounts Prof. Emeneau's postulation of an Indian Linguistic Area, together with brief notes on key dates related to the desiccation of the Sarasvati River (Note on Key dates of the Sarasvati River and the Indian Linguistic Area). This provides the underpinnings for a hypothesis that many entries in the Indian Lexicon are likely to provide the phonemes which were current for a millennium starting circa 3000 B.C. This hypothesis will be tested by an attempt to decipher the inscriptions of the civilization which sustained the Indian Linguistic Area.

The semantic sequence provided in the Indian Lexicon is like a meta-index of English meanings: it uses synonyms or near-synonyms of basic English words while trying to separate English homonyms or near-homonyms. Botanical names (primarily Latin) have been used, after Hooker, to index flora, though some entries are also sequenced in the context of sememes related to cultural processes, e.g. 'food'.

Search facility

In addition to the alphabetical and semantic sequences, a general search facility is also provided. This search can be performed using ANY INDIAN WORD or ANY ENGLISH MEANING. When entering an Indian word (from any language), follow the simple transliteration rules, which will be obvious from a cursory review of the Indian Lexicon clusters.

There are over 8300 semantic clusters in the Indian Lexicon, drawn from over 25 languages, which makes the work very large. Hence, a meta-index has been constructed to make the search faster.
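The speed-up from a meta-index comes from looking a term up directly instead of scanning all the clusters. The Lexicon's own meta-index is not described in detail here, so the following is only a minimal Python sketch of the general idea as an inverted index: every English gloss and every transliterated Indian lexeme maps to the IDs of the clusters that contain it. The cluster IDs and entries below are hypothetical toy data, not taken from the Indian Lexicon, and `build_meta_index`, `normalize` and `search` are illustrative names.

```python
import unicodedata
from collections import defaultdict

# HYPOTHETICAL toy clusters: IDs and entries are invented for illustration.
# Each cluster holds English glosses and (language, lexeme) pairs.
CLUSTERS = {
    101: {"glosses": ["water"], "lexemes": [("Ta.", "nīr"), ("Skt.", "jala")]},
    102: {"glosses": ["fire"], "lexemes": [("Ta.", "tī"), ("Skt.", "agni")]},
    103: {"glosses": ["tree", "wood"], "lexemes": [("Ta.", "maram"), ("Skt.", "vṛkṣa")]},
}


def normalize(term: str) -> str:
    """Canonical lookup form: Unicode NFC, surrounding spaces trimmed, lower case."""
    return unicodedata.normalize("NFC", term).strip().lower()


def build_meta_index(clusters: dict) -> dict[str, set[int]]:
    """Map every English gloss and every Indian lexeme to the clusters containing it."""
    index = defaultdict(set)
    for cluster_id, entry in clusters.items():
        for gloss in entry["glosses"]:
            index[normalize(gloss)].add(cluster_id)
        for _language, lexeme in entry["lexemes"]:
            index[normalize(lexeme)].add(cluster_id)
    return index


def search(index: dict[str, set[int]], term: str) -> list[int]:
    """A single dictionary lookup replaces a scan over every cluster."""
    return sorted(index.get(normalize(term), set()))


index = build_meta_index(CLUSTERS)
print(search(index, "Water"))  # [101]
print(search(index, "maram"))  # [103]
```

Because English meanings and Indian words share one index, a single query box can accept either, as the search facility described above does. The index is built once, and each query then costs one hash lookup regardless of how many clusters there are.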

For the purposes of the preliminary decipherment effort, searching within the Semantic sequence lists using the search facilities on the browser toolbar (Netscape or Internet Explorer) should be adequate.

Munda lexemes in Sanskrit (after Kuiper, 37 kb.)

Sememes (213 kb.)

Etyma in Niruktam (25 kb.)

Roots (Dha_tupa_t.ha) (156 kb.)

Verb forms (After Whitney, 42 kb.)

Phonetic guide (Basic sounds of the language)

Abbreviations : Grammatical

Abbreviations used for linguistic categories and other languages

Bibliography (Textual sources of lexemes)

Lexemes of Epigraphy (281 kb.)