Indo-Aryan languages

The Indo-Aryan languages form a subfamily of the Indo-Iranian branch of the Indo-European language family. The more than one hundred Indo-Aryan languages spoken today have around a billion speakers in total, mainly in northern and central India, Pakistan, Bangladesh, Nepal, Sri Lanka and the Maldives. The most important Indo-Aryan languages include Hindi-Urdu, Bengali and the classical language Sanskrit. Romani, the language of the Roma spoken in Europe, is also an Indo-Aryan language. The Indo-Aryan languages are not related to the Dravidian languages spoken mainly in South India, but the two groups have developed numerous common features through millennia of language contact.

Distribution area of some Indo-Aryan languages

Relationships with other languages

Indo-European language family

The Indo-Aryan languages form a sub-branch of the Indo-European language family, which also includes the majority of the languages spoken in Europe. Other branches of the Indo-European language family include the Greek, Romance, Slavic and Germanic languages. The Indo-Aryan languages are thus relatives, albeit distant ones, of German. In the modern languages, traces of this relationship are visible at first glance in only a few words: the Bengali word for "name" is nām, the Hindi word for "new" is nayā and the Marathi word for "cow" is gau. In other cases, such as English wheel and Nepali cakkā (both mean "wheel"), the common origin is real but can only be traced through complicated etymologies. For a large part of the vocabulary, and especially the grammar, of the modern Indo-Aryan languages, however, no equivalent can be found in today's European languages. Between languages spoken in ancient times, such as Sanskrit and Latin or ancient Greek, the similarities are far greater, both in vocabulary and in morphology. Compare forms such as Sanskrit dantam and Latin dentem "the tooth", or Sanskrit abharan and ancient Greek epheron "they carried".

The recognition that Sanskrit is related to the languages of Europe was decisive for the development of comparative linguistics. The Englishman William Jones had learned Sanskrit while he was a judge in Kolkata (Calcutta) and in 1786 was the first to postulate that Sanskrit was related to Greek, Latin, Gothic and Celtic. On this basis, the German linguist Franz Bopp (1791–1867) founded the historical-comparative discipline of Indo-European studies. That the modern Indo-Aryan languages are related to Sanskrit was only recognized later, but scholars then overshot the mark and regarded the Dravidian languages as descendants of Sanskrit as well. It was not until 1856 that Robert Caldwell recognized the independence of the Dravidian language family.

Indo-Iranian branch

Within the Indo-European language family, the Indo-Aryan languages are close to the Iranian languages, which include Persian (Farsi), Kurdish and Pashto and which some linguists also refer to as Irano-Aryan languages. Here, too, the relationship is most evident in the oldest forms of the languages: in Old Persian, the language of the Achaemenid great kings, and in Sanskrit, many words such as daiva and deva "god", būmi and bhūmi "earth" or aspa and aśva "horse" have almost the same form, while the modern languages have diverged. The Indo-Aryan and Iranian languages are grouped together as the Indo-Iranian branch. This branch also includes, as a separate sub-branch, the numerically small group of the Nuristani languages spoken in Afghanistan and Pakistan. The position within Indo-Iranian of the Dardic languages, which are likewise spoken in the extreme northwest of the subcontinent, is uncertain. Whereas in the past they were either combined with the Nuristani languages or viewed as a separate branch, they are now considered a subgroup of the Indo-Aryan languages.

South Asian linguistic area

The Indo-Aryan languages in the context of the language families of South Asia

The Indo-Aryan languages share the Indian subcontinent with three other language families: the Dravidian languages, which are mainly spoken in southern India (the most important representatives being Telugu, Tamil, Kannada and Malayalam), the smaller group of the Munda languages (a branch of the Austroasiatic languages) scattered across central India, and the Tibeto-Burman languages (a branch of the Sino-Tibetan languages) on the northern and eastern edges of the subcontinent. The Indo-Aryan languages are not genetically related to these language families, but through thousands of years of language contact they have strongly influenced one another in vocabulary, morphology and phonetics (the presence of retroflex consonants is particularly characteristic). Because of these numerous common features, the languages can be grouped into a South Asian linguistic area (sprachbund). The interaction between the Indo-Aryan and Dravidian languages has been especially notable: the Dravidian languages adopted words from Sanskrit on a large scale, while they in turn exerted a strong structural influence on the phonetics and syntax of the Indo-Aryan languages.

Language history

The Indo-Aryan languages can look back on a history of almost four millennia. It is divided into three historical stages: Old Indo-Aryan or Old Indian (Vedic, classical Sanskrit), Middle Indo-Aryan or Middle Indian (Prakrit, Pali, Apabhramsha) and New Indo-Aryan or New Indian, the Indo-Aryan languages spoken from around 1000 AD until today.


As members of the Indo-European language family, the Indo-Aryan languages derive from a reconstructed Indo-European parent language, Proto-Indo-European (PIE), which was probably spoken in the steppes of southern Russia in the 4th or 3rd millennium BC. From the Proto-Indo-European population a group split off that referred to itself as "Aryans" (ārya) and spoke a precursor of the Indo-Iranian languages. After presumably staying in Bactria for a while, it split around 2000 BC into an Iranian and an Indo-Aryan branch. The Iranians settled in northern and western Iran; the Indo-Aryans probably migrated to the Indian subcontinent in several waves around 1500 BC. The oldest linguistic evidence of the Indo-Aryans comes from the Hurrian Mitanni empire in northern Mesopotamia and northeastern Syria. Some of the throne names of the Mitanni kings of the 16th to 13th centuries BC are believed to be Indo-Aryan. Gods such as ᵈin-da-ra (equated with Indra) and ᵈmi-it-ra-aš (equated with Mitra) are mentioned in a treaty text. A manual on horse keeping that the Mitannian Kikkuli wrote in the Hittite language in the 15th century BC contains some technical terms borrowed from Indo-Aryan. These traces of early Indo-Aryan in the west of the Near East disappeared again after the fall of the Mitanni empire.

Old Indo-Aryan languages

The Old Indo-Aryan phase begins with the immigration of the Indo-Aryans to India in the 2nd millennium BC. This took place in several waves over a longer period of time. Gradually the Indo-Aryans spread across northern India and displaced the languages of the indigenous population there, though not without being influenced by them as a substrate. There is much to suggest that the Dravidian and Munda languages were once spoken over a much larger area before the Indo-Aryan expansion pushed them back to southern India and to the inaccessible mountain and forest areas of central India. The view popular in India that the Indo-Aryans were autochthonous in India and had already established the Indus culture there is doubtful from a scientific point of view.

Vedic and classical Sanskrit are grouped together as Old Indo-Aryan or Old Indic. Vedic, the language of the Veda scriptures, is the earliest attested form of Indo-Aryan. The dating of the texts, which were handed down orally for a long time, is uncertain, but the oldest hymns of the Rig Veda probably date from shortly after the Indo-Aryans immigrated to India in the middle of the 2nd millennium BC. Vedic is an archaic form of Sanskrit with a greater wealth of grammatical forms and some differences in phonology and vocabulary. The differences from classical Sanskrit roughly correspond to those between the language of Homer and classical ancient Greek. The language of the Brahmanas and Sutras is an intermediate stage between Vedic and classical Sanskrit.

To ensure the understanding and correct recitation of the sacred texts, the sciences of phonetics and grammar developed early in India. They culminated in the work of Panini, who around 300 BC codified the language of the educated upper class in his grammar. The common people were already speaking Middle Indo-Aryan idioms at this time. The term "Sanskrit" (saṃskr̥ta "put together, cultivated") accordingly stands in contrast to the term "Prakrit" (prākr̥ta "natural"), which is used as a cover term for the Middle Indo-Aryan languages. Panini's grammar became normative for classical Sanskrit. Sanskrit was thus preserved as a literary language at an archaic stage and, like Latin in medieval Europe, existed for a long period alongside the Middle Indo-Aryan languages as the language of religion and scholarship. Sanskrit has retained this position in a weakened form to this day. The Indian constitution even recognizes Sanskrit as one of 22 national languages.

The heyday of Sanskrit literature falls in the middle of the 1st millennium AD. A poet like Kalidasa, who probably lived in the 5th century, thus wrote his works at a time when Sanskrit was no longer a spoken language, and followed the rules of a grammarian who had lived 700 years before him. Unlike phonology and morphology, however, syntax was hardly regulated by Panini and could therefore develop, under the influence of the Middle Indo-Aryan languages, peculiarities that were unknown in the early stages of Old Indo-Aryan. Classical Sanskrit is characterized by a preference for passive constructions and by the formation of huge compounds with up to 20 components.

Middle Indo-Aryan languages

The Middle Indo-Aryan languages emerged from Old Indo-Aryan from around 600 BC. Since the spoken forms of Old Indo-Aryan were by no means uniform, the often-repeated statement that certain Middle Indo-Aryan languages "emerged from Sanskrit" is misleading. The development from Old to Middle Indo-Aryan is characterized by a simplification of the morphology and of the phonetic shape of words (e.g. Sanskrit trividya to Pali tevijja). Several Middle Indo-Aryan idioms have been handed down, for which the generic term "Prakrit" is often used. The oldest records of Middle Indo-Aryan, and at the same time the oldest written monuments of India, are the edicts of Emperor Ashoka from the 3rd century BC, written in a number of regional dialects. They survive as stone inscriptions in Brahmi script from different parts of India. The reformist religions of Buddhism and Jainism preferred Prakrit for their writings. Stylized forms of Prakrit were also used in literary poetry, sometimes alongside Sanskrit. Classical Sanskrit drama, for example, is multilingual: the protagonists speak Sanskrit, women speak Sauraseni Prakrit, comic characters speak Magadhi Prakrit, and the lyrical songs are written in Maharashtri Prakrit.

The Middle Indo-Aryan languages can be divided into three phases. The earliest phase is represented by Pali, which, as the language of the Hinayana canon and of much other Buddhist literature, is the most important Middle Indo-Aryan literary language. In Buddhist countries such as Sri Lanka, Burma and Thailand, Pali is considered a classical language. The later Prakrits are divided into a western and an eastern branch. The main form of western Prakrit, Sauraseni, was common in the area of the Ganges and Yamuna rivers. It was also the standard Prakrit of drama and the language of some Jain texts. Magadhi, the language of the land of Magadha in today's Bihar, belonged to the eastern Prakrits. It was also used to characterize the lower classes in Sanskrit dramas. Geographically and linguistically, Ardhamagadhi ("half-Magadhi"), spoken in Kosala (today eastern Uttar Pradesh), occupied an intermediate position. The early Jain canon is written in Ardhamagadhi. Related to it was Maharashtri, the forerunner of today's Marathi. It was mainly used as the language of poetry, including the songs of the Sanskrit dramas. Phonologically it represents the most advanced dialect of the middle phase. Outside India, Niya Prakrit is attested in manuscripts from the 3rd to 7th centuries as the administrative language of Indo-Aryan groups in today's East Turkestan. Related to it is the somewhat older Gandhari, the language of Indo-Aryan manuscripts from Khotan dating to the 1st century.

Around the middle of the 1st millennium, the next stage of Middle Indo-Aryan emerged, which is called Apabhramsha (apabhraṃśa "corrupted language"). The term is used generally for all Indo-Aryan dialects of the late Middle Indo-Aryan phase. Apabhramsha is grammatically simpler than the Prakrits and already represents a transition to New Indo-Aryan. The most important literary language of this period was Nagara Apabhramsha; in addition, there were several regional Apabhramshas, which are already forerunners of today's Indo-Aryan languages.

Sinhalese is a special case: the Sinhalese probably migrated from Gujarat to Sri Lanka around 500 BC, and their language developed on its own, isolated from the other Indo-Aryan languages. A Sinhala Prakrit is attested in inscriptions from the 1st century BC onward. The Sinhalese equivalent of the Apabhramsha phase is Elu.

New Indo-Aryan languages

The transition from Middle to New Indo-Aryan took place around 900–1100 AD. This phase is poorly documented, and the first texts in the New Indo-Aryan languages appear quite late: a short inscription in Marathi and a gloss in Bengali have survived from the 12th century. The oldest literary work in Marathi was written in 1290, in Gujarati in 1394 and in Urdu around 1400.

In the New Indo-Aryan languages, the grammatical development that was already evident in the Middle Indo-Aryan phase is brought to completion. Only remnants of the old inflectional structure are left; instead, the analytic structure prevails, and individual languages develop periphrastic and agglutinating forms. The western languages are generally more conservative than the eastern ones, and the Dardic languages have preserved a particularly large number of archaic elements. Especially in vocabulary, the rule of the Muslim sultans of Delhi and of the Mughals, who used Persian as the court language, and the British colonial era left traces in the Indo-Aryan languages.

Geographical distribution

Assumed Indo-Aryan migration with corresponding chronological assignment, beginning 4500 BC

The main distribution area of today's Indo-Aryan languages covers the northern part of the Indian subcontinent, from the Indus in the west to Assam in the east and from the Himalayas in the north to around the 18th parallel in the south. The Indo-Aryan languages are the largest language family in South Asia. Fifteen of the 22 official languages of India are Indo-Aryan, and three out of four Indians speak an Indo-Aryan language as their mother tongue. An Indo-Aryan language is also the official language in Pakistan, Bangladesh, Nepal, Sri Lanka and the Maldives.

Central India

Distribution areas of the most important Indo-Aryan languages

The official national language of India is Hindi. The number of native speakers depends on the extent to which neighboring related languages or dialects are counted as Hindi or regarded as independent languages. In the narrower sense, Hindi has over 200 million native speakers; on the expanded political definition of the Indian government (see below), there are 420 million. Including second-language speakers, Hindi is spoken by 500 million Indians, and this number is constantly increasing. Standard Hindi is based on Hindustani, a supraregional lingua franca based on Khari Boli, the dialect of Delhi and the surrounding area. It serves as the official language in the northern Indian states of Uttar Pradesh, Bihar, Jharkhand, Chhattisgarh, Madhya Pradesh, Rajasthan, Haryana, Uttarakhand and Himachal Pradesh as well as in the Union Territory of Delhi, and the population uses it as a written language. A number of closely related regional languages, some of which are also classified as Hindi dialects, are spoken in this central Indian area. These are divided into two groups: "West Hindi" or West-Central Indian (Haryanvi, Braj-Kanauji, Bundeli) and "East Hindi" or East-Central Indian (Awadhi, Bagheli, Chhattisgarhi).

For political reasons, the Indian government classifies as "Hindi dialects" a number of other languages that are independent from a linguistic point of view, belong to different subgroups of Indo-Aryan and in some cases even have their own written language. These are the languages of the East Indian Bihari group (with Bhojpuri, Maithili and Magahi), the West Indian Rajasthani languages and, in the north, the North Indian Pahari languages spoken on the edge of the Himalayas. This definition is not linguistic but purely politically motivated; the aim is to expand Hindi into a true national language. As a media and prestige language, however, Hindi is increasingly influencing other Indo-Aryan languages.

Urdu, the language of the Indian and Pakistani Muslims, and Hindi are almost identical in everyday speech; both are based on Hindustani and are not even different dialects. Written Urdu differs through a high proportion of words of Persian and Arabic origin and the use of the Arabic script. Despite its 65 million speakers (105 million including second-language speakers), Urdu is a language without a territorial base. The majority of its speakers belong to the Muslim urban population of northern India, and an Urdu dialect known as Dakhini is also widespread in southern Indian cities such as Hyderabad. In Pakistan, Urdu is spoken as a mother tongue by only a small fraction of the population (around 10 million). These are the descendants of immigrant North Indian Muslims, who have spread all over the country, are economically and politically very active and live almost exclusively in the cities. Urdu soon established itself as a supraregional lingua franca and language of education, and it is the official national language of Pakistan, which is why the number of Urdu speakers is steadily increasing.


The East Indian languages include the above-mentioned Bihari group (65 million speakers in total), whose main languages Bhojpuri, Maithili and Magahi are spoken in Bihar, between the central Indian idioms and Bengali. Bengali (the second largest Indo-Aryan language, with 210 million speakers) is the language of the Indian states of West Bengal and Tripura as well as of Bangladesh. Some of the Bengali varieties (Chittagong, Sylhetti and Rajbangsi) are also classified as languages in their own right. To the northeast, Assamese (Asamiya) is spoken by 15 million people in the state of Assam.

The language of the state of Orissa, located on the east coast of India, is Oriya, which is spoken by 32 million people. In the forest and mountain areas of central India, Bhatri and Halbi, two Indo-Aryan transitional dialects, are spoken alongside the non-Indo-Aryan languages of the Adivasi tribal population.

South and west

Marathi is widespread in the northwestern Deccan in the state of Maharashtra and has a total of 80 million speakers. Closely related to Marathi is Konkani (8 million speakers), which is the official language in Goa and is spoken in the extreme south of Maharashtra as well as on the coast of Karnataka and Kerala.

In the tribal areas of North Maharashtra, East Gujarat and South Rajasthan, Bhili and Khandeshi are spoken, two Indo-Aryan languages that were previously considered to be Gujarati dialects. Gujarati, which adjoins them to the west, has 45 million speakers and is spoken in the state of Gujarat and by part of the population of Mumbai (Bombay). To the north are the languages of Rajasthan, the so-called Rajasthani group, with the languages Marwari (15 million), Malvi, Bagri, Lambadi and Nimadi, each with 1 to 2 million speakers.

The Sindhi language area (22 million speakers) begins in western Gujarat and continues across the Pakistani border into Sindh province on the lower Indus. Closely related to Sindhi is the group of so-called Western Punjabi dialects, which is also known as the Lahnda group. Of the Lahnda dialects, Siraiki has established itself as a written language, and Hindko is another Western Punjabi language. A total of around 80 million people speak Lahnda, Hindko or Siraiki. Punjabi proper (Eastern Punjabi) has a total of 30 million speakers and is widespread in the north of the Pakistani Indus Valley and in the Indian part of the Punjab. Dogri-Kangri (2.2 million speakers) is spoken in the Jammu area in the Indian Union Territory of Jammu and Kashmir. It was previously regarded as a Punjabi dialect but belongs to a separate branch and is now officially recognized as an independent language in India.


To the north of the Hindi-speaking area, Nepali is spoken by 16 million people. It is the national language of Nepal and is also common in Sikkim, Darjeeling and parts of Bhutan. Other important North Indian languages are Garhwali and Kumauni, each with around 2 million speakers. They are spoken in the foothills of the Himalayas west of the Nepali language area (the so-called West Pahari languages).

In the extreme northwest of the subcontinent lies the distribution area of the Dardic languages. The most important of these is Kashmiri, spoken in the Kashmir Valley by 4.5 million people, the only Dardic literary language. The other Dardic languages (including Pashai, Khowar, Kalasha, Shina and Indus-Kohistani) are spoken by a total of 1.2 million people in the Hindu Kush region of Pakistan and Afghanistan.


Sinhalese (Sinhala) is spatially separated from the rest of the Indo-Aryan language area. It is spoken by the majority of the Sri Lankan population (15 million speakers). Divehi, the language of the Maldives, has 300,000 speakers and is closely related to Sinhala.

A special case is Romani (Romanes), the language of the Roma, which has around 3.5 million speakers and is scattered in numerous dialects across the countries of Europe and the Near East. Related to Romani (which also includes the Sinti dialect widespread in Germany) are the idioms Domari and Lomavren, which are likewise spoken outside India, in the Middle East and Europe.

As a result of more recent migration during the British colonial period, Indo-Aryan languages are also spoken in considerable numbers in, among other places, the Caribbean, Guyana, South Africa, the United Kingdom, Mauritius and Fiji. In Fiji, a variant of Hindustani is even used as an official language.

Classification of the New Indo-Aryan languages


An internal classification of the New Indo-Aryan languages, which have been spoken since about AD 1000, runs into many problems. Ideally, a family tree reflects the genetic splitting off of language groups that have diverged over time because of spatial distance. In principle, this process also took place in the Indo-Aryan languages, but because of various migration movements it can no longer be clearly traced historically. The recurring migrations and the associated mixing are due to the near absence of natural barriers in the Indian heartland and to unstable political units with multiethnic and multilingual societies. These processes ultimately produced a dialect continuum that extends across the entire Indo-Aryan language area from west to east and from north to south.

The consequences are great difficulties, first, in identifying individual languages; second, in distinguishing dialect from language; and finally in classification, that is, in the internal structure of the New Indo-Aryan languages as a whole. A further complicating factor is that the transition from late Middle Indo-Aryan to early New Indo-Aryan around 900 to 1100 AD is only very weakly documented in writing. This makes it almost impossible to trace the New Indo-Aryan languages back to particular Middle Indo-Aryan languages and thus to arrive at a natural grouping of the New Indo-Aryan languages.

Since a simple, well-founded family tree model is not easy to achieve, attempts have been made to understand the structure of the New Indo-Aryan languages with the help of the wave model. This approach examines innovations that emanate from certain centers, have spread through sub-areas of the New Indo-Aryan languages over time and can be mapped as isoglosses. The phenomenon of prestige languages also plays a major role here: their characteristics and innovations have increasingly been transferred to neighboring languages through contact. (Indo-Aryan prestige languages with this function were Vedic, Sanskrit, Magadhi, Sauraseni, Apabhramsha and, nowadays, Hindi.) The problem with applying the wave theory is that different isoglosses lead to completely different classifications, so that this model does not yield a classification.

Historical approaches

Attempts at classification in the classical genetic sense have been made since the early 19th century, but Hoernle (1880) was the first to give an overview that is based on a large number of New Indo-Aryan languages and can therefore be compared with more modern versions. Hoernle's main division is into a north-western and a south-eastern group, which he traces back to waves of immigration separated in time:

  • New Indo-Aryan languages (after Hoernle 1880)
    • North-West group
      • Northern group: Nepali, Kumauni, Garhwali and others
      • Western group: Sindhi, Punjabi, Gujarati, Hindi and others
    • South-East group
      • Eastern group: Bihari, Bengali, Oriya and others
      • Southern group: Marathi, Konkani

The basic structure of this classification, which is also geographically based, was adopted by many later researchers, but the thesis of different immigration flows was soon rejected. George Abraham Grierson took the next important step in his Linguistic Survey of India (1903–28), which is still an important working basis today. He started from a concept of "outer" and "inner" New Indo-Aryan languages. To the inner group he assigned the Pahari group, Punjabi, Rajasthani, Gujarati and Hindi; to the outer group he assigned the eastern group (Bengali, Assamese, Oriya), the southern group (Marathi, Konkani, Sinhala) and a northwestern group (Lahnda, Sindhi). In between he positioned a "middle" group of transitional languages (e.g. Awadhi, Chhattisgarhi). Neither the inner-outer concept nor the migration thesis held up.

Chatterji then presented a new classification in 1926, which essentially corresponds to today's approaches. Although the groups again have geographical names, Chatterji starts from linguistic features and certain phonetic isoglosses and thus arrives at the following non-hierarchical classification:

  • New Indo-Aryan languages (after Chatterji 1926)
    • North: Pahari, Nepali
    • Northwest: Lahnda, Punjabi, Sindhi
    • Southwest: Rajasthani, Gujarati
    • Central: Hindi and related languages
    • East: Bihari, Bengali, Assamese, Oriya
    • South: Marathi, Sinhala

In 1931 Grierson revised his original approach and arrived at an internal structure very similar to Chatterji's. The classifications by Turner (1960), Katre (1965) and Cardona (1974) are each well-founded variants of the Chatterji approach.

Special cases: Dardic, Romani, Sinhalese

While an approximate consensus was gradually reached on the core of the New Indo-Aryan languages, albeit without a classification accepted in every detail, disputes about the marginal groups lasted even longer, namely Dardic, the Gypsy languages (Romani, Domari), and Sinhala and Maldivian (Divehi). The latter two have either been assigned to the southern group (Marathi, Konkani) or treated as a separate group.

In the case of Dardic, it has not yet been definitively clarified which languages belong to it. Whereas the Nuristani languages were originally included, the majority of researchers now tend to treat Nuristani as a third branch of Indo-Iranian on an equal footing with Iranian and Indo-Aryan, and no longer assign it to the Dardic languages. The position of the (remaining) Dardic languages (the most important being Kashmiri) within New Indo-Aryan is still in dispute. While some researchers regard Dardic as a sub-branch of Northwest Indian (for example together with Lahnda and Sindhi), the view that it forms an independent branch of Indo-Aryan is gaining ground.

The assignment of Romani and the other Gypsy languages, whose affiliation with Indo-Aryan was recognized in the mid-19th century, is particularly difficult. The modern account of Romani and its dialects by Matras (2002) positions it close to Hindi within the Central Indian branch; earlier views tended towards Northwest Indian because of some similarities in phonetics. The question has not yet been finally settled. This article presents Romani, along with Domari and Lomavren, as a separate branch of New Indo-Aryan.

Thus, the classification given here largely coincides with that of Gippert in the Metzler Lexikon Sprache (2nd edition 2000), which he describes as "currently the best working hypothesis". A stable classification of the New Indo-Aryan languages that is valid for all time will probably never exist, but large deviations from the model presented here are not to be expected either.

Main branches of New Indo-Aryan

The following list shows the main branches of New Indo-Aryan with the most important languages. The complete classification of all New Indo-Aryan languages is presented in the next section.

Classification of the New Indo-Aryan languages

For each main branch of New Indo-Aryan, the following overview gives the structural breakdown and the associated languages with their current numbers of speakers. David Dalby's The Linguasphere Register (2000) was the main source for identifying languages (and distinguishing them from dialects). It should be noted that the units listed are actually "languages" and not just "dialects"; each language mentioned usually has several dialects. There are usually transitional dialects between neighboring languages, and assigning these is naturally problematic. For a complete and detailed listing of dialects and sub-dialects, see the web link below on which this classification is based. The numbers of speakers come mainly from Ethnologue (15th edition, 2005); for larger languages, statistical yearbooks and additional sources were consulted for verification.

Main branches in capital letters, genetic subgroups in bold, language names in normal type.

DARDIC (23 languages with 5.7 million speakers)

  • Kashmiri
  • Shina
    • Shina (500 thousand), Brokshat (Brokskat, Brokpa) (3 thousand), Ushojo (2 thousand), Dumaki (500) [also considered a Domari dialect],
      Phalura (Dangarik) (10 thousand), Sawi (Sow) (3 thousand)
  • Kohistani
    • Indus Kohistani (220 thousand), Kalami Kohistani (Bashkarik, Garwi) (40 thousand), Torwali (60 thousand), Kalkoti (4 thousand),
      Bateri (30 thousand), Chilisso (3 thousand), Gowro (200), Wotapuri-Katarqalai (2 thousand), Tirahi (100)
  • Chitral
    • Khowar (Chitrali) (240 thousand), Kalasha (5 thousand)
  • Kunar
    • Pashai (110 thousand), Gawarbati (10 thousand), Dameli (5 thousand), Shumasti (1 thousand)

NORTH INDIAN or PAHARI (3 languages ​​with 21 million speakers)

  • Western Pahari
    • Garhwali (2.2 million), Kumauni (2.4 million)
  • Eastern Pahari

NORTH WEST INDIAN (20 languages ​​with 135 million speakers)

  • Dogri-Kangri
    • Dogri-Kangri (2.2 million), Gaddi (Bhamauri) (120 thousand), Churahi (110 thousand), Bhattiyali (100 thousand), Bilaspuri (300 thousand),
      Kinnauri-Harijani (6 thousand), Chambeali (130 thousand), Mandeali (800 thousand), Mahasu-Pahari (650 thousand), Jaunsari (100 thousand),
      Kului (110 thousand), Bhadrawahi-Pangwali (90 thousand), Pahari-Potwari (200 thousand)
  • Lahnda
    • Hindko (3 million), Lahnda (West Punjabi) (45 million), Siraiki (South Punjabi, Multani) (30 million)
  • Punjabi
    • Punjabi (East Punjabi) (30 million)
  • Sindhi
    • Sindhi (22 million), Kachchi (850 thousand), Jadgali (100 thousand)

WEST INDIAN (13 languages ​​with 78 million speakers)

  • Rajasthani
    • Marwari (15 million), Harauti (600 thousand), Goaria (25 thousand);
      Malvi (1.2 million), Nimadi (1.4 million), Gujari (Gujuri) (1-2 million),
      Bagri (1.8 million), Lambadi (Lamani) (2.8 million), Lohari (a few thousand)
  • Gujarati
    • Gujarati (45 million), Vasavi (1 million), Saurashtri (300 thousand)
  • Bhili-Khandeshi
    • Bhili (6 million), Khandeshi (2.5 million)

CENTRAL INDIAN (14 languages with 320 million speakers; 655 million including second-language speakers, S2)

  • West
    • Hindi (200 million, with S2 490 million), Urdu (60 million, with S2 105 million),
      Braj-Kanauji (6 million), Haryanvi (Bangaru) (13 million), Bundeli (8 million),
      Gowli (35 thousand), Chamari (5 thousand), Sansi (10 thousand), Ghera (10 thousand), Bhaya (700)
  • East
    • Awadhi (21 million), Bagheli (400 thousand);
      Chhattisgarhi (12 million), Dhanwar (15 thousand)

EAST INDIAN (26 languages ​​with 347 million speakers)

  • Bihari
    • Bhojpuri (26 million), Maithili (25 million), Magahi (12 million), Sadri (2 million),
      Oraon Sadri (200 thousand), Angika (750 thousand), Bote-Majhi (10 thousand)
  • Oriya
    • Oriya (32 million), Adiwasi Oriya (300 thousand), Halbi (800 thousand)
  • Tharu
    • Rana Thakur Tharu (270 thousand), Saptari Tharu (250 thousand), Chitwania Tharu (80 thousand),
      Deokri Tharu (80 thousand), Mahotari Tharu (30 thousand), Buksa (45 thousand)
  • Assamese-Bengali
    • Assamese (Asamiya) (15 million)
    • Bengali (210 million), Chittagong (14 million), Sylhetti (5 million), Rajbangsi (2.4 million),
      Chakma (600 thousand), Bishnupriya Manipuri (75 thousand), Hajong (20 thousand)

SOUTH INDIAN (4 languages ​​with 89 million speakers)

  • Marathi
    • Marathi (80 million)
  • Konkani
    • Konkani (8 million), Bhil-Konkani (600 thousand), Varli (500 thousand)

SINHALA-DIVEHI (2 languages ​​with 13.2 million speakers)

  • Sinhala (Sinhalese) (13 million), Divehi (Maldivian) (300 thousand)

ROMANI-DOMARI (3 languages ​​with 4 million speakers)

  • Romani (3.5 million), Domari (500 thousand), Lomavren (100 thousand?)

NOT CLASSIFIED (8 languages with 220,000 speakers)

In addition to the classified New Indo-Aryan languages, there are some unwritten languages which so far could not be assigned to any of the main branches; the languages mentioned here are nevertheless undoubtedly Indo-Aryan. Some of them may be dialects of classified languages. No linguistic studies, let alone grammars, have been published for any of these languages. They are:

  • Tippera (100 thousand speakers), Kanjari (50 thousand), Od (50 thousand), Usui (5 thousand), Vaagri Booli (10 thousand),
    Darai (7 thousand), Kumhali (1 thousand), Chinali (1 thousand).
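As a rough plausibility check, the branch totals quoted in the headings of the overview above can be added up. The following Python sketch only copies those heading figures (branch names lightly normalized; the dictionary name BRANCHES and the function totals are illustrative, not part of any source) and sums them. Under these figures the result is 116 languages and roughly 1,013 million first-language speakers, which is consistent with "over a hundred" languages and "around a billion" speakers.

```python
# (number of languages, speakers in millions) per main branch,
# copied from the branch headings of the overview; names are normalized.
BRANCHES = {
    "Dardic": (23, 5.7),
    "North Indian (Pahari)": (3, 21),
    "North West Indian": (20, 135),
    "West Indian": (13, 78),
    "Central Indian": (14, 320),   # first-language speakers only
    "East Indian": (26, 347),
    "South Indian": (4, 89),
    "Sinhala-Divehi": (2, 13.2),
    "Romani-Domari": (3, 4),
    "Not classified": (8, 0.22),
}


def totals(branches):
    """Return (total languages, total speakers in millions)."""
    languages = sum(count for count, _ in branches.values())
    speakers = sum(millions for _, millions in branches.values())
    return languages, speakers


languages, speakers = totals(BRANCHES)
print(f"{languages} languages, about {speakers:.0f} million speakers")
```

The sum is only as reliable as the rounded heading figures; second-language speakers (S2) are deliberately left out.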

Linguistic features


The phoneme inventory of the Indo-Aryan languages has remained fairly stable across the various stages of the language. Characteristic sounds such as the retroflex and aspirated consonants occur in Old, Middle and almost all New Indo-Aryan languages. On the other hand, the various stages have seen profound changes, especially in the distribution of sounds within the word, so that the phonetic shape of words has in some cases changed considerably.


Characteristic of the consonant system of the Indo-Aryan languages is a large number (usually 20) of plosives, which are differentiated by five places of articulation (velar, palatal, retroflex, dental and labial). The contrast between retroflex and dental t (cf. Hindi totā "parrot" and ṭoṭā "deficiency") is typical of the languages of South Asia. Although c and j are traditionally classified as plosives, in practice they are pronounced more like affricates, i.e. [ʧ] and [ʤ]. The distinction between voiced and voiceless (e.g. p vs. b) is just as important as aspiration, which occurs with both voiceless and voiced plosives (e.g. p, b vs. ph, bh). According to the description of traditional Indian grammar, each of the five rows of plosives has a homorganic nasal (one articulated at the same place). This results in the following system of plosives and nasals (given in IAST transcription with the sound value in IPA):

            Plosives                                                          Nasals
            voiceless   voiceless aspirated   voiced    voiced aspirated
Velar       k [k]       kh [kʰ]               g [g]     gh [gʱ]               ṅ [ŋ]
Palatal     c [c]       ch [cʰ]               j [ɟ]     jh [ɟʱ]               ñ [ɲ]
Retroflex   ṭ [ʈ]       ṭh [ʈʰ]               ḍ [ɖ]     ḍh [ɖʱ]               ṇ [ɳ]
Dental      t [t̪]       th [t̪ʰ]               d [d̪]     dh [d̪ʱ]               n [n̪]
Labial      p [p]       ph [pʰ]               b [b]     bh [bʱ]               m [m]
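The regularity of this system can be made explicit with a small sketch. The Python snippet below is an illustration only: it assumes the IAST convention that aspiration is written by appending h, and the names PLACES, consonant_row and consonant_grid are made up for this example. It derives each row of five signs from three base letters per place of articulation.

```python
# Base letters per place of articulation: (voiceless, voiced, nasal) in IAST.
PLACES = {
    "velar": ("k", "g", "\u1e45"),                 # k, g, ṅ
    "palatal": ("c", "j", "\u00f1"),               # c, j, ñ
    "retroflex": ("\u1e6d", "\u1e0d", "\u1e47"),   # ṭ, ḍ, ṇ
    "dental": ("t", "d", "n"),
    "labial": ("p", "b", "m"),
}


def consonant_row(place):
    """Return [voiceless, voiceless aspirated, voiced, voiced aspirated, nasal]."""
    voiceless, voiced, nasal = PLACES[place]
    # IAST writes aspiration by adding "h" to the plain plosive.
    return [voiceless, voiceless + "h", voiced, voiced + "h", nasal]


def consonant_grid():
    """Build the full 5 x 5 grid of plosives and nasals."""
    return {place: consonant_row(place) for place in PLACES}


for place, row in consonant_grid().items():
    print(f"{place:<10}", " ".join(row))
```

Five places times four plosive types gives the 20 plosives mentioned in the text, plus one homorganic nasal per row.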

Some peripheral Indo-Aryan languages have simplified this system. In Sinhala, aspiration has been lost (probably under Tamil influence), while Asamiya has no retroflex sounds. Other languages have developed additional phonemes, Sindhi for example the implosives [ɠ], [ʄ], [ɗ] and [ɓ]. As far as the nasals are concerned, originally only m, the dental n and the retroflex ṇ were independent phonemes, and the distinction between the last two is not maintained in all modern languages. The sounds ṅ and ñ are mostly positional allophones that appear only before the corresponding plosives, but in some languages they have acquired phoneme status secondarily.

As liquids, classical Sanskrit had the trill r [r] and the lateral l [l]. Other Indo-Aryan languages have expanded their phoneme inventory in this area: a retroflex lateral [ɭ] already appears in Vedic and later in Oriya, Marathi, Gujarati and Punjabi, among others. Hindi, Bengali, Punjabi and Sindhi have the retroflex flap [ɽ]. Old Indo-Aryan had four fricatives: the three sibilants ś [ɕ], ṣ [ʂ] and s [s] as well as h [ɦ]. In the modern languages the three original sibilants have merged into a single sound, mostly [s] in the west and [ʃ] in the east. Usually, however, a distinction between [s] and [ʃ] has been reintroduced through loanwords. The semivowels are y [j] and v [ʋ].

In addition to these original Indo-Aryan consonants, many New Indo-Aryan languages have adopted new phonemes through loanwords from Persian and English, namely [f], [z], [x], [ɣ] and [q]. In all languages except Urdu, however, the position of these phonemes is not very stable; in careless pronunciation they are often replaced by similar-sounding native sounds, e.g. philm instead of film.


The number of vowel phonemes in most New Indo-Aryan languages is between six and ten. Romani has only five vowels, while Sinhala has a system of 13 vowels, based primarily on distinctions of vowel length. For the Dardic languages and certain Marathi dialects, systems with up to 18 vowels have been described, but these have not yet been adequately researched.

The vowel systems of the main Indo-Aryan languages ​​are as follows:

Language          Vowel phonemes
Marathi, Nepali   /i, e, a, ə, o, u/
Oriya             /i, e, a, ɔ, o, u/
Bengali           /i, e, æ, a, ɔ, o, u/
Asamiya           /i, e, ɛ, a, ɒ, ɔ, o, u/
Gujarati          /i, e, ɛ, a, ə, ɔ, o, u/
Hindi, Punjabi    /i, ɪ, e, æ, a, ə, ɔ, o, ʊ, u/

Note: The short a sound can be rendered as [ ʌ ] or [ ə ].

The symmetrical ten-vowel system of Hindi and Punjabi is closest to that of Sanskrit. In Sanskrit, however, the difference between pairs like i / ī lay primarily in the length of the vowel: [i] / [iː]. In the New Indo-Aryan languages this quantitative difference has been replaced by a qualitative one: [ɪ] / [i]. It is possible, however, that a qualitative difference accompanied the length distinction from the start. At least for short a [ə] and long ā [aː], a difference in vowel quality is already described in the oldest grammars. In addition, Sanskrit had the syllabic consonants ("consonantal vowels") r̥, r̥̄ and l̥. The last two are very rare, but r̥ also occurs in modern languages in Sanskrit loanwords and is now pronounced [rɪ] or [ru] depending on the region (e.g. r̥ṣi [rɪʃɪ] "rishi").

The phonemes [æ] and [ɔ] in Hindi and Punjabi go back to the diphthongs [ai] and [au] and are still pronounced as such in some dialects. While these two diphthongs are phonemic in Sanskrit, the numerous vowel sequences of the New Indo-Aryan languages are not regarded as separate phonemes.

In most New Indo-Aryan languages, the oral vowels contrast with nasal vowels (e.g. Hindi cā̃d "moon"). Sanskrit also has a nasalization called anusvara (ṃ) (e.g. māṃsa "meat"), but it occurs only in predictable contexts and, unlike the nasal vowels of the modern languages, is therefore not phonemic. The same applies to the voiceless aspirate visarga (ḥ) of Sanskrit, which usually occurs at the end of a word and goes back to s or r (cf. the nominative ending -aḥ in Sanskrit with Greek -os and Latin -us).


The oldest form of Indo-Aryan, Vedic, had a pitch accent that corresponded to that of ancient Greek (cf. Vedic pā́t, padáḥ with ancient Greek poús, podós "foot"). The accent could fall on any syllable of the word and was pronounced with a high pitch (udātta). In classical Sanskrit, the pitch accent changed to a dynamic (stress) accent based on loudness, as also found in German. The position of the accent did not match that of the old pitch accent but, as in Latin, fell on the second, third or fourth syllable from the end according to predictable rules. In the New Indo-Aryan languages the accent follows different rules but never distinguishes meaning. Asamiya is an exception (cf. ˈpise "he drinks" and piˈse "then").

Panjabi is a special case in being a tonal language. The three different tones (e.g. koṛā "whip", kóṛā "leper", kòṛā "horse") arose secondarily under the influence of a formerly aspirated consonant (cf. Panjabi kòṛā with Hindi ghoṛā).

Historical phonology

The Old Indo-Aryan languages had a complicated phonology that is still very close to the Indo-European type. The main points in which Sanskrit differs from reconstructed Proto-Indo-European are as follows:

  • Merger of *a, *e and *o into a (cf. Latin agit with Sanskrit ajati "he drives", ancient Greek esti with Sanskrit asti "he is" and ancient Greek posis with Sanskrit patiḥ "husband, master")
  • Change of the syllabic nasals *n̥ and *m̥ to a (cf. Latin in- and German un- with Sanskrit a-)
  • Monophthongization of *ai and *au to e and o (cf. ancient Greek oida with Sanskrit veda "I know")
  • Merger of the labiovelars *kʷ, *gʷ and *gʷʰ with the velars k, g, gh; before original front vowels these change to the palatals c, j (cf. Latin -que and Sanskrit ca "and")
  • Change of the palatovelars *ḱ, *ǵ and *ǵʰ to ś, j and h (cf. Latin centum with Sanskrit śatam "hundred"); Sanskrit is thus one of the satem languages
  • Creation of a voiceless series of aspirated consonants in addition to the voiced one
  • Emergence of the retroflexes under the influence of non-Indo-Aryan languages

Complex consonant clusters occur word-initially and word-internally in Sanskrit (e.g. jyotsna "moonlight"). In contrast, words can end only in certain consonants, and clusters of several consonants do not usually occur there (cf. Latin vox and Avestan vāxš with Sanskrit vāk "voice"). When sounds meet within a word or at the boundary between two words, sandhi phenomena occur (e.g. na uvāca becomes novāca "he did not say").

In the Middle Indo-Aryan period the phonology was simplified considerably. The sandhi rules fell out of use, and the phoneme inventory was slightly reduced. The most important change in the Middle Indo-Aryan languages was the radical simplification of the syllable structure to a type resembling that of the Dravidian languages: consonant clusters were no longer possible at the beginning of a word, only certain easily pronounced clusters (geminate consonants or clusters with a nasal as first component) were allowed inside the word, and no consonants were allowed at the end of a word except the nasal ṃ. The most important sound changes in Middle Indo-Aryan are:

  • Reduction of consonant clusters at the beginning of a word (e.g. Sanskrit prathama "first", skandha "shoulder" to Pali paṭhama, khandha)
  • Assimilation of consonant clusters within the word (e.g. Sanskrit putra "son", hasta "hand" to Pali putta, hattha)
  • Loss of final consonants (e.g. Sanskrit paścāt "back" to Pali pacchā); only -m and -n are retained, as the nasal anusvara (e.g. kartum "to make" to Pali kattuṃ)
  • Merger of the sibilants ś, ṣ and s (e.g. Sanskrit deśa "country", doṣa "fault" and dāsa "servant" to Pali desa, dosa, dāsa)
  • Loss of the syllabic consonants r̥, r̥̄ and l̥ (e.g. Sanskrit pr̥cchati "he asks" to Pali pucchati)
  • Monophthongization of the diphthongs ai, au and the sequences aya, ava to e and o (e.g. Sanskrit auṣaḍha "medicinal herb", ropayati "he plants" to Pali osaḍha, ropeti)
  • In the later phase, intervocalic consonants are lost (e.g. Sanskrit loka "world" to Prakrit loa) and aspirated intervocalic consonants become h (e.g. Sanskrit kathayati "he tells" to Prakrit kahei). This creates sequences of two vowels that were not allowed in Sanskrit.
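Some of the changes listed above can be illustrated with a deliberately tiny rule set. The Python sketch below is not a general Sanskrit-to-Pali converter: the cluster table CLUSTERS is hand-picked to cover only the examples quoted in the list, and the function name to_pali is invented for this illustration. It applies cluster assimilation, the merger of the sibilants, and the treatment of final consonants, in that order.

```python
# Hand-picked cluster assimilations, limited to the examples in the text.
CLUSTERS = [
    ("\u015bc", "cch"),  # śc -> cch  (paścāt -> pacchā)
    ("st", "tth"),       # st -> tth  (hasta -> hattha)
    ("tr", "tt"),        # tr -> tt   (putra -> putta)
    ("rt", "tt"),        # rt -> tt   (kartum -> kattuṃ)
]
SIBILANTS = {"\u015b": "s", "\u1e63": "s"}   # ś, ṣ merge with s
VOWELS = set("aiueo\u0101\u012b\u016b")      # a i u e o ā ī ū
ANUSVARA = "\u1e43"                          # ṃ


def to_pali(word):
    """Apply a few Middle Indo-Aryan sound changes to a Sanskrit word."""
    # 1. Assimilate word-internal clusters (before the sibilant merger,
    #    so that the cluster "śc" is still recognizable).
    for old, new in CLUSTERS:
        word = word.replace(old, new)
    # 2. Merge the three sibilants into s.
    for old, new in SIBILANTS.items():
        word = word.replace(old, new)
    # 3. Final -m / -n survive as anusvara; other final consonants are lost.
    if word and word[-1] in "mn":
        word = word[:-1] + ANUSVARA
    elif word and word[-1] not in VOWELS:
        word = word[:-1]
    return word


for sanskrit in ["putra", "hasta", "pa\u015bc\u0101t", "kartum", "de\u015ba", "do\u1e63a"]:
    print(sanskrit, "->", to_pali(sanskrit))
```

Word-initial cluster reduction, the loss of the syllabic consonants and monophthongization are left out, since they depend on context that a simple replacement table cannot capture.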

In the New Indo-Aryan languages, the syllable structure has moved away from the simple Middle Indo-Aryan type because of the loss of short vowels. Consonants appear at the end of a word much more often than in Sanskrit, and consonant clusters are again possible inside the word. This development is reinforced by loanwords from non-Indian languages. Many New Indo-Aryan languages have undergone special developments that cannot be discussed in detail here. Central features that characterize the phonology of the majority of the New Indo-Aryan languages are:

  • Loss of short vowels at the end of a word (e.g. Prakrit phala "fruit" to Hindi, Nepali phal, Bengali, Asamiya phɔl, but Oriya phɔlɔ)
  • Loss of unstressed short vowels within the word (e.g. Prakrit sutthira "solid" to Hindi suthrā, Prakrit gaddaha "donkey" to Bengali gādhā), creating numerous new consonant clusters. With polysyllabic stems, this sometimes leads to alternations (e.g. Hindi samajh-nā "to understand" and samjh-ā "understood").
  • Simplification of geminate consonants with compensatory lengthening of the preceding vowel (e.g. Prakrit satta "seven" to Hindi, Marathi sāt, Bengali ʃāt, but Panjabi satt)
  • Replacement of a nasal before a plosive by lengthening and nasalization of the preceding vowel (e.g. Prakrit danta "tooth" to Hindi, Bengali dā̃t, but Panjabi dand)
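Three of these developments (geminate simplification with compensatory lengthening, replacement of a nasal before a plosive, and loss of final short vowels) are regular enough to sketch as ordered rewrite rules. The Python snippet below is a toy model under simplifying assumptions: it handles only the short vowels a, i, u, ignores the stress-dependent loss of medial vowels, and models the Hindi-type outcome only (Panjabi keeps satt and dand). All names, such as hindi_reflex, are invented for this illustration.

```python
import re

LONG = {"a": "\u0101", "i": "\u012b", "u": "\u016b"}   # a -> ā, i -> ī, u -> ū
STOPS = "kgcjtdpb\u1e6d\u1e0d"                          # plain plosives incl. ṭ, ḍ
NASALS = "nm\u1e47\u00f1\u1e45"                         # n, m, ṇ, ñ, ṅ
TILDE = "\u0303"                                        # combining tilde (nasalization)

# short vowel + doubled plosive (+ optional h of an aspirate), e.g. "att", "atth"
GEMINATE = re.compile("([aiu])([" + STOPS + "])\\2(h?)")
# short vowel + nasal + plosive, e.g. "ant"
NASAL_STOP = re.compile("([aiu])[" + NASALS + "]([" + STOPS + "])")
# short vowel at the very end of the word
FINAL_SHORT = re.compile("[aiu]$")


def simplify_geminate(word):
    """satta -> sāta: drop one consonant, lengthen the preceding vowel."""
    return GEMINATE.sub(lambda m: LONG[m.group(1)] + m.group(2) + m.group(3), word)


def nasalize(word):
    """danta -> dā̃ta: drop the nasal, lengthen and nasalize the vowel."""
    return NASAL_STOP.sub(lambda m: LONG[m.group(1)] + TILDE + m.group(2), word)


def drop_final_short_vowel(word):
    """phala -> phal."""
    return FINAL_SHORT.sub("", word)


def hindi_reflex(prakrit):
    """Apply the three rules in order to a Prakrit word."""
    return drop_final_short_vowel(nasalize(simplify_geminate(prakrit)))


for prakrit in ["phala", "satta", "danta", "hattha"]:
    print(prakrit, "->", hindi_reflex(prakrit))
```

The order matters only in that the final vowel is dropped last; the first two rules apply to different environments and do not feed each other in these examples.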


The morphology of the Indo-Aryan languages has undergone fundamental changes in the course of their development. Old Indo-Aryan Sanskrit was a highly synthetic, inflecting language with a complicated morphology, not unlike Latin and ancient Greek. The development towards the Middle Indo-Aryan languages went hand in hand with a clear simplification of the inflectional system. The New Indo-Aryan languages have shifted to a largely analytic structure with agglutinating elements. Typologically, the Indo-Aryan languages have been strongly influenced by the neighboring Dravidian languages; especially in syntax, this influence is already clearly visible in classical Sanskrit.


The morphology of nouns is complex in Sanskrit. It has preserved all eight cases (nominative, accusative, instrumental, dative, ablative, genitive, locative, vocative) and three numbers (singular, dual, plural) of Proto-Indo-European. Depending on the stem ending and gender, the nouns are divided into different declension classes, each with different case endings. Some stems use quantitative ablaut and are therefore highly variable (e.g. the stem pitr̥- "father" forms the following forms: pitā, pitar-am, pitr-e, pitr̥-bhyām, pitr̥̄-n).

In Middle Indo-Aryan this complicated system was simplified: the dual was lost, the case system was reduced by the merger of genitive and dative, and the variable consonant stems were converted into regular vowel stems (e.g. Sanskrit gacchant- / gacchat- "going" to Pali gacchanta-), until only one general declension type remained in the Apabhramsha phase. On the whole, however, the old case system survives in Middle Indo-Aryan, albeit in simplified form. As an example, the singular declension of the word putra- / putta- ("son") is given in Sanskrit, Pali and Apabhramsha.

Case          Sanskrit   Pali                         Apabhramsha
Nominative    putraḥ     putto                        puttu
Accusative    putram     puttaṃ                       puttu
Instrumental  putreṇa    puttena                      putteṇa(ṃ), puttẽ, puttiṃ
Dative        putrāya    (puttāya)                    -
Ablative      putrāt     puttā, puttasmā, puttamhā    puttahi, puttaho
Genitive      putrasya   puttassa                     puttaha, puttaho, puttassu, puttāsu
Locative      putre      putte, puttasmiṃ, puttamhi   putti, puttahiṃ

The New Indo-Aryan languages, on the other hand, have fundamentally restructured the declension system. Only rudiments of the inflectional system of Old and Middle Indo-Aryan are preserved. Usually there are only two primary cases, with only a few remnants of the old instrumental, locative and ablative. Nominative and accusative, which had already merged through sound change in Apabhramsha, are combined into the rectus (direct case). Opposed to the rectus there is usually an obliquus (oblique case) (e.g. Hindi laṛkā - laṛke "boy", Gujarati ghoḍo - ghoḍā "horse"). Some languages, like Bengali or Asamiya, have no special form for the obliquus. Formally, the obliquus mostly goes back to the genitive and has retained that function in a few languages. In most languages, however, it does not appear alone but is further differentiated in meaning by a system of postpositions or secondary affixes (e.g. Hindi laṛke ko "to the boy", Gujarati ghoḍānu "of the horse"). These affixes go back to originally independent words, but some have fused with the obliquus into secondary agglutinating case endings. The genitive ending -er in Bengali, for instance, derives via the particle kera from the Old Indo-Aryan noun kārya, meaning "matter".

The plural is formed in different ways. In Hindi, for example, it is formed inflectionally, with an ending that expresses case and number at the same time (cf. nominative singular laṛkā, plural laṛke; obliquus singular laṛke, plural laṛkõ). Other languages such as Bengali, by contrast, use agglutinating plural suffixes to which additional case markers are attached (cf. nominative singular chele, plural chele-gulo; objective singular chele-ke, plural chele-gulo-ke).

The Old and Middle Indo-Aryan languages have the three Indo-European genders masculine, feminine and neuter. Among the New Indo-Aryan languages, this system has been retained in the western languages (Gujarati, Marathi, Konkani). Sinhala also has three genders, but this is a different system based on animacy and natural gender, as in the Dravidian languages. In most New Indo-Aryan languages, masculine and neuter have merged. The category of gender becomes less pronounced towards the east. It has been lost completely in the easternmost languages Bengali, Asamiya and Oriya, as well as in Khowar and Kalasha at the opposite end of the Indo-Aryan language area.


In Old Indo-Aryan (even more so in Vedic than in classical Sanskrit) the verb is characterized by a large number of forms. Verbs are conjugated in three persons, three numbers (singular, dual, plural) and three voices (active or parasmaipada, middle or ātmanepada, and passive). The tense system comprises present, imperfect, future, aorist and perfect, as well as a pluperfect in Vedic. The past tenses originally differed in meaning, but later they are used synonymously. In classical Sanskrit the most common way to express the past is a nominal sentence with the perfect passive participle. The moods are indicative, subjunctive, optative and imperative. There are also several active and passive participles, gerunds, an infinitive and a system of derived verbs (causative, desiderative, intensive). The verbs are divided into ten classes according to the formation of the present stem. The main distinction is between thematic verbs, which insert the thematic vowel a between stem and ending, and athematic verbs, which do not. The imperfect, like the imperative and optative, is formed from the present stem; the forms of the other tenses are independent of the present stem. The verbal morphology of Sanskrit is complicated and regularly uses devices such as reduplication and quantitative ablaut (guṇa and vr̥ddhi grade) (e.g. the root kr̥- "make" yields the forms kr̥-ta "made", kar-oti "he makes" and ca-kār-a "he made").

In the Middle Indo-Aryan languages this system is simplified and regularized. The past tenses, whose distinction was maintained only artificially in Sanskrit, are completely replaced by the participle construction. The subjunctive dies out, as do the middle voice and the derived verbs with the exception of the causative.

In addition to the old synthetic forms, which are conjugated according to person and number, the New Indo-Aryan languages ​​use participle forms to a greater extent , which change according to gender and number, as well as analytical (compound) verb forms from participle and auxiliary verb . The various New Indo-Aryan languages ​​differ in how they use these options, as the following comparison of some forms of the verb for "to come" shows in the most important New Indo-Aryan languages:

Language     Stem           Present ("I'm coming")              Perfect ("I came")       Future ("I will come")
Hindi-Urdu   ā-             ātā hū̃ (m.), ātī hū̃ (f.)            āyā (m.), āī (f.)        āū̃gā (m.), āū̃gī (f.)
Punjabi      āu-, āv-, ā-   āundā hā̃                            āiā (m.), āī (f.)        āvā̃gā (m.), āvā̃gī (f.)
Kashmiri     y(i)-, ā-      chus yivān (m.), ches yivān (f.)    ās (m.), āyēs (f.)       yimɨ
Sindhi       ac-, ā-, ī-    acā̃ tho (m.), acā̃ thī (f.)          āīus (m.), āīasi (f.)    īndus (m.), īndīas (f.)
Gujarati     āv-            āvũ chũ                             āyvo (m.), āvī (f.)      āviʃ
Marathi      ye-, ā-        yetõ (m.), yetẽ (f.)                ālõ (m.), ālẽ (f.)       yeīn
Sinhala      e-, āv-        enavā, emi                          āvā, āmi                 ennam, emi
Oriya        ās-            āsẽ                                 āsili                    āsibi
Bengali      āʃ-            āʃi                                 āʃlum                    āʃbo
Asamiya      āh-, ɒh-       āhõ                                 āhilõ                    āhim
Nepali       āu-, ā-        āũchu                               āẽ                       āunechu


In all stages of Indo-Aryan the normal word order is subject-object-verb (SOV). In Sanskrit this order can still be varied quite freely, whereas in the New Indo-Aryan languages it is more tightly regulated. A constituent can be placed after the verb only for special emphasis. The Indo-Aryan languages also share the other typological features characteristic of SOV languages: they use postpositions instead of prepositions (e.g. Sanskrit rāmena saha "with Rama") and place the modifying element before the modified one. This means that attributes precede their head words and subordinate clauses precede main clauses. Examples of SOV word order with interlinear translation:

maĩ tum ko ye kitāb detā hū̃
I you to this book giving am

Hindi: "I am giving you this book."

āmi ei āmgula nūtan bazār theke enechi
I this mangoes new market from brought

Bengali : "I brought these mangoes from the new market."

guru-varayā maṭa iskōlē-di siṃhala akuru igennuvā
teacher+honorific me school-in-during Sinhala letters taught

Sinhala : "The teacher taught me the Sinhala script when I was in school."

Even in classical Sanskrit, the preferred way of expressing a past event was a passive construction with the past participle, in which the agent appears in the instrumental (e.g. bālena kanyā dr̥ṣṭā, literally "the girl [is] seen by the boy", instead of bālaḥ kanyām apaśyat "the boy saw the girl"). This construction was also extended to intransitive verbs (e.g. mayā suptam, literally "[it was] slept by me", for "I slept"). In the New Indo-Aryan languages an ergative-like construction has developed from this. Its characteristic feature is that in transitive past-tense sentences the subject takes a special form, called the agentive, while with intransitive verbs and in present-tense sentences it appears in the basic form. Compare the following example sentences from Hindi:

laṛkā kitāb kharīdtā hai
boy book buying is

"The boy is buying the book."

laṛke ne kitāb kharīdī
boy (agentive) book bought

"The boy bought the book."


Native grammar divides the vocabulary of modern Indo-Aryan languages ​​into four categories, referred to by Sanskrit names:

  • tadbhava ("originated from it [i.e. from a Sanskrit word]"): inherited words from Old Indo-Aryan
  • tatsama ("the same as that [i.e. a Sanskrit word]"): direct borrowings from Sanskrit
  • deśya ("local"): words with no equivalent in Sanskrit
  • videśi ("foreign"): loan words from non-Indian languages

Inherited words

The core of the New Indo-Aryan lexicon is formed by the tadbhava words, which were inherited naturally from Old Indo-Aryan via the intermediate stage of the Middle Indo-Aryan Prakrits and have been changed in shape by a series of sound changes. The Hindi word khet ("field"), for example, goes back to Sanskrit kṣetra via Prakrit khetta. Some words, like deva ("god") or nāma ("name"), already had such a simple form in Old Indo-Aryan that they underwent no further changes. Tadbhava words may ultimately be of non-Indo-European origin, since borrowings from the Dravidian and Munda languages are already found in Sanskrit.

Some Indo-Aryan word equations

Language   hand    tooth   ear     do      drink      hear
Sanskrit   hasta   danta   karṇa   kar-    pib-       śṛn-
Hindi      hāth    dā̃t     kān     kar-    pi-        sun-
Bengali    hāt     dā̃t     kān     kɔr-    pi-        ʃon-
Punjabi    hatth   dand    kann    kar-    pi-        suṇ-
Marathi    hāt     dāt     kān     kar-    pi-        aik-
Gujarati   hāth    dā̃t     kān     kar-    pi-        sā̃bhaḷ-
Oriya      hātɔ    dāntɔ   kānɔ    kɔr-    pi-        suṇ-
Sindhi     hathu   ɗandu   kanu    kar-    pi-        suṇ-
Asamiya    hāt     dā̃t     kān     kɔr-    pi-        xun-
Nepali     hāt     dā̃t     kān     gar-    piu-       sun-
Kashmiri   athɨ    dād     kan     kar-    co-        buz
Sinhala    ata     data    kaṇa    kara-   bo-, bī-   aha-, äsu-
Romani     vast    dand    kan     ker-    pi-        sun-


Words that have been taken over directly from Sanskrit in unchanged form (or rather: spelling) are called tatsamas. The pronunciation can vary: the distinction between certain sounds such as ś and ṣ is usually no longer observed, in Hindi and Marathi the short a at the end of a word is often dropped, and in Bengali consonant clusters are assimilated. The tatsama ātmahatyā ("suicide") is pronounced [aːtmʌhʌtjaː] in Hindi, but [ãttohɔtta] in Bengali. In Sinhala, words adopted from Pali, which as the language of the Buddhist canon plays a role as important as that of Sanskrit, are also counted among the tatsamas. Sometimes there are doublets of tadbhava and tatsama words, in which case the tatsama usually has the more specialized meaning. In Hindi, for example, alongside the aforementioned tadbhava khet, which denotes a field in the concrete sense (i.e. one that can be plowed), there is the tatsama kṣetra, which denotes a field in the figurative sense (i.e. a field of activity, etc.).

In modern Indo-Aryan literary languages ​​(except those like Urdu , which are subject to the cultural influence of Islam) the use of Sanskrit words has taken on very large proportions. Above all in the vocabulary of the higher register there are many Sanskritisms, similar to the way Latin and Greek foreign words are used in European languages . Nationalist circles promote the use of Sanskrit words as a symbol of political Hinduism and try to establish Sanskrit neologisms in the written language for newer terms such as “electricity” . In everyday language, however, artificial Sanskrit neologisms can hardly prevail against English loanwords.


The Deśya words include words without parallels in Sanskrit. These include words inherited from ancient Indian dialects that are missing in Sanskrit, as well as borrowings from the Dravidian and Munda languages. In addition, in the New Indian languages ​​there are a large number of borrowed words from non- Indian languages, especially Persian , Arabic , Portuguese and English , which Indian grammar classifies as part of the videśi category.

During the approximately eight hundred years of Islamic rule in northern India, Persian was the court language of the upper class. In this way many Persian words and, through Persian mediation, Arabic words found their way into the Indo-Aryan languages. For Urdu, the language of the Indian Muslims, Persian and Arabic play a role as a source of higher-register vocabulary similar to that of Sanskrit for the languages spoken mostly by Hindus. The proportion of such words is correspondingly high in Urdu; it is naturally particularly low in languages such as Nepali, Asamiya or Sinhala, which were not exposed to any lasting Islamic influence.

A relatively small proportion of the foreign words are borrowed from Portuguese, with which the Indian languages ​​came into contact through European seafarers from the 16th century. Words like chave for “key” (Hindi cābhī , Marathi cāvī ), janela for “window” (Hindi janglā , Bengali jānālā , Sinhala janēlaya ) or mestre for “craftsman” (Hindi mistrī , Marathi mestrī ) were taken from the Portuguese language . English loanwords have been extremely numerous since the British colonial era. Modern terms such as “hotel” ( hoṭal ), “ticket” ( ṭikaṭ ) or “bicycle” ( sāikil , from cycle ) were taken from English.


The Indo-Aryan languages ​​are written in a variety of scripts: various Indian scripts, the Persian-Arabic script and, in individual cases, the Latin script. Dhivehi , the language of the Maldives, has its own script, called Thaana . It was modeled on Arabic numerals and other elements in the 15th century .

Indian scripts

Hindu mantra in Sanskrit in Devanagari script

Most of the scripts used for the Indo-Aryan languages, like the scripts of South India, Southeast Asia and Tibet, belong to the family of Indian scripts, all of which derive from the Brahmi script. The Brahmi script first appears in the 3rd century BC in the inscriptions of Emperor Ashoka. Its origins are unclear; it is believed to have been modeled on the Aramaic alphabet, while the thesis, popular in India, that it descends from the Indus script is rejected by Western researchers. Over time, the Brahmi script split into numerous regional variants, some of which differ greatly from one another graphically. Structurally, however, they are very similar and share the same functional principle. They are an intermediate form between alphabet and syllabary, so-called abugidas, in which each consonant sign carries an inherent vowel a that can be modified by diacritical marks. Consonant clusters are expressed by ligatures. Unlike in the Latin alphabet, for example, the order of the characters in Indian scripts is not arbitrary but reflects the phonology of the Indo-Aryan languages. The letters are arranged as follows:

The inventory of characters is essentially the same in the different scripts. Some scripts have a special character for the retroflex ḷ; other special characters could be created by adding a dot below.
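The abugida principle (a consonant sign with inherent vowel a, modified by dependent vowel signs) can be illustrated with Devanagari as encoded in Unicode. The following Python sketch covers only the velar series and a handful of vowel signs; the mapping tables and the function name akshara are illustrative choices, and the transliteration keys follow IAST.

```python
# Devanagari consonant letters of the velar series (Unicode block U+0900-U+097F).
CONSONANTS = {
    "k": "\u0915",       # क
    "kh": "\u0916",      # ख
    "g": "\u0917",       # ग
    "gh": "\u0918",      # घ
    "\u1e45": "\u0919",  # ṅ -> ङ
}

# Dependent vowel signs. The inherent vowel "a" needs no sign at all.
VOWEL_SIGNS = {
    "a": "",
    "\u0101": "\u093e",  # ā
    "i": "\u093f",
    "\u012b": "\u0940",  # ī
    "u": "\u0941",
    "\u016b": "\u0942",  # ū
    "e": "\u0947",
    "o": "\u094b",
}

VIRAMA = "\u094d"  # sign that suppresses the inherent vowel


def akshara(consonant, vowel="a"):
    """Compose one written syllable; vowel=None yields the bare consonant."""
    base = CONSONANTS[consonant]
    if vowel is None:
        return base + VIRAMA
    return base + VOWEL_SIGNS[vowel]


# The first consonant series with its inherent vowel: ka kha ga gha ṅa
print(" ".join(akshara(c) for c in CONSONANTS))
# The same consonant with different vowels, and without any vowel
print(akshara("k", "i"), akshara("k", "u"), akshara("k", None))
```

Whether a consonant followed by the virama and a second consonant is actually displayed as a ligature is decided by the font and text renderer, not by the character sequence itself.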

The following Indian scripts are used for Indo-Aryan languages (as an example, the first series of consonants, ka, kha, ga, gha, ṅa, is given):

Script | Languages | Example
Devanagari | Hindi, Bihari, Rajasthani, Marathi, Nepali | क ख ग घ ङ
Bengali script | Bengali, Assamese, Bishnupriya Manipuri | ক খ গ ঘ ঙ
Gurmukhi | Punjabi | ਕ ਖ ਗ ਘ ਙ
Gujarati script | Gujarati | ક ખ ગ ઘ ઙ
Oriya script | Oriya | କ ଖ ଗ ଘ ଙ
Sinhala script | Sinhala | ක ඛ ග ඝ ඞ

Sanskrit was traditionally written in the script of the respective regional language; today Devanagari has established itself as the usual script for Sanskrit texts. Some languages are written in several scripts in parallel: Kashmiri is written in the Persian-Arabic script in Pakistan and in Devanagari in India. Punjabi is even written in three scripts: Persian-Arabic in Pakistan, Gurmukhi among the Sikhs and Devanagari among Punjabi-speaking Hindus.

Persian-Arabic script

Urdu poem by Mirza Ghalib in the Nastaliq style

Urdu, the language of the Indian Muslims, like the other Indo-Aryan languages used in Pakistan (Sindhi, Punjabi, Kashmiri), is written in the Persian-Arabic script, a version of the Arabic alphabet that has been expanded by a few special characters. The Arabic script is not well suited to rendering Indo-Aryan languages. First, short vowels are not written, and even among the long vowels it is not possible to distinguish between ū, ō and au. Second, Arabic loanwords contain redundant letters that are pronounced the same (e.g. Sin, Sad and Tha, all as s). Third, the Arabic script has no signs for other sounds that occur in the Indo-Aryan languages, so these had to be newly created with the help of diacritical marks (e.g. ٹ, ڈ and ڑ for the retroflex sounds in Urdu). The Urdu and Sindhi alphabets differ in how these special characters are formed (the Sindhi retroflexes are ٽ, ڊ and ڙ), while Punjabi and Kashmiri follow Urdu orthography. Sindhi also uses its own special characters for the aspirated consonants, while in Urdu they are expressed by combining the non-aspirated consonant with h (e.g. Sindhi ٿ and Urdu ته for th). Urdu also differs from Sindhi in its use of the curved Nastaliq style, whereas Sindhi is preferably written in the simpler Naskhi.

Latin script

The only Indo-Aryan languages that are regularly written in the Latin script are Konkani and Romani. For Konkani, the language of Goa, an orthography based on Portuguese was created in the 16th century. Konkani is also written in the Devanagari script. For Kalasha, the previously unwritten language of the Kalasha of Chitral, the Latin alphabet has recently come into use in school lessons.

In scholarly contexts, Latin transliteration is used. The usual standard is the International Alphabet of Sanskrit Transliteration (IAST). In its representation of the consonants it follows the sound values of the letters in English; for example, y is written for [j]. Aspirated consonants are expressed by the digraphs kh, th etc. Other sounds for which there is no corresponding Latin letter are expressed in IAST by means of diacritical marks, such as the macron for marking long vowels or dots below for retroflex sounds.
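How these diacritical marks combine with base letters can be illustrated with Python's standard unicodedata module. The sketch below is not part of the IAST standard itself; the helper names with_mark and strip_marks are hypothetical, and the sample words are the English loanwords cited earlier in this article.

```python
import unicodedata

MACRON = "\u0304"     # combining macron, marks a long vowel
DOT_BELOW = "\u0323"  # combining dot below, marks a retroflex sound

def with_mark(letter: str, mark: str) -> str:
    """Attach a combining mark and compose the result into one character (NFC)."""
    return unicodedata.normalize("NFC", letter + mark)

def strip_marks(text: str) -> str:
    """Decompose the text (NFD) and drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

long_a = with_mark("a", MACRON)          # long vowel ā
retroflex_t = with_mark("t", DOT_BELOW)  # retroflex ṭ
aspirated_t = "t" + "h"                  # aspirates are plain digraphs

for word in ["hoṭal", "ṭikaṭ", "sāikil"]:
    print(word, "->", strip_marks(word))
```

Stripping the marks loses the distinction between long and short vowels and between dental and retroflex consonants, which is why scholarly transliteration keeps them.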



