Why linguistic diversity does not mean India is cosmopolitan

While a number of people speak various languages, they remain geographically segregated.

Updated: Aug 30, 2018 13:23 IST
Vijdan Mohammad Kawoosa
Hindustan Times, New Delhi

That India is a diverse country in terms of languages is well known. The 2011 census lists 121 different mother tongues spoken by Indians. Fourteen of these languages have at least 10 million speakers each, more than the population of many European countries. But these headline numbers on diversity should not be interpreted as proof of India being a cosmopolitan region in terms of languages. While a sizeable number of Indians speak each of these languages, the speakers are mostly geographically segregated.


A Mint article by Karthik Shashidhar applied the Herfindahl-Hirschman Index (HHI) to recently released language data from the 2011 census to measure linguistic diversity across Indian states.

The HHI is normally used to measure the monopoly power of firms operating in a market. It is calculated as the sum of the squares of the market shares of all firms in that market. The inverse of this value represents the effective number of sellers. For example, if two firms had a share of 50% each in a given market, the HHI would take a value of 0.5 (0.25+0.25) and the effective number of firms would be 2 (1/0.5). If these firms had market shares of 90% and 10% respectively, the HHI would take a value of 0.82 (0.81+0.01) and the effective number of sellers would be about 1.22 (1/0.82). Applied to languages, the share of a region's population speaking each language takes the place of market share, which makes the index useful for measuring the skewness (or lack of it) of the languages spoken in a region.
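The arithmetic above can be reproduced in a few lines of Python. This is only a minimal sketch of the calculation as described here, not the code used for the analysis; the function names hhi and effective_number are illustrative and shares are assumed to be fractions that add up to 1.

```python
def hhi(shares):
    """Sum of squared shares (shares are fractions that add up to 1)."""
    return sum(s * s for s in shares)


def effective_number(shares):
    """Inverse of the HHI: the effective number of firms (or languages)."""
    return 1 / hhi(shares)


# Two firms with 50% each
print(round(hhi([0.5, 0.5]), 2), round(effective_number([0.5, 0.5]), 2))  # 0.5 2.0

# Two firms with 90% and 10%
print(round(hhi([0.9, 0.1]), 2), round(effective_number([0.9, 0.1]), 2))  # 0.82 1.22
```

The more one share dominates, the closer the effective number gets to 1, whatever the count of firms or languages actually present.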

HT has applied the HHI method at the district level to get an even more disaggregated picture of linguistic diversity in India. There were 640 districts in India at the time of the 2011 census. Of these, only 21 districts had an effective number of languages greater than or equal to four. In as many as 428 districts, the effective number of languages was less than or equal to 1.5.
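A district-level count of this kind could be sketched as follows. The district names, language labels and speaker counts below are invented purely for illustration and are not census figures; the function effective_languages is likewise a hypothetical helper, and the thresholds of 4 and 1.5 are the ones used in this article.

```python
def effective_languages(speakers_by_language):
    """Effective number of languages from raw speaker counts (1 / HHI)."""
    total = sum(speakers_by_language.values())
    hhi = sum((n / total) ** 2 for n in speakers_by_language.values())
    return 1 / hhi


# Hypothetical toy data: speakers of each mother tongue, per district
districts = {
    "District A": {"Lang 1": 950, "Lang 2": 50},
    "District B": {"Lang 1": 250, "Lang 2": 250, "Lang 3": 250, "Lang 4": 250},
    "District C": {"Lang 1": 600, "Lang 2": 300, "Lang 3": 100},
}

effective = {name: effective_languages(counts) for name, counts in districts.items()}

# Districts at either end of the diversity scale
highly_diverse = [name for name, e in effective.items() if e >= 4]
near_monolingual = [name for name, e in effective.items() if e <= 1.5]

print(highly_diverse)    # ['District B']
print(near_monolingual)  # ['District A']
```

In this toy example, District C lists three languages but has an effective number of only about 2.2, because one language accounts for 60% of speakers.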

These results present a very different picture from a similar analysis at the national level. The effective number of languages is 4.6 at the national level. Dimapur district in Nagaland has the highest linguistic diversity in India, where the effective number of languages is 9.8. (Chart 1: Districts by effective number of languages)


Urban regions have marginally higher linguistic diversity. The average effective number of languages in urban areas is 1.75, against 1.51 in rural areas. There are 19 districts where over 90% of the population is urban, including those of the national capital, Delhi. Among them, nine districts of Delhi have an average of only 1.4 effective languages. This is much lower than in many other big metropolitan cities, such as Bangalore (3.85), Mumbai (4.25), Hyderabad (2.62) and Kolkata (2.24). (Chart 2: Effective number of languages in districts with 90% plus urban population).


Our analysis also shows that districts where Hindi is the single largest mother tongue have lower linguistic diversity than other regions. The average effective number of languages in Hindi-dominated districts is 1.27, compared to 1.86 elsewhere. To be sure, there are many languages within the broad category of Hindi itself.

In states such as Bihar, more people report dialects such as Bhojpuri and Magadhi as their mother tongue than Hindi per se. Among the languages with over 10 million speakers, Hindi is the only one that breaks down into multiple major dialects. If one were to measure linguistic diversity at the dialect level, the effective number of languages would go up significantly in Hindi-speaking states. (Chart 3: Effective number of languages in Hindi-speaking states at language-level and dialect-level)


These statistics are perhaps proof that the proverbial Hindi belt is actually a much more cosmopolitan region in terms of languages if one sheds the monolithic view on Hindi.

