Date: 2021-04-09

On-screen, dowry is more problematic, newborns now include girls, beauty continues to be associated with a fair complexion, and the doctors are overwhelmingly upper-caste and Hindu.

These are some of the findings of an AI-driven analysis of subtitles and lyrics across a total of 1,400 films: 200 from each of the past seven decades, half of them the 100 highest-grossing Bollywood releases of that decade.

The study was designed by a student and two researchers at Carnegie Mellon University (CMU), using statistical language models that look for such factors as what words are closely associated with each other. Two of the three are movie buffs — a student named Kunal Khadilkar and his mentor, AI researcher Ashique R KhudaBukhsh, both part of CMU’s Language Technologies Institute (LTI). The third is Tom Mitchell, a Founders University professor at CMU’s School of Computer Science. Their six-month study was conducted between June and December.

“We had wanted to study how women’s representation evolved in popular content over time,” says KhudaBukhsh. “But we realised there is a serious lack of large-scale AI study in this entertainment industry that touches so many lives.” The same natural language processing tools might be used to rapidly analyse hundreds or thousands of books, magazine articles, radio transcripts or social media posts, Mitchell, co-author of the study, wrote in a report for CMU.

The tests were carried out using various assessment techniques. A new language model called BERT was used to perform a Cloze test to assess the depiction of beauty in films. “If you feed thousands of sentences to BERT and then ask the system to perform fill-in-the-blank tests, it outputs a list of possible completions ranked by probability. For example, in the following Cloze test: ‘The name of a large city in Spain is ___’, the top three options generated by BERT are: Madrid, Barcelona and Valencia. After we fed Bollywood movie subtitles to BERT and performed the Cloze test: ‘A beautiful woman should have ___ skin’, the top prediction was ‘fair’ across all eras,” says KhudaBukhsh.
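For readers who want to see the fill-in-the-blank idea in miniature, here is a rough sketch in Python. It is not the researchers' code and it does not use BERT: it is a simple counting stand-in that ranks whichever single word fills the blank most often in a handful of made-up lines. The function name `cloze_rank` and the toy sentences are assumptions introduced purely for illustration.

```python
import re
from collections import Counter


def cloze_rank(sentences, template, blank="___", top_k=3):
    """Rank single-word fillers for the blank by relative frequency.

    A counting stand-in for a language model's Cloze test: it only sees
    exact matches of the template, whereas a model such as BERT scores
    every word in its vocabulary. The template must contain one blank.
    """
    left, right = template.lower().split(blank, 1)
    # Match the template text with exactly one word where the blank sits.
    pattern = re.compile(re.escape(left) + r"(\w+)" + re.escape(right))
    counts = Counter()
    for sentence in sentences:
        counts.update(pattern.findall(sentence.lower()))
    total = sum(counts.values())
    if total == 0:
        return []
    # Convert raw counts into shares so the ranking reads like probabilities.
    return [(word, n / total) for word, n in counts.most_common(top_k)]


# Made-up toy lines, NOT taken from the study's subtitle corpus.
toy_subtitles = [
    "A beautiful woman should have fair skin.",
    "They say a beautiful woman should have fair skin, always.",
    "A beautiful woman should have soft skin.",
    "A brave man should have thick skin.",
]

ranking = cloze_rank(toy_subtitles, "A beautiful woman should have ___ skin")
for word, share in ranking:
    print(f"{word}: {share:.2f}")
```

The counting version can only rank words it has seen in that exact sentence frame. A trained language model generalises from context, which is why the study could pose prompts that never appear word for word in any subtitle.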

Similarly, to examine evolving national priorities as portrayed in popular entertainment, the researchers gave the models the prompt: “The biggest problem of India is ___”. The answers from the model trained on older Bollywood movies were: poverty, love, war, hunger, unemployment. The answers from the model trained on more recent Bollywood releases were: poverty, Pakistan, Kashmir, terrorism, corruption.

This type of analysis has its limits, the researchers acknowledge. It considers only subtitles, which reflect spoken dialogue and song lyrics, and doesn’t account for the way biases might be expressed by a film’s visuals. Still, what makes this analysis important is that it goes beyond anecdotal evidence, says KhudaBukhsh. “Our methods allow us to quantify and compare biases across timespans, genres, and movie industries, to analyse biases commonly known to already exist in Bollywood films.”

Other dominant themes: corruption (as in the candlelit-protest scene from Rang De Basanti, 2006) and Pakistan, Kashmir and terrorism (all of which come together in Phantom, 2015).

Among the happier findings: babies born in films from 1950 to 1999 were overwhelmingly boys (70%), but in films from 2000 to 2020, 46% of newborns are girls. “Without a large-scale analysis, these types of insights are hard to obtain,” says KhudaBukhsh.

In terms of the words associated with dowry on-screen, “we find that words such as ‘loan’, ‘debt’ and ‘jewelry’ appeared in Bollywood films of the 1950s.” By the 1970s, helped along possibly by the passing of the Dowry Prohibition Act in 1961, “other words, such as ‘consent’ and ‘responsibility’, start surfacing. Finally, in the 2000s, the words most closely associated with dowry are ‘trouble’, ‘divorce’ and ‘refused’,” says KhudaBukhsh.
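The notion of words being “closely associated” can be illustrated with a very simple count of which words share a sentence with a target word. The sketch below is an assumption-laden toy, not the study's method (the researchers used statistical language models): the helper `associated_words`, the stopword list and the sample lines are all invented for illustration, with only the words ‘loan’, ‘debt’, ‘refused’ and ‘divorce’ borrowed from the findings above.

```python
from collections import Counter

# A tiny, hand-picked stopword list (an assumption for this toy example).
STOPWORDS = {"the", "a", "in", "to", "for", "he", "she", "they", "led"}


def associated_words(sentences, target, top_k=3):
    """Count the words that appear in the same sentence as `target`."""
    counts = Counter()
    for sentence in sentences:
        # Lower-case each word and strip surrounding punctuation.
        words = [w.strip(".,!?;:'\"").lower() for w in sentence.split()]
        if target in words:
            counts.update(
                w for w in words if w and w != target and w not in STOPWORDS
            )
    return counts.most_common(top_k)


# Made-up lines standing in for subtitles from two eras (not real data).
older_lines = [
    "He took a loan for the dowry.",
    "The dowry left the family in debt.",
    "A loan paid the dowry.",
]
newer_lines = [
    "She refused the dowry.",
    "The dowry demand led to divorce.",
    "They refused to pay dowry.",
]

print("older:", associated_words(older_lines, "dowry"))
print("newer:", associated_words(newer_lines, "dowry"))
```

Comparing the two lists era by era is the same kind of contrast the researchers describe, although a real analysis would need thousands of lines and a model that captures meaning rather than raw co-occurrence.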

The study found that the representation of non-Hindu communities has increased. Muslims made up 6.16% of characters in older films and make up 7.81% in newer ones; Sikhs have gone from 7.26% to 8.06% and Christians from 0.22% to 0.49%.

“These numbers are misleading and easily manipulated and don’t tell us anything about how the portrayal of Muslims in Bollywood has been vitiated over the years, for instance,” says Meenakshi Shedde, film curator and South Asia delegate to the Berlin film festival, encapsulating her concerns with the relative superficiality of the findings. “I’d be more interested in the big picture. This kind of number-crunching leaves out qualitative analysis. And although AI seems very futuristic, the codes are written by people who are normal human beings with biases like anybody else.”

HERE AND THERE

A similar analysis of Hollywood romance and action movies revealed stark gender biases, particularly in the occupations assigned to characters. Most men tended to be doctors or soldiers, and most women were nurses or homemakers.

Hollywood was also found to exhibit a bias towards lighter skin colour (which is stating things mildly).

In terms of national priorities as reflected in films, the Cloze test for the prompt “The biggest problem of America is ___”, when applied to the model trained on older Hollywood releases, threw up the results: war, poverty, unemployment, slavery.

The model trained on newer Hollywood releases threw up the responses: poverty, slavery, immigration, unemployment, money, war, racism.