Why state data hubs pose a risk to Aadhaar security
UIDAI says it is the sole custodian of citizen information. But state governments and police are using Aadhaar numbers to consolidate data, which is being stored without any coherent data security policy and is accessed by private players.
Updated: Mar 13, 2018 11:33 IST
On January 18, a policeman showed up at the door of Abdul Hafeez, a 37-year-old businessman in Kulwakurthy in Telangana’s Mahbubnagar district, and asked to record his fingerprints, Aadhaar number, phone numbers, social media accounts, voter ID, passport, and the names and numbers of his family members, associates, lawyers, pawn brokers, and “concubines”, if any.
The policeman also geo-tagged Hafeez’s home by noting down its GPS coordinates. This information, Hafeez was told, was for the Sakala Nerasthula Samagra Survey, the brainchild of the state police to gather a vast archive of personal information of convicted criminals, suspects and under-trials.
The data, according to a police circular, would be stored in TSCOP — a new software application downloaded onto the phones of the state’s constables.
“With TSCOP a policeman moving through a difficult area can see in real time: who are the criminals in this area? Where do they live? Who are the people here who can help the police?” said M Mahendar Reddy, Telangana’s Director General of Police, who oversaw the development of TSCOP. “It is not meant for senior officers, it is meant for beat policemen who are on the front lines.”
For Hafeez, however, the survey was the most recent instance of harassment at the hands of the local police.
“I am not a repeat offender,” Hafeez said in an interview. “I am a victim of police harassment.”
The story of Hafeez and TSCOP offers a peek into the largely unnoticed big-data revolution sweeping through Indian state governments. Administrators and police departments are using individual Aadhaar numbers to consolidate citizen data scattered across disparate government departments, allowing for the creation of detailed personal databases.
This data is being collected without the consent of citizens, in the absence of a data privacy law, and stored without any coherent data security policy.
“Modern databases are designed to be interoperable,” said Rahul Matthan, a partner at law firm Trilegal. “But when we start to use these databases, we need a proper privacy framework that holds government and private parties to account for how they use this data.”
Such a framework, Matthan said, needs to move beyond simply asking for user consent, as individuals may not understand the implications of sharing their data. Rather, the framework should hold data gatherers and users accountable if their use of this data results in any harm being caused to the individual.
A resident’s demographic (name, address, date of birth) and biometric (fingerprints and photograph) data is captured, stored as an encrypted data packet and forwarded to the state registrar. The resident is issued an Enrolment Identity number (EID).
2 State Registrar
Every state has a nodal office for enrolment called the registrar. This office has its own encryption key, meaning it can decrypt and access enrolment data. The registrar routes enrolment data packets to the UIDAI, and just the demographic data, seeded with EID numbers, to the SRDH.
3 UIDAI
The UIDAI decrypts the enrolment packet, checks for duplicate entries, and issues a corresponding Aadhaar number, or UID, to the user. The UIDAI shares the UID corresponding to each EID with the SRDH.
4 SRDH
The SRDH is an Aadhaar-seeded repository of information consolidated from multiple government databases. The SRDH uses an individual’s Aadhaar number as a unique identifier to inter-link these scattered databases. The SRDH is also linked to the UIDAI servers to allow for Aadhaar-enabled biometric authentication.
Government Databases
Prior to the SRDH, each department maintained separate lists of citizens accessing their schemes and services. The SRDH merges these databases. Its “scalable” structure means that states can integrate as many databases as they like. In Gujarat, state administrators also added a biometric database to the system.
DIFFERING VIEWS
Advocates: State administrators say that data hubs like the SRDH allow them to build detailed profiles of individual citizens, identify genuine beneficiaries for each scheme, and reduce losses due to corruption or wastage.
Detractors: Privacy advocates say the SRDH violates the privacy of citizens by allowing state agencies to monitor individuals without their consent.
The UIDAI has consistently said Aadhaar cannot be used to build citizen profiles. SRDHs show this is not the case.
Security experts say that consolidating such vast amounts of data makes the SRDH an attractive target for hackers looking to steal such data.
On February 22 this year, Gujarat state counsel Rakesh Dwivedi acknowledged that the Gujarat government had maintained an archive of biometric data of the state’s residents in a software application called the “State Resident Data Hub” (SRDH).
This data, Dwivedi said, was erased shortly after the Aadhaar Act was implemented in March 2016. His admission, before the constitutional bench of the Supreme Court, marked the first time a state government had formally conceded that it had maintained such a database. Meanwhile, Gujarat continues to maintain a separate biometric database of ration card holders, as reported by HT on January 21, 2018.
The Unique Identification Authority of India (UIDAI) has consistently maintained that it is the sole custodian of citizen data collected during the Aadhaar enrolment process. Dwivedi’s statement in court reveals this was not always the case.
“These state hubs were created under an agreement between the state and UIDAI,” Dwivedi said in remarks reported in HT. Hindustan Times has seen one such draft MoU.
Applications like TSCOP and SRDH, which contain the Aadhaar numbers and biometrics of citizens, reveal the complete absence of public information regarding who has access to the sensitive personal data of over one billion Indians enrolled in the controversial Aadhaar programme, and how it is put to use.
These applications also reveal Aadhaar’s central paradox, wherein a system designed to minimise data collection has given state and central governments the ability to gather and centralise increasingly detailed information about their citizens.
“Any policy that proceeds without adequate public consultation can only damage democratic practices in the country,” said Reetika Khera, a professor of economics at IIT Delhi, who has written extensively on privacy and Aadhaar. “This is no exception.”
Telangana’s TSCOP and Gujarat’s SRDH appear to be two different applications, but six state-level IT administrators and programmers familiar with the projects say they are based on the same principle: using Aadhaar as a common identifier to integrate previously discrete data silos.
“The SRDH was a prototype to showcase certain capabilities,” said a coder who worked on the project.
The UIDAI did not respond to a detailed HT questionnaire asking about the existence of these biometric databases, or the veracity of the draft MoU accessed by HT.
State Resident Data Hubs
In 2012, the UIDAI asked a consortium of private companies to develop data hubs to help states “leverage the true potential of Aadhaar and Aadhaar-enabled service delivery”, according to the SRDH Institutional Framework published in April that year.
A draft copy of an agreement between the UIDAI and state governments, obtained by HT, reveals that SRDHs were created to provide a way for states to utilise the demographic data they had collected on behalf of the UIDAI during Aadhaar enrolment.
The UIDAI, the draft MoU said, would define “the process for accessing, securing and keeping up-to-date resident KYR (Know Your Resident) data as collected during enrolment.” HT was unable to access a signed copy of these agreements between states and the UIDAI.
An SRDH advisory document issued by UIDAI in March 2012 notes that “SRDH does not (and should not)store biometric data.” However, Dwivedi’s admission that Gujarat was holding biometric resident data in its SRDH points to the existence of separate agreements allowing states to merge biometric information they had gathered, with the demographic data they obtained from UIDAI.
“We made a generic programme for states to manage their Aadhaar data,” a coder who worked on the project said, seeking anonymity as he was bound by a non-disclosure agreement. “We provided them with runnable code. Each state was responsible for implementing it themselves.”
States could modify the code, UIDAI documents said, but customisation would void the warranty support offered by Mahindra-Satyam, a private software company, which also prepared the deployment guide for the software. Accenture, KPMG, Ernst and Young, PricewaterhouseCoopers, Wipro and Deloitte Touche Tohmatsu were empanelled as project consultants, the document said.
The UIDAI’s Project Management Unit coordinated with a network of private vendors and government departments to implement the programme.
The SRDH created a resident data repository equipped with tools to allow states to seed this bulk database with individual Aadhaar numbers.
“We created a data repository of bulk resident data that states could seed with Aadhaar numbers,” the coder explained. “We call it inorganic batch seeding.”
For example: A list of ration card holders could be merged with a list of names and Aadhaar numbers to create a unified list of ration-card holders organised by Aadhaar number.
This could further be merged with a list of old-age pensioners to get a list of ration-card holders who also availed of pensions, and so on to create master databases that were integrated with Aadhaar-based biometric authentication to access these services.
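The merging described above can be sketched in a few lines of Python. This is only an illustration under assumed data: the field names, the sample records and the `merge_by_aadhaar` helper are hypothetical and are not taken from the SRDH code, which HT's sources describe only in general terms.

```python
# Hypothetical illustration: field names, helper and records are invented.

def merge_by_aadhaar(*department_lists):
    """Combine department records into one profile per Aadhaar number."""
    profiles = {}
    for records in department_lists:
        for record in records:
            uid = record["aadhaar"]
            # Create the profile on first sight, then fold in this department's fields.
            profiles.setdefault(uid, {}).update(record)
    return profiles


ration_cards = [
    {"aadhaar": "0000-0000-0001", "name": "Resident A", "ration_card": "RC-17"},
    {"aadhaar": "0000-0000-0002", "name": "Resident B", "ration_card": "RC-42"},
]
pensions = [
    {"aadhaar": "0000-0000-0002", "pension_id": "P-9"},
]

master = merge_by_aadhaar(ration_cards, pensions)

# Ration-card holders who also draw a pension.
both = [uid for uid, p in master.items()
        if "ration_card" in p and "pension_id" in p]
print(both)
```

Each additional department list passed to the helper widens the profile held against the same Aadhaar number, which is the consolidation that privacy advocates object to.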
The process of seeding in bulk, the coder said, is probabilistic rather than exact.
“There are many residents with the same name — so we give different weightages to variables like name, date of birth, pin code,” he said. “The system will give possible matches and ask the system administrator to manually select one.”
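A rough sketch of the kind of weighted matching the coder describes is given below. The weights, the threshold, the exact-equality comparison and the sample records are all assumptions made for illustration; the actual weightages and matching rules used in the SRDH are not public.

```python
# Hypothetical weights (out of 100); the real SRDH weightages are not known.
WEIGHTS = {"name": 50, "dob": 30, "pin_code": 20}


def match_score(record, candidate):
    """Add up the weights of the fields on which both records agree exactly."""
    return sum(
        weight
        for field, weight in WEIGHTS.items()
        if record.get(field) is not None and record.get(field) == candidate.get(field)
    )


def possible_matches(record, aadhaar_list, threshold=50):
    """Return (score, candidate) pairs at or above the threshold, best first.

    The shortlist is meant for a human administrator to choose from.
    """
    scored = [(match_score(record, c), c) for c in aadhaar_list]
    shortlisted = [pair for pair in scored if pair[0] >= threshold]
    return sorted(shortlisted, key=lambda pair: pair[0], reverse=True)


ration_record = {"name": "Ramesh Kumar", "dob": "1970-01-01", "pin_code": "500001"}
aadhaar_list = [
    {"aadhaar": "0002", "name": "Ramesh Kumar", "dob": "1982-05-12", "pin_code": "500001"},
    {"aadhaar": "0003", "name": "Suresh Rao", "dob": "1970-01-01", "pin_code": "500002"},
    {"aadhaar": "0001", "name": "Ramesh Kumar", "dob": "1970-01-01", "pin_code": "500001"},
]

candidates = possible_matches(ration_record, aadhaar_list)
for score, candidate in candidates:
    print(score, candidate["aadhaar"])
```

In this toy data two different residents share a name and a pin code, so both appear on the shortlist. If the administrator picks the wrong one, a ration card ends up seeded with another person's Aadhaar number.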
Mismatches during inorganic seeding could explain periodic news reports of villagers being denied food rations.
Project Management Unit communication, interviews and documents reviewed by HT reveal that the SRDH rollout epitomised the “build the plane while you’re flying” approach beloved of software engineers.
Private vendors had access to the sensitive personal information and Aadhaar numbers of millions of Indians. Much of the data was stored as unencrypted CSV files on thumb drives, and emails exchanged between trouble-shooting teams included screenshots of resident data including Aadhaar numbers.
HT found the SRDH deployment guide — essentially an installation manual — hosted on an open-access server.
“With this kind of guide, you have a lot of information on how the website is structured, the default passwords etc,” said Robert Baptiste, a French security researcher who has exposed numerous flaws in databases managed by Indian government departments.
In his most recent exploit, Baptiste gained access to 40 GB worth of personal data of 47,000 BSNL employees, hosted on a BSNL website.
Publishing the SRDH deployment guide online, Baptiste said, made it very easy for hackers to spot holes in the security architecture of the SRDH. “This should be an internal document,” Baptiste said.
Coders functioned as a hive mind, developing “innovations” with little regard to security or privacy concerns.
One internal document shows how a coder came up with a way for SRDH users with administrator privileges to link multiple identity cards like passports and drivers licences without the consent of the individual.
“Since SRDH houses a giant database of UID numbers, it makes sense to link a person’s UID number to other identity/proof of residence cards that he/she might have,” the coder wrote. “With KYR Plus you can view details about other identity cards that the individual might have easily.”
In retrospect, ‘KYR Plus’ marked the point where Aadhaar shifted from a lean system to verify the identity of a user, into a tool to consolidate vast amounts of information about every resident.
The final slide of the internal document illustrates how a system administrator could extend the SRDH to inter-link any personal document of a resident.
“If the type of card you are looking for doesn’t exist. You can easily add a new card type here,” the coder wrote, offering an example. “License to Kill.”
HT interviewed senior IT officials in four states, who said they struggled to integrate the SRDH with their programmes.
Yet some states latched on to its potential and built their own portals.
Andhra Pradesh created a version called “People’s Hub” which uses a resident’s Aadhaar number to consolidate 29 different department databases to create a “single source of truth” on the resident, according to AP government documents. The hub, officials said, does not hold biometric data.
“Now we know per household, the benefits being given,” said A Babu, CEO of the Real Time Governance Society, the state entity overseeing the People’s Hub. “Each household has an 8-digit number and its GPS coordinates are fixed on a map.”
The data in the People’s Hub is from a detailed survey conducted by the Andhra government. The results are partially visible on an open access website that plots the GPS location of every single house in surveyed districts.
Clicking on a house on the map reveals the names of the residents, and their partially masked Aadhaar numbers. Babu said integrating citizen data at this scale makes it easy to quickly identify the beneficiaries of government schemes, and seamlessly route entitlements to their respective bank accounts.
Privacy advocates like Khera question how much citizen data must be collected, and under what circumstances.
“Efficient administration requires ‘some’ information, not all information, and certainly does not require it to be centralised,” Khera said. Recent research on data-mining, she said, indicates that, “algorithms based on inaccurate assumptions can end up harming instead of helping.”
Abdul Hafeez, the man who took on TSCOP, the Telangana Police app, is one example of the dangers of indiscriminate data collection and mining.
TSCOP follows the same principle as the SRDH, except with police data. Here Aadhaar numbers are integrated with police databases to build detailed profiles of convicted criminals, and under-trials.
TSCOP is based on a similar platform called HydCOP, according to M Mahendar Reddy, the DGP of Telangana, who developed both applications.
For HydCOP, Reddy wrote in a 2016 paper, the homes of 3,500 “repeat offenders” in Hyderabad were surveyed and “geo-tagged for periodical visits by the front-line police officers.” The platform, Reddy wrote, was integrated with “UIDAI and NIC databases.”
When asked about the legal basis for integrating a police application with the UIDAI, Reddy said UIDAI integration was only for policemen to mark their attendance using the biometric sensors of their phones. UIDAI did not respond to a questionnaire sent by HT.
So, when TSCOP was launched, Reddy announced a similar state-wide survey of criminals to collect data for the app. This brought the police to Abdul Hafeez’s house.
In 2010, Hafeez had filed an RTI application at his tehsil office to establish the ownership-status of public land encroached upon by a local builder.
“When I filed the RTI, the local police and land mafia filed a false case of cheating against me,” Hafeez said. “I was acquitted in 2013.”
Then the police filed another case against him for possession of black jaggery, a controlled substance, despite, Hafeez said, his having provided them with all the relevant documentation.
“So when the police came and said they are entering my name in a repeat offenders database, I took them to court,” Hafeez said.
Once the petition was admitted, the state government simply withdrew the police circular and the state-wide survey was suspended. “The state government did not even contest the case, because they knew they cannot just gather personal data like this,” said Y Sheelu Raj, Hafeez’s lawyer. “They are taking advantage of the fact that our country does not have a data protection law, and hoping no one will protest.”
DGP Reddy declined to give the legal basis for the police gathering such intrusive citizen data. He also refused to say what the Telangana police has done with all the fingerprints and geo-tagged data they have already collected.
“We are following all provisions of relevant laws,” he insisted.