‘We are Chitragupta of the Bengal SIR’: Meet the SABAR team tracking deleted voters

Earlier this month, a study by the Kolkata-based SABAR Institute uncovered an alarming disparity in West Bengal’s Nandigram constituency, where Muslims – who comprise 26 percent of the electorate – accounted for 95.5 percent of voter deletions across recent supplementary voter lists published by the Election Commission of India.

In the seat represented by BJP leader Suvendu Adhikari, the numbers were stark: an analysis of 13 supplementary lists showed that out of 2,902 deletions in the category, 2,757 were Muslims. This pattern suggests that removals triggered by the newly introduced “logical discrepancies” category in certain constituencies, such as Nandigram, were concentrated within a single community. According to the final tally, 3,461 voters were eventually removed from Nandigram following the adjudication process.

The findings were published by The Telegraph and other national and regional outlets, but the data source was the SABAR Institute, a research group focused on evidence-based studies of disadvantaged communities. Several news organisations, including The Hindu, The Times of India, BBC, Anandabazar Patrika, and The Indian Express, have cited its analysis of the Special Intensive Revision (SIR) of electoral rolls. But it was the study of the Nandigram Assembly constituency in East Midnapore district that brought the group into wider national focus.

The room where they decode electoral data sits on the first floor of an unmarked, dilapidated building in Kidderpore’s Mansatala. Inside, conversations abound over Excel sheets, data mining, and large language models as five researchers work to understand how nearly 12 percent of West Bengal’s total electorate was erased in the SIR process.

“People are still not fully aware of the real impact of SIR. For instance, before we analysed the Nandigram seat, people might have guessed that Muslims were getting deleted disproportionately, but they didn’t know it was 95 percent [of all deletions in supplementary lists]… I think the insights will enable people to question this process,” says Souptik Halder, 25, a postgraduate student of computer science at the Indian Statistical Institute (ISI), who is tasked with building data-crunching systems for SABAR.

Halder is part of a core SABAR team examining the Bengal SIR, led by Sabir Ahamed, 48, a veteran researcher and RTI activist, and associate Ashin Chakraborty. Supported by researchers Sohail Mallick and Atri Roy, the team brings a multidisciplinary approach – combining backgrounds in economics, statistics, and computer science – to develop a data-driven academic understanding of West Bengal’s voter roll revision.

“With a view to understanding the SIR, this is an academic exercise that is being carried out by a team of young scholars who care for a public cause. It is showing people what is happening on the ground so that people can take informed action,” says Ahamed.

On Friday, meanwhile, the research group published free, open, interactive maps covering all 294 West Bengal Assembly constituencies, tracking voter deletions across the ASDD (absent, shifted, or dead/duplicate) and logical discrepancy categories.

The workflow algorithm

Researchers claim that the lack of Assembly-wise cumulative data from the SIR is the biggest hurdle. As a result, one has to manually download everything and compile it all before moving on to analysis, which has its own set of challenges. For a single Assembly constituency, the team might have to analyse “between 5,000 and 10,000 PDFs.” West Bengal has 294 Assembly constituencies.

“Apart from the draft and final rolls, the documents that need to be parsed include at least 17 supplementary and deletion lists (each) for every booth. We can download data for only 10 booths per Assembly Constituency (AC) at a time, and each AC has approximately 300 booths. Each download needs a CAPTCHA code,” explains Halder. To analyse the Nandigram seat alone, he says, he had to process 10,296 PDFs.
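As a back-of-the-envelope illustration of the volumes Halder describes, the Python sketch below estimates the download effort for one constituency. It assumes, purely for illustration, that the draft roll, the final roll, and each of the 17 supplementary and 17 deletion lists is a separate PDF for every booth. The function name and parameters are hypothetical and are not part of SABAR's tooling.

```python
import math

def estimate_downloads(booths, booths_per_batch=10, lists_per_type=17):
    """Rough estimate of download effort for one Assembly constituency."""
    # Only 10 booths can be fetched at a time, and each download needs a CAPTCHA.
    batches_per_pass = math.ceil(booths / booths_per_batch)
    # Assumption: draft roll + final roll + 17 supplementary + 17 deletion lists,
    # each stored as its own PDF for every booth.
    pdfs_per_booth = 2 + 2 * lists_per_type
    return {
        "batches_per_pass": batches_per_pass,
        "pdfs_per_booth": pdfs_per_booth,
        "total_pdfs": booths * pdfs_per_booth,
    }

print(estimate_downloads(300))
```

Under these assumptions, a constituency with roughly 300 booths needs 30 CAPTCHA-gated batches for a single pass over its booths and about 10,800 PDFs in total, the same order of magnitude as the 10,296 files Halder reports for Nandigram.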

Once the PDFs are downloaded, they must be converted into machine-readable, structured datasets. To do this, the team uses Optical Character Recognition (OCR). “It works similarly to how Google Lens functions. It is very expensive to do this over vast amounts of data. Even then, manual clean-up is a must,” he adds.

The researchers add that errors can creep in at two stages. First, during OCR, the system may confuse similar-looking characters, such as “I” and “1” or “O” and “0”. Second, the open-source machine learning model used to classify religion from names is 91-94 percent accurate, so some electors may be misclassified.

“These errors are handled and minimised further during the post-processing and verification of the data, which is a manual task,” says Chakraborty.
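The first kind of error can be partly caught in code before the manual pass. The Python sketch below is a minimal illustration, not the team's actual pipeline: it corrects the letter-for-digit confusions named above, but only in fields that are expected to be purely numeric (for example, a serial number), and flags anything it cannot resolve for manual review. The mapping and the function name are assumptions made for this example.

```python
# Hypothetical mapping of letters that OCR may output in place of digits.
CONFUSABLE_TO_DIGIT = str.maketrans({"I": "1", "O": "0"})

def clean_numeric_field(raw):
    """Return the field as a digit string, or None if it needs manual review."""
    candidate = raw.strip().translate(CONFUSABLE_TO_DIGIT)
    # Anything that is still not all digits is left for a human to check.
    return candidate if candidate.isdigit() else None

print(clean_numeric_field("1O4"))  # letter O read in place of zero
print(clean_numeric_field("I2"))   # letter I read in place of one
print(clean_numeric_field("4B"))   # cannot be resolved automatically
```

Restricting the substitution to numeric fields matters: applying it to names would corrupt legitimate letters, which is one reason a manual verification step remains necessary.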

For the researchers decoding the SIR data, the work is as punishing as it is precise. Halder, who sometimes works from a hostel room near ISI Kolkata to avoid missing his classes, recalls spending three sleepless nights cleaning messy PDFs.

“Once, due to a structuring error, I accidentally mixed up deletion and inclusion data for four booths. While post-processing, the expected counts did not match. So, I had to recheck and redo everything to ensure the data's accuracy and integrity. One incorrect line in my program causes a massive error in the generated data. So, I had to be very careful and responsible about it. This is the kind of pressure we operate under. It is stressful. But when it comes out right, and people appreciate the work, there's some relief,” he says.
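A count check of the kind Halder describes can be sketched in a few lines of Python. The example assumes, for illustration only, that each booth's rolls should satisfy draft + inclusions − deletions = final; the article does not spell out the exact rule the team uses. The function name, booth IDs, and numbers are all made up.

```python
def find_mismatched_booths(booths):
    """Return booth IDs where draft + inclusions - deletions != final."""
    mismatched = []
    for booth_id, counts in booths.items():
        expected_final = counts["draft"] + counts["inclusions"] - counts["deletions"]
        if expected_final != counts["final"]:
            mismatched.append(booth_id)
    return mismatched

# Made-up counts for illustration only.
sample = {
    "booth_001": {"draft": 900, "inclusions": 12, "deletions": 40, "final": 872},
    # Inclusions and deletions were swapped here during structuring.
    "booth_002": {"draft": 850, "inclusions": 35, "deletions": 10, "final": 825},
}
print(find_mismatched_booths(sample))
```

In this toy data the second booth fails the check because its inclusion and deletion counts were swapped, the same class of error Halder says he caught during post-processing.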

He says he chose to take on this pressure after recognising a gap in the field. Through his interactions with Chakraborty and Ahamed, Halder says he realised how difficult it is to analyse large-scale social science data and how little modern computational tools are used in the process.

“I felt like I had a lot to contribute,” he says, pointing to his computer science background. “Building scalable tools for extracting insights from complex public data and to make research in this domain more accessible and data-driven was a strong motivator to join this team.”

Inside the data maze

Journalists tracking the SIR exercise have reported barriers in accessing comprehensive data. Electoral rolls are published as fragmented, non-machine-readable PDFs on the Election Commission of India (EC) website, requiring constituency-wise navigation. The absence of a unified database makes it difficult to track deletions, additions, or shifts such as “under adjudication”. 

Apart from the logistical challenges of handling large datasets that require substantial computational power, the team faced a persistent constraint: the data they needed was often simply not available. For instance, the electoral rolls provided by the commission do not include a column for religion. The researchers had to rely on existing models to train their systems to identify religion from elector names.

“In Bengal, surnames such as Naskar and Mondal are used by both Hindus and Muslims. We found a workaround by adding their parents' names as a parameter. So, if the elector’s name created some ambiguity, the addition of another data point - of the parents’ names - would help in decoding the religion of the elector accurately…One of the things that affected me the most in the entire process was how institutions were resistant towards researchers and academics when it came to providing accessible data,” says Chakraborty, questioning why the voter rolls were published as scanned PDFs if data already existed in easier-to-process spreadsheets.

A senior print journalist from one of the biggest dailies in Kolkata, with over 15 years of experience, applauds the SABAR team for their work. 

“What they are doing is not easy: a bunch of young people busting narratives with numbers. It is phenomenal. With deadlines and breaking news, journalists do not always have the capacity to churn large amounts of data quickly. To have someone doing that is good,” the journalist says, requesting anonymity.

Shiv Sahay Singh, West Bengal bureau chief of The Hindu, says, “The system was designed in such a way that journalists were facing challenges in accessing data from the very beginning. The SIR process was not transparent. There was a clear information gap, and there were no regular press conferences. When fact-checkers and institutes like SABAR analyse the data, it serves as a great tool for journalists.”

Not everyone, however, is convinced. Another journalist with over 23 years of experience expresses reluctance to use the institute’s insights in news reports. “The data isn’t peer-reviewed. Who is vouching for its accuracy? When we file a news article based on a report or study, multiple checks and balances are needed. For an issue as important as SIR, the study must be robust and peer-reviewed. Otherwise, it just adds to the sensationalism,” he says, speaking on the condition of anonymity.

The SABAR team counters this scepticism with a policy of radical transparency. They do not believe in gatekeeping and have created an online SIR Adjudication and Deletion Data Repository accessible to all.  

Elaborating on their rationale, Chakraborty says, “Since we are the only ones currently analysing supplementary lists, we want everyone to access this data, check our calculations and build on them. It can be used to help the deleted electors, conduct granular analysis, or even maintain a booth-wise record of those affected. Tomorrow, if someone wants to track the people impacted by SIR through this data, they can.”

‘A Sisyphean task’

Ashin Chakraborty, 24, a self-confessed “politics and data nerd” with a degree in economics and a specialisation in statistics, has been associated with SABAR for more than a year and a half. The goal of “real-world impact” fuels his drive, he says. So, when the opportunity came to analyse something as crucial as electoral rolls, he took it up.

Yet, he is under no illusions. He knows that impact can be slow to manifest – or may never come at all – leaving the work feeling, for now, like a ‘Sisyphean task’.

“There is a kind of hopelessness… It has become a Sisyphean task. We are rolling the boulder up the hill, aware that, in all probability, it will fall again… But this massive and globally unprecedented scale of voter disenfranchisement needs to be recorded by someone.”

Borrowing from Hindu mythology, he adds, “There is a character in Hindu mythology called Chitragupta, who documents the names and deeds of dead people. I feel like we are Chitragupta of the Bengal SIR. We cannot do anything else but document.”

In the politically charged atmosphere of West Bengal’s roll revision, even a loose association with the data can lead to partisan labelling. The researchers are not exempt. 

“My distant relatives accuse me of helping illegal immigrants,” says Chakraborty. “Earlier, I would use data to explain the fallacy of their argument. Now, I don’t respond to them and instead focus on my work.”

When asked whether he personally endorses any political party, he responds with a resounding no. “People can neither deny the anti-incumbency here nor the scale of the voter disenfranchisement on the ground. There has been an erosion of trust in institutions that are supposed to protect our voting rights. Even the state counsel could have made better arguments in the courts to highlight how there was no precedent for the logical discrepancy category.”

‘Backed by small contributions’

From laptops to chairs to the recent coat of yellow paint in the office, everything at SABAR requires crowdfunding, the team claims. 

“If there is a plumbing issue in the office and we need Rs 7,000 to fix it, we crowdfund. If we need new laptops, we crowdfund. Every couple of weeks, you will see [contribution] appeals on our social media platforms. We are always running tight on money,” notes Chakraborty.

Following the launch of their GitHub platform, which publishes constituency-wise insights on supplementary lists, contributions have increased. They have reportedly received over Rs 30,000 in the past three days for their legal clinic and SIR analysis. The group is also receiving data requests from constituents in different Assembly seats.

Elaborating on it, Ahamed remarks, “We started the work with three high-end laptops. For this, we borrowed credit cards from friends and are still paying EMIs on them. People have made contributions to the work. The cost of analysing per constituency – if we are doing a state-wide analysis – is between Rs 30,000 and Rs 40,000. This includes remuneration for volunteers who download datasets and subscription fees for various software and AI tools.”

“An additional Rs 50,000 is spent per month on legal aid and assistance cells, logistics, and legal interns. Most senior advocates working with us work pro bono,” says Ahamed, who has collaborated with the city’s renowned legal luminaries, such as NUJS Professor Dr Sarfaraz Ahmed Khan and Calcutta HC lawyer Tarique Quasimuddin, for legal camps.

Asked if they have any political support or backing, Chakraborty chuckles. “Nobody has stepped forward with anything. We could do a better analysis if we had more resources. And maybe, even get an air conditioner,” he quips.
