The growing use of artificial intelligence (AI) in official statistics carries the risk that models will faithfully reproduce the biases and gaps in their training data and lend false confidence to the very blind spots they are meant to remove, Principal Secretary to the Prime Minister P K Mishra cautioned on Monday. He said these were “not reasons for hesitation” but reasons for a more rigorous approach to data.
Speaking at the 20th Statistics Day celebrations, Mishra said that the incorporation of AI is among the frontiers that analytical institutions must pursue. But the commitment to AI-driven datasets by the Ministry of Statistics and Programme Implementation (MoSPI) also warranted a look at the hard questions it poses.
“If a figure is imputed or nowcast by a model, can the statistical system audit it, explain it, and own it as it owns a survey result?” he asked, adding that the model “learns only from the data it is given, and it will faithfully reproduce whatever biases and gaps it contains.”
Mishra placed the AI question alongside two others — of trust and institutional independence — that he said must guide the next phase of reform as India moves to harness administrative data alongside its traditional surveys and censuses.
Mishra also flagged an institutional concern arising from the shift towards data generated and owned by different ministries. “For 75 years, the credibility of our statistics has rested in part on our control of the instrument. The ministry designed the survey, it drew the sample, it owned the estimate. As the centre of gravity shifts towards records that other ministries generate and own, we must consider carefully how that independence is preserved,” he said.
Describing this year’s Statistics Day theme — Unlocking the Potential of Administrative Data — as marking “an inflection point” for India’s statistical system, Mishra said administrative data must evolve from being a by-product of departmental processes to becoming a strategic national asset, capable of supporting timely, evidence-based decision-making and addressing critical data gaps.
The way forward, he said, required “dynamic data catalogues, seamless interoperability across government systems, and a fully integrated data ecosystem where information is generated at source, shared securely, and utilised efficiently for governance and policy analysis”. Trusted and interoperable datasets, he added, would also form the foundation for the responsible and effective adoption of AI in governance.
Mishra, however, cautioned that technical investment alone would not suffice. “Building human capacity, data literacy and analytical competencies across institutions will be equally critical for success,” he said, adding that the principles of privacy by design and alignment with existing legal and policy frameworks must guide all efforts towards greater interoperability.
“The potential of administrative data will be unlocked not by technology alone, but by the institutional architecture we build around it: the standards and safeguards and the independence that turn raw records into statistics a nation can trust,” he concluded.