What has DeepSeek done that wasn’t possible for OpenAI or Meta? Prof Ravindran of IIT-M explains

China’s DeepSeek last week shook the ground beneath Western technology firms with any stake in artificial intelligence. The startup released a free AI assistant that, it said, needed less data and ran at a fraction of the cost of existing services.

According to a Reuters report, on January 27 semiconductor design firm Nvidia lost about 17 per cent, or close to $593 billion, in market value – a record one-day loss for any company – while shares of semiconductor, power and infrastructure companies exposed to AI collectively shed more than $1 trillion.

businessline talks to Prof B Ravindran, Head of the Wadhwani School of Data Science and AI at IIT-Madras, who explains why DeepSeek’s work is wonderful news for India and what exactly the Chinese startup has done that its much larger rivals could not:

What do DeepSeek’s claims mean?

There are two parts to it. The first is the cost part. Meta spent a lot of money developing LLaMa and then made it open source. What DeepSeek has done is come up with modifications to the core technology itself, for both training and inference. Training is when you are building the model itself, using all the data you have from the Internet and so on. Inference is when you are actually using the model: you ask it a question, it does the computation and gives you an answer.

What DeepSeek has done is significantly reduce the cost on both the training side and the inference side. By cost we mean the number of GPUs you need and the amount of computation done on those GPUs. They have managed to cut these down significantly, using certain well-understood techniques.

But the challenge has always been this: even though people knew these techniques would help, how do you put them together to get a workable system? DeepSeek has managed to crack that question, and that is what made the model cheap. Knowing in theory that some technique will help reduce the cost is one thing; actually coming up with a feasible way of doing it is another. It’s been amazing, because people are now able to run these models on much less compute (power) than you would need to run, for example, an equivalent OpenAI model.

Instead of a few hundred GPUs, you’re able to run it with a few tens of GPUs, so there are really significant savings in compute power at inference time. It’s not that it has become so cheap that you can run it on your desktop. But compared to what it costs to run GPT, it’s way, way cheaper.

They have also announced API versions for commercial use, and the amount you have to pay per query has come down – that’s what people are talking about. If you use the commercially available version of DeepSeek, it’s about one-tenth the cost of OpenAI’s commercially available version, or even lower.

Importantly, they have made the model completely available freely on the Internet. You could just download their model – you still need that many GPUs, so if you have the compute power you can then run the model locally. That’s the open-source part of it. Anybody can get their code, their model and then build on top of that model and you can do other fancy things with it.

Would you agree this is a great step forward for humanity when it comes to AI?

It’s made a significant difference. What people call a moat around the incumbent AI giants almost vanished overnight. It’s not that every developer can start using these models; you probably still need on the order of $10 million to set up a system based on this. But there are far more companies with $10 million than there are organisations with billions of dollars, for whom OpenAI is talking about building the next-generation systems. That is the significant breakthrough that has come about.

But here is what has not happened: all those concerns about AI hallucination, reliability of output and so on still persist. It is not as if they have solved those problems, or that suddenly, overnight, DeepSeek is building a much more reliable LLM than OpenAI. What they have done is reduce the cost to a point where hundreds of people can start experimenting with these models.

Since the larger community will now be able to do fundamental research on this, one would expect more breakthroughs to come. That is the reason for so much enthusiasm.

In fact, this is great news for India. It’s not as if China has gone so far ahead that we can’t catch up. These guys have done something that is really good for us, because we can also start becoming a player in this scene; the investment required is a lot less.

This is actually giving us an advantage. We were certainly not going to be able to compete in the billions of dollars investment range, but now with this, I think a larger number of teams in India can start exploring these models.

We are good when it comes to low-cost solutions. This will certainly allow us to do more – actually invent more stuff.

What exactly has DeepSeek done to make this cheaper and possible to run with fewer GPUs, which OpenAI and others were not able to do?

There are a few things they have tried. One is that they have implemented something called the ‘mixture-of-experts’ approach. That is, for every input you give, there might be a different ‘expert’ that can answer that question. This expert can require far fewer parameters, because it only looks at a certain segment of the input rather than, say, all of the English language. So each ‘expert’ can be much smaller, and put together, this mixture of experts can do very well.

Now, the challenge is knowing which expert to switch to and when, and those questions required a non-trivial amount of work to answer. One thing DeepSeek has done is ensure that whenever you ask a specific query, only roughly 25 per cent of the parameters are actually called into use, not the full network.

Even though I train a large network, I’m not using all of it; only some fraction of it is actually used when I do inference. That is one place where things get sped up significantly.
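
To make that concrete, here is a minimal PyTorch sketch of a mixture-of-experts layer. It is not DeepSeek’s actual architecture; the class name, sizes and the top-2-of-8 routing are illustrative assumptions. With two of eight experts active per token, only a fraction of the expert parameters runs for any one input.

```python
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Toy mixture-of-experts layer: a gating network routes each token to
    its top-k experts, so only a fraction of the parameters runs per input."""

    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)  # the "router"
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (tokens, d_model)
        scores = self.gate(x)                              # (tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(5, 64)
print(layer(tokens).shape)  # torch.Size([5, 64]); only 2 of 8 experts ran per token
```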

The second thing is that they seem to have done quantisation very effectively. What do we mean by quantisation? Normally, when you represent numbers in a computer, you use 32 binary digits – zeros and ones – so each number you represent takes 32 bits. But people have looked at using fewer zeros and ones, maybe 16 or 8 bits, to represent numbers, and some have been aggressive enough to go all the way down to 4 bits. It looks like DeepSeek has done some very aggressive quantisation, including 4-bit quantisation, and still managed to get good results.

This also speeds up computation and reduces memory requirements.
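
A minimal sketch of what low-bit quantisation means, assuming a simple symmetric scheme (DeepSeek’s actual recipe is more involved and is not reproduced here):

```python
import torch

def quantise_4bit(w: torch.Tensor):
    """Symmetric 4-bit quantisation: store each weight as an integer in
    [-8, 7] plus one shared float scale, instead of 32 bits per weight."""
    scale = w.abs().max() / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)  # 4-bit values held in int8
    return q, scale

def dequantise(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4, 4)                 # pretend these are model weights
q, scale = quantise_4bit(w)
error = (w - dequantise(q, scale)).abs().max()
print(q.dtype, float(error))          # small reconstruction error, far less storage
```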

For the new version of DeepSeek that has come out now, there is a parallel with what OpenAI’s o1 did, something called inference-time computing, or inference-time learning. When answering a question, instead of answering in one shot, the model does multiple generations of the answer at test time. When you ask a question, it internally runs things many, many times and then picks the best answer. Normally, the model just goes from left to right once and generates an answer; that is what the older models were doing. What o1 started doing was running multiple times internally and then giving you the better answer. What DeepSeek’s new model has done is the same thing, but using a more efficient form of reinforcement learning to do these multiple runs internally. That also brings significant savings in training time as well as inference time.

It turns out that, at least on the test suites that are out there, they are performing on par with o1, or close to it.
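
The ‘run many times internally and keep the best answer’ idea can be sketched as a simple best-of-n loop. This is only an illustration: generate() and score() below are made-up placeholders, not DeepSeek’s or OpenAI’s systems, and as noted above DeepSeek trains this behaviour with a more efficient reinforcement-learning method rather than a naive loop.

```python
import random

# Placeholder stand-ins for a language model and a verifier/reward model;
# neither reflects DeepSeek's or OpenAI's real systems.
def generate(question: str, seed: int) -> str:
    rng = random.Random(seed)
    return f"candidate answer #{rng.randint(0, 999)} to: {question}"

def score(answer: str) -> float:
    return random.random()  # a real verifier would judge answer quality here

def best_of_n(question: str, n: int = 8) -> str:
    """Generate n candidate answers and keep the highest-scoring one."""
    candidates = [generate(question, seed) for seed in range(n)]
    return max(candidates, key=score)

print(best_of_n("What is 17 * 24?"))
```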

DeepSeek has made possible something that wasn’t thought possible. Where do you expect AI technology to go from here?

DeepSeek’s latest has not come completely out of left field. They have not done something that is completely unexpected: people knew that in theory something like this could work, but nobody knew how to get it to work. From here on, perhaps we can build more efficient reasoning systems.

Faster inference-time compute would enable more and more complicated reasoning models to develop, which we don’t have right now. Whatever reasoning seems to come out of these models is still not true reasoning; it’s just an illusion of reasoning. These models can only reason about things they have already seen. But using the concept of ‘counterfactuals’ allows you to reason even about things you haven’t seen so far. That kind of more general-purpose reasoning ability can certainly now come up.

How could projects you are working on at IIT-M benefit from this latest development?

One of the things we are looking at is how to build evaluation frameworks and metrics for these GenAI models for different applications, particularly in the Indian context. Suppose I start using this for some legal workflow: what kind of questions should the end user be asking of these models? How will they evaluate it for fairness? How should they evaluate it for robustness, and what are the kinds of questions they should ask? We want to do this for different sectors, because it’s very important to understand the impact of this on the ecosystem. Without this kind of measurement platform, it’s very hard to even talk about regulations.

Published on January 31, 2025






Shifting pole



Earth’s magnetic north pole, where the magnetic lines emanate (and converge at the magnetic south pole), today lies about 400 km from the geographic North Pole. But the magnetic poles are known to wander. They also flip, so that magnetic lines start emanating from (what is today) the magnetic south pole and converge at the north pole. The last time this happened was 780,000 years ago and you can never tell when the next will happen.

However, in recent years the magnetic north pole — today located in the Arctic Ocean above Canada — has been speeding towards Siberia. Its behaviour has flummoxed experts such as Dr William Brown, global geomagnetic field modeller at the British Geological Survey. He says in the latest report of the World Magnetic Model that the pole’s conduct “is something we have never observed before”. While the magnetic north has been moving slowly around Canada since the 1500s, in the past 20 years it accelerated towards Siberia, increasing in speed every year until, about five years ago, it suddenly decelerated from 50 to 35 km per year, which is “the biggest deceleration in speed we’ve ever seen”.

Earth’s magnetic field, caused by the motion of liquid iron and nickel beneath the crust, remains one of the more poorly understood aspects of the planet. 

Fortunately, in the GPS era, the magnetic poles matter less for navigation, but the north pole’s dash towards Siberia raises intrigue.






180-year-old observatory in Mumbai to digitise records



The Colaba Observatory in Mumbai, one of the world’s oldest, has been recording variations in the strength and direction of the earth’s magnetic field since 1841. It was one of the few observatories in the world to record the Carrington event of September 2, 1859, when a burst of energy from the sun travelled 150 million km to the earth, knocking out much of the world’s telegraph systems.

The observatory preserves 180 years’ worth of work in the form of magnetograms (graphical records), microfilms, and hard copy volumes. A major record is the Moos Volume, a compilation dated 1896 that is credited to Dr Nanabhoy Ardeshir Moos, the first Indian director of the Colaba Magnetic Observatory. The Moos Volume is a reference material used worldwide. 

The observatory is now part of the Indian Institute of Geomagnetism (IIG), which operates 13 magnetic observatories across the country and hosts a World Data Centre for Geomagnetism, maintaining comprehensive geomagnetic data. 

And now, the observatory has set itself the task of digitising all its data sets. This work would be undertaken by the recently inaugurated Colaba Research Centre. “This (digitisation) can help form a benchmark for the probability of occurrence of geomagnetic storms in the future. The centre will also carry out research activities on the impact of space weather and allied fields,” says a press release from the Department of Science and Technology. 

“Historical data, when digitised, can also be analysed using AI/ML techniques to provide more insights,” observed Abhay Karandikar, Secretary, Department of Science and Technology, in a LinkedIn post.

Prof Sunil Kumar Gupta to lead global physics body


Prof Sunil Kumar Gupta, who taught at the Tata Institute of Fundamental Research (TIFR) between 1976 and 2000, has been elected as President of the International Union of Pure and Applied Physics. 

The Geneva-headquartered organisation is run by the physics community, with a mission to assist in the worldwide development of physics, foster international cooperation in physics, and help in the application of physics toward solving problems of concern to humanity. 

Prof Gupta will hold the position for a three-year term. He is only the second Indian to hold this position after Dr Homi Bhabha (1960-63).






Intelligent vehicle detection



Using artificial intelligence, researchers at the National Institute of Technology Rourkela (NIT Rourkela) have developed a ‘multi-class vehicle detection’ (MCVD) model and a ‘light fusion bi-directional feature pyramid network’ (LFBFPN) tool aimed at improving traffic management in developing countries. Led by Prof Santos Kumar Das, Associate Professor, Department of Electronics and Communication Engineering, the team leveraged an intelligent vehicle detection (IVD) system, which uses computer vision to identify vehicles in images and videos. This system collects real-time traffic data to optimise traffic flow, reduce congestion, and aid in future road planning.

While IVD systems perform well in developed countries with organised traffic, they face challenges in developing nations with mixed traffic. In India, a wide variety of vehicles — from cars and trucks to cycles, rickshaws, and animal carts, alongside pedestrians — often operate in proximity, making accurate vehicle detection difficult. 

Traditional IVD methods, including sensor systems such as radar and light detection and ranging (LiDAR), are effective in controlled environments but struggle in adverse weather conditions, including rain and dust storms. Moreover, these systems are expensive. Video-based systems hold greater promise, especially for India, but traditional video processing techniques struggle with fast-moving traffic and demand significant computational power. 

Deep learning (DL) models, a type of AI that learns from existing data, provide an efficient way to detect vehicles in video feeds. These models use convolutional neural networks (CNNs) to identify and analyse traffic images. However, they often fail to accurately detect vehicles of varying sizes and angles, particularly in busy, mixed-traffic environments.

Additionally, there is a lack of labelled data sets designed for such complex conditions. 

To address these challenges, Prof Das and his team developed the new MCVD model, which uses a video deinterlacing network (VDnet) to efficiently extract key features from traffic images, even when the vehicles vary in size and shape. They also introduced the specialised LFBFPN tool to further refine the extracted details.
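
As a rough illustration of the computer-vision step such systems build on, the sketch below runs an off-the-shelf pretrained detector from torchvision on a single traffic frame. The MCVD model and LFBFPN tool themselves are not publicly available, so this generic detector is only a stand-in, and ‘traffic_frame.jpg’ is a hypothetical input.

```python
import torch
from PIL import Image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

# Generic pretrained detector as a stand-in; the NIT Rourkela MCVD/LFBFPN
# models described above are not publicly released.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

frame = to_tensor(Image.open("traffic_frame.jpg"))  # hypothetical video frame
with torch.no_grad():
    detections = model([frame])[0]

# Print class id, bounding box and confidence for each confident detection
for box, label, conf in zip(detections["boxes"], detections["labels"], detections["scores"]):
    if conf > 0.6:
        print(int(label), [round(float(v)) for v in box], round(float(conf), 2))
```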






Silicon carbide from moon’s soil



Researchers at IIT-Madras have tasted success in extracting silicon carbide from (simulated) moon soil — a development that could lead to the making of silicon carbide-based composites for building lunar habitats.

Nithya Srimurugan, a PhD student, and his professor Dr Sathyan Subbiah at the Department of Mechanical Engineering worked on extracting useful materials from lunar regolith. 

Lunar regolith is not easy to get — after all, only 382 kg of moon rocks and soil have been brought to the earth, and nobody is going to dispense it freely to every researcher. But, fortunately, there are entities that make simulated lunar soil for research. Srimurugan got some from Space Resource Technologies and Exolith Labs. 

The moon has two distinct terrains — the plains, known as maria (plural of mare), and the highlands. Each has its own composition and characteristics. Highlands are rich in silicon (among other elements such as aluminium and calcium). These elements exist as oxides — to get the elements out, you have to drive out the oxygen.

Srimurugan wanted to make silicon carbide — the light, strong material from which abrasives are made. Silicon carbide is a combination of silicon and carbon. Where does one go for carbon on the moon? The breath exhaled by those living there would be carbon dioxide, but carbon dioxide on its own is too unreactive to be useful directly.

However, at the International Space Station, a boarding-cum-research lab that orbits 400 km above the earth, the Sabatier process is used to convert the carbon dioxide exhaled by the astronauts into methane and water by adding hydrogen from electrolysers. 
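
For reference, the overall Sabatier reaction, a standard textbook equation rather than a detail from the researchers, is:

\[ \mathrm{CO_2} + 4\,\mathrm{H_2} \;\longrightarrow\; \mathrm{CH_4} + 2\,\mathrm{H_2O} \]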

At the ISS, the methane is vented into space, but it is precious for Srimurugan. When he combined the highland regolith simulant with methane at high temperature, he was able to get silicon carbide.

In a chat with Quantum, Srimurugan stressed that more research is needed to produce bigger quantities of silicon carbide from lunar regolith, en route to making composites for building habitats on the moon. But his work, which is currently under review, marks a good beginning.






Study warns against ‘oppressive heatwaves’ due to global warming



Killer heatwaves are caused by high temperatures, which, as is well known now, are a result of global warming. But there is a worse type, which a group of researchers have named ‘oppressive heatwaves’, triggered by a combination of high temperature and high humidity. The researchers, from IIT-Bombay and ETH Zurich, warn that oppressive heatwaves are likely to occur more frequently — so, be prepared.

‘Heatwaves’ are defined in terms of temperature exceeding a certain threshold for a specified period (days). The India Meteorological Department (IMD) defines a heatwave as three or more days with a temperature exceeding a predetermined threshold based on topography (namely above 45 degrees C in plains and above 40 degrees C in hilly areas). 

The researchers examined the historical changes in heatwave characteristics and their association with human mortality in India. Given the importance of humidity in heatwave estimation, and that India experiences high humidity due to its geographical location, they classified heatwaves into oppressive (high temperature and high humidity) and extreme (high temperature and low humidity). They further examined the likelihood of future heatwave events following global warming by 1.5 degrees C and 2 degrees C, relative to the pre-industrial period, using daily maximum temperature, daily mean temperature, and specific humidity simulations from the Community Earth System Model Large Ensemble Numerical Simulation (CESM-LENS) data set. 
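
A toy sketch of that two-way classification, assuming made-up temperature and humidity cut-offs (the paper’s actual thresholds, and the requirement that conditions persist over several days, are not reproduced here):

```python
def classify_heatwave_day(max_temp_c: float, specific_humidity_g_kg: float,
                          temp_threshold_c: float = 45.0,
                          humidity_threshold_g_kg: float = 15.0) -> str:
    """Label a single hot day; real heatwave definitions also require the
    threshold to be exceeded on three or more consecutive days."""
    if max_temp_c < temp_threshold_c:
        return "no heatwave"
    if specific_humidity_g_kg >= humidity_threshold_g_kg:
        return "oppressive heatwave (hot and humid)"
    return "extreme heatwave (hot and dry)"

print(classify_heatwave_day(46.2, 18.0))  # oppressive heatwave (hot and humid)
print(classify_heatwave_day(46.2, 8.0))   # extreme heatwave (hot and dry)
```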

They used the IMD’s daily temperature data at 1-degree spatial resolution and data from the National Oceanic and Atmospheric Administration (NOAA), National Centre for Environmental Prediction (NCEP), and National Centre for Atmospheric Research (NCAR), from 1951 to 2013. 

They projected the ‘oppressive heatwave’ days and ‘extreme heatwave’ days for the near future (2035-65) and the far future (2070-2100) to understand the changes in heatwaves during these two time periods relative to historical climate (1975-2005). 

Then they examined the association of heat-related mortality with oppressive and extreme heatwave days from 1967 to 2007 — because there is reliable data for this period. 

“The heat-related mortality is strongly positively related to oppressive heatwave days relative to dry heatwave days,” the researchers say in a paper that awaits peer review. 

“Since oppressive heatwave days significantly increase over India in most climatologically homogeneous zones, we infer the exacerbated heat-related mortality in the future if such heatwave conditions increase monotonically in the future, as these have been in recent decades,” the paper says. 

Five-fold increase

The paper examines the situation under two scenarios of global warming — a rise in global temperatures by 2 degrees C; and by 1.5 degrees C. 

“Our results show a five-fold increase in the number of days of oppressive heatwaves under 1.5 degrees C warming in both mid (2035-2065) and end (2070-2100) of the 21st century, relative to the historical period (1975 to 2005), whereas the number of extreme heatwave days remains relatively constant in mid and end of this century,” it says. 

As for 2 degrees C warming, it would result in an eight-fold increase in the number of days of oppressive heatwaves by the end of the century, relative to the historical period and 1.5 degrees C warming conditions. 

“These results suggest that limiting the mean global warming to 1.5 degrees C can reduce the likelihood of oppressive and extreme heatwaves by 44 per cent and 25 per cent by the end of the century, relative to the 2-degree C warming world,” says the paper, authored by Naveen Sudharsan, Subimal Ghosh and Subhankar Karmakar of IIT-Bombay, and Jitendra Singh of ETH Zurich. 

“The remarkable increase in oppressive heatwave days highlights the elevated risk of heatwaves over densely populated countries and indicates an imminent need for adaptation measures,” it warns.





