China’s DeepSeek last week shook the ground beneath Western technology firms that had anything to do with artificial intelligence. The company released a free AI assistant that, the startup said, uses less data and runs at a fraction of the cost of existing services.
According to a Reuters report, on January 27 semiconductor design firm Nvidia lost about 17%, or close to $593 billion, in market value – a record one-day loss for any company – while semiconductor, power and infrastructure companies exposed to AI collectively shed more than $1 trillion.
businessline spoke to Prof B Ravindran, Head of the Wadhwani School of Data Science and AI at IIT-Madras, who explains why DeepSeek’s work is wonderful news for India and what exactly the Chinese startup has done that its much larger rivals could not:
What do DeepSeek’s claims mean?
There are two parts to it – the first is the cost part. Meta spent a lot of money developing LLaMa and then made it open source. What DeepSeek has done is come up with modifications to the core technology itself, so that both training and inference become cheaper. Training is when you’re actually building the model itself, using all the data you have from the Internet and so on. Inference is when you’re actually using the model – you’re interacting with it, you ask it a question, it does the computation and gives you an answer.
What DeepSeek has done, on both the training side and the inference side, is significantly reduce the cost. What we mean by cost is the number of GPUs you need and the amount of computation done on those GPUs. They have managed to cut these down significantly, and they have done it using certain well-understood techniques.
But the challenge has always been this – even though people knew these techniques would help, how do you put them together to get a workable system? DeepSeek has managed to crack that question, and that is what made the model cheap. Knowing in theory that some technique will reduce the cost is one thing; actually doing it right and coming up with a feasible way of doing it is another. It’s been amazing, because people are now able to run these models on much less compute (power) than what you would need to run, for example, an equivalent OpenAI model.
Instead of a few hundred GPUs, you’re able to run it with a few tens of GPUs, so that’s a really significant saving in compute power at inference time. It’s not that it has become so cheap that you can run it on your desktop. But compared with what it costs to run GPT, it’s way, way cheaper.
They have also announced API versions, where you use it commercially and pay per query, and that price has come down – that’s what people are talking about. If you use the commercially available version of DeepSeek, it’s about one-tenth the cost of OpenAI’s commercially available version, or even lower.
Importantly, they have made the model completely free on the Internet. You can just download their model – you still need that many GPUs, so if you have the compute power you can run the model locally. That’s the open-source part of it. Anybody can get their code and their model, build on top of that model, and do other fancy things with it.
Would you agree this is a great step forward for humanity when it comes to AI?
It’s made a significant difference. What people call a moat – around the incumbent AI giants – almost vanished overnight. It’s not that every developer can start using these models; you probably still need on the order of $10 million to set up a system based on this. But there are certainly far more companies with $10 million than organisations with the billions of dollars for which OpenAI is talking about building next-generation systems. That is the significant breakthrough that has come about.
But what has not happened is this – there were all those concerns about AI hallucination, the reliability of output and so on. All those problems still persist. It’s not like they have solved them, or that overnight DeepSeek is suddenly building a much more reliable LLM than OpenAI. What they have done is reduce the cost to a point where hundreds of people can start experimenting with these models.
Since the larger community will now be able to do fundamental research on this, one would expect more breakthroughs to come. That is the reason for so much enthusiasm.
In fact, this is great news for India. It’s not like China has gone so far ahead that we can’t catch up. These guys have done something that is really good for us, because we can also start becoming a player in this scene – the investment required is much lower.
This is actually giving us an advantage. We were certainly not going to be able to compete in the billion-dollar investment range, but with this, I think a larger number of teams in India can start exploring these models.
We are good when it comes to low-cost solutions. This will certainly allow us to do more – to actually invent more.
What exactly has DeepSeek done to make this cheaper and possible to run with fewer GPUs, which OpenAI and others were not able to do?
There are a few things they have tried. One is that they have implemented something called the ‘mixture-of-experts’ approach. That is, for every input you ask, there might be a different ‘expert’ that can answer that question. But this expert can require far fewer parameters, because it only looks at certain parts of the input – not at all of the English language, maybe only at a certain segment of your input – so each ‘expert’ can be much smaller; and put together, this selection of experts can do very well.
Now, the challenge is how you would know which expert to switch to and when – some of these questions required a non-trivial amount of work. One thing DeepSeek has done is that whenever you ask a specific query, only roughly 25% of the parameters are actually called into use, not the full network.
Even though I train a large network, I’m not using all of it; only some fraction of it is actually used when I’m doing inference, so that is one place where things get sped up significantly.
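As a rough illustration of the idea, here is a minimal sketch of mixture-of-experts routing in Python: a small gating network scores the experts and only the top few are run for each token. The expert count, top-k value and dimensions below are illustrative assumptions, not DeepSeek’s actual configuration.

```python
# Minimal mixture-of-experts routing sketch (illustrative, not DeepSeek's setup).
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # total experts trained (assumed for illustration)
TOP_K = 2         # experts actually run per token
D_MODEL = 16      # size of the token representation

# Each "expert" is a small weight matrix; together they hold most of the
# parameters, but only TOP_K of them are used for any given token.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((D_MODEL, NUM_EXPERTS))   # gating network

def moe_forward(x):
    """Route one token x through only its top-k experts."""
    scores = x @ router                      # one score per expert
    top = np.argsort(scores)[-TOP_K:]        # indices of the best-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                 # softmax over the chosen experts
    # Only TOP_K of the NUM_EXPERTS matrices are touched; the rest sit idle.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
print(moe_forward(token).shape)   # (16,): same output size, a fraction of the compute
```

The point of the sketch is the ratio: with 2 of 8 experts active, only a quarter of the expert parameters do work on any query, which matches the kind of saving described above.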
The second thing is that they seem to have done quantisation very effectively. What do we mean by quantisation? Normally, when you represent numbers in computers, you use 32 binary digits – zeros and ones – so each number takes 32 bits. But people have looked at using fewer bits, maybe 8 or 16, for representing numbers, and some have been aggressive enough to go all the way down to 4 bits. It looks like DeepSeek has done some very aggressive quantisation – even 4-bit – and still managed to get good results.
This also speeds up computation and reduces memory requirements.
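To make quantisation concrete, here is a minimal sketch of uniform quantisation from 32-bit floats down to 8 and 4 bits. The symmetric, per-tensor scheme is an illustrative assumption, not DeepSeek’s exact recipe.

```python
# Minimal uniform quantisation sketch (illustrative scheme, not DeepSeek's recipe).
import numpy as np

def quantise(weights, bits):
    """Map float32 weights to signed integers of the given bit width."""
    qmax = 2 ** (bits - 1) - 1             # 127 for 8 bits, 7 for 4 bits
    scale = np.abs(weights).max() / qmax   # one scale for the whole tensor
    q = np.round(weights / scale).astype(np.int8)  # int8 used as a container here;
    return q, scale                                # real 4-bit kernels pack 2 values per byte

def dequantise(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
for bits in (8, 4):
    q, scale = quantise(w, bits)
    err = np.abs(w - dequantise(q, scale)).mean()
    print(f"{bits}-bit: mean absolute error {err:.4f}")
```

Running this shows the trade-off: each halving of the bit width roughly halves memory and arithmetic cost, while the reconstruction error grows, which is why getting good results at 4 bits is considered aggressive.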
For the new version of DeepSeek that has just come out – OpenAI’s o1 did something called inference-time compute, or inference-time learning. When answering a question, instead of answering in one shot, the model does multiple generations of the answer at test time: it internally runs things many, many times and then picks the best answer. Normally, a model goes from left to right just once and generates an answer – that is what the older models were doing. What o1 started doing was running multiple times internally and then giving you the better answer. What DeepSeek’s new model has done is the same thing, but using a more efficient form of reinforcement learning to do these multiple runs internally, and that also gives significant savings in training time as well as inference time.
It turns out that, at least on the test suites that are out there, they are performing on par with o1, or close to it.
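To illustrate the idea of inference-time compute, here is a minimal best-of-N sketch: generate several candidate answers and keep the highest-scoring one. The generate() and score() functions are hypothetical stand-ins for a real LLM call and a reward model; this simple loop is not DeepSeek’s actual reinforcement-learning procedure.

```python
# Minimal best-of-N inference-time compute sketch.
# generate() and score() are hypothetical stand-ins, not real LLM APIs.
import random

def generate(question):
    """Hypothetical stand-in for one left-to-right pass of an LLM."""
    return f"candidate answer {random.randint(0, 999)} to {question!r}"

def score(question, answer):
    """Hypothetical stand-in for a reward model that judges answer quality."""
    return random.random()

def best_of_n(question, n=8):
    # Older models ran this loop exactly once; reasoning models spend extra
    # compute at inference time generating several candidates internally
    # and returning only the best one.
    candidates = [generate(question) for _ in range(n)]
    return max(candidates, key=lambda a: score(question, a))

print(best_of_n("What is 17 * 23?"))
```

In practice the generation policy and the selection are trained together; the point in the interview is that DeepSeek found a more efficient reinforcement-learning route to the same effect, which this sketch does not attempt to reproduce.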
DeepSeek has made possible something that wasn’t thought possible. Where do you expect AI technology to go from here?
DeepSeek’s latest has not come completely out of left field. They have not done something completely unexpected. People knew that, in theory, something like this could work, but nobody knew how to get it to work. From here on, perhaps we can build more efficient reasoning systems.
Faster inference-time compute would enable more and more complicated reasoning models to develop, which we don’t have right now. Whatever reasoning seems to come out of these models is still not true reasoning; it’s just an illusion of reasoning. These models can only reason about things they have already seen. But the concept of ‘counterfactuals’ allows you to reason even about things you haven’t seen so far. That kind of more general-purpose reasoning ability can certainly come up now.
How can the projects you are working on at IIT-M benefit from this latest development?
One of the things we are looking at is how you would build evaluation frameworks and evaluation metrics for these GenAI models for different applications, particularly in the Indian context. Suppose I start using this for some legal workflow – what kind of questions should the end user be asking of these models? How will they evaluate it for fairness? How should they evaluate it for robustness? We want to do this for different sectors, because it’s very important to understand the impact of this on the ecosystem. Without this kind of measurement platform, it’s very hard to even talk about regulation.
Published on January 31, 2025