Fully open reproduction of DeepSeek R1, which ends the marketing hype of US big tech

HKPhysicist 9 months ago

Hello Friends,

Let me share this repository with you:

Fully open reproduction of DeepSeek R1

https://github.com/huggingface/open-r1

Perhaps you can make something new out of it.

Top Replies

bradfordmiller 9 months ago in reply to HKPhysicist +2

My understanding (as it will probably be a while before there is a general consensus in terms of what Deepseek did and didn't do) is that they have pushed some of the processing to the query side (i.e…

DAB 9 months ago

You overestimate this AI variant and its impact on current exploitation efforts.

DeepSeek will not unseat the leaders and it will take some time before it is understood well enough to actually do anything useful.

battlecoder 9 months ago in reply to DAB

While I think that DeepSeek will have an impact, it's going to be on par with other advancements in the field that were already great on their own (for example, Llama). I agree with the sentiment that it's not going to be as dramatic as people are making it out to be. It's absolutely astounding work, for sure, but it's not "David defeating Goliath". Sadly, every new thing is reported as an Earth-shattering event with world-changing consequences that will absolutely change life as we know it. That's always a lie, and that's what gets the panic going and people (and markets) overreacting.

And talking about blowing things out of proportion, I would also love it if companies started being more realistic about how they present the models. They are good as text manipulation tools that seem to hold some amount of knowledge and concept abstractions. They are good at extracting core ideas, commands or actions from natural language, executing on them, and then reporting back in somewhat-natural language, so they are great for personal assistants and text-processing aids. But that's not how they "sell" them. They advertise them as if they were a cosmic all-knowing thing that will "boost" productivity and reduce the workload of people everywhere, and that's nowhere near being a reality.

Now, back to DeepSeek. For models to be profitable and actually useful without becoming a major disaster both financially and environmentally, they need to start using fewer resources, and that's exactly where DeepSeek presents a move in the right direction. The fact that people can download it from a repo (thanks for sharing one that seems to streamline some of the process) and run it without requiring overly expensive hardware will hopefully mean that more and more people will be able to play with this kind of tool and maybe find a better use for it than replacing customer support on their products and then finding out in court that it wasn't a great idea.

HKPhysicist 9 months ago in reply to battlecoder

I heard this comparison in another post/forum:

Its US counterparts use 100% of the resources to achieve a certain result, while DeepSeek uses only 5% of the resources to achieve the same result.

Assume that ChatGPT used 100 Nvidia chips to achieve a certain task and that it was valued at US$100 billion.

Once DeepSeek's performance has been verified, it needs just 5 Nvidia chips to achieve the same task. Thus, ChatGPT is revalued at US$5 billion.

Furthermore, if everybody uses DeepSeek as their tool, each user needs only 5 Nvidia chips instead of 100. So Nvidia can sell only 5 chips to each customer instead of 100, and Nvidia's valuation is also recalculated to be 5% of its old value.

That explains the recent stock market shock.
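
The arithmetic above can be written out as a small sketch. It assumes, as the comparison does, that a valuation scales linearly with the number of chips needed for the same task. That proportionality is a simplifying assumption of this toy model, not a real valuation method, and the function name `rescale_valuation` is made up for illustration.

```python
def rescale_valuation(old_valuation: float, old_chips: int, new_chips: int) -> float:
    """Scale a valuation linearly with the chips needed for the same task.

    Linear scaling is the toy assumption from the post, nothing more.
    """
    if old_chips <= 0:
        raise ValueError("old_chips must be positive")
    return old_valuation * new_chips / old_chips


# Figures taken from the post: 100 chips and US$100 billion before, 5 chips after.
chatgpt_before = 100e9
chips_before = 100
chips_after = 5

resource_ratio = chips_after / chips_before
chatgpt_after = rescale_valuation(chatgpt_before, chips_before, chips_after)

print(f"resource ratio: {resource_ratio:.0%}")
print(f"revalued: US${chatgpt_after / 1e9:.0f} billion")
```

The same function applies to the Nvidia case in the post: under the same assumption, any starting valuation comes out at 5% of its old value.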

michaelkellett 9 months ago in reply to HKPhysicist

Only if you believe all the PR hype from China.

OpenAI says that DeepSeek works by stealing their database and distilling it.

Of course, others say that OpenAI stole copyrighted material to make their database.

If you look at the thread "ChatGPT designs an Audio Amplifier" you might reasonably wonder why anybody cares!

MK

Andrew J 9 months ago in reply to michaelkellett

Agree. AI is the biggest bubble around at the moment, akin to Blockchain in 2017/2018. It doesn't matter how creators dress it up, it's all predicated on look-up tables/data and a bounded algorithm. That is why the results are often rubbish and, if not rubbish, no better than what a human could have discovered. What it has going for it is speed and, within its bounded algorithm, an ability to pattern match. What is being promoted/marketed isn't "intelligence", or even close.

battlecoder 9 months ago in reply to Andrew J

From what I understand, NVIDIA stock is rebounding (albeit slowly), which kind of shows it was more of a panic response to an overblown piece of news. It still shows that they are not as essential as everyone thought they were, though.

DeepSeek's work definitely shakes NVIDIA's and OpenAI's ground, but to me the value of their work lies in reducing the resources needed to run a somewhat competent model, and also showing that even if you attempt to gatekeep AI from other companies or research labs, or hardware vendors, you just can't.

Now, Andrew J and michaelkellett, these models are definitely over-hyped. Their capabilities are very limited, and instead of trying to create reasonable expectations, companies working on AI products try to sell them as superhuman tools.

What I think is valuable about LLMs is their ability to understand natural language and generate a response. The idea of using that framework as a "do-it-all" tool that can do everything from summarizing text to writing music or designing stuff makes no sense, but it's unfortunately the way it's being marketed. It's the biggest bag of chips; half of the content is just air.