Kubefeeds Team A dedicated and highly skilled team at Kubefeeds, driven by a passion for Kubernetes and Cloud-Native technologies, delivering innovative solutions with expertise and enthusiasm.

A Hugging Face Project Is Uncovering DeepSeek-R1’s Secrets

DeepSeek-R1’s release was a huge wake-up call for the AI world, according to Jeff Boudier, who leads product and growth at Hugging Face.

“The wake-up call was that, in order to get the best possible AI, you don’t need to rely on closed models from OpenAI, Anthropic, Google, etc.,” Boudier said. “You can access an open model here from DeepSeek with similar capabilities, coming from a research lab that was previously not very much known.”

Hugging Face is a company that serves as a repository hub and community for open source large language models (LLMs). It very quickly saw the impact of DeepSeek-R1, which is hosted on the platform.

“What was interesting is that it was not just a big announcement for sort of the general public, it also created a flurry of activity within the AI community, and we saw that directly on Hugging Face,” Boudier told The New Stack. “The R1 release today — that’s over 10 million downloads on Hugging Face and that’s just the last 30 days.”

How DeepSeek Changed AI

DeepSeek creates very efficient models that run on less powerful hardware. That’s unusual in AI, so much so that when its R1 model was released in January, it triggered a stock dive for NVIDIA, which manufactures the graphics processing units (GPUs) upon which other AI systems rely.

DeepSeek also used multiple neural networks instead of relying on a single “generalist” model. Plus, it was inexpensive to train, at just $5.5 million, compared to other generative AI models, “thanks to architectural changes like Multi-Token Prediction (MTP), Multi-Head Latent Attention (MLA) and a LOT (seriously, a lot) of hardware optimization,” Hugging Face researchers wrote in a blog post.

The DeepSeek organization is also the most followed organization on Hugging Face, with more than 45,000 followers. That’s more than Google, Microsoft or other large AI players. There are now thousands of DeepSeek model derivatives available on the hub, Boudier added.

DeepSeek-R1 also changed the game for organizations that want to use AI. They can now download the open source model, released under the MIT license, and host it on premises.

“If you’re an enterprise, you don’t need to send your customer data to an API anymore, like that of OpenAI or others,” Boudier said. “You can actually host everything in-house. And it’s also MIT-licensed, so you can use it for whatever commercial purpose. That’s really, really powerful.”

The Open-R1 Project

DeepSeek didn’t just release its open source R1 and R1-Zero models — the Chinese company released a technical report that was “very generous in terms of the knowledge they shared and how they were able to create R1 and R1-Zero models using reinforcement learning techniques and some of these tricks,” Boudier explained.

The techniques described in the technical report were implemented within Hugging Face libraries, so they can be used by research labs around the world, he added. That included techniques such as Group Relative Policy Optimization (GRPO), a reinforcement learning method that scores a group of sampled answers to the same prompt against one another, so the model is rewarded for answers that beat the group average and improves over time.
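
To make the "group relative" idea concrete, here is a minimal sketch of the advantage calculation at the core of GRPO. It is not DeepSeek's or Hugging Face's code: the function name `group_relative_advantages`, the `eps` term and the choice of population standard deviation are assumptions made for illustration, and a full trainer would add the policy update on top of this step.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each sampled answer relative to the other answers for the same prompt.

    Hypothetical helper: normalizes rewards by the group's mean and standard
    deviation, so above-average answers get a positive advantage.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; real trainers may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt: 1.0 = verified correct, 0.0 = incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# If every answer earns the same reward, no answer stands out from the group.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Because each answer is compared with its own group, this step needs no separate value model to estimate a baseline; the group average plays that role.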

But there were some missing pieces in DeepSeek’s research, Boudier said.

“The technical report did not explain or describe the training data that was used to train and align the R1 model,” he said. “It did not describe the distillation process.”

Specifically, a Hugging Face research team noted, the report left questions about:

  • Data collection, such as how the reasoning-specific datasets were curated.
  • Model training. “No training code was released by DeepSeek, so it is unknown which hyperparameters work best and how they differ across different model families and scales,” the researchers said.
  • Scaling laws. “What are the compute and data trade-offs in training reasoning models?” Hugging Face researchers asked.

These questions led to the creation of the Open-R1 project, an initiative that is systematically reconstructing DeepSeek-R1’s data and training pipeline, validating its claims, and “pushing the boundaries of open reasoning models,” the researchers wrote.

“By building Open-R1, we aim to provide transparency on how reinforcement learning can enhance reasoning, share reproducible insights with the open source community, and create a foundation for future models to leverage these techniques,” they stated.

The Hugging Face researchers outlined their “plan of attack” for Open-R1:

  1. Replicate the R1-Distill models by distilling a high-quality reasoning dataset from DeepSeek-R1.
  2. Replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale data sets for math, reasoning, and code.
  3. Show they can go from base model → SFT → RL via multi-stage training.

Reproducing the DeepSeek-R1 pipeline lets research labs go through the same process that DeepSeek followed when it created DeepSeek-R1 and DeepSeek-R1-Zero, the reasoning models it built on top of its foundation model, DeepSeek-V3.

Open-R1’s Purpose

Open-R1 isn’t designed to create new models per se — it’s more about creating and freely publishing artifacts.

One of the missing pieces in DeepSeek’s published research was how to go from a large, pre-trained model that has general knowledge and has been trained on trillions and trillions of tokens to a model that’s very good at a particular domain.

The key was creating reasoning traces that are produced by inferencing this “very capable model” on a specific domain and questions, Boudier said. Reasoning traces refer to a record or log of the steps an AI system takes to arrive at a conclusion or decision. Think of it as recording the AI’s “thought process.”

In the case of DeepSeek-R1 and R1-Zero, the reasoning focuses on a specific domain rather than, say, the whole internet.

“You can take a model and then teach it through distillation to be really, really good at this particular type of tasks” through reasoning traces, Boudier explained.
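
As a rough illustration of distillation through reasoning traces, the sketch below collects traces from a teacher model, keeps only those whose final answer matches a known reference, and formats them as fine-tuning examples. Everything here is hypothetical: `toy_teacher` is a canned stand-in for a call to a capable model, the exact-match filter is a simplification, and the `<think>` wrapper is an assumed output format rather than the Open-R1 project's actual schema.

```python
from dataclasses import dataclass


@dataclass
class ReasoningTrace:
    question: str
    reasoning: str  # the teacher's step-by-step work
    answer: str     # the teacher's final answer


def collect_traces(problems, teacher):
    """Run the teacher on each problem and keep traces with a correct final answer."""
    kept = []
    for question, reference in problems:
        reasoning, answer = teacher(question)
        if answer.strip() == reference.strip():  # assumption: exact-match check
            kept.append(ReasoningTrace(question, reasoning, answer))
    return kept


def to_sft_example(trace):
    """Format one trace as a prompt/completion pair for supervised fine-tuning."""
    completion = f"<think>\n{trace.reasoning}\n</think>\n{trace.answer}"
    return {"prompt": trace.question, "completion": completion}


def toy_teacher(question):
    # Hypothetical stand-in for inference on a strong reasoning model.
    canned = {
        "What is 12 * 12?": ("12 * 12 = 12 * 10 + 12 * 2 = 120 + 24.", "144"),
        "What is 7 + 8?": ("7 + 8 is about 16.", "16"),  # deliberately wrong
    }
    return canned[question]


problems = [("What is 12 * 12?", "144"), ("What is 7 + 8?", "15")]
traces = collect_traces(problems, toy_teacher)
sft_examples = [to_sft_example(t) for t in traces]
```

The filtering step matters because a smaller model fine-tuned on the surviving examples learns to imitate the teacher's reasoning, including any mistakes that are not screened out.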

That’s what the Hugging Face team released in its second update: a mathematical reasoning dataset called OpenR1-Math-220k that has more than 200,000 reasoning traces for complex mathematical questions.

“The synthetic datasets will allow everybody to fine-tune existing or new LLMs into reasoning models by simply fine-tuning on them,” the team said of the math datasets. “The training recipes involving RL [reinforcement learning] will serve as a starting point for anybody to build similar models from scratch and will allow researchers to build even more advanced methods on top.”

There’s a lot of potential in exploring other areas, including code but also scientific fields such as medicine, “where reasoning models could have a significant impact,” they stated.

The Latest Release

The Open-R1 project just released its third update, which Boudier called the “most exciting update to date.”

It includes a programming dataset with more than 100,000 competitive programming reasoning traces obtained from DeepSeek-R1. This dataset can be used to train new models to better understand the nuances of code, enabling the AI model to explain the reasoning behind the code. From it, the team built the OlympicCoder 7-billion and 32-billion parameter models.

“What’s really exciting is that by applying the distillation pipeline that they recreated from the R1 paper and from the R1 release, they were able to create these really, really powerful models,” Boudier said. “To give you a sense, the 32-billion model outperforms Claude Sonnet, which is the Anthropic state-of-the-art model for advanced programming challenges.”

The team also released a new IOI benchmark, based on the International Olympiad in Informatics, an annual programming competition, to provide a new way to measure a model’s ability to tackle more challenging programming problems.

The post A Hugging Face Project Is Uncovering DeepSeek-R1’s Secrets appeared first on The New Stack.
