1 Distillation with Reasoning: can DeepSeek R1 Teach Better Than Humans?
Aimee Grice edited this page 2025-02-10 13:57:49 +08:00


Inclusion of reasoning "chains of idea" (CoT) in the model output substantially improves its quality, however it increases inference expense. - Distillation transfers thinking understanding from an expensive instructor model to a more affordable trainee, decreasing general reasoning cost.

  1. A human professional's chain of idea.
  2. The final response.

    We broadened this dataset by including:

    Synthetic R1 thinking, i.e., the CoT created by DeepSeek R1.

    Then, we fine-tuned three versions of the model (using LoRA on llama-3.1 -8 B-instruct), each with various training targets:

    Direct Answer Only: Generate the final answer without revealing reasoning. Human Expert CoT: Generate the last response along with a reasoning chain resembling the human expert's. Synthetic R1 CoT: Generate the last response together with DeepSeek R1's synthetic reasoning chain. The table below summarizes typical accuracy and reasoning length:

    - Note: The precision for the 5-shot baseline might vary from numbers reported somewhere else due to different assessment setups. The key focus is on comparing relative performance across distillation techniques, not on beating other models.

    From this study, artificial reasoning CoTs from DeepSeek R1 appear superior to human-expert CoTs in improving efficiency, albeit with a higher reasoning cost due to their longer length.

    Fireworks AI Inference and Fine-Tuning Platform

    DeepSeek R1 is available on the Fireworks AI platform. An user-friendly distillation user interface will quickly belong to FireOptimizer. If you need earlier gain access to, please get in touch to check out options.

    Conclusions

    By including reasoning-based information through distillation, companies can considerably improve model efficiency without bearing the complete concern of human-annotated datasets. DeepSeek R1's ability to produce long, bbarlock.com top quality thinking chains makes it a powerful teacher model-showing that, in some cases, the device might simply out-teach the human.