Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?


Including reasoning "chains of thought" (CoT) in a model's output considerably improves its quality, but it also increases inference cost.

Each example in our starting dataset contained:

  1. A human expert's chain of thought.
  2. The final answer.

We expanded this dataset by adding:

  3. Synthetic R1 reasoning, i.e., the CoT generated by DeepSeek R1.
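
As a concrete illustration of this expansion step, the sketch below queries DeepSeek R1 for each example and captures the reasoning chain it emits between `<think>` tags. It assumes an OpenAI-compatible endpoint such as the one Fireworks AI provides; the base URL, model id, and field names (`question`, `synthetic_r1_cot`) are illustrative assumptions, not details confirmed by this post.

```python
# Sketch: augment each dataset example with a synthetic R1 reasoning chain.
# The endpoint, model id, and field names below are assumptions.
import re

from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

def add_synthetic_r1_cot(example: dict) -> dict:
    """Query DeepSeek R1 and store its chain of thought alongside the example."""
    response = client.chat.completions.create(
        model="accounts/fireworks/models/deepseek-r1",  # assumed model id
        messages=[{"role": "user", "content": example["question"]}],
    )
    text = response.choices[0].message.content
    # R1 typically wraps its reasoning in <think> ... </think> before the answer.
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    example["synthetic_r1_cot"] = match.group(1).strip() if match else ""
    return example
```

Running this over the whole dataset yields the third field used by the Synthetic R1 CoT variant described next.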

Then, we fine-tuned three versions of the model (using LoRA on Llama-3.1-8B-Instruct), each with a different training target:

- Direct Answer Only: generate the final answer without revealing reasoning.
- Human Expert CoT: generate the final answer together with a reasoning chain mimicking the human expert's.
- Synthetic R1 CoT: generate the final answer together with DeepSeek R1's synthetic reasoning chain.

(A minimal sketch of how these targets can be formatted appears after the note below.) The table below summarizes average accuracy and reasoning length:

- Note: The accuracy for the 5-shot benchmark may differ from numbers reported elsewhere due to different evaluation setups. The key focus is on comparing relative performance across distillation approaches, not on beating other models.
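
For reference, here is a minimal sketch of how the three training targets can be rendered and how a LoRA adapter can be attached with the Hugging Face `peft` library. The field names and LoRA hyperparameters are illustrative assumptions; the post does not state the actual values.

```python
# Sketch: render one training target per distillation variant and attach LoRA.
# Field names and hyperparameters are assumptions, not values from the post.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

def build_target(example: dict, variant: str) -> str:
    """Return the string the model is trained to generate for one variant."""
    if variant == "direct_answer":
        return example["final_answer"]
    if variant == "human_cot":
        return f"{example['human_cot']}\n\nAnswer: {example['final_answer']}"
    if variant == "synthetic_r1_cot":
        return f"{example['synthetic_r1_cot']}\n\nAnswer: {example['final_answer']}"
    raise ValueError(f"unknown variant: {variant}")

model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Attach a LoRA adapter so only a small fraction of weights is trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

All three variants share the same prompts; only the target string differs, which isolates the contribution of the reasoning chain to the final accuracy.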

From this study, synthetic reasoning CoTs from DeepSeek R1 appear superior to human-expert CoTs for improving performance, albeit at a higher inference cost due to their greater length.

Fireworks AI Inference and Fine-Tuning Platform

DeepSeek R1 is available on the Fireworks AI platform. A user-friendly distillation interface will soon be part of FireOptimizer. If you need earlier access, please contact us to explore options.

Conclusions

By incorporating reasoning-based data through distillation, organizations can dramatically improve model performance without bearing the full burden of human-annotated datasets. DeepSeek R1's ability to produce long, high-quality reasoning chains makes it a powerful teacher model, showing that, in some cases, the machine might simply out-teach the human.