Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?
ottoamundson98 edited this page 2025-02-12 12:15:29 +08:00


- Including reasoning "chains of thought" (CoT) in the model output significantly improves its quality, but it also increases inference cost.
- Distillation transfers reasoning knowledge from an expensive teacher model to a more cost-efficient student, reducing overall inference cost.

  Each data point in the dataset includes:

  1. A human expert's chain of thought.
  2. The final answer.

    We extended this dataset by adding:

    Synthetic R1 reasoning, i.e., the CoT generated by DeepSeek R1.
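The augmentation step above can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used: the field names (`question`, `human_cot`, `answer`, `r1_cot`) and the `generate_cot` callable are hypothetical, and a stand-in function takes the place of a real DeepSeek R1 call.

```python
from typing import Callable, Dict, List


def add_synthetic_cot(
    records: List[Dict[str, str]],
    generate_cot: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Return copies of the records with an extra 'r1_cot' field.

    `generate_cot` is any function that maps a question to a reasoning
    chain; in practice it would wrap a call to the teacher model.
    """
    augmented = []
    for record in records:
        enriched = dict(record)  # copy so the input record is left untouched
        enriched["r1_cot"] = generate_cot(record["question"])
        augmented.append(enriched)
    return augmented


def fake_teacher(question: str) -> str:
    # Hypothetical stand-in for a DeepSeek R1 request (assumption).
    return f"(teacher reasoning for: {question})"


# Hypothetical example record; the real dataset schema is not specified.
dataset = [
    {"question": "What is 2 + 3?", "human_cot": "Add 2 and 3.", "answer": "5"}
]
augmented = add_synthetic_cot(dataset, fake_teacher)
```

Passing the teacher as a callable keeps the sketch independent of any particular inference API.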

    Then, we fine-tuned three variants of the model (using LoRA on Llama-3.1-8B-Instruct), each with a different training target:

    - Direct Answer Only: Generate the final answer without showing reasoning.
    - Human Expert CoT: Generate the final answer alongside a reasoning chain resembling the human expert's.
    - Synthetic R1 CoT: Generate the final answer alongside DeepSeek R1's synthetic reasoning chain.

    The table below summarizes average accuracy and reasoning length:

    - Note: The accuracy for the 5-shot baseline may differ from numbers reported elsewhere due to different evaluation setups. The key focus is on comparing relative performance across distillation approaches, not on beating other models.
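The three training targets described earlier could be assembled along these lines. This is a sketch under stated assumptions: the variant keys, the field names, and the "reasoning first, then `Final answer:`" layout are illustrative choices, not the format used in the study.

```python
from typing import Dict


def build_target(record: Dict[str, str], variant: str) -> str:
    """Build the completion text the student model is trained to produce.

    Variant names and the output layout are hypothetical.
    """
    if variant == "direct":
        # Direct Answer Only: no reasoning is shown.
        return record["answer"]
    if variant == "human_cot":
        # Human Expert CoT: the expert's reasoning, then the answer.
        return f"{record['human_cot']}\n\nFinal answer: {record['answer']}"
    if variant == "r1_cot":
        # Synthetic R1 CoT: the teacher's reasoning, then the answer.
        return f"{record['r1_cot']}\n\nFinal answer: {record['answer']}"
    raise ValueError(f"unknown variant: {variant}")


# Hypothetical record with all three fields present.
example = {
    "human_cot": "Add 2 and 3.",
    "r1_cot": "We need 2 + 3. Counting up from 2 by 3 gives 5.",
    "answer": "5",
}
targets = {v: build_target(example, v) for v in ("direct", "human_cot", "r1_cot")}
```

Each variant sees the same prompts and differs only in the target text, which is what makes the accuracy comparison between them meaningful.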

    From this study, synthetic reasoning CoTs from DeepSeek R1 appear superior to human-expert CoTs in improving performance, albeit with a higher inference cost due to their longer length.
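One rough way to reason about this trade-off is to compare output length against accuracy. The sketch below assumes that decoding cost scales approximately linearly with the number of generated tokens, which is a simplification, and every number in it is a made-up placeholder rather than a result from the study.

```python
def relative_decode_cost(avg_output_tokens: float, baseline_tokens: float) -> float:
    """Output length relative to a baseline, used as a proxy for decode cost."""
    return avg_output_tokens / baseline_tokens


def tokens_per_correct_answer(avg_output_tokens: float, accuracy: float) -> float:
    """Average tokens generated per correct answer (lower is cheaper)."""
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return avg_output_tokens / accuracy


# Placeholder values only (NOT measurements): a short-CoT variant vs. a long-CoT one.
short_cot = {"tokens": 200.0, "accuracy": 0.5}
long_cot = {"tokens": 600.0, "accuracy": 0.75}

cost_ratio = relative_decode_cost(long_cot["tokens"], short_cot["tokens"])
short_tpc = tokens_per_correct_answer(short_cot["tokens"], short_cot["accuracy"])
long_tpc = tokens_per_correct_answer(long_cot["tokens"], long_cot["accuracy"])
```

With real measurements plugged in, the same two ratios show whether the accuracy gain from longer reasoning chains justifies the extra tokens for a given workload.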

    Fireworks AI Inference and Fine-Tuning Platform

    DeepSeek R1 is available on the Fireworks AI platform. A user-friendly distillation interface will soon be part of FireOptimizer. If you need earlier access, please contact us to explore options.

    Conclusions

    By incorporating reasoning-based data through distillation, organizations can significantly improve model performance without bearing the full burden of human-annotated datasets. DeepSeek R1's ability to produce long, high-quality reasoning chains makes it a powerful teacher model, showing that, sometimes, the machine may just out-teach the human.