Bootstrapping Long Chain-of-Thought in Large Language Models


Large Language Models (LLMs) have made impressive progress in natural language processing in recent years, and their ability to solve complex reasoning tasks has become a central focus of research. Models such as OpenAI's o1 use so-called chains of thought (CoT), sequences of explicit reasoning steps, to arrive at a solution. These thought processes enable LLMs to analyze problems, develop plans, and even reflect on previous steps. Developing such capabilities, however, is complex and resource-intensive.
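The difference between asking directly for an answer and prompting for a chain of thought can be illustrated with a minimal sketch. The prompt wording below is an illustrative assumption, not taken from the BOLT paper:

```python
# Illustrative only: contrasting a direct prompt with a chain-of-thought
# prompt. The exact phrasing is an assumption, not the paper's template.

def direct_prompt(question: str) -> str:
    """Ask for the answer alone."""
    return f"Question: {question}\nAnswer:"

def cot_prompt(question: str) -> str:
    """Ask the model to reason step by step before answering."""
    return (
        f"Question: {question}\n"
        "Let's think step by step, checking each intermediate result, "
        "and only then state the final answer.\n"
        "Reasoning:"
    )

if __name__ == "__main__":
    q = "A train travels 120 km in 1.5 hours. What is its average speed?"
    print(direct_prompt(q))
    print()
    print(cot_prompt(q))
```

The second prompt elicits the intermediate reasoning steps that distinguish CoT responses from plain answers.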

Previous approaches to replicating these thought processes rely primarily on distilling knowledge from existing models that already possess CoT capabilities. This, however, leaves open the question of how such capabilities can be developed systematically, and it ties research to the data of the few available models. Furthermore, these works often focus on specific domains such as mathematics or programming, which limits their generalizability.

A new research approach called "BOLT" (Bootstrap Long Chain-of-Thought) offers a promising alternative. BOLT enables the development of long chain-of-thought (LongCoT) capabilities in LLMs without relying on data distillation from models like GPT or requiring extensive human annotations. Instead, BOLT uses a standard instruct model as a foundation and bootstraps the LongCoT capability in three steps:

1. LongCoT data bootstrapping via in-context learning on a standard instruct model.
2. Supervised fine-tuning with the generated LongCoT data.
3. Online training to further refine the LongCoT capabilities.
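The three stages above can be sketched as a minimal pipeline skeleton. This is a hedged illustration, not the paper's implementation: `instruct_model` stands in for any standard instruct model, and the demonstration prompt format is an assumption, since the paper's exact templates are not reproduced here.

```python
from typing import Callable, List, Tuple

# (query, long chain-of-thought response) pairs
Example = Tuple[str, str]

def bootstrap_longcot(instruct_model: Callable[[str], str],
                      seed_examples: List[Example],
                      queries: List[str]) -> List[Example]:
    """Stage 1: elicit LongCoT responses from a standard instruct model
    via in-context learning on a handful of seed demonstrations."""
    demos = "\n\n".join(f"Query: {q}\nLongCoT: {r}" for q, r in seed_examples)
    return [(q, instruct_model(f"{demos}\n\nQuery: {q}\nLongCoT:"))
            for q in queries]

def supervised_finetune(model, data: List[Example]) -> None:
    """Stage 2: standard SFT on the bootstrapped (query, LongCoT) pairs.
    The training loop itself is omitted in this sketch."""
    raise NotImplementedError

def online_training(model, reward_fn) -> None:
    """Stage 3: online refinement of the LongCoT behavior (e.g. against
    a reward signal). The paper's exact algorithm is not reproduced."""
    raise NotImplementedError
```

Stage 1 is the key departure from distillation-based pipelines: the LongCoT data comes from the instruct model itself, conditioned on a few seed demonstrations, rather than from a stronger teacher model.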

A notable advantage of BOLT is its low demand for in-context examples during the bootstrapping phase: in the reported experiments, only ten examples were needed, which underscores the practicality of the approach. The researchers used Llama-3.1-70B-Instruct as the bootstrapping model and applied BOLT to models of various sizes (7B, 8B, 70B). Results on benchmarks such as Arena-Hard, MT-Bench, WildBench, ZebraLogic, and MATH500 show that BOLT delivers strong performance across task domains, including logical reasoning.

This development opens new possibilities for research on large language models. By removing the dependence on existing CoT models and reducing annotation effort, BOLT could accelerate the development of more robust and generalizable thought processes in LLMs. This is particularly relevant for companies like Mindverse, which specialize in developing AI-powered solutions. Integrating advanced thought processes into chatbots, voicebots, AI search engines, and knowledge systems could significantly increase their performance and efficiency and open up new use cases.

The research results on BOLT underscore the potential of innovative approaches in the field of artificial intelligence. The development of methods that reduce the need for resources and the dependence on existing models is crucial for the advancement and democratization of AI technologies. It remains to be seen how BOLT performs in practice and what further innovations will result from this approach.

Bibliography:

Pang, B., Dong, H., Xu, J., Savarese, S., Zhou, Y., & Xiong, C. (2025). BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation. *arXiv preprint arXiv:2502.03860*.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. *arXiv preprint arXiv:2201.11903*.
Hendrycks, D., Burns, C., Kadavath, S., Arora, A., Basart, S., Tang, E., Song, D., & Steinhardt, J. (2021). Measuring Mathematical Problem Solving with the MATH Dataset. *arXiv preprint arXiv:2103.03874*.
Suzgun, M., Scales, N., Schärli, N., Gehrmann, S., Tay, Y., Chung, H. W., ... & Wei, J. (2022). Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them. *arXiv preprint arXiv:2210.09261*.
Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., ... & Kaplan, J. (2022). Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. *arXiv preprint arXiv:2204.05862*.
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. *arXiv preprint arXiv:1910.01108*.
Diao, S., Wang, Y., Su, Y., Zhu, C., & Duan, N. (2023). Is Prompt All You Need? No. A Comprehensive and Comparative Study of Parameter-Efficient Fine-Tuning Methods. *arXiv preprint arXiv:2305.08254*.
