Hybrid Token Representation Improves Language Model Reasoning

Improved Language Model Reasoning by Combining Latent and Text Tokens
Large Language Models (LLMs) demonstrate impressive capabilities in reasoning and planning, particularly when trained with Chain-of-Thought (CoT) data. CoT data spells out the step-by-step thinking process explicitly in text tokens. However, this detailed representation leads to long input sequences in which many words serve textual coherence rather than the core information of the reasoning process. Processing these long inputs requires significant computational resources.
A new research approach, presented in the paper "Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning", addresses this inefficiency. The core idea is a hybrid representation of the thinking process: instead of expressing the entire reasoning trace in text tokens, the initial reasoning steps are abstracted into discrete latent tokens generated by a Vector-Quantized Variational Autoencoder (VQ-VAE). This abstraction significantly shortens the reasoning sequences.
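To make the abstraction step more concrete, the following minimal sketch (not taken from the paper's code) illustrates how a VQ-VAE-style quantizer could map pooled chunks of chain-of-thought token embeddings to discrete codebook indices; the chunk pooling, codebook size, and dimensions are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of the vector-quantization
# step that maps chunks of chain-of-thought token embeddings to discrete
# latent codes. Codebook size and embedding dimension are assumptions.
import torch
import torch.nn as nn


class ChunkQuantizer(nn.Module):
    def __init__(self, codebook_size: int = 512, dim: int = 256):
        super().__init__()
        # Learnable codebook: each row is the embedding of one latent "token".
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, chunk_embeddings: torch.Tensor) -> torch.Tensor:
        # chunk_embeddings: (num_chunks, dim), one pooled vector per chunk of
        # consecutive CoT text tokens, assumed to come from an encoder.
        distances = torch.cdist(chunk_embeddings, self.codebook.weight)
        # The nearest codebook entry becomes the discrete latent token id.
        return distances.argmin(dim=-1)  # (num_chunks,) integer codes


# Toy usage: 6 pooled chunk vectors are compressed to 6 latent token ids.
quantizer = ChunkQuantizer()
codes = quantizer(torch.randn(6, 256))
print(codes)
```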
The application of these latent abstractions is investigated in two scenarios:
- First: training a model from scratch on the "Keys-Finding Maze" problem, where the model must learn to collect keys in a maze and use them to open doors and reach the goal.
- Second: finetuning LLMs with this hybrid data, where the LLM's vocabulary is extended with the previously unseen latent tokens. This scenario was evaluated on both logical and mathematical reasoning tasks.

To facilitate learning with this hybrid data, a dedicated training procedure was developed in which latent and text tokens are randomly mixed (see the sketch after this list). This procedure enables rapid adaptation to the new latent tokens and improves the model's performance.
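As a rough illustration of this mixing idea (a sketch under assumed vocabulary and chunk sizes, not the authors' implementation), one could replace a randomly sampled prefix of the CoT text tokens with latent token ids placed beyond the original vocabulary:

```python
# Hedged sketch of randomly mixing latent and text tokens in training data.
# BASE_VOCAB_SIZE, CHUNK, and the function name are illustrative assumptions.
import random

BASE_VOCAB_SIZE = 32_000      # assumed size of the original tokenizer vocab
CHUNK = 16                    # assumed number of text tokens per latent code


def mix_tokens(cot_text_ids: list[int], latent_codes: list[int]) -> list[int]:
    """Replace the first k chunks of the CoT with their latent codes."""
    num_chunks = min(len(latent_codes), len(cot_text_ids) // CHUNK)
    k = random.randint(0, num_chunks)  # sampled anew for each training example
    # Latent ids are shifted past the text vocabulary so they never collide
    # with existing text token ids; the embedding table is extended accordingly.
    latent_ids = [BASE_VOCAB_SIZE + c for c in latent_codes[:k]]
    remaining_text = cot_text_ids[k * CHUNK:]
    return latent_ids + remaining_text


# Toy usage: a 64-token CoT compressed to 4 latent codes, partially mixed.
example = mix_tokens(list(range(64)), [17, 402, 93, 258])
print(len(example), example[:6])
```

Randomizing how much of the reasoning prefix is abstracted exposes the model to both token types in varying proportions, which is what allows it to adapt quickly to the newly introduced latent tokens.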
The results of the study show that this approach consistently outperforms the baseline methods across various benchmarks. The reduction in input length achieved by the latent tokens leads to more efficient use of computational resources without compromising the accuracy of the reasoning. This opens up new possibilities for developing more powerful and resource-efficient LLMs.
The research findings highlight the potential of hybrid representations for reasoning in LLMs. The combination of latent and text tokens enables more efficient information processing and could lead to further advancements in the field of machine learning. Future research could focus on extending this approach to other application areas and investigating more complex reasoning tasks. Particularly in the context of AI partners like Mindverse, which develop tailored solutions such as chatbots, voicebots, AI search engines, and knowledge systems, these findings could lead to more efficient and powerful AI systems.
Bibliography:
- Su, D., Zhu, H., Xu, Y., Jiao, J., Tian, Y., & Zheng, Q. (2025). Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning. *arXiv preprint arXiv:2502.03275*.
- Bundesamt für Sicherheit in der Informationstechnik (BSI). (2024, December 6). *Working Paper on Large Language Models*.
- Various discussions and contributions on related topics on platforms such as Reddit and Hugging Face.
- Publications of IJCAI 2024.
- Lukyanko, [First Name]. (Date). Paper Review: Think Before You Speak: Training Language. *LinkedIn*.