Scaling Laws for Floating-Point Quantization Training of Large Language Models

Low-precision training is considered an effective strategy for reducing the cost of training and downstream inference of large language models (LLMs). Previous scaling laws for precision, however, have focused primarily on integer quantization and pay little attention to the components of a floating-point format, so they describe the loss behavior of LLMs poorly in this setting. Floating-point quantization, although the more common choice in practice, has so far been studied only superficially.

A recently published paper titled "Scaling Laws for Floating-Point Quantization Training" examines how the choice of quantization targets, exponent bits, mantissa bits, and the calculation granularity of the scaling factor affects the training performance of LLMs. The authors present a unified scaling law for floating-point quantization training and derive practical recommendations from it for the community.
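To make these components concrete, here is a minimal sketch of simulated ("fake") floating-point quantization in Python/NumPy with a configurable exponent/mantissa split and a per-tensor max-abs scaling factor. The format convention (one sign bit, no inf/NaN encodings) and the function names are illustrative assumptions, not code from the paper.

    import numpy as np

    def fp_quantize(x, exp_bits, man_bits):
        """Round x to the nearest value representable with the given exponent/mantissa bits."""
        bias = 2 ** (exp_bits - 1) - 1
        max_exp = 2 ** exp_bits - 1 - bias            # top exponent kept for normal numbers
        fp_max = (2 - 2 ** -man_bits) * 2.0 ** max_exp
        sign = np.sign(x)
        mag = np.clip(np.abs(x), 0, fp_max)
        # Per-value exponent, clamped to the normal/subnormal range.
        exp = np.floor(np.log2(np.where(mag > 0, mag, 1.0)))
        exp = np.clip(exp, 1 - bias, max_exp)
        step = 2.0 ** (exp - man_bits)                # quantization step is 2^(e - M)
        return sign * np.round(mag / step) * step

    def fake_quant(x, exp_bits, man_bits):
        """Scale into the representable range, quantize, and scale back."""
        bias = 2 ** (exp_bits - 1) - 1
        fp_max = (2 - 2 ** -man_bits) * 2.0 ** (2 ** exp_bits - 1 - bias)
        scale = np.max(np.abs(x)) / fp_max + 1e-12    # per-tensor scaling factor
        return fp_quantize(x / scale, exp_bits, man_bits) * scale

    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 4)).astype(np.float32)
    for e, m in [(4, 3), (5, 2), (2, 1)]:             # E4M3-, E5M2-, E2M1-style splits
        err = np.mean((w - fake_quant(w, e, m)) ** 2)
        print(f"E{e}M{m}: mean squared error = {err:.3e}")

Sweeping different exponent/mantissa splits of the same total bit width in this way shows how the split affects quantization error for a given value distribution; the paper quantifies exactly this kind of trade-off with its scaling law.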

Key Findings of the Study

The study reveals several key aspects of floating-point quantization and offers concrete recommendations for practical application:

Influence of Exponent and Mantissa Bits: Exponent bits contribute slightly more to model performance than mantissa bits. The authors provide the optimal ratio of exponent to mantissa bits for various bit widths, which can serve as a reference for hardware manufacturers.

Critical Data Size: The study identifies a critical data size for training LLMs in low precision: once the training data exceeds this threshold, additional data actually degrades model performance. This underlines the importance of choosing the data budget carefully when training with quantization (a toy numerical illustration of this effect follows this list).

Optimal Precision: The optimal floating-point quantization precision is directly proportional to the available computational power. Across a wide range of compute budgets, the authors estimate that the best cost-performance precision lies between 4 and 8 bits (the sketch below illustrates how such an interior optimum can arise under a fixed compute budget).
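The following toy calculation, again only a sketch, illustrates how the last two findings can arise. It uses a schematic precision-aware loss with a penalty term that grows with the amount of training data and shrinks with model size and precision; the functional form and every constant below are invented for illustration and are not the paper's fitted scaling law.

    import numpy as np

    A, alpha = 400.0, 0.34                      # model-size term
    B, beta = 600.0, 0.28                       # data term
    Cq, gamma, mu, kappa = 0.3, 0.5, 0.5, 0.7   # made-up quantization-penalty constants

    def loss(N, D, P):
        # Penalty grows with data D, shrinks with model size N and precision P (bits).
        penalty = Cq * D ** gamma / (N ** mu * 2.0 ** (kappa * P))
        return A / N ** alpha + B / D ** beta + penalty

    # 1) Critical data size: at fixed model size and precision, the loss first
    #    falls with more data, then rises once the quantization penalty dominates.
    N0, P0 = 1e9, 4
    Ds = np.logspace(9, 12, 300)
    print(f"toy critical data size ~ {Ds[np.argmin(loss(N0, Ds, P0))]:.2e} tokens")

    # 2) Compute-optimal precision: under a fixed budget FLOPs ~ 6*N*D*(P/16),
    #    sweep the precision and pick the best model/data split for each P.
    budget = 1e20
    for P in (2, 4, 6, 8, 12, 16):
        Ns = np.logspace(8, 10.5, 400)
        Ds = budget / (6 * Ns * (P / 16))
        print(f"P={P:2d} bits  best toy loss = {loss(Ns, Ds, P).min():.3f}")

At a fixed model size and precision, the toy loss is U-shaped in the data size, which gives a finite critical data size; under a fixed compute budget it is minimized at an intermediate precision rather than at either extreme. The qualitative shape matches the paper's findings, while the quantitative 4-8 bit range comes from the paper's fits, not from this toy.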

Implications for Practice

The results of this study have far-reaching implications for the development and deployment of LLMs. By applying the proposed scaling laws and recommendations, developers can improve training efficiency while minimizing performance degradation. The insights into optimal bit allocation and critical data size offer valuable guidance for resource planning and optimization.

These results are particularly relevant for companies like Mindverse, which specialize in AI-powered content creation and customized AI solutions. Optimizing training and inference costs through quantization techniques enables the development of more efficient and cost-effective AI applications, such as chatbots, voicebots, AI search engines, and knowledge systems.

Research on quantization for LLMs is dynamic and promising. Further work is needed to refine the scaling laws and to adapt them to different model architectures and use cases. Even so, the findings of this study provide a solid foundation for further optimization of LLMs and help push the boundaries of AI technology.

Bibliography:
https://arxiv.org/abs/2501.02423
https://arxiv.org/pdf/2501.02423
https://openreview.net/pdf/02377195671fbdc838af333a0c06ecee1caef9be.pdf
https://openreview.net/pdf/bbd4b671133186b1f40f7513655cf97746237bdd.pdf
https://www.researchgate.net/publication/384974274_Scaling_laws_for_post-training_quantized_large_language_models
https://dl.acm.org/doi/10.1145/3689236.3695383
https://paperreading.club/page?id=276997
https://www.linkedin.com/posts/a-roucher_paper-page-scaling-llm-test-time-compute-activity-7231637646404431873-8h-7
https://aclanthology.org/2023.emnlp-main.39.pdf
https://arxiv-sanity-lite.com/?rank=pid&pid=2410.12119