Meta Releases Open-Access PerceptionLM and PLM-VideoBench for Detailed Visual Understanding

Open Access to Data and Models for Detailed Visual Understanding: Meta Releases PerceptionLM

The development of powerful vision-language models (VLMs) is a central focus of current computer vision research. However, many of these models are not publicly accessible, which obscures how their data were collected, how they were designed, and how they were trained. This impedes scientific progress, because results become difficult to reproduce and compare. A common workaround in the research community is to distill knowledge from black-box models in order to generate training data. Although this can yield strong benchmark results, genuine scientific progress remains limited as long as the teacher model and its data sources are unknown.

Meta has now introduced PerceptionLM, a vision-language model built within a fully open and reproducible framework, with the goal of enabling transparent research in image and video understanding. As part of the project, the team analyzed standard training pipelines that avoid distillation from proprietary models and investigated large-scale synthetic data to identify critical data gaps, particularly in detailed video understanding.

To address these gaps, Meta is releasing 2.8 million human-labeled instances of fine-grained video question-answer pairs and spatio-temporally grounded video captions. In addition, the release introduces PLM-VideoBench, a suite for evaluating demanding video understanding tasks that focuses on a model's ability to reason about the "what," "where," "when," and "how" of a video.
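The exact schema of the released annotations is defined by Meta's data release; purely as an illustration of what fine-grained video QA pairs and spatio-temporally grounded captions involve, the following minimal Python sketch shows one plausible way to represent such records. All class and field names here are hypothetical, not the actual format of the release.

```python
from dataclasses import dataclass


@dataclass
class GroundedVideoCaption:
    """Hypothetical record for a spatio-temporally grounded caption.

    Field names are illustrative only, not the schema of Meta's release.
    """
    video_id: str
    start_sec: float                                  # start of the described segment
    end_sec: float                                    # end of the described segment
    bbox_xyxy: tuple[float, float, float, float]      # image region the caption refers to
    caption: str                                      # fine-grained description of the activity


@dataclass
class VideoQAPair:
    """Hypothetical record for a fine-grained video question-answer pair."""
    video_id: str
    question: str   # e.g. about the "what", "where", "when", or "how" of the video
    answer: str


# Toy instances with invented values, for illustration only.
caption = GroundedVideoCaption(
    video_id="vid_0001",
    start_sec=3.2,
    end_sec=7.8,
    bbox_xyxy=(0.10, 0.25, 0.55, 0.90),
    caption="A person picks up a red mug with their left hand and places it on the shelf.",
)
qa = VideoQAPair(
    video_id="vid_0001",
    question="Which hand does the person use to pick up the mug?",
    answer="The left hand.",
)
print(caption, qa, sep="\n")
```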

The Importance of Open Resources for AI Research

The release of PerceptionLM underscores the growing importance of open resources in AI research. By providing the data, training recipes, code, and models, Meta enables other researchers to reproduce, extend, and build on the work. This fosters scientific exchange and accelerates progress in visual understanding.

Disclosing the training data and methods allows the community to better understand the model's strengths and weaknesses and to make targeted improvements. Researchers can also use PerceptionLM as a foundation for their own projects, building on the results already achieved.

PLM-VideoBench: A New Benchmark for Video Understanding

With PLM-VideoBench, Meta introduces a benchmark suite designed specifically for the challenges of video understanding. The suite tests whether models can answer complex questions about videos and generate detailed descriptions, which requires not only recognizing objects and actions but also understanding spatial and temporal relationships and causal connections.
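The official tasks and metrics of PLM-VideoBench are defined by Meta's release; as an illustration of the general evaluation pattern only, the sketch below scores a model on multiple-choice video question answering. The model interface and item format are assumptions for this example and do not reflect the benchmark's actual API.

```python
from typing import Callable, Iterable

# A "model" here is any callable mapping (video_path, question, options) to a chosen option index.
Model = Callable[[str, str, list[str]], int]


def evaluate_multiple_choice(model: Model, items: Iterable[dict]) -> float:
    """Compute simple accuracy over multiple-choice video QA items.

    Each item is assumed to look like:
    {"video": "path.mp4", "question": "...", "options": [...], "answer_idx": 1}
    This format is illustrative; PLM-VideoBench defines its own task formats and metrics.
    """
    correct = total = 0
    for item in items:
        pred = model(item["video"], item["question"], item["options"])
        correct += int(pred == item["answer_idx"])
        total += 1
    return correct / max(total, 1)


if __name__ == "__main__":
    # Invented toy items, for illustration only.
    dummy_items = [
        {"video": "vid_0001.mp4", "question": "What does the person pick up?",
         "options": ["a mug", "a book", "a phone"], "answer_idx": 0},
        {"video": "vid_0002.mp4", "question": "Where is the object placed?",
         "options": ["on the floor", "on the shelf", "in a bag"], "answer_idx": 1},
    ]
    # Trivial baseline: always pick the first option.
    always_first: Model = lambda video, question, options: 0
    print(f"accuracy = {evaluate_multiple_choice(always_first, dummy_items):.2f}")
```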

PLM-VideoBench enables a comprehensive evaluation of video understanding models and provides a valuable basis for further research in this area. The suite can contribute to the development of more robust and powerful models capable of understanding videos on a deeper level.

Outlook

The release of PerceptionLM and PLM-VideoBench represents an important contribution to research in the field of visual understanding. Open access to data, code, and models enables transparent research and promotes scientific progress. It remains to be seen how the research community will utilize these resources and what further developments will result.
