UFO2: A Novel Approach to AI-Powered Desktop Automation

```html

Desktop workflow automation is a growing field in which artificial intelligence (AI) plays an increasingly important role. Computer-Using Agents (CUAs), powered by multimodal Large Language Models (LLMs), promise to automate complex desktop workflows from natural-language instructions. Previous approaches, however, have often remained conceptual prototypes, limited by shallow operating system integration, fragile screenshot-based interaction, and execution that disrupts the user's own work.

UFO2 takes a different approach: it is a multi-agent AgentOS for Windows desktops that makes CUAs usable for practical, system-level automation. At its core is a central HostAgent, responsible for task decomposition and coordination. It is supported by a collection of application-specific AppAgents, each equipped with native APIs, domain-specific knowledge, and a unified GUI-API action layer.

This architecture enables robust task execution while preserving modularity and extensibility. A hybrid control detection pipeline combines Windows UI Automation (UIA) with vision-based analysis to support a range of interface styles. Speculative multi-action planning further improves runtime efficiency by reducing the LLM overhead per step.

A distinctive feature of UFO2 is its Picture-in-Picture (PiP) interface, which runs automation inside an isolated virtual desktop so that agents and users can work at the same time without interfering with each other. This improves usability and simplifies integration into existing workflows.

Deeper Operating System Integration for More Reliable Automation

The developers evaluated UFO2 on more than 20 real-world Windows applications. The results show significant improvements in robustness and execution accuracy over previous CUAs. The deep operating system integration of UFO2 thus opens a scalable path to reliable, user-oriented desktop automation.

Central coordination by the HostAgent, specialized AppAgents, and hybrid control detection together allow flexible and efficient automation. The PiP interface additionally lets a human and the agent use the same desktop in parallel without their actions interfering.

For companies such as Mindverse, which specialize in AI-powered solutions, UFO2 offers interesting potential. The technology could serve as a basis for customized automation solutions, for example around chatbots, voicebots, AI search engines, and knowledge systems. Its flexible architecture allows adaptation to specific use cases and integration into existing systems.

Outlook

The development of CUAs such as UFO2 is still at an early stage. The results so far are nevertheless promising and point to considerable potential for desktop automation. The combination of LLMs, deep operating system integration, and user-friendly interfaces could fundamentally change how we interact with computers.

```
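
To make the division of labor between the HostAgent and the AppAgents more concrete, here is a minimal Python sketch. It is not UFO2's code: the class interfaces, the `decompose` method, and the `"app: subtask; app: subtask"` request format are assumptions chosen for illustration, and a simple string split stands in for the LLM-based task decomposition.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AppAgent:
    """Hypothetical per-application agent that handles subtasks for one app."""
    app_name: str
    log: List[str] = field(default_factory=list)

    def execute(self, subtask: str) -> str:
        # A real agent would call native APIs or GUI actions here.
        result = f"[{self.app_name}] done: {subtask}"
        self.log.append(result)
        return result


class HostAgent:
    """Hypothetical coordinator that splits a request and routes subtasks."""

    def __init__(self, agents: Dict[str, AppAgent]):
        self.agents = agents

    def decompose(self, request: str) -> List[Tuple[str, str]]:
        # Stand-in for LLM-based decomposition (assumed input format:
        # "app: subtask; app: subtask").
        plan = []
        for part in request.split(";"):
            if not part.strip():
                continue
            app, _, subtask = part.partition(":")
            plan.append((app.strip(), subtask.strip()))
        return plan

    def run(self, request: str) -> List[str]:
        results = []
        for app, subtask in self.decompose(request):
            agent = self.agents.get(app)
            if agent is None:
                # No specialized agent is registered for this application.
                results.append(f"[host] no agent registered for {app!r}")
                continue
            results.append(agent.execute(subtask))
        return results
```

Because the coordinator only knows a mapping from application name to agent, supporting another application in this sketch means registering one more `AppAgent`, which mirrors the modularity the article describes.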
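
The hybrid control detection mentioned above can be pictured as merging two lists of controls: those reported by UIA and those found by a visual detector. The sketch below shows one plausible merge rule, assuming that a visual detection is dropped when it overlaps a UIA control strongly. The `Control` type, the intersection-over-union test, and the 0.5 threshold are illustrative assumptions, not details taken from the article.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass(frozen=True)
class Control:
    name: str
    box: Box
    source: str  # "uia" or "vision"


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_controls(uia: List[Control], vision: List[Control],
                   threshold: float = 0.5) -> List[Control]:
    """Keep all UIA controls; add vision detections that UIA did not cover."""
    merged = list(uia)
    for candidate in vision:
        # Assumed rule: a detection overlapping any UIA control is a duplicate.
        if all(iou(candidate.box, known.box) < threshold for known in uia):
            merged.append(candidate)
    return merged
```

With this rule, standard controls keep the richer metadata that UIA provides, while custom-drawn elements that UIA cannot see are still picked up from the screenshot.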
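
Speculative multi-action planning saves LLM calls by letting the model propose several actions at once. A minimal sketch of how such a batch might be executed safely follows. It assumes each planned action is checked against the live UI before it runs and that execution stops at the first action whose target is no longer present. The `Action` type and the `visible_controls` and `perform` callbacks are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Action:
    verb: str
    target: str


def execute_speculative(planned: List[Action],
                        visible_controls: Callable[[], Set[str]],
                        perform: Callable[[Action], None]) -> int:
    """Run planned actions in order and return how many were executed.

    Stops at the first action whose target is missing from the current UI,
    so the caller can re-plan from that point with a fresh LLM call.
    """
    executed = 0
    for action in planned:
        # Re-read the UI before every step, since earlier actions change it.
        if action.target not in visible_controls():
            break
        perform(action)
        executed += 1
    return executed
```

In this sketch a batch of three actions costs one planning call when all three stay valid, and the batch falls back to step-by-step re-planning only when the interface diverges from what the plan expected.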