BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
Abstract
In AI-driven automation of human-GUI interaction, rapid advances in multimodal large language models and reinforcement fine-tuning techniques have yielded remarkable progress, yet a fundamental challenge persists: the interaction logic of current GUI agents deviates significantly from natural human-GUI communication patterns. To fill this gap, we propose "Blink-Think-Link" (BTL), a brain-inspired framework that mimics the human cognitive process of interacting with graphical interfaces. The framework decomposes each interaction into three biologically plausible phases: (1) Blink, rapid detection of and attention to relevant screen areas, analogous to saccadic eye movements; (2) Think, higher-level reasoning and decision-making, mirroring cognitive planning; and (3) Link, generation of executable commands for precise motor control, emulating human action-selection mechanisms. Additionally, we introduce two key technical innovations for the BTL framework: (1) Blink Data Generation, an automated annotation pipeline specifically optimized for blink data; and (2) BTL Reward, the first rule-based reward mechanism that enables reinforcement learning driven by both process and outcome. Building on this framework, we develop a GUI agent model named BTL-UI, which achieves consistent state-of-the-art performance across both static GUI understanding and dynamic interaction tasks in comprehensive benchmarks. These results provide conclusive empirical validation of the framework's efficacy in developing advanced GUI agents.
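To make the idea of a reward "driven by both process and outcome" concrete, here is a minimal sketch of what a rule-based reward over a Blink-Think-Link response could look like. It is an illustration under stated assumptions, not the BTL Reward defined in this work. The `<blink>`/`<think>`/`<link>` tag layout, the `[x1, y1, x2, y2]` syntax for Blink regions, the `click(x, y)` action syntax, the 0.5 IoU threshold, and the (0.1, 0.3, 0.6) weights are all hypothetical, as are the helper names `btl_reward`, `parse_boxes`, and `iou`. The process term rewards a Blink region that overlaps the target element, and the outcome term rewards a Link action that lands inside it.

```python
import re

# All formats, names, thresholds, and weights below are hypothetical.
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in screen pixels

# Assumed response layout: one Blink, one Think, and one Link section, in order.
RESPONSE_RE = re.compile(
    r"<blink>(?P<blink>.*?)</blink>\s*<think>(?P<think>.*?)</think>\s*<link>(?P<link>.*?)</link>",
    re.DOTALL,
)
BOX_RE = re.compile(r"\[([^\]]+)\]")  # Blink regions written as [x1, y1, x2, y2]
CLICK_RE = re.compile(r"click\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def parse_boxes(text: str) -> list[Box]:
    """Extract every well-formed [x1, y1, x2, y2] box and skip malformed ones."""
    boxes: list[Box] = []
    for m in BOX_RE.finditer(text):
        try:
            parts = [float(p) for p in m.group(1).split(",")]
        except ValueError:
            continue  # non-numeric content inside the brackets
        if len(parts) == 4:
            boxes.append((parts[0], parts[1], parts[2], parts[3]))
    return boxes


def btl_reward(
    response: str,
    gt_box: Box,
    iou_threshold: float = 0.5,
    weights: tuple[float, float, float] = (0.1, 0.3, 0.6),
) -> float:
    """Hypothetical rule-based reward: format + process (Blink) + outcome (Link)."""
    m = RESPONSE_RE.search(response)
    if m is None:
        return 0.0  # malformed output earns no reward at all
    w_format, w_blink, w_link = weights

    # Process term: did any Blink region overlap the target element enough?
    blink_ok = any(iou(b, gt_box) >= iou_threshold for b in parse_boxes(m.group("blink")))

    # Outcome term: does the Link action click inside the target element?
    link_ok = False
    c = CLICK_RE.search(m.group("link"))
    if c is not None:
        x, y = float(c.group(1)), float(c.group(2))
        link_ok = gt_box[0] <= x <= gt_box[2] and gt_box[1] <= y <= gt_box[3]

    return w_format + w_blink * float(blink_ok) + w_link * float(link_ok)
```

For example, with a target box of `(100, 200, 300, 260)`, the response `<blink>[95, 195, 305, 265]</blink><think>The Submit button is below the form.</think><link>click(200, 230)</link>` scores 1.0 up to floating-point rounding, while the same response clicking at `(500, 500)` keeps only the format and Blink terms (about 0.4). Scoring the Blink step separately from the final action is what lets a process signal reach the policy even when the outcome is wrong.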