SoM
SoM (Set-of-Mark) is an open-source prompt engineering tool that enhances visual grounding in large multimodal models like GPT-4V. It overlays spatial marks on images to improve understanding and reasoning.
About
SoM (Set-of-Mark) is a visual prompting technique designed to improve the visual grounding abilities of large multimodal models (LMMs), particularly GPT-4V. By overlaying spatial, speakable marks directly onto images, SoM helps these models understand and reason about detailed visual content. The tool provides a toolbox for generating set-of-mark prompts, letting users choose the mask granularity and the mode (automatic or interactive). Supported applications include smartphone GUI navigation, zero-shot anomaly detection, web UI navigation, and grounded reasoning. SoM also enables interleaved prompts, which combine textual and visual content for more precise interactions.
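To make the idea of marks concrete, here is a minimal sketch in plain Python. It is not the SoM toolbox's API. It assumes region masks are already available as small binary grids (in practice a segmentation model would supply them), and the names `mark_position`, `assign_marks`, and `interleaved_prompt` are hypothetical helpers invented for this illustration. The sketch assigns each region a numeric mark, picks a pixel inside the region at which the mark could be drawn, and builds a text prompt that refers to regions by mark number.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # binary grid, 1 = pixel belongs to the region


def mark_position(mask: Mask) -> Tuple[int, int]:
    """Return the (row, col) of the region pixel closest to the region's centroid."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("empty mask")
    center_r = sum(r for r, _ in pixels) / len(pixels)
    center_c = sum(c for _, c in pixels) / len(pixels)
    # Snap to a pixel that is actually inside the region, so the mark
    # never lands outside a concave or ring-shaped mask.
    return min(pixels, key=lambda p: (p[0] - center_r) ** 2 + (p[1] - center_c) ** 2)


def assign_marks(masks: List[Mask]) -> Dict[int, Tuple[int, int]]:
    """Give each region a numeric mark (1, 2, ...) and a pixel to draw it at."""
    return {i: mark_position(m) for i, m in enumerate(masks, start=1)}


def interleaved_prompt(question: str, marks: Dict[int, Tuple[int, int]]) -> str:
    """Build a text prompt that refers to image regions by their mark numbers."""
    ids = ", ".join(f"[{i}]" for i in sorted(marks))
    return (
        f"The image is annotated with numeric marks {ids}. "
        f"{question} Answer using the mark numbers."
    )


# Two toy 3x3 regions standing in for segmentation output (assumed data).
full_region = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
single_pixel = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]

marks = assign_marks([full_region, single_pixel])
prompt = interleaved_prompt("Which region contains the button?", marks)
```

In a real pipeline the positions in `marks` would be used to draw the numbers onto the image before sending it, together with `prompt`, to the multimodal model. Snapping the mark to a pixel inside the region is a design choice of this sketch, not a documented SoM behavior.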
Pricing & Plans
Open Source
Free