LLMDet
LLMDet is an open-source AI tool for building strong open-vocabulary object detectors. It uses large language models to generate detailed image captions, which serve as additional supervision when training the detector.
About
LLMDet is the official PyTorch implementation of the paper "LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models," accepted as a highlight paper at CVPR 2025. It improves open-vocabulary object detection by co-training the detector with a large language model. To make this possible, the authors generate an image-level detailed caption for each training image, producing a dataset called GroundingCap-1M. LLMDet then fine-tunes an open-vocabulary detector with two objectives: a standard grounding loss and a caption generation loss. The caption generation objective covers both region-level short captions and image-level long captions. The result is stronger open-vocabulary performance, and the improved detector can in turn help build stronger large multimodal models, so the benefit runs in both directions. LLMDet is integrated into the official transformers library and can be used with a few lines of code.
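The two-part training objective described above can be illustrated with a small sketch. This is not the LLMDet implementation: the function names, the `caption_weight` parameter, and the choice to simply add the image-level and region-level caption terms are assumptions made for illustration. It only shows the general shape of combining a grounding loss with a caption generation loss, where the caption loss is the mean negative log-likelihood of the reference caption tokens.

```python
import math


def caption_nll(token_probs):
    """Mean negative log-likelihood of the reference caption tokens.

    token_probs: probabilities the language model assigned to each
    ground-truth token (hypothetical inputs, for illustration only).
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def total_loss(grounding_loss, image_caption_probs, region_caption_probs,
               caption_weight=1.0):
    """Combine a grounding loss with caption generation losses.

    The weighting scheme here is an assumption, not taken from the paper.
    """
    image_loss = caption_nll(image_caption_probs)    # image-level long caption
    region_loss = caption_nll(region_caption_probs)  # region-level short caption
    return grounding_loss + caption_weight * (image_loss + region_loss)


# Toy numbers: a grounding loss of 0.8 plus the two caption terms.
example = total_loss(0.8, [0.5, 0.25], [0.5])
print(round(example, 4))
```

In a real training loop the probabilities would come from the language model's output distribution over tokens, and the grounding loss from the detector head; both are replaced by plain numbers here to keep the sketch self-contained.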
Pricing & Plans
Open Source
Free