Agentic Object Detection

Contribution Project

Reasoning-driven object detection: human-like precision via text prompts without the overhead of custom training

Description

Build a new plugin for HUB that implements an agentic object detection workflow.

The workflow will use agentic design patterns to reason at length about distinguishing attributes such as color, shape, and texture, for smarter, more precise recognition across varied scenarios.

This will be implemented by combining reasoning LLMs (such as DeepSeek-R1) and large vision models (such as SmolVLM-Instruct) with traditional computer vision techniques.
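To make the division of labor concrete, here is a minimal sketch of how the three components might fit together. It is an illustration under stated assumptions, not the plugin's actual design: every name in it (`agentic_detect`, `toy_plan`, `toy_propose`, `toy_verify`, `Detection`) is hypothetical, the "image" is a tiny grid of color letters, and the toy functions are simple stand-ins for the reasoning LLM, the traditional computer vision step, and the vision model. No real model or HUB API is called.

```python
from dataclasses import dataclass

Box = tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


@dataclass
class Detection:
    box: Box
    label: str


def agentic_detect(image, prompt, plan, propose, verify):
    """Hypothetical loop: plan attributes, propose regions, verify each one."""
    attributes = plan(prompt)        # reasoning step (an LLM in the real workflow)
    if not attributes:
        return []                    # nothing to check against, so report nothing
    detections = []
    for box in propose(image):       # candidate regions (traditional CV)
        # Keep a region only if the verifier accepts every planned attribute.
        if all(verify(image, box, attr) for attr in attributes):
            detections.append(Detection(box, prompt))
    return detections


def toy_plan(prompt):
    """Stand-in for a reasoning LLM: extract known color words from the prompt."""
    known_colors = {"red", "blue", "green"}
    return [word for word in prompt.lower().split() if word in known_colors]


def toy_propose(image):
    """Stand-in for traditional CV: bounding boxes of same-color connected regions."""
    height, width = len(image), len(image[0])
    seen, boxes = set(), []
    for y in range(height):
        for x in range(width):
            if image[y][x] == "." or (x, y) in seen:
                continue
            color = image[y][x]
            stack, xs, ys = [(x, y)], [], []
            seen.add((x, y))
            while stack:  # flood fill with 4-connectivity
                cx, cy = stack.pop()
                xs.append(cx)
                ys.append(cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and (nx, ny) not in seen and image[ny][nx] == color:
                        seen.add((nx, ny))
                        stack.append((nx, ny))
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def toy_verify(image, box, attribute):
    """Stand-in for a vision model: does the attribute's color dominate the box?"""
    x0, y0, x1, y1 = box
    cells = [image[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
    return cells.count(attribute[0]) * 2 > len(cells)


# Toy image: "r" = red, "b" = blue, "." = background.
IMAGE = [
    "rr...",
    "rr.b.",
    "...b.",
]

if __name__ == "__main__":
    print(agentic_detect(IMAGE, "red block", toy_plan, toy_propose, toy_verify))
```

Passing the planner, proposer, and verifier in as functions keeps the loop independent of any particular model, so a real reasoning LLM or vision model could later replace a toy stand-in without changing `agentic_detect`.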

Issues & Pull Requests Thread