A high-performance, offline-ready Android assistant that uses llama.cpp for local LLM inference, with a fallback Python kernel for complex tasks.
OmniAgent is an advanced Android application designed to bring powerful AI directly to the user's pocket, without requiring an internet connection. The app bridges the gap between high-performance local large language models and mobile efficiency, creating a private, intelligent "Guardian AI" experience that runs entirely on-device.

By integrating llama.cpp through JNI, OmniAgent performs offline inference for modern language models such as Qwen2.5 and Gemma 2, so users can generate AI responses without relying on cloud services. The system streams responses in real time, similar to modern AI chat interfaces, giving users a smooth, responsive conversational experience optimized for mobile hardware.
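To illustrate the streaming behavior described above, here is a minimal Kotlin sketch of a token-callback interface such a JNI bridge might expose. The names (`TokenListener`, `FakeLlamaBridge`) are hypothetical, and the stand-in engine emits canned tokens rather than calling native code; it only shows how partial output can be pushed to the UI as it is generated.

```kotlin
// Hypothetical streaming-token interface; not the actual OmniAgent API.
fun interface TokenListener {
    fun onToken(token: String)
}

// Stand-in engine that emits tokens one at a time, the way a
// llama.cpp completion loop would stream partial output to the UI.
class FakeLlamaBridge(private val cannedTokens: List<String>) {
    fun complete(prompt: String, listener: TokenListener): String {
        val sb = StringBuilder()
        for (t in cannedTokens) {
            listener.onToken(t)   // the UI appends each token as it arrives
            sb.append(t)
        }
        return sb.toString()      // full reply, once streaming finishes
    }
}

fun main() {
    val bridge = FakeLlamaBridge(listOf("Hello", ", ", "world", "!"))
    val received = mutableListOf<String>()
    val full = bridge.complete("greet", TokenListener { received.add(it) })
    check(full == "Hello, world!")  // streamed pieces concatenate to the reply
    check(received.size == 4)
}
```

In a real app the listener would post each token to the main thread and append it to Compose state, so the message bubble grows as the model generates.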
The application features a dual-kernel architecture, combining a native C++ inference engine for running LLMs with a Python-based reasoning layer (via Chaquopy) that performs intent classification and heuristic analysis. This design separates fast native token generation from higher-level task routing, improving both performance and intelligent task handling on mobile devices.
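The kind of heuristic intent routing the reasoning layer performs can be sketched as a keyword classifier. In OmniAgent this logic lives in Python via Chaquopy; the sketch below expresses the same idea in Kotlin, and the intent labels and keyword lists are illustrative assumptions, not the app's actual categories.

```kotlin
// Illustrative keyword-based intent classifier (assumed categories).
// The real OmniAgent layer runs in Python via Chaquopy; this only
// demonstrates the heuristic-routing idea.
enum class Intent { CHAT, TASK, SEARCH }

fun classifyIntent(prompt: String): Intent {
    val p = prompt.lowercase()
    return when {
        listOf("remind", "schedule", "alarm", "timer").any { it in p } -> Intent.TASK
        listOf("find", "look up", "search").any { it in p } -> Intent.SEARCH
        else -> Intent.CHAT  // default: route to the LLM as conversation
    }
}

fun main() {
    check(classifyIntent("Set an alarm for 7am") == Intent.TASK)
    check(classifyIntent("Search for nearby cafes") == Intent.SEARCH)
    check(classifyIntent("Tell me a joke") == Intent.CHAT)
}
```

Routing cheap, deterministic decisions through a heuristic layer before invoking the LLM keeps latency low on mobile hardware.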
OmniAgent is built with Kotlin and Jetpack Compose, offering a modern and intuitive interface that includes a unique “Thinking UI” to visualize the AI’s reasoning process while generating responses.
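One way a "Thinking UI" can separate reasoning from the final answer is to split the model's reply on delimiter tags. The sketch below assumes the model wraps its reasoning in `<think>…</think>` tags, a convention used by some reasoning models; whether OmniAgent's models emit such tags, and the names `ParsedReply`/`parseReply`, are assumptions for illustration.

```kotlin
// Sketch: split a reply into a "thinking" segment and the final answer,
// assuming <think>…</think> delimiters (an assumption, not a confirmed
// OmniAgent behavior). The UI could show the first part in a collapsible
// Thinking panel and render the rest as the answer bubble.
data class ParsedReply(val thinking: String?, val answer: String)

fun parseReply(raw: String): ParsedReply {
    val regex = Regex("(?s)<think>(.*?)</think>")          // (?s): dot matches newlines
    val match = regex.find(raw) ?: return ParsedReply(null, raw.trim())
    val answer = raw.removeRange(match.range).trim()       // drop the tagged span
    return ParsedReply(match.groupValues[1].trim(), answer)
}

fun main() {
    val r = parseReply("<think>User wants a summary.</think>Here is the summary.")
    check(r.thinking == "User wants a summary.")
    check(r.answer == "Here is the summary.")
    check(parseReply("Plain answer.").thinking == null)
}
```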
With a strong privacy-first approach, all processing happens locally on the device—ensuring that user prompts, data, and interactions never leave the phone.
Tech stack: Kotlin, Jetpack Compose, llama.cpp (C++), GGUF models, Python (Chaquopy), JNI.