Nami Assistant is a local-first voice assistant for hands-free desktop automation, web interaction, and voice-driven code generation using modular AI services.
Designed for web interaction, desktop automation, and developer productivity, the system uses a modular multi-service architecture to link voice input to real system actions: instead of operating their computers manually, users complete tasks with natural voice commands.
The assistant begins by listening for a wake word with Picovoice Porcupine. Once activated, it transcribes speech to text with faster-whisper. The orchestrator then processes the request and routes it by intent to the appropriate service. The system supports three main task types: coding help, desktop automation, and general conversation.
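The intent-routing step above can be sketched as a simple keyword-based classifier. This is a minimal illustration only; the function name, keyword lists, and fallback behavior are assumptions, not the project's actual routing logic:

```python
# Hypothetical sketch of orchestrator intent routing (keywords are illustrative).

CODING_KEYWORDS = {"code", "function", "file", "project", "refactor"}
DESKTOP_KEYWORDS = {"open", "close", "type", "click", "window"}

def route_intent(transcript: str) -> str:
    """Classify a transcribed command into one of three task types."""
    words = set(transcript.lower().split())
    if words & CODING_KEYWORDS:
        return "coding"       # voice-driven code generation / editing
    if words & DESKTOP_KEYWORDS:
        return "desktop"      # desktop automation
    return "chat"             # general conversation fallback
```

A real orchestrator would likely use a language model or a richer grammar for this step; the keyword overlap here only shows where the routing decision sits in the pipeline.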
With desktop automation, users operate applications and system features by voice. The assistant types text, opens and closes programs, and controls the mouse and keyboard. OCR-based screen text detection lets the system accurately identify and interact with on-screen elements.
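One way OCR output could drive element interaction is sketched below: given recognized text boxes, pick a click coordinate. The data shape and helper are hypothetical; real OCR output (e.g. from an OCR engine) would need to be adapted into this form:

```python
# Sketch: locating a click target from OCR results.
# ocr_results is assumed to be a list of (text, (left, top, width, height)) tuples.

def find_click_target(ocr_results, label):
    """Return the center (x, y) of the first OCR box whose text matches label."""
    for text, (left, top, width, height) in ocr_results:
        if label.lower() in text.lower():
            return (left + width // 2, top + height // 2)
    return None  # label not visible on screen
```

For example, with `[("File", (10, 5, 40, 20)), ("Save As", (100, 5, 60, 20))]`, asking for `"save"` yields the center of the second box, which a mouse-control layer could then click.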
Web automation, built on Playwright, lets the assistant manage browser tasks: it opens websites, searches for information, extracts page content, and interacts with page elements such as buttons and forms. Both graphical and headless browser modes support automated web interaction.
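Because browser automation runs in its own service, the orchestrator has to hand it structured requests. A minimal sketch of such a command message follows; the schema, action names, and function are assumptions made for illustration, not the project's actual protocol:

```python
# Hypothetical command messages the orchestrator might send to the browser service,
# which would execute them with Playwright.
import json

def make_browser_command(action, **params):
    """Serialize a browser-automation request for the browser service."""
    if action not in {"goto", "click", "fill", "extract_text"}:
        raise ValueError(f"unsupported action: {action}")
    return json.dumps({"service": "browser", "action": action, "params": params})
```

For example, `make_browser_command("goto", url="https://example.com")` produces a JSON frame the browser service can decode and execute regardless of whether it runs in graphical or headless mode.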
Developer productivity is another essential component of the system. Nami Assistant interfaces with Visual Studio Code over a secure local WebSocket bridge, so users can create files, edit code, write functions, and manage projects by voice. This enables hands-free development and faster coding workflows.
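A local WebSocket bridge like the one described typically exchanges small JSON frames and authenticates the local client. The frame schema and token check below are illustrative assumptions, not the project's actual bridge protocol:

```python
# Sketch of JSON frames a local WebSocket bridge to VS Code might exchange.
import json

def make_editor_request(command, payload, token):
    """Build one request frame; the token identifies the trusted local client."""
    return json.dumps({"token": token, "command": command, "payload": payload})

def parse_editor_request(frame, expected_token):
    """Validate and unpack a frame; reject messages from unknown clients."""
    msg = json.loads(frame)
    if msg.get("token") != expected_token:
        raise PermissionError("unauthenticated bridge client")
    return msg["command"], msg["payload"]
```

Keeping the bridge local (bound to localhost) plus a shared token is one common way to ensure only the assistant, and not arbitrary processes, can drive the editor.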
In this distributed local architecture, each service is responsible for a particular task. The orchestrator oversees audio input, request routing, and system coordination, while separate services handle tool execution, VS Code integration, desktop automation, and browser automation. This division improves performance, debugging, and scalability.
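The coordination pattern described above can be sketched as a small service registry: the orchestrator maps service names to handlers and dispatches each routed request. The class and handler shapes are hypothetical, shown only to illustrate the division of responsibilities:

```python
# Sketch: an orchestrator registering and dispatching to separate services.

class Orchestrator:
    def __init__(self):
        self._services = {}

    def register(self, name, handler):
        """Attach a service handler (e.g. desktop, browser, vscode, tools)."""
        self._services[name] = handler

    def dispatch(self, name, request):
        """Forward a routed request to the responsible service."""
        if name not in self._services:
            raise KeyError(f"no service registered for {name!r}")
        return self._services[name](request)
```

Because each handler is independent, a misbehaving service can be debugged or restarted without touching the others, which is the scalability benefit the design aims for.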
Privacy and low latency are top priorities for Nami Assistant. Most processing, including speech recognition and automation logic, runs locally on the user's machine; external AI services remain optional for reasoning tasks. This approach safeguards user data while preserving fast response times.
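The local-by-default policy with optional external reasoning could be expressed as a small backend-selection rule. The policy, task names, and backend labels here are assumptions for illustration only:

```python
# Sketch: keep speech and automation local; offload only reasoning, and only
# when the user has opted in (policy and names are illustrative).

def select_backend(task_type, allow_external=True):
    """Choose where a task runs under a local-first privacy policy."""
    if task_type == "reasoning" and allow_external:
        return "external_llm"
    return "local"
```

With `allow_external=False`, everything, including reasoning, stays on the user's machine, trading capability for strict data locality.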
The project demonstrates how speech processing, system automation, browser control, and developer tools can be combined into a single voice-driven interface. It provides a foundation for future additions such as more tools, improved natural language processing, and cross-platform automation support.