Abstract:
Imagine running a powerful Large Language Model like Meta’s LLaMA completely offline — no internet, no cloud, just your laptop. In this session, we'll demystify how to run LLaMA models locally, turning your laptop into a fully autonomous AI powerhouse.
We'll cover model setup, optimization for low-resource hardware, and how to expose the model via a local API for use in your own apps, including a chatbot built with React Native that runs without any cloud dependencies.
Whether you're a developer exploring privacy-first AI or an enthusiast building offline assistants, this talk will give you the roadmap to deploy, serve, and chat with LLaMA offline.
Key Takeaways:
1] What is LLaMA and why run it locally?
2] Model formats: GGUF, quantization, and performance benchmarks (short code sketches for points 2-6 follow this list)
3] Step-by-step: Running LLaMA on Mac/Windows/Linux using llama.cpp
4] Serving LLaMA as an API with Node.js
5] Building an offline chatbot in React Native
6] Tips for memory-efficient inference (no GPU required!)
7] Packaging and deploying the solution to desktop/mobile stores
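To make points 2 through 6 concrete, here are a few short, hedged sketches. First, quantization: a full-precision GGUF model is usually shrunk with llama.cpp's quantize tool before it is run on a laptop. The binary name, file paths, and the Q4_K_M preset below are assumptions that depend on your llama.cpp build; check its README for the exact invocation.

```ts
// Hedged sketch: producing a smaller quantized GGUF from a full-precision one
// using llama.cpp's quantize tool. Binary name, file names, and the "Q4_K_M"
// preset are assumptions — adjust for your llama.cpp build.
import { execFile } from "node:child_process";

execFile(
  "./llama-quantize",
  [
    "./models/llama-3-8b-instruct.f16.gguf",    // source model (assumed path)
    "./models/llama-3-8b-instruct.Q4_K_M.gguf", // output: 4-bit quantized, far smaller on disk and in RAM
    "Q4_K_M",                                   // quantization preset: common quality/size trade-off
  ],
  (err, stdout, stderr) => {
    if (err) throw err;
    console.log(stdout || stderr);
  }
);
```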
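For point 3, a one-shot prompt can be pushed through llama.cpp's CLI straight from Node, which keeps everything in one language for the later API and React Native steps. The llama-cli binary name, model path, and flag spellings are assumptions and vary between llama.cpp versions.

```ts
// Minimal sketch: run a single prompt through llama.cpp's CLI from Node.
// Assumes a compiled `llama-cli` binary and a locally downloaded GGUF model;
// both paths are placeholders.
import { spawn } from "node:child_process";

const args = [
  "-m", "./models/llama-3-8b-instruct.Q4_K_M.gguf", // quantized GGUF model (assumed filename)
  "-p", "Explain GGUF in one sentence.",            // the prompt
  "-n", "128",                                      // max tokens to generate
  "--ctx-size", "2048",                             // context window
  "--threads", "4",                                 // CPU threads
];

const proc = spawn("./llama-cli", args);
proc.stdout.on("data", (chunk) => process.stdout.write(chunk));
proc.stderr.on("data", (chunk) => process.stderr.write(chunk));
proc.on("close", (code) => console.log(`\nllama-cli exited with code ${code}`));
```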
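For point 4, one option among several is to run llama.cpp's bundled HTTP server and put a thin Node endpoint in front of it. The sketch below assumes a llama-server instance on port 8080 exposing an OpenAI-compatible /v1/chat/completions route; confirm both against your llama.cpp version.

```ts
// Minimal sketch: a tiny Node HTTP endpoint that forwards chat requests to a
// locally running llama.cpp server (e.g. `llama-server -m model.gguf --port 8080`).
// The /v1/chat/completions route and port are assumptions — verify for your build.
import { createServer } from "node:http";

const LLAMA_URL = "http://127.0.0.1:8080/v1/chat/completions"; // assumed local llama-server address

const server = createServer(async (req, res) => {
  if (req.method !== "POST" || req.url !== "/chat") {
    res.writeHead(404).end();
    return;
  }

  // Collect the request body (expected shape: { "message": "..." })
  let body = "";
  for await (const chunk of req) body += chunk;
  const { message } = JSON.parse(body);

  // Forward to the local model — nothing leaves the machine.
  const upstream = await fetch(LLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [{ role: "user", content: message }],
      max_tokens: 256,
    }),
  });
  const data: any = await upstream.json();

  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ reply: data.choices?.[0]?.message?.content ?? "" }));
});

server.listen(3000, () => console.log("Offline chat API on http://localhost:3000/chat"));
```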
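For point 5, the React Native side only needs to talk to that local endpoint. The URL below is an assumption: on a physical device or an emulator, localhost must be replaced with an address that reaches the machine actually running the model (for example 10.0.2.2 on the Android emulator).

```tsx
// Minimal sketch of the React Native side: a chat screen that calls the local
// Node API from the previous snippet. API_URL is an assumption.
import React, { useState } from "react";
import { Button, FlatList, Text, TextInput, View } from "react-native";

const API_URL = "http://localhost:3000/chat"; // assumed address of the local API

type Message = { from: "user" | "llama"; text: string };

export default function ChatScreen() {
  const [input, setInput] = useState("");
  const [messages, setMessages] = useState<Message[]>([]);

  const send = async () => {
    const text = input.trim();
    if (!text) return;
    setMessages((m) => [...m, { from: "user", text }]);
    setInput("");
    const res = await fetch(API_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message: text }),
    });
    const { reply } = await res.json();
    setMessages((m) => [...m, { from: "llama", text: reply }]);
  };

  return (
    <View style={{ flex: 1, padding: 16 }}>
      <FlatList
        data={messages}
        keyExtractor={(_, i) => String(i)}
        renderItem={({ item }) => (
          <Text>{item.from === "user" ? "You: " : "LLaMA: "}{item.text}</Text>
        )}
      />
      <TextInput value={input} onChangeText={setInput} placeholder="Ask something" />
      <Button title="Send" onPress={send} />
    </View>
  );
}
```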
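For point 6, the main levers are the quantization level, the context size (which drives the KV-cache footprint), and the thread count. A hedged sketch of CPU-only launch flags follows; exact flag names vary across llama.cpp versions.

```ts
// Hedged sketch of CPU-only, low-memory launch flags for llama.cpp.
// Flag spellings vary between versions — confirm with `llama-cli --help`.
const lowMemoryArgs = [
  "-m", "./models/llama-3-8b-instruct.Q4_K_M.gguf", // 4-bit quant: roughly a quarter of the f16 footprint
  "--ctx-size", "1024", // smaller context window means a smaller KV cache
  "--threads", "4",     // match physical cores; oversubscribing rarely helps on a laptop
  // Memory-mapping the model file is the default, so untouched layers stay on
  // disk until needed; avoid --no-mmap unless you have RAM to spare.
];
```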
This seems like a fairly simple, straightforward talk, and I'm not sure many people will benefit from it. Most people who want to self-host are already doing so, and those who don't have little reason to start, since many of the cloud-based tools are free.