Many of us use LLMs, but we don't really understand why they work. By fine-tuning an LLM, I want to first show how an LLM actually functions under the hood: why do we even fine-tune them, and how do they comprehend information? Then I'll show how we can do this for our own purposes using Unsloth, an Apache 2.0-licensed framework for fine-tuning open-source models. This is an outline of what I will cover:
the mathematical magic behind LLMs
why we do things the way we do today, and how it differs from past approaches
"Attention is All You Need" research papers impact on LLMs .
Attention / memory
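The attention mechanism at the heart of that paper can be sketched in a few lines of plain Python: scaled dot-product attention computes softmax(QKᵀ/√d_k)·V. The toy matrices below are invented purely for illustration — real models work with hundreds of dimensions and many heads.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V,
    written out for small Python lists-of-lists."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)           # how much each position is "attended to"
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Toy example: one query attending over two key/value pairs.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
result = attention(Q, K, V)
```

Because the query aligns with the first key, the first value gets the larger attention weight — that weighting is the "memory" the model uses to decide which earlier tokens matter.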
Tokenization: subword, word, character
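The tokenization granularities above can be compared in plain Python: word and character splits are trivial, and a single BPE-style merge step shows how subword vocabularies are built from characters. The tiny corpus and the merge logic here are a simplified sketch, not any real tokenizer's implementation.

```python
from collections import Counter

text = "low lower lowest"

# Word-level: split on whitespace; vocabulary grows with every new word form.
word_tokens = text.split()                 # ['low', 'lower', 'lowest']

# Character-level: tiny vocabulary, but very long sequences.
char_tokens = list(text.replace(" ", ""))

# Subword (BPE-style): start from characters and repeatedly merge the most
# frequent adjacent pair. One merge step is shown here.
def most_frequent_pair(words):
    pairs = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
                out.append(w[i] + w[i + 1])   # fuse the pair into one subword
                i += 2
            else:
                out.append(w[i])
                i += 1
        merged.append(out)
    return merged

words = [list(w) for w in word_tokens]     # start at character level
pair = most_frequent_pair(words)           # ('l', 'o') occurs in all three words
words = merge_pair(words, pair)            # 'low' is now ['lo', 'w']
```

Repeating the merge step thousands of times yields a vocabulary where common words are single tokens and rare words split into reusable pieces — the middle ground between the word and character extremes.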
Instruction tuning
Bias tuning / reducing racist or otherwise harmful outputs
Question answering
Domain specialization
Reduced cost
Fine-tuning = update weights
RAG = external retrieval
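That contrast can be made concrete with a toy sketch (all names and values here are invented for illustration): fine-tuning changes the model's parameters via gradient steps, while RAG leaves the weights untouched and prepends retrieved text to the prompt.

```python
# Fine-tuning: parameters change. Toy example — one gradient step on a
# single weight w for the loss L(w) = (w - target)**2.
w, target, lr = 0.0, 2.0, 0.1
grad = 2 * (w - target)        # dL/dw
w = w - lr * grad              # w moves toward the target: 0.0 -> 0.4

# RAG: parameters do NOT change. Retrieval just augments the prompt.
docs = {
    "goku": "Goku is the main protagonist of Dragon Ball Z.",
    "lora": "LoRA adds small trainable matrices to a frozen model.",
}

def retrieve(query: str) -> str:
    # Naive keyword lookup stands in for a real vector search.
    for key, text in docs.items():
        if key in query.lower():
            return text
    return ""

def rag_prompt(query: str) -> str:
    return f"Context: {retrieve(query)}\nQuestion: {query}"
```

The practical upshot: fine-tuning bakes knowledge and style into the weights, while RAG swaps knowledge in and out at inference time without any training.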
A base model (an open-source, free model, in keeping with the FOSS vision)
Dataset
Compute
Framework (Unsloth / PEFT)
Specialize in a domain
Model distillation
Reduced cost
Decensoring (and why it can be harmful)
Dataset → collection of DBZ (Dragon Ball Z) scripts and fan prompts
Style/format prompts
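One common way to turn raw script lines into training examples is the instruction/input/output JSON record format; the field names and prompt template below are just one widely used convention, not a requirement of any specific framework.

```python
import json

def to_record(character: str, context: str, line: str) -> dict:
    """Wrap one script line as an instruction-tuning example."""
    return {
        "instruction": f"Write {character}'s next line in the style of the show.",
        "input": context,
        "output": line,
    }

# Hypothetical example built from an invented scene.
record = to_record(
    character="Goku",
    context="Vegeta challenges Goku to a rematch.",
    line="You've gotten stronger, Vegeta. Let's do this!",
)

jsonl_line = json.dumps(record)   # one record per line in a .jsonl dataset file
```

Keeping every example in the same template matters: the model learns the format as much as the content, so inconsistent prompts dilute the style you're trying to teach.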
Cross-entropy / Perplexity
LLM/Human as a Judge
Human eval (pairwise win-rate)
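Cross-entropy and perplexity are directly related: perplexity is the exponential of the average negative log-probability the model assigns to the true tokens. A minimal computation, with made-up probabilities:

```python
import math

# Probabilities the model assigned to each correct next token (invented values).
token_probs = [0.5, 0.25, 0.125]

# Cross-entropy in nats: average negative log-likelihood of the true tokens.
cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)

# Perplexity: exp(cross-entropy). Lower is better; 1.0 would mean the model
# was certain of every token. Here it works out to exactly 4.0, the inverse
# geometric mean of the probabilities.
perplexity = math.exp(cross_entropy)
```

Perplexity is cheap and automatic, which is why it pairs well with the judge-based and human evaluations above: it catches regressions in raw language modeling even when style judgments are subjective.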
Full fine-tuning
LoRA
QLoRA
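LoRA's core trick fits in a few lines: freeze the original weight matrix W and learn a low-rank update B·A, so only r·(d_in + d_out) parameters train instead of d_in·d_out. A pure-Python sketch with toy dimensions (real models use thousands of dimensions per layer):

```python
# Toy LoRA forward pass: effective weight = W + (alpha / r) * (B @ A).

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

d, r, alpha = 4, 1, 2                                     # hidden size, rank, scaling
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
B = [[0.1] for _ in range(d)]                             # d x r, trainable
A = [[0.2, 0.0, 0.0, 0.0]]                                # r x d, trainable

delta = matmul(B, A)                                      # d x d low-rank update
W_eff = [[W[i][j] + (alpha / r) * delta[i][j] for j in range(d)]
         for i in range(d)]

# Trainable parameters: 2*d*r = 8 instead of d*d = 16 here —
# and the saving grows quadratically with d at a fixed rank.
```

QLoRA applies the same idea on top of a 4-bit-quantized frozen base model, which is what lets consumer GPUs fine-tune models that would otherwise not fit in memory.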
Unsloth
Quantization (4-bit, 8-bit)
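Quantization maps float weights to small integers plus a scale factor. The sketch below shows absmax 8-bit quantization, the simplest variant; real 4-bit schemes like NF4 use non-uniform bins, and the weight values here are invented.

```python
# Absmax 8-bit quantization of a toy weight vector.
weights = [0.4, -1.2, 0.05, 0.9]

scale = max(abs(w) for w in weights) / 127     # map the largest magnitude to 127
q = [round(w / scale) for w in weights]        # int8 values in [-127, 127]
dequant = [qi * scale for qi in q]             # lossy reconstruction

# Rounding error is bounded by scale / 2 per weight.
max_error = max(abs(w - d) for w, d in zip(weights, dequant))
```

Storing one byte (or half a byte) per weight instead of four is what shrinks a model enough to serve, at the cost of the small reconstruction error measured above.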
Distillation
Optimized serving
Fine-tune + RAG hybrids
Open datasets / community fine-tunes
how LLMs work
what is fine-tuning
fine-tuning our own LLM to generate anime-like episodes