Testing GGUF Models with llama.cpp
Benchmark tokens/sec, test logical output, and tune performance flags.
In this free course you will:
Build a repeatable test folder and scoring system
Benchmark prompt processing and token generation speed with llama-bench
Run three standardized quality tests across Q4, Q5, Q6, and Q8 quantizations
Monitor VRAM in real time with nvidia-smi
Tune llama-server flags for speed, quality, and stability
Build model profiles and a personal preset library
Prerequisite: C1.
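As a taste of the VRAM monitoring step, here is a minimal sketch (not the course's own script) that polls nvidia-smi from Python. The `--query-gpu` and `--format` options are standard nvidia-smi flags; the function names `parse_vram_line` and `read_vram` are hypothetical, chosen only for this illustration.

```python
import subprocess

# Standard nvidia-smi query: one "used, total" line per GPU, values in MiB.
NVIDIA_SMI_CMD = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_vram_line(line):
    """Parse one 'used, total' CSV line into a pair of integers (MiB)."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def read_vram():
    """Return a list of (used_mib, total_mib), one entry per GPU.

    Assumes nvidia-smi is installed and on PATH (hypothetical helper).
    """
    result = subprocess.run(
        NVIDIA_SMI_CMD, capture_output=True, text=True, check=True
    )
    return [parse_vram_line(l) for l in result.stdout.splitlines() if l.strip()]
```

Calling `read_vram()` in a loop with a short sleep while a model loads would give a rough picture of how much VRAM each quantization needs.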