Benchmarks

These numbers come from production workloads on real hardware. Nothing synthetic, nothing cherry-picked.

Hardware

Component	Specification
CPU	AMD Ryzen AI MAX+ 395
Memory	128 GB LPDDR5x-8000
GPU	AMD Radeon 8060S (gfx1151)

Model

Qwen3-30B-A3B – Mixture of Experts architecture. 30 billion parameters, 3 billion active per token. The sweet spot between capability and speed on local hardware.

Vulkan + Flash Attention

The primary backend. Vulkan with Flash Attention enabled delivers the best throughput on Strix Halo.

Metric	Value
Generation speed	86 tok/s
Prompt processing	178 – 444 tok/s
Time to first token	< 110 ms
GPU utilization	97%
GPU temperature	55 C

HIP / ROCm

The alternative backend for workloads that require ROCm compatibility.

Metric	Value
Generation speed	70 tok/s
Prompt processing	217 – 302 tok/s
Time to first token	< 110 ms
GPU utilization	97%
GPU temperature	55 C

Summary

Vulkan with Flash Attention is the recommended backend for Strix Halo systems. The 86 tok/s generation speed means real-time conversational AI with no perceptible delay. Prompt processing peaks at 444 tok/s, making context ingestion nearly instantaneous for typical workloads.

Time to first token stays under 110 milliseconds on both backends. The GPU holds steady at 97% utilization and 55 degrees Celsius – fully loaded, thermally comfortable.

All inference runs locally. Nothing leaves your hardware.

Designed and built by the architect.