https://x.com/ljupc0/status/1921660533578588403
Ljubomir Josifovski @ljupc0
Mixture-of-Experts (MoE) in addition to dense model variants, please! It's so much faster in terms of tokens per second on localhost.
The number of active parameters makes a huge difference on a laptop. An M2 MacBook Pro runs Gemma-3-27B and the comparable dense Qwen3-32B at ~4-6 tps, but the MoE Qwen3-30B-A3B runs at ~20-40 tps (!!), especially when speculative decoding with the 0.6B draft model works well. That makes for a world of difference in the user experience.
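A rough way to see why active parameters dominate: single-stream decoding is usually memory-bandwidth-bound, so tokens per second is roughly achievable bandwidth divided by the bytes of weights read per token. Below is a minimal back-of-envelope sketch; the bandwidth, 4-bit quantization, and efficiency figures are illustrative assumptions, not measurements from the post.

```python
# Back-of-envelope decode speed: single-stream generation is typically
# memory-bandwidth-bound, so tps ~ bandwidth / (bytes of weights touched
# per token). All constants below are assumptions for illustration.

MEM_BW_GBPS = 100        # assumed base-M2 unified memory bandwidth, GB/s
BYTES_PER_PARAM = 0.5    # assumed 4-bit quantized weights
EFFICIENCY = 0.7         # assumed fraction of peak bandwidth achieved

def est_tps(active_params_b: float) -> float:
    """Estimate tokens/sec from active parameter count (in billions)."""
    gb_per_token = active_params_b * BYTES_PER_PARAM
    return MEM_BW_GBPS * EFFICIENCY / gb_per_token

for name, active_b in [("Qwen3-32B (dense)", 32.0),
                       ("Qwen3-30B-A3B (MoE)", 3.0)]:
    print(f"{name}: ~{est_tps(active_b):.0f} tps")
# Qwen3-32B (dense): ~4 tps
# Qwen3-30B-A3B (MoE): ~47 tps
```

Under these assumptions the dense estimate lands right in the reported ~4-6 tps range, and the MoE ceiling of ~47 tps shows why ~20-40 tps is attainable once routing and runtime overhead are accounted for.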
More context, 256K or maybe even 512K, would be very useful too.
Do keep the QAT (quantization-aware training), please: that was just excellent! Hope all other open-source models switch to QAT too.
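For context, the core idea of QAT is to insert "fake" quantization into the forward pass during training, so the weights learn to tolerate low-precision rounding before the model is ever quantized for deployment. A minimal PyTorch sketch of that idea follows; it is a generic straight-through-estimator illustration, not Gemma's actual QAT recipe.

```python
import torch
import torch.nn as nn

def fake_quant(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Simulate symmetric low-bit quantization in the forward pass.
    The straight-through estimator lets gradients bypass round()."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w + (w_q - w).detach()  # forward: w_q; backward: gradient of w

class QATLinear(nn.Linear):
    """Linear layer that trains against its own quantized weights."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, fake_quant(self.weight), self.bias)

# Tiny demo: the layer still trains while "seeing" 4-bit weights.
layer = QATLinear(16, 16)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(8, 16), torch.randn(8, 16)
loss = nn.functional.mse_loss(layer(x), y)
loss.backward()
opt.step()
```

Because the weights are already adapted to the rounding, the post-training 4-bit checkpoint loses far less quality than naive post-training quantization, which is why QAT releases matter so much for local inference.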
9:15 PM · May 11, 2025