Six months ago, running a 70-billion-parameter language model at home meant either spending five figures on a multi-GPU workstation or accepting painfully slow generation from a cloud API that billed you by the token. Today, I’m writing this article with a box the size of a sandwich running Llama 3.3 70B at 12 tokens …