Llama 3 debuted on April 18th. It feels like a year's worth of progress has happened since. There was a 4-bit bnb quant out within 4 hours of the release. Meta and the model wall… what's the point? The second one person gets access, the model is essentially public…
- The Q4_K_M quant produces excellent chat/chat-instruct output.
- Far superior conversational skills to GPT-4 in my limited testing. More personable and "casual" feeling, even at low temps.
- Q4_K_M fits on 2×4090s with room to spare.
- Very large context sizes and fine-tunes are already well represented on HF (< 2 weeks in).
- Papers have already been written about the quantization quality!
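A quick back-of-envelope sanity check on the 2×4090 claim (assumptions: the 70B parameter count, ~4.85 bits per weight as a rough average for Q4_K_M, and ignoring KV cache and activation overhead):

```python
# Rough VRAM estimate for a 70B model quantized to Q4_K_M.
# ~4.85 bits/weight is an assumed average, not an exact figure.
PARAMS = 70e9
BITS_PER_WEIGHT = 4.85

weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9   # bits -> bytes -> GB
total_vram_gb = 2 * 24                            # two RTX 4090s, 24 GB each
headroom_gb = total_vram_gb - weights_gb          # left over for KV cache etc.

print(f"weights ~ {weights_gb:.1f} GB, headroom ~ {headroom_gb:.1f} GB")
```

So the weights alone land around 42 GB, leaving a few GB of the 48 GB total for context, which matches the "room to spare" observation.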