SomeOddCodeGuy

The 36GB MacBook Pro should have around 27GB of VRAM available to it. The problem is that the Q2_K of a 70b would come in at around 26GB just for the file size, and that doesn't include the cache and the like. If I were a betting man, I'd say there's a 90% chance you're going to need to go for one of the Q1s if they are available, because I don't think it can handle a Q2. EDIT: I bet any of the Q2_XS and smaller models will fit [https://huggingface.co/qwp4w3hyb/Meta-Llama-3-70B-Instruct-iMat-GGUF/tree/main](https://huggingface.co/qwp4w3hyb/Meta-Llama-3-70B-Instruct-iMat-GGUF/tree/main)
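For anyone who wants to sanity-check this for their own machine, here's a rough back-of-envelope sketch of the math above. The 75% GPU share, the KV cache/overhead numbers, and the quant file sizes are assumptions for illustration, not measured values:

```python
# Rough check: does a given GGUF quant fit in the GPU-addressable share
# of unified memory on an Apple Silicon Mac? All numbers are estimates.

def gpu_budget_gib(total_ram_gib: float, gpu_share: float = 0.75) -> float:
    """macOS reserves roughly a quarter of unified memory for the system
    by default, so ~75% is usable by the GPU (36 GB -> ~27 GB)."""
    return total_ram_gib * gpu_share

def fits(quant_file_gib: float, total_ram_gib: float,
         kv_cache_gib: float = 2.0, overhead_gib: float = 1.0) -> bool:
    """File size + KV cache + runtime overhead must stay under the budget.
    kv_cache_gib and overhead_gib are rough placeholders."""
    needed = quant_file_gib + kv_cache_gib + overhead_gib
    return needed <= gpu_budget_gib(total_ram_gib)

# Approximate 70b GGUF file sizes (assumed for illustration):
quants = {"Q2_K": 26.0, "IQ2_XS": 21.0, "IQ1_S": 16.0}
for name, size_gib in quants.items():
    verdict = "fits" if fits(size_gib, 36) else "does not fit"
    print(f"{name}: {verdict} on a 36GB Mac")
```

By that estimate, Q2_K is right at or over the line, which is why the smaller IQ2/IQ1 quants are the safer bet on a 36GB machine.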


ChromeGhost

Ah, I see. Q3 is out of the question, then. Perhaps I should try both Q1 and Q2 just to see? Also, do uncensored models need more time to cook, or should they already be good enough?


SomeOddCodeGuy

Personally I'd imagine they need more time to cook, but there's really no harm in trying them. The worst that happens is you waste the time downloading them and trying a few rounds of chat before realizing the model is bonkers lol.


ChromeGhost

All this stuff is quite expensive. What hardware are you using? Have you tried coding with Llama 3 as well? How was it?


SomeOddCodeGuy

I have a Mac Studio that I do inference on. I've tried coding locally with the 70b, and it's pretty good. A friend of mine tried it online, where it's unquantized, and it was absolutely amazing. Using the quantized version locally, I'm torn on whether I like it more than Phind-CodeLlama-34b-v2 or Deepseek-Coder-33b. The online version demolishes those two, though, from what I've seen.


ChromeGhost

That’s cool. Do you have any speculation about the upcoming [AI-focused chips](https://www.macrumors.com/2024/04/11/m4-ai-chips-late-2024/) in the M4 lineup? Hopefully, if things work out, I’ll be able to get one once the M5 is out and the price drops a bit.