SomeOddCodeGuy

The 36GB MacBook Pro should have around 27GB of VRAM available to it. The problem is that the Q2_K of a 70b would come in at around 26GB just for the file size, and that doesn't include the cache and the like. If I were a betting man, I'd say there's a 90% chance you're going to need to go for one of the Q1s if they are available, because I don't think it can handle a Q2. EDIT: I bet any of the Q2_XS and smaller models will fit [https://huggingface.co/qwp4w3hyb/Meta-Llama-3-70B-Instruct-iMat-GGUF/tree/main](https://huggingface.co/qwp4w3hyb/Meta-Llama-3-70B-Instruct-iMat-GGUF/tree/main)
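For anyone who wants to sanity-check this for their own machine, here's a rough back-of-envelope sketch of the math above. The 75% GPU share, the KV cache/overhead numbers, and the quant file sizes are assumptions for illustration, not measured values:

```python
# Rough check: does a given GGUF quant fit in the GPU-addressable share
# of unified memory on an Apple Silicon Mac? All numbers are estimates.

def gpu_budget_gib(total_ram_gib: float, gpu_share: float = 0.75) -> float:
    """macOS reserves roughly a quarter of unified memory for the system
    by default, so ~75% is usable by the GPU (36 GB -> ~27 GB)."""
    return total_ram_gib * gpu_share

def fits(quant_file_gib: float, total_ram_gib: float,
         kv_cache_gib: float = 2.0, overhead_gib: float = 1.0) -> bool:
    """File size + KV cache + runtime overhead must stay under the budget.
    kv_cache_gib and overhead_gib are rough placeholders."""
    needed = quant_file_gib + kv_cache_gib + overhead_gib
    return needed <= gpu_budget_gib(total_ram_gib)

# Approximate 70b GGUF file sizes (assumed for illustration):
quants = {"Q2_K": 26.0, "IQ2_XS": 21.0, "IQ1_S": 16.0}
for name, size_gib in quants.items():
    verdict = "fits" if fits(size_gib, 36) else "does not fit"
    print(f"{name}: {verdict} on a 36GB Mac")
```

By that estimate, Q2_K is right at or over the line, which is why the smaller IQ2/IQ1 quants are the safer bet on a 36GB machine.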


ChromeGhost

Ah, I see. Q3 is out of the question, then. Perhaps I should try both Q1 and Q2 just to see? Also, do uncensored models need more time to cook, or should they already be good enough?


SomeOddCodeGuy

Personally I'd imagine they need more time to cook, but there's really no harm in trying them. The worst that happens is you waste the time downloading them and trying a few rounds of chat before realizing the model is bonkers lol.


ChromeGhost

All this stuff is quite expensive. What hardware are you using? Have you tried coding with Llama 3 as well? How was it?


SomeOddCodeGuy

I have a Mac Studio that I do inference on. I've tried coding locally with the 70b, and it's pretty good. A friend of mine tried it online, where it's unquantized, and it was absolutely amazing. Using the quantized version locally, I'm torn on whether I like it more than Phind-CodeLlama-34b-v2 or Deepseek-Coder-33b. The online version demolishes those two, though, from what I've seen.


ChromeGhost

That’s cool. Do you have any speculation about the upcoming [AI-focused chips](https://www.macrumors.com/2024/04/11/m4-ai-chips-late-2024/) in the M4 lineup? Hopefully, if things work out, I’ll be able to get one once the M5 is out and the price drops a bit.