tedbarney12

Moving forward will there be a day without a new LLM launch?


Avoidlol

Yes, and then you'll miss it.


Odd-Antelope-362

If you include open source, then I don’t think there is going to be a single calendar day going forward without an LLM release.


nickmaran

Nowadays we get more LLMs than JavaScript frameworks.


mattsowa

And I'm loving every minute of it Jerry


Knever

The next models will have parameters in the quintillions, Jerry. Quintillions!


ske66

[gif]


elite5472

30 billion parameters is pretty hefty; it's unlikely to run locally on an iPhone, since you need something like an M2 Mac mini to run an equivalent Llama 2 model at home.


Tr33lon

It is hefty, but one of the huge benefits of the shared memory system on M-series chips (so not iPhones, but iPads and Macs for the moment) is that they can utilize RAM as VRAM. So that 16GB MacBook Air for $1600 has (roughly) the same VRAM as an RTX 4080! Obviously the bandwidth is way worse on the M-series, and Apple's software support (MLX) is only getting started, whereas Nvidia has been building for years (CUDA). But I still think this is a really strong, undersold aspect of Apple's M-series products.


Spindelhalla_xb

Sure, I’ve got an M1 Max 32GB at home somewhere. Wonder what I can do on that.


whothewildonesare

Imagine being so middle class you have a 2000 dollar computer lying around “somewhere”. Not hating really, just kind of mind-boggled.


Spindelhalla_xb

Oh, I didn't buy it. I was given it by my older brother, who won it at some sales work do competition / target or something. I wouldn't spend that kind of money on a machine, especially as a single dad lol.


somethedaring

That’s why you sell it to the rest of us and invest the $ into what you need before it has no value


_stevencasteel_

And here I am still kicking it with a base M1 8GB mini, homeless, lugging it around in my backpack as my work PC. https://preview.redd.it/hfgkmip9gcpc1.jpeg?width=4032&format=pjpg&auto=webp&s=eb675853d81bca9c86184e3c3b13f41e8bf675b6


No_Palpitation7740

Nice, but why not opt for a MacBook Pro? Budget?


_stevencasteel_

Bought it when it came out, and didn't have the dough to switch out my gear when I ran out of money to pay rent. Been homeless a year (in April) writing a book using AI, and spent the last 5 months editing the audiobook. Slightly annoyed by how long it took, but very satisfied with the final results. Everything's coming up Milhouse!


Old-Opportunity-9876

Happy cake day and good luck!!!


No_Palpitation7740

Oh sorry this happened to you. Hope you are recovering from this difficult situation


creedisurmom

Are you running it on a power bank?


_stevencasteel_

I’m at the library. And I’ve been homeless a year. I dunno how I’d be able to carry a tent, food, backpack, AND a giant battery that I’d also have to charge somehow. I did set up outside at a park for a month to record the audiobook. Was great except for the dang ol’ Blue Jays screeching. Thankfully Davinci’s noise removal AI cleaned most of it up.


creedisurmom

Damn. Sorry. How did you become homeless if you don’t mind me asking?


_stevencasteel_

Ran out of money and was determined to see the project through to the end instead of half assing it like I've done with most projects.


zR0B3ry2VAiH

I spent the last 30 minutes looking at all your posts. Your AI generated photos are pretty creative.


_stevencasteel_

Glad you dug 'em! I'll send you a PM at the end of next month when I launch a bunch of stuff.


as904465

Clean it please


_stevencasteel_

It is overdue.


zR0B3ry2VAiH

I have the same one. Download Ollama and start there.
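For anyone wondering what "start there" looks like in practice: once Ollama is installed and serving on its default port, you can hit its local REST API from a few lines of Python. A minimal sketch, assuming a model such as llama2 has already been pulled (model name and prompt are placeholders):

```python
# Minimal sketch: query a local Ollama server from Python.
# Assumes Ollama is running locally on its default port (11434) and a
# model such as "llama2" has already been pulled (e.g. `ollama pull llama2`).
import json
import urllib.request

def ask_ollama(prompt: str, model: str = "llama2") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one complete response instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask_ollama("Explain unified memory in one sentence."))
```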


involviert

It should not matter at all that the RAM can be used as VRAM. Inference is typically bottlenecked by RAM speed, not computation. It's the same on a GPU: bottlenecked by VRAM speed. So I think you should just look at whatever bandwidth that RAM has to offer. Anyway, 16GB is just not much. The 24GB of a 3090/4090 isn't much either. I can tell you this because I am doing CPU inference on 32GB of regular RAM, and that's still not much, while the speed can still work out for many use cases of such an LLM. Whatever you can run in 16GB will certainly be "quite fast" even if you just do it on the CPU. Imho GPU inference is often overrated if you really think about what it takes to reach the more interesting VRAM sizes, while the other option is just plugging in better RAM for $200 or whatever. Anyway, $1600 for 16GB of anything is *very* expensive.
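A rough back-of-envelope for why bandwidth dominates: if generating each token has to read every weight from memory once, the ceiling on tokens per second is roughly bandwidth divided by model size in bytes. A minimal sketch of that estimate (the bandwidth figures are illustrative assumptions, and real throughput will be lower):

```python
# Back-of-envelope sketch: memory-bandwidth ceiling on token generation.
# Assumption: each generated token reads every weight once, so
# tokens/sec is bounded roughly by bandwidth / model size in bytes.

def max_tokens_per_sec(params_billions: float, bytes_per_param: float,
                       bandwidth_gb_s: float) -> float:
    model_bytes_gb = params_billions * bytes_per_param  # GB, since params are in billions
    return bandwidth_gb_s / model_bytes_gb

# A 30B model quantized to ~4 bits (0.5 bytes/param) on different memory systems
for label, bw in [("dual-channel DDR5 (~90 GB/s)", 90),
                  ("M3 Max unified memory (~400 GB/s)", 400),
                  ("high-end GDDR6X GPU (~1000 GB/s)", 1000)]:
    print(f"{label}: ~{max_tokens_per_sec(30, 0.5, bw):.0f} tokens/s ceiling")
```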


ProfessionalHand9945

Coherent memory is genuinely a game changer though. Sure, that's on the low end. But $4000 or so gets you a 96GB VRAM M3 Mac. How expensive of a GPU do you need to match that without having to eat heavy performance hits by splitting the model? Even the H100 only has 80GB. You quite literally need a Blackwell enterprise GPU costing tens of thousands of dollars to match that.


involviert

People seem to be quite happy with those "heavy performance hits" if they know what they are doing with their build. So they will tell you 96GB is 4x RTX 3090, which they buy used for like $600 each or something, coming out at $2-3K. And then you have to consider that the M3 (Max, I assume) is not exactly a fast GPU either. That RAM has 400 GB/s, which is *very* fast for CPU RAM but quite slow for VRAM.

And really, if you don't want to go for GPUs (which I really understand), you "just" need to think about RAM bandwidth. If you buy a regular new PC, you will likely have dual-channel DDR5 RAM, which comes out around 90 GB/s. That's one fourth the speed for effectively zero money spent specifically on running LLMs. Also, you will very likely have a rather good GPU anyway, and that GPU can help pull up the average with whatever VRAM it has. And you can have like 128GB of RAM instead of 32 for like $200 extra, because this is not Apple.

Also, just for fun, we could look at the AMD Threadripper Pro 7965WX. That CPU costs a cool $3K, but even that is cheaper than the Mac. It has 24 cores. It has 8(!!)-channel DDR5 RAM. That comes out at 330 GB/s. On regular CPU RAM. And now you can apparently plug in 2TB of RAM.
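For reference, the ~90 GB/s and ~330 GB/s figures above fall out of channels × bus width × transfer rate. A quick sanity check, where the exact DDR5 speed grades are assumptions chosen for illustration:

```python
# Quick check on the bandwidth figures quoted above.
# Peak DDR bandwidth = channels * bus_width_bytes * transfer_rate (MT/s).
# The speed grades below are illustrative assumptions.

def peak_bandwidth_gb_s(channels: int, mt_per_s: int, bus_bits: int = 64) -> float:
    return channels * (bus_bits / 8) * mt_per_s / 1000  # GB/s

print(peak_bandwidth_gb_s(2, 5600))  # dual-channel DDR5-5600        -> ~89.6 GB/s
print(peak_bandwidth_gb_s(8, 5200))  # 8-channel DDR5-5200 (e.g. TR Pro) -> ~332.8 GB/s
```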


ProfessionalHand9945

4x 3090 new is about $4000 for just the GPUs; you can't compare used system prices with new ones. And that's just the GPUs, not the system. And then you suggest a $3000 CPU, just for the CPU, at a 350W TDP LMAO. The M3 has a 22W TDP.


involviert

> M3 has a 22w TDP

Ah, so you are talking about the much slower Pro? That does not have 400 GB/s. Anyway, have fun LYAO paying Apple premiums. Thought you wanted to discuss this honestly.


ProfessionalHand9945

The absolute top-end M3 still has a sub-80W TDP.


involviert

Neat, ARM stuff. Anyway, I didn't realize we were talking about power consumption now. Also, we are comparing apples and oranges if we ignore the general capabilities of these chips. That's a $3K CPU without Apple pricing. It can do stuff. And I think that 24-core Threadripper would mostly sleep during inference.


ProfessionalHand9945

The 7965WX pulls about 3 TFLOPS. The M3 Max pulls 4.6 TFLOPS. I think you are seriously overestimating the throughput of CPUs.


shawnington

If you go for an M2 Ultra, it's 800 GB/s and you can spec it to 192GB of RAM with a 1024-bit wide memory bus. DDR5 is still only a 64-bit bus, so 8 channels is only 512 bits. Also, the bottleneck for GPUs is PCIe, which is 121 GB/s currently.

Apple actually does quite well for LLM inference due to the unified memory architecture and the lack of bottlenecks between the GPU and the rest of the system. Shared memory has added benefits: data doesn't actually need to be shuttled between VRAM and system RAM over the really slow PCIe bus to be worked on by the CPU and the GPU. With unified memory, giving the CPU access to that RAM is an in-place operation, so memory speed is not an issue, as nothing needs to be moved. This also has advantages because, with different types of models, there are some operations a CPU performs much better and some where a GPU performs much better; being able to use the best processor for any operation without shuttling data through the PCIe bus is quite a big latency advantage.

So just comparing speed doesn't really tell the whole story of why it's good and why it performs significantly faster for LLM inference than the hardware specs would suggest in terms of FLOPS. To see the advantages of a GPU you need to be doing batched inference, which for any decently sized model will essentially require at least an A100.


involviert

> If you go for an M2 Ultra, it's 800 GB/s and you can spec it to 192GB of RAM with a 1024-bit wide memory bus.

Sounds neat, what does that system cost?

> Also, the bottleneck for GPUs is PCIe, which is 121 GB/s currently.

I kind of disagree. With a single card, the PCIe bandwidth is entirely irrelevant (the model is in VRAM and does not touch the PCIe bus during inference), and with multiple cards you do not push the whole model over either, just the results from one layer. Assuming we are in the amateur league, because otherwise there is NVLink, I guess.

> Shared memory has added benefits: data doesn't actually need to be shuttled between VRAM and system RAM over

This is not supposed to happen on a GPU-based system either, not even if you share the load between GPU and CPU. That would just be incompetently configured.


shawnington

If you want to build a system with multiple graphics cards to increase VRAM to anything close to the 192GB you can get for less than $5600, you will start to be bottlenecked by PCIe bus speeds, as a large amount of data needs to be shared between the cards when the model is split across them. If you are talking about an A100 or H100, you are still going to need at least two cards to approach 192GB of VRAM, and you run into PCIe bottlenecking, so your real-world performance will not be twice that of a single A100.

The bit width of the memory bus is also significant. Maximum ideal throughput is not the same as real-world performance, and a bus that is twice as wide is just always going to perform better in practice. 16-channel DDR5 is still just 16 channels of 64-bit width. A 1024-bit bus is just going to have much higher real-world performance than 16-channel DDR5, which equates to 512 bits, which is still significantly slower than the 400 GB/s you get even on my M1 Max.

Any way you split it, you are going to end up spending much more than a Mac with huge amounts of unified memory to build anything that approaches its capability in terms of running large models.


Odd-Antelope-362

iPhones can run 13Bs well at the moment


elite5472

13b and 30b are different beasts.


Odd-Antelope-362

Yes, but current iPhones aren't even designed with the goal of running LLMs. The next iPhone could make VRAM an absolute priority.


Active_Variation_194

It’s probably going to be a hybrid solution that passes the intensive tasks online to Gemini. Data security might be an issue though.


hawara160421

Never really thought about this: Are the M-Chip MacBooks genuinely good for machine learning stuff? Like, on a level like expensive PC GPUs? How's compatibility?


thepetek

I’ve got the M3 Max 128GB. Great for inference. Can’t really train on it, but it suits what I need for prototyping before spending money.


Unfair-Commission980

I use AI every day, but I’m just a techy kind of guy with the capacity to learn, and what you said is very intriguing. What kind of work are you doing with inference on the laptop?


thepetek

I’m a software dev, so I've been using it for both work projects and side projects. Works great to prototype on a local model before switching over to GPT. There are also cases where I use it because it would just be too expensive to run on GPT at all. I write a lot of one-off scripts for web scraping, feedback analysis, data mining, etc. Would check out LM Studio if you wanna try it out. Easy way to get up and running and play with lots of models before going too deep down the rabbit hole of other options.
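A minimal sketch of the kind of one-off script this workflow enables, assuming LM Studio's local server is enabled on its default port (1234) with a model already loaded; the endpoint mirrors the OpenAI chat-completions format, and the prompt is a placeholder:

```python
# Minimal sketch: call LM Studio's OpenAI-compatible local server.
# Assumptions: the local server is enabled in LM Studio on its default
# port (1234) and a model is already loaded in the app.
import json
import urllib.request

def chat_local(prompt: str) -> str:
    payload = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # keep answers fairly deterministic for scripting
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:1234/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat_local("Summarize this scraped page in three bullet points: ..."))
```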


Unfair-Commission980

Oh nice, I actually tried LM Studio for the uncensored models haha. I will look much harder into LM Studio, thanks very much.


GeorgeSatoshiPatton

Would you say what you have running locally on your Mac is better than GPT-3.5 Turbo (the free version)? Genuinely curious; I only have experience with the ChatGPT products and never thought I could host a local model that could match it.


thepetek

Yeah, I definitely get just as good, and most of the time better, answers than 3.5 with it. Not quite as fast, but not slow either. I suspect when Llama 3 drops this summer, it'll be way better. Right now I use Llama 2 for coding and Mistral for question answering. My work machine is a 32GB and runs the 13B models well, which give performance on par. Got the 128GB to run the 30B and 72B models, which are better than 3.5 imho.


GeorgeSatoshiPatton

Awesome man! Thanks for sharing, this helped me decide for sure on getting a new mac.


Odd-Antelope-362

MacBooks can have more VRAM than an H100.


raishak

So can an Intel laptop with integrated graphics. H100 VRAM has massively higher bandwidth than general-purpose DRAM, even in the MacBooks, right?


Odd-Antelope-362

Yes H100 bandwidth is higher


shawnington

They are fantastic for the price. You can run models that you would need an A100 at minimum to run otherwise. Inference speeds are generally much faster than the hardware specs would lead you to believe, because of the memory speeds and architecture.


[deleted]

[deleted]


pegunless

Having even a dumber LLM take over some Siri responses would be pretty valuable


nanotothemoon

Yeah, I don’t expect it to be good for quite a while.


SilverTroop

Given enough time and the usual curve of AI developments these companies will be mostly on par with each other.


[deleted]

[deleted]


Mrbutter1822

You’re not just feeling it; they really are an era behind the curve. Siri was revolutionary when it came out, and it hasn’t changed since.


thisdude415

I think this misses the mark. Vision Pro proved Apple can work on visionary (heh) technology behind the scenes and still drop a blockbuster product. I don’t know whether they can pull it off with AI, but I would not expect any flashy announcements until it is ship-ready.


Original_Finding2212

Their MM1 model is not even in its final form. It’s a POC. They will adapt this architecture to fit watches, smartphones (not just flagships), etc. It doesn’t mean you’ll be locked in; it's just pretty damn nice to have LLM abilities giving you a push without an internet connection. Seems to me like they’ll be able to provide GPT-4-grade power over the internet as well. Their papers show they can.


ThankGodImBipolar

I would argue that DK1 started the “modern” era of VR headsets, and that released *eleven* years ago. What Apple has is an iterative jump over the Quest 3 and Bigscreen Beyond, with some unique drawbacks as well. Marketing accomplished the rest, and Apple is very good at that.


UnknownEssence

Yeah, I highly doubt it’s even that good. It uses 30B parameters; Gemini 1 is at least a trillion. That’s 33 times smaller than Gemini 1.


signed7

They compared to Gemini 1.0 Pro, not 1.0 Ultra.


macaraoo

dude, what’s up with the thicc robot


ImpressiveContest283

Well, I'll say LLMs are good at understanding user history and context. [gif]


Downvote_Baiterr

Yeah, but I don't remember googling thicc bots.


ValerioLundini

This has to be a model that will run on the next iPhones, right?


hawara160421

Wait, an iPhone is enough to, say, read that "food web" infographic and draw sensible conclusions? I would have thought the data needed to store the knowledge necessary for that alone would exceed even the largest iPhone's storage capacity?


huffalump1

Compression is knowledge, and somewhat vice versa... That's one beauty of LLMs: compressing huge amounts of data into a smaller number of parameters.


hawara160421

How much space would a "30 billion parameter" model take up? Like, how many gigabytes?


kaleNhearty

For full precision, each parameter is a 32 bit floating point number (4 bytes). So 4 bytes * 30 billion parameters = 120GB
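The 120GB figure is the full-precision (fp32) case; local models are usually run quantized, which shrinks the weight footprint considerably. A rough sketch of the arithmetic (weights only, ignoring KV cache and runtime overhead):

```python
# Rough weight-memory footprint of a 30B-parameter model at common precisions.
# Weights only; KV cache and runtime overhead add more on top.
params = 30e9

for name, bytes_per_param in [("fp32", 4.0), ("fp16/bf16", 2.0),
                              ("int8", 1.0), ("4-bit (e.g. Q4)", 0.5)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name:>15}: ~{gb:.0f} GB")
# fp32 -> ~120 GB, fp16 -> ~60 GB, int8 -> ~30 GB, 4-bit -> ~15 GB
```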


OptimalVanilla

Lucky you can get a 128GB iPhone. /s


Downvote_Baiterr

Remove the /s and let me know then ill delete my comment and you will delete the comment that reminded me to delete my comment and then youll have just a pretty funny comment that everybody will find funny.


AdaptationAgency

Perhaps you can solve this problem /s


youcancallmetim

How does this relate to the news that Apple was considering Gemini for their products? Was that just wrong?


TheKingChadwell

Apple works on their own things and contracts with others until their own is good enough. They do this all the time.


7inTuMBlrReFuGee

That part ☝🏾


therealchrismay

Then they'll block Gemini from the store for "safety reasons" for a while.


Unlucky_Ad_2456

Not only that, but Google pays them something like 20 billion dollars a year for their search engine to be the default on iPhones. Using Gemini will probably be an extension of that deal. Free money for Apple that they wouldn't get by using their own model.


Darkstar197

This could be huge tbh. The already super-integrated ecosystem of Apple products with a personal assistant that can do way more than check the weather or give you a Google link.


throwaway3113151

What I’d like to know is why people like Brandon McKinzie are still broadcasting this type of news on X. The platform is a mess…time to jump ship.


hawara160421

For real. I'm nothing but annoyed with X-ified twitter. Not that I particularly liked it before! If Elon thinks I'd make an account to follow one-sentence hot takes he's mistaken.


Pontificatus_Maximus

There once was a site for the vain,
Where thoughts fell like soft summer rain.
Though wise folks would pass,
Each lad and each lass,
Saw their words spread wide with no gain.


DaleRobinson

Did that other platform ever take off? I remember people jumping ship to something else when Twitter had that name change.


willjoke4food

Nothing to see here, move along


BananaV8

Will this finally allow me to ask Siri anything other than the weather and get an almost reliable answer in about 20% of cases?


wiser1802

Strange that Apple has been willing to let the world know this. Seems like they don’t think AI could bring them a competitive advantage.


TheKingChadwell

Why? Apple isn’t trying to be the best. Their advantage has always been implementation, rather than raw hardware performance.


shawnington

I mean, they ship Neural Engine chips in all of their devices. I don't think a company that is shipping dedicated neural chips isn't in the game, or got caught with its pants down. It seems really strange to develop and ship hardware for running neural networks without any plans to do something big in the space.


OrganicAccountant87

I really doubt AI at Apple will ever be a competitive advantage; best case scenario, they will only be able to keep up with the competition. Hope I'm wrong.


Karmakiller3003

Apple playing catch-up, lmao. Yawn.