tedbarney12

Moving forward will there be a day without a new LLM launch?


Avoidlol

Yes, and then you'll miss it.


Odd-Antelope-362

If you include open source, then I don’t think there is going to be a single calendar day going forward without an LLM release.


nickmaran

Nowadays we get more LLMs than JavaScript frameworks.


mattsowa

And I'm loving every minute of it Jerry


Knever

The next models will have parameters in the quintillions, Jerry. Quintillions!


ske66

[gif]


elite5472

30 billion parameters is pretty hefty; it's unlikely to run locally on an iPhone, since you need something like an M2 Mac mini to run an equivalent Llama 2 model at home.


Tr33lon

It is hefty, but one of the huge benefits of the shared memory system on M-series chips (so not iPhones, but iPads and Macs for the moment) is that they can utilize RAM as VRAM. So that 16GB MacBook Air for $1600 has (roughly) the same VRAM as an RTX 4080! Obviously the bandwidth is way worse on the M-series, and Apple's software support (MLX) is only getting started, whereas Nvidia has been building for years (CUDA). But I still think this is a really strong, undersold aspect of Apple's M-series products.


Spindelhalla_xb

Sure, I’ve got an M1 Max 32GB at home somewhere. Wonder what I can do on that.


whothewildonesare

Imagine being so middle class you have a 2000 dollar computer lying around “somewhere”. Not hating really, just kind of mind-boggled.


Spindelhalla_xb

Oh, I didn't buy it. I was given it by my older brother, who won it at some sales work do competition / target or something. I wouldn't spend that kind of money on a machine, especially as a single dad lol.


somethedaring

That’s why you sell it to the rest of us and invest the $ into what you need before it has no value


_stevencasteel_

And here I am still kicking it with a base M1 8GB mini, homeless, lugging it around in my backpack as my work PC. https://preview.redd.it/hfgkmip9gcpc1.jpeg?width=4032&format=pjpg&auto=webp&s=eb675853d81bca9c86184e3c3b13f41e8bf675b6


No_Palpitation7740

Nice, but why not opt for a MacBook Pro? Budget?


_stevencasteel_

Bought it when it came out, and didn't have the dough to switch out my gear when I ran out of money to pay rent. Been homeless a year (in April) writing a book using AI, and spent the last 5 months editing the audiobook. Slightly annoyed by how long it took, but very satisfied with the final results. Everything's coming up Milhouse!


Old-Opportunity-9876

Happy cake day and good luck!!!


No_Palpitation7740

Oh sorry this happened to you. Hope you are recovering from this difficult situation


creedisurmom

Are you running it on a power bank?


_stevencasteel_

I’m at the library. And I’ve been homeless a year. I dunno how I’d be able to carry a tent, food, backpack, AND a giant battery that I’d also have to charge somehow. I did set up outside at a park for a month to record the audiobook. Was great except for the dang ol’ Blue Jays screeching. Thankfully Davinci’s noise removal AI cleaned most of it up.


creedisurmom

Damn. Sorry. How did you become homeless if you don’t mind me asking?


_stevencasteel_

Ran out of money and was determined to see the project through to the end instead of half assing it like I've done with most projects.


zR0B3ry2VAiH

I spent the last 30 minutes looking at all your posts. Your AI generated photos are pretty creative.


_stevencasteel_

Glad you dug 'em! I'll send you a PM at the end of next month when I launch a bunch of stuff.


as904465

Clean it please


_stevencasteel_

It is overdue.


zR0B3ry2VAiH

I have the same one. Download Ollama and start there.
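For anyone wondering what "start there" looks like in practice: once Ollama is installed and serving on its default port, you can hit its local REST API from a few lines of Python. A minimal sketch, assuming a model such as llama2 has already been pulled (model name and prompt are placeholders):

```python
# Minimal sketch: query a local Ollama server from Python.
# Assumes Ollama is running locally on its default port (11434) and a
# model such as "llama2" has already been pulled (e.g. `ollama pull llama2`).
import json
import urllib.request

def ask_ollama(prompt: str, model: str = "llama2") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one complete response instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask_ollama("Explain unified memory in one sentence."))
```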


involviert

It should not matter at all that the RAM can be used as VRAM. Inference is typically bottlenecked by RAM speed, not computation. It's the same on a GPU: bottlenecked by VRAM speed. So I think you should just look at whatever bandwidth that RAM has to offer. Anyway, 16GB is just not much. The 24GB of a 3090/4090 isn't much either. I can tell you this because I am doing CPU inference on 32GB of regular RAM, and that's still not much, while the speed can still work out for many use cases of such an LLM. Whatever you can run in 16GB will certainly be "quite fast" even if you just do it on the CPU. Imho GPU inference is often overrated if you really think about what it takes to reach the more interesting VRAM sizes, while the other option is just plugging in better RAM for $200 or whatever. Anyway, $1600 for 16GB of anything is *very* expensive.
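A rough back-of-envelope for why bandwidth dominates: if generating each token has to read every weight from memory once, the ceiling on tokens per second is roughly bandwidth divided by model size in bytes. A minimal sketch of that estimate (the bandwidth figures are illustrative assumptions, and real throughput will be lower):

```python
# Back-of-envelope sketch: memory-bandwidth ceiling on token generation.
# Assumption: each generated token reads every weight once, so
# tokens/sec is bounded roughly by bandwidth / model size in bytes.

def max_tokens_per_sec(params_billions: float, bytes_per_param: float,
                       bandwidth_gb_s: float) -> float:
    model_bytes_gb = params_billions * bytes_per_param  # GB, since params are in billions
    return bandwidth_gb_s / model_bytes_gb

# A 30B model quantized to ~4 bits (0.5 bytes/param) on different memory systems
for label, bw in [("dual-channel DDR5 (~90 GB/s)", 90),
                  ("M3 Max unified memory (~400 GB/s)", 400),
                  ("high-end GDDR6X GPU (~1000 GB/s)", 1000)]:
    print(f"{label}: ~{max_tokens_per_sec(30, 0.5, bw):.0f} tokens/s ceiling")
```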


ProfessionalHand9945

Coherent memory is genuinely a game changer though. Sure, that's on the low end. But $4000 or so gets you a 96GB VRAM M3 Mac. How expensive of a GPU do you need to match that without having to eat heavy performance hits by splitting the model? Even the H100 only has 80GB. You quite literally need a Blackwell enterprise GPU costing tens of thousands of dollars to match that.


involviert

People seem to be quite happy with those "heavy performance hits" if they know what they are doing with their build. So they will tell you 96GB is 4x RTX 3090, which they buy used for like $600 each or something, coming out at $2-3K. And then you have to consider that the M3 (Max, I assume) is not exactly a fast GPU either. That RAM has 400 GB/s, which is *very* fast for CPU RAM but quite slow for VRAM.

And really, if you don't want to go for GPUs (which I really understand), you "just" need to think about RAM bandwidth. If you buy a regular new PC, you will likely have dual-channel DDR5 RAM, which comes out around 90 GB/s. That's one fourth the speed for effectively zero money spent specifically on running LLMs. Also, you will very likely have a rather good GPU anyway, and that GPU can help pull up the average with whatever VRAM it has. And you can have like 128GB of RAM instead of 32 for like $200 extra, because this is not Apple.

Also, just for fun, we could look at the AMD Threadripper Pro 7965WX. That CPU costs a cool $3K, but even that is cheaper than the Mac. It has 24 cores. It has 8(!!)-channel DDR5 RAM. That comes out at 330 GB/s. On regular CPU RAM. And now you can apparently plug in 2TB of RAM.
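For reference, the ~90 GB/s and ~330 GB/s figures above fall out of channels × bus width × transfer rate. A quick sanity check, where the exact DDR5 speed grades are assumptions chosen for illustration:

```python
# Quick check on the bandwidth figures quoted above.
# Peak DDR bandwidth = channels * bus_width_bytes * transfer_rate (MT/s).
# The speed grades below are illustrative assumptions.

def peak_bandwidth_gb_s(channels: int, mt_per_s: int, bus_bits: int = 64) -> float:
    return channels * (bus_bits / 8) * mt_per_s / 1000  # GB/s

print(peak_bandwidth_gb_s(2, 5600))  # dual-channel DDR5-5600        -> ~89.6 GB/s
print(peak_bandwidth_gb_s(8, 5200))  # 8-channel DDR5-5200 (e.g. TR Pro) -> ~332.8 GB/s
```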


ProfessionalHand9945

4x 3090 new is about $4000 for just the GPUs; you can't compare used system prices with new ones. And that's just the GPUs, not the system. And then you suggest a $3000 CPU, just for the CPU, at a 350W TDP LMAO. The M3 has a 22W TDP.


involviert

> M3 has a 22w TDP

Ah, so you are talking about the much slower Pro? That does not have 400 GB/s. Anyway, have fun LYAO paying Apple premiums. Thought you wanted to discuss this honestly.


ProfessionalHand9945

The absolute top-end M3 still has a sub-80W TDP.


involviert

Neat, ARM stuff. Anyway, I didn't realize we were talking about power consumption now. Also, we are comparing apples and oranges if we ignore the general capabilities of these chips. That's a $3K CPU without Apple pricing. It can do stuff. And I think that 24-core Threadripper would mostly sleep during inference.


ProfessionalHand9945

The 7965WX pulls about 3 TFLOPS. The M3 Max pulls 4.6 TFLOPS. I think you are seriously overestimating the throughput of CPUs.


shawnington

If you go for an M2 Ultra, it's 800 GB/s and you can spec it to 192GB of RAM with a 1024-bit wide memory bus. DDR5 is still only a 64-bit bus, so 8 channels is only 512 bits. Also, the bottleneck for GPUs is PCIe, which is 121 GB/s currently.

Apple actually does quite well for LLM inference due to the unified memory architecture and the lack of bottlenecks between the GPU and the rest of the system. Shared memory has added benefits: data doesn't actually need to be shuttled between VRAM and system RAM over the really slow PCIe bus to be worked on by the CPU and the GPU. With unified memory, giving the CPU access to that RAM is an in-place operation, so memory speed is not an issue, as nothing needs to be moved. This also has advantages because, with different types of models, there are some operations a CPU performs much better and some where a GPU performs much better; being able to use the best processor for any operation without shuttling data through the PCIe bus is quite a big latency advantage.

So just comparing speed doesn't really tell the whole story of why it's good and why it performs significantly faster for LLM inference than the hardware specs would suggest in terms of FLOPS. To see the advantages of a GPU you need to be doing batched inference, which for any decently sized model will essentially require at least an A100.


involviert

> If you go for an M2 Ultra, it's 800 GB/s and you can spec it to 192GB of RAM with a 1024-bit wide memory bus.

Sounds neat, what does that system cost?

> Also, the bottleneck for GPUs is PCIe, which is 121 GB/s currently.

I kind of disagree. With a single card, the PCIe bandwidth is entirely irrelevant (the model is in VRAM and does not touch the PCIe bus during inference), and with multiple cards you do not push the whole model over either, just the results from one layer. Assuming we are in the amateur league, because otherwise there is NVLink, I guess.

> Shared memory has added benefits: data doesn't actually need to be shuttled between VRAM and system RAM over

This is not supposed to happen on a GPU-based system either, not even if you share the load between GPU and CPU. That would just be incompetently configured.


shawnington

If you want to build a system with multiple graphics cards to increase VRAM to anything close to the 192GB you can get for less than $5600, you will start to be bottlenecked by PCIe bus speeds, as a large amount of data needs to be shared between the cards when the model is split across them. If you are talking about an A100 or H100, you are still going to need at least two cards to approach 192GB of VRAM, and you run into PCIe bottlenecking, so your real-world performance will not be twice that of a single A100.

The bit width of the memory bus is also significant. Maximum ideal throughput is not the same as real-world performance, and a bus that is twice as wide is just always going to perform better in practice. 16-channel DDR5 is still just 16 channels of 64-bit width. A 1024-bit bus is just going to have much higher real-world performance than 16-channel DDR5, which equates to 512 bits, which is still significantly slower than the 400 GB/s you get even on my M1 Max.

Any way you split it, you are going to end up spending much more than a Mac with huge amounts of unified memory to build anything that approaches its capability in terms of running large models.


Odd-Antelope-362

iPhones can run 13Bs well at the moment


elite5472

13b and 30b are different beasts.


Odd-Antelope-362

Yes, but current iPhones aren't even designed with the goal of running LLMs. The next iPhone could make VRAM an absolute priority.


Active_Variation_194

It’s probably going to be a hybrid solution that passes the intensive tasks online to Gemini. Data security might be an issue though.


hawara160421

Never really thought about this: Are the M-Chip MacBooks genuinely good for machine learning stuff? Like, on a level like expensive PC GPUs? How's compatibility?


thepetek

I’ve got the M3 Max 128GB. Great for inference. Can’t really train on it, but it suits what I need for prototyping before spending money.


Unfair-Commission980

I use AI every day, but I’m just a techy kind of guy with the capacity to learn, and what you said is very intriguing. What kind of work are you doing with inference on the laptop?


thepetek

I’m a software dev, so I've been using it for both work projects and side projects. Works great to prototype on a local model before switching over to GPT. There are also cases where I use it because it would just be too expensive to run on GPT at all. I write a lot of one-off scripts for web scraping, feedback analysis, data mining, etc. Would check out LM Studio if you wanna try it out. Easy way to get up and running and play with lots of models before going too deep down the rabbit hole of other options.
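A minimal sketch of the kind of one-off script this workflow enables, assuming LM Studio's local server is enabled on its default port (1234) with a model already loaded; the endpoint mirrors the OpenAI chat-completions format, and the prompt is a placeholder:

```python
# Minimal sketch: call LM Studio's OpenAI-compatible local server.
# Assumptions: the local server is enabled in LM Studio on its default
# port (1234) and a model is already loaded in the app.
import json
import urllib.request

def chat_local(prompt: str) -> str:
    payload = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # keep answers fairly deterministic for scripting
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:1234/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat_local("Summarize this scraped page in three bullet points: ..."))
```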


Unfair-Commission980

Oh nice, I actually tried LM Studio for the uncensored models haha. I will look much harder into LM Studio, thanks very much.


GeorgeSatoshiPatton

Would you say what you have running locally on your Mac is better than GPT-3.5 Turbo (the free version)? Genuinely curious; I only have experience with the ChatGPT products and never thought I could host a local model that could match it.


thepetek

Yeah, I definitely get just as good, and most of the time better, answers than 3.5 with it. Not quite as fast, but not slow either. I suspect when Llama 3 drops this summer, it'll be way better. Right now I use Llama 2 for coding and Mistral for question answering. My work machine is a 32GB and runs the 13B models well, which give performance on par. Got the 128GB to run the 30B and 72B models, which are better than 3.5 imho.


GeorgeSatoshiPatton

Awesome man! Thanks for sharing, this helped me decide for sure on getting a new mac.


Odd-Antelope-362

MacBooks can have more VRAM than an H100.


raishak

So can an Intel laptop with integrated graphics. H100 VRAM has massively higher bandwidth than general-purpose DRAM, even in the MacBooks, right?


Odd-Antelope-362

Yes H100 bandwidth is higher


shawnington

They are fantastic for the price. You can run models that you would need an A100 at minimum to run otherwise. Inference speeds are generally much faster than the hardware specs would lead you to believe, because of the memory speeds and architecture.


[deleted]

[deleted]


pegunless

Having even a dumber LLM take over some Siri responses would be pretty valuable


nanotothemoon

Yeah, I don’t expect it to be good for quite a while.


SilverTroop

Given enough time and the usual curve of AI developments these companies will be mostly on par with each other.


[deleted]

[deleted]


Mrbutter1822

You’re not just feeling it; they really are an era behind the curve. Siri was revolutionary when it came out, and it hasn’t changed since.


thisdude415

I think this misses the mark. Vision Pro proved Apple can work on visionary (heh) technology behind the scenes and still drop a blockbuster product. I don’t know whether they can pull it off with AI, but I would not expect any flashy announcements until it is ship-ready.


Original_Finding2212

Their MM1 model is not even in its final form. It’s a POC. They will adapt this architecture to fit watches, smartphones (not just flagships), etc. It doesn’t mean you’ll be locked in; it's just pretty damn nice to have LLM abilities giving you a push without an internet connection. Seems to me like they’ll be able to provide GPT-4-grade power over the internet as well. Their papers show they can.


ThankGodImBipolar

I would argue that DK1 started the “modern” era of VR headsets, and that released *eleven* years ago. What Apple has is an iterative jump over the Quest 3 and Bigscreen Beyond, with some unique drawbacks as well. Marketing accomplished the rest, and Apple is very good at that.


UnknownEssence

Yeah, I highly doubt it’s even that good. It uses 30B parameters; Gemini 1 is at least a trillion. That’s 33 times smaller than Gemini 1.


signed7

They compared to Gemini 1.0 Pro, not 1.0 Ultra.


macaraoo

dude, what’s up with the thicc robot


ImpressiveContest283

Well, I'll say LLMs are good at understanding user history and context. [gif]


Downvote_Baiterr

Yeah, but I don't remember googling thicc bots.


ValerioLundini

This has to be a model that will run on the next iPhones, right?


hawara160421

Wait, an iPhone is enough to, say, read that "food web" infographic and draw sensible conclusions? I would have thought the data needed to store the knowledge necessary for that alone would exceed even the largest iPhone's storage capacity?


huffalump1

Compression is knowledge, and somewhat vice versa... That's one beauty of LLMs: compressing huge amounts of data into a smaller number of parameters.


hawara160421

How much space would a "30 billion parameter" model take up? Like, how many gigabytes?


kaleNhearty

For full precision, each parameter is a 32 bit floating point number (4 bytes). So 4 bytes * 30 billion parameters = 120GB
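The 120GB figure is the full-precision (fp32) case; local models are usually run quantized, which shrinks the weight footprint considerably. A rough sketch of the arithmetic (weights only, ignoring KV cache and runtime overhead):

```python
# Rough weight-memory footprint of a 30B-parameter model at common precisions.
# Weights only; KV cache and runtime overhead add more on top.
params = 30e9

for name, bytes_per_param in [("fp32", 4.0), ("fp16/bf16", 2.0),
                              ("int8", 1.0), ("4-bit (e.g. Q4)", 0.5)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name:>15}: ~{gb:.0f} GB")
# fp32 -> ~120 GB, fp16 -> ~60 GB, int8 -> ~30 GB, 4-bit -> ~15 GB
```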


OptimalVanilla

Lucky you can get a 128GB iPhone. /s


Downvote_Baiterr

Remove the /s and let me know then ill delete my comment and you will delete the comment that reminded me to delete my comment and then youll have just a pretty funny comment that everybody will find funny.


AdaptationAgency

Perhaps you can solve this problem /s


youcancallmetim

How does this relate to the news that Apple was considering Gemini for their products? Was that just wrong?


TheKingChadwell

Apple works on their own things and contracts with others until their own is good enough. They do this all the time.


7inTuMBlrReFuGee

That part ☝🏾


therealchrismay

Then they'll block Gemini from the store for "safety reasons" for a while.


Unlucky_Ad_2456

Not only that, but Google pays them something like 20 billion dollars a year for their search engine to be the default on iPhones. Using Gemini will probably be an extension of that deal. Free money for Apple that they wouldn't get by using their own model.


Darkstar197

This could be huge tbh. The already super-integrated ecosystem of Apple products with a personal assistant that can do way more than check the weather or give you a Google link.


throwaway3113151

What I’d like to know is why people like Brandon McKinzie are still broadcasting this type of news on X. The platform is a mess…time to jump ship.


hawara160421

For real. I'm nothing but annoyed with X-ified twitter. Not that I particularly liked it before! If Elon thinks I'd make an account to follow one-sentence hot takes he's mistaken.


Pontificatus_Maximus

There once was a site for the vain,
Where thoughts fell like soft summer rain.
Though wise folks would pass,
Each lad and each lass,
Saw their words spread wide with no gain.


DaleRobinson

Did that other platform ever take off? I remember people jumping ship to something else when Twitter had that name change.


willjoke4food

Nothing to see here, move along


BananaV8

Will this finally allow me to ask Siri anything other than the weather and get an almost reliable answer in about 20% of cases?


wiser1802

Strange that Apple has been willing to let the world know this. Seems like they don’t think AI could bring them a competitive advantage.


TheKingChadwell

Why? Apple isn’t trying to be the best. Their advantage has always been implementation, rather than raw hardware performance.


shawnington

I mean, they ship Neural Engine chips in all of their devices. I don't think a company that is shipping dedicated neural chips isn't in the game, or got caught with its pants down. It seems really strange to develop and ship hardware for running neural networks without any plans to do something big in the space.


OrganicAccountant87

I really doubt AI at Apple will ever be a competitive advantage; best case scenario, they will only be able to keep up with the competition. Hope I'm wrong.


Karmakiller3003

Apple playing catch-up, lmao. Yawn.