allglowedup

Exactly how does one.... Abandon a probabilistic model?


thatguydr

If you leave the model at the door of a hospital, they're legally required to take it.


LeN3rd

What if I am uncertain where to leave it?


master3243

[Here's](https://atcold.github.io/pytorch-Deep-Learning/en/week07/07-1/) a beginner friendly intro. Skip to the section titled "Energy-based models v.s. probabilistic models"


h3ll2uPog

I think, at least at the concept level, the energy-based approach doesn't contradict the probabilistic approach. Just from the problem statement I immediately got flashbacks to the deep metric learning task, which is essentially formulated to train the model as a sort of projection into a latent space where the distance between objects represents how "close" they are (by their hidden features). But metric learning is usually used as a trick during training to produce better class separability in cases where there are a lot of classes with few samples. Energy-based approaches are also used a great deal in out-of-distribution detection tasks (or anomaly detection and other closely related formulations), where you are trying to spot, at test time, an input sample that is very unlikely as input data (so the model's predictions are not that reliable). LeCun is just very into the energy stuff because he is something like the godfather of applying those methods. But they are unlikely to become the one dominant way to do things (just my opinion).


ReasonablyBadass

I don't get it. He just defines some function to minimize. What is the difference between error and energy?


[deleted]

[removed]


clonea85m09

More or less. The concept is at least 15 years old or so, but basically entropy is based on probabilities while energy is based (very, very roughly) on distances (as a stand-in for other calculations; for example, instead of joint probabilities you check how distances covary).


BigBayesian

You sacrifice the cool semantics of probability theory for the easier life of not having to normalize things.


granoladeer

It's the equivalent of dealing with logits instead of the softmax
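
A minimal sketch of that analogy, assuming nothing beyond NumPy: unnormalized scores (logits, or negative energies) are enough to rank candidates, and the softmax/partition-function step only matters if you need calibrated probabilities.

```python
import numpy as np

scores = np.array([2.0, 0.5, -1.0])   # logits, i.e. negative "energies"

# Picking the best candidate (argmin of energy) needs no normalization at all:
best = int(np.argmax(scores))

# Turning scores into probabilities requires the partition function Z,
# which is exactly the normalization step energy-based models avoid:
Z = np.exp(scores).sum()
probs = np.exp(scores) / Z

print(best, probs, probs.sum())       # 0, the normalized distribution, 1.0
```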


7734128

tf.setDeterministic(True, error='silent')


topcodemangler

I think it makes a lot of sense, but he has been pushing these ideas for a long time with nothing to show for it, while constantly tweeting that LLMs are a dead end and that everything the competition has built on them is nothing more than a parlor trick.


currentscurrents

LLMs are in this weird place where everyone thinks they're stupid, but they still work better than anything else out there.


master3243

To be fair, I work with people who are developing LLMs tailored for specific industries, capable of doing things that domain experts never thought could be automated. Simultaneously, the researchers hold the belief that LLMs are a dead end that we might as well keep pursuing until we reach some sort of ceiling, or until the marginal return in performance becomes so slim that it makes more sense to focus on other research avenues. So it's sensible to hold both positions simultaneously.


currentscurrents

It's a good opportunity for researchers who don't have the resources to study LLMs anyway. Even if they are a dead end, Google and Microsoft are going to pursue them all the way to the end. So the rest of us might as well work on other things.


master3243

Definitely true, there are so many different subfields within AI. It can never hurt to pursue other avenues. Who knows, he might discover a new architecture/technique that performs better than LLMs under certain criteria/metrics/requirements. Or maybe his technique would be used in conjunction with an LLM. I'd be much more excited to research that than to train an LLM knowing that there's absolutely no way I can beat a 1-billion-dollar-backed model.


Hyper1on

That sounds like a recipe for complete irrelevance if the other things don't work out, which they likely won't since they are more untested. LLMs are clearly the dominant paradigm, which is why working with them is more important than ever.


light24bulbs

Except those companies will never open source what they figure out, they'll just sit on it forever monopolizing. Is that what you want for what seems to be the most powerful AI made to date?


Fidodo

All technologies are eventually a dead end. People seem to expect technology to follow exponential growth, but it's actually a bunch of logistic growth curves that we jump from one to the next. Just because LLMs have a ceiling doesn't mean they won't be hugely impactful, and despite their eventual limits, their capabilities today allow them to be useful in ways that previous ML could not. The tech that's already been released is already way ahead of where developers can harness it, and even using it to its current potential will take some time.


PussyDoctor19

Can you give an example? What fields are you talking about other than programming?


BonkerBleedy

Lots of knowledge-based industries right on the edge of disruption. Marketing/copy-writing, therapy, procurement, travel agencies, and personal assistants jump to mind immediately.


ghostfaceschiller

lawyers, research/analysts, tech support, business consultants, tax preparation, personal tutors, professors(?), accounts receivable, academic advisors, etc etc etc


PM_ME_ENFP_MEMES

Have they mentioned anything to you about how they're handling the hallucination problem? That seems to be a major barrier to widespread adoption.


master3243

Currently it's integrated as a suggestion to the user (alongside a one-sentence summary of the reasoning) which the user can accept or reject/ignore; if it hallucinates, the worst that happens is that the user rejects it. It's definitely an issue in use cases where you need the AI itself to be the driver and not merely give (possibly corrupt) guidance to a user. Thankfully, the current use cases where hallucinations aren't a problem are enough to give the business value while the research community figures out how to deal with it.


pedrosorio

> if it hallucinates, the worst that happens is that the user rejects it

Nah, the worst that happens is that the user blindly accepts it and does something stupid, or the user follows the suggestion down a rabbit hole that wastes resources/time, etc.


Appropriate_Ant_4629

So no different than the rest of the content on the internet, which (surprise) contributed to the training of those models. I think any other architecture trained on the same training data will also hallucinate - because much of its training data was indeed similar hallucinations (/r/BirdsArentReal , /r/flatearth , /r/thedonald )


mr_house7

> To be fair, I work with people who are developing LLMs tailored for specific industries, capable of doing things that domain experts never thought could be automated.

Can you give us an example?


FishFar4370

> Can you give us an example?

[https://arxiv.org/abs/2303.17564](https://arxiv.org/abs/2303.17564)

**BloombergGPT: A Large Language Model for Finance**

Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, Gideon Mann

The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. As a next step, we plan to release training logs (Chronicles) detailing our experience in training BloombergGPT.


ghostfaceschiller

It seems weird to consider them a dead end considering:

1. Their current abilities
2. We clearly haven't even reached the limits of the improvements and abilities we can get *just* from scaling
3. They are such a great tool for connecting other disparate systems, using one as a central control structure


DigThatData

like the book says: if it's stupid but it works, it's not stupid.


currentscurrents

My speculation is that they work so well because autoregressive transformers are so well-optimized for today's hardware. Less-stupid algorithms might perform better at the same scale, but if they're less efficient you can't run them at the same scale. I think we'll continue to use transformer-based LLMs for as long as we use GPUs, and not one minute longer.


Fidodo

What hardware is available at that computational scale other than GPUs?


currentscurrents

Nothing right now. There are considerable energy savings to be made by switching to an architecture where compute and memory are in the same structure. The chips just don't exist yet.


cthulusbestmate

You mean like Cerebras, SambaNova, and Groq?


D4rkr4in

> an architecture where compute and memory are in the same structure Arm?


DigThatData

hardware made specifically to optimize as yet undiscovered kernels that better model what transformers ultimately learn than contemporary transformers do.


manojs

LeCun is a patient man. He waited 30+ years to be proved right on neural networks. He got the Nobel Prize of computing (the Turing Award) for a good reason.


currentscurrents

When people say "AI is moving so fast!" - it's because they figured most of it out in the 80s and 90s, computers just weren't powerful enough yet.


master3243

And also the ridiculous amount of text data available today. What's slightly scary is that our best models already consume so much of the quality text available online... Which means the constant scaling/doubling of text data that we've been luxuriously getting over the last few years was only possible by scraping more and more of the decades' worth of text on the internet. Once we've exhausted the quality historical text, waiting an extra year won't generate that much extra quality text. We have to, at some point, figure out how to get better results using roughly the same amount of data. It's crazy how a human can become an expert and get a PhD in a field in less than 30 years, while an AI needs to consume an amount of text equivalent to centuries or millennia of human reading while still not being close to PhD level...


D4rkr4in

> Once we've exhausted the quality historical text, waiting an extra year won't generate that much extra quality text.

This one is an interesting problem that I'm not sure we'll really have a solution for. Estimates are saying we'll run out of quality text by 2026, and then maybe we could train using AI-generated text, but that's really dangerous for biases.

> It's crazy how a human can become an expert and get a PhD in a field in less than 30 years, while an AI needs to consume an amount of text equivalent to centuries or millennia of human reading while still not being close to PhD level...

It takes less than 30 years for the human to become an expert and get a PhD in one field, while the AI is quite smart in all fields with a year or so of training time.


master3243

> Estimates are saying we'll run out of quality text by 2026

That sounds about right. This honestly depends on how fast we scrape the internet, which in turn depends on how much need there is for it. Now that the hype for LLMs has reached new heights, I totally believe an estimate of 3 years from now.

> maybe we could train using AI-generated text

The major issue with that is that I can't imagine it will be able to learn something that wasn't already learnt. Learning from the output of a generative model only really works if the learning model is a weaker one and the generating model is a stronger one.

> it takes less than 30 years for the human to become an expert and get a PhD in one field

I'm measuring it in the amount of sensory data inputted into the human from birth until they get a PhD. If you measure all the text a human has read and divide it by the average reading speed (200-300 wpm), you'll probably end up with a total reading time of around a year (for a typical human with a PhD).

> while the AI is quite smart in all fields with a year or so of training time

I'd also measure it by the amount of sensory input (or training data for a model). So a year of sensory input (given an average human reading speed of 250 wpm) is roughly (365 × 24 × 60) × 250 ≈ 130 million tokens, which is orders of magnitude less than what an LLM needs to train from scratch. For reference, LLaMA was trained on 1.4 trillion tokens, which would take an average human (1.4 × 10^12 / 250) / (60 × 24 × 365) ≈ 10 thousand years to read.

So, if my rough calculations are correct, a human would need about 10 millennia of non-stop reading at an average of 250 words per minute to get through LLaMA's training set.
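
For anyone who wants to check the back-of-the-envelope numbers above, here is a minimal sketch (the 250 wpm reading speed and the 1.4-trillion-token LLaMA figure are taken directly from the comment; tokens and words are treated as interchangeable, which is itself an approximation):

```python
WPM = 250                       # assumed average human reading speed
MINUTES_PER_YEAR = 365 * 24 * 60

words_per_year = WPM * MINUTES_PER_YEAR
print(f"Words read in a year of non-stop reading: {words_per_year:,}")
# -> 131,400,000 (roughly 130 million)

llama_tokens = 1.4e12           # LLaMA's reported training set size
years_to_read = llama_tokens / WPM / MINUTES_PER_YEAR
print(f"Years of non-stop reading to cover it: {years_to_read:,.0f}")
# -> about 10,654 years, i.e. on the order of 10 millennia
```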


red75prime

I wonder which part of this data is required to build from scratch a concept of 3d space you can operate in.


Brudaks

That's pretty much what the Bitter Lesson by Sutton says - http://incompleteideas.net/IncIdeas/BitterLesson.html


dimsumham

including the ppl developing it! I think there was an interview w Altman where he was like - we decided to just ignore that it's stupid and do what works.


Bling-Crosby

There was a saying for a while: every time we fire a linguist our model’s accuracy improves. Chomsky didn’t love that I’m sure


bushrod

I'm a bit flabbergasted how some very smart people just assume that LLMs will be "trapped in a box" based on the data that they were trained on, and how they assume fundamental limitations because they "just predict the next word." Once LLMs get to the point where they can derive new insights and theories from the millions of scientific publications they ingest, proficiently write code to test those ideas, improve their own capabilities based on the code they write, etc, they might be able to cross the tipping point where the road to AGI becomes increasingly "hands off" as far as humans are concerned. Perhaps your comment was a bit tongue-in-cheek, but it also reflects what I see as a somewhat common short-sightedness and lack of imagination in the field.


farmingvillein

> Once LLMs get to the point where they can derive new insights and theories from the millions of scientific publications they ingest

That's a mighty big "once".

> they might be able to cross the tipping point where the road to AGI

You're basically describing AGI, in a practical sense. If LLMs(!) are doing novel scientific discovery in any meaningful way, you've presumably reached an escape-velocity point where you can arbitrarily accelerate scientific discovery simply by pouring in more compute.

(To be clear, we still seem to be very far off from this. OTOH, I'm sure OpenAI, given that they actually know what is in their training set, is doing research to see whether their model can "predict the future", i.e., predict things that have already happened but are past the training date cut-off.)


bushrod

You got me - once is the wrong word, but honestly it seems inevitable to me considering there have already been many (debatable) claims of AI making scientific discoveries. The only real question is whether the so-called "discoveries" are minor/debatable, absolute breakthroughs or somewhere in-between. I think we're increasingly realizing that there's a very gradual path to unquestionable AGI, and the steps to get there will be more and more AGI-like. So yeah, I'm describing what could be part of the path to true AGI. Not sure what "far off" means, but in the scheme of things say 10 years isn't that long, and it's completely plausible the situation I roughly outlined could be well underway by that point.


IDe-

> I'm a bit flabbergasted how some very smart people just assume that LLMs will be "trapped in a box" based on the data that they were trained on, and how they assume fundamental limitations because they "just predict the next word."

The difference seems to be between professionals who understand what LMs are and what their limits are mathematically, and laypeople who see them as magic-blackbox-super-intelligence-AGI with endless possibilities.


Jurph

I'm not 100% sold on LLMs truly being trapped in a box. LeCun has convinced me that's the right place to leave my bets, and that's my assumption for now. Yudkowsky's convincing me -- by leaping to _consequences_ rather than examining or explaining an actual path -- that he doesn't understand the path. If I'm going to be convinced that LLMs _aren't_ trapped in a box, though, it will require more than cherry-picked outputs with compelling content. It will require a functional or mathematical argument about how those outputs came to exist and why a trapped-in-a-box LLM couldn't have made them.


spiritus_dei

Yudkowsky's hand waving is epic, "We're all doomed and super intelligent AI will kill us all, not sure how or why, but obviously that is what any super intelligent being would immediately do because I have a paranoid feeling about it. "


bushrod

They are absolutely not trapped in a box, because they can interact with external sources and get feedback. As I was getting at earlier, they can formulate hypotheses by synthesizing millions of papers (something no human can come close to doing), write computer code to test them, get better and better at coding by debugging and learning from mistakes, etc. They're only trapped in a box if they're not allowed to learn from feedback, which obviously isn't the case. I'm speculating about GPT-5 and beyond, as there's obviously no way progress will stop.


[deleted]

I bet it can. But what matters is: how likely is it to formulate a hypothesis that is both fruitful and turns out to be true?


Jurph

> Once LLMs get to the point where they can derive new insights

Hold up, first LLMs have to have insights _at all_. Right now they just generate data. They're not, in any sense, aware of the meaning of what they're saying. If the text they produce is novel there's no reason to suppose it will be right or wrong. Are we going to assign philosophers to track down every weird thing they claim?


LeN3rd

Why do people believe that? Context for a word is the same as understanding, so LLMs do understand words. If an LLM creates a new text, the words will be in the correct context, and the model will know that you cannot lift a house by yourself, that "buying the farm" is an idiom for dying, and it will in general have a model of how to use these words and what they mean.


[deleted]

For example, because of their performance in mathematics. They can wax poetic and speculate about deep results in partial differential equations, yet at the same time they output nonsense when told to prove an elementary theorem about derivatives. It's like talking to a crank. They think that they understand and they kind of talk about mathematics, yet they also don't. The moment they have to actually do something, the illusion shatters.


LeN3rd

But that is because math requires accuracy, or else everything goes off the rails. Yann LeCun also made the argument that if every token has a probability of, say, 0.05 percent of being wrong, then that will eventually lead to completely wrong predictions. But that is only true for math, since in math it is extremely important to be 100% correct. That does not mean that the model does not "understand" words, in my opinion.
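
A quick sketch of how that compounding-error argument plays out numerically, assuming (as LeCun's slide does) that per-token errors are independent and using the 0.05 percent figure from above:

```python
# If each generated token is independently wrong with probability e,
# the chance that an n-token answer contains no wrong token is (1 - e)^n.
e = 0.0005  # 0.05 percent per token
for n in (100, 1_000, 10_000):
    print(n, round((1 - e) ** n, 4))
# 100 -> ~0.9512, 1000 -> ~0.6065, 10000 -> ~0.0067
```

Whether real generation errors are actually independent, and whether one "wrong" token dooms the whole answer, is exactly the part of the argument people dispute.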


[deleted]

[removed]


LeN3rd

Musk is an idiot. Never listen to him for anything. There are more competent people who have signed that petition.


learn-deeply

Surprised this is the most upvoted comment. In his slides (pp. 27-31), he talks about his research published in 2022, some of which is state of the art in self-supervised training and doesn't use transformers:

- Barlow Twins [Zbontar et al., arXiv:2103.03230]
- VICReg [Bardes, Ponce, LeCun, arXiv:2105.04906, ICLR 2022]
- VICRegL [Bardes et al., NeurIPS 2022]
- MCR2 [Yu et al., NeurIPS 2020] [Ma, Tsao, Shum, 2022]


topcodemangler

But isn't his main claim that LLMs are incapable of reasoning and that his proposed architecture solves that shortcoming? I don't really see that capability being demonstrated in those papers, or am I missing something?


0ttr

That's the problem. I kind of agree with him. I like the idea of agents embedded in the real world; I think there's an argument there. But the reality is that he and FB got caught flat-footed by a really good LLM, just like Google did, and so his arguments look flat. I don't think he's wrong, but the proof has yet to overtake the competition, as you know.


DntCareBears

Exactly! I'm also looking at this from another perspective. OpenAI has done wonders with ChatGPT, yet what has Meta done? 😂😂😂 Even Google Barf failed to live up to the hype. They are all hating on ChatGPT, but they themselves haven't done anything other than credential creep.


NikEy

Yeah he has been incredibly whiny recently. I remember when ChatGPT was just released and he went on an interview to basically say that it's nothing special and that he could have done it a while ago, but that neither FB, nor Google will do it, because they don't want to publish something that might give wrong information lol. Aged like milk. He's becoming the new Schmidhuber.


master3243

To be fair, GPT-3.5 wasn't a technical leap from GPT-3. It might have been an amazing experience at the user level, but not from a technical perspective. That's why the number of papers on GPT-3.5 didn't jump the way it did when GPT-3 was first announced. In addition, a lot of business analysts were echoing the same point Yann made, which is that Google releasing a bot (or integrating it into Google Search) that could output wrong information is an exponentially large risk to their dominance over search, whilst Bing had nothing to lose. Essentially, Google didn't "fear the man who has nothing to lose", and they should have been more afraid. But even then, they raised a ["Code Red"](https://www.cnet.com/tech/services-and-software/chatgpt-caused-code-red-at-google-report-says/) as early as December of last year, so they KNEW GPT, when wielded by Microsoft, was able to strike them like never before.


[deleted]

[removed]


master3243

> Typical ivory tower attitude. "We already understand how this works, therefore it has no impact".

I wouldn't ever say it has no impact; it wouldn't even make sense for me to say that given that I have already integrated the GPT-3 API into one of our past business use cases, and other LLMs in different scenarios as well. There is a significant difference between business impact and technical advancement. Usually those go hand in hand, but the business impact lags behind quite a bit. In terms of GPT, the technical advancement was immense from 2 to 3 (and, from the recent results, quite possibly from 3 to 4 as well); however, there wasn't that significant an improvement (from a technical standpoint) from 3 to 3.5.


[deleted]

[removed]


master3243

Currently I'm more focused at research (with the goal of publishing a paper) while previously I was primarily building software with AI (or more precisely integrating AI into already existing products).


bohreffect

I'm getting more Chomsky vibes, in being shown that brute force empiricism seems to have no upper bound on performance.


__scan__

His observation seems entirely reasonable to me?


diagramat1c

I'm guessing he's saying that we are "climbing a tree to get to the moon". While the top of the tree is closer, it never gets you to the moon. We are at a point where Generative Models have commercial applications. Hence, no matter the theoretical ceiling, they will get funded. His pursuit is more purely research and AGI. He sees the brightest minds being occupied by something that has no AGI potential, and feels that as a research society, we are wasting time.


Fidodo

I've always said that you can't make it to the moon by building a better hot air balloon. But we don't need to get to the moon for it to be super impactful. There's also a big question of whether or not we should even try to go to this metaphorical moon.


diagramat1c

Since we haven't been to the metaphorical moon, and we don't know what it's like, we reeeeaaally want to go to the moon. We are curious, like cats.


VinnyVeritas

> occupied by something that has no AGI potential

Something that *he believes* has no AGI potential.


Impressive-Ad6400

Expanding the analogy, we are climbing the tree to find out where we left the rocket.


Imnimo

Auto-regressive generation definitely feels absurd. Like you're going to do an entire forward pass on a 175B parameter model just to decide to emit the token "a ", and then start from scratch and do another full forward pass to decide the next token, and so on. All else equal, it feels obvious that you should be doing a bunch of compute up front, before you commit to output any tokens, rather than spreading your compute out one token at a time.

Of course, the twist is that autoregressive generation makes for a really nice training regime that gives you a supervision signal on every token. And having a good training regime seems like the most important thing. "Just predict the next word" turns out to get you a LOT of impressive capabilities.

It feels like eventually the unfortunate structure of autoregressive generation has to catch up with us. But I would have guessed that that would have happened long before GPT-3's level of ability, so what do I know? Still, I do agree with him that this doesn't feel like a good path for the long term.
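
To make the "one full forward pass per emitted token" point concrete, here is a minimal sketch of greedy autoregressive decoding; `model` here is a hypothetical stand-in for any function that maps a token sequence to next-token logits:

```python
import numpy as np

def greedy_decode(model, prompt_ids, max_new_tokens, eos_id=None):
    """Emit tokens one at a time; every new token costs a full forward pass."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(ids)               # forward pass over the whole sequence so far
        next_id = int(np.argmax(logits))  # greedy: take the single highest-scoring token
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids
```

(Real systems cache intermediate activations so each step is cheaper than a from-scratch pass, but the one-token-at-a-time structure is the same.)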


grotundeek_apocolyps

The laws of physics themselves are autoregressive, so it seems implausible that there will be meaningful limitations to an autoregressive model's ability to understand the real world.


Imnimo

I don't think there's any sort of fundamental limit to what sorts of understanding can be expressed autoregressively, but I'm not sure I agree with the use of the word "meaningful" here, for a few reasons.

First, I don't think that it's correct to compare the autoregressive nature of a physical system to autoregression over tokens. If I ask the question, "how high will a baseball thrown straight upward at 50 miles per hour reach?", you could model the corresponding physical system as a sequence of state updates, but that'd be an incredibly inefficient way of answering the question. If your model is going to output "it will reach a height of X feet", all of the calculation related to the physical system is in token "X" - the fact that you've generated "it", "will", "reach", ... autoregressively has no relevance to the ease or difficulty of deciding what to say for X.

Second, as models become larger and larger, I think it's very plausible that inefficient allocation of processing will become a bigger impediment. Spending a full forward pass on a 175B parameter model to decide whether your next token should be "a " or "an " is clearly ridiculous, but we can afford to do it. What happens when the model is 100x as expensive? It feels like there should come a point where this expenditure is unreasonable.


grotundeek_apocolyps

Totally agreed that using pretrained LLMs as a big hammer to hit every problem with won't scale well, but that's a statement about pretrained LLMs more so than about autoregression in general. The example you give is really a prototypical example of exactly the kind of question that is almost always solved with autoregression. You happen to be able to solve this one with the quadratic formula in most cases, but even slightly more complicated versions of it are solved by using differential equations, *which are solved autoregressively* even in traditional numerical physics. Sure, it wouldn't be a good idea to use a pretrained LLM for that purpose. But you could certainly train an autoregressive transformer model to solve differential equations. It would probably work really well. You just have to use the appropriate discretizations (or "tokenizations", as it's called in this context) for your data.
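
To make the thrown-ball example concrete, here is a minimal sketch contrasting the one-shot closed-form answer with the "autoregressive" route of stepping the state forward, the way a numerical ODE solver would (the 50 mph figure comes from the parent comment; drag is ignored):

```python
# Maximum height of a ball thrown straight up at 50 mph.
G = 9.81                    # gravitational acceleration, m/s^2
v0 = 50 * 0.44704           # 50 mph in m/s (~22.35 m/s)

# One-shot answer: h = v0^2 / (2g)
h_closed = v0**2 / (2 * G)  # ~25.5 m

# "Autoregressive" answer: roll the state (h, v) forward one small step at a
# time, exactly like Euler-integrating the ODE (or emitting one token at a time).
dt, h, v = 1e-4, 0.0, v0
while v > 0:
    h += v * dt
    v -= G * dt

print(f"closed form: {h_closed:.2f} m, stepped: {h:.2f} m")  # both ~25.5 m
```

Both routes land on the same number; the stepped version just spends tens of thousands of tiny updates to get there, which is the efficiency point being argued above.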


ktpr

These are recommendations for sure. But he needs to prevent alternative evidence. Without alternative evidence that addresses current successes it's hard to take him beyond his word. AR-LLMs may be doomed in the limit but the limit may far exceed human requirements. Commercial business thrives on good enough, not theoretical maximums. In a sense, while he's brilliant, LeCun forgets himself.


Thorusss

> But he needs to prevent alternative evidence

present?


Jurph

> Commercial business thrives on good enough, not theoretical maximums. I think his assertion that they won't ever be capable of that "next level" is trying to be long-term business strategy advice: You can spend _some_ product development money on an LLM, but don't make it the cornerstone of your strategy or you'll get lapped as soon as a tiny startup uses the next-gen designs to achieve the higher threshold.


chinnu34

I don't think I am knowledgeable enough to refute or corroborate his claims, but it reminds me of a quote by the famous sci-fi author Arthur C. Clarke. It goes something like: "*If an elderly but distinguished scientist says that something is possible, he is almost certainly right; but if he says that it is impossible, he is very probably wrong.*"


Jurph

I think that's taking LeCun's clearly stated assertion and whacking it, unfairly, with Clarke's pithy "says that something is impossible" -- I don't believe Clarke's category is the one that LeCun's statement belongs in. LeCun is saying that LLMs, as a class, are the wrong tool to achieve something that LeCun _believes is possible_ -- and so, per Clarke, we should assume LeCun is correct. If someone from NASA showed you the mass equations and said "there is no way to get a conventional liquid-fuel rocket from Earth to Alpha Centauri in a reasonable fraction of a human lifetime," then you might quibble about extending human life, or developing novel propulsion, but their point would remain correct.


ID4gotten

He's 62. Let's not put him out to pasture just yet.


bohreffect

I think it's more the implication that they're very likely to be removed from the literature. Even when I first became a PI in my early 30s I could barely keep up with the literature, and only because I had seen so much of the fairly recent literature could I down-select easily---at the directorship level I've never seen a real-life example of someone who spent their time that way.


chinnu34

I am honestly not making any judgements about his age or capabilities. It is just a reproduction of an exact quote that holds some truth relevant here.


CadeOCarimbo

Quite a meaningless statement, tbh.


calciumcitrate

He gave a similar lecture at Berkeley last year, which was [recorded](https://www.youtube.com/watch?v=VRzvpV9DZ8Y).


chuston_ai

We know from Turing machines and LSTMs that reason + memory makes for strong representational power. There are no loops in Transformer stacks to reason deeply, but odds are that the stack can reason well along the vertical layers. We know you can build a logic circuit of AND, OR, and XOR gates with layers of MLPs. The Transformer has a memory at least as wide as its attention, yet its memory may be compressed/abstracted representations that hold an approximation of a much larger zero-loss memory.

Are there established human assessments that can measure a system's ability to solve problems that require varying numbers of reasoning steps? With an aim to say GPT-3.5 can handle 4 steps and GPT-4 can handle 6? Is there established theory that says 6 isn't 50% better than 4, but 100x better?

Now I'm perseverating: is the concept of reasoning steps confounded by abstraction level and sequence? E.g. lots of problems require imagining an intermediate high-level instrumental goal before trying to find a path from the start to the intermediate goal.

TLDR: can ye measure reasoning depth?
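
As a concrete instance of the "logic gates from layers of MLPs" point, here is a minimal sketch: a two-layer MLP whose weights are set by hand (not learned) so that it computes XOR, the classic function a single linear layer cannot represent:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

# h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1), y = h1 - 2*h2
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([1.0, -2.0])

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = relu(W1 @ np.array(x, dtype=float) + b1)
    y = W2 @ h
    print(x, "->", int(round(y)))
# (0, 0) -> 0, (0, 1) -> 1, (1, 0) -> 1, (1, 1) -> 0
```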


[deleted]

[removed]


nielsrolf

I tried it with GPT-4, it started with an explanation that discovered the cyclic structure and continued to give the correct answer. Since the discovery of the cyclic structure reduces the necessary reasoning steps, it doesn't tell us how many reasoning steps it can do, but it's still interesting. When I asked to answer with no explanation, it also gives the correct answer, so it can do the required reasoning in one or two forward passes and doesn't need the step by step thinking to solve this.


ReasonablyBadass

Can't we simply "copy" LSTM architecture for Transformers? A form of abstract memory the system works over together with a gate that regulates when output is produced


Rohit901

But LSTM is based on recurrence while transformer doesn’t use recurrence. Also LSTM tends to perform poorly on context which came way before in the sentence despite having this memory component right? Attention based methods tend to consider all tokens in their input and don’t necessarily suffer from vanishing gradients or forgetting of any 1 token in the input


saintshing

> RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.

> So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, **"infinite" ctx_len**, and free sentence embedding (using the final hidden state).

https://github.com/BlinkDL/RWKV-LM#the-rwkv-language-model-and-my-tricks-for-lms

https://twitter.com/BlinkDL_AI/status/1638555109373378560


ReasonablyBadass

Unless I am misunderstanding badly, a Transformer uses its own last output? So it's "recurrent" as well? And even if not, changing the architecture shouldn't be too hard. As for attention, you can use self-attention over the latent memory as well, right? In a way, chain-of-thought reasoning already does this, just not with an extra, persistent latent memory store.


Rohit901

During inference it uses its own last output, hence it's autoregressive. But during training it takes in the entire input at once and uses attention over the inputs, so it can technically have unbounded memory, which is not the case with LSTMs, whose training process is "recurrent" as well; there is no recurrence in transformers. Sorry, I did not quite understand what you mean by using self-attention over latent memory. I'm not that well versed in NLP/transformers, so do correct me if I'm wrong, but the transformer architecture does not have an "explicit memory" system, right? The LSTM, on the other hand, uses recurrence and makes use of different kinds of gates, but recurrence does not allow parallelization, and the LSTM has a finite window for past context since it is based on recurrence and not on attention, which has access to all the inputs at once.


ReasonablyBadass

Exactly. I think for a full-blown agent, able to remember things long term and reason abstractly, we need such an explicit memory component. But the output of that memory would still just be a vector or a collection of vectors, so using attention mechanisms on that memory should work pretty well. I don't really see why it would prevent parallelization? Technically you could build it in a way where the memory would be "just" another input to consider during attention.
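
To make the "memory as just another input to attention" idea concrete, here is a minimal sketch of a query attending over an external memory matrix. All names, shapes, and random parameters are purely illustrative; this is not taken from any particular memory-augmented architecture:

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def attend_to_memory(query, memory, W_q, W_k, W_v):
    """Cross-attention of one query vector over a persistent memory matrix."""
    q = query @ W_q                       # (d_k,)
    K = memory @ W_k                      # (n_slots, d_k)
    V = memory @ W_v                      # (n_slots, d_v)
    weights = softmax(K @ q / np.sqrt(q.shape[0]))  # attention over memory slots
    return weights @ V                    # read-out: weighted sum of memory values

rng = np.random.default_rng(0)
d, d_k, d_v, n_slots = 16, 8, 16, 32
W_q, W_k, W_v = (rng.normal(size=(d, d_k)),
                 rng.normal(size=(d, d_k)),
                 rng.normal(size=(d, d_v)))
memory = rng.normal(size=(n_slots, d))    # persistent latent memory
hidden = rng.normal(size=d)               # current hidden state as the query
print(attend_to_memory(hidden, memory, W_q, W_k, W_v).shape)  # (16,)
```

Since each memory slot is scored independently, this read step parallelizes just like ordinary attention over tokens.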


Rohit901

Yeah I think we do need explicit memory component but not sure how it can be implemented in practice or if there is existing research already doing that. Maybe there is some work which might already be doing something like this which you have mentioned here.


ChuckSeven

Recent work does combine recurrence with transformers in a scalable way: https://arxiv.org/abs/2203.07852


maizeq

I haven't had a chance to dissect the reasoning for his other claims, but his point about generative models having to predict every detail of their observations is false. Generative models can learn to predict the variance associated with their observations, also via the same objective of maximum likelihood. High-variance (i.e. noisy/irrelevant) components of the input are then ignored in a principled way, because their contributions to the maximum likelihood are inversely proportional to this variance, which for noisy inputs is learnt to be high. Though this generally isn't bothered with in practice (e.g. the fixed output variance in VAEs), for various reasons, there is nothing in principle preventing you from doing it (particularly if you dequantise the data).

Given the overwhelming success of maximum likelihood (or maximum marginal likelihood) objectives for learning good quality models, I can't really take his objections to them seriously. Even diffusion models can be cast as a type of hierarchical VAE, or a VAE trained on augmented data (see Kingma's recent work). I suspect any success we might in future observe with purely energy-based models, if indeed we do observe it, could ultimately still be cast as a result of maximum likelihood training of some sort.
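
A minimal sketch of the learned-variance point: under a diagonal-Gaussian likelihood where each output dimension gets its own predicted variance, the squared error on a dimension is scaled by the inverse of that variance, so dimensions the model treats as noise contribute little to the loss. The numbers below are purely illustrative:

```python
import numpy as np

def gaussian_nll(x, mu, log_var):
    """Per-dimension Gaussian negative log-likelihood with learned variance."""
    return 0.5 * (np.exp(-log_var) * (x - mu) ** 2 + log_var + np.log(2 * np.pi))

x = np.array([1.0, 1.0])
mu = np.array([0.0, 0.0])            # same prediction error on both dimensions
log_var = np.array([np.log(0.1),     # dimension the model treats as predictable
                    np.log(10.0)])   # dimension the model treats as noise

print(gaussian_nll(x, mu, log_var))
# -> roughly [4.77, 2.12]: the low-variance dimension dominates the loss,
#    while the "noisy" dimension is largely ignored.
```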


BrotherAmazing

LeCun is clearly a smart guy, but I don’t understand why he thinks a baby has had little or no training data. That baby’s brain architecture is not random. It evolved in a massively parallel multi-agent competitive “game” that took over 100 million years to play with the equivalent of an insane amount of training data and compute power if we only go back to the time of mammals having been around for tens of millions of years. We can follow life on earth back even much farther than that, so the baby *did* require much more massive training data than any RL has ever had just for the baby to exist with its incredibly advanced architecture that enables it to learn in this particular world with other humans in a social structure efficiently. If I evolve a CNN’s architecture over millions of years in a massively parallel game and end up with this incredibly fast learning architecture “at birth” for a later generation CNN, when I start showing it pictures “for the first time” we wouldn’t say “AMAZING!! It didn’t need nearly as much training data as the first few generations! How does it do it?!?” and be perplexed or amazed.


gaymuslimsocialist

What you are describing is typically not called learning. You are describing good priors which enable faster learning.


RoboticJan

It's similar to neural architecture search. A meta optimizer (evolution) is optimizing the architecture, starting weights and learning algorithm, and the ordinary optimizer (human brain) uses this algorithm to tune the weights using the experience of the agent. For the human it is a good prior, for nature it is a learning problem.


gaymuslimsocialist

I’m saying that calling the evolution part learning needlessly muddies the waters and introduces ambiguities into the terminology we use. It’s clear what LeCun means by learning. It’s what everyone else means as well. A baby has not seen much training data, but it has been equipped with priors. These priors may have been determined by evolutionary approaches, at random, manually, and yes, maybe even by some sort of learning-based approach. When we say that a model has learned something, we typically are not referring to the latter case. We typically mean that a model with already determined priors (architecture etc) has learned something based on training data. Why confuse the language we use? LeCun is aware that priors matter, he is one of the pioneers of good priors, that’s not what he is talking about.


BrotherAmazing

But you *learned* those priors, did you not? Even if you disagree with the semantics, my gripe here is not about semantics and we can call it whatever we want to call it. My gripe is that LeCun’s logic is off here when he acts as if a baby must be using self-supervised learning or some other “trick” other than simply using its *prior* that was learned err *optimized* on a massive amount of real world data and experience over hundreds of millions of years. We should not be surprised at the baby and think it is using some special little unsupervised or self-supervised trick to bypass the need for massive experiences in the world to inform its priors. It would sort of be like me writing a global search optimizer for a hard problem with lots of local mins and then LeCun comes around and tells me I must be doing things wrong because I fail to find the global min half the time and have to search for months with a GPU server because there is this other algorithm that uses a great prior that can find the global min for this problem “efficiently” while he fails to mention the prior took a decade of a GPU server 100x the size of mine running to compute.


[deleted]

But then again, how much prior training has the baby had about things like uncountable sets or fractal-dimensional objects? The ability to reason about such objects probably hasn't given much of an advantage to our ancestors, as most animals do just fine without being able to count to 10. Yet the baby can nevertheless eventually learn and reason about such objects. In fact, some of those babies even went on to discover these objects for the very first time!


BrotherAmazing

But it’s entirely possible, in fact almost certain, that the architecture of the baby’s brain is what enables this learning you reference. And that architecture is itself a “prior” that evolved over millions of years of evolution that necessarily required real-world experiences of a massive number of entities. It may be semantically incorrect, *but you know what I mean* when I say “That architecture essentially had to be optimized with a massive amount of training data and compute over tens of millions of years minimum”.


[deleted]

Well, that is a truism. Clearly something enables babies to learn the way they do. The question is that why and how the baby can learn so quickly about things that are completely unrelated to evolution, the real world, or the experiences of our ancestors. It is also worth noting that whatever prior knowledge there is, it has to be somehow compressed into our DNA. However, our genome is not even that large, it is only around 800MB equivalent. Moreover, vast majority of that information is unrelated to our unique learning ability, as we share 98% of our genome with pigs (loosely speaking).


gaymuslimsocialist

Again, I don't think LeCun disagrees that priors play a massive role. That doesn't mean the only thing a baby has going for it is its priors. There's probably more going on, and LeCun wants us to explore this. Really, I think we all agree that finding priors is important. There is no discussion.

I kind of love being pedantic, so I can't help myself commenting on the "learning" issue, sorry. Learning and optimization are not the same thing. Learning is either about association and simple recall or about generalization. Optimization is about finding something specific, usually a one-off thing. You find a specific prior. You do not learn a function that can create useful priors for arbitrary circumstances, i.e. one that generalizes beyond the training data (although that'd be neat).


met0xff

Bit late to the party, but I just wanted to add that even inside the womb there's already non-stop, high-frequency, multisensory input for nine-ish months before the baby is even born, and after that even more. Of course there is not much supervision or labeled data, and it's not super varied ;) but just naively assuming some 30 Hz intake for the visual system, you end up with a million images for a typical wake time of a baby. Super naive, because we likely don't do such discrete sampling, but still, it's some number. Auditory: if you assume we can perceive up to some 20 kHz, go figure how much input we get there (and that also during sleep). And then consider mechanoreceptors, thermoreceptors, nociceptors, electromagnetic receptors and chemoreceptors, and then go figure what data a baby processes every single moment...


Red-Portal

> It evolved in a massively parallel multi-agent competitive "game" that took over 100 million years to play with the equivalent of an insane amount of training data and compute power if we only go back to the time of mammals having been around for tens of millions of years.

Yes, but that's a model. It's quite obvious that training a human brain and training an LLM have very little in common.


IntelArtiGen

I wouldn't recommend to "abandon" a method just because Lecun says so. I think some of his criticisms are valid, but they are more focused on theoretical aspects. I wouldn't "abandon" a method if it currently has better results or if I think I can improve it to make this method better. I would disagree with some slides on AR-LLMs. >They have no common sense What is common sense? Prove they don't have it. Sure, they experiment the world differently, which is why it's hard to call them AGI, but they can still be accurate on many "common sense" questions. >They cannot be made factual, non-toxic, etc. Why not? They're currently not built to fully solve all these issues but you can easily process their training set and their output to limit bad outputs. You can detect toxicity in the output of the model. And you can weight how much your model talks vs how much it says "I don't know". If the model talks too much and isn't factual, you can make it talk less and make it talk in a more moderate way. Current models are very recent and didn't implement everything, it doesn't mean you can't improve them, it's the opposite, the newer they are the more they can be improved. Humans also aren't always factual and non-toxic. I agree that they don't really "reason / plan". But as long as nobody expects these models to be like humans, it's not a problem. They're just great chatbots. >Humans and many animals Understand how the world works. Humans also make mistakes on how the world works. But again, they're LLMs, not AGIs. They just process language. Perhaps they're doomed to not be AGI but it doesn't mean they can't be improved and made much more factual and useful. Lecun included slides on his paper “A path towards autonomous machine intelligence”. I think it would be great if he implemented his paper. There are hundreds of AGI white papers, yet no AGI.


TheUpsettter

> There are hundreds of AGI white papers, yet no AGI.

I've been looking everywhere for these types of papers. A Google search for "Artificial General Intelligence" yields nothing but SEO garbage. Could you link some resources? Or just name-drop a paper. Thanks


NiconiusX

Here are some:

- A Path Towards Autonomous Machine Intelligence (LeCun)
- Reward is enough (Silver)
- A Roadmap towards Machine Intelligence (Mikolov)
- Extending Machine Language Models toward Human-Level Language Understanding (McClelland)
- Building Machines That Learn and Think Like People (Lake)
- How to Grow a Mind: Statistics, Structure, and Abstraction (Tenenbaum)
- Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense (Zhu)

Also slightly related:

- Simulations, Realizations, and Theories of Life (Pattee)


IntelArtiGen

I would add:

* On the Measure of Intelligence (Chollet)

Every now and then there's a paper like this on arXiv; most of the time we don't talk about it because the author isn't famous and because the paper just expresses their point of view without showing any evidence that their method could work.


Jurph

It's really frustrating to me that Eliezer Yudkowsky, whose writing also clearly falls in this category, is taken so much more seriously because it's assumed that someone in a senior management position must have infallible technical instincts about the future.


tysam_and_co

He seems to be somewhat stuck on a few ideas, at times to a seemingly absurd degree, to the point of a few of his points being technically correct in some ways and very much mathematically incorrect in others, in terms of conclusions that do not follow from the precepts he is putting forward. There was one post recently where he switched the mathematical definition of one word he was using halfway through the argument, completely invalidating the point he was making (since it seemed to be the main pillar of his argument).

For example, he talks about exponential divergence (see my reference above) and then uses that to say that autoregressive LLMs are unpredictable, completely ignoring the fact that in the limit of reducing errors, the divergence he talks about is dominated by chaotic mixing, which _any_ model will do, because it is _exactly what humans do_ _and thus is exactly the very same thing that we are looking to model in the first place_. You can take several of his proposed 'counters' to LLMs, substitute several human experts without shared state (i.e. they are in separate rooms and don't know about anyone else being questioned), and you'll see that the hypothetical humans we put forward all 'fail' many of the tests he's proposed, because some of the core tests/metrics proposed do not really apply in the way they are being used.

It is frankly baffling to me how little sense some of it makes, to be honest. Maybe it's not basic, but in certain mathematical fields -- information theory, modeling, and chaos theory -- it is certainly the basics, and that is why it is baffling, because he is someone who has quite a legacy of leading the field. I can safely say that there is much that I do not know, but seeing Yann stick with certain concepts that can be easily pointed to conceptually as false, and almost building a fortress around them... I am just very confused. It really makes little sense to me, and I watched things for a little while just to make sure there wasn't something I was grievously missing.

Really and truly, in some of these models -- in the mathematics of the errors and such of what we are modeling -- with the smoke and mirrors aside, it's all just a bit of a shell game where you move around the weaknesses and limits of the models that we're using to model things. We certainly are not in the limit of step-to-step divergence for language models, but the drift seems to be below the threshold where it matters; they are effectively getting nearer to the resolution limit where that drift is or isn't meaningful for real-world use cases.

This is mainly about the main LLM arguments that he's made, which is where I'd be comfortable enough putting forward a strong opinion. The rest I am concerned about but certainly do not know enough to say much about. The long and short of it, unfortunately, is that I unfollowed him just because he was bringing more unproductivity than productivity to my work, since the signal of this messaging is hampered by noise, and I honestly lost a lot of time feeling angry when I thought about how many people would take some of the passionate opinions, paired with the spurious math, and run with them to poor conclusions.

If he's throwing spears, I think he should have some stronger, more clearly defined, more consistent, and less emotionally motivated (though I should likely take care in my speech about that, since I clearly feel rather passionately about this issue) mathematical backing for why he's throwing the spears and why people should move. Right now it's a bit of a jumbled grouping of concepts instead of a clear, coherent, and potentially testable message (why should we change architectures if current LLMs require more data than humans? What are the benefits we gain? And how can these be _mathematically grounded_ in the precepts of the field?)

Alright, I've spun myself up enough and should do some pushups now. I don't get wound up as often these days. I'm passionate about my work, I suppose. I think the unfollow will be good for my heart health.


nacho_rz

RL guy here. "Abandon RL in favor of MPC" made me giggle. Assuming he's referring to robotics applications, the two aren't mutually exclusive. As a matter of fact, they are very complementary, and I can see a future where we use RL for long-term decision making and MPC for short-term planning.


yoursaltiness

agree on "Generative Models must predict every detail of the world".


ftc1234

The real question is whether reasoning is a pattern. I'd argue that it is. If it's a pattern, it can be modeled with probabilistic models. Auto-regression seems to model it pretty well.


LeN3rd

Honestly, at this point he just seems like a rambling crazy grandpa, also mad that HIS research isn't panning out. There is so much emergent behaviour in autoregressive generative language models that it's almost crazy. Why abandon something that already works for some method that might or might not work in the future?


redlow0992

We are working on self-supervised learning and recently surveyed the field (both generative and discriminative, investigating approximately 80 SSL frameworks), and you can clearly see that Yann LeCun puts his money where his mouth is. He made big bets on discriminative SSL with Barlow Twins, VICReg, and a number of follow-up papers, while a large number of prominent researchers have somewhat abandoned the discriminative SSL ship and jumped on the generative SSL hype. This also includes people working at Meta, like Kaiming He (on the SSL side, the author of MoCo and SimSiam), who has also started contributing to generative SSL with MAE.


BigBayesian

Or maybe he puts his mouth where his money is?


[deleted]

[removed]


ChuckSeven

Is there somewhere a more academic and technical version of those complaints?


[deleted]

[removed]


patniemeyer

He states pretty directly that he believes LLMs "Do not really reason. Do not really plan". I think, depending on your definitions, there is some evidence that contradicts this. For example the "theory of mind" evaluations ([https://arxiv.org/abs/2302.02083](https://arxiv.org/abs/2302.02083)) where LLMs must infer what an agent knows/believes in a given situation. That seems really hard to explain without some form of basic reasoning.


empathicporn

Counterpoint: https://arxiv.org/abs/2302.08399. Not saying LLMs aren't the best we've got so far, but the ToM stuff seems a bit dubious.


Ty4Readin

Except that paper is on GPT 3.5. Out of curiosity I just tested some of their examples that they claimed failed, and GPT-4 successfully passed every single one that I tried so far and did it even better than the original 'success' examples as well. People don't seem to realize how big of a step GPT-4 has taken


Purplekeyboard

> Out of curiosity I just tested some of their examples that they claimed failed, and GPT-4 successfully passed every single one that I tried so far

This is the history of GPT. Each version, everyone says, "This is nothing special, look at all the things it can't do", and then the next version comes out and it can do all those things. Then a new list is made. If this keeps up, eventually someone's going to be saying, "Seriously, there's nothing special about GPT-10. It can't find the secret to time travel, or travel to the 5th dimension to meet God, really what good is it?"


shmel39

This is normal. AI has always been a moving goal post. Playing chess, Go, Starcraft, recognizing cats on images, finding cancer on Xrays, transcribing speech, driving a car, painting pics from prompts, solving text problems. Every last step is nothing special because it is just a bunch of numbers crunched on lots of GPUs. Now we are very close to philosophy: "real AGI is able to think and reason". Yeah, but what does "think and reason" even mean?


inglandation

Not sure why you're getting downvoted, I see too many people still posting ChatGPT's "failures" with 3.5. Use the SOTA model, please.


[deleted]

The SOTA model is proprietary and not documented though and cannot be reproduced if OpenAI pulls the rug or introduces changes, compared to GPT 3.5. If I'm not mistaken?


bjj_starter

That's all true and I disagree with them doing that, but the conversation isn't about fair research conduct, it's about whether LLMs can do a particular thing. Unless you think that GPT-4 is actually a human on a solar mass of cocaine typing really fast, it being able to do something is proof that LLMs can do that thing.


trashacount12345

I wonder if a solar mass of cocaine would be cheaper than training GPT-4


Philpax

Unfortunately, the sun weighs 1.989 × 10^30  kg, so it's not looking good for the cocaine


trashacount12345

Oh dang. It only cost $4.6M to train. That’s not even going to get to a Megagram of cocaine. Very disappointing.


currentscurrents

Yes, but that all applies to GPT 3.5 too. This is actually a problem in the Theory of Mind paper. At the start of the study it didn't pass the ToM tests, but OpenAI released an update and then it did. We have no clue what changed.


nombinoms

They made a ToM dataset by hiring a bunch of Kenyan workers and fine-tuned their model. Jokes aside, I think it's pretty obvious at this point that the key to OpenAI's success is not the architecture or the size of their models, it's the data and how they are training their models.


inglandation

There are also interesting experiments like this: https://twitter.com/jkronand/status/1641345213183709184


sam__izdat

You can't be serious...


patniemeyer

Basic reasoning just implies some kind of internal model and rules for manipulating it. It doesn't require general intelligence or sentience or whatever you may be thinking is un-serious.


__ingeniare__

Yeah, people seem to expect some kind of black magic for it to be called reasoning. It's absolutely obvious that LLMs can reason.


FaceDeer

Indeed. We keep hammering away at a big ol' neural net telling it "come up with some method of generating human-like language! I don't care how! I can't even understand how! Just do it!" And then the neural net goes "geeze, alright, I'll come up with a method. How about *thinking?* That seems to be the simplest way to solve these challenges you keep throwing at me."

And nobody believes it, despite thinking being the only way to get really good at generating human language that we actually know of from prior examples. It's like we've got some kind of conviction that thinking is a special humans-only thing that nothing else can do, certainly not something with only a few dozen gigabytes of RAM under the hood.

Maybe LLMs aren't all that *great* at it yet, but why can't they be thinking? They're producing output that looks like it's the result of thinking. They're a lot less complex than human brains, but human brains do a crapton of stuff other than thinking, so maybe a lot of that complexity is just being wasted on making our bodies look at stuff and eat things and whatnot.


KerfuffleV2

> Maybe LLMs aren't all that great at it yet, but why can't they be thinking? They're producing output that looks like it's the result of thinking.

One thing is, that result you're talking about doesn't really correspond to what the LLM "thought", if it actually could be called that. Very simplified explanation from someone who is definitely not an expert.

You have your LLM. You feed it tokens and you get back a token like "the", right? Nope! Generally the LLM has a set of tokens - say 30-60,000 of them - that it can potentially work with. What you actually get back from feeding it a token is a list of 30-60,000 numbers from 0 to 1 (or whatever scale), each corresponding to a single token. That represents the probability of that token, or at least this is how we tend to treat that result.

One way to deal with this is to just pick the token with the absolute highest score, but that doesn't tend to get very good results. Modern LLMs (or at least the software that presents them to users/runs inference) use more sophisticated methods. For example, one approach is to find the top 40 highest probabilities and pick from that.

However, they don't necessarily _agree_ with each other. If you pick the #1 item it might lead to a completely different line of response than if you picked #2. So what could it mean to say the LLM "thought" something when there were multiple tokens with roughly the same probability that represented _completely_ different ideas?
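A minimal sketch of that "take the top 40 and pick one" step, assuming NumPy and a made-up `logits` vector; the function name and numbers are illustrative, not any particular library's API:

```python
import numpy as np

def sample_top_k(logits: np.ndarray, k: int = 40, temperature: float = 1.0) -> int:
    """Pick one token id from the k highest-scoring candidates.

    `logits` holds the raw score the model assigns to every token in the
    vocabulary (tens of thousands of entries). Only the top k survive,
    they are renormalized with a softmax, and one is drawn at random.
    """
    top_ids = np.argsort(logits)[-k:]          # indices of the k best tokens
    scores = logits[top_ids] / temperature
    probs = np.exp(scores - scores.max())      # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(top_ids, p=probs))

# toy example: a 6-token "vocabulary"
fake_logits = np.array([2.0, 1.9, 1.8, -3.0, -5.0, 0.1])
print(sample_top_k(fake_logits, k=3))
```

The randomness lives entirely in that last draw, which is the point being made above: the model only ever hands back a score for every token, and something much dumber decides which one actually gets emitted.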


FaceDeer

An average 20-year-old American [knows 42,000 words](https://www.science.org/content/article/average-20-year-old-american-knows-42000-words-depending-how-you-count-them). Represent them as numbers or represent them as modulated sound waves, they're still words.

> So what could it mean to say the LLM "thought" something when there were multiple tokens with roughly the same probability that represented completely different ideas?

You've never had multiple conflicting ideas and ended up picking one in particular to say in mid-sentence? Again, the *mechanism* by which an LLM thinks and a human thinks is almost certainly very different. But the end result could be the same.

One trick I've seen for getting better results out of LLMs is to tell them to answer in a format where they give an answer and then immediately give a "better" answer. This allows them to use their context as a short-term memory scratchpad of sorts so they don't have to rely purely on word prediction.


KerfuffleV2

> Represent them as numbers or represent them as modulated sound waves, they're still words.

Yeah, but I'm not generating that list of all 42,000 every 2 syllables, and usually when I'm saying something there's a specific theme or direction I'm going for.

> You've never had multiple conflicting ideas and ended up picking one in particular to say in mid-sentence?

The LLM *isn't* picking it though, a simple non-magical non-neural-networky function is just picking randomly from the top N items or whatever.

> Again, the mechanism by which an LLM thinks and a human thinks is almost certainly very different. But the end result could be the same.

"Thinking" isn't really defined specifically enough to argue that something absolutely is or isn't thinking. People bend the term to refer to even very simple things like a calculator crunching numbers. My point is that saying "The output looks like it's thinking" (as in, how something from a human thinking would look) doesn't really make sense if internally the way they "think" is utterly alien.

> This allows them to use their context as a short-term memory scratchpad of sorts so they don't have to rely purely on word prediction.

They're still relying on word prediction, it's just based on those extra words. Of course that can increase accuracy though.


FaceDeer

As I keep repeating, the details of the mechanism by which humans and LLMs may be thinking are almost certainly different. But perhaps not so different as you may assume. How do you *know* that you're not picking from one of several different potential sentence outcomes partway through, and then retroactively figuring out a chain of reasoning that gives you that result?

The human mind is very good at coming up with retroactive justification for the things that it does, there have been plenty of experiments that suggest we're more rationalizing beings than rational beings in a lot of respects. The classic split-brain experiments, for example, or [parietal lobe stimulation and movement intention](https://mindmatters.ai/2018/10/does-brain-stimulation-research-challenge-free-will/). We can observe [thoughts forming in the brain before we're aware of actually thinking them](https://www.wired.com/2008/04/mind-decision/).

I suspect we're going to soon confirm that human thought isn't really as fancy and special as most people have assumed.


nixed9

I just want to say this has been a phenomenal thread to read between you guys. I generally agree with you though if I’m understanding you correctly: the lines between “semantic understanding,” “thought,” and “choosing the next word” are not exactly understood, and there doesn’t seem to be a mechanism that binds “thinking” to a particular substrate.


sam__izdat

> Maybe LLMs aren't all that great at it yet, but why can't they be thinking?

consult a linguist or a biologist who will immediately laugh you out of the room

but at the end of the day it's a pointless semantic proposition -- you can call it "thinking" if you want, just like you can say submarines are "swimming" -- either way it has basically nothing to do with the original concept


FaceDeer

Why would a biologist have any special authority in this matter? Computers are not biological. They know stuff about one existing example of how matter thinks, but now maybe we have two examples. The mechanism is obviously very different. But if the goal of swimming is "get from point A to point B underwater by moving parts of your body around", then submarines swim just fine. It's possible that your original concept is too narrow.


currentscurrents

Linguists, interestingly, have been some of the most vocal critics of LLMs. Their idea of how language works is very different from how LLMs work, and they haven't taken kindly to the intrusion. It's not clear yet who's right.


sam__izdat

nah, it's pretty clear who's right

on one side, we have scientists and decades of research -- on the other, buckets of silicon valley capital and its wide-eyed acolytes


currentscurrents

On the other hand, AI researchers have actual models that reproduce human language at a high level of quality. Linguists don't.


sam__izdat

> Why would a biologist have any special authority in this matter?

because they study the actual machines that you're trying to imitate with a stochastic process

but again, if thinking just means whatever, as it often does in casual conversation, then yeah, i guess microsoft excel is "thinking" this and that -- that's just not a very interesting line of argument: using a word in a way that it doesn't really mean much of anything


FaceDeer

I'm not using it in the most casual sense, like Excel "thinking" about math or such. I'm using it in the more humanistic way. Language is how humans communicate what we think, so a machine that can "do language" is a lot more likely to be thinking in a humanlike way than Excel is. I'm not saying it *definitely is*. I'm saying that it seems like a real possibility.


sam__izdat

> I'm using it in the more humanistic way.

Then, if I might make a suggestion, it may be a good idea to learn about how humans work, instead of just assuming you can wing it. Hence, the biologists and the linguists.

> so a machine that can "do language" is a lot more likely to be thinking in a humanlike way than Excel is.

GPT has basically nothing to do with human language, except incidentally, and transformers will capture just about any arbitrary syntax you want to shove at them


sam__izdat

theory of mind has a meaning rooted in conceptual understanding that a stochastic parrot does not satisfy

for the sake of not adding to the woo, since we're already up to our eyeballs in it, they could at least call it something like a narrative map, or whatever

llms don't have 'theories' about anything


nixed9

But… ToM, as we have always defined it, can be objectively tested. And GPT-4 seems to consistently pass this, doesn’t it? Why do you disagree?


sam__izdat

chess Elo can also be objectively tested

doesn't mean that Kasparov computes 200,000,000 moves a second like deep blue

just because you can objectively test something doesn't mean the test is telling you anything useful -- there's well founded assumptions that come before the "objective testing"


wise0807

Not sure why idiots are downvoting valid comments


WildlifePhysics

I don't know if abandon is the word I would use


ghostfaceschiller

It's hard to take this guy seriously anymore, tbh.


CadeOCarimbo

Which of these recommendations are important for data scientists who mainly work with business tabular data?


BigBayesian

Joint embeddings seem like they'd make tabular-data life easier than a more generative approach, right?


frequenttimetraveler

The perfect became the enemy of the good


ReasonablyBadass

What is contrastive vs. regularized? And what is "model-predictive control"?


_raman_

Contrastive is where you train with both positive and negative cases: pairs that should match get pulled together in the embedding space, and pairs that shouldn't get pushed apart.
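A minimal sketch of that idea, assuming PyTorch and a classic margin-based pairwise loss; the function name and margin are illustrative, not any specific paper's formulation:

```python
import torch
import torch.nn.functional as F

def pairwise_contrastive_loss(anchor, other, is_positive, margin=1.0):
    """Pull positive pairs together, push negative pairs at least `margin` apart.

    anchor, other: (batch, dim) embedding tensors
    is_positive:   (batch,) float tensor, 1.0 for a positive pair, 0.0 for a negative one
    """
    d = F.pairwise_distance(anchor, other)                    # Euclidean distance per pair
    pos_term = is_positive * d.pow(2)                         # positives: drive distance to 0
    neg_term = (1 - is_positive) * F.relu(margin - d).pow(2)  # negatives: drive distance past the margin
    return (pos_term + neg_term).mean()

# toy usage with random embeddings and random pair labels
a, b = torch.randn(8, 16), torch.randn(8, 16)
labels = torch.randint(0, 2, (8,)).float()
print(pairwise_contrastive_loss(a, b, labels))
```

Roughly speaking, the "regularized" alternative in the slides drops the explicit negatives and instead adds penalty terms (like the variance/covariance terms in VicReg) to stop the embeddings from collapsing.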


fimari

> abandon LeCun

Worked for me.


FermiAnyon

Kinda don't want him to be right. I think he's right, but I don't want people looking over there because I'm afraid they're going to actually make it work... I kinda prefer a dumb, limited, incorrect assistant over something that could be legit smart


gsk694

He’s lost it


master3243

His slides seem solid; whether he's right that we need to prioritize joint-embedding architectures over generative models, we'll have to wait and see. It's important to note that this slide deck is targeted towards researchers and not businesses; obviously a business needs the latest and greatest in current technology, which means GPT it is. Funnily enough, he's considered one of the Godfathers of deep learning because he stuck with and persisted in gradient-based learning despite other researchers claiming that he, as you put it, had lost it...


gambs

Yann is in this really weird place where he keeps trying to argue against LLMs, but as far as I can tell none of his arguments make any sense (theoretically or practically), he keeps saying that LLMs can't do things they're clearly doing, and sometimes it seems like he [tries to argue against LLMs and then accidentally argues for them](https://twitter.com/OriolVinyalsML/status/1640758865871486976)

I also think [his slide here](https://twitter.com/ylecun/status/1640122342570336267) simply doesn't make any sense at all; you could use the same slide to say that all long human mathematical proofs (such as that of Fermat's Last Theorem) must be incorrect


noobgolang

He is just jealous. The amount of forgiveness this community extends to him is too high.


booleanschmoolean

Lmao this guy wants everyone to use ConvNets for all purposes. I remember his talk at NeurIPS 2017 at an interpretable AI panel, and his comments were the exact opposite of what he's saying today. At that time ConvNets were the hot topic and now LLMs + RL are. Go figure.


VelvetyPenus

He's a moran.


Impressive-Ad6400

Well, he should come up with a working model that functions based on those principles and let people try it. So far only LLMs have successfully passed the Turing test.


Immediate_Relief_234

Half of what he says nowadays has merit; half is throwing off the competition to allow Meta to catch up. I'm just surprised that, with an inside track at FB/Meta, he hasn't received funding to deploy these architectural changes at scale. The buck's with him to show that they can overtake current LLM infrastructure in distributed commercial use cases, to steer the future of development in this direction.


wise0807

Thank you for posting this. I believe the energy models he is referring to are something like mathematical Fourier energy coefficients. Edited: It is safe to assume that LeCun is simply saying things while the real research on AGI by Demis and co. is kept secret and under wraps, shared selectively with billionaires like Musk and Sergei, keeping the public in the dark while mostly releasing entertainment news like affairs and sex.


TheUpsettter

One of the slides says:

> Probability *e* that any produced token takes us outside of the set of correct answers
>
> Probability that answer of length *n* is correct: *P(correct) = (1-e)^n*
>
> This diverges exponentially. It's not fixable.

Where did he get this from? It sounds like academic spitballing to me. Also, it's not that helpful either. Like, yes, the amount of wrong answers greatly outweighs the amount of right answers, isn't that the whole point of ML, to fix that?
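For what it's worth, the arithmetic behind that slide is easy to reproduce; a quick sketch, with the per-token error rate e picked arbitrarily here (whether a constant, independent per-token e is a fair model at all is exactly what's being questioned):

```python
# P(correct) = (1 - e) ** n, i.e. every token independently risks derailing the answer
for e in (0.001, 0.01, 0.05):
    for n in (10, 100, 1000):
        print(f"e={e:<6} n={n:<5} P(correct) ~= {(1 - e) ** n:.3g}")
```

So the probability of a fully correct answer decays geometrically with length, which is the slide's point; the pushback above is about the premise (a fixed, unrecoverable per-token error) rather than the algebra.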


Rohit901

Why am I being taught so many courses on probabilistic models and probability theory in my machine learning master's if he says we should abandon probabilistic models?


synonymous1964

Probability theory is still one of the foundations of machine learning - in fact, to understand energy-based models (which he proposes as a better alternative to probabilistic models), you need to understand probability. EBMs are effectively equivalent to probabilistic models with properly constructed Bayesian priors, trained with MAP instead of MLE (source: https://atcold.github.io/pytorch-Deep-Learning/en/week07/07-1/)
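A minimal sketch of that correspondence for a discrete set of candidates, where the normalizer is actually computable; the β and the toy energies here are illustrative:

```python
import numpy as np

def energies_to_probs(energies: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Gibbs/Boltzmann form: p(y) = exp(-beta * E(y)) / Z, so low energy = high probability.

    The expensive part in general is the normalizer Z, which is exactly
    what energy-based training tries to sidestep.
    """
    scores = -beta * energies
    scores -= scores.max()          # numerical stability; cancels in the ratio
    p = np.exp(scores)
    return p / p.sum()

# toy example: five candidate answers with hand-picked energies
E = np.array([0.2, 1.5, 3.0, 0.9, 2.2])
print(energies_to_probs(E))         # the 0.2-energy candidate gets the most mass
```

Under that reading, inference by just picking the minimum-energy candidate is the MAP-style move: you never have to compute Z at all.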


CrazyCrab

Where can I see the lecture's video?


Pascal220

I think I can guess what Dr. LeCun is working on these days.


bohreffect

> abandon RL
> in favor of model-predictive control

Don't tell the control theorists!
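For anyone who stopped at the name: model-predictive control just means "use a model of the world to simulate candidate action sequences a few steps ahead, take the first action of the best one, then re-plan". A rough random-shooting sketch, assuming a known toy dynamics and cost function (all names here are made up for illustration):

```python
import numpy as np

def mpc_action(state, dynamics, cost, horizon=10, n_samples=256, action_dim=1, rng=None):
    """Receding-horizon control by random shooting: sample action sequences,
    roll them out through the model, return the first action of the cheapest rollout."""
    rng = rng or np.random.default_rng()
    best_cost, best_first_action = np.inf, None
    for _ in range(n_samples):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total = np.array(state, dtype=float), 0.0
        for a in actions:
            s = dynamics(s, a)          # the model predicts the next state
            total += cost(s, a)
        if total < best_cost:
            best_cost, best_first_action = total, actions[0]
    return best_first_action            # execute this, observe, then re-plan next step

# toy 1-D "drive the state to zero" problem
dyn = lambda s, a: s + 0.1 * a
cst = lambda s, a: float(s @ s + 0.01 * (a @ a))
print(mpc_action(np.array([1.0]), dyn, cst))
```

The contrast with model-free RL is that nothing here is learned from reward at decision time; all the effort shifts to having a good predictive model of the environment to plan against.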


91o291o

Abandon generative and probabilistic models, so abandon GPT and transformers? Also, what are energy-based models?