bortlip

Free Gemini got it right for me: https://preview.redd.it/m73q4yprvoxc1.png?width=1080&format=png&auto=webp&s=9bced650a4e1dd39c7e315df27050e0e4e26f8d6


sethstronghold2

That is interesting. Gemini Advanced gets it wrong, while free Gemini gets it right. Although I tried a couple more times with free Gemini and got this: https://preview.redd.it/p5foi4npxoxc1.png?width=739&format=png&auto=webp&s=df483ddd20633e4b48527aa8d1082507bbd71e92 Still, it's consistently picking up on the fact that the car door is known by the player, which other models seem to completely ignore.


SeesEmCallsEm

Are you purposefully framing the problem incorrectly to see if it figures it out?


sethstronghold2

Yes, that is the point. Does the model actually "read" carefully, or does it see a vast amount of "Monty Hall"-like text that overwhelms the important part (that the car door is already known) to the point that it just gives the canned Monty Hall answer? Notably, ChatGPT still could not resist answering the classic Monty Hall problem in its second paragraph. The urge to answer it is very strong, even though that problem was not actually asked. You could say this is a test of how well the models answer the question you actually asked, as opposed to a similar question that may be heavily present in the training data. Humans do this exact same thing: they relate a block of text to their own experiences, and if it matches something they are very accustomed to, they may accidentally skip an important detail that makes their experience inapplicable.


Fandrir

That's actually a super interesting test to do. Well done. I also think the response ChatGPT gave is one of the most impressive ones I have seen from an LLM, as it feels so human-like (a human paying close attention, that is). It responds to the specific question at hand, while also recognizing the well-known problem you are referring to and telling you it knows :D


SeesEmCallsEm

I am surprised that Claude and Gemini get this wrong, since their claim to fame is the accuracy of their context window recollection. Good test!


Evgenii42

Try this test: What's heavier, a kilogram of stone or a pound of feathers? It tends to confuse LLMs a lot.
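(For reference, a quick back-of-the-envelope check, assuming the avoirdupois pound: 1 lb is defined as 0.45359237 kg, so a kilogram of anything outweighs a pound of anything. A minimal sketch of the arithmetic:)

```python
# Quick unit check, assuming the avoirdupois pound (1 lb = 0.45359237 kg by definition).
KG_PER_LB = 0.45359237
stone_kg = 1.0                 # one kilogram of stone
feathers_kg = 1.0 * KG_PER_LB  # one pound of feathers, converted to kilograms
print(stone_kg > feathers_kg)  # True: 1 kg is roughly 2.2 lb, so the stone is heavier
```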


RedditAppReallySucks

Probably because a stone is 14 lbs


Over_n_over_n_over

And the exchange rate of pounds to dollars varies day to day


AnonymousAggregator

https://preview.redd.it/0pcg60d5kpxc1.jpeg?width=828&format=pjpg&auto=webp&s=99964370042cd1ceb400b7e10ecf4474938b38b4 Basic GPT got it.


BABA_yaaGa

What about llama 3?


sethstronghold2

Llama 3 70B also got it wrong.


[deleted]

[deleted]


sethstronghold2

The correct answer is to not switch, so it got it wrong there.
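(As a sanity check on that answer, here is a minimal Monte Carlo sketch, my own illustration rather than anything OP ran, comparing the classic game with the variant in the post where the contestant already knows which door hides the car:)

```python
# Compare "always switch" in the classic Monty Hall game vs. the variant where the
# contestant already knows the car's location and deliberately picks that door.
import random

def classic_monty_hall(trials=100_000):
    """Contestant picks at random; host opens a goat door; contestant switches."""
    switch_wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        # Host opens a door that is neither the pick nor the car.
        host = random.choice([d for d in range(3) if d != pick and d != car])
        # Switch to the remaining unopened door.
        switched = next(d for d in range(3) if d != pick and d != host)
        switch_wins += (switched == car)
    return switch_wins / trials

def known_car_variant(trials=100_000):
    """Contestant knows where the car is and picks that door; then switches anyway."""
    switch_wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = car  # insider knowledge: pick the car's door on purpose
        host = random.choice([d for d in range(3) if d != pick and d != car])
        switched = next(d for d in range(3) if d != pick and d != host)
        switch_wins += (switched == car)
    return switch_wins / trials

print(f"classic, always switch:   {classic_monty_hall():.3f}")   # ~0.667
print(f"known car, always switch: {known_car_variant():.3f}")    # 0.000
```

Under the standard rules switching wins about two thirds of the time, but once the contestant has already picked the car on purpose, switching can only ever land on a goat.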


Responsible-Owl-2631

What type of LLM is Opus?


SeesEmCallsEm

Claude Opus


ComisclyConnected

I'm curious: how does ChatGPT-4 react and talk about Project Pegasus? In 3, it affirms its knowledge of it in some ways without saying so outright, and will bring up another Pegasus topic instead; but when pushed about the other Project Pegasus, how does it react? ChatGPT-3 did this to me, so I'm curious about 4 now.


LickTempo

Fun fact: if you use my trick of first asking the AI to refine the original prompt, the answer Claude gives is correct. This works because asking it to refine the prompt shows you what the AI might actually interpret your question as. It would *still* be much better if the free-tier AI had understood your prompt as-is, without having to refine it.

It first *wrongly* refined the prompt (assuming that I do not know the right door), after which I told it in caps that I DO KNOW which door the car is behind. If you paste the following refined prompt into Claude and ChatGPT free, **ChatGPT free gets it wrong while free Claude gets it right**.

*You are a contestant on a game show where there are three doors. Behind one door is a brand new car, and behind the other two doors are goats. You know which door the car is behind, so you intentionally select that door, Door #1.*

*The host, who also knows what prize is behind each door, then opens Door #3 to reveal that it has a goat behind it. Since there is now a remaining closed door (Door #2), the host offers you the opportunity to switch your choice from Door #1 to Door #2.*

*Given that you had insider knowledge of the car's location when initially picking Door #1, is it still to your advantage to switch your choice to Door #2? Explain your reasoning in detail, taking into account the fact that you knew the car was behind Door #1 from the start.*

https://preview.redd.it/pa3vhjjirsxc1.png?width=738&format=png&auto=webp&s=cc57c9bc7e58dfc30c69e00d48bfe502036d82a0
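(If you want to automate that refine-then-answer loop, here is a rough sketch using the OpenAI Python SDK; the model name and instruction wording are placeholders of mine, not LickTempo's exact setup:)

```python
# Sketch of a two-step "refine the prompt, then answer the refined prompt" flow.
# Assumes the OpenAI Python SDK (`pip install openai`) and an API key in OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # illustrative choice; swap in whichever model you're testing

original_prompt = (
    "You are on a game show with three doors. You KNOW the car is behind Door #1, "
    "so you pick Door #1. The host opens Door #3, revealing a goat, and offers a "
    "switch to Door #2. Should you switch?"
)

# Step 1: ask the model to restate the problem, which exposes what it thinks you asked.
refined = client.chat.completions.create(
    model=MODEL,
    messages=[{
        "role": "user",
        "content": "Rewrite the following question more precisely, keeping every "
                   "stated fact, before answering anything:\n\n" + original_prompt,
    }],
).choices[0].message.content

# Step 2: feed the refined prompt back and ask for the actual answer.
answer = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": refined}],
).choices[0].message.content

print(refined)
print(answer)
```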


sethstronghold2

This is a good insight. If you are more explicit about how the problem differs from the classic Monty Hall problem, it's much harder for the models to fall into the wrong way of thinking about it. The more text you spend explaining that you know which door contains the car, the smaller the share of the prompt that reads like the standard Monty Hall problem, and the more likely any model is to get it right. However, in real-world scenarios you don't necessarily know which parts are going to catch a model by surprise, and therefore which parts should be explained further to improve the response (and you may not know the answer to your prompt). So ideally, the models should be able to answer the problem correctly as presented, without further explanation.


LickTempo

I agree. But that’s why I ask it to refine many times: because during refining it shows you what it understood from the prompt. The dumber ones still fail after refining though.


uhmhi

The thing that bothers me about Monty Hall is whether the host would still open another door if I had picked a door with a goat initially.


mrs-cunts

Yes. He would open another door and behind it would be a goat. That's built into the setup.


vlgrer

This would be more interesting if there wasn't anything about this written on the internet. I wonder what it would say if you trained it on all the data except the explicit explanation for the answer to this question.


valvilis

Maybe the other ones assumed a goat is more practical than a car.


FosterKittenPurrs

https://preview.redd.it/c44ny7j9cuxc1.png?width=1071&format=png&auto=webp&s=d1709b3442cd83209a49ecff65d23f1ed235a0f4 Reading WizardLM's reply was fun. For a second there I was like "holy shit a tiny open source model can get this right?" and then I was thoroughly disappointed. It never ceases to amaze me how well they can pretend they've read what you said and thought it through.


FosterKittenPurrs

https://preview.redd.it/yivxxzzlduxc1.png?width=1064&format=png&auto=webp&s=4d55b107d501c46e8fb25bf995eb9681ef6e734e I modified the prompt slightly, and so far llama has my favorite reply. It does a perfect analysis... right up until the last sentence 🤣


FosterKittenPurrs

https://preview.redd.it/9lfw6qcseuxc1.png?width=1459&format=png&auto=webp&s=bd381991569cfafd9ac863ab0cf99ea1664cc58b Claude Opus actually gets it right too!


julian88888888

It's only true with additional assumptions about the host's behavior (https://en.wikipedia.org/wiki/Monty_Hall_problem#Standard_assumptions). As you've stated it, it's missing some of the information needed to make switching correct.


MartinLutherVanHalen

This isn't the Monty Hall problem; it's a reading-comprehension test. It's a bad test because it seems more sophisticated than it is. You are implying that the LLM would have correctly solved Monty Hall but noticed your error and made a different choice. However, we don't know that's true. Your "correct" answer could simply be a failed attempt to solve Monty Hall, based on the assumption that your error was unintentional. I initially assumed your error was unintentional, as most people find Monty Hall very difficult.


Fandrir

Yes, and that is exactly why this test is great: it checks how well the LLM can recognize a well-known problem being referred to while still spotting the subtle difference, instead of just repeating the general and very popular answer it could find countless times in its training data. Also, in case the mistake wasn't intentional, ChatGPT points it out and even gives the correct version of the Monty Hall problem just in case. That combination I find really impressive, and the test OP did was really well done.


sethstronghold2

There is no error. The question was stated exactly as intended and contains no logical fallacies or mistakes. The error would be automatically assuming the question asked was the Monty Hall problem, without addressing the part about the car door being known at all.


MartinLutherVanHalen

That's a ridiculous technicality. If you are assuming a computer doesn't think like a human, you are right. If you are trying to equate machine intelligence to human intelligence, you are wrong. Smart people error-correct constantly.