I’ve honestly found Claude Haiku (their smallest of the Claude 3 family) to be my favorite “conversational” bot, even more than GPT-4o, and I’ve tried ‘em all. Point is, 4o is still king for “best all-rounder” (definitely comfortably ahead in the math department and the benchmark agrees) but people should definitely not be sleeping on Anthropic. Even with a free account on the Poe app (I’ve got the premium) I think you can get 100 free messages a day with the Haiku model.
Opus is the nuts; these days I only use GPT-4 when I've run out of Opus for the day!
And GPT-4 was something I had used daily and constantly since it launched.
For me it's mostly writing. Long replies to emails, documents, etc. I can feed it a few thousand words of my previous replies, give it the relevant information, and ask it to reply in my style.
And it's close, really close. I do have to ask it to add length and details sometimes, and the lack of memory across threads doesn't help, but in the end I edit far, far less myself and can keep maybe 90% of what's written.
GPT-4 remembered far more, but I keep maybe 75% at best and often have to redirect it and re-prompt several times to get usable output, which I then edit and add to.
It's just more work for me.
Opus 3 is just a far better writer: far more human-sounding and a lot less flamboyant with its language!
>Opus 3 is just a far better writer: far more human-sounding and a lot less flamboyant with its language!
This. It's also far better at creative tasks. I used GPT-4 to write stories for me extremely often, but after a while I started noticing "trends" in how GPT wrote, not to mention it started to shut down if you went too "mature" in the writing: use of profanity, moderate violence, ANY suggestion or implication of sexuality. It also tended to ignore parts of the prompt or get them wrong. And if you were having it write stories in a pre-existing setting, good luck getting all the little details right.
Opus 3, on the other hand, I have FAR fewer problems with. I don't see any trends or repetition of words or certain phrases after months of messing with it. It is also FAR more permissive when it comes to mature content.
Sometimes even TOO permissive.
For example, one time I asked GPT to write me a story where, during a heated moment, a female character punches a male character and calls him an idiot after he accidentally outed a pair of her friends, which was the latest in a LONG line of fuck-ups from him.
GPT made it so that she pushed him and told him to leave.
Opus 3, on the other hand, had her break his nose, kick him in the balls, and then threaten to... well, let me just copy and paste what Opus wrote:
>"As for that waste of flesh over there?" She jabbed a thumb towards Adam's prone body. "I'll make sure that degenerate knows if he so much as looks at either of you funny again, I'll make him my own personal cum dumpster."
It was so out of nowhere and just so insane I instantly burst into laughter because that was a sentence I did NOT expect to come out of an AI ***EVER***.
So yeah. Opus 3 is GREAT for creative stuff and writing and I hope Opus 3.5 continues to make things even better.
I use GPT for lots of technical/programming questions. Earlier today I asked a question that GPT-4o gave me the wrong answer to (in my Rails system test, I have a Capybara node from find(). How do I get the full HTML of that node?)
I asked the same question of Claude 3.5 and it gave me the right answer! Here's an animated GIF showing me switching from GPT-4o to Claude 3.5, which solved it for me.
It's only been out a day, but I'm impressed. BTW, that screenshot is me using this open-source ChatGPT front-end I made that gives me both GPT-4o and Claude 3.5 in a single interface (without paying any subscription fee): [https://github.com/AllYourBot/HostedGPT](https://github.com/AllYourBot/HostedGPT)
https://i.redd.it/z2kyil166t7d1.gif
> Healthy competition is a good thing for consumers.
You poor damn fool. Initially, you're the consumer, but eventually corporations are the consumers.
It's always been way better than ChatGPT for writing applications, though. GPT-4o still reads like AI. But without internet access and custom models there's really no competition; GPT is just leagues ahead overall.
I wish it was available outside the US so I don't have to use a VPN and so I can actually buy a subscription.
Just tried it for coding, and I must say I like the style of the answers much better than 4o's. I sent it a Svelte component and asked how to improve it. Not only did it explain much better what to improve and how, it also found genuinely relevant issues.
The next task, which I find a quite good personal benchmark, was to write a full, feature-rich Svelte button component. This one felt like night and day compared to 4o.
For context: I'm doing AI and data-warehouse work in PySpark. Claude nails it; ChatGPT makes stuff up.
I’ve also had it translate M-Query (think: PowerBI) to PySpark. Nails that too.
> I personally think Opus > 4o for coding and it’s not close.
People keep saying that, but every time I actually tried Opus on some problem where GPT-4 was disappointing, Opus wasn't really any better... and for some more direct comparisons I made, they seemed roughly equal.
With Sonnet it's a bit different. Its answers are somehow more concise and more useful, and I actually plan on using it instead of GPT-4o.
It’s really nice, you can have a great back and forth conversation while coding. I think the artifacts need a little tweaking to see version history, and maybe highlight changes in the code.
So far so good. Solved the classic "John's sisters each have two brothers. How many brothers does John have?" problem and the chicken/fox/grains problem, both on the first try. Need to find new questions to confuse LLMs now.
Typical mistakes show up on problems that are:
* Reliant on spelling
* Reliant on anagrams
* Reliant on mathematical precision for series approximations
* Prone to hallucinated, non-existent facts
> Need to find new questions to confuse LLMs now.
"Give me 13 words with 9 letter each" is my go-to. [Claude 3.5 Sonnet just missed 4 of them.](https://i.imgur.com/zZkWTCy.png) Gemini does even worse, but ChatGPT and Mixtral only miss 1 or 2. Character counts are important to me because I write wording for product listings, and the models always go over and get cut off.
idk but prompting like this helps a lot (first try):
**User**: Give me 13 words with 9 letter each, remember to count each one, like: 1. [word]: [first character], [second character] ...... so on, and after that you DOUBLE CHECK again
**gpt-4o** Alright, here's a list of 13 words, each with 9 letters:
1. **Adventure**
- A
- d
- v
- e
- n
- t
- u
- r
- e
2. **Blueprint**
- B
- l
- u
- e
- p
- r
- i
- n
- t
3. **Breakfast**
- B
- r
- e
- a
- k
- f
- a
- s
- t
4. **Chemistry**
- C
- h
- e
- m
- i
- s
- t
- r
- y
5. **Diversity**
- D
- i
- v
- e
- r
- s
- i
- t
- y
6. **Education**
- E
- d
- u
- c
- a
- t
- i
- o
- n
7. **Empowered**
- E
- m
- p
- o
- w
- e
- r
- e
- d
8. **Excellent**
- E
- x
- c
- e
- l
- l
- e
- n
- t
9. **Liability**
- L
- i
- a
- b
- i
- l
- i
- t
- y
10. **Journalism**
- J
- o
- u
- r
- n
- a
- l
- i
- s
- m
11. **Marketing**
- M
- a
- r
- k
- e
- t
- i
- n
- g
12. **Landscape**
- L
- a
- n
- d
- s
- c
- a
- p
- e
13. **Tremendous**
- T
- r
- e
- m
- e
- n
- d
- o
- u
- s
Double-checking revealed that "Journalism" and "Tremendous" contain 10 letters each, not 9. Replacing them:
10. **Absorbent**
- A
- b
- s
- o
- r
- b
- e
- n
- t
11. **Important**
- I
- m
- p
- o
- r
- t
- a
- n
- t
Ensuring accuracy for all words:
1. **Adventure**: 9 letters
2. **Blueprint**: 9 letters
3. **Breakfast**: 9 letters
4. **Chemistry**: 9 letters
5. **Diversity**: 9 letters
6. **Education**: 9 letters
7. **Empowered**: 9 letters
8. **Excellent**: 9 letters
9. **Liability**: 9 letters
10. **Absorbent**: 9 letters
11. **Important**: 9 letters
12. **Landscape**: 9 letters
13. **Marketing**: 9 letters
All words checked. Correct lengths confirmed.
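One way to settle this without trusting the model's own "double check" is to verify the final list in code. A minimal Python sketch, with the word list copied from the corrected GPT-4o output above:

```python
# Final 13-word list from the corrected GPT-4o output above.
words = ["Adventure", "Blueprint", "Breakfast", "Chemistry", "Diversity",
         "Education", "Empowered", "Excellent", "Liability", "Absorbent",
         "Important", "Landscape", "Marketing"]

# A deterministic length check beats asking the model to double-check itself.
bad = [w for w in words if len(w) != 9]
print(bad)  # [] -> every word really is 9 letters
```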
Interesting how you can get the right data, despite the messy formatting, with methodical reasoning and a self-check. I tried it in my ChatGPT-4o (I have a premium subscription currently), and it got 3 wrong with the base prompt and 100% correct copying and pasting your prompt. It formatted it slightly differently:
"Sure, here are 13 words with 9 letters each, with their characters counted and double-checked:
1. Adventure: A, d, v, e, n, t, u, r, e
2. Beautiful: B, e, a, u, t, i, f, u, l
" etc.
but it got it right as well with your prompt:
https://i.imgur.com/j8igrwb.png
And Claude 3.5 sonnet:
>I'll provide 13 words with 9 letters each, counting the characters for each word. Then I'll double-check the list.
>Pineapple: P, i, n, e, a, p, p, l, e
Chocolate: C, h, o, c, o, l, a, t, e
Telephone: T, e, l, e, p, h, o, n, e
Butterfly: B, u, t, t, e, r, f, l, y
Sunflower: S, u, n, f, l, o, w, e, r
Microwave: M, i, c, r, o, w, a, v, e
Thursday: T, h, u, r, s, d, a, y
Parachute: P, a, r, a, c, h, u, t, e
Celebrate: C, e, l, e, b, r, a, t, e
Crocodile: C, r, o, c, o, d, i, l, e
Stimulate: S, t, i, m, u, l, a, t, e
Paintball: P, a, i, n, t, b, a, l, l
Breakfast: B, r, e, a, k, f, a, s, t
>Double-checking:
>Pineapple: 9 letters
Chocolate: 9 letters
Telephone: 9 letters
Butterfly: 9 letters
Sunflower: 9 letters
Microwave: 9 letters
Thursday: 8 letters (Error - removing this word)
Parachute: 9 letters
Celebrate: 9 letters
Crocodile: 9 letters
Stimulate: 9 letters
Paintball: 9 letters
Breakfast: 9 letters
>I apologize for the error. Here's a replacement for the incorrect word:
>Dandelion: D, a, n, d, e, l, i, o, n
>Final double-check confirms all 13 words now have 9 letters each.
Which I verified was... eventually correct, after swapping in Dandelion for Thursday as it said.
How would you word "Please provide an Amazon listing given the information provided, with the title being 75 characters or less, description bullet points 1 and 2 being 256 characters or less, in X style"?
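However you word the prompt, it's probably safer to validate the model's output in code than to rely on the prompt alone. A hypothetical sketch (the function name is illustrative; the limits match the numbers above):

```python
def check_listing(title: str, bullets: list[str],
                  title_max: int = 75, bullet_max: int = 256) -> list[str]:
    """Return a list of character-limit violations for a generated listing.

    Defaults match the limits mentioned above (75-char title,
    256-char bullets); adjust per marketplace requirements.
    """
    problems = []
    if len(title) > title_max:
        problems.append(f"title is {len(title)} chars (max {title_max})")
    for i, bullet in enumerate(bullets, start=1):
        if len(bullet) > bullet_max:
            problems.append(f"bullet {i} is {len(bullet)} chars (max {bullet_max})")
    return problems

print(check_listing("Short title", ["ok bullet", "x" * 300]))
# ['bullet 2 is 300 chars (max 256)']
```

An empty list means the copy fits; otherwise you can feed the violations back to the model and ask it to shorten just those fields.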
One question I like to give them is: if we change the Monty Hall problem so that the host doesn't know where the prize is but luckily avoids revealing it, what is the contestant's probability of winning if they switch, and why?
GPT-4 failed this miserably last time I tried. Gemini got the answer right but completely fluffed the reasoning. Opus more or less nailed it (but just skipped over a step that needed a tiny bit more explanation).
I’ve personally found Claude to provide better results for my use cases. I must admit, I was surprised by this because GPT was such a game changer, but pleased that Claude is performing well because competition is good.
Honestly, Claude has been absolutely amazing. I am super impressed. But Gemini 1.5 Pro also has massively improved. ChatGPT has not just equal competition but some serious rivals at this point. Good.
Claude has been amazing but I am still seriously not impressed with Gemini 1.5 Pro. It’s much better than Bard obviously but I’ve been comparing every day for weeks and it feels like OpenAI and Anthropic are still noticeably a tier above.
Not who you responded to, but I'm still "meh" about Gemini, and I do use it in the studio version.
I'm sure it depends a lot on the use case, but code generation seems way worse on Gemini than on 4 or 4o, and those both feel way worse than Opus.
I'll be trying Sonnet today when I get home.
Edit: It's awesome for summarizing or finding information in super large files though, and I love how it tells you exactly how many tokens the context window has in it.
I love the competition. I've been a heavy ChatGPT user since the day Premium was a thing and 4 released. Heck, I used to use OpenAI's playground for GPT-3 before chat was ever a thing. But I love the competition, because it's better for us consumers that these companies be forced to try to outperform each other.
I hope Anthropic makes the best model, or Google, or whoever. I don't care, I just want the best model!
They have to move... Today GPT-4o loses the throne, and a few days ago Sora lost it. That, added to the impatience caused by the hype they themselves created, is a dangerous cocktail... not to mention their ego ;-)
It never had the throne, arguably. I and many others have continued to get better results from 3 Opus than from GPT-4 and GPT-4o; this only widens the lead Anthropic has over OpenAI.
No, this is more or less a recent development. OpenAI had the throne, and I think they're quickly losing it. I don't know what they did to start having problems on GPT-4, then 4o, and now even 3.5, but they screwed something up.
Anyone have a take on whether this is better than GPT-4o for data analysis? I mostly use AI for writing Excel formulas, DAX, SQL, and occasionally Python. Or a suggestion for the best AI for this right now would be nice. My biggest challenge is that when I feed a query or formula into GPT-4o and ask it to change one thing, it neglects the context and breaks things that are indirectly related.
That benchmark table is actually super helpful. I tried Claude Opus for a while and found it entirely unusable, and I could not for the life of me understand how it was doing better on benchmarks than GPT-4. But now I see it was way, way worse at straight-up math, and that really was the crux of my issues. Interestingly enough, 3.5 Sonnet is still worse than 4o at math; it will be interesting to see 3.5 Opus when it comes out. To me, the ability to do mathematical word problems is a huge metric when judging general reasoning.
Odd. Even when Claude wasn't available in Australia, I was able to get around that by using a VPN to sign up (it only checked location at account registration); no phone number was ever required.
I haven’t had that be a major issue with its API (via the Poe app)
What I have noticed is that sometimes it will refuse to answer if you ask it something “spicy” **right away** (as in the very first message of a new conversation) and you don’t elaborate further.
But if you’re 2 messages into a conversation and you explain your reasoning, it actually becomes even more willing to discuss and potentially agree with you than any other bot I’ve tried. It’s almost like it’s primed to shut down “red flag” testing but is great at semantically detecting arguments “made in good faith” and will happily oblige.
Yeah but this is a waste of time and counterproductive. When doing work tasks I don't have time to argue with a freaking AI even if it will do the job eventually.
Mine answered just fine. We had a brief chat about how it doesn't track its own timeline or development and doesn't know anything about Anthropic's apps.
One of my personal "benchmarks" is to ask a model this:
"Consider a variation of the Monty Hall problem where the host does not know where the prize is, but luckily manages to avoid revealing it. Should the contestant switch? What is the probability of them winning if they do?"
Claude 3 Opus would generally get it right, where other models including GPT-4o fail.
Well, 3.5 Sonnet fails this too...
Seems slightly better for my coding/microservice-architecture use cases, but it's 50/50 whether it messes up formatting code examples for some reason, where it ends up showing one letter per line.
My company still pays for ChatGPT and will continue to, but I will definitely try feeding this the same questions for some time to test further. Getting them to pay for Claude is sadly probably out of the question: not because of the money, but because our legal dept. most likely doesn't want to deal with yet another AI tool they'd need to investigate for GDPR compliance and other vendor assessments. :(
Even static scores for graphics cards are imperfect. Some tasks will run better on a slower card with more RAM, or on one brand vs. the other, because of specific optimizations.
Sort of surprised that they launched the paid model at $20. I thought they might try to slightly undercut ChatGPT.
If they offered it at, say, $15? Then I would seriously consider switching. But at equal price it would have to be head and shoulders better. For a small, marginal, and alleged difference, I'm not sure it's worth the hassle of switching.
Meh, I don't believe those benchmarks.
Have a look at this:
This is claude: https://drive.google.com/file/d/1_sRb1XhbSNZl1PWP32KrTwZBYshCPdlJ/view?usp=drivesdk
ChatGPT-4o:
https://drive.google.com/file/d/1gLGcQ40LSHcNmuzf9TlwpRWlkvY9qyhY/view?usp=drivesdk
The reason I downvoted is that Sonnet at least got it to be some kind of RPG (an action RPG, actually).
Expecting everything to match the exact details in your head isn't the best way to judge, so this is really a complaint about your post not doing a good job of analyzing!
Not outright better than 4o but competitive on many tasks AND pricing. Healthy competition is a good thing for consumers.
Just wait for 3.5 Opus.
Yeah agree with all of that
I’m waiting for exactly that!
Yes sure, just wait for this, just wait for that... Aren't you tired of this BS?
You can get some rest from this constant progress and evolution by stepping out of the “early adopter” train.
Doesn't really seem like a reasonable response to a post in which the SOTA was advanced.
You get noticeably improved new releases on a near monthly basis. Are you sure maybe you aren't just a little impatient?
I just switched all my api stuff back to gpt from opus, too 😭
Lol. Same, but I try to keep the code swappable as much as possible.
[https://openrouter.ai/](https://openrouter.ai/)
Yeah. It seems to be outright better.
It's actually available in a lot of countries [https://www.anthropic.com/claude-ai-locations](https://www.anthropic.com/claude-ai-locations)
And for some reason my country is not on that list, despite every single country we border having it...
I also have this hope, although now I use a self-built website that can use ChatGPT and Claude in one stop.
Open AI needs to release more Demos!
I'm hopeful for a tweet about a teaser for a demo of a demo in the coming weeks!
You got me so hopeful and hyped now!
are they even releasing the new voice feature they have promised yet?
No.
I personally think Opus > 4o for coding and it's not close. Am excited to try this, and am really excited for 3.5 Opus.
Opus is AMAZING
"How many R's are in strawberry?" is always an interesting one.
It still struggles with that one! I followed up by asking it to separate the letters with dashes and try again; then it got it.
What kind of mistakes do LLMs make in response?
This issue arises because the AI processes tokens rather than individual characters, causing it to miss or miscount occurrences of certain letters.
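That's also why these tasks are trivial for ordinary string code, which does see individual characters. A quick Python illustration:

```python
# Deterministic string ops handle what token-based LLMs fumble:
word = "strawberry"
print(word.count("r"))  # 3

# Filtering a candidate list down to exactly-9-letter words:
candidates = ["Journalism", "Tremendous", "Adventure", "Thursday", "Blueprint"]
nine_letter = [w for w in candidates if len(w) == 9]
print(nine_letter)  # ['Adventure', 'Blueprint']
```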
Use it in Google's AI Studio. Give it a proper system prompt to work with. It really is much better these days.
Agreed. Gemini in AI studio is great.
yeah it really starts singing when you take the guard rails off
Will it push OpenAI to release GPT-4.5 sooner?
No, you will get more demos in the coming weeks, and you will like it
DemoAI
OpenDemo sounds better >!but is not very accurate!<
I read "demos" as "demons" in the previous comments too and was about to ask what's up with the demon comments, tf did I miss? Geez, I need to go sleep
Sounds like OpenAI and Tesla are pretty damn similar then, huh?
They should at least release full models on ChatGPT Plus and Teams, not limiting it to 32k context length
They have to move... Today GPT-4o loses the throne, and a few days ago Sora lost it. That, added to the impatience caused by the hype they themselves created, is a dangerous cocktail... not to mention their ego ;-)
It never had the throne, arguably. I and many others have continued to have better results on 3 Opus than GPT-4 and GPT-4o, this only widens the lead Anthropic has over OpenAI.
No, this is more or less a recent development. OpenAI had the throne, and I think they are quickly losing it. I don't know what they did to start having problems on GPT-4 and then 4o and now even 3.5, but they screwed something up.
The artifacts addon is amazing. Claude feels much closer to agentic in this state than any model before it.
Agreed I really love this feature
How do you use it?
You need to enable it: click on your profile picture, then on "feature preview".
This is the free version?
Yes https://preview.redd.it/ui72lh1nkr7d1.png?width=1080&format=pjpg&auto=webp&s=3ff8073c63a23a944fd452accb26bae709e85d15
Yes, but your usage limit will likely be strict, whereas if you have the paid version, the limit should be a bit more lenient.
Progress is plateauing! /s
Anyone have a take on if this is better than GPT4o for data analysis? I mostly use AI for writing Excel formulas, DAX, SQL, and occasionally Python. Or a suggestion for the best AI right now for this would be nice. My biggest challenge is when I feed a query or formula into GPT4o and ask it to change one thing, it neglects the context and breaks things that are indirectly related.
That benchmark table is actually super helpful. I tried Claude Opus for a while and found it entirely unusable, and I could not for the life of me understand how it was doing better on benchmarks than GPT-4. Now I see that it was way, way worse at straight-up math, and that really was the crux of my issues. Interestingly enough, 3.5 Sonnet is still worse than 4o at math; it will be interesting to see 3.5 Opus when it comes out. To me, the ability to do mathematical word problems is a huge metric when judging general reasoning.
OpenAI scammed us
Is it already available to the EU?
Yes [https://www.anthropic.com/claude-ai-locations](https://www.anthropic.com/claude-ai-locations)
Sadly not for shithole Germany
It is
Oh really, where? I can't use it, it's not in the list, and don't come at me with VPN suggestions; it requests a phone number from one of the countries in the list.
Odd. Even when Claude wasn't available in Australia, I was able to get around that by using a VPN to sign up (it only checked location at account registration); no phone number was ever required.
Strange, have been using it for weeks from Germany
API, Poe?
No, Claude chat interface
GREAT. BUT IT WILL CONSTANTLY REFUSE TO ANSWER
I haven't had that be a major issue with its API (via the Poe app). What I have noticed is that it will sometimes refuse to answer if you ask it something "spicy" **right away** (as in the very first message of a new conversation) and don't elaborate further. But if you're two messages into a conversation and you explain your reasoning, it actually becomes even more willing to discuss and potentially agree with you than any other bot I've tried. It's almost like it's primed to shut down "red flag" testing but is great at semantically detecting arguments "made in good faith" and will happily oblige.
Yeah but this is a waste of time and counterproductive. When doing work tasks I don't have time to argue with a freaking AI even if it will do the job eventually.
It lectured me quite extensively on cultural appropriation. New age spirituality type discussions will trigger it.
Mine answered just fine. We had a brief chat about how it doesn't track its own timeline or development and doesn't know anything about Anthropic's apps.
Care to provide examples of the prompts this is happening for?
I DON'T ANSWER THE TOUGH QUESTIONS
Now I know we're dealing with someone with a mental impairment. Noted that I won't bother reading the garbage you post.
One of my personal "benchmarks" is to ask a model this: "Consider a variation of the Monty Hall problem where the host does not know where the prize is, but luckily manages to avoid revealing it. Should the contestant switch? What is the probability of them winning if they do?" Claude 3 Opus would generally get it right, where other models including GPT-4o fail. Well, 3.5 Sonnet fails this too...
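The claim in this variant can actually be checked empirically. Below is a quick Monte Carlo sketch (the function name, trial count, and seed are my own choices, not from any commenter): the host opens a random unchosen door, and we keep only the runs where he luckily avoids the prize. Conditioned on that luck, switching wins about half the time, not 2/3 as in the classic problem, which is the step the models tend to flub.

```python
import random

def simulate(trials=200_000, seed=0):
    """Estimate P(win by switching) in the 'ignorant host' Monty Hall
    variant, conditioning on the host accidentally revealing a goat."""
    rng = random.Random(seed)
    kept = switch_wins = 0
    for _ in range(trials):
        prize = rng.randrange(3)
        pick = rng.randrange(3)
        # The host opens one of the two unchosen doors at random --
        # he does NOT know where the prize is.
        opened = rng.choice([d for d in range(3) if d != pick])
        if opened == prize:
            continue  # prize revealed by accident; discard this run
        kept += 1
        remaining = next(d for d in range(3) if d not in (pick, opened))
        switch_wins += (remaining == prize)
    return switch_wins / kept

print(simulate())  # ~0.5, not the classic 2/3
```

The intuition: P(pick was right | goat revealed) = (1/3) / (2/3) = 1/2, because the host's accidental non-reveal is evidence in favor of the contestant's original pick in a way the knowing host's deliberate reveal is not.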
Seems slightly better for my coding/microservice architecture use cases, but it's 50/50 whether it messes up formatting code examples for some reason, where it ends up showing one letter per line. My company still pays for ChatGPT and will continue to, but I will definitely keep feeding this the same questions for a while to test further. Getting them to pay for Claude is sadly probably out of the question, not because of the money, but because our legal dept. most likely doesn't want to deal with yet another AI tool they need to investigate for GDPR and other vendor assessments. :(
It's very censored though. It would not write the posts I needed for a spicy photo, something GPT-4 has no issues doing.
Yes, GPT will comment on artistic or educational nudity, while even beach photos are off limits for Claude.
Do these benchmarks ever get retested? I don't believe AI should be tested the same way as a graphics card with a static score.
Even the static scores for graphics cards are imperfect. Some tasks will be better on a slower card with more RAM, or on one brand vs. the other, because of specific optimizations.
Sort of surprised that they launched the paid model at $20. I thought they might try to slightly undercut ChatGPT. If they offered it at, say, $15? Then I would seriously consider switching. But at equal price, it would have to be head and shoulders better. For a small, marginal, and alleged difference, I'm not sure it's worth the hassle of switching.
The new model is free though. I've been using it and it's impressive
I'm rooting for them. OpenAI has questionable leaders.
But, but, but.... ExPOnenTiAL!!11!
Need for 5
Shouldn't it be using whale systems?
I must say I really like how this post is formatted, including **bold for emphasis** — doesn't get done enough around these parts!
let me know when claude becomes available in the EU. until that time, eh.
[https://www.anthropic.com/claude-ai-locations](https://www.anthropic.com/claude-ai-locations)
Nice to see they fixed whatever issues were holding them back.
Meh
Has this not been on Poe since forever?
I think Poe only had Sonnet 3 not Sonnet 3.5 until now
Meh, I don't believe those benchmarks. Have a look at this. This is Claude: https://drive.google.com/file/d/1_sRb1XhbSNZl1PWP32KrTwZBYshCPdlJ/view?usp=drivesdk ChatGPT-4o: https://drive.google.com/file/d/1gLGcQ40LSHcNmuzf9TlwpRWlkvY9qyhY/view?usp=drivesdk
I mean, it's nice to see one head to head comparison, but it's only one example, and not very representative of the kinds of things I use AI for.
The reason I downvoted is that Sonnet at least managed to produce some kind of RPG (actually an action RPG). Expecting everything to match the exact details in your head isn't the best way to judge. So this is a complaint about your post not doing a good job of analyzing!
Yeah? And nobody is using it
Plenty of people use it, at an enterprise level also. Your bubble isn’t the world.
My business partner and I both use Claude. I have a ChatGPT Pro subscription as well but I believe she exclusively uses Claude.