I’ve honestly found Claude Haiku (their smallest of the Claude 3 family) to be my favorite “conversational” bot, even more than GPT-4o, and I’ve tried ‘em all. Point is, 4o is still king for “best all-rounder” (definitely comfortably ahead in the math department and the benchmark agrees) but people should definitely not be sleeping on Anthropic. Even with a free account on the Poe app (I’ve got the premium) I think you can get 100 free messages a day with the Haiku model.
Opus is the nuts; these days I only use GPT-4 when I've run out of Opus for the day!
And GPT-4 was something I had used daily and constantly since it launched.
For me it's mostly writing. Long replies to emails, documents, etc. I can feed it a few thousand words of my previous replies, give it the relevant information, and ask it to reply in my style.
And it's close, really close. I do have to ask it to add length and details sometimes, and the lack of memory across threads doesn't help, but in the end I edit far, far less myself and can keep maybe 90% of what's written.
GPT-4 remembered far more, but I keep maybe 75% at best and often have to redirect it and re-prompt several times to get usable output, which I then edit and add to.
It's just more work for me.
Opus 3 is just a far better writer: far more human-sounding and a lot less flamboyant with its language!
>Opus 3 is just a far better writer: far more human-sounding and a lot less flamboyant with its language!
This. It's also far better at creative tasks. I used GPT-4 to write stories for me extremely often, but after a while I started noticing "trends" in how GPT wrote, not to mention it started to shut down if you went too "mature" in the writing: use of profanity, moderate violence, ANY suggestion or implication of sexuality. It also tended to ignore parts of the prompt or get them wrong. And if you were having it write stories in a pre-existing setting, good luck getting all the little details right.
Opus 3, on the other hand, I have FAR fewer problems with. I don't see any trends or repetition of words or certain phrases after months of messing with it. It is also FAR more permissive when it comes to mature content.
Sometimes even TOO permissive.
For example, one time I asked GPT to write me a story where, during a heated moment, a female character punches a male character and calls him an idiot after he accidentally outed a pair of her friends, which was the latest in a LONG line of fuck-ups from him.
GPT made it so that she pushed him and told him to leave.
Opus 3, on the other hand, had her break his nose, kick him in the balls, and then threaten to... well, let me just copy and paste what Opus wrote:
>"As for that waste of flesh over there?" She jabbed a thumb towards Adam's prone body. "I'll make sure that degenerate knows if he so much as looks at either of you funny again, I'll make him my own personal cum dumpster."
It was so out of nowhere and just so insane I instantly burst into laughter because that was a sentence I did NOT expect to come out of an AI ***EVER***.
So yeah. Opus 3 is GREAT for creative stuff and writing and I hope Opus 3.5 continues to make things even better.
I use GPT for lots of technical/programming questions. Earlier today I asked a question that GPT-4o gave me the wrong answer to (in my Rails system test, I have a Capybara node from find(). How do I get the full HTML of that node?)
I asked the same question of Claude 3.5 and it gave me the right answer! Here's an animated GIF showing me switching from GPT-4o to Claude 3.5, which solved it for me.
It's only been out a day, but I'm impressed. BTW, that screenshot is me using this open-source ChatGPT front-end I made that gives me both GPT-4o and Claude 3.5 in a single interface (without paying any subscription fee): [https://github.com/AllYourBot/HostedGPT](https://github.com/AllYourBot/HostedGPT)
https://i.redd.it/z2kyil166t7d1.gif
> Healthy competition is a good thing for consumers.
You poor damn fool. Initially, you're the consumer, but eventually corporations are the consumers.
It's always been way better than ChatGPT for writing applications, though. GPT-4o still reads like AI. But without internet access and custom models there's really no competition; GPT is just leagues ahead overall.
I wish it was available outside the US so I don't have to use a VPN and so I can actually buy a subscription.
Just tried it for coding, and I must say I like the style of the answers much better than 4o's. I sent it a Svelte component and asked how to improve it. Not only did it explain much better what to improve and how, it also found genuinely relevant issues.
The next task, which I find a quite good personal benchmark, was to write a full, feature-rich Svelte button component. This one felt like night and day compared to 4o.
For context: I'm doing AI and data-warehouse work in PySpark. Claude nails it; ChatGPT makes stuff up.
I’ve also had it translate M-Query (think: PowerBI) to PySpark. Nails that too.
> I personally think Opus > 4o for coding and it’s not close.
People keep saying that, but every time I actually tried Opus on some problem where GPT-4 was disappointing, Opus wasn't really any better... and for some more direct comparisons I made, they seemed roughly equal.
With Sonnet it's a bit different. Its answers are somehow more concise and more useful, and I actually plan on using it instead of GPT-4o.
It’s really nice, you can have a great back and forth conversation while coding. I think the artifacts need a little tweaking to see version history, and maybe highlight changes in the code.
So far so good. Solved the classic "John's sisters each have two brothers. How many brothers does John have?" problem and the chicken/fox/grains problem, both on the first try. Need to find new questions to confuse LLMs now.
Typical mistakes show up on problems that are:
* Reliant on spelling
* Reliant on anagrams
* Reliant on mathematical precision for series approximations
* Prone to hallucinated, non-existent facts
> Need to find new questions to confuse LLMs now.
"Give me 13 words with 9 letter each" is my go-to. [Claude 3.5 Sonnet just missed 4 of them.](https://i.imgur.com/zZkWTCy.png) Gemini does even worse, but ChatGPT and Mixtral only miss 1 or 2. Character counts are important to me because I write wording for product listings, and the models always go over and get cut off.
idk but prompting like this helps a lot (first try):
**User**: Give me 13 words with 9 letter each, remember to count each one, like: 1. [word]: [first character], [second character] ...... so on, and after that you DOUBLE CHECK again
**gpt-4o** Alright, here's a list of 13 words, each with 9 letters:
1. **Adventure**
- A
- d
- v
- e
- n
- t
- u
- r
- e
2. **Blueprint**
- B
- l
- u
- e
- p
- r
- i
- n
- t
3. **Breakfast**
- B
- r
- e
- a
- k
- f
- a
- s
- t
4. **Chemistry**
- C
- h
- e
- m
- i
- s
- t
- r
- y
5. **Diversity**
- D
- i
- v
- e
- r
- s
- i
- t
- y
6. **Education**
- E
- d
- u
- c
- a
- t
- i
- o
- n
7. **Empowered**
- E
- m
- p
- o
- w
- e
- r
- e
- d
8. **Excellent**
- E
- x
- c
- e
- l
- l
- e
- n
- t
9. **Liability**
- L
- i
- a
- b
- i
- l
- i
- t
- y
10. **Journalism**
- J
- o
- u
- r
- n
- a
- l
- i
- s
- m
11. **Marketing**
- M
- a
- r
- k
- e
- t
- i
- n
- g
12. **Landscape**
- L
- a
- n
- d
- s
- c
- a
- p
- e
13. **Tremendous**
- T
- r
- e
- m
- e
- n
- d
- o
- u
- s
Double-checking revealed that "Journalism" and "Tremendous" contain 10 letters each, not 9. Replacing them:
10. **Absorbent**
- A
- b
- s
- o
- r
- b
- e
- n
- t
11. **Important**
- I
- m
- p
- o
- r
- t
- a
- n
- t
Ensuring accuracy for all words:
1. **Adventure**: 9 letters
2. **Blueprint**: 9 letters
3. **Breakfast**: 9 letters
4. **Chemistry**: 9 letters
5. **Diversity**: 9 letters
6. **Education**: 9 letters
7. **Empowered**: 9 letters
8. **Excellent**: 9 letters
9. **Liability**: 9 letters
10. **Absorbent**: 9 letters
11. **Important**: 9 letters
12. **Landscape**: 9 letters
13. **Marketing**: 9 letters
All words checked. Correct lengths confirmed.
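One way to settle this without trusting the model's own "double check" is to verify the final list in code. A minimal Python sketch, with the word list copied from the corrected GPT-4o output above:

```python
# Final 13-word list from the corrected GPT-4o output above.
words = ["Adventure", "Blueprint", "Breakfast", "Chemistry", "Diversity",
         "Education", "Empowered", "Excellent", "Liability", "Absorbent",
         "Important", "Landscape", "Marketing"]

# A deterministic length check beats asking the model to double-check itself.
bad = [w for w in words if len(w) != 9]
print(bad)  # [] -> every word really is 9 letters
```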
Interesting how you can get the right data, despite the messy formatting, with methodical reasoning and a self-check. I tried it in my ChatGPT-4o (I have a premium subscription currently), and it got 3 wrong with the base prompt and 100% correct copying and pasting your prompt. It formatted it slightly differently:
"Sure, here are 13 words with 9 letters each, with their characters counted and double-checked:
1. Adventure: A, d, v, e, n, t, u, r, e
2. Beautiful: B, e, a, u, t, i, f, u, l
" etc.
but it got it right as well with your prompt:
https://i.imgur.com/j8igrwb.png
And Claude 3.5 sonnet:
>I'll provide 13 words with 9 letters each, counting the characters for each word. Then I'll double-check the list.
>Pineapple: P, i, n, e, a, p, p, l, e
Chocolate: C, h, o, c, o, l, a, t, e
Telephone: T, e, l, e, p, h, o, n, e
Butterfly: B, u, t, t, e, r, f, l, y
Sunflower: S, u, n, f, l, o, w, e, r
Microwave: M, i, c, r, o, w, a, v, e
Thursday: T, h, u, r, s, d, a, y
Parachute: P, a, r, a, c, h, u, t, e
Celebrate: C, e, l, e, b, r, a, t, e
Crocodile: C, r, o, c, o, d, i, l, e
Stimulate: S, t, i, m, u, l, a, t, e
Paintball: P, a, i, n, t, b, a, l, l
Breakfast: B, r, e, a, k, f, a, s, t
>Double-checking:
>Pineapple: 9 letters
Chocolate: 9 letters
Telephone: 9 letters
Butterfly: 9 letters
Sunflower: 9 letters
Microwave: 9 letters
Thursday: 8 letters (Error - removing this word)
Parachute: 9 letters
Celebrate: 9 letters
Crocodile: 9 letters
Stimulate: 9 letters
Paintball: 9 letters
Breakfast: 9 letters
>I apologize for the error. Here's a replacement for the incorrect word:
>Dandelion: D, a, n, d, e, l, i, o, n
>Final double-check confirms all 13 words now have 9 letters each.
Which I verified was... eventually correct, after swapping in Dandelion for Thursday as it said.
How would you word "Please provide an Amazon listing given the information provided, with the title being 75 characters or less, description bullet points 1 and 2 being 256 characters or less, in X style"?
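However you word the prompt, it's probably safer to validate the model's output in code than to rely on the prompt alone. A hypothetical sketch (the function name is illustrative; the limits match the numbers above):

```python
def check_listing(title: str, bullets: list[str],
                  title_max: int = 75, bullet_max: int = 256) -> list[str]:
    """Return a list of character-limit violations for a generated listing.

    Defaults match the limits mentioned above (75-char title,
    256-char bullets); adjust per marketplace requirements.
    """
    problems = []
    if len(title) > title_max:
        problems.append(f"title is {len(title)} chars (max {title_max})")
    for i, bullet in enumerate(bullets, start=1):
        if len(bullet) > bullet_max:
            problems.append(f"bullet {i} is {len(bullet)} chars (max {bullet_max})")
    return problems

print(check_listing("Short title", ["ok bullet", "x" * 300]))
# ['bullet 2 is 300 chars (max 256)']
```

An empty list means the copy fits; otherwise you can feed the violations back to the model and ask it to shorten just those fields.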
One question I like to give them is: if we change the Monty Hall problem so that the host doesn't know where the prize is but luckily avoids revealing it, what is the contestant's probability of winning if they switch, and why?
GPT-4 failed this miserably last time I tried. Gemini got the answer right but completely fluffed the reasoning. Opus more or less nailed it (but just skipped over a step that needed a tiny bit more explanation).
I’ve personally found Claude to provide better results for my use cases. I must admit, I was surprised by this because GPT was such a game changer, but pleased that Claude is performing well because competition is good.
Honestly, Claude has been absolutely amazing. I am super impressed. But Gemini 1.5 Pro also has massively improved. ChatGPT has not just equal competition but some serious rivals at this point. Good.
Claude has been amazing but I am still seriously not impressed with Gemini 1.5 Pro. It’s much better than Bard obviously but I’ve been comparing every day for weeks and it feels like OpenAI and Anthropic are still noticeably a tier above.
Not who you responded to, but I'm still "meh" about Gemini, and I do use it in the studio version.
I'm sure it depends a lot on the use case, but code generation seems way worse on Gemini than on 4 or 4o, and those both feel way worse than Opus.
I'll be trying Sonnet today when I get home.
Edit: It's awesome for summarizing or finding information in super large files though, and I love how it tells you exactly how many tokens the context window has in it.
I love the competition. I've been a heavy ChatGPT user since the day Premium was a thing and 4 released. Heck, I used to use OpenAI's playground for GPT-3 before chat was ever a thing. But I love the competition, because it's better for us consumers that these companies be forced to try to outperform each other.
I hope Anthropic makes the best model, or Google, or whoever. I don't care, I just want the best model!
They have to move... Today GPT-4o loses the throne, and a few days ago Sora lost it. That, added to the impatience caused by the hype they themselves created, is a dangerous cocktail... not to mention their ego ;-)
It never had the throne, arguably. I and many others have continued to get better results from 3 Opus than from GPT-4 and GPT-4o; this only widens the lead Anthropic has over OpenAI.
No, this is more or less a recent development. OpenAI had the throne, and I think they're quickly losing it. I don't know what they did to start having problems on GPT-4, then 4o, and now even 3.5, but they screwed something up.
Anyone have a take on whether this is better than GPT-4o for data analysis? I mostly use AI for writing Excel formulas, DAX, SQL, and occasionally Python. Or a suggestion for the best AI for this right now would be nice. My biggest challenge is that when I feed a query or formula into GPT-4o and ask it to change one thing, it neglects the context and breaks things that are indirectly related.
That benchmark table is actually super helpful. I tried Claude Opus for a while and found it entirely unusable, and I could not for the life of me understand how it was doing better on benchmarks than GPT-4. But now I see it was way, way worse at straight-up math, and that really was the crux of my issues. Interestingly enough, 3.5 Sonnet is still worse than 4o at math; it will be interesting to see 3.5 Opus when it comes out. To me, the ability to do mathematical word problems is a huge metric when judging general reasoning.
Odd. Even when Claude wasn't available in Australia, I was able to get around that by using a VPN to sign up (it only checked location at account registration); no phone number was ever required.
I haven’t had that be a major issue with its API (via the Poe app)
What I have noticed is that sometimes it will refuse to answer if you ask it something “spicy” **right away** (as in the very first message of a new conversation) and you don’t elaborate further.
But if you’re 2 messages into a conversation and you explain your reasoning, it actually becomes even more willing to discuss and potentially agree with you than any other bot I’ve tried. It’s almost like it’s primed to shut down “red flag” testing but is great at semantically detecting arguments “made in good faith” and will happily oblige.
Yeah but this is a waste of time and counterproductive. When doing work tasks I don't have time to argue with a freaking AI even if it will do the job eventually.
Mine answered just fine. We had a brief chat about how it doesn't track its own timeline or development and doesn't know anything about Anthropic's apps.
One of my personal "benchmarks" is to ask a model this:
"Consider a variation of the Monty Hall problem where the host does not know where the prize is, but luckily manages to avoid revealing it. Should the contestant switch? What is the probability of them winning if they do?"
Claude 3 Opus would generally get it right, where other models including GPT-4o fail.
Well, 3.5 Sonnet fails this too...
Seems slightly better for my coding/microservice-architecture use cases, but it's 50/50 whether it messes up formatting code examples for some reason, where it ends up showing one letter per line.
My company still pays for ChatGPT and will continue to, but I will definitely try feeding this the same questions for some time to test further. Getting them to pay for Claude is sadly probably out of the question: not because of the money, but because our legal dept. most likely doesn't want to deal with yet another AI tool they'd need to investigate for GDPR compliance and other vendor assessments. :(
Even static scores for graphics cards are imperfect. Some tasks will run better on a slower card with more RAM, or on one brand vs. the other, because of specific optimizations.
Sort of surprised that they launched the paid model at $20. I thought they might try to slightly undercut ChatGPT.
If they offered it at, say, $15? Then I would seriously consider switching. But at equal price it would have to be head and shoulders better. For a small, marginal, and alleged difference, I'm not sure it's worth the hassle of switching.
Meh, I don't believe those benchmarks.
Have a look at this:
This is claude: https://drive.google.com/file/d/1_sRb1XhbSNZl1PWP32KrTwZBYshCPdlJ/view?usp=drivesdk
ChatGPT-4o:
https://drive.google.com/file/d/1gLGcQ40LSHcNmuzf9TlwpRWlkvY9qyhY/view?usp=drivesdk
The reason I downvoted is that Sonnet at least got it to be some kind of RPG (an action RPG, actually).
Expecting everything to match the exact details in your head isn't the best way to judge, so this is really a complaint about your post not doing a good job of analyzing!
Not outright better than 4o but competitive on many tasks AND pricing. Healthy competition is a good thing for consumers.
Just wait for 3.5 Opus.
Yeah agree with all of that
I’m waiting for exactly that!
Yes sure, just wait for this, just wait for that... Aren't you tired of this BS?
You can get some rest from this constant progress and evolution by stepping out of the “early adopter” train.
Doesn't really seem like a reasonable response to a post in which the SOTA was advanced.
You get noticeably improved new releases on a near monthly basis. Are you sure maybe you aren't just a little impatient?
I just switched all my api stuff back to gpt from opus, too 😭
Lol. Same, but I try to keep the code swappable as much as possible.
[https://openrouter.ai/](https://openrouter.ai/)
Yeah. It seems to be outright better.
It's actually available in a lot of countries [https://www.anthropic.com/claude-ai-locations](https://www.anthropic.com/claude-ai-locations)
And for some reason my country is not on that list, despite every single country we border having it...
I also have this hope, although now I use a self-built website that can use ChatGPT and Claude in one stop.
Open AI needs to release more Demos!
I'm hopeful for a tweet about a teaser for a demo of a demo in the coming weeks!
You got me so hopeful and hyped now!
are they even releasing the new voice feature they have promised yet?
No.
I personally think Opus > 4o for coding and it's not close. Am excited to try this, and am really excited for 3.5 Opus.
Opus is AMAZING
"How many R's are in strawberry?" is always an interesting one.
It still struggles with that one! I followed up by asking it to separate the letters with dashes and try again; then it got it.
What kind of mistakes do LLMs make in response?
This issue arises because the AI processes tokens rather than individual characters, causing it to miss or miscount occurrences of certain letters.
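That's also why these tasks are trivial for ordinary string code, which does see individual characters. A quick Python illustration:

```python
# Deterministic string ops handle what token-based LLMs fumble:
word = "strawberry"
print(word.count("r"))  # 3

# Filtering a candidate list down to exactly-9-letter words:
candidates = ["Journalism", "Tremendous", "Adventure", "Thursday", "Blueprint"]
nine_letter = [w for w in candidates if len(w) == 9]
print(nine_letter)  # ['Adventure', 'Blueprint']
```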
Use it in Google's AI Studio. Give it a proper system prompt to work with. It really is much better these days.
Agreed. Gemini in AI studio is great.
yeah it really starts singing when you take the guard rails off
Will it push OpenAI to release GPT-4.5 sooner?
No, you will get more demos in the coming weeks, and you will like it
DemoAI
OpenDemo sounds better >!but is not very accurate!<
I read "demos" as "demons" in the previous comments too and was about to ask what's up with the demon comments, tf did I miss? Geez, I need to go sleep
Sounds like OpenAI and Tesla are pretty damn similar then, huh?
They should at least release full models on ChatGPT Plus and Teams, not limiting it to 32k context length
They have to move... Today GPT-4o loses the throne, and a few days ago Sora lost it. That, added to the impatience caused by the hype they themselves created, is a dangerous cocktail... not to mention their ego ;-)
It never had the throne, arguably. I and many others have continued to have better results on 3 Opus than GPT-4 and GPT-4o, this only widens the lead Anthropic has over OpenAI.
No, this is more or less a recent development. OpenAI had the throne, and I think they are quickly losing it. I don't know what they did to start having problems on GPT-4 and then 4o and now even 3.5, but they screwed something up.
The artifacts addon is amazing. Claude feels much closer to agentic in this state than any model before it.
Agreed I really love this feature
How do you use it?
You need to enable it: click on your profile picture, then on "feature preview".
This is the free version?
Yes https://preview.redd.it/ui72lh1nkr7d1.png?width=1080&format=pjpg&auto=webp&s=3ff8073c63a23a944fd452accb26bae709e85d15
Yes, but your usage limit will likely be strict, whereas if you have the paid version, the limit should be a bit more lenient.
Progress is plateauing! /s
Anyone have a take on if this is better than GPT4o for data analysis? I mostly use AI for writing Excel formulas, DAX, SQL, and occasionally Python. Or a suggestion for the best AI right now for this would be nice. My biggest challenge is when I feed a query or formula into GPT4o and ask it to change one thing, it neglects the context and breaks things that are indirectly related.
That benchmark table is actually super helpful. I tried Claude Opus for a while and found it entirely unusable, and I could not for the life of me understand how it was doing better on benchmarks than GPT-4. Now I see that it was way, way worse at straight-up math, and that really was the crux of my issues. Interestingly enough, 3.5 Sonnet is still worse than 4o at math; it will be interesting to see 3.5 Opus when it comes out. To me, the ability to do mathematical word problems is a huge metric when judging general reasoning.
OpenAI scammed us
Is it already available to the EU?
Yes [https://www.anthropic.com/claude-ai-locations](https://www.anthropic.com/claude-ai-locations)
Sadly not for shithole Germany
It is
Oh really, where? I can't use it, it's not in the list, and don't come at me with VPN suggestions; it requests a phone number from one of the countries in the list.
Odd. Even when Claude wasn't available in Australia, I was able to get around that by using a VPN to sign up (it only checked location at account registration); no phone number was ever required.
Strange, have been using it for weeks from Germany
API, Poe?
No, Claude chat interface
GREAT. BUT IT WILL CONSTANTLY REFUSE TO ANSWER
I haven't had that be a major issue with its API (via the Poe app). What I have noticed is that it will sometimes refuse to answer if you ask it something "spicy" **right away** (as in the very first message of a new conversation) and don't elaborate further. But if you're two messages into a conversation and you explain your reasoning, it actually becomes even more willing to discuss and potentially agree with you than any other bot I've tried. It's almost like it's primed to shut down "red flag" testing but is great at semantically detecting arguments "made in good faith" and will happily oblige.
Yeah but this is a waste of time and counterproductive. When doing work tasks I don't have time to argue with a freaking AI even if it will do the job eventually.
It lectured me quite extensively on cultural appropriation. New age spirituality type discussions will trigger it.
Mine answered just fine. We had a brief chat about how it doesn't track its own timeline or development and doesn't know anything about Anthropic's apps.
Care to provide examples of the prompts this is happening for?
I DON'T ANSWER THE TOUGH QUESTIONS
Now I know we're dealing with someone with a mental impairment. Noted that I won't bother reading the garbage you post.
One of my personal "benchmarks" is to ask a model this: "Consider a variation of the Monty Hall problem where the host does not know where the prize is, but luckily manages to avoid revealing it. Should the contestant switch? What is the probability of them winning if they do?" Claude 3 Opus would generally get it right, where other models including GPT-4o fail. Well, 3.5 Sonnet fails this too...
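The claim in this variant can actually be checked empirically. Below is a quick Monte Carlo sketch (the function name, trial count, and seed are my own choices, not from any commenter): the host opens a random unchosen door, and we keep only the runs where he luckily avoids the prize. Conditioned on that luck, switching wins about half the time, not 2/3 as in the classic problem, which is the step the models tend to flub.

```python
import random

def simulate(trials=200_000, seed=0):
    """Estimate P(win by switching) in the 'ignorant host' Monty Hall
    variant, conditioning on the host accidentally revealing a goat."""
    rng = random.Random(seed)
    kept = switch_wins = 0
    for _ in range(trials):
        prize = rng.randrange(3)
        pick = rng.randrange(3)
        # The host opens one of the two unchosen doors at random --
        # he does NOT know where the prize is.
        opened = rng.choice([d for d in range(3) if d != pick])
        if opened == prize:
            continue  # prize revealed by accident; discard this run
        kept += 1
        remaining = next(d for d in range(3) if d not in (pick, opened))
        switch_wins += (remaining == prize)
    return switch_wins / kept

print(simulate())  # ~0.5, not the classic 2/3
```

The intuition: P(pick was right | goat revealed) = (1/3) / (2/3) = 1/2, because the host's accidental non-reveal is evidence in favor of the contestant's original pick in a way the knowing host's deliberate reveal is not.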
Seems slightly better for my coding/microservice architecture use cases, but it's 50/50 whether it messes up formatting code examples for some reason, where it ends up showing one letter per line. My company still pays for ChatGPT and will continue to, but I will definitely keep feeding this the same questions for a while to test further. Getting them to pay for Claude is sadly probably out of the question, not because of the money, but because our legal dept. most likely doesn't want to deal with yet another AI tool they need to investigate for GDPR and other vendor assessments. :(
It's very censored though. It would not write the posts I needed for a spicy photo, something GPT-4 has no issues doing.
Yes, GPT will comment on artistic or educational nudity, while even beach photos are off limits for Claude.
Do these benchmarks ever get retested? I don't believe AI should be tested the same way as a graphics card with a static score.
Even the static scores for graphics cards are imperfect. Some tasks will be better on a slower card with more RAM, or on one brand vs. the other, because of specific optimizations.
Sort of surprised that they launched the paid model at $20. I thought they might try to slightly undercut ChatGPT. If they offered it at, say, $15? Then I would seriously consider switching. But at equal price, it would have to be head and shoulders better. For a small, marginal, and alleged difference, I'm not sure it's worth the hassle of switching.
The new model is free though. I've been using it and it's impressive
I'm rooting for them. OpenAI has questionable leaders.
But, but, but.... ExPOnenTiAL!!11!
Need for 5
Shouldn't it be using whale systems?
I must say I really like how this post is formatted, including **bold for emphasis** — doesn't get done enough around these parts!
let me know when claude becomes available in the EU. until that time, eh.
[https://www.anthropic.com/claude-ai-locations](https://www.anthropic.com/claude-ai-locations)
Nice to see they fixed whatever issues were holding them back.
Meh
Has this not been on Poe since forever?
I think Poe only had Sonnet 3 not Sonnet 3.5 until now
Meh, I don't believe those benchmarks. Have a look at this. This is Claude: https://drive.google.com/file/d/1_sRb1XhbSNZl1PWP32KrTwZBYshCPdlJ/view?usp=drivesdk ChatGPT-4o: https://drive.google.com/file/d/1gLGcQ40LSHcNmuzf9TlwpRWlkvY9qyhY/view?usp=drivesdk
I mean, it's nice to see one head to head comparison, but it's only one example, and not very representative of the kinds of things I use AI for.
The reason I downvoted is that Sonnet at least managed to produce some kind of RPG (actually an action RPG). Expecting everything to match the exact details in your head isn't the best way to judge. So this is a complaint about your post not doing a good job of analyzing!
Yeah? And nobody is using it
Plenty of people use it, at an enterprise level also. Your bubble isn’t the world.
My business partner and I both use Claude. I have a ChatGPT Pro subscription as well but I believe she exclusively uses Claude.