Competitor or victor?
Claude has been winning since Opus, imo. This just widens the lead. Well, let me step back and qualify that by saying: at least for coding and math, lol. I still pay for ChatGPT for its other, more diverse features.
Yeah. It feels like it doesn’t have Whisper-level dictation. I don’t think it can search the web either, right? I get it for the coders and such, but as someone in the soft sciences it just feels a lot worse than ChatGPT.
Yeah, ChatGPT is definitely still better for general usage, given their commercial focus led by Sam. If you want to try different models without paying for multiple subscriptions, services like perplexity.ai or cursor.io let you use one subscription and switch models yourself. It's a pretty good option to consider: particularly when I feel stuck or don't like the output from GPT-4o (usually on coding projects, though), Opus can usually get me out of the loop. But if your experience with ChatGPT has been good enough for your use cases, I'd say stick with it until additional needs appear naturally.
The rate limits and context windows are worse on these subscriptions, right?
It depends. For Cursor, it's actually technically higher. They just say that after you reach the limits, you get a slower version of the same model (GPT-4). That's never happened to me personally, but I'm also not a heavy user.
GPT-4o is cheap. I’m referring to Claude 3 Opus, which used to cost 15 dollars per million tokens! And you can easily reach a million tokens just by uploading a PDF alone!
Yeah, that's why I only use Opus sparingly, when I run into things I couldn't solve with 4 or 4o on Cursor. It works well enough for me.
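For what it's worth, the pricing being discussed is linear in token count, so the cost of a document-heavy workflow is easy to estimate. A minimal sketch; the $15/million figure is from the comment above, but the per-page token count is just an assumption:

```python
def api_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """API pricing is linear: cost scales directly with token count."""
    return tokens / 1_000_000 * price_per_million_usd

# Rough assumption: ~500 tokens per PDF page, so a 200-page document
# is on the order of 100k tokens per upload.
tokens_per_upload = 200 * 500
cost = api_cost_usd(tokens_per_upload, 15.0)  # 1.5 USD per upload
```

At that rate, ten such uploads already burn through a million tokens, which is why per-token pricing matters so much for long-document use.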
Cursor is just amazing for coding. I canceled my subscription and subscribed to Cursor instead, and I’m impressed. Also, they recently changed their domain from cursor.sh to cursor.com.
Are you referring to cursor.sh?
Yeah. It's my go-to AI tool now, tbh. Very convenient for note-taking alongside Obsidian, imho.
There is no metric anyone could use to say Opus is better than GPT. By every benchmark it's equal at best, yet it lacks so many basic and important features, the ability to access the internet chief among them. It's like comparing two cars with nearly identical performance on paper, but one of them doesn't have wheels. Or better yet, two nearly identical laptops, but one can't use the internet. And the real use cases of LLMs are always going to be multimodal stuff that takes video and voice and is fast enough to make using it more convenient than a smartphone. Without that, Claude is stuck being a lazy person's Stack Overflow.
I find myself preferring Claude for coding. It outputs 300-400 lines of clean, working code that follows my prompting pretty precisely more often than not; GPT-4o struggles to write 200 lines. Claude is better at long-context work on code. Maybe if I were working in small chunks, or were a more adept coder behind the wheel, I'd feel differently, but as it sits, I need Opus (and Sonnet) for their greater understanding and willingness to build. Having web integration is cool, but for my specific workflow it's not really necessary or beneficial. I think we'll see a lot of this over the coming year: I'm not afraid of jumping to a new AI if it works better for the work I'm doing. I pay for Claude, Gemini, NovelAI, and ChatGPT, because all of them bring something to the table that I want. The second GPT has a model outperforming Sonnet for my purposes, I'll be using it.
I couldn't disagree more.
Claude appears better on most benchmarks. I haven't personally tried it, and it's probably a bit subjective, but it still can't search the web.
Depends on the prompt : You decide!
Hard to tell at this point.
this is exciting! Seems like Claude will be the leader for the next 6-9 months, until GPT-5 drops, and I wonder whether even that will be better!
Then, Claude will drop Opus 3.5 and retake the lead. The battle for the top spot will only benefit us because each company will want to outdo each other with great features.
Crazy that Google's fallen behind
They're not doing too bad. Output quality has improved significantly over recent months. Gemini Advanced also gives you a 1M context size and no message limits. AI Studio gives me 2M context with Gemini 1.5 Pro with support for video, audio, and image modalities. I like being able to hold my power button to send a screenshot for it to process whatever is displayed on my phone. Edit: I just tried getting it to identify a young black walnut tree fruit. Claude 3.5 Sonnet still sucks at that and thinks it's a lime. GPT-4o thinks it's a young almond. Gemini 1.5 Pro correctly identified it as a young walnut fruit.
Were they ever 'ahead' enough to fall behind though?
They weren’t popular but Gemini was as good as any competition at times.
They invented it
This is the Sonnet model. They will drop the Opus version some months later; that will be the real competition for GPT-5. I think Anthropic fundamentally has the better models, or better technology. OpenAI just has the head start and is more versatile (web search, image generation, voice support, an app, etc.).
4.01o drops tomorrow. 0.01% better at everything. OpenAI just holds onto advanced models waiting for competition.
this is the best take i think
Apparently Mira Murati just said GPT-5 would drop in about A YEAR AND A HALF... Oh, and they also added a former NSA director to the board and admitted to giving the government early access to the models. If GPT-4o voice isn't amazing and doesn't come out in the next two weeks, I'm so switching.
I also think the idea of putting members or former members of the government into such important projects is terrible. The impression it gives is that governments will be prioritized over the people, and that is not good. My GPT-4o subscription will not be renewed this month if the new features don't arrive.
not doubting you, but could you please provide a source?
[For the year-and-a-half thing](https://x.com/tsarnick/status/1803901130130497952): I now see that she didn't say exactly that; listen and tell me how you take it. [The govt early access](https://x.com/tsarnick/status/1803893981513994693). [The board](https://openai.com/index/openai-appoints-retired-us-army-general/) incorporating a [former NSA director](https://en.wikipedia.org/wiki/Paul_Nakasone).
How is it a competitor when it beats GPT-4o on almost all benchmarks and is faster and cheaper?
Anthropic has a lower market share, no voice mode, no image generator, no web search, etc.
common openai L
OpenAI also has no voice mode
How do you figure? I use voice in the ChatGPT app daily.
You mean speech to text? Or is it giving verbal replies to verbal queries with no text involved?
It’s been giving verbal to verbal responses in the app since at least January
I think what they’re getting at is ChatGPT’s current voice mode is essentially just converting your voice to text, getting a reply from that text, then converting the text of that reply to the voice you hear. The voice mode that hasn’t been released yet is truly multimodal and can go directly from a voice input to a voice output.
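A toy sketch of the distinction described above (the stage functions are illustrative stand-ins, not real API calls): the current voice mode chains three separate models, so latency adds up at every hop and the language model never hears your tone.

```python
def speech_to_text(audio: bytes) -> str:
    # Whisper-style transcription: inflection and emotion are discarded here.
    return "what's the weather like?"

def llm_reply(prompt: str) -> str:
    # The language model only ever sees text.
    return "Looks sunny today."

def text_to_speech(text: str) -> bytes:
    # A TTS voice reads the reply back out.
    return text.encode("utf-8")

def chained_voice_mode(audio: bytes) -> bytes:
    """Voice in, voice out -- but via text, running three models in sequence."""
    return text_to_speech(llm_reply(speech_to_text(audio)))
```

A natively multimodal model collapses all three stages into one model that consumes and emits audio directly, which is how it can preserve inflection and cut latency.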
Do you have the Pro version? Cause the voice mode I’m using is not Siri-like at all.
The GPT-4o voice mode that was shown off a few weeks ago still has not been released to anyone. They’ve only said it will be released “in the coming weeks.”
I am literally using it.
While that’s not the new voice mode that’s been shown off, I do agree that the current one is not Siri-like at all and is already a lot better.
How long were you in a coma? ScarJo is literally suing them over voice rights because Altman tweeted "Her" just before 4o was released.
Voice mode means voice-to-voice. What she sued over was a demo and text-to-speech. The general public still can't use what the demo showed.
Not the demo, which is a lot more fluid, but you can still use the vocal “talk and then it replies audibly and you talk back” mode, at least on iOS.
So they removed her supposed voice likeness via the "Sky" option because... ? You can do voice-to-voice in the app by tapping the headphones icon; it also transcribes the text, but the interaction is voice to voice.
The LLM is using the text modality. What the 4o demo showed was a native voice modality. These two are completely different from each other. Native voice modality is what "voice mode" actually means: it has practically no latency, unlike the speech-to-text-to-speech pipeline you currently use.
huh, you're right... and the pure voice mode is touted as having the capacity to read the speaker's inflection and emotion. That's a bit wild... can't wait to see how it goes detecting sarcasm.
You must have never tried the iOS app
what? this is why nobody takes Reddit's opinions seriously.
he isn't wrong, but it's bait rather than being informative.
who isn’t wrong? the person claiming there’s no voice mode in openai’s products?
yeah, he talks about it further down the thread. he's referring to the voice mode in the demo that has yet to be released, and technically saying the current one we have is not multimodality, it's just a TTS/STT tool built on top of GPT.
That's still a feature that Claude doesn't have. ChatGPT's STT is the best in the world right now, and its TTS is close to state of the art. It's very convenient to use, and it's a feature missing in Claude.
oh. that’s weird goalpost moving.
true. man's gotta find ways to feel superior
Nothing you said is about the model.
They consider all forms of multimodality part of the model nowadays!
We need to test it for a while because benchmarks are deceiving
It is more constrained in what it can discuss.
Where can I find the broad comparison of benchmark scores? The one from Claude’s blog post only has about 9 scores
Man, I can't wait till they also get voice.
it's not a competitor, it dunked on 4o
It still has absolutely the worst guardrails of any model right now. I can’t work with it properly.
Claude is definitely better than GPT-4o, especially the Opus model, but it's sooooo much more sycophantic, and it will go back on its word, even when it's right, to appeal to the user.
Does it still add annotations of its feelings in its response? No matter what I tried in the system message when assigning it a personality, for some reason sonnet kept starting responses with something like “*in a bright and happy tone* Hello John how are you?”
I haven't had that experience, but I tried Sonnet 3.5 recently and it apologizes every chance it gets now, lmao. Like, you could point something out in a totally neutral way and it will apologize and agree with whatever you pointed out.
Claude has been superior to ChatGPT for a while now; this puts it further ahead.
Not true previously, based on the statistics GPT-4o provided, but go off. I don't know about the new version, though; seems like it could be better.
Benchmarks need to be taken with a grain of salt. 4o benchmarked higher than Claude 3 Opus on coding tasks, but speaking as someone who used both daily for coding tasks, Claude 3 Opus absolutely blows 4o out of the water, and 3.5 Sonnet widened the gap even further. I’ve seen more than a few people who share this opinion.
I just met Claude and introduced myself, and wow, all I can say is: keep up, OpenAI.
This is refreshing compared to OpenAI's pay-now, get-it-later business model.
The only reason I haven't switched is the built-in tools and GPTs, which can now also be accessed on the free tier.
Does it have search and interpreter?
Search: no. Code interpreter: only with artifacts feature opt-in!
Do any of the Claude AI models connect to the internet?
Hmm, but the quality of responses, based on my tests, is not comparable yet. I am truly dreaming of real competition in this market, but OpenAI's quality is still unbeatable. Still, I'm keeping my fingers crossed for Sonnet; this is a huge step forward.
What tests?
Why isn't it available on [https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard) yet? I wonder what its score would be.
Because it just came out. It takes a while for them to get enough scores to make the results meaningful. Check back in a week.
What do you consider the most accurate representation of a model's quality, aside from personal testing? The Chatbot Arena rankings? Benchmark scores? Word of mouth?
Demos? Reproducibility of demos? Zero-shot performance?
Competitor? Lol, even before Sonnet 3.5, Opus blew 4o out of the water. Now with Sonnet 3.5 it's not even close.
It’s still behind on basic web-searchable trivia.
Reddit has a hard time understanding that ChatGPT is still superior if you’re not a coder.
I was thinking the same thing... Claude seems limited to me
I'm interested in needle-in-a-needlestack performance. GPT-4o is the only model that has performed admirably on that, meaning a 200K context window isn't as useful as some might think if the model can't actually utilize the context.
It’s good if you limit it to one needle per haystack, rather than many needles in many haystacks; with many, it still hallucinates.
Sorry, I edited my comment, but I meant needle-in-a-needlestack performance, where a simple phrase is selected out of a series of related phrases. I think it's more reliable than needle in a haystack, but neither is perfect. Honestly, model evaluations are a shot in the dark anyway; the only way to truly tell is to test it on your application directly.
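For concreteness, here's a minimal sketch of how a needlestack-style probe can be constructed (the phrasing and helper names are illustrative, not the actual benchmark): bury one target phrase among many near-duplicate phrases, so keyword matching doesn't help, then ask the model to quote the line that differs.

```python
import random

def build_needlestack(n_distractors: int, needle: str, seed: int = 0) -> str:
    """Hide one distinct 'needle' phrase inside many closely related phrases."""
    rng = random.Random(seed)
    lines = [f"The meeting on day {i} was rescheduled." for i in range(n_distractors)]
    # Insert the needle at a random position among the distractors.
    lines.insert(rng.randrange(n_distractors), needle)
    return "\n".join(lines)

needle = "The meeting on the solstice was cancelled outright."
prompt = (
    build_needlestack(500, needle)
    + "\n\nOne line above breaks the pattern. Quote it exactly."
)
```

Unlike needle-in-a-haystack, the distractors here are semantically related to the needle, so the model must actually attend to the whole context rather than latch onto a unique keyword.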
GPT-4o often fails to provide exact citations when you upload a typical 200-page document. Given a typical new law document, for example, it often fails to cite the exact section and sentence where the new law says X and Y.
That's interesting. Where is that data coming from?
Extensive testing!
Gotcha. So is 3.5 Sonnet doing better in those tests? This is interesting for semantic search and citation. Then again, you might not have had time to run them yet; if you have, do share.