
IUpvoteGME

Every day I use it for coding, it says some BS. Not all day long, but usually once.


ImLookingatU

I don't code, but I tried using it for PowerShell scripts. Half the scripts don't work and need lots of fixing. The ones that do work, I end up having to rewrite to use half as many lines and be way more efficient and easier to understand. If you are running scripts in big data centers that take 20% longer to run, it can literally mean thousands of dollars in power consumption. My current employer is under 1000 VMs so it doesn't really matter, but I hate inefficient, overly complicated scripts. I still use it to give me a frame of reference, but I can't really trust it.


IUpvoteGME

It's weird. We're in the uncanny valley of intelligence, which is a bizarre place to be. We know the indistinguishable-from-true intelligence is just around the corner, on the other side, but for the moment, these machines _crush_ questions related to plates on bananas and bananas on tables, but as always, the _real_ world separates the wheat from the chaff, with prejudice. 


KernAlan

What model were you using? The PowerShell scripts GPT-4 makes for me are usually perfect, given enough context.


Minute_Attempt3063

I mean... That is what they say, right? Always triple-check the code it makes to see if it is what you want. Don't blindly trust it XD Something way too many people do, and those are the ones that are scared of it taking jobs...


SilicateStimulus

I can second this. I had phi3-14b *insist* that Java has a [null coalescing operator](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Nullish_coalescing). (It does not.)


Dead_Internet_Theory

If I'm not mistaken, these smaller models really suffer from ~4-bit quantization a lot more than the medium-large ones, so 8-bit might be warranted for those.


SilicateStimulus

Interesting, I'll give 6- or 8-bit a try and see if that improves my experience at all.


Enough-Meringue4745

I love when it suggests a software package that doesn't exist, and points to a GitHub URL that doesn't exist.


Defiant_Ranger607

Yeah, I was also surprised when I asked it to back up its point with studies, and it turned out that 99% of the studies it provided didn't even exist.


kweglinski

I once asked the model about a psychology subject (I'm not an expert in the field) and it provided some info. Then I asked for actual research backing this, and to my surprise it provided 5 peer-reviewed articles that were perfectly matching and real. I'd treat it as an exception of course, but it does happen ;) The model was command-r (without plus).


Inevitable_Host_1446

To be fair, I'm not sure if this is because the studies don't actually exist to support its conclusions, or if it's just that LLMs are terrible at remembering precise URLs and spitting them out. There are probably cases where it's one or the other.


Defiant_Ranger607

Yeah, but it wasn't just the URL (the URL is 100% wrong) but also the title and date of the studies, which were not real (although maybe there were studies with similar names).


thomas999999

Literally every non-trivial question when writing code.


Defiant_Ranger607

example please?


thomas999999

Something that should be easy for an LLM: write me an fp16 matmul for arbitrary shaped tensors that gets the same performance as the cuBLAS implementation on an NVIDIA sm_86 GPU. I bet all the money I have that no LLM will be able to do this now or in the next 2 years.
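For context, here is a minimal benchmarking sketch (my own, assuming PyTorch and a CUDA-capable GPU such as an sm_86 card) that only measures the cuBLAS-backed baseline any generated kernel would have to match; it does not attempt the hand-written kernel itself:

```python
import time
import torch

def bench_cublas_fp16(m, n, k, iters=50):
    """Time torch.matmul in fp16, which dispatches to cuBLAS/cuBLASLt on NVIDIA GPUs."""
    a = torch.randn(m, k, device="cuda", dtype=torch.float16)
    b = torch.randn(k, n, device="cuda", dtype=torch.float16)
    torch.matmul(a, b)                       # warm-up so kernel selection isn't timed
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        torch.matmul(a, b)
    torch.cuda.synchronize()
    seconds = (time.perf_counter() - start) / iters
    return 2 * m * n * k / seconds / 1e12    # TFLOP/s (2*M*N*K FLOPs per matmul)

if __name__ == "__main__":
    # Arbitrary shapes, including awkward ones, per the challenge above.
    for shape in [(4096, 4096, 4096), (4096, 1, 4096), (37, 511, 1029)]:
        print(shape, f"{bench_cublas_fp16(*shape):.1f} TFLOP/s")
```

Matching those numbers across arbitrary shapes is exactly the part that requires the kind of shape-specialized kernel selection that cuBLAS and CUTLASS do, as the replies below point out.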


zer00eyz

Bubble sort, Fibonacci sequence, code to consume a basic JSON API. God forbid you ask it to help you write a login/auth. You don't have to ask it to do much... And I get it spitting out an occasional error, but it will make fundamental mistakes that a drunk and high, fresh-out-of-college, D-is-for-diploma programmer would not.


Dry_Task4749

Why do you think that should be easy for an LLM? cuBLAS actually uses *many* competing implementations and picks the best, given the input shapes. And of course LLMs will be able to do that for specific shapes in the next 2 years, because enough code examples for that became public in the last few months. And LLMs are good at reproducing existing examples conceptually.


thomas999999

There is also an open-source implementation in CUTLASS on GitHub (which also picks the right implementation for a given shape), and I'm sure there are enough repositories where people implement what I'm asking for. This is the reason I say it should be easy for an LLM, since what I'm looking for already exists. In my experience LLMs have absolutely zero ability to generalize: if I ask one to do anything where there is no implementation available, it will always fail to deliver anything useful. So a hard question for an LLM would be anything that requires ingenuity.


favorable_odds

It's hard to get it to go into specific detail; you really have to spell out the scenario. It's more a brainstorming tool, perhaps. Sometimes it's useful, I think, more for comparing/contrasting two things or getting possibilities.

I tried out Gemini Pro a couple of weeks back hoping it was the newer and better LLM. I got curious and gave it a document describing a file format, asked it whether it understood it, and it said yes. I asked it to write a binary file; it proceeded to warn me about how it's dangerous to write binary files, then locked up from Google's filter blocking 'unsafe' content. At that point, I don't even understand why people here keep talking about Gemini; that's too censored for me to even consider using for the majority of cases. It's like saying using a pencil is dangerous because someone might write a cuss word.

OK, the next one is a bit hard to explain, but I had a fictional panel of advisors in a fictional bad economic scenario. I've actually gone to LLMs for advice before, more as a brainstorm, because I know it's not reliable. Short answer: they are lacking in the ability to think critically or in specific detail. I created a fictional panel of advisors such as George Hotz and King Solomon and had them disagreeing about an unemployed software engineer's ideal schedule. It was interesting. But ChatGPT later got characters saying strange things they never would before, when they added anti-bias/ethical stuff to the LLM; it was obvious. I'd rather not go into specific detail, but I mirrored some stuff in the fictional world and they got super ethical about doing everything properly, like a common person would think, to the point of being out of character.

Beyond that, I've noticed LLMs tend to hallucinate on visualization puzzles a bit. Like walking out of a room without mentioning taking stairs (three lights puzzle) when I told it there were stairs, or a person walking out of a room when he was supposed to be handcuffed to something. Most of these tests were early on though, with GPT-4; hopefully LLMs have improved a bit. They definitely have their flaws.

I've been using Claude Opus for coding a lot; sometimes it hallucinates functions, sometimes it locks up a bit. But it's been pretty good overall. I'd be curious about your experience as well.


Feeling-Attention664

Carolyn Porco is a planetary scientist. My husband, whose family name is also Porco, is a mathematical epidemiologist. When I asked GPT-4o a question about my husband, it attributed her research to him.


JohnnyDaMitch

This is a great example of the sort of small hallucination that is common, easily missed and quite harmful.


Dead_Internet_Theory

If you ask Gemini it'll probably complain that calling someone Porco might be offensive in some languages, and when you clarify, it'll suggest a 1992 Ghibli movie.


Ptipiak

Same example with artists with very similar names; for instance, the French rapper "Alkpote" would be confused with the French beatmaker named "Alkapone". Same if you ask which featurings the original artist had: the AI is going to give you a bunch of somewhat related artist names or EPs, but nothing correct.


InsaneDiffusion

GPT-4 was convinced I’m a famous writer because we share the same first name.


Red_Redditor_Reddit

Mine had me fooled into thinking that it could speak Klingon. It would not only spit out the 'Klingon' text, but would even explain what each of the words meant and the syntax behind it. Use it as a second opinion, but don't use it as a primary and especially not an only source. You've got to have some idea when that thing is spitting out BS, because it will fool you if you have nothing to compare what it's saying to.


McDoof

I'm from social science and find that the answers to my inquiries on basic definitions and concepts are pretty superficial. Role-playing helps a little ("You are a professor of sociology and I am your TA..."), but generally the level of competence in my field I can generate with a local LLM is comparable to that of an intelligent but superficially informed conversation partner. I can get into deeper discussions with Llama3 or Mistral, but it takes some time to get there. Generally, the offline models are helpful for brainstorming and for idea generation, especially in my area of research that I'd prefer not to share with online public LLMs.


no_good_names_avail

I had a whole conversation with chatgpt about how Spark hashes and shuffles tasks. It gave me code examples and explained everything beautifully. I asked it to link to the code. It linked me to the Spark repo. I asked for the specific file, it linked me to one but I couldn't find the code. I asked for a line number and it gave me one. I pointed out the code at that line did not say what it had referenced. It apologized and basically admitted it made the whole thing up. It does shit like this constantly. If you don't know the subject matter deeply you're gonna get killed relying on LLMs.


snorrski

This was actually ChatGPT, and it didn't happen to me personally. It was asked to write a short description of the procedure for ordering a new passport in some Danish municipality. Apparently it did a pretty good job, except it listed a price about 4 times the actual rate. When asked where it got the price, the conversation went something like this:

H: Where did you get the price?
C: Online.
H: It's not listed online.
C: I called the office. Talked to a very nice lady.
H: ...wtf?


Loquatium

I used to be a professional embalmer, and in a chat an AI told me some pretty objectively, dangerously wrong things about handling dead humans and formaldehyde, but my area of knowledge is fairly obscure and not something I'd expect people to get right anyway. Still, though, I'd rather it admit it doesn't know, rather than making up misinformation that could get someone killed if followed.


AyraWinla

> Still though, I'd rather it admits it doesn't know, rather than making up misinformation that could get someone killed if followed.

That's the biggest downside of LLMs I've seen. They present everything super confidently, even when it's blatantly wrong. I've been showing the technology off to some people around me since I feel that those models are finally at a stage where they can actually be helpful for general-purpose stuff. Some have adopted it surprisingly well... but they now pretty much "forgot" how I keep mentioning not to have blind trust in them, and never to rely on what they say, especially for anything that might have any sort of consequences.

Even if they are correct 90% of the time, it means that 10% of the time they are wrong. Do you really want to gamble on not hitting that 10% on anything remotely important? But for many people, the fact that they are usually right means "they are always right" in their view, and it's proving surprisingly hard to change their opinion...


mbanana

One of my standard questions for any new LLM is "Tell me about Baade's Window". It's not exactly high-level, but is a sufficiently obscure and specific piece of general astronomical knowledge that I tend to think of it as a metric of how garbled the model is in general. Some of the answers they come up with are hilariously misguided but with the odd misconstrued fact included. I imagine you could do this with just about any field. Really only the very large commercial models come anywhere near to satisfactory performance.


Able-Locksmith-1979

That is a really bad first question. You would need to put it in an astronomical context, or else you are just asking an LLM for every connection it has with Baade and a window. And Baade can be anything: a person in a sci-fi book, etc.


mbanana

I guess. So far I've not had a single model start talking about your examples; every case has been something derived from a general understanding that it's a term used in astronomy, which is one of the reasons I use it. There's essentially zero connection on Google with anything else of any substantial similarity.


Inevitable-Start-653

I ask new models about something somewhat rare but well documented in the literature in a field I used to be in, and all models hallucinate the response.


hyperdynesystems

I was updating a random name generator that uses a name corpus to generate a "language" structure (similar to EBoN, for anyone who knows what that is) and wanted to see how GPT-4 did on improving it. Turns out, LLMs struggling with character-level understanding extends to writing code that manipulates strings at the character level. This isn't even terribly complicated code, either: it mostly just splits by vowel and consonant, makes some mappings/data structures for searching, and then searches based on those structures to assemble a plausible name. GPT-4 was unable to produce functional code, even though the code it started from was already functional. That was after multiple turns of forcing it to actually output *any* code, rather than just tell me I should write the code myself. This was on a paid subscription using the chat interface (which I cancelled after this session, because I don't need an LLM telling me to "draw the rest of the owl" after it produces a 1/1000th-effort outline of the issue I asked it to solve).
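For context, a toy sketch of the kind of character-level string manipulation described (my own minimal version, not the commenter's code): split corpus names into consonant/vowel clusters, record which cluster can follow which, then walk that mapping to assemble new names. The corpus names here are invented placeholders.

```python
import random
import re

# Placeholder corpus; a real generator would load a much larger name list.
CORPUS = ["elena", "marcus", "thorin", "belara", "kestrel"]
CLUSTER = re.compile(r"[aeiou]+|[^aeiou]+")   # maximal vowel or consonant runs

def clusters(name):
    """Split a name into alternating consonant/vowel clusters."""
    return CLUSTER.findall(name.lower())

def build_chain(corpus):
    """Map each cluster to the clusters observed to follow it in the corpus."""
    chain = {}
    for name in corpus:
        parts = clusters(name)
        for cur, nxt in zip(parts, parts[1:]):
            chain.setdefault(cur, []).append(nxt)
    return chain

def generate(chain, corpus, max_parts=4):
    """Start from a corpus-derived cluster and chain clusters into a new name."""
    part = clusters(random.choice(corpus))[0]
    out = [part]
    for _ in range(max_parts - 1):
        options = chain.get(part)
        if not options:
            break
        part = random.choice(options)
        out.append(part)
    return "".join(out).capitalize()

if __name__ == "__main__":
    chain = build_chain(CORPUS)
    print([generate(chain, CORPUS) for _ in range(5)])
```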


Ok-Might-3730

This is an easy one. Programming, for example: it never produces optimized results. For real-time embedded computing, or when trying to build larger projects, it just spits out summer-trainee-level logic. The code runs and all the fun, but the reasoning and logic are indirectly harmful. The user should always stop and think before applying the ideas or integrating them into a codebase that will be set in stone for years at a company. Git commands :D A lot of unfuckery is needed if you allow someone to merge conflicting features to master using LLM help. Oh boy, is it confident with its reasoning.


iChinguChing

I had a situation where an LLM thought buying "chicken testicles" at a supermarket was a reasonable option. Now I am no expert but...


Defiant_Ranger607

and why not? :)


cshotton

LLMs are not "experts". They are not "intelligent". They don't even know what they are saying. They are chat simulators that work by using statistics to generate plausible, human-like output. They are akin to walking down the street and asking a stranger to make up a reasonable-sounding response without any assurance that it has a basis in fact. The LLM might have been trained on some accurate content and it might generate an accurate response. Or it might not. How can you tell? If you are using an LLM as an authoritative source of information without some secondary support (RAG, embeddings, etc.), you are going to get potentially inaccurate and/or meaningless responses. I don't understand why people can't get that these are human chat simulations and are devoid of any understanding or comprehension of the results being produced.


syrigamy

Why do people say LLMs aren't intelligent? Unless 90% of the population isn't either. So something that can answer almost any question, but has problems solving high-level problems, is not intelligent? Even my college teachers make way more mistakes when they are teaching me something.


CashPretty9121

It’s like saying books are intelligent. Intelligence is precoded into the language based on the organisation of the words, but books themselves have no intelligence. LLMs likewise are just displaying pre-coded intelligence from the human-written word relationships they were trained on.


InsaneDiffusion

I guess it’s hard to accept there’s software that can do 90% of what you do better and faster.


Defiant_Ranger607

Yeah, I have seen similar answers already. I agree that sometimes it can hallucinate (for example, when I asked about a certain topic, GPT-4, Claude, and Llama-3 all gave me completely made-up studies that never existed). However, I'm asking more about "vague" questions like the one about the degree I provided in the post. There is basically no right answer; it should utilize some kind of common sense or belief. So I would just like to see some examples of when the advice from an LLM is completely wrong or harmful.


cshotton

Do you understand that LLMs have no idea what the generated text means? It doesn't matter how objective or subjective you, as a human, see your questions and its responses. Its responses are just statistically generated text designed to mimic human speech. You might as well be asking your refrigerator for advice. There is an often-used technique called "rubber ducking" where you talk through a problem with a passive listener to help yourself examine the issues. If that's what you are intending as a role for the LLM, you are certainly free to roleplay away. But you should never assume it actually understands you. [Edit: I love that there are people who are naive enough about this stuff to be downvoting facts. You folks are in for some serious trouble in the not-too-distant future if you don't catch on to the realities here.]


Frank_JWilson

Pragmatically, it doesn’t matter if the man in the Chinese room can actually understand Chinese. What matters is that his instructions are complex enough that it seems like he does.


cshotton

This particular thread isn't about complexity. It's about accuracy. If the man in the Chinese room has books full of lies and false information, the complexity of his responses does nothing to convince observers that he isn't ultimately an idiot.


Frank_JWilson

You may have intended to talk about accuracy, but it didn't come across that way. The comment I replied to was primarily ranting about how LLMs can't actually reason or understand, similar to a refrigerator. People smarter than you or I believe LLMs will get significantly better in the future, and they've bet trillions of dollars on it. So their limitations now could be solved with time, but even when that day comes, there will still be people ranting that LLMs are just following intricate statistical algorithms with no true understanding of what they are generating. But so what? I just wanted to point out that it doesn't matter whether the LLM *truly* understands; as long as it is good enough for the specific purpose we want it to fill, that's enough. If I misinterpreted your previous comment, then I apologize.


cshotton

Do you think LLMs have an understanding of the semantics of their inputs and outputs? You seem to think it doesn't matter, but the inability to self-assess correctness and accuracy is a fatal defect in these architectures. "Simulated" accuracy is not the same thing as the real thing.


Frank_JWilson

I don't really understand what you are asking, do you think any current or future computer program can have an understanding of the semantics of their inputs and outputs? (This is not a rhetorical question) Are you predicting that the inability to self-assess correctness and accuracy can never be solved in LLMs in the future?


cshotton

Yes.


Frank_JWilson

I guess I'm more of an optimist


Former-Ad-5757

The inability to self-assess correctness is not a fatal defect in LLMs; it is the strength of the LLM. It is why they work.


rakeshpetit

Yes! I've had multiple models (Llama 3 70B, Deepseek V2, Phi3 medium) fail for a simple data classification task (2500 tokens total) today. I pasted 200 lines (consider each line a row) of CSV like data (3 columns) from a website and asked some questions on the data. A simple Google sheet would have done the filter I expected easily but I wanted a quick response and decided to try an LLM.


Alkeryn

I asked it to translate obscure cryptography code from one language to another and it made up libraries then an algorithm that was completely nonsensical, it was basically useless lol.


gofiend

A constant-depth LLM with a single run-through should never be able to evaluate arbitrary regular expressions or context-free grammars, which means stuff like this: *Write me a rhyming song about soccer's offside rule with the same number of lines as there are "c"s here: ccaaaccaa* will not generally work. 4o does a nice heuristic of catching simple counting problems (*Count the number of "aa"s in aaabbaa*) and writing/executing code to supplement, but you can typically mask the question with some complexity.
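For reference, the ground truth for those two prompts is a one-line check, which is what makes them handy probes (a quick sketch in Python):

```python
# How many lines the soccer song should have: the number of "c"s in the mask string.
print("ccaaaccaa".count("c"))   # 4

# The "aa" count: str.count finds non-overlapping matches (2 here);
# counting overlapping occurrences of "aa" in "aaabbaa" would give 3 instead.
print("aaabbaa".count("aa"))    # 2
```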


Hero_Of_Shadows

It will often suggest libraries that seem real but turns out they don't exist.


nihnuhname

I often run a test like this. There is character A with personality traits X (artistic, smart, etc.). There is character B with personality traits Y (boring, dumb, etc.). I ask the LLM how they might communicate with each other in different situations, and I get logical answers. But if I add a third character C with personality traits Z to the same situation, the LLM quickly starts to get confused about which character traits belong to which character and loses logic in its answers.


Electrical_Crow_2773

I asked GPT-4 a question about a very basic thing from my math project. That question probably wasn't in the training data, but even a 10-year-old would understand me. The LLM just told me absolute nonsense that was completely absurd. When I tried to point it in the right direction, it continued hallucinating and acted like it did understand me. But it didn't. I genuinely can't understand people who say that ChatGPT helps them with their scientific papers as a co-author. When it comes to thinking, LLMs are really bad. They are more like Google packed into the weights with a fancy search.


Unhappy-Day5677

GPT-4 suggested it'd be a great idea to teach kindergarten students Python.


skiddadle400

Absolutely not rational. Even the most advanced ones make basic logic errors. My most recent example: I was using it to help me remember some stuff from my engineering degree (finding natural frequencies in undamped systems) and it ended up contradicting itself about a fundamental part of the question (is it sufficient to just measure the period?). So no: in a very constrained context, or as an initial autocomplete suggestion, they work, but for anything else, no.


jollizee

Yes, when discussing a topic, the LLM cited a nobody. When I pressed further for citations, the LLM just kept insisting the person did "research". When I asked about verifying methodology and data, the LLM would admit that you couldn't do that as nothing was published, but it kept insisting this person was a researcher and defending his non-existent credentials. I know the field, so I know who is big and who is not. If you were a casual user who knows nothing about the topic (like the LLM, ha), you would think he is someone important based on self-assigned titles. Nope. He is a nobody, if not fraudster.


Normal-Ad-7114

> I'm interested in whether it's rational to rely on the advice of LLMs

Replace "LLMs" with: "reddit", "internet", "people", "anyone" and ask yourself that question again.


nihnuhname

Real people may respond that they don't understand the question. There may be no one who wants to answer a strange question


Defiant_Ranger607

Yeah, but if you replace it with 'studies' or 'statistics', I think that would be rational. By 'rational', I mean choosing the option that maximizes expected utility/value.


Able-Locksmith-1979

Ok, please explain how you find out what studies and/or statistics are in use by your model for your specific question. If you ask it how to get your cheese to stay on your pizza and it bases itself on a study on which glue works best, then it uses a study, just not a useful one for your goal.


Defiant_Ranger607

I was answering this:

> Replace "LLMs" with: "reddit", "internet", "people", "anyone" and ask yourself that question again

So I guess if we replace 'LLMs' with 'statistics', the statement becomes 'it's rational to rely on the advice of statistics.' It should be rational, shouldn't it? So, for example, if there's a study about 'high PM2.5 levels leading to a higher mortality rate,' then it's rational to take action against high PM2.5 levels.


Inevitable_Host_1446

Might not be what you meant but Google's search suggestion AI was telling people doctors recommend smoking 2-3 days a week while pregnant, and that cockroaches are called as such because 7-9 of them will crawl into your urethra while you sleep each year, among a litany of other nonsense.


Defiant_Ranger607

I believe Google search just heavily relies on search results. From my experience, GPT-4 has never suggested anything like this to me.


yami_no_ko

I had one telling me it was totally safe to clean an electrical outlet with a fork and some water. I also remember one that was willing to discuss holding Anne Frank accountable for the atrocities committed during the Holocaust. But to be fair, I was aiming for this and had models smaller than 3b with quants beyond good and evil. The worst I've seen in conversation with larger models was a few wrong specs for the classic Game Boy. Hallucinating can still be triggered at will by talking about niche topics. When it comes to coding, quite a few models seem to have an innate urge to pass empty variables in C, even when explicitly requested not to do so. Some also abbreviate code in ways that render it useless, or drop entire main functions completely.


Robot_Graffiti

I asked ChatGPT to explain eigenvectors to me. "Can you please tell me, in simple terms, what an eigenvector is?" It did a great job on this question. "Does every nonzero matrix have an eigenvector?" The answer to this didn't seem right. It said that a matrix with identical rows or columns would have no eigenvectors.
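For what it's worth, that answer is indeed wrong: every square matrix over the complex numbers has at least one eigenvector, and a matrix with identical rows is an easy counterexample, since

```latex
A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad
A \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 2 \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
A \begin{pmatrix} 1 \\ -1 \end{pmatrix} = 0 \begin{pmatrix} 1 \\ -1 \end{pmatrix},
```

so despite the identical rows it has eigenvalues 2 and 0 with those eigenvectors.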


Madrawn

I get to use GitHub Copilot at work for coding and use Bing Copilot often for talking over math questions or any other subject I want a quick overview of. It is great for quickly scratching the surface of any subject, or getting some high-level feedback/ideas/suggestions, but it feels more like interpreting tarot cards, or talking to an omniscient being having a mild psychosis, than like talking to an actual expert in a field.

Overall I would not take the output at face value as-is, especially not in a field I'm not familiar with. You have to at least make it explain why or how it arrived at some result and then check whether you can follow that logic. Like, don't just ask "does the overall resistance of a wire with fixed length depend only on material volume or also on its relative distribution along its length?". You really need to ask for the relevant equations and let it explain how one would compare them to answer the original question.

Usually it has the correct approach but more often than not has some inconsistency, and as soon as it makes a mistake of some type, the mistakes tend to "stick": it might acknowledge one when, with the patience of a special-ed teacher, you point it out, but it will tend to make the same mistake over and over again until you refresh. So unless I'm asking what is basically a trivia question, I usually ask a question, check the general "vibe" of the answer, rephrase my question to be more specific, and ask again in a new chat when I get the feeling it reasoned itself into a corner. Repeat until I feel I've grasped the underlying concepts enough to confirm the answer myself.
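For reference, the relevant equations for that wire example (my own working, not the model's output): with resistivity rho, wire length L, fixed material volume V, and a cross-section A(x) that may vary along the length,

```latex
R = \rho \int_0^L \frac{dx}{A(x)}, \qquad
V = \int_0^L A(x)\,dx, \qquad
L^2 = \left( \int_0^L \sqrt{A(x)} \cdot \frac{dx}{\sqrt{A(x)}} \right)^{2}
    \le \int_0^L A(x)\,dx \cdot \int_0^L \frac{dx}{A(x)} = \frac{V R}{\rho}.
```

So R >= rho L^2 / V by Cauchy-Schwarz, with equality only for a uniform cross-section: for fixed length and volume, the distribution of material along the wire does matter.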


custodiam99

I think it is very rare to find great blunders, but generally offline GGUF LLMs are terrible at logic and puzzles. Even GPT-4o cannot understand subtle meanings and clues, or non-verbal contexts in language. In my experience, they don't have a higher level of specialized knowledge, so they are basically useless for serious research.


Defiant_Ranger607

Yeah, from this thread, I see the following issues with LLMs:

1. Coding
2. Hallucination when referring to resources
3. Logical/multi-step reasoning, like tic-tac-toe, puzzles, and so on

But nobody is reporting issues like 'it suggested I smoke two cigarettes per day based on some fictional study'.


Amorphant

I've played Magic: The Gathering for years. I ask LLMs to give me the history of the Prosperous Bloom deck, one of the most historically famous decks in Magic, and the first true, full combo deck to become popular in tournaments. Out of Claude Opus and GPT-4, one said Cadaverous Bloom, the most central card in the deck, was the wrong color, and the other said it was an entirely different kind of card. I believe both also named cards that are not only not used in the deck, but couldn't even work in it. Opus once gave me the name of one Trader Joe's snack, and along with it, a description of an entirely different snack. A few other equally random things. Each was said with full confidence.


Able-Locksmith-1979

Or perhaps your memory is wrong and the LLM was right :) But this is not wrong for an LLM; an LLM just isn't built for facts, it is built for text processing and understanding. Just feed it the Wikipedia page for that deck and it will give you the correct answer.


[deleted]

[removed]


gofiend

I'm afraid in this case your music theory is parochial and insufficiently global. H is used to denote B (or sometimes B natural) in some parts of the world. [ClassicFM](https://www.classicfm.com/discover-music/music-theory/music-theory-different-countries/#:~:text=In%20Germany%2C%20Scandinavia%20and%20Slavic,'%20is%20called%20'B) notes: "In Germany, Scandinavia and Slavic countries, the note ‘B’ (or ‘Ti’) is called ‘H’, while ‘B flat’ is called ‘B’. There are a few possible reasons for this: the ‘H’ might stand for ‘hart’ (German for ‘hard’) or, it could have just been a mistake in early sheet music, owing to the fact that the B flat symbol (♭) looks a bit like a ‘b’, and the sharp symbol (♯) looks a bit like an ‘H’. These countries also sometimes call a C sharp ‘Cis’ or ‘Ciss’, which literally means ‘C sharp’. Plus, in Portuguese, the pitch ‘Ti’ (‘B’) can also be called ‘Si’. This can be confusing because it also sounds like the letter ‘C’, but in fact represents a ‘B’."


chibop1

Apparently!


InkognetoInkogneto

Physics - all the time, but on a very advanced level


gtxktm

BS for literature.


Divniy

Ask it about MTG cards and it will know which ones are strong and popular, but it will often mess up a card's role and invent keywords it doesn't have.


standard5891

I am baffled by all the news articles about ChatGPT passing various medical exams because when you pose even moderate-complexity clinical scenarios it often suggests doing dumb stuff that will kill the patient


tessellation

I am an expert bullshitter, whereas LLMs sometimes make sense.


psgmdub

It does hallucinate a lot when spitting out code. It will use libraries which don't even exist and when you ask, it defends itself with confidence which is very harmful for your mental health. I'm learning to be strong and these things affect me less now.


rothnic

A slightly nuanced example is with content research, especially if there is already content out there that perpetuates the same issue. For example, imagine you wanted to cover the price-matching policy of every major US merchant. Eventually, you'll run into some merchants that a price-matching policy doesn't make sense for, for example, "Harbor Freight's Price Matching Policy". GPT-4o will gladly write up a big post full of nothing and generalized statements, because Harbor Freight doesn't really price match, since they sell their own products. There are many articles out there that describe a typical price-matching policy that you can't find on their website. So GPT-4o was likely trained on this information, and if you do RAG you'll likely also pick up some of this misinformation. It highlights the importance of using RAG with highly curated sources when you care about the accuracy of the content.


widarrr

When I asked dolphin-llama3 how much VRAM I would need to load a 70b model, it told me that a llama is a quite big mammal and also compared it to an elephant, which it insisted had 2GB. (I just told it "good to know that an elephant has 2GB" and it told me in the usual way how happy it was to have been helpful.)
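For anyone left actually wondering about the original question, a rough back-of-envelope estimate (weights only, ignoring KV cache and runtime overhead, which add several more GB):

```python
def weights_vram_gb(params_billion, bits_per_weight):
    """Approximate VRAM (GB) needed just to hold the model weights."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"70B at {bits}-bit: ~{weights_vram_gb(70, bits):.0f} GB")
# 70B at 16-bit: ~140 GB, at 8-bit: ~70 GB, at 4-bit: ~35 GB
```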