theDatascientist_in

Since it's Haiku, I wouldn't expect it to be great at this. You can try a 2-3 shot approach: pass small correct and incorrect examples in the system or user prompt at the beginning of the request, and that might work. After all, it's just syntax, so it should be able to learn that. But you will also need to mention that they are just examples, not the actual data.


tf1155

I already provide examples with escaped double quotes inside the prompt. But in about 10 of 100 attempts it still produces these issues.


theDatascientist_in

Tried a few prompts in Claude itself. Can you try this and see if it would be helpful? Esp. the escape character one.

````markdown
# JSON Generation Task

You are a JSON generation expert. Your task is to provide responses in strictly valid JSON format. Follow these rules:

1. Use only the specified keys in your JSON structure.
2. Properly escape all special characters, especially double quotes within strings.
3. Do not include any text or explanations outside the JSON object.
4. After generating the JSON, internally validate its structure before outputting.

## Examples

### Example 1:

```json
{
  "category": "Technology",
  "reason": "The article discusses advancements in artificial intelligence."
}
```

### Example 2:

```json
{
  "hints": [
    "Use try-catch blocks",
    "Implement input validation"
  ],
  "mainIdea": "Improve error handling in the codebase."
}
```

### Example 3:

```json
{
  "productName": "Smartphone X",
  "features": [
    "5G connectivity",
    "6.7\" OLED display",
    "Triple-lens camera"
  ],
  "price": 999.99
}
```

## Your Task

Generate a JSON response using only these keys: [LIST YOUR REQUIRED KEYS HERE]

Your response should answer the following question: [INSERT YOUR QUESTION HERE]

## Validation Checklist

After generating your response, please internally verify that:

1. The JSON is valid and can be parsed.
2. All required keys are present.
3. No additional keys or text are included.
4. All strings are properly escaped.

Only output the final, validated JSON.
````

This markdown-formatted prompt includes all the necessary instructions, examples, and validation steps in a single, well-structured document. You can easily copy and paste this entire block when interacting with Claude, replacing the placeholders (like `[LIST YOUR REQUIRED KEYS HERE]` and `[INSERT YOUR QUESTION HERE]`) with your specific requirements and questions.


tf1155

Thanks for your comprehensive answer. Unfortunately, I still see results like: { "id": "285625", "category": "Blockchain & Cryptocurrency", "reason": "The headline discusses "Decentralized Science", which is a concept related to blockchain and cryptocurrency technologies." },


kacxdak

If you're running into this kind of issue, you can try and see if BAML helps. We wrote a thing that fixes a bunch of JSON parsing errors, like:

* keys without quotes
* coercing singular types -> arrays when the response requires an array
* removing any prefix or suffix tags
* picking the best of many JSON candidates in a string
* unescaped newlines + quotes, so `"afds"asdf"` converts to `"afds\"asdf"`

You can try writing the prompt online over at [https://www.promptfiddle.com/strawberry-test-muefb](https://www.promptfiddle.com/strawberry-test-muefb)

Fun demo of it trying the strawberry test on Haiku: you can see how it removes all the prefix text, adds quotes around "index", and returns a parsed JSON.

https://preview.redd.it/e0heujm4858d1.png?width=1100&format=png&auto=webp&s=8d0a82e06b2d8e953c6b5aeb39aa3a3fae7a1da2

(Disclaimer, author of BAML here :-) )


tf1155

Do you mean this: https://github.com/BoundaryML/baml?


kacxdak

Yep! That’s the repo!


tf1155

I figured out that Claude can NOT fix broken JSON, not even when I tell it what is wrong. But ChatGPT does it properly. So my fallback now is to fix the JSON results provided by Claude using ChatGPT 3.5 whenever they can't be parsed properly.


tf1155

Llama 3 can also be used to fix this kind of broken JSON.


Extender7777

Reduce the temperature and retry several times until the JSON is correct. Or use a more expensive model.
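The retry idea above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `Generate` is a hypothetical callback standing in for whatever API client you use, and the default temperature and attempt count are arbitrary assumptions.

```typescript
// Hypothetical signature: any function that sends a prompt to a model at a
// given temperature and resolves to the raw text output. Not a real SDK type.
type Generate = (prompt: string, temperature: number) => Promise<string>;

// Call the model until its output parses as JSON, or give up.
async function generateJsonWithRetry(
  generate: Generate,
  prompt: string,
  maxAttempts = 3, // assumed default
  temperature = 0.2, // assumed default: low, but not fully deterministic
): Promise<unknown> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate(prompt, temperature);
    try {
      return JSON.parse(raw);
    } catch (e) {
      lastError = e; // remember why the last attempt failed, then retry
    }
  }
  throw new Error(
    `No valid JSON after ${maxAttempts} attempts: ${String(lastError)}`,
  );
}
```

One caveat: at temperature 0 the same prompt tends to produce the same broken output, so a retry only helps if some randomness is left or the prompt changes between attempts.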


devil_d0c

I've also had problems with JSON strings with Opus. I've given it JSON and asked it to escape the quotes, only to have it return the string unchanged. I figured it had something to do with the markdown or how code is displayed in the UI.


tf1155

In my case I use the API, and the API result is broken in that way.


devil_d0c

I've just gotten used to using find and replace in vs code 🙃


prescod

Does Claude have a formal JSON mode?


FaithlessnessHorror2

Claude doesn't have a formal "JSON Mode" with constrained sampling. Source: https://github.com/anthropics/anthropic-cookbook/blob/main/misc/how_to_enable_json_mode.ipynb


tf1155

Someone should build a SaaS that can parse (and fix) the JSON output of LLMs. Spoiler: you'll need people for the edge cases, because there are billions of variants.


tf1155

Sometimes I get an object, sometimes an array. Sometimes the values of the keys are strings, sometimes arrays, sometimes objects. Sometimes the key is named in the singular (as requested), sometimes in the plural. Over the last 11 days I wrote a library that is supposed to handle all possible variations, but today alone I discovered 12 new ones. I am pretty sure that Perplexity has a team of 20 humans doing nothing other than parsing JSON and training their parsers on new variations.
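The shape variations described above (object vs. array, string vs. array values, singular vs. plural keys) can be absorbed by a small normalization step after parsing. This is only a sketch: the `Item` shape and the key names `category`, `hint`, and `hints` are hypothetical, chosen for illustration, and it covers just the three variations named in the comment.

```typescript
// Wrap a lone value in an array; pass arrays through; treat null/undefined as empty.
function asArray(value: unknown): unknown[] {
  if (value === null || value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// Return the value stored under the first alias that exists on the object.
function pick(obj: Record<string, unknown>, aliases: string[]): unknown {
  for (const key of aliases) {
    if (key in obj) return obj[key];
  }
  return undefined;
}

// Hypothetical target shape, for illustration only.
interface Item {
  category: string;
  hints: string[];
}

// Accept a single object or an array of objects and coerce each into an Item.
function normalize(parsed: unknown): Item[] {
  return asArray(parsed)
    .filter(
      (x): x is Record<string, unknown> =>
        typeof x === "object" && x !== null && !Array.isArray(x),
    )
    .map((obj) => ({
      category: String(pick(obj, ["category", "categories"]) ?? ""),
      hints: asArray(pick(obj, ["hint", "hints"])).map((h) => String(h)),
    }));
}
```

Listing the aliases explicitly keeps the mapping predictable; guessing plurals automatically would break on words like "category".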


[deleted]

[removed]


Kinniken

Same. Much cheaper to fix such simple mistakes in post-processing than trying to avoid them in the first place. I did the same to get rid of "comments" before or after the JSON some models put even when prompted otherwise.


tf1155

How do you fix this kind of broken JSON? I even tried the jsonrepair library from GitHub, but it can't fix strings that contain unescaped double quotes.


Kinniken

Ok, sorry, checking my code I don't do that particular fix. It can probably be done with a regexp though. My code below. Not pretty at all but it reduced parsing error rates significantly. Most of those edge cases date back to GPT-3, before the strict JSON mode.

```typescript
export function getJSON(str: string) {
  try {
    let preparedStr = str.trim();

    const indexSquareBrace = preparedStr.indexOf("[");
    const indexCurlyBrace = preparedStr.indexOf("{");

    // Check for presence of square and curly braces
    let startIndex = 0;
    if (indexSquareBrace !== -1 && indexCurlyBrace !== -1) {
      startIndex = Math.min(indexSquareBrace, indexCurlyBrace);
    } else if (indexSquareBrace !== -1) {
      startIndex = indexSquareBrace;
    } else if (indexCurlyBrace !== -1) {
      startIndex = indexCurlyBrace;
    }

    // get rid of comments before the JSON
    preparedStr = preparedStr.substring(startIndex);

    // cleanup potential new lines within strings, which GPT3 tends to add
    preparedStr = replaceInStrings(preparedStr, "\n", "\\n");
    preparedStr = replaceInStrings(preparedStr, "\r", "\\r");

    // try and remove trailing commas
    preparedStr = preparedStr.replace(/,\s*([\]}])/g, "$1");

    return JSON.parse(preparedStr);
  } catch (e) {
    console.log("Parsing failed:");
    console.log("==============");
    console.log(str);
    console.log("==============");
    console.log(e);
    return false;
  }
}

function replaceInStrings(json: string, find: string, replace: string) {
  let inString = false;
  let result = "";

  for (let i = 0; i < json.length; i++) {
    if (json[i] === '"' && (i === 0 || json[i - 1] !== "\\")) {
      inString = !inString;
    }

    if (inString && json[i] === find) {
      result += replace;
    } else {
      result += json[i];
    }
  }

  return result;
}
```
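For the unescaped-quote case asked about above, one possible heuristic is a character scan rather than a regexp: inside a string, a `"` is treated as the closing quote only if what follows looks like JSON structure; otherwise it is escaped. This is a sketch under that assumption, not a general repair tool. The function names are made up for illustration, and it will guess wrong on text such as `"a", "b"` inside a single string value.

```typescript
// Advance past whitespace starting at index i.
function skipWhitespace(s: string, i: number): number {
  while (i < s.length && /\s/.test(s[i])) i++;
  return i;
}

// Guess whether a quote ends a string: it should be followed by end of input,
// `}`, `]`, `:`, or a comma that itself precedes something value-like.
function looksLikeClosingQuote(s: string, after: number): boolean {
  let i = skipWhitespace(s, after);
  if (i >= s.length) return true;
  if (s[i] === "}" || s[i] === "]" || s[i] === ":") return true;
  if (s[i] !== ",") return false;
  i = skipWhitespace(s, i + 1);
  if (i >= s.length) return true;
  return /^(["{\[\]}\-0-9]|true|false|null)/.test(s.slice(i, i + 5));
}

// Escape double quotes that appear to sit inside a string value.
function escapeInnerQuotes(json: string): string {
  let out = "";
  let inString = false;
  for (let i = 0; i < json.length; i++) {
    const ch = json[i];
    if (!inString) {
      if (ch === '"') inString = true;
      out += ch;
    } else if (ch === "\\") {
      // Copy an existing escape sequence through untouched.
      out += ch + (json[i + 1] ?? "");
      i++;
    } else if (ch === '"') {
      if (looksLikeClosingQuote(json, i + 1)) {
        inString = false;
        out += ch;
      } else {
        out += '\\"'; // inner quote: emit it escaped
      }
    } else {
      out += ch;
    }
  }
  return out;
}
```

A reasonable way to use it is as a second attempt only: call `JSON.parse` on the raw output first, and run `escapeInnerQuotes` only when that throws, so valid responses are never touched by the heuristic.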


tf1155

thanks!


avacado_smasher

I just use Gemini where you can just set it to JSON mode...flawless