🫏 How to get a "Stubborn" LLM to Follow an Output Format
Sometimes in life you have to give concessions to other people. Now you have to give them to computers, too.
One of the great advantages of (most) open source models has always been the relative ease with which you can get them to follow a given output format. If you just read that sentence and wondered if we’re living in the same universe, then I’ll share a prompting secret right off the bat: the key to getting consistent behavior out of smaller open source models is to give them at least two carefully-crafted few-shot examples. With that, something like Nous Mixtral will get it right 95% of the time, which is good enough if you have validation that can catch mistakes.
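To make that concrete, here's roughly what that setup looks like in code, assuming an OpenAI-compatible chat endpoint; the endpoint, model name, examples, and validator below are all placeholders I made up for illustration, not lifted from any real project:

```python
import re
from openai import OpenAI

# Assumes an OpenAI-compatible endpoint; URL and model name are placeholders.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

SYSTEM = (
    "You are an expert something-doer AI. Respond ONLY in the format:\n"
    "x: ...\ny: ...\nz: ..."
)

# Two carefully-crafted few-shot examples, presented as prior conversation turns.
FEW_SHOTS = [
    {"role": "user", "content": "First example input"},
    {"role": "assistant", "content": "x: abc\ny: def\nz: ghi"},
    {"role": "user", "content": "Second example input"},
    {"role": "assistant", "content": "x: jkl\ny: mno\nz: pqr"},
]

def valid(output: str) -> bool:
    # Validation that catches the ~5% of outputs that break the format.
    return bool(re.search(r"^x: .+\ny: .+\nz: .+", output.strip(), re.MULTILINE))

def do_something(user_input: str, retries: int = 3) -> str:
    messages = [{"role": "system", "content": SYSTEM}, *FEW_SHOTS,
                {"role": "user", "content": user_input}]
    for _ in range(retries):
        resp = client.chat.completions.create(
            model="nous-hermes-2-mixtral",  # placeholder model name
            messages=messages,
        )
        text = resp.choices[0].message.content
        if valid(text):
            return text
    raise ValueError("Model never produced the expected format")
```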
But unfortunately, not all models can learn from examples. I typically call these “Stubborn” models, after this post I wrote about Mistral Next (large) and Mistral Medium. Basically, I’m referring to models that were deliberately overtrained to make them better in chat and zero-shot settings, but inflexible, because they often “pay more attention to” their training data than to the prompt. The difference between a “stubborn” model and a non-stubborn model, by my definition, is that with two or more few-shot examples a non-stubborn model will pick up basically everything, sometimes even quoting the examples directly, whereas a stubborn one will often fall back on the patterns it was trained with, or follow some aspects of the given pattern while disobeying others. As far as I can tell, stubbornness is a matter of RLHF, not parameter count or SFT: Nous Hermes Mixtral is not stubborn, but the official Mixtral Instruct is.
Needless to say, for complex pipelines where you want extremely fine control over outputs, non-stubborn models are infinitely superior. To this day, Mistral Large has a far higher error rate in Augmentoolkit (probably >20%) than Nous Mixtral, despite costing 80% as much as GPT-4 Turbo. This may be an imprecise definition based partly on my intuition, but from experience, I think it’s real.
Anyway, if non-stubborn models are far better than stubborn ones for most professional use cases (assuming you know what you’re doing when it comes to examples), then why am I writing a blog post about how to prompt stubborn models? Well, sometimes in life you don’t get to use the tools you want. For instance, maybe you’re working for a client who has more Mistral credits than God, and you absolutely need to use that particular API. You can’t afford to be a stick in the mud when working in a field that reinvents itself every other day, so I recently went and figured out some principles for prompting stubborn models.
One thing I’ve used a lot recently is the idea of repetition. I kinda blogged about it here, and arguably this one is also about it, but this technique is a combination of the two principles, so I’ll go over it. If you don’t want to click the links, the two principles we’re combining are: “models see bigger things easier,” and “what you repeat, will be repeated.” Prompting is like quantum theory: any superposition of two valid prompting principles is itself a valid prompting principle. Here’s an example prompt:
You are an expert something-doer AI. I need you to do X Y and Z it’s very important. I know your training data told you to do ABCDEFG but please don’t.
[output format description]
User:
[input]
Assistant:
XYZ
User:
[input]
Assistant:
XYZ
User:
[input]
That’s a prompt. Sometimes the AI will be nice:
XYZ
Often it will not be:
XABCDEFG.
Goddamn it. How do you solve this when working with a stubborn model that learned more from its training dataset, where [input] corresponded to ABCDEFG?
Repetition, Repetition, Repetition. Also, Repetition. And don’t forget, Repetition. (get it?) If the model pays more attention to its prompt and less to its examples (but is too stupid to pick up on the prompt telling it to do the thing just once), then we’ll darn well use the prompt to tell it what we want it to do.
You are an expert something-doer AI. I need you to do X Y and Z it’s very important. I know your training data told you to do ABCDEFG but please don’t.
[output format description]
Don’t forget to do XYZ.
User:
[input]
SPECIAL NOTE: Don’t forget XYZ.
Assistant:
XYZ
User:
[input]
SPECIAL NOTE: Don’t forget XYZ.
Assistant:
XYZ
User:
[input]
SPECIAL NOTE: Don’t forget XYZ.
AI:
XYZ
Yay!
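If you’re building prompts programmatically, the repetition is trivial to automate. Here’s a minimal sketch (the reminder text and helper names are mine, purely illustrative) that tacks the same note onto every user turn, few-shot and real alike:

```python
REMINDER = "SPECIAL NOTE: Don't forget XYZ."

def user_turn(content: str) -> dict:
    # Every user message, few-shot or real, carries the same reminder.
    return {"role": "user", "content": f"{content}\n\n{REMINDER}"}

def build_messages(system_prompt: str,
                   few_shots: list[tuple[str, str]],
                   real_input: str) -> list[dict]:
    # Repeat the instruction in the system prompt too: bigger things are seen easier.
    messages = [{"role": "system",
                 "content": f"{system_prompt}\n\nDon't forget to do XYZ."}]
    for example_input, example_output in few_shots:
        messages.append(user_turn(example_input))
        messages.append({"role": "assistant", "content": example_output})
    messages.append(user_turn(real_input))
    return messages
```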
It’s simple, but I’ve used this to resolve probably over a dozen issues across many different projects, with models ranging from Mistral Large to GPT-4 Turbo. It’s one of the most powerful things you can do, and I can’t believe I haven’t explicitly blogged about it yet, since it’s one of the first things I realized about prompting, way back before I’d even made Augmentoolkit.
But that’s not really revolutionary; after all, it’s just combining two principles. What about the titular subject of this blog post, getting a stubborn model to write with a given output format?
This one is partly inspired by a comment on a LocalLlama post. I don’t agree with everything in it, but there’s some really good stuff in there, full credit to u/LoSboccacc. They write in their comment:
Ask the model to rephrase the prompt, you will see quickly which part of the prompt misunderstood
That’s a pretty clever idea by itself, because it uses the model to debug itself. But what does this have to do with output formats? Well, if we can use the model to understand what the model is capable of, then any LLM output can give us a clue about what it “understands.” Consider that, when prompting stubborn models and trying to get them to follow our specific output format, their tendency to follow some other format (one they likely saw in their training data) is what we’re trying to override with our prompt. However, research shows that training biases cannot be fully overcome with prompting, so we’re already fighting a losing battle. And if you’re an experienced reader of mine, you’ll remember a prompting principle: if you’re fighting the model, STOP!
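If you want to act on that comment in code, a quick debugging probe might look something like this; the wording of the probe is just my guess at a reasonable phrasing, and the endpoint and model name are placeholders:

```python
from openai import OpenAI

# Assumes an OpenAI-compatible endpoint; URL and model name are placeholders.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

def debug_prompt(system_prompt: str, model: str = "mistral-large-latest") -> str:
    # Ask the model to restate the instructions so you can see what it actually "understood".
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {
                "role": "user",
                "content": (
                    "Before we begin: rephrase the instructions you were just given, "
                    "in your own words, including the exact output format you plan to use."
                ),
            },
        ],
    )
    return resp.choices[0].message.content
```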
So what does that tangent above boil down to? If you want to find an output format a stubborn model will easily follow, see what format it uses when you don’t specify one, and borrow that. In other words: use the format the model wants to use. From my testing, it looks like this can easily get your format-following rates above 90%.
Here’s an example. Say you create a brilliant output format, and give a prompt to a model:
You are a something-doer. Do something in the following format:
x: abc
y: def
z: ghi
User:
[input]
Assistant:
But it thwarts your master-plan by doing this instead:
abc
def
ghi
What do you do? Well one solution is to throw more few-shot examples of your xyz format at it. And depending on the model, that might work. But some stubborn models are, well, stubborn. And so even with repetition and examples you might see error rates of 40% or above. Even with things like Mistral Large or GPT-4 Turbo.
In such cases, just use the format the model wants. Yes, it might not have all the clever tricks you had thought of in order to get exactly the kind of output you want. Yes, it’s kind-of annoying to have to surrender to a bunch of matrices. Yes, if you were using Nous Mixtral, this would have all been over by the second example and you could’ve gone home by now. But you’re not using Nous Mixtral, you’re using Mistral Large. So it might be better to just suck it up and use 1. 2. 3.
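In code, surrendering gracefully just means writing your parser around the format the model actually produces instead of the one you wish it used. A rough sketch, assuming (hypothetically) that the model’s preferred format is a numbered list like the 1. 2. 3. above:

```python
import re

def parse_numbered_list(output: str) -> list[str]:
    # Accept the model's preferred "1. ... 2. ... 3. ..." format instead of fighting it.
    items = re.findall(r"^\s*\d+[.)]\s*(.+)$", output, re.MULTILINE)
    if len(items) < 3:
        raise ValueError(f"Expected at least 3 items, got {len(items)}: {output!r}")
    return items[:3]

# e.g. parse_numbered_list("1. abc\n2. def\n3. ghi") -> ["abc", "def", "ghi"]
```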
That’s all for this week. Hope you enjoyed the principles. Sorry for the delay.
If you want to stay ahead of the game with esoteric tricks won from hard experience at the edge of open-source prompting, consider pressing this button if you haven’t already. I appreciate it.
Thanks for reading, have a good one and I’ll see you next time!
Bottom line: When faced with a stubborn model don't be a stubborn prompt-writer!