4 Comments

This is such an awesome post. I have been looking for a guide to finetuning. Unfortunately, most guides focus on the technicalities of finetuning, but no one talks about the structure of the data. Your writeup is really valuable!

I'm glad you found my post useful! I totally agree that many of the AI guides out there focus on describing technical details that sound important and impressive, while neglecting key information that leads to noticeable gains in quality (such as data structure, as you said). That's the kind of information I'm trying to talk about with these posts: the useful stuff that doesn't get mentioned as much as it should. So, thank you for the comment and sub! It means a lot.

Hey, thank you for explaining the details! While reading the article, a question arose: can the training method start from a state where some of the model's context is already loaded? Come to think of it, there's probably no need to teach an AI model how to properly start a conversation with an empty context. Even more, an RP or chat model may not need to learn how to write a prompt for itself at all: it should work well inside a prompt, not learn to be the one who writes prompts.

Yeah, that's a common approach; I train my models that way. It's called "training on completions only." Basically, you specify a sequence of tokens that must appear in each training example, and the model only learns to predict the tokens that come after that sequence. It's integrated into the TRL library; you can read more here: https://huggingface.co/docs/trl/main/en/sft_trainer#train-on-completions-only
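The core idea behind completion-only training can be sketched without TRL: mask every label up to and including the response-template tokens with -100 (the index that cross-entropy loss ignores in Hugging Face/PyTorch), so the loss is only computed on the completion. The token IDs and template below are made up for illustration; TRL's own collator does this matching on real tokenizer output.

```python
IGNORE_INDEX = -100  # positions with this label are excluded from the loss

def mask_prompt_tokens(input_ids, response_template):
    """Return labels where everything up to and including the last
    occurrence of `response_template` is masked, so the model only
    learns to predict the completion tokens that follow it."""
    labels = list(input_ids)
    # find the last occurrence of the template subsequence
    start = -1
    for i in range(len(input_ids) - len(response_template) + 1):
        if input_ids[i:i + len(response_template)] == list(response_template):
            start = i
    if start == -1:
        # template not found: mask the whole example to be safe
        return [IGNORE_INDEX] * len(labels)
    cutoff = start + len(response_template)
    for i in range(cutoff):
        labels[i] = IGNORE_INDEX
    return labels

# toy example: ids 10, 11 stand in for a "### Response:" template
example = [1, 2, 3, 10, 11, 42, 43, 44]
print(mask_prompt_tokens(example, [10, 11]))
# [-100, -100, -100, -100, -100, 42, 43, 44]
```

In TRL itself this is what `DataCollatorForCompletionOnlyLM` does for you, given a `response_template` string.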

I do know some people who forgo this approach and train on the entire prompt, hoping that the inclusion of the user's name as a stopping token in the frontend will prevent the model from playing both sides of the conversation. But I prefer completion-only training, because with the full-prompt approach the model has to spend some of its capacity learning how to write character cards, and that doesn't seem ideal.

Thank you for reading the article!