2 Comments
gaztrab

Interesting read! Do you have any recommended reading on best practices for continued pretraining?

Evan Armstrong

Train for a lot of epochs and use only domain-specific data (unlike SFT, where you want a mix of generic and domain-specific data).
