Don't rock the boat
Interesting read! Do you have any recommended reading on best practices for continued pretraining?
Train for many epochs and use only domain-specific data. This differs from SFT, where you want a mix of generic and domain-specific data.
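To make the data-mix difference concrete, here is a rough sketch in plain Python. The helper `build_training_mix` and the 0.5 ratio used for SFT are hypothetical, chosen only for illustration; the right ratio and epoch count depend on your model and data.

```python
import random


def build_training_mix(domain_docs, generic_docs, domain_fraction, seed=0):
    """Return a shuffled list in which about `domain_fraction` of the docs are domain data.

    Hypothetical helper for illustration only, not part of any library.
    """
    if not 0.0 < domain_fraction <= 1.0:
        raise ValueError("domain_fraction must be in (0, 1]")
    rng = random.Random(seed)
    mix = list(domain_docs)
    if domain_fraction < 1.0:
        # Number of generic docs needed so that domain docs make up
        # `domain_fraction` of the final mix, capped by what is available.
        n_generic = round(len(domain_docs) * (1 - domain_fraction) / domain_fraction)
        n_generic = min(n_generic, len(generic_docs))
        mix += rng.sample(list(generic_docs), n_generic)
    rng.shuffle(mix)
    return mix


domain = [f"domain-{i}" for i in range(10)]
generic = [f"generic-{i}" for i in range(100)]

# Continued pretraining: domain-specific data only.
cpt_data = build_training_mix(domain, generic, domain_fraction=1.0)

# SFT: a blend of generic and domain data (0.5 is an assumed ratio).
sft_data = build_training_mix(domain, generic, domain_fraction=0.5)
```

With `domain_fraction=1.0` the generic pool is never touched, which matches the "domain-specific only" advice for continued pretraining.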