Tried Nemo 12B yet? That was the natural evolution for all my finetunes once I moved on from Mistral 7B. Sure, it's a bit bigger, but it has this same sponge-like effect when being trained. Extremely versatile. Only downside is the larger size, I suppose! (Though depending on the task, this may be an advantage)
Good point! I did try that way back in the day but hit a loss explosion, probably due to bad hyperparameters. I need to spend the time to get good at this one; it's probably the natural next step. Plus the fact that it's trained to produce more human-like data is likely a huge plus! Thanks for the recommendation, Gryphe!
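For anyone else who hits the same wall, here's a minimal sketch of the kind of conservative settings that tend to avoid loss explosions when stepping up from a 7B to a 12B model. The checkpoint name and every hyperparameter below are illustrative assumptions, not a confirmed recipe; tune them for your own data.

```python
# Minimal sketch: conservative finetuning settings for a 12B model.
# The checkpoint and all values are illustrative assumptions, not a tested recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

model_name = "mistralai/Mistral-Nemo-Base-2407"  # assumed base checkpoint (gated on HF)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

args = TrainingArguments(
    output_dir="nemo-12b-finetune",
    learning_rate=1e-5,             # lower than typical 7B recipes; bigger models spike more easily
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,              # a short warmup smooths the first optimizer steps
    max_grad_norm=1.0,              # gradient clipping is the main guard against loss spikes
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16, # keep a reasonable effective batch size on one GPU
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,               # log often so a spike shows up immediately
)
# Hand `args`, the model, the tokenizer, and your dataset to transformers.Trainer
# (or trl's SFTTrainer) as usual.
```

Watching the gradient norm in the logs is usually the earliest warning sign that a run is about to blow up.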
You know what they say: "If it ain't broke, don't fix it."
True indeed! Enough in ML breaks on its own already WITHOUT the engineer making problems for themselves
Hi, how does Mistral 7B v0.3 compare to v0.2?
No real thoughts; I haven't done thorough enough A/B testing to confirm one way or the other.