How do you stay ahead in this exponential age of AI? You don’t do it purely by buying a course or subscribing to a blog (even this one). You need to move as fast as new releases come out, or even faster. So in this post, we’re not talking about a specific principle, technique, or piece of knowledge — we’re talking about how to gather and create this knowledge yourself.
We’re going to learn:
The importance and role of actually building projects
The use of at least a bit of theoretical knowledge
The value of primary sources
The critical importance of using community to learn where documentation fails
The role of practice in doing all these things better.
So let’s start.
Projects.
How do you get good at AI?
One might turn to blogs, or videos, or other media channels talking about AI to gain information — but to an inexperienced person, a bad blog will often seem identical to a good one. And I dare say there are more bad, misinformed, or hype-focused channels than good ones. So you’re more likely than not to pick poorly and bias your knowledge in a bad direction.
Plus, even if you pick good sources, if you only watch and never do, you’ll be left with gaps in your knowledge and uncorrected misunderstandings of how things work, because your ideas never had to contend with the real world — they could safely exist inside your own idea of how things worked. You know how, if you’re arguing with an imaginary person in your mind (or a group of imaginary people), you basically always win? But the talking points that were so decisive in your imagination end up misunderstood, or fall flat, when you bring them out in real conversation? If you only read or watch information, but never apply it, you’ll run into the exact same thing. Your imaginary world did not correctly model the real one — how could it? You’ve never practiced there!
There’s nothing to tell you whether you’re going in the right or wrong direction when you’re just consuming rather than producing (other than your untrained intuition and feelings), so of course you’ll get it wrong. And if you’re serious about learning, then you likely want to be more than an armchair scientist by the end of this. Either you want to create a grand vision with AI, or you hope to use it to solve some kind of problem. So start there. Pick something pretty basic that feels like it’s in the right direction, watch tutorials on its implementation, and follow along. Take notes. You learned stuff back in school, right? Apply those same techniques here. For example, when I started out (after I had watched a few courses on AI), my medium-term goal was to create an AI that could speak like a certain fictional character, so the first thing I did was follow along with model training tutorials. After these, I realized I needed my own datasets to create my own models, so I started prompting and building datagen pipelines… which eventually led to Augmentoolkit.
There is a tough question though, and that is the question of theoretical knowledge. One thing almost universal among AI hype gurus 🚨🚀 is that they don’t know what a neural network actually is: they lack background information. To use a programming term, they’re “script kiddies” — newcomers to a field who do stuff by running programs they don’t actually understand at all.
Q: Theoretical knowledge: when and how much?
A: When you need it, and as little as you feel comfortable with.
That is a kinda useless answer, but that’s because this is a kinda broad subject. Different people have different goals and different tastes, so they’ll feel different amounts of pressure to dot all their ‘i’s and cross all their ‘t’s when it comes to learning ML. For instance, I’m somewhat fast and loose in a lot of areas. My code’s pretty messy, as is my room; I write all my newsletter posts the day they go out, with no editing passes; for my sales calls, I have bullet points, not a script, etc. So I typically end up learning things when I feel like I need to use them in order to progress, and not a moment sooner. You might be different. Maybe back in school you had a color-coded binder for all your subjects or something.
Feelings aside, theoretical knowledge has real practical uses:
Preexisting practitioners in the field will respect you more. This is important, because they’re often the people who can teach you the most — and who can pay you the most. Really, having even tried to figure out the fundamentals will immediately set you apart from 99% of people. People respect genuine effort.
You’ll be able to solve problems more easily, because you’ll understand a bit of what’s going on under the hood. In a field as fast-moving as ML, this is incredibly useful, because the error messages suck and the documentation is sparse. Plus, even if the field moves fast, the fundamentals stay the same: LLMs still use neural networks, which were invented in 1943.
You have a lower chance of learning the wrong things. Sometimes things go right because of a fluke, or difficult-to-detect circumstances; and sometimes they go right because you did something right. Having the faintest idea of what is actually going on when you’re doing stuff means you’re less likely to misattribute your successes, which means you can avoid learning the wrong lessons, which means you gain knowledge rather than stockpiling intuition-founded ignorance as you do more work.
You’ll be able to pivot to different areas of the field and pick up new ones more easily. Knowing how an LLM works under the hood helps you whether you’re prompting, training, creating content, or consulting. What you do first will often not be what you’re doing at the end, so learning how things work is an investment in flexibility.
You have a chance at inventing new stuff. ML in particular has way more unexplored ideas than talent, and honestly, all that’s needed to explore some of these ideas is a high school education and some specialized knowledge you can pick up online. You can build real things if you have the right knowledge. And your inspiration is more likely to be actually useful and new, too.
Many of the problems you’ll be working on will be unsolved. You need to be able to guess at new solutions, and it helps if your guesses are educated.
So I think theoretical knowledge is useful. It’s the first thing I tackled when I got into ML — I worked through the Coursera machine learning specialization, the Imperial College AI Math course on Coursera, and, later, some of Karpathy’s lectures. I trained MNIST before I trained an LLM. I think that was useful.
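If you’re curious what that kind of foundational exercise looks like, here’s a toy single-neuron classifier in pure Python (a sketch for illustration only, not taken from any of the courses mentioned), trained with gradient descent on the logical OR function:

```python
import math

# Toy dataset: the logical OR function (linearly separable).
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One neuron: two weights and a bias, trained by stochastic gradient descent.
w1, w2, b = 0.0, 0.0, 0.0
lr = 1.0
for _ in range(2000):
    for (x1, x2), y in data:
        pred = sigmoid(w1 * x1 + w2 * x2 + b)
        err = pred - y  # gradient of the cross-entropy loss w.r.t. the pre-activation
        w1 -= lr * err * x1
        w2 -= lr * err * x2
        b -= lr * err

# After training, the rounded predictions should match the OR truth table.
for (x1, x2), y in data:
    print((x1, x2), round(sigmoid(w1 * x1 + w2 * x2 + b)))
```

Twenty-odd lines, no frameworks, and you’ve touched the same core loop (forward pass, error, weight update) that LLM training runs at a vastly larger scale.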
That being said, the number of people who spend months and months and years and years in course land, feeling productive, is concerningly high. If I could do it again I might start iterating and actually building sooner. The amount of theoretical knowledge that exists is WAY higher than what you need to actually be productive. My advice: limit yourself to a number of courses you can count on one hand.
Here are some good ones (links):
https://karpathy.ai/zero-to-hero.html
https://www.coursera.org/specializations/machine-learning-introduction
https://www.coursera.org/specializations/mathematics-machine-learning
Primary Sources
You won’t know more than most people if you watch the same handful of influencers that most people focus on. Read a damn research paper. They actually aren’t that bad; they feel intimidating at first, but go through a few of them and you get used to it. What you have to develop is a sense for what is actually useful information, what is just filler, and what is needless math that shouldn’t even be there, but was put in to make the paper seem fancier.
The cool thing about ML research papers is that they will often summarize the parts of their field relevant to the paper itself at the start (sort of a “previously on… MACHINE LEARNING”) so even if you’ve never read one before, you can get the gist.
For your first few papers, I recommend taking quite a bit of time on them: for every term you don’t understand, Google it, and read or watch about it until you feel you have a solid-enough high-level understanding.
This takes time. Of course it takes time: you’re catching up with a decently old and very advanced field. Stick with it, spend a day or two or five on your first paper, and then watch as the next one takes half the time to finish, and the one after that takes a quarter. If you can read a paper, know what most of the terms mean, and explain each at a high level, you’ll probably be in the top 5% of this field. And all it takes is Googling and patience.
Community
Stackoverflow really f*cking sucks for the cutting edge of AI. There aren’t enough mainstream people doing the stuff at the edge of this for Google results to be worth half a damn. Most of the concepts you need to know won’t be covered by any course, blogpost, or tutorial. So you need to adjust your bugfixing and learning approach.
Firstly, rely on community more. Discord is particularly great for this: rockstar open-source devs, startup founders, and major university researchers all hang out in a place that’s free to access — and they have the patience to answer your questions! If you know which servers to join and stay out of some of the more popular channels there, you’ll quickly be able to find absolute experts on any niche area who do the thing you’re asking about multiple times a day. One of the great things I had to learn when getting into ML was the courage to ask a question — my introverted self was used to just Googling and lurking when dealing with normal computer science problems, but with ML, you have to put yourself out there. Hang around in groups of people smarter than you. Engage anyway.
If you hang around people better than you with an open mind, you won’t be worse than them for long.
Even when you don’t talk to people directly, you can learn from just reading smart people’s conversations. I have multiple insightful Discord conversations between people such as TurboDerp (creator of Exllama), ChatError (creator of Kimiko and I think the Limarp dataset), Gryphe (creator of MythoMax), etc. saved on my computer because they were so interesting. Heck, last I checked, Gryphe subscribes to this newsletter! Point being — you meet really badass people if you put yourself out there on the right platform.
A good Discord to check out: https://discord.gg/FTy6ySKT (TheBloke AI)
And also: https://discord.gg/7kpsyBny (SillyTavern Discord)
Just two rules:
Don’t be annoying. Use proper-ish grammar, don’t ask Google-able questions, be friendly, pay your social dues, and thank people if they help you out.
Don’t be too formal. Discord is pretty far on the “anonymous, fake PFP, meme-y” side of the SNS spectrum. If you envision a professionalism spectrum, with 4chan on the “not at all” side and LinkedIn on the “very” side, Discord is closer to “not at all”. Enter accordingly.
Effort.
You get out what you put in. Whether it’s talking to experts, taking courses, or building projects, you have to stick with it and do it a lot. This is not a mere platitude; I mean it as ML-specific advice, not as feel-good hype you read once, go to sleep thinking about, and forget the next morning. So much of ML is new that it takes extra effort to solve seemingly basic problems, and you actually need to be resourceful to overcome issues (rather than just waiting to see if they sort themselves out).
By “things are harder,” I mean:
Documentation sucks, so you will either need to ask people or read the source code to figure stuff out sometimes.
You can often ask the creator of the thing itself, usually on Discord.
Sometimes this means guessing rather than understanding. If it works, it’s a win.
Major packages with 1000+ stars will be broken on the main branch.
Major API providers will just be down, or they won’t update your spending, or their prompt format will be broken for months and make everything 90% worse.
Major releases by billion-dollar companies will be silently broken and not fixed for months.
You will still run into OOMs all the time even after training models for a year.
Everything in this list has happened to me. Specifically: I’ve dug through the aphrodite source to understand its async functions, dug through the axolotl source yesterday to add debug statements, and dug through the PrivateGPT source to get at its prompt format; I’ve had client work interrupted because Axolotl was broken on master and I had to pull an older commit I knew had worked before; Together.ai has had many problems, as have its competitors (I discovered through A/B testing on one client project that OpenRouter had worse performance than other providers, meaning there was a prompt issue affecting every model on the service); the Llama 3 tokenizer was broken silently, not once, but twice, for months; and I still get god-damned OOMs even in 2024. That’s not all of it, either.
Point being: it’s a grind. It’s a slog. Stuff that should work, won’t — sometimes it’ll be your fault and sometimes it won’t be. You’ll give it your all and move an inch forward. You actually have to be resourceful: you’re not trying to say “I tried”; you’re trying to say “I figured it out”. Be clever. Imagine you had to get this working or something terrible would happen — what kind of tricks, what kind of ugly workarounds, would you use to make progress? Learning the bleeding edge means you get cut sometimes, but it makes you better.
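To make “ugly workaround” concrete: OOMs, for instance, can sometimes be tamed with nothing fancier than a retry loop that halves the batch size until the step fits. This is a hypothetical, framework-agnostic sketch; `run_step` and `MemoryError` stand in for your real training step and your framework’s actual OOM exception (e.g., PyTorch raises its own CUDA out-of-memory error type).

```python
def run_with_backoff(run_step, batch, min_size=1):
    """Try to process a batch; on OOM, halve the chunk size and retry.

    `run_step` is a stand-in for your real framework call; swap
    MemoryError for your framework's OOM exception type.
    """
    size = len(batch)
    while size >= min_size:
        try:
            # Process the batch in chunks of the current size.
            return [run_step(batch[i:i + size])
                    for i in range(0, len(batch), size)]
        except MemoryError:
            size //= 2  # halve and retry with smaller chunks
    raise RuntimeError("batch cannot fit in memory even at minimum size")

# Simulated step that "OOMs" on chunks larger than 4 items.
def fake_step(chunk):
    if len(chunk) > 4:
        raise MemoryError
    return sum(chunk)

print(run_with_backoff(fake_step, list(range(16))))
```

Ugly? Sure. But it turns “my run died overnight” into “my run finished slower”, and that’s the kind of trade this field constantly asks you to make.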
Learning ML is very hard because there’s no complete guide to it — you have to figure it out, like running a company. But like running a company, it feels really good when you make progress. Working hard, being resourceful, and willingly confronting productive things outside your comfort zone make everything written about earlier more effective. So my advice: act accordingly.
Fin
This has been a bit more of a meta-post, inspired by some really good conversations that helped me realize that problem solving and resourcefulness are two of the things that can set someone apart in this field. So I wrote this to organize those ideas in a way that I can apply them better myself, and hopefully help out other people who are getting started or who want to take themselves to the next level.
Sorry for the delay on the post. Been a real busy couple weeks. Prompting Biweekly, I guess! Will try to get a post next Sunday.
That’s all for this week. Have a good one and I’ll see you next time!
Prompting, model training, and other specific bits of knowledge can be very useful in specific situations. Knowing that models are more than the sum of their parts can help when deciding what datasets to use; knowing how to use few-shot examples when prompting open LLMs makes it much easier to build quality applications. But even more valuable than having this knowledge is being able to learn it or figure it out yourself.
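To make that few-shot point concrete, here’s a minimal sketch of a prompt builder. The “Input:”/“Output:” template is a generic completion-style illustration of the technique, not any specific model’s required format:

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a completion-style few-shot prompt.

    `examples` is a list of (input, output) pairs the model should imitate;
    the trailing bare "Output:" is where the model continues.
    """
    parts = [instruction, ""]
    for inp, out in examples:
        parts.append(f"Input: {inp}")
        parts.append(f"Output: {out}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("I loved this movie!", "positive"),
     ("Total waste of two hours.", "negative")],
    "The acting was superb.",
)
print(prompt)
```

The worked examples do most of the heavy lifting with open models: they show the format and the task at the same time, which is exactly the kind of trick you only internalize by building.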