Cracking the Enigma of Ollama Templates
In Ollama, the template, the parameters, and sometimes a system prompt and license are all part of the model. This is one of the strengths of the platform, and it's why it's the easiest tool to use for beginners and advanced users alike. Every other tool that runs AI models leaves it up to the user to figure out the right template; the tool can guess, but with Ollama you don't have to.
With many new models, you'll find that the first guess isn't always the best, and when you import a supported Hugging Face model, either using the manual conversion method or the newer Hugging Face convenience method, you often need to come up with a valid template on your own. This can be a real pain for users who see a complicated template on another model and don't know where to start for the one they're importing. So, let's take a look at how templates work and how to build a new one. This is the first of the more advanced topics in the Ollama course; it will be added to a new playlist, but it's still part of the same course.
This is a course I've been building out based on the most common questions on the Ollama Discord server. I've been in that Discord since the beginning, as I was part of the founding Ollama team. I'm now focused on building out my YouTube channel and helping people learn how to use Ollama more effectively. This is the stuff I love to do, after being a trainer at OpenText and then starting the training, documentation, and community teams at Datadog.
So, if you're new to Ollama or just want a deeper understanding of how templates work, this is the perfect place to start. There are a lot of really complicated templates out there, so we're going to start with one of the first templates, which also happens to be the simplest. At the beginning of Ollama's journey, Llama 2 from Meta had just been released. In the docs, we see the model expects the prompt to be in this format.
The model expects this format because it's the way it was trained. The whole prompt is wrapped in an [INST] block, the system prompt sits inside a <<SYS>> block, and after that comes the user prompt. The template used in Ollama has the same structure, replacing the system prompt with {{ .System }} and the user prompt with {{ .Prompt }}, both in double curly brackets. This is our introduction to the specific template language that Ollama uses: Go templates. All instructions in Go templates are wrapped in double curly brackets, and a word with a period at the beginning is a variable that will be injected into the template.
If you run the Ollama server in debug mode, you can see the template working. Try asking the usual question, "Why is the sky blue?", and in the debug output we see this. Now let's set the system prompt to "You are an AI that explains everything at a third-grade level." When we ask, "What is a black hole?", we can see this in the debug output. Next, let's look at a slightly newer model that uses a slightly more complicated template: Orca Mini.
Different models often use different prompt formats, and older models often use very different ones. What's interesting about this one is that the "### System:" block will only appear if the system prompt is defined; otherwise, the prompt is just three hashes and "User:", then the prompt, then three hashes and "Response:". In the template, we see an if statement that says: if the System variable is defined, show the system block, and that section is closed with the end instruction. Notice also the dash at the beginning of the instruction; the dash tells the template engine to trim the whitespace before that instruction.
Okay, let's fast-forward a bit more in time to the first Mistral. This template is much more complicated... but you know what? It isn't complicated. You know where I'm going with this, right? Liking the video and subscribing to the channel goes a long way toward supporting my work here, and that's so not complicated to do. This template, though, uses a lot of the same basic principles as the previous templates.
One aspect that makes it difficult to understand is that spacing isn't significant in Go templates, so everything gets crammed together. Let's reformat the template to look like this. What's new here is the range instruction; range is basically a "for each" loop. We see it at the top with Messages: for each message in the array, first check if the message has a user role, and then whether any tools are defined. If Tools is defined and we have at least one message, list the tools; and if System is defined and we have at least one message, show that.
Then it checks if the message has an assistant role, meaning previous output from the model. If the message has content, output the content; but if it's a tool call, display each call in a special format. Finally, check if the message has a tool role and output that. Then, going back to the top: if Messages is not defined, we just have a simple system prompt and user prompt, so output that format. I don't love this template; there are a few things I find a bit clunky about it.
If we zoom forward to some of the latest models, like SmolLM2, I feel the template is easier to follow and understand. It's more streamlined and uses fewer nested blocks. I think that's a good direction for templates to go in: simpler and more intuitive for users. When you import a new model from Hugging Face that doesn't have a template, start simple, like the first one we saw in this video, and just work with the system and user prompts.
Create the model and test whether it works; if it does, move on to ranging over the messages. When you get that working, add tools to the template, assuming the model supports tools. In many cases, the template may be the same as one of the pre-existing templates in Ollama. Look at recent models and see if there's a similar structure; if you don't find an exact match, maybe you can adapt one of the existing templates, or at least find inspiration in them.
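As a starting point for that workflow, a first-pass Modelfile might look like this sketch. The file name is a placeholder for your converted model, and the template is the simplest possible shape, so verify it against the model's documented prompt format before relying on it:

```
FROM ./my-imported-model.gguf

TEMPLATE """{{ if .System }}{{ .System }}

{{ end }}{{ .Prompt }}"""
```

Once `ollama create` and a quick test confirm this works, you can iterate: swap in a range over the messages, and later add tool handling if the model supports it.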
You might think it would be great if all models could just agree on a single template to use, but that would also ensure there's no progress in making better templates. So, it can be a challenge to create new ones, but hopefully now you understand how to read a template and understand what it's doing. The next step is to just practice building them. There is a doc that goes into detail on different aspects of templates and lists all the variables available; you can find that here.
And that's everything you need to know to start building out templates, or at least to be able to read them. If you've watched all the videos in the course, you now know a lot about using Ollama; if not, check out the rest of the course and let me know what you think. Thanks so much for being here. Goodbye.