yego.me
💡 Stop wasting time. Read Youtube instead of watch. Download Chrome Extension

Cracking the Enigma of Ollama Templates


5m read
·Mar 21, 2025

In olama, the template and parameters and maybe the system prompt and license are all part of the model. This is one of the strengths of the platform and it's why it's the easiest to use for beginners and advanced users alike. Every other tool that runs AI models leaves it up to the user to determine the right template to use. The tool can guess, but you don't have to do that.

For many new models to find that the first guess isn't always the best, but when you import any supported Hugging Face model, either using the manual conversion method or the newer Hugging Face convenience method, you often need to come up with a valid template on your own. This can be a real paino for users who see a complicated template on any other model and not know where to start for the one they're importing. So, let's take a look at how templates work and how to build a new one. This is the first of the more advanced topics in the olama course and will be added to a new playlist, but it's still part of the same olama course.

This is a course I've been building out based on the most common questions on the olama Discord server. I've been in the Discord starting at the beginning since I was part of the founding olama team. I'm now focused on building out my YouTube channel and helping people learn how to use olama more effectively. This is the stuff I love to do after being a trainer at Open Text and then starting the training and documentation and Community teams at Data Dog.

So, if you're new to olama or just want a deeper understanding of how templates work, this is the perfect place to start. There are a lot of really complicated templates out there, so we're going to start out with one of the first templates that happens to also be the simplest. At the beginning of ama's Journey, Llama 2 from Meta had just been released in the docs; we see the model expects the prompt to be in this format.

The reason it expects the format is that it's the way the model was trained, so we see the whole prompt is wrapped in an ins block and then the system prompt is in the Sy block, and then after that is the user prompt. In the template used in olama, this shows up with the same structure, replacing system prompt with do system in double curly brackets and prompt also in double curly brackets. This is our introduction to the specific template language that olama uses: Go templates. All instructions in Go templates are wrapped in double curly brackets, and a word with a period at the beginning indicates that this is a variable that'll be injected into the template.

If you run the Llama server in debug mode, we can see the template working. Try asking the usual questions like, "Why is the sky blue?" and in the debug output we see this. Let's set the system prompt to be, "You are an AI that explains everything at a third grade level." Now, when we ask, "What is a black hole?" we can see this in the debug output. Let's go to a model that's a bit newer and it uses a slightly more complicated template; this is Orca Mini.

Different models often use different prompt formats, and the older models often use very different formats. What's interesting about this one is that the hashash system: will only appear if the prompt, if the system prompt is defined; otherwise, the prompt will just be three hashes, then user: then the prompt, then three hashes and response: to get this. We see the if statement that says if the system variable is defined, then show the system block, and that's ended with the end instruction. Notice also the dash at the beginning of the instruction; the dash says to trim white space from before that instruction.

Okay, let's fast forward a bit more in time to the first mistal. This template is much more complicated, but you know what? It isn't complicated. You know where I'm going with this, right? Liking the video and subscribing to the channel goes a long way to support my work here, and it's so not complicated. This template is, but it uses a lot of the same basic principles as the previous templates.

One aspect that makes it difficult to understand is that spacing isn't important. So, let's tweak the template here to look like this. What's new here is the range instruction; range is basically a "for each" instruction. We see it at the top with messages, for each message in the array. First, check if the message is a user role, and then if there are any tools defined. If tools is defined and we have at least one message, then list the tools; and if system is defined and we have at least one message, then show that.

Then it checks if the message is an assistant role, meaning the previous output from a model. If the message is content, then output the content, but if it's a tool call, then display each in a special format. Then finally check if the message has a tool role and output that. Then, going back to the top: if messages is not defined, then we have a simple system prompt and user prompt, so output that format. I don't love this template because there are a few things I find a bit clunky about it.

If we Zoom forward to some of the latest models like Small LM2, I feel like the template is easier to follow and understand. It's more streamlined and uses fewer nested blocks. I think that's a good direction for templates to go in, making them simpler and more intuitive for users. When you import a new model from Hugging Face that doesn't have a template, start simple like the first one we saw in this video—just work with the system and user prompts.

Create the model and test if it works; if it does, start working with the range of messages. When you get that working, add tools to the template, assuming tools are supported by the model. In many cas, the template may be the same as one of the pre-existing templates in ama. Look at the recent models and see if there's a similar structure; if you don't find an exact match, maybe you can adapt one of the existing templates or at least find inspiration in them.

You might think it would be great if all models could just agree on a single template to use, but that would also ensure there's no progress in making better templates. So, it can be a challenge to create new ones, but hopefully now you understand how to read a template and understand what it's doing. The next step is to just practice building them. There is a doc that goes into detail on different aspects of templates and lists all the variables available; you can find that here.

And that's everything you need to know to start building out templates, or at least being able to read them. And if you've watched all the videos in the course, you know a lot about using ama; if not, check out the rest of the course and let me know what you think. Thanks so much for being here. Goodbye.

More Articles

View All
Mr. Freeman, part 64
Ooops! Uh… Close the door! Get all of the young children out of here, and put your hands where I can see them! Do it! Today I’m going to tell you about a joyful and pleasant pastime, a piece of pocket-size happiness for anyone, a path to pure pleasure th…
Second partial derivative test
In the last video, we took a look at this function ( f(x, y) = x^4 - 4x^2 + y^2 ), which has the graph that you’re looking at on the left. We looked for all of the points where the gradient is equal to zero, which basically means both partial derivatives …
Why self improvement is ruining your life
One of the best feelings in the entire world is the feeling of getting better at the things that you’re interested in. You know, if you’re starting to get into the gym, it feels really good to actually see yourself getting stronger, whether that’s visuall…
MARS | Exclusive Sneak Peek
And now an exclusive sneak peek at the first episode of [Music] [Music] Mars Retro Rockets about to fire in 1, 2, 3… bre… 1, 2… [Music] three. We dream it’s who we are, down to our bones, ourselves. That instinct to build, that drive to seek beyond what …
Is the Universe Discrete or Continuous?
You said that we went from atoms in the time of Democritus down to nuclei, and from there to protons and neutrons, and then to quarks. It’s particles all the way down. To paraphrase Feynman, we can keep going forever, but it’s not quite forever. Right at …
How to Have Interesting Ideas (The Ben Thompson Playbook)
The most important article you write is the second article someone reads, and I do think that volume or quantity is underrated. So that’s like 50 or 60 books worth of writing over the last decade. That is an insane amount of volume. It would be hard to ha…