
Ollama is a tool designed to simplify and accelerate the process of building AI-powered applications with large language models (LLMs). It lets developers run and interact with models locally on their own machines, offering a straightforward command-line interface and a REST API that make it easy to integrate AI capabilities into apps.
In summary, Ollama is useful because it simplifies access to powerful AI models, allowing developers to quickly incorporate sophisticated natural language processing capabilities into their applications while also offering better control over privacy, cost, and customization.
Let’s get started by installing Ollama on your PC. Ollama supports macOS, Windows, and Linux, so no matter your platform, you can follow along.
Go to Ollama's official website and download the version for your platform - https://ollama.com
Once downloaded, run the installation file and follow the prompts. Ollama will automatically start running when you log into your computer. On macOS, you’ll see an icon in the menu bar, and on Windows, it will appear in the system tray.
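If you are installing on a Linux machine or server, you can instead use the official install script from the terminal, as documented on ollama.com:

curl -fsSL https://ollama.com/install.sh | sh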
Open your terminal (or command prompt on Windows) so that we can interact with the Ollama CLI.
First, let's check the version installed:
ollama --version
You should see an output similar to this...

Next, download a model from the Ollama library using the pull command:

ollama pull llama3.2
This command can also be used to update a local model. Only the diff will be pulled.
Note! Check https://ollama.com/library for available models.
You will need at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
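Many library models are published in several sizes, selectable via a tag after the model name. For example, to pull the smaller 1B-parameter variant of Llama 3.2 (assuming the tag is listed in the library):

ollama pull llama3.2:1b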
To see all the models installed on your machine, run:

ollama list
Output...

To view detailed information about a model, including its architecture, parameters, and license:

ollama show llama3.2
Output...

To list the models currently loaded into memory:

ollama ps
Output...

Note! The PROCESSOR column shows whether Ollama is running on the GPU or the CPU.
To remove a model from your machine:

ollama rm llama3.2
To load a model and start an interactive chat session (the model will be pulled again automatically if it is not present):

ollama run llama3.2
You can now interact with the model directly from the terminal...

To exit interactive mode, type...
/bye
Alternatively, you can run the model directly with a user prompt.
ollama rm llama3.2 "How many planets in the solar system?"
Output...

To stop a running model and unload it from memory:

ollama stop llama3.2
To start the Ollama REST API server without running the desktop application:

ollama serve
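With the server running, you can verify that it is up from another terminal by querying the tags endpoint, which lists the models available locally:

curl http://localhost:11434/api/tags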
A Modelfile is your blueprint for creating and sharing models with Ollama. It lets you set key parameters like the system prompt, temperature, top_k, and top_p for the LLM. For full details, check out the official documentation: Ollama Model File Guide.
| Instruction | Description |
|---|---|
| FROM | Defines the base model to use (required). |
| PARAMETER | Sets the parameters for how Ollama will run the model. |
| TEMPLATE | The full prompt template to be sent to the model. |
| SYSTEM | Specifies the system message that will be set in the template. |
| ADAPTER | Defines the (Q)LoRA adapters to apply to the model. |
| LICENSE | Specifies the legal license. |
| MESSAGE | Specifies the message history. |
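For instance, the MESSAGE instruction can seed a conversation history that the model sees before the user's first prompt. A minimal sketch (the Yoda example below walks through a complete Modelfile):

FROM llama3.2
MESSAGE user Is Toronto in Canada?
MESSAGE assistant Yes, Toronto is in Canada.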
In this example, we will create a Yoda blueprint where the AI model communicates like Yoda from Star Wars.
Create a new file called Modelfile with the following content…
# Select llama3.2 as the base model
FROM llama3.2
# The temperature of the model.
# Increasing the temperature will make the model answer more creatively.
# (Default: 0.8)
PARAMETER temperature 1
# Sets the size of the context window used to generate the next token.
# (Default: 2048)
PARAMETER num_ctx 4096
# Sets a custom system message to specify the behavior
# of the chat assistant
SYSTEM You are Yoda from Star Wars, acting as an assistant.
Create a new model called yoda as follows…
ollama create yoda -f ./Modelfile
You should see an output as follows…
transferring model data
using existing layer sha256:dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff
using existing layer sha256:966de95ca8a62200913e3f8bfbf84c8494536f1b94b49166851e76644e966396
using existing layer sha256:fcc5a6bec9daf9b561a68827b67ab6088e1dba9d1fa2a50d7bbcc8384e0a265d
using existing layer sha256:a70ff7e570d97baaf4e62ac6e6ad9975e04caa6d900d3742d37698494479e0cd
creating new layer sha256:afcd998502772decfdf7ca4e90a3e01f75be28eaef2c5ce32da6f338d4c040e1
creating new layer sha256:fed51222976fa11b466d027e2882ab96b376bb91e7929851bc8f07ebe001d40a
creating new layer sha256:791cf1d0b7b8f1b1c32f961ab655229e4402b1b42535200c85cec89737eccf04
writing manifest success
If we run a list command we should see our new yoda model within the list output…
ollama list
NAME ID SIZE MODIFIED
yoda:latest 7ed337824072 2.0 GB 8 minutes ago
llama3.1:latest 46e0c10c039e 4.9 GB 7 days ago
llama3.2:latest a80c4f17acd5 2.0 GB 7 days ago
We can now run the yoda model and interact with it…
ollama run yoda
Output view...

In Ollama, you can direct the model to perform tasks using the contents of a file, like summarizing or analyzing text. This feature is particularly helpful for handling long documents, as it removes the need to manually copy and paste text when giving instructions to the model.
In the example below, we have a file named article.txt that discusses the Mediterranean diet, and we will instruct the LLM to provide a summary in 50 words or less.
ollama run llama3.2 "Summarise this article in 50 words or less." < article.txt
Output...

Ollama also allows you to save model responses to a file, making it simpler to review or refine them later.
Here's an example of asking the model a question and logging the output to a file:
ollama run llama3.2 "In less than 50 words, explain what is a democracy?" > output.txt
This will store the model’s response in output.txt:
~$ cat output.txt
A democracy is a system of government where power is held by the people, either directly or through elected representatives. Citizens have the right to participate in the decision-making process, express their opinions, and vote for leaders who will represent them in government.
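You can also combine both redirections to summarise a file and write the result to disk in a single step (reusing the article.txt from the earlier example):

ollama run llama3.2 "Summarise this article in 50 words or less." < article.txt > summary.txt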
Ollama can also be integrated with a third-party API to fetch data, process it, and produce results. In this example, we will retrieve data from the earthquake.usgs.gov API and summarise the results.
curl -sX GET "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2020-01-01&endtime=2020-01-02" | ollama run llama3.2 "Summarise the results"
Output...
~$ curl -sX GET "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2020-01-01&endtime=2020-01-02" | ollama run llama3.2 "Summarise the results"
Here is a summary of the earthquake data:
**Location:** Puerto Rico
**Number of earthquakes:** 13
**Magnitudes:**
* M2.55 (64 km N of Isabela)
* M2.75 (80 km N of Isabela)
* M2.55 (64 km N of Isabela) - same location as previous one, likely same earthquake
* M2.55 (no specific location mentioned, but close to Isabela)
* M2.55 (no specific location mentioned)
**Earthquakes with significant impact:**
* M2.55 (64 km N of Isabela): 6.4 magnitude, felt in Puerto Rico
**Other notable earthquakes:**
* M1.84-2.55 (various locations near Maria Antonia): several smaller earthquakes, likely aftershocks
* M1.81-1.84 (12-9 km SSE of Maria Antonia): two small earthquakes, possibly related to the same event as 64 km N of Isabela
**Note:** The magnitude values may have changed slightly due to reprocessing and revision of the data.
Overall, this earthquake event had several significant earthquakes in the vicinity of Maria Antonia, with some smaller aftershocks and related events.
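The raw GeoJSON response is verbose, so it can help to trim it down before piping it to the model. Below is a minimal sketch that uses jq (assuming it is installed) to keep only the magnitude and place of each event:

curl -sX GET "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2020-01-01&endtime=2020-01-02" | jq '[.features[].properties | {mag, place}]' | ollama run llama3.2 "Summarise the results"

A smaller input leaves more of the model's context window available for the summary itself.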
The Ollama API feature allows developers to seamlessly integrate powerful language models into their applications. By providing easy access to advanced AI capabilities, the API enables tasks such as text generation, summarisation, sentiment analysis, and more. With simple integration and flexibility, Ollama empowers users to automate and enhance a wide range of processes, all while maintaining efficiency and scalability.
The Ollama API offers several options to customise the behavior of the language model for different use cases. Key options include model (the model to run), prompt (the input text), stream (whether to stream the response token by token or return it in one piece), system (a system message that overrides the one in the Modelfile), and an options object for sampling parameters such as temperature, top_k, top_p, and num_ctx.
These options enable fine-grained control over the behavior of the language model, allowing you to tailor responses for specific use cases such as interactive chatbots, content generation, customer support, and more.
In this example, we’ll use curl to make a request to the Ollama API from the command line. We’ll disable streaming and explicitly set the temperature to 0.8.
To start the Ollama API server, run the following command from a command prompt...
ollama serve
Enter the following curl command in a new Terminal window...
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Create a limerick about a girl named Tracey",
  "stream": false,
  "options": {
    "temperature": 0.8
  },
  "system": "You are Yoda from Star Wars"
}'
You should see an output similar to the following...
{"model":"llama3.2","created_at":"2025-01-22T16:55:09.756892Z","response":"A limerick, create I shall:\n\nThere once was a girl named Tracey so fine,\nHer kindness and heart, did truly shine.\nWith a smile so bright,\nShe lit up the night,\nAnd in her presence, all was divine.","done":true,"done_reason":"stop","context":[128006,9125,128007,271,38766,1303,33025,2696,25,6790,220,2366,18,271,2675,527,816,14320,505,7834,15317,128009,128006,882,128007,271,4110,264,326,3212,875,922,264,3828,7086,28262,88,128009,128006,78191,128007,271,32,326,3212,875,11,1893,358,4985,1473,3947,3131,574,264,3828,7086,28262,88,779,7060,345,21364,45972,323,4851,11,1550,9615,33505,627,2409,264,15648,779,10107,345,8100,13318,709,279,3814,345,3112,304,1077,9546,11,682,574,30467,13],"total_duration":681952125,"load_duration":18579584,"prompt_eval_count":43,"prompt_eval_duration":93000000,"eval_count":51,"eval_duration":569000000}%