“Just Add AI” – From Joke to Reality

A few years ago, when a product manager said “just add AI”, engineers would roll their eyes.
It sounded like someone asking you to implement TCP/IP from scratch when you were just building a CRUD app.
It was unrealistic because “AI” meant:

  • Hiring ML specialists
  • Curating & labeling data
  • Building a training pipeline
  • Deploying custom GPU clusters
  • Maintaining a fragile production model

That was infrastructure work, not product development.


But now, things are different.

Pretrained, Drop-In Models

Open-source models (LLaMA, Mistral, Falcon, Gemma, Phi) are pretrained and ready to go.
You don’t need to spend months on data pipelines or training. You just download a model.

Standardized Runtimes

You no longer need to write low-level inference code.
Runtimes like Ollama, llama.cpp, and vLLM provide simple APIs—many compatible with OpenAI’s.
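
For example, here’s a minimal sketch that points the official openai Python client at a local Ollama server (assuming Ollama’s default port 11434 and its OpenAI-compatible /v1 endpoint; the model name and prompt are placeholders):

from openai import OpenAI

# Ollama exposes an OpenAI-compatible endpoint under /v1; the API key is
# required by the client library but ignored by the local server.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

reply = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(reply.choices[0].message.content)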

Integration Is Boring Now (in a good way)

Running a local model looks like this:

ollama pull llama3.2
ollama run llama3.2 "Write a haiku about software PMs"

And calling it from your app:

import requests

# Ollama streams by default; ask for a single JSON object instead
res = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama3.2",
    "prompt": "Summarize this text...",
    "stream": False,
})
print(res.json()["response"])
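
If you want the token-by-token stream instead (Ollama’s default), here’s a minimal sketch that reads the newline-delimited JSON chunks as they arrive:

import json
import requests

# Consume Ollama's default streaming output: one JSON object per line
with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2", "prompt": "Summarize this text..."},
    stream=True,
) as res:
    for line in res.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            break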

This is closer to adding a database or cache layer than inventing AI.


My 1-Minute “Just Add AI” Setup

I tried Ollama—and yes, it really is a one-minute setup:

  1. Run the model (CLI sanity check):

    ollama run llama3.2
  2. Find the service process (PID):

    ps -ef | grep ollama

So we actually see two processes running as the ollama user:

  • The top one is the server (/usr/local/bin/ollama serve makes that obvious), so 882625 is the PID we want.
  • The second one, /usr/local/bin/ollama runner --model ..., is the model runner serving the interactive CLI session we just started. (The ollama run CLI sends our input to the serve process, which hands it to this runner; all of that is abstracted away from us.)
  3. Find what port it’s listening on:

    sudo lsof -Pan -p <PID you just found> -i

sudo lsof -Pan -p 882625 -i

Note: We use sudo because the ollama service runs as its own user, not your logged-in user. Without sudo, you might not see any output.
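
If you’d rather avoid sudo altogether, note that Ollama listens on port 11434 by default, so you can usually confirm the server is there with a plain probe:

    curl -s http://localhost:11434/
    # prints "Ollama is running" when the server is up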

  4. Test the API:

    curl http://localhost:<PORT>/api/generate -d '{
    "model": "llama3.2",
    "prompt": "Why is the sky blue?"
    }'

    Streaming output appears just like OpenAI’s API:

    {"response":"The","done":false}
    {"response":" sky","done":false}
    ...
    {"done":true}
  5. Expose it to clients:

    • Reverse proxy 80/443 → <PORT> (see the sketch below)
    • Or, open the port directly and tell clients to use it
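
    A minimal sketch of the reverse-proxy option, assuming nginx and Ollama on its default port 11434 (the server_name is a placeholder):

    # Hypothetical nginx server block: forward port 80 to the local Ollama API.
    server {
        listen 80;
        server_name ai.example.internal;  # placeholder hostname

        location / {
            proxy_pass http://127.0.0.1:11434;
            proxy_buffering off;      # don't buffer token-by-token streams
            proxy_read_timeout 300s;  # generation can be slow on CPU
        }
    }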

That’s it: from “just add AI” to a local AI microservice running in minutes.


Mastery Makes Hard Things Easy

When you’re new to an environment, even simple things feel scary. If you’ve only written Excel macros, building an executable feels like diving into an ocean. (Not to mention that all the Windows bloat, from registry keys to Win32 APIs to endless dialogs, can feel like an uncharted world.)

But as you master your environment, things get easier:

  • Running a local LLM is no scarier than starting a local web server.
  • Managing models feels like installing libraries, not writing algorithms.
  • Deploying AI becomes just another process in your stack, running under its own PID.

Even tooling perspectives change.
GUIs often seem easier at first but add a layer of abstraction between you and the raw data.
For example, Wireshark (GUI) can feel heavier and more distracting than simply running tcpdump and piping output directly into the tools you already know.
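
For instance, watching the local Ollama traffic from the setup above is a single one-liner (assuming the default port 11434):

# print payloads on the loopback interface and pick out the prompts
sudo tcpdump -i lo -nn -A port 11434 | grep --line-buffered '"prompt"'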

What once felt impossible turns into muscle memory.


Are Engineers Ready for the Shift?

Engineers used to joke about PMs saying “just add AI” because it implied building AI from scratch. But now that inference is commoditized, that same request is becoming realistic—not trivial, but no longer absurd.

The question is: How long before engineers start expecting AI in everything, the same way we expect networking, search, or caching to “just work”?

If history is any guide—TCP/IP, databases, cloud hosting—this shift is only a couple of years away.


Takeaways

  • “Just add AI” used to mean “reinvent the wheel.”
  • Today, it means “use an API or local runtime.”
  • Mastery of tools and environments accelerates this transition.
  • What feels like magic at first becomes standard engineering practice over time.