Easy ways AI can help ecommerce products
AI is alive and well, and it is here to stay.
But that doesn't mean every existing or new product has to rent a datacenter packed with NVIDIA chips or burn money on GPT API tokens just to enjoy the value of AI.
There are simple, practical, affordable applications of AI that deliver real value without breaking the bank.
Take any product in verticals like:
- beauty salons
- home services
- bulk food ordering
- tailoring and fashion
- printing and packaging
- auto repair
- upholstery
- construction
- event decoration
In all these industries, customers request customizations.
Those customizations are the heart of the business agreement: they cannot be missed, ignored, misread, or mispriced. Handling them well is what delivers:
✔️ satisfied customers
✔️ accurate pricing
✔️ clear communication
In fact, many of these customizations also affect pricing.
If the business misses any point, they either overcharge (and lose the customer) or undercharge (and lose money).
Here are a few real-world examples:
Beauty Salon Example
Customer says:
“Do full glam, the curls should be tight tight. Add highlights, small. And don’t let my front hairline show.”
Business needs: identify the actual tasks to bill for.
AI extracts:
- full glam
- tight curls
- small highlights
- no visible hairline
Home Services Example
Customer says:
“Repaint the living room creamish. Fix that window crack. But don’t touch the chandelier wiring.”
AI extracts:
- repaint living room
- cream colour
- fix crack at window
Bulk Food Example
Customer says:
“Rice spicy but not peppery. Extra chicken. Serve in foil packs. No plantain. Add water bottles.”
AI extracts:
- rice spicy
- extra chicken
- no plantain
- foil packs
- water bottles
In all three cases, the customization is tucked inside a casual, emotional message, yet each point has a financial consequence.
Now imagine getting 50 of these every day.
Manual reading isn't just painful; it's expensive, slow, and error-prone.
This is where AI becomes a bridge.
With the right model and the right instructions, we can automatically:
- summarize the customer’s note
- extract only the customization items
- ignore unrelated text
- remove noise
- retain meaning without inventing new meaning
- and present a clean, itemized list the business can act on
Affordable, local AI running privately, securely, and at scale makes this kind of automation possible for any product.
Choosing the Right Model (Why Bigger Isn’t Better)
Once the problem is clearly defined (extracting literal, itemized customization instructions from noisy customer messages), the model choice becomes far less mysterious.
This is not a creative task.
We are not asking the model to ideate, rephrase, summarize emotionally, or reason across domains. We are asking it to do something much narrower and much stricter:
- read informal, often messy customer messages
- ignore greetings, side comments, and emotional filler
- extract only explicitly stated customization instructions
- return them in a predictable structure
In other words, this is a structured language extraction problem, not a generative one.
Once you frame the task correctly, a set of non-negotiable constraints emerges:
- Zero tolerance for invented meaning
- High determinism (same input → same output)
- Low latency at moderate daily volume
- Predictable cost at scale
- Ability to run privately without external dependencies
These constraints immediately eliminate entire classes of models and architectures. Large, general-purpose models optimized for open-ended generation add capabilities we do not need and introduce failure modes we actively want to avoid.
The most common failure modes in this context are not misunderstanding language, but doing too much:
- rephrasing instead of extracting
- “helpfully” merging instructions
- inferring preferences that were never stated
- smoothing ambiguity instead of preserving it
For extraction tasks like this, model size beyond a certain threshold does not meaningfully improve accuracy. It mostly increases inference time, operational cost, and output variability.
This is why we used Llama 3.1 8B, quantized to q4_K_M.
At this size, the model has enough language understanding to deal with how real customers actually write: informal phrasing, slang, half sentences, and WhatsApp-style messages. It understands what is being said without trying to be clever or creative, and without attempting to improve or rewrite the message.
Quantization plays an equally important role here. Running the model at q4_K_M significantly reduces memory and compute requirements while preserving the accuracy needed for extraction. This makes it practical to run on a modest setup: in our case, an Ubuntu VPS with 6 CPUs and 12 GB RAM that already hosts other applications, databases, and background services.
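As a reference point, since the settings further down show the model is served through Ollama, pulling exactly that quantization looks roughly like this (assuming the 8b-instruct-q4_K_M tag from the Ollama library):

ollama pull llama3.1:8b-instruct-q4_K_M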
That constraint mattered. This system was not designed to sit on a dedicated AI server or a GPU cluster. It needed to coexist with real production workloads and remain stable, predictable, and affordable. The chosen setup meets those requirements without sacrificing correctness, which is ultimately more important than raw model size.
Because this runs on a modest VPS and shares resources with other services, I did run into performance limits early on. Under load, inference was slow and at one point even affected overall server responsiveness.
To fix this, I added simple CPU and thread limits so the model could run reliably without competing aggressively with the rest of the system. That meant accepting slightly slower responses in exchange for stability, a tradeoff that made sense for this use case.
Environment="OLLAMA_NUM_THREADS=2"
CPUQuota=250%
OLLAMA_NUM_THREADS=2 limits how many CPU threads the model can use when running inference. Without this, the model will try to use as many cores as it can, which can cause sudden CPU spikes and slow down other services on the same server.
CPUQuota=250% caps the total CPU time the service is allowed to consume. On a multicore machine, this roughly translates to allowing the model to use up to two and a half CPU cores at peak, while ensuring the rest of the system remains responsive.
Together, these limits prevent the model from overwhelming the server. The responses are slightly slower, but the system stays stable, which matters more than shaving off a few milliseconds when this sits alongside databases, background jobs, and other production services.
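For reference, here is a minimal sketch of where these settings live, assuming Ollama runs as a systemd service named ollama (unit name and paths may differ on other setups):

# Create or edit a drop-in override for the service
sudo systemctl edit ollama

# Contents of the override (e.g. /etc/systemd/system/ollama.service.d/override.conf)
[Service]
Environment="OLLAMA_NUM_THREADS=2"
CPUQuota=250%

# Reload and restart so the limits take effect
sudo systemctl daemon-reload
sudo systemctl restart ollama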
The Prompt
Once the model was chosen, the most important work shifted to the instruction design. For this use case, the model is not allowed to help. It is not allowed to interpret or improve wording. It must behave more like a parser than a writer.
That required a prompt that is intentionally restrictive.
This output is used directly for pricing, costing, and fulfillment. A small mistake merging two requests, inventing clarity, or changing wording has real financial consequences. So the prompt had to remove as much freedom as possible.
The goal was simple: extract what the customer said and nothing more.
That’s why the system prompt is strict, repetitive, and frankly a bit boring. That’s intentional.
Why a Model File (Not Just a Prompt)
Instead of sending this prompt on every request, I embedded it directly into the model file.
This mattered for two reasons:
- Consistency: the model always starts in the same constrained mode. There's no risk of a missed system prompt or a conflicting instruction higher up the stack.
- Speed: because the instructions are baked into the model configuration, each request is smaller and faster. On a modest CPU-based VPS, those savings add up quickly.
Here's the prompt I used.
You are a STRICT customization extraction engine.
Your ONLY job is to extract literal, explicit customization requests from a customer's message.
This output will be used directly for pricing, costing, and fulfillment.
Mistakes are expensive.
FOLLOW THESE RULES EXACTLY:
1. Output MUST be a JSON array of strings only.
2. Each item MUST be a plain string: no objects, no keys, no nesting.
3. Do NOT explain, rephrase, beautify, summarize, or interpret creatively.
4. Use the customer’s exact wording as closely as possible.
5. If the wording is informal, slang, or broken, normalize ONLY enough for a professional to understand, nothing more.
6. Do NOT infer intent. Extract only what is explicitly stated.
7. Do NOT merge multiple requests into one item.
8. Do NOT split a single request into multiple items unless clearly separate.
9. Ignore greetings, filler, emotions, jokes, and unrelated sentences.
10. If nothing actionable is requested, return an empty array [].
11. NEVER add text before or after the JSON output.
ALLOWED OUTPUT FORMAT (ONLY THIS):
[
"customization 1",
"customization 2"
]
FORBIDDEN:
- Objects or key/value pairs
- Markdown or code fences
- Commentary or explanations
- Synonyms not used by the customer
Stay literal. Stay boring. Stay precise.
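Since the model is served through Ollama, this prompt can be baked in with a Modelfile. Here is a minimal sketch; the model name customization-extractor is illustrative, and setting the temperature to 0 is an assumption on my part that fits the determinism requirement:

FROM llama3.1:8b-instruct-q4_K_M
PARAMETER temperature 0
SYSTEM """
<the full prompt above, unchanged>
"""

Build it once, then the application can call it by name:

ollama create customization-extractor -f Modelfile
ollama run customization-extractor "Let it be sleeveless. Add belt loops."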
How This Fits Into the Application
For this stage of the product, I didn’t introduce a separate AI service. Instead, I integrated the extraction logic directly into the existing Java Spring Boot application as an internal controller.
This kept the architecture simple and reduced operational overhead. The controller is not exposed externally and is only used by trusted internal flows, for example when processing customer notes before pricing or fulfillment.
Even without a separate service boundary, the responsibilities are still clearly defined. The controller’s job is limited to:
- calling the local model
- enforcing the strict output format
- rejecting invalid responses
- returning a clean list of customization items
This approach made iteration faster and avoided premature complexity. If usage grows or requirements change, this logic can be moved behind a dedicated service later without affecting the rest of the system.
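Here is a simplified sketch of that internal piece. The names are illustrative rather than the actual production code, and it assumes Ollama's HTTP API on its default local port plus the custom model name from the Modelfile sketch above:

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

import java.util.List;
import java.util.Map;

@Service
public class CustomizationExtractionService {

    // Ollama's local generate endpoint (default port); we ask for a non-streaming response.
    private static final String OLLAMA_URL = "http://localhost:11434/api/generate";

    private final RestTemplate restTemplate = new RestTemplate();
    private final ObjectMapper objectMapper = new ObjectMapper();

    public List<String> extract(String customerNote) {
        Map<String, Object> request = Map.of(
                "model", "customization-extractor", // illustrative model name
                "prompt", customerNote,
                "stream", false
        );

        Map<?, ?> response = restTemplate.postForObject(OLLAMA_URL, request, Map.class);
        String raw = response == null ? null : (String) response.get("response");

        // Enforce the strict contract: the reply must be a JSON array of strings.
        // Anything else is rejected rather than "repaired".
        try {
            return objectMapper.readValue(raw, new TypeReference<List<String>>() {});
        } catch (Exception e) {
            throw new IllegalStateException("Model returned an invalid extraction payload", e);
        }
    }
}

The strictness of the parsing mirrors the prompt: if the model returns anything other than a JSON array of strings, the call fails loudly instead of letting questionable data flow into pricing or fulfillment.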
Examples
Input:
"Good afternoon my guy. I want it to have two pockets, I will like the material to be black and white instead of green and white, I will also like the v neck to come down small so it can show some of my cleavage. I want a longer trouser rather than shorts, I don't want my trouser to be as wide as the one in the reference"
Output:
[
"two pockets",
"material: black and white instead of green and white",
"V neck to come down small",
"longer trouser",
"trouser not as wide as reference"
]
Input:
"Let it be sleeveless. Add belt loops. The material should be wine colour instead of red. I want the neck V but small, not too open. Make the trouser longer, I don’t want shorts."
Output:
[
"sleeveless",
"add belt loops",
"material: wine color",
"small V neck",
"longer trousers"
]