Easy ways AI can help ecommerce products
AI is alive and well, and it is here to stay.
But that doesn't mean every existing or new product has to rent a datacenter packed with NVIDIA chips or burn money on GPT API tokens just to enjoy the value of AI.
There are simple, practical, affordable applications of AI that deliver real value without breaking the bank.
Take any product in verticals like:
- beauty salons
- home services
- bulk food ordering
- tailoring and fashion
- printing and packaging
- auto repair
- upholstery
- construction
- event decoration
In all these industries, customers request customizations.
Those customizations are the heart of the business agreement: they cannot be missed, ignored, misread, or mispriced. Handling them well is what delivers:
✔️ satisfied customers
✔️ accurate pricing
✔️ clear communication
In fact, many of these customizations also affect pricing.
If the business misses any point, they either overcharge (and lose the customer) or undercharge (and lose money).
Here are a few real-world examples:
Beauty Salon Example
Customer says:
“Do full glam, the curls should be tight tight. Add highlights, small. And don’t let my front hairline show.”
Business needs: identify the actual tasks to bill for.
AI extracts:
- full glam
- tight curls
- small highlights
- no visible hairline
Home Services Example
Customer says:
“Repaint the living room creamish. Fix that window crack. But don’t touch the chandelier wiring.”
AI extracts:
- repaint living room
- cream colour
- fix crack at window
Bulk Food Example
Customer says:
“Rice spicy but not peppery. Extra chicken. Serve in foil packs. No plantain. Add water bottles.”
AI extracts:
- rice spicy
- extra chicken
- no plantain
- foil packs
- water bottles
In all three cases, the customization is tucked inside a casual, emotional message, yet each point has a financial consequence.
Now imagine getting 50 of these every day.
Manual reading isn't just painful; it's expensive, slow, and error-prone.
This is where AI becomes a bridge.
With the right model and the right instructions, we can automatically:
- summarize the customer’s note
- extract only the customization items
- ignore unrelated text
- remove noise
- retain meaning without inventing new meaning
- and present a clean, itemized list the business can act on
Affordable, local AI running privately, securely, and at scale makes this kind of automation possible for any product.
Choosing the Right Model (Why Bigger Isn’t Better)
Once the problem is clearly defined (extracting literal, itemized customization instructions from noisy customer messages), the model choice becomes far less mysterious.
This is not a creative task.
We are not asking the model to ideate, rephrase, summarize emotionally, or reason across domains. We are asking it to do something much narrower and much stricter:
- read informal, often messy customer messages
- ignore greetings, side comments, and emotional filler
- extract only explicitly stated customization instructions
- return them in a predictable structure
In other words, this is a structured language extraction problem, not a generative one.
Once you frame the task correctly, a set of non-negotiable constraints emerges:
- Zero tolerance for invented meaning
- High determinism (same input → same output)
- Low latency at moderate daily volume
- Predictable cost at scale
- Ability to run privately without external dependencies
These constraints immediately eliminate entire classes of models and architectures. Large, general-purpose models optimized for open-ended generation add capabilities we do not need and introduce failure modes we actively want to avoid.
The most common failure modes in this context are not misunderstanding language, but doing too much:
- rephrasing instead of extracting
- “helpfully” merging instructions
- inferring preferences that were never stated
- smoothing ambiguity instead of preserving it
For extraction tasks like this, model size beyond a certain threshold does not meaningfully improve accuracy. It mostly increases inference time, operational cost, and output variability.
This is why we used Llama 3.1 8B, quantized to q4_K_M.
At this size, the model has enough language understanding to deal with how real customers actually write: informal phrasing, slang, half sentences, and WhatsApp-style messages. It understands what is being said without trying to be clever or creative, and without attempting to improve or rewrite the message.
Quantization plays an equally important role here. Running the model at q4_K_M significantly reduces memory and compute requirements while preserving the accuracy needed for extraction. This makes it practical to run on a modest setup: in our case, an Ubuntu VPS with 6 CPUs and 12 GB RAM that already hosts other applications, databases, and background services.
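As a reference point, since the settings further down show the model is served through Ollama, pulling exactly that quantization looks roughly like this (assuming the 8b-instruct-q4_K_M tag from the Ollama library):

ollama pull llama3.1:8b-instruct-q4_K_M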
That constraint mattered. This system was not designed to sit on a dedicated AI server or a GPU cluster. It needed to coexist with real production workloads and remain stable, predictable, and affordable. The chosen setup meets those requirements without sacrificing correctness, which is ultimately more important than raw model size.
Because this runs on a modest VPS and shares resources with other services, I did run into performance limits early on. Under load, inference was slow and at one point even affected overall server responsiveness.
To fix this, I added simple CPU and thread limits so the model could run reliably without competing aggressively with the rest of the system. That meant accepting slightly slower responses in exchange for stability, a tradeoff that made sense for this use case.
Environment="OLLAMA_NUM_THREADS=2"
CPUQuota=250%
OLLAMA_NUM_THREADS=2 limits how many CPU threads the model can use when running inference. Without this, the model will try to use as many cores as it can, which can cause sudden CPU spikes and slow down other services on the same server.
CPUQuota=250% caps the total CPU time the service is allowed to consume. On a multicore machine, this roughly translates to allowing the model to use up to two and a half CPU cores at peak, while ensuring the rest of the system remains responsive.
Together, these limits prevent the model from overwhelming the server. The responses are slightly slower, but the system stays stable, which matters more than shaving off a few milliseconds when this sits alongside databases, background jobs, and other production services.
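For reference, here is a minimal sketch of where these settings live, assuming Ollama runs as a systemd service named ollama (unit name and paths may differ on other setups):

# Create or edit a drop-in override for the service
sudo systemctl edit ollama

# Contents of the override (e.g. /etc/systemd/system/ollama.service.d/override.conf)
[Service]
Environment="OLLAMA_NUM_THREADS=2"
CPUQuota=250%

# Reload and restart so the limits take effect
sudo systemctl daemon-reload
sudo systemctl restart ollama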
The Prompt
Once the model was chosen, the most important work shifted to the instruction design. For this use case, the model is not allowed to help. It is not allowed to interpret or improve wording. It must behave more like a parser than a writer.
That required a prompt that is intentionally restrictive.
This output is used directly for pricing, costing, and fulfillment. A small mistake merging two requests, inventing clarity, or changing wording has real financial consequences. So the prompt had to remove as much freedom as possible.
The goal was simple: extract what the customer said and nothing more.
That’s why the system prompt is strict, repetitive, and frankly a bit boring. That’s intentional.
Why a Model File (Not Just a Prompt)
Instead of sending this prompt on every request, I embedded it directly into the model file.
This mattered for two reasons:
- Consistency: the model always starts in the same constrained mode. There's no risk of a missed system prompt or a conflicting instruction higher up the stack.
- Speed: because the instructions are baked into the model configuration, each request is smaller and faster. On a modest CPU-based VPS, those savings add up quickly.
Here's the prompt I used.
You are a STRICT customization extraction engine.
Your ONLY job is to extract literal, explicit customization requests from a customer's message.
This output will be used directly for pricing, costing, and fulfillment.
Mistakes are expensive.
FOLLOW THESE RULES EXACTLY:
1. Output MUST be a JSON array of strings only.
2. Each item MUST be a plain string: no objects, no keys, no nesting.
3. Do NOT explain, rephrase, beautify, summarize, or interpret creatively.
4. Use the customer’s exact wording as closely as possible.
5. If the wording is informal, slang, or broken, normalize ONLY enough for a professional to understand, nothing more.
6. Do NOT infer intent. Extract only what is explicitly stated.
7. Do NOT merge multiple requests into one item.
8. Do NOT split a single request into multiple items unless clearly separate.
9. Ignore greetings, filler, emotions, jokes, and unrelated sentences.
10. If nothing actionable is requested, return an empty array [].
11. NEVER add text before or after the JSON output.
ALLOWED OUTPUT FORMAT (ONLY THIS):
[
"customization 1",
"customization 2"
]
FORBIDDEN:
- Objects or key/value pairs
- Markdown or code fences
- Commentary or explanations
- Synonyms not used by the customer
Stay literal. Stay boring. Stay precise.
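Since the model is served through Ollama, this prompt can be baked in with a Modelfile. Here is a minimal sketch; the model name customization-extractor is illustrative, and setting the temperature to 0 is an assumption on my part that fits the determinism requirement:

FROM llama3.1:8b-instruct-q4_K_M
PARAMETER temperature 0
SYSTEM """
<the full prompt above, unchanged>
"""

Build it once, then the application can call it by name:

ollama create customization-extractor -f Modelfile
ollama run customization-extractor "Let it be sleeveless. Add belt loops."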
How This Fits Into the Application
For this stage of the product, I didn’t introduce a separate AI service. Instead, I integrated the extraction logic directly into the existing Java Spring Boot application as an internal controller.
This kept the architecture simple and reduced operational overhead. The controller is not exposed externally and is only used by trusted internal flows, for example when processing customer notes before pricing or fulfillment.
Even without a separate service boundary, the responsibilities are still clearly defined. The controller’s job is limited to:
- calling the local model
- enforcing the strict output format
- rejecting invalid responses
- returning a clean list of customization items
This approach made iteration faster and avoided premature complexity. If usage grows or requirements change, this logic can be moved behind a dedicated service later without affecting the rest of the system.
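Here is a simplified sketch of that internal piece. The names are illustrative rather than the actual production code, and it assumes Ollama's HTTP API on its default local port plus the custom model name from the Modelfile sketch above:

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

import java.util.List;
import java.util.Map;

@Service
public class CustomizationExtractionService {

    // Ollama's local generate endpoint (default port); we ask for a non-streaming response.
    private static final String OLLAMA_URL = "http://localhost:11434/api/generate";

    private final RestTemplate restTemplate = new RestTemplate();
    private final ObjectMapper objectMapper = new ObjectMapper();

    public List<String> extract(String customerNote) {
        Map<String, Object> request = Map.of(
                "model", "customization-extractor", // illustrative model name
                "prompt", customerNote,
                "stream", false
        );

        Map<?, ?> response = restTemplate.postForObject(OLLAMA_URL, request, Map.class);
        String raw = response == null ? null : (String) response.get("response");

        // Enforce the strict contract: the reply must be a JSON array of strings.
        // Anything else is rejected rather than "repaired".
        try {
            return objectMapper.readValue(raw, new TypeReference<List<String>>() {});
        } catch (Exception e) {
            throw new IllegalStateException("Model returned an invalid extraction payload", e);
        }
    }
}

The strictness of the parsing mirrors the prompt: if the model returns anything other than a JSON array of strings, the call fails loudly instead of letting questionable data flow into pricing or fulfillment.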
Examples
Input:
"Good afternoon my guy. I want it to have two pockets, I will like the material to be black and white instead of green and white, I will also like the v neck to come down small so it can show some of my cleavage. I want a longer trouser rather than shorts, I don't want my trouser to be as wide as the one in the reference"
Output:
[
"two pockets",
"material: black and white instead of green and white",
"V neck to come down small",
"longer trouser",
"trouser not as wide as reference"
]
Input:
"Let it be sleeveless. Add belt loops. The material should be wine colour instead of red. I want the neck V but small, not too open. Make the trouser longer, I don’t want shorts."
Output:
[
"sleeveless",
"add belt loops",
"material: wine color",
"small V neck",
"longer trousers"
]