Building a Product-to-Category Classifier
Getting the best out of LLM chains, embeddings, and low-friction vector search.
By Jack Hales | 18 December 2024
Recently, as a project, I was looking into product classification. There are many ways to classify data, but there didn't seem to be a simple interface I could use, so, as personal upskilling & research, I set out to build some tooling to assist the process.
With this problem in hand, it felt like a perfect test of modern LLMs' capabilities. Recently, I added a basic "Chain Builder" feature in my MicroAI framework (a small wrapper library). This would be a great test of chains and "chain of thought" intelligence building.
A language chain is a coded series of prompts designed to tease a language model into thinking and reflecting, improving its reasoning.
So I collated the data I needed, filtered to the products categorised as "Other", and got my number: 3,832 products which needed to be categorised.
I'll simplify the series of events, and the tests which I conducted.
  1. Using GPT-4o-mini, I built a basic language chain whose system prompt explained its job and provided the category list. It would then include the user prompt of "Item: Item Name", and ask the model to think about the product, where it should go, and what sort of categories it thinks it should be in. Finally, it would ask the model to pick a single category straight from the category list (a sketch of this chain follows the list). This worked okay, but it was far from effective. It was not very deterministic (only 3-4 runs out of 5 gave a sensible answer), so it looked like I would need frequency analysis over the results. It would also reason that an item was Category X, then, when prompted to pick the category, reply with "Other". This was "good enough", but I wanted to see what else could be done.
  2. From there, I tried very long prompts with GPT-3.5-turbo which would similarly list the categories, provide the item, then ask the model to evaluate the item description and whether the item fits each category. Although a good idea, it fails with a smaller model, which gets stuck in a loop: it starts saying "this would not fit into this category" for the first 10 categories, then simply stops distinguishing between categories - the momentum just keeps the denials flowing.
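In rough terms, the first chain looked something like the sketch below. This is a minimal illustration written against the OpenAI Python client rather than MicroAI (whose API isn't shown in this post), and the category list and prompt wording are placeholders, not the exact ones used.

```python
# Minimal sketch of the two-step classification chain (test 1), assuming the
# OpenAI Python client. CATEGORIES and all prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()
CATEGORIES = ["Eye Care", "Medicated Lozenges", "Creams, Lotions & Liquids", "Other"]

def classify(item_name: str) -> str:
    messages = [
        {"role": "system", "content": (
            "You classify products into exactly one category.\n"
            "Categories: " + ", ".join(CATEGORIES)
        )},
        {"role": "user", "content": (
            f"Item: {item_name}\n"
            "Think about what this product is, where it should go, "
            "and which categories it could belong to."
        )},
    ]
    # Step 1: let the model reason freely about the product.
    reasoning = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages,
    ).choices[0].message.content
    # Step 2: feed the reasoning back in and force a single pick.
    messages += [
        {"role": "assistant", "content": reasoning},
        {"role": "user", "content": (
            "Now reply with one category name, straight from the list, "
            "and nothing else."
        )},
    ]
    return client.chat.completions.create(
        model="gpt-4o-mini", messages=messages,
    ).choices[0].message.content.strip()
```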
As you can imagine, there were many smaller tests done to tweak the prompt text and see how it affected the output.
Then, I had a thought: what about embeddings? I understand them, but didn't want to set up or use a third-party service for this: it's too much to fit into the problem space. I did some digging through Cursor's helpful Chat, and decided on scikit-learn's NearestNeighbors to do efficient local vector search for this project.
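A minimal sketch of that local search setup, assuming OpenAI embeddings (the embedding model named here is a choice for illustration) and scikit-learn's NearestNeighbors:

```python
# Sketch of the local vector search: embed the category names once, then
# query the index with product-name vectors. Names here are placeholders.
import numpy as np
from openai import OpenAI
from sklearn.neighbors import NearestNeighbors

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

category_names = ["Eye Care", "Medicated Lozenges", "Cold & Flu"]  # placeholder
category_vecs = embed(category_names)

# Cosine distance suits these embedding vectors; fit once, query many times.
index = NearestNeighbors(n_neighbors=min(5, len(category_names)), metric="cosine")
index.fit(category_vecs)

def top_categories(product_name: str) -> list[str]:
    distances, idxs = index.kneighbors(embed([product_name]))
    return [category_names[i] for i in idxs[0]]
```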
The tests for vector search included:
  1. Raw Embeddings: Searching the category name vectors using the product name vector. The correct result might appear somewhere in the top 10 closest.
  2. Expressed Product Embeddings: Expanding the product name into a fuller description with GPT-3.5-turbo, then searching the raw category vectors. Slightly better results.
  3. Expressed Category Embeddings: Expanding the category names into descriptions with GPT-3.5-turbo to complement the product description vectors. Better results.
  4. Better Model: Using 4o-mini to expand the category names and product name, then doing the vector search. Provides great results (sketched after this list).
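Tests 3 and 4 come down to embedding an LLM-written expansion instead of the bare name. Continuing the sketch above, the category side looks roughly like this (prompt wording and model choice are again illustrative):

```python
# Sketch of the "expressed" embeddings idea: expand each category name into a
# fuller description with the LLM, then embed and index the expansions.
def express_category(name: str) -> str:
    prompt = (
        f"Write a short description of the e-commerce category '{name}': "
        "what kinds of products it contains and what shoppers expect to find there."
    )
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

# Embed the expanded descriptions instead of the bare names, then refit the
# same NearestNeighbors index over these richer vectors. Row order still
# matches category_names, so index lookups map back to names unchanged.
expressed_vecs = embed([express_category(c) for c in category_names])
index.fit(expressed_vecs)
```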
From here, I was happy with the results. There was slight variance where the best category might land at rank two, with the rank-one result still being a reasonable choice. This was genuinely incredible to witness and experiment with.
I then decided to take it one step further: I built a chain which takes the top-5 results as input and prompts the model to consider, for each candidate, how relevant it is, whether customers would expect to find the product there, and whether it is appropriate for e-commerce. The model reasons, then I follow up with a query asking it to pick the appropriate categories (multiple!), and finally to pick the single most appropriate category from that list. This created some awesome data points for cross-category product propagation, and it solved our core problem of classifying with domain-specific categories.
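A rough sketch of that refinement chain, reusing the client from the sketches above (the prompts are illustrative):

```python
# Sketch of the final refinement chain over the top-5 candidates: reason about
# each, pick every fitting category, then narrow to the single best one.
def refine(product_name: str, top5: list[str]) -> tuple[list[str], str]:
    messages = [
        {"role": "system", "content": (
            "You review candidate categories for an e-commerce product."
        )},
        {"role": "user", "content": (
            f"Item: {product_name}\nCandidates: {', '.join(top5)}\n"
            "For each candidate, consider how relevant it is, whether "
            "customers would expect to find this item there, and whether "
            "it is appropriate for e-commerce."
        )},
    ]
    # Pass 1: free-form reasoning over all five candidates.
    reasoning = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages,
    ).choices[0].message.content
    # Pass 2: pick every appropriate category (multiple!).
    messages += [
        {"role": "assistant", "content": reasoning},
        {"role": "user", "content": (
            "List every appropriate category from the candidates, "
            "comma-separated, and nothing else."
        )},
    ]
    multi = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages,
    ).choices[0].message.content
    # Pass 3: narrow to the single most appropriate category.
    messages += [
        {"role": "assistant", "content": multi},
        {"role": "user", "content": (
            "Now reply with the single most appropriate category, "
            "and nothing else."
        )},
    ]
    single = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages,
    ).choices[0].message.content.strip()
    return [c.strip() for c in multi.split(",")], single
```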
To round it off, I wanted to improve the per-item runtime of the workflow, as embeddings take a while, not to mention the 4o-mini chains! I wrapped it in a thread pool, and the results came flying in!
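A thread pool suits this well because the time is spent waiting on API responses rather than on CPU. Continuing the sketches above (max_workers and the product list are placeholders):

```python
# Parallelise the per-item workflow across threads; each call is I/O-bound,
# so threads overlap the API round-trips. max_workers is a tuning guess.
from concurrent.futures import ThreadPoolExecutor

def classify_item(name: str) -> tuple[str, list[str], str]:
    top5 = top_categories(name)         # local vector search from above
    multi, single = refine(name, top5)  # LLM refinement chain from above
    return name, multi, single

product_names = ["Systane Liq Gel Drop 10 Ml", "Bio-Oil Oil 125 Ml"]  # really the 3,832 names

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(classify_item, product_names))
```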
Let's look at a few examples.
| Product Name | Single Category | Multiple Categories |
| --- | --- | --- |
| Systane Liq Gel Drop 10 Ml | Eye Care | Eye Care, Eye Health, Creams/Lotions/Liquids |
| Bio-Oil Oil 125 Ml | Creams, Lotions & Liquids | Creams/Lotions/Liquids, Hand & Body, Hair/Skin/Nails |
| Fishermans Friend Loz Original Strn | Medicated Lozenges | Medicated Lozenges, Coughs, Cough & Cold Relief, Cold & Flu |
Great! Across the 3,832 items, 3,698 matched with this sort of quality answer. A small group of items went unmatched, and ~100 items matched to "Other" (through the RAG step). That's a match rate of 96.5% across 87 different categories. Nice!
Technical Summary
To classify uncategorised products into domain-specific categories, I ended up using an LLM chain to expand details about the categories and an LLM chain to expand each product's description, then created embedding vectors for both datasets. I then ran a nearest-neighbours vector search locally. With the top 5 closest results, I fed this information into a separate LLM chain to reason about the multiple best categories, then the single best category.
Takeaways
The biggest takeaway for me from this project is that understanding the principles of these new tools helps a problem-solver piece together creative solutions for domain problems. Nothing innately new has been done here, but bridging the divide between theory and practice solved a real bottleneck problem.