Xinchi (Chloe) Liu


Aiming to excel in algorithmic roles, especially in LLM applications.

Fall 2024 MS student in Computer Science at the University of Wisconsin-Madison.

View My LinkedIn Profile

View My GitHub Profile

Enhanced Product Categorization with Retrieval-Augmented Generation (RAG)

GitHub Jupyter Notebook


Outline

Introduction

Project Description:

This project improves how product titles are matched to a category tree by integrating traditional NLP techniques with Retrieval-Augmented Generation (RAG).

Objective:

When analyzing competitors' products, it is necessary to map each product to our internal category tree, which is crucial for competitor analysis.

When uploading new products from our suppliers, this enhanced category-matching method increases the accuracy and efficiency of entering product information, which is crucial for inventory management.


Achievements

Accuracy: 96%+


Project Development

Traditional NLP

Steps:

Achieved 80% accuracy within the top 5 candidates. Typical bad cases fail to capture the contextual information of the product and the category.
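The exact steps of the baseline are not detailed here, so the snippet below is a minimal sketch of one plausible traditional-NLP approach: TF-IDF vectors over category paths ranked by lexical cosine similarity. The file categories.jsonl and its path field are hypothetical.

```python
# Hedged sketch of a lexical baseline: TF-IDF + cosine similarity over category paths.
# The file name "categories.jsonl" and the "path" field are illustrative assumptions.
import json
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load the category tree: one JSON object per line, e.g. {"path": "Home > Kitchen > Cookware"}
with open("categories.jsonl") as f:
    category_paths = [json.loads(line)["path"] for line in f]

vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
category_matrix = vectorizer.fit_transform(category_paths)

def top5_categories(product_title: str) -> list[str]:
    """Return the 5 most lexically similar category paths for a product title."""
    title_vec = vectorizer.transform([product_title])
    scores = cosine_similarity(title_vec, category_matrix)[0]
    top_idx = scores.argsort()[::-1][:5]
    return [category_paths[i] for i in top_idx]

print(top5_categories("Non-stick frying pan 28cm with glass lid"))
```

Because TF-IDF only captures lexical overlap, product titles whose wording differs from the category path are exactly the bad cases described above, which motivates the RAG approach.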

RAG

Steps:


Challenges Faced

Chunking Strategies

We explored several chunking strategies:

  1. Use existing product titles as a bridge to connect categories with new product titles.

  2. First stem and simplify the product title to keep only its main information, then apply step 1.

  3. Directly use the category JSONL entries as chunks, then prompt the model to choose from the retrieved candidates. [Best performance and lowest cost of the three; see the sketch below.]
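A minimal sketch of strategy 3, assuming each line of a hypothetical categories.jsonl file is treated as one chunk and embedded with OpenAI's text-embedding-3-small model; the file layout and the embedding model are assumptions, not the project's exact setup.

```python
# Hedged sketch of strategy 3: each category JSONL line is one retrieval chunk.
# File name and embedding model are illustrative assumptions.
import json
import numpy as np
from openai import OpenAI

client = OpenAI()

with open("categories.jsonl") as f:
    category_chunks = [line.strip() for line in f if line.strip()]

# Embed every category chunk once and keep the matrix for later retrieval.
# (A very large category tree may need to be embedded in batches.)
resp = client.embeddings.create(model="text-embedding-3-small", input=category_chunks)
chunk_embeddings = np.array([item.embedding for item in resp.data])

# Normalize rows so that a dot product equals cosine similarity.
chunk_embeddings /= np.linalg.norm(chunk_embeddings, axis=1, keepdims=True)
np.save("category_embeddings.npy", chunk_embeddings)
```

Compared with strategies 1 and 2, this skips the extra title-to-title hop, which is consistent with it being both the cheapest and the best-performing option.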


Hallucination

In multi-class classification tasks involving long prompts, the language model (e.g., GPT) often generated hallucinated or irrelevant responses. This issue was particularly challenging because it compromised the accuracy and reliability of product categorization.

Addressing Hallucination: To mitigate this issue, we integrated Retrieval-Augmented Generation (RAG) and utilized cosine similarity measures. This approach helped narrow down the classification candidates, effectively reducing the potential for hallucination by focusing the language model’s responses on more probable and relevant categories.
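A minimal sketch of this mitigation, reusing the category embeddings built in the chunking sketch above: cosine similarity shortlists a few candidate categories, and the prompt restricts the model to answer only from that shortlist. The prompt wording and the gpt-4o-mini model are illustrative assumptions.

```python
# Hedged sketch: shortlist categories by cosine similarity, then ask the model
# to choose only from that shortlist, which limits the room for hallucination.
import numpy as np
from openai import OpenAI

client = OpenAI()

with open("categories.jsonl") as f:
    category_chunks = [line.strip() for line in f if line.strip()]
chunk_embeddings = np.load("category_embeddings.npy")  # built in the previous sketch

def categorize(product_title: str, k: int = 5) -> str:
    # Embed the product title and rank category chunks by cosine similarity.
    resp = client.embeddings.create(model="text-embedding-3-small", input=[product_title])
    title_vec = np.array(resp.data[0].embedding)
    title_vec /= np.linalg.norm(title_vec)
    top_idx = (chunk_embeddings @ title_vec).argsort()[::-1][:k]
    candidates = [category_chunks[i] for i in top_idx]

    # Constrain the model: it may only answer with one of the retrieved candidates.
    prompt = (
        "Choose the single best category for the product below.\n"
        "Answer with exactly one line copied verbatim from the candidate list.\n\n"
        f"Product title: {product_title}\n"
        "Candidates:\n" + "\n".join(candidates)
    )
    chat = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return chat.choices[0].message.content.strip()
```

Constraining the answer space to the retrieved candidates keeps the model from inventing categories that do not exist in the tree, which is what made the long-prompt hallucinations so damaging.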