Xinchi (Chloe) Liu


Aiming to excel in algorithm-focused roles, especially in applications of large multimodal models (LMMs).

Fall 2024 MS student in Computer Science at the University of Wisconsin-Madison.

View My LinkedIn Profile

View My GitHub Profile

Fashion Attribute Extraction for Trend Analysis, with Application Enhancements Using CLIP and Captcha Reasoning


Outline

Introduction

Objective: To automate the extraction of fashion attributes like sleeve type, waistline, and material from clothing titles and images, facilitating efficient trend analysis.

Background: Traditional methods, which train CNN models on manual annotations, achieved low accuracy (60%) for this task at a high cost in both time and resources.

Solution: We propose a multimodal approach using large multimodal models such as GPT, Gemini, and Claude. The method first recognizes raw fashion attributes and then maps them to an internal attribute library using text similarity, with the aim of improving accuracy, reducing costs, and speeding up the process.
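The mapping step can be illustrated with a minimal sketch. It assumes the raw attribute text has already been produced by a model and uses the standard-library `difflib.SequenceMatcher` as a stand-in similarity measure; the project's actual similarity method is not stated here. The library contents, the `map_to_library` helper, and the `0.6` threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical internal library: attribute name -> allowed canonical values.
ATTRIBUTE_LIBRARY = {
    "sleeve_type": ["long sleeve", "short sleeve", "sleeveless", "puff sleeve"],
    "waistline": ["high waist", "mid waist", "low waist"],
    "material": ["cotton", "polyester", "denim", "linen"],
}


def similarity(a, b):
    """Character-level similarity in [0, 1] after basic normalization."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def map_to_library(attribute, raw_value, threshold=0.6):
    """Return the closest canonical value, or None if nothing is close enough.

    The threshold is an assumed cut-off, not a value from the project.
    """
    candidates = ATTRIBUTE_LIBRARY.get(attribute, [])
    if not candidates:
        return None
    best = max(candidates, key=lambda c: similarity(raw_value, c))
    return best if similarity(raw_value, best) >= threshold else None


if __name__ == "__main__":
    # Raw value as a model might phrase it, mapped to the canonical entry.
    print(map_to_library("sleeve_type", "Long Sleeves"))
    # A value with no reasonable match is left unmapped.
    print(map_to_library("material", "zzzz"))
```

Returning `None` for weak matches keeps unmapped values visible for review instead of forcing them into the nearest library entry.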

Responsibility:

Achievements


High Accuracy in Attribute Extraction: Successfully extracted and mapped key product attributes from competitor imagery and titles with prompt engineering, few-shot learning, and supervised fine-tuning, achieving 95% accuracy across 78 different attributes.

Agent System Integration: Broke the task down and implemented an agent system that coordinated CV, LMM, and CLIP models and integrated feedback, which improved overall accuracy by an additional 5%.

ProjectDevelopment

ExampleShowcase

Fashion attribute recognition requires a comprehensive analysis of the entire image, which includes the background, skin color, lighting, and shadows. These elements are crucial for accurately identifying and understanding the nuances of fashion attributes.

Traditional CNN methods often rely on simplistic feature recognition techniques that disregard these critical elements, resulting in reduced accuracy, especially in complex scenarios.

In contrast, large multimodal models (LMMs) such as GPT-4 perform better on this task. They interpret the image as a whole, which significantly improves the identification of fashion attributes.

Traditional_CNN


Steps:

Pros and Cons:

Pros:

Cons:

LMMs-GPT-4&Gemini


Steps:

Pros and Cons:

Pros:

Cons:


EnhancingApplication

Trend-Based_Product_Traffic_Optimization_Using_CLIP_Model


This project aims to enhance product operations by leveraging search trend data to recall and promote relevant products, thereby optimizing sales performance. Utilizing the CLIP model, the system identifies trending keywords—such as “Y2K”—and locates products within our inventory that align with these trends. Once identified, these products are prioritized for increased traffic, exposure, and promotional activities within our application. By dynamically adapting to current trends, this approach ensures that our product offerings remain highly relevant and appealing to our customer base, ultimately driving higher sales and improved operational efficiency.
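The recall step described above can be sketched as a similarity ranking over embeddings. This is only an illustration: it assumes the trend keyword and each product have already been embedded into a shared space (for example by CLIP's text and image encoders), and the toy three-dimensional vectors, SKU names, `recall_products` helper, `top_k`, and `min_score` are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recall_products(trend_embedding, product_embeddings, top_k=2, min_score=0.5):
    """Rank products by similarity to a trend keyword and keep the best matches."""
    scored = [
        (product_id, cosine_similarity(trend_embedding, embedding))
        for product_id, embedding in product_embeddings.items()
    ]
    # Drop weak matches, then sort from most to least similar.
    scored = [(pid, score) for pid, score in scored if score >= min_score]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings of "Y2K" and three products.
trend_y2k = [0.9, 0.1, 0.0]
products = {
    "sku_low_rise_jeans": [0.8, 0.2, 0.1],
    "sku_butterfly_top": [0.7, 0.1, 0.3],
    "sku_wool_coat": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    for product_id, score in recall_products(trend_y2k, products):
        print(product_id, round(score, 3))
```

The recalled product IDs would then be passed to whatever system handles traffic and promotion; that hand-off is outside the scope of this sketch.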

Steps:

Achievement:


Break_Down_Captcha_with_GPT4_Azure_OCR_Enhancement


Steps:

Achievement:

Averaged 87% accuracy in solving captchas within five tries.
