Aiming to excel in algorithm roles, especially in LMM (large multimodal model) applications.
Fall 2024 student in the MS in Computer Science program at the University of Wisconsin - Madison.
View My LinkedIn Profile
Objective: To automate the extraction of fashion attributes like sleeve type, waistline, and material from clothing titles and images, facilitating efficient trend analysis.
Background: Traditional methods that train CNN models on manual annotations for this task resulted in low accuracy (60%) and high costs in both time and resources.
Solution: We propose a multimodal approach built on advanced AI models such as GPT, Gemini, and Claude. The method first recognizes raw fashion attributes, then maps them to an internal library using text similarity, aiming to improve accuracy, reduce costs, and accelerate the process.
Responsibilities:
High Accuracy in Attribute Extraction: Successfully extracted and mapped key product attributes from competitor imagery and titles with prompt engineering, few-shot learning, and supervised fine-tuning, achieving 95% accuracy across 78 different attributes.
Agent System Integration: Broke the task down and implemented an agent system that coordinated CV, LMM, and CLIP models and integrated feedback, improving overall accuracy by an additional 5%.
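The agent coordination described above can be shown as a structural sketch. Every stage here is a stub standing in for the real component (the CV crop, the LMM call, the CLIP/text-similarity mapping), and all names and return values are hypothetical, not the production system:

```python
def cv_detect(image):
    """CV stage stub: pretend to crop the main garment from the image."""
    return image

def lmm_extract(crop, title, examples=None):
    """LMM stage stub: raw attribute phrases; few-shot examples unlock
    domain terms (e.g. "2 in 1" is mapped to "false two-piece")."""
    if examples:
        return ["puff sleeve", "false two-piece"]
    return ["puff sleeve", "2 in 1"]

def map_to_library(raw_attrs, library):
    """Mapping stage stub: keep phrases that exist in the internal library."""
    return [a for a in raw_attrs if a in library]

def extract_attributes(image, title, library, few_shot=None):
    crop = cv_detect(image)
    raw = lmm_extract(crop, title)
    mapped = map_to_library(raw, library)
    # Feedback loop: if some raw phrases failed to map, retry with few-shot hints.
    if len(mapped) < len(raw) and few_shot:
        raw = lmm_extract(crop, title, examples=few_shot)
        mapped = map_to_library(raw, library)
    return mapped

library = {"puff sleeve", "high waist", "false two-piece"}
result = extract_attributes("dress.jpg", "2 in 1 dress", library,
                            few_shot=["'2 in 1' means false two-piece"])
print(result)   # → ['puff sleeve', 'false two-piece']
```

The point of the sketch is the control flow, not the stubs: the coordinator only re-invokes the expensive LMM stage when the mapping stage signals that some output failed to land in the internal library.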
Fashion attribute recognition requires a comprehensive analysis of the entire image, which includes the background, skin color, lighting, and shadows. These elements are crucial for accurately identifying and understanding the nuances of fashion attributes.
Traditional CNN methods often rely on simplistic feature recognition techniques that disregard these critical elements, resulting in reduced accuracy, especially in complex scenarios.
In contrast, Large Language Models (LLMs) such as GPT-4 demonstrate superior performance in this task. This advanced approach enables a deeper understanding of the image as a whole, significantly improving the identification of fashion attributes.
Steps:
Provide Input: The input consists of images and titles (to identify the main subject) along with prompts. (Insert prompts and example outputs here.)
Textual Mapping: The model maps the provided text to understand and categorize the attributes.
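The input step above (image + title + prompt) could be assembled as an OpenAI-style vision message. The prompt wording and the model name in the comment are assumptions for illustration, not the production prompts:

```python
def build_messages(image_url: str, title: str) -> list:
    """Assemble a vision-chat message carrying the prompt, title, and image."""
    prompt = (
        "You are a fashion attribute extractor. Given the product title "
        f"'{title}' and the image, list the sleeve type, waistline, and "
        "material as short phrases, one per line."
    )
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }]

messages = build_messages("https://example.com/dress.jpg",
                          "Floral puff-sleeve midi dress")
# e.g. client.chat.completions.create(model="gpt-4o", messages=messages)
```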
Pros and Cons:
Pros:
Cons:
Hallucination: The model might generate plausible but incorrect information based on the prompt.
Specialized Domain Knowledge: GPT-4 might not be aware of domain-specific concepts. For example, the term “2 in 1” is a common style in fashion known as a false two-piece, which the model might not accurately understand.
Direct Naming Issue: The model cannot directly return internally defined attribute names specific to a company. Instead, it necessitates text matching to correlate its outputs with the company’s specific attribute lexicon.
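The text-matching step mentioned above could be sketched with plain string similarity. In practice an embedding model might serve better; stdlib difflib keeps the sketch dependency-free, and the lexicon entries are made up for illustration:

```python
from difflib import SequenceMatcher

def map_to_lexicon(raw: str, lexicon: list, threshold: float = 0.5):
    """Map a raw model output onto the closest internal attribute name,
    or None when nothing in the lexicon is similar enough."""
    scored = [(SequenceMatcher(None, raw.lower(), name.lower()).ratio(), name)
              for name in lexicon]
    score, best = max(scored)
    return best if score >= threshold else None

lexicon = ["Puff Sleeve", "Bishop Sleeve", "False Two-Piece", "High Waist"]
print(map_to_lexicon("puffed sleeves", lexicon))   # → Puff Sleeve
print(map_to_lexicon("2 in 1", lexicon))           # → None
```

Note how the sketch reproduces the specialized-domain-knowledge con above: surface similarity maps "puffed sleeves" correctly but cannot connect "2 in 1" to "False Two-Piece", which is where few-shot examples or domain-aware embeddings come in.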
This project aims to enhance product operations by leveraging search trend data to recall and promote relevant products, thereby optimizing sales performance. Utilizing the CLIP model, the system identifies trending keywords—such as “Y2K”—and locates products within our inventory that align with these trends. Once identified, these products are prioritized for increased traffic, exposure, and promotional activities within our application. By dynamically adapting to current trends, this approach ensures that our product offerings remain highly relevant and appealing to our customer base, ultimately driving higher sales and improved operational efficiency.
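The trend-recall idea above can be sketched as cosine-similarity ranking, assuming the keyword and product embeddings have already been produced by CLIP's text and image encoders. The 3-D vectors and product names here are toy stand-ins for CLIP's real high-dimensional embeddings:

```python
import numpy as np

def recall_products(keyword_emb, product_embs, names, top_k=2):
    """Rank products by cosine similarity to a trend-keyword embedding."""
    k = keyword_emb / np.linalg.norm(keyword_emb)
    p = product_embs / np.linalg.norm(product_embs, axis=1, keepdims=True)
    scores = p @ k                          # cosine similarity per product
    order = np.argsort(scores)[::-1][:top_k]
    return [names[i] for i in order]

# Toy embeddings standing in for CLIP text/image encoder outputs.
y2k = np.array([1.0, 0.2, 0.0])
products = np.array([
    [0.9, 0.3, 0.1],    # low-rise cargo pants
    [0.0, 0.1, 1.0],    # plain office shirt
    [0.8, 0.1, 0.2],    # butterfly crop top
])
names = ["low-rise cargo pants", "plain office shirt", "butterfly crop top"]
top = recall_products(y2k, products, names)
print(top)   # → ['low-rise cargo pants', 'butterfly crop top']
```

The recalled top-k products are the ones that would then be queued for increased traffic and promotion in the application.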
Steps:
Achievement:
Steps:
Enable Azure GPT-4 OCR Enhancement: Initiate the process by activating the GPT-4 OCR enhancement feature in Azure.
Apply Coordinate Grid to Captcha: Add a coordinate grid overlay to the Captcha image for precise identification.
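The grid-overlay step could look like the following Pillow sketch; the library choice, grid spacing, and colors are all assumptions for illustration:

```python
from PIL import Image, ImageDraw

def add_grid(img, step=50, color=(255, 0, 0)):
    """Overlay a coordinate grid so positions can be referenced precisely."""
    out = img.copy()
    draw = ImageDraw.Draw(out)
    w, h = out.size
    for x in range(0, w, step):              # vertical grid lines
        draw.line([(x, 0), (x, h)], fill=color)
    for y in range(0, h, step):              # horizontal grid lines
        draw.line([(0, y), (w, y)], fill=color)
    return out

# Stand-in blank canvas; in practice this would be the captcha image.
captcha = Image.new("RGB", (200, 100), "white")
gridded = add_grid(captcha)
```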
For the solution:
First GPT-4 Call for Logic Questions: Make the initial call to GPT-4 asking logic-based questions, such as “What is the lowercase letter corresponding to the yellow letter?”. The response might be: “A lowercase letter ‘a’ in green.”
Second GPT-4 Call for Bounding Box: The next call to GPT-4 requests the bounding box of the identified “green lowercase a”.
Calculate Bounding Box Centroid: Determine the centroid of the bounding box coordinates to guide the user interface for the correct click response.
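The centroid calculation in the final step reduces to a few lines; the (x, y, width, height) box format and the concrete coordinates are assumptions for illustration:

```python
def bbox_centroid(box):
    """Return the center point of an (x, y, w, h) bounding box."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

# Hypothetical box for the "green lowercase a" from the second GPT-4 call.
box = (120, 40, 30, 50)
click_point = bbox_centroid(box)
print(click_point)   # → (135.0, 65.0)
```

The resulting point is what the UI automation would click to answer the captcha.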
Achievement:
Achieved an average of 87% accuracy in solving captchas within five tries.