Aspiring to excel in algorithmic roles, especially in LMM applications.
Fall 2024 MS student in Computer Science at the University of Wisconsin-Madison.
View My LinkedIn Profile
This project was initiated to harness the power of web-crawled competitor data, utilizing sophisticated data warehousing and predictive analytics technologies to enhance our strategic decision-making capabilities. As a Big Data Product Manager, I led the development and implementation of a comprehensive data management framework that not only addressed data integrity but also supported advanced applications in competitive analysis and sales forecasting.
Our objectives were to create a robust data warehouse architecture capable of accommodating trillions of data points from competitor websites, to cleanse the data and maintain high quality despite anti-crawling measures, and to develop predictive models that accurately forecast daily sales.
Data Governance: Successfully managed trillions of data points collected through web scraping from three competitor websites. Implemented robust data cleaning processes to handle the corruption introduced by anti-crawling mechanisms, holding data accuracy at about 97%.
Data Warehouse Architecture: Redesigned and scaled our Hadoop Hive data architecture, which included developing and implementing backup strategies that enhanced data repair efficiency by 30%.
Sales Forecasting: Developed and deployed an ARMA model to forecast daily sales, keeping the annual revenue prediction error within 5%.
Graph Search Service: Overhauled our Kafka-based CV service, improving the completeness of our data matching capabilities from 63% to 99%. I also integrated a Milvus vector database with ResNet models to boost the accuracy of our product matching feature from 78% to 83%.
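The product matching described above comes down to a nearest-neighbor search over image embeddings. The sketch below illustrates that idea with a brute-force cosine similarity search in plain Python. It is not the production setup: the in-memory dictionary stands in for the Milvus collection, the short toy vectors stand in for ResNet embeddings, and the names `cosine_similarity`, `best_match`, the `sku-*` identifiers, and the 0.8 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as no match
    return dot / (norm_a * norm_b)


def best_match(
    query: List[float],
    catalog: Dict[str, List[float]],
    threshold: float = 0.8,  # hypothetical cut-off, not a value from the project
) -> Optional[Tuple[str, float]]:
    """Return (product_id, score) of the closest catalog item, or None if
    no item reaches the similarity threshold."""
    best: Optional[Tuple[str, float]] = None
    for product_id, embedding in catalog.items():
        score = cosine_similarity(query, embedding)
        if best is None or score > best[1]:
            best = (product_id, score)
    if best is not None and best[1] >= threshold:
        return best
    return None


# Toy 2-dimensional "embeddings"; real ResNet features are far larger.
catalog = {"sku-a": [1.0, 0.0], "sku-b": [0.0, 1.0]}
match = best_match([0.9, 0.1], catalog)
no_match = best_match([1.0, 1.0], catalog)
```

A vector database replaces the linear scan with an approximate index so that the same lookup stays fast at catalog scale. The threshold is what trades matching completeness against matching accuracy.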
Hadoop Hive: Utilized for scalable data warehousing solutions, supporting vast amounts of structured and semi-structured data.
SQL and Kafka: Employed for data manipulation and real-time data processing to handle massive datasets efficiently.
ARMA Model: Applied for predictive analytics to forecast sales trends based on historical data.
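To make the ARMA forecasting step more concrete, here is a minimal sketch of how an ARMA(1,1) model produces forecasts once its parameters are known. It assumes the model form x_t = c + phi * x_(t-1) + theta * e_(t-1) + e_t. The function name `arma11_forecast`, the model order, and the coefficient values are illustrative assumptions rather than the fitted values from the project. In practice the parameters would be estimated from historical sales with a statistics library.

```python
from typing import List


def arma11_forecast(
    series: List[float], c: float, phi: float, theta: float, steps: int
) -> List[float]:
    """Forecast `steps` values ahead with an ARMA(1,1) model whose
    parameters (c, phi, theta) are assumed to be already estimated."""
    if not series:
        raise ValueError("series must contain at least one observation")

    # Recover the in-sample residuals recursively, assuming the residual
    # before the first observation is zero.
    prev_resid = 0.0
    for t in range(1, len(series)):
        predicted = c + phi * series[t - 1] + theta * prev_resid
        prev_resid = series[t] - predicted

    forecasts: List[float] = []
    last_value = series[-1]
    for h in range(steps):
        # Future shocks have expected value zero, so the moving-average
        # term only contributes to the first forecast step.
        ma_term = theta * prev_resid if h == 0 else 0.0
        last_value = c + phi * last_value + ma_term
        forecasts.append(last_value)
    return forecasts


# Hypothetical daily sales and coefficients, chosen only for illustration.
forecast = arma11_forecast([10.0, 12.0], c=1.0, phi=0.5, theta=0.5, steps=3)
```

Summing the daily forecasts over a year and comparing the total with actual revenue gives the kind of annual error figure quoted above.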