Anchal Chaudhary

After finishing a Bachelor's in Computational Mathematics and a Master's in Analytics from UC Davis in 2023, I've been working in the data world, putting my skills to the test and learning more every day. This portfolio is a collection of projects that showcase my journey in analytics and my ongoing curiosity about the field.

About Me

Skills

Python (NumPy, Pandas, Matplotlib, scikit-learn, TensorFlow, ...) 90%
RStudio 87%
Tableau 100%
Google BigQuery 100%
Looker 100%
dbt (Data Build Tool) 100%
Presto SQL 88%
SQL (MySQL, MS SQL Server) 95%
MS Office (Excel, PowerPoint, Word) 80%
Cloud Services (AWS, GCP) 70%
Professional Experience

Product Data Analyst

WALMART

Location: New York, NY

Duration: Apr 2024 - Present

  • Built analytics frameworks for various fintech products for the retail sector

Associate Manager, Business Intelligence

FINN (SERIES C STARTUP)

Location: New York, NY

Duration: Sep 2023 - Mar 2024

  • Developed scalable data solutions and delivered actionable insights to senior management and stakeholders, driving strategic decisions and optimizing remarketing operations through advanced analytics.

Data Scientist

Practicum Project - University of California, Davis

FASHOM

Location: San Francisco, CA

Duration: Aug 2022 - June 2023

  • As part of the MSBA program, collaborated with the CEO of a B2C e-commerce startup to apply machine learning to reduce product returns and increase customer lifetime value (LTV) using Python, SQL, AWS, and GCP.
My Portfolio

  • All
  • Featured Projects
  • Machine Learning
  • Stats Models
  • Tableau Dashboards
  • Web Scraping

Predicting Student Performance & Learning Analytics

This project aims to predict student performance in real-time during game-based learning using one of the largest open datasets of game logs. The goal is to advance research into knowledge-tracing methods for game-based learning, helping developers create more effective educational games and providing educators with dashboards and analytic tools. Although game-based learning is becoming increasingly popular, there are limited open datasets available to apply data science and learning analytic principles. The Field Day Lab, a publicly-funded research lab, designs educational games for various subjects and age groups, making use of game data to understand how people learn. The lab partners with nonprofits like The Learning Agency Lab to develop the science of learning-based tools and programs for the social good.

NLP & Multi Label Classification

The "Toxic Comment Classification" project identifies and classifies toxic comments on online platforms. It is a Natural Language Processing (NLP) project built around multi-label classification: the goal is to predict whether a comment belongs to one or more of the categories toxic, severe-toxic, obscene, threat, insult, or identity-hate. Problem transformation methods such as Binary Relevance and Classifier Chain were used.
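To illustrate the two problem transformation methods named above, here is a minimal sketch of how each one turns a single multi-label problem into several binary ones. It is not the project's code: the function names and toy data are hypothetical, and in practice a library such as scikit-learn (OneVsRestClassifier, ClassifierChain) would normally handle this step.

```python
from typing import List, Sequence, Tuple

Features = List[List[float]]
Targets = List[int]


def binary_relevance_tasks(X: Sequence[Sequence[float]],
                           Y: Sequence[Sequence[int]]) -> List[Tuple[Features, Targets]]:
    """Binary Relevance: one independent binary problem per label."""
    n_labels = len(Y[0])
    return [([list(x) for x in X], [y[j] for y in Y]) for j in range(n_labels)]


def classifier_chain_tasks(X: Sequence[Sequence[float]],
                           Y: Sequence[Sequence[int]]) -> List[Tuple[Features, Targets]]:
    """Classifier Chain: problem j also sees labels 0..j-1 as extra features."""
    n_labels = len(Y[0])
    tasks = []
    for j in range(n_labels):
        # During training the true earlier labels are appended; at prediction
        # time the earlier classifiers' predictions would take their place.
        X_aug = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
        tasks.append((X_aug, [y[j] for y in Y]))
    return tasks


# Toy data (hypothetical): 2 features and 3 labels (toxic, obscene, insult).
X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[1, 0, 1], [0, 1, 1]]

br = binary_relevance_tasks(X, Y)
cc = classifier_chain_tasks(X, Y)
print(br[1])  # features unchanged, target is the "obscene" column
print(cc[2])  # features extended with the "toxic" and "obscene" labels
```

Binary Relevance ignores correlations between labels, while the chain lets later classifiers condition on earlier labels, at the cost of depending on the label order.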

Spam Classifier

The Email Spam Classifier project is an end-to-end pipeline that classifies text messages as spam or ham (non-spam) based on their content. Its main objective is to alert users to spam messages and protect them from fraudulent activity. The project uses a dataset from the UCI Machine Learning Repository and applies data cleansing and text preprocessing to prepare the data for analysis. Several classification models, including Naive Bayes, Logistic Regression, Decision Tree, Random Forest, AdaBoost, and Bagging classifiers, are trained and compared on accuracy and precision. Multinomial Naive Bayes delivered good accuracy and precision, making it the preferred choice.
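As a rough illustration of the preferred model, the sketch below implements Multinomial Naive Bayes with Laplace smoothing from scratch and scores it on accuracy and precision. It is a simplified stand-in, not the project's code: the whitespace tokenizer, the function names, and the four training messages are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_nb(messages, labels, alpha=1.0):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(messages, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha}


def predict_nb(model, text):
    """Return the class with the highest log posterior."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(model["word_counts"][label].values())
        for token in text.lower().split():
            if token not in model["vocab"]:
                continue  # ignore words never seen in training
            count = model["word_counts"][label][token]
            score += math.log((count + alpha) / (total_words + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy_and_precision(y_true, y_pred, positive="spam"):
    """Accuracy over all messages; precision for the positive (spam) class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return correct / len(y_true), precision


# Hypothetical toy messages, not taken from the UCI dataset.
train_msgs = ["win a free prize now", "free cash claim now",
              "are we meeting for lunch", "see you at lunch tomorrow"]
train_labels = ["spam", "spam", "ham", "ham"]
model = train_nb(train_msgs, train_labels)

test_msgs = ["claim your free prize", "lunch tomorrow"]
test_labels = ["spam", "ham"]
preds = [predict_nb(model, m) for m in test_msgs]
acc, prec = accuracy_and_precision(test_labels, preds)
print(preds, acc, prec)
```

Precision matters here because a false positive means a legitimate message is flagged as spam, which is usually the more costly mistake for users.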

Fraud Detection

A loan company wants a default risk model built from historical loan records. I used RStudio to build a model that assigns a risk score to each customer in the test set, selected the final model, and reported the Mean Absolute Error (MAE) on the test data.
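For reference, MAE is the average absolute gap between each predicted risk score and the observed outcome. The sketch below shows the calculation in Python rather than R; the outcome and score values are made up for illustration and are not results from the project.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all customers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical test set: 1 = defaulted, 0 = repaid, with model risk scores.
defaults = [1, 0, 0, 1]
risk_scores = [0.9, 0.2, 0.4, 0.6]
mae = mean_absolute_error(defaults, risk_scores)
print(round(mae, 3))  # approximately 0.275
```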

Customer Acquisition Analysis

This project focuses on analyzing the effectiveness of a digital advertising campaign conducted by Game Fun, a leading developer of casual mobile games. The goal of the campaign was to improve customer acquisition by running an A/B experiment using online display banners. The project involves performing a comprehensive data analysis, evaluating the impact of the experiment on various customer segments, and providing recommendations based on the findings.
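One common way to evaluate an A/B experiment like this is a two-proportion z-test on conversion rates. The sketch below is an assumption about how such a comparison could be run, not the project's actual method, and the counts are invented for the example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and test (b) groups.

    Returns (lift, z, two_sided_p_value) using a pooled standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return p_b - p_a, z, p_value


# Hypothetical counts: 100 of 1000 control users converted vs 130 of 1000 test users.
lift, z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p_value:.4f}")
```

Running the same test separately within each customer segment gives a first look at where the campaign worked, though splitting into many segments raises the risk of false positives.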

Image Compression

Image compression using principal component analysis (PCA) in RStudio.

Logistic Regression

This project analyzes the effects of interactions in linear regression, develops logistic regression models, interprets model coefficients, and creates interaction plots. It also uses several plots to examine how income and change in savings affect the likelihood of buying a house.
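To show what an interaction term does in a logistic model, the sketch below computes the predicted probability of buying a house from income, change in savings, and their product. The coefficients and input units are hypothetical values chosen for illustration; they are not estimates from the project.

```python
import math


def buy_probability(income, savings_change, coef):
    """Logistic model with an income x savings_change interaction term."""
    z = (coef["intercept"]
         + coef["income"] * income
         + coef["savings"] * savings_change
         + coef["interaction"] * income * savings_change)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients (income in thousands, savings change in thousands).
coef = {"intercept": -2.0, "income": 0.02, "savings": 0.5, "interaction": 0.01}

p_flat = buy_probability(100, 0, coef)    # no change in savings
p_saving = buy_probability(100, 1, coef)  # savings grew by one unit
print(p_flat, p_saving)
```

With a non-zero interaction coefficient, the effect of a change in savings depends on income, which is the pattern an interaction plot makes visible.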

Wine Web Scraping

This project focuses on web scraping using Selenium to extract data from the Vivino wine website. It involves navigating the website, scraping wine pages, and storing the collected data in a MongoDB database. The project aims to automate the collection of wine data for further analysis.

Regression Discontinuity Design

In this project, the effect of drinking on the likelihood of death will be explored using the drinking.csv dataset. The RDD (Regression Discontinuity Design) method will be applied to determine if alcohol consumption increases the risk of death. Results will be analyzed to provide insights on whether the legal drinking age should be lowered from 21.
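A sharp RDD estimate can be sketched as two local linear fits, one on each side of the age-21 cutoff, with the treatment effect read off as the jump between them at the cutoff. The code below is a minimal illustration on synthetic data; the bandwidth, function names, and generated values are assumptions, and it does not read drinking.csv or reproduce the project's results.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rdd_jump(ages, outcomes, cutoff=21.0, bandwidth=2.0):
    """Difference between right and left fitted values at the cutoff."""
    # Center age at the cutoff so each intercept is the fitted value at 21.
    left = [(a - cutoff, y) for a, y in zip(ages, outcomes)
            if cutoff - bandwidth <= a < cutoff]
    right = [(a - cutoff, y) for a, y in zip(ages, outcomes)
             if cutoff <= a <= cutoff + bandwidth]
    left_at_cutoff, _ = fit_line(*zip(*left))
    right_at_cutoff, _ = fit_line(*zip(*right))
    return right_at_cutoff - left_at_cutoff


# Synthetic data: a linear trend in age plus a jump of 8 at age 21.
ages = [19 + i / 10 for i in range(40)]
outcomes = [90 + 1.0 * (a - 21) + (8 if a >= 21 else 0) for a in ages]

jump = rdd_jump(ages, outcomes)
print(round(jump, 6))
```

The estimate is only credible if nothing else changes discontinuously at the cutoff, and it is sensitive to the chosen bandwidth, so results are usually reported for several bandwidths.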

Lasso Regression with Cross-Validation

In this project, the heart disease dataset is analyzed using R. Statistical analyses are performed, including sample subset selection, training subset selection, simple linear regression modeling, cross-validation, and lasso regression. The project aims to predict the probability of heart attack and evaluate the model's performance. AIC and AICc are used to assess the model's quality.
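The two model-quality criteria mentioned above follow standard formulas: AIC = 2k - 2 ln(L), and AICc adds a small-sample correction of 2k(k + 1) / (n - k - 1). The sketch below computes both in Python rather than R; the log-likelihood, parameter count, and sample size are hypothetical inputs, not values from the project.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """AIC with the small-sample correction; n is the number of observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical fitted model: log-likelihood -120.5, 4 parameters, 50 observations.
print(aic(-120.5, 4))
print(aicc(-120.5, 4, 50))
```

Lower values indicate a better trade-off between fit and complexity, and the two criteria converge as n grows relative to k.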

Recommendation System

Developed a recommendation engine for customer analytics. Applied user-based and item-based collaborative filtering with different similarity metrics to movie data to predict ratings for both existing and new customers, then compared the models to determine which predicts best.
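User-based collaborative filtering can be sketched as a similarity-weighted average of other users' ratings for the target movie. The example below uses cosine similarity, which is only one of the possible metrics; the users, movies, and ratings are invented, and this is an illustration rather than the project's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None  # None when no neighbor rated the item


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ana": {"A": 5, "B": 4},
    "ben": {"A": 5, "B": 4, "C": 2},
    "cy": {"A": 1, "B": 2, "C": 5},
}
pred = predict_rating(ratings, "ana", "C")
print(pred)
```

Because ana's ratings match ben's far more closely than cy's, the prediction for movie C lands nearer ben's rating of 2 than cy's rating of 5. A brand-new customer with no ratings has no similarities to draw on, so a fallback such as the item's average rating is typically needed.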

Contact Me

Let's Connect!

Social Profiles

Email Me

aanchalll71@gmail.com