Projects
Examining the Effectiveness of Telehealth vs. In-Person Services for Children Using the CANS-IP
Data Analyst
September 2024 – October 2024
- Presented an analysis measuring the effectiveness of in-person vs. telehealth care for children receiving outpatient mental health services
- Used t-tests and linear regression models to validate results
- Designed and presented a poster of our findings, together with the Five Acres Research Team, to more than 100 conference attendees
Analyzing Power Outages After Hurricanes for Private Client
Data Scientist
January 2023 – March 2023
- Extracted outage keywords from NOAA descriptions and converted time data from text to integer values for modeling
- Created choropleth maps of the United States and other visualizations to present power restoration times to stakeholders
- Regularly communicated findings to stakeholders, helping them make informed business decisions
Predicting MLB Playoff Teams
Project Lead & Data Scientist
June 2022 – October 2022
GitHub Repository / Medium Article
- Led a team of 8 data scientists in building algorithms to predict whether Major League Baseball (MLB) teams would make the playoffs
- Researched, web-scraped, combined, and cleaned eight 660-row datasets using BeautifulSoup, pandas, and scikit-learn
- Produced dashboards and conducted model selection to develop a logistic regression model with 88% accuracy
- Led project meetings and learned, then taught team members, Tableau, GitHub, data visualization, and machine learning
- Developed a project timeline, managed the project GitHub repository, and wrote an article presenting the results
Developing a Matching Algorithm for Private Client
Data Scientist
January 2022 – May 2022
- Accessed customer data from vendor’s API and cleaned music preference data using pandas
- Leveraged cosine similarity in scikit-learn to develop an algorithm that matches customers by music preference, which the stakeholder used in developing their product
Understanding the 2020 U.S. Census
Data Scientist
October 2021 – December 2021
- Researched and consolidated 2020 U.S. Census data on racial and income proportions by state
- Accessed census data from the U.S. Census API and combined metrics into a single dataset using pandas
- Cleaned data in R and created interactive [1, 2] and animated [1, 2, 3, 4] visualizations using Python and Tableau
Predicting California Energy Usage and Air Pollution in the Wake of the Pandemic
Data Scientist
June 2021 – October 2021
- Researched California’s energy portfolio, cleaned data, and built interactive dashboards in Google Data Studio and Tableau to present to stakeholders
- Utilized linear regression in R to make predictions on nitrogen and sulfur oxide concentrations in the atmosphere with 90% accuracy