A small collection of work and project examples to peruse.
File Storage API Template - Python
Every business has to work with files, moving them around and deriving insight from them. Files are also a good foundation for more interesting work with NLP and text analytics. I've built two general-purpose containerized file management API templates in Python, one using Django and the other using Flask. Both store files in either AWS S3 or Azure Blob, depending on the request, and use Postgres for record management. To make the record management async, both use Celery, with the Flask app also taking advantage of Redis. Both are set up to be easily extended with richer file metadata, or with processing functionality like text extraction or file manipulation.
Languages:
Python
Utilizes:
Django / Flask
Docker
Postgres
AWS S3
Azure Blob
Redis
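As a rough illustration of the request-based storage switch described above (not the code from either template), here is a minimal sketch. The class names, the "aws"/"azure" keys, and the in-memory stand-in for the cloud clients are all assumptions made so the example runs offline.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface; the real templates may be organized differently."""

    @abstractmethod
    def save(self, name, data):
        """Store the bytes and return a locator string."""


class InMemoryBackend(StorageBackend):
    """Stand-in for an S3 or Azure Blob client so the sketch needs no credentials."""

    def __init__(self, scheme):
        self.scheme = scheme
        self.files = {}

    def save(self, name, data):
        self.files[name] = data
        return f"{self.scheme}://{name}"


# Registry keyed by the provider named in the request (the keys are assumptions).
BACKENDS = {
    "aws": InMemoryBackend("s3"),
    "azure": InMemoryBackend("azure-blob"),
}


def store_file(provider, name, data):
    """Pick the backend named in the request and save the file there."""
    try:
        backend = BACKENDS[provider.lower()]
    except KeyError:
        raise ValueError(f"unsupported storage provider: {provider!r}") from None
    return backend.save(name, data)


print(store_file("aws", "report.pdf", b"%PDF-"))
print(store_file("Azure", "notes.txt", b"hello"))
```

Keeping the providers behind one small interface is what makes it cheap to add a third backend or to bolt on processing steps later.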
File Storage API Template - C#.NET
Another general-purpose file management API template, this one implemented in C#.NET with clean architecture principles. Like the Python versions, it is containerized, stores files in either AWS S3 or Azure Blob depending on the request, and uses Postgres for async, queue-based record management. It's also set up to be easily extended with more metadata or more advanced file processing.
Languages:
C#
Utilizes:
.NET 8
Docker
Postgres / Entity Framework
AWS S3
Azure Blob
Qualtrics Attention Tracking
Qualtrics is one of the most widely used academic/scientific survey platforms, but one thing it's missing is the ability to track user attention to the current task. This project demonstrates how to integrate JavaScript within Qualtrics to monitor user activity and track task engagement. The script detects when a participant leaves or re-enters the browser window and records these events in real time. All captured data is stored in a JSON object and saved as an embedded variable within the Qualtrics platform itself, so the data comes out as part of the survey results. I've also provided base scripts for unpacking the JSON for analysis in both Python and R.
Languages:
JavaScript
Utilizes:
Qualtrics
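To give a feel for the unpacking step, here is a minimal Python sketch that totals the time a participant spent away from the window. The event layout (a JSON list of objects with a "type" of "leave" or "enter" and a millisecond timestamp "t") is an assumption for illustration; the actual field names in the project's JSON may differ.

```python
import json


def time_away_ms(raw):
    """Sum the milliseconds between each 'leave' event and the next 'enter'."""
    # Hypothetical schema: [{"type": "leave" | "enter", "t": <ms timestamp>}, ...]
    events = sorted(json.loads(raw), key=lambda e: e["t"])
    total = 0
    left_at = None
    for event in events:
        if event["type"] == "leave" and left_at is None:
            left_at = event["t"]
        elif event["type"] == "enter" and left_at is not None:
            total += event["t"] - left_at
            left_at = None
    return total


sample = (
    '[{"type": "leave", "t": 1000}, {"type": "enter", "t": 4500},'
    ' {"type": "leave", "t": 9000}, {"type": "enter", "t": 9800}]'
)
print(time_away_ms(sample))  # 4300
```

A trailing "leave" with no matching "enter" is ignored here; whether to count it up to the end of the survey is an analysis decision.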
Recommender System
The goal of this project was to deploy a recommender system app online that lets a user choose between a basic filter-based approach and a statistical approach built on user data. The app uses a classic public movie dataset (MovieLens) that has been used for recommendation research for quite a while. Since it's built with R's Shiny library, it has a pretty boilerplate appearance, but I'm pretty happy with the final result as the algorithms run as expected and nothing breaks :)
Languages:
R
Utilizes:
Shiny App
UBCF and IBCF recommendation methods
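For readers unfamiliar with UBCF (user-based collaborative filtering), the idea can be sketched in a few lines of Python. This is an illustration only: the app itself is written in R, and the toy ratings, the cosine similarity choice, and the function names below are assumptions, not taken from the project.

```python
from math import sqrt

# Toy ratings on a 1-5 scale (made-up data, not from MovieLens).
ratings = {
    "ann": {"Alien": 5, "Heat": 4, "Up": 1},
    "bob": {"Alien": 4, "Heat": 5, "Up": 2, "Jaws": 5},
    "cat": {"Alien": 1, "Heat": 2, "Up": 5, "Jaws": 2},
}


def cosine(a, b):
    """Cosine similarity over the movies both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def predict(user, movie, all_ratings):
    """Similarity-weighted average of other users' ratings for the movie."""
    numerator = denominator = 0.0
    for other, theirs in all_ratings.items():
        if other == user or movie not in theirs:
            continue
        similarity = cosine(all_ratings[user], theirs)
        numerator += similarity * theirs[movie]
        denominator += similarity
    return numerator / denominator if denominator else None


print(round(predict("ann", "Jaws", ratings), 2))
```

Ann's tastes line up with Bob's far more than with Cat's, so the prediction lands closer to Bob's rating. IBCF flips the same idea around and compares items instead of users.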
EM for Gaussian Mixtures
Although I have done tons of coding assignments, including them all here would be cumbersome, redundant, and frankly super boring. However, I found that this one from CS 598 Practical Statistical Learning was pretty challenging, and it's a good showcase of some coding in R. The HTML from the notebook is embedded in the site, so if you want to check it out and judge me on my loop structure or bad R vectorization, feel free.
Languages:
R
Shows:
Coding the EM algorithm for Gaussian mixtures
Mathematical Derivations
Coding the Baum-Welch algorithm for Hidden Markov Models
Coding the Viterbi algorithm
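For context on what the EM part involves, here is a minimal one-dimensional sketch in Python. It is not the notebook's R code; the function names, the toy data, the starting values, and the small variance floor are assumptions chosen to keep the example short and stable.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def em_gmm_1d(data, means, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture by alternating E and M steps."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            weights[j] = n_j / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / n_j
            variances[j] = max(spread, 1e-6)  # floor avoids a collapsed component
    return means, variances, weights


data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_gmm_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
print([round(m, 2) for m in means], [round(w, 2) for w in weights])
```

With two well-separated clusters like these, the estimated means settle near the cluster centers and the weights near one half each. Real data needs more care with initialization and with log-space arithmetic to avoid underflow.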
Parallel Programming
For the final project of CS484 Parallel Programming, the task was to code a repeating histogram sort algorithm in parallel using different paradigms on the University of Illinois's campus cluster. This was a fun way for me to test my coding chops, because if there's one thing sure to add a challenge to coding or to an algorithm, it's making it run efficiently in parallel.
Languages:
C++
Utilizes:
MPI message passing protocols
Slurm cluster processing
If you're bored enough to check out the repo, see the solution.cpp file for the actual algorithm.
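To show the general shape of a histogram sort without a cluster, here is a single-process Python simulation. It is a simplification and not the solution.cpp algorithm: each "rank" is just a list, the global histogram that would be an MPI reduction is a plain sum, and splitters are refined one at a time by bisection on integer keys. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def histogram_sort(chunks):
    """Simulate histogram sort over per-rank lists of integer keys."""
    ranks = len(chunks)
    local = [sorted(c) for c in chunks]  # each rank sorts its own keys first
    total = sum(len(c) for c in local)
    lo_key = min(c[0] for c in local if c)
    hi_key = max(c[-1] for c in local if c)

    splitters = []
    for i in range(1, ranks):
        target = i * total // ranks  # how many keys should sit at or below splitter i
        lo, hi = lo_key, hi_key
        while lo < hi:
            probe = (lo + hi) // 2
            # Histogram step: every rank counts its keys <= probe and the
            # counts are summed (a reduction across ranks in a real MPI run).
            count = sum(bisect_right(c, probe) for c in local)
            if count >= target:
                hi = probe
            else:
                lo = probe + 1
        splitters.append(lo)

    # Exchange step: each key goes to the rank that owns its key range.
    buckets = [[] for _ in range(ranks)]
    for c in local:
        for key in c:
            buckets[bisect_left(splitters, key)].append(key)
    return [sorted(b) for b in buckets]


chunks = [[9, 3, 14, 7], [1, 12, 5, 16], [8, 2, 11, 15], [4, 13, 6, 10]]
for rank, bucket in enumerate(histogram_sort(chunks)):
    print(rank, bucket)
```

The point of the repeated histogramming is load balance: only counts travel between ranks while the splitters are being tuned, and the keys themselves move once at the end.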
NER ML with TensorFlow
To get extra practice building neural networks from scratch, I decided to participate in eBay's 2022 University Machine Learning Competition to see how I stacked up. The goal of the competition was to build a Named Entity Recognition model to label a massive dataset of handbag listings. I obviously didn't win anything, but I don't think I did half bad training some models in my spare time. All in all, it's good practice!
Benchmark F1 Score: 0.800
Best F1 Score: 0.8488
Languages:
Python
Models tried with various embedding strategies and architectures:
Bi-Directional LSTM
Transformer (scratch)
Transformer (re-trained DialoGPT)
Utilizes:
TensorFlow and Keras
LSTM and Transformer network architectures
Some examples in the repo (not all because they're huge and who actually cares)
Financial News Sentiment Oscillator
Accumulates and analyzes news for a specified company via multiple APIs and creates relevancy-weighted news sentiment scores with advanced NLP. This was one of my favorite projects to build, as it brings together a lot of different pieces.
Languages:
Python
Utilizes:
Multiple free news APIs to accumulate news
BM25 algorithm adjusted with Query Expansion
Retrained neural network for finance-based sentiment analysis
Plotly Dash app
Check out the repo for more information. There are multiple ERDs to help understand the codebase and methodology, and a link to a usage tutorial.
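As a rough sketch of what "relevancy-weighted" can mean here, the example below scores a few headlines against a query with a textbook BM25 formula and uses those scores to weight per-article sentiment. It leaves out the query expansion step, and the headlines, sentiment values, parameter defaults, and function names are all made up for illustration rather than taken from the repo.

```python
from collections import Counter
from math import log


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a standard BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            freq = counts[term]
            if freq == 0:
                continue
            # Non-negative IDF variant (the "+ 1" inside the log).
            idf = log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * freq * (k1 + 1) / (freq + k1 * length_norm)
        scores.append(score)
    return scores


def weighted_sentiment(relevance, sentiment):
    """Average the sentiment scores, weighting each by its relevance."""
    total = sum(relevance)
    if total == 0:
        return 0.0
    return sum(r * s for r, s in zip(relevance, sentiment)) / total


docs = [
    "acme shares surge after strong earnings",
    "acme recalls product amid safety concerns",
    "local weather stays mild this week",
]
sentiment = [0.8, -0.6, 0.1]  # hypothetical model outputs in [-1, 1]
relevance = bm25_scores("acme earnings", docs)
print([round(r, 3) for r in relevance], round(weighted_sentiment(relevance, sentiment), 3))
```

The unrelated weather headline gets a relevance of zero, so its sentiment drops out of the final score entirely.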
Analyzing Neighborhoods in Chicago
For the capstone of the IBM Data Science certification, I took a value analysis approach to comparing neighborhoods in the Chicagoland area for new home buyers. To add an extra challenge (and because I was curious), I wanted to see if the amount of tree cover had any effect on clustering with regard to housing prices and neighborhood value.
Languages:
Python
Utilizes:
Clustering analysis
Geospatial polygon mapping
Data Processing Wizard
Designed to be used as an extension to Excel, this takes an assembly line approach to routine data processing to create and validate large import files. I created the first prototype to help less technically savvy colleagues work with data faster, without needing to learn complex Excel functions or coding. It grew into a decent Excel tool that speeds up data manipulation, validation, and importing while keeping human operators in the loop.
Languages:
VBA
Utilizes:
Excel with VBA
User defined templates for routine file processing
Assembly line procedure for quality checking and import creation
Report Generators
Creating custom Excel report generators and script runners with VBA is a bit of a passion of mine.
The baked-in integration with Excel and the simple UserForm-code integration allow for incredibly swift development of tools to help with a plethora of routine data tasks.
Whether it be for reporting department metrics, or calculating the grades for my wife's psych 400 class, I say if you can put it into a spreadsheet, you might as well code a reusable solution.
While you're at it, making it look ridiculous is always a bonus.