In these projects, I either pursued an interesting line of inquiry or devised a tool I thought would be useful. They were built with R, Python, SAS, MySQL, Excel VBA, JavaScript, jQuery, Tableau, or some combination of these, drawing on skills learned in courses on the edX and Coursera platforms.

Data analysis, exploration, and visualization

Boricuas in the NCAA: 2019-2020 yearbook of Puerto Rico women’s college volleyball in the U.S.A.

Inspired by the SI yearbooks for college and pro sports, the project documents the performance of Puerto Rico’s college volleyball players throughout the United States. The maps were created with R, while the document itself was largely produced with the Python API for Scribus, a desktop publishing application.

Tracking the performance of the Florida Retirement System pension fund

This notebook uses SAS, including PROC SQL, to visualize the performance of the Florida Retirement System pension fund, one of the largest in the United States. The data come from the Center for Retirement Research at Boston College, which tracks 180 pension funds across the country. The fund’s performance is also compared with that of other large funds.

Comparing the grammatical proficiency of ESL learners in English-speaking countries

This notebook uses SAS to compare the grammatical proficiency of English learners in the United States to that of learners in other countries of the Anglosphere. The dataset came from three Boston-area professors who collected and analyzed data from more than 600,000 people who took an online English grammar quiz.

Interactive visualization of the performance of Puerto Rican high schools in 2013-2014

Interactive visualization of the graduation rates of all public high schools in Puerto Rico, across regions, districts, and cities, using R and Shiny Server.


Interactive visualization of 5 years of University of Puerto Rico admissions data

An interactive web app, built with R and Shiny Server, that uses ~69,000 records from Puerto Rico’s Open Data Portal covering students admitted to the UPR campuses over a five-year period. The app helps educators and future applicants visualize the qualifications of students admitted to a given UPR campus in a given year. It also shows the most popular majors among male and female students and the most selective majors overall. In addition, it graphs the high schools that most frequently sent admitted students to each campus, as well as the top-performing schools in Puerto Rico.


Orange County Real Estate Sales by ZIP Code

Tableau visualization of Orlando real estate sales by ZIP code, using data provided by the Orlando Regional Realtor Association.


Orlando Real Estate 20-Year History

Tableau visualization of 20 years of Orlando real estate sales, using data provided by the Orlando Regional Realtor Association.


Analysis of 2016 California ballot measures

This is the final project for Python for Data Journalists: Analyzing Money in Politics, a course offered by the Knight Center for Journalism in the Americas. The project uses Python, Jupyter, pandas, numpy, and matplotlib to summarize and plot the sources and amounts of funding received by the ballot measures in California’s November 8, 2016, election, along with the voting results.


Los Angeles County salaries dashboard (2013-2015, 300,000 records)

A dashboard of heatmaps and boxplots for an L.A. County dataset of employees’ salaries and benefits for 2013-2015, built with R, the rbokeh package, and Shiny Server.


Exploring the 1980 MLB season with MySQL and R

A whimsical look at the 1980 Major League Baseball season using MySQL, R, and the 2016 Lahman database, which has baseball data going back to 1871. In 1980, baseball was a big deal.


Majors, salaries, and genders

A visualization of the median salaries of recent graduates of about 170 majors, and the degree of women’s participation in each major, using Python, Jupyter, pandas, numpy, and matplotlib.


Who shops Black Friday sales on Thanksgiving Day?

An exploration and visualization of who shops the Black Friday sales on Thanksgiving Day, using Python, Jupyter, pandas, numpy, and matplotlib.


Lookup tables and pivot tables in spreadsheets (and R)

What spreadsheet lookup functions and pivot tables can do for us, and their equivalents in R, using an R notebook, Excel, MySQL (via RMySQL), and XAMPP.


Exploring Africans’ views on China using Excel VBA

Using Afrobarometer’s 2016 poll data and Excel VBA to gain some insight into Africans’ views of China.


Exploring the rising costs of the Affordable Care Act’s insurance premiums in Florida

Used Microsoft SQL Server, Power BI, and R to explore and visualize 2014-2019 insurance premium data downloaded from Healthcare.gov’s data website. A SQL Server instance was set up, and Transact-SQL queries were run against it to extract the relevant data, which was then visualized with Power BI and R.

Crime maps

These interactive maps plot police activity within a given radius of a location. Users can refine the results by incident type, day of the week, and time of day. Each map also displays density maps, faceted bar plots, and contingency tables. The maps were built with R and Shiny Server, while the data was pre-processed with Python.


Video game sales by year, platform, genre, and region

Tableau visualization of a Kaggle dataset of sales for some 16,000 video games released between the 1980s and 2016.


Data pre-processing

Pre-processing of police calls data

This R notebook describes the pre-processing applied to a dataset that was later used in the Orlando police calls map.


Statistical inference

These statistical inference projects were done using R notebooks.

Inference on a population mean

An R notebook that infers the true average number of hours worked by Americans, based on the 2016 General Social Survey.
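
The notebook itself is in R. As a language-neutral illustration of the standard technique, here is a minimal Python sketch of a large-sample confidence interval for a mean, x̄ ± z·s/√n. It assumes the sample is large enough for the normal approximation (for small samples, a t critical value is the usual choice). The function name and the five hours values are hypothetical and only demonstrate the call.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(values, confidence=0.95):
    """Large-sample (normal-approximation) confidence interval for a population mean."""
    n = len(values)
    sample_mean = statistics.fmean(values)
    # Standard error of the mean, using the sample standard deviation (n - 1 denominator).
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical weekly hours worked, for illustration only.
hours = [40, 45, 50, 35, 40]
print(mean_confidence_interval(hours))
```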

Inference on a population proportion

An R notebook that infers the true proportion of Americans working full time, based on the 2016 General Social Survey.
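
For comparison, here is a minimal Python sketch of the textbook Wald interval for a single proportion, p̂ ± z·√(p̂(1 − p̂)/n). This is not the notebook's R code. The sketch assumes the usual success-failure condition, roughly at least 10 successes and 10 failures. The function name and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Wald (normal-approximation) confidence interval for a population proportion."""
    p_hat = successes / n
    # Standard error of the sample proportion.
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# Hypothetical counts: 600 of 1,000 respondents report working full time.
print(proportion_confidence_interval(600, 1000))
```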

Inference on the difference in population means

An R notebook that infers the true difference in mean self-ranking between two populations: Americans who voted for Mitt Romney in the 2012 presidential election and those who voted for Barack Obama.
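
A difference in means between two independent groups is commonly estimated with an unpooled standard error, √(s₁²/n₁ + s₂²/n₂). The sketch below is a minimal large-sample Python version of that interval, not the notebook's R code. It works from summary statistics (mean, standard deviation, sample size) for each group. The function name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def diff_means_confidence_interval(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Large-sample confidence interval for mean1 - mean2 from two independent samples."""
    # Unpooled standard error: each group keeps its own variance.
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = mean1 - mean2
    margin = z * standard_error
    return diff - margin, diff + margin


# Hypothetical summary statistics for two groups of voters.
print(diff_means_confidence_interval(5.0, 2.0, 100, 4.0, 3.0, 100))
```

If the interval excludes zero, the data are inconsistent with equal population means at the chosen confidence level.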

Inference on the difference in population proportions

An R notebook that makes inferences about the true difference in proportion of gun ownership between two populations: Americans who don’t live within a 1-mile radius of an area they fear, and Americans who do.
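
A common way to test whether two population proportions differ is the two-proportion z-test. Under the null hypothesis that the proportions are equal, both groups share a single proportion estimated from the pooled counts. The following is a minimal Python sketch of that test, not the notebook's R code. The function name and the gun-ownership counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both groups share one proportion, estimated from the pooled counts.
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts of gun owners in each group.
z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```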


Machine learning

Notebook: Predicting the performance of Prosper loans using logistic regression

Devising a strategy for investing in Prosper loans using logistic regression, an R notebook, and the caret and ROCR packages.


Notebook: Predicting thyroid diagnoses with decision trees, using R and the rpart package


Notebook: Predicting the severity of mammography assessments with decision trees, using R and the rpart package


Notebook: Classification of tweets using SVM

Visualizing and classifying tweets with Support Vector Machines, using an R notebook and the tm, SnowballC, wordcloud, and e1071 packages.


Electrical engineering

PLL design tool

A colorful third-order PLL design tool in Python/JavaScript. It computes loop filter component values, RMS phase and frequency errors, jitter, and various lock times. It also plots open- and closed-loop responses, output-referred noise, and the time response. Results are plotted and tabulated on a web page or, alternatively, exported as a complete Excel report for download and further computation. The app is also available in Simplified Chinese: give it a try (体验一下吧). It is hosted on Google App Engine. On the back end, it uses Python with the numpy, xlrd, and xlwt modules and the Jinja2 templating engine. On the front end, it uses the HTML5 stack (HTML, CSS, and JavaScript) plus Google Charts.
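
The tool's internal method isn't described here. The relationship between phase noise, RMS phase error, and jitter is standard, though. Integrating the single-sideband phase noise L(f) over an offset range gives φ_rms = √(2∫L(f) df) in radians, and the RMS jitter is t_rms = φ_rms / (2π·f_c). Below is a minimal Python sketch of that calculation. It assumes L(f) is supplied in dBc/Hz at increasing offsets and varies linearly on a log-log scale between points. The function names and the flat −100 dBc/Hz profile are hypothetical.

```python
import math


def integrate_phase_noise(offsets_hz, ssb_dbc_hz):
    """Integrate single-sideband phase noise L(f), given in dBc/Hz, over the offsets,
    treating L(f) as piecewise linear on a log-log scale between points."""
    total = 0.0
    for i in range(len(offsets_hz) - 1):
        f1, f2 = offsets_hz[i], offsets_hz[i + 1]
        s1 = 10 ** (ssb_dbc_hz[i] / 10)  # dBc/Hz -> linear (1/Hz)
        s2 = 10 ** (ssb_dbc_hz[i + 1] / 10)
        ratio = f2 / f1
        # Power-law exponent a such that S(f) = s1 * (f / f1) ** a on this segment.
        a = math.log10(s2 / s1) / math.log10(ratio)
        if abs(a + 1) < 1e-9:
            total += s1 * f1 * math.log(ratio)  # a 1/f segment integrates to a logarithm
        else:
            total += s1 * f1 * (ratio ** (a + 1) - 1) / (a + 1)
    return total


def rms_phase_error_and_jitter(offsets_hz, ssb_dbc_hz, carrier_hz):
    """Return (RMS phase error in degrees, RMS jitter in seconds)."""
    # Factor of 2 accounts for both sidebands of the carrier.
    phase_rad = math.sqrt(2 * integrate_phase_noise(offsets_hz, ssb_dbc_hz))
    jitter_s = phase_rad / (2 * math.pi * carrier_hz)
    return math.degrees(phase_rad), jitter_s


# Hypothetical flat -100 dBc/Hz profile from 1 kHz to 1 MHz on a 1 GHz carrier.
deg, jitter = rms_phase_error_and_jitter([1e3, 1e6], [-100, -100], 1e9)
print(f"{deg:.3f} deg rms, {jitter * 1e12:.3f} ps rms")
```

The power-law integration is exact for straight-line segments on a log-log phase noise plot. A plain trapezoidal rule on log-spaced points would overestimate steeply falling segments.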


Smith Chart impedance matching tool

A vibrant Smith chart impedance matching tool, hosted on Google App Engine and built with jQuery and HTML Canvas, that helps designers match a given load impedance ZL at a given frequency to a given characteristic impedance Zo. It computes the equivalent input impedance and the magnitude and phase of the reflection coefficient, and plots them on the Smith chart. Z, Y, and ZY Smith charts are supported.
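
The reflection coefficient at the heart of a Smith chart is Γ = (ZL − Zo)/(ZL + Zo). The chart is drawn in the complex Γ-plane, so a load's position on it is simply Γ. Below is a minimal Python sketch of that calculation, together with the normalized impedance and the VSWR. It is not the tool's own JavaScript. The sketch assumes a real characteristic impedance Zo (50 Ω by default), and the function names are hypothetical.

```python
import cmath
import math


def reflection_coefficient(z_load, z0=50.0):
    """Gamma = (ZL - Zo) / (ZL + Zo) for a load ZL on a line of characteristic impedance Zo."""
    return (z_load - z0) / (z_load + z0)


def summarize_load(z_load, z0=50.0):
    """Return normalized impedance, |Gamma|, angle of Gamma in degrees, and VSWR."""
    gamma = reflection_coefficient(z_load, z0)
    magnitude = abs(gamma)
    angle_deg = math.degrees(cmath.phase(gamma))
    # VSWR is unbounded when |Gamma| = 1 (open, short, or purely reactive load).
    vswr = (1 + magnitude) / (1 - magnitude) if magnitude < 1 else math.inf
    return z_load / z0, magnitude, angle_deg, vswr


# Example: a 50 + j50 ohm load on a 50 ohm line.
z_norm, mag, angle, vswr = summarize_load(50 + 50j)
print(z_norm, round(mag, 4), round(angle, 2), round(vswr, 3))
```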
