This notebook is a visualization of the median salaries of recent graduates of about 170 majors, and the degree of women's participation in each of them. I want to plot the salaries and shares of women side by side to see if there's a larger participation of women in professions that earn less, and viceversa.
The dataset is available at FiveThirtyEight's Github's repository. We start by loading the libraries we will be using and then read the data.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math
Reading the data
major_earnings = pd.read_csv("recent-grads.csv")
This dataset corresponds to recent graduates of each profession, those aged 28 and younger. The dataset fields we are interested in are Major_category
, Major
, Median
(the median salary), and ShareWomen
(the fraction of women among that major's recent graduates).
We are given the fraction of women in each major, ShareWomen
, but we will use the percentage of women in our plots.
# Convert fraction ShareWomen to percent
major_earnings["PctWomen"] = major_earnings["ShareWomen"].apply(lambda x: 100 * x)
We can print the number of majors in each category.
major_categories = major_earnings["Major_category"].unique().tolist()
for major_cat in major_categories:
print(major_cat, "majors:", major_earnings[major_earnings["Major_category"] == major_cat].shape[0])
There are about 170 majors in 15 categories, plus interdisciplinary majors. We will be creating "super" major categories in order to group the bar plots.
The following function will be called by all the plots.
def ax_settings(ax, ax2, major_df, plot_title, max_salary):
'''
Sets axis settings for all bar plots.
ax, ax2: salary and % of women axes
major_df: dataframe pertaining to a "Major_category"
plot_title: desired title for bar plot
max_salary: for the 'ax1' axis, the maximum salary
'''
rects = [] # bars for salary axis
rects2 = [] # bars for % of women axis
major_df.set_index("Major", inplace = True) # index df by "Major" columns
majors = major_df.index.tolist()
idx = 0
# Drawing the bars one at a time
for major in majors:
rects.append(ax.bar(x = idx, height = major_df.loc[major, "Median"], width = 0.5))
rects2.append(ax2.bar(x = idx + 0.5, height = major_df.loc[major, "PctWomen"], width = 0.5, color = "0.75",
hatch="//"))
idx += 2
# Making spines and ticks invisible for both ax and ax2
for key,spine in ax.spines.items():
spine.set_visible(False)
ax.tick_params(bottom="off", top="off", left="off", right="off")
ax.tick_params(labelbottom='off')
for key,spine in ax2.spines.items():
spine.set_visible(False)
ax2.tick_params(bottom="off", top="off", left="off", right="off")
ax2.tick_params(labelbottom='off')
# Shortening major names b/c some of them are *really long*
majors_shortened = [major[:20] for major in majors]
# Legends
ax.legend(rects, majors_shortened, loc='lower center', bbox_to_anchor=(0.5, 1.05), frameon = False, ncol = 2)
ax2.legend(["% of women in major"], frameon = False)
ax.set_title(plot_title, fontsize = 14, fontweight = "bold")
# Setting tick marks for both ax and ax2
ax.set_yticks([0, int(max_salary/2), max_salary])
vals = ax.get_yticks()
ax.set_yticklabels(['${:,.0f}'.format(x) for x in vals])
ax2.set_yticks([0, 50, 100])
vals = ax2.get_yticks()
ax2.set_yticklabels(['{:2.0f}%'.format(x) for x in vals])
return (ax, ax2)
I will group several major categories into "super" major categories for the purposes of plotting.
# Liberal arts majors except humanities
lib_arts_major_cats = ["Arts", "Communications & Journalism", "Social Science"]
fig = plt.figure(figsize=(15,10))
for sp in range(0, 3):
ax = fig.add_subplot(2,3,sp+1)
ax2 = ax.twinx()
categ_majors = major_earnings[major_earnings["Major_category"] == lib_arts_major_cats[sp]]
ax, ax2 = ax_settings(ax, ax2, categ_majors, lib_arts_major_cats[sp] + " Majors", 60000)
# Humanities majors
for sp in range(3, 5):
ax = fig.add_subplot(2,3,sp+1)
ax2 = ax.twinx()
if sp == 4:
categ_majors_hum = major_earnings[major_earnings["Major_category"] == "Humanities & Liberal Arts"][:8]
else:
categ_majors_hum = major_earnings[major_earnings["Major_category"] == "Humanities & Liberal Arts"][8:]
ax, ax2 = ax_settings(ax, ax2, categ_majors_hum, "Humanities & Liberal Arts Majors " + str(sp - 2), 60000)
fig.tight_layout(w_pad = 2.0, h_pad = 10.0)
plt.show()
Women appear to have achieved an adequate or even a high degree of representation in most liberal arts and humanities majors, with the exceptions of economics and geography. It may be worth noting that economics is the highest paid profession among the social science majors.
fig = plt.figure(figsize=(15,15))
# Engineering majors
for sp in range(0, 3):
ax = fig.add_subplot(3,3,sp+1)
ax2 = ax.twinx()
if sp == 0:
categ_majors_eng = major_earnings[major_earnings["Major_category"] == "Engineering"][:10]
elif sp == 1:
categ_majors_eng = major_earnings[major_earnings["Major_category"] == "Engineering"][10:20]
else:
categ_majors_eng = major_earnings[major_earnings["Major_category"] == "Engineering"][20:]
ax, ax2 = ax_settings(ax, ax2, categ_majors_eng, "Engineering Majors " + str(sp + 1), 60000)
# Computers and mathematics majors
for sp in range(3, 5):
ax = fig.add_subplot(3,3,sp+1)
ax2 = ax.twinx()
if sp == 4:
categ_majors_compsci = major_earnings[major_earnings["Major_category"] == "Computers & Mathematics"][:6]
else:
categ_majors_compsci = major_earnings[major_earnings["Major_category"] == "Computers & Mathematics"][6:]
ax, ax2 = ax_settings(ax, ax2, categ_majors_compsci, "Computers and Mathematics Majors " + str(sp - 2), 60000)
# Biology & Life Science majors
for sp in range(5, 7):
ax = fig.add_subplot(3,3,sp+1)
ax2 = ax.twinx()
if sp == 5:
categ_majors_biol = major_earnings[major_earnings["Major_category"] == "Biology & Life Science"][:7]
else:
categ_majors_biol = major_earnings[major_earnings["Major_category"] == "Biology & Life Science"][7:]
ax, ax2 = ax_settings(ax, ax2, categ_majors_biol, "Biology & Life Science Majors " + str(sp - 4), 60000)
# Physical Sciences majors
ax = fig.add_subplot(3,3,8)
ax2 = ax.twinx()
categ_majors_physci = major_earnings[major_earnings["Major_category"] == "Physical Sciences"]
ax, ax2 = ax_settings(ax, ax2, categ_majors_physci, "Physical Sciences Majors", 60000)
fig.tight_layout(w_pad = 2.0, h_pad = 10.0)
plt.show()
Given that in the U.S. women outnumber men in all but ten states , we note that women are underrepresented in every engineering major and all but one Computer and Mathematics major. In fact, in a majority of majors in both categories, the percentage of women participation appears to be below 35%. Graduates of professions in both major categories are typically very well remunerated, with several enjoying starting salaries above \$60k. In addition, our world is evermore increasingly driven by technology, so it is significant that women have such a low degree of participation in those careers. We should do better. In the other major categories, there appear to be as many women as there are men in most majors, except physics.
# Psychology and social work majors
fig = plt.figure(figsize=(15,5))
ax = fig.add_subplot(1,3,1)
ax2 = ax.twinx()
categ_majors_psy = major_earnings[major_earnings["Major_category"] == "Psychology & Social Work"]
ax, ax2 = ax_settings(ax, ax2, categ_majors_psy, "Psychology & Social Work Majors", 50000)
# Health majors
for sp in range(1, 3):
ax = fig.add_subplot(1,3,sp+1)
ax2 = ax.twinx()
if sp == 1:
categ_majors_health = major_earnings[major_earnings["Major_category"] == "Health"][:6]
else:
categ_majors_health = major_earnings[major_earnings["Major_category"] == "Health"][6:]
ax, ax2 = ax_settings(ax, ax2, categ_majors_health, "Health Majors " + str(sp), 50000)
fig.tight_layout(w_pad = 2.0, h_pad = 10.0)
plt.show()
The plots indicate that there is a high percentage of women in most health-related professions. In some of them, such as nursing, very high.
Here is a plot of Business, Law, Agriculture, Education, and Industrial Arts majors.
fig = plt.figure(figsize=(15,15))
# Business majors
for sp in range(0, 2):
ax = fig.add_subplot(3,3,sp+1)
ax2 = ax.twinx()
if sp == 0:
categ_majors_bus = major_earnings[major_earnings["Major_category"] == "Business"][:7]
else:
categ_majors_bus = major_earnings[major_earnings["Major_category"] == "Business"][7:]
ax, ax2 = ax_settings(ax, ax2, categ_majors_bus, "Business Majors " + str(sp + 1), 50000)
# Law majors
ax = fig.add_subplot(3,3,3)
ax2 = ax.twinx()
categ_majors_law = major_earnings[major_earnings["Major_category"] == "Law & Public Policy"]
ax, ax2 = ax_settings(ax, ax2, categ_majors_law, "Law & Public Policy Majors", 50000)
# Education majors
for sp in range(3, 5):
ax = fig.add_subplot(3,3,sp+1)
ax2 = ax.twinx()
if sp == 3:
categ_majors_edu = major_earnings[major_earnings["Major_category"] == "Education"][:8]
else:
categ_majors_edu = major_earnings[major_earnings["Major_category"] == "Education"][8:]
ax, ax2 = ax_settings(ax, ax2, categ_majors_edu, "Education Majors " + str(sp - 2), 50000)
# Agriculture majors
ax = fig.add_subplot(3,3,6)
ax2 = ax.twinx()
categ_majors_agri = major_earnings[major_earnings["Major_category"] == "Agriculture & Natural Resources"]
ax, ax2 = ax_settings(ax, ax2, categ_majors_agri, "Agriculture & Natural Resources Majors", 50000)
# Industrial Arts majors
ax = fig.add_subplot(3,3,7)
ax2 = ax.twinx()
categ_majors_indarts = major_earnings[major_earnings["Major_category"] == "Industrial Arts & Consumer Services"]
ax, ax2 = ax_settings(ax, ax2, categ_majors_indarts, "Industrial Arts & Consumer Services Majors", 50000)
fig.tight_layout(w_pad = 2.0, h_pad = 10.0)
plt.show()
Business majors are a mixed bag. Women are highly represented in some majors such as human resources and marketing, but are underrepresented in others such as logistics. Women have achieved a high rate of representation in just about every law major except court reporting, and in education, there is only one major in which the share of women is below 50%. In education, women have higher rates of participation in most majors, whereas in agriculture women have low rates of participation in 8 out of 10 majors.
Women have a high or very high share of certain major categories, such as health and education majors, but a very small portion of others that are typically in high demand and have higher earning potential, such as engineering and computer-related majors. In many major categories, the top earning profession has a low share of women in its ranks.
Dataquest. Exploratory Data Visualization.
Casselman, Ben. Data and code for FiveThirtyEight's story on earnings of college majors.
tim and Ffisegydd. Pandas: Bar-Plot with two bars and two y-axis. June 2014.
pemistahl and EOL. How can I add textures to my bars and wedges?. January 2013.
Chris and Jianxun Li. Format y axis as percent. July 2015.