F Archive HR datasets

This appendix describes the datasets used in this companion book.

F.1 Gender Pay Gap

The Gender Pay Gap dataset comes from the “Glassdoor Research” website. It contains the salary details for a hypothetical employer with 1,000 employees, spread across 10 job roles and 5 company departments.

The dataset can be accessed using:

https://glassdoor.box.com/shared/static/beukjzgrsu35fqe59f7502hruribd5tt.csv

Here are sample rows from this dataset:

jobTitle	gender	age	perfEval	edu	dept	seniority	basePay	bonus
Graphic Designer	Female	18	5	College	Operations	2	42363	9938
Software Engineer	Male	21	5	College	Management	5	108476	11128
Warehouse Associate	Female	19	4	PhD	Administration	5	90208	9268
Software Engineer	Male	20	5	Masters	Sales	4	108080	10154
Graphic Designer	Male	26	5	Masters	Engineering	5	99464	9319
IT	Female	20	5	PhD	Operations	4	70890	10126
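A minimal R sketch for loading the data and taking a first look at it. It assumes the Box link above still serves the CSV directly and that the file's header matches the sample rows; the helper names `read_pay_data()` and `mean_base_by_gender()` are hypothetical. So that it runs offline, it reads the six sample rows shown above as inline text.

```r
# Hypothetical helper: read the Glassdoor pay data. Arguments are passed on to
# read.csv(), so it accepts a URL, a local path, or text = "..." for inline data.
read_pay_data <- function(...) {
  pay <- read.csv(..., stringsAsFactors = FALSE)
  # Store the categorical columns as factors for grouping and modelling.
  for (col in c("jobTitle", "gender", "edu", "dept")) {
    pay[[col]] <- factor(pay[[col]])
  }
  pay
}

# Mean base pay per gender: a first, unadjusted look at the pay gap.
mean_base_by_gender <- function(pay) {
  aggregate(basePay ~ gender, data = pay, FUN = mean)
}

# Full dataset (needs network access; assumes the Box link still works):
# pay <- read_pay_data("https://glassdoor.box.com/shared/static/beukjzgrsu35fqe59f7502hruribd5tt.csv")

# The six sample rows shown above, as comma-separated text.
sample_rows <- "jobTitle,gender,age,perfEval,edu,dept,seniority,basePay,bonus
Graphic Designer,Female,18,5,College,Operations,2,42363,9938
Software Engineer,Male,21,5,College,Management,5,108476,11128
Warehouse Associate,Female,19,4,PhD,Administration,5,90208,9268
Software Engineer,Male,20,5,Masters,Sales,4,108080,10154
Graphic Designer,Male,26,5,Masters,Engineering,5,99464,9319
IT,Female,20,5,PhD,Operations,4,70890,10126"

pay_sample <- read_pay_data(text = sample_rows)
gap <- mean_base_by_gender(pay_sample)
print(gap)
```

The per-gender means are unadjusted: they do not control for job title, seniority or any other column, and six rows say nothing about the full dataset.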

F.2 Overhead value analysis

F.3 HR Service Desk

There are two publicly available datasets on the HR service desk.

https://www.ibm.com/communities/analytics/watson-analytics-blog/it-help-desk/

https://www.kaggle.com/lyndonsundmark/service-request-analysis/data

The IBM dataset can be downloaded directly from:

https://community.watsonanalytics.com/wp-content/uploads/2015/03/WA_Fn-UseC_-IT-Help-Desk.xlsx

The following R code reads the first sheet of the workbook with `read.xls()` from the gdata package and displays its first rows. It was not working at the time of writing, so its output is not shown here:

```r
# gdata::read.xls() needs a Perl installation to convert the workbook.
library(gdata)
servicedesk <- read.xls(
  "https://community.watsonanalytics.com/wp-content/uploads/2015/03/WA_Fn-UseC_-IT-Help-Desk.xlsx",
  sheet = 1, header = TRUE, method = "csv"
)
knitr::kable(head(servicedesk), "html")
```

F.4 HR recruitment, selection and performance data

A large dataset of HR applicant selection and performance data, purchased from the Data and Sons website. Row count: 1,312,450.

IMPORTANT: this file was generated solely for pedagogical purposes. Because of the way it was generated (with the R package BinOrdNonNor), it should NOT be used for research purposes. Note that the files will need to be joined to explore most of the relevant questions fully. This was intentionally left to the students as an exercise to further develop their skills. The file Selection Data Description describes the variables contained in each of the remaining files.
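The joins this exercise calls for can be done in base R with `merge()`. The sketch below is a generic pattern, not a solution: the two tables, their toy values and the key column `ApplicantID` are all hypothetical, and the real file names and key columns should be taken from Selection Data Description.

```r
# Hypothetical extracts of two of the files; names and values are illustrative only.
applicants <- data.frame(
  ApplicantID = c(1, 2, 3),
  Hired = c(TRUE, FALSE, TRUE)
)
performance <- data.frame(
  ApplicantID = c(1, 3),
  PerfRating = c(4, 3)
)

# Inner join: keep only applicants that also appear in the performance file.
hired_perf <- merge(applicants, performance, by = "ApplicantID")

# Left join: keep every applicant; missing performance ratings become NA.
all_apps <- merge(applicants, performance, by = "ApplicantID", all.x = TRUE)
print(all_apps)
```

Which join is appropriate depends on the question: an inner join restricts the analysis to people with performance data, while a left join keeps every applicant and makes the gaps visible.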

Dataset Terms & Conditions: Creative Commons Attribution-ShareAlike 4.0 International Public License

The dataset can be previewed at:

https://www.dataandsons.com/dataset/preview/90

F.5 Job classification

The Job classification dataset comes from a blog article by Lyndon Sundmark. For each record it gives the job family and job class (each with a description), the pay grade, and ratings on factors such as education level, experience, organisational impact, problem solving, supervision, contact level and financial budget.

The dataset can be accessed using:

https://onedrive.live.com/?authkey=%21ABv-gHg5jVluYpc&cid=4EF2CCBEDB98D0F5&id=4EF2CCBEDB98D0F5%216440&parId=4EF2CCBEDB98D0F5%216433&o=OneUp

The dataset contains the following columns:

ID	JobFamily	JobFamilyDescription	JobClass	JobClassDescription	PayGrade	EducationLevel	Experience	OrgImpact	ProblemSolving	Supervision	ContactLevel	FinancialBudget	PG

F.6 Absenteeism at work

The Absenteeism at work dataset can be accessed from the UC Irvine Machine Learning Repository. The dataset allows for several new combinations of attributes and attribute exclusions, as well as changes to an attribute's type (categorical, integer or real), depending on the purpose of the research.

The dataset can be accessed using:

https://archive.ics.uci.edu/ml/datasets/Absenteeism+at+work
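A minimal R sketch for reading the data once downloaded, which also shows one of the attribute-type changes mentioned above: turning an integer-coded column into a categorical factor. It assumes the download contains a semicolon-separated file named `Absenteeism_at_work.csv` with columns including `ID`, `Reason for absence` and `Absenteeism time in hours`; check these names against the actual file. `read_absenteeism()` is a hypothetical helper, and the three toy rows are made up for illustration.

```r
# Hypothetical helper; assumes a semicolon-separated file with a header row.
read_absenteeism <- function(...) {
  # check.names = FALSE keeps the original column names, spaces included.
  absent <- read.csv(..., sep = ";", check.names = FALSE)
  # The reason for absence is stored as an integer code; make it categorical.
  absent[["Reason for absence"]] <- factor(absent[["Reason for absence"]])
  absent
}

# Assumed file name inside the UCI download:
# absent <- read_absenteeism("Absenteeism_at_work.csv")

# Hypothetical toy rows, only to show the call; not taken from the dataset.
toy <- "ID;Reason for absence;Absenteeism time in hours
11;26;4
36;0;0
11;23;2"
absent <- read_absenteeism(text = toy)
str(absent)

# Total hours of absence per employee ID.
hours <- tapply(absent[["Absenteeism time in hours"]], absent$ID, sum)
print(hours)
```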


The database was created with records of absenteeism at work from July 2007 to July 2010 at a courier company in Brazil.

Creators, original owners and donors: Andrea Martiniano (1), Ricardo Pinto Ferreira (2), and Renato Jose Sassi (3).

E-mail addresses: andrea.martiniano@gmail.com (1) - PhD student; log.kasparov@gmail.com (2) - PhD student; sassi@uni9.pro.br (3) - Prof. Doctor.

Universidade Nove de Julho - Postgraduate Program in Informatics and Knowledge Management.

Address: Rua Vergueiro, 235/249 Liberdade, Sao Paulo, SP, Brazil. Zip code: 01504-001.

Website: http://www.uninove.br/curso/informatica-e-gestao-do-conhecimento/