
Starting With Data

🟒 HR Analytics Upgrade: Step-by-Step Employee Tenure Analysis with Python (Sample Code Included)

Published 7 months ago • 4 min read


Navigating the world of HR analytics can feel like stepping into a maze without a map.

Sifting through mountains of raw employee data, trying to see patterns, predict turnover, or even just understand how long your employees stick around.

Not just daunting. But overwhelming at times.

  • Struggling to manually collate employee data month after month.
  • Battling with inconsistent analysis outcomes.
  • Facing a never-ending cycle of doubts about the accuracy and relevance of your findings.

The sheer weight of it all makes you question whether HR analytics is even worth the effort.

But it can be different. An organized, systematic approach to HR analytics where employee tenure analysis becomes a breeze.

That's right.

With just a touch of Python magic, you can turn the tide.

  • No more manual data juggling.
  • Accurate, reliable insights at your fingertips.
  • Confident decision-making powered by clear, actionable data insights.

Dive into this guide, and I'll walk you through the transformative world of HR analytics, one step at a time. By the end, you'll have the tools and knowledge to confidently tackle any HR analytics challenge that comes your way.

Follow along using my Google Colab notebook (free)

Struggling with something else? Let me know!

Step 1: Install Required Libraries and Preview the Dataset

First, we install the required libraries, then load the dataset from a URL using pandas and display the first few rows.

# Installing the necessary libraries
# (plotly is included because it's used for the chart in Step 7)
!pip install pandas matplotlib seaborn plotly

# Importing the pandas library
import pandas as pd

# Loading the dataset from a given URL into a pandas dataframe
df = pd.read_csv('https://query.data.world/s/uljdvjbtfkk3r4ndixhs6uuckofh4p?dws=00000')

# Displaying the first few rows of the dataframe to get a quick overview
df.head()
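
If you want a quick sanity check before moving on, two standard pandas calls show the size and column types of the dataset (optional, not part of the main walkthrough):

# Optional: inspect the structure of the dataset
print(df.shape)   # (number of rows, number of columns)
df.info()         # column names, non-null counts, and data types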

Step 2: Data Cleansing: Remove Specific Records

Next, filter out rows from the dataset where the department is labeled as 'Human Resources'. This is just an example to show how filtering works with Pandas.

# Filtering the dataframe to exclude rows where the 'department' column value is 'Human Resources'
# (.copy() avoids pandas' SettingWithCopyWarning when we modify columns in later steps)
cleaned_df = df[df['department'] != 'Human Resources'].copy()

# Display the results again (it might not look any different!)
cleaned_df.head()
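
If you ever need to exclude more than one department at once, the same filtering idea extends with isin(). This is just a sketch using example department names, not part of the main cleanup:

# Sketch: exclude several departments at once using isin()
# ('Training' is only an example value here)
excluded_departments = ['Human Resources', 'Training']
multi_filtered_df = df[~df['department'].isin(excluded_departments)]
multi_filtered_df.head()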

Step 3: Date Parsing with Timezone Conversion

Next, we convert the 'hire_date' and 'termdate' columns into datetime format with a UTC timezone, turning any invalid dates into missing values.

This is an important skill to learn because dates and timestamps are often poorly formatted in real-world data.

# Convert 'hire_date' to datetime format
cleaned_df['hire_date'] = pd.to_datetime(cleaned_df['hire_date'], errors='coerce')

# If 'hire_date' is not tz-aware after the conversion, localize it to UTC
if cleaned_df['hire_date'].dt.tz is None:
    cleaned_df['hire_date'] = cleaned_df['hire_date'].dt.tz_localize('UTC')

# Convert 'termdate' to datetime format
cleaned_df['termdate'] = pd.to_datetime(cleaned_df['termdate'], errors='coerce')

# If 'termdate' is not tz-aware after the conversion, localize it to UTC
if cleaned_df['termdate'].dt.tz is None:
    cleaned_df['termdate'] = cleaned_df['termdate'].dt.tz_localize('UTC')

# Display records (note the changes in the date columns)
cleaned_df.head()
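
To see what errors='coerce' actually does, here's a tiny standalone example with made-up values; the malformed entries become NaT (pandas' missing-date marker) instead of raising an error:

# Sketch: malformed dates become NaT instead of raising an error
sample_dates = pd.Series(['2021-03-15', 'not a date', '13/45/2020'])
pd.to_datetime(sample_dates, errors='coerce', utc=True)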

Step 4: Combine Location Data and Remove Redundancies

Here, we create a unified 'location' column by merging 'location_city' and 'location_state' columns and then drop the originals to streamline the dataframe.

This can be useful when combining any columns.

For example, combining first name and last name into a full name field. Or combining business unit and sub-business unit. Very handy! (There's a quick name-combining sketch after the code below.)

# Create a new 'location' column by concatenating 'location_city' 
# and 'location_state' with a comma in between
cleaned_df['location'] = cleaned_df['location_city'] + ', ' + cleaned_df['location_state']

# Drop the now redundant 'location_city' and 'location_state' columns
cleaned_df = cleaned_df.drop(columns=['location_city', 'location_state'])

cleaned_df.head()
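
To make that point concrete, here's the same concatenation pattern applied to name columns. This dataset may not include first_name and last_name, so the sketch builds its own tiny dataframe:

# Sketch: combining first and last names with the same pattern
people = pd.DataFrame({'first_name': ['Ada', 'Grace'],
                       'last_name': ['Lovelace', 'Hopper']})
people['full_name'] = people['first_name'] + ' ' + people['last_name']
people.head()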

Step 5: Calculate Employee Tenure in Days

Next, let's calculate the tenure of employees in terms of days.

If an employee has a termination date (termdate), the tenure is the difference between the hire_date and termdate. Otherwise, it's the difference between the hire_date and the current date.

# numpy is needed for the conditional tenure calculation below
import numpy as np

# Use a timezone-aware current date so it can be compared with the UTC date columns
current_date = pd.Timestamp('now', tz='UTC')

# Safety check: localize 'termdate' to UTC if it is not already tz-aware
if cleaned_df['termdate'].dt.tz is None:
    cleaned_df['termdate'] = cleaned_df['termdate'].dt.tz_localize('UTC')

# 'termdate' and 'hire_date' were already converted to datetime in Step 3,
# so these two lines are just a harmless re-check
cleaned_df['termdate'] = pd.to_datetime(cleaned_df['termdate'])
cleaned_df['hire_date'] = pd.to_datetime(cleaned_df['hire_date'])

# Calculate tenure
cleaned_df['tenure'] = np.where(
    cleaned_df['termdate'].notnull(),
    (cleaned_df['termdate'] - cleaned_df['hire_date']).dt.days,
    (current_date - cleaned_df['hire_date']).dt.days
)

cleaned_df.head()
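
If the np.where branching feels abstract, this tiny made-up example runs the exact same logic on two rows, one terminated employee and one still active (it reuses current_date from above):

# Sketch: one terminated employee and one active employee (made-up dates)
demo = pd.DataFrame({
    'hire_date': pd.to_datetime(['2020-01-01', '2020-01-01'], utc=True),
    'termdate': pd.to_datetime(['2021-01-01', None], utc=True),
})
demo['tenure'] = np.where(
    demo['termdate'].notnull(),
    (demo['termdate'] - demo['hire_date']).dt.days,   # terminated: fixed tenure
    (current_date - demo['hire_date']).dt.days        # active: tenure keeps growing
)
demo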

Step 6: Categorize Employee Tenure

This code segments employees based on their tenure, classifying them as 'New', 'Mid-level', or 'Experienced'.

Notice how we use a Python function here.

First, we create the categorize_tenure function with one parameter (the number of days that each employee has been employed).

Then we call that function while creating a new field in the data frame.

# Define a function to categorize tenure of employees
def categorize_tenure(tenure):
    if tenure < 1825:  # Tenure less than 5 years
        return 'New'
    elif 1825 <= tenure <= 4380:  # Tenure between 5 and 12 years
        return 'Mid-level'
    else:  # Tenure more than 12 years
        return 'Experienced'

# Apply the 'categorize_tenure' function on the 'tenure' column
# The result is stored in a new column called 'tenure_category'
cleaned_df['tenure_category'] = cleaned_df['tenure'].apply(categorize_tenure)

cleaned_df.head()
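
As a side note, pandas can do the same three-way binning without a custom function via pd.cut. This is only an alternative sketch, not what the notebook uses (and rows with a missing tenure end up as NaN here rather than 'Experienced'):

# Sketch: equivalent binning with pd.cut instead of a custom function
pd.cut(
    cleaned_df['tenure'],
    bins=[-1, 1824, 4380, float('inf')],
    labels=['New', 'Mid-level', 'Experienced']
).head()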

Step 7: Visualization of Employee Tenure Distribution

For the final step, let's take a look at everything we've done to turn raw data into a useful analysis.

This code visualizes how many employees fall into the 'New', 'Mid-level', and 'Experienced' tenure categories using a Plotly Express histogram (which, on a categorical column, acts as a count plot).

import plotly.express as px

# Create a count plot using plotly express
fig = px.histogram(cleaned_df,
                   x='tenure_category',
                   category_orders={
                       'tenure_category': ['New', 'Mid-level', 'Experienced']
                   })
fig.update_layout(
    title='Count of Employees by Tenure Category using Plotly',
    xaxis_title='Tenure Category',
    yaxis_title='Count'
)
fig.show()
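
If you also want the raw numbers behind the chart, value_counts() gives the same counts as a quick table:

# The counts behind the chart, as a table
cleaned_df['tenure_category'].value_counts()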

And here's the final result: a bar chart showing how many employees fall into each tenure category.

Great job!

The idea for this newsletter came directly from a reader – just like you!

Take 3 minutes to let me know what you want help with next.

Until next time, keep exploring and happy analyzing!

Brian

Whenever you're ready, here's how I can help you:

  1. Get more data analytics tips in my previous newsletter articles
  2. Build your data analytics portfolio in 2 weeks with The Data Analytics Portfolio Playbook

