A Collection of Must-Know Techniques for Working with Time Series Data in Python

How to manipulate and visualize time series data in datetime format with ease

Essential techniques for working with time series in Python: convert strings to datetime, handle missing values, resample, and visualize temporal data with Pandas.
Towards Data Science Archive
Published October 12, 2022

Illustration by xsIJciJN from IllustAC

Working with time series data can be intimidating at first. The time series values are not the only information you have to consider. The timestamps also contain information, especially about the relationship between the values.

In contrast to common data types, timestamps have a few unique characteristics. While they look like a string at first glance, they also have numerical aspects.

This article will give you the following list of must-know techniques for handling time series data:

  • How to Deal with Datetime Format
      • Reading Datetime Format
      • Converting a String to Datetime
      • Converting Unix Time to Datetime Format
      • Creating a Range of Dates
      • Changing the Datetime Format
  • How to Compose and Decompose a Datetime
      • Decomposing a Datetime
      • Assembling Multiple Columns to a Datetime
  • How to Fill Missing Values
      • Filling Missing Values with a Constant Value
      • Filling Missing Values with the Last Value
      • Filling Missing Values with Linearly Interpolated Values
  • How to Perform Operations on a Time Series
      • Getting the Min and Max
      • Differencing
      • Cumulating
      • Getting the Rolling Mean
      • Calculating the Time Difference between Two Timestamps
  • How to Filter Time Series
      • Filtering Time Series on Specific Timestamps
      • Filtering Time Series on Time Ranges
  • How to Resample Time Series
      • Downsampling
      • Upsampling
  • How to Plot Time Series
      • Plotting Numerical Data over Time
      • Plotting Categorical Data over Time
      • Plotting a Timeline
      • Setting the X-Axis Limits of a Time Series
      • Setting the X-Ticks of a Time Series

For this article, we will be using a minimal fictional dataset. It has three columns: date, cat_feature, and num_feature.

Fictional minimal time series dataset loaded as pandas DataFrame (Image by the author)

How to Deal with Datetime Format

The essential part of time series data is the timestamps. If these timestamps are in Datetime format, you can apply various manipulations, which we will discuss in this section.

Reading Datetime Format

By default, pandas reads timestamp columns as strings into a DataFrame when reading from a CSV file. To read the timestamp column as datetime objects (with data type datetime64[ns]) directly, you can use the parse_dates parameter, as shown below.

import pandas as pd

df = pd.read_csv("example.csv",
                 parse_dates = ["date"])

Fictional minimal time series dataset loaded as pandas DataFrame (Image by the author)

Data type of time stamps as datetime64[ns] (Image by the author)
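To see the effect of parse_dates without needing a file on disk, here is a minimal sketch. The CSV contents are made up for illustration and are read from an in-memory string instead of "example.csv".

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for "example.csv"
csv_text = """date,cat_feature,num_feature
2020-03-30 10:00:00,A,1.0
2020-03-31 10:00:00,B,2.5
2020-04-01 10:00:00,A,4.0
"""

# Without parse_dates, the "date" column is read as plain strings
as_strings = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, the "date" column is parsed into datetime values
as_datetimes = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(as_strings["date"].dtype)
print(as_datetimes["date"].dtype)
```

Comparing the two printed data types is a quick way to check whether a column was actually parsed.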

Converting a String to Datetime

To convert a string column to the datetime64[ns] data type, you can use the pd.to_datetime() function. This is handy if you can’t use the parse_dates parameter during the import. If the timestamps follow a fixed pattern, you can pass it via the format parameter, using the relevant strftime format codes.

# By default the date column is imported as string
df = pd.read_csv("example.csv")

# Convert to datetime data type
df["date"] = pd.to_datetime(df["date"],
                            format = "%Y-%m-%d %H:%M:%S.%f")

Fictional minimal time series dataset loaded as pandas DataFrame (Image by the author)
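Real-world columns sometimes contain entries that do not match the expected format. The sketch below uses made-up strings and shows one possible way to handle them: passing errors="coerce" to pd.to_datetime() turns unparseable entries into NaT (the datetime equivalent of a missing value) instead of raising an error. Whether that is the right behavior depends on your data.

```python
import pandas as pd

# Hypothetical raw strings; the last entry is deliberately invalid
raw = pd.Series([
    "2022-01-01 08:30:00.000",
    "2022-01-02 09:45:00.500",
    "not a date",
])

# Entries that do not match the format become NaT instead of raising
parsed = pd.to_datetime(raw,
                        format="%Y-%m-%d %H:%M:%S.%f",
                        errors="coerce")

print(parsed)
print(parsed.isna().sum(), "value(s) could not be parsed")
```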

Converting Unix Time to Datetime Format

If your timestamp column is in Unix time (seconds since January 1, 1970, UTC), you can convert it to a human-readable format with the pd.to_datetime() function by using the unit parameter. The snippet below also shows the reverse conversion.

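# Convert from Unix time (in seconds) to datetime
df["date"] = pd.to_datetime(df["date_unix"],
                            unit = "s")

# Convert back to Unix time (in seconds): time elapsed since the epoch
df["date_unix"] = (df["date"] - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")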

Timestamps converted to unix (Image by the author)
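A quick round trip with concrete numbers can help to verify the conversion. The two Unix timestamps below are illustrative values chosen for this sketch (midnight UTC on January 1 and January 2, 2022). The column name "roundtrip" is also just an assumption.

```python
import pandas as pd

# Illustrative Unix timestamps in seconds
df = pd.DataFrame({"date_unix": [1640995200, 1641081600]})

# Unix seconds -> datetime
df["date"] = pd.to_datetime(df["date_unix"], unit="s")

# datetime -> Unix seconds: divide the time since the epoch by one second
epoch = pd.Timestamp("1970-01-01")
df["roundtrip"] = (df["date"] - epoch) // pd.Timedelta(seconds=1)

print(df)
```

If the source column is in milliseconds instead of seconds, unit = "ms" would be the corresponding setting.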

Creating a Range of Dates

If you want to create a range of dates, you have two options:

  • define the range of dates with a start and an end date
  • define the range of dates with a start date, a frequency (e.g., daily, monthly, etc.), and the number of periods.

# Option 1: start and end date
df["date"] = pd.date_range(start = "2022-01-01",
                           end = "2022-12-31")

# Option 2: start date, number of periods, and frequency
df["date"] = pd.date_range(start = "2022-01-01",
                           periods = 365,
                           freq = "D")

Pandas series of a range of dates (Image by the author)
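As a small check, the sketch below builds the same range both ways and compares the results. The monthly range at the end is an extra illustration using the "MS" (month start) frequency alias, which is not used elsewhere in this article.

```python
import pandas as pd

# Option 1: start and end date (daily frequency is the default)
by_end = pd.date_range(start="2022-01-01", end="2022-12-31")

# Option 2: start date, number of periods, and frequency
by_periods = pd.date_range(start="2022-01-01", periods=365, freq="D")

# 2022 is not a leap year, so both options cover the same 365 days
print(by_end.equals(by_periods))

# Other frequencies work the same way, e.g. the first day of each month
monthly = pd.date_range(start="2022-01-01", periods=12, freq="MS")
print(monthly[0], monthly[-1])
```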

Changing the Datetime Format

To change the timestamp format you can use the .strftime() method. Note that the result is a column of strings, not datetime values.

# Example: Change "2022-01-01" to "Jan 01, 2022"
df["date"] = df["date"].dt.strftime("%b %d, %Y")

Changed datetime format with strftime (Image by the author)
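Because formatting turns the values into strings, datetime operations such as the .dt accessor no longer work on the formatted column. One possible way around this, sketched below with made-up dates and a hypothetical column name "date_label", is to keep the original datetime column and store the formatted version separately.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2022-01-01", "2022-02-15"])})

# Keep the datetime column and store the formatted strings next to it
df["date_label"] = df["date"].dt.strftime("%b %d, %Y")

print(df["date_label"].tolist())

# "date" still supports datetime operations, "date_label" holds strings
print(pd.api.types.is_datetime64_any_dtype(df["date"]))
print(pd.api.types.is_datetime64_any_dtype(df["date_label"]))
```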

How to Compose and Decompose a Datetime

A timestamp consists of several components, such as the date and the time, or, at a finer granularity, the hour and the minute. This section discusses how to decompose a datetime data type into its components and how to compose a datetime data type from different columns containing timestamp components.

Decomposing a Datetime

When a timestamp contains both a date and a time, you can decompose it into these two components, as shown below.

# Splitting date and time  
df["dates"] = df["date"].dt.date  
df["times"] = df["date"].dt.time

Timestamp decomposed to dates and times (Image by the author)

You can also decompose it into smaller components, as shown below. You can find more possible components in the pandas DatetimeIndex documentation.

# Creating datetimeindex features  
df["year"] = df["date"].dt.year  
df["month"] = df["date"].dt.month  
df["day"] = df["date"].dt.day  
# etc.

Timestamp decomposed to year, month, and day (Image by the author)

Assembling Multiple Columns to a Datetime

If you want to assemble a date column from its components like the year, month, and day, you can also use the pd.to_datetime() function.

df["date"] = pd.to_datetime(df[["year", "month", "day"]])

Assembling Multiple Columns to a Datetime (Image by the author)
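Decomposing and assembling are inverse operations, which the following sketch illustrates with a few made-up dates. It also adds the day of the week as one more example of a component; the column name "day_of_week" is an assumption for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-01", "2022-02-15", "2022-12-31"])
})

# Decompose the timestamp into its components
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day
df["day_of_week"] = df["date"].dt.dayofweek  # Monday = 0, Sunday = 6

# Assemble the components back into a datetime column
rebuilt = pd.to_datetime(df[["year", "month", "day"]])

print(df)
print((rebuilt == df["date"]).all())
```

Note that pd.to_datetime() expects the component columns to be named "year", "month", and "day" (optionally also "hour", "minute", and so on).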

How to Fill Missing Values

Filling missing values is challenging whether you are working with numerical, categorical, or time series data. This section will explore three methods to fill in missing values in time series data.

Filling Missing Values with a Constant Value

One approach is to fill missing values with a constant value using the .fillna() method. Common choices are the mean of the time series or a sentinel value like -1 or 999 (the example below uses 0). However, a constant usually ignores the temporal structure of the data, so this approach is often not sufficient.

df["num_feature"] = df["num_feature"].fillna(0)

Filling Missing Values with a Constant Value (Image by the author via Kaggle)

Filling Missing Values with the Last Value

Another approach is to fill the missing value with the last available value with the .ffill() method.

df["num_feature"] = df["num_feature"].ffill()

Filling Missing Values with the Last Value (Image by the author via Kaggle)

Filling Missing Values with Linearly Interpolated Values

Often a good solution to handle missing values is to linearly interpolate the missing values with the .interpolate() method.

df["num_feature"] = df["num_feature"].interpolate()

Filling Missing Values with Linearly Interpolated Values (Image by the author via Kaggle)
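To compare the three approaches side by side, here is a minimal sketch on a small made-up series with gaps. The column names of the comparison table are chosen for this illustration only.

```python
import numpy as np
import pandas as pd

# Made-up series with missing values
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 6.0])

comparison = pd.DataFrame({
    "original": s,
    "constant": s.fillna(0),      # fill with a constant value
    "last_value": s.ffill(),      # carry the last available value forward
    "linear": s.interpolate(),    # linear interpolation between neighbors
})

print(comparison)
```

In this example, only the linear interpolation recovers the steady upward trend of the series, while the constant fill introduces artificial drops to zero.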

How to Perform Operations on a Time Series

You can perform various operations on time series data, which we will discuss in this section.

Getting the Min and Max

Knowing the time series’ start or end date can be helpful in many cases.

df["date"].min()  
df["date"].max()

Differencing

Differencing means taking the difference between two consecutive values in a time series. For this, you can use the .diff() method.

df["num_feature_diff"] = df["num_feature"].diff()

Differencing of time series data (Image by the author)

Cumulating

The opposite of differencing is accumulating values of the time series with the .cumsum() method.

df["num_feature_cumsum"] = df["num_feature"].cumsum()

Cumulating of time series data (Image by the author)
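The relationship between differencing and cumulating can be checked on a small made-up series: if you restore the first value that differencing drops, the cumulative sum of the differences reproduces the original series.

```python
import pandas as pd

values = pd.Series([3.0, 5.0, 4.0, 8.0])

# Differences between consecutive values; the first entry is NaN
diffs = values.diff()

# Put the first value back, then accumulate to undo the differencing
rebuilt = diffs.fillna(values.iloc[0]).cumsum()

print(diffs.tolist())
print(rebuilt.tolist())
```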

Getting the Rolling Mean

Sometimes you need the rolling mean of a time series. You can use the .rolling() method, which takes the number of values in the rolling window as a parameter. In the example below, we take the mean of three values. Therefore, the first two rows are empty, and the third row is the mean value of the first three rows.

df["num_feature_mean"] = df["num_feature"].rolling(3).mean()

Rolling mean of time series data (Image by the author)
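If the empty rows at the start are a problem, the min_periods parameter of .rolling() offers one possible workaround: it lets the window produce a value as soon as that many observations are available. The sketch below compares both variants on made-up values.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default: a value is only produced once the window holds three values
strict = s.rolling(3).mean()

# With min_periods=1, shorter windows at the start are averaged as well
lenient = s.rolling(3, min_periods=1).mean()

print(pd.DataFrame({"values": s, "strict": strict, "lenient": lenient}))
```

Keep in mind that the first values of the lenient variant are averages over fewer observations, so they are noisier than the rest.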

Calculating the Time Difference between Two Timestamps

Sometimes you need to calculate the time difference between two timestamps. For example, you might need to calculate the time elapsed since a specific date.

df["time_since_start"] = df["date"] - df["date"].min()

Time difference of timestamp and first timestamp (Image by the author)

Or you might want to find out whether the timestamps are distributed equidistantly.

df["timestamp_difference"] = df["date"].diff()

Time difference between timestamps (Image by the author)
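Building on the two snippets above, the following sketch shows one way to turn the differences into a simple regularity check. The dates are made up and deliberately skip April 1; the variable names are assumptions for this illustration.

```python
import pandas as pd

# Made-up timestamps with one missing day (2020-04-01)
dates = pd.Series(pd.to_datetime([
    "2020-03-30", "2020-03-31", "2020-04-02", "2020-04-03",
]))

# Gaps between consecutive timestamps (the first entry is NaT)
gaps = dates.diff()

# The series is equidistant only if all gaps are identical
is_regular = gaps.dropna().nunique() == 1
largest_gap = gaps.max()

# Time elapsed since the first timestamp, expressed in whole days
elapsed_days = (dates - dates.min()).dt.days

print(is_regular, largest_gap)
print(elapsed_days.tolist())
```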

How to Filter Time Series

When working with time series data, you might need to filter it for specific times. For the approach shown here, set the date column as the index. Once you have a timestamp index, you can filter on a specific date or even on a specific time range.

df = df.set_index(["date"])

DataFrame of time series data with the timestamps as index (Image by the author)

Filtering Time Series on Specific Timestamps

When you have the timestamps set as the index of the pandas DataFrame, you can easily filter for specific timestamps with loc.

df.loc["2020-03-30"]

Filtered time series on a date (Image by the author)

Filtering Time Series on Time Ranges

Similarly to the above example of filtering on specific timestamps, you can also use loc for filtering on time ranges when the timestamps are set as the index of the pandas DataFrame.

df.loc["2020-04-10":"2020-04-15"]

Filtered time series on a date range (Image by the author)
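The following self-contained sketch illustrates both kinds of filtering on made-up data with two observations per day. It assumes that the index has a finer resolution than the date strings used for filtering, in which case a date string selects all rows of that day, and that the end of a range is included.

```python
import pandas as pd

# Made-up data: one value every 12 hours, starting on 2020-03-28
index = pd.date_range(start="2020-03-28", periods=48,
                      freq=pd.Timedelta(hours=12))
df = pd.DataFrame({"num_feature": range(48)}, index=index)
df.index.name = "date"

# All rows of a single day
one_day = df.loc["2020-03-30"]

# All rows of a whole month
april = df.loc["2020-04"]

# All rows within a range; both end points are included
in_range = df.loc["2020-04-10":"2020-04-15"]

print(len(one_day), len(april), len(in_range))
```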

How to Resample Time Series

Resampling can provide additional information on the data. There are two types of resampling: downsampling and upsampling.

Downsampling

Downsampling is when the frequency of samples is decreased (e.g., seconds to months). You can use the .resample() method.

# "M" = month end; pandas 2.2+ prefers the alias "ME"
downsampled = df.resample("M")["num_feature"].mean()

Series of monthly resampled (downsampled) values (Image by the author)

Upsampling

Upsampling is when the frequency of samples is increased (e.g., months to days). Again, you can use the .resample() method.

upsampled = downsampled.resample("D").interpolate(method = "linear")

Series of daily resampled (upsampled) values (Image by the author)
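Here is a minimal end-to-end sketch of both directions on made-up daily data for January and February 2022. As an assumption for this illustration, it uses the "MS" (month start) alias instead of the month-end alias from the snippet above, so that the example behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Made-up daily data: the values 1..59 for January and February 2022
index = pd.date_range(start="2022-01-01", periods=59, freq="D")
df = pd.DataFrame({"num_feature": np.arange(1, 60, dtype=float)}, index=index)

# Downsampling: daily -> monthly mean, labeled with the first day of the month
monthly = df.resample("MS")["num_feature"].mean()

# Upsampling: monthly -> daily, filling the new rows by linear interpolation
daily = monthly.resample("D").interpolate(method="linear")

print(monthly)
print(daily.head())
```

Note that upsampling does not recover the original daily values. It only interpolates between the monthly aggregates.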

How to Plot Time Series

This section will discuss how to visualize numerical and categorical time series data with Matplotlib and Seaborn. In addition to the pyplot module, we will explore different visualization techniques with the dates module.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns

To visualize the temporal order of a time series, the x-axis of a plot usually represents the time, and the y-axis represents the value.

Plotting Numerical Data over Time

Most time series data is numerical, e.g., temperature or stock price data. To visualize numerical time series data, you can use line plots.

sns.lineplot(data = df,   
             x = "date",   
             y = "num_feature")

Line plot of numerical time series data (Image by the author)

Plotting Categorical Data over Time

Sometimes time series data can be categorical, e.g., tracking occurrences of different events.

Before plotting the data, you can label encode the categorical columns, e.g., by using the LabelEncoder or with a simple dictionary, as shown below.

# Label encode the categorical column
enum_dict = {}
for i, cat in enumerate(df.cat_feature.unique()):
    enum_dict[cat] = i

# Map each category to its integer label
df["cat_feature_enum"] = df["cat_feature"].map(enum_dict)

Label encoded feature “cat_feature” as “cat_feature_enum” (Image by the author)
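As an alternative to building the dictionary by hand, pandas offers pd.factorize(), which assigns integer codes in order of first appearance. The sketch below uses made-up categories and is meant as one possible shortcut, not as a replacement for the approach above.

```python
import pandas as pd

df = pd.DataFrame({"cat_feature": ["A", "B", "A", "C", "B"]})

# codes: one integer per row; categories: the unique values in order of appearance
codes, categories = pd.factorize(df["cat_feature"])
df["cat_feature_enum"] = codes

print(df)
print(list(categories))
```

The returned categories can later serve as tick labels, because their position corresponds to the integer code.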

To visualize categorical time series data, you can use scatter plots.

fig, ax = plt.subplots(figsize=(8, 4))

sns.scatterplot(data = df,
                x = "date",
                y = "cat_feature_enum",
                hue = "cat_feature",
                marker = '.',
                linewidth = 0,
                ax = ax)

# One y-tick per category, so that ticks and labels have the same length
ax.set_yticks(list(range(len(df.cat_feature.unique()))))
ax.set_yticklabels(df.cat_feature.unique())
ax.get_legend().remove() # remove legend - it's not necessary here

plt.show()

Event plot of categorical time series data with scatter plot (Image by the author)

You can also try out Matplotlib’s eventplot demo.

Plotting a Timeline

For plotting a timeline, we will use the label encoded categorical values from the previous section and vlines.

fig, ax = plt.subplots(figsize=(8, 4))

ax.vlines(df["date"], 0, df["cat_feature_enum"])

plt.show()

Timeline plot of categorical time series data with vlines (Image by the author)

Setting the X-Axis Limits of a Time Series

When you want to set the x-axis limits of a time series plot, the range has to be of the datetime64[ns] data type.

E.g., you can use the minimum and maximum timestamps of your time series:

ax.set_xlim([df.date.min(), df.date.max()])

Or you can specify a custom range, as shown below:

ax.set_xlim(np.array(["2020-04-01", "2020-04-30"],  
                      dtype="datetime64"))

Adjusted x-axis ranges (Image by the author)

Setting the X-Ticks of a Time Series

To improve the readability of your data visualization, you can add major and minor x-ticks at specific intervals (e.g., weekly, monthly, yearly, etc.).

ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
ax.xaxis.set_minor_locator(mdates.DayLocator())

Custom x-axis ticks (Image by the author)
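The axis snippets above assume that a figure and an axis already exist. The following sketch puts the pieces together on made-up data. It uses plain Matplotlib instead of Seaborn and selects the non-interactive "Agg" backend so that it also runs without a display; both choices are assumptions for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display required

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up daily data covering 90 days
df = pd.DataFrame({
    "date": pd.date_range(start="2020-03-01", periods=90, freq="D"),
    "num_feature": range(90),
})

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(df["date"], df["num_feature"])

# Limit the x-axis to the time span covered by the data
ax.set_xlim([df["date"].min(), df["date"].max()])

# Major ticks at the start of each month, minor ticks for each day
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
ax.xaxis.set_minor_locator(mdates.DayLocator())

# Render the figure; use fig.savefig(...) or plt.show() to view it
fig.canvas.draw()
```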

Conclusion

Getting started with time series data can be challenging when you are unfamiliar with the datetime data type. As you saw, the datetime data type has many practical built-in methods for easily manipulating time series data. This article covered how to manipulate timestamps, how to perform useful operations on time series values, and how to visualize time series data.


This blog was originally published on Towards Data Science on Oct 12, 2022 and moved to this site on Feb 1, 2026.
