Pandas Resampling Financial Data Series

by

 

 

Resampling Timestamped Data with Multiple Target Columns

In this example we will show how to resample from 1 minute data in to daily or hourly, the example dataframe below is from crypto using Bybit API 

The timestamp is in milliseconds and represents 1 min interval data from Bybit.

 

Problems / Issues Solved

- Ensure that the timestamp and the new interval are aligned

- Handle multiple symbols with a groupby so we can resample multiple at the same time

	timestamp	open	high	low	close	volume	turnover	symbol
0	1776902460000	2374.77	2377.24	2373.48	2374.85	1950.06	4632215.0444	ETHUSDT
1	1776902460000	86.87	86.94	86.83	86.88	8635.1	750264.153	SOLUSDT
2	1776902460000	78254.2	78312.9	78228.0	78263.8	128.931	10092245.8399	BTCUSDT
3	1776902520000	86.88	86.93	86.87	86.92	1175.0	102104.992	SOLUSDT
4	1776902520000	78263.8	78309.0	78247.0	78305.9	33.842	2649253.7078	BTCUSDT
4189	1776986220000	77996.5	78029.8	77975.9	78028.1	33.26	2594256.9364	BTCUSDT
4190	1776986220000	85.63	85.77	85.56	85.74	28249.5	2419460.609	SOLUSDT
4191	1776986280000	85.74	85.83	85.71	85.77	10699.6	917850.263	SOLUSDT
4192	1776986280000	78028.1	78028.2	77997.7	78003.4	4.739	369681.008	BTCUSDT
4193	1776986280000	2324.95	2324.96	2323.84	2324.43	115.87	269305.6855	ETHUSDT

 

 

Problem

Lets take common issue in which we set the index as a datetime in pandas which is required for resampling, and then we do the resample , ok so here we are assuming that we want to remove the datetime index after we do the resample, this is realistic for saving to database etc. So we will notice that the output below looks correct at first glance. 

df.index = pd.to_datetime(df.timestamp, unit='ms', utc=True)

resampled = (
    df.sort_index()  # guard against unsorted input
    .groupby("symbol")
    .resample('D')
    .agg(
        {
            "open": "first",
            "high": "max",
            "low": "min",
            "close": "last",
            "volume": "sum",
            "turnover": "sum",
            "timestamp": "first",
            "symbol": "first"

        }
    )
).reset_index(drop=True)


print(resampled)
"""

	open	high	low	close	volume	turnover	timestamp	symbol
0	78254.2	78633.0	76900.0	78003.4	87345.587	6798091833.0901	1776902460000	BTCUSDT
1	2374.77	2381.3	2283.5	2324.43	1409944.26	3284361756.706	1776902460000	ETHUSDT
2	86.87	87.12	84.38	85.77	9102853.0	780557199.604	1776902460000	SOLUSDT

"""

 

So what is the issue with this? Let's simulate loading the data again from a db or file, notice below that the data timestamps starts at 1 min past the expected timestamp. 

resampled.index = pd.to_datetime(resampled.timestamp, unit='ms', utc=True)

print(resampled)

"""

	open	high	low	close	volume	turnover	timestamp	symbol
timestamp								
2026-04-23 00:01:00+00:00	78254.2	78633.0	76900.0	78003.4	87345.587	6798091833.0901	1776902460000	BTCUSDT
2026-04-23 00:01:00+00:00	2374.77	2381.3	2283.5	2324.43	1409944.26	3284361756.706	1776902460000	ETHUSDT
2026-04-23 00:01:00+00:00	86.87	87.12	84.38	85.77	9102853.0	780557199.604	1776902460000	SOLUSDT

"""

 

The error here is that you forgot that the timestamp, in fact has not been modified, it is the same as it was in the original data, this can cause all sorts of problems if we expect a unique timestamp per symbol, here all are the same but say there are 100+ symbols this will cause the index values to be different think (+1, +2) mins etc

 

Solution

Since we want the new timestamp from the index, we remove the timestamp and optionally symbol from the agg method and then derive it from the index after we have completed the resample, this ensures that we get the desired timestamp

df.index = pd.to_datetime(df.timestamp, unit='ms', utc=True)
resampled = (
    df.sort_index()  # guard against unsorted input
    .groupby("symbol")
    .resample('D')
    .agg(
        {
            "open": "first",
            "high": "max",
            "low": "min",
            "close": "last",
            "volume": "sum",
            "turnover": "sum",
        }
    ) ## leave the timestamp out of the agg 
)
# extract it from the index of resampled dataframe
resampled["timestamp"] = resampled.index.get_level_values(1).view("int64") // 10**6
## add the symbol back in
resampled["symbol"] = resampled.index.get_level_values(0)
resampled = resampled.reset_index(drop=True)


resampled.index = pd.to_datetime(resampled['timestamp'], unit='ms', utc=True)

"""
	open	high	low	close	volume	turnover	timestamp	symbol
timestamp								
2026-04-23 00:00:00+00:00	78254.2	78633.0	76900.0	78003.4	87345.587	6798091833.0901	1776902400000	BTCUSDT
2026-04-23 00:00:00+00:00	2374.77	2381.3	2283.5	2324.43	1409944.26	3284361756.706	1776902400000	ETHUSDT
2026-04-23 00:00:00+00:00	86.87	87.12	84.38	85.77	9102853.0	780557199.604	1776902400000	SOLUSDT


"""

 

 

Warning

The method above will mean the timestamp in itself is not accurate, i.e. it is a rounded version of the original timestamp, if you are doing this ensure you understand that, Can do something like the snippet below to preserver the original , ensuring you document exactly which is which. 

df.index = pd.to_datetime(df.timestamp, unit='ms', utc=True)
resampled = (
    df.sort_index()  # guard against unsorted input
    .groupby("symbol")
    .resample('D')
    .agg(
        {
            "open": "first",
            "high": "max",
            "low": "min",
            "close": "last",
            "volume": "sum",
            "turnover": "sum",
            "timestamp": "first"
        }
    ) ## leave the timestamp out of the agg 
)

resampled.rename(columns={"timestamp": "original_timestamp"}, inplace=True)
# extract it from the index of resampled dataframe
resampled["timestamp"] = resampled.index.get_level_values(1).view("int64") // 10**6
## add the symbol back in
resampled["symbol"] = resampled.index.get_level_values(0)
resampled = resampled.reset_index(drop=True)
resampled.index = pd.to_datetime(resampled['timestamp'], unit='ms', utc=True)

"""

	open	high	low	close	volume	turnover	original_timestamp	timestamp	symbol
timestamp									
2026-04-23 00:00:00+00:00	78254.2	78633.0	76900.0	78003.4	87345.587	6798091833.0901	1776902460000	1776902400000	BTCUSDT
2026-04-23 00:00:00+00:00	2374.77	2381.3	2283.5	2324.43	1409944.26	3284361756.706	1776902460000	1776902400000	ETHUSDT
2026-04-23 00:00:00+00:00	86.87	87.12	84.38	85.77	9102853.0	780557199.604	1776902460000	1776902400000	SOLUSDT

"""