Resampling Timestamped Data with Multiple Target Columns
In this example we will show how to resample from 1 minute data in to daily or hourly, the example dataframe below is from crypto using Bybit API
The timestamp is in milliseconds and represents 1 min interval data from Bybit.
Problems / Issues Solved
- Ensure that the timestamp and the new interval are aligned
- Handle multiple symbols with a groupby so we can resample multiple at the same time
timestamp open high low close volume turnover symbol
0 1776902460000 2374.77 2377.24 2373.48 2374.85 1950.06 4632215.0444 ETHUSDT
1 1776902460000 86.87 86.94 86.83 86.88 8635.1 750264.153 SOLUSDT
2 1776902460000 78254.2 78312.9 78228.0 78263.8 128.931 10092245.8399 BTCUSDT
3 1776902520000 86.88 86.93 86.87 86.92 1175.0 102104.992 SOLUSDT
4 1776902520000 78263.8 78309.0 78247.0 78305.9 33.842 2649253.7078 BTCUSDT
4189 1776986220000 77996.5 78029.8 77975.9 78028.1 33.26 2594256.9364 BTCUSDT
4190 1776986220000 85.63 85.77 85.56 85.74 28249.5 2419460.609 SOLUSDT
4191 1776986280000 85.74 85.83 85.71 85.77 10699.6 917850.263 SOLUSDT
4192 1776986280000 78028.1 78028.2 77997.7 78003.4 4.739 369681.008 BTCUSDT
4193 1776986280000 2324.95 2324.96 2323.84 2324.43 115.87 269305.6855 ETHUSDT
Problem
Lets take common issue in which we set the index as a datetime in pandas which is required for resampling, and then we do the resample , ok so here we are assuming that we want to remove the datetime index after we do the resample, this is realistic for saving to database etc. So we will notice that the output below looks correct at first glance.
df.index = pd.to_datetime(df.timestamp, unit='ms', utc=True)
resampled = (
df.sort_index() # guard against unsorted input
.groupby("symbol")
.resample('D')
.agg(
{
"open": "first",
"high": "max",
"low": "min",
"close": "last",
"volume": "sum",
"turnover": "sum",
"timestamp": "first",
"symbol": "first"
}
)
).reset_index(drop=True)
print(resampled)
"""
open high low close volume turnover timestamp symbol
0 78254.2 78633.0 76900.0 78003.4 87345.587 6798091833.0901 1776902460000 BTCUSDT
1 2374.77 2381.3 2283.5 2324.43 1409944.26 3284361756.706 1776902460000 ETHUSDT
2 86.87 87.12 84.38 85.77 9102853.0 780557199.604 1776902460000 SOLUSDT
"""
So what is the issue with this? Let's simulate loading the data again from a db or file, notice below that the data timestamps starts at 1 min past the expected timestamp.
resampled.index = pd.to_datetime(resampled.timestamp, unit='ms', utc=True)
print(resampled)
"""
open high low close volume turnover timestamp symbol
timestamp
2026-04-23 00:01:00+00:00 78254.2 78633.0 76900.0 78003.4 87345.587 6798091833.0901 1776902460000 BTCUSDT
2026-04-23 00:01:00+00:00 2374.77 2381.3 2283.5 2324.43 1409944.26 3284361756.706 1776902460000 ETHUSDT
2026-04-23 00:01:00+00:00 86.87 87.12 84.38 85.77 9102853.0 780557199.604 1776902460000 SOLUSDT
"""
The error here is that you forgot that the timestamp, in fact has not been modified, it is the same as it was in the original data, this can cause all sorts of problems if we expect a unique timestamp per symbol, here all are the same but say there are 100+ symbols this will cause the index values to be different think (+1, +2) mins etc
Solution
Since we want the new timestamp from the index, we remove the timestamp and optionally symbol from the agg method and then derive it from the index after we have completed the resample, this ensures that we get the desired timestamp
df.index = pd.to_datetime(df.timestamp, unit='ms', utc=True)
resampled = (
df.sort_index() # guard against unsorted input
.groupby("symbol")
.resample('D')
.agg(
{
"open": "first",
"high": "max",
"low": "min",
"close": "last",
"volume": "sum",
"turnover": "sum",
}
) ## leave the timestamp out of the agg
)
# extract it from the index of resampled dataframe
resampled["timestamp"] = resampled.index.get_level_values(1).view("int64") // 10**6
## add the symbol back in
resampled["symbol"] = resampled.index.get_level_values(0)
resampled = resampled.reset_index(drop=True)
resampled.index = pd.to_datetime(resampled['timestamp'], unit='ms', utc=True)
"""
open high low close volume turnover timestamp symbol
timestamp
2026-04-23 00:00:00+00:00 78254.2 78633.0 76900.0 78003.4 87345.587 6798091833.0901 1776902400000 BTCUSDT
2026-04-23 00:00:00+00:00 2374.77 2381.3 2283.5 2324.43 1409944.26 3284361756.706 1776902400000 ETHUSDT
2026-04-23 00:00:00+00:00 86.87 87.12 84.38 85.77 9102853.0 780557199.604 1776902400000 SOLUSDT
"""
Warning
The method above will mean the timestamp in itself is not accurate, i.e. it is a rounded version of the original timestamp, if you are doing this ensure you understand that, Can do something like the snippet below to preserver the original , ensuring you document exactly which is which.
df.index = pd.to_datetime(df.timestamp, unit='ms', utc=True)
resampled = (
df.sort_index() # guard against unsorted input
.groupby("symbol")
.resample('D')
.agg(
{
"open": "first",
"high": "max",
"low": "min",
"close": "last",
"volume": "sum",
"turnover": "sum",
"timestamp": "first"
}
) ## leave the timestamp out of the agg
)
resampled.rename(columns={"timestamp": "original_timestamp"}, inplace=True)
# extract it from the index of resampled dataframe
resampled["timestamp"] = resampled.index.get_level_values(1).view("int64") // 10**6
## add the symbol back in
resampled["symbol"] = resampled.index.get_level_values(0)
resampled = resampled.reset_index(drop=True)
resampled.index = pd.to_datetime(resampled['timestamp'], unit='ms', utc=True)
"""
open high low close volume turnover original_timestamp timestamp symbol
timestamp
2026-04-23 00:00:00+00:00 78254.2 78633.0 76900.0 78003.4 87345.587 6798091833.0901 1776902460000 1776902400000 BTCUSDT
2026-04-23 00:00:00+00:00 2374.77 2381.3 2283.5 2324.43 1409944.26 3284361756.706 1776902460000 1776902400000 ETHUSDT
2026-04-23 00:00:00+00:00 86.87 87.12 84.38 85.77 9102853.0 780557199.604 1776902460000 1776902400000 SOLUSDT
"""