I've encountered an inconsistency in the data returned by the API endpoints for K-lines ('/api/v3/klines') and aggregated trades ('/api/v3/aggTrades'). Specifically, the trade counts and volumes they report do not match. For aggregated trades, I make multiple requests, up to the final trade of the minute, to ensure alignment.
For instance, a request for a one-minute K-line typically covers the period from 13:00:00.000 to 13:00:59.999. I make sure my aggregated-trades function scans this entire interval, issuing multiple requests as necessary. However, I've noticed that, occasionally and across different trading pairs, the trade counts do not match.
Has anyone else experienced this issue? What could be the underlying cause?
P.S. I have seen some similar topics but found no answers/solutions.
This issue was also discussed in the context of using WebSockets for K-line data: users noticed inconsistencies that were partly attributed to which streams they subscribed to and how those streams handle real-time data updates.
In short, after reading those conversations as well, it becomes quite clear that aggregated trades are not entirely reliable and have issues that need to be investigated.
In my own tests, the discrepancies between K-lines and aggregated trades (both in number of trades and in volume) are statistically significant, which makes the aggregates unreliable.
If one wants to analyse the trade distribution, it is better to rely on individual trades ('/api/v3/historicalTrades' and the WebSocket streams).
It’s a real pity because aggregated trades retain a good amount of information with a much smaller volume of data.
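For reference, here is a minimal sketch of how one could page through '/api/v3/historicalTrades' by trade ID. The API key value is a placeholder (the endpoint requires the X-MBX-APIKEY header), and the helper name and starting ID are just examples, not code from my actual setup:

import requests

BINANCE_API = 'https://api.binance.com'
HEADERS = {'X-MBX-APIKEY': 'YOUR_API_KEY'}  # placeholder: historicalTrades requires an API key

def fetch_historical_trades(symbol, from_id, limit=1000):
    # One page of raw (non-aggregated) trades, starting at a given trade ID.
    response = requests.get(
        f'{BINANCE_API}/api/v3/historicalTrades',
        params={'symbol': symbol, 'fromId': from_id, 'limit': limit},
        headers=HEADERS,
    )
    response.raise_for_status()
    return response.json()

# Example usage: page forward from some starting trade ID (placeholder value).
page = fetch_historical_trades('BTCUSDT', from_id=1_000_000)
next_from_id = page[-1]['id'] + 1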
This is the output of my code; I am currently testing this range on BTCUSDT.
All trades are within the specified time range. 2024-04-09 13:04:00, 2024-04-09 16:03:59.999000
k trades: 478732.0, k vol 10851.719830000013
agg trades: 473406, agg vol 10733.94963999999
k trades and k vol are obtained through the K-line endpoint; agg trades and agg vol through the aggTrades endpoint.
This range has a high volume of trades, which is where the discrepancies happen more frequently. I am running other tests.
@Luca_D thanks for the example! That really helps.
How do you iterate through aggtrades? You should use startTime and endTime only to establish the aggtrade ID range and use fromId for pagination. Otherwise, multiple aggtrades at the same timestamp might be missing.
Here’s my script in Python that iterates over the range you mentioned and shows the same results for klines and aggtrades:
#!/usr/bin/env python3
import requests
import time
from decimal import Decimal
BINANCE_API = 'https://api.binance.com'
SYMBOL = 'BTCUSDT'
START_TIME = 1712667840000 # 2024-04-09 13:04:00.000
END_TIME = 1712678639999 # 2024-04-09 16:03:59.999
# Get klines, we can fetch them in one go
response = requests.get(f'{BINANCE_API}/api/v3/klines?symbol={SYMBOL}&interval=1m&startTime={START_TIME}&endTime={END_TIME}')
klines = response.json()
assert len(klines) == 180
assert klines[0][0] == 1712667840000
assert klines[-1][6] == 1712678639999
kline_num_trades = sum([k[8] for k in klines])
kline_volume = sum([Decimal(k[5]) for k in klines])
print(f'klines: {kline_num_trades} trades, {kline_volume} volume')
# Get aggtrades, in batches
response = requests.get(f'{BINANCE_API}/api/v3/aggTrades?symbol={SYMBOL}&startTime={START_TIME}&endTime={END_TIME}')
agg_trades = response.json()
while agg_trades[-1]['T'] < END_TIME + 1:
    next_id = agg_trades[-1]['a'] + 1
    response = requests.get(f'{BINANCE_API}/api/v3/aggTrades?symbol={SYMBOL}&fromId={next_id}&limit=1000')
    agg_trades += response.json()
    time.sleep(0.1)  # be gentle to the API
# Cut off the overshoot by time at the end
agg_trades = [t for t in agg_trades if t['T'] <= END_TIME]
assert len(agg_trades) == 381806
assert agg_trades[0]['a'] == 2959684856
assert agg_trades[-1]['a'] == 2960066661
aggtrade_num_trades = sum([t['l'] - t['f'] + 1 for t in agg_trades])
aggtrade_volume = sum([Decimal(t['q']) for t in agg_trades])
print(f'aggtrades: {aggtrade_num_trades} trades, {aggtrade_volume} volume')
Yes, it’s most likely this part that had an error.
If the API response ends with a run of multiple aggtrades at the same millisecond (think of somebody placing a big order that traded at multiple price levels), some of them might get cut off by the limit. Whether that happens depends on the exact trades, which explains why it appears random.
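To make that concrete, here is a toy illustration (made-up IDs, timestamps, and a deliberately tiny limit, not real API data) of how resuming by startTime after a full batch silently drops a trade that shares the last millisecond:

LIMIT = 3
all_trades = [
    {'a': 100, 'T': 1000},
    {'a': 101, 'T': 1005},
    {'a': 102, 'T': 1005},
    {'a': 103, 'T': 1005},  # same millisecond as the two above
    {'a': 104, 'T': 1010},
]

def fetch(start_time, limit=LIMIT):
    # Simulates the API: trades at or after start_time, truncated to the limit.
    return [t for t in all_trades if t['T'] >= start_time][:limit]

batch = fetch(1000)                      # ids 100, 101, 102 (limit hit mid-millisecond)
next_batch = fetch(batch[-1]['T'] + 1)   # startTime = 1006 -> only id 104
# Aggtrade 103 at T=1005 is never returned; pagination by fromId
# (or an overlapping startTime plus de-duplication, see below) avoids this.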
It is still possible to use a loop with just startTime and endTime like you do, but the trick is to query the trades with some overlap:
params['startTime'] = trades[-1]['T'] # note: no adjustment by "+ 1"
and then skip the trades at the front that you have already seen, based on the aggtrade ID:
last_aggtrade_id = aggregated_trades[-1]['a']
trades = [t for t in trades if t['a'] > last_aggtrade_id]
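Putting those two pieces together, a rough sketch of such a loop could look like this (it reuses the constants from my script above; the exact variable names and the 1000 limit are assumptions, not your actual code):

aggregated_trades = []
params = {'symbol': SYMBOL, 'startTime': START_TIME, 'endTime': END_TIME, 'limit': 1000}
while True:
    response = requests.get(f'{BINANCE_API}/api/v3/aggTrades', params=params)
    trades = response.json()
    if aggregated_trades:
        # Drop the overlap: aggtrades already collected in an earlier batch.
        last_aggtrade_id = aggregated_trades[-1]['a']
        trades = [t for t in trades if t['a'] > last_aggtrade_id]
    if not trades:
        break  # nothing new left in the window
    aggregated_trades += trades
    # Overlap on purpose: restart from the last seen timestamp, without "+ 1".
    params['startTime'] = aggregated_trades[-1]['T']
    time.sleep(0.1)  # be gentle to the API
# Caveat: this assumes no single millisecond contains more than `limit` aggtrades;
# pagination by fromId does not need that assumption.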