The efficient fetching / pagination of too many orders

For GET api/v3/allOrders, there is no current=1,2,3,… or page=1,2,3,… parameter that lets you fetch additional orders when you have too many orders to retrieve in one call.

The only workaround I can think of right now is to test whether the number of orders in the response equals the “limit=” value, as in the following two examples:

Example 1:

Request: limit=900

Response: number of orders received is 800

My Action: no more orders need to be retrieved; I asked for 900 and got only 800.

Example 2:

Request: limit=900

Response: number of orders received is 900

My Action: This is ambiguous! It can mean two things:

  1. The number of orders that really exist is 900 → everything is okay, do not fetch more orders.
  2. The number of orders that really exist is 1200 → the API returned only 900 of them because the limit is 900, and you still have to perform additional calls to the endpoint to get the remaining records. You must first change “startTime=” and “endTime=” to a smaller duration that returns fewer orders than “limit=”, so you can be sure you got all the existing orders (which maps back to Example 1).

The technique described above has the disadvantage (in Example 2) that you re-fetch orders you already received once before, just to be confident you did not miss any records, which is a waste of resources.
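For what it’s worth, the heuristic from the two examples boils down to a one-line check. This is just a sketch: `orders` stands for the parsed response list and `limit` for the value sent in the request.

```python
def fetched_all(orders, limit):
    # Fewer orders than requested -> the window is exhausted (Example 1).
    # A full batch is ambiguous and needs a follow-up call (Example 2).
    return len(orders) < limit

print(fetched_all(list(range(800)), 900))  # True  -> done
print(fetched_all(list(range(900)), 900))  # False -> ambiguous, query again
```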

Can anyone think of a cleaner way to achieve this?

In brief: how can one know the number of orders that exist in a given time window between a given “startTime=” and “endTime=” without actually fetching these orders?

For other endpoints like “GET sapi/v1/asset/assetDividend” and “GET sapi/v1/c2c/orderMatch/listUserOrderHistory”, the response contains “total”: 1200, which tells us how many items exist in a given timespan even if we requested far fewer items. Is there anything similar for “GET api/v3/allOrders”?

In brief, my question is: how to efficiently do pagination for “GET api/v3/allOrders”?

You should use orderId along with limit to paginate through orders. Note that startTime..endTime range cannot be longer than 24 hours.

Suppose you want to fetch all orders in the A…B range.

  1. Choose limit=N to be the maximum number of orders you are comfortable processing in one batch.
  2. Using startTime=A&limit=N, get the first batch of orders starting at the desired starting point.
  3. Process the order list from the response. Once an order’s time > B, you’re done.
  4. If you’re not done, note the orderId of the last order in the response.
  5. Make the next query with orderId=${lastOrderId + 1}&limit=N.
  6. Go to step 3.
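The steps above can be sketched in Python. `fetch_batch` is a placeholder for the actual signed call to GET api/v3/allOrders (with the binance-connector library that would be roughly `client.get_orders(symbol, ...)`, though that mapping is my assumption), and `fake_batch` is a hypothetical in-memory stand-in that only exercises the loop:

```python
def fetch_all_orders(fetch_batch, start_time, end_time, limit=500):
    """Generator implementing steps 2-6: paginate with the orderId cursor.

    `fetch_batch(**params)` stands in for GET api/v3/allOrders and must
    return a list of order dicts sorted ascending by orderId.
    """
    # Step 2: first batch anchored at the desired start time.
    batch = fetch_batch(startTime=start_time, limit=limit)
    while batch:
        for order in batch:
            if order["time"] > end_time:        # step 3: past B -> done
                return
            yield order
        if len(batch) < limit:                  # short batch: nothing left
            return
        last_id = batch[-1]["orderId"]          # step 4
        # Step 5: next query starts just past the last order seen.
        batch = fetch_batch(orderId=last_id + 1, limit=limit)

# Hypothetical stand-in: orderId i was created at time i * 10.
def fake_batch(startTime=None, orderId=None, limit=500):
    orders = [{"orderId": i, "time": i * 10} for i in range(1, 8)]
    if startTime is not None:
        orders = [o for o in orders if o["time"] >= startTime]
    if orderId is not None:
        orders = [o for o in orders if o["orderId"] >= orderId]
    return orders[:limit]

got = list(fetch_all_orders(fake_batch, start_time=20, end_time=60, limit=3))
# got contains the orders whose time falls in the 20..60 window
```

Note that the real endpoint also requires a symbol and a signature; those are omitted here to keep the cursor logic in focus.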

Perfect!
I’m grateful to you for this useful insight.

So what you are saying is:

1- orderId always comes as an increasing number in the response (numbers only, no characters), sorted ascending, and you can never encounter two orders o1, o2 in the response where o2.orderId <= o1.orderId, right?

I quickly reviewed some responses and found #1 above to hold, which supports your theory. But do you know of something reliable we can count on where #1 is stated explicitly … the API documentation, for example?

2- In the response, is “time=” guaranteed to always be sorted ascending for all orders in the response?

Thanks again for this wonderful approach to handling order retrieval.

  1. Yes. orderId is the unique numeric ID of the order. Every new order gets a new orderId, greater than the previous one. IDs are unique per symbol – i.e. orders for different symbols might share an ID, but you can never query orders without a symbol.

  2. time is order creation timestamp. It is set when the order is created and never changes afterwards. startTime and endTime filter orders by this field, and it’s also used to sort the response.


Thanks a lot @ilammy :pray: for the precise information and for the time you spent providing it.

Any idea on how to do pagination for GET /sapi/v1/asset/assetDividend ?

1- The technique of incrementing the orderId (used for GET api/v3/allOrders) and invoking the endpoint again is not applicable here.
2- The technique of pagination using current=1,2,3,… or page=1,2,3,… is not applicable here either.

mentioning @ilammy

I’m sorry, I’m not knowledgeable about those APIs :frowning:

@dino, @aisling2, you are our only hope! :pray:

Hey @williams,
Good question. Since that endpoint has the endTime param available, you can paginate using it. What you can do is create a loop and a variable, e.g. earliestTime, which tracks the earliest divTime encountered in the results so far.

Once the query is completed, run it again with endTime = earliestTime. Using this method you’ll be able to leverage endTime to effectively paginate through the results.

Basic Sample Integration in Python to demonstrate:

import time
from binance.spot import Spot

apiKey = "<APIKEY>"
secretKey = "<APISECRET>"

client = Spot(apiKey, secretKey)

def getResults(earliestTime):
    return(client.asset_dividend_record(end_time=earliestTime))

# initialise earliestTime to the current timestamp in milliseconds
earliestTime = time.time_ns() // 1_000_000

#first call of method using current timestamp as endTime
res = getResults(earliestTime)

flag = 0

while flag == 0:
    prevEarliestTime = earliestTime
    for x in res['rows']:
        print(x, '\n')
        if x['divTime'] < earliestTime:
            #found earlier timestamp
            earliestTime = x['divTime']
    if prevEarliestTime == earliestTime:
        #reached end of results, exit loop
        flag = 1
    else:
        #call method again with new earliestTime
        res = getResults(earliestTime)
print("No more records")

Thanks a lot @jonte , it is interesting that we both got attracted to the {“divTime”:} in the response!

I can also see that your model retrieves the most recent items first and then continues backward to older items, which I’m totally okay with. I will post a similar approach (but moving in the reverse time direction, from older to newer items) that I thought about before seeing your esteemed answer.

My only concern is that we both assumed that the item with the earliest divTime (in your time direction), or the latest in my reverse direction, is the item before/after which the next call to the endpoint should start. That is just an assumption I made that is not backed by the official documentation! What if the earliest item in the response is not actually the earliest in the provided startTime/endTime span? What if the actual “earliest” item was not returned at all, simply because we exceeded the “limit=”? I think we need an authoritative answer from Binance themselves about the ordering of the items returned in the response; otherwise we both just have a hypothesis that is yet to be proven.

Did I miss something? Can you or anyone think of something that proves our hypothesis about the ordering of the returned items in the response?

Last but not least many thanks for the time you spent writing this elegant code.

The code I provided is not necessarily complete, I just provided it to give you a start in the right direction.

what if the actually “earliest” item was not returned at all in the response simply because we exceeded the “limit=”?

In my example it doesn’t matter if the true earliest item wasn’t returned in the response due to reaching the limit, because by passing earliestTime as the endTime, the next call will still return items prior to the earliest item previously returned. So, to address your concern about missing or skipping items: that isn’t actually an issue with this approach. Hopefully that makes sense.

Before I read the reply of @jonte above, I had thought about the approach below. Although I see that the hypothesis of @jonte is more conservative and safer than mine, I thought it would be useful to put both hypotheses in front of everyone, because I see that both are unconfirmed by the Binance API documentation.


I thought about a technique, but I do not feel it is the best; I still think others like @dino, @aisling2, and the rest can give us a more efficient and cleaner one.
The technique is inspired by the brilliant approach proposed by @ilammy in The efficient fetching / pagination of too many orders - #2 by ilammy

1- Request “GET sapi/v1/asset/assetDividend” with any startTime and endTime.
2- From the response of the first request above, get the {“total”:} value, which is the number of items in the timespan you specified in #1. This is not necessarily the number of items actually returned in the response, because the number of returned items is capped by “limit=” in the request; the actual number of items existing between startTime/endTime, as reported by {“total”:}, may be greater than the number returned.
3- If the number from {“total”:} is greater than “limit=”, then you know there are more items in the startTime/endTime span that you did not receive; otherwise you are done.
4- If #3 is true, take the {“divTime”:} of the first item in the response, which is (by assumption) the latest item in the response; let us call it T. This is my most critical assumption: I assume all items in the response are sorted from most recent to oldest, and hence the first item is the latest one. I have no documentation backing this; I only tested it on some samples and found it to hold, which is of course not reliable to rely on!
5- Trigger another request to “GET sapi/v1/asset/assetDividend” with startTime=T+1 in the request, then repeat the entire cycle above.
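The cycle above can be sketched as follows. To be clear, this encodes the UNCONFIRMED assumptions of the post: rows are sorted newest-first, and when the span holds more than “limit=” items the oldest ones are the ones returned. `fetch_page` is a hypothetical stand-in for the signed call to GET sapi/v1/asset/assetDividend, and `fake_page` is an in-memory mock implementing the assumed server behaviour.

```python
def fetch_dividends_forward(fetch_page, start_time, end_time, limit=20):
    """Old -> new pagination per steps 1-5 above (assumptions unverified)."""
    collected = []
    while True:
        page = fetch_page(startTime=start_time, endTime=end_time, limit=limit)
        rows = page["rows"]
        collected.extend(rows)
        # Step 3: "total" no bigger than limit means nothing was cut off.
        if not rows or page["total"] <= limit:
            return collected
        # Step 4: T = divTime of the first (assumed newest) row.
        t = rows[0]["divTime"]
        # Step 5: next request starts just after T.
        start_time = t + 1

# In-memory mock of the assumed server behaviour: seven dividends with
# divTime 1..7, oldest `limit` items returned, presented newest-first.
def fake_page(startTime, endTime, limit):
    items = [{"divTime": d} for d in range(1, 8)]
    in_span = [i for i in items if startTime <= i["divTime"] <= endTime]
    oldest_first = sorted(in_span, key=lambda i: i["divTime"])[:limit]
    return {"total": len(in_span), "rows": list(reversed(oldest_first))}

rows = fetch_dividends_forward(fake_page, start_time=1, end_time=10, limit=3)
# rows covers all seven dividends across three calls
```

If the real endpoint turns out to return the newest items instead of the oldest when the limit is exceeded, this forward scheme would skip records, which is exactly why the ordering assumption needs official confirmation.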

Any better insights from all and from @dino @aisling2 are highly welcomed and appreciated.

Thank you so much @jonte for your eagerness to help, I feel that I need to consider your comment in The efficient fetching / pagination of too many orders - #11 by jonte (among others) and go with the most relevant. Thank you again.

You’re most welcome. Good luck and happy coding mate :slight_smile:
