Hi, thanks for organizing this challenge!
I’ve got a quick question regarding the “tod” parameter:
On the Data Challenge website, the “tod” parameter is specified as: “when the trade was executed, given as a number of milliseconds since midnight (local time)”.
However, when looking at the data (cf. picture), the values appear far too large to correspond to any time of day measured since midnight; they look more like an elapsed time of several years from some reference date.
Besides, during the live presentation of the challenge it was (if I’m not mistaken) specified that the days were purposely shuffled so that the problem could not be approached from a time-series perspective.
So my question is: could you please provide a bit more information about these timestamps? Should “since midnight (local time)” be understood as since midnight (local time) of a specific date? But in that case, couldn’t we derive the exact day from the timestamp, which (as far as I understood) the dataset was built to keep inaccessible?
I hope my request is clear enough; any additional insight would be welcome.
Thanks in advance,
Jean-Baptiste
EDIT: of course, this question also applies to the ts_last_update parameters.
My guess is that these are in microseconds, and I am checking this.
As for the dates, they should indeed be shuffled. However, the data within a given day is obviously timestamped, so it is possible to build a dynamic model of what happens across all stocks as a function of the time of day.
PS: the timestamps are indeed in microseconds. This is now fixed in the description. Thank you for pointing out the discrepancy.
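For anyone who wants to sanity-check the units, here is a minimal sketch of the conversion (assuming a pandas DataFrame with an integer ‘tod’ column holding microseconds since midnight; the values below are purely illustrative):

```python
import pandas as pd

# Purely illustrative 'tod' values, in microseconds since midnight (local time).
# A full day is at most 86_400_000_000 microseconds, which is why the raw values
# looked far too large to be milliseconds.
trades = pd.DataFrame({"tod": [34_200_000_000, 34_200_123_456, 61_200_000_000]})

# Convert microseconds since midnight into a timedelta, i.e. a readable clock time.
trades["time_of_day"] = pd.to_timedelta(trades["tod"], unit="us")
print(trades)
#            tod               time_of_day
# 0  34200000000           0 days 09:30:00
# 1  34200123456    0 days 09:30:00.123456
# 2  61200000000           0 days 17:00:00
```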
Ahah, thanks for your edit. I felt quite dumb this morning, as I thought I had simply misread the documentation and mistaken micro for milliseconds. Glad that my post was useful to some extent.
Everything makes more sense now, but I’m still not sure how to take this local-time aspect into account:
When looking at the timestamps, the time lapses between transactions are now quite small [cf. pic, relative time expressed in seconds], and no systematic time difference can be observed between the different venues’ local times (see the sketch at the end of this post for how these relative times are computed).
So is there a single “local time” shared by the whole dataset, or is something trickier at play? One could imagine, for instance, that for each sample the local time of the first or last trade is taken as the reference.
This would be suggested by the sentence:
The time is given by the timestamp of the last trade in the history of trades and will be described below in the section “Trades”
But it is also specified:
Its timestamp (‘tod’): when the trade was executed, given as a number of microseconds since midnight (local time).
so I’m not sure how to interpret both pieces of information.
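For reference, here is roughly how the relative times mentioned above can be computed (a minimal sketch; the column names ‘sample_id’, ‘venue’ and ‘tod’ are assumptions, with ‘tod’ taken as integer microseconds since midnight):

```python
import pandas as pd

# Hypothetical trade history: a few trades per sample, possibly from different venues.
trades = pd.DataFrame({
    "sample_id": [0, 0, 0, 1, 1],
    "venue":     [0, 1, 0, 2, 1],
    "tod":       [34_200_000_000, 34_200_050_000, 34_200_210_000,
                  51_000_000_000, 51_000_400_000],
})

trades = trades.sort_values(["sample_id", "tod"])

# Relative time of each trade within its sample, in seconds, measured from the
# last trade of that sample (the timestamp the description seems to refer to).
last_tod = trades.groupby("sample_id")["tod"].transform("max")
trades["rel_time_s"] = (trades["tod"] - last_tod) / 1e6

# Lapse between consecutive trades of the same sample, in seconds.
trades["lapse_s"] = trades.groupby("sample_id")["tod"].diff() / 1e6

print(trades)
```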
OK, great. I didn’t think of this alternative because, in the description, NYSE and NASDAQ, which are not in the same timezone, were provided as examples.
The second quote would suggest (as I understand it) that time zones should come into play for each timestamp, but if all 6 venues are in the same timezone, then this is actually a non-issue.
(Just for fun, I checked this list: https://www.wikiwand.com/en/List_of_stock_exchanges. Six venues sharing the same timezone strongly suggests these are the European markets, as they are the only six main venues sharing a common timezone, CET/CEST.)
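If one wanted to double-check this from the data rather than from the exchange list, a rough approach is to compare the time-of-day profile of trades per venue (again just a sketch, with ‘venue’ and ‘tod’ as assumed column names):

```python
import pandas as pd

def hour_profile(trades: pd.DataFrame) -> pd.DataFrame:
    """Fraction of trades falling in each hour of the (local) day, per venue."""
    hours = trades["tod"] // 3_600_000_000  # 3.6e9 microseconds per hour
    return pd.crosstab(trades["venue"], hours, normalize="index")

# Hypothetical usage: if all six venues share a timezone, the per-venue profiles
# should cover the same hours (roughly 09:00-17:30 for European venues); a shift
# of one or more whole hours between venues would suggest different local times.
trades = pd.DataFrame({
    "venue": [0, 0, 1, 1, 2],
    "tod":   [32_400_000_000, 50_000_000_000, 33_000_000_000,
              61_000_000_000, 40_000_000_000],
})
print(hour_profile(trades))
```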