What is the return?


I’m trying to understand the data of the challenge, but I still don’t understand what the return is. In the video at 3:05, it is said that it indicates whether or not the price increased during the next 5 minutes.
That could have made sense, but when I plot the data, here is what I get:


  • return = -1 --> red dot
  • return = 1 --> green dot
  • return = 0 --> blue dot

So you can see red dots while the price increases over the next 5 minutes, and green dots while the price decreases over the next 5 minutes. Likewise, if we assume that the return indicates the direction of the price over the last 5 minutes… it doesn’t work either.

So I don’t understand… What is the meaning of the return?

Note: for reproducibility purposes, here is a piece of code that plots the same graph:

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_csv('data/training_input.csv', sep=';')

# First row: columns 3 to 56 are the plotted values, columns 57 to 110 the returns
X = df.iloc[0, 3:57].to_numpy(dtype=float)
classes = df.iloc[0, 57:111].to_numpy()
# One colour per return value: 1 -> green, -1 -> red, 0 -> blue
for value, color in [(1, 'g'), (-1, 'r'), (0, 'b')]:
    mask = classes == value
    plt.scatter(np.where(mask)[0], X[mask], c=color, s=50)
plt.show()
```

Hello Twice22,
The data of the training set you are plotting are volatilities (df.iloc[0, 3:57]), not prices.
I think that the return does indicate whether prices increase or decrease, but it is not directly related to an increase or decrease in volatility. Volatility is a measure of the “spread” of the prices: the size of the price fluctuations is not directly related to the value of the price. For instance, you can have a decrease in volatility but still an increase in prices, and thus a positive return.
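A small numerical illustration of this point, using made-up prices and assuming that volatility is measured as the standard deviation of the log price changes inside a window (the challenge's exact definition is not given in this thread; `window_volatility` is a hypothetical helper):

```python
import numpy as np


def window_volatility(prices):
    """Hypothetical helper: std of log price changes in one window (assumed definition)."""
    log_changes = np.diff(np.log(prices))
    return float(np.std(log_changes))


# Made-up prices: a choppy first window, then a calm but steadily rising one
window_a = np.array([100.0, 102.0, 99.0, 103.0, 98.0, 101.0])
window_b = np.array([101.0, 101.5, 102.0, 102.5, 103.0, 103.5])

vol_a = window_volatility(window_a)
vol_b = window_volatility(window_b)
# Overall price change across the calm window (its sign is what matters for the return)
price_change_b = window_b[-1] - window_b[0]

print(f"volatility: {vol_a:.4f} -> {vol_b:.4f}")
print(f"price change over window B: {price_change_b:+.1f}")
```

Volatility drops sharply from the first window to the second, yet the price rises over the second window, so its return would be positive.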
If you look carefully at the graph in the video, it shows prices, not volatilities.
So in the end, your reasoning is right for prices but not for volatilities.


Ah yes, thank you. My bad, I confused volatilities and prices.

Thank you @AiOpH, this is a good description of the situation.

The key point is that a high volatility during some time interval means large price variations in this interval. This says nothing about the price change.

However small the price change is, it has a sign, which is what is reported in the “return” columns.
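As a sketch of this idea, assume the label is simply the sign of the price change between the start and the end of the interval (the exact convention used to build the dataset is not stated here). Two made-up price paths with exactly the same move sizes, hence the same amplitude of variations, can then carry opposite labels. `return_label` and `abs_move_size` are hypothetical helpers:

```python
import numpy as np


def return_label(prices):
    """Hypothetical helper: sign of the overall price change (-1, 0 or 1)."""
    return int(np.sign(prices[-1] - prices[0]))


def abs_move_size(prices):
    """Mean absolute price move, a crude proxy for the amplitude of variations."""
    return float(np.mean(np.abs(np.diff(prices))))


# Same step sizes, mirrored directions: equal amplitude, opposite price change
up = np.array([100.0, 101.0, 100.0, 101.0, 100.0, 101.0])
down = np.array([101.0, 100.0, 101.0, 100.0, 101.0, 100.0])

for name, path in [("up", up), ("down", down)]:
    print(name, abs_move_size(path), return_label(path))
```

Both paths move by 1 at every step, so the amplitude alone cannot tell them apart; only the sign of the overall change differs.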

@Twice22: df.columns[3:57] will confirm that your curve displays volatilities, and therefore only price variation amplitudes (not returns).
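A quick sanity check along these lines, written as a hypothetical helper `check_columns` that assumes the same column layout as the code in the original post (positions 3 to 56 for the plotted values, 57 to 110 for the returns). It lists both groups of column names and the distinct values found in the return columns:

```python
import numpy as np
import pandas as pd


def check_columns(df):
    """Hypothetical helper: both column groups and the distinct return values."""
    value_cols = df.columns[3:57]
    return_cols = df.columns[57:111]
    # Flatten the return block, drop missing entries, keep the sorted distinct values
    flat = pd.Series(df[return_cols].to_numpy().ravel()).dropna()
    return_values = np.sort(flat.unique())
    return list(value_cols), list(return_cols), return_values
```

On the real file this would be called as `check_columns(pd.read_csv('data/training_input.csv', sep=';'))`; the names in the first group should mention volatilities, and the return values should only be -1, 0 and 1.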

How to Visualize this Data?

You can run the code from the original post in a Jupyter notebook (it is Python code). Just make sure that NumPy and the plotting module are imported, which can be done with

```python
import numpy as np
from matplotlib import pyplot as plt
```
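If it helps to see both groups of columns at once, here is a minimal sketch that draws one row's plotted values on top and its return labels underneath, on a shared x axis. The function name `plot_row` and the two-panel layout are illustrative choices, not part of the challenge material:

```python
import numpy as np
from matplotlib import pyplot as plt


def plot_row(values, returns):
    """Hypothetical helper: values on top, return labels (-1, 0, 1) underneath."""
    x = np.arange(len(values))
    fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True, figsize=(10, 5))
    ax_top.plot(x, values, color="k")
    ax_top.set_ylabel("columns 3 to 56")
    # Same colour code as in the original post
    for value, color in [(1, "g"), (-1, "r"), (0, "b")]:
        mask = returns == value
        ax_bottom.scatter(x[mask], returns[mask], c=color, s=30)
    ax_bottom.set_yticks([-1, 0, 1])
    ax_bottom.set_ylabel("return")
    ax_bottom.set_xlabel("position in the row")
    return fig
```

Usage, assuming `df` was loaded as in the original post: `plot_row(df.iloc[0, 3:57].to_numpy(dtype=float), df.iloc[0, 57:111].to_numpy())` followed by `plt.show()`.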

Thank you for your reply…
I have used a LightGBM classification model and the metric is about 0.5204. Can you suggest any other models for greater accuracy?

I must say that this is precisely one of the questions implicitly asked of participants, so I will let other participants contribute answers if they want to!

Can you tell me if any preprocessing is required on the stock data? If so, which columns should be preprocessed?

This is yet another question that is asked of data scientists taking part in a data challenge. There are plenty of resources on the web about making predictions from data, in particular with machine learning: the web is the best place to look for this information.

Please use this forum only for questions that cannot be answered by searching the web. Thanks!