This property does not have to hold.
One reason is that the baseline (“benchmark”) can be naive: no model is perfect, and the baseline can obviously be improved on, so calculating the mean the way the benchmark does doesn’t have to make full sense.
Another reason is that the property you are guessing can only be approximate, at best. It makes sense for the volatility on infinitesimally small time slices to be zero (prices do not have time to move), so the volatility on a larger time slice cannot be calculated by averaging the volatilities on smaller slices (which are all zero, in this extreme case). So in general the property doesn’t hold.
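To make the limiting argument concrete, here is a minimal sketch. It assumes that "volatility" means realized volatility, i.e. the square root of the sum of squared log returns over the slice. That definition, the function names, and the synthetic returns are all assumptions for illustration, not something taken from the benchmark. It cuts one window into more and more slices and compares the mean of the per-slice volatilities with the volatility of the whole window.

```python
import math
import random


def realized_volatility(log_returns):
    """Square root of the sum of squared log returns (assumed definition)."""
    return math.sqrt(sum(r * r for r in log_returns))


def mean_slice_volatility(log_returns, n_slices):
    """Mean of the realized volatilities of n_slices equal-length consecutive slices.

    Assumes len(log_returns) is divisible by n_slices.
    """
    size = len(log_returns) // n_slices
    slices = [log_returns[i * size:(i + 1) * size] for i in range(n_slices)]
    return sum(realized_volatility(s) for s in slices) / n_slices


random.seed(0)
# Synthetic log returns, purely illustrative (not real market data).
returns = [random.gauss(0.0, 0.001) for _ in range(600)]

full = realized_volatility(returns)
for n in (1, 2, 10, 100, 600):
    # The finer the slicing, the smaller the mean per-slice volatility,
    # while the volatility of the whole window stays the same.
    print(n, mean_slice_volatility(returns, n), full)
```

Under this definition, the mean of the per-slice volatilities equals the whole-window volatility only when there is a single slice, and it keeps shrinking as the slices get finer. That is the same effect as in the extreme case described above.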
The baseline shows that averaging volatilities this way makes some sense (since predicting them this way gives decent performance), but again, doing so doesn’t have to (and cannot) make perfect sense.