Thanks for the feedback. I understand that the high R^2 scores reported in Pasion et al. were based on the overall dataset, whereas the cross-validated scores are between 0.501 and 0.687.

Also, note that because the overall data includes the data on which the model was trained, an overfitted model may score very highly on those data points, masking the model's true performance.
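To make the masking effect concrete, here is a minimal sketch on synthetic data (not the Pasion et al. dataset). It assumes a deliberately overfitting model, a 1-nearest-neighbour regressor written by hand, and compares R^2 on the data it was fitted to with a simple 5-fold cross-validated R^2. All names and numbers are illustrative assumptions.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def predict_1nn(train_x, train_y, x):
    """1-nearest-neighbour prediction: return the target of the closest
    training point. The model simply memorises its training data."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


# Synthetic data (assumption): a noisy linear relationship.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 5) for x in xs]

# Score on the overall data, which is also the training data.
train_r2 = r2_score(ys, [predict_1nn(xs, ys, x) for x in xs])

# 5-fold cross-validation: each point is predicted by a model
# that never saw it during fitting.
k = 5
fold_scores = []
for fold in range(k):
    test_idx = [i for i in range(len(xs)) if i % k == fold]
    train_idx = [i for i in range(len(xs)) if i % k != fold]
    tx = [xs[i] for i in train_idx]
    ty = [ys[i] for i in train_idx]
    preds = [predict_1nn(tx, ty, xs[i]) for i in test_idx]
    fold_scores.append(r2_score([ys[i] for i in test_idx], preds))
cv_r2 = sum(fold_scores) / k

print(f"R^2 on training data: {train_r2:.3f}")
print(f"5-fold CV R^2:        {cv_r2:.3f}")
```

Because the memorising model reproduces every training target exactly, its score on the overall data looks perfect, while the cross-validated score reflects how it behaves on points it has not seen.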

Whether to train an individual model for each location is an open question: experiments are needed to confirm whether the location feature "fully" reflects the intricacies of each location.

However, my intuition tells me that the location feature will not capture all inherent characteristics of a location but may capture enough for a reasonably accurate model.
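One way such an experiment could be set up is sketched below. This is only an illustration on synthetic data, assuming a linear model and two hypothetical locations "A" and "B" whose input-output slopes differ by construction. The pooled model treats location as a categorical feature (a per-location intercept with a shared slope), while the alternative fits a separate line per location. Real data and non-linear models may behave quite differently.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic data (assumption): each location has its own slope.
rng = random.Random(1)
true_slope = {"A": 1.0, "B": 3.0}  # hypothetical locations
rows = []
for loc, slope in true_slope.items():
    for _ in range(100):
        x = rng.uniform(0, 10)
        rows.append((loc, x, slope * x + rng.gauss(0, 1)))

# Simple hold-out split: even rows for training, odd rows for testing.
train, test = rows[0::2], rows[1::2]

# Option 1: one model per location (own intercept and own slope).
per_loc = {}
for loc in true_slope:
    xs = [x for l, x, _ in train if l == loc]
    ys = [y for l, _, y in train if l == loc]
    per_loc[loc] = fit_line(xs, ys)

# Option 2: one pooled model with location as a categorical feature,
# i.e. a per-location intercept but a single shared slope.
means = {}
for loc in true_slope:
    xs = [x for l, x, _ in train if l == loc]
    ys = [y for l, _, y in train if l == loc]
    means[loc] = (sum(xs) / len(xs), sum(ys) / len(ys))
num = sum((x - means[l][0]) * (y - means[l][1]) for l, x, y in train)
den = sum((x - means[l][0]) ** 2 for l, x, _ in train)
shared_slope = num / den
pooled = {loc: (my - shared_slope * mx, shared_slope)
          for loc, (mx, my) in means.items()}


def score(models):
    """R^2 on the held-out rows for a dict of {location: (a, b)}."""
    y_true = [y for _, _, y in test]
    y_pred = [models[l][0] + models[l][1] * x for l, x, _ in test]
    return r2_score(y_true, y_pred)


pooled_r2 = score(pooled)
per_location_r2 = score(per_loc)
print(f"Pooled model with location feature: {pooled_r2:.3f}")
print(f"Separate model per location:        {per_location_r2:.3f}")
```

In this constructed case the location feature captures only part of what differs between sites, so the pooled model is reasonably accurate but the per-location models do better on held-out data. That outcome is built into the synthetic data; only the same comparison on the real dataset would settle the question.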

Abiodun Olaoye

Written by Abiodun Olaoye

Mechanical and computational engineer (PhD @ MIT), Data scientist | Renewable energy systems expert.
