Forecasting

When is the R² Not the R²?

June 26, 2019

A recently received support email contains a riddle.

“When I build a regression model in Excel and remove the intercept, I get a different R² than when I do the same thing in MetrixND. Why does this happen?”

To be clear, the estimated regression model for both Excel and MetrixND excludes the intercept term as shown below.

So, why does Excel and MetrixND produce different R² values?

MetrixND Regressions and R² Calculation

In MetrixND, R² is calculated as shown below.

This is the standard definition for R² and is constructed as one minus the ratio of the sum of squared errors (SSE) divided by the total sum of squares (TSS).

The mathematical equivalent can be written as the model sum of squares (MSS) divided by the total sum of squares (TSS) as shown below. This is becomes important at the end (i.e. spoiler alert!).

 

Since I’m paranoid, I double check the equations by building a regression model and manually validating the MetrixND equations in Excel. Check. The manual equations confirm that MetrixND is calculating the R² according to the standard definition.

Excel Regressions and R² Calculation

If MetrixND is correct, then what’s wrong with Excel?

First, I need to run the Excel regression module. If you’ve never done it, you can do it with the following steps.

Step 1: Select Data, Data Analysis and pick the Regression option. This opens the Regression dialogue box.

Step 2: Fill out the dialogue box by identifying the X and Y data ranges and select your options. For my test, I select the check box for “Constant is Zero” to remove the intercept from the regression equation, highlighted in red:

Step 3: Select “OK” and I can see the answer.

Following this process, I compare the Excel and MetrixND R2 values, and indeed, I get different numbers. So what’s wrong?

I wade through the morass of internet posts and find a Microsoft confession: “… if you click the Set Intercept = 0 box, the R-squared value is always incorrect in any version of Excel…”

There it is. Microsoft says they are incorrect.

What is Excel Doing?

After a bit more digging, I am happy to confirm that Excel calculates R² using the following formula only when the intercept is removed (please note that Excel uses the standard definition which matches the MetrixND results when the intercept is included in the regression model).

 

When you consider this calculation, you can see that this version of R² is much like the mathematical equivalent version (MSS/TSS) I showed above. Here it is again for your reference:

 

If y ̅ = 0, then the equations are the same. Thus, when the intercept is removed, Excel changes the formula and re-centers the R² calculation around 0 not y ̅. In Microsoft’s words, this is “incorrect”.

When is the R² Not the R²?

When the R² formula is incorrect. If you find the word “incorrect” overly harsh, perhaps the word “different” is better. After all, it’s just a formula.

Want to learn more? Attend an Itron Forecasting Workshop!

By Mark Quan


Principal Forecast Consultant


Mark Quan est consultant principal en prévisions au sein de la division des prévisions d'Itron. Depuis qu'il a rejoint Itron en 1997, M. Quan s'est spécialisé dans les solutions de prévision énergétique à court et à long terme, ainsi que dans les projets de recherche sur la charge. Quan a développé et mis en œuvre plusieurs systèmes de prévision automatisés pour prédire la demande système du lendemain, les profils de charge et la consommation au détail pour des entreprises aux États-Unis et au Canada. Les solutions de prévision à court terme comprennent des systèmes pour le « Midwest Independent System Operator » (MISO) et le « California Independent System Operator » (CAISO). Les solutions de prévision à long terme comprennent le développement et le soutien des prévisions à long terme (ventes et clients) pour des clients tels que « Dairyland Power » et « Omaha Public Power District ». Ces prévisions comprennent des informations sur l'utilisation finale et les impacts de la gestion de la demande dans un cadre économétrique. Enfin, Quan a participé à la mise en œuvre de systèmes de recherche de charge, notamment chez Snohomish PUD. Avant de rejoindre Itron, Quan a travaillé dans les secteurs du gaz, de l'électricité et de l'entreprise chez Pacific Gas and Electric Company (PG&E), où il a participé à la restructuration du secteur, à la planification de l'électricité et à la planification du gaz naturel. M. Quan est titulaire d'un master en recherche opérationnelle de l'université de Stanford et d'une licence en mathématiques appliquées de l'université de Californie à Los Angeles.


Region Selector Select a region and country for the best experience.