Understanding Diagnostic Metrics for Regressors

In this blog post, we’ll explore some key metrics for evaluating regressors, including linear correlation, Spearman’s rho, mean absolute error (MAE), root mean squared error (RMSE), and information criteria like BIC and AIC. These metrics are vital in understanding the performance of regression models, especially in education and other data-driven fields.


Linear Correlation: How Do Variables Relate?

Linear correlation (Pearson's r) measures the strength of the linear relationship between two variables, A and B. Specifically, when A changes, does B tend to change in the same direction? It assumes the relationship between the variables is linear.

  • Correlation = 1: Perfect positive linear relationship.
  • Correlation = 0: No linear relationship.
  • Correlation = -1: Perfect negative linear relationship.

The interpretation of correlation depends on the field. For example, in physics, a correlation of 0.80 may be considered poor, while in education, a correlation of 0.30 can be exciting. This difference arises because educational outcomes are often influenced by many factors, making smaller correlations acceptable.

Examples:
A correlation of 1 appears as a perfect line, while a correlation of 0 appears as a random cloud of points. Something like r = 0.4 may show a visible pattern, but it is far from a perfect line. These visual distinctions emphasize that correlation measures the strength of a linear relationship (how well a line fits the data) but does not tell the whole story: there might be a trend just starting to form, a single outlier, or a few outliers driving the apparent slope. In such cases, a plot provides valuable insight and helps identify these patterns more clearly.
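To make this concrete, here is a minimal sketch of computing Pearson's r in Python, assuming NumPy and SciPy are available; the hours/score data below is invented purely for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical data: hours studied (A) and exam score (B)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
score = np.array([52, 55, 61, 58, 67, 71, 70, 78])

# Pearson's r measures the strength of the *linear* relationship
r, p_value = stats.pearsonr(hours, score)
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")

# Plotting the data (e.g., matplotlib's plt.scatter(hours, score)) is often
# as informative as the number itself, because r alone cannot reveal
# outliers or a non-linear trend.
```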

R-squared ([math]R^2[/math])

For a model with a single predictor, R-squared is simply the square of the correlation coefficient ([math]r[/math]). More generally, it quantifies the proportion of the variance in the dependent variable that is explained by the independent variables in a regression model.

R-squared is often preferred over [math]r[/math] in regression models involving multiple predictors (e.g., when predicting A using variables B, C, D, and E) because it provides a clearer indication of how well the model explains the variability in the outcome. The use of [math]R^2[/math] as a goodness-of-fit measure can vary depending on the field or community.
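As a sketch of how [math]R^2[/math] is typically obtained for a multi-predictor model, assuming scikit-learn and invented predictors standing in for B, C, and D:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Hypothetical predictors (B, C, D) and outcome (A)
X = rng.normal(size=(100, 3))
A = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(X, A)
A_pred = model.predict(X)

# Proportion of variance in A explained by the predictors
print(f"R^2 = {r2_score(A, A_pred):.2f}")
```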

Spearman’s Rho (ρ)

Spearman’s rho is a rank-based correlation, making it more robust to outliers and suited to relationships that are monotonic but not necessarily linear. Instead of working on the raw values, it ranks the data and computes a correlation on those ranks. Like Pearson’s correlation, Spearman’s rho ranges from -1 to 1 and is interpreted in the same manner:

  • 1 indicates a perfect positive correlation,
  • 0 indicates no correlation, and
  • -1 represents a perfect negative correlation.

In short, because it operates on ranks, Spearman’s correlation is more robust to outliers and measures how monotonic a relationship is, not how linear it is.

[A monotonic relationship refers to a relationship between two variables where the variables move in a single direction but not necessarily at a constant rate. In a monotonic increasing relationship, as one variable increases, the other variable either always increases or remains constant. Similarly, in a monotonic decreasing relationship, as one variable increases, the other variable either always decreases or remains constant.]
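A small sketch contrasting Pearson's r with Spearman's rho on a monotonic but non-linear relationship (SciPy assumed, data invented for illustration):

```python
import numpy as np
from scipy import stats

x = np.arange(1, 21)
y = x ** 3            # monotonic increasing, but clearly non-linear

r, _ = stats.pearsonr(x, y)
rho, _ = stats.spearmanr(x, y)

# Spearman works on ranks, so a perfectly monotonic relationship gives
# rho = 1.0 even though the relationship is not a straight line.
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```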


Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE)

To evaluate a model’s performance, we often use error metrics like MAE and RMSE.

Mean Absolute Error (MAE):

The MAE measures the average magnitude of errors in a set of predictions, without considering their direction. It is the mean of the absolute differences between actual and predicted values:

[math] \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} | y_i - \hat{y}_i | [/math]

Where:

  • [math] y_i [/math] is the actual value for the i-th observation.
  • [math] \hat{y}_i [/math] is the predicted value for the i-th observation.
  • [math] n [/math] is the number of data points.

MAE tells you the average amount by which the predictions deviate from the actual values. It is intuitive and easy to understand; however, it treats all errors equally, regardless of their size.
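A minimal sketch of the MAE calculation, computed both by hand with NumPy and with scikit-learn's mean_absolute_error; the four values are invented for illustration:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Mean of the absolute differences between actual and predicted values
mae_manual = np.mean(np.abs(y_true - y_pred))
mae_sklearn = mean_absolute_error(y_true, y_pred)

print(mae_manual, mae_sklearn)  # both 0.75
```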

Root Mean Squared Error (RMSE):

RMSE is similar to MAE, but it squares the errors before averaging and takes the square root at the end:

[math]\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}[/math]

RMSE penalizes large deviations more than small ones, and it is often preferred over MAE for this reason.
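Continuing the same invented example for RMSE (self-contained below); note how the larger errors pull RMSE above the MAE of 0.75:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Square the errors, average them, then take the square root
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse)  # ~0.94, above the MAE of 0.75 because the larger
             # errors (1.5 and 1.0) are penalized more heavily
```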

Note:

  • Low RMSE/MAE, high correlation => good model.
  • High RMSE/MAE, low correlation => bad model.
  • High RMSE/MAE, high correlation => the model moves in the right direction but is systematically biased (see the sketch below).
  • Low RMSE/MAE, low correlation => the model's values are in the right range, but it doesn't capture relative change (particularly common when there is not much variation in the data).
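To illustrate the "high RMSE/MAE, high correlation" case, the toy sketch below shifts every prediction upward by a constant: the predictions track the actual values almost perfectly, yet each one is off by about 15 (all values invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y_true = rng.normal(loc=50, scale=10, size=200)

# Predictions follow the actual values closely but are shifted up by 15:
# the model "goes in the right direction" yet is systematically biased.
y_pred = y_true + 15 + rng.normal(scale=1.0, size=200)

r, _ = stats.pearsonr(y_true, y_pred)
mae = np.mean(np.abs(y_true - y_pred))

print(f"correlation = {r:.2f}, MAE = {mae:.1f}")  # high r, MAE around 15
```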

Information Criteria: BIC and AIC

When comparing models, information criteria such as BIC (Bayesian Information Criterion) and AIC (Akaike’s Information Criterion) help balance goodness of fit against model complexity.

Bayesian Information Criterion (BIC):

The Bayesian Information Criterion (BIC) is a method used to compare statistical models. It helps balance fit and complexity by adding a penalty for models with more parameters. A lower BIC value indicates a better model, as it fits the data well without being overly complex.

  • BIC > 0:
    The model fits worse than expected, given the number of parameters. A higher BIC suggests the model is more complex without offering a better fit.
  • BIC < 0:
    The model fits better than expected, given the number of parameters. A lower BIC indicates a good balance between simplicity and fit, making the model preferable.

BIC Prime (BIC’) is an extension of the Bayesian Information Criterion (BIC), often used in specific model comparisons. It’s interpreted similarly to BIC but with an adjusted approach to model fit and complexity. Here’s how to interpret it:

Interpreting BIC Prime:
  • BIC’ > 0:
    The model is worse than expected, given the number of parameters and the complexity. A positive BIC’ suggests that the model is overfitting or unnecessarily complex relative to the fit it provides.
  • BIC’ < 0:
    The model is better than expected, indicating a good fit for the given complexity. A negative BIC’ means the model strikes a good balance between simplicity and accuracy, making it preferable.

When comparing models: A lower (or more negative) BIC’ indicates a better model. If one model has a BIC’ significantly lower than another, it’s considered statistically superior in terms of balancing model complexity and goodness of fit.
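In practice, BIC is usually read off a fitted model rather than computed by hand; in its standard form, [math]\text{BIC} = k \ln n - 2 \ln \hat{L}[/math], where [math]k[/math] is the number of parameters, [math]n[/math] the number of observations, and [math]\hat{L}[/math] the maximized likelihood. Below is a sketch using statsmodels, with invented data and a deliberately uninformative extra predictor:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)            # irrelevant predictor
y = 3.0 * x1 + rng.normal(size=n)

# Simple model: intercept + x1
simple = sm.OLS(y, sm.add_constant(x1)).fit()
# Complex model: intercept + x1 + the uninformative x2
complex_ = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# Lower BIC is better; the extra parameter in the complex model is
# penalized unless it buys a real improvement in fit.
print(f"BIC simple  = {simple.bic:.1f}")
print(f"BIC complex = {complex_.bic:.1f}")
```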

 

Akaike’s Information Criterion (AIC):

Akaike’s Information Criterion (AIC) is a measure used to compare statistical models, focusing on the trade-off between model fit and complexity. Like BIC, AIC penalizes models with more parameters to prevent overfitting, but it does so less severely, so it strikes a slightly different balance between complexity and goodness of fit. AIC helps select the model that best explains the data while avoiding overly complex models.

How to Interpret AIC
  • AIC values: Lower AIC values indicate a better model. When comparing models, the model with the lowest AIC is preferred because it provides the best balance between accuracy and simplicity.
  • AIC difference ([math]\Delta \text{AIC}[/math]): The difference in AIC values between two models provides insight into how much better one model is compared to another. The following guidelines can help interpret this difference:
    • ΔAIC < 2: The models are equally good.
    • ΔAIC 2 – 4: There is some evidence favoring the model with the lower AIC.
    • ΔAIC 4 – 7: There is clear evidence favoring the model with the lower AIC.
    • ΔAIC > 10: There is strong evidence favoring the model with the lower AIC.
Key Points:
  • Lower AIC: Better model, balancing fit and simplicity.
  • AIC difference: Helps determine the degree of support for the better model.
  • Multiple models: AIC allows you to rank models from best to worst, where the lowest AIC is preferred.
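AIC is available from the same kind of fitted model ([math]\text{AIC} = 2k - 2 \ln \hat{L}[/math] in its standard form), and the ΔAIC guidelines above apply directly when ranking several candidates. A sketch, again assuming statsmodels and invented data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)        # genuinely predictive
x3 = rng.normal(size=n)        # pure noise
y = 3.0 * x1 + 1.5 * x2 + rng.normal(size=n)

candidates = {
    "x1 only":      sm.add_constant(x1),
    "x1 + x2":      sm.add_constant(np.column_stack([x1, x2])),
    "x1 + x2 + x3": sm.add_constant(np.column_stack([x1, x2, x3])),
}

aics = {name: sm.OLS(y, X).fit().aic for name, X in candidates.items()}
best = min(aics.values())

# Rank from best (lowest AIC) to worst, reporting ΔAIC relative to the best
for name, aic in sorted(aics.items(), key=lambda kv: kv[1]):
    print(f"{name:14s} AIC = {aic:7.1f}   ΔAIC = {aic - best:5.1f}")
```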

Evaluating models involves a variety of metrics. While correlation and Spearman’s rho help understand relationships between variables, error metrics like MAE and RMSE assess the accuracy of predictions. BIC and AIC offer additional insights, balancing fit and complexity. No single metric tells the whole story; understanding a model’s performance requires a combination of these tools.

Reference:

Baker, R.S. (2024) Big Data and Education. 8th Edition. Philadelphia, PA: University of Pennsylvania.