Error Metrics
Regression Metrics
- Mean Absolute Error (MAE)
- measures the absolute difference between the predicted and actual values.
- Be careful with MAE: unlike MSE, it does not penalize large errors disproportionately. But if your loss (Cost Functions) is simply proportional to the size of the error, MAE may be the better choice.
- Mean Squared Error (MSE)
- measures the squared difference between the predicted and actual values.
- Root Mean Squared Error (RMSE)
- the square root of the MSE, which provides a measure of the average magnitude of the error.
- R-squared (R^2):
- measures the proportion of variance in the target variable that is explained by the model (a computation sketch for these metrics follows this list)
- Regression#R-Squared coefficient of determination
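A minimal sketch of computing these four metrics with scikit-learn; the arrays below are made-up values, purely for illustration:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up ground-truth and predicted values, purely for illustration
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)   # mean |y_true - y_pred|
mse = mean_squared_error(y_true, y_pred)    # mean (y_true - y_pred)^2
rmse = np.sqrt(mse)                         # back in the units of the target
r2 = r2_score(y_true, y_pred)               # 1 - SS_residual / SS_total

print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f} R^2={r2:.3f}")
```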
Classification Metrics
- Accuracy
- measures the proportion of correct predictions out of total predictions.
- Precision
- measures the proportion of true positives (correctly predicted positives) out of all predicted positives.
- Recall (Sensitivity)
- measures the proportion of true positives out of all actual positives.
- F1 score
- the harmonic mean of precision and recall, which balances both measures.
- ROC curve
- Confusion Matrix#Receiver Operating Characteristic (ROC)
- ROC characteristics: the curve itself only applies to binary classification
- ROC AUC: the area under the ROC curve; it extends to multiclass problems (e.g. via one-vs-rest averaging; see the sketch after this list)
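A minimal sketch of these classification metrics with scikit-learn on a toy binary problem (labels and probabilities are made up). Note that ROC AUC is computed from scores or probabilities rather than hard labels, and `roc_auc_score` also supports multiclass via its `multi_class` argument:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Made-up binary labels and predicted probabilities, purely for illustration
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_prob = [0.1, 0.4, 0.8, 0.3, 0.9, 0.2, 0.7, 0.6]
y_pred = [int(p >= 0.5) for p in y_prob]  # hard labels at a 0.5 threshold

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1       :", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
print("ROC AUC  :", roc_auc_score(y_true, y_prob))    # needs scores, not hard labels
```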
Time Series Metrics
- MAE
- MSE
- Mean Absolute Percentage Error (MAPE): measures the percentage difference between the predicted and actual values.
- Symmetric Mean Absolute Percentage Error (SMAPE): like MAPE, but normalizes each error by the average of the absolute actual and predicted values, so over- and under-forecasts are treated symmetrically.
- Mean Absolute Scaled Error (MASE): measures the relative accuracy of forecasts compared to a naive forecast based on the historical data (see the sketch after this list).
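scikit-learn ships `mean_absolute_percentage_error` but not SMAPE or MASE, so here is a rough NumPy sketch using the common definitions of all three; the arrays are made up:

```python
import numpy as np

def mape(y_true, y_pred):
    # Mean Absolute Percentage Error (undefined if y_true contains zeros)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def smape(y_true, y_pred):
    # Symmetric MAPE: error normalized by the mean of |actual| and |predicted|
    return np.mean(np.abs(y_pred - y_true) / ((np.abs(y_true) + np.abs(y_pred)) / 2)) * 100

def mase(y_true, y_pred, y_train):
    # MASE: forecast MAE divided by the in-sample MAE of a naive lag-1 forecast;
    # values below 1 mean the forecast beats the naive baseline
    naive_mae = np.mean(np.abs(np.diff(y_train)))
    return np.mean(np.abs(y_true - y_pred)) / naive_mae

# Made-up history and forecasts, purely for illustration
y_train = np.array([10.0, 12.0, 11.0, 13.0, 14.0])
y_true = np.array([15.0, 16.0, 14.0])
y_pred = np.array([14.0, 17.0, 15.0])
print(mape(y_true, y_pred), smape(y_true, y_pred), mase(y_true, y_pred, y_train))
```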
Clustering Metrics
General evaluation criterion: a good clustering should have small within-cluster variance and large between-cluster variance.
Extrinsic Measures: require ground-truth labels
- Rand Index
- measures the similarity between the cluster assignments by making pair-wise comparisons
- Rand Index = (# of pairs on which the two assignments agree) / (# of all possible pairs)
- a higher score signifies higher similarity
- Adjusted Rand Index: corrects for chance by subtracting the expected Rand Index of random assignments; it ranges from -1 to 1, and random (uniform) label assignments score close to 0
- Mutual Information
- measures the agreement between the cluster assignments
- V-measure
- measures the correctness (homogeneity and completeness) of the cluster assignments using conditional entropy analysis
- range from 0 to 1
- Fowlkes-Mallows Score
- measures the correctness of the cluster assignments using pairwise precision and recall (see the sketch after this list)
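A minimal sketch of these extrinsic measures with scikit-learn, using made-up ground-truth and predicted cluster labels (`rand_score` requires scikit-learn >= 0.24; the adjusted variant of mutual information is used here):

```python
from sklearn.metrics import (rand_score, adjusted_rand_score,
                             adjusted_mutual_info_score, v_measure_score,
                             fowlkes_mallows_score)

# Made-up ground-truth classes vs. predicted cluster labels
labels_true = [0, 0, 0, 1, 1, 1, 2, 2]
labels_pred = [0, 0, 1, 1, 1, 1, 2, 0]

print("Rand Index          :", rand_score(labels_true, labels_pred))
print("Adjusted Rand Index :", adjusted_rand_score(labels_true, labels_pred))
print("Adjusted Mutual Info:", adjusted_mutual_info_score(labels_true, labels_pred))
print("V-measure           :", v_measure_score(labels_true, labels_pred))
print("Fowlkes-Mallows     :", fowlkes_mallows_score(labels_true, labels_pred))
```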
Intrinsic Measures: do not require ground-truth labels
- Silhouette score/coefficient
- measures the between-cluster distance against within-cluster distance
- range from -1 to 1:
- A score of 1 is best: the data point is tightly grouped with the other points in its own cluster and far away from the other clusters
- Values near 0 denote overlapping clusters
- mean silhouette over 0.6 is considered a "good" clustering solution
- tends to be higher for convex clusters
- Calinski-Harabasz Index / Variance Ratio Criterion
- measures the between-cluster dispersion against within-cluster dispersion
- the raw value is unbounded and hard to interpret on its own; it is mainly useful for comparing different clusterings of the same data
- Davies-Bouldin Index
- measures the size of clusters against the average distance between clusters
- a lower score signifies better-defined clusters
- tends to be higher for density-based clusters, so it is less suited to evaluating density-based clustering methods (a sketch of these three measures follows this list)
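A minimal sketch of the three intrinsic measures above with scikit-learn; they need only the data and the cluster labels, here produced on synthetic blobs purely for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

# Synthetic, well-separated blobs, purely for illustration
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

print("Silhouette        :", silhouette_score(X, labels))         # in [-1, 1], higher is better
print("Calinski-Harabasz :", calinski_harabasz_score(X, labels))  # unbounded, higher is better
print("Davies-Bouldin    :", davies_bouldin_score(X, labels))     # >= 0, lower is better
```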
Prediction strength
- How it works:
- Split the dataset into training ($X_{tr}$) and test ($X_{te}$) sets
- Run the clustering algorithm on both sets using a certain value of $k$ (the number of clusters)
- Create a co-membership matrix $D[C(X_{tr}, k), X_{te}]$ of size $n_{test} \times n_{test}$, where $n_{test}$ is the number of observations in the test set and $C(X_{tr}, k)$ is the clustering algorithm (e.g. k-means) fitted to the training set
- Set the $ii'$-th element of the co-membership matrix to 1 if elements $i$ and $i'$ of the test set fall into the same cluster under $C(X_{tr}, k)$, and set it to 0 otherwise
- Intuitively, this method measures how well the training set centroids predict co-memberships in the test set. For each pair of test observations that are assigned to the same test cluster (a value of 1 in the co-membership matrix), we determine whether they are also assigned to the same cluster based on the training set centroids.
- The prediction strength of the clustering $C(\cdot, k)$ is formally defined as:$$ps(k) = \min_{1 \leq j \leq k} \frac{1}{n_{kj}(n_{kj} - 1)} \sum_{i \neq i' \in A_{kj}} D[C(X_{tr}, k), X_{te}]_{ii'}$$where $A_{kj}$ is the set of test observations in the $j$-th test cluster and $n_{kj}$ is its size.
- For each of the test clusters, we calculate the proportion of observation pairs in that cluster that are also assigned to the same cluster using the training set centroids. The prediction strength is the minimum of this quantity over the k test clusters.
- 0.8–0.9 is a good value for the threshold (choose the largest $k$ whose prediction strength stays above it)
- how to implement: see this article, or the sketch below
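Since the linked article is not reproduced here, below is a rough sketch of the prediction-strength computation described above, assuming k-means as the clustering algorithm; the function name and the 50/50 train/test split are my own choices:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

def prediction_strength(X, k, test_size=0.5, random_state=0):
    """Prediction strength of clustering X into k clusters, per the procedure above."""
    X_tr, X_te = train_test_split(X, test_size=test_size, random_state=random_state)

    km_tr = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X_tr)
    km_te = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X_te)

    test_labels = km_te.labels_          # clusters found on the test set
    train_assign = km_tr.predict(X_te)   # test points assigned to the *training* centroids

    # Co-membership matrix D: D[i, i'] = 1 if test points i and i' land in the
    # same cluster according to the training centroids
    D = (train_assign[:, None] == train_assign[None, :]).astype(int)

    strengths = []
    for j in range(k):
        idx = np.where(test_labels == j)[0]   # members of test cluster j
        n_kj = len(idx)
        if n_kj < 2:
            continue
        # Proportion of pairs (i != i') in test cluster j that are also
        # co-members under the training centroids (subtract the diagonal of ones)
        pair_sum = D[np.ix_(idx, idx)].sum() - n_kj
        strengths.append(pair_sum / (n_kj * (n_kj - 1)))

    return min(strengths) if strengths else 0.0
```

To pick the number of clusters, compute ps(k) over a range of k values and, following the 0.8–0.9 threshold mentioned above, choose the largest k whose prediction strength still exceeds it.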