Regression
Linear Regression
Prerequisites
- linearity
- homoscedasticity
- multivariate normality
- independence of errors
- lack of multicollinearity
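The last prerequisite can be probed numerically. One common diagnostic, not named in these notes and added here only as an illustration, is the variance inflation factor (VIF): regress each predictor on the others and compute 1 / (1 - R²). The sketch below assumes NumPy is available; the helper name `vif` and the synthetic data are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n samples x p predictors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all the other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = np.sum(residuals ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic (made-up) predictors: a and b are independent, c nearly copies a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)

vif_independent = vif(np.column_stack([a, b]))
vif_collinear = vif(np.column_stack([a, b, c]))
```

Values close to 1 suggest little collinearity. The near-duplicate column `c` should push the VIF of both `a` and `c` far above that.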
Principal component regression
- apply PCA (Dimensionality Reduction#Feature Selection) to generate principal components from the predictor variables, with the number of principal components matching the number of original features p
- keep only the first k principal components (k < p), which together explain most of the variance; choose k by cross-validation
- fit a linear regression model on these k principal components
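The three steps above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `pcr_fit` and the synthetic data are hypothetical, k is fixed by hand rather than chosen by cross-validation, and the predictors are only centered (in practice they are often standardized as well).

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression with k components.

    Returns (intercept, beta) expressed in the original feature space.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                       # PCA works on centered predictors

    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                        # p x k loading matrix
    Z = Xc @ V_k                          # n x k component scores

    # Ordinary least squares of the centered response on the k scores.
    gamma, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)

    beta = V_k @ gamma                    # map back to the original features
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Synthetic (made-up) data with p = 4 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 3.0 + 0.1 * rng.normal(size=100)

intercept_k2, beta_k2 = pcr_fit(X, y, k=2)      # reduced model, k < p
intercept_full, beta_full = pcr_fit(X, y, k=4)  # k = p keeps every component
```

With k = p nothing is discarded, so the fit coincides with ordinary least squares on the original predictors. The regularising effect of PCR comes entirely from choosing k < p.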
Partial least squares (PLS) regression
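- useful when there are multiple dependent variables, which may be correlated with each other
- similar to Regression#Principal component regression, but the dimensionality reduction and the regression happen in one go: the components are extracted using both the independent and the dependent variables, so they are chosen for how well they relate to the response and not only for the variance they explain in the predictors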
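The difference from principal component regression can be seen on the first component alone. The sketch below is an illustration under assumptions, not a full PLS algorithm: it uses the standard result that the first PLS weight vector is the leading left singular vector of the cross-product matrix XᵀY (both centered), and compares it with the first PCA direction of X. NumPy is assumed, and the synthetic data and the helper `cov_strength` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Column 0 carries most of the variance in X but is unrelated to Y.
X = rng.normal(size=(n, 3)) * np.array([3.0, 1.0, 1.0])
# Two correlated responses, both driven by the low-variance column 1.
signal = X[:, 1]
Y = np.column_stack([signal, 0.8 * signal]) + 0.1 * rng.normal(size=(n, 2))

Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# First PCA direction: leading right singular vector of Xc (ignores Y).
pca_dir = np.linalg.svd(Xc, full_matrices=False)[2][0]

# First PLS weight vector: leading left singular vector of Xc^T Yc (uses Y).
pls_dir = np.linalg.svd(Xc.T @ Yc, full_matrices=False)[0][:, 0]

def cov_strength(w):
    """Norm of the sample cross-covariance between the score Xc @ w and Y."""
    return np.linalg.norm(Yc.T @ (Xc @ w)) / (n - 1)

pca_strength = cov_strength(pca_dir)
pls_strength = cov_strength(pls_dir)
```

In this setup the PCA direction follows the high-variance column, while the PLS direction follows the column that actually drives the responses, so `pls_strength` exceeds `pca_strength`.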
Logistic Regression
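A logistic regression model gives the probability $\pi(x) = P(y = 1 \mid x)$ of the outcome as:
$$\pi(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)}} \quad\Longleftrightarrow\quad \log\frac{\pi(x)}{1 - \pi(x)} = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$
The odds ratio is the ratio of two groups' odds of some outcome, where $\pi_A$ and $\pi_B$ are the probabilities of the outcome in groups A and B:
$$OR = \frac{\pi_A / (1 - \pi_A)}{\pi_B / (1 - \pi_B)}$$
Odds ratios can be derived from logistic coefficients: a one-unit increase in $x_j$, with the other predictors held fixed, multiplies the odds by
$$OR_j = e^{\beta_j}$$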
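The link between coefficients and odds ratios can be checked numerically. The sketch below uses only the standard library. The coefficients `beta0` and `beta1` are made-up values for a hypothetical one-predictor model, not the result of any fit.

```python
import math

def odds(prob):
    """Odds of an event with probability prob (0 < prob < 1)."""
    return prob / (1.0 - prob)

def odds_ratio(prob_a, prob_b):
    """Ratio of group A's odds to group B's odds."""
    return odds(prob_a) / odds(prob_b)

def logistic(z):
    """Logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical model: log-odds = beta0 + beta1 * x
beta0, beta1 = -1.5, 0.7

prob_at_x = logistic(beta0 + beta1 * 2.0)          # x = 2
prob_at_x_plus_1 = logistic(beta0 + beta1 * 3.0)   # x = 3

# Odds ratio for a one-unit increase in x, computed two ways.
or_from_probs = odds_ratio(prob_at_x_plus_1, prob_at_x)
or_from_coef = math.exp(beta1)
```

Both routes give the same number (up to floating-point error), and the result does not depend on which starting value of x is used.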
Multivariate regression
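Multivariate regression != Multiple regression: multivariate regression has more than one dependent variable, whereas multiple regression has a single dependent variable and more than one independent variable.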
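In code, the distinction shows up in the shape of the response. A minimal sketch, assuming NumPy and made-up data: the multiple regression has a response vector `y`, while the multivariate regression has a response matrix `Y` with one column per dependent variable.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = rng.normal(size=(n, 3))                 # three independent variables
design = np.column_stack([np.ones(n), X])   # add an intercept column

# Multiple regression: several predictors, ONE dependent variable.
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=n)
coef_multiple, *_ = np.linalg.lstsq(design, y, rcond=None)

# Multivariate regression: several dependent variables fitted together.
second = X[:, 0] - X[:, 2] + rng.normal(scale=0.1, size=n)
Y = np.column_stack([y, second])
coef_multivariate, *_ = np.linalg.lstsq(design, Y, rcond=None)
```

Here `coef_multiple` has one coefficient per design column, while `coef_multivariate` has one column of coefficients per dependent variable. With plain least squares each response column is fitted independently, so its first column matches `coef_multiple`.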
R-Squared: coefficient of determination
- It assesses the goodness of fit of regression models.
- It represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model.
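- It usually lies in $[0, 1]$, but it can be negative when the model fits worse than always predicting the mean of y (for example on held-out data, or when the model is fitted without an intercept).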
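You can regard R-Squared as the share of the total variance of y that is captured by the model (rather than left in the errors), i.e.
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$
so it measures the goodness of fit, but it does not validate the model.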
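The usual definition, R² = 1 - SS_res / SS_tot, takes only a few lines of plain Python. The function name `r_squared` and the toy numbers are illustrative only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(y, [1.0, 2.0, 3.0, 4.0])    # predictions match exactly
baseline = r_squared(y, [2.5, 2.5, 2.5, 2.5])   # always predict the mean
worse = r_squared(y, [4.0, 3.0, 2.0, 1.0])      # worse than predicting the mean
```

The three cases give 1, 0 and a negative value respectively, which is why R-squared is not strictly confined to the unit interval.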
Adjusted R-Squared
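Adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model.
$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$
where $R^2$ is the ordinary R-squared, $n$ is the number of observations, and $p$ is the number of predictors (independent variables).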
Why do you need adjusted R-squared?
It matters because adding independent variables never decreases R-squared, so you cannot tell whether an increase in R-squared reflects a genuinely better fit or merely more variables.
Adjusted R-squared includes a penalty factor that penalises you for adding independent variables that don't help your model.
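The penalty is easy to see with numbers. The sketch below applies the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The function name and the R-squared values are hypothetical, chosen only to illustrate the effect.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (requires n > p + 1)."""
    if n <= p + 1:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical case: five extra predictors lift R^2 only from 0.800 to 0.805.
before = adjusted_r_squared(0.800, n=50, p=3)
after = adjusted_r_squared(0.805, n=50, p=8)
```

Although the plain R-squared went up, `after` is lower than `before`: the small gain in fit does not justify the five extra predictors.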