Regression

Linear Regression

Prerequisites

Principal component regression

Source

  1. apply PCA (Dimensionality Reduction#Feature Selection) to generate principal components from the predictor variables, with the number of principal components matching the number of original features p
  2. keep the first k principal components (k < p) that explain most of the variance, with k determined by cross-validation
  3. fit a linear regression model on these k principal components (see the sketch after this list)
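
A minimal sketch of these three steps using scikit-learn; the toy dataset, the pipeline step names, and the grid of candidate k values are assumptions:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                # 100 samples, p = 10 features (toy data)
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)

pcr = Pipeline([
    ("pca", PCA()),                           # step 1: compute principal components
    ("ols", LinearRegression()),              # step 3: OLS on the retained components
])
# step 2: choose k < p by cross-validation
search = GridSearchCV(pcr, {"pca__n_components": range(1, 10)}, cv=5)
search.fit(X, y)
print(search.best_params_["pca__n_components"])
```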

Partial least squares (PLS) regression

Logistic Regression

A logistic regression model:

$$\text{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$

The odds ratio is the ratio of two groups' odds of some outcome:

$$\text{odds ratio} = \frac{p_a}{1-p_a} \Big/ \frac{p_b}{1-p_b}$$

Odds ratios can be derived from logistic coefficients:

$$\beta_x = \text{logit}(p_a) - \text{logit}(p_b) = \ln(\text{odds ratio})$$
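
A minimal sketch tying these formulas together with scikit-learn; the toy data and the true coefficients used to generate it are assumptions. For each fitted coefficient, exponentiating gives the odds ratio for a one-unit increase in the corresponding predictor:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
# true log-odds (an assumption for this toy example): -0.3 + 1.2*X1 - 0.7*X2
p = 1 / (1 + np.exp(-(-0.3 + 1.2 * X[:, 0] - 0.7 * X[:, 1])))
y = rng.binomial(1, p)

# penalty=None disables regularization (scikit-learn >= 1.2),
# so the coefficients are plain maximum-likelihood estimates
model = LogisticRegression(penalty=None).fit(X, y)
print(np.exp(model.coef_))   # exp(beta_j) = odds ratio per unit increase in X_j
```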

Multivariate regression

Info

Multivariate regression != Multiple regression: multivariate regression has more than one response (dependent) variable, while multiple regression has a single response and more than one predictor.

R-Squared: coefficient of determination

$$R^2 = 1 - \frac{\text{residual sum of squares (SSR)}}{\text{total sum of squares (SST)}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

You can regard R-squared as the share of the total variance of y that is captured by the model (rather than by the errors), i.e.

$$\text{var}(y) = \text{var}(X\hat{\beta}) + \text{var}(e) \implies R^2 = \frac{\text{var}(X\hat{\beta})}{\text{var}(y)}$$

so it measures the goodness of fit, but it does not validate the model.
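
A minimal sketch that computes R-squared directly as 1 - SSR/SST and checks it against scikit-learn's r2_score; the toy data and the OLS fit are assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1))
y = 2 * X[:, 0] + rng.normal(size=50)

y_hat = LinearRegression().fit(X, y).predict(X)
ssr = np.sum((y - y_hat) ** 2)            # residual sum of squares
sst = np.sum((y - y.mean()) ** 2)         # total sum of squares
print(1 - ssr / sst, r2_score(y, y_hat))  # the two values agree
```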

Adjusted R-Squared

Adjusted R-squared is a modified version of R-squared that accounts for the number of predictors in the model.

$$\text{Adj } R^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}$$

where p is the number of regressors/predictors and n is the sample size.
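
A minimal sketch of this formula; the example values of R-squared, n, and p are assumptions:

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n samples and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r2(r2=0.85, n=100, p=5))  # ~0.842, slightly below the raw R^2
```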

Why do you need adjusted R-squared?
Adding independent variables never decreases R-squared, so you cannot tell whether an increase in R-squared reflects a better fit or merely more variables.
Adjusted R-squared includes a penalty factor that penalises adding independent variables that don't help the model.