StatTools : Logistic Regression Explained and Code Fragments

Logistic regression is an extension of linear regression in which the outcome is the probability of a binary (0/1) variable. The general formula is

    z = const + b1x1 + b2x2 + b3x3 + ..., where
    x1, x2, x3 and so on are independent variables, either binary (0/1) or ordinal (0, 1, 2, 3, etc.), and b1, b2, b3 and so on are their coefficients
    The product of the coefficient and the group designation for each independent variable is the log odds ratio of that group to the reference group, which is designated 0
    After obtaining z, the probability of the outcome is calculated by y = 1 / (1 + exp(-z))
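
As a minimal sketch of the formula above, the following Python fragment (standard library only) computes z and converts it to a probability. The function name `logistic_probability`, the constant, the coefficients, and the x values are all made up for illustration and do not come from any fitted model.

```python
import math

def logistic_probability(const, coefs, xs):
    """Return y = 1 / (1 + exp(-z)), where z = const + b1*x1 + b2*x2 + ..."""
    z = const + sum(b * x for b, x in zip(coefs, xs))
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative (made-up) values: two binary predictors and one ordinal predictor
const = -1.5
coefs = [0.8, -0.4, 0.3]   # b1, b2, b3
xs = [1, 0, 2]             # x1, x2, x3

# z = -1.5 + 0.8*1 - 0.4*0 + 0.3*2 = -0.1, so the probability is just under 0.5
p = logistic_probability(const, coefs, xs)
print(round(p, 4))
```

When z is 0 the probability is exactly 0.5; positive z gives a probability above 0.5 and negative z a probability below it.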

Statistical packages differ in their default approach to handling independent variables in logistic regression:

  • Where an independent variable is binary (0/1), the coefficient is the log odds ratio of group 1 to group 0
  • Where an independent variable has more than 2 groups, e.g. 3 groups of 0, 1, and 2
    • The first option is to treat the 3 groups together as a single variable with one coefficient b, so that the log odds ratio of each group to group 0 is the product of the group designation and the coefficient: log odds ratio Grp 1 / Grp 0 = b, and Grp 2 / Grp 0 = 2b
    • The second option is to transform all groups into binary dummy variables, the number of dummy variables being the number of groups minus 1. In the case of 3 groups, two dummy variables d1 and d2 are created, so that d1=0 and d2=0 for group 0, d1=1 and d2=0 for group 1, and d1=0 and d2=1 for group 2.
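
The two options for a variable with 3 groups can be sketched in Python as follows. This is only an illustration of the coding schemes, not of any package's internals; the helper names `ordinal_log_odds` and `dummy_code` and the coefficient value are hypothetical.

```python
def ordinal_log_odds(group, b):
    """Option 1: a single coefficient b; log odds ratio of `group` to group 0 is group * b."""
    return group * b

def dummy_code(group, n_groups):
    """Option 2: n_groups - 1 dummy variables; the reference group 0 is all zeros."""
    return [1 if group == k else 0 for k in range(1, n_groups)]

b = 0.5  # made-up coefficient, for illustration only

for g in range(3):
    # Each row: group, log odds ratio under option 1, [d1, d2] under option 2
    print(g, ordinal_log_odds(g, b), dummy_code(g, 3))
```

Under the first option the log odds ratios are forced to be evenly spaced (0, b, 2b). Under the second option each dummy variable receives its own coefficient, so the log odds ratio of each group to group 0 is estimated separately.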

Both options for handling independent variables with multiple groups are available in R, with the following conventions

  • Where groups are represented numerically, such as 0, 1, 2, 3..., R treats each variable as a single numeric predictor with one coefficient (the first option)
  • Where groups are represented by names in text, such as one, two, three, R converts each variable to the appropriate number of dummy variables (the second option)
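
The convention can be imitated in a few lines of Python to show what each representation leads to. This is a rough sketch under stated assumptions, not R's actual implementation: the function `encode` and the column names are hypothetical, and the reference group is assumed to be the first name in sorted order (R's `factor()` also sorts its levels by default, so with the names one, two, three the reference group would be "one").

```python
def encode(values):
    """Return the predictor columns implied by the convention described above."""
    if all(isinstance(v, (int, float)) for v in values):
        # Numeric group designation: one column, used as-is with a single coefficient
        return {"x": list(values)}
    # Named group designation: one 0/1 dummy column per non-reference group.
    # Assumption: the reference group is the first name in sorted order.
    levels = sorted(set(values))
    return {
        "x_" + level: [1 if v == level else 0 for v in values]
        for level in levels[1:]
    }

numeric_cols = encode([0, 1, 2, 1])
named_cols = encode(["one", "two", "three", "two"])
print(numeric_cols)
print(named_cols)
```

With the numeric input a single column is returned, while the named input yields two dummy columns, one for each group other than the reference.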