Logistic regression, also called logic regression or logic modeling, is a statistical technique allowing researchers to create predictive models. The technique is most useful for understanding the influence of several independent variables on a single dichotomous outcome variable. For example, logistic regression would allow a researcher to evaluate the influence of grade point average, test scores and curriculum difficulty on the outcome variable of admission to a particular university. The technique is useful, but it has significant limitations.
Identifying Independent Variables
Logistic regression attempts to predict outcomes based on a set of independent variables, but if researchers include the wrong independent variables, the model will have little to no predictive value. For example, if college admissions decisions depend more on letters of recommendation than test scores, and researchers don't include a measure for letters of recommendation in their data set, then the logit model will not provide useful or accurate predictions. This means that logistic regression is not a useful tool unless researchers have already identified all the relevant independent variables.
Limited Outcome Variables
Logistic regression works well for predicting categorical outcomes like admission or rejection at a particular college. It can also predict multinomial outcomes, like admission, rejection or wait list. However, logistic regression cannot predict continuous outcomes. For example, logistic regression could not be used to determine how high an influenza patient's fever will rise, because the scale of measurement -- temperature -- is continuous. Researchers could attempt to convert the measurement of temperature into discrete categories like "high fever" or "low fever," but doing so would sacrifice the precision of the data set. This is a significant disadvantage for researchers working with continuous scales.
Independent Observations Required
Logistic regression requires that each data point be independent of all other data points. If observations are related to one another, then the model will tend to overweight the significance of those observations. This is a major disadvantage, because a lot of scientific and social-scientific research relies on research techniques involving multiple observations of the same individuals. For example, drug trials often use matched pair designs that compare two similar individuals, one taking a drug and the other taking a placebo. Logistic regression is not an appropriate technique for studies using this design.
Overfitting the Model
Logistic regression attempts to predict outcomes based on a set of independent variables, but logit models are vulnerable to overconfidence. That is, the models can appear to have more predictive power than they actually do as a result of sampling bias. In the college admissions example, a random sample of applicants might lead a logit model to predict that all students with a GPA of at least 3.7 and a SAT score in the 90th percentile will always be admitted. In reality, however, the college might reject some small percentage of these applicants. A logistic regression would therefore be "overfit," meaning that it overstates the accuracy of its predictions.
Nick Robinson is a writer, instructor and graduate student. Before deciding to pursue an advanced degree, he worked as a teacher and administrator at three different colleges and universities, and as an education coach for Inside Track. Most of Robinson's writing centers on education and travel.