Why Linear Mixed Models Are Usually the Better Choice for Pre–Post Experimental and Control Group Data

A student’s proposal, and a familiar methodological problem

A student recently contacted me with a thesis proposal built around a very common design: one experimental group, one control group, and two measurements for each participant, one before and one after the intervention. The student proposed analyzing the data with paired-samples t-tests within groups, independent-samples t-tests between groups, and the Mann–Whitney test as a nonparametric alternative. The goal was perfectly understandable: to evaluate the effect of the intervention and determine whether the experimental group achieved better outcomes than the control group.

This is a very common starting point among students and young researchers. The problem is not that these methods are useless in general, but that they do not fit the full logic of this design very well. When we have two groups and two repeated measurements on the same persons, the core analytical question is not simply whether one group changed, or whether two groups differ at one moment in time. The key question is whether the change over time differs between the groups. That is exactly the type of question for which linear mixed models, or LMMs, are usually better suited.

An intuitive interpretation of the linear mixed model

A linear mixed model can be understood as a model that estimates the overall effects we care about, such as the effect of group, the effect of time, and especially the interaction between group and time, while at the same time recognizing that repeated observations from the same person are connected. In a pre–post design, each participant contributes more than one score, and those scores are not independent. LMM acknowledges that each participant has their own starting point and their own pattern of deviation around the average trend.

That is the basic intuition. The model contains a fixed part, where we estimate the main research effects, and a random part, where we account for individual-level variability. Instead of pretending that all observations are independent, LMM reflects the actual structure of the data. That is why it is often methodologically preferable in intervention studies with repeated measurements.
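One common way to write this intuition down is shown here. It assumes a random intercept for each participant and 0/1 coding for group and time; the symbols are illustrative choices, not something taken from the student's proposal.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{group}_i + \beta_2\,\mathrm{time}_j
       + \beta_3\,(\mathrm{group}_i \times \mathrm{time}_j) + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here y_ij is the score of participant i at occasion j. The beta terms form the fixed part, with beta_3 being the group-by-time interaction, that is, the difference in change between groups. The term u_i is the random part: the participant's own deviation from the average starting point.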

LMM versus paired-samples t-test

The paired-samples t-test answers a narrow question: did the mean score in one group change from pre-test to post-test? That can be useful as a descriptive first look, but it is not the main inferential question in a two-group intervention design. If we run one paired t-test in the experimental group and another in the control group, we may end up saying that one result is statistically significant and the other is not. But that does not prove that the amount of change differs significantly between the two groups.

This is one of the most common mistakes in applied research. A significant change in one group and a non-significant change in another group is not the same thing as a significant difference in change between groups. LMM avoids this problem because it estimates the group-by-time interaction directly. That interaction is usually the real intervention effect.
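To make the idea of estimating the interaction directly more concrete, here is a minimal sketch in Python using the statsmodels library. The data are simulated purely for illustration. The column names (id, group, time, score), the sample size, and the assumed extra gain of 5 points are all hypothetical and are not the student's data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data, for illustration only (all numbers are assumptions).
rng = np.random.default_rng(42)
n_per_group = 40
ids = np.arange(2 * n_per_group)
group = np.repeat([0, 1], n_per_group)         # 0 = control, 1 = experimental
person = rng.normal(0.0, 8.0, size=ids.size)   # participant-specific starting level

rows = []
for i in ids:
    for time in (0, 1):                        # 0 = pre-test, 1 = post-test
        # Both groups gain 2 points; the experimental group gains 5 more.
        mean = 50.0 + 2.0 * time + 5.0 * group[i] * time
        rows.append({"id": i, "group": group[i], "time": time,
                     "score": mean + person[i] + rng.normal(0.0, 4.0)})
data = pd.DataFrame(rows)

# Fixed effects: group, time, and their interaction.
# Random part: one intercept per participant (groups=id).
model = smf.mixedlm("score ~ group * time", data, groups=data["id"])
result = model.fit(reml=True)

interaction = result.params["group:time"]
p_value = result.pvalues["group:time"]
print(f"group-by-time estimate: {interaction:.2f} (p = {p_value:.4f})")
```

The single coefficient named group:time carries the test of interest. There is no need to compare the significance of two separate within-group tests.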

The paired t-test does have some advantages. It is simple, familiar, and easy to explain. For a single-group pre–post comparison, it can be perfectly appropriate. Its weakness is that it treats each group separately and does not model the design as a whole. It also offers little flexibility if we want to add covariates, examine participant heterogeneity, or deal with incomplete post-test data. In that sense, the paired t-test is easy, but often too limited.

LMM versus independent-samples t-test and Mann–Whitney

The independent-samples t-test compares two independent groups on one outcome. That can be useful for comparing the experimental and control groups at baseline or at post-test. The Mann–Whitney test can be used when the researcher wants a rank-based alternative. But neither method captures the repeated-measures nature of the design.

If the researcher compares groups only at post-test, baseline information is underused or ignored. If the researcher computes change scores and then compares those scores between groups, the design is reduced to a single derived variable. That is sometimes acceptable as a rough strategy, but it throws away part of the structure that LMM can model directly.
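The change-score strategy can be sketched with the Python standard library alone. The data are simulated, and the sample sizes and gains are assumptions chosen for illustration. With complete data, the difference in mean change is the same quantity as the difference-in-differences of the four cell means. What the strategy gives up is everything beyond that single derived number, for example any participant who has only one of the two scores.

```python
import math
import random
import statistics

random.seed(1)

def simulate(n, gain):
    """Return (pre, post) pairs for n hypothetical participants."""
    pairs = []
    for _ in range(n):
        level = random.gauss(50, 8)            # participant-specific level
        pre = level + random.gauss(0, 4)
        post = level + gain + random.gauss(0, 4)
        pairs.append((pre, post))
    return pairs

control = simulate(40, gain=2.0)               # assumed values, illustration only
experimental = simulate(40, gain=7.0)

# Each participant is reduced to one derived number: post minus pre.
change_c = [post - pre for pre, post in control]
change_e = [post - pre for pre, post in experimental]

diff_in_change = statistics.mean(change_e) - statistics.mean(change_c)

# Welch t statistic comparing the change scores of two independent groups.
se = math.sqrt(statistics.variance(change_e) / len(change_e)
               + statistics.variance(change_c) / len(change_c))
t_stat = diff_in_change / se
print(f"difference in mean change: {diff_in_change:.2f}, Welch t = {t_stat:.2f}")
```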

The Mann–Whitney test has the advantage of not relying on a normality assumption, but it does not solve the deeper design issue. It remains a two-group comparison method, not a repeated-measures model. In other words, it may help with distributional concerns, but it does not address within-person dependence. That is why it should not be treated as a general methodological substitute for a model specifically built for repeated observations.

LMM versus ANOVA and repeated measures ANOVA

ANOVA and repeated measures ANOVA are much more serious competitors to LMM than the t-tests, because they can analyze group, time, and interaction simultaneously. In a simple 2 × 2 design, repeated measures ANOVA can indeed test the main intervention question. This is why it remains common in applied research.

Its strengths are clear. It is widely taught, available in nearly every statistical package, and familiar to reviewers and readers. In balanced datasets with complete observations and a very simple design, it can perform adequately and produce interpretable results.

Still, LMM is usually preferable because it is more flexible and more realistic. Repeated measures ANOVA is more rigid in the way it handles covariance among repeated observations. It also copes poorly with incomplete data: in its classical form, a participant who misses the post-test is dropped from the analysis entirely, whereas an LMM can use every observed score. It is likewise less suited to cases where individual trajectories vary more than the classical framework expects. In real empirical research, these complications are common, not exceptional.
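The incomplete-data point can be illustrated with a short sketch, again on simulated data with hypothetical column names and an assumed 15 dropouts chosen completely at random. A complete-case analysis keeps only participants with both scores, while a mixed model fitted to long-format data keeps every observed score. The usual caveat applies: the mixed-model estimates are trustworthy only if the missingness mechanism is plausibly ignorable (missing at random).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated wide-format data, for illustration only.
rng = np.random.default_rng(7)
n = 80
wide = pd.DataFrame({"id": np.arange(n), "group": np.repeat([0, 1], n // 2)})
level = rng.normal(50.0, 8.0, size=n)
wide["pre"] = level + rng.normal(0.0, 4.0, size=n)
wide["post"] = level + 2.0 + 5.0 * wide["group"] + rng.normal(0.0, 4.0, size=n)

# Assumption: 15 randomly chosen participants missed the post-test.
dropouts = rng.choice(n, size=15, replace=False)
wide.loc[dropouts, "post"] = np.nan

# Complete-case analysis: only participants with both scores remain.
complete_cases = wide.dropna(subset=["pre", "post"])

# Long format: one row per observed score, so dropouts keep their pre-test.
long_data = (
    wide.melt(id_vars=["id", "group"], value_vars=["pre", "post"],
              var_name="occasion", value_name="score")
        .dropna(subset=["score"])
        .assign(time=lambda d: (d["occasion"] == "post").astype(int))
)

result = smf.mixedlm("score ~ group * time", long_data,
                     groups=long_data["id"]).fit()

print(f"complete cases: {len(complete_cases)} participants")
print(f"mixed model: {long_data['id'].nunique()} participants, "
      f"{len(long_data)} observations")
print(f"group-by-time estimate: {result.params['group:time']:.2f}")
```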

For exactly two time points, the well-known sphericity issue is not the decisive limitation, because with only two repeated measurements that assumption is automatically satisfied. But even then, LMM remains attractive because it models participant-level variation more naturally and adapts better when the dataset is not perfectly balanced.

So the real comparison is not “ANOVA bad, LMM good.” A better formulation is this: repeated measures ANOVA can be acceptable in very simple and clean cases, while LMM is usually the stronger general-purpose method for this design.

Assumptions still matter in LMM

A very important point must be emphasized. Choosing LMM does not mean that assumptions no longer matter. Researchers sometimes incorrectly assume that mixed models are so flexible that assumption checking becomes optional. That is not true.

Assumptions should still be checked, but they must be checked in relation to the fitted model. For LMM with a continuous outcome, the main issues usually include approximate normality of residuals, approximate normality of random effects, linearity between predictors and outcome, homogeneity of residual variance (or at least no serious departure from it), appropriate specification of the random structure, and the absence of strongly influential outliers. In repeated measures ANOVA and standard ANOVA settings, we also worry about normality, homogeneity of variance, influential observations, and, in designs with more than two repeated measures, sphericity.

In practice, these assumptions can be examined through residual-versus-fitted plots, Q–Q plots of residuals, inspection of random effects, influence diagnostics, and comparison of alternative model structures. The methodological lesson here is crucial: assumptions belong to the model, not simply to the verbal description of the design as “two groups and two measurements.”
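A minimal sketch of such checks follows. It assumes a random-intercept model fitted with statsmodels to simulated data, with hypothetical column names. It extracts the quantities that the plots would be drawn from and computes rough numeric summaries instead of drawing figures. The summaries are informal aids, not formal tests.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Simulated long-format data, for illustration only.
rng = np.random.default_rng(3)
n = 80
ids = np.repeat(np.arange(n), 2)
group = np.repeat(np.repeat([0, 1], n // 2), 2)
time = np.tile([0, 1], n)
person = np.repeat(rng.normal(0.0, 8.0, size=n), 2)
score = (50.0 + 2.0 * time + 5.0 * group * time + person
         + rng.normal(0.0, 4.0, size=2 * n))
data = pd.DataFrame({"id": ids, "group": group, "time": time, "score": score})

result = smf.mixedlm("score ~ group * time", data, groups=data["id"]).fit()

# Residuals and fitted values that include the predicted random effects.
residuals = np.asarray(result.resid)
fitted = np.asarray(result.fittedvalues)
# Predicted random intercept for each participant.
intercepts = np.array([float(effects.iloc[0])
                       for effects in result.random_effects.values()])

# Q-Q correlation: values near 1 are consistent with approximate normality.
(_, _), (_, _, qq_r_resid) = stats.probplot(residuals, dist="norm")
(_, _), (_, _, qq_r_intercepts) = stats.probplot(intercepts, dist="norm")

# Crude residual-versus-fitted check: compare residual spread in the
# lower and upper halves of the fitted values (a ratio near 1 is reassuring).
order = np.argsort(fitted)
half = len(order) // 2
spread_ratio = (residuals[order[half:]].std(ddof=1)
                / residuals[order[:half]].std(ddof=1))

print(f"Q-Q correlation, residuals: {qq_r_resid:.3f}")
print(f"Q-Q correlation, random intercepts: {qq_r_intercepts:.3f}")
print(f"residual spread ratio (upper/lower fitted): {spread_ratio:.2f}")
```

The arrays residuals, fitted, and intercepts can be passed to any plotting library to draw the residual-versus-fitted and Q–Q plots themselves.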

Common mistakes students often make

One common mistake is to run separate paired t-tests and treat those results as proof of an intervention effect. Another is to compare only post-test means and ignore baseline variation. A third is to believe that choosing a nonparametric test automatically solves the methodological problem. A fourth is to check normality only on raw variables and not on model residuals. Yet another is to assume that repeated measures ANOVA and LMM are interchangeable in all practical situations.

These are not trivial technicalities. They affect the actual meaning of the conclusion. If the analytical method does not correspond to the structure of the data, even a neatly reported result can be methodologically weak.

Final methodological recommendation

In a pre–post design with an experimental and a control group, the central research interest is usually the difference in change over time between groups. That is an interaction problem. Paired t-tests, independent t-tests, and Mann–Whitney tests can only address fragments of that problem. Repeated measures ANOVA comes closer, but it is less flexible and copes less well with the incomplete and unbalanced data that realistic research settings produce.

That is why linear mixed models usually deserve priority. They match the repeated-measures structure of the data, estimate the key group-by-time effect directly, account for the dependence of observations within persons, and provide a better framework for handling real-world data complications. They do not remove the need for careful assumption checking and thoughtful interpretation, but they give the researcher a more adequate methodological tool for the job.