6. Statistical Considerations
There will be four main analytic approaches: 1) estimation of incidence rates of
complications (Aims 1 and 2); 2) estimation of the prevalence of complications (Aims 1 and 2);
3) longitudinal evaluation of predictors of outcomes and conditions, including consideration
of potential mediators and moderators (Aims 1, 2 and 4); and 4) evaluation of mortality rates,
including comparison with age-comparable non-diabetic youth (Aim 3). Here we give
selected examples that illustrate the analytic strategies and provide key information on
sample size and detectable differences.
6.1. STATISTICAL ANALYTIC METHODS
6.1.1. Incidence Rate Estimation
Because all SEARCH Cohort Study participants will have had at least one previous
SEARCH in-person visit, we will be able to define a group of participants who were free
from the event of interest (e.g., normotensive) at “baseline”. Multiple logistic regression
will be used to model the incidence of binary outcomes of interest (e.g., hypertension).
Predictors can include categorical or continuous variables. Each model will include a
continuous variable measuring the time between visits for each participant (to account
for the fact that individuals will have different lengths of follow-up) and the
predictor-by-time interaction. Next, we will expand the logistic regression model to
include other participant-level characteristics (e.g., SEARCH clinical center, age, and
gender [a “demographically adjusted model”]), and then to adjust for other covariates.
In addition, we will examine potential interactions; if a significant interaction is present,
analyses will be stratified by that characteristic.
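A minimal sketch of the incidence model described above, in Python using only the standard library. The data are simulated and hypothetical (a binary exposure, years between visits, and incident hypertension among participants normotensive at baseline), and the names `fit_logistic`, `solve` and `design_row` are illustrative, not taken from any SEARCH code. The model is fitted by Newton-Raphson; in practice a standard statistical package would be used.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_logistic(X, y, iters=25, tol=1e-10):
    """Maximum likelihood logistic regression fitted by Newton-Raphson."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information X'WX
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def design_row(exposed, years_between_visits):
    """Hypothetical columns: intercept, predictor, follow-up time, predictor-by-time."""
    return [1.0, float(exposed), years_between_visits, exposed * years_between_visits]

# Simulated, hypothetical cohort: everyone is event-free (normotensive) at baseline.
rng = random.Random(2010)
X, y = [], []
for _ in range(4000):
    exposed = rng.random() < 0.5
    years = rng.uniform(1.0, 5.0)  # unequal lengths of follow-up
    eta = -3.0 + 0.5 * exposed + 0.3 * years + 0.2 * exposed * years
    X.append(design_row(exposed, years))
    y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)

beta = fit_logistic(X, y)
print([round(b, 2) for b in beta])
```

The fourth coefficient is the predictor-by-time interaction; the demographically adjusted model would append further columns (e.g., clinical center indicators, age, gender) to each design row.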
6.1.2. Prevalence Estimation
Some outcomes of interest, such as retinopathy and neuropathy, were not measured during
SEARCH 1 or 2, so no event-free baseline group can be defined for them. Therefore, the
prevalence of these outcomes, rather than their incidence, will be estimated. Models to
evaluate cross-sectional associations of risk factors with these outcomes will use logistic
regression and will proceed as described above to account for potential confounding or
effect modification.
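As an illustration, a prevalence estimate and its 95% confidence interval could be computed as below. The Wilson score interval is our assumption (the plan does not name an interval method), and the counts are hypothetical, not SEARCH data.

```python
import math

def prevalence_ci(cases, n, z=1.959963984540054):
    """Prevalence cases/n with a Wilson score interval (95% by default)."""
    if n <= 0 or not 0 <= cases <= n:
        raise ValueError("require n > 0 and 0 <= cases <= n")
    p = cases / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n))
    return p, centre - half, centre + half

# Hypothetical counts of a complication by diabetes type (illustrative only)
for label, cases, n in [("type 1", 30, 600), ("type 2", 25, 150)]:
    p, lo, hi = prevalence_ci(cases, n)
    print(f"{label}: {100 * p:.1f}% (95% CI {100 * lo:.1f}% to {100 * hi:.1f}%)")
```

Unlike the simple Wald interval, the Wilson interval stays within 0 to 1 and behaves sensibly when few cases are observed.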
6.1.3. Longitudinal Models
All participants in the SEARCH Cohort Study will have already had at least one in-person
visit during SEARCH 1 or 2, and ~75% of the 2002-2005 incident cases have at least two
in-person visits under the SEARCH 2 protocol. Because SEARCH 2 already collected
longitudinal data (over 2000 SEARCH participants have at least one follow-up visit), our
team has developed a plan for modeling longitudinal data. Specifically, we will use
longitudinal mixed effects analysis of covariance models that always include duration of
diabetes as a time-varying covariate. This approach accounts both for the varying
durations of disease before the initial SEARCH in-person visit and for the varying
intervals between the initial and subsequent visits permitted by the SEARCH data
collection windows.
Section 6B - Statistical Considerations (Phase 3 - 12/2010)
Section 6B - Page 2
Cohort Study
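The sketch below shows, on assumed data, how duration of diabetes becomes a time-varying covariate: each visit row carries the time since diagnosis at that visit, so participants who enter with different disease durations and return after different intervals are represented directly. The dates, identifiers and helper names `duration_years` and `to_long` are hypothetical, and the 365.25-day year is our convention.

```python
from datetime import date

def duration_years(diagnosis, visit):
    """Diabetes duration at a visit, in years (days / 365.25)."""
    return (visit - diagnosis).days / 365.25

def to_long(participants):
    """Expand {id: (diagnosis_date, [visit_dates])} into one row per visit,
    carrying duration of diabetes as a time-varying covariate."""
    rows = []
    for pid, (dx, visits) in participants.items():
        for visit in sorted(visits):
            rows.append({"id": pid, "visit": visit,
                         "duration": duration_years(dx, visit)})
    return rows

# Hypothetical participants: diagnosis date and in-person visit dates
participants = {
    "A": (date(2003, 3, 1), [date(2003, 11, 15), date(2004, 12, 1), date(2011, 6, 1)]),
    "B": (date(2004, 1, 10), [date(2004, 5, 20), date(2011, 9, 1)]),
}
for row in to_long(participants):
    print(row["id"], row["visit"], round(row["duration"], 2))
```

Here participant A enters with about 0.71 years of diabetes and B with about 0.36 years, and their follow-up intervals differ; the mixed model receives these durations rather than a common visit index.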
The initial model will relate the outcome (measured previously at one to four visits, i.e.,
baseline and the 12-, 24- and 60-month visits, and once more at the SEARCH Cohort Study
visit) to the predictor of interest (e.g., diabetes type), the duration of diabetes at each
measurement time, and the predictor-by-diabetes-duration interaction. These models will
then be expanded to include demographic characteristics (e.g., sex), treated as fixed,
non-time-varying effects. In addition, based on our experience performing these
longitudinal analyses in the SEARCH 2 cohort, we propose to consider treating the
exposure (predictor) of interest as a time-varying covariate, so that the time-varying
association between the predictor and the outcome is modeled correctly. We will also
consider adding other time-varying covariates (e.g., BMI z-score) as needed to examine
the specific relationships being studied. These mixed effects models are also flexible
enough to accommodate potentially non-linear relationships over time, and they permit
random (participant-specific) rates of progression, consistent with the view that different
participants progress at different rates. Random intercepts and/or slopes induce
correlation among repeated measures on the same participant. More flexible correlation
structures will be investigated using combination mixed models, which specify separate
parameters for variation between experimental units and for serial correlation within
units. Our choice of method for accounting for serial correlation will depend on the
plausibility of the model and on the number of repeated measurements relative to the
number of participants. For example, with many participants and few repeated
measurements, an unstructured covariance matrix, which requires t(t+1)/2 parameters for
t measurement times, can be estimated reliably and often provides the most efficient
estimation of model parameters.
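To make the covariance discussion concrete, the sketch below builds the marginal covariance matrix implied by one form of combination model: a random intercept (between-participant variance), serial correlation that decays as rho raised to the time separation, and independent measurement error on the diagonal. The parameter values and visit times in years (baseline, 12, 24 and 60 months, plus an assumed cohort visit at 8 years) are hypothetical. The helper also counts covariance parameters, showing why an unstructured matrix is practical mainly when participants are many and measurement times few.

```python
def combination_cov(times, var_between, var_serial, rho, var_error=0.0):
    """Marginal covariance for one participant's repeated measures.

    var_between: between-participant (random intercept) variance
    var_serial, rho: within-participant serial correlation decaying as
        rho ** |t_j - t_k| (continuous-time AR(1); times in years)
    var_error: independent measurement error, added on the diagonal only
    """
    n = len(times)
    return [[var_between
             + var_serial * rho ** abs(times[j] - times[k])
             + (var_error if j == k else 0.0)
             for k in range(n)]
            for j in range(n)]

def n_cov_params(n_times, structure):
    """Number of covariance parameters each structure requires."""
    if structure == "unstructured":
        return n_times * (n_times + 1) // 2
    if structure == "combination":
        return 4  # var_between, var_serial, rho, var_error
    raise ValueError(f"unknown structure: {structure}")

# Hypothetical visit times (years): baseline, 12, 24, 60 months, cohort visit
times = [0.0, 1.0, 2.0, 5.0, 8.0]
V = combination_cov(times, var_between=1.0, var_serial=0.5, rho=0.6, var_error=0.25)
for row in V:
    print(" ".join(f"{v:5.2f}" for v in row))
print(n_cov_params(len(times), "unstructured"), n_cov_params(len(times), "combination"))
# → 15 4
```

Covariance decays with the time between visits but never falls below the random-intercept variance of 1.0. An unstructured matrix also presumes a common set of measurement occasions; when visits fall at varying times within data collection windows, a structure indexed by actual time, like this one, may be easier to apply.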
For analysis of longitudinal discrete outcomes (e.g., transfer of care from a pediatric to an
adult provider), we will use the generalized estimating equation (GEE) approach to fit
logistic or log-linear models that account for the dependency between repeated measures.
GEE techniques allow estimation of model parameters and their standard errors from
longitudinal data with continuous or categorical responses and potentially missing
observations. An advantage of this technique is that its assumptions are weaker than
those of maximum likelihood techniques: one need not specify the full distribution of the
dependent variable, only the relationship between the marginal mean and the variance,
and between the marginal mean and the covariates.
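A minimal sketch of the GEE approach for a binary outcome, using the simplest working correlation (independence) together with the robust sandwich variance estimator, which gives valid standard errors when the mean model is correct and participants are numerous, even if the working correlation is wrong. Exchangeable or autoregressive working correlations, which require iterating on a correlation parameter, are omitted. The data are simulated (a hypothetical indicator of adult-provider care at each annual visit, with a shared participant effect inducing within-person correlation), and all function names are ours.

```python
import math
import random

def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def _pieces(beta, clusters):
    """Model-based information and per-participant score vectors (logit link)."""
    p = len(beta)
    info = [[0.0] * p for _ in range(p)]
    scores = []
    for X, y in clusters:
        u = [0.0] * p
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j in range(p):
                u[j] += xi[j] * (yi - mu)
                for k in range(p):
                    info[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
        scores.append(u)
    return info, scores

def gee_logistic_independence(clusters, iters=25, tol=1e-10):
    """clusters: list of (X_i, y_i), one per participant. Returns (beta, robust SEs)."""
    p = len(clusters[0][0][0])
    beta = [0.0] * p
    for _ in range(iters):
        info, scores = _pieces(beta, clusters)
        total = [sum(u[j] for u in scores) for j in range(p)]
        bread = invert(info)
        step = [sum(bread[j][k] * total[k] for k in range(p)) for j in range(p)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    info, scores = _pieces(beta, clusters)  # evaluate at the final estimate
    bread = invert(info)
    meat = [[sum(u[j] * u[k] for u in scores) for k in range(p)] for j in range(p)]
    cov = matmul(matmul(bread, meat), bread)  # sandwich: A^-1 B A^-1
    return beta, [math.sqrt(cov[j][j]) for j in range(p)]

def simulate(n_participants=300, n_visits=5, seed=7):
    """Hypothetical data: design columns are intercept and years since first visit."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_participants):
        u = rng.gauss(0.0, 1.0)  # shared participant effect -> within-person correlation
        X, y = [], []
        for t in range(n_visits):
            eta = -2.0 + 0.6 * t + u
            X.append([1.0, float(t)])
            y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
        clusters.append((X, y))
    return clusters

beta, se = gee_logistic_independence(simulate())
print([round(b, 3) for b in beta], [round(s, 3) for s in se])
```

Only the marginal mean and the mean-variance relationship mu(1 - mu) enter the estimating equations, which reflects the weaker assumptions noted above; within-person dependence is handled by the clustered "meat" term rather than by a likelihood.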
6.1.4.