Table of contents for Statistical modeling for biomedical researchers : a simple introduction to the analysis of complex data / William D. Dupont.

Bibliographic record and links to related information available from the Library of Congress catalog.

Note: Contents data are machine generated based on pre-publication information provided by the publisher. Contents may have variations from the printed book or be incomplete or contain other coding.

Contents
Preface xv
1 Introduction 1
1.1 Algebraic notation 1
1.2 Descriptive statistics 3
1.2.1 Dot plot 3
1.2.2 Sample mean 4
1.2.3 Residual 4
1.2.4 Sample variance 4
1.2.5 Sample standard deviation 5
1.2.6 Percentile and median 5
1.2.7 Box plot 5
1.2.8 Histogram 6
1.2.9 Scatter plot 6
1.3 The Stata Statistical Software Package 7
1.3.2 Creating histograms with Stata 9
1.3.3 Stata command syntax 12
1.3.4 Obtaining interactive help from Stata 13
1.3.5 Stata log files 14
1.3.6 Stata graphics and schemes 14
1.3.7 Stata do files 15
1.3.8 Stata pulldown menus 15
1.3.9 Displaying other descriptive statistics with Stata 20
1.4 Inferential statistics 22
1.4.1 Probability density function 23
1.4.2 Mean, variance and standard deviation 24
1.4.3 Normal distribution 24
1.4.4 Expected value 25
1.4.5 Standard error 25
1.4.6 Null hypothesis, alternative hypothesis and P-value 26
1.4.7 95% confidence interval 27
1.4.8 Statistical power 27
1.4.9 The z and Student's t distributions 29
1.4.10 Paired t test 30
1.4.11 Performing paired t tests with Stata 31
1.4.12 Independent t test using a pooled standard error estimate 34
1.4.13 Independent t test using separate standard error estimates 35
1.4.14 Independent t tests using Stata 36
1.4.15 The chi-squared distribution 38
1.5 Overview of methods discussed in this text 39
1.5.1 Models with one response per patient 40
1.5.2 Models with multiple responses per patient 41
1.7 Exercises 42
2 Simple linear regression 45
2.1 Sample covariance 45
2.2 Sample correlation coefficient 47
2.3 Population covariance and correlation coefficient 47
2.4 Conditional expectation 48
2.5 Simple linear regression model 49
2.6 Fitting the linear regression model 50
2.7 Historical trivia: origin of the term regression 52
2.8 Determining the accuracy of linear regression estimates 53
2.9 Ethylene glycol poisoning example 54
2.10 95% confidence interval for y[x] = α + βx evaluated at x 55
2.11 95% prediction interval for the response of a new patient 56
2.12 Simple linear regression with Stata 57
2.13 Lowess regression 64
2.14 Plotting a lowess regression curve in Stata 64
2.15 Residual analyses 66
2.16 Studentized residual analysis using Stata 69
2.17 Transforming the x and y variables 70
2.17.1 Stabilizing the variance 70
2.17.2 Correcting for non-linearity 71
2.17.3 Example: research funding and morbidity for 29 diseases 72
2.18 Analyzing transformed data with Stata 74
2.19 Testing the equality of regression slopes 79
2.19.1 Example: the Framingham Heart Study 81
2.20 Comparing slope estimates with Stata 82
2.21 Density-distribution sunflower plots 87
2.22 Creating density-distribution sunflower plots with Stata 88
2.24 Exercises 93
3 Multiple linear regression 97
3.1 The model 97
3.2 Confounding variables 98
3.3 Estimating the parameters for a multiple linear regression model 99
3.4 R² statistic for multiple regression models 99
3.5 Expected response in the multiple regression model 99
3.6 The accuracy of multiple regression parameter estimates 100
3.7 Hypothesis tests 101
3.8 Leverage 101
3.9 95% confidence interval for ŷi 102
3.10 95% prediction intervals 102
3.11 Example: the Framingham Heart Study 102
3.11.1 Preliminary univariate analyses 103
3.12 Scatterplot matrix graphs 105
3.12.1 Producing scatterplot matrix graphs with Stata 105
3.13 Modeling interaction in multiple linear regression 107
3.13.1 The Framingham example 107
3.14 Multiple regression modeling of the Framingham data 109
3.15 Intuitive understanding of a multiple regression model 110
3.15.1 The Framingham example 110
3.16 Calculating 95% confidence and prediction intervals 114
3.17 Multiple linear regression with Stata 114
3.18 Automatic methods of model selection 119
3.18.1 Forward selection using Stata 120
3.18.2 Backward selection 122
3.18.3 Forward stepwise selection 123
3.18.4 Backward stepwise selection 123
3.18.5 Pros and cons of automated model selection 124
3.19 Collinearity 124
3.20 Residual analyses 125
3.21 Influence 126
3.21.1 Δβ̂ influence statistic 127
3.21.2 Cook's distance 127
3.21.3 The Framingham example 128
3.22 Residual and influence analyses using Stata 129
3.23 Using multiple linear regression for non-linear models 133
3.24 Building non-linear models with restricted cubic splines 134
3.24.1 Choosing the knots for a restricted cubic spline model 137
3.25 The SUPPORT Study of hospitalized patients 138
3.25.1 Modeling length-of-stay and MAP using restricted cubic splines 138
3.25.2 Using Stata for non-linear models with restricted cubic splines 142
3.27 Exercises 155
4 Simple logistic regression 159
4.1 Example: APACHE score and mortality in patients with sepsis 159
4.2 Sigmoidal family of logistic regression curves 159
4.3 The log odds of death given a logistic probability function 161
4.4 The binomial distribution 162
4.5 Simple logistic regression model 163
4.6 Generalized linear model 163
4.7 Contrast between logistic and linear regression 164
4.8 Maximum likelihood estimation 164
4.8.1 Variance of maximum likelihood parameter estimates 165
4.9 Statistical tests and confidence intervals 166
4.9.1 Likelihood ratio tests 166
4.9.2 Quadratic approximations to the log likelihood ratio function 167
4.9.3 Score tests 168
4.9.4 Wald tests and confidence intervals 168
4.9.5 Which test should you use? 169
4.10 Sepsis example 170
4.11 Logistic regression with Stata 171
4.12 Odds ratios and the logistic regression model 174
4.13 95% confidence interval for the odds ratio associated with a unit increase in x 175
4.13.1 Calculating this odds ratio with Stata 175
4.14 Logistic regression with grouped response data 176
4.15 95% confidence interval for π[x] 176
4.16 Exact 100(1 − α)% confidence intervals for proportions 177
4.17 Example: the Ibuprofen in Sepsis Study 178
4.18 Logistic regression with grouped data using Stata 181
4.19 Simple 2×2 case-control studies 187
4.19.1 Example: the Ille-et-Vilaine Study of esophageal cancer and alcohol 187
4.19.2 Review of classical case-control theory 188
4.19.3 95% confidence interval for the odds ratio: Woolf's method 189
4.19.4 Test of the null hypothesis that the odds ratio equals one 190
4.19.5 Test of the null hypothesis that two proportions are equal 190
4.20 Logistic regression models for 2×2 contingency tables 191
4.20.1 Nuisance parameters 191
4.20.2 95% confidence interval for the odds ratio: logistic regression 191
4.21 Creating a Stata data file 192
4.22 Analyzing case-control data with Stata 195
4.23 Regressing disease against exposure 197
4.25 Exercises 199
5 Multiple logistic regression 201
5.1 Mantel–Haenszel estimate of an age-adjusted odds ratio 201
5.2 Mantel–Haenszel χ² statistic for multiple 2×2 tables 203
5.3 95% confidence interval for the age-adjusted odds ratio 204
5.4 Breslow–Day–Tarone test for homogeneity 204
5.5 Calculating the Mantel–Haenszel odds ratio using Stata 206
5.6 Multiple logistic regression model 210
5.6.1 Likelihood ratio test of the influence of the covariates on the response variable 211
5.7 95% confidence interval for an adjusted odds ratio 211
5.8 Logistic regression for multiple 2×2 contingency tables 212
5.9 Analyzing multiple 2×2 tables with Stata 214
5.10 Handling categorical variables in Stata 216
5.11 Effect of dose of alcohol on esophageal cancer risk 217
5.11.1 Analyzing Model (5.25) with Stata 219
5.12 Effect of dose of tobacco on esophageal cancer risk 221
5.13 Deriving odds ratios from multiple parameters 221
5.14 The standard error of a weighted sum of regression coefficients 222
5.15 Confidence intervals for weighted sums of coefficients 222
5.16 Hypothesis tests for weighted sums of coefficients 223
5.17 The estimated variance–covariance matrix 223
5.18 Multiplicative models of two risk factors 224
5.19 Multiplicative model of smoking, alcohol, and esophageal cancer 225
5.20 Fitting a multiplicative model with Stata 227
5.21 Model of two risk factors with interaction 231
5.22 Model of alcohol, tobacco, and esophageal cancer with interaction terms 233
5.23 Fitting a model with interaction using Stata 234
5.24 Model fitting: nested models and model deviance 238
5.25 Effect modifiers and confounding variables 240
5.26 Goodness-of-fit tests 240
5.26.1 The Pearson χ² goodness-of-fit statistic 241
5.27 Hosmer–Lemeshow goodness-of-fit test 242
5.27.1 An example: the Ille-et-Vilaine cancer data set 242
5.28 Residual and influence analysis 244
5.28.1 Standardized Pearson residual 245
5.28.2 Δβ̂j influence statistic 245
5.28.3 Residual plots of the Ille-et-Vilaine data on esophageal cancer 246
5.29 Using Stata for goodness-of-fit tests and residual analyses 248
5.30 Frequency matched case-control studies 258
5.31 Conditional logistic regression 258
5.32 Analyzing data with missing values 258
5.32.1 Imputing data that is missing at random 259
5.32.2 Cardiac output in the Ibuprofen in Sepsis Study 260
5.32.3 Modeling missing values with Stata 263
5.33 Logistic regression using restricted cubic splines 265
5.33.1 Odds ratios from restricted cubic spline models 266
5.33.2 95% confidence intervals for ψ̂[x] 267
5.34 Modeling hospital mortality in the SUPPORT Study 267
5.35 Using Stata for logistic regression with restricted cubic splines 271
5.36 Regression methods with a categorical response variable 278
5.36.1 Proportional odds logistic regression 278
5.36.2 Polytomous logistic regression 279
5.38 Exercises 283
6 Introduction to survival analysis 287
6.1 Survival and cumulative mortality functions 287
6.2 Right censored data 289
6.3 Kaplan–Meier survival curves 290
6.4 An example: genetic risk of recurrent intracerebral hemorrhage 291
6.5 95% confidence intervals for survival functions 293
6.6 Cumulative mortality function 295
6.7 Censoring and bias 296
6.8 Log-rank test 296
6.9 Using Stata to derive survival functions and the log-rank test 298
6.10 Log-rank test for multiple patient groups 305
6.11 Hazard functions 306
6.12 Proportional hazards 306
6.13 Relative risks and hazard ratios 307
6.14 Proportional hazards regression analysis 309
6.15 Hazard regression analysis of the intracerebral hemorrhage data 310
6.16 Proportional hazards regression analysis with Stata 310
6.17 Tied failure times 311
6.19 Exercises 312
7 Hazard regression analysis 315
7.1 Proportional hazards model 315
7.2 Relative risks and hazard ratios 315
7.3 95% confidence intervals and hypothesis tests 317
7.4 Nested models and model deviance 317
7.5 An example: the Framingham Heart Study 317
7.5.1 Kaplan–Meier survival curves for DBP 317
7.5.2 Simple hazard regression model for CHD risk and DBP 318
7.5.3 Restricted cubic spline model of CHD risk and DBP 320
7.5.4 Categorical hazard regression model of CHD risk and DBP 323
7.5.5 Simple hazard regression model of CHD risk and gender 324
7.5.6 Multiplicative model of DBP and gender on risk of CHD 325
7.5.7 Using interaction terms to model the effects of gender and DBP on CHD 326
7.5.8 Adjusting for confounding variables 327
7.5.9 Interpretation 329
7.5.10 Alternative models 330
7.6 Proportional hazards regression analysis using Stata 331
7.7 Stratified proportional hazards models 348
7.8 Survival analysis with ragged study entry 349
7.8.1 Kaplan–Meier survival curve and the log-rank test with ragged entry 350
7.8.2 Age, sex, and CHD in the Framingham Heart Study 350
7.8.3 Proportional hazards regression analysis with ragged entry 351
7.8.4 Survival analysis with ragged entry using Stata 351
7.9 Predicted survival, log–log plots and the proportional hazards assumption 354
7.9.1 Evaluating the proportional hazards assumption with Stata 357
7.10 Hazard regression models with time-dependent covariates 359
7.10.1 Testing the proportional hazards assumption 361
7.10.2 Modeling time-dependent covariates with Stata 362
7.12 Exercises 370
8 Introduction to Poisson regression: inferences on morbidity and mortality rates 373
8.1 Elementary statistics involving rates 373
8.2 Calculating relative risks from incidence data using Stata 374
8.3 The binomial and Poisson distributions 376
8.4 Simple Poisson regression for 2×2 tables 376
8.5 Poisson regression and the generalized linear model 378
8.6 Contrast between Poisson, logistic, and linear regression 379
8.7 Simple Poisson regression with Stata 379
8.8 Poisson regression and survival analysis 381
8.8.1 Recoding survival data on patients as patient–year data 381
8.8.2 Converting survival records to person–years of follow-up using Stata 383
8.9 Converting the Framingham survival data set to person–time data 386
8.10 Simple Poisson regression with multiple data records 392
8.11 Poisson regression with a classification variable 393
8.12 Applying simple Poisson regression to the Framingham data 395
8.14 Exercises 398
9 Multiple Poisson regression 401
9.1 Multiple Poisson regression model 401
9.2 An example: the Framingham Heart Study 404
9.2.1 A multiplicative model of gender, age and coronary heart disease 405
9.2.2 A model of age, gender and CHD with interaction terms 408
9.2.3 Adding confounding variables to the model 410
9.3 Using Stata to perform Poisson regression 411
9.4 Residual analyses for Poisson regression models 423
9.4.1 Deviance residuals 423
9.5 Residual analysis of Poisson regression models using Stata 424
9.7 Exercises 427
10 Fixed effects analysis of variance 429
10.1 One-way analysis of variance 429
10.2 Multiple comparisons 431
10.3 Reformulating analysis of variance as a linear regression model 433
10.4 Non-parametric methods 434
10.5 Kruskal–Wallis Test 435
10.6 Example: a polymorphism in the estrogen receptor gene 435
10.7 User contributed software in Stata 438
10.8 One-way analyses of variance using Stata 439
10.9 Two-way analysis of variance 446
10.11 Exercises 448
11 Repeated-measures analysis of variance 451
11.1 Example: effect of race and dose of isoproterenol on blood flow 451
11.2 Exploratory analysis of repeated measures data using Stata 453
11.3 Response feature analysis 459
11.4 Example: the isoproterenol data set 460
11.5 Response feature analysis using Stata 463
11.6 The area-under-the-curve response feature 468
11.7 Generalized estimating equations 470
11.8 Common correlation structures 470
11.9 GEE analysis and the Huber–White sandwich estimator 472
11.10 Example: analyzing the isoproterenol data with GEE 473
11.11 Using Stata to analyze the isoproterenol data set using GEE 476
11.12 GEE analyses with logistic or Poisson models 481
11.14 Exercises 482
Appendices
A Summary of statistical models discussed in this text 485
A.1 Models for continuous response variables with one response per patient 485
A.2 Models for dichotomous or categorical response variables with one response per patient 486
A.3 Models for survival data (follow-up time plus fate at exit observed on each patient) 487
A.4 Models for response variables that are event rates or the number of events during a specified number of patient–years of follow-up. The event must be rare 489
A.5 Models with multiple observations per patient or matched or clustered patients 489
B Summary of Stata commands used in this text 491
B.1 Data manipulation and description 491
B.2 Analysis commands 493
B.3 Graph commands 497
B.4 Common options for graph commands (insert after comma) 500
B.5 Post-estimation commands (affected by preceding regression-type command) 502
B.6 Command prefixes 504
B.7 Command qualifiers (insert before comma) 504
B.8 Logical and relational operators and system variables (see Stata User's Guide) 505
B.9 Functions (see Stata Data Management Manual) 506
References 507
Index 513

Library of Congress Subject Headings for this publication:

Medicine -- Research -- Statistical methods -- Mathematical models.
Biometry -- methods -- Problems and Exercises.
Data Interpretation, Statistical -- Problems and Exercises.
Mathematical Computing -- Problems and Exercises.
Models, Statistical -- Problems and Exercises.