Bibliographic record and links to related information available from the Library of Congress catalog.

**Note:** Contents data are machine generated based on pre-publication information provided by the publisher. The contents may differ from the printed book, be incomplete, or contain other coding.

Contents

Preface

1 Introduction
1.1 Algebraic notation
1.2 Descriptive statistics
  1.2.1 Dot plot
  1.2.2 Sample mean
  1.2.3 Residual
  1.2.4 Sample variance
  1.2.5 Sample standard deviation
  1.2.6 Percentile and median
  1.2.7 Box plot
  1.2.8 Histogram
  1.2.9 Scatter plot
1.3 The Stata Statistical Software Package
  1.3.1 Downloading data from my website
  1.3.2 Creating histograms with Stata
  1.3.3 Stata command syntax
  1.3.4 Obtaining interactive help from Stata
  1.3.5 Stata log files
  1.3.6 Stata graphics and schemes
  1.3.7 Stata do files
  1.3.8 Stata pulldown menus
  1.3.9 Displaying other descriptive statistics with Stata
1.4 Inferential statistics
  1.4.1 Probability density function
  1.4.2 Mean, variance and standard deviation
  1.4.3 Normal distribution
  1.4.4 Expected value
  1.4.5 Standard error
  1.4.6 Null hypothesis, alternative hypothesis, and P-value
  1.4.7 95% confidence interval
  1.4.8 Statistical power
  1.4.9 The z and Student's t distributions
  1.4.10 Paired t test
  1.4.11 Performing paired t tests with Stata
  1.4.12 Independent t test using a pooled standard error estimate
  1.4.13 Independent t test using separate standard error estimates
  1.4.14 Independent t tests using Stata
  1.4.15 The chi-squared distribution
1.5 Overview of methods discussed in this text
  1.5.1 Models with one response per patient
  1.5.2 Models with multiple responses per patient
1.6 Additional reading
1.7 Exercises

2 Simple linear regression
2.1 Sample covariance
2.2 Sample correlation coefficient
2.3 Population covariance and correlation coefficient
2.4 Conditional expectation
2.5 Simple linear regression model
2.6 Fitting the linear regression model
2.7 Historical trivia: origin of the term regression
2.8 Determining the accuracy of linear regression estimates
2.9 Ethylene glycol poisoning example
2.10 95% confidence interval for y[x] = α + βx evaluated at x
2.11 95% prediction interval for the response of a new patient
2.12 Simple linear regression with Stata
2.13 Lowess regression
2.14 Plotting a lowess regression curve in Stata
2.15 Residual analyses
2.16 Studentized residual analysis using Stata
2.17 Transforming the x and y variables
  2.17.1 Stabilizing the variance
  2.17.2 Correcting for non-linearity
  2.17.3 Example: research funding and morbidity for 29 diseases
2.18 Analyzing transformed data with Stata
2.19 Testing the equality of regression slopes
  2.19.1 Example: the Framingham Heart Study
2.20 Comparing slope estimates with Stata
2.21 Density-distribution sunflower plots
2.22 Creating density-distribution sunflower plots with Stata
2.23 Additional reading
2.24 Exercises

3 Multiple linear regression
3.1 The model
3.2 Confounding variables
3.3 Estimating the parameters for a multiple linear regression model
3.4 R² statistic for multiple regression models
3.5 Expected response in the multiple regression model
3.6 The accuracy of multiple regression parameter estimates
3.7 Hypothesis tests
3.8 Leverage
3.9 95% confidence interval for ŷᵢ
3.10 95% prediction intervals
3.11 Example: the Framingham Heart Study
  3.11.1 Preliminary univariate analyses
3.12 Scatterplot matrix graphs
  3.12.1 Producing scatterplot matrix graphs with Stata
3.13 Modeling interaction in multiple linear regression
  3.13.1 The Framingham example
3.14 Multiple regression modeling of the Framingham data
3.15 Intuitive understanding of a multiple regression model
  3.15.1 The Framingham example
3.16 Calculating 95% confidence and prediction intervals
3.17 Multiple linear regression with Stata
3.18 Automatic methods of model selection
  3.18.1 Forward selection using Stata
  3.18.2 Backward selection
  3.18.3 Forward stepwise selection
  3.18.4 Backward stepwise selection
  3.18.5 Pros and cons of automated model selection
3.19 Collinearity
3.20 Residual analyses
3.21 Influence
  3.21.1 Δβ̂ influence statistic
  3.21.2 Cook's distance
  3.21.3 The Framingham example
3.22 Residual and influence analyses using Stata
3.23 Using multiple linear regression for non-linear models
3.24 Building non-linear models with restricted cubic splines
  3.24.1 Choosing the knots for a restricted cubic spline model
3.25 The SUPPORT Study of hospitalized patients
  3.25.1 Modeling length-of-stay and MAP using restricted cubic splines
  3.25.2 Using Stata for non-linear models with restricted cubic splines
3.26 Additional reading
3.27 Exercises

4 Simple logistic regression
4.1 Example: APACHE score and mortality in patients with sepsis
4.2 Sigmoidal family of logistic regression curves
4.3 The log odds of death given a logistic probability function
4.4 The binomial distribution
4.5 Simple logistic regression model
4.6 Generalized linear model
4.7 Contrast between logistic and linear regression
4.8 Maximum likelihood estimation
  4.8.1 Variance of maximum likelihood parameter estimates
4.9 Statistical tests and confidence intervals
  4.9.1 Likelihood ratio tests
  4.9.2 Quadratic approximations to the log likelihood ratio function
  4.9.3 Score tests
  4.9.4 Wald tests and confidence intervals
  4.9.5 Which test should you use?
4.10 Sepsis example
4.11 Logistic regression with Stata
4.12 Odds ratios and the logistic regression model
4.13 95% confidence interval for the odds ratio associated with a unit increase in x
  4.13.1 Calculating this odds ratio with Stata
4.14 Logistic regression with grouped response data
4.15 95% confidence interval for π[x]
4.16 Exact 100(1 − α)% confidence intervals for proportions
4.17 Example: the Ibuprofen in Sepsis Study
4.18 Logistic regression with grouped data using Stata
4.19 Simple 2 × 2 case–control studies
  4.19.1 Example: the Ille-et-Vilaine Study of esophageal cancer and alcohol
  4.19.2 Review of classical case–control theory
  4.19.3 95% confidence interval for the odds ratio: Woolf's method
  4.19.4 Test of the null hypothesis that the odds ratio equals one
  4.19.5 Test of the null hypothesis that two proportions are equal
4.20 Logistic regression models for 2 × 2 contingency tables
  4.20.1 Nuisance parameters
  4.20.2 95% confidence interval for the odds ratio: logistic regression
4.21 Creating a Stata data file
4.22 Analyzing case–control data with Stata
4.23 Regressing disease against exposure
4.24 Additional reading
4.25 Exercises

5 Multiple logistic regression
5.1 Mantel–Haenszel estimate of an age-adjusted odds ratio
5.2 Mantel–Haenszel χ² statistic for multiple 2 × 2 tables
5.3 95% confidence interval for the age-adjusted odds ratio
5.4 Breslow–Day–Tarone test for homogeneity
5.5 Calculating the Mantel–Haenszel odds ratio using Stata
5.6 Multiple logistic regression model
  5.6.1 Likelihood ratio test of the influence of the covariates on the response variable
5.7 95% confidence interval for an adjusted odds ratio
5.8 Logistic regression for multiple 2 × 2 contingency tables
5.9 Analyzing multiple 2 × 2 tables with Stata
5.10 Handling categorical variables in Stata
5.11 Effect of dose of alcohol on esophageal cancer risk
  5.11.1 Analyzing Model (5.25) with Stata
5.12 Effect of dose of tobacco on esophageal cancer risk
5.13 Deriving odds ratios from multiple parameters
5.14 The standard error of a weighted sum of regression coefficients
5.15 Confidence intervals for weighted sums of coefficients
5.16 Hypothesis tests for weighted sums of coefficients
5.17 The estimated variance–covariance matrix
5.18 Multiplicative models of two risk factors
5.19 Multiplicative model of smoking, alcohol, and esophageal cancer
5.20 Fitting a multiplicative model with Stata
5.21 Model of two risk factors with interaction
5.22 Model of alcohol, tobacco, and esophageal cancer with interaction terms
5.23 Fitting a model with interaction using Stata
5.24 Model fitting: nested models and model deviance
5.25 Effect modifiers and confounding variables
5.26 Goodness-of-fit tests
  5.26.1 The Pearson χ² goodness-of-fit statistic
5.27 Hosmer–Lemeshow goodness-of-fit test
  5.27.1 An example: the Ille-et-Vilaine cancer data set
5.28 Residual and influence analysis
  5.28.1 Standardized Pearson residual
  5.28.2 Δβ̂ⱼ influence statistic
  5.28.3 Residual plots of the Ille-et-Vilaine data on esophageal cancer
5.29 Using Stata for goodness-of-fit tests and residual analyses
5.30 Frequency matched case–control studies
5.31 Conditional logistic regression
5.32 Analyzing data with missing values
  5.32.1 Imputing data that is missing at random
  5.32.2 Cardiac output in the Ibuprofen in Sepsis Study
  5.32.3 Modeling missing values with Stata
5.33 Logistic regression using restricted cubic splines
  5.33.1 Odds ratios from restricted cubic spline models
  5.33.2 95% confidence intervals for ψ̂[x]
5.34 Modeling hospital mortality in the SUPPORT Study
5.35 Using Stata for logistic regression with restricted cubic splines
5.36 Regression methods with a categorical response variable
  5.36.1 Proportional odds logistic regression
  5.36.2 Polytomous logistic regression
5.37 Additional reading
5.38 Exercises

6 Introduction to survival analysis
6.1 Survival and cumulative mortality functions
6.2 Right censored data
6.3 Kaplan–Meier survival curves
6.4 An example: genetic risk of recurrent intracerebral hemorrhage
6.5 95% confidence intervals for survival functions
6.6 Cumulative mortality function
6.7 Censoring and bias
6.8 Log-rank test
6.9 Using Stata to derive survival functions and the log-rank test
6.10 Log-rank test for multiple patient groups
6.11 Hazard functions
6.12 Proportional hazards
6.13 Relative risks and hazard ratios
6.14 Proportional hazards regression analysis
6.15 Hazard regression analysis of the intracerebral hemorrhage data
6.16 Proportional hazards regression analysis with Stata
6.17 Tied failure times
6.18 Additional reading
6.19 Exercises

7 Hazard regression analysis
7.1 Proportional hazards model
7.2 Relative risks and hazard ratios
7.3 95% confidence intervals and hypothesis tests
7.4 Nested models and model deviance
7.5 An example: the Framingham Heart Study
  7.5.1 Kaplan–Meier survival curves for DBP
  7.5.2 Simple hazard regression model for CHD risk and DBP
  7.5.3 Restricted cubic spline model of CHD risk and DBP
  7.5.4 Categorical hazard regression model of CHD risk and DBP
  7.5.5 Simple hazard regression model of CHD risk and gender
  7.5.6 Multiplicative model of DBP and gender on risk of CHD
  7.5.7 Using interaction terms to model the effects of gender and DBP on CHD
  7.5.8 Adjusting for confounding variables
  7.5.9 Interpretation
  7.5.10 Alternative models
7.6 Proportional hazards regression analysis using Stata
7.7 Stratified proportional hazards models
7.8 Survival analysis with ragged study entry
  7.8.1 Kaplan–Meier survival curve and the log-rank test with ragged entry
  7.8.2 Age, sex, and CHD in the Framingham Heart Study
  7.8.3 Proportional hazards regression analysis with ragged entry
  7.8.4 Survival analysis with ragged entry using Stata
7.9 Predicted survival, log–log plots and the proportional hazards assumption
  7.9.1 Evaluating the proportional hazards assumption with Stata
7.10 Hazard regression models with time-dependent covariates
  7.10.1 Testing the proportional hazards assumption
  7.10.2 Modeling time-dependent covariates with Stata
7.11 Additional reading
7.12 Exercises

8 Introduction to Poisson regression: inferences on morbidity and mortality rates
8.1 Elementary statistics involving rates
8.2 Calculating relative risks from incidence data using Stata
8.3 The binomial and Poisson distributions
8.4 Simple Poisson regression for 2 × 2 tables
8.5 Poisson regression and the generalized linear model
8.6 Contrast between Poisson, logistic, and linear regression
8.7 Simple Poisson regression with Stata
8.8 Poisson regression and survival analysis
  8.8.1 Recoding survival data on patients as patient–year data
  8.8.2 Converting survival records to person–years of follow-up using Stata
8.9 Converting the Framingham survival data set to person–time data
8.10 Simple Poisson regression with multiple data records
8.11 Poisson regression with a classification variable
8.12 Applying simple Poisson regression to the Framingham data
8.13 Additional reading
8.14 Exercises

9 Multiple Poisson regression
9.1 Multiple Poisson regression model
9.2 An example: the Framingham Heart Study
  9.2.1 A multiplicative model of gender, age and coronary heart disease
  9.2.2 A model of age, gender and CHD with interaction terms
  9.2.3 Adding confounding variables to the model
9.3 Using Stata to perform Poisson regression
9.4 Residual analyses for Poisson regression models
  9.4.1 Deviance residuals
9.5 Residual analysis of Poisson regression models using Stata
9.6 Additional reading
9.7 Exercises

10 Fixed effects analysis of variance
10.1 One-way analysis of variance
10.2 Multiple comparisons
10.3 Reformulating analysis of variance as a linear regression model
10.4 Non-parametric methods
10.5 Kruskal–Wallis Test
10.6 Example: a polymorphism in the estrogen receptor gene
10.7 User contributed software in Stata
10.8 One-way analyses of variance using Stata
10.9 Two-way analysis of variance
10.10 Additional reading
10.11 Exercises

11 Repeated-measures analysis of variance
11.1 Example: effect of race and dose of isoproterenol on blood flow
11.2 Exploratory analysis of repeated measures data using Stata
11.3 Response feature analysis
11.4 Example: the isoproterenol data set
11.5 Response feature analysis using Stata
11.6 The area-under-the-curve response feature
11.7 Generalized estimating equations
11.8 Common correlation structures
11.9 GEE analysis and the Huber–White sandwich estimator
11.10 Example: analyzing the isoproterenol data with GEE
11.11 Using Stata to analyze the isoproterenol data set using GEE
11.12 GEE analyses with logistic or Poisson models
11.13 Additional reading
11.14 Exercises

Appendices

A Summary of statistical models discussed in this text
A.1 Models for continuous response variables with one response per patient
A.2 Models for dichotomous or categorical response variables with one response per patient
A.3 Models for survival data (follow-up time plus fate at exit observed on each patient)
A.4 Models for response variables that are event rates or the number of events during a specified number of patient–years of follow-up. The event must be rare
A.5 Models with multiple observations per patient or matched or clustered patients

B Summary of Stata commands used in this text
B.1 Data manipulation and description
B.2 Analysis commands
B.3 Graph commands
B.4 Common options for graph commands (insert after comma)
B.5 Post-estimation commands (affected by preceding regression-type command)
B.6 Command prefixes
B.7 Command qualifiers (insert before comma)
B.8 Logical and relational operators and system variables (see Stata User's Guide)
B.9 Functions (see Stata Data Management Manual)

References

Index

Library of Congress Subject Headings for this publication:

Medicine -- Research -- Statistical methods -- Mathematical models.

Biometry -- methods -- Problems and Exercises.

Data Interpretation, Statistical -- Problems and Exercises.

Mathematical Computing -- Problems and Exercises.

Models, Statistical -- Problems and Exercises.