Statistical methods for marginal inference from multivariate ordinal data Nooraee, Nazanin

Citation for published version (APA): Nooraee, N. (2015). Statistical methods for marginal inference from multivariate ordinal data [Groningen]: University of Groningen. DOI: /j.csda
Document version: Publisher's PDF, also known as Version of record. Publication date: 2015.

Chapter 2
GEE for Longitudinal Ordinal Data: Comparing R-geepack, R-multgee, R-repolr, SAS-GENMOD, SPSS-GENLIN

N. Nooraee, G. Molenberghs, and E.R. van den Heuvel
Computational Statistics & Data Analysis. 2014; 77:

2.1 Abstract

Studies in epidemiology and the social sciences are often longitudinal, and outcome measures are frequently obtained from questionnaires on ordinal scales. To understand the relationship between explanatory variables and outcome measures, generalized estimating equations (GEE) can be applied to provide a population-averaged interpretation and to address the correlation between outcome measures. GEE can be performed by different software packages, but a motivating example showed differences in the output.
This paper investigated the performance of GEE in R (version 3.0.2), SAS (version 9.4), and SPSS (version ) using simulated data under default settings. Multivariate logistic distributions were used in the simulation to generate correlated ordinal data. The simulation study demonstrated substantial bias in the parameter estimates and numerical issues for data sets with a relatively small number of subjects. The unstructured working association matrix requires larger numbers of subjects than the independence and exchangeable working association matrices to reduce the bias and diminish numerical issues. The coverage probabilities of the confidence intervals for fixed parameters were satisfactory for the independence and exchangeable working association matrices, but they were frequently liberal for the unstructured option. Based on the performance and the available options, SPSS, and multgee and repolr in R, all perform quite well for relatively large sample sizes (e.g. 300 subjects), but multgee seems to do a little better than SPSS and repolr in most settings.

Key words: Correlated ordinal data, generalized estimating equations, copula, multivariate logistic distribution, bridge distribution.

2.2 Introduction

Motivating example

Change in quality of life was investigated in a study of women who underwent a laparoscopic hysterectomy (surgery). In total, 72 patients were measured using the Short Form-36 Health Survey questionnaire before surgery (baseline), six weeks after surgery, and six months after surgery. One specific domain is the emotional role (ER). It was scored with just one item having four possible outcome levels, coded {1, 2, 3, 4}. Higher scores indicate a higher quality of life. The goal was to investigate whether ER was affected by surgery and to determine the role of some explanatory variables, such as age (a), comorbidity (cm), blood loss (bl) and complications (c) during surgery, and duration (d) of surgery.
We decided to implement the following model

\[ \operatorname{logit}[P(O_{ij} \le c)] = \beta_{0c} + \beta_1 a_i + \beta_2 cm_i + \beta_3 t_{ij} + (\alpha_1 bl_i + \alpha_2 c_i + \alpha_3 d_i)\,\delta_{t_{ij}}, \]

with $O_{ij}$ the $j$th ordinal response for subject $i$, $t_{ij}$ the $j$th time moment for subject $i$ ($t_{i1} = 0$, $t_{i2} = 6$, and $t_{i3} = 26$ weeks), and with $\delta_x$ an indicator variable equal to one when $x > 0$ and zero otherwise. The indicator $\delta_x$ is needed because the covariates bl, d, and c can only affect ER after surgery. The parameter $\beta_3$ would indicate the effect of surgery over time when corrected for other variables.

We decided to estimate the parameters with generalized estimating equations (GEE) to obtain a population-averaged interpretation and to address the correlation between subject outcomes. We applied the geepack (ordgee function), repolr (repolr function) and multgee (ordLORgee function) packages in R under default settings and selected the most complex working association structure available in each package: unstructured working association in geepack and multgee, and exchangeable working correlation in repolr. Geepack and multgee provided surprisingly different results (Table 2.1), while repolr produced no parameter estimates because cell probabilities were estimated to be equal to one. The highest score of 4 was indeed frequently observed: almost 90 percent six months after surgery.

Not yet completely satisfied with the results, we decided to analyse these data also with SAS (GENMOD procedure) and SPSS (GENLIN command) to verify the parameter estimates of multgee and geepack. Based on the options available, we chose the unstructured working correlation matrix in SPSS and the independence structure in SAS. Similar to repolr, SPSS did not converge, but SPSS was able to produce results with the exchangeable correlation structure. These results are also listed in Table 2.1.

Table 2.1 The parameter estimates (robust/empirical standard error) under an independent working correlation matrix.
Parameters     geepack        multgee        SPSS           SAS            multgee
               Unstructured   Unstructured   Exchangeable   Independent    Exchangeable
Threshold      (1.040)        0.533(0.947)   0.846(0.932)   0.289(1.063)   0.647(0.995)
Threshold      (1.020)        1.007(0.907)   1.306(0.902)   0.739(1.009)   1.090(0.918)
Threshold      (1.019)        1.368(0.925)   1.657(0.916)   1.077(1.023)   1.438(0.934)
Age            (0.019)        (0.017)        0.039(0.017)   (0.019)        (0.017)
Comorbidity    (0.551)        (0.411)        0.508(0.414)   (0.439)        (0.421)
Time           (0.020)        (0.017)        0.025(0.018)   (0.018)        (0.017)
Blood loss     (0.001)        (0.001)        0.003(0.001)   (0.001)        (0.001)
Complication   0.592(1.001)   1.832(0.677)   (0.626)        1.812(0.800)   1.700(0.654)
Duration       0.000(0.000)   0.000(0.000)   0.000(0.000)   0.000(0.000)   0.000(0.000)

Comparing the results reveals several differences. First, not all packages converge. Second, the parameter estimates differ between the packages (Table 2.1). This could be due to the different choices of correlation structure, but differences remain even when the same class of structure is chosen. Indeed, as already mentioned, geepack and multgee provided different results for the unstructured association, and SPSS and multgee also produce different results under the exchangeable structure (Table 2.1). Not only do the estimates differ in this case, they are also opposite in sign. When each package is run with the independence structure, all packages give identical results (equal to those of SAS in Table 2.1), except for geepack, which leads to completely different results, and SPSS, which gives the same absolute values but opposite signs. These differences in performance and in estimates encouraged us to investigate the similarities and discrepancies between the GEE methods in R (version 3.0.2), SAS (version 9.4), and SPSS (version ) for longitudinal ordinal data using simulation studies. In these studies we would know what mean models the software should estimate.
Note that they all estimate the same mean model and treat the associations as nuisance parameters, although they may have implemented different association structures (even within the same class).

Background

Generalized estimating equations (GEE) were introduced by Liang and Zeger [18,52] as a general approach for handling correlated discrete and continuous outcome variables. The approach only requires specification of the first moments, the second moments, and the correlation among the outcome variables. The goal of this procedure is to estimate fixed parameters without specifying the joint distribution. Prentice [39] extended the GEE approach by improving the estimation of the correlation parameters using a second set of equations based on Pearson's residuals, see also Lipsitz and Fitzmaurice [20]. Others modeled the association parameter as an odds ratio [22,19,4]. An alternative approach considered latent variables with a bivariate normal distribution underlying the correlated binary variables, see Qu et al. [40].

Extending GEE to ordinal data is not immediately obvious because the first and second moments are not defined for ordinal observations. It requires the introduction of a vector of binary variables that relates one-to-one to the ordinal variables [7]. With this set of binary variables, the original GEE method [18,52] as well as the method for estimation of the association parameters can be extended to ordinal data [21,11,38,44].

Different approaches have been used to estimate the association parameters in GEE. Lipsitz et al. [21] used Pearson's residuals, while Parsons et al. [38] minimized the logarithm of the determinant of the covariance matrix of the fixed parameters, i.e. minimized the standard errors of the parameter estimates. Instead of using correlations, Lumley [24] applied common odds ratios for the association of multivariate ordinal variables to reduce the number of association parameters. Williamson et al.
[46] suggested a GEE method for bivariate ordinal responses with the global odds ratio as the measure of dependency. In this context, two sets of equations were used: one for the fixed parameters and one for the association parameters. To make the approach available to others, Williamson et al. [47] developed two SAS macros, but they were not officially incorporated in SAS. Yu and Yuan [51] developed one macro that extended these two macros to unbalanced data; it is only available upon request from the authors. The approach with two sets of equations was further extended to multivariate ordinal outcomes using global odds ratios as the measure of dependency, while the two sets of equations can be integrated into one set of equations for the fixed and association parameters simultaneously (see Heagerty and Zeger [11]). Nores and del Pilar Díaz [31] investigated the efficiency and convergence of this approach via simulation. They applied the function ordgee in R. Recently, Touloumis et al. [44] extended the GEE method for ordinal outcomes by considering local odds ratios as the measure of association.

Several overviews of GEE have been provided. Ziegler et al. [55] developed a bibliography of GEE, and Zorn [56] indicated the use of GEE in political science. Two recent books, by Ziegler [53] and by Hardin and Hilbe [10], were fully dedicated to GEE, while Agresti and Natarajan [2], Liu and Agresti [23], and Agresti [1] provided comprehensive reviews of more general models and methods for (correlated) categorical data. Two particular overviews focused on the models and tests that were programmed in the software packages LogXact 4.1, SAS 8.2, Stata 7, StatXact 5, and Testimate 6 for (correlated) categorical outcomes, including GEE [32,33]. Oster and Hilbe [34] also presented a general overview of software packages on exact methods, but they did not investigate the performance of these packages.
Ziegler and Grömping [54] and Horton and Lipsitz [13] did compare software packages for the analysis of correlated data via GEE, but they focused on binary outcomes only. A comprehensive comparison of frequently used software packages for correlated ordinal data using GEE has not yet been conducted.

We applied a simulation study to compare the functions ordgee in geepack, ordLORgee in multgee and repolr in package repolr in R 3.0.2, the procedure GENMOD in SAS 9.4, and finally the procedure GENLIN in SPSS. We took the perspective of a general user with limited knowledge of the mathematical and numerical details of GEE. This means that we mainly used default settings in the simulation study. We simulated moderately to highly correlated multivariate logistic distributed latent variables using copula functions to obtain correlated ordinal data. This choice implies logit models for the marginal distributions, but the correlation between the binary variables coding the ordinal outcomes is different from the choices implemented in the software. We investigated the frequency of simulation runs with numerical convergence issues, and the bias in the parameter estimates. We reported the coverage probabilities of the confidence intervals on these parameters using the Wald statistic. Finally, we provided rejection rates of the proportionality test (if available).

2.3 Generalized estimating equations

Generalized estimating equations for ordinal outcomes require the specification of several aspects. The first aspect is to choose a model for the covariates and a non-linear link function to connect the model to the cumulative probabilities. The second aspect is to create a set of binary variables describing all possible outcomes for the ordinal observations [7]. The third aspect is to choose a working correlation matrix or working association structure to describe the possible association between all binary variables.
The fourth and final aspect is the estimation method for the association parameters involved in the association structure.

To illustrate these aspects in more detail, consider a random sample of observations from $n$ subjects. Let $O_i = (O_{i1}, O_{i2}, \dots, O_{in_i})$ be the ordinal responses of subject $i$, where $O_{it}$ takes values in $\{1, 2, \dots, C\}$, and let $X_i = (X_{i1}, X_{i2}, \dots, X_{in_i})$ be a $p \times n_i$ dimensional matrix of time-varying and/or time-stationary covariates for subject $i$. Then the connection between the covariates and the conditional probabilities of each ordinal outcome is described by

\[ h[P(O_{it} \le c \mid X_{it} = x_{it})] = \beta_{0c} + x_{it}^{\top}\beta_1, \tag{2.1} \]

for $c = 1, 2, \dots, C-1$, with $\beta_{0c}$ the threshold parameter for level $c$, $\beta_1$ the vector of regression coefficients corresponding to the covariates, and $h$ a known link function. Any monotone increasing function $h$ that maps the interval $(0, 1)$ to $(-\infty, \infty)$ could be applied as the link function [26], e.g. logit, probit and complementary log-log. The cumulative logit model is very popular for clustered ordinal outcomes because its interpretation is as simple and comprehensive as in logistic regression. This model is often referred to as the proportional odds model [1]. The cumulative probit model is more popular in econometrics, but then the parameters can no longer be interpreted in terms of odds ratios. The formulation in (2.1) is ascending in terms of the level of the ordinal outcomes, but the model can be changed to descending, in which $O_{it} \le c$ is replaced by $O_{it} \ge c$.

There are three options for choosing the binary variables $Y_{it} = (Y_{it1}, Y_{it2}, \dots, Y_{it,C-1})$, with dimension $C-1$. The first option selects $Y_{itc} = I(O_{it} = c)$ (see Lipsitz et al. [21] and Touloumis et al. [44]), the second option selects $Y_{itc} = I(O_{it} > c)$ (see Heagerty and Zeger [11]), and finally the third option selects $Y_{itc} = I(O_{it} \le c)$ (see Parsons et al. [38]). Note that for all options $c = 1, 2, \dots, C-1$ and $I(\cdot)$
is the indicator function equal to one when the argument is true and zero otherwise.

Consequently, the mean vector $\mu_i = E(Y_i \mid X_i = x_i)$ is the mean of all binary variables $Y_i = (Y_{i1}, \dots, Y_{in_i})$. Now the vector of regression parameters $\beta = (\beta_{01}, \beta_{02}, \dots, \beta_{0,C-1}, \beta_{11}, \beta_{12}, \dots, \beta_{1p})$ can be estimated using the GEE method by solving

\[ u(\beta) = \sum_{i=1}^{n} D_i^{\top} V_i^{-1} [Y_i - \mu_i] = 0, \tag{2.2} \]

where $D_i = \partial\mu_i / \partial\beta$ and $V_i$ is the so-called weight matrix or working covariance matrix of $Y_i$. This matrix may depend on the vector of parameters $\beta$ and the vector of association parameters $\alpha$ for the binary variables. Liang and Zeger [18] and Lipsitz et al. [21] showed that, for any parameterisation of the matrix $V_i$ and assuming that the marginal model (2.1) is correctly specified, the solution $\hat\beta$ of (2.2) is a consistent estimator of $\beta$ and $\sqrt{n}(\hat\beta - \beta)$ has an asymptotic multivariate normal distribution with mean vector 0 and covariance matrix $V_\beta = \lim_{n\to\infty} n V_\beta(n)$, with $V_\beta(n)$ defined by

\[ V_\beta(n) = \left( \sum_{i=1}^{n} D_i^{\top} V_i^{-1} D_i \right)^{-1} \left[ \sum_{i=1}^{n} D_i^{\top} V_i^{-1}\, \mathrm{COV}(Y_i)\, V_i^{-1} D_i \right] \left( \sum_{i=1}^{n} D_i^{\top} V_i^{-1} D_i \right)^{-1}. \tag{2.3} \]

This form of variance is referred to as the empirical or robust variance estimator since it provides a consistent estimator regardless of the (mis)specification of $V_i$ [21]. A model-based standard error would be obtained when $\mathrm{COV}(Y_i)$ in (2.3) is replaced by the matrix $V_i$; the covariance matrix in (2.3) would then reduce to the last term in (2.3), which means that $V_\beta^{-1}(n) = \sum_{i=1}^{n} D_i^{\top} V_i^{-1} D_i$. It should be noted, however, that the choice for a model-based estimator does not imply that the working covariance matrix $V_i$ for the binary vector $Y_i$ is a true covariance matrix. Issues related to covariance matrices for multivariate binary outcome variables were discussed in [5,6]. Fortunately, these issues do not cause difficulties in applying GEE, since the multivariate distribution can always partially be described by semiparametric models, see Molenberghs and Kenward [28].
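The relation between the empirical (robust) variance and the model-based variance described above can be illustrated numerically. The following is only a minimal sketch in Python with NumPy, not the implementation used by any of the packages compared in this chapter. The function name `gee_covariances` and its arguments are hypothetical, and the sketch assumes that $\mathrm{COV}(Y_i)$ is estimated by the outer product of the residual vector $Y_i - \mu_i$, a common plug-in choice that is not stated in the text.

```python
import numpy as np


def gee_covariances(D_list, V_list, resid_list):
    """Return (model_based, robust) covariance estimates for a GEE solution.

    Hypothetical helper for illustration only.
    D_list[i]     : (m_i, q) derivative matrix d mu_i / d beta
    V_list[i]     : (m_i, m_i) symmetric working covariance matrix of Y_i
    resid_list[i] : (m_i,) residual vector Y_i - mu_i
    """
    q = D_list[0].shape[1]
    bread = np.zeros((q, q))  # sum_i D_i' V_i^{-1} D_i
    meat = np.zeros((q, q))   # sum_i D_i' V_i^{-1} COV(Y_i) V_i^{-1} D_i

    for D, V, r in zip(D_list, V_list, resid_list):
        V_inv_D = np.linalg.solve(V, D)  # V_i^{-1} D_i without an explicit inverse
        bread += D.T @ V_inv_D
        # Assumption: COV(Y_i) is replaced by the residual outer product r r',
        # so the middle term becomes the outer product of D_i' V_i^{-1} r.
        score = V_inv_D.T @ r            # relies on V_i being symmetric
        meat += np.outer(score, score)

    model_based = np.linalg.inv(bread)          # outer factor of the sandwich alone
    robust = model_based @ meat @ model_based   # sandwich form
    return model_based, robust
```

In this sketch the model-based matrix is simply the outer factor of the sandwich, so the two estimates coincide only when the middle term equals $\sum_i D_i^{\top} V_i^{-1} D_i$, i.e. when the working covariance matches the covariance actually used in the middle term.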
To be able to determine GEE estimates, the vector of association parameters $\alpha$ should be estimated. Commonly, the matrix $V_i$ is re-parameterized as

\[ V_i = A_i^{1/2} R_i(\alpha) A_i^{1/2}, \tag{2.4} \]

with $A_i$ an $n_i(C-1) \times n_i(C-1)$ diagonal matrix with elements given by the variances of the binary variables $Y_{itc}$, and with the matrix $R_i(\alpha)$ consisting of the associations between the binary variables. The matrix $R_i$ contains three types of association. The first is the association between the binary variables at one time point. The second is the association of the same coded binary variables across time, and the third and final type is the association of two differently coded binary variables across time. Thus the variance of each ordinal outcome and the association between any pair of ordinal outcomes are represented by matrices rather than scalars.

Although Pearson's correlation has been applied to the association between binary variables within the same time point, different association measures have been applied to model the dependency between binary variables across time. Lipsitz et al. [21] assumed Pearson's correlation for all associations between binary variables and estimated the association parameters $\alpha$ with Pearson's residuals. Restricting to the logit link function, Parsons et al. [38] described the association between each pair of binary variables over time as the product of a function of a single parameter $\alpha$ and Pearson's correlation of the same pair of binary variables within a time point, i.e. $g_{st}(\alpha)\exp(-|\beta_{0c} - \beta_{0k}|/2)$, see Kenward et al. [15]. This scalar parameter is estimated by minimizing the logarithm of the determinant of the covariance matrix ($\log|V_\beta(n)|$) of the parameter estimates in each step of the fitting algorithm for solving (2.2).

As an alternative to Pearson's correlation, one can use the odds ratio. Heagerty and Zeger [11] applied global odds ratios for the association of repeated binary variables in the matrix $V_i$.
They applied a second set of estimating equations of the form (2.2) to obtain these association parameters. This choice was first introduced for binary outcomes by Prentice [39], and for ordinal outcomes by Miller et al. [27]. Touloumis et al. [44] utilized local odds ratios to capture the association parameters in the matrix $V_i$. They used Goodman's row and column effects model [9] to reparameterize the local odds ratios, and then estimated the