1. The MHSIP: A Tale of Three Centers P. Antonio Olmos-Gallo, Ph.D.
Kathryn DeRoche, M.A.
Mental Health Center of Denver
Richard Swanson, Ph.D., J.D.
Aurora Research Institute
John Mahalik, Ph.D., M.P.A.
Jefferson Center for Mental Health
Presented at the Organization for Program Evaluation in Colorado Annual Meeting, May 15, 2008
2. Presentation Overview Accountability in mental health
Description and intended use of the MHSIP
Review of constructs of measurement
Purpose and Methods
Results of the psychometric investigation
Reliabilities
Measurement invariance
Differential item functioning
Discussion of results
Future directions for accountability in mental health
3. Accountability in Mental Health
4. Accountability in Mental Health
5. How does accountability work in MH? Accountability has changed from Formative- to more Summative-oriented
Grant funding (Federal, Private) requires that outcomes be demonstrated (NOMS, GPRA)
State-based requirements (CCAR, MHSIP, YSSF)
Stakeholders are more attuned to accountability
6. Description and Intended Uses of the MHSIP What is the MHSIP?
What is it used for?
9. Domains of the MHSIP
12. Measurement Constructs
14. Reliability of the MHSIP
15. What are we comparing?
16. Rasch Modeling Perspective
17. Purpose and Methods Participants, Procedures, and Data Analysis
18. Purpose of the Investigation
19. Participants
20. Procedures
21. Psychometric Examination of the MHSIP Reliability, Measurement Invariance, and Differential Item Functioning
22. Comparing Subscales
23. Reliability Estimates in 2007 among Subscales and Centers
24. Reliability Summary
25. Invariance Testing Across Centers
26. Confirmatory Factor Analysis A model with all five domains could not be fit
Some of the parameters could not be estimated (Variance-Covariance matrix may not be identified)
Exploratory analyses using only Outcomes and Participation showed that Outcomes was the major culprit
28. Invariance with 3 domains We tested invariance on three domains only: Satisfaction, Access and Quality
We ran separate models for every center to have an idea up-front of their similarities/differences
Based on the fit of these separate models, problems with invariance can be expected
Center 2 had the worst fit and Center 3 had a tolerable fit; Center 1 fell between the other two
32. Measurement Invariance Whether or not we can assert that we measured the same attribute under different conditions
If there is evidence of variability (non-invariance), findings reporting differences between individuals and groups cannot be interpreted unambiguously
Differences in average scores can be just as easily interpreted as indicating that different things were measured
Correlations between variables will refer to different attributes in different groups
33. Factorial Invariance One way to test measurement invariance is FACTORIAL INVARIANCE
The main question it addresses: Do the items making up a particular measuring instrument work the same way across different populations (e.g., males and females)?
The measurement model is group-invariant
Tests for Factorial Invariance (in order of difficulty):
34. Steps in Factor Invariance testing Equivalent Factor structure
Same number of factors, items associated with the same factors (Structural model invariance)
Equivalent Factor loading paths
Factor loadings are identical for every item and every factor
35. Steps in Factor Invariance testing (cont.) Equivalent Factor variance/covariance
Variances and Covariances (correlations) among factors are the same across populations
Equivalent Item reliabilities
Residuals for every item are the same across populations
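A minimal sketch of how two successive steps in this sequence are often compared: the chi-square difference test between a less restricted model (e.g., equivalent factor structure) and a more restricted one (e.g., equivalent loadings). The slides do not state which criterion the authors used, so this is an assumption, and the fit statistics below are hypothetical numbers invented for illustration. The p-value helper is a plain series expansion written to avoid external dependencies.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a = df / 2.0
    half_x = x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi_square_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    """Compare two nested models; a small p-value suggests the added equality constraints worsen fit."""
    delta_df = df_restricted - df_free
    if delta_df <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    delta_chi2 = chi2_restricted - chi2_free
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)

# Hypothetical values (NOT from the presentation): equal-loadings model vs. equal-structure model.
delta_chi2, delta_df, p_value = chi_square_difference(245.9, 136, 210.4, 120)
print(f"delta chi2 = {delta_chi2:.1f}, delta df = {delta_df}, p = {p_value:.4f}")
```

Because each step adds constraints to the previous one, a significant difference at one step is usually taken as a reason not to proceed to the stricter steps.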
36. Results Factorial Invariance
37. Conclusions Factorial Invariance The model does not provide a good fit for the different centers
Most of the discrepancy is centered on loadings and how the domains interact with each other (variance-covariance)
Because the tests are incremental (later tests are more stringent than earlier ones), we did not test equivalent item reliabilities (the most stringent test)
38. Differential Item Functioning (DIF)
39. Differential Item Functioning
45. Summary of DIF Analysis
46. Discussion
47. What did we learn about the MHSIP? Some items and subscales (domains) do not seem to measure equally across centers
Therefore comparing centers using these items/domains may not reflect true differences in performance
It is more likely that they reflect differences in measurement (including error, difficulty, reliability)
48. Some domains are reliable, some are not
Satisfaction was OK from all 3 perspectives
Quality had some good characteristics, but some items were bad
Participation is not very reliable (it has only two items, although the items themselves were good)
Outcomes is, overall, a poor domain (bad items, many cross-loadings, correlated errors)
Employment/education may not be a desired outcome for all consumers
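To illustrate why a two-item domain such as Participation tends to have low reliability even when its items are good, here is a small sketch of Cronbach's alpha together with the Spearman-Brown projection for a lengthened scale. The slides do not name the reliability coefficient used, so alpha is an assumption, as is the premise that added items would behave like the existing ones. The response data are made up.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, all covering the same respondents in the same order."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def spearman_brown(reliability, length_factor):
    """Projected reliability if the scale were length_factor times as long (parallel items assumed)."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

# Made-up responses from five consumers to a two-item scale.
item_a = [1, 2, 3, 4, 5]
item_b = [2, 1, 4, 3, 5]
print(round(cronbach_alpha([item_a, item_b]), 3))

# A hypothetical two-item scale with reliability 0.60, doubled to four items.
print(round(spearman_brown(0.60, 2), 3))
```

Under these assumptions, doubling a scale with reliability 0.60 projects to 0.75, which is the reasoning behind adding items to a short but otherwise sound domain.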
49. Discussion Despite the fact that the samples may not be appropriate (biases, sampling frameworks that can be improved), the data at hand suggest that there are some intrinsic problems with the MHSIP
But the analyses also suggest some very specific ways to improve it
50. Suggestions Revise the Outcomes Scale (differentiate between recovery/resiliency)
Add items to participation scale
Some items in Access need to be reviewed (Q4 and Q6)
How do we deal with all these cross-loading factors?
Is it one domain (satisfaction) that we artificially broke into many domains (outcomes, access, ...)?
How does the factor structure for the entire sample (EFA included in the annual report) hold up for individual centers?
More research is needed in this area
51. More suggestions Sampling Suggestions:
Attempt to stratify the sample by consumers' needs level
At MHCD, we have developed a measure of consumers' recovery needs level (RNL)
Equating Suggestions:
Use some form of equating procedures to equate scores across centers
Using Item Response Theory techniques:
IRT could help learn more about how the MHSIP measures satisfaction/performance within/among mental health centers
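As one concrete reading of "some form of equating", the sketch below applies linear (mean-sigma) equating, which maps scores from one center onto the scale of another by matching means and standard deviations. The slides do not specify a method; linear equating is chosen here only because it is the simplest, and it assumes the two groups are comparable, which the sampling concerns above may undermine. The scores and the function name are hypothetical.

```python
from statistics import mean, stdev

def linear_equating(scores_from, scores_to):
    """Return a function that converts a score from the first scale to the second."""
    mean_from, mean_to = mean(scores_from), mean(scores_to)
    sd_from, sd_to = stdev(scores_from), stdev(scores_to)
    if sd_from == 0:
        raise ValueError("source scores have zero variance")
    slope = sd_to / sd_from

    def convert(score):
        return mean_to + slope * (score - mean_from)

    return convert

# Made-up domain scores from two hypothetical centers.
center_a = [2.0, 3.0, 4.0]
center_b = [3.0, 3.5, 4.0]
to_center_b_scale = linear_equating(center_a, center_b)
print(to_center_b_scale(4.0))
```

IRT-based equating, mentioned in the same list, would replace the mean and standard deviation matching with item parameter linking, but it needs item-level data and larger samples.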
52. More suggestions Mixed Method Design:
Conducting focus groups at each center would provide a cross-validation to quantitative measurement
This would also enhance the utilization of the results for quality improvement
Include in the annual reports the psychometrics (reliability) for every center
Helps to know how much confidence we should have in the scores
53. Questions?
54. χ² (Chi-Square): in this context, it tests the closeness of fit between the unrestricted sample covariance matrix and the restricted (model) covariance matrix. It is very sensitive to sample size: the statistic will be significant even when the model fits approximately in the population if the sample size is large.
RMSEA (Root Mean Square Error of Approximation): Analyzes the discrepancies between observed and implied covariance matrices. Its lower bound of zero indicates perfect fit, with values increasing as the fit deteriorates. It has been suggested that values below 0.1 indicate a good fit to the data, and values below 0.05 indicate a very good fit. It is recommended not to use models with RMSEA values larger than 0.1.
GFI (Goodness of Fit Index): Analogous to R² in that it indicates the proportion of variance explained by the model. Ranges between 0 and 1, with values exceeding 0.9 indicating a good fit to the data.
CFI (Comparative Fit Index): Indicates the proportion of improvement of the overall fit compared to a null (independent) model. Sample size independent, and penalizes for model complexity. It uses a 0-1 norm, with 1 indicating perfect fit. Values of about 0.9 or higher reflect a good fit.
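For readers who want to see how two of these indices relate to the chi-square statistic, here is a short sketch using their commonly cited formulas. Software packages differ in small details (for example, N versus N - 1 in the RMSEA denominator), so treat this as an approximation rather than the computation behind the reported results. All input values are hypothetical.

```python
import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation, using the N - 1 convention."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index relative to the null (independence) model."""
    model_misfit = max(chi2_model - df_model, 0.0)
    worst_misfit = max(chi2_model - df_model, chi2_null - df_null, 0.0)
    if worst_misfit == 0.0:
        return 1.0
    return 1.0 - model_misfit / worst_misfit

# Hypothetical values: tested model chi2 = 150 on 100 df, null model chi2 = 1100 on 120 df, N = 501.
print(round(rmsea(150.0, 100, 501), 4))
print(round(cfi(150.0, 100, 1100.0, 120), 4))
```

Both indices subtract the degrees of freedom from the chi-square, which is why they are less affected by sample size than the raw chi-square test.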