This study focuses on determining the appropriate probability sampling method and sample size for high-resolution studies in order to derive valid conclusions applicable to the entire population. The analysis also includes power calculations for logistic regression and relative survival analysis to assess differences in standard care and survival rates.
 
                
Sampling and power analysis in the High Resolution studies
Pamela Minicozzi
Descriptive Studies and Health Planning Unit, Department of Preventive and Predictive Medicine, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan
High Resolution studies collected detailed data from patients’ clinical records, so that the influence of non-routinely collected factors (tumour molecular characteristics, diagnostic investigations, treatment, relapse) on survival and differences in standard care could be analysed
Problem
• In each country, the population of incident cases for a particular cancer consists of N subjects
• N is large (so rare cancers are not considered here)
• Since N is large, not all cases can be investigated
Solution
• Use a representative sample to derive valid conclusions that are applicable to the entire original population
Two questions
• What kind of probability sampling should we use?
• What sample size should we use?
Previous High Resolution studies
• Samples were representative of:
  • 1-year incidence
  • a time interval (e.g. 6 months) within the study period, provided that incidence was complete
  • an administratively defined area covered by cancer registration
Present High Resolution studies
• We want to eliminate variations in the type of sampling between countries and within a single country
• This implies more sophisticated sampling
Main types of probability sampling
Simple random sampling
• assign a unique number to each element of the study population
• determine the sample size
• randomly select the population elements using:
  • a table of random numbers, or
  • a list of numbers generated randomly by a computer
Advantage: auxiliary information on subjects is not required
Disadvantage: if subgroups of the population are of particular interest, they may not be included in sufficient numbers in the sample
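The steps above can be sketched in a few lines of Python. This is only an illustration under assumed values: the population size (5000), the sample size (500), the seed and the function name `simple_random_sample` are hypothetical and do not come from the High Resolution studies.

```python
import random

def simple_random_sample(population_ids, n, seed=None):
    """Draw n distinct elements; every subset of size n is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(population_ids), n)

# Hypothetical population: N = 5000 incident cases numbered 1..N.
case_ids = range(1, 5001)
sample = simple_random_sample(case_ids, n=500, seed=42)
```

Recording the seed together with the numbered case list would allow the selection to be reproduced and audited later.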
Stratified sampling
• identify the stratification variable(s) and determine the number of strata to be used (e.g. day and month of birth, year of diagnosis, cancer registry, etc.)
• divide the population into strata and determine the sample size of each stratum
• randomly select the population elements in each stratum
Advantage: a more representative sample is obtained
Disadvantage: requires information on the proportion of the total population belonging to each stratum
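A minimal sketch of stratified sampling follows, assuming proportional allocation (each stratum contributes in proportion to its share of the population). The slide does not say which allocation rule is intended, so this choice, the three registries "A", "B", "C" and their sizes are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(cases, stratum_of, n, seed=None):
    """Proportionally allocated stratified random sample of about n cases."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[stratum_of(case)].append(case)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Proportional allocation; rounding can make the total differ slightly from n.
        k = min(len(members), round(n * len(members) / len(cases)))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: (case_id, registry) pairs from three registries.
cases = ([(i, "A") for i in range(3000)]
         + [(i, "B") for i in range(3000, 4500)]
         + [(i, "C") for i in range(4500, 5000)])
sample = stratified_sample(cases, stratum_of=lambda c: c[1], n=500, seed=1)
per_registry = Counter(registry for _, registry in sample)
```

With these assumed sizes, each registry is sampled at the same 10% fraction, so small registries are guaranteed a place in the sample.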
Systematic sampling
• determine the sample size (n); the sampling interval "i" is then N/n
• randomly select a number "r" from 1 to "i"
• select the subjects in the following positions: r, r + i, r + 2*i, etc., until n subjects have been selected
Advantage: eliminates the possibility of autocorrelation
Disadvantage: only the first element is selected on a probability basis → pseudo-random sampling
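As a rough illustration, systematic selection from an ordered list can be written as below. The sketch assumes the sampling interval is the population size divided by the sample size, rounded down to an integer; the population of 5000 cases, the sample of 500 and the function name are hypothetical.

```python
import random

def systematic_sample(ordered_cases, n, seed=None):
    """Select every i-th case after a random start r in 1..i."""
    N = len(ordered_cases)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    i = N // n                      # sampling interval (integer part of N/n)
    rng = random.Random(seed)
    r = rng.randint(1, i)           # random start, inclusive of both ends
    positions = [r + k * i for k in range(n)]  # 1-based positions r, r+i, r+2i, ...
    return [ordered_cases[p - 1] for p in positions]

# Hypothetical ordered list of N = 5000 case numbers.
ordered_cases = list(range(1, 5001))
sample = systematic_sample(ordered_cases, n=500, seed=7)
```

Only the start `r` is random here; once it is drawn, every other position is fully determined.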
How many subjects do we need?
The main elements
• Previous pilot studies: they explored whether some variables can be measured with sufficient precision (or are available) and checked the study vision
• Power: the probability that the difference will be detected (e.g. 80%, 90%)
• Significance level: the probability that a positive finding is due to chance alone (e.g. 1%, 5%)
Previous High Resolution studies
• The number of patients was defined based on:
  • observed differences in survival and risk of death
  • incidence of the cancer under study
  • difficulties in collecting clinical information
  • available economic resources
Nevertheless:
• we were able to identify statistically significant relative excess risks of death
  • up to 1.60 among European countries
  • up to 1.40 among Italian areas
  for breast cancer, for which differences in survival are small
→ Applicable to other cancers for which survival differences are larger
Example for breast cancer (diagnosis 1995-1999)
Power plotted as a function of the hazard ratio for a two-sided log-rank test at the 5% significance level (target power: 80%), over sample sizes ranging from 100 to 1000
Assume 75% survival as reference (the overall survival in Europe; range: 65-90%)
[Figure: power curves; value marked on the plot: 45%]
Example for colorectal cancer (diagnosis 1995-1999)
Power plotted as a function of the hazard ratio for a two-sided log-rank test at the 5% significance level (target power: 80%), over sample sizes ranging from 100 to 1000
Assume 50% survival as reference (the overall survival in Europe; range: 30-70%)
[Figure: power curves; value marked on the plot: 32%]
Example for lung cancer (diagnosis 1995-1999)
Power plotted as a function of the hazard ratio for a two-sided log-rank test at the 5% significance level (target power: 80%), over sample sizes ranging from 100 to 1000
Assume 10% survival as reference (the overall survival in Europe; range: 5-20%)
[Figure: power curves; value marked on the plot: 30%]
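Curves of this kind can be approximated with Schoenfeld's formula for the log-rank test. The sketch below is not necessarily the method used to produce the plots in these slides. It assumes two groups of equal size, proportional hazards (so the comparison group's survival is the reference survival raised to the power of the hazard ratio), and that the sample size refers to both groups combined. The function name `logrank_power` is hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

def logrank_power(n_total, hazard_ratio, ref_survival, alpha=0.05):
    """Approximate power of a two-sided log-rank test (Schoenfeld formula)."""
    std_normal = NormalDist()
    n_group = n_total / 2                          # assumed equal allocation
    comp_survival = ref_survival ** hazard_ratio   # proportional hazards
    # Expected number of deaths in the two groups combined.
    events = n_group * (1 - ref_survival) + n_group * (1 - comp_survival)
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    # With equal allocation, the test statistic has mean sqrt(events)/2 * |ln HR|.
    return std_normal.cdf(sqrt(events) / 2 * abs(log(hazard_ratio)) - z_alpha)

# Power over a grid of hazard ratios and sample sizes, reference survival 75%.
hazard_ratios = [1.2, 1.4, 1.6, 1.8, 2.0]
sample_sizes = [100, 250, 500, 1000]
power_grid = {
    n: [logrank_power(n, hr, ref_survival=0.75) for hr in hazard_ratios]
    for n in sample_sizes
}
```

Changing `ref_survival` to 0.50 or 0.10 gives the analogous grids for the colorectal and lung cancer settings; a lower reference survival means more expected deaths and therefore more power at the same sample size.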
Present High Resolution studies
We want to analyse both differences in survival and adherence to standard care
Power analysis is therefore needed for both:
• logistic regression analysis, to analyse the odds of receiving one type of care (typically standard care)
• relative survival analysis, to analyse differences in relative survival and relative excess risks of death
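For the logistic regression part, a simplified power calculation is possible when the comparison is between two groups only (for example two areas) with no adjustment for other covariates. In that case, testing an odds ratio amounts to comparing two proportions. The sketch below uses the usual normal approximation under these assumptions; equal group sizes, the example values and the function name `logistic_power_binary` are hypothetical, and an adjusted model would need a more elaborate method.

```python
from math import sqrt
from statistics import NormalDist

def logistic_power_binary(n_total, odds_ratio, ref_proportion, alpha=0.05):
    """Approximate power to detect an odds ratio between two equal-sized groups."""
    std_normal = NormalDist()
    n_group = n_total / 2                       # assumed equal allocation
    p1 = ref_proportion                         # proportion given standard care, reference group
    odds2 = odds_ratio * p1 / (1 - p1)          # odds in the comparison group
    p2 = odds2 / (1 + odds2)
    p_bar = (p1 + p2) / 2
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    numerator = (abs(p1 - p2) * sqrt(n_group)
                 - z_alpha * sqrt(2 * p_bar * (1 - p_bar)))
    denominator = sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return std_normal.cdf(numerator / denominator)

# Hypothetical setting: 50% receive standard care in the reference group, odds ratio 2.
power_small = logistic_power_binary(100, odds_ratio=2.0, ref_proportion=0.5)
power_large = logistic_power_binary(500, odds_ratio=2.0, ref_proportion=0.5)
```

Comparing `power_small` with `power_large` shows how strongly the detectable odds ratio depends on the number of cases collected per group.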
Conclusions
• Taking into account:
  • existing sampling and power methodology
  • experience from previous studies
  • different coverage of Cancer Registries
  • available economic resources
• We want to:
  • standardize the selection of data
  • include a minimum number of cases that satisfies statistical considerations related to all aims of our studies
Prof. J.S. Long¹ (Regression Models for Categorical and Limited Dependent Variables, 1997) suggests that sample sizes of fewer than 100 cases should be avoided and that 500 observations should be adequate for almost any situation.
¹ Professor of Sociology and Statistics at Indiana University
Thank you for your attention
And… what about your experience?