Effect of Combining Transferable Data on Model Sensitivities in Home Interview Surveys

Effect on Model Sensitivities of Combining Transferable Data from Separate Home Interview Surveys
Presented to the 11th Conference on Transportation Planning Applications, May 8, 2007
By Jonathan Avner, Wilbur Smith Associates, and Gregory Giaimo, Ohio Department of Transportation

Outline
• Analysis of Data Transferability
• Production Rates
• Trip Length Analysis
• Time of Day Analysis
• Sensitivity of Models with Transferred Data

Background
• ODOT undertook a household survey data collection effort in 2000 to support the development of a new generation of travel demand models for the small and medium-sized MPOs.
• In total, over sixteen thousand households were surveyed (in MPO and non-MPO areas), yielding more than 100,000 trip records.

Survey Household Locations

Data Transferability

Data Transferability
• Previous research has focused on the feasibility of avoiding new surveys by borrowing data from other areas.
• This research focused on combining data to obtain improved parameter estimates.
• Each area had 1,300 to 1,900 households surveyed and would receive the same model design with calibrated parameters.
• Model components considered:
  • Trip Production Rates
  • Trip Distribution (Friction Factor Calibration)
  • Time of Day

Areas Considered for Combination

Trip Production Rate Analysis - Purpose
• Determine whether datasets could be combined into larger estimation datasets for better parameter estimates.
• Depending on purpose, trip rates are stratified by wealth (vehicles per household), size (household size, household workers, etc.) and possibly area type.
• Only the combined datasets could achieve the minimum number of observations per cell once area type stratification is added; without area type, the individual datasets may be sufficient.
• Thus, if the area type dimension is needed, combining study area datasets could be necessary.
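The cell-count reasoning above can be sketched in a few lines. This is a minimal sketch assuming hypothetical household records of the form (study area, vehicles, household size, area type, trips); the record layout and function names are illustrative only. The threshold of more than 30 households per cell is taken from the area type slide.

```python
from collections import defaultdict

# Threshold from the area type slide: a cell needs more than 30 households.
MIN_HH_PER_CELL = 30


def cell_counts(households, areas, with_area_type):
    """Count households per cross-class cell for the selected study areas.

    Each household is a hypothetical tuple:
    (study_area, vehicles, hh_size, area_type, trips).
    """
    counts = defaultdict(int)
    for area, vehicles, hh_size, area_type, _trips in households:
        if area not in areas:
            continue
        if with_area_type:
            key = (vehicles, hh_size, area_type)
        else:
            key = (vehicles, hh_size)
        counts[key] += 1
    return dict(counts)


def sparse_cells(counts, minimum=MIN_HH_PER_CELL):
    """Return the cells that do not exceed the minimum household count."""
    return sorted(cell for cell, n in counts.items() if n <= minimum)


# Two hypothetical study areas, each too small on its own for this cell.
survey = [("A", 1, 2, "Urban", 5)] * 20 + [("B", 1, 2, "Urban", 6)] * 20
print(sparse_cells(cell_counts(survey, {"A"}, with_area_type=True)))       # → [(1, 2, 'Urban')]
print(sparse_cells(cell_counts(survey, {"A", "B"}, with_area_type=True)))  # → []
```

Pooling the two areas lifts the cell above the threshold, which is the situation in which adding the area type dimension becomes workable.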

Trip Production Rate Analysis – Statistical Analysis
• The mean trip production rate was compared cell by cell for each combination (small, large, group 1, group 2).
• ANOVA (analysis of variance) was used because more than two samples were being compared.
• Results were based on the F statistic:
  • Ratio of between-group variability to within-group variability
  • Value close to 1 → accept Ho (means are equal)
  • Value much > 1 → reject Ho (means are not equal)
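The F statistic described above can be computed directly. Below is a minimal plain-Python sketch of one-way ANOVA. It assumes each group is a list of household trip counts for one cross-class cell in one study area. If SciPy is available, `scipy.stats.f_oneway` returns the same statistic along with a p-value.

```python
from statistics import fmean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of samples (one list per group).

    F = (between-group sum of squares / (k - 1))
        / (within-group sum of squares / (N - k))
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]

    # Variability of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variability of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical trips per household in one cell for three study areas.
print(anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # → 27.0
```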

Trip Production Rate Analysis – Area Type
• For the larger MPOs, trip rates were compared between area types to determine the need for this dimension.
• Four area types are used in trip generation: CBD, Urban, Suburban, Rural.
• Average trip rates between area types in the large MPOs were tested using ANOVA.

Trip Production Rate Analysis – Area Type
• High F statistics indicated differences in average trip rates between area types.
• Unique production rates were calibrated for area types, or combinations of area types, when:
  • The F statistic between area types was large; and
  • The sample size in each cell was large enough (more than 30 households per cell)

Trip Production Rate Analysis - Results
• The chosen combination of study area data would be applied to all trip purposes in the trip generation model.
• Because the ANOVA was performed at the cell level, an overall “score” had to be developed for each combination.
• Households in each cell of a combination were added together if the cell's F statistic was not significant (accept Ho).
• The results give the percentage of households that are in cells with similar trip rates.
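The scoring rule can be sketched as follows. This minimal sketch assumes each cell carries its household count, its ANOVA F statistic and a critical F value. The critical values, and the significance level behind them, are inputs the caller must supply because the slides do not state them. The dictionary keys are hypothetical.

```python
def combination_score(cells):
    """Percent of households that fall in cells with statistically similar trip rates.

    Each cell is a dict with hypothetical keys:
      households - surveyed households in the cell (all areas in the combination)
      f_stat     - ANOVA F statistic across the areas for that cell
      f_crit     - critical F value for the cell's degrees of freedom
    A cell counts as "similar" when f_stat <= f_crit (accept Ho).
    """
    total = sum(c["households"] for c in cells)
    similar = sum(c["households"] for c in cells if c["f_stat"] <= c["f_crit"])
    return 100.0 * similar / total


cells = [
    {"households": 60, "f_stat": 1.1, "f_crit": 3.0},  # similar: accept Ho
    {"households": 40, "f_stat": 5.2, "f_crit": 3.0},  # different: reject Ho
]
print(combination_score(cells))  # → 60.0
```

Weighting by households rather than counting cells means that large, well-populated cells dominate the score. This matches the slide's description of adding households together.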

Trip Production Rate Analysis - Recommendations
• Group 2 removed because of its overlap with the Large combination
• Dayton removed because of its independent model development

Trip Length Analysis - Purpose
• The intent of the analysis was to find areas that could share a friction factor curve.
• The same combination datasets were considered: small, large, group 1 and group 2.

Trip Length Analysis – Statistical Analysis
• Trips used in the analysis were restricted to those with both trip ends within an MPO area and with known trip end locations.
• Skimmed trip lengths were used in the analysis rather than reported trip lengths.
• ANOVA was used to compare average trip length.
• Trips were compared two ways:
  • The same trip purpose across areas
  • Different purposes within an area, to see whether differences existed
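The trip selection rules above can be sketched as a filter that collects skimmed lengths per area. This minimal sketch assumes trips are dicts with hypothetical keys `area`, `origin` and `destination`, with `None` marking a trip end that could not be located. It also assumes each MPO has its own zone set and its own network skim. The resulting per-area lists are what an ANOVA on average trip length would compare.

```python
from collections import defaultdict
from statistics import fmean


def skimmed_lengths_by_area(trips, mpo_zones, skims):
    """Collect skimmed trip lengths per area for trips usable in the analysis.

    trips     - iterable of dicts with hypothetical keys "area", "origin", "destination"
                (zone ids, or None when the trip end could not be located)
    mpo_zones - {area: set of zone ids inside that MPO}
    skims     - {area: {(origin, destination): network distance}}
    """
    lengths = defaultdict(list)
    for trip in trips:
        area, o, d = trip["area"], trip["origin"], trip["destination"]
        if o is None or d is None:
            continue  # unknown trip end location
        zones = mpo_zones[area]
        if o not in zones or d not in zones:
            continue  # at least one trip end outside the MPO area
        lengths[area].append(skims[area][(o, d)])
    return dict(lengths)


trips = [
    {"area": "A", "origin": 1, "destination": 2},
    {"area": "A", "origin": 1, "destination": None},  # dropped: unknown end
    {"area": "A", "origin": 1, "destination": 99},    # dropped: outside MPO
    {"area": "B", "origin": 5, "destination": 6},
]
lengths = skimmed_lengths_by_area(
    trips, {"A": {1, 2}, "B": {5, 6}}, {"A": {(1, 2): 4.0}, "B": {(5, 6): 7.5}}
)
print({area: fmean(v) for area, v in lengths.items()})  # → {'A': 4.0, 'B': 7.5}
```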

Trip Length Analysis by Purpose - Results
• Results indicate significant differences in average trip length between the areas within each combination dataset.
• These findings are logical given differences in network characteristics, geographic size of area and other travel-related factors.

Average Trip Length by MPO Area

Trip Length Frequency Distribution - HBW

Trip Length Analysis by Area - Results
• Comparing purposes within an MPO area showed little potential for combination.
• This is consistent with the traditional approach of a unique gravity model for each trip purpose.

Average Trip Length by Purpose by Area

Time of Day - Purpose
• Determine whether datasets could be combined to estimate the time of day and directional factors for the Time of Day model.
• The Coincidence Ratio was used to determine whether all areas shared a similar daily distribution of trips.
• Four time periods were defined:
  • Overnight (6pm to 6am)
  • AM Peak Period (6am to 9am)
  • Midday (9am to 2pm)
  • PM Peak Period (2pm to 6pm)
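The coincidence ratio compares two distributions by dividing their overlap by their combined extent. It is computed as the sum over periods of the smaller share divided by the sum of the larger share. The following is a minimal sketch using the four periods defined above. It assumes that each period includes its start hour and excludes its end hour. The trip counts are hypothetical.

```python
PERIODS = ("Overnight", "AM Peak", "Midday", "PM Peak")


def period_of(hour):
    """Map a departure hour (0-23) to one of the four periods.

    Assumes each period includes its start hour and excludes its end hour.
    """
    if 6 <= hour < 9:
        return "AM Peak"
    if 9 <= hour < 14:
        return "Midday"
    if 14 <= hour < 18:
        return "PM Peak"
    return "Overnight"  # 6pm to 6am


def coincidence_ratio(counts_a, counts_b):
    """Coincidence ratio of two period distributions given as {period: trips}.

    CR = sum(min(pa, pb)) / sum(max(pa, pb)) over periods, where pa and pb are
    each dataset's share of trips in the period; 1.0 means identical shares.
    """
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    shares_a = [counts_a.get(p, 0) / total_a for p in PERIODS]
    shares_b = [counts_b.get(p, 0) / total_b for p in PERIODS]
    overlap = sum(min(a, b) for a, b in zip(shares_a, shares_b))
    union = sum(max(a, b) for a, b in zip(shares_a, shares_b))
    return overlap / union


small = {"Overnight": 10, "AM Peak": 30, "Midday": 40, "PM Peak": 20}
large = {"Overnight": 20, "AM Peak": 30, "Midday": 30, "PM Peak": 20}
print(round(coincidence_ratio(small, large), 3))  # → 0.818
```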

Time of Day – Statistical Analysis
• A difference of proportions test was used to compare the proportion of trips in each period between the areas being compared:
  • Small with Large
  • Small only
  • Large only
  • Group 1 and Group 2
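A common form of the difference of proportions test is the two-sample z test with a pooled proportion. The minimal sketch below assumes that form. The slides do not say which variant or significance level was used, so a threshold such as p < 0.05 for "statistically different" is an assumption left to the caller. The trip counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def difference_of_proportions(x1, n1, x2, n2):
    """Two-sample z test for equal proportions using the pooled proportion.

    x1, x2 - trips in the period for each dataset
    n1, n2 - total trips for each dataset
    Returns (z, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical HBW trips in the AM peak: 600 of 2,000 (small) vs 520 of 2,000 (large).
z, p = difference_of_proportions(600, 2000, 520, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```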

Time of Day - Results
• On cursory inspection, it seems all areas could share the same dataset.
• Further review shows significant differences between the small and large datasets for HBSH (Period 1), HBW (Period 2), HBO (Period 3), and HBSH and HBO (Period 4).
• Since every MPO is included in either the small or the large dataset, the separate small and large datasets were recommended for TOD calibration.

Time of Day - Results (figures: HBW – Percent of Trips by Period; HBSH – Percent of Trips by Period; Percent Departure by Period, shaded = statistically different)

Additional Analysis
• Reviewed the cell compression scheme suggested by ODOT:
  • Cells were compressed based on the rarity of households in the survey.
  • Cells with more vehicles than persons were compressed (based on analysis of the OKI, MORPC and NOACA surveys).
• The compression was evaluated based on:
  • Number of households in each cell from the survey dataset
  • Difference in trip rate between independent cells and compressed cells
• The analysis supported ODOT's compression techniques.
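The two evaluation criteria above can be sketched as follows: household counts per cell, and the difference between each independent cell's trip rate and the rate of the compressed cell it falls into. The slides do not give ODOT's exact scheme. This sketch therefore uses a hypothetical rule that folds any cell with more vehicles than persons into the cell where vehicles equal persons. It does not model the rarity-based part of the scheme.

```python
from collections import defaultdict


def compress(persons, vehicles):
    """Hypothetical compression rule: fold cells with more vehicles than persons
    into the cell where vehicles equal persons."""
    return (persons, min(vehicles, persons))


def cell_rates(households, key_fn):
    """Return {cell: (households, trips per household)} for the given cell key."""
    hh = defaultdict(int)
    trips = defaultdict(float)
    for persons, vehicles, n_trips in households:
        cell = key_fn(persons, vehicles)
        hh[cell] += 1
        trips[cell] += n_trips
    return {cell: (hh[cell], trips[cell] / hh[cell]) for cell in hh}


def compare_compression(households):
    """For each independent cell: households, its own rate, and its compressed rate."""
    independent = cell_rates(households, lambda p, v: (p, v))
    compressed = cell_rates(households, compress)
    return {
        cell: (n, rate, compressed[compress(*cell)][1])
        for cell, (n, rate) in independent.items()
    }


# Hypothetical survey records (persons, vehicles, trips): ten 1-person/1-vehicle
# households and two 1-person/2-vehicle households.
survey = [(1, 1, 4)] * 10 + [(1, 2, 6)] * 2
for cell, (n, own, comp) in sorted(compare_compression(survey).items()):
    print(cell, n, own, round(comp, 2))
```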

Additional Analysis
• Evaluated the potential of an HB School trip purpose.
• Compared average trip length for school and non-school HB activities.
• Evaluated whether trips were frequent enough for calibration.
• Evaluated the distribution of households in the cross-classification matrix (vehicle ownership x students in household).
• Determined that an HB School purpose was warranted.

Trip Rate Sensitivity Analysis
• Further pursue the impact that various trip generation rates would have on model results.
• Calculate various “feasible” sets of trip rates based on the combined and the Toledo stand-alone survey data sets.
• The smaller sample size of the stand-alone data implies a broader range of “feasible” trip rate sets.

Total Households Comparison

Trip Rates

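Construction of Alternate Trip Rates
• Calculate percent errors for a given confidence interval (90%, selected somewhat arbitrarily):
  • E = Z × CV / √N, where Z is the standard normal value for the chosen confidence level, CV is the coefficient of variation and N is the number of observations
• Develop other feasible sets of trip rates within plus or minus this percent error of the calculated mean.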
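The percent error formula can be sketched for a single cross-class cell. This minimal sketch assumes N is the number of sampled households in the cell and that CV is computed from the sample standard deviation of their trip counts. It returns E as a fraction, so 0.12 means 12%. The household data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def percent_error(trips_per_hh, confidence=0.90):
    """E = Z * CV / sqrt(N) for one cross-class cell, as a fraction of the mean.

    trips_per_hh - trips reported by each sampled household in the cell
    Z is the two-sided standard normal value for the confidence level
    (about 1.645 at 90%); CV uses the sample standard deviation.
    """
    n = len(trips_per_hh)
    mean = fmean(trips_per_hh)
    cv = stdev(trips_per_hh) / mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * cv / sqrt(n)


def feasible_range(trips_per_hh, confidence=0.90):
    """Lower and upper trip rates within plus or minus E of the cell mean."""
    mean = fmean(trips_per_hh)
    e = percent_error(trips_per_hh, confidence)
    return mean * (1 - e), mean * (1 + e)


cell = [2, 4, 6, 8]  # hypothetical trips for four households in one cell
print(f"E = {percent_error(cell):.1%}")
print(feasible_range(cell))
```

Because E shrinks with √N, the smaller stand-alone sample yields wider feasible ranges than the combined data. This is the point made on the sensitivity analysis slide.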

Construction of Alternate Trip Rates
• Trip rates varied by cross-class cell; however, the overall resultant trip rates were also held within the 90% confidence interval.
• Various perturbations of the trip rates were created within this range; the two shown are:
  • Systematic perturbation: increasing zero-vehicle household trip rates by exactly the calculated percent error while reducing all other trip rates by 10% of this value
  • Random perturbation of each trip rate within its percent error range
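Both perturbations can be sketched under stated assumptions. Cells are assumed to be keyed as (vehicles, household size). "10% of this value" is read as 10% of each cell's own percent error, which is one possible interpretation. The requirement that the overall rate stay within its interval is enforced here by rejection sampling. That is a choice made for this sketch, not necessarily the authors' procedure. All rates, errors and household counts are hypothetical.

```python
import random


def systematic_perturbation(rates, errors):
    """Raise zero-vehicle cells by their full percent error; lower every other
    cell by 10% of its own percent error (an assumed reading of the slide).

    rates, errors - {(vehicles, hh_size): value}; errors are fractions (0.12 = 12%).
    """
    return {
        cell: rate * (1 + errors[cell]) if cell[0] == 0 else rate * (1 - 0.10 * errors[cell])
        for cell, rate in rates.items()
    }


def random_perturbation(rates, errors, rng):
    """Move each cell's rate to a uniform random point within plus or minus its percent error."""
    return {cell: rate * (1 + rng.uniform(-errors[cell], errors[cell]))
            for cell, rate in rates.items()}


def overall_rate(rates, households):
    """Household-weighted average trip rate across all cells."""
    return sum(rates[c] * households[c] for c in rates) / sum(households.values())


def within_overall_interval(new_rates, base_rates, households, overall_error):
    """True if the perturbed overall rate stays within plus or minus the overall percent error."""
    base = overall_rate(base_rates, households)
    return abs(overall_rate(new_rates, households) - base) <= overall_error * base


def feasible_random_perturbation(rates, errors, households, overall_error, rng, max_tries=1000):
    """Redraw random perturbations until the overall rate stays within its interval."""
    for _ in range(max_tries):
        candidate = random_perturbation(rates, errors, rng)
        if within_overall_interval(candidate, rates, households, overall_error):
            return candidate
    raise RuntimeError("no feasible perturbation found")


rates = {(0, 1): 2.0, (1, 1): 5.0, (2, 2): 9.0}
errors = {(0, 1): 0.20, (1, 1): 0.10, (2, 2): 0.08}
households = {(0, 1): 50, (1, 1): 300, (2, 2): 400}
print(systematic_perturbation(rates, errors))
print(feasible_random_perturbation(rates, errors, households, 0.05, random.Random(42)))
```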

HBW Proportion of the Percent Error Applied to Create Alternate Trip Rate

Alternate Reality Socio-Economic Data
• Given the concentration of variance in certain rare cells of the cross-classification matrix, an alternative set of zonal SE data was constructed that placed more households in these cells by:
  • Reducing vehicles by 50% in the CBD / Urban area
  • Increasing workers by 16% in all zones
  • Making no change in the number of households or in attraction variables
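The zonal adjustments can be sketched as a transformation of zone records. This minimal sketch assumes a hypothetical `Zone` record and reads "CBD / Urban area" as zones whose area type is CBD or Urban. It leaves fractional vehicles and workers unrounded. It also stops at the zonal totals; redistributing households into cross-class cells, for example through household submodels, is not shown.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Zone:
    """Hypothetical zonal SE record; real zonal files carry many more fields."""
    zone_id: int
    area_type: str
    households: int
    vehicles: float
    workers: float


def alternate_reality(zones, vehicle_factor=0.5, worker_factor=1.16,
                      reduced_area_types=("CBD", "Urban")):
    """Halve vehicles in CBD/Urban zones and add 16% workers in every zone.

    Households (and any attraction variables) are left unchanged.
    """
    modified = []
    for zone in zones:
        if zone.area_type in reduced_area_types:
            vehicles = zone.vehicles * vehicle_factor
        else:
            vehicles = zone.vehicles
        modified.append(replace(zone, vehicles=vehicles, workers=zone.workers * worker_factor))
    return modified


base = [Zone(1, "CBD", 100, 120.0, 110.0), Zone(2, "Rural", 50, 90.0, 60.0)]
for zone in alternate_reality(base):
    print(zone)
```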

Test Impact on Measures of Effectiveness (MOE’s)
• 12 test cases based upon:
  • 6 sets of trip rates:
    • Combined Data, Base
    • Combined Data, Systematic Perturbation
    • Combined Data, Random Perturbation
    • Toledo Data, Base
    • Toledo Data, Systematic Perturbation
    • Toledo Data, Random Perturbation
  • 2 sets of SE data:
    • Base
    • Modified

Test Impact on Measures of Effectiveness (MOE’s)
• Evaluate various MOE’s:
  • Link Volume
  • VMT
  • VHT
  • %RMSE or %RMSD
  • Tons of Pollutants
  • Trips
  • Transit Riders
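%RMSE and %RMSD use the same calculation and differ only in the reference. With traffic counts as the reference the result is %RMSE. With another scenario's link volumes as the reference it is %RMSD, which is how the SE data scenarios are compared. The minimal sketch below assumes the n - 1 denominator common in model validation practice; some agencies divide by n instead. The volumes are hypothetical.

```python
from math import sqrt


def percent_rmse(estimated, reference):
    """Percent root mean square error of estimated vs reference link volumes.

    %RMSE = sqrt(sum((est - ref)^2) / (n - 1)) / mean(ref) * 100
    The n - 1 denominator follows a common validation convention; some
    agencies divide by n instead.
    """
    if len(estimated) != len(reference) or len(reference) < 2:
        raise ValueError("need two or more paired values")
    n = len(reference)
    squared = sum((e - r) ** 2 for e, r in zip(estimated, reference))
    rmse = sqrt(squared / (n - 1))
    return 100.0 * rmse / (sum(reference) / n)


counts = [100.0, 100.0, 100.0]
model = [110.0, 90.0, 100.0]
print(percent_rmse(model, counts))  # → 10.0
```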

Volume on New River Crossing (figure: Toledo Data Only, Systematic Perturbation, Modified SE Data compared with the Base Model & SE Data)

VMT

VHT

%RMS Error and Difference

Ozone Precursors

Trips

Conclusions
• Randomly perturbed trip rates, even when applied to purposefully skewed SE data, showed almost no impact on typical MOE’s.
• Systematically perturbed trip rates produced slightly lower %RMSD between the SE data scenarios:
  • Base %RMSD: 10.43
  • Combined: 8.36
  • Stand Alone: 7.09
• These slight differences are minor compared to the model’s %RMSE values.

Conclusions
• The Toledo stand-alone sample was sufficient for the given model (not surprising, since it was designed as such).
• Increasing the sample size much beyond the computed minimums would not have added much.
• It was still useful to combine the data sets where practical, to give more confidence in the low-incidence cells.
• This also allowed the area type dimension to be added for the smaller areas, whose smaller survey samples were not originally designed for it.
