Benchmarking Web Accessibility Evaluation Tools: Measuring the Harm of Sole Reliance on Automated Tests
Markel Vigo, University of Manchester (UK)
Justin Brown, Edith Cowan University (Australia)
Vivienne Conway, Edith Cowan University (Australia)
10th International Cross-Disciplinary Conference on Web Accessibility (W4A2013)
http://dx.doi.org/10.6084/m9.figshare.701216

Problem & Fact
The WWW is not accessible
13 May 2013, W4A2013

Evidence
Webmasters are familiar with accessibility guidelines
Lazar et al., 2004. Improving web accessibility: a study of webmaster perceptions. Computers in Human Behavior 20(2), 269–288

Hypothesis I
Assuming guidelines do a good job...
H1: Awareness of accessibility guidelines is not that widespread.

Evidence II
Webmasters put compliance logos on non-compliant websites
Gilbertson and Machin, 2012. Guidelines, icons and marketable skills: an accessibility evaluation of 100 web development company homepages. W4A 2012

Hypothesis II
Assuming webmasters are not trying to cheat...
H2: There is a lack of awareness of the negative effects of overreliance on automated tools.

Expanding on H2: Why we rely on automated tests
• It's easy
• In some scenarios it seems like the only option: web observatories, real-time...
• We don't know how harmful they can be

Expanding on H2: Knowing the limitations of tools
• If we are able to measure these limitations we can raise awareness
• Inform developers and researchers
• We run a study with 6 tools
• Compute coverage, completeness and correctness with respect to WCAG 2.0

Method: Computed Metrics
• Coverage: whether a given Success Criterion (SC) is reported at least once
• Completeness:
• Correctness:
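The definitions of completeness and correctness did not survive in this transcript. As a minimal sketch, assuming completeness behaves like recall (violations caught out of all actual violations) and correctness like precision (true violations out of everything reported), the three metrics could be computed as below. These two definitions, the (page, SC) granularity, and all sample data are assumptions for illustration, not the study's own.

```python
from typing import Set, Tuple

# A violation is identified by (page, success criterion).
# This granularity is an assumption made for the sketch.
Violation = Tuple[str, str]


def coverage(reported: Set[Violation], ground_truth: Set[Violation]) -> float:
    """Share of violated SC that the tool reports at least once."""
    violated_sc = {sc for _, sc in ground_truth}
    reported_sc = {sc for _, sc in reported}
    if not violated_sc:
        return 0.0
    return len(violated_sc & reported_sc) / len(violated_sc)


def completeness(reported: Set[Violation], ground_truth: Set[Violation]) -> float:
    """ASSUMED definition: true positives / actual violations (recall-like)."""
    if not ground_truth:
        return 0.0
    return len(reported & ground_truth) / len(ground_truth)


def correctness(reported: Set[Violation], ground_truth: Set[Violation]) -> float:
    """ASSUMED definition: true positives / reported violations (precision-like)."""
    if not reported:
        return 0.0
    return len(reported & ground_truth) / len(reported)


# Hypothetical data: four real violations, one tool report with one false positive.
ground_truth = {("home", "1.1.1"), ("home", "2.4.4"),
                ("contact", "1.1.1"), ("contact", "3.3.2")}
reported = {("home", "1.1.1"), ("contact", "1.1.1"), ("contact", "1.4.3")}

cov = coverage(reported, ground_truth)       # 1 of 3 violated SC is reported
comp = completeness(reported, ground_truth)  # 2 of 4 actual violations caught
corr = correctness(reported, ground_truth)   # 2 of 3 reported items are real
```

Under these assumptions a tool can raise completeness simply by reporting more, at the cost of correctness, which is the trade-off the results slides describe.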

Method: Stimuli
• Vision Australia (www.visionaustralia.org.au): non-profit, non-government, accessibility resource
• Prime Minister (www.pm.gov.au): federal government, should abide by the Transition Strategy
• Transperth (www.transperth.wa.gov.au): government affiliated, used by people with disabilities

Method: Obtaining the "Ground Truth"
• Ad-hoc sampling
• Manual evaluation
• Agreement
• Ground truth

Method: Computing Metrics
For every page in the sample...
• Evaluate
• Get reports
• Compare with the GT
• Compute metrics
[Diagram: tools T1–T6 produce reports R1–R6, which are compared with the ground truth (GT) to yield metrics M1–M6]
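The "compare with the GT" step can be sketched as a per-tool tally of hits, false alarms and misses over every page in the sample. The data layout (a set of violated SC per page), the tool names and the sample values are hypothetical; the slides do not describe how reports were stored.

```python
from typing import Dict, Set

# Hypothetical layout: ground_truth[page] and reports[tool][page] are sets of SC ids.
ground_truth: Dict[str, Set[str]] = {
    "p1": {"1.1.1", "2.4.4"},
    "p2": {"1.3.1"},
}
reports: Dict[str, Dict[str, Set[str]]] = {
    "T1": {"p1": {"1.1.1"}, "p2": {"1.3.1", "1.4.3"}},
    "T2": {"p1": {"1.1.1", "2.4.4"}},  # T2 reported nothing for p2
}


def compare_with_ground_truth(reports, ground_truth):
    """For each tool, count true positives, false positives and misses over all pages."""
    tallies = {}
    for tool, pages in reports.items():
        tally = {"true_positive": 0, "false_positive": 0, "missed": 0}
        for page, truth in ground_truth.items():
            reported = pages.get(page, set())  # no report means nothing was flagged
            tally["true_positive"] += len(reported & truth)
            tally["false_positive"] += len(reported - truth)
            tally["missed"] += len(truth - reported)
        tallies[tool] = tally
    return tallies


tallies = compare_with_ground_truth(reports, ground_truth)
for tool, tally in tallies.items():
    print(tool, tally)
```

Metrics for each tool can then be derived from these three counts.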

Accessibility of Stimuli
• Vision Australia (www.visionaustralia.org.au)
• Prime Minister (www.pm.gov.au)
• Transperth (www.transperth.wa.gov.au)

Results: Coverage
• 650 WCAG Success Criteria violations (A and AA)
• 23-50% of SC are covered by automated tests
• Coverage varies across guidelines and tools

Results: Completeness per tool
• Completeness ranges from 14% to 38%
• Variable across tools and principles

Results: Completeness per type of SC
• How conformance levels influence completeness
• Wilcoxon Signed Rank: W=21, p<0.05
• Completeness levels are higher for 'A level' SC
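A note on reading W=21: with six pairs, 21 is the largest value the sum of positive ranks can take (1+2+...+6), which would be consistent with all six tools scoring higher on level A than on level AA. This is an inference, since the slide does not state how observations were paired, and some software reports the smaller of the two rank sums instead. A minimal sketch with hypothetical completeness percentages (not the study's data):

```python
from itertools import product


def signed_rank_sum(a, b):
    """Return (W+, ranks): sum of ranks of positive differences, and all ranks.

    Zero differences are dropped; tied absolute differences share an average rank.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    return w_plus, ranks


def exact_two_sided_p(w_plus, ranks):
    """Exact p-value by enumerating all sign assignments (fine for small n)."""
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed:
            hits += 1
    return hits / 2 ** len(ranks)


# Hypothetical per-tool completeness (%), one pair per tool.
level_a = [38, 30, 25, 22, 20, 18]
level_aa = [30, 27, 19, 21, 16, 8]

w_plus, ranks = signed_rank_sum(level_a, level_aa)
p_value = exact_two_sided_p(w_plus, ranks)
print(w_plus, p_value)
```

With every difference positive, only 2 of the 64 sign assignments are as extreme, so the exact two-sided p-value is 2/64, which is below 0.05.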

Results: Completeness vs. accessibility
• How accessibility levels influence completeness
• ANOVA: F(2,10)=19.82, p<0.001
• The less accessible a page is, the higher the level of completeness

Results: Tool Similarity on Completeness
• Cronbach's α = 0.96
• Multidimensional Scaling (MDS)
• Tools behave similarly
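Cronbach's α is computed from the item variances and the variance of the summed scores: α = k/(k-1) · (1 - Σ var(item) / var(total)). The sketch below treats each tool as an item and each page as a case; that mapping is an assumption, since the slide does not say what the items and cases were, and the scores are hypothetical.

```python
from statistics import variance
from typing import List


def cronbach_alpha(items: List[List[float]]) -> float:
    """Cronbach's alpha for k items, each a list of scores over the same cases."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    item_variances = sum(variance(scores) for scores in items)
    totals = [sum(case) for case in zip(*items)]  # summed score per case
    return k / (k - 1) * (1 - item_variances / variance(totals))


# Hypothetical completeness scores (%): three tools (items) over four pages (cases).
tool_scores = [
    [10, 20, 30, 40],
    [12, 22, 28, 42],
    [8, 18, 33, 38],
]

alpha = cronbach_alpha(tool_scores)
print(round(alpha, 2))
```

Values close to 1 indicate that the tools rank the pages in a very similar way, which is how the reported 0.96 reads.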

Results: Correctness
• Tools with lower completeness scores exhibit higher levels of correctness: 93-96%
• Tools that obtain higher completeness yield lower correctness: 66-71%
• Tools with higher completeness are also the most incorrect ones

Implications: Coverage
• We corroborate that 50% is the upper limit for automatising guidelines
• Natural Language Processing?
• Language: 3.1.2 Language of Parts
• Domain: 3.3.4 Error Prevention

Implications: Completeness I
• Automated tests do a better job...
  ...on non-accessible sites
  ...on 'A level' success criteria
• Automated tests aim at catching stereotypical errors

Implications: Completeness II
• Strengths of tools can be identified across WCAG principles and SC
• A method to inform decision making
• Maximising completeness in our sample of pages
• On all tools: 55% (+17 percentage points)
• On non-commercial tools: 52%
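One way to read "maximising completeness" is to pool the true positives of several tools: 55% is 17 percentage points above the 38% reached by the best single tool. A minimal sketch of that pooling, assuming completeness is the share of actual violations caught; the tool names, violation ids and the pooling rule itself are hypothetical, not taken from the study.

```python
from typing import Dict, Iterable, Set


def combined_completeness(reports: Dict[str, Set[str]],
                          ground_truth: Set[str],
                          tools: Iterable[str]) -> float:
    """Share of actual violations caught by at least one of the given tools."""
    if not ground_truth:
        return 0.0
    caught: Set[str] = set()
    for tool in tools:
        caught |= reports[tool] & ground_truth  # keep true positives only
    return len(caught) / len(ground_truth)


# Hypothetical data: ten actual violations, three tools with partial overlap.
ground_truth = {f"v{i}" for i in range(1, 11)}
reports = {
    "T1": {"v1", "v2", "v3", "x1"},  # x1 is a false positive
    "T2": {"v3", "v4"},
    "T3": {"v1", "v5"},
}

best_single = max(combined_completeness(reports, ground_truth, [t]) for t in reports)
all_tools = combined_completeness(reports, ground_truth, reports)
subset = combined_completeness(reports, ground_truth, ["T2", "T3"])
print(best_single, all_tools, subset)
```

The gain from pooling depends on how little the tools overlap; tools that catch the same violations add nothing to each other.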

Conclusions
• Coverage: 23-50%
• Completeness: 14-38%
• Higher completeness leads to lower correctness

Follow up
Contact: @markelvigo | markel.vigo@manchester.ac.uk
Presentation DOI: http://dx.doi.org/10.6084/m9.figshare.701216
Datasets: http://www.markelvigo.info/ds/bench12/index.html
