Learn how to automate repetitive operations, transform page appearance, and integrate multiple web sites with the Chickenfoot language, which lets users customize the Web without looking at HTML source. Explore its keyword patterns, commands, and widgets for web automation.
Automation and Customization of Rendered Web Pages Michael Bolin, Greg Little, Marcos Ojeda, Matt Webber, Philip Rha, Tom Wilson, Rob Miller MIT CSAIL http://uid.csail.mit.edu/chickenfoot Supported by NSF IIS-0447800
Web Applications • The Web has become a major application platform
Automating Repetitive Operations • Bookmark my latest bank statement • Download many links at once • Fill in defaults for forms
Transforming Appearance • Change color scheme for better contrast • Concatenate multiple pages
Integrating Multiple Web Sites • Bookstore has links for New Books, Used Books, Auction… but not for my local library • Realtor has lots of data about houses for sale… but not length of my commute
Web Apps Are Wonderfully Open • Web apps have automatic hooks for scripting • Display: machine-readable HTML • Commands: generic HTTP requests • Presentation: editable HTML, stylesheets • Web “screen scraping” is already common, mainly behind the scenes (e.g., pricescan.com) • But most users don’t do it
Problem: Many Web Apps Require A Browser • Many web apps depend on the rich browser environment • Cookies, authentication, SSL, session IDs, plugins, user-agents, client-side scripting, proxies • Perl/Python scripts run outside the browser, so they can’t easily access these web apps • Solution: do customization in the browser • Greasemonkey for Firefox • User Javascript for Opera
Problem: Web Apps Are Scary Under the Hood • HTML source of most sites is complex • This complexity is a real barrier to automation & customization
Solution: Use Rendered View • Chickenfoot: user shouldn’t have to look at HTML source to customize the Web
Outline • Demo • Language • Commands • Keyword patterns • Implementation • Pattern matching algorithm • Evaluation
Chickenfoot Language • Chickenscratch = Javascript + runtime library • Javascript syntax • Standard browser objects document.links[] window.open() • Document Object Model (DOM) Node, Element, Text, Range • Chickenfoot-specific objects and commands
Commands • Page navigation go(url) openTab(url) fetch(url) • Clicking and form manipulation click(button-or-link) check(checkbox-or-radio) enter([textbox], value) pick([listbox], choice) • Pattern matching find(pattern) • Page modification insert(pattern, html) replace(pattern, html) remove(pattern) • Widgets & input handling new Link(html, action) onClick(pattern, action)
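A short script may make the command list above more concrete. This is only a sketch: the URL and the keyword patterns are illustrative, and the three stub functions are hypothetical stand-ins that record calls so the example runs outside the browser. Inside Chickenfoot, go(), enter(), and click() come from the runtime library.

```javascript
// Hypothetical stand-ins for the Chickenfoot runtime, used only so this
// sketch runs under plain Javascript. They record each call in a log.
const log = [];
const go = (url) => log.push(`go ${url}`);
const enter = (textbox, value) => log.push(`enter ${textbox} = ${value}`);
const click = (pattern) => log.push(`click ${pattern}`);

// The script itself: navigate, fill in a textbox, press a button.
// Components are named by keyword patterns, not by HTML names or XPath.
go("http://www.google.com/");                 // illustrative URL
enter("search textbox", "chickenfoot");       // illustrative pattern
click("feeling lucky button");

console.log(log.join("\n"));
```

The three script lines in the middle are the part a user would write; everything above them exists only to make the sketch self-contained.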
Keyword Patterns • Keywords + component type • Examples: “feeling lucky button”, “depart textbox”, “search web form” • Component type is optional for click(), enter(), check(), pick() • Nested pattern matching: find("start address form").find("city textbox")
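One way to read "keywords + component type" is as a split on the final word of the pattern. The sketch below assumes that reading; the list of component types is hypothetical (the slides show only button, textbox, and form), and parsePattern is an illustrative name, not a Chickenfoot function.

```javascript
// Hypothetical vocabulary of component types. The slides show "button",
// "textbox", and "form"; the others are assumed for illustration.
const COMPONENT_TYPES = new Set([
  "button", "textbox", "form", "checkbox", "radio", "listbox", "link",
]);

// Split a keyword pattern into keywords and an optional trailing type.
function parsePattern(pattern) {
  const words = pattern.trim().toLowerCase().split(/\s+/).filter(Boolean);
  const last = words[words.length - 1];
  // Treat the last word as a type only if other keywords precede it.
  if (words.length > 1 && COMPONENT_TYPES.has(last)) {
    return { keywords: words.slice(0, -1), type: last };
  }
  return { keywords: words, type: null };
}

console.log(parsePattern("feeling lucky button"));
console.log(parsePattern("search web form"));
console.log(parsePattern("depart"));
```

A pattern with no recognized type, such as "depart", keeps all its words as keywords, which mirrors the slide's note that the type is optional for click(), enter(), check(), and pick().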
Keyword Patterns vs. Other Names • Keyword pattern: “all words textbox” • Javascript: document.f.as_q • XPath: //body/form/table[1]/tbody/tr/td/table/tbody/tr[0]/td/table/tbody/tr/td[1]/table/tbody/tr[0]/td[1]/input • HTML source: …<td>with <b>all</b> of the words</font></td> <td><input value="" name="as_q" size="25" type="text">…
Pattern Matching Algorithm • Find labels matching the keywords • Find components matching each label • Rank & choose best • [Diagram: a pattern (“google search button”) and a web page go into the matcher, which produces a ranked list of components scored 1.0, 0.5, 0.5]
1. Find Labels Matching Keywords • Label = visible chunk of text • text nodes • button labels, listbox items • ALT attributes on images • Tolerant matching • capitalization • word ordering • punctuation • typos • Example label (HTML): with <b>all</b> of the words
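The tolerant matching described above can be sketched as a small scoring function. This is not Chickenfoot's matcher: the scoring rule (fraction of keywords found), the one-edit typo threshold, and the names normalize, editDistance, and labelScore are all assumptions made for illustration.

```javascript
// Lowercase, replace punctuation with spaces, and split into words.
// Assumes ASCII labels for simplicity.
function normalize(text) {
  return text.toLowerCase().replace(/[^a-z0-9\s]/g, " ").split(/\s+/).filter(Boolean);
}

// Standard dynamic-programming Levenshtein edit distance.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Fraction of keywords that appear in the label, in any order.
// Assumed typo rule: keywords of 4+ letters may differ by one edit.
function labelScore(keywords, label) {
  const labelWords = normalize(label);
  const keys = normalize(keywords);
  if (keys.length === 0) return 0;
  const hits = keys.filter((k) =>
    labelWords.some((w) => w === k || (k.length >= 4 && editDistance(k, w) <= 1))
  );
  return hits.length / keys.length;
}

const label = "with all of the words"; // visible text of the example label
console.log(labelScore("all words", label));
console.log(labelScore("Words, ALL", label));
console.log(labelScore("all wods", label));
```

Because both sides are normalized before comparison, capitalization, punctuation, and word order do not affect the score, and the edit-distance check absorbs small typos.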
2. Find Component Matching Label • Search in rendered view • Component must be aligned with label • Degree of match given by: • pixel distance • relative position • HTML path length
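The three factors listed above (pixel distance, relative position, HTML path length) could be combined in many ways. The sketch below shows one assumed combination; the weights, the 100-pixel scale, the preference for labels to the left of or above a component, and all function names are hypothetical, not taken from Chickenfoot.

```javascript
// Boxes are {x, y, width, height} in rendered-page pixels.
// Returns the gap between two boxes (0 if they touch or overlap).
function gapBetween(a, b) {
  const dx = Math.max(0, a.x - (b.x + b.width), b.x - (a.x + a.width));
  const dy = Math.max(0, a.y - (b.y + b.height), b.y - (a.y + a.height));
  return Math.hypot(dx, dy);
}

// Assumed preference: a label usually sits left of or above its component.
function positionFactor(label, comp) {
  const leftOf = label.x + label.width <= comp.x;
  const above = label.y + label.height <= comp.y;
  return leftOf || above ? 1.0 : 0.5;
}

// Combine the three factors into one score in (0, 1].
function componentScore(label, comp, pathLength) {
  const distance = 1 / (1 + gapBetween(label, comp) / 100); // 100 px scale is arbitrary
  const path = 1 / (1 + pathLength); // shorter path through the HTML tree scores higher
  return distance * positionFactor(label, comp) * path;
}

const label = { x: 0, y: 0, width: 80, height: 20 };
const nearBox = { x: 90, y: 0, width: 150, height: 20 };
const farBox = { x: 90, y: 300, width: 150, height: 20 };
console.log(componentScore(label, nearBox, 2) > componentScore(label, farBox, 2));
```

Working in rendered coordinates is the point of the slide: a textbox right next to its caption on screen scores well even when the two are far apart in the HTML source.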
3. Rank the Matching Components • Rank score for each <label,component> pair is computed from: • Match between keywords and label • Match between label and component • Highest-ranked component is returned • If there’s a tie, find() returns the ambiguous matches, but click/enter/pick/check() throw an error
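The ranking step and the tie behavior above can be sketched as follows. Multiplying the two match scores is an assumption (the slide says only that the rank is computed from both), and rank, bestMatches, and uniqueMatch are illustrative names rather than Chickenfoot functions.

```javascript
// Each candidate is {component, labelScore, componentScore}.
// Assumed combination: the product of the two match scores.
function rank(candidates) {
  return candidates
    .map((c) => ({ ...c, score: c.labelScore * c.componentScore }))
    .sort((a, b) => b.score - a.score);
}

// find()-style behavior: return every candidate tied for the top score.
function bestMatches(candidates) {
  const ranked = rank(candidates);
  if (ranked.length === 0) return [];
  const top = ranked[0].score;
  return ranked.filter((c) => Math.abs(c.score - top) < 1e-9);
}

// click()/enter()/pick()/check()-style behavior: demand a unique winner.
function uniqueMatch(candidates) {
  const best = bestMatches(candidates);
  if (best.length === 0) throw new Error("no match");
  if (best.length > 1) throw new Error(`ambiguous: ${best.length} matches`);
  return best[0].component;
}

const candidates = [
  { component: "A", labelScore: 1.0, componentScore: 1.0 },
  { component: "B", labelScore: 1.0, componentScore: 0.5 },
  { component: "C", labelScore: 0.5, componentScore: 1.0 },
];
console.log(uniqueMatch(candidates));                    // unique top score
console.log(bestMatches(candidates.slice(1)).length);    // B and C tie
```

Separating bestMatches from uniqueMatch reflects the asymmetry on the slide: a query can safely report ambiguity, while an action on the page should not guess.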
Evaluation • Web-based survey of textbox naming • 40 respondents (24 programmers, rest not) • Comprehension: which textbox on the page is identified by this pattern? • Generation: how would you identify this textbox uniquely using only words visible on the page?
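Results of Generation Task • [Chart: number of patterns for which the algorithm found the right match, a wrong match, or multiple matches] • Values as extracted from the chart, in groups of three: 40 0 0 · 40 0 0 · 38 2 0 · 40 0 0 · 37 2 1 · 0 26 14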
Disambiguation Strategies • Keywords from section heading: “above person not available Mi” • Counting: “second mi” (for textboxes with the same caption)
Future Work • More component types for patterns (box, table, image) • Programming by demonstration • Pointing at page to generate patterns • Clicking & form filling to generate scripts • Javascript syntax extensions
Conclusion • Chickenfoot automates and customizes web applications without looking under the hood • Simple language • Keyword patterns • Development environment in web browser http://uid.csail.mit.edu/chickenfoot