YOLT


Presentation Transcript


  1. YOLT • Yuan Zheng • Omar Ahmed • Lukas Dudkowski • T. Mark Kuba

  2. Overview of YOLT • Simple scripting language • Easy for coding and maintenance • Regular expression support • := and @ • “Web-scraping” uses • Natural Language Processing • Generating RSS Feeds • Reformatting HTML for other uses (XML, etc.)

  3. A Useful YOLT Program

  4. Semantics • The YOLT semantic checker is extremely simple. It serves a few main tasks: • Make sure that functions are declared properly, i.e. function declarations match functions, and function calls match the declarations • Make sure that variables are initialized before they are used (or, in some cases, un-initialized) • (redundant) Make sure that the tree is properly formed (i.e. make sure that an if-then-else node has exactly three children, etc.) • Note: there was once basic type-checking, but no longer.
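The initialization and tree-shape checks listed above can be sketched in Java, the language the slides name for the code generator. This is only a minimal sketch, not the YOLT team's checker: the `Node` class, the node kinds (`"assign"`, `"var"`, `"if"`), and `SemanticSketch.check` are all hypothetical names chosen for illustration, and function checks are left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical AST node: a kind, optional text, and children.
class Node {
    final String kind;
    final String text;
    final List<Node> children = new ArrayList<>();

    Node(String kind, String text, Node... kids) {
        this.kind = kind;
        this.text = text;
        for (Node k : kids) children.add(k);
    }
}

class SemanticSketch {
    // Returns a list of error messages; an empty list means the tree passed.
    static List<String> check(Node root) {
        List<String> errors = new ArrayList<>();
        walk(root, new HashSet<>(), errors);
        return errors;
    }

    private static void walk(Node n, Set<String> initialized, List<String> errors) {
        switch (n.kind) {
            case "assign": // children: target variable, value expression
                if (n.children.size() != 2) {
                    errors.add("malformed assign node");
                    return;
                }
                walk(n.children.get(1), initialized, errors); // right side first
                initialized.add(n.children.get(0).text);      // target is now set
                break;
            case "var": // a variable being read
                if (!initialized.contains(n.text)) {
                    errors.add("uninitialized variable " + n.text);
                }
                break;
            case "if": // well-formedness: condition, then-branch, else-branch
                if (n.children.size() != 3) {
                    errors.add("if-then-else node needs exactly 3 children");
                }
                for (Node c : n.children) walk(c, initialized, errors);
                break;
            default: // any other node: just visit the children
                for (Node c : n.children) walk(c, initialized, errors);
        }
    }
}
```

The sketch treats a variable assigned in either branch of an if as initialized afterwards, which is a simplification; with no types to track, one set of names is all the state the walk needs.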

  5. Semantics Lessons Learned • It is very easy to do too much in semantic checking • Either there are types, or no types (NO MIDDLE GROUND) • Scripting languages are an enormous relief to a semantic checker: they take away the biggest hassles • The tree walker should know EXACTLY what the structure of the AST will look like and cannot make ANY assumptions; things can evidently break down when you least expect them to.

  6. Code Generation • Written in Java • Input: correct AST • Output: Perl program • Diagram: AST -> Code generator (Java) -> Perl program

  7. Implementation • Walk AST • According to the information of the node, generate code or go down to the child node • e.g. a tree with root “:=” and children “$a” and “http://www.columbia.edu”: go down the tree at node “:=”; generate code at nodes “$a” and “http://www.columbia.edu”
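A walker of this kind might look like the following Java sketch. It is an illustration only: `AstNode`, `WalkSketch.gen`, and the `"seq"` label are hypothetical, and the sketch handles just plain assignment and echo (which the deck's generated Perl renders as `print`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical AST node for this sketch: a label plus children.
class AstNode {
    final String label;
    final List<AstNode> kids = new ArrayList<>();

    AstNode(String label, AstNode... kids) {
        this.label = label;
        for (AstNode k : kids) this.kids.add(k);
    }
}

class WalkSketch {
    // Interior nodes go down to their children; leaves emit their own text.
    static String gen(AstNode n) {
        switch (n.label) {
            case "=":    // plain assignment: left child, right child
                return gen(n.kids.get(0)) + " = " + gen(n.kids.get(1)) + ";\n";
            case "echo": // YOLT echo becomes Perl print
                return "print " + gen(n.kids.get(0)) + ";\n";
            case "seq":  // statement list: emit each statement in order
                StringBuilder sb = new StringBuilder();
                for (AstNode k : n.kids) sb.append(gen(k));
                return sb.toString();
            default:     // leaf: a variable such as $a, or a quoted string
                return n.label;
        }
    }
}
```

For example, a `seq` node holding `$a = "hi"` and `echo $a` yields the two Perl statements `$a = "hi";` and `print $a;`.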

  8. Implementation (tricks) • The httpget := • Invoke the UNIX command “wget” through Perl's system() to download the web page into a temp file • Read the file line by line and store the lines in a Perl array • Invoke another UNIX command, “rm”, to remove the temp file • Keep the web address in a Perl scalar • Scalars and arrays use the same syntax • Compiler (code generator) “guesses” whether the variable is a scalar or an array • Arrays can only appear in certain places (e.g. foreach)
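The httpget steps above could be emitted by a helper like this Java sketch. The emitted Perl follows the shape of the generated program shown later in the deck, but `HttpGetSketch`, `emit`, and `ref` are hypothetical names, and deriving the temp-file name from the variable name is an assumption based on that single example.

```java
class HttpGetSketch {
    // Emits Perl for the YOLT statement  $name := "url";
    // Assumption: the temp file is named after the variable.
    // The URL is pasted into a shell command unescaped, so this sketch
    // is only safe for URLs without quotes or shell metacharacters.
    static String emit(String name, String url) {
        String tmp = name + ".txt";
        return "system('wget -q -O - " + url + " > " + tmp + "');\n" // download
             + "open INFILE, \"" + tmp + "\";\n"
             + "@" + name + "=<INFILE>;\n"                           // lines -> array
             + "close INFILE;\n"
             + "system('rm " + tmp + "');\n"                         // clean up
             + "$" + name + " = \"" + url + "\";\n";                 // address -> scalar
    }

    // The "guess": the same YOLT name becomes @name where Perl needs a list
    // (for example the source of a foreach) and $name everywhere else.
    static String ref(String name, boolean listContext) {
        return (listContext ? "@" : "$") + name;
    }
}
```

Because one YOLT name maps to both a Perl array and a Perl scalar, the generator has to decide from the surrounding node which sigil to print; `ref("page", true)` gives `@page` and `ref("page", false)` gives `$page`.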

  9. Documentation and Testing: Lexer/Parser and Semantic Checker • Test cases: Good and Bad • Test cases are run through the Lexer/Parser and the Semantic Checker • Log result: Good should be good. Bad should be bad. • Diff the result against a reference file: what I think it should produce
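The compare-against-a-reference step can be sketched in a few lines of Java. This is a hypothetical stand-in for whatever diff tool the team used: `ReferenceDiff` and `firstDifference` are invented names, and the inputs are assumed to be the logged output and the reference file already read into lists of lines.

```java
import java.util.List;

class ReferenceDiff {
    // Returns the 1-based number of the first line where the logged output
    // and the reference differ, or 0 if the two are identical.
    static int firstDifference(List<String> actual, List<String> reference) {
        int shared = Math.min(actual.size(), reference.size());
        for (int i = 0; i < shared; i++) {
            if (!actual.get(i).equals(reference.get(i))) return i + 1;
        }
        // Same prefix: they differ only if one side has extra lines.
        return actual.size() == reference.size() ? 0 : shared + 1;
    }
}
```

A result of 0 for every good and bad test case means the lexer/parser and semantic checker still behave as the reference files say they should.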

  10. Integration Testing • Trying little YOLT programs to see functionality, code generation, etc. • Working out bugs in implementation & design. • Example goal: display any comics that have the word hamster in the URL of www.toothpastefordinner.com, Summer 2002 archive.

      Generated Perl

          $toothpaste_home ="http://www.toothpastefordinner.com/";
          system('wget -q -O - http://www.toothpastefordinner.com/archives-sum02.php > toothpaste.txt');
          open INFILE, "toothpaste.txt";
          @toothpaste=<INFILE>;
          close INFILE;
          system ('rm toothpaste.txt');
          $toothpaste = "http://www.toothpastefordinner.com/archives-sum02.php";
          $tags ="<a href=\"(.*)\">.*hamster.*</a>";
          @tmp1=();
          foreach ( @toothpaste) {
              if ($_=~m/($tags)/i){ push @tmp1, $2}
          }
          @elements = @tmp1;
          foreach $x ( @elements ) {
              print "<img src=\"".$toothpaste_home.$x."\""."><br>";
              print "\n";
          }

      Yolt Program

          begin
          $toothpaste_home="http://www.toothpastefordinner.com/";
          $toothpaste:="http://www.toothpastefordinner.com/archives-sum02.php";
          $tags="<a href=\"(.*)\">.*hamster.*</a>";
          $elements = $tags @ $toothpaste;
          foreach $x in $elements {
              echo "<img src=\"".$toothpaste_home.$x."\""."><br>";
              echo "\n";
          }
          end

      Resultant HTML

          <img src="http://www.toothpastefordinner.com/072802/hamster-table-tennis.gif"><br>
          <img src="http://www.toothpastefordinner.com/072502/even-hamsters.gif"><br>
          <img src="http://www.toothpastefordinner.com/060602/hamsters-are-the-best.gif"><br>
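Judging from the generated Perl in this example, the `@` operator applies a regular expression to each downloaded line and collects the captured group. The same behavior can be sketched in Java; `MatchSketch.match` is a hypothetical name, and this is a reading of the generated Perl rather than a definition of YOLT's `@`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchSketch {
    // For each line, keep the first capture group of the first match,
    // ignoring case, as the Perl loop with m/($tags)/i does.
    // The Perl wraps the pattern in an extra group and reads $2;
    // here the pattern is not wrapped, so the same text is group(1).
    static List<String> match(String tags, List<String> lines) {
        Pattern p = Pattern.compile(tags, Pattern.CASE_INSENSITIVE);
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = p.matcher(line);
            if (m.find()) out.add(m.group(1)); // at most one hit per line
        }
        return out;
    }
}
```

Called with the `$tags` pattern from the example and the lines of the archive page, this would return the relative image paths that the final loop prefixes with `$toothpaste_home`.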

  11. The Result • The source site • The end result

  12. Lessons Learned • Develop and test incrementally • There are ALWAYS bugs; you just haven’t found them yet • CLIC is not designed to be lived in

  13. One More Example
