XML Processing in Scala

XML London 2014. XML Processing in Scala. William Narmontas, Dino Fancellu. www.scala.contractors. Dino Fancellu: 35 years in IT (Scala • Java • XML). William Narmontas: 10 years in IT (Scala • XML • Web). What is Scala? Scala processes XML fast. It is powerful: modular, concise, functional.


Presentation Transcript


  1. XML London 2014: XML Processing in Scala. William Narmontas, Dino Fancellu. www.scala.contractors

  2. Dino Fancellu: 35 years IT, Scala • Java • XML. William Narmontas: 10 years IT, Scala • XML • Web

  3. What is Scala?

  4. Scala processes XML fast

  5. It is powerful

  6. Modular Concise Functional Type-safe Performant Object-oriented Strongly-typed Statically-typed Unopinionated Composable Java-interoperable First-class XML

  7. Who uses Scala? Apple, Bank of America, Barclays, BBC, BSkyB, Cisco, Citigroup, Credit Suisse, eBay, eHarmony, EDF, FourSquare, Gawker, HSBC, ITV, Klout, LinkedIn, Morgan Stanley, Netflix, Novell, Rackspace, Sky, Sony, Springer, The Guardian, TomTom, Trafigura, Tumblr, Twitter, UBS, VMware, Xerox

  8. Projects in Scala - Less code to write = less to maintain - Communication clearer - Testing easier - Software robust - Time to market: fast - Happier developers

  9. Scala language: Intro

  10. Values
      XQuery:
      let $conferenceName := "XML London 2014"
      Scala:
      val conferenceName = "XML London 2014"
      Scala (Mutable):
      var conferenceName = "XML London 2014"
      conferenceName = "XML London 2015"

  11. Strings
      val language = "Scala"
      s"XML Processing in $language"
      | XML Processing in Scala
      s"""An introduction to:
         |The "$language" programming language""".stripMargin
      | An introduction to:
      | The "Scala" programming language
      s"$language has ${language.length} chars in its name"
      | Scala has 5 chars in its name

  12. Functions
      XQuery:
      declare function local:fun($x as xs:integer, $y as xs:double) as xs:string {
        concat($x, ": ", $y)
      };
      Scala:
      def fun(x: Int, y: Double) = s"$x: $y"

  13. Everything is an expression
      val trainSpeed = if (train.speed.mph >= 60) "Fast" else "Slow"
      def divide(numerator: Int, denominator: Int) =
        try { s"${numerator / denominator}" }
        catch {
          case _: java.lang.ArithmeticException =>
            s"Cannot divide $numerator by $denominator"
        }

  14. Types: Explicit
      def withTitle(name: String, title: String): String = s"$title. $name"
      val x: Int = {
        val y = 1000
        100 + y
      }
      | x: Int = 1100

  15. Functions: named parameters
      Further clarity in method calls:
      def makeLink(url: String, text: String) = s"""<a href="$url">$text</a>"""
      makeLink(text = "XML London 2014", url = "http://www.xmllondon.com")
      | <a href="http://www.xmllondon.com">XML London 2014</a>

  16. Functions: default parameters
      Reduce repetition in method calls:
      def withTitle(name: String, title: String = "Mr") = s"$title. $name"
      withTitle("John Smith")
      | Mr. John Smith
      withTitle("Mary Smith", "Miss")
      | Miss. Mary Smith

  17. Functional
      def incrementedByOne(x: Int) = x + 1
      (1 to 5).map(incrementedByOne)
      | Vector(2, 3, 4, 5, 6)

  18. Lambdas
      (1 to 5).map(x => x + 1)
      | Vector(2, 3, 4, 5, 6)
      (1 to 5).map(_ + 1)
      | Vector(2, 3, 4, 5, 6)

  19. For comprehensions
      for { x <- (1 to 5) } yield x + 1
      | Vector(2, 3, 4, 5, 6)

  20. Implicit classes: Enrich types
      implicit class stringWrapper(str: String) {
        def wrapWithParens = s"($str)"
      }
      "Text".wrapWithParens
      | (Text)

  21. Powerful features for scalability - Case classes - Traits - Partial functions - Pattern matching - Implicits - Flexible Syntax - Generics - User defined operators - Call-by-name - Macros
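A few of the features listed on this slide can be sketched together. The following is a minimal illustration, not code from the talk: the `Shape` model, `FeaturesSketch`, and the `when` helper are hypothetical names invented for this example, and it uses only the Scala standard library.

```scala
// Hypothetical domain model, invented for illustration only.
sealed trait Shape                                   // trait
case class Circle(radius: Double) extends Shape      // case classes
case class Rectangle(width: Double, height: Double) extends Shape

object FeaturesSketch {
  // Pattern matching destructures case classes; because the trait is
  // sealed, the compiler can warn about a missing case.
  def area(shape: Shape): Double = shape match {
    case Circle(r)       => math.Pi * r * r
    case Rectangle(w, h) => w * h
  }

  // A partial function is defined only for some inputs.
  val describeSquare: PartialFunction[Shape, String] = {
    case Rectangle(w, h) if w == h => s"square of side $w"
  }

  // Call-by-name: `body` is evaluated only when `condition` holds.
  def when[A](condition: Boolean)(body: => A): Option[A] =
    if (condition) Some(body) else None
}
```

For instance, `FeaturesSketch.area(Rectangle(2, 3))` evaluates to `6.0`, and `FeaturesSketch.describeSquare.isDefinedAt(Circle(1.0))` is `false`.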

  22. Scala & XML

  23. Values: Inline XML
      val url = "http://www.xmllondon.com"
      val title = "XML London 2014"
      val xmlTree = <div>
        <p>Welcome to <a href={url}>{title}</a>!</p>
      </div>
      | xmlTree: scala.xml.Elem =
      | <div>
      |   <p>Welcome to <a href="http://www.xmllondon.com">XML London 2014</a>!</p>
      | </div>

  24. XML Lookups
      val listOfPeople = <people>
        <person>Fred</person>
        <person>Ron</person>
        <person>Nigel</person>
      </people>
      listOfPeople \ "person"
      | NodeSeq(<person>Fred</person>, <person>Ron</person>, <person>Nigel</person>)
      listOfPeople \ "_"
      | NodeSeq(<person>Fred</person>, <person>Ron</person>, <person>Nigel</person>)

  25. XML Lookups
      val fact = <fact type="universal">
        <variable>A</variable> = <variable>A</variable>
      </fact>
      fact \\ "variable"
      | NodeSeq(<variable>A</variable>, <variable>A</variable>)
      fact \ "@type"
      | :scala.xml.NodeSeq = universal
      fact \@ "type"
      | :String = universal

  26. XML Loading
      val pun = """<pun rating="extreme">
                  |  <question>Why do CompSci students need glasses?</question>
                  |  <answer>To C#<!-- C# is a Microsoft's programming language -->.</answer>
                  |</pun>""".stripMargin
      scala.xml.XML.loadString(pun)
      | <pun rating="extreme">
      |   <question>Why do CompSci students need glasses?</question>
      |   <answer>To C#.</answer>
      | </pun>

  27. Collections: expressive
      val root = <numbers>
        {for {i <- 1 to 10} yield <number>{i}</number>}
      </numbers>
      val numbers = root \ "number"
      numbers(0)
      | <number>1</number>
      numbers.head
      | <number>1</number>
      numbers.last
      | <number>10</number>
      numbers take 3
      | NodeSeq(<number>1</number>, <number>2</number>, <number>3</number>)

  28. Collections: expressive
      numbers filter (_.text.toInt > 6)
      | NodeSeq(<number>7</number>, <number>8</number>, <number>9</number>, <number>10</number>)
      numbers(_.text.toInt > 6)
      | NodeSeq(<number>7</number>, <number>8</number>, <number>9</number>, <number>10</number>)
      numbers maxBy (_.text)
      | <number>9</number>
      numbers maxBy (_.text.toInt)
      | <number>10</number>
      numbers.reverse
      | NodeSeq(<number>10</number>, <number>9</number>, <number>8</number>, <number>7</number>, <number>6</number>,
      |   <number>5</number>, <number>4</number>, <number>3</number>, <number>2</number>, <number>1</number>)
      numbers.groupBy(_.text.toInt % 3)
      | Map(
      |   2 -> NodeSeq(<number>2</number>, <number>5</number>, <number>8</number>),
      |   1 -> NodeSeq(<number>1</number>, <number>4</number>, <number>7</number>, <number>10</number>),
      |   0 -> NodeSeq(<number>3</number>, <number>6</number>, <number>9</number>))

  29. XML Methods: a rich API
      ++ :\ andThen buildString companion copyToBuffer distinct endsWith flatten genericBuilder headOption inits isTraversableAgain lastIndexWhere max nameToString par product reduceRightOption sameElements seq sorted stringPrefix takeWhile toIndexedSeq toSet union xmlType zipWithIndex ++: \ apply canEqual compose corresponds doCollectNamespaces exists fold getNamespace indexOf intersect iterator lastOption maxBy namespace partition reduce repr scan size span sum text toIterable toStream unzip xml_!= +: \@ applyOrElse child contains count doTransform filter foldLeft groupBy indexOfSlice isAtom label length min nonEmpty patch reduceLeft reverse scanLeft slice splitAt tail theSeq toIterator toString unzip3 xml_== /: \\ asInstanceOf collect containsSlice descendant drop filterNot foldRight grouped indexWhere isDefinedAt last lengthCompare minBy nonEmptyChildren permutations reduceLeftOption reverseIterator scanRight sliding startsWith tails to toList toTraversable updated xml_sameElements /:\ addString attribute collectFirst copy descendant_or_self dropRight find forall hasDefiniteSize indices isEmpty lastIndexOf lift minimizeEmpty orElse prefix reduceOption reverseMap scope sortBy strict_!= take toArray toMap toVector view zip % :+ aggregate attributes combinations copyToArray diff dropWhile flatMap foreach head init isInstanceOf lastIndexOfSlice map mkString padTo prefixLength reduceRight runWith segmentLength sortWith strict_== takeRight toBuffer toSeq transpose withFilter zipAll

  35. For-comprehensions: similar to XQuery
      Scala:
      <bib>{
        for {
          b <- xml \ "book"
          year = b \@ "year"
          if b \ "publisher" === "Addison-Wesley" && year > 1991
        } yield <book year={ year }> { b \ "title" } </book>
      }</bib>
      XQuery:
      <bib>{
        for $b in $xml/book
        let $year := $b/@year
        where $b/publisher = "Addison-Wesley" and $year > 1991
        return <book year="{ $year }"> { $b/title } </book>
      }</bib>
      Nice! ... yet is general purpose
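The Scala side of this slide relies on helpers that are not shown in the transcript (`===` on a node sequence, and comparing the `year` string with a number). As a hedged sketch, the same query can be written with only the stock scala-xml operators, converting the attribute text explicitly. This assumes the scala-xml module is on the classpath; the `BibSketch` object and the sample books are hypothetical data invented for this example.

```scala
import scala.xml.Elem

object BibSketch {
  // Hypothetical sample input, invented for this example.
  val xml: Elem =
    <bib>
      <book year="1994"><title>TCP/IP Illustrated</title><publisher>Addison-Wesley</publisher></book>
      <book year="1990"><title>An Older Book</title><publisher>Addison-Wesley</publisher></book>
      <book year="2000"><title>Data on the Web</title><publisher>Morgan Kaufmann</publisher></book>
    </bib>

  // Same shape as the slide, using plain `.text` and `.toInt`
  // in place of the unshown helpers.
  val recent =
    for {
      b <- xml \ "book"
      year = (b \@ "year").toInt
      if (b \ "publisher").text == "Addison-Wesley" && year > 1991
    } yield <book year={ year.toString }>{ b \ "title" }</book>

  val result: Elem = <bib>{ recent }</bib>
}
```

With this sample data only the 1994 Addison-Wesley book passes both conditions, so `BibSketch.result \ "book"` holds a single element.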

  36. Hybrid XML - XQuery for Scala - javax.xml.* for free - Look up: XPath - Transform: XSLT - Stream: StAX
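To illustrate what the JDK's XML packages offer without any extra library, here is a minimal sketch of an XPath lookup from Scala using only `javax.xml.parsers` and `javax.xml.xpath`. The `JdkXPathSketch` object and its sample document are hypothetical and were made up for this example.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList

object JdkXPathSketch {
  // Hypothetical sample document.
  val source: String =
    """<widgets>
      |  <widget>Menu</widget>
      |  <widget id="panel-1">Panel</widget>
      |</widgets>""".stripMargin

  // Parse into a W3C DOM with the JDK's built-in parser.
  val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    .parse(new ByteArrayInputStream(source.getBytes("UTF-8")))

  // evaluate(...) returns an untyped Object, hence the cast.
  val nodes: NodeList = XPathFactory.newInstance().newXPath()
    .evaluate("/widgets/widget[not(@id)]", doc, XPathConstants.NODESET)
    .asInstanceOf[NodeList]

  // NodeList is not a Scala collection, so index into it explicitly.
  val texts: Seq[String] =
    (0 until nodes.getLength).map(i => nodes.item(i).getTextContent)
}
```

The cast and the manual indexing are the main friction points of using the Java API directly from Scala.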

  37. XQuery for Scala (XQS) - Wraps XQuery API for Java (javax.xml.xquery) - Scala access to XQuery in: - MarkLogic, BaseX, Saxon, Sedna, eXist, … - Converts DOM to Scala XML & vice versa - http://github.com/fancellu/xqs

  38. XQuery via XQS
      val widgets = <widgets>
        <widget>Menu</widget>
        <widget>Status bar</widget>
        <widget id="panel-1">Panel</widget>
        <widget id="panel-2">Panel</widget>
      </widgets>
      import com.felstar.xqs.XQS._
      val conn = new net.xqj.basex.local.BaseXXQDataSource().getConnection
      val nodes: NodeSeq = conn("for $w in /widgets/widget order by $w return $w", widgets)
      | NodeSeq(<widget>Menu</widget>, <widget id="panel-1">Panel</widget>,
      |   <widget id="panel-2">Panel</widget>, <widget>Status bar</widget>)

  39. XPath
      import javax.xml.xpath.{XPathConstants, XPathFactory}
      import org.w3c.dom.NodeList
      import com.felstar.xqs.XQS._
      val widgets = <widgets>
        <widget>Menu</widget>
        <widget>Status bar</widget>
        <widget id="panel-1">Panel</widget>
        <widget id="panel-2">Panel</widget>
      </widgets>
      val xpath = XPathFactory.newInstance().newXPath()
      val nodes = xpath.evaluate("/widgets/widget[not(@id)]", toDom(widgets), XPathConstants.NODESET).asInstanceOf[NodeList]
      (nodes: NodeSeq)
      | NodeSeq(<widget>Menu</widget>, <widget>Status bar</widget>)
      Natively in Scala:
      (widgets \ "widget")(widget => (widget \ "@id").isEmpty)
      | NodeSeq(<widget>Menu</widget>, <widget>Status bar</widget>)

  40. XSLT
      import javax.xml.transform.TransformerFactory
      import javax.xml.transform.stream.StreamResult
      import com.felstar.xqs.XQS._
      val peopleXml = <people>
        <john>Hello, John.</john>
        <smith>Smith is here.</smith>
        <another>Hello.</another>
      </people>
      val stylesheet = <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
        <xsl:template match="john">
          <xsl:copy>Hello, John.</xsl:copy>
        </xsl:template>
        <xsl:template match="node()|@*">
          <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
          </xsl:copy>
        </xsl:template>
      </xsl:stylesheet>
      val xmlResultResource = new java.io.StringWriter()
      val xmlTransformer = TransformerFactory.newInstance().newTransformer(stylesheet)
      xmlTransformer.transform(peopleXml, new StreamResult(xmlResultResource))
      xmlResultResource.getBuffer
      | <?xml version="1.0" encoding="UTF-8"?><people>
      |   <john>Hello, John.</john>
      |   <smith>Smith is here.</smith>
      |   <another>Hello.</another>
      | </people>
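The XSLT slide passes Scala XML values straight to the JDK transformer, which presumably depends on conversions brought in by the XQS import. As a sketch of the same idea without that dependency, the following uses only `javax.xml.transform` with string inputs. The `JdkXsltSketch` object, its stylesheet, and its input are hypothetical; the stylesheet is declared as XSLT 1.0 on the assumption that the default JDK processor is in use.

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.{StreamResult, StreamSource}

object JdkXsltSketch {
  // Identity transform with one override for <john>.
  val stylesheet: String =
    """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      |  <xsl:template match="john"><xsl:copy>Hi, John!</xsl:copy></xsl:template>
      |  <xsl:template match="node()|@*">
      |    <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
      |  </xsl:template>
      |</xsl:stylesheet>""".stripMargin

  // Hypothetical input document.
  val input: String =
    "<people><john>John is here.</john><smith>Smith is here.</smith></people>"

  // Compile the stylesheet and apply it, collecting the result as a String.
  def transform(xslt: String, xml: String): String = {
    val out = new StringWriter()
    val transformer = TransformerFactory.newInstance()
      .newTransformer(new StreamSource(new StringReader(xslt)))
    transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out))
    out.toString
  }

  val output: String = transform(stylesheet, input)
}
```

The content of `<john>` is replaced by the first template, while every other node is copied through unchanged by the identity template.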

  41. XML Stream Processing
      import scala.io.Source
      import javax.xml.stream.{XMLEventReader, XMLInputFactory}
      import javax.xml.stream.events.XMLEvent
      // 4GB file, comes back in a second
      val src = Source.fromURL("http://dumps.wikimedia.org/enwiki/20140402/enwiki-20140402-abstract.xml")
      val er = XMLInputFactory.newInstance().createXMLEventReader(src.reader)
      implicit class XMLEventIterator(ev: XMLEventReader) extends scala.collection.Iterator[XMLEvent] {
        def hasNext = ev.hasNext
        def next = ev.nextEvent()
      }
      er.dropWhile(!_.isStartElement).take(10).zipWithIndex.foreach {
        case (ev, idx) => println(s"${idx + 1}:\t$ev")
      }
      src.close()
      | 1: <feed>
      | 2:
      |
      | 3: <doc>
      | 4:
      |
      | 5: <title>
      | 6: Wikipedia: Anarchism
      | 7: </title>
      | 8:
      |
      | 9: <url>
      | 10: http://en.wikipedia.org/wiki/Anarchism
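The streaming technique on this slide can also be tried offline. The sketch below is an illustration rather than the speakers' code: it wraps a StAX `XMLEventReader` in a Scala `Iterator` and pulls out `<title>` text from an in-memory string. The `StaxSketch` object and the miniature feed are hypothetical stand-ins for the Wikipedia dump.

```scala
import java.io.StringReader
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.events.XMLEvent

object StaxSketch {
  // Hypothetical miniature of an abstracts feed.
  val feed: String =
    "<feed><doc><title>First</title></doc><doc><title>Second</title></doc></feed>"

  def titles(xml: String): List[String] = {
    val factory = XMLInputFactory.newInstance()
    // Deliver adjacent character data as a single event.
    factory.setProperty(XMLInputFactory.IS_COALESCING, java.lang.Boolean.TRUE)
    val reader = factory.createXMLEventReader(new StringReader(xml))

    // Adapt the Java event reader to a Scala Iterator.
    val events = new Iterator[XMLEvent] {
      def hasNext: Boolean = reader.hasNext
      def next(): XMLEvent = reader.nextEvent()
    }

    var inTitle = false
    val found = List.newBuilder[String]
    events.foreach { ev =>
      if (ev.isStartElement) inTitle = ev.asStartElement.getName.getLocalPart == "title"
      else if (ev.isEndElement) inTitle = false
      else if (ev.isCharacters && inTitle) found += ev.asCharacters.getData
    }
    reader.close()
    found.result()
  }
}
```

Because events are consumed one at a time, memory use stays flat regardless of document size, which is what makes the approach suitable for multi-gigabyte inputs.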

  42. Use Cases - Data extraction - Serving XML via REST - Dynamically generated XSLT - Interfacing with XML databases - Flexibility to choose the best tool for the job

  43. Excellent Ecosystem SBT Akka Spark Spray Specs scalaz shapeless scala-xml Scaladin ScalaTest macro-paradise scala-maven-plugin JVM

  44. Conclusion - Practical - Practical for XML processing

  45. Where do I start? - atomicscala.com - typesafe.com/activator - scala-lang.org - scala-ide.org - IntelliJ

  46. Matt Stephens Charles Foster

  47. Open to consulting www.scala.contractors Follow us on Twitter: @DinoFancellu @ScalaWilliam @MaffStephens
