
Advanced Find and Replace with Regular Expressions

Advanced Find and Replace with Regular Expressions. Robert Kiffe Senior Customer Support Engineer. Agenda. Review: Global Find and Replace Introduction to Regular Expressions Challenge #1 Solution Advanced Regular Expressions Challenge #2 Solution Hands On Q & A.



Presentation Transcript


  1. Advanced Find and Replace with Regular Expressions • Robert Kiffe, Senior Customer Support Engineer

  2. Agenda • Review: Global Find and Replace • Introduction to Regular Expressions • Challenge #1 • Solution • Advanced Regular Expressions • Challenge #2 • Solution • Hands On • Q & A

  3. Global Find and Replace • Location: Content > Find and Replace • Administrators Only (User Level 10) • Searches a single site • Adjust ‘Scope’ to limit searchable content • Literal Text or Regex patterns

  4. Global Find and Replace • Find • Simple search with results list • Preview Replace • Safe multi-step process • Perform ‘sample’ find/replace and display results list • Select pages from results to perform the actual find/replace operation • (Optional) Publish selected results

  5. Regular Expressions • Regular Expression • A pattern that ‘describes’ a certain amount of text • The concept arose in the 1950s when the American mathematician Stephen Cole Kleene formalized the description of a regular language. (Thanks Wikipedia) • Now used in almost every major programming language

  6. Literal Characters • Literal Text Matches • Most characters match exactly themselves • Case Sensitive • Example text: Robert does not like to be called robert. • Pattern: Robert (matches only the capitalized “Robert”)
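
The case-sensitivity point above can be tried out in a few lines. This is a minimal sketch that assumes Python's `re` module as a stand-in engine (the slides do not name one); the `re.IGNORECASE` flag is Python's opt-in switch and may not correspond to an option in the Find and Replace tool.

```python
import re

text = "Robert does not like to be called robert."

# A literal pattern matches only the exact, case-sensitive text.
case_sensitive = re.findall(r"Robert", text)

# Assumption: case-insensitive matching is opt-in, here via Python's re.IGNORECASE.
case_insensitive = re.findall(r"Robert", text, flags=re.IGNORECASE)

print(case_sensitive)    # ['Robert']
print(case_insensitive)  # ['Robert', 'robert']
```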

  7. Special Characters • Symbol characters that have special purpose (explained later) • Full List: \ ^ $ . | ? * + ( ) [ { • To match as literal characters, you must ‘escape’ them by adding “\” in front • Example text: Rob does not like to be called Robert? • Pattern: Robert\? (matches “Robert?” including the question mark)
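
To see why escaping matters, the sketch below runs the slide's sentence through both the unescaped and the escaped pattern. It assumes Python's `re` module for illustration; `re.escape` is a Python helper for escaping arbitrary text and is not something the slides mention.

```python
import re

text = "Rob does not like to be called Robert?"

# Unescaped: "?" is a quantifier that makes the preceding "t" optional,
# so the question mark itself is never part of the match.
unescaped = re.findall(r"Robert?", text)

# Escaped: "\?" matches a literal question mark.
escaped = re.findall(r"Robert\?", text)

# Assumption: when the search text comes from elsewhere, Python can
# escape it for you with re.escape.
auto_escaped = re.escape("Robert?")

print(unescaped)     # ['Robert']
print(escaped)       # ['Robert?']
print(auto_escaped)  # Robert\?
```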

  8. Special Character: Period • ‘Wildcard’ Character • Matches any character except newline. • Example text: Robert does not like to be called oberth, Bobert, or Goobert. • Pattern: .obert
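
A short sketch of the wildcard behavior, assuming Python's `re` module and a simplified sample sentence (not the exact slide text). The `re.DOTALL` flag shown at the end is a Python-specific way to let the period match newlines too.

```python
import re

text = "Robert does not like to be called Bobert or Goobert."

# "." stands for any single character, so each match is one character
# followed by the literal text "obert".
matches = re.findall(r".obert", text)
print(matches)  # ['Robert', 'Bobert', 'oobert']

# By default the period does not match a newline...
no_newline = re.findall(r"a.b", "a\nb")
# ...unless the engine is told otherwise (Python: re.DOTALL).
with_newline = re.findall(r"a.b", "a\nb", flags=re.DOTALL)
```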

  9. Special Characters: Quantifiers • Symbol characters that define how many of the previous character(s) to match • ? (0 or 1) • * (0 or More) • + (1 or More) • Use Curly Brackets to indicate an exact number or range • {3} (Exactly 3) • {3,} (3 or More) • {3,5} (3, 4, or 5) • Only modifies the previous character (or group)

  10. Special Characters: Quantifiers • Quantifiers: Example • ? : 0 or 1 • Example text: Robert does not like to be called Roberta. • Pattern: Roberta? (matches both “Robert” and “Roberta”)
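
The quantifier rules from the last two slides can be checked with a small sketch. It assumes Python's `re` module; the short test strings such as "aaa" and "ababab" are made up for illustration.

```python
import re

text = "Robert does not like to be called Roberta."

# "?" applies only to the character right before it, the final "a".
optional_a = re.findall(r"Roberta?", text)
print(optional_a)  # ['Robert', 'Roberta']

# Curly brackets give an exact count or a range.
exactly_three = re.fullmatch(r"a{3}", "aaa") is not None
three_to_five = re.fullmatch(r"a{3,5}", "aaaaaa") is not None  # six a's

# A quantifier after a group repeats the whole group, not just one character.
char_repeat = re.fullmatch(r"ab+", "abab") is not None
group_repeat = re.fullmatch(r"(ab)+", "abab") is not None
```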

  11. Special Characters: Parentheses • Capture Groups • Encapsulate a character sequence using parentheses: “(…)” • Add a quantifier to affect the whole group • Replace • In the ‘replace field’, refer to your groups using the “dollar sign” and then the group number: $# • Count the opening parenthesis characters, “(”, from left to right to determine the correct #

  12. Special Characters: Parentheses • Capture Group: Example • FIND text: I like https://school.edu but not https://www.school.edu. • FIND pattern: https://www\.(school\.edu) • REPLACE pattern: https://$1 • Result: I like https://school.edu but not https://school.edu.
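
The capture-group replace above can be reproduced outside the tool. This sketch assumes Python's `re` module, which writes group references as `\1` in the replacement string rather than the `$1` form shown on the slide.

```python
import re

text = "I like https://school.edu but not https://www.school.edu."

# Group 1 captures "school.edu"; the "www." prefix sits outside the group,
# so it is dropped when the match is rebuilt from the group.
find_pattern = r"https://www\.(school\.edu)"
replace_pattern = r"https://\1"  # Python spelling of the slide's https://$1

result = re.sub(find_pattern, replace_pattern, text)
print(result)  # I like https://school.edu but not https://school.edu.
```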

  13. Challenge #1 • Find All Links to a Particular Domain • Problem is that it can have many formats: • Root-relative “/” • /about/contact.html • Absolute (either protocol) • http://www.gallena.com/about/contact.html • https://www.gallena.com/about/contact.html • No Subdomain • http://gallena.com/about/contact.html • Examples: • <a href="/about/"> • <a href="http://www.gallena.com/about/">

  14. Challenge #1: Tips • Use a quantifier (e.g. ‘?’) to make a part of the URL optional • a? • Combine a quantifier with parentheses to make a substring of the URL optional • (abc)?

  15. Challenge #1: Solution • Steps to Build the Regex Pattern: • href="https?://www\.gallena\.com/ (HTTP or HTTPS protocol) • href="https?://(www\.)?gallena\.com/ (+Subdomain optional) • href="(https?://(www\.)?gallena\.com)?/ (+Root-relative) • Example Matches: • <a href="http://www.gallena.com/about/">About</a> • <a href="http://gallena.com/records/index.html">Records</a> • <a href="/academics/index.html">Academics</a> • <a href="https://www.gallena.com/portal/">Portal Login</a>
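
The final pattern from the steps above can be checked against the example links. This is a sketch assuming Python's `re` module; the `example.com` and `cdn.example.com` links are hypothetical additions used to probe what the pattern does and does not catch.

```python
import re

# Final pattern from the slide: the whole scheme-and-host part is optional,
# so root-relative links ("/...") match as well.
pattern = re.compile(r'href="(https?://(www\.)?gallena\.com)?/')

should_match = [
    '<a href="http://www.gallena.com/about/">About</a>',
    '<a href="http://gallena.com/records/index.html">Records</a>',
    '<a href="/academics/index.html">Academics</a>',
    '<a href="https://www.gallena.com/portal/">Portal Login</a>',
]
# Hypothetical link to another domain, which should be left alone.
other_domain = '<a href="http://www.example.com/">Elsewhere</a>'
# Hypothetical protocol-relative link: it starts with "/", so it matches too.
protocol_relative = '<a href="//cdn.example.com/page/">CDN</a>'

matched = [bool(pattern.search(link)) for link in should_match]
other_matched = bool(pattern.search(other_domain))
protocol_relative_matched = bool(pattern.search(protocol_relative))
```

One thing the last check suggests: because the optional group can be skipped entirely, any `href` value that begins with a slash matches, including protocol-relative URLs that point to other hosts. If such links exist on a site, the results list is worth reviewing before replacing.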

  16. Special Characters: Square Brackets • Character Sets • Characters encased inside square brackets define all possible matches for a single text character: [abc] • A quantifier placed directly after the set will affect the whole character set • Placing a “-” between characters indicates a ‘range’ • Placing a “^” as the first item in the set creates a ‘negative pattern’ • Quantifier characters become literal matches: ? + * { } • Period character becomes literal match: .

  17. Character Sets: Examples • Example text: Robert does not like to be called robert. • Pattern: [Rr]obert • Example text: Robert does not like to be called Richard. • Pattern: [A-Z][a-z]+ • Example text: Robert does not like to be called Roberta. • Pattern: [^A-Z .]+
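
The three character-set examples can be run to see exactly what each one matches. This is a sketch assuming Python's `re` module; the variable names are illustrative.

```python
import re

# [Rr] accepts either case for the first letter.
either_case = re.findall(r"[Rr]obert",
                         "Robert does not like to be called robert.")

# One uppercase letter followed by one or more lowercase letters.
capitalized = re.findall(r"[A-Z][a-z]+",
                         "Robert does not like to be called Richard.")

# Negated set: runs of characters that are NOT uppercase letters,
# spaces, or periods (the period is literal inside the brackets).
negated = re.findall(r"[^A-Z .]+",
                     "Robert does not like to be called Roberta.")

print(either_case)  # ['Robert', 'robert']
print(capitalized)  # ['Robert', 'Richard']
print(negated)      # ['obert', 'does', 'not', 'like', 'to', 'be', 'called', 'oberta']
```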

  18. Shorthand Character Classes • Certain characters can reference a range of characters when ‘escaped’ by a backslash (\) • Common Examples: • \d matches all digit characters: [0-9] • \w matches all ‘word’ characters: [A-Za-z0-9_] • \s matches all ‘space’ characters (including line breaks) • Using the capital letter will invert the match • \S matches all non-space characters: [^\s]

  19. Character Classes: Example • Example text: Jenny’s number is 867-5309. • Pattern: \d{3}-\d{4}
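
A quick sketch of the shorthand classes, assuming Python's `re` module. The `re.ASCII` flag is used here because Python's `\d` and `\w` otherwise also accept non-ASCII digits and letters, which is broader than the ranges listed on the slide; other engines may differ. The strings `user_name-42` and `a b\tc\nd` are made-up samples.

```python
import re

text = "Jenny's number is 867-5309."

# Three digits, a hyphen, then four digits.
phone = re.findall(r"\d{3}-\d{4}", text, flags=re.ASCII)
print(phone)  # ['867-5309']

# \w covers letters, digits, and underscore, but not the hyphen.
words = re.findall(r"\w+", "user_name-42", flags=re.ASCII)
print(words)  # ['user_name', '42']

# \S is the inverse of \s: runs of non-whitespace characters.
tokens = re.findall(r"\S+", "a b\tc\nd")
print(tokens)  # ['a', 'b', 'c', 'd']
```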

  20. Greedy Matches • When using quantifiers, a careless (or purposeful) pattern could match beyond an expected result • Apply an extra coating of “?” after the initial quantifier, to make the pattern stop at the first successful match • Example text: Robert likes dogs! Robert likes cats! • Greedy pattern: Robert likes .*! (one match covering both sentences) • Lazy pattern: Robert likes .*?! (two matches, one per sentence)
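
The difference between the greedy and lazy forms shows up clearly when both are run on the slide's sentence. A minimal sketch, assuming Python's `re` module:

```python
import re

text = "Robert likes dogs! Robert likes cats!"

# Greedy: ".*" grabs as much as possible, so the match runs to the LAST "!".
greedy = re.findall(r"Robert likes .*!", text)

# Lazy: ".*?" grabs as little as possible, so each match stops at the FIRST "!".
lazy = re.findall(r"Robert likes .*?!", text)

print(greedy)  # ['Robert likes dogs! Robert likes cats!']
print(lazy)    # ['Robert likes dogs!', 'Robert likes cats!']
```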

  21. Challenge #2 • Set External Links to Create a New Window • Need to add the attribute target="_blank" • Links will start with “http” or “https” • Examples: • <a href="http://www.omniupdate.com/">OmniUpdate</a> • <a href="https://petitions.whitehouse.gov/">Petitions</a> • Desired Result: • <a href="http://www.omniupdate.com/" target="_blank">OmniUpdate</a> • <a href="https://petitions.whitehouse.gov/" target="_blank">Petitions</a>

  22. Challenge #2: Tips • Remember lessons learned from Challenge #1 • (abc)? • Remember syntax requirements of HTML (or XML) • HTML/XML have special characters that can only be used in certain places • Use a “Not” to match any character not in the set • [^abc] • Use capture groups to re-place content as needed • (abc) -> $1

  23. Challenge #2: Solution • Steps to Build the Regex Pattern • FIND: • <a href="http://www\.omniupdate\.com/">OmniUpdate</a> (Starting Pattern) • <a\s*href="http://www\.omniupdate\.com/"\s*> (Account for whitespace) • <a\s*href="https?://[^"]+"\s*> (Match any absolute URL) • (<a\s*href="https?://[^"]+"\s*)> (Capture Group) • REPLACE: • $1 target="_blank"> (Use capture group, then end anchor tag) • Example Match/Replace: • <a href="http://www.omniupdate.com/about/">About</a> (Full Match: the opening tag through “>”) • <a href="http://www.omniupdate.com/about/">About</a> (Capture: the opening tag without its closing “>”) • <a href="http://www.omniupdate.com/about/" target="_blank">About</a> (Replace)
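
The find/replace pair above can be tried on sample markup before running it on a site. This sketch assumes Python's `re` module, where the slide's `$1` is written `\1`; the root-relative sample link is a hypothetical addition to show that internal links are left untouched.

```python
import re

# FIND pattern from the slide: capture the opening tag up to, but not
# including, its closing ">".
find_pattern = r'(<a\s*href="https?://[^"]+"\s*)>'
# REPLACE pattern: Python spelling of the slide's  $1 target="_blank">
replace_pattern = r'\1 target="_blank">'

external = '<a href="http://www.omniupdate.com/about/">About</a>'
internal = '<a href="/about/">About</a>'  # hypothetical root-relative link

external_result = re.sub(find_pattern, replace_pattern, external)
internal_result = re.sub(find_pattern, replace_pattern, internal)

# Running the replace a second time changes nothing: once target="_blank"
# follows the href, the closing ">" no longer comes right after the quote.
second_pass = re.sub(find_pattern, replace_pattern, external_result)

print(external_result)
# <a href="http://www.omniupdate.com/about/" target="_blank">About</a>
```

Note that the pattern only matches anchors whose sole attribute is `href`; a link that already carries a `class` or `title` attribute would need a broader pattern.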

  24. Thank you. Robert Kiffe Sr. Customer Support Engineer OmniUpdate 805-484-9400 ext 223 rkiffe@omniupdate.com outc18.com/surveys
