Dive into the world of regular expressions (RegEx) and learn how to harness their power for effective string manipulation. This comprehensive guide covers essential concepts, including string definitions, case sensitivity, and practical tasks like finding specific words within strings. Discover the basic rules of RegEx, including character matching, control characters, and using anchors to locate text within strings. With practical examples from ColdFusion, you'll gain the skills needed to implement RegEx effectively in various programming languages.
Hands-on Regular Expressions: Simple rules for powerful changes
Definitions
• String - Any collection of 0 or more characters.
• Example: “This is a String”
• SubString - A segment of a String.
• Example: “is a”
• Case Sensitivity - Whether a comparison distinguishes upper-case characters from lower-case ones.
www.cfunited.com
Simple Task
• Find the word “Name” inside a string:
• <CFSET String="My name is Michael Dinowitz">
• <CFOUTPUT>
• Position=#Find('Name', String)#
• </CFOUTPUT>
• Output: Position=0 (Find() is case sensitive, so “Name” does not match “name”)
Simple Task
• Find the word “name” (lower case) inside a string:
• <CFSET String="My name is Michael Dinowitz">
• <CFOUTPUT>
• Position=#Find('name', String)#
• </CFOUTPUT>
• Output: Position=4
Simple Task
• Find the word “Name” inside a string, ignoring case:
• <CFSET String="My name is Michael Dinowitz">
• <CFOUTPUT>
• Position=#FindNoCase('Name', String)#
• </CFOUTPUT>
• Output: Position=4
Simple Task
• Find the word “Name” inside a string using Regular Expressions:
• <CFSET String="My name is Michael Dinowitz">
• <CFOUTPUT>
• Position=#REFindNoCase('Name', String)#
• </CFOUTPUT>
• Output: Position=4
Intro to Regular Expressions
• Referred to as RegEx
• Matches patterns of characters
• Used in many languages (ColdFusion, Perl, JavaScript, etc.)
• Uses a small syntax library to do ‘dynamic’ matches
• Can be used for Search and/or Replace actions
• Slightly slower than the equivalent Find() and Replace() functions
• Has both a case-sensitive and a case-insensitive version of each function:
• REFind()
• REFindNoCase()
• REReplace()
• REReplaceNoCase()
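As a minimal sketch of the case-sensitive and case-insensitive pairs listed above, the following runs all four functions against the sample string used throughout these slides. The replacement word 'alias' and the output labels are illustrative choices, not part of the original deck.

```cfml
<CFSET String="My name is Michael Dinowitz">
<CFOUTPUT>
<!--- Case sensitive: 'Name' does not match 'name', so 0 is returned --->
REFind=#REFind('Name', String)#<br>
<!--- Case insensitive: 'Name' matches 'name' at position 4 --->
REFindNoCase=#REFindNoCase('Name', String)#<br>
<!--- Case sensitive replace finds nothing, so the string is unchanged --->
REReplace=#REReplace(String, 'Name', 'alias')#<br>
<!--- Case insensitive replace swaps 'name' for 'alias' --->
REReplaceNoCase=#REReplaceNoCase(String, 'Name', 'alias')#
</CFOUTPUT>
```

This should print 0, 4, the unchanged string, and “My alias is Michael Dinowitz”.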
RegEx Basics
• Rule 1: A character matches itself as long as it is not a control character.
• Example:
• A=“A”
• A=“a” (non-case sensitive)
• <CFSET String="My name is Michael Dinowitz">
• <CFOUTPUT>
• Position=#REFindNoCase('n', String)#
• </CFOUTPUT>
• Output: Position=4
RegEx Basics
• Rule 1a: A search returns the first successful match. To get a later match, set the start position (the optional third argument of the function).
• <CFSET String="My name is Michael Dinowitz">
• <CFOUTPUT>
• Position1=#REFindNoCase('M', String)#
• Position2=#REFindNoCase('M', String, 2)#
• </CFOUTPUT>
• Output: Position1=1
• Output: Position2=6 (the search ignores case, so the “m” in “name” matches before the “M” in “Michael” at position 12)
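The start position can also be used to walk through every match in turn. The loop below is a sketch under that assumption: the variable names Start, Found and Positions are hypothetical, and each search resumes one character past the previous match.

```cfml
<CFSET String="My name is Michael Dinowitz">
<CFSET Start=1>
<CFSET Positions="">
<CFLOOP condition="Start LTE Len(String)">
    <CFSET Found=REFindNoCase('m', String, Start)>
    <!--- Stop when no further match exists --->
    <CFIF Found EQ 0><CFBREAK></CFIF>
    <!--- Record this match in a comma-delimited list --->
    <CFSET Positions=ListAppend(Positions, Found)>
    <!--- Resume one character past the match just found --->
    <CFSET Start=Found + 1>
</CFLOOP>
<CFOUTPUT>Positions=#Positions#</CFOUTPUT>
```

For the sample string this should print Positions=1,6,12: the “M” in “My”, the “m” in “name”, and the “M” in “Michael”.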
RegEx Basics
• Rule 2: A collection of non-control characters matches another collection of non-control characters.
• AA=“AA”
• AA!=“Aa” (case sensitive)
• AA=“Aa” (non-case sensitive)
• A A=“A A” (notice the space)
• <CFSET String="My name is Michael Dinowitz">
• <CFOUTPUT>
• Position=#REFindNoCase('y n', String)#
• </CFOUTPUT>
• Output: Position=2
RegEx Basics
• Rule 3: A period (.) is a control character that matches any single character.
• Example:
• . = “A”
• A. = “Ac”
• A.A=“A A”
• <CFSET String="My name is Michael Dinowitz">
• <CFOUTPUT>
• Position=#REFindNoCase('N.me', String)#
• </CFOUTPUT>
• Output: Position=4
RegEx Basics
• Rule 4: A control character can be ‘escaped’ by placing a backslash (\) before it. The control character then matches a text version of itself.
• Example:
• . = “.” (and any other character)
• \. = “.” (only a period)
• A\.A = “A.A”
• <CFSET String="My name is Michael Dinowitz.">
• <CFOUTPUT>
• Position1=#REFindNoCase('tz\.', String)#
• </CFOUTPUT>
• Output: Position1=26
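The difference between an escaped and an unescaped period shows up most clearly on a string where the period is not the first character. A small sketch, using the “A.A” text from the rule above as the test string (the output labels are illustrative):

```cfml
<CFSET String="A.A">
<CFOUTPUT>
<!--- Unescaped period matches any character, so the first 'A' matches --->
Unescaped=#REFind('.', String)#<br>
<!--- Escaped period matches only the literal period in the middle --->
Escaped=#REFind('\.', String)#
</CFOUTPUT>
```

This should print Unescaped=1 and Escaped=2.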
RegEx Anchoring
• Rule 5a: Using the caret (^) makes sure the text you're searching for is at the start of the string.
• Example:
• ^A = “A”
• ^M != “AM”
• <CFSET String="My name is Michael Dinowitz.">
• <CFOUTPUT>
• Position1=#REFindNoCase('^My', String)#
• Position2=#REFindNoCase('^is', String)#
• </CFOUTPUT>
• Output: Position1=1
• Output: Position2=0
RegEx Anchoring
• Rule 5b: Using the dollar sign ($) makes sure the text you're searching for is at the end of the string.
• Example:
• A$ = “A”
• M$ = “MAM” (second M will be returned)
• <CFSET String="My name is Michael Dinowitz.">
• <CFOUTPUT>
• Position1=#REFindNoCase('\.$', String)#
• </CFOUTPUT>
• Output: Position1=28
RegEx Ranges
• Rule 6: When looking for one of a group of characters, place them inside square brackets ([]).
• Example:
• ‘[abc]’ will match either a, b, or c.
• ‘[.+$^]’ will match either a period (.), a plus (+), a dollar sign ($) or a caret (^). Note that these special characters are treated as literal text within square brackets, with no backslash needed.
• <CFSET String="My name is Michael Dinowitz.">
• <CFOUTPUT>
• Position1=#REFindNoCase('M[aeiou]', String)#
• </CFOUTPUT>
• Output: Position1=6
RegEx Ranges
• Rule 7a: A caret (^), when used within square brackets ([]), has the effect of saying ‘NOT these characters’. It must be the first character inside the brackets for this to work.
• Example:
• ‘[^abc]’ will match ANY character other than a, b, or c.
• <CFSET String="My name is Michael Dinowitz.">
• <CFOUTPUT>
• Position1=#REFindNoCase('M[^aeiou]', String)#
• </CFOUTPUT>
• Output: Position1=1
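To see why the caret's position inside the brackets matters, compare the same two characters in both orders. This is a sketch with a made-up test string, “2^3”, chosen only because it contains a literal caret:

```cfml
<CFSET String="2^3">
<CFOUTPUT>
<!--- Caret first: matches the first character that is NOT 'a' --->
Negated=#REFind('[^a]', String)#<br>
<!--- Caret not first: matches a literal 'a' or a literal caret --->
Literal=#REFind('[a^]', String)#
</CFOUTPUT>
```

This should print Negated=1 (the “2”) and Literal=2 (the caret itself).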
RegEx Ranges
• Rule 7b: A dash (-), when used within square brackets ([]), has the effect of saying ‘all characters from the first character through the last’.
• Example:
• ‘[a-e]’ will match any single character from a through e.
• <CFSET String="My name is Michael Dinowitz.">
• <CFOUTPUT>
• Position1=#REFindNoCase('M[a-m]', String)#
• </CFOUTPUT>
• Output: Position1=6
RegEx Ranges
• Rule 8: ColdFusion has a series of pre-built character ranges. These are referenced as [[:range name:]].
• Example:
• [[:digit:]] - same as [0-9] (all digits)
• [[:alpha:]] - same as [A-Za-z] (all letters of both cases)
• [[:space:]] - any whitespace character
• <CFSET String="My name is Michael Dinowitz.">
• <CFOUTPUT>
• Position1=#REFindNoCase('[[:space:]]', String)#
• </CFOUTPUT>
• Output: Position1=3
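A few more of the pre-built ranges can be tried the same way. The sketch below assumes the standard POSIX names [[:upper:]] and [[:punct:]], which are not shown on the slide above; it uses the case-sensitive REFind() so that the upper-case class is not affected by case folding.

```cfml
<CFSET String="My name is Michael Dinowitz.">
<CFOUTPUT>
<!--- First upper-case letter at or after position 2: the 'M' in 'Michael' --->
Upper=#REFind('[[:upper:]]', String, 2)#<br>
<!--- First punctuation character: the trailing period --->
Punct=#REFind('[[:punct:]]', String)#<br>
<!--- No digits in the string, so 0 is returned --->
Digit=#REFind('[[:digit:]]', String)#
</CFOUTPUT>
```

This should print Upper=12, Punct=28 and Digit=0.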
RegEx Character Classes
RegEx Multipliers
• Any character or character class can be assigned a multiplier that defines how many times it may appear. A multiplier can say that a character must exist, is optional, may exist a certain minimum or maximum number of times, etc.
• Multiplier characters include:
• Plus (+) One or more
• Asterisk (*) 0 or more
• Question Mark (?) 0 or 1 (may or may not exist once)
• Curly Brackets ({}) A specific range of occurrences
RegEx Multipliers
• The Plus (+) multiplier specifies that the character or character group must exist but can exist more than once.
• Example:
• A+ - A followed by any number of additional A’s
• [[:digit:]]+ - A digit (0-9) followed by any number of additional digits
• <CFSET String="Mississippi is a hard word.">
• <CFOUTPUT>
• Position1=#REFindNoCase('is+i', String)#
• </CFOUTPUT>
• Output: Position1=2
RegEx Multipliers
• The Asterisk (*) multiplier specifies that the character or character group may or may not exist, and can exist more than once (i.e. 0 or more).
• Example:
• A* - Either no A or an A followed by any number of additional A’s
• [[:digit:]]* - Either no digit (0-9) or a digit followed by any number of additional digits
• <CFSET String="Mississippi is a hard word.">
• <CFOUTPUT>
• Position1=#REFindNoCase('si*s', String)#
• </CFOUTPUT>
• Output: Position1=3
RegEx Multipliers
• The Question mark (?) multiplier specifies that the character or character group may or may not exist, but only once.
• Example:
• A? - Either one A or no A
• [[:digit:]]? - One or no digits (0-9)
• <CFSET String="Mississippi is a hard word.">
• <CFOUTPUT>
• Position1=#REFindNoCase('p?i', String)#
• </CFOUTPUT>
• Output: Position1=2
RegEx Multipliers
• Curly brackets ({}) can be used to specify a minimum and maximum number of times a character may appear. The format is {min,max}.
• Example:
• A{2,4} - At least 2 As, but no more than 4.
• [[:digit:]]{1,6} - At least 1 digit (0-9), but no more than 6.
• <CFSET String="Mississippi is a hard word.">
• <CFOUTPUT>
• Position1=#REFindNoCase('s{2,3}', String)#
• </CFOUTPUT>
• Output: Position1=3
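Multipliers become more useful once combined with the anchors and ranges from the earlier rules. As a sketch, the pattern below accepts a string only if it consists of exactly five digits; the variable name Pattern and the test values are hypothetical, and {5,5} is used to stay within the {min,max} form described above.

```cfml
<CFSET Pattern='^[[:digit:]]{5,5}$'>
<CFOUTPUT>
<!--- Exactly five digits between the start and end anchors: match at 1 --->
FiveDigits=#REFind(Pattern, '12345')#<br>
<!--- Four digits are below the minimum: no match --->
TooShort=#REFind(Pattern, '1234')#<br>
<!--- Six digits exceed the maximum allowed before the end anchor: no match --->
TooLong=#REFind(Pattern, '123456')#<br>
<!--- A letter is outside the digit range: no match --->
HasLetter=#REFind(Pattern, '1234a')#
</CFOUTPUT>
```

This should print 1 for FiveDigits and 0 for the other three. Without the ^ and $ anchors, the six-digit value would match, because five of its digits satisfy the pattern.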
RegEx SubExpressions
• SubExpressions are a way of grouping characters together. This allows us to reference the entire group at once. To group characters together, place them within parentheses ().
• Example:
• (Name) = name
• (Name)+ = name, namename, or any longer run of one or more names.
• <CFSET String="Mississippi is a hard word.">
• <CFOUTPUT>
• Position1=#REFindNoCase('(iss)+', String)#
• </CFOUTPUT>
• Output: Position1=2
RegEx SubExpressions
• An additional special character that is usable within a subExpression is the pipe (|). This means either the first group of text or the second (or more).
• Example:
• (Na|me) = na or me
• (Name|Date) = name or date
• <CFSET String="Mississippi is a hard word.">
• <CFOUTPUT>
• Position1=#REFindNoCase('(hard|word)', String)#
• </CFOUTPUT>
• Output: Position1=18
RegEx SubExpressions
• SubExpressions allow us to do something else that’s special: back referencing. This is the ability to reference one or more groups directly. This is done by using the backslash (\) followed by a number that specifies which subexpression we want.
• Example:
• (name)\1 = namename
• (Name|Date)\1 = namename or datedate
• <CFSET String="Mississippi is is a hard word.">
• <CFOUTPUT>
• Position1=#REFindNoCase('(is )\1', String)#
• </CFOUTPUT>
• Output: Position1=13
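The search functions can also report where each subexpression matched, not just where the whole match starts. The sketch below relies on the optional fourth argument of REFindNoCase() (returnsubexpressions), which the slides do not cover; when it is True the function returns a structure with pos and len arrays instead of a single position. The variable name Match and the square brackets around the output are illustrative.

```cfml
<CFSET String="Mississippi is is a hard word.">
<!--- Fourth argument True returns a structure instead of a position --->
<CFSET Match=REFindNoCase('(is )\1', String, 1, True)>
<CFOUTPUT>
<!--- Element 1 of each array describes the whole match --->
WholeStart=#Match.pos[1]#<br>
WholeLength=#Match.len[1]#<br>
WholeText=[#Mid(String, Match.pos[1], Match.len[1])#]<br>
<!--- Element 2 describes the first subexpression, (is ) --->
GroupText=[#Mid(String, Match.pos[2], Match.len[2])#]
</CFOUTPUT>
```

This should print WholeStart=13, WholeLength=6, WholeText=[is is ] and GroupText=[is ].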
REReplace
• The REReplace() and REReplaceNoCase() functions use everything you’ve learned about searching and allow you to ‘work’ with the search results, i.e. replace them with something. By default only the first match is replaced; pass ‘all’ as the fourth argument to replace every match.
• Example:
• <CFSET String="Mississippi is a hard word.">
• <CFOUTPUT>
• Position1=#REReplaceNoCase(String, 'iss', 'emm')#
• Position2=#REReplaceNoCase(String, 'iss', 'emm', 'all')#
• </CFOUTPUT>
• Output: Position1=Memmissippi is a hard word.
• Output: Position2=Memmemmippi is a hard word.
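Back references can be used in the replacement text as well as in the pattern. As a sketch, the following collapses the doubled “is ” from the back referencing slide into a single copy by replacing the whole match with the first subexpression (\1). The output label Fixed is illustrative.

```cfml
<CFSET String="Mississippi is is a hard word.">
<CFOUTPUT>
<!--- '(is )\1' matches 'is is '; '\1' puts back just one 'is ' --->
Fixed=#REReplaceNoCase(String, '(is )\1', '\1')#
</CFOUTPUT>
```

This should print Fixed=Mississippi is a hard word.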