Characters and Strings
Representation of single characters • Data type char represents single characters, such as letters, numerals, and punctuation marks. • A literal value of type char is written as a single character enclosed within single quotation marks. • Examples: 'a', 'F', '9', '&', ' ', ','
Character encoding • ASCII stands for American Standard Code for Information Interchange. • ASCII is one of the character encoding schemes widely used today. This encoding scheme allows different computers to share information easily. • Most programming languages support ASCII characters.
ASCII Encoding • ASCII works well for English-language documents because all characters and punctuation marks are included in the ASCII codes. • ASCII does not represent the full character sets of other languages.
ASCII Encoding • For example, the character 'O' is 79 in the ASCII table (row value 70 + column value 9 = 79).
Limitations of ASCII • Standard ASCII is a 7-bit code, so it has 2^7 (128) unique combinations of bits to represent characters • Each character is usually stored in an 8-bit byte, with the high-order bit left unused • The extended ASCII sets use all 8 bits to represent a character, giving 256 unique combinations
Unicode Encoding • The Unicode Worldwide Character Standard (Unicode) supports the interchange, processing, and display of the written texts of diverse languages. • Java uses the Unicode standard for representing char constants. • Each Java char occupies 16 bits, allowing for 2^16 (65,536) unique bit combinations. • An early version of the standard defined 34,168 distinct characters, covering most of the major world languages; Unicode has since grown beyond what 16 bits can hold, and Java represents the additional characters as pairs of char values (surrogate pairs).
ASCII/Unicode equivalence • Unicode uses the same bit combinations for the characters that exist in the ASCII set • Thus, an English alphabetic character has the same numeric value in both ASCII and Unicode
Special characters • Several keys on a standard keyboard don't translate directly into printable (or displayable) characters • For example, the Enter key moves the cursor to a new line; we already know that the character that corresponds to this action can be represented as '\n'
Special characters • Some other special characters used in Java include: • '\t': horizontal tab character • '\u0007': alarm (bell) "character", which may cause the system speaker to beep; Java has no '\a' escape sequence, so the Unicode escape is used instead • '\\': a single backslash
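The escape sequences above can be tried out with a small sketch like the following. The class name EscapeDemo and the helper codeOf are illustrative names, not part of the slides.

```java
class EscapeDemo {
    // Returns the numeric code of a character, to show that an escape
    // sequence stands for exactly one char value.
    static int codeOf(char c) {
        return (int) c;
    }

    public static void main(String[] args) {
        System.out.println("Name\tScore");   // a tab separates the two words
        System.out.println("C:\\temp");      // prints C:\temp
        System.out.println(codeOf('\n'));    // prints 10
        System.out.println(codeOf('\t'));    // prints 9
        System.out.println("\\".length());   // prints 1
    }
}
```

Although each escape sequence is typed as two characters in the source, it occupies a single char in the running program.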
Converting between char and int • We can convert between a numeric (int) value and its corresponding ASCII character equivalent by using type casting, as the examples below illustrate:
    char ch1 = 'X';
    System.out.println(ch1);         // prints X
    System.out.println((int) ch1);   // prints 88
    int x = 99;
    System.out.println(x);           // prints 99
    System.out.println((char) x);    // prints c
Character comparison • Values of type char can be compared just like integers are compared, since they are actually stored as binary whole numbers • In the ASCII (and Unicode) set, uppercase letters have lower numeric value than lowercase letters • So, for example, 'A' is less than 'a', and 'b' is greater than 'Z'
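As a minimal sketch of character comparison, the relational operators can be applied directly to char values. The class CharCompareDemo and the helper isUpperCaseLetter are hypothetical names chosen for this illustration.

```java
class CharCompareDemo {
    // True if c is an uppercase English letter, using ordinary numeric comparison.
    static boolean isUpperCaseLetter(char c) {
        return c >= 'A' && c <= 'Z';
    }

    public static void main(String[] args) {
        System.out.println('A' < 'a');               // prints true  (65 < 97)
        System.out.println('b' > 'Z');               // prints true  (98 > 90)
        System.out.println(isUpperCaseLetter('Q'));  // prints true
        System.out.println(isUpperCaseLetter('q'));  // prints false
    }
}
```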
Strings • A string is a sequence of characters that is treated as a single value. • Instances of the String class are used to represent strings in Java. • We access individual characters of a string by calling the charAt method of the String object.
Strings • Each character in a string has an index we use to access the character. • Java uses zero-based indexing; the first character’s index is 0, the second is 1, and so on. • To refer to the first character of the word name, we say name.charAt(0)
String indexing with charAt method • An indexed expression is used to refer to individual characters in a string.
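A short sketch of zero-based indexing with charAt follows. The sample value "Sumatra" and the names CharAtDemo and lastChar are assumptions made for the example.

```java
class CharAtDemo {
    // Returns the last character of a non-empty string.
    // The last valid index is length() - 1 because indexing starts at 0.
    static char lastChar(String s) {
        return s.charAt(s.length() - 1);
    }

    public static void main(String[] args) {
        String name = "Sumatra";               // sample value
        System.out.println(name.charAt(0));    // prints S
        System.out.println(name.charAt(1));    // prints u
        System.out.println(lastChar(name));    // prints a
        // name.charAt(name.length()) would throw StringIndexOutOfBoundsException
    }
}
```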
Constructing strings • Since String is a class, we can create an instance of it by using the new operator. • The statements we have used so far, such as String name1 = "Kona"; • work much like String name1 = new String("Kona"); except that a string literal refers to a shared String object instead of creating a new one each time. • This literal shorthand is available for the String class only.
Example: Counting Vowels • Here is the code to count the number of vowels in the input string.
    // requires: import javax.swing.JOptionPane;
    char letter;
    String name = JOptionPane.showInputDialog(null, "Your name:");
    int numberOfCharacters = name.length();
    int vowelCount = 0;
    for (int i = 0; i < numberOfCharacters; i++) {
        letter = name.charAt(i);
        if (letter == 'a' || letter == 'A' ||
            letter == 'e' || letter == 'E' ||
            letter == 'i' || letter == 'I' ||
            letter == 'o' || letter == 'O' ||
            letter == 'u' || letter == 'U') {
            vowelCount++;
        }
    }
    System.out.print(name + ", your name has " + vowelCount + " vowels");
Example: Counting 'Java' • Continue reading words and count how many times the word Java occurs in the input, ignoring the case. • Notice how the comparison is done. We are not using the == operator.
    // requires: import javax.swing.JOptionPane;
    int javaCount = 0;
    boolean repeat = true;
    String word;
    while (repeat) {
        word = JOptionPane.showInputDialog(null, "Next word:");
        if (word.equals("STOP")) {
            repeat = false;
        } else if (word.equalsIgnoreCase("Java")) {
            javaCount++;
        }
    }
Comparing Strings • Comparing String objects is similar to comparing other objects. • The equality test (==) is true if the contents of the variables are the same. • For a reference data type, the equality test is true if both variables refer to the same object, because they both contain the same address. Thus, the “contents of the variable” does not mean “the sequence of characters in the String”
Comparing Strings • We don't usually use the == operator to compare Strings • The equals method is true if the String objects to which the two variables refer contain the same string value.
    String s1 = new String("hello");
    String s2 = new String("hello");
    if (s1 == s2)
        System.out.println("They are equal");        // this won't print
    if (s1.equals(s2))
        System.out.println("No, really, they are");  // this will print
The difference between the equality test and the equals method
Comparing Strings • String comparison may be done in several ways. • The methods equals and equalsIgnoreCase compare string values; one is case-sensitive and one is not. • The method compareTo returns a value: • Zero (0) if the strings are equal. • A negative integer if the first string is less than the second. • A positive integer if the first string is greater than the second.
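The three possible outcomes of compareTo can be seen in a small sketch like this one. Only the sign of the result is specified here, so the hypothetical helper sign reduces it to -1, 0, or 1; the class name CompareToDemo is also illustrative.

```java
class CompareToDemo {
    // Reduces the result of compareTo to -1, 0, or 1 so that only its sign matters.
    static int sign(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    public static void main(String[] args) {
        System.out.println(sign("apple", "banana"));          // prints -1
        System.out.println(sign("banana", "apple"));          // prints 1
        System.out.println(sign("Java", "Java"));             // prints 0
        System.out.println(sign("Zebra", "apple"));           // prints -1, since 'Z' (90) < 'a' (97)
        System.out.println("java".equalsIgnoreCase("JAVA"));  // prints true
        System.out.println("java".equals("JAVA"));            // prints false
    }
}
```

The "Zebra" case shows that the ordering follows character codes, so every uppercase letter sorts before every lowercase letter.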
Comparing Strings • As long as a new String object is created using the new operator, the rule for comparing objects applies to comparing strings. String str = new String("Java"); • If the new operator is not used, identical string literals share a single String object, so == on two such variables gives the same result it would for a primitive data type. String str = "Java";
The difference between using and not using the new operator for String
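The difference can be observed directly with a sketch such as the one below. The class StringIdentityDemo and the helper sameObject are hypothetical names used only for this illustration.

```java
class StringIdentityDemo {
    // True only if both variables refer to the very same String object.
    static boolean sameObject(String x, String y) {
        return x == y;
    }

    public static void main(String[] args) {
        String a = "Java";               // literal: refers to a shared String object
        String b = "Java";               // same literal, same shared object
        String c = new String("Java");   // new operator: a distinct object

        System.out.println(sameObject(a, b));  // prints true
        System.out.println(sameObject(a, c));  // prints false
        System.out.println(a.equals(c));       // prints true
    }
}
```

Because the result of == depends on how the strings were created, equals remains the safer choice whenever the string values are what matter.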
Pattern Matching and Regular Expressions • Pattern matching is a common function in many applications. • In Java 2 SDK 1.4, two new classes, Pattern and Matcher, were added. • The String class also gained several new methods that support pattern matching.
Pattern Example • Suppose students are assigned a three-digit code: • The first digit represents the major (5 indicates computer science); • The second digit represents either in-state (1), out-of-state (2), or foreign (3); • The third digit indicates campus housing: • On-campus dorms are numbered 1-7. • Students living off-campus are represented by the digit 8. • The 3-digit pattern to represent computer science majors living on-campus is 5[123][1-7] • In this pattern, the first character is 5, the second character is 1, 2, or 3, and the third character is any digit between 1 and 7.
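A minimal sketch of checking student codes against this pattern, using the matches method of the String class. The class StudentCodeDemo, the helper isOnCampusCsMajor, and the sample codes are assumptions for illustration.

```java
class StudentCodeDemo {
    // True if the whole code matches: 5, then 1-3, then 1-7.
    static boolean isOnCampusCsMajor(String code) {
        return code.matches("5[123][1-7]");
    }

    public static void main(String[] args) {
        System.out.println(isOnCampusCsMajor("537"));   // prints true
        System.out.println(isOnCampusCsMajor("528"));   // prints false (8 means off-campus)
        System.out.println(isOnCampusCsMajor("437"));   // prints false (major is not 5)
        System.out.println(isOnCampusCsMajor("5371"));  // prints false (the whole string must match)
    }
}
```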
Pattern Matching and Regular Expression • The pattern is called a regular expression that allows us to denote a large set of “words” (any sequence of symbols) succinctly. • Brackets [ ] represent choices, so [abc] means a, b, or c. • For example, the definition for a valid Java identifier may be stated as [a-zA-Z][a-zA-Z0-9_$]*
Regular Expressions • Rules • The brackets [ ] represent choices for a single character. • The asterisk symbol * means zero or more occurrences. • The plus symbol + means one or more occurrences. • The hat symbol ^, placed first inside brackets, means negation. • The hyphen - inside brackets means a range. • The parentheses ( ) and the vertical bar | express choices among multi-character sequences.
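Each of the single-character rules above can be checked with String.matches, as in this small sketch. The class RegexRulesDemo and the helper isAllDigits are illustrative names.

```java
class RegexRulesDemo {
    // True if s consists of one or more decimal digits.
    static boolean isAllDigits(String s) {
        return s.matches("[0-9]+");
    }

    public static void main(String[] args) {
        System.out.println("aaa".matches("a*"));     // prints true  (* allows many)
        System.out.println("".matches("a*"));        // prints true  (* allows zero)
        System.out.println("".matches("a+"));        // prints false (+ needs at least one)
        System.out.println("x".matches("[^0-9]"));   // prints true  (^ negates the choices)
        System.out.println("7".matches("[^0-9]"));   // prints false
        System.out.println("q".matches("[a-z]"));    // prints true  (- gives a range)
        System.out.println(isAllDigits("2024"));     // prints true
        System.out.println(isAllDigits("20x4"));     // prints false
    }
}
```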
Pattern Matching and Regular Expression • The matches method from the String class is similar to the equals method. • However, unlike equals, the argument to matches can be a pattern.
Pattern Matching and Regular Expression • The period symbol (.) is used to match any character except a line terminator (\n or \r).
    String document;
    document = ...; //assign text to 'document'
    if (document.matches(".*zen of objects.*")) {
        System.out.println("Found");
    } else {
        System.out.println("Not found");
    }
Pattern Matching and Regular Expression • Brackets ([ ]) are used for expressing a range of choices for a given character. • To express a range of choices for multiple characters, use parentheses and the vertical bar.
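The contrast between brackets and parentheses with the vertical bar can be sketched as follows. The code format used here (a two-letter prefix AX, BY, or CZ followed by two digits) is a made-up example, as are the class and method names.

```java
class AlternationDemo {
    // Accepts only the exact prefixes AX, BY, or CZ, followed by two digits.
    static boolean hasKnownPrefix(String code) {
        return code.matches("(AX|BY|CZ)[0-9][0-9]");
    }

    // Accepts any combination of one letter from ABC and one from XYZ.
    static boolean hasAnyLetterPair(String code) {
        return code.matches("[ABC][XYZ][0-9][0-9]");
    }

    public static void main(String[] args) {
        System.out.println(hasKnownPrefix("AX12"));    // prints true
        System.out.println(hasKnownPrefix("BY07"));    // prints true
        System.out.println(hasKnownPrefix("AY12"));    // prints false (AY is not one of the choices)
        System.out.println(hasAnyLetterPair("AY12"));  // prints true  (each letter is chosen independently)
    }
}
```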
Pattern Matching and Regular Expression • The replaceAll method is new to the Version 1.4 String class. • This method allows us to replace all occurrences of a substring that matches a given regular expression with a given replacement string.
Pattern Matching and Regular Expression • For example, to replace all vowels in a string with the @ symbol:
    String originalText, modifiedText;
    originalText = ...; //assign string to 'originalText'
    modifiedText = originalText.replaceAll("[aeiou]", "@");
• Note that this method does not change the original text; it simply returns a modified text as a separate string.
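A runnable sketch of the vowel replacement, with a sample string filled in purely for illustration. The names ReplaceVowelsDemo and maskVowels are assumptions.

```java
class ReplaceVowelsDemo {
    // Returns a copy of text with every lowercase vowel replaced by @.
    static String maskVowels(String text) {
        return text.replaceAll("[aeiou]", "@");
    }

    public static void main(String[] args) {
        String originalText = "regular expression";   // sample value
        String modifiedText = maskVowels(originalText);

        System.out.println(modifiedText);   // prints r@g@l@r @xpr@ss@@n
        System.out.println(originalText);   // prints regular expression (unchanged)
    }
}
```

The pattern [aeiou] lists lowercase vowels only, so an uppercase vowel would be left as it is unless the pattern is widened to [aeiouAEIOU].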
Pattern Matching and Regular Expression • To match a whole word, use the \b symbol to designate the word boundary. str.replaceAll("\\btemp\\b", "temporary"); • Two backslashes are necessary because we must write the expression as a Java string literal. Doubling the backslash prevents the compiler from interpreting \b as a string escape sequence (the backspace character) before the regular expression is ever seen.
Pattern Matching and Regular Expression • The backslash is also used to search for a character that has a special meaning in regular expressions. For example: • To search for the plus symbol (+) in text, we write the regular expression \+. • To express it as a string, we write "\\+".
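Both uses of the doubled backslash can be sketched together. The sample strings and the names BackslashDemo, expandTemp, and spellOutPlus are hypothetical.

```java
class BackslashDemo {
    // Replaces the whole word "temp" only; \b marks a word boundary.
    static String expandTemp(String s) {
        return s.replaceAll("\\btemp\\b", "temporary");
    }

    // Replaces each literal plus sign; \+ removes the special meaning of +.
    static String spellOutPlus(String s) {
        return s.replaceAll("\\+", " plus ");
    }

    public static void main(String[] args) {
        System.out.println(expandTemp("temp file in temple"));  // prints temporary file in temple
        System.out.println(spellOutPlus("1+2"));                // prints 1 plus 2
    }
}
```

The word "temple" is left alone because the letters "temp" inside it are followed by another letter, so no word boundary is found there.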
The Pattern and Matcher Classes • The matches and replaceAll methods of the String class are shorthand for using the Pattern and Matcher classes from the java.util.regex package.
The Pattern and Matcher Classes • If str and regex are String objects, then both str.matches(regex); and Pattern.matches(regex, str); are equivalent to Pattern pattern = Pattern.compile(regex); Matcher matcher = pattern.matcher(str); matcher.matches();
The Pattern and Matcher Classes • Creating Pattern and Matcher objects gives us more options and efficiency. • The compile method of the Pattern class converts the stated regular expression to an internal format to carry out the pattern-matching operation. • This conversion is carried out every time the matches method of the String or Pattern class is executed. • Compiling the pattern once and reusing the Pattern object avoids repeating this work.
The Pattern and Matcher Classes
    /*
        Chapter 9 Sample Program: Checks whether the input
        string is a valid identifier. This version
        uses the Matcher and Pattern classes.
        File: Ch9MatchJavaIdentifier2.java
    */
    import javax.swing.*;
    import java.util.regex.*;

    class Ch9MatchJavaIdentifier2 {
        private static final String STOP = "STOP";
        private static final String VALID = "Valid Java identifier";
        private static final String INVALID = "Not a valid Java identifier";
        private static final String VALID_IDENTIFIER_PATTERN
                                        = "[a-zA-Z][a-zA-Z0-9_$]*";

        public static void main(String[] args) {
            String str, reply;
            Matcher matcher;
            // Compile the pattern once, outside the loop
            Pattern pattern = Pattern.compile(VALID_IDENTIFIER_PATTERN);
            while (true) {
                str = JOptionPane.showInputDialog(null, "Identifier:");
                if (str.equals(STOP)) break;
                matcher = pattern.matcher(str);
                if (matcher.matches()) {
                    reply = VALID;
                } else {
                    reply = INVALID;
                }
                JOptionPane.showMessageDialog(null, str + ":\n" + reply);
            } // ends loop
        } // ends main
    } // ends class
The Pattern and Matcher Classes • The find method is another powerful method of the Matcher class. • The method searches for the next sequence in a string that matches the pattern, and returns true if the pattern is found.
The Pattern and Matcher Classes • When a matcher finds a matching sequence of characters, we can query the location of the sequence by using the start and end methods.
The Pattern and Matcher Classes • The start method returns the position in the string where the first character of the pattern is found. • The end method returns the value 1 more than the position in the string where the last character of the pattern is found.
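The find, start, and end methods are typically used together in a loop, as in this minimal sketch. The class FindDemo, the helper locate, and the sample sentence are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // Returns "start-end" for every match of regex in text, in order of occurrence.
    static List<String> locate(String regex, String text) {
        List<String> spans = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {   // each call advances to the next match
            spans.add(matcher.start() + "-" + matcher.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "I love java and java loves me";
        System.out.println(locate("java", text));   // prints [7-11, 16-20]
    }
}
```

Because end returns one more than the position of the last matched character, text.substring(matcher.start(), matcher.end()) would give back exactly the matched sequence.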
The String Class is Immutable • In Java a String object is immutable. • This means once a String object is created, it cannot be changed, such as by replacing a character with another character or removing a character. • The String methods we have used so far do not change the original string. They create a new string from the original. For example, substring creates a new string from a given string. • The String class is defined in this manner for efficiency reasons.
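Immutability can be observed with a short sketch like the one below: every method call returns a new string and the original stays as it was. The class ImmutableDemo and the helper middle are illustrative names.

```java
class ImmutableDemo {
    // Returns the characters of s from index 1 up to, but not including, index 4.
    static String middle(String s) {
        return s.substring(1, 4);
    }

    public static void main(String[] args) {
        String original = "Hello";
        String upper = original.toUpperCase();
        String part = middle(original);
        String replaced = original.replace('l', 'L');

        System.out.println(upper);      // prints HELLO
        System.out.println(part);       // prints ell
        System.out.println(replaced);   // prints HeLLo
        System.out.println(original);   // prints Hello (never modified)
    }
}
```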