Introducing the STL: Common Types, Basic I/O

Fred Kuhns, Computer Science and Engineering, Applied Research Laboratory, Washington University in St. Louis.

  1. Introducing the STL: Common Types, Basic I/O. Fred Kuhns, Computer Science and Engineering, Applied Research Laboratory, Washington University in St. Louis

  2. Introduction to the STL
     • Namespace std and using declarations
       • Do not place using declarations in your header files; users may not want to pollute their namespaces.
       • Examples:
           using std::cout;
           using namespace std;
     • Common types: vectors and strings provide convenient, variable-length storage classes.
       • Prefer these two classes over arrays or C-style strings.
           #include <string>
           using std::string;
           #include <vector>
           using std::vector;
     • bitset: generalizes bit-vectors.
     • Iterators: generalize pointer arithmetic and iterating through collections of objects.
     CSE332: Object Oriented Software Development Laboratory
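
As a small sketch of the advice on this slide, a source file (not a header) can name exactly what it uses from std and keep variable-length data in string and vector instead of raw arrays. The helper name joinWords is hypothetical and does not come from the slides.

```cpp
#include <string>
#include <vector>

// Using declarations belong in source files or function bodies, not in headers.
using std::string;
using std::vector;

// Hypothetical helper: joins words with a separator.
// No fixed-size buffers are needed; the string grows as required.
string joinWords(const vector<string>& words, const string& sep) {
    string out;
    for (vector<string>::size_type i = 0; i < words.size(); ++i) {
        if (i > 0) out += sep;   // separator goes between words only
        out += words[i];
    }
    return out;
}
```

Called with the words "Smith" and "John" and the separator ",", this sketch returns "Smith,John".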

  3. Working Example
     • As software developers we frequently find ourselves needing to parse strings.
     • We focus on character strings using the STL string class, but the techniques have broader applicability.
     • Common scenario
       • A file contains tabulated data that we must read and process, storing the results (perhaps within the same file).
     • A popular file format is CSV, or Comma Separated Values.
       • Commas separate fields; newlines '\n' separate records.
       • Alternatively, tabs may be used as field separators (with newlines serving as record separators). We can call this TSV, though one often just sees CSV used to represent the various schemes.
     • Our example uses commas ',' simply because the field separator is easier to identify than if we used a tab.

  4. CSV
         # Registration table.
         # Leading and trailing whitespace is not significant.
         # Entry format (per registered person):
         # Last<fs>First<fs>MI<fs>ID<fs>Email<fs>Comments<RS>
         Smith , John , M , 1001 , john@someplace.com, needs receipt
         Jackson , Mary , I , 2010 , mary@thatplace.edu,
         Mitchel , Mark , L ,4000 ,mm@candy.com ,must call
         Hicks,,,2110 , ,must get missing information
     • It is convenient to think of the file as a two-dimensional array where the rows define individual records and the columns define fields.
     • It is important to be specific about assumptions concerning the data format, in particular whether a field or record separator can appear within a field. You may decide to define an escape sequence in order to include control chars within fields.
     • Don't assume that all fields will have values, nor that the proper number of field separators is present, especially if people are permitted to edit the file.
     • The first row may contain a header which defines column (field) names.
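
One possible way to honour the '#' comment convention used in the sample file is to test each line before parsing it. This is a minimal sketch: the name isCommentOrBlank is hypothetical, and treating blank lines as skippable is an assumption the slide does not state.

```cpp
#include <string>

// Hypothetical helper: true if the line holds no record, i.e. it is empty,
// all whitespace, or its first non-whitespace character is '#'.
bool isCommentOrBlank(const std::string& line) {
    std::string::size_type first = line.find_first_not_of(" \t\r\n");
    return first == std::string::npos || line[first] == '#';
}
```

A reader loop could call this on every line and only hand the remaining lines to the field splitter.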

  5. Accessing the file
     • C++ gives you access to the C and C++ standard libraries for I/O.
     • Assume you need both an input and an output file:
         char ch;
         ifstream fin("data.csv");
         if (!fin) { cerr << "open fin failed"; return -1; }
         ofstream fout("result.csv");
         if (!fout) { fin.close(); cerr << "open fout failed"; return -1; }
         while (fin.get(ch)) { … fout.put(ch); }
         if (!fin.eof() || !fout) { cerr << "File IO error"; exit(1); }
     • You may explicitly open a file: fin.open("filename");
     • The stream destructor closes the file, or you may explicitly close it: fin.close();

  6. IO Library Class Hierarchy
     • istream (input interface)
       • ifstream (read from file)
       • istringstream (read from string)
     • ostream (output interface)
       • ofstream (write to file)
       • ostringstream (write to string)
     • iostream (console), derived from both istream and ostream
       • fstream (file I/O)
       • stringstream (string I/O)

  7. Operations
     • Stream objects have state:
         good() : next operation expected to succeed
         eof()  : end of file (input) reached
         fail() : next operation will fail
         bad()  : corrupted stream
     • An operation on a stream that is not in a good state is a null op.
     • bool operator!() const on a stream returns fail().
     • operator void*() const returns (fail() ? 0 : -1), i.e. a null pointer if the stream has failed and a non-null pointer otherwise.
     • Char-oriented I/O uses get, put, read, write, getline and the operators << and >>.
       • get(char*, …) does not remove '\n'.
       • getline(char*, …) does remove the trailing newline.
       • Can also use the non-member function getline, which takes a string.
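
The state functions and the get/getline difference can be observed with a string stream. This is a minimal sketch; the names readInts and nextAfterGet are hypothetical, and the 64-byte buffer is an arbitrary choice.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Reads whitespace-separated ints until extraction fails.
// Returns true if the input simply ran out, false if bad data stopped the loop.
bool readInts(const std::string& text, std::vector<int>& values) {
    std::istringstream in(text);
    int v;
    while (in >> v) {   // the stream tests as true while fail() is false
        values.push_back(v);
    }
    // fail() is true in both cases here; eof() tells them apart.
    return in.eof();
}

// Returns the next unread character after reading the first line into a buffer.
char nextAfterGet(const std::string& text, bool useGetline) {
    std::istringstream in(text);
    char buf[64];
    if (useGetline) {
        in.getline(buf, sizeof buf);   // extracts and discards the '\n'
    } else {
        in.get(buf, sizeof buf);       // stops before the '\n' and leaves it in the stream
    }
    return static_cast<char>(in.peek());
}
```

For the input "1 2 x 3", readInts stores two values and returns false, because the stream failed on 'x' before reaching the end of the input.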

  8. Reading CSV
     • Questions:
       • Is it OK to add fl to the vector records?
       • Does the line read retain all whitespace?
         ifstream fin(argv[1]);
         string line;
         vector<string> lines;
         vector<FieldList> records;
         while (getline(fin, line)) {
           lines.push_back(line); // example of using vectors
           FieldList fl(line);
           records.push_back(fl);
         }

  9. Reading the fields, one of many ways
         string fname(defaultRoster), line;
         char FS = '\t', RS = '\n';
         vector<string> record;           // fields of the current record
         vector<vector<string> > roster;  // all records read so far
         std::ifstream fin(fname.c_str());
         if (!fin) { cerr << "Bad fopen\n"; return 1; }
         while (getline(fin, line, RS)) {
           string field;
           std::istringstream str(line);
           while (getline(str, field, FS))
             record.push_back(field);
           roster.push_back(record);
           record.clear();
         }
     • To solve this, consider the edge cases.
     • Make sure you explicitly address each case.
     • Draw a picture.
     • Do you allow comments?
     • What about quoted text with embedded field separators?
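
One edge case worth checking in the getline-based approach: after a trailing separator there is nothing left to extract, so the empty last field is silently dropped. The sketch below (the name splitFields is hypothetical) adds that field back explicitly. It still keeps whitespace inside fields, and it returns no fields at all for an empty line, which is a design choice rather than anything the slides prescribe.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits line on the separator fs using getline on a string stream.
std::vector<std::string> splitFields(const std::string& line, char fs) {
    std::vector<std::string> fields;
    std::istringstream str(line);
    std::string field;
    while (std::getline(str, field, fs)) {
        fields.push_back(field);
    }
    // getline reports nothing after a trailing separator,
    // so add the empty last field by hand.
    if (!line.empty() && line[line.size() - 1] == fs) {
        fields.push_back("");
    }
    return fields;
}
```

With this adjustment "a,b," yields three fields, the last one empty, matching the way "a,,c" already yields an empty middle field.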

  10. A second approach
         // rec holds one record, e.g. a line read with getline.
         vector<string> record;
         string fld;
         string WS = " \t\n", FS = ",";
         string::size_type indx, lastIndx, tmp, end = rec.size();
         for (indx = 0; indx <= end; indx = lastIndx + 1) {
           // begin by skipping any leading whitespace
           if ((indx = rec.find_first_not_of(WS, indx)) == string::npos) {
             // end of record
             fld = "";
             lastIndx = end;
           } else if (FS.find_first_of(rec[indx]) != string::npos) {
             // rec[indx] is a field separator (i.e. rec[indx] == ','), so the field is empty
             fld = "";
             lastIndx = indx;
           } else {
             // non-empty field
             if ((lastIndx = rec.find_first_of(FS, indx)) == string::npos)
               lastIndx = end;
             // remove trailing whitespace, searching backwards from index lastIndx-1
             tmp = rec.find_last_not_of(WS, lastIndx-1) + 1;
             fld = rec.substr(indx, tmp - indx);
             //               ^start ^length
           }
           record.push_back(fld);
         }
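
The whitespace handling in this approach rests on find_first_not_of and find_last_not_of. Isolated from the field loop, the same idea can be sketched as a trim helper; the name trim and the default whitespace set are assumptions made for illustration.

```cpp
#include <string>

// Returns s without leading and trailing characters from ws.
std::string trim(const std::string& s, const std::string& ws = " \t\n") {
    std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";        // empty or all whitespace
    std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);         // second argument is a length
}
```

Whitespace inside the field is kept, so a comment such as "needs receipt" survives intact.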

  11. And another way
         typedef string fieldType;
         char RS = '\n';
         char FS = ',';
         typedef vector<string> recordType;
         typedef vector<recordType> rosterType;
         rosterType records;
         recordType rec;
         string line;
         while (getline(fin, line, RS)) {
           std::istringstream str(line);
           char ch;
           fieldType field;
           while (str.get(ch)) {
             if (ch == FS) { // end of field
               rec.push_back(field);
               field.clear(); // field = "";
               continue;
             }
             if (isspace(static_cast<unsigned char>(ch))) continue; // for now no spaces allowed!
             field += ch;
           }
           rec.push_back(field);   // the last field has no trailing separator
           records.push_back(rec);
           rec.clear();            // start the next record empty
         }
         for (rosterType::const_iterator sit = records.begin(); sit != records.end(); ++sit) {
           for (recordType::const_iterator fit = sit->begin(); fit != sit->end(); ++fit) {
             cout << *fit << " : ";
           }
           cout << endl;
         }
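
The character-at-a-time loop extends naturally to one of the edge cases raised earlier: quoted text with embedded field separators. The sketch below is one possible treatment, not the slides' method. It assumes double quotes delimit quoted text, drops the quote characters, and does not handle escaped quotes. The name splitQuoted is hypothetical.

```cpp
#include <string>
#include <vector>

// Splits line on fs, treating separators inside double quotes as ordinary text.
std::vector<std::string> splitQuoted(const std::string& line, char fs) {
    std::vector<std::string> fields;
    std::string field;
    bool inQuotes = false;
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        char ch = line[i];
        if (ch == '"') {
            inQuotes = !inQuotes;      // quote marks toggle the state and are dropped
        } else if (ch == fs && !inQuotes) {
            fields.push_back(field);   // end of field
            field.clear();
        } else {
            field += ch;               // ordinary character, or a separator inside quotes
        }
    }
    fields.push_back(field);           // the last field has no trailing separator
    return fields;
}
```

For the line a,"b,c",d this produces three fields, the second being b,c.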

  12. Simple Examples
     • You can use the find family of string member functions to split up this line:
       • exact match: find()
       • find at least one: find_first_of(), find_first_not_of(), find_last_of(), find_last_not_of()
     [Figure: the record as it appears in the file, a, b, c\n, and the string representation of the record after a call to getline(fin, line), shown as a table of characters against indices 0 through 7; line.size() == 8.]
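
As a minimal sketch of splitting with find(), the hypothetical helper nthField below walks from comma to comma to locate one field. It hard-codes ',' as the separator and does not trim whitespace; both are simplifications for illustration.

```cpp
#include <string>

// Returns field number n (0-based) of a comma-separated line,
// or an empty string if the line has fewer than n+1 fields.
std::string nthField(const std::string& line, std::string::size_type n) {
    std::string::size_type start = 0;
    for (std::string::size_type i = 0; i < n; ++i) {
        std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos) return "";   // ran out of fields
        start = comma + 1;                           // next field begins after the comma
    }
    std::string::size_type stop = line.find(',', start);
    if (stop == std::string::npos) stop = line.size();
    return line.substr(start, stop - start);
}
```

For the string "a, b, c", line.find(',') is 1, line.find_first_not_of(", ", 1) is 3, and nthField(line, 1) returns " b" with its leading space.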

  13. strings
     • Defining and initializing:
         string s1, s2(s1), s3("Three"), s4(5, 'a');
     • strings can be read from and written to iostreams.
       • cin >> s: whitespace delimits the returned string values. Discards leading whitespace, then reads chars until the next whitespace char.
           string s("text");
           cin >> s;
           cout << s << endl;
       • Can read an entire line (excluding the newline):
           string line;
           getline(cin, line); // returns cin
     • Operations
       • s.empty()
       • s.size() : returns string::size_type
       • s[(string::size_type)n] : yields an lvalue. Does not perform range checking.
       • + : concatenation
       • = : copy chars from rhs to lhs string
       • == : strings must be the same size and contain the same chars.
       • !=, <, <=, >, >=
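
Two of the points above can be sketched briefly: operator>> splits input on whitespace, and operator[] needs a caller-side range check. The names countWords and charAtOr are hypothetical, and a string stream stands in for cin so the behaviour is easy to reproduce.

```cpp
#include <sstream>
#include <string>

// Counts whitespace-delimited words, the way repeated "cin >> s" would see them.
std::string::size_type countWords(const std::string& text) {
    std::istringstream in(text);
    std::string word;
    std::string::size_type n = 0;
    while (in >> word) ++n;   // >> skips leading whitespace, stops at the next whitespace
    return n;
}

// Returns s[n] if n is in range, otherwise the fallback character.
char charAtOr(const std::string& s, std::string::size_type n, char fallback) {
    return n < s.size() ? s[n] : fallback;   // s[n] itself performs no range check
}
```

The string member function at() is the checked alternative to operator[]; it throws std::out_of_range for an invalid index.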

  14. char processing
     • #include <cctype>
         isalnum(c), isalpha(c), isdigit(c), isxdigit(c),
         isspace(c), ispunct(c), isupper(c), islower(c),
         iscntrl(c), isgraph(c), isprint(c),
         tolower(c), toupper(c)
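
A short sketch of these functions applied to a string follows. The helper names toUpperCopy and countDigits are hypothetical. The cast to unsigned char matters because the <cctype> functions require an argument that is representable as unsigned char (or EOF), and a plain char may be negative.

```cpp
#include <cctype>
#include <string>

// Returns a copy of s with every character converted to upper case.
std::string toUpperCopy(std::string s) {
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
    }
    return s;
}

// Counts the decimal digits in s.
std::string::size_type countDigits(const std::string& s) {
    std::string::size_type n = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (std::isdigit(static_cast<unsigned char>(s[i]))) ++n;
    }
    return n;
}
```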

  15. vectors
     • A vector is not exactly a type; it is a template that can be used to generate new types. It requires a type parameter to specify the element type.
         vector<string> lines;
     • Initialization:
         vector<int> counts(20, -1);
         vector<int> more(counts);
         vector<int> some(25); // value initializer
     • Value initializer: the library creates the initializer element for us.
       • For built-in types 0 is used.
       • For user-defined types the default constructor is used.
       • If no constructor is defined, then each member of the object is value initialized.
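
The value-initialization rules can be checked with a small sketch. The type Entry and the functions makeInts and makeEntries are hypothetical names introduced only for this illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical user-defined type with a default constructor.
struct Entry {
    std::string name;
    int id;
    Entry() : name("unknown"), id(-1) {}
};

// n ints, each value initialized to 0.
std::vector<int> makeInts(std::vector<int>::size_type n) {
    return std::vector<int>(n);
}

// n entries, each built by Entry's default constructor.
std::vector<Entry> makeEntries(std::vector<Entry>::size_type n) {
    return std::vector<Entry>(n);
}
```

Every element of makeInts(25) is 0, and every element of makeEntries(3) has the name "unknown" and the id -1.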

  16. Operations on Vectors
     • Operations:
         x.empty()      : returns true if empty
         x.size()       : number of elements, returns vector<X>::size_type
         x.push_back(t) : adds element t to tail
         x[n]           : element at index n, yields an lvalue
         x = y          : replaces elements in x with copies of elements in y
         x == y         : true if they are equal
         !=, <, <=, >, >=
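
A minimal sketch exercising these operations follows; the function name squares is hypothetical. The relational operators compare vectors element by element (lexicographically), which the usage note relies on.

```cpp
#include <vector>

// Builds the vector 0, 1, 4, ..., (n-1)*(n-1) with push_back.
std::vector<int> squares(int n) {
    std::vector<int> v;
    for (int i = 0; i < n; ++i) {
        v.push_back(i * i);   // grows the vector by one element at the tail
    }
    return v;
}
```

If x is squares(4) and y is a copy of x, then x == y holds; after y[0] = 5 the copy differs and x < y, because the first elements already decide the comparison.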

  17. Iterators
     • Used for indexing through containers; we will keep it simple for now. There is a first element and one past the end (i.e. the sentinel element).
         vector<int> vec;
         … fill vec …;
         vector<int>::iterator iter;
         for (iter = vec.begin(); iter != vec.end(); ++iter)
           cout << *iter << endl;
     • There is also a const_iterator, which should be used on const objects:
         vector<int> vec;
         … fill vec …;
         vector<int>::const_iterator iter;
         for (iter = vec.begin(); iter != vec.end(); ++iter)
           cout << *iter << endl;
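
The difference between the two iterator types shows up when a function takes its vector by const reference. In this sketch (sum and doubleAll are hypothetical names), reading through a const vector requires const_iterator, while modifying elements in place requires a plain iterator.

```cpp
#include <vector>

// Reads every element; the vector is const, so a const_iterator is required.
int sum(const std::vector<int>& vec) {
    int total = 0;
    for (std::vector<int>::const_iterator it = vec.begin(); it != vec.end(); ++it) {
        total += *it;
    }
    return total;
}

// Modifies every element in place through a non-const iterator.
void doubleAll(std::vector<int>& vec) {
    for (std::vector<int>::iterator it = vec.begin(); it != vec.end(); ++it) {
        *it *= 2;
    }
}
```

For an empty vector begin() equals end(), so both loops run zero times and sum returns 0.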

