C-style strings
P.J. Drongowski
15 October 2004 (Revised: 1 March 2005)

Why C-style strings?

  * C++ provides a very comprehensive string library (#include <string>)
  * A large body of existing C++ code uses C-style strings
  * Plus, knowledge of C-style strings gives insight into the
    representation of arrays, pointers, etc.

What is a C-style string?

  * A C-style string is a sequence of characters terminated by a null
    character ('\0')
  * A C-style string is stored in a one dimensional array of characters:
        char name[128] ;
  * The length of a string is the number of characters from the beginning
    of the string up to (but not including) the null character
  * The length is not the same as the size of the array in which the
    string is stored
  * Because a C-style string is stored in an array, it is conventional
    practice to refer to the string by a pointer to the first character
    in the array (string)

C-style library routines

  * C-style strings are manipulated using a different library
    (#include <cstring>)
  * A programmer should always use the library routines rather than
    writing equivalent loops by hand
  * The library routines are usually optimized for the host platform
    (instruction set, compiler, etc.)

  strcat(s1,s2)       Append s2 to the end of s1
  strncat(s1,s2,n)    Append at most n characters of s2 to s1
  strcpy(s1,s2)       Copy s2 to s1; Note the direction of the copy!
  strncpy(s1,s2,n)    Copy at most n characters of s2 to s1; s1 is not
                      null-terminated if s2 has n or more characters
  strlen(s)           Return length of string s, not counting '\0'
  strcmp(s1,s2)       Compare s1 with s2; Return integer less than zero,
                      equal to zero, or greater than zero
  strncmp(s1,s2,n)    Compare at most the first n characters of s1 and s2
  strchr(s,c)         Return a pointer to the first occurrence of
                      character c in string s; return zero (a null
                      pointer) if not found
  strrchr(s,c)        Return a pointer to the last occurrence of
                      character c in string s; return zero (a null
                      pointer) if not found
  strstr(s1,s2)       Return a pointer to the first occurrence of string
                      s2 in string s1; return zero if not found
  strpbrk(s1,s2)      Return a pointer to the first occurrence in string
                      s1 of any character contained in string s2; return
                      zero if not found
  strspn(s1,s2)       Return number of characters at the start of s1
                      before the first character not in string s2
  strcspn(s1,s2)      Return number of characters at the start of s1
                      before the first character in string s2

********************************************
Command line arguments
********************************************

int main(int argc, char *argv[])
{
    ...
}

  * The operating system and C++ runtime environment set up the arguments
    to the function main
  * By convention, the arguments are called "argc" and "argv"
      + argc: Number of command line arguments including the program name
      + argv: Contains pointers to the individual arguments
  * The individual arguments are represented as strings; argv is a one
    dimensional array of char pointers where each pointer points to an
    individual argument
  * Example: g++ -c file.cpp
        argc = 3
        argv[0] -> "g++"
        argv[1] -> "-c"
        argv[2] -> "file.cpp"
  * UNIX conventions
      + Options: '-' followed by a single character
      + An option may take a parameter which follows the option character
      + Options usually appear before other arguments on command line,
        but this is not always required
  * The main function usually processes the command line
  * Idiom (from Practical C++ Programming)

    while (argc > 1) {
        if (argv[1][0] == '-') {
            switch (argv[1][1]) {
                case 'v': {
                    ...
                    break ;
                }
                case 'u': {
                    ...
                    break ;
                }
                default: {
                    cerr << "Unknown option: " << argv[1][1] << endl ;
                    break ;
                }
            }
        }
        ++argv ;
        --argc ;
    }

  * On UNIX systems, the C library includes a routine called "getopt"
    which assists command line processing (See "man 3 getopt")
  * There is also a C++ version of the GNU getopt function called "GetOpt"

********************************************
Using integer indices to copy a string
********************************************

void string_copy(char from[], char to[])
{
    int i ;

    for (i = 0 ; from[i] != '\0' ; i++) {
        to[i] = from[i] ;
    }
    to[i] = '\0' ;
}

string_copy:
        pushl %ebp              // Prelude
        movl %esp,%ebp
        subl $24,%esp
        movl $0,-4(%ebp)        // i = 0
loop:
        movl 8(%ebp),%eax       // Move address of "from" to eax register
        movl -4(%ebp),%edx      // Move i to edx register
        addl %edx,%eax          // Compute address of from[i]
        cmpb $0,(%eax)          // Compare from[i] with '\0'
        jne body                // Jump to body if from[i] != '\0'
        jmp done                // Jump out of loop if from[i] == '\0'
body:
        movl 12(%ebp),%eax      // Move address of "to" to eax register
        movl -4(%ebp),%edx      // Move i to edx register
        addl %edx,%eax          // Compute address of to[i]
        movl 8(%ebp),%edx       // Move address of "from" to edx register
        movl -4(%ebp),%ecx      // Move i to ecx register
        addl %ecx,%edx          // Compute address of from[i]
        movb (%edx),%cl         // Read from[i] into register cl
        movb %cl,(%eax)         // Write character in register cl to to[i]
        incl -4(%ebp)           // i++
        jmp loop                // Jump to top of loop
done:
        movl 12(%ebp),%eax      // Move address of "to" to eax register
        movl -4(%ebp),%edx      // Move i to edx register
        addl %edx,%eax          // Compute address of to[i]
        movb $0,(%eax)          // Assign '\0' to to[i]
        movl %ebp,%esp          // Postlude
        popl %ebp
        ret

********************************************
Using char pointers to copy a string
********************************************

void string_copy(char *from, char *to)
{
    while(*to++ = *from++) ;
}

string_copy:
        pushl %ebp              // Prelude
        movl %esp,%ebp
loop:
        movl 12(%ebp),%eax      // Move "to" pointer into register EAX
        movl 8(%ebp),%edx       // Move "from" pointer into register EDX
        movb (%edx),%cl         // Move memory[EDX] into register CL
        movb %cl,(%eax)         // Move register CL into memory[EAX]
        incl 8(%ebp)            // Increment "from" pointer
        incl 12(%ebp)           // Increment "to" pointer
        testb %cl,%cl           // Test character
        jne loop                // Repeat loop if not equal to zero
        movl %ebp,%esp          // Postlude
        popl %ebp
        ret