Simple pointers and arrays
P.J. Drongowski
15 October 2004

C/C++ are system programming languages

  * A system programming language affords relatively complete access to
    the underlying machine
  * It's important to understand how certain C/C++ features map onto the
    machine in order to fully exploit the capabilities of the language
  * Because access is complete and unfettered, C/C++ are dangerous
    languages

Data items

  * Kinds of data items
      + Primitive: char, short int, int, long int, float, double
      + Arrays: one dimensional, two dimensional, ...
      + Structured: object, struct
  * All data items share common characteristics
      + They reside in memory
      + They occupy space (one or more contiguous bytes)
      + They are addressable
          - There is a "base address" for the whole item
          - There are offsets from the base address to find subitems /
            members
  * Example: A simple integer
      + On x86, an integer occupies 4 contiguous bytes
      + The integer is located at a particular memory address
  * Example: A one dimensional array of N integers, where N is the array
    size
      + On x86, the array is 4*N contiguous bytes of storage
      + The first integer in the array is at the base address
      + The ith integer, a[i], is at (base address)+4*i
  * Example: An object
      + The size and layout of the object depend upon the declared data
        members
      + Let's say the object has the data members:
          - Two integers, x and y
          - A one dimensional array of 16 characters, s
      + The object is "laid down" in memory as:
          - Two integers of 4 bytes each, followed by
          - Sixteen contiguous bytes
      + The base address of the object is the address of the first item,
        namely, the integer x
      + The integer y is located at (object base address)+4
      + The character array starts at (object base address)+8
      + The ith character of the array, s[i], is at
        (object base address)+8+i

C++ gives us 2.5 ways to refer to a data item

  * By name, using a declared variable
  * "Anonymously," using a pointer (OK, and references, too)

Declared variables

  * When we declare a variable, we declare a symbolic
    name by which we can refer to its value
  * The compiler needs to know the type of the variable
      + To perform type checking
      + To figure out how many bytes to allocate for the variable's value
      + To choose machine instructions to manipulate data of that type
        (e.g., machines have different instructions for integer and
        floating point arithmetic)
  * The compiler decides where to put the contents in memory (i.e., where
    to store the value)
  * The compiler remembers the name-address association and replaces the
    name with the address when generating code

Pointer

  * A pointer refers to a data item in memory
  * A pointer is (generally) the address of a data item
  * A pointer variable holds a pointer's value, that is, the address of a
    data item
  * A pointer is declared by putting an asterisk '*' in front of the
    variable name in a declaration:

        int item, v ;
        int *pointer_to_int ;

  * The "address of" operator '&' returns the address of a data item:

        pointer_to_int = &item ;

    "pointer_to_int" now refers to the same integer (the same memory
    location) as the integer variable "item."
  * The "dereference" operator '*' obtains the value referenced by a
    pointer:

        v = *pointer_to_int ;

    The expression on the right-hand side returns the integer at the
    selected memory location

==> If you have trouble understanding these concepts, try writing some
    simple C++ programs to display the values of addresses, etc.

        cout << "Value of item: " << item << endl ;
        cout << "Address of item: " << &item << endl ;
        cout << "pointer_to_int: " << pointer_to_int << endl ;
        cout << "*pointer_to_int: " << *pointer_to_int << endl ;

Some common questions

  * If a pointer is an address, why does it need a type?
      + You would like the compiler to perform type checking, right?
      + The compiler needs to know the type in order to generate the
        appropriate machine instruction to dereference the pointer
      + The type also tells the compiler how far to move the pointer in
        pointer arithmetic (ptr+1 advances by the size of the pointed-to
        type)
  * Can I have a pointer to a pointer?
      + Yes, this is sometimes a useful device
      + This can be confusing, so draw a picture of who points to what!
  * Can two pointers point to the same data item?
      + Yes, why not?
      + Multiple pointers to the same data item create a situation called
        "aliasing," which is a problem for compilers
      + A compiler often cannot tell whether two or more pointers refer to
        the same data item
      + Aliasing therefore inhibits code optimization
  * Are pointers dangerous?
      + With power, there is always hazard
      + If you have an address, you can touch and overwrite any memory
        location in the program's address space at will
      + Malicious code can use pointers to destroy data, act as Trojan
        horses, etc.
      + Java does not provide explicit pointers, closing such security
        holes
  * What happens when a pointer refers to memory that is not part of the
    program's address space?
      + How is a program's memory space organized by the operating system?
      + Dereferencing the pointer causes a memory access violation
        (segmentation fault)

Pointer operators

    Operator  Purpose
    --------  -----------------------------------------------------
       *      Dereference (return the value of the referenced item)
       &      Address of (return the address of a data item)

The NULL pointer

  * The C/C++ standard library defines a special pointer value, NULL
  * The value of NULL is zero
  * NULL points to nothing
  * On most machines, dereferencing a NULL pointer (that is, a pointer
    variable with the value NULL) causes an exception or a memory access
    violation

const pointers

  * Consider "const char *answer_ptr = "Forty-two" ;"
      + The data pointed to by answer_ptr is constant and cannot be changed
      + Can the pointer be changed? Yes
  * Consider "char buffer[] = "Test" ; char * const name_ptr = buffer ;"
      + name_ptr is a constant pointer and cannot be changed
      + Can the data pointed to by name_ptr be changed? Yes, because it is
        an ordinary array (a string literal itself must never be modified)
  * Consider "const char * const title_ptr = "Title" ;"
      + Both the pointer and data are constant
      + Can either be changed?
        No

Pointers and arrays

  * Elements in an array are allocated contiguously in memory at
    consecutive addresses
  * The compiler generates code to convert an array index into an offset
    from the base address of the array
  * In most expressions, an array name by itself is shorthand for
    &array[0]
      + This is the base address of the array
      + It is also the address of the first element in the array

        int array[32] ;     // Declare a 32-element array
        int *ptr_i ;        // Declare a pointer to an int
        ptr_i = array ;     // Assigns base address of the array to the pointer
        ptr_i = &array[0] ; // Has the same effect as the previous assignment

  * Some arithmetic operations on pointers are allowed; the compiler
    scales each offset by the size of the pointed-to type

        *(ptr_i+1)  // Returns the value of the int just after the one
                    // ptr_i points to; same as ptr_i[1]
        *ptr_i++    // Returns the int ptr_i points to, then advances
                    // the pointer to the next int

Portability concerns

  * The allocated size of a data item (e.g., an integer) depends upon the
    compiler and the underlying processor
      + x86 example: long integer is 4 bytes (32 bits)
      + Alpha example: long integer is 8 bytes (64 bits)
  * Packing and alignment of members in objects (structs) depend on the
    compiler and, sometimes, the underlying processor
  * Pointers and integers are not interchangeable on all architectures
      + PDP-10 example: byte pointers
      + Alpha example: 64-bit addresses (clean)
      + Itanium example: 32-bit addresses must be swizzled to 64 bits
  * Beware of casting one pointer type into another pointer type
  * Little endian / big endian machines
      + Little endian: the low-order byte of a number is stored in memory
        at the lowest address, and the high-order byte at the highest
        address (Examples: Intel x86 architecture, Digital Alpha)
      + Big endian: the high-order byte of a number is stored in memory
        at the lowest address, and the low-order byte at the highest
        address (Examples: Sun SPARC, HP PA-RISC, OS X/PowerPC, 68K)
  * Never, ever write code that assumes a particular item size, object
    layout, etc.
    or uses "tricky" address arithmetic

Source: Practical C++ Programming, Steve Oualline, O'Reilly & Associates,
2003

**************************************
A simple program to try
**************************************

#include <iostream>

using namespace std ;

int *ptr_to_int ;   // Global pointer to an int
int item ;          // Global integer variable

int main(int argc, char *argv[])
{
  // Make ptr_to_int refer to item
  item = 57 ;
  ptr_to_int = &item ;
  cout << "Initializing:" << endl ;
  cout << "  Value of item: " << item << endl ;
  cout << "  Value of &item: " << &item << endl ;
  cout << "  Value of ptr_to_int: " << ptr_to_int << endl ;

  // A change to item is visible through the pointer
  item = 63 ;
  cout << "Changing item to 63" << endl ;
  cout << "  Value of item: " << item << endl ;
  cout << "  Value of *ptr_to_int: " << *ptr_to_int << endl ;
  cout << "  Value of ptr_to_int: " << ptr_to_int << endl ;

  // Writing through the pointer changes item
  *ptr_to_int = 34 ;
  cout << "Changing *ptr_to_int to 34" << endl ;
  cout << "  Value of item: " << item << endl ;
  cout << "  Value of *ptr_to_int: " << *ptr_to_int << endl ;
  cout << "  Value of ptr_to_int: " << ptr_to_int << endl ;

  return 0 ;
}

**************************************
Program output
**************************************

> g++ print_ptr.cpp
> ./a.out
Initializing:
  Value of item: 57
  Value of &item: 0x52024
  Value of ptr_to_int: 0x52024
Changing item to 63
  Value of item: 63
  Value of *ptr_to_int: 63
  Value of ptr_to_int: 0x52024
Changing *ptr_to_int to 34
  Value of item: 34
  Value of *ptr_to_int: 34
  Value of ptr_to_int: 0x52024