Complexity and Big-O notation
P.J. Drongowski
27 November 2004
Need a way to analyze and compare algorithms
* Define and choose one or more high-level primitive operations
+ Comparison of two numbers or two strings
+ Data copy operations (shuffles, swaps)
+ Method invocations (subroutine calls)
+ Following (chasing) a pointer to another object
* Execution time of a primitive operation should be constant
* Count high-level primitive operations
* Use the count as an estimate of execution time
* Execution time should be proportional to the number of primitive
operations performed
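As a sketch of this counting approach (the instrumented function below is an illustration added here, not part of the original notes), the comparisons performed by a linear pass over an array can be tallied explicitly:

```c
/* Illustration only: tally the primitive comparison operations
 * performed while counting how many array entries equal a target.
 * Exactly one comparison per element, so *comparisons ends up equal
 * to n -- a machine-independent estimate of execution time. */
int count_matches(const int array[], int n, int target, long *comparisons)
{
    int matches = 0;
    *comparisons = 0;
    for (int i = 0; i < n; i++) {
        (*comparisons)++;            /* one primitive comparison */
        if (array[i] == target)
            matches++;
    }
    return matches;
}
```

The count depends only on the algorithm and the input size, not on the compiler or CPU.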
Case analysis
* Algorithm run-time may depend on the input data (aka the "data set")
* Three cases to consider
+ Best case: The minimum number of operations (or time) needed to
process an input data set
+ Average case: The number of operations (or time) needed to process
a data set, averaged over all possible inputs (this usually assumes
a probability distribution over the inputs)
+ Worst case: The maximum number of operations (or time) needed to
process an input data set
* Real performance can range from best case to worst case
* Worst case analysis is generally easier to perform than average case
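Linear search is a standard illustration of the three cases (this cost model is added here, not taken from the notes above):

```c
/* Linear search cost model: returns the number of element
 * comparisons performed.
 *   Best case:  target is array[0]            -> 1 comparison
 *   Worst case: target is last or not present -> n comparisons
 *   Average case (target present, equally likely in each slot):
 *   about (n + 1) / 2 comparisons */
long linear_search_cost(const int array[], int n, int target)
{
    long comparisons = 0;
    for (int i = 0; i < n; i++) {
        comparisons++;
        if (array[i] == target)
            return comparisons;      /* found after i + 1 comparisons */
    }
    return comparisons;              /* not found: n comparisons */
}
```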
Example: Find the minimum item in an array
Function: find_minimum
Purpose: Find and return minimum item in an array
Parameters:
array: The array to be searched
n: Number of elements in the array
int find_minimum(int array[], int n)
{
    int i ;
    int minimum ;

    minimum = array[0] ;
    for (i = 1 ; i < n ; i++) {
        if (array[i] < minimum) {
            minimum = array[i] ;
        }
    }
    return( minimum ) ;
}
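Counting operations in this example (the instrumented copy below is added for illustration): the comparison array[i] < minimum executes exactly n - 1 times regardless of the data, while the number of data copies ranges from 1 (best case, array[0] is already the minimum) to n (worst case, strictly decreasing input):

```c
/* Instrumented copy of find_minimum (added for illustration).
 * *comparisons is always n - 1; *copies ranges from 1 (best case)
 * to n (worst case, strictly decreasing input). */
int find_minimum_counted(const int array[], int n,
                         long *comparisons, long *copies)
{
    int minimum = array[0];
    *comparisons = 0;
    *copies = 1;                     /* initial copy of array[0] */
    for (int i = 1; i < n; i++) {
        (*comparisons)++;            /* executed exactly n - 1 times */
        if (array[i] < minimum) {
            minimum = array[i];
            (*copies)++;
        }
    }
    return minimum;
}
```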
"Order of" time complexity
* Precise estimation of execution time is tedious and not always accurate
+ There are many individual operations
+ The time for each operation may not be known (e.g., time is dependent
on the code generated, the machine's instruction set, speed of the
CPU, etc.)
* Instead, assume that differences in software/hardware can be accommodated
by a small constant factor
* Big-O notation for functions/algorithms disregards constant factors
(multiplicative constants)
+ We need to relate two functions, F(N) and G(N), such that F(N) is
bounded by G(N) times some constant
+ N is a measure of problem size
+ The relationship should hold for sufficiently large values of N
+ It reflects the dominant term in a formula/function, i.e., the term
that grows the fastest
* Formal definition
+ "F(N) = O( G(N) )" means that there are positive constants c and k
such that 0 <= F(N) <= c*G(N) for all N >= k
+ c and k cannot depend upon N
+ F(N) is said to be "order G(N)"
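A worked instance of the definition (the constants c = 4 and k = 5 are one choice picked for this sketch; many others work): F(N) = 3N + 5 is O(N) because 0 <= 3N + 5 <= 4N for all N >= 5.

```c
/* Check the formal definition for F(N) = 3N + 5 and G(N) = N,
 * using c = 4 and k = 5:
 *   0 <= 3N + 5 <= 4*N   for all N >= 5 */
int bound_holds(long n)
{
    return 0 <= 3*n + 5 && 3*n + 5 <= 4*n;
}
```

Note that the bound is allowed to fail below k (here 3*4 + 5 = 17 > 16 = 4*4); the definition only constrains sufficiently large N.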
Order-of by increasing growth rate
----------------------------------
Term         Notation      When the problem size is doubled...
-----------  ------------  ---------------------------------------------
Constant     O(1)          Execution time does not increase
Logarithmic  O(log n)      Execution time increases by a fixed amount
Log-squared  O(log^2 n)    Execution time increases by a slowly growing amount
Linear       O(n)          Execution time is doubled
Supralinear  O(n log n)    Execution time slightly more than doubles
Quadratic    O(n^2)        Execution time is quadrupled
Polynomial   O(n^k), k>=1  Execution time increases by a factor of 2^k
Exponential  O(a^n), a>1   Execution time is squared
Factorial    O(n!)         Execution time grows faster than any exponential
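The doubling behavior can be checked directly from the cost functions (the functions below are toy operation-count models, not measurements):

```c
/* Toy operation-count models for the doubling rule of thumb:
 * doubling n doubles a linear count, quadruples a quadratic one,
 * and squares an exponential one, since a^(2n) = (a^n)^2. */
long linear_ops(long n)    { return n; }
long quadratic_ops(long n) { return n * n; }
long exp2_ops(long n)      { return 1L << n; }   /* 2^n, valid for n < 63 */
```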
Limitations of Big-O analysis
* N must be sufficiently large
+ Performance of a "poor" algorithm may be better than a "good"
algorithm for small N
+ A polynomial time algorithm is not always faster than an exponential
time algorithm when N is small
* The constant c may itself be large
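Both limitations can be seen in a pair of hypothetical cost models (the constants 100 and 1 are invented for this sketch): a "good" linear algorithm with a large constant loses to a "poor" quadratic algorithm with a small constant until n reaches the crossover point at n = 100.

```c
/* Hypothetical cost models illustrating a large constant c:
 * "good" algorithm: 100 * n operations (O(n),   large constant)
 * "poor" algorithm: n * n operations   (O(n^2), small constant)
 * The quadratic algorithm performs fewer operations for n < 100. */
long good_cost(long n) { return 100 * n; }
long poor_cost(long n) { return n * n; }
```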