Complexity and Big-O notation
P.J. Drongowski
27 November 2004
Need a way to analyze and compare algorithms
* Define and choose one or more high-level primitive operations
+ Comparison of two numbers or two strings
+ Data copy operations (shuffles, swaps)
+ Method invocations (subroutine calls)
+ Following (chasing) a pointer to another object
* Execution time of a primitive operation should be constant
* Count high-level primitive operations
* Use the count as an estimate of execution time
* Execution time should be proportional to the number of primitive
operations performed
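As a sketch of this counting approach (the instrumented function below is an illustration added here, not part of the original notes), the comparisons performed by a linear pass over an array can be tallied explicitly:

```c
/* Illustration only: tally the primitive comparison operations
 * performed while counting how many array entries equal a target.
 * Exactly one comparison per element, so *comparisons ends up equal
 * to n -- a machine-independent estimate of execution time. */
int count_matches(const int array[], int n, int target, long *comparisons)
{
    int matches = 0;
    *comparisons = 0;
    for (int i = 0; i < n; i++) {
        (*comparisons)++;            /* one primitive comparison */
        if (array[i] == target)
            matches++;
    }
    return matches;
}
```

The count depends only on the algorithm and the input size, not on the compiler or CPU.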
Case analysis
* Algorithm run-time may depend on the input data (aka the "data set")
* Three cases to consider
+ Best case: The minimum number of operations (or time) needed to
process an input data set
+ Average case: The number of operations (or time) needed to process
a data set, averaged over all possible inputs (this usually assumes
a probability distribution over the inputs)
+ Worst case: The maximum number of operations (or time) needed to
process an input data set
* Real performance can range from best case to worst case
* Worst case analysis is generally easier to perform than average case
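Linear search is a standard illustration of the three cases (this cost model is added here, not taken from the notes above):

```c
/* Linear search cost model: returns the number of element
 * comparisons performed.
 *   Best case:  target is array[0]            -> 1 comparison
 *   Worst case: target is last or not present -> n comparisons
 *   Average case (target present, equally likely in each slot):
 *   about (n + 1) / 2 comparisons */
long linear_search_cost(const int array[], int n, int target)
{
    long comparisons = 0;
    for (int i = 0; i < n; i++) {
        comparisons++;
        if (array[i] == target)
            return comparisons;      /* found after i + 1 comparisons */
    }
    return comparisons;              /* not found: n comparisons */
}
```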
Example: Find the minimum item in an array
Function: find_minimum
Purpose: Find and return minimum item in an array
Parameters:
array: The array to be searched
n: Number of elements in the array
int find_minimum(int array[], int n)
{
    int i ;
    int minimum ;

    minimum = array[0] ;
    for (i = 1 ; i < n ; i++) {
        if (array[i] < minimum) {
            minimum = array[i] ;
        }
    }
    return( minimum ) ;
}
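Counting operations in this example (the instrumented copy below is added for illustration): the comparison array[i] < minimum executes exactly n - 1 times regardless of the data, while the number of data copies ranges from 1 (best case, array[0] is already the minimum) to n (worst case, strictly decreasing input):

```c
/* Instrumented copy of find_minimum (added for illustration).
 * *comparisons is always n - 1; *copies ranges from 1 (best case)
 * to n (worst case, strictly decreasing input). */
int find_minimum_counted(const int array[], int n,
                         long *comparisons, long *copies)
{
    int minimum = array[0];
    *comparisons = 0;
    *copies = 1;                     /* initial copy of array[0] */
    for (int i = 1; i < n; i++) {
        (*comparisons)++;            /* executed exactly n - 1 times */
        if (array[i] < minimum) {
            minimum = array[i];
            (*copies)++;
        }
    }
    return minimum;
}
```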
"Order of" time complexity
* Precise estimation of execution time is tedious and not always accurate
+ There are many individual operations
+ The time for each operation may not be known (e.g., time is dependent
on the code generated, the machine's instruction set, speed of the
CPU, etc.)
* Instead, assume that differences in software/hardware can be accommodated
by a small constant factor
* Big-O notation for functions/algorithms disregards constant factors
(multiplicative constants)
+ We need to relate two functions, F(N) and G(N), such that F(N) is
bounded by G(N) times some constant
+ N is a measure of problem size
+ The relationship should hold for sufficiently large values of N
+ It reflects the dominant term in a formula/function, i.e., the term
that grows the fastest
* Formal definition
+ "F(N) = O( G(N) )" means that there are positive constants c and k
such that 0 <= F(N) <= c*G(N) for all N >= k
+ c and k cannot depend upon N
+ F(N) is said to be "order G(N)"
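A worked instance of the definition (the constants c = 4 and k = 5 are one choice picked for this sketch; many others work): F(N) = 3N + 5 is O(N) because 0 <= 3N + 5 <= 4N for all N >= 5.

```c
/* Check the formal definition for F(N) = 3N + 5 and G(N) = N,
 * using c = 4 and k = 5:
 *   0 <= 3N + 5 <= 4*N   for all N >= 5 */
int bound_holds(long n)
{
    return 0 <= 3*n + 5 && 3*n + 5 <= 4*n;
}
```

Note that the bound is allowed to fail below k (here 3*4 + 5 = 17 > 16 = 4*4); the definition only constrains sufficiently large N.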
Order-of by increasing growth rate
----------------------------------
Term         Notation      When the problem size is doubled...
-----------  ------------  ---------------------------------------------
Constant     O(1)          Execution time does not increase
Logarithmic  O(log n)      Execution time increases by a fixed amount
Log-squared  O(log^2 n)    Execution time increases by a slowly growing amount
Linear       O(n)          Execution time is doubled
Supralinear  O(n log n)    Execution time slightly more than doubles
Quadratic    O(n^2)        Execution time is quadrupled
Polynomial   O(n^k), k>=1  Execution time increases by a factor of 2^k
Exponential  O(a^n), a>1   Execution time is squared
Factorial    O(n!)         Execution time grows faster than any exponential
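The doubling behavior can be checked directly from the cost functions (the functions below are toy operation-count models, not measurements):

```c
/* Toy operation-count models for the doubling rule of thumb:
 * doubling n doubles a linear count, quadruples a quadratic one,
 * and squares an exponential one, since a^(2n) = (a^n)^2. */
long linear_ops(long n)    { return n; }
long quadratic_ops(long n) { return n * n; }
long exp2_ops(long n)      { return 1L << n; }   /* 2^n, valid for n < 63 */
```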
Limitations of Big-O analysis
* N must be sufficiently large
+ Performance of a "poor" algorithm may be better than a "good"
algorithm for small N
+ A polynomial time algorithm is not always faster than an exponential
time algorithm when N is small
* The constant c may itself be large
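Both limitations can be seen in a pair of hypothetical cost models (the constants 100 and 1 are invented for this sketch): a "good" linear algorithm with a large constant loses to a "poor" quadratic algorithm with a small constant until n reaches the crossover point at n = 100.

```c
/* Hypothetical cost models illustrating a large constant c:
 * "good" algorithm: 100 * n operations (O(n),   large constant)
 * "poor" algorithm: n * n operations   (O(n^2), small constant)
 * The quadratic algorithm performs fewer operations for n < 100. */
long good_cost(long n) { return 100 * n; }
long poor_cost(long n) { return n * n; }
```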