Complexity and Big-O notation
P.J. Drongowski
27 November 2004
Need a way to analyze and compare algorithms
* Define and choose one or more high-level primitive operations
+ Comparison of two numbers or two strings
+ Data copy operations (shuffles, swaps)
+ Method invocations (subroutine calls)
+ Following (chasing) a pointer to another object
* Execution time of a primitive operation should be constant
* Count high-level primitive operations
* Use the count as an estimate of execution time
* Execution time should be proportional to the number of primitive
operations performed
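As a sketch of this counting approach (the instrumented function below is an illustration added here, not part of the original notes), the comparisons performed by a linear pass over an array can be tallied explicitly:

```c
/* Illustration only: tally the primitive comparison operations
 * performed while counting how many array entries equal a target.
 * Exactly one comparison per element, so *comparisons ends up equal
 * to n -- a machine-independent estimate of execution time. */
int count_matches(const int array[], int n, int target, long *comparisons)
{
    int matches = 0;
    *comparisons = 0;
    for (int i = 0; i < n; i++) {
        (*comparisons)++;            /* one primitive comparison */
        if (array[i] == target)
            matches++;
    }
    return matches;
}
```

The count depends only on the algorithm and the input size, not on the compiler or CPU.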
Case analysis
* Algorithm run-time may depend on the input data (aka the "data set")
* Three cases to consider
+ Best case: The minimum number of operations (or time) needed to
process an input data set
+ Average case: The number of operations (or time) needed to process
a data set, averaged over all possible inputs (this usually assumes
a probability distribution over the inputs)
+ Worst case: The maximum number of operations (or time) needed to
process an input data set
* Real performance can range from best case to worst case
* Worst case analysis is generally easier to perform than average case
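Linear search is a standard illustration of the three cases (this cost model is added here, not taken from the notes above):

```c
/* Linear search cost model: returns the number of element
 * comparisons performed.
 *   Best case:  target is array[0]            -> 1 comparison
 *   Worst case: target is last or not present -> n comparisons
 *   Average case (target present, equally likely in each slot):
 *   about (n + 1) / 2 comparisons */
long linear_search_cost(const int array[], int n, int target)
{
    long comparisons = 0;
    for (int i = 0; i < n; i++) {
        comparisons++;
        if (array[i] == target)
            return comparisons;      /* found after i + 1 comparisons */
    }
    return comparisons;              /* not found: n comparisons */
}
```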
Example: Find the minimum item in an array
Function: find_minimum
Purpose: Find and return minimum item in an array
Parameters:
array: The array to be searched
n: Number of elements in the array
int find_minimum(int array[], int n)
{
    int i ;
    int minimum ;

    minimum = array[0] ;
    for (i = 1 ; i < n ; i++) {
        if (array[i] < minimum) {
            minimum = array[i] ;
        }
    }
    return( minimum ) ;
}
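Counting operations in this example (the instrumented copy below is added for illustration): the comparison array[i] < minimum executes exactly n - 1 times regardless of the data, while the number of data copies ranges from 1 (best case, array[0] is already the minimum) to n (worst case, strictly decreasing input):

```c
/* Instrumented copy of find_minimum (added for illustration).
 * *comparisons is always n - 1; *copies ranges from 1 (best case)
 * to n (worst case, strictly decreasing input). */
int find_minimum_counted(const int array[], int n,
                         long *comparisons, long *copies)
{
    int minimum = array[0];
    *comparisons = 0;
    *copies = 1;                     /* initial copy of array[0] */
    for (int i = 1; i < n; i++) {
        (*comparisons)++;            /* executed exactly n - 1 times */
        if (array[i] < minimum) {
            minimum = array[i];
            (*copies)++;
        }
    }
    return minimum;
}
```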
"Order of" time complexity
* Precise estimation of execution time is tedious and not always accurate
+ There are many individual operations
+ The time for each operation may not be known (e.g., time is dependent
on the code generated, the machine's instruction set, speed of the
CPU, etc.)
* Instead, assume that differences in software/hardware can be accommodated
by a small constant factor
* Big-O notation for functions/algorithms disregards constant factors
(multiplicative constants)
+ We need to relate two functions, F(N) and G(N), such that F(N) is
bounded by G(N) times some constant
+ N is a measure of problem size
+ The relationship should hold for sufficiently large values of N
+ It reflects the dominant term in a formula/function, i.e., the term
that grows the fastest
* Formal definition
+ "F(N) = O( G(N) )" means that there are positive constants c and k
such that 0 <= F(N) <= c*G(N) for all N >= k
+ c and k cannot depend upon N
+ F(N) is said to be "order G(N)"
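A worked instance of the definition (the constants c = 4 and k = 5 are one choice picked for this sketch; many others work): F(N) = 3N + 5 is O(N) because 0 <= 3N + 5 <= 4N for all N >= 5.

```c
/* Check the formal definition for F(N) = 3N + 5 and G(N) = N,
 * using c = 4 and k = 5:
 *   0 <= 3N + 5 <= 4*N   for all N >= 5 */
int bound_holds(long n)
{
    return 0 <= 3*n + 5 && 3*n + 5 <= 4*n;
}
```

Note that the bound is allowed to fail below k (here 3*4 + 5 = 17 > 16 = 4*4); the definition only constrains sufficiently large N.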
Order-of by increasing growth rate
----------------------------------
Term         Notation      When the problem size is doubled...
-----------  ------------  ---------------------------------------------
Constant     O(1)          Execution time does not increase
Logarithmic  O(log n)      Execution time increases by a fixed amount
Log-squared  O(log^2 n)    Execution time increases by a slowly growing amount
Linear       O(n)          Execution time is doubled
Supralinear  O(n log n)    Execution time slightly more than doubles
Quadratic    O(n^2)        Execution time is quadrupled
Polynomial   O(n^k), k>=1  Execution time increases by a factor of 2^k
Exponential  O(a^n), a>1   Execution time is squared
Factorial    O(n!)         Execution time grows faster than any exponential
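The doubling behavior can be checked directly from the cost functions (the functions below are toy operation-count models, not measurements):

```c
/* Toy operation-count models for the doubling rule of thumb:
 * doubling n doubles a linear count, quadruples a quadratic one,
 * and squares an exponential one, since a^(2n) = (a^n)^2. */
long linear_ops(long n)    { return n; }
long quadratic_ops(long n) { return n * n; }
long exp2_ops(long n)      { return 1L << n; }   /* 2^n, valid for n < 63 */
```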
Limitations of Big-O analysis
* N must be sufficiently large
+ Performance of a "poor" algorithm may be better than a "good"
algorithm for small N
+ A polynomial time algorithm is not always faster than an exponential
time algorithm when N is small
* The constant c may itself be large
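Both limitations can be seen in a pair of hypothetical cost models (the constants 100 and 1 are invented for this sketch): a "good" linear algorithm with a large constant loses to a "poor" quadratic algorithm with a small constant until n reaches the crossover point at n = 100.

```c
/* Hypothetical cost models illustrating a large constant c:
 * "good" algorithm: 100 * n operations (O(n),   large constant)
 * "poor" algorithm: n * n operations   (O(n^2), small constant)
 * The quadratic algorithm performs fewer operations for n < 100. */
long good_cost(long n) { return 100 * n; }
long poor_cost(long n) { return n * n; }
```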