Introduction to Big-O
Asymptotic Time Complexity Analysis is a technique for determining the performance of an algorithm as the amount of data it operates on becomes infinitely large.
Introduction
- Understanding the performance of our software is an important factor in the design
- If the algorithm is "too slow," the software could be of little value
- We have two general ways to assess the performance of an algorithm
- Benchmarking
- Asymptotic Time Complexity Analysis
Benchmarking
- Here we time how long it takes the algorithm to run on specific hardware with a specific data set.
- What advantages and disadvantages can you think of for such a process?
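As a concrete illustration, benchmarking can be as simple as timing a method with System.nanoTime(). The method and input size below are illustrative, not part of these notes; the measured time depends on the machine, the JVM's warm-up state, and the data set, which is exactly the weakness to think about.

```java
public class Bench {
    // Hypothetical method to benchmark: sums the even values in an array.
    static long sumEvens(int[] nums) {
        long sum = 0;
        for (int v : nums) {
            if (v % 2 == 0) sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; ++i) data[i] = i;

        long start = System.nanoTime();
        long result = sumEvens(data);
        long elapsedNs = System.nanoTime() - start;

        // The elapsed time is specific to this machine and this run.
        System.out.println("result=" + result + " elapsed=" + elapsedNs + "ns");
    }
}
```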
Asymptotic Time Complexity Analysis
- Since asymptotic time complexity analysis is a mouthful, we will often just call this Big-O analysis.
- Here the goal is to determine a function, \( T(n) \), that describes the time for the algorithm to complete as a function of the size of its input, \( n \).
- In particular, we are interested in the rate of growth of this function.
- Deriving \( T(n) \) exactly can be very tedious, so the goal is to categorize its rate of growth into one of several categories: constant, logarithmic, linear, quadratic, and so on.
T(n) for algorithm on an array
Let's analyze the following algorithm:
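The listing itself is not reproduced in these notes; a sketch consistent with the operations counted below (array length, modulus, equality test, println) and with the List version shown later would be (the exact original may differ):

```java
public static void displayOdds(int[] nums) {
    for (int i = 0; i < nums.length; ++i) {
        if (nums[i] % 2 == 1) {
            System.out.println(nums[i]);
        }
    }
}
```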
Suppose that it takes:
- \( c_1 \) microseconds to call displayOdds() and pass the argument
- \( c_2 \) microseconds to assign a value to an integer
- \( c_3 \) microseconds to compare two integers (less than)
- \( c_4 \) microseconds to determine the length of an array
- \( c_5 \) microseconds to increment an integer
- \( c_6 \) microseconds to get a value from an array of integers
- \( c_7 \) microseconds to calculate the modulus of an integer
- \( c_8 \) microseconds to compare two integers (equality)
- \( c_9 \) microseconds to print an integer
Let \( n \) be the length of the array. Then:
- \( c_1 \) and \( c_2 \) only run once.
- \( c_3 \) and \( c_4 \) will run \( n + 1 \) times
- \( c_5 \), \( c_6 \) (in the conditional), \( c_7 \), and \( c_8 \) run \( n \) times
- \( c_9 \) and \( c_6 \) (in the println call) run once for each odd number in nums
In general, we will be interested in the run time in the worst case; however, there are times when understanding the best-case or average-case run time is useful.¹
In the worst case, all of the numbers are odd. In that case, we can calculate \( T(n) \) as: \[ T(n) = c_1 + c_2 + (c_3 + c_4) \times (n + 1) + (c_5 + c_6 + c_7 + c_8) \times n + (c_9 + c_6) \times n \]
which can be rewritten as:
\[ T(n) = c_1 + c_2 + c_3 + c_4 + (c_3 + c_4 + c_5 + 2 c_6 + c_7 + c_8 + c_9) \times n \]
or:
\[ T(n) = a + bn \]
where \( a = c_1 + c_2 + c_3 + c_4 \) and \( b = c_3 + c_4 + c_5 + 2 c_6 + c_7 + c_8 + c_9 \).
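A sketch that makes the derivation tangible: instrument the loop with a counter that charges one unit per elementary operation. This instrumented version and its operation groupings are ours, not from the notes; with every \( c_i = 1 \), the worst case (all elements odd) gives exactly \( 8n + 4 \) operations, i.e. \( a + bn \) with \( a = 4 \) and \( b = 8 \).

```java
public class OpCount {
    static long ops = 0;

    static void displayOddsCounted(int[] nums) {
        ops += 2;                          // c1 (call) + c2 (assign i = 0)
        for (int i = 0; i < nums.length; ++i) {
            ops += 2;                      // c3 (compare) + c4 (length), each iteration
            ops += 4;                      // c5 (increment) + c6 (array read) + c7 (mod) + c8 (equality)
            if (nums[i] % 2 == 1) {
                ops += 2;                  // c6 (array read) + c9 (print), odd elements only
            }
        }
        ops += 2;                          // c3 + c4 for the final, failing loop test
    }

    public static void main(String[] args) {
        for (int n : new int[]{10, 100, 1000}) {
            ops = 0;
            int[] nums = new int[n];
            java.util.Arrays.fill(nums, 1);   // all odd: the worst case
            displayOddsCounted(nums);
            System.out.println(n + " -> " + ops);   // 8n + 4 each time
        }
    }
}
```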
Big-O Notation
We say that \( T(n) = O(f(n)) \)² if there exist two positive constants, \( c \) and \( n_0 \), such that:
\[ T(n) \leq c f(n) \]
for all \( n > n_0 \).
- Essentially this says that, for all sufficiently large \( n \), \( c f(n) \) is at least as large as \( T(n) \).
- We say that \( T(n) \) is bounded above by \( c f(n) \) as \( n \) gets large.
- As a result, if we have two different algorithms that are both \( O(n) \), we know that, as \( n \to \infty \), their performance will differ by no more than a constant factor.
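As a concrete check of the definition, consider \( T(n) = 3 + 5n + 2n^2 \). For \( n \geq 1 \), each term is at most its coefficient times \( n^2 \), so:
\[ 3 + 5n + 2n^2 \leq 3n^2 + 5n^2 + 2n^2 = 10 n^2 \quad \text{for all } n \geq 1, \]
and the definition is satisfied with \( c = 10 \) and \( n_0 = 1 \), giving \( T(n) = O(n^2) \).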
Determining f(n) based on T(n)
Since Big-O notation relies on analysis as \( n\to\infty \), we will only care about the term that grows most rapidly as \( n\to\infty \). Some examples: \[ a = O(1) \] \[ a + b n = O(n) \] \[ a + b n + c n^2 = O(n^2) \] \[ a + b \log n = O(\log n) \]
Note: when using Big-O notation, we will always simplify \( f(n) \) as much as possible. As a result, you should never say \( O(2n) \) or \(O(1.8 + 3 n) \).
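Those two examples follow directly from the definition: \( 2n = O(n) \) with \( c = 2 \), and
\[ 1.8 + 3n \leq 1.8n + 3n = 4.8n \quad \text{for all } n \geq 1, \]
so \( 1.8 + 3n = O(n) \) with \( c = 4.8 \) and \( n_0 = 1 \). The constants disappear into \( c \).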
Growth Rates
- \( O(1) \) — Constant
- \( O(\log n) \) — Logarithmic
- \( O(n) \) — Linear
- \( O(n \log n) \) — Log-Linear
- \( O(n^2) \) — Quadratic
- \( O(n^3) \) — Cubic
- \( O(2^n) \) — Exponential
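To get a feel for how quickly these categories separate, here is a small sketch (our own illustration, not from the notes) that evaluates each growth function at \( n = 20 \):

```java
public class Growth {
    public static void main(String[] args) {
        int n = 20;
        double log2n = Math.log(n) / Math.log(2);   // log base 2 of n

        System.out.printf("O(1):       %d%n", 1);
        System.out.printf("O(log n):   %.2f%n", log2n);          // ~4.32
        System.out.printf("O(n):       %d%n", n);                // 20
        System.out.printf("O(n log n): %.2f%n", n * log2n);      // ~86.44
        System.out.printf("O(n^2):     %d%n", n * n);            // 400
        System.out.printf("O(n^3):     %d%n", n * n * n);        // 8000
        System.out.printf("O(2^n):     %d%n", 1L << n);          // 1048576
    }
}
```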
T(n) for algorithm on a List
Hopefully it is now clear that the displayOdds() method above is \( O(n) \): choosing any \( c \geq a + b \) gives \( a + b n \leq a n + b n \leq c n \) for all \( n \geq 1 \), so the definition is satisfied with \( n_0 = 1 \).
Now suppose displayOdds() is modified as follows:
```java
public static void displayOdds(List<Integer> nums) {
    for (int i = 0; i < nums.size(); ++i) {
        if (nums.get(i) % 2 == 1) {
            System.out.println(nums.get(i));
        }
    }
}
```
How does the analysis change?
It depends on the performance of nums.size() and nums.get(). If these are both \( O(1) \) operations, then \( T(n) = a + b n \), where \( a \) and \( b \) differ from before because the actual time to call nums.size() on a list differs from the time to read nums.length on an array... similarly for nums.get(i) versus nums[i].
T(n) for algorithm on an ArrayList
If an ArrayList<Integer> is passed to displayOdds(), the time complexity remains
\( O(n) \) because nums.size() is \( O(1) \) and nums.get() is \( O(1) \).
T(n) when nums.size() is not constant
Suppose the time it takes for nums.size() to complete is \( O(n) \). How does the
time complexity for displayOdds() change?
Instead of nums.length taking \( c_4 \) microseconds, each call to nums.size() now requires \( T_{size}(n) = \alpha + \beta n \) microseconds (since it is \( O(n) \)). Replacing \( c_4 \) in our equation for \( T(n) \), in both the constant term and the \( \times\, n \) term, results in:
\[ T(n) = c_1 + c_2 + c_3 + \alpha + \beta n + (c_3 + \alpha + \beta n + c_5 + 2 c_6 + c_7 + c_8 + c_9) \times n \]
which simplifies to:
\[ T(n) = (c_1 + c_2 + c_3 + \alpha) + (c_3 + \alpha + \beta + c_5 + 2 c_6 + c_7 + c_8 + c_9) \times n + \beta n^2 \]
or:
\[ T(n) = a' + b' n + \beta n^2 \]
where \( a' = c_1 + c_2 + c_3 + \alpha \) and \( b' = c_3 + \alpha + \beta + c_5 + 2 c_6 + c_7 + c_8 + c_9 \).
As a result, \( T(n) = O(n^2) \).
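We can observe this quadratic behavior empirically. The sketch below (a hypothetical minimal list of our own, not a JDK class) implements size() by walking a chain of nodes, so each call is \( O(n) \); counting node visits across a displayOdds()-style loop gives \( n(n+1) \) visits, which roughly quadruples when \( n \) doubles.

```java
public class SlowList {
    static long visits = 0;   // counts node traversals across all size() calls

    static final class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    Node head;

    void add(int value) {     // prepend: O(1)
        Node n = new Node(value);
        n.next = head;
        head = n;
    }

    int size() {              // O(n): walks every node on every call
        int count = 0;
        for (Node n = head; n != null; n = n.next) {
            ++visits;
            ++count;
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[]{100, 200}) {
            SlowList list = new SlowList();
            for (int i = 0; i < n; ++i) list.add(i);
            visits = 0;
            for (int i = 0; i < list.size(); ++i) {
                // body omitted; the size() calls alone do n * (n + 1) visits
            }
            System.out.println(n + " -> " + visits);   // 100 -> 10100, 200 -> 40200
        }
    }
}
```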
How would things change if nums.get() were \( O(n) \)?
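That question is left as an exercise. Relatedly, a standard way to sidestep expensive get(i) calls (our suggestion, not part of the original notes) is the enhanced for loop, which walks the list once through its iterator, keeping the method \( O(n) \) for the standard JDK lists, including LinkedList:

```java
public static void displayOdds(List<Integer> nums) {
    // One pass via the list's iterator: each element is visited exactly once,
    // with no per-element get(i) lookup.
    for (int value : nums) {
        if (value % 2 == 1) {
            System.out.println(value);
        }
    }
}
```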
¹ Actually, the average case is very relevant. However, doing that analysis is more complicated, so we will leave it for a different course.
² This is really an abuse of mathematical notation. What we are really saying here is \( T(n) \in O(f(n)) \). That is, \( T(n) \) is in the set of functions for which we can find positive constants, \( c \) and \( n_0 \), such that \( T(n) \leq c f(n) \).