1  Computer Programming

I’ll begin this book by covering the basics of computer programming in python. I’ll also talk a little bit about algorithms and how to measure their performance, and I’ll give a high-level overview of how a computer works. The goal is to get you proficient enough to be able to follow along in the rest of the book, and to be able to hopefully write simple python notebooks and scripts yourself. I’ll generally assume that the reader has at some point learned the basics of how to code, and has learned math up to at least a late high school level. Even if you’re rusty, you should be able to follow along if you make an effort.

First, a word on the book itself. You’re reading a book that’s been completely written using Jupyter Notebook and rendered using Quarto. Notebooks are a fantastic place to do experimental coding like we often do in machine learning. They’re also, of course, fantastic for writing reports or books that contain substantial code. All of my lessons are notebooks that you should be able to find in the book’s GitHub repository. Each notebook is fully executable, so if you like you can follow along with this book using the notebooks, executing each line of code along the way, or even experimenting with ideas yourself. For a quick and basic introduction to Jupyter Notebook, see this getting started guide.

Tip: If you are reading this lesson in Jupyter Notebook, note that each notebook is a series of cells, each of which can be executed or run. You can run a cell either by pressing SHIFT-ENTER, or by clicking the Run button in the toolbar. Notebook cells will execute in the order you run them, even if they’re not in order on the page. Be careful with this. When in doubt, click on Kernel in the toolbar, and then Restart & Clear Output in the dropdown menu to reset the session and clear any output being rendered. This will reset the state so you can start over. Note that this notebook, like every other notebook in this book, is structured to be run in linear order from top to bottom.

1.1 Computer Organization

Before diving into python it’s useful to briefly give a high-level overview of how a computer works. At a high level, a computer has a few fundamental components. I’ll just very briefly mention what these are.

The Processor

The processor is the component of the computer that actually performs computations. Traditionally computers had only a single processor known as a central processing unit or CPU. The CPU was where all computations were performed. Until 20 or so years ago, the CPU was single-core, meaning it could only perform a single operation at a time. In the past 20 years though CPUs have become multi-core, which means they’re capable of executing operations in parallel. That is, multiple operations can be executed at the same time, one on each core. Nowadays, each core typically runs at around 3 gigahertz (GHz), which just means that in one second a given core can execute roughly 3 billion cycles, or loosely speaking, operations.

Another important trend of the past 30 years or so is the use of specialty processors for performing specific tasks. The most important of these is the graphics processing unit or GPU. As the name suggests, the GPU was originally intended to be used to accelerate computer graphics. However, over the past 10-15 years machine learning researchers have realized that the GPU is also very good at performing machine learning computations. Since then, we frequently find ourselves training models on the GPU, and increasingly running inference there as well. Variants of the GPU have recently been created specifically for doing machine learning operations. The most well-known is the tensor processing unit or TPU from Google, though there are many others as well.

Memory

Memory is the way that a computer stores data. Depending on what the data is being stored for, there are different components of memory, each trading off the amount of data that can be stored against the speed with which data can be fetched (called latency or access time). The components of memory make up what’s called the memory hierarchy.

At the top of the hierarchy is what’s called register memory. This is where the processor stores the data it needs to access while performing computations. It’s fast, but not very big. The next level down is called cache memory. Cache is where data gets stored that needs to be quickly accessed, but is too big to fit in the registers. Strictly speaking there are different levels of cache, but I won’t dwell on this. The next level down the hierarchy is called main memory or RAM (for random access memory). RAM is where all the data being actively used by the computer is stored when it’s too big to fit in registers or cache. Below that is disk. This is where persistent data gets stored, i.e. data that’s meant to persist even if the computer is shut down. It used to be that disk memory was literally a spinning disk with a read head, a bit like a record player, that moved around to fetch a particular piece of data. Nowadays though disk memory is usually a solid state drive or SSD. For our purposes this doesn’t mean much, except that accessing data from disk is faster than it used to be. The final level of the hierarchy nowadays is the cloud. This is remote storage: data is hosted on a completely different computer, called a server, that the user accesses via an internet connection. Cloud storage is where we’d store data too big to fit on a local computer. Typically large data on the cloud is stored and accessed via buckets.

I/O

The last major component of a computer is I/O, an abbreviation for input/output. I/O is where all the interfacing between the user and the computer happens. The data printed to the screen is I/O. The keyboard is also I/O, since the user is feeding information into the computer. So are rendered images and video, audio, printers, fax machines, scanners, copiers, etc. For our purposes, the only thing to note about I/O is that it’s slow. In programs where speed matters, you should try to minimize the amount you print. That said, I/O is essential for debugging and experimental development, so we don’t want to completely avoid it.

Computer Program

Strictly speaking a computer program isn’t a component of a computer. It’s just some data that tells the computer to execute some sequence of instructions. At the lowest level a computer executes instructions in machine code. Machine code is just the raw bits that the computer directly executes on the hardware. One level up from machine code is what’s called assembly code. Assembly code is a somewhat more user friendly way to write machine code. It’s still very low level though and not very user friendly relative to, say, python. One level up from assembly is compiled code. Compiled code includes programming languages like C, C++, or Fortran, whose code gets compiled down to machine code and executed all at once as an executable file. One level higher than compiled code is where python sits. This is called interpreted code. Interpreted code includes programming languages like python and JavaScript (Java is often grouped here too, though strictly speaking it compiles to bytecode that runs on a virtual machine). With interpreted code, instead of the program getting compiled and executed all at once, it’s executed line-by-line (like how we’re executing code in a Jupyter notebook).

A computer program consists of instructions to be executed, written in some programming language that the computer can recognize via some compiler or interpreter. When a program runs, the operating system gives it blocks of memory to work with. Dynamically allocated data lives in a region called the heap, while function calls are managed on a region called the stack: each time a function is called, a new stack frame is pushed to hold its local state, and that frame is popped when the function returns its output to the caller. The operating system only allocates a fixed amount of stack space to a given program. If the program tries to use more than it’s been allocated, the result is what’s called a stack overflow. These usually result from things like infinite loops or recursions that the user forgot to test for.

Operating System

The fundamental set of computer programs that make a computer function is called its operating system. Nowadays the three main types of operating systems are Linux, Mac, and Windows. Each is built in its own particular way, but at a high level they all perform the same operations. Most important is the management and execution of processes, which are essentially computer programs plus their allocated blocks of memory. The operating system is what decides which processes get executed, and in which order. Usually those processes are executed concurrently, which allows the operating system to keep many processes going at once. The user traditionally interacted with the operating system using a terminal, or command line interface, though nowadays it’s more common to interact with the computer via a graphical user interface or GUI.

1.2 Python

Python is a programming language well-known for its simplicity and ease of use, which makes it highly popular among programmers. Python also has extensive library support for numerical computing, data science, and machine learning tasks. All of this together makes python by far the most popular programming language in machine learning and the data sciences in general. As of this writing there is no language that even comes close to matching python in the machine learning field in terms of popularity and general capabilities.

Python is a high-level programming language. This essentially means it supports many layers of abstraction, allowing programmers to not worry about many of the low-level details that lower-level programming languages might require you to keep track of. Python is an interpreted language, which essentially means we can execute code line-by-line instead of having to wait until we’re finished and then compile the entire file into an executable to run. This makes it easy to experiment with code line-by-line. The trade-off, however, is that python can be slower than compiled languages, sometimes 10-100 times slower. We’ll talk about ways to address this trade-off in future lessons, particularly by using libraries that automatically run code in lower-level languages for us. For most of this book though we’ll treat the interpreted nature of python as a blessing.

The most basic operations python can perform are arithmetic. At its most basic level the language can be used as a calculator. For example, if you wanted to calculate 1.75 + 3.5 * 4, you’d just type it into a cell and run it directly, e.g. by pressing SHIFT-ENTER.

1.75 + 3.5 * 4
15.75

All of the usual arithmetic operations from a scientific calculator carry over to python as well. This even includes things like parentheses, exponents, and order of operations. For example, we could calculate something like 5 * (4 - 2) / 7.3 directly. This will evaluate the expression (4 - 2) in parentheses first, then the multiplication by 5, and finally the division by 7.3.

5 * (4 - 2) / 7.3
1.36986301369863

Python also supports several arithmetic operations that are less commonly found on a calculator. For example, we can integer divide two numbers to get their ratio rounded down to the nearest whole number. Regular division is done using the / operator, while integer division is done using the // operator. For example, 5 / 2 is just \(5 \div 2 = 2.5\), while 5 // 2 is division rounded down to the nearest integer, \(5 \ // \ 2 = 2\).

5 / 2
2.5
5 // 2
2

We can calculate exponents too, not just integer exponents, but fractional exponents as well. In python, exponentiation is done using the ** operator (not ^ as you might be used to). For example 2**3 is just \(2^3 = 8\), while 9**(1/2) is the square root of \(9\), i.e. \(\sqrt{9} = 9^{1/2} = 3\).

2**3
8
9**(1/2)
3.0

Note that python returned 3.0 instead of 3. Mathematically this doesn’t make a difference, but it does change the type of the number from an integer, or int, to a floating point number, or float. We can get the type of a number or any other variable by using the type function like so.

type(3)
int
type(3.0)
float

Each line of code I’ve run is called a statement. Usually, running a statement won’t automatically print it so we can see it. We normally have to wrap a statement inside of the print function to print it out. The reason we haven’t had to do this so far is because Jupyter automatically prints the output of the last statement in a cell, unless you tell it not to. If we wanted to explicitly print out, say, 3.0 we’d just type print(3.0).

print(3.0)
3.0

Numbers like the integer 3 or the float 3.0 are called data structures or data types. Each data structure has a type, and each type has its own properties that govern how the data structure behaves. We’ve seen two types of data structures already, int and float. There are many others too. One important data structure is a string, or str. A string is a data structure for storing text. To make a string, type whatever text you want and place it inside of quotation marks. Python allows both single quotes ' and double quotes " to specify a string. You just have to be consistent about it. You can’t start a string with ' and end it with ", for example. Here are a few examples using a string that contains the text hello.

"hello"
'hello'
'hello'
'hello'
type('hello')
str

Another important data structure is a boolean, or bool. A boolean is any statement that can be True or False. Statements built from mathematical comparison operators like \(<\), \(>\), \(\leq\), \(\geq\), or \(=\) are all booleans. For example, \(3 = 7\) is either a true statement or a false statement. In python, these comparison operators are denoted by the symbols <, >, <=, >=, and == respectively. Notice how == is used for equality, not =, which means something completely different in python.

3 == 7
False
3 <= 7
True
type(3 == 7)
bool

Just as numbers can be composed together to get other numbers using arithmetic operators like \(+, -, \times, \div\), booleans can be composed to get other booleans using logical operators like AND, OR, and NOT. Each logic operator obeys the standard rules of logic. For example, if \(A\) is true and \(B\) is false, then \(A\) AND \(B\) is false, but \(A\) OR \(B\) is true. If \(A\) is true, then NOT \(A\) is false. In python, we’d use and, or, and not respectively for these operators.

To see a specific example, suppose \(A\) is the statement (3 < 7), and \(B\) is the statement (3 == 7). We already saw that \(A\) in this case is true, but \(B\) is false. This means \(A\) AND \(B\) should be false, and \(A\) OR \(B\) should be true. Since \(A\) is true, NOT \(A\) should be false. And since \(B\) is false, NOT \(B\) should be true. Let’s make sure.

(3 < 7) and (3 == 7)
False
(3 < 7) or (3 == 7)
True
not (3 < 7)
False
not (3 == 7)
True

We don’t like to have to keep copying and pasting statements like this to use them later on. It’s convenient to assign them to a variable and carry around the variable instead. A variable is just a placeholder that stands for the object it’s referencing. To assign a value to a variable, we use the assignment operator =. This is why we can’t use = for equality. It represents variable assignment, which means something completely different under the hood.

We can use any combination of letters, numbers, and underscores to name a variable, as long as the name doesn’t start with a number. For example, A and B are fine. So are foo or bar or foo_bar. Note that python is case-sensitive. This means that a variable named a will not be the same as a variable named A.
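For instance, here’s a quick illustration (the names a and A are just throwaway examples) showing that a lowercase a and an uppercase A name two different variables.

a = 1
A = 2
a == A
False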

As an example, let’s assign (3 < 7) to the variable A and (3 == 7) to the variable B, and verify the same logical operations using A and B.

A = (3 < 7)
B = (3 == 7)
A and B
False
A or B
True
not B
True

As long as we don’t change the value of a variable it’ll always reference whatever value it was originally assigned. If we re-assign the variable name to another value though, it’ll reference that new value instead. For example, if we assign x = 3, and then afterwards assign x = 'foo', then x will thereafter be a string containing the text foo.

x = 3
type(x)
int
x = 'foo'
type(x)
str

Another useful data structure in python is the list, or list. A list is an array, i.e. a collection of ordered objects or elements. Those objects can be any valid python data structure. To create a list in python we’d just wrap the objects inside of square brackets. For example, to create a list containing the objects 1, 3.1415, and 'pi', use [1, 3.1415, 'pi']. The length of a list is the number of objects it contains. For example, the list [1, 3.1415, 'pi'] contains 3 elements. We can get the length of a list by using the len function on the list.

x = [1, 3.1415, 'pi']
print(x)
[1, 3.1415, 'pi']
type(x)
list
len(x)
3

Notice that the elements of a list need not even be of the same type. They can be essentially anything. We can even put lists inside of lists. For example, the list x below contains 3 elements, 1, 2, and the list [3, 4]. This means that the length of this list is 3, not 4 as you might expect. Be careful.

x = [1, 2, [3, 4]]
print(x)
[1, 2, [3, 4]]
len(x)
3

To access the elements inside of a list we can use indexing. An index is a count from left to right of where in the list a given element is, starting from zero. Counting elements starting from zero is called zero-indexing. Not all programming languages start at zero, but most do. Be careful with this. Your mind naturally wants to count from one. It takes some getting used to. For example, the left-most element of a list has an index 0. The next one over has an index 1. And so on until we get to the end of a list. To get the value of a list at a given index, just pass the index in using brackets. For example, the element of x at index 0 is x[0], which is 1.

If a list has \(n\) total elements, its last index will be \(n-1\) since we’re counting from zero. In theory, this means we’d have to type something like x[len(x) - 1] to get the last element of a list. But python has a nice shortcut called negative indexing. If we index a list with a negative number, python counts starting from the end of the list. For example, the last element can be gotten using x[-1], the second-to-last with x[-2], and so on.

x[0]
1
x[1]
2
x[-1]
[3, 4]

What if we wanted to pick out not just one element, but a subsequence of the list? For example, we might want the sublist [1, 2] from x, i.e. the elements at indices 0 and 1. We can do this using slicing. Slicing is a form of indexing where you specify both the start and end indices of the elements you want. Slicing is done using the : operator.

For example, to get the [1, 2] subsequence from x, we’d slice x to get x[0: 2]. This tells python to return the list whose elements run from x[0] up to (but not including) x[2], i.e. the list [1, 2]. We could also use x[0: -1], which tells python to return every element in x except for the last element x[-1].

x[0: 2]
[1, 2]
x[0: -1]
[1, 2]

We can add elements to a list we’ve already defined two ways. One way is to use the operator + to concatenate two lists together from left to right. Notice how + here is being used differently than simple addition. This is called operator overloading. Depending on what data structures we’re using, operators can mean completely different things. For example, to concatenate the lists [1, 2] and [3, 4] from left to right, we’d use [1, 2] + [3, 4], which will return a new list [1, 2, 3, 4].

Another way to add an element to a list is to use the append method. A method is an operation attached to a data structure. For example, to append 'foo' to the list x, we’d use x.append('foo'). This will add 'foo' to the end of x. Note append is an in-place operation, which means it doesn’t return any new data as output. It updates x automatically under the hood, without creating a new variable.

[1, 2] + [3, 4]
[1, 2, 3, 4]
x = [1, 2, 3]
x.append(4)
print(x)
[1, 2, 3, 4]

Lists have the property of being mutable. This means we can update the elements of an already-defined list at any time. For example, we can find an element at a specific index in the list and change it to something else. This is usually a nice property to have, but sometimes it’s not. If we instead want a list that we can’t change once it’s defined, we’d use a related data structure called a tuple instead. A tuple looks exactly like a list, except it’s wrapped using parentheses instead of brackets. We can index into a tuple or find its length, but we can’t change elements or add new elements into it.

x = (1, 2, 3, 4)
print(x)
(1, 2, 3, 4)
type(x)
tuple
x[0]
1
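If we try to change an element of a tuple, python refuses. Here’s a small illustration; the exact wording of the error may vary slightly between python versions.

x[0] = 5
TypeError: 'tuple' object does not support item assignment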

If we like, we can always re-cast a list as a tuple or vice versa by using the list or tuple functions. Each data structure has a function for re-casting like this, though they may not always do what you think they’ll do. For example, if we use int on 3.9, it’ll return the integer 3. It won’t round up. If you want to round a number to the nearest integer, use round instead.

x = [1, 2, 3]
y = tuple(x)
print(y)
(1, 2, 3)
int(3.9)
3
round(3.9)
4

Another important data structure is called a dictionary, or dict, sometimes also called a hash table in other languages. A dictionary behaves a lot like a list, except it allows us to use arbitrary indexes instead of having to count left to right from zero. These arbitrary indexes are called keys, and the elements they reference are called values. A key can be any immutable data structure, like an int, float, string, or even a tuple. A value can be any object, just like with a list.

To define a dictionary in python, wrap the keys and values inside of braces. Instead of just specifying the elements, we have to specify both the keys and values using key: value notation for each one. For example, suppose we wanted to create a dictionary where key 'a' has value 1, key 'b' has value 2, and key 'c' has value 3. Then we’d write {'a': 1, 'b': 2, 'c': 3}. Note that dictionaries, like lists, are mutable, which means we can always set a key to a new value, or add new key-value pairs (we’ll see an example of this below).

Indexing into a dictionary works the same as usual, except we’d index a value with its key, not its count. Note each key needs to be distinct. If you try to use the same key for multiple values, python will set that key to refer to the last value you set it to. Dictionaries also support the length operation. With dictionaries, len tells you how many key-value pairs the dictionary contains. For the previous example, that would be 3.

x = {'a': 1, 'b': 2, 'c': 3}
print(x)
{'a': 1, 'b': 2, 'c': 3}
type(x)
dict
len(x)
3
x['a']
1
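Since dictionaries are mutable, we can change the value stored at a key or add a brand new key-value pair at any time. Here’s a quick illustration using the dictionary x defined above.

x['a'] = 10
x['d'] = 4
print(x)
{'a': 10, 'b': 2, 'c': 3, 'd': 4}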

The final python data structure I’ll mention is kind of trivial. It’s called None. None is often used as a null value or a placeholder for something else. One place None is useful is to define a list of a specified size to be filled in with values later on. Sometimes you want to pre-define a list so that python can allocate a certain amount of memory for it up front, which can be useful for performance reasons. To pre-define a list of size n, where each element is initialized to None, we can use [None] * n. Notice again that we’re overloading an operator, this time the * operator. When used with a list, * means to repeat the elements of the list n times. Below I’ll also show how to fill in one of these pre-defined lists.

x = [None] * 10
print(x)
[None, None, None, None, None, None, None, None, None, None]
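Since lists are mutable, we can fill in the pre-defined slots by index later on. Here’s a small illustration with a shorter list; the values are just placeholders.

x = [None] * 3
x[0] = 'a'
x[1] = 'b'
x[2] = 'c'
print(x)
['a', 'b', 'c']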

1.3 Conditional Statements

Suppose we only wanted to execute a statement in some cases but not others. These are called conditional statements. Conditional statements are frequently formulated something like “if x is true then do y”. For example, suppose we had some whole number \(x\) and wanted to know if it was even or odd. If \(x\) is a multiple of \(2\) it’s even. Otherwise it’s odd.

We can check whether \(x\) is a multiple of \(2\) by seeing if its remainder when divided by \(2\) is either \(0\) or \(1\). If the remainder is \(0\) then \(x\) is even. If the remainder is \(1\) then \(x\) is odd. In python, we can calculate the remainder of \(x \div y\) using the modulo or % operator. For example, if x = 5, then x % 2 is 1, meaning x is odd. If x = 6, then x % 2 is 0, meaning x is even.

x = 5
x % 2
1
x = 6
x % 2
0

Anyway, suppose we didn’t know what x was and wanted to figure out if it was even or odd. We could define a boolean variable called is_even that we’ll set to True if x is even, and False if x is odd. As a conditional statement, this might look like the following,

if x % 2 == 0 then set is_even = True
if x % 2 == 1 then set is_even = False

To create a conditional statement in python we’d first create an if statement in the format if condition:. Then below that line, we’d indent over exactly four spaces (or one tab) and type what it is we want to happen if condition is true.

if condition:
    do_something

We’d do this for each conditional statement we want to check. Note that only the lines indented under the condition get executed when the condition is true. Once you indent back out, statements will execute whether the condition was true or not. This means python is sensitive to indentation, since indentation indicates being inside of a block.

Here’s what an if statement might look like for checking whether a number x is even. In that case, condition would be a boolean, either x % 2 == 0 or x % 2 == 1. Let’s see how this works for x = 5, and then x = 6.

x = 5

if x % 2 == 0:
    is_even = True
if x % 2 == 1:
    is_even = False
    
print(is_even)
False
x = 6

if x % 2 == 0:
    is_even = True
if x % 2 == 1:
    is_even = False
    
print(is_even)
True

Sometimes we might want to execute a chain of conditional statements. We might want to check a second condition only if the first one is false, check a third condition only if the first and second are both false, and fall back to a default action if none of the conditions are true. We can do this by using an if else if else style conditional. In this kind of statement there are three different conditionals, an if statement, an else if statement, and an else statement. The else if statement only gets executed if the if statement is false, and the else statement only gets executed if both statements are false.

For example, suppose we wanted to increment some number x by one, but only if x is less than some other variable y. But if x is greater than y, we want to increment y by one instead. Finally, if x equals y, we want to reset both values to zero. As a conditional statement this might look as follows:

if x < y then increment x to x + 1
else if x > y then increment y to y + 1
else set x = 0 and y = 0

In python, to execute a complex conditional like this we just need to chain them in order. An else if condition is specified using elif in python, and an else condition is specified using else. The chain would look something like this.

if condition:
    do_something
elif other_condition:
    do_something_else
else:
    do_something_else_again

Here’s what it looks like for the previous example. I’ll first take x = 5 and y = 10, which should trigger the if condition and increment x to 6. Then I’ll take x = 7 and y = 4, which should trigger the else if condition and increment y to 5. Finally, I’ll take x = 1 and y = 1, which should trigger the else condition and set both to 0. To do the incrementing I’m going to use the shorthand operator +=. Writing x += 1 means exactly the same thing as x = x + 1. Both increment x by one.

I’m also going to use a format string to print the output. A format string is a way to print the output of a variable inside a string. They’re just strings with an f placed before the left quote. To pass a variable x into a format string we just use {x} at the place in the string we want x to render. Format strings are useful for doing nice printing. Instead of just typing print(x), we can type something like print(f'x = {x}') to explicitly make it clear in the output what we’re printing. It’s more helpful for readers this way since they don’t have to try to figure out what it is you’re printing.
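Before using format strings in the bigger example below, here’s a minimal standalone illustration; the variable name is just for demonstration.

name = 'world'
print(f'hello, {name}')
hello, world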

x = 5
y = 10

if x < y:
    print('the if got executed')
    x += 1
elif y < x:
    print('the elif got executed')
    y += 1
else:
    print('the else got executed')
    x = 0
    y = 0
    
print(f'x = {x}')
print(f'y = {y}')
the if got executed
x = 6
y = 10
x = 7
y = 4

if x < y:
    print('the if got executed')
    x += 1
elif y < x:
    print('the elif got executed')
    y += 1
else:
    print('the else got executed')
    x = 0
    y = 0
    
print(f'x = {x}')
print(f'y = {y}')
the elif got executed
x = 7
y = 5
x = 1
y = 1

if x < y:
    print('the if got executed')
    x += 1
elif y < x:
    print('the elif got executed')
    y += 1
else:
    print('the else got executed')
    x = 0
    y = 0
    
print(f'x = {x}')
print(f'y = {y}')
the else got executed
x = 0
y = 0

It’s kind of annoying to write these complex statements out like this every time. When we’re working with simple conditionals, python has a one-line way to write them called ternary statements. These are useful when we want to set a variable if a condition is true, otherwise set it to something else if the condition is false. Instead of writing this:

if condition:
    x = foo
else:
    x = bar

We can write the following one-liner to define x:

x = foo if condition else bar

It may seem confusing to write a statement this way if you’re not used to it, but it’s actually more readable if you try to read the statement literally from left to right. You’ll get used to it. As an example, we could use a ternary statement like this to write the is_even example using one line. Here’s what it’d look like for x = 6.

x = 6

is_even = True if (x % 2 == 0) else False

print(is_even)
True

1.4 Loops

Let’s go back to lists. Suppose we had a large list of numbers of arbitrary length, and we needed to find the largest element in the list. How would we go about doing that? With what we’ve covered so far, we’d literally have to index into every single element one-by-one, print them all out, and then manually find the largest number. That might be fine if the list contains like ten numbers. But suppose it contained a million numbers. We clearly need a smarter way to proceed. This brings us to the topic of loops. A loop is a systematic way of iterating over some iterable data structure. To iterate over the elements of a list called array in python, we can use the following syntax:

for x in array:
    do_something

This is called a for loop. What it’ll do is pick out each element x in the list from left to right, and with that element it’ll execute do_something. For example, doing something could just mean printing out x. Here’s an example. I’ll iterate over all the whole numbers from \(0\) to \(9\) and print them out.

array = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

for x in array:
    print(x)
0
1
2
3
4
5
6
7
8
9

In python, a convenient way to create an array like this is to use the range function. Writing range(10) would create a sequence of all the whole numbers from \(0\) to \(10-1=9\). Strictly speaking, range(10) is not a list but an iterable range object. That just means it doesn’t store the elements in a list directly, but generates them as we iterate over them. When executing a loop this doesn’t really matter since we’re iterating one-by-one anyway. If you need a range object as an explicit list, just re-cast it the usual way using list(range(10)).

Range iterators can do more interesting things too. If we wanted to iterate starting from some other number, say 5, we could just use range(5, 10) instead, which will create an iterator of whole numbers from \(5\) to \(9\). We can also skip over numbers in the sequence. For example, if we only wanted the even numbers from \(-2\) up to (but not including) \(10\), we could use range(-2, 10, 2). The -2 says where to start, the 10 says where to stop (exclusive), and the 2 on the right says to count in steps of 2.

for x in range(10):
    print(x)
0
1
2
3
4
5
6
7
8
9
print(range(10))
print(list(range(10)))
range(0, 10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(range(5, 10)))
print(list(range(-2, 10, 2)))
[5, 6, 7, 8, 9]
[-2, 0, 2, 4, 6, 8]

Back to our original loop example. Suppose we had a large list of numbers and wanted to find the largest one. How could we do this using a for loop? Well, what we can do is loop through the list, using another variable largest to keep track of the largest element we’ve seen so far. To do that we’ll need to use an if statement inside the loop to check whether the current value is larger than largest or not. Once we get to the end of the list, largest will be the largest in the entire list. For this to work, we have to pre-initialize largest to some value smaller than any element in the list. A reasonable choice would be negative infinity, which in python is denoted by float('-inf').

Here’s an example, using the list [5, 2, 9, 6, 0, 1, 8]. The largest element should be 9 in this case. Note that python actually has a builtin function to find the largest element in a list, called max. We could just use that instead and save having to write out the loop. Similarly, we can get the smallest element in a list using the min function.

array = [5, 2, 9, 6, 0, 1, 8]

largest = float('-inf')
for x in array:
    if x > largest:
        largest = x
        
print(largest)
9
print(max(array))
print(min(array))
9
0

We can loop over other iterable data structures as well, like tuples and dictionaries. When looping over a dictionary, note that the loop iterates over the keys, not the values. In modern versions of python the keys are visited in the order they were inserted, but you shouldn’t count on any other ordering (like sorted order).
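Here’s a small sketch of looping over a dictionary, using a throwaway dictionary d. The loop variable takes on each key, and we can use the key to look up its value inside the loop.

d = {'a': 1, 'b': 2, 'c': 3}
for key in d:
    print(key, d[key])
a 1
b 2
c 3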

We can have loops inside of loops as well. One for loop inside another for loop is called a double loop. Double loops might be used, for example, when you have to iterate over a list of lists. For example, suppose we needed to iterate over the following list and find its largest element. Each element of array is a list itself. To access the actual values in the array we’d need to index twice. For example, to get the element at index 3 of the inner list at index 2, we’d index using array[2][3].

array = [
    [1, 0, 1, 0, 1],
    [1, 2, 1, 2, 1],
    [1, 3, 1, 3, 1],
    [1, 4, 1, 4, 1]
]
print(array)
[[1, 0, 1, 0, 1], [1, 2, 1, 2, 1], [1, 3, 1, 3, 1], [1, 4, 1, 4, 1]]
array[2][3]
3

Suppose we wanted to find the largest element in this array. How could we do it programmatically? What we could do is use a double loop, with the outer loop running over the inner lists, and the inner loop running over the elements of each list. We’d again need to keep a largest variable. Note that using max won’t work as is for a list of lists like this. It’ll just return one of the inner lists, not a number.

largest = float('-inf')
for row in array:
    for x in row:
        if x > largest:
            largest = x

print(largest)
4

Suppose we had a list array, and wanted to create another list from array whose elements are squares of the elements in array. One way to do this would be to use a for loop like this. We’d initialize an empty array squares, loop over array, and append the square of each x to squares. This will work fine, creating the array [1, 4, 9, 16, 25].

array = [1, 2, 3, 4, 5]

squares = []
for x in array:
    x_sq = x ** 2
    squares.append(x_sq)
    
print(squares)
[1, 4, 9, 16, 25]

An even simpler way to create squares is to use a list comprehension. A list comprehension is a python shorthand syntax for creating a list by iterating over another list. We can fill in squares with the same numbers using the following single line of code:

squares = [x**2 for x in array]

Notice that all we’re essentially doing is running the loop over array inside of the list definition for squares. This allows us to cut four lines of code down to one. It’s also more readable, once you get used to it at least. Note that list comprehensions only work when it’s simple to define squares from array. If the operation is too complex you’re stuck writing out the full loop. You’d be surprised how much we can do with list comprehensions though. We can do double loops inside of a list comprehension, and even ternary statements, as the quick sketch after the next cell shows.

squares = [x**2 for x in array]
print(squares)
[1, 4, 9, 16, 25]
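As a quick taste of this, here are two small sketches using throwaway lists: a comprehension with a ternary statement inside it, and a comprehension with a double loop that flattens a small list of lists.

labels = ['even' if x % 2 == 0 else 'odd' for x in range(4)]
print(labels)
['even', 'odd', 'even', 'odd']
pairs = [[1, 2], [3, 4]]
flat = [x for row in pairs for x in row]
print(flat)
[1, 2, 3, 4]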

I’ll just briefly mention one other type of loop that occasionally shows up, called a while loop. A for loop just iterates over a list or some iterable. A while loop keeps iterating as long as some condition is true, whether there’s an iterable involved or not. The syntax of a while loop in python is:

while condition:
    do_something

Here’s a simple example of this. I’ll define a while loop that continues as long as some variable x is less than 5. Each step of the loop will increment x by one. In order to use a while loop like this we first have to make sure to initialize x to some value, otherwise x will be undefined inside the loop when we try to increment it. I’ll initialize x to 0 in this example.

x = 0
while x < 5:
    x += 1
    
print(x)
5

1.5 Functions

Very often we find ourselves needing to perform the same set of computations over and over. It’s convenient therefore to be able to package re-used computations up in a way that makes them easier to reuse later on. The most common way to do this is using a function. A computational function is an abstraction for mapping inputs to some desired outputs using a sequence of computational steps that get abstracted away from the user.

It’s hard to overstate how important functions are to programming. They allow us to abstract away many repetitive, complex tasks so we can focus on other things. Any time you find yourself reusing the same code over and over, rather than copy and paste it you should consider defining a function to execute that code for you. Doing this well is more an art than a science. It’s in essence the bread and butter of software engineering.

In python, a function is defined by specifying a function signature that states the name of the function and the inputs it takes in. Once the computation inside a function gets performed, its output gets returned via a return statement. Here’s a simple example of a function called function that takes in two inputs foo and bar, and returns an output output.

def function(foo, bar):
    do_something
    return output

Once the function has been defined, it can be called and executed later on using a command like

output = function(foo, bar)

You can place pretty much any computation you like inside a function. Only the inputs and outputs get exposed to the user when the function is used. All of the inner computation gets abstracted away. The variables that get used inside the function are all local variables unless specified otherwise. That means they won’t be available outside the function. They’re created when the function is called, and immediately destroyed when the function returns. This contrasts with global variables, which are variables defined outside of any function (for example at the top level of a notebook) and which persist after a function is called. The variables we pass in as inputs and the variable we assign the output to are examples of globals.
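Here’s a tiny sketch of what local means in practice. The function add_one and the variable inner_value are just illustrative names (and this assumes inner_value hasn’t been defined elsewhere in the notebook): the local variable exists only while the function is running, so referencing it afterwards raises an error.

def add_one(x):
    inner_value = x + 1    # inner_value is a local variable
    return inner_value

print(add_one(2))
3
print(inner_value)
NameError: name 'inner_value' is not defined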

Let’s do an example. Suppose we got tired of writing out loops to find the largest element in an array and wanted to package it inside a function that we can execute in one line, e.g. like the max function does. Let’s package this sequence of steps inside a function I’ll call maximum. The function takes in a list array and returns the largest element largest.

def maximum(array):
    largest = array[0]
    for x in array:
        if x > largest:
            largest = x
    return largest
array = [5, 2, 9, 6, 0, 1, 8]
largest = maximum(array)
print(largest)
9

As written, this function only works on lists of numbers. It won’t work on lists of lists of numbers. What if we wanted to find the largest element of those? We could just write another function to flatten the list into a list of numbers, and then use the previous function on the flattened list. Here’s how we might do that. I’ll first define a function flatten that takes in a list array. Inside the function, I’ll define a new array called output to store the flattened array I’d like to return. Now, what we can do is loop over array and check if a given element x is a number or a list. If it’s a list, we’ll flatten it recursively and concatenate the result to the end of output. If it’s not a list, we’ll just append it to output.

Notice something very subtle here. I’m actually calling the function flatten inside the function itself! Does this even make sense? Strangely enough it does. It’s called recursion. We can recursively call a function inside itself as long as we have some kind of condition to terminate the recursion at some point. This condition is called a base case. In this function the base case is implicit. We can’t have a list with infinitely many lists nested inside it. Eventually we must hit a number, which gets appended to the list instead of triggering another call to flatten.

In general, if you fail to specify a base case, recursion will keep going indefinitely, which will cause a stack overflow. A stack overflow results when you’ve used up all the memory your program is allowed to use for function calls; in python this usually surfaces as a RecursionError, and in the worst case it can crash the kernel and force you to start over.
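To make the idea of a base case concrete, here’s a classic sketch (not part of the flatten example, just a standard illustration): a recursive factorial function, where the check n == 0 is the explicit base case that stops the recursion. Remove that check and the calls would never terminate.

def factorial(n):
    if n == 0:    # base case: stop recursing
        return 1
    return n * factorial(n - 1)    # recursive case: call factorial on a smaller input

factorial(5)
120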

So you can see how recursion works, I’m going to define a nested list and call flatten on it, while printing the input each time the function gets called. You can see that it starts by printing the original array, but then it prints [1, 2], the first list inside of array. Since the elements in [1, 2] are just numbers we’ve hit the base case and we can move onto the next element in array, which is the list [3, [4, [5, 6]]]. The second element in this list is another list, [4, [5, 6]], so it’ll recurse again on that. And then it’ll recurse again on [5, 6]. Once it’s hit this point there are no more lists left. All the numbers have been added to output, and we can return. The final output should be the flattened array [1, 2, 3, 4, 5, 6].

def flatten(array):
    print(array)
    output = []
    for x in array:
        if isinstance(x, list):
            output += flatten(x)
        else:
            output.append(x)
    return output

array = [[1, 2], [3, [4, [5, 6]]]]
flattened = flatten(array)
[[1, 2], [3, [4, [5, 6]]]]
[1, 2]
[3, [4, [5, 6]]]
[4, [5, 6]]
[5, 6]
print(flattened)
[1, 2, 3, 4, 5, 6]

We can now use flatten combined with maximum to find the largest element in any list we like, no matter how many lists are nested inside it. Let’s test it out on the above example, whose largest element should be 6. Looks like it works fine, at least on the example tested.

maximum(flattened)
6

This doesn’t mean the function will work fine for all inputs though. You have to be careful to explicitly lay out what the valid inputs should be, think about what the edge case inputs are, and check whether the function gives the right value for those. One edge case for maximum might be an empty list, []. What’s the largest element in an empty list? It’s kind of an ill-defined question that we’d need to think about how to handle. An easy convention is to just return None if the input list is empty, which we can do by adding a check to the top of the function.

Functions can also take in keyword arguments, which are optional inputs that can be used to modify the state of the function. With a keyword argument, we’d specify a default value. When we call a function and don’t pass in that keyword argument it’ll default to that default value. Otherwise it’ll use the value of what we pass in. Here’s what the signature might look like for a function that takes a keyword argument keyword with a default value default.

def function(input, keyword=default):
    do_something
    return output

As an example, suppose we wanted to modify maximum by specifying an optional return value if_empty_return when an empty list is passed in. If no keyword is passed in, we’ll assume that if_empty_return has a default value of float('-inf'). To actually use the keyword argument as intended, we need to check whether the input array is empty, and if it is return whatever if_empty_return is set to.

I’ll test the function on two cases. In the first, no keyword argument is passed in. That one should return the default value float('-inf') when an empty list is passed in. The second passes the keyword if_empty_return='empty list has no elements'. This one should return the string 'empty list has no elements' when an empty list is passed in.

def maximum(array, if_empty_return=float('-inf')):
    if len(array) == 0:
        return if_empty_return
    largest = array[0]
    for x in array:
        if x > largest:
            largest = x
    return largest
maximum([])
-inf
maximum([], if_empty_return='empty list has no elements')
'empty list has no elements'

Frequently we’ll find ourselves implementing functions that only perform a single line of computation. For these simple kinds of functions python has a nice syntax that we can use to implement them in one line. These are called lambda functions. Lambda functions can be defined using the following syntax:

function = lambda input: output

Calling these functions works exactly the same way as ordinary functions do: output = function(input). Lambda functions only work when the output is really simple to calculate from the input. Thankfully, in this book most functions you’ll see are simple enough to state using lambda functions. That’s because most of those functions will be mathematical functions like \(f(x) = x^2\) or \(f(x, y) = \sqrt{x + y}\). These functions are simple enough to do in one line.

Here’s an example of a function I’ll call square that takes in a number x and returns its square.

square = lambda x: x**2
square(3)
9

And here’s another function I’ll call sqrt that takes in two numbers x and y and computes the square root of their sum.

sqrt = lambda x, y: (x + y)**(1/2)
sqrt(2, 2)
2.0

There are more advanced topics related to functions as well. For example, we can imagine objects that not only execute code, but also hold data as well. These are called classes. They do indeed come up in machine learning, but I’ll defer that topic to when we need to use them. Another extension of functions is the idea of higher-order functions, functions that take other functions as inputs or return them as outputs; in python a common use of these is decorators. I’ll defer these for later as well.

One other extension of functions though that we will use is the idea of imports and libraries. Suppose we’ve written a bunch of functions that we’d like to use not just in one notebook, but across many notebooks. What we can do is put them all into a python script and import them into any notebook when we’d like to use them. For example, we might place the function maximum in a file utils.py. If we want to use maximum in a notebook, we’d first call import utils, and then use maximum by calling utils.maximum. Alternatively, we could import maximum directly from utils.py by calling from utils import maximum, in which case we can call maximum directly without referencing utils. We’ll see examples of this kind of thing in future lessons.
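As a sketch, assuming we’ve saved maximum into a file named utils.py sitting in the same directory as the notebook (the cell will only run if that file actually exists), the two import styles would look like this.

import utils
largest = utils.maximum([5, 2, 9, 6])

from utils import maximum
largest = maximum([5, 2, 9, 6])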

1.6 Algorithms

At their root, computer programs are about executing a given sequence of instructions. The way those instructions are written depends on the programming language, but the instructions themselves do not. A sequence of instructions for performing some given task is called an algorithm. An algorithm abstracts away the idea of what language we’re coding in, what hardware we’re using, what operating system we’re running on, and focuses only on the instructions we want to be executed. Very roughly speaking, you can think of a program or a function as an algorithm.

We’ve already seen an example of an algorithm in this lesson: Finding the maximum element in a list. In its original form, its python function looked like this:

def maximum(array):
    largest = array[0]
    for x in array:
        if x > largest:
            largest = x
    return largest

One way to think about this function is as a few lines of python code that get interpreted into machine code and then executed in hardware. Another way to think about it is as a sequence of instructions in English:

  • Take in an input list of numbers, which we’ll call array.
  • Define a variable largest to keep track of the largest element seen in the array so far. Initialize largest to the value in the first element of array.
  • Iterate through the list from left to right.
  • At each iteration, compare the current value x in the array to largest. If x is bigger than largest, update largest to have the value of x.
  • Once we’ve reached the end of the array, the value of largest will be the maximum element in the array. Return largest.

These instructions define an algorithm for finding the maximum element in a list of numbers. The instructions have nothing whatsoever to do with code. You can execute them by hand and get the maximum element too. When we talk about an algorithm, we generally care most about two things:

  1. Correctness: Is the algorithm correct? That is, for a given input, does it always return the correct output?
  2. Efficiency: Is the algorithm sufficiently fast? That is, for an input of a given size, is the algorithm optimal in the sense that it’s computing the output as efficiently as it possibly can?

To evaluate the correctness of an algorithm we have two options, a theoretical way, and a practical way. The theoretical way is to mathematically prove the algorithm gives the correct result for a given input. The practical way is to do unit testing. If an algorithm gives an incorrect output for some input we say it has a bug. To evaluate the efficiency of an algorithm we again have two options, a theoretical way, and a practical way. The theoretical way is to look at its asymptotic behavior as a function of the input size. The practical way is to do profiling.

1.6.1 Correctness

Let’s talk about correctness first. Suppose we wanted to prove that maximum is a correct algorithm. That is, for any list of numbers array containing at least one number, the output largest will always be the maximum element in the list. Let’s proceed by supposing the algorithm is not correct and seeing if it leads to a contradiction. If it does, the algorithm must be correct.

Claim: The maximum algorithm is correct for any input list of numbers containing at least one number.

Proof: Suppose maximum is not correct. That is, there is some input list array containing at least one number for which maximum does not return the largest value in array. Suppose x is the maximum element in the array, and the returned value largest is not equal to x. Since x is in the array, iterating over the array must eventually get us to x. Once we’ve reached x, we check if x > largest. If it is, we update largest to equal x; if it isn’t, largest was already at least as big as x. Either way, from that point on largest is at least as big as x, since largest never decreases. Can it end up bigger than x? No. It’s initialized to the first element in array, and it only ever gets updated to other elements in the array, all of which are at most x. So the returned value must equal x. We’ve thus found a contradiction, since we assumed maximum did not return the largest value in array. Thus, maximum must be correct for any array of numbers with at least one element. \(Q.E.D.\)

That’s the theoretical way to show an algorithm is correct. And it’s nice if we can do it. But most algorithms in the real world are complex enough that we can’t really prove it’s correct for any given input. Instead we have a practical way to test correctness, called unit testing. With unit testing, instead of trying to show an algorithm is correct for all inputs, we instead create a few test cases we’d like to test the algorithm on. That is, we create some example inputs for which we already know what the output should be, and then check that the algorithm gives the output we expect on those examples.

Suppose we wanted to use unit testing on maximum. What we’d do is define a few simple example inputs where we already know what the largest element is. The more examples the better, and the more varied and complex the examples are the better. When creating a set of examples, it’s good to think about edge case inputs that might occur, and to test as many different edge cases as we can.

Let’s think about what these might look like for maximum. One thing we might want to do is test the algorithm on a few random input lists, where the maximum element could be anywhere in the list. Another thing we might want to think about are the edge cases. One case, for example, might be when the input list is empty. What do we return then? Another case might be when all the elements in the list are the same value, for example all zeros or ones. Another case might be when an input list is passed in that contains things other than numbers, for example strings or other lists.

I’ll write a new function called test_maximum to do this. It takes in no inputs, and returns no outputs. All it does is test that maximum gives the right outputs on a few example input lists. To test each example I’ll use the python assert statement. The syntax for an assert statement is:

assert statement, optional_error_message

If statement is false, python will print optional_error_message and throw an assertion error. If no error message is passed it’ll just throw an assertion error with no message. If statement is true it’ll continue on to the next example. I’ll test the maximum algorithm on the following example lists:

  • input [5, 4, 3, 2, 1]: output 5
  • input [4, 3, -5, 7, 0]: output 7
  • input [1, 1, 1, 1, 1]: output 1
  • input range(10000): output 9999
  • input []: output None

def test_maximum():
    assert maximum([5, 4, 3, 2, 1]) == 5
    assert maximum([4, 3, -5, 7, 0]) == 7
    assert maximum([1, 1, 1, 1, 1]) == 1
    assert maximum(range(10000)) == 9999
    assert maximum([]) == None
test_maximum()
IndexError: list index out of range

It looks like this test fails on the last example, which tests that the maximum of an empty list is None (in fact it fails with an IndexError, since maximum tries to index into the empty list). This shouldn’t be surprising. After all we assumed when defining maximum that the input was a list of numbers with at least one element. If we did want an empty list to return None we can just modify maximum to first check if the input list is empty, and if it is return None. Let’s do that and then run the test again. It should pass now.

def maximum(array):
    if len(array) == 0:
        return None
    largest = array[0]
    for x in array:
        if x > largest:
            largest = x
    return largest
test_maximum()

1.6.2 Efficiency

Let’s now talk about the efficiency of an algorithm. The theoretical way to measure the efficiency of an algorithm is asymptotic complexity. The idea is to suppose the input has an arbitrary size \(n\), which we assume to be really really large. I’m being deliberately ambiguous about what I mean by size, since it depends on what the inputs are. If the input is a list, then the size is usually the length of the list. If the input is a string, then the size might be the length of the string. If the input is a number, the size might be the number of bits needed to describe that number. Etc.

Notice that I said with asymptotic complexity that we assume the size \(n\) to be really really large. That is, \(n \gg 1\). This is why we say asymptotic complexity. Exactly how large \(n\) needs to be is, to be honest, subjective. Sometimes \(n=10\) will do. Sometimes we’d need \(n=1,000,000\) or even bigger. Anyway, the idea of asymptotic complexity is to figure out how many steps an algorithm is performing, as a mathematical function of the input size \(n\). What exactly I mean by a step is subjective too. Usually we’d think of each executed line of code as being a step, unless that line of code is doing a lot of work like calling another function or doing a list comprehension. Anyway, once we have the number of steps as a function of \(n\), we’d then approximate it by its leading order term.

I haven’t yet said what exactly I mean by efficiency either. What kind of efficiency are we talking about? Usually when we say efficiency we mean two things: 1) we want the algorithm to be fast, in that it runs in as little time as possible, and 2) we want the algorithm to use as little memory as possible in the computation. This means we’re usually interested in two different types of asymptotic complexity, the time of an algorithm as a function of the input size, and the space of an algorithm as a function of the input size.

It’s probably a good idea at this point to do an example. Let’s try to estimate both the asymptotic time and memory usage of our maximum algorithm. We’ll assume the input list has an arbitrary size \(n \gg 1\). Let’s start by estimating the asymptotic time. For simplicity, I’ll count each single line of a function as one step, unless it’s calling a function or running a loop. This may seem weird. Why should, say, defining a variable take the same amount of time as dividing two large numbers? You’re right, it’s arbitrary. But when \(n\) is large it won’t really matter that much, as we’ll see.

For simplicity, let’s take the maximum function that ignores the empty list check. It doesn’t really make a difference to the result; it’s just easier to count the original one. Here’s a count of each line, followed by the same code annotated with these counts:

  1. The first line is the function signature. This is just a kind of definition, so we’ll assume it’s one step.
  2. The second line sets largest to the first value in the input array. We generally assume indexing a list takes a single step. So we’ll count this line as one step.
  3. The third line specifies the for loop over the elements in array. The loop runs over all \(n\) elements. We’ll count the loop definition itself as one step. But we have to keep in mind that the inner body of the loop is being executed \(n\) times, once for each element in the array.
  4. The fourth line is an if statement that checks if x > largest. We’re just seeing if one number is larger than another, so we’ll again assume this takes one step.
  5. The fifth line sets largest = x provided the if statement is satisfied. We’ll again assume this takes one step.
  6. Finally we return largest. We’ll again assume this takes one step.
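To see the tally at a glance, here’s the same function again with each line’s count written as a comment. Nothing new here, just the counting convention from the list above:

def maximum(array):          # line (1): 1 step
    largest = array[0]       # line (2): 1 step
    for x in array:          # line (3): 1 step to set up; the body runs n times
        if x > largest:      # line (4): 1 step per loop iteration
            largest = x      # line (5): 1 step per loop iteration
    return largest           # line (6): 1 step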

Now that we’ve counted all the steps executed we need to add them up. Lines (1) through (3) each took a single step, for a total of \(3\) steps. Lines (4) and (5) each took a single step on every pass through the loop (counting the worst case, where the if statement is satisfied each time). Since the loop ran \(n\) times, that means \(2n\) total steps ran inside the loop. Finally, line (6) took a single step, so \(1\) more. In total then, maximum took \(3 + 2n + 1 = 2n + 4\) total steps. But what we’re really interested in is the total time, not the total number of steps. Let’s assume a single step takes some time \(t\) to run, on average. Then all we need to do to get the total time is to multiply the total number of steps by the time per step \(t\). Let’s call the total time \(T(n)\), since it’s a function of \(n\). Then we have

\[T(n) = (2n + 4) t = 2tn + 4t.\]

So far we haven’t used the fact that \(n \gg 1\) at all. Let’s now suppose that \(n\) is really really big. What will that mean for the total time \(T(n)\)? Suppose for the sake of argument that \(n\) is a million and \(t=1\). Then the term \(2tn\) will be two million, while the term \(4t\) will just be four. This means \(2tn\) is really really big compared to \(4t\), so for all practical purposes we have that \(T(n) \approx 2tn\). The added constant \(4t\) is practically a rounding error.

For reasons I don’t necessarily agree with, it’s common to take one more step after this, which is to completely ignore the constant factors multiplying the leading power of \(n\). Instead of writing \(T(n) \approx 2tn\), we’d ignore the factor \(2t\) and just write \(T(n) = O(n)\). Read this as “the algorithm has an asymptotic time complexity on the order of \(n\)”. The notation \(O(n)\) is called big-O notation. The \(O\) just stands for on the order of.

As a good rule of thumb, the number of steps executed by the deepest nested loop will give you the asymptotic runtime of your algorithm. When counting this way, you have to be careful to count list comprehensions as loops, as well as any function that iterates over a list or any other iterable object. This rule of thumb will be true as long as there are no other function calls inside the algorithm, or recursions. For maximum, there was only a single loop that runs \(n\) times. Since no other function was being called, we can immediately conclude \(T(n) = O(n)\) in this case. If there was a double loop over the input we’d conclude \(T(n) = O(n^2)\). Etc. If there are other function calls you need to figure out what the runtime is of that function call. If there’s a recursion things can be more complicated.

When \(T(n) = O(n)\) we’d say the algorithm is a linear time algorithm, since it’s linear in the input size \(n\). If \(T(n) = O(n^2)\) we’d say it’s a quadratic time algorithm. If \(T(n) = O(n^3)\) we’d say it’s a cubic time algorithm. Etc. One example of an algorithm that has quadratic time is enumerating all of the contiguous sub-lists of a list, i.e. all the possible ways to slice the list and get a distinct sub-list. One example of a cubic time algorithm is matrix multiplication, something we’ll study in great detail in future lessons.
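To make the quadratic example concrete, here’s a minimal sketch of that enumeration (the name count_slices is just mine, not something defined earlier). The deepest nested loop runs on the order of \(n^2\) times, so by the rule of thumb above the runtime is \(O(n^2)\).

def count_slices(array):
    # Count every non-empty contiguous slice array[i:j] of the input list.
    # The double loop over (i, j) pairs runs about n*(n+1)/2 times, so the
    # runtime is O(n^2).
    n = len(array)
    count = 0
    for i in range(n):
        for j in range(i + 1, n + 1):
            count += 1  # array[i:j] is one slice
    return count

count_slices([1, 2, 3])  # 3 + 2 + 1 = 6 slices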

Perhaps even stranger, it’s not even required that an algorithm’s asymptotic time be a power of \(n\). For example, recursive algorithms quite often have runtimes proportional to the logarithm of \(n\). Searching a sorted list for a value is an example of an \(O(\log n)\) time algorithm (I’ll sketch one version of this, binary search, just below). Sorting a list is an example of an \(O(n \log n)\) time algorithm. Convolving an image with a filter is an example of an \(O(n^2 \log n)\) algorithm. Etc. We can even have exponential time algorithms, i.e. algorithms that run in \(O(2^n)\) time. Exponential time algorithms are really slow, so slow you quite often wouldn’t use them. As a general rule,

\[O(1) \ll O(\log n) \ll O(n) \ll O(n \log n) \ll O(n^2) \ll O(n^2 \log n) \ll O(n^3) \ll \cdots \ll O(2^n).\]
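Here’s that promised sketch of binary search over a sorted list (just an illustration, not something we’ll need later). Each pass through the loop halves the range of indices still under consideration, which is where the \(O(\log n)\) comes from.

def binary_search(sorted_array, target):
    # Search a sorted (ascending) list for target, returning its index,
    # or None if it's not present. Each iteration halves the search range,
    # so the loop runs O(log n) times.
    lo, hi = 0, len(sorted_array) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_array[mid] == target:
            return mid
        elif sorted_array[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

binary_search([1, 3, 5, 7, 9], 7)  # returns 3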

The fastest algorithms of all are the ones that don’t depend on the input size at all. For example, if all maximum did was return the string Hello World no matter what you passed in, it would be a constant time algorithm. Of course then it wouldn’t be correct. When an algorithm is constant time we’d write \(O(1)\).

Very roughly speaking, we’d say an algorithm is fast if it’s less than cubic time, though this depends a lot on what the problem is and how big \(n\) can get. With this definition, we’d say our maximum algorithm is fast. Is it the fastest algorithm we could use to find the maximum of a list? It turns out it is, at least for an arbitrary unsorted list. If the list had some kind of structure we might be able to find the maximum faster, perhaps in \(O(\log n)\) or even \(O(1)\) time.
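For instance, if we happened to know the list was already sorted in ascending order (an extra assumption maximum doesn’t get to make), finding the maximum becomes a single indexing operation:

def maximum_sorted(array):
    # Assumes array is non-empty and sorted in ascending order,
    # so the maximum is just the last element: an O(1) lookup.
    return array[-1]

maximum_sorted([1, 2, 3, 7])  # returns 7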

Let’s now try to count the total amount of memory or space the maximum algorithm uses. This works exactly the same way as counting the time, except we’re keeping track of the sizes of the objects in the algorithm instead of the number of steps it takes to run. Suppose again the input is a list of size \(n\). Let’s first count the total number of elements in every object the algorithm sees. By convention, we ignore the size of the input when counting elements. We only count the elements in the intermediate objects and the output. Here’s a line-by-line count:

  1. The first line is the function signature. We’ll assume no elements appear here.
  2. The second line sets largest to the first value in the input array. We’re just setting a variable equal to a single element, i.e. the number array[0], which is one element. Since we’re copying array[0] into largest we have to count this element.
  3. The third line specifies the for loop over the elements in array. The loop runs over all \(n\) elements. Here we have to be careful that the loop isn’t running over an object with multiple elements that we haven’t already counted. In this case it’s running over the input array, whose elements we’re ignoring. We’ll thus say this line has no new elements in it.
  4. The fourth line is an if statement that checks if x > largest. There’s nothing new to count here, so we’ll again assume no new elements appear.
  5. The fifth line sets largest = x provided the if statement is satisfied. This is just setting a variable equal to a number. But since we’ve already counted largest in line (2), we shouldn’t count it again. Thus, no new elements appear on this line.
  6. Finally we return largest. Since we’ve already counted largest, no new elements appear.

In total, we’ve only counted one element being used in maximum, namely the variable largest, which always stores a single number. Again, what we really care about though is how much memory is being used, not how many elements are being counted. Assume that each element, on average, uses \(m\) bits of memory. If we denote the total memory usage of the algorithm by \(M(n)\), then we’ve just shown that maximum has a total memory usage of

\[M(n) = m.\]

We can play exactly the same tricks with memory that we did with time. Assume \(n\) is really big. Since \(M(n) = m\) doesn’t depend at all on \(n\) it’s clearly a constant function. That is, \(M(n) = O(1)\). The maximum algorithm thus uses constant asymptotic memory. We can thus finally summarize the asymptotic efficiency of the maximum algorithm as

\[T(n) = O(n), \qquad M(n) = O(1).\]

Just as with asymptotic time, asymptotic memory can be proportional to powers of \(n\), as well as to \(\log n\) or \(2^n\). Usually it’ll be proportional to a power of \(n\) unless some kind of recursion is going on. As a good rule of thumb, your asymptotic memory usage will be the size of the largest non-input object that appears in the function. This will be true as long as there aren’t recursive calls showing up. If they do show up things get more complicated.
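To contrast with maximum’s constant memory, here’s a hypothetical example (the name squares is mine) whose largest non-input object is a brand new list of size \(n\), so by the rule of thumb its memory usage is \(M(n) = O(n)\):

def squares(array):
    # Build a new list with one entry per input element. The output list
    # has n elements, so this function uses O(n) memory (and O(n) time,
    # since the comprehension loops over all n elements).
    return [x * x for x in array]

squares([1, 2, 3])  # returns [1, 4, 9]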

All this asymptotics stuff is well and good. It gives a half-decent proxy for how slow an algorithm will be or how much memory it uses. But remember that it’s doing a lot of coarse-graining. In calculating \(T(n)\) and \(M(n)\) we’re ignoring a lot of important details, like what hardware the algorithm is running on, what language it’s programmed in, whether things are being run in parallel, what else the operating system is running, etc. In practice, a much better way to estimate how slow an algorithm is, or how much memory it’s using, is to use a profiler. A profiler is a tool that runs a given routine and analyzes how much time or memory is being used, usually averaged over a bunch of runs.

In Jupyter Notebook, the easiest way to profile a function is to use the %timeit magic. A magic is any special command prefixed by % or %%. One of those special commands is timeit, which runs a given statement a bunch of times and estimates the average amount of time it takes to execute that statement. Let’s use timeit to profile our maximum function and see how long it’s taking to run. Just for fun, let’s compare it to python’s builtin max function. Theoretically both algorithms should be \(O(n)\). But what happens in practice? I’ll run both functions using the same fairly big input range(10000).

%timeit maximum(range(10000))
425 µs ± 5.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit max(range(10000))
185 µs ± 4.46 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Even though both algorithms are linear time and perform the same task, python’s builtin max function seems to run about twice as fast as our custom maximum function. What gives? Well, timeit is capturing not just the \(O(n)\) nature of the algorithm, but all the practical aspects as well: things like which processor the operations are running on, what priority the operating system gives them relative to everything else the computer is doing, whether any operations are being run in parallel, which programming language is being used, how efficiently the memory hierarchy is being managed, etc. Algorithmic complexity captures none of these important things, while profilers capture essentially all of them.

Not only can we profile the total runtime of a program, we can also profile a program line by line to see which lines are running slow. In Jupyter you can do this using the %lprun magic. We can also profile the memory taken up by a program, both as a whole and line by line. In Jupyter you can do these with the %memit and %mprun magics. Usually we’ll be more worried about runtime than memory, but it is occasionally useful to profile the memory too. One major example is when running large models on the GPU. Since GPUs have a relatively small amount of memory, we have to be careful not to run out of GPU memory, which means it’s often useful to run a memory profiler on a training or inference loop.
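Note that while %timeit works out of the box, the %lprun, %memit, and %mprun magics come from the line_profiler and memory_profiler packages, which as far as I know need to be installed separately and loaded as extensions first. Here’s a rough sketch of how that usually looks with our maximum function, assuming both packages are installed:

# Install once, e.g. with: pip install line_profiler memory_profiler
%load_ext line_profiler
%load_ext memory_profiler

# Profile maximum line by line to see which lines dominate the runtime
%lprun -f maximum maximum(range(10000))

# Estimate the peak memory used by a single call
%memit maximum(range(10000))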