Page tree
Skip to end of metadata
Go to start of metadata

Part one of this tutorial can be found here: Introduction to Python and NumPy I

Table of contents

Purpose

This is the second part of a tutorial designed to introduce you to Python, NumPy and plotting with Matplotlib. It is loosely based on a MATLAB tutorial provided by Prof. Todd Ehlers at the University of Tübingen, Germany. You may find you need to refer to part one to complete this tutorial.

Overview


This tutorial builds on the skills developed in part one. We will learn how to load and save data files, and about more advanced program components including loops, conditional statements and nested loops. Although these terms are likely unfamiliar, you'll soon see what they mean and why they are useful. Go ahead an open Canopy if you haven't already done so. Let's get started...

Reading and writing data


Reading (loading) and writing (saving) data files is a common operation in Python. Before we start, you should create a new directory called QG-Lab2 (or something similar) on the Desktop either by right clicking on the Desktop and selecting Create New Folder or typing the following into a Terminal window (not in Canopy ).

$ mkdir ~/Desktop/QG-Lab2

mkdir is a command to create a directory and ~ refers to your home directory, the directory containing your files on the computer.

I will use '$' to refer to the command prompt in a Terminal window and '>>>' to refer to the command prompt in Python/Canopy

With the new directory in place, click and save the file QG_demo_data.txt to that directory.

In order to read the file, we need to make sure Canopy is working in the same directory (i.e., QG-Lab2) as the location of the data file. We can change directories in Canopy several ways, but the easiest is to type

>>> cd ~/Desktop/QG-Lab2

In Python, a file must first be opened for reading (or writing) before data can be read.

Opening files in Python

Opening a file in Python creates a file object that can then be used to store data to be read from or written to the disk. The file object is like a connection from Python to the file that can be used to either read or write data. In general, opening a file will look like

<variable> = open(<name>, <mode>)                  # Generic form for opening files in Python; Don't type this in...

In this example, the <variable> is the new file object, <name> is the filename of either an existing or new file, and <mode> would be "r" to read a file or "w" to write to a file.

For the example file you just downloaded, you can can open the data file by typing

>>> reader = open("QG_demo_data.txt", "r")          # Open the file for reading

This will create a file object reader that can be used to read the data from the file.

Reading files in Python

As you might expect, data can be read from a file in Python in several ways

<file>.read()           # Read the entire contents of the file into a single (long) character string
 
<file>.readline()       # Read the next line from the data file
 
<file>.readlines()      # Read the entire contents of the file into a list of the length of the number of lines in the file

Which type of file read you will want to use will depend on the situation.

To read the data from our file object reader, we can type

>>> data = reader.read()       # Read the entire data file into the variable data

This will read the entire file into the variable data. We can take a look at the data that was read by typing

>>> print(data)                # Print the contents of data

If all has gone well, you should see a header line and 9 lines of data from the input file QG_demo_data.txt. Lastly, the we can close the file by typing

>>> reader.close()             # Close the file

It is always a good idea to close open files.

Writing to files in Python is similarly simple. We can write the information read into the variable data to a file using a similar approach to reading the data.

>>> writer = open("QG_test_write.txt", "w")    # Open file QG_test_write.txt and create the file object writer
>>> writer.write(data)                         # Write the contents of variable data to the file
>>> writer.close()                             # Close the file

The QG_test_write.txt file is first opened for writing ("w") after which the contents of the data variable are written to the file using the writer.write() function before finally closing the file. In this case, we've essentially copied the file QG_demo_data.txt to a new file called QG_test_write.txt. You can verify this in Canopy using the file browser on the left side of the Canopy window to go to the directory Desktop/QG-Lab2. In there, you can double click on the QG_test_write.txt file to view its contents in the Canopy editor.

Loops


Loops allow you to go through entries in a list or array and modify or use them individually.

for Loops in Python

Loops allow sections of code to be executed a number of times. The simplest type of code loops are called definite loops, in which a section of code will be executed a set number of times. A common loop of this type is a for loop, which has the format

for <var> in <sequence>:
    <code to execute>
    ...

You can think of this code as saying for each item in <sequence>, assign the value to the item to the variable <var> and execute the <code to execute> using that value.

Code that should be executed in a loop in Python is indicated by being indented by a consistent number of spaces. Four spaces is typical. Code beneath (and outside of) a loop should not be indented.

To clarify how this works, let's consider an example of a loop that will output numbers from 1 to 5 to the screen in order.

for number in [1, 2, 3, 4, 5]:
    print(number)

Here, the code will iterate over each number in the list [1, 2, 3, 4, 5] and use the print function to display the value on the screen. In the first iteration, the variable number is assigned the value of the first item in the list (1). The code then prints that value to the screen and that iteration of the loop is complete. Next, the code will perform the same thing for the next item in the list (2). This continues until the list ends.

Another way to use a for loop to run code a finite number of times is to use the range function to define a list of a given length. This kind of loop has the form

for <variable> in range(<expr>):
    <code to execute>
    ...

It is probably easiest to see how this kind of loop works with an example. Like the one above, this example will display the numbers 1-5 on the screen.

for number in range(5):
    print(number+1)

So, how does this work? The range function will produce a list of integers starting at 0 and ending at <expr>-1, increasing by increments of 1 unless otherwise specified. <expr> is a mathematical expression, which could simply be a number or variable, or a more complicated expression. There are two things to highlight about the range function: (1) the range does not include the value of <expr>, but <expr>-1 by default, and (2) the length of the list produced by the range function will be equal to the value of <expr>. If you'd like to see the list that range produces, you can simply type.

>>> range(5)

This is the reason we need to print number+1 in this example.

Let's look now at a slightly more complicated example. We have an array x and would like to create a new array xnew where the value at each location in xnew is the sum of all of the values above that same location in x. We can do that using a Python for loop.

# sum_x.py
#
# This program creates a new array from an existing array using a for loop
#
# dwhipp - 18.3.15
 
# Import NumPy
import numpy as np
import matplotlib.pyplot as plt
 
def main():
    x = np.linspace(0,100,51)   # Create x numpy array from 0-100 in 51 steps
    xnew = np.zeros(len(x))     # Create numpy array of zero values with same size as array x
    xold = 0.0                  # Set xold to zero to start
    for i in range(len(x)):     # Loop over all values in array x, calculate xnew, store xold
        xnew[i] = x[i] + xold
        xold = xnew[i]

    # Plot the results of x versus xnew
    plt.plot(x,xnew,'r-')
    plt.xlabel("x")
    plt.ylabel("xnew")
    plt.title("for loop results")
    plt.show()

main() 

As you can see, after we create the array x and an array full of zeros called xnew, we then loop over each value in x to calculate the running sum. There are two new things here that should be pointed out. First, we've used a different expression than just a number for the range function in this loop.

for i in range(len(x)):

We have used the len function here. The len function returns a number equal to the length of an array or list. In this case, we want to loop over all values in the array x, so we use a range of length len(x).

Second, we are filling in the values at specific locations in the array xnew by using the for loop variable, or loop index, i.

    xnew[i] = x[i] + xold

In each iteration of the loop, the value at position i of the array xnew is set equal to the value at position i of the array x plus the sum of all past values of x, xold. This kind of reference to a specific location in an array or list is extremely common in Python, and makes it easy to store values in arrays efficiently.

Python arrays and lists start at index value 0. The index value is the number corresponding to the location in the array or list. Lists and arrays starting with index 0 is one of the reasons that the range function produces lists of numbers starting at 0, which makes it easy to loop over arrays or lists by their index value.

Conditional statements


Like loops, conditional statements are very powerful program components that allow you to change what the program does based on logical tests.

if statements in Python

if statements allow different parts of a code to be executed depending on the evaluation of a logical test. The general form of an if statement is

if <condition>:
    <code to execute>
    ...

We can translate this code as saying that if <condition> is true, the <code to execute> will be executed. If <condition> is false, the <code to execute> will be passed over. Consider the example below

# absolute_value.py
 
def main():
    number = input("Please enter a number: ")
    if number < 0.0:
        number = -number
    print("The absolute value is "+str(number))
 
main()

OK, so there are a few new things here, but the use of the if statement should be pretty clear. In this case, we use the input function to prompt the user to enter a number. That value is stored as number. The if statement tests to see if the number is negative (if number < 0.0; < is the Python operator to see if a value is less than the value to the right). If the number is negative (i.e., the test is true), then number will be converted to a positive value. If the test is false, the number is already positive and will not be converted. Regardless, the absolute value is displayed on the screen at the end of the program.

The input function can be used to prompt a user to type something to be stored as a variable. The input could be a number, some text or even True or False.

The str function is used to convert values to type str, a character string. This is most often useful when you want to display some numbers along with text in a print statement or as text on a plot, such as in the example above.

Other conditional statements

In addition to the if statement, we can use other conditional statements to test for other conditions. Perhaps you want one part of a code to be executed if an if statement is true and another part to be executed when the statement is false. For this we can use the else statement. The else statement takes the form

if <condition>:
    <code to execute when true>:
else:
    <code to execute when false>

As you can well imagine, <code to execute when false> beneath the else statement will be run when the if statement is false.

We can also use the elif condition to perform additional if tests. For example, we might want to test first if a number is less than 5, then whether it is less than 10, and execute different code either way. We could do this using an elif condition as shown below

if number < 5:
    <code to execute for numbers below 5>
elif number < 10:
    <code to execute for numbers below 10>
Relational operators in Python

There are a number of relational (comparison) operators in Python for comparing values. Comparisons take the form

<expr> <relop> <expr>

where <expr> is an expression and <relop> is a relational operator. We've already seen the less than comparison <. Below is a complete list of relational operators.

<   # Less than
<=  # Less than or equal to
==  # Equal to
>=  # Greater than or equal to
>   # Greater than
!=  # Not equal to


For example, in the previous section, we plotted the values of x versus xnew as a red line, but perhaps we are interested in highlighting values of xnew that are between 2000-5000, and also those values of xnew that are greater than 5000. We can modify the code above to do that using the conditional statements if, elif and else.

# sum_x_cond.py
#
# This program creates a new array from an existing array using a for loop
# and conditional statements
#
# dwhipp - 18.3.15
 
# Import NumPy
import numpy as np
import matplotlib.pyplot as plt
 
def main():
    x = np.linspace(0,100,51)   # Create x numpy array from 0-100 in 51 steps
    xnew = np.zeros(len(x))     # Create numpy arrays of zero values with same size as array x
    xmid  = np.zeros(len(x))
    xhigh = np.zeros(len(x))
    xold = 0.0                  # Set xold to zero to start
    for i in range(len(x)):     # Loop over all values in array x, calculate xnew, store xold
        xnew[i] = x[i] + xold
        # Do this if xnew is less than 500
        if xnew[i] < 500:
            # Not in middle or high range, set xmid and xhigh to not a number (nan)
            xmid[i]  = nan
            xhigh[i] = nan
        # Do this if xnew is less than or equal to 1500
        elif xnew[i] <= 1500:
            # In middle range, set xmid to current xnew, set xhigh to not a number (nan)
            xmid[i] = xnew[i]
            xhigh[i] = nan
        # Do this if xnew is greater than 1500
        else:
            # In high range, set xmid to not a number (nan), set xhigh to current xnew
            xmid[i]  = nan
            xhigh[i] = xnew[i]
        xold = xnew[i]

    # Plot the results of x versus xnew
    plt.plot(x,xnew,'r-')
    plt.xlabel("x")
    plt.ylabel("xnew")
    plt.title("for loop results")
    # Plot xmid values as black circles
    plt.plot(x,xmid,'ko')
    # Plot xhigh values as blue stars
    plt.plot(x,xhigh,'b*')
    plt.show()

main()

Conditional statements allow us to manipulate subsets of the variables based, for example, on their value. In this case, if a value is not within one of the ranges of interest, it is assigned a value of nan (not a number). We'll get a better sense of the power of conditional statements as the course progresses.

Just like functions and loops, the contents of a conditional statement should be indented, typically by 4 spaces

Exercises


These exercises will help you continue to develop your Python skills and to learn a bit more about reading data files and plotting age data. Feel free to refer back to the first Python tutorial as needed. For each exercise, you will be asked to submit a copy of your Python code, a plot and answers to a few questions.

Exercise 1 - Files, loops and bedrock age plots?

For this exercise, you are given a broken Python script for reading and plotting thermochronometer age data. Your job, with your newly acquired Python skills, is to fix the script so that it does what says it should in the comments. You will need a copy of the script and the data file to proceed. Data for this exercise comes from a recent paper published on the exhumation of the Himalaya in Bhutan, in case you're curious. Your modified script should:

  1. Read the data file and split the data into separate arrays for each variable. A hint about splitting character strings is listed at the end of this exercise.
  2. Calculate the goodness of fit between the measured and predicted ages in the data file. The goodness of fit equation you should use is

    \[ \chi^{2} = \frac{1}{n} \sum \frac{(O - E)^{2}}{\sigma^{2}} \]

    where n is the number of ages, O is the measured age, E is the predicted age and σ is the standard deviation.

  3. Produce a plot of the measured ages with their error bars and the predicted ages, both as a function of latitude. Be sure to include axis labels and a title

In addition to modifying the script, you should submit answers to the following questions:

  1. Looking at your plot and without looking at the goodness of fit value, how well would you say the predicted ages fit the measured ages in this example? Is this difficult to do? Why or why not?
  2. How well would you say the predicted ages fit the measurements using the calculated goodness of fit? Is your calculated goodness of fit intuitive to use? Why or why not?

Files for this exercise

For this exercise, you should submit a copy of your modified Python script, the plot it produces and answers to the questions above.

Hints

Splitting character strings

To understand how to manipulate the data read from a file, it is important to establish how the data is stored in the variable. If you read the contents of a file into a variable data, we can start by checking the type

>>> type(data)
str

What we can see is that the variable data is a single string variable, meaning the contents of the file have been read as one long string of characters. If we want to manipulate data, we'll need to modify it. We can start by making a list out of the data, with the contents of each line divided into separate entries in the list. This can be done by typing

>>> data = data.split()
>>> type(data)
list
>>> print(data)
...

With the line broken into separate list values, we can access a given value in the list by using the index value for that item. For example, we can find the first item in the list data by typing

>>> data[0]

Similarly, the second item in the list data can be accessed by typing

>>> data[1]
Looping over lines in a file

In this exercise we see a slightly different type of for loop.

for line in contents:

The list contents contains a character string for each line in the data file. Because contents is a list, we can use a for loop similarly to how we might with numbers in a list, and loop over each line in the data file. Thus, each iteration in this loop, line will contain a character string of the data from one line of the data file. This is a quite handy feature in Python.

Exercise 2 - This one is iffy

For this exercise, add a conditional statement to your modified code from Exercise 1 that changes the formatting of the plot as a function of one of the plotted values. You might consider changing the line color, symbol type or some other aspect of the plot. For this exercise, you should

  1. Provide a brief description of the change you made to the code and what it does.
  2. Provide a copy of your modified code.
  3. Provide a plot demonstrating the modified plotting. Be sure it is clear that something has changed!
  4. Write a caption for your plot describing it as if it were in a scientific journal. You should clearly mention what is plotted, the symbol types/colors and any other particular items in the plot such as changes related to your conditional statement.

For this exercise, you should submit a copy of your modified Python script, a description of what was changed in the code, the plot it produces and a caption for the plot.