Introduction to Python and NumPy II

Last modified by Xwiki VePa on 2024/02/07 07:37

Part one of this tutorial can be found here: Introduction to Python and NumPy I

Purpose

This is the second part of a tutorial designed to introduce you to Python, NumPy and plotting with Matplotlib. It is loosely based on a MATLAB tutorial provided by Prof. Todd Ehlers at the University of Tübingen, Germany. You may find you need to refer to part one to complete this tutorial.

Overview


This tutorial builds on the skills developed in part one. We will learn how to load and save data files, and about more advanced program components including loops, conditional statements and nested loops. Although these terms are likely unfamiliar, you'll soon see what they mean and why they are useful. Go ahead an open Canopy if you haven't already done so. Let's get started...

Reading and writing data


Reading (loading) and writing (saving) data files is a common operation in Python. Before we start, you should create a new directory called QG-Lab2 (or something similar) on the Desktop either by right clicking on the Desktop and selecting Create New Folder or typing the following into a Terminal window (not in Canopy ).

$ mkdir ~/Desktop/QG-Lab2

mkdir is a command to create a directory and ~{{/code}} refers to your home directory, the directory containing your files on the computer.

{{note}}
I will use '{{code language="none"}}${{/code}}' to refer to the command prompt in a **Terminal** window and '{{code language="none"}}>>>{{/code}}' to refer to the command prompt in Python/**Canopy**
{{/note}}

With the new directory in place, click and save the file {{code language="none"}}QG_demo_data.txt{{/code}}[[ >>attach:QG_demo_data.txt]]to that directory.

In order to read the file, we need to make sure **Canopy** is working in the same directory (i.e., {{code language="none"}}QG-Lab2{{/code}}) as the location of the data file. We can change directories in **Canopy** several ways, but the easiest is to type

{{code language="python"}}
>>> cd ~/Desktop/QG-Lab2
{{/code}}

In Python, a file must first be opened for reading (or writing) before data can be read.

{{panel title="Opening files in Python"}}
Opening a file in Python creates a file object that can then be used to store data to be read from or written to the disk. The file object is like a connection from Python to the file that can be used to either read or write data. In general, opening a file will look like

{{code language="py"}}
<variable> = open(<name>, <mode>)                  # Generic form for opening files in Python; Don't type this in...
{{/code}}

In this example, the {{code language="none"}}<variable>{{/code}} is the new file object, {{code language="none"}}<name>{{/code}} is the filename of either an existing or new file, and {{code language="none"}}<mode>{{/code}} would be {{code language="none"}}"r"{{/code}} to read a file or {{code language="none"}}"w"{{/code}} to write to a file.
{{/panel}}

For the example file you just downloaded, you can can open the data file by typing

{{code language="python"}}
>>> reader = open("QG_demo_data.txt", "r")          # Open the file for reading
{{/code}}

This will create a file object {{code language="none"}}reader{{/code}} that can be used to read the data from the file.

{{panel title="Reading files in Python"}}
As you might expect, data can be read from a file in Python in several ways

{{code language="py"}}
<file>.read()           # Read the entire contents of the file into a single (long) character string

<file>.readline()       # Read the next line from the data file

<file>.readlines()      # Read the entire contents of the file into a list of the length of the number of lines in the file
{{/code}}

Which type of file read you will want to use will depend on the situation.
{{/panel}}

To read the data from our file object {{code language="none"}}reader{{/code}}, we can type

{{code language="py"}}
>>> data = reader.read()       # Read the entire data file into the variable data
{{/code}}

This will read the entire file into the variable data. We can take a look at the data that was read by typing

{{code language="py"}}
>>> print(data)                # Print the contents of data
{{/code}}

If all has gone well, you should see a header line and 9 lines of data from the input file {{code language="none"}}QG_demo_data.txt{{/code}}. Lastly, the we can close the file by typing

{{code language="py"}}
>>> reader.close()             # Close the file
{{/code}}

It is always a good idea to close open files.

(% style="font-size: 10.0pt;line-height: 13.0pt;" %)Writing to files in Python is similarly simple. We can write the information read into the variable {{code language="none"}}data{{/code}} to a file using a similar approach to reading the data.

{{code language="python"}}
>>> writer = open("QG_test_write.txt", "w")    # Open file QG_test_write.txt and create the file object writer
>>> writer.write(data)                         # Write the contents of variable data to the file
>>> writer.close()                             # Close the file
{{/code}}

(% style="font-size: 10.0pt;line-height: 13.0pt;" %)The {{code language="none"}}QG_test_write.txt{{/code}} file is first opened for writing ({{code language="none"}}"w"{{/code}}) after which the contents of the {{code language="none"}}data{{/code}} variable are written to the file using the {{code language="none"}}writer.write(){{/code}} function before finally closing the file. In this case, we've essentially copied the file {{code language="none"}}QG_demo_data.txt{{/code}} to a new file called {{code language="none"}}QG_test_write.txt{{/code}}. You can verify this in **Canopy** using the file browser on the left side of the **Canopy** window to go to the directory {{code language="none"}}Desktop/QG-Lab2{{/code}}. In there, you can double click on the {{code language="none"}}QG_test_write.txt{{/code}} file to view its contents in the **Canopy** editor.

=== Loops ===

----

Loops allow you to go through entries in a list or array and modify or use them individually.

{{panel title="for Loops in Python"}}
Loops allow sections of code to be executed a number of times. The simplest type of code loops are called definite loops, in which a section of code will be executed a set number of times. A common loop of this type is a for loop, which has the format

{{code language="py"}}
for <var> in <sequence>:
    <code to execute>
    ...
{{/code}}

You can think of this code as saying for each item in {{code language="none"}}<sequence>{{/code}}, assign the value to the item to the variable {{code language="none"}}<var>{{/code}} and execute the {{code language="none"}}<code to execute>{{/code}} using that value.

{{note}}
Code that should be executed in a loop in Python is indicated by being indented by a consistent number of spaces. Four spaces is typical. Code beneath (and outside of) a loop should not be indented.
{{/note}}

To clarify how this works, let's consider an example of a loop that will output numbers from 1 to 5 to the screen in order.

{{code language="py"}}
for number in [1, 2, 3, 4, 5]:
    print(number)
{{/code}}

Here, the code will iterate over each number in the list {{code language="none"}}[1, 2, 3, 4, 5]{{/code}} and use the {{code language="none"}}print{{/code}} function to display the value on the screen. In the first iteration, the variable {{code language="none"}}number{{/code}} is assigned the value of the first item in the list ({{code language="none"}}1{{/code}}). The code then prints that value to the screen and that iteration of the loop is complete. Next, the code will perform the same thing for the next item in the list (2). This continues until the list ends.

Another way to use a {{code language="none"}}for{{/code}} loop to run code a finite number of times is to use the {{code language="none"}}range{{/code}} function to define a list of a given length. This kind of loop has the form

{{code language="py"}}
for <variable> in range(<expr>):
    <code to execute>
    ...
{{/code}}

It is probably easiest to see how this kind of loop works with an example. Like the one above, this example will display the numbers 1-5 on the screen.

{{code language="py"}}
for number in range(5):
    print(number+1)
{{/code}}

So, how does this work? The {{code language="none"}}range{{/code}} function will produce a list of integers starting at 0 and ending at {{code language="none"}}<expr>-1{{/code}}, increasing by increments of 1 unless otherwise specified. {{code language="none"}}<expr>{{/code}} is a mathematical expression, which could simply be a number or variable, or a more complicated expression. There are two things to highlight about the {{code language="none"}}range{{/code}} function: (1) the range does not include the value of {{code language="none"}}<expr>{{/code}}, but {{code language="none"}}<expr>-1{{/code}} by default, and (2) the length of the list produced by the {{code language="none"}}range{{/code}} function will be equal to the value of {{code language="none"}}<expr>{{/code}}. If you'd like to see the list that {{code language="none"}}range{{/code}} produces, you can simply type.

{{code language="py"}}
>>> range(5)
{{/code}}

This is the reason we need to {{code language="none"}}print{{/code}} {{code language="none"}}number+1{{/code}} in this example.
{{/panel}}

Let's look now at a slightly more complicated example. We have an array {{code language="none"}}x{{/code}} and would like to create a new array {{code language="none"}}xnew{{/code}} where the value at each location in {{code language="none"}}xnew{{/code}} is the sum of all of the values above that same location in {{code language="none"}}x{{/code}}. We can do that using a Python {{code language="none"}}for{{/code}} loop.

{{code language="python"}}
# sum_x.py
#
# This program creates a new array from an existing array using a for loop
#
# dwhipp - 18.3.15

# Import NumPy
import numpy as np
import matplotlib.pyplot as plt

def main():
    x = np.linspace(0,100,51)   # Create x numpy array from 0-100 in 51 steps
    xnew = np.zeros(len(x))     # Create numpy array of zero values with same size as array x
    xold = 0.0                  # Set xold to zero to start
    for i in range(len(x)):     # Loop over all values in array x, calculate xnew, store xold
        xnew[i] = x[i] + xold
        xold = xnew[i]

    # Plot the results of x versus xnew
    plt.plot(x,xnew,'r-')
    plt.xlabel("x")
    plt.ylabel("xnew")
    plt.title("for loop results")
    plt.show()

main() 
{{/code}}

As you can see, after we create the array {{code language="none"}}x{{/code}} and an array full of zeros called {{code language="none"}}xnew{{/code}}, we then loop over each value in {{code language="none"}}x{{/code}} to calculate the running sum. There are two new things here that should be pointed out. First, we've used a different expression than just a number for the {{code language="none"}}range{{/code}} function in this loop.

{{code language="py"}}
for i in range(len(x)):
{{/code}}

We have used the {{code language="none"}}len{{/code}} function here. The {{code language="none"}}len{{/code}} function returns a number equal to the length of an array or list. In this case, we want to loop over all values in the array {{code language="none"}}x{{/code}}, so we use a range of length {{code language="none"}}len(x){{/code}}.

Second, we are filling in the values at specific locations in the array {{code language="none"}}xnew{{/code}} by using the for loop variable, or loop index, {{code language="none"}}i{{/code}}.

{{code language="py"}}
    xnew[i] = x[i] + xold
{{/code}}

In each iteration of the loop, the value at position {{code language="none"}}i{{/code}} of the array {{code language="none"}}xnew{{/code}} is set equal to the value at position {{code language="none"}}i{{/code}} of the array {{code language="none"}}x{{/code}} plus the sum of all past values of {{code language="none"}}x{{/code}}, {{code language="none"}}xold{{/code}}. This kind of reference to a specific location in an array or list is extremely common in Python, and makes it easy to store values in arrays efficiently.

{{note}}
Python arrays and lists start at index value 0. The index value is the number corresponding to the location in the array or list. Lists and arrays starting with index 0 is one of the reasons that the {{code language="none"}}range{{/code}} function produces lists of numbers starting at 0, which makes it easy to loop over arrays or lists by their index value.
{{/note}}

=== Conditional statements ===

----

Like loops, conditional statements are very powerful program components that allow you to change what the program does based on logical tests.

{{panel title="if statements in Python"}}
if statements allow different parts of a code to be executed depending on the evaluation of a logical test. The general form of an if statement is

{{code language="py"}}
if <condition>:
    <code to execute>
    ...
{{/code}}

We can translate this code as saying that if {{code language="none"}}<condition>{{/code}} is true, the {{code language="none"}}<code to execute>{{/code}} will be executed. If {{code language="none"}}<condition>{{/code}} is false, the {{code language="none"}}<code to execute>{{/code}} will be passed over. Consider the example below

{{code language="py"}}
# absolute_value.py
 
def main():
    number = input("Please enter a number: ")
    if number < 0.0:
        number = -number
    print("The absolute value is "+str(number))
 
main()
{{/code}}

OK, so there are a few new things here, but the use of the {{code language="none"}}if{{/code}} statement should be pretty clear. In this case, we use the {{code language="none"}}input{{/code}} function to prompt the user to enter a number. That value is stored as {{code language="none"}}number{{/code}}. The {{code language="none"}}if{{/code}} statement tests to see if the number is negative ({{code language="none"}}if number < 0.0{{/code}}; {{code language="none"}}<{{/code}} is the Python operator to see if a value is less than the value to the right). If the number is negative (i.e., the test is true), then number will be converted to a positive value. If the test is false, the number is already positive and will not be converted. Regardless, the absolute value is displayed on the screen at the end of the program.

{{info}}
The {{code language="none"}}input{{/code}} function can be used to prompt a user to type something to be stored as a variable. The input could be a number, some text or even True or False.
{{/info}}

{{info}}
The {{code language="none"}}str{{/code}} function is used to convert values to type {{code language="none"}}str{{/code}}, a character string. This is most often useful when you want to display some numbers along with text in a print statement or as text on a plot, such as in the example above.
{{/info}}
{{/panel}}

{{panel title="Other conditional statements"}}
In addition to the {{code language="none"}}if{{/code}} statement, we can use other conditional statements to test for other conditions. Perhaps you want one part of a code to be executed if an {{code language="none"}}if{{/code}} statement is true and another part to be executed when the statement is false. For this we can use the {{code language="none"}}else{{/code}} statement. The {{code language="none"}}else{{/code}} statement takes the form

{{code language="py"}}
if <condition>:
    <code to execute when true>:
else:
    <code to execute when false>
{{/code}}

As you can well imagine, {{code language="none"}}<code to execute when false>{{/code}} beneath the else statement will be run when the if statement is false.

We can also use the {{code language="none"}}elif{{/code}} condition to perform additional if tests. For example, we might want to test first if a number is less than 5, then whether it is less than 10, and execute different code either way. We could do this using an {{code language="none"}}elif{{/code}} condition as shown below

{{code language="py"}}
if number < 5:
    <code to execute for numbers below 5>
elif number < 10:
    <code to execute for numbers below 10>
{{/code}}
{{/panel}}

{{panel title="Relational operators in Python"}}
There are a number of relational (comparison) operators in Python for comparing values. Comparisons take the form

{{code language="py"}}
<expr> <relop> <expr>
{{/code}}

where {{code language="none"}}<expr>{{/code}} is an expression and {{code language="none"}}<relop>{{/code}} is a relational operator. We've already seen the less than comparison {{code language="none"}}<{{/code}}. Below is a complete list of relational operators.

{{code language="py"}}
<   # Less than
<=  # Less than or equal to
==  # Equal to
>=  # Greater than or equal to
>   # Greater than
!=  # Not equal to
{{/code}}

\\
{{/panel}}

For example, in the previous section, we plotted the values of {{code language="none"}}x{{/code}} versus {{code language="none"}}xnew{{/code}} as a red line, but perhaps we are interested in highlighting values of {{code language="none"}}xnew{{/code}} that are between 2000-5000, and also those values of {{code language="none"}}xnew{{/code}} that are greater than 5000. We can modify the code above to do that using the conditional statements {{code language="none"}}if{{/code}}, {{code language="none"}}elif{{/code}} and {{code language="none"}}else{{/code}}.

{{code language="python"}}
# sum_x_cond.py
#
# This program creates a new array from an existing array using a for loop
# and conditional statements
#
# dwhipp - 18.3.15
 
# Import NumPy
import numpy as np
import matplotlib.pyplot as plt

def main():
    x = np.linspace(0,100,51)   # Create x numpy array from 0-100 in 51 steps
    xnew = np.zeros(len(x))     # Create numpy arrays of zero values with same size as array x
    xmid  = np.zeros(len(x))
    xhigh = np.zeros(len(x))
    xold = 0.0                  # Set xold to zero to start
    for i in range(len(x)):     # Loop over all values in array x, calculate xnew, store xold
        xnew[i] = x[i] + xold
        # Do this if xnew is less than 500
        if xnew[i] < 500:
            # Not in middle or high range, set xmid and xhigh to not a number (nan)
            xmid[i]  = nan
            xhigh[i] = nan
        # Do this if xnew is less than or equal to 1500
        elif xnew[i] <= 1500:
            # In middle range, set xmid to current xnew, set xhigh to not a number (nan)
            xmid[i] = xnew[i]
            xhigh[i] = nan
        # Do this if xnew is greater than 1500
        else:
            # In high range, set xmid to not a number (nan), set xhigh to current xnew
            xmid[i]  = nan
            xhigh[i] = xnew[i]
        xold = xnew[i]

    # Plot the results of x versus xnew
    plt.plot(x,xnew,'r-')
    plt.xlabel("x")
    plt.ylabel("xnew")
    plt.title("for loop results")
    # Plot xmid values as black circles
    plt.plot(x,xmid,'ko')
    # Plot xhigh values as blue stars
    plt.plot(x,xhigh,'b*')
    plt.show()

main()
{{/code}}

Conditional statements allow us to manipulate subsets of the variables based, for example, on their value. In this case, if a value is not within one of the ranges of interest, it is assigned a value of {{code language="none"}}nan{{/code}} (not a number). We'll get a better sense of the power of conditional statements as the course progresses.

{{note}}
Just like functions and loops, the contents of a conditional statement should be indented, typically by 4 spaces
{{/note}}

=== Exercises ===

----

These exercises will help you continue to develop your Python skills and to learn a bit more about reading data files and plotting age data. Feel free to refer back to the [[first Python tutorial>>doc:Introduction to Python and NumPy I]] as needed. For each exercise, you will be asked to submit a copy of your Python code, a plot and answers to a few questions.

===== Exercise 1 - Files, loops and bedrock age plots? =====

For this exercise, you are given [[a broken Python script>>attach:read_and_plot_data.py]] for reading and plotting thermochronometer age data. Your job, with your newly acquired Python skills, is to **fix the script** so that it does what says it should in the comments. You will need a [[copy of the script>>attach:read_and_plot_data.py]] and [[the data file>>attach:Coutand2014_AFT_ages.txt]] to proceed. Data for this exercise comes from a [[recent paper published on the exhumation of the Himalaya in Bhutan>>url:http://dx.doi.org/10.1002/2013JB010891||shape="rect"]], in case you're curious. Your modified script should:

1. Read the data file and split the data into separate arrays for each variable. A hint about splitting character strings is listed at the end of this exercise.
1. (((
Calculate the goodness of fit between the measured and predicted ages in the data file. The goodness of fit equation you should use is

{{mathdisplay}}
\chi^{2} = \frac{1}{n} \sum \frac{(O - E)^{2}}{\sigma^{2}}
{{/mathdisplay}}

where //n// is the number of ages, //O// is the measured age, //E// is the predicted age and //σ// is the standard deviation.
)))
1. Produce a plot of the measured ages with their error bars and the predicted ages, both as a function of latitude. Be sure to include axis labels and a title

In addition to modifying the script, you should submit answers to the following questions:

1. Looking at your plot and without looking at the goodness of fit value, how well would you say the predicted ages fit the measured ages in this example? Is this difficult to do? Why or why not?
1. How well would you say the predicted ages fit the measurements using the calculated goodness of fit? Is your calculated goodness of fit intuitive to use? Why or why not?

**Files for this exercise**

* [[Python script for Exercise 1>>attach:read_and_plot_data.py]]
* [[Data file for Exercise 1>>attach:Coutand2014_AFT_ages.txt]]

**For this exercise, you should submit a copy of your modified Python script, the plot it produces and answers to the questions above.**

**Hints**

{{panel title="Splitting character strings"}}
To understand how to manipulate the data read from a file, it is important to establish how the data is stored in the variable. If you read the contents of a file into a variable {{code language="none"}}data,{{/code}} we can start by checking the type

{{code language="python"}}
>>> type(data)
str
{{/code}}

What we can see is that the variable data is a single string variable, meaning the contents of the file have been read as one long string of characters. If we want to manipulate data, we'll need to modify it. We can start by making a list out of the data, with the contents of each line divided into separate entries in the list. This can be done by typing

{{code language="python"}}
>>> data = data.split()
>>> type(data)
list
>>> print(data)
...
{{/code}}

With the line broken into separate list values, we can access a given value in the list by using the index value for that item. For example, we can find the first item in the list data by typing

{{code language="py"}}
>>> data[0]
{{/code}}

Similarly, the second item in the list data can be accessed by typing

{{code language="py"}}
>>> data[1]
{{/code}}
{{/panel}}

{{panel title="Looping over lines in a file"}}
In this exercise we see a slightly different type of {{code language="none"}}for{{/code}} loop.

{{code language="py"}}
for line in contents:
{{/code}}

The list {{code language="none"}}contents{{/code}} contains a character string for each line in the data file. Because {{code language="none"}}contents{{/code}} is a list, we can use a {{code language="none"}}for{{/code}} loop similarly to how we might with numbers in a list, and loop over each line in the data file. Thus, each iteration in this loop, {{code language="none"}}line{{/code}} will contain a character string of the data from one line of the data file. This is a quite handy feature in Python.
{{/panel}}

===== (% style="color: rgb(61,61,61);line-height: 1.6666666;" %)**Exercise 2 - This one is iffy**(%%) =====

For this exercise, add a conditional statement to your modified code from Exercise 1 that changes the formatting of the plot as a function of one of the plotted values. You might consider changing the line color, symbol type or some other aspect of the plot. For this exercise, you should

1. Provide a brief description of the change you made to the code and what it does.
1. Provide a copy of your modified code.
1. Provide a plot demonstrating the modified plotting. Be sure it is clear that something has changed!
1. Write a caption for your plot describing it as if it were in a scientific journal. You should clearly mention what is plotted, the symbol types/colors and any other particular items in the plot such as changes related to your conditional statement.

**For this exercise, you should submit a copy of your modified Python script, a description of what was changed in the code, **the plot it produces and a caption for the plot.