Introduction to Python for sciences 2020

Acquire a strong basis in Python to use it efficiently

Pierre Augier (LEGI), Cyrille Bonamy (LEGI), Eric Maldonado (Irstea), Franck Thollard (ISTerre), Christophe Picard (LJK), Loïc Huder (ISTerre)

Modules, import statement and the standard library

Import code from other files

Useful for:

  • reusing code
  • organizing code
  • distributing code (not detailed here)

Modules and packages

  • A module is a Python file that can be imported.
  • A package is a directory containing module(s).
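A minimal sketch of a package, assuming the classic layout with an __init__.py file (since Python 3.3, a directory without __init__.py can also be imported as a "namespace package"). The names mypkg, geometry and area are hypothetical, and the files are written to a temporary directory so that the snippet runs anywhere:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a hypothetical package "mypkg" in a temporary directory:
# mypkg/__init__.py and mypkg/geometry.py
root = Path(tempfile.mkdtemp())
package_dir = root / 'mypkg'
package_dir.mkdir()
(package_dir / '__init__.py').write_text('')  # marks mypkg as a regular package
(package_dir / 'geometry.py').write_text(
    'def area(width, height):\n'
    '    return width * height\n'
)

# Make the parent directory of the package importable
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from mypkg import geometry       # import a module of the package
from mypkg.geometry import area  # or import a name defined in that module

print(geometry.area(2, 3), area(4, 5))  # 6 20
```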

Multi-file program (example with 2 files) and imports

  • ../pyfiles/example0/util.py
  • ../pyfiles/example0/prog.py
In [1]:
with open('../pyfiles/example0/util.py') as file:
    print(file.read())
print('begin of util.py')
myvar0 = 0
myvar1 = 1

def print_variables():
    print(f'in function print_variables: myvar0 = {myvar0}; myvar1 = {myvar1}')

In [2]:
run ../pyfiles/example0/util.py
begin of util.py

In [3]:
with open('../pyfiles/example0/prog.py') as file:
    print(file.read())
# 2 different syntaxes for importing a module
import util
from util import myvar1, print_variables

util.myvar0 = 100
myvar1 += 100
print(f'in prog.py, util.myvar0 = {util.myvar0}; myvar1 = {myvar1}')
print_variables()

In [4]:
run ../pyfiles/example0/prog.py
begin of util.py
in prog.py, util.myvar0 = 100; myvar1 = 101
in function print_variables: myvar0 = 100; myvar1 = 1

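Note that from util import myvar1 creates in prog.py a new name myvar1 bound to the same object as util.myvar1. The statement myvar1 += 100 rebinds only this new name, which is why print_variables() still prints myvar1 = 1, whereas util.myvar0 = 100 modifies the attribute of the module itself.

Warning: files imported more than once are executed only once per process (here, "begin of util.py" is printed only once although util is imported twice).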
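The reason is that an imported module object is cached in the dictionary sys.modules, and later import statements simply fetch it from there. A small sketch (the module name once_demo is hypothetical and the file is created in a temporary directory):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway module "once_demo.py" written to a temporary directory
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / 'once_demo.py').write_text("print('executing once_demo.py')\n")
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

import once_demo  # prints: executing once_demo.py
import once_demo  # prints nothing: the module object is reused from sys.modules

print(sys.modules['once_demo'] is once_demo)  # True

# importlib.reload explicitly executes the file again (prints a second time)
once_demo = importlib.reload(once_demo)
```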

Multi-file program (example with 2 files) and imports

if __name__ == "__main__": ...

  • ../pyfiles/example1/util.py
  • ../pyfiles/example1/prog.py
In [20]:
with open('../pyfiles/example1/util.py') as file:
    print(file.read())
print('begin of util.py')
myvar0 = 0
myvar1 = 1

def print_variables():
    print(f'in function print_variables: myvar0 = {myvar0}; myvar1 = {myvar1}')

print('in util.py, __name__ =', __name__)
# __name__ is a special variable always defined.
# its value depends on how the file is called (directly executed or imported)
if __name__ == '__main__':
    # this code is executed only if the file is directly executed
    print('the module util.py has been directly executed')
    print_variables()
    print('end of util.py')
else:
    print('the module util.py has been imported')

In [21]:
run ../pyfiles/example1/util.py
begin of util.py
in util.py, __name__ = __main__
the module util.py has been directly executed
in function print_variables: myvar0 = 0; myvar1 = 1
end of util.py
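Here util.py was executed directly, so __name__ was '__main__'. The sketch below shows both cases with a hypothetical module name_demo.py written to a temporary directory: a regular import sets __name__ to the module name, while runpy.run_path(..., run_name='__main__') executes the file the way a script is executed (similar to python name_demo.py):

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

source = '''
print('in name_demo.py, __name__ =', __name__)
if __name__ == '__main__':
    print('the module has been directly executed')
else:
    print('the module has been imported')
'''
# Hypothetical module written to a temporary directory
tmp_dir = Path(tempfile.mkdtemp())
module_path = tmp_dir / 'name_demo.py'
module_path.write_text(source)

# Case 1: import, so __name__ is the module name ('name_demo')
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()
import name_demo

# Case 2: execute the file as the main program, so __name__ is '__main__'
script_globals = runpy.run_path(str(module_path), run_name='__main__')
```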

Warning on from ... import *

There is another import syntax with a star:

In [5]:
from matplotlib.pylab import *

It imports all the names of the namespace matplotlib.pylab into the global namespace. It can be useful in some situations (for example quick interactive sessions) but should be avoided in most cases: with this syntax, you don't know where the names come from, existing names can be silently overwritten, and automatic code analysis becomes much more difficult.
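A concrete pitfall, using only the standard library: math and cmath both define sqrt (and pi, e, cos, ...), so a second star import silently replaces the names created by the first one:

```python
from math import *   # defines sqrt, pi, e, cos, ... in the global namespace
print(sqrt(4))       # 2.0 (math.sqrt)

from cmath import *  # silently rebinds sqrt, pi, e, cos, ... to the cmath versions
print(sqrt(-1))      # 1j (cmath.sqrt); math.sqrt(-1) would raise ValueError

# Explicit imports keep the origin of each name visible
import math
import cmath
print(math.sqrt(4), cmath.sqrt(-1))  # 2.0 1j
```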

Standard structure of a Python module

In [6]:
"""A program...

Documentation of the module.

"""

# import functions, modules and/or classes

from math import sqrt

# definition of functions and/or classes

def mysum(variables):
    """Sum all the elements of an iterable and return the result.

    No type check is performed.

    :param variables: (iterable) an iterable over elements
                      that can be summed up
    :return: the sum of the elements
    """
    result = 0
    for var in variables:
        result += var
    return result

# main part of the program (protected)

if __name__ == '__main__':
    l = [1, 2, 3, 4]
    print('the square root of mysum(l) is', sqrt(mysum(l)))
the square root of mysum(l) is 3.1622776601683795

The standard library + presentation of a few very common packages

The Python standard library (see also this tutorial) is an impressive set of packages useful for many things. These packages are included natively in Python. They are very stable (bugs are rare). Here is a short list:

  • math - Mathematical functions
  • sys - System-specific parameters and functions (a lot about the Python system)
  • copy - Shallow and deep copy operations
  • os - Miscellaneous operating system interfaces
  • glob - Unix style pathname pattern expansion (like ls)
  • shutil - High-level file operations
  • pdb - The Python debugger
  • subprocess - Subprocess management
  • datetime - Basic date and time types
  • pickle - Python object serialization
  • re - Regular expressions
  • argparse - Parser for command-line options, arguments and sub-commands
  • unittest - Unit testing framework
  • logging - Event logging system
  • platform - Access to underlying platform’s identifying data
  • threading - Higher-level threading interface
  • multiprocessing - Process-based “threading” interface

math - Mathematical functions

For example, to use $\pi$ in an environment where NumPy might not be installed:

In [7]:
import math

print(type(math))
<class 'module'>
In [8]:
from math import cos
print('pi is approximately equal to     ', math.pi)
print('cos(pi) is approximately equal to', cos(math.pi))
pi is approximately equal to      3.141592653589793
cos(pi) is approximately equal to -1.0
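Since floating-point results such as cos(pi) are usually only approximately equal to the exact value, the math module also provides math.isclose (Python 3.5+) to compare floats with a tolerance. A short sketch with a few other common functions:

```python
import math

print(0.1 + 0.2 == 0.3)              # False: rounding errors in binary floating point
print(math.isclose(0.1 + 0.2, 0.3))  # True (default relative tolerance 1e-09)
print(math.isclose(math.cos(math.pi), -1))  # True
print(math.sqrt(16), math.factorial(5), math.gcd(12, 18))  # 4.0 120 6
```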

sys - System-specific parameters and functions (a lot about the Python system)

To know where Python looks for modules during import statements, you can print sys.path:

In [9]:
import sys
print(sys.path)
['/home/pierre/Output/Teach/py-training-2017/ipynb', '/home/pierre/.pyenv/versions/3.7.2/lib/python37.zip', '/home/pierre/.pyenv/versions/3.7.2/lib/python3.7', '/home/pierre/.pyenv/versions/3.7.2/lib/python3.7/lib-dynload', '', '/home/pierre/.local/lib/python3.7/site-packages', '/home/pierre/.pyenv/versions/3.7.2/lib/python3.7/site-packages', '/home/pierre/Dev/fluiddyn', '/home/pierre/Dev/fluidlab', '/home/pierre/Dev/pythran', '/home/pierre/Dev/beniget', '/home/pierre/Dev/transonic', '/home/pierre/Dev/fluidsim', '/home/pierre/Dev/fluidfft', '/home/pierre/Dev/mpi4py-fft', '/home/pierre/.pyenv/versions/3.7.2/lib/python3.7/site-packages/IPython/extensions', '/home/pierre/.ipython']
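sys.path is a plain Python list, so it can be modified at runtime; this is one way (installing a package is another) to make the modules of another directory importable. The path '/path/to/my/modules' below is hypothetical:

```python
import sys

# sys.path is an ordinary list; directories are searched in order by import
print(type(sys.path))  # <class 'list'>

# Prepending a directory (hypothetical path) makes the modules it contains importable
sys.path.insert(0, '/path/to/my/modules')

# A few other commonly used attributes
print(sys.version_info >= (3, 7))  # True with Python 3.7 or newer
print(sys.executable)  # path of the running Python interpreter
print(sys.argv)        # list of command-line arguments of the script
```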

os: Miscellaneous operating system interfaces

os is a very important module.

In [10]:
import os

os.getcwd()
Out[10]:
'/home/pierre/Output/Teach/py-training-2017/ipynb'
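os also gives access to environment variables through the dict-like mapping os.environ, and to a few platform-dependent values. A short sketch (the variable name MY_SIMULATION_DIR is hypothetical):

```python
import os

# Read an environment variable, with a default value if it is not defined
work_dir = os.environ.get('MY_SIMULATION_DIR', os.getcwd())
print(work_dir)

# Variables set here are inherited by child processes started afterwards
os.environ['MY_SIMULATION_DIR'] = work_dir

print(os.sep)          # path separator: '/' on Unix, '\\' on Windows
print(os.cpu_count())  # number of CPUs (may be None if undetermined)
```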

In particular, the os.path module is used each time you work with paths to files and directories. It builds paths in a robust manner:

In [11]:
# Building a path to a file to read...
directory_path = './files/'
file_name = 'file_to_read.txt'
# String concatenation works but is not robust (wrong path if the trailing '/' is missing)
full_path = directory_path + file_name
print(full_path)
# Better to do
full_path = os.path.join(directory_path, file_name)
print(full_path)
./files/file_to_read.txt
./files/file_to_read.txt

For example, we can build the string for a new path in a cross-platform way like this:

In [12]:
# Method to get cross-platform home directory ($HOME)
home_dir = os.path.expanduser('~')
os.path.join(home_dir, 'opt', 'miniconda3', 'lib', 'python3.6')
Out[12]:
'/home/pierre/opt/miniconda3/lib/python3.6'

To make a new directory if it does not exist:

In [13]:
path_tmp = '../pyfiles/tmp_directory'
if not os.path.exists(path_tmp):
    os.mkdir(path_tmp)
print(os.listdir('../pyfiles/'))
['helloworld.py', 'tmp_directory', 'wrong.py', 'example1', 'example0']
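For nested directories, os.makedirs can replace the exists/mkdir pattern above. A minimal sketch in a temporary directory (the directory names are only illustrative):

```python
import os
import tempfile

base = tempfile.mkdtemp()
nested = os.path.join(base, 'results', '2020', 'run01')

# os.makedirs also creates the missing intermediate directories;
# with exist_ok=True it does not raise if the directory already exists
os.makedirs(nested, exist_ok=True)
os.makedirs(nested, exist_ok=True)  # no error the second time
print(os.path.isdir(nested))  # True
```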

To scan the content of a directory tree recursively, use os.walk:

In [14]:
def list_dir_files():
    for base, path_dir, path_files in os.walk('../pyfiles'):
        # skip directories whose name starts with '__' (e.g. __pycache__)
        if os.path.basename(base).startswith('__'):
            continue
        print(f'In the directory {base}:\n'
              f'\tdirectories: {path_dir}\n\tfiles: {path_files}.')

list_dir_files()
print(os.path.exists(path_tmp))
os.rmdir(path_tmp)
print(os.path.exists(path_tmp))
list_dir_files()
In the directory ../pyfiles:
	directories: ['tmp_directory', 'example1', 'example0']
	files: ['helloworld.py', 'wrong.py'].
In the directory ../pyfiles/tmp_directory:
	directories: []
	files: [].
In the directory ../pyfiles/example1:
	directories: []
	files: ['prog.py', 'util.py'].
In the directory ../pyfiles/example0:
	directories: ['__pycache__']
	files: ['prog.py', 'util.py'].
True
False
In the directory ../pyfiles:
	directories: ['example1', 'example0']
	files: ['helloworld.py', 'wrong.py'].
In the directory ../pyfiles/example1:
	directories: []
	files: ['prog.py', 'util.py'].
In the directory ../pyfiles/example0:
	directories: ['__pycache__']
	files: ['prog.py', 'util.py'].
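With os.walk (top-down, the default), the list of subdirectories can also be modified in place to prevent the walk from entering some directories, which is another way to skip __pycache__. A sketch with a hypothetical helper list_python_files, tried on a small throwaway tree:

```python
import os
import tempfile

def list_python_files(top):
    """Return the paths of the .py files below top, skipping directories starting with '__'."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Modifying dirnames in place tells os.walk not to descend into the removed ones
        dirnames[:] = [name for name in dirnames if not name.startswith('__')]
        found.extend(os.path.join(dirpath, name) for name in filenames if name.endswith('.py'))
    return found

# Small throwaway tree to try it on
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, 'pkg', '__pycache__'))
for relative_path in ['main.py', os.path.join('pkg', 'mod.py'),
                      os.path.join('pkg', '__pycache__', 'ignored.py')]:
    open(os.path.join(top, relative_path), 'w').close()

print(sorted(os.path.relpath(path, top) for path in list_python_files(top)))
# ['main.py', 'pkg/mod.py'] on Unix
```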

Other handy functions of os.path:

  • os.path.basename: returns the basename of a path (last component of the path)
  • os.path.isfile: returns True if the path points to a file
  • ...

See https://docs.python.org/3.7/library/os.path.html
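A short sketch of these functions on an illustrative relative path (no file needs to exist, except for isfile to return True):

```python
import os.path

path = os.path.join('data', 'run01', 'results.txt')

print(os.path.basename(path))  # results.txt
print(os.path.dirname(path))   # data/run01 on Unix
print(os.path.splitext(path))  # ('data/run01/results', '.txt') on Unix
print(os.path.isfile(path))    # False unless this file exists in the current directory
print(os.path.abspath(path))   # absolute version of the path
```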

glob - Unix style pathname pattern expansion

The glob module provides the equivalent of the Unix "ls" with wildcard patterns. Note that the returned list is in arbitrary order:

In [15]:
from glob import glob
l = glob('*')
print('list unsorted:', l)
print('list sorted:  ', sorted(l))
list unsorted: ['pres00_intro_first_steps.slides.html', 'pres09_practical1.slides.html', 'pres080_oop_encapsulation.ipynb', 'practical_numpy_img_median.ipynb', 'pres080_oop_encapsulation.slides.html', 'practical_numpy_img_median.slides.html', 'pres081_oop_inheritance.ipynb', 'pres13_doc_applications.slides.html', 'slides_reveal_wide.tpl', 'reveal.js', 'pres06_import_standard_library.slides.html', 'pres12_practical2.slides.html', 'pres111_intro_matplotlib.ipynb', 'pres12_practical2.ipynb', 'pres04_readwritefiles.ipynb', 'index.rst', 'pres15_practical5.ipynb', 'images', 'pres07_data_struct.ipynb', 'pres02_basic_statements.ipynb', 'pres110_intro_numpy_scipy_pandas.slides.html', 'pres13_doc_applications.ipynb', 'pres03_functions.ipynb', 'pres10_environnement.slides.html', 'pres04_readwritefiles.slides.html', 'pres01_intro_language.slides.html', 'pres06_import_standard_library.ipynb', 'pres111_intro_matplotlib.slides.html', 'table_of_contents.rst', 'pres05_practical0.ipynb', 'pres14_advanced.ipynb', 'pres01_intro_language.ipynb', 'pres081_oop_inheritance.slides.html', 'pres15_practical5.slides.html', 'pres05_practical0.slides.html', 'pres07_data_struct.slides.html', 'pres09_practical1.ipynb', 'pres00_intro_first_steps.ipynb', 'pres14_advanced.slides.html', 'pres03_functions.slides.html', 'pres10_environnement.ipynb', 'index.html', 'introduction.slides.html', 'pres02_basic_statements.slides.html', 'pres110_intro_numpy_scipy_pandas.ipynb', 'introduction.ipynb']
list sorted:   ['images', 'index.html', 'index.rst', 'introduction.ipynb', 'introduction.slides.html', 'practical_numpy_img_median.ipynb', 'practical_numpy_img_median.slides.html', 'pres00_intro_first_steps.ipynb', 'pres00_intro_first_steps.slides.html', 'pres01_intro_language.ipynb', 'pres01_intro_language.slides.html', 'pres02_basic_statements.ipynb', 'pres02_basic_statements.slides.html', 'pres03_functions.ipynb', 'pres03_functions.slides.html', 'pres04_readwritefiles.ipynb', 'pres04_readwritefiles.slides.html', 'pres05_practical0.ipynb', 'pres05_practical0.slides.html', 'pres06_import_standard_library.ipynb', 'pres06_import_standard_library.slides.html', 'pres07_data_struct.ipynb', 'pres07_data_struct.slides.html', 'pres080_oop_encapsulation.ipynb', 'pres080_oop_encapsulation.slides.html', 'pres081_oop_inheritance.ipynb', 'pres081_oop_inheritance.slides.html', 'pres09_practical1.ipynb', 'pres09_practical1.slides.html', 'pres10_environnement.ipynb', 'pres10_environnement.slides.html', 'pres110_intro_numpy_scipy_pandas.ipynb', 'pres110_intro_numpy_scipy_pandas.slides.html', 'pres111_intro_matplotlib.ipynb', 'pres111_intro_matplotlib.slides.html', 'pres12_practical2.ipynb', 'pres12_practical2.slides.html', 'pres13_doc_applications.ipynb', 'pres13_doc_applications.slides.html', 'pres14_advanced.ipynb', 'pres14_advanced.slides.html', 'pres15_practical5.ipynb', 'pres15_practical5.slides.html', 'reveal.js', 'slides_reveal_wide.tpl', 'table_of_contents.rst']
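glob also accepts the usual wildcards (*, ?, [...]), and with recursive=True (Python 3.5+) the pattern ** matches any number of subdirectories. A sketch on a small throwaway tree (the file names are illustrative):

```python
import os
import tempfile
from glob import glob

top = tempfile.mkdtemp()
for name in ('a.py', 'b.py', 'notes.txt'):
    open(os.path.join(top, name), 'w').close()
os.makedirs(os.path.join(top, 'sub'))
open(os.path.join(top, 'sub', 'c.py'), 'w').close()

# Wildcard pattern: only the .py files directly in top
top_level = sorted(os.path.basename(p) for p in glob(os.path.join(top, '*.py')))
print(top_level)  # ['a.py', 'b.py']

# '**' with recursive=True also matches files in subdirectories
everywhere = sorted(os.path.basename(p)
                    for p in glob(os.path.join(top, '**', '*.py'), recursive=True))
print(everywhere)  # ['a.py', 'b.py', 'c.py']
```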

pathlib: Object-oriented filesystem paths

A modern (Python 3) and nicer method to manipulate file paths.

In [16]:
from pathlib import Path
In [17]:
path_tmp = Path("..") / "pyfiles/tmp_directory"
print(path_tmp.exists())
path_tmp.mkdir(exist_ok=True)
False
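A few other handy features of Path objects, as a sketch (the path data/run01/results.txt is illustrative; the file actually written goes to a temporary directory):

```python
import tempfile
from pathlib import Path

path = Path('data') / 'run01' / 'results.txt'
print(path.name)    # results.txt
print(path.stem)    # results
print(path.suffix)  # .txt
print(path.parent)  # data/run01 on Unix

# Write and read a small file in a temporary directory
tmp_dir = Path(tempfile.mkdtemp())
file_path = tmp_dir / 'hello.txt'
file_path.write_text('hello\n')
print(file_path.read_text(), end='')  # hello
print(sorted(p.name for p in tmp_dir.glob('*.txt')))  # ['hello.txt']
```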

shutil - High-level file operations

Files and directories can be copied with shutil, in particular with shutil.copy (single files) and shutil.copytree (whole directory trees).
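A minimal sketch in temporary directories (the file and directory names are illustrative), also showing shutil.rmtree to delete a directory together with its content:

```python
import os
import shutil
import tempfile

# Source tree: src/sub/data.txt
src = tempfile.mkdtemp()
os.makedirs(os.path.join(src, 'sub'))
with open(os.path.join(src, 'sub', 'data.txt'), 'w') as file:
    file.write('some data\n')

# copytree copies recursively; by default the destination must not exist yet
dst = os.path.join(tempfile.mkdtemp(), 'copy')
shutil.copytree(src, dst)

# copy duplicates a single file
shutil.copy(os.path.join(dst, 'sub', 'data.txt'), os.path.join(dst, 'data_copy.txt'))
print(sorted(os.listdir(dst)))  # ['data_copy.txt', 'sub']

# rmtree deletes a directory and everything it contains
shutil.rmtree(src)
print(os.path.exists(src))  # False
```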

pdb: useful to debug code

In a script:

  1. import pdb
  2. write pdb.set_trace() to set up a breakpoint
  3. run the script

At execution time, the script stops at the first pdb.set_trace() call and gives the user access to the interactive debugger.

Remarks:

  • even nicer: ipdb (but not part of the standard library).

  • even nicer: the built-in function breakpoint(), available since Python 3.7 (no import needed).

subprocess

subprocess is very important since it is the standard way to launch other programs and shell commands from Python. For example, to run bash (and not sh) commands, you can do:

In [18]:
import subprocess
def call_bash(commands):
    return subprocess.call(['/bin/bash', '-c', commands])
ret = call_bash("""
echo Hello; cat /tmp/jfdkfjdk
""")
if ret == 0:
    print("command succeeded")
else:
    print(f"command failed with return code {ret}")
command failed with return code 1
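Since Python 3.5, subprocess.run is the recommended high-level function; the capture_output and text parameters used below require Python 3.7. A sketch that launches the current Python interpreter (sys.executable) as the external program so that it runs on any platform:

```python
import subprocess
import sys

# Run an external program and capture its output as a string
result = subprocess.run(
    [sys.executable, '-c', 'print("hello from a child process")'],
    capture_output=True,  # collect stdout and stderr
    text=True,            # decode them as str instead of bytes
)
print(result.returncode)      # 0
print(result.stdout.strip())  # hello from a child process

# check=True turns a non-zero return code into an exception
failed_code = None
try:
    subprocess.run([sys.executable, '-c', 'import sys; sys.exit(3)'], check=True)
except subprocess.CalledProcessError as error:
    failed_code = error.returncode
    print('failed with return code', failed_code)  # failed with return code 3
```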

argparse - Parser for command-line options, arguments and sub-commands

argparse is the right tool to develop a command line script with options and help. Example from the tutorial at https://docs.python.org/3/howto/argparse.html :

# File prog.py
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("echo", help="echo the string you use here")
args = parser.parse_args()
print(args.echo)

Usage:

$ python3 prog.py
usage: prog.py [-h] echo
prog.py: error: the following arguments are required: echo
$ python3 prog.py --help
usage: prog.py [-h] echo

positional arguments:
  echo        echo the string you use here

optional arguments:
  -h, --help  show this help message and exit
$ python3 prog.py foo
foo
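Options with a type, a default value or a flag behaviour can be added in the same way. A sketch extending the example (the options --times and --verbose are illustrative); passing an explicit list to parse_args instead of reading sys.argv makes it easy to try in a notebook:

```python
import argparse

parser = argparse.ArgumentParser(description='Repeat a string.')
parser.add_argument('echo', help='the string to repeat')
parser.add_argument('-n', '--times', type=int, default=1,
                    help='number of repetitions (default: 1)')
parser.add_argument('-v', '--verbose', action='store_true',
                    help='print extra information')

# Equivalent to running: python3 prog.py foo --times 3
args = parser.parse_args(['foo', '--times', '3'])
print(args.echo * args.times)  # foofoofoo
print(args.verbose)            # False
```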

logging - Event logging system

logging allows the programmer to emit messages with different levels of severity (DEBUG, INFO, WARNING, ERROR, CRITICAL) and to choose which levels are actually printed.

In [19]:
import logging
log_level = logging.INFO  # to get information messages
# log_level = logging.WARNING  # no information messages
logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                    level=log_level)

thing = 'beer'
logging.info('Would you like to have a "%s"?', thing)
2019-03-31 22:17:22,616 - root - INFO - Would you like to have a "beer"?
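In larger programs, messages are usually emitted through named loggers, each with its own level and handlers. A sketch (the logger name 'my_module' is hypothetical; in a real module, logging.getLogger(__name__) is a common choice):

```python
import logging

logger = logging.getLogger('my_module')
handler = logging.StreamHandler()  # writes to sys.stderr by default
handler.setFormatter(logging.Formatter('%(name)s - %(levelname)s - %(message)s'))
logger.addHandler(handler)
logger.propagate = False  # avoid duplicated output through the root logger
logger.setLevel(logging.WARNING)

logger.info('not displayed: INFO is below the WARNING level')
logger.warning('displayed')  # my_module - WARNING - displayed
print(logger.isEnabledFor(logging.INFO), logger.isEnabledFor(logging.ERROR))  # False True
```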