Much of the material in this document is taken from Appendix F in the book A Primer on Scientific Programming with Python, 4th edition, by the same author, published by Springer, 2014.
Let us illustrate the points above in a specific programming problem: implementation of the Midpoint rule for numerical integration. The Midpoint rule for approximating an integral \( \int_a^b f(x)dx \) reads $$ \begin{equation} I = h\sum_{i=1}^{n} f(a + (i-\frac{1}{2})h), \quad h = {b-a\over n}\tp \tag{1} \end{equation} $$ We just follow the individual steps in the recipe to develop the code.
1. Understand the problem. In this problem we must understand how to program the formula (1). Observe that we do not need to understand how the formula is derived, because we do not apply the derivation in the program. What is important, is to notice that the formula is an approximation of an integral. Comparing the result of the program with the exact value of the integral will in general show a discrepancy. Whether we have an approximation error or a programming error is always difficult to judge. We will meet this difficulty below.
2. Work out examples. As a test case we choose to integrate $$ \begin{equation} f(x) = \sin^{-1}(x)\tp \tag{2} \end{equation} $$ between 0 and \( \pi \). From a table of integrals we find that this integral equals $$ \begin{equation} \left\lbrack x\sin^{-1}(x) + \sqrt{1-x^2}\,\,\right\rbrack_{0}^{\pi}\tp \tag{3} \end{equation} $$ The formula (1) gives an approximation to this integral, so the program will (most likely) print out a result different from (3). It would therefore be very helpful to construct a calculation where there are no approximation errors. Numerical integration rules usually integrate some polynomial of low order exactly. For the Midpoint rule it is obvious, if you understand the derivation of this rule, that a constant function will be integrated exactly. We therefore also introduce a test problem where we integrate \( g(x)=1 \) from 0 to 10. The answer should be exactly 10.
Input and output: The input to the calculations is the function to integrate, the integration limits \( a \) and \( b \), and the \( n \) parameter (number of intervals) in the formula (1). The output from the calculations is the approximation to the integral.
3. Decide on a user interface. We find it easiest at this beginning stage to program the two functions \( f(x) \) and \( g(x) \) directly in the program. We also specify the corresponding integration limits \( a \) and \( b \) in the program, but we read a common \( n \) for both integrals from the command line. Note that this is not a flexible user interface, but it suffices as a start for creating a working program. A much better user interface is to read \( f \), \( a \), \( b \), and \( n \) from the command line, which will be done later in a more complete solution to the present problem.
4. Make algorithms.
Like most mathematical programming problems, also this one has a
generic part and an application part. The generic part is the formula
(1), which is applicable to an
arbitrary function \( f(x) \). The implementation should reflect that we
can specify any Python function f(x)
and get it integrated. This
principle calls for calculating (1) in
a Python function where the input to the computation (\( f \), \( a \), \( b \),
\( n \)) are arguments. The function heading can look as integrate(f, a,
b, n)
, and the value of (1) is
returned.
The test part of the program consists of defining the test functions \( f(x) \) and \( g(x) \) and writing out the calculated approximations to the corresponding integrals.
A first rough sketch of the program can then be
def integrate(f, a, b, n):
# compute integral, store in I
return I
def f(x):
...
def g(x):
...
# test/application part:
n = sys.argv[1]
I = integrate(g, 0, 10, n)
print "Integral of g equals %g" % I
I = integrate(f, 0, pi, n)
# calculate and print out the exact integral of f
The next step is to make a detailed implementation of the integrate
function. Inside this function we need to compute the sum
(1). In general, sums are computed by
a for
loop over the summation index, and inside the loop we
calculate a term in the sum and add it to an accumulation variable.
Here is the algorithm in Python code:
s = 0
for i in range(1, n+1):
s = s + f(a + (i-0.5)*h)
I = s*h
5. Look up information.
Our test function \( f(x)=\sin^{-1}(x) \) must be evaluated in the
program. How can we do this? We know that many common mathematical
functions are offered by the math
module. It is therefore natural to
check if this module has an inverse sine function. The best place to
look for Python modules is the Python Standard Library [2]
documentation, which has a search facility.
Typing math brings up a link to the
math
module, there we find math.asin
as the function we need.
Alternatively, one can use the command line utility pydoc
and write
pydoc math
to look up all the functions in the module.
In this simple problem, we use very basic programming constructs and
there is hardly any need for looking at similar examples to get
started with the problem solving process. We need to know how to
program a sum, though, via a for
loop and an accumulation variable
for the sum.
6. Write the program. Here is our first attempt to write the program. You can find the whole code in the file integrate_v1.py.
def integrate(f, a, b, n):
s = 0
for i in range(1, n):
s += f(a + i*h)
return s
def f(x):
return asin(x)
def g(x):
return 1
# Test/application part
n = sys.argv[1]
I = integrate(g, 0, 10, n)
print "Integral of g equals %g" % I
I = integrate(f, 0, pi, n)
I_exact = pi*asin(pi) - sqrt(1 - pi**2) - 1
print "Integral of f equals %g (exact value is %g)' % \
(I, I_exact)
7. Run the program. We try a first execution from IPython
In [1]: run integrate_v1.py
Unfortunately, the program aborts with an error:
File "integrate_v1.py", line 8 return asin(x) ^ IndentationError: expected an indented block
We go to line 8 and look at that line and the surrounding code:
def f(x):
return asin(x)
Python expects that the return line is indented, because
the function body must always be indented.
By the way, we realize that there is a similar error in the g(x)
function as well.
We correct these errors:
def f(x):
return asin(x)
def g(x):
return 1
Running the program again makes Python respond with
File "integrate_v1.py", line 24 (I, I_exact) ^ SyntaxError: EOL while scanning single-quoted string
There is nothing wrong with line 24, but line 24 is a part of the statement starting on line 23:
print "Integral of f equals %g (exact value is %g)' % \
(I, I_exact)
A SyntaxError
implies that we have written illegal Python
code. Inspecting line 23 reveals that the string to be printed
starts with a double quote, but ends with a single quote.
We must be consistent and use the same enclosing quotes in a string.
Correcting the statement,
print "Integral of f equals %g (exact value is %g)" % \
(I, I_exact)
and rerunning the program yields the output
Traceback (most recent call last): File "integrate_v1.py", line 18, in <module> n = sys.argv[1] NameError: name 'sys' is not defined
Obviously, we need to import sys
before using it.
We add import sys
and run again:
Traceback (most recent call last): File "integrate_v1.py", line 19, in <module> n = sys.argv[1] IndexError: list index out of range
This is a very common error: we index the list sys.argv
out of range
because we have not provided enough command-line arguments.
Let us use \( n=10 \) in the test and provide that number on the command line:
In [5]: run integrate_v1.py 10
We still have problems:
Traceback (most recent call last): File "integrate_v1.py", line 20, in <module> I = integrate(g, 0, 10, n) File "integrate_v1.py", line 7, in integrate for i in range(1, n): TypeError: range() integer end argument expected, got str.
It is the final File
line that counts (the previous ones
describe the nested functions calls up to the point where
the error occurred).
The error message for line 7
is very precise: the end argument to range
,
n
, should be an integer, but it is a string.
We need to convert the string sys.argv[1]
to int
before
sending it to the integrate
function:
n = int(sys.argv[1])
After a new edit-and-run cycle we have other error messages waiting:
Traceback (most recent call last): File "integrate_v1.py", line 20, in <module> I = integrate(g, 0, 10, n) File "integrate_v1.py", line 8, in integrate s += f(a + i*h) NameError: global name 'h' is not defined
The h
variable is used without being assigned a value.
From the formula (1) we
see that \( h=(b-a)/n \), so we insert this assignment at the top of
the integrate
function:
def integrate(f, a, b, n):
h = (b-a)/n
...
A new run results in a new error:
Integral of g equals 9 Traceback (most recent call last): File "integrate_v1.py", line 23, in <module> I = integrate(f, 0, pi, n) NameError: name 'pi' is not defined
Looking carefully at all output, we see that the program managed to
call the integrate
function with g
as input and write
out the integral. However, in the call to integrate
with f
as argument, we get a NameError
, saying that pi
is undefined.
When we wrote the program we took it for granted that pi
was
\( \pi \), but we need to import pi
from math
to get this
variable defined, before we call integrate
:
from math import pi
I = integrate(f, 0, pi, n)
The output of a new run is now
Integral of g equals 9 Traceback (most recent call last): File "integrate_v1.py", line 24, in <module> I = integrate(f, 0, pi, n) File "integrate_v1.py", line 9, in integrate s += f(a + i*h) File "integrate_v1.py", line 13, in f return asin(x) NameError: global name 'asin' is not defined
A similar error occurred: asin
is not defined as a function,
and we need to import it from math
. We can either do a
from math import pi, asin
or just do the rough
from math import *
to avoid any further errors with undefined names from the math
module (we will get one for the sqrt
function later, so we simply
use the last "import all" kind of statement).
There are still more errors:
Integral of g equals 9 Traceback (most recent call last): File "integrate_v1.py", line 24, in <module> I = integrate(f, 0, pi, n) File "integrate_v1.py", line 9, in integrate s += f(a + i*h) File "integrate_v1.py", line 13, in f return asin(x) ValueError: math domain error
Now the error concerns a wrong x
value in the f
function.
Let us print out x
:
def f(x):
print x
return asin(x)
The output becomes
Integral of g equals 9 0.314159265359 0.628318530718 0.942477796077 1.25663706144 Traceback (most recent call last): File "integrate_v1.py", line 25, in <module> I = integrate(f, 0, pi, n) File "integrate_v1.py", line 9, in integrate s += f(a + i*h) File "integrate_v1.py", line 14, in f return asin(x) ValueError: math domain error
We see that all the asin(x)
computations are successful up to and
including \( x=0.942477796077 \), but for \( x=1.25663706144 \) we get an
error. A math domain error
may point to a wrong \( x \) value for
\( \sin^{-1}(x) \) (recall that the domain of a function specifies the
legal \( x \) values for that function).
To proceed, we need to think about the mathematics of our problem: Since \( \sin (x) \) is always between \( -1 \) and \( 1 \), the inverse sine function cannot take \( x \) values outside the interval \( [-1,1] \). The problem is that we try to integrate \( \sin^{-1}(x) \) from 0 to \( \pi \), but only integration limits within \( [-1,1] \) make sense (unless we allow for complex-valued trigonometric functions). Our test problem is hence wrong from a mathematical point of view. We need to adjust the limits, say 0 to 1 instead of 0 to \( \pi \). The corresponding program modification reads
I = integrate(f, 0, 1, n)
We run again and get
Integral of g equals 9 0 0 0 0 0 0 0 0 0 Traceback (most recent call last): File "integrate_v1.py", line 26, in <module> I_exact = pi*asin(pi) - sqrt(1 - pi**2) - 1 ValueError: math domain error
It is easy to go directly to the ValueError
now, but one should
always examine the output from top to bottom. If there is strange
output before Python reports an error, there may be an error indicated
by our print
statements.
This is not the case in the present example, but it is a good
habit to start at the top of the output anyway. We see that all our
print x
statements inside the f
function say that x
is zero.
This must be wrong - the idea of the integration rule is to pick \( n \)
different points in the integration interval \( [0,1] \).
Our f(x)
function is called from the integrate
function.
The argument to f
, a + i*h
, is seemingly always 0. Why? We print out
the argument and the values of the variables that make up the argument:
def integrate(f, a, b, n):
h = (b-a)/n
s = 0
for i in range(1, n):
print a, i, h, a+i*h
s += f(a + i*h)
return s
Running the program shows that h
is zero and therefore
a+i*h
is zero.
Why is h
zero? We need a new print
statement in the computation
of h
:
def integrate(f, a, b, n):
h = (b-a)/n
print b, a, n, h
...
The output shows that a
, b
, and n
are correct.
Now we have encountered a very common error in Python version 2
and C-like programming languages:
integer
division.
The formula \( (1-0)/10=1/10 \) is zero according to integer division.
The reason is that a
and b
are specified as 0
and 1
in the call to integrate
, and 0
and 1
imply int
objects. Then b-a
becomes
an int
, and n
is an int
, causing an int/int
division. We must ensure that b-a
is float
to get the
right mathematical division in the computation of h
:
def integrate(f, a, b, n):
h = float(b-a)/n
...
Thinking that the problem with wrong \( x \) values in the inverse sine
function is resolved, we may remove all the print
statements in the
program, and run again.
The output now reads
Integral of g equals 9 Traceback (most recent call last): File "integrate_v1.py", line 25, in <module> I_exact = pi*asin(pi) - sqrt(1 - pi**2) - 1 ValueError: math domain error
That is, we are back to the ValueError
we have seen before.
The reason is that asin(pi)
does not make sense, and
the argument to sqrt
is negative. The error is simply that we
forgot to adjust the upper integration limit in the computation
of the exact result. This is another very common error.
The correct line is
I_exact = 1*asin(1) - sqrt(1 - 1**2) - 1
We could have avoided the error by introducing variables for the integration limits, and a function for \( \int f(x)dx \) would make the code cleaner:
a = 0; b = 1
def int_f_exact(x):
return x*asin(x) - sqrt(1 - x**2)
I_exact = int_f_exact(b) - int_f_exact(a)
Although this is more work than what we initially aimed at, it usually saves time in the debugging phase to do things this proper way.
Eventually, the program seems to work! The output is just
the result of our two print
statements:
Integral of g equals 9 Integral of f equals 5.0073 (exact value is 0.570796)
8. Verify the results. Now it is time to check if the numerical results are correct. We start with the simple integral of 1 from 0 to 10: the answer should be 10, not 9. Recall that for this particular choice of integration function, there is no approximation error involved (but there could be a small round-off error). Hence, there must be a programming error.
To proceed, we need to calculate some intermediate mathematical results by hand and compare these with the corresponding statements in the program. We choose a very simple test problem with \( n=2 \) and \( h=(10-0)/2=5 \). The formula (1) becomes $$ \begin{equation*} I = 5\cdot\left( 1 + 1\right) = 10\tp\end{equation*} $$ Running the program with \( n=2 \) gives
Integral of g equals 1
We insert some print
statements inside the integrate
function:
def integrate(f, a, b, n):
h = float(b-a)/n
s = 0
for i in range(1, n):
print 'i=%d, a+i*h=%g' % (i, a+i*h)
s += f(a + i*h)
return s
Here is the output:
i=1, a+i*h=5 Integral of g equals 1 i=1, a+i*h=0.5 Integral of f equals 0.523599 (exact value is 0.570796)
There was only one pass in the i
loop in integrate
. According to
the formula, there should be \( n \) passes, i.e., two in this test
case. The limits of i
must be wrong. The limits are produced by the
call range(1,n)
. We recall that such a call results in integers
going from 1 up to n
, but not including n
. We need to include
n
as value of i
, so the right call to range
is range(1,n+1)
.
We make this correction and rerun the program. The output is now
i=1, a+i*h=5 i=2, a+i*h=10 Integral of g equals 2 i=1, a+i*h=0.5 i=2, a+i*h=1 Integral of f equals 2.0944 (exact value is 0.570796)
The integral of 1 is still not correct. We need more intermediate results!
In our quick hand calculation we knew that \( g(x)=1 \) so all
the \( f(a+(i-\frac{1}{2})h) \) evaluations were rapidly replaced by ones.
Let us now compute all the \( x \) coordinates \( a+(i-\frac{1}{2})h \) that
are used in the formula:
$$
\begin{equation*} i=1:\ a+(i-\frac{1}{2})h=2.5,\quad i=2:\ a+(i-\frac{1}{2})h=7.5\tp \end{equation*}
$$
Looking at the output from the program, we see that the argument to
g
has a different value - and fortunately we realize that the
formula we have coded is wrong. It should be a+(i-0.5)*h
.
We correct this error and run the program:
i=1, a+(i-0.5)*h=2.5 i=2, a+(i-0.5)*h=7.5 Integral of g equals 2 ...
Still the integral is wrong. At this point you may give up programming, but the more skills you pick up in debugging, the more fun it is to hunt for errors! Debugging is like reading an exciting criminal novel: the detective follows different ideas and tracks, but never gives up before the culprit is caught.
Now we read the code more carefully and compare expressions with those in the mathematical formula. We should, of course, have done this already when writing the program, but it is easy to get excited when writing code and hurry for the end. This ongoing story of debugging probably shows that reading the code carefully can save much debugging time. (Actually, being extremely careful with what you write, and comparing all formulas with the mathematics, may be the best way to get more spare time when taking a programming course!)
We clearly add up all the \( f \) evaluations correctly, but then this sum
must be multiplied by \( h \), and we forgot that in the code. The
return
statement in integrate
must therefore be modified to
return s*h
Eventually, the output is
Integral of g equals 10 Integral of f equals 0.568484 (exact value is 0.570796)
and we have managed to integrate a constant function in our program! Even the second integral looks promising!
To judge the result of integrating the inverse sine function, we need
to run several increasing \( n \) values and see that the approximation
gets better. For \( n=2,10,100,1000 \) we get 0.550371, 0.568484,
0.570714, 0.570794, to be compared to the exact value 0.570796. (This
is not the mathematically exact value, because it involves
computations of \( \sin^{-1}(x) \), which is only approximately calculated
by the asin
function in the math
module. However, the
approximation error is very small (\( \sim 10^{-16} \)).) The decreasing
error provides evidence for a correct program, but it is not a strong
proof. We should try out more functions. In particular, linear
functions are integrated exactly by the Midpoint rule. We can also
measure the speed of the decrease of the error and check that the
speed is consistent with the properties of the Midpoint rule, but this
is a mathematically more advanced topic.
The very important lesson learned from these debugging sessions is that you should start with a simple test problem where all formulas can be computed by hand. If you start out with \( n=100 \) and try to integrate the inverse sine function, you will have a much harder job with tracking down all the errors.
9. Use a debugger.
Another lesson learned from these sessions is that we needed many
print
statements to see intermediate results. It is an open question
if it would be more efficient to run a debugger and stop the code at
relevant lines. In an edit-and-run cycle of the type we met here, we
frequently need to examine many numerical results, correct something,
and look at all the intermediate results again. Plain print
statements are often better suited for this massive output than the
pure manual operation of a debugger, unless one writes a program to
automate the interaction with the debugger.
The correct code for the implementation of the Midpoint rule is found in integrate_v2.py. Some readers might be frightened by all the energy it took to debug this code, but this is just the nature of programming. The experience of developing programs that finally work is very awarding.
People only become computer programmers if they're obsessive about details, crave power over machines, and can bear to be told day after day exactly how stupid they are. Gregory J. E. Rawlins [3], computer scientist.
We briefly mentioned that the chosen user interface, where the user
can only specify \( n \), is not particularly user friendly. We should
allow \( f \), \( a \), \( b \), and \( n \) to be specified on the command line.
Since \( f \) is a function and the command line can only provide strings
to the program, we may use the StringFunction
object from
scitools.std
to convert a string expression for the function to be
integrated to an ordinary Python
function.
The other parameters should be easy to retrieve from the
command line
by looking up the sys.argv
list. We
enclose
the input statements in a try-except
block, here with
a specific exception type IndexError
(because an index
in sys.argv
out of bounds is the only type of error we expect
to handle):
try:
f_formula = sys.argv[1]
a = eval(sys.argv[2])
b = eval(sys.argv[3])
n = int(sys.argv[4])
except IndexError:
print 'Usage: %s f-formula a b n' % sys.argv[0]
sys.exit(1)
Note that the use of eval
allows us to specify a
and b
as pi
or exp(5)
or another mathematical expression.
With the input above we can perform the general task of the program:
from scitools.std import StringFunction
f = StringFunction(f_formula)
I = integrate(f, a, b, n)
print I
Instead of having these test statements as a main program we follow the good habits and make a module with
integrate
function,test_integrate
function for testing the integrate
function's
ability to exactly integrate linear functions,main
function for reading data from the command line and
calling integrate
for the user's problem at hand.
The test_integrate
function can perform a loop over some specified n
values
and check that the Midpoint rule integrates a linear function exactly.
As always, we must be prepared for round-off errors, so "exactly" means
errors less than (say) \( 10^{-14} \). The relevant code becomes
def test_integrate():
"""Check that linear functions are integrated exactly."""
def g(x):
return p*x + q # general linear function
def int_g_exact(x): # integral of g(x)
return 0.5*p*x**2 + q*x
a = -1.2; b = 2.8 # "arbitrary" integration limits
p = -2; q = 10
success = True # True if all tests below are passed
for n in 1, 10, 100:
I = integrate(g, a, b, n)
I_exact = int_g_exact(b) - int_g_exact(a)
error = abs(I_exact - I)
if error > 1E-14:
success = False
assert success
We have followed the programming standard that will make this test function automatically work with the nose test framework:
test_
,assert
.assert success
statement raises an AssertionError
exception
if success
is false, otherwise nothing happens. The nose testing
framework searches for functions whose name start with test_
,
execute each function, and record if an AssertionError
is raised.
It is overkill to use nose for small programs, but in larger projects
with many functions in many files, nose can run all tests with
a short command and write back a notification that all tests passed.
The main
function is simply a wrapping of the main program given
above. The test block may call or test_integrate
function or main
,
depending on whether the user will test the module or use it:
if __name__ == '__main__':
if sys.argv[1] == 'verify':
verify()
else:
# Compute the integral specified on the command line
main()
Here is a short demo computing \( \int_0^{2\pi}(\cos (x) + \sin(x))dx \) with the aid of the integrate.py file:
integrate.py 'cos(x)+sin(x)' 0 2*pi 10
-3.48786849801e-16