Author: | Dave Kuhlman |
---|---|
Address: | dkuhlman@rexx.com http://www.rexx.com/~dkuhlman |
Revision: | 1.1a |
---|---|
Date: | April 22, 2012 |
Copyright: | Copyright (c) 2009 Dave Kuhlman. All Rights Reserved. This document is subject to the provisions of the Open Source MIT License http://www.opensource.org/licenses/mit-license.php. |
---|---|
Abstract: | This document is a self-learning document for a course in Python programming. This course contains (1) a part for beginners, (2) a discussion of several advanced topics that are of interest to Python programmers, and (3) a Python workbook with lots of exercises. |
Contents
[table of contents]
Preface
This book is a collection of materials that I've used when conducting Python training and also materials from my Web site that are intended for self-instruction.
You may prefer a machine readable copy of this book. You can find it in various formats here:
And, let me thank the students in my Python classes. Their questions and suggestions were a great help in the preparation of these materials.
Python is a high-level general purpose programming language:
Some things you will need to know:
Python uses indentation to show block structure. Indent one level to show the beginning of a block. Out-dent one level to show the end of a block. As an example, the following C-style code:
if (x) { if (y) { f1() } f2() }
in Python would be:
if x: if y: f1() f2()
And, the convention is to use four spaces (and no hard tabs) for each level of indentation. Actually, it's more than a convention; it's practically a requirement. Following that "convention" will make it so much easier to merge your Python code with code from other sources.
If you execute Python from the command line with no script (no arguments), Python gives you an interactive prompt. This is an excellent facility for learning Python and for trying small snippets of code. Many of the examples that follow were developed using the Python interactive prompt.
Start the Python interactive interpreter by typing python with no arguments at the command line. For example:
$ python Python 2.6.1 (r261:67515, Jan 11 2009, 15:19:23) [GCC 4.3.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> print 'hello' hello >>>
You may also want to consider using IDLE. IDLE is a graphical integrated development environment for Python; it contains a Python shell. It is likely that Idle was installed for you when you installed Python. You will find a script to start up IDLE in the Tools/scripts directory of your Python distribution. IDLE requires Tkinter.
In addition, there are tools that will give you a more powerful and fancy Python interactive interpreter. One example is IPython, which is available at http://ipython.scipy.org/.
Python represents block structure and nested block structure with indentation, not with begin and end brackets.
The empty block -- Use the pass no-op statement.
Benefits of the use of indentation to indicate structure:
Editor considerations -- The standard for indenting Python code is 4 spaces (no hard tabs) for each indentation level. You will need a text editor that helps you respect that.
Doc strings are like comments, but they are carried with executing code. Doc strings can be viewed with several tools, e.g. help(), obj.__doc__, and, in IPython, a question mark (?) after a name will produce help.
A doc string is a quoted string at the beginning of a module, function, class, or method.
We can use triple-quoting to create doc strings that span multiple lines.
There are also tools that extract and format doc strings, for example:
See: http://docs.python.org/ref/operators.html. Python defines the following operators:
+ - * ** / // % << >> & | ^ ~ < > <= >= == != <>
The comparison operators <> and != are alternate spellings of the same operator. != is the preferred spelling; <> is obsolescent.
Logical operators:
and or is not in
There are also (1) the dot operator, (2) the subscript operator [], and the function/method call operator ().
For information on the precedences of operators, see Summary of operators -- http://docs.python.org/ref/summary.html, which is reproduced below.
The following table summarizes the operator precedences in Python, from lowest precedence (least binding) to highest precedence (most binding). Operators on the same line have the same precedence. Unless the syntax is explicitly given, operators are binary. Operators on the same line group left to right (except for comparisons, including tests, which all have the same precedence and chain from left to right -- see section 5.9 -- and exponentiation, which groups from right to left):
Operator Description ======================== ================== lambda Lambda expression or Boolean OR and Boolean AND not x Boolean NOT in, not in Membership tests is, is not Identity tests <, <=, >, >=, <>, !=, == Comparisons | Bitwise OR ^ Bitwise XOR & Bitwise AND <<, >> Shifts +, - Addition and subtraction *, /, % Multiplication, division, remainder +x, -x Positive, negative ~x Bitwise not ** Exponentiation x.attribute Attribute reference x[index] Subscription x[index:index] Slicing f(arguments...) Function call (expressions...) Binding or tuple display [expressions...] List display {key:datum...} Dictionary display `expressions...` String conversion
Note that most operators result in calls to methods with special names, for example __add__, __sub__, __mul__, etc. See Special method names http://docs.python.org/ref/specialnames.html
Later, we will see how these operators can be emulated in classes that you define yourself, through the use of these special names.
For more on lexical matters and Python styles, see:
Understanding the Python execution model -- How Python evaluates and executes your code. Python evaluates a script or module from the top to the bottom, binding values (objects) to names as it proceeds.
Evaluating expressions -- Expressions are evaluated in keeping with the rules described for operators, above.
Creating names/variables -- Binding -- The following all create names (variables) and bind values (objects) to them: (1) assignment, (2) function definition, (3) class definition, (4) function and method call, (5) importing a module, ...
First class objects -- Almost all objects in Python are first class. Definition: An object is first class if: (1) we can put it in a structured object; (2) we can pass it to a function; and (3) we can return it from a function.
References -- Objects (or references to them) can be shared. What does this mean?
In Python, strings are immutable sequences of characters. They are immutable in that in order to modify a string, you must produce a new string.
Any text information.
Create a new string from a constant:
s1 = 'abce' s2 = "xyz" s3 = """A multi-line string. """
Use any of the string methods, for example:
>>> 'The happy cat ran home.'.upper() 'THE HAPPY CAT RAN HOME.' >>> 'The happy cat ran home.'.find('cat') 10 >>> 'The happy cat ran home.'.find('kitten') -1 >>> 'The happy cat ran home.'.replace('cat', 'dog') 'The happy dog ran home.'
Type "help(str)" or see http://www.python.org/doc/current/lib/string-methods.html for more information on string methods.
You can also use the equivalent functions from the string module. For example:
>>> import string >>> s1 = 'The happy cat ran home.' >>> string.find(s1, 'happy') 4
See string - Common string operations -- http://www.python.org/doc/current/lib/module-string.html for more information on the string module.
There is also a string formatting operator: "%". For example:
>>> state = 'California' >>> 'It never rains in sunny %s.' % state 'It never rains in sunny California.' >>> >>> width = 24 >>> height = 32 >>> depth = 8 >>> print 'The box is %d by %d by %d.' % (width, height, depth, ) The box is 24 by 32 by 8.
Things to know:
You can learn about the various conversion characters and flags used to control string formatting here: String Formatting Operations -- http://docs.python.org/library/stdtypes.html#string-formatting-operations.
You can also write strings to a file and read them from a file. Here are some examples.
Writing - For example:
>>> outfile = open('tmp.txt', 'w') >>> outfile.write('This is line #1\n') >>> outfile.write('This is line #2\n') >>> outfile.write('This is line #3\n') >>> outfile.close()
Notes:
Reading an entire file -- example:
>>> infile = file('tmp.txt', 'r') >>> content = infile.read() >>> print content This is line #1 This is line #2 This is line #3 >>> infile.close()
Notes:
Reading a file one line at a time -- example:
>>> infile = file('tmp.txt', 'r') >>> for line in infile: ... print 'Line:', line ... Line: This is line #1 Line: This is line #2 Line: This is line #3 >>> infile.close()
Notes:
Learn more about the for: statement in section for: statement.
"infile.readlines()" returns a list of lines in the file. For large files use the file object itself or "infile.xreadlines()", both of which are iterators for the lines in the file.
In older versions of Python, a file object is not itself an iterator. In those older versions of Python, you may need to use infile.readlines() or a while loop containing infile.readline() For example:
>>> infile = file('tmp.txt', 'r') >>> for line in infile.readlines(): ... print 'Line:', line ...
A few additional comments about strings:
A string is a special kind of sequence. So, you can index into the characters of a string and you can iterate over the characters in a string. For example:
>>> s1 = 'abcd' >>> s1[1] 'b' >>> s1[2] 'c' >>> for ch in s1: ... print ch ... a b c d
If you need to do fast or complex string searches, there is a regular expression module in the standard library. re re - Regular expression operations -- http://docs.python.org/library/re.html.
An interesting feature of string formatting is the ability to use dictionaries to supply the values that are inserted. Here is an example:
names = {'tree': 'sycamore', 'flower': 'poppy', 'herb': 'arugula'} print 'The tree is %(tree)s' % names print 'The flower is %(flower)s' % names print 'The herb is %(herb)s' % names
There are several types of sequences in Python. We've already discussed strings, which are sequences of characters. In this section we will describe lists and tuples. See Built-in Types: Sequence Types - str, unicode, list, tuple, buffer, xrange -- http://docs.python.org/library/stdtypes.html#sequence-types-str-unicode-list-tuple-buffer-xrange for more information on Python's built-in sequence types.
Lists are dynamic arrays. They are arrays in the sense that you can index items in a list (for example "mylist[3]") and you can select sub-ranges (for example "mylist[2:4]"). They are dynamic in the sense that you can add and remove items after the list is created.
Tuples are light-weight lists, but differ from lists in that they are immutable. That is, once a tuple has been created, you cannot modify it. You can, of course, modify any (modifiable) objects that the tuple contains, in other words that it refers to.
Capabilities of lists:
Capabilities of lists and tuples:
Collections -- Not all collections in Python are ordered sequences. Here is a comparison of some different types of collections in Python and their characteristics:
To create a list use square brackets. Examples:
>>> items = [111, 222, 333] >>> items [111, 222, 333]
Create a new list or copy an existing one with the list constructor:
>>> trees1 = list(['oak', 'pine', 'sycamore']) >>> trees1 ['oak', 'pine', 'sycamore'] >>> trees2 = list(trees1) >>> trees2 ['oak', 'pine', 'sycamore'] >>> trees1 is trees2 False
To create a tuple, use commas, and possibly parentheses as well:
>>> a = (11, 22, 33, ) >>> b = 'aa', 'bb' >>> c = 123, >>> a (11, 22, 33) >>> b ('aa', 'bb') >>> c (123,) >>> type(c) <type 'tuple'>
Notes:
To create a tuple containing a single item, we still need the comma. Example:
>>> print ('abc',) ('abc',) >>> type(('abc',)) <type 'tuple'>
To add an item to the end of a list, use append():
>>> items.append(444) >>> items [111, 222, 333, 444]
To insert an item into a list, use insert(). This example inserts an item at the beginning of a list:
>>> items.insert(0, -1) >>> items [-1, 111, 222, 333, 444]
To add two lists together, creating a new list, use the + operator. To add the items in one list to an existing list, use the extend() method. Examples:
>>> a = [11, 22, 33,] >>> b = [44, 55] >>> c = a + b >>> c [11, 22, 33, 44, 55] >>> a [11, 22, 33] >>> b [44, 55] >>> a.extend(b) >>> a [11, 22, 33, 44, 55]
You can also push items onto the right end of a list and pop items off the right end of a list with append() and pop(). This enables us to use a list as a stack-like data structure. Example:
>>> items = [111, 222, 333, 444,] >>> items [111, 222, 333, 444] >>> items.append(555) >>> items [111, 222, 333, 444, 555] >>> items.pop() 555 >>> items [111, 222, 333, 444]
And, you can iterate over the items in a list or tuple (or other collection, for that matter) with the for: statement:
>>> for item in items: ... print 'item:', item ... item: -1 item: 111 item: 222 item: 333 item: 444
For more on the for: statement, see section for: statement.
A dictionary is:
A dictionary has the following capabilities:
For help on dictionaries, type:
>>> help dict
at Python's interactive prompt, or:
$ pydoc dict
at the command line.
It also may be helpful to use the built-in dir() function, then to ask for help on a specific method. Example:
>>> a = {} >>> dir(a) ['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values'] >>> >>> help(a.keys) Help on built-in function keys: keys(...) D.keys() -> list of D's keys
More information about dictionary objects is available here: Mapping types - dict -- http://docs.python.org/library/stdtypes.html#mapping-types-dict.
Create a dictionary with curly bracets. Items in a dictionary are separate by commas. Use a colon between each key and its associated value:
>>> lookup = {} >>> lookup {} >>> states = {'az': 'Arizona', 'ca': 'California'} >>> states['ca'] 'California'
or:
>>> def fruitfunc(): ... print "I'm a fruit." >>> def vegetablefunc(): ... print "I'm a vegetable." >>> >>> lookup = {'fruit': fruitfunc, 'vegetable': vegetablefunc} >>> lookup {'vegetable': <function vegetablefunc at 0x4028980c>, 'fruit': <function fruitfunc at 0x4028e614>} >>> lookup['fruit']() I'm a fruit. >>> lookup['vegetable']() I'm a vegetable.
or:
>>> lookup = dict((('aa', 11), ('bb', 22), ('cc', 33))) >>> lookup {'aa': 11, 'cc': 33, 'bb': 22}
Note that the keys in a dictionary must be immutable. Therefore, you can use any of the following as keys: numbers, strings, tuples.
Test for the existence of a key in a dictionary with the in operator:
>>> if 'fruit' in lookup: ... print 'contains key "fruit"' ... contains key "fruit"
or, alternatively, use the (slightly out-dated) has_key() method:
>>> if lookup.has_key('fruit'): ... print 'contains key "fruit"' ... contains key "fruit"
Access the value associated with a key in a dictionary with the indexing operator (square brackets):
>>> print lookup['fruit'] <function fruitfunc at 0x4028e614>
Notice that the above will throw an exception if the key is not in the dictionary:
>>> print lookup['salad'] Traceback (most recent call last): File "<stdin>", line 1, in <module> KeyError: 'salad'
And so, the get() method is an easy way to get a value from a dictionary while avoiding an exception. For example:
>>> print lookup.get('fruit') <function fruitfunc at 0x4028e614> >>> print lookup.get('salad') None >>> print lookup.get('salad', fruitfunc) <function fruitfunc at 0x4028e614>
A dictionary is an iterator object that produces its keys. So, we can iterate over the keys in a dictionary as follows:
>>> for key in lookup: ... print 'key: %s' % key ... lookup[key]() ... key: vegetable I'm a vegetable. key: fruit I'm a fruit.
And, remember that you can sub-class dictionaries. Here are two versions of the same example. The keyword arguments in the second version require Python 2.3 or later:
# # This example works with Python 2.2. class MyDict_for_python_22(dict): def __init__(self, **kw): for key in kw.keys(): self[key] = kw[key] def show(self): print 'Showing example for Python 2.2 ...' for key in self.keys(): print 'key: %s value: %s' % (key, self[key]) def test_for_python_22(): d = MyDict_for_python_22(one=11, two=22, three=33) d.show() test_for_python_22()
A version for newer versions of Python:
# # This example works with Python 2.3 or newer versions of Python. # Keyword support, when sub-classing dictionaries, seems to have # been enhanced in Python 2.3. class MyDict(dict): def show(self): print 'Showing example for Python 2.3 or newer.' for key in self.keys(): print 'key: %s value: %s' % (key, self[key]) def test(): d = MyDict(one=11, two=22, three=33) d.show() test()
Running this example produces:
Showing example for Python 2.2 ... key: one value: 11 key: three value: 33 key: two value: 22 Showing example for Python 2.3 or newer. key: three value: 33 key: two value: 22 key: one value: 11
A few comments about this example:
Use a file object any time you wish to read from or write to the disk file system.
Here is an example that (1) writes to a file, then (2) appends to that file, and finally, (3) reads from the file:
def write_file(outfilename): outfile = open(outfilename, 'w') outfile.write('Line # 1\n') outfile.write('Line # 2\n') outfile.write('Line # 3\n') outfile.close() def append_file(outfilename): outfile = open(outfilename, 'a') outfile.write('Line # 4\n') outfile.write('Line # 5\n') outfile.close() def read_file(infilename): infile = open(infilename, 'r') for line in infile: print line.rstrip() infile.close() def test(): filename = 'temp_file.txt' write_file(filename) read_file(filename) append_file(filename) print '-' * 50 read_file(filename) test()
To read a text file, first create a file object. Here is an example:
inFile = open('messages.log', 'r')
Then use the file object as an iterator or use one or more of the file object's methods to process the contents of the file. Here are a few strategies:
Use for line in inFile: to process one line at a time. You can do this because (at least since Python 2.3) file objects obey the iterator protocol, that is they support methods __iter__() and next(). For more on the iterator protocol see Python Standard Library: Iterator Types -- http://docs.python.org/library/stdtypes.html#iterator-types.
Example:
>>> inFile = file('tmp.txt', 'r') >>> for line in inFile: ... print 'Line:', line, ... Line: aaaaa Line: bbbbb Line: ccccc Line: ddddd Line: eeeee >>> inFile.close()
For earlier versions of Python, one strategy is to use "inFile.readlines()", which creates a list of lines.
If your want to get the contents of an entire text file as a collection of lines, use readlines(). Alternatively, you could use read() followed by splitlines(). Example:
>>> inFile = open('data2.txt', 'r') >>> lines = inFile.readlines() >>> inFile.close() >>> lines ['aaa bbb ccc\n', 'ddd eee fff\n', 'ggg hhh iii\n'] >>> >>> inFile = open('data2.txt', 'r') >>> content = inFile.read() >>> inFile.close() >>> lines = content.splitlines() >>> lines ['aaa bbb ccc', 'ddd eee fff', 'ggg hhh iii']
Use inFile.read() to get the entire contents of the file (a string). Example:
>>> inFile = open('tmp.txt', 'r') >>> content = inFile.read() >>> inFile.close() >>> print content aaa bbb ccc ddd eee fff ggg hhh iii >>> words = content.split() >>> print words ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh', 'iii'] >>> for word in words: ... print word ... aaa bbb ccc ddd eee fff ggg hhh iii
Simple statements in Python do not contain a nested block.
Alert: In Python version 3.0, the print statement has become the print() built-in function. You will need to add parentheses.
The print statement sends output to stdout.
Here are a few examples:
print obj print obj1, obj2, obj3 print "My name is %s" % name
Notes:
To print multiple items, separate them with commas. The print statement inserts a blank between objects.
The print statement automatically appends a newline to output. To print without a newline, add a comma after the last object, or use "sys.stdout", for example:
print 'Output with no newline',
which will append a blank, or:
import sys sys.stdout.write("Some output")
To re-define the destination of output from the print statement, replace sys.stdout with an instance of a class that supports the write method. For example:
import sys class Writer: def __init__(self, filename): self.filename = filename def write(self, msg): f = file(self.filename, 'a') f.write(msg) f.close() sys.stdout = Writer('tmp.log') print 'Log message #1' print 'Log message #2' print 'Log message #3'
More information on the print statement is at The print statement -- http://docs.python.org/reference/simple_stmts.html#the-print-statement.
The assignment operator is =.
Here are some of the things you can assign a value to:
A name (variable)
An item (position) in a list. Example:
>>> a = [11, 22, 33] >>> a [11, 22, 33] >>> a[1] = 99 >>> a [11, 99, 33]
A key in a dictionary. Example:
>>> names = {} >>> names['albert'] = 25 >>> names {'albert': 25}
A slice in a list. Example:
>>> a = [11, 22, 33, 44, 55, 66, 77, ] >>> a [11, 22, 33, 44, 55, 66, 77] >>> a[1:3] = [999, 888, 777, 666] >>> a [11, 999, 888, 777, 666, 44, 55, 66, 77]
A tuple or list. Assignment to a tuple or list performs unpacking. Example:
>>> values = 111, 222, 333 >>> values (111, 222, 333) >>> a, b, c = values >>> a 111 >>> b 222 >>> c 333
Unpacking suggests a convenient idiom for returning and capturing a multiple arguments from a function. Example:
>>> def multiplier(n): ... return n, n * 2, n * 3 ... >>> >>> x, y, z = multiplier(4) >>> x 4 >>> y 8 >>> z 12
If a function needs to return a variable number of values, then unpacking will not do. But, you can still return multiple values by returning a container of some kind (for example, a tuple, a list, a dictionary, a set, etc.).
An attribute. Example:
>>> class A(object): ... pass ... >>> c = A() >>> >>> a = A() >>> a.size = 33 >>> print a.size 33 >>> a.__dict__ {'size': 33}
Things to know about the import statement:
Here are several forms of the import statement:
Import a module. Refer to an attribute in that module:
import test print test.x
Import a specific attribute from a module:
from test import x from othertest import y, z print x, y, z
Import all the attributes in a module:
from test import * print x print y
Recommendation: Use this form sparingly. Using from mod import * makes it difficult to track down variables and, thus, to debug your code
Import a module and rename it. Import an attribute from a module and rename it:
import test as theTest from test import x as theValue print theTest.x print theValue
A few comments about import:
More information on import is at Language Reference: The import statement -- http://docs.python.org/reference/simple_stmts.html#the-import-statement.
Use the assert statement to place error checking statements in your code. Here is an example:
def test(arg1, arg2): arg1 = float(arg1) arg2 = float(arg2) assert arg2 != 0, 'Bad dividend -- arg1: %f arg2: %f' % (arg1, arg2) ratio = arg1 / arg2 print 'ratio:', ratio
When arg2 is zero, running this code will produce something like the following:
Traceback (most recent call last): File "tmp.py", line 22, in ? main() File "tmp.py", line 18, in main test(args[0], args[1]) File "tmp.py", line 8, in test assert arg2 != 0, 'Bad dividend -- arg1: %f arg2: %f' % (arg1, arg2) AssertionError: Bad dividend -- arg1: 2.000000 arg2: 0.000000
A few comments:
The problem -- Imagine a global variable NAME. If, in a function, the first mention of that variable is "name = NAME", then I'll get the value of the the global variable NAME. But, if, in a function, my first mention of that variable is an assignment to that variable, then I will create a new local variable, and will not refer to the global variable at all. Consider:
NAME = "Peach" def show_global(): name = NAME print '(show_global) name: %s' % name def set_global(): NAME = 'Nectarine' name = NAME print '(set_global) name: %s' % name show_global() set_global() show_global()
Running this code produces:
(show_global) name: Peach (set_global) name: Nectarine (show_global) name: Peach
The set_global modifies a local variable and not the global variable as I might have intended.
The solution -- How can I fix that? Here is how:
NAME = "Peach" def show_global(): name = NAME print '(show_global) name: %s' % name def set_global(): global NAME NAME = 'Nectarine' name = NAME print '(set_global) name: %s' % name show_global() set_global() show_global()
Notice the global statement in function set_global. Running this code does modify the global variable NAME, and produces the following output:
(show_global) name: Peach (set_global) name: Nectarine (show_global) name: Nectarine
Comments:
You can list more than one veriable in the global statement. For example:
global NAME1, NAME2, NAME3
A compound statement has a nested (and indented) block of code. A compound statement may have multiple clauses, each with a nested block of code. Each compound statement has a header line (which starts with a keyword and ends with a colon).
The if statement enables us to execute code (or not) depending on a condition:
if condition: statement-block if condition: statement-block-1 else: statement-block-2 if condition-1: statement-block-1 elif condition-2: statement-block-2 o o o else: statement-block-n
Here is an example:
>>> y = 25 >>> >>> if y > 15: ... print 'y is large' ... else: ... print 'y is small' ... y is large
A few notes:
The condition can be any expression, i.e. something that returns a value. A detailed description of expressions can be found at Python Language Reference: Expressions -- http://docs.python.org/reference/expressions.html.
Parentheses are not needed around the condition. Use parentheses to group sub-expressions and control the order of evaluation when the natural operator precedence is not what you want. Python's operator precedences are described at Python Language Reference: Expressions: Summary -- http://docs.python.org/reference/expressions.html#summary.
Python has no switch statement. Use if:elif:.... Or consider using a dictionary. Here is an example that uses both of these techniques:
def function1(): print "Hi. I'm function 1." def function2(): print "Hi. I'm function 2." def function3(): print "Hi. I'm function 3." def error_function(): print "Invalid option." def test1(): while 1: code = raw_input('Enter "one", "two", "three", or "quit": ') if code == 'quit': break if code == 'one': function1() elif code == 'two': function2() elif code == 'three': function3() else: error_function() def test2(): mapper = {'one': function1, 'two': function2, 'three': function3} while 1: code = raw_input('Enter "one", "two", "three", or "quit": ') if code == 'quit': break func = mapper.get(code, error_function) func() def test(): test1() print '-' * 50 test2() if __name__ == '__main__': test()
The for: statement enables us to iterate over collections. It enables us to repeat a set of lines of code once for each item in a collection. Collections are things like strings (arrays of characters), lists, tuples, and dictionaries.
Here is an example:
>>> collection = [111,222,333] >>> for item in collection: ... print 'item:', item ... item: 111 item: 222 item: 333
Comments:
You can iterate over strings, lists, and tuples. Actually, you can iterate over almost any container-like object.
Iterate over the keys or values in a dictionary with "aDict.keys()" and "aDict.values()". Here is an example:
>>> aDict = {'cat': 'furry and cute', 'dog': 'friendly and smart'} >>> aDict.keys() ['dog', 'cat'] >>> aDict.values() ['friendly and smart', 'furry and cute'] >>> for key in aDict.keys(): ... print 'A %s is %s.' % (key, aDict[key]) ... A dog is friendly and smart. A cat is furry and cute.
In recent versions of Python, a dictionary itself is an iterator for its keys. Therefore, you can also do the following:
>>> for key in aDict: ... print 'A %s is %s.' % (key, aDict[key]) ... A dog is friendly and smart. A cat is furry and cute.
And, in recent versions of Python, a file is also an iterator over the lines in the file. Therefore, you can do the following:
>>> infile = file('tmp.txt', 'r') >>> for line in infile: ... print line, ... This is line #1 This is line #2 This is line #3 >>> infile.close()
There are other kinds of iterators. For example, the built-in iter will produce an iterator from a collection. Here is an example:
def test(): anIter = iter([11,22,33]) for item in anIter: print 'item:', item test()
Which produces:
item: 11 item: 22 item: 33
You can also implement iterators of your own. One way to do so, is to define a function that returns values with yield (instead of with return). Here is an example:
def t(collection): icollection = iter(collection) for item in icollection: yield '||%s||' % item def test(): collection = [111, 222, 333, ] for x in t(collection): print x test()
Which prints out:
||111|| ||222|| ||333||
If an iterator produces a sequence of lists or tuples, each of which contain the same (small) number of items, then you can do unpacking directly in the header of the for: statement. Here is an example:
In [5]: collection = [('apple', 'red'), ('banana', 'yello'), ('kiwi', 'green')] In [6]: for name, color in collection: ...: print 'name: %s, color: %s' % (name, color, ) ...: ...: name: apple, color: red name: banana, color: yello name: kiwi, color: green
The unpacking described above and the enumerate built-in function provides a convenient way to process items in a sequence with an index. This example adds each value in one list to the corresponding value in a second list:
In [9]: a = [11, 22, 33] In [10]: a = [11, 22, 33] In [11]: In [12]: a = [11, 22, 33] In [13]: b = [111, 222, 333] In [14]: for idx, value in enumerate(a): ....: b[idx] += value ....: ....: In [15]: a Out[15]: [11, 22, 33] In [16]: b Out[16]: [122, 244, 366]
Another way to implement an iterator is to implement a class that supports the iterator protocol. See Python Standard Library: Iterator Types -- http://docs.python.org/library/stdtypes.html#iterator-types for more on implementing iterators. But notice that this protocol has changed in Python 3.0. For information on the iterator protocol in Python 3.0, see Python 3.0 Standard Library: Iterator Types -- http://docs.python.org/3.0/library/stdtypes.html#iterator-types.
while: is another repeating statement. It executes a block of code until a condition is false.
Here is an example:
>>> reply = 'repeat' >>> while reply == 'repeat': ... print 'Hello' ... reply = raw_input('Enter "repeat" to do it again: ') ... Hello Enter "repeat" to do it again: repeat Hello Enter "repeat" to do it again: bye
Comments:
Use the break statement to exit immediately from a loop. This works in both for: and while:. Here is an example that uses break in a for: statement:
# for_break.py """Count lines until a line that begins with a double #. """ import sys def countLines(infilename): infile = file(infilename, 'r') count = 0 for line in infile.readlines(): line = line.strip() if line[:2] == '##': break count += 1 return count def usage(): print 'Usage: python python_101_for_break.py <infilename>' sys.exit(-1) def main(): args = sys.argv[1:] if len(args) != 1: usage() count = countLines(args[0]) print 'count:', count if __name__ == '__main__': main()
Use the continue statement to skip the remainder of the code block in a for: or while: statement. A continue is a short-circuit which, in effect, branches immediately back to the top of the for: or while: statement (or if you prefer, to the end of the block).
The test if __name__ == '__main__': is used to enable a script to both be (1) imported and (2) run from the command line. That condition is true only when the script is run, but not imported. This is a common Python idiom, which you should consider including at the end of your scripts, whether (1) to give your users a demonstration of what your script does and how to use it or (2) to provide a test of the script.
Use a try:except: statement to catch an exception.
Use the raise statement to raise an exception.
Comments and hints:
Catch all exceptions with a "bare" except:. For example:
>>> try: ... x = y ... except: ... print 'y not defined' ... y not defined
Note, however, that it is usually better to catch specific exceptions.
Catch a specific error by refering to an exception class in the except:. To determine what error or exception you want to catch, generate it and try it. Because Python reports errors with a walk-back that ends with reporting the exception, you can learn which exception to catch. For example, suppose I want to learn which exception is thrown when a Python can't open a file. I can try the following from the interactive prompt:
>>> myfile = file('amissingfile.py', 'r') Traceback (most recent call last): File "<stdin>", line 1, in ? IOError: [Errno 2] No such file or directory: 'amissingfile.py'
So, now I know that I can do:
def test(): try: myfile = file('amissingfile.py', 'r') except IOError: print 'amissingfile.py is missing' test()
Which produces:
amissingfile.py is missing
Exception types are described here: Python Standard Library: Built-in Exceptions -- http://docs.python.org/library/exceptions.html.
Catch any one of several execption types by using a tuple containing the exceptions to be caught. Example:
try: f = open('abcdexyz.txt', 'r') d = {} x = d['name'] except (IOError, KeyError), e: print 'The error is --', e
Note that multiple types of exceptions to be caught by a single except: clause are in parentheses; they are a tuple.
You can customize your error handling still further (1) by passing an object when you raise the exception and (2) by catching that object in the except: clause of your try: statement. By doing so, you can pass information up from the raise statement to an exception handler. One way of doing this is to pass an object. A reasonable strategy is to define a sub-class of a standard exception. For example:
class E(Exception): def __init__(self, msg): self.msg = msg def getMsg(self): return self.msg def test(): try: raise E('my test error') except E, obj: print 'Msg:', obj.getMsg() test()
Which produces:
Msg: my test error
If you catch an exception using try:except:, but then find that you do not want to handle the exception at that location, you can "re-raise" the same exception (with the same arguments) by using raise with no arguments. An example:
class GeneralException(Exception): pass class SimpleException(GeneralException): pass class ComplexException(GeneralException): pass def some_func_that_throws_exceptions(): #raise SimpleException('this is a simple error') raise ComplexException('this is a complex error') def test(): try: some_func_that_throws_exceptions() except GeneralException, e: if isinstance(e, SimpleException): print e else: raise test()
This section describes Python features that you can use to organize and structure your code.
Use def to define a function. Here is a simple example:
def test(msg, count): for idx in range(count): print '%s %d' % (msg, idx) test('Test #', 4)
Comments:
After evaluation def creates a function object.
Call the function using the parentheses function call notation, in this case "test('Test #', 4)".
As with other Python objects, you can stuff a function object into other structures such as tuples, lists, and dictionaries. Here is an example:
# Create a tuple: val = (test, 'A label:', 5) # Call the function: val[0](val[1], val[2])
Providing default arguments allows the caller to omit some arguments. Here is an example:
def testDefaultArgs(arg1='default1', arg2='default2'): print 'arg1:', arg1 print 'arg2:', arg2 testDefaultArgs('Explicit value')
The above example prints:
arg1: Explicit value arg2: default2
Here is an example:
def testArgLists_1(*args, **kwargs): print 'args:', args print 'kwargs:', kwargs testArgLists_1('aaa', 'bbb', arg1='ccc', arg2='ddd') def testArgLists_2(arg0, *args, **kwargs): print 'arg0: "%s"' % arg0 print 'args:', args print 'kwargs:', kwargs def test(): testArgLists_1('aaa', 'bbb', arg1='ccc', arg2='ddd') print '=' * 40 testArgLists_2('a first argument', 'aaa', 'bbb', arg1='ccc', arg2='ddd') test()
Running this example displays:
args: ('aaa', 'bbb') kwargs: {'arg1': 'ccc', 'arg2': 'ddd'} ======================================== arg0: "a first argument" args: ('aaa', 'bbb') kwargs: {'arg1': 'ccc', 'arg2': 'ddd'}
A little guidance:
You can also call a function using the name of a parameter as a keyword. Here is an example:
def test_keyword_args(foreground_color='black', background_color='white', link_color='blue', visited_link_color='red'): print 'foreground_color: "%s"' % foreground_color print 'background_color: "%s"' % background_color print 'link_color: "%s"' % link_color print 'visited_link_color: "%s"' % visited_link_color def test(): test_keyword_args() print '-' * 40 test_keyword_args(background_color='green') print '-' * 40 test_keyword_args(link_color='gray', visited_link_color='yellow') test()
When we run this example, it produces the following:
foreground_color: "black" background_color: "white" link_color: "blue" visited_link_color: "red" ---------------------------------------- foreground_color: "black" background_color: "green" link_color: "blue" visited_link_color: "red" ---------------------------------------- foreground_color: "black" background_color: "white" link_color: "gray" visited_link_color: "yellow"
Define a basic class as follows:
class Basic: def __init__(self, name): self.name = name def show(self): print 'Basic -- name: %s' % self.name def test(): obj1 = Basic('Apricot') obj1.show() test()
Running the above example produces the following:
Basic -- name: Apricot
Explanation:
A few more notes on self:
Define a class Special that inherits from a super-class Basic as follows:
class Basic: def __init__(self, name): self.name = name def show(self): print 'Basic -- name: %s' % self.name class Special(Basic): def __init__(self, name, edible): Basic.__init__(self, name) self.upper = name.upper() self.edible = edible def show(self): Basic.show(self) print 'Special -- upper name: %s.' % self.upper if self.edible: print "It's edible." else: print "It's not edible." def edible(self): return self.edible def test(): obj1 = Basic('Apricot') obj1.show() print '=' * 30 obj2 = Special('Peach', 1) obj2.show() test()
Running this example produces the following:
Basic -- name: Apricot ============================== Basic -- name: Peach Special -- upper name: PEACH. It's edible.
Comments:
A class data member is a member that has only one value for the class and all its instances. Here is an example from the Python FAQ at http://www.python.org/doc/FAQ.html:
class C: count = 0 # number of times C.__init__ called def __init__(self): C.count += 1 def getcount(self): return C.count # or return self.count def test(): c1 = C() print 'Current count:', c1.getcount() c2 = C() print 'Current count:', c2.getcount() test()
Running this example produces:
Current count: 1 Current count: 2
New-style classes can have static methods and class methods.
A new-style class is a class that inherits directly or indirectly from object or from a built-in type.
Here is an example that shows how to define static methods and class methods:
class Advanced(object): def __init__(self, name): self.name = name def Description(): return 'This is an advanced class.' def ClassDescription(cls): return 'This is advanced class: %s' % repr(cls) Description = staticmethod(Description) ClassDescription = classmethod(ClassDescription) obj1 = Advanced('Nectarine') print obj1.Description() print obj1.ClassDescription() print '=' * 30 print Advanced.Description() print Advanced.ClassDescription()
Running the above produces the following output:
This is an advanced class. This is advanced class: <class __main__.Advanced at 0x401c926c> ============================== This is an advanced class. This is advanced class: <class __main__.Advanced at 0x401c926c>
Notes:
The difference between static and class methods is that a class method receives the class (not the instance) as its first argument. A summary:
By now, you are likely to be asking: "Why and when should I use class methods and static methods?" Here is a bit of guidance:
To summarize:
Implement an instance method, unless ... the method needs access to class variables but not instance variables, then implement a class method, unless ... the method needs access to neither instance variables nor class variables and you still want to include it within the class definition, then implement a static method.
Above all, write clear, plain code that will be understandable to your readers. Do not use a more confusing language feature and do not force your readers to learn a new language feature unles you have a good reason.
A new-style class can have properties. A property is an attribute of a class that is associated with a getter and a setter function.
Declare the property and its getter and setter functions with property().
Here is an example:
class A(object): count = 0 def __init__(self, name): self.name = name def set_name(self, name): print 'setting name: %s' % name self.name = name def get_name(self): print 'getting name: %s' % self.name return self.name objname = property(get_name, set_name) def test(): a = A('apple') print 'name: %s' % a.objname a.objname = 'banana' print 'name: %s' % a.objname test()
Running the above produces the following output:
getting name: apple name: apple setting name: banana getting name: banana name: banana
Notes:
You can use a module to organize a number of Python definitions in a single file. A definition can be a function, a class, or a variable containing any Python object. Here is an example:
# python_101_module_simple.py """ This simple module contains definitions of a class and several functions. """ LABEL = '===== Testing a simple module =====' class Person: """Sample of a simple class definition. """ def __init__(self, name, description): self.name = name self.description = description def show(self): print 'Person -- name: %s description: %s' % (self.name, self.description) def test(msg, count): """A sample of a simple function. """ for idx in range(count): print '%s %d' % (msg, idx) def testDefaultArgs(arg1='default1', arg2='default2'): """A function with default arguments. """ print 'arg1:', arg1 print 'arg2:', arg2 def testArgLists(*args, **kwargs): """ A function which references the argument list and keyword arguments. """ print 'args:', args print 'kwargs:', kwargs def main(): """ A test harness for this module. """ print LABEL person = Person('Herman', 'A cute guy') person.show() print '=' * 30 test('Test #', 4) print '=' * 30 testDefaultArgs('Explicit value') print '=' * 30 testArgLists('aaa', 'bbb', arg1='ccc', arg2='ddd') if __name__ == '__main__': main()
Running the above produces the following output:
===== Testing a simple module ===== Person -- name: Herman description: A cute guy ============================== Test # 0 Test # 1 Test # 2 Test # 3 ============================== arg1: Explicit value arg2: default2 ============================== args: ('aaa', 'bbb') kwargs: {'arg1': 'ccc', 'arg2': 'ddd'}
Comments:
The string definitions at the beginning of each of the module, class definitions, and function definitions serve as documentation for these items. You can show this documentation with the following from the command-line:
$ pydoc python_101_module_simple
Or this, from the Python interactive prompt:
>>> import python_101_module_simple >>> help(python_101_module_simple)
It is common and it is a good practice to include a test harness for the module at the end of the source file. Note that the test:
if __name__ == '__main__':
will be true only when the file is run (e.g. from the command-line with something like:
"$ python python_101_module_simple.py
but not when the module is imported.
Remember that the code in a module is only evaluated the first time it is imported in a program. So, for example, change the value of a global variable in a module might cause behavior that users of the module might not expect.
Constants, on the other hand, are safe. A constant, in Python, is a variable whose value is initialized but not changed. An example is LABEL, above.
A package is a way to organize a number of modules together as a unit. Python packages can also contain other packages.
To give us an example to talk about, consider the follow package structure:
package_example/ package_example/__init__.py package_example/module1.py package_example/module2.py package_example/A.py package_example/B.py
And, here are the contents:
__init__.py:
# __init__.py # Expose definitions from modules in this package. from module1 import class1 from module2 import class2
module1.py:
# module1.py class class1: def __init__(self): self.description = 'class #1' def show(self): print self.description
module2.py:
# module2.py class class2: def __init__(self): self.description = 'class #2' def show(self): print self.description
A.py:
# A.py import B
B.py:
# B.py def function_b(): print 'Hello from function_b'
In order to be used as a Python package (e.g. so that modules can be imported from it) a directory must contain a file whose name is __init__.py. The code in this module is evaluated the first time a module is imported from the package.
In order to import modules from a package, you may either add the package directory to sys.path or, if the parent directory is on sys.path, use dot-notation to explicitly specify the path. In our example, you might use: "import package_example.module1".
A module in a package can import another module from the same package directly without using the path to the package. For example, the module A in our sample package package_example can import module B in the same package with "import B". Module A does not need to use "import package_example.B".
You can find additional information on packages at http://www.python.org/doc/essays/packages.html.
Suggested techniques:
In the __init__.py file, import and make available objects defined in modules in the package. Our sample package package_example does this. Then, you can use from package_example import * to import the package and its contents. For example:
>>> from package_example import * >>> dir() ['__builtins__', '__doc__', '__file__', '__name__', 'atexit', 'class1', 'class2', 'module1', 'module2', 'readline', 'rlcompleter', 'sl', 'sys'] >>> >>> c1 = class1() >>> c2 = class2() >>> c1.show() class #1 >>> c2.show() class #2
A few additional notes:
With Python 2.3, you can collect the modules in a package into a Zip file by using PyZipFile from the Python standard library. See http://www.python.org/doc/current/lib/pyzipfile-objects.html.
>>> import zipfile >>> a = zipfile.PyZipFile('mypackage.zip', 'w', zipfile.ZIP_DEFLATED) >>> a.writepy('Examples') >>> a.close()
Then you can import and use this archive by inserting its path in sys.path. In the following example, class_basic_1 is a module within package mypackage:
>>> import sys >>> sys.path.insert(0, '/w2/Txt/Training/mypackage.zip') >>> import class_basic_1 Basic -- name: Apricot >>> obj = class_basic_1.Basic('Wilma') >>> obj.show() Basic -- name: Wilma
Thanks to the implementors of Python for producing an exceptionally usable and enjoyable programming language.
This document is intended as notes for a course on (slightly) advanced Python topics.
For more help on regular expressions, see:
A regular expression pattern is a sequence of characters that will match sequences of characters in a target.
The patterns or regular expressions can be defined as follows:
Because of the use of backslashes in patterns, you are usually better off defining regular expressions with raw strings, e.g. r"abc".
When a regular expression is to be used more than once, you should consider compiling it. For example:
import sys, re pat = re.compile('aa[bc]*dd') while 1: line = raw_input('Enter a line ("q" to quit):') if line == 'q': break if pat.search(line): print 'matched:', line else: print 'no match:', line
Comments:
Use match() to match at the beginning of a string (or not at all).
Use search() to search a string and match the first string from the left.
Here are some examples:
>>> import re >>> pat = re.compile('aa[0-9]*bb') >>> x = pat.match('aa1234bbccddee') >>> x <_sre.SRE_Match object at 0x401e9608> >>> x = pat.match('xxxxaa1234bbccddee') >>> x >>> type(x) <type 'NoneType'> >>> x = pat.search('xxxxaa1234bbccddee') >>> x <_sre.SRE_Match object at 0x401e9608>
Notes:
When a match or search is successful, it returns a match object. When it fails, it returns None.
You can also call the corresponding functions match and search in the re module, e.g.:
>>> x = re.search(pat, 'xxxxaa1234bbccddee') >>> x <_sre.SRE_Match object at 0x401e9560>
For a list of functions in the re module, see Module Contents -- http://docs.python.org/library/re.html#module-contents.
Match objects enable you to extract matched sub-strings after performing a match. A match object is returned by successful match. The part of the target available in the match object is the portion matched by groups in the pattern, that is the portion of the pattern inside parentheses. For example:
In [69]: mo = re.search(r'height: (\d*) width: (\d*)', 'height: 123 width: 456') In [70]: mo.groups() Out[70]: ('123', '456')
Here is another example:
import sys, re Targets = [ 'There are <<25>> sparrows.', 'I see <<15>> finches.', 'There is nothing here.', ] def test(): pat = re.compile('<<([0-9]*)>>') for line in Targets: mo = pat.search(line) if mo: value = mo.group(1) print 'value: %s' % value else: print 'no match' test()
When we run the above, it prints out the following:
value: 25 value: 15 no match
Explanation:
In addition, you can:
Use "values = mo.groups()" to get a tuple containing the strings matched by all groups.
Use "mo.expand()" to interpolate the group values into a string. For example, "mo.expand(r'value1: \1 value2: \2')"inserts the values of the first and second group into a string. If the first group matched "aaa" and the second matched "bbb", then this example would produce "value1: aaa value2: bbb". For example:
In [76]: mo = re.search(r'h: (\d*) w: (\d*)', 'h: 123 w: 456') In [77]: mo.expand(r'Height: \1 Width: \2') Out[77]: 'Height: 123 Width: 456'
You can extract multiple items with a single search. Here is an example:
import sys, re pat = re.compile('aa([0-9]*)bb([0-9]*)cc') while 1: line = raw_input('Enter a line ("q" to quit):') if line == 'q': break mo = pat.search(line) if mo: value1, value2 = mo.group(1, 2) print 'value1: %s value2: %s' % (value1, value2) else: print 'no match'
Comments:
A simple way to perform multiple replacements using a regular expression is to use the re.subn() function. Here is an example:
In [81]: re.subn(r'\d+', '***', 'there are 203 birds sitting in 2 trees') Out[81]: ('there are *** birds sitting in *** trees', 2)
For more complex replacements, use a function instead of a constant replacement string:
import re def repl_func(mo): s1 = mo.group(1) s2 = '*' * len(s1) return s2 def test(): pat = r'(\d+)' in_str = 'there are 2034 birds in 21 trees' out_str, count = re.subn(pat, repl_func, in_str) print 'in: "%s"' % in_str print 'out: "%s"' % out_str print 'count: %d' % count test()
And when we run the above, it produces:
in: "there are 2034 birds in 21 trees" out: "there are **** birds in ** trees" count: 2
Notes:
Here is an even more complex example -- You can locate sub-strings (slices) of a match and replace them:
import sys, re pat = re.compile('aa([0-9]*)bb([0-9]*)cc') while 1: line = raw_input('Enter a line ("q" to quit): ') if line == 'q': break mo = pat.search(line) if mo: value1, value2 = mo.group(1, 2) start1 = mo.start(1) end1 = mo.end(1) start2 = mo.start(2) end2 = mo.end(2) print 'value1: %s start1: %d end1: %d' % (value1, start1, end1) print 'value2: %s start2: %d end2: %d' % (value2, start2, end2) repl1 = raw_input('Enter replacement #1: ') repl2 = raw_input('Enter replacement #2: ') newline = (line[:start1] + repl1 + line[end1:start2] + repl2 + line[end2:]) print 'newline: %s' % newline else: print 'no match'
Explanation:
Alternatively, use "mo.span(1)" instead of "mo.start(1)" and "mo.end(1)" in order to get the start and end of a sub-match in a single operation. "mo.span(1)"returns a tuple: (start, end).
Put together a new string with string concatenation from pieces of the original string and replacement values. You can use string slices to get the sub-strings of the original string. In our case, the following gets the start of the string, adds the first replacement, adds the middle of the original string, adds the second replacement, and finally, adds the last part of the original string:
newline = line[:start1] + repl1 + line[end1:start2] + repl2 + line[end2:]
You can also use the sub function or method to do substitutions. Here is an example:
import sys, re pat = re.compile('[0-9]+') print 'Replacing decimal digits.' while 1: target = raw_input('Enter a target line ("q" to quit): ') if target == 'q': break repl = raw_input('Enter a replacement: ') result = pat.sub(repl, target) print 'result: %s' % result
Here is another example of the use of a function to insert calculated replacements.
import sys, re, string pat = re.compile('[a-m]+') def replacer(mo): return string.upper(mo.group(0)) print 'Upper-casing a-m.' while 1: target = raw_input('Enter a target line ("q" to quit): ') if target == 'q': break result = pat.sub(replacer, target) print 'result: %s' % result
Notes:
This is also a convenient use for a lambda instead of a named function, for example:
import sys, re, string pat = re.compile('[a-m]+') print 'Upper-casing a-m.' while 1: target = raw_input('Enter a target line ("q" to quit): ') if target == 'q': break result = pat.sub( lambda mo: string.upper(mo.group(0)), target) print 'result: %s' % result
Note 1: You will need a sufficiently recent version of Python in order to use iterators and generators. I believe that they were introduced in Python 2.2.
Note 2: The iterator protocol has changed slightly in Python version 3.0.
Goals for this section:
Definitions:
A few additional basic points:
This section attempts to provide examples that illustrate the generator/iterator pattern.
Why is this important?
Examples - The remainder of this section provides a set of examples which implement and use iterators.
This function contains a yield statement. Therefore, when we call it, it produces an iterator:
def generateItems(seq): for item in seq: yield 'item: %s' % item anIter = generateItems([]) print 'dir(anIter):', dir(anIter) anIter = generateItems([111,222,333]) for x in anIter: print x anIter = generateItems(['aaa', 'bbb', 'ccc']) print anIter.next() print anIter.next() print anIter.next() print anIter.next()
Running this example produces the following output:
dir(anIter): ['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__', 'gi_frame', 'gi_running', 'next'] item: 111 item: 222 item: 333 item: aaa item: bbb item: ccc Traceback (most recent call last): File "iterator_generator.py", line 14, in ? print anIter.next() StopIteration
Notes and explanation:
An alternative and perhaps simpler way to create an interator is to use a generator expression. This can be useful when you already have a collection or iterator to work with.
Then following example implements a function that returns a generator object. The effect is to generate the objects in a collection which excluding items in a separte collection:
DATA = [ 'lemon', 'lime', 'grape', 'apple', 'pear', 'watermelon', 'canteloupe', 'honeydew', 'orange', 'grapefruit', ] def make_producer(collection, excludes): gen = (item for item in collection if item not in excludes) return gen def test(): iter1 = make_producer(DATA, ('apple', 'orange', 'honeydew', )) print '%s' % iter1 for fruit in iter1: print fruit test()
When run, this example produces the following:
$ python workbook063.py <generator object <genexpr> at 0x7fb3d0f1bc80> lemon lime grape pear watermelon canteloupe grapefruit
Notes:
Each time this method is called, it produces a (new) iterator object. This method is analogous to the iterkeys and itervalues methods in the dictionary built-in object:
# # A class that provides an iterator generator method. # class Node: def __init__(self, name='<noname>', value='<novalue>', children=None): self.name = name self.value = value self.children = children if children is None: self.children = [] else: self.children = children def set_name(self, name): self.name = name def get_name(self): return self.name def set_value(self, value): self.value = value def get_value(self): return self.value def iterchildren(self): for child in self.children: yield child # # Print information on this node and walk over all children and # grandchildren ... def walk(self, level=0): print '%sname: %s value: %s' % ( get_filler(level), self.get_name(), self.get_value(), ) for child in self.iterchildren(): child.walk(level + 1) # # An function that is the equivalent of the walk() method in # class Node. # def walk(node, level=0): print '%sname: %s value: %s' % ( get_filler(level), node.get_name(), node.get_value(), ) for child in node.iterchildren(): walk(child, level + 1) def get_filler(level): return ' ' * level def test(): a7 = Node('gilbert', '777') a6 = Node('fred', '666') a5 = Node('ellie', '555') a4 = Node('daniel', '444') a3 = Node('carl', '333', [a4, a5]) a2 = Node('bill', '222', [a6, a7]) a1 = Node('alice', '111', [a2, a3]) # Use the walk method to walk the entire tree. print 'Using the method:' a1.walk() print '=' * 30 # Use the walk function to walk the entire tree. print 'Using the function:' walk(a1) test()
Running this example produces the following output:
Using the method: name: alice value: 111 name: bill value: 222 name: fred value: 666 name: gilbert value: 777 name: carl value: 333 name: daniel value: 444 name: ellie value: 555 ============================== Using the function: name: alice value: 111 name: bill value: 222 name: fred value: 666 name: gilbert value: 777 name: carl value: 333 name: daniel value: 444 name: ellie value: 555
Notes and explanation:
This class implements the iterator protocol. Therefore, instances of this class are iterators. The presence of the next() and __iter__() methods means that this class implements the iterator protocol and makes instances of this class iterators.
Note that when an iterator is "exhausted" it, normally, cannot be reused to iterate over the sequence. However, in this example, we provide a refresh method which enables us to "rewind" and reuse the iterator instance:
# # An iterator class that does *not* use ``yield``. # This iterator produces every other item in a sequence. # class IteratorExample: def __init__(self, seq): self.seq = seq self.idx = 0 def next(self): self.idx += 1 if self.idx >= len(self.seq): raise StopIteration value = self.seq[self.idx] self.idx += 1 return value def __iter__(self): return self def refresh(self): self.idx = 0 def test_iteratorexample(): a = IteratorExample('edcba') for x in a: print x print '----------' a.refresh() for x in a: print x print '=' * 30 a = IteratorExample('abcde') try: print a.next() print a.next() print a.next() print a.next() print a.next() print a.next() except StopIteration, e: print 'stopping', e
Running this example produces the following output:
d b ---------- d b ============================== b d stopping
Notes and explanation:
There may be times when the next method is easier and more straight-forward to implement using yield. If so, then this class might serve as an model. If you do not feel the need to do this, then you should ignore this example:
# # An iterator class that uses ``yield``. # This iterator produces every other item in a sequence. # class YieldIteratorExample: def __init__(self, seq): self.seq = seq self.iterator = self._next() self.next = self.iterator.next def _next(self): flag = 0 for x in self.seq: if flag: flag = 0 yield x else: flag = 1 def __iter__(self): return self.iterator def refresh(self): self.iterator = self._next() self.next = self.iterator.next def test_yielditeratorexample(): a = YieldIteratorExample('edcba') for x in a: print x print '----------' a.refresh() for x in a: print x print '=' * 30 a = YieldIteratorExample('abcde') try: print a.next() print a.next() print a.next() print a.next() print a.next() print a.next() except StopIteration, e: print 'stopping', e test_yielditeratorexample()
Running this example produces the following output:
d b ---------- d b ============================== b d stopping
Notes and explanation:
Because the _next method uses yield, calling it (actually, calling the iterator object it produces) in an iterator context causes it to be "resumed" immediately after the yield statement. This reduces bookkeeping a bit.
However, with this style, we must explicitly produce an iterator. We do this by calling the _next method, which contains a yield statement, and is therefore a generator. The following code in our constructor (__init__) completes the set-up of our class as an iterator class:
self.iterator = self._next() self.next = self.iterator.next
Remember that we need both __iter__() and next() methods in orderDictionary to satisfy the iterator protocol. The __iter__() method is already there and the above code in the constructor creates the next() method.
A list comprehension looks a bit like an iterator, but it produces a list. See: The Python Language Reference: List displays -- http://docs.python.org/reference/expressions.html#list-displays for more on list comprehensions.
Here is an example:
In [4]: def f(x): ...: return x * 3 ...: In [5]: list1 = [11, 22, 33] In [6]: list2 = [f(x) for x in list1] In [7]: print list2 [33, 66, 99]
A generator expression looks quite similar to a list comprehension, but is enclosed in parentheses rather than square brackets. Unlike a list comprehension, a generator expression does not produce a list; it produces an generator object. A generator object is an iterator.
For more on generator expressions, see The Python Language Reference: Generator expressions -- http://docs.python.org/reference/expressions.html#generator-expressions.
The following example uses a generator expression to produce an iterator:
mylist = range(10) def f(x): return x*3 genexpr = (f(x) for x in mylist) for x in genexpr: print x
Notes and explanation:
Unit test and the Python unit test framework provide a convenient way to define and run tests that ensure that a Python application produces specified results.
This section, while it will not attempt to explain everything about the unit test framework, will provide examples of several straight-forward ways to construct and run tests.
Some assumptions:
In the test class, implement a number of methods to perform your tests. Name your test methods with the prefix "test". Here is an example:
class MyTest: def test_one(self): # some test code pass def test_two(self): # some test code pass
Create a test harness. Here is an example:
# make the test suite. def suite(): loader = unittest.TestLoader() testsuite = loader.loadTestsFromTestCase(MyTest) return testsuite # Make the test suite; run the tests. def test(): testsuite = suite() runner = unittest.TextTestRunner(sys.stdout, verbosity=2) result = runner.run(testsuite)
Here is a more complete example:
import sys, StringIO, string import unittest import webserv_example_heavy_sub # A comparison function for case-insenstive sorting. def mycmpfunc(arg1, arg2): return cmp(string.lower(arg1), string.lower(arg2)) class XmlTest(unittest.TestCase): def test_import_export1(self): inFile = file('test1_in.xml', 'r') inContent = inFile.read() inFile.close() doc = webserv_example_heavy_sub.parseString(inContent) outFile = StringIO.StringIO() outFile.write('<?xml version="1.0" ?>\n') doc.export(outFile, 0) outContent = outFile.getvalue() outFile.close() self.failUnless(inContent == outContent) # make the test suite. def suite(): loader = unittest.TestLoader() # Change the test method prefix: test --> trial. #loader.testMethodPrefix = 'trial' # Change the comparison function that determines the order of tests. #loader.sortTestMethodsUsing = mycmpfunc testsuite = loader.loadTestsFromTestCase(XmlTest) return testsuite # Make the test suite; run the tests. def test_main(): testsuite = suite() runner = unittest.TextTestRunner(sys.stdout, verbosity=2) result = runner.run(testsuite) if __name__ == "__main__": test_main()
Running the above script produces the following output:
test_import_export (__main__.XmlTest) ... ok ---------------------------------------------------------------------- Ran 1 test in 0.035s OK
A few notes on this example:
This example tests the ability to parse an xml document test1_in.xml and export that document back to XML. The test succeeds if the input XML document and the exported XML document are the same.
The code which is being tested parses an XML document returned by a request to Amazon Web services. You can learn more about Amazon Web services at: http://www.amazon.com/webservices. This code was generated from an XML Schema document by generateDS.py. So we are in effect, testing generateDS.py. You can find generateDS.py at: http://www.rexx.com/~dkuhlman/#generateDS.
Testing for success/failure and reporting failures -- Use the methods listed at http://www.python.org/doc/current/lib/testcase-objects.html to test for and report success and failure. In our example, we used "self.failUnless(inContent == outContent)" to ensure that the content we parsed and the content that we exported were the same.
Add additional tests by adding methods whose names have the prefix "test". If you prefer a different prefix for tests names, add something like the following to the above script:
loader.testMethodPrefix = 'trial'
By default, the tests are run in the order of their names sorted by the cmp function. So, if needed, you can control the order of execution of tests by selecting their names, for example, using names like test_1_checkderef, test_2_checkcalc, etc. Or, you can change the comparison function by adding something like the following to the above script:
loader.sortTestMethodsUsing = mycmpfunc
As a bit of motivation for creating and using unit tests, while developing this example, I discovered several errors (or maybe "special features") in generateDS.py.
Extending vs. embedding -- They are different but related:
Documentation -- The two important sources for information about extending and embedding are the following:
Types of extensions:
Tools -- There are several tools that support the development of Python extensions:
Writing an extension module by hand -- What to do:
Implementing a wrapper function -- What to do:
Capture the arguments with PyArg_ParseTuple. The format string specifies how arguments are to be converted and captured. See 1.7 Extracting Parameters in Extension Functions. Here are some of the most commonly used types:
Use "i", "s", "f", etc to convert and capture simple types such as integers, strings, floats, etc.
Use "O" to get a pointer to Python "complex" types such as lists, tuples, dictionaries, etc.
Use items in parentheses to capture and unpack sequences (e.g. lists and tuples) of fixed length. Example:
if (!PyArg_ParseTuple(args, "(ii)(ii)", &x, &y, &width, &height)) { return NULL; } /* if */
A sample call might be:
lowerLeft = (x1, y1) extent = (width1, height1) scan(lowerLeft, extent)
Use ":aName" (colon) at the end of the format string to provide a function name for error messages. Example:
if (!PyArg_ParseTuple(args, "O:setContentHandler", &pythonInstance)) { return NULL; } /* if */
Use ";an error message" (semicolon) at the end of the format string to provide a string that replaces the default error message.
Docs are available at: http://www.python.org/doc/current/ext/parseTuple.html.
Write the logic.
Handle errors and exceptions -- You will need to understand how to (1) clearing errors and exceptions and (2) Raise errors (exceptions).
Many functions in the Python C API raise exceptions. You will need to check for and clear these exceptions. Here is an example:
char * message; int messageNo; message = NULL; messageNo = -1; /* Is the argument a string? */ if (! PyArg_ParseTuple(args, "s", &message)) { /* It's not a string. Clear the error. * Then try to get a message number (an integer). */ PyErr_Clear(); if (! PyArg_ParseTuple(args, "i", &messageNo)) { o o o
You can also raise exceptions in your C code that can be caught (in a "try:except:" block) back in the calling Python code. Here is an example:
if (n == 0) { PyErr_SetString(PyExc_ValueError, "Value must not be zero"); return NULL; }
See Include/pyerrors.h in the Python source distribution for more exception/error types.
And, you can test whether a function in the Python C API that you have called has raised an exception. For example:
if (PyErr_Occurred()) { /* An exception was raised. * Do something about it. */ o o o
For more documentation on errors and exceptions, see: http://www.python.org/doc/current/api/exceptionHandling.html.
Create and return a value:
Note: Our discussion and examples are for SWIG version 1.3
SWIG will often enable you to generate wrappers for functions in an existing C function library. SWIG does not understand everything in C header files. But it does a fairly impressive job. You should try it first before resorting to the hard work of writing wrappers by hand.
More information on SWIG is at http://www.swig.org.
Here are some steps that you can follow:
Create an interface file -- Even when you are wrapping functions defined in an existing header file, creating an interface file is a good idea. Include your existing header file into it, then add whatever else you need. Here is an extremely simple example of a SWIG interface file:
%module MyLibrary %{ #include "MyLibrary.h" %} %include "MyLibrary.h"
Comments:
The "%{" and "%}" brackets are directives to SWIG. They say: "Add the code between these brackets to the generated wrapper file without processing it.
The "%include" statement says: "Copy the file into the interface file here. In effect, you are asking SWIG to generate wrappers for all the functions in this header file. If you want wrappers for only some of the functions in a header file, then copy or reproduce function declarations for the desired functions here. An example:
%module MyLibrary %{ #include "MyLibrary.h" %} int calcArea(int width, int height); int calcVolume(int radius);
This example will generate wrappers for only two functions.
You can find more information about the directives that are used in SWIG interface files in the SWIG User Manual, in particular at:
Generate the wrappers:
swig -python MyLibrary.i
Compile and link the library. On Linux, you can use something like the following:
gcc -c MyLibrary.c gcc -c -I/usr/local/include/python2.3 MyLibrary_wrap.c gcc -shared MyLibrary.o MyLibrary_wrap.o -o _MyLibrary.so
Note that we produce a shared library whose name is the module name prefixed with an underscore. SWIG also generates a .py file, without the leading underscore, which we will import from our Python code and which, in turn, imports the shared library.
Use the extension module in your python code:
Python 2.3b1 (#1, Apr 25 2003, 20:36:09) [GCC 2.95.4 20011002 (Debian prerelease)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import MyLibrary >>> MyLibrary.calcArea(4.0, 5.0) 20.0
Here is a makefile that will execute swig to generate wrappers, then compile and link the extension.
CFLAGS = -I/usr/local/include/python2.3
all: _MyLibrary.so
- _MyLibrary.so: MyLibrary.o MyLibrary_wrap.o
- gcc -shared MyLibrary.o MyLibrary_wrap.o -o _MyLibrary.so
- MyLibrary.o: MyLibrary.c
- gcc -c MyLibrary.c -o MyLibrary.o
- MyLibrary_wrap.o: MyLibrary_wrap.c
- gcc -c ${CFLAGS} MyLibrary_wrap.c -o MyLibrary_wrap.o
- MyLibrary_wrap.c: MyLibrary.i
- swig -python MyLibrary.i
- clean:
- rm -f MyLibrary.py MyLibrary.o MyLibrary_wrap.c
- MyLibrary_wrap.o _MyLibrary.so
Here is an example of running this makefile:
$ make -f MyLibrary_makefile clean rm -f MyLibrary.py MyLibrary.o MyLibrary_wrap.c \ MyLibrary_wrap.o _MyLibrary.so $ make -f MyLibrary_makefile gcc -c MyLibrary.c -o MyLibrary.o swig -python MyLibrary.i gcc -c -I/usr/local/include/python2.3 MyLibrary_wrap.c -o MyLibrary_wrap.o gcc -shared MyLibrary.o MyLibrary_wrap.o -o _MyLibrary.so
And, here are C source files that can be used in our example.
MyLibrary.h:
/* MyLibrary.h */ float calcArea(float width, float height); float calcVolume(float radius); int getVersion(); int getMode();
MyLibrary.c:
/* MyLibrary.c */ float calcArea(float width, float height) { return (width * height); } float calcVolume(float radius) { return (3.14 * radius * radius); } int getVersion() { return 123; } int getMode() { return 1; }
Pyrex is a useful tool for writing Python extensions. Because the Pyrex language is similar to Python, writing extensions in Pyrex is easier than doing so in C. Cython appears to be the a newer version of Pyrex.
More information is on Pyrex and Cython is at:
Here is a simple function definition in Pyrex:
# python_201_pyrex_string.pyx import string def formatString(object s1, object s2): s1 = string.strip(s1) s2 = string.strip(s2) s3 = '<<%s||%s>>' % (s1, s2) s4 = s3 * 4 return s4
And, here is a make file:
CFLAGS = -DNDEBUG -O3 -Wall -Wstrict-prototypes -fPIC \ -I/usr/local/include/python2.3 all: python_201_pyrex_string.so python_201_pyrex_string.so: python_201_pyrex_string.o gcc -shared python_201_pyrex_string.o -o python_201_pyrex_string.so python_201_pyrex_string.o: python_201_pyrex_string.c gcc -c ${CFLAGS} python_201_pyrex_string.c -o python_201_pyrex_string.o python_201_pyrex_string.c: python_201_pyrex_string.pyx pyrexc python_201_pyrex_string.pyx clean: rm -f python_201_pyrex_string.so python_201_pyrex_string.o \ python_201_pyrex_string.c
Here is another example. In this one, one function in the .pyx file calls another. Here is the implementation file:
# python_201_pyrex_primes.pyx def showPrimes(int kmax): plist = primes(kmax) for p in plist: print 'prime: %d' % p cdef primes(int kmax): cdef int n, k, i cdef int p[1000] result = [] if kmax > 1000: kmax = 1000 k = 0 n = 2 while k < kmax: i = 0 while i < k and n % p[i] <> 0: i = i + 1 if i == k: p[k] = n k = k + 1 result.append(n) n = n + 1 return result
And, here is a make file:
#CFLAGS = -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC # -I/usr/local/include/python2.3 CFLAGS = -DNDEBUG -I/usr/local/include/python2.3
all: python_201_pyrex_primes.so
- python_201_pyrex_primes.so: python_201_pyrex_primes.o
- gcc -shared python_201_pyrex_primes.o -o python_201_pyrex_primes.so
- python_201_pyrex_primes.o: python_201_pyrex_primes.c
- gcc -c ${CFLAGS} python_201_pyrex_primes.c -o python_201_pyrex_primes.o
- python_201_pyrex_primes.c: python_201_pyrex_primes.pyx
- pyrexc python_201_pyrex_primes.pyx
- clean:
- rm -f python_201_pyrex_primes.so python_201_pyrex_primes.o
- python_201_pyrex_primes.c
Here is the output from running the makefile:
$ make -f python_201_pyrex_makeprimes clean rm -f python_201_pyrex_primes.so python_201_pyrex_primes.o \ python_201_pyrex_primes.c $ make -f python_201_pyrex_makeprimes pyrexc python_201_pyrex_primes.pyx gcc -c -DNDEBUG -I/usr/local/include/python2.3 python_201_pyrex_primes.c -o python_201_pyrex_primes.o gcc -shared python_201_pyrex_primes.o -o python_201_pyrex_primes.so
Here is an interactive example of its use:
$ python Python 2.3b1 (#1, Apr 25 2003, 20:36:09) [GCC 2.95.4 20011002 (Debian prerelease)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import python_201_pyrex_primes >>> dir(python_201_pyrex_primes) ['__builtins__', '__doc__', '__file__', '__name__', 'showPrimes'] >>> python_201_pyrex_primes.showPrimes(5) prime: 2 prime: 3 prime: 5 prime: 7 prime: 11
This next example shows how to use Pyrex to implement a new extension type, that is a new Python built-in type. Notice that the class is declared with the cdef keyword, which tells Pyrex to generate the C implementation of a type instead of a class.
Here is the implementation file:
# python_201_pyrex_clsprimes.pyx """An implementation of primes handling class for a demonstration of Pyrex. """ cdef class Primes: """A class containing functions for handling primes. """ def showPrimes(self, int kmax): """Show a range of primes. Use the method primes() to generate the primes. """ plist = self.primes(kmax) for p in plist: print 'prime: %d' % p def primes(self, int kmax): """Generate the primes in the range 0 - kmax. """ cdef int n, k, i cdef int p[1000] result = [] if kmax > 1000: kmax = 1000 k = 0 n = 2 while k < kmax: i = 0 while i < k and n % p[i] <> 0: i = i + 1 if i == k: p[k] = n k = k + 1 result.append(n) n = n + 1 return result
And, here is a make file:
CFLAGS = -DNDEBUG -I/usr/local/include/python2.3 all: python_201_pyrex_clsprimes.so python_201_pyrex_clsprimes.so: python_201_pyrex_clsprimes.o gcc -shared python_201_pyrex_clsprimes.o -o python_201_pyrex_clsprimes.so python_201_pyrex_clsprimes.o: python_201_pyrex_clsprimes.c gcc -c ${CFLAGS} python_201_pyrex_clsprimes.c -o python_201_pyrex_clsprimes.o python_201_pyrex_clsprimes.c: python_201_pyrex_clsprimes.pyx pyrexc python_201_pyrex_clsprimes.pyx clean: rm -f python_201_pyrex_clsprimes.so python_201_pyrex_clsprimes.o \ python_201_pyrex_clsprimes.c
Here is output from running the makefile:
$ make -f python_201_pyrex_makeclsprimes clean rm -f python_201_pyrex_clsprimes.so python_201_pyrex_clsprimes.o \ python_201_pyrex_clsprimes.c $ make -f python_201_pyrex_makeclsprimes pyrexc python_201_pyrex_clsprimes.pyx gcc -c -DNDEBUG -I/usr/local/include/python2.3 python_201_pyrex_clsprimes.c -o python_201_pyrex_clsprimes.o gcc -shared python_201_pyrex_clsprimes.o -o python_201_pyrex_clsprimes.so
And here is an interactive example of its use:
$ python Python 2.3b1 (#1, Apr 25 2003, 20:36:09) [GCC 2.95.4 20011002 (Debian prerelease)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import python_201_pyrex_clsprimes >>> dir(python_201_pyrex_clsprimes) ['Primes', '__builtins__', '__doc__', '__file__', '__name__'] >>> primes = python_201_pyrex_clsprimes.Primes() >>> dir(primes) ['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__', 'primes', 'showPrimes'] >>> primes.showPrimes(4) prime: 2 prime: 3 prime: 5 prime: 7
Documentation -- Also notice that Pyrex preserves the documentation for the module, the class, and the methods in the class. You can show this documentation with pydoc, as follows:
$ pydoc python_201_pyrex_clsprimes
Or, in Python interactive mode, use:
$ python Python 2.3b1 (#1, Apr 25 2003, 20:36:09) [GCC 2.95.4 20011002 (Debian prerelease)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import python_201_pyrex_clsprimes >>> help(python_201_pyrex_clsprimes)
Choose SWIG when:
Choose Pyrex when:
Here is a simple example that uses Cython to wrap a function implemented in C.
First the C header file:
/* test_c_lib.h */ int calculate(int width, int height);
And, the C implementation file:
/* test_c_lib.c */ #include "test_c_lib.h" int calculate(int width, int height) { int result; result = width * height * 3; return result; }
Here is a Cython file that calls our C function:
# test_c.pyx # Declare the external C function. cdef extern from "test_c_lib.h": int calculate(int width, int height) def test(w, h): # Call the external C function. result = calculate(w, h) print 'result from calculate: %d' % result
We can compile our code using this script (on Linux):
#!/bin/bash -x cython test_c.pyx gcc -c -fPIC -I/usr/local/include/python2.6 -o test_c.o test_c.c gcc -c -fPIC -I/usr/local/include/python2.6 -o test_c_lib.o test_c_lib.c gcc -shared -fPIC -I/usr/local/include/python2.6 -o test_c.so test_c.o test_c_lib.o
Here is a small Python file that uses the wrapper that we wrote in Cython:
# run_test_c.py import test_c def test(): test_c.test(4, 5) test_c.test(12, 15) if __name__ == '__main__': test()
And, when we run it, we see the following:
$ python run_test_c.py result from calculate: 60 result from calculate: 540
The goal -- A new built-in data type for Python.
Existing examples -- Objects/listobject.c, Objects/stringobject.c, Objects/dictobject.c, etc in the Python source code distribution.
In older versions of the Python source code distribution, a template for the C code was provided in Objects/xxobject.c. Objects/xxobject.c is no longer included in the Python source code distribution. However:
And, you can use Pyrex to generate a new built-in type. To do so, implement a Python/Pyrex class and declare the class with the Pyrex keyword cdef. In fact, you may want to use Pyrex to generate a minimal extension type, and then edit that generated code to insert and add functionality by hand. See the Pyrex section for an example.
Pyrex also goes some way toward giving you access to (existing) C structs and functions from Python.
Extension classes the easy way -- SWIG shadow classes.
Start with an implementation of a C++ class and its header file.
Use the following SWIG flags:
swig -c++ -python mymodule.i
More information is available with the SWIG documentation at: http://www.swig.org/Doc1.3/Python.html.
Extension classes the Pyrex way -- An alternatie is to use Pyrex to compile a class definition that does not have the cdef keyword. Using cdef on the class tells Pyrex to generate an extension type instead of a class. You will have to determine whether you want an extension class or an extension type.
Python is an excellent language for text analysis.
In some cases, simply splitting lines of text into words will be enough. In these cases use string.split().
In other cases, regular expressions may be able to do the parsing you need. If so, see the section on regular expressions in this document.
However, in some cases, more complex analysis of input text is required. This section describes some of the ways that Python can help you with this complex parsing and analysis.
There are a number of special purpose parsers which you will find in the Python standard library:
XML parsers and XML tools -- There is lots of support for parsing and processing XML in Python. Here are a few places to look for support:
For simple grammars, this is not so hard.
You will need to implement:
As an example, we'll implement a recursive descent parser written in Python for the following grammer:
Prog ::= Command | Command Prog Command ::= Func_call Func_call ::= Term '(' Func_call_list ')' Func_call_list ::= Func_call | Func_call ',' Func_call_list Term = <word>
Here is an implementation of a recursive descent parser for the above grammar:
#!/usr/bin/env python """ A recursive descent parser example. Usage: python rparser.py [options] <inputfile> Options: -h, --help Display this help message. Example: python rparser.py myfile.txt The grammar: Prog ::= Command | Command Prog Command ::= Func_call Func_call ::= Term '(' Func_call_list ')' Func_call_list ::= Func_call | Func_call ',' Func_call_list Term = <word> """ import sys import string import types import getopt # # To use the IPython interactive shell to inspect your running # application, uncomment the following lines: # ## from IPython.Shell import IPShellEmbed ## ipshell = IPShellEmbed((), ## banner = '>>>>>>>> Into IPython >>>>>>>>', ## exit_msg = '<<<<<<<< Out of IPython <<<<<<<<') # # Then add the following line at the point in your code where # you want to inspect run-time values: # # ipshell('some message to identify where we are') # # For more information see: http://ipython.scipy.org/moin/ # # # Constants # # AST node types NoneNodeType = 0 ProgNodeType = 1 CommandNodeType = 2 FuncCallNodeType = 3 FuncCallListNodeType = 4 TermNodeType = 5 # Token types NoneTokType = 0 LParTokType = 1 RParTokType = 2 WordTokType = 3 CommaTokType = 4 EOFTokType = 5 # Dictionary to map node type values to node type names NodeTypeDict = { NoneNodeType: 'NoneNodeType', ProgNodeType: 'ProgNodeType', CommandNodeType: 'CommandNodeType', FuncCallNodeType: 'FuncCallNodeType', FuncCallListNodeType: 'FuncCallListNodeType', TermNodeType: 'TermNodeType', } # # Representation of a node in the AST (abstract syntax tree). # class ASTNode: def __init__(self, nodeType, *args): self.nodeType = nodeType self.children = [] for item in args: self.children.append(item) def show(self, level): self.showLevel(level) print 'Node -- Type %s' % NodeTypeDict[self.nodeType] level += 1 for child in self.children: if isinstance(child, ASTNode): child.show(level) elif type(child) == types.ListType: for item in child: item.show(level) else: self.showLevel(level) print 'Child:', child def showLevel(self, level): for idx in range(level): print ' ', # # The recursive descent parser class. # Contains the "recognizer" methods, which implement the grammar # rules (above), one recognizer method for each production rule. # class ProgParser: def __init__(self): pass def parseFile(self, infileName): self.infileName = infileName self.tokens = None self.tokenType = NoneTokType self.token = '' self.lineNo = -1 self.infile = file(self.infileName, 'r') self.tokens = genTokens(self.infile) try: self.tokenType, self.token, self.lineNo = self.tokens.next() except StopIteration: raise RuntimeError, 'Empty file' result = self.prog_reco() self.infile.close() self.infile = None return result def parseStream(self, instream): self.tokens = genTokens(instream, '<instream>') try: self.tokenType, self.token, self.lineNo = self.tokens.next() except StopIteration: raise RuntimeError, 'Empty file' result = self.prog_reco() return result def prog_reco(self): commandList = [] while 1: result = self.command_reco() if not result: break commandList.append(result) return ASTNode(ProgNodeType, commandList) def command_reco(self): if self.tokenType == EOFTokType: return None result = self.func_call_reco() return ASTNode(CommandNodeType, result) def func_call_reco(self): if self.tokenType == WordTokType: term = ASTNode(TermNodeType, self.token) self.tokenType, self.token, self.lineNo = self.tokens.next() if self.tokenType == LParTokType: self.tokenType, self.token, self.lineNo = self.tokens.next() result = self.func_call_list_reco() if result: if self.tokenType == RParTokType: self.tokenType, self.token, self.lineNo = \ self.tokens.next() return ASTNode(FuncCallNodeType, term, result) else: raise ParseError(self.lineNo, 'missing right paren') else: raise ParseError(self.lineNo, 'bad func call list') else: raise ParseError(self.lineNo, 'missing left paren') else: return None def func_call_list_reco(self): terms = [] while 1: result = self.func_call_reco() if not result: break terms.append(result) if self.tokenType != CommaTokType: break self.tokenType, self.token, self.lineNo = self.tokens.next() return ASTNode(FuncCallListNodeType, terms) # # The parse error exception class. # class ParseError(Exception): def __init__(self, lineNo, msg): RuntimeError.__init__(self, msg) self.lineNo = lineNo self.msg = msg def getLineNo(self): return self.lineNo def getMsg(self): return self.msg def is_word(token): for letter in token: if letter not in string.ascii_letters: return None return 1 # # Generate the tokens. # Usage: # gen = genTokens(infile) # tokType, tok, lineNo = gen.next() # ... def genTokens(infile): lineNo = 0 while 1: lineNo += 1 try: line = infile.next() except: yield (EOFTokType, None, lineNo) toks = line.split() for tok in toks: if is_word(tok): tokType = WordTokType elif tok == '(': tokType = LParTokType elif tok == ')': tokType = RParTokType elif tok == ',': tokType = CommaTokType yield (tokType, tok, lineNo) def test(infileName): parser = ProgParser() #ipshell('(test) #1\nCtrl-D to exit') result = None try: result = parser.parseFile(infileName) except ParseError, exp: sys.stderr.write('ParseError: (%d) %s\n' % \ (exp.getLineNo(), exp.getMsg())) if result: result.show(0) def usage(): print __doc__ sys.exit(1) def main(): args = sys.argv[1:] try: opts, args = getopt.getopt(args, 'h', ['help']) except: usage() relink = 1 for opt, val in opts: if opt in ('-h', '--help'): usage() if len(args) != 1: usage() inputfile = args[0] test(inputfile) if __name__ == '__main__': #import pdb; pdb.set_trace() main()
Comments and explanation:
And, here is a sample of the data we can apply this parser to:
aaa ( ) bbb ( ccc ( ) ) ddd ( eee ( ) , fff ( ggg ( ) , hhh ( ) , iii ( ) ) )
And, if we run the parser on the this input data, we see:
$ python workbook045.py workbook045.data Node -- Type ProgNodeType Node -- Type CommandNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: aaa Node -- Type FuncCallListNodeType Node -- Type CommandNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: bbb Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: ccc Node -- Type FuncCallListNodeType Node -- Type CommandNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: ddd Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: eee Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: fff Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: ggg Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: hhh Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: iii Node -- Type FuncCallListNodeType
Lexical analysis -- The tokenizer in our recursive descent parser example was (for demonstration purposes) overly simple. You can always write more complex tokenizers by hand. However, for more complex (and real) tokenizers, you may want to use a tool to build your tokenizer.
In this section we'll describe Plex and use it to produce a tokenizer for our recursive descent parser.
You can obtain Plex at http://www.cosc.canterbury.ac.nz/~greg/python/Plex/.
In order to use it, you may want to add Plex-1.1.4/Plex to your PYTHONPATH.
Here is a simple example from the Plex tutorial:
#!/usr/bin/env python """ Sample Plex lexer Usage: python plex_example.py inputfile """ import sys import Plex def count_lines(scanner, text): scanner.line_count += 1 print '-' * 60 def test(infileName): letter = Plex.Range("AZaz") digit = Plex.Range("09") name = letter + Plex.Rep(letter | digit) number = Plex.Rep1(digit) space = Plex.Any(" \t") endline = Plex.Str('\n') #comment = Plex.Str('"') + Plex.Rep( Plex.AnyBut('"')) + Plex.Str('"') resword = Plex.Str("if", "then", "else", "end") lexicon = Plex.Lexicon([ (endline, count_lines), (resword, 'keyword'), (name, 'ident'), (number, 'int'), ( Plex.Any("+-*/=<>"), 'operator'), (space, Plex.IGNORE), #(comment, 'comment'), (Plex.Str('('), 'lpar'), (Plex.Str(')'), 'rpar'), # comments surrounded by (* and *) (Plex.Str("(*"), Plex.Begin('comment')), Plex.State('comment', [ (Plex.Str("*)"), Plex.Begin('')), (Plex.AnyChar, Plex.IGNORE), ]), ]) infile = open(infileName, "r") scanner = Plex.Scanner(lexicon, infile, infileName) scanner.line_count = 0 while True: token = scanner.read() if token[0] is None: break position = scanner.position() posstr = ('(%d, %d)' % (position[1], position[2], )).ljust(10) tokstr = '"%s"' % token[1] tokstr = tokstr.ljust(20) print '%s tok: %s tokType: %s' % (posstr, tokstr, token[0],) print 'line_count: %d' % scanner.line_count def usage(): print __doc__ sys.exit(1) def main(): args = sys.argv[1:] if len(args) != 1: usage() infileName = args[0] test(infileName) if __name__ == '__main__': #import pdb; pdb.set_trace() main()
Here is a bit of data on which we can use the above lexer:
mass = (height * (* some comment *) width * depth) / density totalmass = totalmass + mass
And, when we apply the above test program to this data, here is what we see:
$ python plex_example.py plex_example.data (1, 0) tok: "mass" tokType: ident (1, 5) tok: "=" tokType: operator (1, 7) tok: "(" tokType: lpar (1, 8) tok: "height" tokType: ident (1, 15) tok: "*" tokType: operator (1, 36) tok: "width" tokType: ident (1, 42) tok: "*" tokType: operator (1, 44) tok: "depth" tokType: ident (1, 49) tok: ")" tokType: rpar (1, 51) tok: "/" tokType: operator (1, 53) tok: "density" tokType: ident ------------------------------------------------------------ (2, 0) tok: "totalmass" tokType: ident (2, 10) tok: "=" tokType: operator (2, 12) tok: "totalmass" tokType: ident (2, 22) tok: "+" tokType: operator (2, 24) tok: "mass" tokType: ident ------------------------------------------------------------ line_count: 2
Comments and explanation:
And, here are some comments on constructing the patterns used in a lexicon:
Now let's revisit our recursive descent parser, this time with a tokenizer built with Plex. The tokenizer is trivial, but will serve as an example of how to hook it into a parser:
#!/usr/bin/env python """ A recursive descent parser example using Plex. This example uses Plex to implement a tokenizer. Usage: python python_201_rparser_plex.py [options] <inputfile> Options: -h, --help Display this help message. Example: python python_201_rparser_plex.py myfile.txt The grammar: Prog ::= Command | Command Prog Command ::= Func_call Func_call ::= Term '(' Func_call_list ')' Func_call_list ::= Func_call | Func_call ',' Func_call_list Term = <word> """ import sys, string, types import getopt import Plex ## from IPython.Shell import IPShellEmbed ## ipshell = IPShellEmbed((), ## banner = '>>>>>>>> Into IPython >>>>>>>>', ## exit_msg = '<<<<<<<< Out of IPython <<<<<<<<') # # Constants # # AST node types NoneNodeType = 0 ProgNodeType = 1 CommandNodeType = 2 FuncCallNodeType = 3 FuncCallListNodeType = 4 TermNodeType = 5 # Token types NoneTokType = 0 LParTokType = 1 RParTokType = 2 WordTokType = 3 CommaTokType = 4 EOFTokType = 5 # Dictionary to map node type values to node type names NodeTypeDict = { NoneNodeType: 'NoneNodeType', ProgNodeType: 'ProgNodeType', CommandNodeType: 'CommandNodeType', FuncCallNodeType: 'FuncCallNodeType', FuncCallListNodeType: 'FuncCallListNodeType', TermNodeType: 'TermNodeType', } # # Representation of a node in the AST (abstract syntax tree). # class ASTNode: def __init__(self, nodeType, *args): self.nodeType = nodeType self.children = [] for item in args: self.children.append(item) def show(self, level): self.showLevel(level) print 'Node -- Type %s' % NodeTypeDict[self.nodeType] level += 1 for child in self.children: if isinstance(child, ASTNode): child.show(level) elif type(child) == types.ListType: for item in child: item.show(level) else: self.showLevel(level) print 'Child:', child def showLevel(self, level): for idx in range(level): print ' ', # # The recursive descent parser class. # Contains the "recognizer" methods, which implement the grammar # rules (above), one recognizer method for each production rule. # class ProgParser: def __init__(self): self.tokens = None self.tokenType = NoneTokType self.token = '' self.lineNo = -1 self.infile = None self.tokens = None def parseFile(self, infileName): self.tokens = None self.tokenType = NoneTokType self.token = '' self.lineNo = -1 self.infile = file(infileName, 'r') self.tokens = genTokens(self.infile, infileName) try: self.tokenType, self.token, self.lineNo = self.tokens.next() except StopIteration: raise RuntimeError, 'Empty file' result = self.prog_reco() self.infile.close() self.infile = None return result def parseStream(self, instream): self.tokens = None self.tokenType = NoneTokType self.token = '' self.lineNo = -1 self.tokens = genTokens(self.instream, '<stream>') try: self.tokenType, self.token, self.lineNo = self.tokens.next() except StopIteration: raise RuntimeError, 'Empty stream' result = self.prog_reco() self.infile.close() self.infile = None return result def prog_reco(self): commandList = [] while 1: result = self.command_reco() if not result: break commandList.append(result) return ASTNode(ProgNodeType, commandList) def command_reco(self): if self.tokenType == EOFTokType: return None result = self.func_call_reco() return ASTNode(CommandNodeType, result) def func_call_reco(self): if self.tokenType == WordTokType: term = ASTNode(TermNodeType, self.token) self.tokenType, self.token, self.lineNo = self.tokens.next() if self.tokenType == LParTokType: self.tokenType, self.token, self.lineNo = self.tokens.next() result = self.func_call_list_reco() if result: if self.tokenType == RParTokType: self.tokenType, self.token, self.lineNo = \ self.tokens.next() return ASTNode(FuncCallNodeType, term, result) else: raise ParseError(self.lineNo, 'missing right paren') else: raise ParseError(self.lineNo, 'bad func call list') else: raise ParseError(self.lineNo, 'missing left paren') else: return None def func_call_list_reco(self): terms = [] while 1: result = self.func_call_reco() if not result: break terms.append(result) if self.tokenType != CommaTokType: break self.tokenType, self.token, self.lineNo = self.tokens.next() return ASTNode(FuncCallListNodeType, terms) # # The parse error exception class. # class ParseError(Exception): def __init__(self, lineNo, msg): RuntimeError.__init__(self, msg) self.lineNo = lineNo self.msg = msg def getLineNo(self): return self.lineNo def getMsg(self): return self.msg # # Generate the tokens. # Usage - example # gen = genTokens(infile) # tokType, tok, lineNo = gen.next() # ... def genTokens(infile, infileName): letter = Plex.Range("AZaz") digit = Plex.Range("09") name = letter + Plex.Rep(letter | digit) lpar = Plex.Str('(') rpar = Plex.Str(')') comma = Plex.Str(',') comment = Plex.Str("#") + Plex.Rep(Plex.AnyBut("\n")) space = Plex.Any(" \t\n") lexicon = Plex.Lexicon([ (name, 'word'), (lpar, 'lpar'), (rpar, 'rpar'), (comma, 'comma'), (comment, Plex.IGNORE), (space, Plex.IGNORE), ]) scanner = Plex.Scanner(lexicon, infile, infileName) while 1: tokenType, token = scanner.read() name, lineNo, columnNo = scanner.position() if tokenType == None: tokType = EOFTokType token = None elif tokenType == 'word': tokType = WordTokType elif tokenType == 'lpar': tokType = LParTokType elif tokenType == 'rpar': tokType = RParTokType elif tokenType == 'comma': tokType = CommaTokType else: tokType = NoneTokType tok = token yield (tokType, tok, lineNo) def test(infileName): parser = ProgParser() #ipshell('(test) #1\nCtrl-D to exit') result = None try: result = parser.parseFile(infileName) except ParseError, exp: sys.stderr.write('ParseError: (%d) %s\n' % \ (exp.getLineNo(), exp.getMsg())) if result: result.show(0) def usage(): print __doc__ sys.exit(-1) def main(): args = sys.argv[1:] try: opts, args = getopt.getopt(args, 'h', ['help']) except: usage() for opt, val in opts: if opt in ('-h', '--help'): usage() if len(args) != 1: usage() infileName = args[0] test(infileName) if __name__ == '__main__': #import pdb; pdb.set_trace() main()
And, here is a sample of the data we can apply this parser to:
# Test for recursive descent parser and Plex. # Command #1 aaa() # Command #2 bbb (ccc()) # An end of line comment. # Command #3 ddd(eee(), fff(ggg(), hhh(), iii())) # End of test
And, when we run our parser, it produces the following:
$ python plex_recusive.py plex_recusive.data Node -- Type ProgNodeType Node -- Type CommandNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: aaa Node -- Type FuncCallListNodeType Node -- Type CommandNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: bbb Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: ccc Node -- Type FuncCallListNodeType Node -- Type CommandNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: ddd Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: eee Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: fff Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: ggg Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: hhh Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: iii Node -- Type FuncCallListNodeType
Comments:
For complex parsing tasks, you may want to consider the following tools:
And, for lexical analysis, you may also want to look here:
In the sections below, we give examples and notes about the use of PLY and pyparsing.
In this section we will show how to implement our parser example with PLY.
First down-load PLY. It is available here: PLY (Python Lex-Yacc) -- http://www.dabeaz.com/ply/
Then add the PLY directory to your PYTHONPATH.
Learn how to construct lexers and parsers with PLY by reading doc/ply.html in the distribution of PLY and by looking at the examples in the distribution.
For those of you who want a more complex example, see A Python Parser for the RELAX NG Compact Syntax, which is implemented with PLY.
Now, here is our example parser. Comments and explanations are below:
#!/usr/bin/env python """ A parser example. This example uses PLY to implement a lexer and parser. The grammar: Prog ::= Command* Command ::= Func_call Func_call ::= Term '(' Func_call_list ')' Func_call_list ::= Func_call* Term = <word> Here is a sample "program" to use as input: # Test for recursive descent parser and Plex. # Command #1 aaa() # Command #2 bbb (ccc()) # An end of line comment. # Command #3 ddd(eee(), fff(ggg(), hhh(), iii())) # End of test """ import sys import types import getopt import ply.lex as lex import ply.yacc as yacc # # Globals # startlinepos = 0 # # Constants # # AST node types NoneNodeType = 0 ProgNodeType = 1 CommandNodeType = 2 CommandListNodeType = 3 FuncCallNodeType = 4 FuncCallListNodeType = 5 TermNodeType = 6 # Dictionary to map node type values to node type names NodeTypeDict = { NoneNodeType: 'NoneNodeType', ProgNodeType: 'ProgNodeType', CommandNodeType: 'CommandNodeType', CommandListNodeType: 'CommandListNodeType', FuncCallNodeType: 'FuncCallNodeType', FuncCallListNodeType: 'FuncCallListNodeType', TermNodeType: 'TermNodeType', } # # Representation of a node in the AST (abstract syntax tree). # class ASTNode: def __init__(self, nodeType, *args): self.nodeType = nodeType self.children = [] for item in args: self.children.append(item) def append(self, item): self.children.append(item) def show(self, level): self.showLevel(level) print 'Node -- Type: %s' % NodeTypeDict[self.nodeType] level += 1 for child in self.children: if isinstance(child, ASTNode): child.show(level) elif type(child) == types.ListType: for item in child: item.show(level) else: self.showLevel(level) print 'Value:', child def showLevel(self, level): for idx in range(level): print ' ', # # Exception classes # class LexerError(Exception): def __init__(self, msg, lineno, columnno): self.msg = msg self.lineno = lineno self.columnno = columnno def show(self): sys.stderr.write('Lexer error (%d, %d) %s\n' % \ (self.lineno, self.columnno, self.msg)) class ParserError(Exception): def __init__(self, msg, lineno, columnno): self.msg = msg self.lineno = lineno self.columnno = columnno def show(self): sys.stderr.write('Parser error (%d, %d) %s\n' % \ (self.lineno, self.columnno, self.msg)) # # Lexer specification # tokens = ( 'NAME', 'LPAR','RPAR', 'COMMA', ) # Tokens t_LPAR = r'\(' t_RPAR = r'\)' t_COMMA = r'\,' t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*' # Ignore whitespace t_ignore = ' \t' # Ignore comments ('#' to end of line) def t_COMMENT(t): r'\#[^\n]*' pass def t_newline(t): r'\n+' global startlinepos startlinepos = t.lexer.lexpos - 1 t.lineno += t.value.count("\n") def t_error(t): global startlinepos msg = "Illegal character '%s'" % (t.value[0]) columnno = t.lexer.lexpos - startlinepos raise LexerError(msg, t.lineno, columnno) # # Parser specification # def p_prog(t): 'prog : command_list' t[0] = ASTNode(ProgNodeType, t[1]) def p_command_list_1(t): 'command_list : command' t[0] = ASTNode(CommandListNodeType, t[1]) def p_command_list_2(t): 'command_list : command_list command' t[1].append(t[2]) t[0] = t[1] def p_command(t): 'command : func_call' t[0] = ASTNode(CommandNodeType, t[1]) def p_func_call_1(t): 'func_call : term LPAR RPAR' t[0] = ASTNode(FuncCallNodeType, t[1]) def p_func_call_2(t): 'func_call : term LPAR func_call_list RPAR' t[0] = ASTNode(FuncCallNodeType, t[1], t[3]) def p_func_call_list_1(t): 'func_call_list : func_call' t[0] = ASTNode(FuncCallListNodeType, t[1]) def p_func_call_list_2(t): 'func_call_list : func_call_list COMMA func_call' t[1].append(t[3]) t[0] = t[1] def p_term(t): 'term : NAME' t[0] = ASTNode(TermNodeType, t[1]) def p_error(t): global startlinepos msg = "Syntax error at '%s'" % t.value columnno = t.lexer.lexpos - startlinepos raise ParserError(msg, t.lineno, columnno) # # Parse the input and display the AST (abstract syntax tree) # def parse(infileName): startlinepos = 0 # Build the lexer lex.lex(debug=1) # Build the parser yacc.yacc() # Read the input infile = file(infileName, 'r') content = infile.read() infile.close() try: # Do the parse result = yacc.parse(content) # Display the AST result.show(0) except LexerError, exp: exp.show() except ParserError, exp: exp.show() USAGE_TEXT = __doc__ def usage(): print USAGE_TEXT sys.exit(-1) def main(): args = sys.argv[1:] try: opts, args = getopt.getopt(args, 'h', ['help']) except: usage() relink = 1 for opt, val in opts: if opt in ('-h', '--help'): usage() if len(args) != 1: usage() infileName = args[0] parse(infileName) if __name__ == '__main__': #import pdb; pdb.set_trace() main()
Applying this parser to the following input:
# Test for recursive descent parser and Plex. # Command #1 aaa() # Command #2 bbb (ccc()) # An end of line comment. # Command #3 ddd(eee(), fff(ggg(), hhh(), iii())) # End of test
produces the following output:
Node -- Type: ProgNodeType Node -- Type: CommandListNodeType Node -- Type: CommandNodeType Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: aaa Node -- Type: CommandNodeType Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: bbb Node -- Type: FuncCallListNodeType Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: ccc Node -- Type: CommandNodeType Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: ddd Node -- Type: FuncCallListNodeType Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: eee Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: fff Node -- Type: FuncCallListNodeType Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: ggg Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: hhh Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: iii
Comments and explanation:
pyparsing is a relatively new parsing package for Python. It was implemented and is supported by Paul McGuire and it shows promise. It appears especially easy to use and seems especially appropriate in particular for quick parsing tasks, although it has features that make some complex parsing tasks easy. It follows a very natural Python style for constructing parsers.
Good documentation comes with the pyparsing distribution. See file HowToUseParsing.html. So, I won't try to repeat that here. What follows is an attempt to provide several quick examples to help you solve simple parsing tasks as quickly as possible.
You will also want to look at the samples in the examples directory, which are very helpful. My examples below are fairly simple. You can see more of the ability of pyparsing to handle complex tasks in the examples.
Where to get it - You can find pyparsing at: Pyparsing Wiki Home -- http://pyparsing.wikispaces.com/
How to install it - Put the pyparsing module somewhere on your PYTHONPATH.
And now, here are a few examples.
Note: This example is for demonstration purposes only. If you really to need to parse comma delimited fields, you can probably do so much more easily with the CSV (comma separated values) module in the Python standard library.
Here is a simple grammar for lines containing fields separated by commas:
import sys from pyparsing import alphanums, ZeroOrMore, Word fieldDef = Word(alphanums) lineDef = fieldDef + ZeroOrMore("," + fieldDef) def test(): args = sys.argv[1:] if len(args) != 1: print 'usage: python pyparsing_test1.py <datafile.txt>' sys.exit(-1) infilename = sys.argv[1] infile = file(infilename, 'r') for line in infile: fields = lineDef.parseString(line) print fields test()
Here is some sample data:
abcd,defg 11111,22222,33333
And, when we run our parser on this data file, here is what we see:
$ python comma_parser.py sample1.data ['abcd', ',', 'defg'] ['11111', ',', '22222', ',', '33333']
Notes and explanation:
Note how the grammar is constructed from normal Python calls to function and object/class constructors. I've constructed the parser in-line because my example is simple, but constructing the parser in a function or even a module might make sense for more complex grammars. pyparsing makes it easy to use these these different styles.
Use "+" to specify a sequence. In our example, a lineDef is a fieldDef followed by ....
Use ZeroOrMore to specify repetition. In our example, a lineDef is a fieldDef followed by zero or more occurances of comma and fieldDef. There is also OneOrMore when you want to require at least one occurance.
Parsing comma delimited text happens so frequently that pyparsing provides a shortcut. Replace:
lineDef = fieldDef + ZeroOrMore("," + fieldDef)
with:
lineDef = delimitedList(fieldDef)
And note that delimitedList takes an optional argument delim used to specify the delimiter. The default is a comma.
This example parses expressions of the form func(arg1, arg2, arg3):
from pyparsing import Word, alphas, alphanums, nums, ZeroOrMore, Literal lparen = Literal("(") rparen = Literal(")") identifier = Word(alphas, alphanums + "_") integer = Word( nums ) functor = identifier arg = identifier | integer args = arg + ZeroOrMore("," + arg) expression = functor + lparen + args + rparen def test(): content = raw_input("Enter an expression: ") parsedContent = expression.parseString(content) print parsedContent test()
Explanation:
This example parses expressions having the following form:
Input format: [name] [phone] [city, state zip] Last, first 111-222-3333 city, ca 99999
Here is the parser:
import sys from pyparsing import alphas, nums, ZeroOrMore, Word, Group, Suppress, Combine lastname = Word(alphas) firstname = Word(alphas) city = Group(Word(alphas) + ZeroOrMore(Word(alphas))) state = Word(alphas, exact=2) zip = Word(nums, exact=5) name = Group(lastname + Suppress(",") + firstname) phone = Combine(Word(nums, exact=3) + "-" + Word(nums, exact=3) + "-" + Word(nums, exact=4)) location = Group(city + Suppress(",") + state + zip) record = name + phone + location def test(): args = sys.argv[1:] if len(args) != 1: print 'usage: python pyparsing_test3.py <datafile.txt>' sys.exit(-1) infilename = sys.argv[1] infile = file(infilename, 'r') for line in infile: line = line.strip() if line and line[0] != "#": fields = record.parseString(line) print fields test()
And, here is some sample input:
Jabberer, Jerry 111-222-3333 Bakersfield, CA 95111 Kackler, Kerry 111-222-3334 Fresno, CA 95112 Louderdale, Larry 111-222-3335 Los Angeles, CA 94001
Here is output from parsing the above input:
[['Jabberer', 'Jerry'], '111-222-3333', [['Bakersfield'], 'CA', '95111']] [['Kackler', 'Kerry'], '111-222-3334', [['Fresno'], 'CA', '95112']] [['Louderdale', 'Larry'], '111-222-3335', [['Los', 'Angeles'], 'CA', '94001']]
Comments:
This example (thanks to Paul McGuire) parses a more complex structure and produces a dictionary.
Here is the code:
from pyparsing import Literal, Word, Group, Dict, ZeroOrMore, alphas, nums,\ delimitedList import pprint testData = """ +-------+------+------+------+------+------+------+------+------+ | | A1 | B1 | C1 | D1 | A2 | B2 | C2 | D2 | +=======+======+======+======+======+======+======+======+======+ | min | 7 | 43 | 7 | 15 | 82 | 98 | 1 | 37 | | max | 11 | 52 | 10 | 17 | 85 | 112 | 4 | 39 | | ave | 9 | 47 | 8 | 16 | 84 | 106 | 3 | 38 | | sdev | 1 | 3 | 1 | 1 | 1 | 3 | 1 | 1 | +-------+------+------+------+------+------+------+------+------+ """ # Define grammar for datatable heading = (Literal( "+-------+------+------+------+------+------+------+------+------+") + "| | A1 | B1 | C1 | D1 | A2 | B2 | C2 | D2 |" + "+=======+======+======+======+======+======+======+======+======+").suppress() vert = Literal("|").suppress() number = Word(nums) rowData = Group( vert + Word(alphas) + vert + delimitedList(number,"|") + vert ) trailing = Literal( "+-------+------+------+------+------+------+------+------+------+").suppress() datatable = heading + Dict( ZeroOrMore(rowData) ) + trailing def main(): # Now parse data and print results data = datatable.parseString(testData) print "data:", data print "data.asList():", pprint.pprint(data.asList()) print "data keys:", data.keys() print "data['min']:", data['min'] print "data.max:", data.max if __name__ == '__main__': main()
When we run this, it produces the following:
data: [['min', '7', '43', '7', '15', '82', '98', '1', '37'], ['max', '11', '52', '10', '17', '85', '112', '4', '39'], ['ave', '9', '47', '8', '16', '84', '106', '3', '38'], ['sdev', '1', '3', '1', '1', '1', '3', '1', '1']] data.asList():[['min', '7', '43', '7', '15', '82', '98', '1', '37'], ['max', '11', '52', '10', '17', '85', '112', '4', '39'], ['ave', '9', '47', '8', '16', '84', '106', '3', '38'], ['sdev', '1', '3', '1', '1', '1', '3', '1', '1']] data keys: ['ave', 'min', 'sdev', 'max'] data['min']: ['7', '43', '7', '15', '82', '98', '1', '37'] data.max: ['11', '52', '10', '17', '85', '112', '4', '39']
Notes:
This section will help you to put a GUI (graphical user interface) in your Python program.
We will use a particular GUI library: PyGTK. We've chosen this because it is reasonably light-weight and our goal is to embed light-weight GUI interfaces in an (possibly) existing application.
For simpler GUI needs, consider EasyGUI, which is also described below.
For more heavy-weight GUI needs (for example, complete GUI applications), you may want to explore WxPython. See the WxPython home page at: http://www.wxpython.org/
Information about PyGTK is here: The PyGTK home page -- http://www.pygtk.org//.
In this section we explain how to pop up a simple dialog box from your Python application.
To do this, do the following:
Here is a sample that displays a message box:
#!/usr/bin/env python import sys import getopt import gtk class MessageBox(gtk.Dialog): def __init__(self, message="", buttons=(), pixmap=None, modal= True): gtk.Dialog.__init__(self) self.connect("destroy", self.quit) self.connect("delete_event", self.quit) if modal: self.set_modal(True) hbox = gtk.HBox(spacing=5) hbox.set_border_width(5) self.vbox.pack_start(hbox) hbox.show() if pixmap: self.realize() pixmap = Pixmap(self, pixmap) hbox.pack_start(pixmap, expand=False) pixmap.show() label = gtk.Label(message) hbox.pack_start(label) label.show() for text in buttons: b = gtk.Button(text) b.set_flags(gtk.CAN_DEFAULT) b.set_data("user_data", text) b.connect("clicked", self.click) self.action_area.pack_start(b) b.show() self.ret = None def quit(self, *args): self.hide() self.destroy() gtk.main_quit() def click(self, button): self.ret = button.get_data("user_data") self.quit() # create a message box, and return which button was pressed def message_box(title="Message Box", message="", buttons=(), pixmap=None, modal= True): win = MessageBox(message, buttons, pixmap=pixmap, modal=modal) win.set_title(title) win.show() gtk.main() return win.ret def test(): result = message_box(title='Test #1', message='Here is your message', buttons=('Ok', 'Cancel')) print 'result:', result USAGE_TEXT = """ Usage: python simple_dialog.py [options] Options: -h, --help Display this help message. Example: python simple_dialog.py """ def usage(): print USAGE_TEXT sys.exit(-1) def main(): args = sys.argv[1:] try: opts, args = getopt.getopt(args, 'h', ['help']) except: usage() relink = 1 for opt, val in opts: if opt in ('-h', '--help'): usage() if len(args) != 0: usage() test() if __name__ == '__main__': #import pdb; pdb.set_trace() main()
Some explanation:
And, here is an example that displays an text input dialog:
#!/usr/bin/env python import sys import getopt import gtk class EntryDialog( gtk.Dialog): def __init__(self, message="", default_text='', modal=True): gtk.Dialog.__init__(self) self.connect("destroy", self.quit) self.connect("delete_event", self.quit) if modal: self.set_modal(True) box = gtk.VBox(spacing=10) box.set_border_width(10) self.vbox.pack_start(box) box.show() if message: label = gtk.Label(message) box.pack_start(label) label.show() self.entry = gtk.Entry() self.entry.set_text(default_text) box.pack_start(self.entry) self.entry.show() self.entry.grab_focus() button = gtk.Button("OK") button.connect("clicked", self.click) button.set_flags(gtk.CAN_DEFAULT) self.action_area.pack_start(button) button.show() button.grab_default() button = gtk.Button("Cancel") button.connect("clicked", self.quit) button.set_flags(gtk.CAN_DEFAULT) self.action_area.pack_start(button) button.show() self.ret = None def quit(self, w=None, event=None): self.hide() self.destroy() gtk.main_quit() def click(self, button): self.ret = self.entry.get_text() self.quit() def input_box(title="Input Box", message="", default_text='', modal=True): win = EntryDialog(message, default_text, modal=modal) win.set_title(title) win.show() gtk.main() return win.ret def test(): result = input_box(title='Test #2', message='Enter a valuexxx:', default_text='a default value') if result is None: print 'Canceled' else: print 'result: "%s"' % result USAGE_TEXT = """ Usage: python simple_dialog.py [options] Options: -h, --help Display this help message. Example: python simple_dialog.py """ def usage(): print USAGE_TEXT sys.exit(-1) def main(): args = sys.argv[1:] try: opts, args = getopt.getopt(args, 'h', ['help']) except: usage() relink = 1 for opt, val in opts: if opt in ('-h', '--help'): usage() if len(args) != 0: usage() test() if __name__ == '__main__': #import pdb; pdb.set_trace() main()
Most of the explanation for the message box example is relevant to this example, too. Here are some differences:
This example shows a file selection dialog box:
#!/usr/bin/env python import sys import getopt import gtk class FileChooser(gtk.FileSelection): def __init__(self, modal=True, multiple=True): gtk.FileSelection.__init__(self) self.multiple = multiple self.connect("destroy", self.quit) self.connect("delete_event", self.quit) if modal: self.set_modal(True) self.cancel_button.connect('clicked', self.quit) self.ok_button.connect('clicked', self.ok_cb) if multiple: self.set_select_multiple(True) self.ret = None def quit(self, *args): self.hide() self.destroy() gtk.main_quit() def ok_cb(self, b): if self.multiple: self.ret = self.get_selections() else: self.ret = self.get_filename() self.quit() def file_sel_box(title="Browse", modal=False, multiple=True): win = FileChooser(modal=modal, multiple=multiple) win.set_title(title) win.show() gtk.main() return win.ret def file_open_box(modal=True): return file_sel_box("Open", modal=modal, multiple=True) def file_save_box(modal=True): return file_sel_box("Save As", modal=modal, multiple=False) def test(): result = file_open_box() print 'open result:', result result = file_save_box() print 'save result:', result USAGE_TEXT = """ Usage: python simple_dialog.py [options] Options: -h, --help Display this help message. Example: python simple_dialog.py """ def usage(): print USAGE_TEXT sys.exit(-1) def main(): args = sys.argv[1:] try: opts, args = getopt.getopt(args, 'h', ['help']) except: usage() relink = 1 for opt, val in opts: if opt in ('-h', '--help'): usage() if len(args) != 0: usage() test() if __name__ == '__main__': main() #import pdb #pdb.run('main()')
A little guidance:
Note that there are also predefined dialogs for font selection (FontSelectionDialog) and color selection (ColorSelectionDialog)
If your GUI needs are minimalist (maybe a pop-up dialog or two) and your application is imperative rather than event driven, then you may want to consider EasyGUI. As the name suggests, it is extremely easy to use.
How to know when you might be able to use EasyGUI:
EasyGUI plus documentation and examples are available at EasyGUI home page at SourceForge -- http://easygui.sourceforge.net/
EasyGUI provides functions for a variety of commonly needed dialog boxes, including:
See the documentation at the EasyGUI Web site for more features.
For a demonstration of EasyGUI's capabilities, run the easygui.py as a Python script:
$ python easygui.py
Here is a simple example that prompts the user for an entry, then shows the response in a message box:
import easygui def testeasygui(): response = easygui.enterbox(msg='Enter your name:', title='Name Entry') easygui.msgbox(msg=response, title='Your Response') testeasygui()
This example presents a dialog to allow the user to select a file:
import easygui def test(): response = easygui.fileopenbox(msg='Select a file') print 'file name: %s' % response test()
Python has an excellent range of implementation organization structures. These range from statements and control structures (at a low level) through functions, methods, and classes (at an intermediate level) and modules and packages at an upper level.
This section provides some guidance with the use of packages. In particular:
A Python package is a collection of Python modules in a disk directory.
In order to be able to import individual modules from a directory, the directory must contain a file named __init__.py. (Note that requirement does not apply to directories that are listed in PYTHONPATH.) The __init__.py serves several purposes:
One simple way to enable the user to import and use a package is to instruct the use to import individual modules from the package.
A second, slightly more advanced way to enable the user to import the package is to expose those features of the package in the __init__ module. Suppose that module mod1 contains functions fun1a and fun1b and suppose that module mod2 contains functions fun2a and fun2b. Then file __init__.py might contain the following:
from mod1 import fun1a, fun1b from mod2 import fun2a, fun2b
Then, if the following is evaluated in the user's code:
import testpackages
Then testpackages will contain fun1a, fun1b, fun2a, and fun2b.
For example, here is an interactive session that demostrates importing the package:
>>> import testpackages >>> print dir(testpackages) [`__builtins__', `__doc__', `__file__', `__name__', `__path__', `fun1a', `fun1b', `fun2a', `fun2b', `mod1', `mod2']
Distutils (Python Distribution Utilities) has special support for distrubuting and installing packages. Learn more here: Distributing Python Modules -- http://docs.python.org/distutils/index.html.
As our example, imagine that we have a directory containing the following:
Testpackages Testpackages/README Testpackages/MANIFEST.in Testpackages/setup.py Testpackages/testpackages/__init__.py Testpackages/testpackages/mod1.py Testpackages/testpackages/mod2.py
Notice the sub-directory Testpackages/testpackages containing the file __init__.py. This is the Python package that we will install.
We'll describe how to configure the above files so that they can be packaged as a single distribution file and so that the Python package they contain can be installed as a package by Distutils.
The MANIFEST.in file lists the files that we want included in our distribution. Here is the contents of our MANIFEST.in file:
include README MANIFEST MANIFEST.in include setup.py include testpackages/*.py
The setup.py file describes to Distutils (1) how to package the distribution file and (2) how to install the distribution. Here is the contents of our sample setup.py:
#!/usr/bin/env python from distutils.core import setup # [1] long_description = 'Tests for installing and distributing Python packages' setup(name = 'testpackages', # [2] version = '1.0a', description = 'Tests for Python packages', maintainer = 'Dave Kuhlman', maintainer_email = 'dkuhlman@rexx.com', url = 'http://www.rexx.com/~dkuhlman', long_description = long_description, packages = ['testpackages'] # [3] )
Explanation:
Now, to create a distribution file, we run the following:
python setup.py sdist --formats=gztar
which will create a file testpackages-1.0a.tar.gz under the directory dist.
Then, you can give this distribution file to a potential user, who can install it by doing the following:
$ tar xvzf testpackages-1.0a.tar.gz $ cd testpackages-1.0a $ python setup.py build $ python setup.py install # as root
This document takes a workbook and exercise-with-solutions approach to Python training. It is hoped that those who feel a need for less explanation and more practical exercises will find this useful.
A few notes about the exercises:
The latest version of this document is at my Web site (URL above).
If you have comments or suggestions, please send them my way.
A name is any combination of letters, digits, and the underscore, but the first character must be a letter or an underscore. Names may be of any length.
Case is significant.
Exercises:
Solutions:
The following are keywords in Python and should not be used as variable names:
and del from not while as elif global or with assert else if pass yield break except import print class exec in raise continue finally is return def for lambda try
Exercises:
Solutions:
The following are operators in Python and will separate names:
+ - * ** / // % << >> & | ^ ~ < > <= >= == != <> and or is not in Also: () [] . (dot)
But, note that the Python style guide suggests that you place blanks around binary operators. One exception to this rule is function arguments and parameters for functions: it is suggested that you not put blanks around the equal sign (=) used to specify keyword arguments and default parameters.
Exercises:
Solutions:
In Python, normally we write one statement per line. In fact, Python assumes this. Therefore:
Extending a Python statement to a subsequent line -- Follow these two rules:
Exercises:
Extend the following statement to a second line using parentheses:
total_count = tree_count + vegetable_count + fruit_count
Extend the following statement to a second line using the backslash line continuation character:
total_count = tree_count + vegetable_count + fruit_count
Solutions:
Parentheses create an open context that tells Python that a statement extends to the next line:
total_count = (tree_count + vegetable_count + fruit_count)
A backslash as the last character on line tells Python that the current statement extends to the next line:
total_count = tree_count + \ vegetable_count + fruit_count
For extending a line on a subsequent line, which is better, parentheses or a backslash? Here is a quote:
"The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. If necessary, you can add an extra pair of parentheses around an expression, but sometimes using a backslash looks better."
-- PEP 8: Style Guide for Python Code -- http://www.python.org/dev/peps/pep-0008/
Python uses indentation to indicate program structure. That is to say, in order to nest a block of code inside a compound statement, you indent that nested code. This is different from many programming languages which use some sort of begin and end markers, for example curly brackets.
The standard coding practice for Python is to use four spaces per indentation level and to not use hard tabs. (See the Style Guide for Python Code.) Because of this, you will want to use a text editor that you can configure so that it will use four spaces for indentation. See here for a list of Python-friendly text editors: PythonEditors.
Exercises:
Given the following, nest the print statement inside the if statement:
if x > 0: print x
Nest these two lines:
z = x + y print z
inside the following function definition statement:
def show_sum(x, y):
Solutions:
Indentation indicates that one statement is nested inside another statement:
if x > 0: print x
Indentation indicates that a block of statements is nested inside another statement:
def show_sum(x, y): z = x + y print z
Here are a few rules:
Exercises:
Will the following code produce an error?
show_version() def show_version(): print 'Version 1.0a'
Will the following code produce an error?
def test(): show_version() def show_version(): print 'Version 1.0a' test()
Will the following code produce an error? Assume that show_config is not defined:
x = 3 if x > 5: show_config()
Solutions:
Answer: Yes, it generates an error. The name show_version would not be created and bound to a value until the def function definition statement binds a function object to it. That is done after the attempt to use (call) that object.
Answer: No. The function test() does call the function show_version(), but since test() is not called until after show_version() is defined, that is OK.
Answer: No. It's bad code, but in this case will not generate an error. Since x is less than 5, the body of the if statement is not evaluated.
N.B. This example shows why it is important during testing that every line of code in your Python program be evaluated. Here is good Pythonic advice: "If it's not tested, it's broken."
Each of the subsections in this section on built-in data types will have a similar structure:
The numbers you will use most commonly are likely to be integers and floats. Python also has long integers and complex numbers.
A few facts about numbers (in Python):
Python will convert to using a long integer automatically when needed. You do not need to worry about exceeding the size of a (standard) integer.
The size of the largest integer in your version of Python is in sys.maxint. To learn what it is, do:
>>> import sys >>> print sys.maxint 9223372036854775807
The above show the maximum size of an integer on a 64-bit version of Python.
You can convert from integer to float by using the float constructor. Example:
>>> x = 25 >>> y = float(x) >>> print y 25.0
Python does "mixed arithmetic". You can add, multiply, and divide integers and floats. When you do, Python "promotes" the result to a float.
An integer is constructed with a series of digits or the integer constructor (int(x)). Be aware that a sequence of digits beginning with zero represents an octal value. Examples:
>>> x1 = 1234 >>> x2 = int('1234') >>> x3 = -25 >>> x1 1234 >>> x2 1234 >>> x3 -25
A float is constructed either with digits and a dot (example, 12.345) or with engineering/scientific notation or with the float constructor (float(x)). Examples:
>>> x1 = 2.0e3 >>> x1 = 1.234 >>> x2 = -1.234 >>> x3 = float('1.234') >>> x4 = 2.0e3 >>> x5 = 2.0e-3 >>> print x1, x2, x3, x4, x5 1.234 -1.234 1.234 2000.0 0.002
Exercises:
Construct these numeric values:
Solutions:
0
0.0, 0., or .0
101
1000.0
1e3 or 1.0e3
Asigning integer values to variables:
In [7]: value1 = 23 In [8]: value2 = -14 In [9]: value3 = 0 In [10]: value1 Out[10]: 23 In [11]: value2 Out[11]: -14 In [12]: value3 Out[12]: 0
Assigning expression values to variables:
value1 = 4 * (3 + 5) value2 = (value1 / 3.0) - 2
Assigning floats to variables:
value1 = 0.01 value2 = -3.0 value3 = 3e-4
Assigning expressions containing varialbes:
value4 = value1 * (value2 - value3) value4 = value1 + value2 + value3 - value4
Mixed arithmetic:
x = 5 y = 8 z = float(x) / y
You can also construct integers and floats using the class. Calling a class (using parentheses after a class name, for example) produces an instance of the class.
Exercises:
Solutions:
Use the int data type to construct an integer instance from a string:
int("123")
Use the float data type to construct a float instance from an integer:
float(123)
Use the int data type to construct an integer instance from a float:
int(12.345) # --> 12
Notice that the result is truncated to the integer part.
You can use most of the familiar operators with numbers, for example:
+ - * ** / // % << >> & | ^ ~ < > <= >= == != <>
Look here for an explanation of these operators when applied to numbers: Numeric Types -- int, float, long, complex -- http://docs.python.org/lib/typesnumeric.html.
Some operators take precedence over others. The table in the Web page just referenced above also shows that order of priority.
Here is a bit of that table:
All numeric types (except complex) support the following operations, sorted by ascending priority (operations in the same box have the same priority; all numeric operations have a higher priority than comparison operations): Operation Result --------- ------ x + y sum of x and y x - y difference of x and y x * y product of x and y x / y quotient of x and y x // y (floored) quotient of x and y x % y remainder of x / y -x x negated +x x unchanged abs(x) absolute value or magnitude of x int(x) x converted to integer long(x) x converted to long integer float(x) x converted to floating point complex(re,im) a complex number with real part re, imaginary part im. im defaults to zero. c.conjugate() conjugate of the complex number c divmod(x, y) the pair (x // y, x % y) pow(x, y) x to the power y x ** y x to the power y
Notice also that the same operator may perform a different function depending on the data type of the value to which it is applied.
Exercises:
Solutions:
Arithmetic expressions are follow standard infix algebraic syntax:
3 + 4 + 5
Use another infix expression:
2 + 3 * 4
Or:
2 + (3 * 4)
But, in this case the parentheses are not necessary because the * operator binds more tightly than the + operator.
Use parentheses to control order of evaluation:
(2 + 3) * 4
Note that the * operator has precedence over (binds tighter than) the + operator, so the parentheses are needed.
Python does mixed arithemetic. When you apply an operation to an integer and a float, it promotes the result to the "higher" data type, a float.
If you need to perform an operation on several integers, but want use a floating point operation, first convert one of the integers to a float using float(x), which effectively creates an instance of class float.
Try the following at your Python interactive prompt:
Exercises:
Given the following assignments:
x = 20 y = 50
Divide x by y giving a float result.
Solutions:
Promote one of the integers to float before performing the division:
z = float(x) / y
Most of the methods implemented by the data types (classes) int and float are special methods that are called through the use of operators. Special methods often have names that begin and end with a double underscore. To see a list of the special names and a bit of an indication of when each is called, do any of the following at the Python interactive prompt:
>>> help(int) >>> help(32) >>> help(float) >>> help(1.23) >>> dir(1) >>> dir(1.2)
Lists are a container data type that acts as a dynamic array. That is to say, a list is a sequence that can be indexed into and that can grow and shrink.
A tuple is an index-able container, like a list, except that a tuple is immutable.
A few characteristics of lists and tuples:
The literal representation of a list is square brackets containing zero or more items separated by commas.
Examples:
Try these at the Python interactive prompt:
>>> [11, 22, 33] >>> ['aa', 'bb', 'cc', ] >>> [100, 'apple', 200, 'banana', ] # The last comma is >>> optional.
A list can contain lists. In fact a list can contain any kind of object:
>>> [1, [2, 3], 4, [5, 6, 7, ], 8]
Lists are heterogenous, that is, different kinds of objects can be in the same list. Here is a list that contains a number, a string, and another list:
>>> [123, 'abc', [456, 789]]
Exercises:
Solutions:
We can define list literals at the Python or IPython interactive prompt:
Create a tuple using commas, optionally with parentheses:
In [1]: a1 = (11, 22, 33, ) In [2]: a1 Out[2]: (11, 22, 33)
Quoted characters separated by commas create a tuple of strings:
In [3]: a2 = ('aaa', 'bbb', 'ccc') In [4]: a2 Out[4]: ('aaa', 'bbb', 'ccc')
Items separated by commas inside square brackets create a list:
In [26]: a3 = [100, 200, 300, ] In [27]: a3 Out[27]: [100, 200, 300]
Strings separated by commas inside square brackets create a list of strings:
In [5]: a3 = ['basil', 'parsley', 'coriander'] In [6]: a3 Out[6]: ['basil', 'parsley', 'coriander'] In [7]:
A tuple or a list can contain tuples and lists:
In [8]: a5 = [(11, 22), (33, 44), (55,)] In [9]: a5 Out[9]: [(11, 22), (33, 44), (55,)]
A list or tuple can contain items of different types:
In [10]: a6 = [101, 102, 'abc', "def", (201, 202), ('ghi', 'jkl')] In [11]: a6 Out[11]: [101, 102, 'abc', 'def', (201, 202), ('ghi', 'jkl')]
In order to create a tuple containing exactly one item, we must use a comma:
In [13]: a7 = (6,) In [14]: a7 Out[14]: (6,)
In order to create an empty tuple, use the tuple class/type to create an instance of a empty tuple:
In [21]: a = tuple() In [22]: a Out[22]: () In [23]: type(a) Out[23]: <type 'tuple'>
There are several operators that are applicable to lists. Here is how to find out about them:
Exercises:
Solutions:
The plus operator, applied to two lists produces a new list that is a concatenation of two lists:
>>> [11, 22] + ['aa', 'bb']
Multiplying a list by an integer n creates a new list that repeats the original list n times:
>>> [11, 'abc', 4.5] * 3
The comparison operators can be used to compare lists:
>>> [11, 22] == [11, 22] >>> [11, 22] < [11, 33]
Again, use dir() and help() to learn about the methods supported by lists.
Examples:
Solutions:
The extend method adds elements from another list, or other iterable:
>>> a = [11, 22, 33, 44, ] >>> b = [55, 66] >>> a.extend(b) >>> a [11, 22, 33, 44, 55, 66]
Use the append method on a list to add/append an item to the end of a list:
>>> a = ['aa', 11] >>> a.append('bb') >>> a.append(22) >>> a ['aa', 11, 'bb', 22]
The insert method on a list enables us to insert items at a given position in a list:
>>> a = [11, 22, 33, 44, ] >>> a.insert(0, 'aa') >>> a ['aa', 11, 22, 33, 44] >>> a.insert(2, 'bb') >>> a ['aa', 11, 'bb', 22, 33, 44]
But, note that we use append to add items at the end of a list.
The pop method on a list returns the "right-most" item from a list and removes that item from the list:
>>> a = [11, 22, 33, 44, ] >>> >>> b = a.pop() >>> a [11, 22, 33] >>> b 44 >>> b = a.pop() >>> a [11, 22] >>> b 33
Note that the append and pop methods taken together can be used to implement a stack, that is a LIFO (last in first out) data structure.
A list comprehension is a convenient way to produce a list from an iterable (a sequence or other object that can be iterated over).
In its simplest form, a list comprehension resembles the header line of a for statement inside square brackets. However, in a list comprehension, the for statement header is prefixed with an expression and surrounded by square brackets. Here is a template:
[expr(x) for x in iterable]
where:
Here is an example:
>>> a = [11, 22, 33, 44] >>> b = [x * 2 for x in a] >>> b [22, 44, 66, 88]
Exercises:
Given the following list of strings:
names = ['alice', 'bertrand', 'charlene']
produce the following lists: (1) a list of all upper case names; (2) a list of capitalized (first letter upper case);
Given the following function which calculates the factorial of a number:
def t(n): if n <= 1: return n else: return n * t(n - 1)
and the following list of numbers:
numbers = [2, 3, 4, 5]
create a list of the factorials of each of the numbers in the list.
Solutions:
For our expression in a list comprehension, use the upper and capitalize methods:
>>> names = ['alice', 'bertrand', 'charlene'] >>> [name.upper() for name in names] ['ALICE', 'BERTRAND', 'CHARLENE'] >>> [name.capitalize() for name in names] ['Alice', 'Bertrand', 'Charlene']
The expression in our list comprehension calls the factorial function:
def t(n): if n <= 1: return n else: return n * t(n - 1) def test(): numbers = [2, 3, 4, 5] factorials = [t(n) for n in numbers] print 'factorials:', factorials if __name__ == '__main__': test()
A list comprehension can also contain an if clause. Here is a template:
[expr(x) for x in iterable if pred(x)]
where:
Only values for which the if clause evaluates to true are included in creating the resulting list.
Examples:
>>> a = [11, 22, 33, 44] >>> b = [x * 3 for x in a if x % 2 == 0] >>> b [66, 132]
Exercises:
Given two lists, generate a list of all the strings in the first list that are not in the second list. Here are two sample lists:
names1 = ['alice', 'bertrand', 'charlene', 'daniel'] names2 = ['bertrand', 'charlene']
Solutions:
The if clause of our list comprehension checks for containment in the list names2:
def test(): names1 = ['alice', 'bertrand', 'charlene', 'daniel'] names2 = ['bertrand', 'charlene'] names3 = [name for name in names1 if name not in names2] print 'names3:', names3 if __name__ == '__main__': test()
When run, this script prints out the following:
names3: ['alice', 'daniel']
A string is an ordered sequence of characters. Here are a few characteristics of strings:
Create strings with single quotes or double quotes. You can put single quotes inside double quotes and you can put double quotes inside single quotes. You can also escape characters with a backslash.
Exercises:
Solutions:
Create a string with double quotes to include single quotes inside the string:
>>> str1 = "that is jerry's ball"
Create a string enclosed with single quotes in order to include double quotes inside the string:
>>> str1 = 'say "goodbye", bullwinkle'
Take your choice. Escape either the single quotes or the double quotes with a backslash:
>>> str1 = 'say "hello" to jerry\'s mom' >>> str2 = "say \"hello\" to jerry's mom" >>> str1 'say "hello" to jerry\'s mom' >>> str2 'say "hello" to jerry\'s mom'
Triple quotes enable you to create a string that spans multiple lines. Use three single quotes or three double quotes to create a single quoted string.
Examples:
Solutions:
Use triple single quotes or triple double quotes to create multi-line strings:
String1 = '''This string extends across several lines. And, so it has end-of-line characters in it. ''' String2 = """ This string begins and ends with an end-of-line character. It can have both 'single' quotes and "double" quotes in it. """ def test(): print String1 print String2 if __name__ == '__main__': test()
Python does not have a distinct character type. In Python, a character is a string of length 1. You can use the ord() and chr() built-in functions to convert from character to integer and back.
Exercises:
Solutions:
The character "a" is a plain string of length 1:
>>> x = 'a'
The integer equivalent of the letter "A":
>>> x = "A" >>> ord(x) 65
You can concatenate strings with the "+" operator.
You can create multiple concatenated copies of a string with the "*" operator.
And, augmented assignment (+= and *=) also work.
Examples:
>>> 'cat' + ' and ' + 'dog' 'cat and dog' >>> '#' * 40 '########################################' >>> >>> s1 = 'flower' >>> s1 += 's' >>> s1 'flowers'
Exercises:
Given these strings:
>>> s1 = 'abcd' >>> s2 = 'efgh'
create a new string composed of the first string followed by (concatenated with) the second.
Create a single string containing 5 copies of the string 'abc'.
Use the multiplication operator to create a "line" of 50 dashes.
Here are the components of a path to a file on the file system: "home", "myusername", "Workdir", "notes.txt". Concatenate these together separating them with the path separator to form a complete path to that file. (Note that if you use the backslash to separate components of the path, you will need to use a double backslash, because the backslash is the escape character in strings.
Solutions:
The plus (+) operator applied to a string can be used to concatenate strings:
>>> s3 = s1 + s2 >>> s3 'abcdefgh'
The multiplication operator (*) applied to a string creates a new string that concatenates a string with itself some number of times:
>>> s1 = 'abc' * 5 >>> s1 'abcabcabcabcabc'
The multiplication operator (*) applied to a string can be used to create a "horizontal divider line":
>>> s1 = '-' * 50 >>> print s1 --------------------------------------------------
The sep member of the os module gives us a platform independent way to construct paths:
>>> import os >>> >>> a = ["home", "myusername", "Workdir", "notes.txt"] >>> path = a[0] + os.sep + a[1] + os.sep + a[2] + os.sep + a[3] >>> path 'home/myusername/Workdir/notes.txt'
And, a more concise solution:
>>> import os >>> a = ["home", "myusername", "Workdir", "notes.txt"] >>> os.sep.join(a) 'home/myusername/Workdir/notes.txt'
Notes:
String support a variety of operations. You can obtain a list of these methods by using the dir() built-in function on any string:
>>> dir("") ['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__str__', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
And, you can get help on any specific method by using the help() built-in function. Here is an example:
>>> help("".strip) Help on built-in function strip: strip(...) S.strip([chars]) -> string or unicode Return a copy of the string S with leading and trailing whitespace removed. If chars is given and not None, remove characters in chars instead. If chars is unicode, S will be converted to unicode before stripping
Exercises:
Solutions:
The rstrip() method strips whitespace off the right side of a string:
>>> s1 = 'some text \n' >>> s1 'some text \n' >>> s2 = s1.rstrip() >>> s2 'some text'
The center(n) method centers a string within a padded string of width n:
>>> s1 = 'Dave' >>> s2 = s1.center(20) >>> s2 ' Dave '
The upper() method produces a new string that converts all alpha characters in the original to upper case:
>>> s1 = 'Banana' >>> s1 'Banana' >>> s2 = s1.upper() >>> s2 'BANANA'
The split(sep) method produces a list of strings that are separated by sep in the original string. If sep is omitted, whitespace is treated as the separator:
>>> s1 = """how does it feel ... to be on your own ... no directions known ... like a rolling stone ... """ >>> words = s1.split() >>> words ['how', 'does', 'it', 'feel', 'to', 'be', 'on', 'your', 'own', 'no', 'directions', 'known', 'like', 'a', 'rolling', 'stone']
Note that the split() function in the re (regular expression) module is useful when the separator is more complex than whitespace or a single character.
The join() method concatenates strings from a list of strings to form a single string:
>>> lines = [] >>> lines.append('how does it feel') >>> lines.append('to be on your own') >>> lines.append('no directions known') >>> lines.append('like a rolling stone') >>> lines ['how does it feel', 'to be on your own', 'no directions known', 'like a rolling stone'] >>> s1 = ''.join(lines) >>> s2 = ' '.join(lines) >>> s3 = '\n'.join(lines) >>> s1 'how does it feelto be on your ownno directions knownlike a rolling stone' >>> s2 'how does it feel to be on your own no directions known like a rolling stone' >>> s3 'how does it feel\nto be on your own\nno directions known\nlike a rolling stone' >>> print s3 how does it feel to be on your own no directions known like a rolling stone
Raw strings give us a convenient way to include the backslash character in a string without escaping (with an additional backslash). Raw strings look like plain literal strings, but are prefixed with an "r" or "R". See String literals http://docs.python.org/reference/lexical_analysis.html#string-literals
Excercises:
Solutions:
We use an "r" prefix to define a raw string:
>>> print 'abc \\ def' abc \ def >>> print r'abc \ def' abc \ def
Unicode strings give us a consistent way to process character data from a variety of character encodings.
Excercises:
Solutions:
We can represent unicode string with either the "u" prefix or with a call to the unicode type:
def exercise1(): a = u'abcd' print a b = unicode('efgh') print b
We convert a string from another character encoding into unicode with the decode() string method:
import sys def exercise2(): a = 'abcd'.decode('utf-8') print a b = 'abcd'.decode(sys.getdefaultencoding()) print b
We can convert a unicode string to another character encoding with the encode() string method:
import sys def exercise3(): a = u'abcd' print a.encode('utf-8') print a.encode(sys.getdefaultencoding())
Here are two ways to check the type of a string:
import types def exercise4(): a = u'abcd' print type(a) is types.UnicodeType print type(a) is type(u'')
We can encode unicode characters in a string in several ways, for example, (1) by defining a utf-8 string and converting it to unicode or (2) defining a string with an embedded unicode character or (3) concatenating a unicode characher into a string:
def exercise5(): utf8_string = 'Ivan Krsti\xc4\x87' unicode_string = utf8_string.decode('utf-8') print unicode_string.encode('utf-8') print len(utf8_string) print len(unicode_string) unicode_string = u'aa\u0107bb' print unicode_string.encode('utf-8') unicode_string = 'aa' + unichr(263) + 'bb' print unicode_string.encode('utf-8')
Guidance for use of encodings and unicode:
Convert/decode from an external encoding to unicode early:
my_source_string.decode(encoding)
Do your work (Python processing) in unicode.
Convert/encode to an external encoding late (for example, just before saving to an external file):
my_unicode_string.encode(encoding)
For more information, see:
A dictionary is an un-ordered collection of key-value pairs.
A dictionary has a length, specifically the number of key-value pairs.
The keys must be immutable object types.
Curley brackets are used to represent a dictionary. Each pair in the dictionary is represented by a key and value separated by a colon. Multiple pairs are separated by comas. For example, here is an empty dictionary and several dictionaries containing key/value pairs:
In [4]: d1 = {} In [5]: d2 = {'width': 8.5, 'height': 11} In [6]: d3 = {1: 'RED', 2: 'GREEN', 3: 'BLUE', } In [7]: d1 Out[7]: {} In [8]: d2 Out[8]: {'height': 11, 'width': 8.5} In [9]: d3 Out[9]: {1: 'RED', 2: 'GREEN', 3: 'BLUE'}
Notes:
Exercises:
Define a dictionary that has the following key-value pairs:
Key
Value
Eggplant
Purple
Tomato
Red
Parsley
Green
Lemon
Yellow
Pepper
Green, Red, Yellow
Define a dictionary to represent the "enum" days of the week: Sunday, Monday, Tuesday, ...
Solutions:
A dictionary whose keys and values are strings can be used to represent this table:
vegetables = { 'Eggplant': 'Purple', 'Tomato': 'Red', 'Parsley': 'Green', 'Lemon': 'Yellow', 'Pepper': 'Green', }
Note that the open curly bracket enables us to continue this statement across multiple lines without using a backslash.
We might use strings for the names of the days of the week as keys:
DAYS = { 'Sunday': 1, 'Monday': 2, 'Tuesday': 3, 'Wednesday': 4, 'Thrusday': 5, 'Friday': 6, 'Saturday': 7, }
Dictionaries support the following "operators":
Length -- len(d) returns the number of pairs in a dictionary.
Indexing -- You can both set and get the value associated with a key by using the indexing operator [ ]. Examples:
In [12]: d3[2] Out[12]: 'GREEN' In [13]: d3[0] = 'WHITE' In [14]: d3[0] Out[14]: 'WHITE'
Test for key -- The in operator tests for the existence of a key in a dictionary. Example:
In [6]: trees = {'poplar': 'deciduous', 'cedar': 'evergreen'} In [7]: if 'cedar' in trees: ...: print 'The cedar is %s' % (trees['cedar'], ) ...: The cedar is evergreen
Exercises:
Create an empty dictionary, then use the indexing operator [ ] to in sert the following name-value pairs:
"red" -- "255:0:0" "green" -- "0:255:0" "blue" -- "0:0:255"
Print out the number of items in your dictionary.
Solutions:
We can use "[ ]" to set the value of a key in a dictionary:
def test(): colors = {} colors["red"] = "255:0:0" colors["green"] = "0:255:0" colors["blue"] = "0:0:255" print 'The value of red is "%s"' % (colors['red'], ) print 'The colors dictionary contains %d items.' % (len(colors), ) test()
When we run this, we see:
The value of red is "255:0:0" The colors dictionary contains 3 items.
The len() built-in function gives us the number of items in a dictionary. See the previous solution for an example of this.
Here is a table that describes the methods applicable to dictionarys:
Operation | Result |
---|---|
len(a) | the number of items in a |
a[k] | the item of a with key k |
a[k] = v | set a[k] to v |
del a[k] | remove a[k] from a |
a.clear() | remove all items from a |
a.copy() | a (shallow) copy of a |
k in a | True if a has a key k, else False |
k not in a | equivalent to not k in a |
a.has_key(k) | equivalent to k in a, use that form in new code |
a.items() | a copy of a's list of (key, value) pair |
a.keys() | a copy of a's list of keys |
a.update([b]) | updates a with key/value pairs from b, overwriting existing keys, returns None |
a.fromkeys(seq[, value]) | creates a new dictionary with keys from seq and values set to value |
a.values() | a copy of a's list of values |
a.get(k[, x]) | a[k] if k in a, else x) |
a.setdefault(k[, x]) | a[k] if k in a, else x (also setting it) |
a.pop(k[, x]) | a[k] if k in a, else x (and remove k) (8) |
a.popitem() | remove and return an arbitrary (key, value) pair |
a.iteritems() | return an iterator over (key, value) pairs |
a.iterkeys() | return an iterator over the mapping's keys |
a.itervalues() | return an iterator over the mapping's values |
You can also find this table at the standard documentation Web site in the "Python Library Reference": Mapping Types -- dict http://docs.python.org/lib/typesmapping.html
Exercises:
Solutions:
We can use the d.items() method to retrieve a list of tuples containing key-value pairs, then use unpacking to capture the key and value:
Vegetables = { 'Eggplant': 'Purple', 'Tomato': 'Red', 'Parsley': 'Green', 'Lemon': 'Yellow', 'Pepper': 'Green', } def test(): for key, value in Vegetables.items(): print 'key:', key, ' value:', value test()
We retrieve a list of keys with the keys() method, the sort it with the list sort() method:
Vegetables = { 'Eggplant': 'Purple', 'Tomato': 'Red', 'Parsley': 'Green', 'Lemon': 'Yellow', 'Pepper': 'Green', } def test(): keys = Vegetables.keys() keys.sort() for key in keys: print 'key:', key, ' value:', Vegetables[key] test()
To test for the existence of a key in a dictionary, we can use either the in operator (preferred) or the d.has_key() method (old style):
Vegetables = { 'Eggplant': 'Purple', 'Tomato': 'Red', 'Parsley': 'Green', 'Lemon': 'Yellow', 'Pepper': 'Green', } def test(): if 'Eggplant' in Vegetables: print 'we have %s egplants' % Vegetables['Eggplant'] if 'Banana' not in Vegetables: print 'yes we have no bananas' if Vegetables.has_key('Parsley'): print 'we have leafy, %s parsley' % Vegetables['Parsley'] test()
Which will print out:
we have Purple egplants yes we have no bananas we have leafy, Green parsley
A Python file object represents a file on a file system.
A file object open for reading a text file is iterable. When we iterate over it, it produces the lines in the file.
A file may be opened in these modes:
The open() built-in function is used to create a file object. For example, the following code (1) opens a file for writing, then (2) for reading, then (3) for appending, and finally (4) for reading again:
def test(infilename): # 1. Open the file in write mode, which creates the file. outfile = open(infilename, 'w') outfile.write('line 1\n') outfile.write('line 2\n') outfile.write('line 3\n') outfile.close() # 2. Open the file for reading. infile = open(infilename, 'r') for line in infile: print 'Line:', line.rstrip() infile.close() # 3. Open the file in append mode, and add a line to the end of # the file. outfile = open(infilename, 'a') outfile.write('line 4\n') outfile.close() print '-' * 40 # 4. Open the file in read mode once more. infile = open(infilename, 'r') for line in infile: print 'Line:', line.rstrip() infile.close() test('tmp.txt')
Exercises:
Solutions:
Use the open() built-in function to open the file and create a file object. Use the read() method on the file object to read the entire file. Use the split() or splitlines() methods to split the file into lines:
>>> infile = open('tmp.txt', 'r') >>> content = infile.read() >>> infile.close() >>> lines = content.splitlines() >>> print lines ['line 1', 'line 2', 'line 3', '']
The f.readlines() method returns a list of lines in a file:
>>> infile = open('tmp.txt', 'r') >>> lines = infile.readlines() >>> infile.close() >>> print lines ['line 1\n', 'line 2\n', 'line 3\n']
Since a file object (open for reading) is itself an iterator, we can iterate over it in a for statement:
""" Test iteration over a text file. Usage: python test.py in_file_name """ import sys def test(infilename): infile = open(infilename, 'r') for line in infile: # Strip off the new-line character and any whitespace on # the right. line = line.rstrip() # Print only non-blank lines. if line: print line infile.close() def main(): args = sys.argv[1:] if len(args) != 1: print __doc__ sys.exit(1) infilename = args[0] test(infilename) if __name__ == '__main__': main()
Notes:
None is a singleton. There is only one instance of None. Use this value to indicate the absence of any other "real" value.
Test for None with the identity operator is.
Exercises:
Solutions:
The identity operators is and is not can be used to test for None:
>>> a = [11, None, 'abc', None, {}] >>> a [11, None, 'abc', None, {}] >>> count = 0 >>> for item in a: ... if item is None: ... count += 1 ... >>> >>> print count 2
Python has the two boolean values True and False. Many comparison operators return True and False.
Examples:
What value is returned by 3 > 2?
Answer: The boolean value True.
Given these variable definitions:
x = 3 y = 4 z = 5
What does the following print out:
print y > x and z > y
Answer -- Prints out "True"
The assignment statement uses the assignment operator =.
The assignment statement is a binding statement: it binds a value to a name within a namespace.
Exercises:
Solutions:
1. The = operator is an assignment statement that binds a value to a variable:
>>> vegetable = "eggplant"
There is also augmented assignment using the operators +=, -=, *=, /=, etc.
Exercises:
Solutions:
The += operator increments the value of an integer:
>>> count = 0 >>> count += 1 >>> count 1 >>> count += 1 >>> count 2
The += operator appends characters to the end of a string:
>>> buffer = 'abcde' >>> buffer += 'fgh' >>> buffer 'abcdefgh'
The += operator appends items in one list to another:
In [20]: a = [11, 22, 33] In [21]: b = [44, 55] In [22]: a += b In [23]: a Out[23]: [11, 22, 33, 44, 55]
The -= operator decrements the value of an integer:
>>> count = 5 >>> count 5 >>> count -= 1 >>> count 4
You can also assign a value to (1) an element of a list, (2) an item in a dictionary, (3) an attribute of an object, etc.
Exercises:
Create a list of three items, then assign a new value to the 2nd element in the list.
Create a dictionary, then assign values to the keys "vegetable" and "fruit" in that dictionary.
Use the following code to create an instance of a class:
class A(object): pass a = A()
Then assign values to an attribue named category in that instance.
Solutions:
Assignment with the indexing operator [] assigns a value to an element in a list:
>>> trees = ['pine', 'oak', 'elm'] >>> trees ['pine', 'oak', 'elm'] >>> trees[1] = 'cedar' >>> trees ['pine', 'cedar', 'elm']
Assignment with the indexing operator [] assigns a value to an item (a key-value pair) in a dictionary:
>>> foods = {} >>> foods {} >>> foods['vegetable'] = 'green beans' >>> foods['fruit'] = 'nectarine' >>> foods {'vegetable': 'green beans', 'fruit': 'nectarine'}
Assignment along with the dereferencing operator . (dot) enables us to assign a value to an attribute of an object:
>>> class A(object): ... pass ... >>> a = A() >>> a.category = 25 >>> a.__dict__ {'category': 25} >>> a.category 25
Warning: Be aware that the print statement will go away in Python version 3.0. It will be replaced by the built-in print() function.
The print statement sends output to standard output. It provides a somewhat more convenient way of producing output than using sys.stdout.write().
The print statement takes a series of zero or more objects separated by commas. Zero objects produces a blank line.
The print statement normally adds a newline at the end of its output. To eliminate that, add a comma at the end.
Exercises:
Solutions:
We can print a literal string:
>>> print 'Hello, there' Hello, there
We can print literals and the value of variables:
>>> description = 'cute' >>> print 'I am a', description, 'kid.' I am a cute kid.
The string formatting operator gives more control over formatting output:
>>> name = 'Alice' >>> print 'My name is "%s".' % (name, ) My name is "Alice".
The if statement is a compound statement that enables us to conditionally execute blocks of code.
The if statement also has optional elif: and else: clauses.
The condition in an if: or elif: clause can be any Python expression, in other words, something that returns a value (even if that value is None).
In the condition in an if: or elif: clause, the following values count as "false":
All other values count as true.
Exercises:
Given the following list:
>>> bananas = ['banana1', 'banana2', 'banana3',]
Print one message if it is an empty list and another messge if it is not.
Here is one way of defining a Python equivalent of an "enum":
NO_COLOR, RED, GREEN, BLUE = range(4)
Write an if: statement which implements the effect of a "switch" statement in Python. Print out a unique message for each color.
Solutions:
We can test for an empty or non-empty list:
>>> bananas = ['banana1', 'banana2', 'banana3',] >>> if not bananas: ... print 'yes, we have no bananas' ... else: ... print 'yes, we have bananas' ... yes, we have bananas
We can simulate a "switch" statement using if:elif: ...:
NO_COLOR, RED, GREEN, BLUE = range(4) def test(color): if color == RED: print "It's red." elif color == GREEN: print "It's green." elif color == BLUE: print "It's blue." def main(): color = BLUE test(color) if __name__ == '__main__': main()
Which, when run prints out the following:
It's blue.
The for: statement is the Python way to iterate over and process the elements of a collection or other iterable.
The basic form of the for: statement is the following:
for X in Y: statement o o o
where:
Exercises:
Solutions:
The for: statement can iterate over the items in a list:
In [13]: a = [11, 22, 33, ] In [14]: for value in a: ....: print 'value: %d' % value ....: ....: value: 11 value: 22 value: 33
The for: statement can iterate over the characters in a string:
In [16]: b = 'chocolate' In [17]: for chr1 in b: ....: print 'character: %s' % chr1 ....: ....: character: c character: h character: o character: c character: o character: l character: a character: t character: e
Notes:
When we need a sequential index, we can use the range() built-in function to create a list of integers. And, the xrange() built-in function produces an interator that produces a sequence of integers without creating the entire list. To iterate over a large sequence of integers, use xrange() instead of range().
Exercises:
Print out the integers from 0 to 5 in sequence.
Compute the sum of all the integers from 0 to 99999.
Given the following generator function:
import urllib Urls = [ 'http://yahoo.com', 'http://python.org', 'http://gimp.org', # The GNU image manipulation program ] def walk(url_list): for url in url_list: f = urllib.urlopen(url) stuff = f.read() f.close() yield stuff
Write a for: statement that uses this iterator generator to print the lengths of the content at each of the Web pages in that list.
Solutions:
The range() built-in function gives us a sequence to iterate over:
In [5]: for idx in range(6): ...: print 'idx: %d' % idx ...: ...: idx: 0 idx: 1 idx: 2 idx: 3 idx: 4 idx: 5
Since that sequence is a bit large, we'll use xrange() instead of range():
In [8]: count = 0 In [9]: for n in xrange(100000): ...: count += n ...: ...: In [10]: count Out[10]: 4999950000
The for: statement enables us to iterate over iterables as well as collections:
import urllib Urls = [ 'http://yahoo.com', 'http://python.org', 'http://gimp.org', # The GNU image manipulation program ] def walk(url_list): for url in url_list: f = urllib.urlopen(url) stuff = f.read() f.close() yield stuff def test(): for url in walk(Urls): print 'length: %d' % (len(url), ) if __name__ == '__main__': test()
When I ran this script, it prints the following:
length: 9562 length: 16341 length: 12343
If you need an index while iterating over a sequence, consider using the enumerate() built-in function.
Exercises:
Given the following two lists of integers of the same length:
a = [1, 2, 3, 4, 5] b = [100, 200, 300, 400, 500]
Add the values in the first list to the corresponding values in the second list.
Solutions:
The enumerate() built-in function gives us an index and values from a sequence. Since enumerate() gives us an interator that produces a sequence of two-tuples, we can unpack those tuples into index and value variables in the header line of the for statement:
In [13]: a = [1, 2, 3, 4, 5] In [14]: b = [100, 200, 300, 400, 500] In [15]: In [16]: for idx, value in enumerate(a): ....: b[idx] += value ....: ....: In [17]: b Out[17]: [101, 202, 303, 404, 505]
A while: statement executes a block of code repeatedly as long as a condition is true.
Here is a template for the while: statement:
while condition: statement o o o
Where:
Exercises:
Solutions:
A while: loop with an index variable can be used to modify each element of a list:
def test_while(): numbers = [11, 22, 33, 44, ] print 'before: %s' % (numbers, ) idx = 0 while idx < len(numbers): numbers[idx] *= 2 idx += 1 print 'after: %s' % (numbers, )
But, notice that this task is easier using the for: statement and the built-in enumerate() function:
def test_for(): numbers = [11, 22, 33, 44, ] print 'before: %s' % (numbers, ) for idx, item in enumerate(numbers): numbers[idx] *= 2 print 'after: %s' % (numbers, )
The continue statement skips the remainder of the statements in the body of a loop and starts immediately at the top of the loop again.
A break statement in the body of a loop terminates the loop. It exits from the immediately containing loop.
break and continue can be used in both for: and while: statements.
Exercises:
Solutions:
The continue statement enables us to "skip" items that satisfy a condition or test:
def test(): numbers = [11, 22, 33, 44, 55, 66, ] print 'before: %s' % (numbers, ) for idx, item in enumerate(numbers): if item % 2 != 0: continue numbers[idx] *= 3 print 'after: %s' % (numbers, ) test()
The break statement enables us to exit from a loop when we find a zero:
def test(): numbers = [11, 22, 33, 0, 44, 55, 66, ] print 'numbers: %s' % (numbers, ) sum = 0 for item in numbers: if item == 0: break sum += item print 'sum: %d' % (sum, ) test()
The try:except: statement enables us to catch an exception that is thrown from within a block of code, or from code called from any depth withing that block.
The raise statement enables us to throw an exception.
An exception is a class or an instance of an exception class. If an exception is not caught, it results in a traceback and termination of the program.
There is a set of standard exceptions. You can learn about them here: Built-in Exceptions -- http://docs.python.org/lib/module-exceptions.html.
You can define your own exception classes. To do so, create an empty subclass of the class Exception. Defining your own exception will enable you (or others) to throw and then catch that specific exception type while ignore others exceptions.
Exercises:
Write a try:except: statement that attempts to open a file for reading and catches the exception thrown when the file does not exist.
Question: How do you find out the name of the exception that is thrown for an input/output error such as the failure to open a file?
Define an exception class. Then write a try:except: statement in which you throw and catch that specific exception.
Define an exception class and use it to implement a multi-level break from an inner loop, by-passing an outer loop.
Solutions:
Use the Python interactive interpreter to learn the exception type thrown when a I/O error occurs. Example:
>>> infile = open('xx_nothing__yy.txt', 'r') Traceback (most recent call last): File "<stdin>", line 1, in <module> IOError: [Errno 2] No such file or directory: 'xx_nothing__yy.txt' >>>
In this case, the exception type is IOError.
Now, write a try:except: block which catches that exception:
def test(): infilename = 'nothing_noplace.txt' try: infile = open(infilename, 'r') for line in infile: print line except IOError, exp: print 'cannot open file "%s"' % infilename test()
We define a exception class as a sub-class of class Exception, then throw it (with the raise statement) and catch it (with a try:except: statement):
class SizeError(Exception): pass def test_exception(size): try: if size <= 0: raise SizeError, 'size must be greater than zero' # Produce a different error to show that it will not be caught. x = y except SizeError, exp: print '%s' % (exp, ) print 'goodbye' def test(): test_exception(-1) print '-' * 40 test_exception(1) test()
When we run this script, it produces the following output:
$ python workbook027.py size must be greater than zero goodbye ---------------------------------------- Traceback (most recent call last): File "workbook027.py", line 20, in <module> test() File "workbook027.py", line 18, in test test_exception(1) File "workbook027.py", line 10, in test_exception x = y NameError: global name 'y' is not defined
Notes:
We define a sub-class of of class Exception, then raise it in an inner loop and catch it outside of an outer loop:
class BreakException1(Exception): pass def test(): a = [11, 22, 33, 44, 55, 66, ] b = [111, 222, 333, 444, 555, 666, ] try: for x in a: print 'outer -- x: %d' % x for y in b: if x > 22 and y > 444: raise BreakException1('leaving inner loop') print 'inner -- y: %d' % y print 'outer -- after' print '-' * 40 except BreakException1, exp: print 'out of loop -- exp: %s' % exp test()
Here is what this prints out when run:
outer -- x: 11 inner -- y: 111 inner -- y: 222 inner -- y: 333 inner -- y: 444 inner -- y: 555 inner -- y: 666 outer -- after ---------------------------------------- outer -- x: 22 inner -- y: 111 inner -- y: 222 inner -- y: 333 inner -- y: 444 inner -- y: 555 inner -- y: 666 outer -- after ---------------------------------------- outer -- x: 33 inner -- y: 111 inner -- y: 222 inner -- y: 333 inner -- y: 444 out of loop -- exp: leaving inner loop
A function has these characteristics:
A function is defined with the def: statement. Here is a simple example/template:
def function_name(arg1, arg2): local_var1 = arg1 + 1 local_var2 = arg2 * 2 return local_var1 + local_var2
And, here is an example of calling this function:
result = function_name(1, 2)
Here are a few notes of explanation:
Exercises:
Solutions:
The return statement enables us to return a value from a function:
def list_sum(values): sum = 0 for value in values: sum += value return sum def test(): a = [11, 22, 33, 44, ] print list_sum(a) if __name__ == '__main__': test()
You can provide a default value for an argument to a function.
If you do, that argument is optional (when the function is called).
Here are a few things to learn about optional arguments:
Provide a default value with an equal sign and a value. Example:
def sample_func(arg1, arg2, arg3='empty', arg4=0):
All parameters with default values must be after (to the right of) normal parameters.
Do not use a mutable object as a default value. Because the def: statement is evaluated only once and not each time the function is called, the mutable object might be shared across multiple calls to the function. Do not do this:
def sample_func(arg1, arg2=[]):
Instead, do this:
def sample_func(arg1, arg2=None): if arg2 is None: arg2 = []
Here is an example that illustrates how this might go wrong:
def adder(a, b=[]): b.append(a) return b def test(): print adder('aaa') print adder('bbb') print adder('ccc') test()
Which, when executed, displays the following:
['aaa'] ['aaa', 'bbb'] ['aaa', 'bbb', 'ccc']
Exercises:
Solutions:
We can pass a file as we would any other object. And, we can use a newline character as a default parameter value:
import sys def writer(outfile, msg='\n'): outfile.write(msg) def test(): writer(sys.stdout, 'aaaaa\n') writer(sys.stdout) writer(sys.stdout, 'bbbbb\n') test()
When run from the command line, this prints out the following:
aaaaa bbbbb
In this solution we are careful not to use a mutable object as a default value:
def add_to_dict(name, value, dic=None): if dic is None: dic = {} dic[name] = value return dic def test(): dic1 = {'albert': 'cute', } print add_to_dict('barry', 'funny', dic1) print add_to_dict('charlene', 'smart', dic1) print add_to_dict('darryl', 'outrageous') print add_to_dict('eddie', 'friendly') test()
If we run this script, we see:
{'barry': 'funny', 'albert': 'cute'} {'barry': 'funny', 'albert': 'cute', 'charlene': 'smart'} {'darryl': 'outrageous'} {'eddie': 'friendly'}
Notes:
A function, like any other object, can be passed as an argument to a function. This is due the the fact that almost all (maybe all) objects in Python are "first class objects". A first class object is one which we can:
Exercises:
Write a function that takes three arguments: (1) an input file, (2) an output file, and (3) a filter function:
The above function should read each line in the input text file, pass that line through the filter function, then write that (possibly) transformed line to the output file.
Now, write one or more "filter functions" that can be passed to the function described above.
Solutions:
This script adds or removes comment characters to the lines of a file:
import sys def filter(infile, outfile, filterfunc): for line in infile: line = filterfunc(line) outfile.write(line) def add_comment(line): line = '## %s' % (line, ) return line def remove_comment(line): if line.startswith('## '): line = line[3:] return line def main(): filter(sys.stdin, sys.stdout, add_comment) if __name__ == '__main__': main()
Running this might produce something like the following (note for MS Windows users: use type instead of cat):
$ cat tmp.txt line 1 line 2 line 3 $ cat tmp.txt | python workbook005.py ## line 1 ## line 2 ## line 3
Additional positional arguments passed to a function that are not specified in the function definition (the def: statement``), are collected in an argument preceded by a single asterisk. Keyword arguments passed to a function that are not specified in the function definition can be collected in a dictionary and passed to an argument preceded by a double asterisk.
Examples:
Solutions:
We use *args and **kwargs to collect extra arguments and extra keyword arguments:
def show_args(x, y=-1, *args, **kwargs): print '-' * 40 print 'x:', x print 'y:', y print 'args:', args print 'kwargs:', kwargs def test(): show_args(1) show_args(x=2, y=3) show_args(y=5, x=4) show_args(4, 5, 6, 7, 8) show_args(11, y=44, a=55, b=66) test()
Running this script produces the following:
$ python workbook006.py ---------------------------------------- x: 1 y: -1 args: () kwargs: {} ---------------------------------------- x: 2 y: 3 args: () kwargs: {} ---------------------------------------- x: 4 y: 5 args: () kwargs: {} ---------------------------------------- x: 4 y: 5 args: (6, 7, 8) kwargs: {} ---------------------------------------- x: 11 y: 44 args: () kwargs: {'a': 55, 'b': 66}
Notes:
We use args and kwargs to catch and pass on all arguments:
def func1(*args, **kwargs): print 'args: %s' % (args, ) print 'kwargs: %s' % (kwargs, ) def func2(*args, **kwargs): print 'before' func1(*args, **kwargs) print 'after' def test(): func2('aaa', 'bbb', 'ccc', arg1='ddd', arg2='eee') test()
When we run this, it prints the following:
before args: ('aaa', 'bbb', 'ccc') kwargs: {'arg1': 'ddd', 'arg2': 'eee'} after
Notes:
In a function definition, arguments must appear in the following order, from left to right:
In a function call, arguments must appear in the following order, from left to right:
If the arguments and return value of a function satisfy some description, then we can say that the function is polymorphic with respect to that description.
If the some of the methods of an object satisfy some description, then we can say that the object is polymorphic with respect to that description.
Basically, what this does is to enable us to use a function or an object anywhere that function satisfies the requirements given by a description.
Exercises:
Solutions:
We can pass a function as an argument to a function:
def fancy(obj): print 'fancy fancy -- %s -- fancy fancy' % (obj, ) def plain(obj): print 'plain -- %s -- plain' % (obj, ) def show(func, obj): func(obj) def main(): a = {'aa': 11, 'bb': 22, } show(fancy, a) show(plain, a) if __name__ == '__main__': main()
We can also put functions (function objects) in a data structure (for example, a list), and then pass that data structure to a function:
def fancy(obj): print 'fancy fancy -- %s -- fancy fancy' % (obj, ) def plain(obj): print 'plain -- %s -- plain' % (obj, ) Func_list = [fancy, plain, ] def show(funcs, obj): for func in funcs: func(obj) def main(): a = {'aa': 11, 'bb': 22, } show(Func_list, a) if __name__ == '__main__': main()
Notice that Python supports polymorphism (with or) without inheritance. This type of polymorphism is enabled by what is called duck-typing. For more on this see: Duck typing -- http://en.wikipedia.org/wiki/Duck_typing at Wikipedia.
A recursive function is a function that calls itself.
A recursive function must have a limiting condition, or else it will loop endlessly.
Each recursive call consumes space on the function call stack. Therefore, the number of recursions must have some reasonable upper bound.
Exercises:
Write a recursive function that prints information about each node in the following tree-structure data structure:
Tree = { 'name': 'animals', 'left_branch': { 'name': 'birds', 'left_branch': { 'name': 'seed eaters', 'left_branch': { 'name': 'house finch', 'left_branch': None, 'right_branch': None, }, 'right_branch': { 'name': 'white crowned sparrow', 'left_branch': None, 'right_branch': None, }, }, 'right_branch': { 'name': 'insect eaters', 'left_branch': { 'name': 'hermit thrush', 'left_branch': None, 'right_branch': None, }, 'right_branch': { 'name': 'black headed phoebe', 'left_branch': None, 'right_branch': None, }, }, }, 'right_branch': None, }
Solutions:
We write a recursive function to walk the whole tree. The recursive function calls itself to process each child of a node in the tree:
Tree = { 'name': 'animals', 'left_branch': { 'name': 'birds', 'left_branch': { 'name': 'seed eaters', 'left_branch': { 'name': 'house finch', 'left_branch': None, 'right_branch': None, }, 'right_branch': { 'name': 'white crowned sparrow', 'left_branch': None, 'right_branch': None, }, }, 'right_branch': { 'name': 'insect eaters', 'left_branch': { 'name': 'hermit thrush', 'left_branch': None, 'right_branch': None, }, 'right_branch': { 'name': 'black headed phoebe', 'left_branch': None, 'right_branch': None, }, }, }, 'right_branch': None, } Indents = [' ' * idx for idx in range(10)] def walk_and_show(node, level=0): if node is None: return print '%sname: %s' % (Indents[level], node['name'], ) level += 1 walk_and_show(node['left_branch'], level) walk_and_show(node['right_branch'], level) def test(): walk_and_show(Tree) if __name__ == '__main__': test()
Notes:
The "iterator protocol" defines what an iterator object must do in order to be usable in an "iterator context" such as a for statement. The iterator protocol is described in the standard library reference: Iterator Types -- http://docs.python.org/lib/typeiter.html
An easy way to define an object that obeys the iterator protocol is to write a generator function. A generator function is a function that contains one or more yield statements. If a function contains at least one yield statement, then that function when called, returns generator iterator, which is an object that obeys the iterator protocol, i.e. it's an iterator object.
Note that in recent versions of Python, yield is an expression. This enables the consumer to communicate back with the producer (the generator iterator). For more on this, see PEP: 342 Coroutines via Enhanced Generators - http://www.python.org/dev/peps/pep-0342/.
Exercises:
Implement a generator function -- The generator produced should yield all values from a list/iterable that satisfy a predicate. It should apply the transforms before return each value. The function takes these arguments:
values -- A list of values. Actually, it could be any iterable.
predicate -- A function that takes a single argument, performs a test on that value, and returns True or False.
transforms -- (optional) A list of functions. Apply each function in this list and returns the resulting value. So, for example, if the function is called like this:
result = transforms([11, 22], p, [f, g])
then the resulting generator might return:
g(f(11))
Implement a generator function that takes a list of URLs as its argument and generates the contents of each Web page, one by one (that is, it produces a sequence of strings, the HTML page contents).
Solutions:
Here is the implementation of a function which contains yield, and, therefore, produces a generator:
#!/usr/bin/env python """ filter_and_transform filter_and_transform(content, test_func, transforms=None) Return a generator that returns items from content after applying the functions in transforms if the item satisfies test_func . Arguments: 1. ``values`` -- A list of values 2. ``predicate`` -- A function that takes a single argument, performs a test on that value, and returns True or False. 3. ``transforms`` -- (optional) A list of functions. Apply each function in this list and returns the resulting value. So, for example, if the function is called like this:: result = filter_and_transforms([11, 22], p, [f, g]) then the resulting generator might return:: g(f(11)) """ def filter_and_transform(content, test_func, transforms=None): for x in content: if test_func(x): if transforms is None: yield x elif isiterable(transforms): for func in transforms: x = func(x) yield x else: yield transforms(x) def isiterable(x): flag = True try: x = iter(x) except TypeError, exp: flag = False return flag def iseven(n): return n % 2 == 0 def f(n): return n * 2 def g(n): return n ** 2 def test(): data1 = [11, 22, 33, 44, 55, 66, 77, ] for val in filter_and_transform(data1, iseven, f): print 'val: %d' % (val, ) print '-' * 40 for val in filter_and_transform(data1, iseven, [f, g]): print 'val: %d' % (val, ) print '-' * 40 for val in filter_and_transform(data1, iseven): print 'val: %d' % (val, ) if __name__ == '__main__': test()
Notes:
The following function uses the urllib module and the yield function to generate the contents of a sequence of Web pages:
import urllib Urls = [ 'http://yahoo.com', 'http://python.org', 'http://gimp.org', # The GNU image manipulation program ] def walk(url_list): for url in url_list: f = urllib.urlopen(url) stuff = f.read() f.close() yield stuff def test(): for x in walk(Urls): print 'length: %d' % (len(x), ) if __name__ == '__main__': test()
When I run this, I see:
$ python generator_example.py length: 9554 length: 16748 length: 11487
Classes provide Python's way to define new data types and to do OOP (object-oriented programming).
If you have made it this far, you have already used lots of objects. You have been a "consumer" of objects and their services. Now, you will learn how to define and implement new kinds of objects. You will become a "producer" of objects. You will define new classes and you will implement the capabilities (methods) of each new class.
A class is defined with the class statement. The first line of a class statement is a header (it has a colon at the end), and it specifies the name of the class being defined and an (optional) superclass. And that header introduces a compound statement: specifically, the body of the class statement which contains indented, nested statements, importantly, def statements that define the methods that can be called on instances of the objects implemented by this class.
Exercises:
Solutions:
A simple instance method can have the self parameter and no others:
class Demo(object): def show(self): print 'hello' def test(): a = Demo() a.show() test()
Notes:
A class can define methods with special names. You have seem some of these before. These names begin and end with a double underscore.
One important special name is __init__. It's the constructor for a class. It is called each time an instance of the class is created. Implementing this method in a class gives us a chance to initialize each instance of our class.
Exercises:
Solutions:
The constructor for a class is a method with the special name __init__:
class Plant(object): def __init__(self, name, size): self.name = name self.size = size def show(self): print 'name: "%s" size: %d' % (self.name, self.size, ) def test(): p1 = Plant('Eggplant', 25) p2 = Plant('Tomato', 36) plants = [p1, p2, ] for plant in plants: plant.show() test()
Notes:
It is a good idea to initialize all instance variables in the constructor. That enables someone reading our code to learn about all the instance variables of a class by looking in a single location:
# simple_node.py Indents = [' ' * n for n in range(10)] class Node(object): def __init__(self, name=None, children=None): self.name = name if children is None: self.children = [] else: self.children = children def show_name(self, indent): print '%sname: "%s"' % (Indents[indent], self.name, ) def show(self, indent=0): self.show_name(indent) indent += 1 for child in self.children: child.show(indent) def test(): n1 = Node('N1') n2 = Node('N2') n3 = Node('N3') n4 = Node('N4') n5 = Node('N5', [n1, n2,]) n6 = Node('N6', [n3, n4,]) n7 = Node('N7', [n5, n6,]) n7.show() if __name__ == '__main__': test()
Notes:
A subclass extends or specializes a superclass by adding additional methods to the superclass and by overriding methods (with the same name) that already exist in the superclass.
Exercises:
Solutions:
We can import our previous Node script, then implement classes that have the Node class as a superclass:
from simple_node import Node, Indents class Plant(Node): def __init__(self, name, height=-1, children=None): Node.__init__(self, name, children) self.height = height def show(self, indent=0): self.show_name(indent) print '%sheight: %s' % (Indents[indent], self.height, ) indent += 1 for child in self.children: child.show(indent) class Animal(Node): def __init__(self, name, color='no color', children=None): Node.__init__(self, name, children) self.color = color def show(self, indent=0): self.show_name(indent) print '%scolor: "%s"' % (Indents[indent], self.color, ) indent += 1 for child in self.children: child.show(indent) def test(): n1 = Animal('scrubjay', 'gray blue') n2 = Animal('raven', 'black') n3 = Animal('american kestrel', 'brown') n4 = Animal('red-shouldered hawk', 'brown and gray') n5 = Animal('corvid', 'none', [n1, n2,]) n6 = Animal('raptor', children=[n3, n4,]) n7a = Animal('bird', children=[n5, n6,]) n1 = Plant('valley oak', 50) n2 = Plant('canyon live oak', 40) n3 = Plant('jeffery pine', 120) n4 = Plant('ponderosa pine', 140) n5 = Plant('oak', children=[n1, n2,]) n6 = Plant('conifer', children=[n3, n4,]) n7b = Plant('tree', children=[n5, n6,]) n8 = Node('birds and trees', [n7a, n7b,]) n8.show() if __name__ == '__main__': test()
Notes:
Python also supports class-based polymorphism, which was, by the way, demonstrated in the previous example.
Exercises:
Solution:
We implement three simple classes and then create a list of instances of these classes:
class A(object): def show(self, msg): print 'class A -- msg: "%s"' % (msg, ) class B(object): def show(self, msg): print 'class B -- msg: "%s"' % (msg, ) class C(object): def show(self, msg): print 'class C -- msg: "%s"' % (msg, ) def test(): objs = [A(), B(), C(), A(), ] for idx, obj in enumerate(objs): msg = 'message # %d' % (idx + 1, ) obj.show(msg) if __name__ == '__main__': test()
Notes:
A method in a class can recusively call itself. This is very similar to the way in which we implemented recursive functions -- see: Recursive functions.
Exercises:
Solutions:
We implement a class with three instance variables: (1) name, (2) left branch, and (3) right branch. Then, we implement a show() method that displays the name and calls itself to show the children in each sub-tree:
Indents = [' ' * idx for idx in range(10)] class AnimalNode(object): def __init__(self, name, left_branch=None, right_branch=None): self.name = name self.left_branch = left_branch self.right_branch = right_branch def show(self, level=0): print '%sname: %s' % (Indents[level], self.name, ) level += 1 if self.left_branch is not None: self.left_branch.show(level) if self.right_branch is not None: self.right_branch.show(level) Tree = AnimalNode('animals', AnimalNode('birds', AnimalNode('seed eaters', AnimalNode('house finch'), AnimalNode('white crowned sparrow'), ), AnimalNode('insect eaters', AnimalNode('hermit thrush'), AnimalNode('black headed phoebe'), ), ), None, ) def test(): Tree.show() if __name__ == '__main__': test()
Instead of using a left branch and a right branch, in this solution we use a list to represent the children of a node:
class AnimalNode(object): def __init__(self, data, children=None): self.data = data if children is None: self.children = [] else: self.children = children def show(self, level=''): print '%sdata: %s' % (level, self.data, ) level += ' ' for child in self.children: child.show(level) Tree = AnimalNode('animals', [ AnimalNode('birds', [ AnimalNode('seed eaters', [ AnimalNode('house finch'), AnimalNode('white crowned sparrow'), AnimalNode('lesser gold finch'), ]), AnimalNode('insect eaters', [ AnimalNode('hermit thrush'), AnimalNode('black headed phoebe'), ]), ]) ]) def test(): Tree.show() if __name__ == '__main__': test()
Notes:
A class variable is one whose single value is shared by all instances of the class and, in fact, is shared by all who have access to the class (object).
"Normal" methods are instance methods. An instance method receives the instance as its first argument. A instance method is defined by using the def statement in the body of a class statement.
A class method receives the class as its first argument. A class method is defined by defining a normal/instance method, then using the classmethod built-in function. For example:
class ASimpleClass(object): description = 'a simple class' def show_class(cls, msg): print '%s: %s' % (cls.description , msg, ) show_class = classmethod(show_class)
A static method does not receive anything special as its first argument. A static method is defined by defining a normal/instance method, then using the staticmethod built-in function. For example:
class ASimpleClass(object): description = 'a simple class' def show_class(msg): print '%s: %s' % (ASimpleClass.description , msg, ) show_class = staticmethod(show_class)
In effect, both class methods and static methods are defined by creating a normal (instance) method, then creating a wrapper object (a class method or static method) using the classmethod or staticmethod built-in function.
Exercises:
Solutions:
We use a class variable named instance_count, rather than an instance variable, to keep a running total of instances. Then, we increment that variable each time an instance is created:
class CountInstances(object): instance_count = 0 def __init__(self, name='-no name-'): self.name = name CountInstances.instance_count += 1 def show(self): print 'name: "%s"' % (self.name, ) def show_instance_count(cls): print 'instance count: %d' % (cls.instance_count, ) show_instance_count = classmethod(show_instance_count) def test(): instances = [] instances.append(CountInstances('apple')) instances.append(CountInstances('banana')) instances.append(CountInstances('cherry')) instances.append(CountInstances()) for instance in instances: instance.show() CountInstances.show_instance_count() if __name__ == '__main__': test()
Notes:
When we run this script, it prints out the following:
name: "apple" name: "banana" name: "cherry" name: "-no name-" instance count: 4
The call to the classmethod built-in function effectively wraps the show_instance_count method in a class method, that is, in a method that takes a class object as its first argument rather than an instance object. To read more about classmethod, go to Built-in Functions -- http://docs.python.org/lib/built-in-funcs.html and search for "classmethod".
A static method takes neither an instance (self) nor a class as its first paramenter. And, static method is created with the staticmethod() built-in function (rather than with the classmethod() built-in):
class CountInstances(object): instance_count = 0 def __init__(self, name='-no name-'): self.name = name CountInstances.instance_count += 1 def show(self): print 'name: "%s"' % (self.name, ) def show_instance_count(): print 'instance count: %d' % ( CountInstances.instance_count, ) show_instance_count = staticmethod(show_instance_count) def test(): instances = [] instances.append(CountInstances('apple')) instances.append(CountInstances('banana')) instances.append(CountInstances('cherry')) instances.append(CountInstances()) for instance in instances: instance.show() CountInstances.show_instance_count() if __name__ == '__main__': test()
A decorator enables us to do what we did in the previous example with a somewhat simpler syntax.
For simple cases, the decorator syntax enables us to do this:
@functionwrapper def method1(self): o o o
instead of this:
def method1(self): o o o method1 = functionwrapper(method1)
So, we can write this:
@classmethod def method1(self): o o o
instead of this:
def method1(self): o o o method1 = classmethod(method1)
Exercises:
Solutions:
A decorator is an easier and cleaner way to define a class method (or a static method):
class CountInstances(object): instance_count = 0 def __init__(self, name='-no name-'): self.name = name CountInstances.instance_count += 1 def show(self): print 'name: "%s"' % (self.name, ) @classmethod def show_instance_count(cls): print 'instance count: %d' % (cls.instance_count, ) # Note that the following line has been replaced by # the classmethod decorator, above. # show_instance_count = classmethod(show_instance_count) def test(): instances = [] instances.append(CountInstances('apple')) instances.append(CountInstances('banana')) instances.append(CountInstances('cherry')) instances.append(CountInstances()) for instance in instances: instance.show() CountInstances.show_instance_count() if __name__ == '__main__': test()
Decorators can be used to "wrap" a function with another function.
When implementing a decorator, it is helpful to remember that the following decorator application:
@dec def func(arg1, arg2): pass
is equivalent to:
def func(arg1, arg2): pass func = dec(func)
Therefore, to implement a decorator, we write a function that returns a function object, since we replace the value originally bound to the function with this new function object. It may be helpful to take the view that we are creating a function that is a wrapper for the original function.
Exercises:
Solutions:
A function that contains and returns an inner function can be used to wrap a function:
def trace(func): def inner(*args, **kwargs): print '>>' func(*args, **kwargs) print '<<' return inner @trace def func1(x, y): print 'x:', x, 'y:', y func2((x, y)) @trace def func2(content): print 'content:', content def test(): func1('aa', 'bb') test()
Notes:
Decorators can also take arguments.
The following decorator with arguments:
@dec(argA, argB) def func(arg1, arg2): pass
is equivalent to:
def func(arg1, arg2): pass func = dec(argA, argB)(func)
Because the decorator's arguments are passed to the result of calling the decorator on the decorated function, you may find it useful to implement a decorator with arguments using a function inside a function inside a function.
Exercises:
Solutions:
Implement this decorator that takes arguments with a function containing a nested function which in turn contains a nested function:
def trace(msg): def inner1(func): def inner2(*args, **kwargs): print '>> [%s]' % (msg, ) retval = func(*args, **kwargs) print '<< [%s]' % (msg, ) return retval return inner2 return inner1 @trace('tracing func1') def func1(x, y): print 'x:', x, 'y:', y result = func2((x, y)) return result @trace('tracing func2') def func2(content): print 'content:', content return content * 3 def test(): result = func1('aa', 'bb') print 'result:', result test()
Decorators can be "stacked".
The following stacked decorators:
@dec2 @dec1 def func(arg1, arg2, ...): pass
are equivalent to:
def func(arg1, arg2, ...): pass func = dec2(dec1(func))
Exercises:
Solutions:
Reuse your tracing function from the previous exercise, then write a simple decorator that prints a row of dashes:
def trace(msg): def inner1(func): def inner2(*args, **kwargs): print '>> [%s]' % (msg, ) retval = func(*args, **kwargs) print '<< [%s]' % (msg, ) return retval return inner2 return inner1 def horizontal_line(func): def inner(*args, **kwargs): print '-' * 50 retval = func(*args, **kwargs) print '-' * 50 return retval return inner @trace('tracing func1') def func1(x, y): print 'x:', x, 'y:', y result = func2((x, y)) return result @horizontal_line @trace('tracing func2') def func2(content): print 'content:', content return content * 3 def test(): result = func1('aa', 'bb') print 'result:', result test()
Once again, a decorator with arguments can be implemented with a function nested inside a function which is nested inside a function. This remains the same whether the decorator is used as a stacked decorator or not. Here is a solution:
def trace(msg): def inner1(func): def inner2(*args, **kwargs): print '>> [%s]' % (msg, ) retval = func(*args, **kwargs) print '<< [%s]' % (msg, ) return retval return inner2 return inner1 def horizontal_line(line_chr): def inner1(func): def inner2(*args, **kwargs): print line_chr * 15 retval = func(*args, **kwargs) print line_chr * 15 return retval return inner2 return inner1 @trace('tracing func1') def func1(x, y): print 'x:', x, 'y:', y result = func2((x, y)) return result @horizontal_line('<**>') @trace('tracing func2') def func2(content): print 'content:', content return content * 3 def test(): result = func1('aa', 'bb') print 'result:', result test()
There is more about decorators here:
Definition: iterable (adjective) -- that which can be iterated over.
A good test of whether something is iterable is whether it can be used in a for: statement. For example, if we can write for item in X:, then X is iterable. Here is another simple test:
def isiterable(x): try: y = iter(x) except TypeError, exp: return False return True
Some kinds of iterables:
Exercises:
Solutions:
We implement a class that has __iter__() and next() methods:
import urllib class WebPages(object): def __init__(self, urls): self.urls = urls self.current_index = 0 def __iter__(self): self.current_index = 0 return self def next(self): if self.current_index >= len(self.urls): raise StopIteration url = self.urls[self.current_index] self.current_index += 1 f = urllib.urlopen(url) content = f.read() f.close() return content def test(): urls = [ 'http://www.python.org', 'http://en.wikipedia.org/', 'http://en.wikipedia.org/wiki/Python_(programming_language)', ] pages = WebPages(urls) for page in pages: print 'length: %d' % (len(page), ) pages = WebPages(urls) print '-' * 50 page = pages.next() print 'length: %d' % (len(page), ) page = pages.next() print 'length: %d' % (len(page), ) page = pages.next() print 'length: %d' % (len(page), ) page = pages.next() print 'length: %d' % (len(page), ) test()
The itertools module in the Python standard library has helpers for iterators: http://docs.python.org/library/itertools.html#module-itertools
Exercises:
SAX -- Parse an XML document with SAX, then show some information (tag, attributes, character data) for each element.
Minidom -- Parse an XML document with minidom, then walk the DOM tree and show some information (tag, attributes, character data) for each element.
Here is a sample XML document that you can use for input:
<?xml version="1.0"?> <people> <person id="1" value="abcd" ratio="3.2"> <name>Alberta</name> <interest>gardening</interest> <interest>reading</interest> <category>5</category> </person> <person id="2"> <name>Bernardo</name> <interest>programming</interest> <category></category> <agent> <firstname>Darren</firstname> <lastname>Diddly</lastname> </agent> </person> <person id="3" value="efgh"> <name>Charlie</name> <interest>people</interest> <interest>cats</interest> <interest>dogs</interest> <category>8</category> <promoter> <firstname>David</firstname> <lastname>Donaldson</lastname> <client> <fullname>Arnold Applebee</fullname> <refid>10001</refid> </client> </promoter> <promoter> <firstname>Edward</firstname> <lastname>Eddleberry</lastname> <client> <fullname>Arnold Applebee</fullname> <refid>10001</refid> </client> </promoter> </person> </people>
ElementTree -- Parse an XML document with ElementTree, then walk the DOM tree and show some information (tag, attributes, character data) for each element.
lxml -- Parse an XML document with lxml, then walk the DOM tree and show some information (tag, attributes, character data) for each element.
Modify document with ElementTree -- Use ElementTree to read a document, then modify the tree. Show the contents of the tree, and then write out the modified document.
XPath -- lxml supports XPath. Use the XPath support in lxml to address each of the following in the above XML instance document:
Solutions:
We can use the SAX support in the Python standard library:
#!/usr/bin/env python """ Parse and XML with SAX. Display info about each element. Usage: python test_sax.py infilename Examples: python test_sax.py people.xml """ import sys from xml.sax import make_parser, handler class TestHandler(handler.ContentHandler): def __init__(self): self.level = 0 def show_with_level(self, value): print '%s%s' % (' ' * self.level, value, ) def startDocument(self): self.show_with_level('Document start') self.level += 1 def endDocument(self): self.level -= 1 self.show_with_level('Document end') def startElement(self, name, attrs): self.show_with_level('start element -- name: "%s"' % (name, )) self.level += 1 def endElement(self, name): self.level -= 1 self.show_with_level('end element -- name: "%s"' % (name, )) def characters(self, content): content = content.strip() if content: self.show_with_level('characters: "%s"' % (content, )) def test(infilename): parser = make_parser() handler = TestHandler() parser.setContentHandler(handler) parser.parse(infilename) def usage(): print __doc__ sys.exit(1) def main(): args = sys.argv[1:] if len(args) != 1: usage() infilename = args[0] test(infilename) if __name__ == '__main__': main()
The minidom module contains a parse() function that enables us to read an XML document and create a DOM tree:
#!/usr/bin/env python """Process an XML document with minidom. Show the document tree. Usage: python minidom_walk.py [options] infilename """ import sys from xml.dom import minidom def show_tree(doc): root = doc.documentElement show_node(root, 0) def show_node(node, level): count = 0 if node.nodeType == minidom.Node.ELEMENT_NODE: show_level(level) print 'tag: %s' % (node.nodeName, ) for key in node.attributes.keys(): attr = node.attributes.get(key) show_level(level + 1) print '- attribute name: %s value: "%s"' % (attr.name, attr.value, ) if (len(node.childNodes) == 1 and node.childNodes[0].nodeType == minidom.Node.TEXT_NODE): show_level(level + 1) print '- data: "%s"' % (node.childNodes[0].data, ) for child in node.childNodes: count += 1 show_node(child, level + 1) return count def show_level(level): for x in range(level): print ' ', def test(): args = sys.argv[1:] if len(args) != 1: print __doc__ sys.exit(1) docname = args[0] doc = minidom.parse(docname) show_tree(doc) if __name__ == '__main__': #import pdb; pdb.set_trace() test()
ElementTree enables us to parse an XML document and create a DOM tree:
#!/usr/bin/env python """Process an XML document with elementtree. Show the document tree. Usage: python elementtree_walk.py [options] infilename """ import sys from xml.etree import ElementTree as etree def show_tree(doc): root = doc.getroot() show_node(root, 0) def show_node(node, level): show_level(level) print 'tag: %s' % (node.tag, ) for key, value in node.attrib.iteritems(): show_level(level + 1) print '- attribute -- name: %s value: "%s"' % (key, value, ) if node.text: text = node.text.strip() show_level(level + 1) print '- text: "%s"' % (node.text, ) if node.tail: tail = node.tail.strip() show_level(level + 1) print '- tail: "%s"' % (tail, ) for child in node.getchildren(): show_node(child, level + 1) def show_level(level): for x in range(level): print ' ', def test(): args = sys.argv[1:] if len(args) != 1: print __doc__ sys.exit(1) docname = args[0] doc = etree.parse(docname) show_tree(doc) if __name__ == '__main__': #import pdb; pdb.set_trace() test()
lxml enables us to parse an XML document and create a DOM tree. In fact, since lxml attempts to mimic the ElementTree API, our code is very similar to that in the solution to the ElementTree exercise:
#!/usr/bin/env python """Process an XML document with elementtree. Show the document tree. Usage: python lxml_walk.py [options] infilename """ # # Imports: import sys from lxml import etree def show_tree(doc): root = doc.getroot() show_node(root, 0) def show_node(node, level): show_level(level) print 'tag: %s' % (node.tag, ) for key, value in node.attrib.iteritems(): show_level(level + 1) print '- attribute -- name: %s value: "%s"' % (key, value, ) if node.text: text = node.text.strip() show_level(level + 1) print '- text: "%s"' % (node.text, ) if node.tail: tail = node.tail.strip() show_level(level + 1) print '- tail: "%s"' % (tail, ) for child in node.getchildren(): show_node(child, level + 1) def show_level(level): for x in range(level): print ' ', def test(): args = sys.argv[1:] if len(args) != 1: print __doc__ sys.exit(1) docname = args[0] doc = etree.parse(docname) show_tree(doc) if __name__ == '__main__': #import pdb; pdb.set_trace() test()
We can modify the DOM tree and write it out to a new file:
#!/usr/bin/env python """Process an XML document with elementtree. Show the document tree. Modify the document tree and then show it again. Write the modified XML tree to a new file. Usage: python elementtree_walk.py [options] infilename outfilename Options: -h, --help Display this help message. Example: python elementtree_walk.py myxmldoc.xml myotherxmldoc.xml """ import sys import os import getopt import time # Use ElementTree. from xml.etree import ElementTree as etree # Or uncomment to use Lxml. #from lxml import etree def show_tree(doc): root = doc.getroot() show_node(root, 0) def show_node(node, level): show_level(level) print 'tag: %s' % (node.tag, ) for key, value in node.attrib.iteritems(): show_level(level + 1) print '- attribute -- name: %s value: "%s"' % (key, value, ) if node.text: text = node.text.strip() show_level(level + 1) print '- text: "%s"' % (node.text, ) if node.tail: tail = node.tail.strip() show_level(level + 1) print '- tail: "%s"' % (tail, ) for child in node.getchildren(): show_node(child, level + 1) def show_level(level): for x in range(level): print ' ', def modify_tree(doc, tag, attrname, attrvalue): root = doc.getroot() modify_node(root, tag, attrname, attrvalue) def modify_node(node, tag, attrname, attrvalue): if node.tag == tag: node.attrib[attrname] = attrvalue for child in node.getchildren(): modify_node(child, tag, attrname, attrvalue) def test(indocname, outdocname): doc = etree.parse(indocname) show_tree(doc) print '-' * 50 date = time.ctime() modify_tree(doc, 'person', 'date', date) show_tree(doc) write_output = False if os.path.exists(outdocname): response = raw_input('Output file (%s) exists. Over-write? (y/n): ' % outdocname) if response == 'y': write_output = True else: write_output = True if write_output: doc.write(outdocname) print 'Wrote modified XML tree to %s' % outdocname else: print 'Did not write output file.' def usage(): print __doc__ sys.exit(1) def main(): args = sys.argv[1:] try: opts, args = getopt.getopt(args, 'h', ['help', ]) except: usage() for opt, val in opts: if opt in ('-h', '--help'): usage() if len(args) != 2: usage() indocname = args[0] outdocname = args[1] test(indocname, outdocname) if __name__ == '__main__': #import pdb; pdb.set_trace() main()
Notes:
When we parse and XML document with lxml, each element (node) has an xpath() method.
# test_xpath.py from lxml import etree def test(): doc = etree.parse('people.xml') root = doc.getroot() print root.xpath("//name/text()") print root.xpath("//@id") test()
And, when we run the above code, here is what we see:
$ python test_xpath.py ['Alberta', 'Bernardo', 'Charlie'] ['1', '2', '3']
For more on XPath see: XML Path Language (XPath) -- http://www.w3.org/TR/xpath
You can find information about database programming in Python here: Database Programming -- http://wiki.python.org/moin/DatabaseProgramming/.
For database access we use the Python Database API. You can find information about it here: Python Database API Specification v2.0 -- http://www.python.org/dev/peps/pep-0249/.
To use the database API we do the following:
Our examples use the gadfly database, which is written in Python. If you want to use gadfly, you can find it here: http://gadfly.sourceforge.net/. gadfly is a reasonable choice if you want an easy to use database on your local machine.
Another reasonable choice for a local database is sqlite3, which is in the Python standard library. Here is a descriptive quote from the SQLite Web site:
"SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. SQLite is the most widely deployed SQL database engine in the world. The source code for SQLite is in the public domain."
You can learn about it here:
If you want or need to use another, enterprise class database, for example PostgreSQL, MySQL, Oracle, etc., you will need an interface module for your specific database. You can find information about database interface modules here: Database interfaces -- http://wiki.python.org/moin/DatabaseInterfaces
Excercises:
Solutions:
We can execute a SQL query and then retrieve all the rows with fetchall():
import gadfly def test(): connection = gadfly.connect("dbtest1", "plantsdbdir") cur = connection.cursor() cur.execute('select * from plantsdb order by p_name') rows = cur.fetchall() for row in rows: print '2. row:', row connection.close() test()
The cursor itself is an iterator. It iterates over the rows returned by a query. So, we execute a SQL query and then we use the cursor in a for: statement:
import gadfly def test(): connection = gadfly.connect("dbtest1", "plantsdbdir") cur = connection.cursor() cur.execute('select * from plantsdb order by p_name') for row in cur: print row connection.close() test()
The description attribute in the cursor is a container that has an item describing each field:
import gadfly def test(): cur.execute('select * from plantsdb order by p_name') for field in cur.description: print 'field:', field rows = cur.fetchall() for row in rows: for idx, field in enumerate(row): content = '%s: "%s"' % (cur.description[idx][0], field, ) print content, print connection.close() test()
Notes:
The solutions using sqlite3 are very similar to those using gadfly. For information on sqlite3, see: sqlite3 — DB-API 2.0 interface for SQLite databases http://docs.python.org/library/sqlite3.html#module-sqlite3.
#!/usr/bin/env python """ Perform operations on sqlite3 (plants) database. Usage: python py_db_api.py command [arg1, ... ] Commands: create -- create new database. show -- show contents of database. add -- add row to database. Requires 3 args (name, descrip, rating). delete - remove row from database. Requires 1 arg (name). Examples: python test1.py create python test1.py show python test1.py add crenshaw "The most succulent melon" 10 python test1.py delete lemon """ import sys import sqlite3 Values = [ ('lemon', 'bright and yellow', '7'), ('peach', 'succulent', '9'), ('banana', 'smooth and creamy', '8'), ('nectarine', 'tangy and tasty', '9'), ('orange', 'sweet and tangy', '8'), ] Field_defs = [ 'p_name varchar', 'p_descrip varchar', #'p_rating integer', 'p_rating varchar', ] def createdb(): connection = sqlite3.connect('sqlite3plantsdb') cursor = connection.cursor() q1 = "create table plantsdb (%s)" % (', '.join(Field_defs)) print 'create q1: %s' % q1 cursor.execute(q1) q1 = "create index index1 on plantsdb(p_name)" cursor.execute(q1) q1 = "insert into plantsdb (p_name, p_descrip, p_rating) values ('%s', '%s', %s)" for spec in Values: q2 = q1 % spec print 'q2: "%s"' % q2 cursor.execute(q2) connection.commit() showdb1(cursor) connection.close() def showdb(): connection, cursor = opendb() showdb1(cursor) connection.close() def showdb1(cursor): cursor.execute("select * from plantsdb order by p_name") hr() description = cursor.description print description print 'description:' for rowdescription in description: print ' %s' % (rowdescription, ) hr() rows = cursor.fetchall() print rows print 'rows:' for row in rows: print ' %s' % (row, ) hr() print 'content:' for row in rows: descrip = row[1] name = row[0] rating = '%s' % row[2] print ' %s%s%s' % ( name.ljust(12), descrip.ljust(30), rating.rjust(4), ) def addtodb(name, descrip, rating): try: rating = int(rating) except ValueError, exp: print 'Error: rating must be integer.' return connection, cursor = opendb() cursor.execute("select * from plantsdb where p_name = '%s'" % name) rows = cursor.fetchall() if len(rows) > 0: ql = "update plantsdb set p_descrip='%s', p_rating='%s' where p_name='%s'" % ( descrip, rating, name, ) print 'ql:', ql cursor.execute(ql) connection.commit() print 'Updated' else: cursor.execute("insert into plantsdb values ('%s', '%s', '%s')" % ( name, descrip, rating)) connection.commit() print 'Added' showdb1(cursor) connection.close() def deletefromdb(name): connection, cursor = opendb() cursor.execute("select * from plantsdb where p_name = '%s'" % name) rows = cursor.fetchall() if len(rows) > 0: cursor.execute("delete from plantsdb where p_name='%s'" % name) connection.commit() print 'Plant (%s) deleted.' % name else: print 'Plant (%s) does not exist.' % name showdb1(cursor) connection.close() def opendb(): connection = sqlite3.connect("sqlite3plantsdb") cursor = connection.cursor() return connection, cursor def hr(): print '-' * 60 def usage(): print __doc__ sys.exit(1) def main(): args = sys.argv[1:] if len(args) < 1: usage() cmd = args[0] if cmd == 'create': if len(args) != 1: usage() createdb() elif cmd == 'show': if len(args) != 1: usage() showdb() elif cmd == 'add': if len(args) < 4: usage() name = args[1] descrip = args[2] rating = args[3] addtodb(name, descrip, rating) elif cmd == 'delete': if len(args) < 2: usage() name = args[1] deletefromdb(name) else: usage() if __name__ == '__main__': main()
There is support for parsing and generating CSV files in the Python standard library. See: csv — CSV File Reading and Writing http://docs.python.org/library/csv.html#module-csv.
Exercises:
Read a CSV file and print the fields in columns. Here is a sample file to use as input:
# name description rating Lemon,Bright yellow and tart,5 Eggplant,Purple and shiny,6 Tangerine,Succulent,8
Solutions:
Use the CSV module in the Python standard library to read a CSV file:
""" Read a CSV file and print the contents in columns. """ import csv def test(infilename): infile = open(infilename) reader = csv.reader(infile) print '==== =========== ======' print 'Name Description Rating' print '==== =========== ======' for fields in reader: if len(fields) == 3: line = '%s %s %s' % (fields[0].ljust(20), fields[1].ljust(40), fields[2].ljust(4)) print line infile.close() def main(): infilename = 'csv_report.csv' test(infilename) if __name__ == '__main__': main()
And, when run, here is what it displays:
==== =========== ====== Name Description Rating ==== =========== ====== Lemon Bright yellow and tart 5 Eggplant Purple and shiny 6 Tangerine Succulent 8
YAML is a structured text data representation format. It uses indentation to indicate nesting. Here is a description from the YAML Web site:
"YAML: YAML Ain't Markup Language
"What It Is: YAML is a human friendly data serialization standard for all programming languages."
You can learn more about YAML and PyYAML here:
Exercises:
Read the following sample YAML document. Print out the information in it:
american: - Boston Red Sox - Detroit Tigers - New York Yankees national: - New York Mets - Chicago Cubs - Atlanta Braves
Load the YAML data used in the previous exercise, then make a modification (for example, add "San Francisco Giants" to the National League), then dump the modified data to a new file.
Solutions:
Printing out information from YAML is as "simple" as printing out a Python data structure. In this solution, we use the pretty printer from the Python standard library:
import yaml import pprint def test(): infile = open('test1.yaml') data = yaml.load(infile) infile.close() pprint.pprint(data) test()
We could, alternatively, read in and then "load" from a string:
import yaml import pprint def test(): infile = open('test1.yaml') data_str = infile.read() infile.close() data = yaml.load(data_str) pprint.pprint(data) test()
The YAML dump() function enables us to dump data to a file:
import yaml import pprint def test(): infile = open('test1.yaml', 'r') data = yaml.load(infile) infile.close() data['national'].append('San Francisco Giants') outfile = open('test1_new.yaml', 'w') yaml.dump(data, outfile) outfile.close() test()
Notes:
If we want to produce the standard YAML "block" style rather than the "flow" format, then we could use:
yaml.dump(data, outfile, default_flow_style=False)
Here is a quote from Wikipedia entry for Json:
"JSON (pronounced 'Jason'), short for JavaScript Object Notation, is a lightweight computer data interchange format. It is a text-based, human-readable format for representing simple data structures and associative arrays (called objects)."
The Json text representation looks very similar to Python literal representation of Python builtin data types (for example, lists, dictionaries, numbers, and strings).
Learn more about Json and Python support for Json here:
Excercises:
Write a Python script, using your favorite Python Json implementation (for example python-json or simplejson), that dumps the following data structure to a file:
Data = { 'rock and roll': ['Elis', 'The Beatles', 'The Rolling Stones',], 'country': ['Willie Nelson', 'Hank Williams', ] }
Write a Python script that reads Json data from a file and loads it into Python data structures.
Solutions:
This solution uses simplejson to store a Python data structure encoded as Json in a file:
import simplejson as json Data = { 'rock and roll': ['Elis', 'The Beatles', 'The Rolling Stones',], 'country': ['Willie Nelson', 'Hank Williams', ] } def test(): fout = open('tmpdata.json', 'w') content = json.dumps(Data) fout.write(content) fout.write('\n') fout.close() test()
We can read the file into a string, then decode it from Json:
import simplejson as json def test(): fin = open('tmpdata.json', 'r') content = fin.read() fin.close() data = json.loads(content) print data test()
Note that you may want some control over indentation, character encoding, etc. For simplejson, you can learn about that here: simplejson - JSON encoder and decoder -- http://simplejson.googlecode.com/svn/tags/simplejson-2.0.1/docs/index.html.
This section discusses a specific Python tool, specifically a Python code generator that generates Python bindings for XML files.
Thus, this section will help you in the following ways:
Additional information:
If you plan to work through this tutorial, you may find it helpful to look at the sample code that accompanies this tutorial. You can find it in the distribution under:
tutorial/ tutorial/Code/
You can find additional information about generateDS.py here:
That documentation is also included in the distribution.
generateDS.py generates Python data structures (for example, class definitions) from an XML schema document. These data structures represent the elements in an XML document described by the XML schema. generateDS.py also generates parsers that load an XML document into those data structures. In addition, a separate file containing subclasses (stubs) is optionally generated. The user can add methods to the subclasses in order to process the contents of an XML document.
The generated Python code contains:
Each generated class contains the following:
The generated subclass file contains one (sub-)class definition for each data representation class. If the subclass file is used, then the parser creates instances of the subclasses (instead of creating instances of the superclasses). This enables the user to extend the subclasses with "tree walk" methods, for example, that process the contents of the XML file. The user can also generate and extend multiple subclass files which use a single, common superclass file, thus implementing a number of different processes on the same XML document type.
This document introduces the user to generateDS.py and walks the user through several examples that show how to generate Python code and how to use that generated code.
Note: The sample files used below are under the tutorial/Code/ directory.
Use the following to get help:
$ generateDS.py --help
I'll assume that generateDS.py is in a directory on your path. If not, you should do whatever is necessary to make it accessible and executable.
Here is a simple XML schema document:
And, here is how you might generate classes and subclasses that provide data bindings (a Python API) for the definitions in that schema:
$ generateDS.py -o people_api.py -s people_sub.py people.xsd
And, if you want to automatically over-write the generated Python files, use the -f command line flag to force over-write without asking:
$ generateDS.py -f -o people_api.py -s people_sub.py people.xsd
And, to hard-wire the subclass file so that it imports the API module, use the --super command line file. Example:
$ generateDS.py -o people_api.py people.xsd $ generateDS.py -s people_appl1.py --super=people_api people.xsd
Or, do both at the same time with the following:
$ generateDS.py -o people_api.py -s people_appl1.py --super=people_api people.xsd
And, for your second application:
$ generateDS.py -s people_appl2.py --super=people_api people.xsd
If you take a look inside these two "application" files, you will see and import statement like the following:
import ??? as supermod
If you had not used the --super command line option when generating the "application" files, then you could modify that statement yourself. The --super command line option does this for you.
You can also use the The graphical front-end to configure options and save them in a session file, then use that session file with generateDS.py to specify your command line options. For example:
$ generateDS.py --session=test01.session
You can test the generated code by running it. Try something like the following:
$ python people_api.py people.xml
or:
$ python people_appl1.py people.xml
Why does this work? Why can we run the generated code as a Python script? -- If you look at the generated code, down near the end of the file you'll find a main() function that calls a function named parse(). The parse function does the following:
Except for some indentation (ignorable whitespace), this exported XML should be the same as the original XML document. So, that gives you a reasonably thorough test of your generated code.
And, the code in that parse() function gives you a hint of how you might build your own application-specific code that uses the generated API (those generated Python classes).
Now that you have generated code for your data model, you can test it by running it as an application. Suppose that you have an XML instance document people1.xml that satisfies your schema. Then you can parse that instance document and export it (print it out) with something like the following:
$ python people_api.py people1.xml
And, if you have used the --super command line option, as I have above, to connect your subclass file with the superclass (API) file, then you could use the following to do the same thing:
$ python people_appl1.py people1.xml
You may want to merely skim this section for now, then later refer back to it when some of these options are are used later in this tutorial. Also, remember that you can get information about more command line options used by generateDS.py by typing:
$ python generateDS.py --help
and by reading the document http://www.rexx.com/~dkuhlman/generateDS.html
Use this option to tell generateDS.py which of the elements defined in your XM schema is the "root" element. The root element is the outer-most (top-level) element in XML instance documents defined by this schema. In effect, this tells your generated modules which element to use as the root element when parsing and exporting documents.
generateDS.py attempts to guess the root element, usually the first element defined in your XML schema. Use this option when that default is not what you want.
There is also a point-and-click way to run generateDS. It enables you to specify the options needed by generateDS.py through a graphical interface, then to run generateDS.py with those options. It also
You can run it, if you have installed generateDS, by typing the following at a command line:
$ generateds_gui.py
After configuring options, you can save those options in a "session" file, which can be loaded later. Look under the File menu for save and load commands and also consider using the "--session" command line option.
Also note that generateDS.py itself supports a "--session" command line option that enables you to run generateDS.py with the options that you specified and saved with the graphical front-end.
generateDS.py generates Python code which, with no modification, will parse and then export an XML document defined by your schema. However, you are likely to want to go beyond that. In many situations you will want to construct a custom application that processes your XML documents using the generated code.
One strategy is to generate a subclass file and to add your application-specific code to that. Generate the subclass file with the "-s" command line flag:
$ generateDS.py -s myapp.py people.xsd
Now add some application-specific code to myapp.py, for example, if you are using the included "people" sample files:
class peopleTypeSub(supermod.people): def __init__(self, comments=None, person=None, programmer=None, python_programmer=None, java_programmer=None): supermod.people.__init__(self, comments, person, programmer, python_programmer, java_programmer) def fancyexport(self, outfile): outfile.write('Starting fancy export') for person in self.get_person(): person.fancyexport(outfile) supermod.people.subclass = peopleTypeSub # end class peopleTypeSub class personTypeSub(supermod.person): def __init__(self, vegetable=None, fruit=None, ratio=None, id=None, value=None, name=None, interest=None, category=None, agent=None, promoter=None, description=None): supermod.person.__init__(self, vegetable, fruit, ratio, id, value, name, interest, category, agent, promoter, description) def fancyexport(self, outfile): outfile.write('Fancy person export -- name: %s' % self.get_name(), ) supermod.person.subclass = personTypeSub # end class personTypeSub
In this approach you might do things like the following:
Get to know the generated export API by inspecting the generated code in the superclass file. That's the file generated with the "-o" command line flag.
What to look for:
Now, you can import your generated API module, and use it to construct and manipulate objects. Here is an example using code generated with the "people" schema:
import sys import people_api as api def test(names): people = api.peopleType() for count, name in enumerate(names): id = '%d' % (count + 1, ) person = api.personType(name=name, id=id) people.add_person(person) people.export(sys.stdout, 0) test(['albert', 'betsy', 'charlie'])
Run this and you might see something like the following:
$ python tmp.py <people > <person id="1"> <name>albert</name> </person> <person id="2"> <name>betsy</name> </person> <person id="3"> <name>charlie</name> </person> </people>
Note: You can find examples of the code in this section in these files:
tutorial/Code/upcase_names.py tutorial/Code/upcase_names_appl.py
Here are the relevant, modified subclasses (upcase_names_appl.py):
import people_api as supermod class peopleTypeSub(supermod.peopleType): def __init__(self, comments=None, person=None, specialperson=None, programmer=None, python_programmer=None, java_programmer=None): super(peopleTypeSub, self).__init__(comments, person, specialperson, programmer, python_programmer, java_programmer, ) def upcase_names(self): for person in self.get_person(): person.upcase_names() supermod.peopleType.subclass = peopleTypeSub # end class peopleTypeSub class personTypeSub(supermod.personType): def __init__(self, vegetable=None, fruit=None, ratio=None, id=None, value=None, name=None, interest=None, category=None, agent=None, promoter=None, description=None, range_=None, extensiontype_=None): super(personTypeSub, self).__init__(vegetable, fruit, ratio, id, value, name, interest, category, agent, promoter, description, range_, extensiontype_, ) def upcase_names(self): self.set_name(self.get_name().upper()) supermod.personType.subclass = personTypeSub # end class personTypeSub
Notes:
Here is the application itself (upcase_names.py):
import sys import upcase_names_appl as appl def create_people(names): people = appl.peopleTypeSub() for count, name in enumerate(names): id = '%d' % (count + 1, ) person = appl.personTypeSub(name=name, id=id) people.add_person(person) return people def main(): names = ['albert', 'betsy', 'charlie'] people = create_people(names) print 'Before:' people.export(sys.stdout, 1) people.upcase_names() print '-' * 50 print 'After:' people.export(sys.stdout, 1) main()
Notes:
And, when you run this mini-application, here is what you might see:
$ python upcase_names.py Before: <people > <person id="1"> <name>albert</name> </person> <person id="2"> <name>betsy</name> </person> <person id="3"> <name>charlie</name> </person> </people> -------------------------------------------------- After: <people > <person id="1"> <name>ALBERT</name> </person> <person id="2"> <name>BETSY</name> </person> <person id="3"> <name>CHARLIE</name> </person> </people>
There are times when you would like to implement a function or method that can perform operations on a variety of members and that needs type information about each member.
You can get help with this by generating your code with the "--member-specs" command line option. When you use this option, generateDS.py add a list or a dictionary containing an item for each member. If you want a list, then use "--member-specs=list", and if you want a dictionary, with member names as keys, then use "--member-specs=dict".
Here is an example -- In this example, we walk the document/instance tree and convert all string simple types to upper case.
Here is a schema (Code/member_specs.xsd):
<?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="contact-list" type="contactlistType" /> <xs:complexType name="contactlistType"> <xs:sequence> <xs:element name="description" type="xs:string" /> <xs:element name="contact" type="contactType" maxOccurs="unbounded" /> </xs:sequence> <xs:attribute name="locator" type="xs:string" /> </xs:complexType> <xs:complexType name="contactType"> <xs:sequence> <xs:element name="first-name" type="xs:string"/> <xs:element name="last-name" type="xs:string"/> <xs:element name="interest" type="xs:string" maxOccurs="unbounded" /> <xs:element name="category" type="xs:integer"/> </xs:sequence> <xs:attribute name="id" type="xs:integer" /> <xs:attribute name="priority" type="xs:float" /> <xs:attribute name="color-code" type="xs:string" /> </xs:complexType> </xs:schema>
We generate code with the following command line:
$ generateDS.py -f \ -o member_specs_api.py \ -s member_specs_upper.py \ --super=member_specs_api \ --member-specs=list \ member_specs.xsd
Notes:
And, here is the subclass file (member_specs_upper.py, generated with the "-s" command line option), to which we have added a bit of code that converts any string-type members to upper case. You can think of this module as a special "application" of the generated classes.
#!/usr/bin/env python # # member_specs_upper.py # # # Generated Tue Nov 9 15:54:47 2010 by generateDS.py version 2.2a. # import sys import member_specs_api as supermod etree_ = None Verbose_import_ = False ( XMLParser_import_none, XMLParser_import_lxml, XMLParser_import_elementtree ) = range(3) XMLParser_import_library = None try: # lxml from lxml import etree as etree_ XMLParser_import_library = XMLParser_import_lxml if Verbose_import_: print("running with lxml.etree") except ImportError: try: # cElementTree from Python 2.5+ import xml.etree.cElementTree as etree_ XMLParser_import_library = XMLParser_import_elementtree if Verbose_import_: print("running with cElementTree on Python 2.5+") except ImportError: try: # ElementTree from Python 2.5+ import xml.etree.ElementTree as etree_ XMLParser_import_library = XMLParser_import_elementtree if Verbose_import_: print("running with ElementTree on Python 2.5+") except ImportError: try: # normal cElementTree install import cElementTree as etree_ XMLParser_import_library = XMLParser_import_elementtree if Verbose_import_: print("running with cElementTree") except ImportError: try: # normal ElementTree install import elementtree.ElementTree as etree_ XMLParser_import_library = XMLParser_import_elementtree if Verbose_import_: print("running with ElementTree") except ImportError: raise ImportError("Failed to import ElementTree from any known place") def parsexml_(*args, **kwargs): if (XMLParser_import_library == XMLParser_import_lxml and 'parser' not in kwargs): # Use the lxml ElementTree compatible parser so that, e.g., # we ignore comments. kwargs['parser'] = etree_.ETCompatXMLParser() doc = etree_.parse(*args, **kwargs) return doc # # Globals # ExternalEncoding = 'ascii' # # Utility funtions needed in each generated class. # def upper_elements(obj): for item in obj.member_data_items_: if item.get_data_type() == 'xs:string': name = remap(item.get_name()) val1 = getattr(obj, name) if isinstance(val1, list): for idx, val2 in enumerate(val1): val1[idx] = val2.upper() else: setattr(obj, name, val1.upper()) def remap(name): newname = name.replace('-', '_') return newname # # Data representation classes # class contactlistTypeSub(supermod.contactlistType): def __init__(self, locator=None, description=None, contact=None): super(contactlistTypeSub, self).__init__(locator, description, contact, ) def upper(self): upper_elements(self) for child in self.get_contact(): child.upper() supermod.contactlistType.subclass = contactlistTypeSub # end class contactlistTypeSub class contactTypeSub(supermod.contactType): def __init__(self, priority=None, color_code=None, id=None, first_name=None, last_name=None, interest=None, category=None): super(contactTypeSub, self).__init__(priority, color_code, id, first_name, last_name, interest, category, ) def upper(self): upper_elements(self) supermod.contactType.subclass = contactTypeSub # end class contactTypeSub def get_root_tag(node): tag = supermod.Tag_pattern_.match(node.tag).groups()[-1] rootClass = None if hasattr(supermod, tag): rootClass = getattr(supermod, tag) return tag, rootClass def parse(inFilename): doc = parsexml_(inFilename) rootNode = doc.getroot() rootTag, rootClass = get_root_tag(rootNode) if rootClass is None: rootTag = 'contact-list' rootClass = supermod.contactlistType rootObj = rootClass.factory() rootObj.build(rootNode) # Enable Python to collect the space used by the DOM. doc = None sys.stdout.write('<?xml version="1.0" ?>\n') rootObj.export(sys.stdout, 0, name_=rootTag, namespacedef_='') doc = None return rootObj def parseString(inString): from StringIO import StringIO doc = parsexml_(StringIO(inString)) rootNode = doc.getroot() rootTag, rootClass = get_root_tag(rootNode) if rootClass is None: rootTag = 'contact-list' rootClass = supermod.contactlistType rootObj = rootClass.factory() rootObj.build(rootNode) # Enable Python to collect the space used by the DOM. doc = None sys.stdout.write('<?xml version="1.0" ?>\n') rootObj.export(sys.stdout, 0, name_=rootTag, namespacedef_='') return rootObj def parseLiteral(inFilename): doc = parsexml_(inFilename) rootNode = doc.getroot() rootTag, rootClass = get_root_tag(rootNode) if rootClass is None: rootTag = 'contact-list' rootClass = supermod.contactlistType rootObj = rootClass.factory() rootObj.build(rootNode) # Enable Python to collect the space used by the DOM. doc = None sys.stdout.write('#from member_specs_api import *\n\n') sys.stdout.write('import member_specs_api as model_\n\n') sys.stdout.write('rootObj = model_.contact_list(\n') rootObj.exportLiteral(sys.stdout, 0, name_="contact_list") sys.stdout.write(')\n') return rootObj USAGE_TEXT = """ Usage: python ???.py <infilename> """ def usage(): print USAGE_TEXT sys.exit(1) def main(): args = sys.argv[1:] if len(args) != 1: usage() infilename = args[0] root = parse(infilename) if __name__ == '__main__': #import pdb; pdb.set_trace() main()
Notes:
Here is a test driver (member_specs_test.py) for our (mini-) application:
#!/usr/bin/env python # # member_specs_test.py # import sys import member_specs_api as supermod import member_specs_upper def process(inFilename): doc = supermod.parsexml_(inFilename) rootNode = doc.getroot() rootClass = member_specs_upper.contactlistTypeSub rootObj = rootClass.factory() rootObj.build(rootNode) # Enable Python to collect the space used by the DOM. doc = None sys.stdout.write('<?xml version="1.0" ?>\n') rootObj.export(sys.stdout, 0, name_="contact-list", namespacedef_='') rootObj.upper() sys.stdout.write('-' * 60) sys.stdout.write('\n') rootObj.export(sys.stdout, 0, name_="contact-list", namespacedef_='') return rootObj USAGE_MSG = """\ Synopsis: Sample application using classes and subclasses generated by generateDS.py Usage: python member_specs_test.py infilename """ def usage(): print USAGE_MSG sys.exit(1) def main(): args = sys.argv[1:] if len(args) != 1: usage() infilename = args[0] process(infilename) if __name__ == '__main__': main()
Notes:
We can use the following command line to run our application:
$ python member_specs_test.py member_specs_data.xml
When we run our application, here is the output:
$ python member_specs_test.py member_specs_data.xml <?xml version="1.0" ?> <contact-list locator="http://www.rexx.com/~dkuhlman"> <description>My list of contacts</description> <contact priority="0.050000" color-code="red" id="1"> <first-name>arlene</first-name> <last-name>Allen</last-name> <interest>traveling</interest> <category>2</category> </contact> </contact-list> ------------------------------------------------------------ <contact-list locator="HTTP://WWW.REXX.COM/~DKUHLMAN"> <description>MY LIST OF CONTACTS</description> <contact priority="0.050000" color-code="RED" id="1"> <first-name>ARLENE</first-name> <last-name>ALLEN</last-name> <interest>TRAVELING</interest> <category>2</category> </contact> </contact-list>
Notes:
The following hints are offered for convenience. You can discover them for yourself rather easily by inspecting the generated code.
If a child element is defined in the XML schema with maxOccurs="unbounded" or a value of maxOccurs greater than 1, then access to the child is through a list.
If a child element is defined as a numeric type such as xs:integer, xs:float, or xs:double or as a simple type that is (ultimately) based on a numeric type, then the value is stored (in the Python object) as a Python data type (int, float, etc).
But, when the element itself is defined as mixed="true" or the element a restriction of and has a simple (numeric) as a base, then the valueOf_ instance variable holds the character content and it is always a string, that is it is not converted.
All parameters to the constructors of generated classes have default parameters. Therefore, you can create an "empty" instance of any element by calling the constructor with no parameters.
For example, suppose we have the following XML schema:
<?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="plant-list" type="PlantList" /> <xs:complexType name="PlantType"> <xs:sequence> <xs:element name="description" type="xs:string" /> <xs:element name="catagory" type="xs:integer" /> <xs:element name="fertilizer" type="FertilizerType" maxOccurs="unbounded" /> </xs:sequence> <xs:attribute name="identifier" type="xs:string" /> </xs:complexType> <xs:complexType name="FertilizerType"> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element name="description" type="xs:string"/> </xs:sequence> <xs:attribute name="id" type="xs:integer" /> </xs:complexType> </xs:schema>
And, suppose we generate a module with the following command line:
$ ./generateDS.py -o garden_api.py garden.xsd
Then, for the element named PlantType in the generated module named garden_api.py, you can create an instance as follows:
>>> import garden_api >>> plant = garden_api.PlantType() >>> import sys >>> plant.export(sys.stdout, 0) <PlantType/>