A 5-Minute Crash Course in XML

If you already know about XML, you can skip this section. XML is a generalized way of describing hierarchical structured data. An XML document contains one or more elements, which are delimited by start and end tags. This is a complete (albeit boring) XML document: <foo></foo>. 1. <foo> is the start tag of the foo element. 2. </foo> is the matching end tag of the foo element. Like balancing parentheses in writing or mathematics or code, every start tag must be closed (matched) by a corresponding end tag....

A Short Digression Into Multi-File Modules

I could have chosen to put all the code in one file (named chardet.py), but I didn't. Instead, I made a directory (named chardet), then I made an __init__.py file in that directory. If Python sees an __init__.py file in a directory, it assumes that all of the files in that directory are part of the same module. The module's name is the name of the directory. Files within the directory can reference other files within the same directory, or even within subdirectories....

Adding Items To a List

There are four ways to add items to a list.

    >>> a_list = ['a']
    >>> a_list = a_list + [2.0, 3]
    >>> a_list
    ['a', 2.0, 3]
    >>> a_list.append(True)
    >>> a_list.extend(['four', 'Ω'])
    >>> a_list
    ['a', 2.0, 3, True, 'four', 'Ω']
    >>> a_list.insert(0, 'Ω')
    >>> a_list
    ['Ω', 'a', 2.0, 3, True, 'four', 'Ω']

1. The + operator concatenates lists to create a new list. A list can contain any number of items; there is no size limit (other than available memory). However, if memory is a concern, you should be aware that list...
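
The four approaches can be sketched as a short script. This is a minimal illustration with made-up list contents, not the book's exact session.

```python
# Four ways to add items to a list (illustrative values).
a_list = ['a']

# 1. Concatenation builds a brand-new list and rebinds the name.
a_list = a_list + [2.0, 3]

# 2. append() adds a single item to the end, in place.
a_list.append(True)

# 3. extend() appends each item of its argument, in place.
a_list.extend(['four', 'five'])

# 4. insert() places a single item at the given index.
a_list.insert(0, 'zero')

print(a_list)
```

Only the + operator creates a second list in memory; the three methods modify the existing list.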

Attributes Are Dictionaries

XML isn't just a collection of elements; each element can also have its own set of attributes. Once you have a reference to a specific element, you can easily get its attributes as a Python dictionary.

    # continuing from the previous example
    >>> root.attrib
    {'{http://www.w3.org/XML/1998/namespace}lang': 'en'}
    >>> root[4]
    <Element {http://www.w3.org/2005/Atom}link at e181b0>
    >>> root[4].attrib
    {'href': 'http://diveintomark.org/', 'type': 'text/html', 'rel': 'alternate'}
    >>> root[3]
    <Element {http://www.w3.org/2005...
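
A self-contained sketch of the same idea, assuming a tiny inline feed (the FEED string and its example.org URL are hypothetical stand-ins for the feed file used in the text):

```python
import xml.etree.ElementTree as etree

# A tiny stand-in for the feed used in the text (hypothetical content).
FEED = (
    '<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">'
    '<link rel="alternate" type="text/html" href="http://example.org/"/>'
    '</feed>'
)

root = etree.fromstring(FEED)
link = root[0]

# attrib is an ordinary dictionary of attribute names to values.
# Namespaced attributes use the {namespace}localname key form.
lang_key = '{http://www.w3.org/XML/1998/namespace}lang'
print(root.attrib[lang_key])
print(link.attrib['href'])
print(sorted(link.attrib))
```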

Packaging Python Libraries

You'll find the shame is like the pain; you only feel it once. (Marquise de Merteuil, Dangerous Liaisons) Real artists ship. Or so says Steve Jobs. Do you want to release a Python script, library, framework, or application? Excellent. The world needs more Python code. Python 3 comes with a packaging framework called Distutils. Distutils is many things: a build tool (for you), an installation tool (for your users), a package metadata format (for search engines), and more. It integrates with the...

Classes That Act Like Iterators

In the Iterators chapter, you saw how to build an iterator from the ground up using the __iter__() and __next__() methods: __iter__() to iterate through a sequence and __next__() to get the next value from an iterator. A class can also define __reversed__() to create an iterator in reverse order. 1. The __iter__() method is called whenever you create a new iterator. It's a good place to initialize the iterator. 2. The __next__() method is called whenever you retrieve the next value from an iterator. 3. The __reversed__() method is uncommon. It takes an existing sequence and returns an iterator that yields the items in the sequence...
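
The three special methods can be sketched with a small class. Countdown is a hypothetical example written for this illustration; it does not come from the book.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.start = start
        self.current = start

    def __iter__(self):
        # Called by iter(); reset state so each loop starts fresh.
        self.current = self.start
        return self

    def __next__(self):
        # Called by next(); signal exhaustion with StopIteration.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    def __reversed__(self):
        # Called by reversed(); yields the same values in opposite order.
        return iter(range(1, self.start + 1))


forward = list(Countdown(3))
backward = list(reversed(Countdown(3)))
print(forward, backward)
```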

Classes That Act Like Sets

If your class acts as a container for a set of values (that is, if it makes sense to ask whether your class contains a value), then it should probably define the following special methods that make it act like a set, for example to know whether it contains a specific value. The cgi module uses these methods in its FieldStorage class, which represents all of the form fields or query parameters submitted to a dynamic web page. A script which responds to http://example.com/search?q=cgi starts with import cgi. An excerpt from...
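
A minimal sketch of a set-like container, assuming a hypothetical FormFields class. It imitates the idea described above but does not use the cgi module or its FieldStorage class.

```python
class FormFields:
    """Hypothetical container for submitted form fields."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def __len__(self):
        # Called by len(form).
        return len(self._fields)

    def __contains__(self, name):
        # Called by the expression: name in form.
        return name in self._fields


form = FormFields(q='cgi')
print(len(form), 'q' in form, 'page' in form)
```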

Classes That Can Be Compared

I broke this section out from the previous one because comparisons are not strictly the purview of numbers. Many datatypes can be compared: strings, lists, even dictionaries. If you're creating your own class and it makes sense to compare your objects to other objects, you can use the following special methods to implement comparisons. If you define a __lt__() method but no __gt__() method, Python will use the __lt__() method with operands swapped. However, Python will not combine methods. For example,...
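
Both behaviours described above can be checked with a small sketch. Version is a hypothetical class that defines only __lt__().

```python
class Version:
    """Hypothetical class that defines only the less-than comparison."""

    def __init__(self, number):
        self.number = number

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.number < other.number


old, new = Version(1), Version(2)

# No __gt__() is defined, so Python evaluates old < new with operands swapped.
swapped_ok = new > old

# Python will not combine __lt__() and __eq__() to answer <=.
try:
    old <= new
    combined_ok = True
except TypeError:
    combined_ok = False

print(swapped_ok, combined_ok)
```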

Classes That Can Be Used in a with Block

A with block defines a runtime context; you enter the context when you execute the with statement, and you exit the context after you execute the last statement in the block. Define __enter__() to do something special when entering a with block, and __exit__() to do something special when leaving a with block. This is how the with file idiom works.

    '''Internal: raise an ValueError if file is closed'''
    raise ValueError('I/O operation on closed file.'
    '''Context management protocol.  Returns self.'''
    self._checkClosed()
    '''Context management...
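
A minimal sketch of the context management protocol, assuming a hypothetical Tracker class that simply records when each hook runs:

```python
class Tracker:
    """Hypothetical context manager that records when each hook runs."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        # Runs when the with statement is executed.
        self.events.append('enter')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs after the last statement in the block, even on error.
        self.events.append('exit')
        return False  # do not swallow exceptions


tracker = Tracker()
with tracker as t:
    t.events.append('body')

print(tracker.events)
```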

Creating a Source Distribution

Distutils supports building multiple types of release packages. At a minimum, you should build a source distribution that contains your source code, your Distutils setup script, your read me file, and whatever additional files you want to include. To build a source distribution, pass the sdist command to your Distutils setup script.

    c:\Users\pilgrim\chardet> c:\python31\python.exe setup.py sdist
    running sdist
    running check
    reading manifest template 'MANIFEST.in'
    writing manifest file 'MANIFEST'...

Diving In

Nearly all the chapters in this book revolve around a piece of sample code. But XML isn't about code; it's about data. One common use of XML is syndication feeds that list the latest articles on a blog, forum, or other frequently-updated website. Most popular blogging software can produce a feed and update it whenever new articles, discussion threads, or blog posts are published. You can follow a blog by subscribing to its feed, and you can follow multiple blogs with a dedicated feed aggregator...

Elements Are Lists

In the ElementTree API, an element acts like a list. The items of the list are the element's children.

    # continued from the previous example
    >>> root.tag
    '{http://www.w3.org/2005/Atom}feed'
    >>> for child in root:
    ...     print(child)
    <Element ... at e2b5d0>
    <Element ... at e2b4e0>
    <Element ... at e2b6c0>
    <Element ... at e2b6f0>
    <Element ... at e2b4b0>
    <Element ... at e2b720>
    <Element ... at e2b510>
    <Element ... at e2b750>

1. Continuing from the previous example, the root element is {http://www.w3.org/2005/Atom}feed. 2. The length of the root element is the number of child elements. 3. You can...
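
The list-like behaviour can be tried on a small inline document. The FEED string here is a hypothetical stand-in for the feed file used in the text.

```python
import xml.etree.ElementTree as etree

# Hypothetical three-child feed, used only for this illustration.
FEED = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<title>example</title><entry/><entry/>'
    '</feed>'
)

root = etree.fromstring(FEED)

# len() reports the number of child elements.
child_count = len(root)

# Iterating over an element yields its children in document order.
child_tags = [child.tag for child in root]

print(root.tag, child_count)
print(child_tags)
```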

Features of HTTP

There are five important features which all HTTP clients should support. The most important thing to understand about any type of web service is that network access is incredibly expensive. I don't mean dollars and cents expensive (although bandwidth ain't free). I mean that it takes an extraordinarily long time to open a connection, send a request, and retrieve a response from a remote server. Even on the fastest broadband connection, latency (the time it takes to send a request and start...

filter() global function

In Python 2, the filter() function returned a list, the result of filtering a sequence through a function that returned True or False for each item in the sequence. In Python 3, the filter() function returns an iterator, not a list. Examples from the conversion table: list(filter(a_function, a_sequence)); for an_iterator in a_sequence_of_iterators: an_iterator.next(); [i for i in filter(a_function, a_sequence)]. 1. In the most basic case, 2to3 will wrap a call to filter() with a call to list(),...
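
A short sketch of the Python 3 behaviour. The is_even function and the numbers are made up for illustration.

```python
def is_even(n):
    """Hypothetical predicate used only for this illustration."""
    return n % 2 == 0


numbers = [1, 2, 3, 4, 5, 6]

# In Python 3, filter() returns a lazy iterator rather than a list.
lazy = filter(is_even, numbers)
is_a_list = isinstance(lazy, list)

# Wrapping it in list() recovers the Python 2 behaviour.
evens = list(lazy)

# The iterator is now exhausted, so a second pass yields nothing.
leftover = list(lazy)

print(is_a_list, evens, leftover)
```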

Finding the unique items in a sequence

Sets make it trivial to find the unique items in a sequence.

    >>> a_list = ['The', 'sixth', 'sick', "sheik's", 'sixth', "sheep's", 'sick']
    >>> set(a_list)
    {'sixth', 'The', "sheep's", 'sick', "sheik's"}
    >>> a_string = 'EAST IS EAST'
    >>> words = ['SEND', 'MORE', 'MONEY']
    >>> set(''.join(words))
    {'E', 'D', 'M', 'O', 'N', 'S', 'R', 'Y'}

1. Given a list of several strings, the set() function will return a set of unique strings from the list. This makes sense if you think of it like a for loop. Take the...
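
The same idea as a runnable script. Sets are unordered, so the results are sorted before printing to keep the output stable.

```python
a_list = ['The', 'sixth', 'sick', "sheik's", 'sixth', "sheep's", 'sick']

# set() keeps one copy of each distinct item.
unique_words = set(a_list)

# A string is a sequence of characters, so set() yields unique characters.
unique_chars = set('EAST IS EAST')

# Joining the words first gives the unique letters across all of them.
words = ['SEND', 'MORE', 'MONEY']
unique_letters = set(''.join(words))

print(sorted(unique_words))
print(sorted(unique_chars))
print(sorted(unique_letters))
```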

Fixing What 2to3 Can't

Now for the real test: running the test harness against the test suite. Since the test suite is designed to cover all the possible code paths, it's a good way to test our ported code to make sure there aren't any bugs lurking anywhere.

    C:\home\chardet> python test.py tests\*\*
    Traceback (most recent call last):
      File "test.py", line 1, in <module>
        from chardet.universaldetector import UniversalDetector
      File ..., line 51
        self.done = constants.False

Hmm, a small snag. In Python 3, False is a reserved...

Generating XML

Python's support for XML is not limited to parsing existing documents. You can also create XML documents from scratch.

    >>> import xml.etree.ElementTree as etree
    >>> new_feed = etree.Element('{http://www.w3.org/2005/Atom}feed',
    ...     attrib={'{http://www.w3.org/XML/1998/namespace}lang': 'en'})
    >>> print(etree.tostring(new_feed))
    <ns0:feed xmlns:ns0='http://www.w3.org/2005/Atom' xml:lang='en'/>

1. To create a new element, instantiate the Element class. You pass the element name (namespace + local name) as the first argument. This statement creates a feed element in the Atom namespace. This will be our new...
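
A runnable sketch of building a small document with the standard library's ElementTree. The title child and its text are hypothetical additions for illustration, and the exact quoting and spacing of the serialized output varies between Python versions.

```python
import xml.etree.ElementTree as etree

ATOM = '{http://www.w3.org/2005/Atom}'
XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'

# The element name is namespace + local name; attributes go in a dictionary.
new_feed = etree.Element(ATOM + 'feed', attrib={XML_LANG: 'en'})

# SubElement creates a child and attaches it to its parent in one step.
title = etree.SubElement(new_feed, ATOM + 'title')
title.text = 'example feed'

# encoding='unicode' returns a str instead of bytes.
xml_text = etree.tostring(new_feed, encoding='unicode')
print(xml_text)
```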

has_key() dictionary method

In Python 2, dictionaries had a has_key() method to test whether the dictionary had a certain key. In Python 3, this method no longer exists. Instead, you need to use the in operator. For example, the Python 2 expression a_dictionary.has_key(x) or a_dictionary.has_key(y) becomes x in a_dictionary or y in a_dictionary in Python 3. 2. The in operator takes precedence over the or operator, so there is no need for parentheses around x in a_dictionary or around y in a_dictionary. 3. On the other hand, you do need parentheses around x or y here, for the...
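
The precedence points above can be checked directly. The dictionary and key names are made up for illustration.

```python
a_dictionary = {'x': 1}
x, y = 'x', 'y'

# in binds tighter than or, so no parentheses are needed here.
either = x in a_dictionary or y in a_dictionary

# Here the parentheses matter: (x or y) is evaluated first, giving 'x'.
grouped = (x or y) in a_dictionary

# not in is the idiomatic test for a missing key.
missing = 'z' not in a_dictionary

print(either, grouped, missing)
```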

Handling Changing Requirements

Despite your best efforts to pin your customers to the ground and extract exact requirements from them on pain of horrible nasty things involving scissors and hot wax, requirements will change. Most customers don't know what they want until they see it, and even if they do, they aren't that good at articulating what they want precisely enough to be useful. And even if they do, they'll want more in the next release anyway. So be prepared to update your test cases as requirements change. Suppose,...

More Bad Input

Now that the from_roman() function works properly with good input, it's time to fit in the last piece of the puzzle: making it work properly with bad input. That means finding a way to look at a string and determine if it's a valid Roman numeral. This is inherently more difficult than validating numeric input in the to_roman() function, but you have a powerful tool at your disposal: regular expressions. (If you're not familiar with regular expressions, now would be a good time to read the regular...

Installing on Mac OS X

All modern Macintosh computers use the Intel chip (like most Windows PCs). Older Macs used PowerPC chips. You don't need to understand the difference, because there's just one Mac Python installer for all Macs. Visit python.org/download/ and download the Mac installer. It will be called something like Python 3.1 Mac Installer Disk Image, although the version number may vary. Be sure to download version 3.x, not 2.x. Your browser should automatically mount the disk image and open a Finder window...

Other Common String Methods

Besides formatting, strings can do a number of other useful tricks.

    >>> s = '''Finished files are the re-
    ... sult of years of scientif-
    ... ic study combined with the
    ... experience of years.'''
    >>> s.splitlines()
    ['Finished files are the re-', 'sult of years of scientif-', 'ic study combined with the', 'experience of years.']
    >>> print(s.lower())
    finished files are the re-
    sult of years of scientif-
    ic study combined with the
    experience of years.
    >>> s.lower().count('f')

1. You can...
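
The three methods can be tried as a script, using the same multi-line string as the session above:

```python
s = '''Finished files are the re-
sult of years of scientif-
ic study combined with the
experience of years.'''

# splitlines() splits on line endings and drops them.
lines = s.splitlines()

# lower() returns a new string; count() tallies non-overlapping occurrences.
f_count = s.lower().count('f')

print(lines)
print(f_count)
```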

Raise statement

The syntax for raising your own exceptions has changed slightly between Python 2 and Python 3. Python 2 allowed raise MyException, 'error message', a_traceback; Python 3 writes raise MyException('error message'). 1. In the simplest form, raising an exception without a custom error message, the syntax is unchanged. 2. The change becomes noticeable when you want to raise an exception with a custom error message. Python 2 separated the exception class and the message with a comma; Python 3 passes the error message as a parameter. 3. Python 2...
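
A sketch of the Python 3 forms. MyException and fail() are hypothetical names. The second half uses the standard with_traceback() method, which is how Python 3 attaches an existing traceback to an exception.

```python
class MyException(Exception):
    """Hypothetical exception class for this illustration."""


def fail():
    # Python 3 passes the message as an argument to the exception.
    raise MyException('error message')


try:
    fail()
except MyException as e:
    message = str(e)
    saved_traceback = e.__traceback__

# Re-raise a new exception object that carries the saved traceback.
try:
    raise MyException('error message').with_traceback(saved_traceback)
except MyException as e:
    reraised = e

print(message, reraised.__traceback__ is not None)
```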

Redirecting Standard Output

sys.stdout and sys.stderr are stream objects, albeit ones that only support writing. But they're not constants; they're variables. That means you can assign them a new value (any other stream object) to redirect their output.

    self.out_old = sys.stdout
    sys.stdout = self.out_new
    with open('out.log', mode='w', encoding='utf-8') as a_file, RedirectStdoutTo(a_file):

    python3 stdout.py
    cat out.log
    B

If so, you're probably using Python 3.0. You should really upgrade to Python 3.1. Python 3.0 supported the with...
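
A minimal sketch of a RedirectStdoutTo context manager, reconstructed around the fragments that survive above. It writes to an in-memory io.StringIO buffer instead of out.log so the example leaves no file behind; that substitution is an assumption made for this illustration.

```python
import io
import sys


class RedirectStdoutTo:
    """Temporarily point sys.stdout at another stream object."""

    def __init__(self, out_new):
        self.out_new = out_new

    def __enter__(self):
        # Remember the current stream, then swap in the new one.
        self.out_old = sys.stdout
        sys.stdout = self.out_new

    def __exit__(self, *args):
        # Always restore the original stream on the way out.
        sys.stdout = self.out_old


buffer = io.StringIO()
original_stdout = sys.stdout

print('A')
with RedirectStdoutTo(buffer):
    print('B')  # goes to the buffer, not the screen
print('C')
```

Only B ends up in the buffer. A and C are printed normally because sys.stdout is restored when the block exits.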

Saving Data to a Pickle File

The pickle module works with data structures. Let's build one.

    >>> entry['title'] = 'Dive into history, 2009 edition'
    >>> entry['article_link'] = ...
    >>> entry['comments_link'] = None
    >>> entry['internal_id'] = b'\xDE\xD5\xB4\xF8'
    >>> entry['tags'] = ('diveintopython', 'docbook', 'html')
    >>> entry['published'] = True
    >>> entry['published_date'] = time.strptime('Fri Mar 27 22:20:42 2009')
    >>> entry['published_date']
    time.struct_time(tm_year=2009, tm_mon=3,...
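
A sketch of saving such a structure and reading it back. The temporary directory and file path are assumptions made so the example cleans up after itself, and the article_link entry is left out because its value is not shown above.

```python
import os
import pickle
import tempfile
import time

entry = {
    'title': 'Dive into history, 2009 edition',
    'comments_link': None,
    'internal_id': b'\xDE\xD5\xB4\xF8',
    'tags': ('diveintopython', 'docbook', 'html'),
    'published': True,
    'published_date': time.strptime('Fri Mar 27 22:20:42 2009'),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'entry.pickle')  # hypothetical location

    # Pickle files are binary, so open them in 'wb' and 'rb' modes.
    with open(path, 'wb') as f:
        pickle.dump(entry, f)

    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored == entry, restored is entry)
```

The restored object is equal to the original but is a separate copy, including the bytes, tuple, and struct_time values.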

The Structure Of An Atom Feed

Think of a weblog, or in fact any website with frequently updated content, like CNN.com. The site itself has a title (CNN.com), a subtitle (Breaking News, U.S., World, Weather, Entertainment & Video News), a last-updated date (updated 12:43 p.m. EDT, Sat May 16, 2009), and a list of articles posted at different times. Each article also has a title, a first-published date (and maybe also a last-updated date, if they published a correction or fixed a typo), and a unique URL. The Atom...

Try except statement

The syntax for catching exceptions has changed slightly between Python 2 and Python 3. When the try block is import mymodule, Python 2's except ImportError, e: pass becomes except ImportError as e: pass in Python 3; Python 2's except (RuntimeError, ImportError), e: pass becomes except (RuntimeError, ImportError) as e: pass; and except ImportError: pass is unchanged. 1. Instead of a comma after the exception type, Python 3 uses a new keyword, as. 2. The as keyword also works for catching multiple types of exceptions at once. 3....
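
The Python 3 syntax in runnable form. The module name below is deliberately one that should not exist, and the int() call is just a convenient way to trigger a second exception type.

```python
# Catching one exception type and binding it with "as".
try:
    import no_such_module_for_this_example  # hypothetical, expected to fail
except ImportError as e:
    caught_import_error = isinstance(e, ImportError)

# Catching several types at once: the tuple needs parentheses.
try:
    int('not a number')
except (RuntimeError, ValueError) as e:
    caught_many = type(e).__name__

print(caught_import_error, caught_many)
```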

Unicode global function

Python 2 had two global functions to coerce objects into strings: unicode() to coerce them into Unicode strings, and str() to coerce them into non-Unicode strings. Python 3 has only one string type, Unicode strings, so the str() function is all you need. (The unicode() function no longer exists.) Python 2 had separate int and long types for non-floating-point numbers. An int could not be any larger than sys.maxint, which varied by platform. Longs were defined by appending an L to the end of the...

Verbose Regular Expressions

So far you've just been dealing with what I'll call compact regular expressions. As you've seen, they are difficult to read, and even if you figure out what one does, that's no guarantee that you'll be able to understand it six months later. What you really need is inline documentation. Python allows you to do this with something called verbose regular expressions. A verbose regular expression is different from a compact regular expression in two ways: Whitespace is ignored. Spaces, tabs, and...
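
A sketch of a verbose regular expression, using a Roman numeral validator as the worked pattern. The pattern and the looks_like_roman() helper are assumptions chosen for illustration; the point is the re.VERBOSE flag, which allows whitespace and # comments inside the pattern.

```python
import re

# With re.VERBOSE, whitespace is ignored and "#" starts a comment.
pattern = re.compile(r'''
    ^                   # beginning of string
    M{0,3}              # thousands: 0 to 3 Ms
    (CM|CD|D?C{0,3})    # hundreds: 900, 400, or 0-300 with optional 500
    (XC|XL|L?X{0,3})    # tens: 90, 40, or 0-30 with optional 50
    (IX|IV|V?I{0,3})    # ones: 9, 4, or 0-3 with optional 5
    $                   # end of string
    ''', re.VERBOSE)


def looks_like_roman(s):
    """Hypothetical helper: True if s is a non-empty, well-formed numeral."""
    return bool(s) and pattern.search(s) is not None


print(looks_like_roman('MCMLXXXIX'), looks_like_roman('IIII'))
```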

Where To Look For Python 3-Compatible Code

As Python 3 is relatively new, there is a dearth of compatible libraries. Here are some of the places to look for code that works with Python 3.

Python Package Index: list of Python 3 packages
Python Cookbook: list of recipes tagged python3
Google Project Hosting: list of projects tagged python3
SourceForge: list of projects matching Python 3
GitHub: list of projects matching python3 (also, list of projects matching python 3)
BitBucket: list of projects matching python3 (and those matching python 3)

Callable global function

In Python 2, you could check whether an object was callable like a function with the global callable() function. In Python 3, this global function has been eliminated. To check whether an object is callable, check for the existence of the __call__ special method. In Python 2, the global zip() function took any number of sequences and returned a list of tuples. The first tuple contained the first item from each sequence; the second tuple contained the second item from each sequence; and so on. In Python...
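
Both points can be sketched briefly. The greet() function is hypothetical. Note that callable() was restored in Python 3.2, so on current interpreters the hasattr() check is mainly of historical interest, but it still works.

```python
def greet():
    """Hypothetical function used only for this illustration."""
    return 'hello'


# Check for the __call__ special method, as described above.
function_is_callable = hasattr(greet, '__call__')
string_is_callable = hasattr('hello', '__call__')

# In Python 3, zip() returns an iterator; wrap it in list() to get a list.
pairs = zip([1, 2, 3], ['a', 'b', 'c'])
pairs_is_list = isinstance(pairs, list)
pair_list = list(pairs)

print(function_is_callable, string_is_callable)
print(pair_list)
```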

Using The Python Shell

The Python Shell is where you can explore Python syntax, get interactive help on commands, and debug short programs. The graphical Python Shell named IDLE also contains a decent text editor that supports Python syntax coloring and integrates with the Python Shell. If you don't already have a favorite text editor, you should give IDLE a try. First things first. The Python Shell itself is an amazing interactive playground. Throughout this book, you'll see examples like this The three angle...

Beyond HTTP GET

HTTP web services are not limited to GET requests. What if you want to create something new? Whenever you post a comment on a discussion forum, update your weblog, or publish your status on a microblogging service like Twitter or Identi.ca, you're probably already using HTTP POST. Twitter and Identi.ca both offer a simple HTTP-based API for publishing and updating your status in 140 characters or less. Let's look at Identi.ca's API documentation for updating your status: Identi.ca rest api...

Relative imports within a package

A package is a group of related modules that function as a single entity. In Python 2, when modules within a package need to reference each other, you use import foo or from foo import Bar. The Python 2 interpreter first searches within the current package to find foo.py, and then moves on to the other directories in the Python search path (sys.path). Python 3 works a bit differently. Instead of searching the current package, it goes directly to the Python search path. If you want one module...

What's On The Wire

To see why this is inefficient and rude, let's turn on the debugging features of Python's HTTP library and see what's being sent on the wire (i.e. over the network).

    >>> from http.client import HTTPConnection
    >>> HTTPConnection.debuglevel = 1
    >>> from urllib.request import urlopen
    >>> response = urlopen('http://diveintopython3.org/examples/feed.xml')
    send: b'GET /examples/feed.xml HTTP/1.1
    Host: diveintopython3.org
    Accept-Encoding: identity
    User-Agent: Python-urllib/3.1'
    Connection: close
    reply: 'HTTP/1.1 200 OK'
    further debugging...

Introducing The chardet Module

Before we set off porting the code, it would help if you understood how the code worked. This is a brief guide to navigating the code itself. The chardet library is too large to include inline here, but you can download it from chardet.feedparser.org. The main entry point is universaldetector.py, which has one class, UniversalDetector. You might think the main entry point is the detect function in chardet/__init__.py, but that's really just a convenience function that creates a UniversalDetector object, calls it, and...

Serializing Datatypes Unsupported by json

Even if JSON has no built-in support for bytes, that doesn't mean you can't serialize bytes objects. The json module provides extensibility hooks for encoding and decoding unknown datatypes. By unknown, I mean not defined in JSON. Obviously the json module knows about byte arrays, but it's constrained by the limitations of the JSON specification. If you want to encode bytes or other datatypes that JSON doesn't support natively, you need to provide custom encoders and decoders for those types....
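
A minimal sketch of such hooks, assuming two hypothetical helper functions, to_json() and from_json(). The default= and object_hook= parameters are the json module's standard extension points. The '__class__' and '__value__' marker keys are a convention chosen here, not something JSON defines.

```python
import json


def to_json(python_object):
    """Hypothetical encoder hook: called for objects json cannot serialize."""
    if isinstance(python_object, bytes):
        return {'__class__': 'bytes', '__value__': list(python_object)}
    raise TypeError(repr(python_object) + ' is not JSON serializable')


def from_json(json_object):
    """Hypothetical decoder hook: called for every decoded JSON object."""
    if json_object.get('__class__') == 'bytes':
        return bytes(json_object['__value__'])
    return json_object


entry = {'internal_id': b'\xDE\xD5\xB4\xF8', 'title': 'example'}

encoded = json.dumps(entry, default=to_json)
decoded = json.loads(encoded, object_hook=from_json)

print(encoded)
print(decoded == entry)
```

Without the object_hook, the bytes value comes back as an ordinary dictionary with the two marker keys, which is why the decoder half is needed for a true round trip.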

Searching For Nodes Within An XML Document

So far, we've worked with this XML document from the top down, starting with the root element, getting its child elements, and so on throughout the document. But many uses of XML require you to find specific elements. Etree can do that, too.

    >>> import xml.etree.ElementTree as etree
    >>> tree = etree.parse('examples/feed.xml')
    >>> root = tree.getroot()
    >>> root.findall('{http://www.w3.org/2005/Atom}entry')
    [<Element {http://www.w3.org/2005/Atom}entry at e2b4e0>,
     <Element {http://www.w3.org/2005/Atom}entry at e2b510>,
     <Element...
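
Searching can be tried without the feed file by parsing a small inline document. The FEED string is a hypothetical stand-in.

```python
import xml.etree.ElementTree as etree

# Hypothetical feed with two entries, used only for this illustration.
FEED = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<title>example</title>'
    '<entry><title>first</title></entry>'
    '<entry><title>second</title></entry>'
    '</feed>'
)
ATOM = '{http://www.w3.org/2005/Atom}'

root = etree.fromstring(FEED)

# findall() returns a list of matching direct children.
entries = root.findall(ATOM + 'entry')

# find() returns the first match; paths can descend with "/".
first_title = root.find(ATOM + 'entry/' + ATOM + 'title')

# No match gives an empty list rather than an error.
no_match = root.findall(ATOM + 'author')

print(len(entries), first_title.text, no_match)
```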

Loading Data from a json File

Like the pickle module, the json module has a load() function which takes a stream object, reads JSON-encoded data from it, and creates a new Python object that mirrors the JSON data structure.

      File "<stdin>", line 1, in <module>
    NameError: name 'entry' is not defined
    >>> import json
    >>> with open('entry.json', 'r', encoding='utf-8') as f:
    ...     entry = json.load(f)
    ...
    {'internal_id': {'__class__': 'bytes', '__value__': [222, 213, 180, 248]}, 'title': 'Dive into history, 2009 edition', 'tags'...

Matters of style

The rest of the fixes listed here aren't really fixes per se. That is, the things they change are matters of style, not substance. They work just as well in Python 3 as they do in Python 2, but the developers of Python have a vested interest in making Python code as uniform as possible. To that end, there is an official Python style guide which outlines in excruciating detail all sorts of nitpicky details that you almost certainly don't care about. And given that 2to3 provides such a great...

List Comprehensions

A list comprehension provides a compact way of mapping a list into another list by applying a function to each of the elements of the list.

    >>> [elem * 2 for elem in a_list]
    >>> a_list = [elem * 2 for elem in a_list]

1. To make sense of this, look at it from right to left. a_list is the list you're mapping. The Python interpreter loops through a_list one element at a time, temporarily assigning the value of each element to the variable elem. Python then applies the function elem * 2 and appends...
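
A runnable version with concrete values. The numbers are made up for illustration.

```python
a_list = [1, 9, 8, 4]

# The comprehension builds a new list; a_list itself is not modified.
doubled = [elem * 2 for elem in a_list]
unchanged = list(a_list)

# Assigning the result back to the same name replaces the old list.
a_list = [elem * 2 for elem in a_list]

print(doubled, unchanged, a_list)
```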

Introducing httplib2

Before you can use httplib2, you'll need to install it. Visit code.google.com/p/httplib2/ and download the latest version. httplib2 is available for Python 2.x and Python 3.x; make sure you get the Python 3 version, named something like httplib2-python3-0.5.0.zip. Unzip the archive, open a terminal window, and go to the newly created httplib2 directory. On Windows, open the Start menu, select Run, type cmd.exe and press ENTER.

    c:\Users\pilgrim\Downloads> dir
     Volume in drive C has no label....