## Bar Charts

A favorite of many, bar charts allow quantitative comparison of several values. To plot a bar chart, call the function bar(left, height), where left holds the x coordinates of the bars and height holds the bar heights. The function bar() allows for considerable customization; issuing help(bar) will provide most of the details.

Example: GDP, N Top Countries

For this example, which plots the GDP (purchasing power parity) of various countries, you'll need the CIA GDP Rank Order file, available from the CIA World Factbook (https://www.cia.gov/library/publications/the-world-factbook/rankorder/2001rank.txt). This is a tab-delimited file, perfect for easy data processing. I'll assume that you've downloaded the file and saved it in folder Ch6/data; the source code resides in Ch6/src, and the output files are located in Ch6/images.

First, we'll define a function to read the data, as we will use it in several examples in the chapter. The code in Listing 6-7 should be saved under file src/read_world_data.py.

Listing 6-7. Function read_world_data()

    import csv, re

    def read_world_data(N=10, fn='../data/2001rank.txt'):
        """A function to read CIA World Factbook file.

        N is the number of countries to process.

        See https://www.cia.gov/library/publications/the-world-factbook/rankorder/2001rank.txt"""

        # initialize the return values
        gdp = []
        labels = []

        # read the data and process it
        for i, row in enumerate(csv.reader(open(fn), delimiter='\t')):
            # skip first several lines
            if i > 3:
                # remove the dollar, comma and space characters
                gdp_value = re.sub(r'[\$, ]', '', row[2])

                # store data in billions of dollars
                gdp.append(float(gdp_value)/1e9)
                labels.append(row[1].strip())

            # stop analyzing the data after N countries have been processed
            if i > N+2:
                break
        return (gdp, labels)

The function reads data from the first N countries and returns their GDP alongside the country names. I've made use of two modules. The first, the csv module, reads the data, which is tab delimited. The second, the re module, gets rid of the dollar sign, comma, and space characters in the GDP value field.
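To see what the two modules contribute, here is a minimal, self-contained sketch that runs the same cleaning steps on an in-memory sample instead of the real file. The sample rows and their values are made-up placeholders, not CIA data, and the column layout (rank, country, value) is assumed from the indices used in the listing. The sketch is written for Python 3.

```python
import csv
import io
import re

# Hypothetical tab-delimited sample; the values are placeholders, not CIA data.
sample = (
    "1\tCountry A\t$ 2,500,000,000,000\n"
    "2\tCountry B \t$ 750,000,000,000\n"
)

gdp, labels = [], []
for row in csv.reader(io.StringIO(sample), delimiter='\t'):
    # strip the dollar sign, commas and spaces from the value field
    gdp_value = re.sub(r'[$, ]', '', row[2])
    # store the value in billions of dollars
    gdp.append(float(gdp_value) / 1e9)
    # remove stray whitespace around the country name
    labels.append(row[1].strip())

print(labels, gdp)
```

Without the re.sub() step, float() would raise a ValueError on a field such as '$ 2,500,000,000,000', which is why the cleanup comes before the conversion.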

Armed with the read_world_data() function, we turn to plotting the bar chart (see Listing 6-8).

Listing 6-8. Plotting the GDP Bar Chart

    # a script to plot GDP bar chart
    from pylab import *
    execfile('read_world_data.py')

    # initialize variables, N is the number of countries
    N = 5
    gdp, labels = read_world_data(N)

    # plot the bar chart
    bar(arange(N), gdp, align='center')

    # annotate with text
    xticks(arange(N), labels)
    for i, val in enumerate(gdp):
        text(i, val/2, str(val), va='center', ha='center', color='yellow')

    ylabel('$ (Billions)')
    title('GDP rank, data from CIA World Factbook')

The script by now should be quite readable. Notice that I've decided to put the read_world_data() function in a separate file, so to be able to use the function, I've called execfile('read_world_data.py'). Note that execfile() exists only in Python 2; in Python 3, the equivalent is exec(open('read_world_data.py').read()).

If you scroll down to the end of CIA GDP rank order file, you'll find a note similar to this:

This file was last updated on 23 October, 2008

It's a good idea to extract the date information and add it to the title (or some other spot of your choice):

>>> title('GDP rank, data from CIA World Factbook, '+last_line[31:-1])
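The slice above relies on the note starting at a fixed column of last_line. As a hedged alternative, the sketch below pulls the date out with a regular expression instead, so it does not depend on exact offsets. The helper name extract_update_date() is hypothetical, and the sample line is the note quoted above.

```python
import re

def extract_update_date(last_line):
    """Return the date text that follows 'last updated on', or None.

    Hypothetical helper; assumes the note has the wording quoted in the text.
    """
    match = re.search(r'last updated on\s+(.+)', last_line)
    return match.group(1).strip() if match else None

note = 'This file was last updated on 23 October, 2008\n'
print(extract_update_date(note))
```

Because the helper returns None when the note is missing, check the result before concatenating it into the title string.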

Alternatively, you can modify the function read_world_data() to return this string as well. Figure 6-9 shows our bar chart.

Figure 6-9. Bar chart showing World GDP rank (plot title: "GDP rank, data from CIA World Factbook"; bars for the United States, China, Japan, India, and Germany)

It's also possible to add error bars. To add an error bar equivalent to ±1000 billion dollars (talk about an error, eh?), add this line to the script shown in Listing 6-8, just after the bar() function call:
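One way to draw such error bars, sketched here as an assumption rather than as the exact line from the listing, is matplotlib's errorbar() function with a constant yerr. The GDP values and country names below are made-up placeholders, and the Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

# Placeholder values in billions of dollars; not actual CIA data.
gdp = [9000.0, 7000.0, 5000.0]
labels = ['Country A', 'Country B', 'Country C']
x = list(range(len(gdp)))

fig, ax = plt.subplots()
ax.bar(x, gdp, align='center')

# overlay +/-1000 error bars on top of the bars;
# fmt='none' draws only the error bars, with no markers or connecting line
err = ax.errorbar(x, gdp, yerr=1000, fmt='none', ecolor='black', capsize=4)

ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.set_ylabel('$ (Billions)')
```

Call fig.savefig() or plt.show() to see the result. The bar() function also accepts a yerr argument directly, which avoids the separate errorbar() call when no further styling is needed.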

Finally, should you require a horizontal bar chart rather than a vertical one, use the function barh().
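As a brief illustration of barh(), the sketch below draws the same kind of chart on its side. The values and names are again made-up placeholders, and the Agg backend is an assumption so that the code runs without a display. Note that the roles of the axes swap: positions go on the y axis and bar lengths on the x axis.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

# Placeholder values in billions of dollars; not actual CIA data.
gdp = [9000.0, 7000.0, 5000.0]
labels = ['Country A', 'Country B', 'Country C']
y = list(range(len(gdp)))

fig, ax = plt.subplots()
# first argument: bar positions along the y axis; second: bar lengths
bars = ax.barh(y, gdp, align='center')

# the country names now label the y axis, and the units move to the x axis
ax.set_yticks(y)
ax.set_yticklabels(labels)
ax.set_xlabel('$ (Billions)')
```

Horizontal bars tend to suit long category names, since the labels read left to right without rotation.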