The cgi Module: Parsing HTML Forms

When you click one of the Submit buttons on SimpleHTMLForm.html, notice that you're not exactly GETting the resource /cgi-bin/PrintFormSubmission.cgi, the resource specified in the action attribute of the <FORM> tag. You're GETting a slightly different resource, something with the long, unwieldy identifier of /cgi-bin/PrintFormSubmission.cgi?textField=Some+text&radioButton=2&button=Submit.

This is how a GET form submission works: The web browser gathers the values of the fields in the form you submitted and encodes them so they don't contain any characters not valid in a URL (for instance, spaces are replaced by plus signs). It then appends a question mark and the encoded field values to the form destination, to get the actual resource to be retrieved. Assuming there's a CGI script at the other end to intercept the request, the script will see that encoded form information in its QUERY_STRING environment variable. A similar encoding happens when you submit a form using the POST verb, but in that case the form data is sent in the body of the request, not as part of the resource identifier. Instead of being made available to the script in environment variables, POSTed data is made available on standard input.
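The encoding step described above can be tried outside a browser. The following is a minimal sketch, assuming Python 3 and its standard urllib.parse module rather than the cgi module this chapter covers; the field names and values are the ones from the example URL.

```python
from urllib.parse import parse_qs, urlencode

# Field values as they appear in the example URL for SimpleHTMLForm.html.
fields = {"textField": "Some text", "radioButton": "2", "button": "Submit"}

# urlencode() escapes each value (spaces become plus signs) and joins the
# name=value pairs with ampersands.
query_string = urlencode(fields)

# The browser appends the encoded fields to the form's action after a "?".
resource = "/cgi-bin/PrintFormSubmission.cgi?" + query_string
print(resource)

# Decoding reverses the process. parse_qs() maps each field name to a list,
# because a field may legitimately appear more than once.
decoded = parse_qs(query_string)
print(decoded["textField"])
```

Note that the decoded result holds a list for every field, even when only one value was submitted.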

The cgi module knows how to decode the form data present in HTTP requests, whether the request uses GET or POST. The cgi module can obtain the data from environment variables (GET) or standard input (POST), and use it to create a reconstruction of the original HTML form in an instance of a class called FieldStorage.
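To make the GET/POST distinction concrete, here is a rough sketch of the decision a CGI script has to make when locating its form data. It is not how the cgi module is implemented: read_form_data is a hypothetical helper, it assumes Python 3 and simple URL-encoded forms (no file uploads), and the environment dictionaries are simulated.

```python
import io
from urllib.parse import parse_qs


def read_form_data(environ, stdin):
    """Return submitted fields as a dict mapping each name to a list of values.

    Hypothetical helper for illustration; it is not part of the cgi module.
    """
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # POSTed data arrives on standard input; CONTENT_LENGTH says how
        # many bytes to read.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("ascii")
    else:
        # GET data arrives in the QUERY_STRING environment variable.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)


# Simulated requests; a real CGI script would pass os.environ and
# sys.stdin.buffer instead.
get_env = {"REQUEST_METHOD": "GET",
           "QUERY_STRING": "textField=Some+text&radioButton=2"}
body = b"textField=Some+text&button=Submit"
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}

print(read_form_data(get_env, io.BytesIO(b"")))
print(read_form_data(post_env, io.BytesIO(body)))
```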

FieldStorage can be accessed just like a dictionary, but in Python 2.2 and later, the safest way to use it is to call its getfirst() method, passing in the name of the field whose value you want.

In versions of Python prior to 2.2, the getfirst method is not available. Instead, to be safe you need to simulate getfirst with code like the following:

fieldVal = form.getvalue("field")
if isinstance(fieldVal, list):
    #More than one "field" was submitted.
    fieldVal = fieldVal[0]

When you're actually expecting multiple values for a single CGI variable, use the getlist method instead of getfirst to get all the set values.

Safety when accessing form values

Why is form.getfirst('fieldName') safer than form['fieldName']? The root of the problem is that sometimes a single form submission can legitimately provide two or more values for the same field (for instance, this happens when a user selects more than one value of a selection box that allows multiple selections). If this happens, form['fieldName'] will return a list of values (e.g., all the selected values in the multiple-selection box) instead of a single value. This is fine as long as your script is expecting it to happen, but because users have complete control of the data they submit to your CGI script, a malicious user could easily submit multiple values for a field in which you were only expecting one.

If someone pulls that trick on you and your script is using form['fieldName'], you'll get a list where you were expecting a single object. If you treat a list as though it were a single object your script will surely crash. That's why it's safer to use getfirst: It is always guaranteed to return only the first submitted value, even if a user is trying to crash your script with bad data.
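The difference between the access styles can be seen in a small simulation. The sketch below assumes Python 3; FormSketch is a hypothetical stand-in written for this illustration, not the real FieldStorage class, and it only mimics the behavior described above.

```python
from urllib.parse import parse_qs


class FormSketch:
    """Hypothetical stand-in that mimics the accessors described above."""

    def __init__(self, query_string):
        self._fields = parse_qs(query_string)

    def getvalue(self, name, default=None):
        # A single string normally, but a list when the field was
        # submitted more than once.
        values = self._fields.get(name)
        if values is None:
            return default
        return values[0] if len(values) == 1 else values

    def getfirst(self, name, default=None):
        # Always a single value, no matter how many were submitted.
        values = self._fields.get(name)
        return values[0] if values else default

    def getlist(self, name):
        # Always a list, possibly empty.
        return list(self._fields.get(name, []))


# A malicious user submits "radioButton" twice.
form = FormSketch("radioButton=2&radioButton=evil")

print(form.getfirst("radioButton"))
print(form.getlist("radioButton"))

try:
    form.getvalue("radioButton").strip()  # a list has no strip() method
except AttributeError as error:
    print("Unsafe access failed:", error)
```

The script that expected a string fails on the duplicated field, while the getfirst-style access keeps working.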

Now that you know about the FieldStorage object, it's easy to write the other half of SimpleHTMLForm.html: PrintFormSubmission.cgi, a CGI script that prints the values it finds in the form's fields:








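#!/usr/bin/python
import cgi
import cgitb
cgitb.enable()

form = cgi.FieldStorage()
textField = form.getfirst("textField")
radioButton = form.getfirst("radioButton")
submitButton = form.getfirst("button")

print 'Content-type: text/html\n'
print '<html>'
print '<body>'
print '<p>Here are the values of your form submission:</p>'
print '<ul>'
print '<li>In the text field, you entered "%s".</li>' % textField
print '<li>Of the radio buttons, you selected "%s".</li>' % radioButton
print '<li>The name of the submit button you clicked is "%s".</li>' % submitButton
print '</ul>'
print '</body>'
print '</html>'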







Now, when you click the submit button on SimpleHTMLForm.html, instead of getting a 404 Not Found error, you'll see something similar to what is shown in Figure 21-2.

Here are the values of your form submission:

• In the text field, you entered "Some text"

• Of the radio buttons, you selected "2"

• The name of the submit button you clicked is "Submit"

Figure 21-2

So far so good. Let's go a little further, though, and create a script capable of printing out any form submission at all. That way, you can experiment with HTML forms of different types. To get you started, let's have the new script print out a fairly complex HTML form when you hit it without submitting a form to it. The script that follows deserves to be called PrintAnyFormSubmission.cgi:

#!/usr/bin/python
import cgi
import cgitb
import os
cgitb.enable()

form = cgi.FieldStorage()

print 'Content-type: text/html\n'
print '<html>'
print '<body>'
if form.keys():
    verb = os.environ['REQUEST_METHOD']
    print '<p>Here are the values of your %s form submission:</p>' % verb
    print '<ul>'
    for field in form.keys():
        valueObject = form[field]
        if isinstance(valueObject, list):
            #More than one value was submitted. We therefore have a
            #whole list of ValueObjects. getlist() would have given us
            #the string values directly.
            values = [v.value for v in valueObject]
            if len(values) == 2:
                connector = '" and "' #'"Foo" and "bar"'
            else:
                connector = '", and "' #'"Foo", "bar", and "baz"'
            value = '", "'.join(values[:-1]) + connector + values[-1]
        else:
            #Only one value was submitted. We therefore have only one
            #ValueObject. getfirst() would have given us the string
            #value directly.
            value = valueObject.value
        print '<li>For <var>%s</var>, I got "%s"</li>' % (field, value)
    print '</ul>'
else:
    #No form was submitted, so print a sample form that submits back
    #to this same script.
    print '''<form method="GET" action="%s">
<p>Here's a sample HTML form.</p>
<p><input type="text" name="textField" value="Some text" /><br />
<input type="password" name="passwordField" value="A password" />
<input type="hidden" name="hiddenField" value="A hidden field" /></p>
<p>Checkboxes:
<input type="checkbox" name="checkboxField1" checked="checked" /> 1
<input type="checkbox" name="checkboxField2" selected="selected" /> 2
</p>
<p>
<input type="radio" name="radioButtons" value="1" /> 1<br />
<input type="radio" name="radioButtons" value="2" checked="checked" /> 2<br />
<input type="radio" name="radioButtons" value="3" /> 3<br /></p>
<textarea name="largeTextEntry">A lot of text</textarea>
<p>Choose one or more: <select name="selection" size="4" multiple="multiple">
<option value="Option 1">Option 1</option>
<option value="Option 2" selected="selected">Option 2</option>
<option value="Option 3" selected="selected">Option 3</option>
<option value="Option 4" selected="selected">Option 4</option>
</select></p>
<p><input type="Submit" name="button" value="Submit this form" /></p>
<p><input type="Submit" name="button" value="Submit this form (Button #2)" /></p>
</form>''' % os.environ['SCRIPT_NAME']
print '</body>'
print '</html>'
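The value-joining logic in this script (the part that produces '"Foo" and "bar"' for two values and '"Foo", "bar", and "baz"' for three or more) can be tried on its own. The sketch below assumes Python 3 and a non-empty list; join_quoted is a hypothetical name chosen for this illustration.

```python
def join_quoted(values):
    """Join values so that wrapping the result in double quotes reads naturally.

    Hypothetical helper; assumes values is a non-empty list of strings.
    """
    if len(values) == 1:
        return values[0]
    if len(values) == 2:
        connector = '" and "'    # '"Foo" and "bar"'
    else:
        connector = '", and "'   # '"Foo", "bar", and "baz"'
    return '", "'.join(values[:-1]) + connector + values[-1]


# The caller supplies the outermost pair of quotes, as the script does.
print('I got "%s"' % join_quoted(["Option 2", "Option 3", "Option 4"]))
print('I got "%s"' % join_quoted(["Foo", "bar"]))
```

The helper emits only the inner quotes and separators, which is why the format string contributes the opening and closing quote marks.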

Try It Out Printing Any HTML Form Submission

Put PrintAnyFormSubmission.cgi in your cgi-bin/ directory and start up EasyCGIServer. Visit http://localhost:8000/cgi-bin/PrintAnyFormSubmission.cgi. You'll be given an HTML form that looks something like what is shown in Figure 21-3.

Here's a sample HTML form.

Some text

Checkboxes: [x] 1 [ ] 2

A lot of text

Choose one or more:

Submit this form Submit this form (Button #2)

Figure 21-3

Change any of the form data you want and click one of the Submit buttons. You'll be taken to a screen that looks like the one shown in Figure 21-4.

Here are the values of your GET form submission:

• For passwordField, I got "A password"

• For hiddenField, I got "A hidden field"

• For checkboxField1, I got "on"

• For largeTextEntry, I got "A lot of text"

• For selection, I got "Option 2", "Option 3", and "Option 4"

• For button, I got "Submit this form (Button #2)"

Figure 21-4
