
<p class="noLabel">Or<a href="registration.htm">register now</a>

if you do not have a Telegraph.co.uk profile </p> </form>

When you fill in the values and click the Submit button, the web browser encodes the values by combining all fields (the field names and their new values) into one string and sends that string as an HTTP POST request. The HTTP method is usually specified in the form definition; in our example it is set to POST.

If we want to achieve the same result from a script, we first need to encode the data we are going to submit. Unfortunately, urllib2 does not provide this functionality, so we have to use the urlencode() function from the urllib module to encode the form data. The resulting string should be supplied as the optional second argument to the urlopen() function. If this data is supplied, urlopen() automatically sends a POST request instead of the default GET request.
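As a minimal sketch of the encoding step, the snippet below encodes two form fields and attaches them to a request object without sending anything. The field names, values, and URL are hypothetical placeholders, and the import fallback is an assumption added so the same code also runs on Python 3, where these functions live in urllib.parse and urllib.request.

```python
try:
    # Python 2, the version used in this text
    from urllib import urlencode
    from urllib2 import Request
except ImportError:
    # Python 3 moved the same functions to new modules
    from urllib.parse import urlencode
    from urllib.request import Request

# Hypothetical field names and values, for illustration only
form_fields = [("email", "user@example.com"), ("password", "secret")]

# Combine the names and values into one URL-encoded string
encoded = urlencode(form_fields)

# Hypothetical URL; nothing is sent until the request is actually opened
url = "http://example.com/login"
plain_request = Request(url)
form_request = Request(url, encoded.encode("ascii"))

print(encoded)
print(plain_request.get_method())  # no data attached, so GET
print(form_request.get_method())   # data attached, so POST
```

Passing the same encoded string as the second argument to urlopen() has the same effect on the method; building a Request object first simply makes the choice visible without touching the network.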

Note: What is the difference between POST and GET requests? The main difference is in the way the two requests submit additional data to the web server. If you send a GET request, the data is contained within the URL string, which then looks similar to this: http://example.com/some_page?key=value&key2=value2. If you send a POST request, the URL is http://example.com/some_page, and the data is carried in the body of the HTTP request.
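To make the GET case concrete, here is a small sketch that builds the example URL from the note out of its key/value pairs and then splits it apart again. The variable names are my own, and the import fallback for Python 3 is an assumption.

```python
try:
    # Python 2
    from urllib import urlencode
    from urlparse import urlsplit
except ImportError:
    # Python 3
    from urllib.parse import urlencode, urlsplit

base_url = "http://example.com/some_page"
pairs = [("key", "value"), ("key2", "value2")]

# GET: the encoded pairs become part of the URL itself
get_url = base_url + "?" + urlencode(pairs)
parts = urlsplit(get_url)

# POST: the URL stays unchanged and the same string travels separately
post_url = base_url
post_data = urlencode(pairs)

print(get_url)
print(parts.path)
print(parts.query)
```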

Web sites usually manage user sessions with HTTP cookies. An HTTP cookie is a protocol message field that is included in the messages exchanged between the web browser and the web server. The HTTP protocol is stateless by nature: HTTP requests do not carry any information that could help identify the request sender. Keeping track of user activities is essential for web shopping services or any other service that needs to provide personalized results. This activity is referred to as "maintaining a web session." One of the ways to maintain this session is by using HTTP cookies. Here's an example of an HTTP cookie:

Set-Cookie: BBC-UID=444b6e3cefd9cadf2d0a1f38c1d37453cbc43c1fd0a0b13a641bca65f221d5240Python%2durllib%2f2%2e6; expires=Sat, 14-May-11 07:20:15 GMT; path=/; domain=bbc.co.uk;
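The parts of such a header can be picked apart with the standard library's cookie parser. The sketch below is only an illustration: the cookie value is a shortened stand-in for the real one, and the Python 3 import fallback is an assumption.

```python
try:
    from Cookie import SimpleCookie        # Python 2
except ImportError:
    from http.cookies import SimpleCookie  # Python 3

# Shortened stand-in for the value part of a Set-Cookie header
header_value = ("BBC-UID=444b6e3cefd9cadf; "
                "expires=Sat, 14-May-11 07:20:15 GMT; "
                "path=/; domain=bbc.co.uk;")

cookie = SimpleCookie()
cookie.load(header_value)

# Each cookie is stored under its name; its properties are dictionary keys
morsel = cookie["BBC-UID"]
print(morsel.value)
print(morsel["expires"])
print(morsel["path"])
print(morsel["domain"])
```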

There can be multiple cookies set in the HTTP header message. Each cookie has a name and a value along with some extra properties, such as the domain that is supposed to receive it, the expiration time, and the URL path it applies to. So how do cookies help to maintain sessions? When the web server receives a request, it sends the initial response back to the web browser. Along with the other HTTP header fields, it inserts the cookie field. The web client in turn saves the cookie in its internal database. When it makes another request, it scans the database for cookies that both belong to the domain it is sending the request to and have a matching path property. The web client then includes all matching cookies in its subsequent requests. Now the web server receives requests that are "marked" with the cookies and therefore knows that these requests are part of the same "conversation," or in other words belong to the same web session.
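The lookup described above can be sketched in a few lines. This is a deliberately simplified model, not what a real browser or library does: the function names and the stored cookies are hypothetical, and expiry dates and other rules from the cookie specifications are ignored.

```python
def domain_matches(host, cookie_domain):
    """True if host is the cookie's domain or one of its subdomains."""
    cookie_domain = cookie_domain.lstrip(".").lower()
    host = host.lower()
    return host == cookie_domain or host.endswith("." + cookie_domain)


def path_matches(request_path, cookie_path):
    """True if the request path is the cookie's path or lies below it."""
    if request_path == cookie_path:
        return True
    prefix = cookie_path if cookie_path.endswith("/") else cookie_path + "/"
    return request_path.startswith(prefix)


def cookies_for_request(stored, host, request_path):
    """Select the stored cookies that should accompany a request."""
    return [c for c in stored
            if domain_matches(host, c["domain"])
            and path_matches(request_path, c["path"])]


def cookie_header(cookies):
    """Format the selected cookies as the value of a Cookie request header."""
    return "; ".join("%s=%s" % (c["name"], c["value"]) for c in cookies)


# Hypothetical contents of the client's cookie database
stored = [
    {"name": "BBC-UID", "value": "abc123", "domain": "bbc.co.uk", "path": "/"},
    {"name": "prefs", "value": "compact", "domain": "bbc.co.uk", "path": "/news"},
    {"name": "other", "value": "x", "domain": "example.com", "path": "/"},
]

sport = cookies_for_request(stored, "www.bbc.co.uk", "/sport/football")
news = cookies_for_request(stored, "www.bbc.co.uk", "/news/world")
print(cookie_header(sport))
print(cookie_header(news))
```

Only the cookies whose domain and path both match are sent, which is why the request for the sport page carries one cookie and the request for the news page carries two.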

I've described the behavior of a typical web browser that handles the cookie storing and management activities automatically. The default URL processor (or the opener in urllib2 terms) does not process cookies. Luckily, all classes for handling cookies are included in the urllib2 module and you just need to replace the default opener with the custom opener object. The HTTPCookieProcessor class that we are going to use in constructing the new opener object is responsible for storing the HTTP cookies received from the server and then injecting them into all HTTP requests going to the same web site:

>>> import urllib, urllib2
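One way the opener construction described above might look is sketched below; it is an illustration rather than the original listing. The explicit CookieJar object (from the cookielib module in Python 2) and the Python 3 import fallback are my assumptions, and no request is sent.

```python
try:
    import cookielib                     # Python 2
    import urllib2 as url_lib
except ImportError:
    import http.cookiejar as cookielib   # Python 3
    import urllib.request as url_lib

# The jar holds cookies received from servers; it starts out empty
cookie_jar = cookielib.CookieJar()

# Build an opener whose handler chain includes the cookie processor
opener = url_lib.build_opener(url_lib.HTTPCookieProcessor(cookie_jar))

# Requests made with opener.open(url, data) would now store any cookies
# the server sets and send them back on later requests to the same site.
handler_names = [type(handler).__name__ for handler in opener.handlers]
print("HTTPCookieProcessor" in handler_names)
print(len(cookie_jar))
```

Keeping a reference to the jar is optional, since HTTPCookieProcessor creates one when none is passed, but it makes it easy to inspect which cookies have been collected.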
