Test Drive

Type the code into IDLE, save the program (choose a name for the program that is meaningful to you), and run it.

3.H9

And there it is: the current price from the Beans'R'Us web page.


That looks a lot cleaner. Now, instead of displaying the entire HTML text of the web page, you've cut it down to just the piece of the string (the substring) that you need.
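If you'd like to experiment with substrings without going online, here's a minimal sketch. It assumes a tiny, made-up chunk of HTML (the `text` value below is hypothetical, not the real Beans'R'Us page), and instead of hard-coding the index values it uses the standard `str.find()` method to work them out from the tags around the price:

```python
# Hypothetical stand-in for the downloaded page (not the real Beans'R'Us HTML)
text = "<html><body><p>Price: <b>4.95</b></p></body></html>"

# Work out the index values by searching for the tags around the price,
# rather than counting character positions by hand
start = text.find("<b>") + len("<b>")   # first character of the price
end = text.find("</b>")                 # location just after the price

price = text[start:end]  # substring from start up to, but not including, end
print(price)
```

Hard-coded index values quietly grab the wrong characters if the page layout shifts; searching for the surrounding tags keeps working as long as those tags stay put.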

string theory

The String

This week's interview: We ask the String what it's like being the world's most eligible datatype.

Head First: String, it's so good of you to find the time to speak to us.

String: Please, the honor is mine. Sit. Sit. Make yourself at home. Did you eat yet?

Head First: I'm fine, thank you. String, where should I begin? You are known the world over for your work. In your time you've carried the works of Shakespeare, Goethe...

String: Dan Brown.

Head First: ...all the great works of literature. And even mundane things like names and addresses. Tell me, how did you become so popular?

String: It's a question of character. Well, characters. See, before I existed, computer systems used to record text one character at a time.

Head First: That must have been rather inconvenient.

String: Inconvenient? It was a royal pain in the tuchis.

Head First: Quite.

String: Without me, handling text was like riding a pedal cycle without a saddle.

Head First: In what way?

String: It was possible to get somewhere, but the journey was kind of stressful.

Head First: You simplify things.

String: Certainly. I simplify. Instead of keeping track of a hundred, or a thousand, or a million letters, you just need to keep an eye on one thing. Me!

Head First: That's a good point.

String: I like to think of myself as a front. An agent, you might say, for all the characters I work with.

Head First: People deal with you, so they don't have to deal with individual characters in memory.

String: Exactly. I'm an organizer. I keep an eye on the day-to-day business of the letters. If I need to be shorter or longer, I arrange for the characters to be made available.

Head First: Tell me about your substrings.

String: Ah, my substrings. Like chips off the old block. That a humble datatype should be so blessed!

Head First: A tissue?

String: Bless you. <blows nose>. Those boys are so close to me. Here's a photo. Can you see the resemblance?

Head First: Why, he looks just like...

String: Ah, you guessed! Yes, my character sequence from 137 to 149. Exactly. Just like his old man. But shorter. Little more hair.

Head First: Your substrings are strings as well.

String: Certainly. Strings just like me. And they, I hope, should one day be able to produce their own substrings as well.

Head First: Yet some people are confused by your indexing.

String: What can I say? I started with nothing!

Head First: String, thank you.

String: A pleasure. Are you sure you ate?

There are no Dumb Questions

Q: So, I can put any web address into this code and grab the associated web page from the Internet?

A: Yes, feel free to try it out for yourself.

Q: Don't I need a web browser to view web pages?

A: Yes, to view a web page in all its formatted glory (with embedded pictures, music, videos, and the like), a web browser is a must-have. However, if all you want to see is the "raw" HTML, a browser is overkill.
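To see that no browser is involved, here's a small sketch that wraps the fetch-and-decode steps in a function. The name `get_raw_html()` is hypothetical (it's not part of Python), and the code assumes the page is encoded as UTF-8, just like the program in this chapter:

```python
import urllib.request

def get_raw_html(url):
    """Fetch the page at url and return its raw HTML as a str (no browser needed)."""
    with urllib.request.urlopen(url) as page:   # the with block closes the response for us
        return page.read().decode("utf8")       # assumes the page is UTF-8 encoded
```

For example, `print(get_raw_html("http://www.example.com/")[:200])` would display the first 200 characters of that page's HTML as plain text in the IDLE shell.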

Q: What does the import line of code do?

A: It gives the program the ability to talk to the Internet. The urllib.request code comes as standard with Python 3.

Q: And I guess that call to urlopen() goes and gets the web page?

A: That's right! The provided web address (or "URL," to use the proper web-speak) is fetched from the Internet and returned by the call to urlopen(). In this code, the fetched web page is assigned to the page variable.
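Here's an offline look at what `urlopen()` hands back. Instead of a real web address, this sketch uses a `data:` URL (the content travels inside the URL itself, and `urllib.request` has handled these since Python 3.4), so it runs without an Internet connection. Swap in a real web address to fetch a live page:

```python
import urllib.request

# A data: URL carries its own content, so no network access is needed here
url = "data:text/html,<h1>Beans</h1>"

page = urllib.request.urlopen(url)   # fetch the resource; page is a response object
raw = page.read()                    # read() returns the body as bytes, not str
page.close()                         # release the response once we're done with it

print(type(raw))   # <class 'bytes'>
print(raw)         # b'<h1>Beans</h1>'
```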

Q: And the urllib.request bit?

A: That just tells the program to use the urlopen() function that comes as standard with Python 3's Internet page-reading technology. We'll have more to say about urllib.request in a little bit. For now, just think how lucky we all are not to have to write code to fetch web pages from the Internet.
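The `urllib.request.` part is the name of the module that `urlopen()` lives in. As an aside (this chapter's program doesn't use it), Python's `from ... import` statement lets you drop the prefix; this short sketch confirms that both spellings name the very same function:

```python
import urllib.request
from urllib.request import urlopen

# Both names point at one and the same function object
print(urlopen is urllib.request.urlopen)   # True
```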

Q: I get that the call to read() actually reads the web page from the page variable, but what's that decode("utf8") thing?

A: When the web page is fetched from the Internet, it is in a "raw" textual format. This format can be a little hard for humans to read. The call to decode() converts the raw web page into something that looks a little easier on the eye.

To see what we mean, try removing the call to decode() from the program and running the code again. Looks a little weird, doesn't it? (Don't forget to put the call to decode() back in before continuing.)
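If you'd rather not edit your program, this offline sketch shows the same difference. The `raw` value is hypothetical: a made-up scrap of HTML encoded as UTF-8 bytes, standing in for what `read()` returns:

```python
# Hypothetical stand-in for page.read(): a scrap of HTML as UTF-8 bytes
raw = "<p>Café prices</p>".encode("utf8")

print(raw)                  # b'<p>Caf\xc3\xa9 prices</p>'  (the raw bytes look odd)
print(raw.decode("utf8"))   # <p>Café prices</p>  (decoded back into a str)
```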

BULLET POINTS

You can download the HTML of a web page as a textual string.

A string is a sequence of characters.

You can access individual characters in a string using an offset.

The offset is known as the index value of the character (or just index for short).

Strings within strings are called substrings.

Substrings are specified using two index values, for example: text[10:20].

The first index value is the location of the first character of the substring.

The second index value is the location just after the last character of the substring (the substring goes up to, but does not include, this index).

Subtract the first index from the second to work out how long the substring is.
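Here's a quick sketch that puts the bullet points to work on a short, made-up string (any string would do):

```python
text = "Beans'R'Us"

print(text[0])           # B  (the first character is at index 0)
print(text[6])           # R
substring = text[6:10]   # from index 6 up to, but not including, index 10
print(substring)         # R'Us
print(len(substring))    # 4  (second index minus first index: 10 - 6)
```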
