Working with String objects

Strings are one of the first objects many people use. You write that first "Hello World" application and marvel when the words appear on screen. In fact, strings are the mainstay of many applications. Without strings you can't provide prompts to the user or ask for input. Sure, you may not do any heavy lifting with strings, but every application out there requires strings to work properly. The following sections discuss the IronPython string object in more detail.

TYPING VARIABLES WITH TYPE()

One of the problems you can encounter when working with an application is thinking a variable is of one type when it's actually something else. Each of the object types in IronPython has something different to offer, so it's important not to confuse one type with another. Chapter 4 demonstrated one potential type problem in working with ragged arrays: you never know whether you'll receive a list or a value. Consequently, you must check the type (or provide error trapping) before you make any assumptions. To perform a check, use the __name__ attribute of the value returned by type() for comparison purposes like this:

if type(MyVar).__name__ == 'str':
    print 'MyVar is a string'

As with most things in IronPython, there are multiple ways to perform this task.

You don't have to use the type() function. Use the __class__ attribute as shown here instead:

if MyVar.__class__.__name__ == 'str':
    print 'MyVar is a string'

The result is the same. Theoretically, using __class__ provides a performance boost. However, that performance boost, if any, is quite small, so you should use the approach that works best with your typing skills.
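The two checks can be compared side by side. The following is a minimal sketch rather than the chapter's original listing: MyVar holds a sample value chosen for illustration, print() is written with parentheses so the code also runs under Python 3, and the isinstance() call is an additional alternative that the text above doesn't cover.

```python
# Sample value standing in for whatever your code receives (an assumption).
MyVar = 'Hello World'

# Both expressions read the name of the value's type.
name_from_type = type(MyVar).__name__
name_from_class = MyVar.__class__.__name__

if name_from_type == 'str':
    print('MyVar is a string')

if name_from_class == 'str':
    print('MyVar is still a string')

# isinstance() is another common check; unlike a name comparison,
# it also accepts subclasses of str.
is_string = isinstance(MyVar, str)
```

Comparing names treats a subclass of str as a different type, so choose the check that matches how strict you need to be.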

Performing Standard Tasks with Strings

You've already seen a few of the things you can do with strings in previous chapters. This chapter takes a little more organized look at the methods and properties associated with strings. The following list provides an overview of the most common tasks you can perform.

center(int width[, Char fillchar]): Centers the string within the space defined by width. The default is to use spaces to pad the left and right side of the string to center it. However, you can specify another character by specifying the optional fillchar. For example, if you want to center a string named MyString in a 40-character area using the * as a fill character, you'd type MyString.center(40, '*').

int count(str ssub, [int start[, int end]]): Counts the number of instances of a substring, ssub, within a string. The substring can be one or more letters that you want to find within the string. You may optionally provide a starting point, start, and an ending point, end, for the count. For example, if you want to count the number of ls found in MyString, you'd type MyString.count('l', 0, len(MyString)).

decode([object encoding[, str errors]]): Decodes an encoded string. Even though encoding is optional, you must provide a value in order to decode the string. You can find a list of standard encodings at http://www.python.org/doc/2.5.2/lib/standard-encodings.html. The errors argument defines how decode() treats errors, with a default value of strict. You can find a list of error strings at http://www.python.org/doc/2.5.2/lib/codec-base-classes.html. For example, you might have a Unix-to-Unix Encode (uuencode) string named EncodeString that you want to decode into plain text. To convert the string, you'd type EncodeString.decode('uu_codec').

encode([object encoding[, str errors]]): Encodes a string to another format. You have the same options as when decoding a string (see the decode() entry in this list). For example, you might want to encode a string using uuencode. To perform this task, you'd type EncodeString = MyString.encode('uu_codec'). EncodeString would then contain the uuencoded string.

endswith(object suffix[, int start[, int end]]): Determines whether the string ends with a particular letter or substring, suffix. You may optionally provide a starting point, start, and ending point, end, in the string. When using an end value, endswith() checks the characters up to, but not including, the designated endpoint, rather than the actual end of the string. For example, if you want to determine whether the third character is an l (an end point of 3, because the string count begins with 0 and the end point itself is excluded), you'd type MyString.endswith('l', 0, 3).

expandtabs([int tabsize]): Expands the tabs within a string using spaces. You may optionally provide the number of spaces to use for each tab using tabsize. For example, if you want to expand the tabs in a string to four spaces, you'd type MyString.expandtabs(4).

find(str sub[, int start[, int end]]) or find(str sub, object start, object end): Locates the substring, sub, within the string and returns an integer value defining the index of the first occurrence of the substring. You can optionally add a starting, start, and ending, end, value to change the location that the method searches within the string (the default is to search the entire string). In the second form, the starting and ending value need not be an integer value, but can be an object that defines the starting and ending point instead. For example, if you want to search for the first occurrence of l within a string, you'd type MyString.find('l'). This method returns a value of -1 when the string doesn't contain the search value.

format(*args[, **kwargs]): Formats the string using a template (see the "Formatting String Output" section for details). The args argument contains the positional arguments and kwargs contains the keyword arguments.

index(str sub[, int start[, int end]]) or index(str sub, object start, object end): Performs precisely the same task as find(). However, instead of returning -1 when a value isn't found, index() raises a ValueError instead.

isalnum(), isalpha(), isdecimal(), isdigit(), islower(), isnumeric(), isspace(), istitle(), isunicode(), and isupper(): Detects the state of the string and returns True when the specified condition exists. For example, isalnum() returns True when a string contains some combination of letters and numbers. The string must contain at least one character, and every character must be a letter or a number, so a string of letters alone or numbers alone also passes. The isalpha() method, on the other hand, only returns True when the string contains only letters, and isnumeric() returns True when the string contains only numeric characters.

join(list sequence) or join(object sequence): Concatenates the members of a list or sequence, placing the source string between each pair of members. For example, if the source string contains ABC and you join 123 to it, you obtain '1ABC2ABC3' as output. To obtain this output, you'd type MyString.join('123'). As an alternative, you could type MyString.join(['1', '2', '3']) to obtain the same output using a list.

ljust(int width[, Char fillchar]): Left-justifies the string to a length specified by width by padding the right end with the required number of characters. You can optionally specify a fill character other than the default of a space by providing fillchar. For example, if you want to left-justify a string within 40 characters and fill the remaining space with an *, you'd type MyString.ljust(40, '*').

lower(): Returns the lowercase version of the string.

lstrip([str chars]): Removes white space from the beginning of a string by default. You may also provide a chars value as input. In this case, the method removes any of the characters in chars from the beginning of the string, stopping at the first character that isn't in chars. For example, to remove the leading spaces from a string, you'd type MyString.lstrip().

partition(str sep): Divides the string into three parts based on the value of sep. The first part contains the piece of the string before sep, the second part contains sep, and the third part contains the piece of the string after sep. For example, to split a string at the first space, you'd type MyString.partition(' ').

replace(object old, object new[, int maxsplit]): Replaces the occurrences of old with new in the target string. You may provide an optional number of replacements to make by defining maxsplit. For example, if you want to replace the spaces in a string with the newline escape code, you'd type MyString.replace(' ', '\n').

rfind(): Performs the same task as find(), except that this method searches from the right end of the string, rather than the left. See the find() entry in the list for details.

rindex(): Performs the same task as index(), except that this method indexes from the right end of the string, rather than the left. See the index() entry in the list for details.

rjust(): Performs the same task as ljust(), except that this method right-justifies the string, rather than left-justifying it. See the ljust() entry in the list for details.

rpartition(): Performs the same task as partition(), except that this method partitions the right side of the string, rather than the left side. See the partition() entry in the list for details.

rsplit(): Performs the same task as split(), except that this method begins at the right side of the string, rather than the left. See the split() entry in the list for details.

rstrip(): Performs the same task as lstrip(), except that this method begins at the right side of the string, rather than the left. See the lstrip() entry in the list for details.

split(str sep[, int maxsplit]): Divides the string into a list using sep as the point of division. You may provide an optional maximum number of splits to make by defining maxsplit. For example, if you want to divide a string into individual words, you'd type MyString.split(' ').

splitlines([bool keepends]): Breaks a string apart by lines. The output is a list of lines within the string. Normally, the output doesn't include the newline character. However, you can keep the newline character by setting keepends to True. For example, to break a string apart into individual lines, you'd type MyString.splitlines().

startswith(): Performs the same task as endswith(), except that this method works with the beginning of the string, rather than the end of the string. See the endswith() entry in the list for details.

strip(): Performs the same task as lstrip(), except that this method removes spaces (or other characters) from both ends of the string, rather than just the left. See the lstrip() entry in the list for details.

swapcase(): Sets all of the lowercase characters to uppercase and all of the uppercase characters to lowercase. For example, if you begin with 'Hello World', you'd receive 'hELLO wORLD' as output if you typed MyString.swapcase().

title(): Returns a title-cased version of a string where the first letter of each word is capitalized and all other letters are lowercase. For example, if you begin with 'helLo wORLD', you'd receive 'Hello World' as output if you typed MyString.title().

translate(str table, [str deletechars]) or translate(dict table): Replaces the characters in a string with the equivalents specified by table. The table argument is 256 characters long and you can create it using the maketrans() function found in the string module. (Remember to use from string import maketrans to make accessing the function easy.) For example, if you want to replace the first 16 lowercase letters with hexadecimal equivalents, you'd type MyString.translate(maketrans('abcdefghijklmnop', '0123456789ABCDEF')). Using this code as a starting point, 'Hello World' becomes 'H4BBE WErB3'.

upper(): Returns the uppercase version of the string.

zfill(int width): Returns a string that has zeros placed on the left side to pad the string to the length specified by width. For example, if you typed MyString.zfill(40), you'd receive a string that is 40 characters long with as many zeros on the left side as required to produce the required length.
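Several of the methods in the preceding list are easier to follow when you see their results together. The following sketch is an illustration only: the sample values and variable names are assumptions, and the comments show the value each call returns for those samples.

```python
MyString = 'Hello World'

# Padding and justification
centered = MyString.center(15, '*')       # '**Hello World**'
left = MyString.ljust(15, '*')            # 'Hello World****'
right = MyString.rjust(15, '*')           # '****Hello World'
zero_filled = '42'.zfill(5)               # '00042'

# Counting and searching
count_l = MyString.count('l')             # 3
first_l = MyString.find('l')              # 2
last_l = MyString.rfind('l')              # 9
missing = MyString.find('z')              # -1
third_is_l = MyString.endswith('l', 0, 3) # True ('Hel' ends with 'l')

# index() raises ValueError where find() returns -1
try:
    MyString.index('z')
    index_failed = False
except ValueError:
    index_failed = True

# Joining and dividing
joined = 'ABC'.join('123')                # '1ABC2ABC3'
parts = MyString.partition(' ')           # ('Hello', ' ', 'World')
words = MyString.split(' ')               # ['Hello', 'World']
lines = 'one\ntwo'.splitlines()           # ['one', 'two']

# Replacing, stripping, and changing case
replaced = MyString.replace(' ', '\n')    # 'Hello\nWorld'
stripped = '  Hello World  '.strip()      # 'Hello World'
swapped = MyString.swapcase()             # 'hELLO wORLD'
titled = 'helLo wORLD'.title()            # 'Hello World'
```

None of these calls changes MyString. Each method returns a new value, so assign the result to a variable when you want to keep it.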

Formatting String Output

String formatting can become quite complex in Python and IronPython. However, if you start with the basics, you'll find that you can usually figure out the complex elements without too much trouble. A basic format string contains one or more fields. A field is simply some text that appears within curly braces that you replace with a value. In fact, if you've worked with any .NET language, you've already used fields. Here's a simple sentence that contains a field.

MyString = 'Hello {0}'

Of course, you won't want to print this string directly onscreen. Instead, you'll want to replace {0} with some other value. In order to do this, you can use the format() method as shown here.

MyString.format('George')

The interpreter replaces the {0} with the name George. Consequently, you see 'Hello George' as output from these two lines of code. You have a number of options when working with replaceable variables in a string. The following list shows just a few of the options:

MyString = 'Hello {0}': Provides a simple replacement from a list of input arguments. The input arguments must appear in the order required in the string.

MyString = 'Hello {0[name]}': Provides a replacement from a dictionary. The corresponding format() method input is MyString.format({'name':'George'}). Of course, you can provide additional field information if your dictionary contains arrays for each of the elements. In this case, you specify the element you want to use like this: MyString = 'Hello {0[names][0]}'. The resulting format() method input is MyString.format({'names':['George', 'Amy']}). The advantage of this method is that the input arguments can appear in any order.

MyString = 'The paths are {0.path}': Provides a means of accessing an attribute within an object. The corresponding format() method input is MyString.format(sys). If you want to access a specific path, simply include the element specifier like this: MyString = 'The path is {0.path[0]}'. The advantage of this technique is that you can access properties within objects without first placing the property value in a variable.
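The three field styles in the preceding list can be tried in a few lines. This is a minimal sketch using the same sample values the list mentions; the variable names on the left are assumptions made for illustration.

```python
import sys

# Positional field: arguments are matched by index.
by_position = 'Hello {0}'.format('George')

# Dictionary field: the key appears without quotation marks inside the braces.
by_key = 'Hello {0[name]}'.format({'name': 'George'})

# Dictionary whose value is a list: the second index selects one element.
by_element = 'Hello {0[names][0]}'.format({'names': ['George', 'Amy']})

# Attribute field: reads sys.path without copying it to a variable first.
all_paths = 'The paths are {0.path}'.format(sys)

print(by_position)   # Hello George
print(by_key)        # Hello George
print(by_element)    # Hello George
```

The content of all_paths depends on the machine running the code, so the sketch doesn't show an expected value for it.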

A formatting string can contain as many variables as needed to provide complete information to the user. For example, you can add a second argument like this.

When you call the format() method, you now need to add some more information. The format() method input for this string might look like this.

MyString.format('George', 'London')
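As a sketch of how a two-field template behaves, consider the following code. The wording of the template is an assumption made for illustration, not the chapter's original string; only the 'George' and 'London' arguments come from the text.

```python
# Hypothetical two-field template (the sentence itself is an assumption).
MyString = 'Hello {0}, welcome to {1}'
Greeting = MyString.format('George', 'London')
print(Greeting)   # Hello George, welcome to London

# The indexes, not the order of the fields in the string, select the arguments.
Reversed = '{1} welcomes {0}'.format('George', 'London')
print(Reversed)   # London welcomes George
```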

In many cases, you need to provide input that isn't a string. For example, you might need to provide integer input for some strings. By default, the interpreter formats each value using the value's own formatting rules, but you can request an explicit conversion that takes place before any formatting. The conversion symbol is the exclamation mark (!) and the most common conversion is string (s), which calls str(). You can also call the repr() conversion function by using r in place of s. Here's an example of a conversion:

MyString = '{0!s} + {1!s} = {2!s}'
MyString.format(1, 2, 1+2)

In this case, you get an output of '1 + 2 = 3'. Notice that this example places the math directly in the call to format(). You could place the output of a function there as well.
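The difference between the s and r conversions is easiest to see with a string argument. The following sketch uses a sample name chosen for illustration.

```python
Name = 'George'

# !s calls str() on the argument before formatting.
as_str = '{0!s}'.format(Name)

# !r calls repr(), so the quotation marks become part of the output.
as_repr = '{0!r}'.format(Name)

# The conversion example from the text, with the math in the format() call.
total = '{0!s} + {1!s} = {2!s}'.format(1, 2, 1 + 2)

print(as_str)    # George
print(as_repr)   # 'George'
print(total)     # 1 + 2 = 3
```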

So far, the examples haven't done much formatting — they have simply replaced field values with information found in other sources. The format operator is the colon (:) and you can combine it with the conversion operator if you want. To see how this works, think about displaying the previous example in hexadecimal format. In that case, your code might look like this:

MyString = '{0:X} + {1:X} = {2:X}'
MyString.format(10, 20, 10+20)

The output from this code is in hexadecimal format: you'd see 'A + 14 = 1E'. Of course, you might want all the values to take up the same space. In this case, you can tell the interpreter to add some space to the output using the following string:

MyString = '{0:0=4X} + {1:0=4X} = {2:0=4X}'

This string outputs numbers with zeros as padding. The padding appears after any sign information. In addition, each of the entries is four digits long. Consequently, the output now looks like this: '000A + 0014 = 001E'. The format specification has specific entries, all of which are optional. It looks like this:

[[fill]align][sign][#][0][width][.precision][type]

Fill characters determine what appears as part of the padding the interpreter uses when you specify a width, and the field value doesn't fill the entire space. The default padding is the space, but you can specify any character other than the closing brace, which would end the formatting definition. When you specify a fill character, such as the 0 used in the previous example, you must also use one of the alignment characters found in Table 5-1.

TABLE 5-1: String Formatting Alignment Options

OPTION    MEANING

'<'    Sets the field to use left alignment, which is the default for most objects (numbers default to right alignment).

'>'    Sets the field to use right alignment.

'='    Adds the padding after the sign (if any), but before any digits. You've already seen the effect of this alignment option earlier in this chapter. The interpreter recognizes this alignment only when working with numeric types.

'^'    Centers the field information within the available space.
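A short sketch shows each alignment option from Table 5-1 combined with a fill character. The sample values, widths, and variable names are assumptions chosen so the padding is easy to count.

```python
# Fill character '*' with each alignment option, in a field 10 or 11 wide.
left = '{0:*<10}'.format('abc')      # 'abc*******'
right = '{0:*>10}'.format('abc')     # '*******abc'
centered = '{0:*^11}'.format('abc')  # '****abc****'

# '=' places the zero padding between the sign and the digits.
after_sign = '{0:0=6}'.format(-42)   # '-00042'

print(left)
print(right)
print(centered)
print(after_sign)
```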

The use of signs in the output comes next. For example, you can choose to have all positive numbers begin with a plus sign (+) so there's no confusion about their positive value. Table 5-2 shows the sign formatting options you can use.

TABLE 5-2: String Formatting Sign Options

OPTION    MEANING

'+'    Adds a sign for both positive and negative numbers.

'-'    Adds a sign only for negative numbers. This is the default behavior.

' ' (space)    Adds a leading space for positive numbers and a minus sign for negative numbers. Using this option lets you align numbers in tables that contain both positive and negative numbers.
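The sign options in Table 5-2 can be compared using the same pair of numbers. This is a minimal sketch with sample values chosen for illustration.

```python
# '+' shows a sign for both positive and negative values.
plus_pos = '{0:+}'.format(5)     # '+5'
plus_neg = '{0:+}'.format(-5)    # '-5'

# '-' (the default) shows a sign only for negative values.
minus_pos = '{0:-}'.format(5)    # '5'
minus_neg = '{0:-}'.format(-5)   # '-5'

# A space reserves one column for positive values so digits line up.
space_pos = '{0: }'.format(5)    # ' 5'
space_neg = '{0: }'.format(-5)   # '-5'

print(space_pos)
print(space_neg)
```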

The pound sign (#), which is called by a host of names, such as octothorp and number sign, tells the interpreter to add a prefix to numeric values to show their base: 0b for binary, 0o for octal, or 0x for hexadecimal (decimal values never have the prefix added). For example, if you change the previous formatting string to include the # like this:

the output changes to include the correct base designation. You'll see '0X000A + 0X0014 = 0X001E' as the output.
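The following sketch shows one format string that produces the output just described. It is an assumption rather than the chapter's original listing: the width of 6 is chosen because the two-character prefix counts toward the field width, which leaves room for four digits.

```python
# Hypothetical format string: fill 0, '=' alignment, '#' prefix, width 6, type X.
MyString = '{0:0=#6X} + {1:0=#6X} = {2:0=#6X}'
Result = MyString.format(10, 20, 10 + 20)
print(Result)   # 0X000A + 0X0014 = 0X001E

# The prefix for each base when no width is given.
binary = '{0:#b}'.format(10)   # '0b1010'
octal = '{0:#o}'.format(10)    # '0o12'
lower = '{0:#x}'.format(10)    # '0xa'
```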

The width and precision entries come next. If you precede the width value with a 0, then the interpreter will pad the numeric values with zeros. The precision entry tells the interpreter how many decimal places to use for the output.
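Width and precision are easiest to see with a floating point value. The following sketch uses sample numbers chosen for illustration.

```python
# Width 8 with three decimal places; numbers are right-aligned by default.
spaced = '{0:8.3f}'.format(3.14159)    # '   3.142'

# A 0 in front of the width pads with zeros instead of spaces.
zeroed = '{0:08.3f}'.format(3.14159)   # '0003.142'

# Precision alone fixes the number of decimal places.
two_places = '{0:.2f}'.format(2.5)     # '2.50'

print(spaced)
print(zeroed)
print(two_places)
```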

The final formatting you can request is the output type. In this case, you must decide in advance what kind of value the field will accept, because integers use different type designations than floating point and decimal types. Table 5-3 shows the types you can use for integer input, while Table 5-4 shows the types for floating point and decimal.

TABLE 5-3: Integer Formatting Types

OPTION    MEANING

'b'    Outputs the number as a base 2 (binary) value.

'c'    Converts the integer value to a Unicode character prior to printing. The acceptable value range is from 0 to 255. The output shows printable characters up to 126 (the tilde, ~).

'd'    Outputs the number as a base 10 (decimal) value. This is the default output.

'o'    Outputs the number as a base 8 (octal) value.

'x'    Outputs the number as a base 16 (hexadecimal) value. The interpreter uses lowercase characters for any value above 9 and also for the base indicator.

'X'    Outputs the number as a base 16 (hexadecimal) value. The interpreter uses uppercase characters for any value above 9 and also for the base indicator.

'n'    Outputs the number as a base 10 (decimal) value. However, this setting uses the user's locale setting for separator characters. For example, many countries use the comma for the decimal point instead of a period.
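The integer types in Table 5-3 can be compared by formatting a couple of sample values. This sketch leaves out the 'n' type because its output depends on the locale settings of the machine running the code.

```python
binary = '{0:b}'.format(10)       # '1010'
character = '{0:c}'.format(65)    # 'A'
decimal = '{0:d}'.format(10)      # '10'
octal = '{0:o}'.format(10)        # '12'
hex_lower = '{0:x}'.format(255)   # 'ff'
hex_upper = '{0:X}'.format(255)   # 'FF'

print(binary)
print(character)
print(hex_upper)
```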

TABLE 5-4: Floating Point and Decimal Formatting Types

OPTION    MEANING

'e'    Outputs the number in exponent (scientific notation) form, using the letter 'e' (lowercase) to indicate the exponent.

'E'    Outputs the number in exponent (scientific notation) form, using the letter 'E' (uppercase) to indicate the exponent.

'f'    Outputs the number in fixed-point format.

'g'    Outputs the number in a general format. The presentation depends on the numeric magnitude. Smaller numbers appear in fixed-point format, while larger numbers appear in scientific notation. The rules for determining whether a number appears in either fixed-point or scientific notation are relatively complex, but are based on the size and precision of the number. If a number would require too many zeros (due to being too large or too small) to present as fixed point, the interpreter automatically chooses scientific notation. No matter how the interpreter presents the number, it removes insignificant trailing zeros. In addition, the interpreter removes the decimal point if there aren't any digits following it. The interpreter also presents positive infinity as inf, negative infinity as -inf, positive zero as 0, negative zero as -0, and Not-a-Number (NaN) as nan. You can read more about these special value representations at http://steve.hollasch.net/cgindex/coding/ieeefloat.html.

'G'    Outputs the number in general format using the same requirements as the 'g' type. The difference is that this type uses an uppercase 'E' for scientific notation. Both representations of infinity and NaN appear in uppercase as well.

'n'    Outputs the number in general format using the same requirements as the 'g' type. This type differs because it relies on the user's current locale settings to insert the appropriate number separator characters.

'%'    Outputs the number as a percentage by multiplying the number by 100 and appending a percent sign.
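The floating point types in Table 5-4 can be compared in the same way. The sample values in this sketch are assumptions chosen because they have exact binary representations, and the 'n' type is left out because its output depends on the locale.

```python
exp_lower = '{0:e}'.format(1234.5)           # '1.234500e+03'
exp_upper = '{0:E}'.format(1234.5)           # '1.234500E+03'
fixed = '{0:f}'.format(1234.5)               # '1234.500000'

# 'g' picks fixed-point for moderate values and scientific for large ones.
general_small = '{0:g}'.format(1234.5)       # '1234.5'
general_large = '{0:g}'.format(1234500000.0) # '1.2345e+09'
general_upper = '{0:G}'.format(1234500000.0) # '1.2345E+09'

# '%' multiplies by 100; precision controls the decimal places.
percent = '{0:%}'.format(0.25)               # '25.000000%'
percent_short = '{0:.1%}'.format(0.25)       # '25.0%'

print(general_small)
print(general_large)
print(percent_short)
```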
