Unicode

Unicode is a standard for working with character sets from all common languages. It's essential for internationalizing software.

Are you the least bit curious about the differences between writeUTF(), writeChars(), and writeBytes()? Here they are:

>>> file.writeUTF("Hello")

>>> file.writeChars(" how are you")

>>> file.writeBytes(" fine thanks")

In essence, writeChars() writes the Unicode representation of the characters to the file; writeUTF() and writeBytes() write out the ASCII equivalents. (More precisely, writeUTF() writes in a Java-modified UTF-8 format, which you can read about in the Java API documentation under the DataInput interface. In the previous interactive example, UTF-8 equated to ASCII.)
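The byte layouts can be reproduced by hand to make the difference concrete. What follows is a minimal sketch in plain Python 3, not the Jython session itself; the helper names write_chars, write_bytes, and write_utf are hypothetical stand-ins that only mimic what the Java methods put in the file, and write_utf assumes ASCII-only text (where modified UTF-8 and ASCII coincide).

```python
import struct

def write_chars(text):
    # Mimics writeChars(): each character becomes two bytes, high byte first.
    return text.encode("utf-16-be")

def write_bytes(text):
    # Mimics writeBytes(): only the low eight bits of each character are kept.
    return bytes(ord(ch) & 0xFF for ch in text)

def write_utf(text):
    # Mimics writeUTF() for ASCII-only text (an assumption of this sketch):
    # a two-byte big-endian byte count, then one byte per character.
    payload = text.encode("ascii")
    return struct.pack(">H", len(payload)) + payload

def as_hex(data):
    # Format bytes the way a hexdump shows them.
    return " ".join("%02X" % b for b in data)

print(as_hex(write_utf("Hello")))   # 00 05 48 65 6C 6C 6F
print(as_hex(write_chars(" how")))  # 00 20 00 68 00 6F 00 77
print(as_hex(write_bytes(" fine"))) # 20 66 69 6E 65
```

Under these assumptions, the two-byte form is exactly twice as long as the one-byte form for the same text, and the writeUTF() form carries two extra bytes for its length prefix.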

The Unicode data in this example is represented by two bytes per character, whereas the writeBytes() and writeUTF() data is represented by one byte per character. Fire up a hexdump utility, and see for yourself (this is a hexdump of the rFile.bin file):

C:\dat>debug rFile.bin
-d
1876:0100  00 01 02 03 04 05 06 07-08 09 0A 00 05 48 65 6C   .............Hel
1876:0110  6C 6F 00 20 00 68 00 6F-00 77 00 20 00 61 00 72   lo. .h.o.w. .a.r
1876:0120  00 65 00 20 00 79 00 6F-00 75 20 66 69 6E 65 20   .e. .y.o.u fine 
1876:0130  74 68 61 6E 6B 73 00 01-02 03 04 05 06 07 08 09   thanks..........
1876:0140  0A 02 D3 74 0A 41 3C 22-75 E6 80 F7 20 EB E1 5E   ...t.A<"u... ..^
1876:0150  58 C3 A1 D7 D7 8B 36 D9-D7 C6 06 1B D9 00 C6 06   X.....6.........
1876:0160  17 D9 00 8B 36 D9 D7 8B-0E D7 D7 8B D6 E3 42 51   ....6.........BQ
1876:0170  56 5B 2B DE 59 03 CB 8B-D6 C6 06 BB DB 00 E3 31   V[+.Y..........1

Notice that the how are you in the right column, which was written with writeChars(), has a period (.) between each character, while the Hello and the fine thanks don't. Each period stands for the zero high byte of a two-byte character, which shows that how are you is 2-byte Unicode.
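The way the right column comes about can be sketched in a few lines of Python 3. The ascii_column helper below is hypothetical and only approximates what debug prints: printable bytes are shown as characters, and everything else, including the zero high bytes that writeChars() emits, is shown as a period.

```python
def ascii_column(data):
    # Printable ASCII (0x20 through 0x7E) is shown as-is;
    # any other byte, including 0x00, is shown as a period.
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)

two_byte = "how".encode("utf-16-be")  # same layout writeChars() produces
one_byte = "fine".encode("ascii")     # same layout writeBytes() produces

print(ascii_column(two_byte))  # .h.o.w
print(ascii_column(one_byte))  # fine
```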
