The char type is used to store single characters (letters, digits, symbols, etc.).
Remember, when values of variables are stored in the computer's memory, they must ultimately be stored in terms of only 1's and 0's. Recall that integer values are stored with the "Two's Complement" method, and floating-point values are stored in the IEEE 754 format. Similarly, we need a way to represent characters in terms of 1's and 0's.
Encoding refers to how something (like a char) is converted to its binary representation in the computer's memory.
A character set is a collection of characters along with an encoding scheme to convert them to their binary representations.
There are two popular character sets used frequently in programming languages:
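ASCII: a 7-bit character set of 128 characters, covering the uppercase and lowercase English letters, the digits 0-9, some common symbols, and several unprintable characters like tab, return, backspace, etc. The extended versions of ASCII add several more printable characters.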
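Unicode: a much larger character set that covers the characters of most of the world's writing systems and includes ASCII as a subset. If 'A', for example, is encoded to a decimal value of 65 in ASCII, then the Unicode version of 'A' will also be encoded to the same value.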
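To make the idea of an encoding concrete, the following minimal sketch prints the decimal value and the binary form of a few characters. The class name EncodingDemo and the helper toBits are made up for this illustration, and the padding to 16 digits assumes Java's char, which is a 16-bit value.

```java
public class EncodingDemo {
    // Hypothetical helper: returns the 16-digit binary form of a char's encoded value.
    static String toBits(char c) {
        String bits = Integer.toBinaryString(c);   // e.g. 'A' (65) gives "1000001"
        // Right-align in 16 columns, then turn the padding spaces into zeros.
        return String.format("%16s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        char[] samples = {'A', 'a', '4'};
        for (char c : samples) {
            // Casting to int exposes the encoded value behind the character.
            System.out.println(c + " -> " + (int) c + " -> " + toBits(c));
        }
    }
}
```

The first line printed is "A -> 65 -> 0000000001000001", which matches the value 65 quoted above for both ASCII and Unicode.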
char Data Type
Character literals are expressed with single quotes, as shown below:
char c1 = 'A';
char c2 = '4';
Chars can also be specified by their encoded values:
char c1 = 65;   //c1 = 'A'
char c2 = 52;   //c2 = '4'
Character literals (especially those for characters not on most keyboards) can be specified in the following way as well:
char c1 = '\u0041';   // c1 = 'A' (note: 65 = 41 (base 16))
char c2 = '\u0034';   // c2 = '4' (note: 52 = 34 (base 16))
char c3 = '\u0060';   // c3 = '`'
char c4 = '\u00A9';   // c4 = '©'
char c5 = '\u03C6';   // c5 = 'φ'
In general, literals expressed in this way take the form '\u####', where #### is a hexadecimal (base 16) number corresponding to the Unicode value for the character.
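As a rough illustration of the '\u####' form, the sketch below builds the escape text for a character by formatting its value as four hexadecimal digits, and then converts hexadecimal digits back into a char. The class name UnicodeEscapeDemo and the method toEscape are hypothetical names chosen for this example.

```java
public class UnicodeEscapeDemo {
    // Hypothetical helper: builds the backslash-u escape text for a char.
    static String toEscape(char c) {
        // %04X formats the value as four uppercase hex digits, padded with zeros.
        return String.format("\\u%04X", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(toEscape('A'));   // 65 is 41 in base 16
        System.out.println(toEscape('4'));   // 52 is 34 in base 16

        // Integer.parseInt with radix 16 goes the other way: hex digits to a char.
        char phi = (char) Integer.parseInt("03C6", 16);
        System.out.println(phi == '\u03C6'); // prints true
    }
}
```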
You can use Unicode characters within Strings as well. For example,
System.out.println("\u0041\u0034");
will print "A4" to the console.
Some characters are hard to put into a string. Suppose we wanted to print
Bob said "That's Great!"
to the console. The following would produce an error:
System.out.println("Bob said "That's Great!""); //ERROR!!
since the compiler will think the string ended when it sees the second quotation mark.
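We could try the following, but even it causes problems, since the compiler converts each \u0022 into a double quotation mark before it reads the string, leaving the same code as before:
System.out.println("Bob said \u0022That's Great!\u0022"); //ERROR!!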
Java gives us special escape sequences for characters like this. Some important ones are shown in the table below:
Description | Escape Sequence | Unicode |
Backspace | \b | \u0008 |
Tab | \t | \u0009 |
Linefeed | \n | \u000A |
Carriage return | \r | \u000D |
Backslash | \\ | \u005C |
Single Quote | \' | \u0027 |
Double Quote | \" | \u0022 |
So, to print out the comment about what Bob said, we can write:
System.out.println("Bob said \"That's Great!\"");
This compiles, and it looks a bit better too.
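A few of the escape sequences from the table can be tried out with a short sketch like the one below. The class name EscapeDemo and the helper quote are hypothetical and exist only for this example.

```java
public class EscapeDemo {
    // Hypothetical helper: wraps someone's words in double quotation marks.
    static String quote(String speaker, String words) {
        return speaker + " said \"" + words + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quote("Bob", "That's Great!"));
        System.out.println("Name\tScore");      // a tab separates the two words
        System.out.println("C:\\temp");         // one backslash appears in the output
        System.out.println("It\'s".length());   // prints 4: the escape is a single char
    }
}
```

Note that each escape sequence takes two characters to type in the source code but stands for a single character in the resulting string.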
char data type...
chars can be used with other numeric values (sometimes requiring a cast) and with numeric operators
char c1 = 97;             // c1 = 'a'
char c2 = (char) 97.25;   // c2 = 'a'
int n = 'A';              // n = 65
int m = '2' + '3';        // (int) '2' = 50 and (int) '3' = 51, so m = 101
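Because chars behave like numbers, subtracting one char from another is a common way to convert between characters and values. The sketch below is one possible illustration; the class name CharArithmeticDemo and the helpers digitValue and toUpper are hypothetical, and toUpper assumes its argument is a lowercase English letter.

```java
public class CharArithmeticDemo {
    // Hypothetical helper: numeric value of a digit character, e.g. '7' gives 7.
    static int digitValue(char digit) {
        return digit - '0';   // the digits '0' to '9' have consecutive codes 48 to 57
    }

    // Hypothetical helper: uppercase form of a lowercase English letter.
    static char toUpper(char lower) {
        // 'a' - 'A' is 32; the int result needs a cast back to char.
        return (char) (lower - ('a' - 'A'));
    }

    public static void main(String[] args) {
        System.out.println(digitValue('2') + digitValue('3'));   // prints 5, not 101
        System.out.println(toUpper('q'));                        // prints Q
    }
}
```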
The "+" operator can be used to concatenate a char with a string
char c = 'A';
String s = "BCD";
String s2 = c + s;
System.out.println(s2);   //prints "ABCD"
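One thing to watch for: when two chars are added before any String is involved, Java performs numeric addition first. The following sketch contrasts the two cases; the class name ConcatDemo and both method names are hypothetical.

```java
public class ConcatDemo {
    // Hypothetical helper: a + b is evaluated first, as int addition.
    static String sumThenAppend(char a, char b, String s) {
        return a + b + s;
    }

    // Hypothetical helper: starting with a String keeps every step a concatenation.
    static String joinAll(char a, char b, String s) {
        return "" + a + b + s;
    }

    public static void main(String[] args) {
        System.out.println(sumThenAppend('A', 'B', "C"));   // prints 131C (65 + 66 = 131)
        System.out.println(joinAll('A', 'B', "C"));         // prints ABC
    }
}
```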
The method charAt(int pos) can be used to get the char at a given position in a String
String s = "HELLO WORLD";
char c1 = s.charAt(0);
char c2 = s.charAt(6);
System.out.println(c1 + " is at position 0, while " + c2 + " is at position 6");
//prints "H is at position 0, while W is at position 6"
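Combined with a loop, charAt lets a program examine every character of a String in turn. The sketch below counts how often a character occurs; the class name CharAtDemo and the helper countOf are hypothetical names used only here.

```java
public class CharAtDemo {
    // Hypothetical helper: counts how many times target appears in s.
    static int countOf(String s, char target) {
        int count = 0;
        // Valid positions run from 0 to s.length() - 1.
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOf("HELLO WORLD", 'L'));   // prints 3
        System.out.println(countOf("HELLO WORLD", 'O'));   // prints 2
    }
}
```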
The increment and decrement operators can be used to get the next or preceding Unicode character
char c = 'B';
System.out.println(++c);   //prints the letter 'C'
System.out.println(--c);   //prints the letter 'B', since c was 'C' after the increment
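Incrementing a char inside a loop is a simple way to walk through a run of consecutive characters. The sketch below assumes the two endpoints are given in increasing order; the class name AlphabetDemo and the helper range are hypothetical.

```java
public class AlphabetDemo {
    // Hypothetical helper: builds the text of all characters from first to last, inclusive.
    static String range(char first, char last) {
        StringBuilder sb = new StringBuilder();
        for (char c = first; c <= last; c++) {   // c++ moves to the next Unicode character
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(range('A', 'Z'));   // prints ABCDEFGHIJKLMNOPQRSTUVWXYZ
        System.out.println(range('0', '9'));   // prints 0123456789
    }
}
```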