Candidates should be able to:
- explain the use of binary codes to represent characters
- explain the term character set
- describe, with examples (for example ASCII and Unicode), the relationship between the number of bits per character in a character set and the number of characters that can be represented.
How are binary codes used to represent characters?
Each character (such as uppercase and lowercase letters, numbers and symbols) must be stored as a unique number called a character code if a computer system is to be able to store and process it.
For example, the number code for the character ‘a’ could be decimal 97 and a ‘space’ character could be 32. When a character is stored on a computer system, it is therefore the number code that is actually stored, as a binary number.
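This mapping can be demonstrated in Python (a language chosen here purely for illustration), whose built-in `ord` function returns a character's number code:

```python
# Each character maps to a numeric code; it is that code,
# as a binary number, that the computer actually stores.
for ch in ['a', ' ']:
    code = ord(ch)                 # character -> number code
    binary = format(code, '08b')   # the 8-bit binary pattern stored
    print(repr(ch), code, binary)  # e.g. 'a' 97 01100001
```

Running this shows ‘a’ stored as 97 (binary 01100001) and the space as 32 (binary 00100000).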
What is a character set?
A character set is a complete set of the characters and their number codes that can be recognised by a computer system.
How is the number of bits per character related to the number of different possible characters?
The ASCII Character Set – 7-8 bits per character
The ASCII (American Standard Code for Information Interchange) character set uses 1 byte of memory per character. Original versions of ASCII only used 7 of the 8 bits available, allowing 128 different characters to be represented using the binary codes 0000000 to 1111111.
Extended versions of ASCII use all 8 bits, allowing 256 characters in total. This is still a very limited number and means that different extended ASCII character sets are needed for the symbols and accented characters used in different countries.
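The general rule is that n bits per character give 2ⁿ possible codes. A quick sketch in Python (illustrative only) confirms the figures quoted above:

```python
# With n bits per character, 2**n distinct codes are available.
for bits in (7, 8, 16):
    print(bits, 'bits ->', 2 ** bits, 'possible characters')
# 7 bits -> 128 (original ASCII)
# 8 bits -> 256 (extended ASCII)
# 16 bits -> 65536 (16-bit Unicode)
```

Each extra bit doubles the number of characters that can be represented.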
The table below shows example characters, their decimal codes and the binary codes actually stored by the computer:

| Character | Decimal ASCII Code | Binary code |
|-----------|--------------------|-------------|
| a         | 97                 | 01100001    |
| A         | 65                 | 01000001    |
| (space)   | 32                 | 00100000    |
- Control characters: ASCII actually reserves the first 32 codes (numbers 0–31 decimal) for non-printable control characters. Many of these are now obsolete but some are still used, for example:
- ASCII code 13 is a carriage return, used (together with code 10, line feed) to start a new line;
- ASCII code 9 inserts a tab into a line of text.
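Control characters are ordinary codes like any other; they simply have no printable symbol. In Python (again, just for illustration) they are written with escape sequences:

```python
# Control characters have codes below 32 and are written
# with escape sequences rather than printable symbols.
print(ord('\r'))  # carriage return -> code 13
print(ord('\t'))  # horizontal tab  -> code 9
```

The escape sequences `\r` and `\t` correspond to the carriage return and tab characters described above.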
The Unicode Character Set – 16 bits per character
The Unicode character set described here uses 16 bits (2 bytes) of memory per character, which allows 2¹⁶ = 65,536 different characters to be represented.
Using 16 bits means that Unicode can represent the characters and symbols from all the alphabets that exist around the globe, rather than having to use different character sets for different countries.
The first 128 characters in ASCII have the same numeric codes as those in Unicode, making Unicode backward compatible with ASCII.
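This backward compatibility can be checked directly: in Python (illustrative only), `ord` returns Unicode code points, and for the first 128 characters these are identical to the ASCII codes.

```python
# The first 128 Unicode code points match ASCII exactly;
# characters outside ASCII get larger code points.
print(ord('a'))   # 97, the same as its ASCII code
print(ord('€'))   # 8364, well beyond the 8-bit ASCII range
```

So any valid 7-bit ASCII text is also valid Unicode text with the same codes.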