Data and Its Representation in the Digital Computer
Created: 29th of July 2025, 03:45:34 PM
Modified: 29th of July 2025, 05:22:33 PM
Data Representation
Imagine every song, message or video you enjoy as a mosaic of tiny tiles—each tile only black or white, off or on. In computers, those tiles are bits, the smallest unit of information. Bits group into bytes, and with those building blocks, we paint every image, play every tune and display every character you see on screen.
Bits and Bytes
A single bit holds one of two values: 0 or 1. Collect eight bits and you get a byte. Because each extra bit doubles the number of possible patterns, a byte can represent 2^8 = 256 different values (0 to 255). File sizes are counted in bytes and their multiples: kilobytes (1 024 bytes), megabytes (1 024 KB), and beyond.
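As a quick check of the arithmetic above, here is a minimal Python sketch. The constant and variable names are illustrative choices, not part of the original text.

```python
# n bits can hold 2**n distinct patterns.
BITS_PER_BYTE = 8
values_per_byte = 2 ** BITS_PER_BYTE   # 256 patterns, numbered 0..255

# Show a few byte values as 8-bit binary strings.
for value in (0, 1, 65, 255):
    print(value, format(value, "08b"))

# Size units in the binary convention used in the text.
KILOBYTE = 1024              # bytes
MEGABYTE = 1024 * KILOBYTE   # bytes
```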
Character Sets
Bytes become meaningful when we agree on a character set—a table that maps each numeric value (0–255) to a letter, symbol or control code. Without a shared map, “65” might be “A” on one machine and “क” on another.
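To see why the shared map matters, the sketch below decodes the same byte under two different character sets. The choice of byte (233) and of the two code pages, Latin-1 and Windows-1251, is an assumption made for illustration. In practice most character sets agree on values 0 to 127, so mismatches usually show up in the upper half of the byte range.

```python
# One byte value, read through two different character sets.
raw = bytes([0xE9])  # decimal 233

as_latin1 = raw.decode("latin-1")   # Western European table
as_cp1251 = raw.decode("cp1251")    # Windows Cyrillic table

print(as_latin1, as_cp1251)         # two different letters from one byte

# Values below 128 decode identically in both tables.
same_in_both = bytes([65]).decode("latin-1") == bytes([65]).decode("cp1251")
```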
ASCII Character Set
Born in the 1960s, ASCII uses seven bits (0–127) to define English letters, digits, punctuation and basic controls. For example:
- 65 → A
- 97 → a
- 48 → 0
- 10 → Line Feed (LF)
- 13 → Carriage Return (CR)
Since each ASCII code fits in one byte (with the top bit set to zero), it fit neatly into the 8-bit bytes of early hardware.
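The mappings above can be checked with Python's built-in `ord()` and `chr()`. This is a small sketch; the list of sample characters is simply the one used in this section.

```python
# ord() gives the numeric code of a character; chr() goes the other way.
examples = ["A", "a", "0", "\n", "\r"]
codes = [ord(ch) for ch in examples]

for ch, code in zip(examples, codes):
    print(repr(ch), code, format(code, "08b"))

# Every ASCII code is below 128, so the top bit (value 0x80) is always zero.
top_bit_clear = all((code & 0x80) == 0 for code in codes)

# Round trip: code -> character -> code.
round_trip_ok = all(ord(chr(code)) == code for code in range(128))
```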
UTF-8 and Why It’s Preferred
Modern computing spans every language and script. UTF-8 extends ASCII by using one to four bytes per character, while keeping all 0–127 codes unchanged. This backward compatibility plus efficient encoding of common characters makes UTF-8 the global standard.
For instance:
- “வணக்கம்” (“Vannakkam” in Tamil) becomes the byte sequence E0 AE B5 E0 AE A3 E0 AE 95 E0 AF 8D E0 AE 95 E0 AE AE E0 AF 8D.
- “नमस्कार” (“Namaskar” in Hindi) becomes E0 A4 A8 E0 A4 AE E0 A4 B8 E0 A5 8D E0 A4 95 E0 A4 BE E0 A4 B0.
Under the hood, every Tamil or Devanagari code point lies between U+0800 and U+FFFF, so UTF-8 encodes each one as a three-byte pattern. A UTF-8 decoder reverses that pattern to recover the original code point, which the font then renders as the correct glyph.
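Encoding and decoding can be tried directly in Python. In this sketch the two greetings are spelled out as code-point escapes, which is an assumption about the intended spelling of each word. The helper name `utf8_hex` is hypothetical.

```python
# The two greetings, written as explicit code points (assumed spellings).
tamil = "\u0bb5\u0ba3\u0b95\u0bcd\u0b95\u0bae\u0bcd"   # Vannakkam
hindi = "\u0928\u092e\u0938\u094d\u0915\u093e\u0930"   # Namaskar

def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated upper-case hex."""
    return text.encode("utf-8").hex(" ").upper()

print(utf8_hex(tamil))
print(utf8_hex(hindi))
print(utf8_hex("A"))   # an ASCII character stays a single byte

# Each of these code points takes three bytes in UTF-8.
bytes_per_code_point = len(tamil.encode("utf-8")) // len(tamil)

# Decoding reverses the process and recovers the original string.
decoded = tamil.encode("utf-8").decode("utf-8")
```

`bytes.hex()` accepts a separator argument from Python 3.8 onwards; on older versions the bytes would need to be joined by hand.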
Internal Representation of Data
Everything a computer works with, whether numbers, text or images, is stored as bits. Integers are stored in binary, with negatives usually in two’s complement; floating-point values follow the IEEE 754 standard with sign, exponent and mantissa fields. Colours in pictures are commonly stored as three bytes, one each for the red, green and blue channels.
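The three representations mentioned above can be inspected from Python. This is a sketch under assumptions: the sample values (-5, -0.15625 and an orange colour) are chosen only for illustration, and the float is shown in 32-bit single precision.

```python
import struct

# Two's complement: -5 stored in one signed byte.
neg = (-5).to_bytes(1, byteorder="big", signed=True)
neg_bits = format(neg[0], "08b")

# IEEE 754 single precision: 1 sign bit, 8 exponent bits, 23 mantissa bits.
raw = struct.pack(">f", -0.15625)       # big-endian 32-bit float
as_int = int.from_bytes(raw, "big")
sign = as_int >> 31                     # 1 means negative
exponent = (as_int >> 23) & 0xFF        # stored with a bias of 127
mantissa = as_int & 0x7FFFFF            # fraction bits after the implicit 1

# A colour as three bytes: red, green, blue.
orange = bytes([255, 165, 0])

print(neg_bits, raw.hex(), sign, exponent - 127, orange.hex())
```

Here -0.15625 equals -1.25 × 2^-3, so the stored exponent is 127 - 3 = 124 and the mantissa field holds the fraction 0.25.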
Non-printing Characters (CR, LF and CRLF)
Not all bytes map to visible symbols. Some are control codes that steer printers or screens:
- CR (13) moves the cursor back to the line’s start.
- LF (10) advances the cursor down one line.
- CRLF (13 followed by 10) is the Windows convention for a new line; Unix systems use LF alone, and older Macs used CR alone.
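The three conventions can be compared, and converted to one form, with a few lines of Python. This is a minimal sketch; the function name `normalize_newlines` and the sample data are hypothetical.

```python
# The same two-line text under each line-ending convention.
samples = {
    "windows": b"one\r\ntwo\r\n",   # CRLF
    "unix": b"one\ntwo\n",          # LF
    "old_mac": b"one\rtwo\r",       # CR
}

def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR to LF. CRLF must be handled first,
    otherwise each CRLF would turn into two line feeds."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

normalized = {name: normalize_newlines(data) for name, data in samples.items()}

# bytes.splitlines() already understands all three conventions.
lines = samples["windows"].splitlines()
```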
Special Characters
Beyond letters and digits lie whitespace and control characters with special meanings:
- Tab \t (ASCII 9) jumps the cursor to the next tab stop.
- Bell \a (ASCII 7) once rang a terminal bell.
- Escape \e (ASCII 27) starts sequences that change text colour or move the cursor in terminals. The \e shorthand is not universal; Python, for example, writes this character as \x1b.
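A short Python sketch of these three characters follows. The sample strings are illustrative, and the colour sequence shown is the common ANSI form (ESC [ 31 m for red, ESC [ 0 m to reset); whether it has any visible effect depends on the terminal.

```python
# Python has no "\e" escape, so ESC is written as "\x1b".
TAB, BELL, ESC = "\t", "\a", "\x1b"
codes = [ord(ch) for ch in (TAB, BELL, ESC)]

# expandtabs() shows where a tab lands, with stops every 8 columns here.
expanded = "ab\tc".expandtabs(8)

# An ANSI escape sequence wrapping a word in "red on" / "reset" codes.
red_text = f"{ESC}[31mwarning{ESC}[0m"

print(codes)
print(repr(expanded))
print(red_text)
```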
With bits and bytes, ASCII’s limits and UTF-8’s reach, and the way control and special codes shape every line of text now in hand, you’re ready to see how programming languages, including Python, turn these raw patterns into commands you can write and read. Next, we’ll explore how to input and output this data, from serial cables to cloud APIs, all through code you’ll write yourself.
ASCII Character Set: Full Table
Decimal | Binary | Symbol |
---|---|---|
0 | 00000000 | NUL (non-printable) |
1 | 00000001 | SOH (non-printable) |
2 | 00000010 | STX (non-printable) |
3 | 00000011 | ETX (non-printable) |
4 | 00000100 | EOT (non-printable) |
5 | 00000101 | ENQ (non-printable) |
6 | 00000110 | ACK (non-printable) |
7 | 00000111 | BEL (non-printable) |
8 | 00001000 | BS (non-printable) |
9 | 00001001 | HT (non-printable) |
10 | 00001010 | LF (non-printable) |
11 | 00001011 | VT (non-printable) |
12 | 00001100 | FF (non-printable) |
13 | 00001101 | CR (non-printable) |
14 | 00001110 | SO (non-printable) |
15 | 00001111 | SI (non-printable) |
16 | 00010000 | DLE (non-printable) |
17 | 00010001 | DC1 (non-printable) |
18 | 00010010 | DC2 (non-printable) |
19 | 00010011 | DC3 (non-printable) |
20 | 00010100 | DC4 (non-printable) |
21 | 00010101 | NAK (non-printable) |
22 | 00010110 | SYN (non-printable) |
23 | 00010111 | ETB (non-printable) |
24 | 00011000 | CAN (non-printable) |
25 | 00011001 | EM (non-printable) |
26 | 00011010 | SUB (non-printable) |
27 | 00011011 | ESC (non-printable) |
28 | 00011100 | FS (non-printable) |
29 | 00011101 | GS (non-printable) |
30 | 00011110 | RS (non-printable) |
31 | 00011111 | US (non-printable) |
32 | 00100000 | Space |
33 | 00100001 | ! |
34 | 00100010 | " |
35 | 00100011 | # |
36 | 00100100 | $ |
37 | 00100101 | % |
38 | 00100110 | & |
39 | 00100111 | ' |
40 | 00101000 | ( |
41 | 00101001 | ) |
42 | 00101010 | * |
43 | 00101011 | + |
44 | 00101100 | , |
45 | 00101101 | - |
46 | 00101110 | . |
47 | 00101111 | / |
48 | 00110000 | 0 |
49 | 00110001 | 1 |
50 | 00110010 | 2 |
51 | 00110011 | 3 |
52 | 00110100 | 4 |
53 | 00110101 | 5 |
54 | 00110110 | 6 |
55 | 00110111 | 7 |
56 | 00111000 | 8 |
57 | 00111001 | 9 |
58 | 00111010 | : |
59 | 00111011 | ; |
60 | 00111100 | < |
61 | 00111101 | = |
62 | 00111110 | > |
63 | 00111111 | ? |
64 | 01000000 | @ |
65 | 01000001 | A |
66 | 01000010 | B |
67 | 01000011 | C |
68 | 01000100 | D |
69 | 01000101 | E |
70 | 01000110 | F |
71 | 01000111 | G |
72 | 01001000 | H |
73 | 01001001 | I |
74 | 01001010 | J |
75 | 01001011 | K |
76 | 01001100 | L |
77 | 01001101 | M |
78 | 01001110 | N |
79 | 01001111 | O |
80 | 01010000 | P |
81 | 01010001 | Q |
82 | 01010010 | R |
83 | 01010011 | S |
84 | 01010100 | T |
85 | 01010101 | U |
86 | 01010110 | V |
87 | 01010111 | W |
88 | 01011000 | X |
89 | 01011001 | Y |
90 | 01011010 | Z |
91 | 01011011 | [ |
92 | 01011100 | \ |
93 | 01011101 | ] |
94 | 01011110 | ^ |
95 | 01011111 | _ |
96 | 01100000 | ` |
97 | 01100001 | a |
98 | 01100010 | b |
99 | 01100011 | c |
100 | 01100100 | d |
101 | 01100101 | e |
102 | 01100110 | f |
103 | 01100111 | g |
104 | 01101000 | h |
105 | 01101001 | i |
106 | 01101010 | j |
107 | 01101011 | k |
108 | 01101100 | l |
109 | 01101101 | m |
110 | 01101110 | n |
111 | 01101111 | o |
112 | 01110000 | p |
113 | 01110001 | q |
114 | 01110010 | r |
115 | 01110011 | s |
116 | 01110100 | t |
117 | 01110101 | u |
118 | 01110110 | v |
119 | 01110111 | w |
120 | 01111000 | x |
121 | 01111001 | y |
122 | 01111010 | z |
123 | 01111011 | { |
124 | 01111100 | \| |
125 | 01111101 | } |
126 | 01111110 | ~ |
127 | 01111111 | DEL (non-printable) |
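
Rows in the same layout as the table above can be generated rather than typed by hand. This is a sketch only; `CONTROL_NAMES` and `ascii_row` are hypothetical names, and the output does not escape the `|` character for markdown.

```python
# Names for control codes 0-31, in table order. DEL (127) is handled separately.
CONTROL_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()

def ascii_row(code: int) -> str:
    """Format one table row: decimal, 8-bit binary, symbol."""
    if code < 32:
        symbol = f"{CONTROL_NAMES[code]} (non-printable)"
    elif code == 32:
        symbol = "Space"
    elif code == 127:
        symbol = "DEL (non-printable)"
    else:
        symbol = chr(code)
    return f"{code} | {code:08b} | {symbol} |"

rows = [ascii_row(code) for code in range(128)]
print("\n".join(rows[:4]))
```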