Data and Its Representation in the Digital Computer

Created: 29th of July 2025, 03:45:34 PM
Modified: 29th of July 2025, 05:22:33 PM

Data Representation

Imagine every song, message or video you enjoy as a mosaic of tiny tiles—each tile only black or white, off or on. In computers, those tiles are bits, the smallest unit of information. Bits group into bytes, and with those building blocks, we paint every image, play every tune and display every character you see on screen.

Bits and Bytes

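A single bit holds one of two values: 0 or 1. Eight bits grouped together form a byte, and because each bit doubles the number of combinations, a byte can represent 2^8 = 256 different values (from 0 to 255). File sizes are counted in bytes and in larger units built from them: kilobytes (1 024 bytes), megabytes (1 024 KB), and beyond. Strictly speaking, these multiples of 1 024 are called kibibytes and mebibytes, while the SI kilobyte is 1 000 bytes, but the looser usage is common.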
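The doubling rule can be checked with a few lines of Python. This is a minimal sketch; the helper name distinct_values and the constants are illustrative choices, not part of any library.

```python
def distinct_values(bit_count: int) -> int:
    """Return how many different patterns bit_count bits can form."""
    return 2 ** bit_count  # every extra bit doubles the count

BITS_PER_BYTE = 8

byte_values = distinct_values(BITS_PER_BYTE)  # number of values in one byte
largest_byte = byte_values - 1                # highest value, counting from 0

# Binary multiples as used in the text (1 KB = 1 024 bytes).
KILOBYTE = 1024
MEGABYTE = 1024 * KILOBYTE

print(byte_values, largest_byte)
print(format(largest_byte, "08b"))  # the largest byte has all eight bits set
print(MEGABYTE, "bytes in a megabyte")
```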

Character Sets

Bytes become meaningful when we agree on a character set—a table that maps each numeric value (0–255) to a letter, symbol or control code. Without a shared map, “65” might be “A” on one machine and “क” on another.
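To see why the shared map matters, the sketch below decodes the same single byte with two different character sets that ship with Python. The choice of byte value (233) and of the two code pages (Latin-1 and the old IBM PC code page 437) is only an example.

```python
raw = bytes([0xE9])  # one byte holding the value 233

# The byte is identical; only the agreed-upon table changes.
as_latin1 = raw.decode("latin-1")  # Western European table
as_cp437 = raw.decode("cp437")     # original IBM PC table

print(repr(as_latin1))  # 'é'
print(repr(as_cp437))   # 'Θ'
print(as_latin1 == as_cp437)  # False: same byte, different character
```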

ASCII Character Set

Born in the 1960s, ASCII uses seven bits (0–127) to define English letters, digits, punctuation and basic controls. For example:

  • 65 → A
  • 97 → a
  • 48 → 0
  • 10 → Line Feed (LF)
  • 13 → Carriage Return (CR)

Since each ASCII code needs only seven bits, it fits in one byte with the top bit set to zero, which let ASCII map neatly onto the 8-bit bytes of the hardware that followed.
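Python exposes this mapping through the built-in functions ord (character to code) and chr (code to character). The short sketch below looks up the example codes listed above and shows that the top bit of each is zero; the variable names are illustrative.

```python
samples = ["A", "a", "0", "\n", "\r"]

# ord() gives the numeric code of a character.
codes = {symbol: ord(symbol) for symbol in samples}

for symbol, code in codes.items():
    top_bit = code >> 7  # the eighth bit; always 0 for ASCII
    print(code, format(code, "08b"), repr(symbol), "top bit:", top_bit)

# chr() goes the other way, from code to character.
print(chr(65))

# Encoding pure-ASCII text yields exactly one byte per character.
print(len("Hello".encode("ascii")))
```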

UTF-8 and Why It’s Preferred

Modern computing spans every language and script. UTF-8 extends ASCII by using one to four bytes per character, while keeping all 0–127 codes unchanged. This backward compatibility plus efficient encoding of common characters makes UTF-8 the global standard.

For instance:

  • “வணக்கம்” (“Vannakkam” in Tamil) has seven code points and becomes the byte sequence E0 AE B5 E0 AE A3 E0 AE 95 E0 AF 8D E0 AE 95 E0 AE AE E0 AF 8D.
  • “नमस्कार” (“Namaskar” in Hindi) has seven code points and becomes E0 A4 A8 E0 A4 AE E0 A4 B8 E0 A5 8D E0 A4 95 E0 A4 BE E0 A4 B0.

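Under the hood, every Tamil and Devanagari code point lies in the range U+0800 to U+FFFF (Tamil occupies U+0B80 to U+0BFF, Devanagari U+0900 to U+097F), so each one is encoded as exactly three bytes. A UTF-8 decoder reverses those bytes into the original code points, which the font system then renders as glyphs.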
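The multi-byte patterns can be reproduced by hand. The function below is an illustrative sketch of the UTF-8 bit layout, not a replacement for Python's built-in str.encode; the name encode_code_point is hypothetical, and the sketch skips validation of surrogates and of values above U+10FFFF.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (no validation; illustration only)."""
    if cp < 0x80:       # 1 byte:  0xxxxxxx (plain ASCII)
        return bytes([cp])
    if cp < 0x800:      # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for word in ["வணக்கம்", "नमस्कार"]:
    manual = b"".join(encode_code_point(ord(ch)) for ch in word)
    # The hand-built bytes should match the built-in encoder.
    assert manual == word.encode("utf-8")
    # Decoding reverses the process and restores the original text.
    assert manual.decode("utf-8") == word
    print(word, len(word), "code points ->", " ".join(f"{b:02X}" for b in manual))
```

In everyday code, text.encode("utf-8") and data.decode("utf-8") do this work; the manual version only makes the bit patterns visible.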

Internal Representation of Data

Everything a computer works with, whether numbers, text or images, is stored as bits. Integers are stored in binary, with negative values usually held in two’s complement; floating-point values follow the IEEE 754 standard, with sign, exponent and mantissa (significand) fields. Even colours in pictures are commonly stored as three bytes, one each for the red, green and blue channels.
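These three representations can be inspected from Python with the standard library. The sketch below is one way to do it; the sample values (-5, -0.15625 and an orange pixel) are arbitrary choices for illustration.

```python
import struct

# Two's complement: store -5 in a single signed byte.
neg_five = (-5).to_bytes(1, byteorder="big", signed=True)
print(format(neg_five[0], "08b"))  # 11111011

# IEEE 754 single precision: pack a float into 4 bytes, then split the fields.
raw = struct.pack(">f", -0.15625)
bits = int.from_bytes(raw, byteorder="big")
sign = bits >> 31                 # 1 bit
exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
mantissa = bits & 0x7FFFFF        # 23 bits of fraction
print(sign, exponent, format(mantissa, "023b"))

# A colour as three bytes: red, green and blue channels.
orange = bytes([255, 165, 0])
print(len(orange), list(orange))
```

Here -0.15625 equals -1.01 (binary) times 2 to the power -3, so the stored exponent is 127 - 3 = 124 and the sign bit is 1.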

Non-printing Characters (CR, LF and CRLF)

Not all bytes map to visible symbols. Some are control codes that steer printers or screens:

  • CR (13) moves the cursor back to the line’s start.
  • LF (10) advances the cursor down one line.
  • CRLF (13 followed by 10) is the Windows convention for a new line; Unix systems use LF alone, and older Macs used CR alone.
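Because the three conventions differ only in these control bytes, text from one system can be converted for another with simple replacements. The helper below is a minimal sketch; the name normalize_newlines and the sample strings are illustrative.

```python
windows_text = "first\r\nsecond\r\n"  # CRLF
unix_text = "first\nsecond\n"         # LF
old_mac_text = "first\rsecond\r"      # CR

def normalize_newlines(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    # Handle CRLF first so its CR is not turned into a second newline.
    return text.replace("\r\n", "\n").replace("\r", "\n")

for sample in (windows_text, unix_text, old_mac_text):
    print(list(sample.encode("ascii")), "->", repr(normalize_newlines(sample)))

# str.splitlines() already understands all three conventions.
print(windows_text.splitlines())
```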

Special Characters

Beyond letters and digits lie symbols and whitespace:

  • Tab \t (ASCII 9) jumps the cursor to the next stop.
  • Bell \a (ASCII 7) once rang a terminal bell.
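  • Escape \e (ASCII 27) starts sequences that change text colour or move the cursor in terminals. The \e shorthand is understood by some shells and compilers but not by Python, where the same character is written \x1b or \033.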
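A short Python sketch of these characters in use follows. It assumes a terminal that honours ANSI escape sequences; on one that does not, the colour codes appear as stray characters. The constant names are illustrative.

```python
TAB = "\t"     # ASCII 9
BELL = "\a"    # ASCII 7
ESC = "\x1b"   # ASCII 27; Python has no "\e" escape

print(ord(TAB), ord(BELL), ord(ESC))

# Tab: expandtabs() shows where the next stop falls (every 8 columns here).
aligned = "name\tvalue".expandtabs(8)
print(aligned)

# Escape: ESC followed by "[31m" selects red text, "[0m" resets the style.
red = f"{ESC}[31m"
reset = f"{ESC}[0m"
print(f"{red}warning{reset} back to normal")
```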

With bits and bytes, ASCII’s limits and UTF-8’s power, and the way control and special codes shape every line of text now understood, you’re ready to see how different languages, including Python, turn these raw patterns into commands you can write and read. Next, we’ll explore how to input and output this data, from serial cables to cloud APIs, all through code you’ll write yourself.

ASCII Character Set: Full Table

Decimal Binary Symbol
0 00000000 NUL (non-printable)
1 00000001 SOH (non-printable)
2 00000010 STX (non-printable)
3 00000011 ETX (non-printable)
4 00000100 EOT (non-printable)
5 00000101 ENQ (non-printable)
6 00000110 ACK (non-printable)
7 00000111 BEL (non-printable)
8 00001000 BS (non-printable)
9 00001001 HT (non-printable)
10 00001010 LF (non-printable)
11 00001011 VT (non-printable)
12 00001100 FF (non-printable)
13 00001101 CR (non-printable)
14 00001110 SO (non-printable)
15 00001111 SI (non-printable)
16 00010000 DLE (non-printable)
17 00010001 DC1 (non-printable)
18 00010010 DC2 (non-printable)
19 00010011 DC3 (non-printable)
20 00010100 DC4 (non-printable)
21 00010101 NAK (non-printable)
22 00010110 SYN (non-printable)
23 00010111 ETB (non-printable)
24 00011000 CAN (non-printable)
25 00011001 EM (non-printable)
26 00011010 SUB (non-printable)
27 00011011 ESC (non-printable)
28 00011100 FS (non-printable)
29 00011101 GS (non-printable)
30 00011110 RS (non-printable)
31 00011111 US (non-printable)
32 00100000 Space
33 00100001 !
34 00100010 "
35 00100011 #
36 00100100 $
37 00100101 %
38 00100110 &
39 00100111 '
40 00101000 (
41 00101001 )
42 00101010 *
43 00101011 +
44 00101100 ,
45 00101101 -
46 00101110 .
47 00101111 /
48 00110000 0
49 00110001 1
50 00110010 2
51 00110011 3
52 00110100 4
53 00110101 5
54 00110110 6
55 00110111 7
56 00111000 8
57 00111001 9
58 00111010 :
59 00111011 ;
60 00111100 <
61 00111101 =
62 00111110 >
63 00111111 ?
64 01000000 @
65 01000001 A
66 01000010 B
67 01000011 C
68 01000100 D
69 01000101 E
70 01000110 F
71 01000111 G
72 01001000 H
73 01001001 I
74 01001010 J
75 01001011 K
76 01001100 L
77 01001101 M
78 01001110 N
79 01001111 O
80 01010000 P
81 01010001 Q
82 01010010 R
83 01010011 S
84 01010100 T
85 01010101 U
86 01010110 V
87 01010111 W
88 01011000 X
89 01011001 Y
90 01011010 Z
91 01011011 [
92 01011100 \
93 01011101 ]
94 01011110 ^
95 01011111 _
96 01100000 `
97 01100001 a
98 01100010 b
99 01100011 c
100 01100100 d
101 01100101 e
102 01100110 f
103 01100111 g
104 01101000 h
105 01101001 i
106 01101010 j
107 01101011 k
108 01101100 l
109 01101101 m
110 01101110 n
111 01101111 o
112 01110000 p
113 01110001 q
114 01110010 r
115 01110011 s
116 01110100 t
117 01110101 u
118 01110110 v
119 01110111 w
120 01111000 x
121 01111001 y
122 01111010 z
123 01111011 {
124 01111100 |
125 01111101 }
126 01111110 ~
127 01111111 DEL (non-printable)
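The printable part of this table (codes 32 to 126) can be regenerated with a few lines of Python, which is a handy way to check any row. This is a minimal sketch; the function name ascii_row is illustrative, and the control codes are left out because their names are not available from chr().

```python
def ascii_row(code: int) -> str:
    """Format one table row: decimal, 8-bit binary, symbol."""
    symbol = "Space" if code == 32 else chr(code)
    return f"{code} {code:08b} {symbol}"

# Codes 32..126 are the printable ASCII characters.
rows = [ascii_row(code) for code in range(32, 127)]

print("Decimal Binary Symbol")
print("\n".join(rows))
```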