How Data Is Represented In Memory
How does your computer know what to do when you create a variable?
The first step to understanding memory is understanding what memory looks like. To do this we first need to define a few words.
Bits, Bytes, and Words
Bits: A bit is the fundamental unit of computer memory. A bit can have many physical representations, be it a dent on a CD, a transistor in a memory stick, or a magnetic region on a hard disk. What it boils down to is a bit is either “on” or “off” (1 or 0).
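Bytes: A byte is a collection of 8 bits. Bytes are in turn grouped into a larger unit of memory called a word.
Words: A word is the amount of data a processor handles in one operation, so its size depends on the architecture. This article assumes a 32-bit machine, where a word is 4 bytes, or 32 bits. On a 64-bit machine a word is 8 bytes.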
Why is this important?
At the lowest level all data is just a collection of bits.
Take the letter “A”, for example. Every letter has an ASCII value associated with it, which is simply a number that the computer stores in memory. “A” has an ASCII value of 65, and 65 in binary is 01000001. More on this later.
Hexadecimal
Hexadecimal is a more compact way of writing out the contents of memory. In hex, each digit represents a value from 0 to 15: 0–9 are written as numerals and 10–15 as the letters A–F. A single hex digit represents 4 bits, so each byte requires only 2 hex digits.
Hex values are commonly written with a “0x” prefix.
The letter “A” is 0100 0001 in binary, or 0x41 in hex.
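As a quick check of the conversion above, here is a minimal Python sketch that prints the letter “A” in decimal, binary, and hex. The variable names are only illustrative.

```python
# Show one value, the letter "A", in decimal, binary, and hexadecimal.
code = ord("A")                 # character -> integer code
binary = format(code, "08b")    # 8 binary digits, zero-padded
hexa = format(code, "02X")      # 2 hex digits, uppercase

print(code, binary, "0x" + hexa)

# Each hex digit corresponds to one group of 4 bits.
high, low = binary[:4], binary[4:]
high_digit = format(int(high, 2), "X")
low_digit = format(int(low, 2), "X")
print(high, "->", high_digit)
print(low, "->", low_digit)
```

Splitting the byte into two groups of four bits shows why exactly two hex digits are enough for any byte.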
Common Datatypes
Integer: An integer is any whole number. The simplest way to store one is as an unsigned (non-negative) value in one word of memory. With a 32-bit word this gives us integers from 0 to 4,294,967,295 (2³²-1).
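The upper limit can be checked with a short Python sketch. It assumes a 32-bit word, as in the text, and uses the built-in int.to_bytes to pack a value into exactly 4 bytes. The names WORD_BITS and max_unsigned are just for illustration.

```python
# Largest value that fits in 32 bits when every bit is used for magnitude.
WORD_BITS = 32  # assumption: a 32-bit word, as in the text
max_unsigned = 2**WORD_BITS - 1
print(max_unsigned)

# int.to_bytes packs an integer into a fixed number of bytes.
packed = max_unsigned.to_bytes(4, "big")
print(packed.hex())

# One more than the maximum no longer fits in 4 bytes.
try:
    (max_unsigned + 1).to_bytes(4, "big")
    fits = True
except OverflowError:
    fits = False
print("2**32 fits in one word:", fits)
```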
This representation has two major flaws.
First, we can’t represent negative numbers. This problem is easily solved by using the first bit of a number as what is called a sign bit. Unfortunately, with one bit spent on the sign, only 31 bits remain for the magnitude, which limits us to numbers in the range -2,147,483,647 to 2,147,483,647 (±(2³¹-1)). In practice most hardware uses a variant called two’s complement, which gives the range -2,147,483,648 to 2,147,483,647.
Second, we can’t represent numbers larger than 2³²-1. Numbers larger than this can’t be stored in a word of memory. This problem is less easily solved as we only have a finite amount of memory on a computer.
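To see what signed storage looks like in practice, here is a minimal sketch using Python's built-in int.to_bytes with signed=True, which produces two's complement, the layout most current hardware uses. The helper name to_bits32 is made up for this example.

```python
def to_bits32(value):
    """Return the 32-bit two's complement pattern of value as a bit string."""
    packed = value.to_bytes(4, "big", signed=True)
    return format(int.from_bytes(packed, "big"), "032b")

# The first bit is 0 for non-negative values and 1 for negative ones.
for value in (1, -1, 2**31 - 1, -2**31):
    print(f"{value:>11} -> {to_bits32(value)}")
```

Note that the leading bit still tells you the sign, even though the remaining bits are not a plain magnitude.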
One solution is using more memory to store integers, for example two words per integer, but this uses more memory and is therefore less efficient.
Another possible solution is storing integers as real (floating-point) numbers, but this can cause inaccuracies because floating-point values hold only a limited number of significant digits.
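A small Python sketch can illustrate the kind of inaccuracy meant here. It assumes Python's float is a 64-bit IEEE 754 double, which is the case on essentially all current platforms. Such a float can hold every integer up to 2⁵³ exactly, but not every integer beyond that.

```python
# 2**53 survives a round trip through float, but 2**53 + 1 does not.
small = 2**53
big = 2**53 + 1

small_ok = int(float(small)) == small
round_trip = int(float(big))

print(small, "->", int(float(small)), "exact:", small_ok)
print(big, "->", round_trip, "exact:", round_trip == big)
```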
A final and definitely more complicated solution is to use arbitrary precision arithmetic, which uses as much memory per integer as is needed, but makes calculations with the values slower.
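Python's built-in int type happens to work this way, so it makes a convenient illustration. The sketch below is only a demonstration of the idea. The byte sizes reported by sys.getsizeof are a detail of the interpreter in use, not something the text above specifies.

```python
import sys

# Python integers grow as needed, so values far beyond one word stay exact.
big = 2**100
print(big)
print("bits needed:", big.bit_length())

# Arithmetic on the large value is still exact.
difference = (big + 1) - big
print("difference:", difference)

# The object takes more memory as the value grows (interpreter-specific).
for value in (1, 2**32, 2**64, 2**100):
    print(value.bit_length(), "bits ->", sys.getsizeof(value), "bytes")
```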
Characters: A letter is typically stored using a single byte. Each letter is assigned an integer (ASCII representation). The computer first converts the letter to its respective integer value, then stores the integer.
For example the letter “A” has an integer value of 65, which is represented as 01000001 in binary. The word “hello” is represented as 104, 101, 108, 108, 111 which translates to 01101000 01100101 01101100 01101100 01101111 in binary.
This process is called encoding.
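The “hello” example can be reproduced with a few lines of Python. This is a minimal sketch using the built-in ord, format, and str.encode; the variable names are illustrative.

```python
text = "hello"

# Step 1: each letter becomes its integer value.
codes = [ord(ch) for ch in text]
print(codes)

# Step 2: each integer is stored as 8 bits.
bits = [format(code, "08b") for code in codes]
print(" ".join(bits))

# str.encode does both steps at once and returns the raw bytes.
encoded = text.encode("ascii")
decoded = encoded.decode("ascii")
print(list(encoded), decoded)
```

Decoding is simply the reverse: the bytes are read back as integers and mapped to their letters.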
ASCII is just one of several methods for encoding text. ASCII is great for English, but we run into issues with other languages, such as Japanese, which can have thousands of characters.
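Another common standard is Unicode. Unicode attempts to assign every character of every language a number, called a code point. Unicode itself does not fix how those numbers are stored in memory. That is the job of encodings such as UTF-16, which uses two bytes for most common characters and four bytes for the rest, allowing it to represent a much broader range of characters than ASCII.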
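The final common encoding standard is UTF-8, which stores Unicode code points using one to four bytes per character. The first 128 code points are encoded exactly as in ASCII, so plain ASCII text is already valid UTF-8. A UTF-8 file may optionally begin with three additional bytes (EF BB BF), referred to as the Byte Order Mark (BOM). In UTF-16 the BOM indicates the endianness of the data. UTF-8 has no byte-order ambiguity, so there the BOM only marks the file as UTF-8.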
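For a concrete look at UTF-8, the sketch below encodes a few sample characters with Python's built-in encoder and counts the resulting bytes. UTF-8 is a variable-length encoding, so the count differs per character. The sample characters and dictionary keys are my own choices. The last lines show the optional three-byte mark via codecs.BOM_UTF8 and the "utf-8-sig" codec.

```python
import codecs

samples = {
    "A": "A",                  # Latin capital letter A
    "e-acute": "\u00e9",       # Latin small letter e with acute accent
    "hiragana a": "\u3042",    # Japanese hiragana letter a
    "emoji": "\U0001F600",     # grinning face
}

lengths = {}
for name, ch in samples.items():
    encoded = ch.encode("utf-8")
    lengths[name] = len(encoded)
    print(f"{name:<11} U+{ord(ch):04X} -> {encoded.hex()} ({len(encoded)} bytes)")

# The optional mark at the start of a UTF-8 file.
bom = codecs.BOM_UTF8
with_bom = "A".encode("utf-8-sig")
print(bom.hex(), with_bom.hex())
```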
How does the computer know what to do with all this data?
The simple answer is it doesn’t. A computer just knows 1 and 0, on and off. The actual translation falls in the hands of the compiler, but this is an entirely different topic ;)
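To illustrate that the bits themselves carry no meaning, here is a minimal sketch that reads the same four bytes as text, as an integer in two byte orders, and as a floating-point number, using Python's standard struct module. The byte values are arbitrary and chosen only for the demonstration.

```python
import struct

# Four arbitrary bytes: one 32-bit word.
raw = bytes([0x41, 0x42, 0x43, 0x44])

as_text = raw.decode("ascii")              # read as four ASCII characters
as_uint_be = struct.unpack(">I", raw)[0]   # unsigned int, big-endian
as_uint_le = struct.unpack("<I", raw)[0]   # unsigned int, little-endian
as_float = struct.unpack(">f", raw)[0]     # 32-bit float, big-endian

print("text:           ", as_text)
print("uint big-endian:", as_uint_be)
print("uint little-end:", as_uint_le)
print("float:          ", as_float)
```

The memory never changes between the four lines. Only the instructions for reading it do.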
Sources
http://statmath.wu.ac.at/courses/data-analysis/itdtHTML/node55.html
https://stackoverflow.com/questions/21693685/how-are-different-types-stored-in-memory
https://softwareengineering.stackexchange.com/questions/291950/are-data-type-declarators-like-int-and-char-stored-in-ram-when-a-c-program-e
https://www.geeksforgeeks.org/data-structure-alignment/