For beginners it is difficult to understand what a variable is. Variables are used in nearly all high-level languages and are therefore vital for a good understanding of programming.
Variables can have different meanings outside programming. Here I only describe variables in computer science.
A variable is a name (identifier) that is associated with a storage location (memory address).
But we should distinguish between the variable when you write code, when it is compiled, and when the program is running:
Variables in Source Code
When you write code you can use identifiers (in most languages a sequence of letters and other printable symbols that doesn’t start with a digit) to declare and use variables. The declaration defines the name (and possibly a type). You don’t give it a memory address, but the type might define how much memory is required for that variable. A declaration of a variable with the name “foo” and the type “int” looks like this in Java:
int foo;
In Java, “int” is a type that requires 4 bytes (32 bits). We say that this is one variable. We can have a second variable named “foo” as long as it’s not in the same scope. At runtime, however, this one variable might exist many times (for example, once for every execution of the method that declares it).
Then you can initialise the variable. You can also declare and initialise it in a single line, which doesn’t change anything. In most languages you must declare a variable before you can assign a value to it.
foo = 42;
So now, in our code, the variable has the value 42. We can say “foo is 42”. What we really mean is that in our code we use the identifier “foo” to refer to the memory location that is initialised to store the value 42. But we are not running that code yet, so there’s no 42 stored anywhere while we write the code.
Then you can alter that value. The following increments whatever value is stored at the memory location associated with the identifier “foo” by one:
foo++;
Maybe this line is in a loop or in a method that is called repeatedly. We can use the same identifier to read the current value.
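To put the steps above together, here is a minimal sketch in Java. The class name CounterDemo and the method countUp are my own, made up only for this illustration.

```java
class CounterDemo {
    // Hypothetical helper: starts at a given value and increments it a number of times.
    static int countUp(int start, int times) {
        int foo;          // declaration: a name and a type, no value yet
        foo = start;      // initialisation: the first value is stored
        for (int i = 0; i < times; i++) {
            foo++;        // same identifier, but the stored value changes
        }
        return foo;       // reading the current value through the same identifier
    }

    public static void main(String[] args) {
        System.out.println(countUp(42, 3)); // prints 45
    }
}
```

The identifier “foo” appears four times in the source code, but it is still one variable.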
It’s important to use good variable names, but only so that other programmers understand the meaning (including yourself when you revisit that code years later).
Variables at Compile-Time
In most cases the compiler will look for all identifiers that declare a variable. It creates a list of all variables. Most languages also manage the type, scope and visibility of each variable and where it is initialised. When you access a variable that was not declared in this scope or one without proper visibility you get a compiler error.
For each variable it sees, the compiler must find some memory location that can be used. (Dynamic languages can do this at runtime instead.) Many variables are just local variables inside some method. In that case the stack can be used: the stack frame for that method holds data that is only used inside the method. Fields of objects, on the other hand, are part of that object, so their memory is in the heap space. In both cases the compiler only needs to know the offset from the beginning of the stack frame or the object data, because it can’t know where exactly that stack frame or object will be at runtime.
The compiler translates the code to machine code or intermediate code (e.g. bytecode for Java). The compiler can compile “foo = 42” to code that looks very similar, but it can’t use “foo”. The name usually doesn’t make it into the compiled code, apart from optional debug information. Instead the compiled code just uses some memory address or an offset from some pointer that will be known at runtime. Sometimes the compiler can just use a CPU register instead of storing the value at some location in RAM.
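The difference between a local variable and a field can be sketched like this. The names StorageDemo, Counter, count and step are hypothetical, and the comments describe the conceptual model only; a real runtime is free to optimise, for example by keeping a value in a register.

```java
class StorageDemo {
    static class Counter {
        int count;          // field: part of each Counter object, conceptually on the heap

        void increment() {
            int step = 1;   // local variable: lives in the stack frame of this call
            count += step;  // the field is found via an offset into this object
        }
    }

    public static void main(String[] args) {
        Counter a = new Counter();
        Counter b = new Counter();
        a.increment();
        a.increment();
        b.increment();
        // Each object has its own copy of the field "count".
        System.out.println(a.count + " " + b.count); // prints 2 1
    }
}
```

There is only one declaration of “count” in the source code, but every Counter object carries its own storage for it.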
Variables at Runtime
Once the program is running the compiled code is executed. Most variable names don’t exist any more. But a debugger can still map the values in memory to the names used in the source code, as long as the debug information is available. That’s one reason why debugging can be slower. In Java you can use reflection to access a public field (when that exists for some reason instead of a getter) by name, so in some languages, and especially in dynamic ones, some variable names still exist at runtime. But in most cases the memory addresses and offsets are used. And the CPU has registers that can be used.
Because local variables live in stack frames, we can easily have recursive methods. Such a method invokes itself from within its own code. So on each recursion we get a new stack frame. That’s how the same variable can exist more than once at the same time. This is no problem because the runtime only needs to know the offset from the beginning of the current stack frame. When a method returns, the runtime discards the current stack frame and continues with the stack frame of the caller.
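Accessing a field by its name at runtime could look roughly like the sketch below. It uses Java’s reflection API (java.lang.reflect.Field); the class Person with a public field “name” and the helper readField are assumptions made only for this example.

```java
import java.lang.reflect.Field;

class ReflectionDemo {
    // Hypothetical class with a public field, only for illustration.
    public static class Person {
        public String name;

        public Person(String name) {
            this.name = name;
        }
    }

    // Looks up a public field by its name at runtime and returns its current value.
    static Object readField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getField(fieldName);
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No accessible field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        Person person = new Person("Bob");
        System.out.println(readField(person, "name")); // prints Bob
    }
}
```

This works for fields because their names are stored in the class file. It does not give you the name of the local variable “person”.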
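A small recursive method shows how one declared variable exists several times at once. This is only a sketch; the names RecursionDemo, factorial and rest are my own.

```java
class RecursionDemo {
    static int factorial(int n) {
        if (n <= 1) {
            return 1; // deepest call: at this point "n" exists once per active call
        }
        int rest = factorial(n - 1); // the nested call gets a new stack frame with its own "n"
        return n * rest;             // this frame's "n" was not touched by the deeper calls
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // prints 120
    }
}
```

While factorial(5) is being evaluated, up to five stack frames are active, each with its own “n”, although the source code declares “n” only once.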
What Variables are not
A variable is not the value and it’s not some object. And the name of the variable is not part of the value or referenced object. You can do this:
Person bob = new Person("Jane");
Person mary = bob;
The compiler doesn’t care. During compilation “Person bob” is just the information that a reference to an object of type “Person” is needed. The name could be “_$”, “jane”, or “thisIsNotJane” instead. Only programmers care about variable names.
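Here is a runnable sketch of that idea. The Person class with getName and setName is an assumption of mine, since the class itself isn’t shown here.

```java
class AliasDemo {
    // Hypothetical Person class, only for illustration.
    static class Person {
        private String name;

        Person(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        void setName(String name) {
            this.name = name;
        }
    }

    public static void main(String[] args) {
        Person bob = new Person("Jane"); // the variable name says nothing about the object
        Person mary = bob;               // a second variable that refers to the same object

        mary.setName("Alice");           // changes the one object both variables refer to

        System.out.println(bob.getName()); // prints Alice
        System.out.println(bob == mary);   // prints true
    }
}
```

There are two variables but only one object, and neither “bob” nor “mary” is stored anywhere in that object.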
The name of a variable is not something you can use at runtime. If you need this information you must use a map (e.g. java.util.Map<String, Person> in Java) instead.
Map<String, Person> people = new TreeMap<>();
people.put("Bob", new Person("Bob"));
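As a complete program this could look roughly like the following. Again the Person class and the helper buildPeople are hypothetical and only there to make the sketch self-contained.

```java
import java.util.Map;
import java.util.TreeMap;

class PeopleByName {
    // Hypothetical Person class, only for illustration.
    static class Person {
        private final String name;

        Person(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Builds a map in which the names are real data and not variable names.
    static Map<String, Person> buildPeople() {
        Map<String, Person> people = new TreeMap<>();
        people.put("Bob", new Person("Bob"));
        people.put("Mary", new Person("Mary"));
        return people;
    }

    public static void main(String[] args) {
        Map<String, Person> people = buildPeople();
        Person found = people.get("Bob");             // look up a person by name at runtime
        System.out.println(found.getName());          // prints Bob
        System.out.println(people.containsKey("Jane")); // prints false
    }
}
```

Here the key "Bob" is a value that exists at runtime, so it can come from user input or a file, which a variable name cannot.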
In mathematics we also use variables, but for different reasons. When you have an equation you can solve for a variable, but its value doesn’t just change. However, it could depend on some other variable.
Let’s say y = x²+6. If x is 1 then y is 7. But when x is 6 then y is 42. Both are true at the same time. We don’t actually set x or y to a value. Instead we say that f(x) = x²+6. Then we evaluate f(6), which is f(6) = 6²+6 = 42.
This is quite different from programming in most languages (C, Java), where the values change depending on the code that is executed and other values in memory. However, you can also define a function static int f(int x) and have it return the value of y at the given x. In functional programming languages (such as Haskell) a variable is a name for some expression, which depends on input, but a variable’s value can’t be changed during runtime.
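In Java such a function for y = x²+6 could be written like this. It is a minimal sketch; the class name FunctionDemo is my own.

```java
class FunctionDemo {
    // Returns the value of y = x² + 6 at the given x. Nothing is changed in place.
    static int f(int x) {
        return x * x + 6;
    }

    public static void main(String[] args) {
        System.out.println(f(1)); // prints 7
        System.out.println(f(6)); // prints 42
    }
}
```

Calling f(6) doesn’t change what f(1) returns, which is much closer to the mathematical meaning of a variable.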