How programmers count isn’t really that different from how anyone else does it. But there are some common misconceptions about counting and numbering.
Counting is the process of determining the number of elements of a (usually finite) set of items. Sometimes it isn’t even clear how that should be done. In some East Asian regions it was common to use a different age system, in which a baby is one year old right after birth. Nowadays it is common to round down: during your first year you are zero years old, and then during your second year you are one year old.
Counting usually starts at zero. Numbering usually starts at one.
If possible, we don’t actually count the items. Often we can simply calculate the number of items in a set. This works well if the set is actually some sequence with a known beginning and end, such as a distance or a duration. We know that a century has 100 years; there is no need to count them.
Another example of one-based counting is the Gregorian calendar, where the year 1 BC is followed directly by AD 1. Year 0 doesn’t exist. But would we still do it like that today? Do you often count from −1 to 1, skipping 0? Counting years is really about the number of items in a set that represents a duration. The number of years from the beginning of AD 2000 to the beginning of AD 2025 is simply 2025 − 2000 = 25. The number of years from the beginning of 1 BC to the beginning of AD 2 is two, because that span contains only the years 1 BC and AD 1. But now you can’t calculate it as a simple (absolute) difference. There is a year missing. You actually have to count the years, or handle the special case in the calculation.
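The special case can be made concrete in code. Here is a sketch in Java, assuming years are written as integers with BC years negative (so 1 BC is −1) and no year 0:

```java
// Elapsed years between the beginnings of two calendar years,
// where BC years are negative and the year 0 does not exist.
static int yearsBetween(int from, int to) {
    int span = Math.abs(to - from);
    // Special case: a span that crosses the BC/AD boundary
    // skips the nonexistent year 0, so it is one year shorter.
    if ((from < 0) != (to < 0)) {
        span -= 1;
    }
    return span;
}
```

With that, yearsBetween(2000, 2025) yields 25, while yearsBetween(-1, 2), i.e. from the beginning of 1 BC to the beginning of AD 2, yields 2 rather than 3.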
Programmers avoid special cases, such as skipping numbers. Like most people, we count from zero, because the smallest possible number of elements in a set is zero. A set is not ordered, so the number of years in a duration that starts in the future and ends in the past is still a non-negative number. Count the seconds from some point in time to that same point in time and you get zero seconds.
One reason for the confusion is the idea that if the set is empty, you don’t count anything at all; only if the set has at least one element do you actually count the items, and in that case you start counting at 1. But it’s easier to initialise your count at zero and then count each item until you have counted all of them. When the counting stops, you have the correct result. So yes, computers are usually programmed to count from zero. A special check for an empty set would only mean more processing for the computer. To the human mind, though, it may seem more natural to “not count” when there is nothing and to “start counting at one” when there is something.
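In code, that is exactly what a counting loop does. A minimal sketch in Java: the count is initialised at zero, and an empty collection needs no special treatment:

```java
// Initialise the count at zero, then add one per item.
// For an empty collection the loop body never runs and the result is 0.
static int countItems(Iterable<?> items) {
    int count = 0;
    for (Object item : items) {
        count++;
    }
    return count;
}
```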
The other reason there is so much confusion about this is that counting is not the same as numbering, but the two get mixed up all the time, sometimes even by programmers. When you count the years to get a person’s age, you probably start at zero. But that person is zero years old during their first year of life. Wouldn’t it be easier if we generally just used the count, always starting at zero, and only used the label from numbering when we want to output a reference? That is often how it works in programming. The age is the number of full years that have passed since birth. That’s something you can count or calculate. When you say “it’s my 5th year living in this city”, you are not giving an amount. It’s a label you got from numbering the years, starting at one, when you arrived in that city. Those are not the same.
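The difference shows up directly with Java’s java.time API. A small sketch, using a hypothetical birth date:

```java
import java.time.LocalDate;
import java.time.Period;

// Age is a count of completed years, starting at zero;
// the "nth year of life" is a one-based label derived from it.
LocalDate birth = LocalDate.of(2020, 6, 15); // hypothetical birth date
LocalDate today = LocalDate.of(2025, 3, 1);
int age = Period.between(birth, today).getYears(); // 4 completed years
int yearOfLife = age + 1;                          // currently the 5th year of life
```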
A set has no order and doesn’t allow duplicates. If it really is a set, then there is no “first item”. If there is an order, you can, of course, still count the items, and you probably start counting at the first item if it exists. In practice, though, it’s almost always a list that the computer holds in memory (i.e. it is ordered, and duplicates have to be filtered out somehow). The main reason is performance: memory access is much faster when the items of a set are stored in sequence. That’s an oversimplification, but you have to use some concrete data structure to implement the abstract idea of a “set”, and that means some item has to sit at the beginning of that data structure. You can count the items of anything that contains items. Unless the list is empty, some item is the first in iteration order, i.e. when the computer starts counting, some item is counted first. It sits at position 0, which means its index (the offset from the beginning of the list) is zero. The valid offsets of a sequence run from zero (inclusive) to length (exclusive). Starting at zero, you iterate over the items as long as the offset is less than the length. If the length is zero, you don’t get any item, because 0 is already equal to (i.e. not less than) the length. In any other case, the first item is at the memory location of the list plus offset 0. The next one, if it exists, is at the memory location of the list plus offset 1. And so on.
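The offset rule above can be seen in a plain loop. A sketch in Java:

```java
// Valid offsets run from 0 (inclusive) to length (exclusive).
String[] items = {"a", "b", "c"};
int counted = 0;
for (int offset = 0; offset < items.length; offset++) {
    // items[offset] is the (offset + 1)-th item in iteration order
    counted++;
}
// For an empty array, 0 < 0 is already false, so the body never runs.
```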
That means that if you know the length, you don’t have to count. However, often the length isn’t fixed. In other words, the “length” is really the size of the memory block: it defines the maximum length of the list, and some of the slots might be left empty. Then you do have to count the non-empty items. For that you set the result to zero and the offset to zero as well. As long as the offset is less than the length, you check whether the item is empty. If it is empty, you stop the loop. If it is not empty, you increment the result by one and continue. Once the loop is finished, you return the result.
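That procedure can be sketched directly, assuming a fixed-capacity array where empty slots are null and all used slots come first (a common convention, and an assumption here):

```java
// Count the used slots in a fixed-capacity buffer.
// Assumption: empty slots are null and used slots are contiguous at the front.
static int countNonEmpty(Object[] slots) {
    int result = 0;
    for (int offset = 0; offset < slots.length; offset++) {
        if (slots[offset] == null) {
            break; // the first empty slot ends the list
        }
        result++;
    }
    return result;
}
```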
Modern programming is often done in high-level languages that can easily count items. Most programmers don’t even write the loop themselves. To count the customers that are below 18 years old, you simply write something like this:
long underage = customers.stream().filter(person -> person.age() < 18).count();
Writing code that returns zero for the case that “customers” is empty and in any other case starts counting at one rarely makes sense.
To get a specific element of such a list we use the index (= offset):
Customer fifth = customers.get(4); // returns the 5th element (index 4)
So the indexing starts at zero. Exceptions are rare. With regular expressions, the capture groups use one-based numbering. But that doesn’t mean you can’t use zero: group zero gives you the complete match, so the scheme really is zero-based and the explicit groups simply start at one. One-based indexing is getting rare in programming. It mostly causes confusion when used in APIs and makes programming unnecessarily complicated. Since zero-based indexing is often used where one-based numbering would also work, it may appear as if programmers generally preferred to start numbering things at a “zeroth” element. But that zero is really just the offset/index of the element in a sequence. The first element is at index 0.
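In Java’s regex API this looks as follows: group(0) is the complete match, and the explicit capture groups are numbered from 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Matcher m = Pattern.compile("(\\d+)-(\\d+)").matcher("42-7");
if (m.matches()) {
    System.out.println(m.group(0)); // "42-7" (the complete match)
    System.out.println(m.group(1)); // "42"   (first explicit group)
    System.out.println(m.group(2)); // "7"    (second explicit group)
}
```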
Numbering still usually starts at 1, even for programmers. Most programs use one-based numbering when presenting numbered lists to the user, and programmers are also users of such applications. While debugging, it’s important to see on which line of code the problem occurs, and all editors I know start numbering the lines at 1. A log output of CustomerService.java:42 means that the problem occurred on the 42nd line of a file named CustomerService.java.
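The typical pattern, then: iterate with the zero-based index internally, and add one only when presenting the number to the user. A small sketch:

```java
import java.util.List;

List<String> lines = List.of("first line", "second line", "third line");
StringBuilder out = new StringBuilder();
for (int i = 0; i < lines.size(); i++) {
    // i is the zero-based index; i + 1 is the one-based number shown to the user
    out.append(i + 1).append(": ").append(lines.get(i)).append('\n');
}
System.out.print(out);
```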