@SafeVarargs and Heap Pollution

When should we use this annotation? How can we prevent heap pollution?

Whenever the compiler warns me about possible heap pollution, I just add this annotation, and most other programmers probably do the same. It can feel wrong to use something that might still be unsafe, and that may very well be the case. The explanations I found on the internet are not very useful, and they often ignore that Java does have flaws that should be addressed rather than ignored. In this blog post I give some examples and explanations on heap pollution and (un)safe variadic arguments, in the hope that they help you better understand the risks that come with passing arrays around.

What are Varargs?

Some methods accept a variable number of arguments. Such a method is called a variadic method (or variable arity method), and its last parameter is declared with an ellipsis (...).

Here’s a simple example of a method that accepts any number of Strings:

private static void varargsExample(String... strings) {
    for (String string : strings) {
        System.out.println(string);
    }
}

More about this: Java SE 8 Documentation > Varargs

Note that Java compiles the method to accept an array: a parameter declared as String... is really a String[]. So this whole topic is about how Java handles array types.
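To make the point about arrays concrete, here is a minimal sketch. The class and method names are made up for this illustration. It checks the runtime type of the varargs parameter and shows that an existing array is handed over as is, not copied.

```java
// Hypothetical demo class; all names are assumptions made for this sketch.
class VarargsAsArrayDemo {

    // Returns the runtime class of the array that backs the varargs parameter.
    static Class<?> runtimeType(String... strings) {
        return strings.getClass();
    }

    // Returns the very array the method received, to show whether it was copied.
    static String[] sameArray(String... strings) {
        return strings;
    }

    public static void main(String[] args) {
        // Listing the values lets the compiler build the String[] for us.
        System.out.println(runtimeType("a", "b").getSimpleName()); // String[]

        // Passing an existing array hands over that array itself, not a copy.
        String[] existing = { "x", "y" };
        System.out.println(sameArray(existing) == existing); // true
    }
}
```

The second call is the one that matters for the rest of this post: a variadic method may be working on an array that the caller still holds a reference to.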

What is heap pollution?

In Java the heap is the memory area used for the dynamic allocation of objects at runtime. The stack is separate: it holds local variables, which are either primitive values or references to objects on the heap. Because Java is mostly object-oriented, most of a program's data lives on the heap.

The heap can hold objects of all types, so the name “heap pollution” is a bit confusing. It’s not that some objects shouldn’t be in the heap space. They are there anyway. The actual pollution happens when a data structure, such as a list or an array, contains objects that are not of the expected type. Or more generally, whenever access to some data on the heap produces a reference to an object that is not of the expected type.
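As a rough illustration of that last sentence, the following sketch pollutes a List<String> through a raw reference. The raw type is only used here because it is the shortest way I know to get a wrong element into a list; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; all names are assumptions made for this sketch.
class HeapPollutionDemo {

    // Adds an Integer to a List<String> through a raw reference.
    @SuppressWarnings({ "rawtypes", "unchecked" })
    static List<String> pollutedList() {
        List<String> strings = new ArrayList<>();
        List raw = strings; // raw type: the element type is no longer checked
        raw.add(42);        // compiles, normally with an unchecked warning
        return strings;
    }

    // Reports what happens when the polluted element is read as a String.
    static String readFirst(List<String> strings) {
        try {
            String s = strings.get(0); // the compiler-inserted cast fails here
            return "read " + s;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(readFirst(pollutedList())); // ClassCastException
    }
}
```

Note that the failure shows up where the element is read, not where it was inserted.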

Here’s a simple example that the Java compiler will prevent:

List<CharSequence> list = new LinkedList<String>(); // Type mismatch!

The Java compiler will tell you that it “cannot convert from LinkedList<String> to List<CharSequence>”. If this were allowed, you would be able to add a StringBuffer to the list through the List<CharSequence> reference, although the underlying list may only contain instances of String. A StringBuffer is a CharSequence, but not a String. You can use List<? extends CharSequence> instead, but you still can’t add a StringBuffer, because the compiler doesn’t know which element type the list actually accepts. The Java compiler successfully prevents this type of heap pollution.

List<? extends CharSequence> list = new LinkedList<String>(); // ok
list.add(new StringBuffer()); // not applicable

Another case of heap pollution is prevented: the compiler tells you that the method “is not applicable for the arguments (StringBuffer)”. I won’t go into the details of the Liskov substitution principle (LSP) here, because I want to focus on varargs and heap pollution. But to understand why the method Collection.add(E e) of a List<? extends CharSequence> (i.e. the generic type parameter <E> is ? extends CharSequence) can’t accept a StringBuffer even though StringBuffer implements CharSequence, you must understand LSP first.
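What the wildcard does allow is reading. A small sketch, with a made-up method name, of a method that accepts lists of any CharSequence subtype because it never writes to them:

```java
import java.util.List;

// Hypothetical demo class; all names are assumptions made for this sketch.
class WildcardReadDemo {

    // Accepts a list of any CharSequence subtype, but only reads from it.
    static int totalLength(List<? extends CharSequence> list) {
        int total = 0;
        for (CharSequence cs : list) {
            total += cs.length();
        }
        // list.add(new StringBuffer()); // would not compile: element type unknown
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalLength(List.of("ab", "cde")));             // 5
        System.out.println(totalLength(List.of(new StringBuffer("xyz")))); // 3
    }
}
```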

Here is another example. This time the Java compiler does not detect the problem: an Integer[] is accepted wherever a Number[] is expected, even though it cannot stand in for one as soon as the code writes to it, which is what the Liskov substitution principle would require:

// This is where the polluted heap causes an Exception.
public static void main(String[] args) {
	Number[] numbers = new Integer[] { -8, 2, 42 };
	increment(numbers); // throws ArrayStoreException
	System.out.println(Arrays.toString(numbers));
}

// This method fails to check that the array
// actually accepts Double values.
static void increment(Number... numbers) {
	for (int i = 0; i < numbers.length; i++)
		numbers[i] = numbers[i].doubleValue() + 1;
}

For some reason there is no compiler error or even a warning when the code stores a value of a certain type (java.lang.Double) into an array that might not allow it. The heap pollution happens on the very first line of main, when a reference of type Number[] points to an Integer[]. You may think that this should be ok because every Integer is a Number. That is true, but a Number[] should accept any object whose class extends the abstract class Number. An Integer[] doesn’t allow that: you can’t store a Double in the referenced Integer[].

This post is about varargs, but those are just arrays, and arrays aren’t type-safe in Java. That’s why we get the heap pollution. Arrays are covariant: an array of a subtype can be assigned to any variable of a supertype array (including parameters of methods and constructors). The following code doesn’t produce any errors or warnings during compilation; we only get an exception at runtime:

Object[] objects = new Integer[] { 1, 2, 3 }; // not even a warning
objects[0] = new Object(); // java.lang.ArrayStoreException

As you can see, the problem is only about how the Java compiler checks the types of arrays that are assigned to a variable. It’s not about methods, generics, or varargs.
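The difference between the static type of the variable and the runtime type of the array can be made visible with a few lines. This is only a sketch with hypothetical names. The helper catches the exception so that the demo runs to the end.

```java
// Hypothetical demo class; all names are assumptions made for this sketch.
class ArrayCovarianceDemo {

    // Tries to store the value and reports whether the runtime check allowed it.
    static boolean tryStore(Object[] target, Object value) {
        try {
            target[0] = value;
            return true;
        } catch (ArrayStoreException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Object[] objects = new Integer[] { 1, 2, 3 };
        // The static type is Object[], the runtime type is still Integer[].
        System.out.println(objects.getClass().getSimpleName()); // Integer[]
        System.out.println(tryStore(objects, 4));               // true
        System.out.println(tryStore(objects, "four"));          // false
    }
}
```

So the JVM does check every store into an array of references, but only at runtime, which is too late for the compiler to help.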

But let’s have a look at some examples that use generics and varargs. Since Java 8 we have java.util.Optional<T>, which is a great alternative to null. If we alter the increment() method from above to accept Optional<? extends Number>... instead of just Number..., the compiler suddenly warns about possible heap pollution and we have to add @SafeVarargs to silence it:

public static void main(String[] args) {
	Optional<Integer>[] numbers = getIntegers();
	increment(numbers); // will contain Optional<Double>
}

@SafeVarargs
static void increment(Optional<? extends Number>... numbers) {
	for (int i = 0; i < numbers.length; i++)
		numbers[i] = numbers[i].map(n -> n.doubleValue() + 1);
}

Note that the heap pollution is somewhat different here. We can now only put an Optional into the array, and that’s what the method does. There is no ArrayStoreException, because at runtime the array only knows that its elements are Optionals; the type argument is erased. However, the method still pollutes the heap, because the caller’s Optional<Integer>[] now contains an Optional<Double>. Writing through a parameter of type Optional<? extends Number>... should arguably not compile, but the Java compiler allows it.
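Here is a runnable sketch of that scenario. The getIntegers() helper is not shown in the example above, so the version below is my own stand-in, and readFirst() is a hypothetical helper that shows where the pollution finally surfaces.

```java
import java.util.Optional;

// Hypothetical demo class; getIntegers() and readFirst() are assumptions.
class OptionalVarargsDemo {

    // Stand-in for the helper used in the post.
    @SuppressWarnings("unchecked")
    static Optional<Integer>[] getIntegers() {
        // Generic arrays cannot be created directly, hence the unchecked cast.
        return (Optional<Integer>[]) new Optional<?>[] { Optional.of(1), Optional.of(2) };
    }

    @SafeVarargs // silences the warning, but the method is not actually safe
    static void increment(Optional<? extends Number>... numbers) {
        for (int i = 0; i < numbers.length; i++)
            numbers[i] = numbers[i].map(n -> n.doubleValue() + 1);
    }

    // Reads the first element the way the caller expects to be able to.
    static String readFirst(Optional<Integer>[] numbers) {
        try {
            Integer first = numbers[0].get(); // the cast to Integer is checked here
            return "read " + first;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        Optional<Integer>[] numbers = getIntegers();
        increment(numbers); // no ArrayStoreException: at runtime this is an Optional[]
        System.out.println(numbers[0]);         // Optional[2.0]
        System.out.println(readFirst(numbers)); // ClassCastException
    }
}
```

The store goes through silently, and the caller only finds out once a value is unwrapped as an Integer, possibly far away from the call to increment().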

If we simply use a List instead of an array, we do not get this problem:

public static void main(String[] args) {
	List<Optional<Integer>> integers = getIntegers();
	increment(integers); // not applicable!
	List<Optional<? extends Number>> numbers = getNumbers();
	increment(numbers); // ok
}
static void increment(List<Optional<? extends Number>> numbers) {
	for (int i = 0; i < numbers.size(); i++)
		numbers.set(i, numbers.get(i).map(n -> n.doubleValue() + 1));
}

The reason is simply that the Java compiler checks the element types of generic collections exactly, so you have to pass the correct type, which is List<Optional<? extends Number>>. The problem in the previous example comes from the use of an array, not from the use of varargs or generics.
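For completeness, a runnable sketch of the list-based variant. The list contents are invented for the demo, since getNumbers() is not shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical demo class; the sample values are assumptions.
class ListIncrementDemo {

    static void increment(List<Optional<? extends Number>> numbers) {
        for (int i = 0; i < numbers.size(); i++)
            numbers.set(i, numbers.get(i).map(n -> n.doubleValue() + 1));
    }

    public static void main(String[] args) {
        // The declared element type already admits any kind of Number.
        List<Optional<? extends Number>> numbers = new ArrayList<>();
        numbers.add(Optional.of(1));
        numbers.add(Optional.of(2.5));
        numbers.add(Optional.empty());
        increment(numbers);
        System.out.println(numbers); // [Optional[2.0], Optional[3.5], Optional.empty]

        // List<Optional<Integer>> integers = new ArrayList<>();
        // increment(integers); // rejected at compile time
    }
}
```

Whoever creates the list has to declare up front that it may hold any Number, so nobody is surprised when a Double shows up.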

Maybe in a future release of Java we will get new array types, such as Array<T> and even ReadonlyArray<T>, and arrays that have a maximum size larger than Integer.MAX_VALUE (i.e. the type of length would be long). Then we would get Array<? extends Number> as the actual type of Number... and all old code that didn’t already treat it as that would have to be corrected.

Here’s the example you can find in the Javadocs of @SafeVarargs:

 @SafeVarargs // Not actually safe!
 static void m(List<String>... stringLists) {
   Object[] array = stringLists;
   List<Integer> tmpList = Arrays.asList(42);
   array[0] = tmpList; // Semantically invalid, but compiles without warnings
   String s = stringLists[0].get(0); // Oh no, ClassCastException at runtime!
 }

But the use of varargs and generics is completely irrelevant here. This version doesn’t use either and still isn’t safe:

 static void m(String[][] stringArrays) {
   Object[] array = stringArrays;
   Integer[] tmpArray = { 42 };
   array[0] = tmpArray; // Oh no, ArrayStoreException
   String s = stringArrays[0][0]; // This would also not work
 }

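As you can see, varargs and generics do not add much to the problem. Arrays aren’t type-safe in Java and so heap pollution is possible. The exception is different: with String[][] the JVM rejects the store right away with an ArrayStoreException, while with List<String>... the store succeeds and the ClassCastException only shows up later, when the element is read. Either way you end up with all kinds of problems when you accept some array and write to it.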
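The two versions of m can be put side by side in one runnable sketch. I renamed the methods and let them return a description of what happened instead of throwing; those names and return values are my own additions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; method names and return strings are assumptions.
class FailureModesDemo {

    // Array-only version: the JVM knows the elements must be String[].
    static String storeIntoStringArrays(String[][] stringArrays) {
        Object[] array = stringArrays;
        try {
            array[0] = new Integer[] { 42 };
            return "stored";
        } catch (ArrayStoreException e) {
            return "ArrayStoreException on write";
        }
    }

    // Generic version: at runtime the array only knows that it holds Lists.
    @SafeVarargs // not actually safe, mirrors the Javadoc example
    static String storeIntoStringLists(List<String>... stringLists) {
        Object[] array = stringLists;
        array[0] = Arrays.asList(42); // succeeds, the array is polluted now
        try {
            String s = stringLists[0].get(0);
            return "read " + s;
        } catch (ClassCastException e) {
            return "ClassCastException on read";
        }
    }

    public static void main(String[] args) {
        // prints: ArrayStoreException on write
        System.out.println(storeIntoStringArrays(new String[][] { { "a" } }));
        // prints: ClassCastException on read
        System.out.println(storeIntoStringLists(Arrays.asList("a")));
    }
}
```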

Here is a more useful example of a method that seems to be type-safe and shouldn’t pollute the heap, but does it anyway:

public static void main(String[] args) {
	Integer[] numbers = { 5, null, 74, -3 };
	replaceNulls(Double.NaN, numbers);
	System.out.println(Arrays.toString(numbers));
}

static <T> void replaceNulls(T value, T[] data) {
	for (int i = 0; i < data.length; i++)
		if (data[i] == null)
			data[i] = value;
}

This happens when you forget that in Java the value “Not a number” (Double.NaN) is a Double, not an Integer. The mistake is easy to make and the Java compiler says nothing, not even a warning. At runtime the assignment to data[i] throws an ArrayStoreException.

Note that the compiler infers a common supertype of Integer and Double for T, essentially Number, because both classes extend that abstract class. The method is therefore called as if it accepted a Number[], but Java allows us to pass an Integer[] instead. It really shouldn’t allow that. With the non-generic signature replaceNulls(Object value, Object[] data) there would be nothing to infer, and the method would be just as unsafe, because everything is an Object (primitive integers get autoboxed to Integer, which extends Object) and the compiler wouldn’t warn you about it either.
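One way to see the inference at work is to take it away. In the sketch below, which uses a hypothetical class name and helper, the inferred call fails at runtime, while an explicit type argument would turn the same mistake into a compile-time error.

```java
import java.util.Arrays;

// Hypothetical demo class; inferredCall() is an assumption made for this sketch.
class ReplaceNullsDemo {

    static <T> void replaceNulls(T value, T[] data) {
        for (int i = 0; i < data.length; i++)
            if (data[i] == null)
                data[i] = value;
    }

    // Runs the inferred call from the post and reports how it ends.
    static String inferredCall(Integer[] numbers) {
        try {
            replaceNulls(Double.NaN, numbers); // T becomes a supertype of both
            return "ok";
        } catch (ArrayStoreException e) {
            return "ArrayStoreException";
        }
    }

    public static void main(String[] args) {
        Integer[] numbers = { 5, null, 74, -3 };
        System.out.println(inferredCall(numbers)); // ArrayStoreException

        // With an explicit type argument the compiler rejects the Double:
        // ReplaceNullsDemo.<Integer>replaceNulls(Double.NaN, numbers); // does not compile
        ReplaceNullsDemo.<Integer>replaceNulls(0, numbers);
        System.out.println(Arrays.toString(numbers)); // [5, 0, 74, -3]
    }
}
```

Spelling out the type argument is clumsy, so I would not expect anyone to do it routinely, but it shows that the compiler has all the information it needs.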

Now let’s see what happens when we make it accept a generic variable arity parameter:

static <T> T[] replaceNulls(T value, T... data) {
	for (int i = 0; i < data.length; i++)
		if (data[i] == null)
			data[i] = value;
	return data;
}

Now we can use it like this:

var numbers = replaceNulls(Double.NaN, 5, null, 74, -3);

All that was changed is that T[] was replaced by T... and that we return the same array. But now we get the warning: Type safety: Potential heap pollution via varargs parameter data

But the problem already exists without the variable arity parameter, even when using Object[] as the type of data. It’s not really about generics or varargs. The method signature is almost the same; the only difference is that callers can list the elements instead of passing an array, and then use the returned array. This is purely for convenience and works well, because the Java compiler is quite good at figuring out the types for you. Java infers that the type of var numbers is essentially Number[], and if you try to pass it to a method that only accepts Integer[], you get an error.
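The following sketch runs that call and prints the component type of the array the compiler created. If the inference works as described, that should be Number, but I left the expected output off that line because it depends on what the compiler infers. The class name is hypothetical and the annotation is only there to keep the sketch free of warnings.

```java
import java.util.Arrays;

// Hypothetical demo class; the name is an assumption made for this sketch.
class VarargsInferenceDemo {

    @SafeVarargs // silences the warning; the method still writes to the array
    static <T> T[] replaceNulls(T value, T... data) {
        for (int i = 0; i < data.length; i++)
            if (data[i] == null)
                data[i] = value;
        return data;
    }

    public static void main(String[] args) {
        // The compiler creates the array, using a common supertype of all values.
        var numbers = replaceNulls(Double.NaN, 5, null, 74, -3);
        System.out.println(Arrays.toString(numbers)); // [5, NaN, 74, -3]
        System.out.println(numbers.getClass().getComponentType());

        // Integer[] integers = numbers; // does not compile
    }
}
```

Because the array is created for this one call and typed to fit all listed values, storing the Double works here, unlike in the version that received an existing Integer[].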

To further improve the robustness of this method you can use reflection to check that the type of the array is compatible with that of the value:

@SafeVarargs
static <T> T[] replaceNulls(T value, T... data) {
	var componentType = data.getClass().getComponentType();
	var valueType = value.getClass();
	if (!componentType.isAssignableFrom(valueType))
		throw new IllegalArgumentException(
				MessageFormat.format("Given {0}[] would not accept {1}", componentType, valueType));
	for (int i = 0; i < data.length; i++)
		if (data[i] == null)
			data[i] = value;
	return data;
}

It will throw an exception with a message like this:
Given class java.lang.Integer[] would not accept class java.lang.Double

Note that this can still lead to programming mistakes when someone passes an array and expects that the method creates and returns a copy instead of altering the existing array.

Conclusion

In my opinion the @SafeVarargs annotation doesn’t really help anyone. I don’t know of any equivalent in any other programming language. If immutable arrays are ever introduced to Java, the type safety of arrays can improve, because you can’t pollute the heap through an array that cannot be modified. As long as you do not alter the arrays passed to methods, there can’t be any heap pollution. In other words: do not change the arrays that were passed to your method, because you never really know what the caller expects the array to contain.

  • Only read from the array passed to a method, never write to it.
  • Only use variadic methods when they are actually supposed to be used as such, i.e. when the values are supposed to be explicitly listed in the method call.
  • Use other (generic) data structures instead, when the method is supposed to write to the given data structure.
  • Consider creating a copy of the array / data structure for even more robust programming.
  • Use the newest long-term support release of Java, which might have better solutions to deal with such problems.
  • Use some tools that are much better at detecting such problems than the Java compiler, such as SpotBugs.
  • Understand that the use of generics / varargs is not actually the cause of such problems.
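To illustrate the first and the fourth point, here is one possible variant of replaceNulls that only reads from the array and returns a new, unmodifiable list. This is just a sketch; the method name withNullsReplaced and the choice of List as return type are my own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; withNullsReplaced() is an assumption.
class ReadOnlyVarargsDemo {

    // Reads the varargs array but never writes to it or hands it out.
    @SafeVarargs
    static <T> List<T> withNullsReplaced(T value, T... data) {
        List<T> result = new ArrayList<>(data.length);
        for (T element : data) {
            result.add(element == null ? value : element);
        }
        return Collections.unmodifiableList(result);
    }

    public static void main(String[] args) {
        Integer[] numbers = { 5, null, 74, -3 };
        List<Number> replaced = withNullsReplaced(Double.NaN, numbers);
        System.out.println(replaced);                        // [5, NaN, 74, -3]
        System.out.println(java.util.Arrays.toString(numbers)); // [5, null, 74, -3]
    }
}
```

The caller’s Integer[] is left alone, and the Double ends up in a List<Number>, where it is allowed to be.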
