Scala collections and equality
Scala collections follow a simple set of general equality rules[1]. Two collections are equal if:
- they are in the same overall category (
Seq
,Set
, orMap
) - they contain the same elements (as defined by
==
) - for sequences, the elements are in the same order
This means the following:
scala> List(1, 2, 3) == Vector(1, 2, 3)
res0: Boolean = true
scala> List(1, 2, 3) == Set(1, 2, 3)
res1: Boolean = false
scala> Set(1, 2, 3) == collection.mutable.LinkedHashSet(3, 1, 2)
res2: Boolean = true
Arrays don’t behave
The gotcha is that these rules break down with arrays:
scala> List(1, 2, 3) == Array(1, 2, 3)
res3: Boolean = false
And even:
scala> Array(1, 2, 3) == Array(1, 2, 3)
res4: Boolean = false
Why is this? The answer is that arrays in Java are treated specially all the way down to the JVM, and Scala arrays are just plain Java arrays. It is not possible to extend a JVM array, and this means in particular that it is not possible to override equals
on arrays. Since native array eqality via equals
does not compare array content, arrays are left to behave differently from Scala collections.
Solutions
So how do you go about comparing arrays in a way compatible with other Scala collections? First, there is an implicit conversion from Array
to collection.mutable.WrappedArray
, which is a Seq
, so you can write:
scala> (Array(1, 2, 3): WrappedArray[Int]) == (Array(1, 2, 3): WrappedArray[Int])
res5: Boolean = true
Or even shorter:
scala> (Array(1, 2, 3): Seq[Int]) == (Array(1, 2, 3): Seq[Int])
res6: Boolean = true
Of course this also works if you pass an array to anything which expects a Seq
:
scala> def sameStuff(s1: Seq[Int], s2: Seq[Int]) = s1 == s2
foo: (s1: Seq[Int], s2: Seq[Int])Boolean
scala> sameStuff(Array(1, 2, 3), Array(1, 2, 3))
res7: Boolean = true
From a memory perspective, this is not too bad because WrappedArray
doesn’t make an expensive copy of the collection and instead just wraps the array.[2] But there is boxing going on with WrappedArray
because it’s not a value class [3] (or rather concrete implementations of it are not value classes). Value classes cannot override equality[4], and a WrappedArray
couldn’t be a proper Seq
if that were the case.[5]
There is another way to compare array content with sameElements
:
scala> Array(1, 2, 3) sameElements Array(1, 2, 3)
res8: Boolean = true
The benefit is that this more explicitly states the intent, and you don’t need to convert the arrays to Seq
via type declarations.
Nested arrays
Both solutions above only work if arrays are not nested. Consider:
scala> Array(Array(11), Array(21, 22)) sameElements Array(Array(11), Array(21, 22))
res9: Boolean = false
The reason is that sameElements
(or ==
on WrappedArray
) just calls ==
on each array element. If those elements are arrays, Java array equality kicks in again (and we know that array content is not compared in that case).
Instead you can use the deep
method:
scala> Array(Array(11), Array(21, 22)).deep == Array(Array(11), Array(21, 22)).deep
res10: Boolean = true
The deep
method wraps arrays so that each access to an array element is first checked: if the element itself is an array, it is wrapped with WrappedArray
first.[6] This way equality recursively works (but only if each array is directly nested in an array!).
It is generally safe to use deep
instead of sameElements
or a conversion to WrappedArray
, but the implementation of deep
requires a number of pattern matches, which are not known to be the fastest. If the arrays are known to be flat, the other approaches might be more efficient.
Case classes
Consider this case class:
scala> case class Foo(a: Array[Int])
defined class Foo
scala> Foo(Array(1)) == Foo(Array(1))
res11: Boolean = false
By now you know why this happens: the case class provides an implementation of ==
for you, but doesn’t treat arrays specially. If equality matters to you (and it probably should if you use case classes), it is better to write instead something like this:
scala> case class Foo(a: Seq[Int])
defined class Foo
scala> Foo(Array(1)) == Foo(Array(1))
res12: Boolean = true
Here the case class actually refers to a WrappedArray
, obtained via implicit conversion from the original array.
Words of wisdom
The equality issue suggests that it is wise to avoid arrays when possible. If you cannot avoid them, beware of the semantic of equality on them! But since arrays offer interoperability with Java and are compact, native data structures that offers performance benefits, it’s often hard to live without them entirely.
There are a number of StackOverflow questions that cover this topic as well.[7] Please let me know if I omitted anything important!
-
See The Scala 2.8 Collections API - Equality for details. ↩
-
For an
Array[Int]
, the Scala implementation looks like: ↩final class ofInt(val array: Array[Int]) extends WrappedArray[Int] { ... }
On the other hand it doesn’t seem like
WrappedArray
usesjava.util.Arrays.equals()
so the actual comparison performance might not be absolutely optimal (but I don’t have numbers). -
Value classes are new in Scala 2.10. ↩
-
“A value class… may not define a equals or hashCode method” ↩
-
On the other hand
ArrayOps
are value classes: ↩trait ArrayOps[T] extends Any with ...
This is ok because the purpose of
ArrayOps
is to provide extension methods on native arrays, and equality cannot be implemented as an extension method:scala> (Array(1, 2, 3): ArrayOps[Int]) == (Array(1, 2, 3): ArrayOps[Int]) res12: Boolean = false
-
I find
deep
a bit funny: it returns anIndexedSeq[Any]
instead of being generic. I am not sure why that is, but for the purpose of comparing array content it doesn’t matter. ↩ -
See: ↩