Dec 25, 2009

Sort by Multiple Keys in Groovy

kobo-commons@github

Problem (or limit) of the Groovy's original sort methods


You can sort collections, arrays, iterators and maps by a single key using only Groovy Core API, as follows:
assert [1, 2, 3, 4, 5] == [5, 3, 1, 4, 2].sort{ it }
It's very useful. I like this methods.

Now, when the following Person class and instances are given, it assumes that you want to sort some list of people ordered by lastName and familyName.
class Person {
    def lastName, familyName
}
def aa1 = new Person(lastName:'aa', familyName:1) // Are these strange names? Never mind!
def aa2 = new Person(lastName:'aa', familyName:2)
def b1  = new Person(lastName:'b',  familyName:1)
def b2  = new Person(lastName:'b',  familyName:2)
def c1  = new Person(lastName:'c',  familyName:1)
def c2  = new Person(lastName:'c',  familyName:2)
def c3  = new Person(lastName:'c',  familyName:3)

In the following sample, it seems to work as you expected.
def people = [b2, c2, c3, b1, c1]
assert [b1, b2, c1, c2, c3] == people.sort{ [ it.lastName, it.familyName ] }
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                          a list in closure has multiple keys
But it's just accidental.

See the following sample code. Do you think that Groovy’s original sort works as we expected?
// our expected result
def people = [c3, aa1, b2, c1, b1, aa2, c2]
assert [aa1, aa2, b1, b2, c1, c2, c3] == people.sort{ [ it.lastName, it.familyName ] }

The answer is NO. This code throws an AssertionException. Groovy’s sort method returns a list like the following:
// actual result.
def people = [c3, aa1, b2, c1, b1, aa2, c2]
assert [b1, b2, c1, c2, c3, aa1, aa2] == people.sort{ [ it.lastName, it.familyName ] }
                           ^^^^^^^^^^

The reason is as follows. The original sort logic is finally based on the value of Object#hashCode(), when the object doesn't implement a Comparable interface.
// groovy.util.OrderBy
public int compare(T object1, T object2) {
    for (Closure closure : closures) {
        Object value1 = closure.call(object1);
        Object value2 = closure.call(object2);
        if (value1 == value2) {
            continue;
        }
        if (value1 == null) {
            return -1;
        }
        if (value1 instanceof Comparable) {
            Comparable c1 = (Comparable) value1;
            int result = c1.compareTo(value2);
            if (result == 0) {
                continue;
            } else {
                return result;
            }
        }
        if (value1.equals(value2)) {
            continue;
        }
        return value1.hashCode() - value2.hashCode(); // hashCode!!!
    }
    return 0;
}
The value of "aa".hashCode() is larger than the value of "b".hashCode(), so you've gotten the above result.
This behavior confuses us very much.


Our Solution


We provide the sort methods which can sort rightly by multiple keys.

kobo-commons@github

Sample codes are as follows:
import org.jggug.kobo.commons.lang.CollectionUtils
CollectionUtils.extendMetaClass()
def people = [c3, aa1, b2, c1, b1, aa2, c2]
assert [aa1, aa2, b1, b2, c1, c2, c3] == people.sort([ { it.lastName }, { it.familyName } ])
assert [aa1, aa2, b1, b2, c1, c2, c3] == people.sort({ it.lastName }, { it.familyName })
assert [aa1, aa2, b1, b2, c1, c2, c3] == people.sort { it.lastName }, { it.familyName }
assert [aa1, aa2, b1, b2, c1, c2, c3] == people.sort { it.lastName } { it.familyName }

You can use the API as utility, too:
import org.jggug.kobo.commons.lang.CollectionUtils as CU
def people = [c3, aa1, b2, c1, b1, aa2, c2]
assert [aa1, aa2, b1, b2, c1, c2, c3] == CU.sort(people, [ { it.lastName }, { it.familyName } ])
assert [aa1, aa2, b1, b2, c1, c2, c3] == CU.sort(people, { it.lastName }, { it.familyName })

Note


We found that a useful API for this idea of sorting by multiple keys is already prepared as a constructor of groovy.util.OrderBy. But it isn’t used. Why?

No comments:

Post a Comment