Sort methods added to ls class for Synergy/DE
Sterling Camden
Synergy/DE provides two built-in sorting functions, neither of which are useful for in-memory objects. That’s not too surprising, considering that objects are relatively new to Synergy/DE.
The SORT statement performs a Tournament Sort, but it only works on files. The QSORT subroutine performs an in-memory, in-place QuickSort, but it only works on primitive data types. Thus, the next logical extension to my ArrayList subclass ls would be the addition of sorting methods.
Algorithms
Naturally, I wanted to choose an algorithm that would usually require only O(nlogn) comparisons – but I was unsure how the manipulation of ArrayLists and object references would affect the performance of any given algorithm. So, I implemented three that seemed promising: MergeSort and two forms of QuickSort (in-place and not). Then I constructed a benchmark to time each of them on sorting large lists (anything less than 100,000 elements was too quick to be timed accurately). Next, I optimized each algorithm by making changes and timing the results again. This led to some surprising findings – for instance, on the merge phase of MergeSort, it’s more efficient given the Synergy/DE implementation of ArrayList to create a new list and append to it than to insert items from one list into the other. Once I’d optimized the algorithms to the limit of my small powers, the results were as follows (X = array size, Y = number of seconds):
“Quicksort” is the in-place algorithm – its time includes the overhead of copying the list, so if you use the destructive version it performs marginally better. For lists of up to about 900,000 elements, the in-place QuickSort rules, but after that point MergeSort takes the lead. I think that’s because of an optimization I implemented in MergeSort: if the two lists can be appended rather than merged, the second is appended to the first in-place, rather than creating a new list. That favors lists that are already somewhat sorted, but also improves the final stages of the recursion, where we’re assembling very short lists (n <= 2) together. Another factor might be a tendency for QuickSort to lean more towards its O(n2) worst case for larger lists. Because I’m sorting objects, I don’t have an algorithm for picking an intelligent pivot value – I just choose the object in the middle of the list (to give pre-sorted lists an advantage). My benchmark fills each list with random integers (boxed as Vars) and then sorts the same list using each algorithm. The benchmark is included in the download below, as tsort.dbl. It accepts command line switches “-from”, “-thru” and “-by” for specifying the array sizes to test.
Methods
I’ve added the following sorting methods to class ls:
- sort – returns a sorted copy of a list, using MergeSort for lists of 900,000 elements or more – otherwise QuickSort.
- sort$ – performs an in-place sort of the list, using QuickSort.
- quicksort – returns a sorted copy of the list, using QuickSort.
- quicksort$ – same as sort$
- mergesort – returns a sorted copy of the list, using MergeSort.
I’ve also left the Quicksort2 algorithm in the code, as method “quicksort2” – but I’m not including it in the documentation because it’s consistently slower than the other algorithms.
To support MergeSort, I also added two methods: merge and merge$. These merge two lists that are presumed to be pre-sorted. It turns out that merge is faster than merge$, presumably because of the overhead of ArrayList insertion. Thus, MergeSort uses merge.
I’ve also overridden Object.ToString() for the ls class – mostly for debugging purposes. This method outputs the ToString() representation of each of the list members, separated by commas – with the entire list enclosed in square brackets.
Comparisons
In order to sort objects, you have to have some way to compare them as “greater than”, “equal”, or “less than.” In languages with support for Functional Programming, sort functions generally take an argument that is a function or lambda to which two objects are passed for comparison and which returns a value indicating their collational relationship.
In Synergy/DE, you can pass a function around by its name or address, but calling functions lazily (using XSUBR) doesn’t work with parameters or return values that are objects (yet). Jeff Greene came up with the idea of wrapping a function as a method of an object (a functor), and then passing the object around instead of the function. While this adds a bit of extra code for the class declaration and requires that the new class be derived from a specific ancestor, it works. Thus, I created an abstract class Compare that requires one method, “test”, which takes two objects as arguments and returns the standard comparison result as an integer (eq = 0, gt = 1, lt = -1). I also supplied several example derived classes that cover the usual cases. Pass an instance of one of these to any of the sort methods listed above to use the corresponding comparison logic:
- CompareString – compare ToString() of each object. Optional case-insensitivity and natural ordering.
- CompareVar – standard Var comparison, sorting non-Vars to the end.
- CompareVarAlpha, CompareVarDec, CompareVarInt – compare Vars, but coerce to a specific type.
- CompareDesc – reverse another comparison (produces a “descending” ordering).
- CompareList – compare list objects memberwise.
- CompareCar – compare list objects by their first non-list member.
Some of these Compare classes take a Compare as the argument to their constructor. These classes modify the behavior of another Compare class, so you can combine effects. Of course, you can also roll your own comparisons by deriving a class from Compare and overriding method “test”.
CompareString exposes a static method TestNatural that compares two strings using natural ordering. This could be useful elsewhere.
Posted in SynergyDE |
1 Comment » RSS 2.0 | Sphere it!





I have been following your developing ls class with awe and wonderment. One thing I have been trying to do is create strongly-typed ArraysLists which enforce at compile time the type of object stored. For example, if I have a list of Colors I should not be able to add a Shape to it.
The logical way to add strong typing uses your mixin technology to overload the Add method and Indexer property to strongly typed versions.
public method Add, i
required in aNewItem, MEMBER_CLASS
endparams
proc
mreturn Add( (Object)aNewItem )
endmethod
However, in order to change the return type of the indexer, it must be overloaded with the NEW keyword.
new public property indexer, MEMBER_CLASS
required in aIndex, i4
method get
proc
mreturn (MEMBER_CLASS)parent[aIndex]
endmethod
endproperty
There is a catch. Because the indexer has been overloaded with NEW, the foreach statement is no longer allowed to be used on the derived class. I have a tracker requesting that to be changed, but what to do in the interim? In cases where the desire to use foreach supercedes the benefit of strongly typing the ArrayList, I keep using @ArrayList. An alternative is to overload the Add so the contents are strongly typed, but don’t overload the Indexer, relying on the programmer to know what type of object should be used. Does anyone have any thoughts?