We describe a parallel sort for a spatial computer that requires minimal communication. We show simulations of the sort in 1 and 2 dimensions.
This paper introduces spatial computers and presents a sorting scheme that can work with local communication and tolerate non-determinism in communication and synchronization. The authors do an excellent job of taking the reader through the construction of the sorting architecture/scheme — the graphical view of the sorting is excellent and helped me to visualize the collision sort.
I am wondering on the reliability aspects discussed in the paper. In particular, what happens to the data that is assigned to a region (the so-called holes) that fails? Is the data lost too? If so, this would be an approximation that is of different nature than the approximation in the sorting order, that could often be tolerated.
The proposed communication/synchronization scheme is interesting, however it is not clear how the scheme or the “colliding particles” model can be applied to other more general synchronization needs. From this perspective, I am not sure how much it fits into the RACES workshop’s scope. On the other hand, I think the proposed model is an interesting one for this community to know. So I will leave it to the voting to decide on the paper’s suitability for this workshop.
While I was reading about the regular spatial elements with local communication, I was often reminded of systolic arrays. A discussion of them would be a good addition to the related work section.
The paper introduces Collision Sort, a “spatial” sorting algorithm optimized for parallel evaluation. While not as efficient as other sorting algorithms, collision sort has the benefit that it does not require global communication (only local communication between neighbors) and that it can tolerate failures. The characteristic of local communication becomes important when considering parallelization on modern many-core hardware, which often has substantial non-uniform memory access times.
I like the concise description of the algorithm. If possible, adding some pseudo-code describing the algorithm could be helpful as well.
After reading the paper, one feels confident that the proposed algorithm is indeed a sorting algorithm, and that it seems to converge as desired. However, as Collision Sort seems to have been designed with scalability on highly parallel hardware in mind, I feel the paper (or a presentation at the workshop) could be improved if it would relate some (preliminary) performance characteristics of the algorithm on actual parallel hardware.
Given that the algorithm appears to be already implemented on a parallel Smalltalk VM, conducting such experiments would seem feasible. It would be especially interesting to study whether the workload stays nicely distributed over all cores until the end, or whether one instead observes local slowdowns, which would perturb the system and eventually lead to global slowdowns as faster cores must wait for input from slower cores.
This paper is the full meal deal—it’s a really interesting algorithm, and the presentation on how it scales and handles failures is very nicely done. I couldn’t help myself from wanting some very simple performance evaluation, or reasoning about expectations relative to other more conventional approaches. But the paper brings more than enough to the table as it is. I can see this spurring great discussions about new approaches to parallel algorithms, evolution of other algorithms that may need to consider adaptive load balancing, and metrics to consider in evaluation.