Timsort

from Wikipedia, the free encyclopedia

Timsort is a hybrid sorting algorithm that is derived from merge sort and insertion sort . It is designed to work quickly on a variety of real data. It was developed in 2002 by Tim Peters for use in Python and is the standard sorting algorithm in Python from version 2.3. It is now also used in Java SE 7 and on the Android platform .

functionality

Tim Peters describes the algorithm as follows:

“[…] An adaptable, stable natural merge sort, which is modestly called Timsort (hey, I deserve it <wink>). It is more powerful than natural merge sort when sorting by many types of partially sorted arrays (less than lg (N) comparisons! Required, even down to N-1), but as soon as the hybrid Python previously used, highly optimized Samplesort the Sort random arrays.

In short, the main routine goes through the array from left to right, alternately identifying the next presorted partial sequence or "intelligently" merging it with the previously recognized presorted partial sequences. The rest is used for acceleration and the hard-won improvement of storage efficiency . "

- Tim Peters :

It finds already sorted sections in the data. Sections sorted in descending order are reversed. It is then checked whether the length of these sections reaches the minimum section length for the respective array size. The minimum section length depends on the size of the array. For arrays with fewer than 64 elements, the minimum section length is the entire array, so that Timsort corresponds to an insertion location in this case. For larger arrays, a number between 32 and 64 is selected as the minimum section length, so that the size of the array divided by the minimum section length is equal to one or a little less than a power of two . The algorithm simply uses the six highest bits of the array length and adds one if at least one of the other bits is still set. If a section does not reach the minimum section length, Insertionort will enlarge it until it is long enough. The sections are then merged into a fully sorted array using merge sort.

Complexity and efficiency

Like Mergesort, Timsort is a stable , comparison-based sorting process with a best-case complexity of and a worst- and average-case complexity of .

According to information theory , no comparison-based sorting method can manage with less than Ω ( n log n ) comparisons in the average case. Timsort often needs significantly less than Ω ( n log n ) comparisons on real data , because it benefits from the fact that parts of the data are already sorted.

Known bugs

In February 2015, the Amsterdam computer scientist Stijn de Gouw discovered that all implementations of the Timsort algorithm contain an error. This error has no practical effect in the Python implementation, since it can only occur on computers with a lot of memory that do not currently exist. Nevertheless, the bug was fixed. For Java, however, it was possible to construct an input that crashes the program. Here, too, the error was corrected shortly after it became known.

Individual evidence

  1. jjb: Commit 6804124: Replace "modified mergesort" in java.util.Arrays.sort with timsort . In: Java Development Kit 7 Hg repo . Archived from the original on September 4, 2012. Info: The archive link was inserted automatically and has not yet been checked. Please check the original and archive link according to the instructions and then remove this notice. Retrieved February 24, 2011. @1@ 2Template: Webachiv / IABot / hg.openjdk.java.net
  2. Class: java.util.TimSort <T> . In: Android JDK 1.5 Documentation . Retrieved February 24, 2011.
  3. a b Tim Peters: timsort . In: Python Issue Tracker . Retrieved February 24, 2011.
  4. Tim Peters: [Python-Dev] Sorting . In: Python Developers Mailinglist . Retrieved on February 24, 2011: "[Timsort] also has good aspects: It's stable (items that compare equal retain their relative order, so, eg, if you sort first on zip code, and a second time on name, people with the same name still appear in order of increasing zip code; this is important in apps that, eg, refine the results of queries based on user input). ... It has no bad cases (O (N log N) is worst case; N-1 compares is best). "
  5. Alex Martelli: Python in a Nutshell (In a Nutshell (O'Reilly)) . O'Reilly Media, Inc., 2006, ISBN 0596100469 , p. 57.
  6. http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/
  7. http://bugs.python.org/issue23515
  8. https://bugs.openjdk.java.net/browse/JDK-8072909

Web links