Concurrent hash table

from Wikipedia, the free encyclopedia
Simultaneous access to the same hash table.

A concurrent hash table (or concurrent hash map) is an implementation of a hash table that allows concurrent access by multiple threads.

Concurrent hash tables are therefore a central concurrent data structure in parallel computing, as they enable more efficient parallelized computation on shared data.

Because of the resource conflicts that arise from simultaneous data access, implementations of concurrent hash tables differ mainly in how the table can be accessed concurrently. Moreover, the speedup obtained from a concurrent hash table is not guaranteed to grow linearly with the number of threads, since resolving resource conflicts incurs additional overhead. Several approaches exist to minimize this overhead while preserving the correctness of the operations on the table.

As with their sequential counterparts, concurrent hash tables can be extended for use in different application areas (e.g. with complex data types for keys and values). Such extensions can, however, degrade performance and should therefore be chosen according to the requirements of the application.

Simultaneous access

When designing a concurrent hash table, the access algorithms of the hash table must be adapted to permit simultaneous access by several threads. In particular, resource conflicts must be resolved in order to guarantee the integrity and correctness of the table. Ideally, these changes should also enable efficient parallel use.

Herlihy and Shavit describe how a sequential access algorithm without conflict-resolution strategies (in their example, a simple variant of cuckoo hashing) can be adapted for concurrent use. Fan et al. present another algorithm based on cuckoo hashing that permits concurrent access, preserves the space efficiency of cuckoo hashing, and also improves cache locality and insert throughput.

If hash tables do not have a fixed size and thus grow and shrink as required, the hash function must be adapted to cover all possible keys of the new table. One approach to designing a concurrently growing table is described by Maier et al.

Resolving resource conflicts

Resource conflict due to simultaneous access (marked in red).

Like other concurrent data structures, concurrent hash tables suffer from a variety of problems caused by resource conflicts. Examples are the ABA problem, race conditions, and deadlocks. Whether, and how severely, these problems occur depends on the implementation, in particular on which operations can be performed on the table at the same time and on the strategies used to handle resource conflicts.

When handling resource contention, as with any other concurrent data structure, the primary goal is to ensure the correctness of every operation on the table. Ideally, this is achieved in such a way that the concurrent implementation still outperforms the sequential version.

Atomic instructions

Atomic operations such as compare-and-swap or fetch-and-add reduce problems caused by resource conflicts by ensuring that one access completes before another has the opportunity to interfere. However, operations such as compare-and-swap often severely limit the size of the supported variables, so the key and value types of a table must be chosen or converted accordingly.

Using so-called hardware transactional memory (HTM), table operations can be treated similarly to database transactions, which preserves atomicity. An example of HTM in practice is Intel's Transactional Synchronization Extensions.

Locking

With the help of locks, operations that access the table or its values at the same time can be carried out in a controlled manner so that correct behavior is guaranteed. However, this can degrade performance, especially if the locks are set too restrictively, blocking table accesses that would otherwise proceed without problems. Further care must be taken to avoid problems that endanger the correctness of the entire table, such as livelocks, deadlocks, and starvation.

Phase concurrency

Concurrent accesses grouped into different phases.

A phase-concurrent hash table groups accesses into phases in which only one type of operation is allowed (i.e. a write-only phase, a read-only phase, etc.). After each phase, the table state is synchronized across all threads. A formally proven algorithm of this kind was presented by Shun and Blelloch.

Read-copy-update

Read-copy-update (RCU) mechanisms are widespread in the Linux kernel. They are particularly useful when the number of read accesses far exceeds the number of write accesses.

Applications

Concurrent hash tables are useful wherever sequential hash tables are. The advantage that parallelism offers here lies in the potential speedup of these applications and in their increased scalability. With hardware such as multi-core processors becoming ever more capable of parallel computation, the importance of concurrent data structures grows steadily.

Performance analysis

Maier et al. conduct a thorough analysis of a large number of concurrent hash table implementations, providing insight into how effective they are in different situations as well as in common real-world use cases.

Their findings per operation, for low and high numbers of conflicts, can be summarized as follows:

  • find: Very large speedups for both successful and unsuccessful unique finds, even with many resource conflicts.
  • insert: Large speedups are possible; many resource conflicts degrade performance if a key can hold more than one value (if not, conflicting inserts are simply discarded).
  • update: Both overwriting and modifying values can achieve large speedups if the number of conflicts is low; otherwise performance can be worse than in the sequential case.
  • delete: The highest scalability was achieved with phase concurrency; in implementations where delete was realized via update, performance was slightly lower.

As expected, every operation of the hash table behaves well when there are few resource conflicts, while a high number of conflicts causes problems for writes. The latter is a general problem of parallel processing: many conflicts make handling them expensive, and since a sequential version needs no such conflict management, it is naturally more efficient in that regime. Nevertheless, certain implementations carry out simultaneous write accesses in a way that still achieves high speedups even with a high number of conflicts.

Furthermore, real-world workloads rarely consist of a single type of operation; for example, a table is seldom only written to and never read. Looking at the performance of a mix of operations, such as the very common combination of insert and find, the benefit of a concurrent hash table shows clearly in its high scalability, which is especially pronounced when a high percentage of find operations is performed.

Ultimately, the performance of a concurrent hash table depends on many factors that vary with the intended use. When choosing an implementation, aspects such as generality, conflict handling, and the size of the table must be taken into account. For the last point, it matters whether the size of the desired table can be determined in advance or whether a dynamically growing approach must be used.

Implementations

  • libcuckoo provides concurrent hash tables for C/C++ that allow parallel reading and writing. The library is available on GitHub.
  • Threading Building Blocks offers concurrent unordered hash tables for C++ that allow simultaneous insertion and traversal and are kept in a style similar to the C++11 std::unordered_map container. Also included are concurrent unordered multimaps, which can store multiple values for the same key. In addition, concurrent hash maps are provided that build on the concurrent unordered map, additionally allow simultaneous deletion, and contain built-in locks.
  • growt provides concurrent growing hash tables for C++ based on the so-called folklore implementation. A variety of growing hash tables builds on this non-growing implementation. These tables allow simultaneous reading, insertion, updating (in particular, updating a value based on the key's current value), and removal (realized as updates using tombstones). Variants based on Intel TSX are also offered. The library is available on GitHub.
  • folly offers concurrent hash tables for C++14 and later with wait-free readers and lock-based, sharded writers. As stated on its GitHub page, the library contains features that are useful at Facebook.
  • Junction provides several implementations of concurrent hash tables for C ++ based on atomic operations to ensure thread safety for any function of the table. The library is available on GitHub.

Literature

  • Maurice Herlihy, Nir Shavit: Chapter 13: Concurrent Hashing and Natural Parallelism . In: The Art of Multiprocessor Programming . Morgan Kaufmann Publishers Inc., San Francisco, CA, USA 2008, ISBN 978-0-12-370591-4 , pp. 299-328.

References

  1. Tobias Maier, Peter Sanders, Roman Dementiev: Concurrent Hash Tables: Fast and General(?)! In: ACM Trans. Parallel Comput. 5, No. 4, March 2019, ISSN 2329-4949, pp. 1–32, Article 16. doi:10.1145/3309206
  2. Julian Shun, Guy E. Blelloch: Phase-concurrent Hash Tables for Determinism. In: SPAA '14. ACM, 2014, pp. 96–107. ISBN 978-1-4503-2821-0. doi:10.1145/2612669.2612687
  3. Xiaozhou Li, David G. Andersen, Michael Kaminsky, Michael J. Freedman: Algorithmic Improvements for Fast Concurrent Cuckoo Hashing. In: EuroSys '14. ACM, 2014, pp. 1–14, Article 27. ISBN 978-1-4503-2704-6. doi:10.1145/2592798.2592820
  4. Josh Triplett, Paul E. McKenney, Jonathan Walpole: Resizable, Scalable, Concurrent Hash Tables via Relativistic Programming. In: USENIX ATC '11. USENIX Association, 2011.
  5. Maurice Herlihy, Nir Shavit: Chapter 13: Concurrent Hashing and Natural Parallelism. In: The Art of Multiprocessor Programming. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA 2008, ISBN 978-0-12-370591-4, pp. 316–325.
  6. Bin Fan, David G. Andersen, Michael Kaminsky: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing. In: nsdi'13. USENIX Association, 2013, pp. 371–384.
  7. Java ConcurrentHashMap documentation
  8. GitHub repository for libcuckoo
  9. Threading Building Blocks concurrent_unordered_map and concurrent_unordered_multimap documentation
  10. Threading Building Blocks concurrent_hash_map documentation
  11. GitHub repository for growt
  12. GitHub page for the implementation of concurrent hash maps in folly
  13. GitHub repository for folly
  14. GitHub repository for Junction