Garbage collection


from Wikipedia, the free encyclopedia
Animation of an automatic garbage collection. The memory area is filled with various objects (shown with colors), some of which are also destroyed again and leave gaps in the memory area. If (as in this example) there is not enough free space “at the end” or if automatic garbage collection decides, the memory is “compressed”, with all objects still in use being placed at the beginning and all memory gaps consolidated at the end. This makes a large memory area available again for the future creation of objects.

The garbage collection , short GC ( English for refuse collection , even garbage collection or heap collection called) in the designated software and information technology , an automatic memory management , which helps to avoid memory problems; the advantage is bought with an increased consumption of resources. Among other things, the memory requirement of a computer program is minimized . Thereby at runtimetries to automatically identify memory areas that are no longer required in order to free them up. Some automatic memory cleanups also merge the memory areas that are still in use ( defragmentation ).

motivation

In many software systems , required (working) memory is reserved dynamically (when required) . If it is no longer used after a program part has been processed, the memory should be released again in order to enable this resource to be reused. In the case of explicit, manual memory management, this is done by defining the memory reservation and release in the program by the programmer , a procedure that quickly becomes complex and thus potentially prone to errors . In addition to forgetting a release, which can lead to a shortage of memory in the long term, releasing memory that is still required (elsewhere) too early usually quickly leads to a program crash. Forgotten memory releases often do not immediately lead to abnormalities in the program flow - at least not during the typically short program runs during development, but only when the finished program is operated by the end user without interruption for hours and days.

With manual memory management, it is often not possible or very time-consuming to defragment the memory. Heavily fragmented memory can cause memory allocation by the program to fail because there is not a sufficiently large contiguous area available.

description

The idea of ​​automatic garbage collection is to have a routine called a garbage collector do this job automatically, without the programmer having to do anything. I.e. the memory management is shifted from an explicit definition at the program creation time ( compile time ) to a dynamic analysis of the memory requirements during the runtime of the program.

Such an automatic garbage collection usually runs in the background (or concurrently ) at more or less regular intervals (e.g. during pauses in the program flow) and is not explicitly triggered by the program. However, GC can often also be triggered directly to give the program some control over the cleanup, e.g. B. in a situation of insufficient memory ( out-of-memory ).

approaches

There are several approaches to implementing automatic garbage collection. Desired requirements can be as low a memory cross-section as possible , a maximum allocation speed, a reduction in memory fragmentation and many more, which can also contradict each other and lead to conflicting goals . I.e. Depending on the application, automatic garbage collection can look very different and certainly meet many requirements, but some not.

Typically, however, all of these flavors are assigned to two basic types of garbage collection: conservative and non-conservative garbage collection.

Conservative automatic garbage collection

A conservative automatic garbage collector is one that can not reliably detect all non-referenced objects. This usually has no information about where references to other objects are located in the memory. To clean the garbage, it has to search the store for possible references. Any bit sequence that could be a valid reference in memory is taken as a reference. It cannot be determined whether this is not a random pattern after all. Therefore, conservative collectors occasionally recognize objects as referenced when they are actually not. Since an automatic garbage collector must never remove objects that could still be used, it must conservatively assume that the recognized bit sequence is a reference.

A conservative collector can pose a risk, especially if an automatic garbage collector also has to free more urgent resources than memory (see finalization ). In general, conservative GCs can be found where internal pointers (i.e. pointers to different parts of an object) are allowed, which makes it difficult to implement automatic memory management. Examples of this are the languages ​​C and C ++. It should be noted here that this does not apply to the “managed types” in C ++ / CLI , since their own reference types have been introduced there for automatic garbage collection, which do not allow the address of an object to be read out directly.

Non-conservative automatic garbage collection

A non-conservative automatic garbage collection (sometimes referred to as "exact garbage collection" ) is one that has metadata that enables it to find all references within objects and stack frames. With non-conservative garbage collection , a distinction is made between tracing garbage collectors and reference counting .

Tracking algorithms

Mark-and-sweep algorithm

With this method of garbage collection, objects that are known to be still in use are followed by all references to other objects. Every object reached in this way is marked. Then all unmarked objects are released for reuse.

The release can lead to memory fragmentation. The problem here is somewhat less than with manual memory management. While with manual memory management the deallocation always takes place immediately, with mark-and-sweep several objects are almost always removed at once, which frees up larger contiguous memory areas.

Mark-and-Compact algorithm

The mark-and-compact algorithm , like mark-and-sweep, uses the principle of reachability in graphs in order to recognize objects that are still referenced. It copies this to another location in the memory. The entire area from which the still referenced (one also speaks of "living") objects were copied is now regarded as a free memory area.

The disadvantage of this method is that the "living" objects themselves are moved, because pointers to them become invalid and must be adapted. There are basically at least two methods for this:

  1. Each object is addressed via two indirections (diversions) (via a pointer to a pointer to the object ), so that when moving, only the pointer that points directly to the object needs to be adjusted.
  2. All references refer directly to the object in order to avoid time-consuming dereferencing and are appropriately adapted after a move.

Moving the objects, however, has the advantage that those that “survived” the cleanup are now all compacted and the memory is practically defragmented. It is also possible to allocate very quickly because free storage space is not searched for. Clear: If the referenced objects are moved to the "beginning" of the memory, new memory can simply be allocated at the "end", behind the last living object. Allocation works comparatively easily, similar to the stack .

Generational

Generational GCs shorten the duration of the memory release. For this purpose, the situation is exploited that in practice the lifespan of objects is usually very different: On the one hand there are objects that survive the entire runtime of the application. On the other hand, there are a large number of objects that are only needed temporarily to carry out a single task. With generational GCs, the memory is divided into several sub-areas (generations). The longevity is quantified by a counter, which is incremented with each garbage collection. With each application of the release algorithm (e.g. Mark-and-Compact or Stop-And-Copy), long-lived objects are moved to a higher generation. The advantage is that garbage collection can be performed more frequently and faster for lower generations, since only some of the objects need to be moved and their pointers changed. There is a high probability that higher generations contain only living (or very few dead) objects and therefore need to be cleaned up less often.

GC generations in Java HotSpot

The number of generations is determined heuristically (for example three in .NET, two for young objects (also called young generation) and one for old objects (tenured generation) in the Java VM from Sun). In addition, different algorithms can be used for each generation. In Java, for example, a modified stop-and-copy algorithm is used for the lowest generation, and mark-and-compact for the higher generation.

Reference count

With this method, each object keeps a counter with the number of all references that point to this object. If the reference counter of an object falls to zero, it can be released.

A particular problem of heap collection with reference counting is what are known as cyclical references , in which objects hold references to one another but are otherwise no longer used by any consumer in the system. For example, suppose object A holds a reference to object B and vice versa while the rest of the system no longer needs its services. Both objects therefore refer to each other (cyclically) , which is why the automatic garbage collection cannot easily recognize that they are no longer being used. The consequence of this is that the memory remains occupied for the duration of the program execution. There are different algorithms that can recognize and resolve such situations, mostly according to the principle of accessibility in graphs .

properties

With a garbage collection some common programming errors, which are often made when dealing with dynamic memory management , can be completely or at least partially avoided. Particularly noteworthy here are memory leaks , the double release of resources and the dereferencing of resources accidentally released too early ( hanging pointers ). Releasing referenced objects leads to hanging pointers , which often lead to program crashes and undeterministic behavior.

As a consequence of Rice's theorem, it cannot be determined whether referenced objects will ever be used again. This is why automatic garbage collection only releases objects that are no longer referenced by the program; it does not prevent " memory leaks " of the kind that the program still holds a reference to the memory area but never uses the content again. Such memory leaks usually represent logical errors or design errors (errors in the basic concept, incorrect requirements for the software, software design errors) and can also occur with non-automatic memory management.

In addition, garbage collection solves the problem of memory fragmentation , which is not a programming error in the strict sense, but can be based on poor program design. This problem can lead to program crashes that are difficult to reproduce. The problem of memory fragmentation is generally not solved by explicit / manual memory management.

Efficiency

Whether automatic garbage collection speeds up or slows down programs as a whole is controversial. In some contexts, such as For example, if memory is only released when the system requirements are currently low or if the memory management of the system is relieved by defragmentation, it can lead to increased performance. There are microbenchmarks that prove that in programming languages ​​with automatic memory cleaning , the creation / release of objects is faster overall than without, but also microbenchmarks that see a predominantly negative impact on performance overall. A 2005 publication states that garbage collection can only perform as good or slightly better than explicit memory management if garbage collection is entitled to five times as much memory as is actually needed. With three times as much memory, garbage collection would run on average 17% slower, with twice as much memory 70% slower than with explicit memory management.

Memory usage

In terms of memory consumption, automatic memory management and clearing leads to an overhead compared to explicit, manual memory management due to the time-delayed clearing. A scientific publication from 1993 estimates the overhead for conservative garbage collection (as available for example for the C language) at typically 30–150%. On the other hand, a correct implementation of manual memory release in non-trivial programs is complex to implement, which creates sources of error for memory leaks in manual memory release. For example, the method of reference counting , which is often used, can not recognize cyclic references and leads to memory leaks without supplementation by complex algorithms.

determinism

By not explicitly defining the decision about the release time, the programmer also gives up some of the control over the program flow. Since automatic garbage collection i. d. R. concurrently takes place, the program itself has no information about when memory areas actually released or objects finalized are. As a result, the program flow is potentially no longer deterministic .

Specifically, the following forms of non-deterministic behavior can occur:

  • The time of finalization is indefinite: Even if an object is recognized as no longer needed and has been selected for cleaning, the time of finalization is indefinite, which means that the program flow is no longer deterministic. This is especially a problem when the object is using shared resources or performing final calculations. Doing this in the course of finalization is considered an anti-pattern in programming .
  • The runtime - both of the entire program and only of individual sections - can become non-deterministic due to the interruptions by the garbage collector. This is a problem especially for real-time systems . In real-time systems, for example, it is unacceptable for program execution to be interrupted at unpredictable times by performing garbage collection. For real-time systems, such as Real-Time Java , automatic garbage collection works preemptively (for example in the idle process ) and incrementally. Simple incremental processes work, for example, with so-called three-color marking.

Defragmentation

Using compacting algorithms, garbage collection can prevent memory fragmentation . See Mark and Compact . This avoids gaps in the memory that could not be filled due to large new objects. Defragmentation results in a longer delay in freeing up memory, but it reduces the allocation time. In order to be able to carry out the memory release as quickly as possible, it is ensured that large memory areas have to be cleared as rarely as possible. Therefore these algorithms are preferably used in combination with generational methods .

Defragmenting the memory has the following benefits:

  • The entire available memory is used.
  • Allocating memory takes less time because the data structures that manage the heap become less complex. Finding a free memory location of a suitable size is easier.
  • Objects allocated one after the other are usually next to one another in the memory (this is referred to as good memory location ). Studies have shown that objects created one after the other are often used simultaneously for a specific operation. If they are close enough to one another, the fast cache memory is accessed and not the slower memory behind it.

Finalization

As finalization ( English finalization ) is known in object-oriented programming languages, a special method that is called when an object by the garbage collector is released.

Unlike with destructors , finalization methods are not deterministic: A destructor is called when an object is explicitly released by the program. However, the finalization method is not called until the garbage collector decides to free the object. Depending on the garbage collector, this can happen at any point in time when it is determined that the program no longer uses the object - possibly never or at the end of the runtime (see also section Determinism ).

Finalization can lead to problems in practice if it is responsible for releasing resources:

  • Objects that manage resources should not only release them in the course of finalization. Otherwise, this could lead to blocked states within the program sequence, since the time of finalization cannot be predicted.
  • Finalization creates additional processing load for the automatic garbage collection, which should be carried out as quickly as possible and without disturbing the rest of the program flow.
  • There is no defined order of finalization. It can therefore happen that other objects are accessed during finalization that are also subject to finalization but no longer exist at all at this point in time.
  • Depending on the implementation (for example in the Java programming language ) there is no guarantee that the finalization routine will be called by the automatic garbage collector at all.

In the Java programming language , objects have a special method called finalize () that can be overridden for this purpose . For the reasons mentioned above, it is recommended for Java to completely dispense with finalization and to use an explicit termination method instead. The automatic memory cleaning then falls exclusively to the task of memory management.

distribution

Some older ( APL , LISP , BASIC ) and many newer programming languages have built-in automatic garbage collection.

For programming languages ​​such as C , in which the programmer has to do the memory management by hand, there are sometimes libraries that provide automatic garbage collection, which can easily be circumvented during programming, or even has to be circumvented in system-level programming. For this reason, in some programming languages, system-level programmed modules can be excluded from automatic garbage collection by explicitly marking them (for example in C # with the option / unsafe or in Component Pascal with the mandatory IMPORT SYSTEM statement ).

Further examples of programming languages ​​with automatic memory management are Smalltalk , Haskell , Oberon , Python , Ruby , OCaml , Perl , Visual Objects , ABAP , Objective-C (from version 2.0), D and all languages ​​that are based on the Java Virtual Machine (JVM ) ( Java , Groovy , Clojure , Scala , ...) as well as that were developed for the common language runtime of .NET (for example C # or VB.NET ).

Apple ecosystem

Apple introduced garbage collection in 2007 with the release of Mac OS X Leopard (10.5) as the "most important change" for Objective-C 2.0, which, according to Apple, brought "Objective-C the same ease of memory management as other modern languages". 2012 OS X Mountain Lion , however, garbage collection was (10.8) declared obsolete and the use of with Mac OS X Lion introduced (10.7) automatic reference counting mechanism (Engl. Automatic reference counting, ARC) at compile time based on the just-introduced CLANG / LLVM 3.0 compilers forced. With this automated reference counting, the compiler incorporates code to identify and remove objects that are no longer required by means of reference counting at suitable points. In contrast to GCs with reference counting, the automated reference counting runs serially and at times specified at compile time and is therefore deterministic. However, ARC has no way of recognizing cyclical references; Programmers therefore have to explicitly manage the lifespan of their objects and manually resolve cycles or work with weak or insecure references.

According to Apple, mobile apps without GC have better and more predictable performance. The GC-free iOS as a basis enables Apple to build mobile devices with less memory than the GC-based competition, which nevertheless have the same or better performance and battery life; an approach that has also been described in the trade press as an architectural advantage.

literature

  • Richard Jones, Rafael Lins: Garbage Collection. Algorithms for automatic dynamic memory management. John Wiley, Chichester 1996, ISBN 0-471-94148-4 .
  • Richard Jones, Anthony Hosking, Eliot Moss: The Garbage Collection Handbook. The Art of Automatic Memory Menagement. (Chapman & Hall Applied algorithms and data structures series). CRC Press, Boca Raton, Fla. 2011, ISBN 978-1-4200-8279-1 .

Web links

Individual evidence

  1. Satish Chandra Gupta, Rajeev Palanki: Java memory leaks - Catch me if you can. IBM DeveloperWorks, August 16, 2005, archived from the original on July 22, 2012 ; accessed on April 2, 2015 .
  2. How to Fix Memory Leaks in Java ( Memento from February 5, 2014 in the Internet Archive ) by Veljko Krunic (March 10, 2009)
  3. Creating a memory leak with Java on stackoverflow.com (English)
  4. Microbenchmarking C ++, C #, and Java: Object creation / destruction and method call. Dr. Dobb's Journal , July 1, 2005, accessed April 11, 2014 .
  5. Arne Schäpers, Rudolf Huttary: Daniel Düsentrieb - C #, Java, C ++ and Delphi in the efficiency test, part 2. c't , December 2003, pp. 222-227 , accessed on October 26, 2009 : "" The results show, firstly, that a garbage collector (during the destruction) does not seem to bring any noticeable disadvantages in terms of runtime behavior "and" The sometimes almost double time required by C ++ for construction compared to the other candidates ... ""
  6. ^ Robert Hundt: Loop Recognition in C ++ / Java / Go / Scala. (PDF; 318 kB) Scala Days 2011, April 27, 2011, accessed on November 17, 2012 (English, Stanford, California): “ Java shows a large GC component, but a good code performance. [...] We find that in regards to performance, C ++ wins out by a large margin. [...] The Java version was probably the simplest to implement, but the hardest to analyze for performance. Specifically the effects around garbage collection were complicated and very hard to tune "
  7. ^ Matthew Hertz, Emery D. Berger: Quantifying the Performance of Garbage Collection vs. Explicit memory management. OOPSLA 2005, 2005, accessed on March 15, 2015 (English): " In particular, when garbage collection has five times as much memory as required, its runtime performance matches or slightly exceeds that of explicit memory management. However, garbage collection's performance degrades substantially when it must use smaller heaps. With three times as much memory, it runs 17% slower on average, and with twice as much memory, it runs 70% slower. "
  8. Benjamin Zorn: The Measured Cost of Conservative garbage collection. Department of Computer Science, University of Colorado Boulder , January 22, 1993, accessed November 18, 2012 : “ Conservative garbage collection does not come without a cost. In the programs measured, the garbage collection algorithm used 30–150 per cent more address space than the most space efficient explicit management algorithm. In addition, the conservative garbage collection algorithm significantly reduced the reference locality of the programs, greatly increasing the page fault rate and cache miss rate of the applications for a large range of cache and memory sizes. This result suggests that not only does the conservative garbage collection algorithm increase the size of the address space, but also frequently references the entire space it requires. "
  9. Constant-Time Root Scanning for Deterministic Garbage Collection (PDF; 375 kB): Garbage Collection [...] typically brings a high degree of indeterminism to the execution environment.
  10. ^ Garbage Collection for Parallel and Distributed Systems , Frank Joachim Frey, May 7, 2002
  11. Josuah Bloch: Effective Java, p. 31: Avoid Finalizers
  12. Objective-C 2.0 Overview ( Memento from July 24, 2010 in the Internet Archive )
  13. Mac OS X 10.7 Lion: the Ars Technica review John Siracusa (July 20, 2011)
  14. Rob Napier, Mugunth Kumar: iOS 6 Programming Pushing the Limit. John Wiley & Sons, November 20, 2012, accessed on March 30, 2015 (English): " " ARC is not garbage collection [...] this makes the code behave the way the programmer intended it to but without an extra garbage collection step. Memory is reclaimed faster than with garbage collection and decision are done at compile time rather than at run-time, which generally improves overall performance. " "
  15. Memory Management
  16. ARC vs. GC
  17. ^ The Clang Team: AutomaticReferenceCounting. In: Clang 3.7. Documentation. Retrieved May 31, 2015 : “It does not provide a cycle collector; Users must explicitly manage the lifetime of their objects, breaking cycles manually or with weak or unsafe references. "
  18. Developer Tools Kickoff - session 300. In: WWDC 2011. Apple, Inc. , June 24, 2011, accessed on March 27, 2015 (English): "" At the top of your wishlist of things we could do for you is bringing garbage collection to iOS. And that is exactly what we are not going to do ... Unfortunately garbage collection has a suboptimal impact on performance. Garbage can build up in your applications and increase the high water mark of your memory usage. And the collector tends to kick in at undeterministic times which can lead to very high CPU usage and stutters in the user experience. And that's why GC has not been acceptable to us on our mobile platforms. In comparison, manual memory management with retain / release is harder to learn, and quite frankly it's a bit of a pain in the ass. But it produces better and more predictable performance, and that's why we have chosen it as the basis of our memory management strategy. Because out there in the real world, high performance and stutter-free user experiences are what matters to our users. ""
  19. José RC Cruz: Automatic Reference Counting on iOS. (No longer available online.) Dr. Dobbs, May 22, 2012; archived from the original on August 16, 2012 ; accessed on March 30, 2015 (English): "" Finally, the [Garbage collection] service still incurs a performance hit despite being conservative. This is one reason why garbage collection is absent on iOS. […] ARC is an innovative approach that has many of the benefits of garbage collection, but without the performance costs. Internally, ARC is not a runtime service. It is, in fact, a deterministic two-part phase provided by the new clang front-end. ”“ Info: The archive link was automatically inserted and not yet checked. Please check the original and archive link according to the instructions and then remove this notice. @1@ 2Template: Webachiv / IABot / www.drdobbs.com
  20. Felix Disselhoff: Only 1 GB RAM ?! Why the iPhone 6 leaves androids behind. curved.de, November 17, 2014, accessed on September 22, 2018 : “Why is the iPhone with only 1 GB of RAM faster than the Android competition with 2 or 3 GB of RAM? The explanation is simple and has to do with "garbage". [...] Android needs a multiple of the memory used "
  21. Oliver Haslam: Why iPhone With 1GB RAM Performs Better Than Android Devices With 2GB Or More RAM? redmondpie.com, November 16, 2014, accessed March 25, 2015 .
  22. Precious Silva: iOS 8 vs Android 5.0 Lollipop: Apple Kills Google with Memory Efficiency. (No longer available online.) International Business Times , Nov. 18, 2014, archived from the original on April 3, 2015 ; accessed on April 7, 2015 . Info: The archive link was inserted automatically and has not yet been checked. Please check the original and archive link according to the instructions and then remove this notice. @1@ 2Template: Webachiv / IABot / au.ibtimes.com