Hutter Prize

The Hutter Prize is a computer science prize that has been awarded since August 6, 2006. It rewards progress in lossless data compression and artificial intelligence. It is named after Marcus Hutter, a German computer scientist who currently teaches at the Australian National University.

The competition consists of compressing a fixed text corpus, namely the first 100 million characters of a specific version of the English-language Wikipedia, as strongly as possible without any loss. About 75% of this corpus is natural English text. Marcus Hutter and the other members of the award committee hold that lossless data compression and artificial intelligence are the same problem, namely the behavior of a software agent in an unknown but predictable environment. They also argue that predicting the next part of a sentence requires knowledge and thus intelligence. Data compression methods face the same problem: if the next character in a string can be guessed, it does not have to be stored.
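The link between prediction and compression can be illustrated with a small sketch. It is not the method of any prize entry: it assumes a simple adaptive character model with add-one smoothing over a 256-symbol alphabet, and the function name `ideal_code_length_bits` and the sample text are made up for illustration. An ideal entropy coder spends about -log2(p) bits on a character the model predicted with probability p, so better predictions mean fewer bits.

```python
import math
from collections import defaultdict


def ideal_code_length_bits(text: str, order: int) -> float:
    """Bits an ideal entropy coder would spend on `text` when driven by
    an adaptive model that conditions on the previous `order` characters."""
    alphabet_size = 256  # assumption: each character is one of 256 symbols
    counts = defaultdict(lambda: defaultdict(int))  # context -> char -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, ch in enumerate(text):
        context = text[max(0, i - order):i]
        # Add-one (Laplace) smoothing keeps unseen characters possible.
        p = (counts[context][ch] + 1) / (totals[context] + alphabet_size)
        bits += -math.log2(p)
        # Update the model only after coding, as a decoder would.
        counts[context][ch] += 1
        totals[context] += 1
    return bits


# Hypothetical sample text, chosen only because it is highly predictable.
sample = "the cat sat on the mat. " * 200
order0 = ideal_code_length_bits(sample, 0)  # ignores preceding characters
order2 = ideal_code_length_bits(sample, 2)  # uses the two preceding characters
print(f"raw: {8 * len(sample)} bits, order 0: {order0:.0f}, order 2: {order2:.0f}")
```

The model that looks at the two preceding characters guesses the next one more reliably and therefore needs fewer bits than the model that ignores context.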

Rules

The submitted program must be a self-extracting archive of size S. Alternatively, S is the sum of the size of the program and that of the supplied compressed file.
The program must generate a file that is identical to the uncompressed reference file ("enwik8").
The program must run on Windows or Linux (x86, 32 bit).
The program must not obtain any information from outside sources, for example from other files or computer networks.
Hardware: The program may run for a maximum of 10 hours on a Pentium 4 computer with a clock rate of 2 GHz. It may use at most 1 GB of RAM and store no more than 10 GB of temporary files on the hard drive.
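The size and losslessness rules above can be checked with a short sketch. The function names `total_size_s` and `is_lossless` and all file paths are hypothetical; this is not the committee's verification procedure, only one way to compute S and compare the output byte for byte.

```python
import filecmp
import os
from typing import Optional


def total_size_s(program_path: str, compressed_path: Optional[str] = None) -> int:
    """Return S in bytes: the self-extracting archive alone, or the
    program plus the separately supplied compressed file."""
    size = os.path.getsize(program_path)
    if compressed_path is not None:
        size += os.path.getsize(compressed_path)
    return size


def is_lossless(output_path: str, reference_path: str) -> bool:
    """True only if the generated file matches the reference byte for byte."""
    # shallow=False forces a content comparison instead of comparing metadata.
    return filecmp.cmp(output_path, reference_path, shallow=False)
```

With hypothetical file names, `total_size_s("decompressor.exe", "archive.bin")` would give S for a two-file submission, and `is_lossless("output", "enwik8")` would confirm that the output is identical to the reference file.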

The submitted program does not have to be open source.

To win the prize, a submission must undercut the previous record (the size S of the self-extracting archive) by at least one percent. For each percent of improvement, prize money of 500 euros is awarded.
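The arithmetic of this rule can be sketched as follows. The function names are hypothetical, and paying fractional percentages pro rata is an assumption of this sketch; the text above only states the one-percent threshold and the rate of 500 euros per percent.

```python
def improvement_percent(previous_s: int, new_s: int) -> float:
    """Relative size reduction of a new entry over the previous record, in percent."""
    return 100.0 * (previous_s - new_s) / previous_s


def prize_euros(previous_s: int, new_s: int) -> float:
    """Prize money under the rule above.
    Assumption: fractions of a percent are paid pro rata."""
    percent = improvement_percent(previous_s, new_s)
    if percent < 1.0:
        return 0.0  # below the one-percent threshold, no prize
    return 500.0 * percent


# Hypothetical sizes: a 2% improvement and a 0.5% improvement.
print(prize_euros(1000, 980))
print(prize_euros(1000, 995))

# Baseline and record sizes reported in this article; this is the total
# reduction between the two, not a single award.
print(round(improvement_percent(18_324_887, 15_949_688), 2))
```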

History

The PAQ8F software set the baseline with a value of S = 18,324,887 bytes. It took PAQ8F five hours to decompress the data. The current record was set on May 23, 2009 by Alexander Rhatushnyak. His program “decomp8” took around nine hours to run, and S was 15,949,688 bytes.

External links

Hutter Prize website