Hungarian notation

from Wikipedia, the free encyclopedia

Hungarian notation is a naming convention used by programmers for choosing identifiers for variables and constants, functions and methods, and other objects.

Hungarian notation owes its name to the exotic appearance that identifiers built by its rules have in an (English) program text, and to the Hungarian origin of its inventor, Charles Simonyi.

The convention developed by Simonyi was used with great success in Microsoft's Application Group (Microsoft Office) and was subsequently adopted by the Systems Group (Windows), which led to a fundamental misunderstanding. In his paper, Simonyi speaks of the “type” of a variable, which has often been interpreted as its “data type”. What is actually meant is the kind of task a variable performs in the specific context of an application. The question is therefore not whether a variable stores an integer or a floating-point number, but whether it represents a counter, a screen coordinate, or an index into an array. The name should let the reader deduce what the variable means, not how it is stored.

Because of this ambiguity, there are two schools of Hungarian notation: Apps Hungarian, which is the notation as Simonyi intended it, and Systems Hungarian, which arose from the misinterpretation by Microsoft's operating system group. The latter is responsible for the convention's bad reputation, because naming a variable after its data type contributes little to understanding its content while still requiring considerable effort.

The core idea of Hungarian notation is to make the task and the type (Apps Hungarian), or only the type (Systems Hungarian), of a variable (or method) evident in its name.

Apps Hungarian

Composition of a variable name

Charles Simonyi's Hungarian notation describes the complete name of a variable. Above all, it aims to rule out uninformative variable names such as the following (hilf is German for “help”):

 var hilf: Integer;

For this purpose it is clearly defined which attributes a variable name may contain:

 {Prefix} {DataType} {Identifier}

The prefix and data type are consistently written in lower case and the first letter of the identifier is upper case. The underscore (_) should always be avoided. Example:

 var idFirst: Byte; // Pascal
 byte idFirst;      // C

In this example, i is the prefix (for an index into an array), d is the data type (for double, see below), and First is the identifier (for the first element of an array). Note that the variable idFirst is an integer, even though its data type is written as d for double. This is because it is a loop variable over an array of double values. The physical data type of the variable itself does not appear in its name because it is irrelevant to its task.

In most cases, a prefix and a data type alone are sufficient, since all attributes are optional. A loop variable in a for loop can thus simply be called id, which is more meaningful than a name such as run.
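The naming rule described above can be sketched in C. This is only an illustration under stated assumptions: the function name SumRgd and the variables rgd, cd and dSum are hypothetical names built according to the convention in this article, not taken from Simonyi's paper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums the cd doubles stored in the array rgd.
 *   rgd  - array (rg) of doubles (d)
 *   cd   - count (c) of doubles (d)
 *   id   - index (i) into an array of doubles (d); physically an integer
 *   dSum - a double (d) with the identifier Sum */
double SumRgd(const double *rgd, size_t cd)
{
    double dSum = 0.0;
    size_t id;

    assert(rgd != NULL || cd == 0);
    for (id = 0; id < cd; id++) {
        dSum += rgd[id];
    }
    return dSum;
}
```

The loop variable id is declared as size_t, yet its name mentions d: the name records what it indexes, not how it is stored.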

Prefixes

The attribute of a Hungarian variable name that is most strictly tied to meaning is the prefix. It refers only to the role of the variable in the program in which it is used.

The prefixes listed below are the established ones. New ones can always be introduced to describe new tasks, but as a rule the following prefixes are entirely sufficient.

Prefix | Derived from | Meaning
p | pointer | A pointer to an address.
h | handle | A pointer to a pointer, i.e. equivalent to pp. h is almost always used in connection with communication with the operating system.
rg | range | An array that is indexed by "normal" integers. The array rg can be understood as the range of a mathematical function that assigns an element to each integer. An rgd, for example, is an array that contains double-precision floating-point numbers.
mp | map | Also an array, but unlike rg it may be indexed by any data type. Two data types are therefore written after the prefix mp: first the data type of the index, then the data type of the content. If x is any data type, then mpix is equivalent to rgx.
dn | domain | Another prefix for an array. What distinguishes dn is that it emphasizes that the indices themselves, not the elements of the array, are what matters. This makes the prefix very rare.
i | index | One of the most important prefixes related to arrays. For example, id indexes an rgd. For an mpfr, an array of floating-point values indexed by a Boolean data type, the index can be declared as ifr or simply ir (although it must have a Boolean data type).
b | base | A very rare prefix, similar to i, except that b describes the direct offset of an element in an array. If the array has the physical data type byte, b and i are even identical. If dch is the physical length of the elements of an array rgx, then for the index ix (starting at 0): bx = dch * ix.
e | element | The counterpart to i. e denotes an element of an array; it is mostly used together with dn and is accordingly rare. An element of the array rgd can nevertheless also be referred to as ed, although this is not appropriate in most cases.
c | count | A number of elements, for example in an array. The size of an rgul can be given as cul.
d | difference | A difference between two variables, mostly in an array. d must not be confused with c: d always refers to a difference between indices.
gr | group | Not to be confused with rg: gr denotes a combination of several variables. This is not an array but an arrangement of different variables. gr can be used with a struct, record or class.
f | flag | A single bit in a variable. Not to be confused with the data type f or bit, which refers to the entire (physical) variable. The prefix f designates a bit with the character of a flag in a variable of the physical data type byte, word, etc.
sh | shift amount | The index of a bit (f) in a variable (not in an array). If only f is set, the variable has the value 2^sh.
u | union | A non-specific variable that can hold different (Hungarian) data types, where this makes sense. The prefix is extremely rare, because two variables with different meanings are rarely compatible.
a | allocation | An allocation, not an array. a is used as the complement to p or h, since a holds the result of the dereferencing. Thus apl is equivalent to l, since it is the variable at the address of l, that is, l itself.
v | | A global variable, for example for exchanging data. In practice it should be used sparingly or not at all, as it tempts the programmer to omit the meaning of the variable. Unless the variable has a strict purpose, it is usually better to omit the prefix altogether than to write a contrived v.
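Two of the less common prefixes, sh and b, can be illustrated with a short C sketch. The function names UlFromSh, FTestSh and BFromI are hypothetical, and the limit of 32 flag bits is an assumption made only for this example; dch is the element length as it is named in the table above.

```c
#include <assert.h>
#include <stddef.h>

/* sh: index of a flag bit inside a single variable (not an array).
 * Returns the value of a variable in which only that flag is set, i.e. 2^sh. */
unsigned long UlFromSh(unsigned sh)
{
    assert(sh < 32);  /* assumption: flags live in the low 32 bits */
    return 1UL << sh;
}

/* f: tests whether the flag at bit index sh is set in the variable ul. */
int FTestSh(unsigned long ul, unsigned sh)
{
    return (ul & UlFromSh(sh)) != 0;
}

/* b: direct offset of element ix in an array whose elements are
 * dch units long, following the relation bx = dch * ix. */
size_t BFromI(size_t ix, size_t dch)
{
    return dch * ix;
}
```

For an array of bytes, dch is 1, so BFromI(ix, 1) equals ix, which matches the remark that b and i coincide in that case.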

Data types

To make source code easier to exchange, a set of standard data types, or base types, has been agreed upon (by Simonyi). Their names have a slight C flavor (for example l, from long, for a 32-bit integer value).

Data type | Derived from | Meaning
f | flag | Boolean values (again in terms of meaning, not physical data type), i.e. variables holding a truth value. The identifier should describe the state in which the variable is true.
ch | char(acter) | A one-byte character. Mostly stored in an unsigned byte or char.
st | string | A character string similar to the one in the Pascal programming language, i.e. a string whose first character contains the length of the string.
sz | string, zero-terminated | A zero-terminated string as implemented in C (pointer-based char array).
fn | function | Mostly a pointer to a method.
fl | file | A file or a data structure, usually passed in by the operating system.
w | word | A machine word, usually two bytes in size and signed. The physical data type word is not necessarily meant, however. As usual in Apps Hungarian, the purpose is meant. A w can justify generic use of the variable with appropriate methods.
b | byte | A byte that is likewise not tied to the physical data type of the same name, but mostly corresponds to it because of the unsigned 8 bits (see w).
l | long | A double word, i.e. four bytes, likewise not bound to long (C) or integer (Pascal) (see w).
uw | unsigned word | Unsigned machine word.
ul | unsigned long | Unsigned double word.
r | real | Single-precision floating-point value. In C mostly float.
d | double | Double-precision floating-point value. In C usually double.
bit | | A single bit. Can usually be better identified with an f (flag).
v | void | In theory an empty variable with no data type. Used only together with a pointer in order to point to values regardless of their type.
env | environment | Used for labels, i.e. jump destinations (Pascal: goto envLoop;).
sb | segment base | A segment pointer into memory (see assembly language).
ib | indivisible base or index byte | The variable can first be read as an index (i) into an array of bytes (b). However, ib can also be derived from indivisible base.
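The difference between the data types st and sz can be made concrete with a small C sketch. The function name CchStFromSz is hypothetical, and the 255-character limit follows from the assumption that the length is stored in a single leading byte, as in classic Pascal strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Converts a zero-terminated string (sz) into a length-prefixed string (st).
 * stDest[0] receives the length, the characters follow without a terminator.
 * The caller must provide room for strlen(szSrc) + 1 bytes in stDest.
 * Returns the count of characters (cch) copied. */
size_t CchStFromSz(unsigned char *stDest, const char *szSrc)
{
    size_t cch = strlen(szSrc);

    assert(cch <= 255);              /* the length must fit into one byte */
    stDest[0] = (unsigned char)cch;  /* length prefix */
    memcpy(stDest + 1, szSrc, cch);  /* characters only, no zero terminator */
    return cch;
}
```

The two names tell the reader which string functions are safe to use: strlen works on szSrc, but not on stDest.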

Identifier

Often the prefix and data type are sufficient to name and explain a variable. A variable used to iterate through an array rgch is sufficiently described by

 var ich: Integer; // Pascal
 int ich;          // C

Any further addition seems superfluous, “un-Hungarian”, or simply wrong, for example ichRun, ichIndex or ichArray.

Nevertheless, an identifier is occasionally needed to tie the variable to a specific task. Any (meaningful) word may be added, provided that it contains no underscores (_) and is written in the form “Xxxxx” (only the first letter capitalized). Some words have already been agreed upon because they are used so often. Most of them refer to an array or a similar structure.

Identifier | Meaning
Related to arrays
Min | Describes the very first element of an array and is often used together with a pointer or index: pchMin, ichMin.
Mic (>= Min) | Very similar to Min, but describes the physically smallest element, which in practice is almost always the same as Min.
First (>= Mic) | Describes the first element of an array that is to be used. Often tied to the prefix i: iFirst.
Last (>= First) | A variable xxLast is the counterpart to xxFirst. It is used to index the last element of an array.
Most (>= Last) | In some ways the counterpart to Min, as it denotes the highest index of an array.
Lim (> Most) | Lim specifies the number of elements in an array. An index named xxLim therefore lies beyond the last element and is invalid.
Mac (>= Lim) | The counterpart to Mic and therefore very similar to Max. Like Lim, an invalid index.
Max (>= Mac) | Counterpart to Min; used to give the actual number of elements in an array. This value is also invalid as an index into the array.
Not related to arrays
Nil | Indicates an invalid value and is therefore mostly used as a constant (see Pascal: nil). It usually holds a value such as "0" or "-1".
Null | Similar to Nil. It usually denotes the number "0", often also as an invalid element, and corresponds fairly closely to the constant NULL in C and C++. To prevent misunderstandings, Nil and Null should not be used side by side, or their use should be commented explicitly.
Src | Specifies that the variable is a source, for example in a transfer algorithm.
Dest | Often used together with Src; refers to the destination of an operation (whose source is Src).
Sav | Used as temporary storage (save) for the value of a variable. Using this identifier too often is as questionable stylistically as the prefix v, since the variable's name is no longer strictly tied to a meaning.
T | Similar to Sav, except that this identifier emphasizes even shorter-lived storage of data and is therefore even less tied to a meaning. T should be avoided where possible, and above all the creation of many "temporary" variables by repeatedly appending T. Names like xxTTT, xxTTTT or xxT5 are signs of incorrectly applied Hungarian notation and should be avoided as a matter of principle.
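The array-related identifiers First and Lim lend themselves to a half-open loop. The following C sketch is only an illustration: the function name CchCount and its parameters are hypothetical names built from the table above.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often ch occurs in rgch[ichFirst .. ichLim - 1].
 *   ichFirst - first index to be used
 *   ichLim   - one past the last valid index; never dereferenced
 * Returns a count of characters (cch). */
size_t CchCount(const char *rgch, size_t ichFirst, size_t ichLim, char ch)
{
    size_t ich;
    size_t cch = 0;

    assert(ichFirst <= ichLim);  /* equal bounds denote an empty range */
    for (ich = ichFirst; ich < ichLim; ich++) {
        if (rgch[ich] == ch) {
            cch++;
        }
    }
    return cch;
}
```

Because the name ichLim already says that the index is invalid, the comparison ich < ichLim (rather than <=) follows directly from the name.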

Any other identifier can of course be chosen as well. However, the identifiers in the table should be tried first, especially where arrays are concerned. For example, in Pascal the function

 Length(rgx);

returns the length of the array rgx. It is therefore tempting to store the result in a variable culLength. This is not wrong, since the choice of identifier is not strictly regulated. For code that conforms to the convention, however, culMax is preferable.

Examples

Example | Meaning
rgch | An array of characters, in other words a character string. This notation is equivalent to sz (or, depending on the implementation, to st).
ast | The value at the location that st points to, i.e. the first element of a Pascal string and thus its number of elements. Synonymous with cst.
uuluwch | A variable that can store 32-bit, 16-bit and 8-bit numbers alike. In practice a ul would probably do, unless it is to be pointed out explicitly that the numbers belong to a certain set of data types.
rgbit | A collection of flags. This variable could be implemented as a long. Here the great advantage of Hungarian notation shows: from long flags; it would be difficult to infer the meaning of the variable flags, whereas long rgbit; clearly indicates a collection of flags.
rggr | An array whose elements are structures or classes.
mpchgr | Also an array whose elements are structures or classes. Unlike rggr, however, the array is indexed by ch, i.e. by positive bytes.

Systems Hungarian

This notation is a modification made by the Microsoft Windows programmers and no longer matches the intent Simonyi pursued when he developed Hungarian notation.

Composition of the variable name

Prefix and identifier

In contrast to Apps Hungarian, the name consists only of the prefix, which corresponds to the data type, and a freely chosen identifier.

Prefix | Data type | Example
n | Integer | nSize
b | Boolean | bBusy
sz | Zero-terminated string | szLastName
p | Pointer | pMemory
a | Array | aCounter
ch | char | chName
dw | Double word, 32 bit, unsigned | dwNumber
w | Word, 16 bit, unsigned | wNumber

The individual prefixes can also be combined. For example, paszTable denotes a pointer to an array of zero-terminated strings.
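A combined prefix such as pasz can be sketched in C as follows. The function name CountEntries and the convention that the table ends with a NULL entry are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the entries of a table of strings that ends with a NULL entry.
 *   paszTable - pointer (p) to an array (a) of zero-terminated strings (sz)
 *   nCount    - integer (n) */
int CountEntries(const char *const *paszTable)
{
    int nCount = 0;

    assert(paszTable != NULL);
    while (paszTable[nCount] != NULL) {
        nCount++;
    }
    return nCount;
}
```

Here every prefix restates something the declaration already says, which is the redundancy that critics of Systems Hungarian point to.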

Visibility prefixes

In addition, prefixes for variable visibility can be defined:

Prefix | Visibility | Example
m_ | Member variable | m_szLastName
p_ | Method parameter | p_nNewValue
i_ | Interface parameter (argument of a function) | i_nNewValue
s_ | Static variable | s_nInstanceCount
g_ | Global variable | g_nTimestamp
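The visibility prefixes can be combined with the type prefixes from the previous table. The following C sketch is only an illustration: the struct Person and the functions InitPerson and GetInstanceCount are hypothetical, and a struct field stands in for a member variable because C has no classes.

```c
#include <assert.h>
#include <stddef.h>

int g_nTimestamp = 0;             /* g_: global variable */
static int s_nInstanceCount = 0;  /* s_: static (file-local) variable */

struct Person {
    const char *m_szLastName;     /* m_: member variable */
};

/* p_: parameters of the function */
void InitPerson(struct Person *p_pPerson, const char *p_szLastName)
{
    assert(p_pPerson != NULL);
    p_pPerson->m_szLastName = p_szLastName;
    s_nInstanceCount++;  /* one more initialized instance */
    g_nTimestamp++;      /* simple logical clock, advanced on every call */
}

int GetInstanceCount(void)
{
    return s_nInstanceCount;
}
```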

Criticism

Objections to the Hungarian notation

Robert C. Martin

“… Nowadays HN and other forms of type encoding are simply impediments. They make it harder to change the name or type of a variable, function, member or class. They make it harder to read the code. And they create the possibility that the encoding system will mislead the reader.”

- Robert C. Martin: Clean Code

Linus Torvalds

"Encoding the type of a function into the name (so-called Hungarian notation) is brain damaged - the compiler knows the types anyway and can check those, and it only confuses the programmer."

- Linus Torvalds: Linux kernel coding style

Bjarne Stroustrup

“No I don't recommend 'Hungarian'. I regard 'Hungarian' (embedding an abbreviated version of a type in a variable name) a technique that can be useful in untyped languages, but is completely unsuitable for a language that supports generic programming and object-oriented programming - both of which emphasize selection of operations based on the type and arguments (known to the language or to the run-time support). In this case, 'building the type of an object into names' simply complicates and minimizes abstraction.”

- Bjarne Stroustrup: C++ Style and Technique FAQ

Microsoft

The Framework Design Guidelines (Microsoft's rules on naming and design for libraries that extend the .NET Framework) prohibit developers from using Hungarian notation, although it was quite common on older development platforms such as Visual Basic 6. However, the Framework Design Guidelines make no statement about the naming of private variables.

Binding of field name to field data type

Another disadvantage of Hungarian notation is that it makes code harder to migrate. If the data type of a field changes, the field has to be renamed, which invalidates code built on the API and forces changes over a large area (for example when switching from 32-bit to 64-bit values). For reasons of backward compatibility, the field name is sometimes left unchanged, as in the WinAPI, with the result that the Hungarian prefix indicates an obsolete field type. Example:

Win16: WndProc(HWND hW, WORD wMsg, WORD wParam, LONG lParam)
Win32: WndProc(HWND hW, UINT wMsg, WPARAM wParam, LPARAM lParam)
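The effect of a stale prefix can be sketched in portable C without the Windows headers. The typedefs WORD16 and WPARAM_SKETCH are hypothetical stand-ins, assumed here to mirror a 16-bit WORD and a pointer-sized WPARAM; the function names are likewise invented for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t  WORD16;         /* stand-in for the Win16 WORD */
typedef uintptr_t WPARAM_SKETCH;  /* stand-in for a pointer-sized WPARAM */

/* Returns nonzero if wParam would lose information when squeezed into
 * the 16-bit type that its "w" prefix still advertises. */
int FTruncatedAsWord(WPARAM_SKETCH wParam)
{
    return (WPARAM_SKETCH)(WORD16)wParam != wParam;
}

/* Narrows wParam to 16 bits; only valid when nothing is lost. */
WORD16 WFromWparam(WPARAM_SKETCH wParam)
{
    assert(!FTruncatedAsWord(wParam));
    return (WORD16)wParam;
}
```

A reader who trusts the prefix w and stores wParam in a 16-bit variable silently discards the upper bits, which is exactly the kind of misleading encoding the criticism describes.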

References

  1. Charles Simonyi: Hungarian Notation. In: MSDN. Microsoft, 1999, accessed July 30, 2014.
  2. Nil/de - Free Pascal wiki. Retrieved July 18, 2018.
  3. NULL - cppreference.com. Retrieved July 18, 2018.
  4. Robert C. Martin: Clean Code: A Handbook of Agile Software Craftsmanship. Prentice Hall PTR, 1st edition, Redmond WA 2008, ISBN 978-0-13-235088-4.
  5. Linux kernel coding style. In: Linux (Kernel) Documentation. Retrieved August 10, 2019.
  6. Bjarne Stroustrup: Bjarne Stroustrup's C++ Style and Technique FAQ. June 8, 2014, accessed July 30, 2014.
  7. General Naming Conventions. In: MSDN. Microsoft, accessed July 30, 2014.
  8. What do the letters W and L stand for in WPARAM and LPARAM? - The Old New Thing - Site Home - MSDN Blogs