A psychological test or psychological test procedure is an instrument intended to record psychological characteristics (e.g. current states or persistent properties/dispositions, interests, attitudes) of people, groups of people or organizations. Test procedures are usually used to answer a question as part of the diagnostic process in psychological diagnostics. They are also commonly used to clarify individual differences in the context of differential psychology (where the research aspect predominates).

Tests can be used to examine a course over time (intra-individual differences, e.g. the course of therapy) or to compare people (inter-individual differences, e.g. the suitability of people for professions).

What are psychological tests?

Schmidt-Atzert and Amelang define a psychological test by summarizing various other existing definitions as follows:

  • It is a measurement method
  • with which a psychological characteristic (or several) is to be recorded.
  • The procedure is standardized
  • and includes the collection of a behavioral sample.
  • The behavior is elicited by the specific conditions realized in the test.
  • Variation in the behavior should be largely attributable to variation in the characteristic being measured.
  • The aim is a quantitative statement about the degree of the characteristic or a qualitative statement about its presence or type.

The basic discipline for psychological tests and their application is psychological diagnostics as a branch of psychology. There are three aspects to be distinguished when describing tests:

  • Implementation (type of material, test requirement, logging and registration of the response)
  • Evaluation (calculation of raw values and standard values)
  • Interpretation (processing of results, diagnostic judgment, contribution to decision-making)

The following are also important when applying tests:

  • Selection criteria: for which questions and measurement objects a procedure can be used and which requirements apply
  • Guidelines for communicating the results (to the person diagnosed and/or the client)

The elements of a test are the items: individual tasks or questions that are presented to people and that they must respond to. From the evaluation of the responses (e.g. answers to questions), conclusions are usually drawn about the degree of a characteristic by generalizing across several items.

Numerous procedures are psychometric tests, i.e. a measurement is made on the basis of a theory. In the simplest case, the result generalized over the items (raw value) is determined by adding up certain answers (e.g. correct answers or answers with a certain tendency). Differences then arise with regard to the type of interpretation:

Norm-oriented tests: To interpret (evaluate) the result, it is made comparable with other results (position of a person in a comparison group or relative to a comparison norm). This is done by converting the raw value into a standard value, which allows a comparison with a reference group (e.g. the general population, an age group, successful students or a diagnostic group). This comparison helps to answer the question for which the test was used (Is the development age-appropriate? What are the chances of successfully completing a degree? Is a result conspicuous or typical for a certain diagnostic group?). The development of these norms for a test is called norming or calibration.
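The step from raw value to standard value can be sketched in a few lines. This is a minimal illustration that assumes a linear transformation via z-scores; the answer key, the norm sample and the function names are hypothetical, and published tests provide their own norm tables.

```python
from statistics import mean, stdev

def raw_score(answers, key):
    """Raw value: number of answers that match the scoring key."""
    return sum(1 for given, correct in zip(answers, key) if given == correct)

def standard_scores(raw, norm_sample):
    """Locate a raw value within a reference group (hypothetical norm sample)."""
    z = (raw - mean(norm_sample)) / stdev(norm_sample)
    return {
        "z": z,
        "T": 50 + 10 * z,    # T scale: mean 50, SD 10
        "IQ": 100 + 15 * z,  # IQ scale: mean 100, SD 15
    }

key = ["b", "a", "d", "c", "a", "b"]        # invented scoring key
answers = ["b", "a", "d", "a", "a", "c"]    # invented answers of one person
norm_sample = [2, 3, 3, 4, 4, 4, 5, 5, 6]   # invented raw values of a comparison group

raw = raw_score(answers, key)
print(raw, standard_scores(raw, norm_sample))
```

The same raw value can therefore lead to different standard values depending on which reference group is chosen as the norm.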

Criterion-oriented tests do not determine the position of the person in relation to a comparison norm, but whether a specific criterion has been reached or missed. They must also be content-valid, but the criterion results from the achievement of certain goals (e.g. teaching goals, therapy goals). Determining the value to be reached (cut-off value or cut-off) or the importance of the criterion requires empirical evidence (e.g. a comparison of groups, formed according to the achievement of goals, with regard to success).
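One way to derive a cut-off empirically is to compare a group that reached the goal with a group that did not and to pick the score that separates them best. The sketch below uses Youden's index (sensitivity + specificity - 1) as the selection rule. This rule, the scores and the function names are assumptions chosen for illustration, not a procedure prescribed by the definitions above.

```python
def youden_cutoff(reached, not_reached):
    """Return (cutoff, J) so that 'score >= cutoff' best separates the groups."""
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(reached) | set(not_reached)):
        sensitivity = sum(s >= cutoff for s in reached) / len(reached)
        specificity = sum(s < cutoff for s in not_reached) / len(not_reached)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

def criterion_met(score, cutoff):
    """Criterion-oriented interpretation: reached or not reached."""
    return score >= cutoff

reached = [14, 15, 17, 18, 19, 20]     # invented scores, goal reached
not_reached = [8, 9, 11, 12, 13, 15]   # invented scores, goal not reached

cutoff, j = youden_cutoff(reached, not_reached)
print(cutoff, round(j, 2), criterion_met(16, cutoff))
```

Unlike the norm-oriented case, the result for a person is a yes/no statement relative to the cut-off, not a position in a reference group.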

Tests differ according to the degree of standardization (unification) of information acquisition. In the case of fully standardized procedures, the following elements are standardized:

  • Instructions (instructions given before and during the test application)
  • the items (quantity, order, design)
  • the answer options and the delivery of the answers
  • the evaluation (calculation of raw values and standard values)
  • the interpretation and answer to the question

Standardization, i.e. application under comparable conditions, is a prerequisite for results to be comparable with one another at all. It ensures sufficient objectivity, one of the three main quality criteria of tests.

If some elements are variable, one speaks of partly standardized or semi-standardized instruments. Examples are partly standardized surveys in which a specialist selects the questions and evaluates the answers, or in which questions can be answered freely and an evaluator rates the answers. Here, standardization is achieved through clear assessment criteria and the training of assessors.

In addition, there are “qualitative tests” that “provoke” behavior in a standardized way, which is then qualitatively assessed or interpreted by a specialist. These include classically evaluated projective or interpretation tests (for some of these tests, such as the Rorschach test, standardized evaluation methods have also been developed).

Quality: quality criteria of tests

According to the definition by Lienert and Raatz, a psychological test procedure must meet the following criteria: It must ...

  1. ... be scientifically sound.
  2. ... be routinely feasible under standard conditions.
  3. ... enable the relative position of an individual to be determined with regard to a group or criterion.
  4. ... be empirically delimitable, i.e. capture not hidden features and phenomena such as the unconscious in psychoanalysis, but observable and measurable properties (constructs), that is, properties that can be defined through the analysis of experience and behavior and phenomenologically, not only purely conceptually.
  5. ... map the feature to be examined one-dimensionally and metrically (but note test batteries).

Scientifically sound tests must meet certain test quality criteria. For tests in the field of suitability diagnostics, there is DIN standard 33430, which, however, is a voluntary standard and is not legally binding.

In Germany there is a test board of the Federation of German Psychological Associations to monitor the quality of psychological procedures. Quality guidelines were developed as checklists (TBS-TK), according to which the quality of tests is assessed and published in the form of reviews.

Misuse of tests is not uncommon. Test procedures are in use that have not been adequately verified scientifically. Tests can be used for questions for which they were not designed. Results can be treated as absolute without justification. Tests can be used by people who have no or insufficient qualifications (see psychomarket).

Types of psychological tests

The number of existing psychological tests in the German-speaking area alone can be estimated at several thousand. The quality and level of development can be very different. Different systems are common with regard to the classification of tests, in some of which the categories are mixed up.

  • In judgment tests, individual characteristics are determined through the person's assessment of facts. There are no right or wrong answers here; one problem can be that judgments are made according to their social desirability (“making a good impression”). There are conceptual overlaps
    • with personality tests (interpretation tests are also personality tests; performance tests can also record personality traits, e.g. intelligence as a personality trait; objective personality tests are performance tests that are evaluated with regard to “classic” personality traits);
    • and with questionnaires (judgment tests are only one kind of questionnaire; there are numerous other types of questionnaires, also outside of psychology, e.g. personnel questionnaires).
  • Performance tests are tests in which the quality of the answers can be assessed qualitatively or quantitatively (number of solutions, solution quality, time to solution). These tests have right and wrong solutions.
  • Interpretation tests, or projective tests, are another type of test. While in a questionnaire the test person describes his or her own habitual behavior and experience (self-assessment), in a projective procedure he or she is asked to give a creative interpretation of the test template.

A further classification can be made according to whether hard skills/professional competence (predominantly performance characteristics) or soft skills (social skills, inclinations, interests, personality characteristics in the narrower sense) are recorded.

Each test can basically be described along the following three dimensions and classified accordingly:

  • which characteristics are measured (intelligence, memory, attention, concentration, ability to learn, personality, attitudes, motivation, complaints, mental state, etc.),
  • the way in which the characteristics are measured (judgment tests, performance tests, interpretation tests), and
  • for which questions the test can be used (aptitude tests, fitness to drive, tests to identify mental disorders, school tests, etc.).

Multimodal diagnostics or multi-method diagnostics is a concept that systematically varies these dimensions in order to obtain more precise information.

The existing test classifications merge these three aspects for the sake of simplicity.

The PSYNDEX test classification

PSYNDEX, the most widespread research and documentation system for German-language psychological literature and tests, uses the following test classification (in brackets the number of procedures available in December 2018):

  • Development tests including school readiness tests and gerontological procedures (710)
  • Intelligence tests with learning ability tests and memory tests (445)
  • Creativity Tests (29)
  • Performance, ability and aptitude tests with musicality tests and sports tests (671)
  • Procedures for recording sensorimotor skills (251)
  • School Achievement Tests (513)
  • Attitude tests including traffic psychology tests, job-related attitude tests and work psychology procedures (1883)
  • Interest Tests (79)
  • Personality Tests (1469)
  • Projective methods (158)
  • Clinical Procedures (3124)
  • Behavioral Scales (257)
  • Other procedures including procedures for collecting sociographic data as well as exploration and anamnesis schemes (179)

Further sub-categories for the categories mentioned here can be found at the specified source.

Classification of the test center

The test center of the Hogrefe Verlag, which also handles the controlled test distribution for academically qualified psychologists, arranges tests according to the following categories (again with the number of available procedures in brackets; in contrast to PSYNDEX, this only includes procedures that have been published ready for use by a publisher):

  • Job-related procedures (103)
  • Development Tests (105)
  • Intelligence Tests (189)
  • Clinical Procedures - Adult (142)
  • Clinical Procedures - Children and Adolescents (145)
  • Achievement Tests (31)
  • Medical psychological procedures (63)
  • Neuropsychological Procedures (108)
  • Personality Tests (101)
  • School Tests (124)

Other differences

  • Tests either follow the same course for all persons or they are adaptive, i.e. the course of the test is influenced by the answers given previously.
  • In a test battery, several tests that are relevant to a specific question are carried out. These can also belong to different categories.
  • In addition to tests for individuals, there are also tests for groups of people and organizations.
  • Progress tests / parallel tests: Many tests cannot be repeated because familiarity with the test would distort the results. For some tests, equivalent parallel versions (same measurement object, different items) are offered, which can be used repeatedly or as alternatives, e.g. if the solutions in performance tests would be easier to find the second time, if neighbors in group tests should not be able to copy from each other, or if abnormal findings are to be checked again. Other tests are designed as progress tests for multiple use (e.g. mood questionnaires).
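How earlier answers can steer an adaptive test may be illustrated with a deliberately simple rule: present the unused item whose difficulty is closest to the current ability estimate, then move the estimate up after a correct answer and down after a wrong one, halving the step each time. This is a toy staircase with invented item difficulties, not the statistical estimation that real adaptive tests typically use; all names are hypothetical.

```python
def run_adaptive_test(difficulties, answer_fn, n_items=4, start=0.0, step=1.0):
    """Administer n_items adaptively; return (final estimate, item ids in order)."""
    remaining = dict(difficulties)  # item id -> difficulty
    estimate, administered = start, []
    for _ in range(min(n_items, len(remaining))):
        # Pick the unused item whose difficulty is closest to the estimate.
        item = min(remaining, key=lambda i: abs(remaining[i] - estimate))
        difficulty = remaining.pop(item)
        correct = answer_fn(item, difficulty)
        estimate += step if correct else -step
        step /= 2  # smaller corrections as evidence accumulates
        administered.append(item)
    return estimate, administered

# Invented item pool: item id -> difficulty on an arbitrary scale.
items = {"i1": -2.0, "i2": -1.0, "i3": 0.0, "i4": 1.0, "i5": 2.0, "i6": 1.5}

def simulated_person(item, difficulty):
    """Simulated test taker who solves every item up to difficulty 1.2."""
    return difficulty <= 1.2

estimate, order = run_adaptive_test(items, simulated_person)
print(estimate, order)
```

Two people with different answer patterns see different items here, which is exactly what distinguishes an adaptive test from one with a fixed course.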

Survey methods

With regard to the survey method, a distinction must be made between at least two types that have developed with the advancement of technology.

Paper-pencil tests

If the diagnosed person receives all documents on paper and also answers there or fills out worksheets, one speaks of paper-and-pencil diagnostics or tests (P&P). This term arose when computerized versions of tests became increasingly available and the procedures that remained in paper form needed a name of their own. Many older test procedures are paper-and-pencil tests, but modern tests are often still designed and standardized in such a version, e.g. if they are aimed at clients who cannot use a computer or if practical considerations in everyday clinical practice speak against computer support. Qualitative tests, too, are often P&P. These tests are generally considered more labor-intensive to evaluate. However, there are often hybrid solutions in which the test person works on paper but the test supervisor enters the results into a program so that the values can be calculated. Some tests can generally only be completed on paper, such as drawing tasks in the diagnosis of stroke patients ("Please draw a house with windows, door, chimney and roof."). Structured interviews are also often carried out as paper-and-pencil tests.

Computer-aided tests

Computer-aided tests are those in which the test person works independently at the screen, using the keyboard or a specially developed input unit. These tests are considered far more economical, but they cannot be used for every purpose. Input units can be simplified keyboards, but also mechanical devices such as pedals, joysticks, large buttons or controllers. More complex programs that carry out and evaluate several tests with computer assistance are called test systems. For computer-aided tests, in addition to the basic quality criteria of psychodiagnostic methods, there are further criteria specific to this type of testing: they should be tamper-proof, self-explanatory, hardware-independent and barrier-free, and they should guarantee test fairness.

Computer-aided tests also include procedures in which the answers are given verbally to the test supervisor, who then enters them into a recording program that carries out the evaluation. Finally, answer sheets from paper-and-pencil tests can be scanned and evaluated by computer, especially if the evaluation is very complex (e.g. MMPI).

With the development of the Internet, extensive possibilities for test procedures have arisen. These procedures are sometimes also referred to as online assessment. A distinction must be made according to the extent to which the tested person receives the result after completing the test. The results must be presented in an understandable way, because usually no specialist trained in psychodiagnostics interprets them. Applications can be found in student counseling or other forms of “self-selection” to explore suitability and inclination for certain training courses, professions or careers. Such tests are increasingly being used by applicants for professional profiling, but are also often used for statistical studies without the tested person receiving a complete evaluation.

Test construction / test development

Psychological tests are measuring instruments that are designed, evaluated and calibrated according to scientific criteria. The development of a scientifically sound test is time-consuming and requires both theoretical preliminary work to define the object, characteristics and items to be recorded and empirical trials on sufficiently large samples that are representative of the future area of application. The norms of a test, which serve as its benchmark, must be checked regularly to see whether they are still valid.

Tests can basically be developed according to two concepts:

There are several options for the sequence of work steps. On the one hand, there can be a dedicated theory about human behavior (e.g. a personality theory or intelligence theory). Based on this, questions (items) are generated, which are checked for their quality using more or less complex statistical procedures (measurement accuracy, objectivity and validity). Hypothetical classes or groups are formed and named, or these classes are determined using statistical methods (e.g. factor analysis). These can then be graded continuously or discretely according to intensity or frequency (e.g. very, slightly, little). The resulting measured values then represent the degree of the characteristic.

Another method is called external construction. This procedure, whose rationale only becomes apparent at second glance, works as follows: Two distinguishable social groups are considered (e.g. alcohol addicts vs. non-addicts). Both groups are presented with a wide range of (heterogeneous) items, which they answer. Finally, those items are selected that separate the two groups from one another in a statistically reliable manner, and the test is assembled from them. Other people can now be assigned to one or the other group (with a certain probability of error). This procedure sometimes produces items that seem to have little in common with what the test is supposed to examine. On the other hand, the test should also be as “opaque” as possible to the test subjects. An example is the Minnesota Multiphasic Personality Inventory (MMPI, Hathaway and McKinley, 1951), in which some of the 566 questions do not reveal how they are evaluated.

Inductive construction is not bound to any theory. Here, items that fit together in terms of content are put together "blindly". These items should be related to one another (correlate) as closely as possible. With the help of further correlation checks, it can then be decided whether the scale developed in this way is valid.
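The item selection step of external construction can be sketched as follows, assuming yes/no items and a two-proportion z-test as the criterion for a "statistically reliable" separation. The choice of test, the threshold of 1.96 and the response data are illustrative assumptions; with samples this small the normal approximation is rough.

```python
from math import sqrt

def two_proportion_z(yes_a, n_a, yes_b, n_b):
    """z statistic for the difference between two endorsement rates."""
    p_a, p_b = yes_a / n_a, yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (p_a - p_b) / se

def select_items(group_a, group_b, z_crit=1.96):
    """Keep the indices of items whose endorsement rates differ between groups.

    group_a, group_b: lists of response rows with one 0/1 entry per item.
    """
    selected = []
    for i in range(len(group_a[0])):
        yes_a = sum(row[i] for row in group_a)
        yes_b = sum(row[i] for row in group_b)
        z = two_proportion_z(yes_a, len(group_a), yes_b, len(group_b))
        if abs(z) >= z_crit:
            selected.append(i)
    return selected

# Invented responses of ten people per group to three items.
group_a = [[1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0], [1, 1, 0],
           [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 0]]
group_b = [[1, 1, 1], [1, 1, 1], [0, 1, 1], [0, 1, 1], [0, 1, 1],
           [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 0], [0, 0, 0]]

print(select_items(group_a, group_b))
```

Note that the selection looks only at how well an item separates the groups, not at its content, which is why the retained items need not appear related to the characteristic.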
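A correlation check of this kind can be sketched as follows. The example computes, for each item, its correlation with the sum of the remaining items, and uses Cronbach's alpha as a measure of internal consistency. Both statistics are common choices assumed here for illustration; the response data are invented, and internal consistency alone does not establish validity.

```python
from math import sqrt
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def item_rest_correlations(rows):
    """Correlate each item with the sum of the remaining items."""
    result = []
    for i in range(len(rows[0])):
        item = [row[i] for row in rows]
        rest = [sum(row) - row[i] for row in rows]
        result.append(pearson(item, rest))
    return result

def cronbach_alpha(rows):
    """Internal consistency of the scale formed by all items in rows."""
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Invented ratings of six people on four items; the last item does not fit.
rows = [[1, 1, 2, 3], [2, 2, 2, 1], [3, 3, 4, 4],
        [4, 4, 4, 2], [5, 5, 5, 3], [2, 3, 3, 5]]

print([round(r, 2) for r in item_rest_correlations(rows)])
print(round(cronbach_alpha(rows), 2))
print(round(cronbach_alpha([row[:3] for row in rows]), 2))
```

An item with a low item-rest correlation is a candidate for removal; in this invented data set, dropping the fourth item raises the internal consistency of the remaining scale.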

User training

Tests must be sufficiently documented for the user (usually in a manual). According to DIN 33430, this manual must present the essential construction steps and the investigations relating to the quality criteria, and contain precise instructions for administering, evaluating and interpreting the test.

Well-founded psychological tests are delivered to the user with precise instructions and can only be carried out meaningfully if these instructions are followed. This is especially true for tests that are carried out interactively by a test supervisor. Instructions must be available on how to deal with possible irregularities during administration. The administration of tests must therefore be practiced before first use and is part of psychology degree programs in Germany.

Virtually all tests are subject to measurement error, boundary conditions can influence the result, and only probability statements are possible regarding the prediction of certain facts (presence of a disorder, study success, etc.). Precise knowledge of the test and of the underlying theories and concepts is therefore essential when interpreting the results. This applies in particular to communicating the findings to the person diagnosed and to the client, so that test results are not over-generalized (see also iatrogenic noxa).

For some tests, user seminars are therefore recommended or even compulsory. A controversial question is whether, for the reasons mentioned, tests should only be accessible to trained psychologists. Controlled test distribution (sale by the test center only to trained psychologists with proof of a diploma) was originally intended to guarantee this, but could only be enforced legally and organizationally to a limited extent. DIN 33430 has now expressly opened up the group of users in the field of suitability diagnostics, but prescribes training standards. In Germany, the training is also open to other professional groups and is certified with a license.
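What measurement error means for a single result can be made concrete with a confidence band around the observed value. The sketch below uses the standard error of measurement from classical test theory, SEM = SD * sqrt(1 - reliability). The score, the scale and the reliability of 0.91 are invented example values, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def score_interval(score, sd, reliability, confidence=0.95):
    """Confidence band around an observed standard value."""
    sem = sd * sqrt(1 - reliability)  # standard error of measurement
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return score - z * sem, score + z * sem

# Invented example: observed value 108 on a scale with mean 100 and SD 15.
low, high = score_interval(score=108, sd=15, reliability=0.91)
print(round(low, 1), round(high, 1))
```

In this example the band still includes the scale mean of 100, so the observed value alone would not justify the statement that the person is above average.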



