Criticism of the PISA studies


The PISA student assessments of the OECD have triggered not only extraordinary media coverage but also fierce scientific debate. This article summarizes criticism of the objectives, methodology and interpretation of the PISA studies (PISA = Programme for International Student Assessment).

The status of PISA: between science, business and politics

PISA has triggered hundreds, if not thousands, of secondary studies that evaluate the scaled data set (competence values and background variables) in more detail under various aspects. Much of this work has appeared in peer-reviewed scientific journals. Paradoxically, PISA itself has not undergone any such quality control: PISA is interest-guided, or even interest-bound, contract research, financed by the individual governments of the participating countries and carried out by private-sector institutes (especially ACER, Australia), and the results are self-published by the OECD without prior external review. Recently, even the data sets have been withheld from the scientific public. A primary publication in specialist journals would also hardly be possible, because the test items (in the language of psychology: the "instruments") are mostly kept secret, which is an obvious violation of scientific standards.

The fact that PISA is not a scientific work in the strict sense makes criticism more difficult, because there is no central international publication venue that could bundle the discussion. Critical works have so far appeared in scattered fashion, sometimes in very obscure places. Only with a delay of several years have scientists come together to present their PISA criticism in concerted form (in particular Jahnke/Meyerhöfer 2006; Hopmann/Brinek/Retzl 2007). Bank and Heidecke (2009) systematize various critical publications on PISA and thus provide a basis for a further, structured discourse. In addition to the aforementioned anthologies, the following presentation draws on a few other sources and at the same time endeavors to reflect critically on the criticism itself.

Due to the complexity of the subject, criticism of PISA is an interdisciplinary undertaking in which not only educational researchers but also educators, psychologists and other scientists with statistical expertise (mathematicians, physicists, economists) participate.

Hopmann and Brinek describe the reaction of PISA officials to public criticism as follows:

  • Silence so as not to give critics a public response;
  • if that is no longer sufficient, deny the critics' competence and impute unfair motives;
  • admit isolated problems if necessary, but claim that these have no significant impact;
  • finally claim that the criticism is well known and long since refuted.

Criticism of the objectives of PISA

The utilitarian educational goal of PISA is criticized in particular by Francophone authors: it leads, first, to a distortion of the test results in favor of Anglo-Saxon countries and, second, to pressure to adapt curricula in the direction of skills that are directly relevant to everyday life. This threatens, for example, the specific character of French mathematics teaching, which attaches great importance to rigorous proofs. In this context, reference is made to the economic goals of the OECD and to the lack of transparency and of democratic legitimacy in the decision-making processes of PISA (Cytermann in DESCO 2003). A similar objection is that PISA, with its focus on the three areas of mathematics, mother tongue and natural sciences, promotes the marginalization of social sciences, foreign languages and the arts. Even within the mother-tongue domain, literary education is excluded. Central educational areas such as linguistic expression, literary knowledge, historical, geographical, political and economic knowledge, basic religious and ethical education, and basic aesthetic education are left out.

Jahnke (in Jahnke/Meyerhöfer 2006) criticizes the basic idea of wanting to "standardize" education (cf. educational standards and, for criticism, Brügelmann 2005), and interprets PISA as opening up the market for the test industry.

Freerk Huisken sees PISA as documenting the claim of nation and economy on the achievements of education and science, or as raising this claim as a question of national honor in international competition. Seen this way, damage to the national reputation falls back on "all of us" and challenges us to new top performances for the good of the nation. The question of the interests of the individuals involved in the educational process is accordingly posed only in abstract, national terms.

Criticism of the methodology

Curricular validity of the test items

There are different views about curricular validity, i.e. the conformity of the test items with the curricula of the schools tested. While the head of PISA 2000, Baumert, categorically denied curricular validity, his successor Prenzel just as categorically asserts it. How curricular validity could have been determined at all, given the largely secret tasks, other than through so-called expert judgment, has not yet been clarified.

Quality of the test items

Following the tests of 2000 and 2003, only a small part of the tasks used (the "instruments" in the language of psychology) was published. A large number of authors have criticized these sample tasks as partly incorrect or misleading.

The mathematics education researcher Wolfram Meyerhöfer (2005) argues that PISA does not meet its own requirement of testing mathematical performance or, more specifically, "mathematical literacy". By means of interpretative analysis (method: didactic analysis and objective hermeneutics) he identifies several problem areas:

  • Often there are so many ways to arrive at the desired solution (which is not always the correct solution) that it is impossible to say which ability a task actually measures. The construct "mathematical performance" thus becomes a matter of chance.
  • Components of test-taking ability ("testability") are also measured. This core competency turns out to consist neither in taking the posed mathematical problem nor the alleged real-world problem seriously, but in concentrating on what the testers want to see ticked or written down. In principle, it proves advantageous to work in a mediocre way, i.e. to forego intellectual depth in dealing with the tasks.
  • Answers on multiple-choice items can be guessed. The PISA group claims to be able to overcome this problem technically, but this turns out to be a misjudgment (see the illustrative calculation after this list).
  • The didactic and psychological theories allegedly used are merely a theoretical cloak for an essentially theory-poor test construction.
  • The tests are not created by operationalizing measurement constructs, but by systematically piecing together tasks.
  • PISA is supposed to test "mathematical literacy". In short, this is the ability "to recognize and understand the role that mathematics plays in the world, to make well-founded mathematical judgments, and to engage with mathematics in ways that meet the needs of one's present and future life as a constructive, concerned and reflective citizen" (PISA report). In view of the actual tasks, Meyerhöfer argues, there can be no question of any of this.
  • The tasks display a mathematics-didactic habitus that Meyerhöfer summarizes under the keyword "turning away from the subject matter". It comprises the following elements: a manifest orientation towards technical language combined with a latent destruction of the mathematics; an illusory closeness to students' lives; an orientation towards calculation instead of mathematical education; and a failure to mediate between the real and the mathematical in purportedly realistic tasks. The latter results from neglecting the authenticity of both the real and the mathematical.
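
To illustrate the guessing problem mentioned in the list above, a minimal back-of-the-envelope calculation (assuming, purely for illustration, items with four equally attractive answer options and a student who guesses blindly; actual PISA item formats vary): with k options, the expected proportion of correct answers from pure guessing is

\[
E[\text{proportion correct}] = \frac{1}{k} = \frac{1}{4} = 25\,\%.
\]

A blind guesser thus scores well above zero on such items; unless the scaling model explicitly corrects for guessing, this chance-level baseline is attributed to ability rather than to chance.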

Doubts about intercultural comparability

The translation problem

The translation problem, which has remained unresolved since the very first comparative school studies, distorts international comparisons in various ways:

  • Origin of the tasks (mainly from the Anglo-Saxon area and the Netherlands).
  • Different readability of different languages (the pure text length varies by 10% or more).
  • Texts tend to become longer in translation.
  • When translators understand a task, they tend to build help into the wording (Freudenthal 1975).
  • When translators do not notice all the pitfalls, the translated task can become considerably more difficult.
  • Translation errors occur.

Familiarity with the task format

Another problem is differing familiarity with the task format. Meyerhöfer speaks here of "testability"; in the USA, the significance of "testwiseness" has long been discussed. Wuttke (2006) found that up to 10% of German-speaking students do not understand the multiple-choice format and tick more than one answer alternative where it is implied that exactly one answer is correct. According to Joachim Wuttke, "for a number of tasks [...] what is required is the art of writing down an answer sentence that takes information from the text without repeating it verbatim. Sometimes it is difficult to guess what the examiners want to hear. Only the scoring instructions, which are fully reproduced only in the English documentation, show how much of what is measured here is shaped by Anglo-American examination habits."

Motivation of the test participants

It is known from studies within the USA that the difference between low-stakes and high-stakes tests can amount to half a standard deviation or more. Sjoeberg contrasts the unconditional achievement motivation in Taiwan and Korea, where the national anthem is sung before the test session, with the mentality of Norwegian and Danish students, who ask themselves what the test will bring them personally and who stop making a serious effort at the latest when the test tasks are no fun.

Statistical flaws

When evaluating PISA and similar studies, the basic problem arises that performance differences within each country are much greater than the typical differences between countries. A measurement accuracy in the low percent range is therefore required in order to make statistically significant statements about such differences. In PISA, this is formally achieved by using very large samples (around 5,000 students per country). However, the official standard errors do not take into account possible systematic biases.
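
A rough calculation illustrates the orders of magnitude involved (a minimal sketch: the PISA scales are normed to a mean of 500 and a standard deviation of about 100 points; the simple formula below ignores the clustered sampling design, which in practice inflates the error):

\[
\mathrm{SE}(\bar{x}) \approx \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{5000}} \approx 1.4 \ \text{points}.
\]

A purely statistical sampling error of about 1.4 points (roughly 0.3% of the scale mean) is indeed small compared with typical country differences of some 10 to 30 points; a systematic bias of only a few points, for instance from selective exclusions, would be of a similar order as many reported country gaps, yet does not appear in this error figure.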

Such distortions are caused, among other things, by:

  • PISA tests 15-year-olds. At this age, particularly weak pupils are no longer in school in many countries. In Turkey, for example, only 54 percent attend school at this age, in Mexico 58 percent, but in Germany 96.3 percent. This means that particularly weak students depress the level in Germany, while in other countries they are no longer represented because they have already left school, even though they, too, reflect the overall effectiveness of a school system.
    • China uses the so-called Hukou system to assign residents a registered place of residence, regardless of where they actually live; free access to educational institutions is tied to this registration. The good test results in Shanghai may be due to this discrimination against the rural population. An indication of this is the comparatively low number of 15-year-olds tested relative to the total population.
  • Unreliable initial data (there are no original lists with all fifteen-year-olds; the sampling is extremely complicated and cannot be checked).
  • Performance-based participation rates.
    • The USA fell below the minimum school participation rate of 65% without consequences.
    • In South Tyrol only 83% of all fifteen-year-olds were recorded as schoolchildren, although schooling is still compulsory there at this age. Vocational schools were probably largely excluded from the test, which would explain the top result there as a statistical bias (cf. statistical artifact).
    • In Austria, all results from PISA 2000 had to be revised downwards significantly years later due to insufficient consideration of vocational school students.
  • Inconsistent exclusion of students with learning disabilities. Only in seven countries, including Germany, were special-school students tested in short tests. If these tests were excluded, Germany would rise from 18th to 12th place among 29 countries in the reading performance of its students in PISA 2003. Other countries do not have special schools for the learning-disabled but were nevertheless able to exclude up to 4.5% of the population at the school level; the proportion of people with learning disabilities in Germany is 2.5%.
    • Denmark, Finland, Greece, Ireland and Poland have excluded dyslexics from the test. The proportion of dyslexics in Germany is between 4 and 15%.
    • In Denmark, students with numeracy problems were also excluded. Dyscalculia affects 4-5% of students in Germany.
    • Violation of international rules: Canada, Denmark, New Zealand, Spain and the USA have excluded more than the allowed 5% of the target population.
  • Details of the sampling and of the test administration cannot be externally verified and can be manipulated at will if there is a corresponding political interest.

Violations of rules by those involved in the project

Prof. Dr. Tillmann, the head of the scientific advisory committee of the Bielefeld Laboratory School, is also a member of the national PISA consortium. This was criticized by the German Teachers' Association. At its own request, the Bielefeld Laboratory School was re-tested one year after the main PISA run; at this point in time, some of the PISA tasks were already known. Evaluations for individual schools are intended only as feedback to the school principal; for statistical reasons they cannot be used for serious comparisons. That is why the school principals were obliged to maintain confidentiality. Nonetheless, results of individual schools were made public, in part for transparently political reasons.

Interpretation

Are the PISA assessments a school performance test?

The majority of the public perceives PISA as a study of the performance of the school system. This fits with the perception of PISA as a competition between countries, since the basic structure of the school system differs from country to country. This interpretation of PISA as a study of school performance is, however, open to the following objections:

(1) PISA tests an age group, not a grade level. This can be justified if one regards performance as the result attained by a certain age. It discriminates, however, against school systems in which a significant proportion of students are in lower grades due to late school enrollment, grade retention, or voluntary repetition. The PISA approach is therefore unsuitable for comparing the performance of pupils "near the end of compulsory education". One can of course ask whether grade retention and similar measures are educationally sensible; however, the task of a scientific study is to define its standards clearly and transparently and to refrain from any evaluation that is already implicit in the choice of the sample definition.

(2) PISA does not measure the increase in cognitive skills over the course of schooling, but only the current state in a certain age group. From the PISA performance data it cannot be deduced to what extent performance is due to schooling and to what extent to differing innate dispositions and environmental influences. This argument overlooks the fact that there are numerous other studies, including longitudinal ones such as the LAU investigations in Hamburg. For the school, the question of nature versus environment does not arise; the decisive factor is whether something can be changed and whether there is enough time to do so, which is questionable given the early selection after grade 4. However, so the criticism continues, PISA does allow performance data to be correlated with social characteristics. The results show that social conditions affect cognitive performance differently in different countries. The example most frequently cited in Germany is that of migrant children, who lag behind more in Germany than in other countries. In eastern Germany a higher proportion of migrant children attend grammar schools than in western Germany; in Thuringia, for example, 63% of children of Vietnamese descent attend a grammar school. This is explained "progressively" by the much better crèche and daycare system in East Germany, "conservatively" by a very high appreciation of education in Vietnamese culture, "ethnically" by the higher average IQ of Asians (compared with Turks, Arabs or Europeans), and "sociologically-historically" by the fact that it was above all the elites who fled from communism in Vietnam to Germany.

(3) The PISA results are published approximately two years after the survey. Conclusions about the education policy situation at the time of publication are therefore not permissible (as is nevertheless done in the reporting). However, this argument presupposes rapid changes in the system, something that has hardly been demonstrated so far.

(4) When interpreting the results, the PISA studies do not take into account differences in the number of teaching hours, in institutionalized learning outside school, or in the share of subject-specific hours, for example in mathematics, in the total number of teaching hours. Korea, for example, has very high values both for the number of mathematics lessons and for extra-curricular engagement with the subject matter.

Do the PISA studies measure intelligence?

Heiner Rindermann and Siegfried Lehrl argue that PISA is an undertaking to measure general intelligence, and the most elaborate and best ever carried out internationally. According to the authors, the country results of PISA agree to within plus or minus three IQ points with the IQ data (and estimates) in the book IQ and the Wealth of Nations (Lynn and Vanhanen, 2002). The correlations found, which, owing to the mostly "excellent representativeness" of the samples, are higher than those between earlier IQ tests, prove in the view of the proponents of IQ tests that PISA results are fundamentally consistent with the results of a century of classical intelligence research and confirm the usefulness of established, standardized IQ tests; the same is claimed for the very high correlations of PISA values and IQ with the results of the TIMSS and IGLU studies. However, the results of the PISA studies cannot be correlated with those of the IGLU study, since the two studies use different sample definitions: whereas the PISA study samples by the age of the test persons (15 years), the IGLU primary school study tests pupils of a certain grade level (mostly 4th grade). The results of the two studies therefore cannot correlate and to that extent also cannot confirm the "results of a century of classical intelligence research", unless one assumes that intelligence in fourth graders is independent of schooling and unchangeable, which is, however, a questionable assumption.
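
For orientation, the two scales can be related by a simple linear rescaling (an illustrative conversion based only on the nominal scale parameters, mean 500 / SD 100 for PISA and mean 100 / SD 15 for IQ; not necessarily the exact procedure used by Rindermann and Lehrl):

\[
\mathrm{IQ} \approx 100 + (\text{PISA score} - 500)\cdot\frac{15}{100}.
\]

On this conversion, a country mean of 480 PISA points corresponds to about 97 IQ points, so an agreement "to within plus or minus three IQ points" corresponds to roughly 20 PISA points.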

References

  1. Hopmann / Brinek / Retzl 2007, p. 14f.
  2. news4teachers.de
  3. Freerk Huisken: The "PISA shock" and how to cope with it. How much stupidity does the republic need / tolerate? VSA-Verlag, Hamburg 2005.
  4. For example, in a multiple-choice question the answer "Evolution is a theory that has been proven by research" is rated as incorrect. Instead of "research", the English original says "scientific experiments". (Joachim Wuttke: PISA: an expensive random number generator. In: Berliner Zeitung. December 11, 2007)
  5. Joachim Wuttke: PISA: an expensive random number generator. In: Berliner Zeitung. December 11, 2007.
  6. in Hopmann et al. 2007, pp. 220-223.
  7. Wuttke (2007), first in Jahnke / Meyerhöfer (2006).
  8. Attention OECD-PISA: Your Silence on China is Wrong, December 12, 2013.
  9. lern-foerdern.de (Memento of the original from March 5, 2016 in the Internet Archive)
  10. legasthenie-lds.org (Memento of the original from March 5, 2016 in the Internet Archive)
  11. akl-bayern.de
  12. Lehrerverband.de (Memento of the original from September 15, 2007 in the Internet Archive)
  13. statsoft.com

Literature

Web links