Big data

from Wikipedia, the free encyclopedia
Figure: Color representation of the activity of a Wikipedia bot over an extended period, a typical example of illustrating "big data" with a visualization.

The term big data [ˈbɪɡ ˈdeɪtə] (from English big 'large' and data 'data'; in German also Massendaten, 'mass data') denotes amounts of data that are, for example, too large, too complex, too fast-moving or too weakly structured to be evaluated using manual and conventional data processing methods.

“Big data” is often used as a collective term for digital technologies that are held responsible, in technical terms, for a new era of digital communication and processing and, in social terms, for a social upheaval. As a catchphrase, the term is subject to continuous change; it is also often used to describe the complex of technologies used to collect and evaluate these amounts of data.

Term

In the definition of big data, “big” refers to the four dimensions

  • volume (volume, data volume),
  • velocity (speed at which the data volumes are generated and transferred),
  • variety (range of data types and sources) and
  • veracity (authenticity of data).

This definition is expanded to include two further Vs, value and validity, which stand for business added value and the assurance of data quality.

Other meanings

Big data primarily describes the processing of large, complex and rapidly changing amounts of data. As a buzzword, however, the term has other meanings in the mass media:

  • Increasing surveillance of people by intelligence services, including in Western states, for example through data retention
  • Violation of customers' personal rights by companies
  • Increasing lack of transparency in data storage due to delocalization (cloud computing)
  • Desire of industry to gain a competitive advantage from the available data
  • Automation of production processes (Industry 4.0, Internet of Things)
  • Non-transparent automation of decision-making processes in software
  • Use of new technologies instead of standard software (especially in companies with conservative IT, often through the use of software as a service to bypass company-internal IT restrictions)
  • Development of in-house software solutions ("in-house IT") instead of the use of "off-the-shelf" software from external companies
  • Advertising based on data on internet and mobile phone usage
  • Organization of collaboration in the context of people analytics projects, even if this involves neither large nor complex amounts of data.

Data origin

The collected data can come from various sources (selection):

“Big data” also includes areas that are considered “intimate” or “private”. The desire of industry and certain authorities to have free access to this data, to analyze it better and to use the knowledge gained inevitably comes into conflict with the protected personal rights of the individual. A way out can only be achieved by anonymizing the data. Classic users are providers of social networks and search engines. The analysis, acquisition and processing of large amounts of data is commonplace in many areas today.

Big data can enable business process improvements in all functional areas of companies, but above all in technology development and information technology as well as in marketing. The collection and utilization of these data volumes generally serves corporate goals or national security. So far, large sectors, companies and areas of application in business, market research, sales and service management, medicine, administration and intelligence services have adopted the corresponding digital methods, with the aim of developing the recorded data further and using it profitably. The data is mostly collected for group-oriented business models as well as for trend research in social media and advertising analyses, in order to recognize future-oriented and possibly profitable developments and to convert them into forecasts.

Growth

Amounts of data typically grow exponentially. According to calculations from 2011, the volume of data generated worldwide doubles every two years. This development is driven mainly by the increasing machine generation of data, e.g. via records of telecommunication connections (call detail records, CDR) and web access (log files), and via automatic capture by RFID readers, cameras, microphones and other sensors. Big data also arises in the financial industry (financial transactions, stock market data), in the energy sector (consumption data) and in the healthcare sector (billing data from health insurance companies). In science, large amounts of data also arise, e.g. in geology, genetics, climate research and nuclear physics. The IT industry association Bitkom described big data as a trend in 2012. For very large data sets, storing everything in reserve is uneconomical; in that case only metadata is saved, or the evaluation starts concurrently with the creation of the data or at most with a slight delay.
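The two-year doubling figure cited above can be turned into a simple projection. The following is a minimal sketch, assuming a perfectly regular doubling period; the function name projected_volume and the starting volume of 1.0 are illustrative assumptions, not values from the 2011 calculations.

```python
def projected_volume(initial: float, years: float, doubling_period: float = 2.0) -> float:
    """Return the volume after `years`, assuming it doubles every `doubling_period` years."""
    return initial * 2 ** (years / doubling_period)


# Relative growth of a hypothetical starting volume of 1.0 (arbitrary units).
for years in (0, 2, 4, 10):
    print(years, projected_volume(1.0, years))
```

Under this assumption, the volume after ten years is 32 times the starting volume, which illustrates why storing everything in reserve quickly becomes uneconomical.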

Certain corporations, such as search engine operators, and certain state institutions, such as intelligence services, have access to correspondingly large volumes of data.

Examples

In research, linking large amounts of data with statistical evaluations can yield new knowledge, especially in disciplines in which much data was previously evaluated by hand. Companies hope that the analysis of big data will provide opportunities to gain competitive advantages, realize savings and create new areas of business, while government agencies hope for better results in criminal investigation and the fight against terrorism. Examples of expected benefits are:

The mere analysis of customer data is not, however, automatically big data; many marketing applications are more a matter of “small data” analytics.

Big data processing

Classic relational database systems as well as statistical and visualization programs are often not able to process such large amounts of data. For big data, new types of data storage and analysis systems are used that work in parallel on up to hundreds or thousands of processors or servers, as in cognitive systems. The challenges here include:

  • Processing of many records
  • Processing of many columns within a data record
  • Fast import of large amounts of data
  • Immediate querying of imported data (real-time processing)
  • Short response times (latency and processing time) even for complex queries
  • Ability to process many simultaneous queries (concurrent queries)
  • Analysis of various types of information (numbers, texts, images, ...)

The development of software for processing big data is still at an early stage. The MapReduce approach is well known and is used in open source software (Apache Hadoop and MongoDB) as well as in some commercial products (including Aster Data and Greenplum).
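The idea behind MapReduce can be illustrated with a word count, the customary introductory example. The following is a minimal single-machine sketch in plain Python, not the API of Apache Hadoop or any other product named above; the function names map_phase, shuffle, reduce_phase and word_count are assumptions chosen for illustration. In a real system, the map and reduce steps would run in parallel on many servers.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map, shuffle and reduce sequentially over all documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "big servers"]))
```

Because each document is mapped independently and each key is reduced independently, both steps can be distributed across machines without coordination beyond the shuffle.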

Application (selection)

Political elections

In the 2016 presidential election in the United States and in the referendum held in Great Britain the same year on leaving the European Union ("Brexit"), the surprising winners had each engaged Cambridge Analytica, a company concerned with the collection, evaluation, application, assignment and sale of personal data obtained mainly on the Internet, using methods of psychometrics, a branch of psychology (see psychography).

Social scoring

Collected data are used to assess, for example, creditworthiness (see credit scoring), health (and the corresponding risks, from which, for example, appropriately adjusted insurance premiums are derived) or the consumption and shopping behavior of consumers, and also to attempt corresponding predictions ("predicting"). In China, the "social scoring" system is built on such data; it is used to monitor and assess the social behavior of residents, with the aim of improving it.

Education

The use of big data opens up new possibilities for education. The technology can be used to optimize forms of learning and educational programs. Experts such as Viktor Mayer-Schönberger and Kenneth Cukier expect a fundamental upheaval in the education sector through the use of big data.

Research

Advances in data processing mean that much more reliable results can be obtained from large amounts of data. Examples are a study with around 16,000 children, in which the connections between obesity and diabetes were examined, and a case-control study on the influence of aircraft noise , in which the health insurance data of over one million patients were evaluated.

Microtargeting

After the 2016 US presidential election, the company Cambridge Analytica announced that the use of so-called microtargeting techniques had contributed decisively to Donald Trump's election victory. By means of psychometric analyses of large data sets, it was reportedly possible to identify undecided or more easily influenced voters (swing voters) and then confront them with targeted campaigns and content via Facebook. The use of these techniques in the US election campaign was preceded by research by the psychologist Michal Kosinski. In it, Kosinski combined big data evaluations with psychological behavioral analyses and was able to show that users' Facebook likes can be used to predict their personality traits, sexual orientation, drug consumption and religious and political attitudes.

Criticism

The American economist Shoshana Zuboff coined the term surveillance capitalism in connection with the collection of personal data by internet companies such as Google and Facebook. She sees it as a mutation of industrial capitalism that regards private human experience as freely available raw material for capitalist production and the exchange of goods, and that uses the achievements of the digital revolution for conspiratorial surveillance, storage, manipulation and prediction of human behavior. Zuboff advocates breaking up such data monopolies and imposing bans in order to interrupt the formation of data concentrations. Her book The Age of Surveillance Capitalism was published in German in 2018.

As research results from various scientists show, the content shared by users on the Internet can sometimes be used to extract highly sensitive information that was not intended to be shared. In order to protect digital privacy, rule-of-law regulations on information storage and collection are therefore becoming increasingly relevant. But even at the state level, big data is sometimes used to collect information about individuals, as the social credit system in China shows.

Privacy

The data scientist Andreas Dewes has shown in a study that anonymized data from Internet users that have been collected and sold by companies can be re-identified and assigned to individual people. The allegedly "anonymous" data of about three million Germans, which Dewes purchased from advertising companies as part of his investigation, included data on members of the German Bundestag and state parliaments and on other public figures such as judges, police officers and other officials.

The European Data Protection Supervisor Giovanni Buttarelli emphasized in March 2013 that personal information is not a commodity.

With regard to the adjustment of insurance premiums using big data, the "danger of a creeping de-solidarization in insurance" is emphasized.

Insufficient regulation

A crucial question is who owns the data collected from private individuals, who retains control over it and who oversees its use. Whether the European General Data Protection Regulation, which has applied since May 25, 2018, is sufficient is a matter of public debate.

The Schleswig-Holstein data protection officer Thilo Weichert warned in 2013: "Big data opens up possibilities of informational abuse of power through manipulation, discrimination and informational economic exploitation - combined with the violation of basic human rights."

Dirk Helbing, Professor of Computational Social Science at ETH Zurich, warned in January 2018 about possible technologies of subtle manipulation based on big data. The technology assessor Armin Grunwald, head of the Institute for Technology Assessment and Systems Analysis (ITAS) in Karlsruhe, warns that at no time in human history have there been "such good conditions for a totalitarian dictatorship" as today.

The social researcher Nils Zurawski advocates "solidarity-based data storage" in order to be able to use the advantages of big data for the common good.

Inadequate basis for evaluations

Above all, there is criticism that data collection and evaluation are carried out almost exclusively according to technical considerations, so that, for example, the technically simplest way of collecting the data is chosen. Basic statistical principles, such as that of a representative sample, are often neglected. The social researcher Danah Boyd, for example, has raised the following criticisms:

  • Larger amounts of data are not necessarily better-quality data
  • Not all data is equally valuable
  • “What” and “why” are two different questions
  • Care should be taken with interpretations
  • Just because data is available does not make its use ethical.

For example, one researcher found that people maintain no more than 150 friendships (Dunbar's number), which was then introduced as a technical limit in social networks, on the false assumption that acquaintances labeled "friends" reflect real friendships. Certainly not everyone would name all of their Facebook friends as friends in an interview; the term “friend” on Facebook only signals a willingness to communicate.
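The point about representative samples can be made concrete with a small simulation. The following is a minimal sketch using an entirely synthetic population; the age distribution, the cut-off of 40 years and the sample size of 200 are assumptions chosen for illustration and do not come from any study cited here. It compares a large sample collected the technically simplest way (only from people under 40, as might happen on a platform used mainly by younger people) with a much smaller random sample.

```python
import random
from statistics import mean

# Synthetic population: 10,000 people with ages spread evenly from 18 to 97.
population = [18 + (i % 80) for i in range(10_000)]
true_mean = mean(population)

# Large but biased sample: everyone under 40, and nobody else.
biased_sample = [age for age in population if age < 40]

# Small but representative sample: 200 people drawn at random.
rng = random.Random(0)  # fixed seed so the run is repeatable
random_sample = rng.sample(population, 200)

biased_error = abs(mean(biased_sample) - true_mean)
random_error = abs(mean(random_sample) - true_mean)

print("biased sample size:", len(biased_sample), "error:", biased_error)
print("random sample size:", len(random_sample), "error:", random_error)
```

In this setup the biased sample is more than ten times larger than the random one, yet its estimate of the average age is far less accurate, because its error comes from how the data was collected and does not shrink with more data.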

Another line of criticism deals with the question of whether big data means the end of all theory. In 2008, Chris Anderson, editor-in-chief of Wired magazine, described the credibility problem facing every scientific hypothesis and model when living and non-living systems are analyzed in real time. Correlations are becoming more important than causal explanations, which can often only be verified or falsified later.

Hype, vague term

The term “big data” is sometimes used even when the data is neither large nor complex nor fast-changing, or can easily be processed using conventional techniques. According to some observers, the increasing dilution of the term means that it is becoming more and more of a meaningless marketing term and, according to many forecasts, will be sharply devalued within the next few years (the “trough of disillusionment” in the hype cycle).

Reception

  • Congress Alte Feuerwache Cologne, September / October 2016: Life is not an algorithm - Solidarity perspectives against technological access

Literature

Research reports

  • Carsten Orwat, Andrea Schankin: Attitudes towards big data practices and the institutional framework of privacy and data protection - A population survey (KIT Scientific Reports; 7753). KIT Scientific Publishing, Karlsruhe 2018, ISBN 978-3-7315-0859-5, doi:10.5445/KSP/1000086677 (English).

Legal literature

  • Thomas Sagstetter: Big Data and the European Legal Framework: Status Quo and Need for Reform in the Light of the Trade Secrets Directive 2016/943/EU. In: Mackenrodt, Maute: Law as Infrastructure for Innovation. Nomos Verlag 2019, ISBN 978-3-8487-5379-6.
