Primary data

from Wikipedia, the free encyclopedia

Primary data, also called raw data or original data, are data that are obtained directly from an observation, a measurement, or a data collection and that are still unprocessed. In contrast, secondary data (processed data) are derived from the primary data during raw data processing.

Primary data in physical measurements are called measurement data or measured values.

Occasionally, a distinction is made between primary data and raw data as follows: raw data are all data collected during an observation, measurement, or data collection; primary data are the subset of the raw data that is actually used for research.

Primary data are in the format in which they were recorded during the original observation, measurement, or data collection. This can be a record on paper or an electronically generated raw data format stored as a file, an image, a data set in a database, or another digital format. Primary data are often supplemented with metadata during recording. The metadata contain essential information on characteristics of the primary data and are therefore part of the primary data.

Examples

Measuring a room yields a length of 5 m, a width of 4 m, and a height of 2.80 m. These three values are the primary data. The secondary data, an area of 20 m² and a room volume of 56 m³, can be derived from them by calculation.
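The derivation in the room example can be sketched in a few lines of Python. This is only an illustration: the dictionary keys and the function name `derive_secondary` are assumptions made here, not part of any standard.

```python
# Primary data: the three measured values from the room example (in metres).
primary = {"length_m": 5.0, "width_m": 4.0, "height_m": 2.80}


def derive_secondary(p):
    """Compute derived (secondary) data from the measured primary values."""
    area = p["length_m"] * p["width_m"]   # floor area in square metres
    volume = area * p["height_m"]         # room volume in cubic metres
    return {"area_m2": area, "volume_m3": volume}


secondary = derive_secondary(primary)
print(secondary)
```

Keeping the measured values in a separate, unmodified structure means the secondary values can be recomputed at any time, for example if a different formula or unit is needed later.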

In the laboratory, a scientist measures the pressure of a gas in a sealed container at different temperatures. The primary data here are the temperatures and the measured pressures. Metadata can include the measurement method, the time of the measurement, and the person who performed it. Scientific laws or empirical formulas for the behavior of the gas can be derived from the primary measured values.
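As a rough sketch of how an empirical formula might be derived from such measurements, the following Python code fits a straight line to temperature and pressure pairs by ordinary least squares. The numbers are hypothetical values invented purely for illustration, not real measurements, and the helper name `fit_line` is likewise an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical primary data (invented for illustration only).
temperatures_c = [0.0, 20.0, 40.0, 60.0, 80.0]          # degrees Celsius
pressures_kpa = [100.0, 107.3, 114.6, 122.0, 129.3]     # kilopascals

# Secondary data: the parameters of the fitted empirical relationship.
slope, intercept = fit_line(temperatures_c, pressures_kpa)
print(f"p = {slope:.4f} kPa/degC * T + {intercept:.2f} kPa")
```

The fitted slope and intercept are secondary data: they depend on the chosen model and fitting method, whereas the measured pairs themselves remain unchanged.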

A team of doctors collects data on the effects of a drug in a clinical study. The blood values of the test subjects, for example, are the primary data. Analyses of the data make it possible to draw conclusions about the effectiveness and safety of the drug.

In digital photography, cameras often save the information read out from the photodiodes of the image sensor in a raw data format (RAW format). The primary image data are therefore available unprocessed, as they were captured by the camera's sensor. The camera also saves additional information as metadata, e.g. the date of recording or the location (GPS coordinates). During image processing, the raw data are developed in the camera or externally on a computer (for example with regard to the color space or the dynamic range) and interpreted by the photographer. The edited images are often saved in a compressed format such as JPG.

An opinion research institute determines the popularity of political parties as part of a voter survey. The primary data here are information on the people surveyed (age, gender, etc.) and their answers. Evaluating and interpreting the primary data obtained in the survey makes it possible to forecast the possible election result.

Problems

Primary data are unchecked data. They can contain errors of various kinds (e.g. measurement errors), and using them without checking can lead to wrong conclusions. Criticism of primary data can relate only to the collection methodology or to the care taken during collection.

In some cases, especially in complex measurements with computer-controlled devices, a clear separation between primary and secondary data is difficult because the electronics or the software of the devices may already preprocess the primary data.

Test laboratories in the regulated sector

Test laboratories ensure the quality of products in many areas and are therefore subject to different regulations or quality management standards such as GLP, GMP, or ISO 9001, depending on the type of laboratory and the products to be tested. All of these regulations use definitions of raw data similar to the following: "The GLP Principles define raw data as all original laboratory records and documents, including data that enter a computer directly through a device interface, that arise as a result of the original observations or activities in a study and that are necessary for the reconstruction and evaluation of the final report of this study."

This definition shows that the raw data include not only the data that arise during a measurement but also the documentation of the activities associated with performing the measurement. Only then can the measurement be reconstructed and the evaluation of the data be verified.

Electronic data processing (EDP) systems that are used in such test laboratories for the acquisition, processing, storage, and archiving of raw data must be validated. Different retention periods apply to the archiving of raw data, depending on the quality assurance system.

Proper documentation of the raw data obtained during the tests in a test laboratory makes it possible to prove that a product meets the required quality standards and can, for example, protect against claims for damages.

Research laboratories

Research laboratories document the planning, implementation, and evaluation of scientific experiments in laboratory notebooks; similar requirements apply as for test laboratories. This means that the raw data can later be re-evaluated using other methods or by other scientists.

The recognition of an invention or a patent can depend on the meticulous documentation of the experiments and the raw data obtained during the measurements. Documentation requirements and retention periods for raw data depend on the type of laboratory and the purpose of the research. If inventions are to be patented, country-specific patent laws apply.

Photography

The raw file contains the unfiltered image data as a digital negative, as captured by the camera sensor. The photo can be re-developed from the raw file at any time, taking different aspects into account. With the help of the raw file, a photographer can prove that a photo was not improperly retouched.

Social sciences

The distinction between primary and secondary data also becomes important in more complex analyses. In social science measurements, for example, primary and secondary data must be distinguished because the survey method (including, for example, the way a question is worded in an opinion poll) can be significant for the informative value of the derived data. Anonymizing the primary data can (intentionally) remove data elements, which limits the options for evaluation. Anonymization also makes it possible to protect the privacy of interview partners.
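How anonymization removes data elements can be illustrated with a small Python sketch. It assumes a hypothetical survey record and one simple strategy (dropping direct identifiers and coarsening the age into a band); the field names, the respondent values, and the function `anonymize` are all invented for illustration and do not describe any particular institute's procedure.

```python
def anonymize(record, drop_fields=("name", "postcode"), age_band=10):
    """Return a copy of the record without direct identifiers.

    The exact age is replaced by an age band, so analyses that need
    the exact age are no longer possible on the anonymized data.
    """
    result = {k: v for k, v in record.items() if k not in drop_fields}
    if "age" in result:
        low = (result.pop("age") // age_band) * age_band
        result["age_band"] = f"{low}-{low + age_band - 1}"
    return result


# Hypothetical primary record from a survey (invented values).
respondent = {
    "name": "Respondent A",
    "postcode": "00000",
    "age": 47,
    "gender": "f",
    "answer": "Party X",
}

anonymized = anonymize(respondent)
print(anonymized)
```

The original record is left unchanged, while the anonymized copy no longer allows, for instance, an evaluation by exact age or by postcode.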

Databases

Some database systems offer a special data type called raw (raw, long raw, or raw binary data) for storing binary strings. This data type has nothing to do with the primary data described above. Today it has largely been replaced by binary large objects (BLOBs).
