Scrapy (software)

Basic data

Developer: Scrapinghub Ltd
Initial release: 2008-06-26
Current version: 2.2.1 (2020-07-17)
Operating system: Windows, macOS, Linux
Programming language: Python
License: BSD license
Website: scrapy.org, GitHub

Scrapy ([skrɛɪ̯pi̯]) is a free and open-source web crawling framework written in the Python programming language. It was originally designed for web scraping, but it can also be used as a general-purpose web crawler or to extract data via APIs. It is currently maintained by Scrapinghub Ltd.

The architecture is built around so-called "spiders": self-contained crawlers that are given a set of instructions. Following the "don't repeat yourself" principle of frameworks such as Django, Scrapy simplifies building and scaling large crawling projects by letting developers reuse their code. Scrapy also provides a shell in which developers can test their assumptions about the behavior of a website.

Companies and products that use Scrapy include Lyst, Parse.ly, Sayone Technologies, Sciences Po Medialab and Data.gov.uk.

History

Scrapy originated at the London-based e-commerce company Mydeco, where it was developed and maintained by employees of Mydeco and Insophia (a web consultancy based in Montevideo, Uruguay). The first release took place in August 2008 under the BSD license; version 1.0 appeared in June 2015. In 2011, Scrapinghub became the official maintainer of the project.

External links

Official website: scrapy.org
