Bielefeld Academic Search Engine
Bielefeld Academic Search Engine (BASE) | |
---|---|
www.base-search.net | |
Description | Internet search engine |
Registration | optional |
Languages | Chinese (simplified), German, English, French, Spanish (Castilian), Polish, Modern Greek, Ukrainian |
Owner | Bielefeld University Library |
Originator | mainly institutional repositories |
Published | June 24, 2004 |
Items | over 150 million |
Status | online |
BASE (Bielefeld Academic Search Engine) is a search engine for scientific documents. It is operated by the Bielefeld University Library and built on the open-source search software Solr/Lucene. BASE is continuously developed as a strategic project.
Target group and goal setting
BASE is aimed primarily at researchers at universities and research institutions and at students. With the development of BASE, the university library is pursuing the goal of building a reliable, high-quality search service for research and teaching with the help of search engine technology.
BASE aims to provide access to the content of scientific document servers that is made freely available as part of the Open Access movement via OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). The search engine is registered as an official OAI service provider and was involved in the EU project DRIVER (Digital Repository Infrastructure Vision for European Research), which was completed in 2009.
Because its sources are selected by hand (intellectually), BASE aims to deliver subject-relevant information combined with extensive, high-quality metadata and thereby set itself apart from commercial search engines.
Development history
Chronology
Date | Event |
---|---|
June 2001 | Deficits identified in a metasearch environment, using the library portal "Digitale Bibliothek NRW" as an example, give rise to a new concept: development of a non-commercial search engine for scientific use |
Feb. 2002 - Aug. 2002 | Evaluation of search engine technology |
Summer 2003 | Start of technical implementation; development of a prototype (math demonstrator) |
Oct 2003 | Announcement of the collaboration between the Bielefeld University Library and the company FAST: start of a strategic partnership to test and promote enterprise search technologies; agreement on the use of the "FAST Data Search" system |
March 2004 | Completion of the trial phase |
June 2004 | Launch of the Bielefeld Academic Search Engine |
Aug 2004 | Integration of further sources (university publication server, OAI sources, non-OAI-compatible sources); first indexing of full texts (electronic dissertations of the Ruhr University Bochum) |
Aug 2005 | New options for search refinement (restriction to a data source), alternative sorting of hits, history of performed search queries |
Feb 2006 | Replacement of the single-server solution with a server farm (6 Linux computers) |
March 2006 | Integration of per-hit links to the scientific search engine Google Scholar |
June 2006 | Start of participation in the EU project DRIVER (Digital Repository Infrastructure Vision for European Research) |
May 2007 | Search for similar word forms |
July 2007 | over 100 German repositories in BASE; Introduction of a public test area: BASE Lab |
Oct 2007 | Multilingual search (Eurovoc thesaurus) |
July 2008 | Import of search results into reference management programs via Firefox browser extensions |
Jan. 2009 | Website relaunch: filtering according to document types in advanced search |
Aug 2010 | More than 25 million documents in the BASE index |
Feb 2011 | Preparation of the platform change from FAST to Lucene / Solr |
May 2011 | Release of the BASE index produced with Lucene / Solr |
Aug 2011 | More than 30 million documents from over 2,000 sources in the BASE index |
Jan. 2012 | Mobile version for smartphone users |
Apr. 2012 | Possibility to set up a personal login |
July 2012 | Separate search interface for document servers located in Germany |
Aug 2013 | Open Access documents and sources are marked with corresponding symbols. More than 50 million documents from 2,700 sources in the BASE index |
Nov 2013 | More than 3.3 million documents from CiteSeerX indexed for the first time |
June 2014 | More than 3,000 sources / 60 million documents in the index |
Sep 2014 | Open Access documents are boosted in the relevance ranking (can be switched off); tested in the BASE Lab from July 2014, in regular operation from 23 September 2014 |
Aug 2015 | Search filters for re-use (license) and access (e.g. open access) can be selected in the advanced search |
Oct 2016 | More than 100 million documents in the search index for the first time |
May 2017 | Users can "claim" their own publications, i.e. link them with their ORCID |
Content
Scientific internet sources
The content of BASE is multidisciplinary. Only scientific sources are evaluated. BASE aims to tap into "Internet sources of the 'invisible web' which are not indexed in commercial search engines or which are lost in their large numbers of hits". BASE indexes:
- OAI metadata
  - The search engine primarily contains metadata from repositories that provide their content via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
- Data from local document servers at Bielefeld University
- Selected websites (e.g. websites of scientific organizations, Wikibooks, Project Gutenberg)
Choice of sources and transparency
All searched sources are selected and checked by hand. A list of sources makes the selection transparent. In addition to the indexed sources, more than 1,000 other sources with over 30 million documents have been harvested but are not suitable for indexing for various reasons.
Timeliness and scope
The index is updated daily; the contents of individual document servers are updated weekly.
149,820,832 documents from 7,188 sources are currently searchable via BASE. The number of documents and sources has increased steadily since production started, and the index is being expanded further. Repository operators who are not listed in the list of sources are asked to contact the BASE team.
Country coverage and languages
Sources by country
In total, there are sources from 132 countries in the index. The countries with more than 100 indexed sources (repositories) are:
Country | Sources | Documents |
---|---|---|
United States | 1112 | 46,783,311 |
Indonesia | 927 | 1,953,101 |
Japan | 565 | 2,647,278 |
Germany | 406 | 23,498,271 |
United Kingdom | 342 | 5,309,052 |
Brazil | 321 | 2,542,618 |
Spain | 264 | 5,766,729 |
France | 201 | 11,469,711 |
Russia | 192 | 2,289,342 |
Canada | 182 | 1,785,127 |
Italy | 171 | 5,285,122 |
India | 157 | 632,818 |
Ukraine | 136 | 818,900 |
Colombia | 133 | 599,628 |
Peru | 121 | 287,250 |
Turkey | 116 | 939,118 |
Poland | 114 | 3,622,417 |
Sources by continent
The European countries are most frequently represented, followed by Asia, North America, South America, Australia and Africa.
Continent | Sources | Documents |
---|---|---|
Europe | 2756 | 73,949,899 |
Asia | 1972 | 9,672,137 |
North America | 1405 | 49,128,728 |
South America | 784 | 4,633,903 |
Australia / Oceania | 113 | 3,287,447 |
Africa | 112 | 862,943 |
International / not assigned | 46 | 8,285,775 |
All information: as of July 25, 2019
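As a quick consistency check, the continent figures can be added up and compared with the totals stated under "Timeliness and scope" (149,820,832 documents from 7,188 sources). The following Python sketch uses only numbers from the table above; the variable names are chosen for illustration.

```python
# Sources and documents per continent, copied from the table above.
continents = {
    "Europe": (2756, 73_949_899),
    "Asia": (1972, 9_672_137),
    "North America": (1405, 49_128_728),
    "South America": (784, 4_633_903),
    "Australia / Oceania": (113, 3_287_447),
    "Africa": (112, 862_943),
    "International / not assigned": (46, 8_285_775),
}

# Column totals; these should match the index-wide figures.
total_sources = sum(sources for sources, _ in continents.values())
total_documents = sum(documents for _, documents in continents.values())

# Share of all indexed documents per continent, in percent.
shares = {
    name: round(100 * documents / total_documents, 1)
    for name, (_, documents) in continents.items()
}

print(total_sources, total_documents)
for name, share in shares.items():
    print(f"{name}: {share}%")
```

Both sums agree with the stated totals, which also confirms that the dots in figures such as 49.128.728 are thousands separators.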
Documents by language
Sorted by language, the following picture emerges of the most used languages with more than 250,000 recorded documents:
Language | Documents |
---|---|
English | 62,893,004 |
French | 8,884,514 |
Spanish | 7,045,578 |
German | 5,406,771 |
Portuguese | 3,305,663 |
Polish | 2,679,494 |
Italian | 2,217,286 |
Japanese | 2,070,289 |
Russian | 1,499,516 |
Chinese | 1,465,775 |
Latin | 790,911 |
Norwegian | 785,804 |
Dutch | 717,558 |
Bahasa Indonesia | 707,180 |
Ukrainian | 569,819 |
Turkish | 551,805 |
Modern Greek | 462,023 |
Catalan | 456,335 |
Czech | 438,857 |
Swedish | 430,280 |
Finnish | 426,815 |
Hungarian | 412,687 |
Danish | 409,035 |
Croatian | 299,451 |
About 1/3 of all sources are not assigned to any language.
Access to the indexed documents
BASE does not index Open Access material only; it also lists documents whose full text is not freely accessible. A hit list can be restricted to documents clearly classified as Open Access. At present, BASE can unequivocally mark only 45% of the indexed documents as Open Access, although the actual share of freely accessible documents is around 60%. The labeling of Open Access at document level is to be expanded. Since July 2014, Open Access documents have received a boost factor in the relevance ranking, i.e. they tend to appear higher in the result list. This function can be switched off.
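The effect of such a boost factor can be illustrated with a minimal Python sketch. It is not BASE's actual ranking code: the `Hit` class, the boost value of 1.2 and the scores are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    score: float       # relevance score from the search engine (hypothetical values)
    open_access: bool  # True only if clearly classified as Open Access


def rank(hits, oa_boost=1.2, boost_enabled=True):
    """Sort hits by descending score, optionally multiplying OA scores by a boost."""
    def effective_score(hit):
        if boost_enabled and hit.open_access:
            return hit.score * oa_boost
        return hit.score

    return sorted(hits, key=effective_score, reverse=True)


hits = [
    Hit("Paywalled article", 1.00, False),
    Hit("Open Access preprint", 0.90, True),
]

boosted = [hit.title for hit in rank(hits)]
plain = [hit.title for hit in rank(hits, boost_enabled=False)]
print(boosted)
print(plain)
```

With the boost enabled, the Open Access hit overtakes a slightly more relevant paywalled hit; with the boost switched off, the original order is kept.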
Functions
The accessible user interface of BASE is simple and clear. The search interface is available in Chinese (simplified), German, English, French, Greek, Polish, Spanish (Castilian) or Ukrainian. Information about BASE is available in German and English.
The start page offers a search in the BASE index (standard search). From here, users reach the other functional areas of BASE: advanced search, help, browsing and search history, as well as the mobile version. These options are located in a header bar that is designed uniformly across all search pages, making it easy to switch between functions. Below the search form there are links to, among other things, the About BASE pages (general information about the search portal), the BASE blog and the Twitter channel.
Research functionality
Standard search
Deliberately modeled on Google's successful approach, the standard search presents the user with a single search field for free-text search. Using a syntax explained in the help pages, the search for individual terms can be limited to individual metadata fields. Wildcards can be used for right truncation when entering search terms.
In addition, the standard search offers the option of automatically expanding the search terms to other word forms (lemmatization).
Advanced search
The advanced search allows search terms to be entered specifically for the following metadata fields: entire document, title, author, keywords, DOI, (part of) the URL, and publisher. A search in the entire document corresponds to the standard search. The individual metadata fields can be combined with one another; they are automatically linked with the Boolean operator AND. Within a search field, search terms can be combined with various Boolean operators using a special syntax documented in the help pages.
In addition, the search can be restricted by the origin of the sources (certain countries or continents), by year or period of publication, by document type (e.g. books, articles, dissertations, videos) and by re-use license (Creative Commons, public domain, software licenses such as GPL). The number of titles displayed in the hit list can also be set (10, 20, 30, 50 or 100).
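How filled-in fields are linked with AND can be sketched in a few lines of Python. The `field:term` notation follows the general Lucene query style, and the field names `title`, `author` and `publisher` are assumptions for illustration; the prefixes BASE actually accepts are documented in its help pages.

```python
def build_query(fields):
    """Join all non-empty field/term pairs with AND, quoting multi-word terms."""
    clauses = []
    for name, term in fields.items():
        term = term.strip()
        if not term:
            continue  # empty form fields do not restrict the search
        if " " in term:
            term = f'"{term}"'  # treat several words as one phrase
        clauses.append(f"{name}:{term}")
    return " AND ".join(clauses)


query = build_query({"title": "open access", "author": "Pieper", "publisher": ""})
print(query)  # title:"open access" AND author:Pieper
```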
Results display
The search results are displayed in a list that is sorted by relevance by default. Relevance is determined according to various criteria; for example, it makes a difference whether the search term occurs in the title or only elsewhere. The default ranking can be replaced by a user-defined sorting by author, title or year of publication, in ascending or descending order.
Each search result contains, where available, extensive, qualified metadata (e.g., in addition to title and author: keywords, publisher, source, language, abstract, URL). Each hit also includes:
- a link to the original document (metadata or electronic full text),
- a link to a new search query for the author,
- a link to the data provider,
- a link to a search query in Google Scholar (searching for the title in Google Scholar can reveal linked citations or different versions of the work),
- a link for export via e-mail and to reference management programs,
- a link to add the hit as a favorite in the personal profile (with login).
If the hit list is too long, it can be narrowed down by author, keyword, Dewey Decimal Classification, year of publication, source, language, document type, access (open access / unknown) or re-use (license). Only one option can be selected from the drop-down menus at a time.
In addition, the search queries of the current session are shown in a search history and can be rerun at any time. With a personal login, search queries can also be saved permanently. Furthermore, searches can be subscribed to as an RSS or Atom web feed, and search results can be sent by e-mail or saved. A personal login is also required for the latter.
A new search can be triggered directly from the hit list by changing the current search query.
Browsing
In addition to searching, BASE also offers browsing by Dewey Decimal Classification (DDC), document type, re-use/license and access. The DDC of a document is determined in two ways: first, some data sources already assign DDC numbers, which are adopted directly for browsing; second, documents are also classified automatically within BASE. The technology used for this was developed as part of the DFG-funded project "Automatic enrichment of OAI metadata".
Discontinued projects
BASE DE
A separate search interface allowed searching specifically in sources whose document servers are located in Germany. It was intended to serve as a national index of OAI metadata. The so-called "Germany View" comprised around 6,300,000 documents from over 250 sources.
BASE Lab
With BASE Lab, BASE offered a public test area in which new functions could be tried out. The following functions first appeared there:
- Use of computational linguistic processes for the automatic classification of OAI metadata within the framework of the DFG project "Automatic enrichment of OAI metadata with the aid of computational linguistic processes and development of services for the content-oriented networking of repositories".
- Development of a service for the provision of aggregated and normalized OAI metadata
- Expansion of the labeling of open access documents
- Higher weighting of open access documents
Technical basics
Search engine technology
The technical basis is the search engine technology of Solr and VuFind. It enables:
- the use of linguistic methods to optimize search queries (e.g. lemmatization, decomposition of compounds, permutations)
  - Search terms are extended to other word forms (plural, genitive) through automatic language recognition and the creation of dictionaries.
- relevance ranking of search results
  - Relevance is determined by an algorithm contained in the software.
- subsequent narrowing of the hit list by certain criteria (author, keyword, year of publication, source, language and document type).
Integration of the data sources
The data is integrated into the search engine via different interfaces:
- As a rule: OAI harvesting
  - Metadata from selected OAI document servers is integrated via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
- In special cases: web crawlers
  - Content from scientific websites is collected by a dedicated web crawler. The full-text data obtained this way is analyzed for the metadata it contains.
The data, most of which is collected in Dublin Core format, is very heterogeneous and therefore has to be normalized extensively before indexing.
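The harvesting and normalization steps can be sketched in Python. The request parameters (`verb=ListRecords`, `metadataPrefix=oai_dc`, `resumptionToken`) come from the OAI-PMH specification; everything else is an assumption for illustration: the repository URL, the sample record, the tiny language table and the `normalize` function are hypothetical and do not reproduce BASE's actual processing.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace

# Illustrative mapping only; a real table would be far larger.
LANGUAGE_CODES = {"en": "eng", "english": "eng", "de": "ger", "deutsch": "ger", "german": "ger"}


def list_records_url(base_url, resumption_token=None):
    """Build an OAI-PMH ListRecords request for Dublin Core metadata."""
    if resumption_token:
        # A resumption token is an exclusive argument: no metadataPrefix alongside it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def normalize(record_xml):
    """Reduce one heterogeneous Dublin Core record to a uniform dictionary."""
    root = ET.fromstring(record_xml)

    def values(tag):
        return [el.text.strip() for el in root.iter(DC + tag) if el.text and el.text.strip()]

    titles = values("title")
    languages = [LANGUAGE_CODES.get(v.lower(), "und") for v in values("language")]
    years = [v[:4] for v in values("date") if v[:4].isdigit()]
    return {
        "title": titles[0] if titles else None,
        "creators": values("creator"),
        "language": languages[0] if languages else "und",  # "und" = undetermined
        "year": int(years[0]) if years else None,
    }


SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title> A sample thesis </dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:language>Deutsch</dc:language>
  <dc:date>2004-06-24</dc:date>
</oai_dc:dc>"""

url = list_records_url("https://repository.example.org/oai")
record = normalize(SAMPLE)
print(url)
print(record)
```

The sketch shows why normalization is needed: repositories write the same information in different ways ("Deutsch", "de", "German"), and a search index can only filter by language or year once these variants are mapped to one form.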
Interfaces to third-party providers
Links in the hit lists allow a direct search for individual titles in Google Scholar. If BASE is used on site in a library, links in the Google Scholar hit lists can lead to the full text offered by that library, provided the library has set up the corresponding configuration.
Interfaces for the re-use of BASE services and data
BASE offers several programming interfaces:
- The search or HTTP interface is a REST API for direct searches in the BASE index via Solr. Use is free of charge for non-commercial projects and only requires the registration of a fixed IP address.
- The OAI-PMH API offers project partners and selected non-commercial projects the opportunity to obtain the normalized BASE data (or thematic excerpts) in up-to-date form.
- An HTML form can be embedded as a search box for searching in BASE from one's own website without any programming effort.
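A request to a Solr-backed HTTP search interface might be assembled as in the following Python sketch. The parameters `q`, `rows` and `wt` are standard Solr query parameters; whether BASE's interface exposes them under these names is an assumption, and the endpoint URL is a placeholder, not BASE's real address. The sketch only builds the URL and sends nothing.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(endpoint, query, rows=10):
    """Build a GET URL for a Solr-style search request returning JSON."""
    params = {
        "q": query,    # the search query
        "rows": rows,  # number of hits to return
        "wt": "json",  # response format
    }
    return f"{endpoint}?{urlencode(params)}"


# Placeholder endpoint; a real client would use the address issued on registration.
url = search_url("https://search.example.org/select", 'title:"open access"', rows=20)

# Decode the query string again to show what the server would receive.
parsed = parse_qs(urlsplit(url).query)
print(url)
print(parsed)
```

Because access is tied to a registered fixed IP address, a real client would have to send such requests from that address.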
Subsequent users
Integration in specialist portals
BASE is integrated into the metasearch of several German specialist portals. Subject portal paedagogik.de, Germanistik on the net, ilissAfrica, vifabio, virtual subject library medien buehne film and Livivo (ZB MED) either integrate the full BASE index or filter the search query according to a selection of repositories that match the respective subject. Since not only classic university publication servers but also platforms with digitized photos, maps and other source materials are harvested, BASE also paves the way for primary research data and virtual research environments.
Use by open access services
BASE is a primary source of the web service dissem.in, which helps authors to discover those of their own publications that are (still) hidden behind a paywall although the authors are allowed to offer them for free download.
In a similar way, the web-based altmetrics service Impactstory uses BASE to check whether a freely available version of an article exists in the sense of the green road to open access.
The alternative DOI resolvers doai.io and oadoi.org use BASE to find freely available versions (e.g. preprints / eprints) of articles that are otherwise only available against payment or with a campus license.
The browser plug-in Unpaywall uses BASE data to display a link to a legal, free version of the same work (if available) when a user hits an academic paywall.
Use by Discovery Services
The EBSCO Discovery Service (EDS) has been integrating the data collected and processed by BASE into its service since December 2015.
Use by other search engines
BASE is a source enabled by default in the non-commercial German metasearch engine MetaGer and (since mid-2016) a source in the metasearch engines etools.ch (optional) and Searx (in the Science tab). The bibliographic metasearch Karlsruhe Virtual Catalog can also search BASE.
Comparable offers
Similar services are offered by the British CORE (COnnecting REpositories) and by OAIster, originally developed at the University of Michigan and now part of OCLC. Both are much smaller in size. Comparable commercial search engines with a scientific focus, but lower metadata quality, are Google Scholar and Microsoft Academic Search.
Literature
- Dirk Pieper, Friedrich Summann: Bielefeld Academic Search Engine (BASE): An end-user oriented institutional repository search service. In: Library Hi Tech. Vol. 24, No. 4, 2006, pp. 614-619. Accessed: August 27, 2013.
- Dirk Pieper, Sebastian Wolf: BASE - A search engine for OAI sources and scientific websites. In: Information, Wissenschaft & Praxis (IWP). Vol. 58, No. 3, 2007, pp. 179-182. Accessed: August 27, 2013.
- Further reading on the BASE website: About BASE: Publications. Accessed: August 27, 2013.