Sitemaps protocol

The Sitemaps protocol allows a webmaster to inform search engines about the pages of a website that are to be crawled by them. The standard was adopted on November 16, 2006 by Google, Yahoo and Microsoft. It is an XML-based standard.

The aim of the Sitemaps protocol is to improve search results. The uniform standard is intended to help establish this kind of "labeling" of a website, since it is no longer necessary to create a separate sitemap file for each search engine, as was the case before standardization.

History

The Sitemaps protocol is based on the idea of web crawler-friendly web servers.

Google released Sitemaps 0.84 in June 2005. This technique enabled webmasters to publish a list of links from their site.

In November 2006, MSN and Yahoo announced that they would support the Sitemaps protocol. The version number was changed to Sitemaps 0.90, but the protocol itself remained unchanged.

In April 2007, Ask.com and IBM joined the standard. At the same time, Google, Yahoo and Microsoft announced support for discovering sitemap files via the Robots Exclusion Standard.

XML sitemap format

Sitemap file
File extension: .xml, .gz
MIME type: application/xml, text/xml
Extended from: XML
Standard(s): sitemaps.org


Sitemap files are ordinary text files that use the Extensible Markup Language (XML). They must use the UTF-8 character encoding.

As an alternative to the more verbose XML notation, sitemap files can also be plain text files that contain only a list of URLs.

In addition, the standard stipulates that sitemap files, regardless of their form, can also be gzip-compressed.

In contrast to robots.txt files, the file name of a sitemap file is essentially irrelevant. The file extension also plays no role, even with gzip compression.

Restrictions

According to the protocol, a sitemap file must not contain more than 50,000 URLs and must not be larger than 50 MB (52,428,800 bytes); for compressed sitemap files, this limit applies to the uncompressed file. The limitation can be circumvented by using several sitemap files together with a "main" sitemap (a sitemap index file) that references up to 50,000 individual sitemaps (see the example below). In this way, theoretically 50,000 × 50,000 = 2,500,000,000 (2.5 billion) URLs can be listed.

Examples

An XML sitemap with a single URL entry:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
 <url>
  <loc>http://example.com/</loc>
  <lastmod>2006-11-18</lastmod>
  <changefreq>daily</changefreq>
  <priority>0.8</priority>
 </url>
</urlset>

A sitemap as a plain text file contains only a list of URLs, for example:

 http://example.com/seite1.html
 http://example.com/verzeichnis/seite2.html
 http://example.com/bild3.png
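
A "main" sitemap as mentioned under Restrictions is a sitemap index file that only references other sitemap files. A minimal sketch of such a file, in which the referenced file names sitemap1.xml.gz and sitemap2.xml.gz are purely illustrative:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <!-- each <sitemap> entry points to one (optionally gzip-compressed) sitemap file -->
 <sitemap>
  <loc>http://example.com/sitemap1.xml.gz</loc>
  <lastmod>2006-11-18</lastmod>
 </sitemap>
 <sitemap>
  <loc>http://example.com/sitemap2.xml.gz</loc>
 </sitemap>
</sitemapindex>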

Submission of sitemap files to search engines

Unlike robots.txt files, sitemap files do not have to be published at a fixed location on the website; instead, they can be submitted directly to a search engine (in a manner similar to a pingback), which then returns status messages or errors from processing the sitemap file. The data transferred in such a submission, i.e. the query interface and the output format, depend heavily on the search engine used; the Sitemaps standard makes no statements about this.
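
According to sitemaps.org, such a submission is an ordinary HTTP GET request whose exact address depends on the search engine; schematically, with <searchengine_URL> standing for the submission address published by the respective search engine and the sitemap URL given in URL-encoded form:

 <searchengine_URL>/ping?sitemap=sitemap_url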

Alternatively, the address of a sitemap file can also be specified in the robots.txt file by inserting the line

 Sitemap: sitemap_url

where sitemap_url represents the complete URL of the sitemap (e.g. http://www.example.org/sitemap.xml). This information is evaluated independently of the user agent context, so the position of the line within the file is irrelevant. If a website has multiple sitemaps, this URL should point to the main sitemap file.

The contents of a sitemap are not commands. They merely give a web crawler recommendations on how to index a website most efficiently. Whether or to what extent these recommendations are actually followed cannot be bindingly determined with sitemaps.
