SitemapEntity represents a sitemap’s structure within the browser network API. It encapsulates comprehensive information about URLs, their hierarchical relationships, and associated metadata for each page in the sitemap. Even when no sitemap.xml file is available or no sitemap entry exists in robots.txt, our system can intelligently discover and compile sitemap data.
Our system combines several methods and algorithms to extract website sitemaps efficiently and accurately. The SitemapEntity organizes this data in a structured, accessible format that simplifies manipulation and retrieval. It is built in Rust for performance and reliability, so sitemap data is processed quickly and precisely.
Definition
Below are the key definitions related to the SitemapEntity:
SitemapEntity
| Property | Type | Default | Description |
|---|---|---|---|
| sort | `'asc' \| 'desc'` | `'asc'` | Specifies the sorting order of URLs in the sitemap. Use `'asc'` for ascending order and `'desc'` for descending order. |
| type | `'absolute' \| 'relative'` | `'absolute'` | Determines whether URLs are returned as absolute (complete URLs) or relative (domain-relative paths). |
| urls | `string[]` | - | An array of URLs from which to extract sitemap data. |
Examples
Basic Example
Below is a basic example of a SitemapEntity:
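A minimal sketch in TypeScript. The interface simply mirrors the property table above; the `withDefaults` helper and the `https://example.com` URL are illustrative assumptions, and how the entity is actually submitted to the API is not shown here.

```typescript
// Shape of the entity, mirroring the property table above.
interface SitemapEntity {
  sort?: 'asc' | 'desc';          // documented default: 'asc'
  type?: 'absolute' | 'relative'; // documented default: 'absolute'
  urls: string[];                 // sites to extract sitemap data from
}

// Basic entity: only `urls` is set, so the documented defaults apply.
const basicEntity: SitemapEntity = {
  urls: ['https://example.com'], // placeholder URL
};

// Hypothetical helper (not part of the API) that makes the defaults explicit.
function withDefaults(entity: SitemapEntity): Required<SitemapEntity> {
  return { sort: 'asc', type: 'absolute', ...entity };
}

console.log(withDefaults(basicEntity));
```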
With Sorting and Type
Below is an example of a SitemapEntity with specified sorting order and URL type:
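As a sketch, the same shape with both optional properties set. The interface again mirrors the property table, and the URL is a placeholder rather than a value from the original documentation.

```typescript
// Shape of the entity, mirroring the property table.
interface SitemapEntity {
  sort?: 'asc' | 'desc';
  type?: 'absolute' | 'relative';
  urls: string[];
}

// Request descending order and domain-relative paths.
const sortedRelativeEntity: SitemapEntity = {
  sort: 'desc',
  type: 'relative',
  urls: ['https://example.com'], // placeholder URL
};

console.log(JSON.stringify(sortedRelativeEntity, null, 2));
```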
With these settings, the SitemapEntity returns URLs sorted in descending order and formatted as relative paths.
Relative Response Example:
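As a rough local illustration of what `sort: 'desc'` combined with `type: 'relative'` implies, the sketch below turns a few made-up absolute URLs into domain-relative paths and orders them in descending order. The function name, the sample URLs, and the assumption that sorting is plain string order are all illustrative; this is not the service's implementation or its actual response format.

```typescript
// Hypothetical helper: convert absolute URLs to domain-relative paths
// and sort them, assuming plain string ordering.
function toRelativeSorted(urls: string[], sort: 'asc' | 'desc'): string[] {
  const paths = urls.map((u) => {
    const parsed = new URL(u);              // standard WHATWG URL parser
    return parsed.pathname + parsed.search; // drop scheme and host
  });
  paths.sort(); // ascending string order
  return sort === 'desc' ? paths.reverse() : paths;
}

// Made-up sample input, not real crawl output.
const samplePaths = toRelativeSorted(
  [
    'https://example.com/',
    'https://example.com/blog',
    'https://example.com/about',
  ],
  'desc',
);

console.log(samplePaths); // [ '/blog', '/about', '/' ]
```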
Anti-Bot
The Tedi Browser Network component automatically handles common anti-bot protections, including rate limiting. When such measures are detected, Tedi Browser attempts to bypass them to extract the requested content. This allows any Browser entity to reliably retrieve data even from websites with anti-bot measures in place. In rare cases where anti-bot defenses are highly advanced, manual intervention may be necessary.
You do not need to handle anti-bot protections yourself when using Tedi Browser; this is one of the key reasons we provide this entity through our API. Extracting content from protected sites may sometimes take slightly longer, but Tedi Browser will make every effort to get you the data you need. This feature is included automatically in every browser entity, including this one.
Conclusion
The SitemapEntity is a powerful tool for managing and organizing sitemap data within the browser network API. Its structured format and customizable properties make it easy to extract, sort, and utilize URLs effectively.
This entity can map websites even when no sitemap.xml file is available or no sitemap entry exists in robots.txt. It uses advanced techniques to discover all accessible pages on the website, extending Tedi Radar capabilities.
