The SitemapEntity helps you extract all URLs from any website, even if the website doesn’t have a traditional sitemap.xml file. Think of it as a smart crawler that finds every page on a website and organizes them into a clean, easy-to-use list.
Whether you need to audit a website’s structure, analyze competitors, or build a content inventory, this entity does the heavy lifting for you. It works fast and reliably, even on websites with thousands of pages.
Filters help you narrow down the results to only the URLs you care about. For example, you might want only blog posts, product pages, or URLs from a specific year.
Here’s how filters work:
Websites change over time: pages get deleted, URLs get restructured, or content moves to new locations. When website owners forget to update their sitemaps, they end up with links pointing to pages that no longer exist (also known as “broken links” or “404 errors”).
Our system can find these dead URLs for you when:
The validateUrls property is set to true
OR the includeDeadUrls option is enabled
OR the include property is used (metadata extraction requires URL validation)
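The three conditions above combine with a logical OR, which can be sketched as a small check. This is an illustration only: it assumes the request is a flat dictionary in which validateUrls and includeDeadUrls are booleans and include is a list of field names. The real request layout may differ, and the helper name validation_triggered is hypothetical.

```python
def validation_triggered(request: dict) -> bool:
    """Return True if any of the three documented conditions holds.

    Assumption: all three settings live at the top level of the request.
    """
    return (
        request.get("validateUrls") is True       # explicit validation
        or bool(request.get("includeDeadUrls"))   # dead URLs requested
        or bool(request.get("include"))           # metadata extraction
    )
```

Under these assumptions, a request that only sets include would still trigger validation, because metadata extraction depends on it.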
This feature is particularly useful for:
SEO audits: Identify and fix broken links that may harm your search rankings
Website maintenance: Keep your sitemap clean and accurate
Migration projects: Verify that all old URLs redirect properly to new locations
Using the validateUrls option may increase processing time, especially for large sitemaps.
When validateUrls is enabled, you will receive a response like this:
By default, you only get a list of URLs. But with the include option, you can also get extra information about each page, such as its title, description, and images. This turns a simple URL list into a complete content inventory.
For large websites with hundreds of thousands (or even millions) of URLs, we use a CDN (Content Delivery Network) to deliver your results. Instead of trying to send all that data in one big response, we upload the files to fast servers around the world and give you download links.
Imagine trying to download a file with 500,000 URLs all at once. It would take forever, might time out, or could even crash your browser or app. The CDN solves this by:
Splitting data into manageable files: No more crashes or timeouts
Faster downloads: Files are served from servers close to you
Multiple formats: Get your data as TXT or CSV files
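To make the idea of splitting a large result into manageable files concrete, here is a minimal sketch of chunking a URL list and rendering each chunk as TXT. It is not the service's actual implementation: the chunk size, the function names, and the one-URL-per-line layout are assumptions chosen for illustration.

```python
from typing import Iterator, List


def split_into_chunks(urls: List[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most chunk_size URLs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(urls), chunk_size):
        yield urls[start:start + chunk_size]


def to_txt(chunk: List[str]) -> str:
    """Render one chunk as plain text, one URL per line (assumed layout)."""
    return "\n".join(chunk) + "\n"
```

Each chunk can then be written to its own file, so no single download has to carry the whole result.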
To use CDN mode, simply set responseType: "cdn" in your request.
Good news: you don’t always need to set responseType: "cdn" manually. If your sitemap has more than 100,000 URLs, we automatically switch to CDN mode. This way, you always get your results without worrying about crashes or timeouts.
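The automatic switch can be summarized as a simple rule. The sketch below assumes a hypothetical default response type called "inline" for non-CDN responses, since the name of the default mode is not stated here. Only the "cdn" value and the 100,000 threshold come from the text above.

```python
from typing import Optional

CDN_AUTO_THRESHOLD = 100_000  # from the text: more than 100,000 URLs


def effective_response_type(url_count: int, requested: Optional[str] = None) -> str:
    """Return the response type that would apply to a result of url_count URLs."""
    if requested == "cdn" or url_count > CDN_AUTO_THRESHOLD:
        return "cdn"
    # "inline" is a placeholder name for the default, non-CDN response.
    return requested or "inline"
```

Note that the comparison is strict: a sitemap with exactly 100,000 URLs stays in the default mode unless you request CDN mode yourself.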
The Tedi Browser Network component automatically handles common anti-bot protections, including rate limiting. When such measures are detected, Tedi Browser attempts to bypass them to extract the requested content. In rare cases where anti-bot defenses are highly advanced, manual intervention may be necessary.
Internal testing shows a success rate of nearly 98% against standard anti-bot protections.
This ensures that any Browser entity can reliably retrieve data, even from websites with anti-bot measures in place.
You do not need to worry about anti-bot protections when using Tedi Browser, as these are handled automatically. This is one of the key reasons we provide this entity through our API.
It may sometimes take slightly longer to extract content from such protected sites, but Tedi Browser will make every effort to get you the data you need.
This feature is automatically included in all browser entities, including this one.
The Sitemap API makes it easy to get a complete list of URLs from any website, even sites without a traditional sitemap file. Whether you’re doing SEO audits, competitor research, or building a content inventory, you’ll get fast, reliable results.
Key features:
Works on any website, with or without a sitemap.xml file
Handles massive sites with millions of URLs
Finds broken links automatically
Delivers results in easy-to-use formats (TXT and CSV)