ElementEntity is a flexible and efficient solution for extracting elements from web pages. Built in Rust for performance, it supports a wide range of use cases and advanced features, from nested extraction and regular-expression filtering to type coercion and aggregation.
We recommend using ElementEntity for all your element extraction needs, as it offers comprehensive options for structured data extraction and manipulation.
Definition
Below is the definition of the `ElementEntity` interface used in the Browser Network component for extracting elements from web pages.
ElementEntity
| Property | Type | Description |
|---|---|---|
| `selector` | `keyof HTMLElementTagNameMap \| string` | A CSS selector string used to identify the elements to extract from the web page. |
| `children` | `ElementEntity[]` | An array of nested `ElementEntity` definitions for extracting structured data from child elements. |
| `displayName` | `string` | A custom name for the extracted data key in the response. |
| `onlyFirstMatch` | `boolean` | If `true`, only the first matched element is returned instead of an array. |
| `regex` | `RegExp \| string` | A regular expression pattern used to further filter the extracted elements. |
| `regexFlags` | `"i" \| "g" \| "gi" \| "m" \| "gm" \| "im" \| "gim"` | Flags for the regular expression (e.g., case-insensitive, global). |
| `replace` | `object \| object[]` | A pattern and value to replace in the extracted elements. Pass a single object, or an array of objects for multiple replacements. |
| `defaultValue` | `any` | A default value to return if no elements are found. |
| `unique` | `boolean` | If `true`, duplicates are removed and only unique values are returned. |
| `sort` | `"asc" \| "desc"` | Sorts the extracted elements in ascending or descending order. |
| `coerce` | `"number" \| "boolean" \| "string"` | Coerces the extracted values to the specified type. |
| `aggregate` | `"sum" \| "avg" \| "min" \| "max" \| "count"` | Performs an aggregation operation on numeric values. |
| `extractType` | `"element" \| "text" \| "attribute"` | What to extract: the element's full HTML, its text content, or a specific attribute. |
| `attributeName` | `string` | The name of the attribute to extract when `extractType` is `"attribute"`. |
| `trim` | `boolean` | If `true`, leading and trailing whitespace is removed from the extracted text. |
| `filters` | `Array<object>` | An array of filter conditions to apply to the extracted elements. |
| Condition | Description |
|---|---|
| `lessThan` | Checks if a value is less than a specified value. |
| `greaterThan` | Checks if a value is greater than a specified value. |
| `lessThanOrEqual` | Checks if a value is less than or equal to a specified value. |
| `greaterThanOrEqual` | Checks if a value is greater than or equal to a specified value. |
| `equal` | Checks if a value is equal to a specified value. |
| `notEqual` | Checks if a value is not equal to a specified value. |
| `contains` | Checks if a string contains a specified substring. |
| `notContains` | Checks if a string does not contain a specified substring. |
| `startsWith` | Checks if a string starts with a specified substring. |
| `endsWith` | Checks if a string ends with a specified substring. |
| `operator` | Logical operator used to combine multiple filters (`and` or `or`). |
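For TypeScript users, the two tables can be transcribed into a type. This is a sketch, not an official type definition. It treats only `selector` as required, which is an assumption. It types `selector` as plain `string` because `keyof HTMLElementTagNameMap | string` reduces to `string` anyway. It also keeps `replace` and `filters` loosely typed, because their inner fields are not specified here.

```typescript
type RegexFlags = "i" | "g" | "gi" | "m" | "gm" | "im" | "gim";

// Sketch transcribed from the property table; optionality is assumed.
interface ElementEntity {
  selector: string; // documented as keyof HTMLElementTagNameMap | string
  children?: ElementEntity[];
  displayName?: string;
  onlyFirstMatch?: boolean;
  regex?: RegExp | string;
  regexFlags?: RegexFlags;
  replace?: object | object[]; // inner shape not documented here
  defaultValue?: any;
  unique?: boolean;
  sort?: "asc" | "desc";
  coerce?: "number" | "boolean" | "string";
  aggregate?: "sum" | "avg" | "min" | "max" | "count";
  extractType?: "element" | "text" | "attribute";
  attributeName?: string;
  trim?: boolean;
  filters?: object[]; // conditions from the second table; shape not documented
}

const heading: ElementEntity = { selector: "h1", onlyFirstMatch: true, trim: true };
const cards: ElementEntity = {
  selector: ".card",
  children: [{ selector: "h2", displayName: "title" }],
};

console.log(JSON.stringify([heading, cards]));
```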
Examples
Below are practical examples demonstrating how to use `ElementEntity` with the Elements API.
Basic Example
This example demonstrates different ways to extract elements directly using the `selector` string.
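Here is a minimal sketch of a request that uses only selector strings, written as a TypeScript object. The `url` and `elements` field names are assumptions, since this page does not show the actual request body.

```typescript
// Hypothetical request body: `url` and `elements` are assumed field names.
// Each entry in `elements` is a plain CSS selector string. With no other
// options, each selector yields an array of matched text values (the
// defaults described on this page).
const basicRequest = {
  url: "https://example.com",
  elements: ["h1", "p", "a[href^='https']"],
};

console.log(JSON.stringify(basicRequest, null, 2));
```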
Request
Basic Example with ElementEntity
This example demonstrates how to extract elements using the full `ElementEntity` definition.
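The same idea with full definitions follows. It again assumes hypothetical `url` and `elements` request fields and declares only the `ElementEntity` properties this sketch uses.

```typescript
// Subset of the ElementEntity fields used in this example.
interface ElementEntity {
  selector: string;
  displayName?: string;
  onlyFirstMatch?: boolean;
  trim?: boolean;
}

// Hypothetical request body: `url` and `elements` are assumed field names.
const entityRequest: { url: string; elements: ElementEntity[] } = {
  url: "https://example.com",
  elements: [
    { selector: "h1", displayName: "heading", onlyFirstMatch: true },
    { selector: "p", displayName: "paragraphs", trim: true },
  ],
};

console.log(JSON.stringify(entityRequest, null, 2));
```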
Request
Hybrid Example
This example demonstrates how to extract elements using a mix of simple selectors and full `ElementEntity` definitions.
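A sketch of a mixed array follows. The `normalize` helper is hypothetical. It assumes a bare string is equivalent to an entity with only `selector` set, which is how the hybrid form reads but is not stated explicitly here.

```typescript
interface ElementEntity {
  selector: string;
  displayName?: string;
  onlyFirstMatch?: boolean;
}

// Plain selector strings and full definitions side by side in one array.
const hybridElements: Array<string | ElementEntity> = [
  "h2",
  { selector: "h1", displayName: "pageTitle", onlyFirstMatch: true },
];

// Hypothetical helper: treats a bare string as shorthand for { selector }.
function normalize(entry: string | ElementEntity): ElementEntity {
  return typeof entry === "string" ? { selector: entry } : entry;
}

console.log(hybridElements.map(normalize));
```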
Request
Display Name Example
By default, the keys in the response match the selector strings in the request. For example, data matched by `h1` is returned under the `h1` key. To use a custom key instead, set the `displayName` property.
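The key-naming rule can be sketched locally. `responseKey` is a hypothetical helper. It assumes that a response key is the `displayName` when one is set and the selector string otherwise.

```typescript
interface ElementEntity {
  selector: string;
  displayName?: string;
}

// Hypothetical helper: displayName wins when set; otherwise the selector
// string itself becomes the key.
function responseKey(entry: string | ElementEntity): string {
  if (typeof entry === "string") return entry;
  return entry.displayName ?? entry.selector;
}

const entities: Array<string | ElementEntity> = [
  "h1",
  { selector: "meta[name='description']", displayName: "description" },
];

console.log(entities.map(responseKey));
```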
Only First Match Example
By default, the extracted data is an array of all matched elements. If you only want to retrieve the first matched element as a string, number, or another target type, use the `onlyFirstMatch` property.
With `onlyFirstMatch` set to `true` on an `h1` entity, the response contains only the first `h1` element, as a string.
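Here is a local TypeScript sketch of this behavior. It is not Tedi Network's implementation. This page does not say what the service returns when `onlyFirstMatch` is set and nothing matches, so the sketch simply returns `undefined` in that case.

```typescript
// Local sketch: return every match by default, or only the first match
// when onlyFirstMatch is true (undefined if there are no matches).
function firstOrAll<T>(matches: T[], onlyFirstMatch = false): T | T[] | undefined {
  return onlyFirstMatch ? matches[0] : matches;
}

const headings = ["Our Team", "Contact"];
console.log(firstOrAll(headings));       // the whole array
console.log(firstOrAll(headings, true)); // only the first heading
```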
Regular Expression Example
After matching all elements using the selector, you may want to further filter the results using a regular expression. You can achieve this with the `regex` and `regexFlags` properties.
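The sketch below applies `regex` and `regexFlags` as a filter over already-extracted text values. It follows the table's description, not the service's actual code. How explicit flags combine with a `RegExp` object's own flags is an assumption.

```typescript
type RegexFlags = "i" | "g" | "gi" | "m" | "gm" | "im" | "gim";

// Local sketch: keep only the values that match the pattern.
// A fresh RegExp is created for every value because test() on a regex with
// the "g" flag is stateful (it advances lastIndex), so reusing one instance
// could wrongly reject later values.
function regexFilter(values: string[], pattern: RegExp | string, flags?: RegexFlags): string[] {
  const source = typeof pattern === "string" ? pattern : pattern.source;
  // Assumption: explicit regexFlags win; otherwise a RegExp keeps its own flags.
  const effectiveFlags = flags ?? (typeof pattern === "string" ? "" : pattern.flags);
  return values.filter((value) => new RegExp(source, effectiveFlags).test(value));
}

console.log(regexFilter(["Director", "Designer", "director of sales"], "^director", "i"));
```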
Default Value Example
If you are building a workflow or automation and want to make sure a value is always returned, even when no elements are found, use the `defaultValue` property.
There are no type restrictions on `defaultValue`: it can be any type, such as a string, number, boolean, or object.
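The fallback can be sketched locally as follows. This is an illustration, not Tedi Network's code. It assumes the default replaces the whole (empty) result.

```typescript
// Local sketch: return the matches when there are any, otherwise the
// configured defaultValue, which may be of any type.
function withDefault<T, D>(matches: T[], defaultValue: D): T[] | D {
  return matches.length === 0 ? defaultValue : matches;
}

const noPrices: string[] = [];
console.log(withDefault(noPrices, "N/A"));           // falls back to "N/A"
console.log(withDefault(["$29.99"], "N/A"));         // returns the matches
console.log(withDefault(noPrices, { price: null })); // objects work too
```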
Unique Example
By default, Tedi Network returns everything matched by the selector. If you want only unique values to be returned, use the `unique` property.
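Deduplication can be sketched with a `Set`. This local illustration assumes values are compared after extraction and that the first occurrence is kept. Neither detail is stated on this page.

```typescript
// Local sketch: keep the first occurrence of each value and drop later
// duplicates. Set uses SameValueZero equality and preserves insertion order.
function uniqueValues<T>(values: T[]): T[] {
  return Array.from(new Set(values));
}

console.log(uniqueValues(["SEO", "Content", "SEO", "Ads", "Content"]));
```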
Sort Example
Once you have extracted the elements, you may want to sort them in ascending or descending order. You can do this with the `sort` property.
For example, setting `sort` to `"asc"` on an `h2` entity and to `"desc"` on an `h3` entity returns the `h2` values in ascending order (A-Z) and the `h3` values in descending order (Z-A).
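A local sketch of `sort` follows. It uses plain code-unit comparison, which is an assumption; the service may use locale-aware ordering. It also copies the input so the original array is left untouched.

```typescript
// Local sketch: "asc" orders A-Z, "desc" orders Z-A.
function sortValues(values: string[], order: "asc" | "desc"): string[] {
  const sorted = [...values].sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
  return order === "asc" ? sorted : sorted.reverse();
}

console.log(sortValues(["Pricing", "About", "Contact"], "asc"));
console.log(sortValues(["Pricing", "About", "Contact"], "desc"));
```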
Coerce Example
If you match a number as a string (text is extracted by default) and want to enforce a strict type such as number or boolean, use the `coerce` property.
For example, `coerce` converts the string `"29.99"` to the number `29.99` and the string `"true"` to the boolean `true` in the response.
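Here is a local sketch of these conversions. It rests on two assumptions. First, numbers go through `Number()`, so invalid input becomes `NaN` (see the data-type consideration later on this page). Second, only the text `"true"` (case-insensitive) becomes the boolean `true`.

```typescript
type CoerceTarget = "number" | "boolean" | "string";

// Local sketch of coerce (not Tedi Network's implementation).
function coerceValue(raw: string, target: CoerceTarget): number | boolean | string {
  if (target === "number") return Number(raw.trim()); // invalid input gives NaN
  if (target === "boolean") return raw.trim().toLowerCase() === "true";
  return raw;
}

console.log(coerceValue("29.99", "number")); // 29.99
console.log(coerceValue("true", "boolean")); // true
```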
Aggregate Example
If you match multiple numeric values and want to perform an aggregation operation such as sum, average, min, max, or count, use the `aggregate` property.
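The five operations can be sketched locally over numbers. The sketch assumes values are already numeric, for example after `coerce: "number"`. It also assumes an empty list yields `0` for `sum` and `count` and `undefined` for the rest. The page does not specify either behavior.

```typescript
type Aggregate = "sum" | "avg" | "min" | "max" | "count";

// Local sketch of aggregate over numeric values.
function aggregateValues(values: number[], op: Aggregate): number | undefined {
  if (op === "count") return values.length;
  const sum = values.reduce((acc, v) => acc + v, 0);
  if (op === "sum") return sum;
  if (values.length === 0) return undefined; // avg/min/max are undefined for no values
  if (op === "avg") return sum / values.length;
  return op === "min" ? Math.min(...values) : Math.max(...values);
}

const prices = [12, 30, 18];
console.log(aggregateValues(prices, "sum"), aggregateValues(prices, "avg"));
```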
Extract Type Example
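Suppose you want to extract something other than an element's text, such as a link's URL or the element's full markup. This is where the `extractType` property comes in handy. It accepts three values: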
- `text`: extracts only the text content of the element.
- `attribute`: extracts the value of the specified attribute (e.g., `href`, `src`).
- `element`: extracts the entire HTML of the element.
If `extractType` is not specified, it defaults to `text`.
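The following sketch shows the three modes side by side. It also includes a hypothetical client-side check that flags `attribute` extraction with no `attributeName`. Only the property names come from the table; the check itself is illustrative.

```typescript
interface ElementEntity {
  selector: string;
  displayName?: string;
  extractType?: "element" | "text" | "attribute";
  attributeName?: string;
}

const linkEntities: ElementEntity[] = [
  { selector: "a", displayName: "linkText" }, // text is the default
  { selector: "a", displayName: "linkUrl", extractType: "attribute", attributeName: "href" },
  { selector: "a", displayName: "linkHtml", extractType: "element" },
];

// Hypothetical validation: attribute extraction needs an attributeName.
function missingAttributeName(entity: ElementEntity): boolean {
  return entity.extractType === "attribute" && !entity.attributeName;
}

console.log(linkEntities.filter(missingAttributeName).length); // 0
```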
If you choose `attribute` as the `extractType`, make sure to provide the `attributeName` property to specify which attribute's value you want to extract.
Trim Example
When extracting text content from elements, you might want to remove any leading or trailing whitespace. Tedi Network trims extracted text by default. If you want to keep the whitespace, set the `trim` property to `false`.
With `trim` set to `false`, the text `" This is a description. "` is returned with its surrounding spaces rather than as `"This is a description."`.
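Here is a local sketch of the default-on trimming described above. It is an illustration, not the service's code.

```typescript
// Local sketch: trim defaults to true, so surrounding whitespace is
// removed unless trim is explicitly false.
function applyTrim(text: string, trim = true): string {
  return trim ? text.trim() : text;
}

console.log(JSON.stringify(applyTrim(" This is a description. ")));        // "This is a description."
console.log(JSON.stringify(applyTrim(" This is a description. ", false))); // " This is a description. "
```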
Filters Example
Example 1
After you match a set of element values, you may want to filter them further based on certain conditions. Use the `filters` property to apply the criteria listed in the conditions table.
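The sketch below evaluates the documented conditions locally. The `{ condition, value }` filter shape is hypothetical, because the page only types `filters` as `Array<object>`. Filters are combined with `and` by default and `or` on request. Numeric conditions treat non-numeric text as failing, since comparisons with `NaN` are always false.

```typescript
type Condition =
  | "lessThan" | "greaterThan" | "lessThanOrEqual" | "greaterThanOrEqual"
  | "equal" | "notEqual" | "contains" | "notContains" | "startsWith" | "endsWith";

// Hypothetical filter shape used only for this sketch.
interface Filter {
  condition: Condition;
  value: string | number;
}

function matches(raw: string, f: Filter): boolean {
  const n = Number(raw);
  const target = Number(f.value);
  const text = String(f.value);
  switch (f.condition) {
    case "lessThan": return n < target;
    case "greaterThan": return n > target;
    case "lessThanOrEqual": return n <= target;
    case "greaterThanOrEqual": return n >= target;
    case "equal": return raw === text;
    case "notEqual": return raw !== text;
    case "contains": return raw.includes(text);
    case "notContains": return !raw.includes(text);
    case "startsWith": return raw.startsWith(text);
    case "endsWith": return raw.endsWith(text);
    default: return false; // unreachable for valid conditions
  }
}

// "and" keeps values that pass every filter; "or" keeps values that pass any.
function applyFilters(values: string[], filters: Filter[], operator: "and" | "or" = "and"): string[] {
  if (filters.length === 0) return values;
  return values.filter((v) =>
    operator === "and" ? filters.every((f) => matches(v, f)) : filters.some((f) => matches(v, f)),
  );
}

const prices = ["9.99", "29.99", "149.00"];
console.log(applyFilters(prices, [
  { condition: "greaterThan", value: 10 },
  { condition: "lessThan", value: 100 },
]));
```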
Example 2
For example, suppose you want to extract only the URLs that start with `https://www.evergreen.media/team`.
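A sketch of such an entity follows. The `{ condition, value }` filter shape is hypothetical; only the `startsWith` condition name comes from the conditions table. The local check runs on sample `href` values to show what the condition keeps.

```typescript
const teamPrefix = "https://www.evergreen.media/team";

// Hypothetical filter shape; startsWith is the documented condition name.
const teamLinks = {
  selector: "a",
  extractType: "attribute",
  attributeName: "href",
  filters: [{ condition: "startsWith", value: teamPrefix }],
};

// Local illustration with sample href values: only the first one is kept.
const sampleHrefs = ["https://www.evergreen.media/team/adnan-ali/", "https://example.com/about"];
const teamHrefs = sampleHrefs.filter((href) => href.startsWith(teamLinks.filters[0].value));

console.log(teamHrefs);
```

Whether the service resolves relative `href` values (such as `/team/...`) to absolute URLs before filtering is not stated here. If it does not, a full-URL prefix would not match them.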
Example 3 with Conditions
Filters can be combined using the logical operators `and` and `or`. By default, filters are combined with `and`, so a value must satisfy every condition. You can explicitly specify the operator to control how the filters are applied.
To keep values that satisfy at least one of the conditions instead, use the `or` operator.
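A sketch of an `or` combination follows. Both shapes here are hypothetical: the `{ condition, value }` filter objects and the entity-level `operator` field. The page names the operator but does not show where it is set.

```typescript
// Hypothetical shapes: { condition, value } filters and an entity-level operator.
const roleEntity = {
  selector: ".role",
  operator: "or",
  filters: [
    { condition: "contains", value: "Director" },
    { condition: "contains", value: "Head of" },
  ],
};

// Local illustration of "or": keep a value if at least one filter matches.
// With the default "and", the check would use every() instead of some().
const roles = ["Managing Director", "Head of SEO", "Content Writer"];
const keptRoles = roles.filter((role) =>
  roleEntity.filters.some((f) => role.includes(f.value)),
);

console.log(keptRoles);
```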
Use Case Examples
We will now explore several practical use cases demonstrating how to leverage the `ElementEntity` for various data extraction scenarios.
These examples show how to extract specific information from web pages using different configurations of the `ElementEntity`.
In each example, we target selectors in different ways to showcase the versatility of the ElementEntity.
Extract Website Title and Meta Description
In this example, we will extract the title and meta description from multiple websites using `ElementEntity` definitions.
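A sketch of these definitions follows. The meta description lives in the `content` attribute of `<meta name="description">`, so it uses attribute extraction. The request wrapper (`url`, `elements`) is hypothetical. The sketch builds one request per URL because this page does not say whether a single request accepts several URLs.

```typescript
interface ElementEntity {
  selector: string;
  displayName?: string;
  onlyFirstMatch?: boolean;
  extractType?: "element" | "text" | "attribute";
  attributeName?: string;
}

const seoEntities: ElementEntity[] = [
  { selector: "title", displayName: "title", onlyFirstMatch: true },
  {
    selector: "meta[name='description']",
    displayName: "metaDescription",
    extractType: "attribute",
    attributeName: "content",
    onlyFirstMatch: true,
  },
];

// Hypothetical request bodies: one per URL, with assumed field names.
const urls = ["https://example.com", "https://example.org"];
const requests = urls.map((url) => ({ url, elements: seoEntities }));

console.log(JSON.stringify(requests, null, 2));
```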
Extract Personal Information
In this example, we will extract personal information, such as a name, role, and email address, from a team member's profile page. The target URL is https://www.evergreen.media/team/adnan-ali/.
We want to extract the following information:
- Full Name
- Bio
- Role/Position
- Email Address
- LinkedIn Profile URL
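A sketch of the five fields as `ElementEntity` definitions follows. The selectors are illustrative guesses based on common markup patterns, not verified against the page. For example, `a[href^='mailto:']` assumes the email is shown as the text of a `mailto:` link. The request wrapper is hypothetical.

```typescript
interface ElementEntity {
  selector: string;
  displayName?: string;
  onlyFirstMatch?: boolean;
  extractType?: "element" | "text" | "attribute";
  attributeName?: string;
}

// Illustrative selectors only; inspect the real page before relying on them.
const profileEntities: ElementEntity[] = [
  { selector: "h1", displayName: "fullName", onlyFirstMatch: true },
  { selector: "main p", displayName: "bio", onlyFirstMatch: true },
  { selector: ".position", displayName: "role", onlyFirstMatch: true },
  { selector: "a[href^='mailto:']", displayName: "email", onlyFirstMatch: true },
  {
    selector: "a[href*='linkedin.com']",
    displayName: "linkedIn",
    extractType: "attribute",
    attributeName: "href",
    onlyFirstMatch: true,
  },
];

// Hypothetical request body with assumed field names.
const profileRequest = {
  url: "https://www.evergreen.media/team/adnan-ali/",
  elements: profileEntities,
};

console.log(JSON.stringify(profileRequest, null, 2));
```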
Extract Evergreen Media Team Members
In this example, we will extract the names, roles, email addresses, and LinkedIn profile URLs of all team members from the Evergreen Media team page. We will also learn how to use nested `ElementEntity` definitions to extract structured data.
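A sketch of a nested definition follows. The outer `selector` matches each member card, and `children` extract fields from within that card. The class names are hypothetical, and the response shape (presumably one object per matched card) is an assumption.

```typescript
interface ElementEntity {
  selector: string;
  displayName?: string;
  onlyFirstMatch?: boolean;
  extractType?: "element" | "text" | "attribute";
  attributeName?: string;
  children?: ElementEntity[];
}

// Illustrative selectors; the real class names on the page may differ.
const teamEntity: ElementEntity = {
  selector: ".team-member",
  displayName: "members",
  children: [
    { selector: "h3", displayName: "name", onlyFirstMatch: true },
    { selector: ".role", displayName: "role", onlyFirstMatch: true },
    { selector: "a[href^='mailto:']", displayName: "email", onlyFirstMatch: true },
    {
      selector: "a[href*='linkedin.com']",
      displayName: "linkedIn",
      extractType: "attribute",
      attributeName: "href",
      onlyFirstMatch: true,
    },
  ],
};

console.log(JSON.stringify(teamEntity, null, 2));
```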
Tedi + Browser Example
The `ElementEntity` API is a powerful tool for extracting structured data from web pages. However, many real-world scenarios require more than simple extraction, such as gathering specific data points from multiple pages and aggregating them into a unified, structured response.
This is where the agentic capabilities of Tedi truly shine.
For example, imagine you want to collect a list of directors from various team pages across different companies or websites. Instead of manually scraping each page and merging the results, you can leverage Tedi in agentic mode to automate this process. The agentic mode allows you to define a high-level prompt (such as “Extract directors from Team”) and let the system intelligently navigate, extract, and aggregate the relevant information from multiple sources.
This approach is especially powerful for tasks that require reasoning, multi-step extraction, or combining data from diverse web pages into a single, well-structured output. By combining Tedi with the Chromium browser, you unlock advanced workflows that go far beyond traditional scraping, enabling you to build robust, scalable data pipelines for complex use cases.
When sending a request in prompt mode with Tedi Network, you must specify an output schema so that the extracted data is structured correctly. As with our other endpoints, the output structure is defined using JSON Schema (draft 7).
Below is an example of how to set up such a request to extract directors from multiple team pages.
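A sketch of such a request follows. The output schema uses standard JSON Schema draft-07 keywords. The request fields (`mode`, `prompt`, `urls`, `outputSchema`) and the team-page URLs are hypothetical placeholders; this page does not show the actual request format.

```typescript
// JSON Schema (draft-07) describing the expected output: a list of directors.
const directorsSchema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  properties: {
    directors: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },
          role: { type: "string" },
          company: { type: "string" },
          sourceUrl: { type: "string", format: "uri" },
        },
        required: ["name", "role"],
      },
    },
  },
  required: ["directors"],
};

// Hypothetical request body: all field names here are assumptions.
const agenticRequest = {
  mode: "prompt",
  prompt: "Extract directors from Team",
  urls: ["https://example.com/team", "https://example.org/about/team"],
  outputSchema: directorsSchema,
};

console.log(JSON.stringify(agenticRequest, null, 2));
```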
Considerations
When using the `ElementEntity` for data extraction, keep the following considerations in mind:
- Selector Accuracy: Ensure that the CSS selectors used in the `selector` property accurately target the desired elements on the web page. Incorrect selectors may lead to unexpected results or no data being extracted.
- Dynamic Content: Some web pages load content dynamically using JavaScript. In such cases, ensure that the Browser Network component is configured to wait for the necessary content to load before extraction.
- Data Types: When using the `coerce` property, ensure that the extracted data can be validly converted to the specified type. Invalid conversions may result in errors or unexpected values.
- Performance: Complex selectors or large numbers of elements may impact performance. Optimize selectors and extraction logic to ensure efficient data retrieval.
- Error Handling: Implement appropriate error handling in your workflows to manage cases where elements are not found or extraction fails.
Rejection Criteria
If you are using prompt mode with Tedi Network, be aware of the following rejection criteria related to the `ElementEntity`:
- Your prompt intent must clearly specify the need to extract structured data from web pages.
- Your account will be banned if you attempt to use the `ElementEntity` for unnecessary or frivolous data extraction that does not align with the purpose of structured data retrieval.
Anti-Bot
The Tedi Browser Network component automatically handles common anti-bot protections, including rate limiting. When such measures are detected, Tedi Browser attempts to bypass them to extract the requested content, so any browser entity can reliably retrieve data even from websites with anti-bot measures in place. In rare cases where the anti-bot defenses are highly advanced, manual intervention may be necessary.
Because these protections are handled automatically, you do not need to account for them when using Tedi Browser; this is one of the key reasons we provide this entity through our API. Extracting content from protected sites may sometimes take slightly longer, but Tedi Browser will make every effort to get you the data you need. This feature is included automatically in this entity and in all other browser entities.
Conclusion
The `ElementEntity` provides a powerful and flexible way to extract structured data from web pages. By leveraging its various properties and features, you can tailor your data extraction processes to meet specific requirements and use cases.
Additionally, the ability to nest ElementEntity definitions allows for more complex and detailed data extraction strategies, enabling you to capture intricate relationships and hierarchies within the web content.
Explore the various properties and configurations of the ElementEntity to unlock its full potential in your web scraping and data extraction projects.
