ElementEntity

The ElementEntity is a highly flexible and efficient solution for extracting elements from web pages. Built on top of Rust for optimal performance, it supports a wide range of use cases and advanced features. We recommend using ElementEntity for all your element extraction needs, as it offers comprehensive options for structured data extraction and manipulation.

Definition

Below is the definition of the ElementEntity interface used in the Browser Network component for extracting elements from web pages.

ElementEntity

interface ElementEntity {
  selector?: keyof HTMLElementTagNameMap | string;
  children?: ElementEntity[];
  displayName?: string;
  onlyFirstMatch?: boolean;
  regex?: RegExp | string;
  regexFlags?: "i" | "g" | "gi" | "m" | "gm" | "im" | "gim";
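  // A single replacement rule or an array of rules.
  replace?: {
    pattern: RegExp | string;
    value: string;
    useRegex?: boolean;
    flags?: "i" | "g" | "gi" | "m" | "gm" | "im" | "gim";
  } | Array<{
    pattern: RegExp | string;
    value: string;
    useRegex?: boolean;
    flags?: "i" | "g" | "gi" | "m" | "gm" | "im" | "gim";
  }>;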
  defaultValue?: any;
  unique?: boolean;
  sort?: "asc" | "desc";
  coerce?: "number" | "boolean" | "string";
  aggregate?: "sum" | "avg" | "min" | "max" | "count";
  extractType?: "element" | "text" | "attribute";
  attributeName?: string;
  trim?: boolean;
  filters?: Array<{
    lessThan?: number;
    greaterThan?: number;
    lessThanOrEqual?: number;
    greaterThanOrEqual?: number;
    equal?: string;
    notEqual?: string;
    contains?: string;
    notContains?: string;
    startsWith?: string;
    endsWith?: string;
    operator?: "and" | "or";
  }>;
}

| Property | Type | Description |
| --- | --- | --- |
| `selector` | `keyof HTMLElementTagNameMap \| string` | A CSS selector string used to identify the elements to be extracted from the web page. |
| `children` | `ElementEntity[]` | An array of nested `ElementEntity` definitions for extracting structured data from child elements. |
| `displayName` | `string` | A custom name for the extracted data key in the response. |
| `onlyFirstMatch` | `boolean` | If set to `true`, only the first matched element will be returned instead of an array. |
| `regex` | `RegExp \| string` | A regular expression pattern to further filter the extracted elements. |
| `regexFlags` | `"i" \| "g" \| "gi" \| "m" \| "gm" \| "im" \| "gim"` | Flags for the regular expression (e.g., case-insensitive, global). |
| `replace` | `object \| object[]` | A pattern and value to replace in the extracted elements. Can be a single object or an array of objects for multiple replacements. |
| `defaultValue` | `any` | A default value to return if no elements are found. |
| `unique` | `boolean` | If set to `true`, only unique values will be returned, removing duplicates. |
| `sort` | `"asc" \| "desc"` | Sorts the extracted elements in ascending or descending order. |
| `coerce` | `"number" \| "boolean" \| "string"` | Coerces the extracted values to the specified type. |
| `aggregate` | `"sum" \| "avg" \| "min" \| "max" \| "count"` | Performs an aggregation operation on numeric values. |
| `extractType` | `"element" \| "text" \| "attribute"` | Specifies what to extract: the entire element HTML, text content, or a specific attribute. |
| `attributeName` | `string` | The name of the attribute to extract when `extractType` is set to `attribute`. |
| `trim` | `boolean` | If set to `true`, leading and trailing whitespace will be removed from the extracted text. |
| `filters` | `Array<object>` | An array of filter conditions to apply to the extracted elements. |

Available filter conditions

| Condition | Description |
| --- | --- |
| `lessThan` | Checks if a value is less than a specified value. |
| `greaterThan` | Checks if a value is greater than a specified value. |
| `lessThanOrEqual` | Checks if a value is less than or equal to a specified value. |
| `greaterThanOrEqual` | Checks if a value is greater than or equal to a specified value. |
| `equal` | Checks if a value is equal to a specified value. |
| `notEqual` | Checks if a value is not equal to a specified value. |
| `contains` | Checks if a string contains a specified substring. |
| `notContains` | Checks if a string does not contain a specified substring. |
| `startsWith` | Checks if a string starts with a specified substring. |
| `endsWith` | Checks if a string ends with a specified substring. |
| `operator` | Logical operator to combine multiple filters (`and` or `or`). |

Examples

The following practical examples demonstrate how to use ElementEntity with the Elements API.

Basic Example

This example demonstrates different ways to extract elements directly using the selector string.

Request

{
  "urls": ["https://evergreen.media"],
  "elements": [
    "a", // Extract all anchor tags
    ".some-class-name", // Extract by class name
    "#some-id", // Extract by ID
    "[data-attribute='value']", // Extract by data attribute
    "footer .footer-link" // Extract nested elements
  ]
}

Basic Example with ElementEntity

This example demonstrates how to extract elements using the full ElementEntity definition.

Request

{
  "urls": ["https://evergreen.media"],
  "elements": [
    {
      "selector": "a"
    },
    {
      "selector": ".some-class-name"
    },
    {
      "selector": "#some-id"
    },
    {
      "selector": "[data-attribute='value']"
    },
    {
      "selector": "footer .footer-link"
    }
  ]
}

Hybrid Example

This example demonstrates how to extract elements using a mix of simple selectors and full ElementEntity definitions.

Request

{
  "urls": ["https://evergreen.media"],
  "elements": [
    "a", // Simple selector
    {
      "selector": ".some-class-name" // Full ElementEntity
    },
    "#some-id", // Simple selector
    "[data-attribute='value']", // Simple selector
    "footer .footer-link"
  ]
}
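
Because the elements array accepts both forms, client code may find it convenient to normalize every entry to an object before building a request. The TypeScript sketch below assumes that a bare string is equivalent to an object with only selector set, as the two preceding examples suggest. ElementInput and normalizeElements are hypothetical names used for illustration, not part of the API.

```typescript
// Minimal subset of the ElementEntity interface, for illustration only.
interface ElementEntity {
  selector?: string;
  displayName?: string;
  onlyFirstMatch?: boolean;
}

// Hypothetical helper type: an entry is either a bare selector or a full definition.
type ElementInput = string | ElementEntity;

// Convert bare selector strings into { selector } objects; leave objects untouched.
function normalizeElements(elements: ElementInput[]): ElementEntity[] {
  return elements.map((entry) =>
    typeof entry === "string" ? { selector: entry } : entry
  );
}

const normalized = normalizeElements([
  "a",
  { selector: ".some-class-name" },
  "#some-id",
]);
// normalized[0] is { selector: "a" }
```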

Display Name Example

By default, the extracted data keys will match the selector names in the response. For example, if your request is:

{
  "urls": ["https://evergreen.media"],
  "elements": ["h1", "a"]
}

The response will be:

{
  "results": {
    "https://evergreen.media": {
      "h1": [ ... ],
      "a": [ ... ]
    }
  }
}

However, if you want to override the key names with more user-friendly names, you can use the displayName property:

{
  "urls": ["https://evergreen.media"],
  "elements": [
    {
      "selector": "h1",
      "displayName": "headings"
    },
    {
      "selector": "a",
      "displayName": "links"
    }
  ]
}

The response will be:

{
  "results": {
    "https://evergreen.media": {
      "headings": [ ... ],
      "links": [ ... ]
    }
  }
}

Only First Match Example

By default, the extracted data will be an array of all matched elements. If you only want to retrieve the first matched element as a string, number, or another target type, you can use the onlyFirstMatch property:

{
  "urls": ["https://evergreen.media"],
  "elements": [
    {
      "selector": "h1",
      "onlyFirstMatch": true
    }
  ]
}

You will receive only the first h1 element as a string in the response:

{
  "results": {
    "https://evergreen.media": {
      "h1": "Welcome to Evergreen Media" // No array
    }
  }
}
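
Since the shape of a result depends on onlyFirstMatch, client code that reads several keys may prefer to treat every value the same way. The sketch below is one possible approach, assuming results are strings as in the response above; toArray is a hypothetical helper, not something the API provides.

```typescript
// Hypothetical helper: always return an array, whether the API sent
// a single value (onlyFirstMatch: true) or an array of matches.
function toArray(value: string | string[]): string[] {
  return Array.isArray(value) ? value : [value];
}

const single = toArray("Welcome to Evergreen Media");
const many = toArray(["First heading", "Second heading"]);
// single is ["Welcome to Evergreen Media"]; many is passed through unchanged.
```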

Regular Expression Example

After matching all elements using the selector, you may want to further filter the results using a regular expression. You can achieve this by using the regex and regexFlags properties:

{
  "urls": ["https://evergreen.media"],
  "elements": [
    {
      "selector": "a",
      "regex": "\\d+",
      "regexFlags": "g"
    }
  ]
}

This will extract all anchor tags and then apply the regex pattern to each extracted value (the text content by default, or the href attribute if extractType is set to attribute), keeping only the numeric parts.
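
To preview what a pattern would pick out before sending a request, the same expression can be tried locally. The sketch below runs the pattern and flags from the request through the standard JavaScript RegExp engine on a made-up sample string. It assumes the service matches the way String.prototype.match does with the g flag, which is not confirmed here.

```typescript
// Pattern and flags as they appear in the request above.
const numberPattern = "\\d+";
const numberFlags = "g";

// Hypothetical sample text standing in for one extracted link.
const sampleText = "Order 66 shipped in 3 days";

// With the "g" flag, match() returns every non-overlapping match, or null if none.
const numericParts = sampleText.match(new RegExp(numberPattern, numberFlags)) ?? [];
// numericParts is ["66", "3"]
```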

Default Value Example

If you are building a workflow or automation and want to ensure that a value is always returned even if no elements are found, you can use the defaultValue property:

{
  "urls": ["https://evergreen.media"],
  "elements": [
    {
      "selector": ".non-existent-class",
      "defaultValue": "N/A"
    },
    {
      "selector": ".non-existent-class",
      "defaultValue": false
    }
  ]
}

There is no strict type requirement for defaultValue—it can be any type such as string, number, boolean, object, etc.

Unique Example

By default, Tedi Network returns everything matched by the selector. If you want to ensure that only unique values are returned, you can use the unique property:

{
  "urls": ["https://evergreen.media"],
  "elements": [
    {
      "selector": "a",
      "unique": true
    }
  ]
}

This will return only unique anchor tags, removing any duplicates from the results.

Sort Example

Once you have extracted the elements, you may want to sort them in ascending or descending order. You can achieve this by using the sort property:

{
  "urls": ["https://evergreen.media"],
  "elements": [
    {
      "selector": "h2",
      "sort": "asc"
    },
    {
      "selector": "h3",
      "sort": "desc"
    }
  ]
}

This will sort the extracted h2 elements in ascending order and h3 elements in descending order before returning them in the response—either A-Z or Z-A based on the sort value.
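
For comparison, a local equivalent of the two sort orders might look like the sketch below. It assumes a plain string comparison by character code; how the service orders mixed case, numbers, or accented characters is not stated. sortValues and the sample headings are hypothetical.

```typescript
type SortOrder = "asc" | "desc";

// Return a sorted copy; the input array is left unchanged.
function sortValues(values: string[], order: SortOrder): string[] {
  const sorted = [...values].sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
  return order === "asc" ? sorted : sorted.reverse();
}

// Hypothetical heading texts.
const headings = ["Services", "About us", "Contact"];
const ascending = sortValues(headings, "asc");   // ["About us", "Contact", "Services"]
const descending = sortValues(headings, "desc"); // ["Services", "Contact", "About us"]
```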

Coerce Example

If you match a number as a string (since DomElement returns text by default) and you want to enforce a strict type such as number or boolean, you can use the coerce property:

{
  "urls": ["https://evergreen.media"],
  "elements": [
    {
      "selector": ".price",
      "coerce": "number"
    },
    {
      "selector": ".in-stock",
      "coerce": "boolean"
    }
  ]
}

In short, this will convert the "29.99" string to the 29.99 number and the "true" string to the true boolean in the response.
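
The conversion can be pictured with a small local function. This is only a sketch: it assumes that number follows JavaScript's Number() parsing and that boolean accepts the strings "true" and "false". The service's actual rules for other inputs (for example "yes" or "1,299.00") are not documented here, so the sketch returns null for anything it cannot convert. coerceValue is a hypothetical name.

```typescript
type CoerceTarget = "number" | "boolean" | "string";

// Hypothetical local equivalent of the coerce option.
// Returns null when the text cannot be converted, so callers can spot bad input.
function coerceValue(
  text: string,
  target: CoerceTarget
): number | boolean | string | null {
  const cleaned = text.trim();
  switch (target) {
    case "number": {
      const parsed = Number(cleaned);
      return cleaned !== "" && isFinite(parsed) ? parsed : null;
    }
    case "boolean":
      if (cleaned.toLowerCase() === "true") return true;
      if (cleaned.toLowerCase() === "false") return false;
      return null;
    case "string":
      return text;
  }
}

const priceValue = coerceValue("29.99", "number");  // 29.99
const inStock = coerceValue("true", "boolean");     // true
const notANumber = coerceValue("n/a", "number");    // null
```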

Aggregate Example

If you match multiple numeric values and want to perform an aggregation operation such as sum, average, min, max, or count, you can use the aggregate property:

{
  "urls": ["https://evergreen.media"],
  "elements": [
    {
      "selector": ".price",
      "coerce": "number",
      "aggregate": "sum"
    },
    {
      "selector": ".rating",
      "coerce": "number",
      "aggregate": "avg"
    }
  ]
}

This will return the sum of all prices and the average of all ratings in the response.
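
As a rough local equivalent, the five operations reduce an array of numbers to a single value. The sketch below assumes the values have already been coerced to numbers; returning null for an empty input is a choice made for this example, not documented behavior. aggregateValues and the sample ratings are hypothetical.

```typescript
type Aggregate = "sum" | "avg" | "min" | "max" | "count";

// Hypothetical local equivalent of the aggregate option for an array of numbers.
// Returns null for an empty input, except for "count", which returns 0.
function aggregateValues(values: number[], op: Aggregate): number | null {
  if (op === "count") return values.length;
  if (values.length === 0) return null;
  const sum = values.reduce((total, v) => total + v, 0);
  switch (op) {
    case "sum":
      return sum;
    case "avg":
      return sum / values.length;
    case "min":
      return Math.min(...values);
    case "max":
      return Math.max(...values);
  }
}

const ratings = [4, 5, 3];
const ratingSum = aggregateValues(ratings, "sum");     // 12
const ratingAverage = aggregateValues(ratings, "avg"); // 4
```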

Extract Type Example

Let’s say you want to track all elements that look like this:

<a href="https://example.com" class="link">Example Link</a>

The question is: what do you need exactly from this element? Just the text content, the href attribute, or the entire element HTML? That’s where the extractType property comes in handy:

{
  "urls": ["https://evergreen.media"],
  "elements": [
    {
      "selector": "a.link",
      "extractType": "text"
    },
    {
      "selector": "a.link",
      "extractType": "attribute",
      "attributeName": "href"
    },
    {
      "selector": "a.link",
      "extractType": "element"
    }
  ]
}

text: extracts only the text content of the element.
attribute: extracts the value of the specified attribute (e.g., href, src, etc.).
element: extracts the entire HTML of the element.

The default value is text if extractType is not specified.

If you choose attribute as extractType, make sure to provide the attributeName property to specify which attribute’s value you want to extract.

Trim Example

When extracting text content from elements, you might want to remove any leading or trailing whitespace. By default, Tedi Network trims the whitespace from the extracted text. However, if you want to keep the whitespace, you can set the trim property to false:

{
  "urls": ["https://evergreen.media"],
  "elements": [
    {
      "selector": "p.description",
      "trim": false
    }
  ]
}

With trim set to false, text like " This is a description. " is returned unchanged, with its leading and trailing spaces, instead of being trimmed to "This is a description.".

Filters Example

Example 1

After you match a set of element values, you may want to further filter them based on certain conditions. You can use the filters property to apply various filtering criteria:

{
  "urls": ["https://evergreen.media"],
  "elements": [
    {
      "selector": ".price",
      "coerce": "number",
      "filters": [
        {
          "greaterThan": 20
        },
        {
          "lessThanOrEqual": 100
        }
      ]
    }
  ]
}

This example extracts all prices, coerces them to numbers, and then filters the results to include only those prices that are greater than 20 and less than or equal to 100.
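
The same range check can be reproduced locally, which is handy for testing a filter list against known values. The sketch below covers only the numeric conditions and assumes that all filters must pass (the and combination). NumericFilter, passes, and applyFilters are hypothetical names, and the prices are made up.

```typescript
// Numeric subset of the filter conditions listed in the definition.
interface NumericFilter {
  lessThan?: number;
  greaterThan?: number;
  lessThanOrEqual?: number;
  greaterThanOrEqual?: number;
}

// True when the value satisfies every condition set on a single filter object.
function passes(value: number, f: NumericFilter): boolean {
  if (f.lessThan !== undefined && !(value < f.lessThan)) return false;
  if (f.greaterThan !== undefined && !(value > f.greaterThan)) return false;
  if (f.lessThanOrEqual !== undefined && !(value <= f.lessThanOrEqual)) return false;
  if (f.greaterThanOrEqual !== undefined && !(value >= f.greaterThanOrEqual)) return false;
  return true;
}

// Keep the values that pass all filters.
function applyFilters(values: number[], filters: NumericFilter[]): number[] {
  return values.filter((v) => filters.every((f) => passes(v, f)));
}

// Hypothetical prices, filtered as in the request above.
const kept = applyFilters([15, 20, 45.5, 100, 120], [
  { greaterThan: 20 },
  { lessThanOrEqual: 100 },
]);
// kept is [45.5, 100]
```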

Example 2

To extract all URLs that start with https://www.evergreen.media/team, extract the href attribute and apply a startsWith filter:

{
  "urls": ["https://evergreen.media"],
  "elements": [
    {
      "selector": "a",
      "extractType": "attribute",
      "attributeName": "href",
      "filters": [
        {
          "startsWith": "https://www.evergreen.media/team"
        }
      ]
    }
  ]
}

Example 3 with Conditions

Filters can be combined using logical operators such as and and or. By default, filters are combined using the and operator. However, you can explicitly specify the operator to control how the filters are applied:

{
  "urls": ["https://evergreen.media"],
  "elements": [
    {
      "selector": "a",
      "extractType": "attribute",
      "attributeName": "href",
      "filters": [
        {
          "contains": "example"
        },
        {
          "operator": "and"
        },
        {
          "notContains": "test"
        }
      ]
    }
  ]
}

Alternatively, you can use the or operator like this:

{
  "urls": ["https://evergreen.media"],
  "elements": [
    {
      "selector": "a",
      "extractType": "attribute",
      "attributeName": "href",
      "filters": [
        {
          "contains": "example"
        },
        {
          "operator": "or"
        },
        {
          "contains": "sample"
        }
      ]
    }
  ]
}
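
One way to read these filter lists is as a left-to-right evaluation in which an operator entry decides how the next condition is combined with the result so far. The sketch below implements that reading for contains and notContains only. It is an assumption about the semantics, not a description of the service's implementation; StringFilter, matchesCondition, and evaluate are hypothetical names, and operator entries are assumed to carry no conditions of their own.

```typescript
// Subset of the string filter conditions, plus the operator marker.
interface StringFilter {
  contains?: string;
  notContains?: string;
  operator?: "and" | "or";
}

// Evaluate the conditions set on one filter object against a value.
function matchesCondition(value: string, f: StringFilter): boolean {
  if (f.contains !== undefined && value.indexOf(f.contains) === -1) return false;
  if (f.notContains !== undefined && value.indexOf(f.notContains) !== -1) return false;
  return true;
}

// Walk the list left to right. An { operator } entry sets how the next
// condition is combined with the running result; the default is "and".
function evaluate(value: string, filters: StringFilter[]): boolean {
  let result = true;
  let pending: "and" | "or" = "and";
  let first = true;
  for (const f of filters) {
    if (f.operator !== undefined) {
      pending = f.operator;
      continue;
    }
    const ok = matchesCondition(value, f);
    result = first ? ok : pending === "and" ? result && ok : result || ok;
    first = false;
    pending = "and";
  }
  return result;
}

const andFilters: StringFilter[] = [
  { contains: "example" },
  { operator: "and" },
  { notContains: "test" },
];
const orFilters: StringFilter[] = [
  { contains: "example" },
  { operator: "or" },
  { contains: "sample" },
];
// evaluate("https://sample.org", orFilters) is true;
// evaluate("https://example.com/test", andFilters) is false.
```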

Use Case Examples

We will now explore several practical use cases demonstrating how to leverage the ElementEntity for various data extraction scenarios. With these examples, you will learn how to extract specific information from web pages using different configurations of the ElementEntity. In each example, we target selectors in different ways to showcase the versatility of the ElementEntity.

Extract Website Title and Meta Description

In this example, we will extract the title and meta description from multiple websites using ElementEntity definitions.

{
  "urls": [
    "https://evergreen.media",
    "https://www.tirol.gv.at",
    "https://www.torproject.org/"
  ],
  "elements": [
    {
      "selector": "title",
      "extractType": "text",
      "onlyFirstMatch": true
    },
    {
      "selector": "meta[name='description']",
      "displayName": "description",
      "extractType": "attribute",
      "attributeName": "content",
      "onlyFirstMatch": true
    }
  ]
}

Extract Personal Information

In this example, we will extract personal information such as the name, role, and email address of a team member from a sample webpage. The target URL is https://www.evergreen.media/team/adnan-ali/. We want to extract the following information:

Full Name
Bio
Role/Position
Email Address
LinkedIn Profile URL

{
  "urls": [
    "https://www.evergreen.media/team/adnan-ali/"
  ],
  "elements": [
    {
      "selector": ".team-header-content h1 span",
      "displayName": "name",
      "extractType": "text",
      "onlyFirstMatch": true
    },
    {
      "selector": ".wp-block-column p",
      "displayName": "bio",
      "extractType": "text"
    },
    {
      "selector": ".team-header-content .fs-4 span",
      "displayName": "position",
      "extractType": "text",
      "onlyFirstMatch": true
    },
    {
      "selector": ".team-header-content a",
      "displayName": "email",
      "extractType": "attribute",
      "attributeName": "href",
      "onlyFirstMatch": true,
      "replace": {
        "pattern": "mailto:",
        "value": ""
      },
      "filters": [
        {
          "startsWith": "mailto"
        }
      ]
    },
    {
      "selector": ".team-header-content a",
      "displayName": "linkedin",
      "extractType": "attribute",
      "attributeName": "href",
      "onlyFirstMatch": true,
      "filters": [
        {
          "contains": "linkedin.com"
        }
      ]
    }
  ]
}
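
The email entry relies on replace to strip the mailto: prefix from the href value. A local sketch of that step is shown below. It handles string patterns only and assumes that, without useRegex, the pattern is treated as a literal string and every occurrence is replaced; the service's exact behavior in that mode is not specified here. ReplaceRule, applyReplace, and the sample address are hypothetical.

```typescript
// One replacement rule, mirroring the shape of the replace property (string patterns only).
interface ReplaceRule {
  pattern: string;
  value: string;
  useRegex?: boolean;
  flags?: string;
}

// Apply a single rule or a list of rules, in order.
function applyReplace(text: string, rules: ReplaceRule | ReplaceRule[]): string {
  const list = Array.isArray(rules) ? rules : [rules];
  return list.reduce((current, rule) => {
    if (rule.useRegex) {
      return current.replace(new RegExp(rule.pattern, rule.flags), rule.value);
    }
    // Literal mode: replace every occurrence of the plain string.
    return current.split(rule.pattern).join(rule.value);
  }, text);
}

const emailAddress = applyReplace("mailto:jane.doe@example.com", {
  pattern: "mailto:",
  value: "",
});
// emailAddress is "jane.doe@example.com"
```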

Extract Evergreen Media Team Members

In this example, we will extract the names, roles, email addresses, and LinkedIn profile URLs of all team members from the Evergreen Media team page. We will also learn how to use nested ElementEntity definitions to extract structured data.

{
  "urls": [
    "https://www.evergreen.media/team"
  ],
  "elements": [
    {
      "selector": ".team-member",
      "displayName": "teamMembers",
      "children": [
        {
          "selector": ".card-body h3 a",
          "displayName": "name",
          "extractType": "text",
          "trim": true,
          "onlyFirstMatch": true
        },
        {
          "selector": ".card-body .mb-2",
          "displayName": "role",
          "extractType": "text",
          "trim": true,
          "onlyFirstMatch": true
        },
        {
          "selector": "a[href^='mailto']",
          "displayName": "email",
          "defaultValue": null,
          "extractType": "attribute",
          "attributeName": "href",
          "replace": {
            "pattern": "mailto:",
            "value": ""
          },
          "onlyFirstMatch": true
        },
        {
          "selector": "a[href*='linkedin']",
          "displayName": "linkedin",
          "extractType": "attribute",
          "attributeName": "href",
          "onlyFirstMatch": true
        }
      ]
    }
  ]
}

Tedi + Browser Example

The ElementEntity API is a powerful tool for extracting structured data from web pages. However, in many real-world scenarios you may need to go beyond simple extraction, such as gathering specific data points from multiple pages and aggregating them into a unified, structured response. This is where the agentic capabilities of Tedi come in.

For example, imagine you want to collect a list of directors from various team pages across different companies or websites. Instead of manually scraping each page and merging the results, you can use Tedi in agentic mode to automate this process. Agentic mode lets you define a high-level prompt (such as “Extract directors from Team”) and have the system navigate, extract, and aggregate the relevant information from multiple sources. This approach is especially useful for tasks that require reasoning, multi-step extraction, or combining data from diverse web pages into a single, well-structured output. Combining Tedi with the Chromium browser enables workflows that go beyond traditional scraping, such as robust, scalable data pipelines for complex use cases.

When sending a request in prompt mode with Tedi Network, you must specify the output schema so that the extracted data is structured correctly. As with other endpoints, JSON Schema draft 7 is supported for defining the output structure. Below is an example of such a request, which extracts directors from multiple team pages.

{
  "prompt": "Extract directors from Team",
  "urls": [
    "https://www.evergreen.media/ueber-uns/team/",
    "https://www.tirol.gv.at/landtag/landesrechnungshof/mitarbeiterinnen-und-mitarbeiter/"
  ],
  "output": {
    "type": "object",
    "properties": {
      "directors": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "name": { "type": "string"},
            "email": {  "type": "string" }
          },
          "required": ["name", "email"]
        }
      }
    },
    "required": ["directors"]
  }
}
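
On the client side, it can be useful to check that returned data really has the shape described by the output schema before using it. The sketch below is a hand-written TypeScript type guard that mirrors the schema above; a JSON Schema validator library would be the more general choice. Where the structured output sits inside the response is not shown here, so the guard takes the candidate object directly. Director, DirectorsOutput, and the sample data are hypothetical.

```typescript
// TypeScript mirror of the output schema in the request above.
interface Director {
  name: string;
  email: string;
}
interface DirectorsOutput {
  directors: Director[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hand-written check equivalent to the schema's "type" and "required" rules.
function isDirectorsOutput(value: unknown): value is DirectorsOutput {
  if (!isRecord(value)) return false;
  const directors = value["directors"];
  if (!Array.isArray(directors)) return false;
  return directors.every(
    (d: unknown) =>
      isRecord(d) && typeof d["name"] === "string" && typeof d["email"] === "string"
  );
}

// Hypothetical payload used only to exercise the guard.
const candidate: unknown = {
  directors: [{ name: "Jane Doe", email: "jane.doe@example.com" }],
};
const candidateIsValid = isDirectorsOutput(candidate); // true
```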

Considerations

When using the ElementEntity for data extraction, keep the following considerations in mind:

Selector Accuracy: Ensure that the CSS selectors used in the selector property accurately target the desired elements on the web page. Incorrect selectors may lead to unexpected results or no data being extracted.
Dynamic Content: Some web pages load content dynamically using JavaScript. In such cases, ensure that the browser network is configured to wait for the necessary content to load before extraction.
Data Types: When using the coerce property, ensure that the extracted data can be validly converted to the specified type. Invalid conversions may result in errors or unexpected values.
Performance: Complex selectors or large numbers of elements may impact performance. Optimize selectors and extraction logic to ensure efficient data retrieval.
Error Handling: Implement appropriate error handling in your workflows to manage cases where elements are not found or extraction fails.

Rejection Criteria

If you are using prompt mode with Tedi Network, be aware of the following rejection criteria related to the ElementEntity:

Your prompt intent must clearly specify the need to extract structured data from web pages.
Your account will be banned if you attempt to use the ElementEntity for unnecessary or frivolous data extraction that does not align with the purpose of structured data retrieval, for example:

{
  "prompt": "how are you?",
  "urls": [
    "https://www.evergreen.media/ueber-uns/team/",
    "https://www.tirol.gv.at/landtag/landesrechnungshof/mitarbeiterinnen-und-mitarbeiter/"
  ],
  // ...
}

This will be rejected because it does not align with the purpose of structured data extraction. You will get a response like this:

{
  "status": false,
  "message": "Tedi Error",
  "error": "Your prompt intent does not align with the purpose of structured data extraction using ElementEntity."
}
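
A workflow that sends prompt-mode requests may want to detect this payload and fail early. The sketch below assumes that a rejection always has status set to false together with message and error strings, as in the response above; the shape of a successful response is not covered. TediError, isTediError, and parseOrThrow are hypothetical names.

```typescript
// Shape of the rejection payload shown above.
interface TediError {
  status: false;
  message: string;
  error: string;
}

function isTediError(body: unknown): body is TediError {
  if (typeof body !== "object" || body === null) return false;
  const fields = body as Record<string, unknown>;
  return (
    fields["status"] === false &&
    typeof fields["message"] === "string" &&
    typeof fields["error"] === "string"
  );
}

// Parse a raw response body and throw a descriptive error if the request was rejected.
function parseOrThrow(raw: string): unknown {
  const body: unknown = JSON.parse(raw);
  if (isTediError(body)) {
    throw new Error(`${body.message}: ${body.error}`);
  }
  return body;
}
```

Calling parseOrThrow with the rejection body above throws an error whose message starts with "Tedi Error:", while any other JSON body is returned as parsed.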

Anti-Bot

The Tedi Browser Network component automatically handles common anti-bot protections, including rate limiting. When such measures are detected, Tedi Browser attempts to bypass them to extract the requested content. In rare cases where anti-bot defenses are highly advanced, manual intervention may be necessary.

Internal testing shows a success rate of nearly 98% against standard anti-bot protections.

This ensures that any Browser entity can reliably retrieve data even from websites with anti-bot measures in place.

You do not need to worry about anti-bot protections when using Tedi Browser, as these are handled automatically. This is one of the key reasons we provide this entity through our API.

It may sometimes take slightly longer to extract content from such protected sites, but Tedi Browser will make every effort to get you the data you need. This feature is included automatically in all browser entities, including this one.

Conclusion

The ElementEntity provides a powerful and flexible way to extract structured data from web pages. By leveraging its various properties and features, you can tailor your data extraction processes to meet specific requirements and use cases. Additionally, the ability to nest ElementEntity definitions allows for more complex and detailed data extraction strategies, enabling you to capture intricate relationships and hierarchies within the web content. Explore the various properties and configurations of the ElementEntity to unlock its full potential in your web scraping and data extraction projects.
