
Default Crawling Strategies

CSS selectors determine which elements are of interest, but how the plugin interprets the provided rules and executes crawls depends solely on the crawling strategy configured for each tab. Docs to OpenAPI provides sensible default crawling strategies, so you don't have to code strategies from scratch.

The Strategies, Explained

Each tab has its own crawling strategy, but only because each one caters to a different set of configuration fields. All strategies follow the same flow and logic, explained in detail below.

  1. Obtain the list of pages documenting API operations via the Operation Url field. If the Operation Url field is empty, then the plugin will crawl only the currently opened page.
  2. Open one of the crawlable pages.
  3. Look for operation wrappers in the visited page. Each wrapper wraps exactly one operation. In this case, operation wrappers are expected to contain the following data: operation name, description, HTTP method, path, and tags.
  4. Extract operation information from found operation wrappers, and store this information, mapping values to an internal schema.

    The wrapper's selector is implicitly prepended

    The plugin assumes the elements containing the operation name, description, method, path, and tags are descendants of the wrapper element. Hence, it automatically prepends the wrapper's selector to each of their selectors.

    So if our wrapper is div.api-block and the operation name can be found at div > div.indent > p:first-of-type, the plugin actually looks for the operation name at div.api-block div > div.indent > p:first-of-type. The plugin always assumes the other selectors are relative to the wrapper's selector.

  5. Repeat steps #2 to #4 until all pages have been crawled.

Flowchart: API operation crawler
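The selector scoping described in step 4 can be sketched as a simple string operation. This is a minimal illustration, not the plugin's actual code: the function name `scope_selector` is hypothetical, and it assumes the wrapper selector is joined to the field selector with the CSS descendant combinator (a single space), as the `div.api-block` example suggests.

```python
def scope_selector(wrapper: str, relative: str) -> str:
    """Scope a field selector under the wrapper selector.

    Hypothetical helper: joins the two selectors with the CSS
    descendant combinator (a space), mirroring the documented example.
    """
    wrapper = wrapper.strip()
    relative = relative.strip()
    if not wrapper:
        # No wrapper configured: the field selector is used as-is.
        return relative
    if not relative:
        # No field selector: only the wrapper itself can be matched.
        return wrapper
    return f"{wrapper} {relative}"


name_selector = scope_selector(
    "div.api-block", "div > div.indent > p:first-of-type"
)
print(name_selector)
# div.api-block div > div.indent > p:first-of-type
```

Because the field selectors are always resolved relative to the wrapper, they can stay short and do not need to repeat the wrapper's own selector.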

  1. Obtain the list of pages documenting supported request bodies via the Request Body Url field. If the Request Body Url field is empty, then the plugin will crawl only the currently opened page.
  2. Open one of the crawlable pages.
  3. Look for operation wrappers in the visited page. In this case, wrappers are expected to contain the operation name, and a sample request payload or a list defining payload properties.
  4. Extract request body information from found operation wrappers, and store this information in an internal schema.

    The plugin first extracts the operation name.

    Then, it checks the Request Body Payload field. If a selector is provided for this field, the plugin will ignore inputs in the Request Body tab's Fields section and infer the expected request body from the sample request body payload.

    If the configuration does not provide a selector for the Request Body Payload field, the plugin will look for payload property definitions using the selectors in the Fields section. Payload properties will be identified using the Field Wrapper's CSS selector.

  5. Repeat steps #2 to #4 until all pages have been crawled.

Flowchart: API operation request body crawler
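To give a feel for what inferring a request body from a sample payload might involve, here is a rough sketch. It assumes the sample is JSON and maps values to OpenAPI-style schema fragments. The function name `infer_schema` and the exact mapping rules (for example, typing an array by its first item) are assumptions made for illustration; the plugin's real inference rules are not documented here.

```python
import json
from typing import Any, Dict


def infer_schema(value: Any) -> Dict[str, Any]:
    """Derive an OpenAPI-style schema fragment from a sample JSON value."""
    # bool must be checked before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        # Assumption: a null sample value only tells us the field is nullable.
        return {"nullable": True}
    if isinstance(value, list):
        # Assumption: the first item is representative of the whole array.
        items = infer_schema(value[0]) if value else {}
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(val) for key, val in value.items()},
        }
    raise TypeError(f"Unsupported sample value: {value!r}")


sample_payload = '{"name": "Rex", "age": 3, "tags": ["dog"], "vaccinated": true}'
schema = infer_schema(json.loads(sample_payload))
print(json.dumps(schema, indent=2))
```

A sample payload can only reveal what it contains, so details such as required fields or enumerated values would still have to come from field definitions.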

  1. Obtain the list of pages documenting supported parameters via the Parameter Url field. If the Parameter Url field is empty, then the plugin will crawl only the currently opened page.
  2. Open one of the crawlable pages.
  3. Look for operation wrappers in the visited page. In this case, wrappers are expected to contain the operation name, and a sample parameter payload or a list defining supported parameters.
  4. Extract the parameters from found operation wrappers, and store this information in an internal schema.

    The plugin first extracts the operation name.

    Then, it checks the Parameter Payload field. If a selector is provided for this field, the plugin will ignore inputs in the Parameter tab's Fields section and infer the supported parameters from the sample parameter payload.

    If the configuration does not provide a selector for the Parameter Payload field, the plugin will look for parameter definitions using the selectors in the Fields section. Parameters will be identified using the Field Wrapper's CSS selector.

  5. Repeat steps #2 to #4 until all pages have been crawled.

Flowchart: API operation parameter crawler
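The precedence rule in step 4 (a payload selector, when present, overrides the Fields section) is shared by the request body, parameter, and response crawlers. It can be sketched as follows. The `TabConfig` class, its attribute names, and the `extraction_mode` function are hypothetical stand-ins for the tab's configuration fields, not the plugin's real data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TabConfig:
    """Hypothetical view of one tab's configuration."""
    payload_selector: Optional[str] = None        # e.g. the Parameter Payload field
    field_wrapper_selector: Optional[str] = None  # the Field Wrapper selector
    field_selectors: Dict[str, str] = field(default_factory=dict)  # Fields section


def extraction_mode(config: TabConfig) -> str:
    """Return which source the crawler reads from for this tab."""
    # A non-blank payload selector wins; the Fields section is then ignored.
    if config.payload_selector and config.payload_selector.strip():
        return "payload"
    # Otherwise fall back to the selectors in the Fields section.
    return "fields"


with_payload = TabConfig(
    payload_selector="pre.sample",
    field_wrapper_selector="tr.param",
    field_selectors={"name": "td:first-of-type"},
)
fields_only = TabConfig(
    field_wrapper_selector="tr.param",
    field_selectors={"name": "td:first-of-type"},
)

print(extraction_mode(with_payload))  # payload
print(extraction_mode(fields_only))   # fields
```

In this sketch a blank selector is treated the same as a missing one, which is an assumption about how the plugin handles empty input.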

  1. Obtain the list of pages documenting expected responses via the Response Url field. If the Response Url field is empty, then the plugin will crawl only the currently opened page.
  2. Open one of the crawlable pages.
  3. Look for operation wrappers in the visited page. In this case, wrappers are expected to contain the following data:

    • Operation name
    • Response status codes
    • Response descriptions
    • A sample response payload or a list of payload property definitions
  4. Extract response information from found operation wrappers, and store this information in an internal schema.

    The plugin first extracts the operation name, status code, and description of the response.

    Then, it checks the Response Payload field. If a selector is provided for this field, the plugin will ignore inputs in the Response tab's Fields section and infer the response's content from the sample response payload.

    If the configuration does not provide a selector for the Response Payload field, the plugin will look for payload field definitions using the selectors in the Fields section. Each property will be identified using the Field Wrapper's CSS selector.

  5. Repeat steps #2 to #4 until all pages have been crawled.

Flowchart: API operation response crawler

There is a noticeable emphasis on ancestor-descendant relationships among selectors; finer-grained data is wrapped inside container elements. Each flowchart above illustrates the entire process for its crawler.

Every time the Crawler runs, it generates models for every API operation, endpoint request body, endpoint parameter, and endpoint response body, replacing previously generated data with the same ID.
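The replace-by-ID behavior can be pictured as an upsert into a store keyed by model ID. The sketch below is illustrative only: the `merge_models` function and the `id` key are hypothetical, and it assumes that models whose IDs are not regenerated are left untouched, which the description implies but does not state.

```python
from typing import Any, Dict, List

Model = Dict[str, Any]


def merge_models(existing: Dict[str, Model], generated: List[Model]) -> Dict[str, Model]:
    """Return a new store where freshly generated models replace same-ID entries."""
    merged = dict(existing)  # copy so the previous store is not mutated
    for model in generated:
        # Same ID: overwrite the old model. New ID: add it.
        merged[model["id"]] = model
    return merged


previous = {
    "getPet": {"id": "getPet", "method": "GET", "path": "/pets/{id}"},
    "listPets": {"id": "listPets", "method": "GET", "path": "/pets"},
}
latest_run = [
    {"id": "getPet", "method": "GET", "path": "/v2/pets/{id}"},
    {"id": "createPet", "method": "POST", "path": "/pets"},
]

store = merge_models(previous, latest_run)
print(sorted(store))            # ['createPet', 'getPet', 'listPets']
print(store["getPet"]["path"])  # /v2/pets/{id}
```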