
An Introduction to the Crawler

With Docs to OpenAPI, you can create an OpenAPI schema for an API from the data in its documentation. You can enter this data manually, or let the Docs to OpenAPI plugin extract it for you automatically using a feature called, as you might have guessed, the Crawler.

What kind of data does the plugin expect?

The plugin needs the list of API operations along with their request bodies, parameters, and responses. Each of these has its own tab under the Crawler tab.
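As a rough illustration of where these four kinds of data end up, the sketch below builds a minimal OpenAPI 3 document as a Python dictionary. The `/users` path, its fields, and all descriptions are made-up values for illustration, not output from the plugin.

```python
import json

# Hypothetical operation: every name and value here is illustrative only.
create_user = {
    "summary": "Create a user",
    # Parameters: values passed in the path, query string, headers, or cookies.
    "parameters": [
        {"name": "dryRun", "in": "query", "required": False,
         "schema": {"type": "boolean"}},
    ],
    # Request body: the payload the operation accepts.
    "requestBody": {
        "content": {"application/json": {"schema": {"type": "object"}}},
    },
    # Responses: keyed by HTTP status code.
    "responses": {"201": {"description": "The user was created"}},
}

schema = {
    "openapi": "3.0.3",
    "info": {"title": "Example API", "version": "1.0.0"},
    # Operations: one entry per path and HTTP method.
    "paths": {"/users": {"post": create_user}},
}

print(json.dumps(schema, indent=2))
```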

The Crawler needs to be configured so that it knows which information to take from the web page and which field of the OpenAPI schema each value should go into. It identifies the parts of the web page to extract primarily through CSS selectors, with additional help from the Crawler's Evaluate, Exclude, and Filter fields.
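The idea of pairing schema fields with selectors can be sketched in a few lines of Python. This is not how the Crawler is implemented or configured: the HTML snippet, the class names, and the `FIELD_SELECTORS` mapping are all assumptions made for illustration, and because Python's standard library has no CSS selector engine, the sketch uses the roughly equivalent XPath expression for each selector.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed documentation snippet; real pages will differ.
PAGE = """
<div>
  <section class="operation">
    <h2 class="op-name">List users</h2>
    <code class="op-path">GET /users</code>
  </section>
  <section class="operation">
    <h2 class="op-name">Create user</h2>
    <code class="op-path">POST /users</code>
  </section>
</div>
"""

# Hypothetical mapping: schema field -> (CSS selector, rough XPath equivalent).
# Note: [@class='x'] matches the exact attribute value, while the CSS selector
# .x also matches elements that carry additional classes.
FIELD_SELECTORS = {
    "summary": ("h2.op-name", "./h2[@class='op-name']"),
    "path": ("code.op-path", "./code[@class='op-path']"),
}


def extract_operations(page):
    """Return one dict per operation section, keyed by schema field."""
    root = ET.fromstring(page)
    operations = []
    for section in root.findall(".//section[@class='operation']"):
        record = {}
        for field, (_css, xpath) in FIELD_SELECTORS.items():
            node = section.find(xpath)
            has_text = node is not None and node.text
            record[field] = node.text.strip() if has_text else None
        operations.append(record)
    return operations


print(extract_operations(PAGE))
```

Because both sections share the same structure, one set of selectors is enough to extract every operation on the page.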

When CSS selectors aren't enough...

The Evaluate, Exclude, and Filter fields are typically only used when CSS selectors aren't enough to identify and isolate the data needed. This happens when, for example, a value requires further string processing because unnecessary characters are attached to it.
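To make the string-processing case concrete, here is a minimal Python sketch of the kind of cleanup a selected value might need. It does not use the syntax of the Evaluate, Exclude, or Filter fields; the `clean_path` helper and the sample input are hypothetical.

```python
import re


def clean_path(raw):
    """Strip a leading HTTP method, trailing anchor marks, and extra whitespace."""
    value = raw.strip()
    # Drop a leading HTTP method such as "GET " if the selector captured it.
    value = re.sub(r"^(GET|POST|PUT|PATCH|DELETE)\s+", "", value)
    # Drop trailing whitespace and heading-anchor symbols such as "¶" or "#".
    value = re.sub(r"[\s¶#]+$", "", value)
    return value


print(clean_path("  GET /users/{id} ¶ "))  # prints /users/{id}
```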

By default, the Crawler scans documentation pages with a built-in crawling algorithm, which you can customize for more control.

The Crawler is specifically designed to parse and extract data from HTML documents. It works best on pages where every operation is documented in the same way, that is, where the sections describing each operation are structurally identical.

TORO recommends using the Crawler, as populating API data this way is quicker (especially when dealing with massive APIs) and less prone to human error. It does require, however, that the documentation web pages use consistent HTML formatting.

Learn the basics

You can get past the learning curve more quickly if you have a good understanding of HTML and JavaScript. Familiarity with browser developer tools is a plus as well.