
Creating Your First OpenAPI Schema

You already know that you can create an OpenAPI-compliant schema with the Docs to OpenAPI plugin. But as a beginner, where and how should you start? In this step-by-step tutorial, we will guide you through generating your first OpenAPI schema from an example API documentation page. You will learn how to:

  • Configure and run the Crawler
  • Manually add schema information and edit crawled data
  • Export your plugin configuration

General Procedure

At a minimum, using the plugin to create an OpenAPI schema consists of the following steps:

  1. Launch the Docs to OpenAPI plugin.
  2. Enter the API's general information, list of servers, and security configuration.
  3. Enter the details of the API's operations, parameters, request bodies, and responses either manually or via the Crawler.
  4. From all data gathered using the previous steps, export the resulting OpenAPI schema and/or the plugin configuration file.

Know the API

It's important to have at least some understanding of the API you are crawling. This will make the process of mapping values to fields required by the plugin easier.

Walkthrough

For our tutorial, we will be generating an OpenAPI schema for the fictional Forumock API which has 19 operations. While simple, this API features common use cases like the presence of path, query, and body parameters; tags; security; and the presence (or absence) of response payloads.

Want the completed plugin configuration?

Download the completed plugin configuration for our example API here. Import this file to load the plugin configuration.

1. Launching the Plugin

Go to the Forumock API documentation page and from there, launch Docs to OpenAPI.

Launching the plugin

2. Configuring the API's Servers

Configuration for general information and security schemes not covered

Long story short, we omitted both topics from this tutorial to keep it concise.

Configuring the API's general information is not required (but highly recommended). If it is not provided, the plugin supplies stub values for the general information in the resulting OpenAPI schema.

Our example API also does not require security authentication, so security configuration was skipped. If your API has secured endpoints, security scheme configuration would be necessary.

Next, we will "tell" the plugin which server(s) host(s) the API operations by providing the API's base URL. To do this:

  1. Open the OpenAPI tab.
  2. Open the Servers tab underneath.

    What's the Servers tab for?

    The Servers tab is used to configure the base URL of your API. It presents fields for flexibly setting the URL to accommodate different server environments.

  3. Click the circular green button with the plus sign.

  4. Enter the following details (taken from the documentation page):

    • Url: https://api.forumock.io/v1
    • Description: Latest Forumock REST API.
    • Check Apply to all

    Use path variables in URLs for increased flexibility

    Most APIs include a version number in the base path so that multiple versions of the API can exist in parallel. This is true for the API we're crawling right now. In https://api.forumock.io/v1, the version number is v1.

    To make it easier for us to shift versions, we can create a path variable for the version path.

    1. Edit the Url field and set the base path to https://api.forumock.io/{version}.
    2. Save the URL.
    3. From the Variables dropdown below, select version.
    4. Since we're crawling the v1 of the API, we'll be setting the Default Value of this variable to v1.

    Server configuration

  5. Save your changes.

The information entered in the Servers tab is used to construct the resulting OpenAPI schema's Server Object. The base URL provided by the Server Object is prepended to each operation's (relative) path to construct the full URL where the operation can be accessed. You can enter more than one base URL; simply repeat steps 3 to 5.
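
To see how a Server Object with a path variable resolves into full operation URLs, here is a minimal JavaScript sketch. The server object mirrors the values entered in this step; the resolveServerUrl and fullUrl helpers are hypothetical names used only for illustration and are not part of the plugin.

```javascript
// Server Object as it might appear in the generated schema (values from this tutorial).
const server = {
  url: "https://api.forumock.io/{version}",
  description: "Latest Forumock REST API.",
  variables: { version: { default: "v1" } },
};

// Hypothetical helper: substitute each {name} with an override or its default value.
function resolveServerUrl(srv, overrides = {}) {
  return srv.url.replace(/\{([^}]+)\}/g, (match, name) => {
    if (name in overrides) return overrides[name];
    const variable = (srv.variables || {})[name];
    if (!variable) throw new Error(`No value for server variable "${name}"`);
    return variable.default;
  });
}

// Hypothetical helper: prepend the resolved base URL to a relative operation path.
function fullUrl(srv, path, overrides) {
  return resolveServerUrl(srv, overrides).replace(/\/+$/, "") + path;
}

console.log(fullUrl(server, "/board")); // https://api.forumock.io/v1/board
console.log(fullUrl(server, "/board", { version: "v2" })); // https://api.forumock.io/v2/board
```

Switching to another API version then only requires a different value for the version variable, while the operation paths stay untouched.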

3. Using the Crawler

We will be using the Crawler to fetch the list of API operations, their request bodies, parameters, and responses. The Crawler allows for a quicker and easier way to populate API operations, as opposed to manually entering API operation information via the OpenAPI tab.

a. Rules for Scanning API Operations

To identify the operations of our API, we will populate the fields under the Operation tab with CSS selectors. CSS selectors allow the Crawler to pinpoint the location of the elements it needs to scan for data. Only fields marked with an asterisk are required. However, it is best to fill in as much information as you can so that the generated OpenAPI schema is complete.

What if CSS selectors aren't enough?

Sometimes data contained by selected elements needs further processing, either to remove unwanted data, or transform data so that it fulfills the requirements of the OpenAPI specification. Docs to OpenAPI supports intermediary data transformation using the Filter, Evaluate, and/or Exclude features.

But before we populate these fields, let's take a look at the documentation page's HTML structure by inspecting the page in Chrome.

API documentation page structure

<div class="section">
    <h5>Paths</h5>
    <div class="indent">
        <div class="indent">
            <div class="row pointer blue-text text-lighten-1">
                <span>/board</span> <!-- Path -->
            </div>
            <div>
                <div class="api-block round-border blue-border"> <!-- Operation wrapper -->
                    <!-- Operation details -->
                </div>
                <div class="api-block round-border blue-border"> <!-- Operation wrapper -->
                    <!-- Operation details -->
                </div>
                <div class="api-block round-border blue-border"> <!-- Operation wrapper -->
                    <!-- Operation details -->
                </div>
            </div>
        </div>
    </div>    
    <div class="indent">
        <div class="indent">
            <div class="row pointer blue-text text-lighten-1">
                <span>/board/{boardName}</span> <!-- Path -->
            </div>
            <div>
                <div class="api-block round-border blue-border"> <!-- Operation wrapper -->
                    <!-- Operation details -->
                </div>
                <div class="api-block round-border blue-border"> <!-- Operation wrapper -->
                    <!-- Operation details -->
                </div>
            </div>
        </div>
    </div>
    <!-- More paths -->
</div>

In most cases, as in our example, the HTML structure is repetitive for similar data. This repetitiveness is what makes our documentation ideal for crawling. As you can see in the condensed snippet above, each API operation has its own <div class="api-block round-border blue-border"> element.

These operation wrappers[1] contain the information we need to enter in our specification. A closer inspection of their contents reveals that the information we need for the Operation tab is located in the first few elements:

<div class="api-block round-border blue-border">
    <div class="api-header lighten-5 round-border pointer blue">
        <div class="api-title light white-text blue">GET /board</div> <!-- HTTP method and path -->
    </div>
    <div>
        <div class="tag-container"> <!-- Tags -->
            <div class="tag right no-overflow pointer">board</div>
            <div class="tag right no-overflow pointer">GET</div>
        </div>
        <div class="indent r-indent">
            <h5 class="blue-text">Summary</h5>
            <p class="indent">Fetch boards</p>
            <h5 class="blue-text">Description</h5>
            <p class="indent">Fetch the list of existing boards. Results are paginated and can be sorted.</p>
            <!-- ... -->
        </div>
    </div>
</div>

Now that we know where to get the values for the Name, Description, Method, Path, and Tag fields, let's go ahead and populate the Operation tab.

Use the Inspect Page feature to quickly select elements

Every field under the Crawler tab has a corresponding Inspect Page button on its right. This button allows you to select an element straight from the rendered web page and set that element's selector as the field value.

Using the Inspect Page feature

| Label | Selector |
| --- | --- |
| Operation Url | |
| Wrapper | div.api-block |
| Name | div > div.indent > p:first-of-type |
| Description | div > div.indent > p:nth-of-type(2) |
| Method | div.api-title |
| Path | div.api-title |
| Tags | div.tag-container > div.tag |

The wrapper's selector is implicitly prepended

The plugin assumes the elements containing the operation name, description, method, path, and tags are under the wrapper element. Hence, it implicitly prepends the wrapper's selector to each of their selectors.

So if our wrapper is div.api-block and the operation name can be found at div > div.indent > p:first-of-type, the plugin actually looks for the operation name at div.api-block div > div.indent > p:first-of-type. The plugin always assumes the other selectors are relative to the wrapper's selector.
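
The prepending described above can be sketched in a few lines of JavaScript. The scopedSelector helper is a hypothetical name used for illustration; it only shows the resulting selector strings and is not the plugin's actual implementation.

```javascript
// Hypothetical helper: combine the wrapper selector with a field selector
// using the descendant combinator (a single space).
function scopedSelector(wrapper, fieldSelector) {
  return `${wrapper.trim()} ${fieldSelector.trim()}`;
}

const wrapper = "div.api-block";
const fields = {
  name: "div > div.indent > p:first-of-type",
  method: "div.api-title",
  tags: "div.tag-container > div.tag",
};

// Print the effective selector for each field.
for (const [label, selector] of Object.entries(fields)) {
  console.log(`${label}: ${scopedSelector(wrapper, selector)}`);
}
```

Because every field selector is evaluated relative to the wrapper, the same short selectors work for each of the operation blocks on the page.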

Since we seem to have the Operation tab covered, let's take our configuration for a test drive by running the Crawler. We've only populated rules for fetching general information about operations, so just crawling operations would suffice.

Crawl operations

At this point, the Crawler should be able to fetch all 19 API operations. It should also be able to detect 14 path parameters from the provided URLs.

Crawling results

Everything's looking good... until we take a peek at the OpenAPI tab > Operations tab. Yikes! Our Path contains the HTTP method of the operation.

Invalid path extracted

What's the OpenAPI tab for?

The OpenAPI tab, like the Crawler tab, is composed of multiple tabs. These tabs contain crawled or manually added API data, which will eventually be used to generate the OpenAPI schema.

This happened because our selector for the Path field is div.api-title, and the text of every element it selects is in the format <method> <url>. Fortunately, the fix is simple.

  1. Go back to the Crawler tab > Operation tab.
  2. Click the Path label. The Evaluate, Filter, and Exclude fields should appear underneath it.
  3. Set the Evaluate field so that it contains this snippet:

    return text.split(" ")[1];
    

Simply put, this instructs the plugin to get the <url> part from the selector-obtained text, which we know is in the format of <method> <url>.
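
To try this logic outside the plugin, the snippet can be wrapped in a plain function. This is a minimal sketch: the evaluatePath and evaluatePathSafe names are hypothetical, and the second variant is an optional, more defensive take that the tutorial does not require.

```javascript
// Same logic as the Evaluate snippet, wrapped in a function for testing.
function evaluatePath(text) {
  return text.split(" ")[1];
}

// A more defensive variant (an assumption, not required by the plugin):
// it trims the text and tolerates repeated whitespace between method and path.
function evaluatePathSafe(text) {
  const parts = text.trim().split(/\s+/);
  return parts.length > 1 ? parts[1] : "";
}

console.log(evaluatePath("GET /board")); // /board
console.log(evaluatePathSafe("  DELETE   /board/{boardName} ")); // /board/{boardName}
```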

Obtained path fixed through filter

Why did we not fix the Method field?

The Method field uses the same selector as the Path field but we didn't have to add an Evaluate field for it like the latter because the Crawler automatically assigns it the HTTP method it detects from the given string. For example, if the text is GET /board, then the operation will be assigned the method GET because GET is part of the text.
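
The plugin's actual detection logic is not shown in this tutorial, so the following JavaScript is only a guess at how such detection could work. The detectMethod function and the HTTP_METHODS list are hypothetical; the list holds the eight methods that an OpenAPI 3 Path Item can describe.

```javascript
const HTTP_METHODS = ["GET", "PUT", "POST", "DELETE", "OPTIONS", "HEAD", "PATCH", "TRACE"];

// Hypothetical detection: return the first whole word that is an HTTP method.
function detectMethod(text) {
  const words = text.toUpperCase().split(/[^A-Z]+/);
  return words.find((word) => HTTP_METHODS.includes(word)) || null;
}

console.log(detectMethod("GET /board")); // GET
console.log(detectMethod("/board")); // null
```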

b. Rules for Scanning Request Bodies

Our API contains a couple of POST and PUT endpoints requiring payload data. So for our next undertaking, we will populate the fields in the Request Body tab with CSS selectors (yet again) to identify the payload content required per endpoint.

To specify required payload content, API documentation pages typically provide tables or lists of accepted payload properties, or sample request payload content. The Docs to OpenAPI plugin is capable of crawling either. Luckily for us, the example API documentation provides the latter, which means fewer fields for us to configure.

Again, let's take a look at the HTML structure of our API documentation; this time, let's check for elements containing request body information:

<div class="api-block round-border blue-border">
    <!-- ... -->
    <div>
        <!-- ... -->
        <div class="indent r-indent">
            <h5 class="blue-text">Summary</h5>
            <p class="indent">Create comment</p>
            <h5 class="blue-text">Parameters</h5>
            <!-- ... -->
            <table class="bordered fixed b-margin-one-half JColResizer" id="JColResizer26">
                <thead>
                    <tr>
                        <th class="name" style="width: 128px;">Name</th>
                        <th class="location" style="width: 128px;">Located In</th>
                        <th class="desc" style="width: 256px;">Description</th>
                        <th class="required" style="width: 90px;">Required</th>
                        <th style="width: 616px;">Schema</th>
                    </tr>
                </thead>
                <tbody>
                    <tr>
                        <td>
                            <p class="lpad5px">threadId</p>
                        </td>
                        <td>
                            <p class="lpad5px">path</p>
                        </td>
                        <td>
                            <p class="lpad5px">The ID of the thread where the comment belongs to</p>
                        </td>
                        <td>
                            <p class="lpad5px">Yes</p>
                        </td>
                        <td>
                            <p class="lpad5px"></p>
                            <pre><span class="">integer</span></pre>
                            <p></p>
                        </td>
                    </tr>
                    <tr>
                        <td>
                            <p class="lpad5px">body</p>
                        </td>
                        <td>
                            <p class="lpad5px">body</p>
                        </td>
                        <td>
                            <p class="lpad5px">The comment to create</p>
                        </td>
                        <td>
                            <p class="lpad5px">Yes</p>
                        </td>
                        <td>
                            <p class="lpad5px"></p>
                            <pre>
                                <!-- JSON payload -->
                                <!-- Omitted for conciseness -->
                            </pre>
                            <p></p>
                        </td>
                    </tr>
                </tbody>
            </table>
            <!-- ... -->
        </div>
    </div>
</div>

We can easily infer two things:

  • (1) The wrapper we configured earlier in the Operation tab still contains the request body information we need; and
  • (2) Request parameters, regardless of their expected location in the request, are all described in a single table. For body parameters, sample request payloads can be found under the Schema column. Body parameters are also consistently placed in the last row of the table.

With this knowledge, let's populate the Request Body tab with the following values:

| Label | Value | Reason |
| --- | --- | --- |
| Request Body Url | | Request bodies are NOT described in a different page. |
| Operation Wrapper | div.api-block | |
| Operation Name | div > div.indent > p:first-of-type | |
| Request Body Payload | table:first-of-type > tbody > tr:last-child > td:last-child | Sample request payloads are located within the last (5th) cell from the left of the last row. |

Some sample request payloads also contain comments. We have to remove them to prevent errors. To do this:

  1. Click on the Request Body Payload label. The Evaluate and Filter fields will appear.
  2. Enter the following snippet in the Evaluate field:

    return text.replace(/(\/\*([\s\S]*?)\*\/)|(\/\/(.*)$)/gm, '');
    

Instead of using the Filter field, we used the Evaluate field because we need to use advanced regular expression options.
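
To check what this expression does before pasting it into the plugin, it can be run against a small payload. This is a minimal sketch: the stripComments name and the sample payload are hypothetical and only illustrate the two comment styles the expression targets.

```javascript
// Same expression as the Evaluate snippet, wrapped in a function for testing.
function stripComments(text) {
  return text.replace(/(\/\*([\s\S]*?)\*\/)|(\/\/(.*)$)/gm, '');
}

// Hypothetical sample payload with both comment styles.
const sample = [
  '{',
  '  "threadId": 1, // ID of the parent thread',
  '  /* the comment body */',
  '  "message": "Hello"',
  '}',
].join('\n');

const payload = JSON.parse(stripComments(sample));
console.log(payload); // { threadId: 1, message: 'Hello' }
```

One caveat: the expression also matches // inside string values, such as a URL in a sample payload, so payloads containing URLs may need a stricter pattern.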

Request bodies are mapped to operations using operation names

It is important to set the Operation Name field correctly because request bodies will be mapped to their respective API operation through the acquired operation name.

The CSS selector for sample payload data sometimes selects non-request payload data

Not all API operations require a body parameter. In such cases, as with the fetchBoards operation, the Request Body Payload selector does not point to an XML or JSON payload. When the plugin detects this, it tries to extract payload data using the selectors in the Fields section. If those are not provided either, the plugin assumes the API operation does not accept a request body.
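
The fallback behavior described here can be pictured with a short JavaScript sketch. Everything in it is an assumption made for illustration: the looksLikePayload and requestBodySource functions are hypothetical, and the plugin's real checks may differ.

```javascript
// Hypothetical check: does the selected text look like a JSON or XML payload?
function looksLikePayload(text) {
  const trimmed = text.trim();
  if (trimmed.startsWith("<") && trimmed.endsWith(">")) return true; // XML-like
  try {
    const parsed = JSON.parse(trimmed);
    return typeof parsed === "object" && parsed !== null; // object or array only
  } catch (err) {
    return false; // not valid JSON
  }
}

// Hypothetical decision flow mirroring the behavior described in the text.
function requestBodySource(payloadText, fieldSelectors) {
  if (looksLikePayload(payloadText)) return "sample payload";
  if (fieldSelectors.length > 0) return "field selectors";
  return "no request body";
}

console.log(requestBodySource('{"name": "general"}', [])); // sample payload
console.log(requestBodySource("integer", [])); // no request body
```

In the second call, the selector has landed on a parameter's type cell rather than a payload, which is the situation the note above describes.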

You can run the Crawler again to check how our changes have affected the generated OpenAPI schema. This time, choose to crawl All so that the configuration in both the Operation tab and the Request Body tab is taken into account.

c. Rules for Scanning Parameters

As we did in the Operation and Request Body tabs, we will populate the fields in the Parameter tab with CSS selectors, this time to identify which elements contain parameter information.

From the previous step, we already know that all parameters, regardless of their location, are described in the Parameters table. This means a mixture of path, query, and body parameters can be found in the table.

<div class="api-block round-border blue-border">
    <!-- ... -->
    <div>
        <!-- ... -->
        <div class="indent r-indent">
            <h5 class="blue-text">Summary</h5>
            <p class="indent">Create comment</p>
            <h5 class="blue-text">Parameters</h5>
            <!-- ... -->
            <table class="bordered fixed b-margin-one-half JColResizer" id="JColResizer26">
                <thead>
                    <tr>
                        <th class="name" style="width: 128px;">Name</th>
                        <th class="location" style="width: 128px;">Located In</th>
                        <th class="desc" style="width: 256px;">Description</th>
                        <th class="required" style="width: 90px;">Required</th>
                        <th style="width: 616px;">Schema</th>
                    </tr>
                </thead>
                <tbody>
                    <tr>
                        <td>
                            <p class="lpad5px">threadId</p>
                        </td>
                        <td>
                            <p class="lpad5px">path</p>
                        </td>
                        <td>
                            <p class="lpad5px">The ID of the thread where the comment belongs to</p>
                        </td>
                        <td>
                            <p class="lpad5px">Yes</p>
                        </td>
                        <td>
                            <p class="lpad5px"></p>
                            <pre><span class="">integer</span></pre>
                            <p></p>
                        </td>
                    </tr>
                    <tr>
                        <td>
                            <p class="lpad5px">body</p>
                        </td>
                        <td>
                            <p class="lpad5px">body</p>
                        </td>
                        <td>
                            <p class="lpad5px">The comment to create</p>
                        </td>
                        <td>
                            <p class="lpad5px">Yes</p>
                        </td>
                        <td>
                            <p class="lpad5px"></p>
                            <pre>
                                <!-- JSON payload -->
                                <!-- Omitted for conciseness -->
                            </pre>
                            <p></p>
                        </td>
                    </tr>
                </tbody>
            </table>
            <!-- ... -->
        </div>
    </div>
</div>

Each row describes one parameter. Since the Request Body tab is already responsible for crawling request payloads, we have to find a way to crawl only path and query parameters. And while we have the option to crawl parameter payloads, our API documentation does not provide sample payloads for path and query parameters; thus, we'll be crawling <td> elements to get the information we need per parameter.

Here's the configuration we came up with:

| Label | Value | Reason |
| --- | --- | --- |
| Parameter Url | | Parameters are NOT described in a different page. |
| Operation Wrapper | div.api-block | This element contains both the operation name and the Parameters table. |
| Operation Name | div > div.indent > p:first-of-type | |
| Parameter Payload | | No sample parameter payload. |

When will Operation Wrapper fields hold different values per tab?

In our example, the Operation Wrapper fields are identical across tabs. This is typically the case for most APIs. The values differ only when the operation wrappers do not wrap the request body, parameter, and response body information all together (which is the case for some API documentation, like eBay's).

Parameters are mapped to operations using operation names

It is important to set the Operation Name field correctly because parameters will be mapped to their respective API operation through the acquired operation name.

And for the Fields section:

| Label | Value | Reason |
| --- | --- | --- |
| Wrapper | table:first-of-type > tbody > tr | Parameters are described by row. |
| Name | td:first-child | The parameter name can be found in the first <td> element of the row. |
| Description | td:nth-child(3) | Parameter description is in the third column. |
| Location | td:nth-child(2) | Parameter location is in the second column. |
| Required | td:nth-child(4) | The fourth column indicates whether or not a parameter is required. |
| Allow Empty Value | | No indicator. |
| Type | td:last-child | Parameter type is defined in the last column. |
| Array | | None of the API's parameters are arrays. |

We'll use the Exclude field to exclude body parameters:

  1. Click the Location label. The Evaluate, Filter, and Exclude fields will show up.
  2. Enable JavaScript.
  3. In the Exclude field, enter the following snippet:

    return text === 'body';
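
To illustrate how the Fields selectors and the Exclude rule work together, here is a minimal JavaScript sketch that operates on the cell text of the Parameters table. The parameterRows data is taken from the snippet in this section (with the elided payload left as a placeholder), while the toParameters function and its column-to-field mapping are hypothetical and not the plugin's own code.

```javascript
// Cell text of each <tr> in the Parameters table, left to right.
const parameterRows = [
  ["threadId", "path", "The ID of the thread where the comment belongs to", "Yes", "integer"],
  ["body", "body", "The comment to create", "Yes", "{ ... }"], // payload elided
];

// Same test as the Exclude snippet: true means "skip this row".
const exclude = (text) => text === "body";

// Hypothetical mapping from table columns to OpenAPI Parameter Object fields.
function toParameters(tableRows) {
  return tableRows
    .filter((cells) => !exclude(cells[1])) // Location: td:nth-child(2)
    .map((cells) => ({
      name: cells[0], // td:first-child
      in: cells[1], // td:nth-child(2)
      description: cells[2], // td:nth-child(3)
      required: cells[3] === "Yes", // td:nth-child(4)
      schema: { type: cells[cells.length - 1] }, // td:last-child
    }));
}

console.log(toParameters(parameterRows));
```

Only the threadId row survives the filter, so the body row is left for the Request Body tab to handle.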
    

Now that we're done with configuring the fields in the Parameter tab, let's run the Crawler again and see the results. This time, let's crawl All.

Crawling the page using all existing configuration

After crawling, you should see an increase in the number of parameters retrieved. You can check all of the operations, request bodies, and parameters retrieved by going to the OpenAPI tab > Operations tab. This tab also lets you edit crawled data, either to correct incorrectly extracted data or to add information that was not crawled.

d. Rules for Scanning Responses

Next, let's add rules for retrieving expected endpoint responses. We will do this through the Response tab under the Crawler tab.

The Request Body and Response tabs' configurations are fairly similar

The only difference between the two is that the former contains rules for retrieving request bodies, while the latter contains rules for retrieving responses.

<div class="api-block round-border blue-border">
    <!-- ... -->
    <div>
        <!-- ... -->
        <div class="indent r-indent">
            <h5 class="blue-text">Summary</h5>
            <p class="indent">Fetch boards</p>
            <!-- ... -->
            <h5 class="blue-text">Responses</h5>
            <!-- ... -->
            <table class="bordered fixed b-margin-one-half JColResizer" id="JColResizer1">
                <thead>
                    <tr>
                        <th class="status" style="width: 64px;">Code</th>
                        <th class="desc" style="width: 256px;">Description</th>
                        <th style="width: 898px;">Schema</th>
                    </tr>
                </thead>
                <tbody>
                    <tr>
                        <td class="">
                            <p class="lpad5px green-text darken-2">200</p>
                        </td>
                        <td class="">
                            <p class="lpad5px">OK</p>
                        </td>
                        <td class="">
                            <p class="lpad5px"></p>
                            <pre>
                                <!-- JSON payload -->
                                <!-- Omitted for conciseness -->
                            </pre>
                            <p></p>
                        </td>
                    </tr>
                    <tr>
                        <td class="">
                            <p class="lpad5px">default</p>
                        </td>
                        <td class="">
                            <p class="lpad5px">Unexpected error</p>
                        </td>
                        <td class="">
                            <p class="lpad5px"></p>
                            <pre>
                                <!-- JSON payload -->
                                <!-- Omitted for conciseness -->
                            </pre>
                            <p></p>
                        </td>
                    </tr>
                </tbody>
            </table>
            <!-- ... -->
        </div>
    </div>
</div>

As with request bodies and parameters, we can either use a sample response body or crawl individual response body field definitions to infer response body information. In our case, since there is a sample JSON response payload per endpoint, we'll use the former method. The latter method also works, but it would mean filling in more fields in the tab's Fields section.

| Label | Value | Reason |
| --- | --- | --- |
| Response Url | | Responses are NOT described in a different page. |
| Operation Wrapper | div.api-block | |
| Operation Name | div > div.indent > p:first-of-type | |
| Status Code | table:last-of-type > tbody td:first-child | Status codes are located in the second table's first column. |
| Response Description | table:last-of-type > tbody td:nth-child(2) | Response descriptions can be found in the second table's second column. |
| Response Payload | table:last-of-type > tbody td:last-child | Sample response payloads can be found in the second table's last column. |

Responses are mapped to operations using operation names

It is important to set the Operation Name field correctly because responses will be mapped to their respective API operation through the acquired operation name.
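
For a rough idea of where the three response selectors end up, the following JavaScript sketch builds an OpenAPI 3 Responses Object from the cell text of a Responses table. The codes and descriptions come from the snippet in this section; the sample payloads, the toResponses function, and the choice of the application/json media type are assumptions made for illustration, and the plugin may represent payloads differently (for example, as inferred schemas).

```javascript
// Cell text of each row in the Responses table: code, description, sample payload.
// The payloads are hypothetical stand-ins for the samples omitted from the snippet.
const responseRows = [
  ["200", "OK", '[{"name": "general"}]'],
  ["default", "Unexpected error", '{"message": "Something went wrong"}'],
];

// Hypothetical builder for an OpenAPI 3 Responses Object keyed by status code.
function toResponses(tableRows) {
  const responses = {};
  for (const [code, description, payload] of tableRows) {
    responses[code] = {
      description, // required by the OpenAPI Response Object
      content: { "application/json": { example: JSON.parse(payload) } },
    };
  }
  return responses;
}

console.log(JSON.stringify(toResponses(responseRows), null, 2));
```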

After populating the Response tab, you can run the Crawler again to extract the additional data. If you have existing operations, responses are automatically added to the matching operation based on the operation name or ID. To view or edit crawled response data, go to the OpenAPI tab.

Crawled `createNewBoard` operation response

4. Exporting Your Configuration

Once you have all the API data required to generate the schema, you can export your configuration and/or the resulting OpenAPI schema.


  1. All operations are written in the same format; hence, looking at just one of these operations would suffice.