Creating Your First OpenAPI Schema
You already know it's possible to create an OpenAPI-compliant schema with the Docs to OpenAPI plugin. But as a beginner, where should you start, and how? In this step-by-step tutorial, we will guide you through generating your first OpenAPI schema from an example API documentation page. You will learn how to:
- Configure and run the Crawler
- Manually add schema information and edit crawled data
- Export your plugin configuration
General Procedure
At a minimum, using the plugin to create an OpenAPI schema consists of the following steps:
- Launch the Docs to OpenAPI plugin.
- Enter the API's general information, list of servers, and security configuration.
- Enter the details of the API's operations, parameters, request bodies, and responses either manually or via the Crawler.
- From all data gathered using the previous steps, export the resulting OpenAPI schema and/or the plugin configuration file.
Know the API
It's important to have at least some understanding of the API you are crawling. This will make the process of mapping values to fields required by the plugin easier.
Walkthrough
For our tutorial, we will be generating an OpenAPI schema for the fictional Forumock API which has 19 operations. While simple, this API features common use cases like the presence of path, query, and body parameters; tags; security; and the presence (or absence) of response payloads.
Want the completed plugin configuration?
Download the completed plugin configuration for our example API here. Import this file to load the plugin configuration.
1. Launching the Plugin
Go to the Forumock API documentation page and from there, launch Docs to OpenAPI.
2. Configuring the API's Servers
Configuration for general information and security schemes not covered
Long story short, we omitted both topics from this tutorial to keep it concise.
Configuring the API's general information is not required (but highly recommended). If it is omitted, the plugin supplies stubs for the resulting OpenAPI schema's general information.
Our example API also does not require security authentication, so security configuration was skipped. If your API has secured endpoints, security scheme configuration would be necessary.
Next, we will "tell" the plugin which server(s) host(s) the API operations by providing the API's base URL. To do this:
1. Open the OpenAPI tab.
2. Open the Servers tab underneath.

   What's the Servers tab for?
   The Servers tab is used to configure the base URL of your API. It presents fields for flexibly setting the URL to accommodate different server environments.

3. Click the circular green button with the plus sign.
4. Enter the following details (taken from the documentation page):
   - Url: `http://api.forumock.io/v1`
   - Description: Latest Forumock REST API.
   - Check Apply to all

   Use path variables in URLs for increased flexibility
   Most APIs include a version number in the base path so that multiple versions of the API can exist in parallel. This is true for the API we're crawling right now. In `https://api.forumock.io/v1`, the version number is `v1`. To make it easier for us to shift versions, we can create a path variable for the version path.
   - Edit the Url field and set the base path to `https://api.forumock.io/{version}`.
   - Save the URL.
   - From the Variables dropdown below, select `version`.
   - Since we're crawling `v1` of the API, set the Default Value of this variable to `v1`.

5. Save your changes.
The information entered in the Servers tab is used to construct the resulting OpenAPI schema's Server Object. The base URL provided by the Server Object is prepended to each operation's (relative) path to build the full URL where the operation can be accessed. It's possible to enter more than one base URL; simply repeat steps #3 to #5.
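For reference, here is a rough sketch of what the resulting Server Object could look like, written as a JavaScript object. The `url`, `description`, and `variables` keys follow the OpenAPI 3 Server Object. The `resolveServerUrl` and `fullUrl` helpers and the `v2` override are hypothetical; they only illustrate how a base URL and a relative path combine, not how the plugin does it internally.

```javascript
// Server Object shape as defined by OpenAPI 3: a URL template plus variables.
const server = {
  url: "https://api.forumock.io/{version}",
  description: "Latest Forumock REST API.",
  variables: {
    version: { default: "v1" },
  },
};

// Hypothetical helper: replace each {name} in the template with an override
// (when given) or with the variable's default value.
function resolveServerUrl(serverObject, overrides = {}) {
  return serverObject.url.replace(/\{([^}]+)\}/g, (match, name) => {
    if (name in overrides) return overrides[name];
    const variable = (serverObject.variables || {})[name];
    return variable ? variable.default : match;
  });
}

// Hypothetical helper: prepend the resolved base URL to a relative path.
function fullUrl(serverObject, path, overrides) {
  return resolveServerUrl(serverObject, overrides).replace(/\/+$/, "") + path;
}

console.log(fullUrl(server, "/board")); // https://api.forumock.io/v1/board
console.log(fullUrl(server, "/board", { version: "v2" })); // https://api.forumock.io/v2/board
```

Keeping the version in a variable means that switching versions only requires changing one default value instead of editing every server URL.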
3. Using the Crawler
We will be using the Crawler to fetch the list of API operations, their request bodies, parameters, and responses. The Crawler allows for a quicker and easier way to populate API operations, as opposed to manually entering API operation information via the OpenAPI tab.
a. Rules for Scanning API Operations
To identify the operations of our API, we will populate the fields under the Operation tab with CSS selectors. CSS selectors allow the Crawler to pinpoint the elements it needs to scan for data. Only fields marked with an asterisk are required. However, it is best to fill in as much information as you can so that the generated OpenAPI schema is complete.
What if CSS selectors aren't enough?
Sometimes data contained by selected elements needs further processing, either to remove unwanted data, or transform data so that it fulfills the requirements of the OpenAPI specification. Docs to OpenAPI supports intermediary data transformation using the Filter, Evaluate, and/or Exclude features.
But before we populate these fields, let's take a look at the documentation page's HTML structure by inspecting the page in Chrome.
    <div class="section">
      <h5>Paths</h5>
      <div class="indent">
        <div class="indent">
          <div class="row pointer blue-text text-lighten-1"> <span>/board</span></div> <!-- Path -->
        </div>
        <div>
          <div class="api-block round-border blue-border"> <!-- Operation wrapper -->
            <!-- Operation details -->
          </div>
          <div class="api-block round-border blue-border"> <!-- Operation wrapper -->
            <!-- Operation details -->
          </div>
          <div class="api-block round-border blue-border"> <!-- Operation wrapper -->
            <!-- Operation details -->
          </div>
        </div>
      </div>
      <div class="indent">
        <div class="indent">
          <div class="row pointer blue-text text-lighten-1"> <span>/board/{boardName}</span></div> <!-- Path -->
        </div>
        <div>
          <div class="api-block round-border blue-border"> <!-- Operation wrapper -->
            <!-- Operation details -->
          </div>
          <div class="api-block round-border blue-border"> <!-- Operation wrapper -->
            <!-- Operation details -->
          </div>
        </div>
      </div>
      <!-- More paths -->
    </div>
In most cases, like in our example, the HTML structure will be repetitive for similar data. This repetitiveness is what makes our documentation ideal for crawling. As you can see from the condensed snippet above, each API operation is assigned its own `<div class="api-block round-border blue-border">` element.
These operation wrappers [1] contain the information we need to enter in our specification. A closer inspection of their contents reveals that the information we need for the Operation tab is located in the first few elements:
    <div class="api-block round-border blue-border">
      <div class="api-header lighten-5 round-border pointer blue">
        <div class="api-title light white-text blue">GET /board</div> <!-- HTTP method and path -->
      </div>
      <div>
        <div class="tag-container"> <!-- Tags -->
          <div class="tag right no-overflow pointer">board</div>
          <div class="tag right no-overflow pointer">GET</div>
        </div>
        <div class="indent r-indent">
          <h5 class="blue-text">Summary</h5>
          <p class="indent">Fetch boards</p>
          <h5 class="blue-text">Description</h5>
          <p class="indent">Fetch the list of existing boards. Results are paginated and can be sorted.</p>
          <!-- ... -->
        </div>
      </div>
    </div>
Now that we know where to get the values for the Name, Description, Method, Path, and Tag fields, let's go ahead and populate the Operation tab.
Use the Inspect Page feature to quickly select elements
For every field under the Crawler tab, there is a corresponding Inspect Page button on the right. This button allows you to select an element straight from the rendered web page and set that element's selector as the field value.
Label | Selector |
---|---|
Operation Url | |
Wrapper | div.api-block |
Name | div > div.indent > p:first-of-type |
Description | div > div.indent > p:nth-of-type(2) |
Method | div.api-title |
Path | div.api-title |
Tags | div.tag-container > div.tag |
The wrapper's selector is implicitly prepended
The plugin assumes the elements containing the operation name, description, method, path, and tags are under the wrapper element. Hence, it automatically prepends the wrapper's selector to the other selectors.
So if our wrapper is `div.api-block` and the operation name can be found at `div > div.indent > p:first-of-type`, the plugin actually looks for the operation name at `div.api-block div > div.indent > p:first-of-type`. The plugin always assumes the other selectors are relative to the wrapper's selector.
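The prepending described above can be pictured as a simple string join. The sketch below is only an illustration; `scopedSelector` is a hypothetical name, and the plugin's real implementation may differ.

```javascript
// Hypothetical illustration: the wrapper selector and a field selector are
// joined with a descendant combinator (a single space).
function scopedSelector(wrapperSelector, fieldSelector) {
  return `${wrapperSelector.trim()} ${fieldSelector.trim()}`;
}

const wrapper = "div.api-block";

console.log(scopedSelector(wrapper, "div > div.indent > p:first-of-type"));
// div.api-block div > div.indent > p:first-of-type

console.log(scopedSelector(wrapper, "div.tag-container > div.tag"));
// div.api-block div.tag-container > div.tag
```

Because of this scoping, field selectors should be written as if the wrapper element were the starting point, not the page root.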
Since we seem to have the Operation tab covered, let's take our configuration for a test drive by running the Crawler. We've only populated rules for fetching general information about operations, so just crawling operations would suffice.
At this point, the Crawler would be able to fetch all 19 API operations. It will also be able to detect the presence of 14 path parameters from the provided URLs.
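Path parameters are the `{...}` placeholders inside a path such as `/board/{boardName}`. As a minimal sketch of how such placeholders can be pulled out of a path (the function name `pathParameterNames` is hypothetical, and this is not necessarily how the Crawler does it):

```javascript
// Collect the names of all {placeholders} found in a path template.
function pathParameterNames(path) {
  return Array.from(path.matchAll(/\{([^}]+)\}/g), (match) => match[1]);
}

console.log(pathParameterNames("/board")); // []
console.log(pathParameterNames("/board/{boardName}")); // [ 'boardName' ]
```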
Everything's looking good... until we take a peek at the OpenAPI tab > Operations tab. Yikes! Our Path contains the HTTP method of the operation.
What's the OpenAPI tab for?
The OpenAPI tab, like the Crawler tab, is comprised of multiple tabs. These tabs contain crawled or manually added API data, which will eventually be used to generate the OpenAPI schema.
This happened because our selector for the Path field is `div.api-title`, and the text of every element it selects is in the format `<method> <url>`. Fortunately, all this takes is a simple fix.
- Go back to the Crawler tab > Operation tab.
- Click the Path label. The Evaluate, Filter, and Exclude fields should appear underneath it.
- Set the Evaluate field so that it contains this snippet:

      return text.split(" ")[1];

Simply put, this instructs the plugin to get the `<url>` part from the selector-obtained text, which we know is in the format `<method> <url>`.
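To try the snippet outside the plugin, you can wrap it in a function. This sketch assumes the Evaluate snippet behaves like the body of a function that receives the selected element's text as `text`. That is an assumption based on how the snippet is written, not a documented guarantee. The second, whitespace-tolerant variant is also just a suggestion.

```javascript
// Assumption: the Evaluate snippet acts as a function body with `text` in scope.
const evaluatePath = new Function("text", 'return text.split(" ")[1];');

console.log(evaluatePath("GET /board")); // /board

// A slightly more defensive variant that tolerates extra whitespace.
const evaluatePathSafe = new Function(
  "text",
  "return text.trim().split(/\\s+/)[1];"
);

console.log(evaluatePathSafe("  GET   /board ")); // /board
```

The original snippet is enough when the title text is always separated by exactly one space, which is the case in our example page.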
Why did we not fix the Method field?
The Method field uses the same selector as the Path field, but we didn't have to add an Evaluate snippet for it because the Crawler automatically assigns the HTTP method it detects in the given string. For example, if the text is `GET /board`, the operation will be assigned the method `GET` because `GET` is part of the text.
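The detection described above could work along these lines. This is a hypothetical sketch of the idea (find a known HTTP method name inside the text), not the Crawler's actual code; `detectMethod` and `HTTP_METHODS` are illustrative names.

```javascript
const HTTP_METHODS = ["GET", "PUT", "POST", "DELETE", "OPTIONS", "HEAD", "PATCH", "TRACE"];

// Hypothetical sketch: return the leftmost HTTP method that appears as a
// whole word in the text, or null when none is present.
function detectMethod(text) {
  const pattern = new RegExp(`\\b(${HTTP_METHODS.join("|")})\\b`, "i");
  const match = text.match(pattern);
  return match ? match[1].toUpperCase() : null;
}

console.log(detectMethod("GET /board")); // GET
console.log(detectMethod("/board")); // null
```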
b. Rules for Scanning Request Bodies
Our API contains a couple of `POST` and `PUT` endpoints requiring payload data. So for our next undertaking, we will populate the fields in the Request Body tab with CSS selectors (yet again) to identify the payload content required per endpoint.
To specify required payload content, API documentation pages typically provide tables or lists of accepted payload properties, or sample request payload content. The Docs to OpenAPI plugin is capable of crawling either. Lucky for us, the example API documentation provides the latter, which effectively means fewer fields for us to configure.
Again, let's take a look at the HTML structure of our API documentation; this time, let's check for elements containing request body information:
    <div class="api-block round-border blue-border">
      <!-- ... -->
      <div>
        <!-- ... -->
        <div class="indent r-indent">
          <h5 class="blue-text">Summary</h5>
          <p class="indent">Create comment</p>
          <h5 class="blue-text">Parameters</h5>
          <!-- ... -->
          <table class="bordered fixed b-margin-one-half JColResizer" id="JColResizer26">
            <thead>
              <tr>
                <th class="name" style="width: 128px;">Name</th>
                <th class="location" style="width: 128px;">Located In</th>
                <th class="desc" style="width: 256px;">Description</th>
                <th class="required" style="width: 90px;">Required</th>
                <th style="width: 616px;">Schema</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>
                  <p class="lpad5px">threadId</p>
                </td>
                <td>
                  <p class="lpad5px">path</p>
                </td>
                <td>
                  <p class="lpad5px">The ID of the thread where the comment belongs to</p>
                </td>
                <td>
                  <p class="lpad5px">Yes</p>
                </td>
                <td>
                  <p class="lpad5px"></p>
                  <pre><span class="">integer</span></pre>
                  <p></p>
                </td>
              </tr>
              <tr>
                <td>
                  <p class="lpad5px">body</p>
                </td>
                <td>
                  <p class="lpad5px">body</p>
                </td>
                <td>
                  <p class="lpad5px">The comment to create</p>
                </td>
                <td>
                  <p class="lpad5px">Yes</p>
                </td>
                <td>
                  <p class="lpad5px"></p>
                  <pre>
                    <!-- JSON payload -->
                    <!-- Omitted for conciseness -->
                  </pre>
                  <p></p>
                </td>
              </tr>
            </tbody>
          </table>
          <!-- ... -->
        </div>
      </div>
    </div>
We can easily infer two things:
- (1) The wrapper we configured earlier in the Operation tab still contains the request body information we need; and
- (2) Request parameters, regardless of their expected location in the request, are all described together in a single table. For body parameters, sample request payloads can be found under the Schema column. Body parameters are also consistently placed in the last row of the table.
With this knowledge, let's populate the Request Body tab with the following values:
Label | Value | Reason |
---|---|---|
Request Body Url | | Request bodies are NOT described in a different page. |
Operation Wrapper | div.api-block | |
Operation Name | div > div.indent > p:first-of-type | |
Request Body Payload | table:first-of-type > tbody > tr:last-child > td:last-child | Sample request payloads are located within the last (5th) cell from the left of the last row. |
Some sample request payloads also contain comments. We have to remove them to prevent errors. To do this:
- Click on the Request Body Payload label. The Evaluate and Filter fields will appear.
- Enter the following snippet in the Evaluate field:

      return text.replace(/(\/\*([\s\S]*?)\*\/)|(\/\/(.*)$)/gm, '');
Instead of using the Filter field, we used the Evaluate field because we need to use advanced regular expression options.
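To see what this regular expression does, here is a small sketch that applies it to a made-up payload. The `threadId` and `message` fields and both comments are hypothetical sample data, and `stripComments` is just a wrapper around the snippet above.

```javascript
// Same regular expression as the Evaluate snippet: removes /* ... */ blocks
// and // comments that run to the end of a line.
const stripComments = (text) =>
  text.replace(/(\/\*([\s\S]*?)\*\/)|(\/\/(.*)$)/gm, "");

// Hypothetical sample payload containing both comment styles.
const sample = [
  "{",
  '  "threadId": 1, // ID of the parent thread',
  "  /* the comment text */",
  '  "message": "Hello"',
  "}",
].join("\n");

const cleaned = stripComments(sample);
console.log(JSON.parse(cleaned)); // { threadId: 1, message: 'Hello' }
```

One caveat worth knowing: the `//` branch also matches `//` inside string values, such as URLs, so payloads that contain URLs may need a stricter pattern.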
Request bodies are mapped to operations using operation names
It is important to set the Operation Name field correctly because request bodies will be mapped to their respective API operation through the acquired operation name.
The CSS selector for sample payload data sometimes selects non-request payload data
Not all API operations require a body parameter. In such cases, like the `fetchBoards` operation, once the plugin detects that the Request Body Payload selector does not point to an XML or JSON payload, it tries to extract payload data using the selectors in the Fields section. If those are not provided, the plugin assumes the API operation does not accept request bodies.
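As a rough illustration of that fallback, the check below decides whether a piece of selected text looks like a JSON or XML payload. It is a hypothetical sketch (the name `looksLikePayload` and the crude XML test are assumptions), not the plugin's real detection logic.

```javascript
// Hypothetical check: does the selected text look like a JSON or XML payload?
function looksLikePayload(text) {
  const trimmed = text.trim();
  if (trimmed === "") return false;
  // Crude XML check (assumption): the text starts with a tag.
  if (trimmed.startsWith("<")) return true;
  try {
    const parsed = JSON.parse(trimmed);
    // Only objects and arrays count as payloads; bare values do not.
    return typeof parsed === "object" && parsed !== null;
  } catch {
    return false;
  }
}

console.log(looksLikePayload('{"message": "Hello"}')); // true
console.log(looksLikePayload("integer")); // false
```

In our example page, the last cell of a table without a body parameter holds a plain type name such as `integer`, which a check like this would reject.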
You can run the Crawler again to check how our changes have affected the generated OpenAPI schema. This time, choose to crawl All so that our configuration in both the Operation tab and the Request Body tab is accounted for in the process.
c. Rules for Scanning Parameters
As we did in the Operation and Request Body tabs, we will populate the fields in the Parameter tab with CSS selectors; this time, to identify which elements contain parameter information.
From the previous step, we already know that all parameters, regardless of their location, are described in the Parameters table. This means a mixture of path, query, and body parameters can be found in the table.
Each row describes one parameter. Since the Request Body tab is already responsible for crawling request payloads, we have to find a way to crawl only path and query parameters. And while we have the option to crawl parameter payloads, our API documentation does not provide sample payloads for path and query parameters; thus, we'll be crawling `<td>` elements to get the information we need per parameter.
Here's the configuration we came up with:
Label | Value | Reason |
---|---|---|
Parameter Url | | Parameters are NOT described in a different page. |
Operation Wrapper | div.api-block | This element contains both the operation name and the Parameters table. |
Operation Name | div > div.indent > p:first-of-type | |
Parameter Payload | | No sample parameter payload. |
When will Operation Wrapper fields hold different values per tab?
In our example, Operation Wrapper fields are identical across tabs. This is typically the case for most APIs. These values only differ when the operation wrappers do not wrap the request body, parameter, and response body information all together (which is the case in some API documentation, like eBay's).
Parameters are mapped to operations using operation names
It is important to set the Operation Name field correctly because parameters will be mapped to their respective API operation through the acquired operation name.
And for the Fields section:
Label | Value | Reason |
---|---|---|
Wrapper | table:first-of-type > tbody > tr | Parameters are described by row. |
Name | td:first-child | The parameter name can be found in the first <td> element of the row. |
Description | td:nth-child(3) | Parameter description is in the third column. |
Location | td:nth-child(2) | Parameter location is in the second column. |
Required | td:nth-child(4) | The fourth column indicates whether or not a parameter is required. |
Allow Empty Value | | No indicator. |
Type | td:last-child | Parameter type is defined in the last column. |
Array | | None of the API's parameters are arrays. |
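To make the mapping in the table above concrete, here is a sketch of how one crawled row could end up as an OpenAPI 3 Parameter Object. The row values come from the sample table shown earlier; the `row` structure and the `toParameterObject` helper are hypothetical and only illustrate the field-to-property mapping.

```javascript
// Hypothetical representation of one crawled table row, keyed by field label.
const row = {
  name: "threadId",
  location: "path",
  description: "The ID of the thread where the comment belongs to",
  required: "Yes",
  type: "integer",
};

// Convert a crawled row into an OpenAPI 3 Parameter Object.
function toParameterObject(crawledRow) {
  return {
    name: crawledRow.name,
    in: crawledRow.location,
    description: crawledRow.description,
    // The Required column holds text such as "Yes", so convert it to a boolean.
    required: crawledRow.required.trim().toLowerCase() === "yes",
    schema: { type: crawledRow.type },
  };
}

console.log(toParameterObject(row));
```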
We'll use the Exclude field to exclude body parameters:
- Click the Location label. The Evaluate, Filter, and Exclude fields will show up.
- Enable JavaScript.
- In the Exclude field, enter the following snippet:

      return text === 'body';
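The effect of this Exclude rule can be sketched as a filter over the crawled location values. This assumes the snippet acts as a function body that receives `text`, and that a row is skipped when it returns `true`. Both are assumptions inferred from the snippet, not documented behavior.

```javascript
// Assumption: the Exclude snippet is a function body with `text` in scope,
// and returning true means "skip this row".
const exclude = new Function("text", "return text === 'body';");

const locations = ["path", "query", "body"];
const kept = locations.filter((location) => !exclude(location));

console.log(kept); // [ 'path', 'query' ]
```

Note that the comparison is strict, so the selected text must be exactly `body` for the row to be excluded; surrounding whitespace or different casing would not match.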
Now that we're done with configuring the fields in the Parameter tab, let's run the Crawler again and see the results. This time, let's crawl All.
After crawling, you should see an increase in the number of parameters retrieved. You can check all of the operations, request bodies, and parameters retrieved by going to the OpenAPI tab > Operations tab. This tab also allows you to edit crawled data, either to correct incorrectly extracted data or to add information that was not crawled.
d. Rules for Scanning Responses
Next, let's add rules for retrieving expected endpoint responses. We will do this through the Response tab under the Crawler tab.
The Request Body and Response tabs are configured in much the same way
The only difference between the two is that the former contains rules for retrieving request bodies, while the latter contains rules for retrieving responses.
    <div class="api-block round-border blue-border">
      <!-- ... -->
      <div>
        <!-- ... -->
        <div class="indent r-indent">
          <h5 class="blue-text">Summary</h5>
          <p class="indent">Fetch boards</p>
          <!-- ... -->
          <h5 class="blue-text">Responses</h5>
          <!-- ... -->
          <table class="bordered fixed b-margin-one-half JColResizer" id="JColResizer1">
            <thead>
              <tr>
                <th class="status" style="width: 64px;">Code</th>
                <th class="desc" style="width: 256px;">Description</th>
                <th style="width: 898px;">Schema</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td class="">
                  <p class="lpad5px green-text darken-2">200</p>
                </td>
                <td class="">
                  <p class="lpad5px">OK</p>
                </td>
                <td class="">
                  <p class="lpad5px"></p>
                  <pre>
                    <!-- JSON payload -->
                    <!-- Omitted for conciseness -->
                  </pre>
                  <p></p>
                </td>
              </tr>
              <tr>
                <td class="">
                  <p class="lpad5px">default</p>
                </td>
                <td class="">
                  <p class="lpad5px">Unexpected error</p>
                </td>
                <td class="">
                  <p class="lpad5px"></p>
                  <pre>
                    <!-- JSON payload -->
                    <!-- Omitted for conciseness -->
                  </pre>
                  <p></p>
                </td>
              </tr>
            </tbody>
          </table>
          <!-- ... -->
        </div>
      </div>
    </div>
As with request bodies and parameters, we can either use a sample response body or crawl individual response body field definitions to infer response body information. In our case, since there is a sample JSON response payload per endpoint, we'll use the former method. The latter method also works, but it would mean filling in more fields in the tab's Fields section.
Label | Value | Reason |
---|---|---|
Response Url | | Responses are NOT described in a different page. |
Operation Wrapper | div.api-block | |
Operation Name | div > div.indent > p:first-of-type | |
Status Code | table:last-of-type > tbody td:first-child | Status codes are located in the second table's first column. |
Response Description | table:last-of-type > tbody td:nth-child(2) | Response descriptions can be found in the second table's second column. |
Response Payload | table:last-of-type > tbody td:last-child | Sample response payloads can be found in the second table's last column. |
Responses are mapped to operations using operation names
It is important to set the Operation Name field correctly because responses will be mapped to their respective API operation through the acquired operation name.
After populating the Response tab, you can now run the Crawler to extract additional data. If you have existing operations, responses would automatically be added to the operation based on the operation name or ID. To view or edit crawled response data, go to the OpenAPI tab.
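To picture where the crawled response data ends up, here is a sketch that turns response rows into an OpenAPI 3 Responses Object. The status codes and descriptions match the sample table above; the payload contents, the `rows` structure, and the `toResponsesObject` helper are hypothetical. Storing the sample as an `example` is a simplification; the plugin may derive a schema from it instead.

```javascript
// Hypothetical crawled rows for one operation. The payloads are made up,
// since the sample payloads are omitted in the page snippet.
const rows = [
  { status: "200", description: "OK", payload: '{"boards": []}' },
  { status: "default", description: "Unexpected error", payload: '{"message": "Oops"}' },
];

// Build an OpenAPI 3 Responses Object keyed by status code.
function toResponsesObject(crawledRows) {
  const responses = {};
  for (const { status, description, payload } of crawledRows) {
    const response = { description };
    // Only attach content when a sample payload was actually crawled.
    if (payload && payload.trim() !== "") {
      response.content = {
        "application/json": { example: JSON.parse(payload) },
      };
    }
    responses[status] = response;
  }
  return responses;
}

console.log(JSON.stringify(toResponsesObject(rows), null, 2));
```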
4. Exporting Your Configuration
Once you have all API data required to generate the schema, you may now export your configuration and/or resulting OpenAPI schema.
[1] All operations are written in the same format; hence, looking at just one of these operations would suffice.