Custom Crawling Strategy
Crawling strategies dictate how the plugin executes crawls. Default strategies are provided for your convenience; however, there will be times when the out-of-the-box behavior does not suffice. In those cases, you can write your own crawling strategies with custom logic.
Prerequisite to creating your own custom crawling strategy
You should know how to code in JavaScript (preferably ES6) as this will be the language your strategy will be written in.
Toggling Custom Crawling Behavior
To configure a custom crawling strategy:
- Click the Toggle strategy and scripts button along the Crawler toolbar.
- Select the tab whose strategy you want to customize, and then scroll down to the bottom.
- Provide the code for your custom crawling strategy in the Strategy tab.
What's the Script tab for?
The Script tab is where you can define reusable functions and variables for your strategy code. This means any of your strategies, regardless of which tab they belong to, can use components declared there.
Writing Your Own Custom Crawling Strategy
Each tab's strategy is distinct; they all serve different purposes. When writing your own crawling strategy, you must write it in a way that accomplishes that purpose. For the Operation tab, the strategy's goal is to register operations; the Request Body tab, to register supported request bodies; the Parameter tab, supported parameters; and lastly, the Response tab, to register expected responses.
There are methods reserved for registering an operation, request body, parameter, or response. They are the addOperation(Operation), setRequestBody(String, RequestBody), addParameter(String, Parameter), and addResponse(String, Response) methods, respectively. Your crawling strategy should call one of these methods (which one depends on the context) when it finds an operation, request body, parameter, or response in the document.
However, these registration methods require special arguments. For example, the addOperation method requires an argument of type Operation. To build these special objects, you can use the Factory class's methods.
The Factory methods, however, require operation, request body, parameter, or response data to be passed as arguments. To get this data from the documentation page, you can use the Crawler class's methods. To further help you make transformations, the Crawler also exposes several other utility methods.
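The flow described above (read data with Crawler, build a model with Factory, then register it) can be sketched for the Operation tab as follows. This is a minimal sketch, not the plugin's actual behavior: the stand-in objects at the top exist only so the snippet runs outside the plugin, and the selectors (.endpoint, .name, .summary, .signature), the page structure, and the verb-matching logic are all assumptions made for illustration.

```javascript
// --- Hypothetical stand-ins so the sketch runs outside the plugin. ---
// Inside the plugin, targetEl, Crawler, Factory, and addOperation are provided.
const registeredOperations = [];
const addOperation = operation => registeredOperations.push(operation);

const Factory = {
    createOperation: (operationId, description, method, path) =>
        ({ operationId, description, method, path })
};

// Fake "document": each endpoint maps an assumed selector to its text.
const targetEl = {
    '.endpoint': [
        { '.name': 'getUser', '.summary': 'Fetch a user.', '.signature': 'GET /user/{id}' },
        { '.name': 'createUser', '.summary': 'Create a user.', '.signature': 'POST /user' }
    ]
};

const Crawler = {
    getElements: (element, selector) => element[selector] || [],
    getElementText: (element, selector) => element[selector] || '',
    // Assumed behavior: pick the first HTTP verb found in the text.
    findOperationMethod: text => {
        const match = text.match(/\b(GET|POST|PUT|PATCH|DELETE)\b/i);
        return match ? match[1].toUpperCase() : '';
    }
};

// --- Strategy code: roughly what would go in the Operation tab. ---
for (const endpointEl of Crawler.getElements(targetEl, '.endpoint')) {
    const signature = Crawler.getElementText(endpointEl, '.signature');
    const method = Crawler.findOperationMethod(signature);
    const path = signature.slice(signature.indexOf('/'));
    const operationId = Crawler.getElementText(endpointEl, '.name');
    const description = Crawler.getElementText(endpointEl, '.summary');

    addOperation(Factory.createOperation(operationId, description, method, path));
}
```

In a real strategy, only the loop at the bottom would be needed, with selectors adjusted to match the documentation page being crawled.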
Available Properties
The following properties will be available in the context of each tab's crawling strategy:
Name | Description |
---|---|
metadata | Read only. It contains all available models. |
targetEl | The target document element. It is usually the root element of the page. |
DataType | Available data types for schemas. |
ParamLocation | Available locations for parameters. |
ResponseStatus | Available HTTP statuses for responses. |
Available Methods
You can call the following methods in any tab's crawling strategy code:
Name | Description |
---|---|
info(String:message) | Display a message of level INFO in the Status dialog. |
warning(String:message) | Display a message of level WARNING in the Status dialog. |
error(String:message) | Display a message of level ERROR in the Status dialog. |
message(String:message) | Display a primary message in the Status dialog. |
test(String:message) | Print a message in the browser's console. |
addOperation(Operation:operation) | Add an operation. |
addResponse(String:operationId, Response:response) | Add a response to an operation. |
addParameter(String:operationId, Parameter:parameter) | Add a parameter to an operation. |
setRequestBody(String:operationId, RequestBody:requestBody) | Set the request body of an operation. |
addSchema(String:title, Schema:schema) | Add a schema. |
addTag(Tag:tag) | Add a tag. |
In addition to the custom methods above added by TORO, you can also call these:
Name | Description |
---|---|
setTimeout(Function:func, Number:ms, Object:params) | Call a function or evaluate an expression after a specified number of milliseconds. |
setInterval(Function:func, Number:ms, Object:params) | Repeatedly call a function or execute a code snippet, with a fixed time delay between each call. It returns an interval ID that uniquely identifies the interval, so you can remove it later by calling clearInterval(). This method is offered on the Window and Worker interfaces. |
clearTimeout(Object:id) | Cancel a function previously scheduled with setTimeout() so that it does not execute. |
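As a brief illustration of the timer methods, the sketch below schedules a delayed call with an extra argument, cancels another before it fires, and stops a repeating call after three runs. It uses only standard JavaScript timers. Note that clearInterval() is mentioned in the setInterval() description but is not listed in the table, so its availability inside a strategy is an assumption here; the variable names are illustrative.

```javascript
const log = [];

// The third argument is passed to the callback when it runs.
setTimeout(tagName => log.push('crawled ' + tagName), 10, 'users');

// A scheduled call can be cancelled before it fires.
let cancelledCallRan = false;
const timeoutId = setTimeout(() => { cancelledCallRan = true; }, 10);
clearTimeout(timeoutId);

// Repeat a call every 5 ms and stop after three runs.
// Assumption: clearInterval() is available alongside setInterval().
let ticks = 0;
const intervalId = setInterval(() => {
    ticks += 1;
    if (ticks === 3) {
        clearInterval(intervalId);
    }
}, 5);
```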
Available Methods from Config
You can call the following methods from the Config class in your crawling strategy code:
Name | Description |
---|---|
get(String:propertyName): Crawler | Get a property's Crawler configuration. |
Available Methods from Factory
You can call the following methods from the Factory class in your crawling strategy code:
Name | Description |
---|---|
createOperation(String:operationId, String:description, String:method, String:path): Operation | Create an operation model. |
createParameter(String:name, String:location, String:description, Boolean:required, Boolean:deprecated, Boolean:allowEmptyValue, String:schema, Boolean:array): Parameter | Create a parameter model. |
createResponse(Integer:statusCode, String:mediaType, String:schema): Response | Create a response model. |
createRequestBody(String:description, Boolean:required, Map<String, String>:content): RequestBody | Create a request body model. |
createTag(String:name, String:description): Tag | Create a tag model. |
createSchema(String:title): Schema | Create a schema model. |
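To show how the Factory signatures line up with the registration methods, here is a minimal sketch that builds a parameter and a response and attaches both to an operation. The stand-in Factory, addParameter, and addResponse exist only so the snippet runs outside the plugin. The argument values ('path', 'string', 'User', 'getUser') are assumptions; in a real strategy they would more likely come from the ParamLocation, DataType, and ResponseStatus properties and from crawled data.

```javascript
// --- Hypothetical stand-ins; the plugin supplies the real ones. ---
const parametersByOperation = {};
const responsesByOperation = {};

const addParameter = (operationId, parameter) => {
    (parametersByOperation[operationId] = parametersByOperation[operationId] || []).push(parameter);
};
const addResponse = (operationId, response) => {
    (responsesByOperation[operationId] = responsesByOperation[operationId] || []).push(response);
};

const Factory = {
    createParameter: (name, location, description, required, deprecated,
                      allowEmptyValue, schema, array) =>
        ({ name, location, description, required, deprecated, allowEmptyValue, schema, array }),
    createResponse: (statusCode, mediaType, schema) => ({ statusCode, mediaType, schema })
};

// --- Strategy code. ---
const operationId = 'getUser'; // assumed to match an already registered operation

// Arguments follow the documented order:
// name, location, description, required, deprecated, allowEmptyValue, schema, array
const idParameter = Factory.createParameter(
    'id', 'path', 'The user ID.', true, false, false, 'string', false
);
addParameter(operationId, idParameter);

// statusCode, mediaType, schema
addResponse(operationId, Factory.createResponse(200, 'application/json', 'User'));
```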
Available Methods from Crawler
You can call the following methods from the Crawler class in your crawling strategy code:
Name | Description |
---|---|
onProgressChanged(String:message) | Display a message in the Progress dialog. |
getElements(Element:targetElement, String:selector): Element[] | Retrieve elements. |
getElement(Element:targetElement, String:selector): Element[] | Retrieve an element. |
getElementText(Element:targetElement, String:selector) | Retrieve the text content of an element. |
getDocument(String:url): Element | Retrieve the document element from the provided URL. |
getDocuments(String[]:url): Element[] | Retrieve document elements from the provided URLs. |
getUrls(Element[]:targetElements, Config:config, String:operationId): String[] | Retrieve URLs from the provided elements. You can pass your config to change how the URL is extracted from each element. |
filter(String:text, CrawlerFilter[]:filters): String | Filter the text. |
exclude(String:text, CrawlerExclude[]:excludes): Boolean | Return whether the text should be excluded. |
evaluate(String:text, Element:targetElement, String:expression, String:operationId = null): String | Evaluate the text using the provided expression. |
setupText(Config:config, Element:targetElement, String:operationId = null): String | Retrieve text from the element, and then evaluate and filter the text. |
evalAndFilterText(String:text, Config:config, Element:targetElement, String:operationId = null) | Evaluate and filter the text using the provided config. |
findOperationMethod(String:text): String | Find the operation method in the text. |
Available Methods from StringUtils
You can call the following methods from the StringUtils class in your crawling strategy code:
Name | Description |
---|---|
camelCase(String:text): String | Apply camelCase to the provided text. |
toUpperCaseFirst(String:text): String | Convert the first character of the string to uppercase. |
toLowerCaseFirst(String:text): String | Convert the first character of the string to lowercase. |
findVariables(String:text): String[] | Find path variables in the provided path; enclosed curly braces {} indicate a path variable. Given the string '/user/{id}/', the output of this function would be ['id']. |
findQueries(String:text): String[] | Find query variables in the provided path. Given the string '?id=2&ref=home', this function will return ['id', 'ref']. |
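The documented behavior of findVariables and findQueries can be approximated with the sketch below. The StringUtils object here is a hypothetical stand-in written to match the two examples in the table, not the plugin's actual implementation; how the real methods treat edge cases (nested braces, repeated names, missing values) is not specified.

```javascript
// Hypothetical stand-in that mirrors the documented examples only.
const StringUtils = {
    // Collect the names enclosed in curly braces, e.g. '{id}' -> 'id'.
    findVariables: text =>
        Array.from(text.matchAll(/\{([^{}]+)\}/g), match => match[1]),

    // Collect the names of the query pairs that follow the '?'.
    findQueries: text => {
        const queryStart = text.indexOf('?');
        if (queryStart === -1) {
            return [];
        }
        return text
            .slice(queryStart + 1)
            .split('&')
            .filter(Boolean)
            .map(pair => pair.split('=')[0]);
    }
};

const pathVariables = StringUtils.findVariables('/user/{id}/'); // ['id']
const queryNames = StringUtils.findQueries('?id=2&ref=home');   // ['id', 'ref']
```

A strategy for the Parameter tab could loop over arrays like these and register one parameter per name.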
Available Methods from SchemaUtils
You can call the following methods from the SchemaUtils class in your crawling strategy code:
Name | Description |
---|---|
isArray(Schema:schema): Boolean | Checks whether the schema is an array. |
toArray(Schema:schema): Schema | Transforms the schema from a regular model to an array model. |
stripArray(Schema:schema): Boolean | Transforms the schema from an array model to a regular model. |
fromJsonString(String:title, String:text): Schema | Create a schema model from JSON text. |
findDataType(Object:data): String | Look for the data type from object data. |
fetchType(String:key): Type | Find the type. |
Available Methods from JsonUtils
You can call the following methods from the JsonUtils class in your crawling strategy code:
Name | Description |
---|---|
parse(String:text): Object | Transforms the text data to a key-value object. |
stringify(Object:data): String | Transforms the object data to a text string. |
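As an illustration, the sketch below parses a sample response body and converts it back to text. It assumes that JsonUtils.parse and JsonUtils.stringify behave like the standard JSON.parse and JSON.stringify, which the table suggests but does not confirm; the stand-in object and the sample text are hypothetical.

```javascript
// Hypothetical stand-in: assumes JsonUtils wraps the standard JSON object.
const JsonUtils = {
    parse: text => JSON.parse(text),
    stringify: data => JSON.stringify(data)
};

// Sample response body as it might appear in a documentation page.
const sampleText = '{"id": 7, "name": "Ada", "tags": ["admin"]}';

const sample = JsonUtils.parse(sampleText);
const propertyNames = Object.keys(sample);      // ['id', 'name', 'tags']
const compactText = JsonUtils.stringify(sample);
```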
Available Methods from Object
You can call the following methods from the Object class in your crawling strategy code:
Name | Description |
---|---|
assign(Object:target, Object... sources): Object | Copies the values of all enumerable own properties from one or more source objects to a target object. It returns the target object. |
is(Object:value1, Object:value2) | Determines whether two values are the same value. |
keys(Object:obj): String[] | Returns an array of a given object's property names, in the same order as a normal loop would produce. |
values(Object:obj) | Returns an array of a given object's own enumerable property values, in the same order as that provided by a for...in loop. |
defineProperty(Object:obj, String:prop, Descriptor:descriptor): Object | Defines a new property directly on an object, or modifies an existing property on an object, and returns the object. |
defineProperties(Object:obj, Descriptor:props): Object | Defines new or modifies existing properties directly on an object, returning the object. |
create(Object:proto, Object:propertiesObject): Object | Creates a new object, using an existing object as the prototype of the newly created object. |
entries(Object:obj): Object[] | Returns an array of a given object's own enumerable property [key, value] pairs, in the same order as that provided by a for...in loop. |
freeze(Object:obj): Object | Freezes an object. This prevents new properties from being added to it; prevents existing properties from being removed; and prevents existing properties, or their enumerability, configurability, or writability, from being changed. It also prevents the prototype from being changed. The method returns the passed object. |
getOwnPropertyDescriptor(Object:obj, Descriptor:prop): Object | Returns a property descriptor for an own property (that is, one directly present on an object and not in the object's prototype chain) of a given object. |
getOwnPropertyDescriptors(Object:obj): Object | Returns all own property descriptors of a given object. |
getOwnPropertyNames(Object:obj): Object | Returns an array of all properties (including non-enumerable properties, except for those that use symbols) found directly upon a given object. |
getOwnPropertySymbols(Object:obj): Object | Returns an array of all symbol properties found directly upon a given object. |
getPrototypeOf(Object:obj): Object | Returns the prototype (i.e. the value of the internal [[Prototype]] property) of the specified object. |
isExtensible(Object:obj): Boolean | Determines if an object is extensible (whether it can have new properties added to it). |
isFrozen(Object:obj): Boolean | Determines if an object is frozen. |
isSealed(Object:obj): Boolean | Determines if an object is sealed. |
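These are the standard JavaScript Object methods. A short sketch of how a few of them might be combined in a strategy: merging default parameter flags with overrides, freezing the result, and listing its entries. The variable names and the flag values are illustrative only.

```javascript
// Default flags for a parameter, plus values crawled for one specific parameter.
const defaults = { required: false, deprecated: false, allowEmptyValue: false };
const overrides = { required: true };

// Copy into a fresh object so that neither input is modified, then freeze it.
const options = Object.freeze(Object.assign({}, defaults, overrides));

const summary = Object.entries(options)
    .map(([key, value]) => `${key}=${value}`)
    .join(', ');
// summary is 'required=true, deprecated=false, allowEmptyValue=false'
```

Passing an empty object as the first argument to assign() is a common way to avoid mutating the defaults.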
Writing Your Own Reusable Methods
If you have logic that is reusable across strategies, you can declare it as methods and register them in the Script tab. Methods registered there are callable in all strategies, regardless of which tab a strategy belongs to. This allows you to store your logic in one place and reduce the length of your code.
The Script tab's content is consistent
Even if you shuffle through the Operation, Request Body, Parameter, and Response tabs, the Script tab's content will remain the same. This is because the Script tab's content is shared by all tabs.
Consider the following snippet:
```javascript
registerMethod('toUpperCase', text => {
    return text.toUpperCase();
});
```
Writing this in the Script tab prompts the plugin to register a method with the signature toUpperCase(String). This function transforms passed-in strings to their uppercased version. toUpperCase(String) can then be used in any strategy code, like so:
```javascript
toUpperCase('Hello, world!'); // returns 'HELLO, WORLD!'
```
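The same pattern extends to helpers that are more specific to crawling. The sketch below registers a hypothetical normalizePath method that trims a crawled path, removes trailing slashes, and ensures a leading slash. The registerMethod stand-in on the first lines is an assumption about how registration makes a method globally callable; inside the plugin, registerMethod is already provided and those lines should be omitted.

```javascript
// Hypothetical stand-in so the sketch runs outside the plugin.
const registerMethod = (name, func) => {
    globalThis[name] = func;
};

// Script tab: register a reusable helper (the name is illustrative).
registerMethod('normalizePath', path => {
    const trimmed = path.trim().replace(/\/+$/, '');
    return trimmed.startsWith('/') ? trimmed : '/' + trimmed;
});

// Any strategy: call the helper by name.
const normalized = normalizePath(' user/{id}/ '); // '/user/{id}'
```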