TORO Integrate with SolrCloud
Embedded and stand-alone Solr servers store their Solr cores on a single machine. With SolrCloud, Solr cores are abstracted into collections. A collection is a core with parts distributed among multiple Solr servers. Storing data across several servers enables replication and sharding. In other words, SolrCloud provides you with distributed indexing and search capabilities.
Familiarizing yourself with SolrCloud
We recommend checking out Solr's guide on SolrCloud if you are not yet acquainted with it.
Solr is the engine behind Tracker and Monitor's family of features. Custom Solr search indices also rely on Solr. If you plan to use any of these search-reliant components extensively, then optimizing TORO Integrate's Solr server(s) [1] should be a priority. Using SolrCloud is one way of making Solr more reliable [2]. SolrCloud helps keep Solr cores available when they are needed and makes your system more resilient to outages. We highly recommend this set-up if you're running in a production environment.
Luckily, connecting your SolrCloud cluster to your TORO Integrate instance is a piece of cake. The complicated parts are configuring your ZooKeeper ensemble and SolrCloud cluster. In this three-part guide, we will teach you what you need to do in order to run TORO Integrate with SolrCloud. We recommend you read the documents in order:
- Configuring an External ZooKeeper Ensemble
- Configuring SolrCloud to Work with TORO Integrate and an External ZooKeeper Ensemble
- Configuring TORO Integrate to Work with SolrCloud
We think it's best to discuss the steps by example. In this case, we will be setting up three instances of ZooKeeper and three instances of SolrCloud, quite similar to what we'd do in production environments. For ease of configuration and installation, the instances are connected to a shared storage via NAS. A folder called /datastore is mounted across all servers. Setting up the shared storage server, however, will not be covered in this guide. The diagram below summarizes the set-up we're specifically aiming for:
Of course, nothing in this example configuration is set in stone; to each their own! We recommend analyzing how your organization uses TORO Integrate and its Solr-dependent features, and deciding on your own set-up from there.
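To make the example topology concrete, here is a minimal Python sketch that describes the three ZooKeeper and three SolrCloud instances and builds the comma-separated ZooKeeper connection string that SolrCloud clients typically expect. The hostnames are hypothetical placeholders, and the ports are simply the ZooKeeper and Solr defaults (2181 and 8983); substitute whatever your own servers use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    host: str
    port: int


# Hypothetical hostnames for illustration only.
# 2181 is ZooKeeper's default client port; 8983 is Solr's default port.
ZOOKEEPER_NODES = [Node(f"zk{i}.example.com", 2181) for i in range(1, 4)]
SOLR_NODES = [Node(f"solr{i}.example.com", 8983) for i in range(1, 4)]

# Folder mounted across all servers in the example set-up.
SHARED_STORAGE = "/datastore"


def zk_connection_string(nodes, chroot=""):
    """Join ZooKeeper nodes as host:port pairs separated by commas.

    An optional chroot path (for example "/solr") is appended once,
    after the last host:port pair.
    """
    hosts = ",".join(f"{node.host}:{node.port}" for node in nodes)
    return hosts + chroot


if __name__ == "__main__":
    print(zk_connection_string(ZOOKEEPER_NODES))
```

Keeping the node list in one place like this makes it easier to change the number of instances later without hunting for hard-coded addresses.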
Inability to create new cores on its own
Just as with remote instances of Solr, TORO Integrate cannot create new cores on its own when Solr runs in SolrCloud mode. If a package needs its own Solr collections, you have to create them manually through the Solr Collections API; otherwise, the package will not start.
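As a rough illustration of creating a collection by hand, the Python sketch below builds a Collections API CREATE request and, optionally, sends it. The base URL, collection name, and shard and replica counts are hypothetical; the right values depend on your cluster and on what the package expects, and your Solr version may require additional parameters (such as a configset name).

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def create_collection_url(solr_base_url, name, num_shards, replication_factor):
    """Build a Solr Collections API URL for the CREATE action."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{solr_base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def create_collection(solr_base_url, name, num_shards, replication_factor):
    """Send the CREATE request and return the raw response body as text."""
    url = create_collection_url(solr_base_url, name, num_shards, replication_factor)
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical Solr node and collection name; replace with your own.
    print(create_collection_url(
        "http://solr1.example.com:8983/solr", "my-package-collection", 3, 3
    ))
```

Only `create_collection` touches the network; printing the URL first is a cheap way to check the request before running it against a real cluster.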
[1] TORO Integrate uses an embedded instance of Solr by default.
[2] SolrCloud adds some overhead when processing data (e.g. network latency, distribution of data in the cluster). When indexing small amounts of data, the embedded version of Solr performed better, but the difference is negligible. Solr in SolrCloud mode provides better performance when indexing huge chunks of data. In addition, it increases the reliability and availability of the Solr cores.