Hi! First off, thank you for the help!
I'm currently running SolrCloud based on the Helm chart found here: https://github.com/helm/charts/tree/master/incubator/solr

Everything works great, but I'd now like to use Tika to start indexing PDFs as well. The documentation recommends against using Solr Cell in a production environment: https://lucene.apache.org/solr/guide/7_7/uploading-data-with-solr-cell-using-apache-tika.html#solr-cell-performance-implications

So I have been trying to work out a way to have a Tika service extract the contents of the incoming files, and came up with an idea. I could scale up the number of Solr pods and have a dedicated service that points to specific Solr pods that hold no shards and are used only for content extraction. That way, if content extraction goes wrong, it doesn't matter if the pod crashes. These nodes would still be connected to ZooKeeper for the entire cluster, so they could index a file into the correct collection immediately after extraction.

I'm not sure if this is how SolrCloud works, though. If I send an extract-and-index request to a pod that doesn't contain the specified collection, is the content extracted before the document is sent to the correct pod for indexing? Or is the request sent to a pod that has the collection and extracted there? If it's the latter, do you have any advice?

Thanks for the help!

Dustin Pilkington
Associate Software Engineer
dustin.pilking...@bentley.com