I've recently started looking at using the updateRequestProcessorChain to ensure the presence of certain fields in our Solr records. We have records from several different sources that are processed in different ways, and by adding the fields via the updateRequestProcessorChain I don't have to duplicate the logic for creating them in several different places.
At first it seemed that I might be able to accomplish what I needed with the TemplateUpdateProcessorFactory, the CloneFieldUpdateProcessorFactory and the RegexReplaceProcessorFactory, but I quickly went beyond what they can easily accomplish.

Example 1: A document will have one or more pool_f_stored value(s) and a full_title_tsearch_stored value. Generate a field for each pool_f_stored value, where the field name is drawn from that pool_f_stored value and the field value is equal to the value of the full_title_tsearch_stored field. (This adds a pool-specific title browse field.)

Example 2: A document will have one (or more) values in a field named uva_availability_f_stored. These values will be from the following set of strings: { Online, On shelf, Request, <anything else> }. These strings should be mapped to the integer values { 3, 2, 1, 0 } respectively, and a field named uva_availability_isort should be added containing only the largest of those values.

So I tried using the StatelessScriptUpdateProcessorFactory, wrote short javascript implementations to accomplish the above, called the scripts from the updateRequestProcessorChain and tested, and everything seemed great.

However, when I ran the bulk of our 9 million records through the indexing process, Solr would repeatedly and unceremoniously throw an OOM error and terminate, usually citing "# java.lang.OutOfMemoryError: Metaspace" as the reason.

The only difference is that I am now calling the three javascript scripts during the updateRequestProcessorChain. If I comment out those steps in the updateRequestProcessorChain, I can index all 9 million items with no problem.

Any thoughts on why this would be the case? Any suggestions on how to track this down? Any known "gotchas" with using javascript scripts from within the updateRequestProcessorChain?
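To make the intended behavior of the two examples concrete, here is a minimal sketch of the core logic as plain JavaScript functions that run outside Solr. The function names (titleBrowseFieldNames, availabilityRank) are hypothetical and exist only for illustration; these are not the scripts I deployed, which are listed further down. The sketch assumes the generated field name pattern full_<pool>_title_f used in my script.

```javascript
// Illustrative only: hypothetical helper names, no Solr objects involved.

// Example 1: build one pool-specific title browse field name per pool value.
function titleBrowseFieldNames(pools) {
    var names = [];
    for (var i = 0; i < pools.length; i++) {
        names.push("full_" + pools[i] + "_title_f");
    }
    return names;
}

// Example 2: map availability strings to integers and keep the largest.
// Any string not in the table counts as 0.
function availabilityRank(values) {
    var ranks = { "Online": 3, "On shelf": 2, "Request": 1 };
    var best = 0;
    for (var i = 0; i < values.length; i++) {
        var key = String(values[i]);
        var rank = Object.prototype.hasOwnProperty.call(ranks, key) ? ranks[key] : 0;
        best = Math.max(best, rank);
    }
    return best;
}
```

Unlike the deployed script, which adds no field at all when the source field is missing, this sketch returns 0 for an empty list.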
Java version:
Java(TM) SE Runtime Environment (build 1.8.0-b132)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode)

Solr version:
solr-spec 7.3.0
solr-impl 7.3.0 98a6b3d642928b1ac9076c6c5a369472581f7633 - woody - 2018-03-28 14:37:45

javascript for example 1:

    function processAdd(cmd) {
        doc = cmd.solrDoc; // org.apache.solr.common.SolrInputDocument
        field_value_name = params.get("field_value");
        field_value = doc.getFieldValue(field_value_name);
        logger.debug("update-script#processAdd: field_value=" + field_value);
        if (field_value != null) {
            field_name_name = params.get("field_name");
            field_names_response = doc.getFieldValues(field_name_name);
            field_names = (field_names_response != null) ? field_names_response.toArray() : null;
            for(i=0; field_names != null && i < field_names.length; i++) {
                field_name = "full_"+field_names[i]+"_title_f";
                doc.setField(field_name, field_value);
            }
        }
    }

SolrConfig.xml to call script:

    <processor class="solr.StatelessScriptUpdateProcessorFactory">
      <str name="script">title_browse.js</str>
      <lst name="params">
        <str name="field_name">pool_f_stored</str>
        <str name="field_value">full_title_tsearchf_stored</str>
      </lst>
    </processor>

javascript for example 2:

    function processAdd(cmd) {
        doc = cmd.solrDoc; // org.apache.solr.common.SolrInputDocument
        field_name = params.get("field_name");
        field_value_name = params.get("field_value");
        logger.debug("update-script#processAdd: field_value_name=" + field_value_name);
        field_values_result = doc.getFieldValues(field_value_name);
        field_values = (field_values_result != null) ? field_values_result.toArray() : null;
        logger.debug("update-script#processAdd: field_value count=" + (field_values == null ? "null" : " " + (field_values.length)));
        if (field_name != null && field_values != null && field_values.length > 0) {
            // logger.debug("update-script#processAdd: field_value=" + field_value);
            value = 0;
            for(i=0; i < field_values.length; i++) {
                field_value = field_values[i];
                if (field_value.equals("Request")) value = Math.max(value, 1);
                else if (field_value.equals("On shelf")) value = Math.max(value, 2);
                else if (field_value.equals("Online")) value = Math.max(value, 3);
            }
            doc.setField(field_name, value);
        }
    }

SolrConfig.xml to call example 2 script:

    <processor class="solr.StatelessScriptUpdateProcessorFactory">
      <str name="script">availability_rank.js</str>
      <lst name="params">
        <str name="field_name">uva_availability_isort</str>
        <str name="field_value">uva_availability_f_stored</str>
      </lst>
    </processor>