Hello,

Our schema in Solr 1.3 looked like:

    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>

It takes 30s to index 1500 docs. When we run the same in Solr 1.4, it takes 70s.

I noticed that HTMLStripStandardTokenizerFactory was deprecated, so I changed the schema to:

    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>

It still takes 70s. If I instead use the schema:

    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>

it takes 30s in both 1.3 and 1.4.

I am not sure whether HTMLStrip itself has become slower in 1.4 or whether HTML stripping hurts performance downstream in 1.4. Before I start writing a unit test with a TokenizerChain, I wanted to check whether I am doing something fundamentally wrong.

Robin
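
For reference, here is a minimal sketch of the kind of isolated timing test mentioned above. It only shows the shape of the harness: the same documents are run with and without a stripping step, so the two timings can be compared outside of indexing. Everything in it is an assumption for illustration. `stripHtml` and `countTokens` are hypothetical stand-ins (a regex and a whitespace split), not Solr's HTMLStripCharFilterFactory or StandardTokenizerFactory, and the sample documents are made up. A real test would swap in the actual analysis chain and run it under both 1.3 and 1.4.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;

public class AnalysisTimingSketch {
    // Hypothetical stand-in for HTML stripping; NOT Solr's HTMLStripCharFilter.
    static final Pattern TAGS = Pattern.compile("<[^>]*>");

    static String stripHtml(String text) {
        return TAGS.matcher(text).replaceAll(" ");
    }

    // Hypothetical stand-in for tokenizing: counts whitespace-separated tokens.
    static int countTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Runs preprocess + tokenize over all docs for the given number of rounds
    // and returns the elapsed wall-clock time in nanoseconds.
    static long timeNanos(List<String> docs, Function<String, String> preprocess, int rounds) {
        long sink = 0; // accumulate results so the work cannot be optimized away
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String doc : docs) {
                sink += countTokens(preprocess.apply(doc));
            }
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            throw new IllegalStateException("unexpected negative token count");
        }
        return elapsed;
    }

    // Builds n small made-up HTML documents.
    static List<String> sampleDocs(int n) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            docs.add("<html><body><p>Document " + i
                    + " has <b>some</b> marked-up text.</p></body></html>");
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> docs = sampleDocs(1500);

        // Warm up both paths so JIT compilation does not skew the comparison.
        timeNanos(docs, AnalysisTimingSketch::stripHtml, 5);
        timeNanos(docs, Function.identity(), 5);

        long withStrip = timeNanos(docs, AnalysisTimingSketch::stripHtml, 20);
        long withoutStrip = timeNanos(docs, Function.identity(), 20);

        System.out.printf("with stripping:    %d ms%n", withStrip / 1_000_000);
        System.out.printf("without stripping: %d ms%n", withoutStrip / 1_000_000);
    }
}
```

If the gap between the two timings is similar in 1.3 and 1.4, the stripping step itself is probably not what changed, and the slowdown would more likely be downstream of it.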