We are using Solr 4.10.4. I have a few Solr fields in schema.xml defined as follows:

  <field name="category" type="text_en_splitting" indexed="true" multiValued="true" stored="false" required="true" />
  <field name="categories" type="string" indexed="true" multiValued="true" stored="false" />
  <field name="categorycountfacet" type="string" indexed="true" multiValued="true" stored="false" />

These fields are loaded via the data import handler (data-config.xml), where they are defined as:

  <field column="category" sourceColName="Categories" splitBy="`" />
  <field column="categories" sourceColName="Categories" splitBy="`" />
  <field column="categoryfacet" sourceColName="Category_Facet" splitBy="`" />

This has been working for years, but lately we have noticed strange behavior, and we are not sure what triggered it.

Note a few things: category and categories both have exactly the same source field. categorycountfacet contains the same data as categories, with an additional piece of data in each entry.

Sample data: category and categories are loaded from a MySQL database with the value:

  "Software Maintenance Agreement`Technical Support Services"

This should result in two field values, "Software Maintenance Agreement" and "Technical Support Services".

categoryfacet for the same product has the following MySQL value before loading:

  "Software Maintenance Agreement~60005`Technical Support Services~60184"

So it is basically the same, just with an extra piece of data used by our system.

These are bulk loaded via the data import handler, and I then run a simple query:

  http://ourserver:8080/solr/prod/select?q=10001548&facet.field=categories&facet=true&facet.mincount=1

This results in:

  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="categories">
        <int name="Software Maintenance Agreement">1</int>
        <int name="Software Maintenance Agreement`Technical Support Services">1</int>
        <int name="Technical Support Services">1</int>
      </lst>
    </lst>

Note the problem here: there are THREE values, and one of them is the original, unsplit field value.
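
To make the expectation concrete, here is a minimal Python sketch. It is not part of our setup and does not call Solr; it only mimics what I assume splitting on the backtick should do to the sample value, and compares that with the facet values returned for categories. The variable names are purely illustrative.

```python
# Illustration only: mimics what splitBy="`" is expected to do to one source value.
source_value = "Software Maintenance Agreement`Technical Support Services"

# Expected result of splitting on the backtick: two separate field values.
expected = source_value.split("`")

# Facet values actually returned for the categories field (copied from the response).
observed = [
    "Software Maintenance Agreement",
    "Software Maintenance Agreement`Technical Support Services",
    "Technical Support Services",
]

# Any observed value that the split alone would not produce.
unexpected = [value for value in observed if value not in expected]

print("expected:", expected)
print("unexpected:", unexpected)
```

The only unexpected entry is the full, unsplit source string, which is exactly the third facet value in question.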

Let's do the same query on category, since it comes from the same source field:

  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="category">
        <int name="agreement">1</int>
        <int name="mainten">1</int>
        <int name="servic">1</int>
        <int name="softwar">1</int>
        <int name="support">1</int>
        <int name="technic">1</int>
      </lst>

And let's do the same query for categoryfacet, since it's almost identical and not tokenized:

  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="categoryfacet">
        <int name="Software Maintenance Agreement~60005">1</int>
        <int name="Technical Support Services~60184">1</int>
      </lst>

Note it does not have a third value!

I can't seem to figure out what might be causing three values for the categories facet result. Any ideas?

Steve