We are using Solr 4.10.4. I have a few Solr fields in schema.xml defined as 
follows:

   <field name="category" type="text_en_splitting" indexed="true" multiValued="true" stored="false" required="true" />
   <field name="categories" type="string" indexed="true" multiValued="true" stored="false" />
   <field name="categorycountfacet" type="string" indexed="true" multiValued="true" stored="false" />

All three are loaded via the Data Import Handler, and they are defined in 
data-config.xml as:

          <field column="category" sourceColName="Categories" splitBy="`" />
          <field column="categories" sourceColName="Categories" splitBy="`" />
          <field column="categoryfacet" sourceColName="Category_Facet" splitBy="`" />

This has been working for years, but lately we have noticed strange 
behavior, and we are not sure what triggered it. Note a few things: category 
and categories are both loaded from exactly the same source column. 
categorycountfacet contains the same data as categories, with an additional 
piece of data in each entry.

So, sample data:

category and categories are loaded from a MySQL database with the value:

"Software Maintenance Agreement`Technical Support Services"

So this should produce two field values, "Software Maintenance Agreement" 
and "Technical Support Services".

categoryfacet for the same product has the following MySQL value before loading:

"Software Maintenance Agreement~60005`Technical Support Services~60184"

So it is basically the same data, just with an extra piece of data used by our system.
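
To spell out what I expect splitBy="`" to produce for these two sample values, here is a minimal Python sketch. It is only an illustration of the expected result, not the code the Data Import Handler actually runs, and the function name split_values is made up for this example.

```python
# Illustration only: mimics the split I expect from splitBy="`".
# This is NOT the Data Import Handler's implementation.
SEPARATOR = "`"


def split_values(raw):
    """Split a raw column value into individual field values, dropping empties."""
    return [part for part in raw.split(SEPARATOR) if part]


# Sample values taken from the MySQL columns described above.
categories = split_values(
    "Software Maintenance Agreement`Technical Support Services"
)
category_facet = split_values(
    "Software Maintenance Agreement~60005`Technical Support Services~60184"
)

print(categories)
print(category_facet)
```

In both cases I would expect exactly two values per document and never the original unsplit string.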

These are bulk loaded via the Data Import Handler, and I then run a simple 
query:

http://ourserver:8080/solr/prod/select?q=10001548&facet.field=categories&facet=true&facet.mincount=1

And this results in:

<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="categories">
<int name="Software Maintenance Agreement">1</int>
<int name="Software Maintenance Agreement`Technical Support Services">1</int>
<int name="Technical Support Services">1</int>
</lst>
</lst>

Note the problem here: there are THREE values, and one of them is the original, 
unsplit source value.
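
To check other responses for the same symptom, something like the following Python sketch could flag facet values that still contain the separator. It assumes the XML response format shown above; the function name unsplit_facet_values is hypothetical, and the sample is a closed-off copy of the excerpt rather than a full Solr response.

```python
import xml.etree.ElementTree as ET


def unsplit_facet_values(response_xml, separator="`"):
    """Return the names of facet entries that still contain the separator."""
    root = ET.fromstring(response_xml)
    return [
        el.get("name")
        for el in root.iter("int")
        if separator in (el.get("name") or "")
    ]


# Closed-off copy of the facet excerpt from the response above.
sample = """<lst name="facet_counts">
<lst name="facet_fields">
<lst name="categories">
<int name="Software Maintenance Agreement">1</int>
<int name="Software Maintenance Agreement`Technical Support Services">1</int>
<int name="Technical Support Services">1</int>
</lst>
</lst>
</lst>"""

print(unsplit_facet_values(sample))
```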

Let’s do the same query on category since it comes from the same source field:

<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="category">
<int name="agreement">1</int>
<int name="mainten">1</int>
<int name="servic">1</int>
<int name="softwar">1</int>
<int name="support">1</int>
<int name="technic">1</int>
</lst>


And let’s do the same query for categoryfacet since it’s almost identical and 
not tokenized:

<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="categoryfacet">
<int name="Software Maintenance Agreement~60005">1</int>
<int name="Technical Support Services~60184">1</int>
</lst>


Note it does not have a third value! I can’t seem to figure out what might be 
causing three values for the categories facet result. Any ideas?

Steve
