Yes, so the TermsComponent of course shows me the same thing as the facet query; I am sure the facet query is not wrong. It shows values containing a backtick (`) no matter which unique product key I check, even though there should be none, since there is a splitBy. Was there something else you wanted me to look for? So I am wondering how in the world a backtick can get into the index. After all, there is a splitBy (and there always has been). And there is a transformer, of course:
<entity name="products" dataSource="searchit" pk="Product_ID" transformer="TemplateTransformer,RegexTransformer" onError="skip"

The field is simply an existing column in a MySQL table. It's a very simple import, and it has worked for two years, but not now. So perhaps the data has changed, but I can look at the field in binary, and it's correct in the database, as shown in the original post. The relevant query:

query="select spd.Product_ID, spd.Categories, spd.Category_Facet from Solr_Prod_Data"

For the terms, I simply used:

http://ourserver:8080/solr/prod/terms?terms.fl=categories&terms.regex=.*`.*

This should return 0 values, since the field is split on the backtick. And if it weren't split at all, I wouldn't get three facet values; I'd get only one in the original example. So something else is going on, but I can't seem to find it. Do you have any other ideas, or were you thinking of something else? It clearly does the split, but then also adds the unsplit value.

> On Mar 2, 2017, at 5:27 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> "should" is the operative term here. My guess is that the data you're putting
> in the index isn't what you think it is.
>
> I'd suggest you use the TermsComponent to examine the data actually in
> your index.
>
> Best,
> Erick
>
> On Thu, Mar 2, 2017 at 3:18 PM, Sales
> <i...@smallbusinessconsultingexperts.com> wrote:
>> We are using Solr 4.10.4.
>> I have a few Solr fields in schema.xml defined as follows:
>>
>> <field name="category" type="text_en_splitting" indexed="true" multiValued="true" stored="false" required="true" />
>> <field name="categories" type="string" indexed="true" multiValued="true" stored="false" />
>> <field name="categorycountfacet" type="string" indexed="true" multiValued="true" stored="false" />
>>
>> All three are loaded via the data import handler's data-config.xml, where they are defined as:
>>
>> <field column="category" sourceColName="Categories" splitBy="`" />
>> <field column="categories" sourceColName="Categories" splitBy="`" />
>> <field column="categoryfacet" sourceColName="Category_Facet" splitBy="`" />
>>
>> This has been working for years, but lately we have noticed strange behavior, and we are not sure what triggered it. Note a few things: category and categories both have exactly the same source field. categorycountfacet contains the same data as categories, with an additional piece of data in each entry.
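To make the expected result of these mappings concrete, here is a minimal Python sketch that emulates the three data-config.xml field definitions on the sample values given in the original post. It assumes splitBy behaves like a regular-expression split whose pieces become separate values of the multiValued field; the names FIELD_MAPPINGS, split_by, and expected_field_values are hypothetical. It shows raw values before analysis, so category would still be tokenized and stemmed by text_en_splitting afterwards.

```python
import re

# Hypothetical emulation of the three data-config.xml mappings:
# (Solr field, source SQL column, splitBy pattern).
# Assumption: splitBy acts as a regex split, one field value per piece.
FIELD_MAPPINGS = [
    ("category", "Categories", "`"),
    ("categories", "Categories", "`"),
    ("categoryfacet", "Category_Facet", "`"),
]


def split_by(value: str, pattern: str) -> list[str]:
    """Split a raw column value the way a splitBy attribute is expected to."""
    return re.split(pattern, value)


def expected_field_values(row: dict[str, str]) -> dict[str, list[str]]:
    """Return the values each mapped field should receive for one SQL row."""
    return {
        field: split_by(row[column], pattern)
        for field, column, pattern in FIELD_MAPPINGS
        if row.get(column) is not None
    }


# Sample column values taken from the original post.
row = {
    "Categories": "Software Maintenance Agreement`Technical Support Services",
    "Category_Facet": "Software Maintenance Agreement~60005`Technical Support Services~60184",
}

for field, values in expected_field_values(row).items():
    print(field, values)
```

None of the expected values contains a backtick, which is why a terms.regex of .*`.* on categories should come back empty if the split were the only thing feeding that field.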
>>
>> So, sample data:
>>
>> category and categories are loaded from a MySQL database with the value:
>>
>> "Software Maintenance Agreement`Technical Support Services"
>>
>> So this should equate to two field values, "Software Maintenance Agreement" and "Technical Support Services".
>>
>> categoryfacet for the same product has the following MySQL value before loading:
>>
>> "Software Maintenance Agreement~60005`Technical Support Services~60184"
>>
>> So, basically the same, just with an extra piece of data used by our system.
>>
>> These are bulk loaded via the data import handler, and I then run a simple query:
>>
>> http://ourserver:8080/solr/prod/select?q=10001548&facet.field=categories&facet=true&facet.mincount=1
>>
>> And this results in:
>>
>> <lst name="facet_counts">
>>   <lst name="facet_queries"/>
>>   <lst name="facet_fields">
>>     <lst name="categories">
>>       <int name="Software Maintenance Agreement">1</int>
>>       <int name="Software Maintenance Agreement`Technical Support Services">1</int>
>>       <int name="Technical Support Services">1</int>
>>     </lst>
>>   </lst>
>>
>> Note the problem here: there are THREE values, and one of them is the original, unsplit field.
>>
>> Let's run the same query on category, since it comes from the same source field:
>>
>> <lst name="facet_counts">
>>   <lst name="facet_queries"/>
>>   <lst name="facet_fields">
>>     <lst name="category">
>>       <int name="agreement">1</int>
>>       <int name="mainten">1</int>
>>       <int name="servic">1</int>
>>       <int name="softwar">1</int>
>>       <int name="support">1</int>
>>       <int name="technic">1</int>
>>     </lst>
>>
>> And let's run the same query for categoryfacet, since it's almost identical and not tokenized:
>>
>> <lst name="facet_counts">
>>   <lst name="facet_queries"/>
>>   <lst name="facet_fields">
>>     <lst name="categoryfacet">
>>       <int name="Software Maintenance Agreement~60005">1</int>
>>       <int name="Technical Support Services~60184">1</int>
>>     </lst>
>>
>> Note it does not have a third value!
>> I can't seem to figure out what might be causing three values for the categories facet result. Any ideas?
>>
>> Steve
>>
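A minimal Python sketch of the checks discussed in this thread, under stated assumptions. It builds the TermsComponent request (passing terms.limit=-1, since the component returns only 10 terms by default), turns Solr's flat [term, count, ...] JSON list into pairs (assuming the default json.nl=flat rendering), and compares observed facet values with what a correct split of the raw column would produce. The last helper tests one hypothesis that this thread does not confirm: that the import handler also maps the raw SQL column Categories onto the schema field categories because the two names differ only in case, which would add the unsplit value next to the split ones. All function names are hypothetical.

```python
from urllib.parse import urlencode

DELIMITER = "`"  # the splitBy character used in data-config.xml


def terms_url(base: str, field: str, regex: str, limit: int = -1) -> str:
    """Build a TermsComponent request URL for one field and a term regex.

    terms.limit defaults to 10, so -1 is passed to list every matching term.
    """
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.regex": regex,
        "terms.limit": limit,
        "wt": "json",
    }
    return f"{base}/terms?{urlencode(params)}"


def pairs(flat: list) -> list[tuple[str, int]]:
    """Turn Solr's flat [term, count, term, count, ...] JSON list into pairs."""
    return list(zip(flat[0::2], flat[1::2]))


def unexpected_values(observed: list[str], raw: str,
                      delimiter: str = DELIMITER) -> list[str]:
    """Return observed values that a correct split of `raw` would not produce."""
    expected = set(raw.split(delimiter))
    return [value for value in observed if value not in expected]


def implicit_column_matches(columns: list[str],
                            schema_fields: list[str]) -> dict[str, str]:
    """Hypothesis check: SQL columns whose names equal a schema field ignoring case."""
    by_lower = {name.lower(): name for name in schema_fields}
    return {col: by_lower[col.lower()] for col in columns if col.lower() in by_lower}


# Values taken from this thread.
raw = "Software Maintenance Agreement`Technical Support Services"
observed = [
    "Software Maintenance Agreement",
    "Software Maintenance Agreement`Technical Support Services",
    "Technical Support Services",
]

print(terms_url("http://ourserver:8080/solr/prod", "categories", ".*`.*"))
print(unexpected_values(observed, raw))
print(implicit_column_matches(
    ["Product_ID", "Categories", "Category_Facet"],
    ["category", "categories", "categorycountfacet"],
))
```

If that hypothesis held, it would also fit the other two results: the unsplit value in category analyzes to the same stemmed tokens, so no extra facet entry would appear, and the column Category_Facet does not match its field name, so categoryfacet would receive only the split values.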