Yes, the TermsComponent shows me the same thing as the facet query, so I am sure 
the facet query is not wrong. It shows values containing a backtick (`), no matter 
which unique product key I look at, even though there should be 0 of them because 
there is a splitBy. Was there something else you wanted me to look for? So I am 
wondering how in the world a backtick can get into the index. After all, there is 
a splitBy (and always has been), and of course there is a transformer:

<entity name="products" dataSource="searchit" pk="Product_ID" 
        transformer="TemplateTransformer,RegexTransformer" onError="skip">
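For reference, RegexTransformer documents splitBy as a regular expression, and as I understand it the split behaves like Java's String.split. A minimal Python model of that behavior, assuming those semantics (split_by is a hypothetical helper, not DIH code):

```python
import re

def split_by(value: str, pattern: str) -> list[str]:
    # Treat `pattern` as a regex, like Java's String.split(regex), and drop
    # trailing empty strings as Java does (leading/middle empties are kept).
    parts = re.split(pattern, value)
    while parts and parts[-1] == "":
        parts.pop()
    return parts

sample = "Software Maintenance Agreement`Technical Support Services"
print(split_by(sample, "`"))             # backtick has no special meaning in a regex
print(split_by("a|b", re.escape("|")))   # a metacharacter delimiter would need escaping
```

Since the backtick is not a regex metacharacter, the splitBy="`" setting itself should split the sample value into exactly two values.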

The field is simply an existing column in a MySQL table. It's a very simple import, 
and it has worked for two years, but not now. Perhaps the data has changed, but I 
can look at the field in binary in the database, and it's correct, exactly as shown 
in the original post.
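Since the question is whether the delimiter in the data really is a plain backtick, the byte-level check can be scripted. A minimal Python sketch, assuming the value has already been fetched from MySQL as a string (describe_chars is a hypothetical helper): it lists every character that is not a letter, digit, or space with its code point, so a lookalike character would stand out from the ASCII backtick (U+0060, GRAVE ACCENT).

```python
import unicodedata

def describe_chars(value: str) -> list[tuple[str, str, str]]:
    """List every character that is not a letter, digit, or space, with code point and name."""
    rows = []
    for ch in value:
        if ch.isalnum() or ch == " ":
            continue
        rows.append((ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return rows

# Sample value from the original post; in practice, read it from the MySQL column.
value = "Software Maintenance Agreement`Technical Support Services"
for ch, code_point, name in describe_chars(value):
    print(repr(ch), code_point, name)
# A real backtick is the single byte 0x60 in UTF-8.
print(value.encode("utf-8").count(b"\x60"), "backtick byte(s)")
```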

Here is the relevant query:

query="select spd.Product_ID, spd.Categories, spd.Category_Facet from 
Solr_Prod_Data"
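The query returns the raw Categories and Category_Facet columns, and the three field mappings from the original post derive new columns from them. A rough Python model of that row transformation, assuming (as I understand RegexTransformer to work) that it adds each target column without deleting its source column; transform_row and the product ID are hypothetical. It only shows which columns end up in the row, not how the import handler then matches those column names to schema fields.

```python
import re

# Field mappings from data-config.xml: (target column, source column, splitBy regex).
FIELD_MAPPINGS = [
    ("category", "Categories", "`"),
    ("categories", "Categories", "`"),
    ("categoryfacet", "Category_Facet", "`"),
]

def transform_row(row: dict[str, str]) -> dict[str, object]:
    """Model one RegexTransformer pass over a row returned by the SQL query."""
    out: dict[str, object] = dict(row)  # assumption: source columns stay in the row
    for target, source, pattern in FIELD_MAPPINGS:
        value = row.get(source)
        if value is not None:
            # Split into multiple values; dropping empty pieces is a simplification.
            out[target] = [p for p in re.split(pattern, value) if p]
    return out

row = {
    "Product_ID": "12345",  # hypothetical ID
    "Categories": "Software Maintenance Agreement`Technical Support Services",
    "Category_Facet": "Software Maintenance Agreement~60005`Technical Support Services~60184",
}
for column, value in transform_row(row).items():
    print(f"{column} -> {value!r}")
```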

For the terms, I simply used:

http://ourserver:8080/solr/prod/terms?terms.fl=categories&terms.regex=.*`.*
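RFC 3986 does not allow a literal backtick in a URL, so different clients may encode or reject it inconsistently. A Python sketch that builds the same TermsComponent request with proper percent-encoding; the host, core, and the terms.fl / terms.regex parameters come from the URL above, while terms_url is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "http://ourserver:8080/solr/prod/terms"

def terms_url(field: str, regex: str) -> str:
    """Build a TermsComponent request listing indexed terms that match a regex."""
    params = {"terms.fl": field, "terms.regex": regex}
    return f"{BASE}?{urlencode(params)}"

print(terms_url("categories", ".*`.*"))
# http://ourserver:8080/solr/prod/terms?terms.fl=categories&terms.regex=.%2A%60.%2A
```

urlencode also encodes the asterisks as %2A; the server decodes them back, so the regex Solr sees is unchanged.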

This should return 0 values, since the field is split on the backtick. And if it 
weren't split on it, I wouldn't get 3 facet results; I'd get only 1 in the original 
example. 

So something else is going on, but I can't seem to find it. Do you have any other 
ideas, or were you thinking of something else?

It clearly does the split, but then also adds the unsplit value. 
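The expected counts can be checked with a small model built from the sample value in the original post. It compares three hypothetical cases: no split, split only, and the split values plus the unsplit value (the behavior described just above). Only the last gives both three facet values and a term matching the regex .*`.* , which is what the facet and TermsComponent output show. The function names are made up for illustration, and terms_matching only approximates terms.regex, assuming a full-string match.

```python
import re

RAW = "Software Maintenance Agreement`Technical Support Services"

def split_only(raw: str) -> set[str]:
    """Values expected if the column is only split on the backtick."""
    return {p for p in raw.split("`") if p}

def split_plus_raw(raw: str) -> set[str]:
    """Values if the unsplit column value is indexed as well."""
    return split_only(raw) | {raw}

def terms_matching(values: set[str], regex: str) -> list[str]:
    """Rough stand-in for terms.regex, assuming a full-string match."""
    return sorted(v for v in values if re.fullmatch(regex, v))

cases = {
    "no split": {RAW},
    "split only": split_only(RAW),
    "split + raw": split_plus_raw(RAW),
}
for label, values in cases.items():
    matches = terms_matching(values, ".*`.*")
    print(f"{label}: {len(values)} value(s), {len(matches)} matching the regex")
```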


> On Mar 2, 2017, at 5:27 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> "should" is the operative term here. My guess is that the data you're putting
> in the index isn't what you think it is.
> 
> I'd suggest you use the TermsComponent to examine the data actually in
> your index.
> 
> Best,
> Erick
> 
> On Thu, Mar 2, 2017 at 3:18 PM, Sales
> <i...@smallbusinessconsultingexperts.com> wrote:
>> We are using Solr 4.10.4. I have a few Solr fields in schema.xml defined as 
>> follows:
>> 
>>   <field name="category" type="text_en_splitting" indexed="true" multiValued="true" stored="false" required="true" />
>>   <field name="categories" type="string" indexed="true" multiValued="true" stored="false" />
>>   <field name="categorycountfacet" type="string" indexed="true" multiValued="true" stored="false" />
>> 
>> All of them are loaded via the data import handler (data-config.xml), and they 
>> are defined there as:
>> 
>>          <field column="category" sourceColName="Categories" splitBy="`" />
>>          <field column="categories" sourceColName="Categories" splitBy="`" />
>>          <field column="categoryfacet" sourceColName="Category_Facet" splitBy="`" />
>> 
>> This has been working for years, but lately we have noticed strange behavior, 
>> and I'm not sure what triggered it. Note a few things: category and categories 
>> both have exactly the same source field, and categorycountfacet contains the 
>> same data as categories, with an additional piece of data in each entry.
>> 
>> So, sample data:
>> 
>> category and categories are loaded from a MySQL database with the value:
>> 
>> "Software Maintenance Agreement`Technical Support Services"
>> 
>> So this should yield two field values, "Software Maintenance Agreement" 
>> and "Technical Support Services".
>> 
>> categoryfacet for the same product has the following MySQL value before 
>> loading:
>> 
>> "Software Maintenance Agreement~60005`Technical Support Services~60184"
>> 
>> So it's basically the same, just with an extra piece of data used by our system.
>> 
>> These are bulk loaded via the data import handler, and I then run a 
>> simple query:
>> 
>> http://ourserver:8080/solr/prod/select?q=10001548&facet.field=categories&facet=true&facet.mincount=1
>> 
>> And this results in:
>> 
>> <lst name="facet_counts">
>> <lst name="facet_queries"/>
>> <lst name="facet_fields">
>> <lst name="categories">
>> <int name="Software Maintenance Agreement">1</int>
>> <int name="Software Maintenance Agreement`Technical Support Services">1</int>
>> <int name="Technical Support Services">1</int>
>> </lst>
>> </lst>
>> 
>> Note the problem here: there are THREE values, and one of them is the 
>> original, unsplit field value.
>> 
>> Let's do the same query on category, since it comes from the same source 
>> field:
>> 
>> <lst name="facet_counts">
>> <lst name="facet_queries"/>
>> <lst name="facet_fields">
>> <lst name="category">
>> <int name="agreement">1</int>
>> <int name="mainten">1</int>
>> <int name="servic">1</int>
>> <int name="softwar">1</int>
>> <int name="support">1</int>
>> <int name="technic">1</int>
>> </lst>
>> 
>> 
>> And let's do the same query for categoryfacet, since it's almost identical 
>> and not tokenized:
>> 
>> <lst name="facet_counts">
>> <lst name="facet_queries"/>
>> <lst name="facet_fields">
>> <lst name="categoryfacet">
>> <int name="Software Maintenance Agreement~60005">1</int>
>> <int name="Technical Support Services~60184">1</int>
>> </lst>
>> 
>> 
>> Note it does not have a third value! I can’t seem to figure out what might 
>> be causing three values for the categories facet result. Any ideas?
>> 
>> Steve
>> 
