Re: NgramTokenizerFactory question

2018-07-05 Thread Kudrettin Güleryüz
Thank you for the explanation. To close the loop, I was able to track the problem down to the Lucene query parser on 5.2.1, which returned +body:"123 234 345 456" for the query string 123456. It turned out that it is possible to get the same behavior by turning on split on white-space and auto Generate

Re: NgramTokenizerFactory question

2018-07-02 Thread Alexandre Rafalovitch
I am not familiar with the Lucene method for creating an analyzer. Perhaps it was already doing just the analysis phase. But here is what the NGram would do to a string of '123456' with just trigrams: 123 234 345 456. So, if you only apply it on the index side and your query is '2345', there is no such token i
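
To make the trigram example concrete, here is a minimal Python sketch of character n-gram generation. It is a toy model and not Lucene's NGramTokenizer; the helper name char_ngrams is hypothetical and used only for illustration.

```python
def char_ngrams(text, min_gram, max_gram):
    """Return character n-grams of text, ordered by start position, then length."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


# Trigrams only (min = max = 3), as in the message above.
indexed_tokens = char_ngrams("123456", 3, 3)
print(indexed_tokens)  # ['123', '234', '345', '456']

# Index-side only: the query '2345' stays one token and is absent from the index.
print("2345" in indexed_tokens)  # False

# If the query is split into trigrams as well, every gram is present in the index.
query_tokens = char_ngrams("2345", 3, 3)
print(query_tokens)  # ['234', '345']
print(all(t in indexed_tokens for t in query_tokens))  # True
```

The last two lines show why applying the same analysis on both sides changes what matches: the query is no longer compared as a whole string but gram by gram.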

Re: NgramTokenizerFactory question

2018-07-02 Thread Kudrettin Güleryüz
> 1) if you want face to match interface, you need max value to be at least 4. Can you please explain this a bit more? I am not following this one. Values are set to 3,3 and Solr already matches interface and interfaces when searched for face. In addition to that Solr matches the trigrams of face

Re: NgramTokenizerFactory question

2018-07-02 Thread Alexandre Rafalovitch
Two things: 1) If you want face to match interface, you need the max value to be at least 4. 2) You probably have the factory applied symmetrically or on the Query analyzer. You probably want it on the Index analyzer side only. Otherwise you are trying to match any 3-letter query substring against your index. Admin U
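
The first point can be illustrated with a small Python sketch, assuming n-grams are produced on the index side only and the query term is looked up as a single unsplit token. This is a toy model, not Solr code; index_grams is a hypothetical helper.

```python
def index_grams(word, min_gram, max_gram):
    """Toy index-side analysis: the set of character n-grams of word."""
    return {
        word[start:start + size]
        for size in range(min_gram, max_gram + 1)
        for start in range(len(word) - size + 1)
    }


# Assumption: the query side does NOT apply n-grams, so 'face' is one term.
query_term = "face"

# With min = max = 3, the index holds only 3-character grams of 'interface'.
grams_3_3 = index_grams("interface", 3, 3)
print(query_term in grams_3_3)  # False

# With max raised to 4, the 4-character gram 'face' is indexed.
grams_3_4 = index_grams("interface", 3, 4)
print(query_term in grams_3_4)  # True
```

Under that assumption a 4-letter query needs a 4-letter gram in the index, which is why the max value matters once the query analyzer stops splitting the search term.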

Re: NgramTokenizerFactory question

2018-07-02 Thread Kudrettin Güleryüz
It is correct that the search string causes the following query to be generated: +(field:fac field:ace) Hence the results... However, I fail to see how (fac OR ace) is a relevant query; shouldn't it be +field:fac +field:ace instead? What is the suggested way to change this behaviour? On Mon, Jul 2
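
The difference between the two parsed queries can be sketched in Python with a toy in-memory index. The documents and helper names below are made up for illustration and this is not how Lucene evaluates queries internally; it only shows OR versus AND over trigram terms.

```python
def trigrams(word):
    """Set of 3-character grams of word."""
    return {word[i:i + 3] for i in range(len(word) - 2)}


# Hypothetical documents, each indexed as its set of trigrams.
docs = {
    "doc1": "interface",
    "doc2": "surface",
    "doc3": "factory",
    "doc4": "palace",
}
index = {doc_id: trigrams(text) for doc_id, text in docs.items()}

query_grams = trigrams("face")  # {'fac', 'ace'}

# +(field:fac field:ace): at least one gram must match (OR).
or_hits = sorted(d for d, grams in index.items() if grams & query_grams)

# +field:fac +field:ace: every gram must match (AND).
and_hits = sorted(d for d, grams in index.items() if query_grams <= grams)

print(or_hits)   # ['doc1', 'doc2', 'doc3', 'doc4']
print(and_hits)  # ['doc1', 'doc2']
```

In this toy data the OR form also returns factory and palace, which share only one gram with face. Requiring every gram narrows the hits, though it still does not check that the grams are adjacent, which a phrase query would.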

Re: NgramTokenizerFactory question

2018-07-02 Thread Erick Erickson
Take a look at two things: 1> the admin/analysis page. This is probably mostly a sanity check to ensure you're seeing what you expect. 2> add debug=query to the query and look at the parsed query. My bet is that the grams are being OR'd together and your search term is effectively fac OR ace
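
As a rough sketch of the second suggestion, the snippet below builds a select URL with debug=query added. The host, port, and collection name are assumptions for illustration only, and the field name is taken from the parsed query quoted elsewhere in this thread; only the debug=query parameter comes from the message above.

```python
from urllib.parse import urlencode

# Assumed base URL: a local Solr with a hypothetical collection 'mycollection'.
base_url = "http://localhost:8983/solr/mycollection/select"

params = {
    "q": "field:face",  # the search under investigation
    "debug": "query",   # ask Solr to include the parsed query in the response
}

url = base_url + "?" + urlencode(params)
print(url)
```

Fetching that URL and reading the parsed query in the debug section of the response should show whether the grams were combined as fac OR ace.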