If you don't want partial matches with edismax, you should always use StandardTokenizerFactory and tune the mm (minimum should match) parameter.
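For illustration, here is a minimal sketch of such a request, built with Python's standard library. The core name `people` and the host/port are hypothetical. `defType`, `q`, `qf`, `pf`, `mm` and `debugQuery` are standard edismax/Solr request parameters. `mm=100%` requires every optional query clause (here each whitespace-separated number group) to match, so a document matching only some of the groups is excluded. This assumes a StandardTokenizerFactory-based field as suggested above, and drops the leading `+` from the query so it is not read as a "required" operator.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; the core name "people" is an assumption.
SOLR_SELECT_URL = "http://localhost:8983/solr/people/select"


def build_edismax_url(phone_query: str, field: str = "person_mobile") -> str:
    """Build an edismax request in which every query clause must match (mm=100%)."""
    params = {
        "q": phone_query,
        "defType": "edismax",
        "qf": field,
        "pf": f"{field}^5",    # phrase boost on the same field, as in the thread
        "mm": "100%",          # all optional clauses must match: no partial matches
        "debugQuery": "true",  # ask Solr to include parsedquery in the response
    }
    # urlencode percent-encodes "%" and "^" and turns spaces into "+".
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_edismax_url("49 1234 12345678"))
```

Checking `parsedquery` in the debug output is the quickest way to confirm how mm was applied to the individual clauses.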
On Mon, Nov 7, 2016 at 4:50 PM, Stefan Matheis <matheis.ste...@gmail.com> wrote:
> Vincenzo,
>
> thanks for the response - i know that the Keyword Tokenizer by
> itself does not do anything. as pointed out at the end of the initial
> mail, i’m applying a pattern replace for everything non-numeric to
> make it actually useful.
>
> and especially because of the tokenization based on whitespace i’d
> like to use the very same field once again as phrase field to get around
> this issue. Shawn mentioned in #solr in the meantime that there is
> SOLR-9185 which is similar and would be helpful, but currently very
> very in-the-works.
>
> the Standard Tokenizer you’ve mentioned does split on whitespace - as
> edismax does by default in the first place. so i’m not sure how that
> would help? For now, i don’t want to have partial matches on phone
> numbers .. at least not yet.
>
> -Stefan
>
>
> On November 7, 2016 at 4:41:50 PM, Vincenzo D'Amore (v.dam...@gmail.com) wrote:
> > Hi Stefan,
> >
> > I think the problem is solr.KeywordTokenizerFactory.
> > This tokeniser does not tokenise the string at all; it returns
> > exactly what you have.
> >
> > '+49 1234 12345678' -> '+49 1234 12345678'
> >
> > On the other hand, using edismax you are looking for '+49', '1234' and
> > '12345678', and none of these keywords match your phone_number field.
> >
> > Try using a different tokenizer like solr.StandardTokenizerFactory; this
> > should change your results.
> >
> > Bests,
> > Vincenzo
> >
> > On Mon, Nov 7, 2016 at 4:05 PM, Stefan Matheis wrote:
> > >
> > > I’m guessing that i’m missing something obvious here - so feel free to
> > > ask for more details and also point out other directions i should
> > > follow.
> > >
> > > the problem goes as follows: the input in one case might be a phone
> > > number (like +49 1234 12345678). since we’re using edismax, the parts
> > > get split on whitespace - which is fine.
> > > bringing the same field (based on TextField) to the party
> > > (using qf) doesn’t change a thing.
> > >
> > > > responseHeader:
> > > >   params:
> > > >     q: '+49 1234 12345678'
> > > >     defType: edismax
> > > >     qf: person_mobile
> > > >     pf: person_mobile^5
> > > > debug:
> > > >   rawquerystring: '+49 1234 12345678'
> > > >   querystring: '+49 1234 12345678'
> > > >   parsedquery: '(+(+DisjunctionMaxQuery((person_mobile:49)) DisjunctionMaxQuery((person_mobile:1234)) DisjunctionMaxQuery((person_mobile:12345678))) ())/no_coord'
> > > >   parsedquery_toString: '+(+(person_mobile:49) (person_mobile:1234) (person_mobile:12345678)) ()'
> > >
> > > but .. as far as i was able to narrow down the culprit, that only happens
> > > when i’m using solr.KeywordTokenizerFactory. as soon as i change
> > > that to solr.StandardTokenizerFactory, the phrase query appears as
> > > expected:
> > >
> > > > responseHeader:
> > > >   params:
> > > >     q: '+49 1234 12345678'
> > > >     defType: edismax
> > > >     qf: person_mobile
> > > >     pf: person_mobile^5
> > > > debug:
> > > >   rawquerystring: '+49 1234 12345678'
> > > >   querystring: '+49 1234 12345678'
> > > >   parsedquery: '(+(+DisjunctionMaxQuery((person_mobile:49)) DisjunctionMaxQuery((person_mobile:1234)) DisjunctionMaxQuery((person_mobile:12345678))) DisjunctionMaxQuery(((person_mobile:"49 1234 12345678")^5.0)))/no_coord'
> > > >   parsedquery_toString: '+(+(person_mobile:49) (person_mobile:1234) (person_mobile:12345678)) ((person_mobile:"49 1234 12345678")^5.0)'
> > >
> > > removing the + at the beginning doesn’t make a difference either
> > > (just mentioning since tokee already asked this on #solr, where i’d
> > > brought up the question earlier)
> > >
> > > it’s absolutely possible i’m focusing on a very wrong assumption - but
> > > since switching the tokenizer results in such a large
> > > behaviour change, i think something is spooky here.
> > >
> > > i’ve read older issues and posts from the list; some of them pointed
> > > out that it might be an optimization that edismax brings to the table -
> > > i didn’t find anything specific about that.
> > >
> > > oh, and btw: if that works, my idea is to drop
> > > everything in a given phrase that is not a digit, to match the phone
> > > number, like this:
> > >
> > >   replacement=""/>
> > >
> > > any thoughts? or wild guesses?
> > >
> > > Thanks
> > > Stefan
> >
> > --
> > Vincenzo D'Amore
> > email: v.dam...@gmail.com
> > skype: free.dev
> > mobile: +39 349 8513251
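To make the mismatch Vincenzo describes concrete, here is a toy model in plain Python. It is not Solr's actual analysis code. It assumes the lost pattern-replace config stripped every non-digit character (something like `[^0-9]` with `replacement=""`); the real pattern from the original mail did not survive. `keyword_tokenize`, `normalize` and `analyze` are hypothetical helpers. `keyword_tokenize` mimics KeywordTokenizer (the whole input becomes one token), and edismax's default whitespace splitting of the query is modelled with `str.split()`.

```python
import re

# Assumption: the lost pattern-replace step removed every non-digit character.
NON_DIGIT = re.compile(r"[^0-9]")


def keyword_tokenize(text: str) -> list[str]:
    """Mimic KeywordTokenizer: the whole input becomes a single token."""
    return [text]


def normalize(token: str) -> str:
    """Mimic the assumed pattern replace: drop everything that is not a digit."""
    return NON_DIGIT.sub("", token)


def analyze(text: str) -> list[str]:
    """Keyword tokenizer followed by digit-only normalization; drop empty tokens."""
    return [t for t in (normalize(tok) for tok in keyword_tokenize(text)) if t]


phone = "+49 1234 12345678"

# Index side: the stored phone number ends up as one digits-only term.
indexed_terms = set(analyze(phone))

# Query side (qf): edismax splits the input on whitespace first, then
# analyzes each piece separately, so each piece yields its own term.
query_terms = [term for piece in phone.split() for term in analyze(piece)]
matches = [term for term in query_terms if term in indexed_terms]

# Analyzing the whole input at once (what a phrase-style clause on the
# same field would need) yields the single indexed term.
phrase_terms = analyze(phone)

print("indexed:", sorted(indexed_terms))
print("query:  ", query_terms)
print("matches:", matches)
print("phrase: ", phrase_terms, phrase_terms[0] in indexed_terms)
```

The per-piece query terms (`49`, `1234`, `12345678`) never equal the single indexed term `49123412345678`, while the whole-input analysis does. In the thread, edismax did not emit that pf clause for the KeywordTokenizer-based field; this toy model does not explain why.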