Vincenzo, thanks for the response. I know that the Keyword Tokenizer alone does not do much. As pointed out at the end of my initial mail, I'm applying a pattern replace for everything non-numeric to make it actually useful.
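The chain described above (keyword tokenizer followed by a pattern replace that removes everything non-numeric) can be modelled in a few lines of Python. This is a sketch, not Solr code. The function names are hypothetical, and the `[^0-9]` pattern is an assumption, since the actual config from the original mail did not survive. It shows that the indexed value collapses into one token, while the whitespace-split query clauses each become a different, shorter token, so none of them matches on their own.

```python
import re
from typing import List


def keyword_tokenize(text: str) -> List[str]:
    # Models KeywordTokenizer: the entire input is emitted as a single token.
    return [text]


def strip_non_digits(tokens: List[str]) -> List[str]:
    # Models a pattern replace of every non-digit with "" (pattern is an assumption).
    return [re.sub(r"[^0-9]", "", t) for t in tokens]


def analyze(text: str) -> List[str]:
    # Hypothetical helper: keyword tokenizer followed by the non-digit strip.
    return strip_non_digits(keyword_tokenize(text))


indexed = analyze("+49 1234 12345678")
# Simplification: edismax splits the user query on whitespace and treats the
# leading "+" as an operator; stripping non-digits gives the same "49" here.
clauses = "+49 1234 12345678".split()
per_clause = [analyze(c) for c in clauses]

print(indexed)     # ['49123412345678']
print(per_clause)  # [['49'], ['1234'], ['12345678']]
```

Because each clause is analysed separately, the qf part can only ever search for `49`, `1234` or `12345678`, and none of those equals the single indexed token.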
And especially because of the whitespace-based tokenization, I'd like to use the very same field once again as a phrase field to get around this issue. Shawn mentioned in #solr in the meantime that there is SOLR-9185, which is similar and would be helpful, but it is currently still very much in the works.

The Standard Tokenizer you've mentioned does split on whitespace, as edismax does by default in the first place, so I'm not sure how that would help. For now, I don't want partial matches on phone numbers, at least not yet.

-Stefan

On November 7, 2016 at 4:41:50 PM, Vincenzo D'Amore (v.dam...@gmail.com) wrote:
> Hi Stefan,
>
> I think the problem is solr.KeywordTokenizerFactory.
> This tokeniser does not tokenise the string at all; it returns
> exactly what you have.
>
> '+49 1234 12345678' -> '+49 1234 12345678'
>
> On the other hand, using edismax you are looking for '+49', '1234' and
> '12345678', and none of these keywords match your phone_number field.
>
> Try using a different tokenizer like solr.StandardTokenizerFactory; this
> should change your results.
>
> Bests,
> Vincenzo
>
> On Mon, Nov 7, 2016 at 4:05 PM, Stefan Matheis wrote:
>
> > I'm guessing that I'm missing something obvious here, so feel free to
> > ask for more details and also to point out other directions I should
> > be following.
> >
> > The problem goes as follows: the input in one case might be a phone
> > number (like +49 1234 12345678). Since we're using edismax, the parts
> > get split on whitespace, which is fine. Bringing the same field
> > (based on TextField) to the party (using qf) doesn't change a thing.
> >
> > > responseHeader:
> > >   params:
> > >     q: '+49 1234 12345678'
> > >     defType: edismax
> > >     qf: person_mobile
> > >     pf: person_mobile^5
> > > debug:
> > >   rawquerystring: '+49 1234 12345678'
> > >   querystring: '+49 1234 12345678'
> > >   parsedquery: '(+(+DisjunctionMaxQuery((person_mobile:49)) DisjunctionMaxQuery((person_mobile:1234)) DisjunctionMaxQuery((person_mobile:12345678))) ())/no_coord'
> > >   parsedquery_toString: '+(+(person_mobile:49) (person_mobile:1234) (person_mobile:12345678)) ()'
> >
> > But as far as I was able to narrow down the culprit, that only happens
> > when I'm using solr.KeywordTokenizerFactory. As soon as I change that
> > to solr.StandardTokenizerFactory, the phrase query appears as expected:
> >
> > > responseHeader:
> > >   params:
> > >     q: '+49 1234 12345678'
> > >     defType: edismax
> > >     qf: person_mobile
> > >     pf: person_mobile^5
> > > debug:
> > >   rawquerystring: '+49 1234 12345678'
> > >   querystring: '+49 1234 12345678'
> > >   parsedquery: '(+(+DisjunctionMaxQuery((person_mobile:49)) DisjunctionMaxQuery((person_mobile:1234)) DisjunctionMaxQuery((person_mobile:12345678))) DisjunctionMaxQuery(((person_mobile:"49 1234 12345678")^5.0)))/no_coord'
> > >   parsedquery_toString: '+(+(person_mobile:49) (person_mobile:1234) (person_mobile:12345678)) ((person_mobile:"49 1234 12345678")^5.0)'
> >
> > Removing the + at the beginning doesn't make a difference either
> > (just mentioning it since tokee already asked about this on #solr, where
> > I brought up the question earlier).
> >
> > It's absolutely possible I'm focusing on a very wrong assumption, but
> > since switching the tokenizer results in such a large behaviour change,
> > I think something spooky is going on here.
> >
> > I've read older issues and posts from the list; some of them pointed
> > out that it might be an optimization edismax brings to the table, but
> > I didn't find anything specific about that.
> >
> > Oh, and btw: if that were working, my idea is to drop everything in a
> > given phrase that is not a number, to match the phone number, like this:
> >
> > replacement=""/>
> >
> > Any thoughts? Or wild guesses?
> >
> > Thanks
> > Stefan
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
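The difference between the two debug outputs can be reproduced with a toy model of the pf step. This is an assumption inferred from the empty `()` in the first output, not a reading of the edismax source. In the model, the whole query string is run through the field's analyzer, and a phrase clause is emitted only when that produces at least two tokens. `keyword_digits_only`, `standard_like` and `phrase_clause` are hypothetical names. `standard_like` is only a rough stand-in for StandardTokenizer that keeps runs of letters and digits.

```python
import re
from typing import Callable, List, Optional

Analyzer = Callable[[str], List[str]]


def keyword_digits_only(text: str) -> List[str]:
    # Keyword tokenizer + non-digit pattern replace: always exactly one token.
    return [re.sub(r"[^0-9]", "", text)]


def standard_like(text: str) -> List[str]:
    # Rough approximation of StandardTokenizer (assumption): alphanumeric runs only.
    return re.findall(r"[0-9A-Za-z]+", text)


def phrase_clause(field: str, user_query: str, analyzer: Analyzer,
                  boost: float) -> Optional[str]:
    # Hypothetical model of pf: a phrase needs at least two analysed tokens.
    tokens = analyzer(user_query)
    if len(tokens) < 2:
        return None  # corresponds to the empty "()" in the first debug output
    phrase = " ".join(tokens)
    return f'({field}:"{phrase}")^{boost}'


q = "+49 1234 12345678"
print(phrase_clause("person_mobile", q, keyword_digits_only, 5.0))  # None
print(phrase_clause("person_mobile", q, standard_like, 5.0))
# (person_mobile:"49 1234 12345678")^5.0
```

With the keyword-based chain, the whole input collapses into a single token, so the model has nothing to build a phrase from. With the whitespace-style split, the model's output matches the phrase clause shown in the second debug output.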