Hi Stefan,

I think the problem is solr.KeywordTokenizerFactory. This tokenizer does not split the string at all; it returns the whole input as a single token, exactly as you passed it in.
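Here is a rough sketch of what I mean, in plain Python rather than Solr code. The two helper functions are made up for illustration and only mimic the behaviour; real edismax parsing does more than this (for example, it treats a leading '+' as an operator).

```python
def keyword_tokenize(text):
    """Hypothetical stand-in for a keyword tokenizer: the whole input is one token."""
    return [text]


def whitespace_split(query):
    """Hypothetical stand-in for the query parser splitting on whitespace."""
    return query.split()


# What ends up in the index for the field value.
indexed = keyword_tokenize("+49 1234 12345678")

# What the query side looks for after splitting on whitespace.
query_terms = whitespace_split("+49 1234 12345678")

# None of the individual query terms equals the single indexed token.
matches = [term for term in query_terms if term in indexed]

print(indexed)
print(query_terms)
print(matches)
```

The point is only that one big indexed token can never equal any of the smaller query terms.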
So at index time:

'+49 1234 12345678' -> '+49 1234 12345678'

On the other hand, edismax splits your query on whitespace, so you are searching for '+49', '1234' and '12345678', and none of these terms matches what is indexed in your phone_number field.

Try a different tokenizer such as solr.StandardTokenizerFactory; that should change your results.

Best,
Vincenzo

On Mon, Nov 7, 2016 at 4:05 PM, Stefan Matheis <matheis.ste...@gmail.com> wrote:

> I’m guessing that i’m missing something obvious here - so feel free to
> ask for more details and as well point out other directions i should
> be following.
>
> the problem goes as follows: the input in one case might be a phone
> number (like +49 1234 12345678), since we’re using edismax the parts
> get split on whitespace - which is fine. bringing the same field
> (based on TextField) to the party (using qf) doesn’t change a thing.
>
> > responseHeader:
> >   params:
> >     q: '+49 1234 12345678'
> >     defType: edismax
> >     qf: person_mobile
> >     pf: person_mobile^5
> > debug:
> >   rawquerystring: '+49 1234 12345678'
> >   querystring: '+49 1234 12345678'
> >   parsedquery: '(+(+DisjunctionMaxQuery((person_mobile:49)) DisjunctionMaxQuery((person_mobile:1234)) DisjunctionMaxQuery((person_mobile:12345678))) ())/no_coord'
> >   parsedquery_toString: '+(+(person_mobile:49) (person_mobile:1234) (person_mobile:12345678)) ()'
>
> but .. as far as i was able to reduce the culprit, that only happens
> when i’m using solr.KeywordTokenizerFactory. as soon as i’m changing
> that to solr.StandardTokenizerFactory the phrase query appears as
> expected:
>
> > responseHeader:
> >   params:
> >     q: '+49 1234 12345678'
> >     defType: edismax
> >     qf: person_mobile
> >     pf: person_mobile^5
> > debug:
> >   rawquerystring: '+49 1234 12345678'
> >   querystring: '+49 1234 12345678'
> >   parsedquery: '(+(+DisjunctionMaxQuery((person_mobile:49)) DisjunctionMaxQuery((person_mobile:1234)) DisjunctionMaxQuery((person_mobile:12345678))) DisjunctionMaxQuery(((person_mobile:"49 1234 12345678")^5.0)))/no_coord'
> >   parsedquery_toString: '+(+(person_mobile:49) (person_mobile:1234) (person_mobile:12345678)) ((person_mobile:"49 1234 12345678")^5.0)'
>
> removing the + at the beginning doesn’t make a difference either
> (just mentioning since tokee already asked this on #solr, where i’ve
> brought up the question earlier)
>
> it’s absolutely possible i’m focusing on a very wrong assumption - but
> since switching the tokenizer does result in such a rather large
> behaviour change, i think something is spooky here.
>
> i’ve read older issues and posts from the list, some of them pointed
> out that it might be an optimization that edismax brings to the table -
> i didn’t find anything specific about that.
>
> oh, and btw: if that were working - my idea is to drop
> everything from a given phrase that is not a digit, to match the phone
> number, like this:
>
> > <fieldType name="phone_number" class="solr.TextField">
> >   <analyzer>
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.PatternReplaceFilterFactory" pattern="[^\d]" replacement=""/>
> >   </analyzer>
> > </fieldType>
>
> any thoughts? or wild guesses?
>
> Thanks
> Stefan

-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251