Re: Any filter to map multiple tokens into one ?

Konrad Lötzsch Fri, 12 Oct 2012 00:04:46 -0700

You can build shingles and then apply the synonym filter. In that case you will have to decide what to do with all the tokens you no longer need after the shingle filter.
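To illustrate the idea, here is a minimal sketch in plain Python. It does not use the Solr or Lucene API; the names `shingle`, `fold` and `SYNONYMS` are hypothetical and exist only for this example. It builds shingles of up to three tokens, maps the shingle "* : *" to "*:*" through a synonym table, and drops the shorter tokens that the matched shingle covers.

```python
# Hypothetical illustration only: plain Python, not Solr/Lucene code.

# Synonym table: shingle text (tokens joined by a space) -> replacement token.
SYNONYMS = {"* : *": "*:*"}


def shingle(tokens, max_size=3):
    """Return (start, text, length) for every run of 1..max_size adjacent tokens."""
    return [
        (i, " ".join(tokens[i:i + n]), n)
        for i in range(len(tokens))
        for n in range(1, max_size + 1)
        if i + n <= len(tokens)
    ]


def fold(tokens, synonyms=SYNONYMS, max_size=3):
    """Replace any shingle found in `synonyms` by its single-token replacement.

    Tokens covered by a matched shingle are dropped, which is the
    "tokens you no longer need" cleanup step.
    """
    # Group the shingles by the position at which they start.
    by_start = {}
    for start, text, length in shingle(tokens, max_size):
        by_start.setdefault(start, []).append((length, text))

    out = []
    i = 0
    while i < len(tokens):
        # Try the longest shingle starting at position i first.
        for length, text in sorted(by_start[i], reverse=True):
            if text in synonyms:
                out.append(synonyms[text])
                i += length  # skip the tokens the shingle covered
                break
        else:
            # No shingle starting here has a mapping: keep the token as-is.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    print(fold(["*", ":", "*"]))
    print(fold(["foo", "*", ":", "*", "bar"]))
```

In a real analyzer chain the shingle filter emits the unigrams and shorter shingles alongside the three-token shingle, so the cleanup that `fold` does implicitly would need its own filtering step.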


On 12.10.2012 01:35, T. Kuro Kurosaka wrote:

I am looking for a way to fold a particular sequence of tokens into one token. Concretely, I'd like to detect a three-token sequence of "*", ":" and "*", and replace it with a single token with the text "*:*". I tried SynonymFilter, but it seems it can only deal with a single input token: "* : * => *:*" seems to be interpreted as one input token of 5 characters: "*", space, ":", space and "*".

I'm using Solr 3.5.

Background:
My tokenizer separates the three-character sequence "*:*" into 3 tokens of one character each. The edismax parser, when given the query "*:*", i.e. find every doc, seems to pass the entire string "*:*" to the query analyzer (I suspect a bug) and feeds the tokenized result to a DisjunctionMaxQuery object, according to this debug output:

<lst name="debug">
<str name="rawquerystring">*:*</str>
<str name="querystring">*:*</str>
<str name="parsedquery">+MatchAllDocsQuery(*:*)DisjunctionMaxQuery((body:"* : *"~100^0.5 | title:"* :*"~100^1.2)~0.01)</str>
<str name="parsedquery_toString">+*:* (body:"* : *"~100^0.5 | title:"*: *"~100^1.2)~0.01</str>
Notice that there is a space between * and : in DisjunctionMaxQuery((body:"* : *" ....)
Probably because of this, the hit score is as low as 0.109, while it is 1.000 if an analyzer that doesn't break "*:*" is used. So I'd like to stitch "*", ":", "*" back together into "*:*" to make DisjunctionMaxQuery happy.
Thanks.


T. "Kuro" Kurosaka
