Any filter to map multiple tokens into one?

T. Kuro Kurosaka Thu, 11 Oct 2012 16:36:25 -0700

I am looking for a way to fold a particular sequence of tokens into one
token. Concretely, I'd like to detect a three-token sequence of "*", ":" and
"*", and replace it with a single token with the text "*:*". I tried
SynonymFilter, but it seems it can only deal with a single input token:
"* : * => *:*" seems to be interpreted as one input token of 5 characters,
"*", space, ":", space and "*".
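
To make the desired folding concrete, here is a minimal sketch in plain Java of the behavior I'm after. It works on a plain list of strings, not on a Lucene TokenStream, and the names TokenFolder and fold are hypothetical, made up for illustration; they are not part of Solr or Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not a Solr or Lucene API.
public class TokenFolder {

    // Replaces every non-overlapping occurrence of `pattern` in `tokens`
    // with the single token `replacement`, scanning left to right.
    public static List<String> fold(List<String> tokens,
                                    List<String> pattern,
                                    String replacement) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        List<String> out = new ArrayList<String>();
        int i = 0;
        while (i < tokens.size()) {
            int end = i + pattern.size();
            if (end <= tokens.size() && tokens.subList(i, end).equals(pattern)) {
                // The whole sequence matched: emit one token and skip past it.
                out.add(replacement);
                i = end;
            } else {
                // No match starting here: pass the token through unchanged.
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> pattern = Arrays.asList("*", ":", "*");
        // prints [*:*]
        System.out.println(fold(Arrays.asList("*", ":", "*"), pattern, "*:*"));
        // prints [a, *:*, b]
        System.out.println(fold(Arrays.asList("a", "*", ":", "*", "b"), pattern, "*:*"));
    }
}
```

A real filter would presumably have to do the same matching with lookahead over the token stream and also fix up offsets and position increments, which this sketch ignores.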


I'm using Solr 3.5.

Background:

My tokenizer separates the three-character sequence "*:*" into 3 tokens
of one character each. The edismax parser, when given the query "*:*"
(i.e. find every doc), seems to pass the entire string "*:*" to the query
analyzer (I suspect a bug) and feed the tokenized result to a
DisjunctionMaxQuery object, according to this debug output:

<lst name="debug">
<str name="rawquerystring">*:*</str>
<str name="querystring">*:*</str>

<str name="parsedquery">+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((body:"* : *"~100^0.5 | title:"* :*"~100^1.2)~0.01)</str>
<str name="parsedquery_toString">+*:* (body:"* : *"~100^0.5 | title:"* :*"~100^1.2)~0.01</str>

Notice that there is a space between * and : in
DisjunctionMaxQuery((body:"* : *" ....)

Probably because of this, the hit score is as low as 0.109, while it is
1.000 if an analyzer that doesn't break "*:*" is used. So I'd like to
stitch together "*", ":", "*" into "*:*" again to make
DisjunctionMaxQuery happy.



Thanks.


T. "Kuro" Kurosaka
