The ":" which normally separates a field name from a term (or quoted string
or parenthesized sub-query) is "parsed" by the query parser before analysis
gets called, and "*:*" is recognized before analysis as well. So, any
attempt to recreate "*:*" in analysis will be too late to affect query
parsing and other pre-analysis processing.
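
To see this concretely, here is a minimal sketch against the Lucene 3.5
classic QueryParser (edismax has its own parser on top, but the same
pre-analysis recognition applies). Class and package names are as of
Lucene 3.x, and the analyzer passed in is irrelevant to the result:

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class MatchAllDemo {
  public static void main(String[] args) throws Exception {
    // "*:*" is special-cased during parsing; the analyzer never sees it.
    QueryParser parser = new QueryParser(Version.LUCENE_35, "body",
        new WhitespaceAnalyzer(Version.LUCENE_35));
    Query q = parser.parse("*:*");
    System.out.println(q.getClass().getSimpleName()); // MatchAllDocsQuery
  }
}
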
But, what is it you are really trying to do? What's the real problem? (This
sounds like a proverbial "XY Problem".)
-- Jack Krupansky
-----Original Message-----
From: T. Kuro Kurosaka
Sent: Thursday, October 11, 2012 7:35 PM
To: solr-user@lucene.apache.org
Subject: Any filter to map multiple tokens into one?
I am looking for a way to fold a particular sequence of tokens into one
token.
Concretely, I'd like to detect a three-token sequence of "*", ":" and
"*", and replace it with a token of the text "*:*".
I tried SynonymFilter, but it seems it can only deal with a single input
token. The rule "* : * => *:*" seems to be interpreted
as one input token of 5 characters: "*", space, ":", space, and "*".
I'm using Solr 3.5.
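
Failing an existing filter, what I have in mind is a small custom
TokenFilter along these lines. This is only an untested sketch against the
Lucene 3.5 TokenStream/attribute API; StarColonStarFilter is a name I made
up, and for brevity it only rewrites the term text (offsets and position
increments of merged or replayed tokens are not adjusted):

import java.io.IOException;
import java.util.LinkedList;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class StarColonStarFilter extends TokenFilter {

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  // Tokens consumed during a failed look-ahead, waiting to be replayed.
  private final LinkedList<String> pending = new LinkedList<String>();

  public StarColonStarFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    // Replay buffered tokens before reading new ones. (A replayed "*" is
    // not re-checked as the start of a new pattern -- good enough here.)
    if (!pending.isEmpty()) {
      termAtt.setEmpty().append(pending.removeFirst());
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    if (!"*".equals(termAtt.toString())) {
      return true; // unrelated token: pass through unchanged
    }
    // Saw "*": look ahead for ":" followed by a second "*".
    if (input.incrementToken()) {
      String second = termAtt.toString();
      if (":".equals(second) && input.incrementToken()) {
        String third = termAtt.toString();
        if ("*".equals(third)) {
          termAtt.setEmpty().append("*:*"); // collapse the three tokens
          return true;
        }
        pending.add(":");
        pending.add(third); // pattern broke after ":": replay both later
      } else {
        pending.add(second); // pattern broke right away: replay it later
      }
    }
    // No match: emit the initial "*" as-is now.
    termAtt.setEmpty().append("*");
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pending.clear();
  }
}
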
Background:
My tokenizer separates the three-character sequence "*:*" into 3 tokens
of one character each.
The edismax parser, when given the query "*:*", i.e. "find every doc",
seems to pass the entire string "*:*" to the query analyzer (I suspect
a bug) and feed the tokenized result to a DisjunctionMaxQuery object,
according to this debug output:
<lst name="debug">
<str name="rawquerystring">*:*</str>
<str name="querystring">*:*</str>
<str name="parsedquery">+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01)</str>
<str name="parsedquery_toString">+*:* (body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01</str>
Notice that there is a space between "*" and ":" in
DisjunctionMaxQuery((body:"* : *" ...).
Probably because of this, the hit score is as low as 0.109, while it is
1.000 if an analyzer that doesn't break "*:*" is used.
So I'd like to stitch together "*", ":", "*" into "*:*" again to make
DisjunctionMaxQuery happy.
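
If I end up writing such a filter myself, I assume plugging it into the
field type's query analyzer would also need a small factory. In Solr 3.x
I believe custom filter factories extend
org.apache.solr.analysis.BaseTokenFilterFactory; another untested sketch,
reusing the hypothetical StarColonStarFilter from above:

import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;

public class StarColonStarFilterFactory extends BaseTokenFilterFactory {
  @Override
  public TokenStream create(TokenStream input) {
    // Wrap the incoming stream with the token-merging filter sketched above.
    return new StarColonStarFilter(input);
  }
}

The factory class would then be referenced from a filter element in the
fieldType's query analyzer in schema.xml.
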
Thanks.
T. "Kuro" Kurosaka