Re: international characters in facet.prefix

Erick Erickson Wed, 07 Jun 2017 20:57:04 -0700

If you require that the facets show both the folded and non-folded
versions, then you have no choice except to index both somehow.

But I think you're saying that you expect "néd" and "ned" to be
counted in one bucket. Then, indeed, you have to somehow pre-apply the
relevant filters. You can do that in the client code, or you could
write a QueryComponent that intercepts the query (probably registered as a
first-component) and "does the right thing". The advantage there is
that, since it runs on the server, it has full access to the
analysis chain and can force the token through selected parts
of the chain without any change to the client code.
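On the client side, pre-applying the filters might look something like the Python sketch below. It is an approximation under stated assumptions. It only mimics lowercasing plus accent stripping via Unicode NFKD decomposition, which is not the same as ASCIIFoldingFilterFactory (that filter also maps characters such as "æ" to "ae" and "ß" to "ss"). It also knows nothing about custom MappingCharFilterFactory mapping files. The helper `fold_prefix` and the field name `name` are hypothetical.

```python
import unicodedata
from urllib.parse import urlencode


def fold_prefix(prefix: str) -> str:
    """Hypothetical helper: approximate lowercase + accent folding on the client.

    Assumption: the faceted field lowercases and accent-folds at index time.
    This is NOT an exact copy of ASCIIFoldingFilterFactory or of any
    MappingCharFilterFactory mapping file.
    """
    # NFKD decomposes e.g. "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", prefix)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Mirror a lowercasing filter in the field's analysis chain.
    return stripped.lower()


# Build facet parameters with the folded prefix ("name" is a hypothetical field).
params = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.field": "name",
    "facet.prefix": fold_prefix("Néd"),
}
print(urlencode(params))
```

With this, "Néd", "NED" and "ned" all become the prefix `ned`, which is what a lowercased, folded index would hold. Anything the real chain does beyond that still has to be replicated by hand.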

I say "parts of the chain" because some things just wouldn't make
sense. Say you had WordDelimiterFilterFactory in your chain. If your
prefix has a change in case, you'd get two tokens, definitely not what
you want. That's one of the reasons facet prefixes aren't analyzed by
default. Another gotcha would be, say, stemming: facet.prefix=runn
isn't stemmed the way "runner" would be, for instance. In fact, the
prefix isn't stemmed at all.
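To make the WordDelimiterFilterFactory point concrete, here is a toy Python stand-in for its split-on-case-change behavior. It is a rough, ASCII-only regex approximation assumed for illustration, not Lucene's actual algorithm, and `split_like_word_delimiter` is a hypothetical name. A user-typed prefix comes out as fragments rather than one prefix.

```python
import re
from typing import List

# ASCII-only, hypothetical approximation of WordDelimiterFilterFactory's
# splitOnCaseChange/splitOnNumerics behavior; Lucene's real rules are richer.
_PARTS = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+")


def split_like_word_delimiter(token: str) -> List[str]:
    """Split a token at case changes and letter/digit boundaries."""
    return _PARTS.findall(token)


# A user typing the start of "McDonald" would end up with two fragments,
# neither of which is a usable facet prefix on its own.
print(split_like_word_delimiter("McD"))  # ['Mc', 'D']
print(split_like_word_delimiter("PowerShot"))  # ['Power', 'Shot']
```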

Note that case sensitivity matters here too. If your field lowercases at
index time and you specified a prefix of "Ned", I don't think you'd get
anything counted in that bucket.
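A small Python simulation of the case (and accent) sensitivity, assuming the field lowercases and folds at index time. The terms are made up and `facet_prefix_counts` is a hypothetical helper. It treats facet.prefix as a plain, unanalyzed `startswith()` over the indexed terms, so a capitalized or accented prefix matches nothing.

```python
from collections import Counter
from typing import Dict, Iterable


def facet_prefix_counts(indexed_terms: Iterable[str], prefix: str) -> Dict[str, int]:
    """Rough simulation of facet.prefix: the prefix is compared, unanalyzed,
    against terms exactly as they were indexed (case- and accent-sensitive)."""
    counts = Counter(indexed_terms)
    return {term: n for term, n in counts.items() if term.startswith(prefix)}


# Hypothetical index contents, one entry per document: "Néd", "NED" and "ned"
# were all analyzed (lowercased + folded) into the single term "ned".
indexed = ["ned", "ned", "ned", "nelson"]

print(facet_prefix_counts(indexed, "Ned"))  # {}
print(facet_prefix_counts(indexed, "né"))   # {}
print(facet_prefix_counts(indexed, "ne"))   # {'ned': 3, 'nelson': 1}
```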

If I were going to make a QueryComponent out of it, I'd probably just
define a new field whose analyzer has only the selected filters
(lowercasing, folding, etc.) and force the prefix through that.

Here's some background on the general problem:
https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/

Skimming that again, it _does_ seem possible that sending a facet
prefix through the analysis chain as though it were a wildcarded term
would do what you're asking, but nobody has yet volunteered to write
the code. It would probably require a new facet parameter, something
like facet.analyze=true.

But frankly I think that's overkill. My bet is that you could do this
on the client side "well enough" and much more quickly....

Best,
Erick

On Wed, Jun 7, 2017 at 6:03 PM, arik <arik...@gmail.com> wrote:
> Thanks Erick, indeed your hunch is correct: it's the analyzing filters that
> facet.prefix seems to bypass, and getting rid of my
> ASCIIFoldingFilterFactory and MappingCharFilterFactory makes it work OK.
>
> The problem is I need those filters; otherwise, how should I create facets
> which match against both Anglicized and international prefix
> spellings? I could of course maintain separate fields and do multiple
> queries, but that seems to get out of hand quickly if I also want to
> support mixed case and other filtering dimensions.
>
> Is there a way to route facet.prefix through the field type filters like all
> the other params? I suppose I could manually instantiate and pre-apply the
> filters in the client code... any other ideas?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/international-characters-in-facet-prefix-tp4339415p4339534.html
> Sent from the Solr - User mailing list archive at Nabble.com.
