Re: Case Insensitive Matching in Solr/Lucene

Erick Erickson Tue, 25 Nov 2014 15:01:04 -0800

DocValues are restricted to certain types of untokenized fields,
specifically string, Trie* and UUID. Those types have no analysis
chain, so LowerCaseFilter is just not even in the picture.

Furthermore, switching a field to DocValues requires a complete re-index.
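
For what it's worth, the two-field (copyField) approach Alexandre
describes in the quoted text can be modelled in a few lines of Python.
This is only a toy sketch, not Solr code: analyze() is a hypothetical
stand-in for a tokenizer plus lowercase filter, and the two dicts stand
in for an analyzed search field and an untokenized string field that
would carry the facet values.

```python
import re
from collections import Counter


def analyze(text):
    """Hypothetical stand-in for a tokenizer followed by a lowercase filter."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def build_index(docs):
    """Store each value twice: analyzed terms for search, raw string for facets."""
    search_terms = {}  # doc id -> set of lowercased tokens (analyzed field)
    facet_values = {}  # doc id -> original value (copy in a string field)
    for doc_id, value in docs.items():
        search_terms[doc_id] = set(analyze(value))
        facet_values[doc_id] = value
    return search_terms, facet_values


def search(search_terms, query):
    """Return ids of docs containing every query token, ignoring case."""
    wanted = set(analyze(query))
    return sorted(d for d, terms in search_terms.items() if wanted <= terms)


def facet(facet_values, doc_ids):
    """Count the original, unmodified values across the matching docs."""
    return Counter(facet_values[d] for d in doc_ids)


docs = {
    1: "United States of America",
    2: "United Kingdom",
    3: "United States of America",
}
search_terms, facet_values = build_index(docs)
hits = search(search_terms, "UNITED states")
counts = facet(facet_values, hits)
```

The query is matched against the lowercased tokens, while the facet
counts are keyed on the untouched strings, which is why the two jobs
end up in two separate fields.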

Best,
Erick

On Tue, Nov 25, 2014 at 1:26 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 11/25/2014 6:27 AM, Alexandre Rafalovitch wrote:
>> The usual solution is to have faceting using the other field (with
>> copyField). Usually it is because people want the original unmodified
>> version of the string without tokenization (So, "United States of
>> America" instead of "united" "states" "america"). It sounds like your
>> case is a little different and you do want tokenized values, just not
>> lowercased.
>
> Something I've been wondering about related to facets.  This might be a
> tangent from the original issue, but it's somewhat related, so I'm
> asking it here.
>
> It's my understanding that DocValues have the same info as stored fields
> -- that is, the original value, completely unmodified by the analysis chain.
>
> It's also my understanding that DocValues get used for sorting and
> facets if they are present.
>
> If both of these assumptions/understandings are correct, then I would
> think that simply turning on DocValues for a field with the lowercase
> filter (and reindexing) would allow case-insensitive queries *plus*
> facets with the original unmodified and untokenized values.
>
> Have I got completely the wrong idea?  I haven't tested any of this.
>
> Thanks,
> Shawn
>
