Re: Case Insensitive Matching in Solr/Lucene

Ahmet Arslan Tue, 25 Nov 2014 03:59:16 -0800

Hi Apurv,

I wouldn't worry about index size: adding the second field won't simply double 
(2x) the size of the index.
Please see a similar discussion:
https://issues.apache.org/jira/browse/LUCENE-5620


Ahmet


On Tuesday, November 25, 2014 1:46 PM, Ahmet Arslan <iori...@yahoo.com.INVALID> wrote:



Hi Apurv,

You can create an additional field for case-sensitive search and then switch 
between the fields at query time. You will have two fields (text_ci and 
text_lower) with different analysers, both populated via copyField.
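A rough Python model of the two-field idea: the same source text is analyzed twice, once preserving case and once lowercased, much as copyField feeds one source field into two differently analyzed fields. The query then picks a field. This is a toy sketch, not Solr code. The field name `text_cs` and the two analyzer functions are hypothetical stand-ins for real Solr analyzer chains. The two sample sentences are an assumption reconstructed from the terms in Apurv's table.

```python
import re
from collections import defaultdict


# Hypothetical stand-ins for two analyzer chains: both split on word
# characters; only the second one lowercases each token.
def case_preserving_analyzer(text):
    return re.findall(r"\w+", text)


def lowercase_analyzer(text):
    return [token.lower() for token in re.findall(r"\w+", text)]


FIELD_ANALYZERS = {
    "text_cs": case_preserving_analyzer,  # hypothetical case-sensitive field
    "text_lower": lowercase_analyzer,     # lowercased field
}


def index_documents(docs):
    """Build one inverted index per field from the same source text,
    roughly what a copyField into two fields achieves."""
    index = {field: defaultdict(set) for field in FIELD_ANALYZERS}
    for doc_id, text in docs.items():
        for field, analyze in FIELD_ANALYZERS.items():
            for term in analyze(text):
                index[field][term].add(doc_id)
    return index


def search(index, word, case_sensitive):
    """Choose the field at query time and analyze the query text with
    that field's analyzer before looking up its terms."""
    field = "text_cs" if case_sensitive else "text_lower"
    results = set()
    for term in FIELD_ANALYZERS[field](word):
        results |= index[field].get(term, set())
    return sorted(results)


# Assumed sample documents, reconstructed from the table's terms.
docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
}
index = index_documents(docs)
print(search(index, "Quick", case_sensitive=True))   # ['Doc_2']
print(search(index, "Quick", case_sensitive=False))  # ['Doc_1', 'Doc_2']
```

The cost of this design is that every term is indexed twice, once per field. In exchange, both kinds of match are plain lookups with no query-time expansion.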

Ahmet



On Tuesday, November 25, 2014 1:39 PM, Apurv Verma <ap...@bloomreach.com> wrote:
Hey all,
The standard solution for case-insensitive matching in Lucene is to
use a lowercase filter at both index and query time. However, this does not
preserve the case of the original document's terms. For example, suppose my
inverted index is:

Term    | Doc_1 | Doc_2
--------+-------+------
Quick   |       |  X
The     |   X   |
brown   |   X   |  X
dog     |   X   |
dogs    |       |  X
fox     |   X   |
foxes   |       |  X
in      |       |  X
jumped  |   X   |
lazy    |   X   |  X
leap    |       |  X
over    |   X   |  X
quick   |   X   |
summer  |       |  X
the     |   X   |
--------+-------+------
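The following rough Python sketch rebuilds this table and then repeats the build with lowercasing, to show which rows a lowercase filter merges. The two sample sentences are an assumption reconstructed from the terms shown, not the original documents. Sorting by code point puts uppercase terms first, which matches the row order above.

```python
import re

# Assumed sample documents, reconstructed from the terms in the table.
docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
}


def build_index(docs, lowercase=False):
    """Map each term to the set of documents containing it. With
    lowercase=True this mimics a lowercase filter at index time."""
    index = {}
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text):
            term = token.lower() if lowercase else token
            index.setdefault(term, set()).add(doc_id)
    return index


case_sensitive = build_index(docs)
lowercased = build_index(docs, lowercase=True)

# Code point order puts "Quick" and "The" before the lowercase terms.
for term in sorted(case_sensitive):
    marks = ["X" if d in case_sensitive[term] else " " for d in ("Doc_1", "Doc_2")]
    print(f"{term:<8}|   {marks[0]}   |  {marks[1]}")

# Lowercasing merges "Quick"/"quick" and "The"/"the".
print(len(case_sensitive), len(lowercased))  # 15 13
```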

Is it possible to choose between case-insensitive and case-sensitive matching
at query time? The index is stored in memory in Solr. My question is: if it
is stored as a hashmap with string keys, can I override the hash code so that
"Quick" and "quick" return the same hash value?

Has anyone attempted this before? Is my assumption about the index right?
Which classes and code flow should I look at?
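Lucene keeps its terms in a sorted terms dictionary rather than a Java HashMap, so the premise doesn't map directly onto Lucene's internals. The idea itself can still be tried on a toy Python dict. `CaseInsensitiveKey` is a hypothetical class, not a Lucene or Solr API. As with Java's `hashCode`/`equals` contract, overriding the hash alone is not enough: equality has to be overridden consistently, or the two spellings remain separate entries.

```python
class CaseInsensitiveKey:
    """Hypothetical dict key: hashes and compares on the case-folded term,
    while remembering the spelling it was created with."""

    def __init__(self, term):
        self.term = term

    def __hash__(self):
        # Same hash for "Quick" and "quick"...
        return hash(self.term.casefold())

    def __eq__(self, other):
        # ...and equal, otherwise they would still be separate dict entries.
        return (isinstance(other, CaseInsensitiveKey)
                and self.term.casefold() == other.term.casefold())

    def __repr__(self):
        return f"CaseInsensitiveKey({self.term!r})"


# Toy postings map fed with the two spellings from the table.
postings = {}
for doc_id, term in [("Doc_1", "quick"), ("Doc_2", "Quick")]:
    postings.setdefault(CaseInsensitiveKey(term), set()).add(doc_id)

print(len(postings))                                  # 1
print(sorted(postings[CaseInsensitiveKey("QUICK")]))  # ['Doc_1', 'Doc_2']
print(sorted(postings[CaseInsensitiveKey("Quick")]))  # ['Doc_1', 'Doc_2']
```

The last line is the catch. Once the keys collide, a lookup for "Quick" can no longer be told apart from one for "quick", so the map behaves just like lowercasing at index time. Supporting both modes needs either two separate sets of terms (the copyField approach) or a case-preserving dictionary whose case variants are expanded at query time.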

-- 
Regards,
Apurv
