Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Erick Erickson
DocValues are restricted to certain types of untokenized fields, specifically string, Trie* and UUID. So LowerCaseFilter is not even in the picture. Furthermore, changing to DocValues requires completely re-indexing, so Best, Erick On Tue, Nov 25, 2014 at 1:26 PM, Shawn Heisey wrote: >

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Shawn Heisey
On 11/25/2014 6:27 AM, Alexandre Rafalovitch wrote: > The usual solution is to have faceting using the other field (with > copyField). Usually it is because people want the original unmodified > version of the string without tokenization (So, "United States of > America" instead of "united" "states" "

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Ahmet Arslan
Hi, CapitalizationFilterFactory could be useful to build nice-looking facet values. Ahmet On Tuesday, November 25, 2014 3:28 PM, Alexandre Rafalovitch wrote: The usual solution is to have faceting using the other field (with copyField). Usually it is because people want the original unmo

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Alexandre Rafalovitch
The usual solution is to have faceting using the other field (with copyField). Usually it is because people want the original unmodified version of the string without tokenization (So, "United States of America" instead of "united" "states" "america"). It sounds like your case is a little different an

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Apurv Verma
Hey Michael, Thanks for your reply. My use case is a little different. I would like to get the original values in facet queries but I would like to apply filter queries in a case insensitive fashion. For example I require facet_query to return Quick, The, brown, ... But I want filter queries of
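A minimal sketch of the requirement described above, in plain Python rather than Solr: filtering compares lowercased values, while facet counts are built from the stored original values. The `docs` data and the function names `filter_docs` and `facet_counts` are hypothetical and exist only for illustration; this is not how Solr implements faceting.

```python
from collections import Counter

# Hypothetical sample data: doc id -> original (case-preserved) field values.
docs = {
    1: ["The", "Quick", "brown"],
    2: ["Quick", "fox"],
}


def filter_docs(docs, term):
    """Return ids of documents that contain `term`, ignoring case."""
    needle = term.lower()
    return {
        doc_id
        for doc_id, values in docs.items()
        if any(value.lower() == needle for value in values)
    }


def facet_counts(docs, doc_ids):
    """Count original (unmodified) values across the given documents."""
    counts = Counter()
    for doc_id in doc_ids:
        # Use a set so each value is counted once per document.
        counts.update(set(docs[doc_id]))
    return counts


matching = filter_docs(docs, "quick")
facets = facet_counts(docs, matching)
```

Here the filter term `quick` matches both documents, yet the facet keys keep their original casing (`Quick`, `The`, `brown`, `fox`).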

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Heyde, Ralf
Simply use two fields, one for case-sensitive and one for case-insensitive selection. On 25.11.2014 12:39, "Apurv Verma" wrote: > Hey all, > The standard solution to doing a case-insensitive match in lucene is to > use a Lowercase filter at index and query time. However this does not > preserve the content of the orig

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Michael Sokolov
Right -- missed Ahmet's answer there in my haste to respond ... -Mike On 11/25/14 6:56 AM, Ahmet Arslan wrote: Hi Apurv, I wouldn't worry about index size; the increase in index size is not linear (2x) like that. Please see a similar discussion: https://issues.apache.org/jira/browse/LUCENE-5620 A

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Michael Sokolov
The index size will not increase as quickly as you might think, and is not an issue in most cases. An alternative to two fields, though, is to index both upper- and lower-case tokens at the same position in a single field, and then to perform no case folding at query time. There is no standar
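The single-field alternative described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a Lucene analyzer: `analyze` and `SingleFieldIndex` are hypothetical names, and whitespace splitting stands in for real tokenization. Each original token is indexed, and its lowercase form is added at the same position when it differs; queries are not case-folded.

```python
from collections import defaultdict


def analyze(text):
    """Yield (position, term) pairs, adding a lowercase variant at the
    same position whenever it differs from the original token."""
    for position, token in enumerate(text.split()):
        yield position, token
        lowered = token.lower()
        if lowered != token:
            yield position, lowered


class SingleFieldIndex:
    """Toy positional inverted index: term -> doc id -> set of positions."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(set))

    def add(self, doc_id, text):
        for position, term in analyze(text):
            self.postings[term][doc_id].add(position)

    def search(self, term):
        # No case folding at query time: the term is looked up as typed.
        return set(self.postings.get(term, {}))


index = SingleFieldIndex()
index.add(1, "The Quick brown")
index.add(2, "the quick")
```

With this layout, a lowercase query such as `quick` matches both documents, while `Quick` matches only the document that contained that exact form. Because both variants share a position, phrase matching over positions would still line up.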

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Ahmet Arslan
Hi Apurv, I wouldn't worry about index size; the increase in index size is not linear (2x) like that. Please see a similar discussion: https://issues.apache.org/jira/browse/LUCENE-5620 Ahmet On Tuesday, November 25, 2014 1:46 PM, Ahmet Arslan wrote: Hi Apurv, You can create an additional fi

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Apurv Verma
Hi Ahmet, Thanks for your reply. Creating two separate fields is a viable solution, where one contains the original value and the other contains the lowercased value. But this leads to index bloat (~2x). I am looking for any other alternative solutions. -- Regards, Apurv Verma On Tue, Nov

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Ahmet Arslan
Hi Apurv, You can create an additional field for case sensitive search, and then you can switch at query time. You will have two fields (text_ci and text_lower) with different analysers populated with copyField. Ahmet On Tuesday, November 25, 2014 1:39 PM, Apurv Verma wrote: Hey all, The sta
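As a rough illustration of the two-field idea above, here is a plain-Python sketch in which the same text is indexed twice, once as-is and once lowercased, and the query chooses which index to consult. The class name `TwoFieldIndex` and its attributes `exact` and `lower` are hypothetical; in Solr the second field would be populated with copyField and a different analyzer, as the message suggests.

```python
from collections import defaultdict


class TwoFieldIndex:
    """Toy index that keeps a case-preserving and a lowercased view."""

    def __init__(self):
        self.exact = defaultdict(set)  # original tokens -> doc ids
        self.lower = defaultdict(set)  # lowercased tokens -> doc ids

    def add(self, doc_id, text):
        # Mimics copying one source value into two differently analyzed fields.
        for token in text.split():
            self.exact[token].add(doc_id)
            self.lower[token.lower()].add(doc_id)

    def search(self, term, case_sensitive=False):
        # Switch fields at query time.
        if case_sensitive:
            return set(self.exact.get(term, set()))
        return set(self.lower.get(term.lower(), set()))


two_field = TwoFieldIndex()
two_field.add(1, "The Quick brown fox")
two_field.add(2, "quick The")
```

A case-insensitive lookup for `Quick` returns both documents, while a case-sensitive lookup returns only the first.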

Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Apurv Verma
Hey all, The standard solution to doing a case-insensitive match in lucene is to use a Lowercase filter at index and query time. However this does not preserve the content of the original document. For example if my inverted index is. Term Doc_1 Doc_2 - Quick |