The usual solution is to facet on a separate field populated with copyField. Usually that is because people want the original, unmodified version of the string without tokenization (so, "United States of America" instead of "united" "states" "america"). It sounds like your case is a little different: you do want tokenized values, just not lowercased.
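Here is a toy Java model of the copyField idea for this thread's case: facet values keep their original case, and filter queries ignore case. It is not Solr or Lucene code. The class and method names (`CopyFieldSketch`, `index`, `filter`, `facet`) are hypothetical. Whitespace splitting and `toLowerCase` stand in for a real tokenizer and LowerCaseFilter. The two sample sentences are invented so that their terms reproduce the table in the original question. In a real schema, the same effect would come from two fields with different analyzers, the second one populated via copyField.

```java
import java.util.*;

/**
 * Toy model (hypothetical names, not Solr/Lucene code) of the copyField pattern:
 * each document's text is copied into two "fields" with different analysis.
 *  - facetField:  whitespace-tokenized, original case kept (used for facet counts)
 *  - filterField: whitespace-tokenized, lowercased (used for case-insensitive fq)
 */
public class CopyFieldSketch {
    // term -> ids of the documents containing it, one map per "field"
    static final Map<String, Set<Integer>> facetField = new TreeMap<>();
    static final Map<String, Set<Integer>> filterField = new TreeMap<>();

    static void index(int docId, String text) {
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) continue;
            // "copyField" target 1: keep the token exactly as written
            facetField.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            // "copyField" target 2: lowercase it, like a LowerCaseFilter would
            filterField.computeIfAbsent(token.toLowerCase(Locale.ROOT), k -> new TreeSet<>()).add(docId);
        }
    }

    // Case-insensitive filter: the query term gets the same lowercasing as filterField.
    static Set<Integer> filter(String term) {
        return filterField.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    // Facet counts over the matching documents, read from the case-preserving field.
    static Map<String, Integer> facet(Set<Integer> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : facetField.entrySet()) {
            int n = 0;
            for (int d : e.getValue()) {
                if (docs.contains(d)) n++;
            }
            if (n > 0) counts.put(e.getKey(), n);
        }
        return counts;
    }

    public static void main(String[] args) {
        index(1, "The quick brown fox jumped over the lazy dog");
        index(2, "Quick brown foxes leap over lazy dogs in summer");
        Set<Integer> hits = filter("quick");   // roughly like fq=Term:"quick"
        System.out.println(hits);              // [1, 2]
        Map<String, Integer> counts = facet(hits);
        System.out.println(counts.get("Quick") + " " + counts.get("quick")); // 1 1
    }
}
```

Here `filter("quick")` matches both documents through the lowercased copy. The facet counts still report `Quick` and `quick` separately, because they come from the case-preserving field.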
In that case, I would still use copyField and apply different processing to the copy. Also, in recent Solr versions the recommendation is to enable docValues on fields used for faceting, so you can benefit from that speed-up as well.

As to indexing different variants of the same token, some filters have a preserveOriginal flag that will generate both forms. For example, WordDelimiterFilterFactory:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/miscellaneous/WordDelimiterFilterFactory.html

There is also KeywordRepeatFilterFactory:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/miscellaneous/KeywordRepeatFilterFactory.html
but it is not clear which subsequent filters actually take advantage of this duplication.

And, of course, ngram filters generate multiple substring tokens, all at the same position. This is easy to see by building an analyzer chain that includes one and testing it on the Admin UI's Analysis screen with the extended information checkbox enabled.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 25 November 2014 at 08:05, Apurv Verma <dapu...@gmail.com> wrote:
> Hey Michael,
> Thanks for your reply. My use case is a little different. I would like to
> get the original values in facet queries, but I would like to apply filter
> queries in a case-insensitive fashion.
>
> For example, I require facet_query to return Quick, The, brown, ...
> But I want filter queries of the form fq=Term:"quick"
>
> Also, could you please point me to some additional links on how I can index
> different variants of a token at the same position?
>
> --
> Regards,
> Apurv Verma
>
> On Tue, Nov 25, 2014 at 6:26 PM, Michael Sokolov <
> msoko...@safaribooksonline.com> wrote:
>
>> right -- missed Ahmet's answer there in my haste to respond ...
>>
>> -Mike
>>
>> On 11/25/14 6:56 AM, Ahmet Arslan wrote:
>>
>>> Hi Apurv,
>>>
>>> I wouldn't worry about index size; the increase in index size is not
>>> linear (2x) like that.
>>> Please see a similar discussion:
>>> https://issues.apache.org/jira/browse/LUCENE-5620
>>>
>>> Ahmet
>>>
>>> On Tuesday, November 25, 2014 1:46 PM, Ahmet Arslan
>>> <iori...@yahoo.com.INVALID> wrote:
>>>
>>> Hi Apurv,
>>>
>>> You can create an additional field for case-sensitive search, and then
>>> you can switch at query time. You will have two fields (text_ci and
>>> text_lower) with different analysers populated with copyField.
>>>
>>> Ahmet
>>>
>>> On Tuesday, November 25, 2014 1:39 PM, Apurv Verma <ap...@bloomreach.com>
>>> wrote:
>>> Hey all,
>>> The standard solution for case-insensitive matching in Lucene is to
>>> use a Lowercase filter at index and query time. However, this does not
>>> preserve the content of the original document. For example, if my
>>> inverted index is:
>>>
>>> Term    | Doc_1 | Doc_2
>>> -------------------------
>>> Quick   |       |   X
>>> The     |   X   |
>>> brown   |   X   |   X
>>> dog     |   X   |
>>> dogs    |       |   X
>>> fox     |   X   |
>>> foxes   |       |   X
>>> in      |       |   X
>>> jumped  |   X   |
>>> lazy    |   X   |   X
>>> leap    |       |   X
>>> over    |   X   |   X
>>> quick   |   X   |
>>> summer  |       |   X
>>> the     |   X   |
>>> -------------------------
>>>
>>> Is it possible to choose between a case-insensitive and a case-sensitive
>>> match at query time? The index is stored in memory in Solr. My question
>>> is: if this is stored as a hashmap with a string key, can I override the
>>> hashcode so that "Quick" and "quick" return the same hash value?
>>>
>>> Has anyone attempted this before? Is my assumption about the index right?
>>> What would be the classes and code flow to look at?
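Alex's reply also describes indexing several variants of a token at the same position: the flag on WordDelimiterFilterFactory, KeywordRepeatFilterFactory, and ngram filters. That idea can be sketched in plain Java too. This is a toy model, not the Lucene `TokenStream` API, and `SamePositionSketch`, `Token`, `originalPlusLowercase` and `ngrams` are hypothetical names.

- `originalPlusLowercase` emits each word plus its lowercased form at the same position. This is loosely in the spirit of a preserve-original filter followed by lowercasing. Emitting the lowercased copy only when it differs is a choice made here, not documented Lucene behavior.
- `ngrams` emits every n-gram of a word at that word's position. The emission order is arbitrary and may not match Lucene's own filters.

```java
import java.util.*;

/**
 * Toy sketch (hypothetical names, not the Lucene API) of emitting several
 * variants of one token at the same position in a token stream.
 */
public class SamePositionSketch {
    /** A token's text plus the position it occupies in the stream. */
    static final class Token {
        final String text;
        final int position;

        Token(String text, int position) {
            this.text = text;
            this.position = position;
        }

        @Override
        public String toString() {
            return text + "@" + position;
        }
    }

    /** Whitespace tokenization; empty or blank input yields no words. */
    static String[] words(String text) {
        String t = text.trim();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    /** Original token first, then its lowercased form (only if different) at the same position. */
    static List<Token> originalPlusLowercase(String text) {
        List<Token> out = new ArrayList<>();
        String[] ws = words(text);
        for (int pos = 0; pos < ws.length; pos++) {
            out.add(new Token(ws[pos], pos));
            String lower = ws[pos].toLowerCase(Locale.ROOT);
            if (!lower.equals(ws[pos])) {
                out.add(new Token(lower, pos)); // shares the position: a zero position increment
            }
        }
        return out;
    }

    /** All substrings of length minGram..maxGram of each word, all at that word's position. */
    static List<Token> ngrams(String text, int minGram, int maxGram) {
        List<Token> out = new ArrayList<>();
        String[] ws = words(text);
        for (int pos = 0; pos < ws.length; pos++) {
            String w = ws[pos];
            for (int start = 0; start < w.length(); start++) {
                for (int len = minGram; len <= maxGram && start + len <= w.length(); len++) {
                    out.add(new Token(w.substring(start, start + len), pos));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(originalPlusLowercase("Quick brown foxes")); // [Quick@0, quick@0, brown@1, foxes@2]
        System.out.println(ngrams("fox", 2, 3));                        // [fo@0, fox@0, ox@0]
    }
}
```

Because `Quick` and `quick` share position 0, phrase-style matching against either variant would see the same word offsets. Note that if both variants land in one field, facets over that field would list both, which is why the copyField approach is still suggested for the faceting side.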