Well, since the ASCII upper-case codes are smaller than the lower-case ones,
i.e. A = 0x41 and a = 0x61, upper case before lower case is correct IMO.

But you're being fooled by the internal "tiebreaker", I'd guess, along with
(I suppose) a small number of test docs. Your "lowercase" field type
lowercases every value, so APPLES and apples end up with the same sort value.
When two docs have the same sort value, the internal Lucene doc ID is used to
break the tie. I suggest that it just happens that you've indexed your docs
with all the upper-case versions first in your test set and all the
lower-case versions second.

If I'm right, and you reverse the sort order, docs with the same sort value
will still appear upper-case first. Try interleaving upper- and lower-case
values and I think you'll see them mixed in the result, e.g.

doc1: APPLE
doc2: apple
doc3: APPLE
doc4: apple

Best,
Erick

On Mon, Jul 25, 2016 at 9:59 AM, Vasu Y <vya...@gmail.com> wrote:

> Hi,
> We are indexing our objects into Solr and let users to sort by different
> fields. The sort field is defined as specified below in schema.xml:
>
> <fieldType name="lowercase" class="solr.TextField"
> positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory" />
>   </analyzer>
> </fieldType>
>
> For a field of type "lowercase", if we have the field values: APPLES,
> ZUCCHINI, banana, BANANA, apples, zucchini and sort in ascending order,
> solr produces the result in the following sorted order:
> APPLES, apples, BANANA, banana, ZUCCHINI, zucchini.
>
> But we have another tool which also displays the same information from a
> database in the following sorted order:
> apples, APPLES, banana, BANANA, zucchini, ZUCCHINI
>
> But the database is using the SQL query "select column1 from table1 order
> by UPPER(column1) asc".
>
> I could either change SQL query to "select column1 from table1 order by
> LOWER(column1) asc" or change solr definition to include
> solr.UpperCaseFilterFactory instead of solr.LowerCaseFilterFactory so that
> both applications behave same in terms of sorting.
>
> But, in general, when we sort a collection of string values, what should be
> the correct sort order? Should upper case value ("APPLE") come before
> lowercase value ("apple") or the other way (lowercase value before
> uppercase value) when sorting in ascending order?
>
> Thanks,
> Vasu
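
P.S. To make the tiebreak argument concrete, here is a rough sketch in plain Python. It is only a model of the behaviour described above, not Solr or Lucene code: Doc and sort_docs are made-up names, doc_id stands in for the internal Lucene doc ID (assumed here to follow indexing order), and lowercasing the value stands in for the "lowercase" field type.

```python
from typing import NamedTuple


class Doc(NamedTuple):
    doc_id: int  # hypothetical stand-in for the internal Lucene doc ID
    value: str   # original field value as indexed


def sort_docs(docs, descending=False):
    """Sort by lowercased value; equal values keep ascending doc_id order."""
    by_id = sorted(docs, key=lambda d: d.doc_id)
    # Python's sort is stable, also with reverse=True, so docs whose
    # lowercased values are equal stay in doc_id order in both directions.
    return sorted(by_id, key=lambda d: d.value.lower(), reverse=descending)


# Interleaved upper- and lower-case values, as suggested in the reply.
docs = [
    Doc(1, "APPLE"), Doc(2, "apple"), Doc(3, "APPLE"), Doc(4, "apple"),
    Doc(5, "BANANA"), Doc(6, "banana"),
]

ascending = [d.value for d in sort_docs(docs)]
descending = [d.value for d in sort_docs(docs, descending=True)]

print(ascending)   # APPLE and apple come out interleaved, in doc_id order
print(descending)  # BANANA group first, but each group is still in doc_id order

# Without any lowercasing, plain code-point order puts "A" (0x41) before "a" (0x61).
print(sorted(["apple", "APPLE"]))
```

In this model the case of the original value never influences the result; only the lowercased key and the doc_id do. If Solr behaves the way I'm guessing, a deterministic case order would need an explicit secondary sort rather than relying on indexing order.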