Well, since the ASCII codes for upper-case letters are smaller than those for lower-case letters,
i.e.
A = 0x41
a = 0x61

upper case before lower case is correct IMO.
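
A quick way to see that raw code-point ordering, as a minimal Python sketch (the sample values are made up for illustration and no case folding is applied, unlike your field type):

```python
# Plain code-point comparison: 'A' (0x41) sorts before 'a' (0x61).
values = ["apple", "APPLE", "banana", "BANANA"]

# sorted() on str compares code points, so every upper-case
# word lands ahead of every lower-case word.
raw_sorted = sorted(values)

print(hex(ord("A")), hex(ord("a")))
print(raw_sorted)
```

Note that this puts all the upper-case words first, which is different from the per-word pairing you saw, so something else must be deciding the order within each pair.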

But I'd guess you're being fooled by the "tiebreaker",
along with (I suppose) a small number of test docs. Your
field type lowercases everything, so APPLES and apples have
the same sort value, and when two docs have the same sort
value, the internal Lucene doc ID is used to break the tie.
I suggest that it just happens
that you've indexed your docs with all the upper-case
versions first in your test set and all the lower-case
versions second. If I'm right, and you reverse
the sort order, the docs will still appear upper-case first.

Try interleaving upper and lower case values and I think you'll
see them mixed in the result, e.g.
doc1: APPLE
doc2: apple
doc3: APPLE
doc4: apple
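
Here is a rough model of what I think is happening, as a Python sketch. It is not Solr or Lucene code: the (doc ID, value) tuples and the sort_docs helper are hypothetical, and it simply assumes the sort key is the lowercased value with ties left in doc ID order.

```python
# Hypothetical model: each doc is (internal doc ID, stored value).
docs = [
    (1, "APPLE"),
    (2, "apple"),
    (3, "APPLE"),
    (4, "apple"),
]

def sort_docs(docs, descending=False):
    # Start from doc ID order, then stable-sort on the lowercased
    # value. Python's sort keeps equal keys in their existing order
    # even with reverse=True, so ties stay in doc ID order.
    by_id = sorted(docs, key=lambda d: d[0])
    return sorted(by_id, key=lambda d: d[1].lower(), reverse=descending)

ascending = [value for _, value in sort_docs(docs)]
descending = [value for _, value in sort_docs(docs, descending=True)]

print(ascending)
print(descending)
```

Since all four docs share the same lowercased key, both directions come back in doc ID order, which is why flipping the sort order would not flip upper and lower case.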

Best,
Erick

On Mon, Jul 25, 2016 at 9:59 AM, Vasu Y <vya...@gmail.com> wrote:
> Hi,
>  We are indexing our objects into Solr and let users to sort by different
> fields. The sort field is defined as specified below in schema.xml:
>
>     <fieldType name="lowercase" class="solr.TextField"
> positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory" />
>       </analyzer>
>     </fieldType>
>
> For a field of type "lowercase", if we have the field values: APPLES,
> ZUCCHINI, banana, BANANA, apples, zucchini and sort in ascending order,
> solr produces the result in the following sorted order:
> APPLES, apples, BANANA, banana, ZUCCHINI, zucchini.
>
> But we have another tool which also displays the same information from a
> database in the following sorted order:
> apples, APPLES, banana, BANANA, zucchini, ZUCCHINI
>
> But the database is using the SQL query "select column1 from table1 order
> by UPPER(column1) asc".
>
> I could either change SQL query to "select column1 from table1 order by
> LOWER(column1) asc" or change solr definition to include
> solr.UpperCaseFilterFactory instead of solr.LowerCaseFilterFactory so that
> both applications behave same in terms of sorting.
>
> But, in general, when we sort a collection of string values, what should be
> the correct sort order? Should upper case value ("APPLE") come before
> lowercase value ("apple") or the other way (lowercase value before
> uppercase value) when sorting in ascending order?
>
> Thanks,
> Vasu
