Re: String ordering appears different with sort vs range query

Cat Bieber Fri, 20 Apr 2012 10:17:36 -0700

Thanks for looking at this. I'll see if we can sneak an upgrade to 3.6 into the project to get this working.

-Cat

On 04/20/2012 12:03 PM, Erick Erickson wrote:

BTW, nice problem statement...


Anyway, I see this too in 3.5. I do NOT see this in 3.6 or trunk,
so it looks like a bug that got fixed in the 3.6 time-frame.
I don't have the time right now to go back over the JIRAs to see...

Best
Erick

On Thu, Apr 19, 2012 at 3:39 PM, Cat Bieber <cbie...@techtarget.com> wrote:

I'm trying to use a Solr query to find the next title in alphabetical order
after a given string. The issue I'm facing is that the sort param seems to
sort non-alphanumeric characters in a different order from the ordering used
by a range filter in the q or fq param. I can't filter the non-alphanumeric
characters out because they're integral to the data and it would not be a
useful ordering if it were based only on the alphanumeric portion of the
strings.

I'm running Solr version 3.5.

In my current approach, I have a field that is a unique string for each
document:

<fieldType name="lowerCaseSort" class="solr.TextField"
sortMissingLast="true" omitNorms="true">
<analyzer>
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
</analyzer>
</fieldType>

<field name="uniqueSortString" type="lowerCaseSort" indexed="true"
stored="true"/>

I'm passing the value for the current document in a range to query
everything after the current string, sorted ascending:

/select?fl=uniqueSortString&sort=uniqueSortString+asc&q=uniqueSortString:["$1+ZX+Spectrum+HOBETA+format+file"+TO+*]&wt=xml&rows=5&version=2.2

In theory, I expect the first result to be the current item and the second
result to be the next one. However, I'm finding that the sort and the range
filter seem to use different ordering:

<result name="response" numFound="448" start="0">
<doc>
<str name="uniqueSortString">$1 ZX Spectrum - Emulator</str>
</doc>
<doc>
<str name="uniqueSortString">$1 ZX Spectrum HOBETA format file</str>
</doc>
<doc>
<str name="uniqueSortString">$1 ZX Spectrum Hobetta Picture Format</str>
</doc>
<doc>
<str name="uniqueSortString">$? TR-DOS ZX Spectrum file in HOBETA format</str>
</doc>
<doc>
<str name="uniqueSortString">$A AutoCAD Autosave File ( Autodesk Inc.)</str>
</doc>
</result>

Based on the ordering of the results, sort believes "-" precedes "H", but the
range filter should have excluded that first result if it ordered strings the
same way. From digging through the code, it looks like sorting uses
String.compareTo() for ordering on a text/string field. However, I haven't
been able to track down where the range filter code is. If someone can point
me in the right direction to find that code, I'd love to look through it. Or,
if anyone has suggestions regarding a different approach or changes I can
make to this query/field, that would be very helpful.

Thanks for your time.
-Cat Bieber
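
The mismatch described above can be reproduced outside Solr with plain String.compareTo(). The sketch below is not taken from the Solr source. It assumes, as a hypothesis the thread does not confirm, that in 3.5 the lower bound of the range is compared as typed while the indexed terms have been trimmed and lowercased by the lowerCaseSort analyzer. The class OrderingSketch and its helpers normalize and inOpenEndedRange are hypothetical names, and compareTo() stands in for the index term order, which agrees with it for these ASCII titles.

```java
import java.util.Locale;

public class OrderingSketch {
    // Rough stand-in for TrimFilterFactory + LowerCaseFilterFactory (assumption:
    // the real analyzer chain behaves the same way for these ASCII titles).
    static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    // True when value falls in [lowerBound TO *] under String.compareTo() order.
    static boolean inOpenEndedRange(String lowerBound, String value) {
        return value.compareTo(lowerBound) >= 0;
    }

    public static void main(String[] args) {
        String emulator = "$1 ZX Spectrum - Emulator";
        String hobeta = "$1 ZX Spectrum HOBETA format file";

        String indexedEmulator = normalize(emulator);
        String indexedHobeta = normalize(hobeta);

        // Sort order on indexed terms: '-' (0x2D) precedes 'h' (0x68).
        System.out.println(indexedEmulator.compareTo(indexedHobeta) < 0);

        // Bound lowercased like the index: the emulator title is excluded.
        System.out.println(inOpenEndedRange(indexedHobeta, indexedEmulator));

        // Bound left as typed: 'Z' (0x5A) precedes 'z' (0x7A), so the
        // lowercased emulator title falls inside the range.
        System.out.println(inOpenEndedRange(hobeta, indexedEmulator));
    }
}
```

If that assumption holds, lowercasing the bound on the client before building the query would make the range agree with the sort on 3.5. Whether this is the actual cause of the difference between 3.5 and 3.6 would need to be checked against the JIRA history.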
