Hi Harry,

You should be using solr.StrField, or solr.TextField with KeywordTokenizer;
otherwise you’ll get multiple tokens, and for sorting you want exactly one.
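For illustration, the two single-token options might be declared roughly as follows. The type names `string_sort` and `keyword_sort` are hypothetical, and this is an untested sketch rather than a drop-in schema:

```xml
<!-- Option 1: StrField does no analysis; the whole value is indexed as a
     single term, so case and leading articles are kept as-is. -->
<fieldType name="string_sort" class="solr.StrField" sortMissingLast="true"/>

<!-- Option 2: TextField whose KeywordTokenizer emits the whole value as one
     token, which still leaves room for char filters and token filters. -->
<fieldType name="keyword_sort" class="solr.TextField" sortMissingLast="true"
           omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Optional: lowercasing makes the sort case-insensitive. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Option 2 is the more flexible one, since a char filter can rewrite the value before it becomes the single sort token.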

Here’s one way to get what you want: copyField your title to a sortable field
with a field type something like this (untested):

<fieldType name="titleSort" class="solr.TextField" sortMissingLast="true"
           omitNorms="true">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^(?i)(a|an|the)\s+"
                replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>

The “(?i)” flag near the start of the pattern makes it match
case-insensitively, so “The”, “THE” and “the” are all stripped.
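The field type does nothing until a field uses it. A minimal sketch of the wiring, reusing the BIB_title_sort name from your message and assuming `title` is single-valued (sorting needs a single-valued field); untested:

```xml
<!-- Sort-only field: indexed so Solr can sort on it; not stored, since
     results can display the original title field instead. -->
<field name="BIB_title_sort" type="titleSort" indexed="true" stored="false"
       multiValued="false"/>

<!-- Copy the raw title into the sort field at index time; the titleSort
     analyzer then strips the leading article and folds case. -->
<copyField source="title" dest="BIB_title_sort"/>
```

After reindexing, a request with sort=BIB_title_sort asc (or desc) should order on the article-stripped key while still returning the original title for display.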

A common strategy for sorting titles while ignoring initial articles is to
place the article at the end, separated by a comma, e.g. “Book, The” and
“Wallet, A”. Sorting on such a key orders “Book”, “A Book”, and “The Book”
consistently instead of leaving them tied. Here’s a slightly different version
of the above field type that achieves this (again, untested):

<fieldType name="titleSort" class="solr.TextField" sortMissingLast="true"
           omitNorms="true">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^(?i)(a|an|the)\s+(.*)"
                replacement="$2, $1"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
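As a rough check of the comma-suffix version, here are a few test documents in Solr’s XML update format. This assumes `id` is the uniqueKey and that `title` is copied into a BIB_title_sort field of type titleSort, as in your setup; the comments show the sort key the analyzer should produce once ICU folding has lowercased it:

```xml
<add>
  <!-- sort key: "book, the" -->
  <doc><field name="id">1</field><field name="title">The Book</field></doc>
  <!-- sort key: "wallet, a" -->
  <doc><field name="id">2</field><field name="title">A Wallet</field></doc>
  <!-- sort key: "book" (no leading article, so only folding applies) -->
  <doc><field name="id">3</field><field name="title">Book</field></doc>
  <!-- sort key: "book, a" -->
  <doc><field name="id">4</field><field name="title">A Book</field></doc>
</add>
```

With sort=BIB_title_sort asc these should come back as Book, A Book, The Book, A Wallet, because “book” is a prefix of “book, a” and “a” sorts before “the”. With the first (strip-only) field type, all three Book titles would get the key “book” and tie, so their relative order would not depend on the title.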

Steve

On May 24, 2014, at 9:56 AM, HL <freemail.grha...@gmail.com> wrote:

> I am trying to sort by the title field, asc or desc,
> in a manner that takes a language's stopword list into account.
> 
> For instance, I would like the titles
> "The Book" and "A Wallet", when sorted, to appear as
> 
> title
> ---------
> The Book
> A Wallet
> 
> but so far I have only managed to bang my head against the Solr wall,
> with NO SUCCESS whatsoever!
> 
> 
> So far I've tried to do this in Solr with various fieldType definitions,
> either copying the contents of title to BIB_title_sort,
> using a dynamicField with a suffix or a prefix,
> or even importing the title straight into the field.
> 
> Here is my last FAILED attempt:
> 
> <fieldType name="sortString" class="solr.TextField" sortMissingLast="true" 
> omitNorms="true">
>        <analyzer type="index">
>            <tokenizer class="solr.StandardTokenizerFactory"/>
>            <filter class="solr.WordDelimiterFilterFactory" 
> generateWordParts="1" generateNumberParts="1" catenateWords="1" 
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>            <filter class="solr.ICUFoldingFilterFactory"/>
>            <filter class="solr.StopFilterFactory" ignoreCase="true" 
> words="lang/stopwords_el.txt,lang/stopwords_en.txt" 
> enablePositionIncrements="true"/>
>            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>        </analyzer>
>      </fieldType>
> 
> My question is:
> 
> Is there a way to do that in Solr?
> OR
> Do I HAVE TO remove the STOP WORDS and so on during the IMPORT process,
> by writing custom scripts?
> Thanks in advance,
> Harry