Hi Harry,

You should be using solr.StrField, or KeywordTokenizer with solr.TextField - otherwise you'll get multiple tokens, and for sorting, you want just one.

Here's one way to get what you want: copyField your title to a sortable field with a fieldType something like (untested):

    <fieldType name="titleSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="^(?i)(a|an|the)\s+" replacement=""/>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
      </analyzer>
    </fieldType>

The "(?i)" at the start of the pattern causes it to match case-insensitively.

A common strategy for sorting titles while ignoring initial articles is to place the article at the end, separated by a comma, e.g. "Book, The" and "Wallet, A"; such a sorting mechanism would allow you to consistently sort "Book", "The Book", and "A Book" - here's a slightly different version of the above field type that achieves this (again, untested):

    <fieldType name="titleSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="^(?i)(a|an|the)\s+(.*)" replacement="$2, $1"/>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
      </analyzer>
    </fieldType>

Steve

On May 24, 2014, at 9:56 AM, HL <freemail.grha...@gmail.com> wrote:

> I am trying to sort by title field asc or desc
> in a manner that is influenced by the stopwords list of a language,
>
> For instance, I would like the titles
> "The Book" and "A Wallet", when sorted, to appear as
>
> title
> ---------
> The Book
> A Wallet
>
> but I only managed to get my head smashed on the solr wall,
> and I had NO SUCCESS what-so-ever !
>
>
> So far I've tried to do this from Solr by various fieldType definitions and
> either copy the contents of title to BIB_title_sort
> or via a dynamicField with a suffix or a prefix,
> or even import the title straight into the field.
>
> Here is my last FAILED attempt to do that
>
> <fieldType name="sortString" class="solr.TextField" sortMissingLast="true"
>            omitNorms="true">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.ICUFoldingFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="lang/stopwords_el.txt,lang/stopwords_en.txt"
>             enablePositionIncrements="true"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> My question is
>
> Is there a possible way to do that in SOLR?
> OR
> Do I HAVE TO remove the STOP WORDS and so on, during the IMPORT process, by
> only writing custom scripts??
> Thanks in advance,
> Harry
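
Since both titleSort field types in the reply are marked untested, the two patterns can at least be checked outside Solr. What follows is a minimal sketch in plain Java, assuming that PatternReplaceCharFilterFactory applies java.util.regex semantics to the whole field value. The class and method names are made up for illustration, and the folding done by ICUFoldingFilterFactory is only roughly approximated by a case-insensitive comparison.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of Solr.
class TitleSortPatterns {
    // Same regex as the first field type: drop a leading article.
    private static final Pattern STRIP = Pattern.compile("^(?i)(a|an|the)\\s+");
    // Same regex as the second field type: capture the article and the rest.
    private static final Pattern MOVE = Pattern.compile("^(?i)(a|an|the)\\s+(.*)");

    // "The Book" becomes "Book"; titles without a leading article are unchanged.
    static String stripArticle(String title) {
        return STRIP.matcher(title).replaceFirst("");
    }

    // "The Book" becomes "Book, The"; titles without a leading article are unchanged.
    static String moveArticleToEnd(String title) {
        return MOVE.matcher(title).replaceFirst("$2, $1");
    }

    public static void main(String[] args) {
        List<String> titles = new ArrayList<>(List.of("A Wallet", "The Book", "Another Book"));
        // Sort on the stripped key, ignoring case (a stand-in for ICU folding).
        titles.sort(Comparator.comparing(TitleSortPatterns::stripArticle,
                                         String.CASE_INSENSITIVE_ORDER));
        for (String t : titles) {
            System.out.println(t + " -> " + moveArticleToEnd(t));
        }
    }
}
```

Note that "Another Book" is left alone by both patterns, because the article must be followed by whitespace before anything is removed or moved.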