RE: Solr configuration issue for sorting on title field

EL KASMI Hicham Mon, 25 Jan 2010 05:32:40 -0800

Thanks Otis,

The long message was an attempt to be as detailed as possible, so that one 
could understand our tests. I'm afraid the problem we wanted to describe didn't 
really come through.


These are the relevant entries from our config:

====================================================================================
<field name="title" type="text" indexed="true" stored="true" termVectors="true" />
<field name="titleStr" type="string" indexed="true" stored="false" /> 

<copyField source="title" dest="titleStr" />
====================================================================================

We want to sort on titleStr, but we end up with the error message:

"HTTP Status 500 - there are more terms than documents in field "titleStr", but 
it's impossible to sort on tokenized fields".

We don't understand this message or what is wrong in our config. We tried 
several other configs, as described in my first message, but none of them 
worked.

Thanks for any clarification you can provide us.

Hicham.

==========================================
Hicham El Kasmi
Université Libre de Bruxelles - Libraries
Av. F.D. Roosevelt 50, CP 180
1050 BRUSSELS Belgium

Tel: + 32 2 650 25 30
Fax: + 32 2 650 23 91
========================================== 


-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Friday, 22 January 2010 6:17
To: solr-user@lucene.apache.org
Subject: Re: Solr configuration issue for sorting on title field

Hi,

Long message.  I skimmed through your configs.  It looks like your main 
question is how changing the field type (or, really, turning off 
"multiValued" on a field) can cause the number of documents in your index to 
decrease, right?  Well, it can't or shouldn't.  I am guessing you simply did 
something wrong, like not indexing all docs, or got errors while indexing that 
you didn't notice, or some such.


If all you changed is a field's type, this alone should not cause your index to 
have fewer documents.
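
One indexing error that would fit this pattern: if a document supplies more 
than one value for a field declared multiValued="false", Solr rejects that 
document, so the index ends up smaller. A rough sketch of the two variants, 
reusing the title and titleStr names from your message; the attribute values 
are assumptions for illustration, not taken from your actual schema.xml:

```xml
<!-- Sketch only: attribute values are assumed, not copied from the real schema.xml -->

<!-- Variant A: single-valued sort field. A document that sends two or more
     title values is rejected at index time, so it never reaches the index. -->
<field name="titleStr" type="string" indexed="true" stored="false" multiValued="false"/>

<!-- Variant B: multi-valued field. Every document is accepted, but the field
     can then hold more terms than there are documents, which is what the
     sort error complains about. -->
<!-- <field name="titleStr" type="string" indexed="true" stored="false" multiValued="true"/> -->

<copyField source="title" dest="titleStr"/>
```

If this is what happened, the indexing log should show the rejected 
documents, and some of your records would turn out to carry more than one 
title value.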

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----
> From: EL KASMI Hicham <hicham.el.ka...@ulb.ac.be>
> To: solr-user@lucene.apache.org
> Sent: Thu, January 21, 2010 5:14:22 AM
> Subject: Solr configuration issue for sorting on title field
> 
> Hello again,
> 
> We have a problem with sorting on the title field in the Solr instance of
> our production repository. We get the error message: 
> 
> "HTTP Status 500 - there are more terms than documents in field
> "titleStr", but it's impossible to sort on tokenized fields".
> 
> After some googling and searching in this listserv, we found that a
> sort field has to be untokenized, but our sort field "titleStr",
> which is a copy of the "title" field, already has the string type.
> 
> The configs we tried in our schema.xml file:
> 
> 1st config
> ++++++++++
> 
>     
> sortMissingLast="true" omitNorms="true"/>
> 
> 
>     
> positionIncrementGap="100">
>       
>         
>     
> version="icu4j" composed="false" remove_diacritics="true"
> remove_modifiers="true" fold="true"/>
>     
>         
> words="stopwords.txt"/>
>         
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0"/>
>         
>         
> protected="protwords.txt"/>
>         
>       
>       
>         
>     
> version="icu4j" composed="false" remove_diacritics="true"
> remove_modifiers="true" fold="true"/>
>     
>         
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         
> words="stopwords.txt"/>
>         
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0"/>
>         
>         
> protected="protwords.txt"/>
>         
>       
>     
> 
> ===============================
> 
>   
> termVectors="true"/>
>   
> 
> =================================
> 
> 
> 
> 
> As you can see, the title field has the termVectors property set to true; we
> dropped it in the second attempt.
> 
> 2nd attempt
> +++++++++++
> 
>     
> positionIncrementGap="100">
>       
>         
>     
> version="icu4j" composed="false" remove_diacritics="true"
> remove_modifiers="true" fold="true"/>
>     
>         
> words="stopwords.txt"/>
>         
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0"/>
>         
>         
> protected="protwords.txt"/>
>         
>       
>       
>         
>     
> version="icu4j" composed="false" remove_diacritics="true"
> remove_modifiers="true" fold="true"/>
>     
>         
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         
> words="stopwords.txt"/>
>         
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0"/>
>         
>         
> protected="protwords.txt"/>
>         
>       
>     
> 
> ===============================
> 
>   
>   
> 
> =================================
> 
> 
> 
> 
> 3rd attempt
> +++++++++++
> Create a new field type named 'text_exact' which doesn't use the
> "WhitespaceTokenizer" tokenizer but instead uses the "KeywordTokenizer"
> tokenizer.
> 
> 
>     
> positionIncrementGap="100">
>       
>         
>     
> version="icu4j" composed="false" remove_diacritics="true"
> remove_modifiers="true" fold="true"/>
>     
>         
> words="stopwords.txt"/>
>         
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0"/>
>         
>         
> protected="protwords.txt"/>
>         
>       
>       
>         
>     
> version="icu4j" composed="false" remove_diacritics="true"
> remove_modifiers="true" fold="true"/>
>     
>         
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         
> words="stopwords.txt"/>
>         
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0"/>
>         
>         
> protected="protwords.txt"/>
>         
>       
>     
>     
> sortMissingLast="true" omitNorms="true">
>       
>     
>         
>         
> version="icu4j" composed="false" remove_diacritics="true"
> remove_modifiers="true" fold="true"/>
>         
>         
> words="stopwords.txt"/>
>         
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0"/>
>         
>         
> protected="protwords.txt"/>
>         
>       
>       
>     
>         
>         
> version="icu4j" composed="false" remove_diacritics="true"
> remove_modifiers="true" fold="true"/>
>         
>         
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         
> words="stopwords.txt"/>
>         
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0"/>
>         
>         
> protected="protwords.txt"/>
>         
>       
>     
> 
> 
> ===============================
> 
>   
>   
> stored="false"/>
> 
> =================================
> 
> No copyField.
> 
> 
> 4th attempt
> +++++++++++
> Same config as the 3rd attempt, but explicitly setting the 'multiValued'
> property of the titleStr field (text_exact type) to false:
> 
> 
> multiValued="false"/>
> 
> With this last config, we noticed that the number of documents in our
> index dropped from ~22500 records to ~17800 records! We don't
> understand this behavior of Solr/Lucene.
> 
> 
> With all of these configs we got the same error message. Please note that
> we encounter this issue on our production server (with ~22500 records):
> http://difusion.ulb.ac.be/vufind/Search/Home?lookfor=&sort=pubdate+desc&submitButton=Recherche&type=general&sort=title
> 
> With the same config (the first one) on our test server (with ~57700
> records), sorting on title works fine:
> http://bib17.ulb.ac.be/vufind/Search/Home?lookfor=&sort=pubdate+desc&submitButton=Find&type=general&sort=title
> 
> Thanks in advance for the time you can spend looking into this.
> 
> Best regards,
> 
> Hicham El Kasmi
