Re: Solr configuration issue for sorting on title field

Otis Gospodnetic Thu, 21 Jan 2010 21:17:11 -0800

Hi,

Long message.  I skimmed through your configs.  It looks like your main
question is how changing the field type (or, really, turning off
"multiValued" on a field) can cause the number of documents in your index
to decrease, right?  Well, it can't or shouldn't.  I am guessing you simply
did something wrong, like not indexing all docs, or got errors while
indexing that you didn't notice, or some such.
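
One plausible cause, offered as an assumption rather than something confirmed in this thread: Solr rejects a document that supplies more than one value for a field declared multiValued="false", so records carrying several titles would fail to index. A minimal Python sketch for spotting such records before indexing follows; the record layout (dicts with "id" and "title" keys), the function name, and the sample data are all hypothetical.

```python
def find_multi_valued(records, field):
    """Return the ids of records carrying more than one value for `field`."""
    offenders = []
    for record in records:
        value = record.get(field)
        # A list or tuple with more than one entry means several values.
        if isinstance(value, (list, tuple)) and len(value) > 1:
            offenders.append(record["id"])
    return offenders


# Hypothetical sample records, not taken from the thread.
records = [
    {"id": "doc1", "title": "Solr in Action"},
    {"id": "doc2", "title": ["Main title", "Alternative title"]},
    {"id": "doc3", "title": ["Only one title"]},
    {"id": "doc4"},
]
multi_titled = find_multi_valued(records, "title")
print(multi_titled)
```

Any ids this reports would be worth checking against the indexing log.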



If all you changed is a field's type, this alone should not cause your index to 
have fewer documents.
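
To find out which documents went missing, one option is to compare the ids you sent to Solr with the ids that ended up in the index (collected, for example, from a `q=*:*` query with `fl=id`). A rough Python sketch, with a hypothetical function name and made-up sample ids:

```python
def missing_ids(submitted, indexed):
    """Return the submitted ids that are absent from the index, sorted."""
    return sorted(set(submitted) - set(indexed))


# Hypothetical ids, for illustration only.
submitted = ["doc1", "doc2", "doc3", "doc4"]
indexed = ["doc1", "doc3", "doc4"]
lost = missing_ids(submitted, indexed)
print(lost)
```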

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----
> From: EL KASMI Hicham <hicham.el.ka...@ulb.ac.be>
> To: solr-user@lucene.apache.org
> Sent: Thu, January 21, 2010 5:14:22 AM
> Subject: Solr configuration issue for sorting on title field
> 
> Hello again,
> 
> We have a problem with sorting on the title field in the Solr instance
> of our production repository. We get the error message:
> 
> "HTTP Status 500 - there are more terms than documents in field
> "titleStr", but it's impossible to sort on tokenized fields".
> 
> After some googling and searching in this listserv, we found that a
> sorting field has to be untokenized, but our sorting field "titleStr",
> which is a copy of the "title" field, has a string type.
> 
> The configurations we tried in our schema.xml file:
> 
> 1st config
> ++++++++++
> 
> [schema.xml excerpt; the archive stripped the start of each XML tag,
> marked [...] below]
> 
>     [...] sortMissingLast="true" omitNorms="true"/>
> 
>     [...] positionIncrementGap="100">
>         [...] version="icu4j" composed="false" remove_diacritics="true"
>               remove_modifiers="true" fold="true"/>
>         [...] words="stopwords.txt"/>
>         [...] generateWordParts="1" generateNumberParts="1" catenateWords="1"
>               catenateNumbers="1" catenateAll="0"/>
>         [...] protected="protwords.txt"/>
> 
>         [...] version="icu4j" composed="false" remove_diacritics="true"
>               remove_modifiers="true" fold="true"/>
>         [...] synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         [...] words="stopwords.txt"/>
>         [...] generateWordParts="1" generateNumberParts="1" catenateWords="0"
>               catenateNumbers="0" catenateAll="0"/>
>         [...] protected="protwords.txt"/>
> 
> ===============================
> 
>   [...] termVectors="true"/>
>   [...]
> 
> =================================
> 
> 
> 
> 
> As you can see, the title field has the termVectors property set to
> true; we dropped it in our second attempt.
> 
> 2nd attempt
> +++++++++++
> 
>     [...] positionIncrementGap="100">
>         [...] version="icu4j" composed="false" remove_diacritics="true"
>               remove_modifiers="true" fold="true"/>
>         [...] words="stopwords.txt"/>
>         [...] generateWordParts="1" generateNumberParts="1" catenateWords="1"
>               catenateNumbers="1" catenateAll="0"/>
>         [...] protected="protwords.txt"/>
> 
>         [...] version="icu4j" composed="false" remove_diacritics="true"
>               remove_modifiers="true" fold="true"/>
>         [...] synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         [...] words="stopwords.txt"/>
>         [...] generateWordParts="1" generateNumberParts="1" catenateWords="0"
>               catenateNumbers="0" catenateAll="0"/>
>         [...] protected="protwords.txt"/>
> 
> ===============================
> 
>   
>   
> 
> =================================
> 
> 
> 
> 
> 3rd attempt
> +++++++++++
> Create a new field type named 'text_exact' which doesn't use the
> "WhitespaceTokenizer" tokenizer but instead uses the "KeywordTokenizer"
> tokenizer.
> 
> 
>     [...] positionIncrementGap="100">
>         [...] version="icu4j" composed="false" remove_diacritics="true"
>               remove_modifiers="true" fold="true"/>
>         [...] words="stopwords.txt"/>
>         [...] generateWordParts="1" generateNumberParts="1" catenateWords="1"
>               catenateNumbers="1" catenateAll="0"/>
>         [...] protected="protwords.txt"/>
> 
>         [...] version="icu4j" composed="false" remove_diacritics="true"
>               remove_modifiers="true" fold="true"/>
>         [...] synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         [...] words="stopwords.txt"/>
>         [...] generateWordParts="1" generateNumberParts="1" catenateWords="0"
>               catenateNumbers="0" catenateAll="0"/>
>         [...] protected="protwords.txt"/>
> 
>     [...] sortMissingLast="true" omitNorms="true">
>         [...] version="icu4j" composed="false" remove_diacritics="true"
>               remove_modifiers="true" fold="true"/>
>         [...] words="stopwords.txt"/>
>         [...] generateWordParts="1" generateNumberParts="1" catenateWords="1"
>               catenateNumbers="1" catenateAll="0"/>
>         [...] protected="protwords.txt"/>
> 
>         [...] version="icu4j" composed="false" remove_diacritics="true"
>               remove_modifiers="true" fold="true"/>
>         [...] synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         [...] words="stopwords.txt"/>
>         [...] generateWordParts="1" generateNumberParts="1" catenateWords="0"
>               catenateNumbers="0" catenateAll="0"/>
>         [...] protected="protwords.txt"/>
> 
> ===============================
> 
>   [...]
>   [...] stored="false"/>
> 
> =================================
> 
> No copyField.
> 
> 
> 4th attempt
> +++++++++++
> Same config as the 3rd attempt, but explicitly setting the
> 'multiValued' property of the titleStr field (text_exact type) to false:
> 
>   [...] multiValued="false"/>
> 
> With this last config, we noticed that the number of documents in our
> index dropped from ~22500 records to ~17800 records! We don't
> understand this behavior of Solr/Lucene.
> 
> 
> With all of these configs we got the same error message. Please note
> that we encounter this issue on our production server
> http://difusion.ulb.ac.be/vufind/Search/Home?lookfor=&sort=pubdate+desc&submitButton=Recherche&type=general&sort=title
> (with ~22500 records),
> 
> whereas with the same config (the first one) on our test server
> http://bib17.ulb.ac.be/vufind/Search/Home?lookfor=&sort=pubdate+desc&submitButton=Find&type=general&sort=title
> (with ~57700 records), sorting on title works fine!
> 
> Thanks in advance for the time you can spend having a look at this.
> 
> Best regards,
> 
> Hicham El Kasmi
