Re: Solr configuration issue for sorting on title field

Erik Hatcher Mon, 25 Jan 2010 06:03:58 -0800

Are you sending in more than one title per document, by chance?

Have you changed your configuration without reindexing the entire collection, possibly?
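
For context on why the number of titles per document matters: as far as can be told from the error text, Lucene raises it when the sort field holds more distinct terms than the index has documents. That can happen even with a plain string field if documents carry several values. A rough Python model of that check follows. It is not Lucene's actual code, and the function name and sample data are made up for illustration.

```python
def can_sort_on_field(docs, field):
    """Simplified model: sorting is allowed only if the field has
    no more distinct terms than there are documents."""
    terms = set()
    for doc in docs:
        values = doc.get(field, [])
        # Treat a single string as a one-element list of values.
        if isinstance(values, str):
            values = [values]
        terms.update(values)
    return len(terms) <= len(docs)


# Hypothetical data: one title per document.
single_titles = [
    {"titleStr": "Alpha"},
    {"titleStr": "Beta"},
]

# Hypothetical data: two titles per document, so 4 terms for 2 documents.
multiple_titles = [
    {"titleStr": ["Alpha", "Alpha, second edition"]},
    {"titleStr": ["Beta", "Beta (reprint)"]},
]

print(can_sort_on_field(single_titles, "titleStr"))
print(can_sort_on_field(multiple_titles, "titleStr"))
```

If the source records do contain several title values, a copyField from title would be expected to pass all of them on to titleStr, which is the situation the second data set imitates.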


        Erik


On Jan 25, 2010, at 8:32 AM, EL KASMI Hicham wrote:

Thanks Otis,

The long message was an attempt to be as detailed as possible, so that one could understand our tests. I'm afraid the problem we wanted to describe didn't really come through.


These are the relevant entries from our config:

====================================================================================

<field name="title" type="text" indexed="true" stored="true" termVectors="true" />

<field name="titleStr" type="string" indexed="true" stored="false" />

<copyField source="title" dest="titleStr" />

====================================================================================


We want to sort on titleStr, but we end up with the error message:

"HTTP Status 500 - there are more terms than documents in field "titleStr", but it's impossible to sort on tokenized fields".

We don't understand this message, or what is wrong in our config. We tried several other configs, as described in my first message, but with no positive result.


Thanks for any clarification you can provide us.

Hicham.

==========================================
Hicham El Kasmi
Université Libre de Bruxelles - Libraries
Av. F.D. Roosevelt 50, CP 180
1050 BRUSSELS Belgium

Tel: + 32 2 650 25 30
Fax: + 32 2 650 23 91
==========================================


-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Friday, 22 January 2010 6:17
To: solr-user@lucene.apache.org
Subject: Re: Solr configuration issue for sorting on title field

Hi,

Long message. I skimmed through your configs. It looks like your main question is how changing the field type (or, really, turning off "multiValued" on a field) can cause the number of documents in your index to decrease, right? Well, it can't, or shouldn't. I am guessing you simply did something wrong, like not indexing all docs, or got errors while indexing that you didn't notice, or some such.

If all you changed is a field's type, this alone should not cause your index to have fewer documents.


Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----

From: EL KASMI Hicham <hicham.el.ka...@ulb.ac.be>
To: solr-user@lucene.apache.org
Sent: Thu, January 21, 2010 5:14:22 AM
Subject: Solr configuration issue for sorting on title field

Hello again,

We have a problem with sorting on the title field in the Solr instance of
our production repository. We get the error message:

"HTTP Status 500 - there are more terms than documents in field
"titleStr", but it's impossible to sort on tokenized fields".

After some googling and searching in this listserv, we found that a
sorting field has to be untokenized but our sorting field "titleStr"
which is a copy of the "title" field has a string type.

These are the configs we tried in our schema.xml file:

1st config
++++++++++

sortMissingLast="true" omitNorms="true"/>

positionIncrementGap="100">

version="icu4j" composed="false" remove_diacritics="true"
remove_modifiers="true" fold="true"/>

words="stopwords.txt"/>

generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0"/>

protected="protwords.txt"/>

version="icu4j" composed="false" remove_diacritics="true"
remove_modifiers="true" fold="true"/>

synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

words="stopwords.txt"/>

generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0"/>

protected="protwords.txt"/>

===============================

termVectors="true"/>

=================================

As you can see, the title field has the termVectors property set to true;
we drop it in the second attempt of our config.

2nd attempt
+++++++++++


positionIncrementGap="100">



version="icu4j" composed="false" remove_diacritics="true"
remove_modifiers="true" fold="true"/>


words="stopwords.txt"/>

generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0"/>


protected="protwords.txt"/>





version="icu4j" composed="false" remove_diacritics="true"
remove_modifiers="true" fold="true"/>


synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

words="stopwords.txt"/>

generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0"/>


protected="protwords.txt"/>




===============================




=================================




3rd attempt
+++++++++++
Create a new field type named 'text_exact' which doesn't use the
"WhitespaceTokenizer" tokenizer but instead uses the "KeywordTokenizer"
tokenizer.



positionIncrementGap="100">



version="icu4j" composed="false" remove_diacritics="true"
remove_modifiers="true" fold="true"/>


words="stopwords.txt"/>

generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0"/>


protected="protwords.txt"/>





version="icu4j" composed="false" remove_diacritics="true"
remove_modifiers="true" fold="true"/>


synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

words="stopwords.txt"/>

generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0"/>


protected="protwords.txt"/>




sortMissingLast="true" omitNorms="true">




version="icu4j" composed="false" remove_diacritics="true"
remove_modifiers="true" fold="true"/>


words="stopwords.txt"/>

generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0"/>


protected="protwords.txt"/>






version="icu4j" composed="false" remove_diacritics="true"
remove_modifiers="true" fold="true"/>


synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

words="stopwords.txt"/>

generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0"/>


protected="protwords.txt"/>





===============================



stored="false"/>

=================================

No copyField.


4th attempt
+++++++++++

Same config as the 3rd attempt, but stating explicitly that the property
'multiValued' of the titleStr (text_exact type) field is false


multiValued="false"/>

For this last config, we noticed that the number of documents in our
index dropped from ~22500 records to ~17800 records! We don't
understand this behavior of Solr/Lucene.

For all these configs, we got the same error message. Please note that
we encounter this issue on our production server
http://difusion.ulb.ac.be/vufind/Search/Home?lookfor=&sort=pubdate+desc&submitButton=Recherche&type=general&sort=title
(with ~22500 records), whereas with the same config (the first one) on
our test server
http://bib17.ulb.ac.be/vufind/Search/Home?lookfor=&sort=pubdate+desc&submitButton=Find&type=general&sort=title
(with ~57700 records), sorting on title works fine!

Thanks in advance for any time you can spend looking at this.

Best regards,

Hicham El Kasmi
