Are you sending in more than one title per document, by chance?
Have you changed your configuration without reindexing the entire
collection, possibly?
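To illustrate the first question: a string field is indexed as one term per value, so a document that carries two title values ends up with two titleStr terms through the copyField, and enough such documents can push the term count above the document count. A minimal sketch in Solr's XML update format (the "id" field and the title values are made up for illustration and are not taken from your data):

```xml
<add>
  <doc>
    <!-- hypothetical unique key, for illustration only -->
    <field name="id">example-1</field>
    <!-- two values for the same source field: copyField passes both
         on to titleStr, giving this one document two sort terms -->
    <field name="title">Main title</field>
    <field name="title">Alternative title</field>
  </doc>
</add>
```

If that is what is happening, declaring the sort field with multiValued="false" should, as far as I know, make Solr reject such documents at index time instead of failing later at sort time, so it is worth checking the indexing logs for errors.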
Erik
On Jan 25, 2010, at 8:32 AM, EL KASMI Hicham wrote:
Thanks Otis,
The long message was an attempt to be as detailed as possible, so
that one could understand our tests. I'm afraid the problem we
wanted to describe didn't really come through.
These are the relevant entries from our config:
======================================================================
<field name="title" type="text" indexed="true" stored="true"
termVectors="true" />
<field name="titleStr" type="string" indexed="true" stored="false" />
<copyField source="title" dest="titleStr" />
======================================================================
We want to sort on titleStr; but we end up with the error message:
"HTTP Status 500 - there are more terms than documents in field
"titleStr", but it's impossible to sort on tokenized fields".
We don't understand this message or what is wrong with our config. We
tried several other configs, as described in my first message, but
without success.
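One thing worth ruling out when several configs are tried in succession: terms indexed under an earlier field definition stay in the index until the documents are indexed again. A minimal sketch of emptying the index through Solr's XML update handler before a full reindex, assuming the standard /update handler is enabled (this is destructive, so it is best tried on the test server first):

```xml
<!-- First message posted to /update: removes every document from the index -->
<delete><query>*:*</query></delete>

<!-- Second message posted to /update: makes the deletion visible to searchers -->
<commit/>
```

After that, all records have to be indexed again with the new schema.xml in place before testing the sort.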
Thanks for any clarification you can provide us.
Hicham.
==========================================
Hicham El Kasmi
Université Libre de Bruxelles - Libraries
Av. F.D. Roosevelt 50, CP 180
1050 BRUSSELS Belgium
Tel: + 32 2 650 25 30
Fax: + 32 2 650 23 91
==========================================
-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Friday, January 22, 2010 6:17
To: solr-user@lucene.apache.org
Subject: Re: Solr configuration issue for sorting on title field
Hi,
Long message. I skimmed through your configs. It looks like your
main question is how changing the field type (or, really,
turning off "multiValued" on a field) can cause the number of documents in
your index to decrease, right? Well, it can't, or shouldn't. I am
guessing you simply did something wrong, like not indexing all docs, or
getting errors while indexing that you didn't notice, or some such.
If all you changed is a field's type, this alone should not cause
your index to have fewer documents.
Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
----- Original Message ----
From: EL KASMI Hicham <hicham.el.ka...@ulb.ac.be>
To: solr-user@lucene.apache.org
Sent: Thu, January 21, 2010 5:14:22 AM
Subject: Solr configuration issue for sorting on title field
Hello again,
We have a problem sorting on the title field in the Solr instance of our
production repository. We get the error message:
"HTTP Status 500 - there are more terms than documents in field
"titleStr", but it's impossible to sort on tokenized fields".
After some googling and searching in this listserv, we found that a
sorting field has to be untokenized, but our sorting field "titleStr",
which is a copy of the "title" field, already has a string type.
Here are the configs we tried in our schema.xml file:
1st config
++++++++++
sortMissingLast="true" omitNorms="true"/>
positionIncrementGap="100">
version="icu4j" composed="false" remove_diacritics="true"
remove_modifiers="true" fold="true"/>
words="stopwords.txt"/>
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0"/>
protected="protwords.txt"/>
version="icu4j" composed="false" remove_diacritics="true"
remove_modifiers="true" fold="true"/>
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
words="stopwords.txt"/>
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0"/>
protected="protwords.txt"/>
===============================
termVectors="true"/>
=================================
As you can see, the title field has the termVectors property set to
true. We dropped it in the second attempt at our config.
2nd attempt
+++++++++++
positionIncrementGap="100">
version="icu4j" composed="false" remove_diacritics="true"
remove_modifiers="true" fold="true"/>
words="stopwords.txt"/>
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0"/>
protected="protwords.txt"/>
version="icu4j" composed="false" remove_diacritics="true"
remove_modifiers="true" fold="true"/>
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
words="stopwords.txt"/>
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0"/>
protected="protwords.txt"/>
===============================
=================================
3rd attempt
+++++++++++
Create a new field type named 'text_exact' which doesn't use the
"WhitespaceTokenizer" tokenizer but instead uses the
"KeywordTokenizer" tokenizer.
positionIncrementGap="100">
version="icu4j" composed="false" remove_diacritics="true"
remove_modifiers="true" fold="true"/>
words="stopwords.txt"/>
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0"/>
protected="protwords.txt"/>
version="icu4j" composed="false" remove_diacritics="true"
remove_modifiers="true" fold="true"/>
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
words="stopwords.txt"/>
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0"/>
protected="protwords.txt"/>
sortMissingLast="true" omitNorms="true">
version="icu4j" composed="false" remove_diacritics="true"
remove_modifiers="true" fold="true"/>
words="stopwords.txt"/>
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0"/>
protected="protwords.txt"/>
version="icu4j" composed="false" remove_diacritics="true"
remove_modifiers="true" fold="true"/>
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
words="stopwords.txt"/>
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0"/>
protected="protwords.txt"/>
===============================
stored="false"/>
=================================
No copyField.
4th attempt
+++++++++++
Same config as the 3rd attempt, but explicitly setting the
'multiValued' property of the titleStr field (text_exact type) to false:
multiValued="false"/>
With this last config, we noticed that the number of documents in our
index dropped from ~22500 records to ~17800 records! We don't
understand this behavior of Solr/Lucene.
For all these configs we got the same error message. Please note that
we encounter this issue on our production server (with ~22500 records):
http://difusion.ulb.ac.be/vufind/Search/Home?lookfor=&sort=pubdate+desc&submitButton=Recherche&type=general&sort=title
whereas with the same config (the first one) on our test server
(with ~57700 records), sorting on title works fine:
http://bib17.ulb.ac.be/vufind/Search/Home?lookfor=&sort=pubdate+desc&submitButton=Find&type=general&sort=title
Thanks in advance for taking the time to look at this.
Best regards,
Hicham El Kasmi