Hi,

Long message. I skimmed through your configs. It looks like your main question is how changing the field type (or, really, turning off "multiValued" on a field) can cause the number of documents in your index to decrease, right? Well, it can't, or at least it shouldn't. I am guessing something simply went wrong during indexing: perhaps not all docs were indexed, or you got errors while indexing that you didn't notice, or some such.
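One quick way to check whether docs went missing is to compare the count Solr reports with the number of records you sent to it. Here is a minimal sketch in Python, assuming you query the select handler with q=*:*&rows=0&wt=json. The helper names `num_found` and `missing_docs` are made up for illustration, the URL in the comment is only a guess at a default setup, and the sample response is hand-written from the counts in your message, not real output:

```python
import json


def num_found(response_text: str) -> int:
    """Return the total hit count from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


def missing_docs(expected: int, response_text: str) -> int:
    """Return how many documents are absent relative to the expected count."""
    return expected - num_found(response_text)


# Hand-written sample body, shaped like the response to a request such as
#   http://localhost:8983/solr/select?q=*:*&rows=0&wt=json
# (host, port and path are assumptions; adjust to your installation).
sample = (
    '{"responseHeader":{"status":0,"QTime":1},'
    '"response":{"numFound":17800,"start":0,"docs":[]}}'
)

print(missing_docs(22500, sample))  # 4700
```

If the difference is not zero, the indexing logs are the next place to look for documents that were rejected.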
If all you changed is a field's type, this alone should not cause your index to have fewer documents.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

----- Original Message ----
> From: EL KASMI Hicham <hicham.el.ka...@ulb.ac.be>
> To: solr-user@lucene.apache.org
> Sent: Thu, January 21, 2010 5:14:22 AM
> Subject: Solr configuration issue for sorting on title field
>
> Hello again,
>
> We have a problem with sorting on title field in Solr instance of our
> production repository, we get the error message:
>
> "HTTP Status 500 - there are more terms than documents in field
> "titleStr", but it's impossible to sort on tokenized fields".
>
> After some googling and searching in this listserv, we found that a
> sorting field has to be untokenized but our sorting field "titleStr"
> which is a copy of the "title" field has a string type.
>
> What we did as configs in our schema.xml file :
>
> 1st config
> ++++++++++
>
> sortMissingLast="true" omitNorms="true"/>
>
> positionIncrementGap="100">
>
> version="icu4j" composed="false" remove_diacritics="true"
> remove_modifiers="true" fold="true"/>
>
> words="stopwords.txt"/>
>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0"/>
>
> protected="protwords.txt"/>
>
> version="icu4j" composed="false" remove_diacritics="true"
> remove_modifiers="true" fold="true"/>
>
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>
> words="stopwords.txt"/>
>
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0"/>
>
> protected="protwords.txt"/>
>
> ===============================
>
> termVectors="true"/>
>
> =================================
>
> As you can see, the title field has the termVectors property as true, we
> drop it in the second attempt of our config
>
> 2nd attempt
> ++++++++++++
>
> positionIncrementGap="100">
>
> version="icu4j" composed="false" remove_diacritics="true"
> remove_modifiers="true" fold="true"/>
>
> words="stopwords.txt"/>
>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0"/>
>
> protected="protwords.txt"/>
>
> version="icu4j" composed="false" remove_diacritics="true"
> remove_modifiers="true" fold="true"/>
>
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>
> words="stopwords.txt"/>
>
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0"/>
>
> protected="protwords.txt"/>
>
> ===============================
>
> =================================
>
> 3rd attempt
> +++++++++++
> Create a new field type named 'text_exact' which doesn't use the
> "WhitespaceTokenizer" tokenizer but instead uses the "KeywordTokenizer"
> tokenizer.
>
> positionIncrementGap="100">
>
> version="icu4j" composed="false" remove_diacritics="true"
> remove_modifiers="true" fold="true"/>
>
> words="stopwords.txt"/>
>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0"/>
>
> protected="protwords.txt"/>
>
> version="icu4j" composed="false" remove_diacritics="true"
> remove_modifiers="true" fold="true"/>
>
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>
> words="stopwords.txt"/>
>
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0"/>
>
> protected="protwords.txt"/>
>
> sortMissingLast="true" omitNorms="true">
>
> version="icu4j" composed="false" remove_diacritics="true"
> remove_modifiers="true" fold="true"/>
>
> words="stopwords.txt"/>
>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0"/>
>
> protected="protwords.txt"/>
>
> version="icu4j" composed="false" remove_diacritics="true"
> remove_modifiers="true" fold="true"/>
>
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>
> words="stopwords.txt"/>
>
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0"/>
>
> protected="protwords.txt"/>
>
> ===============================
>
> stored="false"/>
>
> =================================
>
> No copyField.
>
> 4th attempt
> +++++++++++
> Same config as the 3rd attempt but adding explicitly that the property
> 'multiValued' of titleStr (text_exact type) field is false
>
> multiValued="false"/>
>
> For this last config, we noticed that the number of documents in our
> index was downsized from ~22500 records to ~17800 records! We don't
> understand this behavior of Solr/Lucene.
>
> For all these configs, we got the same error message, please note that
> we encounter this issue on our production server
> http://difusion.ulb.ac.be/vufind/Search/Home?lookfor=&sort=pubdate+desc&submitButton=Recherche&type=general&sort=title
> (with ~22500 records),
>
> or with the same config (the first one) on our test server
> http://bib17.ulb.ac.be/vufind/Search/Home?lookfor=&sort=pubdate+desc&submitButton=Find&type=general&sort=title
> (with ~57700 records), the sorting on title is going well!
>
> Thanks in advance for the time you can spend to have a look on this.
>
> Best regards,
>
> Hicham El Kasmi