On 4/17/2015 7:45 PM, Vincenzo D'Amore wrote:
Hi Shawn,
thanks for your answer.
I apologise for my English; by "floating results" I meant random results
in queries.
As far as I know, we have to split the synonyms file because zookeeper
has a limit on the size of files (1MB).
All my synonyms are about 10MB.
That's a very large synonyms file. If your synonyms happen at index
time, that might slow down indexing, and as I said in my previous
reply, a full reindex would be required after updating the synonyms. If
your synonyms are at query time, a reindex wouldn't be required. Such a
large synonym file at query time could add noticeable time to query
parsing, because every term in the query would need to be checked
against every synonym.
Regarding the 1MB limit in zookeeper, you might find it more useful to
increase the limit instead of trying to use multiple files. Adding
-Djute.maxbuffer=nnnnnnnn to the java commandline on all Solr (Tomcat)
instances and all Zookeeper instances will increase this limit.
http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#Experimental+Options%2FFeatures
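The nnnnnnnn value is a size in bytes. As a rough sketch of how you might pick it, the snippet below takes the size of the largest file you plan to store and adds some headroom. The helper name, the file path, and the 1MB headroom are assumptions for illustration, not anything Solr or ZooKeeper prescribes; the only documented number here is ZooKeeper's default of 0xfffff.

```python
import os

# ZooKeeper's documented default for jute.maxbuffer (just under 1MB).
ZK_DEFAULT_LIMIT = 0xFFFFF


def jute_maxbuffer_flag(largest_file_bytes, headroom_bytes=1024 * 1024):
    """Build a -Djute.maxbuffer flag big enough for the largest file.

    headroom_bytes is an arbitrary safety margin (an assumption), and the
    result is never allowed to drop below ZooKeeper's default limit.
    """
    limit = max(largest_file_bytes + headroom_bytes, ZK_DEFAULT_LIMIT)
    return f"-Djute.maxbuffer={limit}"


if __name__ == "__main__":
    # Hypothetical path: point this at your own synonyms file.
    size = os.path.getsize("synonyms.txt")
    print(jute_maxbuffer_flag(size))
```

Whatever value you settle on, the same flag has to go on every Solr and every Zookeeper JVM, otherwise the side still running with the smaller limit will reject the large file.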
As a general rule, storing very large stuff in zookeeper is not
recommended, but synonyms will only be read when a core first starts up
or is reloaded, so I do not think it is a big problem in this case.
I have tried these steps again in the dev environment:
1. put into zookeeper an updated synonym file sinonimi_freeling/sfak (added
just one new synonym)
2. reload the core using Core Admin UI
Then I started to receive random results executing a simple query like:
http://src-dev-3:8080/solr/0bis/select/?q=smartphone&fl=*&rows=24
The numFound value changes randomly in
<result name="response" numFound="641" start="0" maxScore="4.653946">
and the order of documents varies.
If numFound is changing when you run the same query multiple times,
one of two things is happening:
1) You have documents with the same uniqueKey value in more than one
shard. This can happen if you are using implicit (manual) document
routing for multiple shards.
2) Different replicas of your index have different settings (such as the
synonyms), or different documents in the index. Different settings can
happen if you update the config and then only reload/restart some of
your cores. Different documents in different replicas is usually an
indication of a bug, or something going very wrong, such as OutOfMemory
errors.
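
One way to narrow this down is to query each core directly with distrib=false, so that it answers only from its own index, and compare the counts between replicas of the same shard. Here is a minimal sketch of that check. The host and core names are hypothetical placeholders, and it assumes the cores can return JSON responses (wt=json).

```python
import json
import urllib.parse
import urllib.request


def fetch_num_found(core_url, query="*:*"):
    """Ask one core for its own hit count, bypassing distributed search."""
    params = urllib.parse.urlencode(
        {"q": query, "rows": 0, "wt": "json", "distrib": "false"}
    )
    with urllib.request.urlopen(f"{core_url}/select?{params}", timeout=10) as resp:
        return json.load(resp)["response"]["numFound"]


def inconsistent_shards(counts_by_shard):
    """Return the shards whose replicas report different hit counts.

    counts_by_shard maps shard name -> {core_url: numFound}.
    """
    return sorted(
        shard
        for shard, counts in counts_by_shard.items()
        if len(set(counts.values())) > 1
    )


if __name__ == "__main__":
    # Hypothetical layout: replace with the real cores of your collection.
    replicas = {
        "shard1": [
            "http://host1:8080/solr/mycoll_shard1_replica1",
            "http://host2:8080/solr/mycoll_shard1_replica2",
        ],
    }
    counts = {
        shard: {url: fetch_num_found(url, "smartphone") for url in urls}
        for shard, urls in replicas.items()
    }
    print(counts)
    print("Shards with mismatched replicas:", inconsistent_shards(counts))
```

If replicas of one shard disagree on q=smartphone but agree on q=*:*, that points at differing analysis config such as the synonyms rather than at differing documents.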
Thanks,
Shawn