On 4/17/2015 7:45 PM, Vincenzo D'Amore wrote:
Hi Shawn,

thanks for your answer.

I apologise for my English; by "floating results" I meant random results
in queries.

As far as I know, we have to split the synonyms file because ZooKeeper
limits the size of files (1MB).
All my synonyms together are about 10MB.

That's a very large synonyms file. If your synonyms are applied at index time, that might slow down indexing, and as I said in my previous reply, a full reindex would be required after updating the synonyms. If your synonyms are applied at query time, a reindex wouldn't be required. Such a large synonym file at query time could add noticeable time to query parsing, because every term in the query would need to be checked against the synonym list.

Regarding the 1MB limit in ZooKeeper, you might find it more useful to increase the limit instead of trying to use multiple files. Adding -Djute.maxbuffer=nnnnnnnn to the java command line on all Solr (Tomcat) instances and all ZooKeeper instances will increase this limit.

http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#Experimental+Options%2FFeatures

As a general rule, storing very large files in ZooKeeper is not recommended, but synonyms are only read when a core first starts up or is reloaded, so I do not think it is a big problem in this case.
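To make the sizing concrete, here is a minimal Python sketch that compares a file size against ZooKeeper's default jute.maxbuffer (0xfffff bytes, just under 1MB) and suggests a larger value. The function name and the 50% headroom are assumptions for illustration only; the limit applies to the whole serialized request rather than the bare file, so treat the result as a rough starting point.

```python
import math

# ZooKeeper's default jute.maxbuffer: 0xfffff bytes, just under 1MB.
DEFAULT_JUTE_MAXBUFFER = 0xFFFFF


def suggest_jute_maxbuffer(file_size_bytes, headroom=1.5):
    """Return None if the file fits under the default limit,
    otherwise a suggested jute.maxbuffer value in bytes.

    headroom is an assumed safety factor, not a ZooKeeper setting.
    """
    if file_size_bytes < DEFAULT_JUTE_MAXBUFFER:
        return None
    return math.ceil(file_size_bytes * headroom)
```

For a 10MB file, suggest_jute_maxbuffer(10 * 1024 * 1024) gives a value that would be passed as -Djute.maxbuffer=15728640 on every Solr and ZooKeeper instance.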

I tried these steps again in the dev environment:
1. put an updated synonym file, sinonimi_freeling/sfak, into ZooKeeper (added
just one new synonym)
2. reload the core using the Core Admin UI

Then I started to receive random results when executing a simple query like:

http://src-dev-3:8080/solr/0bis/select/?q=smartphone&fl=*&rows=24

The numFound value is random in

<result name="response" numFound="641" start="0" maxScore="4.653946">

and the order of documents varies.

If numFound is changing when you run the same query multiple times, one of two things is happening:

1) You have documents with the same uniqueKey value in more than one shard. This can happen if you are using implicit (manual) document routing for multiple shards.

2) Different replicas of your index have different settings (such as the synonyms), or different documents in the index. Different settings can happen if you update the config and then only reload/restart some of your cores. Different documents in different replicas is usually an indication of a bug, or something going very wrong, such as OutOfMemory errors.
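One way to tell these two cases apart is to query each core directly with distrib=false, so that it answers only from its own local index, and then compare the results. The following is a rough Python sketch of that idea; the function names and the shape of the input dictionaries are assumptions for illustration, and the core URLs would have to come from your own cluster layout.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter


def num_found(core_url, query):
    """Ask a single core for its local hit count (distrib=false)."""
    params = urllib.parse.urlencode(
        {"q": query, "rows": 0, "wt": "json", "distrib": "false"}
    )
    with urllib.request.urlopen(f"{core_url}/select?{params}") as resp:
        return json.load(resp)["response"]["numFound"]


def mismatched_replicas(counts_by_shard):
    """Case 2: return the shards whose replicas report different counts.

    counts_by_shard maps shard name -> {replica name: numFound}.
    """
    return {
        shard: counts
        for shard, counts in counts_by_shard.items()
        if len(set(counts.values())) > 1
    }


def duplicated_ids(ids_by_shard):
    """Case 1: return uniqueKey values present in more than one shard.

    ids_by_shard maps shard name -> iterable of uniqueKey values.
    """
    seen = Counter()
    for ids in ids_by_shard.values():
        seen.update(set(ids))  # count each id once per shard
    return sorted(doc_id for doc_id, n in seen.items() if n > 1)
```

If mismatched_replicas returns anything for q=smartphone, the replicas of that shard disagree, which points to case 2; if the replicas agree but duplicated_ids finds keys shared between shards, case 1 is the more likely cause.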

Thanks,
Shawn
