I see a lot of people using shards to hold "different types of
documents", and it almost always seems to be a bad solution. Shards are
intended for distributing a large index over multiple hosts -- that's
it. Not for some kind of federated search over multiple schemas, not
for access control.
Why not put everything in the same index, without shards, and just use
an 'fq' (filter query) to restrict each search to the kind(s) of
document you'd like to search over? I think that would achieve your
goal a lot more simply than shards. Then you use sharding only if and
when your index grows so large that you'd like to distribute it over
multiple hosts, and when you do, you choose a shard key that gives a
more or less equal distribution across shards.
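A minimal sketch of the single-index approach in Python, using only the standard library. The Solr URL and the `doc_kind` field are hypothetical. Substitute your own core and whatever field stores the document kind. The `shard_for` helper is likewise only an illustration of hashing a unique key to pick a shard, one common way to get the roughly even spread mentioned above, and it only matters once the index outgrows a single host.

```python
from urllib.parse import urlencode
import zlib

# Hypothetical values: adjust to your own Solr core and schema.
SOLR_SELECT = "http://localhost:8983/solr/select"
KIND_FIELD = "doc_kind"  # hypothetical field holding the document kind


def kind_filter(kinds):
    """Build an fq clause restricting results to the given kinds."""
    if not kinds:
        raise ValueError("select at least one kind")
    # Quote each value so a kind containing spaces stays one term.
    terms = " OR ".join('"%s"' % k for k in kinds)
    return "%s:(%s)" % (KIND_FIELD, terms)


def select_params(user_query, kinds):
    """Single-index search: the user's q plus one fq, no shards parameter."""
    return [("q", user_query), ("fq", kind_filter(kinds))]


def select_url(user_query, kinds):
    """Full request URL for the single-index search."""
    return SOLR_SELECT + "?" + urlencode(select_params(user_query, kinds))


def shard_for(doc_id, num_shards):
    """Pick a shard from a hash of the unique key (illustration only).
    Hashing spreads documents roughly evenly, unlike one kind per shard."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


print(select_url("solr replication", ["report", "manual"]))
```

Because the kind restriction lives in `fq` rather than `q`, it does not affect relevance scoring, and Solr can cache the filter in its filterCache and reuse it across different users' queries.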
Using shards for access control or schema management just leads to
headaches.
[Apparently Solr could use some prominent documentation on what shards
are really for. It seems to be a very common issue on this list:
someone tries to use them for something else and then inevitably runs
into problems with that approach.]
Jonathan
On 1/7/2011 6:48 AM, supersoft wrote:
The reason for this distribution is the kind of documents. Although they
share the same schema structure (and Solr config), each document belongs
to one of 5 different kinds.
Each kind corresponds to a specific shard, so our client tool avoids
searching all the shards when the user selects just one or a few kinds:
it runs a multi-shard query against only the relevant shards. I think
this is the right approach, but correct me if I am wrong.
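For comparison, the per-kind routing described above amounts to building Solr's `shards` parameter from the user's selection. This Python sketch assumes one core per kind. The kind names and the host:port/path entries are made up. Each shard listed in `shards` receives its own sub-request, so a search over all five kinds fans out to five cores.

```python
from urllib.parse import urlencode

# Hypothetical layout: one Solr core per document kind.
# Entries use the host:port/path form the shards parameter expects.
SHARD_FOR_KIND = {
    "kind1": "host1:8983/solr",
    "kind2": "host2:8983/solr",
    "kind3": "host3:8983/solr",
    "kind4": "host4:8983/solr",
    "kind5": "host5:8983/solr",
}


def shards_param(selected_kinds):
    """Comma-separated list of the shards holding the selected kinds."""
    if not selected_kinds:
        raise ValueError("select at least one kind")
    shards = []
    for kind in selected_kinds:
        shard = SHARD_FOR_KIND[kind]  # KeyError for an unknown kind
        if shard not in shards:  # skip duplicates, keep order
            shards.append(shard)
    return ",".join(shards)


def distributed_url(user_query, selected_kinds):
    """Any node can coordinate; here we simply use the first selected shard."""
    shards = shards_param(selected_kinds)
    first = shards.split(",")[0]
    params = [("q", user_query), ("shards", shards)]
    return "http://%s/select?%s" % (first, urlencode(params))


print(distributed_url("solr replication", ["kind1", "kind3"]))
```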
The real problem with this architecture is how response time grows with
the number of concurrent users:
1 query: n seconds
2 queries: 2*n seconds each query
3 queries: 3*n seconds each query
and so on...
This is a real headache: a single query has an acceptable response
time, but when many users access the server at the same time,
performance drops badly.
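The reported numbers can be turned into throughput with a little arithmetic. Assuming k queries start together and each finishes after k*n seconds, as listed above, the server completes k / (k*n) = 1/n queries per second no matter how many users there are. A quick Python check, with a made-up value for n:

```python
def per_query_seconds(concurrent_queries, n):
    """Latency each query sees, following the reported pattern k * n."""
    return concurrent_queries * n


def throughput_qps(concurrent_queries, n):
    """Queries finished per second when k queries run together."""
    return concurrent_queries / per_query_seconds(concurrent_queries, n)


n = 2.0  # hypothetical single-query time in seconds
for k in range(1, 6):
    print(k, per_query_seconds(k, n), throughput_qps(k, n))
```

Throughput that stays flat as concurrency grows suggests the concurrent queries are contending for the same bottleneck rather than being served in parallel.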