Hello. My company is currently thinking of switching from SQL Server to
Solr 4.2. However, what we need to do is a bit unusual.

Right now, we have ~12 million segments and growing. Usually these are
sentences, but they can be other things. These segments are what will be
stored in Solr, and I’ve already indexed them.

Now, what happens is that a user uploads, say, a Word document to us. We
then parse it and process it into segments. There could easily be 5000
segments or even more in that Word document. For each one of those ~5000
segments, we need to search Solr for similar segments. I’m not quite sure
yet how I will write the query (whether as a proximity query or something
else). The point, though, is to get back similar results for each segment.

However, I think I’m seeing a bigger problem first. I have to search
for ~5000 segments, which would mean ~5000 HTTP requests per document.
That’s a lot, and I’m pretty sure it would take a lot of hardware. Keep in
mind this could be happening for maybe 4 different users at once right now
(and of course more in the future). Is there a good way to send a batch of
queries over one (or at least far fewer) HTTP requests?

If not, what kinds of things could I do to implement such a feature (if
feasible, of course)?
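
To make the concern concrete, here is a minimal sketch of the naive
one-request-per-segment approach described above. It is only an
illustration: the core URL, the field name `text`, the proximity slop of
10, and the helper names `build_query_url` and `search_all` are all
assumptions, and the actual HTTP call is left to a caller-supplied `fetch`
function.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode

# Hypothetical values: the core URL and the field name are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/segments/select"
TEXT_FIELD = "text"


def build_query_url(segment, rows=5):
    """Build one /select URL that looks up segments similar to `segment`."""
    # Escape backslashes and quotes so the segment can be sent as a phrase.
    escaped = segment.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        # "~10" is a placeholder phrase slop, not a tuned value.
        "q": '{}:"{}"~10'.format(TEXT_FIELD, escaped),
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


def search_all(segments, fetch, max_workers=8):
    """Issue one query per segment through `fetch` (a caller-supplied
    function that takes a URL and returns the response), using a bounded
    thread pool. Results come back in the same order as `segments`."""
    urls = [build_query_url(s) for s in segments]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

With ~5000 segments this still issues ~5000 requests; the thread pool only
caps how many are in flight at once, which is exactly the cost I would
like to avoid.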


Thanks,

Mike
