Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird.
Right now, we have ~12 million segments and growing. Usually these are sentences, but they can be other things. These segments are what will be stored in Solr, and I’ve already loaded them.

Here is the workflow: a user uploads, say, a Word document to us. We parse it and split it into segments. A single document could easily contain 5000 segments or more. For each of those ~5000 segments, we need to search Solr for similar segments. I’m not yet sure how I will write the query (a proximity query or something else). The point is to get back similar results for each segment.

However, I think there is a bigger problem to solve first. Searching for ~5000 segments means 5000 HTTP requests per document. That’s a lot, and I suspect it would take a lot of hardware. Keep in mind this could be happening for maybe 4 different users at once right now (and of course more in the future).

Is there a good way to send a batch of queries in one HTTP request (or at least in far fewer requests)? If not, what could I do to implement such a feature, if it is feasible at all?

Thanks,
Mike
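
P.S. To make the workload concrete, here is a rough sketch of the naive one-request-per-segment pattern described above. The host, the core name `segments`, the field name `text`, and the helper names are all hypothetical, and the quoted phrase query is only a placeholder since the real similarity query is still undecided. It only builds the request URLs; it does not send them.

```python
from urllib.parse import urlencode

# Hypothetical values: adjust host, core name, and field name to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/segments/select"
TEXT_FIELD = "text"


def build_query_url(segment, rows=5):
    """Build the select URL for one segment as a quoted phrase query."""
    # Escape backslashes and double quotes so the segment stays one phrase.
    escaped = segment.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": '{}:"{}"'.format(TEXT_FIELD, escaped),  # placeholder query
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


def build_all_urls(segments):
    """One URL per segment: the 5000-request pattern in question."""
    return [build_query_url(s) for s in segments]


if __name__ == "__main__":
    doc_segments = ["The quick brown fox.", "Jumps over the lazy dog."]
    for url in build_all_urls(doc_segments):
        print(url)
```

With ~5000 segments per document, `build_all_urls` yields ~5000 URLs, each of which would be a separate round trip to Solr. That is the cost I would like to reduce.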