This copying is a bit overstated here because of the way that small
segments are merged into larger segments. Those larger segments are then
copied much less often than the smaller ones.
While you can wind up with lots of copying in certain extreme cases, it is
quite rare. In particular, if you
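The reason merging keeps copying bounded can be seen in a toy model. The sketch below assumes a simplified log-structured policy: small segments are flushed, and every `merge_factor` segments at the same level are merged into one larger segment. It is not Lucene's actual merge policy, and `simulate_merges` and its parameters are hypothetical. In this model each document is copied about once per level, roughly log_m(N) times in total, rather than once per merge.

```python
def simulate_merges(num_flushes: int, docs_per_flush: int = 1, merge_factor: int = 10) -> int:
    """Count document copies made by merging in a toy log-structured scheme.

    Each flush writes a new small segment at level 0 (not counted as a copy).
    Whenever a level holds `merge_factor` segments, they are merged into one
    segment at the next level, which rewrites (copies) every document in them.
    """
    levels: list[list[int]] = [[]]  # levels[i] = sizes (in docs) of segments at level i
    copies = 0
    for _ in range(num_flushes):
        levels[0].append(docs_per_flush)
        level = 0
        # Cascade upward while the current level is full.
        while len(levels[level]) >= merge_factor:
            merged = sum(levels[level])
            copies += merged
            levels[level] = []
            if level + 1 == len(levels):
                levels.append([])
            levels[level + 1].append(merged)
            level += 1
    return copies
```

With 1,000 single-document flushes and a merge factor of 10, each document is copied 3 times (3,000 copies in total): once into a 10-document segment, once into a 100-document segment, and once into the final 1,000-document segment.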
Here is an example of schema design: a 5 MB PDF file might have
maybe 50 KB of actual text. The Solr ExtractingRequestHandler will find
that text and index only that. If you set the field to stored=true,
the 5 MB will be saved. If stored=false, the PDF is not saved. Instead,
you would store a link to
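To make the stored/indexed trade-off concrete, the sketch below counts only what stored fields cost: the raw values kept alongside the index. The schema dict, the field names `content` and `source_url`, and this byte accounting are illustrative assumptions, not Solr's schema format or its on-disk layout.

```python
# Hypothetical schema: field name -> stored flag. Every field here is assumed to be
# indexed (searchable); "stored" only controls whether the raw value is kept.
SCHEMA = {
    "id": True,
    "content": False,     # searchable text, not stored
    "source_url": True,   # link to the original file kept outside the index
}


def stored_bytes(doc: dict[str, str], schema: dict[str, bool]) -> int:
    """Rough stored-field cost of one document: UTF-8 length of each stored value.

    Ignores compression and the size of the inverted index itself.
    """
    return sum(
        len(value.encode("utf-8"))
        for field, value in doc.items()
        if schema.get(field, False)
    )
```

For a document whose `content` holds 50,000 characters of text and whose `source_url` is a short (hypothetical) link such as `https://example.com/files/1.pdf`, the unstored layout keeps only the id and the link, while flipping `content` to stored adds those 50,000 bytes for every document.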
For data of this size you may want to look at something like Apache
Cassandra, which is made specifically to handle data at this kind of
scale across many machines.
You can still use Hadoop to analyse and transform the data in a
performant manner; however, it's probably best to do some research on
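Cassandra spreads rows across machines by hashing each row's partition key onto a token ring. The toy ring below illustrates that idea with MD5 and virtual nodes; it is a simplified sketch, not Cassandra's actual partitioner or replication strategy, and `HashRing`, `node_for` and `replicas_for` are hypothetical names.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hashing ring: each node owns several points (virtual nodes),
    and a key belongs to the first point at or after the key's hash, wrapping
    around at the end of the ring."""

    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        self._points: list[tuple[int, str]] = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    @staticmethod
    def _hash(key: str) -> int:
        # MD5 is used only to spread keys evenly, not for security.
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        idx = bisect.bisect_left(self._hashes, self._hash(key))
        return self._points[idx % len(self._points)][1]

    def replicas_for(self, key: str, n: int) -> list[str]:
        """First n distinct nodes walking clockwise from the key (naive replica placement)."""
        start = bisect.bisect_left(self._hashes, self._hash(key))
        owners: list[str] = []
        for step in range(len(self._points)):
            node = self._points[(start + step) % len(self._points)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners
```

Adding a node only moves the keys that now fall just before one of the new node's points; every other key keeps its owner, which is what makes growing a cluster from tens to hundreds of servers manageable.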
Well, that begins to look less like a Solr/Lucene problem. Overall
data is moderately large (TBs to tens of TBs) for Lucene, and the
individual user profiles are distinctly large for storing in Lucene.
If there is a part of the profile that you might want to search, that would
be appropria
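The "TBs to tens of TBs" figure follows from the numbers in the original question (millions of users at roughly 5 MB each). The helper below is just that arithmetic in decimal units; the `replicas` parameter is an added assumption, since failover implies keeping more than one copy of the data.

```python
MB = 10**6
TB = 10**12


def total_tb(users: int, mb_per_user: float = 5.0, replicas: int = 1) -> float:
    """Raw data volume in (decimal) terabytes, ignoring index overhead."""
    return users * mb_per_user * MB * replicas / TB
```

One million users at 5 MB each is 5 TB, ten million is 50 TB, and keeping three copies of one million users' data for failover brings it to 15 TB.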
Well, we haven't actually started the project yet.
But it will probably have to handle data for millions of users,
and a rough estimate for each user's data is around
5 MB.
The other problem is that this data will change very often.
I hope that answers your question.
You didn't mention how big your data is or how you create it.
Hadoop would mostly be used in the preparation of the data or the off-line
creation of indexes.
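One common shape for off-line index creation with Hadoop is a map step that routes each document to a shard and a reduce step that builds each shard's index. The pure-Python sketch below imitates only that data flow with a toy inverted index; it does not use Hadoop or Lucene, and `map_phase` and `reduce_phase` are hypothetical names.

```python
import zlib
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_phase(docs: dict[str, str], num_shards: int) -> Iterator[tuple[int, str, str]]:
    """Map step: route each document to a shard by a stable hash of its id.

    crc32 is used instead of hash(), which is salted per process for strings.
    """
    for doc_id, text in docs.items():
        shard = zlib.crc32(doc_id.encode("utf-8")) % num_shards
        yield shard, doc_id, text


def reduce_phase(mapped: Iterable[tuple[int, str, str]]) -> dict[int, dict[str, set[str]]]:
    """Reduce step: group by shard and build a toy inverted index (term -> doc ids) per shard."""
    shards: dict[int, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for shard, doc_id, text in mapped:
        for term in text.lower().split():
            shards[shard][term].add(doc_id)
    return shards
```

Because routing depends only on the document id, each document lands in exactly one shard, and the per-shard indexes can be built independently and copied to the search servers afterwards.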
On Tue, Dec 20, 2011 at 12:28 PM, Alireza Salimi wrote:
Hi,
I have a basic question. Let's say we're going to have a very, very large set
of data, so large that we will certainly need many servers (tens or hundreds of
servers).
We will also need failover.
Now the question is whether we should use Hadoop, or whether Solr Distributed
Search
with shards would be en
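Solr distributed search sends a query to each shard and merges the partial results. The sketch below models that scatter-gather step plus a naive failover rule (use the first replica of each shard that is not marked down). The replica layout, the `down` set and the function names are hypothetical, and it assumes scores from different shards are directly comparable.

```python
import heapq

Hit = tuple[float, str]  # (score, doc_id)


def search_shard(index: dict[str, float], k: int) -> list[Hit]:
    """Stand-in for a per-shard query: the shard's own top-k hits by score."""
    return heapq.nlargest(k, ((score, doc_id) for doc_id, score in index.items()))


def distributed_search(
    shard_replicas: list[list[tuple[str, dict[str, float]]]],
    k: int,
    down: frozenset[str] = frozenset(),
) -> list[Hit]:
    """Scatter the query to one live replica per shard, then merge the partial top-k lists."""
    partials: list[list[Hit]] = []
    for replicas in shard_replicas:
        for name, index in replicas:
            if name not in down:
                partials.append(search_shard(index, k))
                break
        else:  # no break: every replica of this shard is down
            raise RuntimeError("no live replica for a shard")
    return heapq.nlargest(k, (hit for part in partials for hit in part))
```

Each shard only needs to return its own top k hits, because the global top k must be among them; losing one replica is harmless as long as another copy of that shard is still up.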