subject:"Solr Distributed Search vs Hadoop"

Re: Solr Distributed Search vs Hadoop

2011-12-28 Thread Ted Dunning

This copying is a bit overstated here because of the way that small segments are merged into larger segments. Those larger segments are then copied much less often than the smaller ones. While you can wind up with lots of copying in certain extreme cases, it is quite rare. In particular, if you
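To illustrate the point about merging, here is a minimal sketch of a tiered merge scheme. It is a simplified model, not Lucene's actual merge policy: the function name, the unit-sized flushes and the fixed merge factor of 10 are all assumptions made for the illustration. It counts how many document-units get rewritten as small segments are folded into larger ones.

```python
def simulate_merge_copies(num_flushes: int, merge_factor: int = 10) -> int:
    """Count document-units copied by merges in a simplified tiered scheme.

    Assumption (not Lucene's real policy): every flush creates a segment of
    size 1, and whenever `merge_factor` segments exist at one level they are
    merged into a single segment at the next level.
    """
    levels = [0]   # levels[i] = number of segments of size merge_factor**i
    copied = 0     # total document-units rewritten by merges

    for _ in range(num_flushes):
        level = 0
        levels[0] += 1
        # Cascade merges upward while a level is full.
        while levels[level] == merge_factor:
            levels[level] = 0
            copied += merge_factor ** (level + 1)  # size of the merged segment
            level += 1
            if level == len(levels):
                levels.append(0)
            levels[level] += 1
    return copied


if __name__ == "__main__":
    flushes = 1000
    total = simulate_merge_copies(flushes)
    print(f"{flushes} unit flushes, {total} units copied, "
          f"{total / flushes:.1f} copies per unit")
```

Under these assumptions each unit is copied once per level it passes through, so the copy count per unit grows with the logarithm of the index size, and the largest segments are rewritten the least often.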

Re: Solr Distributed Search vs Hadoop

2011-12-28 Thread Lance Norskog

Here is an example of schema design: a PDF file of 5 MB might have maybe 50 KB of actual text. The Solr ExtractingRequestHandler will find that text and only index that. If you set the field to stored=true, the 5 MB will be saved. If stored=false, the PDF is not saved. Instead, you would store a link to

Re: Solr Distributed Search vs Hadoop

2011-12-23 Thread Nick Vincent

For data of this size you may want to look at something like Apache Cassandra, which is made specifically to handle data at this kind of scale across many machines. You can still use Hadoop to analyse and transform the data in a performant manner; however, it's probably best to do some research on

Re: Solr Distributed Search vs Hadoop

2011-12-20 Thread Ted Dunning

Well, that begins to not look so much like a Solr/Lucene problem. Overall data is moderately large (TBs to tens of TBs) for Lucene and the individual user profiles are distinctly large to be storing in Lucene. If there is part of the profile that you might want to search, that would be appropria

Re: Solr Distributed Search vs Hadoop

2011-12-20 Thread Alireza Salimi

Well, actually we haven't started the actual project yet. But it will probably have to handle the data of millions of users, and a rough estimate for each user's data would be something around 5 MB. The other problem is that this data will change very often. I hope I answered your question
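As a back-of-envelope check on that estimate, the sketch below multiplies a user count by the per-user size. The function name and the sample user counts are assumptions for illustration; the 5 MB figure is the one quoted in the message, and decimal units (1 TB = 1,000,000 MB) are assumed.

```python
def total_terabytes(num_users: int, mb_per_user: float) -> float:
    """Raw data volume in TB, assuming decimal units (1 TB = 1,000,000 MB)."""
    return num_users * mb_per_user / 1_000_000


if __name__ == "__main__":
    # Hypothetical user counts; only the 5 MB per user comes from the thread.
    for users in (1_000_000, 5_000_000, 10_000_000):
        print(f"{users:>10,} users x 5 MB = {total_terabytes(users, 5):.0f} TB")
```

This counts raw data only, before any replication for failover.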

Re: Solr Distributed Search vs Hadoop

2011-12-20 Thread Ted Dunning

You didn't mention how big your data is or how you create it. Hadoop would mostly be used in the preparation of the data or the off-line creation of indexes.

On Tue, Dec 20, 2011 at 12:28 PM, Alireza Salimi wrote:
> Hi,
>
> I have a basic question, let's say we're going to have a very very huge set

Solr Distributed Search vs Hadoop

2011-12-20 Thread Alireza Salimi

Hi, I have a basic question. Let's say we're going to have a very, very large set of data, such that we will certainly need many servers (tens or hundreds of servers). We will also need failover. Now the question is whether we should use Hadoop, or whether using Solr Distributed Search with shards would be en
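For readers unfamiliar with the sharded option mentioned above: Solr's distributed search is driven by a `shards` request parameter listing the cores to query. The sketch below only builds such a request URL. The host names, port and core path are hypothetical placeholders, and the helper function is an illustration rather than part of any Solr client library.

```python
from urllib.parse import urlencode


def build_distributed_query(coordinator: str, shards: list[str], query: str) -> str:
    """Build a Solr select URL that fans a query out across several shards.

    `coordinator` is the node receiving the request; `shards` lists the
    host:port/path of every shard to search. All values are placeholders.
    """
    params = {
        "q": query,
        "shards": ",".join(shards),  # comma-separated shard list
    }
    return f"http://{coordinator}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical hosts, for illustration only.
    url = build_distributed_query(
        "solr1.example.com:8983/solr",
        ["solr1.example.com:8983/solr", "solr2.example.com:8983/solr"],
        "title:hadoop",
    )
    print(url)
```

The node that receives the request merges the per-shard results. Failover is a separate concern and is not handled by this parameter alone.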

7 matches