Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-18 Thread Jason Rutherglen
The main point being made is that established NoSQL solutions (e.g., Cassandra, HBase, et al.) have, for several years, solved the update problem (among many other scalability issues). If an update is being performed and it is not known where the record exists, the update capability of the system is ineffi

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-18 Thread Lukáš Vlček
AFAIK it cannot. You can only add new shards by creating a new index, and you will then need to index new data into that new index. Index aliases are useful mainly for the searching part. So it means that you need to plan for this when you implement your indexing logic. On the other hand the query logi

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-18 Thread Jason Rutherglen
I'm curious how on-the-fly updates are handled as a new shard is added to an alias. E.g., how does the system know to which shard to send an update? On Tue, Apr 17, 2012 at 4:00 PM, Lukáš Vlček wrote: > Hi, > > speaking about ES I think it would be fair to mention that one has to > specify number
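
As a rough illustration of the routing question above: Elasticsearch documents its routing as a hash of the routing value (the document id by default) taken modulo the number of primary shards. The sketch below is not ES code; `shard_for` and `moved_fraction` are hypothetical names, and `zlib.crc32` is only a stand-in for the real hash function. It is meant to show why the shard count is hard to change once documents have been indexed.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the document id modulo the shard count.

    crc32 is a stand-in hash chosen for this sketch (an assumption),
    not the hash a real search server uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose target shard changes when the
    shard count changes from old_shards to new_shards."""
    moved = sum(
        1 for d in doc_ids
        if shard_for(d, old_shards) != shard_for(d, new_shards)
    )
    return moved / len(doc_ids)


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(1000)]
    # The same id always routes to the same shard while the count is fixed.
    print(shard_for("doc-42", 5))
    # Going from 5 to 6 shards sends most ids to a different shard.
    print(moved_fraction(ids, 5, 6))
```

With modulo routing, an update for an existing id only finds the old copy if the shard count is unchanged, which is consistent with the advice elsewhere in this thread to plan the indexing logic up front.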

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-17 Thread Lukáš Vlček
Hi, speaking about ES I think it would be fair to mention that one has to specify the number of shards upfront when the index is created - that is correct, however, it is possible to give an index one or more aliases, which basically means that you can add new indices on the fly and give them same alias w
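
A minimal sketch of the alias idea described above, assuming the Elasticsearch `_aliases` endpoint, which takes a JSON body with a list of `add` actions. The helper name `alias_add_actions` and the index and alias names are made up for illustration; the snippet only builds the request body and does not talk to a server.

```python
import json
from typing import List


def alias_add_actions(alias: str, indices: List[str]) -> str:
    """Build a JSON body that puts several indices behind one alias.

    The body shape ({"actions": [{"add": {...}}]}) follows the
    Elasticsearch _aliases API; the names passed in are hypothetical.
    """
    actions = [{"add": {"index": name, "alias": alias}} for name in indices]
    return json.dumps({"actions": actions})


if __name__ == "__main__":
    # Hypothetical monthly indices searched together through one alias.
    print(alias_add_actions("pages", ["pages_2012_04", "pages_2012_05"]))
```

Searches sent to the alias would then cover every index behind it, while the application still has to decide which concrete index receives each new document.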

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-17 Thread Jason Rutherglen
ext.com/spm/solr-performance-monitoring/index.html > > > >> >> From: Jason Rutherglen >>To: solr-user@lucene.apache.org >>Sent: Monday, April 16, 2012 8:42 PM >>Subject: Re: Options for automagically Scaling Solr (without needing >>

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-17 Thread Otis Gospodnetic
oring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html > > From: Jason Rutherglen >To: solr-user@lucene.apache.org >Sent: Monday, April 16, 2012 8:42 PM >Subject: Re: Options for automagically Scaling Solr (without ne

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-17 Thread Jan Høydahl
Hi, I think Katta integration is nice, but it is not very real-time. What if you want both? Perhaps a Katta/SolrCloud integration could make the two frameworks play together, so that some shards in SolrCloud may be marked as "static" while others are "realtime". SolrCloud will handle indexing t

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-16 Thread Jason Rutherglen
One of the big weaknesses of Solr Cloud (and ES?) is the lack of the ability to redistribute shards across servers. That is, as a single shard grows too large, splitting the shard while live updates continue. How do you plan on elastically adding more servers without this feature? Cassandra and HBase handle

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-15 Thread Jason Rutherglen
index.html >> >> >> >>> >>> From: Ali S Kureishy >>>To: Otis Gospodnetic >>>Cc: "solr-user@lucene.apache.org" >>>Sent: Friday, April 13, 2012 7:16 PM >>>Subject: Re: Options for aut

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-14 Thread Lance Norskog
> http://sematext.com/spm/solr-performance-monitoring/index.html > > > >> >> From: Ali S Kureishy >>To: Otis Gospodnetic >>Cc: "solr-user@lucene.apache.org" >>Sent: Friday, April 13, 2012 7:16 PM >>Sub

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-14 Thread Otis Gospodnetic
ring/index.html > > From: Ali S Kureishy >To: Otis Gospodnetic >Cc: "solr-user@lucene.apache.org" >Sent: Friday, April 13, 2012 7:16 PM >Subject: Re: Options for automagically Scaling Solr (without needing >distributed index/replication) in a Hadoo

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-14 Thread Jan Høydahl
Hi, This won't give you the performance you need, unless you have enough RAM on the Solr box to cache the whole index in memory. Have you tested this yourself? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 12. apr. 2012, at 15

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-13 Thread Ali S Kureishy
Thanks Otis. I really appreciate the details offered here. This was very helpful information. I'm going to go through Solandra and Elastic Search and see if those make sense. I was also given a suggestion to use SolrCloud on FuseDFS (that's two recommendations for SolrCloud so far), so I will giv

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-13 Thread Jan Høydahl
Hi, For a web crawl+search like this you will probably need a lot of additional Big Data crunching, so a Hadoop based solution is wise. In addition to those products mentioned we also now have Amazon's own CloudSearch http://aws.amazon.com/cloudsearch/ It's new, is not as cool as Solr (not eve

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-12 Thread Otis Gospodnetic
Hello Ali, > I'm trying to setup a large scale *Crawl + Index + Search *infrastructure > using Nutch and Solr/Lucene. The targeted scale is *5 Billion web pages*, > crawled + indexed every *4 weeks, *with a search latency of less than 0.5 > seconds. That's fine.  Whether it's doable with any te

RE: Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-12 Thread Darren Govoni
SolrCloud or any other tech-specific replication isn't going to 'just work' with Hadoop replication. But with some significant custom coding anything should be possible. Interesting idea. --- Original Message --- On 4/12/2012 09:21 AM Ali S Kureishy wrote: Thanks Darren. Actually, I

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-12 Thread Ali S Kureishy
Thanks Darren. Actually, I would like the system to be homogeneous - i.e., use Hadoop-based tools that already provide all the necessary scaling for the Lucene index (in terms of throughput, latency of writes/reads etc). Since SolrCloud adds its own layer of sharding/replication that is outside Had

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-12 Thread Darren Govoni
You could use SolrCloud (for the automatic scaling) and just mount a fuse[1] HDFS directory and configure solr to use that directory for its data. [1] https://ccp.cloudera.com/display/CDHDOC/Mountable+HDFS On Thu, 2012-04-12 at 16:04 +0300, Ali S Kureishy wrote: > Hi, > > I'm trying to setup a

Re: How to update a distributed index?

2010-10-04 Thread Otis Gospodnetic
/   - Original Message > From: bbarani > To: solr-user@lucene.apache.org > Sent: Mon, October 4, 2010 3:52:02 PM > Subject: How to update a distributed index? > > > Hi, > > We are maintaining multiple SOLR index, one for each source (the source data > is too

How to update a distributed index?

2010-10-04 Thread bbarani
res it should be pushed to index 2. Is there a way to update the index using shards? Any suggestions / ideas would be of great help for me. Thanks, Barani -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-update-a-distributed-index-tp1631946p1631946.html Sent from

Connection Reset Errors on a Distributed Index

2010-05-16 Thread harish.agarwal
buted-Search-td495188.html#a495191 During the aftermath of commits on a distributed index (3 shards, about 3M documents each with many many facets), I'm getting ConnectionReset errors (see below for the full trace). The place in the code where it happens is where the 'master server

Re: Distributed index

2009-08-18 Thread Shalin Shekhar Mangar
On Tue, Aug 18, 2009 at 3:49 PM, ToJira wrote: > > Hi, > > I am very new to Solr and overall a newbie in software developing. I have a > problem with cross-platform implementation. Basically I have a local index > running on a windows server 2003 aided with a web service (asp.net) for > the > use

Distributed index

2009-08-18 Thread ToJira
message in context: http://www.nabble.com/Distributed-index-tp25022053p25022053.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-08 Thread Srikant Jakilinki
hare our experiences in a formal written discourse as and when we have them. Cheers, Srikant Ning Li wrote: There have been several proposals for a Lucene-based distributed index architecture. 1) Doug Cutting's "Index Server Project Proposal" at http://www.mail-archive.com

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-07 Thread Andrzej Bialecki
Doug Cutting wrote: Ning, I am also interested in starting a new project in this area. The approach I have in mind is slightly different, but hopefully we can come to some agreement and collaborate. I'm interested in this too. My current thinking is that the Solr search API is the appropri

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ning Li
One main focus is to provide fault-tolerance in this distributed index system. Correct me if I'm wrong, I think SOLR-303 is focusing on merging results from multiple shards right now. We'd like to start an open source project for a fault-tolerant distributed index system (or join if o

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ning Li
No. I'm curious too. :) On Feb 6, 2008 11:44 AM, J. Delgado <[EMAIL PROTECTED]> wrote: > I assume that Google also has distributed index over their > GFS/MapReduce implementation. Any idea how they achieve this? > > J.D. >

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ning Li
I work for IBM Research. I read the Rackspace article. Rackspace's Mailtrust has a similar design. Happy to see an existing application on such a system. Do they plan to open-source it? Is the AOL project an open source project? On Feb 6, 2008 11:33 AM, Clay Webster <[EMAIL PROTECTED]> wrote: > >

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ian Holsman
Clay Webster wrote: There seem to be a few other players in this space too. Are you from Rackspace? (http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data) AOL also has a Hadoop/Solr project going on. CNET does not have much brewing there. Although Yo

Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ning Li
There have been several proposals for a Lucene-based distributed index architecture. 1) Doug Cutting's "Index Server Project Proposal" at http://www.mail-archive.com/[EMAIL PROTECTED]/msg00338.html 2) Solr's "Distributed Search" at http://wiki.apache.org/s

RE: Running into problems with distributed index and search

2007-08-23 Thread Kasi Sankaralingam
I have not seen performance degradation, but I will keep that in mind, thanks -Original Message- From: Walter Underwood [mailto:[EMAIL PROTECTED] Sent: Thursday, August 23, 2007 8:56 AM To: solr-user@lucene.apache.org Subject: Re: Running into problems with distributed index and search

RE: Running into problems with distributed index and search

2007-08-23 Thread Kasi Sankaralingam
Thanks a lot, yes I found that yesterday after doing some experiments. -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Wednesday, August 22, 2007 11:10 PM To: solr-user@lucene.apache.org Subject: Re: Running into problems with distributed index and search : 3

Re: Running into problems with distributed index and search

2007-08-23 Thread Walter Underwood
How is the performance? For me, Solr got about 100 times faster for update when I moved the files from NFS to local disk. wunder On 8/22/07 2:27 PM, "Kasi Sankaralingam" <[EMAIL PROTECTED]> wrote: > Instance (index server) for indexing. The index file data directory > reside on a NFS partition, I

Re: Running into problems with distributed index and search

2007-08-22 Thread Chris Hostetter
: 3) I had to bounce the tomcat search SOLR Webapp instance for it to : read the index files, is it mandatory? In a distributed environment, do : we always have to : : Bounce the SOLR Webapp instances to reflect the changes in the index : files? it sounds like you essentially have a master/sl

Running into problems with distributed index and search

2007-08-22 Thread Kasi Sankaralingam
Hi All, This is the scenario: I have two search SOLR instances running on two different partitions. I am treating one of the servers as strictly read-only (for search) (search server) and the other instance (index server) for indexing. The index file data directory resides on an NFS partition, I am

Re: Backup and distributed index/backup management

2007-03-25 Thread al patel
Thanks Chris. So, it looks like one has to delete entries to keep the index manageable. In my case, we need to preserve entries - thus, I wanted to "archive" snapshots, but still keep them searchable (thaw certain indices if you may). So, is there anyone out there looking into "ever increa

Re: Backup and distributed index/backup management

2007-03-24 Thread Chris Hostetter
: My question is, even with backup, solr will still have a single index, : right? We will have huge amount of data in index - it is ever increasing. if you have older docs you want to retire out of your index, you'll need to do that manually (delete by query can come in handy) : I want to archiv
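
The delete-by-query suggestion above can be sketched as follows. Solr's XML update format accepts a `<delete><query>...</query></delete>` message; the field name `timestamp` and the one-year cutoff are assumptions made up for this example, and `build_delete_by_query` is a hypothetical helper that only builds the message body (it does not send anything to a server).

```python
from xml.sax.saxutils import escape


def build_delete_by_query(query: str) -> str:
    """Build a Solr XML delete-by-query message for the given query.

    The query text is XML-escaped so characters such as '<' or '&'
    do not break the message.
    """
    return "<delete><query>{}</query></delete>".format(escape(query))


if __name__ == "__main__":
    # Hypothetical: retire documents whose (assumed) timestamp field
    # is more than one year old.
    print(build_delete_by_query("timestamp:[* TO NOW-1YEAR]"))
```

The resulting message would typically be posted to the index's update handler and followed by a commit before the documents disappear from search results.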

Re: Backup and distributed index/backup management

2007-03-24 Thread al patel
Reposting :) Hi: I am a novice with Solr in terms of backup/operations. We have a single instance of master (Solr) working well. I tried the backup scripts etc and could get things working fine. My question is, even with backup, Solr will still have a single index, right? We will have a huge amount

Backup and distributed index/backup management

2007-03-23 Thread al patel
Hi: I am a novice with Solr in terms of backup/operations. We have a single instance of master (Solr) working well. I tried the backup scripts etc and could get things working fine. My question is, even with backup, Solr will still have a single index, right? We will have a huge amount of data in ind