The main point being made is that established NoSQL solutions (e.g.,
Cassandra, HBase, et al.) have solved the update problem (among many
other scalability issues) for several years.
If an update is being performed and it is not known where the record
exists, the update capability of the system is ineffi
AFAIK it cannot. You can only add new shards by creating a new index, and
you will then need to index new data into that new index. Index aliases are
useful mainly for the searching part. So it means that you need to plan for
this when you implement your indexing logic. On the other hand the query
logi
I'm curious how on-the-fly updates are handled as a new shard is added
to an alias. E.g., how does the system know to which shard to send an
update?
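To make the question concrete, here is a minimal sketch of one common routing approach, hash-modulo on the document id. It is an illustration under assumptions, not code taken from ES or Solr; the function name `shard_for` and the `doc-N` ids are hypothetical.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard number using a stable hash.

    Hypothetical helper for illustration; not an ES or Solr API.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


# Route 1000 made-up ids with 4 shards, then again with 5 shards.
ids = [f"doc-{i}" for i in range(1000)]
before = {d: shard_for(d, 4) for d in ids}
after = {d: shard_for(d, 5) for d in ids}

# Count the ids whose target shard changed when a shard was added.
moved = sum(1 for d in ids if before[d] != after[d])
print(f"{moved} of {len(ids)} ids map to a different shard after adding one")
```

With a scheme like this, an update only finds the existing record if the shard count is the same as when the record was first indexed, which is one way to see why adding a shard on the fly raises the routing question above.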
On Tue, Apr 17, 2012 at 4:00 PM, Lukáš Vlček wrote:
> Hi,
>
> speaking about ES I think it would be fair to mention that one has to
> specify number
Hi,
speaking about ES I think it would be fair to mention that one has to
specify the number of shards upfront when the index is created - that is
correct, however, it is possible to give an index one or more aliases, which
basically means that you can add new indices on the fly and give them the same
alias w
ext.com/spm/solr-performance-monitoring/index.html
>
>
>
>>
>> From: Jason Rutherglen
>>To: solr-user@lucene.apache.org
>>Sent: Monday, April 16, 2012 8:42 PM
>>Subject: Re: Options for automagically Scaling Solr (without needing
>>
oring SaaS for Solr -
http://sematext.com/spm/solr-performance-monitoring/index.html
>
> From: Jason Rutherglen
>To: solr-user@lucene.apache.org
>Sent: Monday, April 16, 2012 8:42 PM
>Subject: Re: Options for automagically Scaling Solr (without ne
Hi,
I think Katta integration is nice, but it is not very real-time. What if you
want both?
Perhaps a Katta/SolrCloud integration could make the two frameworks play
together, so that some shards in SolrCloud may be marked as "static" while
others are "realtime". SolrCloud will handle indexing t
One of the big weaknesses of Solr Cloud (and ES?) is the lack of the
ability to redistribute shards across servers. Meaning, as a single
shard grows too large, splitting the shard while live updates continue.
How do you plan on elastically adding more servers without this feature?
Cassandra and HBase handle
index.html
>>
>>
>>
>>>
>>> From: Ali S Kureishy
>>>To: Otis Gospodnetic
>>>Cc: "solr-user@lucene.apache.org"
>>>Sent: Friday, April 13, 2012 7:16 PM
>>>Subject: Re: Options for aut
ring/index.html
>
> From: Ali S Kureishy
>To: Otis Gospodnetic
>Cc: "solr-user@lucene.apache.org"
>Sent: Friday, April 13, 2012 7:16 PM
>Subject: Re: Options for automagically Scaling Solr (without needing
>distributed index/replication) in a Hadoo
Hi,
This won't give you the performance you need, unless you have enough RAM on the
Solr box to cache the whole index in memory.
Have you tested this yourself?
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
On 12. apr. 2012, at 15
Thanks Otis.
I really appreciate the details offered here. This was very helpful
information.
I'm going to go through Solandra and Elastic Search and see if those make
sense. I was also given a suggestion to use SolrCloud on FuseDFS (that's
two recommendations for SolrCloud so far), so I will giv
Hi,
For a web crawl+search like this you will probably need a lot of additional Big
Data crunching, so a Hadoop based solution is wise.
In addition to those products mentioned we also now have Amazon's own
CloudSearch http://aws.amazon.com/cloudsearch/ It's new, is not as cool as Solr
(not eve
Hello Ali,
> I'm trying to set up a large scale *Crawl + Index + Search* infrastructure
> using Nutch and Solr/Lucene. The targeted scale is *5 Billion web pages*,
> crawled + indexed every *4 weeks*, with a search latency of less than 0.5
> seconds.
That's fine. Whether it's doable with any te
SolrCloud or any other tech-specific replication isn't going to 'just work' with
Hadoop replication. But with some significant custom coding anything should be
possible. Interesting idea.
--- Original Message ---
On 4/12/2012 09:21 AM Ali S Kureishy wrote:
Thanks Darren.
Actually, I
Thanks Darren.
Actually, I would like the system to be homogeneous - i.e., use Hadoop based
tools that already provide all the necessary scaling for the lucene index
(in terms of throughput, latency of writes/reads etc). Since SolrCloud adds
its own layer of sharding/replication that is outside Had
You could use SolrCloud (for the automatic scaling) and just mount a
fuse[1] HDFS directory and configure solr to use that directory for its
data.
[1] https://ccp.cloudera.com/display/CDHDOC/Mountable+HDFS
On Thu, 2012-04-12 at 16:04 +0300, Ali S Kureishy wrote:
> Hi,
>
> I'm trying to setup a
/
- Original Message
> From: bbarani
> To: solr-user@lucene.apache.org
> Sent: Mon, October 4, 2010 3:52:02 PM
> Subject: How to update a distributed index?
>
>
> Hi,
>
> We are maintaining multiple SOLR index, one for each source (the source data
> is too
res it should be pushed to index 2.
Is there a way to update the index using shards?
Any suggestions / ideas would be of great help for me.
Thanks,
Barani
--
View this message in context:
http://lucene.472066.n3.nabble.com/How-to-update-a-distributed-index-tp1631946p1631946.html
Sent from
buted-Search-td495188.html#a495191
During the aftermath of commits on a distributed index (3 shards, about 3M
documents each with many many facets), I'm getting ConnectionReset errors
(see below for the full trace). The place in the code where it happens is
where the 'master server
On Tue, Aug 18, 2009 at 3:49 PM, ToJira wrote:
>
> Hi,
>
> I am very new to Solr and overall a newbie in software developing. I have a
> problem with cross-platform implementation. Basically I have a local index
> running on a windows server 2003 aided with a web service (asp.net) for
> the
> use
message in context:
http://www.nabble.com/Distributed-index-tp25022053p25022053.html
Sent from the Solr - User mailing list archive at Nabble.com.
hare our experiences in a formal written discourse as and
when we have them.
Cheers,
Srikant
Ning Li wrote:
There have been several proposals for a Lucene-based distributed index
architecture.
1) Doug Cutting's "Index Server Project Proposal" at
http://www.mail-archive.com
Doug Cutting wrote:
Ning,
I am also interested in starting a new project in this area. The
approach I have in mind is slightly different, but hopefully we can come
to some agreement and collaborate.
I'm interested in this too.
My current thinking is that the Solr search API is the appropri
One main focus is to provide fault-tolerance in this distributed index
system. Correct me if I'm wrong, I think SOLR-303 is focusing on merging
results from multiple shards right now. We'd like to start an open source
project for a fault-tolerant distributed index system (or join if o
No. I'm curious too. :)
On Feb 6, 2008 11:44 AM, J. Delgado <[EMAIL PROTECTED]> wrote:
> I assume that Google also has distributed index over their
> GFS/MapReduce implementation. Any idea how they achieve this?
>
> J.D.
>
I work for IBM Research. I read the Rackspace article. Rackspace's Mailtrust
has a similar design. Happy to see an existing application on such a system.
Do they plan to open-source it? Is the AOL project an open source project?
On Feb 6, 2008 11:33 AM, Clay Webster <[EMAIL PROTECTED]> wrote:
>
>
Clay Webster wrote:
There seem to be a few other players in this space too.
Are you from Rackspace?
(http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-
query-terabytes-data)
AOL also has a Hadoop/Solr project going on.
CNET does not have much brewing there. Although Yo
There have been several proposals for a Lucene-based distributed index
architecture.
1) Doug Cutting's "Index Server Project Proposal" at
http://www.mail-archive.com/[EMAIL PROTECTED]/msg00338.html
2) Solr's "Distributed Search" at
http://wiki.apache.org/s
I have not seen performance degradation, but I will keep that in mind,
thanks
-Original Message-
From: Walter Underwood [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 23, 2007 8:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Running into problems with distributed index and search
Thanks a lot, yes I found that yesterday after doing some experiments.
-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 22, 2007 11:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Running into problems with distributed index and search
: 3
How is the performance? For me, Solr got about 100 times faster for
update when I moved the files from NFS to local disk.
wunder
On 8/22/07 2:27 PM, "Kasi Sankaralingam" <[EMAIL PROTECTED]> wrote:
> Instance (index server) for indexing. The index file data directory
> reside on a NFS partition, I
: 3) I had to bounce the tomcat search SOLR Webapp instance for it to
: read the index files, is it mandatory? In a distributed environment, do
: we always have to
:
: Bounce the SOLR Webapp instances to reflect the changes in the index
: files?
it sounds like you essentially have a master/sl
Hi All,
This is the scenario: I have two search SOLR instances running on two
different partitions. I am treating one of the servers as strictly
read-only (for search) (search server) and the other
instance (index server) for indexing. The index file data directory
resides on an NFS partition, I am
Thanks Chris.
So, it looks like one has to delete entries to keep the index manageable,
then.
In my case, we need to preserve entries - thus, wanted to "archive"
snapshots, but still keep them searchable (thaw certain indices if you may).
So, is there anyone out there looking into "ever increa
: My question is, even with backup, solr will still have a single index,
: right? We will have huge amount of data in index - it is ever increasing.
if you have older docs you want to retire out of your index, you'll need
to do that manually (delete by query can come in handy)
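As a rough sketch of the delete-by-query suggestion: the snippet below builds the XML delete message that Solr's update handler accepts. The field name `timestamp` and the one-year cutoff are assumptions for illustration only, and `delete_by_query_xml` is a hypothetical helper, not part of Solr.

```python
from xml.sax.saxutils import escape


def delete_by_query_xml(query: str) -> str:
    """Build a Solr XML delete-by-query message for the given query.

    Hypothetical helper; the query text is XML-escaped before embedding.
    """
    return f"<delete><query>{escape(query)}</query></delete>"


# Assumed schema: a date field named "timestamp" (not from the thread).
payload = delete_by_query_xml("timestamp:[* TO NOW-1YEAR]")
print(payload)
# prints: <delete><query>timestamp:[* TO NOW-1YEAR]</query></delete>
```

The message would then be POSTed to the index's update handler and followed by a commit; those steps are left out here because the URL and client depend on the installation.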
: I want to archiv
Reposting :)
Hi:
I am a novice to Solr in terms of backup/operations.
We have a single instance of master (solr) working well, I tried the
backup scripts etc and could get things working fine.
My question is, even with backup, solr will still have a single index,
right? We will have huge amount
Hi:
I am a novice to Solr in terms of backup/operations.
We have a single instance of master (solr) working well, I tried the backup
scripts etc and could get things working fine.
My question is, even with backup, solr will still have a single index,
right? We will have huge amount of data in ind