This isn't a Solr-specific answer, but the easiest approach might be to
just collect the document IDs you're about to add, query for them, and then
filter out the ones Solr already has (this'll give you a nice list for
later reporting). You'll need to keep your batch sizes below
maxBooleanClauses i
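(Not from the thread, but to make that concrete: a minimal SolrJ sketch of the filter-before-adding idea, assuming the uniqueKey field is called "id"; the helper name and batch handling are made up.)

import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrDocument;

public class ExistingIdFilter {

    // Returns the subset of candidate IDs that Solr already has, querying in
    // batches small enough to stay under maxBooleanClauses.
    static Set<String> findExisting(SolrServer server, List<String> ids, int batchSize)
            throws Exception {
        Set<String> existing = new HashSet<String>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            List<String> batch = ids.subList(from, Math.min(from + batchSize, ids.size()));
            StringBuilder q = new StringBuilder("id:(");
            for (int i = 0; i < batch.size(); i++) {
                if (i > 0) q.append(" OR ");
                q.append('"').append(batch.get(i)).append('"');
            }
            q.append(')');
            SolrQuery query = new SolrQuery(q.toString());
            query.setFields("id");
            query.setRows(batch.size());
            for (SolrDocument doc : server.query(query).getResults()) {
                existing.add((String) doc.getFieldValue("id"));
            }
        }
        return existing;
    }
}

The returned set is the "already in Solr" list for reporting; anything not in it is safe to add.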
I disabled softCommit and tried to run another indexing process.
Now I see no Jetty EofException and no latency peaks.
I also noticed that when I had softCommit every 10 minutes, I also saw
spikes in the major GC (I use CMS) to around 9-10k.
Any idea?
Shawn Heisey wrote:
On 3/17/2014 7:07 AM, adfel70 wrote:
> we currently have around 200GB in a server.
> I'm aware of the RAM issue, but it somehow doesn't seem related.
> I would expect search latency problems, not strange EofExceptions.
>
> Regarding the http.timeout - I didn't change anything concerning this.
> D
we currently have around 200GB in a server.
I'm aware of the RAM issue, but it somehow doesn't seem related.
I would expect search latency problems, not strange EofExceptions.
Regarding the http.timeout - I didn't change anything concerning this.
Do I need to explicitly set something different th
On 3/16/2014 10:34 AM, adfel70 wrote:
> I have a 12-node Solr 4.6.1 cluster. Each node has 2 Solr processes, running
> on 8GB heap JVMs. Each node has a total of 64GB of memory.
> My current collection (7 shards, 3 replicas) has around 500 million docs.
> I'm performing bulk indexing into the collection
On 11/27/2012 1:07 PM, Joseph C. Trubisz wrote:
When I curl a file to be indexed (in this case, as CSV), how do I know
which index it’s going to, if I have multiple indexes currently being
managed by Solr? For example, I have indexes for drug, company,
author, abstract and I want to CSV load to
Usually collecting the whole array hurts the client's JVM, and sending doc-by-doc
bloats the server with a huge number of small requests. You just need to rewrite your
code from an eager loop to a pulling iterator to be able to submit all docs
via a single HTTP request:
http://wiki.apache.org/solr/Solrj#Streaming_document
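(A hedged illustration of that pulling-iterator approach, using SolrJ's SolrServer.add(Iterator<SolrInputDocument>) from the wiki page above; the URL and field names are made up.)

import java.util.Iterator;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IteratorIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        // A pulling iterator: each document is built only when SolrJ asks for it,
        // so the whole batch is streamed in one HTTP request without being held in memory.
        Iterator<SolrInputDocument> docs = new Iterator<SolrInputDocument>() {
            private int i = 0;
            public boolean hasNext() { return i < 100000; }
            public SolrInputDocument next() {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("title", "streamed doc " + i);
                i++;
                return doc;
            }
            public void remove() { throw new UnsupportedOperationException(); }
        };

        server.add(docs);   // single request, documents pulled lazily
        server.commit();
    }
}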
We have auto commit on and basically send records in a loop: after
validating each record, we send it to the search service, and we keep doing that in
a loop. Mikhail / Lan, are you suggesting that instead of sending them one at a time in a
loop, we should collect them in an array and do a commit at the end? Is
this better
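(For concreteness, a rough sketch of the collect-then-commit-once pattern being asked about; the SolrServer is created elsewhere and the record format is invented. For very large batches, the previous message suggests a pulling iterator instead of holding the whole array in memory.)

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchedAdd {

    // Collect validated records into one list, send a single add(), and commit
    // once at the end, instead of an add and a commit per record inside the loop.
    static void indexBatch(SolrServer server, List<Map<String, Object>> records)
            throws Exception {
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (Map<String, Object> record : records) {
            // per-record validation would go here; skip bad records rather than abort
            SolrInputDocument doc = new SolrInputDocument();
            for (Map.Entry<String, Object> field : record.entrySet()) {
                doc.addField(field.getKey(), field.getValue());
            }
            batch.add(doc);
        }
        if (!batch.isEmpty()) {
            server.add(batch);   // one HTTP request for the whole batch
            server.commit();     // single explicit commit at the end
        }
    }
}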
Lan,
I assume that some particular server can freeze on such a bulk. But the overall
message seems not entirely correct to me. Solr has a lot of mechanisms to
survive in such cases.
Bulk indexing is absolutely the right approach (if you submit a single request with a long
iterator of SolrInputDocuments). This indexing thre
I assume you're indexing on the same server that is used to execute search
queries. Adding 20K documents in bulk could cause the Solr server to 'stop
the world', where the server would stop responding to queries.
My suggestion is:
- Set up master/slave replication to insulate your clients from 'stop the world'
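(A hedged sketch of what that insulation looks like from the client side, assuming SolrJ 3.6+'s HttpSolrServer; hostnames and core name are placeholders. Replication itself is configured in solrconfig.xml on the master and slave.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MasterSlaveClients {
    public static void main(String[] args) throws Exception {
        // All updates go to the master; all searches go to the slave, so a heavy
        // bulk load on the master cannot stall query traffic.
        SolrServer master = new HttpSolrServer("http://master-host:8983/solr/core1");
        SolrServer slave  = new HttpSolrServer("http://slave-host:8983/solr/core1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "example");
        master.add(doc);
        master.commit();

        // The slave serves reads from its last replicated snapshot.
        slave.query(new SolrQuery("title:example"));
    }
}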
We will be using Solr 3.x. I was wondering if we need to worry
about this, as we have only 10k index entries at a time. It sounds like a
very low number, and we have only one document type at this point.
Should we worry about directly using SolrJ for indexing and searching at
this low volume
Haven't tried this but:
1) I think Solr 4 supports on-the-fly core attach/detach/select. Can
somebody confirm this?
2) If 1) is true, run everything as two cores.
3) One core is live in production.
4) The second core is detached from Solr and attached to something like
SolrJ, which I believe can index w
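(If 1) is indeed supported, the swap in steps 3-4 can be driven from SolrJ via the CoreAdmin API. A hedged sketch, with made-up core names, assuming the CoreAdminRequest API of that era:)

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.common.params.CoreAdminParams.CoreAdminAction;

public class SwapCores {
    public static void main(String[] args) throws Exception {
        // Point at the Solr root (not a specific core) so the CoreAdmin handler is reachable.
        SolrServer admin = new HttpSolrServer("http://localhost:8983/solr");

        // Swap the freshly rebuilt core into the live slot once offline indexing is done.
        // Equivalent to: /solr/admin/cores?action=SWAP&core=live&other=rebuild
        CoreAdminRequest swap = new CoreAdminRequest();
        swap.setAction(CoreAdminAction.SWAP);
        swap.setCoreName("live");
        swap.setOtherCoreName("rebuild");
        swap.process(admin);
    }
}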
Hi,
Previously I asked a similar question, and I have not fully implemented it yet.
My plan is:
1) use Solr only for search, not for indexing
2) have a separate Java process to index (calling the Lucene API directly; maybe
it can call the Solr API, I need to check more details).
As other people pointed out earl
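(A very rough sketch of option 2 against the Lucene 3.6-era API; the path and fields are made up, the schema has to match what Solr expects, and, as the replies below note, the writer must be closed before the Solr process can open the files - two processes cannot safely write the same index directory at once.)

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class StandaloneIndexer {
    public static void main(String[] args) throws Exception {
        // Separate indexing process: writes directly into an index directory
        // that Solr will later open for searching.
        IndexWriterConfig cfg =
            new IndexWriterConfig(Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36));
        IndexWriter writer =
            new IndexWriter(FSDirectory.open(new File("/path/to/solr/data/index")), cfg);

        Document doc = new Document();
        doc.add(new Field("id", "doc-1", Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("title", "example", Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(doc);

        // Close (and release the write lock) before the Solr process opens the index.
        writer.close();
    }
}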
-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
Sent: Thursday, July 26, 2012 12:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Bulk indexing data into solr
IIRC, a problem with such a scheme was discussed here about two months ago, but I
can't remember the exact
> -Original Message-
> From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
> Sent: Thursday, July 26, 2012 10:15 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Bulk indexing data into solr
>
>
> Coming back to your original question. I'm puzzled a little.
> It
To: solr-user@lucene.apache.org
Subject: Re: Bulk indexing data into solr
Coming back to your original question. I'm puzzled a little.
It's not clear where you want to call the Lucene API directly from.
If you mean that you have a standalone indexer which writes index files, then
stops, and these files become ava
Coming back to your original question. I'm puzzled a little.
It's not clear where you want to call the Lucene API directly from.
If you mean that you have a standalone indexer which writes index files, then
stops, and these files become available to the Solr process, it will work.
Sharing an index between proces
Right on time, guys. https://issues.apache.org/jira/browse/SOLR-3585
Here is a server-side update processing "fork". It does its best to halt
processing when an exception occurs. Plug in this UpdateProcessor, specify the number
of threads, then submit a lazy iterator into StreamingUpdateSolrServer on the client
side
Thanks very much, both your and Rafał's advice is very helpful!
-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Thursday, July 26, 2012 8:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Bulk indexing data into solr
On 7/26/2012 7:34 AM, Rafał Kuć wrote:
On 7/26/2012 7:34 AM, Rafał Kuć wrote:
> If you use Java (and I think you do, because you mention Lucene) you
> should take a look at StreamingUpdateSolrServer. It not only allows
> you to send data in batches, but also index using multiple threads.
A caveat to what Rafał said:
The streaming object
Hello!
If you use Java (and I think you do, because you mention Lucene) you
should take a look at StreamingUpdateSolrServer. It not only allows
you to send data in batches, but also index using multiple threads.
--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
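(A hedged sketch of the StreamingUpdateSolrServer usage described above, assuming the SolrJ 3.x-era constructor that takes a queue size and thread count; the URL and fields are placeholders. Note Shawn's caveat above about this class.)

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class StreamingIndexer {
    public static void main(String[] args) throws Exception {
        // Buffer up to 1000 docs and flush them with 4 background threads, so
        // documents are sent in batches without blocking the producing loop.
        StreamingUpdateSolrServer server =
            new StreamingUpdateSolrServer("http://localhost:8983/solr", 1000, 4);

        for (int i = 0; i < 20000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("title", "bulk doc " + i);
            server.add(doc); // queued; sent asynchronously in batches
        }

        server.blockUntilFinished(); // wait for the background threads to drain the queue
        server.commit();
    }
}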
Lee,
Thank you very much for your answer.
Using the signature field as the uniqueKey is effectively what I was
doing, so the "overwriteDupes=true" parameter in my solrconfig was
somewhat redundant, although I wasn't aware of it! =D
In practice it works perfectly and that's the nice part.
By
Tanguy
You might have tried this already, but can you set overwriteDupes to
false and set the signature key to be the id? That way Solr
will manage updates.
From the wiki:
http://wiki.apache.org/solr/Deduplication
HTH
Lee
On 30 May 2011 08:32, Tanguy Moal wrote:
>
> Hello,
>
> Sorry for re-
Hello,
Sorry for re-posting this, but it seems my message got lost in the
mailing list's message stream without catching anyone's attention... =D
In short, has anyone already experienced dramatic indexing slowdowns
during large bulk imports with overwriteDupes turned on and a fairly
high dupli