Re: bulk indexing with optimistic locking

2015-02-13 Thread Scott Stults
This isn't a Solr-specific answer, but the easiest approach might be to just collect the document IDs you're about to add, query for them, and then filter out the ones Solr already has (this'll give you a nice list for later reporting). You'll need to keep your batch sizes below maxBooleanClauses i
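The collect-query-filter idea above can be sketched in plain Java. This is a minimal, hedged sketch: no SolrJ on the classpath here, the `alreadyIndexed` set stands in for the result of the actual `id:(...)` query against Solr, and the batch size of 1000 assumes the default `maxBooleanClauses` of 1024.

```java
import java.util.*;

public class DedupBatcher {
    // Keep each id:(...) query comfortably under Solr's maxBooleanClauses
    // (default 1024), as the post above warns.
    static final int BATCH_SIZE = 1000;

    // Build one "id:(a OR b OR ...)" query string per batch of IDs.
    static List<String> buildIdQueries(List<String> ids) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += BATCH_SIZE) {
            List<String> batch = ids.subList(i, Math.min(i + BATCH_SIZE, ids.size()));
            queries.add("id:(" + String.join(" OR ", batch) + ")");
        }
        return queries;
    }

    // Drop the IDs Solr already has; alreadyIndexed stands in for the
    // union of IDs returned by running the queries above.
    static List<String> newIdsOnly(List<String> ids, Set<String> alreadyIndexed) {
        List<String> fresh = new ArrayList<>();
        for (String id : ids) {
            if (!alreadyIndexed.contains(id)) fresh.add(id);
        }
        return fresh;
    }
}
```

The filtered-out IDs double as the "nice list for later reporting" the post mentions.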

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-18 Thread adfel70
I disabled softCommit and tried to run another indexing process. Now I see no jetty EofException and no latency peaks. I also noticed that when I had softCommit every 10 minutes, I also saw spikes in the major GC (I use CMS) to around 9-10k. Any idea? Shawn Heisey-4 wrote > On 3/17/2014 7:07

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-17 Thread Shawn Heisey
On 3/17/2014 7:07 AM, adfel70 wrote: > we currently have around 200gb in a server. > I'm aware of the RAM issue, but it somehow doesn't seem related. > I would expect search latency problems, not strange EofExceptions. > > regarding the http.timeout - I didn't change anything concerning this. > D

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-17 Thread adfel70
we currently have around 200gb in a server. I'm aware of the RAM issue, but it somehow doesn't seem related. I would expect search latency problems, not strange EofExceptions. regarding the http.timeout - I didn't change anything concerning this. Do I need to explicitly set something different th

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-16 Thread Shawn Heisey
On 3/16/2014 10:34 AM, adfel70 wrote: > I have a 12-node solr 4.6.1 cluster. each node has 2 solr processes, running > on 8gb heap jvms. each node has total of 64gb memory. > My current collection (7 shards, 3 replicas) has around 500 million docs. > I'm performing bulk indexing into the collectio

Re: Bulk Indexing Question

2012-11-27 Thread Shawn Heisey
On 11/27/2012 1:07 PM, Joseph C. Trubisz wrote: When I curl a file to be indexed (in this case, as CSV), how do I know which index it’s going to, if I have multiple indexes currently being managed by Solr? For example, I have indexes for drug, company, author, abstract and I want to CSV load to

Re: Bulk Indexing

2012-07-31 Thread Mikhail Khludnev
Usually collecting the whole array hurts the client's JVM, while sending doc-by-doc bloats the server with a huge number of small requests. You just need to rewrite your code from an eager loop to a pulling iterator to be able to submit all docs via a single http request http://wiki.apache.org/solr/Solrj#Streaming_document
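The eager-loop-to-pulling-iterator rewrite above can be illustrated with a minimal sketch. Hedged assumptions: with SolrJ you would pass an `Iterator<SolrInputDocument>` to the server's `add(Iterator)` method so documents stream out as the HTTP writer pulls them; plain `String`s stand in for documents here, and the "fetch the next row" comment marks where real per-document work would go.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// A pulling iterator: each document is produced only when the consumer
// asks for it, so the full batch is never held in client memory at once.
public class LazyDocs implements Iterator<String> {
    private final int total;
    private int next = 0;

    public LazyDocs(int total) {
        this.total = total;
    }

    @Override
    public boolean hasNext() {
        return next < total;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        // In real code: fetch the next row from the database here and
        // build a SolrInputDocument from it, instead of a String.
        return "doc-" + (next++);
    }
}
```

Handing such an iterator to a single `add(...)` call gives one long HTTP request instead of thousands of tiny ones, which is exactly the trade-off the post describes.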

Re: Bulk Indexing

2012-07-28 Thread Sohail Aboobaker
We have auto commit on; after validating each record, we send it to the search service, and keep doing that in a loop. Mikhail / Lan, are you suggesting that instead of sending it in a loop, we should collect them in an array and do a commit at the end? Is this better

Re: Bulk Indexing

2012-07-28 Thread Mikhail Khludnev
Lan, I assume that some particular server can freeze on such a bulk. But the overall message doesn't seem absolutely correct to me. Solr has a lot of mechanisms to survive in such cases. Bulk indexing is absolutely right (if you submit a single request with a long iterator of SolrInputDocs). This indexing thre

RE: Bulk Indexing

2012-07-27 Thread Lan
I assume you're indexing on the same server that is used to execute search queries. Adding 20K documents in bulk could cause the Solr server to 'stop the world', where the server would stop responding to queries. My suggestion is - set up master/slave to insulate your clients from 'stop the world'
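For reference, the master/slave setup Lan suggests is configured through the ReplicationHandler in each core's solrconfig.xml (this matches the Solr 3.x/4.x replication scheme; the master host, port, core name, and poll interval below are placeholders, not values from this thread):

```xml
<!-- master solrconfig.xml: publish a new index version after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- slave solrconfig.xml: poll the master and pull new index versions -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core1/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Bulk indexing then hits only the master, and query clients point at the slaves, which is the insulation being described.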

Re: Bulk Indexing

2012-07-27 Thread Sohail Aboobaker
We will be using Solr 3.x. I was wondering if we need to worry about this, as we have only 10k index entries at a time. It sounds like a very low number, and we have only one document type at this point. Should we worry about directly using SolrJ for indexing and searching at this low volume

Re: Bulk Indexing

2012-07-27 Thread Alexandre Rafalovitch
Haven't tried this but: 1) I think SOLR 4 supports on-the-fly core attach/detach/select. Can somebody confirm this? 2) If 1) is true, run everything as two cores. 3) One core is live in production 4) Second core is detached from SOLR and attached to something like SolrJ, which I believe can index w

RE: Bulk Indexing

2012-07-27 Thread Zhang, Lisheng
Hi, Previously I asked a similar question and I have not fully implemented it yet. My plan is: 1) use Solr only for search, not for indexing 2) have a separate java process to index (calling the Lucene API directly; maybe it can call the Solr API, I need to check more details). As other people pointed earl

RE: Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
-Original Message- From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] Sent: Thursday, July 26, 2012 12:46 PM To: solr-user@lucene.apache.org Subject: Re: Bulk indexing data into solr IIRC about two months ago a problem with such a scheme was discussed here, but I can't remember exact

Re: Bulk indexing data into solr

2012-07-26 Thread Mikhail Khludnev
Message- > From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] > Sent: Thursday, July 26, 2012 10:15 AM > To: solr-user@lucene.apache.org > Subject: Re: Bulk indexing data into solr > > > Coming back to your original question. I'm puzzled a little. > It

RE: Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
@lucene.apache.org Subject: Re: Bulk indexing data into solr Coming back to your original question. I'm puzzled a little. It's not clear where you wanna call the Lucene API directly from. If you mean that you have a standalone indexer which writes index files, then it stops and these files become ava

Re: Bulk indexing data into solr

2012-07-26 Thread Mikhail Khludnev
Coming back to your original question. I'm puzzled a little. It's not clear where you wanna call the Lucene API directly from. If you mean that you have a standalone indexer which writes index files, then it stops and these files become available to the Solr process, it will work. Sharing an index between proces

Re: Bulk indexing data into solr

2012-07-26 Thread Mikhail Khludnev
Right in time, guys. https://issues.apache.org/jira/browse/SOLR-3585 Here is a server-side update processing "fork". It does its best to halt processing when an exception occurs. Plug this UpdateProcessor in, specify the number of threads. Then submit a lazy iterator into StreamingUpdateSolrServer on the client side

RE: Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
Thanks very much, both your and Rafal's advice are very helpful! -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Thursday, July 26, 2012 8:47 AM To: solr-user@lucene.apache.org Subject: Re: Bulk indexing data into solr On 7/26/2012 7:34 AM, Rafał Kuć wrote:

Re: Bulk indexing data into solr

2012-07-26 Thread Shawn Heisey
On 7/26/2012 7:34 AM, Rafał Kuć wrote: If you use Java (and I think you do, because you mention Lucene) you should take a look at StreamingUpdateSolrServer. It not only allows you to send data in batches, but also index using multiple threads. A caveat to what Rafał said: The streaming object

Re: Bulk indexing data into solr

2012-07-26 Thread Rafał Kuć
Hello! If you use Java (and I think you do, because you mention Lucene) you should take a look at StreamingUpdateSolrServer. It not only allows you to send data in batches, but also index using multiple threads. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -

Re: Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances

2011-06-01 Thread Tanguy Moal
Lee, Thank you very much for your answer. Using the signature field as the uniqueKey is effectively what I was doing, so the "overwriteDupes=true" parameter in my solrconfig was somehow redundant, although I wasn't aware of it! =D In practice it works perfectly and that's the nice part. By

Re: Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances

2011-05-31 Thread lee carroll
Tanguy, you might have tried this already, but can you set overwriteDupes to false and set the signature key to be the id? That way Solr will manage updates. From the wiki: http://wiki.apache.org/solr/Deduplication HTH Lee On 30 May 2011 08:32, Tanguy Moal wrote: > > Hello, > > Sorry for re-
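Lee's suggestion corresponds roughly to the dedupe update chain from the Solr Deduplication wiki page he links. A hedged solrconfig.xml sketch (the `fields` list and signature class are illustrative, taken from the wiki's example, not from this thread): with `signatureField` pointing at the uniqueKey `id` and `overwriteDupes` false, a duplicate simply overwrites the existing document by key instead of triggering a delete-by-query for dupes.

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- write the signature into the uniqueKey field, so updates
         become plain overwrites-by-key handled by Solr itself -->
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The chain then needs to be referenced from the update handler's `update.chain` (or attached to the update request handler) to take effect.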

Re: Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances

2011-05-30 Thread Tanguy Moal
Hello, Sorry for re-posting this but it seems my message got lost in the mailing list's message stream without catching anyone's attention... =D Shortly: has anyone already experienced dramatic indexing slowdowns during large bulk imports with overwriteDupes turned on and a fairly high dupli