Re: Indexing huge data onto solr

2020-05-26 Thread Erick Erickson
…ch one of parent tuples and execute the child entity SQLs (with a WHERE condition on the parent) to create one Solr document? Won't it be more load on the database by executing more SQLs? Is there an optimum solution? Thanks, Srinivas. From: Erick Erickson Sent: 22 May 2…
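Srinivas's concern here is the classic N+1 query pattern: one child SQL per parent row. DIH's `CachedSqlEntityProcessor` addresses exactly this by running the child query once and joining rows in memory. A minimal data-config sketch, assuming hypothetical table and column names (the driver and URL are placeholders):

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="parent" query="SELECT id, title FROM parent_table">
      <!-- CachedSqlEntityProcessor runs the child query ONCE, caches the
           rows keyed on cacheKey, and looks them up per parent via
           cacheLookup, instead of issuing one SQL per parent row -->
      <entity name="child"
              processor="CachedSqlEntityProcessor"
              query="SELECT parent_id, detail FROM child_table"
              cacheKey="parent_id"
              cacheLookup="parent.id"/>
    </entity>
  </document>
</dataConfig>
```

The trade-off is memory: the whole child result set is held in RAM for the duration of the import, so this works best when the child table fits comfortably in the importer's heap.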

RE: Indexing huge data onto solr

2020-05-25 Thread Srinivas Kashyap
…22:52 To: solr-user@lucene.apache.org Subject: Re: Indexing huge data onto solr. You have a lot more control over the speed and form of importing data if you just do the initial load in SolrJ. Here's an example; taking the Tika parts out is easy: https://lucidworks.com/post/indexing-with-solrj…

Re: Indexing huge data onto solr

2020-05-22 Thread matthew sporleder
I can index (without nested entities ofc ;) ) 100M records in about 6-8 hours on a pretty low-powered machine using vanilla DIH -> mysql, so it is probably worth looking at why it is going slow before writing your own indexer (which we are finally having to do). On Fri, May 22, 2020 at 1:22 PM Erick…

Re: Indexing huge data onto solr

2020-05-22 Thread Erick Erickson
You have a lot more control over the speed and form of importing data if you just do the initial load in SolrJ. Here's an example; taking the Tika parts out is easy: https://lucidworks.com/post/indexing-with-solrj/ It's especially instructive to comment out just the call to CloudSolrClient.add(d…
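The batching pattern behind that SolrJ article, and Erick's trick of commenting out the add call to isolate ETL cost from indexing cost, can be sketched without any Solr dependency. A minimal sketch: the string buffer stands in for a list of SolrInputDocuments, and the comments mark where the real SolrJ `client.add` / `client.commit` calls would go:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the batched-add pattern: buffer documents, send each full
// batch with one add(batch) call, and commit once at the end rather
// than per document.
public class BatchedLoad {

    // Counts how many add(batch) calls a load of nDocs would issue.
    static int addCalls(int nDocs, int batchSize) {
        List<String> buffer = new ArrayList<>();
        int calls = 0;
        for (int i = 0; i < nDocs; i++) {
            buffer.add("doc-" + i);        // stand-in for a SolrInputDocument
            if (buffer.size() >= batchSize) {
                // client.add(buffer) would go here; comment it out to
                // measure pure ETL speed, as Erick suggests
                calls++;
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) calls++;    // flush the final partial batch
        // client.commit() once here, not per batch
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(addCalls(2500, 1000)); // prints 3
    }
}
```

If the program runs at roughly the same speed with the add call commented out, the bottleneck is the data acquisition side, not Solr.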

Re: Indexing huge data

2014-03-08 Thread Rallavagu
[mailto:erickerick...@gmail.com] Sent: Wednesday, March 05, 2014 8:03 PM To: solr-user@lucene.apache.org Subject: Re: Indexing huge data Here's the easiest thing to try to figure out where to concentrate your energies. Just comment out the server.add call in your SolrJ program. Well, and a

Re: Indexing huge data

2014-03-07 Thread Erick Erickson
…nto Solr using CSV handler & curl. This will give you the pure indexing time & the differences. Thanks, Susheel -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.…

Re: Indexing huge data

2014-03-06 Thread Kranti Parisa
…al Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, March 05, 2014 8:03 PM To: solr-user@lucene.apache.org Subject: Re: Indexing huge data. Here's the easiest thing to try to figure out where to concentrate your…

Re: Indexing huge data

2014-03-06 Thread Rallavagu
into Solr using CSV handler & curl. This will give you the pure indexing time & the differences. Thanks, Susheel -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, March 05, 2014 8:03 PM To: solr-user@lucene.apache.org Subject: Re: Indexing h

Re: Indexing huge data

2014-03-06 Thread Rallavagu
Erick, That helps so I can focus on the problem areas. Thanks. On 3/5/14, 6:03 PM, Erick Erickson wrote: Here's the easiest thing to try to figure out where to concentrate your energies. Just comment out the server.add call in your SolrJ program. Well, and any commits you're doing from Solr

RE: Indexing huge data

2014-03-05 Thread Susheel Kumar
m: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, March 05, 2014 8:03 PM To: solr-user@lucene.apache.org Subject: Re: Indexing huge data Here's the easiest thing to try to figure out where to concentrate your energies. Just comment out the server.add call in your SolrJ progra

Re: Indexing huge data

2014-03-05 Thread Erick Erickson
Here's the easiest thing to try to figure out where to concentrate your energies. Just comment out the server.add call in your SolrJ program. Well, and any commits you're doing from SolrJ. My bet: Your program will run at about the same speed it does when you actually index the docs, indicatin

Re: Indexing huge data

2014-03-05 Thread Jack Krupansky
Make sure you're not doing a commit on each individual document add. Committing every few minutes, or every few hundred or few thousand documents, is sufficient. You can set up autoCommit in solrconfig.xml. -- Jack Krupansky -Original Message- From: Rallavagu Sent: Wednesday, March 5, 201…
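Jack's autoCommit suggestion goes in solrconfig.xml. A minimal sketch; the interval values are illustrative, not recommendations:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flushes to stable storage; openSearcher=false means
       it does NOT make new documents visible, so it stays cheap -->
  <autoCommit>
    <maxTime>60000</maxTime>       <!-- at most every 60 seconds -->
    <maxDocs>10000</maxDocs>       <!-- or every 10,000 documents -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit controls when new documents become visible to searches -->
  <autoSoftCommit>
    <maxTime>120000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With this in place, the SolrJ indexing program should issue no explicit commits at all during the bulk load; the server handles durability and visibility on its own schedule.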

Re: Indexing huge data

2014-03-05 Thread Otis Gospodnetic
Hi, Each doc is 100K? That's on the big side, yes, and the server seems on the small side, yes. Hence the "speed". :) Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Wed, Mar 5, 2014 at 3:37 PM, Rallavagu wrote: > Otis

Re: Indexing huge data

2014-03-05 Thread Rallavagu
Otis, Good points. I guess you are suggesting that it depends on the resources. The documents are 100k each, and the pre-processing server is a 2-CPU VM running with 4G RAM. So that could be a "small" machine, relatively speaking, to process such an amount of data?? On 3/5/14, 12:27 PM, Otis Gospodnetic wrote:

Re: Indexing huge data

2014-03-05 Thread Otis Gospodnetic
Hi, It depends. Are docs huge or small? Server single core or 32 core? Heap big or small? etc. etc. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Wed, Mar 5, 2014 at 3:02 PM, Rallavagu wrote: > It seems the latency

Re: Indexing huge data

2014-03-05 Thread Rallavagu
It seems the latency is introduced by collecting the data from different sources and putting them together, then the actual Solr indexing. I would say all these activities are contributing equally. So, is it normal to expect indexing to run this long? Wondering what to expect in…

Re: Indexing huge data

2014-03-05 Thread Otis Gospodnetic
Hi, 6M is really not huge these days. 6B is big, though also still not huge any more. What seems to be the bottleneck? Solr or DB or network or something else? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Wed, Mar 5,