ch one of parent
> tuples and execute the child entity SQLs (with the where condition of the parent) to
> create one Solr document? Won’t it put more load on the database by executing more
> SQLs? Is there an optimal solution?
>
> Thanks,
> Srinivas
> From: Erick Erickson
> Sent: 22 May 2
22:52
To: solr-user@lucene.apache.org
Subject: Re: Indexing huge data onto solr
You have a lot more control over the speed and form of importing data if
you just do the initial load in SolrJ. Here’s an example, taking the Tika
parts out is easy:
https://lucidworks.com/post/indexing-with-solrj
I can index (without nested entities ofc ;) ) 100M records in about
6-8 hours on a pretty low-powered machine using vanilla DIH -> mysql
so it is probably worth looking at why it is going slow before writing
your own indexer (which we are finally having to do)
On Fri, May 22, 2020 at 1:22 PM Erick
You have a lot more control over the speed and form of importing data if
you just do the initial load in SolrJ. Here’s an example, taking the Tika
parts out is easy:
https://lucidworks.com/post/indexing-with-solrj/
It’s especially instructive to comment out just the call to
CloudSolrClient.add(d
[mailto:erickerick...@gmail.com]
Sent: Wednesday, March 05, 2014 8:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing huge data
Here's the easiest thing to try to figure out where to concentrate your
energies. Just comment out the server.add call in your SolrJ program.
Well, and a
nto Solr
>>> using CSV handler & curl. This will give you the pure indexing time & the
>>> differences.
>>>
>>> Thanks,
>>> Susheel
>>>
>>> -Original Message-
>>> From: Erick Erickson [mailto:erickerick...@gmail.
al Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Wednesday, March 05, 2014 8:03 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Indexing huge data
>>
>> Here's the easiest thing to try to figure out where to concentrate your
>
into Solr using CSV handler & curl.
This will give you the pure indexing time & the differences.
Thanks,
Susheel
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, March 05, 2014 8:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing h
Erick,
That helps so I can focus on the problem areas. Thanks.
On 3/5/14, 6:03 PM, Erick Erickson wrote:
Here's the easiest thing to try to figure out where to
concentrate your energies. Just comment out the
server.add call in your SolrJ program. Well, and any
commits you're doing from Solr
m: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, March 05, 2014 8:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing huge data
Here's the easiest thing to try to figure out where to concentrate your
energies. Just comment out the server.add call in your SolrJ progra
Here's the easiest thing to try to figure out where to
concentrate your energies. Just comment out the
server.add call in your SolrJ program. Well, and any
commits you're doing from SolrJ.
My bet: Your program will run at about the same speed
it does when you actually index the docs, indicatin
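The experiment described above (run the program once with the add call in place and once with it disabled, then compare the times) could be sketched roughly as follows. This is only an illustration under assumptions: `buildDocs`, `timeRun`, and the `Consumer<String>` sink are hypothetical stand-ins for your own data-gathering code and for the SolrJ add call; none of them are SolrJ APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical harness, not SolrJ code: the sink stands in for server.add(doc).
class IndexTimingSketch {

    /** Builds n placeholder documents; swap in the real DB/file gathering code. */
    static List<String> buildDocs(int n) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            docs.add("doc-" + i);
        }
        return docs;
    }

    /** Gathers n documents, hands each one to the sink, returns elapsed nanoseconds. */
    static long timeRun(int n, Consumer<String> sink) {
        long start = System.nanoTime();
        for (String doc : buildDocs(n)) {
            sink.accept(doc);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        // Run 1: the sink does real work (stand-in for the add call).
        long withAdd = timeRun(100_000, sent::add);
        // Run 2: the sink is a no-op, i.e. the add call is "commented out".
        long withoutAdd = timeRun(100_000, doc -> { });
        System.out.printf("with add: %d ms, without add: %d ms%n",
                withAdd / 1_000_000, withoutAdd / 1_000_000);
    }
}
```

If the two timings come out close, the time is going into gathering the data rather than into the add call.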
Make sure you're not doing a commit on each individual document add. Committing
every few minutes, or every few hundred or few thousand documents, is
sufficient. You can set up auto commit in solrconfig.xml.
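As a rough sketch of the client-side version of this advice, the loop below commits once per batch instead of once per document. The `DocSink` interface and the `indexAll` method are hypothetical names invented for illustration; they stand in for whatever indexing client you use and are not part of SolrJ.

```java
import java.util.List;

// Hypothetical sketch: DocSink is a stand-in for the indexing client, not a SolrJ type.
class BatchCommitSketch {

    interface DocSink {
        void add(String doc);
        void commit();
    }

    /**
     * Adds every document and commits once per commitEvery documents,
     * plus one final commit for any remainder. Returns the number of commits.
     */
    static int indexAll(List<String> docs, DocSink sink, int commitEvery) {
        int commits = 0;
        int pending = 0;
        for (String doc : docs) {
            sink.add(doc);
            pending++;
            if (pending >= commitEvery) {
                sink.commit();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {
            sink.commit();
            commits++;
        }
        return commits;
    }
}
```

With 2500 documents and `commitEvery` set to 1000, this issues three commits rather than 2500.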
-- Jack Krupansky
-Original Message-
From: Rallavagu
Sent: Wednesday, March 5, 201
Hi,
Each doc is 100K? That's on the big side, yes, and the server seems on the
small side, yes. Hence the "speed". :)
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
On Wed, Mar 5, 2014 at 3:37 PM, Rallavagu wrote:
> Otis
Otis,
Good points. I guess you are suggesting that it depends on the
resources. Each document is 100K, and the preprocessing server is a 2
CPU VM running with 4G RAM. So, that could be a relatively "small" machine
for processing such an amount of data?
On 3/5/14, 12:27 PM, Otis Gospodnetic wrote:
Hi,
It depends. Are docs huge or small? Server single core or 32 core? Heap
big or small? etc. etc.
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
On Wed, Mar 5, 2014 at 3:02 PM, Rallavagu wrote:
> It seems the latency
It seems the latency is introduced by collecting the data from different
sources and putting them together, and then by the actual Solr indexing. I would say
all these activities are contributing equally. So, is
it normal to expect indexing to run for long? Wondering what to
expect in
Hi,
6M is really not huge these days. 6B is big, though also still not huge
any more. What seems to be the bottleneck? Solr or DB or network or
something else?
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
On Wed, Mar 5,