Re: How to do parallel indexing on files (not on HDFS)

2018-05-24 Thread Rahul Singh
, 2018 at 12:46 AM, Raymond Xie wrote: > Thank you all for the suggestions. I'm now tending not to use traditional parallel indexing. My data are JSON files with metadata extracted from raw data received and archived into our data server

Re: How to do parallel indexing on files (not on HDFS)

2018-05-24 Thread Adhyan Arizki
> using traditional parallel indexing. My data are JSON files with metadata extracted from raw data received and archived into our data server cluster. Those data come in various flows and reside in their respective folders; splitting them might introduce unnecessary extra w

Re: How to do parallel indexing on files (not on HDFS)

2018-05-24 Thread Raymond Xie
Thank you all for the suggestions. I'm now tending not to use traditional parallel indexing. My data are JSON files with metadata extracted from raw data received and archived into our data server cluster. Those data come in various flows and reside in their respective folders; splitting

Re: How to do parallel indexing on files (not on HDFS)

2018-05-24 Thread Rahul Singh
Resending to the list to help more people. This is an architectural pattern that solves an issue that arises over and over again. The queue can be anything: a table in a database, even a Solr collection. And yes, I have implemented it. I did it in C# before, using a SQL Server table-based qu
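The idea that the queue can be a plain database table can be sketched in Python. Here the standard-library sqlite3 module stands in for SQL Server. That substitution is an assumption, and the C# version is not shown in the thread. Each worker atomically claims one pending row, indexes it, and marks it done. The `work_queue` table layout and the `index_file` callable are hypothetical.

```python
import sqlite3

# Assumes connections are opened with isolation_level=None (autocommit), so the
# explicit BEGIN IMMEDIATE / COMMIT below are the only multi-statement transactions.

def create_queue(conn, paths):
    """Create the hypothetical work_queue table and enqueue one row per path."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS work_queue ("
        " id INTEGER PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    conn.executemany("INSERT INTO work_queue (path) VALUES (?)", [(p,) for p in paths])

def claim_next(conn):
    """Atomically mark the oldest pending row as claimed; return (id, path) or None."""
    # BEGIN IMMEDIATE takes SQLite's write lock up front, so two workers
    # cannot both select and claim the same row.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, path FROM work_queue"
            " WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE work_queue SET status = 'claimed' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise

def mark_done(conn, item_id):
    conn.execute("UPDATE work_queue SET status = 'done' WHERE id = ?", (item_id,))

def run_worker(conn, index_file):
    """Claim, index, and complete items until none are pending; return the count."""
    processed = 0
    while (item := claim_next(conn)) is not None:
        item_id, path = item
        index_file(path)  # hypothetical: parse the file and send it to Solr
        mark_done(conn, item_id)
        processed += 1
    return processed
```

With a file-backed database, several worker processes could each open their own connection and call `run_worker`. The `BEGIN IMMEDIATE` step serializes claims, so no row is handed out twice.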

Re: How to do parallel indexing on files (not on HDFS)

2018-05-24 Thread Adhyan Arizki
Raymond, running parallel indexing might be trickier than it looks at large scale. For instance, you can easily partition your data (say, into 5 chunks) and run 5 processes to index them. However, you need to be aware of any choke points in the pipeline along the way (e.g. I/O of da
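Here is a minimal sketch of the "partition into 5 chunks, run 5 processes" approach. It assumes the work items are a list of file paths and that `index_chunk` is a hypothetical module-level function that indexes one chunk. Round-robin slicing keeps chunk sizes within one item of each other. It does not balance by file size, and it does nothing about the pipeline choke points mentioned above.

```python
from concurrent.futures import ProcessPoolExecutor

def partition(items, n):
    """Split a list into n round-robin chunks whose sizes differ by at most 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return [items[i::n] for i in range(n)]

def index_in_parallel(items, n, index_chunk, executor_cls=ProcessPoolExecutor):
    """Call index_chunk(chunk) for each non-empty chunk, n workers at a time.

    index_chunk is hypothetical: it would read the files in its chunk and send
    them to Solr. With ProcessPoolExecutor it must be a module-level function
    so that it can be pickled.
    """
    chunks = [c for c in partition(items, n) if c]
    with executor_cls(max_workers=n) as pool:
        # list() forces evaluation, so any worker exception is re-raised here.
        return list(pool.map(index_chunk, chunks))
```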

Re: How to do parallel indexing on files (not on HDFS)

2018-05-23 Thread Raymond Xie
Thank you Rahul, though that's very high level. No offense, but do you have a successful implementation, or is it just an unproven idea? I have never used RabbitMQ or Kafka before, but I would be very interested in more detail on the Kafka idea, as Kafka is available in my environment. Thank you

Re: How to do parallel indexing on files (not on HDFS)

2018-05-23 Thread Rahul Singh
Enumerate the file locations (map), put them in a queue like RabbitMQ or Kafka (persist the map), and have a bunch of threads, workers, containers, whatever, pop items off the queue and process them (reduce). -- Rahul Singh rahul.si...@anant.us Anant Corporation On May 20, 2018, 7:24 AM -0400, Raymond
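The map/queue/reduce outline above can be sketched in-process in Python. Here a `queue.Queue` stands in for RabbitMQ or Kafka, so unlike those brokers nothing is persisted. Worker threads play the consumers. `index_file` is a hypothetical callable that would parse one JSON file and send it to Solr. Walking each flow's folder recursively means the files are enumerated where they already live.

```python
import queue
import threading
from pathlib import Path

_STOP = object()  # sentinel telling one worker thread to exit

def enumerate_files(root, pattern="*.json"):
    """'Map' step: list every matching file under root, recursively."""
    return sorted(str(p) for p in Path(root).rglob(pattern) if p.is_file())

def run_pipeline(paths, index_file, num_workers=4):
    """Queue the paths and let num_workers threads 'reduce' them.

    index_file is hypothetical and must be thread-safe.
    Returns the paths whose index_file call raised an exception.
    """
    work = queue.Queue()
    failed = []
    failed_lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is _STOP:
                return
            try:
                index_file(item)
            except Exception:
                with failed_lock:
                    failed.append(item)

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for p in paths:
        work.put(p)
    # Sentinels go in after all real items (FIFO), one per worker.
    for _ in threads:
        work.put(_STOP)
    for t in threads:
        t.join()
    return failed
```

With a real broker, the enumeration loop would become a producer and each worker a separate consumer process or container. That arrangement would also let the work list survive a restart, because the queue is persisted.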

How to do parallel indexing on files (not on HDFS)

2018-05-20 Thread Raymond Xie
I know how to index from the file system (a single file or a folder), but how do I do that in parallel? The data I need to index is huge in volume and can't be put on HDFS. Thank you. Sincerely yours, Raymond

Re: Configuration of parallel indexing threads

2017-06-09 Thread gigo314
Thanks a lot! -- View this message in context: http://lucene.472066.n3.nabble.com/Configuration-of-parallel-indexing-threads-tp4338466p4339792.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Configuration of parallel indexing threads

2017-06-02 Thread Erick Erickson

Re: Configuration of parallel indexing threads

2017-06-02 Thread gigo314
?

Re: Configuration of parallel indexing threads

2017-06-01 Thread Susheel Kumar
> , but not sure how to use it in Solr 6.2. Hopefully there is a setting in the Solr configuration file, but I cannot find it.

Re: Configuration of parallel indexing threads

2017-06-01 Thread Erick Erickson
> sure how to use it in Solr 6.2. Hopefully there is a setting in the Solr configuration file, but I cannot find it.

Configuration of parallel indexing threads

2017-06-01 Thread gigo314
Solr 6.2. Hopefully there is a setting in the Solr configuration file, but I cannot find it.

Re: Parallel Indexing

2014-12-22 Thread Peri Subrahmanya
Thanks, guys, for the quick responses. I need to take the suggestions, incorporate them, figure out how we are doing the fetching, etc., and reply back on this post. The suggestions have been very helpful in taking this forward for us here. Thanks -Peri.S > On Dec 22, 2014, at 10:32 AM, Er

Re: Parallel Indexing

2014-12-22 Thread Erick Erickson
Just to pile on: _very_ frequently, in my experience, the problem is not Solr at all but acquiring the data in the first place, i.e. often executing the DB query. A very simple test (in the SolrJ world) is to just comment out the server.add(doclist). Assuming you're using SolrJ, you _are_ indexin
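The "comment out server.add(doclist)" test can be turned into a timing harness with a dry-run switch. This is a sketch. `fetch_batches` (for example, batches of rows from the DB query) and `send_batch` (the call that would post one batch to Solr) are hypothetical stand-ins supplied by the caller.

```python
import time

def timed_reindex(fetch_batches, send_batch, dry_run=False):
    """Time data acquisition and indexing separately.

    fetch_batches: hypothetical iterable yielding lists of documents.
    send_batch: hypothetical callable that posts one batch to Solr.
    dry_run=True skips the send, like commenting out server.add(doclist).
    """
    fetch_secs = send_secs = 0.0
    docs = 0
    it = iter(fetch_batches)
    while True:
        t0 = time.perf_counter()
        batch = next(it, None)  # time spent producing the next batch
        fetch_secs += time.perf_counter() - t0
        if batch is None:
            break
        docs += len(batch)
        if not dry_run:
            t1 = time.perf_counter()
            send_batch(batch)
            send_secs += time.perf_counter() - t1
    return {"docs": docs, "fetch_secs": fetch_secs, "send_secs": send_secs}
```

If a dry run takes nearly as long as a real run, the time is going into data acquisition rather than into Solr.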

Re: Parallel Indexing

2014-12-22 Thread Mikhail Khludnev
What is your indexer built on? Do you use SolrJ, plain REST, or DataImportHandler? What, briefly, is your DB schema? Frankly speaking, there are a few approaches to handling indexing concurrently; the details depend on the answers to the questions above. On Mon, Dec 22, 2014 at 5:54 PM, Peri Subrahmanya < peri.su

Re: Parallel Indexing

2014-12-22 Thread Ahmet Arslan
Hi Peri, you can always send concurrent update requests to Solr. Usually data acquisition takes more time than indexing. You can dump your DB records into several CSV files and feed them to Solr in parallel. Ahmet On Monday, December 22, 2014 4:55 PM, Peri Subrahmanya wrote: H
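Here is a sketch of the CSV suggestion. It dumps the DB records round-robin into several CSV files, then posts them to Solr concurrently. Several details are assumptions:

- the rows arrive as dicts;
- the update URL (for example `http://localhost:8983/solr/collection1/update`) and the field names;
- your Solr version accepts a CSV body on that handler when it is sent with `Content-Type: application/csv`.

```python
import csv
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def dump_to_csv_files(rows, fieldnames, out_dir, n_files):
    """Write dict rows round-robin into n_files CSV files; return their paths."""
    paths = [os.path.join(out_dir, f"part-{i:03d}.csv") for i in range(n_files)]
    handles = [open(p, "w", newline="", encoding="utf-8") for p in paths]
    try:
        writers = [csv.DictWriter(h, fieldnames=fieldnames) for h in handles]
        for w in writers:
            w.writeheader()  # header row carries the field names
        for i, row in enumerate(rows):
            writers[i % n_files].writerow(row)
    finally:
        for h in handles:
            h.close()
    return paths

def build_csv_request(path, update_url):
    """Build a POST of one CSV file to a (hypothetical) Solr update URL."""
    with open(path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        update_url, data=body, method="POST",
        headers={"Content-Type": "application/csv"},
    )

def post_all(paths, update_url, max_workers=4):
    """Send the CSV files concurrently; return the HTTP status codes."""
    def send(path):
        with urllib.request.urlopen(build_csv_request(path, update_url)) as resp:
            return resp.status
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, paths))
```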

Parallel Indexing

2014-12-22 Thread Peri Subrahmanya
Hi, we have millions of records in our DB that we completely re-index every fortnight or so. It takes around 11 hours, and I was wondering if there is a way to fetch the records in parallel batches and issue the Solr HTTP commands with the Solr docs in parallel. Please let me know.
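Fetching the records in parallel batches and posting them in parallel can be sketched with a thread pool over primary-key ranges. `fetch_range` and `post_docs` are hypothetical callables. Splitting on a numeric id column is an assumption about the schema.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def key_ranges(min_id, max_id, batch_size):
    """Split the inclusive id range [min_id, max_id] into half-open [lo, hi) batches."""
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + batch_size, max_id + 1)
        ranges.append((lo, hi))
        lo = hi
    return ranges

def reindex_parallel(min_id, max_id, batch_size, fetch_range, post_docs, workers=8):
    """Fetch and post each id range concurrently; return the number of docs posted.

    fetch_range(lo, hi) is hypothetical, e.g. SELECT ... WHERE id >= lo AND id < hi.
    post_docs(docs) is hypothetical, e.g. an HTTP POST to Solr's update handler,
    and must be thread-safe.
    """
    def job(r):
        docs = fetch_range(*r)
        if docs:
            post_docs(docs)
        return len(docs)

    total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(job, r) for r in key_ranges(min_id, max_id, batch_size)]
        for fut in as_completed(futures):
            total += fut.result()  # re-raises any exception from a batch
    return total
```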

Re: Issue in parallel Indexing using multiple csv files

2013-10-01 Thread zaheer.java
Ran more tests. It works.

Issue in parallel Indexing using multiple csv files

2013-09-27 Thread zaheer.java

Re: Parallel Indexing With Solr?

2013-03-29 Thread Furkan KAMACI
Can you tell me more about "You can index from a MapReduce job"? I use Nutch, and it tells Solr to index and reindex. I know that I can use MapReduce jobs on the Nutch side, but can I use MapReduce jobs on the Solr side (i.e. for indexing etc.)? 2013/3/29 Otis Gospodnetic > Yes. You can index from

Re: Parallel Indexing With Solr?

2013-03-29 Thread Otis Gospodnetic
Yes. You can index from any app that can hit Solr with multiple threads. You can use StreamingUpdateSolrServer, at least in older Solr versions, to handle multi-threading for you. You can also index from a MapReduce job. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Fri, Mar 29, 2013
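To make "handle multi-threading for you" concrete, here is a rough Python analogue of a streaming/concurrent update client. Callers add documents to a bounded buffer, and a few background threads drain it in batches. This is not SolrJ's implementation. `send_batch` is a hypothetical thread-safe callable that would POST a batch to Solr. Error handling is omitted: if `send_batch` raises, that runner thread dies.

```python
import queue
import threading

_STOP = object()  # sentinel: tells one runner thread to exit

class BufferedConcurrentSender:
    """Rough analogue of a streaming/concurrent update client (not SolrJ's code)."""

    def __init__(self, send_batch, queue_size=1000, threads=4, batch_size=100):
        self._send = send_batch
        self._batch_size = batch_size
        self._q = queue.Queue(maxsize=queue_size)  # add() blocks when the buffer is full
        self._runners = [threading.Thread(target=self._run, daemon=True)
                         for _ in range(threads)]
        for t in self._runners:
            t.start()

    def add(self, doc):
        self._q.put(doc)

    def _run(self):
        while True:
            item = self._q.get()  # block until there is work
            if item is _STOP:
                return
            batch, stop = [item], False
            # Grab whatever else is already buffered, without waiting.
            while len(batch) < self._batch_size:
                try:
                    nxt = self._q.get_nowait()
                except queue.Empty:
                    break
                if nxt is _STOP:
                    stop = True
                    break
                batch.append(nxt)
            self._send(batch)
            if stop:
                return

    def close(self):
        """Flush: queue one sentinel per runner after all docs, then join."""
        for _ in self._runners:
            self._q.put(_STOP)
        for t in self._runners:
            t.join()
```

The bounded queue gives back-pressure: a fast producer blocks in `add()` instead of buffering the whole data set in memory.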

Re: Parallel Indexing With Solr?

2013-03-29 Thread Gora Mohanty
On 29 March 2013 14:56, Furkan KAMACI wrote: > Does Solr allow parallelism (parallel computing) for indexing? What do you mean by parallel computing in this context? Solr can use multiple threads for indexing, if that is what you are asking. Regards, Gora

Parallel Indexing With Solr?

2013-03-29 Thread Furkan KAMACI
Does Solr allow parallelism (parallel computing) for indexing?

Re: Parallel indexing and swapping

2013-02-20 Thread Mark Miller
There is an open issue somewhere for this type of support. We don't have a simple way to do it currently. We will also be looking at adding index aliases, which is probably another feature you could use to solve this. Currently, you would need some kind of load balancer to achieve this nicely I
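Collection aliases later became available through the Collections API's CREATEALIAS action. Assuming a Solr version that supports it, the reindex-and-swap described below could work like this. Search clients query an alias that points at collection1. Once collection2 has been rebuilt, re-issuing CREATEALIAS with the same name repoints the alias. The base URL and the alias name `live` are hypothetical. The sketch only builds the request URL, which any HTTP client can send.

```python
from urllib.parse import urlencode

def create_alias_url(solr_base, alias, collections):
    """Build a Collections API CREATEALIAS request URL.

    solr_base (e.g. "http://localhost:8983/solr") and alias are illustrative.
    Issuing CREATEALIAS for an existing alias name replaces what it points to.
    """
    params = urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    })
    return f"{solr_base.rstrip('/')}/admin/collections?{params}"
```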

Parallel indexing and swapping

2013-02-20 Thread Shankar Sundararaju
Hi all, I am using Solr 4.1. I have a Solr cluster of 3 leaders and 3 replicas hosting collection1, which consists of thousands of documents and is currently serving search requests. I would like to re-index all the documents into another collection, say collection2, in this same Solr cluster and swap it with

Re: Parallel indexing in Solr

2012-02-07 Thread Per Steffensen
You could try to isolate the bottleneck by testing the indexing speed from the local machine hosting Solr. Also, tools like iostat or sar might give you more details about the disk side. Yes, I am doing various things to isolate the bottleneck. I'm also profiling the JVM. And I am using iostat, top a

Re: Parallel indexing in Solr

2012-02-07 Thread Sami Siren
On Mon, Feb 6, 2012 at 5:55 PM, Per Steffensen wrote: > Sami Siren wrote: >> On Mon, Feb 6, 2012 at 2:53 PM, Per Steffensen wrote: >>> Actually right now, I am trying to find out what my bottleneck is. The setup is more complex than I would bother you with, but basicall

Re: Parallel indexing in Solr

2012-02-06 Thread Erick Erickson
I've had recurring discussions with "executive-level folks" about the fact that no matter how many VMs you host on a machine, and no matter how big that machine is, there really, truly, *is* some hardware underlying it all that really, truly, *does* have some limits. And adding more VMs doesn't somehow get aro

Re: Parallel indexing in Solr

2012-02-06 Thread Per Steffensen
Sami Siren wrote: On Mon, Feb 6, 2012 at 2:53 PM, Per Steffensen wrote: Actually right now, I am trying to find out what my bottleneck is. The setup is more complex than I would bother you with, but basically I have servers with 80-90% IO-wait and only 5-10% "real CPU usage". It might not

Re: Parallel indexing in Solr

2012-02-06 Thread Per Steffensen
So SolrJ with CommonsHttpSolrServer will not support handling several requests concurrently? Nope. Use StreamingUpdateSolrServer; it should be just a drop-in replacement with a different constructor. I will try that. It is a little bit difficult for me, as we are actually not dealing with

Re: Parallel indexing in Solr

2012-02-06 Thread Sami Siren
On Mon, Feb 6, 2012 at 2:53 PM, Per Steffensen wrote: > Actually right now, I am trying to find out what my bottleneck is. The setup is more complex than I would bother you with, but basically I have servers with 80-90% IO-wait and only 5-10% "real CPU usage". It might not be a Solr-relat

Re: Parallel indexing in Solr

2012-02-06 Thread Erick Erickson
Right. See below. On Mon, Feb 6, 2012 at 7:53 AM, Per Steffensen wrote: > See response below. > Erick Erickson wrote: >> Unfortunately, the answer is "it depends(tm)". >> First question: How are you indexing things? SolrJ? post.jar? > SolrJ, CommonsHttpSolrServer >> But some observat

Re: Parallel indexing in Solr

2012-02-06 Thread Per Steffensen
See response below. Erick Erickson wrote: Unfortunately, the answer is "it depends(tm)". First question: How are you indexing things? SolrJ? post.jar? SolrJ, CommonsHttpSolrServer. But some observations: 1> sure, using multiple cores will have some parallelism. So will using a single co

Re: Parallel indexing in Solr

2012-02-03 Thread Erick Erickson
Unfortunately, the answer is "it depends(tm)". First question: How are you indexing things? SolrJ? post.jar? But some observations: 1> sure, using multiple cores will have some parallelism. So will using a single core but using something like SolrJ and StreamingUpdateSolrServer. Especial

Parallel indexing in Solr

2012-02-03 Thread Per Steffensen
Hi, this topic has probably been covered before, but I haven't had the luck to find the answer. We are running Solr instances with several cores inside, with Solr running out of the box on top of Jetty. I believe Jetty receives all the HTTP requests about indexing new documents and forwards it

Re: Using DIH for parallel indexing

2009-07-31 Thread Avlesh Singh
Thanks, Noble and Shalin. Cheers, Avlesh On Fri, Jul 31, 2009 at 1:23 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote: > On Fri, Jul 31, 2009 at 11:53 AM, Avlesh Singh wrote: >> Thanks for the reply, Noble. A few questions are still open: >> 1. Can I pass parameters to DIH and

Re: Using DIH for parallel indexing

2009-07-31 Thread Shalin Shekhar Mangar
On Fri, Jul 31, 2009 at 11:53 AM, Avlesh Singh wrote: > Thanks for the reply, Noble. A few questions are still open: > 1. Can I pass parameters to DIH and be able to use them inside the "query" attribute of an entity inside the data-config file? Yes. Use ${dataimporter.request.X} or ${
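Because request parameters are visible as `${dataimporter.request.X}`, one way to split a DIH import is to give each import request its own id range. The sketch only builds the request URLs. It rests on three assumptions, and whether a single handler can run imports concurrently is not settled in this excerpt:

- several DIH handlers are registered under hypothetical paths such as `/dataimport1` and `/dataimport2`;
- each entity query references hypothetical `minId`/`maxId` parameters;
- `clean=false` is passed so that one partition's full-import does not delete the documents loaded by the others.

```python
from urllib.parse import urlencode

def dih_partition_urls(core_url, handlers, min_id, max_id):
    """One full-import URL per DIH handler, each covering a slice of [min_id, max_id].

    Hypothetical setup: each handler's entity query filters on the request
    parameters, e.g.
      WHERE id >= ${dataimporter.request.minId} AND id < ${dataimporter.request.maxId}
    """
    n = len(handlers)
    span = max_id - min_id + 1
    urls = []
    for i, handler in enumerate(handlers):
        lo = min_id + span * i // n
        hi = min_id + span * (i + 1) // n  # exclusive upper bound
        params = urlencode({
            "command": "full-import",
            "clean": "false",  # keep other partitions' documents
            "minId": lo,
            "maxId": hi,
        })
        urls.append(f"{core_url.rstrip('/')}{handler}?{params}")
    return urls
```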

Re: Using DIH for parallel indexing

2009-07-30 Thread Avlesh Singh
Thanks for the reply, Noble. A few questions are still open: 1. Can I pass parameters to DIH and be able to use them inside the "query" attribute of an entity inside the data-config file? 2. Can I use the same data-import-handler in some way so that indexing can be carried out in parall

Re: Using DIH for parallel indexing

2009-07-30 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Jul 31, 2009 at 11:11 AM, Avlesh Singh wrote: > I am using Solr 1.3 and have a few questions regarding DIH: > 1. Can I pass parameters to DIH and be able to use them inside the "query" attribute of an entity inside the data-config file? > 2. I am indexing some 2 million database r

Using DIH for parallel indexing

2009-07-30 Thread Avlesh Singh
I am using Solr 1.3 and have a few questions regarding DIH: 1. Can I pass parameters to DIH and be able to use them inside the "query" attribute of an entity inside the data-config file? 2. I am indexing some 2 million database records using DIH with 4-5 nested entities (just one level