, 2018 at 12:46 AM, Raymond Xie wrote:
> > > Thank you all for the suggestions. I'm now leaning toward not using
> > > traditional parallel indexing. My data are JSON files with metadata
> > > extracted from raw data received and archived into our data server
Thank you all for the suggestions. I'm now leaning toward not using
traditional parallel indexing. My data are JSON files with metadata
extracted from raw data received and archived into our data server cluster.
Those data come in various flows and reside in their respective folders,
splitting
Resending to the list to help more people.
This is an architectural pattern to solve the same issue that arises over and
over again. The queue can be anything: a table in a database, even a
collection in Solr.
And yes, I have implemented it. I did it in C# before, using a SQL Server table
based qu
Raymond,
Running parallel indexing might be trickier than it looks if the scale is big.
For instance, you can easily partition your data (let's say into 5 chunks)
and run 5 processes to index them. However, you will need to watch for
choke points in the pipeline along the way (e.g. I/O of da
Thank you, Rahul, although that's very high level.
No offense, but do you have a successful implementation, or is it just an
unproven idea? I have never used Rabbit or Kafka before, but I would be very
interested in knowing more detail on the Kafka idea, as Kafka is available
in my environment.
Thank you
Enumerate the file locations (map), put them in a queue like Rabbit or Kafka
(persist the map), then have a bunch of threads, workers, containers, or
whatever pop off the queue and process each item (reduce).
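
A minimal sketch of the pattern described above, assuming an in-memory
queue.Queue in place of Rabbit or Kafka (so nothing is persisted if the
process dies) and a caller-supplied process function. The names
enumerate_files and run_workers are hypothetical, chosen only for
illustration:

```python
import queue
import threading
from pathlib import Path


def enumerate_files(root, pattern="*.json"):
    """Map step: list every file under root that needs indexing."""
    return sorted(str(p) for p in Path(root).rglob(pattern))


def run_workers(paths, process, num_workers=4):
    """Reduce step: worker threads pop paths off the queue and process them."""
    work = queue.Queue()
    for path in paths:
        work.put(path)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                path = work.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker exits
            try:
                outcome = process(path)  # e.g. parse the file and send it to Solr
                with lock:
                    results.append((path, outcome))
            finally:
                work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a real broker the enumerate step and the workers would run as separate
processes, and the broker would keep the list of pending files across
restarts.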
--
Rahul Singh
rahul.si...@anant.us
Anant Corporation
On May 20, 2018, 7:24 AM -0400, Raymond
I know how to index a single file or folder on the file system, but
how do I do that in parallel? The data I need to index is huge in
volume and can't be put on HDFS.
Thank you
*Sincerely yours,*
*Raymond*
Thanks a lot!
--
View this message in context:
http://lucene.472066.n3.nabble.com/Configuration-of-parallel-indexing-threads-tp4338466p4339792.html
Sent from the Solr - User mailing list archive at Nabble.com.
?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Configuration-of-parallel-indexing-threads-tp4338466p4338599.html
Sent from the Solr - User mailing list archive at Nabble.com.
lr 6.2. Hopefully there is a setting in Solr configuration file, but
I cannot find it.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Configuration-of-parallel-indexing-threads-tp4338466.html
Sent from the Solr - User mailing list archive at Nabble.com.
Thanks, guys, for the quick responses. I need to take the suggestions,
incorporate them, figure out how we are doing the fetching, etc., and
reply back on this post. The suggestions have been very helpful in taking this
forward for us here.
Thanks
-Peri.S
> On Dec 22, 2014, at 10:32 AM, Er
Just to pile on:
_very_ frequently in my experience the problem
is not Solr at all, but acquiring the data in the
first place, i.e. often executing the DB query.
A very simple test is (in the SolrJ world) just comment
out the server.add(doclist).
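
The same test can be sketched outside SolrJ. This is a Python analogue, not
SolrJ code: run_pipeline, fetch_batches and index_batch are hypothetical
names, and passing index_batch=None plays the role of commenting out the
add call.

```python
import time


def run_pipeline(fetch_batches, index_batch=None):
    """Run the fetch loop; pass index_batch=None to skip sending to Solr.

    Returns (number of docs fetched, elapsed seconds).
    """
    start = time.perf_counter()
    docs = 0
    for batch in fetch_batches():
        docs += len(batch)
        if index_batch is not None:
            index_batch(batch)  # the step that is "commented out" in the test
    return docs, time.perf_counter() - start
```

If the elapsed time barely changes between a run with index_batch and a run
without it, the time is going into data acquisition rather than Solr.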
Assuming you're using SolrJ, you _are_ indexin
What is your indexer built on? Do you use SolrJ, plain REST, or
DataImportHandler? What is your DB schema, briefly?
Frankly speaking, there are a few approaches to handling indexing concurrently;
the details depend on the answers to the questions above.
On Mon, Dec 22, 2014 at 5:54 PM, Peri Subrahmanya <
peri.su
Hi Peri,
You can always send concurrent update requests to Solr.
Usually data acquisition takes more time than indexing. You can dump your
db records into several CSV files and feed them to Solr in parallel.
Ahmet
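
A rough sketch of feeding several CSV files to Solr in parallel, under
stated assumptions: the update URL (for example
http://localhost:8983/solr/mycollection/update) and the application/csv
content type are assumptions about the target setup, and post_csv and
post_all are hypothetical helper names.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def post_csv(path, update_url):
    """Send one CSV file to the update handler and return the HTTP status."""
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/csv"},  # assumed content type
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def post_all(paths, update_url, send=post_csv, max_workers=4):
    """Post every file concurrently; returns {path: status}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        statuses = pool.map(lambda p: send(p, update_url), paths)
        return dict(zip(paths, statuses))
```

The send parameter exists so the posting step can be swapped out, for
instance for a dry run that only logs which files would be sent. A commit
would still need to be issued once all files are in.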
On Monday, December 22, 2014 4:55 PM, Peri Subrahmanya
wrote:
H
Hi,
We have millions of records in our db that we do a complete re-index of every
fortnight or so. It takes around 11 hours or so, and I was wondering if there
is a way to fetch the records in batches in parallel and issue the Solr HTTP
commands with the Solr docs in parallel. Please let me know.
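
One way to picture the batching asked about here, as a sketch only: it
assumes the records have a numeric primary key so the table can be split
into id ranges, and that fetch(lo, hi) and index(docs) are caller-supplied
functions (both hypothetical) wrapping the DB query and the Solr update.

```python
from concurrent.futures import ThreadPoolExecutor


def id_ranges(min_id, max_id, batch_size):
    """Split [min_id, max_id] into inclusive (lo, hi) ranges."""
    return [
        (lo, min(lo + batch_size - 1, max_id))
        for lo in range(min_id, max_id + 1, batch_size)
    ]


def reindex(min_id, max_id, batch_size, fetch, index, max_workers=4):
    """Fetch and index each id range on a worker thread; returns docs handled."""

    def job(bounds):
        docs = fetch(*bounds)  # e.g. SELECT ... WHERE id BETWEEN lo AND hi
        index(docs)            # e.g. one HTTP update request per batch
        return len(docs)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(job, id_ranges(min_id, max_id, batch_size)))
```

Each worker needs its own DB connection or a thread-safe pool; sharing a
single connection across threads would serialize the fetches again.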
Ran more tests. It works.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Issue-in-parallel-Indexing-using-multiple-csv-files-tp4092452p4092873.html
Sent from the Solr - User mailing list archive at Nabble.com.
ucene.472066.n3.nabble.com/Issue-in-parallel-Indexing-using-multiple-csv-files-tp4092452.html
Sent from the Solr - User mailing list archive at Nabble.com.
Can you tell me more about "You can index from a MapReduce job"? I use
Nutch and it tells Solr to index and reindex. I know that I can use
MapReduce jobs on the Nutch side; however, can I use MapReduce jobs on the
Solr side (i.e. for indexing etc.)?
2013/3/29 Otis Gospodnetic
> Yes. You can index from
Yes. You can index from any app that can hit Solr with multiple
threads. You can use StreamingUpdateSolrServer, at least in older
Solr versions, to handle multi-threading for you. You can index from a
MapReduce job
Otis
--
Solr & ElasticSearch Support
http://sematext.com/
On Fri, Mar 29, 2013
On 29 March 2013 14:56, Furkan KAMACI wrote:
> Does Solr allow parallelism (parallel computing) for indexing?
What do you mean by parallel computing in this context?
Solr can use multiple threads for indexing if that is what
you are asking.
Regards,
Gora
Does Solr allow parallelism (parallel computing) for indexing?
There is an open issue somewhere for this type of support. We don't have a
simple way to do it currently.
We will also be looking at adding index aliases, which is probably another
feature you could use to solve this.
Currently, you would need some kind of load balancer to achieve this nicely I
Hi All,
I am using Solr 4.1.
I have a Solr cluster of 3 leaders and 3 replicas hosting collection1
consisting of thousands of documents currently serving the search requests.
I would like to re-index all the documents in another collection, say
collection2 in this same Solr cluster and swap it with
You could try to isolate the bottleneck by testing the indexing speed
from the local machine hosting Solr. Also tools like iostat or sar
might give you more details about the disk side.
Yes, I am doing various things to isolate the bottleneck. I'm also profiling
the JVM. And I am using iostat, top a
On Mon, Feb 6, 2012 at 5:55 PM, Per Steffensen wrote:
> Sami Siren skrev:
>
>> On Mon, Feb 6, 2012 at 2:53 PM, Per Steffensen
>> wrote:
>>
>>
>>
>>>
>>> Actually right now, I am trying to find out what my bottleneck is. The
>>> setup
>>> is more complex than I would bother you with, but basicall
. I've had recurring discussions with "executive level folks" that no
matter how many VMs you host on a machine, and no matter how big that
machine is, there really, truly, *is* some hardware underlying it all that
really, truly, *does* have some limits.
And adding more VMs doesn't somehow get aro
Sami Siren skrev:
On Mon, Feb 6, 2012 at 2:53 PM, Per Steffensen wrote:
Actually right now, I am trying to find out what my bottleneck is. The setup
is more complex than I would bother you with, but basically I have servers
with 80-90% IO-wait and only 5-10% "real CPU usage". It might not
So SolrJ with CommonsHttpSolrServer will not support handling several
requests concurrently?
Nope. Use StreamingUpdateSolrServer, it should be just a drop-in with
a different constructor.
I will try to do that. It is a little bit difficult for me, as we are
actually not dealing with
On Mon, Feb 6, 2012 at 2:53 PM, Per Steffensen wrote:
> Actually right now, I am trying to find out what my bottleneck is. The setup
> is more complex than I would bother you with, but basically I have servers
> with 80-90% IO-wait and only 5-10% "real CPU usage". It might not be a
> Solr-relat
Right. See below.
On Mon, Feb 6, 2012 at 7:53 AM, Per Steffensen wrote:
> See response below
>
> Erick Erickson skrev:
>
>> Unfortunately, the answer is "it depends(tm)".
>>
>> First question: How are you indexing things? SolrJ? post.jar?
>>
>
> SolrJ, CommonsHttpSolrServer
>
>> But some observat
See response below
Erick Erickson skrev:
Unfortunately, the answer is "it depends(tm)".
First question: How are you indexing things? SolrJ? post.jar?
SolrJ, CommonsHttpSolrServer
But some observations:
1> sure, using multiple cores will have some parallelism. So will
using a single co
Unfortunately, the answer is "it depends(tm)".
First question: How are you indexing things? SolrJ? post.jar?
But some observations:
1> sure, using multiple cores will have some parallelism. So will
using a single core but using something like SolrJ and
StreamingUpdateSolrServer. Especial
Hi
This topic has probably been covered before, but I haven't had the luck to
find the answer.
We are running Solr instances with several cores inside. Solr is running
out-of-the-box on top of Jetty. I believe Jetty is receiving all the
HTTP requests about indexing new documents, and forwards it
Thanks Noble and Shalin.
Cheers
Avlesh
On Fri, Jul 31, 2009 at 1:23 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:
> On Fri, Jul 31, 2009 at 11:53 AM, Avlesh Singh wrote:
>
> > Thanks for the revert Noble. A few questions are still open:
> >
> > 1. Can I pass parameters to DIH and
On Fri, Jul 31, 2009 at 11:53 AM, Avlesh Singh wrote:
> Thanks for the revert Noble. A few questions are still open:
>
> 1. Can I pass parameters to DIH and be able to use them inside the
> "query" attribute of an entity inside the data-config file?
>
Yes. Use ${dataimporter.request.X} or ${
Thanks for the revert Noble. A few questions are still open:
1. Can I pass parameters to DIH and be able to use them inside the
"query" attribute of an entity inside the data-config file?
2. Can I use the same data-import-handler in some way so that indexing can
be carried out in parall
On Fri, Jul 31, 2009 at 11:11 AM, Avlesh Singh wrote:
> I am using Solr 1.3 and have a few questions regarding DIH:
>
> 1. Can I pass parameters to DIH and be able to use them inside the
> "query" attribute of an entity inside the data-config file?
> 2. I am indexing some 2 million database r
I am using Solr 1.3 and have a few questions regarding DIH:
1. Can I pass parameters to DIH and be able to use them inside the
"query" attribute of an entity inside the data-config file?
2. I am indexing some 2 million database records using DIH with 4-5
nested entities (just one level
42 matches