Is it possible for the Data Import Handler to bring in a maximum number of
records depending on available resources? If so, how should it be
configured?
Thanks,
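As far as I know, DIH has no built-in way to size its fetches to the resources
available; the closest knob is the batchSize attribute on the JdbcDataSource,
which controls the JDBC fetch size. A minimal data-config.xml sketch, where the
driver class, connection URL, and query are placeholders for whatever the real
setup uses:

    <dataConfig>
      <dataSource type="JdbcDataSource"
                  driver="org.postgresql.Driver"
                  url="jdbc:postgresql://dbhost/db"
                  user="solr" password="***"
                  batchSize="10000"/>   <!-- JDBC fetch size; -1 enables streaming on MySQL -->
      <document>
        <entity name="item" query="SELECT id, description FROM item">
          <field column="id" name="id"/>
          <field column="description" name="description"/>
        </entity>
      </document>
    </dataConfig>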
We are running the Data Import Handler to retrieve about 10 million records
during work hours every day of the week, with clean=true, commit=true, and
optimize=true. The entire process takes about 1 hour.
What would be a good setting for autoCommit and autoSoftCommit?
Thanks
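A common pattern for bulk loads like this is a hard autoCommit on a time
interval with openSearcher=false (so the transaction log gets flushed without
opening new searchers), plus a longer autoSoftCommit for visibility. A
solrconfig.xml sketch; the intervals are illustrative only, not a tuned
recommendation for this setup:

    <autoCommit>
      <maxTime>60000</maxTime>          <!-- hard commit every 60s, flushes the tlog -->
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>300000</maxTime>         <!-- soft commit every 5 minutes, makes docs visible -->
    </autoSoftCommit>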
>> Not much help I know,
>> Erick
>>
>> On Tue, Feb 2, 2016 at 10:11 AM, Troy Edwards
>> wrote:
>> > Rerunning the Data Import Handler again on the Linux machine has
>> > started producing some errors and w
> you have
> old jars hanging around that are mis-matched. Or someone manually
> deleted files from the Solr install. Or your disk filled up. Or
>
> How sure are you that the Linux setup was done properly?
>
> Not much help I know,
> Erick
>
> On Tue, Feb 2, 2016 at 10:11 AM, Troy Edwards wrote:
> You can also forgo DIH and do a simple import program via SolrJ. The
> advantage here is that the comparison I'm talking about above is
> really simple, just comment out the call that sends data to Solr. Here's an
> example...
>
> https://lucidworks.com/blog/2012/02/14/indexing
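For reference, a minimal SolrJ (5.x) import loop along those lines might look
like the sketch below; the ZooKeeper hosts, collection name, and field names
are placeholders, and commenting out the add() call is the comparison described
above:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class SimpleImport {
      public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble and collection name
        try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181")) {
          client.setDefaultCollection("items");
          List<SolrInputDocument> batch = new ArrayList<>();
          for (int i = 0; i < 10000; i++) {        // stand-in for the JDBC result loop
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("description", "row " + i);
            batch.add(doc);
            if (batch.size() == 1000) {            // send in batches of 1,000
              client.add(batch);                   // comment this out to time just the DB side
              batch.clear();
            }
          }
          if (!batch.isEmpty()) client.add(batch);
          client.commit();
        }
      }
    }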
> > a Linux dev machine? Perhaps your prod
> > machine is loaded much more than a dev.
> >
> > Regards,
> > Alex.
> >
> > Newsletter and resources for Solr beginners and intermediates:
> > http://www.solr-start.com/
> >
> >
We have a Windows development machine on which the Data Import Handler
consistently takes about 40 minutes to finish. Queries run fine. JVM memory is
2 GB per node.
But on a Linux machine it consistently takes about 2.5 hours, and the queries
also run slower. JVM memory there is also 2 GB per node.
How s
> happen a
> lot restarting nodes (this is annoying with replicas with 100G), don't
> underestimate this point. Free space can save your life.
>
> --
>
> /Yago Riveiro
>
> > On Jan 19 2016, at 11:26 pm, Shawn Heisey <apa...@elyograg.org> wrote:
>
> >
We are currently "beta testing" a SolrCloud with 2 nodes and 2 shards with
2 replicas each. The number of documents is about 125000.
We now want to scale this to about 10 billion documents.
What are the steps to prototyping, hardware estimation and stress testing?
Thanks
In Solr 5.1.0 we had to flatten two collections into one:
Item - about 1.5 million items, primary key ItemId (this mainly contains the
item description)
FacilityItem - about 10,000 facilities, primary key FacilityItemId (pricing
information for each facility); ItemId points to Item
We a
After running Solr on a Linux box for about 15 days, today when I tried to
reload the collections I got the following error:
reload the collection time out:180s
org.apache.solr.common.SolrException: reload the collection time out:180s
    at org.apache.solr.handler.admin.CollectionsHandler.handleResponse
We are considering using a multivalued field that can contain up to 1000
unique values. This field will not be searched on, only used for faceting and
filtering.
What is the maximum number of values that a multivalued field can contain?
Is there a more efficient way of doing this?
Thanks
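I'm not aware of a fixed cap on the number of values a multiValued field can
hold, and 1000 values per document is well within what is commonly used. For a
field that is only faceted and filtered on, the usual approach is a
docValues-enabled string field. A schema.xml sketch, with the field name and
the assumption that a "string" field type exists in the schema:

    <field name="categoryCodes" type="string" indexed="true" stored="false"
           multiValued="true" docValues="true"/>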
I am looking for some good articles/guidance on how to determine the number of
shards and replicas for an index.
Thanks
I am testing to see if we can move to Solr 5.3.1.
When I try to create a collection (using the Collections API) I get an error:
ClassNotFoundException: org.apache.solr.handler.dataimport.DataImportHandler
I have all the files in exactly the same locations.
In my solrconfig.xml:
This is where I h
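That ClassNotFoundException usually means the DIH jar is not on the core's
classpath; it is not loaded by default and has to be pulled in with a <lib>
directive in solrconfig.xml. A sketch based on the layout of a standard 5.3.1
install (the dir path may need adjusting for a custom install):

    <lib dir="${solr.install.dir:../../../..}/dist/"
         regex="solr-dataimporthandler-.*\.jar" />

    <requestHandler name="/dataimport"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config.xml</str>
      </lst>
    </requestHandler>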
s {!terms} QParser (which makes leg-shooting easier).
> 4. How many documents do you operate with? What is the update frequency? Is
> there a chance to keep both types in a single index?
>
> On Thu, Oct 1, 2015 at 5:58 AM, Troy Edwards
> wrote:
>
> > I am working with the foll
I am working with the following indices:
*Item*
ItemId - string
Description - text (query on this)
Categories - Multivalued text (query on this)
Sellers - Multivalued text (query on this)
SellersString - Multivalued string (Need to facet and filter on this)
*ContractItem*
ContractItemId - string
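For what it's worth, the Item fields above would map to schema.xml entries
roughly like the following; the field type names (string, text_general) are
assumptions about the schema in use:

    <field name="ItemId"        type="string"       indexed="true" stored="true"/>
    <field name="Description"   type="text_general" indexed="true" stored="true"/>
    <field name="Categories"    type="text_general" indexed="true" stored="true" multiValued="true"/>
    <field name="Sellers"       type="text_general" indexed="true" stored="true" multiValued="true"/>
    <field name="SellersString" type="string"       indexed="true" stored="false"
           multiValued="true" docValues="true"/>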
11:26 AM, Troy Edwards wrote:
> > I am having a hard time finding documentation on DataImportHandler
> > scheduling in SolrCloud. Can someone please post a link to that? I
> > have a requirement that the DIH should be initiated at a specific time
> > Monday through Friday.
>
I am having a hard time finding documentation on DataImportHandler
scheduling in SolrCloud. Can someone please post a link to that? I have a
requirement that the DIH should be initiated at a specific time Monday
through Friday.
Thanks!
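Solr itself ships no DIH scheduler; the usual approach is an external scheduler
(cron, Windows Task Scheduler, or a job runner) that hits the DIH handler URL at
the desired times. For example, a crontab entry like the following would start a
full import at 2:00 AM Monday through Friday (host, port, collection name, and
parameters are placeholders):

    0 2 * * 1-5  curl "http://solrhost:8983/solr/collection1/dataimport?command=full-import&clean=true"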
Thank you for taking the time to do the test.
I have been doing similar tests using the Post Tool (SimplePostTool) with the
real data and was able to get to about 10K documents/second.
I am considering using multiple files (one per client) FTP'd into a Solr node
and then using a scheduled job to us
Are you suggesting that requests come into a service layer that identifies
which client is on which SolrCloud and passes the request to that cloud?
Thank you
On Wed, Aug 19, 2015 at 1:13 PM, Toke Eskildsen
wrote:
> Troy Edwards wrote:
> > My average document size is 400 bytes
>
I have a requirement where I have to bulk insert a lot of documents into
SolrCloud.
My average document size is 400 bytes.
Number of documents that need to be inserted: 25/second (for a total of
about 3.6 billion documents).
Any ideas/suggestions on how that can be done? (use a client or uploadcsv
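One common pattern for sustained bulk indexing from a client is
ConcurrentUpdateSolrClient (or CloudSolrClient with batched adds), which buffers
documents into a queue and sends them on several background threads. A SolrJ 5.x
sketch; the URL, queue size, thread count, and field names are placeholders to
tune for the real setup:

    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkLoad {
      public static void main(String[] args) throws Exception {
        // Single-node URL as a placeholder; CloudSolrClient routes documents to the
        // correct shard leader and is often a better fit for SolrCloud.
        try (ConcurrentUpdateSolrClient client =
                 new ConcurrentUpdateSolrClient("http://solrhost:8983/solr/items", 10000, 4)) {
          for (long i = 0; i < 1_000_000; i++) {    // stand-in for the real document source
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Long.toString(i));
            doc.addField("body", "about 400 bytes of content ...");
            client.add(doc);                        // buffered and sent in the background
          }
          client.blockUntilFinished();
          client.commit();
        }
      }
    }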
I am using SolrCloud.
My initial requirements are:
1) There are about 6000 clients
2) The number of documents from each client is about 50 (average
document size is about 400 bytes)
3) I have to wipe the index/collection every night and create a new one
Any thoughts/ideas/suggestions on:
1) Ho