Tune Data Import Handler to retrieve maximum records

2016-02-10 Thread Troy Edwards
Is it possible for the Data Import Handler to bring in the maximum number of records, depending on available resources? If so, how should it be configured? Thanks,

Data Import Handler - autoSoftCommit and autoCommit

2016-02-08 Thread Troy Edwards
We are running the Data Import Handler to retrieve about 10 million records during work hours, every day of the week. We are using clean=true, commit=true, and optimize=true. The entire process takes about 1 hour. What would be good settings for autoCommit and autoSoftCommit? Thanks
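For a bulk load like this, a common starting point is a hard autoCommit that flushes the transaction log without opening a searcher, plus a longer autoSoftCommit for visibility. The maxTime values below are illustrative assumptions, not a tuned recommendation:

```xml
<!-- solrconfig.xml sketch: values are illustrative, tune against your load -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>           <!-- hard commit every 60s: keeps the tlog bounded -->
    <openSearcher>false</openSearcher> <!-- don't open a new searcher on each hard commit -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>300000</maxTime>          <!-- soft commit every 5 min: new data becomes visible -->
  </autoSoftCommit>
</updateHandler>
```

With commit=true on the DIH request, a final commit happens at the end of the import anyway; during the hour-long run, autoCommit mainly prevents the transaction log from growing without bound.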

Re: Data Import Handler takes different time on different machines

2016-02-03 Thread Troy Edwards
…Not much help I know, Erick. On Tue, Feb 2, 2016 at 10:11 AM, Troy Edwards wrote: Rerunning the Data Import Handler again on the Linux machine has started producing some errors and w…

Re: Data Import Handler takes different time on different machines

2016-02-02 Thread Troy Edwards
…you have old jars hanging around that are mis-matched. Or someone manually deleted files from the Solr install. Or your disk filled up. Or… How sure are you that the Linux setup was done properly? Not much help I know, Erick. On Tue, Feb 2, 2016 at 10:11 A…

Re: Data Import Handler takes different time on different machines

2016-02-02 Thread Troy Edwards
You can also forgo DIH and do a simple import program via SolrJ. The advantage here is that the comparison I'm talking about above is really simple: just comment out the call that sends data to Solr. Here's an example: https://lucidworks.com/blog/2012/02/14/indexing

Re: Data Import Handler takes different time on different machines

2016-02-01 Thread Troy Edwards
…a Linux dev machine? Perhaps your prod machine is loaded much more than a dev. Regards, Alex. Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 2…

Data Import Handler takes different time on different machines

2016-02-01 Thread Troy Edwards
We have a Windows development machine on which the Data Import Handler consistently takes about 40 minutes to finish. Queries run fine. JVM memory is 2 GB per node. But on a Linux machine it consistently takes about 2.5 hours, and the queries also run slower. JVM memory there is also 2 GB per node. How s…

Re: Scaling SolrCloud

2016-01-20 Thread Troy Edwards
…happen a lot restarting nodes (this is annoying with replicas with 100G); don't underestimate this point. Free space can save your life. -- /Yago Riveiro On Jan 19 2016, at 11:26 pm, Shawn Heisey <apa...@elyograg.org> wrote: …

Scaling SolrCloud

2016-01-19 Thread Troy Edwards
We are currently "beta testing" a SolrCloud with 2 nodes and 2 shards with 2 replicas each. The number of documents is about 125,000. We now want to scale this to about 10 billion documents. What are the steps for prototyping, hardware estimation, and stress testing? Thanks
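A back-of-the-envelope sizing sketch for a jump like this. All numbers here are assumptions to be replaced by measurements from indexing a representative sample; the docs-per-shard target is a hypothetical rule of thumb, not a Solr limit:

```python
def estimate_cluster(total_docs, docs_per_shard, bytes_per_doc, replication_factor):
    """Rough sizing: shard count and raw index footprint across the cluster.

    docs_per_shard and bytes_per_doc must come from indexing a real
    sample and measuring the resulting index, not from guesswork.
    """
    shards = -(-total_docs // docs_per_shard)  # ceiling division
    raw_bytes = total_docs * bytes_per_doc * replication_factor
    return shards, raw_bytes

# Hypothetical inputs: 10 billion docs, 120M docs per shard,
# 200 bytes of index per doc, every shard replicated twice.
shards, raw = estimate_cluster(10_000_000_000, 120_000_000, 200, 2)
print(shards)          # shards to create up front (oversharding helps later growth)
print(raw / 1024**4)   # total index size across the cluster, in TiB
```

The usual prototyping loop is: index a sample on one node, measure index size and query latency, find the per-shard document count where latency degrades, then extrapolate and stress test at the projected per-node load.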

Solr 6 - Relational Index querying

2015-12-23 Thread Troy Edwards
In Solr 5.1.0 we had to flatten out two collections into one:
Item - about 1.5 million items; primary key ItemId (this mainly contains the item description)
FacilityItem - about 10,000 facilities; primary key FacilityItemId (pricing information for each facility); ItemId points to Item
We a…

Reloading the collection timed out

2015-12-04 Thread Troy Edwards
After running Solr on a Linux box for about 15 days, today when I tried to reload collections I got the following error:
reload the collection time out:180s
org.apache.solr.common.SolrException: reload the collection time out:180s
at org.apache.solr.handler.admin.CollectionsHandler.handleResponse…

Maximum in Multivalued field

2015-12-01 Thread Troy Edwards
We are considering using a multivalued field that can contain up to 1000 unique values. This field will not be searched on, only used for faceting and filtering. What is the maximum number of values a multivalued field can contain? Is there another, more efficient way of doing this? Thanks

Shards and Replicas

2015-11-18 Thread Troy Edwards
I am looking for some good articles/guidance on how to determine the number of shards and replicas for an index. Thanks

From Solr 5.1.0 to 5.3.1

2015-11-11 Thread Troy Edwards
I am testing to see if we can go to Solr 5.3.1. When I try to create a collection (using the Collections API) I get an error: ClassNotFoundException: org.apache.solr.handler.dataimport.DataImportHandler. I have all the files in exactly the same locations. In my solrconfig.xml: This is where I h…
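In Solr 5.x the DataImportHandler jar is not on the classpath by default; a typical fix (the path is illustrative and depends on the install layout) is a `<lib>` directive in solrconfig.xml:

```xml
<!-- solrconfig.xml: load the DIH jars shipped in the dist/ directory -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
```

Note that in SolrCloud the solrconfig.xml is read from ZooKeeper, so the relative path is resolved on each node's local install; every node needs the jar present at that location.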

Re: Join with faceting and filtering

2015-10-01 Thread Troy Edwards
…{!terms} QParser (which makes leg-shooting easier). 4. What number of documents do you operate on? What is the update frequency? Is there a chance to keep both types in a single index? On Thu, Oct 1, 2015 at 5:58 AM, Troy Edwards wrote: I am working with the foll…

Join with faceting and filtering

2015-09-30 Thread Troy Edwards
I am working with the following indices:
*Item*
ItemId - string
Description - text (query on this)
Categories - multivalued text (query on this)
Sellers - multivalued text (query on this)
SellersString - multivalued string (need to facet and filter on this)
*ContractItem*
ContractItemId - string…
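For reference, a sketch of the kind of cross-collection query under discussion, using Solr's join query parser. The field and collection names come from the post above, but the inner query field (ContractId) is hypothetical; note also that in SolrCloud the fromIndex collection must be unsharded and co-located with every shard of the to collection, and that join does not carry fields across, which matters for the faceting requirement:

```text
# Hypothetical: select Item docs whose ItemId appears in matching ContractItem docs
q={!join fromIndex=ContractItem from=ItemId to=ItemId}ContractId:C123
fq=SellersString:"Acme"
facet=true&facet.field=SellersString
```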

Re: DataImportHandler scheduling

2015-09-01 Thread Troy Edwards
…11:26 AM, Troy Edwards wrote: I am having a hard time finding documentation on DataImportHandler scheduling in SolrCloud. Can someone please post a link to that? I have a requirement that the DIH should be initiated at a specific time Monday through Friday. …

DataImportHandler scheduling

2015-08-31 Thread Troy Edwards
I am having a hard time finding documentation on DataImportHandler scheduling in SolrCloud. Can someone please post a link to that? I have a requirement that the DIH should be initiated at a specific time Monday through Friday. Thanks!
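There is no built-in DIH scheduler in Solr; the usual answer is an external cron job that hits the DIH handler over HTTP. A sketch of a crontab entry (host, port, collection name, and run time are placeholders):

```shell
# crontab: full import at 01:30, Monday through Friday (hypothetical host/collection)
30 1 * * 1-5 curl -s "http://localhost:8983/solr/mycollection/dataimport?command=full-import&clean=true&commit=true"
```

In SolrCloud the request can be sent to any node hosting the collection; check &command=status afterwards (or a log) rather than assuming the import succeeded.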

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Troy Edwards
Thank you for taking the time to do the test. I have been doing similar tests using the post tool (SimplePostTool) with the real data and was able to get to about 10K documents/second. I am considering using multiple files (one per client) FTP'd into a Solr node and then using a scheduled job to us…

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Troy Edwards
Are you suggesting that requests come into a service layer that identifies which client is on which SolrCloud and passes the request to that cloud? Thank you. On Wed, Aug 19, 2015 at 1:13 PM, Toke Eskildsen wrote: Troy Edwards wrote: My average document size is 400 bytes …

How to Fast Bulk Inserting documents

2015-08-19 Thread Troy Edwards
I have a requirement where I have to bulk insert a lot of documents into SolrCloud. My average document size is 400 bytes. The number of documents that need to be inserted is 25/second (for a total of about 3.6 billion documents). Any ideas/suggestions on how that can be done? (use a client or uploadcsv…
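Whatever the transport (SolrJ, uploadcsv, or plain HTTP to /update), batching the documents matters more than the client choice. A minimal sketch of the batching side, with the actual send left as a stub so it runs offline; the URL, batch size, and field names are assumptions:

```python
import itertools
import json

def batches(docs, size):
    """Yield lists of at most `size` docs from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk

def send_batch(solr_url, docs):
    """Stub: in real code, POST json.dumps(docs) to {solr_url}/update
    with an HTTP client, using commitWithin instead of per-batch commits."""
    payload = json.dumps(docs)
    return len(payload)  # placeholder so the sketch is runnable offline

# Hypothetical stream of small (~400 byte) documents
docs = ({"id": str(i), "payload": "x" * 380} for i in range(2500))
sizes = [send_batch("http://localhost:8983/solr/items", b)
         for b in batches(docs, 1000)]
print(len(sizes))  # 3 batches: 1000 + 1000 + 500
```

Running several such senders in parallel, each routed through the cluster (CloudSolrClient in SolrJ does the shard routing for you), is usually how the higher aggregate rates are reached.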

Index very large number of documents from large number of clients

2015-08-15 Thread Troy Edwards
I am using SolrCloud. My initial requirements are:
1) There are about 6000 clients
2) The number of documents from each client is about 50 (average document size is about 400 bytes)
3) I have to wipe off the index/collection every night and create it new
Any thoughts/ideas/suggestions on:
1) Ho…
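For a nightly wipe-and-rebuild, one common pattern is to index into a fresh collection and then flip an alias, so queries never see an empty or half-built index. A sketch that just builds the Collections API URLs for the create-then-swap steps; host, collection names, configset, and shard/replica counts are all assumptions:

```python
def rebuild_urls(host, alias, new_collection, config_set,
                 num_shards=2, replication_factor=2):
    """Return the Collections API URLs for the create-then-swap-alias pattern.

    Flow: CREATE a fresh collection, index into it, then CREATEALIAS to
    repoint the query alias at it; the previous night's collection can
    then be deleted at leisure.
    """
    base = f"http://{host}/solr/admin/collections"
    create = (f"{base}?action=CREATE&name={new_collection}"
              f"&collection.configName={config_set}"
              f"&numShards={num_shards}&replicationFactor={replication_factor}")
    swap = f"{base}?action=CREATEALIAS&name={alias}&collections={new_collection}"
    return create, swap

# Hypothetical names: queries always hit the "items" alias
create, swap = rebuild_urls("localhost:8983", "items", "items_20150816", "items_conf")
print(create)
print(swap)
```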