In Solr 5.1.0 we had to flatten out two collections into one
Item - about 1.5 million items, primary key ItemId (this mainly
contains the item description)
FacilityItem - about 10,000 facilities, primary key FacilityItemId
(pricing information for each facility); ItemId points to Item
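For illustration, a flattened document would carry both keys plus the
denormalized fields; a sketch in Solr's XML update format (field names
beyond the two keys are assumptions, not from the thread):

    <add>
      <doc>
        <field name="FacilityItemId">F0100-I000042</field>
        <field name="ItemId">I000042</field>
        <field name="Description">10mm hex bolt, zinc plated</field>
        <field name="Price">12.50</field>  <!-- assumed pricing field -->
      </doc>
    </add>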
We are currently "beta testing" a SolrCloud with 2 nodes and 2 shards with
2 replicas each. The number of documents is about 125000.
We now want to scale this to about 10 billion documents.
What are the steps to prototyping, hardware estimation and stress testing?
Thanks
> [...] happen a lot restarting nodes (this is annoying with replicas
> with 100G); don't underestimate this point. Free space can save your
> life.
>
> --
> /Yago Riveiro
>
> > On Jan 19 2016, at 11:26 pm, Shawn Heisey <apa...@elyograg.org>
> > wrote:
We have a Windows development machine on which the Data Import Handler
consistently takes about 40 mins to finish. Queries run fine. JVM memory is
2 GB per node.
But on a Linux machine it consistently takes about 2.5 hours. The queries
also run slower. JVM memory here is also 2 GB per node.
How should we troubleshoot this?
> > [...] a Linux dev machine? Perhaps your prod
> > machine is loaded much more than a dev.
> >
> > Regards,
> > Alex.
> >
> > Newsletter and resources for Solr beginners and intermediates:
> > http://www.solr-start.com/
> >
> >
> You can also forgo DIH and do a simple import program via SolrJ. The
> advantage here is that the comparison I'm talking about above is
> really simple: just comment out the call that sends data to Solr. Here's an
> example...
>
> https://lucidworks.com/blog/2012/02/14/indexing
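A minimal sketch of such a program (SolrJ 5.x API; the JDBC URL, table, and
field names are invented for illustration). Commenting out the solr.add(...)
call lets you time the database side by itself, which is the comparison
described above:

    import java.sql.*;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class SimpleImport {
      public static void main(String[] args) throws Exception {
        // Assumed endpoint and collection name -- adjust to your install.
        HttpSolrClient solr =
            new HttpSolrClient("http://localhost:8983/solr/items");
        Connection db = DriverManager.getConnection(
            "jdbc:mysql://dbhost/mydb", "user", "pass");
        Statement stmt = db.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT ItemId, Description FROM Item");
        List<SolrInputDocument> batch = new ArrayList<>();
        while (rs.next()) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("ItemId", rs.getString("ItemId"));
          doc.addField("Description", rs.getString("Description"));
          batch.add(doc);
          if (batch.size() == 1000) { // send in batches of 1000
            solr.add(batch);          // comment out to time the DB side alone
            batch.clear();
          }
        }
        if (!batch.isEmpty()) solr.add(batch);
        solr.commit();
        solr.close();
        db.close();
      }
    }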
> [...] you have
> old jars hanging around that are mis-matched. Or someone manually
> deleted files from the Solr install. Or your disk filled up. Or...
>
> How sure are you that the linux setup was done properly?
>
> Not much help I know,
> Erick
>
> On Tue, Feb 2, 2016 at 10:11 AM, Troy Edwards
> wrote:
> > Rerunning the Data Import Handler again on the Linux machine has
> > started producing some errors and w...
We are running the data import handler to retrieve about 10 million records
during work hours every day of the week. We are using Clean = true, Commit
= true and Optimize = true. The entire process takes about 1 hour.
What would be a good setting for autoCommit and autoSoftCommit?
Thanks
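Since clean=true rebuilds the index and DIH's commit=true issues a hard
commit at the end of the run anyway, a common starting point is a periodic
hard commit with openSearcher=false plus a longer soft commit for visibility.
A solrconfig.xml sketch (the intervals are illustrative, not tuned to this
workload):

    <autoCommit>
      <maxTime>60000</maxTime>            <!-- hard commit every 60s; flushes to disk -->
      <openSearcher>false</openSearcher>  <!-- don't open a new searcher on hard commit -->
    </autoCommit>
    <autoSoftCommit>
      <maxTime>300000</maxTime>           <!-- soft commit every 5 min; controls visibility -->
    </autoSoftCommit>

Note that with clean=true, any soft commit during the run exposes a partially
rebuilt index, which may argue for an even longer soft-commit interval or for
relying on the final DIH commit alone.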
Is it possible for the Data Import Handler to bring in the maximum number of
records that available resources allow? If so, how should it be
configured?
Thanks,
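As far as I know, DIH has no resource-aware throttle; the closest knob is
batchSize on JdbcDataSource, which sets the JDBC fetch size. A data-config.xml
sketch (driver and URL are assumptions):

    <dataConfig>
      <dataSource type="JdbcDataSource"
                  driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://dbhost/mydb"
                  batchSize="10000"/> <!-- rows per round trip; -1 streams on MySQL -->
      <!-- document/entity definitions go here -->
    </dataConfig>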
I am using SolrCloud
My initial requirements are:
1) There are about 6000 clients
2) The number of documents from each client is about 50 (average
document size is about 400 bytes)
3) I have to wipe the index/collection every night and create it anew
Any thoughts/ideas/suggestions on:
1) Ho...
I have a requirement where I have to bulk insert a lot of documents in
SolrCloud.
My average document size is 400 bytes
Number of documents that need to be inserted: 25/second (for a total of
about 3.6 billion documents)
Any ideas/suggestions on how that can be done? (use a client or upload CSV?)
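At 400 bytes per document, 25/second is a very light load for a single
client. One hedged SolrJ sketch using ConcurrentUpdateSolrClient, which
buffers documents and sends them on background threads (URL, field names,
and the loop are stand-ins for your real document source):

    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkLoad {
      public static void main(String[] args) throws Exception {
        // Assumed URL/collection; queue of 50k docs, 4 background threads.
        ConcurrentUpdateSolrClient solr = new ConcurrentUpdateSolrClient(
            "http://localhost:8983/solr/clients", 50000, 4);
        for (long i = 0; i < 1_000_000; i++) { // stand-in for your real source
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", Long.toString(i)); // assumed uniqueKey field
          doc.addField("clientid_s", "client" + (i % 6000));
          solr.add(doc); // buffered and sent in the background
        }
        solr.blockUntilFinished();
        solr.commit();
        solr.close();
      }
    }

In SolrCloud, CloudSolrClient routes updates to shard leaders directly, which
usually scales better than pointing ConcurrentUpdateSolrClient at one node.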
Are you suggesting that requests come into a service layer that identifies
which client is on which SolrCloud and passes the request to that cloud?
Thank you
On Wed, Aug 19, 2015 at 1:13 PM, Toke Eskildsen
wrote:
> Troy Edwards wrote:
> > My average document size is 400 bytes
>
Thank you for taking the time to do the test.
I have been doing similar tests using the post tool (SimplePostTool) with
the real data and was able to get to about 10K documents/second.
I am considering using multiple files (one per client) ftp'd into a Solr
node and then use a scheduled job to us...
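If the scheduled job ends up shelling out to the post tool, the per-file
invocation is a one-liner (collection name and path assumed):

    bin/post -c clientitems /data/incoming/client_0001.csv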
I am having a hard time finding documentation on DataImportHandler
scheduling in SolrCloud. Can someone please post a link to that? I have a
requirement that the DIH should be initiated at a specific time Monday
through Friday.
Thanks!
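As far as I know there is no built-in DIH scheduler in Solr or SolrCloud;
the usual approach is an external scheduler such as cron hitting the DIH
handler URL. A sketch (host and collection name assumed):

    # crontab: start a full import at 01:30, Monday through Friday
    30 1 * * 1-5  curl -s "http://localhost:8983/solr/mycollection/dataimport?command=full-import&clean=true"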
11:26 AM, Troy Edwards wrote:
> > I am having a hard time finding documentation on DataImportHandler
> > scheduling in SolrCloud. Can someone please post a link to that? I
> > have a requirement that the DIH should be initiated at a specific time
> > Monday through Friday.
>
I am working with the following indices
*Item*
ItemId - string
Description - text (query on this)
Categories - Multivalued text (query on this)
Sellers - Multivalued text (query on this)
SellersString - Multivalued string (Need to facet and filter on this)
*ContractItem*
ContractItemId - string
> [...] the {!terms} QParser (which makes leg-shooting easier).
> 4. What number of documents do you operate with? What is the update
> frequency? Is there a chance to keep both types in a single index?
>
> On Thu, Oct 1, 2015 at 5:58 AM, Troy Edwards
> wrote:
>
> > I am working with the foll...
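The two-pass pattern behind the {!terms} suggestion: query ContractItem
first, collect the matching ItemIds client-side, then filter Item with them.
A sketch with invented IDs:

    /select?q=Description:bolt&fq={!terms f=ItemId}I000042,I000087,I000093

The terms list is built by your application from the first query's results,
which sidesteps the cross-collection join restrictions in SolrCloud.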
I am testing to see if we can go to Solr 5.3.1.
When I try to create a collection (using the Collections API) I am getting
an error: "ClassNotFoundException"
org.apache.solr.handler.dataimport.DataImportHandler
I have all the files in exactly the same locations
In my solrconfig.xml:
This is where I h...
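In 5.3.1 this often means the <lib> path in solrconfig.xml no longer resolves
relative to the new install. The stock directive in the shipped configs looks
like this (adjust the relative path to your layout):

    <lib dir="${solr.install.dir:../../../..}/dist/"
         regex="solr-dataimporthandler-.*\.jar" />

With SolrCloud the jar also has to exist at that path on every node, since
the config in ZooKeeper is shared but the jars are not.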
I am looking for some good articles/guidance on how to determine the number
of shards and replicas for an index.
Thanks
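Whatever numbers the sizing exercise produces, trying a layout is a single
Collections API call, so it is cheap to prototype several (names below are
placeholders):

    http://localhost:8983/solr/admin/collections?action=CREATE&name=test2x2
        &numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=myconf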
We are considering using a multivalued field that can contain up to 1000
unique values. This field will not be searched on, just used for faceting
and filtering.
What is the maximum number of values that a multivalued field can contain?
Is there another more efficient way of doing this?
Thanks
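As far as I know there is no fixed cap on the number of values in a
multivalued field (each value is just another indexed term). For
facet/filter-only use, an unanalyzed docValues field is the usual shape;
a schema sketch reusing the SellersString field from earlier in the thread:

    <field name="SellersString" type="string" indexed="true" stored="false"
           multiValued="true" docValues="true"/>

docValues should keep faceting on a 1000-value field off the Java heap,
which tends to matter more than the raw value count.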
After running Solr on a Linux box for about 15 days, today when I tried to
reload collections I got the following error:
reload the collection time out:180s
org.apache.solr.common.SolrException: reload the collection time out:180s
        at org.apache.solr.handler.admin.CollectionsHandler.handleResponse