Does https://issues.apache.org/jira/browse/SOLR-2112 help?
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm

On Fri, Jul 5, 2013 at 5:57 PM, Valery Giner <valgi...@research.att.com> wrote:
> As the simplest example, just write a query result into a file for processing
> by external programs (the programs are out of our control, and the result
> could contain millions of docs).
>
> Thanks,
> Val
>
> On 07/05/2013 04:41 PM, Walter Underwood wrote:
>>
>> What are you doing that makes start=500000 normal? --wunder
>>
>> On Jul 5, 2013, at 1:28 PM, Valery Giner wrote:
>>
>>> Eric,
>>>
>>> We did not have any RAM problems, but the following official
>>> limitation makes life too miserable for us to use shards:
>>>
>>> "Makes it more inefficient to use a high "start" parameter. For example,
>>> if you request start=500000&rows=25 on an index with 500,000+ docs per
>>> shard, this will currently result in 500,000 records getting sent over the
>>> network from the shard to the coordinating Solr instance. If you had a
>>> single-shard index, in contrast, only 25 records would ever get sent over
>>> the network. (Granted, setting start this high is not something many people
>>> need to do.)" http://wiki.apache.org/solr/DistributedSearch
>>>
>>> Reading millions of documents as the result of a query is a "normal" use
>>> case for us, not a "design defect". Subdividing the "large" indexes into
>>> smaller ones seems too ugly a way to scale up. This turns Solr
>>> from a perfect solution for us into something unacceptable for such cases.
>>>
>>> I wonder whether anyone else has similar use cases / problems with
>>> sharding.
>>>
>>> Thanks,
>>> Val
>>>
>>> On 05/03/2013 12:10 PM, Erick Erickson wrote:
>>>>
>>>> My off-the-cuff thought is that there are significant costs to doing
>>>> this that would be paid by 99.999% of the setups out there. Also,
>>>> you'll usually run into other issues (RAM etc.) long before you come
>>>> anywhere close to 2^31 docs.
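[Editorial note: the wiki limitation quoted above comes down to simple arithmetic; the coordinator cannot know which shard holds which of the global top hits, so each shard must ship its own top (start + rows) entries. A minimal sketch of that cost model, with hypothetical function names, not actual Solr code:]

```python
# Rough model of the deep-paging cost described on the DistributedSearch
# wiki page. To serve the global page [start, start + rows), each shard
# must return its own top (start + rows) entries for the coordinator to merge.

def sharded_transfer(num_shards: int, start: int, rows: int) -> int:
    """Records shipped over the network across all shards."""
    return num_shards * (start + rows)

def single_index_transfer(start: int, rows: int) -> int:
    """A single index skips `start` hits locally and returns only `rows`."""
    return rows

# The wiki's example: start=500000&rows=25 means each shard ships ~500,000
# records, while an unsharded index sends only 25.
print(sharded_transfer(4, 500_000, 25))    # 2000100 across 4 shards
print(single_index_transfer(500_000, 25))  # 25
```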
>>>>
>>>> Lucene/Solr often allocates int[maxDoc] for various operations. When
>>>> maxDoc approaches 2^31, memory goes through the roof. Now
>>>> consider allocating longs instead...
>>>>
>>>> Which is a long way of saying that I don't really think anyone's going
>>>> to be working on this any time soon, especially when SolrCloud removes
>>>> a LOT of the pain/complexity (from a user perspective, anyway) of
>>>> going to a sharded setup...
>>>>
>>>> FWIW,
>>>> Erick
>>>>
>>>> On Thu, May 2, 2013 at 1:17 PM, Valery Giner <valgi...@research.att.com> wrote:
>>>>>
>>>>> Otis,
>>>>>
>>>>> The documents themselves are relatively small: tens of fields, only a
>>>>> few of them up to a hundred bytes.
>>>>> Linux servers with relatively large RAM (256 GB).
>>>>> Minutes on the searches are fine for our purposes; adding a few tens
>>>>> of millions of records in tens of minutes is also fine.
>>>>> We had to do some simple tricks to keep indexing up to speed, but
>>>>> nothing too fancy.
>>>>> Moving to sharding adds a layer of complexity which we don't really
>>>>> need because of the above, ... and adding complexity may result in lower
>>>>> reliability :)
>>>>>
>>>>> Thanks,
>>>>> Val
>>>>>
>>>>>
>>>>> On 05/02/2013 03:41 PM, Otis Gospodnetic wrote:
>>>>>>
>>>>>> Val,
>>>>>>
>>>>>> Haven't seen this mentioned in a while...
>>>>>>
>>>>>> I'm curious... what sort of index, queries, hardware, and latency
>>>>>> requirements do you have?
>>>>>>
>>>>>> Otis
>>>>>> Solr & ElasticSearch Support
>>>>>> http://sematext.com/
>>>>>>
>>>>>> On May 1, 2013 4:36 PM, "Valery Giner" <valgi...@research.att.com> wrote:
>>>>>>
>>>>>>> Dear Solr Developers,
>>>>>>>
>>>>>>> I've been unable to find an answer to the question in the subject
>>>>>>> line of this e-mail, except a vague one.
>>>>>>>
>>>>>>> We need to be able to index 2bln+ documents.
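[Editorial note: Erick's int[maxDoc] point can be made concrete with back-of-the-envelope arithmetic; this is illustrative sizing only, not a measurement of real Lucene heap use:]

```python
# Sizing a single int[maxDoc]-style structure near the 2^31 document ceiling.
# Lucene/Solr builds such arrays for sorting, faceting, caches, etc.

GIB = 1024 ** 3
MAX_DOC = 2 ** 31 - 1  # Java's Integer.MAX_VALUE, the per-index doc ceiling

def array_gib(entries: int, bytes_per_entry: int) -> float:
    """Size in GiB of a flat array with one fixed-width entry per document."""
    return entries * bytes_per_entry / GIB

print(round(array_gib(MAX_DOC, 4), 1))  # 8.0  -> one int[] near the limit
print(round(array_gib(MAX_DOC, 8), 1))  # 16.0 -> the same array widened to long[]
```

One such array is already 8 GiB of contiguous heap; several per searcher, doubled for long[], is Erick's "memory goes through the roof".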
>>>>>>> We were doing well
>>>>>>> without sharding until the number of docs hit the limit (2bln+). The
>>>>>>> performance was satisfactory for queries, updates, and indexing of
>>>>>>> new documents.
>>>>>>>
>>>>>>> That is, except for the need to work around the int32 limit, we don't
>>>>>>> really have a need to set up distributed Solr.
>>>>>>>
>>>>>>> I wonder whether someone on the Solr team could tell us when / in what
>>>>>>> version of Solr we could expect the limit to be removed.
>>>>>>>
>>>>>>> I hope this question may be of interest to someone else :)
>>>>>>>
>>>>>>> --
>>>>>>> Thanks,
>>>>>>> Val
>>>>>>>
>>
>> --
>> Walter Underwood
>> wun...@wunderwood.org
>>
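[Editorial note: for the "export millions of docs to a file" use case above, a common workaround for deep start=N offsets is keyset ("search after") pagination: sort on a unique field and filter on values greater than the last one seen, so every request is a shallow top-`rows` query. In Solr terms this corresponds roughly to `sort=id asc` plus an exclusive range filter like `fq=id:{LAST_ID TO *]`; the names below are illustrative, and the Solr side is emulated in memory, not a tested query:]

```python
# Keyset-pagination sketch over an in-memory set of unique, sortable ids.

def fetch_page(ids, last_seen, rows):
    """Emulates one shallow request: the top `rows` ids greater than last_seen."""
    return sorted(i for i in ids if i > last_seen)[:rows]

def export_all(ids, rows=1000):
    """Walks the entire result set using only shallow pages."""
    out, last = [], float("-inf")
    while True:
        page = fetch_page(ids, last, rows)
        if not page:
            return out
        out.extend(page)
        last = page[-1]  # resume strictly after the last id seen

print(export_all([3, 1, 4, 5, 9, 2, 6], rows=3))  # [1, 2, 3, 4, 5, 6, 9]
```

Unlike start=N paging, the cost per request stays constant no matter how deep the export goes, which is why the same idea later appeared natively in Solr as cursor-based deep paging.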