We have found that 200-250 million documents per Lucene index is where efficiency
drops off and Lucene gets slow. You will have to use a sharding
approach: many small indexes, each holding a different set of
documents. Solr has a feature for querying across many shards, called
Distributed Search.

http://wiki.apache.org/solr/DistributedSearch
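
As a minimal SolrJ sketch (the host names are hypothetical), a query is
fanned out across shards with the "shards" parameter; the server that
receives the request queries each listed shard and merges the results:

<code>
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardedQuery {
    public static void main(String[] args) throws Exception {
        // Send the request to any one shard (host is hypothetical).
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://shard1.example.com:8983/solr");

        SolrQuery query = new SolrQuery("some terms");
        // Comma-separated list of every shard to search.
        query.setParam("shards",
            "shard1.example.com:8983/solr,shard2.example.com:8983/solr");

        QueryResponse rsp = server.query(query);
        System.out.println("Total hits: " + rsp.getResults().getNumFound());
    }
}
</code>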

There is a great book from Packt Books -
https://www.packtpub.com/solr-1-4-enterprise-search-server/book

On Tue, Aug 24, 2010 at 10:10 AM, Liz Sommers <lizswo...@gmail.com> wrote:
> We will be ingesting gigabytes of new data per day, but have a lot of legacy
> data (petabytes) that will also need to be indexed.   We will probably index
> many fields per record (avg. 50 per record) and hope to add facets in the near
> future.
>
> If this solution gives us the speed and facet capabilities we are hoping
> for, our searches per hour will go up by 10 times or more but will probably
> max out at a couple of searches per second.
> Thanks.
>
> Liz Sommers
>
> On Tue, Aug 24, 2010 at 12:53 PM, Glen Newton <glen.new...@gmail.com> wrote:
>
>> Liz,
>>
>> I've built terabyte-scale (1-2 TB) test Lucene indexes, but have not
>> reached the petabyte level, so I am not sure. Certainly there is
>> overhead in the HTTP transport and the XML marshalling/de-marshalling,
>> which may or may not be a critical factor for you.
>>
>> Could you give more information about your application: the nature of
>> your data loading (many PB at once? GB per hour/day/week accumulating
>> to PB? MB per second/minute/hour eventually accumulating to PB?...),
>> your searching (the number of fields indexed and the query complexity;
>> whether you are using facets, etc.), and the number of queries per
>> second you expect...
>>
>> Lucene has a limit on the number of documents in a single index that
>> might impact your application: the maximum value of a 32-bit signed
>> int, 2,147,483,647. See numDocs():
>>
>> http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/index/IndexWriter.html#numDocs%28%29
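>>
>> As a minimal sketch (the index path is made up), you can check how
>> close an existing index is to that limit with Lucene 3.0:
>>
>> <code>
>> import java.io.File;
>> import org.apache.lucene.index.IndexReader;
>> import org.apache.lucene.store.FSDirectory;
>>
>> public class DocCountCheck {
>>     public static void main(String[] args) throws Exception {
>>         // Open the index read-only and report its document count.
>>         IndexReader reader = IndexReader.open(
>>             FSDirectory.open(new File("/path/to/index")), true);
>>         System.out.println("maxDoc = " + reader.maxDoc()
>>             + " of limit " + Integer.MAX_VALUE);
>>         reader.close();
>>     }
>> }
>> </code>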
>>
>> -glen
>>
>> On 24 August 2010 12:29, Liz Sommers <lizswo...@gmail.com> wrote:
>> > I was worried that it wouldn't scale.  We are going to be indexing
>> > petabytes of data.  Does the HTTP server solution scale?
>> >
>> > Thanks
>> >
>> > Liz Sommers
>> > lizswo...@gmail.com
>> >
>> > On Tue, Aug 24, 2010 at 12:23 PM, Thomas Joiner
>> > <thomas.b.joi...@gmail.com>wrote:
>> >
>> >> Is there any reason you aren't using http://wiki.apache.org/solr/Solrj
>> >> to interact with Solr?
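>> >>
>> >> As a minimal sketch of indexing key/value pairs with SolrJ (the
>> >> server URL and the record map are hypothetical):
>> >>
>> >> <code>
>> >> import java.util.Map;
>> >> import org.apache.solr.client.solrj.SolrServer;
>> >> import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
>> >> import org.apache.solr.common.SolrInputDocument;
>> >>
>> >> public class KeyValueIndexer {
>> >>     public static void index(Map<String, String> record) throws Exception {
>> >>         // Talk to a running Solr over HTTP (URL is hypothetical).
>> >>         SolrServer server =
>> >>             new CommonsHttpSolrServer("http://localhost:8983/solr");
>> >>
>> >>         // Copy each key/value pair into a Solr document field.
>> >>         SolrInputDocument doc = new SolrInputDocument();
>> >>         for (Map.Entry<String, String> e : record.entrySet()) {
>> >>             doc.addField(e.getKey(), e.getValue());
>> >>         }
>> >>
>> >>         server.add(doc);
>> >>         server.commit();
>> >>     }
>> >> }
>> >> </code>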
>> >>
>> >> On Tue, Aug 24, 2010 at 11:12 AM, Liz Sommers <lizswo...@gmail.com>
>> wrote:
>> >>
>> >> > I am very new to the Solr/Lucene world.  I am using Solr 1.4.0 and
>> >> > cannot move to 1.4.1.
>> >> >
>> >> > I have to index about 50 fields for each document; these fields are
>> >> > already in key/value pairs by the time I get to my index methods.  I
>> >> > was able to index them with Lucene without any problem, but found
>> >> > that I could not then read the indexes with solr/admin.  So, I
>> >> > decided to use Solr for my indexing.
>> >> >
>> >> > The error I am currently getting is:
>> >> > java.lang.RuntimeException: Can't find resource 'synonyms.txt' in
>> >> > classpath or 'solr/conf/'
>> >> >
>> >> > This exception is being thrown by SolrResourceLoader.openResource
>> >> > (line 260), which is called by IndexSchema.<init> (line 102).
>> >> >
>> >> > My code that leads up to this follows:
>> >> >
>> >> > <code>
>> >> > String path = "c:/swdev/apache-solr-1.4.0/IDW";
>> >> > SolrConfig cfg = new SolrConfig(path + "/solr/conf/solrconfig.xml");
>> >> > schema = new IndexSchema(cfg, path + "/solr/conf/schema.xml", null);
>> >> > </code>
>> >> >
>> >> > This also fails if I use
>> >> > schema = new IndexSchema(cfg, "schema.xml", null);
>> >> >
>> >> >
>> >> > Any help would be greatly appreciated.
>> >> >
>> >> > Thank you
>> >> >
>> >> > Liz Sommers
>> >> > lizswo...@gmail.com
>> >> >
>> >>
>> >
>>
>



-- 
Lance Norskog
goks...@gmail.com
