We will be ingesting gigabytes of new data per day, but we also have petabytes of legacy data that will need to be indexed. We will probably index many fields per record (about 50 on average) and hope to add facets in the near future.
If this solution gives us the speed and facet capabilities we are hoping for, our searches per hour will go up by a factor of 10 or more, but will probably max out at a couple of searches per second.

Thanks.

Liz Sommers

On Tue, Aug 24, 2010 at 12:53 PM, Glen Newton <glen.new...@gmail.com> wrote:

> Liz,
>
> I've built terabyte (1-2 TB) test Lucene indexes, but have not
> reached the petabyte level, so I am not sure. Certainly there is
> overhead in using the HTTP and XML marshaling/de-marshaling, which may
> or may not be a critical factor for you.
>
> Could you give more information with respect to your application, i.e.
> the nature of your data loading (i.e. many PB at once, or GB per
> hour/day/week accumulating to PB, or MB per second/minute/hour
> eventually accumulating to PB...;) searching (i.e. the number of
> fields indexed & the query complexity; if you are using facets, etc.),
> number of queries per second expected...
>
> Lucene has a limit on the number of documents (in a single index) that
> might impact your application:
>
> http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/index/IndexWriter.html#numDocs%28%29
> of a 32bit int, 2 147 483 647.
>
> -glen
>
> On 24 August 2010 12:29, Liz Sommers <lizswo...@gmail.com> wrote:
> > I was worried that it wouldn't scale. We are going to be indexing petabytes
> > of data. Does the httpserver solution scale?
> >
> > Thanks
> >
> > Liz Sommers
> > lizswo...@gmail.com
> >
> > On Tue, Aug 24, 2010 at 12:23 PM, Thomas Joiner
> > <thomas.b.joi...@gmail.com> wrote:
> >
> >> Is there any reason you aren't using http://wiki.apache.org/solr/Solrj to
> >> interact with Solr?
> >>
> >> On Tue, Aug 24, 2010 at 11:12 AM, Liz Sommers <lizswo...@gmail.com> wrote:
> >>
> >> > I am very new to the solr/lucene world. I am using solr 1.4.0 and cannot
> >> > move to 1.4.1.
> >> >
> >> > I have to index about 50 fields for each document; these fields are already
> >> > in key/value pairs by the time I get to my index methods. I was able to
> >> > index them with lucene without any problem, but found that I could not then
> >> > read the indexes with solr/admin. So, I decided to use Solr for my
> >> > indexing.
> >> >
> >> > The error I am currently getting is
> >> > java.lang.RuntimeException: Can't find resource 'synonyms.txt' in classpath
> >> > or 'solr/conf/'
> >> >
> >> > This exception is being thrown by SolrResourceLoader.openResource (line
> >> > 260),
> >> > which is called by IndexSchema.<init> (line 102).
> >> >
> >> > My code that leads up to this follows:
> >> >
> >> > <code>
> >> > String path = "c:/swdev/apache-solr-1.4.0/IDW";
> >> > SolrConfig cfg = new SolrConfig(path + "/solr/conf/solrconfig.xml");
> >> > schema = new IndexSchema(cfg, path + "/solr/conf/schema.xml", null);
> >> > </code>
> >> >
> >> > This also fails if I use
> >> > schema = new IndexSchema(cfg, "schema.xml", null);
> >> >
> >> > Any help would be greatly appreciated.
> >> >
> >> > Thank you
> >> >
> >> > Liz Sommers
> >> > lizswo...@gmail.com
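
The per-index document ceiling raised earlier in the thread (a signed 32-bit int) lends itself to a quick back-of-the-envelope check. The sketch below is only an illustration of that arithmetic: it assumes one Lucene document per record and computes the minimum number of separate indexes a given record count would need if no single index may exceed that ceiling. The class name, method name, and the record count in `main` are hypothetical and do not come from the thread; the real record count for this data set is not stated.

```java
/** Rough sizing helper. All record counts used here are hypothetical. */
final class IndexCountEstimate {

    /** Largest value of a signed 32-bit int, the ceiling discussed in the thread. */
    static final long MAX_DOCS_PER_INDEX = Integer.MAX_VALUE; // 2,147,483,647

    /**
     * Minimum number of separate indexes needed to hold totalDocs documents
     * when no single index may hold more than MAX_DOCS_PER_INDEX documents.
     */
    static long minIndexesNeeded(long totalDocs) {
        if (totalDocs < 0) {
            throw new IllegalArgumentException("totalDocs must be non-negative");
        }
        // Ceiling division done with integer arithmetic to avoid rounding error.
        long full = totalDocs / MAX_DOCS_PER_INDEX;
        long remainder = totalDocs % MAX_DOCS_PER_INDEX;
        return remainder == 0 ? full : full + 1;
    }

    public static void main(String[] args) {
        // Assumed figure for illustration only; it is not taken from the thread.
        long hypotheticalRecords = 10000000000L;
        System.out.println(minIndexesNeeded(hypotheticalRecords));
    }
}
```

This gives only a lower bound on the number of indexes. Practical limits such as index size on disk, merge times, and query latency would likely force a split well before the document-count ceiling is reached.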