Re: Having problems with the java api in 1.4.0

2010-08-24 Thread Lance Norskog
We have found that 200-250 MB per Lucene index is where efficiency drops off and Lucene gets slow. You will have to use a sharding approach: many small indexes, each holding a different set of documents. Solr has a tool for running queries across many shards, called Distributed Search. http://wiki.apa
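
To illustrate the sharding approach mentioned above: Solr's Distributed Search is driven by a `shards` request parameter that lists the shard addresses, separated by commas. The following is a minimal sketch that only builds such a request URL. The class name, host names, ports, and query are hypothetical and do not come from the thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Solr or SolrJ.
class ShardedQueryUrl {

    /**
     * Builds a select URL asking one Solr instance to fan the query out
     * to every shard named in the "shards" parameter.
     */
    static String build(String coordinatorBaseUrl, List<String> shards, String query)
            throws UnsupportedEncodingException {
        // Join the shard addresses with commas, as the shards parameter expects.
        StringBuilder shardList = new StringBuilder();
        for (String shard : shards) {
            if (shardList.length() > 0) {
                shardList.append(',');
            }
            shardList.append(shard);
        }
        // URL-encode both parameter values so ':' and '/' survive transport.
        return coordinatorBaseUrl + "/select"
                + "?shards=" + URLEncoder.encode(shardList.toString(), "UTF-8")
                + "&q=" + URLEncoder.encode(query, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Assumed example hosts; replace with the real shard addresses.
        List<String> shards = Arrays.asList("localhost:8983/solr", "localhost:7574/solr");
        System.out.println(build("http://localhost:8983/solr", shards, "title:solr"));
    }
}
```

Each shard still has to be indexed separately with its own subset of documents; the URL only covers the query side.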

Re: Having problems with the java api in 1.4.0

2010-08-24 Thread Liz Sommers
We will be ingesting gigabytes of new data per day, but have a lot of legacy data (petabytes) that will also need to be indexed. We will probably index many fields per record (ave. 50/record) and hope to add facets in the near future. If this solution gives us the speed and facet capabilities we

Re: Having problems with the java api in 1.4.0

2010-08-24 Thread Glen Newton
Liz, I've built terabyte-scale (1-2 TB) test Lucene indexes, but have not reached the petabyte level, so I am not sure. Certainly there is overhead in using HTTP and XML marshaling/de-marshaling, which may or may not be a critical factor for you. Could you give more information with respect to

Re: Having problems with the java api in 1.4.0

2010-08-24 Thread Liz Sommers
We do have synonyms.txt in our config directory. The config directory is a copy of the example directory. We will probably also run into this problem with stopwords.xml. We don't understand how to make it look in the correct directory. We thought it got the correct directory out of the solrconf

Re: Having problems with the java api in 1.4.0

2010-08-24 Thread Liz Sommers
I was worried that it wouldn't scale. We are going to be indexing petabytes of data. Does the httpserver solution scale? Thanks Liz Sommers lizswo...@gmail.com On Tue, Aug 24, 2010 at 12:23 PM, Thomas Joiner wrote: > Is there any reason you aren't using http://wiki.apache.org/solr/Solrj to >

Re: Having problems with the java api in 1.4.0

2010-08-24 Thread Rafał Kuć
Hello! The exception thrown by Solr says that you do not have a synonyms.txt file either on the classpath or in the Solr core config directory. Check your schema.xml file for a SynonymFilterFactory filter. That filter uses the synonyms.txt file to read synonym definitions. If you don't need the synonyms filter
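
One way to narrow down the missing-file error described above is to check, before starting Solr, whether the resource files named in schema.xml are actually present in the conf directory under the Solr home. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, the list of required files is illustrative, and it checks only the conf directory, not the classpath.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic helper, not part of Solr.
class ConfDirCheck {

    /** Returns the names from requiredFiles that are absent from <solrHome>/conf. */
    static List<String> missingResources(File solrHome, List<String> requiredFiles) {
        File confDir = new File(solrHome, "conf");
        List<String> missing = new ArrayList<String>();
        for (String name : requiredFiles) {
            // isFile() is false for both missing paths and directories.
            if (!new File(confDir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // solr.solr.home is the system property Solr reads for its home directory.
        // The "solr" fallback and the file list below are assumptions for illustration.
        File solrHome = new File(System.getProperty("solr.solr.home", "solr"));
        List<String> required = Arrays.asList("solrconfig.xml", "schema.xml", "synonyms.txt");
        System.out.println("Missing from " + new File(solrHome, "conf") + ": "
                + missingResources(solrHome, required));
    }
}
```

If the printed list is non-empty, either the Solr home points at the wrong directory or the files were not copied along with the rest of the example configuration.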

Re: Having problems with the java api in 1.4.0

2010-08-24 Thread Thomas Joiner
Is there any reason you aren't using http://wiki.apache.org/solr/Solrj to interact with Solr? On Tue, Aug 24, 2010 at 11:12 AM, Liz Sommers wrote: > I am very new to the solr/lucene world. I am using solr 1.4.0 and cannot > move to 1.4.1. > > I have to index about 50 fields for each document, t