Jibo,

Well, there is always field compression, which lets you trade index size (disk space) for extra CPU time, and thus some increase in indexing and search latency.
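To make that trade-off concrete, here is a minimal sketch. It does not use Solr or Lucene at all; it only runs DEFLATE (via Python's standard zlib module) over a made-up stored value and reports the size before and after, plus the time spent compressing. The log line, the function names, and the repetition factor are all hypothetical, so treat the numbers as an illustration, not a prediction for a real index.

```python
import time
import zlib


def compress_field(value: str, level: int = 9) -> bytes:
    """Compress a stored field value (UTF-8 text) with DEFLATE."""
    return zlib.compress(value.encode("utf-8"), level)


def decompress_field(blob: bytes) -> str:
    """Reverse compress_field; this CPU cost is paid whenever the value is read."""
    return zlib.decompress(blob).decode("utf-8")


# Hypothetical log line, not taken from any real system.
log_line = (
    "2009-07-23 13:43:45,123 INFO  [http-8080-1] "
    "com.example.LogSearcher - request served in 42 ms\n"
)
# Repeating one line makes the text unrealistically compressible;
# a single short log line on its own would shrink far less.
stored_value = log_line * 50

start = time.perf_counter()
blob = compress_field(stored_value)
elapsed_ms = (time.perf_counter() - start) * 1e3

raw_size = len(stored_value.encode("utf-8"))
print(f"raw bytes:        {raw_size}")
print(f"compressed bytes: {len(blob)}")
print(f"compress time:    {elapsed_ms:.3f} ms")

# The round trip must be lossless for a stored (display-only) field.
assert decompress_field(blob) == stored_value
```

The space saved comes at the cost of the compress step at indexing time and the decompress step whenever the stored value is fetched for display, so it is worth measuring with your own log data before relying on it.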
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR


----- Original Message ----
> From: Jibo John <jiboj...@mac.com>
> To: solr-user@lucene.apache.org
> Sent: Thursday, July 23, 2009 1:43:45 PM
> Subject: Re: Storing string field in solr.ExternalFieldFile type
>
> Thanks for the quick response, Otis.
>
> We have been able to achieve the ratio of 2 with different settings, however,
> considering the huge volume of the data that we need to deal with - 600 GB of
> data per day, and, we need to keep it in the index for 3 days - we're looking
> at all possible ways to reduce the index size further.
> Will definitely keep exploring the straightforward things and see if we can
> find a better setting.
>
>
> Thanks,
> -Jibo
>
> On Jul 23, 2009, at 9:49 AM, Otis Gospodnetic wrote:
>
> > I'm not sure if there is a lot of benefit from storing the literal values
> > in that external file vs. directly in the index. There are a number of
> > things one should look at first, as far as performance is concerned - JVM
> > settings, cache sizes, analysis, etc.
> >
> > For example, I have one index here that is 9 times the size of the original
> > data because of how its fields are analyzed. I can change one analysis-level
> > setting and make that ratio go down to 2. So I'd look at other, more
> > straight forward things first. There is a Wiki page either on Solr or Lucene
> > Wiki dedicated to various search performance tricks.
> >
> > Otis
> > --
> > Sematext is hiring: http://sematext.com/about/jobs.html?mls
> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >
> >
> > ----- Original Message ----
> >> From: Jibo John
> >> To: solr-user@lucene.apache.org
> >> Sent: Thursday, July 23, 2009 12:08:26 PM
> >> Subject: Re: Storing string field in solr.ExternalFieldFile type
> >>
> >> Thanks for the response, Eric.
> >>
> >> We have seen that size of the index has a direct impact on the search
> >> speed, especially when the index size is in GBs, so trying all possible
> >> ways to keep the index size as low as we can.
> >>
> >> We thought solr.ExternalFileField type would help to keep the index size
> >> low by storing a text field out side of the index.
> >>
> >> Here's what we were planning: initially, all the fields except the
> >> solr.ExternalFileField type field will be queried and will be displayed to
> >> the end user. There will be subsequent calls from the UI to pull the
> >> solr.ExternalFileField field that will be loaded in a lazy manner.
> >>
> >> However, realized that solr.ExternalFileField only supports float type,
> >> however, the data that we're planning to keep as an external field is a
> >> string type.
> >>
> >> Thanks,
> >> -Jibo
> >>
> >>
> >> On Jul 22, 2009, at 1:46 PM, Erick Erickson wrote:
> >>
> >>> Hoping the experts chime in if I'm wrong, but....
> >>> As far as I know, while storing a field increases the size of an index,
> >>> it doesn't have much impact on the search speed. Which you could
> >>> pretty easily test by creating the index both ways and firing off some
> >>> timing queries and comparing..... Although it would be time consuming...
> >>>
> >>> I believe there's some info on the Lucene Wiki about this, but my memory
> >>> isn't what it used to be.
> >>>
> >>> Erick
> >>>
> >>>
> >>> On Tue, Jul 21, 2009 at 2:42 PM, Jibo John wrote:
> >>>
> >>>> We're in the process of building a log searcher application.
> >>>>
> >>>> In order to reduce the index size to improve the query performance, we're
> >>>> exploring the possibility of having:
> >>>>
> >>>> 1. One field for each log line with 'indexed=true & stored=false' that
> >>>> will be used for searching
> >>>> 2. Another field for each log line of type solr.ExternalFileField that
> >>>> will be used only for display purpose.
> >>>>
> >>>> We realized that currently solr.ExternalFileField supports only float
> >>>> type.
> >>>>
> >>>> Is there a way we can override this to support string type? Any issues
> >>>> with this approach?
> >>>>
> >>>> Any ideas are welcome.
> >>>>
> >>>>
> >>>> Thanks,
> >>>> -Jibo