Jibo,

Well, there is always field compression, which lets you trade the index 
size/disk space for extra CPU time and thus some increase in indexing and 
search latency.
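
To make that trade-off concrete, here is a rough sketch using Python's zlib on made-up log lines. It is only an illustration of the size-vs-CPU exchange, not Solr's or Lucene's own compression code path, and the sample data and the `compress_stats` helper are hypothetical.

```python
import time
import zlib


def compress_stats(text: str, level: int = 6):
    """Return (raw_bytes, compressed_bytes, seconds_spent_compressing)."""
    raw = text.encode("utf-8")
    start = time.perf_counter()
    packed = zlib.compress(raw, level)
    elapsed = time.perf_counter() - start
    # Round-trip check: the stored value must be recoverable unchanged.
    assert zlib.decompress(packed) == raw
    return len(raw), len(packed), elapsed


# Hypothetical log lines; real logs will compress differently.
sample = "\n".join(
    f"2009-07-23 13:43:{i % 60:02d} INFO request id={i} status=200"
    for i in range(1000)
)

raw_size, packed_size, seconds = compress_stats(sample)
print(f"raw={raw_size} B, compressed={packed_size} B, "
      f"ratio={packed_size / raw_size:.2f}, cpu={seconds * 1000:.2f} ms")
```

Repetitive log text usually shrinks a lot, but every stored value then has to be decompressed when it is fetched for display, which is where the extra latency comes from.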

 Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



----- Original Message ----
> From: Jibo John <jiboj...@mac.com>
> To: solr-user@lucene.apache.org
> Sent: Thursday, July 23, 2009 1:43:45 PM
> Subject: Re: Storing string field in solr.ExternalFieldFile type
> 
> Thanks for the quick response, Otis.
> 
> We have been able to achieve a ratio of 2 with different settings. However,
> considering the huge volume of data we need to deal with (600 GB of data per
> day, kept in the index for 3 days), we're looking at all possible ways to
> reduce the index size further.
> We will definitely keep exploring the straightforward things and see if we can
> find a better setting.
> 
> 
> Thanks,
> -Jibo
> 
> On Jul 23, 2009, at 9:49 AM, Otis Gospodnetic wrote:
> 
> > I'm not sure if there is a lot of benefit from storing the literal values
> > in that external file vs. directly in the index.  There are a number of
> > things one should look at first, as far as performance is concerned: JVM
> > settings, cache sizes, analysis, etc.
> > 
> > For example, I have one index here that is 9 times the size of the original
> > data because of how its fields are analyzed.  I can change one analysis-level
> > setting and make that ratio go down to 2.  So I'd look at other, more
> > straightforward things first.  There is a page on either the Solr or the
> > Lucene Wiki dedicated to various search performance tricks.
> > 
> > Otis
> > 
> > 
> > 
> > ----- Original Message ----
> >> From: Jibo John 
> >> To: solr-user@lucene.apache.org
> >> Sent: Thursday, July 23, 2009 12:08:26 PM
> >> Subject: Re: Storing string field in solr.ExternalFieldFile type
> >> 
> >> Thanks for the response, Eric.
> >> 
> >> We have seen that the size of the index has a direct impact on search
> >> speed, especially when the index size is in GBs, so we are trying all
> >> possible ways to keep the index size as low as we can.
> >> 
> >> We thought the solr.ExternalFileField type would help keep the index size
> >> low by storing a text field outside of the index.
> >> 
> >> Here's what we were planning: initially, all the fields except the
> >> solr.ExternalFileField type field will be queried and displayed to the
> >> end user. Subsequent calls from the UI will then pull the
> >> solr.ExternalFileField field, which will be loaded lazily.
> >> 
> >> However, we realized that solr.ExternalFileField only supports the float
> >> type, whereas the data we're planning to keep as an external field is a
> >> string.
> >> 
> >> Thanks,
> >> -Jibo
> >> 
> >> 
> >> 
> >> On Jul 22, 2009, at 1:46 PM, Erick Erickson wrote:
> >> 
> >>> Hoping the experts chime in if I'm wrong, but....
> >>> As far as I know, while storing a field increases the size of an index,
> >>> it doesn't have much impact on the search speed. Which you could
> >>> pretty easily test by creating the index both ways and firing off some
> >>> timing queries and comparing..... Although it would be time consuming...
> >>> 
> >>> I believe there's some info on the Lucene Wiki about this, but my memory
> >>> isn't what it used to be.
> >>> 
> >>> Erick
> >>> 
> >>> 
> >>> On Tue, Jul 21, 2009 at 2:42 PM, Jibo John wrote:
> >>> 
> >>>> We're in the process of building a log searcher application.
> >>>> 
> >>>> In order to reduce the index size to improve the query performance, we're
> >>>> exploring the possibility of having:
> >>>> 
> >>>> 1. One field for each log line with 'indexed=true & stored=false' that
> >>>> will be used for searching
> >>>> 2. Another field for each log line of type solr.ExternalFileField that
> >>>> will be used only for display purpose.
> >>>> 
> >>>> We realized that solr.ExternalFileField currently supports only the
> >>>> float type.
> >>>> 
> >>>> Is there a way we can override this to support a string type? Are there
> >>>> any issues with this approach?
> >>>> 
> >>>> Any ideas are welcome.
> >>>> 
> >>>> 
> >>>> Thanks,
> >>>> -Jibo
> >>>> 
> >>>> 
> >>>> 
> > 
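
For a sense of scale, the figures quoted in the thread (600 GB of data per day, 3 days of retention, index-to-data ratios of 9 and 2) can be turned into a back-of-the-envelope estimate. This is only arithmetic on those numbers; the `estimate_index_gb` helper is hypothetical, and the ratio of 9 comes from Otis's own index rather than from Jibo's data.

```python
def estimate_index_gb(daily_data_gb: float, retention_days: int, ratio: float) -> float:
    """Estimated index size in GB: raw data retained, scaled by index-to-data ratio."""
    return daily_data_gb * retention_days * ratio


DAILY_GB = 600       # from the thread
RETENTION_DAYS = 3   # from the thread

for ratio in (9, 2, 1):
    size = estimate_index_gb(DAILY_GB, RETENTION_DAYS, ratio)
    print(f"ratio {ratio}: about {size:.0f} GB of index")
```

At a ratio of 2 the retained index is already several terabytes, which is why each further reduction in the ratio matters at this volume.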
