Basically, term vectors are enabled on only one main field, i.e. Contents. The average size of each document is only a few KB, but there are around 130 million documents, so what would you suggest now?

On Fri, Feb 4, 2011 at 5:24 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:

> Salman,
>
> It also depends on the size of your documents. Re-analyzing 20 fields of 500
> bytes each will be a lot faster than re-analyzing 20 fields with 50 KB each.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> ----- Original Message ----
> > From: Grant Ingersoll <gsing...@apache.org>
> > To: solr-user@lucene.apache.org
> > Sent: Wed, January 26, 2011 10:44:09 AM
> > Subject: Re: Highlighting with/without Term Vectors
> >
> > On Jan 24, 2011, at 2:42 PM, Salman Akram wrote:
> >
> > > Hi,
> > >
> > > Does anyone have any benchmarks how much highlighting speeds up with Term
> > > Vectors (compared to without it)? e.g. if highlighting on 20 documents take
> > > 1 sec with Term Vectors any idea how long it will take without them?
> > >
> > > I need to know since the index used for highlighting has a TVF file of
> > > around 450GB (approx 65% of total index size) so I am trying to see whether
> > > the decreasing the index size by dropping TVF would be more helpful for
> > > performance (less RAM, should be good for I/O too I guess) or keeping it is
> > > still better?
> > >
> > > I know the best way is try it out but indexing takes a very long time so
> > > trying to see whether its even worthy or not.
> >
> > Try testing on a smaller set. In general, you are saving the process of
> > re-analyzing the content, so, to some extent it is going to be dependent on how
> > fast your analyzer chain is. At the size you are at, I don't know if storing
> > TVs is worth it.

--
Regards,
Salman Akram
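
For reference, the sizes mentioned in this thread can be turned into a rough back-of-the-envelope estimate. The sketch below uses only the figures quoted above (a TVF file of about 450 GB, about 65% of the total index, about 130 million documents). The variable names are illustrative, and it assumes 1 GB = 10^9 bytes, so treat the results as approximations rather than measurements.

```python
# Rough estimate from the figures quoted in this thread.
# Assumption: 1 GB = 10**9 bytes; all inputs are approximate.
TVF_GB = 450              # size of the term vector (TVF) file
TVF_SHARE = 0.65          # TVF is approx. 65% of the total index size
NUM_DOCS = 130_000_000    # approx. number of documents

total_gb = TVF_GB / TVF_SHARE                    # estimated total index size
without_tvf_gb = total_gb - TVF_GB               # index size if TVF is dropped
tvf_bytes_per_doc = TVF_GB * 10**9 / NUM_DOCS    # term vector cost per document

print(f"total index size : {total_gb:.0f} GB")
print(f"without TVF      : {without_tvf_gb:.0f} GB")
print(f"TVF per document : {tvf_bytes_per_doc:.0f} bytes")
```

On these numbers the term vectors cost a few KB per document, which is about the same order as the documents themselves, so a test on a smaller subset (as suggested above) is probably the cheapest way to see whether re-analysis at highlight time is fast enough.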