Hey all,
I followed an old blog post about implementing common grams and used a list
of the 400 most common words on a subset of my data. The original index was
33 GB with 2.2 million documents; with the 400-word list it grew to 96 GB.
I scaled the list down to the 100 most common words and got it to about
76 GB, but a cold phrase search went from 4 seconds with 400 words to 6
seconds with 100. This won't really scale well: the base index that this is
a subset of currently has 22 million documents and sits around 360 GB, so at
this rate the common-grams index would end up around 1 TB. Is there a common
hardware/software configuration for handling TB-size indexes?
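
In case a concrete example helps, this is roughly the Lucene-level
equivalent of what I set up (a minimal sketch only; the class name and the
inline word list are placeholders, the real list comes from the 400-word
file):

import java.util.Arrays;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.commongrams.CommonGramsFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Index-time analyzer: for every token found in the common-words list,
// CommonGramsFilter also emits a bigram of that token and its neighbor
// (e.g. "of_the"). Those extra bigrams are what inflate the index, but
// they let phrase queries avoid walking the huge postings lists of the
// common terms.
public class CommonGramsSketchAnalyzer extends Analyzer {

    private final CharArraySet commonWords;

    public CommonGramsSketchAnalyzer(CharArraySet commonWords) {
        this.commonWords = commonWords;
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new StandardTokenizer();
        TokenStream result = new CommonGramsFilter(source, commonWords);
        return new TokenStreamComponents(source, result);
    }

    public static void main(String[] args) {
        // Placeholder list; the real one is loaded from the word file.
        CharArraySet words = new CharArraySet(
                Arrays.asList("the", "of", "and", "to", "a", "in"), true);
        Analyzer analyzer = new CommonGramsSketchAnalyzer(words);
        // ... pass the analyzer to IndexWriterConfig when building the index ...
    }
}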
thanks,
DH
