Hi all,

In my code, I'd like to keep a subset of about 100k of my 14M docs.
Which DocSet implementation do you think is the best option in terms of speed and memory usage? My rough reasoning is that BitDocSet should be the fastest for lookup but, being a bit set, takes about one bit per document in the index (~14M bits, roughly 1.75 MB), whereas HashDocSet takes only about 100k * sizeof(int) (roughly 400 KB, plus hash table overhead) but is a bit slower for lookup. The doc of HashDocSet says "It can be a better choice if there are few docs in the set". What does 'few' mean in this context?

Cheers!
Jerome.

-- 
Jerome Eteve.
Chat with me live at http://www.eteve.net
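
P.S. To make the memory comparison concrete, here is a rough back-of-the-envelope sketch. It assumes a bit set costs one bit per document in the index, and that an int hash table costs 4 bytes per slot with its capacity rounded up to a power of two at a load factor of 0.75. The class name, the method names and the load factor are my own assumptions for illustration, not taken from the Solr source.

```java
// Hypothetical helper, not part of Solr: estimates the raw storage of the two layouts.
class DocSetMemorySketch {

    // Bytes for a bit set holding one bit per document in the index.
    static long bitSetBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    // Bytes for an int hash table holding setSize doc ids.
    // Assumption: capacity is the next power of two >= setSize / loadFactor.
    static long hashSetBytes(long setSize, double loadFactor) {
        long needed = (long) Math.ceil(setSize / loadFactor);
        long capacity = Long.highestOneBit(needed);
        if (capacity < needed) {
            capacity <<= 1;
        }
        return capacity * Integer.BYTES;
    }

    public static void main(String[] args) {
        long maxDoc = 14_000_000L;   // docs in the index
        long setSize = 100_000L;     // docs in the subset
        System.out.println("bit set:    " + bitSetBytes(maxDoc) + " bytes");
        System.out.println("hash table: " + hashSetBytes(setSize, 0.75) + " bytes");
    }
}
```

Under these assumptions the bit set comes to 1750000 bytes and the hash table to 1048576 bytes, so the two are closer than the raw 100k * sizeof(int) figure suggests. That is why I'd like to know where 'few' stops being few.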