I would suggest benchmarking this before attempting any more complex design. A field with only 10k unique integer or string values will search very quickly.
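As a rough starting point for such a benchmark, here is a minimal Python sketch. It builds the same OR'd query shown in the quoted message and times repeated requests. The helper names (build_or_query, build_select_url, time_query) are made up for illustration, and the URL in the usage note is an assumption about a local test instance, not something taken from the original setup.

```python
import statistics
import time
import urllib.parse
import urllib.request


def build_or_query(field, ids):
    """Build the OR'd query from the quoted message, e.g. (myfield:3) OR (myfield:202)."""
    return " OR ".join(f"({field}:{i})" for i in ids)


def build_select_url(base_url, field, ids, rows=0):
    """URL-encode the query; rows=0 so document fetching is left out of the timing."""
    params = {"q": build_or_query(field, ids), "rows": rows}
    return base_url + "?" + urllib.parse.urlencode(params)


def time_query(url, repeats=20):
    """Return the median wall-clock seconds over `repeats` HTTP requests."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as resp:
            resp.read()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

For example, time_query(build_select_url("http://localhost:8983/solr/select", "myfield", [3, 202, 3030, 505])) would give a median latency for the four-term case, assuming a server is listening at that address. Running it with random ID sets of around 10 and of a few hundred terms should show whether the low cache hit rate matters in practice.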

On Thu, May 6, 2010 at 7:54 AM, Nagelberg, Kallin <knagelb...@globeandmail.com> wrote:
> Hey everyone,
>
> I'm having some difficulty figuring out the best way to optimize for a
> certain query situation. My documents have a many-valued field that stores
> lists of IDs. All in all there are probably about 10,000 distinct IDs
> throughout my index. I need to be able to query and find all documents that
> contain a given set of IDs. Ie, I want to find all documents that contain IDs
> 3, 202, 3030 or 505. Currently I'm implementing this like so:
>
> q= (myfield:3) OR (myfield:202) OR (myfield:3030) OR (myfield:505).
>
> It's possible that there could be upwards of hundreds of terms, although 90%
> of the time it will be under 10. Ideally I would like to do this with a
> filter query, but I have read that it is impossible to cache OR'd terms in a
> fq, though this feature may come soon. The problem is that the combinations
> of OR'd terms will almost always be unique, so the query cache will have a
> very low hit rate. It would be great if the individual terms could be cached
> individually, but I'm not sure how to accomplish that.
>
> Any suggestions would be welcome!
> -Kallin Nagelberg
>

--
Lance Norskog
goks...@gmail.com