Hello Luis,

Your problem is fairly clear (though hard to solve). Where does this set of Orange IDs come from? Does a user enter thousands of these IDs into a web form?

On Tue, Feb 5, 2013 at 8:49 PM, Luis Lebolo <luis.leb...@gmail.com> wrote:

> Hello! First time poster so {insert ignorance disclaimer here ;)}.
>
> I'm building a web application backed by an Oracle database and we're using
> Lucene Solr to index various lists of "entities" (via DIH). We then harness
> Solr's faceting to allow the user to filter through their searches.
>
> One aspect we're having trouble modeling is the concept of data
> availability. A dataset will have a data value for various entity pairs. To
> generalize, say we have two entities: Apples and Oranges. Therefore,
> there's a data value for various Apple and Orange pairs (e.g. apple1 &
> orange5 have value 6.566).
>
> The question we want to model is "which Apples have data for a specific set
> of Oranges." The problem is that the list of Oranges can be ~2000.
>
> Our first (and albeit ugly) approach was to create a dataAvailability field
> in each Apple document. It's a multi-valued field that holds a list of
> Oranges (actually a list of Orange IDs) that have data for that specific
> Apple.
>
> Our facet query then becomes ...facet.query=dataAvailability:(1 OR 2 OR 4
> OR 45 OR 200 OR ...)...
>
> For > 1000 Oranges, the query takes a long time to run the first time a
> user performs it (afterwards it gets cached so it runs fairly quickly). Any
> thoughts on how to speed this up? Is there a better model to use?
>
> One idea was to use the autowarming features. However, the list of Oranges
> will always be dynamically built by the user (and it's not feasible to
> autowarm all possible permutations of ~2000 Oranges =)).
>
> Hope the generalization isn't too stupid, and thanks in advance!
>
> Cheers,
> Luis
>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>