Hello Luis,

Your problem statement seems fairly clear, though the problem itself is hard to solve.
Where does this set of Orange IDs come from? Does a user enter thousands of
these IDs into a web form?
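
For reference, here is a rough sketch of how I understand the facet query
quoted below to be assembled from a user-supplied ID list. Only the field
name dataAvailability and the OR-list shape come from your message; the
helper names, the q=*:* main query, and rows=0 are my assumptions for
illustration.

```python
from urllib.parse import urlencode


def build_facet_query(field, orange_ids):
    """Return one facet.query clause that ORs the given IDs on a single field."""
    if not orange_ids:
        raise ValueError("at least one Orange ID is required")
    clause = " OR ".join(str(orange_id) for orange_id in orange_ids)
    return f"{field}:({clause})"


def build_params(orange_ids):
    """URL-encode the request parameters (q and rows are assumed values)."""
    return urlencode({
        "q": "*:*",        # assumption: facet over all Apple documents
        "rows": 0,         # assumption: only the facet count is needed
        "facet": "true",
        "facet.query": build_facet_query("dataAvailability", orange_ids),
    })


facet_query = build_facet_query("dataAvailability", [1, 2, 4, 45, 200])
encoded = build_params([1, 2, 4, 45, 200])
```

With ~2000 IDs the encoded string gets long, so sending it in a POST body
rather than in the URL may be more practical.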


On Tue, Feb 5, 2013 at 8:49 PM, Luis Lebolo <luis.leb...@gmail.com> wrote:

> Hello! First time poster so {insert ignorance disclaimer here ;)}.
>
> I'm building a web application backed by an Oracle database and we're using
> Lucene Solr to index various lists of "entities" (via DIH). We then harness
> Solr's faceting to allow the user to filter through their searches.
>
> One aspect we're having trouble modeling is the concept of data
> availability. A dataset will have a data value for various entity pairs. To
> generalize, say we have two entities: Apples and Oranges. Therefore,
> there's a data value for various Apple and Orange pairs (e.g. apple1 &
> orange5 have value 6.566).
>
> The question we want to model is "which Apples have data for a specific set
> of Oranges." The problem is that the list of Oranges can be ~2000.
>
> Our first (albeit ugly) approach was to create a dataAvailability field
> in each Apple document. It's a multi-valued field that holds a list of
> Oranges (actually a list of Orange IDs) that have data for that specific
> Apple.
>
> Our facet query then becomes ...facet.query=dataAvailability:(1 OR 2 OR 4
> OR 45 OR 200 OR ...)...
>
> For > 1000 Oranges, the query takes a long time to run the first time a
> user performs it (afterwards it gets cached so it runs fairly quickly). Any
> thoughts on how to speed this up? Is there a better model to use?
>
> One idea was to use the autowarming features. However, the list of Oranges
> will always be dynamically built by the user (and it's not feasible to
> autowarm all possible permutations of ~2000 Oranges =)).
>
> Hope the generalization isn't too stupid, and thanks in advance!
>
> Cheers,
> Luis
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
