Re: Querying for ~2000 integers - better model?

Luis Lebolo Tue, 05 Feb 2013 09:48:00 -0800

Hi Mikhail,

Thanks for the interest! The user selects various Oranges from the website.
The list of Orange IDs then gets placed into a table in our database.


For example, the user may want to search for Oranges from Florida (a state
filter) planted a week ago (a date filter). We then display 600 Oranges
that fit this query and the user says "select them all". We then store all
600 IDs in our database.

For the data availability filter, we get the list of Orange IDs from the
database first then use SolrJ to create the facet query.
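
A minimal sketch of that last step, assuming the IDs have already been read
from the database into a list. The class and method names here are
hypothetical (they are not part of SolrJ); the sketch only builds the query
string in the dataAvailability:(1 OR 2 OR ...) form.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of SolrJ: it only builds the facet query string.
final class FacetQueryBuilder {

    // Joins the selected Orange IDs into field:(id1 OR id2 OR ...).
    static String build(String field, List<Integer> orangeIds) {
        if (orangeIds.isEmpty()) {
            throw new IllegalArgumentException("at least one Orange ID is required");
        }
        String clauses = orangeIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(" OR "));
        return field + ":(" + clauses + ")";
    }
}
```

The resulting string would then be handed to SolrJ, for example through
SolrQuery.addFacetQuery(String).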

-Luis


On Tue, Feb 5, 2013 at 12:03 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Hello Luis,
>
> Your problem is clear enough (and it looks like a hard one to solve).
> Where does this set of Orange IDs come from? Does a user enter thousands of
> these IDs into a web form?
>
>
> On Tue, Feb 5, 2013 at 8:49 PM, Luis Lebolo <luis.leb...@gmail.com> wrote:
>
> > Hello! First time poster so {insert ignorance disclaimer here ;)}.
> >
> > I'm building a web application backed by an Oracle database and we're using
> > Lucene Solr to index various lists of "entities" (via DIH). We then harness
> > Solr's faceting to allow the user to filter through their searches.
> >
> > One aspect we're having trouble modeling is the concept of data
> > availability. A dataset will have a data value for various entity pairs. To
> > generalize, say we have two entities: Apples and Oranges. Therefore,
> > there's a data value for various Apple and Orange pairs (e.g. apple1 &
> > orange5 have value 6.566).
> >
> > The question we want to model is "which Apples have data for a specific set
> > of Oranges." The problem is that the list of Oranges can be ~2000.
> >
> > Our first (albeit ugly) approach was to create a dataAvailability field
> > in each Apple document. It's a multi-valued field that holds a list of
> > Oranges (actually a list of Orange IDs) that have data for that specific
> > Apple.
> >
> > Our facet query then becomes ...facet.query=dataAvailability:(1 OR 2 OR 4
> > OR 45 OR 200 OR ...)...
> >
> > For > 1000 Oranges, the query takes a long time to run the first time a
> > user performs it (afterwards it gets cached so it runs fairly quickly). Any
> > thoughts on how to speed this up? Is there a better model to use?
> >
> > One idea was to use the autowarming features. However, the list of Oranges
> > will always be dynamically built by the user (and it's not feasible to
> > autowarm all possible permutations of ~2000 Oranges =)).
> >
> > Hope the generalization isn't too stupid, and thanks in advance!
> >
> > Cheers,
> > Luis
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <mkhlud...@griddynamics.com>
>
