Re: Querying for ~2000 integers - better model?

Jack Krupansky Tue, 05 Feb 2013 09:50:16 -0800

Could you describe in more detail what the user queries (not the facet/filters) would actually look like? What are they actually looking for in terms of "documents"?

In terms of modeling, the idea behind a query is that it identifies a set of documents which will then be scored for relevancy, or otherwise sorted by some single-valued field(s). So, again, what do the "documents" need to contain?

Multivalued fields do have some reasonable uses, but they are not uniformly well-supported by all Solr features, so try starting with a data model that uses multiple documents rather than multi-valued fields. In other words, fully flatten or denormalize your data first. That might not be ideal, but it is a better starting point.
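
As a rough illustration of what "fully flatten" could mean for the Apple/Orange data described in the quoted message below, here is a minimal Python sketch that emits one document per (Apple, Orange) pair instead of one Apple document with a multi-valued field. The field names (apple_id, orange_id, value) and the shape of the input are assumptions made up for this example, not part of the original schema.

```python
def flatten_pairs(apples):
    """Build one flat document per (Apple, Orange) pair.

    `apples` is a hypothetical input: a list of dicts, each with an "id"
    and a "data" mapping of Orange ID -> data value.
    """
    docs = []
    for apple in apples:
        for orange_id, value in apple["data"].items():
            docs.append({
                # Hypothetical unique key combining both entity IDs.
                "id": f"{apple['id']}_{orange_id}",
                "apple_id": apple["id"],
                "orange_id": orange_id,
                "value": value,
            })
    return docs


# Example input: apple1 has data for oranges 5 and 7, apple2 only for orange 5.
apples = [
    {"id": "apple1", "data": {5: 6.566, 7: 1.25}},
    {"id": "apple2", "data": {5: 0.5}},
]
docs = flatten_pairs(apples)
```

Each resulting document is single-valued in every field, at the cost of more documents in the index.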


-- Jack Krupansky

-----Original Message-----
From: Mikhail Khludnev
Sent: Tuesday, February 05, 2013 12:03 PM
To: solr-user
Subject: Re: Querying for ~2000 integers - better model?

Hello Luis,

Your problem seems fairly clear (and hard to solve).
Where does this set of Orange IDs come from? Does a user enter thousands of
these IDs into a web form?

On Tue, Feb 5, 2013 at 8:49 PM, Luis Lebolo <luis.leb...@gmail.com> wrote:

Hello! First time poster so {insert ignorance disclaimer here ;)}.

I'm building a web application backed by an Oracle database and we're using
Lucene Solr to index various lists of "entities" (via DIH). We then harness
Solr's faceting to allow the user to filter through their searches.

One aspect we're having trouble modeling is the concept of data
availability. A dataset will have a data value for various entity pairs. To
generalize, say we have two entities: Apples and Oranges. Therefore,
there's a data value for various Apple and Orange pairs (e.g. apple1 &
orange5 have value 6.566).

The question we want to model is "which Apples have data for a specific set
of Oranges." The problem is that the list of Oranges can be ~2000.

Our first (albeit ugly) approach was to create a dataAvailability field
in each Apple document. It's a multi-valued field that holds a list of
Oranges (actually a list of Orange IDs) that have data for that specific
Apple.

Our facet query then becomes ...facet.query=dataAvailability:(1 OR 2 OR 4
OR 45 OR 200 OR ...)...
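
For illustration only, a facet query of this shape could be assembled from a user-selected list of Orange IDs roughly as follows. This is a sketch, not the application's actual code: the helper name build_facet_query and the surrounding parameter values (q=*:*, rows=0) are assumptions; only the dataAvailability field and the facet.query parameter come from the description above.

```python
from urllib.parse import urlencode


def build_facet_query(orange_ids, field="dataAvailability"):
    """Return a clause like 'dataAvailability:(1 OR 2 OR 4)'."""
    clause = " OR ".join(str(orange_id) for orange_id in orange_ids)
    return f"{field}:({clause})"


# Hypothetical request parameters; only facet.query mirrors the text above.
params = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.query": build_facet_query([1, 2, 4, 45, 200]),
}
query_string = urlencode(params)
```

With a couple of thousand IDs the encoded string gets long, so sending the parameters in a POST body rather than in the URL may be the safer choice.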

For > 1000 Oranges, the query takes a long time to run the first time a
user performs it (afterwards it gets cached so it runs fairly quickly). Any
thoughts on how to speed this up? Is there a better model to use?

One idea was to use the autowarming features. However, the list of Oranges
will always be dynamically built by the user (and it's not feasible to
autowarm all possible permutations of ~2000 Oranges =)).

Hope the generalization isn't too stupid, and thanks in advance!

Cheers,
Luis




--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>

<mkhlud...@griddynamics.com>
