Re: Simple Filter Query (fq) Use Case Question

Jonathan Rochkind Thu, 16 Sep 2010 11:20:55 -0700

One Solr core has essentially one index in it (not just one 'field', but one indexed collection of documents). There are weird hacks; I believe the spellcheck component, for example, creates its own sub-indexes, though I'm not sure how it does that.

You can have more than one core in a single Solr instance, but they're essentially separate: there's no easy way to 'join' across them or anything, and a given request targets one core.
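
To illustrate "a given request targets one core", here is a minimal sketch of how per-core request URLs are usually built. The base URL and the core names `donors` and `gifts` are assumptions for illustration only; nothing in this thread names them.

```python
from urllib.parse import urlencode

# Assumed base URL and core names; neither comes from this thread.
SOLR_BASE = "http://localhost:8983/solr"

def select_url(core, params):
    """Build a select URL that targets exactly one core."""
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"

# Two separate requests, one per core. Nothing joins their results;
# any combining would have to happen in the client.
donors_url = select_url("donors", {"q": "name:Jones"})
gifts_url = select_url("gifts", {"q": "name:Jones"})

print(donors_url)
print(gifts_url)
```

Each URL names a single core in its path, so searching two cores means two requests and two independent result sets.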


Dennis Gearon wrote:

This brings me to ask a question that's been on my mind for a while.

Are indexes set up for the whole site, or for a set of searches, with several different indexes for a site?

How many indexes does one Solr/Lucene instance have access to (not counting shards/segments)?
Dennis Gearon



--- On Thu, 9/16/10, Chantal Ackermann <chantal.ackerm...@btelligent.de> wrote:

From: Chantal Ackermann <chantal.ackerm...@btelligent.de>
Subject: RE: Simple Filter Query (fq) Use Case Question
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Date: Thursday, September 16, 2010, 1:05 AM
Hi Andre,

Changing the entity in your index from donor to gift of course changes the scope of your search results. I found it helpful to rethink such a change from the "other" side (the result side). If the users of your search application are ultimately looking for individual gifts, then changing the index to gift is for the better.

If they are searching for donors, then I would rethink the change but not discard it completely: you can still get the list of distinct donors by faceting over donors. You can show the users that list of donors (the facets), and they can choose from it and get all information on that donor (restricted to the original query, of course). The information would include the actual search result: a list of the gifts that passed the query.
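
A rough sketch of what such a faceting request could look like against a gift-per-document index. The field name `donorId` is an assumption (the thread does not name the donor field in the flipped index), and the date filter is written as `NOW/MONTH-1MONTH`, on the understanding that Solr date math needs a unit after the number. The `facet`, `facet.field`, `facet.mincount` and `rows` parameters are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Request parameters as (name, value) pairs so that fq can repeat.
# "donorId" is a hypothetical field holding the donor's Id on each gift.
params = [
    ("q", "name:Jones"),
    ("fq", "giftDate:[NOW/MONTH-1MONTH TO NOW/MONTH]"),
    ("fq", "giftAmount:[0 TO 100]"),
    ("rows", "0"),               # only the facet counts are needed here
    ("facet", "true"),
    ("facet.field", "donorId"),  # one facet value per distinct donor
    ("facet.mincount", "1"),     # skip donors with no matching gift
]

query_string = urlencode(params)
print(query_string)
```

Each facet value returned is one distinct donor, with a count of that donor's gifts that passed the query. A follow-up request with an extra `fq` on the chosen donor would then fetch that donor's matching gifts.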

Cheers,
Chantal

On Wed, 2010-09-15 at 21:49 +0200, Andre Bickford wrote:

Thanks for the response Erick.

I did actually try exactly what you suggested. I flipped the index over so that a gift is the document. This solution certainly solves the previous problem, but introduces a new issue where the search results show duplicate donors. If a donor gave 12 times in a year, and we offer full years as facet ranges, my understanding is that you'd see that donor 12 times in the search results, once for each gift document. Obviously I could do some client side filtering to list only distinct donors, but I was hoping to avoid that.
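
For reference, the client side filtering mentioned here can be quite small. This is a minimal sketch, assuming each gift document carries a `donorId` and `name` field (hypothetical names; the thread does not specify them) and that results arrive as a list of dicts.

```python
def distinct_donors(gift_docs):
    """Return one entry per donor, in order of first appearance."""
    seen = set()
    donors = []
    for doc in gift_docs:
        if doc["donorId"] not in seen:
            seen.add(doc["donorId"])
            donors.append({"donorId": doc["donorId"], "name": doc["name"]})
    return donors

# Made-up search results: donor 7 gave twice, donor 9 once.
results = [
    {"donorId": 7, "name": "Jones", "giftAmount": 25},
    {"donorId": 9, "name": "Jones", "giftAmount": 80},
    {"donorId": 7, "name": "Jones", "giftAmount": 60},
]

unique = distinct_donors(results)
print(unique)
```

One caveat with this approach: de-duplicating after the fact interacts badly with paging, since a page of 10 gift documents may contain fewer than 10 distinct donors.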

If I've simply stumbled into the basic tradeoffs of denormalization, I can live with client side de-duplication, but if you have any further suggestions I'm all eyes.

As for sizing, we have some huge charities as clients. However, right now I'm testing on a copy of prod data from a smaller client with ~350,000 donors and ~8,000,000 gift records. So, when I "flipped" the index around as you suggested, it went from 350,000 documents to 8,000,000 documents. No issues with performance at all.

Thanks again,
Andre

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, September 15, 2010 3:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Simple Filter Query (fq) Use Case Question

One strategy is to denormalize all the way. That is, each Solr "document" is a single gift, so Gift Amount and Gift Date would not be multiValued. You'd create a different "document" for each gift, so you'd have multiple documents with the same Id, Name, and Address. Be careful, though: if you've defined Id as a UniqueKey, you'd only have one record/donor, because each newly added gift would replace the previous document with the same key. You can handle this easily enough by making a composite key of Id+Gift Date (assuming no donor made more than one gift on exactly the same date).
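
The flattening described above could be sketched roughly as follows. The output field names (`donorId`, `name`, `address`, `giftAmount`, `giftDate`) and the `_` separator in the key are assumptions for illustration, and the sample donor is made up.

```python
from datetime import date

def gift_documents(donor):
    """Expand one donor record into one flat document per gift."""
    docs = []
    for amount, gift_date in zip(donor["giftAmount"], donor["giftDate"]):
        docs.append({
            # Composite key: donor Id plus gift date. This is only unique
            # if no donor gives twice on exactly the same date.
            "id": f"{donor['id']}_{gift_date.isoformat()}",
            "donorId": donor["id"],
            "name": donor["name"],
            "address": donor["address"],
            "giftAmount": amount,
            "giftDate": gift_date.isoformat(),
        })
    return docs

# Made-up donor with two gifts.
donor = {
    "id": 42,
    "name": "Jones",
    "address": "1 Main St",
    "giftAmount": [50, 5000],
    "giftDate": [date(2010, 1, 15), date(2010, 8, 20)],
}

docs = gift_documents(donor)
for doc in docs:
    print(doc["id"], doc["giftAmount"])
```

If the relational GIFT table has its own primary key, using that as the document key would sidestep the same-date assumption entirely.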

I know this goes completely against all the reflexes you've built up with working with DBs, but...

Can you give us a clue how many donations we're talking about here? You'd have to be working with a really big nonprofit to get enough documents to have to start worrying about making your index smaller.

HTH
Erick

On Wed, Sep 15, 2010 at 1:41 PM, Andre Bickford <abickf...@softrek.com> wrote:

I'm working on creating a Solr index search for a charitable organization. The Solr index stores documents of donors. Each donor document has the following five fields:

Id
Name
Address
Gift Amount (multiValued)
Gift Date (multiValued)

In our relational database, there is a one-to-many relationship between the DONOR table and the GIFT table. One donor can of course give many gifts over time. Consequently, I created the Gift Amount and Gift Date fields to be multiValued.

Now, consider the following query filtered for gifts last month between $0 and $100:

q=name:Jones
fq=giftDate:[NOW/MONTH-1MONTH TO NOW/MONTH]
fq=giftAmount:[0 TO 100]

The results show me donors who donated ANY amount in the past month and donors who had EVER in the past given a gift between $0 and $100. I was hoping to only see donors who had given a gift between $0 and $100 in the past month exclusively. I believe the problem is that I neglected to consider that for two multiValued fields, while the values might align "index wise", there is really no other association between the two fields, so the filter query intersection isn't really behaving as I expected.
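
The effect described above can be reproduced without Solr at all. The following is a small simulation with made-up data, not Solr code: it only models the idea that each filter is checked against all values of its own field, with no link between the n-th amount and the n-th date. The fixed August 2010 range stands in for "last month".

```python
from datetime import date

# Made-up donor documents shaped like the schema described above.
# Gifts are stored in two parallel multiValued fields.
donors = [
    {"id": 1, "name": "Jones",
     "giftAmount": [50, 5000],
     "giftDate": [date(2009, 3, 1), date(2010, 8, 20)]},
    {"id": 2, "name": "Jones",
     "giftAmount": [75],
     "giftDate": [date(2010, 8, 5)]},
]

DATE_LO, DATE_HI = date(2010, 8, 1), date(2010, 8, 31)
AMT_LO, AMT_HI = 0, 100

def matches_independently(doc):
    """Each filter passes if ANY value of its own field is in range."""
    date_ok = any(DATE_LO <= d <= DATE_HI for d in doc["giftDate"])
    amount_ok = any(AMT_LO <= a <= AMT_HI for a in doc["giftAmount"])
    return date_ok and amount_ok

def matches_same_gift(doc):
    """Both conditions must hold for the SAME gift (the intended result)."""
    return any(
        DATE_LO <= d <= DATE_HI and AMT_LO <= a <= AMT_HI
        for a, d in zip(doc["giftAmount"], doc["giftDate"])
    )

independent_ids = [d["id"] for d in donors if matches_independently(d)]
same_gift_ids = [d["id"] for d in donors if matches_same_gift(d)]

print(independent_ids)  # [1, 2]
print(same_gift_ids)    # [2]
```

Donor 1 gave $50 in 2009 and $5,000 last month, so both filters pass independently even though no single gift satisfies both. Only donor 2 has one gift that is both in range and in the past month.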

I think this is a fundamental question of one-to-many denormalization, but obviously I'm not yet experienced enough with Lucene/Solr to find a solution. As to why not just keep using a relational database, it's because I'm trying to provide a faceting solution to "drill down" to donors. The aforementioned fq parameters would come from faceting. Oh, that and Oracle Text indexes are a PITA. :-)

Thanks for any help you can provide.

André Bickford
Software Engineering Team Leader
SofTrek Corporation
30 Bryant Woods North  Amherst, NY 14228
716.691.2800 x154  800.442.9211  Fax: 716.691.2828
abickf...@softrek.com
www.softrek.com
