Hi Andre,

Changing the entity in your index from donor to gift of course also
changes the scope of your search results. I find it helpful to think
such a change through from the "other" side (the result side).
If the users of your search application are ultimately looking for
individual gifts, then changing the index to gift is for the better.

If they are searching for donors, then I would rethink the change but
not discard it completely: you can still get the list of distinct donors
by faceting over the donor field. You can show the users that list of
donors (the facets), and they can choose one from it and get all the
information on that donor (restricted to the original query, of course).
That information would include the actual search result: the list of
that donor's gifts that matched the query.
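
A rough sketch of the two requests, in Python, that only builds the
request parameters. The field name donorId, the helper function names and
the facet.limit value are made up for illustration and would have to match
your schema; q, fq, rows, facet, facet.field, facet.mincount and
facet.limit are standard Solr parameters.

```python
from urllib.parse import urlencode

# Assumed name of the donor id field in a gift-per-document index.
DONOR_FIELD = "donorId"


def donor_facet_params(user_query, filters):
    """Parameters for a request that returns distinct donors as facets."""
    params = [("q", user_query)]
    # Keep the filters the user already selected (date and amount ranges).
    params += [("fq", f) for f in filters]
    params += [
        ("rows", "0"),                 # no gift documents, only the facet list
        ("facet", "true"),
        ("facet.field", DONOR_FIELD),  # one facet entry per distinct donor
        ("facet.mincount", "1"),       # skip donors without a matching gift
        ("facet.limit", "50"),         # hypothetical size of the donor list
    ]
    return params


def gifts_for_donor_params(user_query, filters, donor_id):
    """Same query, narrowed to the donor the user picked from the facets."""
    params = [("q", user_query)]
    params += [("fq", f) for f in filters]
    params.append(("fq", f"{DONOR_FIELD}:{donor_id}"))
    return params


filters = ["giftDate:[NOW/MONTH-1MONTH TO NOW/MONTH]", "giftAmount:[0 TO 100]"]
query_string = urlencode(donor_facet_params("name:Jones", filters))
print(query_string)
```

The facet counts then tell you how many matching gifts each donor has.
Since the facet values are only ids, the donor names would have to come
from a second lookup or from one of the gift documents.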

Cheers,
Chantal

On Wed, 2010-09-15 at 21:49 +0200, Andre Bickford wrote:
> Thanks for the response Erick.
> 
> I did actually try exactly what you suggested. I flipped the index over so 
> that a gift is the document. This solution certainly solves the previous 
> problem, but introduces a new issue where the search results show duplicate 
> donors. If a donor gave 12 times in a year, and we offer full years as facet 
> ranges, my understanding is that you'd see that donor 12 times in the search 
> results, once for each gift document. Obviously I could do some client side 
> filtering to list only distinct donors, but I was hoping to avoid that.
> 
> If I've simply stumbled into the basic tradeoffs of denormalization, I can 
> live with client side de-duplication, but if you have any further suggestions 
> I'm all eyes.
> 
> As for sizing, we have some huge charities as clients. However, right now I'm 
> testing on a copy of prod data from a smaller client with ~350,000 donors and 
> ~8,000,000 gift records. So, when I "flipped" the index around as you 
> suggested, it went from 350,000 documents to 8,000,000 documents. No issues 
> with performance at all.
> 
> Thanks again,
> Andre
> 
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com] 
> Sent: Wednesday, September 15, 2010 3:09 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Simple Filter Query (fq) Use Case Question
> 
> One strategy is to denormalize all the way. That is, each Solr
> "document" is a single gift, so Gift Amount and Gift Date would not be
> multiValued.
> You'd create a different "document" for each gift, so you'd have multiple
> documents with the same Id, Name, and Address. Be careful, though,
> if you've defined Id as a UniqueKey, you'd only have one record/donor. You
> can handle this easily enough by making a composite key of Id+Gift Date
> (assuming no donor made more than one gift on exactly the same date).
> 
> I know this goes completely against all the reflexes you've built up with
> working with DBs, but...
> 
> Can you give us a clue how many donations we're talking about here?
> You'd have to be working with a really big nonprofit to get enough documents
> to have to start worrying about making your index smaller.
> 
> HTH
> Erick
> 
> On Wed, Sep 15, 2010 at 1:41 PM, Andre Bickford <abickf...@softrek.com> wrote:
> 
> > I'm working on creating a Solr index search for a charitable organization.
> > The Solr index stores documents of donors. Each donor document has the
> > following five fields:
> >
> > Id
> > Name
> > Address
> > Gift Amount (multiValued)
> > Gift Date (multiValued)
> >
> > In our relational database, there is a one-to-many relationship between the
> > DONOR table and the GIFT table. One donor can of course give many gifts over
> > time. Consequently, I created the Gift Amount and Gift Date fields to be
> > multiValued.
> >
> > Now, consider the following query filtered for gifts last month between $0
> > and $100:
> >
> > q=name:Jones
> > fq=giftDate:[NOW/MONTH-1MONTH TO NOW/MONTH]
> > fq=giftAmount:[0 TO 100]
> >
> > The results show me donors who donated ANY amount in the past month and
> > donors who had EVER in the past given a gift between $0 and $100. I was
> > hoping to only see donors who had given a gift between $0 and $100 in the
> > past month exclusively. I believe the problem is that I neglected to
> > consider that for two multiValued fields, while the values might align
> > "index wise", there is really no other association between the two fields,
> > so the filter query intersection isn't really behaving as I expected.
> >
> > I think this is a fundamental question of one-to-many denormalization, but
> > obviously I'm not yet experienced enough with Lucene/Solr to find a
> > solution. As to why not just keep using a relational database, it's because
> > I'm trying to provide a faceting solution to "drill down" to donors. The
> > aforementioned fq parameters would come from faceting. Oh, that and Oracle
> > Text indexes are a PITA. :-)
> >
> > Thanks for any help you can provide.
> >
> > André Bickford
> > Software Engineering Team Leader
> > SofTrek Corporation
> > 30 Bryant Woods North  Amherst, NY 14228
> > 716.691.2800 x154  800.442.9211  Fax: 716.691.2828
> > abickf...@softrek.com  www.softrek.com
> >
> >
> >
> 

