Solr has a feature called field collapsing that works a little like SQL's
'DISTINCT': it can return one result per donor instead of one per gift.
Search the list archives for it.
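
For anyone finding this thread later: in later Solr releases, field collapsing is exposed as "result grouping" through the `group` and `group.field` request parameters. Here is a rough sketch of what such a request could look like against the gift-per-document index. The field name `donorId` is an assumption (any single-valued field holding the donor's Id would do), and the helper function is made up for illustration.

```python
from urllib.parse import urlencode


def grouped_query_params(name, donor_field="donorId"):
    """Build request parameters that collapse gift documents to one per donor.

    `donor_field` is a hypothetical field holding the donor's Id on each
    gift document; adjust it to match the real schema.
    """
    return [
        ("q", f"name:{name}"),
        # Both filters now apply to the same gift document.
        ("fq", "giftDate:[NOW/MONTH-1MONTH TO NOW/MONTH]"),
        ("fq", "giftAmount:[0 TO 100]"),
        # Result grouping: one group per distinct donor.
        ("group", "true"),
        ("group.field", donor_field),
        # Return a single representative gift per donor.
        ("group.limit", "1"),
    ]


query_string = urlencode(grouped_query_params("Jones"))
print(query_string)
```

The resulting string would be appended to the core's `/select` URL. The response then contains one group per donor rather than one entry per gift.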

Dennis Gearon


--- On Wed, 9/15/10, Andre Bickford <abickf...@softrek.com> wrote:

> From: Andre Bickford <abickf...@softrek.com>
> Subject: RE: Simple Filter Query (fq) Use Case Question
> To: solr-user@lucene.apache.org
> Date: Wednesday, September 15, 2010, 12:49 PM
> Thanks for the response Erick.
> 
> I did actually try exactly what you suggested. I flipped
> the index over so that a gift is the document. This solution
> certainly solves the previous problem, but introduces a new
> issue where the search results show duplicate donors. If a
> donor gave 12 times in a year, and we offer full years as
> facet ranges, my understanding is that you'd see that donor
> 12 times in the search results, once for each gift document.
> Obviously I could do some client side filtering to list only
> distinct donors, but I was hoping to avoid that.
> 
> If I've simply stumbled into the basic tradeoffs of
> denormalization, I can live with client side de-duplication,
> but if you have any further suggestions I'm all eyes.
> 
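
If client-side de-duplication does turn out to be the fallback, it can be quite small. A minimal sketch, assuming each gift document carries a hypothetical `donorId` field and that the results arrive as a list of dictionaries:

```python
def distinct_donors(gift_docs, donor_key="donorId"):
    """Return the first gift document seen for each donor, preserving order."""
    seen = set()
    unique = []
    for doc in gift_docs:
        donor = doc[donor_key]
        if donor not in seen:
            seen.add(donor)
            unique.append(doc)
    return unique


# Made-up sample results: donor 42 gave twice, donor 7 gave once.
results = [
    {"donorId": "42", "name": "Jones", "giftAmount": 50},
    {"donorId": "42", "name": "Jones", "giftAmount": 75},
    {"donorId": "7", "name": "Jones", "giftAmount": 20},
]
donors = distinct_donors(results)
print([d["donorId"] for d in donors])
```

One caveat with this approach: the hit count Solr reports still counts gifts, not donors, so paging through results may show fewer donors per page than requested.
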
> As for sizing, we have some huge charities as clients.
> However, right now I'm testing on a copy of prod data from a
> smaller client with ~350,000 donors and ~8,000,000 gift
> records. So, when I "flipped" the index around as you
> suggested, it went from 350,000 documents to 8,000,000
> documents. No issues with performance at all.
> 
> Thanks again,
> Andre
> 
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, September 15, 2010 3:09 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Simple Filter Query (fq) Use Case Question
> 
> One strategy is to denormalize all the way. That is, each Solr "document"
> is a single gift, so Gift Amount and Gift Date would not be multiValued.
> You'd create a different "document" for each gift, so you'd have multiple
> documents with the same Id, Name, and Address. Be careful, though: if
> you've defined Id as the uniqueKey, you'd only have one record per donor.
> You can handle this easily enough by making a composite key of
> Id+Gift Date (assuming no donor made more than one gift on exactly the
> same date).
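
The flattening described above might look roughly like this at indexing time. This is only a sketch: the input record shape, the `donorId` field, and the `_` separator in the composite key are all assumptions, not anything prescribed by Solr.

```python
def flatten_donor(donor):
    """Turn one donor record with many gifts into one document per gift.

    The composite `id` (donor Id + gift date) keeps documents for the same
    donor from overwriting each other when `id` is the uniqueKey. It assumes
    no donor gives twice on exactly the same date.
    """
    docs = []
    for gift in donor["gifts"]:
        docs.append({
            "id": f"{donor['id']}_{gift['date']}",
            "donorId": donor["id"],
            "name": donor["name"],
            "address": donor["address"],
            "giftAmount": gift["amount"],
            "giftDate": gift["date"],
        })
    return docs


# Made-up sample donor with two gifts.
sample = {
    "id": "42",
    "name": "Jones",
    "address": "1 Main St",
    "gifts": [
        {"amount": 50, "date": "2010-08-20T00:00:00Z"},
        {"amount": 500, "date": "2009-03-01T00:00:00Z"},
    ],
}
gift_docs = flatten_donor(sample)
print([d["id"] for d in gift_docs])
```
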
> 
> I know this goes completely against all the reflexes you've built up
> working with DBs, but...
> 
> Can you give us a clue how many donations we're talking about here?
> You'd have to be working with a really big nonprofit to get enough
> documents to have to start worrying about making your index smaller.
> 
> HTH
> Erick
> 
> On Wed, Sep 15, 2010 at 1:41 PM, Andre Bickford <abickf...@softrek.com> wrote:
> 
> > I'm working on creating a Solr index search for a charitable organization.
> > The Solr index stores documents of donors. Each donor document has the
> > following five fields:
> >
> > Id
> > Name
> > Address
> > Gift Amount (multiValued)
> > Gift Date (multiValued)
> >
> > In our relational database, there is a one-to-many relationship between the
> > DONOR table and the GIFT table. One donor can of course give many gifts over
> > time. Consequently, I created the Gift Amount and Gift Date fields to be
> > multiValued.
> >
> > Now, consider the following query filtered for gifts last month between $0
> > and $100:
> >
> > q=name:Jones
> > fq=giftDate:[NOW/MONTH-1MONTH TO NOW/MONTH]
> > fq=giftAmount:[0 TO 100]
> >
> > The results show me donors who donated ANY amount in the past month and
> > who had EVER in the past given a gift between $0 and $100. I was hoping
> > to see only donors who had given a gift between $0 and $100 within the
> > past month. I believe the problem is that I neglected to consider that
> > for two multiValued fields, while the values might align "index wise",
> > there is really no other association between the two fields, so the
> > filter query intersection isn't really behaving as I expected.
> >
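
The mismatch described above can be reproduced outside Solr. Here is a small simulation in plain Python, using a made-up donor and a fixed "last month" window. It only illustrates how each filter is checked against the whole multiValued field on its own; it is not how Solr is implemented.

```python
from datetime import date

# One donor document with two gifts stored in parallel multiValued fields:
# $500 last month, and $50 more than a year ago.
donor = {
    "name": "Jones",
    "giftDate": [date(2010, 8, 20), date(2009, 3, 1)],
    "giftAmount": [500, 50],
}


def in_last_month(d):
    # Fixed window standing in for the date range filter.
    return date(2010, 8, 1) <= d <= date(2010, 9, 1)


def is_small(amount):
    return 0 <= amount <= 100


# Each fq is satisfied if ANY value of its field matches.
matches_date = any(in_last_month(d) for d in donor["giftDate"])
matches_amount = any(is_small(a) for a in donor["giftAmount"])
matches_both_filters = matches_date and matches_amount

# What was intended: a SINGLE gift must satisfy both conditions.
same_gift_matches = any(
    in_last_month(d) and is_small(a)
    for d, a in zip(donor["giftDate"], donor["giftAmount"])
)

print(matches_both_filters, same_gift_matches)
```

The donor passes both filters even though no single gift is both recent and small, which is why one document per gift avoids the problem.
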
> > I think this is a fundamental question of one-to-many denormalization, but
> > obviously I'm not yet experienced enough with Lucene/Solr to find a
> > solution. As to why not just keep using a relational database, it's because
> > I'm trying to provide a faceting solution to "drill down" to donors. The
> > aforementioned fq parameters would come from faceting. Oh, that and Oracle
> > Text indexes are a PITA. :-)
> >
> > Thanks for any help you can provide.
> >
> > André Bickford
> > Software Engineering Team Leader
> > SofTrek Corporation
> > 30 Bryant Woods North  Amherst, NY 14228
> > 716.691.2800 x154  800.442.9211  Fax: 716.691.2828
> > abickf...@softrek.com  www.softrek.com
> >
> >
> >
> 
> 
>
