There's something that works a little bit like 'DISTINCT' called field collapsing. Take a look in the archives for it.
Dennis Gearon Signature Warning ---------------- EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Wed, 9/15/10, Andre Bickford <abickf...@softrek.com> wrote: > From: Andre Bickford <abickf...@softrek.com> > Subject: RE: Simple Filter Query (fq) Use Case Question > To: solr-user@lucene.apache.org > Date: Wednesday, September 15, 2010, 12:49 PM > Thanks for the response Erick. > > I did actually try exactly what you suggested. I flipped > the index over so that a gift is the document. This solution > certainly solves the previous problem, but introduces a new > issue where the search results show duplicate donors. If a > donor gave 12 times in a year, and we offer full years as > facet ranges, my understanding is that you'd see that donor > 12 times in the search results, once for each gift document. > Obviously I could do some client side filtering to list only > distinct donors, but I was hoping to avoid that. > > If I've simply stumbled into the basic tradeoffs of > denormalization, I can live with client side de-duplication, > but if you have any further suggestions I'm all eyes. > > As for sizing, we have some huge charities as clients. > However, right now I'm testing on a copy of prod data from a > smaller client with ~350,000 donors and ~8,000,000 gift > records. So, when I "flipped" the index around as you > suggested, it went from 350,000 documents to 8,000,000 > documents. No issues with performance at all. > > Thanks again, > Andre > > -----Original Message----- > From: Erick Erickson [mailto:erickerick...@gmail.com] > > Sent: Wednesday, September 15, 2010 3:09 PM > To: solr-user@lucene.apache.org > Subject: Re: Simple Filter Query (fq) Use Case Question > > One strategy is to denormalize all the way. That is, each > Solr "document" is Gift Amount and Gift Date would not be > multiValued. > You'd create a different "document" for each gift, so you'd > have multiple > documents with the same Id, Name, and Address. Be careful, > though, > if you've defined Id as a UniqueKey, you'd only have one > record/donor. You > can handle this easily enough by making a composite key of > Id+Gift Date > (assuming no donor made more than one gift on exactly the > same date). > > I know this goes completely against all the reflexes you've > built up with > working with DBs, but... > > Can you give us a clue how many donations we're talking > about here? > You'd have to be working with a really big nonprofit to get > enough documents > to have to start worrying about making your index smaller. > > HTH > Erick > > On Wed, Sep 15, 2010 at 1:41 PM, Andre Bickford <abickf...@softrek.com>wrote: > > > I'm working on creating a solr index search for a > charitable organization. > > The solr index stores documents of donors. Each donor > document has the > > following four fields: > > > > Id > > Name > > Address > > Gift Amount (multiValued) > > Gift Date (multiValued) > > > > In our relational database, there is a one-to-many > relationship between the > > DONOR table and the GIFT table. One donor can of > course give many gifts over > > time. Consequently, I created the Gift Amount and Gift > Date fields to be > > mutiValued. > > > > Now, consider the following query filtered for gifts > last month between $0 > > and $100: > > > > q=name:Jones > > fq=giftDate:[NOW/MONTH-1 TO NOW/MONTH] > > fq=giftAmount:[0 TO 100] > > > > The results show me donors who donated ANY amount in > the past month and > > donors who had EVER in the past given a gift between > $0 and $100. I was > > hoping to only see donors who had given a gift between > $0 and $100 in the > > past month exclusively. I believe the problem is that > I neglected to > > consider that for two multiValued fields, while the > values might align > > "index wise", there is really no other association > between the two fields, > > so the filter query intersection isn't really behaving > as I expected. > > > > I think this is a fundamental question of one-to-many > denormalization, but > > obviously I'm not yet experienced enough with > Lucene/Solr to find a > > solution. As to why not just keep using a relational > database, it's because > > I'm trying to provide a faceting solution to "drill > down" to donors. The > > aforementioned fq parameters would come from faceting. > Oh, that and Oracle > > Text indexes are a PITA. :-) > > > > Thanks for any help you can provide. > > > > André Bickford > > Software Engineering Team Leader > > SofTrek Corporation > > 30 Bryant Woods North Amherst, NY 14228 > > 716.691.2800 x154 800.442.9211 Fax: > 716.691.2828 > > abickf...@softrek.com > www.softrek.com > > > > > > > > >