Thanks to everyone for your suggestions. It seems that making gifts the top-level entity in the index is the appropriate approach, so I can filter gifts on both gift amount and gift date without running into multiValued field issues. It introduces a new problem, listing donors multiple times, but that can be addressed by the field collapsing feature, which will hopefully be completed in trunk soon.
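To make the multiValued pitfall and the gift-level fix concrete, here is a small in-memory Python sketch. It is a model, not Solr itself. It mimics how two independent fq clauses each match any value of a multiValued field. It then shows gift-level documents plus a simple "first hit per donor" collapse standing in for field collapsing. All field names and sample data are hypothetical. Later Solr releases expose this kind of collapsing as result grouping (`group=true`, `group.field=...`), but the code below only models the behavior client-side.

```python
from datetime import date

# Hypothetical sample data: field names and values are illustrative only.
GIFTS = [
    {"gift_id": 1, "donor_id": "D1", "amount": 500, "gift_date": date(2010, 8, 20)},
    {"gift_id": 2, "donor_id": "D1", "amount": 50, "gift_date": date(2005, 3, 1)},
    {"gift_id": 3, "donor_id": "D2", "amount": 25, "gift_date": date(2010, 8, 5)},
    {"gift_id": 4, "donor_id": "D2", "amount": 75, "gift_date": date(2010, 8, 28)},
]

# "Last month" and "$0 to $100", both inclusive like Solr's [a TO b] range syntax.
MONTH_START, MONTH_END = date(2010, 8, 1), date(2010, 8, 31)
LOW, HIGH = 0, 100


def donor_docs(gifts):
    """Fold gifts into donor-level docs with parallel multiValued lists."""
    docs = {}
    for g in gifts:
        doc = docs.setdefault(g["donor_id"], {"donor_id": g["donor_id"], "amounts": [], "dates": []})
        doc["amounts"].append(g["amount"])
        doc["dates"].append(g["gift_date"])
    return list(docs.values())


def donor_level_match(doc):
    """Each fq is checked on its own: ANY date in range AND ANY amount in range,
    with nothing requiring both values to come from the same gift."""
    date_ok = any(MONTH_START <= d <= MONTH_END for d in doc["dates"])
    amount_ok = any(LOW <= a <= HIGH for a in doc["amounts"])
    return date_ok and amount_ok


def gift_level_match(gift):
    """With one document per gift, both filters apply to the same gift."""
    return MONTH_START <= gift["gift_date"] <= MONTH_END and LOW <= gift["amount"] <= HIGH


def collapse(hits, field):
    """Keep the first hit for each value of `field`, preserving result order."""
    seen, kept = set(), []
    for hit in hits:
        if hit[field] not in seen:
            seen.add(hit[field])
            kept.append(hit)
    return kept


if __name__ == "__main__":
    print([d["donor_id"] for d in donor_docs(GIFTS) if donor_level_match(d)])  # ['D1', 'D2']
    hits = [g for g in GIFTS if gift_level_match(g)]
    print([g["gift_id"] for g in hits])  # [3, 4]
    print([h["donor_id"] for h in collapse(hits, "donor_id")])  # ['D2']
```

D1 is the false positive from the original query. One gift satisfies the date filter, and a different, years-old gift satisfies the amount filter. At gift level only D2's gifts match, and collapsing on the donor id lists D2 once.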
For anyone else who is looking for information on the Solr equivalent of "select distinct", check out these resources:

http://wiki.apache.org/solr/FieldCollapsing
https://issues.apache.org/jira/browse/SOLR-236

On Sep 16, 2010, at 2:26 PM, Dennis Gearon wrote:

> So THAT'S what a core is! I have been wondering. Thank you very much!
>
> Dennis Gearon
>
> Signature Warning
> ----------------
> EARTH has a Right To Life,
> otherwise we all die.
>
> Read 'Hot, Flat, and Crowded'
> Laugh at http://www.yert.com/film.php
>
> --- On Thu, 9/16/10, Jonathan Rochkind <rochk...@jhu.edu> wrote:
>
>> From: Jonathan Rochkind <rochk...@jhu.edu>
>> Subject: Re: Simple Filter Query (fq) Use Case Question
>> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>> Date: Thursday, September 16, 2010, 11:20 AM
>>
>> One Solr core has essentially one index in it (not only one 'field', but one indexed collection of documents). There are weird hacks: I believe the spellcheck component kind of creates its own sub-indexes, though I'm not sure how it does that.
>>
>> You can have more than one core in a single Solr instance, but they're essentially separate: there's no easy way to 'join' across them or anything, and a given request targets one core.
>>
>> Dennis Gearon wrote:
>>> This brings me to ask a question that's been on my mind for a while.
>>>
>>> Are indexes set up for the whole site, or for a set of searches, with several different indexes for a site?
>>>
>>> How many instances does one Solr/Lucene instance have access to (not counting shards/segments)?
>>>
>>> Dennis Gearon
>>>
>>> --- On Thu, 9/16/10, Chantal Ackermann <chantal.ackerm...@btelligent.de> wrote:
>>>
>>>> From: Chantal Ackermann <chantal.ackerm...@btelligent.de>
>>>> Subject: RE: Simple Filter Query (fq) Use Case Question
>>>> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>>>> Date: Thursday, September 16, 2010, 1:05 AM
>>>>
>>>> Hi Andre,
>>>>
>>>> Changing the entity in your index from donor to gift of course changes the scope of your search results. I found it helpful to re-think such a change from the "other" side (the result side). If, in the end, the users of your search application are looking for individual gifts, then changing the index to gifts is for the better.
>>>>
>>>> If they are searching for donors, then I would rethink the change but not discard it completely: you can still get the list of distinct donors by faceting over donors. You can show the users that list of donors (the facets), and they can choose from it and get all information on that donor (restricted to the original query, of course). The information would include the actual search result: the list of gifts that passed the query.
>>>>
>>>> Cheers,
>>>> Chantal
>>>>
>>>> On Wed, 2010-09-15 at 21:49 +0200, Andre Bickford wrote:
>>>>> Thanks for the response, Erick.
>>>>>
>>>>> I did actually try exactly what you suggested. I flipped the index over so that a gift is the document. This solution certainly solves the previous problem, but introduces a new issue where the search results show duplicate donors. If a donor gave 12 times in a year, and we offer full years as facet ranges, my understanding is that you'd see that donor 12 times in the search results, once for each gift document.
>>>>> Obviously I could do some client-side filtering to list only distinct donors, but I was hoping to avoid that.
>>>>>
>>>>> If I've simply stumbled into the basic tradeoffs of denormalization, I can live with client-side de-duplication, but if you have any further suggestions I'm all eyes.
>>>>>
>>>>> As for sizing, we have some huge charities as clients. However, right now I'm testing on a copy of prod data from a smaller client with ~350,000 donors and ~8,000,000 gift records. So, when I "flipped" the index around as you suggested, it went from 350,000 documents to 8,000,000 documents. No issues with performance at all.
>>>>>
>>>>> Thanks again,
>>>>> Andre
>>>>>
>>>>> -----Original Message-----
>>>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>>>>> Sent: Wednesday, September 15, 2010 3:09 PM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: Simple Filter Query (fq) Use Case Question
>>>>>
>>>>> One strategy is to denormalize all the way. That is, each Solr "document" is one gift, so Gift Amount and Gift Date would not be multiValued. You'd create a different "document" for each gift, so you'd have multiple documents with the same Id, Name, and Address. Be careful, though: if you've defined Id as a UniqueKey, you'd only have one record per donor. You can handle this easily enough by making a composite key of Id+Gift Date (assuming no donor made more than one gift on exactly the same date).
>>>>>
>>>>> I know this goes completely against all the reflexes you've built up working with DBs, but...
>>>>>
>>>>> Can you give us a clue how many donations we're talking about here?
>>>>>
>>>>> You'd have to be working with a really big nonprofit to get enough documents to have to start worrying about making your index smaller.
>>>>>
>>>>> HTH
>>>>> Erick
>>>>>
>>>>> On Wed, Sep 15, 2010 at 1:41 PM, Andre Bickford <abickf...@softrek.com> wrote:
>>>>>
>>>>>> I'm working on creating a Solr index search for a charitable organization. The Solr index stores documents of donors. Each donor document has the following five fields:
>>>>>>
>>>>>> Id
>>>>>> Name
>>>>>> Address
>>>>>> Gift Amount (multiValued)
>>>>>> Gift Date (multiValued)
>>>>>>
>>>>>> In our relational database, there is a one-to-many relationship between the DONOR table and the GIFT table. One donor can of course give many gifts over time. Consequently, I made the Gift Amount and Gift Date fields multiValued.
>>>>>>
>>>>>> Now, consider the following query, filtered for gifts last month between $0 and $100:
>>>>>>
>>>>>> q=name:Jones
>>>>>> fq=giftDate:[NOW/MONTH-1MONTH TO NOW/MONTH]
>>>>>> fq=giftAmount:[0 TO 100]
>>>>>>
>>>>>> The results show me donors who donated ANY amount in the past month and who had EVER given a gift between $0 and $100. I was hoping to see only donors who had given a single gift that was both between $0 and $100 and made in the past month. I believe the problem is that I neglected to consider that for two multiValued fields, while the values might align "index-wise", there is really no other association between the two fields, so the filter query intersection isn't really behaving as I expected.
>>>>>>
>>>>>> I think this is a fundamental question of one-to-many denormalization, but obviously I'm not yet experienced enough with Lucene/Solr to find a solution. As to why not just keep using a relational database: it's because I'm trying to provide a faceting solution to "drill down" to donors. The aforementioned fq parameters would come from faceting. Oh, that and Oracle Text indexes are a PITA. :-)
>>>>>>
>>>>>> Thanks for any help you can provide.
>>>>>>
>>>>>> André Bickford
>>>>>> Software Engineering Team Leader
>>>>>> SofTrek Corporation
>>>>>> 30 Bryant Woods North, Amherst, NY 14228
>>>>>> 716.691.2800 x154   800.442.9211   Fax: 716.691.2828
>>>>>> abickf...@softrek.com
>>>>>> www.softrek.com
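Chantal's facet-over-donors idea and Erick's composite-key suggestion can be sketched together. The Python below is an illustrative model, not a real Solr client. `gift_doc` builds a gift-level document whose `id` combines donor Id and gift date, following Erick's Id+Gift Date key. The `seq` tiebreaker for two gifts on the same day is an added assumption. `donor_facet` mimics the per-value counts that `facet.field=donorId` returns. `facet_params` assembles the request parameters. `facet`, `facet.field`, and `facet.mincount` are standard Solr faceting parameters, while the field names (`donorId`, `giftDate`, `giftAmount`) are hypothetical.

```python
from collections import Counter
from urllib.parse import urlencode


def gift_doc(donor_id, name, amount, gift_date, seq=0):
    """Build one gift-level document.

    The unique key combines donor Id and gift date, as Erick suggested; `seq`
    is a hypothetical tiebreaker for two gifts by the same donor on one date.
    """
    return {
        "id": f"{donor_id}-{gift_date}-{seq}",
        "donorId": donor_id,
        "name": name,
        "giftAmount": amount,
        "giftDate": gift_date,  # illustrative date string, not a Solr date value
    }


def donor_facet(docs):
    """Mimic facet.field=donorId: (donor, matching-gift count), highest count first."""
    return Counter(d["donorId"] for d in docs).most_common()


def facet_params(name, start, end, low, high):
    """Assemble request parameters for a gift-level index (field names hypothetical)."""
    return [
        ("q", f"name:{name}"),
        ("fq", f"giftDate:[{start} TO {end}]"),
        ("fq", f"giftAmount:[{low} TO {high}]"),
        ("facet", "true"),
        ("facet.field", "donorId"),
        ("facet.mincount", "1"),
    ]


# Hypothetical gift documents that already passed both filters.
DOCS = [
    gift_doc("D1", "Jones", 40, "2010-08-03"),
    gift_doc("D1", "Jones", 60, "2010-08-17"),
    gift_doc("D1", "Jones", 60, "2010-08-17", seq=1),  # second gift on the same day
    gift_doc("D2", "Jones", 25, "2010-08-09"),
]

if __name__ == "__main__":
    print(donor_facet(DOCS))  # [('D1', 3), ('D2', 1)]
    print(urlencode(facet_params("Jones", "NOW/MONTH-1MONTH", "NOW/MONTH", 0, 100)))
```

Rendering the facet list gives the distinct donors (with how many matching gifts each has), while the ordinary result list still holds the individual gifts. This is the split Chantal describes, and it avoids client-side de-duplication.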