This brings me to ask a question that's been on my mind for a while. Are indexes set up for the whole site, or for a set of searches, with several different indexes for a site?
How many instances does one Solr/Lucene instance have access to (not counting shards/segments)?

Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life, otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Thu, 9/16/10, Chantal Ackermann <chantal.ackerm...@btelligent.de> wrote:

> From: Chantal Ackermann <chantal.ackerm...@btelligent.de>
> Subject: RE: Simple Filter Query (fq) Use Case Question
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Date: Thursday, September 16, 2010, 1:05 AM
>
> Hi Andre,
>
> changing the entity in your index from donor to gift of course changes
> the scope of your search results. I found it helpful to re-think such a
> change from that "other" side (the result side).
> If the users of your search application look for individual gifts, in
> the end, then changing the index to gift is for the better.
>
> If they are searching for donors, then I would rethink the change but
> not discard it completely: you can still get the list of distinct donors
> by faceting over donors. You can show the users that list of donors
> (the facets), and they can choose from it and get all information on that
> donor (restricted to the original query, of course). The information
> would include the actual search result of a list of gifts that passed
> the query.
>
> Cheers,
> Chantal
>
> On Wed, 2010-09-15 at 21:49 +0200, Andre Bickford wrote:
> > Thanks for the response Erick.
> >
> > I did actually try exactly what you suggested. I flipped the index over
> > so that a gift is the document. This solution certainly solves the
> > previous problem, but introduces a new issue where the search results
> > show duplicate donors. If a donor gave 12 times in a year, and we offer
> > full years as facet ranges, my understanding is that you'd see that
> > donor 12 times in the search results, once for each gift document.
> > Obviously I could do some client-side filtering to list only distinct
> > donors, but I was hoping to avoid that.
> >
> > If I've simply stumbled into the basic tradeoffs of denormalization, I
> > can live with client-side de-duplication, but if you have any further
> > suggestions I'm all eyes.
> >
> > As for sizing, we have some huge charities as clients. However, right now
> > I'm testing on a copy of prod data from a smaller client with ~350,000
> > donors and ~8,000,000 gift records. So, when I "flipped" the index around
> > as you suggested, it went from 350,000 documents to 8,000,000 documents.
> > No issues with performance at all.
> >
> > Thanks again,
> > Andre
> >
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Wednesday, September 15, 2010 3:09 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Simple Filter Query (fq) Use Case Question
> >
> > One strategy is to denormalize all the way. That is, each Solr "document"
> > is a single gift, so Gift Amount and Gift Date would not be multiValued.
> > You'd create a different "document" for each gift, so you'd have multiple
> > documents with the same Id, Name, and Address. Be careful, though: if
> > you've defined Id as a UniqueKey, you'd only have one record/donor. You
> > can handle this easily enough by making a composite key of Id+Gift Date
> > (assuming no donor made more than one gift on exactly the same date).
> >
> > I know this goes completely against all the reflexes you've built up with
> > working with DBs, but...
> >
> > Can you give us a clue how many donations we're talking about here?
> > You'd have to be working with a really big nonprofit to get enough
> > documents to have to start worrying about making your index smaller.
> >
> > HTH
> > Erick
> >
> > On Wed, Sep 15, 2010 at 1:41 PM, Andre Bickford
> > <abickf...@softrek.com> wrote:
> >
> > > I'm working on creating a solr index search for a charitable
> > > organization.
> > > The solr index stores documents of donors. Each donor document has the
> > > following five fields:
> > >
> > > Id
> > > Name
> > > Address
> > > Gift Amount (multiValued)
> > > Gift Date (multiValued)
> > >
> > > In our relational database, there is a one-to-many relationship between
> > > the DONOR table and the GIFT table. One donor can of course give many
> > > gifts over time. Consequently, I created the Gift Amount and Gift Date
> > > fields to be multiValued.
> > >
> > > Now, consider the following query filtered for gifts last month between
> > > $0 and $100:
> > >
> > > q=name:Jones
> > > fq=giftDate:[NOW/MONTH-1 TO NOW/MONTH]
> > > fq=giftAmount:[0 TO 100]
> > >
> > > The results show me donors who donated ANY amount in the past month and
> > > donors who had EVER in the past given a gift between $0 and $100. I was
> > > hoping to only see donors who had given a gift between $0 and $100 in
> > > the past month exclusively. I believe the problem is that I neglected
> > > to consider that for two multiValued fields, while the values might
> > > align "index wise", there is really no other association between the
> > > two fields, so the filter query intersection isn't really behaving as I
> > > expected.
> > >
> > > I think this is a fundamental question of one-to-many denormalization,
> > > but obviously I'm not yet experienced enough with Lucene/Solr to find a
> > > solution. As to why not just keep using a relational database, it's
> > > because I'm trying to provide a faceting solution to "drill down" to
> > > donors. The aforementioned fq parameters would come from faceting. Oh,
> > > that and Oracle Text indexes are a PITA. :-)
> > >
> > > Thanks for any help you can provide.
> > >
> > > André Bickford
> > > Software Engineering Team Leader
> > > SofTrek Corporation
> > > 30 Bryant Woods North Amherst, NY 14228
> > > 716.691.2800 x154 800.442.9211 Fax: 716.691.2828
> > > abickf...@softrek.com www.softrek.com
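
For anyone reading the thread later, the behaviour Andre describes and the gift-per-document layout Erick suggests can be mimicked in plain Python, with no Solr involved. This is only a rough sketch under assumptions: the sample donors, the fixed August 2010 date range standing in for "last month", and every function name below are made up for illustration and are not part of Solr or of anyone's actual schema.

```python
from datetime import date

# Hypothetical sample data; field names mirror the thread, values are made up.
donors = [
    {"id": "D1", "name": "Jones",
     "giftAmount": [50, 500],
     "giftDate": [date(2009, 3, 1), date(2010, 8, 20)]},
    {"id": "D2", "name": "Jones",
     "giftAmount": [25, 75],
     "giftDate": [date(2010, 8, 5), date(2010, 8, 25)]},
]

# Stand-ins for the two fq ranges (bounds inclusive, like [a TO b]).
DATE_RANGE = (date(2010, 8, 1), date(2010, 8, 31))
AMOUNT_RANGE = (0, 100)


def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high


def donor_doc_matches(doc):
    """Donor-per-document: each filter passes if ANY value of its field
    matches, and the two filters are evaluated independently."""
    return (any(in_range(d, DATE_RANGE) for d in doc["giftDate"])
            and any(in_range(a, AMOUNT_RANGE) for a in doc["giftAmount"]))


def to_gift_docs(doc):
    """Gift-per-document: one record per gift, keyed by donor id + gift date
    (assumes no donor gives twice on the same date, as Erick notes)."""
    return [
        {"key": f"{doc['id']}_{when.isoformat()}", "donorId": doc["id"],
         "name": doc["name"], "giftAmount": amount, "giftDate": when}
        for amount, when in zip(doc["giftAmount"], doc["giftDate"])
    ]


def gift_doc_matches(gift):
    """Both conditions must hold for the SAME gift."""
    return (in_range(gift["giftDate"], DATE_RANGE)
            and in_range(gift["giftAmount"], AMOUNT_RANGE))


def distinct_donor_ids(gifts):
    """Client-side de-duplication, keeping first-seen order."""
    seen = []
    for gift in gifts:
        if gift["donorId"] not in seen:
            seen.append(gift["donorId"])
    return seen


donor_hits = [d["id"] for d in donors if donor_doc_matches(d)]
gift_docs = [g for d in donors for g in to_gift_docs(d)]
gift_hits = [g for g in gift_docs if gift_doc_matches(g)]

print(donor_hits)                     # ['D1', 'D2']
print([g["key"] for g in gift_hits])  # ['D2_2010-08-05', 'D2_2010-08-25']
print(distinct_donor_ids(gift_hits))  # ['D2']
```

Donor D1 passes the donor-per-document filters even though no single gift of theirs satisfies both ranges, which is the mismatch from the original question. With one document per gift, only D2's two August gifts match, and the duplicate donor rows Andre mentions collapse to a single entry after de-duplication.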