This brings me to ask a question that's been on my mind for a while.

Is an index set up for the whole site, or for a set of searches, so that a 
site has several different indexes?

How many indexes does one Solr/Lucene instance have access to (not counting 
shards/segments)?
Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Thu, 9/16/10, Chantal Ackermann <chantal.ackerm...@btelligent.de> wrote:

> From: Chantal Ackermann <chantal.ackerm...@btelligent.de>
> Subject: RE: Simple Filter Query (fq) Use Case Question
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Date: Thursday, September 16, 2010, 1:05 AM
> Hi Andre,
> 
> changing the entity in your index from donor to gift of course changes
> the scope of your search results. I found it helpful to rethink such a
> change from that "other" side (the result side).
> If the users of your search application are, in the end, looking for
> individual gifts, then changing the index to gift is for the better.
> 
> If they are searching for donors, then I would rethink the change but
> not discard it completely: you can still get the list of distinct donors
> by faceting over donors. You can show the users that list of donors
> (the facets), and they can choose from it and get all information on that
> donor (restricted to the original query, of course). The information
> would include the actual search result: the list of gifts that passed
> the query.
> 
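A minimal sketch of the faceting approach described above, assuming the index has been flipped to one document per gift and that each gift document carries a `donorId` field (a hypothetical field name, not one from the thread). It only builds the request parameters; `facet`, `facet.field`, `facet.mincount`, `facet.limit` and `rows` are standard Solr parameters, and the field names and date range are illustrative.

```python
from urllib.parse import urlencode

# Assumption: gift documents have a single-valued "donorId" field.
params = [
    ("q", "name:Jones"),
    # Date math written with an explicit unit (MONTH).
    ("fq", "giftDate:[NOW/MONTH-1MONTH TO NOW/MONTH]"),
    ("fq", "giftAmount:[0 TO 100]"),
    ("rows", "0"),             # no gift documents, only facet counts
    ("facet", "true"),
    ("facet.field", "donorId"),
    ("facet.mincount", "1"),   # skip donors with no matching gift
    ("facet.limit", "-1"),     # return every donor, not just the top ones
]

# Query string to append to the select handler URL of your Solr core.
query_string = urlencode(params)
print(query_string)
```

Each facet value would then be one distinct donor together with the number of that donor's gifts that passed the filters. When the user picks a donor, a follow-up request could add a filter such as `fq=donorId:<value>` and a non-zero `rows` to list the matching gifts.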
> Cheers,
> Chantal
> 
> On Wed, 2010-09-15 at 21:49 +0200, Andre Bickford wrote:
> > Thanks for the response Erick.
> > 
> > I did actually try exactly what you suggested. I flipped the index over
> > so that a gift is the document. This solution certainly solves the
> > previous problem, but introduces a new issue where the search results
> > show duplicate donors. If a donor gave 12 times in a year, and we offer
> > full years as facet ranges, my understanding is that you'd see that
> > donor 12 times in the search results, once for each gift document.
> > Obviously I could do some client side filtering to list only distinct
> > donors, but I was hoping to avoid that.
> > 
> > If I've simply stumbled into the basic tradeoffs of denormalization, I
> > can live with client side de-duplication, but if you have any further
> > suggestions I'm all eyes.
> > 
> > As for sizing, we have some huge charities as clients. However, right
> > now I'm testing on a copy of prod data from a smaller client with
> > ~350,000 donors and ~8,000,000 gift records. So, when I "flipped" the
> > index around as you suggested, it went from 350,000 documents to
> > 8,000,000 documents. No issues with performance at all.
> > 
> > Thanks again,
> > Andre
> > 
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Wednesday, September 15, 2010 3:09 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Simple Filter Query (fq) Use Case Question
> > 
> > One strategy is to denormalize all the way. That is, each Solr
> > "document" is a single gift, so Gift Amount and Gift Date would not be
> > multiValued. You'd create a different "document" for each gift, so
> > you'd have multiple documents with the same Id, Name, and Address. Be
> > careful, though: if you've defined Id as a UniqueKey, you'd only have
> > one record per donor. You can handle this easily enough by making a
> > composite key of Id+Gift Date (assuming no donor made more than one
> > gift on exactly the same date).
> > 
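A rough sketch of the flattening described above, assuming donors and gifts arrive as plain dictionaries from the relational database. The function name `gift_documents` and the field names `donorId`, `giftAmount` and `giftDate` are illustrative, not taken from any schema in this thread. The composite key only stays unique under the stated assumption of at most one gift per donor per date.

```python
def gift_documents(donor, gifts):
    """Flatten one donor and its gifts into one document per gift."""
    docs = []
    for gift in gifts:
        docs.append({
            # Composite unique key: donor Id plus gift date.
            # Assumes no donor gives twice on exactly the same date.
            "id": f"{donor['id']}_{gift['date']}",
            "donorId": donor["id"],       # kept separately for grouping/faceting
            "name": donor["name"],        # donor fields repeated on every gift
            "address": donor["address"],
            "giftAmount": gift["amount"],  # single-valued now
            "giftDate": gift["date"],      # single-valued now
        })
    return docs


# Made-up sample data for illustration only.
donor = {"id": "42", "name": "Jones", "address": "1 Main St"}
gifts = [
    {"date": "2010-08-20T00:00:00Z", "amount": 5000},
    {"date": "2009-03-01T00:00:00Z", "amount": 50},
]
docs = gift_documents(donor, gifts)
for doc in docs:
    print(doc["id"], doc["giftAmount"])
```

If two gifts on the same date are possible, a gift's own primary key from the GIFT table would be a safer unique key than the date.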
> > I know this goes completely against all the reflexes you've built up
> > working with DBs, but...
> > 
> > Can you give us a clue how many donations we're talking about here?
> > You'd have to be working with a really big nonprofit to get enough
> > documents to have to start worrying about making your index smaller.
> > 
> > HTH
> > Erick
> > 
> > On Wed, Sep 15, 2010 at 1:41 PM, Andre Bickford
> > <abickf...@softrek.com> wrote:
> > 
> > > I'm working on creating a Solr index search for a charitable
> > > organization. The Solr index stores documents of donors. Each donor
> > > document has the following five fields:
> > >
> > > Id
> > > Name
> > > Address
> > > Gift Amount (multiValued)
> > > Gift Date (multiValued)
> > >
> > > In our relational database, there is a one-to-many relationship
> > > between the DONOR table and the GIFT table. One donor can of course
> > > give many gifts over time. Consequently, I created the Gift Amount and
> > > Gift Date fields to be multiValued.
> > >
> > > Now, consider the following query filtered for gifts last month
> > > between $0 and $100:
> > >
> > > q=name:Jones
> > > fq=giftDate:[NOW/MONTH-1MONTH TO NOW/MONTH]
> > > fq=giftAmount:[0 TO 100]
> > >
> > > The results show me donors who donated ANY amount in the past month
> > > and donors who had EVER in the past given a gift between $0 and $100.
> > > I was hoping to only see donors who had given a gift between $0 and
> > > $100 in the past month exclusively. I believe the problem is that I
> > > neglected to consider that for two multiValued fields, while the
> > > values might align "index wise", there is really no other association
> > > between the two fields, so the filter query intersection isn't really
> > > behaving as I expected.
> > >
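The behaviour described above can be reproduced with a small simulation in plain Python (no Solr involved). The donor record and the fixed date range are made up for illustration. The point is that each filter is checked against its own multiValued field independently, so one gift can satisfy the date filter while a different gift satisfies the amount filter.

```python
from datetime import date

# Hypothetical donor document with two parallel multiValued fields:
# a $5000 gift last month and a $50 gift the year before.
donor = {
    "name": "Jones",
    "giftAmount": [5000, 50],
    "giftDate": [date(2010, 8, 20), date(2009, 3, 1)],
}

# Stand-in for "last month" (illustrative fixed range).
start, end = date(2010, 8, 1), date(2010, 9, 1)


def any_in_range(values, low, high):
    """A range filter on a multiValued field matches if ANY value is in range."""
    return any(low <= v <= high for v in values)


# How the two separate filters behave: each looks at one field only.
matches_per_field = (
    any_in_range(donor["giftDate"], start, end)
    and any_in_range(donor["giftAmount"], 0, 100)
)

# What was intended: a single gift satisfies both conditions at once.
matches_per_gift = any(
    start <= d <= end and 0 <= a <= 100
    for a, d in zip(donor["giftAmount"], donor["giftDate"])
)

print(matches_per_field, matches_per_gift)
```

Here the donor passes both filters even though no single gift does, which is the mismatch reported in the results.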
> > > I think this is a fundamental question of one-to-many
> > > denormalization, but obviously I'm not yet experienced enough with
> > > Lucene/Solr to find a solution. As to why not just keep using a
> > > relational database, it's because I'm trying to provide a faceting
> > > solution to "drill down" to donors. The aforementioned fq parameters
> > > would come from faceting. Oh, that and Oracle Text indexes are a
> > > PITA. :-)
> > >
> > > Thanks for any help you can provide.
> > >
> > > André Bickford
> > > Software Engineering Team Leader
> > > SofTrek Corporation
> > > 30 Bryant Woods North  Amherst, NY 14228
> > > 716.691.2800 x154  800.442.9211  Fax: 716.691.2828
> > > abickf...@softrek.com  www.softrek.com
> > >
> > >
> > >
> > 
> 
> 
>
