Is a core a running piece of software, or just an index/config pairing?
Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Thu, 9/16/10, Jonathan Rochkind <rochk...@jhu.edu> wrote:

> From: Jonathan Rochkind <rochk...@jhu.edu>
> Subject: Re: Simple Filter Query (fq) Use Case Question
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Date: Thursday, September 16, 2010, 11:20 AM
> One Solr core has essentially one index in it (not only one 'field',
> but one indexed collection of documents). There are weird hacks, like
> I believe the spellcheck component kind of creates its own
> sub-indexes; not sure how it does that.
> 
> You can have more than one core in a single Solr instance, but they're
> essentially separate; there's no easy way to 'join' across them or
> anything. A given request targets one core.
> 
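To illustrate "a given request targets one core", here is a minimal sketch of how per-core request URLs are usually shaped in a multi-core setup. The host, port, and the core names `donors` and `gifts` are assumptions made up for this example; they do not come from the thread.

```python
from urllib.parse import urlencode

# Assumed base URL of a single Solr instance (hypothetical host and port).
BASE = "http://localhost:8983/solr"

def select_url(base, core, **params):
    """Build a /select URL addressed to exactly one named core."""
    return f"{base}/{core}/select?{urlencode(params)}"

# Two hypothetical cores in the same instance. Each request names one core,
# and nothing in a request lets it span both indexes.
donors_url = select_url(BASE, "donors", q="name:Jones")
gifts_url = select_url(BASE, "gifts", q="giftAmount:[0 TO 100]")

print(donors_url)
print(gifts_url)
```

The core name is part of the request path, so choosing between one index for the whole site and several indexes is largely a question of how many cores you define and which one each search page queries.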
> Dennis Gearon wrote:
> > This brings me to ask a question that's been on my mind for a while.
> >
> > Are indexes set up for the whole site, or a set of searches, with
> > several different indexes for a site?
> >
> > How many instances does one Solr/Lucene instance have access to
> > (not counting shards/segments)?
> > Dennis Gearon
> >
> >
> >
> > --- On Thu, 9/16/10, Chantal Ackermann <chantal.ackerm...@btelligent.de> wrote:
> >
> >> From: Chantal Ackermann <chantal.ackerm...@btelligent.de>
> >> Subject: RE: Simple Filter Query (fq) Use Case Question
> >> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> >> Date: Thursday, September 16, 2010, 1:05 AM
> >> Hi Andre,
> >>
> >> changing the entity in your index from donor to gift of course
> >> changes the scope of your search results. I found it helpful to
> >> re-think such a change from that "other" side (the result side).
> >> If the users of your search application are, in the end, looking
> >> for individual gifts, then changing the index to gift is for the
> >> better.
> >>
> >> If they are searching for donors, then I would rethink the change
> >> but not discard it completely: you can still get the list of
> >> distinct donors by faceting over donors. You can show the users
> >> that list of donors (the facets), and they can choose from it and
> >> get all information on that donor (restricted to the original
> >> query, of course). The information would include the actual search
> >> result of a list of gifts that passed the query.
> >>
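The "facet over donors" idea above could be sketched as the request parameters below. This is only an illustration: the field name `donorId` is an assumption (the thread never names such a field), and it presumes each gift document carries its donor's id. `facet`, `facet.field`, `facet.mincount`, `facet.limit`, `rows` and `fq` are standard Solr request parameters.

```python
from urllib.parse import urlencode, parse_qs

def donor_facet_query(user_query, filters):
    """Query string that returns distinct donors as facet values.

    Assumes gift documents with a hypothetical single-valued 'donorId' field.
    """
    params = [
        ("q", user_query),
        ("rows", "0"),               # skip the gift documents themselves
        ("facet", "true"),
        ("facet.field", "donorId"),  # one facet value per distinct donor
        ("facet.mincount", "1"),     # only donors with at least one matching gift
        ("facet.limit", "-1"),       # no cap on the number of facet values
    ]
    params += [("fq", f) for f in filters]
    return urlencode(params)

qs = donor_facet_query(
    "name:Jones",
    ["giftDate:[NOW/MONTH-1MONTH TO NOW/MONTH]", "giftAmount:[0 TO 100]"],
)
print(qs)
```

Each facet count is then the number of matching gifts for that donor, and a follow-up query with an extra `fq` on the chosen donor's id would list those gifts.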
> >> Cheers,
> >> Chantal
> >>
> >> On Wed, 2010-09-15 at 21:49 +0200, Andre Bickford wrote:
> >>     
> >>> Thanks for the response Erick.
> >>>
> >>> I did actually try exactly what you suggested. I flipped the index
> >>> over so that a gift is the document. This solution certainly solves
> >>> the previous problem, but introduces a new issue where the search
> >>> results show duplicate donors. If a donor gave 12 times in a year,
> >>> and we offer full years as facet ranges, my understanding is that
> >>> you'd see that donor 12 times in the search results, once for each
> >>> gift document. Obviously I could do some client side filtering to
> >>> list only distinct donors, but I was hoping to avoid that.
> >>>
> >>> If I've simply stumbled into the basic tradeoffs of denormalization,
> >>> I can live with client side de-duplication, but if you have any
> >>> further suggestions I'm all eyes.
> >>>
> >>> As for sizing, we have some huge charities as clients. However,
> >>> right now I'm testing on a copy of prod data from a smaller client
> >>> with ~350,000 donors and ~8,000,000 gift records. So, when I
> >>> "flipped" the index around as you suggested, it went from 350,000
> >>> documents to 8,000,000 documents. No issues with performance at all.
> >>>
> >>> Thanks again,
> >>> Andre
> >>>
> >>> -----Original Message-----
> >>> From: Erick Erickson [mailto:erickerick...@gmail.com]
> >>> Sent: Wednesday, September 15, 2010 3:09 PM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: Simple Filter Query (fq) Use Case Question
> >>>
> >>> One strategy is to denormalize all the way. That is, each Solr
> >>> "document" is a single gift, so Gift Amount and Gift Date would not
> >>> be multiValued. You'd create a different "document" for each gift,
> >>> so you'd have multiple documents with the same Id, Name, and
> >>> Address. Be careful, though: if you've defined Id as a UniqueKey,
> >>> you'd only have one record/donor. You can handle this easily enough
> >>> by making a composite key of Id+Gift Date (assuming no donor made
> >>> more than one gift on exactly the same date).
> >>>
> >>> I know this goes completely against all the reflexes you've built
> >>> up with working with DBs, but...
> >>>
> >>> Can you give us a clue how many donations we're talking about here?
> >>> You'd have to be working with a really big nonprofit to get enough
> >>> documents to have to start worrying about making your index smaller.
> >>>
> >>> HTH
> >>> Erick
> >>>
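A minimal sketch of the "denormalize all the way" approach with a composite key, assuming a donor record shaped like the hypothetical dictionary below. The field names (`donorId`, `giftAmount`, `giftDate`) and the `_` separator in the key are illustrative choices, not something the thread specifies.

```python
def gift_documents(donor):
    """Flatten one donor with many gifts into one document per gift.

    The unique key combines donor id and gift date, which only works under
    the stated assumption that a donor never gives twice on the same date.
    """
    docs = []
    for gift in donor["gifts"]:
        docs.append({
            "id": f'{donor["id"]}_{gift["date"]}',  # composite unique key
            "donorId": donor["id"],                 # kept for grouping by donor
            "name": donor["name"],
            "address": donor["address"],
            "giftAmount": gift["amount"],           # single-valued now
            "giftDate": gift["date"],               # single-valued now
        })
    return docs

# Hypothetical sample record, for illustration only.
donor = {
    "id": "42",
    "name": "Jones",
    "address": "1 Main St",
    "gifts": [
        {"amount": 5000, "date": "2010-08-20"},
        {"amount": 50, "date": "2009-03-01"},
    ],
}

docs = gift_documents(donor)
for doc in docs:
    print(doc["id"], doc["giftAmount"], doc["giftDate"])
```

Because amount and date now live in the same document, two range filters can only both match when a single gift satisfies both of them.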
> >>> On Wed, Sep 15, 2010 at 1:41 PM, Andre Bickford <abickf...@softrek.com> wrote:
> >>>
> >>>       
> >>>> I'm working on creating a Solr index search for a charitable
> >>>> organization. The Solr index stores documents of donors. Each
> >>>> donor document has the following five fields:
> >>>>
> >>>> Id
> >>>> Name
> >>>> Address
> >>>> Gift Amount (multiValued)
> >>>> Gift Date (multiValued)
> >>>>
> >>>> In our relational database, there is a one-to-many relationship
> >>>> between the DONOR table and the GIFT table. One donor can of course
> >>>> give many gifts over time. Consequently, I created the Gift Amount
> >>>> and Gift Date fields to be multiValued.
> >>>>
> >>>> Now, consider the following query filtered for gifts last month
> >>>> between $0 and $100:
> >>>>
> >>>> q=name:Jones
> >>>> fq=giftDate:[NOW/MONTH-1MONTH TO NOW/MONTH]
> >>>> fq=giftAmount:[0 TO 100]
> >>>>
> >>>> The results show me donors who donated ANY amount in the past month
> >>>> and donors who had EVER in the past given a gift between $0 and
> >>>> $100. I was hoping to only see donors who had given a gift between
> >>>> $0 and $100 in the past month exclusively. I believe the problem is
> >>>> that I neglected to consider that for two multiValued fields, while
> >>>> the values might align "index wise", there is really no other
> >>>> association between the two fields, so the filter query
> >>>> intersection isn't really behaving as I expected.
> >>>>
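The effect described above can be reproduced outside Solr with a small simulation. This is not Solr code, only a sketch of the matching logic under the assumption that each filter is satisfied when any value of its multiValued field falls in range. The sample donor and the fixed "last month" window are made up for illustration.

```python
from datetime import date

# Hypothetical donor: a large gift last month, a small gift long ago.
donor = {
    "name": "Jones",
    "giftAmount": [5000, 50],
    "giftDate": [date(2010, 8, 20), date(2009, 3, 1)],
}

# "Last month" fixed to August 2010 for the sake of the example.
START, END = date(2010, 8, 1), date(2010, 8, 31)

def any_in_range(values, lo, hi):
    """True if at least one value of a multi-valued field is in [lo, hi]."""
    return any(lo <= v <= hi for v in values)

def matches_independent_filters(doc):
    """Each filter is checked on its own field; the values are never paired."""
    return (any_in_range(doc["giftDate"], START, END)
            and any_in_range(doc["giftAmount"], 0, 100))

def matches_same_gift(doc):
    """What was intended: one single gift must satisfy both conditions."""
    return any(START <= d <= END and 0 <= a <= 100
               for d, a in zip(doc["giftDate"], doc["giftAmount"]))

print(matches_independent_filters(donor))  # True: matched by two different gifts
print(matches_same_gift(donor))            # False: no single gift fits both
```

The first function accepts the donor because the date filter is met by one gift and the amount filter by another, which is the unexpected result the two fq parameters produce on multiValued fields.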
> >>>> I think this is a fundamental question of one-to-many
> >>>> denormalization, but obviously I'm not yet experienced enough with
> >>>> Lucene/Solr to find a solution. As to why not just keep using a
> >>>> relational database, it's because I'm trying to provide a faceting
> >>>> solution to "drill down" to donors. The aforementioned fq
> >>>> parameters would come from faceting. Oh, that and Oracle Text
> >>>> indexes are a PITA. :-)
> >>>>
> >>>> Thanks for any help you can provide.
> >>>>
> >>>> André Bickford
> >>>> Software Engineering Team Leader
> >>>> SofTrek Corporation
> >>>> 30 Bryant Woods North  Amherst, NY 14228
> >>>> 716.691.2800 x154  800.442.9211  Fax: 716.691.2828
> >>>> abickf...@softrek.com  www.softrek.com
> >>>>
> >>>>         
> >>
> >>     
> >
> >   
>
