Is a core a running piece of software, or just an index/config pairing? Dennis Gearon
Signature Warning ---------------- EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Thu, 9/16/10, Jonathan Rochkind <rochk...@jhu.edu> wrote: > From: Jonathan Rochkind <rochk...@jhu.edu> > Subject: Re: Simple Filter Query (fq) Use Case Question > To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org> > Date: Thursday, September 16, 2010, 11:20 AM > One solr core has essentially one > index in it. (not only one 'field', > but one indexed collection of documents) There are weird > hacks, like I > believe the spellcheck component kind of creates it's own > sub-indexes, > not sure how it does that. > > You can have more than one core in a single solr instance, > but they're > essentially seperate, there's no easy way to 'join' accross > them or > anything, a given request targets one core. > > Dennis Gearon wrote: > > This brings me to ask a question that's been on my > mind for awhile. > > > > Are indexes set up for the whole site, or a set of > searches, with several different indexes for a site? > > > > How many instances does one Solr/Lucene instance have > access to, (not counting shards/segments)? > > Dennis Gearon > > > > Signature Warning > > ---------------- > > EARTH has a Right To Life, > > otherwise we all die. > > > > Read 'Hot, Flat, and Crowded' > > Laugh at http://www.yert.com/film.php > > > > > > --- On Thu, 9/16/10, Chantal Ackermann <chantal.ackerm...@btelligent.de> > wrote: > > > > > >> From: Chantal Ackermann <chantal.ackerm...@btelligent.de> > >> Subject: RE: Simple Filter Query (fq) Use Case > Question > >> To: "solr-user@lucene.apache.org" > <solr-user@lucene.apache.org> > >> Date: Thursday, September 16, 2010, 1:05 AM > >> Hi Andre, > >> > >> changing the entity in your index from donor to > gift > >> changes of course > >> the scope of your search results. I found it > helpful to > >> re-think such > >> change from that "other" side (the result side). > >> If the users of your search application look for > individual > >> gifts, in > >> the end, then changing the index to gift is for > the > >> better. > >> > >> If they are searching for donors, then I would > rethink the > >> change but > >> not discard it completely: you can still get the > list of > >> distinct donors > >> by facetting over donors. You can show the users > that list > >> of donors > >> (the facets), and they can chose from it and get > all > >> information on that > >> donor (restricted to the original query, of > course). The > >> information > >> would include the actual search result of a list > of gifts > >> that passed > >> the query. > >> > >> Cheers, > >> Chantal > >> > >> On Wed, 2010-09-15 at 21:49 +0200, Andre Bickford > wrote: > >> > >>> Thanks for the response Erick. > >>> > >>> I did actually try exactly what you suggested. > I > >>> > >> flipped the index over so that a gift is the > document. This > >> solution certainly solves the previous problem, > but > >> introduces a new issue where the search results > show > >> duplicate donors. If a donor gave 12 times in a > year, and we > >> offer full years as facet ranges, my understanding > is that > >> you'd see that donor 12 times in the search > results, once > >> for each gift document. Obviously I could do some > client > >> side filtering to list only distinct donors, but I > was > >> hoping to avoid that. > >> > >>> If I've simply stumbled into the basic > tradeoffs of > >>> > >> denormalization, I can live with client side > de-duplication, > >> but if you have any further suggestions I'm all > eyes. > >> > >>> As for sizing, we have some huge charities as > clients. > >>> > >> However, right now I'm testing on a copy of prod > data from a > >> smaller client with ~350,000 donors and ~8,000,000 > gift > >> records. So, when I "flipped" the index around as > you > >> suggested, it went from 350,000 documents to > 8,000,000 > >> documents. No issues with performance at all. > >> > >>> Thanks again, > >>> Andre > >>> > >>> -----Original Message----- > >>> From: Erick Erickson [mailto:erickerick...@gmail.com] > >>> > >>> Sent: Wednesday, September 15, 2010 3:09 PM > >>> To: solr-user@lucene.apache.org > >>> Subject: Re: Simple Filter Query (fq) Use > Case > >>> > >> Question > >> > >>> One strategy is to denormalize all the way. > That is, > >>> > >> each > >> > >>> Solr "document" is Gift Amount and Gift Date > would not > >>> > >> be multiValued. > >> > >>> You'd create a different "document" for each > gift, so > >>> > >> you'd have multiple > >> > >>> documents with the same Id, Name, and Address. > Be > >>> > >> careful, though, > >> > >>> if you've defined Id as a UniqueKey, you'd > only have > >>> > >> one record/donor. You > >> > >>> can handle this easily enough by making a > composite > >>> > >> key of Id+Gift Date > >> > >>> (assuming no donor made more than one gift on > exactly > >>> > >> the same date). > >> > >>> I know this goes completely against all the > reflexes > >>> > >> you've built up with > >> > >>> working with DBs, but... > >>> > >>> Can you give us a clue how many donations > we're > >>> > >> talking about here? > >> > >>> You'd have to be working with a really big > nonprofit > >>> > >> to get enough documents > >> > >>> to have to start worrying about making your > index > >>> > >> smaller. > >> > >>> HTH > >>> Erick > >>> > >>> On Wed, Sep 15, 2010 at 1:41 PM, Andre > Bickford <abickf...@softrek.com>wrote: > >>> > >>> > >>>> I'm working on creating a solr index > search for a > >>>> > >> charitable organization. > >> > >>>> The solr index stores documents of donors. > Each > >>>> > >> donor document has the > >> > >>>> following four fields: > >>>> > >>>> Id > >>>> Name > >>>> Address > >>>> Gift Amount (multiValued) > >>>> Gift Date (multiValued) > >>>> > >>>> In our relational database, there is a > >>>> > >> one-to-many relationship between the > >> > >>>> DONOR table and the GIFT table. One donor > can of > >>>> > >> course give many gifts over > >> > >>>> time. Consequently, I created the Gift > Amount and > >>>> > >> Gift Date fields to be > >> > >>>> mutiValued. > >>>> > >>>> Now, consider the following query filtered > for > >>>> > >> gifts last month between $0 > >> > >>>> and $100: > >>>> > >>>> q=name:Jones > >>>> fq=giftDate:[NOW/MONTH-1 TO NOW/MONTH] > >>>> fq=giftAmount:[0 TO 100] > >>>> > >>>> The results show me donors who donated ANY > amount > >>>> > >> in the past month and > >> > >>>> donors who had EVER in the past given a > gift > >>>> > >> between $0 and $100. I was > >> > >>>> hoping to only see donors who had given a > gift > >>>> > >> between $0 and $100 in the > >> > >>>> past month exclusively. I believe the > problem is > >>>> > >> that I neglected to > >> > >>>> consider that for two multiValued fields, > while > >>>> > >> the values might align > >> > >>>> "index wise", there is really no other > >>>> > >> association between the two fields, > >> > >>>> so the filter query intersection isn't > really > >>>> > >> behaving as I expected. > >> > >>>> I think this is a fundamental question of > >>>> > >> one-to-many denormalization, but > >> > >>>> obviously I'm not yet experienced enough > with > >>>> > >> Lucene/Solr to find a > >> > >>>> solution. As to why not just keep using a > >>>> > >> relational database, it's because > >> > >>>> I'm trying to provide a faceting solution > to > >>>> > >> "drill down" to donors. The > >> > >>>> aforementioned fq parameters would come > from > >>>> > >> faceting. Oh, that and Oracle > >> > >>>> Text indexes are a PITA. :-) > >>>> > >>>> Thanks for any help you can provide. > >>>> > >>>> André Bickford > >>>> Software Engineering Team Leader > >>>> SofTrek Corporation > >>>> 30 Bryant Woods North Amherst, NY > 14228 > >>>> 716.691.2800 x154 800.442.9211 > Fax: > >>>> > >> 716.691.2828 > >> > >>>> abickf...@softrek.com > > >>>> > >> www.softrek.com > >> > >>>> > >>>> > >> > >> > > > > >