Geoff, some comments inlined.

----- Original Message ----
From: Geoffrey Young <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, March 11, 2008 4:55:15 PM
Subject: Re: schema help

Otis Gospodnetic wrote:
> Geoff,
>
> I'm not sure if I understood your problem correctly, but it sounds
> like you want your search to be restricted to authors, but then you
> want to list all of his/her books when displaying results.

that's about right. add that I may also want to search on libraries and show all the books (and authors) stored there.

OG: That's fine. One page (of results) at a time, I imagine.

in real life, it's not books or authors, of course, but the parallels are close enough :) in fact, the library example is a good one for me... or at least a network of public libraries linked together.

> The
> easiest thing to do would be to create an index where each
> "row"/Document has the author name, the book title, etc. For each
> author-matching Document you'd pull his/her books out of the result
> set. Yes, this means the author name would be denormalized in
> RDBMS-speak.

I think I can live with the denormalization - it seems lucene is flat and very different conceptually than a database :)

OG: Right, it is. :)

the trouble I'm having is one of dimension. an author has many, many attributes (name, birthdate, biography in $language, etc). as does each book (title in $language, summary in $language, genre, etc). as does each library (name, address, directions in $language, etc). so an author with N books doesn't seem to scale very well in the flat representations I'm finding in all the lucene/solr docs and examples... at least not in some way I can wrap my head around.

OG: I'm not sure why the number of attributes worries you. Imagine it as a wide RDBMS table, if it helps. Indices with dozens of fields are not uncommon.

part of what seemed really appealing about lucene in general was that you could stuff all this (unindexed) information into a document and retrieve it all based on some search criteria. but it's seeming very difficult for me to wrap my head around the data I need to represent.

OG: You certainly can do that.
I'm not sure I understand where the hard part is. You seem to know what attributes each entity has. Maybe you are confused by how to handle N different types of entities in a single index? (I'm assuming a single index is what you currently have in mind)

> Another option is not to index/store book titles, but
> rather have only an author index to search against. The book data
> (mapped to author identities) would then be pulled from an external
> source (e.g. RDBMS: select title from books where author_id in
> (1,2,3)) at search results display time.

eew :)

seriously, though, that's what we have now - all rdbms driven. if solr could only conceptually handle the initial lookup there wouldn't be much point.

OG: Well, there might or might not be, depending on how much data you have, how flexible and fast your RDBMS-powered (full-text?) search is, and so on. The Lucene/Solr for full-text search + RDBMS/BDB for display data is a common combination.

maybe I'm thinking about this all wrong (as is to be expected :), but I just can't believe that nobody is using solr to represent data a bit more complex than the examples out there.

OG: Oh, lots of people are, it's just that examples are simple, so people new to Solr, Lucene, etc. have an easier time learning.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

>
> Otis
>
> -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Geoffrey Young <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, March 11, 2008 12:17:32 PM
> Subject: schema help
>
> hi :)
>
> I'm trying to work out a schema for our widgets. more than "just
> coming up with something" I'd like something idiomatic in solr terms.
> any help is much appreciated. here's a similar problem space to what
> I'm working with...
>
> let's say we're talking books. books are written by authors and held
> in libraries.
> a sister company is using lucene+compass and they seem
> to have completely different collections (or whatever the technical
> term is :)
>
>   authors
>   books
>   libraries
>
> so that a search for authors hits only the authors dataset.
>
> all of the solr examples I can find don't seem to address this kind
> of data disparity. what is the standard and idiomatic approach for
> solr?
>
> for my particular data I'd want to display something like this
>
>   author
>     book in library
>     book in library
>
> on the same result page, but using a completely flat, single schema
> doesn't seem to scale very well.
>
> collective wisdom most welcome :)
>
> --Geoff
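
For illustration, here is a minimal sketch of the flat, denormalized layout discussed in this thread, written in plain Python with no Solr calls. Everything in it is an assumption made for the example: the sample data, the field names (author_name, book_title, library_name) and the helper functions are hypothetical, and search() is only a stand-in for a real index query. It builds one flat document per (book, library) pair, repeats the author name on each, and then regroups the matching documents into the "author / book in library" layout asked about in the original question.

```python
from collections import defaultdict

# Hypothetical sample data; names and fields are illustrative only.
AUTHORS = {1: {"name": "Jane Doe"}, 2: {"name": "John Roe"}}
BOOKS = [
    {"title": "First Book", "author_id": 1, "libraries": ["Central", "Eastside"]},
    {"title": "Second Book", "author_id": 1, "libraries": ["Central"]},
    {"title": "Other Book", "author_id": 2, "libraries": ["Eastside"]},
]


def flatten(authors, books):
    """Build one flat document per (book, library) pair, repeating the author name."""
    docs = []
    for book in books:
        for library in book["libraries"]:
            docs.append({
                "author_name": authors[book["author_id"]]["name"],
                "book_title": book["title"],
                "library_name": library,
            })
    return docs


def search(docs, field, value):
    """Stand-in for a real index query: exact match on a single field."""
    return [doc for doc in docs if doc[field] == value]


def group_by_author(hits):
    """Regroup flat hits into author -> [(book, library), ...] for display."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["author_name"]].append((hit["book_title"], hit["library_name"]))
    return dict(grouped)


docs = flatten(AUTHORS, BOOKS)
by_author = group_by_author(search(docs, "author_name", "Jane Doe"))
by_library = group_by_author(search(docs, "library_name", "Eastside"))

# Print the "author / book in library" layout for the author search.
for author, entries in by_author.items():
    print(author)
    for title, library in entries:
        print(f"    {title} in {library}")
```

The same flat documents serve both the author search and the library search; only the field being matched changes. In a real index, the extra per-entity attributes (biography, summary, directions and so on) would simply be more fields on each document, which is the "wide table" picture described above.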