Re: schema help

Otis Gospodnetic Tue, 11 Mar 2008 20:42:34 -0700

Geoff, some comments inlined.

----- Original Message ----
From: Geoffrey Young <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, March 11, 2008 4:55:15 PM
Subject: Re: schema help

Otis Gospodnetic wrote:
> Geoff,
> 
> I'm not sure if I understood your problem correctly, but it sounds
> like you want your search to be restricted to authors, but then you
> want to list all of his/her books when displaying results. 

that's about right.  add that I may also want to search on libraries and 
show all the books (and authors) stored there.

OG: That's fine.  One page (of results) at a time, I imagine.

in real life, it's not books or authors, of course, but the parallels 
are close enough :)  in fact, the library example is a good one for 
me... or at least a network of public libraries linked together.

> The
> easiest thing to do would be to create an index where each
> "row"/Document has the author name, the book title, etc.  For each
> author-matching Document you'd pull his/her books out of the result
> set.  Yes, this means the author name would be denormalized in
> RDBMS-speak.  

I think I can live with the denormalization - it seems lucene is flat 
and very different conceptually than a database :)

OG: Right, it is. :)
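
A minimal sketch of the flat, denormalized layout discussed above, in plain Python rather than against a live index. The field names (author_name, book_title, library_name) and the sample rows are hypothetical, chosen only for illustration. Each Document repeats the author fields, and the author's books are pulled back out of the matching rows:

```python
from collections import defaultdict

# Hypothetical flat documents: one per (author, book) pair.
# The author name is repeated (denormalized) on every row.
docs = [
    {"id": "1", "author_name": "Jane Doe", "book_title": "First Book", "library_name": "Central"},
    {"id": "2", "author_name": "Jane Doe", "book_title": "Second Book", "library_name": "Eastside"},
    {"id": "3", "author_name": "John Roe", "book_title": "Other Book", "library_name": "Central"},
]

def books_by_author(matching_docs):
    """Group book titles under each author found in the matching documents."""
    grouped = defaultdict(list)
    for doc in matching_docs:
        grouped[doc["author_name"]].append(doc["book_title"])
    return dict(grouped)

# Stand-in for an author-restricted search: a naive substring match.
hits = [d for d in docs if "doe" in d["author_name"].lower()]
result = books_by_author(hits)
```

The grouping happens on the client side here; the index itself stays flat.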

the trouble I'm having is one of dimension.  an author has many, many 
attributes (name, birthdate, biography in $language, etc).  as does each 
book (title in $language, summary in $language, genre, etc).  as does 
each library (name, address, directions in $language, etc).  so an 
author with N books doesn't seem to scale very well in the flat 
representations I'm finding in all the lucene/solr docs and examples... 
at least not in some way I can wrap my head around.

OG: I'm not sure why the number of attributes worries you.  Imagine it as a 
wide RDBMS table, if it helps.  Indices with dozens of fields are not uncommon.

part of what seemed really appealing about lucene in general was that 
you could stuff all this (unindexed) information into a document and 
retrieve it all based on some search criteria.  but it's seeming very 
difficult for me to wrap my head around the data I need to represent.

OG: You certainly can do that.  I'm not sure I understand where the hard part 
is.  You seem to know what attributes each entity has.  Maybe you are confused 
by how to handle N different types of entities in a single index? (I'm assuming 
a single index is what you currently have in mind)
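
One possible way to keep several entity types in a single index, sketched under assumptions: every document carries a discriminator field (here a hypothetical field named "type"), and a search is restricted to one entity with a filter query. The q and fq request parameters are standard Solr parameters; the field names and ids below are made up for the example.

```python
from urllib.parse import urlencode

# Hypothetical documents of three entity types sharing one index.
# "type" is an assumed discriminator field, not something Solr provides.
docs = [
    {"id": "author-1", "type": "author", "name": "Jane Doe"},
    {"id": "book-1", "type": "book", "title": "First Book", "author_id": "author-1"},
    {"id": "library-1", "type": "library", "name": "Central Library"},
]

def select_params(entity_type, text):
    """Build a query string that restricts the search to one entity type."""
    return urlencode({"q": text, "fq": f"type:{entity_type}"})

def filter_local(all_docs, entity_type):
    """Local stand-in for what the filter query would do on the server."""
    return [d for d in all_docs if d["type"] == entity_type]

query_string = select_params("author", "doe")
authors_only = filter_local(docs, "author")
```

With this layout a search for authors touches only the author documents, while books and libraries remain searchable in the same index by changing the filter value.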

> Another option is not to index/store book titles, but
> rather have only an author index to search against.  The book data
> (mapped to author identities) would then be pulled from an external
> source (e.g. RDBMS: select title from books where author_id in
> (1,2,3)) at search results display time.

eew :)  seriously, though, that's what we have now - all rdbms driven. 
if solr could only conceptually handle the initial lookup there wouldn't 
be much point.

OG: Well, there might or might not be, depending on how much data you have, how 
flexible and fast your RDBMS-powered (full-text?) search is, and so on.  
Lucene/Solr for full-text search plus RDBMS/BDB for display data is a common 
combination.
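
A rough sketch of that combination, using an in-memory SQLite database as a stand-in for the RDBMS. The table layout, the sample rows, and the list of author ids (which in practice would come back from the search engine) are all assumptions made for the example:

```python
import sqlite3

# In-memory stand-in for the RDBMS that holds the display data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO books (author_id, title) VALUES (?, ?)",
    [(1, "First Book"), (1, "Second Book"), (2, "Other Book"), (4, "Unrelated Book")],
)

def titles_for_authors(connection, author_ids):
    """Fetch book titles for the author ids returned by the search step."""
    if not author_ids:
        return {}
    # One "?" placeholder per id keeps the IN (...) clause parameterized.
    placeholders = ",".join("?" for _ in author_ids)
    rows = connection.execute(
        f"SELECT author_id, title FROM books "
        f"WHERE author_id IN ({placeholders}) ORDER BY author_id, id",
        list(author_ids),
    )
    titles = {}
    for author_id, title in rows:
        titles.setdefault(author_id, []).append(title)
    return titles

# Stand-in for the ids of the authors matched by the full-text search.
author_ids = [1, 2, 3]
titles = titles_for_authors(conn, author_ids)
```

The search engine only has to return identifiers for the current page of results, so the database lookup stays small.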

maybe I'm thinking about this all wrong (as is to be expected :), but I 
just can't believe that nobody is using solr to represent data a bit 
more complex than the examples out there.

OG: Oh, lots of people are, it's just that examples are simple, so people new 
to Solr, Lucene, etc. have an easier time learning.

Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

> 
> Otis
> 
> -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> ----- Original Message ----
> From: Geoffrey Young <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, March 11, 2008 12:17:32 PM
> Subject: schema help
> 
> hi :)
> 
> I'm trying to work out a schema for our widgets.  more than "just
> coming up with something" I'd like something idiomatic in solr terms.
> any help is much appreciated.  here's a similar problem space to what
> I'm working with...
> 
> lets say we're talking books.  books are written by authors and held
> in libraries.  a sister company is using lucene+compass and they seem
> to have completely different collections (or whatever the technical
> term is :)
> 
>   authors
>   books
>   libraries
> 
> so that a search for authors hits only the authors dataset.
> 
> all of the solr examples I can find don't seem to address this kind
> of data disparity.  what is the standard and idiomatic approach for
> solr?
> 
> for my particular data I'd want to display something like this
> 
>   author
>     book in library
>     book in library
> 
> on the same result page, but using a completely flat, single schema 
> doesn't seem to scale very well.
> 
> collective wisdom most welcome :)
> 
> --Geoff
> 
>
