Obvious datasources: MSSQL, MySQL, etc. I'm under the impression that I have
to send an XML request to SOLR for every add, update, delete, etc. in my
database.
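
For reference, here is a minimal sketch of what those XML update messages might look like, built in Python. It assumes Solr's XML update format (`<add>`, `<delete>`, `<commit/>`); the field names `id` and `title` and the helper names are hypothetical and would have to match your own schema. Sending the messages to the update URL is not shown.

```python
import xml.etree.ElementTree as ET


def add_message(doc):
    """Build an <add> message for one document (dict of field name -> value).

    Re-adding a document with the same unique key is how an "update" is
    normally expressed, assuming the schema defines a unique key field.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def delete_message(doc_id):
    """Build a <delete> message that removes one document by its unique key."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


# Changes only become visible to searches after a commit message is sent.
COMMIT_MESSAGE = "<commit/>"

# Hypothetical document; "id" and "title" are illustrative field names.
print(add_message({"id": 42, "title": "Hello"}))
print(delete_message(42))
```

One message can carry many `<doc>` elements, so a request per database row is not strictly required.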

I believe there's a way to access MSSQL, MySQL etc. directly with Lucene,
but I'm not sure how to do this with SOLR.

Thanks for all your feedback. While I started out way over my head, Solr is
actually fun to play around with, even for non-programmers or marginal
programmers like myself.

On 9/22/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:

On 9/22/06, Tim Archambault <[EMAIL PROTECTED]> wrote:
> I have a couple of questions from some online newspaper folks who are
> interested in Solr and are trying to understand how and why it came to
> be. I think inherent in these questions is the underlying theme I hear
> all the time and that is "Solr is not a content management system. It's
> a search engine."
>
> What I really wonder about CNet is how they manage their content and how
> Solr fits into their overall architecture -- is it an add-on? a
> purpose-built hammer to handle a specific problem they were having? was
> it something they "wanted" ... or instead something they needed to do,
> despite preferring something else?

Putting on my CNET hat for a little history:

We had a search server... a very thin layer built around a proprietary
search engine, used in a ton of places, for search-box type
functionality and direct generation of dynamic content.

That search engine was being discontinued by the vendor, so a
replacement was needed.  RFPs were put out, and all the commercial
alternatives were examined, but licensing costs for the number of
servers we were talking about were exorbitant.

So we decided to build our own...

The replacement: ATOMICS, a MySQL/Apache hybrid.
http://conferences.oreillynet.com/cs/mysqluc2005/view/e_sess/7066
It works well for many of the search collections we have that don't
need much in the way of full-text search (MySQL does have full-text
capabilities, but nothing like Lucene).

Backup plan: something based on Lucene.
SOLAR really started out as a pure backup plan... just in case ATOMICS
had problems in some areas.  I had joined CNET a week earlier, and the
task of building "something lucene-based" was luckily handed to me as
I didn't have any other responsibilities yet.  Pretty much no
requirements except for the preference of something that spoke
HTTP/XML that could be put behind a load-balancer and scaled.

ATOMICS was pretty much done by the time I started on SOLAR, and was
rapidly deployed across CNET.  SOLAR had a tough time gaining traction
until someone came across a problem that ATOMICS couldn't easily handle:
faceted browsing.  There was finally something concrete to aim for,
and filter caching, docsets, autowarming, custom query handlers, etc.
were rapidly added so that custom plugins could be written to
actually do the faceting logic.

The result:
http://www.mail-archive.com/java-user@lucene.apache.org/msg02645.html

It sounds like Hoss might go into some more details in his ApacheCon
session:
http://www.us.apachecon.com/html/sessions.html#FR26

> Another question asked of me was "Will Solr ever connect with
> datasources directly?"

As far as where Solr fits into our architecture, it's a back-end
component in the generation of dynamic content... sort of the same
place that a database would occupy.

I don't know much about content generation at CNET, or the specific
content management systems, but a lot of it ends up in databases.
An "indexer" piece normally pulls stuff from one or more databases,
and puts them into a solr master, which is replicated out to solr
searchers (or slaves) that the app-servers generating dynamic content
hit through a load-balancer.
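
As a rough illustration of that "indexer" piece (a sketch under stated assumptions, not CNET's actual code), the following Python pulls rows from a database and turns them into one batched Solr `<add>` message. An in-memory SQLite database stands in for the real one, and the `articles` table, its columns, and the function names are all hypothetical. `post_update` shows how the batch could be sent to a master's update URL, which the caller has to supply; it is defined but not called here.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET


def rows_to_add_xml(rows, field_names):
    """Turn database rows into a single Solr <add> message, one <doc> per row."""
    add = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(add, "doc")
        for name, value in zip(field_names, row):
            if value is None:
                continue  # skip NULL columns rather than indexing "None"
            ET.SubElement(doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_batch(conn):
    """Pull every row from the (hypothetical) articles table and build a batch."""
    cur = conn.execute("SELECT id, headline FROM articles ORDER BY id")
    names = [col[0] for col in cur.description]  # column names become field names
    return rows_to_add_xml(cur.fetchall(), names)


def post_update(update_url, xml_message):
    """POST one XML update message; update_url is supplied by the caller."""
    req = urllib.request.Request(
        update_url,
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Stand-in database so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, headline TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [(1, "First story"), (2, "Second story")],
)
batch_xml = build_batch(conn)
print(batch_xml)
```

A real indexer would follow the batch with a `<commit/>` message and would normally select only the rows changed since its last run.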

There is a diagram of that from my ApacheCon presentation:
http://people.apache.org/~yonik/ApacheConEU2006/

As far as connecting to datasources directly... I think that being
able to pull content from a database is a good idea, and it's on the
todo list.  What other specific data sources did you have in mind?

-Yonik
