Obvious datasources: MSSQL, MySQL, etc. I'm under the impression that I have to send an XML request to SOLR for every add, update, delete, etc. in my database.
I believe there's a way to access MSSQL, MySQL, etc. directly with Lucene, but I'm not sure how to do this with SOLR.

Thanks for all your feedback. Although I started out way over my head, Solr is actually fun to play around with, even for non-programmers or marginal programmers like myself.

On 9/22/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 9/22/06, Tim Archambault <[EMAIL PROTECTED]> wrote:
> I have a couple of questions from some online newspaper folks who are
> interested in Solr and are trying to understand how and why it came to be. I
> think inherent in these questions is the underlying theme I hear all the
> time and that is "Solr is not a content management system. It's a search
> engine."
>
> What I really wonder about CNet is how they manage their content and how
> Solr fits into their overall architecture -- is it an add-on? a
> purpose-built hammer to handle a specific problem they were having? was it
> something they "wanted" ... or instead something they needed to do, despite
> preferring something else?

Putting on my CNET hat for a little history:

We had a search server... a very thin layer built around a proprietary search engine, used in a ton of places, for search-box type functionality and direct generation of dynamic content. That search engine was being discontinued by the vendor, so a replacement was needed. RFPs were put out, and all the commercial alternatives were examined, but the licensing costs for the number of servers we were talking about were exorbitant. So we decided to build our own...

The replacement: ATOMICS, a MySQL/Apache hybrid.
http://conferences.oreillynet.com/cs/mysqluc2005/view/e_sess/7066
It works well for many of the search collections we have that don't need much in the way of full-text search (MySQL does have full-text capabilities, but nothing like Lucene).

Backup plan: something based on Lucene.
SOLAR really started out as a pure backup plan... just in case ATOMICS had problems in some areas. I had joined CNET a week earlier, and the task of building "something lucene-based" was luckily handed to me as I didn't have any other responsibilities yet. Pretty much no requirements except for the preference of something that spoke HTTP/XML that could be put behind a load-balancer and scaled.
ATOMICS was pretty much done by the time I started on SOLAR, and was rapidly deployed across CNET. SOLAR had a tough time gaining traction until someone ran across a problem that ATOMICS couldn't easily handle: faceted browsing. There was finally something concrete to aim for, and filter caching, docsets, autowarming, custom query handlers, etc., were rapidly added so that custom plugins could actually do the faceting logic.

The result:
http://www.mail-archive.com/java-user@lucene.apache.org/msg02645.html

It sounds like Hoss might go into some more details in his ApacheCon session:
http://www.us.apachecon.com/html/sessions.html#FR26

> Another question asked of me was "Will Solr ever connect with datasources
> directly?"

As far as where Solr fits into our architecture, it's a back-end component in the generation of dynamic content... sort of the same place that a database would occupy. I don't know much about content generation in CNET, or about specific content management systems, but a lot of it ends up in databases. An "indexer" piece normally pulls stuff from one or more databases, and puts it into a solr master, which is replicated out to solr searchers (or slaves) that the app-servers generating dynamic content hit through a load-balancer.

There is a diagram of that from my ApacheCon presentation:
http://people.apache.org/~yonik/ApacheConEU2006/

As far as connecting to datasources directly... I think that being able to pull content from a database is a good idea, and it's on the todo list. What specific other data sources did you have in mind?

-Yonik
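
For readers wondering what the "indexer" piece described above might look like, here is a minimal sketch in Python. It is not CNET's indexer; it only illustrates the pattern of pulling rows from a database and turning them into Solr XML update messages (`<add>`, `<delete>`, `<commit/>`). The `articles` table, its columns, the in-memory SQLite database standing in for MSSQL/MySQL, and the helper names are all assumptions made for illustration. Each message would then be sent to the Solr update handler with an HTTP POST; the exact URL depends on your installation and is left out here.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_add_xml(rows, field_names):
    """Build a Solr <add> message with one <doc> per database row."""
    add = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(add, "doc")
        for name, value in zip(field_names, row):
            if value is None:
                continue  # skip NULL columns instead of indexing empty fields
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    # ElementTree escapes characters such as "&" and "<" in field text
    return ET.tostring(add, encoding="unicode")


def delete_xml(doc_id):
    """Build a Solr <delete> message that removes a document by its id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def build_demo_messages():
    """Pull rows from a hypothetical 'articles' table and build update messages."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real MSSQL/MySQL source
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?)",
        [(1, "Solr & Lucene"), (2, "Faceted browsing")],
    )
    rows = conn.execute("SELECT id, title FROM articles ORDER BY id").fetchall()
    conn.close()
    # One batched add, one delete, then a commit to make the changes searchable
    return [rows_to_add_xml(rows, ["id", "title"]), delete_xml(2), "<commit/>"]


if __name__ == "__main__":
    for message in build_demo_messages():
        print(message)
```

Note that one `<add>` message can carry many documents, so an indexer along these lines could batch its updates rather than send one request per changed row.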