Kuro, the usual flow is: document of some type -> parse its content into various fields -> post those fields to Solr.
Even Nutch does the same: there is a title field, a content field, and so on (the exact names may differ). Of course, you can always just combine everything into a single content field.

Otis

----- Original Message ----
From: Teruhiko Kurosaka <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, July 3, 2007 8:56:23 PM
Subject: Indexing HTML and other doc types

Solr looks very good for indexing and searching structured data. But I noticed there is no tool in the Solr distribution that can index documents of other doc types. Are there side projects that develop Solr clients for indexing documents of other doc types? Or is generic full-text search really the wrong area to apply Solr, and should I be using something like Nutch?

-kuro
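For the HTML case, the flow Otis describes (parse a document into fields, then post those fields to Solr) can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not a tool shipped with Solr. The field names `id`, `title` and `content` are assumptions and must match whatever your schema.xml defines. The update URL `http://localhost:8983/solr/update` is an assumed default for a local example install. The helper names (`HtmlFieldExtractor`, `html_to_fields`, `to_solr_add_xml`, `post_to_solr`) are hypothetical. The document is serialized in Solr's XML update format (`<add><doc><field name="...">`).

```python
from html.parser import HTMLParser
import urllib.request
import xml.etree.ElementTree as ET


class HtmlFieldExtractor(HTMLParser):
    """Split an HTML page into 'title' text and body 'content' text."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.content_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return  # ignore whitespace and script/style bodies
        if self._in_title:
            self.title_parts.append(text)
        else:
            self.content_parts.append(text)


def html_to_fields(doc_id, html):
    """Parse one HTML document into a dict of (assumed) Solr field values."""
    parser = HtmlFieldExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return {
        "id": doc_id,  # assumed uniqueKey field name
        "title": " ".join(parser.title_parts),
        "content": " ".join(parser.content_parts),
    }


def to_solr_add_xml(fields):
    """Build an <add><doc>...</doc></add> update message; text is XML-escaped."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


def post_to_solr(xml_body, update_url="http://localhost:8983/solr/update"):
    """POST an update message to Solr (the URL is an assumed local default)."""
    req = urllib.request.Request(
        update_url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Posted documents only become searchable after a `<commit/>` message is sent to the same update handler. Other doc types (PDF, Word, and so on) would follow the same pattern: only the parsing step changes, while the field dict and the update message stay the same. To use Otis's single-field alternative, join the title and body text into one `content` value.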