Kuro,

doc of some type -> parse content into various fields -> post to Solr

Even Nutch does the same - there is a title field, a content field, and so on 
(the exact names may be different).

Of course, you can always just combine everything into a single content field.
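A minimal sketch of that pipeline for HTML, using only the Python standard library. It pulls the <title> text and the visible body text into separate fields, or into one combined "content" field, and builds a Solr XML update message (<add><doc><field name="...">). The field names id/title/content and the update URL http://localhost:8983/solr/update are assumptions: they must match your own schema.xml and Solr install. The helper names are hypothetical, not part of any Solr client.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape
import urllib.request


class FieldExtractor(HTMLParser):
    """Collects <title> text and visible body text, skipping script/style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.body_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return  # ignore whitespace and script/style bodies
        if self._in_title:
            self.title_parts.append(text)
        else:
            self.body_parts.append(text)


def html_to_fields(doc_id, html, single_field=False):
    """Parse HTML into Solr fields (hypothetical names: id, title, content)."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    title = " ".join(parser.title_parts)
    content = " ".join(parser.body_parts)
    if single_field:
        # Everything goes into one "content" field.
        combined = " ".join(p for p in (title, content) if p)
        return {"id": doc_id, "content": combined}
    return {"id": doc_id, "title": title, "content": content}


def to_solr_add_xml(fields):
    """Build an <add><doc>...</doc></add> update message, XML-escaped."""
    parts = ["<add><doc>"]
    for name, value in fields.items():
        parts.append('<field name="%s">%s</field>'
                     % (escape(name, {'"': "&quot;"}), escape(value)))
    parts.append("</doc></add>")
    return "".join(parts)


def post_to_solr(xml, url="http://localhost:8983/solr/update"):
    """POST an update message; the URL is an assumed default, adjust it."""
    req = urllib.request.Request(
        url, data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"})
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, to_solr_add_xml(html_to_fields("doc1", page)) produces the message to send. The added documents only become searchable after you also post a "<commit/>" message. Other doc types (PDF, Word, ...) would need their own text extractor in place of FieldExtractor. The rest of the pipeline stays the same.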

Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message ----
From: Teruhiko Kurosaka <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, July 3, 2007 8:56:23 PM
Subject: Indexing HTML and other doc types

Solr looks very good for indexing and searching structured data.
But I noticed there is no tool in the Solr distribution for indexing
documents of other types, such as HTML.  Are there side projects that
develop Solr clients for indexing documents of other types?

Or is generic full-text search really the wrong area in which to apply Solr,
and should I be using something like Nutch instead?
-kuro 


