On 1/21/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
Deep within the "Update Plugin" discussion, Hoss and I agreed that
adding an interface and registry for DocumentParsers is a good idea:
  interface SolrDocumentParser
  {
    Document parse(ContentStream content);
  }

  SolrDocumentParser parser = core.getDocumentParser("text/html");
This would let update plugins share (pluggable) logic for how to
convert a single stream into a single document... this is more than
we are talking about doing now, but something (else) to keep in mind.
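
To make the proposal above concrete, here is a minimal sketch of what such a registry could look like. It is not code from this thread or from Solr: DocumentParserRegistry, register(), and the normalize() helper are hypothetical names, and ContentStream and Document are cut-down stand-ins for the real Solr and Lucene types so that the example compiles on its own. Ignoring content-type parameters and letter case during lookup is also an assumption, not something the proposal specifies.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-ins for Solr's ContentStream and Lucene's Document,
// reduced to what this sketch needs.
interface ContentStream {
    String getContentType();
    String getText();
}

final class Document {
    final Map<String, String> fields = new HashMap<>();
}

// The interface as proposed in the message above.
interface SolrDocumentParser {
    Document parse(ContentStream content);
}

// Assumed registry; in the proposal this lookup would hang off the core.
final class DocumentParserRegistry {
    private final Map<String, SolrDocumentParser> parsers = new HashMap<>();

    void register(String contentType, SolrDocumentParser parser) {
        parsers.put(normalize(contentType), parser);
    }

    // Returns null when no parser is registered for the content type.
    SolrDocumentParser getDocumentParser(String contentType) {
        return parsers.get(normalize(contentType));
    }

    // Drops parameters such as "; charset=UTF-8" and ignores case.
    private static String normalize(String contentType) {
        int semi = contentType.indexOf(';');
        String base = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }
}
```

An update plugin would then ask the registry for a parser by the stream's content type and fall back to its own handling when the lookup returns null.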
Yes, please, for another day... ;-)
It would be interesting to explore what we could share with Nutch
too... they're in the business of doc parsing.
When we get to it, I'd like to hear why it (things like PDF parsing)
should be inside Solr rather than outside using our update interfaces.
-Yonik