I'm researching best practices for open source pipelines/connectors for my organization. Since we need web crawls, file system crawls, and database crawls, it seems to me that ManifoldCF might be the best fit.
Has anyone combined ManifoldCF with Solr UpdateRequestProcessors or DataImportHandler? It would be nice to decide in ManifoldCF which resultHandler should receive a document or id. Barring that, you could post some fields including a URL and have DataImportHandler handle it, since it already supports scripts whereas ManifoldCF may not at this time. Suggestions and ideas?
Thanks, Dan
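
P.S. To make the "post some fields including a URL" idea concrete, here is a rough sketch of the kind of request I have in mind. It only builds the request against Solr's JSON update handler and does not send it. The host, the core name "crawl", the chain name "script-chain", and the helper name build_solr_update are all made up for illustration; update.chain is the standard Solr parameter for picking an UpdateRequestProcessor chain, but whether ManifoldCF's output connector can set it per document is exactly what I'm unsure about.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_solr_update(base_url, core, doc, chain=None):
    """Build (but do not send) a POST to Solr's JSON update handler.

    base_url, core, and chain are hypothetical values supplied by the caller.
    """
    params = {"commit": "true"}
    if chain:
        # Selects which UpdateRequestProcessor chain handles this document.
        params["update.chain"] = chain
    url = f"{base_url}/{core}/update?{urlencode(params)}"
    # Solr's JSON update format accepts a list of documents.
    body = json.dumps([doc]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document: just an id plus the URL to be fetched downstream.
req = build_solr_update(
    "http://localhost:8983/solr",
    "crawl",
    {"id": "doc-1", "url": "http://example.com/page.html"},
    chain="script-chain",
)
print(req.full_url)
```

Sending it would just be urllib.request.urlopen(req) against a running Solr; the open question is whether the routing decision can be made on the ManifoldCF side instead.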