New to Solr and Lucene. We're indexing text, PDF, and HTML docs located on
local Unix file systems, and need the ability to search by file owner,
group, and other Linux file metadata, in addition to the file contents. It
would be great if we could use Nutch to index everything, and then crawl
through the file system again with a 10-line shell script that passed the
missing metadata to Solr and updated the existing docs.
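
To make the idea concrete, here is a rough sketch of what that second pass
might look like, written in Python rather than shell for readability. The
field names (id, owner, group, mode, size) and the use of the absolute path
as the document id are assumptions for illustration only; they would have to
match whatever the real schema and the Nutch-generated ids use. The sketch
only builds the <add><doc> XML and does not send anything to Solr.

```python
import grp
import os
import pwd
import stat
import xml.etree.ElementTree as ET


def file_metadata(path):
    """Collect Unix metadata for one file as a dict of field name -> string."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid has no passwd entry
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid has no group entry
    return {
        "id": os.path.abspath(path),        # assumed unique key (hypothetical)
        "owner": owner,
        "group": group,
        "mode": stat.filemode(st.st_mode),  # e.g. "-rw-r--r--"
        "size": str(st.st_size),
    }


def build_add_xml(paths):
    """Build an <add> message with one <doc> per file."""
    add = ET.Element("add")
    for path in paths:
        doc = ET.SubElement(add, "doc")
        for name, value in file_metadata(path).items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")
```

Calling build_add_xml(["/some/file.txt"]) returns the XML string that the
script would then post to Solr's update handler.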

But <add><doc> replaces the whole document, so any old fields that are not
present in the new document are deleted.

If partial updates aren't possible, what would be the best way to
accomplish what we need? Do we have to modify the source code for each
of the different doc format parsers to add support for this metadata?

Thanks,
