New to Solr and Lucene. We're indexing text, PDF, and HTML documents located on local Unix file systems, and we need to be able to search on file owner, group, and other Linux file metadata in addition to the file contents. It would be great if we could use Nutch to index everything, then crawl the file system a second time with a 10-line shell script that passes the missing metadata to Solr and updates the existing docs.
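To make the second pass concrete, here is a rough sketch of what we have in mind, written in Python rather than shell for readability. It only builds the XML update message and does not send it anywhere. The field names (`owner`, `group`, `mode`, `size`) and the use of the absolute path as the `id` are assumptions for illustration; the real `id` would have to match whatever unique key Nutch wrote for each document.

```python
import grp
import os
import pwd
import stat
import xml.etree.ElementTree as ET


def file_metadata(path):
    """Collect basic Unix metadata for one file as a dict of strings."""
    st = os.stat(path)
    # Fall back to the numeric id when the uid/gid has no passwd/group entry.
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return {
        "id": os.path.abspath(path),         # assumption: must match the indexed unique key
        "owner": owner,                      # hypothetical field name
        "group": group,                      # hypothetical field name
        "mode": stat.filemode(st.st_mode),   # e.g. "-rw-r--r--"
        "size": str(st.st_size),
    }


def build_add_xml(paths):
    """Build one <add> message with a <doc> per file."""
    add = ET.Element("add")
    for path in paths:
        doc = ET.SubElement(add, "doc")
        for name, value in file_metadata(path).items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")
```

The idea would be to walk the tree with `os.walk`, pass the file paths to `build_add_xml`, and post the result to Solr's update handler.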
But posting an <add><doc> for an existing document replaces it entirely, so any old field that isn't present in the new document is lost. If partial updates aren't possible, what would be the best way to accomplish what we need? Do we have to modify the source code of each of the document format parsers to add support for this metadata? Thanks,