Hi list,
I'm using the ExtractingRequestHandler to extract content from
documents. It extracts the "last_modified" field fine, but of course
only for documents where this field is set. If the field is not set,
I want to pass the file system timestamp of the file instead.
I'm doing:
final ContentStreamUpdateRequest up =
    new ContentStreamUpdateRequest("/update/extract");
up.setParam("literal.last_modified",
    format.format(new Date(file.lastModified())));
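The snippet does not show how `format` is defined. The sketch below assumes it is a `SimpleDateFormat` that renders the file timestamp in Solr's UTC date syntax (e.g. 1995-12-31T23:59:59Z). The class name `SolrDates` and its methods are hypothetical helpers for illustration, not part of SolrJ.

```java
import java.io.File;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class SolrDates {
    // Solr date fields expect ISO 8601 in UTC with a literal trailing 'Z'
    private static final String SOLR_DATE_PATTERN = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    /** Formats epoch milliseconds (as returned by File.lastModified()) for Solr. */
    public static String toSolrDate(long epochMillis) {
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call;
        // Locale.ROOT keeps the digits ASCII regardless of the JVM's default locale
        SimpleDateFormat format = new SimpleDateFormat(SOLR_DATE_PATTERN, Locale.ROOT);
        // Without this, the local time zone would be printed with a misleading 'Z'
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(new Date(epochMillis));
    }

    /** Hypothetical convenience wrapper for a file's modification time. */
    public static String lastModifiedOf(File file) {
        // Note: File.lastModified() returns 0L if the file does not exist
        // or an I/O error occurs, which would format as the epoch
        return toSolrDate(file.lastModified());
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate(0L)); // prints 1970-01-01T00:00:00Z
    }
}
```

With a helper like this, the parameter above could be set as `up.setParam("literal.last_modified", SolrDates.lastModifiedOf(file));`.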
This works fine, but only for documents that don't have a last-modified
field of their own. For documents that do (as many PDFs do), I get:
"multiple values encountered for non multiValued field last_modified"
Is it possible to make the ExtractingRequestHandler overwrite the
last_modified value I passed as a parameter with the one Tika extracted?
Thanks,
Chris