Hi,

I want to build an index of a large number of PDF and MS Word files using the DataImportHandler and the TikaEntityProcessor. This works very well.

Now I would like to use the MD5 digest of each binary (PDF/Word) file as the unique key in the index, but I do not know how to implement this. In the data-config.xml that configures the FileListEntityProcessor, I have access to the absolute file name of each PDF to be indexed. I am on a Linux box, so the MD5 hash is easy to calculate with the operating system command md5sum. But how can I trigger this calculation during the import and store the result as a field in my index?
Any tips or ideas are really appreciated. Thanks. Joe