Storing the md5 hash of pdf files as a field in the index

kuchenbrett Fri, 20 Apr 2012 08:01:41 -0700

Hi,

 I want to build an index of a large number of PDF and MS Word files using the
Data Import Request Handler and the Tika Entity Processor. That works very well.
Now I would like to use the MD5 digest of the binary (PDF/Word) file as the
unique key in the index, but I do not know how to implement this. In
data-config.xml, where the FileListEntityProcessor is configured, I have access
to the absolute file name of each PDF to be indexed. I'm on a Linux box, so
there is an easy way to calculate the MD5 hash using the operating system
command md5sum. But how can I trigger this calculation and store the result as
a field in my index?
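
 To make the target value concrete: it is the hex digest that md5sum prints for
a file. The following is only a minimal Python sketch of that calculation, not
a Data Import Handler integration; the helper name md5_hex and the chunk size
are assumptions made for illustration.

```python
import hashlib


def md5_hex(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of the file at `path`.

    Hypothetical helper: the name and the 1 MiB chunk size are illustrative
    choices, not part of Solr or the Data Import Handler.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large PDF/Word files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

 For the same file, the returned string should match the first column of the
md5sum output.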


 Any tips or ideas are really appreciated.

 Thanks.
 Joe
