Hi List,

My Solr instance is set up to index PST files with DIH, TikaEntityProcessor and OutlookPSTParser. After running an import, I can see that the index contains the top-level information of the PST file (e.g. unique id of each message, header, PST file size), but the messages themselves are missing. I suspect that I need to instruct Solr, in the DIH config file, to recurse to the next level during indexing, but I don't know how. My DIH config file looks like this:

<dataSource name="bin" type="BinFileDataSource" />
<document>
  <entity name="files"
          dataSource="bin"
          rootEntity="false"
          processor="FileListEntityProcessor"
          baseDir="/PST_Path"
          fileName=".*"
          onError="abort"
          recursive="true">
    <entity pk="uri"
            name="file"
            dataSource="bin"
            processor="TikaEntityProcessor"
            url="${files.fileAbsolutePath}"
            format="xml"
            rootEntity="true"
            onError="skip"
            recursive="true"
            parser="org.apache.tika.parser.mbox.OutlookPSTParser">
      <!-- I think I need to insert another entity here to parse/index the actual messages but I don't know how to craft one -->
    </entity>
  </entity>
</document>

Any ideas?

Thank you,
Anton