Hi, I have a custom library that takes a file path as input and returns the file's content as a string. My database stores a file path in one of its tables, and I'm using a DataImportHandler (DIH) configuration in Solr to do the indexing. I couldn't use TikaEntityProcessor to index a file located on the file system, so I thought of using a custom Transformer to turn the file_path field into a file_content field in each row.
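A minimal sketch of the transformer idea described above, under a few assumptions. As I understand the DIH documentation, a transformer can be a plain class exposing `public Object transformRow(Map<String, Object> row)`, which DIH calls via reflection, so no Solr dependency is needed to compile it. The class name `FileContentTransformer`, the helper `readAtMost`, and the `MAX_CHARS` cap are hypothetical. `readAtMost` is only a stand-in for the custom library and reads UTF-8 text. The cap is one way to bound memory use for very large files, at the cost of indexing only the start of each file.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical DIH transformer: reads the file named in "file_path" and stores
 * (at most MAX_CHARS of) its text in "file_content".
 */
public class FileContentTransformer {

    // Hypothetical cap so a multi-hundred-MB file is never fully loaded into memory.
    static final int MAX_CHARS = 10 * 1024 * 1024;

    public Object transformRow(Map<String, Object> row) {
        Object pathValue = row.get("file_path");
        if (pathValue == null) {
            return row; // nothing to read; pass the row through unchanged
        }
        try {
            row.put("file_content", readAtMost(Paths.get(pathValue.toString()), MAX_CHARS));
        } catch (IOException e) {
            // Leave file_content unset rather than failing the whole import.
        }
        return row;
    }

    /** Stand-in for the custom library: reads up to maxChars characters as UTF-8. */
    static String readAtMost(Path path, int maxChars) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        try (Reader reader = new InputStreamReader(Files.newInputStream(path), StandardCharsets.UTF_8)) {
            int n;
            // Never request more characters than are still allowed by the cap.
            while (sb.length() < maxChars
                    && (n = reader.read(buf, 0, Math.min(buf.length, maxChars - sb.length()))) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("dih-demo", ".txt");
        Files.write(tmp, "hello solr".getBytes(StandardCharsets.UTF_8));
        Map<String, Object> row = new HashMap<>();
        row.put("file_path", tmp.toString());
        new FileContentTransformer().transformRow(row);
        System.out.println(row.get("file_content")); // prints "hello solr"
        Files.delete(tmp);
    }
}
```

In the DIH data-config, the class would then be referenced by its fully qualified name in the entity's `transformer` attribute (the package name depends on where you place it). Note that the string still lives in memory for each row, so the cap only limits how large that string can get.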
I would like to know the following:
1. Setting the file content as a string in a custom file_content field might cause memory issues, since a very large file (hundreds of megabytes) could consume a lot of RAM. Is it possible to send a stream to Solr as input instead? What field type should be configured in schema.xml?
2. Is there a better approach than a custom transformer?
3. Is there any other recommended way to implement indexing based on a file path?
Thanks a lot.