This is a good start. A few things to consider:
1. Extract the contents via Tika externally or via Tika Server.
2. Create a canonical “Item” document schema with fields such as title, metadata,
contents, imagePreview (something to consider), etc.
3. Use the extracted Tika data to populate your index.
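
The three steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes a Tika Server running locally on its default port 9998 and a Solr collection named "items" (a hypothetical name). The field names in build_item (id, path, title, content_type, contents) are illustrative only, and imagePreview is left out.

```python
import hashlib
import json
import re
import urllib.request

# Assumptions: local Tika Server on its default port; "items" is a hypothetical Solr collection.
TIKA_URL = "http://localhost:9998/rmeta/text"
SOLR_UPDATE_URL = "http://localhost:8983/solr/items/update?commit=true"


def extract_with_tika(path):
    """Step 1: send one file to Tika Server and return the metadata of the container document."""
    with open(path, "rb") as fh:
        body = fh.read()
    req = urllib.request.Request(
        TIKA_URL, data=body, method="PUT", headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # /rmeta returns a JSON list: the container document first, then any embedded documents.
        return json.load(resp)[0]


def build_item(path, tika_meta):
    """Step 2: map raw Tika metadata onto a canonical "Item" document (illustrative fields)."""
    title = tika_meta.get("dc:title")
    if isinstance(title, list):  # multi-valued metadata arrives as a list
        title = title[0] if title else None
    if not title:
        # Fall back to the file name; split on both separators to cover network paths.
        title = re.split(r"[\\/]", path)[-1]
    return {
        "id": hashlib.sha1(path.encode("utf-8")).hexdigest(),  # stable id derived from the path
        "path": path,
        "title": title,
        "content_type": tika_meta.get("Content-Type", ""),
        "contents": (tika_meta.get("X-TIKA:content") or "").strip(),
    }


def index_items(items):
    """Step 3: post a batch of Item documents to Solr's JSON update handler."""
    req = urllib.request.Request(
        SOLR_UPDATE_URL,
        data=json.dumps(items).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Usage would be along the lines of index_items([build_item(p, extract_with_tika(p)) for p in paths]), sent in batches rather than one request per file. Keeping extraction outside Solr means a file that crashes or stalls the parser does not take the indexer down with it.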
Hi team,
We have a business case like the one below.
There are nearly 150 GB of documents (PDF/PPT/Word/Excel/MSG files) currently
stored on a network path. To implement text search on them, we are
planning to use Solr. The plan is listed below.
1)Using a high configuration W