Hi, I've been using Solr on a project for a couple of years now and it is working well. It's just a simple index of about 20-25 fields and 7,000 project records.
Now there's a requirement to be able to search on the content of documents (web pages, Word, PDF etc.) related to those projects. My initial thought was to create a new index to store the Tika'd content and just search on that. However, the requirement is to search through both the project records and the content records at the same time, and to list the main project with perhaps some info on the matching content data. I tried to explain that you may find matching main project records but no matching content, and vice versa.

The only solutions I can think of are either to concatenate all the document content into one field on the main project record, add that to my dismax search and use boosting etc., or to use a multi-valued field to store the content of each project document. I'm a bit reluctant to do this as the application is running well and I'm nervous about changing the schema and the indexing process.

I just wondered what you thought about adding a lot of content to an existing schema (single or multi-valued field) that doesn't normally store large amounts of data. Or does anyone know of a way I can join two searches like this together across two separate indexes?

Thanks
Shaun