Hello, I am going through a few use cases where we have multiple disparate data sources that in general don't have many fields in common. My first thought was to design a separate schema/index/collection for each of them, query each one separately, and return a separate result set per source to the client.
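To make it concrete, something along these lines is what I mean for the per-source approach (the Solr URL, collection names, and the description field are just placeholders for this sketch):

```python
import requests

SOLR = "http://localhost:8983/solr"          # placeholder Solr URL
SOURCES = ["products", "tickets", "articles"]  # one collection per data source (example names)

def search_source(collection, text, rows=10):
    """Query a single per-source collection and return its own result set."""
    resp = requests.get(
        f"{SOLR}/{collection}/select",
        params={"q": f"description:({text})", "rows": rows, "wt": "json"},
    )
    resp.raise_for_status()
    return resp.json()["response"]["docs"]

def search_all(text):
    """Query each source separately; the client gets one result set per source."""
    return {source: search_source(source, text) for source in SOURCES}

if __name__ == "__main__":
    for source, docs in search_all("coffee").items():
        print(source, len(docs))
```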
I have seen one implementation where all of the fields from these disparate data sources were put into a single schema/index/collection so that everything could be searched easily through a catch-all field, but it ended up with 200+ fields including copy fields. The problem I see with this design is that ingestion (and scaling) suffers, since most of the fields for one data source are not applicable when ingesting documents from another. Basically everything gets dumped into one huge schema/index/collection.

With that in mind, I am wondering how we can design this better for another implementation where the requirement is to search across disparate sources (each with 10-15 searchable fields and 10-15 stored fields) that share only one common field, such as description. Most of the time users will search on description; the rest of the time they will search on a combination of other fields. Think of a Google-like search where you search for "coffee" and it looks across various data sources (websites, maps, images, places, etc.).

My thought is to make separate indexes for each search scenario. For the single-search-box scenario, we would index description, the other key fields that can sensibly be searched together, and the data source type into one slim index/schema (so we don't end up with a huge one) and use a catch-all field for search (rough sketch in the P.S. below). For the advanced (field-specific) search scenario, we would create a separate index/schema per data source.

Any suggestions/guidelines on how we can best address this in terms of responsiveness and scaling? Each data source may have 50-100+ million documents.

Thanks,
Susheel
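P.S. Here is a rough sketch of the slim combined schema I have in mind for the single-search-box index, set up through the Schema API (the collection name, field names, and field types are only examples):

```python
import requests

SOLR = "http://localhost:8983/solr"
COLLECTION = "unified_search"  # slim combined index; name is a placeholder

# Only the handful of fields every source can contribute, plus its source type.
schema_commands = {
    "add-field": [
        {"name": "source_type", "type": "string",       "stored": True},
        {"name": "description", "type": "text_general", "stored": True},
        {"name": "title",       "type": "text_general", "stored": True},   # example key field
        {"name": "catch_all",   "type": "text_general", "stored": False, "multiValued": True},
    ],
    # Copy the searchable fields into one catch-all field for the single search box.
    "add-copy-field": [
        {"source": "description", "dest": "catch_all"},
        {"source": "title",       "dest": "catch_all"},
    ],
}

resp = requests.post(f"{SOLR}/{COLLECTION}/schema", json=schema_commands)
resp.raise_for_status()
print(resp.json())
```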