I have a project where the client wants to store time series data
(maybe in SOLR, if it can work).  We want to store daily "prices" over
the last 20 years (about 6000 values with associated dates), for up to
500,000 entities.

This data currently exists in a SQL database.  Access to SQL is too
slow for the client's needs at this point.  The requirement is to
fetch up to 6000 daily prices for an entity and render a chart in real
time on a web page.

One way we can do it is to generate one document for every daily
price, per entity, so we would have 500,000 * 6000 = 3 billion docs in
SOLR.  We created a simple proof of concept with 10 million documents
and it works perfectly.  But I assume up to 3 billion small documents
is too much for a single index.  What is the hard limit on the total
number of documents you can put into a SOLR index (regardless of
memory, disk space, etc.)?  The good thing about this approach is that
it works fine with the existing data import handler for SQL.  I know
we can shard the index per entity using some hash, but I want to know
what the upper limit per index is.
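
Just to make the layout concrete, here is a rough sketch (via SolrJ,
with purely illustrative field names, not our real schema) of what one
per-price document would look like:

    import org.apache.solr.common.SolrInputDocument;

    // Illustrative only: one Solr document per (entity, date) pair.
    // Field names entity_id / price_date / price are placeholders.
    public class PerPriceDoc {
        static SolrInputDocument build(String entityId, String isoDate, double price) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", entityId + "_" + isoDate);  // unique key: entity + date
            doc.addField("entity_id", entityId);
            doc.addField("price_date", isoDate);           // e.g. "2004-05-17T00:00:00Z"
            doc.addField("price", price);
            return doc;  // ~6000 per entity, 500,000 entities => ~3 billion docs
        }
    }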

Another way is to store each set of 6000 prices as some blob (maybe
JSON) in a single field on a document, and have one document per
entity (500,000 documents).  That will work, but there is no way to do
this using the existing data import handlers, correct?  If possible I
don't want to develop a custom import handler or data loader unless I
absolutely have to.  Is there some template function or something
available in the current DIH features to make this work?
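
Again just to illustrate what I mean (field names are placeholders,
and the JSON blob would have to be assembled from the SQL rows
somehow), the per-entity document would look roughly like this:

    import org.apache.solr.common.SolrInputDocument;

    // Illustrative only: one document per entity, with all prices packed
    // into one stored blob field (~6000 date/price pairs per entity).
    public class PerEntityDoc {
        static SolrInputDocument build(String entityId, String pricesJson) {
            // pricesJson e.g. {"2004-05-17":42.15,"2004-05-18":42.60, ...}
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", entityId);
            doc.addField("prices_json", pricesJson);  // stored blob, not searched on
            return doc;  // only ~500,000 documents in total
        }
    }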

Thanks
Bob
