Hi guys,

I'm cross-posting this from the Lucene list, as I guess I can get better help here for this scenario.

Suppose I want to index 100 GB+ of numeric data. I'm not yet sure of the specifics, but I can expect the following:

- The data is expected to be in one gigantic table. Conceptually, it is like a spreadsheet: rows are objects and columns are properties.
- Values are mostly floating point numbers, and I expect them to be, let's say, unique or discrete, almost randomly distributed (1.89868776E+50, 1.434E-12).
- The data is read-only. It will never change.

Now I need to query this data based mostly on range queries over the columns. Something like:

  SELECT * FROM Table WHERE (Col1 > 1.2E2 AND Col1 < 1.8E2) OR (Col3 == 0)

which is basically "give me all the rows that satisfy these criteria".

I believe this could easily be done with a standard RDBMS, but I would like to avoid that route. While thinking about this, and assuming this could work well with Solr, there were some things I couldn't answer:

- In this case it makes total sense to store the data in the index. If I am going to index all the "columns", I might as well have the data right there.
- Does it make any sense to build the whole index once, offline, and then upload only the finished index to the servers?
- I'm almost sure I will have to shard the index in some way, and that isn't difficult. But what are the likely hardware requirements to host this thing? I know this depends on a lot of information I didn't provide (searches/sec, for example), but can someone throw out a number? I have completely no idea...
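For reference, here is roughly how I imagine the schema and the query above would map onto Solr. This is only a sketch and I may well be wrong: the field names col1/col3 and the precisionStep value are placeholders, and I picked a double-based field type because some of my values (e.g. 1.89868776E+50) are outside single-precision float range.

  <!-- schema.xml sketch: one Trie double field per "column",
       stored so the matching rows come back with the results -->
  <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"/>
  <field name="col1" type="tdouble" indexed="true" stored="true"/>
  <field name="col3" type="tdouble" indexed="true" stored="true"/>

  The SQL WHERE clause above would then become something like
  (curly braces = exclusive bounds, square brackets = inclusive):

  q=col1:{1.2E2 TO 1.8E2} OR col3:0

Please correct me if that is not how you would model it.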
Thanks

--
Pedro Ferreira
mobile: 00 44 7712 557303
skype: pedrosilvaferreira
email: psilvaferre...@gmail.com
linkedin: http://uk.linkedin.com/in/pedrosilvaferreira