    <field name="id" type="string" stored="true" indexed="true" required="true" />
    <field name="data" type="text_en" stored="true" indexed="false" />
Then, somewhere later, <uniqueKey>id</uniqueKey> (all of this in your
schema.xml file). That's it.

The data field isn't analyzed at all, so its type is largely irrelevant.
What you put in it is all your pairs of doubles in some kind of delimited
format, e.g.

    2345.0,<timestamp> | 873945.7,<timestamp>

Now you just get your data field back, split it up, and go. Getting the
report document will be about as fast as anything you could do in Solr:
a lookup by what is essentially the primary key. Updating your reports is
just re-indexing (use the timestamp in your DB), and Solr will
automatically replace documents with the same id.

You *might* be able to use the "binary" type, but that's base64 encoded,
so whether it would be faster than parsing your pairs from text is an
open question.

But what's really unclear is how ginormous your double/timestamp pairs
are. If you're pulling a billion pairs out, Solr performance won't be
your problem <G>....
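Something like this (untested) SolrJ sketch shows the whole round trip
against the two-field schema above. The URL, the report id, and the sample
arrays are all made up, and it uses the 3.x-era CommonsHttpSolrServer, so
swap in whatever client class your SolrJ version provides:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrInputDocument;

    public class ReportRoundTrip {
        public static void main(String[] args) throws Exception {
            // URL is made up; point this at your own Solr instance.
            SolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

            // Sample data standing in for one report's rows from the VALUE table.
            double[] values = { 2345.0, 873945.7 };
            long[]   stamps = { 1323111840000L, 1323111900000L };

            // Index: one document per report, all pairs flattened into "data".
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "report-42");   // report_fk as the uniqueKey
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < values.length; i++) {
                if (i > 0) sb.append('|');
                sb.append(values[i]).append(',').append(stamps[i]);
            }
            doc.addField("data", sb.toString());
            server.add(doc);    // same id => replaces the old document
            server.commit();

            // Query: fetch one report by primary key, split the pairs back out.
            QueryResponse rsp = server.query(new SolrQuery("id:report-42"));
            String data = (String) rsp.getResults().get(0).getFieldValue("data");
            for (String pair : data.split("\\|")) {
                String[] p = pair.split(",");
                double value  = Double.parseDouble(p[0]);
                long   tstamp = Long.parseLong(p[1]);
                // feed (tstamp, value) to the chart here
            }
        }
    }

The timestamp encoding (epoch millis here) is arbitrary; since the field is
never analyzed, use whatever format your front end parses most cheaply.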
Best
Erick

On Mon, Dec 5, 2011 at 2:24 PM, Alan Miller <alan.mill...@gmail.com> wrote:
>
> I know I'm using Solr for a task that is better suited to the DB, but
> I'm doing this for reasons related to the overall design of my system.
> My DB is going to become very large over time, and it is constantly
> being updated via Hadoop jobs that collect and analyze some data and
> generate the final (report) results.
>
> The front-end webapp needs to be VERY fast and only needs access to a
> subset of the data. It also lets us decouple the state of the DB and
> the front end, i.e. we can control when we sync the data from the DB to
> the Solr indexes. You could say I'm using Solr as an in-memory cache of
> my DB indexes.
>
> We're also a small team and all our development is in Java (Hadoop,
> GWT), so it was very easy for us to integrate Solr and SolrJ into our
> app.
>
> If somebody could toss in an example of what the schema might look
> like, that'd be great. I have a very simple VALUE table with columns:
>
>   value_pk  INTEGER      ; primary key
>   report_fk INT          ; foreign key to the report table
>   tstamp    TIMESTAMP
>   value     NUMERIC(7,4)
>
> Alan
>
> On Dec 5, 2011, at 14:34, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Well, Solr is a text search engine, and a good one. But this sure
>> feels like a problem that RDBMSs were built to handle. Why do
>> you want to do this? Is your current performance a problem?
>> Are you blowing your space resources out of the water? Do you
>> want to distribute your app to places not connected to your RDBMS?
>> Is there too much traffic on your RDBMS machine?
>>
>> Something about "if it ain't broke, don't fix it".
>>
>> In general, you have to tell us the problem you're trying to solve
>> so we don't go off into XY land:
>> http://people.apache.org/~hossman/#xyproblem
>>
>> Best
>> Erick
>>
>> On Fri, Dec 2, 2011 at 1:33 PM, Alan Miller <alan.mill...@gmail.com> wrote:
>>> Hi, I have a webapp that plots a bunch of time series,
>>> which are just doubles coupled with timestamps.
>>>
>>> Every chart in my webapp has a report id in my DB, and I am
>>> wondering if it would be effective to use Solr to serve the data to
>>> my app instead of keeping the data in my RDBMS.
>>>
>>> Currently I'm using Hadoop to calculate and generate the report data
>>> and then sticking it in my RDBMS, but I could use the SolrJ client
>>> to upload the data to a Solr index.
>>>
>>> I know Solr is for indexing text documents, but would it be
>>> effective to use Solr in this way?
>>>
>>> I want to query by report id and get back a series of
>>> timestamp:double pairs.
>>>
>>> Regards
>>> Alan