Thanks Erick. Very helpful indeed.
Your guesses on data size are about right. There might only be 50,000 items in the whole index, and typically we'd fetch a batch of 10. Disk is cheap and this really isn't taking much room anyway. For such a tiny data set, it seems like this approach will work well.

This seems like it might even be a good approach for creating additional cores primarily for the purpose of caching: that is, a core full of records that are only ever queried by some unique key. I wouldn't want to abuse Solr for a purpose it wasn't designed for, but since it is already there it appears to be a useful approach. Rather than getting some data from the db, we fetch it from Solr pre-assembled.

Thanks
Ari

On 22/11/16 3:28am, Erick Erickson wrote:
> Searching isn't really going to be impacted much, if at all. You're
> essentially talking about setting some field with stored="true" and
> stuffing the HTML into that, right? It will probably have indexed="false"
> and docValues="false".
>
> So what that means is that very early in the indexing process, the
> raw data is dumped to the *.fdt and *.fdx files for the segment. These
> are totally irrelevant for querying; they aren't even read from disk to score
> the docs. So let's say your numFound = 10,000 and rows=10. Those 10,000
> docs are scored without having to look at the stored data at all. Now, when
> the 10 docs are assembled for return, the stored data is read off disk,
> decompressed and returned.
>
> So the additional cost will be
> 1> your index is larger on disk
> 2> merging etc. will be a bit more costly. This doesn't
>    seem like a problem if your index doesn't change all
>    that often.
> 3> there will be some additional load to decompress the data
>    and return it.
>
> This is a perfectly reasonable approach. My guess is that any difference
> in search speed will be lost in the noise of measuring and that the
> additional load of decompressing will be more than offset by not having
> to make a separate service call to actually get the doc, but as always
> measuring the performance is the proof you need.
>
> You haven't indicated how _many_ docs you have in your corpus, but a
> rough indication of the additional disk space is about half the raw HTML size;
> we've usually seen about a 2:1 compression ratio. With a zillion docs
> that could be sizeable, but disk space is cheap.
>
> Best,
> Erick
>
> On Mon, Nov 21, 2016 at 8:08 AM, Aristedes Maniatis
> <amania...@apache.org> wrote:
>> After 7-8 years of Solr usage, I'm familiar enough with how it performs as a
>> full text search index, including spatial coordinates and much more. But for
>> the most part, we've been returning database ids from Solr rather than a
>> full record ready to display. We then grab the data and related records from
>> the database in the usual way and display it.
>>
>> We are now thinking about improving the performance of our app. One option is
>> Redis to store HTML pieces for reuse, rather than assembling the HTML from
>> dozens of queries to the database. We've done what we can with caching at
>> the ORM level, and we can't do too much with Varnish because of differences
>> in page rendering per user (e.g. shopping baskets).
>>
>> But we are thinking about storing the rendered HTML directly in Solr.
>> The downsides appear to be:
>>
>> * adding 2-10kB of HTML to each record and the performance hit this might
>>   have on searching and retrieving
>> * additional load of ensuring we rebuild Solr's data every time some part of
>>   that HTML changes (but this is minimal in our use case)
>> * additional cores that we'll want to add to cache other data that isn't yet
>>   in Solr
>>
>> Is this a reasonable approach to avoid running yet another cluster of
>> services? Are there downsides to this I haven't thought of? How does Solr
>> scale with record size?
>>
>> Cheers
>> Ari
>>
>> --
>> -------------------------->
>> Aristedes Maniatis
>> GPG fingerprint CBFB 84B4 738D 4E87 5E5C 5EFA EF6A 7D2E 3E49 102A
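
For reference, here is a rough Python sketch of the two things discussed in this thread: the disk-space arithmetic (50,000 items, 2-10kB of HTML each, roughly 2:1 compression) and a request that fetches a small batch of stored records by unique key. The host, the core name "pagecache" and the field names "id" and "html" are made up for illustration and are not from the thread. The 2:1 ratio is only the rule of thumb quoted above, so the output is an estimate, not a measurement.

```python
from urllib.parse import urlencode


def stored_size_estimate(num_docs, html_bytes_per_doc, compression_ratio=2.0):
    """Return (raw_bytes, estimated_on_disk_bytes) for the stored HTML.

    compression_ratio=2.0 is the rough 2:1 figure from the thread,
    not a guarantee for any particular data set.
    """
    raw = num_docs * html_bytes_per_doc
    return raw, raw / compression_ratio


def select_by_ids_url(base_url, core, ids, id_field="id", html_field="html"):
    """Build a /select URL that fetches only stored fields for a batch of keys.

    The core and field names are hypothetical; ids are assumed to be simple
    strings that need no escaping beyond double quotes.
    """
    quoted = " OR ".join('"%s"' % i for i in ids)
    params = {
        "q": "%s:(%s)" % (id_field, quoted),     # match the unique keys only
        "fl": "%s,%s" % (id_field, html_field),  # return just the stored fields
        "rows": len(ids),
        "wt": "json",
    }
    return "%s/%s/select?%s" % (base_url.rstrip("/"), core, urlencode(params))


if __name__ == "__main__":
    # 50,000 items at the low and high end of the 2-10kB range.
    for per_doc in (2_000, 10_000):
        raw, disk = stored_size_estimate(50_000, per_doc)
        print("%d bytes/doc: raw %.0f MB, on disk roughly %.0f MB"
              % (per_doc, raw / 1e6, disk / 1e6))
    print(select_by_ids_url("http://localhost:8983/solr", "pagecache",
                            ["101", "102"]))
```

Restricting fl to the stored fields keeps the response down to what the page needs, and the stored data is only read and decompressed for the rows actually returned.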