Searching isn't really going to be impacted much, if at all. You're essentially talking about setting some field with stored="true" and stuffing the HTML into that, right? It will probably have indexed="false" and docValues="false".
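For concreteness, a field definition along those lines might look like this in schema.xml (the field name "rendered_html" is just a placeholder):

    <!-- stored-only field: kept on disk for retrieval, never used for matching or sorting -->
    <field name="rendered_html" type="string" indexed="false"
           stored="true" docValues="false" multiValued="false"/>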
So... what that means is that very early in the indexing process, the raw data is written to the *.fdt and *.fdx files for the segment. These files are totally irrelevant for querying; they aren't even read from disk to score the docs. So let's say your numFound is 10,000 and rows=10. Those 10,000 docs are scored without looking at the stored data at all. Then, when the 10 docs are assembled for return, the stored data is read off disk, decompressed, and returned.

So the additional costs will be:

1> your index is larger on disk.
2> merging etc. will be a bit more costly. This doesn't seem like a problem if your index doesn't change all that often.
3> there will be some additional load to decompress the data and return it.

This is a perfectly reasonable approach. My guess is that any difference in search speed will be lost in the noise of measurement, and that the additional load of decompressing will be more than offset by not having to make a separate service call to actually get the doc. But as always, measuring the performance is the proof you need (see the P.S. below for a quick way to do that).

You haven't indicated how _many_ docs you have in your corpus, but a rough indication of the additional disk space is about half the raw HTML size; we've usually seen about a 2:1 compression ratio. With a zillion docs that could be sizeable, but disk space is cheap.

Best,
Erick

On Mon, Nov 21, 2016 at 8:08 AM, Aristedes Maniatis <amania...@apache.org> wrote:
> I'm familiar enough with 7-8 years of Solr usage in how it performs as a
> full text search index, including spatial coordinates and much more. But
> for the most part, we've been returning database ids from Solr rather
> than a full record ready to display. We then grab the data and related
> records from the database in the usual way and display it.
>
> We are thinking now about improving performance of our app. One option
> is Redis to store html pieces for reuse, rather than assembling the html
> from dozens of queries to the database. We've done what we can with
> caching at the ORM level, and we can't do too much with varnish because
> of differences in page rendering per user (eg shopping baskets).
>
> But we are thinking about storing the rendered html directly in Solr.
> The downsides appear to be:
>
> * adding 2-10kB of html to each record and the performance hit this
>   might have on searching and retrieving
> * additional load of ensuring we rebuild Solr's data every time some
>   part of that html changes (but this is minimal in our use case)
> * additional cores that we'll want to add to cache other data that
>   isn't yet in Solr
>
> Is this a reasonable approach to avoid running yet another cluster of
> services? Are there downsides to this I haven't thought of? How does
> Solr scale with record size?
>
>
> Cheers
> Ari
>
>
> --
> -------------------------->
> Aristedes Maniatis
> GPG fingerprint CBFB 84B4 738D 4E87 5E5C 5EFA EF6A 7D2E 3E49 102A
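P.S. A quick way to measure: run the same query once with only the id in the fl list and once including the stored HTML field, and compare end-to-end response times. Core and field names below are placeholders; I'd time the whole request rather than rely on QTime, since I don't believe QTime captures the stored-field fetch during response writing. Something like:

    # baseline: score and return ids only, no stored HTML read
    time curl "http://localhost:8983/solr/yourcore/select?q=*:*&rows=10&fl=id" > /dev/null

    # same query, but also fetch and decompress the stored HTML
    time curl "http://localhost:8983/solr/yourcore/select?q=*:*&rows=10&fl=id,rendered_html" > /dev/null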