Hi -- I'm building a Solr index to replace an existing RDBMS-based system, and I have one requirement that I'm not sure how best to satisfy. Documents in our collection can have user-generated ratings associated with them, and these ratings are aggregated by source. Sources are basically business partners who use our public API both to publish content on our system and to let their users interact with that content (rate it, comment on it, etc.). When we query the index, it's important to be able to return documents sorted by the aggregated ratings data for any given source.
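To make "aggregated by source" concrete, here is a minimal Python sketch (Python because that's what the feeding script uses) of rolling raw ratings up into a per-source count, sum, and average. The input shape, an iterable of (source_id, rating) pairs, and the function name aggregate_ratings are hypothetical; in practice the raw ratings would come out of the existing RDBMS.

```python
from collections import defaultdict


def aggregate_ratings(ratings):
    """Roll raw (source_id, rating) pairs up into per-source count, sum, and average.

    The (source_id, rating) input shape is a hypothetical stand-in for whatever
    the RDBMS query actually returns.
    """
    totals = defaultdict(lambda: [0, 0.0])  # source_id -> [count, running sum]
    for source_id, rating in ratings:
        entry = totals[source_id]
        entry[0] += 1
        entry[1] += rating
    # Only sources that actually contributed ratings appear in the result,
    # so documents with no external ratings get no per-source entries at all.
    return {
        source_id: {"count": count, "sum": total, "average": total / count}
        for source_id, (count, total) in totals.items()
    }
```

For example, aggregate_ratings([("sourceId1", 3), ("sourceId1", 4), ("sourceId2", 2)]) gives sourceId1 a count of 2, a sum of 7.0, and an average of 3.5.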
The simplest solution I could think of was to add some dynamic fields to the schema:

  <dynamicField name="userRatingAverage_*" type="sfloat" indexed="true" stored="true" />
  <dynamicField name="userRatingCount_*" type="sint" indexed="true" stored="true" />
  <dynamicField name="userRatingSum_*" type="sfloat" indexed="true" stored="true" />

Then, when I index documents, I add one set of these fields (average, count, and sum) for each source from which users have contributed ratings, e.g.:

  <field name="userRatingAverage_sourceId1">3.3</field>
  <field name="userRatingCount_sourceId1">10</field>
  <field name="userRatingSum_sourceId1">33</field>
  <field name="userRatingAverage_sourceId2">2.8</field>
  <field name="userRatingCount_sourceId2">20</field>
  <field name="userRatingSum_sourceId2">56</field>
  etc.

So far this seems acceptable. Query performance seems fine when sorting result sets on the dynamic fields, and indexing performance also seems fine*. That said, there are only 400K documents in the collection I'm working with, and only a few external rating sources at the moment (about a dozen, and most documents have no external ratings data associated with them). But because these fields are created from user-generated data, there's nothing to stop those numbers from ballooning.

What I'm wondering is whether any of the Solr experts on this list would endorse this solution or caution against it. Is there anything I need to know before I proceed with it?

Before this obvious solution occurred to me, I was thinking I would need to create a custom FieldType of my own, and perhaps my own SortComparatorSource, so that I could sort records based on a query-time parameter (i.e., the ID of the source whose ratings are to be used as the sort key). I've got a copy of LIA (Lucene in Action), and although the DistanceComparatorSource example from the start of chapter 6 seemed a bit out of date, it looked like it ought to serve me plenty well.
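Here's a minimal sketch of how a Python feeder might emit the per-source dynamic fields, and how a query might choose the sort field at request time. It assumes the ratings have already been aggregated into a dict mapping each source ID to its average, count, and sum, that the schema's uniqueKey field is named "id", and that documents are posted using Solr's XML <add><doc> update format. build_add_xml and sort_params are illustrative names, not part of any Solr client library.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_add_xml(doc_id, aggregates):
    """Build a Solr XML <add> message carrying one set of rating fields per source.

    `aggregates` maps source_id -> {"average": ..., "count": ..., "sum": ...}.
    The uniqueKey field name "id" is an assumption about the schema.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    for source_id in sorted(aggregates):
        agg = aggregates[source_id]
        # Each name matches one of the userRating*_* dynamicField patterns.
        for prefix, value in (
            ("userRatingAverage_", agg["average"]),
            ("userRatingCount_", agg["count"]),
            ("userRatingSum_", agg["sum"]),
        ):
            ET.SubElement(doc, "field", name=prefix + source_id).text = str(value)
    return ET.tostring(add, encoding="unicode")


def sort_params(query, source_id, descending=True):
    """Build query-string parameters that sort by one source's average rating.

    The sort field is picked per request simply by naming the dynamic field.
    """
    direction = "desc" if descending else "asc"
    return urlencode({"q": query, "sort": f"userRatingAverage_{source_id} {direction}"})
```

The appeal of this design is that the "query-time parameter" is just a field name in the standard sort parameter, so no custom FieldType or SortComparatorSource is needed; the cost is three extra fields per source that has ratings for a given document.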
But then this message made me think that maybe that wasn't going to be quite as easy as I'd hoped:

http://www.nabble.com/custom-sorting-tf4521989.html#a12951515

(It also made me think that I ought to take on the project proposed there, i.e., "the idea of being able to specify a raw function as a sort", once I've got a better handle on Solr's internals.)

Thanks in advance for any advice you can give.

-Charlie

* I'm adding about 250 docs/sec, though because of how I'm feeding documents, it's hard to say how much of that time is spent in Solr and how much in the Python feeding script I'm using; in any case, 250 docs/sec is perfectly adequate for now.