Using hundreds of dynamic fields
Hi folks,

My application requires tracking a daily performance metric for all documents. I start tracking for an 18-month window from the time a doc is indexed, so each doc will have ~548 of these fields. I have a dynamic field in my schema to capture this requirement. Example:

metric_2014_06_24 : 15
metric_2014_06_25 : 21
…

My application then issues a query that:
a) sorts documents by the sum of the metrics within a date range that is variable for each query;
b) gathers stats on the metrics using the Statistics component.

With this design, the app must unfortunately:
a) construct the sort as a long list of fields within the spec’d date range to accomplish the sum, e.g. sort=sum(metric_2014_06_24,metric_2014_06_25,…) desc
b) specify each field in the range independently to the Stats component, e.g. stats.field=metric_2014_06_24&stats.field=metric_2014_06_25&…

Am I missing a cleaner way to accomplish this given the requirements above?

Thanks for any suggestions you may have.
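For reference, here is a rough SolrJ sketch of how such a request could be built for a variable date range (the MetricQueryBuilder name and the metric_ field prefix are just illustrative; this only automates the long parameter lists described above):

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;

public class MetricQueryBuilder {

  private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy_MM_dd");

  // Builds the sum(...) sort and the per-field stats parameters for [start, end].
  public static SolrQuery build(String userQuery, LocalDate start, LocalDate end) {
    List<String> fields = new ArrayList<>();
    for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
      fields.add("metric_" + d.format(FMT));   // one dynamic field per day in the range
    }

    SolrQuery q = new SolrQuery(userQuery);
    // a) sort by the sum of every metric field in the range
    q.set("sort", "sum(" + String.join(",", fields) + ") desc");
    // b) hand each field in the range to the stats component independently
    q.set("stats", true);
    for (String f : fields) {
      q.add("stats.field", f);
    }
    return q;
  }
}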
Re: Using hundreds of dynamic fields
Thanks, Jack and Jared, for your input on this. I'm looking into whether parent-child relationships via block join or query-time join will meet my requirements.

Jack, I noticed in a bunch of other posts around the web that you've suggested using dynamic fields in moderation. Is this suggestion based on the negative performance implications of having to read and rewrite all previous fields for a document when doing atomic updates? Or are there additional inherent negatives to using lots of dynamic fields?

Andy

On Fri, Jun 27, 2014 at 11:46 AM, Jared Whiklo wrote:

> This is probably not the best answer, but my gut says that even if you
> changed your document to a simple 2 fields, with one as your metric and
> the other as a TrieDateField, you would speed up and simplify your date
> range queries.
>
> --
> Jared Whiklo
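To make the suggested restructuring concrete, here is a rough SolrJ sketch of indexing one small metric document per day instead of ~548 dynamic fields per doc, so the variable date range becomes an ordinary range filter (field names such as doc_id, metric_date, and metric_value are hypothetical, and a block-join parent/child layout would look similar):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrInputDocument;

public class MetricDocSketch {

  // One small document per (parent doc, day).
  public static SolrInputDocument metricDoc(String parentId, String isoDate, int value) {
    SolrInputDocument d = new SolrInputDocument();
    d.addField("id", parentId + "_" + isoDate);
    d.addField("doc_id", parentId);                     // join key back to the parent document
    d.addField("metric_date", isoDate + "T00:00:00Z");  // a TrieDateField in the schema
    d.addField("metric_value", value);
    return d;
  }

  // The date range is a single range filter, and stats only ever need one field.
  public static SolrQuery rangeStatsQuery(String parentId, String fromIso, String toIso) {
    SolrQuery q = new SolrQuery("doc_id:" + parentId);
    q.addFilterQuery("metric_date:[" + fromIso + "T00:00:00Z TO " + toIso + "T23:59:59Z]");
    q.set("stats", true);
    q.add("stats.field", "metric_value");
    return q;
  }
}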
Sorting by a dynamically-generated field in a distributed context
Hi folks,

Using Solr 4.6.0 in a cloud configuration, I'm developing a SearchComponent that generates a custom score for each document. Its operational flow looks like this:

1. The score is derived from an analysis of the search results coming out of the QueryComponent, so the component is installed after QueryComponent in the processing chain.
2. The scores are generated in the component's process method (i.e. at the shard level), and a map of uniqueKey:score is attached to each shard's response at this point.
3. The shard-wise maps are combined in handleResponses, and the aggregate map is attached to the top-level distributed query's response.
4. In the finishStage method at the coordinator node level (i.e. response stage = Get Fields), I'm presented with the final list of search results sorted by Lucene score. My custom scores are now added as fields to their corresponding documents based on a uniqueKey lookup in the aggregate score map.

Now I need to sort the final document list (or do it at the shard level) by the custom score, but I'm having trouble understanding how to accomplish this. Yes, I could just sort my list (which will never exceed 1K results) in finishStage and be done with it, but I'm trying to learn Solr best practices and see if there's a better way. At the end of the day, I'd like to be able to take advantage of the "sort" request parameter to effect my sort.

Given the current operational flow, it seems like I'd need to add a new SortField for my score in step 4 and re-invoke QueryComponent's mergeIds sort routine now that my custom field is present in the document list. Of course, I can't do that since it's all private code; nor does it seem wise from an extensibility perspective to copy that code into my component for use in this manner.

Reading Sujit Pal's blog post on "Custom Sorting in Solr using External Database Data", I started down the path of defining a custom FieldType/FieldComparatorSource for my score, but I didn't see how that would help, since the sort is still applied in QueryComponent - before my custom score is available. Regardless, Sujit's example seems pretty close to what I want.

I must be misusing/misunderstanding the distributed design here in some way. Can an expert on distributed search components weigh in?

Thanks!
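For anyone following along, here is a rough skeleton of the flow described above against the 4.x SearchComponent API (the class name, the "customScores" response key, and the custom_score field are made up for illustration; the uniqueKey is assumed to be "id", and the actual scoring analysis is omitted):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.handler.component.ShardRequest;
import org.apache.solr.handler.component.ShardResponse;

public class CustomScoreComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // Nothing to do before QueryComponent runs.
  }

  // Step 2: per-shard scoring; attach a uniqueKey -> score map to the shard response.
  @Override
  public void process(ResponseBuilder rb) throws IOException {
    Map<String, Float> shardScores = computeScores(rb);  // analysis of rb.getResults()
    rb.rsp.add("customScores", shardScores);
  }

  // Step 3: the coordinator combines the shard-wise maps into one aggregate map.
  @Override
  @SuppressWarnings("unchecked")
  public void handleResponses(ResponseBuilder rb, ShardRequest sreq) {
    Map<String, Float> aggregate = aggregateScores(rb);
    for (ShardResponse srsp : sreq.responses) {
      NamedList<?> nl = srsp.getSolrResponse().getResponse();
      Map<String, Float> shardScores = (Map<String, Float>) nl.get("customScores");
      if (shardScores != null) {
        aggregate.putAll(shardScores);
      }
    }
  }

  // Step 4: at the Get Fields stage, decorate the merged documents with the custom score.
  @Override
  public void finishStage(ResponseBuilder rb) {
    if (rb.stage != ResponseBuilder.STAGE_GET_FIELDS || rb.getResponseDocs() == null) {
      return;
    }
    Map<String, Float> aggregate = aggregateScores(rb);
    for (SolrDocument doc : rb.getResponseDocs()) {
      Float score = aggregate.get((String) doc.getFieldValue("id"));
      if (score != null) {
        doc.setField("custom_score", score);
      }
    }
  }

  // Placeholder for the real shard-level analysis.
  private Map<String, Float> computeScores(ResponseBuilder rb) {
    return new HashMap<>();
  }

  // The aggregate map lives in the request context so it survives between stages
  // on the coordinator node.
  @SuppressWarnings("unchecked")
  private Map<String, Float> aggregateScores(ResponseBuilder rb) {
    return (Map<String, Float>) rb.req.getContext()
        .computeIfAbsent("customScoresAggregate", k -> new HashMap<String, Float>());
  }

  @Override
  public String getDescription() {
    return "Attaches a custom per-document score in distributed responses";
  }

  // Required by the 4.x SolrInfoMBean contract.
  public String getSource() {
    return "";
  }
}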
Duplicate scoring situation in DelegatingCollector
Hi folks,

I have a DelegatingCollector installed via a PostFilter (kind of like an AnalyticsQuery) that needs the document score to:
a) add to a collection of score-based stats, and
b) decide whether to keep the document based on the score.

If I keep the document, I call super.collect() (where the delegate chain ends in a TopScoreDocCollector), which re-scores the document in its collect method. The scoring is custom and reasonably expensive. Is there an easy way to avoid this? Or do I have to stop calling super.collect(), manage my own bitset/PQ, and pass the filtered results along in the DelegatingCollector's finish() method?

There's a thread out there ("Configurable collectors for custom ranking") that kind of talks about the above, but it seems cumbersome.

Thanks for any direction!
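For context, here is a stripped-down sketch of the collector being described, against the 4.x Collector API (the class name, the threshold, and the running stats are illustrative stand-ins for the real custom scoring); the comment marks where the duplicate scoring happens:

import java.io.IOException;
import org.apache.lucene.search.Scorer;
import org.apache.solr.search.DelegatingCollector;

// Illustrative only: gathers score-based stats and drops low-scoring docs.
public class ScoreFilteringCollector extends DelegatingCollector {

  private final float threshold;
  private Scorer currentScorer;
  private double scoreSum;   // stand-in for the collection of score-based stats
  private long kept;

  public ScoreFilteringCollector(float threshold) {
    this.threshold = threshold;
  }

  @Override
  public void setScorer(Scorer scorer) throws IOException {
    this.currentScorer = scorer;
    super.setScorer(scorer);   // the delegate (ultimately a TopScoreDocCollector) sees the same scorer
  }

  @Override
  public void collect(int doc) throws IOException {
    float score = currentScorer.score();   // first call to the expensive custom scoring
    scoreSum += score;
    if (score >= threshold) {
      kept++;
      // The delegate's collect() calls scorer.score() again for this doc,
      // which is the duplicate scoring in question.
      super.collect(doc);
    }
  }
}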