dsmiley commented on a change in pull request #1171: SOLR-13892: Add 'top-level' docValues Join implementation
URL: https://github.com/apache/lucene-solr/pull/1171#discussion_r370319600
##########
File path: solr/solr-ref-guide/src/other-parsers.adoc
##########
@@ -591,36 +591,95 @@ The hash range query parser uses a special cache to improve the speedup of the q
 == Join Query Parser
-`JoinQParser` extends the `QParserPlugin`. It allows normalizing relationships between documents with a join operation. This is different from the concept of a join in a relational database because no information is being truly joined. An appropriate SQL analogy would be an "inner query".
+The Join query parser allows users to run queries that normalize relationships between documents.
+Solr runs a subquery of the user's choosing (the `v` param), identifies all the values that matching documents have in a field of interest (the `from` param), and then returns documents where those values are contained in a second field of interest (the `to` param).
-Examples:
-
-Find all products containing the word "ipod", join them against manufacturer docs and return the list of manufacturers:
+In practice, these semantics are much like "inner queries" in a SQL engine.
+As an example, consider the Solr query below:
 [source,text]
 ----
-{!join from=manu_id_s to=id}ipod
+/solr/techproducts/select?q={!join from=manu_id_s to=id}title:ipod
 ----
-Find all manufacturer docs named "belkin", join them against product docs, and filter the list to only products with a price less than $12:
+This query, which returns a document for each manufacturer that makes a product with "ipod" in the title, is semantically identical to the SQL query below:
 [source,text]
 ----
-q = {!join from=id to=manu_id_s}compName_s:Belkin
-fq = price:[* TO 12]
+SELECT *
+FROM techproducts
+WHERE id IN (
+    SELECT manu_id_s
+    FROM techproducts
+    WHERE title='ipod'
+  )
 ----
-The join operation is done on a term basis, so the "from" and "to" fields must use compatible field types. For example: joining between a `StrField` and a `IntPointField` will not work, likewise joining between a `StrField` and a `TextField` that uses `LowerCaseFilterFactory` will only work for values that are already lower cased in the string field.
+The join operation is done on a term basis, so the `from` and `to` fields must use compatible field types.
+For example: joining between a `StrField` and a `IntPointField` will not work.
+Likewise joining between a `StrField` and a `TextField` that uses `LowerCaseFilterFactory` will only work for values that are already lower cased in the string field.
+
+=== Parameters
+
+This query parser takes the following parameters:
+
+`from`::
+The name of a field which contains values to look for in the "to" field.
+Can be single or multi-valued, but must have a field type compatible with the field represented in the "to" field.
+This parameter is required.
+
+`to`::
+The name of a field whose value(s) will be checked against those found in the "from" field.
+Can be single or multi-valued, but must have a field type compatible with the "from" field.
+This parameter is required.
+
+`fromIndex`::
+The name of the index to run the "from" query (`v` parameter) on and where "from" values are gathered.
+Must be located on the same node as the core processing the request.
+This parameter is optional; it defaults to the value of the processing core if not specified.
+See <<Joining Across Collections,Joining Across Collections>> below for more information.
+
+`score`::
+An optional parameter that instructs Solr to return information about the "from" query scores.
+The value of this parameter controls what type of aggregation information is returned.
+Options include `avg` (average), `max` (maximum), `min` (minimum), `total` (total), or `none`.
++
+If `method` is not specified but `score` is, then the `dvWithScore` method is used.
+If `method` is specified and is not `dvWithScore`, then the `score` value is ignored.
+See the `method` parameter documentation below for more details.
-=== Join Parser Scoring
-You can optionally use the `score` parameter to return scores of the subordinate query. The values to use for this parameter define the type of aggregation, which are `avg` (average), `max` (maximum), `min` (minimum) `total`, or `none`.
+`method`::
+An optional parameter used to determine which of several query implementations should be used by Solr.
+Options are restricted to: `index`, `dvWithScore`, and `topLevelDV`.
+If unspecified the default value is `index`, unless the `score` parameter is present which overrides it to `dvWithScore`.
+Each implementation has its own performance characteristics, and users are encouraged to experiment to determine which implementation is most performant for their use-case.
+Details and performance heuristics are given below.
++
+`index` the default `method` unless the `score` parameter is specified.
+Uses inverted index structures to process the request.
+Performance scales linearly with the number of values matched in the "from" field.
+Consider this method when the "from" query matches few documents, when the "to" side returns a large number of documents, or when sporadic post-commit slowdowns cannot be tolerated (this a disadvantage of other methods that `index` avoids).

Review comment:
"this a" -> "this is a"
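
For readers following the semantics described in the diff above (run the `v` subquery, collect the values that matching documents hold in the `from` field, then return documents whose `to` field contains one of those values), here is a minimal Python sketch. It is a toy in-memory model, not how Solr implements any of the `method` options. The sample documents, the `join` and `as_list` helpers, and the whitespace-token match standing in for `title:ipod` are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def as_list(value: Any) -> List[Any]:
    """Normalize a missing, single-valued, or multi-valued field to a list."""
    if value is None:
        return []
    if isinstance(value, (list, tuple, set)):
        return list(value)
    return [value]


def join(docs: List[Doc], subquery: Callable[[Doc], bool],
         from_field: str, to_field: str) -> List[Doc]:
    """Toy model of {!join from=... to=...}v (hypothetical helper, not Solr code)."""
    # Step 1: run the subquery and gather every value found in from_field.
    from_values = set()
    for doc in docs:
        if subquery(doc):
            from_values.update(as_list(doc.get(from_field)))
    # Step 2: keep documents whose to_field holds at least one gathered value.
    # Values are compared as exact terms, so "Belkin" and "belkin" differ.
    return [doc for doc in docs
            if from_values.intersection(as_list(doc.get(to_field)))]


# Hypothetical sample data loosely modeled on the techproducts example.
DOCS: List[Doc] = [
    {"id": "apple", "compName_s": "Apple"},
    {"id": "belkin", "compName_s": "Belkin"},
    {"id": "p1", "title": "ipod video", "manu_id_s": "apple"},
    {"id": "p2", "title": "usb cable", "manu_id_s": "belkin"},
]

# Rough stand-in for: q={!join from=manu_id_s to=id}title:ipod
result = join(DOCS, lambda d: "ipod" in d.get("title", "").split(),
              from_field="manu_id_s", to_field="id")
print([d["id"] for d in result])
```

Because the sketch compares values as exact terms, it also illustrates why the `from` and `to` fields need compatible types: a gathered value of `Belkin` would never match a `to` value of `belkin`.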