Solr 5.3 Faceting on Children with Block Join Parser
Apologies for cross-posting a question from SO here. I am very interested in the new faceting on child documents feature of Solr 5.3 and would like to know if somebody has figured out how to do it, as asked in the question at http://stackoverflow.com/questions/32212949/solr-5-3-faceting-on-children-with-block-join-parser

Thanks for any hints,
Tom

The question is:

Solr 5.3 supports faceting on nested documents [1], with a great tutorial from Yonik [2].

In the tutorial example, the query to get the documents for faceting is directly performed on the child documents:

    $ curl http://localhost:8983/solr/demo/query -d '
    q=author_s:yonik&fl=id,comment_t&
    json.facet={
      genres : {
        type: terms,
        field: cat_s,
        domain: { blockParent : "type_s:book" }
      }
    }'

What I do not know is how to facet on child documents returned from a Block Join Parent Query Parser [3] and provided through ExpandComponent [4].

What I have working so far is the same as in the example from the ExpandComponent [4]: query the child fields to return the parent documents (see 1.), then expand the result to get the relevant child documents (see 2.):

    1. q={!parent which="type_s:parent" v='text_t:solr'}
    2. &expand=true&expand.field=ISBN_s&expand.q=*:*

What I need: with steps 1. and 2. already working, how can we facet on some field (it does not matter which) of the child documents returned from 2.?

[1]: http://yonik.com/solr-5-3/
[2]: http://yonik.com/solr-nested-objects/
[3]: https://cwiki.apache.org/confluence/display/solr/Other+Parsers
[4]: http://heliosearch.org/expand-block-join/
Search over a multiValued field
Hi,

I am running Solr 5.0.0 and have a question about proximity search and multiValued fields.

I am indexing xml files of the following form, with foundField being a field defined as multiValued and text_en in my schema.xml.

    8
    "Oranges from South California - ordered"
    "Green Apples - available"
    "Black Report Books - ordered"

There are several such documents, and, for instance, I would like to query all documents having "Oranges" and "ordered" in the foundField. The following proximity query takes care of it:

    q=foundField:("oranges AND ordered"~2)

However, a field could have more words, and I cannot know the proximity of the desired query words in advance. Setting the proximity value too high results in false positives: the following query also returns the document (although "available" was in the entry about Apples):

    foundField:("oranges AND available"~200)

I do not think that tweaking a proximity value is the correct approach.

How can I search a multiValued field so that the query terms must match within a single value, as described above, without running into this problem?

Many thanks for any help
Re: Search over a multiValued field
Jack,

This is exactly what I was looking for, thanks. I found the positionIncrementGap attribute in the schema.xml for the text_en field type.

I was putting in "AND" because I read in the Solr documentation that "The OR operator is the default conjunction operator."

Does it mean that words between " symbols, such as "Orange ordered", are treated as a single term, with an (implicit) AND conjunction between them?

Where could I find more info about this? I am currently reading https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser

Thanks again

On Tue, Mar 3, 2015 at 3:58 PM, Jack Krupansky wrote:

> Just set the positionIncrementGap for the multivalued field to a much
> higher value, like 1000 or 5000. That's the purpose of this attribute, to
> assure that reasonable proximity matches don't match across multiple
> values.
>
> Also, leave "AND" out of the query phrases - you're just trying to match
> the product name and availability.
>
> -- Jack Krupansky
Re: Search over a multiValued field
Erick,

Thanks a lot for the explanation, makes sense now.

Tom

On Tue, Mar 3, 2015 at 5:54 PM, Erick Erickson wrote:

> bq: Does it mean that words between " symbols, such as "Orange ordered" are
> treated as a single term, with (implicitly) AND conjunction between them?
>
> Not at all. When you quote things, you're getting a "phrase query",
> perhaps one with slop. So something like "a b" means that 'a' must appear
> right next to 'b'. This is something like an AND in the sense that both
> terms must appear, but it is far more restrictive since it takes into
> account the position of the terms in the field.
>
> "a b"~10 means that both words must appear within 10 transpositions in
> the same field. You can think of "transposition" as how many intervening
> terms there are, so something like "a b"~2 would match docs with "a x b",
> but not "a x y z b".
>
> And this is where positionIncrementGap comes in. By putting 1000 in
> for it, you guarantee "a b"~999 won't match 'a' in one field and 'b' in
> another.
>
> Whereas a AND b would match across successive MV entries no matter what
> the gap.
>
> HTH,
> Erick
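Erick's explanation of slop and positionIncrementGap can be illustrated with a small, self-contained Python model. This is only a sketch under simplifying assumptions, not Lucene's actual matching code: tokens are lowercased words, only in-order matches are considered, and the helper names `index_positions` and `sloppy_match` are made up for this example.

```python
import re

# The three values of the multiValued field from the original question.
VALUES = [
    "Oranges from South California - ordered",
    "Green Apples - available",
    "Black Report Books - ordered",
]


def index_positions(values, gap):
    """Assign a position to every token of a multi-valued field.

    `gap` plays the role of positionIncrementGap: it is added to the
    position counter between two consecutive values.
    """
    positions = {}
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in re.findall(r"\w+", value.lower()):
            positions.setdefault(token, []).append(pos)
            pos += 1
    return positions


def sloppy_match(positions, a, b, slop):
    """True if `b` follows `a` with at most `slop` intervening tokens.

    Simplification: real sloppy phrase queries can also match terms in
    reversed order at a higher slop cost; that case is ignored here.
    """
    return any(
        0 < pb - pa <= slop + 1
        for pa in positions.get(a, [])
        for pb in positions.get(b, [])
    )


no_gap = index_positions(VALUES, gap=0)
big_gap = index_positions(VALUES, gap=1000)

# Without a gap, a large slop matches across two different values.
print(sloppy_match(no_gap, "oranges", "available", 200))
# With a gap of 1000, the same slop no longer crosses the value boundary.
print(sloppy_match(big_gap, "oranges", "available", 200))
# Terms inside the same value still match.
print(sloppy_match(big_gap, "oranges", "ordered", 200))
```

In this model the gap has to be larger than any slop you intend to use, which is why values such as 1000 or 5000 were suggested in the thread.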
Order of defining fields and dynamic fields in schema.xml
Hi,

I am running Solr 5 using basic_configs and have a question about the order of defining fields and dynamic fields in the schema.xml file.

For example, there is a field "hierarchy.of.fields.Project" that I am capturing as below as "text_en_splitting", but the rest of the fields in this hierarchy I would like as "text_en".

Since the dynamicField with * technically also spans the Project field, should its definition go above or below the Project field?

Or, in this case, I have a hierarchy where currently only one field should be captured, "another.hierarchy.of.fields.Description"; the rest should just be ignored for now. Is there any significance to which definition comes first?

Thanks for any hints,
Tom
Re: Order of defining fields and dynamic fields in schema.xml
That's good to know.

On http://wiki.apache.org/solr/SchemaXml it also states about dynamicFields that "you can create field rules that Solr will use to understand what datatype should be used whenever it is given a field name that is not explicitly defined, but matches a prefix or suffix used in a dynamicField."

Thanks

On Fri, Mar 6, 2015 at 10:43 AM, Alexandre Rafalovitch wrote:

> I don't believe the order in file matters for anything apart from
> initParams section. The longer - more specific one - matches first.
>
> Regards,
>    Alex.
>
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
> On 6 March 2015 at 11:21, Tom Devel wrote:
> > Hi,
> >
> > I am running solr 5 using basic_configs and have a question about the
> > order of defining fields and dynamic fields in the schema.xml file.
> >
> > For example, there is a field "hierarchy.of.fields.Project" I am
> > capturing as below as "text_en_splitting", but the rest of the fields
> > in this hierarchy, I would like as "text_en"
> >
> > Since the dynamicField with * is technically spanning over the Project
> > field, should its definition go above, or below the Project field?
> >
> > indexed="true" stored="true" multiValued="true" required="false" />
> > indexed="true" stored="true" multiValued="true" required="false" />
> >
> > Or this case, I have a hierarchy where currently only one field should
> > be captured "another.hierarchy.of.fields.Description", the rest for now
> > should be just ignored. Is there any significance of which definition
> > comes first?
> >
> > indexed="false" stored="false" multiValued="true" required="false" />
> > type="text_en" indexed="true" stored="true" multiValued="true"
> > required="false" />
> >
> > Thanks for any hints,
> > Tom
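The rule described in this thread (an explicitly defined field is used when the name matches it exactly, otherwise the longest matching dynamicField pattern applies, independent of the order in the file) can be sketched in a few lines of Python. This is an illustration of the rule as stated above, not Solr's schema code; the function `resolve_type` and the pattern "hierarchy.of.fields.*" are assumptions, since the original field definitions did not survive in the post.

```python
from fnmatch import fnmatchcase

# Explicit field from the question, mapped to its field type.
FIELDS = {"hierarchy.of.fields.Project": "text_en_splitting"}

# Hypothetical dynamicField pattern covering the rest of the hierarchy.
DYNAMIC_FIELDS = {"hierarchy.of.fields.*": "text_en"}


def resolve_type(name, fields, dynamic_fields):
    """Return the field type for `name`, or None if nothing matches.

    Explicit definitions win; among dynamic patterns, longer (more
    specific) ones are tried first, so file order does not matter.
    """
    if name in fields:
        return fields[name]
    for pattern in sorted(dynamic_fields, key=len, reverse=True):
        if fnmatchcase(name, pattern):
            return dynamic_fields[pattern]
    return None


print(resolve_type("hierarchy.of.fields.Project", FIELDS, DYNAMIC_FIELDS))
print(resolve_type("hierarchy.of.fields.Owner", FIELDS, DYNAMIC_FIELDS))
```

Here "hierarchy.of.fields.Owner" is just an invented name standing in for any other field of the hierarchy.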
Block Join Query update documents, how to do it correctly?
I am using the Block Join Query Parser with success, following the example on: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers

As this example shows, each parent document can have a number of documents embedded, and each document, be it a parent or a child, has its own unique identifier.

Now I would like to update some of the parent documents, and I have read horror stories about duplicate documents, scrambled data etc. The two prominent JIRA entries for this are:

https://issues.apache.org/jira/browse/SOLR-6700
https://issues.apache.org/jira/browse/SOLR-6096

My question is: how do you usually update such documents, for example to update a value for the parent or a value for one of its children?

I tried to repost the whole modified document (the parent and ALL of its children as one file), and it seems to work on a small toy example, but of course I cannot be sure for a larger instance with thousands of documents, and I would like to know if this is the correct way to go or not.

To make it clear, if originally I used bin/solr post on the following file:

    1
    Solr has block join support
    parentDocument
    2
    SolrCloud supports it too!

Now I could do bin/solr post on a file:

    1
    Updated field: Solr has block join support
    parentDocument
    2
    Updated field: SolrCloud supports it too!

Will this avoid the inconsistent, scrambled or duplicate data on Solr instances discussed in the JIRAs? How do you usually do this?

Thanks for any help or hints.

Tom
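The approach asked about above, re-sending the parent together with all of its children as one nested block, could be scripted roughly as follows. This is a minimal sketch that only builds the XML update file; it does not settle whether reposting avoids the problems in the JIRAs. The field names `title`, `content_type` and `comments` are assumptions, because the markup was lost from the post, and `build_block` is a helper made up for this example.

```python
import xml.etree.ElementTree as ET


def _add_doc(container, fields):
    """Append a <doc> with one <field name="..."> element per entry."""
    doc = ET.SubElement(container, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = str(value)
    return doc


def build_block(parent, children):
    """Build one <add> message holding the parent and ALL its children.

    Children are nested inside the parent <doc>, so the whole block is
    always sent as a single unit.
    """
    add = ET.Element("add")
    parent_doc = _add_doc(add, parent)
    for child in children:
        _add_doc(parent_doc, child)
    return ET.tostring(add, encoding="unicode")


xml_body = build_block(
    {
        "id": "1",
        "title": "Updated field: Solr has block join support",  # assumed field name
        "content_type": "parentDocument",  # assumed field name
    },
    [{"id": "2", "comments": "Updated field: SolrCloud supports it too!"}],  # assumed field name
)
print(xml_body)
```

The resulting string can be written to a file and posted the same way as the original document.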
Re: How to update from Solr Cloud 5.4.1 to 5.5.1
Shawn,

Do you (or anybody else here) know of the upgrade steps from 6.1 to 6.2 in this case? The release notes of 6.2 do not mention anything about upgrading, but 6.2 has some good bugfixes.

If 6.2 made changes to the index format, is a drop-in replacement from 6.1 to 6.2 still possible?

Thanks,
Tom

On Sat, Aug 27, 2016 at 12:23 PM, Shawn Heisey wrote:

> On 8/26/2016 10:22 AM, D'agostino Victor wrote:
> > Do you know in which version index format changes and if I should
> > update to a higher version ?
>
> In version 6.0, and again in the just-released 6.2, one aspect of the
> index format has been updated. Version 6.1 didn't have any format
> changes from 6.0. You won't see the new version reflected in any of the
> filenames in the index directory.
>
> Whether or not to upgrade depends on what features you need, and whether
> you need fixes included in the new version. Not all of the fixed bugs
> in 6.x are applicable to 5.x -- some are fixes for problems introduced
> during 6.x development.
>
> > And about ZooKeeper ; the 3.4.8 is fine or should I update it too ?
>
> That's the newest stable version of zookeeper. There are alpha releases
> of version 3.5.
>
> Solr includes zookeeper 3.4.6. A 3.4.8 server will work, but no
> guarantees can be made about the 3.5 alpha versions.
>
> Thanks,
> Shawn
Solr and UIMA, capturing fields
Hi,

I successfully combined Solr and UIMA with the help of https://wiki.apache.org/solr/SolrUIMA and other pages (and am happy to provide some help about how to reach this step).

Right now I can run an analysis engine and have some "primitive" features/fields, which I specify in the schema.xml, automatically recognized by Solr. But if the features themselves are objects, I do not know how to capture them in Solr.

I provided the relevant solrconfig.xml in [1] and the schema.xml addition in [2] for the following small example; they use the AE directly provided by the UIMA example.

With the input "This is a sentence with an email at u...@host.com", Solr correctly adds the field:

    "UIMAname": [ "36" ]

since this is the index where the email token starts. I could also successfully capture the feature end to indicate where the found email token ends.

However, example.EmailAddress has the features "begin, end, sofa". sofa is not a primitive feature, but an "object" which itself has the features "sofaNum, sofaID, sofaString, ..."

How can I access fields in Solr from an annotation like example.EmailAddress that are not simple strings but are themselves objects?

I made an image of the CAS Visual Debugger with this AE and the sentence to show which fields I mean; I hope this makes it clearer: http://tinypic.com/view.php?pic=34rud1s&s=8#.VN5bF7s2cWN

Does anyone know how to access such fields with Solr and UIMA?

Thanks a lot for any help,
Tom

[1]
    /home/toliwa/javalibs/uimaj-2.6.0-bin/apache-uima/examples/descriptors/analysis_engine/UIMA_Analysis_Example.xml
    false
    id
    false
    text
    example.EmailAddress
    begin
    UIMAname

[2]