Is Graph Query a good option for hierarchical data structures?
Hi,

I have data in a hierarchical structure, e.g. parent --> children --> grandchildren.

Use cases: get parent docs by filtering on children and grandchildren, or get grandchild docs by filtering on parent and children.

To accommodate this I flattened the docs, adding a reference to the parent in each child doc and references to both parent and child in each grandchild doc, and used the graph query parser to join the data via its from/to fields. But this gets complicated as we add filters with AND and OR conditions. Is there another approach that can handle these kinds of use cases?

Regards
sam
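A minimal sketch of the flattened layout described above and a graph query against it. The doc shapes and field names (id, parentId, docType, name) are illustrative assumptions, not the poster's actual schema:

```text
{ "id": "p1", "docType": "parent",     "name": "acme" }
{ "id": "c1", "docType": "child",      "parentId": "p1", "name": "foo" }
{ "id": "g1", "docType": "grandchild", "parentId": "p1", "childId": "c1" }

Get parents whose children match a filter (roots are the matching
children; the traversal follows parentId values to docs whose id matches):

q={!graph from=parentId to=id returnRoot=false traversalFilter='docType:parent'}docType:child AND name:foo
```

returnRoot=false drops the matching child docs from the result, so only the traversed parents come back.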
Graph Query Bug?
Hi All,

Solr 8.2. Database structure: Parent -> Children, where each child carries a parent referenceId.

Query: get the parent doc based on a child query.

Method 1:

    {!graph from=parentId to=parentId traversalFilter='docType:parent' returnRoot=false}child.name:foo AND child.type:name

Result: 1

Debug:

    "rawquerystring": "{!graph from=parentId to=parentId traversalFilter='docType:parent' returnRoot=false}child.name:foo AND child.type:name",
    "querystring": "{!graph from=parentId to=parentId traversalFilter='docType:parent' returnRoot=false}child.name:foo AND child.type:name",
    "parsedquery": "GraphQuery([[+child.name:foo +child.type:name],parentId=parentId] [TraversalFilter: docType:parent][maxDepth=-1][returnRoot=false][onlyLeafNodes=false][useAutn=false])",
    "parsedquery_toString": "[[+child.name:foo +child.type:name],parentId=parentId] [TraversalFilter: docType:parent][maxDepth=-1][returnRoot=false][onlyLeafNodes=false][useAutn=false]",

Method 2 (the same query wrapped in parentheses):

    ({!graph from=parentId to=parentId traversalFilter='docType:parent' returnRoot=false}child.name:foo AND child.type:name)

Result: 0

Debug:

    "rawquerystring": "({!graph from=parentId to=parentId traversalFilter='docType:parent' returnRoot=false}child.name:foo AND child.type:name)",
    "querystring": "({!graph from=parentId to=parentId traversalFilter='docType:parent' returnRoot=false}child.name:foo AND child.type:name)",
    "parsedquery": "+GraphQuery([[child.name:foo],parentId=parentId] [TraversalFilter: docType:parent][maxDepth=-1][returnRoot=false][onlyLeafNodes=false][useAutn=false]) +child.type:name",
    "parsedquery_toString": "+[[child.name:foo],parentId=parentId] [TraversalFilter: docType:parent][maxDepth=-1][returnRoot=false][onlyLeafNodes=false][useAutn=false] +child.type:name",

Is there a reason why these behave differently?

Regards
sam
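The debug output explains the difference: when the query string starts with {!graph ...}, the graph parser receives the entire remaining string, but once it is nested inside parentheses it claims only the immediately following clause, and the rest ("+child.type:name" in the parsedquery) is handled by the default parser outside the graph. One way to keep the whole clause inside the graph query is the standard v local parameter, sketched here with the fields from the example:

```text
({!graph from=parentId to=parentId traversalFilter='docType:parent' returnRoot=false v='child.name:foo AND child.type:name'})
```

With v=, the sub-query is passed explicitly to the graph parser, so the surrounding parentheses no longer change what it parses.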
Stream InnerJoin to merge hierarchical data
Hi All,

Our dataset is 50M records. We are using a complex graph query and are now trying to do an innerJoin on the records, but we are facing the issue below. This is a critical issue for us.

    Parent     { "parentId": "1", "parent.name": "foo", "type": "parent" }
    Child      { "childId": "2", "parentId": "1", "child.name": "bar", "type": "child" }
    GrandChild { "grandId": "3", "childId": "2", "parentId": "1", "grandchild.name": "too", "type": "grandchild" }

This works and gives a result:

    innerJoin(
      search(collection_name, q="type:grandchild", qt="/export", fl="grandchild.name,grandId,childId,parentId", sort="childId asc"),
      search(collection_name, q="type:child", qt="/export", fl="child.name,childId,parentId", sort="childId asc"),
      on="childId")

    { "parentId": "1", "childId": "2", "grandId": "3", "grandchild.name": "too", "child.name": "bar" }

But if I try to join the parent as well with another innerJoin, it gives an error:

    innerJoin(
      innerJoin(
        search(collection_name, q="type:grandchild", qt="/export", fl="grandchild.name,grandId,childId,parentId", sort="childId asc"),
        search(collection_name, q="type:child", qt="/export", fl="child.name,childId,parentId", sort="childId asc"),
        on="childId"),
      search(collection_name, q="type:parent", qt="/export", fl="parent.name,parentId", sort="parentId asc"),
      on="parentId")

ERROR:

    { "result-set": { "docs": [ {
        "EXCEPTION": "Invalid JoinStream - all incoming stream comparators (sort) must be a superset of this stream's equalitor.",
        "EOF": true } ] } }

If we rename the key parentId in the child doc to childParentId, and similarly childId/parentId in the grandchild doc to grandchildId/grandParentId, the query works, but that is a big schema change. I also referred to this issue: https://issues.apache.org/jira/browse/SOLR-10512

Thanks
sam
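The exception says the outer innerJoin's equalitor (parentId) must be a prefix of each incoming stream's sort, but the inner join's output is still ordered by childId. One workaround, sketched below and not tested against this collection, is to re-sort the inner join's output with the sort stream decorator before joining on parentId:

```text
innerJoin(
  sort(
    innerJoin(
      search(collection_name, q="type:grandchild", qt="/export", fl="grandchild.name,grandId,childId,parentId", sort="childId asc"),
      search(collection_name, q="type:child", qt="/export", fl="child.name,childId,parentId", sort="childId asc"),
      on="childId"),
    by="parentId asc"),
  search(collection_name, q="type:parent", qt="/export", fl="parent.name,parentId", sort="parentId asc"),
  on="parentId")
```

Caveat: sort() buffers its entire incoming stream in memory to re-order it, which matters at 50M records; it may be worth filtering the grandchild/child streams first so only the joined subset is re-sorted.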
Graph Query Parser Syntax
Hi All,

In our project we have to use multiple graph queries with AND and OR conditions, but the graph query parser does not work for the scenarios below. Can anyone suggest how to overcome this kind of problem? It is blocking our pre-prod release. We are also using traversalFilter, but our use case still needs multiple graph queries combined with OR and AND.

Works:

    {!graph from=parentId to=parentId returnRoot=false}id:abc
    ({!graph from=parentId to=parentId returnRoot=false}id:abc)
    ({!graph from=parentId to=parentId returnRoot=false}id:abc AND name:test)
    {!graph from=parentId to=parentId returnRoot=false}(id:abc AND name:test)

Fails with a syntax error:

    ({!graph from=parentId to=parentId returnRoot=false}(id:abc AND name:test))

    ({!graph from=parentId to=parentId returnRoot=false}(id:abc AND name:test)) OR (({!graph from=parentId to=parentId returnRoot=false}(description:abc AND name:test))

Error:

    '(id:abc': Encountered \"\" at line 1, column 13. Was expecting one of:
    ... \"+\" ... \"-\" ... \"(\" ... \")\" ... \"*\" ... \"^\" ... \"[\" ... \"{\" ... \"filter(\" ...

Regards
sam
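A nested {!graph} clause only consumes the single term that follows it, so a parenthesized sub-query immediately after the closing brace breaks the parse. Passing the sub-query through the v local parameter keeps it inside the graph query; a sketch of the failing OR case rewritten this way (fields as in the examples above):

```text
({!graph from=parentId to=parentId returnRoot=false v='id:abc AND name:test'})
OR
({!graph from=parentId to=parentId returnRoot=false v='description:abc AND name:test'})
```

Each graph clause then parses as a unit, and the OR/AND combination happens in the outer lucene parser.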
How to use existing SolrClient with Streaming
Hi All,

I have created a SolrClient bean and am checking how to use it with SolrStream.

SolrConfiguration class:

    @Configuration(proxyBeanMethods = false)
    public class SolrConfiguration {

        @Bean
        public SolrClient solrClient() {
            String solrBaseUrl = "http://***";
            return new Http2SolrClient.Builder(solrBaseUrl).build();
        }
    }

Streaming class, for example:

    public List<Tuple> streamQuery(String expr) {
        List<Tuple> tuples = null;
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("expr", expr);
        params.set("qt", "/stream");
        TupleStream tupleStream = new SolrStream("http://***", params);
        StreamContext context = new StreamContext();
        tupleStream.setStreamContext(context);
        tuples = getTuples(tupleStream);
        return tuples;
    }

This works, but is there any other way to use the existing SolrClient? I don't have a ZooKeeper setup as of now.

Regards
sambasiva
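One option is to hand the stream a SolrClientCache through the StreamContext: SolrStream still needs the base URL (it identifies the target node), but the underlying HTTP clients are then created once and reused across stream executions instead of being built per request. A sketch, assuming Solr 8.x SolrJ; class and method names other than the SolrJ types are made up for illustration:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.io.SolrClientCache;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.SolrStream;
import org.apache.solr.client.solrj.io.stream.StreamContext;
import org.apache.solr.common.params.ModifiableSolrParams;

public class StreamQueryService {

    // Shared cache of SolrClients, reused by every stream that carries it
    // in its StreamContext. Close it on application shutdown.
    private final SolrClientCache clientCache = new SolrClientCache();

    public List<Tuple> streamQuery(String baseUrl, String expr) throws IOException {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("expr", expr);
        params.set("qt", "/stream");

        SolrStream stream = new SolrStream(baseUrl, params);
        StreamContext context = new StreamContext();
        context.setSolrClientCache(clientCache); // reuse cached clients
        stream.setStreamContext(context);

        List<Tuple> tuples = new ArrayList<>();
        try {
            stream.open();
            for (Tuple t = stream.read(); !t.EOF; t = stream.read()) {
                tuples.add(t);
            }
        } finally {
            stream.close();
        }
        return tuples;
    }
}
```

This does not reuse the Spring-managed SolrClient bean directly, but it gives the same benefit (one shared client) within the streaming API's own lifecycle.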
Re: How to use existing SolrClient with Streaming
During SolrStream initialization I had to pass the URL again; I would like to see if I can get it some other way instead.

On Tue, Feb 25, 2020 at 5:05 PM sambasivarao giddaluri <sambasiva.giddal...@gmail.com> wrote:

> this works but is there any other way to use the existing SolrClient. I
> don't have zookeeper setup as of now
Re: Graph Query Parser Syntax
Hi All, any suggestions?

On Fri, Feb 14, 2020 at 5:20 PM sambasivarao giddaluri <sambasiva.giddal...@gmail.com> wrote:

> In our project we have to use multiple graph queries with AND and OR
> conditions but graph query parser does not work for the below scenario,
> can any one suggest how to overcome this kind of problem?
Graph Parser Bug
Hi All,

I think this is a bug in the graph query parser: if we pass additional brackets it throws a parser exception, and this is blocking our project. Can anyone suggest how to handle it? We are using Solr 8.2.

Example:

    ({!graph from=parentId to=parentId traversalFilter='type:parent' returnRoot=false}(childId.name:foo))

My use case has multiple OR and AND conditions, e.g.:

    ({!graph from=parentId to=parentId traversalFilter='type:parent' returnRoot=false}(childId.name:foo OR child.id:(1 2 4)))
    AND
    ({!graph from=parentId to=parentId traversalFilter='type:parent' returnRoot=false}grandchild.id:1)
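When the parentheses are only needed to AND several graph clauses together, one workaround is to put each graph clause into its own fq parameter: multiple fq values are intersected, so no top-level grouping is required. A sketch using the fields from the example above:

```text
fq={!graph from=parentId to=parentId traversalFilter='type:parent' returnRoot=false}childId.name:foo OR child.id:(1 2 4)
fq={!graph from=parentId to=parentId traversalFilter='type:parent' returnRoot=false}grandchild.id:1
```

This covers AND combinations (and filter queries are cached independently); OR between graph clauses still has to live in a single q, where the v local parameter can carry each parenthesized sub-query.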
Solr fields mapping
Hi All,

Is there a way to map several fields into a single field? For example, the schema has these fields:

    createdBy.userName
    createdBy.name
    createdBy.email

To retrieve them I have to pass all three in the fl parameter. Instead, is there a way to group them into a createdBy map or object, so that I pass only createdBy in fl and get all three fields back in the output?

Regards
sam
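One detail worth noting: Solr's fl parameter accepts field-name glob patterns, so if the fields share the createdBy. prefix they can all be requested with one pattern, and each still comes back as its own field with its own value (no copyField merging). A sketch:

```text
fl=id,createdBy.*
```

This does not produce a nested createdBy object, but it avoids listing every sub-field explicitly.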
Re: Solr fields mapping
Hi Audrey,

Yes, I am aware of copyField, but it does not fit my use case: in the output we have to show each field with its value, and copyField combines the values, so we lose the field-to-value relationship.

Regards
sam

On Wed, Apr 29, 2020 at 9:53 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote:

> Hi, Sam!
>
> Have you tried creating a copyField?
> https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.x/javadoc/copying-fields.html
>
> Best,
> Audrey
Support for Graph parser with multiple shards
Hi all,

We have documents of three types: parent, child, and grandchild. Each child document has a reference field to its parent doc, and each grandchild document has reference fields to its child doc and parent doc. Each document also has multiple fields of its own; for example, a parent doc has age, gender, and name, and the child and grandchild docs have different fields.

A few of our use cases:

1) Get all parent documents where the child documents match a condition (e.g. child.age > 20). This should return the parent documents with all their fields.
2) Get all parent documents where parent, child, and grandchild conditions all match.
3) Get all child documents where parent, child, and grandchild conditions all match.

We also had a few other complex use cases with pagination, and all of them were achievable with the graph parser. But:

1) We are stuck because the graph parser is not usable with multiple shards in cloud mode.
2) gatherNodes does not work with pagination, and it gathers only a single field, so we have to run another query to get all the fields on the doc; and since it is streaming we have to pass a sort field, which does not work if we are looking for relevancy.

Are there any plans to implement the graph parser for multiple shards? I did go through this patch, https://issues.apache.org/jira/browse/SOLR-8176, but it is not working; it throws exceptions with Kafka.

Regards
sam
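For the single-field limitation of gatherNodes, the fetch stream decorator can re-hydrate the gathered nodes with extra fields in the same expression, avoiding the second query. A sketch only: the collection name, fields, the child.age range, and the on mapping of the gathered node value are assumptions to be checked against the actual schema:

```text
fetch(collection,
      gatherNodes(collection,
                  search(collection, q="document_type:child AND child.age:{20 TO *]", qt="/export", fl="parentId", sort="parentId asc"),
                  walk="parentId->id",
                  gather="id"),
      fl="name,age,gender",
      on="node=id")
```

This still does not solve pagination or relevancy ordering, which remain limitations of the streaming approach described above.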
Insert documents to a particular shard
Hi All,

I am running Solr in cloud mode locally with 2 shards and 2 replicas on ports 8983 and 7574, and I am trying to figure out how to insert a document into a particular shard. I read about the implicit and composite routers, but I don't think they will work for my use case.

    shard1: http://192.168.0.112:8983/family_shard1_replica_n1
            http://192.168.0.112:7574/family_shard1_replica_n2
    shard2: http://192.168.0.112:8983/family_shard2_replica_n3
            http://192.168.0.112:7574/family_shard2_replica_n4

We have documents with parent-child relationships, flattened out two levels down with references to each other. Family schema documents:

    { "Id": "1", "document_type": "parent", "name": "John" }
    { "Id": "2", "document_type": "child", "parentId": "1", "name": "Rodney" }
    { "Id": "3", "document_type": "child", "parentId": "1", "name": "George" }
    { "Id": "4", "document_type": "grandchild", "parentId": "1", "childIdId": "2", "name": "David" }

We have complex queries that get data via the graph query parser, and the graph query parser does not work in SolrCloud with multiple shards. So I was trying to build logic that, whenever a document is inserted or updated, makes sure it is saved in the same shard as its parent doc; that way graph queries work, because all the family information is in the same shard.

Approach:
1) If a new child/grandchild is being inserted, get the parent doc's shard details, store them on the document in a field (e.g. parentshard), and save the doc to that shard.
2) If a document is being updated, check whether the parentshard field exists, and if so update the doc in the same shard.

But all these checks will increase response time. Currently our development is done in cloud mode with a single shard, using SolrJ to save the data. Also, I am unable to figure out the query to update a doc in a particular shard.

Any suggestions will help.

Thanks in advance
sam
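With the default compositeId router this co-location needs no custom shard lookup: prefix every family member's uniqueKey with the parent's id and a "!" separator, and the part before "!" determines the hash, so the whole family lands on one shard. A sketch with the ids rewritten for illustration (the original ids 2, 3, 4 would change, which is the trade-off):

```text
{ "Id": "1",   "document_type": "parent",                                      "name": "John" }
{ "Id": "1!2", "document_type": "child",      "parentId": "1",                 "name": "Rodney" }
{ "Id": "1!3", "document_type": "child",      "parentId": "1",                 "name": "George" }
{ "Id": "1!4", "document_type": "grandchild", "parentId": "1", "childId": "2", "name": "David" }
```

Updates need no extra check either: re-indexing a doc with the same prefixed id routes it back to the same shard automatically, and queries for a single family can pass _route_=1! to hit only that shard.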
Re: Insert documents to a particular shard
Thanks Jörn for your suggestions. It was a sample schema; each document_type will have more fields.

1) Yes, I have explored graph traversal with gatherNodes in streaming expressions, but we found a few issues. For example, to get the parent doc based on a grandchild doc filter, the graph traversal

    {!graph from=parentId to=parentId traversalFilter='document_type:parent' returnRoot=false}(name:David AND document_type:grandchild)

returns all the fields of the parent doc, but with gatherNodes I can gather only a single field of the parent doc and then have to run another query to get the rest. We are also looking for pagination, which streams do not support.

2) I tried document routing the explicit way, and it might work for us, but I have to explore more what happens when we split the shards. For example:

    curl 'localhost:8983/solr/admin/collections?action=CREATE&name=family&router.name=implicit&router.field=rfield&collection.configName=base-config&shards=shard1,shard2&maxShardsPerNode=2&numShards=1&replicationFactor=2'

- When inserting the parent doc, I can randomly pick one of the shards (shard1 or shard2) for the rfield.
- When inserting any child or grandchild doc, I use the parent doc's rfield to keep them in the same shard.

Regards
sam

On Tue, Jun 2, 2020 at 10:35 PM Jörn Franke wrote:

> Hint: you can easily try out streaming expressions in the admin UI
>
> > Am 03.06.2020 um 07:32 schrieb Jörn Franke:
> >
> > You are trying to achieve data locality by having parents and children
> > in the same shard? Does document routing address it?
> >
> > https://lucene.apache.org/solr/guide/8_5/shards-and-indexing-data-in-solrcloud.html#document-routing
> >
> > On a side note, I don't know your complete use case, but have you
> > explored streaming expressions for graph traversal?
> >
> > https://lucene.apache.org/solr/guide/8_5/graph-traversal.html
Authentication for each collection
Hi All,

We have 2 collections and we are using basic authentication against Solr, configured in security.json. Is it possible to configure it so that we have different credentials for each collection?

Example:
    user1:password1 for collection A
    user2:password2 for collection B

Please advise if there is any other approach I can look into.
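Basic authentication itself is cluster-wide, but the rule-based authorization plugin can scope access per collection, which gives the same effect: each user's credentials only work against their collection. A sketch of the authorization section of security.json; the role and permission names are assumptions for illustration:

```json
"authorization": {
  "class": "solr.RuleBasedAuthorizationPlugin",
  "user-role": { "user1": "roleA", "user2": "roleB" },
  "permissions": [
    { "name": "coll-a-access", "collection": "A", "role": "roleA" },
    { "name": "coll-b-access", "collection": "B", "role": "roleB" },
    { "name": "all", "role": "admin" }
  ]
}
```

Both users are still defined in the single authentication section; the authorization rules decide which collection each role may reach.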
Re: Java Streaming API - nested Hashjoins with zk and accesstoken
Hi All, any advice on this?

Thanks
sam

On Sun, Nov 1, 2020 at 11:05 PM Anamika Solr wrote:

> Hi All,
>
> I need to combine 3 different documents using hashJoin. I am using the
> query below (ignore the placeholder queries):
>
>     hashJoin(
>       hashJoin(
>         search(collectionName, q="*:*", fl="id", qt="/export", sort="id desc"),
>         hashed = select(search(collectionName, q="*:*", fl="id", qt="/export", sort="id asc")),
>         on="id"),
>       hashed = select(search(collectionName, q="*:*", fl="id", qt="/export", sort="id asc")),
>       on="id")
>
> This works with a simple TupleStream in Java. But I also need to pass an
> auth token on ZooKeeper, so I have to use the code below:
>
>     ZkClientClusterStateProvider zkCluster = new ZkClientClusterStateProvider(zkHosts, null);
>     SolrZkClient zkServer = zkCluster.getZkStateReader().getZkClient();
>     StreamFactory streamFactory = new StreamFactory()
>         .withCollectionZkHost("collectionName", zkServer.getZkServerAddress())
>         .withFunctionName("search", CloudSolrStream.class)
>         .withFunctionName("hashJoin", HashJoinStream.class)
>         .withFunctionName("select", SelectStream.class);
>
>     try (HashJoinStream hashJoinStream = (HashJoinStream) streamFactory.constructStream(expr)) {
>     }
>
> The issue is that one hashJoin with a nested select and search works fine
> with this API, but the expression with multiple hashJoins does not
> complete. I can see the expression is parsed correctly, but it waits
> indefinitely for the thread to complete.
>
> Any help is appreciated.
>
> Thanks,
> Anamika
Shard Lock
Hi All,

We are getting the exception below from Solr, on a cluster with 3 ZooKeeper nodes, 3 Solr nodes, and 3 replicas. It was working fine and we got this exception unexpectedly.

    k04o95kz_shard2_replica_n10:
    org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Index dir '/opt/solr/volumes/data/cores/k04o95kz_shard2_replica_n10/data/index.20201126040543992' of core 'k04o95kz_shard2_replica_n10' is already locked. The most likely cause is another Solr server (or another solr core in this server) also configured to use this directory; other possible causes may be specific to lockType: native

    k04o95kz_shard3_replica_n16:
    org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Index dir '/opt/solr/volumes/data/cores/k04o95kz_shard3_replica_n16/data/index.20201126040544142' of core 'k04o95kz_shard3_replica_n16' is already locked. The most likely cause is another Solr server (or another solr core in this server) also configured to use this directory; other possible causes may be specific to lockType: native

Any advice?

Thanks
sam
Re: Shard Lock
When I checked /opt/solr/volumes/data/cores/, both the k04o95kz_shard2_replica_n10 and k04o95kz_shard3_replica_n16 replica directories are not present; no idea how they got deleted.

On Mon, Nov 30, 2020 at 4:13 PM sambasivarao giddaluri <sambasiva.giddal...@gmail.com> wrote:

> We are getting below exception from Solr where 3 zk with 3 solr nodes and
> 3 replicas. It was working fine and we got this exception unexpectedly.
Graph Query Parser with pagination
Hi All,

Is it possible to search an index with the graph query parser and paginate the results?

Example hierarchy:

    1 <-- 2 <-- 3
    1 <-- 4 <-- 5
    1 <-- 6 <-- 7
    ... and so on

1 is the parent of 2, 4, 6; 2 is the parent of 3; 4 is the parent of 5. Doc 1 is of type A, docs 2, 4, 6 are of type B, and docs 3, 5, 7 are of type C. Similarly I may have 200 children like 2, 4, 6.

Schema example:

    doc A { id: 1, name: Laptop }
    doc B { id: 2, parent: 1, name: Dell }
    doc C { id: 3, parent: 2, mainparent: 1, name: latitude 15inch }
    doc B { id: 4, parent: 1, name: Dell Desktop }
    doc C { id: 5, parent: 4, mainparent: 1, name: latitude 15inch }

So my query is: doc C.name = "latitude 15inch" and doc A.name = "Laptop". Using the graph query parser starting from doc C, this gives me two results. But instead of getting all results in one call, can I add some kind of pagination? Or are there any other suggestions for achieving these results where multiple docs are involved in the query?

Regards
sambasiva
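The graph query parser produces an ordinary query, so standard result pagination applies on top of it: start/rows for shallow pages, or cursorMark for deep paging (which requires a sort ending on the uniqueKey field). A sketch using the fields from the example; treat the exact traversal direction as an assumption to verify against the schema:

```text
q={!graph from=mainparent to=id returnRoot=false}name:"latitude 15inch"
fq=name:Laptop
sort=id asc
rows=10
cursorMark=*
```

The fq filters the traversal's output (keeping only the doc A results named Laptop), and each response's nextCursorMark is passed back as cursorMark to fetch the next page.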
OData filter query to Solr query
Hi All,

Do we have any library which can convert an OData filter to a Solr query?

Example: $filter=Address eq 'Redmond'  -->  ?q=Address:Redmond

Any suggestions will help.

Thanks
Sambasiva
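I am not aware of a standard library for this, but for simple comparison filters the mapping is mechanical enough to sketch directly. A minimal, hypothetical translator covering only eq/ne on a single string field (class name is made up; a real implementation would need a proper OData parser and Solr special-character escaping):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: translate simple OData comparison filters such as
// "Address eq 'Redmond'" into Solr query syntax. Only eq and ne on a
// single field are handled; this is not a full OData parser.
public class ODataToSolr {

    private static final Pattern SIMPLE =
            Pattern.compile("(\\w+)\\s+(eq|ne)\\s+'([^']*)'");

    public static String translate(String odataFilter) {
        Matcher m = SIMPLE.matcher(odataFilter.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unsupported filter: " + odataFilter);
        }
        // field:value for eq, negated clause for ne. Values containing
        // Solr special characters would need escaping or quoting here.
        String clause = m.group(1) + ":" + m.group(3);
        return m.group(2).equals("eq") ? clause : "-" + clause;
    }

    public static void main(String[] args) {
        System.out.println(translate("Address eq 'Redmond'")); // prints Address:Redmond
    }
}
```

Extending it means adding cases for gt/ge/lt/le (mapping to Solr range syntax like field:[value TO *]) and boolean and/or/not combinators.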