Re: solr sorting problem
Were you able to get it to work? If yes, how? I'm having almost the same problem. I used the fieldType name="alphaOnlySort" class="solr.TextField" definition from the sample schema.xml to define a field named "alphaname", and then copied one of the fields, "foodDescUS", to "alphaname". When I try to sort using alphaname I get this error: "The field :foodDesc present in DataConfig does not have a counterpart in Solr Schema". Please help. Thanks, Pratik
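For reference, a sketch of the kind of schema.xml configuration being described. The alphaOnlySort type is the one shipped in the stock Solr example schema; the field, copyField and type names for the user's own fields are assumptions based on the message above, since the actual schema isn't shown:

<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
  </analyzer>
</fieldType>

<field name="foodDescUS" type="text_general" indexed="true" stored="true"/>
<field name="alphaname" type="alphaOnlySort" indexed="true" stored="false"/>
<copyField source="foodDescUS" dest="alphaname"/>

Sorting would then be requested with sort=alphaname asc on the query.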
Re: solr sorting problem
Hello, I got over that problem, but now I am facing a new one. Indexing works but search does not. I used the following line in the schema, and I'm trying to use the default "alphaOnlySort" from the sample schema.xml. The database is MySQL, and there is a column/field named ColXYZ. My data-config looks like this: In which scenarios would Solr index the records/documents but searching would not work? Thanks
Re: solr sorting problem
Hi, Thanks for your reply. I'm using commit=true while indexing, and it does index the records and shows the number of records indexed. The problem is that search yields 0 records (numFound="0"). There are some entries for spell checking in my schema too. The search URL is something like: http://localhost:8983/solr/select/?q=apple&indent=on or http://localhost:8983/solr/select/?q=apple&version=2.2&start=0&rows=10&indent=on Cache could not be the problem, as it did not fetch any records from the very beginning. So, basically, it does not fetch any documents/records even though it does index them. Thanks, Pratik
Re: solr sorting problem
Hi, Were you able to sort the results using alphaOnlySort? If yes, what changes were made to the schema and data-config? Thanks
Should I Use Solr
Hi, I am using Oracle 11gR2, and we have a schema where a few tables have more than 100 million rows (some of the columns are VARCHAR2 of 100 bytes). We frequently have to do LIKE-based searches on those tables, and sometimes we need to join the tables as well. Inserts/updates also happen very frequently on these tables (around 1000 inserts/updates per second) from other applications. So my question is: for my user interface, should I use Apache Solr to let users search these tables instead of SQL queries? I have tried SQL and it is really slow, considering the amount of data in my database. My requirements are that results should come back fast, should be accurate, and should reflect the latest data. Can you suggest whether I should go with Apache Solr, or with another solution for my problem? Regards, Pratik Thaker
Streaming Expressions : rollup function returning results with duplicate tuples
Hi, I have a streaming expression which uses rollup function. My understanding is that rollup takes an incoming stream and aggregates over given buckets. However, with following query the result contains duplicate tuples.

Following is the streaming expression.

rollup(
  fetch(
    collection1,
    gatherNodes(
      collection1,
      gatherNodes(collection1,
        walk="54227b412a1c4e574f88f2bb->eventParticipantID",
        gather="eventID"
      ),
      walk="eventID->conceptid",
      gather="conceptid",
      trackTraversal="true", scatter="branches,leaves"
    ),
    fl="schematype",
    on="node=conceptid"
  ),
  over="schematype",
  count(schematype)
)

The result returned is as follows.

{ "result-set": { "docs": [
  { "count(schematype)": 1, "schematype": "Company" },
  { "count(schematype)": 1, "schematype": "Founding Event" },
  { "count(schematype)": 1, "schematype": "Customer" },
  { "count(schematype)": 1, "schematype": "Founding Event" },  // duplicate
  { "count(schematype)": 1, "schematype": "Employment" },      // duplicate
  { "count(schematype)": 1, "schematype": "Founding Event" },
  { "count(schematype)": 4, "schematype": "Employment" },..
] } }

As you can see, there are more than one tuples for 'Founding Event'/'Employment'. Am I missing something here?

Following is the content of stream which is wrapped by rollup, if it helps.

{ "result-set": { "docs": [
  { "node": "54227b412a1c4e574f88f2bb", "schematype": "Company", "collection": "collection1", "field": "node", "level": 0 },
  { "node": "543004f0c92c0a651166aea5", "schematype": "Founding Event", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "543004f0c92c0a651166ae99", "schematype": "Customer", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "543004f0c92c0a651166aea1", "schematype": "Founding Event", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "543004f0c92c0a651166ae78", "schematype": "Employment", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "54ee6178b54c1d65412b5f9f", "schematype": "Founding Event", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "543004f0c92c0a651166ae7c", "schematype": "Employment", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "543004f0c92c0a651166ae80", "schematype": "Employment", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "543004f0c92c0a651166ae8a", "schematype": "Employment", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "543004f0c92c0a651166ae94", "schematype": "Employment", "collection": "collection1", "field": "eventID", "level": 1 },
  { "node": "543004f0c92c0a651166ae9d", "schematype": "Customer", "collection": "collection1", "field": "eventID", "level": 1 },
  { "EOF": true, "RESPONSE_TIME": 38 }
] } }

If I rollup on the level field then the results are as expected but not when the field is schematype. Any idea what's going on here? Thanks, Pratik
Re: Streaming Expressions : rollup function returning results with duplicate tuples
Yes, that was the missing piece. Thanks a lot!

On Thu, Jun 22, 2017 at 5:20 PM, Joel Bernstein wrote:
> Here is the pseudo code:
>
> rollup(sort(fetch(gatherNodes(
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Jun 22, 2017 at 5:19 PM, Joel Bernstein wrote:
> > You'll need to use the sort expression to sort the nodes by schemaType
> > first. The rollup expression is doing a MapReduce rollup that requires
> > the records to be sorted by the "over" fields.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
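To make the fix concrete, here is a sketch of the corrected pipeline following Joel's pseudo code. The only change from the original expression is the sort() wrapped around fetch() so the tuples arrive ordered by the rollup's "over" field; treat it as an illustration, not a tested query:

rollup(
  sort(
    fetch(
      collection1,
      gatherNodes(
        collection1,
        gatherNodes(collection1,
          walk="54227b412a1c4e574f88f2bb->eventParticipantID",
          gather="eventID"
        ),
        walk="eventID->conceptid",
        gather="conceptid",
        trackTraversal="true", scatter="branches,leaves"
      ),
      fl="schematype",
      on="node=conceptid"
    ),
    by="schematype asc"
  ),
  over="schematype",
  count(schematype)
)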
Limit for facet function of Streaming Expressions in solr cloud
Hey Everyone, This is about the facet function of Streaming Expressions. Is there any way to set the limit on the number of facet buckets to unlimited? The bucketSizeLimit parameter seems to accept only numbers greater than 0. Thanks, Pratik
Re: Limit for facet function of Streaming Expressions in solr cloud
Thanks Joel. For my use case I can switch to rollup for now which can work with "/export" query type. On Thu, Jun 29, 2017 at 10:11 AM, Joel Bernstein wrote: > Yes, I see this is hardcoded into the parameter checks. We can create a > ticket to allow unlimited. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Thu, Jun 29, 2017 at 10:06 AM, Pratik Patel > wrote: > > > Hey Everyone, > > > > This is about the facet function of Streaming Expression. Is there any > way > > to set limit for number of facets to infinite? The *bucketSizeLimit > > parameter *seems to accept only those numbers which are greater than 0. > > > > Thanks, > > Pratik > > >
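For anyone with the same need, a sketch of the rollup-over-export style workaround mentioned above. It streams the full result set from the /export handler and aggregates with rollup, so there is no bucket limit; collection and field names here are placeholders, the bucket field must appear in the sort, and /export requires docValues on the exported fields:

rollup(
  search(collection1, q="*:*", qt="/export", fl="category_s", sort="category_s asc"),
  over="category_s",
  count(*)
)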
Streaming expressions and Jetty Host
Hi Everyone, We are running Solr 6.4.1 in cloud mode on a CentOS production server. Currently we are using the embedded ZooKeeper. It is a simple setup with one collection and one shard. By default, the Jetty server binds to all interfaces, which is not safe, so we have changed the bin/solr script. We have added "-Djetty.host=127.0.0.1" to SOLR_START_OPTS so that it looks as follows.

SOLR_START_OPTS=('-server' "${JAVA_MEM_OPTS[@]}" "${GC_TUNE[@]}" "${GC_LOG_OPTS[@]}" \
  "${REMOTE_JMX_OPTS[@]}" "${CLOUD_MODE_OPTS[@]}" $SOLR_LOG_LEVEL_OPT -Dsolr.log.dir="$SOLR_LOGS_DIR" \
  "-Djetty.host=127.0.0.1" "-Djetty.port=$SOLR_PORT" "-DSTOP.PORT=$stop_port" "-DSTOP.KEY=$STOP_KEY" \
  "${SOLR_HOST_ARG[@]}" "-Duser.timezone=$SOLR_TIMEZONE" \
  "-Djetty.home=$SOLR_SERVER_DIR" "-Dsolr.solr.home=$SOLR_HOME" "-Dsolr.install.dir=$SOLR_TIP" \
  "${LOG4J_CONFIG[@]}" "${SOLR_OPTS[@]}")

We just found that with this change everything works fine in cloud mode except streaming expressions. With streaming expressions we get the following response.

org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://:8081/solr/collection1_shard1_replica1

We don't get this error if we let the Jetty server bind to all interfaces. Any idea what the problem is here? Thanks, Pratik
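One thing worth checking, offered as an assumption rather than a confirmed diagnosis: streaming expressions open internal HTTP connections to the replica base URLs registered in ZooKeeper, so if Jetty only listens on 127.0.0.1 while the node registered itself under a different (or empty) host, those worker connections get refused. On a single-node setup the registered host can be pinned explicitly in bin/solr.in.sh, e.g.:

# hypothetical single-node setting: make the address registered in ZooKeeper
# match the loopback interface Jetty is actually bound to
SOLR_HOST=127.0.0.1

Restricting access with a firewall on the external interface, instead of binding Jetty to loopback, is another way to keep inter-node and streaming traffic working.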
Solr not preserving milliseconds precision for zero milliseconds
Hello Everyone, Say I have a document like the one below.

{
  "id":"test",
  "startTime":"2013-02-10T18:36:07.000Z"
}

I add this document to the Solr index using the admin UI and the "update" request handler. It gets added successfully, but when I retrieve the document back by "id" I get the following.

{
  "id":"test",
  "startTime":"2013-02-10T18:36:07Z",
  "_version_":1580456021738913792
}

As you can see, the milliseconds precision in the date field "startTime" is lost. Precision is preserved for non-zero milliseconds, but it is lost for zero values. The field type of the "startTime" field is declared with docValues="true" precisionStep="0". Does anyone know how I can preserve milliseconds even if they are zero? Or is it not possible at all? Thanks, Pratik
Re: Solr not preserving milliseconds precision for zero milliseconds
Thanks for the clarification. I'll change my code to accommodate this behavior. On Thu, Oct 5, 2017 at 6:24 PM, Chris Hostetter wrote: > : > "startTime":"2013-02-10T18:36:07.000Z" > ... > : handler. It gets added successfully but when I retrieve this document > back > : using "id" I get following. > ... > : > "startTime":"2013-02-10T18:36:07Z", > ... > : As you can see, the milliseconds precision in date field "startTime" is > : lost. Precision is preserved for non-zero milliseconds but it's being > lost > : for zero values. The field type of "startTime" field is as follows. > ... > : Does anyone know how I can preserve milliseconds even if its zero? Or is > it > : not possible at all? > > ms precision is being preserved -- but as you mentioned, the fractional > seconds you indexed are "0" therefore they are not needed/preserved when > writing the response to maintain ms precision. > > This is the correct formatting as specified in the specification for the > time format that Solr follows... > > https://lucene.apache.org/solr/guide/working-with-dates.html > https://www.w3.org/TR/xmlschema-2/#dateTime > > >>> 3.2.7.2 Canonical representation > >>> ... > >>> The fractional second string, if present, must not end in '0'; > > > > -Hoss > http://www.lucidworks.com/ >
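For client code that has to treat the two renderings as equivalent, a minimal sketch using plain java.time (nothing Solr-specific; it just shows that both strings parse to the same instant, so comparisons should be done on parsed values rather than raw strings):

import java.time.Instant;

public class SolrDateFormatCheck {
    public static void main(String[] args) {
        // Solr may return "...07Z" for a value that was sent as "...07.000Z";
        // both represent the same point in time.
        Instant sent     = Instant.parse("2013-02-10T18:36:07.000Z");
        Instant returned = Instant.parse("2013-02-10T18:36:07Z");
        System.out.println(sent.equals(returned)); // prints: true
    }
}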
Re: Graph Traversal
For now, you can probably use Cartesian function of Streaming Expressions which Joel implemented to solve the same problem. https://issues.apache.org/jira/browse/SOLR-10292 http://joelsolr.blogspot.com/2017/03/streaming-nlp-is-coming-in-solr-66.html Regards, Pratik On Sat, Oct 28, 2017 at 7:38 PM, Joel Bernstein wrote: > I don't see a jira ticket for this yet. Feel free to create it and reply > back with the link. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Fri, Oct 27, 2017 at 9:55 AM, Kojo wrote: > > > Hi, I was looking for information on Graph Traversal. More specifically, > > support to search graph on multivalued field. > > > > Searching on the Internet, I found a question exactly the same of mine, > > with an answer that what I need is not implemented yet: > > http://lucene.472066.n3.nabble.com/Using-multi-valued- > > field-in-solr-cloud-Graph-Traversal-Query-td4324379.html > > > > > > Is there a ticket on Jira to follow the implementation of search graph on > > multivalued field? > > > > Thank you, > > >
Re: Graph Traversal
By including the cartesianProduct function in a Streaming Expression pipeline, you can convert a tuple having one multivalued field into multiple tuples, where each tuple holds one value of the field which was originally multivalued.

For example, if you have the following document (fruits is a multivalued field):

{ id: someID, fruits: [apple, orange, banana] }

Applying the cartesianProduct function would give the following tuples.

{ id: someID, fruits: apple }, { id: someID, fruits: orange }, { id: someID, fruits: banana }

Now that fruits holds single values, you can also use any Streaming Expression functions which don't work with multivalued fields. This happens in the Streaming Expression pipeline, so you don't have to flatten your documents in the index.

On Mon, Oct 30, 2017 at 8:39 AM, Kojo wrote:
> Hi,
> just a question, I have no deep background on Solr, Graph etc.
> This solution looks like normalizing data like a m2m table in a sql database, is it?
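A sketch of what that looks like as an actual expression, using the fruits example above (the collection name and the use of the /export handler are assumptions; cartesianProduct takes the incoming stream, the multivalued field to expand, and an optional productSort):

cartesianProduct(
  search(collection1, q="*:*", qt="/export", fl="id,fruits", sort="id asc"),
  fruits,
  productSort="fruits asc"
)

Each emitted tuple carries the original id with a single fruits value, so downstream functions that can't handle multivalued fields (sort, rollup, gatherNodes, etc.) can be applied to the expanded stream.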
Re: Graph Traversal
You use this at query time. Since Streaming Expressions can be pipelined, the next stage/function of the pipeline will work on the new tuples generated.

On Mon, Oct 30, 2017 at 10:09 AM, Kojo wrote:
> Do you store these new tuples, created by Streaming Expressions, in a new Solr cloud collection? Or just use the tuples at query time?
Re: Streaming Expression - cartesianProduct
Rollup needs documents to be sorted by the "over" field. Check this thread for more details: http://lucene.472066.n3.nabble.com/Streaming-Expressions-rollup-function-returning-results-with-duplicate-tuples-td4342398.html

On Wed, Nov 1, 2017 at 3:41 PM, Kojo wrote:
> Wrapping the cartesianProduct function with the fetch function works as expected.
> But the rollup function over cartesianProduct doesn't aggregate on a returned field of the cartesianProduct.
> The field "id_researcher" below is a multivalued field.
>
> This one works:
>
> fetch(reasercher,
>   cartesianProduct(
>     having(
>       cartesianProduct(
>         search(schoolarship, zkHost="localhost:9983", qt="/export", q="*:*",
>                fl="process, area, id_reasercher", sort="process asc"),
>         area
>       ),
>       eq(area, val(Anything))),
>     id_reasercher),
>   fl="name, django_id",
>   on="id_reasercher=django_id"
> )
>
> This one doesn't work:
>
> rollup(
>   cartesianProduct(
>     having(
>       cartesianProduct(
>         search(schoolarship, zkHost="localhost:9983", qt="/export", q="*:*",
>                fl="process, area, id_researcher, status", sort="process asc"),
>         area
>       ),
>       eq(area, val(Anything))),
>     id_researcher),
>   over=id_researcher, count(*)
> )
>
> If I aggregate over a non-multivalued field, it works.
> Is it correct that rollup doesn't work on a cartesianProduct?
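Concretely, the fix being suggested is to sort the expanded tuples on the rollup's "over" field before rolling up. A sketch based on the quoted expression (untested; field and collection names are copied from it):

rollup(
  sort(
    cartesianProduct(
      having(
        cartesianProduct(
          search(schoolarship, zkHost="localhost:9983", qt="/export", q="*:*",
                 fl="process, area, id_researcher, status", sort="process asc"),
          area
        ),
        eq(area, val(Anything))),
      id_researcher),
    by="id_researcher asc"
  ),
  over=id_researcher, count(*)
)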
DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
Hi All, I am using SOLR Cloud 6.0. I am receiving the below exception very frequently in the Solr logs.

o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: RunUpdateProcessor has received an AddUpdateCommand containing a document that appears to still contain Atomic document update operations, most likely because DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:63)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:335)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:936)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1091)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:714)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:93)
at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)

Can you please help me with the root cause?

Below is a snapshot of the relevant solrconfig values (field-name mutation pattern, date parsing formats, and field type mappings):

pattern: [^\w-\.]   replacement: _

yyyy-MM-dd'T'HH:mm:ss.SSSZ, yyyy-MM-dd'T'HH:mm:ss,SSSZ, yyyy-MM-dd'T'HH:mm:ss.SSS, yyyy-MM-dd'T'HH:mm:ss,SSS,
yyyy-MM-dd'T'HH:mm:ssZ, yyyy-MM-dd'T'HH:mm:ss, yyyy-MM-dd'T'HH:mmZ, yyyy-MM-dd'T'HH:mm,
yyyy-MM-dd HH:mm:ss.SSSZ, yyyy-MM-dd HH:mm:ss,SSSZ, yyyy-MM-dd HH:mm:ss.SSS, yyyy-MM-dd HH:mm:ss,SSS,
yyyy-MM-dd HH:mm:ssZ, yyyy-MM-dd HH:mm:ss, yyyy-MM-dd HH:mmZ, yyyy-MM-dd HH:mm, yyyy-MM-dd

default: strings; java.lang.Boolean -> booleans; java.util.Date -> tdates; java.lang.Long, java.lang.Integer -> tlongs; java.lang.Number -> tdoubles

Regards, Pratik Thaker
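The error text itself describes the mechanics: an AddUpdateCommand still carrying atomic-update operations (set/add/inc maps) reached RunUpdateProcessor, whereas those operations are normally resolved against the stored document by DistributedUpdateProcessorFactory earlier in the chain. As a reference point only, and not a diagnosis of the exact configuration above (which isn't fully shown), a chain that keeps that step explicitly in place looks roughly like this:

<updateRequestProcessorChain name="my-chain">
  <!-- field mutating / parsing processors go before the distributed processor -->
  <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
    <str name="pattern">[^\w-\.]</str>
    <str name="replacement">_</str>
  </processor>
  <!-- resolves atomic update operations against the existing document -->
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>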
RE: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
Hi Friends, Can you please try to give me some details about below issue? Regards, Pratik Thaker

From: Pratik Thaker
Sent: 07 February 2017 17:12
To: 'solr-user@lucene.apache.org'
Subject: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
Fwd: Solr dynamic field blowing up the index size
Here is the same question on StackOverflow, with better formatting: http://stackoverflow.com/questions/42370231/solr-dynamic-field-blowing-up-the-index-size

Recently I upgraded from Solr 5.0 to Solr 6.4.1. I can run my app fine, but the problem is that the index size with Solr 6 is way too large. In Solr 5 the index size was about 15GB, and in Solr 6, for the same data, the index size is 300GB! I am not able to understand what contributes to such a huge difference in Solr 6.

I have been able to identify a field which is blowing up the size of the index. It is as follows. When this field is commented out, the index size reduces to less than 10GB. This field is of type text_general. Following is the definition of this type.

A few things which I did to debug this issue:
- I have ensured that the field type definition is the same as what I was using in Solr 5, and that it is also valid in version 6. This field type applies a list of stopwords to be ignored during indexing. I have supplied the same list of stopwords we were using in Solr 5. I have verified that the path of this file is correct and that it is being loaded fine in the Solr admin UI. When I analyse these fields using the "Analysis" tab of the Solr admin UI, I can see that stopwords are being filtered out. However, when I query with some of these stopwords, I do get results back, which makes me think that the stopwords are probably being indexed.

Any idea what could increase the size of the index by so much in Solr 6?
Re: Fwd: Solr dynamic field blowing up the index size
Thanks for the reply. I can see that in solr 6, more than 50% of the index directory is occupied by ".nvd" file extension. It is something related to norms and doc values. On Tue, Feb 21, 2017 at 10:27 AM, Alexandre Rafalovitch wrote: > Did you look in the data directories to check what index file extensions > contribute most to the difference? That could give a hint. > > Regards, > Alex > > On 21 Feb 2017 9:47 AM, "Pratik Patel" wrote: > > > Here is the same question in stackOverflow for better format. > > > > http://stackoverflow.com/questions/42370231/solr- > > dynamic-field-blowing-up-the-index-size > > > > Recently, I upgraded from solr 5.0 to solr 6.4.1. I can run my app fine > but > > the problem is that index size with solr 6 is way too large. In solr 5, > > index size was about 15GB and in solr 6, for the same data, the index > size > > is 300GB! I am not able to understand what contributes to such huge > > difference in solr 6. > > > > I have been able to identify a field which is blowing up the size of > index. > > It is as follows. > > > > > stored="true" multiValued="true" /> > > > > > stored="false" multiValued="true" /> > > > > > > When this field is commented out, the index size reduces to less than > 10GB. > > > > This field is of type text_general. Following is the definition of this > > type. > > > > > positionIncrementGap="100"> > > > > > > > > > > > pattern="((?m)[a-z]+)'s" replacement="$1s" /> > > > protected="protwords.txt" generateWordParts="1" > > generateNumberParts="1" catenateWords="1" catenateNumbers="1" > > catenateAll="0" splitOnCaseChange="0"/> > > > > > words="C:/Users/pratik/Desktop/solr-6.4.1_playground/ > > solr-6.4.1/server/solr/collection1/conf/stopwords.txt" > > /> > > > > > > > > > > > > > pattern="((?m)[a-z]+)'s" replacement="$1s" /> > > > protected="protwords.txt" generateWordParts="1" > > generateNumberParts="1" catenateWords="1" catenateNumbers="1" > > catenateAll="0" splitOnCaseChange="0"/> > > > > > words="C:/Users/pratik/Desktop/solr-6.4.1_playground/ > > solr-6.4.1/server/solr/collection1/conf/stopwords.txt" > > /> > > > > > > > > Few things which I did to debug this issue: > > > >- I have ensured that field type definition is same as what I was > using > >in solr 5 and it is also valid in version 6. This field type > considers a > >list of "stopwords" to be ignored during indexing. I have supplied the > > same > >list of stopwords which we were using in solr 5. I have verified that > > path > >of this file is correct and it is being loaded fine in solr admin UI. > > When > >I analyse these fields using "Analysis" tab of the solr admin UI, I > can > > see > >that stopwords are being filtered out. However, when I query with some > > of > >these stopwords, I do get the results back which makes me think that > >probably stopwords are being indexed. > > > > Any idea what could increase the size of index by so much in solr 6? > > >
Re: Fwd: Solr dynamic field blowing up the index size
I am using the schema from Solr 5, which does not have any field with docValues enabled. In fact, to ensure that everything is the same as Solr 5 (except the breaking changes), I am also using the solrconfig.xml from Solr 5, with schemaFactory set to the classic schema factory so that schema.xml from Solr 5 is used.

On Tue, Feb 21, 2017 at 11:33 AM, Alexandre Rafalovitch wrote:
> Did you reuse the schema or rebuilt it on top of the latest examples?
> Because the latest example schema enabled docValues for strings on the fieldType level.
>
> I would do a diff of the schemas to see what changed. If they look very different and you are looking for tools to normalize/extract elements from schemas, you may find my latest Revolution presentation useful for that:
> https://www.slideshare.net/arafalov/rebuilding-solr-6-examples-layer-by-layer-lucenesolrrevolution-2016
> (e.g. slide 20). There is also the video there at the end.
>
> Regards,
>    Alex.
>
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
> On 21 February 2017 at 11:18, Mike Thomsen wrote:
> > Correct me if I'm wrong, but heavy use of doc values should actually blow up the size of your index considerably if they are in fields that get sent a lot of data.
Re: Fwd: Solr dynamic field blowing up the index size
I think I have found something concrete. Reading up more on the .nvd file extension, I found that it is used to store length and boost factors for documents and fields; these are the norms files. Whether norms are kept for a field is controlled by the omitNorms attribute: if omitNorms=true, norms are not stored for that field. I explicitly added omitNorms="true" to the field type text_general and re-indexed the data, and now my index size is much smaller. I haven't verified this with the complete data set yet, but I can see that the index size is reduced. We have a large data set and it takes about 5-6 hours to index it completely, so I'll index the whole data set overnight to confirm the fix.

But now I am curious about the omitNorms attribute. What is the default value of omitNorms for the field type "text_general"? The documentation says that omitNorms=true for primitive field types like string, int, etc., but I don't know what the default is for "text_general". I never had omitNorms set explicitly on the text_general field type or on any of the fields having type text_general. Has the default value of omitNorms been changed between Solr 5.0.0 and 6.4.1? Any clarification on this would be really helpful.

I am posting some relevant links here for anyone who might face a similar issue in the future.
http://apprize.info/php/solr_4/2.html
http://stackoverflow.com/questions/18694242/what-is-omitnorms-and-version-field-in-solr-schema
https://lucidworks.com/2009/09/02/scaling-lucene-and-solr/#d0e71

Thanks, Pratik
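For reference, a sketch of the kind of schema change described above. The type name is the one from this thread, but the analyzer shown is just a placeholder: the actual analyzer chain in the schema is longer and isn't what matters for the index size.

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Worth noting: norms feed index-time length normalization into scoring, so turning them off trades that ranking signal for the space savings.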
How to figure out whether stopwords are being indexed or not
I have a field type in the schema which has a stopwords list applied. I have verified that the path of the stopwords file is correct and that it is being loaded fine in the Solr admin UI. When I analyse these fields using the "Analysis" tab of the Solr admin UI, I can see that stopwords are being filtered out. However, when I query with some of these stopwords, I do get results back, which makes me think that the stopwords are probably being indexed.

For example, when I run the following query, I do get back results. I have the word "and" in the stopwords list, so I expect no results for this query.

http://localhost:8081/solr/collection1/select?fq=Description_note:*%20and%20*&indent=on&q=*:*&rows=100&start=0&wt=json

Does this mean that the word "and" is being indexed and the stopwords are not being used?

Following is the field type of the field Description_note:
Re: How to figure out whether stopwords are being indexed or not
Hi Eric, Thanks for the reply! Following is the relevant part of response header with debugQuery on. { "responseHeader":{ "status":0, "QTime":282, "params":{ "q":"Description_note:* and *", "indent":"on", "wt":"json", "debugQuery":"on", "_":"1487773835305"}}, "response":{"numFound":81771,"start":0,"docs":[ { "id":"", . . . },.. ] } } On Tue, Feb 21, 2017 at 8:22 PM, Erick Erickson wrote: > Attach &debug=query to your query and look at the parsed query that's > returned. > That'll tell you what was searched at least. > > You can also use the TermsComponent to examine terms in a field directly. > > Best, > Erick > > On Tue, Feb 21, 2017 at 2:52 PM, Pratik Patel wrote: > > I have a field type in schema which has been applied stopwords list. > > I have verified that path of stopwords file is correct and it is being > > loaded fine in solr admin UI. When I analyse these fields using > "Analysis" tab > > of the solr admin UI, I can see that stopwords are being filtered out. > > However, when I query with some of these stopwords, I do get the results > > back which makes me think that probably stopwords are being indexed. > > > > For example, when I run following query, I do get back results. I have > word > > "and" in the stopwords list so I expect no results for this query. > > > > http://localhost:8081/solr/collection1/select?fq= > Description_note:*%20and%20*&indent=on&q=*:*&rows=100&start=0&wt=json > > > > Does this mean that the "and" word is being indexed and stopwords are not > > being used? > > > > Following is the field type of field Description_note : > > > > > > > positionIncrementGap="100" omitNorms="true"> > > > > > > > > > > > pattern="((?m)[a-z]+)'s" replacement="$1s" /> > > protected="protwords.txt" > > generateWordParts="1" generateNumberParts="1" catenateWords="1" > > catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/> > > > > > words="stopwords.txt" /> > > > > > > > > > > > > > pattern="((?m)[a-z]+)'s" replacement="$1s" /> > > protected="protwords.txt" > > generateWordParts="1" generateNumberParts="1" catenateWords="1" > > catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/> > > > > > words="stopwords.txt" /> > > > > >
Re: How to figure out whether stopwords are being indexed or not
Asterisks were not for formatting; I was trying to use a wildcard operator. Here is another example query and the "parsedquery_toString" entry for it.

Query: http://localhost:8081/solr/collection1/select?debugQuery=on&indent=on&q=Description_note:*their*&wt=json

"parsedquery_toString":"Description_note:*their*"

I have the word "their" in my stopwords list, so I am expecting zero results, but this query returns 20 documents containing the word "their".

Here is more of the debug object of the response.

"debug":{
  "rawquerystring":"Description_note:*their*",
  "querystring":"Description_note:*their*",
  "parsedquery":"Description_note:*their*",
  "parsedquery_toString":"Description_note:*their*",
  "explain":{
    "54227b012a1c4e574f88505556987be57ef1af28d01b6d94":"\n1.0 = Description_note:*their*, product of:\n 1.0 = boost\n 1.0 = queryNorm\n",
  },
  "QParser":"LuceneQParser",
  "timing":{ ... }
}

Thanks, Pratik

On Wed, Feb 22, 2017 at 11:25 AM, Erick Erickson wrote:
> That's not what I'm looking for. Way down near the end there should be an entry like "parsed_query toString"
>
> This line is pretty suspicious: 82, "params":{ "q":"Description_note:* and *"
>
> Are you really searching for asterisks (I'd originally interpreted that as bolding which sometimes happens). Please don't do formatting with asterisks in e-mails as it's very confusing.
>
> Best,
> Erick
Re: How to figure out whether stopwords are being indexed or not
That explains why I was getting back the results. Thanks! I was doing that query only to test whether stopwords are being indexed or not, but apparently the query I had would not serve the purpose. I should rather have a document field with just the stop word and search against it without using a wildcard, to test whether the stopword was indexed or not. Thanks again.

Regards, Pratik

On Wed, Feb 22, 2017 at 12:10 PM, Alexandre Rafalovitch wrote:
> StopFilterFactory (and WordDelimiterFilterFactory and maybe others) are NOT multiterm aware.
>
> Using wildcards triggers the edge-case third type of analyzer chain that is automatically constructed unless you specify it explicitly.
>
> You can see the full list of analyzers and whether they are multiterm aware at http://www.solr-start.com/info/analyzers/ (I mark them with "(multi)").
>
> Solution in your case is probably to go away from these performance-killing double-side wildcards and to switch to the NGrams instead. And you may want to look at ApostropheFilterFactory while you are at it (instead of the regexp you have there).
>
> Regards,
>    Alex.
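As a side note on the original question (how to check whether a stopword actually made it into the index), the TermsComponent Erick mentioned can be queried directly. A hedged example against the setup in this thread, assuming the implicit /terms handler is available on this core:

http://localhost:8081/solr/collection1/terms?terms.fl=Description_note&terms.prefix=their&terms.limit=10

If "their" was filtered out at index time, it will not appear in the returned term list for that field.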
Using multi valued field in solr cloud Graph Traversal Query
I am trying to do a graph traversal query using the gatherNodes function. I am seeding a streaming expression to get some documents and then trying to map their ids (conceptid) to a multivalued field "participantIds" and gather nodes. Here is the query I am using.

gatherNodes(collection1,
  search(collection1, q="*:*", fl="conceptid", sort="conceptid asc",
         fq=storeid:"524efcfd505637004b1f6f24", fq=tags:"Project"),
  walk=conceptid->participantIds,
  gather="conceptid")

The field participantIds is a multivalued field. This is the field which holds the connections between the documents. When I execute this query, I get an exception as below.

{
  "result-set": {
    "docs": [
      {
        "EXCEPTION": "java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: --> http://169.254.40.158:8081/solr/collection1_shard1_replica1/: can not sort on multivalued field: participantIds",
        "EOF": true,
        "RESPONSE_TIME": 15
      }
    ]
  }
}

Does this mean you cannot look into multivalued fields in a graph traversal query? In our Solr index, we have documents with a "conceptid" field, which is the id, and a multivalued field "participantIds" storing the connections of that document to other documents. I believe we need one field in the document which stores its connections so that graph traversal is possible. If not, what is the other way to index graph data and use graph traversal? I am trying to explore graph traversal and am new to it. Any help would be appreciated. Thanks, Pratik
BooleanEvaluator inside 'having' function of a streaming expression
Hi, I am trying to write a streaming expression with a 'having' function in it. Following is my simple query.

having(
  search(collection1, q="*:*", fl="storeid", sort="storeid asc", fq=tags:"Company"),
  eq(storeid, 524efcfd505637004b1f6f24)
)

Here, storeid is a field of type "string" in the schema. But when I execute this query in the admin UI, I get a NumberFormatException. Here is the response in the admin UI.

{ "result-set": { "docs": [ { "EXCEPTION": "For input string: \"524efcfd505637004b1f6f24\"", "EOF": true } ] } }

If I change the storeid value to 123 in the boolean evaluator then it works fine. I tried quoting the original value, eq(storeid,"524efcfd505637004b1f6f24"), but it still fails with the same exception. Here is the relevant part of the stack trace from the log file.

at org.apache.solr.client.solrj.io.stream.expr.StreamFactory.createInstance(StreamFactory.java:358)
at org.apache.solr.client.solrj.io.stream.expr.StreamFactory.constructOperation(StreamFactory.java:339)
at org.apache.solr.client.solrj.io.stream.HavingStream.<init>(HavingStream.java:72)
... 38 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor27.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.solr.client.solrj.io.stream.expr.StreamFactory.createInstance(StreamFactory.java:351)
... 40 more
Caused by: java.lang.NumberFormatException: For input string: "524efcfd505637004b1f6f24"
at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source)
at sun.misc.FloatingDecimal.parseDouble(Unknown Source)
at java.lang.Double.parseDouble(Unknown Source)
at org.apache.solr.client.solrj.io.ops.LeafOperation.<init>(LeafOperation.java:48)
at org.apache.solr.client.solrj.io.ops.EqualsOperation.<init>(EqualsOperation.java:42)
... 44 more

I can see that Solr is trying to parse storeid as a double, hence the NumberFormatException, even though this field is of type string in the schema. How can I fix this? Thanks, Pratik
Re: BooleanEvaluator inside 'having' function of a streaming expression
it's not a stable version* On Mon, Mar 13, 2017 at 1:34 PM, Pratik Patel wrote: > Thanks Joel! This is just a simplified sample query that I created to > better demonstrate the issue. I am not sure whether I want to upgrade to > solr 6.5 as only developer version is available yet and it's a stable > version as far as I know. Thanks for the clarification. I will try to find > some other logic for my query. > > On Mon, Mar 13, 2017 at 1:23 PM, Joel Bernstein > wrote: > >> If you're using Solr 6.4 then the expression you're running won't work, >> because on numeric comparisons are supported. >> >> Solr 6.5 will have the expanded Evaluator functionality, which has string >> comparisons. >> >> In the expression you're working with it would be much more performant >> though to filter the query on the storeid. >> >> Joel Bernstein >> http://joelsolr.blogspot.com/ >> >> On Mon, Mar 13, 2017 at 1:06 PM, Pratik Patel >> wrote: >> >> > Hi, >> > >> > I am trying to write a streaming expression with 'having' function in >> it. >> > Following is my simple query. >> > >> > >> > having( >> > >search(collection1,q="*:*",fl="storeid",sort="storeid >> > > asc",fq=tags:"Company"), >> > >eq(storeid,524efcfd505637004b1f6f24) >> > > ) >> > >> > >> > Here, storeid is a field of type "string" in schema. But when I execute >> > this query in admin UI, I am getting a NumberFormatException. >> > >> > Here is the response in admin UI. >> > >> > >> > { "result-set": { "docs": [ { "EXCEPTION": "For input string: >> > \"524efcfd505637004b1f6f24\"", "EOF": true } ] } } >> > >> > If I change storeid value to 123 in the boolean evaluator then it works >> > fine. I tried to quote the original value so that we have >> > eq(storeid,"524efcfd505637004b1f6f24") but still it fails with same >> > exception. >> > >> > Here is the detailed stack trace from log file. >> > >> > >> > ERROR - 2017-03-13 16:56:39.516; [c:collection1 s:shard1 r:core_node1 >> > > x:collection1_shard1_replica1] org.apache.solr.common.SolrException; >> > > java.io.IOException: Unable to construct instance of >> > > org.apache.solr.client.solrj.io.stream.HavingStream >> > > at >> > > org.apache.solr.client.solrj.io.stream.expr.StreamFactory. >> > createInstance(StreamFactory.java:358) >> > > at >> > > org.apache.solr.client.solrj.io.stream.expr.StreamFactory. >> > constructStream(StreamFactory.java:222) >> > > at >> > > org.apache.solr.client.solrj.io.stream.expr.StreamFactory. >> > constructStream(StreamFactory.java:215) >> > > at >> > > org.apache.solr.handler.StreamHandler.handleRequestBody( >> > StreamHandler.java:212) >> > > at >> > > org.apache.solr.handler.RequestHandlerBase.handleRequest( >> > RequestHandlerBase.java:166) >> > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306) >> > > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall. >> java:658) >> > > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464) >> > > at >> > > org.apache.solr.servlet.SolrDispatchFilter.doFilter( >> > SolrDispatchFilter.java:345) >> > > at >> > > org.apache.solr.servlet.SolrDispatchFilter.doFilter( >> > SolrDispatchFilter.java:296) >> > > at >> > > org.eclipse.jetty.servlet.ServletHandler$CachedChain. 
Re: BooleanEvaluator inside 'having' function of a streaming expression
Thanks Joel! This is just a simplified sample query that I created to better demonstrate the issue. I am not sure whether I want to upgrade to solr 6.5 as only developer version is available yet and it's a stable version as far as I know. Thanks for the clarification. I will try to find some other logic for my query. On Mon, Mar 13, 2017 at 1:23 PM, Joel Bernstein wrote: > If you're using Solr 6.4 then the expression you're running won't work, > because on numeric comparisons are supported. > > Solr 6.5 will have the expanded Evaluator functionality, which has string > comparisons. > > In the expression you're working with it would be much more performant > though to filter the query on the storeid. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Mon, Mar 13, 2017 at 1:06 PM, Pratik Patel wrote: > > > Hi, > > > > I am trying to write a streaming expression with 'having' function in it. > > Following is my simple query. > > > > > > having( > > >search(collection1,q="*:*",fl="storeid",sort="storeid > > > asc",fq=tags:"Company"), > > >eq(storeid,524efcfd505637004b1f6f24) > > > ) > > > > > > Here, storeid is a field of type "string" in schema. But when I execute > > this query in admin UI, I am getting a NumberFormatException. > > > > Here is the response in admin UI. > > > > > > { "result-set": { "docs": [ { "EXCEPTION": "For input string: > > \"524efcfd505637004b1f6f24\"", "EOF": true } ] } } > > > > If I change storeid value to 123 in the boolean evaluator then it works > > fine. I tried to quote the original value so that we have > > eq(storeid,"524efcfd505637004b1f6f24") but still it fails with same > > exception. > > > > Here is the detailed stack trace from log file. > > > > > > ERROR - 2017-03-13 16:56:39.516; [c:collection1 s:shard1 r:core_node1 > > > x:collection1_shard1_replica1] org.apache.solr.common.SolrException; > > > java.io.IOException: Unable to construct instance of > > > org.apache.solr.client.solrj.io.stream.HavingStream > > > at > > > org.apache.solr.client.solrj.io.stream.expr.StreamFactory. > > createInstance(StreamFactory.java:358) > > > at > > > org.apache.solr.client.solrj.io.stream.expr.StreamFactory. > > constructStream(StreamFactory.java:222) > > > at > > > org.apache.solr.client.solrj.io.stream.expr.StreamFactory. > > constructStream(StreamFactory.java:215) > > > at > > > org.apache.solr.handler.StreamHandler.handleRequestBody( > > StreamHandler.java:212) > > > at > > > org.apache.solr.handler.RequestHandlerBase.handleRequest( > > RequestHandlerBase.java:166) > > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306) > > > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658) > > > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464) > > > at > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter( > > SolrDispatchFilter.java:345) > > > at > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter( > > SolrDispatchFilter.java:296) > > > at > > > org.eclipse.jetty.servlet.ServletHandler$CachedChain. > > doFilter(ServletHandler.java:1691) > > > at > > > org.eclipse.jetty.servlet.ServletHandler.doHandle( > > ServletHandler.java:582) > > > at > > > org.eclipse.jetty.server.handler.ScopedHandler.handle( > > ScopedHandler.java:143) > > > at > > > org.eclipse.jetty.security.SecurityHandler.handle( > > SecurityHandler.java:548) > > > at > > > org.eclipse.jetty.server.session.SessionHandler. > > doHandle(SessionHandler.java:226) > > > at > > > org.eclipse.jetty.server.handler.ContextHandler. 
Using fetch function with streaming expression
I have two types of documents in my index. eventLink and concepttData. eventLink { ancestors:[,] } conceptData-{ id:id1, conceptid, concept_name . } Both are in same collection. In my query, I am doing a gatherNodes query wrapped in some other function and ultimately I am getting a bunch of eventLink documents. Now, I am trying to get conceptData document for each id specified in eventLink's ancestors field. I am trying to do that using fetch() function. Here is simplified form of my query. fetch(collection1, > function to get eventLinks, > fl="concept_name", > on="ancestors=conceptid" > ) On executing this query, I am getting back same set of documents which are results of my streaming expression containing gatherNodes() function. No fields are added to the tuples. From documentation, it seems like fetch would fetch additional data and add it to the tuples. However, that is not happening. Resulting tuples does not have concept_name field in them. What am I missing here? I really need to get this additional data from one solr query so that I don't have to iterate over the eventLinks and get additional data by individual queries. That would badly impact performance. Any suggestions? Here is my actual query and the response. fetch(collection1, > having( > gatherNodes(collection1, > search(collection1,q="*:*",fl="conceptid",sort="conceptid > asc",fq=storeid:"524efcfd505637004b1f6f24",fq=tags:"Company",fq=tags:"Prospects2", > qt="/export"), > walk=conceptid->eventParticipantID, > gather="eventID", > trackTraversal="true", scatter="leaves", > count(*) > ), > gt(count(*),1) > ), > fl="concept_name", > on="ancestors=conceptid" > ) Response : { > "result-set": { > "docs": [ > { > "node": "524f03355056c8b53b4ed199", > "field": "eventID", > "level": 1, > "count(*)": 2, > "collection": "collection1", > "ancestors": [ > "524f02845056c8b53b4e9871", > "524f02755056c8b53b4e9269" > ] > }, > . > } Thanks, Pratik
Re: Using fetch function with streaming expression
Hi, Joel. Thanks for the reply. So, I need to do some graph traversal queries for my use case. In my data set, I have concepts and events. concept : {name, address, bio ..}, > event: {name, date, participantIds:[concept1, concept2...] .} Events connects two or more concepts. So, this is a graph data where concepts are connected to each other via events. Each event store links to the concepts that it connects. So the field which stores those links is multivalued. This is a natural structure for my data on which I wanted to do some advanced graph traversal queries with some streaming expression. However, gatherNodes() function does not support multivalued fields yet. So, I changed my index structure to be something like this. concept : {conceptId, name, address, bio ..}, > event: {eventId, name, date, participantIds:[concept1, concept2...] .} > *create eventLink documents for each participantId in each > event > eventLink:{eventid, conceptid, id} I created eventLink documents from each event so that I can traverse the data using gatherNodes() function. With this change, I was able to do graph query and get Ids of concepts which I wanted. However, I only have ids of concepts. Now, using these ids, I want additional data from concept documents like concept_name or address or bio. This is what I was trying to achieve with fetch() function but it seems I hit the multivalued limitation again :) The reason why I am storing only the ids in eventLink documents is because I don't want to duplicate data unnecessarily. It will complicate maintenance of consistency in index when delete/update happens. Is there any way I can achieve this? Thanks! Pratik On Tue, Mar 14, 2017 at 11:24 AM, Joel Bernstein wrote: > Wow that's an interesting expression! > > The problem is that you are trying to fetch using the ancestors field, > which is multi-valued. fetch doesn't support multi-value join keys. I never > thought someone might try to do that. > > So , your attempting to get the concept names for ancestors? > > Can you explain a little more about the use case? > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Tue, Mar 14, 2017 at 11:08 AM, Pratik Patel > wrote: > > > I have two types of documents in my index. eventLink and concepttData. > > > > eventLink { ancestors:[,] } > > conceptData-{ id:id1, conceptid, concept_name . } > > > > Both are in same collection. > > In my query, I am doing a gatherNodes query wrapped in some other > function > > and ultimately I am getting a bunch of eventLink documents. Now, I am > > trying to get conceptData document for each id specified in eventLink's > > ancestors field. I am trying to do that using fetch() function. Here is > > simplified form of my query. > > > > fetch(collection1, > > > function to get eventLinks, > > > fl="concept_name", > > > on="ancestors=conceptid" > > > ) > > > > > > On executing this query, I am getting back same set of documents which > are > > results of my streaming expression containing gatherNodes() function. No > > fields are added to the tuples. From documentation, it seems like fetch > > would fetch additional data and add it to the tuples. However, that is > not > > happening. Resulting tuples does not have concept_name field in them. > What > > am I missing here? I really need to get this additional data from one > solr > > query so that I don't have to iterate over the eventLinks and get > > additional data by individual queries. That would badly impact > performance. > > Any suggestions? 
> > > > Here is my actual query and the response. > > > > > > fetch(collection1, > > > having( > > > gatherNodes(collection1, > > > search(collection1,q="*:*",fl="conceptid",sort="conceptid > > > asc",fq=storeid:"524efcfd505637004b1f6f24",fq=tags:"Company",fq=tags:" > > Prospects2", > > > qt="/export"), > > > walk=conceptid->eventParticipantID, > > > gather="eventID", > > > trackTraversal="true", scatter="leaves", > > > count(*) > > > ), > > > gt(count(*),1) > > > ), > > > fl="concept_name", > > > on="ancestors=conceptid" > > > ) > > > > > > > > Response : > > > > { > > > "result-set": { > > > "docs": [ > > > { > > > "node": "524f03355056c8b53b4ed199", > > > "field": "eventID", > > > "level": 1, > > > "count(*)": 2, > > > "collection": "collection1", > > > "ancestors": [ > > > "524f02845056c8b53b4e9871", > > > "524f02755056c8b53b4e9269" > > > ] > > > }, > > > . > > > } > > > > > > Thanks, > > Pratik > > >
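To make the eventLink normalisation described above concrete, a small illustration with invented ids (the field names are the ones from this thread, the values are made up):

    event:     { "id": "e1", "name": "board meeting", "participantIds": ["c1", "c2"] }

    becomes one eventLink document per participant, each with single-valued keys:
    eventLink: { "id": "e1_c1", "eventid": "e1", "conceptid": "c1" }
    eventLink: { "id": "e1_c2", "eventid": "e1", "conceptid": "c2" }

With that shape, gatherNodes only ever has to walk single-valued fields; the multi-valued ancestors field on the gathered tuples is the part that still needs fetch (or the cartesianProduct decorator discussed below) to pull in the conceptData details.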
Re: Using fetch function with streaming expression
Wow, this is interesting! Is it going to be a new addition to solr or is it already available cause I can not find it in documentation? I am using solr version 6.4.1. On Tue, Mar 14, 2017 at 7:41 PM, Joel Bernstein wrote: > I'm going to add a "cartesian" function that create a cartesian product > from a multi-value field. This will turn a single tuple with a multi-value > into multiple tuples with a single value field. This will allow the fetch > operation to work on ancestors. It also has many other use cases. Sample > syntax: > > fetch(collection1, > cartesian(field=ancestors, > having(gatherNodes(collection1, > > search(collection1, > > q="*:*", > > fl="conceptid", > > sort="conceptid asc", > > fq=storeid:"524efcfd505637004b1f6f24", > > fq=tags:"Company", > > fq=tags:"Prospects2", > > qt="/export"), > > walk=conceptid->eventParticipantID, > > gather="eventID", > t > rackTraversal="true", > > scatter="leaves", > count(*)), > gt(count(*),1))), > fl="concept_name", > on="ancestors=conceptid") > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Tue, Mar 14, 2017 at 11:51 AM, Pratik Patel > wrote: > > > Hi, Joel. Thanks for the reply. > > > > So, I need to do some graph traversal queries for my use case. In my data > > set, I have concepts and events. > > > > concept : {name, address, bio ..}, > > > event: {name, date, participantIds:[concept1, concept2...] .} > > > > > > Events connects two or more concepts. So, this is a graph data where > > concepts are connected to each other via events. Each event store links > to > > the concepts that it connects. So the field which stores those links is > > multivalued. This is a natural structure for my data on which I wanted to > > do some advanced graph traversal queries with some streaming expression. > > However, gatherNodes() function does not support multivalued fields yet. > > So, I changed my index structure to be something like this. > > > > concept : {conceptId, name, address, bio ..}, > > > event: {eventId, name, date, participantIds:[concept1, concept2...] > > .} > > > *create eventLink documents for each participantId in each > > > event > > > eventLink:{eventid, conceptid, id} > > > > > > > > I created eventLink documents from each event so that I can traverse the > > data using gatherNodes() function. With this change, I was able to do > graph > > query and get Ids of concepts which I wanted. However, I only have ids of > > concepts. Now, using these ids, I want additional data from concept > > documents like concept_name or address or bio. This is what I was trying > > to achieve with fetch() function but it seems I hit the multivalued > > limitation again :) The reason why I am storing only the ids in eventLink > > documents is because I don't want to duplicate data unnecessarily. It > will > > complicate maintenance of consistency in index when delete/update > happens. > > Is there any way I can achieve this? > > > > Thanks! > > Pratik > > > > > > > > > > > > On Tue, Mar 14, 2017 at 11:24 AM, Joel Bernstein > > wrote: > > > > > Wow that's an interesting expression! > > > > > > The problem is that you are trying to fetch using the ancestors field, > > > which is multi-valued. fetch doesn't support multi-value join keys. I > > never > > > thought someone might try to do that. > > > > > > So , your attempting to get the concept names for ancestors? > > > > > > Can you explain a little more about the use case? 
How to implement nested streaming expressions in Java using solrj
I am trying to write a streaming expression in SolrJ. Following is the query that I want to implement in Java. having( > gatherNodes(collection1, > search(collection1,q="*:*",fl="conceptid",sort="conceptid > asc",fq=storeid:"524efcfd505637004b1f6f24",fq=tags:"Company",fq=tags:"Prospects2", > qt="/export"), > walk=conceptid->eventParticipantID, > gather="eventID", > trackTraversal="true", scatter="leaves", > count(*) > ), > gt(count(*),1) > ) Using this article ( http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html) I could implement and run a single streaming expression, search(collection1,q="*:*",fl="conceptid",sort="conceptid asc",fq=storeid:"524efcfd505637004b1f6f24",fq=tags:"Company",fq=tags:"Prospects2", qt="/export") But I cannot find a way to create a nested query. How can I do that? Thanks, Pratik
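One way to do this with SolrJ, sketched rather than tested: register the functions the expression uses with a StreamFactory and let constructStream() parse the whole nested expression (the same entry point Solr uses server-side). The zkHost and the exact operation/metric classes registered below are assumptions for this Solr version and should be adjusted to the real environment:

    import java.io.IOException;
    import org.apache.solr.client.solrj.io.SolrClientCache;
    import org.apache.solr.client.solrj.io.Tuple;
    import org.apache.solr.client.solrj.io.graph.GatherNodesStream;
    import org.apache.solr.client.solrj.io.ops.GreaterThanOperation;
    import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
    import org.apache.solr.client.solrj.io.stream.HavingStream;
    import org.apache.solr.client.solrj.io.stream.StreamContext;
    import org.apache.solr.client.solrj.io.stream.TupleStream;
    import org.apache.solr.client.solrj.io.stream.expr.StreamFactory;
    import org.apache.solr.client.solrj.io.stream.metrics.CountMetric;

    public class NestedExpressionSketch {
      public static void main(String[] args) throws IOException {
        // the nested expression, exactly as it would be sent to the /stream handler
        String expr = "having("
            + "gatherNodes(collection1,"
            + "search(collection1,q=\"*:*\",fl=\"conceptid\",sort=\"conceptid asc\","
            + "fq=storeid:\"524efcfd505637004b1f6f24\",fq=tags:\"Company\",fq=tags:\"Prospects2\",qt=\"/export\"),"
            + "walk=conceptid->eventParticipantID,"
            + "gather=\"eventID\","
            + "trackTraversal=\"true\",scatter=\"leaves\","
            + "count(*)),"
            + "gt(count(*),1))";

        // register every function name appearing in the expression (assumed class mappings)
        StreamFactory factory = new StreamFactory()
            .withCollectionZkHost("collection1", "localhost:9983")   // zkHost is an assumption
            .withFunctionName("search", CloudSolrStream.class)
            .withFunctionName("gatherNodes", GatherNodesStream.class)
            .withFunctionName("having", HavingStream.class)
            .withFunctionName("gt", GreaterThanOperation.class)
            .withFunctionName("count", CountMetric.class);

        SolrClientCache cache = new SolrClientCache();
        StreamContext context = new StreamContext();
        context.setSolrClientCache(cache);

        TupleStream stream = factory.constructStream(expr);   // parses the whole nested expression
        stream.setStreamContext(context);
        try {
          stream.open();
          for (Tuple tuple = stream.read(); !tuple.EOF; tuple = stream.read()) {
            System.out.println(tuple.getFields());
          }
        } finally {
          stream.close();
          cache.close();
        }
      }
    }

An alternative is to compose CloudSolrStream/GatherNodesStream/HavingStream objects directly in Java, but letting the factory parse the expression string keeps the code in sync with whatever already works in the admin UI.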
Re: Using fetch function with streaming expression
Great, I think I can achieve what I want by combining "select" and "cartersian" functions in my expression. Thanks a lot for help! Regards, Pratik On Wed, Mar 15, 2017 at 10:21 AM, Joel Bernstein wrote: > I haven't created the jira ticket for this yet. It's fairly quick to > implement but the Solr 6.5 release is just around the corner. So most > likely it would be in the Solr 6.6. It will be committed fairly soon > though so if you want to use master, or branch_6x you can experiment with > it earlier. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Tue, Mar 14, 2017 at 7:53 PM, Pratik Patel wrote: > > > Wow, this is interesting! Is it going to be a new addition to solr or is > it > > already available cause I can not find it in documentation? I am using > solr > > version 6.4.1. > > > > On Tue, Mar 14, 2017 at 7:41 PM, Joel Bernstein > > wrote: > > > > > I'm going to add a "cartesian" function that create a cartesian product > > > from a multi-value field. This will turn a single tuple with a > > multi-value > > > into multiple tuples with a single value field. This will allow the > fetch > > > operation to work on ancestors. It also has many other use cases. > Sample > > > syntax: > > > > > > fetch(collection1, > > > cartesian(field=ancestors, > > > having(gatherNodes(collection1, > > > > > > search(collection1, > > > > > > q="*:*", > > > > > > fl="conceptid", > > > > > > sort="conceptid asc", > > > > > > fq=storeid:"524efcfd505637004b1f6f24", > > > > > > fq=tags:"Company", > > > > > > fq=tags:"Prospects2", > > > > > > qt="/export"), > > > > > > walk=conceptid->eventParticipantID, > > > > > > gather="eventID", > > > t > > > rackTraversal="true", > > > > > > scatter="leaves", > > > count(*)), > > > gt(count(*),1))), > > > fl="concept_name", > > > on="ancestors=conceptid") > > > > > > Joel Bernstein > > > http://joelsolr.blogspot.com/ > > > > > > On Tue, Mar 14, 2017 at 11:51 AM, Pratik Patel > > > wrote: > > > > > > > Hi, Joel. Thanks for the reply. > > > > > > > > So, I need to do some graph traversal queries for my use case. In my > > data > > > > set, I have concepts and events. > > > > > > > > concept : {name, address, bio ..}, > > > > > event: {name, date, participantIds:[concept1, concept2...] .} > > > > > > > > > > > > Events connects two or more concepts. So, this is a graph data where > > > > concepts are connected to each other via events. Each event store > links > > > to > > > > the concepts that it connects. So the field which stores those links > is > > > > multivalued. This is a natural structure for my data on which I > wanted > > to > > > > do some advanced graph traversal queries with some streaming > > expression. > > > > However, gatherNodes() function does not support multivalued fields > > yet. > > > > So, I changed my index structure to be something like this. > > > > > > > > concept : {conceptId, name, address, bio ..}, > > > > > event: {eventId, name, date, participantIds:[concept1, concept2...] > > > > .} > > > > > *create eventLink documents for each participantId in each > > > > > event > > > > > eventLink:{eventid, conceptid, id} > > > > > > > > > > > > > > > > I created eventLink documents from each event so that I can traverse > > the > > > > data using gatherNodes() function. With this change, I was able to do > > > graph > > > > query and get Ids of concepts which I wanted. However, I only have > ids > > of > > > > concepts. Now, using these ids, I want additional data from concept > > > > documents like concept_name or address or bio. 
RE: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
Hi All, I am facing this issue since very long, can you please provide your suggestion on it ? Regards, Pratik Thaker -Original Message- From: Pratik Thaker [mailto:pratik.tha...@smartstreamrdu.com] Sent: 09 February 2017 21:24 To: 'solr-user@lucene.apache.org' Subject: RE: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain Hi Friends, Can you please try to give me some details about below issue ? Regards, Pratik Thaker From: Pratik Thaker Sent: 07 February 2017 17:12 To: 'solr-user@lucene.apache.org' Subject: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain Hi All, I am using SOLR Cloud 6.0 I am receiving below exception very frequently in solr logs, o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: RunUpdateProcessor has received an AddUpdateCommand containing a document that appears to still contain Atomic document update operations, most likely because DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:63) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:335) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:936) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1091) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:714) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:93) at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97) 
Can you please help me with the root cause ? Below is the snapshot of solrconfig (values only):

  field name mutating pattern: [^\w-\.]  replacement: _
  date parse formats: yyyy-MM-dd'T'HH:mm:ss.SSSZ, yyyy-MM-dd'T'HH:mm:ss,SSSZ, yyyy-MM-dd'T'HH:mm:ss.SSS, yyyy-MM-dd'T'HH:mm:ss,SSS, yyyy-MM-dd'T'HH:mm:ssZ, yyyy-MM-dd'T'HH:mm:ss, yyyy-MM-dd'T'HH:mmZ, yyyy-MM-dd'T'HH:mm, yyyy-MM-dd HH:mm:ss.SSSZ, yyyy-MM-dd HH:mm:ss,SSSZ, yyyy-MM-dd HH:mm:ss.SSS, yyyy-MM-dd HH:mm:ss,SSS, yyyy-MM-dd HH:mm:ssZ, yyyy-MM-dd HH:mm:ss, yyyy-MM-dd HH:mmZ, yyyy-MM-dd HH:mm, yyyy-MM-dd
  add-schema-fields defaults: default field type strings; type mappings java.lang.Boolean -> booleans, java.util.Date -> tdates, java.lang.Long / java.lang.Integer -> tlongs, java.lang.Number -> tdoubles

Regards, Pratik Thaker
RE: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
Hi Ishan, After making suggested changes to solrconfig.xml, I did upconfig on all 3 SOLR VMs and restarted SOLR engines. But still I am facing same issue. Is it something I am missing ? Regards, Pratik Thaker -Original Message- From: Ishan Chattopadhyaya [mailto:ichattopadhy...@gmail.com] Sent: 14 April 2017 02:12 To: solr-user@lucene.apache.org Subject: Re: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain Why are you adding these update processors (esp. the AddSchemaFieldsUpdateProcessor) after DistributedUpdateProcessor? Try adding them before DUP, and it has a better chance to work. On Wed, Apr 12, 2017 at 3:44 PM, Pratik Thaker < pratik.tha...@smartstreamrdu.com> wrote: > Hi All, > > I am facing this issue since very long, can you please provide your > suggestion on it ? > > Regards, > Pratik Thaker > > -----Original Message- > From: Pratik Thaker [mailto:pratik.tha...@smartstreamrdu.com] > Sent: 09 February 2017 21:24 > To: 'solr-user@lucene.apache.org' > Subject: RE: DistributedUpdateProcessorFactory was explicitly disabled > from this updateRequestProcessorChain > > Hi Friends, > > Can you please try to give me some details about below issue ? > > Regards, > Pratik Thaker > > From: Pratik Thaker > Sent: 07 February 2017 17:12 > To: 'solr-user@lucene.apache.org' > Subject: DistributedUpdateProcessorFactory was explicitly disabled > from this updateRequestProcessorChain > > Hi All, > > I am using SOLR Cloud 6.0 > > I am receiving below exception very frequently in solr logs, > > o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: > RunUpdateProcessor has received an AddUpdateCommand containing a > document that appears to still contain Atomic document update > operations, most likely because DistributedUpdateProcessorFactory was > explicitly disabled from this updateRequestProcessorChain > at > org.apache.solr.update.processor.RunUpdateProcessor.processAdd( > RunUpdateProcessorFactory.java:63) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at > org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessor > Factory$AddSchemaFieldsUpdateProcessor.processAdd( > AddSchemaFieldsUpdateProcessorFactory.java:335) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.FieldMutatingUpdateProcessor. > processAdd(FieldMutatingUpdateProcessor.java:117) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.FieldMutatingUpdateProcessor. > processAdd(FieldMutatingUpdateProcessor.java:117) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.FieldMutatingUpdateProcessor. > processAdd(FieldMutatingUpdateProcessor.java:117) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at org.apache.solr.update.processor.FieldMutatingUpdateProcessor. > processAdd(FieldMutatingUpdateProcessor.java:117) > at org.apache.solr.update.processor.UpdateRequestProcessor. > processAdd(UpdateRequestProcessor.java:48) > at > org.apache.solr.update.processor.FieldNameMutatingUpdateProcess > orFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74) > at org.apache.solr.update.processor.UpdateRequestProcessor. 
RE: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
Hi Alessandro, Can you please suggest what should be the correct order of adding processors ? I am having 5 collections, 6 shards, replication factor 2, 3 nodes on 3 separate VMs. Regards, Pratik Thaker -Original Message- From: alessandro.benedetti [mailto:a.benede...@sease.io] Sent: 21 April 2017 13:38 To: solr-user@lucene.apache.org Subject: RE: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain Let's make a quick differentiation between PRE and POST processors in a Solr Cloud atchitecture : "In a single node, stand-alone Solr, each update is run through all the update processors in a chain exactly once. But the behavior of update request processors in SolrCloud deserves special consideration. " cit. wiki *PRE PROCESSORS* All the processors defined BEFORE the distributedUpdateProcessor happen ONLY on the first node that receive the update ( regardless if it is a leader or a replica ). *POST PROCESSORS* The distributedUpdateProcessor will forward the update request to the the correct leader ( or multiple leaders if the request involves more shards), the leader will then forward to the replicas. The leaders and replicas at this point will execute all the update request processors defined AFTER the distributedUpdateProcessor. " Pre-processors and Atomic Updates Because DistributedUpdateProcessor is responsible for processing Atomic Updates into full documents on the leader node, this means that pre-processors which are executed only on the forwarding nodes can only operate on the partial document. If you have a processor which must process a full document then the only choice is to specify it as a post-processor." wiki In your example, your chain is definitely messed up, the order is important and you want your heavy processing to happen only on the first node. For better info and clarification: https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode ( you can find here a working alternative to your chain) https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- View this message in context: http://lucene.472066.n3.nabble.com/DistributedUpdateProcessorFactory-was-explicitly-disabled-from-this-updateRequestProcessorChain-tp4319154p4331215.html Sent from the Solr - User mailing list archive at Nabble.com. The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful.
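A sketch of what the reordering can look like in solrconfig.xml. The processor list below is the stock schemaless set, not the actual chain in use here, so treat every class and parameter as an assumption to be adapted; the point is only the ordering, with the field-mutating and AddSchemaFields processors ahead of DistributedUpdateProcessorFactory and RunUpdateProcessorFactory last:

    <updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
      <!-- pre-processors: run once, on whichever node first receives the update -->
      <processor class="solr.UUIDUpdateProcessorFactory"/>
      <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
      <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
        <str name="pattern">[^\w-\.]</str>
        <str name="replacement">_</str>
      </processor>
      <processor class="solr.ParseDateFieldUpdateProcessorFactory">
        <arr name="format">
          <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
          <str>yyyy-MM-dd</str>
        </arr>
      </processor>
      <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
        <str name="defaultFieldType">strings</str>
        <lst name="typeMapping">
          <str name="valueClass">java.lang.Boolean</str>
          <str name="fieldType">booleans</str>
        </lst>
      </processor>
      <!-- the distributed update; anything after this point runs on every leader and replica -->
      <processor class="solr.DistributedUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>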
Solr Carrot Clustering query with specific label in it
Hi, When we do a Carrot Clustering query on a set of solr documents we get back the following type of response:

  label "DDR"  (score 3.9599865057283354): TWINX2048-3200PRO, VS1GB400C3, VDBDB1A16
  label "iPod" (score 11.959228467119022): F8V7067-APL-KIT, IW-02, MA147LL/A

Each label (cluster) has a corresponding set of documents. The question is: is it possible to make another Carrot Clustering query with a specific label in it, so as to only get back the documents corresponding to that label? In my use case, I am trying to write a streaming expression where one of the streams is the set of documents corresponding to a label (carrot cluster) selected by the user. Hence, I can not use the data present in the original response object. I have been exploring the Carrot2 documentation but I can't seem to find any option which lets you specify a label in the query. I am using solr 6.4.1 in cloud mode and the clustering algorithm is "org.carrot2.clustering.lingo.LingoClusteringAlgorithm" Thanks, Pratik
Semantic Knowledge Graph query using SolrJ
I am trying to use Semantic Knowledge Graph in my java based application. I have a Semantic Knowledge Graph query which works fine if I trigger it through browser using restlet client. Following is the query. { "queries": [ "foo:\"5a6127a7234e76473a816f1c\"" ], "compare": [ { "type": "bar", "limit": 30 } ]} Now, I want to trigger the same query through SolrJ client. I have tried following code but it gives me an error {"error":{"msg":"KnowledgeGraphHandler requires POST data","code":400}} The code in java is SolrQuery request = new SolrQuery(); request.setRequestHandler("/skg"); request.setShowDebugInfo(true); request.setParam("wt", "json"); request.setParam("json", "{\"queries\":[\"foo:\\\"5a6127a7234e76473a816f1c\\\"\"],\"compare\":[{\"type\":\"bar\",\"limit\":30}]}"); request.set("rows", 10); request.setParam("qf", "conceptname^10 tags^3 textproperty^2 file_text^4"); try { QueryResponse response = getStore().getEnvironment().getSolr().query(request, SolrRequest.METHOD.POST); NamedList rsp = response.getResponse(); ArrayList> skg_resp = (ArrayList>) rsp.get("clusters"); if (skg_resp != null) { } } Any idea what is wrong here? Any pointer to documentation on how to construct request for Semantic Knowledge Graph through solrJ would be very helpful. Thanks Pratik
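A sketch of one way to get the JSON into the POST body from SolrJ. It assumes a SolrJ version in which SolrRequest.getContentStreams() is still honoured for POST requests, and it assumes the custom /skg handler reads the body the same way it does for the restlet client; neither is verified here:

    import java.util.Collection;
    import java.util.Collections;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.util.ContentStream;
    import org.apache.solr.common.util.ContentStreamBase;

    public class SkgPostSketch {
      public static QueryResponse querySkg(SolrClient solr, String collection, String jsonBody) throws Exception {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("qf", "conceptname^10 tags^3 textproperty^2 file_text^4");

        QueryRequest req = new QueryRequest(params, SolrRequest.METHOD.POST) {
          @Override
          public Collection<ContentStream> getContentStreams() {
            // send the SKG JSON as the POST body instead of a "json" URL parameter
            ContentStreamBase.StringStream body = new ContentStreamBase.StringStream(jsonBody);
            body.setContentType("application/json");
            return Collections.singletonList(body);
          }
        };
        req.setPath("/skg");   // the custom handler path from the question above
        return req.process(solr, collection);
      }
    }

If that still does not satisfy the handler, posting the JSON to /solr/<collection>/skg with a plain HTTP client is the fallback.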
Question on query time boosting
Hello All, I am trying to understand how exactly query time boosting works in solr. Primarily, I want to understand if absolute boost values matter or is it just the relative difference between various boost values which decides scoring. Let's take following two queries for example. // case1: q parameter > concept_name:(*semantic*)^200 OR > concept_name:(*machine*)^400 OR > Abstract_note:(*semantic*)^20 OR > Abstract_note:(*machine*)^40 //case2: q parameter > concept_name:(*semantic*)^20 OR > concept_name:(*machine*)^40 OR > Abstract_note:(*semantic*)^2 OR > Abstract_note:(*machine*)^4 Are these two queries any different? Relative boosting is same in both of them. I can see that they produce same results and ordering. Only difference is that the score in case1 is 10 times the score in case2. Thanks, Pratik
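A back-of-the-envelope illustration of why that happens, assuming the usual behaviour of recent Solr/Lucene versions where a clause boost simply multiplies that clause's score and the clause scores of an OR query are summed: take a document whose unboosted clause scores are 1.2 on concept_name:(*machine*) and 0.5 on Abstract_note:(*machine*). Under case 1 it scores 400*1.2 + 40*0.5 = 500; under case 2 it scores 40*1.2 + 4*0.5 = 50. Every matching document gets scaled by the same factor of 10, so the ordering cannot change; only the ratios between the boosts affect ranking. (Older TF-IDF scoring had a queryNorm that partially cancelled absolute boosts; it is gone in recent releases, which is consistent with the clean 10x score difference observed here.)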
Named entity extraction/correlation using Semantic Knowledge Graph
Hi Everyone, I have been using Semantic Knowledge Graph for document summarization, term correlation and document similarity. It has produced very good results after appropriate tuning. I was wondering if there is any way the Semantic Knowledge Graph can be used to for Named Entity Extraction like person names, companies etc. Related cases could be like below. 1. Extracting top named entities given a specific document or set of documents. 2. Given a named entity (let's say person name), return top N entities which are conceptually related to that entity. Does anyone have an idea as to how this can be achieved? Any direction would be a great help! Thanks And Regards, Pratik
Re: Named entity extraction/correlation using Semantic Knowledge Graph
I am on look out for ideas too but I was thinking of using some NER technique to index named entities in a specific field and then use Semantic Knowledge Graph on that specific field i.e. limit SKG queries to that field only. I am not sure however if this would produce desired results. I don't have a training corpus yet. Essentially what I want is something like a Solr Filter for entities or a request handler which can extract entities at query time. On Wed, Oct 17, 2018 at 4:45 PM Alexandre Rafalovitch wrote: > Solr does have: > 1) OpenNLP that does NER specifically > 2) TextTagger that does gazeteer NER based on existing list but with > Solr analysis power > > I would be curious to know how Semantic Knowledge Graph could be used > from NER (or even for other things you already have used it for), but > I am not sure it is clear what specifically you invisage. As in, is > there training corpus, are you looking at NGram techniques, etc. > > Regards, > Alex. > On Wed, 17 Oct 2018 at 13:40, Pratik Patel wrote: > > > > Hi Everyone, > > > > I have been using Semantic Knowledge Graph for document summarization, > term > > correlation and document similarity. It has produced very good results > > after appropriate tuning. > > > > I was wondering if there is any way the Semantic Knowledge Graph can be > > used to for Named Entity Extraction like person names, companies etc. > > Related cases could be like below. > > > > 1. Extracting top named entities given a specific document or set of > > documents. > > 2. Given a named entity (let's say person name), return top N entities > > which are conceptually related to that entity. > > > > Does anyone have an idea as to how this can be achieved? Any direction > > would be a great help! > > > > Thanks And Regards, > > Pratik >
Extracting important multi term phrases from the text
Hello Everyone, The standard way of tokenizing in Solr divides the text by white space. Is there a way by which we can index multi-term phrases like "Machine Learning" instead of "Machine", "Learning"? Is it possible to create a specific field type for such phrases which has its own indexing pipeline? I am open to storing n-grams, but these n-grams would be across terms and not just one term. In other words, I don't want to store n-grams of the term "machine", I want to store n-grams for a sentence like below. "I like machine learning" --> "I like", "like machine", "machine learning" and so on. It seems like Shingle Filter ( https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-ShingleFilter) may be used for this. Is there a better alternative? I want to use this field as an input to Semantic Knowledge Graph. The plugin works great for words. But now I want to use it for phrases. Any idea around this would be really helpful. Thanks a lot! - Pratik
Re: Extracting important multi term phrases from the text
Hi Markus, Thanks for the reply. I tried using ShingleFilter and it seems to be working. However, I am hitting an issue when it is used with StopWordFilter. StopWordFilter leaves an underscore "_" for removed words and it kind of screws up the data in index. I tried setting enablePositionIncrements="false" for stop word filter but that parameter only works for lucene version 4.3 or earlier. Looks like it's an open issue in lucene https://issues.apache.org/jira/browse/LUCENE-4065 For now, I am trying to find a workaround using PatternReplaceFilterFactory. Regards, Pratik On Thu, Nov 15, 2018 at 4:15 PM Markus Jelsma wrote: > Hello Pratik, > > We would use ShingleFilter for this indeed. If you only want > bigrams/shingles, don't forget to disable outputUnigrams and set both > shinle size limits to 2. > > Regards, > Markus > > -Original message- > > From:Pratik Patel > > Sent: Thursday 15th November 2018 17:00 > > To: solr-user@lucene.apache.org > > Subject: Extracting important multi term phrases from the text > > > > Hello Everyone, > > > > Standard way of tokenizing in solr would divide the text by white space > in > > solr. > > > > Is there a way by which we can index multi-term phrases like "Machine > > Learning" instead of "Machine", "Learning"? > > Is it possible to create a specific field type for such phrases which has > > its own indexing pipeline? I am open to storing n-grams but these n-grams > > would be across terms and not just one term? In other words, I don't want > > to store n-grams of the term "machine", I want to store n-grams for a > > sentence like below. > > > > "I like machine learning" --> "I like", "like machine", "machine > learning" > > and so on. > > > > It seems like Shingle Filter ( > > > https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-ShingleFilter > ) > > may be used for this. Is there a better alternative? > > > > I want to use this field as an input to Semantic Knowledge Graph. The > > plugin works great for words. But now I want to use it for phrases. Any > > idea around this would be really helpful. > > > > Thanks a lot! > > > > - Pratik > > >
Re: Extracting important multi term phrases from the text
@Markus @Walter, @Alexandre is right. The culprit was not StopWord Filter, it was ShingleFilter. I could not find parameter filterToken in documentation, is it a new addition? BTW, I tried that and it works. Thanks! I still ended up using pattern replacement filter because I did not want any single word string in that field. @David I am using SKG through the plugin. So it is a POST request with query in body. I haven't yet upgraded to version 7.5. Thank you all for the help! Regards, Pratik On Fri, Nov 16, 2018 at 8:36 AM David Hastings wrote: > Which function of the SKG are you using? significantTerms? > > On Thu, Nov 15, 2018 at 7:09 PM Alexandre Rafalovitch > wrote: > > > I think the underscore actually comes from the Shingles (parameter > > fillerToken). Have you tried setting it to empty string? > > > > Regards, > >Alex. > > On Thu, 15 Nov 2018 at 17:16, Pratik Patel wrote: > > > > > > Hi Markus, > > > > > > Thanks for the reply. I tried using ShingleFilter and it seems to > > > be working. However, I am hitting an issue when it is used with > > > StopWordFilter. StopWordFilter leaves an underscore "_" for removed > words > > > and it kind of screws up the data in index. > > > > > > I tried setting enablePositionIncrements="false" for stop word filter > but > > > that parameter only works for lucene version 4.3 or earlier. Looks like > > > it's an open issue in lucene > > > https://issues.apache.org/jira/browse/LUCENE-4065 > > > > > > For now, I am trying to find a workaround using > > PatternReplaceFilterFactory. > > > > > > Regards, > > > Pratik > > > > > > On Thu, Nov 15, 2018 at 4:15 PM Markus Jelsma < > > markus.jel...@openindex.io> > > > wrote: > > > > > > > Hello Pratik, > > > > > > > > We would use ShingleFilter for this indeed. If you only want > > > > bigrams/shingles, don't forget to disable outputUnigrams and set both > > > > shinle size limits to 2. > > > > > > > > Regards, > > > > Markus > > > > > > > > -Original message- > > > > > From:Pratik Patel > > > > > Sent: Thursday 15th November 2018 17:00 > > > > > To: solr-user@lucene.apache.org > > > > > Subject: Extracting important multi term phrases from the text > > > > > > > > > > Hello Everyone, > > > > > > > > > > Standard way of tokenizing in solr would divide the text by white > > space > > > > in > > > > > solr. > > > > > > > > > > Is there a way by which we can index multi-term phrases like > "Machine > > > > > Learning" instead of "Machine", "Learning"? > > > > > Is it possible to create a specific field type for such phrases > > which has > > > > > its own indexing pipeline? I am open to storing n-grams but these > > n-grams > > > > > would be across terms and not just one term? In other words, I > don't > > want > > > > > to store n-grams of the term "machine", I want to store n-grams > for a > > > > > sentence like below. > > > > > > > > > > "I like machine learning" --> "I like", "like machine", "machine > > > > learning" > > > > > and so on. > > > > > > > > > > It seems like Shingle Filter ( > > > > > > > > > > > > https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-ShingleFilter > > > > ) > > > > > may be used for this. Is there a better alternative? > > > > > > > > > > I want to use this field as an input to Semantic Knowledge Graph. > The > > > > > plugin works great for words. But now I want to use it for phrases. > > Any > > > > > idea around this would be really helpful. > > > > > > > > > > Thanks a lot! > > > > > > > > > > - Pratik > > > > > > > > > > > >
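For later readers of this thread, a minimal sketch of the kind of field type being discussed (the type name, tokenizer choice and shingle sizes are assumptions, not the actual schema used here); the fillerToken="" attribute is the setting that stops removed stopwords from showing up as "_" inside the shingles:

    <fieldType name="text_shingles" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!-- bigrams only, no single terms, and no "_" filler for removed stopwords -->
        <filter class="solr.ShingleFilterFactory"
                minShingleSize="2" maxShingleSize="2"
                outputUnigrams="false" fillerToken=""/>
      </analyzer>
    </fieldType>

A copyField from the free-text field into a field of this type then gives the Semantic Knowledge Graph a phrase-level field to query against.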
Re: Extracting important multi term phrases from the text
@David Sorry for late reply. The SKG query that I am using is actually fairly basic in itself. For example, { > "queries":[ > "dataStoreId:\"123\"", > "text:\"foo\"" > ], > "compare":[ > { > "type":"text_shingles", > "limit":30, > "discover_values":true > } > ] > } What I am expecting is that SKG will return words/phrases that are related to the term "foo". I am filtering the text through StopWordFilter before that. I have also found that specifying a good foreground can drastically improve the results. Good luck! - Pratik On Fri, Nov 16, 2018 at 11:15 AM Alexandre Rafalovitch wrote: > Good catch Pratik. > > It is in Javadoc, but not in the reference guide: > > https://lucene.apache.org/core/6_3_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilterFactory.html > . I'll try to fix that later (SOLR-12996). > > Regards, >Alex. > On Fri, 16 Nov 2018 at 10:44, Pratik Patel wrote: > > > > @Markus @Walter, @Alexandre is right. The culprit was not StopWord > Filter, > > it was ShingleFilter. I could not find parameter filterToken in > > documentation, is it a new addition? BTW, I tried that and it works. > Thanks! > > I still ended up using pattern replacement filter because I did not want > > any single word string in that field. > > > > @David I am using SKG through the plugin. So it is a POST request with > > query in body. I haven't yet upgraded to version 7.5. > > > > Thank you all for the help! > > > > Regards, > > Pratik > > > > On Fri, Nov 16, 2018 at 8:36 AM David Hastings < > hastings.recurs...@gmail.com> > > wrote: > > > > > Which function of the SKG are you using? significantTerms? > > > > > > On Thu, Nov 15, 2018 at 7:09 PM Alexandre Rafalovitch < > arafa...@gmail.com> > > > wrote: > > > > > > > I think the underscore actually comes from the Shingles (parameter > > > > fillerToken). Have you tried setting it to empty string? > > > > > > > > Regards, > > > >Alex. > > > > On Thu, 15 Nov 2018 at 17:16, Pratik Patel > wrote: > > > > > > > > > > Hi Markus, > > > > > > > > > > Thanks for the reply. I tried using ShingleFilter and it seems to > > > > > be working. However, I am hitting an issue when it is used with > > > > > StopWordFilter. StopWordFilter leaves an underscore "_" for removed > > > words > > > > > and it kind of screws up the data in index. > > > > > > > > > > I tried setting enablePositionIncrements="false" for stop word > filter > > > but > > > > > that parameter only works for lucene version 4.3 or earlier. Looks > like > > > > > it's an open issue in lucene > > > > > https://issues.apache.org/jira/browse/LUCENE-4065 > > > > > > > > > > For now, I am trying to find a workaround using > > > > PatternReplaceFilterFactory. > > > > > > > > > > Regards, > > > > > Pratik > > > > > > > > > > On Thu, Nov 15, 2018 at 4:15 PM Markus Jelsma < > > > > markus.jel...@openindex.io> > > > > > wrote: > > > > > > > > > > > Hello Pratik, > > > > > > > > > > > > We would use ShingleFilter for this indeed. If you only want > > > > > > bigrams/shingles, don't forget to disable outputUnigrams and set > both > > > > > > shinle size limits to 2. 
Get MLT Interesting Terms for a set of documents corresponding to the query specified
Hi Everyone! I am trying to use MLT request handler. My query matches more than one documents but the response always seems to pick up the first document and interestingTerms also seems to be corresponding to that single document only. What I am expecting is that if my query matches multiple documents then the InterestingTerms handler result also corresponds to that set of documents and not the first document. Following is my query, http://localhost:8081/solr/collection1/mlt?debugQuery=on&fq=tags:test&mlt.boost=true&mlt.fl=mlt.fl=textpropertymlt&mlt.interestingTerms=details&mlt.mindf=1&mlt.mintf=2&mlt.minwl=3&q=*:*&rows=100&rows=2&start=0 Ultimately, my goal is to get interesting terms corresponding to this whole set of documents. I don't need similar documents as such. If not with mlt, is there any other way I can achieve this? that is, given a query matching set of documents, find interestingTerms for that set of documents based on tf-idf? Thanks! Pratik
Re: Get MLT Interesting Terms for a set of documents corresponding to the query specified
Aman, Thanks for the reply! I have tried with corrected query but it doesn't solve the problem. also, my tags filter matches multiple documents, however the interestingTerms seems to correspond to just the first document. Here is an example of a query which matches 1900 documents. http://localhost:8081/solr/collection1/mlt?debugQuery=on&q=tags:voltage&mlt.boost=true&mlt.fl=my_field&mlt.interestingTerms=details&mlt.mindf=1&mlt.mintf=2&mlt.minwl=3&q=*:*&rows=100&start=0 Thanks, Pratik On Mon, Jan 21, 2019 at 2:52 PM Aman Tandon wrote: > I see two rows params, looks like which will be overwritten by rows=2, and > then your tags filter is resulting only one document. Please remove extra > rows and try. > > On Mon, Jan 21, 2019, 08:44 Pratik Patel > > Hi Everyone! > > > > I am trying to use MLT request handler. My query matches more than one > > documents but the response always seems to pick up the first document and > > interestingTerms also seems to be corresponding to that single document > > only. > > > > What I am expecting is that if my query matches multiple documents then > the > > InterestingTerms handler result also corresponds to that set of documents > > and not the first document. > > > > Following is my query, > > > > > > > http://localhost:8081/solr/collection1/mlt?debugQuery=on&fq=tags:test&mlt.boost=true&mlt.fl=mlt.fl=textpropertymlt&mlt.interestingTerms=details&mlt.mindf=1&mlt.mintf=2&mlt.minwl=3&q=*:*&rows=100&rows=2&start=0 > > > > Ultimately, my goal is to get interesting terms corresponding to this > whole > > set of documents. I don't need similar documents as such. If not with > mlt, > > is there any other way I can achieve this? that is, given a query > matching > > set of documents, find interestingTerms for that set of documents based > on > > tf-idf? > > > > Thanks! > > Pratik > > >
Re: Get MLT Interesting Terms for a set of documents corresponding to the query specified
I will certainly try it out. Thanks! On Mon, Jan 21, 2019 at 8:48 PM Joel Bernstein wrote: > You find the significantTerms streaming expressions useful: > > > https://lucene.apache.org/solr/guide/7_6/stream-source-reference.html#significantterms > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > > On Mon, Jan 21, 2019 at 3:02 PM Pratik Patel wrote: > > > Aman, > > > > Thanks for the reply! > > > > I have tried with corrected query but it doesn't solve the problem. also, > > my tags filter matches multiple documents, however the interestingTerms > > seems to correspond to just the first document. > > Here is an example of a query which matches 1900 documents. > > > > > > > http://localhost:8081/solr/collection1/mlt?debugQuery=on&q=tags:voltage&mlt.boost=true&mlt.fl=my_field&mlt.interestingTerms=details&mlt.mindf=1&mlt.mintf=2&mlt.minwl=3&q=*:*&rows=100&start=0 > > > > > > Thanks, > > Pratik > > > > > > On Mon, Jan 21, 2019 at 2:52 PM Aman Tandon > > wrote: > > > > > I see two rows params, looks like which will be overwritten by rows=2, > > and > > > then your tags filter is resulting only one document. Please remove > extra > > > rows and try. > > > > > > On Mon, Jan 21, 2019, 08:44 Pratik Patel > > > > > > Hi Everyone! > > > > > > > > I am trying to use MLT request handler. My query matches more than > one > > > > documents but the response always seems to pick up the first document > > and > > > > interestingTerms also seems to be corresponding to that single > document > > > > only. > > > > > > > > What I am expecting is that if my query matches multiple documents > then > > > the > > > > InterestingTerms handler result also corresponds to that set of > > documents > > > > and not the first document. > > > > > > > > Following is my query, > > > > > > > > > > > > > > > > > > http://localhost:8081/solr/collection1/mlt?debugQuery=on&fq=tags:test&mlt.boost=true&mlt.fl=mlt.fl=textpropertymlt&mlt.interestingTerms=details&mlt.mindf=1&mlt.mintf=2&mlt.minwl=3&q=*:*&rows=100&rows=2&start=0 > > > > > > > > Ultimately, my goal is to get interesting terms corresponding to this > > > whole > > > > set of documents. I don't need similar documents as such. If not with > > > mlt, > > > > is there any other way I can achieve this? that is, given a query > > > matching > > > > set of documents, find interestingTerms for that set of documents > based > > > on > > > > tf-idf? > > > > > > > > Thanks! > > > > Pratik > > > > > > > > > >
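For completeness, a sketch of what that expression could look like with the field and filter from the earlier messages in this thread; the numeric thresholds are illustrative, not recommendations:

    significantTerms(collection1,
                     q="tags:voltage",
                     field="my_field",
                     limit=20,
                     minDocFreq=10,
                     maxDocFreq=0.3,
                     minTermLength=4)

Unlike the /mlt handler, the terms come back scored against the whole result set of q, which is the set-level behaviour asked for at the top of this thread.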
Re: Using solr graph to traverse N relationships
Problem #1 can probably be solved by using "fetch" function. ( https://lucene.apache.org/solr/guide/6_6/stream-decorators.html#fetch) Problem #2 and #3 can be solved by normalizing the graph connections and by applying cartesianProduct on multi valued field, as described here. http://lucene.472066.n3.nabble.com/Using-fetch-function-with-streaming-expression-td4324896.html On Wed, Mar 13, 2019 at 11:20 AM Nightingale, Jonathan A (US) < jonathan.nighting...@baesystems.com> wrote: > Hi, > I posted this question originally on stack overflow and it was suggested I > use this mailing list instead so I'm sending it out here also. Here's my > original link if you want to maybe answer there also. But I also copied the > question into the body of the email. > > > https://stackoverflow.com/questions/55130208/using-solr-graph-to-traverse-n-relationships > > I'm investigating if I can use an existing solr store to do graph > traversal. It would be ideal to not have to duplicate the data in a graph > store. I was playing with the solr streaming capabilities and the nodes > (gatherNodes) source. I have three problems with it and I'm wondering if > people have found solutions: > 1) getting the original documents that the nodes references with all of > their fields. I did eventually solve this by doing an innerJoin on the > nodes returned by gatherNodes and a query against "*:*" but this seem less > than ideal. Is there a better way to do this? Even better would be if I > could do it as an "export" and not a "select" to better handle large > amounts of data. This problem is small compared to the other two which seem > like major bugs in Solr > 2) I can't traverse to nodes from a field that has more than one value. In > the nodes stream source definition there is a walk parameter. > nodes(collection, > search(some search params) > walk="ref->id", > gather="vals") > > in this example its walking the from the search results, taking the field > "ref" on those docs and finding all nodes that match that as an id. This > works until ref becomes a list of values. Has anyone had success making > this work? A simple example would be a tree structure where you have a > folder document and it has a multiValue field representing its subfolders > and files. How would I walk that relationship? > 3) in that example the gather is returning the nodes that are represented > by the "vals" field on all the nodes that result from the walk. This also > does not work if that field is multiValued. Has anyone had any success with > this also? Again going back to the files and folders example, I want to > return all the files in the subfolders of the selected folder. > nodes(collection, > search(collection, q="path:currentFolder", qt="/select", sort="fileId > ASC"), > walk="contents->fileId", > gather="contents", > fq="type:file") > > I made this up so there may be some typos but the premise is that contents > are a multiValued string field and every document, either of type "file" or > "folder" has a fileId, which is what the contents field references. How > would I accomplish this? Do these fields need to be indexed in a special > way? > Something that interesting is I see in the solr documentation it does > support a multi valued walk but only if its a hard coded value > > nodes(emails, walk="john...@apache.org, janesm...@apache.org->from", > gather="to") > > but when using a different stream as the input of the nodes function it > can't resolve fields that are multivalues. 
It can't even properly resolve > text fields that mimic the example above. If I store a field called refs > with a string value of "ref-1, ref-2, ref-3", the only match will be on an > id of "ref-1" when walk="refs->id" > > Thanks, I'd appreciate any help > >
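To make those suggestions concrete, a sketch for the folder/file example might look like the following (collection and field names are taken from the question above; this is an assumption, not a tested expression):

    fetch(collection,
          nodes(collection,
                cartesianProduct(
                    search(collection, q="path:currentFolder", fl="fileId,contents", sort="fileId asc", qt="/export"),
                    contents,
                    productSort="contents asc"),
                walk="contents->fileId",
                gather="fileId",
                fq="type:file"),
          on="node=fileId",
          fl="fileId,type,path")

cartesianProduct() flattens the multi-valued contents field into one tuple per value so that walk has a single-valued field to work with, and the outer fetch() pulls the full documents back for the gathered node values, which avoids the innerJoin against *:* mentioned in problem #1.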
Best Practice about solr cloud schema
Hello all, I have added some fields to the default managed-schema file. I was wondering if it is safe to take the default managed-schema file as is and add your own fields to it in production. What is the best practice for this? As I understand it, it should be safe to use the default schema as a base if the documents that are going to be indexed in Solr will only have the newly defined fields in them. In fact, it helps because the common field types are already defined in the default schema and can be re-used. I looked through the documentation but couldn't find the answer, and more clarity on this would be helpful. Is it safe to use the default managed-schema file as a base and add your own fields to it? Thanks, Pratik
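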
Re: Best Practice about solr cloud schema
Hey Eric, thanks for the clarification! What about solrConfig.xml file? Sure, it should be customized to suit one's needs but can it be used as a base or is it best to create one from scratch ? Thanks, Pratik On Wed, Feb 7, 2018 at 5:29 PM, Erick Erickson wrote: > That's really the point of the default managed-schema, to be a base > you use for your customizations. In fact, I often _remove_ most of the > fields (and especially fieldTypes) that I don't need. This includes > dynamic fields, copyFields and the like. > > Sometimes it's actually easier, though, to just start all over. > > BTW, do not delete any field that begins and ends with an underscore, > e.g. _version_ unless you know exactly what the consequences are > > Best, > Erick > > On Wed, Feb 7, 2018 at 2:59 PM, Pratik Patel wrote: > > Hello all, > > > > I have added some fields to default managed-schema file. I was wondering > if > > it is safe to take default managed-schema file as is and add your own > > fields to it in production. What is the best practice for this? As I > > understand, it should be safe to use default schema as base if documents > > that are going to be indexed in solr will only have newly defined fields > in > > it. In fact, it helps because the common field types are already defined > in > > default schema which can be re-used. I looked through the documentation > but > > couldn't find the answer and more clarity on this would be helpful. > > > > Is it safe to use default managed-schema file as base add your own fields > > to it? > > > > Thanks, > > Pratik >
Re: Best Practice about solr cloud schema
That makes it clear. Thanks a lot for your help. Pratik On Feb 7, 2018 10:33 PM, "Erick Erickson" wrote: > It can pretty much be used as-is, _except_ > > you'll find one or more entries in your request handlers like: > _text_ > > Change "_text_" to something in your schema, that's the default search > field if you don't field-qualify your search terms. > > Note that if you take out, for instance, all of your non-english > fieldTypes, you can also remove most of the stuff under the /lang > folder. > > I essentially always test this out on a local, stand-alone instance > until I can index a few documents and query them, it's faster than > always having to remember to move them to ZooKeeper > > Best, > Erick > > On Wed, Feb 7, 2018 at 7:14 PM, Pratik Patel wrote: > > Hey Eric, thanks for the clarification! What about solrConfig.xml file? > > Sure, it should be customized to suit one's needs but can it be used as a > > base or is it best to create one from scratch ? > > > > Thanks, > > Pratik > > > > On Wed, Feb 7, 2018 at 5:29 PM, Erick Erickson > > wrote: > > > >> That's really the point of the default managed-schema, to be a base > >> you use for your customizations. In fact, I often _remove_ most of the > >> fields (and especially fieldTypes) that I don't need. This includes > >> dynamic fields, copyFields and the like. > >> > >> Sometimes it's actually easier, though, to just start all over. > >> > >> BTW, do not delete any field that begins and ends with an underscore, > >> e.g. _version_ unless you know exactly what the consequences are > >> > >> Best, > >> Erick > >> > >> On Wed, Feb 7, 2018 at 2:59 PM, Pratik Patel > wrote: > >> > Hello all, > >> > > >> > I have added some fields to default managed-schema file. I was > wondering > >> if > >> > it is safe to take default managed-schema file as is and add your own > >> > fields to it in production. What is the best practice for this? As I > >> > understand, it should be safe to use default schema as base if > documents > >> > that are going to be indexed in solr will only have newly defined > fields > >> in > >> > it. In fact, it helps because the common field types are already > defined > >> in > >> > default schema which can be re-used. I looked through the > documentation > >> but > >> > couldn't find the answer and more clarity on this would be helpful. > >> > > >> > Is it safe to use default managed-schema file as base add your own > fields > >> > to it? > >> > > >> > Thanks, > >> > Pratik > >> >
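For concreteness, the _text_ reference Erick mentions usually lives in the request handler defaults in solrconfig.xml, roughly like this (which field you point df at depends on your own schema):

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="rows">10</str>
        <!-- default search field; point this at a field that actually exists in your schema -->
        <str name="df">_text_</str>
      </lst>
    </requestHandler>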
Re: Index size increases disproportionately to size of added field when indexed=false
I had a similar issue with index size after upgrading to version 6.4.1 from 5.x. The issue for me was that the field which caused index size to be increased disproportionately had a field type("text_general") for which default value of omitNorms was not true. Turning it on explicitly on field fixed the problem. Following is the link to my related question. You can verify value of omitNorms for your fields to check whether this is applicable in your case or not. http://search-lucene.com/m/Solr/eHNlagIB7209f1w1?subj=Fwd+Solr+dynamic+field+blowing+up+the+index+size On Tue, Feb 13, 2018 at 8:48 PM, Howe, David wrote: > > I have set docValues=false on all of the string fields in our index that > have indexed=false and stored=true. This gave a small improvement in the > index size from 13.3GB to 12.82GB. > > I have also tried running an optimize, which then reduced the index to > 12.6GB. > > Next step is to dump the sizes of the Solr index files for the index > version that is the correct size and the version that has the large size. > > Regards, > > David > > > David Howe > Java Domain Architect > Postal Systems > Level 16, 111 Bourke Street Melbourne VIC 3000 > > T 0391067904 > > M 0424036591 > > E david.h...@auspost.com.au > > W auspost.com.au > W startrack.com.au > > -Original Message- > From: Howe, David [mailto:david.h...@auspost.com.au] > Sent: Wednesday, 14 February 2018 7:26 AM > To: solr-user@lucene.apache.org > Subject: RE: Index size increases disproportionately to size of added > field when indexed=false > > > Thanks Hoss. I will try setting docValues to false, as we only ever want > to be able to retrieve the value of this field. > > Regards, > > David > > David Howe > Java Domain Architect > Postal Systems > Level 16, 111 Bourke Street Melbourne VIC 3000 > > T 0391067904 > > M 0424036591 > > E david.h...@auspost.com.au > > W auspost.com.au > W startrack.com.au > > Australia Post is committed to providing our customers with excellent > service. If we can assist you in any way please telephone 13 13 18 or visit > our website. > > The information contained in this email communication may be proprietary, > confidential or legally professionally privileged. It is intended > exclusively for the individual or entity to which it is addressed. You > should only read, disclose, re-transmit, copy, distribute, act in reliance > on or commercialise the information if you are authorised to do so. > Australia Post does not represent, warrant or guarantee that the integrity > of this email communication has been maintained nor that the communication > is free of errors, virus or interference. > > If you are not the addressee or intended recipient please notify us by > replying direct to the sender and then destroy any electronic or paper copy > of this message. Any views expressed in this email communication are taken > to be those of the individual sender, except where the sender specifically > attributes those views to Australia Post and is authorised to do so. > > Please consider the environment before printing this email. > Australia Post is committed to providing our customers with excellent > service. If we can assist you in any way please telephone 13 13 18 or visit > our website. > > The information contained in this email communication may be proprietary, > confidential or legally professionally privileged. It is intended > exclusively for the individual or entity to which it is addressed. 
You > should only read, disclose, re-transmit, copy, distribute, act in reliance > on or commercialise the information if you are authorised to do so. > Australia Post does not represent, warrant or guarantee that the integrity > of this email communication has been maintained nor that the communication > is free of errors, virus or interference. > > If you are not the addressee or intended recipient please notify us by > replying direct to the sender and then destroy any electronic or paper copy > of this message. Any views expressed in this email communication are taken > to be those of the individual sender, except where the sender specifically > attributes those views to Australia Post and is authorised to do so. > > Please consider the environment before printing this email. >
Re: Index size increases disproportionately to size of added field when indexed=false
You are right, in my case this field type was applied to many text fields. These includes many copy fields and dynamic fields as well. In my case, only specifying omitNorms=true for field type "text_general" fixed the issue. I didn't do anything else or had any other bug. On Wed, Feb 14, 2018 at 1:01 PM, Alessandro Benedetti wrote: > Hi pratik, > how is it possible that just the norms for a single field were causing such > a massive index size increment in your case ? > > In your case I think it was for a field type used by multiple fields, but > it's still suspicious in my opinions, > norms should be that big. > If I remember correctly in old versions of Solr before the drop of index > time boost, norms were containing both an approximation of the length of > the > field + index time boost. > From your mailing list problem you moved from 10 Gb to 300 Gb. > It can't be just the norms, are you sure you didn't face some bug ? > > Regards > > > > - > --- > Alessandro Benedetti > Search Consultant, R&D Software Engineer, Director > Sease Ltd. - www.sease.io > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
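For concreteness, the fix described above amounts to setting omitNorms on the type (or on the individual fields), something like the following (the analyzer chain shown is only a typical one, not necessarily the one from that schema):

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>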
Re: Index size increases disproportionately to size of added field when indexed=false
@Alessandro I will see if I can reproduce the same issue just by turning off omitNorms on field type. I'll open another mail thread if required. Thanks. On Thu, Feb 15, 2018 at 6:12 AM, Howe, David wrote: > > Hi Alessandro, > > Some interesting testing today that seems to have gotten me closer to what > the issue is. When I run the version of the index that is working > correctly against my database table that has the extra field in it, the > index suddenly increases in size. This is even though the data importer is > running the same SELECT as before (which doesn't include the extra column) > and loads the same number of rows. > > After scratching my head for a bit and browsing through both versions of > the table I am loading from (with and without the extra field), I noticed > that the natural ordering of the tables is different. These tables are > "staging" tables that I populate with another set of queries and inserts to > get the data into a format that is easy to ingest into Solr. When I add > the extra field to these queries, it changes the Oracle query plan as the > field is contained in a different table that I need to join to. As I don't > specify an "ORDER BY" on the query (as I didn't think it would make a > difference and would slow the query down), Oracle is free to chose how it > orders the result set. Adding the extra field changes that natural > ordering, which affects the order things go into my staging table. As I > don't specify an "ORDER BY" when I select things out of the staging table, > my data in the scenario that is working is being loaded in a different > order to the scenario which doesn't work. > > I am currently running full loads to verify this under each scenario, as I > have now forced the data in the scenario that doesn't work to be in the > same order as the scenario that does. Will see how this load goes > overnight. > > This leads to the question of what difference does it make to Solr what > order I load the data in? > > I also noticed that the .cfs file is quite large in the second scenario, > even though this is supposed to be disabled by default in Solr. I checked > my Solr config and there is no override of the default. > > In answer to your questions: > > 1) same number of documents - YES ~14,000,000 documents > 2) identical documents ( + 1 new field each not indexed) - YES, the second > scenario has one extra field that is stored but not indexed > 3) same number of deleted documents - YES, there are zero deleted > documents in both scenarios > 4) they both were born from scratch ( an empty index) - YES, both start > from a brand new virtual server with a brand new installation of Solr > > I am using the default auto commit, which I think is 15000. > > Thanks again for your assistance. > > Regards, > > David > > David Howe > Java Domain Architect > Postal Systems > Level 16, 111 Bourke Street Melbourne VIC 3000 > > T 0391067904 > > M 0424036591 > > E david.h...@auspost.com.au > > W auspost.com.au > W startrack.com.au > > Australia Post is committed to providing our customers with excellent > service. If we can assist you in any way please telephone 13 13 18 or visit > our website. > > The information contained in this email communication may be proprietary, > confidential or legally professionally privileged. It is intended > exclusively for the individual or entity to which it is addressed. You > should only read, disclose, re-transmit, copy, distribute, act in reliance > on or commercialise the information if you are authorised to do so. 
> Australia Post does not represent, warrant or guarantee that the integrity > of this email communication has been maintained nor that the communication > is free of errors, virus or interference. > > If you are not the addressee or intended recipient please notify us by > replying direct to the sender and then destroy any electronic or paper copy > of this message. Any views expressed in this email communication are taken > to be those of the individual sender, except where the sender specifically > attributes those views to Australia Post and is authorised to do so. > > Please consider the environment before printing this email. >
Re: Getting more documents from resultsSet
Using cursor marker might help as explained in this documentation https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html On Fri, May 18, 2018 at 4:13 PM, Deepak Goel wrote: > I wonder if in-memory-filesystem would help... > > On Sat, 19 May 2018, 01:03 Erick Erickson, > wrote: > > > If you only return fields that are docValue=true that'll largely > > eliminate the disk seeks. 30 seconds does seem kind of excessive even > > with disk seeks though. > > > > Here'r a reference: > > https://lucene.apache.org/solr/guide/6_6/docvalues.html > > > > Whenever I see anything like "...our business requirement is...", I > > cringe. _Why_ is that a requirement? What is being done _for the user_ > > that requires 2000 documents? There may be legitimate reasons, but > > there also may be better ways to get what you need. This may very well > > be an XY problem. > > > > For instance, if you want to take the top 2,000 docs from query X and > > score just those, see: > > https://lucene.apache.org/solr/guide/6_6/query-re-ranking.html, > > specifically: ReRankQParserPlugin. > > > > Best, > > Erick > > > > On Fri, May 18, 2018 at 11:09 AM, root23 wrote: > > > Hi all, > > > I am working on Solr 6. Our business requirement is that we need to > > return > > > 2000 docs for every query we execute. > > > Now normally if i execute the same set to query with start=0 to > rows=10. > > It > > > returns very fast(event for our most complex queries in like less then > 3 > > > seconds). > > > however the moment i add start=0 to rows =2000, the response time is > > like 30 > > > seconds or so. > > > > > > I understand that solr has to do probably disk seek to get the > documents > > > which might be the bottle neck in this case. > > > > > > Is there a way i can optimize around this knowingly that i might have > to > > get > > > 2000 results in one go and then might have to paginate also further and > > > showing 2000 results on each page. We could go to as much as 50 page. > > > > > > > > > > > > -- > > > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html > > >
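A minimal SolrJ sketch of that cursorMark loop, assuming "id" is the uniqueKey field and the collection name is a placeholder:

    import java.io.IOException;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorPaging {
      // Walks the full result set 2000 docs at a time using cursorMark.
      public static void fetchAll(SolrClient client, String collection)
          throws SolrServerException, IOException {
        SolrQuery query = new SolrQuery("*:*");
        query.setRows(2000);                               // page size
        query.setSort(SolrQuery.SortClause.asc("id"));     // sort must end on the uniqueKey field
        String cursorMark = CursorMarkParams.CURSOR_MARK_START;
        boolean done = false;
        while (!done) {
          query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
          QueryResponse rsp = client.query(collection, query);
          // process rsp.getResults() here ...
          String nextCursorMark = rsp.getNextCursorMark();
          done = cursorMark.equals(nextCursorMark);        // unchanged mark means no more results
          cursorMark = nextCursorMark;
        }
      }
    }

Unlike deep start/rows paging, each request only has to pick up where the previous one left off, so response time stays flat as you page through.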
Applying streaming expression as a filter in graph traversal expression (gatherNodes)
We can limit the scope of graph traversal by applying some filter along the way, as follows. gatherNodes(emails, walk="john...@apache.org->from", fq="body:(solr rocks)", gather="to") Is it possible to replace "body:(solr rocks)" with some streaming expression, like the "search" function for example? As follows: gatherNodes(emails, walk="john...@apache.org->from", fq="search(...)", // use streaming expression as filter gather="to") In my case, it would improve performance significantly if one could do that. Another approach I can think of is to save the results of the "search" streaming expression in some variable in the pipeline and then use it at multiple places, including the "fq" clause of "gatherNodes". Is it possible to do something like this?
Re: Applying streaming expression as a filter in graph traversal expression (gatherNodes)
Hi Joel, Thanks for the reply! I have indexed graph data in solr where an "event" can have one or more "participants". Thus, it's a graph of "participants" connected to each other via "events". Because participants are multiple, I am indexing the graph as follows. event--event_participant_child--participant Now my end goal is this, I have a list of "events" and for that list I want to plot a graph of "participants" by connecting them via events (which have to be from the original list). I get this list of "events" from a search() function which I use as my seed expression for gatherNodes(). I am doing a two hop graph traversal as follows. having( having( gatherNodes( collection1, having( gatherNodes( collection1, search(.), // gets list of events with each node having "eventId" walk=eventId->eventId, // walk to event_participant_child document which has both "eventId" and "participantId" gather="participantId", trackTraversal="true", scatter="leaves", count(*) ), gt(count(*),0) ), walk=node->participantId, gather="eventId", fq=(), // limit traversal to original list of events by using search() here?? trackTraversal="true", scatter="branches", count(*) ), eq(level,0) ), gt(count(*),1) ) I am able to get the graph I want from ancestors fields of nodes which are at level 0. Essentially, these are the events from my original list. Using "having()" function, I am able to limit the response so that it only includes original events. But it would be a great improvement if I can also limit the traversal so that only events from original list are visited at second hop. That is why, I want to apply original search() function as a filter in outer gatherNodes() function. I know it's a long shot but considering the potential improvement in performance, I was curious. Please let me know if you feel there is a better approach. Thanks - Pratik On Thu, Jun 21, 2018 at 7:05 PM, Joel Bernstein wrote: > Currently the gatherNodes expression can only be filtered by a traditional > filter query. I'm curious about the type of expression you are thinking of > filtering by? > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Wed, Jun 20, 2018 at 1:54 PM, Pratik Patel wrote: > > > We can limit the scope of graph traversal by applying some filter along > the > > way as follows. > > > > gatherNodes(emails, > > walk="john...@apache.org->from", > > fq="body:(solr rocks)", > > gather="to") > > > > > > Is it possible to replace "body:(solr rocks)" by some streaming > expression > > like "search" function for example? Like as follows.. > > > > gatherNodes(emails, > > walk="john...@apache.org->from", > > fq="search(...)", // use streaming expression as filter > > gather="to") > > > > > > > > In my case, it would improve performance significantly if one can do > that. > > Other approach I can think of is to save results of "search" streaming > > expression in some variable in pipeline and then use it at multiple > places > > including "fq" clause of "gatherNodes". Is it possible to do something > like > > this? > > >
Java library for building Streaming Expressions
Hello Everyone, Is there any java library for building Streaming Expressions? Currently, I am using solr's java client and building Streaming Expressions as follows. StreamFactory factory = new StreamFactory().withCollectionZkHost( collName, zkHost ) .withFunctionName("gatherNodes", GatherNodesStream.class) .withFunctionName("search", CloudSolrStream.class) .withFunctionName("count", CountMetric.class) .withFunctionName("having", HavingStream.class) .withFunctionName("gt", GreaterThanOperation.class) .withFunctionName("eq", EqualsOperation.class); HavingStream cs = (HavingStream) factory.constructStream( ); In this approach, I still have to build streaming_expression_str in code. Is there any better approach for this or is there any java library to do this? My search for it didn't yield anything so I was wondering if anyone here has an idea. Thanks, Pratik
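One option, rather than a separate builder library, is to assemble the string with the expression classes SolrJ already ships in org.apache.solr.client.solrj.io.stream.expr. A rough sketch (the collection, fields and the having/gt condition are placeholders):

    import org.apache.solr.client.solrj.io.stream.expr.StreamExpression;
    import org.apache.solr.client.solrj.io.stream.expr.StreamExpressionNamedParameter;

    StreamExpression search = new StreamExpression("search");
    search.addParameter("collection1");
    search.addParameter(new StreamExpressionNamedParameter("q", "*:*"));
    search.addParameter(new StreamExpressionNamedParameter("fl", "conceptid"));
    search.addParameter(new StreamExpressionNamedParameter("sort", "conceptid asc"));
    search.addParameter(new StreamExpressionNamedParameter("qt", "/export"));

    StreamExpression gt = new StreamExpression("gt");
    gt.addParameter("count(*)");
    gt.addParameter("0");

    StreamExpression having = new StreamExpression("having");
    having.addParameter(search);
    having.addParameter(gt);

    // serialize back to an expression string for factory.constructStream(...)
    String exprStr = having.toString();

This is still string-based under the hood, but it avoids hand-concatenating the expression and keeps the nesting explicit.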
String concatenation in Streaming Expressions
Hello, Is there a function which can be used in Streaming Expressions to concatenate two strings? I want to use it just like add(1,2) in a Streaming Expression. Essentially, I want to achieve something like the following. select( search(..), conceptid as foo, storeid as bar, concat(foo,bar) as id ) I could use the merge() function, but my streaming expression is already quite complex and that would be a roundabout way of doing it, making it even more complex. Any idea how this can be achieved? Thanks, Pratik
Re: String concatenation in Streaming Expressions
Thanks a lot for help! Looks like this is a recent addition? It doesn't work for me in version 6.6.4 On Wed, Jun 27, 2018 at 4:18 PM, Aroop Ganguly wrote: > So it will become: > select( > search(..), > conceptid as foo, >storeid as bar > append(conceptid, storeid) as id > ) > > Or > select > select( > search(..), > conceptid as foo, >storeid as bar > ), > foo, > bar, > append(foo,bar) as id > ) > > > On Jun 27, 2018, at 1:12 PM, Aroop Ganguly > wrote: > > > > this test case here will help in understanding the usage: > > https://github.com/apache/lucene-solr/blob/branch_7_2/ > solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/eval/ > AppendEvaluatorTest.java <https://github.com/apache/ > lucene-solr/blob/branch_7_2/solr/solrj/src/test/org/ > apache/solr/client/solrj/io/stream/eval/AppendEvaluatorTest.java> > > > >> On Jun 27, 2018, at 1:07 PM, Aroop Ganguly > wrote: > >> > >> I think u can use the append evaluator > >> https://github.com/apache/lucene-solr/blob/master/solr/ > solrj/src/java/org/apache/solr/client/solrj/io/eval/AppendEvaluator.java < > https://github.com/apache/lucene-solr/blob/master/solr/ > solrj/src/java/org/apache/solr/client/solrj/io/eval/AppendEvaluator.java> > >> > >> > >>> On Jun 27, 2018, at 12:58 PM, Pratik Patel > wrote: > >>> > >>> Hello, > >>> > >>> Is there a function which can be used in Streaming Expressions to > >>> concatenate two strings? I want to use it just like add(1,2) in a > Streaming > >>> Expression. Essentially, I want to achieve something as follows. > >>> > >>> select( > >>> search(..), > >>> conceptid as foo, > >>> storeid as bar > >>> concat(foo,bar) as id > >>> ) > >>> > >>> I can use merge() function but my streaming expression is quite > complex and > >>> that will make it even more complex as that would be a round about way > of > >>> doing it. Any idea how this can be achieved? > >>> > >>> Thanks, > >>> Pratik > >> > > > >
Re: String concatenation in Streaming Expressions
Thanks Aroop, I tired following Streaming Expression but it doesn't work for me. select( search(collection1,q="*:*",fl="conceptid",sort="conceptid asc",fq=storeid:"59c03d21d997b97bf47b3eeb",fq=schematype:"Article",fq=tags:"genetics", qt="/export"), conceptid as conceptid, storeid as "test_", concat([conceptid,storeid], conceptid, "-") ) It generates an exception, "Invalid expression concat([conceptid,storeid],conceptid,\"-\") - unknown operands found" Is this correct syntax? On Wed, Jun 27, 2018 at 4:30 PM, Aroop Ganguly wrote: > It seems like append is not available on 6.4, but concat is … > Check this out on the 6.4 branch: > https://github.com/apache/lucene-solr/blob/branch_6_4/ > solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/ops/ > ConcatOperationTest.java <https://github.com/apache/ > lucene-solr/blob/branch_6_4/solr/solrj/src/test/org/ > apache/solr/client/solrj/io/stream/ops/ConcatOperationTest.java> > > > > On Jun 27, 2018, at 1:27 PM, Aroop Ganguly > wrote: > > > > It should, but 6.6.* has some issues of things not working per > documentation. > > Try using 7+. > > > >> On Jun 27, 2018, at 1:24 PM, Pratik Patel wrote: > >> > >> Thanks a lot for help! > >> > >> Looks like this is a recent addition? It doesn't work for me in version > >> 6.6.4 > >> > >> > >> > >> On Wed, Jun 27, 2018 at 4:18 PM, Aroop Ganguly > > >> wrote: > >> > >>> So it will become: > >>> select( > >>> search(..), > >>> conceptid as foo, > >>> storeid as bar > >>> append(conceptid, storeid) as id > >>> ) > >>> > >>> Or > >>> select > >>> select( > >>> search(..), > >>> conceptid as foo, > >>> storeid as bar > >>> ), > >>> foo, > >>> bar, > >>> append(foo,bar) as id > >>> ) > >>> > >>>> On Jun 27, 2018, at 1:12 PM, Aroop Ganguly > >>> wrote: > >>>> > >>>> this test case here will help in understanding the usage: > >>>> https://github.com/apache/lucene-solr/blob/branch_7_2/ > >>> solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/eval/ > >>> AppendEvaluatorTest.java <https://github.com/apache/ > >>> lucene-solr/blob/branch_7_2/solr/solrj/src/test/org/ > >>> apache/solr/client/solrj/io/stream/eval/AppendEvaluatorTest.java> > >>>> > >>>>> On Jun 27, 2018, at 1:07 PM, Aroop Ganguly > >>> wrote: > >>>>> > >>>>> I think u can use the append evaluator > >>>>> https://github.com/apache/lucene-solr/blob/master/solr/ > >>> solrj/src/java/org/apache/solr/client/solrj/io/eval/AppendEvaluator.java > < > >>> https://github.com/apache/lucene-solr/blob/master/solr/ > >>> solrj/src/java/org/apache/solr/client/solrj/io/eval/ > AppendEvaluator.java> > >>>>> > >>>>> > >>>>>> On Jun 27, 2018, at 12:58 PM, Pratik Patel > >>> wrote: > >>>>>> > >>>>>> Hello, > >>>>>> > >>>>>> Is there a function which can be used in Streaming Expressions to > >>>>>> concatenate two strings? I want to use it just like add(1,2) in a > >>> Streaming > >>>>>> Expression. Essentially, I want to achieve something as follows. > >>>>>> > >>>>>> select( > >>>>>> search(..), > >>>>>> conceptid as foo, > >>>>>>storeid as bar > >>>>>>concat(foo,bar) as id > >>>>>> ) > >>>>>> > >>>>>> I can use merge() function but my streaming expression is quite > >>> complex and > >>>>>> that will make it even more complex as that would be a round about > way > >>> of > >>>>>> doing it. Any idea how this can be achieved? > >>>>>> > >>>>>> Thanks, > >>>>>> Pratik > >>>>> > >>>> > >>> > >>> > > > >
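Judging by the ConcatOperationTest linked above, the 6.x concat operation appears to take named parameters rather than the positional form tried above, so the select() would look more like this (an untested sketch, reusing the field names from the earlier example):

    select(
        search(collection1,
               q="*:*",
               fl="conceptid,storeid",
               sort="conceptid asc",
               fq=storeid:"59c03d21d997b97bf47b3eeb",
               qt="/export"),
        conceptid,
        storeid,
        concat(fields="conceptid,storeid", delim="-", as="id")
    )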
Re: Bug in scoreNodes function of streaming expressions?
Joel Bernstein wrote > Ok, that sounds like a bug. I can create a ticket for this. > > On Mon, Jul 1, 2019 at 5:57 PM Pratik Patel < > pratik@ > > wrote: > >> I think the problem was that my streaming expression was always returning >> just one node. When I added more data so that I can have more than one >> node, I started seeing the result. >> >> On Mon, Jul 1, 2019 at 11:21 AM Pratik Patel < > pratik@ > > wrote: >> >>> Hello Everyone, >>> >>> I am trying to execute following streaming expression with "scoreNodes" >>> function in it. This is taken from the documentation. >>> >>> scoreNodes(top(n="50", >>>sort="count(*) desc", >>>nodes(baskets, >>> random(baskets, q="productID:ABC", >>> fl="basketID", rows="500"), >>> walk="basketID->basketID", >>> fq="-productID:ABC", >>> gather="productID", >>> count(* >>> >>> I have ensured that I have the collection and data present for it. >>> Upon executing this, I am getting an error message as follows. >>> >>> "No collection param specified on request and no default collection has >>> been set: []" >>> >>> Upon digging into the source code I found that there is a possible bug >>> in >>> ScoreNodesStream.java >>> >>> StringBuilder instance is never appended any string and the block which >>> initializes collection, needs the length of that instance to be more >>> than >>> zero. This condition will always be false and hence the collection will >>> never be set. >>> >>> I checked this file in solr version 8.1 and that also has the same >>> issue. >>> Is there any JIRA open for this or any patch available? >>> >>> [image: image.png] >>> >>> Thanks, >>> Pratik >>> >> Hi Joel, You mentioned creating a ticket for this bug, I can't find any, was it created? If not then I can create one. Currently, ScoreNodes has two issues. 1. It fails when result has only one node. 2. It triggers a GET request instead of POST. GET fails if number of nodes is large. I have been using a custom class as workaround for #2, it would be good to use the original SolrJ class. Thanks, Pratik -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Bug in scoreNodes function of streaming expressions?
Thanks a lot. I will update the ticket with more details if appropriate. Pratik On Wed, Jan 29, 2020 at 10:07 AM Joel Bernstein wrote: > Here is the ticket: > https://issues.apache.org/jira/browse/SOLR-14231 > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > > On Wed, Jan 29, 2020 at 10:03 AM Joel Bernstein > wrote: > > > Hi Pratik, > > > > I'll create the ticket now and report back. If you've got a fix please > > post it to the ticket and I'll try to get this in for the next release. > > > > Joel Bernstein > > http://joelsolr.blogspot.com/ > > > > > > On Tue, Jan 28, 2020 at 11:52 AM pratik@semandex > > wrote: > > > >> Joel Bernstein wrote > >> > Ok, that sounds like a bug. I can create a ticket for this. > >> > > >> > On Mon, Jul 1, 2019 at 5:57 PM Pratik Patel < > >> > >> > pratik@ > >> > >> > > wrote: > >> > > >> >> I think the problem was that my streaming expression was always > >> returning > >> >> just one node. When I added more data so that I can have more than > one > >> >> node, I started seeing the result. > >> >> > >> >> On Mon, Jul 1, 2019 at 11:21 AM Pratik Patel < > >> > >> > pratik@ > >> > >> > > wrote: > >> >> > >> >>> Hello Everyone, > >> >>> > >> >>> I am trying to execute following streaming expression with > >> "scoreNodes" > >> >>> function in it. This is taken from the documentation. > >> >>> > >> >>> scoreNodes(top(n="50", > >> >>>sort="count(*) desc", > >> >>>nodes(baskets, > >> >>> random(baskets, q="productID:ABC", > >> >>> fl="basketID", rows="500"), > >> >>> walk="basketID->basketID", > >> >>> fq="-productID:ABC", > >> >>> gather="productID", > >> >>> count(* > >> >>> > >> >>> I have ensured that I have the collection and data present for it. > >> >>> Upon executing this, I am getting an error message as follows. > >> >>> > >> >>> "No collection param specified on request and no default collection > >> has > >> >>> been set: []" > >> >>> > >> >>> Upon digging into the source code I found that there is a possible > bug > >> >>> in > >> >>> ScoreNodesStream.java > >> >>> > >> >>> StringBuilder instance is never appended any string and the block > >> which > >> >>> initializes collection, needs the length of that instance to be more > >> >>> than > >> >>> zero. This condition will always be false and hence the collection > >> will > >> >>> never be set. > >> >>> > >> >>> I checked this file in solr version 8.1 and that also has the same > >> >>> issue. > >> >>> Is there any JIRA open for this or any patch available? > >> >>> > >> >>> [image: image.png] > >> >>> > >> >>> Thanks, > >> >>> Pratik > >> >>> > >> >> > >> > >> > >> Hi Joel, > >> > >> You mentioned creating a ticket for this bug, I can't find any, was it > >> created? If not then I can create one. Currently, ScoreNodes has two > >> issues. > >> > >> 1. It fails when result has only one node. > >> 2. It triggers a GET request instead of POST. GET fails if number of > nodes > >> is large. > >> > >> I have been using a custom class as workaround for #2, it would be good > to > >> use the original SolrJ class. > >> > >> Thanks, > >> Pratik > >> > >> > >> > >> -- > >> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html > >> > > >
Solr Analyzer : Filter to drop tokens based on some logic which needs access to adjacent tokens
Hello Everyone, Let's say I have an analyzer which has the following token stream as output: [], a, ab, [], c, [], d, de, def. Now let's say I want to add another filter which will drop certain tokens based on whether the adjacent token on the right side is [] or some string: for a given token, drop it (or replace it with an empty string) if there is a non-empty token on its right, and keep it if there is an empty token on its right. Based on this, the resulting token stream would be: [], [a], ab, [], c, [], d, de, def. Is there any filter available in Solr with which this can be achieved? If writing a custom filter is the only possible option, then I want to know whether it's possible to access adjacent tokens in the custom filter. Any idea about this would be really helpful. Thanks, Pratik
Re: Solr Analyzer : Filter to drop tokens based on some logic which needs access to adjacent tokens
Thanks for the reply Emir. I will be exploring the option of creating a custom filter. It's good to know that we can consume more than one tokens from previous filter and emit different number of tokens. Do you know of any existing filter in Solr which does something similar? It would be greatly helpful to see how more than one tokens can be consumed. I can implement my custom logic once I have access to multiple tokens from previous filter. Thanks Pratik On Mon, Feb 10, 2020 at 2:47 AM Emir Arnautović < emir.arnauto...@sematext.com> wrote: > Hi Pratik, > You might be able to do some of required things using > PatternReplaceChartFilter, but as you can see it does not operate on tokens > level but input string. Your best bet is custom token filter. Not sure how > familiar you are with how token filters work, but you have access to tokens > from previous filter and you can implement any logic you want: you consume > three tokens and emit tokens based on adjacent tokens. > > HTH, > Emir > -- > Monitoring - Log Management - Alerting - Anomaly Detection > Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > On 7 Feb 2020, at 19:27, Pratik Patel wrote: > > > > Hello Everyone, > > > > Let's say I have an analyzer which has following token stream as an > output. > > > > *token stream : [], a, ab, [], c, [], d, de, def .* > > > > Now let's say I want to add another filter which will drop a certain > tokens > > based on whether adjacent token on the right side is [] or some string. > > > > for a given token, > > drop/replace it by empty string it if there is a non-empty string > > token on its right and > > keep it if there is an empty token string on its right. > > > > based on this, the resulting token stream would be like this. > > > > *desired output stream : [], [a], ab, [], c, [], d, > > de, def * > > > > > > *Is there any Filter available in solr with which this can be achieved?* > > *If writing a custom filter is the only possible option then I want to > know > > whether its possible to access adjacent tokens in the custom filter?* > > > > *Any idea about this would be really helpful.* > > > > Thanks, > > Pratik > >
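A minimal sketch of such a look-ahead filter, using Lucene's TokenFilter API with a one-token buffer (the class name and the rule in the middle are just illustrations of where the custom logic would go):

    import java.io.IOException;

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.AttributeSource;

    public final class LookAheadTokenFilter extends TokenFilter {

      private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

      private AttributeSource.State bufferedState; // token read from the input but not yet emitted
      private String bufferedTerm;                 // its term text, kept for the look-ahead decision
      private boolean inputExhausted = false;

      public LookAheadTokenFilter(TokenStream input) {
        super(input);
      }

      @Override
      public boolean incrementToken() throws IOException {
        if (bufferedState == null && !inputExhausted) {
          readAhead();                             // prime the one-token buffer on the first call
        }
        if (bufferedState == null) {
          return false;                            // nothing left to emit
        }

        AttributeSource.State current = bufferedState;
        readAhead();                               // peek at the right-hand neighbour
        String rightNeighbour = bufferedTerm;      // null when the current token is the last one

        restoreState(current);                     // emit the current token's attributes
        if (rightNeighbour != null && !rightNeighbour.isEmpty()) {
          termAtt.setEmpty();                      // example rule: blank the term when a non-empty token follows
        }
        return true;
      }

      private void readAhead() throws IOException {
        if (!inputExhausted && input.incrementToken()) {
          bufferedState = captureState();
          bufferedTerm = termAtt.toString();
        } else {
          inputExhausted = true;
          bufferedState = null;
          bufferedTerm = null;
        }
      }

      @Override
      public void reset() throws IOException {
        super.reset();
        bufferedState = null;
        bufferedTerm = null;
        inputExhausted = false;
      }
    }

This variant rewrites the current token in place; if you want to drop it entirely instead, skip emitting that token and loop on to the next buffered one.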
NPE Issue with atomic update to nested document or child document through SolrJ
Hello Everyone, I am trying to update a field of a child document using atomic updates feature. I am using solr and solrJ version 8.5.0 I have ensured that my schema satisfies the conditions for atomic updates and I am able to do atomic updates on normal documents but with nested child documents, I am getting a Null Pointer Exception. Following is the simple test which I am trying. TestPojo pojo1 = new TestPojo().cId( "abcd" ) > .conceptid( "c1" ) > .storeid( storeId ) > .testChildPojos( > Collections.list( testChildPOJO, testChildPOJO2, > testChildPOJO3 ) > ); > TestChildPOJOtestChildPOJO = new TestChildPOJO().cId( > "c1_child1" ) > .conceptid( "c1" ) > .storeid( storeId ) > .fieldName( > "c1_child1_field_value1" ) > .startTime( > Date.from( now.minus( 10, ChronoUnit.DAYS ) ) ) > .integerField_iDF( > 10 ) > > .booleanField_bDF(true); > // index pojo1 with child testChildPOJO > SolrInputDocument sdoc = new SolrInputDocument(); > sdoc.addField( "_route_", pojo1.cId() ); > sdoc.addField( "id", testChildPOJO.cId() ); > sdoc.addField( "conceptid", testChildPOJO.conceptid() ); > sdoc.addField( "storeid", testChildPOJO.cId() ); > sdoc.setField( "fieldName", java.util.Collections.singletonMap("set", > Collections.list(testChildPOJO.fieldName() + postfix) ) ); // modify field > "fieldName" > collection.client.add( sdoc ); // results in NPE! Stack Trace: ERROR org.apache.solr.client.solrj.impl.BaseCloudSolrClient - Request to > collection [collectionTest2] failed due to (500) > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at > http://172.15.1.100:8081/solr/collectionTest2_shard1_replica_n1: > java.lang.NullPointerException > at > org.apache.solr.update.processor.AtomicUpdateDocumentMerger.getFieldFromHierarchy(AtomicUpdateDocumentMerger.java:308) > at > org.apache.solr.update.processor.AtomicUpdateDocumentMerger.mergeChildDoc(AtomicUpdateDocumentMerger.java:405) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProcessor.java:711) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:374) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:339) > at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:339) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:225) > at > org.apache.solr.update.processor.DistributedZkUpdateProcessor.processAdd(DistributedZkUpdateProcessor.java:245) > at > org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103) > at > org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:110) > at > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:332) > at > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readIterator(JavaBinUpdateRequestCodec.java:281) > at > org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:338) > at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283) > at > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readNamedList(JavaBinUpdateRequestCodec.java:236) > at > 
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:303) > at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283) > at > org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:196) > at > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:127) > at > org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:122) > at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:70) > at > org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596) > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802) > at org.apache.solr.s
Re: NPE Issue with atomic update to nested document or child document through SolrJ
Looking at some other unit tests in repo, I tried an approach using UpdateRequest as follows. SolrInputDocument sdoc = new SolrInputDocument( ); > sdoc.addField( "id", testChildPOJO.id() ); > sdoc.setField( "fieldName", > java.util.Collections.singletonMap("set", testChildPOJO.fieldName() + > postfix) ); > final UpdateRequest req = new UpdateRequest(); > req.withRoute( pojo1.id() ); > req.add(sdoc); > > collection.client.request( req, collection.getCollectionName() > ); > req.commit( collection.client, collection.getCollectionName()); But this also results in the SAME Null Pointer Exception. Looking at the source code, it looks like "fieldPath" is null below. > AtomicUpdateDocumentMerger.getFieldFromHierarchy(SolrInputDocument > completeHierarchy, String fieldPath) { > final List docPaths = > StrUtils.splitSmart(fieldPath.substring(1), '/'); > .. >} Any idea what's wrong here? Thanks On Wed, Sep 16, 2020 at 1:27 PM Pratik Patel wrote: > Hello Everyone, > > I am trying to update a field of a child document using atomic updates > feature. I am using solr and solrJ version 8.5.0 > > I have ensured that my schema satisfies the conditions for atomic updates > and I am able to do atomic updates on normal documents but with nested > child documents, I am getting a Null Pointer Exception. Following is the > simple test which I am trying. > > TestPojo pojo1 = new TestPojo().cId( "abcd" ) >> .conceptid( "c1" ) >> .storeid( storeId ) >> .testChildPojos( >> Collections.list( testChildPOJO, testChildPOJO2, >> testChildPOJO3 ) >> ); >> TestChildPOJOtestChildPOJO = new TestChildPOJO().cId( >> "c1_child1" ) >> .conceptid( "c1" ) >> .storeid( storeId ) >> .fieldName( >> "c1_child1_field_value1" ) >> .startTime( >> Date.from( now.minus( 10, ChronoUnit.DAYS ) ) ) >> .integerField_iDF( >> 10 ) >> >> .booleanField_bDF(true); >> // index pojo1 with child testChildPOJO >> SolrInputDocument sdoc = new SolrInputDocument(); >> sdoc.addField( "_route_", pojo1.cId() ); >> sdoc.addField( "id", testChildPOJO.cId() ); >> sdoc.addField( "conceptid", testChildPOJO.conceptid() ); >> sdoc.addField( "storeid", testChildPOJO.cId() ); >> sdoc.setField( "fieldName", java.util.Collections.singletonMap("set", >> Collections.list(testChildPOJO.fieldName() + postfix) ) ); // modify field >> "fieldName" >> collection.client.add( sdoc ); // results in NPE! 
> > > Stack Trace: > > ERROR org.apache.solr.client.solrj.impl.BaseCloudSolrClient - Request to >> collection [collectionTest2] failed due to (500) >> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error >> from server at >> http://172.15.1.100:8081/solr/collectionTest2_shard1_replica_n1: >> java.lang.NullPointerException >> at >> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.getFieldFromHierarchy(AtomicUpdateDocumentMerger.java:308) >> at >> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.mergeChildDoc(AtomicUpdateDocumentMerger.java:405) >> at >> org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProcessor.java:711) >> at >> org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:374) >> at >> org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:339) >> at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50) >> at >> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:339) >> at >> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:225) >> at >> org.apache.solr.update.processor.DistributedZkUp
Re: NPE Issue with atomic update to nested document or child document through SolrJ
Following are the approaches I have tried so far and both results in NPE. *approach 1 TestChildPOJO testChildPOJO = new TestChildPOJO().cId( "c1_child1" ) .conceptid( "c1" ) .storeid( storeId ) .fieldName( "c1_child1_field_value1" ) .startTime( Date.from( now.minus( 10, ChronoUnit.DAYS ) ) ) .integerField_iDF( 10 ) .booleanField_bDF(true); TestPojo pojo1 = new TestPojo().cId( "abcd" ) .conceptid( "c1" ) .storeid( storeId ) .testChildPojos( Collections.list( testChildPOJO, testChildPOJO2, testChildPOJO3 ) ); // index pojo1 with child testChildPOJO SolrInputDocument sdoc = new SolrInputDocument(); sdoc.addField( "_route_", pojo1.cId() ); sdoc.addField( "id", testChildPOJO.cId() ); sdoc.addField( "conceptid", testChildPOJO.conceptid() ); sdoc.addField( "storeid", testChildPOJO.cId() ); sdoc.setField( "fieldName", java.util.Collections.singletonMap("set", Collections.list(testChildPOJO.fieldName() + postfix) ) ); // modify field "fieldName" collection.client.add( sdoc ); // results in NPE! *approach 1 *approach 2 SolrInputDocument sdoc = new SolrInputDocument( ); sdoc.addField( "id", testChildPOJO.id() ); sdoc.setField( "fieldName", java.util.Collections.singletonMap("set", testChildPOJO.fieldName() + postfix) ); final UpdateRequest req = new UpdateRequest(); req.withRoute( pojo1.id() ); req.add(sdoc); collection.client.request( req, collection.getCollectionName() ); req.commit( collection.client, collection.getCollectionName()); *approach 2 -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: NPE Issue with atomic update to nested document or child document through SolrJ
Thanks for your reply Alexandre. I have "_root_" and "_nest_path_" fields in my schema but not "_nest_parent_". I ran my test after adding the "_nest_parent_" field and I am not getting NPE any more which is good. Thanks! But looking at the documents in the index, I see that after the atomic update, now there are two children documents with the same id. One document has old values and another one has new values. Shouldn't they be merged based on the "id"? Do we need to specify anything else in the request to ensure that documents are merged/updated and not duplicated? For your reference, below is the test I am running now. // update field of one child doc SolrInputDocument sdoc = new SolrInputDocument( ); sdoc.addField( "id", testChildPOJO.id() ); sdoc.addField( "conceptid", testChildPOJO.conceptid() ); sdoc.addField( "storeid", "foo" ); sdoc.setField( "fieldName", java.util.Collections.singletonMap("set", Collections.list("bar" ) )); final UpdateRequest req = new UpdateRequest(); req.withRoute( pojo1.id() );// parent id req.add(sdoc); collection.client.request( req, collection.getCollectionName() ); collection.client.commit(); Resulting documents : {id=c1_child1, conceptid=c1, storeid=s1, fieldName=c1_child1_field_value1, startTime=Mon Sep 07 12:40:37 EDT 2020, integerField_iDF=10, booleanField_bDF=true, _root_=abcd, _version_=1678099970090074112} {id=c1_child1, conceptid=c1, storeid=foo, fieldName=bar, startTime=Mon Sep 07 12:40:37 EDT 2020, integerField_iDF=10, booleanField_bDF=true, _root_=abcd, _version_=1678099970405695488} On Thu, Sep 17, 2020 at 12:01 PM Alexandre Rafalovitch wrote: > Can you double-check your schema to see if you have all the fields > required to support nested documents. You are supposed to get away > with just _root_, but really you should also include _nest_path and > _nest_parent_. Your particular exception seems to be triggering > something (maybe a bug) related to - possibly - missing _nest_path_ > field. > > See: > https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents > > Regards, >Alex. > > On Wed, 16 Sep 2020 at 13:28, Pratik Patel wrote: > > > > Hello Everyone, > > > > I am trying to update a field of a child document using atomic updates > > feature. I am using solr and solrJ version 8.5.0 > > > > I have ensured that my schema satisfies the conditions for atomic updates > > and I am able to do atomic updates on normal documents but with nested > > child documents, I am getting a Null Pointer Exception. Following is the > > simple test which I am trying. 
> > > > TestPojo pojo1 = new TestPojo().cId( "abcd" ) > > > .conceptid( "c1" ) > > > .storeid( storeId ) > > > .testChildPojos( > > > Collections.list( testChildPOJO, testChildPOJO2, > > > > testChildPOJO3 ) > > > ); > > > TestChildPOJOtestChildPOJO = new TestChildPOJO().cId( > > > "c1_child1" ) > > > .conceptid( "c1" > ) > > > .storeid( > storeId ) > > > .fieldName( > > > "c1_child1_field_value1" ) > > > .startTime( > > > Date.from( now.minus( 10, ChronoUnit.DAYS ) ) ) > > > > .integerField_iDF( > > > 10 ) > > > > > > .booleanField_bDF(true); > > > // index pojo1 with child testChildPOJO > > > SolrInputDocument sdoc = new SolrInputDocument(); > > > sdoc.addField( "_route_", pojo1.cId() ); > > > sdoc.addField( "id", testChildPOJO.cId() ); > > > sdoc.addField( "conceptid", testChildPOJO.conceptid() ); > > > sdoc.addField( "storeid", testChildPOJO.cId() ); > > > sdoc.setField( "fieldName", java.util.Collections.singletonMap("set", > > > Collections.list(testChildPOJO.fieldName() + postfix) ) ); // modify > field > > > "fieldName" > > > collection.client.add( sdoc ); // results in NPE! > > > > > > Stack Trace: > > > > ERROR org.apache.solr.client.solrj.impl.Base
Re: NPE Issue with atomic update to nested document or child document through SolrJ
I am running this in a unit test which deletes the collection after the test is over. So every new test run gets a fresh collection. It is a very simple test where I am first indexing a couple of parent documents with few children and then testing an atomic update on one parent as I have posted in my previous message. (using UpdateRequest) I am not sure if I am triggering the atomic update correctly, do you see any potential issue in that code? I noticed something in the documentation here. https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents field_type is declared with name *"_nest_path_"* whereas field is declared with type *"nest_path". * Is this intentional? or should it be as follows? Also, should we explicitly set index=true and store=true on _nest_path_ and _nest_parent_ fields? On Thu, Sep 17, 2020 at 1:17 PM Alexandre Rafalovitch wrote: > Did you reindex the original document after you added a new field? If > not, then the previously indexed content is missing it and your code > paths will get out of sync. > > Regards, >Alex. > P.s. I haven't done what you are doing before, so there may be > something I am missing myself. > > > On Thu, 17 Sep 2020 at 12:46, Pratik Patel wrote: > > > > Thanks for your reply Alexandre. > > > > I have "_root_" and "_nest_path_" fields in my schema but not > > "_nest_parent_". > > > > > > > > > > > docValues="false" /> > > > > > name="_nest_path_" class="solr.NestPathField" /> > > > > I ran my test after adding the "_nest_parent_" field and I am not getting > > NPE any more which is good. Thanks! > > > > But looking at the documents in the index, I see that after the atomic > > update, now there are two children documents with the same id. One > document > > has old values and another one has new values. Shouldn't they be merged > > based on the "id"? Do we need to specify anything else in the request to > > ensure that documents are merged/updated and not duplicated? > > > > For your reference, below is the test I am running now. > > > > // update field of one child doc > > SolrInputDocument sdoc = new SolrInputDocument( ); > > sdoc.addField( "id", testChildPOJO.id() ); > > sdoc.addField( "conceptid", testChildPOJO.conceptid() ); > > sdoc.addField( "storeid", "foo" ); > > sdoc.setField( "fieldName", > > java.util.Collections.singletonMap("set", Collections.list("bar" ) )); > > > > final UpdateRequest req = new UpdateRequest(); > > req.withRoute( pojo1.id() );// parent id > > req.add(sdoc); > > > > collection.client.request( req, > collection.getCollectionName() > > ); > > collection.client.commit(); > > > > > > Resulting documents : > > > > {id=c1_child1, conceptid=c1, storeid=s1, > fieldName=c1_child1_field_value1, > > startTime=Mon Sep 07 12:40:37 EDT 2020, integerField_iDF=10, > > booleanField_bDF=true, _root_=abcd, _version_=1678099970090074112} > > {id=c1_child1, conceptid=c1, storeid=foo, fieldName=bar, startTime=Mon > Sep > > 07 12:40:37 EDT 2020, integerField_iDF=10, booleanField_bDF=true, > > _root_=abcd, _version_=1678099970405695488} > > > > > > > > > > > > > > On Thu, Sep 17, 2020 at 12:01 PM Alexandre Rafalovitch < > arafa...@gmail.com> > > wrote: > > > > > Can you double-check your schema to see if you have all the fields > > > required to support nested documents. You are supposed to get away > > > with just _root_, but really you should also include _nest_path and > > > _nest_parent_. 
Your particular exception seems to be triggering > > > something (maybe a bug) related to - possibly - missing _nest_path_ > > > field. > > > > > > See: > > > > https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents > > > > > > Regards, > > >Alex. > > > > > > On Wed, 16 Sep 2020 at 13:28, Pratik Patel > wrote: > > > > > > > > Hello Everyone, > > > > > > > > I am trying to update a field of a child document using atomic > updates > > > > feature. I am using solr and solrJ version 8.5.0 > > > > >
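For reference, a schema that supports these nested-document features typically declares the bookkeeping fields along the following lines (a sketch of the documented defaults, not this particular schema):

    <fieldType name="_nest_path_" class="solr.NestPathField" />

    <field name="_root_"        type="string"       indexed="true" stored="false" docValues="false" />
    <field name="_nest_path_"   type="_nest_path_" />
    <field name="_nest_parent_" type="string"       indexed="true" stored="true" />

In other words, the field's type attribute points at the fieldType's name, so both read "_nest_path_", which is presumably what the reference-guide example quoted above intends.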
SolrJ : Inserting Bean object containing different types of Child documents
Hello Everyone, I have a Bean object which can have child documents of classes Child_type1 and Child_type2. When I try to index this document, I get an error message "Doc cannot have more than one Field with child=true". I looked at the mailing list but couldn't find any solution for this. Any suggestions on how such documents should be indexed? I am using SolrJ version 7.7.1 and Solr 7.4.0 Thanks! Pratik
Pagination with streaming expressions
Hello Everyone, Is there a way to paginate the results of a Streaming Expression? Let's say I have a simple gatherNodes function with a count operation at the end of it. I can sort by the count fine, but now I would like to be able to select a specific subset of the results based on pagination parameters. Is there any way to do that? Thanks! Pratik
Writing unit tests to test complex solr queries
Hello Everyone, I want to write unit tests for some Solr queries which are being triggered through Java code. These queries include complex streaming expressions and faceting queries which require a large number of documents to be present in the Solr index. I cannot create and push that many documents programmatically through my tests. I am trying to find a way to test these queries without depending on an externally running Solr instance. I found the following approach, which uses classes like EmbeddedSolrServer and CoreContainer; we can put index files and Solr configuration on the classpath and run the tests against them. https://dzone.com/articles/junit-testing-for-solr-6 However, this seems to be an old approach and I am trying to find a way to do it using the latest solr-test-framework. I also cannot use the old approach because I want to test Streaming Expressions as well and I need SolrCloudClient for that. In solr-test-framework, I found the MiniSolrCloudCluster class but I don't know how to use pre-created index files and configuration with it. Does anyone know how we can use pre-created index files and configuration with the latest test framework? What is the recommended way to do this kind of testing? Any direction with this would be really helpful. Thanks! Pratik
Re: Writing unit tests to test complex solr queries
Thanks a lot for the response Mikhail and Angie! I did go through most of the test classes in solr before posting here but couldn't find anything which is close to what I want to do which is to load pre-created index files and configuration or at least index files. However, the class HelloWorldSolrCloudTestCase.java class pointed out by Angie put together with his code that he has shared seems to be completing the picture and looks spot on! Thanks a lot. I will try to re-write my unit tests with this approach and will post an update soon. @Angie, can you please share the format of data in your "testdata/test-data.json" file? I want to be sure about using the correct format. Thanks! Pratik On Tue, May 14, 2019 at 1:14 PM Angie Rabelero wrote: > Hi, I’ll advised you to extend the class SolrCloudTestCase, which extends > the MiniSolrCloudCluster. Theres a hello world example in the solr source > at > https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/HelloWorldSolrCloudTestCase.java > . > > Here’s how I setup a cluster, create a collection with my ConfigSet, and > index a file. > > @BeforeClass > public static void setupCluster() throws Exception { > > // Create and configure cluster > configureCluster(nodeCount) > .addConfig(CONFIG_NAME, getFile(CONFIG_DIR).toPath()) > .configure(); > > // Create an empty collection > Create.createCollection(COLLECTION, CONFIG_NAME, numShards, > numReplicas) > .setMaxShardsPerNode(maxShardsPerNode) > .process(cluster.getSolrClient(), COLLECTION); > AbstractDistribZkTestBase > .waitForRecoveriesToFinish(COLLECTION, > cluster.getSolrClient().getZkStateReader(), true, true, 120); > > // Set default collection > cluster.getSolrClient().setDefaultCollection(COLLECTION); > > // Add documents to collection > ContentStreamUpdateRequest up = new > ContentStreamUpdateRequest("/update"); > up.addFile(getFile("testdata/test-data.json"), "application/json"); > up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); > NamedList result = cluster.getSolrClient().request(up); > > // Print cluster status > System.out.println("Default Collection: " + > cluster.getSolrClient().getDefaultCollection()); > System.out.println("Cluster State: " + > cluster.getSolrClient().getZkStateReader().getClusterState()); > System.out.println("Update Result: " + result); > > } > > I copy the configset to the resources dir in the pom using a mauven > plugin. And the test file is already in the resources dir. > > > > > > On May 14, 2019, at 04:01, Mikhail Khludnev wrote: > > > > Hello, Pratick. > > Welcome to mysterious world of Solr testing. The best way is to find > > existing test closest to your problem field, copy in and amend > necessarily. > > What about > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_lucene-2Dsolr_blob_master_solr_solrj_src_test_org_apache_solr_client_solrj_io_stream_StreamExpressionTest.java&d=DwIBaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=lUsTzFRk0CX38HvagQ0wd52D67dA0fx_D6M6F3LHzAU&m=9tFliF4KA1tiG2lGmDJWO34hyq9-Sz1inAxRPVKkz78&s=KjveDzxzQAKRmvzPYk2y1FQ-w6yAGWuwfTVGHMQP2ZA&e= > > ? > > > > On Fri, May 10, 2019 at 11:36 PM Pratik Patel > wrote: > > > >> Hello Everyone, > >> > >> I want to write unit tests for some solr queries which are being > triggered > >> through java code. These queries includes complex streaming expressions > and > >> faceting queries which requires large number of documents to be present > in > >> solr index. 
> >> I can not create and push so many documents programmatically
> >> through my tests.
> >>
> >> I am trying to find a way to test these queries without depending on
> >> externally running solr instance. I found following approach which is using
> >> classes like EmbeddedSolrServer and CoreContainer. We can put index files
> >> and solr configuration on classpath and run the tests against them.
> >>
> >> https://dzone.com/articles/junit-testing-for-solr-6
> >>
> >> However, this seems to be an old approach and
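To connect this back to the streaming-expression goal, here is a rough sketch of how such an expression might be asserted against the cluster configured in the setup above (the collection constant, the field names, and the expression itself are placeholders, not from the original thread):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.io.Tuple;
    import org.apache.solr.client.solrj.io.stream.SolrStream;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.junit.Test;

    // Inside the SolrCloudTestCase subclass configured in setupCluster() above.
    @Test
    public void testStreamingExpression() throws Exception {
        // Point the stream at one of the cluster's Jetty nodes and the test collection.
        String baseUrl = cluster.getJettySolrRunners().get(0).getBaseUrl().toString() + "/" + COLLECTION;

        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("expr", "search(" + COLLECTION + ", q=\"*:*\", fl=\"id\", sort=\"id asc\", rows=\"10\")");
        params.set("qt", "/stream");

        SolrStream stream = new SolrStream(baseUrl, params);
        List<Tuple> tuples = new ArrayList<>();
        try {
            stream.open();
            for (Tuple t = stream.read(); !t.EOF; t = stream.read()) {
                tuples.add(t);
            }
        } finally {
            stream.close();
        }
        // assertFalse comes from the test base class.
        assertFalse("expected at least one tuple from the test data", tuples.isEmpty());
    }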
Solr test framework not able to upload configuration to zk and fails with KeeperException
okeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x1003ec815f30007 type:create cxid:0x16 zxid:0x48 txntype:-1 reqpath:n/a Error Path:/solr/configs Error:KeeperErrorCode = NodeExists for /solr/configs
2019-06-04T15:07:01,163 [ProcessThread(sid:0 cport:50192):] INFO org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x1003ec815f30007 type:create cxid:0x17 zxid:0x49 txntype:-1 reqpath:n/a Error Path:/solr/configs/collection2 Error:KeeperErrorCode = NodeExists for /solr/configs/collection2
2019-06-04T15:07:01,163 [ProcessThread(sid:0 cport:50192):] INFO org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x1003ec815f30007 type:create cxid:0x18 zxid:0x4a txntype:-1 reqpath:n/a Error Path:/solr/configs/collection2/conf Error:KeeperErrorCode = NodeExists for /solr/configs/collection2/conf
2019-06-04T15:07:01,165 [ProcessThread(sid:0 cport:50192):] INFO org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x1003ec815f30007 type:create cxid:0x1b zxid:0x4d txntype:-1 reqpath:n/a Error Path:/solr/configs Error:KeeperErrorCode = NodeExists for /solr/configs
2019-06-04T15:07:01,166 [ProcessThread(sid:0 cport:50192):] INFO org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x1003ec815f30007 type:create cxid:0x1c zxid:0x4e txntype:-1 reqpath:n/a Error Path:/solr/configs/collection2 Error:KeeperErrorCode = NodeExists for /solr/configs/collection2
**
I have searched through the mailing list and related areas. Also, I have tried various ways of creating MiniSolrCloudCluster, but I get the same exception. I have made sure that a new directory is always used as BASE_DIR for MiniSolrCloudCluster. Can anyone please throw some light on what's wrong here? Am I hitting a solr test framework issue? I am using solr test framework version 7.7.1. Thanks a lot, Pratik
Loading pre created index files into MiniSolrCloudCluster of test framework
Hello Everyone, I am trying to write some unit tests for solr queries which require some data to be in a specific state. There is a way to load this data through JSON files, but the problem is that the required data needs to have parent-child blocks present. Because of this, I would prefer it if there were a way to load pre-created index files into the cluster. I checked the solr test framework and related examples but couldn't find any example of index files being loaded in cloud mode. Is there a way to load index files into Solr running in cloud mode? Thanks! Pratik
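For illustration, a minimal SolrJ sketch of what indexing one such parent-child block programmatically looks like (zkHost, collection and field names are placeholders); the difficulty described above is that producing enough of this data by hand for realistic queries is not practical:

    import java.util.Collections;
    import java.util.Optional;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ParentChildIndexSketch {
        public static void main(String[] args) throws Exception {
            // zkHost and collection name are placeholders.
            try (SolrClient client = new CloudSolrClient.Builder(
                    Collections.singletonList("localhost:9983"), Optional.empty()).build()) {

                SolrInputDocument parent = new SolrInputDocument();
                parent.addField("id", "parent-1");
                parent.addField("type_s", "parent");

                SolrInputDocument child = new SolrInputDocument();
                child.addField("id", "child-1");
                child.addField("type_s", "child");

                // addChildDocument creates the block-join structure at index time.
                parent.addChildDocument(child);

                client.add("collection1", parent);
                client.commit("collection1");
            }
        }
    }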
Re: Loading pre created index files into MiniSolrCloudCluster of test framework
Thanks for the reply Alexandre. The only special thing about JSON/XML is that in order to export the data in that form, I need to have "docValues" enabled for all the fields which are to be retrieved. I need to retrieve all the fields and I cannot enable docValues on all of them. If there were a way to export data in JSON format without having to change the schema and the index, then I would have no issues with JSON. I cannot use the "select" handler as it does not include parent/child relationships.

The options I have are the following, I guess. I am not sure if they are all real possibilities though.

1. Find a way to load pre-created index files either through CloudSolrClient or directly to ZK.
2. Find a way to export the data in JSON format without having to make all fields docValues enabled.
3. Use the Merge Index tool with an empty index and a real index. I don't know if it is possible to do this through SolrJ though (a rough sketch of what this might look like appears after the quoted thread below).

Please let me know if there is a better way available, it would really help. Just so you know, I am trying to do this for unit tests related to solr queries. Ultimately I want to load some pre-created data into MiniSolrCloudCluster.

Thanks a lot,
Pratik

On Wed, Jun 5, 2019 at 6:56 PM Alexandre Rafalovitch wrote: > Is there something special about parent/child blocks you cannot do through > JSON? Or XML? > > Both Solr XML and Solr JSON support it. > > New style parent/child mapping is also supported in latest Solr but I think > it is done differently. > > Regards, > Alex > > On Wed, Jun 5, 2019, 6:29 PM Pratik Patel, wrote: > > > Hello Everyone, > > > > I am trying to write some unit tests for solr queries which requires some > > data in specific state. There is a way to load this data through json > files > > but the problem is that the required data needs to have parent-child > blocks > > to be present. > > Because of this, I would prefer if there is a way to load pre-created > index > > files into the cluster. > > I checked the solr test framework and related examples but couldn't find > > any example of index files being loaded in cloud mode. > > > > Is there a way to load index files into solr running in cloud mode? > > > > Thanks! > > Pratik > > >
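Regarding option 3 above, a rough, untested sketch of what a merge might look like through SolrJ's CoreAdmin API, assuming the SolrJ version in use exposes CoreAdminRequest.mergeIndexes with this shape (base URL, core name and paths are placeholders):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;
    import org.apache.solr.client.solrj.response.CoreAdminResponse;

    public class MergeIndexSketch {
        public static void main(String[] args) throws Exception {
            // In SolrCloud the request has to target the concrete core backing a shard replica;
            // the core name and index directory here are placeholders.
            try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
                CoreAdminResponse rsp = CoreAdminRequest.mergeIndexes(
                        "collection1_shard1_replica_n1",                        // target core
                        new String[] {"/path/to/precreated/index/data/index"},  // source index dirs
                        new String[0],                                          // or source core names
                        client);
                System.out.println("merge status: " + rsp.getStatus());
            }
        }
    }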
Re: Solr test framework not able to upload configuration to zk and fails with KeeperException
Thanks guys, I found that the issue I had was because of some binary files (NLP models) in my configuration. Once I fixed that, I was able to set up a cluster. These exceptions are still logged but they are logged as INFO and were not the real issue. Thanks Again Pratik On Tue, Jun 4, 2019 at 4:15 PM Angie Rabelero wrote: > For what I know the configuration files need to be already in the > test/resource directory before runnin. I copy them to the directory using a > maven maven-antrun-plugin in the generate-test-sources phase. And the > framework can "create a collection” without the configfiles, but it will > obviously fail when try to use it. > > > On the surface, this znode already exists: > > /solr/configs/collection2 > > So it looks like somehow you're > > > On Jun 4, 2019, at 12:29 PM, Pratik Patel pra...@semandex.net>> wrote: > > > > /solr/configs/collection2 > > > On Jun 4, 2019, at 14:29, Pratik Patel wrote: > > > > Hello Everyone, > > > > I am trying to run a simple unit test using solr test framework. At this > > point, all I am trying to achieve is to be able to upload some > > configuration and create a collection using solr test framework. > > > > Following is the simple code which I am trying to run. > > > > private static final String COLLECTION = "collection2" ; > > > > private static final int numShards = 1; > > private static final int numReplicas = 1; > > private static final int maxShardsPerNode = 1; > > private static final int nodeCount = (numShards*numReplicas + > > (maxShardsPerNode-1))/maxShardsPerNode; > > > > private static final String id = "id"; > > private static final String CONFIG_DIR = > > "src/test/resources/testdata/solr/collection2"; > > > > @BeforeClass > > public static void setupCluster() throws Exception { > > > >// create and configure cluster > >configureCluster(nodeCount) > >.addConfig("collection2", getFile(CONFIG_DIR).toPath()) > >.configure(); > > > >// create an empty collection > >CollectionAdminRequest.createCollection(COLLECTION, "collection2", > > numShards, numReplicas) > >.setMaxShardsPerNode(maxShardsPerNode) > >.process(cluster.getSolrClient()); > > > >// add further document(s) here > >// TODO > > } > > > > > > However, I see that solr fails to upload the configuration to zk. > > Following method of ZooKeeper class fails with the "KeeperException" > > > > public String create(final String path, byte data[], List acl, > >CreateMode createMode) > >throws KeeperException, InterruptedException > > { > >final String clientPath = path; > >PathUtils.validatePath(clientPath, createMode.isSequential()); > > > >final String serverPath = prependChroot(clientPath); > > > >RequestHeader h = new RequestHeader(); > >h.setType(ZooDefs.OpCode.create); > >CreateRequest request = new CreateRequest(); > >CreateResponse response = new CreateResponse(); > >request.setData(data); > >request.setFlags(createMode.toFlag()); > >request.setPath(serverPath); > >if (acl != null && acl.size() == 0) { > >throw new KeeperException.InvalidACLException(); > >} > >request.setAcl(acl); > >ReplyHeader r = cnxn.submitRequest(h, request, response, null); > >if (r.getErr() != 0) { > >throw KeeperException.create(KeeperException.Code.get(r.getErr()), > >clientPath); > >} > >if (cnxn.chrootPath == null) { > >return response.getPath(); > >} else { > >return response.getPath().substring(cnxn.chrootPath.length()); > >} > > } > > > > > > And following are the Keeper exceptions thrown for each file of the > > configuration. 
> > > > Basically, it says > > Got user-level KeeperException when processing sessionid: Error > > Path:/solr/configs Error:KeeperErrorCode = NodeExists for /solr/configs > > > > > ** > > 2019-06-04T15:07:01,157 [ProcessThread(sid:0 cport:50192):] INFO > > org.apache.zookeeper.server.PrepRequestProcessor - Got user-level > > KeeperException when processing sessionid:0x1003ec815f30007 type:create > > cxid:0xe zxid:0x40 txntype:-1 reqpath:n/a Error Path:/solr/configs > > Error:KeeperE
Re: Streaming expression function which can give parent document along with its child documents ?
If your child documents have a link to their parent documents (like a parent id or something), then you can use graph traversal to do this. On Mon, Jun 10, 2019 at 8:01 AM Jai Jamba wrote: > Can anyone help me in this ? > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
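For example, a hedged sketch of such a traversal with the nodes() expression, starting from a parent document and walking to children through a hypothetical parentId_s field (collection and field names are invented for illustration; scatter="branches, leaves" emits both the root and the gathered nodes, i.e. the parent together with its children):

    nodes(collection1,
          search(collection1, q="id:parent-1", fl="id", sort="id asc"),
          walk="id->parentId_s",
          gather="id",
          scatter="branches, leaves")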
Re: Loading pre created index files into MiniSolrCloudCluster of test framework
So, I found a way to programmatically restore a collection from a backup. I thought that I could create a backup of a collection, put it on the classpath, restore it during unit test setup, and run the queries against the newly restored collection. Theoretically, it sounded like it would work. I have the following code doing the restore (a sketch of the corresponding backup step appears after the quoted thread at the end of this message).

CollectionAdminRequest.Restore restore =
    CollectionAdminRequest.restoreCollection( newCollectionName, backupName )
        .setLocation( pathToBackup );
CollectionAdminResponse resp = restore.process( cluster.getSolrClient() );

AbstractDistribZkTestBase.waitForRecoveriesToFinish( newCollectionName,
    cluster.getSolrClient().getZkStateReader(), true, true, 30);

However, any query I run against this new collection returns zero documents. I have tried queries which should match many documents, but they all return zero documents. It seems like the data is not really loaded during the restore operation. I stepped through the "doRestore()" method of the class RestoreCore.java, which is internally doing the restore, and I see that it has no errors or exceptions and the restore operation status is successful, but in reality there is no data in the new collection. The new collection is created but it seems to be without any data.

Am I missing something here? Any idea what could be the cause of this?

Thanks!
Pratik

On Thu, Jun 6, 2019 at 11:18 AM Pratik Patel wrote: > Thanks for the reply Alexandre, only special thing about JSON/XML is that > in order to export the data in that form, I need to have "docValues" > enabled for all the fields which are to be retrieved. I need to retrieve > all the fields and I can not enable docValues on all fields. > If there was a way to export data in JSON format without having to change > schema and index then I would have no issues with JSON. > I can not use "select" handler as it does not include parent/child > relationships. > > The options I have are following I guess. I am not sure if they are real > possibilities though. > > 1. Find a way to load pre-created index files either through > SolrCloudClient or directly to ZK > 2. Find a way to export the data in JSON format without having to make all > fields docValues enabled. > 3. Use Merge Index tool with an empty index and a real index. I am don't > know if it is possible to do this through solrJ though. > > Please let me know if there is better way available, it would really help. > Just so you know, I am trying to do this for unit tests related to solr > queries. Ultimately I want to load some pre-created data into > MiniSolrCloudCluster. > > Thanks a lot, > Pratik > > > On Wed, Jun 5, 2019 at 6:56 PM Alexandre Rafalovitch > wrote: > >> Is there something special about parent/child blocks you cannot do through >> JSON? Or XML? >> >> Both Solr XML and Solr JSON support it. >> >> New style parent/child mapping is also supported in latest Solr but I >> think >> it is done differently. >> >> Regards, >> Alex >> >> On Wed, Jun 5, 2019, 6:29 PM Pratik Patel, wrote: >> >> > Hello Everyone, >> > >> > I am trying to write some unit tests for solr queries which requires >> some >> > data in specific state. There is a way to load this data through json >> files >> > but the problem is that the required data needs to have parent-child >> blocks >> > to be present. >> > Because of this, I would prefer if there is a way to load pre-created >> index >> > files into the cluster. 
>> > I checked the solr test framework and related examples but couldn't find >> > any example of index files being loaded in cloud mode. >> > >> > Is there a way to load index files into solr running in cloud mode? >> > >> > Thanks! >> > Pratik >> > >> >
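For completeness, a rough sketch of the backup step referenced at the top of this message, taken against the source Solr that holds the real data (the URL, collection name, backup name and location are placeholders; the location has to be visible to every node):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    // ...
    try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
        // Commit first so the backup contains the latest documents.
        solr.commit("sourceCollection");

        CollectionAdminRequest.backupCollection("sourceCollection", "test-backup")
            .setLocation("/path/visible/to/all/nodes")
            .process(solr);
    }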
How to increase maximum size of files allowed in configuration for MiniSolrCloudCluster
Hi, I am trying to upload a configuration to "MiniSolrCloudCluster" in my unit test. This configuration has some binary files for NLP-related functionality. Some of these binary files are bigger than 5 MB. If I try to upload the configuration with these files, then it doesn't work. I can set up the cluster fine if I remove all binary files bigger than 5 MB. I have noticed the same issue when I try to restore a backup whose configuration contains files bigger than 5 MB. Does Jetty have some limit on the size of configuration files? Is there a way to override it? Thanks, Pratik
Re: How to increase maximum size of files allowed in configuration for MiniSolrCloudCluster
That was spot on. Thanks a lot for your help! On Tue, Jun 11, 2019 at 2:14 AM Jörn Franke wrote: > It is probably a Zookeeper limit. You have to set jute.maxbuffer in the > Java System properties of all (!) zookeeper Servers and clients to the same > value (in your case it should be a little bit larger than your largest > file). > If possible you can try to avoid storing the NLP / ML models in Solr but > provide them on a share or similar where all Solr nodes have access to. > > > Am 11.06.2019 um 00:32 schrieb Pratik Patel : > > > > Hi, > > > > I am trying to upload a configuration to "MiniSolrCloudCluster" in my > unit > > test. This configuration has some binary files for NLP related > > functionality. Some of these binary files are bigger than 5 MB. If I try > to > > upload configuration with these files then it doesn't work. I can set up > > the cluster fine if I remove all binary files bigger than 5 MB. > > > > I have noticed the same issue when I try to restore a backup having > > configuration files bigger than 5 MB. > > > > Does jetty have some limit on the size of configuration files? Is there a > > way to override this? > > > > Thanks, > > Pratik >
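A minimal sketch of applying that in a test: since MiniSolrCloudCluster runs the ZooKeeper server and the clients in the same JVM, setting the system property before the cluster is configured should cover both (the 10 MB value and the paths are placeholders):

    @BeforeClass
    public static void setupCluster() throws Exception {
        // Must be set before any ZooKeeper server/client is started in this JVM.
        System.setProperty("jute.maxbuffer", Integer.toString(10 * 1024 * 1024));

        configureCluster(1)
            .addConfig("conf", getFile("src/test/resources/testdata/solr/collection2").toPath())
            .configure();
    }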
Bug in scoreNodes function of streaming expressions?
Hello Everyone, I am trying to execute the following streaming expression with the "scoreNodes" function in it. This is taken from the documentation.

scoreNodes(top(n="50",
               sort="count(*) desc",
               nodes(baskets,
                     random(baskets, q="productID:ABC", fl="basketID", rows="500"),
                     walk="basketID->basketID",
                     fq="-productID:ABC",
                     gather="productID",
                     count(*

I have ensured that I have the collection and data present for it. Upon executing this, I am getting an error message as follows.

"No collection param specified on request and no default collection has been set: []"

Upon digging into the source code, I found that there is a possible bug in ScoreNodesStream.java: a StringBuilder instance is never appended to, and the block which initializes the collection requires the length of that instance to be greater than zero. This condition will always be false, and hence the collection will never be set.

I checked this file in solr version 8.1 and that also has the same issue. Is there any JIRA open for this or any patch available?

[image: image.png]

Thanks, Pratik
Re: Bug in scoreNodes function of streaming expressions?
I think the problem was that my streaming expression was always returning just one node. When I added more data so that I can have more than one node, I started seeing the result. On Mon, Jul 1, 2019 at 11:21 AM Pratik Patel wrote: > Hello Everyone, > > I am trying to execute following streaming expression with "scoreNodes" > function in it. This is taken from the documentation. > > scoreNodes(top(n="50", >sort="count(*) desc", >nodes(baskets, > random(baskets, q="productID:ABC", fl="basketID", > rows="500"), > walk="basketID->basketID", > fq="-productID:ABC", > gather="productID", > count(* > > I have ensured that I have the collection and data present for it. > Upon executing this, I am getting an error message as follows. > > "No collection param specified on request and no default collection has > been set: []" > > Upon digging into the source code I found that there is a possible bug in > ScoreNodesStream.java > > StringBuilder instance is never appended any string and the block which > initializes collection, needs the length of that instance to be more than > zero. This condition will always be false and hence the collection will > never be set. > > I checked this file in solr version 8.1 and that also has the same issue. > Is there any JIRA open for this or any patch available? > > [image: image.png] > > Thanks, > Pratik >
Re: Bug in scoreNodes function of streaming expressions?
Great, thanks! On Tue, Jul 2, 2019 at 6:37 AM Joel Bernstein wrote: > Ok, that sounds like a bug. I can create a ticket for this. > > On Mon, Jul 1, 2019 at 5:57 PM Pratik Patel wrote: > > > I think the problem was that my streaming expression was always returning > > just one node. When I added more data so that I can have more than one > > node, I started seeing the result. > > > > On Mon, Jul 1, 2019 at 11:21 AM Pratik Patel > wrote: > > > >> Hello Everyone, > >> > >> I am trying to execute following streaming expression with "scoreNodes" > >> function in it. This is taken from the documentation. > >> > >> scoreNodes(top(n="50", > >>sort="count(*) desc", > >>nodes(baskets, > >> random(baskets, q="productID:ABC", > >> fl="basketID", rows="500"), > >> walk="basketID->basketID", > >> fq="-productID:ABC", > >> gather="productID", > >> count(* > >> > >> I have ensured that I have the collection and data present for it. > >> Upon executing this, I am getting an error message as follows. > >> > >> "No collection param specified on request and no default collection has > >> been set: []" > >> > >> Upon digging into the source code I found that there is a possible bug > in > >> ScoreNodesStream.java > >> > >> StringBuilder instance is never appended any string and the block which > >> initializes collection, needs the length of that instance to be more > than > >> zero. This condition will always be false and hence the collection will > >> never be set. > >> > >> I checked this file in solr version 8.1 and that also has the same > issue. > >> Is there any JIRA open for this or any patch available? > >> > >> [image: image.png] > >> > >> Thanks, > >> Pratik > >> > > >
Re: Bug in scoreNodes function of streaming expressions?
Hi Joel, There also seems to be an issue related to how QueryRequest instance is created in scoreNodes implementation. It seems to be using GET method instead of POST. As a result, when underlying stream is big, scoreNodes function fails with an exception "URI is too large" I found a related is issue mentioned here, http://lucene.472066.n3.nabble.com/Streaming-Expressions-GET-vs-POST-td4415044.html ScoreNodesStream.java initializes QueryRequest as follows. QueryRequest request = new QueryRequest(params); vs TimeSeriesStream.java which does it like this. QueryRequest request = new QueryRequest(paramsLoc, SolrRequest.METHOD.POST); Is this also a bug? On Tue, Jul 2, 2019 at 10:17 AM Pratik Patel wrote: > Great, thanks! > > On Tue, Jul 2, 2019 at 6:37 AM Joel Bernstein wrote: > >> Ok, that sounds like a bug. I can create a ticket for this. >> >> On Mon, Jul 1, 2019 at 5:57 PM Pratik Patel wrote: >> >> > I think the problem was that my streaming expression was always >> returning >> > just one node. When I added more data so that I can have more than one >> > node, I started seeing the result. >> > >> > On Mon, Jul 1, 2019 at 11:21 AM Pratik Patel >> wrote: >> > >> >> Hello Everyone, >> >> >> >> I am trying to execute following streaming expression with "scoreNodes" >> >> function in it. This is taken from the documentation. >> >> >> >> scoreNodes(top(n="50", >> >>sort="count(*) desc", >> >>nodes(baskets, >> >> random(baskets, q="productID:ABC", >> >> fl="basketID", rows="500"), >> >> walk="basketID->basketID", >> >> fq="-productID:ABC", >> >> gather="productID", >> >> count(* >> >> >> >> I have ensured that I have the collection and data present for it. >> >> Upon executing this, I am getting an error message as follows. >> >> >> >> "No collection param specified on request and no default collection has >> >> been set: []" >> >> >> >> Upon digging into the source code I found that there is a possible bug >> in >> >> ScoreNodesStream.java >> >> >> >> StringBuilder instance is never appended any string and the block which >> >> initializes collection, needs the length of that instance to be more >> than >> >> zero. This condition will always be false and hence the collection will >> >> never be set. >> >> >> >> I checked this file in solr version 8.1 and that also has the same >> issue. >> >> Is there any JIRA open for this or any patch available? >> >> >> >> [image: image.png] >> >> >> >> Thanks, >> >> Pratik >> >> >> > >> >
Best way to retrieve parent documents with children using getBeans method?
Hello Everyone, We use SolrJ with POJOs to index documents into Solr. If a POJO has a field annotated with @child, then SolrJ automatically adds those objects as children of the POJO. This works fine and indexing is done properly. However, when I retrieve the same document through the same POJO using the "getBeans" method of the DocumentObjectBinder class, the field annotated with the @child annotation is always null, i.e. the children are not populated in the POJO. What is the best way to get the children in the same POJO along with the other fields? I read about child transformers, but I am not sure whether that is the prescribed and recommended way to get children with the parent. What is the best practice to achieve this? Thanks! Pratik
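In case it is useful, a hedged sketch of the child transformer approach, assuming parents can be identified with a filter such as type_s:parent (collection, field names and the bean class are placeholders; whether getBeans then populates the child-annotated field may depend on the Solr version and schema):

    import java.util.List;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.response.QueryResponse;

    // ...
    // client is an existing SolrClient pointing at the cluster.
    SolrQuery query = new SolrQuery("type_s:parent");
    // The [child] transformer asks Solr to attach matching children to each returned parent.
    query.setFields("*", "[child parentFilter=type_s:parent limit=100]");

    QueryResponse rsp = client.query("collection1", query);
    List<ParentBean> parents = rsp.getBeans(ParentBean.class);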
Re: The Visual Guide to Streaming Expressions and Math Expressions
Hi Joel, Looks like this is going to be very helpful, thank you! I am wondering whether the visualizations are generated through third party library or is it something which would be part of solr distribution? https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/visualization.adoc#visualization Thanks, Pratik On Wed, Oct 16, 2019 at 10:54 AM Joel Bernstein wrote: > Hi, > > The Visual Guide to Streaming Expressions and Math Expressions is now > complete. It's been published to Github at the following location: > > > https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/math-expressions.adoc#streaming-expressions-and-math-expressions > > The guide will eventually be part of Solr's release when the RefGuide is > ready to accommodate it. In the meantime its been designed to be easily > read directly from Github. > > The guide contains close to 200 visualizations and examples showing how to > use Streaming Expressions and Math Expressions for data analysis and > visualization. The visual guide is also designed to guide users that are > not experts in math in how to apply the functions to analysis and visualize > data. > > The new visual data loading feature in Solr 8.3 is also covered in the > guide. This feature should cut down on the time it takes to load CSV files > so that more time can be spent on analysis and visualization. > > > https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/loading.adoc#loading-data > > Joel Bernstein >
Re: Issues with the handling of NULLs in Streaming Expressions
I am facing exactly the same issue right now. There is no way to check whether a particular field is not present in a tuple or is null. Was there any development related to this issue? Is there a workaround? In my case, I have an incoming stream of tuples and I want to filter out all the tuples which do not have a certain field set, so I was thinking of using the "having" function like this.

having( seed_expr, not(eq(fieldA, null)) )

This would result in a stream of tuples which definitely have fieldA set, and I can do some operation on it. The problem is that the "eq" evaluator fails with a null value. Is there a related JIRA that I can track? @Joel is there any way/workaround to achieve this? i.e. to know whether a certain field is null or not? Thanks and Regards, Pratik -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
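One possible workaround, if the Solr version in use already ships the null-check stream evaluators (newer releases register notNull() and isNull()), might look like this; the collection and field names are placeholders:

    having(search(collection1, q="*:*", fl="id,fieldA", sort="id asc"),
           notNull(fieldA))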
How to change config set for some collection
Hello Everyone, Let's say I have a collection called "collection1" which uses the config set "config_set_1". Now, using the "upconfig" command, I upload a new configuration called "config_set_2". How can I make "collection1" use "config_set_2" instead of "config_set_1"? I know that if I upload a new configuration with the same name "config_set_1" and reload the collection, then it will have the new configuration, but I want to keep the old config set, add a new one, and make changes so that collection1 starts using the new config set. Is it possible? Thanks and Regards Pratik
Re: How to change config set for some collection
Thanks Shawn! This is what I needed. On Wed, Nov 20, 2019 at 3:59 PM Shawn Heisey wrote: > On 11/20/2019 1:34 PM, Pratik Patel wrote: > > Let's say I have a collection called "collection1" which uses config set > > "config_set_1". > > Now, using "upconfig" command, I upload a new configuration called > > "config_set_2". How can I make "collection1" use "config_set_2" instead > of > > "config_set_1"? > > > > I know that if I upload new configuration with the same name > "config_set_1" > > and reload the collection then it will have new configuration but I want > to > > keep the old config set, add a new one and make changes so that > collection1 > > starts using new config set. > > > > Is it possible? > > There is an action, available in the zkcli script and possibly > elsewhere, called "linkconfig". > > It looks like the config can also be changed with the collections API, > using the MODIFYCOLLECTION action. > > > https://lucene.apache.org/solr/guide/8_2/collection-management.html#modifycollection > > To make the change effective after linking to a new config, you'll need > to reload the collection. > > Thanks, > Shawn >
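For reference, a rough sketch of those two calls with the example names above (host and port are placeholders); the modifiable attribute is collection.configName, and the collection is reloaded afterwards so the change takes effect:

    http://localhost:8983/solr/admin/collections?action=MODIFYCOLLECTION&collection=collection1&collection.configName=config_set_2
    http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1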