Re: Use case for the Shingle Filter
The query parser will split on whitespace. I'm not sure how I can use the shingle filter in my query, or what its use cases are. For example, if my fieldType looks like this: ** and I have a document that has "my babysitter is terrific" in the content_t field, a query such as: http://localhost:8983/solr/collection_name/select?q={!lucene}content_t:(the baby sitter was here) won't return the document. I was hoping I'd get tokens like "the thebaby baby babysitter sitter sitterwas ..." when querying. On Sun, 5 Mar 2017 at 23:59 Ryan Josal wrote: > I thought new versions of solr didn't split on whitespace at the query > parser anymore, so this should work? > > That being said, I think I remember it having a problem coming after a > synonym filter. IIRC, if your input is "Foo Bar" and you have a synonym > "foo <=> baz" you would get foobaz bazbar instead of foobar and bazbar. I > wrote a custom shingler to account for that. > > Ryan > > On Sun, Mar 5, 2017 at 02:48 Markus Jelsma > wrote: > > > Hello - we use it for text classification and online near-duplicate > > document detection/filtering. Using shingles means you want to consider > > order in the text. It is analogous to using bigrams and trigrams when > doing > > language detection, you cannot distinguish between Danish and Norwegian > > solely on single characters. > > > > Markus > > > > > > > > -Original message- > > > From:Ryan Yacyshyn > > > Sent: Sunday 5th March 2017 5:57 > > > To: solr-user@lucene.apache.org > > > Subject: Use case for the Shingle Filter > > > > > > Hi everyone, > > > > > > I was thinking of using the Shingle Filter to help solve an issue I'm > > > facing. I can see this working in the analysis panel in the Solr admin, > > but > > > not when I make my queries. > > > > > > I find out it's because of the query parser splitting up the tokens on > > > white space before passing them along. > > > > > > This made me wonder what a practical use case can be, for using the > > shingle > > > filter? 
> > > > > > Any enlightenment on this would be much appreciated! > > > > > > Thanks, > > > Ryan > > > > > >
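The token expansion the original poster was hoping for can be illustrated outside Solr. This is a hedged sketch of what a shingle filter produces with shingle size 2, unigram output enabled, and an empty token separator (the concatenated forms like "thebaby" suggest an empty separator was configured; Lucene's default is a space). It is plain Python, not Solr code:

```python
def shingles(tokens, max_size=2, sep=""):
    """Emit each unigram plus joined n-grams (shingles) up to max_size."""
    out = []
    for i in range(len(tokens)):
        out.append(tokens[i])  # keep the single token (outputUnigrams=true)
        for n in range(2, max_size + 1):
            if i + n <= len(tokens):
                out.append(sep.join(tokens[i:i + n]))  # adjacent-token shingle
    return out

print(shingles("the baby sitter was here".split()))
# → ['the', 'thebaby', 'baby', 'babysitter', 'sitter', 'sitterwas', 'was', 'washere', 'here']
```

This matches the tokens the poster expected ("the thebaby baby babysitter sitter sitterwas ..."); the point of the thread is that the query parser splits on whitespace *before* analysis, so the filter never sees the multi-word input at query time.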
Custom DelegatingCollector : collect sorted docs by score
Hi, I developed a custom DelegatingCollector in which I should receive the documents (in the collect method) sorted by score. I use Solr 5.5.3. In older versions of Solr, there was a method called acceptsDocsOutOfOrder(). Best Regards --Jamel
Learning to rank - Bad Request
Hi all, I've been trying to get learning to rank working on our own search index. Following the LTR readme (https://github.com/bloomberg/lucene-solr/blob/master-ltr/solr/contrib/ltr/example/README.md) I ran the example python script to train and upload the model, but I already get an error during the uploading of the features: Bad Request (400) - Expected Map to create a new ManagedResource but received a java.util.ArrayList at org.apache.solr.rest.RestManager$RestManagerManagedResource.doPut(RestManager.java:523) at org.apache.solr.rest.ManagedResource.doPost(ManagedResource.java:355) at org.apache.solr.rest.RestManager$ManagedEndpoint.post(RestManager.java:351) at org.restlet.resource.ServerResource.doHandle(ServerResource.java:454) ... This makes sense: the json feature file is an array, and the RestManager needs a Map in doPut. Using the curl command from the cwiki (https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank) yields the same error, but with "received a java.lang.String" instead of "received a java.util.ArrayList". I wonder how this is actually supposed to work, and what's going wrong in this case. I have tried LTR with the default techproducts example, and that worked just fine. Does anyone have an idea of what's going wrong here? Thanks in advance! Vincent
Conditions for replication to copy full index
Hi all, We've recently had some issues with a 5.1.0 core copying the whole index when it was set to replicate from a master core. I've read that if there are documents that have been added to the slave core by mistake, it will do a full copy. Though we are still investigating, this is probably not the cause of it. Are there any other conditions in which the slave core will do a full copy of an index instead of only the necessary files? Thanks, Chris
Re: Conditions for replication to copy full index
We need to be pretty nit-picky here. bq: do a full copy of an index instead of only the necessary files It's all about "necessary files". "necessary" here means all changed segments. Since segments are not changed after a commit, replication can safely ignore any segments files it already has and only copies new segments. The rub is that "new" includes merged segments. And it's possible that _all_ current segments are merged into a new segment. At that point, technically, a full copy is done. You can force this by an optimize (not recommended) or, perhaps expungeDeletes options. Here's a great video of segment merging, the third one down is the TieredMergePolicy which has been the default for some time. http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html And, if you want to force a full replication, shut down the slave, "rm -rf data". (data should be the parent of the "index" dir) and restart solr. Best, Erick On Mon, Mar 6, 2017 at 8:06 AM, Chris Ulicny wrote: > Hi all, > > We've recently had some issues with a 5.1.0 core copying the whole index > when it was set to replicate from a master core. > > I've read that if there are documents that have been added to the slave > core by mistake, it will do a full copy. Though we are still investigating, > this is probably not the cause of it. > > Are there any other conditions in which the slave core will do a full copy > of an index instead of only the necessary files? > > Thanks, > Chris
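Because Lucene segment files are write-once after a commit, the incremental case reduces to a set difference between the master's and slave's file lists. A hedged sketch (this is not Solr's actual IndexFetcher logic, and the file names are made up) of why a big merge turns an "incremental" fetch into an effectively full copy:

```python
def files_to_fetch(master_files, slave_files):
    """Segment files are immutable after commit, so only unseen files move."""
    return sorted(set(master_files) - set(slave_files))

# Normal case: one new flush since the last poll -> small transfer.
master = {"_0.cfs", "_1.cfs", "_2.cfs", "segments_3"}
slave = {"_0.cfs", "_1.cfs", "segments_2"}
print(files_to_fetch(master, slave))  # → ['_2.cfs', 'segments_3']

# After all segments merge into _3, nothing the slave already has is
# reusable, so the "incremental" fetch copies everything on the master.
merged_master = {"_3.cfs", "segments_4"}
print(files_to_fetch(merged_master, slave))  # → ['_3.cfs', 'segments_4']
```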
Re: Conditions for replication to copy full index
Thanks Erick. I love Mike's video on segment merging. However I do not believe a large number of merged segments or accidental optimization is the issue. The data in the core is mostly static and there is no evidence so far of a large number of merges that took place. Usually the only updates the index receives are deletes. The other reason I assume it was a copy of the entire data directory is that the log lines for the IndexFetcher threads have the fullCopy flag set to true, whereas the usual replication seems to have it set to false. This fullCopy for the core in question is preceded by a failure to fetch the index on the previous replication attempt, but the subsequent check yields matching generations between the slave and master. I've included the logs for the indexFetcher thread for the core. 11:13:00,138 ERROR [org.apache.solr.handler.IndexFetcher] (indexFetcher-23-thread-1) Master at: is not available. Index fetch failed. Exception: IOException occured when talking to server at: 11:14:00,036 INFO [org.apache.solr.handler.IndexFetcher] (indexFetcher-23-thread-1) Master's generation: 182823 11:14:00,044 INFO [org.apache.solr.handler.IndexFetcher] (indexFetcher-23-thread-1) Slave's generation: 182823 11:14:00,081 INFO [org.apache.solr.handler.IndexFetcher] (indexFetcher-23-thread-1) Starting replication process 11:14:00,422 INFO [org.apache.solr.handler.IndexFetcher] (indexFetcher-23-thread-1) Number of files in latest index in master: 404 11:14:00,435 INFO [org.apache.solr.core.CachingDirectoryFactory] (indexFetcher-23-thread-1) return new directory for //data/index.20170306111400434 11:14:00,555 INFO [org.apache.solr.handler.IndexFetcher] (indexFetcher-23-thread-1) Starting download to NRTCachingDirectory(MMapDirectory@//data/index.20170306111400434 lockFactory=org.apache.lucene.store.NativeFSLockFactory@6a453731; maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true Thanks On Mon, Mar 6, 2017 at 11:30 AM Erick Erickson wrote: > We need to be pretty nit-picky
here. > > bq: do a full copy of an index instead of only the necessary files > > It's all about "necessary files". "necessary" here means a > all changed segments. Since segments are not changed > after a commit, then replication can safely ignore any segments > files it already has and only copies new segments. > > The rub is that "new" includes merged segments. And it's > possible that _all_ current segments are merged into a new > segment. At that point, technically, a full copy is done. > > You can force this by an optimize (not recommended) or, > perhaps expungeDeletes options. > > Here's a great video of segment merging, the third one down > is the TieredMergePolicy which has been the default for some > time. > > > http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html > > And, if you want to force a full replication, shut down the slave, > "rm -rf data". (data should be the parent of the "index" dir) and > restart solr. > > Best, > Erick > > On Mon, Mar 6, 2017 at 8:06 AM, Chris Ulicny wrote: > > Hi all, > > > > We've recently had some issues with a 5.1.0 core copying the whole index > > when it was set to replicate from a master core. > > > > I've read that if there are documents that have been added to the slave > > core by mistake, it will do a full copy. Though we are still > investigating, > > this is probably not the cause of it. > > > > Are there any other conditions in which the slave core will do a full > copy > > of an index instead of only the necessary files? > > > > Thanks, > > Chris >
Re: Does {!child} query support nested Queries ("v=")
Hi Mikhail, Sorry I didn’t reply sooner Here are some example docs - each document for a userAccount object has 1 or more nested documents for our userLinkedAccount object SolrInputDocument(fields: [type=userAccount, typeId=userAccount/HERE-8ce41333-7c08-40d3-9b2c-REDACTED, id=userAccount/HERE-8ce41333-7c08-40d3-9b2c-REDACTED, emailAddress=[redac...@here.com, REDACTED here.com], nameSort=�, emailType=Primary, familyName=REDACTED, allText=[REDACTED, REDACTED , untokenized=[REDACTED, REDACTED , isEnabled=1, createdTimeNumeric=1406972278682, haAccountId=HERE-8ce41333-7c08-40d3-9b2c-REDACTED, givenName=REDACTED, readAccess=application, indexTime=1488828050933]) SolrInputDocument(fields: [type=userLinkedAccount, typeId=userLinkedAccount/5926990ea0708fa82c9ddca5d1bda6ed3331a450, id=userLinkedAccount/5926990ea0708fa82c9ddca5d1bda6ed3331a450, haAccountId=HERE-8ce41333-7c08-40d3-9b2c-REDACTED, nameSort=�, hereRealm=HERE, haAccountType=password, haUserId= redac...@here.com, readAccess=application, createdTimeNumeric=1406972278646, indexTime=1488828050933]) SolrInputDocument(fields: [type=userAccount, typeId=userAccount/HERE-4797487f-7659-4c58-80b5-REDACTED, id=userAccount/HERE-4797487f-7659-4c58-80b5-REDACTED, emailAddress=[redac...@live.de, redac...@live.de], nameSort=�, emailType=Primary, familyName= REDACTED, allText=[REDACTED, REDACTED], untokenized=[REDACTED, REDACTED], isEnabled=1, createdTimeNumeric=1447141199050, haAccountId=HERE-4797487f-7659-4c58-80b5-REDACTED, givenName=Krzysztof, readAccess=application, indexTime=1488828050941]) SolrInputDocument(fields: [type=userLinkedAccount, typeId=userLinkedAccount/02d11e8096dc4727ee7c2c4f6cc4723190620088, id=userLinkedAccount/02d11e8096dc4727ee7c2c4f6cc4723190620088, haAccountId=HERE-4797487f-7659-4c58-80b5-REDACTED, nameSort=�, hereRealm=HERE, haAccountType=password, haUserId=redac...@live.de, readAccess=application, createdTimeNumeric=1447141199009, indexTime=1488828050941]) SolrInputDocument(fields: [type=userAccount, 
typeId=userAccount/HERE-8ce41333-7c08-40d3-9b2c-REDACTED, id=userAccount/HERE-8ce41333-7c08-40d3-9b2c-REDACTED, emailAddress=[redac...@here.com, REDACTED here.com], nameSort=�, emailType=Primary, familyName= REDACTED, allText=[REDACTED, REDACTED], untokenized=[REDACTED, REDACTED], isEnabled=1, createdTimeNumeric=1406972278682, haAccountId=HERE-8ce41333-7c08-40d3-9b2c-REDACTED, givenName= REDACTED, readAccess=application, indexTime=1488828051697]) SolrInputDocument(fields: [type=userLinkedAccount, typeId=userLinkedAccount/5926990ea0708fa82c9ddca5d1bda6ed3331a450, id=userLinkedAccount/5926990ea0708fa82c9ddca5d1bda6ed3331a450, haAccountId=HERE-8ce41333-7c08-40d3-9b2c-REDACTED, nameSort=�, hereRealm=HERE, haAccountType=password, haUserId= redac...@here.com, readAccess=application, createdTimeNumeric=1406972278646, indexTime=1488828051697]) So we often want to FIND userLinkedAccount document WHERE parentDocument has some filter properties e.g. Name / email address E.g. +type:userLinkedAccount +{!child of="type:userAccount" v="givenName:frank*”} The results appear to come back fine but the numFound often has a small delta we cannot explain Here is the output of the debugQuery "rawquerystring": "+type:userLinkedAccount +{!child of=\"type:userAccount\" v=\"givenName:frank*\"}", "querystring": "+type:userLinkedAccount +{!child of=\"type:userAccount\" v=\"givenName:frank*\"}", "parsedquery": "+type:userLinkedAccount +ToChildBlockJoinQuery(ToChildBlockJoinQuery (givenName:frank*))", "parsedquery_toString": "+type:userLinkedAccount +ToChildBlockJoinQuery (givenName:frank*)", "QParser": "LuceneQParser", "explain": { "userLinkedAccount/eb86bc13944094ce16f684a7f58e2294c84ca956": "\n1.9348345 = sum of:\n 1.4179944 = weight(type:userLinkedAccount in 84623) [DefaultSimilarity], result of:\n1.4179944 = score(doc=84623,freq=1.0), product of:\n 0.85608196 = queryWeight, product of:\n1.6563768 = idf(docFreq=14190942, maxDocs=27357228)\n 0.5168401 = queryNorm\n 1.6563768 = fieldWeight in 
84623, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n1.6563768 = idf(docFreq=14190942, maxDocs=27357228)\n1.0 = fieldNorm(doc=84623)\n 0.5168401 = Score based on parent document 84624\n0.5168401 = givenName:frank*, product of:\n 1.0 = boost\n 0.5168401 = queryNorm\n", "userLinkedAccount/78498d9d7d5c1a52de0f61d90df138ac7381d37f": "\n1.9348345 = sum of:\n 1.4179944 = weight(type:userLinkedAccount in 113884) [DefaultSimilarity], result of:\n1.4179944 = score(doc=113884,freq=1.0), product of:\n 0.85608196 = queryWeight, product of:\n1.6563768 = idf(docFreq=14190942, maxDocs=27357228)\n 0.5168401 = queryNorm\n 1.6563768 = fieldWeight in 113884, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n1.6563768 = idf(docFreq=14190942, maxDocs=27357228)\n1.0 = fieldNorm(doc=113
Recommendation for production SOLR
Given the known issues with 6.4.1 and no release date for 6.4.2, is 6.3.0 the best recommendation for a production version of Solr? Hoping to take to production in first week of April. Notice: This email and any attachments are confidential and may not be used, published or redistributed without the prior written consent of the Institute of Geological and Nuclear Sciences Limited (GNS Science). If received in error please destroy and immediately notify GNS Science. Do not copy or disclose the contents.
Re: Recommendation for production SOLR
We are going to production this week using 6.3.0. We don’t have time to re-run all the load benchmarks on 6.4.2. We’ll qualify 6.4.2 in a couple of weeks, then upgrade prod if it passes. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 6, 2017, at 11:48 AM, Phil Scadden wrote: > > Given the known issues with 6.4.1 and no release date for 6.4.2, is the best > recommendation for a production version of SOLR 6.3.0? Hoping to take to > production in first week of April. > Notice: This email and any attachments are confidential and may not be used, > published or redistributed without the prior written consent of the Institute > of Geological and Nuclear Sciences Limited (GNS Science). If received in > error please destroy and immediately notify GNS Science. Do not copy or > disclose the contents.
Re: Learning to rank - Bad Request
Hi Vincent, Would you be comfortable sharing (redacted) details of the exact upload command you used and (redacted) extracts of the features json file that gave the upload error? Two things I have encountered commonly myself: * uploading features to the model endpoint or model to the feature endpoint * forgotten double-quotes around the numbers in MultipleAdditiveTreesModel json Regards, Christine - Original Message - From: solr-user@lucene.apache.org To: solr-user@lucene.apache.org At: 03/06/17 13:22:40 Hi all, I've been trying to get learning to rank working on our own search index. Following the LTR-readme (https://github.com/bloomberg/lucene-solr/blob/master-ltr/solr/contrib/ltr/example/README.md) I ran the example python script to train and upload the model, but I already get an error during the uploading of the features: Bad Request (400) - Expected Map to create a new ManagedResource but received a java.util.ArrayList at org.apache.solr.rest.RestManager$RestManagerManagedResource.doPut(RestManager.java:523) at org.apache.solr.rest.ManagedResource.doPost(ManagedResource.java:355) at org.apache.solr.rest.RestManager$ManagedEndpoint.post(RestManager.java:351) at org.restlet.resource.ServerResource.doHandle(ServerResource.java:454) ... This makes sense: the json feature file is an array, and the RestManager needs a Map in doPut. Using the curl command from the cwiki (https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank) yields the same error, but instead of it having "received a java.util.ArrayList" it "received a java.lang.String". I wonder how this actually is supposed to work, and what's going wrong in this case. I have tried the LTR with the default techproducts example, and that worked just fine. Does anyone have an idea of what's going wrong here? Thanks in advance! Vincent
question related to solr LTR plugin
Hi, I do have a question related to the Solr LTR plugin. I have a use case of personalization and am wondering whether you can help me there. I would like to rerank my query results based on the relationship of the searcher with the author of the returned documents. I have relationship scores in an external datastore in the form user1 (searcher), user2 (author), relationship score. In my query, I can pass the searcher id as an external feature. My question is: during querying, how do I retrieve the relationship score for each document as a feature and rerank the documents? Would I need to implement a custom feature to do so, and how would I implement it? Thanks, Saurabh
Re: Recommendation for production SOLR
6.4.2 has passed the vote to release, so it should be hitting the mirrors in a few days at most. On Mon, Mar 6, 2017 at 11:50 AM, Walter Underwood wrote: > We are going to production this week using 6.3.0. We don’t have time to > re-run all the load benchmarks on 6.4.2. > > We’ll qualify 6.4.2 in a couple of weeks, then upgrade prod if it passes. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > >> On Mar 6, 2017, at 11:48 AM, Phil Scadden wrote: >> >> Given the known issues with 6.4.1 and no release date for 6.4.2, is the >> best recommendation for a production version of SOLR 6.3.0? Hoping to take >> to production in first week of April. >> Notice: This email and any attachments are confidential and may not be used, >> published or redistributed without the prior written consent of the >> Institute of Geological and Nuclear Sciences Limited (GNS Science). If >> received in error please destroy and immediately notify GNS Science. Do not >> copy or disclose the contents. >
Getting an error: was indexed without position data; cannot run PhraseQuery
We keep getting this in our Tomcat/SOLR Logs and I was wondering if a simple schema change will alleviate this issue: INFO - 2017-03-06 07:26:58.751; org.apache.solr.core.SolrCore; [Client_AdvanceAutoParts] webapp=/solr path=/select params={fl=candprofileid,+candid&start=0&q=*:*&wt=json&fq=issearchable:1+AND+cpentitymodifiedon:[2017-01-20T00:00:00.000Z+TO+*]+AND+clientreqid:17672+AND+folderid:132+AND+(engagedid_s:(0)+AND+atleast21_s:(1))+AND+(preferredlocations_s:(3799H))&rows=1000} status=500 QTime=1480 ERROR - 2017-03-06 07:26:58.766; org.apache.solr.common.SolrException; null:java.lang.IllegalStateException: field "preferredlocations_s" was indexed without position data; cannot run PhraseQuery (term=3799) at org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:277) at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:351) at org.apache.lucene.search.Weight.bulkScorer(Weight.java:131) at org.apache.lucene.search.BooleanQuery$BooleanWeight.bulkScorer(BooleanQuery.java:313) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297) at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1158) at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:846) at org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1004) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1517) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1397) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:478) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:461) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at 
org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) The field in question "preferredlocations_s" is not defined in schema.xml explicitly, but we have a dynamicField schema entry that covers it. Would adding omitTermFreqAndPositions="false" to this schema line help out here? Should I explicitly define this "preferredlocations_s" field in the schema instead and add it there? We do have a handful of dynamic fields that all get covered by this rule, but it seems the "preferredlocations_s" field is the only one throwing errors. All it stores is a CSV string with location IDs in it.
negative array size exception
After migrating from standalone Solr to a load-balanced SolrCloud with 3 ZKs on the same machines (Solr has 3 shards, one per node), we see this logged in the UI on one of our Solr nodes. Does anyone know what this is symptomatic of? java.lang.NegativeArraySizeException at org.apache.lucene.util.PriorityQueue.<init>(PriorityQueue.java:63) at org.apache.lucene.util.PriorityQueue.<init>(PriorityQueue.java:44) at org.apache.solr.handler.component.ShardFieldSortedHitQueue.<init>(ShardFieldSortedHitQueue.java:45) at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:979) at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:763) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:742) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:428) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.server.Server.handle(Server.java:534) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) at java.lang.Thread.run(Thread.java:745)
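One way (among others) a Java PriorityQueue can end up being asked for a negative size is 32-bit int overflow, e.g. a merge queue sized from start + rows when a caller passes a huge rows value. Nothing in the trace above confirms that this is what happened here; this is only a hedged illustration of the wrap-around, simulated in Python because Python ints do not overflow:

```python
INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE

def java_int_add(a, b):
    """Simulate Java's wrapping 32-bit signed addition."""
    s = (a + b) & 0xFFFFFFFF
    return s - 2**32 if s >= 2**31 else s

# A modest offset plus an enormous rows value wraps negative,
# and new Object[negative] throws NegativeArraySizeException in Java.
print(java_int_add(10, INT_MAX))  # → -2147483639
```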
Re: Getting an error: was indexed without position data; cannot run PhraseQuery
Usually an _s field is a "string" type, so be sure you didn't change the definition without completely re-indexing. In fact I generally either index to a new collection or remove the data directory entirely. right, the field isn't indexed with position information. That combined with (probably) the WordDelimiterFilterFactory in text_en_splitting is generating multiple tokens for inputs like 3799H. See the admin/analysis page for how that gets broken up. Term positions are usually enabled by default, so I'm not quite sure why they're gone unless you disabled them. But you're on the right track regardless. you have to 1> include term positions for anything that generates phrase queries or 2> make sure you don't generate phrase queries. edismax can do this if you have it configured to, and then there's autoGeneratePhraseQueries that you may find. And do reindex completely from scratch if you change the definitions. Best, Erick On Mon, Mar 6, 2017 at 1:41 PM, Pouliot, Scott wrote: > We keep getting this in our Tomcat/SOLR Logs and I was wondering if a simple > schema change will alleviate this issue: > > INFO - 2017-03-06 07:26:58.751; org.apache.solr.core.SolrCore; > [Client_AdvanceAutoParts] webapp=/solr path=/select > params={fl=candprofileid,+candid&start=0&q=*:*&wt=json&fq=issearchable:1+AND+cpentitymodifiedon:[2017-01-20T00:00:00.000Z+TO+*]+AND+clientreqid:17672+AND+folderid:132+AND+(engagedid_s:(0)+AND+atleast21_s:(1))+AND+(preferredlocations_s:(3799H))&rows=1000} > status=500 QTime=1480 > ERROR - 2017-03-06 07:26:58.766; org.apache.solr.common.SolrException; > null:java.lang.IllegalStateException: field "preferredlocations_s" was > indexed without position data; cannot run PhraseQuery (term=3799) > at > org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:277) > at > org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:351) > at org.apache.lucene.search.Weight.bulkScorer(Weight.java:131) > at > 
org.apache.lucene.search.BooleanQuery$BooleanWeight.bulkScorer(BooleanQuery.java:313) > at > org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618) > at > org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297) > at > org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1158) > at > org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:846) > at > org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1004) > at > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1517) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1397) > at > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:478) > at > org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:461) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) > at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) > at > 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) > at > org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023) > at > org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) > at > org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) >
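The splitting Erick describes can be approximated outside Solr. This is a hedged simplification of WordDelimiterFilterFactory's letter/digit splitting (the real filter has many more options), showing why a value like 3799H becomes two tokens, which the query parser then turns into a phrase query against a field indexed without positions:

```python
import re

def split_on_letter_digit(token):
    """Rough stand-in for WordDelimiterFilter: break on letter/digit boundaries."""
    return re.findall(r"\d+|[A-Za-z]+", token)

# Two tokens -> the parser builds a phrase query ("3799 H"), which fails
# against preferredlocations_s because it has no position data.
print(split_on_letter_digit("3799H"))  # → ['3799', 'H']
```

This also matches the error message, which names "3799" as the first phrase term.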
RE: Getting an error: was indexed without position data; cannot run PhraseQuery
Hmm. We haven’t changed data or the definition in YEARS now. I'll have to do some more digging I guess. Not sure re-indexing is a great thing to do though since this is a production setup and the database for this user is @ 50GB. It would take quite a long time to reindex all that data from scratch. Thanks for the quick reply Erick! -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Monday, March 6, 2017 5:33 PM To: solr-user Subject: Re: Getting an error: was indexed without position data; cannot run PhraseQuery Usually an _s field is a "string" type, so be sure you didn't change the definition without completely re-indexing. In fact I generally either index to a new collection or remove the data directory entirely. right, the field isn't indexed with position information. That combined with (probably) the WordDelimiterFilterFactory in text_en_splitting is generating multiple tokens for inputs like 3799H. See the admin/analysis page for how that gets broken up. Term positions are usually enable by default, so I'm not quite sure why they're gone unless you disabled them. But you're on the right track regardless. you have to 1> include term positions for anything that generates phrase queries or 2> make sure you don't generate phrase queries. edismax can do this if you have it configured to, and then there's autoGeneratePhrasQueries that you may find. And do reindex completely from scratch if you change the definitions. 
Best,
Erick

On Mon, Mar 6, 2017 at 1:41 PM, Pouliot, Scott wrote:
> We keep getting this in our Tomcat/SOLR logs and I was wondering if a simple schema change will alleviate this issue:
>
> INFO - 2017-03-06 07:26:58.751; org.apache.solr.core.SolrCore; [Client_AdvanceAutoParts] webapp=/solr path=/select params={fl=candprofileid,+candid&start=0&q=*:*&wt=json&fq=issearchable:1+AND+cpentitymodifiedon:[2017-01-20T00:00:00.000Z+TO+*]+AND+clientreqid:17672+AND+folderid:132+AND+(engagedid_s:(0)+AND+atleast21_s:(1))+AND+(preferredlocations_s:(3799H))&rows=1000} status=500 QTime=1480
> ERROR - 2017-03-06 07:26:58.766; org.apache.solr.common.SolrException; null:java.lang.IllegalStateException: field "preferredlocations_s" was indexed without position data; cannot run PhraseQuery (term=3799)
>         at org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:277)
>         at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:351)
>         at org.apache.lucene.search.Weight.bulkScorer(Weight.java:131)
>         at org.apache.lucene.search.BooleanQuery$BooleanWeight.bulkScorer(BooleanQuery.java:313)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
>         at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1158)
>         at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:846)
>         at org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1004)
>         at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1517)
>         at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1397)
>         at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:478)
>         at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:461)
>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
>         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
>         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
>         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
>         at org.apache.catalina.core.StandardEngineV
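Erick's point about WordDelimiterFilterFactory can be illustrated with a simplified sketch. This is not Lucene's actual filter, just an approximation of its default letter/digit splitting: a mixed token like 3799H becomes two sub-tokens at adjacent positions, and a query parser that sees multiple tokens for one query term builds a PhraseQuery ("3799 H"), which a field indexed without position data cannot run.

```python
import re

def word_delimiter_split(token):
    # Simplified approximation of WordDelimiterFilterFactory's default
    # behavior: split on transitions between letters and digits.
    return re.findall(r"[A-Za-z]+|[0-9]+", token)

# "3799H" yields two sub-tokens, which is why the error above reports a
# PhraseQuery with term=3799 against preferredlocations_s.
print(word_delimiter_split("3799H"))   # ['3799', 'H']
```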
Re: Getting an error: was indexed without position data; cannot run PhraseQuery
You're in a pickle then. If you change the definition you need to re-index. But you claim you haven't changed anything in years as far as the schema is concerned, so maybe you're going to get lucky ;).

The error you reported means that somehow a phrase search is going on against this field. You could have changed something in the query parsers, the eDismax definitions, or the query generated on the app side that lets a phrase query get through.

I'm not quite sure if you'll get information back when the query fails, but try adding &debug=query to the URL and look at parsed_query and parsed_query_toString() to see where phrases are getting generated.

Best,
Erick

On Mon, Mar 6, 2017 at 5:26 PM, Pouliot, Scott wrote:
> Hmm. We haven’t changed the data or the definition in YEARS now. I'll have to do some more digging I guess. Not sure re-indexing is a great thing to do though, since this is a production setup and the database for this user is @ 50GB. It would take quite a long time to reindex all that data from scratch.
>
> Thanks for the quick reply Erick!
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, March 6, 2017 5:33 PM
> To: solr-user
> Subject: Re: Getting an error: was indexed without position data; cannot run PhraseQuery
>
> Usually an _s field is a "string" type, so be sure you didn't change the definition without completely re-indexing. In fact I generally either index to a new collection or remove the data directory entirely.
>
> Right, the field isn't indexed with position information. That, combined with (probably) the WordDelimiterFilterFactory in text_en_splitting, is generating multiple tokens for inputs like 3799H. See the admin/analysis page for how that gets broken up. Term positions are usually enabled by default, so I'm not quite sure why they're gone unless you disabled them.
>
> But you're on the right track regardless. You have to:
> 1> include term positions for anything that generates phrase queries, or
> 2> make sure you don't generate phrase queries. edismax can do this if you have it configured to, and then there's autoGeneratePhraseQueries that you may find useful.
>
> And do reindex completely from scratch if you change the definitions.
>
> Best,
> Erick
>
> On Mon, Mar 6, 2017 at 1:41 PM, Pouliot, Scott wrote:
>> We keep getting this in our Tomcat/SOLR logs and I was wondering if a simple schema change will alleviate this issue:
>>
>> ERROR - 2017-03-06 07:26:58.766; org.apache.solr.common.SolrException; null:java.lang.IllegalStateException: field "preferredlocations_s" was indexed without position data; cannot run PhraseQuery (term=3799)
>> [rest of quoted log and stack trace trimmed]
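Erick's two escape hatches correspond to schema settings along these lines. This is a hypothetical sketch, not the poster's actual schema — the field and type names are taken from the thread; either change requires a full re-index:

```xml
<!-- Option a: make preferredlocations_s a true string field, so no
     analysis chain splits "3799H" and no phrase query is ever generated -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="preferredlocations_s" type="string" indexed="true" stored="true"/>

<!-- Option b: keep the analyzed type, but tell the query parser not to
     build phrase queries when one term analyzes to multiple tokens -->
<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <!-- analyzer definition unchanged -->
</fieldType>
```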
RE: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.
I couldn't find an issue for this in JIRA so I thought I would add some of our own findings here... We are seeing the same problem with the Solr 6 restore functionality. While I do not think it is important, it happens on both our Linux environments and our local Windows development environments. Also, from our testing, I do not think it has anything to do with actual indexing (if you notice the order of my test steps, documents appear in replicas after creation, without re-indexing).

Test environment:
• Windows 10 (we see the same behavior on Linux as well)
• Java 1.8.0_121
• Solr 6.3.0 with patch for SOLR-9527 (to fix RESTORE shard distribution and add createNodeSet to RESTORE)
• 1 Zookeeper node running on localhost:2181
• 3 Solr nodes running on localhost:8171, localhost:8181 and localhost:8191 (hostname NY07LP521696)

Test and observations:
1) Create a 2-shard collection 'test'
http://localhost:8181/solr/admin/collections?action=CREATE&name=test&numShards=2&replicationFactor=1&maxShardsPerNode=1&collection.configName=testconf&&createNodeSet=NY07LP521696:8171_solr,NY07LP521696:8181_solr
2) Index 7 documents to 'test'
3) Search 'test' - result count 7
4) Backup collection 'test'
http://localhost:8181/solr/admin/collections?action=BACKUP&collection=test&name=copy&location=%2FData%2Fsolr%2Fbkp&async=1234
5) Restore 'test' to collection 'test2'
http://localhost:8191/solr/admin/collections?action=RESTORE&name=copy&location=%2FData%2Fsolr%2Fbkp&collection=test2&async=1234&maxShardsPerNode=1&createNodeSet=NY07LP521696:8181_solr,NY07LP521696:8191_solr
6) Search 'test2' - result count 7
7) Index 2 new documents to 'test2'
8) Search 'test2' - result count 7 (the new documents do not appear in results)
9) Create a replica for each of the shards of 'test2'
http://localhost:8191/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=NY07LP521696:8181_solr
http://localhost:8191/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard2&node=NY07LP521696:8171_solr
*** Note that it is not necessary to try to re-index the 2 new documents before this step, just create replicas and query ***
10) Repeatedly query 'test2' - the result count randomly changes between 7, 8 and 9. This is because Solr is randomly selecting replicas of 'test2', and one of the two new docs was added to each of the shards in the collection: if replica0 of both shards is selected the result is 7, if replica0 of one shard and replica1 of the other is selected the result is 8, and if replica1 is selected for both shards the result is 9. The behavior is random because we do not know ahead of time which shards the new documents will be added to and whether they will be split evenly.

Query 'test2' with the shards parameter set to the original restored shards - result count 7
http://localhost:8181/solr/test2/select?q=*:*&shards=localhost:8181/solr/test2_shard1_replica0,localhost:8181/solr/test2_shard2_replica0
Query 'test2' with the shards parameter set to one original restored shard and one replica shard - result count 8
http://localhost:8181/solr/test2/select?q=*:*&shards=localhost:8181/solr/test2_shard1_replica0,localhost:8181/solr/test2_shard2_replica1
http://localhost:8181/solr/test2/select?q=*:*&shards=localhost:8181/solr/test2_shard1_replica1,localhost:8181/solr/test2_shard2_replica0
Query 'test2' with the shards parameter set to the replica shards - result count 9
http://localhost:8181/solr/test2/select?q=*:*&shards=localhost:8181/solr/test2_shard1_replica1,localhost:8181/solr/test2_shard2_replica1

13) Note that in the Solr admin the core statistics show the restored cores as not current: the Searching master is Gen 2 and the Replicable master is Gen 3, while on the replicated core both the Searching and Replicable master are Gen 3
14) Restarting Solr corrects the issue

Thoughts:
• Solr is backing up and restoring correctly
• The restored collection data is stored under a path like …/node8181/test2_shard1_replica0/restore.20170307005909295 instead of …/node8181/test2_shard1_replica0/index
• Indexing is actually behaving correctly (documents are available in replicas even without re-indexing)
• When asked about the state of the searcher through the admin page core details, Solr does know that the searcher is not current

I was looking in the source but haven't found the root cause yet. My gut feeling is that because the index data dir is …/restore.20170307005909295 instead of …/index, Solr isn't seeing the index changes and recycling the searcher for the restored cores. Neither committing the collection nor forcing an optimize fixes the issue; restarting Solr fixes it, but that will not be viable for us in production.

John Marquiss

-----Original Message-----
>From: Jerome Yang [mailto:jey...@pivotal.io]
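The per-replica queries in the steps above can be scripted. A small sketch (the host and replica names mirror this test setup; adjust for a real cluster) that builds a query URL pinned to specific replicas via the shards parameter, so divergent replicas can be compared one combination at a time:

```python
def shards_query_url(base, collection, replicas, q="*:*"):
    # Pin the query to explicit shard replicas via the shards= parameter
    # instead of letting Solr pick replicas at random.
    return "{0}/solr/{1}/select?q={2}&shards={3}".format(
        base, collection, q, ",".join(replicas))

url = shards_query_url(
    "http://localhost:8181", "test2",
    ["localhost:8181/solr/test2_shard1_replica0",
     "localhost:8181/solr/test2_shard2_replica1"])
print(url)
```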