OK, you can do kind of the same thing with the core admin API "SWAP" command.
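[For reference, a CoreAdmin SWAP call might be sketched as below. The host, port, and core names ("production", "rebuilt") are hypothetical stand-ins; the sketch only prints the request so it can be inspected before being sent to a real node.]

```shell
# Sketch of a CoreAdmin SWAP request; host, port, and core names are hypothetical.
SOLR_HOST="http://localhost:8983"
SWAP_URL="${SOLR_HOST}/solr/admin/cores?action=SWAP&core=production&other=rebuilt"

# Dry run: print the request rather than sending it.
# Against a live node you would run:  curl "$SWAP_URL"
echo "$SWAP_URL"
```

[SWAP atomically exchanges the two cores, so queries hitting "production" start being served by the newly built index.]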
And in stand-alone it's much simpler. Just index your data somewhere (I don't
particularly care where: your workstation, a spare machine lying around,
whatever) and copy the result to the index directory for prod. I'd copy it to
the master, then make sure it propagates to all the slaves. You can do this by
removing the data directory while a slave is shut down and starting it back
up. Or, you can copy the index to the master and all the slaves in one big go.
Up to you.

Best,
Erick

On Tue, Mar 7, 2017 at 8:59 AM, Pouliot, Scott
<scott.poul...@peoplefluent.com> wrote:
> We are NOT using SOLRCloud yet. I'm still trying to figure out how to get
> SOLRCloud running. We're using old-school master/slave replication still.
> So it sounds like it can be done if I get to that point. I've got a few
> non-SOLR tasks to get done today, so I'm hoping to dig into this later in
> the week though.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, March 7, 2017 11:05 AM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Getting an error: <field> was indexed without position data;
> cannot run PhraseQuery
>
> First, it's not clear whether you're using SolrCloud or not, so there may
> be some irrelevant info in here....
>
> bq: ...could I do it on another instance running the same SOLR version
> (4.8.0) and then copy the database into place instead
>
> In a word, "yes", if you're careful. Assuming you have more than one
> shard, you have to be sure to copy the shards faithfully. By that I mean
> look at your admin UI>>cloud>>tree>>(clusterstate.json or
> collection>>state.json). You'll see a bunch of information for each
> replica, but the critical bit is that the hash range should be the same
> for the source and destination. It'll be something like
> 0x00000000-0x7fffffff for one shard (each replica on a shard has the same
> hash range), etc.
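[The stand-alone copy procedure described above might be sketched like this. All paths are hypothetical stand-ins, and the target node must be shut down before its data directory is replaced; the sketch fakes a built index so it is self-contained.]

```shell
# Sketch of copying a freshly built index into a (stopped) prod node.
# All paths are hypothetical stand-ins.
BUILD_DATA=/tmp/solr-build/collection1/data   # where the reindex was done
PROD_DATA=/tmp/solr-prod/collection1/data     # prod node's data directory

# Simulate a built index so the sketch is self-contained.
mkdir -p "$BUILD_DATA/index" "$PROD_DATA/index"
touch "$BUILD_DATA/index/segments_1"

# With the prod node shut down: remove the old data directory...
rm -rf "$PROD_DATA"
# ...then binary-copy the new one into place and restart the node.
cp -r "$BUILD_DATA" "$PROD_DATA"

ls "$PROD_DATA/index"
```

[A binary copy is the important part; Lucene index files are portable across operating systems, as Erick notes further down.]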
>
> The implication of course is that both collections need to have the same
> number of shards.
>
> If you don't have any shards, don't worry about it...
>
> Another possibility, depending on your resources, is to create another
> collection with the same number of shards and index to _that_. Then use
> the Collections API CREATEALIAS command to atomically switch. This
> assumes you have enough extra capacity that you can do the reindexing
> without unduly impacting prod.
>
> And there are a number of variants on this:
> - index to a leader-only collection
> - during a small maintenance window, shut down prod and ADDREPLICA for
>   all the shards to build out your new collection
> - blow away your old collection when you're comfortable
>
> But the bottom line is that indexes may be freely copied wherever you
> want as long as the bookkeeping is respected wrt hash ranges. I used to
> build Lucene indexes on a Windows box and copy them to a Unix server, as
> long as I used binary copy....
>
> Best,
> Erick
>
> On Tue, Mar 7, 2017 at 7:04 AM, Pouliot, Scott
> <scott.poul...@peoplefluent.com> wrote:
>> Welcome to IT, right? We're always in some sort of pickle ;-) I'm going
>> to play with settings on one of our internal environments, see if I can
>> replicate the issue, and go from there with some test fixes.
>>
>> Here's a question though... If I need to re-index... could I do it on
>> another instance running the same SOLR version (4.8.0) and then copy
>> the database into place instead? We're using some crappy custom Groovy
>> script run through Aspire to do our indexing and it's horribly slow.
>> 50GB would take at least a day... maybe 2, and I obviously can't have a
>> client down for that long in Production, but if I did it on a backup
>> SOLR box... copying 50GB into place is much, much quicker.
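[The atomic CREATEALIAS switch mentioned above might be sketched as follows. The alias name ("search") and collection name ("candidates_v2") are made up; the sketch only prints the request.]

```shell
# Sketch of an atomic alias switch via the Collections API.
# "search" and "candidates_v2" are hypothetical names.
SOLR_HOST="http://localhost:8983"
ALIAS_URL="${SOLR_HOST}/solr/admin/collections?action=CREATEALIAS&name=search&collections=candidates_v2"

# Dry run; against a live cluster:  curl "$ALIAS_URL"
echo "$ALIAS_URL"
```

[Clients query the alias rather than the collection, so re-pointing the alias at the freshly indexed collection switches traffic without downtime.]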
>>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Monday, March 6, 2017 8:48 PM
>> To: solr-user <solr-user@lucene.apache.org>
>> Subject: Re: Getting an error: <field> was indexed without position
>> data; cannot run PhraseQuery
>>
>> You're in a pickle then. If you change the definition, you need to
>> re-index.
>>
>> But you claim you haven't changed anything in years as far as the
>> schema is concerned, so maybe you're going to get lucky ;).
>>
>> The error you reported is because somehow there's a phrase search going
>> on against this field. You could have changed something in the query
>> parsers, the eDismax definitions, or the query generated on the app
>> side to let a phrase query get through. I'm not quite sure if you'll
>> get information back when the query fails, but try adding &debug=query
>> to the URL and look at parsed_query and parsed_query_toString() to see
>> where phrases are getting generated.
>>
>> Best,
>> Erick
>>
>> On Mon, Mar 6, 2017 at 5:26 PM, Pouliot, Scott
>> <scott.poul...@peoplefluent.com> wrote:
>>> Hmm. We haven't changed the data or the definition in YEARS now. I'll
>>> have to do some more digging, I guess. Not sure re-indexing is a great
>>> thing to do though, since this is a production setup and the database
>>> for this user is @ 50GB. It would take quite a long time to reindex
>>> all that data from scratch. Hmmmm
>>>
>>> Thanks for the quick reply Erick!
>>>
>>> -----Original Message-----
>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>>> Sent: Monday, March 6, 2017 5:33 PM
>>> To: solr-user <solr-user@lucene.apache.org>
>>> Subject: Re: Getting an error: <field> was indexed without position
>>> data; cannot run PhraseQuery
>>>
>>> Usually an _s field is a "string" type, so be sure you didn't change
>>> the definition without completely re-indexing. In fact, I generally
>>> either index to a new collection or remove the data directory
>>> entirely.
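[The &debug=query check suggested above might be run along these lines. The host is a hypothetical stand-in; the core name and the fq term come from the error log quoted later in the thread. The sketch prints the request rather than sending it.]

```shell
# Sketch: rerun the failing filter as a query with debug output enabled.
# Host is hypothetical; core name and term are taken from the error log.
SOLR_HOST="http://localhost:8983"
DEBUG_URL="${SOLR_HOST}/solr/Client_AdvanceAutoParts/select?q=preferredlocations_s:(3799H)&debug=query&wt=json"

# Dry run; against a live node:  curl "$DEBUG_URL"
echo "$DEBUG_URL"
```

[In the JSON response, the parsed_query and parsed_query_toString entries under "debug" show whether the term was split into a PhraseQuery.]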
>>>
>>> Right, the field isn't indexed with position information. That,
>>> combined with (probably) the WordDelimiterFilterFactory in
>>> text_en_splitting, is generating multiple tokens for inputs like
>>> 3799H. See the admin/analysis page for how that gets broken up. Term
>>> positions are usually enabled by default, so I'm not quite sure why
>>> they're gone unless you disabled them.
>>>
>>> But you're on the right track regardless. You have to
>>> 1> include term positions for anything that generates phrase queries, or
>>> 2> make sure you don't generate phrase queries. edismax can do this if
>>> you have it configured to, and then there's autoGeneratePhraseQueries
>>> that you may find useful.
>>>
>>> And do reindex completely from scratch if you change the definitions.
>>>
>>> Best,
>>> Erick
>>>
>>> On Mon, Mar 6, 2017 at 1:41 PM, Pouliot, Scott
>>> <scott.poul...@peoplefluent.com> wrote:
>>>> We keep getting this in our Tomcat/SOLR logs and I was wondering if a
>>>> simple schema change will alleviate this issue:
>>>>
>>>> INFO - 2017-03-06 07:26:58.751; org.apache.solr.core.SolrCore;
>>>> [Client_AdvanceAutoParts] webapp=/solr path=/select
>>>> params={fl=candprofileid,+candid&start=0&q=*:*&wt=json&fq=issearchable:1+AND+cpentitymodifiedon:[2017-01-20T00:00:00.000Z+TO+*]+AND+clientreqid:17672+AND+folderid:132+AND+(engagedid_s:(0)+AND+atleast21_s:(1))+AND+(preferredlocations_s:(3799H))&rows=1000}
>>>> status=500 QTime=1480
>>>> ERROR - 2017-03-06 07:26:58.766;
>>>> org.apache.solr.common.SolrException;
>>>> null:java.lang.IllegalStateException: field "preferredlocations_s"
>>>> was indexed without position data; cannot run PhraseQuery (term=3799)
>>>>     at org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:277)
>>>>     at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:351)
>>>>     at org.apache.lucene.search.Weight.bulkScorer(Weight.java:131)
>>>>     at org.apache.lucene.search.BooleanQuery$BooleanWeight.bulkScorer(BooleanQuery.java:313)
>>>>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
>>>>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
>>>>     at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1158)
>>>>     at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:846)
>>>>     at org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1004)
>>>>     at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1517)
>>>>     at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1397)
>>>>     at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:478)
>>>>     at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:461)
>>>>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
>>>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
>>>>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
>>>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
>>>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
>>>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>>>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>>>>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
>>>>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
>>>>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
>>>>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
>>>>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>>>>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
>>>>     at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
>>>>     at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
>>>>     at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>>>     at java.lang.Thread.run(Unknown Source)
>>>>
>>>> The field in question, "preferredlocations_s", is not defined
>>>> explicitly in schema.xml, but we have a dynamicField schema entry
>>>> that covers it:
>>>>
>>>> <dynamicField name="*_s" type="text_en_splitting" indexed="true"
>>>> stored="true" />
>>>>
>>>> Would adding omitTermFreqAndPositions="false" to this schema line
>>>> help out here? Should I explicitly define this "preferredlocations_s"
>>>> field in the schema instead and add it there? We do have a handful of
>>>> dynamic fields that all get covered by this rule, but it seems the
>>>> "preferredlocations_s" field is the only one throwing errors. All it
>>>> stores is a CSV string with location IDs in it.
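[Putting the two fixes discussed in the thread into schema terms, a hedged sketch might look like the fragment below. This is untested, it assumes a standard "string" fieldType (solr.StrField) exists in the schema, and either change requires a complete reindex, as Erick stresses above.]

```xml
<!-- Sketch only; either option requires a full reindex.
     Option 1: keep text_en_splitting but explicitly keep term positions,
     so PhraseQuery has the data it needs. -->
<dynamicField name="*_s" type="text_en_splitting" indexed="true"
              stored="true" omitTermFreqAndPositions="false"/>

<!-- Option 2: define the problem field explicitly as a plain string, so no
     tokenization (and hence no generated phrase query) happens at all.
     An explicit <field> takes precedence over the *_s dynamicField. -->
<field name="preferredlocations_s" type="string" indexed="true" stored="true"/>
```

[Option 2 also matches the conventional Solr naming Erick mentions, where _s fields are string types, but it changes matching behavior: a string field matches only exact whole values, not individual tokens.]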