Welcome to IT right?  We're always in some sort of pickle  ;-)  I'm going to 
play with settings on one of our internal environments and see if I can 
replicate the issue and go from there with some test fixes.

Here's a question though: if I need to re-index, could I do it on another 
instance running the same Solr version (4.8.0) and then copy the database into 
place instead?  We're using some crappy custom Groovy script run through Aspire 
to do our indexing and it's horribly slow.  50GB would take at least a day, 
maybe two, and I obviously can't have a client down for that long in 
Production.  If I did it on a backup Solr box, though, copying 50GB into place 
would be much quicker.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, March 6, 2017 8:48 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Getting an error: <field> was indexed without position data; 
cannot run PhraseQuery

You're in a pickle then. If you change the definition you need to re-index.

But you claim you haven't changed anything in years as far as the schema is 
concerned so maybe you're going to get lucky ;).

The error you reported is because somehow there's a phrase search going on 
against this field. You could have changed something in the query parsers or 
eDismax definitions, or in the query generated on the app side, that lets a 
phrase query get through. I'm not quite sure if you'll get information back 
when the query fails, but try adding &debug=query to the URL and look at the 
parsedquery and parsedquery_toString entries to see where phrases are getting 
generated.
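
As a rough sketch of that debug request (not part of the original exchange), the snippet below builds a select URL with debug=query appended. The host and port in BASE_URL are placeholders, the core name is only inferred from the log line quoted further down, and build_debug_url is a made-up helper. The suspect clause is sent as q rather than fq, on the assumption that this makes it show up in the parsed main query.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical host/port; the core name is inferred from the logged request.
BASE_URL = "http://localhost:8080/solr/Client_AdvanceAutoParts/select"

def build_debug_url(base_url, q, rows=0):
    """Return a select URL that asks Solr to include query-parsing debug info."""
    params = {
        "q": q,
        "rows": rows,      # no documents needed, only the debug section
        "wt": "json",
        "debug": "query",  # the parameter suggested above
    }
    return base_url + "?" + urlencode(params)

url = build_debug_url(BASE_URL, "preferredlocations_s:(3799H)")
print(url)
```

Pasting the printed URL into a browser or curl against a test instance should be enough; nothing here depends on a Solr client library.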

Best,
Erick

On Mon, Mar 6, 2017 at 5:26 PM, Pouliot, Scott <scott.poul...@peoplefluent.com> 
wrote:
> Hmm.  We haven’t changed data or the definition in YEARS now.  I'll 
> have to do some more digging I guess.  Not sure re-indexing is a great 
> thing to do though since this is a production setup and the database 
> for this user is @ 50GB.  It would take quite a long time to reindex 
> all that data from scratch.  Hmmmm
>
> Thanks for the quick reply Erick!
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, March 6, 2017 5:33 PM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Getting an error: <field> was indexed without position 
> data; cannot run PhraseQuery
>
> Usually an _s field is a "string" type, so be sure you didn't change the 
> definition without completely re-indexing. In fact I generally either index 
> to a new collection or remove the data directory entirely.
>
> Right, the field isn't indexed with position information. That, combined with 
> (probably) the WordDelimiterFilterFactory in text_en_splitting generating 
> multiple tokens for inputs like 3799H, is what triggers the error.
> See the admin/analysis page for how that gets broken up. Term positions are 
> usually enabled by default, so I'm not quite sure why they're gone unless you 
> disabled them.
>
> But you're on the right track regardless. You have to either
> 1> include term positions for anything that generates phrase queries
> or
> 2> make sure you don't generate phrase queries. edismax can generate them if
> you have it configured to, and then there's autoGeneratePhraseQueries, which 
> you may find set on the field type.
>
> And do reindex completely from scratch if you change the definitions.
>
> Best,
> Erick
>
> On Mon, Mar 6, 2017 at 1:41 PM, Pouliot, Scott 
> <scott.poul...@peoplefluent.com> wrote:
>> We keep getting this in our Tomcat/SOLR Logs and I was wondering if a simple 
>> schema change will alleviate this issue:
>>
>> INFO  - 2017-03-06 07:26:58.751; org.apache.solr.core.SolrCore; 
>> [Client_AdvanceAutoParts] webapp=/solr path=/select 
>> params={fl=candprofileid,+candid&start=0&q=*:*&wt=json&fq=issearchable:1+AND+cpentitymodifiedon:[2017-01-20T00:00:00.000Z+TO+*]+AND+clientreqid:17672+AND+folderid:132+AND+(engagedid_s:(0)+AND+atleast21_s:(1))+AND+(preferredlocations_s:(3799H))&rows=1000}
>> status=500 QTime=1480
>>
>> ERROR - 2017-03-06 07:26:58.766; org.apache.solr.common.SolrException; 
>> null:java.lang.IllegalStateException: field "preferredlocations_s" was 
>> indexed without position data; cannot run PhraseQuery (term=3799)
>>     at org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:277)
>>     at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:351)
>>     at org.apache.lucene.search.Weight.bulkScorer(Weight.java:131)
>>     at org.apache.lucene.search.BooleanQuery$BooleanWeight.bulkScorer(BooleanQuery.java:313)
>>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
>>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
>>     at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1158)
>>     at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:846)
>>     at org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1004)
>>     at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1517)
>>     at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1397)
>>     at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:478)
>>     at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:461)
>>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
>>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
>>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
>>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
>>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
>>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
>>     at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
>>     at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
>>     at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>     at java.lang.Thread.run(Unknown Source)
>>
>>
>> The field in question "preferredlocations_s" is not defined in schema.xml 
>> explicitly, but we have a dynamicField schema entry that covers it.
>>
>> <dynamicField name="*_s" type="text_en_splitting" indexed="true"
>> stored="true" />
>>
>> Would adding omitTermFreqAndPositions="false" to this schema line help out 
>> here?  Should I explicitly define this "preferredlocations_s" field in the 
>> schema instead and add it there?  We do have a handful of dynamic fields 
>> that all get covered by this rule, but it seems the "preferredlocations_s" 
>> field is the only one throwing errors.  All it stores is a CSV string with 
>> location IDs in it.
>>
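
For illustration only, here is a toy model of why a value like 3799H can end up as a phrase query while the other fields do not. It assumes, as suggested earlier in the thread, that the text_en_splitting analysis chain splits tokens at letter/digit boundaries. The regex is a stand-in for that behavior, not the actual WordDelimiterFilterFactory logic, and both function names are made up.

```python
import re

def split_on_letter_digit_boundaries(token):
    """Toy stand-in for word-delimiter splitting: separate letter runs from digit runs."""
    return re.findall(r"[A-Za-z]+|[0-9]+", token)

def needs_positions(token):
    """One query token that analyzes to several terms can become a phrase
    (when phrase auto-generation is on), and a phrase needs position data."""
    return len(split_on_letter_digit_boundaries(token)) > 1

# Values taken from the *_s clauses in the logged query.
for value in ["0", "1", "3799H"]:
    print(value, split_on_letter_digit_boundaries(value), needs_positions(value))
```

Under this model only 3799H splits into more than one term, which would be consistent with preferredlocations_s being the only field that throws the error, though that remains a guess until the admin/analysis page confirms how the real chain tokenizes it.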
