Re: Corrupt Index error on Target cluster

2018-09-09 Thread Susheel Kumar
Thanks. I have 6.6.2. Do you remember the exact minor version on which you
ran into the corrupt index? I did fix it using CheckIndex.
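
For reference, a rough sketch of how CheckIndex can be run against the affected
core (the lucene-core jar location assumes a stock Solr 6.x install layout; the
index path is the one from the stack trace quoted below). Stop the node first,
and only use -exorcise against a backed-up index, since it drops any segments it
cannot read:

  # Report-only pass:
  java -cp server/solr-webapp/webapp/WEB-INF/lib/lucene-core-6.6.2.jar \
    org.apache.lucene.index.CheckIndex \
    /app/solr/data/COLL_shard8_replica1/data/index.20180903220548447

  # Only after reviewing the report, and with a backup in place:
  java -cp server/solr-webapp/webapp/WEB-INF/lib/lucene-core-6.6.2.jar \
    org.apache.lucene.index.CheckIndex \
    /app/solr/data/COLL_shard8_replica1/data/index.20180903220548447 -exorcise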

On Sat, Sep 8, 2018 at 2:00 AM Stephen Bianamara  wrote:

> Hmm, when this occurred for me I was also on 6.6 between minor releases. So
> unclear if it's connected to 6.6 specifically.
>
> If you want to resolve the problem, you should be able to use the
> Collections API to delete that node from the collection, and then re-add it,
> which will trigger a resync (sketched below, after the quoted stack trace).
>
>
> On Fri, Sep 7, 2018, 10:35 AM Susheel Kumar  wrote:
>
> > No. The Solr I have is 6.6.
> >
> > On Fri, Sep 7, 2018 at 10:51 AM Stephen Bianamara <sdl1tinsold...@gmail.com> wrote:
> >
> > > I've gotten incorrect checksums when upgrading Solr versions across the
> > > cluster. Or in other words, when indexing into a mixed-version cluster.
> > > Are you running mixed versions by chance?
> > >
> > > On Fri, Sep 7, 2018, 6:07 AM Susheel Kumar  wrote:
> > >
> > > > Does anyone have insight into / has anyone faced the above errors?
> > > >
> > > > On Thu, Sep 6, 2018 at 12:04 PM Susheel Kumar  wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > We had a running cluster with CDCR, and there were some issues with
> > > > > indexing on the Source cluster which got resolved after restarting the
> > > > > nodes (in my absence...). Now I see the below errors on a shard at the
> > > > > Target cluster. Any suggestions / ideas on what could have caused this
> > > > > and what's the best way to recover?
> > > > >
> > > > > Thnx
> > > > >
> > > > > Caused by: org.apache.solr.common.SolrException: Error opening new searcher
> > > > > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2069)
> > > > > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2189)
> > > > > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1926)
> > > > > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1826)
> > > > > at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:127)
> > > > > at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:310)
> > > > > at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296)
> > > > > at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
> > > > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
> > > > > at org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:267)
> > > > > ... 34 more
> > > > > Caused by: org.apache.lucene.index.CorruptIndexException: Corrupted bitsPerDocBase: 6033 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard8_replica1/data/index.20180903220548447/_9nsy.tvx")))
> > > > > at org.apache.lucene.codecs.compressing.CompressingStoredFieldsIndexReader.<init>(CompressingStoredFieldsIndexReader.java:89)
> > > > > at org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.<init>(CompressingTermVectorsReader.java:126)
> > > > > at org.apache.lucene.codecs.compressing.CompressingTermVectorsFormat.vectorsReader(CompressingTermVectorsFormat.java:91)
> > > > > at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:128)
> > > > > at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:74)
> > > > > at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
> > > > > at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:197)
> > > > > at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:103)
> > > > > at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:467)
> > > > > at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:103)
> > > > > at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:79)
> > > > > at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:39)
> > > > > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2033)
> > > > > ... 43 more
> > > > > Suppressed: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=e5bf0d15 actual=21722825 (resource=BufferedChecksumInde
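
A rough sketch of the delete / re-add that Stephen suggests above, using the
Collections API (the collection, shard, replica and node names here are
placeholders, not from the thread; the real ones can be read from
/admin/collections?action=CLUSTERSTATUS):

  # Drop the replica holding the corrupt index (placeholder names):
  curl "http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=COLL&shard=shard8&replica=core_node8"

  # Re-add a replica for that shard; it will do a full index fetch from the shard leader:
  curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=COLL&shard=shard8&node=target-host:8983_solr"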

504 timeout

2018-09-09 Thread John Blythe
hi all. we just migrated to cloud on friday night (woohoo!). everything is
looking good (great!) overall. we did, however, just run into a hiccup.
running a query like this got us a 504 gateway time-out error:

**some* *foo* *bar* *query**

it was about 6 partials with encapsulating wildcards that someone was
running that gave the error. doing 4 or 5 of them worked fine, but upon
adding the last one or two it went kaput. all operations have been zippier
since the migration than they were before, except for some of those wildcard
queries, which took time (if they worked at all). is this something related
directly to our server configuration, or is there some solr/cloud config'ing
we could work on that would allow better response to these sorts of queries
(though it'd be at a cost, i'd imagine)?

thanks for any insight!

best,

--
John Blythe


Re: 504 timeout

2018-09-09 Thread Erick Erickson
First of all, wildcards are evil. Be sure that the reason people are
using wildcards wouldn't be better served by proper tokenizing,
perhaps something like stemming etc.

Assuming that wildcards must be handled though, there are two main strategies:
1> If you want to use leading wildcards, look at
ReversedWildcardFilterFactory. For something like abc* (trailing
wildcard), conceptually Lucene has to construct a big OR query of
every term that starts with "abc". That's not hard and is also pretty
fast: just jump to the first term that starts with "abc" and gather
all of them (they're sorted lexically) until you get to the first term
starting with "abd".

_Leading_ wildcards are a whole 'nother story. *abc means that each
and every distinct term in the field must be enumerated. The first
matching term could be abc itself and the last term in the field zzzabc;
there's no way to tell without checking every one.
ReversedWildcardFilterFactory handles this by indexing the term, well,
reversed, so in the above example not only would the term abc be indexed,
but also cba. Now both leading and trailing wildcards are
automagically made into trailing wildcards.
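
A hedged sketch of such a field type added through the Schema API (the
field-type name, tokenizer and filter chain below are illustrative choices, not
anything from this thread, and a managed schema is assumed). The filter only
goes on the index-time analyzer; the query parser then rewrites leading-wildcard
queries against fields of this type automatically:

  curl -X POST -H 'Content-type:application/json' \
    "http://localhost:8983/solr/mycollection/schema" -d '{
    "add-field-type": {
      "name": "text_rev_wildcard",
      "class": "solr.TextField",
      "indexAnalyzer": {
        "tokenizer": { "class": "solr.StandardTokenizerFactory" },
        "filters": [
          { "class": "solr.LowerCaseFilterFactory" },
          { "class": "solr.ReversedWildcardFilterFactory",
            "withOriginal": true, "maxPosAsterisk": 2, "maxPosQuestion": 1 }
        ]
      },
      "queryAnalyzer": {
        "tokenizer": { "class": "solr.StandardTokenizerFactory" },
        "filters": [ { "class": "solr.LowerCaseFilterFactory" } ]
      }
    }
  }'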

2> If you must allow leading and trailing wildcards on the same term,
*abc*, consider ngramming; bigrams are usually sufficient. So aaabcde
is indexed as aa, aa, ab, bc, cd, de, and searching for *abc* becomes
searching for "ab bc".
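
And a similarly rough bigram field type via the Schema API (illustrative names
again, and again assuming a managed schema). How the substring is best queried
against it, individual grams vs. a phrase of grams, is worth verifying against a
sample of real data:

  curl -X POST -H 'Content-type:application/json' \
    "http://localhost:8983/solr/mycollection/schema" -d '{
    "add-field-type": {
      "name": "text_bigram",
      "class": "solr.TextField",
      "indexAnalyzer": {
        "tokenizer": { "class": "solr.StandardTokenizerFactory" },
        "filters": [
          { "class": "solr.LowerCaseFilterFactory" },
          { "class": "solr.NGramFilterFactory", "minGramSize": 2, "maxGramSize": 2 }
        ]
      },
      "queryAnalyzer": {
        "tokenizer": { "class": "solr.StandardTokenizerFactory" },
        "filters": [ { "class": "solr.LowerCaseFilterFactory" } ]
      }
    }
  }'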

Both of these make the index larger, but usually by surprisingly
little. People will also index these variants in separate fields on
occasion; it depends on the use cases you need to support. Ngramming, for
instance, would find "ab" in the above with no wildcards at all.

Best,
Erick


Solr Index Issues

2018-09-09 Thread Bineesh
Hi Team,

We are using Nutch 1.15 and Solr 6.6.3

We tried crawling one of the URLs and noticed issues while indexing data
to Solr. Below is the capture from the logs:

Caused by:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://localhost:8983/solr/nutch: Expected mime type
application/octet-stream but got text/html. 

Here in the log I see the collection name is nutch, but the actual collection
name I created is Nutch1.15_Test.

Given below is the command used for crawling:

bin/nutch solrindex http://10.150.17.32:8983/solr/Nutch1.15_Test
crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
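
One quick sanity check (a sketch; the URL is just the one from the command
above) is to confirm the collection answers with a normal Solr response rather
than an HTML error page, which is what the application/octet-stream vs.
text/html complaint usually indicates:

  curl "http://10.150.17.32:8983/solr/Nutch1.15_Test/select?q=*:*&rows=0&wt=json"

If that works but the indexer log still shows http://localhost:8983/solr/nutch,
then the Solr URL Nutch actually uses is coming from its indexer configuration
(e.g. a default) rather than from the command line.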


Please suggest any workarounds if available. Thank you


