Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

2018-10-04 Thread Ere Maijala

Hi,

In addition to what others wrote already, there are a couple of things 
that might trigger a sudden memory allocation surge that you can't really 
account for:


1. Deep paging, especially in a sharded index. Don't allow it and you'll 
be much happier.


2. Faceting without docValues especially in a large index.

These would be my top two things to check before anything else. I've 
gone from 48 GB heap and GC having massive trouble keeping up to 8 GB 
heap and no trouble at all just by getting rid of deep paging and using 
docValues with all faceted fields.
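
For reference, the usual replacement for deep paging is Solr's cursorMark. A 
minimal sketch of building the first cursor request (the collection name, the 
uniqueKey field "id", and the localhost URL are assumptions, not from the 
thread):

```python
from urllib.parse import urlencode

# Hypothetical endpoint; "mycollection" and uniqueKey field "id" are
# placeholders -- adjust to your own collection and schema.
base_url = "http://localhost:8983/solr/mycollection/select"

def cursor_params(cursor_mark="*", rows=100):
    # cursorMark requires a sort clause that includes the uniqueKey field,
    # and the first request always passes cursorMark=*. Each response returns
    # a nextCursorMark to substitute into the following request.
    return {"q": "*:*", "rows": rows, "sort": "id asc", "cursorMark": cursor_mark}

first_request = base_url + "?" + urlencode(cursor_params())
print(first_request)
```

Unlike start/rows deep paging, each cursor request only has to keep `rows` 
documents in memory per shard, which is what removes the allocation surge.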


--Ere

yasoobhaider wrote on 3.10.2018 at 17.01:

Hi

I'm working with a Solr cluster with master-slave architecture.

Master and slave config:
ram: 120GB
cores: 16

At any point there are between 10 and 20 slaves in the cluster, each serving
~2k requests per minute. Each slave houses two collections of approx 10G
(~2.5mil docs) and 2G (10mil docs) when optimized.

I am working with Solr 6.2.1

Solr configuration:

-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+ParallelRefProcEnabled
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:-OmitStackTraceInFastThrow
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:ConcGCThreads=4
-XX:MaxTenuringThreshold=8
-XX:ParallelGCThreads=4
-XX:PretenureSizeThreshold=64m
-XX:SurvivorRatio=15
-XX:TargetSurvivorRatio=90
-Xmn10G
-Xms80G
-Xmx80G

Some of these configurations have been reached through multiple rounds of
trial and error over time, including the huge heap size.

This cluster usually runs without any error.

In the usual scenario, old gen gc is triggered according to the
configuration at 50% old gen occupancy, and the collector clears out the
memory over the next minute or so. This happens every 10-15 minutes.

However, I have noticed that sometimes the GC pattern of the slaves
completely changes and old gen gc is not able to clear the memory.

After observing the gc logs closely for multiple old gen gc collections, I
noticed that the old gen gc is triggered at 50% occupancy, but if there is a
GC Allocation Failure before the collection completes (after CMS Initial
Remark but before CMS reset), the old gen collection is not able to clear
much memory. And as soon as this collection completes, another old gen gc is
triggered.

In the worst-case scenario, this cycle of old gen gc triggering and GC
allocation failures keeps repeating, the old gen memory keeps increasing,
leading to a single threaded STW GC, which is not able to do much, and I
have to restart the solr server.
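
One way to confirm this failure mode is to count "concurrent mode failure" 
events in the GC log, which is what the JVM prints when an allocation fails 
while CMS is still collecting and it falls back to a single-threaded 
stop-the-world full GC. A minimal sketch (the two log lines are made-up 
examples in -XX:+PrintGCDetails format, not from this cluster):

```python
import re

# Two hypothetical -XX:+PrintGCDetails log lines: a normal young-gen
# collection, and a full GC caused by a concurrent mode failure (CMS
# lost the race and fell back to a single-threaded stop-the-world GC).
sample_gc_log = """\
2018-10-01T12:00:01.000+0000: 100.123: [GC (Allocation Failure) [ParNew: ...]
2018-10-01T12:00:05.000+0000: 104.456: [Full GC (Allocation Failure) [CMS (concurrent mode failure): ...]
"""

def count_concurrent_mode_failures(log_text):
    """Count how often CMS fell back to a stop-the-world full collection."""
    return len(re.findall(r"concurrent mode failure", log_text))

print(count_concurrent_mode_failures(sample_gc_log))  # -> 1
```

A non-zero count over a time window lines up with the "old gen GC triggered 
but cannot clear memory" pattern described above.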

The last time this happened after the following sequence of events:

1. We optimized the bigger collection bringing it to its optimized size of
~10G.
2. For an unrelated reason, we had stopped indexing to the master. We
usually index at a low-ish throughput of ~1mil docs/day. This is relevant
because when we are indexing, the size of the collection increases, and this
affects the heap used by the collection.
3. The slaves started behaving erratically, with old gc collection not being
able to free up the required memory and finally being stuck in a STW GC.

As unlikely as this sounds, this is the only thing that changed on the
cluster. There was no change in query throughput or type of queries.

I restarted the slaves multiple times but the gc behaved in the same way for
over three days. Then when we fixed the indexing and made it live, the
slaves resumed their original gc pattern and are running without any issues
for over 24 hours now.

I would really be grateful for any advice on the following:

1. What could be the reason behind CMS not being able to free up the memory?
What are some experiments I can run to solve this problem?
2. Can stopping/starting indexing be a reason for such drastic changes to GC
pattern?
3. I have read in multiple places on this mailing list that the heap size
should be much lower (2x-3x the size of the collection), but the last time I
tried a lower heap, CMS was not able to run smoothly and a STW GC occurred
which was only resolved by a restart. My reasoning is that the type of
queries and the throughput are also factors in deciding the heap size, so it
may simply be that our queries create too many objects. Is my reasoning
correct, or should I try a lower heap size again (if it helps achieve a
stable gc pattern)?

(4. Silly question, but what is the right way to ask a question on this
mailing list: via mail or via the nabble website? I sent this question
earlier as a mail, but it was not showing up on the nabble website, so I am
posting it from the website now.)


Logs which show this:


Desired survivor size 568413384 bytes, new threshold 2 (max 8)
- age   1:  437184344 bytes,  4371843

Re: SPLITSHARD throwing OutOfMemory Error

2018-10-04 Thread Zheng Lin Edwin Yeo
Hi Atita,

What is the amount of memory that you have in your system?
And what is your index size?

Regards,
Edwin

On Tue, 25 Sep 2018 at 22:39, Atita Arora  wrote:

> Hi,
>
> I am working on a test setup with Solr 6.1.0 cloud with 1 collection
> sharded across 2 shards with no replication. When a SPLITSHARD command is
> triggered, it throws "java.lang.OutOfMemoryError: Java heap space" every time.
> I tried multiple heap settings of 8, 12 & 20G, but every time it creates the
> 2 sub-shards and then eventually fails.
> I know the issue => https://jira.apache.org/jira/browse/SOLR-5214 has been
> resolved but the trace looked very similar to this one.
> Also just to ensure that I do not run into exceptions due to merge as
> reported in this ticket, I also tried running optimize before proceeding
> with splitting the shard.
> I issued the following commands :
>
> 1.
>
> http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD
>
> This threw java.lang.OutOfMemoryError: Java heap space
>
> 2.
>
> http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD&async=1000
>
> Then I ran with async=1000 and checked the status. Every time it creates
> the sub-shards, but does not split the index.
>
> Is there something that I am not doing correctly?
>
> Please guide.
>
> Thanks,
> Atita
>


Re: SPLITSHARD throwing OutOfMemory Error

2018-10-04 Thread Atita Arora
Hi Edwin,

Thanks for following up on this.

So here are the configs :

Memory - 30G - 20 G to Solr
Disk - 1TB
Index = ~ 500G

I think this could be happening because during a shard split, both the
unsplit index and the split index persist on the instance, and that may be
causing it. I actually tried SPLITSHARD on another instance with an index
size of 64G and it went through without any issues.

I would appreciate any additional information you may have on this issue.

Thanks again.

Regards,

Atita

On Thu, Oct 4, 2018 at 9:47 AM Zheng Lin Edwin Yeo 
wrote:

> Hi Atita,
>
> What is the amount of memory that you have in your system?
> And what is your index size?
>
> Regards,
> Edwin
>


Re: SPLITSHARD throwing OutOfMemory Error

2018-10-04 Thread Andrzej Białecki
I know it’s not much help if you’re stuck with Solr 6.1 … but Solr 7.5 comes 
with an alternative strategy for SPLITSHARD that doesn’t consume as much memory 
and consumes almost no additional disk space on the leader. This strategy 
can be turned on with the “splitMethod=link” parameter.
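
For anyone trying this later, such a request could look roughly like the 
following (host, collection, and shard names are placeholders; the 
splitMethod=link parameter exists in Solr 7.5+, not on the 6.1 cluster 
discussed above):

```python
from urllib.parse import urlencode

# Hypothetical host/collection/shard names -- adjust to your deployment.
params = {
    "action": "SPLITSHARD",
    "collection": "testcollection",
    "shard": "shard1",
    "splitMethod": "link",
}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
```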

> On 4 Oct 2018, at 10:23, Atita Arora  wrote:
> 
> Hi Edwin,
> 
> Thanks for following up on this.
> 
> So here are the configs :
> 
> Memory - 30G - 20 G to Solr
> Disk - 1TB
> Index = ~ 500G
> 
> and I think that it possibly is due to the reason why this could be
> happening is that during split shard, the unsplit index + split index
> persists on the instance and may be causing this.
> I actually tried splitshard on another instance with index size 64G and it
> went through without any issues.
> 
> I would appreciate if you have additional information to enlighten me on
> this issue.
> 
> Thanks again.
> 
> Regards,
> 
> Atita
> 

—

Andrzej Białecki



Re: SPLITSHARD throwing OutOfMemory Error

2018-10-04 Thread Atita Arora
Hi Andrzej,

We have been weighing a lot of other reasons to upgrade our Solr for a very
long time, like better authentication handling, backups using CDCR, and the
new replication mode, and this has probably just given us another reason to
upgrade.
Thank you so much for the suggestion; it's good to know that something like
this exists. We'll find out more about it.

Great day ahead!

Regards,
Atita



On Thu, Oct 4, 2018 at 11:28 AM Andrzej Białecki  wrote:

> I know it’s not much help if you’re stuck with Solr 6.1 … but Solr 7.5
> comes with an alternative strategy for SPLITSHARD that doesn’t consume as
> much memory and nearly doesn’t consume additional disk space on the leader.
> This strategy can be turned on by “splitMethod=link” parameter.
>
> —
>
> Andrzej Białecki
>
>


Update Request Processors are Not Chained

2018-10-04 Thread Furkan KAMACI
I've defined my update processors as:

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <str name="langid.fl">content</str>
      <str name="langid.whitelist">en,tr</str>
      <str name="langid.langField">language_code</str>
      <str name="langid.fallback">other</str>
      <bool name="langid.map">true</bool>
      <bool name="langid.overwrite">true</bool>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">content</str>
    <int name="minTokenLen">3</int>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<updateRequestProcessorChain name="ignore-commit-from-client" default="true">
  <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
    <int name="statusCode">200</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
My /update/extract request handler is as follows:

<requestHandler name="/update/extract" startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="captureAttr">true</str>
    <str name="uprefix">ignored_</str>
    <str name="fmap.content">content</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.language">ignored_</str>
  </lst>
  <lst name="invariants">
    <str name="update.chain">dedupe</str>
    <str name="update.chain">langid</str>
    <str name="update.chain">ignore-commit-from-client</str>
  </lst>
</requestHandler>

The dedupe chain works and the signature field is populated, but the langid
processor is not triggered with this combination. When I swap their places:

<requestHandler name="/update/extract" startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="captureAttr">true</str>
    <str name="uprefix">ignored_</str>
    <str name="fmap.content">content</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.language">ignored_</str>
  </lst>
  <lst name="invariants">
    <str name="update.chain">langid</str>
    <str name="update.chain">dedupe</str>
    <str name="update.chain">ignore-commit-from-client</str>
  </lst>
</requestHandler>


langid works but dedupe is not activated (the signature field disappears).

I use Solr 6.3. How can I solve this problem?

Kind Regards,
Furkan KAMACI


Re: Update Request Processors are Not Chained

2018-10-04 Thread Furkan KAMACI
I found the problem :) The problem is that the processors are not combined into one chain.
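
Only one update chain runs per update request, so the three processors have to 
live in a single updateRequestProcessorChain. A sketch of what such a combined 
chain could look like (the chain name is made up, and the comment placeholders 
stand in for the factory settings from the original definitions):

```xml
<updateRequestProcessorChain name="langid-dedupe" default="true">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <!-- langid.* settings from the original langid chain go here -->
  </processor>
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <!-- signature settings from the original dedupe chain go here -->
  </processor>
  <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
    <int name="statusCode">200</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The request handler then references just this one chain through a single 
update.chain parameter instead of listing the three chains separately.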

On Thu, Oct 4, 2018 at 3:57 PM Furkan KAMACI  wrote:

> I've defined my update processors as:
>
> 
> class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>   
> content
> en,tr
> language_code
> other
> true
> true
>   
> 
>
>
>  
>
>  
>
>  true
>  signature
>  false
>  content
>  3
>   name="signatureClass">org.apache.solr.update.processor.TextProfileSignature
>
>
>
>  
>
>   default="true">
>
>  200
>
>
>
>
>  
>
> My /update/extract request handler is as follows:
>
>  startup="lazy"
> class="solr.extraction.ExtractingRequestHandler" >
>   
> true
> true
> ignored_
> content
> ignored_
> ignored_
>   
>   
> dedupe
> langid
> ignore-commit-from-client
>  
> 
>
> dedupe chain works nd signature field is populated but langid processor is
> not triggered at this combination. When I change their places:
>
>  startup="lazy"
> class="solr.extraction.ExtractingRequestHandler" >
>   
> true
> true
> ignored_
> content
> ignored_
> ignored_
>   
>   
> langid
> dedupe
> ignore-commit-from-client
>  
> 
>
> langid works but dedup is not activated (signature field is disappears).
>
> I use Solr 6.3. How can I solve this problem?
>
> Kind Regards,
> Furkan KAMACI
>


Filtering group query results

2018-10-04 Thread Greenhorn Techie
Hi,

We have a requirement where we need to perform a group query in Solr where
results are grouped by user-name (which is a field in our indexes) . We
then need to filter the results based on numFound response parameter
present under each group. In essence, we want to return results only where
numFound=1.

Looking into the documentation, I couldn’t figure out any mechanism to
achieve this. So wondering if there is a possibility to achieve this
requirement with the existing building blocks of Solr query mechanism.

Thanks


Re: solr and diversification

2018-10-04 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
The use case is ranking news, Joel. And yes, I have the feeling that it
might improve relevance; in 2011/2012 there was a lot of work on this in
academia.

Thanks Tim, I'll check out MMR.

From: solr-user@lucene.apache.org At: 09/28/18 20:24:44To:  
solr-user@lucene.apache.org
Subject: Re: solr and diversification

Interesting, I had not heard of MMR.


Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Sep 28, 2018 at 10:43 AM Tim Allison  wrote:

> If you haven’t already, might want to check out maximal marginal
> relevance...original paper: Carbonell and Goldstein.
>
> On Thu, Sep 27, 2018 at 7:29 PM Joel Bernstein  wrote:
>
> > Yeah, I think your plan sounds fine.
> >
> > Do you have a specific use case for diversity of results. I've been
> > wondering if diversity of results would provide better perceived
> relevance.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Thu, Sep 27, 2018 at 1:39 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <
> > dceccarel...@bloomberg.net> wrote:
> >
> > > Yeah, I think Kmeans might be a way to implement the "top 3 stories
> that
> > > are more distant", but you can also have a more naïve (and faster)
> > strategy
> > > like
> > >  - sending a threshold
> > >  - scan the documents according to the relevance score
> > >  - select the top documents that have diversity > threshold.
> > >
> > > I would allow to define the strategy and select it from the request.
> > >
> > > From: solr-user@lucene.apache.org At: 09/27/18 18:25:43To:  Diego
> > > Ceccarelli (BLOOMBERG/ LONDON ) ,  solr-user@lucene.apache.org
> > > Subject: Re: solr and diversification
> > >
> > > I've thought about this problem a little bit. What I was considering
> was
> > > using Kmeans clustering to cluster the top 50 docs, then pulling the
> top
> > > scoring doc form each cluster as the top documents. This should be fast
> > and
> > > effective at getting diversity.
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > >
> > > On Thu, Sep 27, 2018 at 1:20 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <
> > > dceccarel...@bloomberg.net> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm considering to write a component for diversifying the results. I
> > know
> > > > that diversification can be achieved by using grouping but I'm
> thinking
> > > > about something different and query biased.
> > > > The idea is to have something that gets applied after the normal
> > > retrieval
> > > > and selects the top k documents more diverse based on some distance
> > > metric:
> > > >
> > > > Example:
> > > > imagine that you are asking for 10 rows, and you set diversify.rows=3
> > > > diversity.metric=tfidf  diversify.field=body
> > > >
> > > > Solr might retrieve the top 10 rows as usual, extract tfidf
> vectors
> > > > for the bodies and select the top 3 stories that are more distant
> > > according
> > > > to the cosine similarity.
> > > > This would be different from grouping because documents will be
> > > > 'collapsed' or not based on the subset of documents retrieved for the
> > > > query.
> > > > Do you think it would make sense to have it as a component?  any
> > feedback
> > > > / idea?
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
>




Re: Modify the log directory for dih

2018-10-04 Thread Shawn Heisey

On 10/4/2018 12:30 AM, lala wrote:

Hi,
I am using:

Solr: 7.4
OS: windows7
I start solr using a service on startup.


In that case, I really have no idea where anything is on your system.

There is no service installation from the Solr project for Windows -- 
either you obtained that from somewhere else, or it's something written 
in-house.  Either way, you would need to talk to whoever created that 
service installation for help locating files on your setup.


In general, you need to find the log4j2.xml file that is controlling 
your logging configuration and modify it.  It contains a sample of how 
to log something to a separate file -- the slow query log.  That example 
redirects a specific logger name (which is similar to a full qualified 
class name and in most cases *is* the class name) to a different logfile.


Version 7.4 has a bug when running on Windows that causes a lot of 
problems specific to logging.


https://issues.apache.org/jira/browse/SOLR-12538

That problem has been fixed in the 7.5 release.  You can also fix it by 
editing the solr.cmd script manually.



Additional info: I am developing a web application that uses solr as search
engine, I use DIH to index folders in solr using the
FileListEntityProcessor. What I need is logging each index operation in a
file that I can reach & read to be able to detect failed index files in the
folder.


The FileListEntityProcessor class has absolutely no logging in it.  If 
you require that immediately, you would need to add logging commands to 
the source code and recompile Solr yourself to produce a package with 
your change.  With an enhancement issue in Jira, we can review what 
logging is suitable for the class, and probably make it work like 
SQLEntityProcessor in that regard.  If that's done the way I think it 
should be, then you could add config in log4j2.xml to enable DEBUG 
level logging for that class specifically and write its logs to a 
separate logfile.
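
As a rough illustration of that kind of redirect (the appender name, file 
name, sizes, and level here are assumptions, and the logger only produces 
output once the class actually logs something), the log4j2.xml addition could 
look like:

```xml
<!-- Inside <Appenders>: a dedicated file for DIH-related logging -->
<RollingFile name="DihLogFile"
             fileName="${sys:solr.log.dir}/dih.log"
             filePattern="${sys:solr.log.dir}/dih.log.%i">
  <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p (%t) %c{1.} %m%n"/>
  <Policies>
    <SizeBasedTriggeringPolicy size="32 MB"/>
  </Policies>
  <DefaultRolloverStrategy max="10"/>
</RollingFile>

<!-- Inside <Loggers>: route one class's output to that file only -->
<Logger name="org.apache.solr.handler.dataimport.FileListEntityProcessor"
        level="debug" additivity="false">
  <AppenderRef ref="DihLogFile"/>
</Logger>
```

Setting additivity="false" keeps those messages out of the main solr.log.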


Thanks,
Shawn



Re: Filtering group query results

2018-10-04 Thread Shawn Heisey

On 10/4/2018 7:10 AM, Greenhorn Techie wrote:

We have a requirement where we need to perform a group query in Solr where
results are grouped by user-name (which is a field in our indexes) . We
then need to filter the results based on numFound response parameter
present under each group. In essence, we want to return results only where
numFound=1.


I don't think this is possible in Solr.  I'm reasonably sure that the 
document count isn't calculated until after all the querying and 
filtering is done.


It would be easy enough to do on the client side -- just skip over any 
group where the number of results is not what you're looking for.
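
A minimal sketch of that client-side filtering over a grouped JSON response 
(the response structure below is a hand-written example of the 
grouped-by-user-name shape, not real Solr output):

```python
# A minimal client-side filter over a grouped Solr response. The dict below
# mimics the JSON shape of a group.field=user-name query result.
response = {
    "grouped": {
        "user-name": {
            "groups": [
                {"groupValue": "alice",
                 "doclist": {"numFound": 1, "docs": [{"id": "1"}]}},
                {"groupValue": "bob",
                 "doclist": {"numFound": 3, "docs": [{"id": "2"}]}},
            ]
        }
    }
}

# Keep only the groups whose total hit count is exactly 1.
unique_groups = [
    group for group in response["grouped"]["user-name"]["groups"]
    if group["doclist"]["numFound"] == 1
]
print([group["groupValue"] for group in unique_groups])  # -> ['alice']
```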


I've got no idea how difficult it would be to write this kind of 
capability into the server side.  Off hand I would guess that it's 
probably not super difficult for someone who already knows that part of 
the code.  I don't know that code, so I'd be spending a lot of time 
learning it before I could make a change.


Thanks,
Shawn



Boolean clauses in ComplexPhraseQuery

2018-10-04 Thread Chuming Chen
Hi All,

Does Solr support boolean clauses inside ComplexPhraseQuery? 

For example: {!complexphrase inOrder=true}  NOT (field: “value is this” OR 
field: “value is that”)

Thanks,

Chuming



Re: checksum failed (hardware problem?)

2018-10-04 Thread Stephen Bianamara
To be more concrete: Is the definitive test of whether or not a core's
index is corrupt to copy it onto a new set of hardware and attempt to write
to it? If this is a definitive test, we can run the experiment and update
the report so you have a sense of how often this happens.

Since this is a SOLR cloud node, which is already removed but whose data
dir was preserved, I believe I can just copy the data directory to a fresh
machine and start a regular non-cloud solr node hosting this core. Can you
please confirm that this will be a definitive test, or whether there is
some aspect needed to make it definitive?

Thanks!

On Wed, Oct 3, 2018 at 2:10 AM Stephen Bianamara 
wrote:

> Hello All --
>
> As it would happen, we've seen this error on version 6.6.2 very recently.
> This is also on an AWS instance, like Simon's report. The drive doesn't
> show any sign of being unhealthy, either from cursory investigation. FWIW,
> this occurred during a collection backup.
>
> Erick, is there some diagnostic data we can find to help pin this down?
>
> Thanks!
> Stephen
>
> On Sun, Sep 30, 2018 at 12:48 PM Susheel Kumar 
> wrote:
>
>> Thank you, Simon. That basically suggests that something
>> environment-related was causing the checksum failures rather than any
>> Lucene/Solr issue.
>>
>> Eric - I did check with the hardware folks and they are aware of a VMware
>> issue where a VM hosted in an HCI environment comes to a halt state for a
>> minute or so and may be losing connections to disk/network. That may be
>> the reason for the index corruption, though they have not been able to
>> find anything specific in the logs from the time Solr ran into the issue.
>>
>> Also, I again had an issue where Solr loses the connection with ZooKeeper
>> (Client session timed out, have not heard from server in 8367ms for
>> sessionid 0x0). Does that point to a similar hardware issue? Any
>> suggestions?
>>
>> Thanks,
>> Susheel
>>
>> 2018-09-29 17:30:44.070 INFO
>> (searcherExecutor-7-thread-1-processing-n:server54:8080_solr
>> x:COLL_shard4_replica2 s:shard4 c:COLL r:core_node8) [c:COLL s:shard4
>> r:core_node8 x:COLL_shard4_replica2] o.a.s.c.SolrCore
>> [COLL_shard4_replica2] Registered new searcher
>> Searcher@7a4465b1[COLL_shard4_replica2]
>>
>> main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_7x3f(6.6.2):C826923/317917:delGen=2523)
>> Uninverting(_83pb(6.6.2):C805451/172968:delGen=2957)
>> Uninverting(_3ywj(6.6.2):C727978/334529:delGen=2962)
>> Uninverting(_7vsw(6.6.2):C872110/385178:delGen=2020)
>> Uninverting(_8n89(6.6.2):C741293/109260:delGen=3863)
>> Uninverting(_7zkq(6.6.2):C720666/101205:delGen=3151)
>> Uninverting(_825d(6.6.2):C707731/112410:delGen=3168)
>> Uninverting(_dgwu(6.6.2):C760421/295964:delGen=4624)
>> Uninverting(_gs5x(6.6.2):C540942/138952:delGen=1623)
>> Uninverting(_gu6a(6.6.2):c75213/35640:delGen=1110)
>> Uninverting(_h33i(6.6.2):c131276/40356:delGen=706)
>> Uninverting(_h5tc(6.6.2):c44320/11080:delGen=380)
>> Uninverting(_h9d9(6.6.2):c35088/3188:delGen=104)
>> Uninverting(_h80h(6.6.2):c11927/3412:delGen=153)
>> Uninverting(_h7ll(6.6.2):c11284/1368:delGen=205)
>> Uninverting(_h8bs(6.6.2):c11518/2103:delGen=149)
>> Uninverting(_h9r3(6.6.2):c16439/1018:delGen=52)
>> Uninverting(_h9z1(6.6.2):c9428/823:delGen=27)
>> Uninverting(_h9v2(6.6.2):c933/33:delGen=12)
>> Uninverting(_ha1c(6.6.2):c1056/1:delGen=1)
>> Uninverting(_ha6i(6.6.2):c1883/124:delGen=8)
>> Uninverting(_ha3x(6.6.2):c807/14:delGen=3)
>> Uninverting(_ha47(6.6.2):c1229/133:delGen=6)
>> Uninverting(_hapk(6.6.2):c523) Uninverting(_haoq(6.6.2):c279)
>> Uninverting(_hamr(6.6.2):c311) Uninverting(_hap0(6.6.2):c338)
>> Uninverting(_hapu(6.6.2):c275) Uninverting(_hapv(6.6.2):C4/2:delGen=1)
>> Uninverting(_hapw(6.6.2):C5/2:delGen=1)
>> Uninverting(_hapx(6.6.2):C2/1:delGen=1)
>> Uninverting(_hapy(6.6.2):C2/1:delGen=1)
>> Uninverting(_hapz(6.6.2):C3/1:delGen=1)
>> Uninverting(_haq0(6.6.2):C6/3:delGen=1)
>> Uninverting(_haq1(6.6.2):C1)))}
>> 2018-09-29 17:30:52.390 WARN
>>
>> (zkCallback-5-thread-91-processing-n:server54:8080_solr-SendThread(server117:2182))
>> [   ] o.a.z.ClientCnxn Client session timed out, have not heard from
>> server in 8367ms for sessionid 0x0
>> 2018-09-29 17:31:01.302 WARN
>>
>> (zkCallback-5-thread-91-processing-n:server54:8080_solr-SendThread(server120:2182))
>> [   ] o.a.z.ClientCnxn Client session timed out, have not heard from
>> server in 8812ms for sessionid 0x0
>> 2018-09-29 17:31:14.049 INFO
>> (zkCallback-5-thread-91-processing-n:server54:8080_solr-EventThread) [
>>   ] o.a.s.c.c.ConnectionManager Connection with ZooKeeper
>> reestablished.
>> 2018-09-29 17:31:14.049 INFO
>> (zkCallback-5-thread-91-processing-n:server54:8080_solr-EventThread) [
>>   ] o.a.s.c.ZkController ZooKeeper session re-connected ... refreshing
>> core states after session expiration.
>> 2018-09-29 17:31:14.051 INFO
>> (zkCallback-5-thread-91-processing-n:server54:8080_solr-EventThread) [
>>   ] o.a.s.c.c.ZkStateReader Updated live no

Connecting Solr to Nutch

2018-10-04 Thread Timeka Cobb
Hello out there! I'm trying to create a small search engine and have
installed Nutch 1.15 and Solr 7.5.0. The issue now is connecting the two,
primarily because the files required to create the Nutch core in Solr
(i.e. the basic config) don't exist. How do I go about connecting the two
so I can begin crawling websites for the engine? Please help 😊

💗💗,
Timeka Cobb


Re: SPLITSHARD throwing OutOfMemory Error

2018-10-04 Thread Zheng Lin Edwin Yeo
Hi Atita,

It would be good to consider upgrading to have the use of the better
features like better memory consumption and better authentication.

On a side note, it is also good to upgrade to Solr 7 now, as Solr indexes
can only be upgraded from the previous major release version (Solr 6) to
the current major release version (Solr 7). Since you are using Solr 6.1,
when Solr 8 comes around it will not be possible to upgrade directly,
and the index will have to be upgraded to Solr 7 first before upgrading to
Solr 8.
http://lucene.apache.org/solr/guide/7_5/indexupgrader-tool.html

Regards,
Edwin

On Thu, 4 Oct 2018 at 17:41, Atita Arora  wrote:

> Hi Andrzej,
>
> We're rather weighing on a lot of other stuff to upgrade our Solr for a
> very long time like better authentication handling, backups using CDCR, new
> Replication mode and this probably has just given us another reason to
> upgrade.
> Thank you so much for the suggestion, I think its good to know about
> something like this exists. We'll find out more about this.
>
> Great day ahead!
>
> Regards,
> Atita