Securing ONLY the web interface console

2018-03-19 Thread Jesus Olivan
hi!

I'm trying to password-protect only the Solr web interface (not queries
launched from my app). I'm currently using SolrCloud 6.6.0 with external
ZooKeepers. I've read tons of docs about it, but I couldn't find a proper
way to secure ONLY the web admin console. Can anybody shed some light on
this, please? =)

Thanks in advance!
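
A sketch of one possible direction: Solr 6.6 ships a Basic Authentication
plugin plus rule-based authorization, and with "blockUnknown": false only
the endpoints matched by a permission require a login, so queries sent by
the app stay open. Caveat: in 6.x the admin UI's static pages themselves
are not protected; it is the admin APIs behind them that get locked down.
The credentials line below is the reference guide's well-known example
(user solr / password SolrRocks), and the whole file is an untested
sketch, not a recipe:

    {
      "authentication": {
        "blockUnknown": false,
        "class": "solr.BasicAuthPlugin",
        "credentials": {
          "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
        }
      },
      "authorization": {
        "class": "solr.RuleBasedAuthorizationPlugin",
        "permissions": [
          {"name": "security-edit", "role": "admin"},
          {"name": "config-edit", "role": "admin"},
          {"name": "core-admin-edit", "role": "admin"},
          {"name": "collection-admin-edit", "role": "admin"}
        ],
        "user-role": {"solr": "admin"}
      }
    }

The file is uploaded to ZooKeeper as /security.json, for example with
server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd putfile
/security.json security.json (zk1:2181 being a placeholder for your
ensemble).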


Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Jesus Olivan
Hi Adam,

IMHO you could try increasing the heap to 20 GB (with 46 GB of physical
RAM, your JVM can afford a larger heap without starving the memory needed
outside the heap).

Another good option would be to raise -XX:CMSInitiatingOccupancyFraction
from 50 to 75. I think the CMS collector works better when the old
generation is more populated.

I usually set the survivor spaces to a smaller size. If you try a
SurvivorRatio of 6, I think performance would improve.

Another good practice, in my experience, is to set a static NewSize
instead of -XX:NewRatio=3. You could try -XX:NewSize=7000m and
-XX:MaxNewSize=7000m (one third of the total heap space is commonly
recommended).

Finally, my best results, after deep JVM R&D work related to Solr, came
from removing the CMSScavengeBeforeRemark flag and applying a new one:
ParGCCardsPerStrideChunk.

It would also be good to set ParallelGCThreads and ConcGCThreads to
their optimal values, but we need your system's CPU count to work those
out. Can you provide this data, please?
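
Putting those suggestions together, a candidate option set would look
roughly like the sketch below. This is untested, and the sizes assume
the 20 GB heap suggested above. Note that ParGCCardsPerStrideChunk is a
numeric diagnostic option, so it needs -XX:+UnlockDiagnosticVMOptions
and a value (4096 is a commonly cited starting point, not a measured
optimum):

    -Xms20g -Xmx20g
    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
    -XX:+UseCMSInitiatingOccupancyOnly
    -XX:CMSInitiatingOccupancyFraction=75
    -XX:NewSize=7000m -XX:MaxNewSize=7000m
    -XX:SurvivorRatio=6
    -XX:+UnlockDiagnosticVMOptions
    -XX:ParGCCardsPerStrideChunk=4096

CMSScavengeBeforeRemark would be removed from the current option list,
and ParallelGCThreads/ConcGCThreads left as they are until the CPU count
is known.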

Regards


2018-04-11 12:01 GMT+02:00 Adam Harrison-Fuller :

> Hey all,
>
> I was wondering if I could get some JVM/GC tuning advice to resolve an
> issue that we are experiencing.
>
> Full disclaimer, I am in no way a JVM/Solr expert so any advice you can
> render would be greatly appreciated.
>
> Our SolrCloud nodes are running into OOM exceptions under load.
> This issue has only started manifesting itself over the last few months,
> during which time the only change I can discern is an increase in index
> size.  They are running Solr 5.5.2 on OpenJDK version "1.8.0_101".  The
> index is currently 58G and the server has 46G of physical RAM and runs
> nothing other than the Solr node.
>
> The JVM is invoked with the following JVM options:
> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
> -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888 -XX:+ManagementServer
> -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8
> -XX:NewRatio=3 -XX:OldPLABSize=16
> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 3
> /data/gnpd/solr/logs
> -XX:ParallelGCThreads=4
> -XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864
> -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
> -XX:TargetSurvivorRatio=90
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>
> These values were decided upon several years ago by a colleague, based
> upon some suggestions from this mailing list, with an index size of ~25G.
>
> I have imported the GC logs into GCViewer and attached a link to a
> screenshot showing the lead-up to an OOM crash.  Interestingly, the young
> generation space is almost empty before the repeated GCs and subsequent
> crash.
> https://imgur.com/a/Wtlez
>
> I was considering slowly increasing the amount of heap available to the
> JVM until the crashes stop; any other suggestions?  I'm trying to get the
> nodes stable without the GC taking forever to run.
>
> Additional information can be provided on request.
>
> Cheers!
> Adam
>


Full import alternatives

2018-04-13 Thread Jesus Olivan
Hi!

we're trying to launch a full import of approximately 375 million docs
from a MySQL database to our SolrCloud cluster. Until now, this full
import process has taken around 24-27 hours to finish due to a huge
import query (several GROUP BYs, LEFT JOINs, etc.), but after another
import query modification (adding more complexity), we're unable to
complete this full import from MySQL.

We've done some research on migrating to PostgreSQL, but that is not a
real option at this time, because it implies a big refactoring effort
across several dev teams.

Are there alternative ways to perform this full import successfully?

Any ideas are welcome :)

Thanks in advance!
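
One DIH knob worth checking before giving up on it: with MySQL, setting
batchSize="-1" on the JdbcDataSource makes DIH hand the driver a
streaming fetch size, so the huge result set is not buffered client-side
(it will not, however, reduce the temp space the database itself needs
for the GROUP BYs). A minimal data-config sketch; connection details,
names, and the query are all hypothetical:

    <dataConfig>
      <!-- batchSize="-1": stream rows from MySQL instead of buffering
           the entire result set in the client -->
      <dataSource type="JdbcDataSource"
                  driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://dbhost:3306/bigdb"
                  user="solr_import"
                  password="secret"
                  batchSize="-1"/>
      <document>
        <entity name="doc"
                query="SELECT id, title, body FROM import_view">
          <field column="id"    name="id"/>
          <field column="title" name="title"/>
          <field column="body"  name="body"/>
        </entity>
      </document>
    </dataConfig>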


Re: Full import alternatives

2018-04-13 Thread Jesus Olivan
Hi Shawn,

thanks for your answer. What happens is that when we launch the full
import, the process doesn't finish (we waited more than 60 hours last
time before cancelling it, because that is not an acceptable time for
us). There weren't any errors in the Solr logfile, simply because it was
working fine; the problem is that it ran forever and never finished. We
also tried it on an Aurora cluster on AWS, and after 20 hours of work it
failed due to lack of space in the Aurora tmp folder.



2018-04-13 18:41 GMT+02:00 Shawn Heisey :

> On 4/13/2018 10:11 AM, Jesus Olivan wrote:
> > we're trying to launch a full import of approximately 375 million docs
> > from a MySQL database to our SolrCloud cluster. Until now, this full
> > import process has taken around 24-27 hours to finish due to a huge
> > import query (several GROUP BYs, LEFT JOINs, etc.), but after another
> > import query modification (adding more complexity), we're unable to
> > complete this full import from MySQL.
> >
> > We've done some research on migrating to PostgreSQL, but that is not a
> > real option at this time, because it implies a big refactoring effort
> > across several dev teams.
> >
> > Are there alternative ways to perform this full import successfully?
>
> DIH is a capable tool, and for what it does, it's remarkably efficient.
>
> It can't really be made any faster, because it's single threaded.  To
> get increased index speed with Solr, you must index documents from
> several sources/processes/threads at the same time.  Writing custom
> software that can retrieve information from your source, build the
> documents you require, and send several update requests simultaneously
> will yield the best results.  The source itself may be a bottleneck
> though -- this is frequently the case, and Solr is often MUCH faster
> than the information source.
>
> You said that you're unable to execute an updated import from MySQL.
> What exactly happens when you try?  Are there any errors in your solr
> logfile?
>
> I'm not going to debate whether MySQL or PostgreSQL is the better
> solution.  For my indexes, my source data is in MySQL.  It works well,
> but full rebuilds using DIH are slower than I would like -- because it's
> single-threaded.  Our overall system architecture would probably be
> improved by a switch to PostgreSQL, but it would be an extremely
> time-consuming transition process.  We aren't having any real issues
> with MySQL, so we have no incentive to spend the required effort.
>
> Thanks,
> Shawn
>
>
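
To make the multi-threaded approach described above concrete, here is a
minimal SolrJ sketch. Assumptions: SolrJ 6.x, a collection named
"mycollection", a placeholder ZooKeeper ensemble, and a hypothetical
fetchBatch() standing in for the MySQL-side reading (e.g. each thread
selecting its slice with WHERE MOD(id, 8) = slice):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ParallelIndexer {

        public static void main(String[] args) throws Exception {
            // One shared CloudSolrClient; it is thread-safe.
            CloudSolrClient solr = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181") // placeholder
                    .build();
            solr.setDefaultCollection("mycollection");        // placeholder

            int threads = 8; // tune against CPU count and DB capacity
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (int t = 0; t < threads; t++) {
                final int slice = t;
                pool.submit(() -> {
                    try {
                        // Each thread reads its own slice of the source
                        // data and sends its own update requests.
                        List<SolrInputDocument> batch;
                        while ((batch = fetchBatch(slice)) != null) {
                            solr.add(batch);
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(2, TimeUnit.DAYS);
            solr.commit();
            solr.close();
        }

        // Hypothetical: map the next batch of MySQL rows for this slice
        // to SolrInputDocuments; return null when the slice is exhausted.
        private static List<SolrInputDocument> fetchBatch(int slice) {
            return null; // depends on your schema and query
        }
    }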


Re: Full import alternatives

2018-04-13 Thread Jesus Olivan
Hi Shawn,

first of all, thanks for your answer.

How do you import these 6 shards simultaneously?

2018-04-13 19:30 GMT+02:00 Shawn Heisey :

> On 4/13/2018 11:03 AM, Jesus Olivan wrote:
> > thanks for your answer. What happens is that when we launch the full
> > import, the process doesn't finish (we waited more than 60 hours last
> > time before cancelling it, because that is not an acceptable time for
> > us). There weren't any errors in the Solr logfile, simply because it
> > was working fine; the problem is that it ran forever and never
> > finished. We also tried it on an Aurora cluster on AWS, and after 20
> > hours of work it failed due to lack of space in the Aurora tmp folder.
>
> 375 million documents importing from MySQL with one DIH import is going
> to take quite a while.
>
> The last full rebuild I did of my main index took 21.61 hours.  This is
> an index where six large shards build simultaneously, using DIH, each
> one having more than 30 million documents.  If I were to build it as a
> single 180 million document import, it would probably take 5 days, maybe
> longer.
>
> We had another index (since retired) that had more than 400 million
> total documents, built similarly with multiple shards at the same time.
> The last rebuild I can remember on that index took about two days.
>
> Thanks,
> Shawn
>
>
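
One common way to arrange what is described above, as a hypothetical
sketch (host, core names, and the modulus split are assumptions, and each
core needs its own DIH request handler): give each shard core a
data-config whose query selects only that core's slice of the data, e.g.

    query="SELECT id, title, body FROM import_view WHERE MOD(id, 6) = 0"

for the first core, = 1 for the second, and so on, then start all six
full-imports at once:

    for n in 0 1 2 3 4 5; do
      # each full-import request returns immediately; the imports then
      # run concurrently, one per core
      curl "http://solrhost:8983/solr/myindex_shard${n}/dataimport?command=full-import"
    done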