Re: Connecting Solr to Nutch

2018-10-05 Thread Jan Høydahl
This is more a question for the Nutch community to answer.
Googling, I found a tutorial which seems fairly up to date (2018-09-10);
perhaps try to follow that one?
https://wiki.apache.org/nutch/NutchTutorial#Setup_Solr_for_search 


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 5. okt. 2018 kl. 03:53 skrev Timeka Cobb :
> 
> Hello out there! I'm trying to create a small search engine and have
> installed Nutch 1.15 and Solr 7.5.0. The issue now is connecting the two,
> primarily because the files required to create the Nutch core in Solr
> don't exist, i.e. basicconfig. How do I go about connecting the two so I can
> begin crawling websites for the engine? Please help 😊
> 
> 💗💗,
> Timeka Cobb



Re: Boolean clauses in ComplexPhraseQuery

2018-10-05 Thread Mikhail Khludnev
Why not?

On Thu, Oct 4, 2018 at 6:52 PM Chuming Chen  wrote:

> Hi All,
>
> Does Solr support boolean clauses inside ComplexPhraseQuery?
>
> For example: {!complexphrase inOrder=true}  NOT (field: “value is
> this” OR field: “value is that”)
>
> Thanks,
>
> Chuming
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Modify the log directory for dih

2018-10-05 Thread Charlie Hull

On 04/10/2018 16:35, Shawn Heisey wrote:

On 10/4/2018 12:30 AM, lala wrote:

Hi,
I am using:

Solr: 7.4
OS: windows7
I start solr using a service on startup.


In that case, I really have no idea where anything is on your system.

There is no service installation from the Solr project for Windows -- 
either you obtained that from somewhere else, or it's something written 
in-house.  Either way, you would need to talk to whoever created that 
service installation for help locating files on your setup.


We usually use NSSM for service-ifying Solr on Windows, I'd recommend 
you consider that. Also, bear in mind that a Windows Service can't 
output to stdout or stderr so some messages simply won't go anywhere - 
but the NSSM documentation is helpful.


Charlie


In general, you need to find the log4j2.xml file that is controlling 
your logging configuration and modify it.  It contains a sample of how 
to log something to a separate file -- the slow query log.  That example 
redirects a specific logger name (which is similar to a fully qualified 
class name and in most cases *is* the class name) to a different logfile.
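As a rough sketch of what that looks like (the appender name, file name, and logger choice below are illustrative assumptions, not copied from a shipped config), a log4j2.xml fragment routing one logger to its own file might be:

```xml
<!-- Hypothetical log4j2.xml fragment: route one logger name to its own file. -->
<Configuration>
  <Appenders>
    <RollingFile name="DihLogFile"
                 fileName="${sys:solr.log.dir}/dih.log"
                 filePattern="${sys:solr.log.dir}/dih.log.%i">
      <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p (%t) %c %m%n"/>
      <Policies>
        <SizeBasedTriggeringPolicy size="32 MB"/>
      </Policies>
    </RollingFile>
  </Appenders>
  <Loggers>
    <!-- The logger name is normally the fully qualified class name;
         additivity="false" keeps these messages out of the main solr.log. -->
    <Logger name="org.apache.solr.handler.dataimport.FileListEntityProcessor"
            level="debug" additivity="false">
      <AppenderRef ref="DihLogFile"/>
    </Logger>
  </Loggers>
</Configuration>
```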


Version 7.4 has a bug when running on Windows that causes a lot of 
problems specific to logging.


https://issues.apache.org/jira/browse/SOLR-12538

That problem has been fixed in the 7.5 release.  You can also fix it by 
editing the solr.cmd script manually.


Additional info: I am developing a web application that uses Solr as a search
engine. I use DIH to index folders in Solr using the
FileListEntityProcessor. What I need is logging of each index operation to a
file that I can reach & read, to be able to detect failed index files in the
folder.


The FileListEntityProcessor class has absolutely no logging in it.  If 
you require that immediately, you would need to add logging commands to 
the source code and recompile Solr yourself to produce a package with 
your change.  With an enhancement issue in Jira, we can review what 
logging is suitable for the class, and probably make it work like 
SQLEntityProcessor in that regard.  If that's done the way I think it 
should be, then you could add config in log4j2.xml to enable DEBUG 
level logging for that class specifically and write its logs to a 
separate logfile.


Thanks,
Shawn




--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Encoding issue in solr

2018-10-05 Thread UMA MAHESWAR
Hi all,

While using Nutch for crawling and indexing into Solr, I am facing an
encoding issue when the data is stored in Solr.

The site has the title:

title : ebm-papst Motoren & Ventilatoren GmbH - Axialventilatoren und
Radialventilatoren aus Linz, Österreich

but Solr stores it in the format below:

title": "ebm-papst Motoren & Ventilatoren GmbH - Axialventilatoren und
Radialventilatoren aus Linz, Österrei",

Please suggest how to store the actual data in Solr.

Thanks for your suggestions.




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Apache SOLR upgrade from 5.2.1 to 7.x

2018-10-05 Thread padmanabhan1616


Hi Team,

We are planning to upgrade Solr from 5.2.1 to 7.x. I googled and found
that there is no way to upgrade from 5.x to 7.x directly.

Here is the list of suggestions gathered from different sources:

1. We cannot upgrade directly from 5.x to 7.x; instead, upgrade to 5.5 and
then to 7, as major index-format changes took place in 5.5 and later
versions.
2. Use the index upgrade tool, which can upgrade all old indexes to the new
index format; then we can upgrade to 7.x easily.

We are really struggling to find the right option for this upgrade.

I know we are running a very old version. We need this to happen ASAP.

Can you please suggest the right approach to upgrade from 5.2.1 to 7.x?

Thanks,
Padmanabhan





Re: Apache zookeeper jar upgrade for SOLR

2018-10-05 Thread padmanabhan1616
Hi Jan,

Thank you so much for your answers.

Yes, I agree we are running a very old version of Solr. We have decided to
upgrade Solr to 7.x.






Re: Solr Cloud in recovering state & down state for long

2018-10-05 Thread Ganesh Sethuraman
1. Do the GC and Solr logs help explain why the Solr replica server continues
to be in the recovering state? Our assumption is that at Sept 17 16:00 hrs we
had done ZK transaction log reading, and that might have caused the issue. Is
that correct?
2. Can this state cause slowness for Solr read queries?
3. Is there any way to get notified/emailed if any replica on a server goes
into recovery mode?


On Wed, Oct 3, 2018 at 5:26 PM Ganesh Sethuraman 
wrote:

>
>
>
> On Tue, Oct 2, 2018 at 11:46 PM Shawn Heisey  wrote:
>
>> On 10/2/2018 8:55 PM, Ganesh Sethuraman wrote:
>> > We are using 2 node SolrCloud 7.2.1 cluster with external 3 node ZK
>> > ensemble in AWS. There are about 60 collections at any point in time. We
>> > have per JVM max heap of 8GB.
>>
>> Let's focus for right now on a single Solr machine, rather than the
>> whole cluster.  How many shard replicas (cores) are on one server?  How
>> much disk space does all the index data take? How many documents
>> (maxDoc, which includes deleted docs) are in all those cores?  What is
>> the total amount of RAM on the server? Is there any other software
>> besides Solr running on each server?
>>
> We have 471 replicas on each server. We have about 60
> collections, each with 8 shards and 2 replicas; a couple of them have just
> 2 shards and are small. Note that only about 30 of them are actively used.
> Old collections are periodically deleted.
> 470 GB of index data per node.
> Max docs per collection is about 300M; however, the average per collection
> is about 50M docs.
> 256GB RAM (24 vCPUs) on each of the two AWS instances.
> No other software running on the box.
>
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue
>
>>
>> > But as stated above problem, we will have few collection replicas in the
>> > recovering and down state. In the past we have seen it come back to
>> normal
>> > by restarting the solr server, but we want to understand is there any
>> way
>> > to get this back to normal (all synched up with Zookeeper) through
>> command
>> > line/admin? Another question is, being in this state can it cause data
>> > issue? How do we check that (distrib=false on collection count?)?
>>
>> As long as you have at least one replica operational on every shard, you
>> should be OK.  But if you only have one replica operational, then you're
>> in a precarious state, where one additional problem could result in
>> something being unavailable.
>>
> Thanks for the info.
>
>> If all is well, SolrCloud should not have replicas stay in down or
>> recovering state for very long, unless they're really large, in which
>> case it can take a while to copy the data from the leader.  If that
>> state persists for a long time, there's probably something going wrong
>> with your Solr install.  Usually restarting Solr is the only way to
>> recover persistently down replicas.  If it happens again after restart,
>> then the root problem has not been dealt with, and you're going to need
>> to figure it out.
>>
> Ok. Based on the point above it looks like restarting is the only option; no
> other way to sync with ZK. Thanks for that.
>
> The log snippet you shared only covers a timespan of less than one
>> second, so it's not very helpful in making any kind of determination.
>> The "session expired" message sounds like what happens when the
>> zkClientTimeout value is exceeded.  Internally, this value defaults to
>> 15 seconds, and typical example configs set it to 30 seconds ... so when
>> the session expires, it means there's a SERIOUS problem.  For computer
>> software, 15 or 30 seconds is a relative eternity.  A properly running
>> system should NEVER exceed that timeout.
>>
> I don't think we have a memory issue (the GC log for a busy day is posted
> here). We had Solr going out of sync with ZK because of the manual ZK
> transaction log parsing/checking on the server (we did that on Sept 17
> 16:00 UTC, as you can see in the log), which resulted in a ZK timeout. Since
> then Solr has not returned to normal. Is there a possibility of the
> Solr query (real-time GET) response time increasing due to the Solr servers
> being in the recovering/down state?
>
> Here is the full Solr Log file (Note that it is in INFO mode):
> https://raw.githubusercontent.com/ganeshmailbox/har/master/SolrLogFile
> Here is the GC Log:
> http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTgvMTAvMy8tLTAxX3NvbHJfZ2MubG9nLjUtLTIxLTE5LTU3
>
>
> Can you share your solr log when the problem happens, covering a
>> timespan of at least a few minutes (and ideally much longer), as well as
>> a gc log from a time when Solr was up for a long time?  Hopefully the
>> solr.log and gc log will cover the same timeframe.  You'll need to use a
>> file sharing site for the GC log, since it's likely to be a large file.
>> I would suggest compressing it.  If the solr.log is small enough, you
>> could use a paste website for that, but if it's large, you'll need to
>> use a file sharing site.

Re: checksum failed (hardware problem?)

2018-10-05 Thread Susheel Kumar
My understanding is that once the index is corrupt, the only way to fix it is
the CheckIndex utility, which will remove some bad segments; only then can we
use the index again.

It is a bit scary that you see a similar error on 6.6.2. In our case we know
we are going through a hardware problem which likely caused the corruption,
but there is no concrete evidence to confirm whether it is hardware or
Solr/Lucene. Are you able to use another AWS instance, as in Simon's case?

Thanks,
Susheel

On Thu, Oct 4, 2018 at 7:11 PM Stephen Bianamara 
wrote:

> To be more concrete: Is the definitive test of whether or not a core's
> index is corrupt to copy it onto a new set of hardware and attempt to write
> to it? If this is a definitive test, we can run the experiment and update
> the report so you have a sense of how often this happens.
>
> Since this is a SOLR cloud node, which is already removed but whose data
> dir was preserved, I believe I can just copy the data directory to a fresh
> machine and start a regular non-cloud solr node hosting this core. Can you
> please confirm that this will be a definitive test, or whether there is
> some aspect needed to make it definitive?
>
> Thanks!
>
> On Wed, Oct 3, 2018 at 2:10 AM Stephen Bianamara 
> wrote:
>
> > Hello All --
> >
> > As it would happen, we've seen this error on version 6.6.2 very recently.
> > This is also on an AWS instance, like Simon's report. The drive doesn't
> > show any sign of being unhealthy, either from cursory investigation.
> FWIW,
> > this occurred during a collection backup.
> >
> > Erick, is there some diagnostic data we can find to help pin this down?
> >
> > Thanks!
> > Stephen
> >
> > On Sun, Sep 30, 2018 at 12:48 PM Susheel Kumar 
> > wrote:
> >
> >> Thank you, Simon. Which basically points to something related to the env
> >> causing the checksum failures rather than any Lucene/Solr issue.
> >>
> >> Eric - I did check with the hardware folks and they are aware of a VMware
> >> issue where a VM hosted in an HCI environment comes into a halt state
> >> for a minute or so and may be losing connections to disk/network. So that
> >> probably may be the reason for the index corruption, though they have not
> >> been able to find anything specific in the logs during the time Solr ran
> >> into the issue.
> >>
> >> Also I again had an issue where Solr is losing the connection with
> >> zookeeper (Client session timed out, have not heard from server in 8367ms
> >> for sessionid 0x0). Does that point to a similar hardware issue? Any
> >> suggestions?
> >>
> >> Thanks,
> >> Susheel
> >>
> >> 2018-09-29 17:30:44.070 INFO
> >> (searcherExecutor-7-thread-1-processing-n:server54:8080_solr
> >> x:COLL_shard4_replica2 s:shard4 c:COLL r:core_node8) [c:COLL s:shard4
> >> r:core_node8 x:COLL_shard4_replica2] o.a.s.c.SolrCore
> >> [COLL_shard4_replica2] Registered new searcher
> >> Searcher@7a4465b1[COLL_shard4_replica2]
> >>
> >>
> main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_7x3f(6.6.2):C826923/317917:delGen=2523)
> >> Uninverting(_83pb(6.6.2):C805451/172968:delGen=2957)
> >> Uninverting(_3ywj(6.6.2):C727978/334529:delGen=2962)
> >> Uninverting(_7vsw(6.6.2):C872110/385178:delGen=2020)
> >> Uninverting(_8n89(6.6.2):C741293/109260:delGen=3863)
> >> Uninverting(_7zkq(6.6.2):C720666/101205:delGen=3151)
> >> Uninverting(_825d(6.6.2):C707731/112410:delGen=3168)
> >> Uninverting(_dgwu(6.6.2):C760421/295964:delGen=4624)
> >> Uninverting(_gs5x(6.6.2):C540942/138952:delGen=1623)
> >> Uninverting(_gu6a(6.6.2):c75213/35640:delGen=1110)
> >> Uninverting(_h33i(6.6.2):c131276/40356:delGen=706)
> >> Uninverting(_h5tc(6.6.2):c44320/11080:delGen=380)
> >> Uninverting(_h9d9(6.6.2):c35088/3188:delGen=104)
> >> Uninverting(_h80h(6.6.2):c11927/3412:delGen=153)
> >> Uninverting(_h7ll(6.6.2):c11284/1368:delGen=205)
> >> Uninverting(_h8bs(6.6.2):c11518/2103:delGen=149)
> >> Uninverting(_h9r3(6.6.2):c16439/1018:delGen=52)
> >> Uninverting(_h9z1(6.6.2):c9428/823:delGen=27)
> >> Uninverting(_h9v2(6.6.2):c933/33:delGen=12)
> >> Uninverting(_ha1c(6.6.2):c1056/1:delGen=1)
> >> Uninverting(_ha6i(6.6.2):c1883/124:delGen=8)
> >> Uninverting(_ha3x(6.6.2):c807/14:delGen=3)
> >> Uninverting(_ha47(6.6.2):c1229/133:delGen=6)
> >> Uninverting(_hapk(6.6.2):c523) Uninverting(_haoq(6.6.2):c279)
> >> Uninverting(_hamr(6.6.2):c311) Uninverting(_hap0(6.6.2):c338)
> >> Uninverting(_hapu(6.6.2):c275) Uninverting(_hapv(6.6.2):C4/2:delGen=1)
> >> Uninverting(_hapw(6.6.2):C5/2:delGen=1)
> >> Uninverting(_hapx(6.6.2):C2/1:delGen=1)
> >> Uninverting(_hapy(6.6.2):C2/1:delGen=1)
> >> Uninverting(_hapz(6.6.2):C3/1:delGen=1)
> >> Uninverting(_haq0(6.6.2):C6/3:delGen=1)
> >> Uninverting(_haq1(6.6.2):C1)))}
> >> 2018-09-29 17:30:52.390 WARN
> >>
> >>
> (zkCallback-5-thread-91-processing-n:server54:8080_solr-SendThread(server117:2182))
> >> [   ] o.a.z.ClientCnxn Client session timed out, have not heard from
> >> server in 8367ms for sessionid 0x0
> >> 2018-09-29 17:31:01.3

Re: Connecting Solr to Nutch

2018-10-05 Thread Timeka Cobb
Good morning! The Nutch community doesn't help much. The problem I notice
is in the step where they say to install Solr and first create resources: the
basicconfig file does not exist at all in the Solr package. I can't connect
because Solr is missing files that are required in the setup process. Maybe
I should try to install this in the Nutch directory. I don't know, but I'm
going to figure it out. Thank you for your help 😊

On Fri, Oct 5, 2018, 3:36 AM Jan Høydahl  wrote:

> This is more a questions for the Nutch community to answer.
> Googling, I found a Tutorial which seems fairly up to date (2018-09-10),
> perhaps try to follow that one?
> https://wiki.apache.org/nutch/NutchTutorial#Setup_Solr_for_search <
> https://wiki.apache.org/nutch/NutchTutorial#Setup_Solr_for_search>
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 5. okt. 2018 kl. 03:53 skrev Timeka Cobb :
> >
> > Hello out there! I'm trying to create a small search engine and have
> > installed Nutch 1.15 and Solr 7.5.0..issue now is connecting the 2
> > primarily because the files required to create the Nutch core in Solr
> > doesn't exist i.e. basicconfig. How do I go about connecting the 2 so I
> can
> > begin crawling websites for the engine? Please help 😊
> >
> > 💗💗,
> > Timeka Cobb
>
>


Re: Solr Cloud in recovering state & down state for long

2018-10-05 Thread Shawn Heisey

On 10/5/2018 5:15 AM, Ganesh Sethuraman wrote:

1. Do the GC and Solr logs help explain why the Solr replica server continues
to be in the recovering state? Our assumption is that at Sept 17 16:00 hrs we
had done ZK transaction log reading, and that might have caused the issue. Is
that correct?
2. Can this state cause slowness for Solr read queries?
3. Is there any way to get notified/emailed if any replica on a server goes
into recovery mode?


Seeing the GC log and Solr log will allow us to look for problems.  It 
won't solve anything, it just lets us examine the situation, see if 
there is any evidence to point to the root issue and maybe a solution.


If you're running with a heap that's too small, you can get into a 
situation where you never actually run out of memory, but the amount of 
available memory is so small that Java must continually run full garbage 
collections to keep enough of it free for the program to stay running.  
This can happen to ANY java program, including your ZK servers.


If that happens, the program itself will only be running a small 
percentage of the time, and there will be extremely long pauses where 
very little happens other than garbage collection, and then when the 
program starts running again, it realizes that its timeouts have been 
exceeded, which in SolrCloud, will initiate recovery operations ... and 
that will probably keep the GC pause storm happening.


With an 8 GB heap and likely billions of documents being handled by one 
Solr instance, that low-memory situation I just described seems very 
possible.  The solution is to make the heap bigger.  Your Solr install 
is very large ... it seems unlikely to me that 8GB would be enough.  
Solr is not typically a memory hog kind of application, if what it is 
asked to do is small.  When it is asked to do a bigger job, more memory 
will be required.
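As a concrete sketch of the suggested change (the values below are examples, not a sizing recommendation for this exact install), the heap for the stock startup scripts is set in solr.in.sh (solr.in.cmd on Windows):

```shell
# solr.in.sh fragment (illustrative): raise the JVM heap from the default.
SOLR_HEAP="16g"
# or, equivalently, set min/max explicitly:
# SOLR_JAVA_MEM="-Xms16g -Xmx16g"
```

Restart Solr after changing it, and remember that memory given to the heap is no longer available to the OS for caching the index.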


Running without sufficient system memory to effectively cache the 
indexes that are actively used can also cause performance problems.  
This is memory *NOT* allocated to programs like Solr, that the OS is 
free to use for caching purposes.  With a busy enough server, 
performance problems caused by that can spiral and lead to SolrCloud 
recovery issues.


Thanks,
Shawn



Does SolrJ support JSON DSL?

2018-10-05 Thread Alexandre Rafalovitch
Hi,

Does anybody know if it is possible to do the new JSON DSL and JSON
Facets requests via SolrJ. The SolrJ documentation is a bit sparse and
I don't often use it. So, I can't figure out if there is a direct
support or even a pass-through workaround.

Thank you,
   Alex.


Re: Apache SOLR upgrade from 5.2.1 to 7.x

2018-10-05 Thread Shawn Heisey

On 10/5/2018 4:41 AM, padmanabhan1616 wrote:

1. We cannot upgrade directly from 5.x to 7.x instead upgrade to 5.5 then
upgrade to 7 as there is major index format level changes taken place in 5.5
or later version.


Solr 7.x cannot read indexes from 5.5.  It can only read indexes that 
were *fully* constructed by versions back to 6.0.0.



1. Use index upgrade tool which can allow to upgrade all old indexes to new
index format then we can upgrade to 7.x version easily.
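For reference, the "index upgrade tool" mentioned above is Lucene's IndexUpgrader, run once per index directory with the matching Lucene jars on the classpath. A sketch of the invocation (jar versions and the index path here are placeholders):

```shell
# Illustrative only: Solr must be stopped, and the jars must match the
# Lucene version you are upgrading *to* at each hop.
java -cp lucene-core-6.6.6.jar:lucene-backward-codecs-6.6.6.jar \
  org.apache.lucene.index.IndexUpgrader \
  -delete-prior-commits \
  /var/solr/data/mycore/data/index
```

Note that rewriting the index format this way does not lift the one-major-version compatibility limit described below.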


We have been advised by Lucene experts that if a version that's at least 
two major versions before the target version has *EVER* touched the 
index, there's no guarantee that the index will work even after 
upgrading through the major versions one by one. The compatibility 
guarantee only goes back one major version.


I would strongly recommend with ANY upgrade that you always build the 
index from scratch.  That produces the best results.


This is becoming a frequently asked question, so I have built a wiki 
page to answer it:


https://wiki.apache.org/solr/VersionCompatibility

Thanks,
Shawn



Re: Connecting Solr to Nutch

2018-10-05 Thread Shawn Heisey

On 10/5/2018 7:24 AM, Timeka Cobb wrote:

Good morning! The Nutch community doesn't help much..the problem I notice
is where they say install Solr the first step create resources: the
basicconfig file does not exist at all in the Solr packet..I can't connect
because Solr is missing files that are required in the setup process. Maybe
try to install this in the Nutch directory..I don't know but I'm going to
figure it out. Thank you for your help😊


The config example included with Solr that is considered "default" used 
to be called basic_configs.  In more recent versions, it has been 
renamed to _default. The name does include the leading underscore as I 
have written it.


Nutch should not be relying on example configs included with Solr.  
Those can easily change in new versions to be something that's not 
compatible with their software.  They should be completely supplying the 
entire configuration (the "nutch" configset).  This includes the schema 
and solrconfig.xml, as well as any other config files referenced by 
those two.  Different configs for different Solr versions might become 
necessary ... they will need to be prepared for that.
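As a sketch of how that plays out in practice (core name and paths below are hypothetical), with Solr 7.x you either start from the bundled _default configset or point core creation at a configset directory supplied by Nutch:

```shell
# Create a core from the bundled _default configset (standalone Solr),
# then overlay the schema/solrconfig.xml that Nutch provides:
bin/solr create -c nutch -d _default

# Or, if Nutch ships a complete configset directory (path illustrative):
bin/solr create -c nutch -d /path/to/nutch/conf/solr
```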


Thanks,
Shawn



Re: Solr Cloud in recovering state & down state for long

2018-10-05 Thread Ganesh Sethuraman
Reading the ZK transaction log could be the issue, as ZK seems to be
sensitive to this (
https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#The+Log+Directory
):

> incorrect placement of the transaction log
> The most performance critical part of ZooKeeper is the transaction log.
> ZooKeeper syncs transactions to media before it returns a response. A
> dedicated transaction log device is key to consistent good performance.
> Putting the log on a busy device will adversely effect performance. If you
> only have one storage device, put trace files on NFS and increase the
> snapshotCount; it doesn't eliminate the problem, but it should mitigate it.
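For what it's worth, the dedicated-device advice quoted above maps to two settings in zoo.cfg (the paths below are illustrative):

```shell
# zoo.cfg fragment: keep snapshots and the transaction log on separate
# devices; dataLogDir should point at a fast, dedicated disk.
dataDir=/var/zookeeper/data
dataLogDir=/ssd/zookeeper/txnlog
```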


I am not sure the Solr log and GC log were visible in my previous mail.
Re-posting them here for your reference:

Here is the full Solr Log file (Note that it is in INFO mode):
https://raw.githubusercontent.com/ganeshmailbox/har/master/SolrLogFile
Here is the GC Log:
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTgvMTAvMy8tLTAxX3NvbHJfZ2MubG9nLjUtLTIxLTE5LTU3

Thanks
Ganesh

On Fri, Oct 5, 2018 at 10:13 AM Shawn Heisey  wrote:

> On 10/5/2018 5:15 AM, Ganesh Sethuraman wrote:
> > 1. Does GC and Solr Logs help to why the Solr replicas server continues
> to
> > be in the recovering/ state? Our assumption is that Sept 17 16:00 hrs we
> > had done ZK transaction log reading, that might have caused the issue. Is
> > that correct?
> > 2. Does this state can cause slowness to Solr Queries for reads?
> > 3. Is there any way to get notified/email if the servers has any replica
> > gets into the recovery mode?
>
> Seeing the GC log and Solr log will allow us to look for problems.  It
> won't solve anything, it just lets us examine the situation, see if
> there is any evidence to point to the root issue and maybe a solution.
>
> If you're running with a heap that's too small, you can get into a
> situation where you never actually run out of memory, but the amount of
> available memory is so small that Java must continually run full garbage
> collections to keep enough of it free for the program to stay running.
> This can happen to ANY java program, including your ZK servers.
>
> If that happens, the program itself will only be running a small
> percentage of the time, and there will be extremely long pauses where
> very little happens other than garbage collection, and then when the
> program starts running again, it realizes that its timeouts have been
> exceeded, which in SolrCloud, will initiate recovery operations ... and
> that will probably keep the GC pause storm happening.
>
> With an 8 GB heap and likely billions of documents being handled by one
> Solr instance, that low-memory situation I just described seems very
> possible.  The solution is to make the heap bigger.  Your Solr install
> is very large ... it seems unlikely to me that 8GB would be enough.
> Solr is not typically a memory hog kind of application, if what it is
> asked to do is small.  When it is asked to do a bigger job, more memory
> will be required.
>
> Running without sufficient system memory to effectively cache the
> indexes that are actively used can also cause performance problems.
> This is memory *NOT* allocated to programs like Solr, that the OS is
> free to use for caching purposes.  With a busy enough server,
> performance problems caused by that can spiral and lead to SolrCloud
> recovery issues.
>
> Thanks,
> Shawn
>
>


Re: Solr Cloud in recovering state & down state for long

2018-10-05 Thread Shawn Heisey

On 10/5/2018 9:15 AM, Ganesh Sethuraman wrote:

I am not sure the Solr log and GC log were visible in my previous mail.
Re-posting them here for your reference:

Here is the full Solr Log file (Note that it is in INFO mode):
https://raw.githubusercontent.com/ganeshmailbox/har/master/SolrLogFile
Here is the GC Log:
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTgvMTAvMy8tLTAxX3NvbHJfZ2MubG9nLjUtLTIxLTE5LTU3


The GC log shows pretty good performance.  The note at the top talks 
about consecutive full GCs, but the peak usage on the heap isn't close 
to max heap, so I don't know why that would be happening.  It also says 
that there's a lot of application waiting for resources ... which can be 
caused by not having enough memory for caching purposes.  The solution 
there would be to add total memory to the system ... no config changes 
are likely to help.


Even though the GC log doesn't seem to indicate extreme memory pressure, 
I would still suggest that you make the heap a little bit bigger.  Maybe 
10GB instead of 8GB.  See if that helps at all.  It might not.


There are a TON of errors and warnings in the solr log, things that are 
very strange and may indicate other problems going on.


Thanks,
Shawn



Re: Does SolrJ support JSON DSL?

2018-10-05 Thread Mikhail Khludnev
There's nothing out-of-the-box.

On Fri, Oct 5, 2018 at 5:34 PM Alexandre Rafalovitch 
wrote:

> Hi,
>
> Does anybody know if it is possible to do the new JSON DSL and JSON
> Facets requests via SolrJ. The SolrJ documentation is a bit sparse and
> I don't often use it. So, I can't figure out if there is a direct
> support or even a pass-through workaround.
>
> Thank you,
>Alex.
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Connecting Solr to Nutch

2018-10-05 Thread Timeka Cobb
Thank you so very much for the help!!

On Fri, Oct 5, 2018 at 10:53 AM Shawn Heisey  wrote:

> On 10/5/2018 7:24 AM, Timeka Cobb wrote:
> > Good morning! The Nutch community doesn't help much..the problem I notice
> > is where they say install Solr the first step create resources: the
> > basicconfig file does not exist at all in the Solr packet..I can't
> connect
> > because Solr is missing files that are required in the setup process.
> Maybe
> > try to install this in the Nutch directory..I don't know but I'm going to
> > figure it out. Thank you for your help😊
>
> The config example included with Solr that is considered "default" used
> to be called basic_configs.  In more recent versions, it has been
> renamed to _default. The name does include the leading underscore as I
> have written it.
>
> Nutch should not be relying on example configs included with Solr.
> Those can easily change in new versions to be something that's not
> compatible with their software.  They should be completely supplying the
> entire configuration (the "nutch" configset).  This includes the schema
> and solrconfig.xml, as well as any other config files referenced by
> those two.  Different configs for different Solr versions might become
> necessary ... they will need to be prepared for that.
>
> Thanks,
> Shawn
>
>


Re: Encoding issue in solr

2018-10-05 Thread Tim Allison
This is probably caused by an encoding detection problem in Nutch and/or
Tika. If you can share the file on the Tika user’s list, I can take a look.

On Fri, Oct 5, 2018 at 7:11 AM UMA MAHESWAR 
wrote:

> HI ALL,
>
> while i am using nutch for crawling and indexing in to solr,while storing
> data in to solr encoding issue facing
>
>
> in site  having the title
>
> title : ebm-papst Motoren & Ventilatoren GmbH - Axialventilatoren und
> Radialventilatoren aus Linz, Österreich
>
> but in solr storing in the below format
>
> title": "ebm-papst Motoren & Ventilatoren GmbH - Axialventilatoren und
> Radialventilatoren aus Linz, Österrei",
>
> suggest me how to store actual data in to solr .
>
> thanks for your suggestions.
>
>
>
>
>


Re: Does SolrJ support JSON DSL?

2018-10-05 Thread Chris Hostetter


: There's nothing out-of-the-box.

Which is to say: there are no explicit convenience methods for it, but you 
can absolutely use the JSON DSL and JSON facets via SolrJ and the 
QueryRequest -- just add the param key=value that you want, where the 
value is the JSON syntax...

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

ModifiableSolrParams p = new ModifiableSolrParams();
p.add("json.facet", "{ ... }");
// and/or: p.add("json", "{ ... }");
QueryRequest req = new QueryRequest(p, SolrRequest.METHOD.POST);
QueryResponse rsp = req.process(client);

: On Fri, Oct 5, 2018 at 5:34 PM Alexandre Rafalovitch 
: wrote:
: 
: > Hi,
: >
: > Does anybody know if it is possible to do the new JSON DSL and JSON
: > Facets requests via SolrJ. The SolrJ documentation is a bit sparse and
: > I don't often use it. So, I can't figure out if there is a direct
: > support or even a pass-through workaround.
: >
: > Thank you,
: >Alex.
: >
: 
: 
: -- 
: Sincerely yours
: Mikhail Khludnev
: 

-Hoss
http://www.lucidworks.com/