We're running Solr 4.4.0 behind this software (https://github.com/CDRH/nebnews - a Django-based newspaper site). Solr runs in Jetty on Ubuntu 12.04. Roughly once a day the site goes down with a Connection Refused error. I'm having a hard time troubleshooting the issue and am looking for help with next steps in figuring out why it is failing.
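One possible next step is an external probe that polls Solr every minute and logs a timestamp the moment connections start being refused, so the exact failure time can be lined up against the Jetty and system logs. A minimal sketch (Python 3; the port, ping-handler path, and log location below are assumptions about a default setup, not your actual config):

#!/usr/bin/env python3
# Minimal probe: poll Solr once a minute and record the first moment
# connections start being refused, so the failure time can be matched
# against the Jetty log. Port/handler/log path are assumed defaults.
import time
import urllib.error
import urllib.request

SOLR_PING = "http://localhost:8983/solr/admin/ping"  # assumed default URL
LOG_FILE = "/tmp/solr-probe.log"                     # arbitrary log location

def solr_alive():
    try:
        with urllib.request.urlopen(SOLR_PING, timeout=5) as resp:
            return resp.getcode() == 200
    except (urllib.error.URLError, OSError):
        # Covers "Connection refused" as well as timeouts.
        return False

if __name__ == "__main__":
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        status = "OK" if solr_alive() else "DOWN (connection failed)"
        with open(LOG_FILE, "a") as log:
            log.write("{} {}\n".format(stamp, status))
        time.sleep(60)

If the refusals always begin around the same time of day, that points at something scheduled; if the times drift, it looks more like gradual resource exhaustion.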
After debugging, it turns out that it is Solr that is refusing the connection (restarting Jetty fixes it every time). It fails at random times. Things I've tried:

- Running "sudo service jetty check" says the service is running.
- Opened up the port on the server and tried going to the Solr admin page. This failed until I restarted Jetty; then it works.
- Checked the solr.log files and no errors are found.
- The Jetty log level is set to INFO, and I'm hesitant to raise it to DEBUG because of file-size growth and the long time between failures. In the logs, the span between failures simply shows a normal query, followed by the startup sequence from when I restarted Jetty.
- The Apache logs show tons of traffic (the site is still running), largely from Google bots, and maybe this is causing issues, but I would still expect to find some sort of error. There is a mix of 200, 500, and 404 codes. Here's a small sample:

GET /lccn/sn85053037/1981-09-15/ed-1/seq-13/ocr/ HTTP/1.1 500 14814 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
GET /lccn/sn86075296/1910-10-27/ed-1/seq-1/ HTTP/1.1 500 14884 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
GET /lccn/sn84036028/1925-05-22/ed-1/seq-6/ocr/ HTTP/1.1 500 14791 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
GET /lccn/sn84036028/1917-10-28/ed-1/seq-1/ocr.xml HTTP/1.1 200 400827 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
GET /lccn/TheRetort/2011-10-07/ed-1/seq-10/ocr/ HTTP/1.1 500 14798 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
GET /lccn/TheRetort/1979-05-10/ed-1/seq-8/ocr.xml HTTP/1.1 200 193883 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
GET /lccn/sn84036124/1977-02-23/ed-1/seq-12/ocr/ HTTP/1.1 500 14790 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
GET /lccn/Emcoe/1958-11-21/ed-1/seq-3/ocr/ HTTP/1.1 500 14760 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
GET /lccn/sn85053252/1909-10-08/ed-1/.rdf HTTP/1.1 404 3051 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

I could simply restart Jetty nightly, I guess, but that seems like putting a band-aid on the issue, and I'm not sure how to proceed on this one. Any ideas?

Mike

Mike Beccaria
Director of Library Services
Paul Smith's College
7833 New York 30
Paul Smiths, NY 12970
518.327.6376
mbecca...@paulsmiths.edu
www.paulsmiths.edu

-----Original Message-----
From: roland.sz...@booknwalk.com [mailto:roland.sz...@booknwalk.com] On Behalf Of Szucs Roland
Sent: Wednesday, February 24, 2016 1:10 PM
To: solr-user@lucene.apache.org
Subject: Re: very slow frequent updates

Thanks again, Jeff. I will check the documentation of join queries because I have never used them before.

Regards,
Roland

2016-02-24 19:07 GMT+01:00 Jeff Wartes <jwar...@whitepages.com>:

> I suspect your problem is the intersection of "very large document" and "high rate of change". Either of those alone would be fine.
>
> You're correct: if the thing you need to search or sort by is the thing with a high change rate, you probably aren't going to be able to peel those things out of your index.
>
> Perhaps you could work something out with join queries? So you have two kinds of documents - book content and book price - and your high-frequency change is limited to documents with very little data.
>
> On 2/24/16, 4:01 AM, "roland.sz...@booknwalk.com on behalf of Szűcs Roland" <roland.sz...@booknwalk.com on behalf of szucs.rol...@bookandwalk.hu> wrote:
>
> > I have checked it already in the ref. guide. It is stated that you cannot search in external fields:
> > https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes
> >
> > Really, I am very curious whether my problem is just not a usual one, or whether SOLR mainly focuses on search rather than a kind of end-to-end support. How does this approach work with 1 million documents with frequently changing prices?
> >
> > Thanks for your time,
> >
> > Roland
> >
> > 2016-02-24 12:39 GMT+01:00 Stefan Matheis <matheis.ste...@gmail.com>:
> >
> >> Depending on what features you actually need, it might be worth a look at "External File Fields", Roland?
> >>
> >> -Stefan
> >>
> >> On Wed, Feb 24, 2016 at 12:24 PM, Szűcs Roland <szucs.rol...@bookandwalk.hu> wrote:
> >>
> >> > Thanks, Jeff, for your help.
> >> >
> >> > Can it work in a production environment? Imagine my customer initiates a query with 1,000 docs in the result set. I cannot use the pagination of SOLR, as the field which is the basis of the sort, for example the price, is not included in the schema. The customer wants the list in descending order of price.
> >> >
> >> > So I have to get all 1,000 doc ids from Solr and find their metadata in an SQL database, or in a cache in the best case. Is this the way you suggested? Is it not too slow?
> >> >
> >> > Regards,
> >> > Roland
> >> >
> >> > 2016-02-23 19:29 GMT+01:00 Jeff Wartes <jwar...@whitepages.com>:
> >> >
> >> >> My suggestion would be to split your problem domain. Use Solr exclusively for search - index the id and only those fields you need to search on. Then use some other data store for retrieval. Get the ids from the Solr results, and look them up in the data store to get the rest of your fields. This allows you to keep your Solr docs as small as possible, and you only need to update them when a *searchable* field changes.
> >> >>
> >> >> Every "update" in Solr is a delete/insert. Even the "atomic update" feature is just a shortcut for that. It requires stored fields because the data from the stored fields gets copied into the new insert.
> >> >>
> >> >> On 2/22/16, 12:21 PM, "Roland Szűcs" <roland.sz...@booknwalk.com> wrote:
> >> >>
> >> >> > Hi folks,
> >> >> >
> >> >> > We use SOLR 5.2.1. We have ebooks stored in SOLR. The majority of the fields do not change at all, like content, author, publisher.... Only the price field changes frequently.
> >> >> >
> >> >> > We let the customers make full-text searches, so we indexed the content field. Due to the frequency of the price updates we use the atomic update feature. As a requirement of atomic updates we have to store all the fields, even the content field, which is 1 MB/document and which we did not want to store, just index.
> >> >> >
> >> >> > When we wanted to update 100 documents with atomic updates, it took about 3 minutes.
> >> >> > Taking into account that our metadata per document is 1 KB and our content field per document is 1 MB, we use 1,000x more memory to accelerate the update process.
> >> >> >
> >> >> > I am almost 100% sure that we are doing something wrong.
> >> >> >
> >> >> > What is the best practice for frequent updates when 99% of a given document is constant forever?
> >> >> >
> >> >> > Thanks in advance
> >> >> >
> >> >> > --
> >> >> > Roland Szűcs
> >> >> > Connect with me on LinkedIn <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
> >> >> > CEO, Bookandwalk.hu <https://bookandwalk.hu/>
> >> >> > Phone: +36 1 210 81 13

--
Szűcs Roland
Connect with me on LinkedIn <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
CEO, Bookandwalk.hu <https://bookandwalk.hu/>
Phone: +36 1 210 81 13
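For anyone following the quoted thread: a minimal sketch of the split that Jeff describes (keep only the id and the searchable fields in Solr, fetch the frequently changing price from a separate store, and sort there) might look like the following. The core name "books", the SQLite table, and the use of the requests library are assumptions for illustration, not part of the original setup.

import sqlite3
import requests  # any HTTP client would do; requests is assumed to be installed

# Hypothetical endpoint: a Solr core named "books" on the default port.
SOLR_SELECT = "http://localhost:8983/solr/books/select"

def search_ids(query, rows=1000):
    """Ask Solr only for the matching ids; no stored content, no price."""
    params = {"q": query, "fl": "id", "rows": rows, "wt": "json"}
    response = requests.get(SOLR_SELECT, params=params, timeout=10)
    return [doc["id"] for doc in response.json()["response"]["docs"]]

def books_by_price(query, db_path="books.db"):
    """Look up the volatile fields (price, title) in a separate store and
    sort there, so price changes never touch the Solr index."""
    ids = search_ids(query)
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT id, title, price FROM books "
            "WHERE id IN (%s) ORDER BY price DESC" % placeholders,
            ids,
        ).fetchall()
    return rows

The single IN (...) query keeps the price lookup for a 1,000-document result set to one round trip, which is the part Roland is worried about being too slow.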