We're running Solr 4.4.0 behind this software (https://github.com/CDRH/nebnews - a Django-based newspaper site). Solr runs in Jetty on Ubuntu 12.04. Roughly once a day the site goes down with a Connection Refused error. I'm having a hard time troubleshooting the issue and am looking for help with next steps in figuring out why it is failing.
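One possible next step is an external probe that polls Solr every minute and logs a timestamp the moment connections start being refused, so the exact failure time can be lined up against the Jetty and system logs. A minimal sketch (Python 3; the port, ping-handler path, and log location below are assumptions about a default setup, not your actual config):

#!/usr/bin/env python3
# Minimal probe: poll Solr once a minute and record the first moment
# connections start being refused, so the failure time can be matched
# against the Jetty log. Port/handler/log path are assumed defaults.
import time
import urllib.error
import urllib.request

SOLR_PING = "http://localhost:8983/solr/admin/ping"  # assumed default URL
LOG_FILE = "/tmp/solr-probe.log"                     # arbitrary log location

def solr_alive():
    try:
        with urllib.request.urlopen(SOLR_PING, timeout=5) as resp:
            return resp.getcode() == 200
    except (urllib.error.URLError, OSError):
        # Covers "Connection refused" as well as timeouts.
        return False

if __name__ == "__main__":
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        status = "OK" if solr_alive() else "DOWN (connection failed)"
        with open(LOG_FILE, "a") as log:
            log.write("{} {}\n".format(stamp, status))
        time.sleep(60)

If the refusals always begin around the same time of day, that points at something scheduled; if the times drift, it looks more like gradual resource exhaustion.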
After debugging, it turns out that it is Solr that is refusing the connection (restarting Jetty fixes it every time). It fails at random times. Things I've tried:

- Running "sudo service jetty check" says the service is running.
- Opened up the port on the server and tried going to the Solr admin page. This failed until I restarted Jetty; then it works.
- Checked the solr.log files and no errors are found.
- The Jetty log level is set to INFO, and I'm hesitant to raise it to DEBUG because of file-size growth and the long time between failures. In the logs, the span between failures simply shows a normal query, followed by the startup sequence from when I restarted Jetty.
- The Apache logs show tons of traffic (the site is still running), largely from Google bots, and maybe this is causing issues, but I would still expect to find some sort of error. There is a mix of 200, 500, and 404 codes. Here's a small sample:

GET /lccn/sn85053037/1981-09-15/ed-1/seq-13/ocr/ HTTP/1.1 500 14814 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
GET /lccn/sn86075296/1910-10-27/ed-1/seq-1/ HTTP/1.1 500 14884 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
GET /lccn/sn84036028/1925-05-22/ed-1/seq-6/ocr/ HTTP/1.1 500 14791 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
GET /lccn/sn84036028/1917-10-28/ed-1/seq-1/ocr.xml HTTP/1.1 200 400827 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
GET /lccn/TheRetort/2011-10-07/ed-1/seq-10/ocr/ HTTP/1.1 500 14798 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
GET /lccn/TheRetort/1979-05-10/ed-1/seq-8/ocr.xml HTTP/1.1 200 193883 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
GET /lccn/sn84036124/1977-02-23/ed-1/seq-12/ocr/ HTTP/1.1 500 14790 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
GET /lccn/Emcoe/1958-11-21/ed-1/seq-3/ocr/ HTTP/1.1 500 14760 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
GET /lccn/sn85053252/1909-10-08/ed-1/.rdf HTTP/1.1 404 3051 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

I could simply restart Jetty nightly, I guess, but that seems like putting a band-aid on the issue, and I'm not sure how to proceed on this one. Any ideas?

Mike

Mike Beccaria
Director of Library Services
Paul Smith's College
7833 New York 30
Paul Smiths, NY 12970
518.327.6376
mbecca...@paulsmiths.edu
www.paulsmiths.edu

-----Original Message-----
From: roland.sz...@booknwalk.com [mailto:roland.sz...@booknwalk.com] On Behalf Of Szucs Roland
Sent: Wednesday, February 24, 2016 1:10 PM
To: solr-user@lucene.apache.org
Subject: Re: very slow frequent updates

Thanks again, Jeff. I will check the documentation of join queries because I have never used them before.

Regards,
Roland

2016-02-24 19:07 GMT+01:00 Jeff Wartes <jwar...@whitepages.com>:

> I suspect your problem is the intersection of "very large document" and "high rate of change". Either of those alone would be fine.
>
> You're correct: if the thing you need to search or sort by is the thing with a high change rate, you probably aren't going to be able to peel those things out of your index.
>
> Perhaps you could work something out with join queries? So you have two kinds of documents - book content and book price - and your high-frequency change is limited to documents with very little data.
>
> On 2/24/16, 4:01 AM, "roland.sz...@booknwalk.com on behalf of Szűcs Roland" <roland.sz...@booknwalk.com on behalf of szucs.rol...@bookandwalk.hu> wrote:
>
> > I have checked it already in the ref. guide. It is stated that you cannot search in external fields:
> > https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes
> >
> > Really, I am very curious whether my problem is just not a usual one, or whether SOLR mainly focuses on search rather than a kind of end-to-end support. How does this approach work with 1 million documents with frequently changing prices?
> >
> > Thanks for your time,
> >
> > Roland
> >
> > 2016-02-24 12:39 GMT+01:00 Stefan Matheis <matheis.ste...@gmail.com>:
> >
> >> Depending on what features you actually need, it might be worth a look at "External File Fields", Roland?
> >>
> >> -Stefan
> >>
> >> On Wed, Feb 24, 2016 at 12:24 PM, Szűcs Roland <szucs.rol...@bookandwalk.hu> wrote:
> >>
> >> > Thanks, Jeff, for your help.
> >> >
> >> > Can it work in a production environment? Imagine my customer initiates a query with 1,000 docs in the result set. I cannot use the pagination of SOLR, as the field which is the basis of the sort, for example the price, is not included in the schema. The customer wants the list in descending order of price.
> >> >
> >> > So I have to get all 1,000 doc ids from Solr and find their metadata in an SQL database, or in a cache in the best case. Is this the way you suggested? Is it not too slow?
> >> >
> >> > Regards,
> >> > Roland
> >> >
> >> > 2016-02-23 19:29 GMT+01:00 Jeff Wartes <jwar...@whitepages.com>:
> >> >
> >> >> My suggestion would be to split your problem domain. Use Solr exclusively for search - index the id and only those fields you need to search on. Then use some other data store for retrieval. Get the ids from the Solr results, and look them up in the data store to get the rest of your fields. This allows you to keep your Solr docs as small as possible, and you only need to update them when a *searchable* field changes.
> >> >>
> >> >> Every "update" in Solr is a delete/insert. Even the "atomic update" feature is just a shortcut for that. It requires stored fields because the data from the stored fields gets copied into the new insert.
> >> >>
> >> >> On 2/22/16, 12:21 PM, "Roland Szűcs" <roland.sz...@booknwalk.com> wrote:
> >> >>
> >> >> > Hi folks,
> >> >> >
> >> >> > We use SOLR 5.2.1. We have ebooks stored in SOLR. The majority of the fields do not change at all, like content, author, publisher.... Only the price field changes frequently.
> >> >> >
> >> >> > We let the customers make full-text searches, so we indexed the content field. Due to the frequency of the price updates we use the atomic update feature. As a requirement of atomic updates we have to store all the fields, even the content field, which is 1 MB/document and which we did not want to store, just index.
> >> >> >
> >> >> > When we wanted to update 100 documents with atomic updates, it took about 3 minutes.
> >> >> > Taking into account that our metadata per document is 1 KB and our content field per document is 1 MB, we use 1,000x more memory to accelerate the update process.
> >> >> >
> >> >> > I am almost 100% sure that we are doing something wrong.
> >> >> >
> >> >> > What is the best practice for frequent updates when 99% of a given document is constant forever?
> >> >> >
> >> >> > Thanks in advance
> >> >> >
> >> >> > --
> >> >> > Roland Szűcs
> >> >> > Connect with me on LinkedIn <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
> >> >> > CEO, Bookandwalk.hu <https://bookandwalk.hu/>
> >> >> > Phone: +36 1 210 81 13

--
Szűcs Roland
Connect with me on LinkedIn <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
CEO, Bookandwalk.hu <https://bookandwalk.hu/>
Phone: +36 1 210 81 13
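For anyone following the quoted thread: a minimal sketch of the split that Jeff describes (keep only the id and the searchable fields in Solr, fetch the frequently changing price from a separate store, and sort there) might look like the following. The core name "books", the SQLite table, and the use of the requests library are assumptions for illustration, not part of the original setup.

import sqlite3
import requests  # any HTTP client would do; requests is assumed to be installed

# Hypothetical endpoint: a Solr core named "books" on the default port.
SOLR_SELECT = "http://localhost:8983/solr/books/select"

def search_ids(query, rows=1000):
    """Ask Solr only for the matching ids; no stored content, no price."""
    params = {"q": query, "fl": "id", "rows": rows, "wt": "json"}
    response = requests.get(SOLR_SELECT, params=params, timeout=10)
    return [doc["id"] for doc in response.json()["response"]["docs"]]

def books_by_price(query, db_path="books.db"):
    """Look up the volatile fields (price, title) in a separate store and
    sort there, so price changes never touch the Solr index."""
    ids = search_ids(query)
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT id, title, price FROM books "
            "WHERE id IN (%s) ORDER BY price DESC" % placeholders,
            ids,
        ).fetchall()
    return rows

The single IN (...) query keeps the price lookup for a 1,000-document result set to one round trip, which is the part Roland is worried about being too slow.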