Re: Solr Full Import frozen after indexing a fixed number of records
On 27 July 2014 12:13, Aniket Bhoi wrote:
> On Fri, Jul 25, 2014 at 8:32 PM, Aniket Bhoi wrote:
> >
> > I have Apache Solr, hosted on my Apache Tomcat server with a SQL Server
> > backend.
> [...]
> > After I run a full import, indexing proceeds successfully, but it seems
> > to freeze every time after fetching a fixed number of records. What I
> > mean is that after it fetches 10730 records it just freezes and doesn't
> > process any more.
> >
> > Excerpt from dataimport.xml:
> >
> > 0:15:31.959
> > 0
> > *10730*
> > 3579
> > 0
> > 2014-07-25 10:44:39
> >
> > This seems to happen every time.
> >
> > I checked the Tomcat log. The following is the excerpt from when Solr
> > freezes:
> >
> > INFO: Generating record for Unique ID :null attachment Ref:null
> > parent ref :nullexecuted by thread:25
[...]

Something is wrong with your DIH config file: you seem to be getting null
for a document unique ID. Please share the file with us.

Regards,
Gora
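For readers following along, here is a minimal sketch of the kind of DIH
data-config.xml Gora is asking for (driver, table, and column names are
illustrative, not from this thread). A broken mapping between the SQL
column and the schema's uniqueKey field is a common source of a null
unique ID:

    <dataConfig>
      <dataSource type="JdbcDataSource"
                  driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                  url="jdbc:sqlserver://dbhost;databaseName=mydb"
                  user="solr" password="..."/>
      <document>
        <entity name="record" pk="ID"
                query="SELECT ID, TITLE, BODY FROM RECORDS">
          <!-- column is the SQL column; name is the Solr field.
               "id" here must match the schema's uniqueKey field. -->
          <field column="ID" name="id"/>
          <field column="TITLE" name="title"/>
          <field column="BODY" name="body"/>
        </entity>
      </document>
    </dataConfig>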
Re: integrating Accumulo with solr
Right, and that's exactly what DataStax Enterprise provides (at great
engineering effort!): synchronization of database updates and search
indexing. Sure, you can do it as well, but that's a significant engineering
challenge on both sides of the equation, and not a simple "plug and play"
configuration setting or a matter of writing a simple "connector." But,
hey, if you consider yourself one of those "true hard-core gunslingers"
then you'll be able to code that up in a weekend without any of our
assistance, right?

In short, synchronizing two data stores is a real challenge. Yes, it is
doable, but... it is non-trivial. Especially if both stores are distributed
clusters. Maybe now you can guess why the Sqrrl guys went the Lucene route
instead of Solr.

I'm certainly not suggesting that it can't be done. Just highlighting the
challenge of such a task. Just to be clear, you are referring to "sync
mode" and not mere "ETL", which people do all the time with batch scripts,
Java extraction and ingestion connectors, and cron jobs.

Give it a shot and let us know how it works out.

-- Jack Krupansky

-----Original Message-----
From: Ali Nazemian
Sent: Sunday, July 27, 2014 1:20 AM
To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr

Dear Jack,
Hi, one more thing to mention: I don't want to use Solr or Lucene for
indexing Accumulo or for full-text search inside it. I am looking to have
both in a sync mode, i.e. import some parts of the data to Solr for
indexing. For this purpose I probably need something like a trigger in an
RDBMS: I have to define something (probably with an Accumulo iterator) to
import to Solr on inserting new data.
Regards.

On Fri, Jul 25, 2014 at 12:59 PM, Ali Nazemian wrote:

> Dear Jack,
> Actually I am going to do a benefit-cost analysis for in-house
> development versus going for Sqrrl support.
> Best regards.

On Thu, Jul 24, 2014 at 11:48 PM, Jack Krupansky wrote:

> Like I said, you're going to have to be a real, hard-core gunslinger to
> do that well.
>
> Sqrrl uses Lucene directly, BTW: "Full-Text Search: Utilizing open-source
> Lucene and custom indexing methods, Sqrrl Enterprise users can conduct
> real-time, full-text search across data in Sqrrl Enterprise."
>
> See: http://sqrrl.com/product/search/
>
> Out of curiosity, why are you not using that integrated Lucene support
> of Sqrrl Enterprise?
>
> -- Jack Krupansky

-----Original Message-----
From: Ali Nazemian
Sent: Thursday, July 24, 2014 3:07 PM
To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr

Dear Jack,
Thank you. I am aware of DataStax, but I am looking for integrating
Accumulo with Solr. This is something like what the Sqrrl guys offer.
Regards.

On Thu, Jul 24, 2014 at 7:27 PM, Jack Krupansky wrote:

> If you are not a "true hard-core gunslinger" who is willing to dive in
> and integrate the code yourself, you should instead give serious
> consideration to a product such as DataStax Enterprise that fully
> integrates and packages a NoSQL database (Cassandra) and Solr for search.
> The security aspects are still a work in progress, but certainly headed
> in the right direction. And it has Hadoop and Spark integration as well.
>
> See:
> http://www.datastax.com/what-we-offer/products-services/datastax-enterprise
>
> -- Jack Krupansky

-----Original Message-----
From: Ali Nazemian
Sent: Thursday, July 24, 2014 10:30 AM
To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr

Thank you very much. Nice idea, but how can Solr and Accumulo be
synchronized in this way? I know that Solr can be integrated with HDFS and
that Accumulo works on top of HDFS.
So can I use HDFS as the integration point? I mean, set Solr to use HDFS as
a source of documents as well as the destination of documents.
Regards.

On Thu, Jul 24, 2014 at 4:33 PM, Joe Gresock wrote:

> Ali,
>
> Sounds like a good choice. It's pretty standard to store the primary
> storage id as a field in Solr, so that you can search the full text in
> Solr and then retrieve the full document elsewhere.
>
> I would recommend creating a document structure in Solr with whatever
> fields you want indexed (most likely as text_en, etc.), and then storing
> a "string" field named "content_id", which would be the Accumulo row id
> that you look up with a scan.
>
> One caveat -- Accumulo will be protected at the cell level, but if you
> need your Solr search results to be protected by complex authorization
> strings similar to Accumulo's, you will need to write your own
> QParserPlugin and use post filtering:
> http://java.dzone.com/articles/custom-security-filtering-solr
>
> The code you see in that article is written for an earlier version of
> Solr, but it's not too difficult to adjust it for the latest (we've done
> so in our project). Once you've implemented this, you would store an
> "authorizations" string field in each Solr document, and pass the
> authorizations that the user has access to in the fq parameter of every
> query. It's also not too bad to write something that parses the A
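A minimal SolrJ sketch of the pattern Joe describes. The field names
content_id and authorizations follow his suggestion; the core URL, the
document values, and especially the {!auth} filter syntax are placeholders
(the real parser would be whatever your custom QParserPlugin from the
linked article registers):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class AccumuloSolrSketch {
      public static void main(String[] args) throws Exception {
        HttpSolrServer solr =
            new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Index the searchable text plus the Accumulo row id and auth string.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("text_en", "full text extracted from the Accumulo value");
        doc.addField("content_id", "accumulo-row-42"); // used later to scan Accumulo
        doc.addField("authorizations", "admin|audit"); // checked by the custom QParserPlugin
        solr.add(doc);
        solr.commit();

        // Full-text search in Solr, post-filtered by the user's authorizations;
        // the application then fetches the full record from Accumulo by content_id.
        SolrQuery q = new SolrQuery("text_en:example");
        q.addFilterQuery("{!auth authorizations=admin}"); // hypothetical custom parser
        System.out.println(solr.query(q).getResults());
      }
    }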
Re: solr always loading and not any response
I always get the "Loading" message on the Solr Admin Console if I use IE.
However, the page loads perfectly fine when I use Google Chrome or Mozilla
Firefox.

Could you check whether your problem resolves itself if you use a different
browser?
Content-Charset header in HttpSolrServer
I was reviewing the httpclient code in HttpSolrServer and noticed that it sets a "Content-Charset" header. As far as I know this is not a real header and is not necessary. Anyone know a reason for this to be there? I'm guessing this was just a mistake when converting from httpclient3 to httpclient4. -Michael
Re: Latest jetty
Yes, we are on Java7 so we can move now. I'll open an issue. On Sun, Jul 27, 2014 at 5:39 AM, Bill Bell wrote: > Since we are now on latest Java JDK can we move to Jetty 9? > > Thoughts ? > > Bill Bell > Sent from mobile > > -- Regards, Shalin Shekhar Mangar.
Re: Latest jetty
I found SOLR-4839 so we'll use that issue. https://issues.apache.org/jira/browse/SOLR-4839 On Sun, Jul 27, 2014 at 8:06 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > Yes, we are on Java7 so we can move now. I'll open an issue. > > > On Sun, Jul 27, 2014 at 5:39 AM, Bill Bell wrote: > >> Since we are now on latest Java JDK can we move to Jetty 9? >> >> Thoughts ? >> >> Bill Bell >> Sent from mobile >> >> > > > -- > Regards, > Shalin Shekhar Mangar. > -- Regards, Shalin Shekhar Mangar.
Re: How to Get Highlighting Working in Velocity (Solr 4.8.0)
Maybe you missed that your field "dom_title" should be defined with
indexed="true" termVectors="true" termPositions="true" termOffsets="true".
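A sketch of such a field definition in schema.xml (the field type name is
illustrative, not from the original message):

    <field name="dom_title" type="text_general" indexed="true" stored="true"
           termVectors="true" termPositions="true" termOffsets="true"/>

With term vectors, positions, and offsets stored, the highlighter can read
fragment offsets directly instead of re-analyzing the stored text.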
Re: how to achieve static boost in solr
Yep, Query Elevation is a pretty blunt instrument.

You should be able to get the configuration file to re-load by issuing a
reload command rather than re-starting.

But your problem of having a bunch of different queries return the same top
doc is, indeed, the problem. You need a complete list of query terms, and
each one needs an entry. The only real alternative is to be able to somehow
encode the _reason_ these docs need to be returned first, which you haven't
articulated. If it's an arbitrary reason (i.e. "sponsored search" or some
such) it's pretty hard, because there's no rule to turn into an algorithm.

Best,
Erick

On Thu, Jul 24, 2014 at 4:30 AM, rahulmodi wrote:

> Thanks a lot Erick,
>
> I have looked at the Query Elevation Component. It works, but the problem
> is that if I need to add a new tag or update an existing tag in the
> elevate.xml file, then I need to restart the server for it to take
> effect.
>
> I have also used "forceElevation=true", but even then it requires
> restarting the server.
>
> Is there any way by which we can achieve this without restarting the
> server?
>
> Also, there is another issue: it works only when we use the exact query.
> An example is below. elevate.xml has an entry like:
>
> <query text="energy">
>   <doc id="http://welcome.energy.com/" />
> </query>
>
> If I use "energy" as the query then I get the correct URL,
> "http://welcome.energy.com/". But if I use "power energy" as the query
> then I get another URL, whereas here too I want the URL
> "http://welcome.energy.com/" to be displayed.
>
> Please suggest how to achieve this.
> Thanks in advance.
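For reference, the reload Erick mentions can be issued over HTTP via the
Core Admin API; a sketch (host, port, and core name are placeholders):

    curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1"

On SolrCloud, the equivalent is action=RELOAD on /admin/collections with
name=<collection>. Either way, the edited elevate.xml is picked up without
a server restart.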
Re: Auto Suggest
No, although there's been some joy with using shingles. Autosuggest works
off of the _indexed tokens_. So the problem is really reducing the
tokenization to something that is multi-word.

Best,
Erick

On Thu, Jul 24, 2014 at 5:11 AM, benjelloun wrote:

> Hello,
>
> Does solr.SuggestComponent work on a multiValued field, to autosuggest
> not only one word but a whole sentence?
>
> Regards,
> Anass BENJELLOUN
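A sketch of the shingle approach Erick alludes to, as a schema.xml field
type (the type name and shingle sizes are illustrative):

    <fieldType name="text_shingle" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- emit 2- to 4-word phrases as single tokens -->
        <filter class="solr.ShingleFilterFactory" minShingleSize="2"
                maxShingleSize="4" outputUnigrams="true"/>
      </analyzer>
    </fieldType>

Pointing the suggester's dictionary at a field of this type turns
multi-word phrases into indexed tokens, so they can be suggested whole.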
Re: how to achieve static boost in solr
Hi,

Can't you use the elevateIds parameter?

https://wiki.apache.org/solr/QueryElevationComponent#elevateIds.2FexcludeIds

On Thursday, July 24, 2014 2:30 PM, rahulmodi wrote:

> Thanks a lot Erick,
>
> I have looked at the Query Elevation Component. It works, but the problem
> is that if I need to add a new tag or update an existing tag in the
> elevate.xml file, then I need to restart the server for it to take
> effect.
> [...]
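A usage sketch of that parameter (the id value is the unique-key of the
document to promote, here the URL used earlier in this thread):

    q=power+energy&enableElevation=true&forceElevation=true&elevateIds=http://welcome.energy.com/

Because elevateIds is a request parameter, it needs no entry in elevate.xml
and therefore no restart or reload when the promoted documents change.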
Re: Passivate core in Solr Cloud
"Does not play nice" really means it was designed to run in a non-distributed mode. There has been no work done to verify that it does work in cloud mode, I fully expect some "interesting" problems in that mode. If/when we get to it that is. About replication: I haven't heard of any problems, but I also haven't heard of it working in that environment. I expect that it'll only try to replicate when it's loaded, so that might be interesting Best, Erick On Thu, Jul 24, 2014 at 6:49 AM, Aurélien MAZOYER < aurelien.mazo...@francelabs.com> wrote: > Thank you Erick and Alex for your answers. Lots of core stuff seems to > meet my requirement but it is a problem if it does not work with Solr > Cloud. Is there an issue opened for this problem? > If I understand well, the only solution for me is to use multiple > monoinstances of Solr using transient cores and to distribute manually the > cores for my tenant (I assume the LRU mechanimn will be less effective as > it will be done per solr instance). > When you say "does NOT play nice with distributed mode", does it also > include the standard replication mecanism? > > Thanks, > > Regards, > > Aurelien > > > > Le 23/07/2014 17:21, Erick Erickson a écrit : > > Do note that the lots of cores stuff does NOT play nice with in >> distributed mode (yet). >> >> Best, >> Erick >> >> >> On Wed, Jul 23, 2014 at 6:00 AM, Alexandre Rafalovitch> > >> wrote: >> >> Solr has some support for large number of cores, including transient >>> cores:http://wiki.apache.org/solr/LotsOfCores >>> >>> Regards, >>> Alex. >>> Personal:http://www.outerthoughts.com/ and @arafalov >>> Solr resources:http://www.solr-start.com/ and @solrstart >>> Solr popularizers community:https://www.linkedin.com/groups?gid=6713853 >>> >>> >>> On Wed, Jul 23, 2014 at 7:55 PM, Aurélien MAZOYER >>> wrote: >>> Hello, We want to setup a Solr Cloud cluster in order to handle a high volume of documents with a multi-tenant architecture. The problem is that an application-level isolation for a tenant (using a mutual index with a >>> field >>> "customer") is not enough to fit our requirements. As a result, we need 1 collection/customer. There is more than a thousand customers and it seems unreasonable to create thousands of collections in Solr Cloud... But as >>> we >>> know that there are less than 1 query/customer/day, we are currently >>> looking >>> for a way to passivate collection when they are not in use. Can it be a >>> good >>> idea? If yes, are there best practices to implement this? What side >>> effects >>> can we expect? Do we need to put some application-level logic on top on >>> the >>> Solr Cloud cluster to choose which collection we have to unload (and >>> maybe >>> there is something smarter (and quicker?) than simply loading/unloading >>> the >>> core when it is not in used?) ? Thank you for your answer(s), Aurelien >
Re: To warm the whole cache of Solr other than the only autowarmcount
Why do you think you _need_ to autowarm the entire cache? It is, after all,
an LRU cache, the theory being that the most recent queries are most likely
to be reused.

Personally, I'd run some tests using small autowarm counts before getting
at all mixed up in some complex scheme that may not be useful at all. Say
an autowarm count of 16. Then measure using that, then say 32, then...
Ensure you have a real problem before worrying about a solution! ;)

Best,
Erick

On Fri, Jul 25, 2014 at 6:45 AM, Shawn Heisey wrote:

> On 7/24/2014 8:45 PM, YouPeng Yang wrote:
> > To Matt
> >
> > Thank you, your opinion is very valuable. So I have checked the source
> > code for how the cache warms up. It seems to just put items of the old
> > caches into the new caches.
> > I will pull Mark Miller into this discussion. He is one of the
> > developers of Solr whom I had contacted.
> >
> > To Mark Miller
> >
> > Would you please check out what we are discussing in the last two
> > posts. I need your help.
>
> Matt is completely right. Any commit can drastically change the Lucene
> document id numbers. It would be too expensive to determine which numbers
> haven't changed. That means Solr must throw away all cache information
> on commit.
>
> Two of Solr's caches support autowarming. Those caches use queries as
> keys and results as values. Autowarming works by re-executing the top N
> queries (keys) in the old cache to obtain fresh Lucene document id
> numbers (values). The cache code does take *keys* from the old cache for
> the new cache, but not *values*. I'm very sure about this, as I wrote
> the current (and not terribly good) LFUCache.
>
> Thanks,
> Shawn
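The knob under discussion, as it appears in solrconfig.xml (the sizes here
are illustrative):

    <filterCache class="solr.FastLRUCache"
                 size="4096"
                 initialSize="1024"
                 autowarmCount="16"/>

autowarmCount is the N in Shawn's description: the number of old cache keys
that get re-executed against the new searcher after a commit.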
Re: SolrCloud extended warmup support
Hmmm, well _I_ don't know what to say then. This is puzzling. How much of a
latency difference are you seeing?

It'd be interesting to see what happens if you experiment with only going
to a single shard (add &distrib=false to the query). Each cache is local to
the shard, so it's vaguely possible that you're seeing queries hit
different shards and in aggregate reduce your total latency. But I'm really
shooting in the dark here.

Best,
Erick

On Mon, Jul 21, 2014 at 5:57 PM, Erick Erickson wrote:

> I've never seen it necessary to run "thousands of queries" to warm Solr.
> Usually less than a dozen will work fine. My challenge would be for you
> to measure performance differences on queries after running, say, 12
> well-chosen queries as opposed to hundreds/thousands. I bet that if
> 1> you search across all the relevant fields, you'll fill up the
> low-level caches for those fields,
> 2> you facet on all the fields you intend to facet on,
> 3> you sort on all the fields you intend to sort on,
> 4> you specify some filter queries -- this is fuzzy, since it really
> depends on you being able to predict what those will be for
> firstSearcher; things like "in the last day/week/month" can be
> pre-configured, but others you won't get (BTW, here's a blog about why
> "in the last day" fq clauses can be tricky:
> http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/),
> then you'll pretty much nail warmup and be fine. Note that you can do all
> the faceting in a single query. Specifying the primary, secondary & etc.
> sorts will fill those caches.
>
> Best,
> Erick
>
> On Mon, Jul 21, 2014 at 5:07 PM, Jeff Wartes wrote:
>
>> On 7/21/14, 4:50 PM, "Shawn Heisey" wrote:
>>
>> >On 7/21/2014 5:37 PM, Jeff Wartes wrote:
>> >> I'd like to ensure an extended warmup is done on each SolrCloud node
>> >> prior to that node serving traffic.
>> >> I can do certain things prior to starting Solr, such as pump the
>> >> index dir through /dev/null to pre-warm the filesystem cache, and
>> >> post-start I can use the ping handler with a health check file to
>> >> prevent the node from entering the client's load balancer until I'm
>> >> ready.
>> >> What I seem to be missing is control over when a node starts
>> >> participating in queries sent to the other nodes.
>> >>
>> >> I can, of course, add solrconfig.xml firstSearcher queries, which I
>> >> assume (and fervently hope!) happens before a node registers itself
>> >> in ZK clusterstate.json as ready for work, but that doesn't scale so
>> >> well if I want that initial warmup to run thousands of queries, or
>> >> run them with some parallelism. I'm storing solrconfig.xml in ZK, so
>> >> I'm sensitive to the size.
>> >>
>> >> Any ideas, or corrections to my assumptions?
>> >
>> >I think that firstSearcher/newSearcher (and making sure useColdSearcher
>> >is set to false) is going to be the only way you can do this in a way
>> >that's compatible with SolrCloud. If you were doing manual distributed
>> >search without SolrCloud, you'd have more options available.
>> >
>> >If useColdSearcher is set to false, that should keep *everything* from
>> >using the searcher until the warmup has finished. I cannot be certain
>> >that this is the case, but I have some reasonable confidence that this
>> >is how it works. If you find that it doesn't behave this way, I'd call
>> >it a bug.
>> >
>> >Thanks,
>> >Shawn
>>
>> Thanks for the quick reply.
>> Since distributed search latency is the max of the shard sub-requests,
>> I'm trying my best to minimize any spikes in cluster latency due to
>> node restarts.
>> I double-checked that useColdSearcher was false, but the doc says this
>> means requests "block until the first searcher is done warming", which
>> translates pretty clearly to "latency spike". The more I think about
>> it, the more worried I am that a node might indeed register itself in
>> live_nodes and get distributed requests before it's got a searcher to
>> work with. *Especially* if I have lots of serial firstSearcher queries.
>>
>> I'll look through the code myself tomorrow, but if anyone can help
>> confirm/deny the order of operations here, I'd appreciate it.
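The solrconfig.xml hook being discussed in this thread, sketched (the query
fields and values are illustrative):

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="facet">true</str>
          <str name="facet.field">category</str>
          <str name="sort">price asc</str>
        </lst>
        <lst>
          <str name="q">popular query</str>
          <str name="fq">date:[NOW/DAY-7DAYS TO NOW/DAY]</str>
        </lst>
      </arr>
    </listener>
    <useColdSearcher>false</useColdSearcher>

QuerySenderListener runs these queries against the first searcher before it
is registered; useColdSearcher=false makes requests wait for that warmup.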
Re: Slow inserts when using Solr Cloud
bq: Whoa! That's awesome!

And scary.

Ian: Thanks a _lot_ for trying this out and reporting back. Also, let me
say that this was a nice writeup; I wish more people would post as thorough
a problem statement!

Best,
Erick

On Sat, Jul 26, 2014 at 5:08 AM, Shalin Shekhar Mangar wrote:

> Whoa! That's awesome!
>
> On Fri, Jul 25, 2014 at 8:03 PM, ian wrote:
>
> > I've built and installed the latest snapshot of Solr 4.10 using the
> > same SolrCloud configuration, and that gave me a tenfold increase in
> > throughput, so it certainly looks like SOLR-6136 was the issue that
> > was causing my slow insert rate/high latency with shard routing and
> > replicas. Thanks for your help.
> >
> > Timothy Potter wrote
> > > Hi Ian,
> > >
> > > What's the CPU doing on the leader? Have you tried attaching a
> > > profiler to the leader while running and then seeing if there are
> > > any hotspots showing? Not sure if this is related, but we recently
> > > fixed an issue in the area of leader forwarding to replica that used
> > > too many CPU cycles inefficiently - see SOLR-6136.
> > >
> > > Tim
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: how to extract stats component with solrj 4.9.0
Have you tried the getFieldStatsInfo method in the QueryResponse object?

Best,
Erick

On Sat, Jul 26, 2014 at 3:36 PM, Edith Au wrote:

> I have a Solr query like this:
>
> q=categories:cat1 OR categories:cat2&stats=true&stats.field=count&stats.facet=block_num
>
> Basically, I want to get the sum(count) grouped by block num.
>
> This query works in a browser. But with SolrJ, I could not access the
> stats fields from the response object. I can do a
> response.getFieldStatsInfo(), but it is not what I want. Here is how I
> construct the query:
>
> SolrQuery query = new SolrQuery(q);
> query.add("stats", "true");
> query.add("stats.field", "count");
> query.add("stats.facet", "block_num");
>
> With a debugger, I could see that the response has a private statsInfo
> object and that it has the information I am looking for. But there is no
> API to access the private object.
>
> I would like to know if there is
>
> 1. a better way to construct my query. I only need the sum of (count),
>    grouped by block num.
> 2. a way to access the hidden statsInfo object in the query response?
>    [It is so frustrating. I can see all the info I need in the private
>    object in my debugger!]
>
> Thanks!
>
> ps. I posted this question on stackoverflow but have gotten no response
> so far. Any help will be greatly appreciated!
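A sketch of Erick's suggestion in SolrJ (field names follow the thread; the
core URL is a placeholder). getFieldStatsInfo() exposes the per-facet-value
stats, so no private object is needed:

    import java.util.List;
    import java.util.Map;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.FieldStatsInfo;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class StatsExample {
      public static void main(String[] args) throws Exception {
        HttpSolrServer solr =
            new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery query = new SolrQuery("categories:cat1 OR categories:cat2");
        query.setGetFieldStatistics("count");            // stats=true&stats.field=count
        query.addStatsFieldFacets("count", "block_num"); // stats.facet=block_num

        QueryResponse response = solr.query(query);
        FieldStatsInfo stats = response.getFieldStatsInfo().get("count");

        // The facet breakdown: one FieldStatsInfo per block_num value.
        Map<String, List<FieldStatsInfo>> facets = stats.getFacets();
        for (FieldStatsInfo blockStats : facets.get("block_num")) {
          System.out.println(blockStats.getName() + " sum=" + blockStats.getSum());
        }
      }
    }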
Re: /solr/admin/ping causing exceptions in log?
On 7/26/2014 5:15 PM, Nathan Neulinger wrote:
> Recently deployed haproxy in front of my solr instances, and seeing a
> large number of exceptions in the logs now... Example below. I can
> pound the server with requests against /solr/admin/ping via curl, with
> no obvious issue, but the haproxy checks appear to be aggravating
> something.
>
> Solr 4.8.0 w/ solr cloud, 2 nodes, 3 zk, linux x86_64
>
> It seems like when the issue occurs, I get a set of the errors all in a
> burst (below), never just one.
>
> Suggestions?
>
> -- Nathan
>
> 2014-07-26 23:04:36,506 ERROR qtp1532385072-4864
> [g.apache.solr.servlet.SolrDispatchFilter] -
> null:org.eclipse.jetty.io.EofException

EofException means that the client has disconnected the TCP connection
before Solr has responded to the request.

I assume that this is the httpchk config to make sure that the server is
operational. If so, you need to increase the "timeout check" value, because
it is too small. The ping request is taking longer to run than you have
allowed in the timeout.

Here's part of my haproxy config:

listen idx_nc
    bind 0.0.0.0:8984
    option httpchk GET /solr/ncmain/admin/ping
    balance leastconn
    timeout check 4990
    server idxa1 10.100.0.240:8981 check inter 5s fastinter 2s rise 3 fall 2 weight 100
    server idxb1 10.100.0.241:8981 check inter 5s fastinter 2s rise 3 fall 2 weight 100 backup
    server idxa2 10.100.0.242:8981 check inter 15s fastinter 2s rise 2 fall 1 weight 2 backup
    server idxb2 10.100.0.243:8981 check inter 15s fastinter 2s rise 2 fall 1 weight 1 backup

If you have allowed what you think is plenty of time, then you may need to
investigate Solr's performance or the specific query that you are using for
the ping.

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
Re: Latest jetty
On 7/27/2014 8:37 AM, Shalin Shekhar Mangar wrote: > I found SOLR-4839 so we'll use that issue. > > https://issues.apache.org/jira/browse/SOLR-4839 I hope you have better luck than I did. It wasn't a simple matter of upgrading the jars and locating simple API changes, a job that I've tackled a few times. More extensive knowledge of jetty will be required, knowledge that I do not have. Thanks, Shawn
Re: /solr/admin/ping causing exceptions in log?
Cool. That's likely exactly it: since I don't have one set, it's using the
check interval, which occasionally must just be too short. Thank you!

-- Nathan

> I assume that this is the httpchk config to make sure that the server is
> operational. If so, you need to increase the "timeout check" value,
> because it is too small. The ping request is taking longer to run than
> you have allowed in the timeout.
>
> Here's part of my haproxy config:
> [...]
Re: /solr/admin/ping causing exceptions in log?
Unfortunately, it doesn't look like this clears the symptom. The ping is
responding almost instantly every time. I've tried setting a 15 second
timeout on the check, with no change in occurrences of the error.

Looking at a packet capture on the server side, there is a clear
distinction between working and failing/error-triggering connections.

It looks like in a "working" case, I see two packets immediately back to
back (one with the header, and next a continuation with content) with no
ack in between, followed by ack, rst+ack, rst.

In the failing request, I see the GET request, acked, then the HTTP/1.1 200
OK response from Solr, a single ack, and then an almost instantaneous reset
sent by the client.

I'm only seeing this on traffic to/from the haproxy checks. If I do a
simple:

    while [ true ]; do curl -s http://host:8983/solr/admin/ping; done

from the same box, that flood runs with generally 10-20ms request times and
zero errors.

-- Nathan

On 07/27/2014 07:12 PM, Nathan Neulinger wrote:
> Cool. That's likely exactly it: since I don't have one set, it's using
> the check interval, which occasionally must just be too short. Thank you!
> [...]

Attachments: solr-working.cap, solr-cutoff2.cap (tcpdump captures)
Re: /solr/admin/ping causing exceptions in log?
Either way, it looks like this is not a Solr issue, but rather haproxy.
Thanks.

-- Nathan

On 07/27/2014 08:23 PM, Nathan Neulinger wrote:
> Unfortunately, it doesn't look like this clears the symptom. The ping is
> responding almost instantly every time. I've tried setting a 15 second
> timeout on the check, with no change in occurrences of the error.
>
> Looking at a packet capture on the server side, there is a clear
> distinction between working and failing/error-triggering connections.
> [...]
Re: /solr/admin/ping causing exceptions in log?
On 7/27/2014 7:23 PM, Nathan Neulinger wrote:
> Unfortunately, it doesn't look like this clears the symptom.
> [...]
> I'm only seeing this on traffic to/from the haproxy checks. If I do a
> simple:
>
>     while [ true ]; do curl -s http://host:8983/solr/admin/ping; done
>
> from the same box, that flood runs with generally 10-20ms request times
> and zero errors.

I won't claim to understand what's going on here, but it might be a matter
of the haproxy options. Here are the options I'm using in the "defaults"
section of the config:

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option redispatch
    option abortonclose
    option http-server-close
    option http-pretend-keepalive
    retries 1
    maxconn 1024
    timeout connect 1s
    timeout client 5s
    timeout server 30s

One bit of information I came across when I first started setting up
haproxy for Solr is that servlet containers like Jetty and Tomcat require
the "http-pretend-keepalive" option to work properly. Are you using this
option?

Thanks,
Shawn
Re: Solr Full Import frozen after indexing a fixed number of records
On Sun, Jul 27, 2014 at 12:28 PM, Gora Mohanty wrote:

> On 27 July 2014 12:13, Aniket Bhoi wrote:
> > [...]
> > > After I run a full import, indexing proceeds successfully, but it
> > > seems to freeze every time after fetching a fixed number of records.
> > [...]
>
> Something is wrong with your DIH config file: you seem to be getting
> null for a document unique ID. Please share the file with us.
>
> Regards,
> Gora

Hi,

The thing is that I have 3 Solr instances deployed across Dev, QA and
Production. They have the exact same configuration files but point to
different databases. The DIH config is the same across all 3 instances,
yet it is only in QA that this issue occurs. Thoughts on this?

Regards,
Aniket
Re: To warm the whole cache of Solr other than the only autowarmcount
Hi Erick

We do the DIH job from the DB and commit frequently. It takes a long time
to autowarm the filterCaches after a commit or soft commit happens, even
with autowarmCount=1024, which I do think is small enough.

So the idea came up of whether the references of the old caches could be
passed directly over to the new caches, so that the autowarm processing
would take much less time.

2014-07-28 2:30 GMT+08:00 Erick Erickson:

> Why do you think you _need_ to autowarm the entire cache? It is, after
> all, an LRU cache, the theory being that the most recent queries are
> most likely to be reused.
>
> Personally, I'd run some tests using small autowarm counts before
> getting at all mixed up in some complex scheme that may not be useful at
> all. Say an autowarm count of 16. Then measure using that, then say 32,
> then... Ensure you have a real problem before worrying about a solution!
> ;)
>
> Best,
> Erick
> [...]
Re: To warm the whole cache of Solr other than the only autowarmcount
Hi Shawn

No offense to your work; I am still confused about the cache warm
processing in your explanation. So I checked the warm method of
FastLRUCache, as in [1]. As far as I can see, there is no refresh of the
values during the warm processing: the *regenerator.regenerateItem* is just
handed the old value along with the old key. Did I miss anything?

[1]--------------------------------------------------------------

  public void warm(SolrIndexSearcher searcher, SolrCache old) {
    if (regenerator == null) return;
    long warmingStartTime = System.nanoTime();
    FastLRUCache other = (FastLRUCache) old;
    // warm entries
    if (isAutowarmingOn()) {
      int sz = autowarm.getWarmCount(other.size());
      Map items = other.cache.getLatestAccessedItems(sz);
      Map.Entry[] itemsArr = new Map.Entry[items.size()];
      int counter = 0;
      for (Object mapEntry : items.entrySet()) {
        itemsArr[counter++] = (Map.Entry) mapEntry;
      }
      for (int i = itemsArr.length - 1; i >= 0; i--) {
        try {
          boolean continueRegen = regenerator.regenerateItem(searcher,
              this, old, itemsArr[i].getKey(), itemsArr[i].getValue());
          if (!continueRegen) break;
        } catch (Exception e) {
          SolrException.log(log, "Error during auto-warming of key:" +
              itemsArr[i].getKey(), e);
        }
      }
    }
    warmupTime = TimeUnit.MILLISECONDS.convert(System.nanoTime() -
        warmingStartTime, TimeUnit.NANOSECONDS);
  }

2014-07-25 21:45 GMT+08:00 Shawn Heisey:

> On 7/24/2014 8:45 PM, YouPeng Yang wrote:
> > [...]
>
> Matt is completely right. Any commit can drastically change the Lucene
> document id numbers. It would be too expensive to determine which numbers
> haven't changed. That means Solr must throw away all cache information
> on commit.
>
> Two of Solr's caches support autowarming. Those caches use queries as
> keys and results as values. Autowarming works by re-executing the top N
> queries (keys) in the old cache to obtain fresh Lucene document id
> numbers (values). The cache code does take *keys* from the old cache for
> the new cache, but not *values*. I'm very sure about this, as I wrote
> the current (and not terribly good) LFUCache.
>
> Thanks,
> Shawn
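For context, a sketch of the kind of CacheRegenerator that resolves this
apparent contradiction (the shape follows Solr's filter-cache regenerator;
treat the body as illustrative rather than an exact copy of the source):

    import java.io.IOException;
    import org.apache.lucene.search.Query;
    import org.apache.solr.search.CacheRegenerator;
    import org.apache.solr.search.SolrCache;
    import org.apache.solr.search.SolrIndexSearcher;

    public class FilterCacheRegenerator implements CacheRegenerator {
      @Override
      public boolean regenerateItem(SolrIndexSearcher newSearcher,
                                    SolrCache newCache, SolrCache oldCache,
                                    Object oldKey, Object oldVal)
          throws IOException {
        // The old value is ignored: the old key (a Query) is re-executed
        // against the NEW searcher, which caches a fresh DocSet.
        newSearcher.cacheDocSet((Query) oldKey, null, false);
        return true;
      }
    }

So although warm() passes both the old key and the old value to
regenerateItem, the standard regenerators recompute the value against the
new searcher rather than copying the old one, which is Shawn's point.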
Re: Slow inserts when using Solr Cloud
I'm benchmarking this right now, so I'll share some numbers soon.

On Mon, Jul 28, 2014 at 12:45 AM, Erick Erickson wrote:

> bq: Whoa! That's awesome!
>
> And scary.
>
> Ian: Thanks a _lot_ for trying this out and reporting back. Also, let me
> say that this was a nice writeup; I wish more people would post as
> thorough a problem statement!
>
> Best,
> Erick
> [...]

--
Regards,
Shalin Shekhar Mangar.
copy EnumField to text field
Hi all,

I have an enumField called severity. These are its relevant definitions in
schema.xml (the type name is illustrative):

<fieldType name="severityType" class="solr.EnumField"
           enumsConfig="enumsConfig.xml" enumName="severity"/>
<field name="severity" type="severityType" indexed="true" stored="true"
       default="0"/>
<copyField source="severity" dest="text"/>

And in enumsConfig.xml:

<enum name="severity">
  <value>Not Available</value>
  <value>Low</value>
  <value>Medium</value>
  <value>High</value>
  <value>Critical</value>
</enum>

The default field for free text search is text. An enum field can be sent
with its integer value or with its string value, and the value will be
stored and indexed as an integer and displayed as a string.

When severity is sent as "Not Available", there are matches for the free
text search of "Not Available". When severity is sent as "0" (the integer
equivalent of "Not Available"), there are no matches for the free text
search of "Not Available".

In order to enable it, the following change should be made in
DocumentBuilder. Instead of:

// Perhaps trim the length of a copy field
Object val = v;

the code will be:

// Perhaps trim the length of a copy field
Object val = sfield.getType().toExternal(sfield.createField(v, 1.0f));

Am I right? It seems to work. I think this change is suitable for all field
types. What do you think?

But when no value is sent for severity, and the default of 0 is used, the
fix doesn't seem to work. How can I make it work also for default values?

Thanks.
Re: copy EnumField to text field
On Mon, Jul 28, 2014 at 1:31 PM, Elran Dvir wrote: > But when no value is sent with severity, and the default of 0 is used, the > fix doesn't seem to work. I guess the default in this case is figured out at the query time because there is no empty value as such. So that would be too late for copyField. If I am right, then you could probably use UpdateRequestProcessor and set the default value explicitly (DefaultValueUpdateProcessorFactory). Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
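A sketch of Alexandre's suggestion as a solrconfig.xml update chain (the
chain name is illustrative; the processor is the stock
DefaultValueUpdateProcessorFactory):

    <updateRequestProcessorChain name="default-severity">
      <processor class="solr.DefaultValueUpdateProcessorFactory">
        <str name="fieldName">severity</str>
        <str name="value">0</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

Because the default is materialized on the incoming document before
indexing, the copyField to text then sees a real value instead of a
query-time default.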