Re: Distributing Collections across Shards
Thanks Erick for the help. Appreciate it. Regards, Salman On Wed, Mar 30, 2016 at 7:29 AM, Erick Erickson wrote: > Absolutely. You haven't said which version of Solr you're using, > but there are several possibilities: > 1> create the collection with replicationFactor=1, then use the > ADDREPLICA command to specify exactly what node the replicas > for each shard are created on with the 'node' parameter. > 2> For recent versions of Solr, you can create a collection with _no_ > replicas and then ADDREPLICA as you choose. > > Best, > Erick > > On Tue, Mar 29, 2016 at 5:10 AM, Salman Ansari > wrote: > > Hi, > > > > I believe the default behavior of creating collections distributed across > > shards through the following command > > > > http:// > > > [solrlocation]:8983/solr/admin/collections?action=CREATE&name=[collection_name]&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=[configuration_name] > > > > is that Solr will create the collection as follows > > > > *shard1: *leader in server1 and replica in server2 > > *shard2:* leader in server2 and replica in server1 > > > > However, I have seen cases when running the above command that it creates > > both the leader and replica on the same server. > > > > Wondering if there is a way to control this behavior (I mean control > where > > the leader and the replica of each shard will reside)? > > > > Regards, > > Salman >
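For reference, option 1 above might look like this against the Collections API — a sketch only, with host names, ports, the configset, and the collection name as placeholders:

  # create the collection with a single replica (the future leader) per shard
  curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&collection.configName=myconf'

  # then place each additional replica on an explicitly chosen node
  curl 'http://solr1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=solr2:8983_solr'
  curl 'http://solr1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard2&node=solr1:8983_solr'

The 'node' value uses the node_name form shown in the cloud view of the admin UI (host:port_solr).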
Re: High Cpu sys usage
Hi Thank you, Erick. The main collection that stores our trade data is set to softCommit when we import data using DIH. As you guessed, the softCommit interval is <maxTime>1000</maxTime> and we have autowarm counts set to 0. However, there are some collections that store our meta info in which we commit after each add, and these metadata collections just hold a few docs. Best Regards 2016-03-30 12:25 GMT+08:00 Erick Erickson : > Do not, repeat NOT try to "cure" the "Overlapping onDeckSearchers" > by bumping this limit! What that means is that your commits > (either hard commit with openSearcher=true or softCommit) are > happening far too frequently and your Solr instance is trying to do > all sorts of work that is immediately thrown away and chewing up > lots of CPU. Perhaps this will help: > > > https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ > > I'd guess that you're > > > committing every second, or perhaps your indexing client is committing > after each add. If the latter, do not do this and rely on the > autocommit settings, > and if the former make those intervals as long as you can stand. > > > You may have your autowarm counts in your solrconfig.xml file set at > very high numbers (let's see the filterCache settings, the queryResultCache > settings etc.). > > I'd _strongly_ recommend that you put the on-deck searchers back to > 2 and figure out why you have so many overlapping searchers. > > Best, > Erick > > On Tue, Mar 29, 2016 at 8:57 PM, YouPeng Yang > wrote: > > Hi Toke > > The number of collections is just 10. One of the collections has 43 > shards, each > > shard has two replicas. We continue importing data from Oracle all the > time > > while our systems provide searching service. > > There are "Overlapping onDeckSearchers" in my solr.logs. What is the > > meaning of "Overlapping onDeckSearchers"? We set <maxWarmingSearchers>20</maxWarmingSearchers> and <useColdSearcher>true</useColdSearcher>. Is that right? > > > > > > > > Best Regards. > > > > > > 2016-03-29 22:31 GMT+08:00 Toke Eskildsen : > > > >> On Tue, 2016-03-29 at 20:12 +0800, YouPeng Yang wrote: > >> > Our system still goes down as time goes on. We found lots of threads are > >> > WAITING. Here is the thread dump that I copied from the web page, and 4 > >> pictures > >> > for it. > >> > Is there any relationship with my problem? > >> > >> That is a lot of commitScheduler-threads. Do you have hundreds of > >> collections in your cloud? > >> > >> > >> Try grepping for "Overlapping onDeckSearchers" in your solr.logs to see > >> if you got caught in a downwards spiral of concurrent commits. > >> > >> - Toke Eskildsen, State and University Library, Denmark > >> > >> > >> >
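For the metadata collections that currently commit after each add, one alternative — a sketch only, with the collection name and document fields made up for illustration — is to drop the explicit commits and let commitWithin batch them:

  curl 'http://localhost:8983/solr/metadata_collection/update?commitWithin=10000' -H 'Content-type:application/json' --data-binary '[{"id":"meta-1","name_s":"example"}]'

Documents become searchable within roughly ten seconds, and Solr coalesces the commits instead of opening a new searcher for every add.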
Re: How to implement Autosuggestion
Hi Mugeesh, the autocompletion world is not as simple as you might expect. Which kind of autosuggestion are you interested in? First of all, simple string autosuggestion or document autosuggestion (with additional fields to show beyond just the label)? Are you interested in analysis of the text to suggest? Fuzzy suggestions? Exact "beginning of the phrase" suggestions? Infix suggestions? Try to give some examples and we can help better. There is a specific suggester component, so it is likely to be useful to you, but let's try to discover more. Cheers On Mon, Mar 28, 2016 at 6:03 PM, Reth RM wrote: > Solr AnalyzingInfix suggester component: > https://lucidworks.com/blog/2015/03/04/solr-suggester/ > > > > On Mon, Mar 28, 2016 at 7:57 PM, Mugeesh Husain wrote: > > > Hi, > > > > I am looking for the best way to implement autosuggestion in ecommerce > > using Solr or Elasticsearch. > > > > I guess using an ngram analyzer is not a good way if the data is big. > > > > > > Please suggest me any link or your opinion ? > > > > > > > > Thanks > > Mugeesh > > > > > > > > -- > > View this message in context: > > > http://lucene.472066.n3.nabble.com/How-to-implement-Autosuggestion-tp4266434.html > > Sent from the Solr - User mailing list archive at Nabble.com. > > > -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti "Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry?" William Blake - Songs of Experience -1794 England
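For the common infix case, a minimal SuggestComponent setup looks roughly like the following — a sketch only; the field names (title_txt, weight_f) and the analyzer type are assumptions, not something taken from this thread:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">infixSuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">title_txt</str>
      <str name="weightField">weight_f</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
    </lst>
  </searchComponent>

  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.dictionary">infixSuggester</str>
      <str name="suggest.count">10</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

A request such as /suggest?suggest.q=ipho would then return infix matches, ranked by the weight field.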
Re: Solr not working on new environment
OK, an update. I managed to remove the example/cloud directories, and stop Solr. I changed my startup script to be much simpler (./solr start) and now I get this: *[root@ bin]# ./startsolr.sh* *Waiting up to 30 seconds to see Solr running on port 8983 [|]* *Started Solr server on port 8983 (pid=31937). Happy searching!* * [root@nationalarchives bin]# ./solr status* *Found 1 Solr nodes:* *Solr process 31937 running on port 8983* *{* * "solr_home":"/opt/solr-5.5.0/server/solr",* * "version":"5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:22:52",* * "startTime":"2016-03-30T09:24:21.445Z",* * "uptime":"0 days, 0 hours, 3 minutes, 9 seconds",* * "memory":"62 MB (%12.6) of 490.7 MB"}* I now want to connect to it from my Drupal installation, but I'm getting this: "The Solr server could not be reached. Further data is therefore unavailable." - I realise this is probably not a Solr error, just giving all the information I have. When I try to connect to :8983/solr, I get a timeout. Does it sound like firewall issues? Regards, Jarus "Getting information off the Internet is like taking a drink from a fire hydrant." - Mitchell Kapor .---. .-. .-..-. .-.,'|"\.---.,--, / .-. ) ) \_/ / \ \_/ )/| |\ \ / .-. ) .' .' | | |(_)(_) /\ (_)| | \ \ | | |(_)| | __ | | | | / _ \ ) ( | | \ \| | | | \ \ ( _) \ `-' / / / ) \| | /(|`-' /\ `-' / \ `-) ) )---' `-' (_)-' /(_| (__)`--' )---' )\/ (_) (__)(_) (__) On Wed, Mar 30, 2016 at 8:50 AM, Jarus Bosman wrote: > Hi Erick, > > Thanks for the reply. It seems I have not done all my homework yet. > > We used to use Solr 3.6.2 on the old environment (we're using it in > conjunction with Drupal). When I got connectivity problems on the new > server, I decided to rather implement the latest version of Solr (5.5.0). I > read the Quick Start documentation and expected it to work first time, but > not so (as per my previous email). I will read up a bit on ZooKeeper (never > heard of it before - What is it?). Is there a good place to read up on > getting started with ZooKeeper and the latest versions of Solr (apart from > what you have replied, of course)? > > Thank you so much for your assistance, > Jarus > > > "Getting information off the Internet is like taking a drink from a fire > hydrant." - Mitchell Kapor > > .---. .-. .-..-. .-.,'|"\.---.,--, > / .-. ) ) \_/ / \ \_/ )/| |\ \ / .-. ) .' .' > | | |(_)(_) /\ (_)| | \ \ | | |(_)| | __ > | | | | / _ \ ) ( | | \ \| | | | \ \ ( _) > \ `-' / / / ) \| | /(|`-' /\ `-' / \ `-) ) > )---' `-' (_)-' /(_| (__)`--' )---' )\/ > (_) (__)(_) (__) > > On Wed, Mar 30, 2016 at 6:20 AM, Erick Erickson > wrote: > >> Good to meet you! >> >> It looks like you've tried to start Solr a time or two. When you start >> up the "cloud" example >> it creates >> /opt/solr-5.5.0/example/cloud >> and puts your SolrCloud stuff under there. It also automatically puts >> your configuration >> sets up on Zookeeper. When I get this kind of thing, I usually >> >> > stop Zookeeper (if running externally) >> >> > rm -rf /opt/solr-5.5.0/example/cloud >> >> > delete all the Zookeeper data. It may take a bit of poking to find out >> where >> the Zookeeper data is. It's usually in /tmp/zookeeper if you're running ZK >> standalone, or in a subdirectory in Solr if you're using embedded ZK. >> NOTE: if you're running standalone zookeeper, you should _definitely_ >> change the data dir because it may disappear from /tmp/zookeeper One >> of Zookeeper's little quirks >> >> > try it all over again. >> >> Here's the problem. 
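Since the local status command works but a remote connection times out, checking the firewall on the Solr host is a reasonable next step — a sketch for a firewalld-based distribution; the host name is a placeholder:

  # from another machine: does anything answer on 8983?
  curl -sS -m 5 'http://your-solr-host:8983/solr/admin/info/system?wt=json'

  # on the Solr host: is 8983 open?
  firewall-cmd --list-ports

  # if not, open it and reload
  firewall-cmd --permanent --add-port=8983/tcp
  firewall-cmd --reload

On iptables-only systems the equivalent would be an ACCEPT rule for TCP port 8983.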
The examples (-e cloud) tries to do a bunch of stuff >> for >> you to get the installation up and running without having to wend your way >> through all of the indiviual commands. Sometimes getting partway through >> leaves you in an ambiguous state. Or at least a state you don't quite know >> what all the moving parts are. >> >> Here's the steps you need to follow if you're doing them yourself rather >> than >> relying on the canned example >> 1> start Zookeeper externally. For experimentation, a single ZK is quite >> sufficient, I don't bother with 3 ZK instances and a quorum unless I'm >> in a production situation. >> 2> start solr with the bin/solr script, use the -c and -z options. At >> this point, >> you have a functioning Solr, but no collections. You should be >> able to see the solr admin UI at http://node:8982/solr at this point. >> 3> use the bin/solr zk -upconfig command to put a configset in ZK >> 4> use the Collections API to create and maintain collections. >> >> And one more note. When you use the '-e cloud' option, you'll see >> messages go by about starting nodes with a command like: >> >> bin/solr start -c -z localhost:2181 -p 8981 -s example/cloud/node1/solr >> bin/solr start -c -z localhost:2181 -p
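Put together, the manual sequence Erick outlines might look like this against Solr 5.x — a sketch; the configset name, config directory, and collection name are placeholders:

  # 1. start an external ZooKeeper (standalone is fine for experimentation)
  # 2. start Solr in cloud mode, pointing at it
  bin/solr start -c -z localhost:2181 -p 8983

  # 3. upload a configset to ZooKeeper
  bin/solr zk -upconfig -z localhost:2181 -n myconf -d server/solr/configsets/basic_configs/conf

  # 4. create a collection with the Collections API
  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=1&replicationFactor=1&collection.configName=myconf'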
RE: Deleted documents and expungeDeletes
Hello - with TieredMergePolicy and default reclaimDeletesWeight of 2.0, and frequent updates, it is not uncommon to see a ratio of 25%. If you want deletes to be reclaimed more often, e.g. weight of 4.0, you will see very frequent merging of large segments, killing performance if you are on spinning disks. Markus -Original message- > From:Erick Erickson > Sent: Wednesday 30th March 2016 2:50 > To: solr-user > Subject: Re: Deleted documents and expungeDeletes > > bq: where I see that the number of deleted documents just > keeps on growing and growing, but they never seem to be deleted > > This shouldn't be happening. The default TieredMergePolicy weights > segments to be merged (which happens automatically) heavily as per > the percentage of deleted docs. Here's a great visualization: > http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html > > It may be that when you say "growing and growing", that the number of > deleted docs hasn't reached the threshold where they get merged away. > > Please specify "growing and growing", Until it gets to 15% or more of the > total > then I'd start to worry. And then only if it kept growing after that. > > To your questions: > 1> This is automatic. It'll "just happen", but you will probably always carry > some deleted docs around in your index. > > 2> You always need at least as much free space as your index occupies on disk. > In the worst case of normal merging, _all_ the segments will be merged > and they're > copied first. Once that's successful, then the original is deleted. > > 3> Not really. Normally there should be no need. > > 4> True, but usually the effect is so minuscule that nobody notices. > People spend > endless time obsessing about this and unless and until you can show that your > _users_ notice, I'd ignore it. > > Best, > Erick > > On Tue, Mar 29, 2016 at 8:16 AM, Jostein Elvaker Haande > wrote: > > Hello everyone, > > > > I apologise beforehand if this is a question that has been visited > > numerous times on this list, but after hours spent on Google and > > talking to SOLR savvy people on #solr @ Freenode I'm still a bit at a > > loss about SOLR and deleted documents. > > > > I have quite a few indexes in both production and development > > environments, where I see that the number of deleted documents just > > keeps on growing and growing, but they never seem to be deleted. From > > my understanding, this can be controller in the merge policy set for > > the current core, but I've not been able to find any specifics on the > > topic. > > > > The general consensus on most search hits I've found is to perform an > > optimize of the core, however this is both an expensive operation, > > both in terms of CPU cycles as well as disk I/O, and also requires you > > to have anywhere from 2 times to 3 times the size of the index > > available on disk to be guaranteed to complete fully. Given these > > criteria, it's often not something that is a viable option in certain > > environments, both to it being a resource hog and often that you just > > don't have the needed available disk space to perform the optimize. 
> > > > After having spoken with a couple of people on IRC (thanks tokee and > > elyograg), I was made aware of an optional parameter for <commit> > > called 'expungeDeletes' that can explicitly make sure that deleted > > documents are deleted from the index, i.e: > > > > curl http://localhost:8983/solr/coreName/update -H "Content-Type: > > text/xml" --data-binary '<commit expungeDeletes="true"/>' > > > > Now my questions are as follows: > > > > 1) How can I make sure that this is dealt with in my merge policy, if > > at all possible? > > 2) I've tried to find some disk space guidelines for 'expungeDeletes', > > however I've not been able to find any. What are the general > > guidelines here? Does it require as much space as an optimize, or is > > it less "aggressive" compared to an optimize? > > 3) Is 'expungeDeletes' the recommended method to make sure your > > deleted documents are actually removed from the index, or should you > > deal with this in your merge policy? > > 4) I have also heard from talks on #SOLR that deleted documents have an > > impact on the relevancy of performed searches. Is this correct, or > > just misinformation? > > > > If you require any additional information, like snippets from my > > configuration (solrconfig.xml), I'm more than happy to provide this. > > > > Again, if this is an issue that's being revisited for the Nth time, I > > apologize, I'm just trying to get my head around this with my somewhat > > limited SOLR knowledge. > > > > -- > > Yours sincerely Jostein Elvaker Haande > > "A free society is a society where it is safe to be unpopular" > > - Adlai Stevenson > > > > http://tolecnal.net -- tolecnal at tolecnal dot net >
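To watch whether merging is actually reclaiming deletes over time, the Luke request handler reports per-core document counts — a sketch, assuming the default /admin/luke handler is enabled and 'coreName' stands in for a real core:

  curl 'http://localhost:8983/solr/coreName/admin/luke?numTerms=0&wt=json'
  # "numDocs" is live documents, "maxDoc" includes deletions;
  # deleted documents = maxDoc - numDocs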
[Possible Bug] 5.5.0 Startup script ignoring host parameter?
Hi folks, It looks like the "-h" parameter isn't being processed correctly. I want Solr to listen on 127.0.0.1, but instead it binds to all interfaces. Am I doing something wrong? Or am I misinterpreting what the -h parameter is for? Linux: # bin/solr start -h 127.0.0.1 -p 8180 # netstat -tlnp | grep 8180 tcp6 0 0 :::8180 :::* LISTEN 14215/java Windows: > solr.cmd start -h 127.0.0.1 -p 8180 > netstat -a TCP 0.0.0.0:8180MyBox:0 LISTENING The Solr JVM args are likely the cause. From the Solr Admin GUI: -DSTOP.KEY=solrrocks -Dhost=127.0.0.1 -Djetty.port=8180 Presumably that ought to be -Djetty.host=127.0.0.1 instead of -Dhost? This has potential security implications for us :-( Thanks, - Bram
Re: Deleted documents and expungeDeletes
On 30 March 2016 at 02:49, Erick Erickson wrote: > Please specify "growing and growing", Until it gets to 15% or more of the > total > then I'd start to worry. And then only if it kept growing after that. I tested 'expungeDeletes' on four different cores, three of them were nearly identical in terms of numbers. Max Docs were around ~2.2M, Num Docs was ~1.6M and Deleted Docs were ~600K - so the percentage of Deleted Docs were around the ~27 percent mark. So according to your feedback, I should start to worry! Now the question is, why aren't the Deleted Docs being merged away if this is in fact supposed to happen? > 1> This is automatic. It'll "just happen", but you will probably always carry > some deleted docs around in your index. Yeah, that I am aware of - I noticed that even after running 'expungeDeletes' I had a few thousand docs left, which is acceptable and does not worry me. > 4> True, but usually the effect is so minuscule that nobody notices. > People spend > endless time obsessing about this and unless and until you can show that your > _users_ notice, I'd ignore it. Hehe, then I'll refrain from being one of those that obsess over this. As long as I know the effect it has is minuscule, then I'll just toss the thought in the bin. -- Yours sincerely Jostein Elvaker Haande "A free society is a society where it is safe to be unpopular" - Adlai Stevenson http://tolecnal.net -- tolecnal at tolecnal dot net
Re: Deleted documents and expungeDeletes
On 30 March 2016 at 12:25, Markus Jelsma wrote: > Hello - with TieredMergePolicy and default reclaimDeletesWeight of 2.0, and > frequent updates, it is not uncommon to see a ratio of 25%. If you want > deletes to be reclaimed more often, e.g. weight of 4.0, you will see very > frequent merging of large segments, killing performance if you are on > spinning disks. Most of our installations are on spinning disks, so if I want a more aggressive reclaim, this will impact performance. This is of course something that I do not desire, so I'm wondering if scheduling a commit with 'expungeDeletes' during off peak business hours is a better approach than setting up a more aggressive merge policy. -- Yours sincerely Jostein Elvaker Haande "A free society is a society where it is safe to be unpopular" - Adlai Stevenson http://tolecnal.net -- tolecnal at tolecnal dot net
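If the off-peak route looks attractive, it can be as simple as a nightly cron entry that issues the commit with expungeDeletes — a sketch only; the schedule, host, and core name are placeholders:

  # /etc/cron.d/solr-expunge -- run as the solr user at 02:30 every night
  30 2 * * * solr curl -s -H 'Content-Type: text/xml' --data-binary '<commit expungeDeletes="true"/>' 'http://localhost:8983/solr/coreName/update' > /dev/null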
Re: Deleted documents and expungeDeletes
On 03/30/2016 08:23 AM, Jostein Elvaker Haande wrote: On 30 March 2016 at 12:25, Markus Jelsma wrote: Hello - with TieredMergePolicy and default reclaimDeletesWeight of 2.0, and frequent updates, it is not uncommon to see a ratio of 25%. If you want deletes to be reclaimed more often, e.g. weight of 4.0, you will see very frequent merging of large segments, killing performance if you are on spinning disks. Most of our installations are on spinning disks, so if I want a more aggressive reclaim, this will impact performance. This is of course something that I do not desire, so I'm wondering if scheduling a commit with 'expungeDeletes' during off peak business hours is a better approach than setting up a more aggressive merge policy. As far as my experimentation with @expungeDeletes goes, if the data you indexed and committed using @expungeDeletes didn't touch segments with any deleted documents, or wasn't enough data to cause merging with a segment containing deleted documents, no deleted documents will be removed. Basically, @expungeDeletes expunges deletes in segments affected by the commit. If you have a large update that touches many segments containing deleted documents and you use @expungeDeletes, it could be just as resource intensive as an optimize. My setting for reclaimDeletesWeight: <double name="reclaimDeletesWeight">5.0</double> It keeps the deleted documents down to ~10% without any noticeable impact on resources or performance. But I'm still in the testing phase with this setting.
Re: Regarding JSON indexing in SOLR 4.10
On Tue, Mar 29, 2016 at 11:30:06PM -0700, Aditya Desai wrote: > I am running SOLR 4.10 on port 8984 by changing the default port in > etc/jetty.xml. I am now trying to index all my JSON files to Solr running > on 8984. The following is the command > > curl 'http://localhost:8984/solr/update?commit=true' --data-binary *.json > -H 'Content-type:application/json' The wildcard is the problem; your shell is expanding --data-binary *.json to --data-binary foo.json bar.json baz.json and curl doesn't know how to download bar.json and baz.json. (Note also that curl only reads a file's contents when the --data-binary argument starts with '@'.) Try this instead:

  for file in *.json; do
      curl 'http://localhost:8984/solr/update?commit=true' --data-binary @"$file" -H 'Content-type:application/json'
  done

Paul. -- Paul Hoffman Systems Librarian Fenway Libraries Online c/o Wentworth Institute of Technology 550 Huntington Ave. Boston, MA 02115 (617) 442-2384 (FLO main number)
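A variant of the same loop — still a sketch, assuming each file holds valid JSON for the stock /update handler — indexes everything first and commits only once at the end, which is considerably cheaper than committing per file:

  for file in *.json; do
      curl 'http://localhost:8984/solr/update' -H 'Content-type:application/json' --data-binary @"$file"
  done
  # single commit once all files are in
  curl 'http://localhost:8984/solr/update' -H 'Content-type:application/json' --data-binary '{"commit": {}}'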
Re: [Possible Bug] 5.5.0 Startup script ignoring host parameter?
On 3/30/2016 5:45 AM, Bram Van Dam wrote: > It looks like the "-h" parameter isn't being processed correctly. I want > Solr to listen on 127.0.0.1, but instead it binds to all interfaces. Am > I doing something wrong? Or am I misinterpreting what the -h parameter > is for? The host parameter does not control binding to network interfaces. It controls what hostname is published to ZooKeeper when running in cloud mode. Solr's networking is provided by a third-party application -- the servlet container. In 5.x, we took steps to ensure that the container everyone uses is Jetty. The default Jetty configuration supplied with Solr will bind to all interfaces. If you want to control interface binding, you need to edit the Jetty config, in server/etc. The file most likely to need changes is server/etc/jetty.xml. The following URL contains Jetty's documentation on how to configure the networking. Right now this URL applies to version 9, used in Solr 5.x: http://www.eclipse.org/jetty/documentation/current/configuring-connectors.html Thanks, Shawn
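For illustration, the binding usually comes down to the host set on the connector in server/etc/jetty.xml — a sketch only, since the exact element and property names depend on the jetty.xml shipped with your Solr version:

  <!-- inside the ServerConnector definition in server/etc/jetty.xml -->
  <Set name="host"><SystemProperty name="jetty.host" default="127.0.0.1"/></Set>
  <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>

With something like that in place, Jetty listens only on 127.0.0.1 unless -Djetty.host is passed at startup.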
Re: Deleted documents and expungeDeletes
through a clever bit of reflection, you can set the reclaimDeletesWeight variable from solrconfig by including something like <double name="reclaimDeletesWeight">5</double> in your merge policy settings (going from memory here, you'll get an error on startup if I've messed it up.) That may help.. Best, Erick On Wed, Mar 30, 2016 at 6:15 AM, David Santamauro wrote: > > > On 03/30/2016 08:23 AM, Jostein Elvaker Haande wrote: >> >> On 30 March 2016 at 12:25, Markus Jelsma >> wrote: >>> >>> Hello - with TieredMergePolicy and default reclaimDeletesWeight of 2.0, >>> and frequent updates, it is not uncommon to see a ratio of 25%. If you want >>> deletes to be reclaimed more often, e.g. weight of 4.0, you will see very >>> frequent merging of large segments, killing performance if you are on >>> spinning disks. >> >> >> Most of our installations are on spinning disks, so if I want a more >> aggressive reclaim, this will impact performance. This is of course >> something that I do not desire, so I'm wondering if scheduling a >> commit with 'expungeDeletes' during off peak business hours is a >> better approach than setting up a more aggressive merge policy. >> > > As far as my experimentation with @expungeDeletes goes, if the data you > indexed and committed using @expungeDeletes didn't touch segments with any > deleted documents, or wasn't enough data to cause merging with a segment > containing deleted documents, no deleted documents will be removed. > Basically, @expungeDeletes expunges deletes in segments affected by the > commit. If you have a large update that touches many segments containing > deleted documents and you use @expungeDeletes, it could be just as resource > intensive as an optimize. > > My setting for reclaimDeletesWeight: > <double name="reclaimDeletesWeight">5.0</double> > > It keeps the deleted documents down to ~10% without any noticeable impact on > resources or performance. But I'm still in the testing phase with this > setting. >
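Spelled out, the setting sits inside the merge policy block of solrconfig.xml — a sketch for a Solr 5.x TieredMergePolicy configuration:

  <indexConfig>
    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <!-- default is 2.0; higher values make merges favor segments with many deletes -->
      <double name="reclaimDeletesWeight">5.0</double>
    </mergePolicy>
  </indexConfig>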
Re: Solr not working on new environment
Whoa! I thought you were going for SolrCloud. If you're not interested in SolrCloud, you don't need to know anything about Zookeeper. So it looks like Solr is running. You say: bq: When I try to connect to :8983/solr, I get a timeout. Does it sound like firewall issues? are you talking about Drupal or about a simple browser connection? If the former, I'm all out of ideas as I know very little about the Drupal integration and/or whether it's even possible with a 5.x... Best, Erick On Wed, Mar 30, 2016 at 2:52 AM, Jarus Bosman wrote: > OK, an update. I managed to remove the example/cloud directories, and stop > Solr. I changed my startup script to be much simpler (./solr start) and now > I get this: > > *[root@ bin]# ./startsolr.sh* > *Waiting up to 30 seconds to see Solr running on port 8983 [|]* > *Started Solr server on port 8983 (pid=31937). Happy searching!* > * > > [root@nationalarchives bin]# > ./solr status* > > *Found 1 Solr nodes:* > > *Solr process 31937 running on port 8983* > *{* > * "solr_home":"/opt/solr-5.5.0/server/solr",* > * "version":"5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - > 2016-02-16 15:22:52",* > * "startTime":"2016-03-30T09:24:21.445Z",* > * "uptime":"0 days, 0 hours, 3 minutes, 9 seconds",* > * "memory":"62 MB (%12.6) of 490.7 MB"}* > > I now want to connect to it from my Drupal installation, but I'm getting > this: "The Solr server could not be reached. Further data is therefore > unavailable." - I realise this is probably not a Solr error, just giving > all the information I have. When I try to connect to > :8983/solr, I get a timeout. Does it sound like firewall issues? > > Regards, > Jarus > > "Getting information off the Internet is like taking a drink from a fire > hydrant." - Mitchell Kapor > > .---. .-. .-..-. .-.,'|"\.---.,--, > / .-. ) ) \_/ / \ \_/ )/| |\ \ / .-. ) .' .' > | | |(_)(_) /\ (_)| | \ \ | | |(_)| | __ > | | | | / _ \ ) ( | | \ \| | | | \ \ ( _) > \ `-' / / / ) \| | /(|`-' /\ `-' / \ `-) ) > )---' `-' (_)-' /(_| (__)`--' )---' )\/ > (_) (__)(_) (__) > > On Wed, Mar 30, 2016 at 8:50 AM, Jarus Bosman wrote: > >> Hi Erick, >> >> Thanks for the reply. It seems I have not done all my homework yet. >> >> We used to use Solr 3.6.2 on the old environment (we're using it in >> conjunction with Drupal). When I got connectivity problems on the new >> server, I decided to rather implement the latest version of Solr (5.5.0). I >> read the Quick Start documentation and expected it to work first time, but >> not so (as per my previous email). I will read up a bit on ZooKeeper (never >> heard of it before - What is it?). Is there a good place to read up on >> getting started with ZooKeeper and the latest versions of Solr (apart from >> what you have replied, of course)? >> >> Thank you so much for your assistance, >> Jarus >> >> >> "Getting information off the Internet is like taking a drink from a fire >> hydrant." - Mitchell Kapor >> >> .---. .-. .-..-. .-.,'|"\.---.,--, >> / .-. ) ) \_/ / \ \_/ )/| |\ \ / .-. ) .' .' >> | | |(_)(_) /\ (_)| | \ \ | | |(_)| | __ >> | | | | / _ \ ) ( | | \ \| | | | \ \ ( _) >> \ `-' / / / ) \| | /(|`-' /\ `-' / \ `-) ) >> )---' `-' (_)-' /(_| (__)`--' )---' )\/ >> (_) (__)(_) (__) >> >> On Wed, Mar 30, 2016 at 6:20 AM, Erick Erickson >> wrote: >> >>> Good to meet you! >>> >>> It looks like you've tried to start Solr a time or two. When you start >>> up the "cloud" example >>> it creates >>> /opt/solr-5.5.0/example/cloud >>> and puts your SolrCloud stuff under there. 
It also automatically puts >>> your configuration >>> sets up on Zookeeper. When I get this kind of thing, I usually >>> >>> > stop Zookeeper (if running externally) >>> >>> > rm -rf /opt/solr-5.5.0/example/cloud >>> >>> > delete all the Zookeeper data. It may take a bit of poking to find out >>> where >>> the Zookeeper data is. It's usually in /tmp/zookeeper if you're running ZK >>> standalone, or in a subdirectory in Solr if you're using embedded ZK. >>> NOTE: if you're running standalone zookeeper, you should _definitely_ >>> change the data dir because it may disappear from /tmp/zookeeper One >>> of Zookeeper's little quirks >>> >>> > try it all over again. >>> >>> Here's the problem. The examples (-e cloud) tries to do a bunch of stuff >>> for >>> you to get the installation up and running without having to wend your way >>> through all of the indiviual commands. Sometimes getting partway through >>> leaves you in an ambiguous state. Or at least a state you don't quite know >>> what all the moving parts are. >>> >>> Here's the steps you need to follow if you're doing them yourself rather >>> than >>> relying on the canned example >>> 1> start Zookeeper externally. For experimentation, a single ZK is quite >>> sufficient, I don't
Re: High Cpu sys usage
Both of these are anti-patterns. The soft commit interval of 1 second is usually far too aggressive. And committing after every add is also something to avoid. Your original problem statement is high CPU usage. To see if your committing is the culprit, I'd stop committing explicitly after adding and make the soft commit interval, say, 60 seconds. And keep the hard commit interval whatever it is now, but make sure openSearcher is set to false. That should pinpoint whether the CPU usage is just because of your committing. From there you can figure out the right balance... If that's _not_ the source of your CPU usage, then at least you'll have eliminated it as a potential problem. Best, Erick On Wed, Mar 30, 2016 at 12:37 AM, YouPeng Yang wrote: > Hi > Thank you, Erick. > The main collection that stores our trade data is set to softCommit > when we import data using DIH. As you guessed, the softCommit interval > is <maxTime>1000</maxTime> and we have autowarm counts set to 0. However > there are some collections that store our meta info in which we commit after > each add, and these metadata collections just hold a few docs. > > > Best Regards > > > 2016-03-30 12:25 GMT+08:00 Erick Erickson : >> Do not, repeat NOT try to "cure" the "Overlapping onDeckSearchers" >> by bumping this limit! What that means is that your commits >> (either hard commit with openSearcher=true or softCommit) are >> happening far too frequently and your Solr instance is trying to do >> all sorts of work that is immediately thrown away and chewing up >> lots of CPU. Perhaps this will help: >> >> >> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ >> >> I'd guess that you're >> >> > committing every second, or perhaps your indexing client is committing >> after each add. If the latter, do not do this and rely on the >> autocommit settings, >> and if the former make those intervals as long as you can stand. >> >> > You may have your autowarm counts in your solrconfig.xml file set at >> very high numbers (let's see the filterCache settings, the queryResultCache >> settings etc.). >> >> I'd _strongly_ recommend that you put the on-deck searchers back to >> 2 and figure out why you have so many overlapping searchers. >> >> Best, >> Erick >> >> On Tue, Mar 29, 2016 at 8:57 PM, YouPeng Yang >> wrote: >> > Hi Toke >> > The number of collections is just 10. One of the collections has 43 >> shards, each >> > shard has two replicas. We continue importing data from Oracle all the >> time >> > while our systems provide searching service. >> > There are "Overlapping onDeckSearchers" in my solr.logs. What is the >> > meaning of "Overlapping onDeckSearchers"? We set <maxWarmingSearchers>20</maxWarmingSearchers> and <useColdSearcher>true</useColdSearcher>. Is that right? >> > >> > >> > >> > Best Regards. >> > >> > >> > 2016-03-29 22:31 GMT+08:00 Toke Eskildsen : >> > >> >> On Tue, 2016-03-29 at 20:12 +0800, YouPeng Yang wrote: >> >> > Our system still goes down as time goes on. We found lots of threads are >> >> > WAITING. Here is the thread dump that I copied from the web page, and 4 >> >> pictures >> >> > for it. >> >> > Is there any relationship with my problem? >> >> >> >> That is a lot of commitScheduler-threads. Do you have hundreds of >> >> collections in your cloud? >> >> >> >> >> >> Try grepping for "Overlapping onDeckSearchers" in your solr.logs to see >> >> if you got caught in a downwards spiral of concurrent commits. >> >> >> >> - Toke Eskildsen, State and University Library, Denmark >> >> >> >> >> >> >>
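As a concrete starting point for that experiment — a sketch of the solrconfig.xml settings, with the interval values illustrative rather than prescribed:

  <autoCommit>
    <!-- hard commit: flush to disk regularly, but do not open a new searcher -->
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <!-- soft commit: controls visibility of new documents; 60 seconds as suggested above -->
    <maxTime>60000</maxTime>
  </autoSoftCommit>

The indexing client then stops sending explicit commits and lets these intervals govern visibility.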
Re: Solr response error 403 when I try to index medium.com articles
Jack, thanks for the reply. With other sites over https I'm not having trouble. What logic suggests you change? Did not quite understand. 2016-03-29 21:01 GMT-03:00 Jack Krupansky : > Medium switches from http to https, so you would need the logic for dealing > with https security handshakes. > > -- Jack Krupansky > > On Tue, Mar 29, 2016 at 7:54 PM, Jeferson dos Anjos < > jefersonan...@packdocs.com> wrote: > > > I'm trying to index some pages of the medium. But I get error 403. I > > believe it is because the medium does not accept the user-agent solr. Has > > anyone ever experienced this? You know how to change? > > > > I appreciate any help > > > > > > 500 > > 94 > > > > > > > > Server returned HTTP response code: 403 for URL: > > > > > https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1 > > > > > > java.io.IOException: Server returned HTTP response code: 403 for URL: > > > > > https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1 > > at sun.reflect.GeneratedConstructorAccessor314.newInstance(Unknown > > Source) at > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown > > Source) at java.lang.reflect.Constructor.newInstance(Unknown Source) > > at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source) > > at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source) > > at java.security.AccessController.doPrivileged(Native Method) at > > sun.net.www.protocol.http.HttpURLConnection.getChainedException(Unknown > > Source) at > > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown > > Source) at > > sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown > > Source) at > > sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown > > Source) at > > > org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:87) > > at > > > org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:158) > > at > > > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) > > at > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144) > > at > > > org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:291) > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006) at > > > > > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777) > > at > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413) > > at > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204) > > at > > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) > > at > > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) > > at > > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) > > at > > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) > > at > > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) > > at > > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) > > at > > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) > > at > > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) > > at > > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) > > at > > > 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) > > at > > > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) > > at > > > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) > > at > > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) > > at org.eclipse.jetty.server.Server.handle(Server.java:368) at > > > > > org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) > > at > > > org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) > > at > > > org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) > > at > > > org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) > > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at > > org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) > > at > > > org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) > > at > > > org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) > > at > > > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) > > at > > > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
Re: Solr response error 403 when I try to index medium.com articles
403 means "forbidden" Something about the request Solr is sending -- or soemthing about the IP address Solr is connecting from when talking to medium.com -- is causing hte medium.com web server to reject the request. This is something that servers may choose to do if they detect (via headers, or missing headers, or reverse ip lookup, or other distinctive nuances of how the connection was made) that the client connecting to their server isn't a "human browser" (ie: firefox, chrome, safari) and is a Robot that they don't want to cooperate with (ie: they might be happy toserve their pages to the google-bot crawler, but not to some third-party they've never heard of. The specifics of how/why you might get a 403 for any given url are hard to debug -- it might literally depend on how many requests you've sent tothat domain in the past X hours. In general Solr's ContentStream indexing from remote hosts isn't inteded to be a super robust solution for crawling arbitrary websites on the web -- if that's your goal, then i would suggest you look into running a more robust crawler (nutch, droids, Lucidworks Fusion, etc...) that has more features and debugging options (notably: rate limiting) and use that code to feath the content, then push it to Solr. : Date: Tue, 29 Mar 2016 20:54:52 -0300 : From: Jeferson dos Anjos : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Solr response error 403 when I try to index medium.com articles : : I'm trying to index some pages of the medium. But I get error 403. I : believe it is because the medium does not accept the user-agent solr. Has : anyone ever experienced this? You know how to change? : : I appreciate any help : : : 500 : 94 : : : : Server returned HTTP response code: 403 for URL: : https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1 : : : java.io.IOException: Server returned HTTP response code: 403 for URL: : https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1 : at sun.reflect.GeneratedConstructorAccessor314.newInstance(Unknown : Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown : Source) at java.lang.reflect.Constructor.newInstance(Unknown Source) : at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source) : at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source) : at java.security.AccessController.doPrivileged(Native Method) at : sun.net.www.protocol.http.HttpURLConnection.getChainedException(Unknown : Source) at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown : Source) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown : Source) at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown : Source) at org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:87) : at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:158) : at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) : at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144) : at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:291) : at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006) at : org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777) : at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413) : at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204) : at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) : at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) : at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) : at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) : at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) : at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) : at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) : at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) : at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) : at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) : at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) : at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) : at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) : at org.eclipse.jetty.server.Server.handle(Server.java:368) at : org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) : at org.eclipse.jetty.server.Bl
Re: Load Resource from within Solr Plugin
: <lib dir=".../search-webapp/target/WEB-INF/lib" : regex=".*\.jar" /> 1) as a general rule, if you have a <lib/> declaration which includes "WEB-INF" you are probably doing something wrong. Maybe not in this case -- maybe "search-webapp/target" is a completely distinct java application and you are just re-using its jars. But 9 times out of 10, when people have a WEB-INF path they are trying to load jars from, it's because they *first* added their jars to Solr's WEB-INF directory, and then when that didn't work they added the path to the WEB-INF dir as a <lib/> ... but now you've got those classes being loaded twice, and you've multiplied all of your problems. 2) let's ignore the fact that your path has WEB-INF in it, and just assume it's some path somewhere on disk that has nothing to do with solr, and you want to load those jars. great -- solr will do that for you, and all of those classes will be available to plugins. Now if you want to explicitly do something classloader related, you do *not* want to be using Thread.currentThread().getContextClassLoader() ... because the threads that execute everything in Solr are a pool of worker threads that is created before solr ever has a chance to parse your <lib/> directive. You want to ensure anything you do related to a Classloader uses the ClassLoader Solr sets up for plugins -- that's available from the SolrResourceLoader. You can always get the SolrResourceLoader via SolrCore.getSolrResourceLoader(). from there you can getClassLoader() if you really need some hairy custom stuff -- or if you are just trying to load a simple resource file as an InputStream, use openResource(String name) ... that will start by checking for it in the conf dir, and will fall back to your jar -- so you can have a default resource file shipped with your plugin, but allow users to override it in their collection configs. -Hoss http://www.lucidworks.com/
Re: Regarding JSON indexing in SOLR 4.10
Hi Paul Thanks a lot for your help! I have one small question, I have schema that includes {Keyword,id,currency,geographic_name}. Now I have given id And Whenever I am running your script I am getting an error as 4002Document is missing mandatory uniqueKey field: id400 Can you please share your expertise advice here. Can you please guide me a good source to learn SOLR? I am learning and I would really appreciate if you can help me. Regards On Wed, Mar 30, 2016 at 6:55 AM, Paul Hoffman wrote: > On Tue, Mar 29, 2016 at 11:30:06PM -0700, Aditya Desai wrote: > > I am running SOLR 4.10 on port 8984 by changing the default port in > > etc/jetty.xml. I am now trying to index all my JSON files to Solr running > > on 8984. The following is the command > > > > curl ' > https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8984_solr_update-3Fcommit-3Dtrue&d=CwIBAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=7B13rM0e1iuqzbXK9vK6b5luu5je3SpeGunT1bf-MWA&s=R9qSptMrt9o6C0BXmeQdtm3_bx4fFbABYFja2XUFylA&e= > ' --data-binary *.json > > -H 'Content-type:application/json' > > The wildcard is the problem; your shell is expanding --data-binary > *.json to --data-binary foo.json bar.json baz.json and curl doesn't know > how to download bar.json and baz.json. > > Try this instead: > > for file in *.json; do > curl ' > https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8984_solr_update-3Fcommit-3Dtrue&d=CwIBAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=7B13rM0e1iuqzbXK9vK6b5luu5je3SpeGunT1bf-MWA&s=R9qSptMrt9o6C0BXmeQdtm3_bx4fFbABYFja2XUFylA&e= > ' --data-binary "$file" -H 'Content-type:application/json' > done > > Paul. > > -- > Paul Hoffman > Systems Librarian > Fenway Libraries Online > c/o Wentworth Institute of Technology > 550 Huntington Ave. > Boston, MA 02115 > (617) 442-2384 (FLO main number) > -- Aditya Ramachandra Desai MS Computer Science Graduate Student USC Viterbi School of Engineering Los Angeles, CA 90007 M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai
[possible bug]: [child] - ChildDocTransformerFactory returns top level documents nested under middle level documents when queried for the middle level ones
I think I am observing an unexpected behavior of ChildDocTransformerFactory. The query is like this: /select?q={!parent which="type_s:doc.enriched.text"}type_s:doc.enriched.text.entities +text_t:pjm +type_t:Company +relevance_tf:[0.7%20TO%20*]&fl=*,[child parentFilter=type_s:doc.enriched.text limit=1000] The levels of hierarchy are shown in the type_s field. So I am querying on some descendants and returning some ancestors that are somewhere in the middle of the hierarchy. I also want to get all the nested documents below that middle level. Here is the result: doc.enriched.text // this is the level I wanted to get to and then go down from it ... 13565 doc.enriched // This is a document from 1 level up, the parent of the // current type_s : doc.enriched.text document -- why is it here? 22024 doc.original // This is an "uncle" 26698 doc // and this is a grandparent!!! And so on, bringing the whole tree up and down all under my middle-level document. I really hope this is not the expected behavior. I appreciate your help in advance. -- Alisa Zhila
Re: [possible bug]: [child] - ChildDocTransformerFactory returns top level documents nested under middle level documents when queried for the middle level ones
I'm not the best person to comment on this so perhaps someone could chime in as well, but can you try using a wildcard for your childFilter? Something like: childFilter=type_s:doc.enriched.text.* You could also possibly enrich the document with depth information and use that for filtering out. On Wed, Mar 30, 2016 at 11:34 AM, Alisa Z. wrote: > I think I am observing an unexpected behavior of > ChildDocTransformerFactory. > > The query is like this: > > /select?q={!parent which= "type_s:doc.enriched.text "}t > ype_s:doc.enriched.text.entities +text_t:pjm +type_t:Company > +relevance_tf:[0.7%20TO%20*]&fl=*,[child > parentFilter=type_s:doc.enriched.text limit=1000] > > The levels of hierarchy are shown in the type_s field. So I am querying > on some descendants and returning some ancestors that are somewhere in the > middle of the hierarchy. I also want to get all the nested documents > below that middle level. > > Here is the result: > > > > > doc.enriched.text// this is the level > I wanted to get to and then go down from it > ... > 13565 > > doc.enriched // This is a document > from 1 level up, the parent of the >// current type_s : > doc.enriched.text document -- why is it here? > 22024 > > > doc.original // This is an "uncle" > 26698 > > > doc// and this a > grandparent!!! > > > > > And so on, bringing the whole tree up and down all under my middle-level > document. > I really hope this is not the expected behavior. > > I appreciate your help in advance. > > -- > Alisa Zhila -- Anshum Gupta
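Putting that suggestion into the original query, the [child] transformer would carry an explicit childFilter so only true descendants come back — a sketch that reuses the field values from Alisa's example and assumes a prefix wildcard on type_s matches the deeper levels:

  /select?q={!parent which="type_s:doc.enriched.text"}type_s:doc.enriched.text.entities
      +text_t:pjm +type_t:Company +relevance_tf:[0.7 TO *]
    &fl=*,[child parentFilter=type_s:doc.enriched.text childFilter=type_s:doc.enriched.text.* limit=1000]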
Re: Solr response error 403 when I try to index medium.com articles
You could use the curl command to read a URL on Medium.com. That would let you examine and control the headers to experiment. Google is able to index Medium. Check the URL and make sure it's not on one of the paths disallowed by medium.com/robots.txt (the one you gave seems fine): User-Agent: * Disallow: /_/ Disallow: /m/ Disallow: /me/ Disallow: /@me$ Disallow: /@me/ Disallow: /*/*/edit Sitemap: https://medium.com/sitemap/sitemap.xml -- Jack Krupansky On Wed, Mar 30, 2016 at 1:05 PM, Chris Hostetter wrote: > > 403 means "forbidden" > > Something about the request Solr is sending -- or soemthing about the IP > address Solr is connecting from when talking to medium.com -- is causing > hte medium.com web server to reject the request. > > This is something that servers may choose to do if they detect (via > headers, or missing headers, or reverse ip lookup, or other > distinctive nuances of how the connection was made) that the > client connecting to their server isn't a "human browser" (ie: firefox, > chrome, safari) and is a Robot that they don't want to cooperate with (ie: > they might be happy toserve their pages to the google-bot crawler, but not > to some third-party they've never heard of. > > The specifics of how/why you might get a 403 for any given url are hard to > debug -- it might literally depend on how many requests you've sent tothat > domain in the past X hours. > > In general Solr's ContentStream indexing from remote hosts isn't inteded > to be a super robust solution for crawling arbitrary websites on the web > -- if that's your goal, then i would suggest you look into running a more > robust crawler (nutch, droids, Lucidworks Fusion, etc...) that has more > features and debugging options (notably: rate limiting) and use that code > to feath the content, then push it to Solr. > > > : Date: Tue, 29 Mar 2016 20:54:52 -0300 > : From: Jeferson dos Anjos > : Reply-To: solr-user@lucene.apache.org > : To: solr-user@lucene.apache.org > : Subject: Solr response error 403 when I try to index medium.com articles > : > : I'm trying to index some pages of the medium. But I get error 403. I > : believe it is because the medium does not accept the user-agent solr. Has > : anyone ever experienced this? You know how to change? 
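One way to run that experiment is to fetch the page with curl and vary the headers until the 403 goes away — a sketch; the browser-like User-Agent string is just an example:

  # no explicit User-Agent (close to what a bare Java URLConnection sends)
  curl -sS -o /dev/null -w '%{http_code}\n' 'https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1'

  # browser-like User-Agent for comparison
  curl -sS -o /dev/null -w '%{http_code}\n' -A 'Mozilla/5.0 (X11; Linux x86_64)' 'https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1'

If the second request returns 200 while the first returns 403, the server is filtering on the User-Agent, which supports putting a proper crawler in front of Solr.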
> : > : I appreciate any help > : > : > : 500 > : 94 > : > : > : > : Server returned HTTP response code: 403 for URL: > : > https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1 > : > : > : java.io.IOException: Server returned HTTP response code: 403 for URL: > : > https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1 > : at sun.reflect.GeneratedConstructorAccessor314.newInstance(Unknown > : Source) at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown > : Source) at java.lang.reflect.Constructor.newInstance(Unknown Source) > : at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source) > : at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source) > : at java.security.AccessController.doPrivileged(Native Method) at > : sun.net.www.protocol.http.HttpURLConnection.getChainedException(Unknown > : Source) at > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown > : Source) at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown > : Source) at > sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown > : Source) at > org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:87) > : at > org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:158) > : at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) > : at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144) > : at > org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:291) > : at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006) at > : > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777) > : at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413) > : at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204) > : at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) > : at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) > : at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) > : at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) > : at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) > : at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) > : at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) > : at > org.eclipse.jetty.server.session.SessionHandler.doSc
Re: Load Resource from within Solr Plugin
Max, Have you looked in External file field which is reload on every hard commit, only disadvantage of this is the file (personal-words.txt) has to be placed in all data folders in each solr core, for which we have a bash script to do this job. https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes Ignore this if this does not meets your requirement. *Rajesh**.* On Wed, Mar 30, 2016 at 1:21 PM, Chris Hostetter wrote: > : > : : regex=".*\.jar" /> > > 1) as a general rule, if you have a delcaration which includes > "WEB-INF" you are probably doing something wrong. > > Maybe not in this case -- maybe "search-webapp/target" is a completley > distinct java application and you are just re-using it's jars. But 9 > times out of 10, when people have a WEB-INF path they are trying to load > jars from, it's because they *first* added their jars to Solr's WEB_INF > directory, and then when that didn't work they added the path to the > WEB-INF dir as a ... but now you've got those classes being loaded > twice, and you've multiplied all of your problems. > > 2) let's ignore the fact that your path has WEB-INF in it, and just > assume it's some path to somewhere where on disk that has nothing to > do with solr, and you want to load those jars. > > great -- solr will do that for you, and all of those classes will be > available to plugins. > > Now if you wnat to explicitly do something classloader related, you do > *not* want to be using Thread.currentThread().getContextClassLoader() ... > because the threads that execute everything in Solr are a pool of worker > threads that is created before solr ever has a chance to parse your /> directive. > > You want to ensure anything you do related to a Classloader uses the > ClassLoader Solr sets up for plugins -- that's available from the > SolrResourceLoader. > > You can always get the SolrResourceLoader via > SolrCore.getSolrResourceLoader(). from there you can getClassLoader() if > you really need some hairy custom stuff -- or if you are just trying to > load a simple resource file as an InputStream, use openResource(String > name) ... that will start by checking for it in the conf dir, and will > fallback to your jar -- so you can have a default resource file shipped > with your plugin, but allow users to override it in their collection > configs. > > > -Hoss > http://www.lucidworks.com/ >
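For completeness, ExternalFileField is declared in the schema and reads its values from a file in each core's data directory — a sketch only; the field and key names are assumptions, not taken from this thread:

  <!-- schema.xml -->
  <fieldType name="externalFloat" class="solr.ExternalFileField"
             keyField="id" defVal="0" valType="float"/>
  <field name="personal_words_boost" type="externalFloat" indexed="false" stored="false"/>

The values live in a file named external_personal_words_boost (one key=value line per document) inside each core's data directory, which is why the bash script mentioned above has to copy it to every core.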
Re: Regarding JSON indexing in SOLR 4.10
The document you're sending to Solr doesn't have an "id" field. The copyField directive has nothing to do with it. And you copyField would be copying _from_ the id field _to_ the Keyword field, is that what you intended? Even if the source and dest fields were reversed, it still wouldn't work since there is no id field as indicated by the error. Let's see one of the json files please? Are they carefully-formulated or arbitrary files? If carefully formulated, just switch Best, Erick On Wed, Mar 30, 2016 at 11:26 AM, Aditya Desai wrote: > Hi Paul > > Thanks a lot for your help! I have one small question, I have schema that > includes {Keyword,id,currency,geographic_name}. Now I have given > id > And > > Whenever I am running your script I am getting an error as > > > 400 name="QTime">2Document is > missing mandatory uniqueKey field: id400 > > > Can you please share your expertise advice here. Can you please guide me a > good source to learn SOLR? > > I am learning and I would really appreciate if you can help me. > > Regards > > > On Wed, Mar 30, 2016 at 6:55 AM, Paul Hoffman wrote: > >> On Tue, Mar 29, 2016 at 11:30:06PM -0700, Aditya Desai wrote: >> > I am running SOLR 4.10 on port 8984 by changing the default port in >> > etc/jetty.xml. I am now trying to index all my JSON files to Solr running >> > on 8984. The following is the command >> > >> > curl ' >> https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8984_solr_update-3Fcommit-3Dtrue&d=CwIBAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=7B13rM0e1iuqzbXK9vK6b5luu5je3SpeGunT1bf-MWA&s=R9qSptMrt9o6C0BXmeQdtm3_bx4fFbABYFja2XUFylA&e= >> ' --data-binary *.json >> > -H 'Content-type:application/json' >> >> The wildcard is the problem; your shell is expanding --data-binary >> *.json to --data-binary foo.json bar.json baz.json and curl doesn't know >> how to download bar.json and baz.json. >> >> Try this instead: >> >> for file in *.json; do >> curl ' >> https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8984_solr_update-3Fcommit-3Dtrue&d=CwIBAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=7B13rM0e1iuqzbXK9vK6b5luu5je3SpeGunT1bf-MWA&s=R9qSptMrt9o6C0BXmeQdtm3_bx4fFbABYFja2XUFylA&e= >> ' --data-binary "$file" -H 'Content-type:application/json' >> done >> >> Paul. >> >> -- >> Paul Hoffman >> Systems Librarian >> Fenway Libraries Online >> c/o Wentworth Institute of Technology >> 550 Huntington Ave. >> Boston, MA 02115 >> (617) 442-2384 (FLO main number) >> > > > > -- > Aditya Ramachandra Desai > MS Computer Science Graduate Student > USC Viterbi School of Engineering > Los Angeles, CA 90007 > M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai
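If the JSON files have no natural key, one common workaround — a sketch of stock Solr config, not something the thread itself settled on — is to let an update processor mint the id instead of editing every file:

  <!-- solrconfig.xml: generate a UUID for any document that arrives without an id -->
  <updateRequestProcessorChain name="add-uuid" default="true">
    <processor class="solr.UUIDUpdateProcessorFactory">
      <str name="fieldName">id</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

Otherwise, each JSON document simply needs to carry its own "id" value matching the uniqueKey in the schema.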
Re: Regarding JSON indexing in SOLR 4.10
Hi Erick Thanks for your email. Here is the attached sample JSON file. When I indexed the same JSON file with SOLR 5.5 using bin/post it indexed successfully. Also all of my documents were indexed successfully with 5.5 and not with 4.10. Regards On Wed, Mar 30, 2016 at 3:13 PM, Erick Erickson wrote: > The document you're sending to Solr doesn't have an "id" field. The > copyField directive has > nothing to do with it. And you copyField would be copying _from_ the > id field _to_ the > Keyword field, is that what you intended? > > Even if the source and dest fields were reversed, it still wouldn't > work since there is no id > field as indicated by the error. > > Let's see one of the json files please? Are they carefully-formulated > or arbitrary files? If > carefully formulated, just switch > > Best, > Erick > > On Wed, Mar 30, 2016 at 11:26 AM, Aditya Desai wrote: > > Hi Paul > > > > Thanks a lot for your help! I have one small question, I have schema that > > includes {Keyword,id,currency,geographic_name}. Now I have given > > id > > And > > > > Whenever I am running your script I am getting an error as > > > > > > 400 > name="QTime">2Document is > > missing mandatory uniqueKey field: id name="code">400 > > > > > > Can you please share your expertise advice here. Can you please guide me > a > > good source to learn SOLR? > > > > I am learning and I would really appreciate if you can help me. > > > > Regards > > > > > > On Wed, Mar 30, 2016 at 6:55 AM, Paul Hoffman wrote: > > > >> On Tue, Mar 29, 2016 at 11:30:06PM -0700, Aditya Desai wrote: > >> > I am running SOLR 4.10 on port 8984 by changing the default port in > >> > etc/jetty.xml. I am now trying to index all my JSON files to Solr > running > >> > on 8984. The following is the command > >> > > >> > curl ' > >> > https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8984_solr_update-3Fcommit-3Dtrue&d=CwIBAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=7B13rM0e1iuqzbXK9vK6b5luu5je3SpeGunT1bf-MWA&s=R9qSptMrt9o6C0BXmeQdtm3_bx4fFbABYFja2XUFylA&e= > >> ' --data-binary *.json > >> > -H 'Content-type:application/json' > >> > >> The wildcard is the problem; your shell is expanding --data-binary > >> *.json to --data-binary foo.json bar.json baz.json and curl doesn't know > >> how to download bar.json and baz.json. > >> > >> Try this instead: > >> > >> for file in *.json; do > >> curl ' > >> > https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8984_solr_update-3Fcommit-3Dtrue&d=CwIBAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=7B13rM0e1iuqzbXK9vK6b5luu5je3SpeGunT1bf-MWA&s=R9qSptMrt9o6C0BXmeQdtm3_bx4fFbABYFja2XUFylA&e= > >> ' --data-binary "$file" -H 'Content-type:application/json' > >> done > >> > >> Paul. > >> > >> -- > >> Paul Hoffman > >> Systems Librarian > >> Fenway Libraries Online > >> c/o Wentworth Institute of Technology > >> 550 Huntington Ave. 
> >> Boston, MA 02115 > >> (617) 442-2384 (FLO main number) > >> > > > > > > > > -- > > Aditya Ramachandra Desai > > MS Computer Science Graduate Student > > USC Viterbi School of Engineering > > Los Angeles, CA 90007 > > M : +1-415-463-9864 | L : > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.linkedin.com_in_adityardesai&d=CwIFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=ihbpCZYoNmoSqzckKlY5lkESOZXPuLtNIGjnLZCzj78&s=YD-dm-5blmQ07_4vYFoLz6r0NqKRNK1aHtIgHUvc48U&e= > -- Aditya Ramachandra Desai MS Computer Science Graduate Student USC Viterbi School of Engineering Los Angeles, CA 90007 M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai 0A0B69C000E730AE9A1F08E6D7442CC0FB94FC0512624704D06EB48E03C49E16_Output.json Description: application/json
issue with 5.3.1 and index version
When I index with Solr 5.4.1 using the luceneMatchVersion from my 5.3.1 solrconfig.xml, the segments_9 file has Lucene54 in it. Why? Is this a known bug? #strings segments_9 segments Lucene54 commitTimeMSec 1459374733276 -- Bill Bell billnb...@gmail.com cell 720-256-8076
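As far as I know, luceneMatchVersion mainly controls analyzer back-compatibility; it does not change the codec the running Lucene writes, so a 5.4.x install recording Lucene54 in its segments file is expected even with an older luceneMatchVersion. If you want to see which Lucene version wrote each segment rather than grepping the raw file, a small sketch along these lines should work against the Lucene 5.x API (the path argument is whatever your core's data/index directory is):

import java.nio.file.Paths;

import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class PrintSegmentVersions {
  public static void main(String[] args) throws Exception {
    // args[0] = path to the core's index directory, e.g. .../data/index
    try (Directory dir = FSDirectory.open(Paths.get(args[0]))) {
      // Reads the most recent segments_N file.
      SegmentInfos infos = SegmentInfos.readLatestCommit(dir);
      for (SegmentCommitInfo sci : infos) {
        // Each segment records the Lucene version that wrote it.
        System.out.println(sci.info.name + " written by Lucene " + sci.info.getVersion());
      }
    }
  }
}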
Re: Regarding JSON indexing in SOLR 4.10
Hmmm, not sure and unfortunately won't be able to look very closely. Do the Solr logs say anything more informative? Also, the admin UI>>select core>>documents lets you submit docs interactively to Solr, that's also worth a try I should think. Best, Erick On Wed, Mar 30, 2016 at 3:15 PM, Aditya Desai wrote: > Hi Erick > > Thanks for your email. Here is the attached sample JSON file. When I indexed > the same JSON file with SOLR 5.5 using bin/post it indexed successfully. > Also all of my documents were indexed successfully with 5.5 and not with > 4.10. > > Regards > > On Wed, Mar 30, 2016 at 3:13 PM, Erick Erickson > wrote: >> >> The document you're sending to Solr doesn't have an "id" field. The >> copyField directive has >> nothing to do with it. And you copyField would be copying _from_ the >> id field _to_ the >> Keyword field, is that what you intended? >> >> Even if the source and dest fields were reversed, it still wouldn't >> work since there is no id >> field as indicated by the error. >> >> Let's see one of the json files please? Are they carefully-formulated >> or arbitrary files? If >> carefully formulated, just switch >> >> Best, >> Erick >> >> On Wed, Mar 30, 2016 at 11:26 AM, Aditya Desai wrote: >> > Hi Paul >> > >> > Thanks a lot for your help! I have one small question, I have schema >> > that >> > includes {Keyword,id,currency,geographic_name}. Now I have given >> > id >> > And >> > >> > Whenever I am running your script I am getting an error as >> > >> > >> > 400> > name="QTime">2Document is >> > missing mandatory uniqueKey field: id> > name="code">400 >> > >> > >> > Can you please share your expertise advice here. Can you please guide me >> > a >> > good source to learn SOLR? >> > >> > I am learning and I would really appreciate if you can help me. >> > >> > Regards >> > >> > >> > On Wed, Mar 30, 2016 at 6:55 AM, Paul Hoffman wrote: >> > >> >> On Tue, Mar 29, 2016 at 11:30:06PM -0700, Aditya Desai wrote: >> >> > I am running SOLR 4.10 on port 8984 by changing the default port in >> >> > etc/jetty.xml. I am now trying to index all my JSON files to Solr >> >> > running >> >> > on 8984. The following is the command >> >> > >> >> > curl ' >> >> >> >> https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8984_solr_update-3Fcommit-3Dtrue&d=CwIBAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=7B13rM0e1iuqzbXK9vK6b5luu5je3SpeGunT1bf-MWA&s=R9qSptMrt9o6C0BXmeQdtm3_bx4fFbABYFja2XUFylA&e= >> >> ' --data-binary *.json >> >> > -H 'Content-type:application/json' >> >> >> >> The wildcard is the problem; your shell is expanding --data-binary >> >> *.json to --data-binary foo.json bar.json baz.json and curl doesn't >> >> know >> >> how to download bar.json and baz.json. >> >> >> >> Try this instead: >> >> >> >> for file in *.json; do >> >> curl ' >> >> >> >> https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8984_solr_update-3Fcommit-3Dtrue&d=CwIBAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=7B13rM0e1iuqzbXK9vK6b5luu5je3SpeGunT1bf-MWA&s=R9qSptMrt9o6C0BXmeQdtm3_bx4fFbABYFja2XUFylA&e= >> >> ' --data-binary "$file" -H 'Content-type:application/json' >> >> done >> >> >> >> Paul. >> >> >> >> -- >> >> Paul Hoffman >> >> Systems Librarian >> >> Fenway Libraries Online >> >> c/o Wentworth Institute of Technology >> >> 550 Huntington Ave. 
>> >> Boston, MA 02115 >> >> (617) 442-2384 (FLO main number) >> >> >> > >> > >> > >> > -- >> > Aditya Ramachandra Desai >> > MS Computer Science Graduate Student >> > USC Viterbi School of Engineering >> > Los Angeles, CA 90007 >> > M : +1-415-463-9864 | L : >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.linkedin.com_in_adityardesai&d=CwIFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=ihbpCZYoNmoSqzckKlY5lkESOZXPuLtNIGjnLZCzj78&s=YD-dm-5blmQ07_4vYFoLz6r0NqKRNK1aHtIgHUvc48U&e= > > > > > -- > Aditya Ramachandra Desai > MS Computer Science Graduate Student > USC Viterbi School of Engineering > Los Angeles, CA 90007 > M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai >
Re: Solr response error 403 when I try to index medium.com articles
Great, but is there any way to change the header Solr sends to set the user-agent? 2016-03-30 17:13 GMT-03:00 Jack Krupansky : > You could use the curl command to read a URL on Medium.com. That would let > you examine and control the headers to experiment. > > Google is able to index Medium. > > Check the URL and make sure it's not on one of the paths disallowed by > medium.com/robots.txt (the one you gave seems fine): > > User-Agent: * > Disallow: /_/ > Disallow: /m/ > Disallow: /me/ > Disallow: /@me$ > Disallow: /@me/ > Disallow: /*/*/edit > Sitemap: https://medium.com/sitemap/sitemap.xml > > > > -- Jack Krupansky > > On Wed, Mar 30, 2016 at 1:05 PM, Chris Hostetter > > wrote: > > > > > 403 means "forbidden" > > > > Something about the request Solr is sending -- or something about the IP > > address Solr is connecting from when talking to medium.com -- is causing > > the medium.com web server to reject the request. > > > > This is something that servers may choose to do if they detect (via > > headers, or missing headers, or reverse ip lookup, or other > > distinctive nuances of how the connection was made) that the > > client connecting to their server isn't a "human browser" (ie: firefox, > > chrome, safari) and is a robot that they don't want to cooperate with (ie: > > they might be happy to serve their pages to the google-bot crawler, but not > > to some third-party they've never heard of). > > > > The specifics of how/why you might get a 403 for any given url are hard to > > debug -- it might literally depend on how many requests you've sent to that > > domain in the past X hours. > > > > In general Solr's ContentStream indexing from remote hosts isn't intended > > to be a super robust solution for crawling arbitrary websites on the web > > -- if that's your goal, then I would suggest you look into running a more > > robust crawler (nutch, droids, Lucidworks Fusion, etc...) that has more > > features and debugging options (notably: rate limiting) and use that code > > to fetch the content, then push it to Solr.
> > : > > : I appreciate any help > > : > > : > > : 500 > > : 94 > > : > > : > > : > > : Server returned HTTP response code: 403 for URL: > > : > > > https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1 > > : > > : > > : java.io.IOException: Server returned HTTP response code: 403 for URL: > > : > > > https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1 > > : at sun.reflect.GeneratedConstructorAccessor314.newInstance(Unknown > > : Source) at > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown > > : Source) at java.lang.reflect.Constructor.newInstance(Unknown Source) > > : at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source) > > : at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source) > > : at java.security.AccessController.doPrivileged(Native Method) at > > : sun.net.www.protocol.http.HttpURLConnection.getChainedException(Unknown > > : Source) at > > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown > > : Source) at > > sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown > > : Source) at > > sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown > > : Source) at > > > org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:87) > > : at > > > org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:158) > > : at > > > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) > > : at > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144) > > : at > > > org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:291) > > : at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006) at > > : > > > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777) > > : at > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413) > > : at > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204) > > : at > > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) > > : at > > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) > > : at > > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) > > : at > > > org.eclipse.jetty.secu
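Following the suggestion to do the fetching outside of Solr, one workaround is to download the article yourself with whatever User-Agent you like and then hand the raw HTML to Solr. A rough sketch is below; it assumes the /update/extract handler (Solr Cell) is enabled, the core URL and User-Agent string are made up, it uses the SolrJ 5.x client class, and whether medium.com accepts a given user-agent is of course up to them.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.ContentStreamBase;

public class FetchAndIndex {
  public static void main(String[] args) throws Exception {
    String pageUrl = "https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1";

    // Fetch the page ourselves so we control the User-Agent header.
    HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
    conn.setRequestProperty("User-Agent",
        "Mozilla/5.0 (compatible; my-crawler/1.0)"); // hypothetical UA string
    String html;
    try (InputStream in = conn.getInputStream();
         Scanner s = new Scanner(in, StandardCharsets.UTF_8.name()).useDelimiter("\\A")) {
      html = s.hasNext() ? s.next() : "";
    }

    // Push the raw HTML to Solr's extracting handler.
    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1"); // hypothetical core
    try {
      ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
      ContentStreamBase.StringStream stream = new ContentStreamBase.StringStream(html);
      stream.setContentType("text/html");
      req.addContentStream(stream);
      req.setParam("literal.id", pageUrl);
      req.setParam("commit", "true");
      client.request(req);
    } finally {
      client.close();
    }
  }
}

For anything beyond a handful of URLs, a real crawler with rate limiting (as suggested above) is still the better home for this fetch step.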
Re: No live SolrServers available to handle this request
Thanks Shawn and Elaine, Elaine, Yes all the documents of same route key resides on same shard. Shawn, I will try to capture the logs. Thanks. Regards, Anil On 25 March 2016 at 02:57, Elaine Cario wrote: > Anil, > > I've seen situations where if there was a problem with a specific query, > and every shard responds with the same error, the actual exception gets > hidden by a "No live SolrServers..." exception. We originally saw this > with wildcard queries (when every shard reported a "too many expansions..." > type error, but the exception in the response was "No live SolrServers..." > error. > > You mention that you are using collapse/expand, and that you have shards - > that could possibly cause some issue, as I think collapse and expand only > work correctly if the data for any particular collapse value resides on one > shard. > > On Sat, Mar 19, 2016 at 1:04 PM, Shawn Heisey wrote: > > > On 3/18/2016 9:55 PM, Anil wrote: > > > Thanks for your response. > > > CDH is a Cloudera (third party) distribution. is there any to get the > > > notifications copy of it when cluster state changed ? in logs ? > > > > > > I can assume that the exception is result of no availability of > replicas > > > only. Agree? > > > > Yes, I think that Solr believes there are no replicas for at least one > > shard. As for why it believes that, I cannot say. > > > > If Solr logged every single thing that happened where zookeeper (or even > > just the clusterstate) is involved, you'd be drowning in logs. Much > > more than already happens. The logfile is already very verbose. > > > > Chances are that at least one of your Solr nodes *did* log something > > related to a problem with that collection before you got the error > > you're asking about. > > > > The "No live SolrServers" error is one that people are seeing quite > > frequently. There may be some instances where Solr isn't behaving > > correctly, but I think when this happens, it usually indicates there's a > > real problem of some kind. > > > > To troubleshoot, we'll need to see any errors or warnings you find in > > your Solr logfiles from the time before you get an error on a request. > > You'll need to check the logfile on all Solr nodes. > > > > It might be a good idea to also involve Cloudera support, see what they > > think. > > > > Thanks, > > Shawn > > > > >
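On the point about collapse/expand and sharding: with the default compositeId router you can keep every document that shares a collapse value on the same shard by prefixing that value onto the document id as "routeKey!docId". A sketch of the idea in SolrJ follows; the ZooKeeper addresses, collection name and field names are made up for illustration, and the client class is the SolrJ 5.x CloudSolrClient.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class RoutedCollapseExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical ZK ensemble and collection name.
    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
    client.setDefaultCollection("mycollection");
    try {
      // With the compositeId router, "groupId!docId" sends every document
      // sharing the same groupId prefix to the same shard, which is what
      // collapse/expand on that field needs.
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "ACME!order-42");
      doc.addField("group_id_s", "ACME");
      doc.addField("title_t", "sample order");
      client.add(doc);
      client.commit();

      // Collapse on the group field; all members of a group live on one shard.
      SolrQuery q = new SolrQuery("title_t:sample");
      q.addFilterQuery("{!collapse field=group_id_s}");
      q.set("expand", "true");
      System.out.println(client.query(q).getResults().getNumFound());
    } finally {
      client.close();
    }
  }
}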
Re: How to implement Autosuggestion
Hi All, I've a similar query regarding autosuggestion. My use case is as below: 1. User enters a product name (say Nokia) 2. I want suggestions along with the category to which the product belongs (e.g. Nokia belongs to the "electronics" and "mobile" categories), so I want suggestions like "Nokia in electronics" and "Nokia in mobile". I am able to get the suggestions using the OOTB AnalyzingInfixSuggester but I am not sure how I can get the category along with the suggestion (can this category be considered a facet of the suggestion?). Any help/pointer is highly appreciated. Thanks, Chandan On Wed, Mar 30, 2016 at 1:37 PM, Alessandro Benedetti wrote: > Hi Mugeesh, the autocompletion world is not as simple as you would expect. > Which kind of auto-suggestion are you interested in? > > First of all, simple string autosuggestion or document autosuggestion? ( > with additional fields to show beyond the label) > Are you interested in the analysis for the text to suggest? Fuzzy > suggestions? Exact "beginning of the phrase" suggestions? Infix > suggestions? > Try to give some examples and we can help better. > There is a specific suggester component, so it is likely to be useful to > you, but let's try to discover more. > > Cheers > > On Mon, Mar 28, 2016 at 6:03 PM, Reth RM wrote: > > > Solr AnalyzingInfix suggester component: > > https://lucidworks.com/blog/2015/03/04/solr-suggester/ > > > > > > > > On Mon, Mar 28, 2016 at 7:57 PM, Mugeesh Husain > wrote: > > > > > Hi, > > > > > > I am looking for the best way to implement autosuggestion in ecommerce > > > using solr or elasticsearch. > > > > > > I guess using an ngram analyzer is not a good way if the data is big. > > > > > > > > > Please suggest any link or share your opinion? > > > > > > > > > > > > Thanks > > > Mugeesh > > > > > > > > > > > > -- > > > View this message in context: > > > > > > http://lucene.472066.n3.nabble.com/How-to-implement-Autosuggestion-tp4266434.html > > > Sent from the Solr - User mailing list archive at Nabble.com. > > > > > > -- > -- > > Benedetti Alessandro > Visiting card : http://about.me/alessandro_benedetti > > "Tyger, tyger burning bright > In the forests of the night, > What immortal hand or eye > Could frame thy fearful symmetry?" > > William Blake - Songs of Experience -1794 England >
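One way to get the "Nokia in electronics / Nokia in mobile" behaviour without extending the suggester itself is to skip AnalyzingInfixSuggester for this piece and run the typed prefix as an ordinary query against a prefix-analyzed copy of the product name, faceting on the category field; each facet bucket then becomes one suggestion. This is a different technique from the suggester, and the field names below (name_prefix, category_s) and the core URL are assumptions for illustration, not something from the poster's schema.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CategorySuggestions {
  public static void main(String[] args) throws Exception {
    // name_prefix is assumed to be an edge-ngram (prefix-analyzed) copy of the
    // product name; category_s a plain string field.
    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/products");
    try {
      String typed = "nokia";
      SolrQuery q = new SolrQuery("name_prefix:" + typed);
      q.setRows(0);                 // only the facet counts are needed
      q.setFacet(true);
      q.addFacetField("category_s");
      q.setFacetMinCount(1);

      QueryResponse rsp = client.query(q);
      FacetField categories = rsp.getFacetField("category_s");
      for (FacetField.Count c : categories.getValues()) {
        // e.g. "nokia in electronics (12)", "nokia in mobile (7)"
        System.out.println(typed + " in " + c.getName() + " (" + c.getCount() + ")");
      }
    } finally {
      client.close();
    }
  }
}

The facet counts also give a natural way to order the suggestions by how many products fall into each category.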
Re: Solr not working on new environment
OK, solved. It seems I had to first create a core, then configure Drupal to point to the path for that core. I have to say, this is one of the more helpful lists I have used. Thanks a lot for your help! "Getting information off the Internet is like taking a drink from a fire hydrant." - Mitchell Kapor .---. .-. .-..-. .-.,'|"\.---.,--, / .-. ) ) \_/ / \ \_/ )/| |\ \ / .-. ) .' .' | | |(_)(_) /\ (_)| | \ \ | | |(_)| | __ | | | | / _ \ ) ( | | \ \| | | | \ \ ( _) \ `-' / / / ) \| | /(|`-' /\ `-' / \ `-) ) )---' `-' (_)-' /(_| (__)`--' )---' )\/ (_) (__)(_) (__) On Wed, Mar 30, 2016 at 5:51 PM, Erick Erickson wrote: > Whoa! I thought you were going for SolrCloud. If you're not interested in > SolrCloud, you don't need to know anything about Zookeeper. > > So it looks like Solr is running. You say: > > bq: When I try to connect to :8983/solr, I get a timeout. > Does it sound like firewall issues? > > are you talking about Drupal or about a simple browser connection? If > the former, I'm all out of ideas > as I know very little about the Drupal integration and/or whether it's > even possible with a 5.x... > > Best, > Erick > > On Wed, Mar 30, 2016 at 2:52 AM, Jarus Bosman wrote: > > OK, an update. I managed to remove the example/cloud directories, and > stop > > Solr. I changed my startup script to be much simpler (./solr start) and > now > > I get this: > > > > *[root@ bin]# ./startsolr.sh* > > *Waiting up to 30 seconds to see Solr running on port 8983 [|]* > > *Started Solr server on port 8983 (pid=31937). Happy searching!* > > * > > > > [root@nationalarchives bin]# > > ./solr status* > > > > *Found 1 Solr nodes:* > > > > *Solr process 31937 running on port 8983* > > *{* > > * "solr_home":"/opt/solr-5.5.0/server/solr",* > > * "version":"5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - > > 2016-02-16 15:22:52",* > > * "startTime":"2016-03-30T09:24:21.445Z",* > > * "uptime":"0 days, 0 hours, 3 minutes, 9 seconds",* > > * "memory":"62 MB (%12.6) of 490.7 MB"}* > > > > I now want to connect to it from my Drupal installation, but I'm getting > > this: "The Solr server could not be reached. Further data is therefore > > unavailable." - I realise this is probably not a Solr error, just giving > > all the information I have. When I try to connect to > > :8983/solr, I get a timeout. Does it sound like firewall > issues? > > > > Regards, > > Jarus > > > > "Getting information off the Internet is like taking a drink from a fire > > hydrant." - Mitchell Kapor > > > > .---. .-. .-..-. .-.,'|"\.---.,--, > > / .-. ) ) \_/ / \ \_/ )/| |\ \ / .-. ) .' .' > > | | |(_)(_) /\ (_)| | \ \ | | |(_)| | __ > > | | | | / _ \ ) ( | | \ \| | | | \ \ ( _) > > \ `-' / / / ) \| | /(|`-' /\ `-' / \ `-) ) > > )---' `-' (_)-' /(_| (__)`--' )---' )\/ > > (_) (__)(_) (__) > > > > On Wed, Mar 30, 2016 at 8:50 AM, Jarus Bosman wrote: > > > >> Hi Erick, > >> > >> Thanks for the reply. It seems I have not done all my homework yet. > >> > >> We used to use Solr 3.6.2 on the old environment (we're using it in > >> conjunction with Drupal). When I got connectivity problems on the new > >> server, I decided to rather implement the latest version of Solr > (5.5.0). I > >> read the Quick Start documentation and expected it to work first time, > but > >> not so (as per my previous email). I will read up a bit on ZooKeeper > (never > >> heard of it before - What is it?). Is there a good place to read up on > >> getting started with ZooKeeper and the latest versions of Solr (apart > from > >> what you have replied, of course)? 
> >> > >> Thank you so much for your assistance, > >> Jarus > >> > >> > >> "Getting information off the Internet is like taking a drink from a fire > >> hydrant." - Mitchell Kapor > >> > >> .---. .-. .-..-. .-.,'|"\.---.,--, > >> / .-. ) ) \_/ / \ \_/ )/| |\ \ / .-. ) .' .' > >> | | |(_)(_) /\ (_)| | \ \ | | |(_)| | __ > >> | | | | / _ \ ) ( | | \ \| | | | \ \ ( _) > >> \ `-' / / / ) \| | /(|`-' /\ `-' / \ `-) ) > >> )---' `-' (_)-' /(_| (__)`--' )---' )\/ > >> (_) (__)(_) (__) > >> > >> On Wed, Mar 30, 2016 at 6:20 AM, Erick Erickson < > erickerick...@gmail.com> > >> wrote: > >> > >>> Good to meet you! > >>> > >>> It looks like you've tried to start Solr a time or two. When you start > >>> up the "cloud" example > >>> it creates > >>> /opt/solr-5.5.0/example/cloud > >>> and puts your SolrCloud stuff under there. It also automatically puts > >>> your configuration > >>> sets up on Zookeeper. When I get this kind of thing, I usually > >>> > >>> > stop Zookeeper (if running externally) > >>> > >>> > rm -rf /opt/solr-5.5.0/example/cloud > >>> > >>> > delete all the Zookeeper data. It may take a bit of poking to fin
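For the earlier firewall question in this thread: a quick way to tell whether the timeout is network-level is to ping Solr from the Drupal machine itself. A small SolrJ sketch is below (the host and core name are placeholders); if it times out when run from the web host but succeeds when run on the Solr server, port 8983 is most likely blocked by a firewall or Solr is bound to a different interface.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.SolrPingResponse;

public class PingSolr {
  public static void main(String[] args) throws Exception {
    // Hypothetical host and core name; run this from the Drupal machine.
    SolrClient client = new HttpSolrClient("http://solr-host:8983/solr/drupal_core");
    try {
      SolrPingResponse ping = client.ping();
      System.out.println("Solr reachable, status=" + ping.getStatus()
          + ", qtime=" + ping.getQTime() + "ms");
    } catch (Exception e) {
      // A connect timeout here, but not when run on the Solr host itself,
      // usually points to a firewall or bind-address problem on port 8983.
      System.out.println("Could not reach Solr: " + e);
    } finally {
      client.close();
    }
  }
}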