Re: Queries regarding solr cache
Hi Shawn,

Need your help. I am using a master/slave architecture in my system, and here is the replication section of my solrconfig.xml:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="enable">${enable.master:false}</str>
      <str name="replicateAfter">startup</str>
      <str name="replicateAfter">commit</str>
      <str name="commitReserveDuration">00:00:10</str>
      <str name="confFiles">managed-schema</str>
    </lst>
    <lst name="slave">
      <str name="enable">${enable.slave:false}</str>
      <str name="masterUrl">http://${MASTER_CORE_URL}/${solr.core.name}</str>
      <str name="pollInterval">${POLL_TIME}</str>
    </lst>
  </requestHandler>

Problem: I am noticing that my slaves are not able to use caching properly:

1. I am indexing on my master and committing frequently. What I am noticing is that my slaves are also committing very frequently, the cache is not being built properly, and so my hit ratio for caching is almost zero.

2. What changes do I need to make so that the cache builds up properly even after commits and can actually be used? This is wasting a lot of my resources and also slowing down the queries.

On Mon, Dec 5, 2016 at 9:06 PM, Shawn Heisey wrote:
> On 12/5/2016 6:44 AM, kshitij tyagi wrote:
> > - lookups:381
> > - hits:24
> > - hitratio:0.06
> > - inserts:363
> > - evictions:0
> > - size:345
> > - warmupTime:2932
> > - cumulative_lookups:294948
> > - cumulative_hits:15840
> > - cumulative_hitratio:0.05
> > - cumulative_inserts:277963
> > - cumulative_evictions:70078
> >
> > How can I increase my hit ratio? I am not able to understand the Solr caching mechanism clearly. Please help.
>
> This means that out of the nearly 300,000 queries executed by that handler, only about five percent (roughly 16,000) of them were found in the cache. The rest of them were not found in the cache at the moment they were made. Since these numbers come from the queryResultCache, this refers to the "q" parameter. The filterCache handles things in the fq parameter. The documentCache holds actual documents from your index and fills in stored data in results so the document doesn't have to be fetched from the index.
>
> Possible reasons: 1) Your users are rarely entering the same query more than once. 2) Your client code is adding something unique to every query (q parameter) so very few of them are the same. 3) You are committing so frequently that the cache never has a chance to get large enough to make a difference.
>
> Here are some queryResultCache stats from one of my indexes:
>
> class: org.apache.solr.search.FastLRUCache
> version: 1.0
> description: Concurrent LRU Cache(maxSize=512, initialSize=512, minSize=460, acceptableSize=486, cleanupThread=true, autowarmCount=8, regenerator=org.apache.solr.search.SolrIndexSearcher$3@1d172ac0)
> src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_7/solr/core/src/java/org/apache/solr/search/FastLRUCache.java
> lookups: 3496
> hits: 3145
> hitratio: 0.9
> inserts: 335
> evictions: 0
> size: 338
> warmupTime: 2209
> cumulative_lookups: 12394606
> cumulative_hits: 11247114
> cumulative_hitratio: 0.91
> cumulative_inserts: 1110375
> cumulative_evictions: 409887
>
> These numbers indicate that 91 percent of the queries made to this handler were served from the cache.
>
> Thanks,
> Shawn
Re: Solr ACL Plugin Windows
I didn't see a real Java project there, but the directions to compile on Linux are almost always applicable to Windows with Java. If you find a project that says it uses Ant or Maven, all you need to do is download Ant or Maven plus the Java Development Kit and put both of them on the Windows path. Then it's either "ant package" (IIRC most of the time) or "mvn install" from within the folder that has the project.

FWIW, creating a simple ACL doesn't even require a custom plugin. This is roughly how you would do it with an application that your team has written that works with Solr:

1. Add a multi-valued string field called ACL or privileges.
2. Write something for your app that can pull a list of attributes/privileges from a database for the current user.
3. Append a filter query to the query that matches those attributes, e.g.:

fq=privileges:(DEVELOPER AND DEVOPS)

If you are using a role-based system that bundles groups of permissions into a role, all you need to do is decompose the role into a list of permissions for the user and put all of the required permissions into that multi-valued field.

Mike

On Wed, Jan 4, 2017 at 2:55 AM, wrote:
> I am searching for a SOLR ACL plugin. I found this:
> https://lucidworks.com/blog/2015/05/15/custom-security-filtering-solr-5/
>
> but I don't know how I can compile the Java into a jar - all the info I found was about how to compile it on Linux, but this doesn't help.
>
> I am running Solr version 6.3.0 on Windows Server 2003.
>
> So I am searching for info about compiling a plugin under Windows.
>
> Thanks in advance :D
>
> This message was sent using IMP, the Internet Messaging Program.
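A minimal SolrJ sketch of step 3 above; the core URL, the "privileges" field name, and the helper signature are illustrative assumptions rather than anything from the original post:

import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AclFilteredSearch {

    // Appends a filter query built from the user's permissions so only
    // documents carrying all required privileges are returned.
    static QueryResponse search(SolrClient client, String userQuery,
                                List<String> privileges) throws Exception {
        SolrQuery query = new SolrQuery(userQuery);
        query.addFilterQuery("privileges:(" + String.join(" AND ", privileges) + ")");
        return client.query(query);
    }

    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
        QueryResponse rsp = search(client, "title:report", Arrays.asList("DEVELOPER", "DEVOPS"));
        System.out.println(rsp.getResults().getNumFound());
        client.close();
    }
}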
ClusterStateMutator
Hi,

in solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java there is the following code starting at line 107:

//TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null : ZkStateReader.getCollectionPath(cName);

Any idea if that will be changed to default to version 2 anytime soon?

thanks,
Hendrik
Re: create collection gets stuck on node restart
The problem is that we would like to run without downtime. Rolling updates have worked fine so far, except when creating a collection at the wrong time. I just did another test with stateFormat=2. This seems to greatly improve the situation. One collection creation got stuck, but other creations still worked, and after a restart of some nodes the stuck collection creation also looked ok. For some reason it just resulted in two replicas for the same shard getting assigned to the same node, even though I specified a rule of "shard:*,replica:<2,node:*".

On 03.01.2017 15:34, Shawn Heisey wrote:
> On 1/3/2017 2:59 AM, Hendrik Haddorp wrote:
>> I have a SolrCloud setup with 5 nodes and am creating collections with a replication factor of 3. If I kill and restart nodes at the "right" time during the creation process, the creation seems to get stuck. Collection data is left in the clusterstate.json file in ZooKeeper and no collections can be created anymore until this entry gets removed. I can reproduce this on Solr 6.2.1 and 6.3, while 6.3 seems to be somewhat less likely to get stuck. Is Solr supposed to recover from data being stuck in the clusterstate.json at some point? I had one instance where it looked like data was removed again, but normally the data does not seem to get cleaned up automatically and just blocks any further collection creations. I did not find anything like this in Jira. Just SOLR-7198 sounds a bit similar, even though it is about deleting collections.
>
> Don't restart your nodes at the same time you're trying to do maintenance of any kind on your collections. Try to only do maintenance when they are all working, or you'll get unexpected results.
>
> The most recent development goal is to make it so that collection deletion can be done even if the creation was partial. The idea is that if something goes wrong, you can delete the bad collection and then be free to try to create it again.
>
> I see that you've started another thread about deletion not fully eliminating everything in HDFS. That does sound like a bug. I have no experience with HDFS at all, so I can't be helpful with that.
>
> Thanks,
> Shawn
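For reference, a rule like the one above is passed to the Collections API at creation time via the rule parameter; a sketch, with host, port, collection name, and config name as placeholders (the "<" is URL-encoded as %3C):

# Rule-based replica placement: at most one replica of each shard on any node.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=3&collection.configName=myconf&rule=shard:*,replica:%3C2,node:*"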
Re: Solr ACL Plugin Windows
Thanks, Mike, for emphasizing that point. I made that point in the blog post as well - it's the recommended approach when it's sufficient, for sure.

Erik

> On Jan 4, 2017, at 07:36, Mike Thomsen wrote:
>
> I didn't see a real Java project there, but the directions to compile on Linux are almost always applicable to Windows with Java. If you find a project that says it uses Ant or Maven, all you need to do is download Ant or Maven plus the Java Development Kit and put both of them on the Windows path. Then it's either "ant package" (IIRC most of the time) or "mvn install" from within the folder that has the project.
>
> FWIW, creating a simple ACL doesn't even require a custom plugin. This is roughly how you would do it with an application that your team has written that works with Solr:
>
> 1. Add a multi-valued string field called ACL or privileges.
> 2. Write something for your app that can pull a list of attributes/privileges from a database for the current user.
> 3. Append a filter query to the query that matches those attributes, e.g.:
>
> fq=privileges:(DEVELOPER AND DEVOPS)
>
> If you are using a role-based system that bundles groups of permissions into a role, all you need to do is decompose the role into a list of permissions for the user and put all of the required permissions into that multi-valued field.
>
> Mike
>
>> On Wed, Jan 4, 2017 at 2:55 AM, wrote:
>>
>> I am searching for a SOLR ACL plugin. I found this:
>> https://lucidworks.com/blog/2015/05/15/custom-security-filtering-solr-5/
>>
>> but I don't know how I can compile the Java into a jar - all the info I found was about how to compile it on Linux, but this doesn't help.
>>
>> I am running Solr version 6.3.0 on Windows Server 2003.
>>
>> So I am searching for info about compiling a plugin under Windows.
>>
>> Thanks in advance :D
>>
>> This message was sent using IMP, the Internet Messaging Program.
Re: create collection gets stuck on node restart
On 1/4/2017 6:23 AM, Hendrik Haddorp wrote:
> The problem is that we would like to run without downtime. Rolling updates have worked fine so far, except when creating a collection at the wrong time. I just did another test with stateFormat=2. This seems to greatly improve the situation. One collection creation got stuck, but other creations still worked, and after a restart of some nodes the stuck collection creation also looked ok. For some reason it just resulted in two replicas for the same shard getting assigned to the same node, even though I specified a rule of "shard:*,replica:<2,node:*".

I have no idea what that rule means or where you might be configuring it. That must be for a feature that I've never used.

If you're going to restart nodes, then don't create collections at that moment. Wait until after the restart is completely finished. If it's all automated ... then design your tools so that they do not create collections and do the restarts at the same time.

Thanks,
Shawn
Re: Queries regarding solr cache
On 1/4/2017 3:45 AM, kshitij tyagi wrote:
> Problem: I am noticing that my slaves are not able to use caching properly:
>
> 1. I am indexing on my master and committing frequently. What I am noticing is that my slaves are also committing very frequently, the cache is not being built properly, and so my hit ratio for caching is almost zero.
>
> 2. What changes do I need to make so that the cache builds up properly even after commits and can actually be used? This is wasting a lot of my resources and also slowing down the queries.

Whenever you commit with openSearcher set to true (which is the default), Solr immediately throws the cache away. This is by design -- the cache contains internal document IDs from the previous index, and due to merging, the new index might have entirely different ID values for the same documents.

A commit on the master will cause the slave to copy the index on its next configured replication interval, and then basically do a commit of its own to signal that a new searcher is needed.

The caches have a feature called autowarming, which takes the top N entries in the cache and re-executes the queries that produced those entries to populate the new cache before the new searcher starts. If you set autowarmCount too high, it makes the commits take a really long time.

If you are committing so frequently that your cache is ineffective, then you need to commit less frequently. Whenever you do a commit on the master, the slave will also do a commit after it copies the new index.

Thanks,
Shawn
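For reference, autowarmCount is configured per cache in the <query> section of solrconfig.xml; a sketch with illustrative sizes only (larger counts mean slower commits, and the documentCache cannot be autowarmed):

<query>
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
</query>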
update/extract override ExtractTyp
Hello,

Is it possible to override the ExtractClass for a specific document? I would like to upload an XML document, but this XML is not well-formed. I need this XML because it is part of a project where a corrupt XML file is needed, for testing purposes.

The update/extract process fails every time with a 500 error. I tried to override the Content-Type with "text/plain" but still get the XML parse error. Is it possible to override it?

This message was sent using IMP, the Internet Messaging Program.
RE: Can I use SolrJ 6.3.0 to talk to a Solr 5.2.3 server?
Can anyone explain how to get rid of this error?

java.lang.Exception: Assertions mismatch: -ea was not specified but -Dtests.asserts=true
    at __randomizedtesting.SeedInfo.seed([5B25E606A72BD541]:0)
    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:47)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
    at java.lang.Thread.run(Thread.java:745)

Here is my test class:

public class SolrCatalogClientTest extends SolrJettyTestBase {

    private static SolrCatalogClient solrClient;
    protected static JettySolrRunner SOLR;

    @BeforeClass
    public static void setUpBeforeClass() throws Exception {
        System.setProperty("tests.asserts", "false");
        System.setProperty("solr.solr.home", "solr/conf");
        System.setProperty("solr.core.name", "mySolrCore");
        System.setProperty("solr.data.dir", new File("target/solr-embedded-data").getAbsolutePath());
        System.out.println("Initializing Solr for JUnit tests");
        SOLR = createJetty(
            "target/solr-embedded-data",
            JettyConfig.builder()
                .setPort(8983)
                .setContext("/solr")
                .stopAtShutdown(true)
                .build());
        solrClient = new SolrCatalogClient();
        ...
    }
}

Should I be using initCore instead of createJetty?

Thank you!
-Jennifer

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Tuesday, January 03, 2017 2:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Can I use SolrJ 6.3.0 to talk to a Solr 5.2.3 server?

On 1/3/2017 10:35 AM, Jennifer Coston wrote:
> I am running into a conflict with Solr and Elasticsearch. We are trying to add support for Elasticsearch 5.1.1, which requires Lucene 6.3.0, to an existing system that uses Solr 5.2.3. At the moment I am using SolrJ 5.3.1 to talk to the 5.2.3 server. I was hoping I could just update the SolrJ libraries to 6.3.0 so the Lucene conflict goes away, but when I try to run my unit tests I'm seeing this error:

There is no 5.2.3 version of Solr. The 5.2.x line ended with 5.2.1. There is a 5.3.2 version.

If you're using HttpSolrClient, mixing a 5.x server and a 6.x client will be no problem at all. If you're using CloudSolrClient you may run into issues because of the very rapid pace of change in SolrCloud -- mixing major versions is not recommended. If you're using EmbeddedSolrServer, then the client and the server are not separate, so upgrading both at once is probably OK, but the code and the core config might require changes.

> java.util.ServiceConfigurationError: Cannot instantiate SPI class:
> org.apache.lucene.codecs.simpletext.SimpleTextPostingsFormat

This is an error from a Solr *server*, not a client. Are you using EmbeddedSolrServer?

> Here are the Solr dependencies I have in my pom.xml:

You've got a solr-core dependency here. That is not at all necessary for a SolrJ client, *unless* you are using EmbeddedSolrServer. If you are using the embedded server, then things are much more complicated, and your code may require changes before it will work correctly with a new major version. The embedded server is NOT recommended unless you have no other choice.
The unit tests included with Solr are able to fire up Solr with Jetty for tests, just like a real separate Solr server. Your tests can also do this. Look for tests in the Lucene/Solr codebase that extend SolrJettyTestBase, and use those as a guide.

Thanks,
Shawn
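As a rough illustration of the plain-HTTP route mentioned above (the URL and query are placeholders), a SolrJ 6.x client talking to the older server might look like this:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PlainHttpQuery {
    public static void main(String[] args) throws Exception {
        // SolrJ 6.x builder pointing at a core on the older 5.x server.
        HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mySolrCore").build();
        QueryResponse rsp = client.query(new SolrQuery("*:*"));
        System.out.println("numFound=" + rsp.getResults().getNumFound());
        client.close();
    }
}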
Re: ClusterStateMutator
Actually, the state format has defaulted to 2 for many releases now (all of 6.x at least). This default is enforced in CollectionsHandler well before the code in ClusterStateMutator is executed.

On Wed, Jan 4, 2017 at 6:16 PM, Hendrik Haddorp wrote:
> Hi,
>
> in solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java there is the following code starting at line 107:
>
> //TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
> String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null : ZkStateReader.getCollectionPath(cName);
>
> Any idea if that will be changed to default to version 2 anytime soon?
>
> thanks,
> Hendrik

--
Regards,
Shalin Shekhar Mangar.
Re: Random Streaming Function not there? SolrCloud 6.3.0
This issue is resolved for Solr 6.4: https://issues.apache.org/jira/browse/SOLR-9919

I also created an issue to resolve future bugs of this nature: https://issues.apache.org/jira/browse/SOLR-9924

Thanks for the bug report!

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jan 3, 2017 at 9:04 PM, Joe Obernberger <joseph.obernber...@gmail.com> wrote:
> Thanks! I'll give this a shot.
>
> -Joe
>
> On 1/3/2017 8:52 PM, Joel Bernstein wrote:
>> Luckily https://issues.apache.org/jira/browse/SOLR-9103 is available in Solr 6.3, so you can register the random expression through the solrconfig. The ticket shows an example.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Tue, Jan 3, 2017 at 7:59 PM, Joel Bernstein wrote:
>>
>>> This is a bug. I just checked and the random expression is not mapped in the /stream handler. The test cases pass because they register the random function explicitly inside the test case. This is something that we need to fix, so the /stream handler registration always gets tested.
>>>
>>> I'm fairly sure I remember testing random at scale through the /stream handler, so I'm not sure how this missed getting committed.
>>>
>>> I will fix this for Solr 6.4.
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>> On Tue, Jan 3, 2017 at 6:46 PM, Joe Obernberger <joseph.obernber...@gmail.com> wrote:
>>>
>>>> I'm getting an error:
>>>>
>>>> {"result-set":{"docs":[
>>>> {"EXCEPTION":"Invalid stream expression random(MAIN,q=\"FULL_DOCUMENT: obamacare\",rows=100,fl=DocumentId) - function 'random' is unknown (not mapped to a valid TupleStream)","EOF":true}]}}
>>>>
>>>> when trying to use the streaming random function. I'm using curl with:
>>>>
>>>> curl --data-urlencode 'expr=random(MAIN,q="FULL_DOCUMENT:obamacare",rows="100",fl="DocumentId")' http://cordelia:9100/solr/MAIN/stream
>>>>
>>>> Any ideas? Thank you!
>>>>
>>>> -Joe
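As a possible interim workaround on 6.3, following the pattern in SOLR-9103, the expression can be registered in solrconfig.xml; the element and class name below are assumptions based on that ticket and should be checked against it:

<expressible name="random" class="org.apache.solr.client.solrj.io.stream.RandomStream"/>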
Re: Can I use SolrJ 6.3.0 to talk to a Solr 5.2.3 server?
On 1/4/2017 8:29 AM, Jennifer Coston wrote:
> Can anyone explain how to get rid of this error?
>
> java.lang.Exception: Assertions mismatch: -ea was not specified but -Dtests.asserts=true
>     at __randomizedtesting.SeedInfo.seed([5B25E606A72BD541]:0)
>     at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:47)
>     at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>     at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>     at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
>     at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>     at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
>     at java.lang.Thread.run(Thread.java:745)

When you started your test, a system property was included (the java commandline option for that is -Dtests.asserts=true) that tells the test framework you want assertions enabled. Looks like this message comes from the Lucene test-framework. When this property is set to true, it also requires a java commandline option to enable assertions, which is -ea as the message said. You'll need to either add "-ea" to the java commandline or remove the system property.

If you didn't specify the system property and can't figure out how to remove it, then you'll likely need help from somebody who's really familiar with the build system. You can find those people on the dev mailing list or the #lucene-dev IRC channel on freenode. Some of them do hang out on this mailing list, but it's not really on-topic for this list.

http://lucene.apache.org/core/discussion.html

If you choose the IRC channel, here's a Solr resource about IRC channels. The channels this page mentions are specific to Solr, but the information provided is relevant for any technical IRC channel:

https://wiki.apache.org/solr/IRCChannels

Thanks,
Shawn
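If the tests are run through Maven (an assumption), one way to add the flag is via the Surefire plugin's argLine in the pom, for example:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Enable JVM assertions in the forked test JVM so the
         Lucene test framework's -Dtests.asserts=true check is satisfied. -->
    <argLine>-ea</argLine>
  </configuration>
</plugin>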
Re: update/extract override ExtractTyp
On 1/4/2017 8:12 AM, sn0...@ulysses-erp.com wrote:
> Is it possible to override the ExtractClass for a specific document? I would like to upload an XML document, but this XML is not well-formed.
>
> I need this XML because it is part of a project where a corrupt XML file is needed, for testing purposes.
>
> The update/extract process fails every time with a 500 error.
>
> I tried to override the Content-Type with "text/plain" but still get the XML parse error.

If you send something to the /update handler and don't tell Solr that it is another format that it knows, like CSV, JSON, or javabin, then Solr assumes that it is XML -- and that it is the *specific* XML format that Solr uses. "text/plain" is not one of the formats that the update handler knows how to handle, so it will assume XML. If you send some other arbitrary XML content, even if that XML is otherwise correctly formed (which apparently yours isn't), Solr will throw an error, because it is not the type of XML that Solr is looking for. This page has some examples of what Solr is expecting when you send XML:

https://wiki.apache.org/solr/UpdateXmlMessages

If you want to parse arbitrary XML into fields, you probably need to send it using DIH and the XPathEntityProcessor. If you want the XML to go into a field completely as-is, then you need to encode the XML into one of the update formats that Solr knows (XML, JSON, etc.) and set it as the value of one of the fields.

Thanks,
Shawn
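A rough data-config.xml sketch of the DIH/XPathEntityProcessor route; the file path, forEach expression, and field names are placeholders, and note that this processor also expects well-formed XML:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="records"
            processor="XPathEntityProcessor"
            url="/path/to/file.xml"
            forEach="/records/record"
            stream="true">
      <!-- Map XPath expressions onto Solr fields (illustrative names). -->
      <field column="id"    xpath="/records/record/id"/>
      <field column="title" xpath="/records/record/title"/>
    </entity>
  </document>
</dataConfig>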
Zip Bomb Exception in HTML File
I get an exception "org.apache.tika.exception.TikaException: Zip bomb detected!" if I try to parse an HTML file - and I think I know why: because there are many, many divs nested in cascade - over 200 divs, with spans inside each.

Is it correct that there is this limit for HTML files?

This message was sent using IMP, the Internet Messaging Program.
Re: Zip Bomb Exception in HTML File
You might get a more knowledgeable response from the Tika folks; that's really not something Solr controls.

Best,
Erick

On Wed, Jan 4, 2017 at 8:50 AM, wrote:
> I get an exception "org.apache.tika.exception.TikaException: Zip bomb detected!" if I try to parse an HTML file - and I think I know why: because there are many, many divs nested in cascade - over 200 divs, with spans inside each.
>
> Is it correct that there is this limit for HTML files?
>
> This message was sent using IMP, the Internet Messaging Program.
Re: ClusterStateMutator
You are right, the code looks like it. But why did I then see collection data in the clusterstate.json file? If version 1 is not used, I would assume that no data ends up in there. When explicitly setting state format 2, the system seemed to behave differently. And if the code always uses version 2, shouldn't the default in that line be changed accordingly?

On 04/01/17 16:41, Shalin Shekhar Mangar wrote:
> Actually, the state format has defaulted to 2 for many releases now (all of 6.x at least). This default is enforced in CollectionsHandler well before the code in ClusterStateMutator is executed.
>
> On Wed, Jan 4, 2017 at 6:16 PM, Hendrik Haddorp wrote:
>> Hi,
>>
>> in solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java there is the following code starting at line 107:
>>
>> //TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
>> String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null : ZkStateReader.getCollectionPath(cName);
>>
>> Any idea if that will be changed to default to version 2 anytime soon?
>>
>> thanks,
>> Hendrik
RE: Zip Bomb Exception in HTML File
This came up back in September [1] and [2]. Same trigger... a crazy number of divs.

I think we could modify the AutoDetectParser to enable configuration of the maximum zip-bomb depth via tika-config. If there's any interest in this, re-open TIKA-2091 and I'll take a look.

Best,
Tim

[1] http://git.net/ml/solr-user.lucene.apache.org/2016-09/msg00561.html
[2] https://issues.apache.org/jira/browse/TIKA-2091

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, January 4, 2017 12:20 PM
To: solr-user
Subject: Re: Zip Bomb Exception in HTML File

You might get a more knowledgeable response from the Tika folks; that's really not something Solr controls.

Best,
Erick

On Wed, Jan 4, 2017 at 8:50 AM, wrote:
> I get an exception "org.apache.tika.exception.TikaException: Zip bomb detected!" if I try to parse an HTML file - and I think I know why: because there are many, many divs nested in cascade - over 200 divs, with spans inside each.
>
> Is it correct that there is this limit for HTML files?
>
> This message was sent using IMP, the Internet Messaging Program.
Re: Solr 6.x : howto create a core in (embedded) CoreConatiner
I had success doing something like this, which I found in some of the Solr tests...

SolrResourceLoader loader = new SolrResourceLoader(solrHomeDir.toPath());
Path configSetPath = Paths.get(configSetHome).toAbsolutePath();

final NodeConfig config = new NodeConfig.NodeConfigBuilder("embeddedSolrServerNode", loader)
    .setConfigSetBaseDirectory(configSetPath.toString())
    .build();

EmbeddedSolrServer embeddedSolrServer = new EmbeddedSolrServer(config, coreName);

CoreAdminRequest.Create createRequest = new CoreAdminRequest.Create();
createRequest.setCoreName(coreName);
createRequest.setConfigSet(coreName);
embeddedSolrServer.request(createRequest);

The setup was to have a config set located at src/test/resources/configsets, so configSetHome was src/test/resources/configsets, the coreName was the name of a configset in that directory, and solrHome was a path to target/solr.

https://github.com/bbende/embeddedsolrserver-example/blob/master/src/test/java/org/apache/solr/EmbeddedSolrServerFactory.java
https://github.com/bbende/embeddedsolrserver-example/blob/master/src/test/java/org/apache/solr/TestEmbeddedSolrServerFactory.java

On Fri, Dec 30, 2016 at 3:27 AM, Clemens Wyss DEV wrote:
> I am still using 5.4.1 and have the following code to create a new core:
> ...
> Properties coreProperties = new Properties();
> coreProperties.setProperty( CoreDescriptor.CORE_CONFIGSET, configsetToUse );
> CoreDescriptor coreDescriptor = new CoreDescriptor( container, coreName, coreFolder, coreProperties );
> coreContainer.create( coreDescriptor );
> coreContainer.getCoresLocator().create( coreContainer, coreDescriptor );
> ...
>
> What is the equivalent Java snippet in Solr 6.x (latest greatest)?
>
> Thx & a successful 2017!
> Clemens
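A possible follow-on check against the embedded server created above, purely illustrative (the "id" field is assumed to exist in the configset's schema):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class EmbeddedSmokeTest {

    // Indexes one document into the default core and returns the total document count.
    static long indexAndCount(EmbeddedSolrServer server) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        server.add(doc);
        server.commit();
        return server.query(new SolrQuery("*:*")).getResults().getNumFound();
    }
}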
Solr Initialization failure
Hello,

While creating a new collection, it fails to spin up Solr cores on some nodes due to "insufficient direct memory". Here is the error:

- *3044_01_17_shard42_replica1:* org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: The max direct memory is likely too low. Either increase it (by adding -XX:MaxDirectMemorySize=g -XX:+UseLargePages to your containers startup args) or disable direct allocation using solr.hdfs.blockcache.direct.memory.allocation=false in solrconfig.xml. If you are putting the block cache on the heap, your java heap size might not be large enough. Failed allocating ~2684.35456 MB.

The error is self-explanatory. My question is: why does it require around 2.7 GB of off-heap memory to spin up a single core?

Thank you!
Re: ClusterStateMutator
Hendrik:

Historically in 4.x, there was code that would reconstruct the clusterstate.json node. So you would see "deleted" collections come back. One scenario was:

- Have a Solr node offline that had a replica for a collection.
- Delete that collection.
- Bring the node back.
- It would register itself in clusterstate.json.

So my guess is that something like this is going on and you're getting a clusterstate.json that's reconstructed (and possibly not complete).

You can avoid this by specifying the legacyCloud=false cluster property.

Kind of a shot in the dark...

Erick

On Wed, Jan 4, 2017 at 11:12 AM, Hendrik Haddorp wrote:
> You are right, the code looks like it. But why did I then see collection data in the clusterstate.json file? If version 1 is not used, I would assume that no data ends up in there. When explicitly setting state format 2, the system seemed to behave differently. And if the code always uses version 2, shouldn't the default in that line be changed accordingly?
>
> On 04/01/17 16:41, Shalin Shekhar Mangar wrote:
>> Actually, the state format has defaulted to 2 for many releases now (all of 6.x at least). This default is enforced in CollectionsHandler well before the code in ClusterStateMutator is executed.
>>
>> On Wed, Jan 4, 2017 at 6:16 PM, Hendrik Haddorp wrote:
>>> Hi,
>>>
>>> in solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java there is the following code starting at line 107:
>>>
>>> //TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
>>> String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null : ZkStateReader.getCollectionPath(cName);
>>>
>>> Any idea if that will be changed to default to version 2 anytime soon?
>>>
>>> thanks,
>>> Hendrik
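For reference, the cluster property can be set through the Collections API; a sketch assuming a node on localhost:

curl "http://localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=legacyCloud&val=false"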
Re: ClusterStateMutator
Hi Erick,

I have actually also seen that behavior already, so I will check what happens when I set that property. I still believe I'm getting the clusterstate.json set already before the node comes up again, but I will try to verify that further tomorrow.

thanks,
Hendrik

On 04/01/17 22:10, Erick Erickson wrote:
> Hendrik:
>
> Historically in 4.x, there was code that would reconstruct the clusterstate.json node. So you would see "deleted" collections come back. One scenario was:
>
> - Have a Solr node offline that had a replica for a collection.
> - Delete that collection.
> - Bring the node back.
> - It would register itself in clusterstate.json.
>
> So my guess is that something like this is going on and you're getting a clusterstate.json that's reconstructed (and possibly not complete).
>
> You can avoid this by specifying the legacyCloud=false cluster property.
>
> Kind of a shot in the dark...
>
> Erick
>
> On Wed, Jan 4, 2017 at 11:12 AM, Hendrik Haddorp wrote:
>> You are right, the code looks like it. But why did I then see collection data in the clusterstate.json file? If version 1 is not used, I would assume that no data ends up in there. When explicitly setting state format 2, the system seemed to behave differently. And if the code always uses version 2, shouldn't the default in that line be changed accordingly?
>>
>> On 04/01/17 16:41, Shalin Shekhar Mangar wrote:
>>> Actually, the state format has defaulted to 2 for many releases now (all of 6.x at least). This default is enforced in CollectionsHandler well before the code in ClusterStateMutator is executed.
>>>
>>> On Wed, Jan 4, 2017 at 6:16 PM, Hendrik Haddorp wrote:
>>>> Hi,
>>>>
>>>> in solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java there is the following code starting at line 107:
>>>>
>>>> //TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
>>>> String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null : ZkStateReader.getCollectionPath(cName);
>>>>
>>>> Any idea if that will be changed to default to version 2 anytime soon?
>>>>
>>>> thanks,
>>>> Hendrik
Re: Solr Initialization failure
On 1/4/2017 1:43 PM, Chetas Joshi wrote:
> While creating a new collection, it fails to spin up Solr cores on some nodes due to "insufficient direct memory". Here is the error:
>
> - *3044_01_17_shard42_replica1:* org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: The max direct memory is likely too low. Either increase it (by adding -XX:MaxDirectMemorySize=g -XX:+UseLargePages to your containers startup args) or disable direct allocation using solr.hdfs.blockcache.direct.memory.allocation=false in solrconfig.xml. If you are putting the block cache on the heap, your java heap size might not be large enough. Failed allocating ~2684.35456 MB.
>
> The error is self-explanatory. My question is: why does it require around 2.7 GB of off-heap memory to spin up a single core?

This message comes from the HdfsDirectoryFactory class. This is the calculation of the total amount of memory needed:

long totalMemory = (long) bankCount * (long) numberOfBlocksPerBank * (long) blockSize;

The numberOfBlocksPerBank variable can come from the configuration; the code defaults it to 16384. The blockSize variable gets assigned by a convoluted method involving bit shifts, and defaults to 8192. The bankCount variable seems to come from solr.hdfs.blockcache.slab.count, and apparently defaults to 1. It looks like it's been set to 20 in your config. If we assume the other two are at their defaults and you have 20 for the slab count, then this results in 2684354560 bytes, which would cause the exact output seen in the error message when the memory allocation fails.

I know very little about HDFS or how the HDFS directory works, but apparently it needs a lot of memory if you want good performance. Reducing solr.hdfs.blockcache.slab.count sounds like it might result in less memory being required.

You might want to review this page for info about how to set up HDFS, where it says that each slab requires 128MB of memory:

https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS

The default settings for the HDFS directory cause the block cache to be global, so all cores use it, instead of spinning up another cache for every additional core.

What I've seen sounds like one of these two problems: 1) You've turned off the global cache option. 2) This node doesn't yet have any HDFS cores, so your collection create is trying to create the first core using HDFS. That action is trying to allocate the global cache, which has been sized at 20 slabs.

Thanks,
Shawn
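A sketch of the relevant HdfsDirectoryFactory settings in solrconfig.xml, with illustrative values only (the HDFS path is a placeholder); since each slab needs roughly 128MB of direct memory, the slab count is what drives the ~2.7 GB allocation seen above:

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <!-- One global cache shared by all cores instead of one cache per core. -->
  <bool name="solr.hdfs.blockcache.global">true</bool>
  <!-- ~128MB of direct memory per slab; 20 slabs matches the failed allocation above. -->
  <int name="solr.hdfs.blockcache.slab.count">1</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
</directoryFactory>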
Re: ClusterStateMutator
Let us know how it goes. You'll probably want to remove the _contents_ of clusterstate.json and just leave it as a pair of braces, i.e. {}, if for no other reason than it's confusing. In times past the node needed to be there even if empty, although I just tried removing it completely on 6x and I was able to start Solr; part of the startup process recreates it as an empty node, just a pair of braces.

Best,
Erick

On Wed, Jan 4, 2017 at 1:22 PM, Hendrik Haddorp wrote:
> Hi Erick,
>
> I have actually also seen that behavior already, so I will check what happens when I set that property. I still believe I'm getting the clusterstate.json set already before the node comes up again, but I will try to verify that further tomorrow.
>
> thanks,
> Hendrik
>
> On 04/01/17 22:10, Erick Erickson wrote:
>> Hendrik:
>>
>> Historically in 4.x, there was code that would reconstruct the clusterstate.json node. So you would see "deleted" collections come back. One scenario was:
>>
>> - Have a Solr node offline that had a replica for a collection.
>> - Delete that collection.
>> - Bring the node back.
>> - It would register itself in clusterstate.json.
>>
>> So my guess is that something like this is going on and you're getting a clusterstate.json that's reconstructed (and possibly not complete).
>>
>> You can avoid this by specifying the legacyCloud=false cluster property.
>>
>> Kind of a shot in the dark...
>>
>> Erick
>>
>> On Wed, Jan 4, 2017 at 11:12 AM, Hendrik Haddorp wrote:
>>> You are right, the code looks like it. But why did I then see collection data in the clusterstate.json file? If version 1 is not used, I would assume that no data ends up in there. When explicitly setting state format 2, the system seemed to behave differently. And if the code always uses version 2, shouldn't the default in that line be changed accordingly?
>>>
>>> On 04/01/17 16:41, Shalin Shekhar Mangar wrote:
>>>> Actually, the state format has defaulted to 2 for many releases now (all of 6.x at least). This default is enforced in CollectionsHandler well before the code in ClusterStateMutator is executed.
>>>>
>>>> On Wed, Jan 4, 2017 at 6:16 PM, Hendrik Haddorp wrote:
>>>>> Hi,
>>>>>
>>>>> in solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java there is the following code starting at line 107:
>>>>>
>>>>> //TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
>>>>> String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null : ZkStateReader.getCollectionPath(cName);
>>>>>
>>>>> Any idea if that will be changed to default to version 2 anytime soon?
>>>>>
>>>>> thanks,
>>>>> Hendrik
Re: Solr Initialization failure
Hi Shawn,

Thanks for the explanation! I have the slab count set to 20 and I did not have the global block cache enabled.

I have a follow-up question: does setting slab count=1 affect the write/read performance of Solr while reading the indices from HDFS? Is this setting just used while creating new cores?

Thanks!

On Wed, Jan 4, 2017 at 4:11 PM, Shawn Heisey wrote:
> On 1/4/2017 1:43 PM, Chetas Joshi wrote:
>> While creating a new collection, it fails to spin up Solr cores on some nodes due to "insufficient direct memory". Here is the error:
>>
>> - *3044_01_17_shard42_replica1:* org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: The max direct memory is likely too low. Either increase it (by adding -XX:MaxDirectMemorySize=g -XX:+UseLargePages to your containers startup args) or disable direct allocation using solr.hdfs.blockcache.direct.memory.allocation=false in solrconfig.xml. If you are putting the block cache on the heap, your java heap size might not be large enough. Failed allocating ~2684.35456 MB.
>>
>> The error is self-explanatory. My question is: why does it require around 2.7 GB of off-heap memory to spin up a single core?
>
> This message comes from the HdfsDirectoryFactory class. This is the calculation of the total amount of memory needed:
>
> long totalMemory = (long) bankCount * (long) numberOfBlocksPerBank * (long) blockSize;
>
> The numberOfBlocksPerBank variable can come from the configuration; the code defaults it to 16384. The blockSize variable gets assigned by a convoluted method involving bit shifts, and defaults to 8192. The bankCount variable seems to come from solr.hdfs.blockcache.slab.count, and apparently defaults to 1. It looks like it's been set to 20 in your config. If we assume the other two are at their defaults and you have 20 for the slab count, then this results in 2684354560 bytes, which would cause the exact output seen in the error message when the memory allocation fails.
>
> I know very little about HDFS or how the HDFS directory works, but apparently it needs a lot of memory if you want good performance. Reducing solr.hdfs.blockcache.slab.count sounds like it might result in less memory being required.
>
> You might want to review this page for info about how to set up HDFS, where it says that each slab requires 128MB of memory:
>
> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
>
> The default settings for the HDFS directory cause the block cache to be global, so all cores use it, instead of spinning up another cache for every additional core.
>
> What I've seen sounds like one of these two problems: 1) You've turned off the global cache option. 2) This node doesn't yet have any HDFS cores, so your collection create is trying to create the first core using HDFS. That action is trying to allocate the global cache, which has been sized at 20 slabs.
>
> Thanks,
> Shawn
Re: SolrJ doesn't work with Json facet api
Thanks for your response. We definitely use solrQuery.set("json.facet", "the json query here");

Btw, we are using Solr 5.2.1.
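For comparison, a minimal sketch of setting json.facet from SolrJ and reading the result back; the core URL, facet name, and field are placeholders, and the output is read from the raw response under the "facets" key since SolrJ of that vintage has no typed accessor for JSON facets:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class JsonFacetExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);
        // Terms facet on a hypothetical "cat" field.
        q.set("json.facet", "{categories:{type:terms,field:cat}}");
        QueryResponse rsp = client.query(q);
        // JSON Facet API results arrive in the top-level "facets" section of the response.
        NamedList<?> facets = (NamedList<?>) rsp.getResponse().get("facets");
        System.out.println(facets);
        client.close();
    }
}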