Re: Queries regarding solr cache
Hi Shawn,

Need your help. I am using a master/slave architecture in my system, and here is the replication section of my solrconfig.xml:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="enable">${enable.master:false}</str>
      <str name="replicateAfter">startup</str>
      <str name="replicateAfter">commit</str>
      <str name="commitReserveDuration">00:00:10</str>
      <str name="confFiles">managed-schema</str>
    </lst>
    <lst name="slave">
      <str name="enable">${enable.slave:false}</str>
      <str name="masterUrl">http://${MASTER_CORE_URL}/${solr.core.name}</str>
      <str name="pollInterval">${POLL_TIME}</str>
    </lst>
  </requestHandler>

Problem: I am noticing that my slaves are not able to use caching properly:

1. I am indexing on my master and committing frequently. What I am noticing is that my slaves are also committing very frequently, the cache is not being built properly, and so my hit ratio for caching is almost zero.

2. What changes do I need to make so that the cache builds up properly even after commits and can actually be used? This is wasting a lot of my resources and also slowing down the queries.

On Mon, Dec 5, 2016 at 9:06 PM, Shawn Heisey wrote:
> On 12/5/2016 6:44 AM, kshitij tyagi wrote:
> > - lookups:381
> > - hits:24
> > - hitratio:0.06
> > - inserts:363
> > - evictions:0
> > - size:345
> > - warmupTime:2932
> > - cumulative_lookups:294948
> > - cumulative_hits:15840
> > - cumulative_hitratio:0.05
> > - cumulative_inserts:277963
> > - cumulative_evictions:70078
> >
> > How can I increase my hit ratio? I am not able to understand the Solr caching mechanism clearly. Please help.
>
> This means that out of the nearly 300,000 queries executed by that handler, only about five percent (roughly 16,000) of them were found in the cache. The rest of them were not found in the cache at the moment they were made. Since these numbers come from the queryResultCache, this refers to the "q" parameter. The filterCache handles things in the fq parameter. The documentCache holds actual documents from your index and fills in stored data in results so the document doesn't have to be fetched from the index.
>
> Possible reasons: 1) Your users are rarely entering the same query more than once. 2) Your client code is adding something unique to every query (q parameter) so very few of them are the same. 3) You are committing so frequently that the cache never has a chance to get large enough to make a difference.
>
> Here are some queryResultCache stats from one of my indexes:
>
> class: org.apache.solr.search.FastLRUCache
> version: 1.0
> description: Concurrent LRU Cache(maxSize=512, initialSize=512, minSize=460, acceptableSize=486, cleanupThread=true, autowarmCount=8, regenerator=org.apache.solr.search.SolrIndexSearcher$3@1d172ac0)
> src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_7/solr/core/src/java/org/apache/solr/search/FastLRUCache.java
> lookups: 3496
> hits: 3145
> hitratio: 0.9
> inserts: 335
> evictions: 0
> size: 338
> warmupTime: 2209
> cumulative_lookups: 12394606
> cumulative_hits: 11247114
> cumulative_hitratio: 0.91
> cumulative_inserts: 1110375
> cumulative_evictions: 409887
>
> These numbers indicate that 91 percent of the queries made to this handler were served from the cache.
>
> Thanks,
> Shawn
Re: Solr ACL Plugin Windows
I didn't see a real Java project there, but the directions to compile on Linux are almost always applicable to Windows with Java. If you find a project that says it uses Ant or Maven, all you need to do is download Ant or Maven plus the Java Development Kit and put both of them on the Windows path. Then it's either "ant package" (IIRC most of the time) or "mvn install" from within the folder that has the project.

FWIW, creating a simple ACL doesn't even require a custom plugin. This is roughly how you would do it with an application that your team has written that works with Solr:

1. Add a multi-valued string field called ACL or privileges.
2. Write something for your app that can pull a list of attributes/privileges from a database for the current user.
3. Append a filter query to the query that matches those attributes, e.g.:

fq=privileges:(DEVELOPER AND DEVOPS)

If you are using a role-based system that bundles groups of permissions into a role, all you need to do is decompose the role into a list of permissions for the user and put all of the required permissions into that multi-valued field.

Mike

On Wed, Jan 4, 2017 at 2:55 AM, wrote:
> I am searching for a SOLR ACL plugin. I found this:
> https://lucidworks.com/blog/2015/05/15/custom-security-filtering-solr-5/
>
> but I don't know how I can compile the Java into a jar - all the info I found was about how to compile it on Linux, but this doesn't help.
>
> I am running Solr version 6.3.0 on Windows Server 2003.
>
> So I am searching for info about compiling a plugin under Windows.
>
> Thanks in advance :D
>
> This message was sent using IMP, the Internet Messaging Program.
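A minimal SolrJ sketch of step 3 above; the core URL, the "privileges" field name, and the helper signature are illustrative assumptions rather than anything from the original post:

import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AclFilteredSearch {

    // Appends a filter query built from the user's permissions so only
    // documents carrying all required privileges are returned.
    static QueryResponse search(SolrClient client, String userQuery,
                                List<String> privileges) throws Exception {
        SolrQuery query = new SolrQuery(userQuery);
        query.addFilterQuery("privileges:(" + String.join(" AND ", privileges) + ")");
        return client.query(query);
    }

    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
        QueryResponse rsp = search(client, "title:report", Arrays.asList("DEVELOPER", "DEVOPS"));
        System.out.println(rsp.getResults().getNumFound());
        client.close();
    }
}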
ClusterStateMutator
Hi,

in solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java there is the following code starting at line 107:

//TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null : ZkStateReader.getCollectionPath(cName);

Any idea if that will be changed to default to version 2 anytime soon?

thanks,
Hendrik
Re: create collection gets stuck on node restart
The problem is that we would like to run without downtime. Rolling updates have worked fine so far, except when creating a collection at the wrong time. I just did another test with stateFormat=2. This seems to greatly improve the situation. One collection creation got stuck, but other creations still worked, and after a restart of some nodes the stuck collection creation also looked ok. For some reason it just resulted in two replicas for the same shard getting assigned to the same node, even though I specified a rule of "shard:*,replica:<2,node:*".

On 03.01.2017 15:34, Shawn Heisey wrote:
> On 1/3/2017 2:59 AM, Hendrik Haddorp wrote:
>> I have a SolrCloud setup with 5 nodes and am creating collections with a replication factor of 3. If I kill and restart nodes at the "right" time during the creation process, the creation seems to get stuck. Collection data is left in the clusterstate.json file in ZooKeeper and no collections can be created anymore until this entry gets removed. I can reproduce this on Solr 6.2.1 and 6.3, while 6.3 seems to be somewhat less likely to get stuck. Is Solr supposed to recover from data being stuck in the clusterstate.json at some point? I had one instance where it looked like data was removed again, but normally the data does not seem to get cleaned up automatically and just blocks any further collection creations. I did not find anything like this in Jira. Just SOLR-7198 sounds a bit similar, even though it is about deleting collections.
>
> Don't restart your nodes at the same time you're trying to do maintenance of any kind on your collections. Try to only do maintenance when they are all working, or you'll get unexpected results.
>
> The most recent development goal is to make it so that collection deletion can be done even if the creation was partial. The idea is that if something goes wrong, you can delete the bad collection and then be free to try to create it again.
>
> I see that you've started another thread about deletion not fully eliminating everything in HDFS. That does sound like a bug. I have no experience with HDFS at all, so I can't be helpful with that.
>
> Thanks,
> Shawn
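For reference, a rule like the one above is passed to the Collections API at creation time via the rule parameter; a sketch, with host, port, collection name, and config name as placeholders (the "<" is URL-encoded as %3C):

# Rule-based replica placement: at most one replica of each shard on any node.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=3&collection.configName=myconf&rule=shard:*,replica:%3C2,node:*"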
Re: Solr ACL Plugin Windows
Thanks, Mike, for emphasizing that point. I made that point in the blog post as well - it's the recommended approach when it's sufficient, for sure.

Erik

> On Jan 4, 2017, at 07:36, Mike Thomsen wrote:
>
> I didn't see a real Java project there, but the directions to compile on Linux are almost always applicable to Windows with Java. If you find a project that says it uses Ant or Maven, all you need to do is download Ant or Maven plus the Java Development Kit and put both of them on the Windows path. Then it's either "ant package" (IIRC most of the time) or "mvn install" from within the folder that has the project.
>
> FWIW, creating a simple ACL doesn't even require a custom plugin. This is roughly how you would do it with an application that your team has written that works with Solr:
>
> 1. Add a multi-valued string field called ACL or privileges.
> 2. Write something for your app that can pull a list of attributes/privileges from a database for the current user.
> 3. Append a filter query to the query that matches those attributes, e.g.:
>
> fq=privileges:(DEVELOPER AND DEVOPS)
>
> If you are using a role-based system that bundles groups of permissions into a role, all you need to do is decompose the role into a list of permissions for the user and put all of the required permissions into that multi-valued field.
>
> Mike
>
>> On Wed, Jan 4, 2017 at 2:55 AM, wrote:
>>
>> I am searching for a SOLR ACL plugin. I found this:
>> https://lucidworks.com/blog/2015/05/15/custom-security-filtering-solr-5/
>>
>> but I don't know how I can compile the Java into a jar - all the info I found was about how to compile it on Linux, but this doesn't help.
>>
>> I am running Solr version 6.3.0 on Windows Server 2003.
>>
>> So I am searching for info about compiling a plugin under Windows.
>>
>> Thanks in advance :D
>>
>> This message was sent using IMP, the Internet Messaging Program.
Re: create collection gets stuck on node restart
On 1/4/2017 6:23 AM, Hendrik Haddorp wrote:
> The problem is that we would like to run without downtime. Rolling updates have worked fine so far, except when creating a collection at the wrong time. I just did another test with stateFormat=2. This seems to greatly improve the situation. One collection creation got stuck, but other creations still worked, and after a restart of some nodes the stuck collection creation also looked ok. For some reason it just resulted in two replicas for the same shard getting assigned to the same node, even though I specified a rule of "shard:*,replica:<2,node:*".

I have no idea what that rule means or where you might be configuring it. That must be for a feature that I've never used.

If you're going to restart nodes, then don't create collections at that moment. Wait until after the restart is completely finished. If it's all automated ... then design your tools so that they do not create collections and do the restarts at the same time.

Thanks,
Shawn
Re: Queries regarding solr cache
On 1/4/2017 3:45 AM, kshitij tyagi wrote:
> Problem: I am noticing that my slaves are not able to use caching properly:
>
> 1. I am indexing on my master and committing frequently. What I am noticing is that my slaves are also committing very frequently, the cache is not being built properly, and so my hit ratio for caching is almost zero.
>
> 2. What changes do I need to make so that the cache builds up properly even after commits and can actually be used? This is wasting a lot of my resources and also slowing down the queries.

Whenever you commit with openSearcher set to true (which is the default), Solr immediately throws the cache away. This is by design -- the cache contains internal document IDs from the previous index, and due to merging, the new index might have entirely different ID values for the same documents.

A commit on the master will cause the slave to copy the index on its next configured replication interval, and then basically do a commit of its own to signal that a new searcher is needed.

The caches have a feature called autowarming, which takes the top N entries in the cache and re-executes the queries that produced those entries to populate the new cache before the new searcher starts. If you set autowarmCount too high, it makes the commits take a really long time.

If you are committing so frequently that your cache is ineffective, then you need to commit less frequently. Whenever you do a commit on the master, the slave will also do a commit after it copies the new index.

Thanks,
Shawn
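For reference, autowarmCount is configured per cache in the <query> section of solrconfig.xml; a sketch with illustrative sizes only (larger counts mean slower commits, and the documentCache cannot be autowarmed):

<query>
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
</query>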
update/extract override ExtractTyp
Hello,

Is it possible to override the ExtractClass for a specific document? I would like to upload an XML document, but this XML is not well-formed. I need this XML because it is part of a project where a corrupt XML file is needed, for testing purposes.

The update/extract process fails every time with a 500 error. I tried to override the Content-Type with "text/plain" but still get the XML parse error. Is it possible to override it?

This message was sent using IMP, the Internet Messaging Program.
RE: Can I use SolrJ 6.3.0 to talk to a Solr 5.2.3 server?
Can anyone explain how to get rid of this error?

java.lang.Exception: Assertions mismatch: -ea was not specified but -Dtests.asserts=true
    at __randomizedtesting.SeedInfo.seed([5B25E606A72BD541]:0)
    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:47)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
    at java.lang.Thread.run(Thread.java:745)

Here is my test class:

public class SolrCatalogClientTest extends SolrJettyTestBase {

    private static SolrCatalogClient solrClient;
    protected static JettySolrRunner SOLR;

    @BeforeClass
    public static void setUpBeforeClass() throws Exception {
        System.setProperty("tests.asserts", "false");
        System.setProperty("solr.solr.home", "solr/conf");
        System.setProperty("solr.core.name", "mySolrCore");
        System.setProperty("solr.data.dir", new File("target/solr-embedded-data").getAbsolutePath());
        System.out.println("Initializing Solr for JUnit tests");
        SOLR = createJetty(
            "target/solr-embedded-data",
            JettyConfig.builder()
                .setPort(8983)
                .setContext("/solr")
                .stopAtShutdown(true)
                .build());
        solrClient = new SolrCatalogClient();
        ...
    }
}

Should I be using initCore instead of createJetty?

Thank you!
-Jennifer

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Tuesday, January 03, 2017 2:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Can I use SolrJ 6.3.0 to talk to a Solr 5.2.3 server?

On 1/3/2017 10:35 AM, Jennifer Coston wrote:
> I am running into a conflict with Solr and Elasticsearch. We are trying to add support for Elasticsearch 5.1.1, which requires Lucene 6.3.0, to an existing system that uses Solr 5.2.3. At the moment I am using SolrJ 5.3.1 to talk to the 5.2.3 server. I was hoping I could just update the SolrJ libraries to 6.3.0 so the Lucene conflict goes away, but when I try to run my unit tests I'm seeing this error:

There is no 5.2.3 version of Solr. The 5.2.x line ended with 5.2.1. There is a 5.3.2 version.

If you're using HttpSolrClient, mixing a 5.x server and a 6.x client will be no problem at all. If you're using CloudSolrClient you may run into issues because of the very rapid pace of change in SolrCloud -- mixing major versions is not recommended. If you're using EmbeddedSolrServer, then the client and the server are not separate, so upgrading both at once is probably OK, but the code and the core config might require changes.

> java.util.ServiceConfigurationError: Cannot instantiate SPI class:
> org.apache.lucene.codecs.simpletext.SimpleTextPostingsFormat

This is an error from a Solr *server*, not a client. Are you using EmbeddedSolrServer?

> Here are the Solr dependencies I have in my pom.xml:

You've got a solr-core dependency here. That is not at all necessary for a SolrJ client, *unless* you are using EmbeddedSolrServer. If you are using the embedded server, then things are much more complicated, and your code may require changes before it will work correctly with a new major version. The embedded server is NOT recommended unless you have no other choice.
The unit tests included with Solr are able to fire up Solr with Jetty for tests, just like a real separate Solr server. Your tests can also do this. Look for tests in the Lucene/Solr codebase that extend SolrJettyTestBase, and use those as a guide.

Thanks,
Shawn
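As a rough illustration of the plain-HTTP route mentioned above (the URL and query are placeholders), a SolrJ 6.x client talking to the older server might look like this:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PlainHttpQuery {
    public static void main(String[] args) throws Exception {
        // SolrJ 6.x builder pointing at a core on the older 5.x server.
        HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mySolrCore").build();
        QueryResponse rsp = client.query(new SolrQuery("*:*"));
        System.out.println("numFound=" + rsp.getResults().getNumFound());
        client.close();
    }
}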
Re: ClusterStateMutator
Actually, the state format has defaulted to 2 for many releases now (all of 6.x at least). This default is enforced in CollectionsHandler well before the code in ClusterStateMutator is executed.

On Wed, Jan 4, 2017 at 6:16 PM, Hendrik Haddorp wrote:
> Hi,
>
> in solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java there is the following code starting at line 107:
>
> //TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
> String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null : ZkStateReader.getCollectionPath(cName);
>
> Any idea if that will be changed to default to version 2 anytime soon?
>
> thanks,
> Hendrik

--
Regards,
Shalin Shekhar Mangar.
Re: Random Streaming Function not there? SolrCloud 6.3.0
This issue is resolved for Solr 6.4: https://issues.apache.org/jira/browse/SOLR-9919

I also created an issue to resolve future bugs of this nature: https://issues.apache.org/jira/browse/SOLR-9924

Thanks for the bug report!

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jan 3, 2017 at 9:04 PM, Joe Obernberger <joseph.obernber...@gmail.com> wrote:
> Thanks! I'll give this a shot.
>
> -Joe
>
> On 1/3/2017 8:52 PM, Joel Bernstein wrote:
>> Luckily https://issues.apache.org/jira/browse/SOLR-9103 is available in Solr 6.3, so you can register the random expression through the solrconfig. The ticket shows an example.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Tue, Jan 3, 2017 at 7:59 PM, Joel Bernstein wrote:
>>
>>> This is a bug. I just checked and the random expression is not mapped in the /stream handler. The test cases pass because they register the random function explicitly inside the test case. This is something that we need to fix, so the /stream handler registration always gets tested.
>>>
>>> I'm fairly sure I remember testing random at scale through the /stream handler, so I'm not sure how this missed getting committed.
>>>
>>> I will fix this for Solr 6.4.
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>> On Tue, Jan 3, 2017 at 6:46 PM, Joe Obernberger <joseph.obernber...@gmail.com> wrote:
>>>
>>>> I'm getting an error:
>>>>
>>>> {"result-set":{"docs":[
>>>> {"EXCEPTION":"Invalid stream expression random(MAIN,q=\"FULL_DOCUMENT: obamacare\",rows=100,fl=DocumentId) - function 'random' is unknown (not mapped to a valid TupleStream)","EOF":true}]}}
>>>>
>>>> when trying to use the streaming random function. I'm using curl with:
>>>>
>>>> curl --data-urlencode 'expr=random(MAIN,q="FULL_DOCUMENT:obamacare",rows="100",fl="DocumentId")' http://cordelia:9100/solr/MAIN/stream
>>>>
>>>> Any ideas? Thank you!
>>>>
>>>> -Joe
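As a possible interim workaround on 6.3, following the pattern in SOLR-9103, the expression can be registered in solrconfig.xml; the element and class name below are assumptions based on that ticket and should be checked against it:

<expressible name="random" class="org.apache.solr.client.solrj.io.stream.RandomStream"/>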
Re: Can I use SolrJ 6.3.0 to talk to a Solr 5.2.3 server?
On 1/4/2017 8:29 AM, Jennifer Coston wrote:
> Can anyone explain how to get rid of this error?
>
> java.lang.Exception: Assertions mismatch: -ea was not specified but -Dtests.asserts=true
>     at __randomizedtesting.SeedInfo.seed([5B25E606A72BD541]:0)
>     at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:47)
>     at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>     at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>     at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
>     at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>     at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
>     at java.lang.Thread.run(Thread.java:745)

When you started your test, a system property was included (the java commandline option for that is -Dtests.asserts=true) that tells the test framework you want assertions enabled. Looks like this message comes from the Lucene test-framework. When this property is set to true, it also requires a java commandline option to enable assertions, which is -ea as the message said. You'll need to either add "-ea" to the java commandline or remove the system property.

If you didn't specify the system property and can't figure out how to remove it, then you'll likely need help from somebody who's really familiar with the build system. You can find those people on the dev mailing list or the #lucene-dev IRC channel on freenode. Some of them do hang out on this mailing list, but it's not really on-topic for this list.

http://lucene.apache.org/core/discussion.html

If you choose the IRC channel, here's a Solr resource about IRC channels. The channels this page mentions are specific to Solr, but the information provided is relevant for any technical IRC channel:

https://wiki.apache.org/solr/IRCChannels

Thanks,
Shawn
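If the tests are run through Maven (an assumption), one way to add the flag is via the Surefire plugin's argLine in the pom, for example:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Enable JVM assertions in the forked test JVM so the
         Lucene test framework's -Dtests.asserts=true check is satisfied. -->
    <argLine>-ea</argLine>
  </configuration>
</plugin>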
Re: update/extract override ExtractTyp
On 1/4/2017 8:12 AM, sn0...@ulysses-erp.com wrote:
> Is it possible to override the ExtractClass for a specific document? I would like to upload an XML document, but this XML is not well-formed.
>
> I need this XML because it is part of a project where a corrupt XML file is needed, for testing purposes.
>
> The update/extract process fails every time with a 500 error.
>
> I tried to override the Content-Type with "text/plain" but still get the XML parse error.

If you send something to the /update handler and don't tell Solr that it is another format that it knows, like CSV, JSON, or javabin, then Solr assumes that it is XML -- and that it is the *specific* XML format that Solr uses. "text/plain" is not one of the formats that the update handler knows how to handle, so it will assume XML. If you send some other arbitrary XML content, even if that XML is otherwise correctly formed (which apparently yours isn't), Solr will throw an error, because it is not the type of XML that Solr is looking for. This page has some examples of what Solr is expecting when you send XML:

https://wiki.apache.org/solr/UpdateXmlMessages

If you want to parse arbitrary XML into fields, you probably need to send it using DIH and the XPathEntityProcessor. If you want the XML to go into a field completely as-is, then you need to encode the XML into one of the update formats that Solr knows (XML, JSON, etc.) and set it as the value of one of the fields.

Thanks,
Shawn
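A rough data-config.xml sketch of the DIH/XPathEntityProcessor route; the file path, forEach expression, and field names are placeholders, and note that this processor also expects well-formed XML:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="records"
            processor="XPathEntityProcessor"
            url="/path/to/file.xml"
            forEach="/records/record"
            stream="true">
      <!-- Map XPath expressions onto Solr fields (illustrative names). -->
      <field column="id"    xpath="/records/record/id"/>
      <field column="title" xpath="/records/record/title"/>
    </entity>
  </document>
</dataConfig>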
Zip Bomb Exception in HTML File
I get an exception "org.apache.tika.exception.TikaException: Zip bomb detected!" if I try to parse an HTML file - and I think I know why: because there are many, many divs nested in cascade - over 200 divs, with spans inside each.

Is it correct that there is this limit for HTML files?

This message was sent using IMP, the Internet Messaging Program.
Re: Zip Bomb Exception in HTML File
You might get a more knowledgeable response from the Tika folks; that's really not something Solr controls.

Best,
Erick

On Wed, Jan 4, 2017 at 8:50 AM, wrote:
> I get an exception "org.apache.tika.exception.TikaException: Zip bomb detected!" if I try to parse an HTML file - and I think I know why: because there are many, many divs nested in cascade - over 200 divs, with spans inside each.
>
> Is it correct that there is this limit for HTML files?
>
> This message was sent using IMP, the Internet Messaging Program.
Re: ClusterStateMutator
You are right, the code looks like it. But why did I then see collection data in the clusterstate.json file? If version 1 is not used, I would assume that no data ends up in there. When explicitly setting state format 2, the system seemed to behave differently. And if the code always uses version 2, shouldn't the default in that line be changed accordingly?

On 04/01/17 16:41, Shalin Shekhar Mangar wrote:
> Actually, the state format has defaulted to 2 for many releases now (all of 6.x at least). This default is enforced in CollectionsHandler well before the code in ClusterStateMutator is executed.
>
> On Wed, Jan 4, 2017 at 6:16 PM, Hendrik Haddorp wrote:
>> Hi,
>>
>> in solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java there is the following code starting at line 107:
>>
>> //TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
>> String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null : ZkStateReader.getCollectionPath(cName);
>>
>> Any idea if that will be changed to default to version 2 anytime soon?
>>
>> thanks,
>> Hendrik
RE: Zip Bomb Exception in HTML File
This came up back in September [1] and [2]. Same trigger... a crazy number of divs.

I think we could modify the AutoDetectParser to enable configuration of the maximum zip-bomb depth via tika-config. If there's any interest in this, re-open TIKA-2091 and I'll take a look.

Best,
Tim

[1] http://git.net/ml/solr-user.lucene.apache.org/2016-09/msg00561.html
[2] https://issues.apache.org/jira/browse/TIKA-2091

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, January 4, 2017 12:20 PM
To: solr-user
Subject: Re: Zip Bomb Exception in HTML File

You might get a more knowledgeable response from the Tika folks; that's really not something Solr controls.

Best,
Erick

On Wed, Jan 4, 2017 at 8:50 AM, wrote:
> I get an exception "org.apache.tika.exception.TikaException: Zip bomb detected!" if I try to parse an HTML file - and I think I know why: because there are many, many divs nested in cascade - over 200 divs, with spans inside each.
>
> Is it correct that there is this limit for HTML files?
>
> This message was sent using IMP, the Internet Messaging Program.
Re: Solr 6.x : howto create a core in (embedded) CoreConatiner
I had success doing something like this, which I found in some of the Solr tests...

SolrResourceLoader loader = new SolrResourceLoader(solrHomeDir.toPath());
Path configSetPath = Paths.get(configSetHome).toAbsolutePath();

final NodeConfig config = new NodeConfig.NodeConfigBuilder("embeddedSolrServerNode", loader)
    .setConfigSetBaseDirectory(configSetPath.toString())
    .build();

EmbeddedSolrServer embeddedSolrServer = new EmbeddedSolrServer(config, coreName);

CoreAdminRequest.Create createRequest = new CoreAdminRequest.Create();
createRequest.setCoreName(coreName);
createRequest.setConfigSet(coreName);
embeddedSolrServer.request(createRequest);

The setup was to have a config set located at src/test/resources/configsets, so configSetHome was src/test/resources/configsets, the coreName was the name of a configset in that directory, and solrHome was a path to target/solr.

https://github.com/bbende/embeddedsolrserver-example/blob/master/src/test/java/org/apache/solr/EmbeddedSolrServerFactory.java
https://github.com/bbende/embeddedsolrserver-example/blob/master/src/test/java/org/apache/solr/TestEmbeddedSolrServerFactory.java

On Fri, Dec 30, 2016 at 3:27 AM, Clemens Wyss DEV wrote:
> I am still using 5.4.1 and have the following code to create a new core:
> ...
> Properties coreProperties = new Properties();
> coreProperties.setProperty( CoreDescriptor.CORE_CONFIGSET, configsetToUse );
> CoreDescriptor coreDescriptor = new CoreDescriptor( container, coreName, coreFolder, coreProperties );
> coreContainer.create( coreDescriptor );
> coreContainer.getCoresLocator().create( coreContainer, coreDescriptor );
> ...
>
> What is the equivalent Java snippet in Solr 6.x (latest greatest)?
>
> Thx & a successful 2017!
> Clemens
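A possible follow-on check against the embedded server created above, purely illustrative (the "id" field is assumed to exist in the configset's schema):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class EmbeddedSmokeTest {

    // Indexes one document into the default core and returns the total document count.
    static long indexAndCount(EmbeddedSolrServer server) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        server.add(doc);
        server.commit();
        return server.query(new SolrQuery("*:*")).getResults().getNumFound();
    }
}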
Solr Initialization failure
Hello,

While creating a new collection, it fails to spin up Solr cores on some nodes due to "insufficient direct memory". Here is the error:

- *3044_01_17_shard42_replica1:* org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: The max direct memory is likely too low. Either increase it (by adding -XX:MaxDirectMemorySize=g -XX:+UseLargePages to your containers startup args) or disable direct allocation using solr.hdfs.blockcache.direct.memory.allocation=false in solrconfig.xml. If you are putting the block cache on the heap, your java heap size might not be large enough. Failed allocating ~2684.35456 MB.

The error is self-explanatory. My question is: why does it require around 2.7 GB of off-heap memory to spin up a single core?

Thank you!
Re: ClusterStateMutator
Hendrik:

Historically in 4.x, there was code that would reconstruct the clusterstate.json node. So you would see "deleted" collections come back. One scenario was:

- Have a Solr node offline that had a replica for a collection.
- Delete that collection.
- Bring the node back.
- It would register itself in clusterstate.json.

So my guess is that something like this is going on and you're getting a clusterstate.json that's reconstructed (and possibly not complete).

You can avoid this by specifying the legacyCloud=false cluster property.

Kind of a shot in the dark...

Erick

On Wed, Jan 4, 2017 at 11:12 AM, Hendrik Haddorp wrote:
> You are right, the code looks like it. But why did I then see collection data in the clusterstate.json file? If version 1 is not used, I would assume that no data ends up in there. When explicitly setting state format 2, the system seemed to behave differently. And if the code always uses version 2, shouldn't the default in that line be changed accordingly?
>
> On 04/01/17 16:41, Shalin Shekhar Mangar wrote:
>> Actually, the state format has defaulted to 2 for many releases now (all of 6.x at least). This default is enforced in CollectionsHandler well before the code in ClusterStateMutator is executed.
>>
>> On Wed, Jan 4, 2017 at 6:16 PM, Hendrik Haddorp wrote:
>>> Hi,
>>>
>>> in solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java there is the following code starting at line 107:
>>>
>>> //TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
>>> String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null : ZkStateReader.getCollectionPath(cName);
>>>
>>> Any idea if that will be changed to default to version 2 anytime soon?
>>>
>>> thanks,
>>> Hendrik
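For reference, the cluster property can be set through the Collections API; a sketch assuming a node on localhost:

curl "http://localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=legacyCloud&val=false"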
Re: ClusterStateMutator
Hi Erick,

I have actually also seen that behavior already, so I will check what happens when I set that property. I still believe I'm getting the clusterstate.json set already before the node comes up again, but I will try to verify that further tomorrow.

thanks,
Hendrik

On 04/01/17 22:10, Erick Erickson wrote:
> Hendrik:
>
> Historically in 4.x, there was code that would reconstruct the clusterstate.json node. So you would see "deleted" collections come back. One scenario was:
>
> - Have a Solr node offline that had a replica for a collection.
> - Delete that collection.
> - Bring the node back.
> - It would register itself in clusterstate.json.
>
> So my guess is that something like this is going on and you're getting a clusterstate.json that's reconstructed (and possibly not complete).
>
> You can avoid this by specifying the legacyCloud=false cluster property.
>
> Kind of a shot in the dark...
>
> Erick
>
> On Wed, Jan 4, 2017 at 11:12 AM, Hendrik Haddorp wrote:
>> You are right, the code looks like it. But why did I then see collection data in the clusterstate.json file? If version 1 is not used, I would assume that no data ends up in there. When explicitly setting state format 2, the system seemed to behave differently. And if the code always uses version 2, shouldn't the default in that line be changed accordingly?
>>
>> On 04/01/17 16:41, Shalin Shekhar Mangar wrote:
>>> Actually, the state format has defaulted to 2 for many releases now (all of 6.x at least). This default is enforced in CollectionsHandler well before the code in ClusterStateMutator is executed.
>>>
>>> On Wed, Jan 4, 2017 at 6:16 PM, Hendrik Haddorp wrote:
>>>> Hi,
>>>>
>>>> in solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java there is the following code starting at line 107:
>>>>
>>>> //TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
>>>> String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null : ZkStateReader.getCollectionPath(cName);
>>>>
>>>> Any idea if that will be changed to default to version 2 anytime soon?
>>>>
>>>> thanks,
>>>> Hendrik
Re: Solr Initialization failure
On 1/4/2017 1:43 PM, Chetas Joshi wrote:
> While creating a new collection, it fails to spin up Solr cores on some nodes due to "insufficient direct memory". Here is the error:
>
> - *3044_01_17_shard42_replica1:* org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: The max direct memory is likely too low. Either increase it (by adding -XX:MaxDirectMemorySize=g -XX:+UseLargePages to your containers startup args) or disable direct allocation using solr.hdfs.blockcache.direct.memory.allocation=false in solrconfig.xml. If you are putting the block cache on the heap, your java heap size might not be large enough. Failed allocating ~2684.35456 MB.
>
> The error is self-explanatory. My question is: why does it require around 2.7 GB of off-heap memory to spin up a single core?

This message comes from the HdfsDirectoryFactory class. This is the calculation of the total amount of memory needed:

long totalMemory = (long) bankCount * (long) numberOfBlocksPerBank * (long) blockSize;

The numberOfBlocksPerBank variable can come from the configuration; the code defaults it to 16384. The blockSize variable gets assigned by a convoluted method involving bit shifts, and defaults to 8192. The bankCount variable seems to come from solr.hdfs.blockcache.slab.count, and apparently defaults to 1. It looks like it's been set to 20 in your config. If we assume the other two are at their defaults and you have 20 for the slab count, then this results in 2684354560 bytes, which would cause the exact output seen in the error message when the memory allocation fails.

I know very little about HDFS or how the HDFS directory works, but apparently it needs a lot of memory if you want good performance. Reducing solr.hdfs.blockcache.slab.count sounds like it might result in less memory being required.

You might want to review this page for info about how to set up HDFS, where it says that each slab requires 128MB of memory:

https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS

The default settings for the HDFS directory cause the block cache to be global, so all cores use it, instead of spinning up another cache for every additional core.

What I've seen sounds like one of these two problems: 1) You've turned off the global cache option. 2) This node doesn't yet have any HDFS cores, so your collection create is trying to create the first core using HDFS. That action is trying to allocate the global cache, which has been sized at 20 slabs.

Thanks,
Shawn
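A sketch of the relevant HdfsDirectoryFactory settings in solrconfig.xml, with illustrative values only (the HDFS path is a placeholder); since each slab needs roughly 128MB of direct memory, the slab count is what drives the ~2.7 GB allocation seen above:

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <!-- One global cache shared by all cores instead of one cache per core. -->
  <bool name="solr.hdfs.blockcache.global">true</bool>
  <!-- ~128MB of direct memory per slab; 20 slabs matches the failed allocation above. -->
  <int name="solr.hdfs.blockcache.slab.count">1</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
</directoryFactory>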
Re: ClusterStateMutator
Let us know how it goes. You'll probably want to remove the _contents_ of clusterstate.json and just leave it as a pair of braces, i.e. {}, if for no other reason than it's confusing. In times past the node needed to be there even if empty, although I just tried removing it completely on 6x and I was able to start Solr; part of the startup process recreates it as an empty node, just a pair of braces.

Best,
Erick

On Wed, Jan 4, 2017 at 1:22 PM, Hendrik Haddorp wrote:
> Hi Erick,
>
> I have actually also seen that behavior already, so I will check what happens when I set that property. I still believe I'm getting the clusterstate.json set already before the node comes up again, but I will try to verify that further tomorrow.
>
> thanks,
> Hendrik
>
> On 04/01/17 22:10, Erick Erickson wrote:
>> Hendrik:
>>
>> Historically in 4.x, there was code that would reconstruct the clusterstate.json node. So you would see "deleted" collections come back. One scenario was:
>>
>> - Have a Solr node offline that had a replica for a collection.
>> - Delete that collection.
>> - Bring the node back.
>> - It would register itself in clusterstate.json.
>>
>> So my guess is that something like this is going on and you're getting a clusterstate.json that's reconstructed (and possibly not complete).
>>
>> You can avoid this by specifying the legacyCloud=false cluster property.
>>
>> Kind of a shot in the dark...
>>
>> Erick
>>
>> On Wed, Jan 4, 2017 at 11:12 AM, Hendrik Haddorp wrote:
>>> You are right, the code looks like it. But why did I then see collection data in the clusterstate.json file? If version 1 is not used, I would assume that no data ends up in there. When explicitly setting state format 2, the system seemed to behave differently. And if the code always uses version 2, shouldn't the default in that line be changed accordingly?
>>>
>>> On 04/01/17 16:41, Shalin Shekhar Mangar wrote:
>>>> Actually, the state format has defaulted to 2 for many releases now (all of 6.x at least). This default is enforced in CollectionsHandler well before the code in ClusterStateMutator is executed.
>>>>
>>>> On Wed, Jan 4, 2017 at 6:16 PM, Hendrik Haddorp wrote:
>>>>> Hi,
>>>>>
>>>>> in solr-6.3.0/solr/core/src/java/org/apache/solr/cloud/overseer/ClusterStateMutator.java there is the following code starting at line 107:
>>>>>
>>>>> //TODO default to 2; but need to debug why BasicDistributedZk2Test fails early on
>>>>> String znode = message.getInt(DocCollection.STATE_FORMAT, 1) == 1 ? null : ZkStateReader.getCollectionPath(cName);
>>>>>
>>>>> Any idea if that will be changed to default to version 2 anytime soon?
>>>>>
>>>>> thanks,
>>>>> Hendrik
Re: Solr Initialization failure
Hi Shawn,

Thanks for the explanation! I have the slab count set to 20 and I did not have the global block cache enabled.

I have a follow-up question: does setting slab count=1 affect the write/read performance of Solr while reading the indices from HDFS? Is this setting just used while creating new cores?

Thanks!

On Wed, Jan 4, 2017 at 4:11 PM, Shawn Heisey wrote:
> On 1/4/2017 1:43 PM, Chetas Joshi wrote:
>> While creating a new collection, it fails to spin up Solr cores on some nodes due to "insufficient direct memory". Here is the error:
>>
>> - *3044_01_17_shard42_replica1:* org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: The max direct memory is likely too low. Either increase it (by adding -XX:MaxDirectMemorySize=g -XX:+UseLargePages to your containers startup args) or disable direct allocation using solr.hdfs.blockcache.direct.memory.allocation=false in solrconfig.xml. If you are putting the block cache on the heap, your java heap size might not be large enough. Failed allocating ~2684.35456 MB.
>>
>> The error is self-explanatory. My question is: why does it require around 2.7 GB of off-heap memory to spin up a single core?
>
> This message comes from the HdfsDirectoryFactory class. This is the calculation of the total amount of memory needed:
>
> long totalMemory = (long) bankCount * (long) numberOfBlocksPerBank * (long) blockSize;
>
> The numberOfBlocksPerBank variable can come from the configuration; the code defaults it to 16384. The blockSize variable gets assigned by a convoluted method involving bit shifts, and defaults to 8192. The bankCount variable seems to come from solr.hdfs.blockcache.slab.count, and apparently defaults to 1. It looks like it's been set to 20 in your config. If we assume the other two are at their defaults and you have 20 for the slab count, then this results in 2684354560 bytes, which would cause the exact output seen in the error message when the memory allocation fails.
>
> I know very little about HDFS or how the HDFS directory works, but apparently it needs a lot of memory if you want good performance. Reducing solr.hdfs.blockcache.slab.count sounds like it might result in less memory being required.
>
> You might want to review this page for info about how to set up HDFS, where it says that each slab requires 128MB of memory:
>
> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
>
> The default settings for the HDFS directory cause the block cache to be global, so all cores use it, instead of spinning up another cache for every additional core.
>
> What I've seen sounds like one of these two problems: 1) You've turned off the global cache option. 2) This node doesn't yet have any HDFS cores, so your collection create is trying to create the first core using HDFS. That action is trying to allocate the global cache, which has been sized at 20 slabs.
>
> Thanks,
> Shawn
Re: SolrJ doesn't work with Json facet api
Thanks for your response. We definitely use solrQuery.set("json.facet", "the json query here");

Btw, we are using Solr 5.2.1.
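For comparison, a minimal sketch of setting json.facet from SolrJ and reading the result back; the core URL, facet name, and field are placeholders, and the output is read from the raw response under the "facets" key since SolrJ of that vintage has no typed accessor for JSON facets:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class JsonFacetExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);
        // Terms facet on a hypothetical "cat" field.
        q.set("json.facet", "{categories:{type:terms,field:cat}}");
        QueryResponse rsp = client.query(q);
        // JSON Facet API results arrive in the top-level "facets" section of the response.
        NamedList<?> facets = (NamedList<?>) rsp.getResponse().get("facets");
        System.out.println(facets);
        client.close();
    }
}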