Maybe the fix is to sunset solr-cell and make an example that uses Tika
properly? Or folks can contribute to JesterJ, which has a way to find
documents, run Tika to extract text, and send the result to Solr; if it
crashes, it can skip the documents it already processed. The primary example
<https://github.com/nsoft/jesterj/blob/master/code/examples/shakespeare/src/main/java/org/jesterj/example/shakespeare/ShakespeareConfig.java#L100>
in fact uses Tika (though the files are text files to begin with, so it's
kind of trivial). I just looked at the ref guide page. Even though there is
some mention of potential problems under "performance implications", I
think there is a risk that a new user, looking at the length and scope of
that page, may assume that the cautions are just CYA and that anything with
that many features and that much documentation must be meant for production
usage. Yet any experienced Solr dev (or at least all the ones I've met)
will tell you that using SolrCell for anything other than a very tiny
install, or one with very low availability guarantees, is a very bad idea.
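
To make the "skip the documents it already processed" idea concrete, here is
a minimal, generic sketch of a resumable extract-and-send loop. It is not
JesterJ's actual mechanism, and every name in it (index_resumably, the
extract and send callbacks, the JSON checkpoint file) is hypothetical. In a
real pipeline, extract would call Tika running outside of Solr and send would
post the extracted text to Solr.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def index_resumably(
    paths: Iterable[Path],
    extract: Callable[[Path], str],
    send: Callable[[str, str], None],
    checkpoint: Path,
) -> int:
    """Extract and send each document, recording progress on disk so that a
    rerun after a crash skips documents that were already sent.

    Returns the number of documents sent during this run.
    """
    # Load the set of document ids finished by earlier runs, if any.
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    sent = 0
    for path in sorted(paths):
        doc_id = str(path)
        if doc_id in done:
            continue  # finished in an earlier run
        text = extract(path)  # hypothetical: text extraction outside Solr
        send(doc_id, text)    # hypothetical: update request to Solr
        done.add(doc_id)
        # Persist after every document so a crash loses at most one.
        checkpoint.write_text(json.dumps(sorted(done)))
        sent += 1
    return sent
```

Because the checkpoint is written only after send returns, a crash between
the two steps means that one document is sent again on the next run. That is
harmless as long as the document id is stable, since the resend overwrites
the earlier copy.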

Also, disappointingly, the link to Erick Erickson's article in the potential
problems section has gone dead.

-Gus

On Wed, May 17, 2023 at 9:42 AM Eric Pugh <ep...@opensourceconnections.com>
wrote:

> I did discover that if I turned off the Security Manager via
>
> export SOLR_SECURITY_MANAGER_ENABLED=false
>
> Then everything worked.
>
> I noticed that there is a message about the security manager being removed
> in the future. I'm wondering if the fix is just to tell folks who are using
> extraction to disable the security manager? (Which feels like weird
> advice given our other security-related discussions.)
>
> > On May 17, 2023, at 8:24 AM, Marcus Eagan <marcusea...@gmail.com> wrote:
> >
> > It’s early for me but by the looks of it, the problem is about lack of
> > access to either a directory or a form. I appreciate the concern, believe
> > me, but I don’t think it will be so bad.
> >
> > The error is truncated. Do you have another error that is complete?
> >
> > Best
> >
> > Marcus
> >
> > On Wed, May 17, 2023 at 3:38 AM Eric Pugh <ep...@opensourceconnections.com>
> > wrote:
> >
> >> I was following the steps in
> >> https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-tika.html
> >> as part of work on SOLR-6994 to start up the extraction module.
> >>
> >> Unfortunately I am getting back an error:
> >> {
> >>  "error":{
> >>    "msg":"java.security.AccessControlException: access denied (\"java.io.FilePermission\" \"/private/var/folders/hd/s8jf5wvs4xl4t4mbf0c15rcw0000gp/T/jetty-127_0_0_1-8983-webapp-_solr-any-11939190802556493401\" \"read\")",
> >>    "trace":"java.lang.IllegalStateException: java.security.AccessControlException: access denied (\"java.io.FilePermission\" \"/private/var/folders/hd/s8jf5wvs4xl4t4mbf0c15rcw0000gp/T/jetty-127_0_0_1-8983-webapp-_solr-any-11939190802556493401\" \"read\")\n\tat org.eclipse.jetty.server.MultiPartFormInputStream.throwIfError(MultiPartFormInputStream.java:526)\n\tat org.eclipse.jetty.server.MultiPartF
> >>
> >>
> >> My setup is that, from my fork of main, I am running “./gradlew dev” and
> >> then I “cd solr/packaging/build/dev”. From there I follow the
> >> instructions in the ref guide.
> >>
> >> I am at a bit of a standstill since I’ve never touched any of the security
> >> stuff…. Would love some help!
> >>
> >> Eric
> >> _______________________
> >> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> >> http://www.opensourceconnections.com <
> >> http://www.opensourceconnections.com/> | My Free/Busy <
> >> http://tinyurl.com/eric-cal>
> >> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> >> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> >>
> >> This e-mail and all contents, including attachments, is considered to be
> >> Company Confidential unless explicitly stated otherwise, regardless of
> >> whether attachments are marked as such.
> >>
> >> --
> > Marcus Eagan
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
