ExtractRequestHandler, skipping errors
Hi,

I helped a customer deploy Solr + ManifoldCF and everything is going quite smoothly, but every time Solr raises an exception, the ManifoldCF job feeding Solr aborts. I would like to know if it is possible to configure the ExtractRequestHandler to ignore errors, as seems to be possible with the DataImportHandler and entity processors. I know that it is possible to configure the ExtractRequestHandler to ignore Tika exceptions (we already do that), but the errors that now stop the MCF jobs are generated by Solr itself.

While it would be interesting to have such an option in Solr, I plan to post to the ManifoldCF mailing list anyway, to ask whether ManifoldCF can be configured to be less picky about Solr errors.

Regards,

Roland.
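For reference, this is roughly how we enable the Tika-exception handling today in solrconfig.xml. A sketch only: the handler name and class follow the stock example config, so adjust to your setup.

    <requestHandler name="/update/extract"
                    startup="lazy"
                    class="solr.extraction.ExtractingRequestHandler">
      <lst name="defaults">
        <str name="ignoreTikaException">true</str>
      </lst>
    </requestHandler>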
Re: Solr errors
Even though I haven't tested it myself, you can use Tika; it is able to extract documents from zip archives and index them, but of course it depends on the file types in the archive.

Regards,

Roland.

On Thu, Oct 17, 2013 at 2:36 PM, wonder wrote:
> Does anybody know how to index files in zip archives?
Re: Solr errors
I have just found this JIRA report, which could explain your problem: https://issues.apache.org/jira/browse/SOLR-2416

Regards,

Roland.

On Thu, Oct 17, 2013 at 3:30 PM, wonder wrote:
> Thanks for the answer. Yes, Tika extracts, but the content is not indexed. Here is the
> Solr response:
> ...
> "content": [ " 9118_xmessengereu_v18ximpda.jar dimonvideo.ru.txt " ],
> ...
> None of these files are in the index.
> Any ideas?
>
> On 17.10.2013 17:20, Roland Everaert wrote:
>> Even though I haven't tested it myself, you can use Tika; it is able to
>> extract documents from zip archives and index them, but of course it
>> depends on the file types in the archive.
Re: ExtractRequestHandler, skipping errors
Hi,

We already configure the ExtractRequestHandler to ignore Tika exceptions, but it is Solr itself that complains. The customer managed to reproduce the problem; below is the error from solr.log. The file type that caused this exception was WMZ. It seems that a method is missing from a class on Solr's classpath. We use Solr 4.4.

ERROR - 2013-10-17 18:13:48.902; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.NoSuchMethodError: org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V
        at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:673)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
        at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1852)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoSuchMethodError: org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V
        at org.apache.tika.parser.pkg.CompressorParser.parse(CompressorParser.java:102)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
        ... 16 more

On Thu, Oct 17, 2013 at 5:19 PM, Koji Sekiguchi wrote:
> Hi Roland,
>
> (13/10/17 20:44), Roland Everaert wrote:
>> Hi,
>>
>> I helped a customer deploy Solr + ManifoldCF and everything is going
>> quite smoothly, but every time Solr raises an exception, the ManifoldCF
>> job feeding Solr aborts.
>>
>> I would like to know if it is possible to configure the
>> ExtractRequestHandler to ignore errors, as seems to be possible with the
>> DataImportHandler and entity processors.
>>
>> I know that it is possible to configure the ExtractRequestHandler to
>> ignore Tika exceptions (we already do that), but the errors that now stop
>> the MCF jobs are generated by Solr itself.
>>
>> While it would be interesting to have such an option in Solr, I plan to
>> post to the ManifoldCF mailing list anyway, to ask whether ManifoldCF can
>> be configured to be less picky about Solr errors.
>
> The ignoreTikaException flag might help you?
>
> https://issues.apache.org/jira/browse/SOLR-2480
>
> koji
> --
> http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
XLSB files not indexed
Hi,

Can someone tell me whether Tika is supposed to extract data from XLSB files (the new MS Office format in binary form)?

If so, it seems that Solr is not able to index them, just as it is not able to index ODF files (a JIRA issue is already open for ODF: https://issues.apache.org/jira/browse/SOLR-4809).

Can someone confirm the problem, or tell me what to do to make Solr work with XLSB files?

Regards,

Roland.
Re: ExtractRequestHandler, skipping errors
I will open a JIRA issue; I suppose I just have to create an account first?

Regards,

Roland.

On Fri, Oct 18, 2013 at 12:05 PM, Koji Sekiguchi wrote:
> Hi,
>
> I think the flag cannot ignore a NoSuchMethodError. There may be something
> wrong here?
>
> ... I've just checked my Solr 4.5 directories and I found that the Tika
> version is 1.4.
>
> Tika 1.4 seems to use Commons Compress 1.5:
>
> http://svn.apache.org/viewvc/tika/tags/1.4/tika-parsers/pom.xml?view=markup
>
> But I see commons-compress-1.4.1.jar in the solr/contrib/extraction/lib/
> directory.
>
> Can you open a JIRA issue?
>
> For now, you can get Commons Compress 1.5 and put it in that directory
> (don't forget to remove the 1.4.1 jar file).
>
> koji
>
> (13/10/18 16:37), Roland Everaert wrote:
>> Hi,
>>
>> We already configure the ExtractRequestHandler to ignore Tika exceptions,
>> but it is Solr itself that complains. The customer managed to reproduce
>> the problem; below is the error from solr.log. The file type that caused
>> this exception was WMZ. It seems that a method is missing from a class on
>> Solr's classpath. We use Solr 4.4.
>>
>> ERROR - 2013-10-17 18:13:48.902; org.apache.solr.common.SolrException;
>> null:java.lang.RuntimeException: java.lang.NoSuchMethodError:
>> org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V
>> [full stack trace snipped; see the earlier message in this thread]
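For anyone hitting the same thing, Koji's workaround amounts to swapping one jar. A sketch, assuming a stock Solr 4.4 layout and that commons-compress-1.5.jar has already been downloaded from the Commons Compress site:

    # paths assume a stock Solr distribution; adjust to your install
    cd /path/to/solr/contrib/extraction/lib
    rm commons-compress-1.4.1.jar
    cp /path/to/downloads/commons-compress-1.5.jar .
    # then restart the servlet container so the new jar is picked up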
Re: ExtractRequestHandler, skipping errors
Here is the link to the issue: https://issues.apache.org/jira/browse/SOLR-5365

Thanks for your help.

Roland Everaert.

On Fri, Oct 18, 2013 at 1:40 PM, Roland Everaert wrote:
> I will open a JIRA issue; I suppose I just have to create an account
> first?
>
> Regards,
>
> Roland.
>
> On Fri, Oct 18, 2013 at 12:05 PM, Koji Sekiguchi wrote:
>> Hi,
>>
>> I think the flag cannot ignore a NoSuchMethodError. There may be
>> something wrong here?
>>
>> ... I've just checked my Solr 4.5 directories and I found that the Tika
>> version is 1.4.
>>
>> Tika 1.4 seems to use Commons Compress 1.5:
>>
>> http://svn.apache.org/viewvc/tika/tags/1.4/tika-parsers/pom.xml?view=markup
>>
>> But I see commons-compress-1.4.1.jar in the solr/contrib/extraction/lib/
>> directory.
>>
>> Can you open a JIRA issue?
>>
>> For now, you can get Commons Compress 1.5 and put it in that directory
>> (don't forget to remove the 1.4.1 jar file).
>>
>> koji
>>
>> (13/10/18 16:37), Roland Everaert wrote:
>>> Hi,
>>>
>>> We already configure the ExtractRequestHandler to ignore Tika
>>> exceptions, but it is Solr itself that complains. The customer managed
>>> to reproduce the problem; below is the error from solr.log. The file
>>> type that caused this exception was WMZ. It seems that a method is
>>> missing from a class on Solr's classpath. We use Solr 4.4.
>>>
>>> ERROR - 2013-10-17 18:13:48.902; org.apache.solr.common.SolrException;
>>> null:java.lang.RuntimeException: java.lang.NoSuchMethodError:
>>> org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V
>>> [full stack trace snipped; see the earlier messages in this thread]
Re: XLSB files not indexed
Hi Otis,

In our case, no exception is raised by Tika or Solr; a Lucene document is created, but the content field contains only a few whitespace characters, as for ODF files.

Roland.

On Sat, Oct 19, 2013 at 3:54 AM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote:
> Hi Roland,
>
> It looks like:
> Tika - yes
> Solr - no?
>
> Based on http://search-lucene.com/?q=xlsb
>
> ODF != XLSB though, I think...
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
> On Fri, Oct 18, 2013 at 7:36 AM, Roland Everaert wrote:
>> Hi,
>>
>> Can someone tell me whether Tika is supposed to extract data from XLSB
>> files (the new MS Office format in binary form)?
>>
>> If so, it seems that Solr is not able to index them, just as it is not
>> able to index ODF files (a JIRA issue is already open for ODF:
>> https://issues.apache.org/jira/browse/SOLR-4809).
>>
>> Can someone confirm the problem, or tell me what to do to make Solr work
>> with XLSB files?
>>
>> Regards,
>>
>> Roland.
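To see what Tika actually returns for such a file without touching the index, the extractOnly flag of the extracting handler is handy. A sketch, with host, port, and file name assumed:

    curl "http://localhost:8080/solr/update/extract?extractOnly=true&wt=json&indent=true" \
         --data-binary @/path/to/file.xlsb \
         -H "Content-Type: application/octet-stream"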
Re: Why do people want to deploy to Tomcat?
In my case, the first time I had to deploy and configure Solr on Tomcat (and JBoss), it was a requirement to reuse as much as possible the application/web servers already in place. For the next deployment I also used Tomcat, because I was used to deploying on Tomcat and didn't know Jetty at all.

I could ask the same question with regard to Jetty: why use/bundle (if not recommend) Jetty with Solr over other web server solutions?

Regards,

Roland Everaert.

On Tue, Nov 12, 2013 at 12:33 PM, Alvaro Cabrerizo wrote:
> In my case, the selection of the servlet container has never been a hard
> requirement. I mean, some customers provide us a virtual machine
> configured with java/tomcat, others have a tomcat installed and want to
> share it with solr, others prefer jetty because their sysadmins are used
> to configuring it... At least in the projects I've been working on, the
> selection of the servlet engine has not been a key factor in the project
> success.
>
> Regards.
>
> On Tue, Nov 12, 2013 at 12:11 PM, Andre Bois-Crettez wrote:
>> We are using Solr running on Tomcat.
>>
>> I think the top reasons for us are:
>> - we already have nagios monitoring plugins for tomcat that trace
>> queries ok/error, http codes / response time etc in access logs, number
>> of threads, jvm memory usage etc
>> - start, stop, watchdogs, logs: we also use our standard tools for that
>> - what about security filters? Is that possible with jetty?
>>
>> André
>>
>> On 11/12/2013 04:54 AM, Alexandre Rafalovitch wrote:
>>> Hello,
>>>
>>> I keep seeing here and on Stack Overflow people trying to deploy Solr
>>> to Tomcat. We don't usually ask why, just help where we can.
>>>
>>> But the question happens often enough that I am curious. What is the
>>> actual business case? Is it because Tomcat is well known? Is it because
>>> other apps are running under Tomcat and it is ops' requirement? Is it
>>> because Tomcat gives something - to Solr - that Jetty does not?
>>>
>>> It might be useful to know. Especially since the Solr team is
>>> considering making the server part into a black-box component. What use
>>> cases will that break?
>>>
>>> So, if somebody runs Solr under Tomcat (or needed to and gave up),
>>> let's use this thread to collect this knowledge.
>>>
>>> Regards,
>>> Alex.
>>> Personal website: http://www.outerthoughts.com/
>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>> - Time is the quality of nature that keeps events from happening all at
>>> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
>>
>> --
>> André Bois-Crettez
>> Software Architect
>> Search Developer
>> http://www.kelkoo.com/
Unable to deploy Solr 4.3.0 on JBoss EAP 6.1 in full JavaEE 6 mode
Hi,

For the past months I have deployed and used Solr 4.3.0 on a JBoss EAP 6.1 using the standalone configuration.

Now, due to the addition of a new service, I have to start JBoss with a modified version of the standalone-full.xml configuration file, because the service uses JavaEE 6. The only change concerns the connection to a datasource and the interaction with Active Directory. With that configuration file, when I try to deploy solr.war, I get the following error:

11:02:23,291 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-2) JBAS015876: Starting deployment of "solr.war" (runtime-name: "solr.war")
11:02:25,540 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-2) MSC000001: Failed to start service jboss.deployment.unit."solr.war".PARSE: org.jboss.msc.service.StartException in service jboss.deployment.unit."solr.war".PARSE: JBAS018733: Failed to process phase PARSE of deployment "solr.war"
        at org.jboss.as.server.deployment.DeploymentUnitPhaseService.start(DeploymentUnitPhaseService.java:127) [jboss-as-server-7.2.0.Final-redhat-8.jar:7.2.0.Final-redhat-8]
        at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1811) [jboss-msc-1.0.4.GA-redhat-1.jar:1.0.4.GA-redhat-1]
        at org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1746) [jboss-msc-1.0.4.GA-redhat-1.jar:1.0.4.GA-redhat-1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_21]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_21]
        at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_21]
Caused by: java.lang.IllegalStateException: Failed to resolve expression: ${context}
        at org.jboss.metadata.property.DefaultPropertyReplacer.replaceProperties(DefaultPropertyReplacer.java:125)
        at org.jboss.metadata.parser.util.MetaDataElementParser.getElementText(MetaDataElementParser.java:194)
        at org.jboss.metadata.parser.ee.ParamValueMetaDataParser.parse(ParamValueMetaDataParser.java:78)
        at org.jboss.metadata.parser.servlet.ServletMetaDataParser.parse(ServletMetaDataParser.java:93)
        at org.jboss.metadata.parser.servlet.WebCommonMetaDataParser.parse(WebCommonMetaDataParser.java:102)
        at org.jboss.metadata.parser.servlet.WebMetaDataParser.parse(WebMetaDataParser.java:175)
        at org.jboss.metadata.parser.servlet.WebMetaDataParser.parse(WebMetaDataParser.java:55)
        at org.jboss.as.web.deployment.WebParsingDeploymentProcessor.deploy(WebParsingDeploymentProcessor.java:91)
        at org.jboss.as.server.deployment.DeploymentUnitPhaseService.start(DeploymentUnitPhaseService.java:120) [jboss-as-server-7.2.0.Final-redhat-8.jar:7.2.0.Final-redhat-8]
        ... 5 more

11:02:25,556 ERROR [org.jboss.as.server] (HttpManagementService-threads - 4) JBAS015870: Deploy of deployment "solr.war" was rolled back with the following failure message:
{"JBAS014671: Failed services" => {"jboss.deployment.unit.\"solr.war\".PARSE" => "org.jboss.msc.service.StartException in service jboss.deployment.unit.\"solr.war\".PARSE: JBAS018733: Failed to process phase PARSE of deployment \"solr.war\"
    Caused by: java.lang.IllegalStateException: Failed to resolve expression: ${context}"}}
11:02:25,728 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-2) JBAS015877: Stopped deployment solr.war (runtime-name: solr.war) in 161ms

JBoss: EAP 6.1
Solr: 4.3.0
OS: Windows Server 2008 R2

Has anybody already deployed Solr with such a configuration?

Thanks,

Roland Everaert.
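For what it's worth, one avenue to investigate -- an untested assumption on my part, based only on the parser frames above: the PARSE phase fails while JBoss performs ${...} property substitution on solr.war's deployment descriptors, which suggests that spec-descriptor property replacement is enabled in this configuration and trips over a literal ${context} string in a descriptor. Disabling that replacement in the ee subsystem of standalone-full.xml might let the deployment parse again:

    <!-- ee subsystem in standalone-full.xml; untested guess, and the
         namespace version may differ on EAP 6.1 -->
    <subsystem xmlns="urn:jboss:domain:ee:1.1">
        <spec-descriptor-property-replacement>false</spec-descriptor-property-replacement>
    </subsystem>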
Re: Unable to deploy Solr 4.3.0 on JBoss EAP 6.1 in full JavaEE 6 mode
No, I have not. I am now working on something else, and I really don't know how to investigate this problem :(

On Wed, Oct 2, 2013 at 8:24 PM, delkant wrote:
> Did you solve this problem?? I'm dealing with exactly the same issue!
> Please share the solution if you have it. Thanks!
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Unable-to-deplay-solr-4-3-0-on-jboss-EAP-6-1-in-mode-full-JavaEE-6-tp4084528p4093183.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Unable to deploy Solr 4.3.0 on JBoss EAP 6.1 in full JavaEE 6 mode
Thanks for the tips. When I get time, I will have a look into it, and I will try to use Solr via the embedded Jetty.

Regards,

Roland.

On Thu, Oct 3, 2013 at 3:26 PM, Shawn Heisey wrote:
> On 8/14/2013 5:16 AM, Roland Everaert wrote:
>> For the past months I have deployed and used Solr 4.3.0 on a JBoss EAP
>> 6.1 using the standalone configuration.
>>
>> Now, due to the addition of a new service, I have to start jboss with a
>> modified version of the standalone-full.xml configuration file, because
>> the service uses JavaEE 6. The only change concerns the connection to a
>> datasource and the interaction with active directory.
>
> It's difficult for this list to support containers other than what
> actually comes with Solr 4.x, which is a stripped down (but otherwise
> unmodified) Jetty 8.
>
> Most of us have come to the realization that running with the jetty
> that's included in the example is the least painful way to proceed.
> Most of the rest are using Tomcat. Tomcat gets somewhat special
> treatment for two reasons: 1) It's very widespread. 2) It's a fellow
> Apache project, just as much open source and transparent as Solr itself.
>
> None of the error messages in the log you have shown us come from Solr.
> If that's the only logging info you have, there's nothing for us to go
> on. You'll need to get help from redhat or another jboss support avenue
> to narrow down the problem, and if you ultimately do find Solr error
> messages, then come back here for help resolving them.
>
> I can say one general thing that might be helpful: The standard .war
> file for Solr 4.3.0 and later does not contain logging jars. A proper
> slf4j logging setup is critically important for Solr operation, and
> particularly problematic with containers other than the included jetty.
>
> It's possible that by adding the other application, there is now a
> problem with logging jars, likely a version conflict. Problems with
> logging are, by their very nature, difficult to detect.
>
> http://wiki.apache.org/solr/SolrLogging#Solr_4.3_and_above
>
> If you want to use a logging mechanism other than log4j, the note about
> intercept jars is sometimes of particular relevance.
>
> Thanks,
> Shawn
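For reference, the wiki's recipe for the missing logging jars boils down to copying them from the Solr download into the container. A sketch, assuming a stock Solr 4.3+ distribution and a default Tomcat layout ($CATALINA_HOME being the Tomcat install directory):

    # the slf4j/log4j jars ship in example/lib/ext of the Solr download
    cp example/lib/ext/*.jar $CATALINA_HOME/lib/
    # plus a log4j configuration for them to use
    cp example/resources/log4j.properties $CATALINA_HOME/lib/
    # then restart Tomcat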
Adding pdf/word file using JSON/XML
Hi,

Based on the wiki, below is an example of how I am currently adding a PDF file with an extra field called name:

    curl "http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text" \
         --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"

Is it possible to add a file plus any extra fields using a JSON or XML request?

Thanks,

Roland Everaert.
Re: Adding pdf/word file using JSON/XML
Sorry if it was not clear.

What I would like to know is how to construct an XML/JSON request that provides all the necessary information (supposedly the full path on disk) for Solr to retrieve and index a PDF/MS Word document.

So, an XML request could look like this:

    <add>
      <doc>
        <field name="id">doc10</field>
        <field name="name">BLAH</field>
        <field name="file">/path/to/file.pdf</field>
      </doc>
    </add>

Regards,

Roland.

On Mon, Jun 10, 2013 at 3:12 PM, Gora Mohanty wrote:
> On 10 June 2013 17:47, Roland Everaert wrote:
>> Hi,
>>
>> Based on the wiki, below is an example of how I am currently adding a
>> pdf file with an extra field called name:
>> curl "http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text"
>> --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"
>>
>> Is it possible to add a file + any extra fields using a JSON or XML
>> request.
>
> It is not entirely clear what you are asking. Do you mean
> can one do the same as your example above for a PDF
> file, but with a XML or JSON file? If so, yes. Please see
> the examples in example/exampledocs/ of a Solr source
> tree, and http://wiki.apache.org/solr/ExtractingRequestHandler
>
> Regards,
> Gora
Re: Adding pdf/word file using JSON/XML
We are working on an application that allows some users to add files (PDF, MS Word, ODT, etc.), located on their local hard disk, to our internal system, and allows other users to search for them. So we are considering Solr for the indexing and search functionality of the system. Along with the file content, we want to index some metadata related to the file.

It seems obvious that Solr cannot import the file from the user's local disk, so the system will have to import the file into a directory that Solr can reach and instruct Solr to index the file with the metadata. But is it possible to index the file plus metadata with a JSON/XML request?

It seems that the only way to index a file with some metadata is to build a request that would look like the following example, which uses curl. The developers would like to avoid passing arguments as parameters in the URL.

    curl "http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text" \
         --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"

Additionally, it seems that if a subsequent request is sent to the indexer to update the file and the metadata are not passed to Solr with the request, they are deleted.

Thanks for your help,

Roland.

On Mon, Jun 10, 2013 at 4:14 PM, Jack Krupansky wrote:
> Sorry, but you are STILL not being clear!
>
> Are you asking if you can pass Solr parameters as XML fields? No.
>
> Are you asking if the file name and path can be indexed as metadata? To
> some degree:
>
> curl "http://localhost:8983/solr/update/extract?literal.id=doc-1&commit=true&uprefix=attr_" -F "HelloWorld.docx=@HelloWorld.docx"
>
> Then the stream has a name that is indexed as metadata:
>
>   <str>stream_source_info</str>
>   <str>HelloWorld.docx</str>
>   <str>stream_content_type</str>
>   <str>application/octet-stream</str>
>   <str>stream_size</str>
>   <str>10096</str>
>   <str>stream_name</str>
>   <str>HelloWorld.docx</str>
>   <str>Content-Type</str>
>   <str>application/vnd.openxmlformats-officedocument.wordprocessingml.document</str>
>
> and
>
>   <str>HelloWorld.docx</str>
>
>   <str>HelloWorld.docx</str>
>
> Or, what is it that you are really trying to do?
>
> Simply tell us in plain language what problem you are trying to solve.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Roland Everaert
> Sent: Monday, June 10, 2013 9:23 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Adding pdf/word file using JSON/XML
>
> Sorry if it was not clear.
>
> What I would like to know is how to construct an XML/JSON request that
> provides all the necessary information (supposedly the full path on
> disk) for Solr to retrieve and index a PDF/MS Word document.
>
> So, an XML request could look like this:
>
>   <add>
>     <doc>
>       <field name="id">doc10</field>
>       <field name="name">BLAH</field>
>       <field name="file">/path/to/file.pdf</field>
>     </doc>
>   </add>
>
> Regards,
>
> Roland.
>
> On Mon, Jun 10, 2013 at 3:12 PM, Gora Mohanty wrote:
>> On 10 June 2013 17:47, Roland Everaert wrote:
>>> Hi,
>>>
>>> Based on the wiki, below is an example of how I am currently adding a
>>> pdf file with an extra field called name:
>>> curl "http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text"
>>> --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"
>>>
>>> Is it possible to add a file + any extra fields using a JSON or XML
>>> request.
>>
>> It is not entirely clear what you are asking. Do you mean
>> can one do the same as your example above for a PDF
>> file, but with a XML or JSON file? If so, yes. Please see
>> the examples in example/exampledocs/ of a Solr source
>> tree, and http://wiki.apache.org/solr/ExtractingRequestHandler
>>
>> Regards,
>> Gora
Re: Adding pdf/word file using JSON/XML
Jan,

Thanks for the answer.

Concerning the use of /extract: if I understand correctly how the interface works, it seems that the document is recreated every time the URL is called. That would mean that all metadata must be provided along with the file every time we want to update the related document, to avoid the deletion of extra fields.

Roland.

On Tue, Jun 11, 2013 at 3:31 PM, Jan Høydahl wrote:
> Hi,
>
> You can let your web application where people upload the files take care
> of extracting the text, e.g. using Apache Tika.
> Once you have the text of the PDF, you can add that to your Solr document
> along with all the rest of the metadata, and post it to Solr as JSON, XML
> or whatever you like. You do not need to use the extracting request
> handler then, since you do the extraction on the client side.
>
> PS: Even if you use /extract, note that you can pass the literal.* params
> as POST if you choose, using 100% standards-based HTTP multipart post.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 11 June 2013 at 14:48, Roland Everaert wrote:
>> We are working on an application that allows some users to add files
>> (PDF, MS Word, ODT, etc.), located on their local hard disk, to our
>> internal system, and allows other users to search for them. So we are
>> considering Solr for the indexing and search functionality of the
>> system. Along with the file content, we want to index some metadata
>> related to the file.
>>
>> It seems obvious that Solr cannot import the file from the user's local
>> disk, so the system will have to import the file into a directory that
>> Solr can reach and instruct Solr to index the file with the metadata.
>> But is it possible to index the file plus metadata with a JSON/XML
>> request?
>>
>> It seems that the only way to index a file with some metadata is to
>> build a request that would look like the following example, which uses
>> curl. The developers would like to avoid passing arguments as parameters
>> in the URL.
>>
>> curl "http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text"
>> --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"
>>
>> Additionally, it seems that if a subsequent request is sent to the
>> indexer to update the file and the metadata are not passed to Solr with
>> the request, they are deleted.
>>
>> Thanks for your help,
>>
>> Roland.
>>
>> On Mon, Jun 10, 2013 at 4:14 PM, Jack Krupansky wrote:
>>> Sorry, but you are STILL not being clear!
>>>
>>> [rest of the earlier exchange snipped; see the previous messages in
>>> this thread]
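A sketch of what Jan's PS amounts to: the same /extract call as before, but with the literal.* params and the file sent as parts of a standard HTTP multipart POST instead of URL query parameters (host, port, and field names are carried over from the earlier examples):

    curl "http://localhost:8080/solr/update/extract" \
         -F "literal.id=doc10" \
         -F "literal.name=BLAH" \
         -F "defaultField=text" \
         -F "file=@/path/to/file.pdf;type=application/pdf"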
Re: Adding pdf/word file using JSON/XML
1) Being aggressive and insulting is not a way to help people understand such a complex tool, or to help people in general.

2) I read the Solr feature page again, and it states that the interface is REST-like, not RESTful as I thought at first and communicated to the devs. As the devs told me, a RESTful interface doesn't use parameters in the URI/URL, so it is my mistake. Hence we have no problem with the interface as it is.

Anyway, I still have a question regarding the /extract interface. It seems that every time a file is updated in Solr, the Lucene document is recreated from scratch, which means that any extra information we want indexed/stored along with the file is erased if the request doesn't contain it. Is there a parameter that allows changing that behaviour?

Regards,

Roland.

On Tue, Jun 11, 2013 at 4:35 PM, Jack Krupansky wrote:
> "is it possible to index the file + metadata with a JSON/XML request?"
>
> You still aren't being clear as to what you are really trying to achieve
> here. I mean, just write a shell script that does the curl command, or
> write a Java program or application layer that uses SolrJ to talk to Solr
> and accepts JSON/XML/REST requests.
>
> "It seems that the only way to index a file with some metadata is to
> build a request that would look like the following example that uses
> curl."
>
> Curl is just a fancy way to do an HTTP request. You can do the same HTTP
> request from Java code (or Python or whatever.)
>
> "The developer would like to avoid using parameters in the url to pass
> arguments."
>
> Seriously?! What is THAT all about!! I mean, really, HTTP and URLs and
> URL query parameters are part of the heart of the Internet
> infrastructure!
>
> If this whole thread is merely that you have an IDIOT who can't cope with
> passing HTTP URL query parameters, all I can say is... Wow!
>
> But use SolrJ and then at least it doesn't LOOK like they are URL Query
> parameters.
>
> Or, maybe this is just a case where the developer WANTS to use SOAP
> rather than a REST style of API.
>
> In any case, please clue us in as to what PROBLEM you are really trying
> to solve. Just use plain English and avoid getting caught up in what the
> solution might be.
>
> The real bottom line is that random application developers should not be
> talking directly to Solr anyway - they should be provided with an
> "application layer" that has a clean, application-oriented REST API, and
> the gory details of the Solr API would be hidden inside the application
> layer.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Roland Everaert
> Sent: Tuesday, June 11, 2013 8:48 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Adding pdf/word file using JSON/XML
>
> We are working on an application that allows some users to add files
> (PDF, MS Word, ODT, etc.), located on their local hard disk, to our
> internal system, and allows other users to search for them. So we are
> considering Solr for the indexing and search functionality of the
> system. Along with the file content, we want to index some metadata
> related to the file.
>
> It seems obvious that Solr cannot import the file from the user's local
> disk, so the system will have to import the file into a directory that
> Solr can reach and instruct Solr to index the file with the metadata.
> But is it possible to index the file plus metadata with a JSON/XML
> request?
>
> It seems that the only way to index a file with some metadata is to
> build a request that would look like the following example, which uses
> curl. The developers would like to avoid passing arguments as parameters
> in the URL.
>
> curl "http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text"
> --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"
>
> Additionally, it seems that if a subsequent request is sent to the
> indexer to update the file and the metadata are not passed to Solr with
> the request, they are deleted.
>
> Thanks for your help,
>
> Roland.
>
> On Mon, Jun 10, 2013 at 4:14 PM, Jack Krupansky wrote:
>> Sorry, but you are STILL not being clear!
>>
>> Are you asking if you can pass Solr parameters as XML fields? No.
>>
>> Are you asking if the file name and path can be indexed as metadata? To
>> some degree:
>>
>> curl "http://localhost:8983/solr/
Re: Adding pdf/word file using JSON/XML
I apologize for my obscure questions too, and I thank you and the list for your help so far, and for the very clear explanation you give of the behaviour of Solr and SolrCell.

I am effectively an intermediary between the list and the devs, because our development process is not efficient. The full story is (beware, it's boring): we are a bunch of devs in a consultancy company waiting for the next mission. In the meantime, our boss gives us something to do, but instead of developing one big application where each dev has a module to take care of, or each dev working on his own machine, we have to develop the same application with various technologies/tools/languages. One dev is using .NET, another is using Java and the Spring framework, and the third one is using JavaEE. And I am in the middle as sysadmin/DBA/investigator of tools and APIs/provider of information and a transparent API for everybody, while managing 3 databases, 2 application servers and 2 different indexers on the same server, and taking into consideration that at some point in time the devs will interchange their tools (RDBMS and/or indexers). *Now you can breathe.*

Top that with the fact that one of the devs is experienced in REST and web technologies (the IDIOT ;)), and that I misread the first line of the Solr feature page ("Solr is a standalone enterprise search server with a REST-like API"), so I actually communicated that Solr provides a RESTful API. So I think I am a bit overwhelmed by the task at hand.

To conclude, yesterday I discussed it with the team and we decided that I will provide a RESTful web service that will hide the access to the indexers, among other things, so even the .NET guy will be able to use it. That will allow me to study REST and, I hope, ask clearer questions in the future.

Thanks again for your help and your patience,

Roland Everaert.

On Wed, Jun 12, 2013 at 4:18 PM, Jack Krupansky wrote:
> I'm sorry if I came across as aggressive or insulting - I'm only trying
> to dig down to what your actual difficulty is - and you have been making
> that extremely difficult for all of us. You need to help us all out here
> by more clearly expressing what your actual problem is. You will have to
> excuse the rest of us if we are unable to read your mind!
>
> It sounds as if you are an intermediary between your devs and this list.
> That's NOT a very effective communications strategy! You need to either
> have your devs communicate directly on this list, or you need to do a
> much better job of understanding what their actual problem is and then
> communicate that actual problem to this list, plainly and clearly.
>
> TRYING to read your mind (and indirectly your devs' minds as well - not
> an easy task!), and reading between the lines, it is starting to sound as
> if you (or/and your devs) are not clear on how Solr works as a
> "database".
>
> Core Solr does have full CRUD (Add or Create, Read or Query, Update, and
> Delete), although not in a strict, pure REST sense, that is true.
>
> A "full" update in Solr is the same as an Add - add a new, fresh
> document, and then delete the old document. Some people call this an
> "Upsert" (combination of Update or Insert).
>
> There are really two forms of update (a difficulty in REST): 1) full
> update or "replace" - equal to a delete and an add, and 2) partial or
> incremental update. True REST only has the latter.
>
> Core Solr does have support for partial or incremental update with
> Atomic Updates. Solr will in fact retain the existing data and only
> update any new field values that are supplied on the update request.
>
> SolrCell (Extracting RequestHandler or "/update/extract") is not a core
> part of Solr. It is an add-on "contrib" module. It does not have full
> CRUD - no delete, and no partial update - but it does support add and
> full update.
>
> As someone else already suggested, you can do the work of SolrCell
> yourself by calling Tika directly in your app layer and then sending
> normal Solr CRUD requests.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Roland Everaert
> Sent: Wednesday, June 12, 2013 5:21 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Adding pdf/word file using JSON/XML
>
> 1) Being aggressive and insulting is not a way to help people understand
> such a complex tool, or to help people in general.
>
> 2) I read the Solr feature page again, and it states that the interface
> is REST-like, not RESTful as I thought at first and communicated to the
> devs. As the devs told me, a RESTful interface doesn't use parameters in
> the URI/URL, so it is my mistake. Hence we have no problem with the
> interface as it is.
>
> Anyway, I still have a question regarding the /extract interface. It
> seems
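A sketch of the atomic-update path Jack describes: sending only the changed field with a "set" modifier, so Solr keeps the document's other stored fields. Host, port, and field names are carried over from the earlier /extract examples, and this assumes the schema meets the atomic-update requirements (fields stored, plus an updateLog and _version_ field configured):

    curl "http://localhost:8080/solr/update?commit=true" \
         -H "Content-Type: application/json" \
         -d '[{"id": "doc10", "name": {"set": "NEW VALUE"}}]'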