ExtractRequestHandler, skipping errors

2013-10-17 Thread Roland Everaert
Hi,

I helped a customer deploy Solr+ManifoldCF and everything is going quite
smoothly, but every time Solr raises an exception, the ManifoldCF job
feeding Solr aborts. I would like to know if it is possible to configure
the ExtractingRequestHandler to ignore errors, as seems to be possible
with the DataImportHandler and entity processors.

I know that it is possible to configure the ExtractingRequestHandler to
ignore Tika exceptions (we already do that), but the errors that now stop
the MCF jobs are generated by Solr itself.
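
For reference, the handler configuration we use looks roughly like this
(a sketch from memory; the ignoreTikaException flag is the one we set,
and the other defaults may differ in your setup):

<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- map the extracted body into the "text" field -->
    <str name="fmap.content">text</str>
    <!-- skip documents on which Tika throws, instead of failing the whole request -->
    <bool name="ignoreTikaException">true</bool>
  </lst>
</requestHandler>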

While it would be interesting to have such an option in Solr, I plan to
post to the ManifoldCF mailing list anyway, to ask whether it is possible
to configure ManifoldCF to be less picky about Solr errors.


Regards,


Roland.


Re: Solr errors

2013-10-17 Thread Roland Everaert
Although I haven't tested it myself, you can use Tika; it is able to
extract documents from zip archives and index them, but of course it
depends on the file types in the archive.

Regards,


Roland.


On Thu, Oct 17, 2013 at 2:36 PM, wonder  wrote:

> Does anybody know how to index files in zip archives?
>
>
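
To see what Tika actually extracts from an archive before indexing
anything, an extract-only request can help (a sketch; it assumes the
stock /update/extract handler and the host/port used elsewhere in this
thread):

curl "http://localhost:8080/solr/update/extract?extractOnly=true&wt=json&indent=true" \
     --data-binary @archive.zip -H "Content-Type: application/zip"

With extractOnly=true, Solr returns the Tika output without indexing it,
so you can check whether the files inside the archive survive extraction.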


Re: Solr errors

2013-10-17 Thread Roland Everaert
I have just found this JIRA report, which could explain your problem:

https://issues.apache.org/jira/browse/SOLR-2416


Regards,

Roland.



On Thu, Oct 17, 2013 at 3:30 PM, wonder  wrote:

> Thanks for the answer. Yes, Tika extracts the content, but does not
> index it. Here is the solr response:
> ...
> "content": [ " 9118_xmessengereu_v18ximpda.jar dimonvideo.ru.txt " ],
> ...
> None of these files are in the index.
> Any ideas?
> 17.10.2013 17:20, Roland Everaert wrote:
>
>> Although I haven't tested it myself, you can use Tika; it is able to
>> extract documents from zip archives and index them, but of course it
>> depends on the file types in the archive.
>>
>
>


Re: ExtractRequestHandler, skipping errors

2013-10-18 Thread Roland Everaert
Hi,

We already configured the ExtractingRequestHandler to ignore Tika
exceptions, but it is Solr itself that complains. The customer managed to
reproduce the problem; below is the error from solr.log. The file type
that caused this exception was WMZ. It seems that a method is missing
from a class on Solr's classpath. We use Solr 4.4.

ERROR - 2013-10-17 18:13:48.902; org.apache.solr.common.SolrException;
null:java.lang.RuntimeException: java.lang.NoSuchMethodError:
org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V
at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:673)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at
org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1852)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoSuchMethodError:
org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V
at
org.apache.tika.parser.pkg.CompressorParser.parse(CompressorParser.java:102)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
... 16 more





On Thu, Oct 17, 2013 at 5:19 PM, Koji Sekiguchi  wrote:

> Hi Roland,
>
>
> (13/10/17 20:44), Roland Everaert wrote:
>
>> Hi,
>>
>> I helped a customer deploy Solr+ManifoldCF and everything is going
>> quite smoothly, but every time Solr raises an exception, the ManifoldCF
>> job feeding Solr aborts. I would like to know if it is possible to
>> configure the ExtractingRequestHandler to ignore errors, as seems to be
>> possible with the DataImportHandler and entity processors.
>>
>> I know that it is possible to configure the ExtractingRequestHandler to
>> ignore Tika exceptions (we already do that), but the errors that now
>> stop the MCF jobs are generated by Solr itself.
>>
>> While it would be interesting to have such an option in Solr, I plan to
>> post to the ManifoldCF mailing list anyway, to ask whether it is
>> possible to configure ManifoldCF to be less picky about Solr errors.
>>
>>
> ignoreTikaException flag might help you?
>
> https://issues.apache.org/jira/browse/SOLR-2480
>
> koji
> --
> http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
>


XLSB files not indexed

2013-10-18 Thread Roland Everaert
Hi,

Can someone tell me if Tika is supposed to extract data from XLSB files
(the new MS Office format in binary form)?

If so, then it seems that Solr is not able to index them, just as it is
not able to index ODF files (a JIRA issue is already open for ODF:
https://issues.apache.org/jira/browse/SOLR-4809).

Can someone confirm the problem, or tell me what to do to make Solr work
with XLSB files?


Regards,


Roland.
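
One way to narrow down where the content is lost is an extract-only
request (a sketch; the file name and the XLSB MIME type shown here are
illustrative):

curl "http://localhost:8080/solr/update/extract?extractOnly=true" \
     --data-binary @report.xlsb \
     -H "Content-Type: application/vnd.ms-excel.sheet.binary.macroEnabled.12"

If Tika returns real text here but the indexed content field stays nearly
empty, the problem sits in the Solr/Tika integration rather than in Tika
itself, which matches what SOLR-4809 describes for ODF.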


Re: ExtractRequestHandler, skipping errors

2013-10-18 Thread Roland Everaert
I will open a JIRA issue; I suppose that I just have to create an account
first?


Regards,


Roland.


On Fri, Oct 18, 2013 at 12:05 PM, Koji Sekiguchi  wrote:

> Hi,
>
> I think the flag cannot ignore NoSuchMethodError. There may be something
> wrong here?
>
> ... I've just checked my Solr 4.5 directories and I found Tika version is
> 1.4.
>
> Tika 1.4 seems to use commons compress 1.5:
>
> http://svn.apache.org/viewvc/tika/tags/1.4/tika-parsers/pom.xml?view=markup
>
> But I see commons-compress-1.4.1.jar in solr/contrib/extraction/lib/
> directory.
>
> Can you open a JIRA issue?
>
> For now, you can get commons-compress 1.5 and put it in that directory
> (don't forget to remove the 1.4.1 jar file).
>
> koji
>
>
> (13/10/18 16:37), Roland Everaert wrote:
>
>> Hi,
>>
>> We already configured the ExtractingRequestHandler to ignore Tika
>> exceptions, but it is Solr itself that complains. The customer managed
>> to reproduce the problem; below is the error from solr.log. The file
>> type that caused this exception was WMZ. It seems that a method is
>> missing from a class on Solr's classpath. We use Solr 4.4.
>>
>> ERROR - 2013-10-17 18:13:48.902; org.apache.solr.common.SolrException;
>> null:java.lang.RuntimeException: java.lang.NoSuchMethodError:
>> org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V
>> [rest of the quoted stack trace trimmed; it is identical to the trace
>> in the message above]
>
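
For anyone hitting the same NoSuchMethodError, a sketch of the jar swap
Koji describes above (paths and the download URL are illustrative;
adjust them to your installation and restart the servlet container
afterwards):

cd $SOLR_HOME/contrib/extraction/lib
# remove the commons-compress that Solr 4.4 ships with...
rm commons-compress-1.4.1.jar
# ...and drop in the 1.5 release that Tika 1.4 was built against
wget https://repo1.maven.org/maven2/org/apache/commons/commons-compress/1.5/commons-compress-1.5.jar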

Re: ExtractRequestHandler, skipping errors

2013-10-18 Thread Roland Everaert
Here is the link to the issue:

https://issues.apache.org/jira/browse/SOLR-5365

Thanks for your help.


Roland Everaert.


On Fri, Oct 18, 2013 at 1:40 PM, Roland Everaert wrote:

> I will open a JIRA issue, I suppose that I just have to create an account
> first?
>
>
> Regards,
>
>
> Roland.
>
>
> On Fri, Oct 18, 2013 at 12:05 PM, Koji Sekiguchi wrote:
>
>> Hi,
>>
>> I think the flag cannot ignore NoSuchMethodError. There may be something
>> wrong here?
>>
>> ... I've just checked my Solr 4.5 directories and I found Tika version is
>> 1.4.
>>
>> Tika 1.4 seems to use commons compress 1.5:
>>
>> http://svn.apache.org/viewvc/tika/tags/1.4/tika-parsers/pom.xml?view=markup
>>
>> But I see commons-compress-1.4.1.jar in solr/contrib/extraction/lib/
>> directory.
>>
>> Can you open a JIRA issue?
>>
>> For now, you can get commons-compress 1.5 and put it in that directory
>> (don't forget to remove the 1.4.1 jar file).
>>
>> koji
>>
>>
>> (13/10/18 16:37), Roland Everaert wrote:
>>
>>> Hi,
>>>
>>> We already configured the ExtractingRequestHandler to ignore Tika
>>> exceptions, but it is Solr itself that complains. The customer managed
>>> to reproduce the problem; below is the error from solr.log. The file
>>> type that caused this exception was WMZ. It seems that a method is
>>> missing from a class on Solr's classpath. We use Solr 4.4.
>>>
>>> ERROR - 2013-10-17 18:13:48.902; org.apache.solr.common.SolrException;
>>> null:java.lang.RuntimeException: java.lang.NoSuchMethodError:
>>> org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V
>>> [rest of the quoted stack trace trimmed; it is identical to the trace
>>> in the first message of this thread]

Re: XLSB files not indexed

2013-10-21 Thread Roland Everaert
Hi Otis,

In our case, no exception is raised by Tika or Solr; a Lucene document is
created, but the content field contains only a few white spaces, as with
ODF files.


Roland.


On Sat, Oct 19, 2013 at 3:54 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi Roland,
>
> It looks like:
> Tika - yes
> Solr - no?
>
> Based on http://search-lucene.com/?q=xlsb
>
> ODF != XLSB though, I think...
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Fri, Oct 18, 2013 at 7:36 AM, Roland Everaert 
> wrote:
> > Hi,
> >
> > Can someone tell me if Tika is supposed to extract data from XLSB files
> > (the new MS Office format in binary form)?
> >
> > If so, then it seems that Solr is not able to index them, just as it is
> > not able to index ODF files (a JIRA issue is already open for ODF:
> > https://issues.apache.org/jira/browse/SOLR-4809).
> >
> > Can someone confirm the problem, or tell me what to do to make Solr
> > work with XLSB files?
> >
> >
> > Regards,
> >
> >
> > Roland.
>


Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Roland Everaert
In my case, the first time I had to deploy and configure Solr on Tomcat
(and JBoss), it was a requirement to reuse, as much as possible, the
application/web server already in place. For the next deployment I also
used Tomcat, because I was used to deploying on Tomcat and I don't know
Jetty at all.

I could ask the same question with regard to Jetty: why use/bundle (if
not recommend) Jetty with Solr over other web server solutions?

Regards,


Roland Everaert.



On Tue, Nov 12, 2013 at 12:33 PM, Alvaro Cabrerizo wrote:

> In my case, the selection of the servlet container has never been a hard
> requirement. I mean, some customers provide us a virtual machine
> configured with Java/Tomcat, others have a Tomcat installed and want to
> share it with Solr, others prefer Jetty because their sysadmins are used
> to configuring it... At least in the projects I've been working on, the
> selection of the servlet engine has not been a key factor in the
> project's success.
>
> Regards.
>
>
> On Tue, Nov 12, 2013 at 12:11 PM, Andre Bois-Crettez
> wrote:
>
> > We are using Solr running on Tomcat.
> >
> > I think the top reasons for us are :
> >  - we already have nagios monitoring plugins for tomcat that trace
> > queries ok/error, http codes / response time etc in access logs, number
> > of threads, jvm memory usage etc
> >  - start, stop, watchdogs, logs : we also use our standard tools for that
> >  - what about security filters? Is that possible with Jetty?
> >
> > André
> >
> >
> > On 11/12/2013 04:54 AM, Alexandre Rafalovitch wrote:
> >
> >> Hello,
> >>
> >> I keep seeing here and on Stack Overflow people trying to deploy Solr to
> >> Tomcat. We don't usually ask why, we just help where we can.
> >>
> >> But the question happens often enough that I am curious. What is the
> >> actual business case? Is it because Tomcat is well known? Is it because
> >> other apps are running under Tomcat and it is an ops requirement? Is it
> >> because Tomcat gives something - to Solr - that Jetty does not?
> >>
> >> It might be useful to know, especially since the Solr team is
> >> considering making the server part into a black-box component. What
> >> use cases will that break?
> >>
> >> So, if somebody runs Solr under Tomcat (or needed to and gave up), let's
> >> use this thread to collect this knowledge.
> >>
> >> Regards,
> >> Alex.
> >> Personal website: http://www.outerthoughts.com/
> >> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> >> - Time is the quality of nature that keeps events from happening all at
> >> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
> >>
> >> --
> >> André Bois-Crettez
> >>
> >> Software Architect
> >> Search Developer
> >> http://www.kelkoo.com/
> >>
> >
> >
>


Unable to deploy solr 4.3.0 on jboss EAP 6.1 in mode full JavaEE 6

2013-08-14 Thread Roland Everaert
Hi,

For the past months I have deployed and used Solr 4.3.0 on a JBoss EAP 6.1
using the standalone configuration.

Now, due to the addition of a new service, I have to start JBoss with a
modified version of the standalone-full.xml configuration file, because
the service uses JavaEE 6. The only change concerns the connection to a
datasource and the interaction with Active Directory.

With that configuration file, when I try to deploy solr.war, I get the
following error:

11:02:23,291 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-2)
JBAS015876: Starting deployment of "solr.war" (runtime-name: "solr.war")
11:02:25,540 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-2)
MSC00001: Failed to start service jboss.deployment.unit."solr.war".PARSE:
org.jboss.msc.service.StartException in service jboss.deployment.unit."solr.war".PARSE:
JBAS018733: Failed to process phase PARSE of deployment "solr.war"
    at org.jboss.as.server.deployment.DeploymentUnitPhaseService.start(DeploymentUnitPhaseService.java:127) [jboss-as-server-7.2.0.Final-redhat-8.jar:7.2.0.Final-redhat-8]
    at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1811) [jboss-msc-1.0.4.GA-redhat-1.jar:1.0.4.GA-redhat-1]
    at org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1746) [jboss-msc-1.0.4.GA-redhat-1.jar:1.0.4.GA-redhat-1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_21]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_21]
    at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_21]
Caused by: java.lang.IllegalStateException: Failed to resolve expression: ${context}
    at org.jboss.metadata.property.DefaultPropertyReplacer.replaceProperties(DefaultPropertyReplacer.java:125)
    at org.jboss.metadata.parser.util.MetaDataElementParser.getElementText(MetaDataElementParser.java:194)
    at org.jboss.metadata.parser.ee.ParamValueMetaDataParser.parse(ParamValueMetaDataParser.java:78)
    at org.jboss.metadata.parser.servlet.ServletMetaDataParser.parse(ServletMetaDataParser.java:93)
    at org.jboss.metadata.parser.servlet.WebCommonMetaDataParser.parse(WebCommonMetaDataParser.java:102)
    at org.jboss.metadata.parser.servlet.WebMetaDataParser.parse(WebMetaDataParser.java:175)
    at org.jboss.metadata.parser.servlet.WebMetaDataParser.parse(WebMetaDataParser.java:55)
    at org.jboss.as.web.deployment.WebParsingDeploymentProcessor.deploy(WebParsingDeploymentProcessor.java:91)
    at org.jboss.as.server.deployment.DeploymentUnitPhaseService.start(DeploymentUnitPhaseService.java:120) [jboss-as-server-7.2.0.Final-redhat-8.jar:7.2.0.Final-redhat-8]
    ... 5 more

11:02:25,556 ERROR [org.jboss.as.server] (HttpManagementService-threads - 4)
JBAS015870: Deploy of deployment "solr.war" was rolled back with the following failure message:
{"JBAS014671: Failed services" => {"jboss.deployment.unit.\"solr.war\".PARSE" =>
"org.jboss.msc.service.StartException in service jboss.deployment.unit.\"solr.war\".PARSE:
JBAS018733: Failed to process phase PARSE of deployment \"solr.war\"
Caused by: java.lang.IllegalStateException: Failed to resolve expression: ${context}"}}
11:02:25,728 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-2)
JBAS015877: Stopped deployment solr.war (runtime-name: solr.war) in 161ms



JBoss: EAP 6.1
Solr: 4.3.0
OS: Windows Server 2008 R2


Has anybody already deployed Solr with such a configuration?


Thanks,



Roland Everaert.
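
Judging from the stack trace, the PARSE phase fails while JBoss performs
property substitution on solr.war's web.xml and cannot resolve the
literal ${context} placeholder it finds there. One commonly cited
workaround, untested here, is to disable descriptor property replacement
in the ee subsystem of standalone-full.xml, roughly:

<subsystem xmlns="urn:jboss:domain:ee:1.1">
    <!-- stop JBoss from expanding ${...} expressions found in deployment descriptors -->
    <spec-descriptor-property-replacement>false</spec-descriptor-property-replacement>
</subsystem>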


Re: Unable to deploy solr 4.3.0 on jboss EAP 6.1 in mode full JavaEE 6

2013-10-02 Thread Roland Everaert
No, I have not. I am now working on something else, and I really don't know
how to investigate this problem :(


On Wed, Oct 2, 2013 at 8:24 PM, delkant  wrote:

> Did you solve this problem?? I'm dealing exactly with the same issue!
> Please
> share the solution if you have it. Thanks!
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Unable-to-deplay-solr-4-3-0-on-jboss-EAP-6-1-in-mode-full-JavaEE-6-tp4084528p4093183.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Unable to deploy solr 4.3.0 on jboss EAP 6.1 in mode full JavaEE 6

2013-10-04 Thread Roland Everaert
Thanks for the tips. When I get time, I will look into it and try to use
Solr via the embedded Jetty.


Regards,


Roland.


On Thu, Oct 3, 2013 at 3:26 PM, Shawn Heisey  wrote:

> On 8/14/2013 5:16 AM, Roland Everaert wrote:
> > For the past months I have deployed and used Solr 4.3.0 on a JBoss EAP
> > 6.1 using the standalone configuration.
> >
> > Now, due to the addition of a new service, I have to start JBoss with a
> > modified version of the standalone-full.xml configuration file, because
> > the service uses JavaEE 6. The only change concerns the connection to a
> > datasource and the interaction with Active Directory.
>
> It's difficult for this list to support containers other than what
> actually comes with Solr 4.x, which is a stripped down (but otherwise
> unmodified) Jetty 8.
>
> Most of us have come to the realization that running with the jetty
> that's included in the example is the least painful way to proceed.
> Most of the rest are using Tomcat.  Tomcat gets somewhat special
> treatment for two reasons: 1) It's very widespread.  2) It's a fellow
> Apache project, just as much open source and transparent as Solr itself.
>
> None of the error messages in the log you have shown us come from Solr.
>  If that's the only logging info you have, there's nothing for us to go
> on.  You'll need to get help from redhat or another jboss support avenue
> to narrow down the problem, and if you ultimately do find Solr error
> messages, then come back here for help resolving them.
>
> I can say one general thing that might be helpful:  The standard .war
> file for Solr 4.3.0 and later does not contain logging jars.  A proper
> slf4j logging setup is critically important for Solr operation, and
> particularly problematic with containers other than the included jetty.
>
> It's possible that by adding the other application, there is now a
> problem with logging jars, likely a version conflict.  Problems with
> logging are, by their very nature, difficult to detect.
>
> http://wiki.apache.org/solr/SolrLogging#Solr_4.3_and_above
>
> If you want to use a logging mechanism other than log4j, the note about
> intercept jars is sometimes of particular relevance.
>
> Thanks,
> Shawn
>
>


Adding pdf/word file using JSON/XML

2013-06-10 Thread Roland Everaert
Hi,

Based on the wiki, below is an example of how I am currently adding a PDF
file with an extra field called name:

curl "http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text" \
  --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"

Is it possible to add a file + any extra fields using a JSON or XML request?


Thanks,



Roland Everaert.


Re: Adding pdf/word file using JSON/XML

2013-06-10 Thread Roland Everaert
Sorry if it was not clear.

What I would like to know is how to construct an XML/JSON request that
provides all the necessary information (supposedly the full path on disk)
for Solr to retrieve and index a PDF/MS Word document.

So, an XML request could look like this:

<add>
  <doc>
    <field name="id">doc10</field>
    <field name="name">BLAH</field>
    <field name="file">/path/to/file.pdf</field>
  </doc>
</add>

Regards,


Roland.


On Mon, Jun 10, 2013 at 3:12 PM, Gora Mohanty  wrote:

> On 10 June 2013 17:47, Roland Everaert  wrote:
> > Hi,
> >
> > Based on the wiki, below is an example of how I am currently adding a pdf
> > file with an extra field called name:
> > curl "
> >
> http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text
> "
> > --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"
> >
> > Is it possible to add a file + any extra fields using a JSON or XML
> request.
>
> It is not entirely clear what you are asking. Do you mean
> can one do the same as your example above for a PDF
> file, but with a XML or JSON file? If so, yes. Please see
> the examples in example/exampledocs/ of a Solr source
> tree, and http://wiki.apache.org/solr/ExtractingRequestHandler
>
> Regards,
> Gora
>


Re: Adding pdf/word file using JSON/XML

2013-06-11 Thread Roland Everaert
We are working on an application that allows some users to add files
(PDF, MS Word, ODT, etc.), located on their local hard disks, to our
internal system, and allows other users to search for them. So we are
considering Solr for the indexing and search functionality of the system.
Along with the file content, we want to index some metadata related to
the file.

It seems obvious that Solr cannot import a file from the user's local
disk, so the system will have to copy the file into a directory that Solr
can reach and instruct Solr to index the file with the metadata. But is
it possible to index the file + metadata with a JSON/XML request?

It seems that the only way to index a file with some metadata is to build
a request that would look like the following example, which uses curl.
The developers would like to avoid using parameters in the URL to pass
arguments.

curl "http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text" \
  --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"

Additionally, it seems that if a subsequent request is sent to the
indexer to update the file and the metadata are not passed to Solr with
the request, they are deleted.

Thanks for your help,



Roland.


On Mon, Jun 10, 2013 at 4:14 PM, Jack Krupansky wrote:

> Sorry, but you are STILL not being clear!
>
> Are you asking if you can pass Solr parameters as XML fields? No.
>
> Are you asking if the file name and path can be indexed as metadata? To
> some degree:
>
> curl "http://localhost:8983/solr/update/extract?literal.id=doc-1&commit=true&uprefix=attr_" \
>      -F "HelloWorld.docx=@HelloWorld.docx"
>
> Then the stream has a name that is indexed as metadata:
>
> <arr name="attr_meta">
>   <str>stream_source_info</str>
>   <str>HelloWorld.docx</str>
>   <str>stream_content_type</str>
>   <str>application/octet-stream</str>
>   <str>stream_size</str>
>   <str>10096</str>
>   <str>stream_name</str>
>   <str>HelloWorld.docx</str>
>   <str>Content-Type</str>
>   <str>application/vnd.openxmlformats-officedocument.wordprocessingml.document</str>
> </arr>
>
> and
>
> <arr name="attr_stream_source_info">
>   <str>HelloWorld.docx</str>
> </arr>
>
> <arr name="attr_stream_name">
>   <str>HelloWorld.docx</str>
> </arr>
>
> Or, what is it that you are really trying to do?
>
> Simply tell us in plain language what problem you are trying to solve.
>
> -- Jack Krupansky
>
> -Original Message- From: Roland Everaert
> Sent: Monday, June 10, 2013 9:23 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Adding pdf/word file using JSON/XML
>
>
> Sorry if it was not clear.
>
> What I would like is to know how to construct an XML/JSON request that
> provide any necessary information (supposedly the full path on disk) to
> solr to retrieve and index a pdf/ms word document.
>
> So, an XML request could look like this:
>
> <add>
>   <doc>
>     <field name="id">doc10</field>
>     <field name="name">BLAH</field>
>     <field name="file">/path/to/file.pdf</field>
>   </doc>
> </add>
>
>
> Regards,
>
>
> Roland.
>
>
> On Mon, Jun 10, 2013 at 3:12 PM, Gora Mohanty  wrote:
>
>  On 10 June 2013 17:47, Roland Everaert  wrote:
>> > Hi,
>> >
>> > Based on the wiki, below is an example of how I am currently adding a
>> > pdf
>> > file with an extra field called name:
>> > curl "
>> >
>> http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text
>> "
>> > --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"
>> >
>> > Is it possible to add a file + any extra fields using a JSON or XML
>> request.
>>
>> It is not entirely clear what you are asking. Do you mean
>> can one do the same as your example above for a PDF
>> file, but with a XML or JSON file? If so, yes. Please see
>> the examples in example/exampledocs/ of a Solr source
>> tree, and 
>> http://wiki.apache.org/solr/ExtractingRequestHandler
>>
>> Regards,
>> Gora
>>
>>
>


Re: Adding pdf/word file using JSON/XML

2013-06-11 Thread Roland Everaert
Jan,

Thanks for the answer.

Concerning the usage of /extract: if I understand correctly how the
interface works, it seems that the document is recreated every time the
URL is called. That would mean that all metadata must be provided along
with the file every time we want to update the related document, to avoid
the deletion of extra fields.


Roland.



On Tue, Jun 11, 2013 at 3:31 PM, Jan Høydahl  wrote:

> Hi,
>
> You can let your web application where people upload the files take care
> of extracting the text, e.g. using Apache Tika.
> Once you have the text of the PDF, you can add that to your Solr document
> along with all the rest of the metadata, and
> post it to Solr as JSON, XML or whatever you like. You do not need to use
> extracting request handler then, since you do
> the extraction on the client side.
>
> PS: Even if you use /extract, note that you can pass the literal.* params
> as POST if you choose, using 100% standards-based HTTP multipart POST.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 11 June 2013, at 14:48, Roland Everaert wrote:
>
> > We are working on an application that allows some users to add files
> > (PDF, MS Word, ODT, etc.), located on their local hard disks, to our
> > internal system, and allows other users to search for them. So we are
> > considering Solr for the indexing and search functionality of the
> > system. Along with the file content, we want to index some metadata
> > related to the file.
> >
> > It seems obvious that Solr cannot import a file from the user's local
> > disk, so the system will have to copy the file into a directory that
> > Solr can reach and instruct Solr to index the file with the metadata.
> > But is it possible to index the file + metadata with a JSON/XML request?
> >
> > It seems that the only way to index a file with some metadata is to
> > build a request that would look like the following example, which uses
> > curl. The developers would like to avoid using parameters in the URL to
> > pass arguments.
> >
> > curl "http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text" \
> >   --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"
> >
> > Additionally, it seems that if a subsequent request is sent to the
> > indexer to update the file and the metadata are not passed to Solr with
> > the request, they are deleted.
> >
> > Thanks for your help,
> >
> >
> >
> > Roland.
> >
> >
> > On Mon, Jun 10, 2013 at 4:14 PM, Jack Krupansky wrote:
> >
> >> Sorry, but you are STILL not being clear!
> >>
> >> [rest of the quoted message trimmed; see Jack's full reply earlier in
> >> this thread]
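
A sketch of the multipart POST Jan mentions above, passing the literal.*
parameters as form fields instead of URL query parameters (untested;
host, port and field names follow the earlier examples):

curl "http://localhost:8080/solr/update/extract" \
     -F "literal.id=doc10" \
     -F "literal.name=BLAH" \
     -F "defaultField=text" \
     -F "file=@/path/to/file.pdf;type=application/pdf"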

Re: Adding pdf/word file using JSON/XML

2013-06-12 Thread Roland Everaert
1) Being aggressive and insulting is not a way to help people understand
such a complex tool, or to help people in general.

2) I read the feature page of Solr again, and it states that the
interface is REST-like, not RESTful as I thought in the first place and
communicated to the devs. And as the devs told me, a RESTful interface
doesn't use parameters in the URI/URL, so it is my mistake. Hence we have
no problem with the interface as it is.

Anyway, I still have a question regarding the /extract interface. It
seems that every time a file is updated in Solr, the Lucene document is
recreated from scratch, which means that any extra information we want
indexed/stored along with the file is erased if the request doesn't
contain it. Is there a parameter that allows changing that behaviour?



Regards,


Roland.


On Tue, Jun 11, 2013 at 4:35 PM, Jack Krupansky wrote:

> "is it possible to index the file + metadata with a JSON/XML request?"
>
> You still aren't being clear as to what you are really trying to achieve
> here. I mean, just write a shell script that does the curl command, or
> write a Java program or application layer that uses SolrJ to talk to Solr
> and accepts JSON/XML/REST requests.
>
>
> "It seems that the only way to index a file with some metadata is to build
> a
> request that would look like the following example that uses curl."
>
> Curl is just a fancy way to do an HTTP request. You can do the same HTTP
> request from Java code (or Python or whatever.)
>
>
> "The developer would like to avoid using parameters in the url to pass
> arguments."
>
> Seriously?! What is THAT all about!!  I mean, really, HTTP and URLs and
> URL query parameters are part of the heart of the Internet infrastructure!
>
> If this whole thread is merely that you have an IDIOT who can't cope with
> passing HTTP URL query parameters, all I can say is... Wow!
>
> But use SolrJ and then at least it doesn't LOOK like they are URL Query
> parameters.
>
> Or, maybe this is just a case where the developer WANTS to use SOAP rather
> than a REST style of API.
>
> In any case, please clue us in as to what PROBLEM you are really trying to
> solve. Just use plain English and avoid getting caught up in what the
> solution might be.
>
> The real bottom line is that random application developers should not be
> talking directly to Solr anyway - they should be provided with an
> "application layer" that has a clean, application-oriented REST API and the
> gory details of the Solr API would be hidden inside the application layer.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Roland Everaert
> Sent: Tuesday, June 11, 2013 8:48 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Adding pdf/word file using JSON/XML
>
> We are working on an application that allows some users to add files
> (PDF, MS Word, ODT, etc.), located on their local hard disks, to our
> internal system, and allows other users to search for them. So we are
> considering Solr for the indexing and search functionality of the
> system. Along with the file content, we want to index some metadata
> related to the file.
>
> It seems obvious that Solr cannot import a file from the user's local
> disk, so the system will have to copy the file into a directory that
> Solr can reach and instruct Solr to index the file with the metadata.
> But is it possible to index the file + metadata with a JSON/XML request?
>
> It seems that the only way to index a file with some metadata is to
> build a request that would look like the following example, which uses
> curl. The developers would like to avoid using parameters in the URL to
> pass arguments.
>
> curl "http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text" \
>   --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"
>
> Additionally, it seems that if a subsequent request is sent to the
> indexer to update the file and the metadata are not passed to Solr with
> the request, they are deleted.
>
> Thanks for your help,
>
>
>
> Roland.
>
>
> On Mon, Jun 10, 2013 at 4:14 PM, Jack Krupansky wrote:
>
>> Sorry, but you are STILL not being clear!
>>
>> Are you asking if you can pass Solr parameters as XML fields? No.
>>
>> Are you asking if the file name and path can be indexed as metadata? To
>> some degree:
>>
>> curl "http://localhost:8983/solr/update/extract?literal.id=doc-1\

Re: Adding pdf/word file using JSON/XML

2013-06-13 Thread Roland Everaert
I apologize also for my obscure questions and I thanks you and the list for
your help so far and the very clear explanation you give about the
behaviour of Solr and SolrCell.

I am effectively an intermediary between the list and the dev, because our
development process is not efficient. The full story is (beware its
boring), we are a bunch of devs in a consultancy company waiting for the
next mission. In the mean time, our boss gives us something to do, but
instead of developing a big application where each dev has a module to care
of, or working each on its own machine. We have to develop the same
application with various technologies/tools/language. One is using .NET,
another is using Java and the spring framework and the 3rd one is using
JavaEE. And I am in the middle as a sysadmin/dba/investigator of tools and
API/provider of information and transparent API for everybody while
managing 3 databases, 2 application servers and 2 different indexers on the
same server and take into consideration that at some points in time the
devs will interchange their tools (rdbms and/or indexers) *now you can
breath*.

Top that with the fact that, one of the dev is experienced in REST and web
technologies (the IDIOT ;)) and that I have misread the first line of the
Solr feature page (Solr is a standalone enterprise search server with a
REST-like API), I actually communicate that Solr provides a RESTful API.

So I think I am a bit overwhelmed by the task at hand.

To conclude, yesterday I discuss with the team and we decide that I will
provide a RESTful web service that will hide the access to the indexers
among other things, so even the .NET guy will be able to use it. That will
allow me to study REST and, I hope, make clearer questions in the future.

Thanks again for your help and your patience,


Roland Everaert.




On Wed, Jun 12, 2013 at 4:18 PM, Jack Krupansky wrote:

> I'm sorry if I came across as aggressive or insulting - I'm only trying to
> dig down to what your actual difficulty is - and you have been making that
> extremely difficult for all of us. You need to help us all out here by more
> clearly expressing what your actual problem is. You will have to excuse the
> rest of us if we are unable to read your mind!
>
> It sounds as if you are an intermediary between your devs and this list.
> That's NOT a very effective communications strategy! You need to either
> have your devs communicate directly on this list, or you need to do a much
> better job of understanding what their actual problem is and then
> communicate that actual problem to this list, plainly and clearly.
>
> TRYING to read your mind (and indirectly your devs' minds as well - not an
> easy task!), and reading between the lines, it is starting to sound as if
> you (or/and your devs) are not clear on how Solr works as a "database".
>
> Core Solr does have full CRUD (Add or Create, Read or Query, Update, and
> Delete), although not in a strict, pure REST sense, that is true.
>
> A "full" update in Solr is the same as an Add - add a new, fresh document,
> and then delete the old document. Some people call this an "Upsert"
> (combination of Update or Insert).
>
> There are really two forms of update (a difficulty in REST): 1) full
> update or "replace" - equal to a delete and an add, and 2) partial or
> incremental update. True REST only has the latter.
>
> Core Solr does have support for partial or incremental Update with Atomic
> Updates. Solr will in fact retain the existing data and only update any new
> field values that are supplied on the update request.
>
> SolrCell (Extracting RequestHandler or "/update/extract") is not a core
> part of Solr. It is an add on "contrib" module. It does not have full CRUD
> - no delete, and no partial update, but it does support add and full update.
>
> As someone else already suggested, you can do the work of SolrCell
> yourself by calling Tika directly in your app layer and then sending normal
> Solr CRUD requests.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Roland Everaert
> Sent: Wednesday, June 12, 2013 5:21 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Adding pdf/word file using JSON/XML
>
> 1) Being aggressive and insulting is not a way to help people understand
> such a complex tool, or to help people in general.
>
> 2) I read the feature page of Solr again, and it states that the
> interface is REST-like, not RESTful as I thought in the first place and
> communicated to the devs. And as the devs told me, a RESTful interface
> doesn't use parameters in the URI/URL, so it is my mistake. Hence we
> have no problem with the interface as it is.
>
> Anyway, I still have a question regarding the /extract interface. It seems
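
For completeness, the atomic-update syntax Jack refers to looks roughly
like this in Solr 4.x (a sketch; the id and name fields follow the
earlier examples, and atomic updates require the document's other fields
to be stored so Solr can rebuild it):

curl "http://localhost:8080/solr/update?commit=true" \
     -H "Content-Type: application/json" \
     -d '[{"id": "doc10", "name": {"set": "NEW VALUE"}}]'

Only the name field is replaced; the stored values of the other fields
are preserved, which is exactly the behaviour the /extract handler lacks.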