Re: Overlapping onDeckSearchers=2

2010-05-04 Thread revas
Thanks for the response. What happens in this scenario?

Does the commit happen in this case, or does the search server hang, or does it
just throw an error without committing?

Regards
Sujatha

On Mon, May 3, 2010 at 11:41 PM, Chris Hostetter
wrote:

> : When I run 2-3 commits in parallel to different instances or the same
> : instance I get this error
> :
> : PERFORMANCE WARNING: Overlapping onDeckSearchers=2
> :
> : What is the Best approach to solve this
>
>
> http://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F
>
>
>
> -Hoss
>
>


RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Hello,



But I see that the libraries are being loaded :



INFO: Adding specified lib dirs to ClassLoader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/asm-3.1.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/bcprov-jdk15-1.45.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/commons-compress-1.0.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/commons-logging-1.1.1.jar' 
to classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/dom4j-1.6.1.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/fontbox-1.1.0.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/geronimo-stax-api_1.0_spec-1.0.1.jar'
 to classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/jempbox-1.1.0.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/log4j-1.2.14.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/metadata-extractor-2.4.0-beta-1.jar'
 to classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/pdfbox-1.1.0.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-3.6.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-ooxml-3.6.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-ooxml-schemas-3.6.jar' 
to classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-scratchpad-3.6.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tagsoup-1.2.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-core-0.7.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-parsers-0.7.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xercesImpl-2.8.1.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xml-apis-1.0.b2.jar' to 
classloader

May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xmlbeans-2.3.0.jar' to 
classloader

May 4, 2010 12:50:16 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 'file:/C:/apache-solr-1.4.0/dist/apache-solr-cell-1.4.0.jar' to 
classloader

May 4, 2010 12:50:20 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 'file:/C:/apache-solr-1.4.0/dist/apache-solr-clustering-1.4.0.jar' 
to classloader

May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/carrot2-mini-3.1.0.jar' to 
classloader

May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader

INFO: Adding 
'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/commons-lang-2.4.jar' to 
classloader

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Yes, Grant, you are right. Copying the Tika libraries into the Solr webapp
solved the issue, and content extraction works fine now.

Thanks,
Sandhya

-Original Message-
From: Sandhya Agarwal [mailto:sagar...@opentext.com] 
Sent: Tuesday, May 04, 2010 12:58 PM
To: solr-user@lucene.apache.org
Subject: RE: Problem with pdf, upgrading Cell

Hello,



But I see that the libraries are being loaded :



[classloader log identical to the first message above; elided]

Re: Score cutoff

2010-05-04 Thread Michael Kuhlmann
On 03.05.2010 23:32, Satish Kumar wrote:
> Hi,
> 
> Can someone give clues on how to implement this feature? This is a very
> important requirement for us, so any help is greatly appreciated.
> 

Hi,

I just implemented exactly this feature. You need to patch Solr to make
this work.

We at Zalando are planning to set up a technology blog where we'll offer
such tools, but at the moment this is not done. I can make a patch out
of my work and send it to you today.

Greetings,
Michael

> On Tue, Apr 27, 2010 at 5:54 PM, Satish Kumar <
> satish.kumar.just.d...@gmail.com> wrote:
> 
>> Hi,
>>
>> For some of our queries, the top xx (five or so) results are of very high
>> quality and results after xx are very poor. The difference in score for the
>> high quality and poor quality results is high. For example, 3.5 for high
>> quality and 0.8 for poor quality. We want to exclude results with score
>> value that is less than 60% or so of the first result. Is there a filter
>> that does this? If not, can someone please give some hints on how to
>> implement this (we want to do this as part of solr relevance ranking so that
>> the facet counts, etc will be correct).
>>
>>
>> Thanks,
>> Satish
>>
> 
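Until such a patch is available, a rough client-side approximation is to
request scores and drop hits below a fraction of the top score. As Satish
notes, this happens after Solr's ranking, so facet counts will not be
adjusted; the function and response shape below are an illustrative sketch,
not Solr's API:

```python
def cutoff_by_relative_score(docs, fraction=0.6):
    """Keep only docs scoring at least `fraction` of the top score.

    `docs` is a list of dicts with a "score" key, as in a Solr
    response when `fl=*,score` is requested.
    """
    if not docs:
        return []
    top = max(d["score"] for d in docs)
    return [d for d in docs if d["score"] >= fraction * top]

# Example from the thread: 3.5 is high quality, 0.8 is poor.
hits = [{"id": 1, "score": 3.5}, {"id": 2, "score": 3.1},
        {"id": 3, "score": 0.8}, {"id": 4, "score": 0.5}]
print(cutoff_by_relative_score(hits))  # only ids 1 and 2 remain
```

With a 0.6 cutoff the threshold is 2.1, so the 0.8 and 0.5 hits from the
example above are dropped, matching the behavior Satish describes.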



Re: Commit takes 1 to 2 minutes, CPU usage affects other apps

2010-05-04 Thread Markus Fischer

Hi,

On 04.05.2010 03:24, Mark Miller wrote:

On 5/3/10 9:06 AM, Markus Fischer wrote:

we recently began having trouble with our Solr 1.4 instance. We've about
850k documents in the index which is about 1.2GB in size; the JVM which
runs tomcat/solr (no other apps are deployed) has been given 2GB.

We've a forum and run a process every minute which indexes the new
messages. The number of messages updated are from 0 to 20 messages
average. The commit takes about 1 or two minutes, but usually when it
finished a few seconds later the next batch of documents is processed
and the story starts again.

Our environment is provided by a company using purely VMware
infrastructure; the Solr index itself is on an NFS share for which we get some
33MB/s throughput.


That is certainly not a normal commit time for an index of that size.

Note that Solr 1.4 can have issues when working on NFS, but I don't know
that it would have anything to do with this.

Are you using the simple lock factory rather than the default native
lock factory? (as you should do when running on NFS)


I've switched the lockType to "simple" but didn't see any timing 
difference; commits still take somewhere between one and two minutes.
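For reference, the lock factory is configured in solrconfig.xml; a minimal
sketch for Solr 1.4 (assuming the stock config layout; values are illustrative,
not a recommendation for this particular installation):

```xml
<!-- solrconfig.xml: prefer the simple lock factory when the index is on NFS -->
<mainIndex>
  <lockType>simple</lockType>
  <!-- "simple" can leave stale lock files behind after a crash;
       unlockOnStartup removes them when Solr starts -->
  <unlockOnStartup>true</unlockOnStartup>
</mainIndex>
```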


In my latest test case the index was updated with only a single document.


I'm not very familiar with getting more debug information out of Solr; is 
there a way to find out what it is actually doing and what takes so much time?


thanks so far,
- Markus



RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Marc Ghorayeb

Sandhya,
How did you proceed? I did this:
- jar -xf solr.war
- added all of the libs I had into the WEB-INF/lib folder
- recreated the war with jar -cvf solr.war *
- replaced the war file
- deleted the libs in the shared lib folder
- started Tomcat
I'm now getting an error saying this:

SEVERE: org.apache.solr.common.SolrException: Error loading class 
'org.apache.solr.handler.extraction.ExtractingRequestHandler'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
        at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:418)
        at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:454)
        at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:152)
Thanks Grant for investigating the problem!
Marc

> From: sagar...@opentext.com
> To: solr-user@lucene.apache.org
> Date: Tue, 4 May 2010 13:10:25 +0530
> Subject: RE: Problem with pdf, upgrading Cell
> 
> Yes, Grant. You are right. Copying the tika libraries to solr webapp, solved 
> the issue and the content extraction works fine now.
> 
> Thanks,
> Sandhya
> 
> -Original Message-
> From: Sandhya Agarwal [mailto:sagar...@opentext.com] 
> Sent: Tuesday, May 04, 2010 12:58 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Problem with pdf, upgrading Cell
> 
> Hello,
> 
> 
> 
> But I see that the libraries are being loaded :
> 
> 
> 
> [classloader log identical to the first message above; elided]

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
I think this is most likely because tika-core-0.7.jar no longer contains 
tika-config.xml, due to which the default Tika config is loaded. This can be 
seen in the ExtractingRequestHandler.inform() method. Hence the parsers list 
is empty. I am still investigating.
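If the default Tika config is indeed the problem, ExtractingRequestHandler lets
you point Solr Cell at an explicit config via the tika.config init parameter in
solrconfig.xml; a hedged sketch (the file path and the fmap mapping are
illustrative):

```xml
<!-- solrconfig.xml: give Solr Cell an explicit Tika configuration -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
  </lst>
  <!-- load this file instead of the tika-config.xml bundled in the jar -->
  <str name="tika.config">/path/to/tika-config.xml</str>
</requestHandler>
```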

Thanks,
Sandhya

-Original Message-
From: Sandhya Agarwal [mailto:sagar...@opentext.com] 
Sent: Tuesday, May 04, 2010 1:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Problem with pdf, upgrading Cell

Yes, Grant. You are right. Copying the tika libraries to solr webapp, solved 
the issue and the content extraction works fine now.

Thanks,
Sandhya

-Original Message-
From: Sandhya Agarwal [mailto:sagar...@opentext.com] 
Sent: Tuesday, May 04, 2010 12:58 PM
To: solr-user@lucene.apache.org
Subject: RE: Problem with pdf, upgrading Cell

Hello,



But I see that the libraries are being loaded :



[classloader log identical to the first message above; elided]

Re: Problem with pdf, upgrading Cell

2010-05-04 Thread Praveen Agrawal
Maybe, as Sandhya indicated, it was loading the libs from contrib earlier, and
now that you have deleted them from there they are somehow not being 'seen' by
Solr.

Maybe keep them there, and also put them in solr/lib in the Tomcat webapps
directory.

I'm yet to try it myself, though.


On Tue, May 4, 2010 at 2:16 PM, Marc Ghorayeb  wrote:

>
> Sandhya,
> How did you proceed?I did this:- jar -xf solr.war.- i then added all of the
> libs i had into the web-inf/lib folder- i then recreated the jar with jar
> -cvf solr.war *- replaced the war files- deleted the libs in the shared lib
> folder- started tomcat
> i'm now getting an error saying this:
> SEVERE: org.apache.solr.common.SolrException: Error loading class
> 'org.apache.solr.handler.extraction.ExtractingRequestHandler'at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
>at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:418)
>  at
> org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:454)
>  at
> org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:152)
> Thanks Grant for investigating the problem!
> Marc
>
> > From: sagar...@opentext.com
> > To: solr-user@lucene.apache.org
> > Date: Tue, 4 May 2010 13:10:25 +0530
> > Subject: RE: Problem with pdf, upgrading Cell
> >
> > Yes, Grant. You are right. Copying the tika libraries to solr webapp,
> solved the issue and the content extraction works fine now.
> >
> > Thanks,
> > Sandhya
> >
> > -Original Message-
> > From: Sandhya Agarwal [mailto:sagar...@opentext.com]
> > Sent: Tuesday, May 04, 2010 12:58 PM
> > To: solr-user@lucene.apache.org
> > Subject: RE: Problem with pdf, upgrading Cell
> >
> > Hello,
> >
> >
> >
> > But I see that the libraries are being loaded :
> >
> >
> >
> > [classloader log identical to the first message above; elided]

Re: Commit takes 1 to 2 minutes, CPU usage affects other apps

2010-05-04 Thread Peter Sturge
It might be worth checking the VMWare environment - if you're using the
VMWare scsi vmdk and it's shared across multiple VMs and there's a lot of
disk contention (i.e. multiple VMs are all busy reading/writing to/from the
same disk channel), this can really slow down I/O operations.


On Tue, May 4, 2010 at 8:52 AM, Markus Fischer  wrote:

> Hi,
>
>
> On 04.05.2010 03:24, Mark Miller wrote:
>
>> On 5/3/10 9:06 AM, Markus Fischer wrote:
>>
>>> we recently began having trouble with our Solr 1.4 instance. We've about
>>> 850k documents in the index which is about 1.2GB in size; the JVM which
>>> runs tomcat/solr (no other apps are deployed) has been given 2GB.
>>>
>>> We've a forum and run a process every minute which indexes the new
>>> messages. The number of messages updated are from 0 to 20 messages
>>> average. The commit takes about 1 or two minutes, but usually when it
>>> finished a few seconds later the next batch of documents is processed
>>> and the story starts again.
>>>
> >>> Our environment is provided by a company using purely VMware
> >>> infrastructure; the Solr index itself is on an NFS share for which we get
> >>> some 33MB/s throughput.
>>>
>>
>> That is certainly not a normal commit time for an index of that size.
>>
>> Note that Solr 1.4 can have issues when working on NFS, but I don't know
>> that it would have anything to do with this.
>>
>> Are you using the simple lock factory rather than the default native
>> lock factory? (as you should do when running on NFS)
>>
>
> I've switched the lockType to "simple" but didn't see any timing
> difference; it's still somewhat between one or two minutes.
>
> In my last test case I tested with the indexing having been updated with
> only a single document.
>
> I'm not very familiar with getting more debug information or similar out of
> Solr; is there a way to enable something to find out what's actually doing
> and what costs much time?
>
> thanks so far,
> - Markus
>
>


Re: Overlapping onDeckSearchers=2

2010-05-04 Thread Erik Hatcher
The commit happens (twice!), causing potentially double the RAM to be used
for warming two index searchers, one of which will be thrown away right after
warming. It's best to avoid overlapping these warming searchers. Using Solr's
autocommit capability is the easiest way to manage the situation of multiple
indexing clients at a time.
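For example, with autocommit enabled in solrconfig.xml the indexing clients
stop issuing commits themselves and Solr batches them; the thresholds below
are illustrative, not recommendations:

```xml
<!-- solrconfig.xml: let Solr decide when to commit, instead of each client -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>1000</maxDocs>   <!-- commit once this many docs are pending -->
    <maxTime>60000</maxTime>  <!-- ...or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```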


Erik

On May 4, 2010, at 3:12 AM, revas wrote:


Thanks for the response. What happens in this scenario?

Does the commit happen in this case, or does the search server hang, or does
it just throw an error without committing?

Regards
Sujatha

On Mon, May 3, 2010 at 11:41 PM, Chris Hostetter
wrote:

: When I run 2-3 commits in parallel to different instances or the same
: instance I get this error
:
: PERFORMANCE WARNING: Overlapping onDeckSearchers=2
:
: What is the Best approach to solve this


http://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F




-Hoss






full-import cycle, period ?!

2010-05-04 Thread stockii

Hello.

How often do you perform a full-import?

My full-import with DIH runs every night, and a delta-import runs every two
hours.

Is it really necessary to run a full-import every night?

We have a shop system and I think it is necessary to always have the data in
sync. We are discussing it here, and I want to know what the other users
do...

thx ;)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/full-import-cycle-period-tp775478p775478.html
Sent from the Solr - User mailing list archive at Nabble.com.
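
One common pattern (not the only answer to the question above) is frequent
delta-imports plus a periodic full-import as a safety net, since deltas can
miss deletes or fail silently. A hypothetical cron sketch; host, port, core
path, and times are assumptions:

```cron
# nightly full rebuild at 02:00
0 2 * * *    curl -s "http://localhost:8983/solr/dataimport?command=full-import&clean=true"
# delta sync every two hours
0 */2 * * *  curl -s "http://localhost:8983/solr/dataimport?command=delta-import"
```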


Re: SpellChecking

2010-05-04 Thread Jan Kammer

Hi,

thanks, that's exactly what i had forgotten. Now it works fine. :-)

Am 03.05.2010 16:50, schrieb Michael Kuhlmann:

Am 03.05.2010 16:43, schrieb Jan Kammer:
   

Hi,

It worked fine with a normal field. There must be something wrong with the
copyField, or why does the DataImportHandler no longer add/update any documents?
 

Did you define your destination field as multiValued?

-Michael
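
For context on that last point: a copyField destination that receives input
from more than one source (or several copies of the same source) must be
declared multiValued, or indexing fails. A schema.xml sketch; the field and
type names here are made up for illustration:

```xml
<!-- destination must be multiValued because several sources copy into it -->
<field name="spell" type="textSpell" indexed="true" stored="false"
       multiValued="true"/>

<copyField source="title" dest="spell"/>
<copyField source="body"  dest="spell"/>
```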
   




Re: Score cutoff

2010-05-04 Thread dc tech
Michael,
The cutoff filter would be very useful for us as well. We want to use it
with the MoreLikeThis feature, where only the top n similar docs tend to be
really similar.



On 5/4/10, Michael Kuhlmann  wrote:
> Am 03.05.2010 23:32, schrieb Satish Kumar:
>> Hi,
>>
>> Can someone give clues on how to implement this feature? This is a very
>> important requirement for us, so any help is greatly appreciated.
>>
>
> Hi,
>
> I just implemented exactly this feature. You need to patch Solr to make
> this work.
>
> We at Zalando are planning to set up a technology blog where we'll offer
> such tools, but at the moment this is not done. I can make a patch out
> of my work and send it to you today.
>
> Greetings,
> Michael
>
>> On Tue, Apr 27, 2010 at 5:54 PM, Satish Kumar <
>> satish.kumar.just.d...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> For some of our queries, the top xx (five or so) results are of very high
>>> quality and results after xx are very poor. The difference in score for
>>> the
>>> high quality and poor quality results is high. For example, 3.5 for high
>>> quality and 0.8 for poor quality. We want to exclude results with score
>>> value that is less than 60% or so of the first result. Is there a filter
>>> that does this? If not, can someone please give some hints on how to
>>> implement this (we want to do this as part of solr relevance ranking so
>>> that
>>> the facet counts, etc will be correct).
>>>
>>>
>>> Thanks,
>>> Satish
>>>
>>
>
>

-- 
Sent from my mobile device
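
Until a patch like Michael's lands inside Solr, the cutoff rule itself is
simple to state: keep document i while score_i >= fraction * score_0. A
standalone sketch of just that logic, as a client-side filter on returned
scores (note this does not fix facet counts, which is exactly why Satish
wants it inside Solr's ranking):

```java
import java.util.ArrayList;
import java.util.List;

public class ScoreCutoff {
    /** Keep scores that are at least `fraction` of the top (first) score.
     *  Assumes `scores` is sorted descending, as Solr returns them. */
    public static List<Double> cutoff(List<Double> scores, double fraction) {
        List<Double> kept = new ArrayList<Double>();
        if (scores.isEmpty()) {
            return kept;
        }
        double threshold = scores.get(0) * fraction;
        for (double s : scores) {
            if (s >= threshold) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Satish's example: 3.5 is high quality, 0.8 poor; 60% cutoff = 2.1
        List<Double> scores = List.of(3.5, 3.1, 2.2, 0.8, 0.4);
        System.out.println(cutoff(scores, 0.6)); // prints [3.5, 3.1, 2.2]
    }
}
```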


Re: Commit takes 1 to 2 minutes, CPU usage affects other apps

2010-05-04 Thread Markus Fischer

On 04.05.2010 11:01, Peter Sturge wrote:

It might be worth checking the VMWare environment - if you're using the
VMWare scsi vmdk and it's shared across multiple VMs and there's a lot of
disk contention (i.e. multiple VMs are all busy reading/writing to/from the
same disk channel), this can really slow down I/O operations.


Ok, thanks, I'll try to get the information from my hoster.

I noticed that the commit time seems to be constant: it doesn't matter
whether I'm updating only one document or 50 (usually it won't be more).
Maybe these numbers are too low anyway to cause any real impact ...


- Markus
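
A near-constant commit time regardless of document count usually points at
the fixed per-commit costs: flushing and fsyncing the segment, then opening
and warming the new searcher (cache autowarming plus any newSearcher warm-up
queries). One thing worth trying is reducing autowarming in solrconfig.xml;
a sketch with illustrative values:

```xml
<!-- solrconfig.xml: autowarmCount="0" skips cache warming on commit -->
<filterCache class="solr.FastLRUCache" size="512"
             initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512"
                  initialSize="512" autowarmCount="0"/>
<!-- also check the newSearcher listener for expensive warm-up queries -->
```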



Re: Problem with pdf, upgrading Cell

2010-05-04 Thread Grant Ingersoll
Yes, it is loading the libraries, but they are in a different classloader that
the new Tika loading mechanism apparently doesn't have access to.

-Grant

On May 4, 2010, at 3:28 AM, Sandhya Agarwal wrote:

> Hello,
> 
> 
> 
> But I see that the libraries are being loaded :
> 
> 
> 
> INFO: Adding specified lib dirs to ClassLoader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/asm-3.1.jar' 
> to classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to 
> classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/bcprov-jdk15-1.45.jar' to 
> classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/commons-compress-1.0.jar' 
> to classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/commons-logging-1.1.1.jar' 
> to classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/dom4j-1.6.1.jar' to 
> classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/fontbox-1.1.0.jar' to 
> classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/geronimo-stax-api_1.0_spec-1.0.1.jar'
>  to classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/jempbox-1.1.0.jar' to 
> classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/log4j-1.2.14.jar' to 
> classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/metadata-extractor-2.4.0-beta-1.jar'
>  to classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/pdfbox-1.1.0.jar' to 
> classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-3.6.jar' 
> to classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-ooxml-3.6.jar' to 
> classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-ooxml-schemas-3.6.jar' 
> to classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-scratchpad-3.6.jar' to 
> classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tagsoup-1.2.jar' to 
> classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-core-0.7.jar' to 
> classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-parsers-0.7.jar' to 
> classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xercesImpl-2.8.1.jar' to 
> classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xml-apis-1.0.b2.jar' to 
> classloader
> 
> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 
> 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xmlbeans-2.3.0.jar' to 
> classloader
> 
> May 4, 2010 12:50:16 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> 
> INFO: Adding 'file:/C:/apache-solr-1.4.0/dist/apache-solr-cell-1.4.0.jar' to 
> classloader
> 
> May 4, 2010 12:50:20 PM

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Ok. In Tika 0.4 and 0.5, I see that this is how the Tika config is loaded:



public static TikaConfig getDefaultConfig() {
    InputStream stream;
    try {
        stream = TikaConfig.class.getResourceAsStream(
                "/org/apache/tika/tika-config.xml");
        return new TikaConfig(stream);
    } catch (IOException e) {
        throw new RuntimeException("Unable to read default configuration", e);
    } catch (SAXException e) {
        throw new RuntimeException("Unable to parse default configuration", e);
    } catch (TikaException e) {
        throw new RuntimeException("Unable to access default configuration", e);
    }
}



And this has changed in Tika 0.7 to:



public TikaConfig() throws MimeTypeException, IOException {
    this.parsers = new HashMap();

    ParseContext context = new ParseContext();
    Iterator iterator = ServiceRegistry.lookupProviders(Parser.class);

    while (iterator.hasNext()) {
        Parser parser = (Parser) iterator.next();
        for (Iterator it = parser.getSupportedTypes(context).iterator(); it.hasNext(); ) {
            MediaType type = (MediaType) it.next();
            this.parsers.put(type.toString(), parser);
        }
    }
    this.mimeTypes = MimeTypesFactory.create("tika-mimetypes.xml");
}



Hence the reason why tika-config.xml is no longer bundled.



Thanks,

Sandhya
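
That ServiceRegistry.lookupProviders call is the javax.imageio variant of the
standard Java SPI pattern, and it is classloader-sensitive, which matches
Grant's diagnosis. A small demo of the same pattern using
java.util.ServiceLoader against a JDK-provided service (nothing Tika-specific
here; it just makes the classloader dependence visible):

```java
import java.nio.file.spi.FileSystemProvider;
import java.util.ServiceLoader;

public class SpiDemo {
    public static void main(String[] args) {
        // SPI lookup walks the given classloader for provider registrations.
        // A loader that cannot see the provider jars finds nothing -- which
        // is how a contrib-lib classloader setup can break Tika 0.7's
        // parser discovery even though the jars were "loaded".
        ServiceLoader<FileSystemProvider> providers =
                ServiceLoader.load(FileSystemProvider.class,
                                   ClassLoader.getSystemClassLoader());
        for (FileSystemProvider p : providers) {
            System.out.println(p.getScheme());
        }
    }
}
```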



-Original Message-
From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll
Sent: Tuesday, May 04, 2010 4:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Problem with pdf, upgrading Cell



Yes, it is loading the libraries, but they are in a different classloader that 
apparently the new way Tika loads doesn't have access to.



-Grant



On May 4, 2010, at 3:28 AM, Sandhya Agarwal wrote:



> Hello,
> 
> But I see that the libraries are being loaded :
> 
> [...]

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Marc Ghorayeb

Hey,
I got it to work. I just redid my steps; I had forgotten several libraries that 
were imported through the XML. PDF extraction seems to work once again; I have 
yet to find a PDF that raises an exception!

Thanks for the investigation, at least we now have a fix :)
Marc  
_
Hotmail arrives on your phone! Compatible with iPhone, Windows Phone, 
BlackBerry, …
http://www.messengersurvotremobile.com/?d=Hotmail

Need help in filtering records based on radius value in solr

2010-05-04 Thread KshamaPai

Hi,
I am using solr with  Lucene spatial 2.9.1 as per
http://www.ibm.com/developerworks/java/library/j-spatial/ 

I want to write a query that will retrieve records within a given radius
using the hsin function, with cartesian tiers as filters. So I wrote a query
like this:
 
http://localhost:8983/solr/select/?q=body:engineering colleges^0 AND
_val_:"recip(hsin(0.227486,1.354193 , lat_rad, lng_rad, 4), 1, 1, 0)"^100
&&fq={!tier x=13.033993 y=77.589569 radians=false dist=4 prefix=tier_
unit=m}

But the records retrieved do not vary even if I change the radius. Can
anyone tell me if anything is wrong with the query, or whether there is a
Solr configuration issue that must be fixed to make this work?

Thanks in advance.  
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-in-filtering-records-based-on-radius-value-in-solr-tp775644p775644.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with pdf, upgrading Cell

2010-05-04 Thread Praveen Agrawal
I seem to have mixed results.

Here is what I did:
I copied the new Tika/POI/jempbox/pdfbox/fontbox/log4j jars etc. into
contrib/extraction/lib (of course removing the old ones), as well as into
WEB-INF/lib of the Solr web app in Tomcat.

Now it extracts content from some PDFs, but either no content from others,
or only a line of content. For example, "/docs/Installing Solr in Tomcat.pdf"
still shows no content. I have two other PDFs for which it extracts only one
line of content.

Also, I'm now getting a single 'title' field value for some PDFs, and two
for others. In the cases where it can extract the full content, it shows the
title I gave as a literal while submitting the PDF. For the PDF where no
content was extracted, it shows one empty title plus mine. For the PDFs where
it extracted only one line of content, it shows that line as a title too,
plus mine. The 'title' field is defined as multiValued in the schema.

Any idea what's going on? Or am I missing something?



On Tue, May 4, 2010 at 4:13 PM, Marc Ghorayeb  wrote:

>
> Hey,
> I got it to work. I just redid my steps, i had forgotten several libraries
> that were imported through the xml. PDF extraction seems to work once again,
> i have yet to find one that raises an exception!
>
> Thanks for the investigation, at least we now have a fix :)
> Marc
> _
> Hotmail arrive sur votre téléphone ! Compatible Iphone, Windows Phone,
> Blackberry, …
> http://www.messengersurvotremobile.com/?d=Hotmail
>
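
Regarding the doubled 'title' values: Solr Cell indexes both Tika's extracted
title metadata and any literal.title passed with the request, which is why
the field must be multiValued to hold both. If only the literal value is
wanted, the extracted one can be routed elsewhere with a field mapping; a
hypothetical solrconfig.xml sketch (the destination field names are
assumptions):

```xml
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- route Tika's extracted "title" into its own field so it does not
         collide with the literal.title supplied at index time -->
    <str name="fmap.title">extracted_title</str>
    <!-- prefix for otherwise-unknown Tika metadata fields -->
    <str name="uprefix">attr_</str>
  </lst>
</requestHandler>
```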


RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Praveen,

Along with the Tika core and parser jars, did you run "mvn 
dependency:copy-dependencies" to pull in all the transitive dependencies too?

Thanks,
Sandhya

-Original Message-
From: Praveen Agrawal [mailto:pkal...@gmail.com] 
Sent: Tuesday, May 04, 2010 4:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Problem with pdf, upgrading Cell

[...]


Re: Problem with pdf, upgrading Cell

2010-05-04 Thread Praveen Agrawal
Yes Sandhya,
I copied the new poi/jempbox/pdfbox/fontbox etc. jars too. I believe this is
what you were asking.
Thanks.


On Tue, May 4, 2010 at 5:01 PM, Sandhya Agarwal wrote:

> Praveen,
>
> Along with the tika core and parser jars, did you run "mvn
> dependency:copy-dependencies", to generate all the dependencies too.
>
> Thanks,
> Sandhya
>
> -Original Message-
> From: Praveen Agrawal [mailto:pkal...@gmail.com]
> Sent: Tuesday, May 04, 2010 4:52 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with pdf, upgrading Cell
>
> [...]
>


RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Ok. So, I am assuming you copied all the dependencies from
tika-app\target\dependency? I tried with a number of files and don't see this
issue yet.

Thanks,
Sandhya

-Original Message-
From: Praveen Agrawal [mailto:pkal...@gmail.com] 
Sent: Tuesday, May 04, 2010 5:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Problem with pdf, upgrading Cell

Yes Sandhya,
i copied new poi/jempbox/pdfbox/fontbox etc jars too. I believe this is what
you were asking.
Thanks.


On Tue, May 4, 2010 at 5:01 PM, Sandhya Agarwal wrote:

> Praveen,
>
> Along with the tika core and parser jars, did you run "mvn
> dependency:copy-dependencies", to generate all the dependencies too.
>
> Thanks,
> Sandhya
>
> -Original Message-
> From: Praveen Agrawal [mailto:pkal...@gmail.com]
> Sent: Tuesday, May 04, 2010 4:52 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with pdf, upgrading Cell
>
> [...]
>


Re: Problem with pdf, upgrading Cell

2010-05-04 Thread Praveen Agrawal
This email contained a .zip file attachment. Raytheon does not allow email 
attachments that are considered likely to contain malicious code. For your 
protection this attachment has been removed.

If this email is from an unknown source, please simply delete this email.

If this email was expected, and it is from a known sender, you may follow the 
below suggested instructions to obtain these types of attachments.

+ Instruct the sender to enclose the file(s) in a ".zip" compressed file, and 
rename the ".zip" compressed file with a different extension, such as, 
".rtnzip".  Password protecting the renamed ".zip" compressed file adds an 
additional layer of protection. When you receive the file, please rename it 
with the extension ".zip".

Additional instructions and options on how to receive these attachments can be 
found at:

http://security.it.ray.com/antivirus/extensions.html
http://security.it.ray.com/news/2007/zipfiles.html

Should you have any questions or difficulty with these instructions, please 
contact the Help Desk at 877.844.4712

---

It bounced because of the attachment's size.
Attaching them one by one now.


On Tue, May 4, 2010 at 5:17 PM, Praveen Agrawal  wrote:

> I noticed the following pattern/relationship between producer/creator and
> content extraction; not sure if helpful (as Grant said earlier, PDFs are
> notorious):
>
> producer: Bullzip PDF Printer / www.bullzip.com / Freeware Edition (not
> registered)
> Creator: PScript5.dll Version 5.2.2
> Extraction: no content  --  "installing Solr in Tomcat.pdf" (attached - i
> generated)
> -
>
> Producer: Acrobat Distiller 7.0.5 (Windows)
> creator: PScript5.dll Version 5.2.2
> Extraction: One line content
> --
>
> Producer: Acrobat Distiller 8.1.0 (Windows)
> creator: Acrobat PDFMaker 8.1 for Word
> Extraction:  one line of content(Free_Two_way_Radio_Guide.pdf - attached
> - was available freely on their website)
> -
>
> Producer: FOP 0.20.5
> Extraction: full content -- "/docs/features.pdf", "linkmap.pdf", etc.
> --
> Thanks.
> Praveen
>
>
>
> On Tue, May 4, 2010 at 5:05 PM, Praveen Agrawal wrote:
>
>> [...]


RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Both the files work for me, Praveen.

Thanks,
Sandhya

From: Praveen Agrawal [mailto:pkal...@gmail.com]
Sent: Tuesday, May 04, 2010 5:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Problem with pdf, upgrading Cell

another one here..

[...]






Re: Problem with pdf, upgrading Cell

2010-05-04 Thread Praveen Agrawal

another one here..

On Tue, May 4, 2010 at 5:20 PM, Praveen Agrawal  wrote:

> It bounced because of attachment's size..
> attaching one by one now..
>
>
>
> On Tue, May 4, 2010 at 5:17 PM, Praveen Agrawal  wrote:
>
>> I noticed following pattern/relationship b/w producer/creator and content
>> extraction, not sure if helpful (as Grant told earlier pdfs are notorious):
>>
>> producer: Bullzip PDF Printer / www.bullzip.com / Freeware Edition (not
>> registered)
>> Creator: PScript5.dll Version 5.2.2
>> Extraction: no content  --  "installing Solr in Tomcat.pdf" (attached - i
>> generated)
>> -
>>
>> Producer: Acrobat Distiller 7.0.5 (Windows)
>> creator: PScript5.dll Version 5.2.2
>> Extraction: One line content
>> --
>>
>> Producer: Acrobat Distiller 8.1.0 (Windows)
>> creator: Acrobat PDFMaker 8.1 for Word
>> Extraction:  one line of content(Free_Two_way_Radio_Guide.pdf - attached
>> - was available freely on their website)
>> -
>>
>> Producer: FOP 0.20.5
>> Extraction: full content"/docs/features.pdf | linkmap.pdf" etc
>> --
>> Thanks.
>> Praveen
>>
>>
>>
>> On Tue, May 4, 2010 at 5:05 PM, Praveen Agrawal wrote:
>>
>>> Yes Sandhya,
>>> i copied new poi/jempbox/pdfbox/fontbox etc jars too. I believe this is
>>> what you were asking.
>>> Thanks.
>>>
>>>
>>>
>>> On Tue, May 4, 2010 at 5:01 PM, Sandhya Agarwal 
>>> wrote:
>>>
 Praveen,

 Along with the tika core and parser jars, did you run "mvn
 dependency:copy-dependencies", to generate all the dependencies too.

 Thanks,
 Sandhya

 -Original Message-
 From: Praveen Agrawal [mailto:pkal...@gmail.com]
 Sent: Tuesday, May 04, 2010 4:52 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Problem with pdf, upgrading Cell

 I seem to have mixed results:

 Here is what i did:
 copied new Tika/poi/jempbox/pdfbox/fontbox/log4j jars etc in
 contrib/extraction/lib (of-course removed old ones),. as well as in
 web-inf/lib of solr web app in tomcat.

 Now it extracts contents from some pdf, but either no content from
 others,
 or only a line of content. For ex, "/docs/Installing Solr in Tomcat.pdf"
 still shows no contents. I've two other pdfs, for which it extracts only
 one
 line of content.

 Also, now i;m getting a field 'title' single value for some pdfs, and
 two
 for others. In case where it can extract full content, it shows title as
 what i gave as literal while submitting the pdf. For the pdf where no content
 was
 extracted, it shows one empty title and one mine. For pdf where it
 extracted
 only one line of content, it shows that line as title too and mine one.
 'title' field is defined as multivalue in schema.

 Any idea, whats going on? or am i missing something?



 On Tue, May 4, 2010 at 4:13 PM, Marc Ghorayeb 
 wrote:

 >
 > Hey,
 > I got it to work. I just redid my steps, i had forgotten several
 libraries
 > that were imported through the xml. PDF extraction seems to work once
 again,
 > i have yet to find one that raises an exception!
 >
 > Thanks for the investigation, at least we now have a fix :)
 > Marc
 > _
 > Hotmail arrive sur votre téléphone ! Compatible Iphone, Windows Phone,
 > Blackberry, …
 > http://www.messengersurvotremobile.com/?d=Hotmail
 >

>>>
>>>
>>
>


RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Marc Ghorayeb

Praveen,
Did you try the technique I wrote a little earlier?Take your solr.war, put it 
in a directory of its own. Execute "jar -xf solr.war", that should extract its 
content. Next, copy all of your libraries inside the WEB-INF/lib folder. This 
means all the extraction/lib files, and the lib files from the Solr's roots. 
Once this is done, we now recreate the solr.war by doing "jar -cvf solr.war *" 
(the * meaning all the files inside the current directory, so be sure to be 
inside the root directory where you extracted the war previously).
Once this is done, put the new solr.war inside the tomcat webapps folder, and 
recreate from scratch the solr folder (so as not to leave any overlapping 
libraries). This should work hopefully.
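The repack recipe above can be sketched with Python's zipfile module so it is runnable anywhere; "jar -xf" and "jar -cvf" do the same thing, since a .war is just a zip archive. The file names below (the stand-in war, the tika jar) are illustrative only:

```python
import os, zipfile, tempfile

work = tempfile.mkdtemp()
war = os.path.join(work, "solr.war")

# Stand-in for the original solr.war (a minimal zip with one entry):
with zipfile.ZipFile(war, "w") as z:
    z.writestr("WEB-INF/web.xml", "<web-app/>")

# 1. Explode the war (equivalent of "jar -xf solr.war"):
exploded = os.path.join(work, "exploded")
with zipfile.ZipFile(war) as z:
    z.extractall(exploded)

# 2. Drop the extraction/lib jars into WEB-INF/lib (here an empty stand-in):
libdir = os.path.join(exploded, "WEB-INF", "lib")
os.makedirs(libdir)
open(os.path.join(libdir, "tika-core-0.7.jar"), "wb").close()

# 3. Repack from inside the exploded tree (equivalent of "jar -cvf solr.war *"):
newwar = os.path.join(work, "solr-new.war")
with zipfile.ZipFile(newwar, "w") as z:
    for root, _, files in os.walk(exploded):
        for f in files:
            full = os.path.join(root, f)
            z.write(full, os.path.relpath(full, exploded))
```

Repacking from inside the tree matters: the entry names must be relative (WEB-INF/lib/...), not prefixed with the temp directory, or the container will not find them.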
For the multivalued fields (title for example), this is a known feature/issue of 
Tika's integration. In my case, I always provide a literal.title along with my 
pdfs, but if Tika successfully extracts a title from the Pdf's meta, then it 
will create the Solr index entry with an array of the inputted literal, and the 
extracted value. There is no way to force an override of the extracted data 
with the literals, they just get appended. Someone correct me if i am wrong 
here :)
Marc
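The appended-title behavior described above shows up with a request like the following sketch. The handler path /update/extract matches the stock Solr 1.4 example configuration, and the document id and title are illustrative; if Tika also extracts a title from the PDF metadata, the `title` field ends up with both values:

```python
from urllib.parse import urlencode

params = {
    "literal.id": "doc1",
    "literal.title": "My own title",  # appended to any Tika-extracted title
    "commit": "true",
}
url = "http://localhost:8983/solr/update/extract?" + urlencode(params)
# The PDF itself is sent as the POST body (e.g. curl -F "file=@doc.pdf" on this URL).
```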

> Date: Tue, 4 May 2010 11:58:56 +
> From: pkal...@gmail.com
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with pdf, upgrading Cell
> 
> another one here..
> 
> On Tue, May 4, 2010 at 5:20 PM, Praveen Agrawal  wrote:
> 
> > It bounced because of attachment's size..
> > attaching one by one now..
> >
> >
> >
> > On Tue, May 4, 2010 at 5:17 PM, Praveen Agrawal  wrote:
> >
> >> I noticed following pattern/relationship b/w producer/creator and content
> >> extraction, not sure if helpful (as Grant told earlier pdfs are notorious):
> >>
> >> producer: Bullzip PDF Printer / www.bullzip.com / Freeware Edition (not
> >> registered)
> >> Creator: PScript5.dll Version 5.2.2
> >> Extraction: no content  --  "installing Solr in Tomcat.pdf" (attached - i
> >> generated)
> >> -
> >>
> >> Producer: Acrobat Distiller 7.0.5 (Windows)
> >> creator: PScript5.dll Version 5.2.2
> >> Extraction: One line content
> >> --
> >>
> >> Producer: Acrobat Distiller 8.1.0 (Windows)
> >> creator: Acrobat PDFMaker 8.1 for Word
> >> Extraction:  one line of content(Free_Two_way_Radio_Guide.pdf - 
> >> attached
> >> - was available freely on their website)
> >> -
> >>
> >> Producer: FOP 0.20.5
> >> Extraction: full content"/docs/features.pdf | linkmap.pdf" etc
> >> --
> >> Thanks.
> >> Praveen
> >>
> >>
> >>
> >> On Tue, May 4, 2010 at 5:05 PM, Praveen Agrawal wrote:
> >>
> >>> Yes Sandhya,
> >>> i copied new poi/jempbox/pdfbox/fontbox etc jars too. I believe this is
> >>> what you were asking.
> >>> Thanks.
> >>>
> >>>
> >>>
> >>> On Tue, May 4, 2010 at 5:01 PM, Sandhya Agarwal 
> >>> wrote:
> >>>
>  Praveen,
> 
>  Along with the tika core and parser jars, did you run "mvn
>  dependency:copy-dependencies", to generate all the dependencies too.
> 
>  Thanks,
>  Sandhya
> 
>  -Original Message-
>  From: Praveen Agrawal [mailto:pkal...@gmail.com]
>  Sent: Tuesday, May 04, 2010 4:52 PM
>  To: solr-user@lucene.apache.org
>  Subject: Re: Problem with pdf, upgrading Cell
> 
> I seem to have mixed results:
> 
>  Here is what i did:
>  copied new Tika/poi/jempbox/pdfbox/fontbox/log4j jars etc in
>  contrib/extraction/lib (of-course removed old ones),. as well as in
>  web-inf/lib of solr web app in tomcat.
> 
>  Now it extracts contents from some pdf, but either no content from
>  others,
>  or only a line of content. For ex, "/docs/Installing Solr in Tomcat.pdf"
>  still shows no contents. I've two other pdfs, for which it extracts onl

RE: Commit takes 1 to 2 minutes, CPU usage affects other apps

2010-05-04 Thread cbennett
Hi,

This could also be caused by performing an optimize after the commit, or it
could be caused by auto warming the caches, or a combination of both.

If you are using the Data Import Handler the default for a delta import is
commit and optimize, which caused us a similar problem except we were
optimizing a 7 million document, 23Gb index with every delta import which
was taking over 10 minutes. As soon as we added optimize=false to the
command updates took a few seconds. You can always add separate calls to
perform the optimize when it's convenient for you.

To see if the problem is auto warming take a look at the warm up time for
the searcher. If this is the cause you will need to consider lowering the
autowarmCount for your caches. 


Colin.

> -Original Message-
> From: Markus Fischer [mailto:mar...@fischer.name]
> Sent: Tuesday, May 04, 2010 6:22 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Commit takes 1 to 2 minutes, CPU usage affects other apps
> 
> On 04.05.2010 11:01, Peter Sturge wrote:
> > It might be worth checking the VMWare environment - if you're using
> the
> > VMWare scsi vmdk and it's shared across multiple VMs and there's a
> lot of
> > disk contention (i.e. multiple VMs are all busy reading/writing
> to/from the
> > same disk channel), this can really slow down I/O operations.
> 
> Ok, thanks, I'll try to get the information from my hoster.
> 
> I noticed that the committing seems to be constant in time: it doesn't
> matter whether I'm updating only one document or 50 (usually it won't be
> more). Maybe these numbers are too low anyway to cause any real impact
> ...
> 
> - Markus
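The optimize=false advice above amounts to plain HTTP parameters on the Data Import Handler. A sketch of the URLs involved; the endpoint name "/dataimport" matches the stock Solr 1.4 example config, but yours may differ:

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/dataimport"

def dih_url(command, **params):
    """Build a Data Import Handler request URL."""
    query = {"command": command, **params}
    return base + "?" + urlencode(query)

# Delta import without the implicit optimize (the fix described above):
delta = dih_url("delta-import", commit="true", optimize="false")

# A separate optimize, issued via the update handler when it is convenient:
optimize = "http://localhost:8983/solr/update?optimize=true"
```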






Short DismaxRequestHandler Question

2010-05-04 Thread MitchK

Hello community,

I need a minimum should match only on some fields, not on all.

Let me give you an example:
title: "Breaking News: New information about Solr 1.5"
category: development
tag: Solr News

If I am searching for "Solr development", I want to return this doc,
although I defined a minimum should match of 100%, because 100% of the query
matches the *whole* document.
At the moment, 100% applies only if 100% of the query matches a single field.

Is this possible at the moment?
If not, are there any suggestions or practices to make this working?

Thank you.

Kind regards
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Short-DismaxRequestHandler-Question-tp775913p775913.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom SearchComponent to reset facet value counts after collapse

2010-05-04 Thread MitchK

When is the returned facet-info the expected info for your multiValued
fields?
Before or after your collapse?
It could be that you need to facet on your multiValued fields
before collapsing to retrieve the right values.
If this is the case, you need to integrate the before-collapsing feature of
the collapsing patch into your own component; the rest is done by the patch
itself.

Hope this helps.

Kind regards
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-SearchComponent-to-reset-facet-value-counts-after-collapse-tp770826p776067.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: cores and SWAP

2010-05-04 Thread Tim Heckman
If it helps, I am running:

solr 1.4.0
tomcat 6.0.26

java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)

Red Hat Enterprise Linux Server release 5.4 (Tikanga)


thanks,
Tim


On Mon, May 3, 2010 at 4:47 PM, Tim Heckman  wrote:
> I have 2 cores: core1 and core2.
>
> Load the same data set into each and commit. Verify that searches
> return the same for each core.
>
> Delete a document (call it docA) from core2 but not from core1.
>
> Commit and verify search results (docA disappears from core2's search
> results. core1 continues to return the docA)
>
> Swap cores.
>
> Core2 should now return docA, but it doesn't until I reload core2.
>
>
> thanks,
> Tim
>
>
> On Mon, May 3, 2010 at 1:41 PM, Shalin Shekhar Mangar
>  wrote:
>> On Mon, May 3, 2010 at 10:27 PM, Tim Heckman  wrote:
>>
>>> Hi, I'm trying to figure out whether I need to reload a core (or both
>>> cores?) after performing a swap.
>>>
>>> When I perform a swap in my sandbox (non-production) environment, I am
>>> seeing that one of the cores needs to be reloaded following a swap and
>>> the other does not, but I haven't been able to find a pattern to which
>>> one it will be.
>>>
>>>
>> No, you should not need to reload any core after a swap. What is the
>> behavior that you see?
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>


Spatial Solr: problem with multiValued PointType

2010-05-04 Thread pointbreak+solr
I want to link documents to multiple spatial points, and filter
documents based on a bounding box. I was expecting that the
solr.PointType would help me with that, but run into a problem. When I
create a filter, it seems that Solr matches the latitude and longitude
of the PointType separately. Could somebody please advice me if this is
expected behavior, and if so how to handle this usecase.

My setup is as follows:

in schema.xml:
   
   

I create a document with the following locations:
   52.3672174, 4.9126891  and:
   52.3624717, 4.9106624

This document will match with the filter:
   location:[52.362,4.911 TO 52.363,4.913]

I would have expected it not to match, since both locations are outside
this bounding box (the longitude of one point and the latitude of the
other would each be inside the bounding box, but no single point is).

Thanks for any help.
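The match described above is consistent with PointType indexing each dimension in its own subfield, so a range filter constrains latitude and longitude independently across all values of a multiValued field. A pure-Python sketch of the two semantics (no Solr needed; the numbers are the ones from the message):

```python
points = [(52.3672174, 4.9126891), (52.3624717, 4.9106624)]
box = ((52.362, 4.911), (52.363, 4.913))  # (lat_min, lon_min), (lat_max, lon_max)

def per_dimension_match(points, box):
    """What the filter effectively does: each dimension checked independently."""
    (lat_lo, lon_lo), (lat_hi, lon_hi) = box
    lat_ok = any(lat_lo <= lat <= lat_hi for lat, _ in points)
    lon_ok = any(lon_lo <= lon <= lon_hi for _, lon in points)
    return lat_ok and lon_ok

def per_point_match(points, box):
    """What the poster expected: some single point entirely inside the box."""
    (lat_lo, lon_lo), (lat_hi, lon_hi) = box
    return any(lat_lo <= lat <= lat_hi and lon_lo <= lon <= lon_hi
               for lat, lon in points)

print(per_dimension_match(points, box))  # True: the document matches
print(per_point_match(points, box))      # False: neither point is inside
```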


Re: Short DismaxRequestHandler Question

2010-05-04 Thread Papiya Misra

I think you could combine the minimum set of fields into one field at
indexing time; for example, you could concatenate 'category' and
'tag' while building the index (easy if you index from a database).

On 05/04/2010 09:06 AM, MitchK wrote:

Hello community,

I need a minimum should match only on some fields, not on all.

Let me give you an example:
title: "Breaking News: New information about Solr 1.5"
category: development
tag: Solr News

If I am searching for "Solr development", I want to return this doc,
although I defined a minimum should match of 100%, because 100% of the query
matches the *whole* document.
At the moment, 100% applies only if 100% of the query matches a single field.

Is this possible at the moment?
If not, are there any suggestions or practices to make this working?

Thank you.

Kind regards
- Mitch



Pink OTC Markets Inc. provides the leading inter-dealer quotation and trading 
system in the over-the-counter (OTC) securities market.   We create innovative 
technology and data solutions to efficiently connect market participants, 
improve price discovery, increase issuer disclosure, and better inform 
investors.   Our marketplace, comprised of the issuer-listed OTCQX and 
broker-quoted   Pink Sheets, is the third largest U.S. equity trading venue for 
company shares.

This document contains confidential information of Pink OTC Markets and is only 
intended for the recipient.   Do not copy, reproduce (electronically or 
otherwise), or disclose without the prior written consent of Pink OTC Markets.  
If you receive this message in error, please destroy all copies in your 
possession (electronically or otherwise) and contact the sender above.


Re: Short DismaxRequestHandler Question

2010-05-04 Thread MitchK

Thank you for responding.

This would be possible. However, I wouldn't like to do so, because a match
in "title" should be boosted higher than a match in "category".


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Short-DismaxRequestHandler-Question-tp775913p776238.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with pdf, upgrading Cell

2010-05-04 Thread Praveen Agrawal
Hi Sandhya..
I must be missing something. I copied all dependency jars to both the
contrib/extraction/lib and WEB-INF/lib folders. Here is the list of jars
copied:

asm-3.1.jar
bcmail-jdk15-1.45.jar
bcprov-jdk15-1.45.jar
commons-compress-1.0.jar
commons-logging-1.1.1.jar
dom4j-1.6.1.jar
fontbox-1.1.0.jar
geronimo-stax-api_1.0_spec-1.0.1.jar
hamcrest-core-1.1.jar
jempbox-1.1.0.jar
junit-3.8.1.jar
log4j-1.2.14.jar
metadata-extractor-2.4.0-beta-1.jar
mockito-core-1.7.jar
nekohtml-1.9.9.jar
objenesis-1.0.jar
ooxml-schemas-1.0.jar
pdfbox-1.1.0.jar
poi-3.6.jar
poi-ooxml-3.6.jar
poi-ooxml-schemas-3.6.jar
poi-scratchpad-3.6.jar
tagsoup-1.2.jar
tika-core-0.7.jar
tika-parsers-0.7.jar
xml-apis-1.0.b2.jar
xmlbeans-2.3.0.jar

Still the same result for me.

Marc,
i'm on Windows, and i copied the above jars directly into the already extracted
folder webapps/solr/WEB-INF/lib, in addition to what was already there. I
didn't explicitly un-jar and re-jar the solr.war, but do you think that
could be the issue? i think Tomcat extracts the war and uses the folder in
webapps (i didn't put the war file in webapps; instead i had put the extracted
solr folder there directly)

If it has worked for you guys, specially with my two pdfs, then that's
really great. Please let me know your exact procedure, including what all
you copied and where, or if you see i missed something obvious..

Thanks,
Praveen


On Tue, May 4, 2010 at 5:28 PM, Sandhya Agarwal wrote:

> Both the files work for me, Praveen.
>
> Thanks,
> Sandhya
>
> From: Praveen Agrawal [mailto:pkal...@gmail.com]
> Sent: Tuesday, May 04, 2010 5:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with pdf, upgrading Cell
>
> another one here..
> On Tue, May 4, 2010 at 5:20 PM, Praveen Agrawal <pkal...@gmail.com> wrote:
> It bounced because of attachment's size..
> attaching one by one now..
>
>
> On Tue, May 4, 2010 at 5:17 PM, Praveen Agrawal <pkal...@gmail.com> wrote:
> I noticed following pattern/relationship b/w producer/creator and content
> extraction, not sure if helpful (as Grant told earlier pdfs are notorious):
>
> producer: Bullzip PDF Printer / www.bullzip.com /
> Freeware Edition (not registered)
> Creator: PScript5.dll Version 5.2.2
> Extraction: no content  --  "installing Solr in Tomcat.pdf" (attached - i
> generated)
> -
>
> Producer: Acrobat Distiller 7.0.5 (Windows)
> creator: PScript5.dll Version 5.2.2
> Extraction: One line content
> --
>
> Producer: Acrobat Distiller 8.1.0 (Windows)
> creator: Acrobat PDFMaker 8.1 for Word
> Extraction:  one line of content(Free_Two_way_Radio_Guide.pdf -
> attached - was available freely on their website)
> -
>
> Producer: FOP 0.20.5
> Extraction: full content"/docs/features.pdf | linkmap.pdf" etc
> --
> Thanks.
> Praveen
>
>
> On Tue, May 4, 2010 at 5:05 PM, Praveen Agrawal <pkal...@gmail.com> wrote:
> Yes Sandhya,
> i copied new poi/jempbox/pdfbox/fontbox etc jars too. I believe this is
> what you were asking.
> Thanks.
>
>
> On Tue, May 4, 2010 at 5:01 PM, Sandhya Agarwal wrote:
> Praveen,
>
> Along with the tika core and parser jars, did you run "mvn
> dependency:copy-dependencies", to generate all the dependencies too.
>
> Thanks,
> Sandhya
>
> -Original Message-
> From: Praveen Agrawal [mailto:pkal...@gmail.com]
> Sent: Tuesday, May 04, 2010 4:52 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with pdf, upgrading Cell
> I seem to have mixed results:
>
> Here is what i did:
> copied new Tika/poi/jempbox/pdfbox/fontbox/log4j jars etc in
> contrib/extraction/lib (of-course removed old ones),. as well as in
> web-inf/lib of solr web app in tomcat.
>
> Now it extracts contents from some pdf, but either no content from others,
> or only a line of content. For ex, "/docs/Installing Solr in Tomcat.pdf"
> still shows no contents. I've two other pdfs, for which it extracts only
> one
> line of content.
>
> Also, now i;m getting a field 'title' single value for some pdfs, and two
> for others. In case where it can extract full content, it shows title as
> what i gave as literal while submitting the pdf. For the pdf where no content
> was
> extracted, it shows one empty title and one mine. For pdf where it
> extracted
> only one line of content, it shows that line as title too and mine one.
> 'title' field is defined as multivalue in schema.
>
> Any idea, whats going on? or am i missing something?
>
>
>
> On Tue, May 4, 2010 at 4:13 PM, Marc Ghorayeb wrote:
>
> >
> > Hey,
> > I got it to work. I just redid my steps, i had forgotten several
> libraries
> > that were imported through the xml. PDF extraction seems to work once
> again,
> > i have yet to find one that raises an exception!
> >
> > Thanks for the investigation, at least we now have a fix :)
> > Marc
> > _

Lucidworks

2010-05-04 Thread joyce chan
Hi

Does anybody know how to install LucidWorks Solr (LucidWorks.jar) without
the gui installer?  Or maybe to do it as a silent install?

Thanks
Joyce


Re: Lucidworks

2010-05-04 Thread joyce chan
Sorry, please ignore my previous message, I figured it out.  (That is, use
the console mode)

On Tue, May 4, 2010 at 11:01 AM, joyce chan  wrote:

> Hi
>
> Does anybody know how to install LucidWorks Solr (LucidWorks.jar) without
> the gui installer?  Or maybe to do it as a silent install?
>
> Thanks
> Joyce
>



-- 
Joyce Chan
Search Engineer
Springboard Retail Networks Inc.
207 Queen's Quay W Suite 320
joyce.c...@springboardretailnetworks.com


Re: Short DismaxRequestHandler Question

2010-05-04 Thread MitchK

I got an idea:
If I concatenated all relevant fields into one large multiValued field, I
could query like this:
{!dismax qf='myLargeField^5'}solr development //mm is 1 (100%) if not set
In addition to that, I add boosted per-field queries:

{!dismax qf='myLargeField^5'}solr development AND title:(solr
development)^10 OR category:(solr development)^2 

Any other ideas are welcome.
Thank you for the discussion. 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Short-DismaxRequestHandler-Question-tp775913p776446.html
Sent from the Solr - User mailing list archive at Nabble.com.
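Mitch's workaround above can also be expressed as request parameters: query the catch-all field with mm=100%, and put the per-field boosts in bq so they influence scoring without changing which documents match. A sketch (field names are the ones from this thread; adapt to your schema):

```python
from urllib.parse import urlencode

params = {
    "defType": "dismax",
    "qf": "myLargeField^5",
    "mm": "100%",                     # every query term must match the catch-all field
    "q": "solr development",
    # bq boosts docs that also match in title/category, without changing the match set:
    "bq": "title:(solr development)^10 category:(solr development)^2",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
```

Using bq instead of OR'ing the boost clauses into q keeps mm semantics intact, since mm applies only to the main dismax query.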


Re: cores and SWAP

2010-05-04 Thread Tim Heckman
It looks like this was not a solr issue at all. It looks like it's
browser-related.

When I use safari, curl or wget, I don't see the issue. When I use
firefox or chrome, I do.

I'll have to dig into this a little more with an http proxy to see
what's going on. I have not altered the httpCaching settings from the
example solrconfig.xml.

thanks,
Tim


On Tue, May 4, 2010 at 10:00 AM, Tim Heckman  wrote:
> If it helps, I am running:
>
> solr 1.4.0
> tomcat 6.0.26
>
> java version "1.6.0_20"
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>
> Red Hat Enterprise Linux Server release 5.4 (Tikanga)
>
>
> thanks,
> Tim
>
>
> On Mon, May 3, 2010 at 4:47 PM, Tim Heckman  wrote:
>> I have 2 cores: core1 and core2.
>>
>> Load the same data set into each and commit. Verify that searches
>> return the same for each core.
>>
>> Delete a document (call it docA) from core2 but not from core1.
>>
>> Commit and verify search results (docA disappears from core2's search
>> results. core1 continues to return the docA)
>>
>> Swap cores.
>>
>> Core2 should now return docA, but it doesn't until I reload core2.
>>
>>
>> thanks,
>> Tim
>>
>>
>> On Mon, May 3, 2010 at 1:41 PM, Shalin Shekhar Mangar
>>  wrote:
>>> On Mon, May 3, 2010 at 10:27 PM, Tim Heckman  wrote:
>>>
 Hi, I'm trying to figure out whether I need to reload a core (or both
 cores?) after performing a swap.

 When I perform a swap in my sandbox (non-production) environment, I am
 seeing that one of the cores needs to be reloaded following a swap and
 the other does not, but I haven't been able to find a pattern to which
 one it will be.


>>> No, you should not need to reload any core after a swap. What is the
>>> behavior that you see?
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>>
>>
>


Re: cores and SWAP

2010-05-04 Thread Erik Hatcher
The issue is that browsers (apparently not Safari?) will send the
last-modified/etag headers to Solr and get back a 304, and your browser will
simply display the last response it got.  Use the force reload option
from the browser (it's a habit for me now) to ensure you're actually
getting a current response from Solr, or turn off the HTTP 304
capability of Solr altogether.


Erik

On May 4, 2010, at 12:19 PM, Tim Heckman wrote:


It looks like this was not a solr issue at all. It looks like it's
browser-related.

When I use safari, curl or wget, I don't see the issue. When I use
firefox or chrome, I do.

I'll have to dig into this a little more with an http proxy to see
what's going on. I have not altered the httpCaching settings from the
example solrconfig.xml.

thanks,
Tim


On Tue, May 4, 2010 at 10:00 AM, Tim Heckman   
wrote:

If it helps, I am running:

solr 1.4.0
tomcat 6.0.26

java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)

Red Hat Enterprise Linux Server release 5.4 (Tikanga)


thanks,
Tim


On Mon, May 3, 2010 at 4:47 PM, Tim Heckman   
wrote:

I have 2 cores: core1 and core2.

Load the same data set into each and commit. Verify that searches
return the same for each core.

Delete a document (call it docA) from core2 but not from core1.

Commit and verify search results (docA disappears from core2's  
search

results. core1 continues to return the docA)

Swap cores.

Core2 should now return docA, but it doesn't until I reload core2.


thanks,
Tim


On Mon, May 3, 2010 at 1:41 PM, Shalin Shekhar Mangar
 wrote:
On Mon, May 3, 2010 at 10:27 PM, Tim Heckman   
wrote:


Hi, I'm trying to figure out whether I need to reload a core (or  
both

cores?) after performing a swap.

When I perform a swap in my sandbox (non-production)  
environment, I am
seeing that one of the cores needs to be reloaded following a  
swap and
the other does not, but I haven't been able to find a pattern to  
which

one it will be.


No, you should not need to reload any core after a swap. What is  
the

behavior that you see?

--
Regards,
Shalin Shekhar Mangar.









[PECL-DEV] [ANNOUNCEMENT] solr-0.9.10 (beta) Released

2010-05-04 Thread Israel Ekpo
The new PECL package solr-0.9.10 (beta) has been released at
http://pecl.php.net/.

Release notes
-
- Increased compatibility with older systems running CentOS 4 or 5 and RHEL4
or 5
- Added ability to compile directly without having to build libcurl and
libxml2 from source on older systems
- Lowered minimum supported version for libcurl to 7.15.0 (Alex Samorukov)
- Lowered minimum supported version for libxml2 to 2.6.26 (Alex Samorukov)
- Fixed PECL Bug# 17172 MoreLikeThis only parses one doc (trevor at blubolt
dot com, max at blubolt dot com)
- Declared workaround macros for SSL private key constants due to support
for earlier versions of libcurl (Alex Samorukov)
- Changed extension version numbers to start using hexadecimal numbers
(Israel Ekpo)
- Added instructions on how to attempt to compile on windows (Israel Ekpo)
- Fixed PECL Bug# 17292 sending UTF-8 encoding in header (giguet at info dot
unicaen dot fr)

Package Info
-
It effectively simplifies the process of interacting with Apache Solr using
PHP5 and it already comes with built-in readiness for the latest features
added in Solr 1.4. The extension has features such as built-in, serializable
query string builder objects which effectively simplifies the manipulation
of name-value pair request parameters across repeated requests. The response
from the Solr server is also automatically parsed into native php objects
whose properties can be accessed as array keys or object properties without
any additional configuration on the client-side. Its advanced HTTP client
reuses the same connection across multiple requests and provides built-in
support for connecting to Solr servers secured behind HTTP Authentication or
HTTP proxy servers. It is also able to connect to SSL-enabled containers.
Please consult the documentation for more details on features.

Related Links
-
Package home: http://pecl.php.net/package/solr
Changelog: http://pecl.php.net/package-changelog.php?package=solr
Download: http://pecl.php.net/get/solr-0.9.10.tgz
Documentation: http://www.php.net/solr

Authors
-
Israel Ekpo  (lead)


Re: cores and SWAP

2010-05-04 Thread Tim Heckman
OK, yes, I see now. Even though the etags change when the swap
happens, the last modified date on the server may be earlier than what
the client has from the request prior to the swap.

thank you.
Tim


On Tue, May 4, 2010 at 12:30 PM, Erik Hatcher  wrote:
> The issue is that browsers (apparently not Safari?) will send the
> last-modified/etag headers to Solr and get back a 304 and your browser will
> simply display the last response it got.  Use the force reload option from
> the browser (it's a habit for me now) to ensure you're actually getting a
> current response from Solr, or turn off the HTTP 304 capability of Solr
> altogether.
>
>        Erik
>
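Turning off Solr's 304 behavior, as Erik suggests, is done with the httpCaching element in solrconfig.xml. A sketch matching the Solr 1.4 syntax (trade-off: this disables HTTP caching for all clients, so use it only where stale responses after a core swap are a real problem):

```xml
<!-- In solrconfig.xml: never304="true" stops Solr from emitting validator
     headers and from answering 304, so browsers always get a fresh response. -->
<requestDispatcher handleSelect="true">
  <httpCaching never304="true"/>
</requestDispatcher>
```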


RE: Short DismaxRequestHandler Question

2010-05-04 Thread Naga Darbha
You may create a new field by copying the fields title, category and tag to the 
new field, like the following:

   <copyField source="title" dest="newField"/>
   <copyField source="category" dest="newField"/>
   <copyField source="tag" dest="newField"/>
and search against the new field.  You may go for newField of type "textgen".

Give it a try,
Naga

From: MitchK [mitc...@web.de]
Sent: Tuesday, May 04, 2010 6:36 PM
To: solr-user@lucene.apache.org
Subject: Short DismaxRequestHandler Question

Hello community,

I need a minimum should match only on some fields, not on all.

Let me give you an example:
title: "Breaking News: New information about Solr 1.5"
category: development
tag: Solr News

If I am searching for "Solr development", I want to return this doc,
although I defined a minimum should match of 100%, because 100% of the query
matches the *whole* document.
At the moment, 100% applies only if 100% of the query matches a single field.

Is this possible at the moment?
If not, are there any suggestions or practices to make this working?

Thank you.

Kind regards
- Mitch
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Short-DismaxRequestHandler-Question-tp775913p775913.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom SearchComponent to reset facet value counts after collapse

2010-05-04 Thread MitchK

I would prefer extending the given CollapseComponent, because of
performance reasons. What you want to do sounds a bit like making things too
complicated.

There are two options I would prefer:
1. get the schema information for every field you want to query against and
define whether you want to facet before or after collapsing. As far as I
have understood: For multiValued fields you want to facet before collapsing,
because if you facet after collapsing, the returned counts are wrong.

2. As a developer, you know which of the queried fields is a multiValued
one. Knowing this, you create a new param that contains only those fields you
always want to facet on BEFORE collapsing.

I want to emphasize that I never had a look at the sourcecode of the patch.
However, I really think that you do not need to reimplement so many things.
You only need to implement the logic of when to facet which field. That's
everything.
And since the component seems to implement both things, faceting before
*and* after collapsing, you can use the provided methods to make your logic
work.

Just some thoughts. :)

Kind regards
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-SearchComponent-to-reset-facet-value-counts-after-collapse-tp770826p776896.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLR-343 date facet mincount patch

2010-05-04 Thread Umesh_

Hi All,

As per https://issues.apache.org/jira/browse/SOLR-343, the date facet
mincount patch is tested.
Has anyone tried to apply this patch on Solr 1.4? When I tried, I was able
to patch 'SOLR-343.patch' but it failed for another
'DateFacetsMincountPatch.patch'.


patching file src/java/org/apache/solr/request/SimpleFacets.java
Hunk #1 FAILED at 430.
Hunk #2 FAILED at 492.
2 out of 2 hunks FAILED -- saving rejects to file
src/java/org/apache/solr/request/SimpleFacets.java.rej

Please let me know if you were able to apply this patch successfully and
date facet mincount works.

Thanks,
Umesh

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-343-date-facet-mincount-patch-tp777172p777172.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How do I return all the results in an index?

2010-05-04 Thread Umesh_

querying for *:* works in Solr 1.4 as well. Did you check that your index has
any data?

~Umesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-I-return-all-the-results-in-an-index-tp777214p777239.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How do I return all the results in an index?

2010-05-04 Thread Umesh_

Please post the query you are using. It could be something like
'http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on'.

~Umesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-I-return-all-the-results-in-an-index-tp777214p777260.html
Sent from the Solr - User mailing list archive at Nabble.com.


inconsistency in SolrParams.get()

2010-05-04 Thread Frank Wesemann

Dear list,
I recently stumbled upon this:

modifiableParams = new ModifiableSolrParams( req.getParams() );

assert modifiableParams.get("key").equals( req.getParams().get("key") );

this test fails for requests built from a SimpleRequestParser or 
StandardRequestParser where the parameter "key" was given but empty ( 
e.g. localhost:8393/select/?key=&para1=val1&parm2=val2 ).


The reason is that oas.request.ServletSolrParams returns null for values 
with length() == 0,

but all other SolrParams implementations return the empty String.

This behaviour has also side effects on search components:
Most, if not all, standard search components check for something like

if (req.getParams().getBool(myTriggerParameter, false) ) {

   ...do what I am supposed to do...

}


In case of ServletSolrParams, getBool() returns the desired and expected 
"false";
all other implementations throw a "bad request" exception.
One may argue that supplying a parameter with an empty value is indeed a 
malformed request,
but, as an example, in our frontend servers we use a Perl lib which always 
adds the "q" parameter to a SolrRequest

( and our Solr implementation allows requests without an explicit query ).

Nonetheless I think the above-mentioned equality check should hold true 
for any request and any SolrParams.
Because I cannot foresee the implications, I currently don't have a 
better suggestion for achieving this than
to make ServletSolrParams also return the empty String, which is in my 
opinion counter-intuitive and does not do the right thing for the 
getBool(), getInt() etc. cases.
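To make the discrepancy concrete, here is a plain-Java model of the two return conventions and a simplified getBool(). This is only an illustration; the real classes live in org.apache.solr.common.params and this sketch does not reproduce their actual code:

```java
import java.util.Map;

public class ParamsDemo {
    // Mimics ServletSolrParams: empty values come back as null
    static String servletGet(Map<String, String> p, String key) {
        String v = p.get(key);
        return (v == null || v.length() == 0) ? null : v;
    }

    // Mimics the other SolrParams implementations: the empty string is returned as-is
    static String defaultGet(Map<String, String> p, String key) {
        return p.get(key);
    }

    // Simplified getBool: null falls back to the default, anything else must parse
    static boolean getBool(String v, boolean def) {
        if (v == null) return def;
        if ("true".equals(v)) return true;
        if ("false".equals(v)) return false;
        throw new IllegalArgumentException("bad request: invalid boolean '" + v + "'");
    }

    public static void main(String[] args) {
        Map<String, String> p = Map.of("key", ""); // a request like ?key=
        // ServletSolrParams behaviour: empty -> null -> the default is used
        System.out.println(getBool(servletGet(p, "key"), false)); // false
        // Other implementations: the empty string reaches the parser and fails
        try {
            getBool(defaultGet(p, "key"), false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This shows why the same empty parameter yields a quiet default in one code path and a "bad request" in the other.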

Any thoughts?

btw:
is it really the desired and expected behaviour that 
ModifiableSolrParams.set(key, null) removes the key from the params?




--
mit freundlichem Gruß,

Frank Wesemann
Fotofinder GmbH         USt-IdNr. DE812854514
Software Entwicklung    Web: http://www.fotofinder.com/
Potsdamer Str. 96       Tel: +49 30 25 79 28 90
10785 Berlin            Fax: +49 30 25 79 28 999

Sitz: Berlin
Amtsgericht Berlin Charlottenburg (HRB 73099)
Geschäftsführer: Ali Paczensky





Re: How do I return all the results in an index?

2010-05-04 Thread MitchK

Did you clean up the browser cache? 
Maybe you need to restart (I am currently not sure whether Solr caches
HTTP requests, even after a commit).

Kind regards
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-I-return-all-the-results-in-an-index-tp777214p777353.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: AutoSuggest with custom sorting

2010-05-04 Thread Chris Hostetter

First off: i would suggest that instead of doing a simple prefix search, 
you look into using EdgeNGrams for this sort of thing.

I'm also assuming since you need custom scoring for this, you aren't going 
to get what you need using the TermsComponent or any other simple solution 
using your main corpus -- it would make more sense to setup a special 
index consisting of one document per "term" to include in your 
autosuggest.

: > 1. Results matching field1 should be ranked higher. Results matching the

easily done with dismax .. even if you are using EdgeNGrams (just make 
sure you have EdgeNGrams on at index time, but not at query time)

: > 2.The next sort parameter is the length of the word. So, if you are
: > searching for IR, Row2 (2 out of 4 ) matches higher than Row3 (2 out of 5).

this can be accomplished by indexing a numeric field containing the 
"length" of the field as a number, and then doing a secondary sort on it.  
the fieldNorm typically takes care of this sort of thing for you, but is 
more of a generalized concept, and doesn't give you exact precision for 
small numbers.
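To illustrate the EdgeNGram suggestion, here is a plain-Java sketch of the prefix terms an edge n-gram filter emits at index time (an illustration only, not Lucene's actual EdgeNGramTokenFilter; the gram sizes are placeholders):

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGrams {
    // Emit every leading prefix of a term between minGram and maxGram
    // characters long -- the shape of output an edge n-gram filter
    // produces when applied at index time only.
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int limit = Math.min(maxGram, term.length());
        for (int len = minGram; len <= limit; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // A user typing "IR" (lowercased to "ir" at query time) matches any
        // term whose indexed grams contain "ir", e.g.:
        System.out.println(edgeNGrams("infrared", 1, 4)); // [i, in, inf, infr]
    }
}
```

Because the grams are generated only at index time, the untouched query string "ir" matches the stored "ir" gram directly, which is why the analyzer must not apply EdgeNGrams at query time.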


-Hoss



Re: Commit takes 1 to 2 minutes, CPU usage affects other apps

2010-05-04 Thread Lance Norskog
Are you accidentally building the spellchecker database on each commit?

An option is to use the MergePolicy stuff to avoid merging during
normal commits, but I failed to understand the interactions of
configuration numbers. It's a bit of a jungle in there.

On Tue, May 4, 2010 at 5:43 AM,   wrote:
> Hi,
>
> This could also be caused by performing an optimize after the commit, or it
> could be caused by auto warming the caches, or a combination of both.
>
> If you are using the Data Import Handler the default for a delta import is
> commit and optimize, which caused us a similar problem except we were
> optimizing a 7 million document, 23Gb index with every delta import which
> was taking over 10 minutes. As soon as we added optimize=false to the
> command updates took a few seconds. You can always add separate calls to
> perform the optimize when it's convenient for you.
>
> To see if the problem is auto warming take a look at the warm up time for
> the searcher. If this is the cause you will need to consider lowering the
> autowarmCount for your caches.
>
>
> Colin.
>
>> -Original Message-
>> From: Markus Fischer [mailto:mar...@fischer.name]
>> Sent: Tuesday, May 04, 2010 6:22 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Commit takes 1 to 2 minutes, CPU usage affects other apps
>>
>> On 04.05.2010 11:01, Peter Sturge wrote:
>> > It might be worth checking the VMWare environment - if you're using
>> the
>> > VMWare scsi vmdk and it's shared across multiple VMs and there's a
>> lot of
>> > disk contention (i.e. multiple VMs are all busy reading/writing
>> to/from the
>> > same disk channel), this can really slow down I/O operations.
>>
>> Ok, thanks, I'll try to get the information from my hoster.
>>
>> I noticed that the committing seems to be constant in time: it doesn't
>> matter whether I'm updating only one document or 50 (usually it won't be
>> more). Maybe these numbers are too low anyway to cause any real impact
>> ...
>>
>> - Markus
>
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Need help/assistance with Multicore admin/cores?action=CREATE

2010-05-04 Thread Chris Hostetter

: Shouldn't all the parameters be added to the solr.xml core2 that were 

yep .. it does in fact look like a bug in the solr.xml persistence code.  
please file a bug in Jira.

: passed in from the URL?  And why did the config="solrconfig.xml" get 
: removed from the core1 definition?

i believe that is actually working as intended -- since solrconfig.xml is 
the default name it's superfluous to have it explicitly declared so it 
gets left out -- i could be wrong though; if you use 
'config="something-special.xml"' and then notice it gets removed when the 
solr.xml file is persisted, that would also be a bug.



-Hoss



Re: Sort by membership of range query

2010-05-04 Thread Chris Hostetter

: What I can't quite figure out is how, when including all results, to sort
: the results by whether they are "active" or not. In other words, have all
: products within the date range appear before the products outside the date
: range (or vice versa).

if you want it to function as a true "sort" option (as in: you want to 
sort by active or not, then sort by some rating field, then sort by score 
as a final resort) it isn't possible in 1.4 (but Grant's roadmap for 
spatial work in the next version of Solr will include the ability to 
specify an arbitrary function as a sort "field" so it should be possible 
eventually)

what you can do if you normally sort by score is to "boost" the score of 
any docs that match your active criteria, so that they score much much 
higher than docs which are not active...

q=+(your query) (startDate:[* TO NOW] -endDate:[* TO NOW])^10


-Hoss



Case Insensitive search while preserving case

2010-05-04 Thread dbashford

I've looked through the history and tried a lot of things but can't quite get
this to work.

Used this in my last attempt:

<fieldType class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


What I'm looking to do is allow users to execute case insensitive searches,
which this does.  "BLaH" should return all the "Blah"s.  However, what this
also seems to do is render the values lowercased when I do faceted or stats
queries, or if I do a terms search.  Always returned as "blah".

Is there any way to only ever get the original value out of Solr no matter
how I ask for it? 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Case-Insensitive-search-while-preserving-case-tp777602p777602.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Case Insensitive search while preserving case

2010-05-04 Thread Ahmet Arslan
> I've looked through the history and tried a lot of things but can't quite
> get this to work.
> 
> Used this in my last attempt:
> 
> <fieldType class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> What I'm looking to do is allow users to execute case insensitive
> searches, which this does.  "BLaH" should return all the "Blah"s.
> However, what this also seems to do is render the values lowercased when
> I do faceted or stats queries, or if I do a terms search.  Always
> returned as "blah".
> 
> Is there any way to only ever get the original value out of Solr no
> matter how I ask for it?

Making your field stored="true" and requesting it with &fl=lowercaseField does 
not satisfy your needs?
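For reference, a common pattern here is to keep the original value in a stored string field used for display, faceting, and stats, and copy it into a lowercased field used only for matching. A schema.xml sketch, with hypothetical field and type names:

```xml
<field name="state" type="string" indexed="true" stored="true"/>
<field name="state_search" type="lowercase_text" indexed="true" stored="false"/>
<copyField source="state" dest="state_search"/>
```

Queries then go against state_search, while facet.field, stats.facet, and terms requests use state, so the original casing is preserved in everything returned.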





Re: Case Insensitive search while preserving case

2010-05-04 Thread dbashford

All my fields are stored.

And if my field name is "state", meaning your suggestion is to append
"fl=state", then no, that's not doing anything for me.  =(

The above config gets me part of the way to where I need to be.  Storing,
for instance, "Alaska" in such a way that querying for "alaska", "AlaSkA",
and "ALASKA" will all return "Alaska".  However, if I include the field as a
stats.facet, or I'm doing a faceted search (facet=true), or do a terms
search, what I get out is "alaska".

Any way around that without the dupe field?

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Case-Insensitive-search-while-preserving-case-tp777602p777674.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facets vs TermV's

2010-05-04 Thread Chris Hostetter
: Basically, what is the difference between issuing a facet field query
: that returns facets with counts,
: and a query with term vectors that also returns document frequency
: counts for terms in a field? 

The FacetComponent generates counts that are relative the set of documents 
that match your query, so if you search for "jimbo" and 1034 docs match, 
then your facet counts will tell you how many of those 1034 docs match 
each of the facet constraints.

The TermsComponent generates counts that tell you about the raw terms in 
the index -- you don't even need to have a search (or a doc set) to use 
the TermsComponent, it just gives you raw data from the index.

(that's why the "/terms" handler in the example Solr 1.4 configs doesn't 
even include the "query" component, and the examples using the 
TermsComponent never mention a "q" param)

-Hoss



Re: AutoSuggest with custom sorting

2010-05-04 Thread Sean Timm

Chris Hostetter wrote:
this can be accomplished by indexing a numeric field containing the 
"length" of the field as a number, and then doing a secondary sort on it.  
the fieldNorm typically takes care of this sort of thing for you, but is 
more of a generalized concept, and doesn't give you exact precision for 
small numbers
Or see https://issues.apache.org/jira/browse/LUCENE-1360 if you don't 
want to index a field length.


-Sean


Custom DIH variables

2010-05-04 Thread Blargy

Can someone please point me in the right direction (classes) on how to create
my own custom DIH variable that can be used in my data-config.xml

So instead of ${dataimporter.last_index_time} I want to be able to create
${dataimporter.foo}

Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-DIH-variables-tp777696p777696.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Commit takes 1 to 2 minutes, CPU usage affects other apps

2010-05-04 Thread Chris Hostetter

: Are you accidentally building the spellchecker database on each commit?
...
: > This could also be caused by performing an optimize after the commit, or it
: > could be caused by auto warming the caches, or a combination of both.

The heart of the matter being: it's pretty much impossible to guess what 
is taking up all this time (and eating up all that CPU) without seeing your 
configs, and having a better idea of how you are timing this "1 to 2 
minutes" ... is this what the client sending the commit reports? what 
exactly is the command it's executing?

what do your Solr logs say about the commit, and the subsequent 
newSearcher?


-Hoss



Re: Monitoring via JMX; changing mbean names?

2010-05-04 Thread Chris Hostetter

: For example, when running multiple instances of solr in the same Tomcat 
: instance, each has an associated searcher@1234567 mbean.  Alright, I 
: expect that.  However, some values that I'm looking for (such as 
: avgRequestsPerSecond, avgTimePerRequest) are all located under the 
: solr/standard mbean.  I'm not sure which actual instance of solr this 
: value actually represents.

Good question ... running multiple instances of solr on the same server 
and then monitoring with JMX isn't something i think most people have 
looked into -- but there has definitely been a reported instance of a 
problem that seems to match what you are talking about...

https://issues.apache.org/jira/browse/SOLR-1843

...please try out that code (and comment on that issue) if you can

: Maybe it would be wise to have a separate JMX server for each instance 
: of Solr?  I don't know.

this is definitely possible.  please take a look at the comments on the 
<jmx/> declaration in the example solrconfig.xml file -- by default it 
registers itself with the first JMX server it finds, but you can also 
configure it to register itself with a particular existing server by 
agentId, or to spin up its own distinct JMX server.
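As a sketch, the forms this declaration takes in solrconfig.xml look roughly like the following (attribute values here are placeholders, not values from this thread):

```xml
<!-- register with the first MBeanServer found (the default) -->
<jmx/>

<!-- register with an existing MBeanServer by agent id -->
<jmx agentId="myAgent"/>

<!-- or start a dedicated JMX server for this Solr instance -->
<jmx serviceUrl="service:jmx:rmi:///jndi/rmi://localhost:9999/solr"/>
```

Giving each Solr instance its own serviceUrl is one way to keep the mbeans of co-hosted instances from colliding.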


-Hoss



Re: synonym filter problem for string or phrase

2010-05-04 Thread Chris Hostetter

: yes my default search field is text_sync.

fields and fieldTypes are different things -- you've shown us the <fieldType> 
for "text_sync" but not the <field> ... without that we can't be sure 
you have things configured properly.

you also haven't shown us the debugQuery output to help us know exactly 
what query is getting constructed in your tests.

: I think due to stemming I am not getting expected result. However when I am
: checking result from the analysis.jsp,

analysis.jsp only shows you what the *analyzer* produces at index time vs 
query time; it doesn't tell you anything about how the query parser is going to 
behave (and the query parser decides which analyzer to use and how/when).
In particular, whitespace is "special" to the default query parser, so even 
though you have a KeywordTokenizer in your "query" analyzer, a query like 
this...
   /select?q=what+is+the+question?
...is going to result in the individual chunks of input "what", "is", and 
"the" being passed to your analyzer ("question?" won't even be analyzed, 
because the "?" mark is also special syntax that gets converted to a wild 
card query)

the only way this will possibly work the way you want is if you quote the 
entire input and/or use something like the "field" or "raw" 
QParserPlugin...
   /select?q="what+is+the+question?"
   /select?q={!field}what+is+the+question?
   /select?q={!raw}what+is+the+question?

furthermore: assuming the field type you showed us is getting used, there 
is no possibility that stemming is entering the equation -- you have no 
stemming configured, and it's not used magically.


-Hoss



Re: copyField - how does it work?

2010-05-04 Thread Chris Hostetter

:
:
...
:
: 
: Is the copyField valid specified in BLOCK-4?  It seems it is not 
: populating the clubbed_string with the values of field_A and field_B.

copyFields aren't chained together -- supporting that is "hard" and can 
lead to infinite loops, so each copyField src is copied directly into the 
dest (without checking to see if the "dest" is the src of any other 
copyFields)

: Do I need to populate clubbed_string by explicitly copying field_A and 
field_B directly to it?

yes.
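In schema.xml terms, using the field names from the question, that means declaring both copies explicitly:

```xml
<copyField source="field_A" dest="clubbed_string"/>
<copyField source="field_B" dest="clubbed_string"/>
```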


-Hoss



Re: Custom SolrQueryRequest/SolrQueryResponse

2010-05-04 Thread Chris Hostetter


: Herein lies the problem from what I can tell:  I don't have any control 
: over SolrQueryRequest or SolrQueryResponse.  My initial attempts have me 
: subclassing both of these to hold a List of requests and responses, with 
: a cursor that moves the "current" req/res each time through my handler.  
: All methods are implemented to delegate directly to the req/res that the 
: cursor is pointing to.  I would check, via instanceof, whether we are 
: dealing with a normal or composite query in the writer to dump the 
: results appropriately.

whoa... back up.

why do you feel the need to subclass these at all?

why don't you just add the multiple DocLists (from each of the 
queries) to the same SolrQueryResponse ... there's no limit to the number 
of DocLists that can be added -- you can even give each of them their own 
name.

For that matter: if you have other metadata specific to the individual 
"component" requests, you can build up a NamedList (or Map or Collection) 
containing all of that metadata (including the DocList) and then add that 
to the SolrQueryResponse.

This is already how many of the existing stock components are designed; 
all you are doing differently than some of them is executing multiple 
"main" searches that produce a DocList -- but i've made several custom 
components in the past that do the same thing and had no problems.  If 
this doesn't seem feasible for your use case, then please explain in 
more depth what your use case is and why it doesn't seem like this would 
fit.


-Hoss


highlighting exact phrases bug?

2010-05-04 Thread Karthik Ram
Hi Folks,
 I am unable to get highlighting to work when searching for exact phrases in
SOLR 1.4

A discussion about the exact same issue can be found here:

http://www.mail-archive.com/solr-user@lucene.apache.org/msg27872.html

Can someone please tell how to fix this?

I am using the parameter hl.usePhraseHighlighter=true in the query string
and searching on a full text field as suggested by one of the repliers,
and it still doesn't work.


Following is the error I get:
---

Problem accessing /solr/select. Reason:

org/apache/lucene/index/memory/MemoryIndex

java.lang.NoClassDefFoundError: org/apache/lucene/index/memory/MemoryIndex
at 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getReaderForField(WeightedSpanTermExtractor.java:361)
at 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:282)
at 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:149)
at 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414)
at 
org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
at 
org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184)
at 
org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226)
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
at 
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1317)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


Re: highlighting exact phrases bug?

2010-05-04 Thread Mark Miller

You need to put memory.jar on the classpath along with highlighter.jar.

On 5/4/10 10:38 PM, Karthik Ram wrote:

Hi Folks,
  I am unable to get highlighting to work when searching for exact phrases in
SOLR 1.4

A discussion about the exact same issue can be found here:

http://www.mail-archive.com/solr-user@lucene.apache.org/msg27872.html

Can someone please tell how to fix this?

I am using the parameter hl.usePhraseHighlighter=true in the query string
and searching on a full text field as suggested by the one of the replier
and it still doesn't work.


Following is the error I get:
---

Problem accessing /solr/select. Reason:

 org/apache/lucene/index/memory/MemoryIndex

[stack trace snipped; identical to the one in the original message]




--
- Mark

http://www.lucidimagination.com


Re: highlighting exact phrases bug?

2010-05-04 Thread Mark Miller
Hmm...this is actually an odd error if you are using the war though - 
this jar (lucene-memory-*.jar) should be in the webapp.


On 5/4/10 10:48 PM, Mark Miller wrote:

You need to put memory.jar on the classpath along with highlighter.jar.

On 5/4/10 10:38 PM, Karthik Ram wrote:

Hi Folks,
I am unable to get highlighting to work when searching for exact
phrases in
SOLR 1.4

A discussion about the exact same issue can be found here:

http://www.mail-archive.com/solr-user@lucene.apache.org/msg27872.html

Can someone please tell how to fix this?

I am using the parameter hl.usePhraseHighlighter=true in the query string
and searching on a full text field as suggested by the one of the replier
and it still doesn't work.


Following is the error I get:
---

Problem accessing /solr/select. Reason:

org/apache/lucene/index/memory/MemoryIndex

[stack trace snipped; identical to the one in the original message]








--
- Mark

http://www.lucidimagination.com


SOLR Based Search - Response Times - what do you consider slow or fast?

2010-05-04 Thread dc tech
We are using SOLR in a production setup with a jRuby on Rails front end
with about 20 different instances of SOLR running on heavy-duty hardware.
The setup is a load-balanced front end (jRoR) on a pair of machines and the
SOLR backends on a different machine. We have plenty of memory and CPU, and
the machines are not particularly loaded (<5% CPU). Loads are in the range
of 12,000 to 16,000 searches a day, so not a huge number. Our overall
response time (front end + SOLR) averages 0.5s to 0.7s, with SOLR typically taking
about 100 - 300 ms.

How does this compare with your experience? Would you say the performance is
good/bad/ugly?


RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Praveen,



I only have the highlighted jars copied. Not sure if we need the other jars. 
Also, I copied the jars directly into solr\WEB-INF\lib, like you did.



Thanks,

Sandhya



-Original Message-
From: Praveen Agrawal [mailto:pkal...@gmail.com]
Sent: Tuesday, May 04, 2010 8:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Problem with pdf, upgrading Cell



Hi Sandhya..

I must be missing something. I copied all dependency jars to both

contrib/extraction/lib and WEB-INF/lib folders. Here is the list of jars

copied:



asm-3.1.jar

bcmail-jdk15-1.45.jar

bcprov-jdk15-1.45.jar

commons-compress-1.0.jar

commons-logging-1.1.1.jar

dom4j-1.6.1.jar

fontbox-1.1.0.jar

geronimo-stax-api_1.0_spec-1.0.1.jar

hamcrest-core-1.1.jar

jempbox-1.1.0.jar

junit-3.8.1.jar

log4j-1.2.14.jar

metadata-extractor-2.4.0-beta-1.jar

mockito-core-1.7.jar

nekohtml-1.9.9.jar

objenesis-1.0.jar

ooxml-schemas-1.0.jar

pdfbox-1.1.0.jar

poi-3.6.jar

poi-ooxml-3.6.jar

poi-ooxml-schemas-3.6.jar

poi-scratchpad-3.6.jar

tagsoup-1.2.jar

tika-core-0.7.jar

tika-parsers-0.7.jar

xml-apis-1.0.b2.jar

xmlbeans-2.3.0.jar



Still same result for me..



Marc,

i'm on windows, and i copied the above jars directly into the already extracted

folder webapps/solr/WEB-INF/lib, in addition to what was already there. I

didn't explicitly un-jar and re-jar the solr.war, but do you think that

could be the issue? i think tomcat extracts the war and uses the folder in

webapps (i didn't put the war file in webapps, instead had put the extracted

solr folder directly)



If it has worked for you guys, especially with my two pdfs, then that's

really great. Please let me know your exact procedure, including what all

you copied and where, or if you see i missed something obvious..



Thanks,

Praveen
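[For readers following along: the dependency list above can be collected in one step with the Maven goal Sandhya mentions later in the thread, rather than hunting jars down by hand. A rough sketch, assuming a checkout of the Tika 0.7 source tree; all paths, including TOMCAT_HOME, are illustrative:

```shell
# From the tika-parsers module of a Tika 0.7 source checkout:
cd tika-0.7/tika-parsers
# Downloads every declared dependency into target/dependency/
mvn dependency:copy-dependencies

# Copy the parser jar plus its dependencies into the exploded Solr webapp
# (adjust TOMCAT_HOME for your installation):
cp target/*.jar target/dependency/*.jar \
   "$TOMCAT_HOME/webapps/solr/WEB-INF/lib/"
# Restart Tomcat so the new jars are picked up.
```

Whether you also need the jars in contrib/extraction/lib depends on how your solrconfig.xml's lib directives are set up.]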





On Tue, May 4, 2010 at 5:28 PM, Sandhya Agarwal wrote:

> Both the files work for me, Praveen.
>
> Thanks,
> Sandhya
>
> From: Praveen Agrawal [mailto:pkal...@gmail.com]
> Sent: Tuesday, May 04, 2010 5:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with pdf, upgrading Cell
>
> another one here..
> On Tue, May 4, 2010 at 5:20 PM, Praveen Agrawal wrote:
> It bounced because of the attachment's size..
> attaching one by one now..
>
> On Tue, May 4, 2010 at 5:17 PM, Praveen Agrawal wrote:
> I noticed the following pattern/relationship between producer/creator and
> content extraction; not sure if it is helpful (as Grant said earlier, pdfs
> are notorious):
>
> Producer: Bullzip PDF Printer / www.bullzip.com /
> Freeware Edition (not registered)
> Creator: PScript5.dll Version 5.2.2
> Extraction: no content  --  "installing Solr in Tomcat.pdf" (attached - I
> generated it)
> -
>
> Producer: Acrobat Distiller 7.0.5 (Windows)
> Creator: PScript5.dll Version 5.2.2
> Extraction: one line of content
> --
>
> Producer: Acrobat Distiller 8.1.0 (Windows)
> Creator: Acrobat PDFMaker 8.1 for Word
> Extraction: one line of content (Free_Two_way_Radio_Guide.pdf -
> attached - was available freely on their website)
> -
>
> Producer: FOP 0.20.5
> Extraction: full content -- "/docs/features.pdf | linkmap.pdf" etc.
> --
> Thanks.
> Praveen
>
> On Tue, May 4, 2010 at 5:05 PM, Praveen Agrawal wrote:
> Yes Sandhya,
> I copied the new poi/jempbox/pdfbox/fontbox etc. jars too. I believe this
> is what you were asking.
> Thanks.
>
> On Tue, May 4, 2010 at 5:01 PM, Sandhya Agarwal wrote:
> Praveen,
>
> Along with the tika core and parser jars, did you run "mvn
> dependency:copy-dependencies" to generate all the dependencies too?
>
> Thanks,
> Sandhya
>
> -Original Message-
> From: Praveen Agrawal [mailto:pkal...@gmail.com]
> Sent: Tuesday, May 04, 2010 4:52 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with pdf, upgrading Cell
> I seem to have mixed results:
>
> Here is what I did:
> copied the new Tika/poi/jempbox/pdfbox/fontbox/log4j jars etc. into
> contrib/extraction/lib (of course removed the old ones), as well as into
> WEB-INF/lib of the solr web app in Tomcat.
>
> Now it extracts content from some pdfs, but either no content from others,
> or only a line of content. For example, "/docs/Installing Solr in Tomcat.pdf"
> still shows no content. I have two other pdfs for which it extracts only
> one line of content.
>
> Also, now I'm getting a single 'title' field value for some pdfs, and two
> for others. Where it can extract full content, it shows the title as what
> I gave as a literal while submitting the pdf. For a pdf where no content
> was extracted, it shows one empty title and my own. For a pdf where it
> extracted only one line of content, it shows that line as a title too,
> plus my own.
> 'title' field is defined a
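[For context, the "literal" title discussed above is the one supplied at index time on the extract request. A typical Solr Cell post for this Solr 1.4 setup looks like the following sketch; URL, id, title, and filename are illustrative:

```shell
# Post a PDF to the ExtractingRequestHandler, supplying our own id and
# title as literal.* parameters; Tika-extracted metadata (including any
# title found inside the PDF) is indexed alongside them.
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&literal.title=My+Title&commit=true" \
     -F "myfile=@Free_Two_way_Radio_Guide.pdf"
```

If the schema defines title as single-valued, a Tika-extracted title plus a literal one will collide, which matches the two-values-for-some-pdfs symptom described above.]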

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Looks like the highlighting may not work here. Following is the list of jars
I copied:

asm-3.1.jar
bcmail-jdk15-1.45.jar
bcprov-jdk15-1.45.jar
commons-compress-1.0.jar
commons-logging-1.1.1.jar
dom4j-1.6.1.jar
fontbox-1.1.0.jar
geronimo-stax-api_1.0_spec-1.0.1.jar
jempbox-1.1.0.jar
log4j-1.2.14.jar
metadata-extractor-2.4.0-beta-1.jar
pdfbox-1.1.0.jar
poi-3.6.jar
poi-ooxml-3.6.jar
poi-ooxml-schemas-3.6.jar
poi-scratchpad-3.6.jar
tagsoup-1.2.jar
tika-core-0.7.jar
tika-parsers-0.7.jar
xml-apis-1.0.b2.jar
xmlbeans-2.3.0.jar

Thanks,
Sandhya




max no of column in schema

2010-05-04 Thread Ranveer

Hi,

How many columns (fields) can we define in the schema?
I already have around 100 fields in my schema.

thanks


Re: Custom DIH variables

2010-05-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
You can use custom parameters from the request, like
${dataimporter.request.foo}; pass the value of foo as a request parameter,
e.g. foo=bar.
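[A minimal sketch of how this looks in a data-config.xml; the entity name, table, and column are illustrative:

```xml
<!-- data-config.xml: the custom variable is resolved from the request -->
<entity name="item"
        query="SELECT * FROM items WHERE category = '${dataimporter.request.foo}'"/>
```

Then trigger the import with the parameter on the URL, e.g.
http://localhost:8983/solr/dataimport?command=full-import&foo=bar]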


On Wed, May 5, 2010 at 6:05 AM, Blargy  wrote:
>
> Can someone please point me in the right direction (classes) on how to create
> my own custom dih variable that can be used in my data-config.xml
>
> So instead of ${dataimporter.last_index_time} I want to be able to create
> ${dataimporter.foo}
>
> Thanks
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Custom-DIH-variables-tp777696p777696.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Noble Paul | Systems Architect | AOL | http://aol.com