when to change rows param?

2011-03-22 Thread Paul Libbrecht

Hello list,

I've been using my own QueryComponent (extending the standard search one) 
successfully to rewrite the web-received parameters that are sent from the 
(ExtJS-based) JavaScript client.
This allows a fair amount of query rewriting, which is good.
I tried to change the rows parameter there (it arrives as "limit" in the query, 
per the underpinnings of ExtJS), but it seems that this is not enough.
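
Roughly, the override I'm attempting looks like the sketch below (simplified; 
the real component does more rewriting, and the "limit" mapping shown here is 
just illustrative):

import java.io.IOException;

import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.QueryComponent;
import org.apache.solr.handler.component.ResponseBuilder;

public class RewritingQueryComponent extends QueryComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // copy the incoming (read-only) params so they can be modified
        ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
        String limit = params.get("limit");       // ExtJS sends "limit"
        if (limit != null) {
            params.set(CommonParams.ROWS, limit); // map it onto Solr's rows
        }
        rb.req.setParams(params);
        super.prepare(rb);
    }
}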

Which component should I subclass to change the rows parameter?

thanks in advance

paul

Re: working with collection : Where is default schema.xml

2011-03-22 Thread Geert-Jan Brits
Changing the default schema.xml to what you want is the way to go for most
of us.
It's a good learning experience as well, since it contains a lot of
documentation about the options that may be of interest to you.

Cheers,
Geert-Jan

2011/3/22 geag34 

> Ok, thanks.
>
> It is my fault. I created the collection with a LucidImagination Perl
> script.
>
> I will erase the schema.xml.
>
> Thanks
>
>


SOLR-2242-distinctFacet.patch

2011-03-22 Thread Isha Garg

Hi,
  I want to enquire about the namedistinct patch (SOLR-2242-distinctFacet.patch): 
is it available with the Solr 4.0 trunk?


Thanks!
Isha


Re: Help with explain query syntax

2011-03-22 Thread Glòria Martínez
Thank you very much!

On Wed, Mar 9, 2011 at 2:01 AM, Yonik Seeley wrote:

> It's probably the WordDelimiterFilter:
>
> > org.apache.solr.analysis.WordDelimiterFilterFactory
> args:{preserveOriginal:
> > 1 splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 0
> > generateWordParts: 1 catenateAll: 0 catenateNumbers: 0 }
>
> Get rid of the preserveOriginal="1" in the query analyzer.
>
> -Yonik
> http://lucidimagination.com
>
> On Tue, Mar 1, 2011 at 9:01 AM, Glòria Martínez
>  wrote:
> > Hello,
> >
> > I can't understand why this query is not matching anything. Could someone
> > help me please?
> >
> > *Query*
> >
> http://localhost:8894/solr/select?q=linguajob.pl&qf=company_name&wt=xml&qt=dismax&debugQuery=on&explainOther=id%3A1
> >
> > 
> > -
> > 
> > 0
> > 12
> > -
> > 
> > id:1
> > on
> > linguajob.pl
> > company_name
> > xml
> > dismax
> > 
> > 
> > 
> > -
> > 
> > linguajob.pl
> > linguajob.pl
> > -
> > 
> > +DisjunctionMaxQuery((company_name:"(linguajob.pl linguajob) pl")~0.01)
> ()
> > 
> > -
> > 
> > +(company_name:"(linguajob.pl linguajob) pl")~0.01 ()
> > 
> > 
> > id:1
> > -
> > 
> > -
> > 
> >
> > 0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited
> > clause(s)
> >  0.0 = no match on required clause (company_name:"(linguajob.pl linguajob)
> > pl") *<- What does this syntax (field:"(token1 token2) token3") mean?*
> >0.0 = (NON-MATCH) fieldWeight(company_name:"(linguajob.pl linguajob)
> pl"
> > in 0), product of:
> >  0.0 = tf(phraseFreq=0.0)
> >  1.6137056 = idf(company_name:"(linguajob.pl linguajob) pl")
> >  0.4375 = fieldNorm(field=company_name, doc=0)
> > 
> > 
> > DisMaxQParser
> > 
> > 
> > +
> > 
> > ...
> > 
> >
> >
> >
> > There's only one document indexed:
> >
> > *Document*
> > http://localhost:8894/solr/select?q=1&qf=id&wt=xml&qt=dismax
> > 
> > -
> > 
> > 0
> > 2
> > -
> > 
> > id
> > xml
> > dismax
> > 1
> > 
> > 
> > -
> > 
> > -
> > 
> > LinguaJob.pl
> > 1
> > 6
> > 2011-03-01T11:14:24.553Z
> > 
> > 
> > 
> >
> > *Solr Admin Schema*
> > Field: company_name
> > Field Type: text
> > Properties: Indexed, Tokenized, Stored
> > Schema: Indexed, Tokenized, Stored
> > Index: Indexed, Tokenized, Stored
> >
> > Position Increment Gap: 100
> >
> > Index Analyzer: org.apache.solr.analysis.TokenizerChain Details
> > Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory
> > Filters:
> > schema.UnicodeNormalizationFilterFactory args:{composed: false
> > remove_modifiers: true fold: true version: java6 remove_diacritics: true
> }
> > org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt
> > ignoreCase: true enablePositionIncrements: true }
> > org.apache.solr.analysis.WordDelimiterFilterFactory
> args:{preserveOriginal:
> > 1 splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 1
> > generateWordParts: 1 catenateAll: 0 catenateNumbers: 1 }
> > org.apache.solr.analysis.LowerCaseFilterFactory args:{}
> > org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
> >
> > Query Analyzer: org.apache.solr.analysis.TokenizerChain Details
> > Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory
> > Filters:
> > schema.UnicodeNormalizationFilterFactory args:{composed: false
> > remove_modifiers: true fold: true version: java6 remove_diacritics: true
> }
> > org.apache.solr.analysis.SynonymFilterFactory args:{synonyms:
> synonyms.txt
> > expand: true ignoreCase: true }
> > org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt
> > ignoreCase: true }
> > org.apache.solr.analysis.WordDelimiterFilterFactory
> args:{preserveOriginal:
> > 1 splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 0
> > generateWordParts: 1 catenateAll: 0 catenateNumbers: 0 }
> > org.apache.solr.analysis.LowerCaseFilterFactory args:{}
> > org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
> >
> > Docs: 1
> > Distinct: 5
> > Top 5 terms
> > term frequency
> > lingua 1
> > linguajob.pl 1
> > linguajobpl 1
> > pl 1
> > job 1
> >
> > *Solr Analysis*
> > Field name: company_name
> > Field value (Index): LinguaJob.pl
> > Field value (Query): linguajob.pl
> >
> > *Index Analyzer
> >
> > org.apache.solr.analysis.WhitespaceTokenizerFactory {}
> > term position 1
> > term text LinguaJob.pl
> > term type word
> > source start,end 0,12
> > payload
> >
> > schema.UnicodeNormalizationFilterFactory {composed=false,
> > remove_modifiers=true, fold=true, version=java6, remove_diacritics=true}
> > term position 1
> > term text LinguaJob.pl
> > term type word
> > source start,end 0,12
> > payload
> >
> > org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
> > ignoreCase=true, enablePositionIncrements=true}
> > term position 1
> > term text LinguaJob.pl
> > term type word
> > source start,end 0,12
> > payload
> >
> > org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=1,
> > splitOnCaseChange=1, generateNumberParts=1, catenateWords=1,
> > generateWordParts=1, catenateAll=0, catenate

Re: Transform a SolrDocument into a SolrInputDocument

2011-03-22 Thread Marc SCHNEIDER
Ok that's perfectly clear.
Thanks a lot for all your answers!

Marc.

On Mon, Mar 21, 2011 at 4:34 PM, Gora Mohanty  wrote:

> On Mon, Mar 21, 2011 at 8:33 PM, Marc SCHNEIDER
>  wrote:
> > Hi Erick,
> >
> > Thanks for your answer.
> > I'm quite a newbie to Solr, so I'm a little bit confused.
> > Do you mean that (using Solrj in my case) I should add all fields (stored
> > and not stored) before adding the document to the index?
> [...]
>
> No, what he means is that the Solr output contains *only*
> stored fields. There might be fields that are indexed (i.e.,
> available for search), but not stored in the current index.
> In such a case, these fields will be blank in the Solr output,
> and consequently, in the updated document.
>
> Regards,
> Gora
>
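
For the record, this is roughly what the copy looks like in SolrJ — a minimal 
sketch (the URL, query, and field names are just illustrative). Only the stored 
fields come back from the query, so anything indexed-only has to be re-supplied 
from the original data source:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class CopyStoredFields {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // fetch the existing document -- only its *stored* fields come back
        SolrDocument stored = server.query(new SolrQuery("id:1")).getResults().get(0);

        SolrInputDocument update = new SolrInputDocument();
        for (String name : stored.getFieldNames()) {
            for (Object value : stored.getFieldValues(name)) {
                update.addField(name, value);   // copy every stored value
            }
        }
        // indexed-but-not-stored fields are NOT in 'stored' and must be
        // re-supplied from the primary source, e.g.:
        update.setField("indexed_only_field", "value fetched from the primary source");

        server.add(update);
        server.commit();
    }
}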


solr on the cloud

2011-03-22 Thread Dmitry Kan
hey folks,

I have tried running sharded Solr with ZooKeeper on a single machine.
The Solr code is from the current trunk. It runs nicely. Can you please point me
to a page where I can check the status of SolrCloud development and the
available features, apart from http://wiki.apache.org/solr/SolrCloud ?

Basically, of high interest is checking out Map-Reduce for distributed
faceting; is it even possible with the trunk?

-- 
Regards,

Dmitry Kan


Re: DIH Issue(newbie to solr)

2011-03-22 Thread Gora Mohanty
On Mon, Mar 21, 2011 at 10:59 PM, neha  wrote:
> Thanks Gora it works..!!! Thanks again. One last question, the documents get
> indexed well and all but when I issue full-import command it still says
> Total Requests made to DataSource 0
[...]

Not sure why that is, but I would guess that it has something to do with
how a FileDataSource is used. Unfortunately, I do not have the time
to test this right now, but could you check whether you get the same zero
requests with the RSS indexing example in example/example-DIH/solr/rss/
in the Solr distribution?

Regards,
Gora


Re: email - DIH

2011-03-22 Thread Erick Erickson
Not unless you provide a lot more data. Have you
inspected the Solr logs and seen any anomalies?

Please review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Mon, Mar 21, 2011 at 3:56 PM, Matias Alonso  wrote:
> Hi,
>
>
> I’m using Data Import Handler for indexing emails.
>
> The problem is that not all the emails were indexed when I do a full import.
>
> Does someone have any idea?
>
>
> Regards,
>
> --
> Matias.
>


Re: How to upgrade to Solr4.0 to use Result Grouping?

2011-03-22 Thread Erick Erickson
Awww, rats. Thanks Yonik, I keep getting this mixed up...

Erick

On Mon, Mar 21, 2011 at 2:57 PM, Yonik Seeley
 wrote:
> On Mon, Mar 21, 2011 at 10:20 AM, Erick Erickson
>  wrote:
>> Get the release and re-index? You can get a trunk
>> version either through SVN or from the nightly build
>> at https://builds.apache.org/hudson/view/S-Z/view/Solr/
>>
>> Note that 3.1 also has result grouping
>
> Result grouping / field collapsing is in trunk (4.0-dev) only.
>
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> 25-26, San Francisco
>


help with Solr installation within Tomcat7

2011-03-22 Thread ramdev.wudali
Hi All:
   I have just started using Solr and have it successfully installed within a 
Tomcat7 Webapp server.
I have also indexed documents using the SolrJ interfaces. The following is my 
problem:

I installed Solr under the Tomcat7 folders and set up an XML configuration file to 
indicate the Solr home variables as detailed on the wiki (for a Solr install 
within Tomcat).
The indexes seem to reside within the solr_home folder under the data folder 
(/data/index).

However, when I make a zip copy of the complete install (i.e. Tomcat with 
Solr) and move it to a different machine and unzip/install it,
the index seems to be inaccessible. (I did change the solr.xml configuration 
variables to point to the new location.)

From what I know, with Tomcat installations, it should be as simple as zipping 
a current working installation and unzipping/installing it on a different 
machine/location.

Am I missing something that makes Solr "hardcode" the path to the index in an 
install?

Simply put, I would like to know how to "transport" an existing install of 
Solr within Tomcat 7 from one machine to another and still have it working.

Ramdev


Re: Adding the suggest component

2011-03-22 Thread Brian Lamb
Thanks everyone for the advice. I checked out a recent version from SVN and
ran:

ant clean example

This worked just fine. However when I went to start the solr server, I get
this error message:

SEVERE: org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.handler.dataimport.DataImportHandler'

It looks like those files are there:

contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/

But for some reason, they aren't able to be found. Where would I update this
setting and what would I update it to?

Thanks,

Brian Lamb

On Mon, Mar 21, 2011 at 10:15 AM, Erick Erickson wrote:

> OK, I think you're jumping ahead and trying to do
> too many things at once.
>
> What did you download? Source? The distro? The error
> you posted usually happens for me when I haven't
> compiled the "example" target from source. So I'd guess
> you don't have the proper targets built. This assumes you
> downloaded the source via SVN.
>
> If you downloaded a distro, I'd start by NOT copying anything
> anywhere, just go to the example code and start Solr. Make
> sure you have what you think you have.
>
> I've seen "interesting" things get cured by removing the entire
> directory where your servlet container unpacks war files, but
> that's usually in development environments.
>
> When I get in these situations, I usually find it's best to back
> up, do one thing at a time and verify that I get the expected
> results at each step. It's tedious, but
>
> Best
> Erick
>
>
> On Fri, Mar 18, 2011 at 4:18 PM, Ahmet Arslan  wrote:
> >> downloaded a recent version and
> >> > > there were the following files/folders:
> >> > >
> >> > > build.xml
> >> > > dev-tools
> >> > > LICENSE.txt
> >> > > lucene
> >> > > NOTICE.txt
> >> > > README.txt
> >> > > solr
> >> > >
> >> > > So I did cp -r solr/* /path/to/solr/stuff/ and
> >> started solr. I didn't get
> >> > > any error message but I only got the following
> >> messages:
> >
> > How do you start solr? using java -jar start.jar? Did you run 'ant clean
> example' in the solr folder?
> >
> >
> >
> >
>


Re: Adding the suggest component

2011-03-22 Thread Brian Lamb
I found the following in the build.xml file:

[build.xml excerpt not preserved in the archive]


It looks like the dataimport handler path is correct in there so I don't
understand why it's not being compiled.

I ran ant example again today but I'm still getting the same error.

Thanks,

Brian Lamb

On Tue, Mar 22, 2011 at 11:28 AM, Brian Lamb
wrote:

> Thanks everyone for the advice. I checked out a recent version from SVN and
> ran:
>
> ant clean example
>
> This worked just fine. However when I went to start the solr server, I get
> this error message:
>
> SEVERE: org.apache.solr.common.SolrException: Error loading class
> 'org.apache.solr.handler.dataimport.DataImportHandler'
>
> It looks like those files are there:
>
> contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/
>
> But for some reason, they aren't able to be found. Where would I update
> this setting and what would I update it to?
>
> Thanks,
>
> Brian Lamb
>
> On Mon, Mar 21, 2011 at 10:15 AM, Erick Erickson 
> wrote:
>
>> OK, I think you're jumping ahead and trying to do
>> too many things at once.
>>
>> What did you download? Source? The distro? The error
>> you posted usually happens for me when I haven't
>> compiled the "example" target from source. So I'd guess
>> you don't have the proper targets built. This assumes you
>> downloaded the source via SVN.
>>
>> If you downloaded a distro, I'd start by NOT copying anything
>> anywhere, just go to the example code and start Solr. Make
>> sure you have what you think you have.
>>
>> I've seen "interesting" things get cured by removing the entire
>> directory where your servlet container unpacks war files, but
>> that's usually in development environments.
>>
>> When I get in these situations, I usually find it's best to back
>> up, do one thing at a time and verify that I get the expected
>> results at each step. It's tedious, but
>>
>> Best
>> Erick
>>
>>
>> On Fri, Mar 18, 2011 at 4:18 PM, Ahmet Arslan  wrote:
>> >> downloaded a recent version and
>> >> > > there were the following files/folders:
>> >> > >
>> >> > > build.xml
>> >> > > dev-tools
>> >> > > LICENSE.txt
>> >> > > lucene
>> >> > > NOTICE.txt
>> >> > > README.txt
>> >> > > solr
>> >> > >
>> >> > > So I did cp -r solr/* /path/to/solr/stuff/ and
>> >> started solr. I didn't get
>> >> > > any error message but I only got the following
>> >> messages:
>> >
>> > How do you start solr? using java -jar start.jar? Did you run 'ant clean
>> example' in the solr folder?
>> >
>> >
>> >
>> >
>>
>
>


Re: Adding the suggest component

2011-03-22 Thread Ahmet Arslan

--- On Tue, 3/22/11, Brian Lamb  wrote:

> From: Brian Lamb 
> Subject: Re: Adding the suggest component
> To: solr-user@lucene.apache.org
> Cc: "Erick Erickson" 
> Date: Tuesday, March 22, 2011, 5:28 PM
> Thanks everyone for the advice. I
> checked out a recent version from SVN and
> ran:
> 
> ant clean example
> 
> This worked just fine. However when I went to start the
> solr server, I get
> this error message:
> 
> SEVERE: org.apache.solr.common.SolrException: Error loading
> class
> 'org.apache.solr.handler.dataimport.DataImportHandler'

run 'ant clean dist' and copy trunk/solr/dist/

apache-solr-dataimporthandler-extras-4.0-SNAPSHOT.jar
apache-solr-dataimporthandler-4.0-SNAPSHOT.jar

to solrHome/lib directory.





  


Re: email - DIH

2011-03-22 Thread Matias Alonso
Thank you very much for your answer Erick.


My apologies for the previous email; my problem is that I don't speak
English very well and I'm new to the world of mailing lists.


The problem is that I'm indexing emails through the Data Import Handler using
Gmail with IMAPS; I am doing this so that the email list can be searched in the
future. The emails are only partially indexed and I can't find out why not all
of the emails are indexed.



Below I show you the configuration of my DIH:

[data-config.xml not preserved in the archive]

The dates of my emails are all later than “2010-01-01 00:00:00”.




I've done a full import and no errors were found, but the status said that 28
documents were added, while in the console I found 35 messages.

Below I show you the status output first, and then part of the console
output.



Status:

0
1
data-config.xml
status
idle
0
28
0
2011-03-22 15:55:12
Indexing completed. Added/Updated: 28 documents. Deleted 0 documents.
2011-03-22 15:55:20
2011-03-22 15:55:20
28
0:0:8.520
This response format is experimental.  It is likely to change in the future.

…”

Mar 22, 2011 3:55:14 PM
org.apache.solr.handler.dataimport.MailEntityProcessor connectToMailBox

INFO: Connected to mailbox

Mar 22, 2011 3:55:15 PM
org.apache.solr.handler.dataimport.MailEntityProcessor$FolderIterator next

INFO: Opened folder : inbox

Mar 22, 2011 3:55:15 PM
org.apache.solr.handler.dataimport.MailEntityProcessor$FolderIterator next

INFO: Added its children to list  :

Mar 22, 2011 3:55:15 PM
org.apache.solr.handler.dataimport.MailEntityProcessor$FolderIterator next

INFO: NO children :

Mar 22, 2011 3:55:16 PM
org.apache.solr.handler.dataimport.MailEntityProcessor$MessageIterator


INFO: Total messages : 35

Mar 22, 2011 3:55:16 PM
org.apache.solr.handler.dataimport.MailEntityProcessor$MessageIterator


INFO: Search criteria applied. Batching disabled

Mar 22, 2011 3:55:19 PM org.apache.solr.handler.dataimport.DocBuilder finish

INFO: Import completed successfully

“…



Regards,

Matias.





2011/3/22 Erick Erickson 

> Not unless you provide a lot more data. Have you
> inspected the Solr logs and seen any anomalies?
>
> Please review:
> http://wiki.apache.org/solr/UsingMailingLists
>
> Best
> Erick
>
> On Mon, Mar 21, 2011 at 3:56 PM, Matias Alonso 
> wrote:
> > Hi,
> >
> >
> > I’m using Data Import Handler for indexing emails.
> >
> > The problem is that not all the emails were indexed when I do a full
> > import.
> >
> > Does someone have any idea?
> >
> >
> > Regards,
> >
> > --
> > Matias.
> >
>


Re: Adding the suggest component

2011-03-22 Thread Brian Lamb
Awesome! That fixed that problem. I'm getting another class not found error
but I'll see if I can fix it on my own first.

On Tue, Mar 22, 2011 at 11:56 AM, Ahmet Arslan  wrote:

>
> --- On Tue, 3/22/11, Brian Lamb  wrote:
>
> > From: Brian Lamb 
> > Subject: Re: Adding the suggest component
> > To: solr-user@lucene.apache.org
> > Cc: "Erick Erickson" 
> > Date: Tuesday, March 22, 2011, 5:28 PM
> > Thanks everyone for the advice. I
> > checked out a recent version from SVN and
> > ran:
> >
> > ant clean example
> >
> > This worked just fine. However when I went to start the
> > solr server, I get
> > this error message:
> >
> > SEVERE: org.apache.solr.common.SolrException: Error loading
> > class
> > 'org.apache.solr.handler.dataimport.DataImportHandler'
>
> run 'ant clean dist' and copy trunk/solr/dist/
>
> apache-solr-dataimporthandler-extras-4.0-SNAPSHOT.jar
> apache-solr-dataimporthandler-4.0-SNAPSHOT.jar
>
> to solrHome/lib directory.
>
>
>
>
>
>
>


Re: Adding the suggest component

2011-03-22 Thread Brian Lamb
I fixed a few other exceptions it threw when I started the server but I
don't know how to fix this one:

java.lang.NoClassDefFoundError: Could not initialize class
org.apache.solr.handler.dataimport.DataImportHandler
at java.lang.Class.forName0(Native Method)

java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at
org.apache.solr.handler.dataimport.DataImportHandler.<init>(DataImportHandler.java:72)

Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)

I've searched Google but haven't been able to find a reason why this happens
and how to fix it.

Thanks,

Brian Lamb

On Tue, Mar 22, 2011 at 12:54 PM, Brian Lamb
wrote:

> Awesome! That fixed that problem. I'm getting another class not found error
> but I'll see if I can fix it on my own first.
>
>
> On Tue, Mar 22, 2011 at 11:56 AM, Ahmet Arslan  wrote:
>
>>
>> --- On Tue, 3/22/11, Brian Lamb  wrote:
>>
>> > From: Brian Lamb 
>> > Subject: Re: Adding the suggest component
>> > To: solr-user@lucene.apache.org
>> > Cc: "Erick Erickson" 
>> > Date: Tuesday, March 22, 2011, 5:28 PM
>> > Thanks everyone for the advice. I
>> > checked out a recent version from SVN and
>> > ran:
>> >
>> > ant clean example
>> >
>> > This worked just fine. However when I went to start the
>> > solr server, I get
>> > this error message:
>> >
>> > SEVERE: org.apache.solr.common.SolrException: Error loading
>> > class
>> > 'org.apache.solr.handler.dataimport.DataImportHandler'
>>
>> run 'ant clean dist' and copy trunk/solr/dist/
>>
>> apache-solr-dataimporthandler-extras-4.0-SNAPSHOT.jar
>> apache-solr-dataimporthandler-4.0-SNAPSHOT.jar
>>
>> to solrHome/lib directory.
>>
>>
>>
>>
>>
>>
>>
>


Solr 1.4.1 and Tika 0.9 - some tests not passing

2011-03-22 Thread Andreas Kemkes
Due to some PDF indexing issues with the Solr 1.4.1 distribution, we would like 
to upgrade it to Tika 0.9, as the issues are not occurring in Tika 0.9.

With the changes we made to Solr 1.4.1, we can successfully index the 
previously 
failing PDF documents.

Unfortunately we cannot get the HTML-related tests to pass.

The following asserts in ExtractingRequestHandlerTest.java are failing:

assertQ(req("title:Welcome"), "//*[@numFound='1']");
assertQ(req("+id:simple2 +t_href:[* TO *]"), "//*[@numFound='1']");
assertQ(req("t_href:http"), "//*[@numFound='2']");
assertQ(req("t_href:http"), "//doc[1]/str[.='simple3']");
assertQ(req("+id:simple4 +t_content:Solr"), "//*[@numFound='1']");
assertQ(req("defaultExtr:http\\://www.apache.org"), "//*[@numFound='1']");
assertQ(req("+id:simple2 +t_href:[* TO *]"), "//*[@numFound='1']");
assertTrue(val + " is not equal to " + "linkNews", val.equals("linkNews") == true); // there are two  tags, and they get collapsed

Below are the differences in output from Tika 0.4 and Tika 0.9 for simple.html.

Tika 0.9 has additional meta tags, a shape attribute, and some additional white 
space.  Is this what throws it off?  

What do we need to consider so that Solr 1.4.1 will process the Tika 0.9 output 
correctly?

Do we need to configure different filters and tokenizers?  Which ones?

Or is it something else entirely?

Thanks in advance for any help,

Andreas

$ java -jar tika-app-0.4.jar 
../../../apache-solr-1.4.1-with-tika-0.9/contrib/extraction/src/test/resources/simple.html



Welcome to Solr



  Here is some text


Here is some text in a div
This has a link'>http://www.apache.org";>link.





$ java -jar tika-app-0.9.jar 
../../../apache-solr-1.4.1-with-tika-0.9/contrib/extraction/src/test/resources/simple.html
 







Welcome to Solr



  Here is some text


Here is some text in a div

This has a link'>http://www.apache.org";>link.






  

Re: Adding the suggest component

2011-03-22 Thread Ahmet Arslan
> java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.solr.handler.dataimport.DataImportHandler
> at java.lang.Class.forName0(Native Method)
> 
> java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
> at
> org.apache.solr.handler.dataimport.DataImportHandler.<init>(DataImportHandler.java:72)
> 
> Caused by: java.lang.ClassNotFoundException:
> org.slf4j.LoggerFactory
> at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
> 

You can find the slf4j-related jars in \trunk\solr\lib, but this error is weird.


  


Architecture question about solr sharding

2011-03-22 Thread JohnRodey
I have an issue and I'm wondering if there is an easy way around it with just
SOLR.

I have multiple SOLR servers and a field in my schema is a relative path to
a binary file.  Each SOLR server is responsible for a different subset of
data that belongs to a different base path.

For Example...

My directory structure may look like this:
/someDir/Jan/binaryfiles/...
/someDir/Feb/binaryfiles/...
/someDir/Mar/binaryfiles/...
/someDir/Apr/binaryfiles/...

Server1 is responsible for Jan, Server2 for Feb, etc...

And a response document may have a field like this
my entry
binaryfiles/12345.bin

How can I tell from my main search server which server returned a result?
I cannot put the full path in the index because my path structure might
change in the future.  Using this example it may go to '/someDir/Jan2011/'.

I basically need to find a way to say 'Ah! server01 returned this result, so
it must be in /someDir/Jan'

Thanks!



Re: Adding the suggest component

2011-03-22 Thread Brian Lamb
That fixed that error, as well as the "could not initialize DataImportHandler
class" error. Now I'm getting:

org.apache.solr.common.SolrException: Error Instantiating Request Handler,
org.apache.solr.handler.dataimport.DataImportHandler is not a
org.apache.solr.request.SolrRequestHandler

I can't find anything on this one. What I've added to the solrconfig.xml
file matches what's in example-DIH, so I don't quite understand what the issue
is here. It sounds to me like it is not declared properly somewhere, but I'm
not sure where/why.

Here is the relevant portion of my solrconfig.xml file:


   
 db-data-config.xml
   


Thanks for all the help so far. You all have been great.

Brian Lamb

On Tue, Mar 22, 2011 at 3:17 PM, Ahmet Arslan  wrote:

> > java.lang.NoClassDefFoundError: Could not initialize class
> > org.apache.solr.handler.dataimport.DataImportHandler
> > at java.lang.Class.forName0(Native Method)
> >
> > java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
> > at
> >
> org.apache.solr.handler.dataimport.DataImportHandler.<init>(DataImportHandler.java:72)
> >
> > Caused by: java.lang.ClassNotFoundException:
> > org.slf4j.LoggerFactory
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
> >
>
> You can find slf4j- related jars in \trunk\solr\lib, but this error is
> weird.
>
>
>
>


Re: Solr performance issue

2011-03-22 Thread Alexey Serba
> Btw, I am monitoring output via jconsole with 8gb of ram and it still goes
> to 8gb every 20 seconds or so,
> gc runs, falls down to 1gb.

Hmm, the JVM filling 8GB every 20 seconds sounds like a lot.

Do you return all results (ids) for your queries? Any tricky
faceting/sorting/function queries?
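
For comparison, a minimal SolrJ sketch of a memory-friendly request (the URL,
query, and field names are just illustrative) — worth checking how far your
actual queries are from something like this:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QueryMemoryCheck {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("some query");
        q.setRows(20);       // page through hits instead of pulling everything back
        q.setFields("id");   // return only the id field, not every stored field

        QueryResponse rsp = server.query(q);
        System.out.println("numFound=" + rsp.getResults().getNumFound()
                + ", returned=" + rsp.getResults().size());
    }
}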


Re: help with Solr installation within Tomcat7

2011-03-22 Thread Erick Erickson
What error are you receiving? Check your config files for any
absolute rather than relative paths would be my first guess...

Best
Erick

On Tue, Mar 22, 2011 at 10:09 AM,   wrote:
> Hi All:
>   I have just started using Solr and have it successfully installed within a 
> Tomcat7 Webapp server.
> I have also indexed documents using the SolrJ interfaces. The following is my 
> problem:
>
> I installed Solr under Tomcat7 folders and setup an xml configuration file to 
> indicate the Solr home variables as detailed on the wiki (for Solr install 
> within Tomcat)
> The indexes seem to reside within the solr_home folder under the data folder  
> (/data/index )
>
> > However when I make a zip copy of the complete install (i.e. Tomcat with 
> Solr), and move it to a different machine and unzip/install it,
> The index seems to be inaccessible. (I did change the solr.xml configuration 
> variables to point to the new location)
>
> From what I know, with tomcat installations, it should be as simple as 
> zipping a current working installation and unzipping/installing  on a 
> different machine/location.
>
> Am I missing something that makes Solr "hardcode" the path to the index in an 
> install ?
>
> > Simply put, I would like to know how to "transport" an existing install of 
> > Solr within Tomcat 7 from one machine to another and still have it working.
>
> > Ramdev
>


Re: Architecture question about solr sharding

2011-03-22 Thread Erick Erickson
I'd just put the data in the document. That way, you're not
inferring anything, you *know* which shard (or even the
logical shard) the data came from.
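
For instance, something along these lines at index time — a minimal SolrJ
sketch, where the URL and the "path"/"source" field names are just illustrative:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexWithSource {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://server01:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "12345");
        doc.addField("title", "my entry");
        doc.addField("path", "binaryfiles/12345.bin");
        // a logical source name you can map to the real base path
        // (/someDir/Jan today, /someDir/Jan2011 tomorrow) outside the index
        doc.addField("source", "jan");

        server.add(doc);
        server.commit();
    }
}

Resolving "jan" to the actual base directory then stays in your application,
so the on-disk layout can change without re-indexing.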

Does that make sense in your problem space?

Erick

On Tue, Mar 22, 2011 at 3:20 PM, JohnRodey  wrote:
> I have an issue and I'm wondering if there is an easy way around it with just
> SOLR.
>
> I have multiple SOLR servers and a field in my schema is a relative path to
> a binary file.  Each SOLR server is responsible for a different subset of
> data that belongs to a different base path.
>
> For Example...
>
> My directory structure may look like this:
> /someDir/Jan/binaryfiles/...
> /someDir/Feb/binaryfiles/...
> /someDir/Mar/binaryfiles/...
> /someDir/Apr/binaryfiles/...
>
> Server1 is responsible for Jan, Server2 for Feb, etc...
>
> And a response document may have a field like this
> my entry
> binaryfiles/12345.bin
>
> How can I tell from my main search server which server returned a result?
> I cannot put the full path in the index because my path structure might
> change in the future.  Using this example it may go to '/someDir/Jan2011/'.
>
> I basically need to find a way to say 'Ah! server01 returned this result, so
> it must be in /someDir/Jan'
>
> Thanks!
>
>


Re: help with Solr installation within Tomcat7

2011-03-22 Thread Ezequiel Calderara
Where are your Solr files (war, conf files) located? How did you instantiate
Solr in Tomcat?

On Tue, Mar 22, 2011 at 7:08 PM, Erick Erickson wrote:

> What error are you receiving? Check your config files for any
> absolute rather than relative paths would be my first guess...
>
> Best
> Erick
>
> On Tue, Mar 22, 2011 at 10:09 AM,  
> wrote:
> > Hi All:
> >   I have just started using Solr and have it successfully installed
> within a Tomcat7 Webapp server.
> > I have also indexed documents using the SolrJ interfaces. The following
> is my problem:
> >
> > I installed Solr under Tomcat7 folders and setup an xml configuration
> file to indicate the Solr home variables as detailed on the wiki (for Solr
> install within Tomcat)
> > The indexes seem to reside within the solr_home folder under the data
> folder  (/data/index )
> >
> > However when I make a zip copy of the complete install (i.e. Tomcat
> with Solr), and move it to a different machine and unzip/install it,
> > The index seems to be inaccessible. (I did change the solr.xml
> configuration variables to point to the new location)
> >
> > From what I know, with tomcat installations, it should be as simple as
> zipping a current working installation and unzipping/installing  on a
> different machine/location.
> >
> > Am I missing something that makes Solr "hardcode" the path to the index
> in an install ?
> >
> > Simply put, I would like to know how to "transport" an existing install
> of Solr within Tomcat 7 from one machine to another and still have it
> working.
> >
> > Ramdev
> >
>



-- 
__
Ezequiel.

Http://www.ironicnet.com


Multiple Cores with Solr Cell for indexing documents

2011-03-22 Thread Brandon Waterloo
Hello everyone,

I've been trying for several hours now to set up Solr with multiple cores with 
Solr Cell working on each core.  The only items being indexed are PDF, DOC, and 
TXT files (with the possibility of expanding this list, but for now, just 
assume the only things in the index should be documents).

I never had any problems with Solr Cell when I was using a single core.  In 
fact, I just ran the default installation in example/ and worked from that.  
However, trying to migrate to multi-core has been a never ending list of 
problems.

Any time I try to add a document to the index (using the same curl command as I 
did to add to the single core, of course adding the core name to the request 
URL-- host/solr/corename/update/extract...), I get HTTP 500 errors due to 
classes not being found and/or lazy loading errors.  I've copied the exact 
example/lib directory into the cores, and that doesn't work either.

Frankly the only libraries I want are those relevant to indexing files.  The 
less bloat, the better, after all.  However, I cannot figure out where to put 
what files, and why the example installation works perfectly for single-core 
but not with multi-cores.

Here is an example of the errors I'm receiving:

command prompt> curl 
"host/solr/core0/update/extract?literal.id=2-3-1&commit=true" -F 
"myfile=@test2.txt"




Error 500 

HTTP ERROR: 500
org/apache/tika/exception/TikaException

java.lang.NoClassDefFoundError: org/apache/tika/exception/TikaException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:240)
at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.lang.ClassNotFoundException: 
org.apache.tika.exception.TikaException
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 27 more

RequestURI=/solr/core0/update/extract

Powered by Jetty:// (http://jetty.mortbay.org/)

Any assistance you could provide or installation guides/tutorials/etc. that you 
could link me to would be greatly appreciated.  Thank you all for your time!

~Brandon Waterloo



Re: email - DIH

2011-03-22 Thread Gora Mohanty
On Tue, Mar 22, 2011 at 9:38 PM, Matias Alonso  wrote:
[...]
> The problem is that I'm indexing emails through the Data Import Handler using
> Gmail with IMAPS; I am doing this so that the email list can be searched in the
> future. The emails are only partially indexed and I can't find out why not all
> of the emails are indexed.
[...]
> I've done a full import and no errors were found, but the status said that
> 28 documents were added, while in the console I found 35 messages.
[...]

> INFO: Total messages : 35
>
> Mar 22, 2011 3:55:16 PM
> org.apache.solr.handler.dataimport.MailEntityProcessor$MessageIterator
> 
>
> INFO: Search criteria applied. Batching disabled
[...]

The above seems to indicate that the MailEntityProcessor does find
all 35 messages, but indexes only 28. Are you sure that all 35 are
from after 2010-01-01 00:00:00? Could you try without fetchMailsSince?

Regards,
Gora