Dataimporthandler & Timestamp Error ?

2009-05-07 Thread gateway0

Hi,

When I do a full import, I get the following error:

"Caused by: java.sql.SQLException: Cannot convert value '-00-00
00:00:00' from column 10 to TIMESTAMP.
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
at com.mysql.jdbc.ResultSetRow.getTimestampFast(ResultSetRow.java:1321)
at com.mysql.jdbc.BufferRow.getTimestampFast(BufferRow.java:573)
at
com.mysql.jdbc.ResultSetImpl.getTimestampInternal(ResultSetImpl.java:6617)
at com.mysql.jdbc.ResultSetImpl.getTimestamp(ResultSetImpl.java:5943)
at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4901)
at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4951)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.getARow(JdbcDataSource.java:220)
... 11 more
Caused by: java.sql.SQLException: Value '[...@14f9f4a' can not be represented
as java.sql.Timestamp
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
at com.mysql.jdbc.ResultSetRow.getTimestampFast(ResultSetRow.java:1027)
... 17 more"

But I thought the timestamp was generated automatically and had nothing to do
with my MySQL database?

best regards, Sebastian
-- 
View this message in context: 
http://www.nabble.com/Dataimporthandler---Timestamp-Error---tp23422139p23422139.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Dataimporthandler & Timestamp Error ?

2009-05-07 Thread Noble Paul നോബിള്‍ नोब्ळ्
You may need to change the MySQL connection parameters so that the driver does
not throw an error for the zero date:

"jdbc:mysql://localhost/test?zeroDateTimeBehavior=convertToNull"

On Thu, May 7, 2009 at 1:39 PM, gateway0  wrote:
>
> Hi,
>
> when I do a full import I get the following error :
>
> "Caused by: java.sql.SQLException: Cannot convert value '-00-00
> 00:00:00' from column 10 to TIMESTAMP.
>        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
>        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
>        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
>        at com.mysql.jdbc.ResultSetRow.getTimestampFast(ResultSetRow.java:1321)
>        at com.mysql.jdbc.BufferRow.getTimestampFast(BufferRow.java:573)
>        at
> com.mysql.jdbc.ResultSetImpl.getTimestampInternal(ResultSetImpl.java:6617)
>        at com.mysql.jdbc.ResultSetImpl.getTimestamp(ResultSetImpl.java:5943)
>        at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4901)
>        at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4951)
>        at
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.getARow(JdbcDataSource.java:220)
>        ... 11 more
> Caused by: java.sql.SQLException: Value '[...@14f9f4a' can not be represented
> as java.sql.Timestamp
>        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
>        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
>        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
>        at com.mysql.jdbc.ResultSetRow.getTimestampFast(ResultSetRow.java:1027)
>        ... 17 more"
>
> But I thought the Timestamp is generated automatically and has nothing to do
> with my mysql database?
>
> best regards, Sebastian
> --
> View this message in context: 
> http://www.nabble.com/Dataimporthandler---Timestamp-Error---tp23422139p23422139.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Creating new QParserPlugin

2009-05-07 Thread Andrey Klochkov
Hi!

I agree that Solr is difficult to extend in many cases. We just patch Solr,
and I guess many other users patch it too. What I propose is to create some
Solr-community site (a Solr incubator?) to publish patches on, and the Solr
core team could then look there and choose patches to apply to the Solr
codebase. I know that one can use Jira for that, but it's not convenient to
use it in this way.

On Thu, May 7, 2009 at 2:41 AM, KaktuChakarabati wrote:

>
> Hello everyone,
> I am trying to write a new QParserPlugin+QParser, one that will work
> similar
> to how DisMax does, but will give me more control over the
> FunctionQuery-related part of the query processing (e.g in regards to a
> specified bf parameter).
>
> Specifically, I want to be able to affect the way the queryNorm (and
> possibly other factors) interact with a
> pre-computed value I store in a static field (i.e. I compute an index-time
> score for a document that I wish to use in a bf as a ValueSource, without
> being affected by queryNorm or other such extraneous considerations.)
>
> While trying this, I notice I often run into cases where some parts I try to
> override/inherit from are private to a Java package namespace, and this
> makes the whole thing very cumbersome.
>
> Examples of this are the DismaxQParser class, which is defined as a local
> class inside the DisMaxQParserPlugin.java file (I think this is bad
> practice
> - otherwise, FunctionQParserPlugin/FunctionQParser do have their own
> separate files, so I think this is a good convention to follow generally).
> Another case is where I try to inherit from FunctionQParser and end up not
> being able to replicate some of the parse() logic, because it uses the
> QueryParsing.StrParser class, which is a static inner class and so is only
> accessible from the solr.search namespace.
>
> In short, many such cases seem to arise, and I think this poses a
> considerable limitation on
> the possibilities of extending Solr.
> If this resonates with more people here, I'd take this issue up with
> solr-dev.
>
> Otherwise, if some of you have some notions about going about what I'm
> trying to do differently,
> I would be happy to hear them.
>
> Thanks,
> -Chak
> --
> View this message in context:
> http://www.nabble.com/Creating-new-QParserPlugin-tp23416974p23416974.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Andrew Klochkov
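For anyone wanting to experiment, a bare QParserPlugin skeleton against the
1.4-era API. The class name and the delegation to dismax are illustrative
assumptions, not Chak's actual code:

import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class MyQParserPlugin extends QParserPlugin {
    @Override
    public void init(NamedList args) {
        // read any configuration given in solrconfig.xml
    }

    @Override
    public QParser createParser(final String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
            @Override
            public Query parse() throws org.apache.lucene.queryParser.ParseException {
                // build the Query here; for illustration, delegate to dismax
                return subQuery(qstr, "dismax").getQuery();
            }
        };
    }
}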


Re: Dataimporthandler & Timestamp Error ?

2009-05-07 Thread gateway0

Awesome, thanks!!! I first thought it could be "blob-field" related.

Have a nice day

Sebastian


Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> 
> You may need to change the MySQL connection parameters so that the driver does
> not throw an error for the zero date:
> 
> "jdbc:mysql://localhost/test?zeroDateTimeBehavior=convertToNull"
> 
> On Thu, May 7, 2009 at 1:39 PM, gateway0  wrote:
>>
>> Hi,
>>
>> when I do a full import I get the following error :
>>
>> "Caused by: java.sql.SQLException: Cannot convert value '-00-00
>> 00:00:00' from column 10 to TIMESTAMP.
>>        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
>>        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
>>        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
>>        at
>> com.mysql.jdbc.ResultSetRow.getTimestampFast(ResultSetRow.java:1321)
>>        at com.mysql.jdbc.BufferRow.getTimestampFast(BufferRow.java:573)
>>        at
>> com.mysql.jdbc.ResultSetImpl.getTimestampInternal(ResultSetImpl.java:6617)
>>        at
>> com.mysql.jdbc.ResultSetImpl.getTimestamp(ResultSetImpl.java:5943)
>>        at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4901)
>>        at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4951)
>>        at
>> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.getARow(JdbcDataSource.java:220)
>>        ... 11 more
>> Caused by: java.sql.SQLException: Value '[...@14f9f4a' can not be
>> represented
>> as java.sql.Timestamp
>>        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
>>        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
>>        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
>>        at
>> com.mysql.jdbc.ResultSetRow.getTimestampFast(ResultSetRow.java:1027)
>>        ... 17 more"
>>
>> But I thought the Timestamp is generated automatically and has nothing to
>> do
>> with my mysql database?
>>
>> best regards, Sebastian
>> --
>> View this message in context:
>> http://www.nabble.com/Dataimporthandler---Timestamp-Error---tp23422139p23422139.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Dataimporthandler---Timestamp-Error---tp23422139p23424354.html
Sent from the Solr - User mailing list archive at Nabble.com.



French and SpellingQueryConverter

2009-05-07 Thread Jonathan Mamou

Hi
I have tried to run the following code
package org.apache.solr.spelling;

import org.apache.lucene.analysis.fr.FrenchAnalyzer;


public class Test {

  public static void main(String[] args) {
    SpellingQueryConverter sqc = new SpellingQueryConverter();
    sqc.analyzer = new FrenchAnalyzer();
    System.out.println(sqc.convert("français"));
  }
}

I would expect to get [(français,0,8,type=)]
However I get [(fran,0,4,type=), (ais,5,8,type=)]
Is there any issue with the support of special characters?
Thanks
Jonathan



Re: aka Replication Stall

2009-05-07 Thread Jeff Newburn
We have not pushed the fix into production yet.  However, I am wondering two
things: 1. If the download takes more than 10 seconds (our replication can
take up to 90 seconds), will that be an issue? 2. There are 3 patches; 2 have
2-line changes, 1 has a large amount. Do we need the latest 2 or just the
latest 1?

-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562


> From: Noble Paul നോബിള്‍  नोब्ळ् 
> Reply-To: 
> Date: Wed, 6 May 2009 10:05:49 +0530
> To: 
> Subject: Re:  aka Replication Stall
> 
> SOLR-1096



Re: Solr Plugins Simple Questions for the Simpleton

2009-05-07 Thread Jeff Newburn
> On May 6, 2009, at 3:25 PM, Jeff Newburn wrote:
> 
>> We are trying to implement a SearchComponent plugin.  I have been
>> looking at
>> QueryElevateComponent trying to weed through what needs to be done.
>> My
>> basic desire is to get the results back and manipulate them either by
>> altering the actual results or the facets.
>> 
>> Questions:
>> 1. Do the components fire off in order or all individually? If so
>> how does
>> one chain them together?
> 
> http://wiki.apache.org/solr/SearchComponent
I apologize. This question was looking more for insight into how the
requests are made.  One more interesting question: what does each
component get from the previous one?

>> 
>> 2. Where are the actual documents returned (ie what object gets the
>> return
>> results)?
> 
> Look on the ResponseBuilder object.
I looked into the javadoc for this class, and the description is as follows:
"This class is experimental and will be changing in the future."

Are there any tips to point us in the right direction to use and manipulate
this? Also does this class get passed from component to component?


>> 
>> 3. Is there any specific place I should manipulate the result set?
> 
> I've done it in the past right on the response docset/doclist, but
> I've seen others discourage this kind of thing b/c you might not know
> the downstream effects
So does the doc list get passed down the chain in the responsebuilder?


>> 4. Can the individual documents be changed before returning to the
>> client?
> 
> In what way?
In a way that might manipulate what is returned.  We have 2 potential
avenues: 1. change the document to remove some values from a multivalued
field, or 2. change the facets returned.
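To make the chaining concrete, here is a skeletal SearchComponent against the
1.4-era API. The class name and response key are invented for illustration,
and as Grant notes, mutating the shared results can have downstream effects:

import java.io.IOException;

import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class ResultTweakComponent extends SearchComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // called on every component in the chain before any process() runs
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // rb is the shared state: rb.req is the request, rb.results holds the
        // DocList/DocSet, and rb.rsp is the response built up so far; each
        // component sees whatever earlier components put here
        rb.rsp.add("tweaked", true);
    }

    @Override
    public String getDescription() { return "example result tweaker"; }

    @Override
    public String getSource() { return ""; }

    @Override
    public String getSourceId() { return ""; }

    @Override
    public String getVersion() { return ""; }
}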

>> -- 
>> Jeff Newburn
>> Software Engineer, Zappos.com
>> jnewb...@zappos.com - 702-943-7562
>> 
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search
> 



Re: Upgrading from 1.2.0 to 1.3.0

2009-05-07 Thread Rob Casson
this isn't advice on how to upgrade, but if you/your-project have a
bit of time to wait, 1.4 sounds like it's getting close to an official
release... fyi.

cheers,
rob


On Tue, May 5, 2009 at 1:05 PM, Francis Yakin  wrote:
>
> What's the best way to upgrade solr from 1.2.0 to 1.3.0 ?
>
> We have the current index that our users search running on 1.2.0 Solr version.
>
> We would like to upgrade it to 1.3.0?
>
> We have Master/Slaves env.
>
> What's the best way to upgrade it without affecting the search? Do we need to 
> do it on master or slaves first?
>
>
>
> Thanks
>
> Francis
>
>
>


Re: Solr autocompletion in rails

2009-05-07 Thread manisha_5

Thanks a lot for the information. But I am still a bit confused about the use
of TermsComponent. Like, where exactly are we going to put this code in
Solr? For example, I changed schema.xml to add the autocomplete feature. I read
your blog too, it's very helpful. But still a little confused. :-((
Can you explain it a bit?



Matt Weber-2 wrote:
> 
> You will probably want to use the new TermsComponent in Solr 1.4.  See
> http://wiki.apache.org/solr/TermsComponent.  I just recently wrote a blog
> post about using autocompletion with TermsComponent, a servlet, and jQuery.
> You can probably follow these instructions, but instead of writing a
> servlet you can write a rails handler parsing the json output directly.
> 
> http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/
>  
> .
> 
> Thanks,
> 
> Matt Weber
> 
> 
> 
> On May 4, 2009, at 9:39 AM, manisha_5 wrote:
> 
>>
>> Hi,
>>
>> I am new to Solr. I am using a Solr server to index the data and make search
>> in a Ruby on Rails project. I want to add an autocompletion feature. I tried
>> the XML patch in the schema.xml file of Solr, but don't know how to test
>> if the feature is working. I also haven't been able to integrate the same in
>> the Rails project that is using Solr. Can anyone please provide some help in
>> this regard??
>>
>> The patch of code in schema.xml is:
>>
>> <fieldType name="autocomplete" class="solr.TextField">
>>   <analyzer type="index">
>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
>>             maxGramSize="15" />
>>     <filter class="solr.PatternReplaceFilterFactory"
>>             pattern="([^a-z0-9])" replacement="" replace="all" />
>>     <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100"
>>             minGramSize="1" />
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.PatternReplaceFilterFactory"
>>             pattern="([^a-z0-9])" replacement="" replace="all" />
>>     <filter class="solr.PatternReplaceFilterFactory"
>>             pattern="^(.{20})(.*)?" replacement="$1" replace="all" />
>>   </analyzer>
>> </fieldType>
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/Solr-autocompletion-in-rails-tp23372020p23372020.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Solr-autocompletion-in-rails-tp23372020p23428267.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: When should I

2009-05-07 Thread Eric Sabourin
Great... thanks for the response!

2009/5/7 Noble Paul നോബിള്‍ नोब्ळ् 

> it is wise to optimize the index once in a while (daily, maybe). But
> it depends on how many commits you do in a day. Every commit causes
> fragmentation of index files, and your search can become slow if you do
> not optimize.
>
> But always optimizing is not recommended, because it is time consuming,
> and your replication (if it is a master/slave setup) can take longer.
>
> If you do a delete-all, then do an optimize anyway.
>
> On Wed, May 6, 2009 at 9:18 PM, Eric Sabourin
>  wrote:
> > Is the optimize xml command something which is only required when I
> delete
> > all the docs?
> > Or should I also send the optimize command following other operations? or
> > daily?
> >
> > Thanks...
> > Eric
> >
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>



-- 
Eric
Sent from Halifax, NS, Canada
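For reference, a minimal SolrJ sketch of the two operations being discussed;
the server URL is a placeholder and the 1.3-era CommonsHttpSolrServer client
is assumed:

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class OptimizeDemo {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        solr.commit();   // frequent: makes recent adds visible to searchers
        solr.optimize(); // occasional (e.g. nightly): merges segment files
    }
}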


Is it possible to writing solr result on disk from the server side?

2009-05-07 Thread arno13

Do you know if it's possible to write Solr results directly to a hard disk
from the server side, rather than using an HTTP connection to transfer the results?

While the query time is very fast for Solr, I want to do that because of the
time taken to transfer the results between the client and the Solr server
when you have a lot of 'rows'.
For instance, for 10'000 rows the query time could be 50 ms, but it takes 19 s
to get the results from the server. As my client and server are on the same
system, I could get the results faster directly from the hard disk (or better,
a RAM disk). Is it possible to configure Solr for that?

Regards,



 
-- 
View this message in context: 
http://www.nabble.com/Is-it-possible-to-writing-solr-result-on-disk-from-the-server-side--tp23428509p23428509.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: aka Replication Stall

2009-05-07 Thread Noble Paul നോബിള്‍ नोब्ळ्
the patches have gone into the trunk. The latest patch should be the
one if you wish to run a patched Solr.

A 10-sec readTimeout means that if there is no data coming from the
other end for 10 secs, then the waiting thread returns, throwing an
exception. It is not the total time taken to read the entire data. At
least that is what I observed while testing.

BTW, if the timeout occurs it resumes from the point where the failure
happened. It retries 5 times before giving up.
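The same readTimeout semantics can be reproduced with plain java.net (an
illustration only, not Solr's replication code; host and port are
placeholders):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ReadTimeoutDemo {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://master:8983/solr/replication").openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(10000); // 10s of *silence* triggers the exception
        InputStream in = conn.getInputStream();
        byte[] buf = new byte[8192];
        // A 90-second download still succeeds, as long as every individual
        // read returns some data within 10 seconds.
        while (in.read(buf) != -1) { /* consume */ }
        in.close();
    }
}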

On Thu, May 7, 2009 at 7:32 PM, Jeff Newburn  wrote:
> We have not pushed the fix into production yet.  However, I am wondering two
> things. 1. If the download takes more than 10 seconds (our replication can
> take up to 90 seconds) will that be an issue 2. There are 3 patches, 2 have
> 2 line changes 1 has a large amount. Do we need the latest 2 or just the
> latest 1?
>
> --
> Jeff Newburn
> Software Engineer, Zappos.com
> jnewb...@zappos.com - 702-943-7562
>
>
>> From: Noble Paul നോബിള്‍  नोब्ळ् 
>> Reply-To: 
>> Date: Wed, 6 May 2009 10:05:49 +0530
>> To: 
>> Subject: Re:  aka Replication Stall
>>
>> SOLR-1096
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Is it possible to writing solr result on disk from the server side?

2009-05-07 Thread Noble Paul നോബിള്‍ नोब्ळ्
did you consider using an EmbeddedSolrServer?
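For context, a minimal EmbeddedSolrServer sketch along the lines of the SolrJ
wiki pattern of the time; the solr home path is a placeholder and error
handling is omitted:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.core.CoreContainer;

public class EmbeddedQuery {
    public static void main(String[] args) throws Exception {
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer container = initializer.initialize();
        // results come back in-process as Java objects - no HTTP transfer
        EmbeddedSolrServer server = new EmbeddedSolrServer(container, "");
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(10000);
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResults().getNumFound());
        container.shutdown();
    }
}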

On Thu, May 7, 2009 at 8:25 PM, arno13  wrote:
>
> Do you know if it's possible to writing solr results directly on a hard disk
> from server side and not to use an HTTP connection to transfer the results?
>
> While the query time is very fast for solr, I want to do that, cause of the
> time taken during the transfer of the results between the client and the
> solr server when you have lot of 'rows'.
> For instance for 10'000 rows, the query time could be 50 ms and 19s to get
> the results from the server. As my client and server are on the same system,
> I could get the results faster directly on the hard disk (or better in a ram
> disk), is it possible configuring solr for that?
>
> Regards,
>
>
>
>
> --
> View this message in context: 
> http://www.nabble.com/Is-it-possible-to-writing-solr-result-on-disk-from-the-server-side--tp23428509p23428509.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: aka Replication Stall

2009-05-07 Thread Jeff Newburn
Excellent! Thank you, I am going to start testing that.
-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562


> From: Noble Paul നോബിള്‍  नोब्ळ् 
> Reply-To: 
> Date: Thu, 7 May 2009 20:26:02 +0530
> To: 
> Subject: Re:  aka Replication Stall
> 
> the patches have gone into the trunk. The latest patch should be the
> one if you wish to run a patched Solr.
> 
> 10 secs readTimeout means that if there is no data coming from the
> other end for 10 secs, then the waiting thread returns throwing an
> exception. It is not the total time taken to read the entire data. At
> least that is what I observed while testing.
> 
> BTW, if the timeout occurs it resumes from the point where the failure
> happened. It retries 5 times before giving up.
> 
> On Thu, May 7, 2009 at 7:32 PM, Jeff Newburn  wrote:
>> We have not pushed the fix into production yet.  However, I am wondering two
>> things. 1. If the download takes more than 10 seconds (our replication can
>> take up to 90 seconds) will that be an issue 2. There are 3 patches, 2 have
>> 2 line changes 1 has a large amount. Do we need the latest 2 or just the
>> latest 1?
>> 
>> --
>> Jeff Newburn
>> Software Engineer, Zappos.com
>> jnewb...@zappos.com - 702-943-7562
>> 
>> 
>>> From: Noble Paul നോബിള്‍  नोब्ळ् 
>>> Reply-To: 
>>> Date: Wed, 6 May 2009 10:05:49 +0530
>>> To: 
>>> Subject: Re:  aka Replication Stall
>>> 
>>> SOLR-1096
>> 
>> 
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com



Re: large index vs multicore

2009-05-07 Thread Nicolas Pastorino

Hi, and sorry for slightly hijacking the thread,

On Mar 26, 2009, at 2:54 , Otis Gospodnetic wrote:



Hi,

Without knowing the details, I'd say keep it in the same index if  
the additional information shares some/enough fields with the main  
product data and separately if it's sufficiently distinct (this  
also means 2 queries and manual merging/joining).


Where would this manual merging/joining occur? At the client side or
inside Solr, before returning the results?

I was also wondering what would happen to relevancy, sorting, etc.
--
Nicolas



Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: "Manepalli, Kalyan" 
To: "solr-user@lucene.apache.org" 
Sent: Wednesday, March 25, 2009 5:46:40 PM
Subject: large index vs multicore

Hi All,
In my project, I have one primary core containing all the basic
information for a product.
Now I need to add additional information which will be searched and
displayed in conjunction with the product results.
My question is - from a design and query speed point of view - should I
add a new core to handle the additional data, or should I add the data to
the existing core?

The data size is not very large, around 150,000 - 200,000 documents.

Any insights into this will be helpful.

Thanks,
Kalyan Manepalli




--
Nicolas Pastorino
Consultant - Trainer - System Developer
Phone :  +33 (0)4.78.37.01.34
eZ Systems ( Western Europe )  |  http://ez.no






RE: When should I

2009-05-07 Thread Wang, Ching-Hsien
We do optimize once a day at 1am.

Ching-hsien Wang,  Manager
Library and Archives System Support Branch
Office of Chief Information Officer
Smithsonian Institution
202-633-5581(office)  202-312-2874(fax)
wan...@si.edu
Visit us online: www.siris.si.edu

-Original Message-
From: Eric Sabourin [mailto:eric.sabourin2...@gmail.com] 
Sent: Thursday, May 07, 2009 10:52 AM
To: solr-user@lucene.apache.org
Subject: Re: When should I 

Great... thanks for the response!

2009/5/7 Noble Paul നോബിള്‍ नोब्ळ् 

> it is wise to optimize the index once in a while (daily, maybe). But
> it depends on how many commits you do in a day. Every commit causes
> fragmentation of index files, and your search can become slow if you do
> not optimize.
>
> But always optimizing is not recommended, because it is time consuming,
> and your replication (if it is a master/slave setup) can take longer.
>
> If you do a delete-all, then do an optimize anyway.
>
> On Wed, May 6, 2009 at 9:18 PM, Eric Sabourin
>  wrote:
> > Is the optimize xml command something which is only required when I
> delete
> > all the docs?
> > Or should I also send the optimize command following other operations? or
> > daily?
> >
> > Thanks...
> > Eric
> >
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>



-- 
Eric
Sent from Halifax, NS, Canada


Re: Phrase matching on a text field

2009-05-07 Thread Jay Hill
The string fieldtype is not being tokenized, while the text fieldtype is
tokenized. So the stop word "for" is being removed by a stop word filter,
which doesn't happen with the string fieldtype (no tokenizing).

Have a look at the schema.xml in the example dir and look at the default
configuration for both the text and string fieldtypes. The string
fieldtype is not analyzed, whereas the text fieldtype has a number of
different filters that take action.

-Jay


On Wed, May 6, 2009 at 11:09 PM, Phil Chadwick
wrote:

> Hi,
>
> I'm trying to figure out why phrase matching on a text field only works
> some of the time.
>
> I have a SOLR index containing a document titled "FUTURE DIRECTIONS FOR
> INTEGRATED CATCHMENT".  The "FOR" seems to be causing a problem...
>
> The title field is indexed as both s_title and t_title (string and text,
> as defined in the demo schema), thus:
>
>    <field name="title" type="string" indexed="true" stored="true" multiValued="false" />
>    <field name="s_title" type="string" indexed="true" stored="true" multiValued="false" />
>    <field name="t_title" type="text" indexed="true" stored="true" multiValued="false" />
>
>    <copyField source="title" dest="s_title" />
>    <copyField source="title" dest="t_title" />
> I can match the document with an exact query on the string:
>
>q=s_title:"FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT"
>
> I can match the document with this phrase query on the text:
>
>q=t_title:"future directions"
>
> which uses the parsedquery shown by "&debugQuery=true":
>
>t_title:"future directions"
>t_title:"future directions"
>PhraseQuery(t_title:"futur direct")
>t_title:"futur direct"
>
> Similarly, I can match the document with this query:
>
>q=t_title:"integrated catchment"
>
> which uses the parsedquery shown by "&debugQuery=true":
>
>t_title:"integrated catchment"
>t_title:"integrated catchment"
>PhraseQuery(t_title:"integr catchment")
>t_title:"integr catchment"
>
> But I can not match the document with the query:
>
>q=t_title:"future directions for integrated catchment"
>
> which uses the phrase query shown by "&debugQuery=true":
>
>
>t_title:"future directions for integrated catchment"
>
>t_title:"future directions for integrated catchment"
>
>PhraseQuery(t_title:"futur direct integr catchment")
>
>t_title:"futur direct integr catchment"
>
> Any wisdom gratefully accepted.
>
> Cheers,
>
>
> --
> Phil
>
> 640K ought to be enough for anybody.
>-- Bill Gates, in 1981
>


Re: French and SpellingQueryConverter

2009-05-07 Thread Jay Hill
It seems to me that this is just the expected behavior of the FrenchAnalyzer
using the FrenchStemmer. I'm not familiar with the French language, but in
English words like running, runner, and runs are all stemmed down to "run"
as intended. I don't know what other words in French would stem down to
"franc", but wouldn't this be what you would want? If not, maybe experiment
with some of the other Analyzers to see if they give you what you need.

-Jay

On Thu, May 7, 2009 at 6:51 AM, Jonathan Mamou  wrote:

>
> Hi
> I have tried to run the following code
> package org.apache.solr.spelling;
>
> import org.apache.lucene.analysis.fr.FrenchAnalyzer;
>
>
> public class Test {
>
>   public static void main(String[] args) {
>     SpellingQueryConverter sqc = new SpellingQueryConverter();
>     sqc.analyzer = new FrenchAnalyzer();
>     System.out.println(sqc.convert("français"));
>   }
> }
>
> I would expect to get [(français,0,8,type=)]
> However I get [(fran,0,4,type=), (ais,5,8,type=)]
> Is there any issue with the support of special characters?
> Thanks
> Jonathan
>
>


RE: What are the Unicode encodings supported by Solr?

2009-05-07 Thread Steven A Rowe
Hi KK,

On 5/7/2009 at 2:55 AM, KK wrote:
> In some of the pages I'm getting some \ufffd chars which I think is
> some sort of unmappable[by Java?] character, right?. Any idea on how
> to handle this? Just replacing with blank char will not do [this
> depends on the requirement, though].

From the Unicode Standard:

FFFD: REPLACEMENT CHARACTER: used to replace an
incoming character whose value is unknown or
unrepresentable in Unicode.

Also, from the Unicode Standard:

Applications are free to use any of these noncharacter
code points internally but should never attempt to
exchange them. If a noncharacter is received in open
interchange, an application is not required to
interpret it in any way. It is good practice, however,
to recognize it as a noncharacter and to take
appropriate action, such as replacing it with U+FFFD
REPLACEMENT CHARACTER, to indicate the problem in the
text. It is not recommended to simply delete
noncharacter code points from such text, because of
the potential security issues caused by deleting
uninterpreted characters. (See conformance clause C7
in Section 3.2, Conformance Requirements, and Unicode
Technical Report #36, "Unicode Security
Considerations.")

So if you're seeing \ufffd in text, you (or someone before you in the 
processing chain) attempted to convert the text from some other encoding into 
Unicode, but the encoding conversion failed (no target Unicode character 
corresponding to the source character).  This can happen when attempting to 
convert from an incorrectly identified source encoding.

Steve
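A small demonstration of how \ufffd typically appears, assuming Latin-1 bytes
are mistakenly decoded as UTF-8 by a decoder configured to REPLACE bad input:

import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class ReplacementCharDemo {
    public static void main(String[] args) throws Exception {
        byte[] latin1 = "café".getBytes("ISO-8859-1"); // é encodes as 0xE9
        CharsetDecoder utf8 = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        // 0xE9 is not a valid UTF-8 sequence here, so it becomes U+FFFD
        String decoded = utf8.decode(ByteBuffer.wrap(latin1)).toString();
        System.out.println(decoded); // prints "caf" followed by \ufffd
    }
}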



Sorting by 'starts with'

2009-05-07 Thread wojtekpia

I have an index of product names. I'd like to sort results so that entries
starting with the user query come first. 
E.g. 

q=kitchen

Results would sort something like:
1. kitchen appliance
2. kitchenaid dishwasher
3. fridge for kitchen

It looks like using the query() function query comes close, but I don't know
how to write a subquery that only matches if the value starts with the query
string.

Has anyone solved a similar need?

Thanks,

Wojtek
-- 
View this message in context: 
http://www.nabble.com/Sorting-by-%27starts-with%27-tp23432815p23432815.html
Sent from the Solr - User mailing list archive at Nabble.com.
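One possible approach, sketched under the assumption of an extra lowercased,
untokenized copy of the product name in a field called name_exact (both the
field and the boost value are invented for illustration): boost a prefix
match on that field, so exact-prefix hits score above the rest.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class StartsWithBoost {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("kitchen");
        q.set("defType", "dismax");
        q.set("qf", "name");
        // docs whose name_exact starts with "kitchen" get a large boost and
        // therefore sort first under the default relevancy sort
        q.set("bq", "name_exact:kitchen*^10");
        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getResults());
    }
}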



Solr spring application context error

2009-05-07 Thread Raju444us

I have configured Solr using Tomcat. Everything works fine. I overrode
QParserPlugin and configured it. The overridden QParserPlugin has a dependency
on another project, say project1. So I made a jar of the project and copied
the jar to the solr/home lib dir.

The project1 project is using Spring. It has a factory class which loads the
beans. I am using this factory class in the QParserPlugin to get a bean. When I
start my Tomcat, the factory class loads fine. But the problem is it's not
loading the beans, and I am getting this exception:

org.springframework.beans.factory.BeanDefinitionStoreException: IOException
parsing XML document from class path resource
[com/mypackage/applicationContext.xml]; nested exception is
java.io.FileNotFoundException: class path resource
[com/mypackage/applicationContext.xml] cannot be opened because it does not
exist

Do I need to do something else?. Can anybody please help me.

Thanks,
Raju


-- 
View this message in context: 
http://www.nabble.com/Solr-spring-application-context-error-tp23432901p23432901.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solrcofig.xml - need some info

2009-05-07 Thread Raju444us

This is resolved. I solved it by reading the SolrPlugins page on the Solr wiki.

Thanks,
Raju

Raju444us wrote:
> 
> Hi Hoss,
> 
> If I extend SolrQueryParser and override the method getFieldQuery for some
> customization, can I configure my new queryParser something like below?
> 
>   
> 
>  
>explicit
>
>  
>   
> 
> Do I need to place my new parser class in the solr/home/lib folder?
> Is this the right way to do this?
> 
> Thanks,
> Raju
> 
> 
> 
> 
> hossman wrote:
>> 
>> : I am pretty new to solr. I was wondering what is this "mm" attribute in
>> : requestHandler in solrconfig.xml and how it works. Tried to search wiki
>> : could not find it
>> 
>> Hmmm... yeah wiki search does mid-word matching doesn't it?
>> 
>> the key thing to realize is that the requestHandler you were looking at 
>> when you saw that option was the DisMaxRequestHandler...
>> 
>>  http://wiki.apache.org/solr/DisMaxRequestHandler
>> 
>> 
>> 
>> -Hoss
>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/solrcofig.xml---need-some-info-tp15341858p23433477.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: large index vs multicore

2009-05-07 Thread Manepalli, Kalyan
The manual merging/joining suggested by Otis would happen inside of Solr. We
use the last-component to do the sub-query and then merge the results. Since
it's a new sub-query, the relevancy and sorting should be independent of the
main query.

Thanks,
Kalyan Manepalli
-Original Message-
From: Nicolas Pastorino [mailto:n...@ez.no]
Sent: Thursday, May 07, 2009 10:21 AM
To: solr-user@lucene.apache.org
Subject: Re: large index vs multicore

Hi, and sorry for slightly hijacking the thread,

On Mar 26, 2009, at 2:54 , Otis Gospodnetic wrote:

>
> Hi,
>
> Without knowing the details, I'd say keep it in the same index if
> the additional information shares some/enough fields with the main
> product data and separately if it's sufficiently distinct (this
> also means 2 queries and manual merging/joining).

Where would this manual merging/joining occur? At the client-side or
inside Solr, before returning the results ?
I was wondering what relevancy, sorting, etc. would become.
--
Nicolas

>
> Otis --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: "Manepalli, Kalyan" 
>> To: "solr-user@lucene.apache.org" 
>> Sent: Wednesday, March 25, 2009 5:46:40 PM
>> Subject: large index vs multicore
>>
>> Hi All,
>> In my project, I have one primary core containing all
>> the basic
>> information for a product.
>> Now I need to add additional information which will be searched
>> and displayed in
>> conjunction with the product results.
>> My question is - From design and query speed point of - should I
>> add new core to
>> handle the additional data or should I add the data to the
>> existing core.
>>
>> The data size is not very large around 150,000 - 200,000 documents.
>>
>> Any insights into this will be helpful
>>
>> Thanks,
>> Kalyan Manepalli
>

--
Nicolas Pastorino
Consultant - Trainer - System Developer
Phone :  +33 (0)4.78.37.01.34
eZ Systems ( Western Europe )  |  http://ez.no






Re: Solr spring application context error

2009-05-07 Thread Erik Hatcher
This is probably because Solr loads its extensions from a custom class  
loader, but if that class then needs to access things from the  
classpath, it is only going to see the built-in WEB-INF/lib classes,  
not solr/home lib JAR files.  Maybe there is a Spring way to point it  
at that lib directory also?  This is the kinda pain we get, it seems,  
when reinventing a container, unfortunately.


Erik
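One hedged workaround along these lines is to resolve the Spring config
through the classloader that loaded the plugin (which Solr points at the
solr/home lib JARs). This assumes a Spring 2.x-era API and is a sketch, not a
tested fix:

import org.springframework.beans.factory.BeanFactory;
import org.springframework.beans.factory.xml.XmlBeanFactory;
import org.springframework.core.io.ClassPathResource;

public class PluginBeanLoader {
    public static BeanFactory load() {
        // use the loader that found this plugin class, not the webapp's
        ClassLoader pluginLoader = PluginBeanLoader.class.getClassLoader();
        ClassPathResource res = new ClassPathResource(
                "com/mypackage/applicationContext.xml", pluginLoader);
        return new XmlBeanFactory(res); // beans now resolve via pluginLoader
    }
}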

On May 7, 2009, at 2:49 PM, Raju444us wrote:



I have configured solr using tomcat.Everything works fine.I overrode
QParserPlugin and configured it.The overriden QParserPlugin has a  
dependency
on another project say project1.So I made a jar of the project and  
copied

the jar to the solr/home lib dir.

the project1 project is using spring.It has a factory class which  
loads the
beans.Iam using this factory calss in QParserPlugin to get a  
bean.When I
start my tomcat the factory class is loading fine.But the problem is  
its not

loading the beans.And Iam getting exception

org.springframework.beans.factory.BeanDefinitionStoreException:  
IOException

parsing XML document from class path resource
[com/mypackage/applicationContext.xml]; nested exception is
java.io.FileNotFoundException: class path resource
[com/mypackage/applicationContext.xml] cannot be opened because it  
does not

exist

Do I need to do something else?. Can anybody please help me.

Thanks,
Raju


--
View this message in context: 
http://www.nabble.com/Solr-spring-application-context-error-tp23432901p23432901.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Control segment size

2009-05-07 Thread vivek sar
Thanks Otis.

I did set maxMergeDocs to 10M, but I still see a couple of index
files over 30G, which does not match the max number of documents. Here
are some numbers:

1) My total index size = 66GB
2) Number of total documents = 200M
3) 1M docs = 300MB
4) 10M docs should be roughly around 3-4GB.

Under the index I see,

-rw-r--r--   1 dssearch  staff  31771545312 May  6 14:15 _2tp.cfs
-rw-r--r--   1 dssearch  staff  31932190573 May  7 08:13 _5ne.cfs
-rw-r--r--   1 dssearch  staff    543118747 May  7 08:32 _5p2.cfs
-rw-r--r--   1 dssearch  staff    543124452 May  7 08:53 _5qr.cfs
-rw-r--r--   1 dssearch  staff    543100201 May  7 09:18 _5sg.cfs
..
..

As you can see, a couple of the files are huge. Are those documents or index
files? How can I control the file size so that no single file grows to more
than 10GB?

Thanks,
-vivek
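For reference, Solr's maxMergeDocs and mergeFactor settings map down to
Lucene IndexWriter knobs; a hedged sketch against the Lucene 2.4-era API
(the index path is a placeholder). Note that maxMergeDocs only limits future
merges - segments that grew large before the setting took effect stay large
until the index is rebuilt, which may explain the 30G files above:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class SegmentLimitSketch {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/tmp/index"),
                new StandardAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        writer.setMaxMergeDocs(10000000); // no merged segment may exceed 10M docs
        writer.setMergeFactor(10);        // merge fewer segments at a time
        writer.close();
    }
}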



On Thu, Apr 23, 2009 at 10:26 AM, Otis Gospodnetic
 wrote:
>
> Hi,
>
> You are looking for maxMergeDocs, I believe.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: vivek sar 
>> To: solr-user@lucene.apache.org
>> Sent: Thursday, April 23, 2009 1:08:20 PM
>> Subject: Control segment size
>>
>> Hi,
>>
>>   Is there any configuration to control the segments' file size in
>> Solr? Currently, I've an index (70G) with 80 segment files and one of
>> the file is 24G. We noticed that in some cases commit takes over 2
>> hours to complete (committing 50K records), whereas usually it
>> finishes in 20 seconds. After further investigation it turns out the
>> system was doing lot of paging - the file system buffer was trying to
>> write back the big segment back to disk. I got 20G memory on system
>> with 6 G assigned to Solr instance (running 2 instances).
>>
>> It seems if I can control the segment size to max of 4-5 GB I'll be
>> ok. Is there any way to do so?
>>
>> I got merging factor of 100 - does that impacts the size too? Why
>> different segments have different size?
>>
>> Thanks,
>> -vivek
>
>


Backups using Java-based Replication (forced snapshot)

2009-05-07 Thread Grant Ingersoll
On the page http://wiki.apache.org/solr/SolrReplication, it says the  
following:
"Force a snapshot on master.This is useful to take periodic  
backups .command :  http://master_host:port/solr/replication? 
command=snapshoot"


This then puts the snapshot under the data directory.  Perfectly  
reasonable thing to do.  However, is it possible to have it take in a  
directory location and store the snapshot there?  For instance, I may  
want to have it write to a specific directory that is being watched  
for backup data.


Thanks,
Grant


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: French and SpellingQueryConverter

2009-05-07 Thread Jonathan Mamou
Hi
It does not seem to be related to the FrenchStemmer; the stemmer does not split
a word into 2 words. I have checked with other words, and
SpellingQueryConverter always splits words with special characters.
I think that the issue is in the SpellingQueryConverter class:
Pattern.compile("(?:(?!(\\w+:|\\d+)))\\w+");
According to
http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html,
\w A word character: [a-zA-Z_0-9]
I think that special characters should also be added to the regex.
Best regards,
Jonathan
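To see the behavior in isolation, a small standalone sketch of \w versus a
Unicode-aware character class (the alternative pattern is one possibility,
not Solr's actual fix):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCharDemo {
    public static void main(String[] args) {
        String input = "français";
        Matcher ascii = Pattern.compile("\\w+").matcher(input);
        while (ascii.find()) {
            System.out.println("ASCII \\w+ match: " + ascii.group()); // fran, ais
        }
        // \p{L} matches any Unicode letter, \p{N} any Unicode digit
        Matcher unicode = Pattern.compile("[\\p{L}\\p{N}_]+").matcher(input);
        while (unicode.find()) {
            System.out.println("Unicode match: " + unicode.group()); // français
        }
    }
}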


   




It seems to me that this is just the expected behavior of the
FrenchAnalyzer
using the FrenchStemmer. I'm not familiar with the French language, but in
English words like running, runner, and runs are all stemmed down to "run"
as intended. I don't know what other words in French would stem down to
"franc", but wouldn't this be what you would want? If not, maybe experiment
with some of the other Analyzers to see if they give you what you need.

-Jay

On Thu, May 7, 2009 at 6:51 AM, Jonathan Mamou  wrote:

>
> Hi
> I have tried to run the following code
> package org.apache.solr.spelling;
>
> import org.apache.lucene.analysis.fr.FrenchAnalyzer;
>
>
> public class Test {
>
>   public static void main(String[] args) {
>     SpellingQueryConverter sqc = new SpellingQueryConverter();
>     sqc.analyzer = new FrenchAnalyzer();
>     System.out.println(sqc.convert("français"));
>   }
> }
>
> I would expect to get [(français,0,8,type=)]
> However I get [(fran,0,4,type=), (ais,5,8,type=)]
> Is there any issue with the support of special characters?
> Thanks
> Jonathan
>
>




Autocommit blocking adds? AutoCommit Speedup?

2009-05-07 Thread Jim Murphy

Question 1: I see in DirectUpdateHandler2 that there is a read/write lock
used between addDoc and commit.

My mental model of the process was this: clients can add/update documents
until the autocommit threshold is hit.  At that point the commit tracker
would schedule a background commit.  The commit would run and NOT BLOCK
subsequent adds.  Clearly that's not happening, because when the autocommit
background thread runs it gets the iwCommit lock, blocking anyone in addDoc
trying to get the iwAccess lock.
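For readers unfamiliar with the pattern, a simplified sketch of this
read/write locking; the field names mirror DirectUpdateHandler2, but this is
not Solr's actual code:

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class UpdateHandlerSketch {
    private final ReadWriteLock rw = new ReentrantReadWriteLock();
    private final Lock iwAccess = rw.readLock();  // shared: many adds at once
    private final Lock iwCommit = rw.writeLock(); // exclusive: commit

    public void addDoc() {
        iwAccess.lock();
        try {
            // add/update a document; concurrent adders all hold the read lock
        } finally {
            iwAccess.unlock();
        }
    }

    public void commit() {
        iwCommit.lock(); // waits for in-flight adds, and blocks new ones
        try {
            // flush and fsync the index; this is the window where adds stall
        } finally {
            iwCommit.unlock();
        }
    }
}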

Is this just the way it is, or is it possible to configure Solr to process
the pending documents in the background, queuing new documents in memory as
before?

Question 2: I ask this question because autocommits are taking a LONG time
to complete, like 10-25 seconds.  I have a 40M-document index that is many
10s of GBs.  What can I do to speed this up?

Thanks

Jim
-- 
View this message in context: 
http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp23435224p23435224.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Autocommit blocking adds? AutoCommit Speedup?

2009-05-07 Thread Yonik Seeley
On Thu, May 7, 2009 at 5:03 PM, Jim Murphy  wrote:
> Question 1: I see in DirectUpdateHandler2 that there is a read/Write lock
> used between addDoc and commit.
>
> My mental model of the process was this: clients can add/update documents
> until the auto commit threshold was hit.  At that point the commit tracker
> would schedule a background commit.  The commit would run and NOT BLOCK
> subsequent adds.  Clearly that's not happening, because when the autocommit
> background thread runs it gets the iwCommit lock, blocking anyone in addDoc
> trying to get the iwAccess lock.

Background: in the past, you had to close the Lucene IndexWriter so
all changes would be flushed to disk (and you could then open a new
IndexReader to see the changes).  You obviously can't be adding new
documents while you're trying to close the writer - hence the locking.
 It was also the case that readers and writers had to be opened and
closed in the right way to handle things like deletes (which had to go
through the reader).

This is no longer the case, and we should revisit the locking.  I do
think we should be able to continue indexing while doing a commit.

-Yonik
http://www.lucidimagination.com


preImportDeleteQuery

2009-05-07 Thread wojtekpia

Hi,
I'm importing data using the DIH. I manage all my data updates outside of
Solr, so I use the full-import command to update my index (with
clean=false). Everything works fine, except that I can't delete documents
easily using the DIH. I noticed the preImportDeleteQuery attribute, but it
doesn't seem to do what I'm looking for. I'm looking to do something like:

preImportDeleteQuery="ItemId={select ItemId from table where
status='delete'}"

http://issues.apache.org/jira/browse/SOLR-1059 SOLR-1059  seems to address
this, but I couldn't find any documentation for it in the wiki. Can someone
provide an example of how to use this?

Thanks,

Wojtek
-- 
View this message in context: 
http://www.nabble.com/preImportDeleteQuery-tp23437674p23437674.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr autocompletion in rails

2009-05-07 Thread Matt Weber
First, your solrconfig.xml should have something similar to the following:

<searchComponent name="termsComp"
  class="org.apache.solr.handler.component.TermsComponent"/>

<requestHandler name="/autoSuggest"
  class="org.apache.solr.handler.component.SearchHandler">
  <arr name="components">
    <str>termsComp</str>
  </arr>
</requestHandler>

This will give you a request handler called "/autoSuggest" that you  
will use for suggestions.


Then you need to write some Rails code to access this.  I am not very
familiar with Ruby, but I believe you might want to try
http://wiki.apache.org/solr/solr-ruby.  Make sure you set your query type
to "/autoSuggest".  If that won't work for you, then just use the standard
HTTP libraries to access the autoSuggest URL directly and get JSON output.


With any of these methods make sure you set the following parameters:

terms=true
terms.fl=source_field
terms.lower=input_term
terms.prefix=input_term
terms.lower.incl=false

For direct access to the json output you will  want these as well:

indent=true
wt=json

The terms.fl parameter specifies the field(s) you want to use as the
source for suggestions.  Make sure this field has very little
processing done on it, maybe lowercasing and tokenization only.


Here is an example url that should give you some output once things  
are working:


http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=spell&terms.lower=t&terms.prefix=t&terms.lower.incl=false&indent=true&wt=json

The next thing is to parse the json output and do whatever you want  
with the results.  In my example, I just printed out each suggestion  
on a single line of the response because this is what the jQuery  
autocomplete plugin wanted.  The easiest way to parse the json output  
is to use the json ruby library, http://json.rubyforge.org/.
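If you are calling the handler from Java rather than Ruby, the same request
looks roughly like this (plain java.net, a local Solr assumed, JSON parsing
left out):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class AutoSuggestClient {
    public static void main(String[] args) throws Exception {
        String prefix = URLEncoder.encode("t", "UTF-8");
        URL url = new URL("http://localhost:8983/solr/autoSuggest"
                + "?terms=true&terms.fl=spell"
                + "&terms.lower=" + prefix + "&terms.prefix=" + prefix
                + "&terms.lower.incl=false&indent=true&wt=json");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"));
        for (String line; (line = in.readLine()) != null; ) {
            System.out.println(line); // raw JSON; parse with any JSON library
        }
        in.close();
    }
}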


After you have your rails controller working you can hook it into your  
FE with some javascript like I did in the example on my blog.  Hope  
this helps.


Thanks,

Matt Weber
eSr Technologies
http://www.esr-technologies.com




On May 7, 2009, at 7:37 AM, manisha_5 wrote:



Thanks a lot for the information. But I am still a bit confused about the use
of TermsComponent. Like, where exactly are we going to put this code in
Solr? For example, I changed schema.xml to add the autocomplete feature. I read
your blog too, it's very helpful. But still a little confused. :-((
Can you explain it a bit?



Matt Weber-2 wrote:


You will probably want to use the new TermsComponent in Solr 1.4.  See
http://wiki.apache.org/solr/TermsComponent.  I just recently wrote a blog
post about using autocompletion with TermsComponent, a servlet, and jQuery.
You can probably follow these instructions, but instead of writing a servlet
you can write a rails handler parsing the json output directly.

http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/

Thanks,

Matt Weber



On May 4, 2009, at 9:39 AM, manisha_5 wrote:



Hi,

I am new to Solr. I am using a Solr server to index the data and make search
in a Ruby on Rails project. I want to add an autocompletion feature. I tried
the XML patch in the schema.xml file of Solr, but don't know how to test if
the feature is working. I also haven't been able to integrate the same in the
Rails project that is using Solr. Can anyone please provide some help in this
regard??

The patch of code in schema.xml is:

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all" />
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all" />
  </analyzer>
</fieldType>

--
View this message in context:
http://www.nabble.com/Solr-autocompletion-in-rails-tp23372020p23372020.html
Sent from the Solr - User mailing list archive at Nabble.com.







--
View this message in context: 
http://www.nabble.com/Solr-autocompletion-in-rails-tp23372020p23428267.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: preImportDeleteQuery

2009-05-07 Thread Martin Davidsson

On May 7, 2009, at 4:52 PM, wojtekpia wrote:


Hi,
I'm importing data using the DIH. I manage all my data updates  
outside of

Solr, so I use the full-import command to update my index (with
clean=false). Everything works fine, except that I can't delete  
documents
easily using the DIH. I noticed the preImportDeleteQuery attribute,  
but
doesn't seem to do what I'm looking for. I'm looking to do something  
like:


preImportDeleteQuery="ItemId={select ItemId from table where
status='delete'}"

http://issues.apache.org/jira/browse/SOLR-1059 SOLR-1059  seems to  
address
this, but I couldn't find any documentation for it in the wiki. Can  
someone

provide an example of how to use this?

Thanks,

Wojtek
--
View this message in context: 
http://www.nabble.com/preImportDeleteQuery-tp23437674p23437674.html
Sent from the Solr - User mailing list archive at Nabble.com.


I haven't used those special variables, but I noticed an example of
$skipDoc in the wiki under the "Indexing Wikipedia" example
(http://wiki.apache.org/solr/DataImportHandler).


-- Martin

bug? No highlighting results with dismax and q.alt=*:*

2009-05-07 Thread Peter Wolanin
For the Drupal Apache Solr Integration module, we are exploring the
possibility of doing facet browsing - since we are using dismax as
the default handler, this would mean issuing a query with an empty q
and falling back to q.alt='*:*' or some other q.alt that matches
all docs.

However, I notice when I do this that we do not get any highlights
back in the results despite defining a highlight alternate field.

In contrast, if I force the standard request handler then I do get
text back from the highlight alternate field:

select/?q=*:*&qt=standard&hl=true&hl.fl=body&hl.alternateField=body&hl.maxAlternateFieldLength=256

However, I then lose the nice dismax features of weighting the
results using bq and bf parameters.  So, is this a bug or the intended
behavior?

The relevant fragment of the solrconfig.xml is this:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="q.alt">*:*</str>

    <str name="hl">true</str>
    <str name="hl.fl">body</str>
    <int name="hl.snippets">3</int>
    <str name="hl.mergeContiguous">true</str>
    <str name="hl.alternateField">body</str>
    <int name="hl.maxAlternateFieldLength">256</int>
  </lst>
</requestHandler>

Full solrconfig.xml and other files:
http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/?pathrev=DRUPAL-6--1

-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Re: Autocommit blocking adds? AutoCommit Speedup?

2009-05-07 Thread Jim Murphy

Interesting.  So is there a JIRA ticket open for this already? Any chance of
getting it into 1.4?  It's seriously kicking our butts right now.  We write
into our masters with ~50ms response times until we hit the autocommit; then
add/update response time is 10-30 seconds.  Ouch.

I'd be willing to work on submitting a patch given a better understanding of
the issue. 

Jim
-- 
View this message in context: 
http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp23435224p23438134.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Phrase matching on a text field

2009-05-07 Thread Phil Chadwick
Hi Jay

Thank you for your response.

The data relating to the string (s_title) defines *exactly* what was
fed into the SOLR indexing.  The string is not otherwise relevant to
the question.

The essence of my question is why can the indexed text (t_title) not
be phrase matched by the query on the text when the word "for" is
present in the query.

The following work (and I would expect them to work):

q=s_title:"FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT"
q=t_title:"future directions"
q=t_title:"integrated catchment"

The following do not work (and I would expect them to work):

q=t_title:"directions for integrated"

The following do not work (not sure if I expect them to work or not):

q=t_title:"directions integrated"

My reading is that if the "FOR" is removed in the text indexing, it
should also be removed for the text query!

I also added 'enablePositionIncrements="true"' to the text query analyzer
to make it the same as the text index analyzer:

    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
There was no change in the outcome.

The definitions for text and string were exactly as in the SOLR 1.3
example schema.

The section of that schema for "text" is shown below:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="1" catenateNumbers="1"
                catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0"
                catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
Cheers,


-- 
Phil

The art of being wise is the art of knowing what to overlook.
-- William James



Jay Hill wrote:
>
> The string fieldtype is not being tokenized, while the text fieldtype is
> tokenized. So the stop word "for" is being removed by a stop word filter,
> which doesn't happen with the text field type (no tokenizing).
> 
> Have a look at the schema.xml in the example dir and look at the default
> configuration for both the text and string fieldtypes. String string
> fieldtype is not analyzed whereas the text fieldtype has a number of
> different filters that take action.

> On Wed, May 6, 2009 at 11:09 PM, Phil Chadwick
> wrote:
> 
> > Hi,
> >
> > I'm trying to figure out why phrase matching on a text field only works
> > some of the time.
> >
> > I have a SOLR index containing a document titled "FUTURE DIRECTIONS FOR
> > INTEGRATED CATCHMENT".  The "FOR" seems to be causing a problem...
> >
> > The title field is indexed as both s_title and t_title (string and text,
> > as defined in the demo schema), thus:
> >
> >    <field name="title" type="string" indexed="true" stored="true" multiValued="false" />
> >    <field name="s_title" type="string" indexed="true" stored="true" multiValued="false" />
> >    <field name="t_title" type="text" indexed="true" stored="true" multiValued="false" />
> >
> >    <copyField source="title" dest="s_title" />
> >    <copyField source="title" dest="t_title" />
> >
> > I can match the document with an exact query on the string:
> >
> >q=s_title:"FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT"
> >
> > I can match the document with this phrase query on the text:
> >
> >q=t_title:"future directions"
> >
> > which uses the parsedquery shown by "&debugQuery=true":
> >
> >t_title:"future directions"
> >t_title:"future directions"
> >PhraseQuery(t_title:"futur direct")
> >t_title:"futur direct"
> >
> > Similarly, I can match the document with this query:
> >
> >q=t_title:"integrated catchment"
> >
> > which uses the parsedquery shown by "&debugQuery=true":
> >
> >t_title:"integrated catchment"
> >t_title:"integrated catchment"
> >PhraseQuery(t_title:"integr catchment")
> >t_title:"integr catchment"
> >
> > But I can not match the document with the query:
> >
> >q=t_title:"future directions for integrated catchment"
> >
> > which uses the phrase query shown by "&debugQuery=true":
> >
> >
> >t_title:"future directions for integrated catchment"
> >
> >t_title:"future directions for integrated catchment"
> >
> >PhraseQuery(t_title:"futur direct integr catchment")
> >
> >t_title:"futur direct integr catchment"
> >
> > Any wisdom gratefully accepted.
> >
> > Cheers,
> >
> >
> > --
> > Phil
> >
> > 640K ought to be enough for anybody.
> >-- Bill Gates, in 1981
> >


StatsComponent and 1.3

2009-05-07 Thread David Shettler
Foreword:  I'm not a java developer :)

OSVDB.org and datalossdb.org make use of solr pretty extensively via
acts_as_solr.

I found myself with a real need for some of the StatsComponent stuff
(mainly the sum feature), so I pulled down a nightly build and played
with it.  StatsComponent proved perfect, but... the nightly build
output seems to be different, and thus incompatible with acts_as_solr.

Now, I realize this is more or less an acts_as_solr issue, but...

Is it possible, with some degree of effort (obviously), for me to
essentially port some of the functionality of StatsComponent to 1.3
myself?  It's that, or waiting for 1.4 to come out and someone
developing support for it into acts_as_solr, or myself fixing what I
have for acts_as_solr to work with the output.  I'm just trying to
gauge the easiest solution :)

Any feedback or suggestions would be grand.

Thanks,

Dave
Open Security Foundation


Re: Autocommit blocking adds? AutoCommit Speedup?

2009-05-07 Thread Yonik Seeley
On Thu, May 7, 2009 at 8:37 PM, Jim Murphy  wrote:
> Interesting.  So is there a JIRA ticket open for this already? Any chance of
> getting it into 1.4?

No ticket currently open, but IMO it could make it for 1.4.

> Its seriously kicking out butts right now.  We write
> into our masters with ~50ms response times till we hit the autocommit then
> add/update response time is 10-30 seconds.  Ouch.

It's probably been made a little worse lately since Lucene now does
fsync on index files before writing the segments file that points to
those files.  A necessary evil to prevent index corruption.

> I'd be willing to work on submitting a patch given a better understanding of
> the issue.

Great, go for it!

-Yonik
http://www.lucidimagination.com


Re: Backups using Java-based Replication (forced snapshot)

2009-05-07 Thread Noble Paul നോബിള്‍ नोब्ळ्
makes sense. I'll open an issue

On Fri, May 8, 2009 at 1:53 AM, Grant Ingersoll  wrote:
> On the page http://wiki.apache.org/solr/SolrReplication, it says the
> following:
> "Force a snapshot on master.This is useful to take periodic backups .command
> :  http://master_host:port/solr/replication?command=snapshoot";
>
> This then puts the snapshot under the data directory.  Perfectly reasonable
> thing to do.  However, is it possible to have it take in a directory
> location and store the snapshot there?  For instance, I may want to have it
> write to a specific directory that is being watched for backup data.
>
> Thanks,
> Grant
>
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: preImportDeleteQuery

2009-05-07 Thread Noble Paul നോബിള്‍ नोब्ळ्
are you doing a full-import or a delta-import?

For delta-import there is an option, deletedPkQuery, which should
meet your needs.

On Fri, May 8, 2009 at 5:22 AM, wojtekpia  wrote:
>
> Hi,
> I'm importing data using the DIH. I manage all my data updates outside of
> Solr, so I use the full-import command to update my index (with
> clean=false). Everything works fine, except that I can't delete documents
> easily using the DIH. I noticed the preImportDeleteQuery attribute, but it
> doesn't seem to do what I'm looking for. I'm looking to do something like:
>
> preImportDeleteQuery="ItemId={select ItemId from table where
> status='delete'}"
>
> SOLR-1059 (http://issues.apache.org/jira/browse/SOLR-1059) seems to address
> this, but I couldn't find any documentation for it in the wiki. Can someone
> provide an example of how to use this?
>
> Thanks,
>
> Wojtek
> --
> View this message in context: 
> http://www.nabble.com/preImportDeleteQuery-tp23437674p23437674.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Solr spring application context error

2009-05-07 Thread Noble Paul നോബിള്‍ नोब्ळ्
a point to keep in mind is that all the plugin code and everything
else must be put into the solrhome/lib directory.

where have you placed the file com/mypackage/applicationContext.xml ?
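
If the factory resolves that path via the classpath, the XML must be packaged
inside a jar (or directory) visible to the plugin's classloader -- e.g.
inside the jar dropped into solr.home/lib.  A minimal sketch (the factory
class from the thread is not shown, so these names are stand-ins):

  import org.springframework.context.ApplicationContext;
  import org.springframework.context.support.ClassPathXmlApplicationContext;

  // Resolves the path against the classpath; if the XML is not on the
  // plugin classpath this fails with the FileNotFoundException quoted below.
  ApplicationContext ctx =
      new ClassPathXmlApplicationContext("com/mypackage/applicationContext.xml");
  Object bean = ctx.getBean("myBean");  // bean name hypothetical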

On Fri, May 8, 2009 at 12:19 AM, Raju444us  wrote:
>
> I have configured Solr using Tomcat. Everything works fine. I overrode
> QParserPlugin and configured it. The overridden QParserPlugin has a
> dependency on another project, say project1. So I made a jar of that
> project and copied the jar to the solr/home lib dir.
>
> The project1 project uses Spring. It has a factory class which loads the
> beans. I am using this factory class in QParserPlugin to get a bean. When I
> start my Tomcat the factory class loads fine, but the problem is it's not
> loading the beans, and I am getting this exception:
>
> org.springframework.beans.factory.BeanDefinitionStoreException: IOException
> parsing XML document from class path resource
> [com/mypackage/applicationContext.xml]; nested exception is
> java.io.FileNotFoundException: class path resource
> [com/mypackage/applicationContext.xml] cannot be opened because it does not
> exist
>
> Do I need to do something else? Can anybody please help me?
>
> Thanks,
> Raju
>
>
> --
> View this message in context: 
> http://www.nabble.com/Solr-spring-application-context-error-tp23432901p23432901.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Fwd: Solr MultiCore dataDir bug - a fix

2009-05-07 Thread Noble Paul നോബിള്‍ नोब्ळ्
I didn't notice that this mail was not sent to the list. Please send all
your communication to the mailing list.


-- Forwarded message --
From: Noble Paul നോബിള്‍  नोब्ळ् 
Date: 2009/5/8
Subject: Re: Solr MultiCore dataDir bug - a fix
To: pasi.j.matilai...@tieto.com


are you sure that your solrconfig.xml does not have a <dataDir> tag?
If it is there, then it is supposed to take precedence over the one
you have put in solr.xml
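
i.e., a line along these lines in solrconfig.xml (the path is illustrative)
will win over the attribute in solr.xml:

  <dataDir>${solr.data.dir:./solr/data}</dataDir>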

On Fri, May 8, 2009 at 10:43 AM,   wrote:
> Hello,
>
> Yesterday I encountered the problem that multicore Solr doesn't properly
> handle the dataDir setting in solr.xml, regardless of whether it's
> specified as a nested property or as an attribute of the core element. I
> found a mail thread where, on March 4th, 2009, you promised to have it
> fixed in a day or two.
> (http://markmail.org/message/oylfeldy53lebsfe#query:solr%20multicore%20datadir+page:1+mid:abfbhdxxt3r3zujs+state:results)
>
> Anyway, as the current Solr trunk didn't contain the fix, I hunted the bug
> down myself. And as I don't want to take the time to get an account and
> submit the patch to Solr myself, I'm sending the fix to you instead.
>
> In current trunk, in SolrCore constructor, at line 491, there currently is:
>
>     if (dataDir == null)
>       dataDir = config.get("dataDir", cd.getDataDir());
>
> I replaced this with the following code:
>
>     if (dataDir == null) {
>       if (cd.getDataDir() != null)
>         dataDir = cd.getDataDir();
>       else
>         dataDir = config.get("dataDir", cd.getDefaultDataDir());
>     }
>
> I'm not sure this fully represents how this is supposed to work, but it
> works anyway. At least when I specify dataDir as an attribute to core
> element with a path relative to instanceDir:
>
>     <core name="core0" instanceDir="core0" dataDir="data"/>
>     <core name="core1" instanceDir="core1" dataDir="data"/>
>
> Best regards,
>
> Pasi J. Matilainen, Software Engineer
> Tieto Finland Oy, R&D Services, Devices R&D
> pasi.j.matilai...@tieto.com, mobile +358 (0)40 575 7738, fax +358 (0)14 618
> 566
> Visiting address: Mattilanniemi 6, 40101 JYVÄSKYLÄ, Mailing address: P.O.
> Box 163, 40101 JYVÄSKYLÄ, Finland, www.tieto.com
>
>



--
-
Noble Paul | Principal Engineer| AOL | http://aol.com



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Phrase matching on a text field

2009-05-07 Thread Phil Chadwick
Hi,

I have tracked this problem to:

  https://issues.apache.org/jira/browse/SOLR-879

Executive summary is that there are errors that relate to
text fields in both:

  - src/java/org/apache/solr/search/SolrQueryParser.java
  - example/solr/conf/schema.xml

It is fixed in 1.4.

Thank you Yonik Seeley for the original diagnosis and fix.
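
As I understand the fix (a sketch of the idea, not the exact patch): the
query-side parser needs to honor the position gap that StopFilter leaves
behind, so that a phrase with a removed stop word still lines up with the
indexed positions.  Roughly, in Lucene 2.4-era terms:

  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.queryParser.QueryParser;

  Analyzer analyzer = new StandardAnalyzer();  // stand-in for the field's analyzer
  QueryParser parser = new QueryParser("t_title", analyzer);
  // Keep the "hole" left by removed stop words when building phrase queries.
  parser.setEnablePositionIncrements(true);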

Cheers,


-- 
Phil

It may be that your sole purpose in life is simply to serve as a
warning to others.



Phil Chadwick wrote:

> Hi Jay
> 
> Thank you for your response.
> 
> The data relating to the string (s_title) defines *exactly* what was
> fed into the SOLR indexing.  The string is not otherwise relevant to
> the question.
> 
> The essence of my question is why the indexed text (t_title) cannot
> be phrase-matched by a query on the text when the word "for" is
> present in the query.
> 
> The following work (and I would expect them to work):
> 
> q=s_title:"FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT"
> q=t_title:"future directions"
> q=t_title:"integrated catchment"
> 
> The following does not work (and I would expect it to):
> 
> q=t_title:"directions for integrated"
> 
> The following also does not work (I'm not sure whether to expect it to):
> 
> q=t_title:"directions integrated"
> 
> My reading is that if the "FOR" is removed in the text indexing, it
> should also be removed for the text query!
> 
> I also added 'enablePositionIncrements="true"' to the text query analyzer
> to make it the same as the text index analyzer:
> 
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"
>             enablePositionIncrements="true"/>
> 
> There was no change in the outcome.
> 
> The definitions for text and string were exactly as in the SOLR 1.3
> example schema; the section of that schema for "text" is shown below.
> 
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"
>             enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1"
>             generateNumberParts="1"
>             catenateWords="1"
>             catenateNumbers="1"
>             catenateAll="0"
>             splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="synonyms.txt"
>             ignoreCase="true"
>             expand="true"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1"
>             generateNumberParts="1"
>             catenateWords="0"
>             catenateNumbers="0"
>             catenateAll="0"
>             splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> 
> Cheers,
> 
> 
> -- 
> Phil
> 
> The art of being wise is the art of knowing what to overlook.
>   -- William James
> 
> 
> 
> Jay Hill wrote:
> >
> > The string fieldtype is not tokenized, while the text fieldtype is. So for
> > the text field the stop word "for" is being removed by a stop word filter,
> > which doesn't happen with the string fieldtype (no tokenizing).
> > 
> > Have a look at the schema.xml in the example dir and look at the default
> > configuration for both the text and string fieldtypes. The string
> > fieldtype is not analyzed, whereas the text fieldtype has a number of
> > different filters that take action.
> 
> > On Wed, May 6, 2009 at 11:09 PM, Phil Chadwick
> > wrote:
> > 
> > > Hi,
> > >
> > > I'm trying to figure out why phrase matching on a text field only works
> > > some of the time.
> > >
> > > I have a SOLR index containing a document titled "FUTURE DIRECTIONS FOR
> > > INTEGRATED CATCHMENT".  The "FOR" seems to be causing a problem...
> > >
> > > The title field is indexed as both s_title and t_title (string and text,
> > > as defined in the demo schema), thus:
> > >
> > >    <field name="title"   type="string" indexed="true" stored="true" multiValued="false" />
> > >    <field name="s_title" type="string" indexed="true" stored="true" multiValued="false" />
> > >    <field name="t_title" type="text"   indexed="true" stored="true" multiValued="false" />
> > >
> > >
> > >
> > > I can match the document with an exact query on the string:
> > >
> > >q=s_title:"FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT"
> > >
> > > I can match the document with this phrase query on the text:
> > >
> > >q=t_title:"future directions"
> > >
> > > which uses the parsedquery shown by "&debugQuery=true":
> > >
> > >    <str name="rawquerystring">t_title:"future directions"</str>
> > >    <str name="querystring">t_title:"future directions"</str>
> > >    <str name="parsedquery">PhraseQuery(t_title:"futur direct")</str>
> > >    <str name="parsedquery_toString">t_title:"futur direct"</str>
> > >
> > > Similarly, I can match the document with this query:
> > >
> > >q=t_title:"integrated catchment"
> > >
> > > which uses the parsedquery shown by "&debugQuery=true":
> > >
> > >    <str name="rawquerystring">t_title:"integrated catchment"</str>
> > >    <str name="querystring">t_title:"integrated catchment"</str>
> > >    <str name="parsedquery">PhraseQuery(t_title:"integr catchment")</str>
> > >    <str name="parsedquery_toString">t_title:"integr catchment"</str>
> > >
> > > But I can not match the document with the query:
> > >
> > >q=t_title:"future directions for integrated catchment"
> > >
> > > which uses the phrase query shown by "&debugQuery=true":
> > >
> > >    <str name="rawquerystring">t_title:"future directions for integrated catchment"</str>
> > >    <str name="querystring">t_title:"future directions for integrated catchment"</str>
> > >    <str name="parsedquery">PhraseQuery(t_title:"futur direct integr catchment")</str>
> > >    <str name="parsedquery_toString">t_title:"futur direct integr catchment"</str>

RE: Core Reload issue

2009-05-07 Thread Sagar Khetkade

 

From my understanding, re-indexing the documents is a different thing. If you
have the stop word filter for a field type, say "text", then after reloading
the core, a query consisting only of a stop word should be caught by the stop
word filter and so should never be searched against the index.

But in my case I am getting results containing the stop word; hence the issue.
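
For reference, the reload in question is the CoreAdmin call, along these
lines (host and core name illustrative):

  http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0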

 

~Sagar 

 

 
> From: noble.p...@gmail.com
> Date: Tue, 5 May 2009 10:09:29 +0530
> Subject: Re: Core Reload issue
> To: solr-user@lucene.apache.org
> 
> If you change the conf files and reindex the documents, the changes must
> be reflected. Are you sure you re-indexed?
> 
> On Tue, May 5, 2009 at 10:00 AM, Sagar Khetkade
>  wrote:
> >
> > Hi,
> >
> > I came across a strange problem while reloading the core in a multicore
> > scenario. In the config of one of the cores I am making changes to the
> > synonym and stopword files and then reloading the core. The core gets
> > reloaded, but the changes in the stopword and synonym files are not
> > reflected when I query. The filters for index and query are the same. I
> > face this problem even if I reindex the documents. But when I restart the
> > servlet container in which Solr is embedded, the problem does not
> > resurface.
> > My ultimate goal is/was to have the changes made in the text files inside
> > the config folder take effect.
> > Is this the expected behaviour or some problem on my side? Could anyone
> > suggest a possible workaround?
> >
> > Thanks in advance!
> >
> > Regards,
> > Sagar Khetkade
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com


Wildcard Search

2009-05-07 Thread dabboo

Hi,

I am facing a weird issue while searching.

I am searching for the word *system*; it displays all the records which
contain system, systems, etc. But when I try to search *systems*, it only
returns those records which have systems-, systems/, etc. It is treating
the wildcard as one or more characters rather than zero or more.

So it is not returning records which have systems as one word. Is there any
way to resolve this?

Please suggest.

Thanks,
Amit Garg
-- 
View this message in context: 
http://www.nabble.com/Wildcard-Search-tp23440795p23440795.html
Sent from the Solr - User mailing list archive at Nabble.com.