Multiple Search in Solr

2008-02-04 Thread Niveen Nagy
Hello,

I have a question concerning multiple Solr indices. We have 4 Solr
indices in our system and we want to use distributed search (multiple
search) to query all four indices in parallel. We downloaded the
latest code from svn and applied the patch distributed.patch, but we
need a more detailed description of how to use this patch, what changes
should be applied to the Solr schema, and how these indices should be
located. Another question: can the steps be applied to our existing
indices, which were built using a version from before the distributed
patch was applied?

Thanks in advance.

Best Regards,

Niveen Nagy



RE: Multiple Search in Solr

2008-02-04 Thread Jae Joo
I have downloaded version 1.3 and built multiple indices.

I could not find any way to search multiple indices at the Solr level, so
I have written a Lucene application instead. It is working well.

Jae Joo
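
A minimal sketch of what a distributed query could look like from the
client side. It assumes the "shards" request parameter as documented for
the distributed search feature that later shipped in Solr 1.3; the patch
discussed in this thread may behave differently. The host names and the
helper function are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def distributed_select_url(base_url, shards, query, rows=10):
    """Build a /select URL asking one Solr instance to query every shard listed."""
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated host:port/path entries, without the http:// prefix.
        "shards": ",".join(shards),
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical locations of the four indices.
SHARDS = [
    "host1:8983/solr",
    "host2:8983/solr",
    "host3:8983/solr",
    "host4:8983/solr",
]

url = distributed_select_url("http://host1:8983/solr", SHARDS, "title:solr")
print(url)
```

The request goes to any one instance, which is expected to merge the
results from all four.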



RE: Search not working for indexed words...

2008-02-04 Thread Ard Schrijvers

Hello,

your problem stems ( :-) ) from stemming. If you search this list, you
will probably find many threads about it. Somebody had the same question
last week; see last week's archive.

Regards Ard

> 
> Hi All,
> 
>   From past 6 months i am working and using SOLR. Now i am 
> facing some problem with that while searching.
>   I have searched for some words but it doesnt return the 
> result even its existing and indexed in data folder in SOLR 
> server(i meant solr tomcat).
>   
>   I have given the following words :
>"administrators",
>"visitors",
>
>   The format of my search query is:
>   Search word is : administrator*
> 
> http://192.168.1.65:8085/solr/select/?q=administrator*&version
> =2.2&start=0&rows=10&indent=on
> 
>   Its return nothing even the administrator existing in the 
> data folder.  
>   
>   Search word is : administrator
> 
> http://192.168.1.65:8085/solr/select/?q=administrator&version=
> 2.2&start=0&rows=10&indent=on
>   
>   If i search for "administrator" without giving "*", its 
> searching and returning the result.
>   
>   Search word is : administrator/*
> 
> http://192.168.1.65:8085/solr/select/?q=administrator%5C*&vers
> ion=2.2&start=0&rows=10&indent=on
>   
>   ("/" decoded as %5C) here.
>   If i search for "administrator/*", its returning the result.
>   
>  My query should be optimized, so that i can use it over my 
> project. So i need the query using wildcard character like 
> "searchword+*"
>  But now its not searching if i use "*". But if i use "/*" it 
> can search.
> But now i have faced the following problem.
> 
>   Search word is : admini\*
>   
> http://192.168.1.65:8085/solr/select/?q=admini%5C*&version=2.2
> &start=0&rows=10&indent=on
>   
>   Not returning any result.
>   
>   Search word is : admini
>   
> http://192.168.1.65:8085/solr/select/?q=admini&version=2.2&sta
> rt=0&rows=10&indent=on
>   
>   Not returning any result.
>   
>   Search word is : admini*
>   
> http://192.168.1.65:8085/solr/select/?q=admini*&version=2.2&st
> art=0&rows=10&indent=on
>   
>   This returning result.  
>   
>   Search word is : admin
>   
>   If i search the word "admin" or "admin*" or "admin\*", 
> its return the result.
>   
>   I am using the same SolrConfig.xml and Schema.xml 
> without any change given by solr during download and i didnt 
> make any changes on that.
>   
>   Whether i have to change my query or i have to change 
> Schema.xml and whether i have to add any words in stopwords.txt etc..,
> 
> And likewise some words i am searching and i am 
> getting the result.But after some time if i search for the 
> same word its not searching.Its coming by random.
>   
>   If anyone know the solution and have any idea, please 
> help me out.
>   
>   Thanks in advance.
>   
> with regards,
> V.Nithya. 
> --
> View this message in context: 
> http://www.nabble.com/Search-not-working-for-indexed-words...-
> tp15266626p15266626.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 


Re: duplicate entries being returned, possible caching issue?

2008-02-04 Thread Yonik Seeley
On Feb 4, 2008 2:20 PM, Rachel McConnell <[EMAIL PROTECTED]> wrote:
> > If you are running snapshooter asynchronously, this would be the cause.
> > It's designed to be run from solr (via a postCommit or postOptimize
> > hook) at specific points where a consistent view of the index is
> > available.
>
> So our cron job might be running DURING an update, for example, and
> get duplicate values that way?

Right.  Duplicates are removed on a commit(), so if a snapshot is
being taken at any other time than right after a commit, those deletes
will not have been performed.

>  I'd have thought that in that case,
> the dupe values would stick around until the next update, 20 minutes
> later,

If you don't call commit() on the master, those dups will still be there.

-Yonik
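
The timing issue described above can be pictured with a toy model. This
is only an illustration of the reasoning, not Solr's implementation: the
class and its methods are hypothetical, and the only behavior taken from
the thread is that duplicates are removed at commit time.

```python
class ToyIndex:
    """Toy model: adds are appended at once, duplicate removal waits for commit()."""

    def __init__(self):
        self.docs = []  # list of (unique_id, content), oldest first

    def add(self, unique_id, content):
        # A re-added document is appended; the older copy is not removed yet.
        self.docs.append((unique_id, content))

    def commit(self):
        # Keep only the newest copy of each unique id.
        newest = {}
        for unique_id, content in self.docs:
            newest[unique_id] = content
        self.docs = list(newest.items())

    def snapshot(self):
        # Copy whatever is in the index right now.
        return list(self.docs)


index = ToyIndex()
index.add("doc1", "v1")
index.commit()

index.add("doc1", "v2")      # an update is in progress
early = index.snapshot()     # a cron-driven snapshot fires here
index.commit()
late = index.snapshot()      # a postCommit-driven snapshot fires here

print(len(early), len(late))
```

In this model the early snapshot carries both copies of doc1, while the
one taken right after the commit carries only the newest.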


Re: Dates, Times and Timezones

2008-02-04 Thread Chris Hostetter

:   One great problem we are having to integrate solr with plone is that
: plone can have dates and times in diferent timezones, and each user can query
: the data in its own timezone. So we would be really interested in being able
: to put date/time data on solr with a timezone and specifying the timezone of a
: query so we get perfect results. I saw somewhere that part of this suport is
: going to be in 1.3, is that right? And how is it going to work?

I'm not sure what "part" of this you are thinking of will be in 1.3 ... I 
don't know of any new Timezone related stuff in the trunk.

Solr specifically tries to be as agnostic about timezones as possible ...
when interacting with Solr all dates should be in UTC.  If your
application is getting/giving dates from/to users who have configured
timezone preferences then the parsing/formatting when interacting with the
user should be aware of their preferred timezone -- but you should always
be in UTC when dealing with Solr.

The one place where it would *definitely* make sense to make Solr aware of
timezones would be in dealing with DateMath -- when you round by "DAY"
Solr currently does that in UTC, even though that's probably not what
matters to you -- but every other aspect of date processing in Solr should
work fine provided you transform your dates before putting them in the
index.

There have been some other discussions about adding options to DateField
to make it more flexible in parsing dates in other formats (which might
include timezone support) but: 1) i don't know of anyone who has actually
started on this; 2) it would only affect the input to Solr ... values
would still be indexed in UTC so they could be compared with any other
date (regardless of its format/timezone) and search clients would still
need to know the current user's preferred TZ to make sure to query/display
those dates appropriately.



-Hoss
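
A minimal sketch of the client-side conversion described above, assuming
the application knows each user's timezone. The helper names and the
fixed UTC-03:00 offset are hypothetical; a real application would
probably use named zones (for example via the zoneinfo module) rather
than a fixed offset.

```python
from datetime import datetime, timedelta, timezone

# UTC date form used when talking to Solr, e.g. 2008-02-04T12:00:00Z
SOLR_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_solr_utc(local_dt):
    """Convert a timezone-aware datetime to a UTC string for indexing or querying."""
    if local_dt.tzinfo is None:
        raise ValueError("need a timezone-aware datetime")
    return local_dt.astimezone(timezone.utc).strftime(SOLR_DATE_FORMAT)


def from_solr_utc(value, user_tz):
    """Parse a UTC date string from Solr and express it in the user's timezone."""
    utc_dt = datetime.strptime(value, SOLR_DATE_FORMAT).replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(user_tz)


# Hypothetical user preference: a fixed UTC-03:00 offset.
USER_TZ = timezone(timedelta(hours=-3))

local = datetime(2008, 2, 4, 22, 30, 0, tzinfo=USER_TZ)
stored = to_solr_utc(local)
shown = from_solr_utc(stored, USER_TZ)
print(stored, shown.isoformat())
```

Note that 22:30 on Feb 4 in this zone is already Feb 5 in UTC, which is
the kind of case where rounding by "DAY" in UTC differs from the user's
own day.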



Re: duplicate entries being returned, possible caching issue?

2008-02-04 Thread Rachel McConnell
We are using Solr's replication scripts.  They are set to run every 20
minutes, via a cron job on the slave servers.  Any further useful info
I can give regarding them?

R

On 2/3/08, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> I would guess you are seeing a view of the index after adding some
> documents but before the duplicates have been removed.  Are you using
> Solr's replication scripts?
>
> -Yonik
>
> On Feb 1, 2008 6:01 PM, Rachel McConnell <[EMAIL PROTECTED]> wrote:
> > We have just started seeing an intermittent problem in our production
> > Solr instances, where the same document is returned twice in one
> > request.  Most of the content of the response consists of duplicates.
> > It's not consistent; maybe 1/3 of the time this is happening and the
> > rest of the time, one return document is sent per actual Solr
> > document.
> >
> > We recently made some changes to our caching strategy, basically to
> > increase the values across the board.  This is the only change to our
> > Solr instance for quite some time.
> >
> > Our production system consists of the following:
> >
> > * 'write', a Solr server used as the master index, optimized for
> > writes.  all 3 application servers use this
> > * 'read1' & 'read2', Solr servers optimized for reads, which synch
> > from the master every 20 minutes.  these two are behind a pound load
> > balancer.  Two application servers use these for searching.
> > * 'read3', a Solr server identical to read1 & read2, but which is not
> > load balanced, and used by only one application server.
> >
> > Has anyone any ideas how to start debugging this?  What information
> > should I be looking for that could shed some light on this?
> >
> > Thanks for any advice,
> > Rachel
> >
>


Re: Limiting duplicate field occurrences to specified number

2008-02-04 Thread Briggs
Again, thanks for pointing me to the patches.  Admittedly, I am not
all that well informed in the patching world. I know how to apply them
and all that. But, I am trying to track down exactly which patches I
need to apply.

I currently have the source for Solr 1.2.  The patches are for 1.1,
1.3 and the current dev (a total of 8 patches in there).  But there
is no tag or anything for 1.3.  The 1.3 patch does seem to apply
successfully to the 1.2 code base, and it compiles (though I'm not sure
if that will work anyway).

But there is a "This issue depends on: SOLR-281 Search Components
(plugins)" heading under it, and that link shows 14 patches.

So, if you have a moment, could you list the patches that I need in
order to test this?  If not, I would be more than happy to read any
documentation on how to understand the patching process.



On Feb 4, 2008 11:52 AM, Briggs <[EMAIL PROTECTED]> wrote:
> Cool, thanks!
>
>
> On Feb 4, 2008 11:36 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> > perhaps:
> > https://issues.apache.org/jira/browse/SOLR-236
> >
> >
> >
> > Briggs wrote:
> > > Is it possible to limit the number of duplicate field values are
> > > returned in a search result?
> > >
> > > To give a use case,
> > >
> > > I have a set of products. Each product belongs to a single vendor.
> > > When I query, I would like only n-number of results per vendor
> > > returned.
> > >
> > > Thanks!
> > >
> > > 
> > >
> >
> >
>
>
>
>
> --
> "Conscious decisions by conscious minds are what make reality real"
>



-- 
"Conscious decisions by conscious minds are what make reality real"


Re: Search not working for indexed words...

2008-02-04 Thread Yonik Seeley
It's stemming.  "administrator" stems to "administr".
Wildcard terms are not stemmed, so administrator* is compared against
the indexed stem and won't match.
If you really need both wildcard queries and stemming, then use two
different fields (via copyField).

-Yonik
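
The effect can be sketched with a toy model of a stemmed field. This is
an illustration only: the stem table is a hypothetical stand-in for the
real stemmer (only "administrator" -> "administr" comes from this
thread), and the functions are not Solr or Lucene APIs.

```python
# Hypothetical stem table standing in for the real stemmer.
STEMS = {
    "administrators": "administr",
    "administrator": "administr",
    "visitors": "visitor",
}


def index_terms(words):
    """Terms as they would be stored in a stemmed field."""
    return sorted({STEMS.get(word, word) for word in words})


def prefix_query(prefix, terms):
    """A wildcard query like 'foo*': the prefix is not stemmed, only compared."""
    return [term for term in terms if term.startswith(prefix)]


def term_query(word, terms):
    """A plain term query: the word is stemmed the same way as the index."""
    stem = STEMS.get(word, word)
    return [term for term in terms if term == stem]


words = ["administrators", "visitors"]
stemmed = index_terms(words)   # the stemmed field
unstemmed = sorted(words)      # e.g. a copyField target without stemming

print(prefix_query("administrator", stemmed))    # administrator* on the stemmed field
print(prefix_query("admini", stemmed))           # admini* on the stemmed field
print(term_query("administrator", stemmed))      # administrator, no wildcard
print(prefix_query("administrator", unstemmed))  # administrator* on the unstemmed field
```

The last line is the reason for the two-field suggestion: wildcard
queries go to the unstemmed copy, ordinary queries to the stemmed one.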

On Feb 4, 2008 6:54 AM, nithyavembu <[EMAIL PROTECTED]> wrote:
>
> Hi All,
>
>   From past 6 months i am working and using SOLR. Now i am facing some
> problem with that while searching.
>   I have searched for some words but it doesnt return the result even its
> existing and indexed in data folder in SOLR server(i meant solr tomcat).
>
>   I have given the following words :
>"administrators",
>"visitors",
>
>   The format of my search query is:
>   Search word is : administrator*
>
> http://192.168.1.65:8085/solr/select/?q=administrator*&version=2.2&start=0&rows=10&indent=on
>
>   Its return nothing even the administrator existing in the data folder.
>
>   Search word is : administrator
>
> http://192.168.1.65:8085/solr/select/?q=administrator&version=2.2&start=0&rows=10&indent=on
>
>   If i search for "administrator" without giving "*", its searching and
> returning the result.
>
>   Search word is : administrator/*
>
> http://192.168.1.65:8085/solr/select/?q=administrator%5C*&version=2.2&start=0&rows=10&indent=on
>
> ("/" decoded as %5C) here.
> If i search for "administrator/*", its returning the result.
>
>  My query should be optimized, so that i can use it over my project. So i
> need the query using wildcard character like "searchword+*"
>  But now its not searching if i use "*". But if i use "/*" it can search.
> But now i have faced the following problem.
>
> Search word is : admini\*
>
> http://192.168.1.65:8085/solr/select/?q=admini%5C*&version=2.2&start=0&rows=10&indent=on
>
> Not returning any result.
>
> Search word is : admini
>
> http://192.168.1.65:8085/solr/select/?q=admini&version=2.2&start=0&rows=10&indent=on
>
> Not returning any result.
>
> Search word is : admini*
>
> http://192.168.1.65:8085/solr/select/?q=admini*&version=2.2&start=0&rows=10&indent=on
>
> This returning result.
>
> Search word is : admin
>
> If i search the word "admin" or "admin*" or "admin\*", its return the
> result.
>
> I am using the same SolrConfig.xml and Schema.xml without any change 
> given
> by solr during download and i didnt make any changes on that.
>
> Whether i have to change my query or i have to change Schema.xml and
> whether i have to add any words in stopwords.txt etc..,
>
> And likewise some words i am searching and i am getting the
> result.But after some time if i search for the same word its not
> searching.Its coming by random.
>
> If anyone know the solution and have any idea, please help me out.
>
> Thanks in advance.
>
> with regards,
> V.Nithya.
> --
> View this message in context: 
> http://www.nabble.com/Search-not-working-for-indexed-words...-tp15266626p15266626.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Limiting duplicate field occurrences to specified number

2008-02-04 Thread Briggs
Is it possible to limit the number of duplicate field values that are
returned in a search result?

To give a use case,

I have a set of products. Each product belongs to a single vendor.
When I query, I would like only n-number of results per vendor
returned.

Thanks!



-- 
"Conscious decisions by conscious minds are what make reality real"


Re: SpanQuery support

2008-02-04 Thread Renaud Delbru

Hi Yonik,

Yonik Seeley wrote:

On Feb 2, 2008 3:43 PM, Renaud Delbru <[EMAIL PROTECTED]> wrote:
  

I was looking at the discussion of SOLR-281. If I understand correctly,
the task would be to write my own search component class,
SpanQueryComponent that extends the SearchComponent class, then
overwriting the declaration of the "query searchComponent" in
solrconfig.xml:

Then, I will be able to use directly my own query syntax and query
component ? Is it correct ?



You could, but that would be the hard way (by a big margin).
There are pluggable query parsers now (see QParserPlugin)... but the
current missing piece is being able to specify a new parser plugin
from solrconfig.xml

-Yonik
  

Hum, I would prefer to follow the easiest way ;o).
Could you briefly explain the easiest way, and give me some hints on
which classes I need to extend to achieve my goal?


Regards.

--
Renaud Delbru,
E.C.S., Ph.D. Student,
Semantic Information Systems and
Language Engineering Group (SmILE),
Digital Enterprise Research Institute,
National University of Ireland, Galway.
http://smile.deri.ie/


Re: duplicate entries being returned, possible caching issue?

2008-02-04 Thread Yonik Seeley
On Feb 4, 2008 1:48 PM, Rachel McConnell <[EMAIL PROTECTED]> wrote:
> On 2/4/08, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > On Feb 4, 2008 1:15 PM, Rachel McConnell <[EMAIL PROTECTED]> wrote:
> > > We are using Solr's replication scripts.  They are set to run every 20
> > > minutes, via a cron job on the slave servers.  Any further useful info
> > > I can give regarding them?
> >
> > Are you using the postCommit hook in solrconfig.xml to call snapshooter?
>
> No, just the crontab.  We have only one master server on which commits
> are made, and the servers on which requests are made run the
> snapshooter periodically.

If you are running snapshooter asynchronously, this would be the cause.
It's designed to be run from solr (via a postCommit or postOptimize
hook) at specific points where a consistent view of the index is
available.

-Yonik


Search not working for indexed words...

2008-02-04 Thread nithyavembu

Hi All,

  I have been working with SOLR for the past 6 months. Now I am facing a
problem while searching.
  I have searched for some words, but no result is returned even though the
words exist and are indexed in the data folder on the SOLR server (I mean
Solr on Tomcat).

  I have indexed the following words:
   "administrators",
   "visitors",

  The format of my search query is:
  Search word is : administrator*

http://192.168.1.65:8085/solr/select/?q=administrator*&version=2.2&start=0&rows=10&indent=on

  It returns nothing even though "administrator" exists in the data folder.

  Search word is : administrator

http://192.168.1.65:8085/solr/select/?q=administrator&version=2.2&start=0&rows=10&indent=on

  If I search for "administrator" without the "*", it returns the result.

  Search word is : administrator\*

http://192.168.1.65:8085/solr/select/?q=administrator%5C*&version=2.2&start=0&rows=10&indent=on

("\" is encoded as %5C here.)
If I search for "administrator\*", it returns the result.

 My query should be optimized, so that I can use it throughout my project. So I
need a query using the wildcard character, like "searchword+*".
 But it does not match if I use "*", while it does match if I use "\*".
And now I have run into the following problem.

Search word is : admini\*

http://192.168.1.65:8085/solr/select/?q=admini%5C*&version=2.2&start=0&rows=10&indent=on

No result is returned.

Search word is : admini

http://192.168.1.65:8085/solr/select/?q=admini&version=2.2&start=0&rows=10&indent=on

No result is returned.

Search word is : admini*

http://192.168.1.65:8085/solr/select/?q=admini*&version=2.2&start=0&rows=10&indent=on

This returns a result.

Search word is : admin

If I search for the word "admin", "admin*" or "admin\*", it returns the
result.

I am using the solrconfig.xml and schema.xml that ship with the Solr
download, without any changes.

Do I have to change my query, or do I have to change schema.xml, or do I
have to add any words to stopwords.txt, etc.?

Likewise, for some words I search and get the result, but after some time
a search for the same word returns nothing. It happens at random.

If anyone knows the solution or has any idea, please help me out.

Thanks in advance.

with regards,
V.Nithya.   
-- 
View this message in context: 
http://www.nabble.com/Search-not-working-for-indexed-words...-tp15266626p15266626.html
Sent from the Solr - User mailing list archive at Nabble.com.
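
A note on the URLs in this message: %5C is the percent-encoding of the
backslash, not of the forward slash (which would be %2F). In Lucene query
syntax a backslash escapes the next special character, so a query ending
in %5C* asks for a literal asterisk rather than a wildcard. That would be
consistent with those queries behaving like the plain, non-wildcard ones.
A small sketch using only Python's standard library to decode the query
values; the encode_query helper is hypothetical.

```python
from urllib.parse import quote, unquote

# Query values exactly as they appear in the URLs above.
for encoded in ["administrator*", "administrator%5C*", "admini%5C*"]:
    print(encoded, "->", unquote(encoded))


def encode_query(q):
    """Percent-encode a query value, leaving '*' readable as in the URLs above."""
    return quote(q, safe="*")


print(encode_query("administrator\\*"))  # backslash form
print(encode_query("administrator/*"))   # forward-slash form, for comparison
```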



Re: How to setup German stemmer?

2008-02-04 Thread Tobias Lohr

No ideas??

Hi,

I wonder how I can setup a German stemmer correctly in a list of 
filters for a field type definition.


Neither



nor



works. Any suggestions?

Thanks, Tobi





Re: Limiting duplicate field occurrences to specified number

2008-02-04 Thread Ryan McKinley

perhaps:
https://issues.apache.org/jira/browse/SOLR-236
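
Until a server-side option is available, the same effect can be
approximated in the client. The following is a sketch of a client-side
workaround, not of what the SOLR-236 patch does; the field name "vendor"
and the sample documents are hypothetical. It assumes the client fetches
more rows than it finally shows.

```python
from collections import defaultdict


def limit_per_field(docs, field, n):
    """Keep at most n docs per distinct value of `field`, preserving ranking order."""
    seen = defaultdict(int)
    kept = []
    for doc in docs:
        key = doc.get(field)
        if seen[key] < n:
            seen[key] += 1
            kept.append(doc)
    return kept


# Hypothetical result list, already sorted by relevance.
results = [
    {"id": "p1", "vendor": "acme"},
    {"id": "p2", "vendor": "acme"},
    {"id": "p3", "vendor": "globex"},
    {"id": "p4", "vendor": "acme"},
    {"id": "p5", "vendor": "globex"},
]

limited = limit_per_field(results, "vendor", 2)
print([doc["id"] for doc in limited])
```

The drawback is that paging and hit counts no longer line up with what
Solr reports, which is why a server-side solution is preferable.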


Briggs wrote:

Is it possible to limit the number of duplicate field values are
returned in a search result?

To give a use case,

I have a set of products. Each product belongs to a single vendor.
When I query, I would like only n-number of results per vendor
returned.

Thanks!







How to setup German stemmer?

2008-02-04 Thread Tobias Lohr

Hi,

I wonder how I can setup a German stemmer correctly in a list of filters 
for a field type definition.


Neither



nor



works. Any suggestions?

Thanks, Tobi


Re: Limiting duplicate field occurrences to specified number

2008-02-04 Thread Briggs
Cool, thanks!

On Feb 4, 2008 11:36 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> perhaps:
> https://issues.apache.org/jira/browse/SOLR-236
>
>
>
> Briggs wrote:
> > Is it possible to limit the number of duplicate field values are
> > returned in a search result?
> >
> > To give a use case,
> >
> > I have a set of products. Each product belongs to a single vendor.
> > When I query, I would like only n-number of results per vendor
> > returned.
> >
> > Thanks!
> >
> > 
> >
>
>



-- 
"Conscious decisions by conscious minds are what make reality real"


Re: SpanQuery support

2008-02-04 Thread Yonik Seeley
On Feb 2, 2008 3:43 PM, Renaud Delbru <[EMAIL PROTECTED]> wrote:
> I was looking at the discussion of SOLR-281. If I understand correctly,
> the task would be to write my own search component class,
> SpanQueryComponent that extends the SearchComponent class, then
> overwriting the declaration of the "query searchComponent" in
> solrconfig.xml:
> 
> Then, I will be able to use directly my own query syntax and query
> component ? Is it correct ?

You could, but that would be the hard way (by a big margin).
There are pluggable query parsers now (see QParserPlugin)... but the
current missing piece is being able to specify a new parser plugin
from solrconfig.xml

-Yonik


Re: duplicate entries being returned, possible caching issue?

2008-02-04 Thread Rachel McConnell
On 2/4/08, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Feb 4, 2008 1:48 PM, Rachel McConnell <[EMAIL PROTECTED]> wrote:
> > On 2/4/08, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > > On Feb 4, 2008 1:15 PM, Rachel McConnell <[EMAIL PROTECTED]> wrote:
> > > > We are using Solr's replication scripts.  They are set to run every 20
> > > > minutes, via a cron job on the slave servers.  Any further useful info
> > > > I can give regarding them?
> > >
> > > Are you using the postCommit hook in solrconfig.xml to call snapshooter?
> >
> > No, just the crontab.  We have only one master server on which commits
> > are made, and the servers on which requests are made run the
> > snapshooter periodically.
>
> If you are running snapshooter asynchronously, this would be the cause.
> It's designed to be run from solr (via a postCommit or postOptimize
> hook) at specific points where a consistent view of the index is
> available.

So our cron job might be running DURING an update, for example, and
get duplicate values that way?  I'd have thought that in that case,
the dupe values would stick around until the next update, 20 minutes
later, and we have not observed that to happen.  Or do you mean
something else?

thanks,
Rachel


Re: How to setup German stemmer?

2008-02-04 Thread Chris Hostetter

First off: you have to be more specific about what you mean when you say
"neither ... nor ... works" .. what happens?  Do you get an error on
startup?  what does the error look like?

You also have to be clear about which version of Solr you are using ... 
GermanStemFilterFactory was added after Solr 1.2.

: I wonder how I can setup a German stemmer correctly in a list of filters for a
: field type definition.
: 
: Neither
: 
: 
: 
: nor
: 
: 
: 
: works. Any suggestions?



-Hoss



RE: How to setup German stemmer?

2008-02-04 Thread Steven A Rowe
Hi Tobi,

On 02/04/2008 at 4:11 PM, Tobias Lohr wrote:
> On 02/04/2008 at 3:42 PM, Steven A Rowe wrote:
> >  
>
> thanks for your hint. I've already tried this [...]

Did it work?

> I wonder, whether this leads to the same or a different result.
> If the latter, what is the difference between the stemmer and
> the snowball variant?

Here is a report on the algorithm used by Lucene's GermanStemmer, which you 
were initially attempting to use:

   

And here is a description of the Snowball German stemming algorithm:

   

Also, if you want to make the GermanStemmer work, have you tried the following?:

   

Steve



Re: SpanQuery support

2008-02-04 Thread Renaud Delbru

Yonik Seeley wrote:

On Feb 2, 2008 3:43 PM, Renaud Delbru <[EMAIL PROTECTED]> wrote:
  

I was looking at the discussion of SOLR-281. If I understand correctly,
the task would be to write my own search component class,
SpanQueryComponent that extends the SearchComponent class, then
overwriting the declaration of the "query searchComponent" in
solrconfig.xml:

Then, I will be able to use directly my own query syntax and query
component ? Is it correct ?



You could, but that would be the hard way (by a big margin).
There are pluggable query parsers now (see QParserPlugin)... but the
current missing piece is being able to specify a new parser plugin
from solrconfig.xml

-Yonik
  
I have looked at MoreLikeThisHandler.java. I saw that all the
MoreLikeThis logic is defined inside the handler and in the inner
class MoreLikeThisHelper.
Could I follow the same approach and define a ProximityHandler class
that executes a Lucene SpanQuery based on some request parameters? Is
that the right way to do it?


Regards.

--
Renaud Delbru,
E.C.S., Ph.D. Student,
Semantic Information Systems and
Language Engineering Group (SmILE),
Digital Enterprise Research Institute,
National University of Ireland, Galway.
http://smile.deri.ie/


RE: How to setup German stemmer?

2008-02-04 Thread Steven A Rowe
Hi Tobi,

On 02/04/2008 at 10:13 AM, Tobias Lohr wrote:
> Hi,
> 
> I wonder how I can setup a German stemmer correctly in a list
> of filters for a field type definition.

>From 
>:



"German2" is also listed as a valid language value.

Steve



Re: How to setup German stemmer?

2008-02-04 Thread Shalin Shekhar Mangar
Probably someone who knows more about this can shed some light, but
aren't you supposed to use GermanStemFilter instead of
GermanStemFilterFactory ?

On Feb 5, 2008 1:27 AM, Tobias Lohr <[EMAIL PROTECTED]> wrote:
> No ideas??
>
> > Hi,
> >
> > I wonder how I can setup a German stemmer correctly in a list of
> > filters for a field type definition.
> >
> > Neither
> >
> > 
> >
> > nor
> >
> > 
> >
> > works. Any suggestions?
> >
> > Thanks, Tobi
> >
>
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: duplicate entries being returned, possible caching issue?

2008-02-04 Thread Rachel McConnell
On 2/4/08, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Feb 4, 2008 1:15 PM, Rachel McConnell <[EMAIL PROTECTED]> wrote:
> > We are using Solr's replication scripts.  They are set to run every 20
> > minutes, via a cron job on the slave servers.  Any further useful info
> > I can give regarding them?
>
> Are you using the postCommit hook in solrconfig.xml to call snapshooter?

No, just the crontab.  We have only one master server on which commits
are made, and the servers on which requests are made run the
snapshooter periodically.  No data changes are made on the read
servers, so postCommit would never be called anyway (I believe).

> The other possibility is a JVM crash happening before Solr removes
> deleted documents.

This would crash the appserver, which isn't happening.  Also the
duplicates don't seem to be returned often; we see a case of duplicate
results, but within a minute or less it goes away and the correct set
of results is returned again.  This seems to point to a problem with
the cache, to me.  But I don't have a good sense of how to debug it...

We tried changing the autowarmer settings to not pull anything from
the cache.  I'll write again if this seems to fix the problem - by
which I mean, if we don't see it at all for a day or two.

thanks,
Rachel


Re: duplicate entries being returned, possible caching issue?

2008-02-04 Thread Yonik Seeley
On Feb 4, 2008 1:15 PM, Rachel McConnell <[EMAIL PROTECTED]> wrote:
> We are using Solr's replication scripts.  They are set to run every 20
> minutes, via a cron job on the slave servers.  Any further useful info
> I can give regarding them?

Are you using the postCommit hook in solrconfig.xml to call snapshooter?
The other possibility is a JVM crash happening before Solr removes
deleted documents.

-Yonik


Re: How to setup German stemmer?

2008-02-04 Thread Tobias Lohr

@steve:

Your suggestion

   

didn't work either. But anyway, the snowball porter filter worked. 

@rachel: 


As I already posted, I got the following error

  org.apache.solr.core.SolrException: Error loading class '...'

I use Solr 1.2



Hi Tobi,

On 02/04/2008 at 4:11 PM, Tobias Lohr wrote:
  

On 02/04/2008 at 3:42 PM, Steven A Rowe wrote:


 
  

thanks for your hint. I've already tried this [...]



Did it work?

  

I wonder, whether this leads to the same or a different result.
If the latter, what is the difference between the stemmer and
the snowball variant?



Here is a report on the algorithm used by Lucene's GermanStemmer, which you 
were initially attempting to use:

   

And here is a description of the Snowball German stemming algorithm:

   

Also, if you want to make the GermanStemmer work, have you tried the following?:

   

Steve

  




Re: duplicate entries being returned, possible caching issue?

2008-02-04 Thread Rachel McConnell
On 2/4/08, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Feb 4, 2008 2:20 PM, Rachel McConnell <[EMAIL PROTECTED]> wrote:
> > > If you are running snapshooter asynchronously, this would be the cause.
> > > It's designed to be run from solr (via a postCommit or postOptimize
> > > hook) at specific points where a consistent view of the index is
> > > available.
> >
> > So our cron job might be running DURING an update, for example, and
> > get duplicate values that way?
>
> Right.  Duplicates are removed on a commit(), so if a snapshot is
> being taken at any other time than right after a commit, those deletes
> will not have been performed.

I've reviewed the wiki pages about snappuller
(http://wiki.apache.org/solr/SolrCollectionDistributionScripts) and
solrconfig.xml (http://wiki.apache.org/solr/SolrConfigXml) and it
seems that the snappuller is intended to be used on the slave server.
In our case, the slave servers do no updating and never commit; the
master is the only one that commits.  Is there a standard way for the
just-committed, consistent index to be pushed from the master server
out to the slaves?

In fact I don't see how this is supposed to work in any environment
where the master and slave Solr servers are on different physical
machines.  The postCommit handler should run after a commit, which
only happens on the master server; yet it runs snappuller which should
run on a slave.  I am probably missing something here, is there any
more documentation you can point me to?

Rachel


Re: How to setup German stemmer?

2008-02-04 Thread Tobias Lohr

Your suggestions don't work either! Both

 and class="solr.GermanStemFilter"/>


lead to

org.apache.solr.core.SolrException: Error loading class ..


Probably someone who knows more about this can shed some light, but
aren't you supposed to use GermanStemFilter instead of
GermanStemFilterFactory ?

On Feb 5, 2008 1:27 AM, Tobias Lohr <[EMAIL PROTECTED]> wrote:
  

No ideas??



Hi,

I wonder how I can setup a German stemmer correctly in a list of
filters for a field type definition.

Neither



nor



works. Any suggestions?

Thanks, Tobi

  





  




Re: For an "XML" fieldtype

2008-02-04 Thread Frédéric Glorieux


Hi Ryan

Thanks for the answer,


Depends what you are trying to do.

Is there anything wrong with just using string or text fieldType?
If you use the XML writer, it will get returned xml encoded (> becomes
&gt; etc).


This is pretty much the only change I made to StrField, so that I get back
the original stored XML string and can transform it directly with XSL.



I think if you use the JSON writer, it is only escaped for json.


I haven't tested the JSON writer, but I could verify it before proposing the class.


what is missing?  what problem are you hitting?


I would be glad if this class could be committed, so that I do not need
to keep it up to date with future Solr releases.


--
Frédéric Glorieux
École nationale des chartes
Direction des nouvelles technologies et de l'informatique


Re: Factory in Solr

2008-02-04 Thread Chris Hostetter

: I'm trying to add a factory in solr for tokenizing Arabic text, but I
: receive an error (the one at the end of my email)

: java.lang.VerifyError: (class:
: org/apache/solr/analysis/ArabicTokenizerFactory, method: create
: signature: (Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;)
: Wrong return type in function at java.lang.Class.forName0(Native Method)
: at java.lang.Class.forName(Unknown Source) at

"VerifyError" is a pretty low-level JVM error ... based on that message
I'm guessing that the version of the TokenStream class you compiled
against isn't the same version being used when you run Solr ... if you
compile your factory using the jars that come with Solr in your
classpath (and no other versions of the Lucene jars), it should work.



-Hoss



Re: For an "XML" fieldtype

2008-02-04 Thread Ryan McKinley

Depends what you are trying to do.

Is there anything wrong with just using string or text fieldType?

If you use the XML writer, it will be returned XML-encoded (> becomes
&gt; etc.).  I think if you use the JSON writer, it is only escaped for JSON.


what is missing?  what problem are you hitting?

ryan
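
As a rough illustration of the escaping described above, this is what a stored XML string might look like when it comes back through the XML response writer. The field name xml_source and the document content are made up for this sketch; only the escaping behaviour is the point.

```xml
<!-- Hypothetical response fragment; "xml_source" is an assumed field name. -->
<doc>
  <str name="id">doc1</str>
  <!-- The stored value is a small XML paragraph; the writer returns it with
       its angle brackets replaced by character entities. -->
  <str name="xml_source">&lt;p&gt;Hello world&lt;/p&gt;</str>
</doc>
```

A client that reads the response with an XML parser gets the original string back once the entities are decoded, so the markup is encoded rather than lost.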


Frédéric Glorieux wrote:

Hi all,

Sorry to repost on this issue.
Is there a standard way to use a field to store the XML source of a document?
If not, is a fieldType the solution?


Or, is it a "solr-user" question ?

Sorry if I have posted in the wrong place.





RE: Querying multiple dynamicField

2008-02-04 Thread Lance Norskog
You can use the copyField directive to copy all 'sentence_*' fields into
one indexed field. You then have a named field that you can search against.

Lance Norskog
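
A minimal schema.xml sketch of this suggestion, under some assumptions: the destination field name sentences_all and the "text" field type are placeholders chosen for illustration, not names from this thread.

```xml
<!-- Sketch only: "sentences_all" and type="text" are assumed names. -->

<!-- The existing dynamic fields, e.g. sentence_1, sentence_2, ... -->
<dynamicField name="sentence_*" type="text" indexed="true" stored="true"/>

<!-- One catch-all field to search against. It is multiValued because
     several source fields are copied into it for each document. -->
<field name="sentences_all" type="text" indexed="true" stored="false"
       multiValued="true"/>

<!-- Copy every field matching the pattern into the catch-all field. -->
<copyField source="sentence_*" dest="sentences_all"/>
```

A query such as q=sentences_all:foo would then cover all of the sentence_* fields at once. Documents have to be reindexed after the schema change, since the copy happens at index time.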

-Original Message-
From: Renaud Delbru [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 01, 2008 6:48 PM
To: solr-user@lucene.apache.org
Subject: Querying multiple dynamicField

Hi,

We would like to know if there is an efficient way to query multiple
dynamicFields at the same time, using a wildcard in the field name. For
example, we have a list of dynamic fields "sentence_*" and we would like to
execute a query on all the "sentence_*" fields.
Is there a way to execute such queries on Solr 1.3 / Lucene 2.3 ?

Regards.

--
Renaud Delbru



Re: How to setup German stemmer?

2008-02-04 Thread Tobias Lohr

Hi Steve,

Thanks for your hint. I've already tried this, and I wonder whether it
leads to the same or a different result. If the latter, what is the
difference between the stemmer and the snowball variant?


tobi

Hi Tobi,

On 02/04/2008 at 10:13 AM, Tobias Lohr wrote:
  

Hi,

I wonder how I can setup a German stemmer correctly in a list
of filters for a field type definition.



From
:



"German2" is also listed as a valid language value.

Steve
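
For reference, a field type using the Snowball variant could look roughly like the sketch below. The field type name text_de and the choice of tokenizer and lowercase filter are assumptions made for illustration; only the Snowball filter with a language value of "German" (or "German2") comes from the discussion above.

```xml
<!-- Sketch only: the name "text_de" and the surrounding analysis chain
     are assumed; adjust them to your own schema.xml. -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- language="German2" is the other value mentioned above -->
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
</fieldType>
```

Note that schema.xml refers to the factory class here, since Solr instantiates filters through their factories.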

  




Re: SpanQuery support

2008-02-04 Thread Yonik Seeley
On Feb 4, 2008 11:53 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> You could, but that would be the hard way (by a big margin).
> There are pluggable query parsers now (see QParserPlugin)... but the
> current missing piece is being able to specify a new parser plugin
> from solrconfig.xml

Hmmm, it appears I forgot what I implemented already ;-)

Support for adding new parser plugins from solrconfig.xml already
exists (and I just added a test).
So add something like the following to your solrconfig.xml


And then implement FooQParserPlugin in Java to create your desired
query structures (span queries or whatever).  See other
implementations of FooQParserPlugin in Solr for guidance.

To use your "foo" parser, set it to the default query type by adding
defType="foo" to the request (or to the defaults for your handler).
You can also override the current query type via q=my query


-Yonik
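
A sketch of what the solrconfig.xml side of this might look like. The package name com.example and the handler name /foo-search are hypothetical; FooQParserPlugin stands for your own class, as in the message above.

```xml
<!-- Sketch only: "com.example" and "/foo-search" are assumed names. -->

<!-- Register the custom parser plugin under the name "foo". -->
<queryParser name="foo" class="com.example.FooQParserPlugin"/>

<!-- Optionally make "foo" the default query type for one handler. -->
<requestHandler name="/foo-search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">foo</str>
  </lst>
</requestHandler>
```

With this in place, a request to the handler would be parsed by the "foo" plugin unless the request says otherwise. In released Solr versions a per-query override is written with the local-params prefix, for example q={!foo}my query.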