Re: LSA Implementation
Lance,

It does cover European languages, but pretty much nothing on Asian languages (CJK).

- Eswar

On Nov 28, 2007 1:51 AM, Norskog, Lance <[EMAIL PROTECTED]> wrote:
> WordNet itself is English-only. There are various ontology projects for it.
>
> http://www.globalwordnet.org/ is a separate world-language database project. I found it at the bottom of the WordNet Wikipedia page. Thanks for starting me on the search!
>
> Lance
>
> -----Original Message-----
> From: Eswar K [mailto:[EMAIL PROTECTED]]
> Sent: Monday, November 26, 2007 6:50 PM
> To: solr-user@lucene.apache.org
> Subject: Re: LSA Implementation
>
> The languages also include CJK :) among others.
>
> - Eswar
>
> On Nov 27, 2007 8:16 AM, Norskog, Lance <[EMAIL PROTECTED]> wrote:
> > The WordNet project at Princeton (USA) is a large database of synonyms. If you're only working in English, this might be useful instead of running your own analyses.
> >
> > http://en.wikipedia.org/wiki/WordNet
> > http://wordnet.princeton.edu/
> >
> > Lance
> >
> > -----Original Message-----
> > From: Eswar K [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, November 26, 2007 6:34 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: LSA Implementation
> >
> > In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. The algorithm considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the algorithm doesn't understand anything about what the words *mean*, the patterns it notices can make it seem astonishingly intelligent.
> >
> > When you search such an index, the search engine looks at the similarity values it has calculated for every content word, and returns the documents that it thinks best fit the query. Because two documents may be semantically very close even if they do not share a particular keyword, this algorithm will often return relevant documents that don't contain the keyword at all, where a plain keyword search would fail without an exact match.
> >
> > - Eswar
> >
> > On Nov 27, 2007 7:51 AM, Marvin Humphrey <[EMAIL PROTECTED]> wrote:
> > > On Nov 26, 2007, at 6:06 PM, Eswar K wrote:
> > > > We essentially are looking at having an implementation for doing search which can return documents having conceptually similar words without necessarily having the original word searched for.
> > >
> > > Very challenging. Say someone searches for "LSA" and hits an archived version of the mail you sent to this list. "LSA" is a reasonably discriminating term. But so is "Eswar".
> > >
> > > If you knew that the original term was "LSA", then you might look for documents near it in term vector space. But if you don't know the original term, only the content of the document, how do you know whether you should look for docs near "lsa" or "eswar"?
> > >
> > > Marvin Humphrey
> > > Rectangular Research
> > > http://www.rectangular.com/
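[To make the idea in this thread concrete, here is a toy LSA sketch. It is not Solr or Lucene code; it assumes Apache Commons Math 3 on the classpath, and the tiny term-document matrix is made up. It builds the matrix, truncates its SVD to two "concepts", and compares documents by cosine similarity in the reduced space.]

    import org.apache.commons.math3.linear.Array2DRowRealMatrix;
    import org.apache.commons.math3.linear.RealMatrix;
    import org.apache.commons.math3.linear.SingularValueDecomposition;

    public class LsaSketch {
        public static void main(String[] args) {
            // Rows = terms, columns = documents; entries are raw term counts.
            // doc0={ship}, doc1={ship,ocean}, doc2={boat,ocean}, doc3={tree,tree}
            double[][] counts = {
                {1, 1, 0, 0},   // "ship"
                {0, 0, 1, 0},   // "boat"
                {0, 1, 1, 0},   // "ocean"
                {0, 0, 0, 2},   // "tree"
            };
            RealMatrix a = new Array2DRowRealMatrix(counts);
            SingularValueDecomposition svd = new SingularValueDecomposition(a);

            int k = 2;                          // keep the two strongest concepts
            RealMatrix v = svd.getV();          // documents x concepts
            double[] s = svd.getSingularValues();

            // Document vectors in concept space: row d of V, scaled by the
            // singular values of the kept concepts.
            double[][] docs = new double[a.getColumnDimension()][k];
            for (int d = 0; d < docs.length; d++)
                for (int c = 0; c < k; c++)
                    docs[d][c] = v.getEntry(d, c) * s[c];

            // doc0 and doc2 share no term, yet come out strongly similar in
            // concept space (linked via doc1); doc3 stays unrelated.
            System.out.printf("sim(doc0,doc2) = %.3f%n", cosine(docs[0], docs[2]));
            System.out.printf("sim(doc0,doc3) = %.3f%n", cosine(docs[0], docs[3]));
        }

        static double cosine(double[] x, double[] y) {
            double dot = 0, nx = 0, ny = 0;
            for (int i = 0; i < x.length; i++) {
                dot += x[i] * y[i];
                nx += x[i] * x[i];
                ny += y[i] * y[i];
            }
            return dot / Math.sqrt(nx * ny);
        }
    }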
Re: Combining SOLR and JAMon to monitor query execution times from a browser
Hi Norberto,

JAMon is all about aggregating statistical data and displaying the information in a web browser. The main beauty is that it is easy to define what you are monitoring, such as querying domain objects per customer.

Cheers,

Siegfried Goeschl

Norberto Meijome wrote:
> On Tue, 27 Nov 2007 18:18:16 +0100, Siegfried Goeschl <[EMAIL PROTECTED]> wrote:
> > Hi folks,
> >
> > working on a closed-source project for an IP-concerned company is not always fun ... we combined SOLR with JAMon (http://jamonapi.sourceforge.net/) to keep an eye on the query times, and this might be of general interest:
> >
> > +) JAMon comes with a ready-to-use ServletFilter
> > +) we extended this implementation to keep track of queries issued by a customer and the requested domain objects, e.g. "artist", "album", "track"
> > +) this allows us to keep track of the execution times and their distribution, to find long-running queries quickly from a web browser without having access to the access.log
> > +) a small presentation can be found at http://people.apache.org/~sgoeschl/presentations/jamon-20070717.pdf
> > +) if it is of general interest I can rewrite the code as a contribution
>
> Thanks Siegfried,
>
> I am further interested in plugging this information into something like Nagios, Cacti, Zenoss, bigsister, Openview or your monitoring system of choice, but I haven't had much time to look into this yet.
>
> How does JAMon compare to JMX (http://java.sun.com/javase/technologies/core/mntr-mgmt/javamanagement/)?
>
> cheers,
> B
> _
> {Beto|Norberto|Numard} Meijome
>
> There are no stupid questions, but there are a LOT of inquisitive idiots.
>
> I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
SOLR 1.2 - Updates sent containing fields that are not on the Schema fail silently
Hi,

I experienced a very unpleasant problem recently, when my search indexing adaptor was changed to add some new fields. The problem is that my schema didn't follow those changes (the new fields were not added to it), and after that SOLR was silently ignoring all documents I sent.

Neither the SOLR Java client nor the SOLR server returned an error code or log message. On the server side, nothing was logged, and the client received a standard success return.

Why didn't my documents get indexed, with only the unknown fields ignored? That is what I think it was supposed to do.

Please let me know your thoughts.

Regards,
Daniel
Re: Memory use with sorting problem
Just wanted to add the solution to this problem, in case someone finds the matching description in the archives (see below). By reducing the granularity of the timestamp field (stored as an slong) from seconds to minutes, the number of unique values was reduced by an order of magnitude (there are about 500,000 minutes in a year), and hence the memory use was also reduced.

Chris

Chris Laux wrote:
> Hi again,
>
> in the meantime I discovered the use of jmap (I'm not a Java programmer) and found that all the memory was being used up by String and char[] objects.
>
> The Lucene docs have the following to say on sorting memory use:
>
>> For String fields, the cache is larger: in addition to the above array, the value of every term in the field is kept in memory. If there are many unique terms in the field, this could be quite large.
>
> (http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Sort.html)
>
> I am sorting on the "slong" schema type, which is of course stored as a string. The above quote seems to indicate that it is possible for a field not to be a string for the purposes of the sort, while I took it from LiA that everything is a string to Lucene.
>
> What can I do to make sure the additional memory is not used by every unique term? i.e. how do I have the slong not be a "String field"?
>
> Cheers,
> Chris
>
> Chris Laux wrote:
>> Hi all,
>>
>> I've been struggling with this problem for over a month now, and although memory issues have been discussed often, I don't seem to be able to find a fitting solution.
>>
>> The index is merely 1.5 GB large, but memory use quickly fills out the heap max of 1 GB on a 2 GB machine. This then works fine until auto-warming starts. Switching the latter off altogether is unattractive, as it leads to response times of up to 30 s. When auto-warming starts, I get this error:
>>
>>> SEVERE: Error during auto-warming of key:org.apache.solr.search.QueryResultKey@e0b93139:java.lang.OutOfMemoryError: Java heap space
>>
>> Now when I reduce the size of the caches (to a fraction of the default settings) and the number of warming Searchers (to 2), memory use is not reduced and the problem stays. Only deactivating auto-warming helps. When I set the heap size limit higher (and go into swap space), all the extra memory seems to be used up right away, independently of auto-warming.
>>
>> This all seems to be closely connected to sorting by a numerical field, as switching this off makes memory use a lot more friendly.
>>
>> Is it normal to need that much memory for such a small index?
>>
>> I suspect the problem is in Lucene; would it be better to post on their list?
>>
>> Does anyone know a better way of getting the sorting done?
>>
>> Thanks in advance for your help,
>>
>> Chris
>>
>> This is the sort field setup in schema.xml:
>>
>>   <field name="created" type="slong" indexed="true" stored="true" multiValued="false" />
>>
>> And this is a sample query:
>>
>>   select/?q=solr&start=0&rows=20&sort=created+desc
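[For anyone hitting the same wall, a minimal sketch of the fix Chris describes: round epoch-seconds timestamps down to the minute before indexing, so the sort field has far fewer unique terms for Lucene's FieldCache to hold. The class and method names here are just for illustration.]

    public class TimestampGranularity {
        /** Round an epoch-seconds timestamp down to minute granularity. */
        static long toMinute(long epochSeconds) {
            return (epochSeconds / 60L) * 60L;
        }

        public static void main(String[] args) {
            long now = System.currentTimeMillis() / 1000L;
            // Index this value in the "created" slong field instead of the raw
            // value: ~31.5M distinct seconds per year shrink to ~525K minutes.
            System.out.println(now + " -> " + toMinute(now));
        }
    }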
Re: SOLR 1.2 - Updates sent containing fields that are not on the Schema fail silently
Yup, I do remember that happening to me before. Is this intentionally so?

Ravish

On Nov 28, 2007 1:41 PM, Daniel Alheiros <[EMAIL PROTECTED]> wrote:
> Hi
>
> I experienced a very unpleasant problem recently, when my search indexing adaptor was changed to add some new fields. The problem is that my schema didn't follow those changes, and after that SOLR was silently ignoring all documents I sent.
>
> Neither the SOLR Java client nor the SOLR server returned an error code or log message. On the server side, nothing was logged, and the client received a standard success return.
> [...]
>
> Regards,
> Daniel
Re: SOLR 1.2 - Updates sent containing fields that are not on the Schema fail silently
On Nov 28, 2007, at 8:41 AM, Daniel Alheiros wrote:
> I experienced a very unpleasant problem recently, when my search indexing adaptor was changed to add some new fields. The problem is that my schema didn't follow those changes, and after that SOLR was silently ignoring all documents I sent.

Is your schema perhaps configured to ignore undefined fields?

	Erik
Re: CJK Analyzers for Solr
With Ultraseek, we switched to a dictionary-based segmenter for Chinese because the N-gram highlighting wasn't acceptable to our Chinese customers. I guess it is something to check for each application.

wunder

On 11/27/07 10:46 PM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:
> For what it's worth, I worked on indexing and searching a *massive* pile of data, a good portion of which was in CJ and some K. The n-gram approach was used for all 3 languages, and the quality of the search results, including highlighting, was evaluated and okay-ed by native speakers of these languages.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message -----
> From: Walter Underwood <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, November 27, 2007 2:41:38 PM
> Subject: Re: CJK Analyzers for Solr
>
> Dictionaries are surprisingly expensive to build and maintain, and bi-gram is surprisingly effective for Chinese. See this paper:
>
>   http://citeseer.ist.psu.edu/kwok97comparing.html
>
> I expect that n-gram indexing would be less effective for Japanese because it is an inflected language. Korean is even harder. It might work to break Korean into the phonetic subparts and use n-grams on those.
>
> You should not do term highlighting with any of the n-gram methods. The relevance can be very good, but the highlighting just looks dumb.
>
> wunder
>
> On 11/27/07 8:54 AM, "Eswar K" <[EMAIL PROTECTED]> wrote:
>> Is there any specific reason why the CJK analyzers in Solr were chosen to be n-gram based, instead of a morphological analyzer (which is roughly what Google implements), given that the latter is considered more effective than the n-gram ones?
>>
>> Regards,
>> Eswar
>>
>> On Nov 27, 2007 7:57 AM, Eswar K <[EMAIL PROTECTED]> wrote:
>>> thanks james...
>>>
>>> How much time does it take to index 18m docs?
>>>
>>> - Eswar
>>>
>>> On Nov 27, 2007 7:43 AM, James liu <[EMAIL PROTECTED]> wrote:
>>>> i not use HYLANDA analyzer.
>>>>
>>>> i use je-analyzer and indexing at least 18m docs.
>>>>
>>>> i m sorry i only use chinese analyzer.
>>>>
>>>> On Nov 27, 2007 10:01 AM, Eswar K <[EMAIL PROTECTED]> wrote:
>>>>> What is the performance of these CJK analyzers (the one in Lucene and hylanda)? We would potentially be indexing millions of documents.
>>>>>
>>>>> James,
>>>>>
>>>>> We would have a look at hylanda too. What about Japanese and Korean analyzers, any recommendations?
>>>>>
>>>>> - Eswar
>>>>>
>>>>> On Nov 27, 2007 7:21 AM, James liu <[EMAIL PROTECTED]> wrote:
>>>>>> I don't think NGram is a good method for Chinese.
>>>>>>
>>>>>> CJKAnalyzer of Lucene is 2-Gram.
>>>>>>
>>>>>> Eswar K: if it is a chinese analyzer, i recommend hylanda (www.hylanda.com); it is the best chinese analyzer and it is not free. if u wanna a free chinese analyzer, maybe u can try je-analyzer. it has some problems when using it.
>>>>>>
>>>>>> On Nov 27, 2007 5:56 AM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>>>>>>> Eswar,
>>>>>>>
>>>>>>> We've used the NGram stuff that exists in Lucene's contrib/analyzers instead of CJK. Doesn't that allow you to do everything that the Chinese and CJK analyzers do? It's been a few months since I've looked at the Chinese and CJK Analyzers, so I could be off.
>>>>>>>
>>>>>>> Otis
>>>>>>>
>>>>>>> --
>>>>>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> From: Eswar K <[EMAIL PROTECTED]>
>>>>>>> To: solr-user@lucene.apache.org
>>>>>>> Sent: Monday, November 26, 2007 8:30:52 AM
>>>>>>> Subject: CJK Analyzers for Solr
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Does Solr come with language analyzers for CJK? If not, can you please direct me to some good CJK analyzers?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Eswar
>>>>>>
>>>>>> --
>>>>>> regards
>>>>>> jl
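[As a concrete picture of the 2-gram approach discussed above, a toy sketch, not the actual CJKAnalyzer source: emit overlapping character bigrams, which is essentially what Lucene's CJKAnalyzer does for runs of CJK characters.]

    import java.util.ArrayList;
    import java.util.List;

    public class BigramSketch {
        /** Emit overlapping character pairs ("2-grams") from the input. */
        static List<String> bigrams(String text) {
            List<String> out = new ArrayList<String>();
            for (int i = 0; i + 1 < text.length(); i++) {
                out.add(text.substring(i, i + 2));
            }
            return out;
        }

        public static void main(String[] args) {
            // A query need only match any shared bigram, so no dictionary
            // or word segmentation is required at index time.
            System.out.println(bigrams("中文分词"));  // [中文, 文分, 分词]
        }
    }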
query parsing & wildcards
I'm confused by some behavior I'm seeing in Solr (I'm using 1.2.0). I have a field named "description", declared with the following fieldType:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

The problem I'm having is that when I search for description:deck*, I get the results I expect; when I search for description:Deck*, I get nothing. I want both queries to return the same result set. (I'm using the standard request handler.)

Interestingly, when I search for description:Deck from the web interface, the debug output shows that the query term is converted to lowercase:

  <str name="rawquerystring">description:Deck</str>
  <str name="querystring">description:Deck</str>
  <str name="parsedquery">description:deck</str>
  <str name="parsedquery_toString">description:deck</str>

... but when I search for description:Deck*, it shows that it is not:

  <str name="rawquerystring">description:Deck*</str>
  <str name="querystring">description:Deck*</str>
  <str name="parsedquery">description:Deck*</str>
  <str name="parsedquery_toString">description:Deck*</str>

What am I doing wrong here?

Also, when I use the Field Analysis tool for description:Deck*, it shows the following (sorry for the bad copy/paste):

Query Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  term position: 1 | term text: Deck* | term type: word | source start,end: 0,5
org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=false, ignoreCase=true}
  term position: 1 | term text: Deck* | term type: word | source start,end: 0,5
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
  term position: 1 | term text: Deck* | term type: word | source start,end: 0,5
org.apache.solr.analysis.WordDelimiterFilterFactory {generateNumberParts=0, catenateWords=1, generateWordParts=0, catenateAll=0, catenateNumbers=1}
  term position: 1 | term text: Deck | term type: word | source start,end: 0,4
org.apache.solr.analysis.LowerCaseFilterFactory {}
  term position: 1 | term text: deck | term type: word | source start,end: 0,4
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
  term position: 1 | term text: deck | term type: word | source start,end: 0,4

Thanks,
Charlie
Re: SOLR / Tomcat JNDI Settings
Thanks a lot Hossman; this solved it for me.

Essential for me was to understand that I had to create a solr.xml file in <tomcat-home>\conf\Catalina\localhost; see the example in the quote below. The docBase should point to the .war file somewhere on my system. The value attribute of the <Environment> element should point to a directory where Tomcat can create the Lucene/Solr index files. That home directory should also contain the conf directory from the example in the Solr distribution. And that was it.

hossman wrote:
>
> <Context docBase="/var/tmp/ac-demo/apache-solr-1.2.0/dist/apache-solr-1.2.0.war"
>          debug="0"
>          crossContext="true">
>   <Environment name="solr/home"
>                value="/var/tmp/ac-demo/books-solr-home/"
>                type="java.lang.String"
>                override="true" />
> </Context>
Re: query parsing & wildcards
I should have Googled better. It seems that my question has been asked and answered already, and not just once:

http://www.nabble.com/Using-wildcard-with-accented-words-tf4673239.html
http://groups.google.com/group/acts_as_solr/browse_thread/thread/42920dc2dcc5fa88

On Nov 28, 2007 9:42 AM, Charles Hornberger <[EMAIL PROTECTED]> wrote:
> I'm confused by some behavior I'm seeing in Solr (I'm using 1.2.0). I have a field named "description", declared with the fieldType above.
>
> The problem I'm having is that when I search for description:deck*, I get the results I expect; when I search for description:Deck*, I get nothing. I want both queries to return the same result set. (I'm using the standard request handler.)
> [...]
>
> Thanks,
> Charlie
Re: SOLR 1.2 - Updates sent containing fields that are not on the Schema fail silently
I didn't know that trick. Could you point me to this documentation?

Anyway, don't you think there is something wrong in discarding all documents without any warning? It's returning a 200 return code without any other content in the SolrJ response to updates, and it doesn't log anything on the server side...

Regards,
Daniel

On 28/11/07 15:40, "Erik Hatcher" <[EMAIL PROTECTED]> wrote:
> On Nov 28, 2007, at 8:41 AM, Daniel Alheiros wrote:
>> I experienced a very unpleasant problem recently, when my search indexing adaptor was changed to add some new fields. The problem is that my schema didn't follow those changes, and after that SOLR was silently ignoring all documents I sent.
>
> Is your schema perhaps configured to ignore undefined fields?
>
> 	Erik
Re: SOLR 1.2 - Updates sent containing fields that are not on the Schema fail silently
: I didn't know that trick.

erik is referring to this in the example schema.xml...

  <!-- since fields of this type are by default not stored or indexed, any
       data added to them will be ignored outright -->
  <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />

  <!-- uncomment the following to ignore any fields that don't match an
       existing field name or dynamic field -->
  <!-- <dynamicField name="*" type="ignored" /> -->

...but it sounds like you are having some other problem ... you said that when you POST your documents with "extra" fields you get a 200 response but the documents aren't getting indexed at all, correct? that is not supposed to happen; Solr should be generating an error.

can you give us more info on your setup: what does your schema.xml look like, what does your update code look like (you said you were using SolrJ i believe?), what does Solr log when these updates happen, etc...

-Hoss
Re: query parsing & wildcards
: I should have Googled better. It seems that my question has been asked
: and answered already, and not just once:

right, wildcard and prefix queries aren't analyzed by the query parser (there's more on the "why" of this in the Lucene-Java FAQ).

To clarify one other part of your question...

: > Also, when I use the Field Analysis tool for description:Deck*, it
: > shows the following (sorry for the bad copy/paste):

the analysis tool only shows you the "analysis" portion of indexing/querying ... it knows nothing about which query parser you are using, so it doesn't know anything about any special query parser characters (like "*"). The output it gave you shows you what the standard request handler would have done if you'd used it to search for...

	description:"Deck*"

or:

	description:Deck\*

(where the * character is 'escaped')

-Hoss
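[A minimal client-side workaround sketch for this thread: since prefix terms bypass the analyzers, normalize the user's term before building the query string. The helper name is ours, not a Solr API.]

    import java.util.Locale;

    public class PrefixQueryHelper {
        /** Mirror what LowerCaseFilterFactory would have done at index time. */
        static String prefixQuery(String field, String userTerm) {
            return field + ":" + userTerm.toLowerCase(Locale.ENGLISH) + "*";
        }

        public static void main(String[] args) {
            // "Deck" -> "description:deck*", matching the lowercased indexed terms.
            System.out.println(prefixQuery("description", "Deck"));
        }
    }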
RequestHandler shared resources
I have an object that I would like to share between two or more RequestHandlers. One request handler will be responsible for the object, and I would like the other to handle information requests about what the object is doing. Thus, I need to share the object between the handlers. Short of using a static, does anyone have a recommended way of doing this? In a pure servlet, I could use the ServletContext. Or am I missing something?

Thanks,
Grant
Re: RequestHandler shared resources
Grant Ingersoll wrote:
> I have an object that I would like to share between two or more RequestHandlers. One request handler will be responsible for the object, and I would like the other to handle information requests about what the object is doing. Thus, I need to share the object between the handlers. Short of using a static, does anyone have a recommended way of doing this? In a pure servlet, I could use the ServletContext. Or am I missing something?

RequestHandlers can know about each other by asking SolrCore:

  core.getRequestHandler( "myhandler" )

If you are using 1.3-dev, make the RequestHandler implement SolrCoreAware and then inform( SolrCore ) will be called *after* everything is initialized.

is that what you need?

ryan
Re: RequestHandler shared resources
Yeah, I think that would work. Actually, I should be able to get all the request handlers and then look for instances of the req handlers that I need.

Thanks!
-Grant

On Nov 28, 2007, at 4:42 PM, Ryan McKinley wrote:
> RequestHandlers can know about each other by asking SolrCore:
>
>   core.getRequestHandler( "myhandler" )
>
> If you are using 1.3-dev, make the RequestHandler implement SolrCoreAware and then inform( SolrCore ) will be called *after* everything is initialized.
>
> is that what you need?
>
> ryan
Re: RequestHandler shared resources
: Yeah, I think that would work. Actually, I should be able to get all the : request handlers and then look for instances of the req handlers that I need. or configure reqHandler "B" with the name of reqHandler "A" that owns the resource so it knows who to ask. -Hoss
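[A sketch of the 1.3-dev approach Ryan and Hoss describe, assuming Solr's SolrCoreAware plugin interface and RequestHandlerBase as of that era; the "/worker" handler name and the SharesState interface are hypothetical.]

    import org.apache.solr.core.SolrCore;
    import org.apache.solr.handler.RequestHandlerBase;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.request.SolrQueryResponse;
    import org.apache.solr.util.plugin.SolrCoreAware;

    public class StatusHandler extends RequestHandlerBase implements SolrCoreAware {

        /** Hypothetical interface the owning handler would also implement. */
        public interface SharesState {
            String currentStatus();
        }

        private SharesState owner;

        // inform() runs after all handlers are initialized, so looking up the
        // other handler here (rather than in init) is safe.
        public void inform(SolrCore core) {
            owner = (SharesState) core.getRequestHandler("/worker");
        }

        @Override
        public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
            rsp.add("status", owner.currentStatus());
        }

        @Override
        public String getDescription() { return "reports what the worker handler is doing"; }
        @Override
        public String getSourceId() { return "$Id$"; }
        @Override
        public String getSource() { return "$URL$"; }
        @Override
        public String getVersion() { return "1.0"; }
    }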
LowerCaseFilterFactory and spellchecker
think i'm just doing something wrong...

was experimenting with the spellcheck handler with the nightly checkout from 11-28; seems my spellchecking is case-sensitive, even tho i think i'm adding the LowerCaseFilterFactory to both the index and query analyzers.

here's a brief rundown of my testing steps.

from schema.xml:

  <fieldtype name="spell" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldtype>

  <field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
  <field name="spelling" type="spell" indexed="true" stored="true" multiValued="true"/>

  <copyField source="title" dest="spelling"/>

from solrconfig.xml:

  <requestHandler name="spellchecker" class="solr.SpellCheckerRequestHandler" startup="lazy">
    <lst name="defaults">
      <int name="suggestionCount">1</int>
      <float name="accuracy">0.5</float>
    </lst>
    <str name="spellcheckerIndexDir">spell</str>
    <str name="termSourceField">spelling</str>
  </requestHandler>

adding the doc:

  curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '<add><doc><field name="title">Thorne</field></doc></add>'
  curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '<commit/>'

building the spellchecker:

  http://localhost:8983/solr/select/?q=Thorne&qt=spellchecker&cmd=rebuild

querying the spellchecker:

results from http://localhost:8983/solr/select/?q=Thorne&qt=spellchecker

  <response>
    <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
    <str name="words">Thorne</str>
    <str name="exist">false</str>
    <arr name="suggestions"><str>thorne</str></arr>
  </response>

results from http://localhost:8983/solr/select/?q=thorne&qt=spellchecker

  <response>
    <lst name="responseHeader"><int name="status">0</int><int name="QTime">2</int></lst>
    <str name="words">thorne</str>
    <str name="exist">true</str>
    <arr name="suggestions"/>
  </response>

any pointers as to what i'm doing wrong, misinterpreting? i suspect i'm just doing something bone-headed in the analyzer sections...

thanks as always,

rob casson
miami university libraries
Re: LowerCaseFilterFactory and spellchecker
lance,

thanks for the quick reply ... looks like 'thorne' is getting added to the dictionary, as it comes up as a suggestion for 'Thorne'.

i could certainly just lowercase in my client, but just confirming that i'm not screwing it up in the first place :)

thanks again,
rc

On Nov 28, 2007 8:11 PM, Norskog, Lance <[EMAIL PROTECTED]> wrote:
> There are a few parameters for limiting what words are added to the dictionary. You might be trimming out 'thorne'. See this page:
>
> http://wiki.apache.org/solr/SpellCheckerRequestHandler
> [...]
RE: LowerCaseFilterFactory and spellchecker
Oops, sorry, didn't think that through. The query to the spellchecker is not filtered through the field's query analyzer. You have to do your own lower-case transformation when you do the query. This is a simple thing to resolve.

But, I'm working with international alphabets, and I would like 'protege' and 'protégé' (with both e's accented) to match. The ISOLatin1 filter does this in indexing & querying. But I have to rip off the code and use it in my app to preprocess words for spell-checks.

Lance

-----Original Message-----
From: Rob Casson [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 28, 2007 5:16 PM
To: solr-user@lucene.apache.org
Subject: Re: LowerCaseFilterFactory and spellchecker

lance,

thanks for the quick reply ... looks like 'thorne' is getting added to the dictionary, as it comes up as a suggestion for 'Thorne'.

i could certainly just lowercase in my client, but just confirming that i'm not screwing it up in the first place :)

thanks again,
rc
[...]
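[A sketch of that client-side preprocessing: lowercase and accent-fold the word before sending it to the spellchecker. It uses java.text.Normalizer (Java 6+) as a stand-in for Lucene's ISOLatin1 filter logic, rather than copying that filter's code.]

    import java.text.Normalizer;
    import java.util.Locale;

    public class SpellInput {
        /** Lowercase, then strip accents by decomposing and dropping combining marks. */
        static String normalize(String word) {
            String lower = word.toLowerCase(Locale.ENGLISH);
            String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
            return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        }

        public static void main(String[] args) {
            System.out.println(normalize("Thorne"));   // thorne
            System.out.println(normalize("protégé"));  // protege
        }
    }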
RE: LowerCaseFilterFactory and spellchecker
There are a few parameters for limiting what words are added to the dictionary. You might be trimming out 'thorne'. See this page:

http://wiki.apache.org/solr/SpellCheckerRequestHandler

-----Original Message-----
From: Rob Casson [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 28, 2007 4:25 PM
To: solr-user@lucene.apache.org
Subject: LowerCaseFilterFactory and spellchecker

think i'm just doing something wrong...

was experimenting with the spellcheck handler with the nightly checkout from 11-28; seems my spellchecking is case-sensitive, even tho i think i'm adding the LowerCaseFilterFactory to both the index and query analyzers.
[...]

thanks as always,

rob casson
miami university libraries
Re: LowerCaseFilterFactory and spellchecker
Rob,

Let's say it worked as you want it to in the first place. If the query is for Thurne, wouldn't you get thorne (lower-case 't') as the suggestion? This may look weird for proper names.

jds
Schema class configuration syntax
Hi -

What is the <filter> element inside an <analyzer> element that will load this class:

  org.apache.lucene.analysis.cn.ChineseFilter

This did not work:

  <filter class="org.apache.lucene.analysis.cn.ChineseFilter"/>

This is in Solr 1.2.

Thanks,

Lance Norskog
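[One way to answer this, sketched under the assumption that Solr 1.2's BaseTokenFilterFactory works as in the nightly builds: Solr's <filter> element expects a TokenFilterFactory, not a raw Lucene TokenFilter, so a thin wrapper class is needed. The package name here is made up.]

    package com.example.solr.analysis;

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.cn.ChineseFilter;
    import org.apache.solr.analysis.BaseTokenFilterFactory;

    /** Factory wrapper so Solr can instantiate Lucene's ChineseFilter. */
    public class ChineseFilterFactory extends BaseTokenFilterFactory {
        public TokenStream create(TokenStream input) {
            return new ChineseFilter(input);
        }
    }

[With that class on Solr's classpath, the analyzer would then reference the factory rather than the filter itself: <filter class="com.example.solr.analysis.ChineseFilterFactory"/>.]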