Re: Inconsistent results in Solr Search with Lucene Index

2007-12-03 Thread trysteps

I fixed the problem by reconfiguring schema.xml.
Thanks for your help.
Jak

Grant Ingersoll wrote:
Have you set up your Analyzers, etc. so they correspond to the exact 
ones that you were using in Lucene? Under the Solr Admin you can try 
the analysis tool to see how your index and queries are treated. What 
happens if you do a *:* query from the Admin query screen?
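
For example, if the Lucene index was built with something like StandardAnalyzer, a schema.xml field type along these lines would roughly match it (just a sketch; the chain has to mirror whatever analyzer actually built the index):

  <!-- sketch only: mirror the analyzer that actually built the Lucene index -->
  <fieldType name="text" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory"/>
    </analyzer>
  </fieldType>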


If your index is reasonably sized, I would just reindex, but you 
shouldn't have to do this.


-Grant

On Nov 27, 2007, at 8:18 AM, trysteps wrote:


Hi All,
I am trying to use Solr search on top of an existing Lucene index, so I set 
up all the schema.xml configuration, such as the tokenizers and the necessary fields.

But I cannot get the same results as with Lucene.
For example,
a search for 'dog' returns lots of results with Lucene, but in Solr I 
get no results at all. A search for 'dog*', however, returns the same results 
as Lucene.
What is the best way to integrate a Lucene index into Solr? Are there any 
well-documented sources?

Thanks for your attention,
Trysteps



--
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ








RE: Tips for searching

2007-12-03 Thread Will Johnson
If you want any letter and any possible substring, you might be better off
breaking every word into single letters with special tokens between words,
i.e.:

the quick brown fox

Becomes

t h e ZZ q u i c k ZZ b r o w n ZZ f o x

Then you can do the single-letter searches directly, and multi-letter searches
turn into phrase searches, i.e.:

uic (from quick)

would be rewritten as

"u i c"

And so on.  This should give you better performance and more predictable
results than wildcard searches, depending on the size and complexity of your
data.  Relevancy would be horrible, since the tf/idf would always have a
common denominator depending on the character set, but there are ways around
that as well.
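
Concretely, the substring lookup just becomes an ordinary phrase query
against that field, e.g. (field name, host, and port are illustrative):

  http://localhost:8983/solr/select?q=letters:%22u+i+c%22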

- will 

 

-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 30, 2007 7:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Tips for searching

On 30-Nov-07, at 4:43 PM, Dave C. wrote:

>
> Thanks for the quick response Mike...
> Ideally it should match more than just a single character, i.e.  
> "the" in "weather" or "pro" in "profile" or "000" in "18000".
>
> Would these cases be taken care of by the StopFilterFactory?

No... you are looking for a variant of WildcardQuery. Prefix  
wildcards are supported (pro* -> profile), but general wildcard  
queries aren't enabled by default. There has been lots of discussion  
on the list if you do a search.

-Mike



Tomcat6?

2007-12-03 Thread Jörg Kiegeland
The Solr wiki does not describe how to install Solr on Tomcat 6, and I 
have not managed it myself :(
The chapter "Configuring Solr Home with JNDI" mentions the 
directory $CATALINA_HOME/conf/Catalina/localhost, which does not exist in 
Tomcat 6.


Alternatively I tried the folder $CATALINA_HOME/work/Catalina/localhost, 
but with no success (I can query the top-level page, but the "Solr 
Admin" link then does not work).


Can anybody help?

--
Dipl.-Inf. Jörg Kiegeland
ikv++ technologies ag
Bernburger Strasse 24-25, D-10963 Berlin
e-mail: [EMAIL PROTECTED], web: http://www.ikv.de
phone: +49 30 34 80 77 18, fax: +49 30 34 80 78 0
=
Handelsregister HRB 81096; Amtsgericht Berlin-Charlottenburg
board of  directors: Dr. Olaf Kath (CEO); Dr. Marc Born (CTO)
supervising board: Prof. Dr. Bernd Mahr (chairman)
_



Re: Tomcat6?

2007-12-03 Thread Matthew Runo

In context.xml, I added..
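
Something along these lines (the docBase and solr/home values below are
placeholders for wherever the solr.war and Solr home actually live):

  <!-- placeholder paths; adjust for your installation -->
  <Context docBase="/path/to/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String" value="/path/to/solr/home" override="true"/>
  </Context>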




I think that's all I did to get it working in Tomcat 6.

--Matthew Runo

On Dec 3, 2007, at 7:58 AM, Jörg Kiegeland wrote:

The Solr wiki does not describe how to install Solr on  
Tomcat 6, and I have not managed it myself :(
The chapter "Configuring Solr Home with JNDI" mentions  
the directory $CATALINA_HOME/conf/Catalina/localhost, which does not  
exist in Tomcat 6.


Alternatively I tried the folder $CATALINA_HOME/work/Catalina/localhost, 
but with no success (I can query the top-level page,  
but the "Solr Admin" link then does not work).


Can anybody help?

--
Dipl.-Inf. Jörg Kiegeland
ikv++ technologies ag
Bernburger Strasse 24-25, D-10963 Berlin
e-mail: [EMAIL PROTECTED], web: http://www.ikv.de
phone: +49 30 34 80 77 18, fax: +49 30 34 80 78 0
=
Handelsregister HRB 81096; Amtsgericht Berlin-Charlottenburg
board of  directors: Dr. Olaf Kath (CEO); Dr. Marc Born (CTO)
supervising board: Prof. Dr. Bernd Mahr (chairman)
_





RE: Tomcat6?

2007-12-03 Thread Charlie Jackson
$CATALINA_HOME/conf/Catalina/localhost doesn't exist by default, but you can 
create it and it will work exactly the same way it did in Tomcat 5. It's not 
created by default because it's no longer needed by the manager webapp.
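
For the JNDI approach from the wiki, that just means creating the directory
and dropping a context descriptor named after the webapp into it, for
example (paths below are placeholders):

  $CATALINA_HOME/conf/Catalina/localhost/solr.xml:

  <!-- placeholder paths; adjust for your installation -->
  <Context docBase="/path/to/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String" value="/path/to/solr/home" override="true"/>
  </Context>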


-Original Message-
From: Matthew Runo [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 03, 2007 10:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Tomcat6?

In context.xml, I added..



I think that's all I did to get it working in Tomcat 6.

--Matthew Runo

On Dec 3, 2007, at 7:58 AM, Jörg Kiegeland wrote:

> The Solr wiki does not describe how to install Solr on  
> Tomcat 6, and I have not managed it myself :(
> The chapter "Configuring Solr Home with JNDI" mentions  
> the directory $CATALINA_HOME/conf/Catalina/localhost, which does not  
> exist in Tomcat 6.
>
> Alternatively I tried the folder $CATALINA_HOME/work/Catalina/localhost, 
> but with no success (I can query the top-level page,  
> but the "Solr Admin" link then does not work).
>
> Can anybody help?
>
> -- 
> Dipl.-Inf. Jörg Kiegeland
> ikv++ technologies ag
> Bernburger Strasse 24-25, D-10963 Berlin
> e-mail: [EMAIL PROTECTED], web: http://www.ikv.de
> phone: +49 30 34 80 77 18, fax: +49 30 34 80 78 0
> =
> Handelsregister HRB 81096; Amtsgericht Berlin-Charlottenburg
> board of  directors: Dr. Olaf Kath (CEO); Dr. Marc Born (CTO)
> supervising board: Prof. Dr. Bernd Mahr (chairman)
> _
>



RE: Solr Highlighting, word index

2007-12-03 Thread Owens, Martin


> You can tell lucene to store token offsets using TermVectors  
> (configurable via schema.xml).  Then you can customize the request  
> handler to return the token offsets (and/or positions) by retrieving  
> the TVs.

I think that is the best plan of action. How do I create a custom request 
handler that will use the existing indexed fields? As I see it there will be 
two requests: one for the search, and one to retrieve the offsets when you 
view one of the found items. Any advice you can give me will be much 
appreciated, as I've had no luck with Google so far.
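
If I understand the schema side of it correctly, the field would be declared
roughly like this (the field name here is just an example):

  <!-- example field name; enables term vectors with positions and offsets -->
  <field name="body" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>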

Thanks for your help so far,

Best Regards, Martin Owens



How to delete records that don't contain a field?

2007-12-03 Thread Jeff Leedy
I was wondering if there was a way to post a delete query using curl to 
delete all records that do not contain a certain field--something like this:


curl http://localhost:8080/solr/update --data-binary 
'<delete><query>-_title:[* TO *]</query></delete>' -H 
'Content-type:text/xml; charset=utf-8'


The minus syntax seems to return the correct list of ids (that is, all 
records that do not contain the "_title" field) when I use the Solr 
administrative console to do the above query, so I'm wondering if Solr 
just doesn't support this type of delete.


Thanks for any help...


Re: How to delete records that don't contain a field?

2007-12-03 Thread Yonik Seeley
On Dec 3, 2007 5:22 PM, Jeff Leedy <[EMAIL PROTECTED]> wrote:

> I was wondering if there was a way to post a delete query using curl to
> delete all records that do not contain a certain field--something like
> this:
>
> curl http://localhost:8080/solr/update --data-binary
> '<delete><query>-_title:[* TO *]</query></delete>' -H
> 'Content-type:text/xml; charset=utf-8'
>
> The minus syntax seems to return the correct list of ids (that is, all
> records that do not contain the "_title" field) when I use the Solr
> administrative console to do the above query, so I'm wondering if Solr
> just doesn't support this type of delete.


Not yet... it makes sense to support this in the future though.

-Yonik


1.2 commit script chokes on 1.2 response format

2007-12-03 Thread Charles Hornberger
Like others before me, I stumbled across the bug where solr/bin/commit
warns that a commit failed when in fact it succeeded quite nicely. I hit it
while getting collection distribution up and running today:

http://www.mail-archive.com/solr-user@lucene.apache.org/msg04585.html

It's a trivial fix, and it seems like it's already been done in trunk:


http://svn.apache.org/viewvc/lucene/solr/trunk/src/scripts/commit?r1=543259&r2=555612&view=patch

The change has not been applied to 1.2. It might be nice if it were.

-Charlie


RE: How to delete records that don't contain a field?

2007-12-03 Thread Norskog, Lance
Wouldn't this be: *:* AND "negative query"?
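
i.e. something like this (an untested sketch, reusing the earlier curl command):

curl http://localhost:8080/solr/update --data-binary 
'<delete><query>*:* AND -_title:[* TO *]</query></delete>' -H 
'Content-type:text/xml; charset=utf-8'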

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Monday, December 03, 2007 2:23 PM
To: solr-user@lucene.apache.org
Subject: Re: How to delete records that don't contain a field?

On Dec 3, 2007 5:22 PM, Jeff Leedy <[EMAIL PROTECTED]> wrote:

> I was wondering if there was a way to post a delete query using curl 
> to delete all records that do not contain a certain field--something 
> like
> this:
>
> curl http://localhost:8080/solr/update --data-binary
> '<delete><query>-_title:[* TO *]</query></delete>' -H 
> 'Content-type:text/xml; charset=utf-8'
>
> The minus syntax seems to return the correct list of ids (that is, all 
> records that do not contain the "_title" field) when I use the Solr 
> administrative console to do the above query, so I'm wondering if Solr 
> just doesn't support this type of delete.


Not yet... it makes sense to support this in the future though.

-Yonik


Re: CJK Analyzers for Solr

2007-12-03 Thread James liu
it seems good.

On Dec 3, 2007 1:01 AM, Ken Krugler <[EMAIL PROTECTED]> wrote:

> >Wunder - are you aware of any free dictionaries
> >for either C or J or K?  When I dealt with this
> >in the past, I looked for something free, but
> >found only commercial dictionaries.
>
> I would use data files from:
>
> http://ftp.monash.edu.au/pub/nihongo/00INDEX.html
>
> -- Ken
>
>
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> ----- Original Message -----
> From: Walter Underwood <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, November 28, 2007 5:43:32 PM
> Subject: Re: CJK Analyzers for Solr
> 
> With Ultraseek, we switched to a dictionary-based segmenter for Chinese
> because the N-gram highlighting wasn't acceptable to our Chinese customers.
> I guess it is something to check for each application.
> 
> wunder
> 
> On 11/27/07 10:46 PM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:
> 
> For what it's worth, I worked on indexing and searching a *massive* pile of
> data, a good portion of which was in CJ and some K. The n-gram approach was
> used for all 3 languages, and the quality of search results, including
> highlighting, was evaluated and okay-ed by native speakers of these
> languages.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> ----- Original Message -----
> From: Walter Underwood <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, November 27, 2007 2:41:38 PM
> Subject: Re: CJK Analyzers for Solr
> 
> Dictionaries are surprisingly expensive to build and maintain, and bi-gram
> is surprisingly effective for Chinese. See this paper:
> 
> http://citeseer.ist.psu.edu/kwok97comparing.html
> 
> I expect that n-gram indexing would be less effective for Japanese because
> it is an inflected language. Korean is even harder. It might work to break
> Korean into the phonetic subparts and use n-gram on those.
> 
> You should not do term highlighting with any of the n-gram methods. The
> relevance can be very good, but the highlighting just looks dumb.
> 
> wunder
> 
> On 11/27/07 8:54 AM, "Eswar K" <[EMAIL PROTECTED]> wrote:
> 
> Is there any specific reason why the CJK analyzers in Solr were chosen to
> be n-gram based instead of morphological analyzers, which is roughly what
> Google implements and is considered to be more effective than the n-gram
> ones?
> 
> Regards,
> Eswar
> 
> On Nov 27, 2007 7:57 AM, Eswar K <[EMAIL PROTECTED]> wrote:
> 
> thanks james...
> 
> How much time does it take to index 18m docs?
> 
> - Eswar
> 
> On Nov 27, 2007 7:43 AM, James liu <[EMAIL PROTECTED]> wrote:
> 
> i not use HYLANDA analyzer.
> 
> i use je-analyzer and indexing at least 18m docs.
> 
> i m sorry, i only use a chinese analyzer.
> 
> On Nov 27, 2007 10:01 AM, Eswar K <[EMAIL PROTECTED]> wrote:
> 
> What is the performance of these CJK analyzers (the one in lucene and
> hylanda)? We would potentially be indexing millions of documents.
> 
> James, we would have a look at hylanda too. What about japanese and korean
> analyzers, any recommendations?
> 
> - Eswar
> 
> On Nov 27, 2007 7:21 AM, James liu <[EMAIL PROTECTED]> wrote:
> 
> I don't think NGram is a good method for Chinese.
> 
> CJKAnalyzer of Lucene is 2-Gram.
> 
> Eswar K: if it is a chinese analyzer, i recommend hylanda
> (www.hylanda.com); it is the best chinese analyzer and it is not free.
> If u want a free chinese analyzer, maybe u can try je-analyzer; it has
> some problems when using it.
> 
> On Nov 27, 2007 5:56 AM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> 
> Eswar,
> 
> We've used the NGram stuff that exists in Lucene's contrib/analyzers
> instead of CJK. Doesn't that allow you to do everything that the Chinese
> and CJK analyzers do? It's been a few months since I've looked at the
> Chinese and CJK Analyzers, so I could be off.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> ----- Original Message -----
> From: Eswar K <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, November 26, 2007 8:30:52 AM
> Subject: CJK Analyzers for Solr
> 
> Hi,
> 
> Does Solr come with language analyzers for CJK? If not, can you please
> direct me to some good CJK analyzers?
> 
> Regards,
> Eswar
> 
> --
> regards
> jl
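
For anyone who wants to try Lucene's contrib CJKAnalyzer from Solr, a minimal
field type sketch (this assumes the lucene contrib analyzers jar is on Solr's
classpath; the type name is illustrative):

  <!-- sketch: requires the lucene contrib analyzers jar on the classpath -->
  <fieldType name="text_cjk" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
  </fieldType>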