Re: Inconsistent results in Solr Search with Lucene Index
I fixed that problem by reconfiguring schema.xml. Thanks for your help.

-Jak

Grant Ingersoll wrote:
> Have you set up your Analyzers, etc. so they correspond to the exact ones
> that you were using in Lucene? Under the Solr Admin you can try the
> analysis tool to see how your index and queries are treated. What happens
> if you do a *:* query from the Admin query screen? If your index is
> reasonably sized, I would just reindex, but you shouldn't have to do this.
>
> -Grant
>
> On Nov 27, 2007, at 8:18 AM, trysteps wrote:
>> Hi All,
>> I am trying to use Solr to search an existing Lucene index, so I set up
>> all the schema.xml configuration (tokenizers and required fields). But I
>> can't get the same results as with Lucene. For example, a search for
>> 'dog' returns lots of results with Lucene, but in Solr I get no results
>> at all; a search for 'dog*', however, returns the same results as Lucene.
>> What is the best way to integrate a Lucene index into Solr? Are there any
>> well-documented sources?
>> Thanks for your attention,
>> Trysteps

--
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
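The 'dog' vs. 'dog*' symptom above usually means the Solr field was not analyzed the way the Lucene index was built (e.g. the field is a plain string type, so only the exact full token matches). A minimal sketch of a schema.xml field type whose analyzer chain mirrors a typical Lucene setup; the field and type names here are illustrative assumptions, not from the thread:

```xml
<!-- Illustrative schema.xml fragment: the index-time and query-time
     analysis must match the analyzer used when the Lucene index was
     built, otherwise only wildcard/prefix queries appear to work. -->
<fieldType name="text_lucene" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="body" type="text_lucene" indexed="true" stored="true"/>
```

The Analysis tool under Solr Admin (mentioned above) shows exactly which tokens such a chain produces for a given input, which makes mismatches easy to spot.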
RE: Tips for searching
If you want any letter and any possible substring, you might be better off breaking every word into single letters with special tokens between words. For example:

    the quick brown fox

becomes

    t h e ZZ q u i c k ZZ b r o w n ZZ f o x

Then you can do all the single-letter searches, and multi-letter searches turn into phrase searches: 'uic' (from 'quick') would be rewritten as the phrase "u i c", and so on. This should give you better performance and more predictable results than wildcard searches, depending on the size and complexity of your data. Relevancy would be horrible, since the tf/idf would always have a common denominator depending on the character set, but there are ways around that as well.

- will

-----Original Message-----
From: Mike Klaas [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 30, 2007 7:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Tips for searching

On 30-Nov-07, at 4:43 PM, Dave C. wrote:
> Thanks for the quick response Mike...
> Ideally it should match more than just a single character, i.e.
> "the" in "weather" or "pro" in "profile" or "000" in "18000".
>
> Would these cases be taken care of by the StopFilterFactory?

No... you are looking for a variant of WildcardQuery. Prefix wildcards are supported (pro* -> profile), but generalized wildcard queries aren't enabled by default. There has been lots of discussion on the list if you do a search.

-Mike
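Will's rewriting scheme can be sketched in a few lines. This is a minimal illustration of the idea, not code from the thread; the `ZZ` separator is the sentinel token he proposes, and the function names are assumptions:

```python
# Sketch of the single-letter indexing trick described above.
# Each word is exploded into one token per letter, with a sentinel
# token ("ZZ") marking word boundaries so a phrase search for a
# substring can never match across two words.

SEP = "ZZ"

def explode(text: str) -> str:
    """Rewrite a document for indexing: one token per letter."""
    return f" {SEP} ".join(" ".join(word) for word in text.split())

def substring_to_phrase(fragment: str) -> str:
    """Rewrite a substring query as a phrase query."""
    return '"' + " ".join(fragment) + '"'

print(explode("the quick brown fox"))
# t h e ZZ q u i c k ZZ b r o w n ZZ f o x
print(substring_to_phrase("uic"))
# "u i c"
```

Searching for the phrase "u i c" against the exploded text then matches 'quick' exactly as Will describes, since the letters are adjacent positions within one word.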
Tomcat6?
The Solr wiki does not describe how to install Solr on Tomcat 6, and I did not manage it myself :( The chapter "Configuring Solr Home with JNDI" mentions the directory $CATALINA_HOME/conf/Catalina/localhost, which does not exist with Tomcat 6.

Alternatively I tried the folder $CATALINA_HOME/work/Catalina/localhost, but with no success (I can query the top-level page, but the "Solr Admin" link then does not work).

Can anybody help?

--
Dipl.-Inf. Jörg Kiegeland
ikv++ technologies ag
Bernburger Strasse 24-25, D-10963 Berlin
e-mail: [EMAIL PROTECTED], web: http://www.ikv.de
phone: +49 30 34 80 77 18, fax: +49 30 34 80 78 0
Handelsregister HRB 81096; Amtsgericht Berlin-Charlottenburg
board of directors: Dr. Olaf Kath (CEO); Dr. Marc Born (CTO)
supervising board: Prof. Dr. Bernd Mahr (chairman)
Re: Tomcat6?
In context.xml, I added.. I think that's all I did to get it working in Tomcat 6.

--Matthew Runo

On Dec 3, 2007, at 7:58 AM, Jörg Kiegeland wrote:
> The Solr wiki does not describe how to install Solr on Tomcat 6, and I
> did not manage it myself :( The chapter "Configuring Solr Home with JNDI"
> mentions the directory $CATALINA_HOME/conf/Catalina/localhost, which does
> not exist with Tomcat 6.
>
> Alternatively I tried the folder $CATALINA_HOME/work/Catalina/localhost,
> but with no success (I can query the top-level page, but the "Solr Admin"
> link then does not work).
>
> Can anybody help?
RE: Tomcat6?
$CATALINA_HOME/conf/Catalina/localhost doesn't exist by default, but you can create it and it will work exactly the same way it did in Tomcat 5. It's not created by default because it's not needed by the manager webapp anymore.

-----Original Message-----
From: Matthew Runo [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 03, 2007 10:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Tomcat6?

In context.xml, I added.. I think that's all I did to get it working in Tomcat 6.

--Matthew Runo

On Dec 3, 2007, at 7:58 AM, Jörg Kiegeland wrote:
> The Solr wiki does not describe how to install Solr on Tomcat 6, and I
> did not manage it myself :( The chapter "Configuring Solr Home with JNDI"
> mentions the directory $CATALINA_HOME/conf/Catalina/localhost, which does
> not exist with Tomcat 6.
>
> Alternatively I tried the folder $CATALINA_HOME/work/Catalina/localhost,
> but with no success (I can query the top-level page, but the "Solr Admin"
> link then does not work).
>
> Can anybody help?
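For reference, the JNDI approach that the wiki chapter describes amounts to creating that directory by hand and dropping a context descriptor into it. A minimal sketch, assuming the Solr WAR and home directory live at the placeholder paths shown (both paths, and the filename `solr.xml`, are assumptions to adapt to your setup):

```xml
<!-- $CATALINA_HOME/conf/Catalina/localhost/solr.xml
     Create the directory by hand on Tomcat 6; the descriptor
     filename becomes the webapp's context path (/solr). -->
<Context docBase="/path/to/apache-solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/path/to/solr/home" override="true"/>
</Context>
```

With this in place, Tomcat deploys Solr at /solr and Solr resolves its home directory via the `solr/home` JNDI entry rather than the current working directory.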
RE: Solr Highlighting, word index
> You can tell lucene to store token offsets using TermVectors
> (configurable via schema.xml). Then you can customize the request
> handler to return the token offsets (and/or positions) by retrieving
> the TVs.

I think that is the best plan of action. How do I create a custom request handler that will use the existing indexed fields? There will be 2 requests as I see it: one for the search, and one to retrieve the offsets when you view one of the found items.

Any advice you can give me will be much appreciated, as I've had no luck with Google so far.

Thanks for your help so far,

Best Regards, Martin Owens
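The term-vector storage mentioned in the quote is enabled per field in schema.xml. A hedged sketch of such a field declaration; the field name and type are illustrative, not from the thread:

```xml
<!-- Illustrative schema.xml fragment: store term vectors with
     positions and offsets so a custom handler can later retrieve
     where each token occurred in the original field value. -->
<field name="content" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Note that term vectors are written at index time, so documents indexed before adding these attributes have to be reindexed before the offsets are available.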
How to delete records that don't contain a field?
I was wondering if there was a way to post a delete query using curl to delete all records that do not contain a certain field--something like this:

    curl http://localhost:8080/solr/update --data-binary
    '<delete><query>-_title:[* TO *]</query></delete>'
    -H 'Content-type:text/xml; charset=utf-8'

The minus syntax seems to return the correct list of ids (that is, all records that do not contain the "_title" field) when I use the Solr administrative console to do the above query, so I'm wondering if Solr just doesn't support this type of delete. Thanks for any help...
Re: How to delete records that don't contain a field?
On Dec 3, 2007 5:22 PM, Jeff Leedy <[EMAIL PROTECTED]> wrote:
> I was wondering if there was a way to post a delete query using curl to
> delete all records that do not contain a certain field--something like
> this:
>
> curl http://localhost:8080/solr/update --data-binary
> '<delete><query>-_title:[* TO *]</query></delete>' -H
> 'Content-type:text/xml; charset=utf-8'
>
> The minus syntax seems to return the correct list of ids (that is, all
> records that do not contain the "_title" field) when I use the Solr
> administrative console to do the above query, so I'm wondering if Solr
> just doesn't support this type of delete.

Not yet... it makes sense to support this in the future though.

-Yonik
1.2 commit script chokes on 1.2 response format
Like others before me, I stumbled across this bug while getting collection distribution up and running today: solr/bin/commit warns that a commit failed when in fact it succeeded quite nicely:

http://www.mail-archive.com/solr-user@lucene.apache.org/msg04585.html

It's a trivial fix, and it seems it has already been done in trunk:

http://svn.apache.org/viewvc/lucene/solr/trunk/src/scripts/commit?r1=543259&r2=555612&view=patch

The change has not been applied to 1.2, though. It might be nice if it were.

-Charlie
RE: How to delete records that don't contain a field?
Wouldn't this be:

    *:* AND "negative query"

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Yonik Seeley
Sent: Monday, December 03, 2007 2:23 PM
To: solr-user@lucene.apache.org
Subject: Re: How to delete records that don't contain a field?

On Dec 3, 2007 5:22 PM, Jeff Leedy <[EMAIL PROTECTED]> wrote:
> I was wondering if there was a way to post a delete query using curl
> to delete all records that do not contain a certain field--something
> like this:
>
> curl http://localhost:8080/solr/update --data-binary
> '<delete><query>-_title:[* TO *]</query></delete>' -H
> 'Content-type:text/xml; charset=utf-8'
>
> The minus syntax seems to return the correct list of ids (that is, all
> records that do not contain the "_title" field) when I use the Solr
> administrative console to do the above query, so I'm wondering if Solr
> just doesn't support this type of delete.

Not yet... it makes sense to support this in the future though.

-Yonik
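The workaround suggested above (anchoring the negative clause to a match-all query) can be sketched as a small helper that builds the delete-by-query body to POST to /solr/update. This is an illustration under assumptions: the helper name is made up, and whether a given Solr version accepts the rewritten query in a delete still has to be verified against that version.

```python
# Sketch: build a Solr delete-by-query XML body for "documents
# missing field X", using the "*:* AND -field:[* TO *]" rewrite
# suggested above instead of a purely negative query.
from xml.sax.saxutils import escape

def delete_missing_field_xml(field: str) -> str:
    """Return the XML body for deleting docs that lack `field`."""
    query = f"*:* AND -{field}:[* TO *]"
    return f"<delete><query>{escape(query)}</query></delete>"

print(delete_missing_field_xml("_title"))
# <delete><query>*:* AND -_title:[* TO *]</query></delete>
```

The resulting body would be sent exactly like the curl command quoted above (Content-type text/xml), followed by a `<commit/>` to make the deletes visible.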
Re: CJK Analyzers for Solr
it seems good.

On Dec 3, 2007 1:01 AM, Ken Krugler <[EMAIL PROTECTED]> wrote:
> > Wunder - are you aware of any free dictionaries for either C or J or K?
> > When I dealt with this in the past, I looked for something free, but
> > found only commercial dictionaries.
>
> I would use data files from:
>
> http://ftp.monash.edu.au/pub/nihongo/00INDEX.html
>
> -- Ken

Earlier in the thread:

Walter Underwood wrote:
> With Ultraseek, we switched to a dictionary-based segmenter for Chinese
> because the N-gram highlighting wasn't acceptable to our Chinese
> customers. I guess it is something to check for each application.
>
> wunder

Otis Gospodnetic wrote:
> For what it's worth, I worked on indexing and searching a *massive* pile
> of data, a good portion of which was in CJ and some K. The n-gram
> approach was used for all 3 languages, and the quality of search results,
> including highlighting, was evaluated and okay-ed by native speakers of
> these languages.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Walter Underwood wrote:
> Dictionaries are surprisingly expensive to build and maintain, and
> bi-gram is surprisingly effective for Chinese. See this paper:
>
> http://citeseer.ist.psu.edu/kwok97comparing.html
>
> I expect that n-gram indexing would be less effective for Japanese
> because it is an inflected language. Korean is even harder. It might
> work to break Korean into the phonetic subparts and use n-gram on those.
>
> You should not do term highlighting with any of the n-gram methods.
> The relevance can be very good, but the highlighting just looks dumb.
>
> wunder

Eswar K wrote:
> Is there any specific reason why the CJK analyzers in Solr were chosen
> to be n-gram based instead of a morphological analyzer, which is what
> Google is thought to use and is considered more effective than the
> n-gram ones?
>
> (And thanks, James... how much time does it take to index 18m docs?)

James liu wrote:
> i not use HYLANDA analyzer. i use je-analyzer and am indexing at least
> 18m docs. i'm sorry, i only use a chinese analyzer.

Eswar K wrote:
> What is the performance of these CJK analyzers (the one in Lucene and
> hylanda)? We would potentially be indexing millions of documents.
>
> James, we would have a look at hylanda too. What about Japanese and
> Korean analyzers, any recommendations?

James liu wrote:
> I don't think NGram is a good method for Chinese.
>
> CJKAnalyzer of Lucene is 2-gram.
>
> Eswar K: if it is a chinese analyzer, i recommend hylanda
> (www.hylanda.com), it is the best chinese analyzer and it is not free.
> if u wanna free chinese analyzer, maybe u can try je-analyzer. it has
> some problems when using it.
>
> --
> regards
> jl

Otis Gospodnetic wrote:
> Eswar,
>
> We've used the NGram stuff that exists in Lucene's contrib/analyzers
> instead of CJK. Doesn't that allow you to do everything that the Chinese
> and CJK analyzers do? It's been a few months since I've looked at the
> Chinese and CJK Analyzers, so I could be off.
>
> Otis

Eswar K wrote:
> Hi,
>
> Does Solr come with language analyzers for CJK? If not, can you please
> direct me to some good CJK analyzers?
>
> Regards,
> Eswar
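Since the 2-gram approach of Lucene's CJKAnalyzer comes up repeatedly above, here is a minimal sketch of what overlapping bigram tokenization does to a CJK string. This is an illustration of the technique only, not the actual CJKAnalyzer code, and the example string is mine:

```python
# Sketch of 2-gram (bigram) tokenization as applied to CJK text:
# every pair of adjacent characters becomes one token, so queries
# can match without dictionary-based word segmentation.

def bigrams(text: str) -> list[str]:
    """Return the overlapping character bigrams of the input."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

print(bigrams("全文検索"))   # Japanese for "full-text search"
# ['全文', '文検', '検索']
```

The trade-off discussed in the thread follows directly from this: any real word shows up among the bigrams of text containing it (good recall, decent relevance), but the tokens themselves are not words, which is why highlighting bigram matches "just looks dumb".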