Re: Reusing lucene index file in Solr
On Mar 22, 2008, at 12:32 AM, Raghav Kapoor wrote: How can we re-use an existing Lucene index file (.cfs) in Solr and search on it in Solr? I need to do this because the index is created on one machine (the client) and is to be used by the Solr server for searching. The Solr server will refer to this index file by some HTTP URL. We cannot store this index file on the Solr server. Solr needs file-level access to the Lucene index, perhaps via some shared disk - but not via HTTP. You certainly can use an index created by pure Java Lucene in Solr, provided the schema.xml jibes with how the index is structured and is to be queried. Erik
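Erik's point about file-level access can be illustrated with the `dataDir` setting in solrconfig.xml. This is a minimal sketch, assuming the externally built index has been copied (or mounted via a shared disk) onto the machine running Solr; the path shown is a hypothetical example:

```xml
<!-- solrconfig.xml: point Solr at a data directory on a shared disk.
     /mnt/shared/solr/data is a hypothetical mount point; Solr expects
     the Lucene segment files under its "index" subdirectory. -->
<dataDir>/mnt/shared/solr/data</dataDir>
```

The key constraint from the thread: this must be a local or locally-mounted filesystem path, not an HTTP URL.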
Re: Reusing lucene index file in Solr
Hi Erik, Thanks for your response! On page 180 of Lucene in Action, there is a reference for searching multiple indexes remotely using RMI. I am still trying to figure out how that works and whether it would fit our scenario. We have multiple client machines running a web server where the indexes will reside. Can the server running Solr query these indexes remotely over HTTP? Regards, Raghav --- Erik Hatcher <[EMAIL PROTECTED]> wrote: > > On Mar 22, 2008, at 12:32 AM, Raghav Kapoor wrote: > > How can we re-use an existing lucene index file > (.cfs) > > in Solr and search on it in solr? > > I need to do this as the index is created on one > > machine(client) to be used by solr server for > > searching. The solr server will refer to this > index > > file by some http url. We cannot store this index > file > > on the solr server. > > Solr needs file-level access to the Lucene index, > perhaps by some > shared disk - but not via HTTP. > > You certainly can use an index created by pure > Java Lucene in Solr, > provided the schema.xml jibes with how the index is > structured and is > to be queried. > > Erik
Converting lucene index into solr usable xml
Hi All: How can we convert a Lucene index file into a format that Solr can understand? I have very little knowledge about Solr and am not sure whether there is a way to post the .cfs index file directly to the Solr server with this command: java -jar post.jar ? I assume post.jar only takes XML documents? Any help would be appreciated! Regards Raghav
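For reference, post.jar indeed speaks Solr's XML update format, not raw Lucene segment files like .cfs. A minimal add message looks like the following sketch; the field names are hypothetical and must match fields declared in your schema.xml:

```xml
<add>
  <doc>
    <field name="id">doc-1</field>
    <field name="title">An example document</field>
  </doc>
</add>
```

Such a file is posted with `java -jar post.jar docs.xml`, or sent via HTTP POST to Solr's /update handler. An existing binary Lucene index cannot be posted this way; it would have to be re-fed as documents, or placed where Solr can read it directly as discussed in the other thread.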
Re: Tomcat 6.0 solr home not set (solved)
Well, just to add to this, the fact is that Tomcat (or any other container) will probably never have info about SOLR so while I sympathize with the "cleanness" aspect of not providing this info, it sucks when one is trying to figure it out. I subscribed to the wiki but I'm a little wary. Should I (can I?) just change the page? Or should I look at the markup, modify it and send it to you (or someone)? David Chris Hostetter wrote: I guess what I'm saying is: people should add any detail to the SolrTomcat page (and the other container pages) that's relevant to running Solr, but we should try to organize it in such a way that if you are already very knowledgeable about Tomcat, you don't have to wade through a ton of stuff you already know to get to the stuff that's *really* Solr specific. -Hoss -- They must find it difficult, those who have taken authority as truth, rather than truth as authority. - Gerald Massey
Re: Reusing lucene index file in Solr
On Sat, Mar 22, 2008 at 12:22 PM, Raghav Kapoor <[EMAIL PROTECTED]> wrote: > On Page 180 of Lucene In action, there is a reference > for searching multiple indexes remotely using RMI. I > am still trying to figure out how that works and if > that would fit in our scenario. We have multiple > client machines running a web server where the indexes > will reside. Can the server running solr query these > indexes remotely over http ? You need something running locally to read and export the lucene index via whatever method. Reconsider your requirements to see if they really make sense. -Yonik
Re: Reusing lucene index file in Solr
Hi Yonik, Thanks for the reply! Once we have exported the index file onto the server where Solr is running, how can we configure Solr to use that index file and search on it? In short, how does Solr search on Java Lucene indexed files? I am very new to Solr and am still trying to learn the basics. Since there is no proper documentation on Solr, this mailing list is my only hope. Thanks, Raghav! --- Yonik Seeley <[EMAIL PROTECTED]> wrote: > On Sat, Mar 22, 2008 at 12:22 PM, Raghav Kapoor > <[EMAIL PROTECTED]> wrote: > > On Page 180 of Lucene In action, there is a > reference > > for searching multiple indexes remotely using > RMI. I > > am still trying to figure out how that works and > if > > that would fit in our scenario. We have multiple > > client machines running a web server where the > indexes > > will reside. Can the server running solr query > these > > indexes remotely over http ? > > You need something running locally to read and > export the lucene index > via whatever method. > Reconsider your requirements to see if they really > make sense. > > -Yonik
Re: synonym dictionary inclusion
: I would like to incorporate a synonym dictionary! Is there any readymade : synonym dictionary/list available.. which : i can incorporate in my search module The SynonymFilter is ready to use for incorporating synonyms into Solr, but if you're looking for an actual list of synonyms to use ... that tends to be not only language specific, but also domain specific (ie: you would probably use a different list of synonyms for car searching than you would for searching 18th century literature). Off the top of my head: WordNet should provide some nice general purpose (English language) synonyms. -Hoss
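To make the first half of Hoss's answer concrete, wiring the SynonymFilter into an analyzer chain in schema.xml looks roughly like this sketch; the field type name and the synonyms file name are hypothetical examples:

```xml
<!-- schema.xml: a field type whose index-time analysis expands synonyms.
     "text_syn" and "synonyms.txt" are example names; synonyms.txt lives
     in the core's conf/ directory. -->
<fieldType name="text_syn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand="true" maps every term in a group to all of its synonyms -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

The synonyms.txt file itself is the part you have to supply, e.g. comma-separated groups such as `car, automobile, auto` - which is exactly the language- and domain-specific list Hoss says no one can hand you ready-made.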
Re: Minimum should match and PhraseQuery
the topic has come up before on the lucene java lists (although I can't think of any good search terms to find the old threads ... I can't really remember how people have described this idea in the past). I don't remember anyone ever suggesting/sharing a general purpose solution intrinsically more efficient than if you just generated all the permutations yourself : 2) I also want to relax PhraseQuery a bit so that it not only matches "Senior : Java Developer"~2 but also matches "Java Developer"~2, but of course with a : lower score. I can programmatically generate all the combinations but it's not : gonna be efficient if the user issues a query with many terms. -Hoss
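As a sketch of the brute-force approach Hoss alludes to, here is one way to enumerate the contiguous sub-phrases of a query so that shorter ones can be issued as lower-boosted phrase queries. This is an illustration, not Solr or Lucene API code; the class name, the minimum-length cutoff, and the length-proportional boost scheme are all assumptions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubPhraseQueries {

    // Generate all contiguous sub-phrases with at least minLen terms,
    // longest first, so longer matches can receive a higher boost.
    static List<String> subPhrases(String[] terms, int minLen) {
        List<String> out = new ArrayList<>();
        for (int len = terms.length; len >= minLen; len--) {
            for (int start = 0; start + len <= terms.length; start++) {
                out.add(String.join(" ",
                        Arrays.copyOfRange(terms, start, start + len)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] terms = {"Senior", "Java", "Developer"};
        for (String p : subPhrases(terms, 2)) {
            // Render each sub-phrase as a sloppy phrase query, boosting
            // by its length, e.g. "Senior Java Developer"~2^3
            System.out.println("\"" + p + "\"~2^" + p.split(" ").length);
        }
    }
}
```

Note the quadratic growth: an n-term query yields on the order of n^2/2 sub-phrases, which is exactly why the original poster's worry about many-term queries is well founded.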
Re: RAM size
: is there a way (or formula) to determine the required amount of RAM memory, : e.g. by number of documents, document size? There are a lot of factors that come into play ... the number of documents and the size of documents aren't nearly as significant as the number of unique indexed terms. : with 4.000.000 documents, searching the index is quite fast, but when I try : to sort the results, I get the well-known OutOfMemory error. I'm aware of the Sorting does have some pretty well defined memory requirements. Sorting on a field builds up a "FieldCache" ... essentially an array with one slot per document of whatever type you are sorting on, so sorting an index of 15 million docs on an int field takes ~60Megs. String fields get more interesting: there the FieldCache maintains an int[] with one entry per doc, and a String[] with one entry per unique string value ... so sorting your 15M docs by a "category" string field where there are only a few unique category names, each about 20 characters, would still take only ~60Megs, but sorting on a "title" field where every doc has a unique title and the average title length is 20 characters would take ~60Megs + ~290Megs. If you plan on doing some static warming of your searches using your sorts as newSearcher events (which is a good idea, so the first user to do a search after any commit doesn't have to wait a really long time for the FieldCache to be built), you'll need twice that (one FieldCache for the current searcher, one FieldCache for the "on deck" searcher). -Hoss
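The ~60Megs figure for the int side of the FieldCache follows directly from one 4-byte int per document. A quick back-of-the-envelope check (the String[] side is deliberately left out here, since its size depends on JVM per-string overhead, which is why Hoss's ~290Megs for unique titles is only an estimate):

```java
public class FieldCacheEstimate {
    public static void main(String[] args) {
        long docs = 15_000_000L;      // documents in the index (from the thread)
        long intBytes = docs * 4L;    // one 4-byte int per doc in the FieldCache
        long megs = intBytes / (1024 * 1024);
        // ~57 MiB, i.e. roughly the "~60Megs" quoted in the reply
        System.out.println("~" + megs + " MB per sorted int field");
    }
}
```

And per the warming caveat in the reply, budget double this while a new searcher is being warmed alongside the current one.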
Re: cannot start solr after adding Analyzer, ClassCastException error
: I tried some different analyzers, but the same exception happened, so I think : it is solr's problem or my configuration has something wrong your configuration looks right. What does the source code for your PaodingAnalyzer look like? Does it have a default (no-arg) constructor? Did you compile it using the same version of Lucene that Solr is using (from the lib directory of your Solr release)? -Hoss
Re: Nullpointer when using QuerySenderListener
: I'm developing against solr trunk and I wanted to start using the newSearcher : and firstSearcher functionality. : However I'm getting a nullpointer exception when I start up my solr instance. ... : What am I doing wrong, because it looks like SearchHandler.inform(..) : is never called but handleRequestBody is You're doing nothing wrong; I can reproduce this error... it looks like SolrCore is running through the Event Listeners before it's informing the Handlers ... kind of a chicken-and-egg problem actually. The contract of inform is supposed to be that it happens after the SolrCore is finished initializing, but before any handleRequest calls are made ... but the newSearcher events happen before the first and after the second. Catch-22 ... I'll open a bug. -Hoss
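For context, the newSearcher configuration that triggers the QuerySenderListener lives in solrconfig.xml and looks roughly like this; the warming query and sort field below are hypothetical examples:

```xml
<!-- solrconfig.xml: fire warming queries whenever a new searcher is opened.
     The query and the "price" sort field are example values. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">solr</str>
      <str name="sort">price asc</str>
    </lst>
  </arr>
</listener>
```

A matching firstSearcher listener with the same shape handles the very first searcher at startup, which is the code path where the NullPointerException above was hit.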
Re: Tomcat 6.0 solr home not set (solved)
: Well, just to add to this, the fact is that Tomcat (or any other container) : will probably never have info about SOLR so while I sympathize with the : "cleanness" aspect of not providing this info, it sucks when one is trying to : figure it out. right ... but generic things about tomcat (like what a context file is, what the "path" attribute was for prior to Tomcat 5.5, where the access log is kept, etc...) can be found in the tomcat documentation ... putting lots of details about things like that in the SolrTomcat wiki isn't really appropriate ... that page should focus on stuff about Tomcat you should know if you are running Solr that you may not have ever learned about or worried about before, even if you've been using tomcat for a long time. : I subscribed to the wiki but I'm a little wary. Should I (can I?) just change : the page? Or should I look at the markup, modify it and send it to you (or : someone)? it's a wiki ... edit away. Email notifications about all edits go to the solr-commits list; if people disagree with something they'll discuss it on solr-dev ... or just change it again. :) -Hoss
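Since the thread's subject is setting Solr home under Tomcat, the usual Tomcat context-file approach looks roughly like this sketch, matching what the SolrTomcat wiki page describes; both filesystem paths are hypothetical examples:

```xml
<!-- Tomcat context file, e.g. conf/Catalina/localhost/solr.xml.
     The docBase and solr/home paths below are example values. -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home" override="true"/>
</Context>
```

The JNDI `solr/home` environment entry is how Solr's web app discovers its home directory when the container doesn't pass it as a system property (`-Dsolr.solr.home=...`).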