Re: Reusing lucene index file in Solr

2008-03-22 Thread Erik Hatcher


On Mar 22, 2008, at 12:32 AM, Raghav Kapoor wrote:

How can we re-use an existing lucene index file (.cfs)
in Solr and search on it in solr?
I need to do this as the index is created on one
machine (client) to be used by the solr server for
searching. The solr server will refer to this index
file by some http url. We cannot store this index file
on the solr server.


Solr needs file-level access to the Lucene index, perhaps by some  
shared disk - but not via HTTP.


You certainly can use an index created by pure Java Lucene in Solr,
provided the schema.xml jibes with how the index is structured and how it
is to be queried.
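
For concreteness, a minimal sketch of what that looks like (the paths and
field names below are only illustrative): point <dataDir> in solrconfig.xml
at a directory whose index/ subdirectory holds the existing Lucene index,
and declare fields in schema.xml that line up with how the index was built.

  <!-- solrconfig.xml: Solr opens the Lucene index found in <dataDir>/index -->
  <dataDir>/mnt/shared/solr/data</dataDir>

  <!-- schema.xml: field definitions must match the fields in the index -->
  <field name="id"    type="string" indexed="true" stored="true"/>
  <field name="title" type="text"   indexed="true" stored="true"/>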


Erik



Re: Reusing lucene index file in Solr

2008-03-22 Thread Raghav Kapoor
Hi Erik,

Thanks for your response!

On page 180 of Lucene in Action, there is a reference
to searching multiple indexes remotely using RMI. I
am still trying to figure out how that works and if
that would fit in our scenario. We have multiple
client machines running a web server where the indexes
will reside. Can the server running solr query these
indexes remotely over http ?

Regards,

Raghav


--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> 
> On Mar 22, 2008, at 12:32 AM, Raghav Kapoor wrote:
> > How can we re-use an existing lucene index file (.cfs)
> > in Solr and search on it in solr?
> > I need to do this as the index is created on one
> > machine (client) to be used by the solr server for
> > searching. The solr server will refer to this index
> > file by some http url. We cannot store this index file
> > on the solr server.
> 
> Solr needs file-level access to the Lucene index,
> perhaps by some shared disk - but not via HTTP.
> 
> You certainly can use an index created by pure
> Java Lucene in Solr, provided the schema.xml jibes
> with how the index is structured and how it is to be queried.
> 
>   Erik
> 
> 



  



Converting lucene index into solr usable xml

2008-03-22 Thread Raghav Kapoor
Hi All:

How can we convert the lucene index file into a format
that solr can understand? I have very little knowledge
about solr and am not sure if there is a way we can post
the .cfs index file directly to the solr server with
this command:
java -jar post.jar ?

I assume post.jar only takes xml documents?
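
For context, post.jar works with Solr's XML update format rather than raw
Lucene index files. An add command looks roughly like the sketch below (the
field names are only illustrative):

  <add>
    <doc>
      <field name="id">doc-001</field>
      <field name="title">An example document</field>
    </doc>
  </add>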

Any help would be appreciated!

Regards

Raghav


  



Re: Tomcat 6.0 solr home not set (solved)

2008-03-22 Thread David Arpad Geller
Well, just to add to this, the fact is that Tomcat (or any other 
container) will probably never have info about SOLR so while I 
sympathize with the "cleanness" aspect of not providing this info, it 
sucks when one is trying to figure it out.


I subscribed to the wiki but I'm a little wary.  Should I (can I?) just 
change the page?  Or should I look at the markup, modify it and send it 
to you (or someone)?


David

Chris Hostetter wrote:
I guess what I'm saying is: people should add any detail to 
the SolrTomcat page (and the other container pages) that's relevant to 
running Solr, but we should try to organize it in such a way that if you 
are already very knowledgeable about Tomcat, you don't have to wade through 
a ton of stuff you already know to get to the stuff that's *really* Solr 
specific.


-Hoss

  

--
They must find it difficult, those who have taken authority as truth, rather 
than truth as authority. - Gerald Massey



Re: Reusing lucene index file in Solr

2008-03-22 Thread Yonik Seeley
On Sat, Mar 22, 2008 at 12:22 PM, Raghav Kapoor
<[EMAIL PROTECTED]> wrote:
>  On Page 180 of Lucene In action, there is a reference
>  for searching multiple indexes remotely using RMI. I
>  am still trying to figure out how that works and if
>  that would fit in our scenario. We have multiple
>  client machines running a web server where the indexes
>  will reside. Can the server running solr query these
>  indexes remotely over http ?

You need something running locally to read and export the lucene index
via whatever method.
Reconsider your requirements to see if they really make sense.

-Yonik


Re: Reusing lucene index file in Solr

2008-03-22 Thread Raghav Kapoor
Hi Yonik,

Thanks for your reply!

Once we have exported the index file to the server
where Solr is running, how can we configure solr to
use that index file and search on it?

In short, how does solr search on java lucene indexed
files?

I am very new to Solr and am still trying to learn the
basics. Since there is no proper documentation on
solr, this mailing list is my only hope.

Thanks

Raghav
--- Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On Sat, Mar 22, 2008 at 12:22 PM, Raghav Kapoor
> <[EMAIL PROTECTED]> wrote:
> >  On Page 180 of Lucene In action, there is a reference
> >  for searching multiple indexes remotely using RMI. I
> >  am still trying to figure out how that works and if
> >  that would fit in our scenario. We have multiple
> >  client machines running a web server where the indexes
> >  will reside. Can the server running solr query these
> >  indexes remotely over http ?
> 
> You need something running locally to read and export
> the lucene index via whatever method.
> Reconsider your requirements to see if they really make sense.
> 
> -Yonik
> 



  



Re: synonym dictionary inclusion

2008-03-22 Thread Chris Hostetter

: I would like to incorporate a synonym dictionary! Is there any ready-made
: synonym dictionary/list available ... which
: I can incorporate in my search module

The SynonymFilter is ready to use for incorporating synonyms into Solr, but 
if you're looking for an actual list of synonyms to use ... that tends to 
be not only language specific, but also domain specific (ie: you would 
probably use a different list of synonyms for car searching than you 
would for searching 18th century literature).

Off the top of my head: WordNet should provide some nice general purpose 
(english language) synonyms.
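
A rough sketch of wiring it into a field type in schema.xml (the field type
name, tokenizer, and surrounding filters here are just one possible setup):

  <fieldType name="text_syn" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- synonyms.txt holds lines like: car, automobile, auto -->
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>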


-Hoss



Re: Minimum should match and PhraseQuery

2008-03-22 Thread Chris Hostetter

The topic has come up before on the lucene java lists (although I can't 
think of any good search terms to find the old threads ... I can't really 
remember how people have described this idea in the past).

I don't remember anyone ever suggesting/sharing a general purpose 
solution intrinsically more efficient than just generating all the 
permutations yourself.

: 2) I also want to relax PhraseQuery a bit so that it not only match "Senior
: Java Developer"~2 but also matches "Java Developer"~2 but of course with a
: lower score. I can programmatically generate on the combination but it's not
: gonna be efficient if user issues query with many terms.
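
To make "generate all the permutations yourself" concrete, the hand-expanded
query for that example might look something like this (the boosts are
arbitrary and just favor the fuller phrase):

  "Senior Java Developer"~2^2 OR "Senior Java"~2 OR "Java Developer"~2 OR "Senior Developer"~2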



-Hoss



Re: RAM size

2008-03-22 Thread Chris Hostetter

: is there a way (or formula) to determine the required amount of RAM memory,
: e.g. by number of documents, document size?

There are a lot of factors that come into play ... the number of documents and 
the size of documents aren't nearly as significant as the number of unique 
indexed terms.

: with 4.000.000 documents, searching the index is quite fast, but when I try
: to sort the results, I get the well-known OutOfMemory error. I'm aware of the

Sorting does have some pretty well defined memory requirements.  Sorting a 
field builds up a "FieldCache" ... essentially an array with one slot per 
document of whatever type you are sorting on, so sorting an index 
of 15 million docs on an int field takes ~60Megs; string fields get more 
interesting.  There the FieldCache maintains an int[] with one slot per doc, and a 
String[] with one entry per unique string value ... so sorting your 15M docs by a 
"category" string field where there are only a handful of category names, each 
about 20 characters long, would still take only ~60Megs, but 
sorting on a "title" field where every doc has a unique title and the 
average title length is 20 characters would take ~60Megs + ~290Megs.
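
A back-of-the-envelope reading of those numbers (assuming ~4 bytes per int
slot and ~1 byte per character, and ignoring per-String object overhead):

  15,000,000 docs x 4 bytes per int[] slot   ~=  60,000,000 bytes ~= ~60 Megs
  15,000,000 unique titles x ~20 bytes each  ~= 300,000,000 bytes ~= ~290 Megs (on top of the int[] above)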

If you plan on doing some static warming of your searches using your sorts 
as newSearcher events (which is a good idea so the first user to do a 
search after any commit doesn't have to wait a really long time for the 
FieldCache to be built) you'll need twice that (one FieldCache for the 
current searcher, one FieldCache for the "on deck" searcher).
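
A typical static-warming entry in solrconfig.xml looks roughly like this
(each <lst> is just a set of request parameters; the query and sort field
are only placeholders):

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="sort">title asc</str>
      </lst>
    </arr>
  </listener>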

-Hoss



Re: cannot start solr after adding Analyzer, ClassCastException error

2008-03-22 Thread Chris Hostetter

: [the quoted schema.xml analyzer configuration was stripped by the mail archive]
: 
: I tried some different analyzers, but the same exception happened, so I think
: it is solr's problem or my configuration has something wrong.

Your configuration looks right. What does the source code for your 
PaodingAnalyzer look like?  Does it have a default (no-arg) constructor?  
Did you compile it using the same version of lucene that Solr is using 
(from the lib directory of your Solr release)?
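
For reference, such a declaration in schema.xml usually looks something like
the sketch below; the Paoding class name here is an assumption, so substitute
the analyzer's actual fully-qualified class name:

  <fieldType name="text_cn" class="solr.TextField">
    <analyzer class="net.paoding.analysis.analyzer.PaodingAnalyzer"/>
  </fieldType>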


-Hoss



Re: Nullpointer when using QuerySenderListener

2008-03-22 Thread Chris Hostetter

: I'm developing against solr trunk and I wanted to start using the newSearcher
: and firstSearcher functionality.
: However I'm getting a nullpointer exception when I start up my solr instance.
...
: What am I doing wrong? It looks like SearchHandler.inform(..)
: is never called but handleRequestBody is

You're doing nothing wrong, I can reproduce this error... it looks like 
SolrCore is running through the Event Listeners before it's informing the 
Handlers ... kind of a chicken-and-egg problem actually.  The contract of 
inform is supposed to be that it happens after the SolrCore is finished 
initializing, but before any handleRequest calls are made ... but the 
newSearcher events happen before the first and after the second.

catch-22

... I'll open a bug.



-Hoss



Re: Tomcat 6.0 solr home not set (solved)

2008-03-22 Thread Chris Hostetter

: Well, just to add to this, the fact is that Tomcat (or any other container)
: will probably never have info about SOLR so while I sympathize with the
: "cleanness" aspect of not providing this info, it sucks when one is trying to
: figure it out.

right ... but generic things about tomcat (like what a context file is, 
what the "path" attribute was for prior to Tomcat 5.5, where the access 
log is kept, etc...) can be found in the tomcat documentation ... putting 
lots of details about things like that in the SolrTomcat wiki isn't really 
appropriate ... that page should focus on stuff about Tomcat you should 
know if you are running Solr that you may not have ever learned about or 
worried about before even if you've been using tomcat for a long time.
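
As a sketch of the kind of Solr-specific detail that does belong there, the
context fragment that points Tomcat at the Solr home via the solr/home JNDI
property looks roughly like this (the paths are placeholders):

  <Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String"
                 value="/opt/solr/home" override="true"/>
  </Context>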

: I subscribed to the wiki but I'm a little wary.  Should I (can I?) just change
: the page?  Or should I look at the markup, modify it and send it to you (or
: someone)?

It's a wiki ... edit away.  Email notifications about all edits go 
to the solr-commits list; if people disagree with something they'll 
discuss it on solr-dev ... or just change it again. :)



-Hoss