Re: Use Parallel Search

Gustavo Maia Fri, 04 Feb 2011 07:06:05 -0800

Hello,

I am not using Nutch.


Let me explain more about how I use Lucene.
Lucene has the RemoteSearchable class, which a server machine uses to
publish its index.

RemoteSearchable remote = new RemoteSearchable(parallelSearcher);
Naming.rebind("//" + LocalIP + "/" + artPortMap.getNick(), remote);


On the client, it is only necessary to do a lookup based on the IP of the
machine. On each machine we use the parallel search, which allows me to
search in parallel using different processors and different HDs. So with 6
HDs and a machine with more than 6 processors, the search runs fully in
parallel.
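
To make the idea concrete, here is a minimal sketch in plain Java of what "search every fragment at the same time, then merge" looks like. It is not the Lucene code I run: Hit, Shard and ParallelShardSearch are hypothetical names invented for this illustration, and it assumes each fragment can return its own best hits with a comparable score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-ins, not Lucene classes: a Hit is a (docId, score) pair,
// and a Shard is anything that can return its own best hits for a query.
class Hit {
    final String docId;
    final double score;

    Hit(String docId, double score) {
        this.docId = docId;
        this.score = score;
    }
}

interface Shard {
    List<Hit> search(String query, int topN) throws Exception;
}

class ParallelShardSearch {
    // Query every shard on its own thread, then merge the partial results
    // and keep the topN highest-scoring hits overall.
    static List<Hit> search(List<Shard> shards, String query, int topN) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                Callable<List<Hit>> task = () -> shard.search(query, topN);
                pending.add(pool.submit(task));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                merged.addAll(f.get()); // blocks until that shard has answered
            }
            // Highest score first.
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } finally {
            pool.shutdown();
        }
    }
}
```

With one thread per fragment, the total time is roughly that of the slowest fragment plus the merge, which is why one fragment per HD and at least as many processors as HDs matters.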

The example below shows how to obtain the reference to the server machine.
     Searchable ts = (Searchable) Naming.lookup("//" + ip + ":" + port + "/" + name);

All my documents are in XML format. I have a preprocessing step that
converts HTML, DOC, and PDF documents to XML.

The searches do not use facets, because that is not possible with Lucene.
That's one reason I'm studying Solr. Today I need to use facets :). I use
queries with sorting, filtering, multiple fields ....

With this architecture I have an index of 18 fragments spread across 18 HDs
on three machines, each index fragment 10GB in size, which gives me a
total index size of 180GB.
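
The same arithmetic as a small Java sketch, in case it helps when sizing a bigger index. The class and method names are made up for illustration, and it assumes fixed-size fragments with one fragment per HD and 6 HDs per machine, as in my setup.

```java
class IndexLayout {
    // Number of fixed-size fragments needed to hold an index (rounded up).
    static long fragmentsNeeded(long totalGb, long fragmentGb) {
        return (totalGb + fragmentGb - 1) / fragmentGb;
    }

    // Machines needed when each machine hosts one fragment per disk.
    static long machinesNeeded(long fragments, long disksPerMachine) {
        return (fragments + disksPerMachine - 1) / disksPerMachine;
    }

    public static void main(String[] args) {
        long fragments = fragmentsNeeded(180, 10);    // 180GB index, 10GB fragments
        long machines = machinesNeeded(fragments, 6); // 6 HDs per machine
        // prints "18 fragments on 3 machines"
        System.out.println(fragments + " fragments on " + machines + " machines");
    }
}
```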

But I'm worried, because I am going to multiply the index by 10, going from
180GB to 1180GB. Is Apache Solr better suited for this new index size, or can
I continue using Lucene and simply add more machines?




2011/2/4 Ganesh <emailg...@yahoo.co.in>

> I am having a similar kind of problem. I need to scale out. Could you explain
> how you have done distributed indexing and search using Lucene?
>
> Regards
> Ganesh
>
> ----- Original Message -----
> From: "Gustavo Maia" <gust...@goshme.com>
> To: <solr-user@lucene.apache.org>
> Sent: Thursday, February 03, 2011 11:36 PM
> Subject: Use Parallel Search
>
>
> > Hello,
> >
> > Let me give a brief description of my scenario.
> > Today I am only using Lucene 2.9.3. I have an index of 30 million
> > documents distributed across three machines, each machine with 6 HDs
> > (15k rpm).
> > The server queries the search index using the remote search class, and
> > each machine searches using the parallel search (searching simultaneously
> > on 6 HDs).
> > So during a search I am simultaneously using the three machines and 18 HDs,
> > which gives me a very good response time.
> >
> >
> > Today I am studying Solr and am interested in knowing more about
> > distributed search and the use of parallel search on the same machine.
> > What would be the best scenario using Solr that is better than what I
> > already have today with Lucene alone?
> >  Note: do I need to install 6 Solr instances on each of my server
> > machines? One for each HD? Or is there some alternative way for me to use
> > the 6 HDs without having 6 instances of the Solr server?
> >
> >  Another question is whether Solr has some index size limit per hard
> > drive. It would be better not to have an index that is too big, because
> > the larger the index, the longer the search takes.
> >
> > Thanks for everything.
> >
> >
> > Gustavo Maia
> >
