Re: Building a web based search engine

Jim Anderson Tue, 02 Jun 2020 11:00:07 -0700

Hi Markus,

Thanks for your response. I appreciate you giving me the bullet list of
things to do. I can take that list and work from it and hopefully make
progress, but I don't think it will get me where I want to be - just a bit
closer.

You say, "We have been building precisely that for over ten years now". Is
it in a document? I would like to read it.

Some basic things I would like to know that should be documented:

1) Using nutch as the crawler, how do I run a nutch thread that crawls my
named URLs?
2) I will use nutch to visit websites and create documents in solr. How do
I verify that documents have been created in Solr via nutch?
3) Solr will store and index the documents. How do I verify the index?
4) I assume I can run a tomcat server on my host and then provide a
localhost URI to my web browser. Tomcat will then forward the URI to my
application. My application will take a query and, using a Java API, will
pass the query to Solr. I would like to see an example of a java program
passing a query to Solr.
5) Solr will take the query, parse it and then locate appropriate documents
using the index. Is there a log in Solr showing what queries have been
parsed?
6) Solr will pass back the list of documents it has located. I have not
really looked at this issue yet, but it would be nice to have an example of
this.
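
To make items 2, 4 and 6 concrete, here is a minimal sketch of what such a
Java program might look like. It uses only the JDK's built-in HTTP client
(Java 11 or later) against Solr's /select request handler, rather than a
Solr-specific client library. The address http://localhost:8983/solr is
Solr's default; the core name "nutch" and the field name "content" are
assumptions and would need to match the actual setup.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class SolrQueryExample {

    // Builds the URL for Solr's standard /select request handler.
    // baseUrl and core are assumptions; adjust them to the local install.
    public static String buildSelectUrl(String baseUrl, String core, String query, int rows) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/" + core + "/select?q=" + encoded + "&rows=" + rows + "&wt=json";
    }

    // Sends the request and returns the raw JSON response body.
    public static String runQuery(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Solr returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8983/solr"; // Solr's default address
        String core = "nutch";                      // hypothetical core name

        // Items 2 and 3: match every document but return none of them; the
        // "numFound" value in the response is the number of indexed documents.
        System.out.println(runQuery(buildSelectUrl(base, core, "*:*", 0)));

        // Items 4 and 6: a user query; matching documents are listed under
        // "response" -> "docs" in the JSON that comes back.
        System.out.println(runQuery(buildSelectUrl(base, core, "content:solr", 10)));
    }
}
```

Running main requires a Solr instance that nutch has already indexed into;
a web application behind tomcat could call runQuery in the same way and
render the returned JSON.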

Jim

On Tue, Jun 2, 2020 at 12:12 PM Markus Jelsma <markus.jel...@openindex.io>
wrote:

> Hello,
>
> We have been building precisely that for over ten years now. The '10,000
> foot level overview' is basically:
>
> * forget about Lucene for now, Solr uses it under the hood;
> * get Solr, and start it with the schema.xml file that comes with Nutch;
> * get Nutch, give it a set of domains or hosts to crawl and some URLs to
> start the crawl with and point the indexer towards the previously
> configured Solr;
> * put a proxy in front of Solr (we use Nginx), or skip this step if it is
> just an internal demo (do not expose Solr to the outside world);
> * make some basic JS tool that handles input and search result responses.
>
> This was our first web search engine prototype and it was set up in a few
> days. The chapter "How To Build A Web Based Search Engine With Solr, Lucene
> and Nutch" just means: set up Solr, and point Nutch towards it, and tell it
> to start crawling and indexing.
>
> Then there comes an endless list of things to improve: autocomplete,
> spell checking, query and click log handling and analysis, proper text
> extraction, etc.
>
> Regards,
> Markus
>
> -----Original message-----
> > From:Jim Anderson <jjanderson52...@gmail.com>
> > Sent: Tuesday 2nd June 2020 16:36
> > To: solr-user@lucene.apache.org
> > Subject: Building a web based search engine
> >
> > Hi,
> >
> > I have been looking at solr, lucene and nutch websites and tutorials for
> > over a week now, experimenting and learning, but also frustrated by the
> > fact that I am totally missing the 'how to' do what I want. I see a lot of
> > examples of how to use each of the tools, but not how to put them all
> > together. I think an 'overview' at the 10,000 foot level is needed. Maybe
> > one is available and I have not yet found it. If someone can point me to
> > one, please do.
> >
> > If I am correct that an overview on "How To Build A Web Based Search Engine
> > With Solr, Lucene and Nutch" is not available, then I will be willing to
> > write an overview and make it available to the Solr community. I will need
> > input, explanation and review of others.
> >
> > My 2 goals are:
> >
> > 1) Build a demo web based search engine [Note: I have a very specific
> > business need to be able to demonstrate a web application on top of a search
> > engine. This demo is intended to show a 'proof of concept' of the web
> > application to a small audience.]
> >
> > 2) Document the process of building the demo and customizing it using the
> > java API so that others can more easily build their own web based search
> > engine.
> >
> > Jim Anderson
> >
>
