Markus,

Thanks for your replies. I will review them, experiment more, and see if I
can get everything working.

Jim

On Tue, Jun 2, 2020 at 2:36 PM Markus Jelsma <markus.jel...@openindex.io>
wrote:

> Hello, see inline.
>
> Markus
>
> -----Original message-----
> > From:Jim Anderson <jjanderson52...@gmail.com>
> > Sent: Tuesday 2nd June 2020 19:59
> > To: solr-user@lucene.apache.org
> > Subject: Re: Building a web based search engine
> >
> > Hi Markus,
> >
> > Thanks for your response. I appreciate you giving me the bullet list of
> > things to do. I can take that list, work from it, and hopefully make
> > progress, but I don't think it will get me where I want to be, just a bit
> > closer.
> >
> > You say, "We have been building precisely that for over ten years now".
> > Is it in a document? I would like to read it.
>
> No, I haven't written a book about it and don't intend to.
>
> > Some basic things I would like to know that should be documented:
> >
> > 1) Using Nutch as the crawler, how do I run a Nutch thread that crawls my
> > named URLs?
>
> You don't; instead, run Nutch as a separate process from the command line.
> When you have to deal with 50+ million records, run it on Apache Hadoop.
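A minimal sketch of launching a crawl as a separate process from Java. It assumes a local Nutch 1.x runtime whose `bin/crawl` script accepts `-i` (index into the Solr instance configured for Nutch's indexer) and `-s <seed dir>`, followed by the crawl directory and the number of rounds; check the usage message `bin/crawl` prints for your version. The `NUTCH_HOME` variable, the `urls/seed.txt` file, the placeholder URL and the two rounds are illustrative choices. The domains or hosts to stay within are normally restricted separately, for example in `conf/regex-urlfilter.txt`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch: start a Nutch 1.x crawl as a separate OS process.
// Assumption: bin/crawl accepts "-i -s <seedDir> <crawlDir> <rounds>".
final class NutchLauncher {

    // Builds the bin/crawl command line without running it.
    static List<String> crawlCommand(Path nutchHome, Path seedDir, Path crawlDir, int rounds) {
        return List.of(
                nutchHome.resolve("bin").resolve("crawl").toString(),
                "-i",                      // index into the configured Solr after each round
                "-s", seedDir.toString(),  // directory holding the seed URL file(s)
                crawlDir.toString(),       // crawldb, linkdb and segments go here
                Integer.toString(rounds)); // number of generate/fetch/parse/update rounds
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumption: NUTCH_HOME points at the Nutch runtime directory.
        Path nutchHome = Path.of(System.getenv().getOrDefault("NUTCH_HOME", "."));
        Path seedDir = nutchHome.resolve("urls");
        Files.createDirectories(seedDir);
        // One seed URL per line (placeholder URL).
        Files.writeString(seedDir.resolve("seed.txt"), "https://example.org/\n", StandardCharsets.UTF_8);

        Process crawl = new ProcessBuilder(crawlCommand(nutchHome, seedDir, nutchHome.resolve("crawl"), 2))
                .directory(nutchHome.toFile())
                .inheritIO() // show Nutch's own output on this console
                .start();
        System.out.println("bin/crawl exited with status " + crawl.waitFor());
    }
}
```

Typing the same command in a shell from the Nutch directory has the same effect; the Java wrapper only matters if the demo application should trigger crawls itself.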
>
> > 2) I will use Nutch to visit websites and create documents in Solr. How do
> > I verify that documents have been created in Solr via Nutch?
>
> By searching for them in Solr, or by retrieving them by URL, through Solr's
> simple HTTP API. You can also use SolrJ, the Java client.
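A sketch of retrieving a page by URL through Solr's HTTP API, using only the JDK's `java.net.http.HttpClient` (Java 11+) rather than SolrJ. It assumes Solr listens on `http://localhost:8983/solr`, the core is named `nutch`, and Nutch's schema stores the page URL in the unique `id` field; adjust these to your setup.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch: check whether a page made it into Solr.
// Assumptions: core "nutch" at localhost:8983, page URL stored in the "id" field.
final class SolrUrlLookup {

    // Builds <solrBase>/<core>/select?q=id:"<pageUrl>"&wt=json, URL-encoding the query.
    static URI lookupUri(String solrBase, String core, String pageUrl) {
        String q = "id:\"" + pageUrl + "\""; // quoted so ':' and '/' in the URL are not parsed as syntax
        return URI.create(solrBase + "/" + core + "/select?q="
                + URLEncoder.encode(q, StandardCharsets.UTF_8) + "&wt=json");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String page = args.length > 0 ? args[0] : "https://example.org/";
        HttpRequest request = HttpRequest.newBuilder(
                lookupUri("http://localhost:8983/solr", "nutch", page)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // numFound in the response body is 1 if the page is indexed and 0 if it is not.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```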
>
> > 3) Solr will store and index the documents. How do I verify the index?
>
> See 2.
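For a quick check of the index as a whole, a sketch that asks for all documents (`q=*:*`) with `rows=0` and reads the `numFound` attribute from Solr's XML response format (`wt=xml`), parsed with the JDK's DOM parser. The Solr address and the core name `nutch` are assumptions. The core overview page in the Solr Admin UI shows the same document count.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch: count the documents in the index via q=*:*&rows=0.
// Assumption: core "nutch" at localhost:8983.
final class SolrIndexCount {

    // Reads numFound from the <result> element of a wt=xml response.
    static long numFound(String xml) {
        try {
            Element result = (Element) DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("result").item(0);
            if (result == null) {
                throw new IllegalArgumentException("no <result> element in response");
            }
            return Long.parseLong(result.getAttribute("numFound"));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse Solr XML response", e);
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8983/solr/nutch/select?q=*:*&rows=0&wt=xml")).GET().build();
        String body = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString()).body();
        System.out.println("documents in index: " + numFound(body));
    }
}
```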
>
> > 4) I assume I can run a Tomcat server on my host and then provide a
> > localhost URI to my web browser. Tomcat will then forward the URI to my
> > application. My application will take a query and pass it to Solr using
> > a Java API. I would like to see an example of a Java program passing a
> > query to Solr.
>
> See 3. That said, I would recommend using Solr's HTTP API; it is much easier
> to deal with.
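A sketch of passing a user's query to Solr over the HTTP API with the JDK client. The input goes into `q` unchanged; `defType=edismax` selects Solr's Extended DisMax parser, which copes with free text typed by users, and `qf` lists the fields to search. The field names `title`, `content` and `url` (from Nutch's schema), the boost on `title`, and the core URL are assumptions. In a Tomcat application, the same call would sit inside the servlet that receives the search form.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: pass a user's search text to Solr over its HTTP API.
// Assumptions: core URL below, Nutch schema fields "title", "content", "url".
final class SolrSearch {

    // URL-encodes each key and value; LinkedHashMap keeps the parameter order stable.
    static String queryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // page is zero-based; pageSize is the number of hits per result page.
    static URI searchUri(String coreUrl, String userInput, int page, int pageSize) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userInput);            // text typed into the search box
        params.put("defType", "edismax");      // Extended DisMax: tolerant of free-text input
        params.put("qf", "title^2 content");   // fields to search; title matches boosted by 2
        params.put("fl", "url,title");         // fields to return for each hit
        params.put("start", Integer.toString(page * pageSize));
        params.put("rows", Integer.toString(pageSize));
        params.put("wt", "json");
        return URI.create(coreUrl + "/select?" + queryString(params));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String input = args.length > 0 ? String.join(" ", args) : "web search";
        HttpRequest request = HttpRequest.newBuilder(
                searchUri("http://localhost:8983/solr/nutch", input, 0, 10)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```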
>
> > 5) Solr will take the query, parse it and then locate appropriate documents
> > using the index. Is there a log in Solr showing what queries have been
> > parsed?
>
> Yes, see Solr's log directory.
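A sketch that pulls search requests out of Solr's log. It assumes the default log file `server/logs/solr.log` under the Solr install directory and request lines of the form `... webapp=/solr path=/select params={q=...} hits=N status=0 QTime=N`; the exact layout depends on the Solr version and logging configuration. The log records the raw request parameters; to see how Solr actually parsed a query, add `debugQuery=true` to the request and look at the `parsedquery` entry in the response.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Sketch: list /select requests recorded in Solr's log.
// Assumption: request lines look like
//   ... o.a.s.c.S.Request [nutch]  webapp=/solr path=/select params={q=solr&wt=json} hits=3 status=0 QTime=2
final class SolrQueryLog {

    private static final Pattern SELECT_REQUEST =
            Pattern.compile("path=/select params=\\{(.*)\\} hits=(\\d+)");

    // Returns "<params> (hits=N)" for a /select request line, or empty for any other line.
    static Optional<String> selectRequest(String logLine) {
        Matcher m = SELECT_REQUEST.matcher(logLine);
        return m.find() ? Optional.of(m.group(1) + " (hits=" + m.group(2) + ")") : Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: default log location relative to the Solr install directory.
        Path log = Path.of(args.length > 0 ? args[0] : "server/logs/solr.log");
        try (Stream<String> lines = Files.lines(log)) {
            lines.map(SolrQueryLog::selectRequest)
                 .flatMap(Optional::stream)
                 .forEach(System.out::println);
        }
    }
}
```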
>
> > 6) Solr will pass back the list of documents it has located. I have not
> > really looked at this issue yet, but it would be nice to have an example of
> > this.
>
> Search for a SolrJ tutorial; they are plentiful. Also check out Solr's own
> extensive manual; everything you need is there.
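A JDK-only sketch of reading the returned document list, as an alternative to SolrJ's `SolrDocumentList`. It requests `wt=xml` so the standard DOM parser can be used, and prints the `url` and `title` of each `<doc>`; the core URL, the sample query `solr`, and those two field names are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch: turn a Solr wt=xml response into a list of "url | title" lines.
// Assumptions: core URL below; documents carry "url" and "title" fields.
final class SolrResultList {

    static List<String> urlsAndTitles(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList docs = dom.getElementsByTagName("doc"); // one <doc> per hit, in rank order
            List<String> hits = new ArrayList<>();
            for (int i = 0; i < docs.getLength(); i++) {
                Element doc = (Element) docs.item(i);
                hits.add(field(doc, "url") + " | " + field(doc, "title"));
            }
            return hits;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse Solr XML response", e);
        }
    }

    // Text of the child element carrying name="<name>", e.g. <str name="url">; "" if absent.
    private static String field(Element doc, String name) {
        NodeList children = doc.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element && name.equals(((Element) child).getAttribute("name"))) {
                return child.getTextContent();
            }
        }
        return "";
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(
                "http://localhost:8983/solr/nutch/select?q=solr&fl=url,title&rows=10&wt=xml")).GET().build();
        String body = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString()).body();
        urlsAndTitles(body).forEach(System.out::println);
    }
}
```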
>
> > Jim
> >
> >
> >
> > On Tue, Jun 2, 2020 at 12:12 PM Markus Jelsma <markus.jel...@openindex.io>
> > wrote:
> >
> > > Hello,
> > >
> > > We have been building precisely that for over ten years now. The '10,000
> > > foot level overview' is basically:
> > >
> > > * forget about Lucene for now, Solr uses it under the hood;
> > > * get Solr, and start it with the schema.xml file that comes with Nutch;
> > > * get Nutch, give it a set of domains or hosts to crawl and some URLs to
> > > start the crawl with, and point the indexer towards the previously
> > > configured Solr;
> > > * put a proxy in front of Solr (we use Nginx), or skip this step if it is
> > > just an internal demo (do not expose Solr to the outside world);
> > > * make some basic JS tool that handles input and search result responses.
> > >
> > > This was our first web search engine prototype and it was set up in a few
> > > days. The chapter "How To Build A Web Based Search Engine With Solr, Lucene
> > > and Nutch" just means: set up Solr, point Nutch towards it, and tell it
> > > to start crawling and indexing.
> > >
> > > Then comes an endless list of things to improve: autocomplete,
> > > spell checking, query and click log handling and analysis, proper text
> > > extraction, etc.
> > >
> > > Regards,
> > > Markus
> > >
> > > -----Original message-----
> > > > From:Jim Anderson <jjanderson52...@gmail.com>
> > > > Sent: Tuesday 2nd June 2020 16:36
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Building a web based search engine
> > > >
> > > > Hi,
> > > >
> > > > I have been looking at the Solr, Lucene and Nutch websites and tutorials
> > > > for over a week now, experimenting and learning, but also frustrated by
> > > > the fact that I am totally missing the 'how to' of what I want. I see a
> > > > lot of examples of how to use each of the tools, but not how to put them
> > > > all together. I think an 'overview' at the 10,000 foot level is needed.
> > > > Maybe one is available and I have not yet found it. If someone can point
> > > > me to one, please do.
> > > >
> > > > If I am correct that an overview on "How To Build A Web Based Search
> > > > Engine With Solr, Lucene and Nutch" is not available, then I am willing
> > > > to write an overview and make it available to the Solr community. I will
> > > > need input, explanation and review from others.
> > > >
> > > > My 2 goals are:
> > > >
> > > > 1) Build a demo web based search engine. [Note: I have a very specific
> > > > business need to be able to demonstrate a web application on top of a
> > > > search engine. This demo is intended to show a 'proof of concept' of the
> > > > web application to a small audience.]
> > > >
> > > > 2) Document the process of building the demo and customizing it using the
> > > > Java API so that others can more easily build their own web based search
> > > > engine.
> > > >
> > > > Jim Anderson
> > > >
> > >
> >
>
