Markus,

Thanks for your replies. I will review them, experiment more, and see if I can get everything working.

Jim

On Tue, Jun 2, 2020 at 2:36 PM Markus Jelsma <markus.jel...@openindex.io> wrote:

> Hello, see inline.
>
> Markus
>
> -----Original message-----
> > From:Jim Anderson <jjanderson52...@gmail.com>
> > Sent: Tuesday 2nd June 2020 19:59
> > To: solr-user@lucene.apache.org
> > Subject: Re: Building a web based search engine
> >
> > Hi Markus,
> >
> > Thanks for your response. I appreciate you giving me the bullet list of
> > things to do. I can take that list and work from it and hopefully make
> > progress, but I don't think it will get me where I want to be - just a bit
> > closer.
> >
> > You say, "We have been building precisely that for over ten years now". Is
> > it in a document? I would like to read it.
>
> No, I haven't written a book about it and don't intend to.
>
> > Some basic things I would like to know that should be documented:
> >
> > 1) Using nutch as the crawler, how do I run a nutch thread that crawls my
> > named URLs?
>
> You don't; instead, run Nutch as a separate process from the command line.
> Or when you have to deal with 50+ million records, you run it on Apache
> Hadoop.
>
> > 2) I will use nutch to visit websites and create documents in solr. How do
> > I verify that documents have been created in Solr via nutch?
>
> By searching for them using Solr, or retrieving them by URL, using Solr's
> simple HTTP API. You can use SolrJ, the Java client, too.
>
> > 3) Solr will store and index the documents. How do I verify the index?
>
> See 2.
>
> > 4) I assume I can run a tomcat server on my host and then provide a
> > localhost URI to my web browser. Tomcat will then forward the URI to my
> > application. My application will take a query and using a java API it will
> > pass the query to Solr. I would like to see an example of a java program
> > passing a query to Solr.
>
> See 3. Though I would recommend using Solr's HTTP API, it is much easier
> to deal with.
>
> > 5) Solr will take the query, parse it and then locate appropriate documents
> > using the index. Is there a log in Solr showing what queries have been
> > parsed?
>
> Yes, see Solr's log directory.
>
> > 6) Solr will pass back the list of documents it has located. I have not
> > really looked at this issue yet, but it would be nice to have an example of
> > this.
>
> Search for a SolrJ tutorial, they are plentiful. Also check out Solr's own
> extensive manual, everything you need is there.
>
> > Jim
> >
> > On Tue, Jun 2, 2020 at 12:12 PM Markus Jelsma <markus.jel...@openindex.io>
> > wrote:
> >
> > > Hello,
> > >
> > > We have been building precisely that for over ten years now. The '10,000
> > > foot level overview' is basically:
> > >
> > > * forget about Lucene for now, Solr uses it under the hood;
> > > * get Solr, and start it with the schema.xml file that comes with Nutch;
> > > * get Nutch, give it a set of domains or hosts to crawl and some URLs to
> > >   start the crawl with and point the indexer towards the previously
> > >   configured Solr;
> > > * put a proxy in front of Solr (we use Nginx), or skip this step if it is
> > >   just an internal demo (do not expose Solr to the outside world);
> > > * make some basic JS tool that handles input and search result responses.
> > >
> > > This was our first web search engine prototype and it was set up in a few
> > > days. The chapter "How To Build A Web Based Search Engine With Solr, Lucene
> > > and Nutch" just means: set up Solr, and point Nutch towards it, and tell it
> > > to start crawling and indexing.
> > >
> > > Then there comes an endless list of things to improve: autocomplete,
> > > spell checking, query and click log handling and analysis, proper text
> > > extraction, etc.
> > >
> > > Regards,
> > > Markus
> > >
> > > -----Original message-----
> > > > From:Jim Anderson <jjanderson52...@gmail.com>
> > > > Sent: Tuesday 2nd June 2020 16:36
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Building a web based search engine
> > > >
> > > > Hi,
> > > >
> > > > I have been looking at solr, lucene and nutch websites and tutorials for
> > > > over a week now, experimenting and learning, but also frustrated by the
> > > > fact that I am totally missing the 'how to' do what I want. I see a lot of
> > > > examples of how to use each of the tools, but not how to put them all
> > > > together. I think an 'overview' at the 10,000 foot level is needed. Maybe
> > > > one is available and I have not yet found it. If someone can point me to
> > > > one, please do.
> > > >
> > > > If I am correct that an overview on "How To Build A Web Based Search Engine
> > > > With Solr, Lucene and Nutch" is not available, then I will be willing to
> > > > write an overview and make it available to the Solr community. I will need
> > > > input, explanation and review from others.
> > > >
> > > > My 2 goals are:
> > > >
> > > > 1) Build a demo web based search engine [Note: I have a very specific
> > > > business need to be able to demonstrate a web application on top of a search
> > > > engine. This demo is intended to show a 'proof of concept' of the web
> > > > application to a small audience.]
> > > >
> > > > 2) Document the process of building the demo and customizing it using the
> > > > java API so that others can more easily build their own web based search
> > > > engine.
> > > >
> > > > Jim Anderson
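
For question 4 in the thread (a Java program passing a query to Solr), a minimal sketch using Solr's HTTP API, the route Markus recommends, might look like the following. It assumes Java 11 or later, a Solr instance listening at http://localhost:8983/solr, and a core named "nutch" with a "content" field. The core name, the field name and the class name SolrQueryExample are illustrative assumptions, not details from the thread. The q, rows and wt parameters are those of Solr's standard /select request handler.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; the core and field names used in main are assumptions.
class SolrQueryExample {

    // Builds a URL for Solr's /select handler, URL-encoding the user's query.
    static String buildSelectUrl(String solrBase, String core, String query, int rows) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return solrBase + "/" + core + "/select?q=" + q + "&rows=" + rows + "&wt=json";
    }

    // Sends the GET request and returns the raw JSON response body.
    static String search(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed local Solr base URL and a core called "nutch".
        String url = buildSelectUrl("http://localhost:8983/solr", "nutch", "content:solr", 10);
        System.out.println("Query URL: " + url);
        try {
            System.out.println(search(url));
        } catch (IOException e) {
            // Reached when no Solr instance is running at the assumed address.
            System.out.println("Could not reach Solr: " + e);
        }
    }
}
```

In the JSON that Solr returns, the matching documents appear under response.docs and the hit count under response.numFound, so the same request is also one way to check that Nutch has indexed documents (questions 2 and 6). Pasting the printed query URL into a browser gives the same result without any Java code.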