Hi Markus,

Thanks for your response. I appreciate you giving me the bullet list of things to do. I can take that list and work from it and hopefully make progress, but I don't think it will get me where I want to be - just a bit closer.

You say, "We have been building precisely that for over ten years now". Is it in a document? I would like to read it. Some basic things I would like to know that should be documented:

1) Using Nutch as the crawler, how do I run a Nutch thread that crawls my named URLs?

2) I will use Nutch to visit websites and create documents in Solr. How do I verify that documents have been created in Solr via Nutch?

3) Solr will store and index the documents. How do I verify the index?

4) I assume I can run a Tomcat server on my host and then provide a localhost URI to my web browser. Tomcat will then forward the URI to my application. My application will take a query and, using a Java API, it will pass the query to Solr. I would like to see an example of a Java program passing a query to Solr.

5) Solr will take the query, parse it and then locate appropriate documents using the index. Is there a log in Solr showing what queries have been parsed?

6) Solr will pass back the list of documents it has located. I have not really looked at this issue yet, but it would be nice to have an example of this.

Jim

On Tue, Jun 2, 2020 at 12:12 PM Markus Jelsma <markus.jel...@openindex.io> wrote:

> Hello,
>
> We have been building precisely that for over ten years now. The '10,000
> foot level overview' is basically:
>
> * forget about Lucene for now, Solr uses it under the hood;
> * get Solr, and start it with the schema.xml file that comes with Nutch;
> * get Nutch, give it a set of domains or hosts to crawl and some URLs to
>   start the crawl with and point the indexer towards the previously
>   configured Solr;
> * put a proxy in front of Solr (we use Nginx), or skip this step if it is
>   just an internal demo (do not expose Solr to the outside world);
> * make some basic JS tool that handles input and search result responses.
>
> This was our first web search engine prototype and it was set up in a few
> days. The chapter "How To Build A Web Based Search Engine With Solr, Lucene
> and Nutch" just means: set up Solr, and point Nutch towards it, and tell it
> to start crawling and indexing.
>
> Then there comes an endless list of things to improve: autocomplete,
> spell checking, query and click log handling and analysis, proper text
> extraction, etc.
>
> Regards,
> Markus
>
> -----Original message-----
> > From: Jim Anderson <jjanderson52...@gmail.com>
> > Sent: Tuesday 2nd June 2020 16:36
> > To: solr-user@lucene.apache.org
> > Subject: Building a web based search engine
> >
> > Hi,
> >
> > I have been looking at solr, lucene and nutch websites and tutorials for
> > over a week now, experimenting and learning, but also frustrated by the
> > fact that I am totally missing the 'how to' do what I want. I see a lot of
> > examples of how to use each of the tools, but not how to put them all
> > together. I think an 'overview' at the 10,000 foot level is needed. Maybe
> > one is available and I have not yet found it. If someone can point me to
> > one, please do.
> >
> > If I am correct that an overview on "How To Build A Web Based Search Engine
> > With Solr, Lucene and Nutch" is not available, then I will be willing to
> > write an overview and make it available to the Solr community. I will need
> > input, explanation and review of others.
> >
> > My 2 goals are:
> >
> > 1) Build a demo web based search engine [Note: I have a very specific
> > business need to be able to demonstrate a web application on top of a search
> > engine. This demo is intended to show a 'proof of concept' of the web
> > application to a small audience.]
> >
> > 2) Document the process of building the demo and customizing it using the
> > java API so that others can more easily build their own web based search
> > engine.
> >
> > Jim Anderson
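
To make items 4 and 6 of my list concrete, here is a rough sketch of the kind of Java program I have in mind. It is not taken from the Solr or Nutch documentation, so please treat it as a guess to be corrected. It assumes Solr is listening at http://localhost:8983/solr and that the core is named "nutch" (both are placeholders of mine), and it calls Solr's /select handler over plain HTTP with the JDK's java.net.http client (Java 11 or later) rather than through the SolrJ library. The class and method names are my own.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class SolrQuerySketch {

    // Assumed values: change these to match the real Solr host and core name.
    static final String SOLR_BASE = "http://localhost:8983/solr";
    static final String CORE = "nutch";

    /** Builds the URI for the /select handler, URL-encoding the user's query. */
    static URI buildSelectUri(String base, String core, String query, int rows) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create(base + "/" + core + "/select?q=" + q
                + "&rows=" + rows + "&wt=json");
    }

    /** Sends the query to Solr and returns the raw JSON response body. */
    static String search(String query, int rows)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest
                .newBuilder(buildSelectUri(SOLR_BASE, CORE, query, rows))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Solr returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws InterruptedException {
        // With no argument, fall back to "*:*", which matches every document.
        String query = args.length > 0 ? args[0] : "*:*";
        try {
            System.out.println(search(query, 10));
        } catch (IOException e) {
            // Covers both an unreachable server and a non-200 response.
            System.err.println("Query failed: " + e.getMessage());
        }
    }
}
```

If I understand the response format correctly, running this with no argument should also help with item 2: the JSON that comes back for "*:*" includes a numFound count, which would tell me whether Nutch has actually put any documents into Solr.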