Thanks for the link.

Sent from my iPhone
On Oct 30, 2013, at 4:06 PM, "Rajani Maski" <rajinima...@gmail.com> wrote:

> Hi Eric,
>
> I have also developed mini-applications replacing GSA for some of our
> clients, using Apache Nutch + Solr to crawl multilingual sites and enable
> multilingual search. Nutch + Solr is very stable, and the Nutch mailing
> list provides good support.
>
> Reference link to get started:
> https://sites.google.com/site/profilerajanimaski/webcrawlers/apache-nutch
>
> Thanks,
> Rajani
>
>
> On Thu, Oct 31, 2013 at 12:27 AM, Palmer, Eric <epal...@richmond.edu> wrote:
>
>> Markus and Jason,
>>
>> Thanks for the info.
>>
>> I will start to research Nutch. As for writing a crawler, I agree it is
>> a rabbit hole.
>>
>> --
>> Eric Palmer
>>
>> Web Services
>> U of Richmond
>>
>> To report technical issues, obtain technical support, or make requests
>> for enhancements, please visit
>> http://web.richmond.edu/contact/technical-support.html
>>
>>
>> On 10/30/13 2:53 PM, "Jason Hellman" <jhell...@innoventsolutions.com> wrote:
>>
>>> Nutch is an excellent option. It should feel very comfortable for people
>>> migrating away from the Google appliances.
>>>
>>> Apache Droids is another possible approach, and I've found people
>>> using Heritrix or ManifoldCF for various use cases (and usually in
>>> combination with other use cases where the extra overhead was worth the
>>> trouble).
>>>
>>> I think the simplest approach will be Nutch...it's absolutely worth
>>> taking a shot at it.
>>>
>>> DO NOT write a crawler! That is a rabbit hole you do not want to peer
>>> down into :)
>>>
>>>
>>> On Oct 30, 2013, at 10:54 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>>>
>>>> Hi Eric,
>>>>
>>>> We have also helped a government institution replace their
>>>> expensive GSA with open source software. In our case we use Apache
>>>> Nutch 1.7 to crawl the websites and index to Apache Solr. It is very
>>>> effective, robust, and scales easily with Hadoop if you have to.
>>>> Nutch may not be the easiest tool for the job, but it is very stable,
>>>> feature-rich, and has an active community here at Apache.
>>>>
>>>> Cheers,
>>>>
>>>> -----Original message-----
>>>>> From: Palmer, Eric <epal...@richmond.edu>
>>>>> Sent: Wednesday 30th October 2013 18:48
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Replacing Google Mini Search Appliance with Solr?
>>>>>
>>>>> Hello all,
>>>>>
>>>>> I've been lurking on the list for a while.
>>>>>
>>>>> Our two Google Mini search appliances, used to index our public web
>>>>> sites, are at end of life. Google is no longer selling the Mini
>>>>> appliances, and buying the big appliance is not cost beneficial.
>>>>>
>>>>> http://search.richmond.edu/
>>>>>
>>>>> We would run a Solr replacement on Linux (CentOS, Red Hat, or
>>>>> similar) with OpenJDK or Oracle Java.
>>>>>
>>>>> Background
>>>>> ==========
>>>>> ~130 sites
>>>>> only ~12,000 pages (at a crawl depth of 3)
>>>>> probably ~40,000 pages if we go to a depth of 4
>>>>>
>>>>> We use key matches a lot. In Solr terms these are elevated documents
>>>>> (elevations).
>>>>>
>>>>> We would code a search query form in PHP and wrap it into our design
>>>>> (http://www.richmond.edu).
>>>>>
>>>>> I have played with and love LucidWorks, and I know that their paid
>>>>> solution works for our use cases, but the cost model is not
>>>>> attractive for such a small collection.
>>>>>
>>>>> So, with Solr, what are my open source options, and what are people's
>>>>> experiences crawling and indexing web sites with Solr plus a crawler?
>>>>> I understand Solr does not ship with a crawler, so getting one
>>>>> working would be the first step.
>>>>>
>>>>> We can code in Java, PHP, Python, etc. if we have to, but we don't
>>>>> want to write a crawler if we can avoid it.
>>>>>
>>>>> Thanks in advance for any information.
>>>>>
>>>>> --
>>>>> Eric Palmer
>>>>> Web Services
>>>>> U of Richmond
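The "key matches" Eric mentions map onto Solr's QueryElevationComponent, which pins chosen documents to the top of the results for a given query. A minimal elevate.xml sketch follows; the query text and document id are illustrative assumptions, and the id must match the document's unique key in the index.

```xml
<!-- elevate.xml: documents listed here are forced to the top
     of the result list whenever the query text matches. -->
<elevate>
  <query text="admissions">
    <doc id="http://admissions.richmond.edu/" />
  </query>
</elevate>
```

This file sits alongside the rest of the core's configuration, and the component must be registered in solrconfig.xml and added to the search handler's component chain for the elevations to take effect.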
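For anyone following the Nutch + Solr route discussed in this thread, the workflow Markus describes can be sketched roughly as below. This is a minimal sketch, assuming a stock Apache Nutch 1.7 install in the current directory and a local Solr core named collection1; the seed URL, directory names, and Solr URL are all illustrative, not taken from the thread.

```shell
# Build a one-URL seed list for the crawler.
mkdir -p urls
echo "http://www.richmond.edu/" > urls/seed.txt

# Crawl 3 rounds deep and index the parsed pages into Solr.
# Nutch 1.7 ships a wrapper script with this usage:
#   bin/crawl <seedDir> <crawlDir> <solrURL> <numberOfRounds>
# Guarded so the sketch is harmless where Nutch is not installed.
if [ -x bin/crawl ]; then
  bin/crawl urls/ crawl/ http://localhost:8983/solr/collection1 3
fi
```

The number of rounds corresponds loosely to crawl depth, so 3 here matches the ~12,000-page estimate in the original message; raising it to 4 would approach the ~40,000-page case.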