We also use Nutch in our environment. Nutch crawls the data and sends it to Solr for indexing. I have implemented a custom search API that interacts with my Solr indexes, because I don't want to expose the indexes directly to the outside. You can easily configure and build what you want with this kind of combination.
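A minimal sketch of the kind of thin search API described above, assuming Solr is reachable only internally at a hypothetical `http://localhost:8983/solr/collection1/select` and that the index uses the `title`, `url` and `content` fields from Nutch's Solr schema. The names `build_solr_query`, `SOLR_SELECT`, `RETURN_FIELDS` and `MAX_ROWS` are illustrative, not part of Solr or Nutch. The function only builds the internal request URL; the HTTP server in front of it and the actual call to Solr are left out.

```python
from urllib.parse import urlencode

# Hypothetical internal Solr endpoint; never exposed to outside clients.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"

# Fields returned to clients (assumes Nutch's Solr schema with url/title/content).
RETURN_FIELDS = "url,title"
MAX_ROWS = 50


def build_solr_query(user_query: str, page: int = 1, rows: int = 10) -> str:
    """Translate a public search request into an internal Solr select URL.

    Only whitelisted parameters are set here, so a client can never pass
    arbitrary Solr parameters or reach other request handlers.
    """
    query = user_query.strip()
    if not query:
        raise ValueError("empty query")
    rows = max(1, min(rows, MAX_ROWS))  # clamp the page size
    page = max(1, page)                 # pages are 1-based
    params = {
        "q": query,
        "defType": "edismax",           # forgiving parser for end-user input
        "qf": "title^2 content",        # boost title matches over body text
        "fl": RETURN_FIELDS,
        "start": (page - 1) * rows,
        "rows": rows,
        "wt": "json",
    }
    # urlencode escapes '&', '=', etc., so user input cannot add parameters.
    return SOLR_SELECT + "?" + urlencode(params)
```

The design choice here is whitelisting: the public API accepts only a query string and paging values, which is one way to keep the Solr indexes hidden behind your own service as described above.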
On Wednesday, 30 October 2013, Palmer, Eric <epal...@richmond.edu> wrote:

> Thanks for the link
>
> On Oct 30, 2013, at 4:06 PM, "Rajani Maski" <rajinima...@gmail.com> wrote:
>
>> Hi Eric,
>>
>> I have also developed mini-applications replacing GSA for some of our
>> clients, using Apache Nutch + Solr to crawl multilingual sites and enable
>> multilingual search. Nutch + Solr is very stable, and the Nutch mailing
>> list provides good support.
>>
>> Reference link to start:
>> apache nutch | profilerajanimaski
>>
>> Thanks
>> Rajani
>>
>> On Thu, Oct 31, 2013 at 12:27 AM, Palmer, Eric <epal...@richmond.edu> wrote:
>>
>>> Markus and Jason,
>>>
>>> Thanks for the info.
>>>
>>> I will start to research Nutch. Writing a crawler: agreed, it is a rabbit
>>> hole.
>>>
>>> --
>>> Eric Palmer
>>>
>>> Web Services
>>> U of Richmond
>>>
>>> To report technical issues, obtain technical support or make requests for
>>> enhancements please visit
>>> http://web.richmond.edu/contact/technical-support.html
>>>
>>> On 10/30/13 2:53 PM, "Jason Hellman" <jhell...@innoventsolutions.com>
>>> wrote:
>>>
>>>> Nutch is an excellent option. It should feel very comfortable for people
>>>> migrating away from the Google appliances.
>>>>
>>>> Apache Droids is another possible approach, and I've found people
>>>> using Heritrix or Manifold for various use cases (usually in
>>>> combination with other use cases where the extra overhead was worth the
>>>> trouble).
>>>>
>>>> I think the simplest approach will be Nutch... it's absolutely worth
>>>> taking a shot at it.
>>>>
>>>> DO NOT write a crawler! That is a rabbit hole you do not want to peer
>>>> down into :)
>>>>
>>>> On Oct 30, 2013, at 10:54 AM, Markus Jelsma <markus.jel...@openindex.io>
>>>> wrote:
>>>>
>>>>> Hi Eric,
>>>>>
>>>>> We have also helped a government institution replace their
>>>>> expensive GSA with open source software.
>>>>> In our case we use Apache Nutch 1.7 to crawl the websites and index
>>>>> to Apache Solr. It is very effective, robust and scales easily with
>>>>> Hadoop if you have to. Nutch may not be the easiest tool for the job,
>>>>> but it is very stable, feature-rich and has an active community here
>>>>> at Apache.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> -----Original message-----
>>>>>> From: Palmer, Eric <epal...@richmond.edu>
>>>>>> Sent: Wednesday 30th October 2013 18:48
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Subject: Replacing Google Mini Search Appliance with Solr?
>>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> Been lurking on the list for a while.
>>>>>>
>>>>>> Our two Google Mini search appliances, used to index our public web
>>>>>> sites, are at end of life and need replacing. Google is no longer
>>>>>> selling the Mini appliances, and buying the big appliance is not
>>>>>> cost-effective.
>>>>>>
>>>>>> http://search.richmond.edu/
>>>>>>
>>>>>> We would run a Solr replacement on Linux (CentOS, Red Hat or similar)
>>>>>> with OpenJDK or Oracle Java.
>>>>>>
>>>>>> Background
>>>>>> ==========
>>>>>> ~130 sites
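For a crawl like the one discussed in this thread (roughly 130 public sites crawled with Nutch 1.x), Nutch is typically given a seed file (plain text, one URL per line) plus `+`/`-` regular-expression rules in `conf/regex-urlfilter.txt`, where the first matching rule decides. The Python sketch below generates both from a list of site root URLs. The helper name `nutch_crawl_config` and the one-seed-per-host policy are hypothetical choices, not part of Nutch; the generated rules are meant to take the place of the catch-all `+.` line that usually ends the stock filter file, keeping its earlier exclusion rules.

```python
import re
from urllib.parse import urlparse


def nutch_crawl_config(site_urls):
    """Build (seed_lines, urlfilter_lines) from a list of site root URLs.

    Hypothetical helper: seed lines go in a plain-text seed file; filter
    lines use regex-urlfilter.txt syntax ('+' accept, '-' reject,
    first matching rule wins).
    """
    seeds = []
    filters = []
    seen_hosts = set()
    for raw in site_urls:
        url = raw.strip()
        if not url:
            continue  # skip blank entries
        parsed = urlparse(url)
        host = parsed.hostname  # urlparse lower-cases the host name
        if host is None or host in seen_hosts:
            continue  # keep one seed and one rule per host
        seen_hosts.add(host)
        if not parsed.path and not parsed.query:
            url += "/"  # so the seed itself matches the '/'-terminated rule
        seeds.append(url)
        # Accept http and https URLs on this exact host (default port only).
        filters.append("+^https?://" + re.escape(host) + "/")
    filters.append("-.")  # reject everything not accepted above
    return seeds, filters
```

Deduplicating by host keeps the filter short; if several entry points on the same host matter as seeds, drop the `seen_hosts` check for seeds and keep it only for the filter rules.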