Hello,

I'm also in favor of maintaining a web interface that ships with Nutch. As has been mentioned, it may well be a bridge to Solr. If I find the time to contribute my solution (and make it general enough), I'll happily do it.

Earlier I was wondering about actually using the previous Nutch web interface (not Solritas/Velocity) and integrating it with the Solr index. I still find this tempting; what's the motivation against it?

I've evaluated Ajax Solr but I didn't get it to work. Following Markus's advice I tried Solritas and got it to work, but without highlighting. Why?

These are the relevant solrconfig.xml sections:

<queryResponseWriter name="velocity" class="org.apache.solr.request.VelocityResponseWriter"/>

<requestHandler name="/itas" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="v.template">browse</str>
    <str name="v.properties">velocity.properties</str>
    <str name="v.contentType">text/html;charset=UTF-8</str>
    <str name="title">Solritas</str>
    <str name="hl.fl">*</str>
    <str name="qt">standard</str>
    <str name="wt">velocity</str>
    <str name="fq"/>
    <str name="rows">10000</str>
    <str name="hl">on</str>
    <str name="defType">dismax</str>
    <str name="q.alt">*:*</str>
    <str name="fl">*,score</str>
    <str name="facet">on</str>
    <str name="facet.field">title</str>
    <str name="facet.mincount">1</str>
  </lst>
  <!--<lst name="invariants">-->
  <!--<str name="v.base_dir">/solr/contrib/velocity/src/main/templates</str>-->
  <!--</lst>-->
</requestHandler>

This was already there:

<highlighting>
  <!-- Configure the standard fragmenter -->
  <!-- This could most likely be commented out in the "default" case -->
  <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
    <lst name="defaults">
      <int name="hl.fragsize">100</int>
    </lst>
  </fragmenter>

  <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">70</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
    </lst>
  </fragmenter>

  <!-- Configure the standard formatter -->
  <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
    <lst name="defaults">
      <str name="hl.simple.pre"><![CDATA[<em>]]></str>
      <str name="hl.simple.post"><![CDATA[</em>]]></str>
    </lst>
  </formatter>
</highlighting>

Pointers:
http://stackoverflow.com/questions/5071675/ajax-solr-how-to-make-an-ajax-page-readable-by-google

On Mon, May 2, 2011 at 7:43 PM, Mattmann, Chris A (388J) <chris.a.mattm...@jpl.nasa.gov> wrote:

> Hi Gabriele,
>
> I would have loved to have done this myself but haven't had the time. I
> also favored having a web interface still included as well.
>
> If you find time to port it to the 1.3 branch/framework I can tell you I'd
> happily devote my time towards a 1.4 release that includes it.
>
> Cheers,
> Chris
>
> On May 2, 2011, at 10:54 AM, Gabriele Kahlout wrote:
>
> > The reason I'm asking is because I had found the nutch webapp pretty neat
> > for a prototype interface (it even did highlighting).
> > I'm thinking of changing it so that it pulls the data from solr index,
> > updating this part in search.jsp:
> >
> > // perform query
> > // NOTE by Dawid Weiss:
> > // The 'clustering' window actually moves with the start
> > // position.... this is good, bad?... ugly?....
> > Hits hits;
> > try{
> >   query.getParams().initFrom(start + hitsToRetrieve, hitsPerSite,
> >       "site", sort, reverse);
> >   hits = bean.search(query);
> > } catch (IOException e){
> >   hits = new Hits(0,new Hit[0]);
> > }
> >
> >
> > Has someone gone through that already? Are there other alternatives you have
> > taken? I stumbled upon (w/o stumbledupon.com)
> > http://evolvingweb.github.com/ajax-solr/examples/reuters/index.html which is
> > quite sophisticated and doesn't do the highlighting!
> >
> >
> > On Mon, May 2, 2011 at 4:45 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> >
> >> Yes. It was removed. Indexing and searching is delegated to Solr for now.
> >>
> >> On Monday 02 May 2011 16:41:32 Gabriele Kahlout wrote:
> >>> Hello,
> >>>
> >>> Some time ago I was trying to use nutch/search.jsp to search my Solr
> >>> indexes. Trying to do that again I've noticed that in nutch-1.3 there is no
> >>> support for a Nutch web querying interface (presumably in favor of solr's
> >>> own). Is it?
> >>
> >> --
> >> Markus Jelsma - CTO - Openindex
> >> http://www.linkedin.com/in/markus17
> >> 050-8536620 / 06-50258350
> >>
> >
> >
> > --
> > Regards,
> > K. Gabriele
> >
> > --- unchanged since 20/9/10 ---
> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
> > receipt within 48 hours then I don't resend the email.
> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
> > < Now + 48h) ⇒ ¬resend(I, this).
> >
> > If an email is sent by a sender that is not a trusted contact or the email
> > does not contain a valid code then the email is not received. A valid code
> > starts with a hyphen and ends with "X".
> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> > L(-[a-z]+[0-9]X)).
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

--
Regards,
K. Gabriele