The indexed contents of 100 sites were imported into Solr from Nutch using:

bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*

Now a Solr admin search for 'photography' returns results such as:

  <doc>
    <float name="score">0.12570743</float>
    <float name="boost">1.0440307</float>
    <str name="digest">94d97f2806240d18d67cafe9c34f94e1</str>
    <str name="id">http://www.galleryhopper.org/</str>
    <str name="segment">...</str>
    <str name="title">Gallery Hopper: Todd Walker's photography ephemera.
Read, enjoy, share, discard.</str>
    <date name="tstamp">...</date>
    <str name="url">http://www.galleryhopper.org/</str>
  </doc>

However, highlighting is only available on the title field, not on the page
text.
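
To make the request concrete, here is a minimal sketch of the kind of query
I would like to run. It only builds the URL and sends nothing. It assumes
Solr's standard select handler and the standard hl / hl.fl highlighting
parameters; the helper name build_highlight_query and the field name
"content" are placeholders of mine, since the page text field is exactly what
is missing from the results above. As far as I understand, Solr can only
highlight a field that is stored.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, term, highlight_field):
    """Build a Solr select URL that requests highlighting on one field.

    highlight_field is an assumption: it must name a stored field that
    holds the page text, which the current index does not seem to have.
    """
    params = {
        "q": term,                 # the search term
        "hl": "true",              # switch highlighting on
        "hl.fl": highlight_field,  # field to take highlighted snippets from
        "fl": "id,title,score",    # fields returned for each document
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# "content" is a hypothetical field name for the parsed page text.
url = build_highlight_query("http://127.0.0.1:8983/solr/", "photography", "content")
print(url)
```

With hl.fl set to title this matches what the admin screen offers today; the
goal is to be able to point it at the page text instead.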

My question: where is the stored parse text of the pages, and what is the
Solr command to send it from Nutch with the url/id as key? The information is
contained in the crawl segments, and the Solr id field matches the Nutch url.

Thanks.
