Hi,
I'm using Nutch to crawl a list of web sites.
Solr is my index (Nutch 1.0 integration with Solr).
I'm working on detecting web site defacement (that is, any change in the
text of a web page).
I want to know whether Solr gives me a way to detect changes in
the Documents in the ind
Thanks a lot,
I have had a look at these classes.
But what I want to do exactly is detect whether a Document (in the Solr
index) has changed when I recrawl a site with Nutch.
Not to block deduplication, but to detect whether a Document has changed and
extract the changes to a file without writing them over
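
To make the goal concrete, here is a minimal sketch of the kind of check I mean. It assumes the old and new page text are already available as plain strings (how to get them out of Nutch or Solr is exactly my open question). The class and method names are hypothetical, SHA-256 is just one possible digest, and the line comparison is a naive set-based diff rather than a real diff algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Nutch or Solr.
class ChangeDetector {

    // Hex-encoded SHA-256 of the page text; identical text gives identical digests.
    static String digest(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // True when the digests of the two versions differ.
    static boolean hasChanged(String oldText, String newText) {
        return !digest(oldText).equals(digest(newText));
    }

    // Lines present in newText but absent from oldText (naive, order of newText kept).
    static List<String> addedLines(String oldText, String newText) {
        Set<String> oldLines = new HashSet<>(Arrays.asList(oldText.split("\n")));
        List<String> added = new ArrayList<>();
        for (String line : newText.split("\n")) {
            if (!oldLines.contains(line)) {
                added.add(line);
            }
        }
        return added;
    }
}
```

The idea would be to call hasChanged first, and only when it returns true write the result of addedLines to a separate file, leaving the previously indexed version untouched.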
Hi,
I have an index in which I am always indexing the same documents
(re-indexing).
So I need to search for them by their segment number.
When I ask SolrJ for the documents by their segment [for example:
solrj.query("segment:20090603142546");], it doesn't return anything. I
checked the schema.
Yes it's already defined as String:
When I query by id or url it works, but not by segment...
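
One thing still worth ruling out: a type of string alone does not make a field searchable, because Solr can only match queries against a field that is indexed. I have not confirmed that this is the cause here. Below is a minimal sketch of how the attribute could be checked, assuming schema.xml has been read into a string. The class name is hypothetical, and treating a missing indexed attribute as "indexed" is a simplification, since the default can also come from the field type.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Solr or SolrJ.
class SchemaFieldCheck {

    // Returns true if a <field> with this name exists and is not marked indexed="false".
    // Assumption: a missing "indexed" attribute is treated as indexed.
    static boolean isIndexed(String schemaXml, String fieldName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            schemaXml.getBytes(StandardCharsets.UTF_8)));
            NodeList fields = doc.getElementsByTagName("field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                if (fieldName.equals(field.getAttribute("name"))) {
                    return !"false".equals(field.getAttribute("indexed"));
                }
            }
            return false; // field not declared at all
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse schema", e);
        }
    }
}
```

If this returns false for segment, the field would need indexed="true" in schema.xml and the documents would have to be re-indexed before the query can match.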
--
View this message in context:
http://www.nabble.com/Solr-search-by-segment-tp23856569p23859699.html
Sent from the Solr - User mailing list archive at Nabble.com.
I should point out that I am running the Nutch-Solr integration and that
schema.xml is the same in both Nutch and Solr.