From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Tuesday, January 21, 2014 3:09 PM
To: solr-user@lucene.apache.org
Subject: RE: Indexing URLs from websites
Hi - are you getting PDFs at all? This sounds like a problem with URL filters;
those also apply to the linkdb. You should also try dumping the linkdb and
inspecting it for URLs.
t;, "/Article 2", andÂ
"/documents/Article 1.pdf"
How can I get these URLs?
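A quick sketch of the linkdb dump suggested above, assuming the crawl layout used
later in this thread (crawl/linkdb); readlinkdb writes a plain-text dump that can
then be grepped for the URLs in question:

  # dump the linkdb to a text dump directory (the output name is arbitrary)
  bin/nutch readlinkdb crawl/linkdb -dump linkdb_dump
  # look for the URLs that should have been picked up
  grep -r "Article 1.pdf" linkdb_dump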
-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Monday, January 20, 2014 9:08 AM
To: solr-user@lucene.apache.org
Subject: RE: Indexing URLs from websites
Well it is
> To: solr-user@lucene.apache.org
> Subject: RE: Indexing URLs from websites
>
> Progress!
>
> I changed the value of that property in nutch-default.xml and I am getting
> the anchor field now. However, the stuff going in there is a bit random and
> doesn't seem to correlate to
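A side note on the quoted configuration change: local overrides are normally placed
in conf/nutch-site.xml, which takes precedence over nutch-default.xml. A minimal
sketch of such an override; the property name below is only an illustration, since
the exact property being discussed is cut off in this excerpt:

  <property>
    <!-- illustrative property name; not necessarily the one referred to above -->
    <name>db.ignore.internal.links</name>
    <value>false</value>
  </property>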
with me on this - I really appreciate your help!
-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Friday, January 17, 2014 6:46 AM
To: solr-user@lucene.apache.org
Subject: RE: Indexing URLs from websites
-Original message-
> From:Teague James
> Sent: Thursday 16th January 2014 20:23
> To: solr-user@lucene.apache.org
> Subject: RE: Indexing URLs from websites
>
> Okay. I had used that previously and I just tried it again. The following
> generated no errors
Subject: RE: Indexing URLs from websites
Usage: SolrIndexer <solr url> <crawldb> [-linkdb <linkdb>] [-params
k1=v1&k2=v2...] (<segment> ... | -dir <segments>) [-noCommit] [-deleteGone]
[-deleteRobotsNoIndex] [-deleteSkippedByIndexingFilter] [-filter] [-normalize]
You must point to the linkdb via the -linkdb parameter.
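Applied to the indexing command quoted further down in this thread, that would look
roughly like this (same Solr URL, crawldb, linkdb and segment as in that message; a
sketch, adjust the paths to your own layout):

  bin/nutch solrindex http://localhost/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/20140115143147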
-Original message-
> From:Teague James
> Sent: Thursday 16th January 2014 16:57
> To: solr-user@lucene.apache.org
> Subject: RE: Indexing URLs from websites
>
> Okay. I changed my solrindex to this:
>
> bin/nutch solrindex http://localhost/solr/ crawl/crawldb crawl/linkdb
> crawl/segments/20140115143147
>
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Thursday, January 16, 2014 10:44 AM
To: solr-user@lucene.apache.org
Subject: RE: Indexing URLs from websites
Hi - you cannot use wildcards for segments. You need to give one segment or a
-dir segments_dir. Check the usage of your indexer command
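To illustrate the two accepted forms with the paths used elsewhere in this thread
(note that, per the usage message quoted earlier, newer Nutch releases also want the
linkdb passed via -linkdb rather than positionally):

  # a single, explicitly named segment - no shell wildcards
  bin/nutch solrindex http://localhost/solr/ crawl/crawldb crawl/linkdb crawl/segments/20140115143147
  # or let the indexer pick up every segment under the directory
  bin/nutch solrindex http://localhost/solr/ crawl/crawldb crawl/linkdb -dir crawl/segments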
ve this one produced the same errors.
>
> When/How are the missing directories supposed to be created?
>
> I really appreciate the help! Thank you very much!
>
> -Original Message-
> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> Sent
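Regarding the question above about when the missing directories get created: in a
step-by-step Nutch 1.x crawl, each directory is produced by one of the crawl
commands. A rough sketch of a single cycle, using the layout from this thread (the
urls/ seed directory is an assumption):

  bin/nutch inject crawl/crawldb urls                      # creates/updates crawl/crawldb from the seed list
  bin/nutch generate crawl/crawldb crawl/segments          # creates a new timestamped segment under crawl/segments
  SEG=$(ls -d crawl/segments/2* | tail -1)                 # the segment just generated
  bin/nutch fetch $SEG
  bin/nutch parse $SEG
  bin/nutch updatedb crawl/crawldb $SEG                    # folds the fetched data back into the crawldb
  bin/nutch invertlinks crawl/linkdb -dir crawl/segments   # this is the step that creates crawl/linkdb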
-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Thursday, January 16, 2014 5:45 AM
To: solr-user@lucene.apache.org
Subject: RE: Indexing URLs from websites
-Original message-
> From:Teague James
> Sent: Wednesday 15th January 2014 22:01
> To: solr-user@lucene.apache.org
> Subject: Re: Indexing URLs from websites
>
I am still unsuccessful in getting this to work. My expectation is that the
index-anchor plugin should produce values for the anchor field. However, this
field is not showing up in my Solr index no matter what I try.
Here's what I have in my nutch-site.xml for plugins:
protocol-http|urlfilter-regex
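For comparison, a plugin.includes value known to populate the anchor field has to
include index-anchor; a sketch of what such a setting can look like (this is not the
poster's actual value, which is cut off above):

  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>

Note that index-anchor only has something to index if the linkdb is actually passed
to the indexer, which is the -linkdb issue discussed earlier in the thread.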
You could use something like Apache Droids -
http://incubator.apache.org/droids/
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
On Tue, Jan 7, 2014 at 2:27 PM, Teague James wrote:
> I am trying to index a website that contains