RE: Indexing URLs from websites

2014-01-22 Thread Teague James
...@openindex.io] Sent: Tuesday, January 21, 2014 3:09 PM To: solr-user@lucene.apache.org Subject: RE: Indexing URLs from websites Hi, are you getting PDFs at all? Sounds like a problem with URL filters; those also work on the linkdb. You should also try dumping the linkdb and inspecting it for URLs
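To follow the suggestion to dump the linkdb and look for the PDF URLs, the dump can be produced with Nutch's `readlinkdb` tool (for example `bin/nutch readlinkdb crawl/linkdb -dump linkdb_dump`) and then scanned. The sketch below is only a minimal illustration: it assumes the plain-text dump layout of Nutch 1.x, where each target URL is followed by a tab and `Inlinks:`, and each inlink sits on its own indented line of the form `fromUrl: <url> anchor: <text>`. The helper name `pdf_inlinks` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout of one record in a Nutch 1.x linkdb text dump:
#   <target-url>\tInlinks:
#    fromUrl: <source-url> anchor: <anchor text>
INLINK_RE = re.compile(r"^\s*fromUrl: (.+?) anchor: ?(.*)$")


def pdf_inlinks(dump_text):
    """Map each target URL ending in .pdf to its (fromUrl, anchor) pairs."""
    result = defaultdict(list)
    current = None  # target URL of the record currently being read
    for line in dump_text.splitlines():
        if "\tInlinks:" in line:
            # A new record starts here: remember its target URL.
            current = line.split("\t", 1)[0]
            continue
        match = INLINK_RE.match(line)
        if match and current and current.lower().endswith(".pdf"):
            result[current].append((match.group(1), match.group(2)))
    return dict(result)
```

If the function returns nothing for a crawl that should contain PDF links, the links probably never reached the linkdb, which points back at the URL filters mentioned above. The part-file names inside the dump directory depend on the Hadoop version, so read whichever `part-*` files are present.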

RE: Indexing URLs from websites

2014-01-21 Thread Markus Jelsma
"/Article 2", and "/documents/Article 1.pdf" How can I get these URLs? -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Monday, January 20, 2014 9:08 AM To: solr-user@lucene.apache.org Subject: RE: Indexing URLs from websites Well

RE: Indexing URLs from websites

2014-01-21 Thread Teague James
"/Article 2", and "/documents/Article 1.pdf" How can I get these URLs? -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Monday, January 20, 2014 9:08 AM To: solr-user@lucene.apache.org Subject: RE: Indexing URLs from websites Well it is

RE: Indexing URLs from websites

2014-01-20 Thread Markus Jelsma
solr-user@lucene.apache.org > Subject: RE: Indexing URLs from websites > > Progress! > > I changed the value of that property in nutch-default.xml and I am getting > the anchor field now. However, the stuff going in there is a bit random and > doesn't seem to correlate to

RE: Indexing URLs from websites

2014-01-17 Thread Teague James
ith me on this - I really appreciate your help! -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Friday, January 17, 2014 6:46 AM To: solr-user@lucene.apache.org Subject: RE: Indexing URLs from websites -Original message- > From:Teague James

RE: Indexing URLs from websites

2014-01-17 Thread Markus Jelsma
-Original message- > From:Teague James > Sent: Thursday 16th January 2014 20:23 > To: solr-user@lucene.apache.org > Subject: RE: Indexing URLs from websites > > Okay. I had used that previously and I just tried it again. The following > generated no e

RE: Indexing URLs from websites

2014-01-16 Thread Teague James
: RE: Indexing URLs from websites Usage: SolrIndexer [-linkdb ] [-params k1=v1&k2=v2...] ( ... | -dir ) [-noCommit] [-deleteGone] [-deleteRobotsNoIndex] [-deleteSkippedByIndexingFilter] [-filter] [-normalize] You must point to the linkdb via the -linkdb parameter. -Original mes
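Per the usage line quoted here, the linkdb goes in through the `-linkdb` flag rather than as a bare positional argument, as in the `bin/nutch solrindex http://localhost/solr/ crawl/crawldb crawl/linkdb ...` command elsewhere in this thread. A minimal Python sketch that builds the argument list accordingly, assuming the `<solr url> <crawldb>` order from the `solrindex` command used in this thread; `solrindex_args` is a hypothetical helper, not part of Nutch:

```python
def solrindex_args(solr_url, crawldb, linkdb, segment=None, segments_dir=None):
    """Build a `bin/nutch solrindex` argument list (hypothetical helper).

    Exactly one of `segment` (a single segment path) or `segments_dir`
    (a directory of segments, passed via -dir) must be given.
    """
    if (segment is None) == (segments_dir is None):
        raise ValueError("pass exactly one of segment or segments_dir")
    # The linkdb is passed with the -linkdb flag, not positionally.
    args = ["bin/nutch", "solrindex", solr_url, crawldb, "-linkdb", linkdb]
    if segment is not None:
        args.append(segment)
    else:
        args += ["-dir", segments_dir]
    return args
```

The resulting list can be handed to `subprocess.run(args, check=True)` from the Nutch home directory. Because no shell is involved, nothing is glob-expanded, which keeps the segment argument explicit.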

RE: Indexing URLs from websites

2014-01-16 Thread Markus Jelsma
> Sent: Thursday 16th January 2014 16:57 > To: solr-user@lucene.apache.org > Subject: RE: Indexing URLs from websites > > Okay. I changed my solrindex to this: > > bin/nutch solrindex http://localhost/solr/ crawl/crawldb crawl/linkdb > crawl/segments/20140115143147 >

RE: Indexing URLs from websites

2014-01-16 Thread Teague James
[mailto:markus.jel...@openindex.io] Sent: Thursday, January 16, 2014 10:44 AM To: solr-user@lucene.apache.org Subject: RE: Indexing URLs from websites Hi - you cannot use wildcards for segments. You need to give one segment or a -dir segments_dir. Check the usage of your indexer command
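Since a wildcard is not accepted here, one option for the single-segment case is to resolve the segment path yourself. Segment directories in this thread are named by timestamp (for example `20140115143147`), so lexicographic order matches chronological order. A sketch under that assumption, with a hypothetical helper `latest_segment` that picks the newest segment under a directory such as `crawl/segments`:

```python
import os


def latest_segment(segments_dir):
    """Return the path of the newest segment under segments_dir.

    Assumes segment directories are named yyyyMMddHHmmss timestamps, so the
    lexicographically largest all-digit name is the most recent one.
    """
    names = [
        name
        for name in os.listdir(segments_dir)
        if name.isdigit() and os.path.isdir(os.path.join(segments_dir, name))
    ]
    if not names:
        raise FileNotFoundError(f"no segments found in {segments_dir}")
    return os.path.join(segments_dir, max(names))
```

Pass the returned path as the single segment argument; to index every segment instead, use `-dir crawl/segments` as suggested above.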

RE: Indexing URLs from websites

2014-01-16 Thread Markus Jelsma
ve this one produced the same errors. > > When/How are the missing directories supposed to be created? > > I really appreciate the help! Thank you very much! > > -Original Message- > From: Markus Jelsma [mailto:markus.jel...@openindex.io] > Sent

RE: Indexing URLs from websites

2014-01-16 Thread Teague James
very much! -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Thursday, January 16, 2014 5:45 AM To: solr-user@lucene.apache.org Subject: RE: Indexing URLs from websites -Original message- > From:Teague James > Sent: Wednesday 15th January

RE: Indexing URLs from websites

2014-01-16 Thread Markus Jelsma
-Original message- > From:Teague James > Sent: Wednesday 15th January 2014 22:01 > To: solr-user@lucene.apache.org > Subject: Re: Indexing URLs from websites > > I am still unsuccessful in getting this to work. My expectation is that the > index-anchor plugin shou

Re: Indexing URLs from websites

2014-01-15 Thread Teague James
I am still unsuccessful in getting this to work. My expectation is that the index-anchor plugin should produce values for the field anchor. However this field is not showing up in my Solr index no matter what I try. Here's what I have in my nutch-site.xml for plugins: protocol-http|urlfilter-regex
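In Nutch, `plugin.includes` is a regular expression matched against plugin IDs, so one quick check is whether the configured value actually matches `index-anchor`. A sketch assuming the whole ID must match (as with Java's `Matcher.matches()`) and that `plugin.excludes` is applied afterwards; `plugin_enabled` is a hypothetical helper, and the example value in the test is illustrative, not the poster's full (truncated) setting:

```python
import re


def plugin_enabled(includes, plugin_id, excludes=""):
    """Check a plugin ID against plugin.includes / plugin.excludes regexes.

    Assumes whole-ID matching, like Java's Matcher.matches().
    """
    if not re.fullmatch(includes.strip(), plugin_id):
        return False
    # An empty excludes value excludes nothing.
    return not (excludes and re.fullmatch(excludes.strip(), plugin_id))
```

A value such as `index-(basic|anchor)` enables both indexing plugins, while a bare `index-basic` does not enable `index-anchor`, in which case no `anchor` field would be produced.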

Re: Indexing URLs from websites

2014-01-07 Thread Otis Gospodnetic
You could use something like Apache Droids - http://incubator.apache.org/droids/ Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Tue, Jan 7, 2014 at 2:27 PM, Teague James wrote: > I am trying to index a website that contai