Re: [ANNOUNCE] Web Crawler

2013-05-23 Thread Dominique Bejean
Hi, Release 3.0.3 was tested with:
* Oracle Java 6 (but should work fine with version 7)
* Tomcat 5.5, 6 and 7
* PHP 5.2.x and 5.3.x
* Apache 2.2.x
* MongoDB 2.2, 64-bit (known issue with 2.4)
The new release 4.0.0-alpha-2 is available on GitHub - https://github.com/bejean/crawl-anywhere

Re: [ANNOUNCE] Web Crawler

2013-05-22 Thread Rajesh Nikam
Hi, Crawl Anywhere seems to be using old versions of Java, Tomcat, etc. http://www.crawl-anywhere.com/installation-v300/ Will it work with newer versions of the required software? Is there an updated installation guide available? Thanks Rajesh On Wed, May 22, 2013 at 6:48 PM, Dominique Bejean

Re: [ANNOUNCE] Web Crawler

2013-05-22 Thread Dominique Bejean
"Access denied for user 'crawler'@'localhost' (using password: YES)" mysql user crawler/crawler was created and privileges added as mentioned in the tutorial.. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-

Re: [ANNOUNCE] Web Crawler

2013-05-22 Thread Dominique Bejean
Hi, Crawl-Anywhere is now open source - https://github.com/bejean/crawl-anywhere Best regards. On 02/03/11 10:02, findbestopensource wrote: Hello Dominique Bejean, Good job. We identified almost 8 open source web crawlers http://www.findbestopensource.com/tagged/webcrawler I don't kno

Re: [ANNOUNCE] Web Crawler

2013-01-29 Thread SivaKarthik
Hi, I resolved the issue "Access denied for user 'crawler'@'localhost' (using password: YES)" MySQL user crawler/crawler was created and privileges added as mentioned in the tutorial. Thank you. -- View this message in context: http://lucene.472066.n3.n

Re: [ANNOUNCE] Web Crawler

2013-01-29 Thread SivaKarthik
gards -- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607831p4036966.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: [ANNOUNCE] Web Crawler

2013-01-27 Thread O. Klein
.. could you please help me out to resolve the problem.. thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607831p4036520.html

Re: [ANNOUNCE] Web Crawler

2013-01-27 Thread SivaKarthik
ws/ 1 Missing action Not sure where I'm going wrong.. could you please help me out to resolve the problem.. thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607831p4036493.html

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Nestor Oviedo
ginal Message-
>> From: Dominique Bejean [mailto:dominique.bej...@eolya.fr]
>> Sent: Wednesday, March 02, 2011 6:22 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: [ANNOUNCE] Web Crawler
>>
>> Aditya,
>>
>> The crawler is not open source an

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
NTLM2 and that is posing challenges with Nutch?
-Original Message-
From: Dominique Bejean [mailto:dominique.bej...@eolya.fr]
Sent: Wednesday, March 02, 2011 6:22 AM
To: solr-user@lucene.apache.org
Subject: Re: [ANNOUNCE] Web Crawler
Aditya, The crawler is not open source and won't

RE: [ANNOUNCE] Web Crawler

2011-03-02 Thread Thumuluri, Sai
@lucene.apache.org Subject: Re: [ANNOUNCE] Web Crawler Aditya, The crawler is not open source and won't be in the near future. Anyway, I have to change the license because it can be used for any personal or commercial projects. Sincerely, Dominique Le 02/03/11 10:02, findbestopensource a

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Paul Libbrecht
Viewing the indexing result, which is a part of what you are describing I think, is a nice job for such an indexing framework. Do you guys know whether such a feature is already out there? paul On 2 March 2011 at 12:20, Geert-Jan Brits wrote: > Hi Dominique, > > This looks nice. > In the past

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
Hi, The crawler comes with an extensible document processing pipeline. If you know Java libraries or web services for 'wrapper induction' processing, it is possible to implement a dedicated stage in the pipeline. Dominique On 02/03/11 12:20, Geert-Jan Brits wrote: Hi Dominique, This look

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
Aditya, The crawler is not open source and won't be in the near future. Anyway, I have to change the license because it can be used for any personal or commercial projects. Sincerely, Dominique On 02/03/11 10:02, findbestopensource wrote: Hello Dominique Bejean, Good job. We identified

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
Lukas, I am thinking about it but no decision yet. Anyway, in the next release, I will provide the source code of pipeline stages and connectors as samples. Dominique On 02/03/11 10:01, Lukáš Vlček wrote: Hi, is there any plan to open source it? Regards, Lukas [OT] I tried HuriSearch, input "

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Geert-Jan Brits
Hi Dominique, This looks nice. In the past, I've been interested in (semi-)automatically inducing a scheme/wrapper from a set of example webpages (often called 'wrapper induction' in the scientific field). This would allow for fast scheme-creation which could be used as a basis for extraction. L

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
Rosa, In the pipeline, there is a stage that extracts the text from the original document (PDF, HTML, ...). It is possible to plug in scripts (Java 6 compliant) in order to keep only relevant parts of the document. See http://www.wiizio.com/confluence/display/CRAWLUSERS/DocTextExtractor+stage Do

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
David, The UI was not the only reason that made me choose to write a totally new crawler. After eliminating candidate crawlers for various reasons (inactive project, ...), Nutch and Heritrix were the 2 crawlers on my short list of possible candidates to be used. In my mind, the crawler and

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread findbestopensource
Hello Dominique Bejean, Good job. We identified almost 8 open source web crawlers http://www.findbestopensource.com/tagged/webcrawler I don't know how different yours would be from the rest. Your license states that it is not open source but it is free for personal use. Regards Aditya ww

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Lukáš Vlček
Hi, is there any plan to open source it? Regards, Lukas [OT] I tried HuriSearch, input "Java" into the search field, and it returned a lot of references to ColdFusion error pages. Maybe a recrawl would help? On Wed, Mar 2, 2011 at 1:25 AM, Dominique Bejean wrote: > Hi, > > I would like to announce Cr

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Rosa (Anuncios)
Nice job! It would be good to be able to extract specific data from a given page via XPath though. Regards, On 02/03/2011 01:25, Dominique Bejean wrote: Hi, I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java Web Crawler. It includes : * a crawler * a document pro

Re: [ANNOUNCE] Web Crawler

2011-03-01 Thread David Smiley (@MITRE.org)
m/ANNOUNCE-Web-Crawler-tp2607831p2608956.html

[ANNOUNCE] Web Crawler

2011-03-01 Thread Dominique Bejean
Hi, I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java Web Crawler. It includes:
* a crawler
* a document processing pipeline
* a Solr indexer
The crawler has a web administration in order to manage web sites to be crawled. Each web site crawl is configured with a lo