I had partial success executing wget as follows:

wget --recursive --page-requisites --no-clobber --html-extension \
  --convert-links --restrict-file-names=windows \
  --domains wiki.apache.org http://wiki.apache.org/solr/ -w 10 -l 5

then configuring a web server to serve that location, and indexing it with:

java -Ddata=web -Ddelay=0 -Drecursive=5 -jar post.jar http://<solr_url>/solr/
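
For the web-server step, anything that can serve the wget output directory
should do; for a quick local test I assume something as simple as Python's
built-in server is enough (the directory name is just whatever wget created,
so adjust as needed):

cd wiki.apache.org
python -m SimpleHTTPServer 8080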

I still have some fixes I need to do, like deciding exactly how to index the
files; maybe a dynamic field that is both indexed and stored.
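
A minimal sketch of what I mean for schema.xml (the *_txt name and
text_general type are just taken from the stock example schema, so treat
them as placeholders):

<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>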

I can then grep the resulting HTML and point the search box back at the
Solr instance.
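
Roughly what I have in mind, assuming the mirrored pages have a search form
whose action attribute I can match (the pattern and the /browse target are
guesses, and it would need narrowing so it only touches the search form):

grep -rl '<form' wiki.apache.org | xargs sed -i \
  's|action="[^"]*"|action="http://<solr_url>/solr/collection1/browse"|g'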

I haven't tested all of it yet; I think it will work, but it feels a little ugly.

On Tue, Jan 1, 2013 at 11:32 AM, Upayavira <u...@odoko.co.uk> wrote:
> I have permission to provide an export. Right now I'm thinking of it
> being a one-off dump, without the user dir. If someone wants to research
> how to make Moin automate it, I at least promise to listen.
>
> Upayavira
>
> On Tue, Jan 1, 2013, at 08:10 AM, Alexandre Rafalovitch wrote:
>> That's why I think this could be a nice joint project with Apache Infra.
>> They provide a Moin export, we build a way to index it with Solr for local
>> usage. Start with our own project - Solr - then sell it to others once it
>> has been dog-fooded enough. Instant increased Solr exposure to all Apache
>> project users...
>>
>> Just a thought.
>>
>> Regards,
>>    Alex.
>>
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all at
>> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>>
>>
>> On Tue, Jan 1, 2013 at 7:03 PM, Lance Norskog <goks...@gmail.com> wrote:
>>
>> > 3 problems:
>> > a- he wanted to read it locally.
>> > b- crawling the open web is imperfect.
>> > c- /browse needs to get at the files with the same URL as the uploader.
>> >
>> > a and b- Try downloading the whole thing with 'wget'. It has a 'make links
>> > point to the downloaded files' option. Wget is great.
>> >
>> > I have done this by parking my files behind a web server. You can use
>> > Tomcat. (I recommend the XAMPP distro:
>> > http://www.apachefriends.org/en/xampp.html). Then, use
>> > Erik's command to crawl that server. Use /browse to read it.
>> >
>> > Looking at this again, it should be possible to add a file system service
>> > to the Solr start.jar etc/jetty.xml file. I think I did this once. It would
>> > be a handy patch. In fact, this whole thing would make a great blog post.
>> >
>> >
>> > On 12/30/2012 05:05 AM, Erik Hatcher wrote:
>> >
>> >> Here's a geeky way to do it yourself:
>> >>
>> >> Fire up Solr 4.x, run this from example/exampledocs:
>> >>
>> >>     java -Ddata=web -Ddelay=2 -Drecursive=1 -jar post.jar http://wiki.apache.org/solr/
>> >>
>> >> (although I do end up getting a bunch of 503's, so maybe this isn't very
>> >> reliable yet?)
>> >>
>> >> Tada: http://localhost:8983/solr/collection1/browse
>> >>
>> >> :)
>> >>
>> >>         Erik
>> >>
>> >>
>> >> On Dec 29, 2012, at 16:54, d_k wrote:
>> >>
>> >>> Hello,
>> >>>
>> >>> I'm setting up Solr inside an intranet without internet access and
>> >>> I was wondering if there is a way to obtain the data dump of the Solr
>> >>> Wiki (http://wiki.apache.org/solr/) for offline viewing and searching.
>> >>>
>> >>> I understand MoinMoin has an export feature one can use
>> >>> (http://moinmo.in/MoinDump and
>> >>> http://moinmo.in/HelpOnMoinCommand/ExportDump)
>> >>> but I'm afraid it needs
>> >>> to be executed from within the MoinMoin server.
>> >>>
>> >>> Is there a way to obtain the result of that command?
>> >>> Is there another way to view the solr wiki offline?
>> >>>
>> >>
>> >
