Hi Syao,

You should just write a simple (Java) app that traverses the dir tree, gets
info about each file, uses it to construct Solr doc objects
(SolrInputDocuments if you are working in Java with SolrJ) and sends them
to Solr for indexing.  Should be about 30 minutes of work or less.
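
A minimal sketch of the traversal part, sticking to Java 7 APIs and not tested against a real Solr instance. The class name FsTreeDocs and the field names (id, name, is_dir, size, last_modified) are only assumptions for illustration, so rename them to match your schema. With SolrJ you would copy each map into a SolrInputDocument using addField(name, value) and send the documents with your client's add() and commit().

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FsTreeDocs {

    // Walks the tree under root and returns one field map per directory or file.
    // File contents are never read, only names and attributes.
    public static List<Map<String, Object>> collect(Path root) throws IOException {
        final List<Map<String, Object>> docs = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                docs.add(toDoc(dir, attrs));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                docs.add(toDoc(file, attrs));
                return FileVisitResult.CONTINUE;
            }
        });
        return docs;
    }

    // Field names below are hypothetical; they must exist in your Solr schema.
    static Map<String, Object> toDoc(Path p, BasicFileAttributes attrs) {
        Map<String, Object> doc = new LinkedHashMap<>();
        Path fileName = p.getFileName(); // null for a filesystem root such as "/"
        doc.put("id", p.toAbsolutePath().toString());
        doc.put("name", fileName == null ? p.toString() : fileName.toString());
        doc.put("is_dir", attrs.isDirectory());
        doc.put("size", attrs.isDirectory() ? 0L : attrs.size());
        doc.put("last_modified", attrs.lastModifiedTime().toMillis());
        return doc;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map<String, Object> doc : collect(root)) {
            // Replace this with the SolrJ calls described above.
            System.out.println(doc);
        }
    }
}
```

Run it as "java FsTreeDocs /some/folder" to see what would be indexed before wiring in SolrJ.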

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Mar 6, 2013 at 3:37 AM, Syao Work <syao.w...@gmail.com> wrote:

> So you are suggesting that I iterate over the file system, collect the FS
> tree entities (directory names, file names, file sizes, etc.) and then post
> them to Solr?
> I need to index the FS tree, not the file contents.
>
> On Tue, Mar 5, 2013 at 5:54 PM, Erik Hatcher <erik.hatc...@gmail.com>
> wrote:
>
> > Would Solr's post.jar work for you?   It has a directory recurse option.
> >  The usage/help output is pasted below.
> >
> > Here's what should work for you: "java -Dauto -Drecursive -jar post.jar
> > /some/folder"
> >
> >         Erik
> >
> >
> >
> > exampledocs  java -jar post.jar --help
> > SimplePostTool version 1.5
> > Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]
> >
> > Supported System Properties and their defaults:
> >   -Ddata=files|web|args|stdin (default=files)
> >   -Dtype=<content-type> (default=application/xml)
> >   -Durl=<solr-update-url> (default=http://localhost:8983/solr/update)
> >   -Dauto=yes|no (default=no)
> >   -Drecursive=yes|no|<depth> (default=0)
> >   -Ddelay=<seconds> (default=0 for files, 10 for web)
> >   -Dfiletypes=<type>[,<type>,...] (default=xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log)
> >   -Dparams="<key>=<value>[&<key>=<value>...]" (values must be URL-encoded)
> >   -Dcommit=yes|no (default=yes)
> >   -Doptimize=yes|no (default=no)
> >   -Dout=yes|no (default=no)
> >
> > This is a simple command line tool for POSTing raw data to a Solr
> > port.  Data can be read from files specified as commandline args,
> > URLs specified as args, as raw commandline arg strings or via STDIN.
> > Examples:
> >   java -jar post.jar *.xml
> >   java -Ddata=args  -jar post.jar '<delete><id>42</id></delete>'
> >   java -Ddata=stdin -jar post.jar < hd.xml
> >   java -Ddata=web -jar post.jar http://example.com/
> >   java -Dtype=text/csv -jar post.jar *.csv
> >   java -Dtype=application/json -jar post.jar *.json
> >   java -Durl=http://localhost:8983/solr/update/extract -Dparams=literal.id=a -Dtype=application/pdf -jar post.jar a.pdf
> >   java -Dauto -jar post.jar *
> >   java -Dauto -Drecursive -jar post.jar afolder
> >   java -Dauto -Dfiletypes=ppt,html -jar post.jar afolder
> > The options controlled by System Properties include the Solr
> > URL to POST to, the Content-Type of the data, whether a commit
> > or optimize should be executed, and whether the response should
> > be written to STDOUT. If auto=yes the tool will try to set type
> > and url automatically from file name. When posting rich documents
> > the file name will be propagated as "resource.name" and also used
> > as "literal.id". You may override these or any other request parameter
> > through the -Dparams property. To do a commit only, use "-" as argument.
> > The web mode is a simple crawler following links within domain, default
> > delay=10s.
> >
> >
> > On Mar 5, 2013, at 04:38 , Syao Work wrote:
> >
> > > Hello,
> > >
> > > I am trying to index an FS folder tree.
> > > I spent 2 days trying to find the problem and got nothing :) There
> > > are not many examples of indexing a file system.
> > > In the logs I can't find any exceptions explaining why it does not
> > > process the info.
> > > The data import configuration and debug response are attached.
> > >
> > >
> > > Using:
> > > 1. solr web admin tool,
> > > 2. Java version "1.7.0_09-icedtea"
> > >    OpenJDK Runtime Environment (fedora-2.3.7.0.fc17-x86_64)
> > >    OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
> > >
> > > Thank you for your time,
> > > Ro
> > >
> > > P.S. Excuse my bad English, I am not a native English speaker.
> > > <data-config.xml><import-debug-response.json>
> >
> >
>
