Specific replies below, but what I'd seriously consider
is writing my own filesystem-aware hook that pushes
documents to known Solr servers rather than using
DIH to pull them. You could use the code from
FileListEntityProcessor as a base and go from there.
FileListEntityProcessor isn't really intended to
do very complex stuff.
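
To make the push idea concrete, here is a minimal sketch, not a tested
implementation. DropFolderPusher, Sender and pushOnce are hypothetical names
of mine, not part of Solr or DIH. The Sender callback stands in for whatever
actually delivers the file to each Solr server (for instance SolrJ, or an
HTTP POST to the update handler), and a file is deleted only after the
sender reports success. The sketch assumes the drop folder holds *.xml files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical push-style hook: scans a drop folder and hands each file to a sender. */
final class DropFolderPusher {

  /** Hypothetical callback; a real one would push the file to every Solr server. */
  interface Sender {
    /** Returns true only if the file was delivered everywhere it needs to go. */
    boolean send(Path file) throws IOException;
  }

  /**
   * Sends every *.xml file in dropDir in name order and deletes each file
   * only after the sender reports success. Returns the files that were
   * sent and deleted; anything else stays in place for the next run.
   */
  static List<Path> pushOnce(Path dropDir, Sender sender) throws IOException {
    // Snapshot the directory first so we are not deleting while iterating.
    List<Path> pending = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dropDir, "*.xml")) {
      for (Path file : stream) {
        pending.add(file);
      }
    }
    Collections.sort(pending); // explicit order instead of whatever the OS returns

    List<Path> done = new ArrayList<>();
    for (Path file : pending) {
      if (sender.send(file)) {
        Files.delete(file);
        done.add(file);
      }
    }
    return done;
  }
}
```

You would call pushOnce from cron or a small scheduler loop. Files the
sender rejects are left alone, so a failed push is simply retried on the
next pass.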

1> I don't think this is possible OOB. There's nothing built into
     DIH that hooks into the filesystem and automatically indexes
     new files.
2> Nope. DIH is pretty simple that way, as the
     FileListEntityProcessor code shows.
3> I'm pretty sure this is irrelevant to FileListEntityProcessor;
     it's really used for database imports.
4> "Whatever order Java returns them in". Take a look at the
     FileListEntityProcessor code; the relevant bit is below.
     The ordering is whatever Java does, and I don't know what
     guarantees, if any, it makes.
    private void getFolderFiles(File dir, final List<Map<String, Object>> fileDetails) {
      // Fetch an array of file objects that pass the filter, however the
      // returned array is never populated; accept() always returns false.
      // Rather we make use of the fileDetails array which is populated as
      // a side effect of the accept method.
      dir.list(new FilenameFilter() {
        public boolean accept(File dir, String name) {
          File fileObj = new File(dir, name);
          if (fileObj.isDirectory()) {
            if (recursive) getFolderFiles(fileObj, fileDetails);
          } else if (fileNamePattern == null) {
            addDetails(fileDetails, dir, name);
          } else if (fileNamePattern.matcher(name).find()) {
            if (excludesPattern != null && excludesPattern.matcher(name).find())
              return false;
            addDetails(fileDetails, dir, name);
          }
          return false;
        }
      });
    }
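
If order matters, the safest thing is to impose it yourself rather than
rely on the listing. The Javadoc for File.list() says there is no guarantee
the names come back in any specific order. Below is a small sketch of
sorting files explicitly before processing them. OrderedFiles and its
methods are hypothetical helpers of mine, not part of DIH, and whether
name or modification time is the right key depends on how your files are
produced.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helpers that impose an explicit processing order on files. */
final class OrderedFiles {

  /** Returns a copy of the list sorted by file name; the input is not modified. */
  static List<File> byName(List<File> files) {
    List<File> sorted = new ArrayList<>(files);
    sorted.sort(Comparator.comparing(File::getName));
    return sorted;
  }

  /**
   * Returns a copy sorted by last-modified time, oldest first, using the
   * file name as a tie-breaker so the result is deterministic.
   */
  static List<File> byLastModified(List<File> files) {
    List<File> sorted = new ArrayList<>(files);
    sorted.sort(Comparator.comparingLong(File::lastModified)
        .thenComparing(File::getName));
    return sorted;
  }
}
```

Note that java.io.File exposes last-modified time, not creation time, so
"oldest first" here means least recently modified.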

On Tue, Sep 27, 2011 at 4:51 PM, Gabriel Cooper <inanutshel...@gmail.com> wrote:
> I'm researching using DataImportHandler to import my data files utilizing
> FileDataSource with FileListEntityProcessor and have a couple questions
> before I get started that I'm hoping you guys can assist with.
>
> 1) I would like to put a file on the local filesystem in the configured
> location and have Solr see and process the file without additional effort on
> my part.
> 1a) Is this doable in any way? From what I've seen, this is not supported
> and I must manually call a URL (e.g.
> http://foo/solr/dataimport?command=full-import).
> 1b) The manual, URL-based invocation method seems perfectly logical in a
> database-oriented world, where one might schedule an update to run regularly
> but in my case I have a couple identical indexes I load balance between and
> don't want to run the same hefty query multiple times in parallel. As such,
> I'm doing one query, writing the results to an XML file, pushing that file
> to each box, and then wanting that file processed. I'd like the process to
> be as automated as possible.
>
> 2) I would like any files processed by Solr to be deleted after they've been
> imported. I haven't seen any way to do this currently. I thought I might be
> able to subclass something, but FileListEntityProcessor, for example,
> doesn't seem to give any handles at the right time in the workflow to delete
> a file.
>
> 3) When reading the DIH documentation, I ran across this statement: "When
> delta-import command is executed, it reads the start time stored in *
> conf/dataimport.properties*. It uses that timestamp to run delta queries and
> after completion, updates the timestamp in *conf/dataimport.properties*." If
> it really does update the date to the completion date, what happens to any
> files added between the start and end dates? Are they lost?
>
> 4) For delta imports, I don't see mention of how processed files are ordered
> other than that it tries not to re-import files older than that mentioned in
> the conf/dataimport.properties file. In cases where order matters, does it
> order the files by name or creation date or ...?
>
> Thanks for any help,
>
> Gabriel.
>
