Hello everyone, I'm pulling together requirements for an SCM (source code management) crawling mechanism for our Solr index. I probably don't need to argue the case for a crawler, but to be specific: our index currently receives its updates from a custom-built application. I would, however, like to crawl the SCM periodically to ensure the index is up to date. In addition, if changes are made that require a complete reindex (such as schema.xml modifications), I could use this crawler to rebuild everything or just specific areas.
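To make the periodic check a bit more concrete, here is a rough sketch of the kind of reconciliation pass such a crawl might perform. It assumes the crawler can obtain a map of depot path to head revision from the SCM and a matching map of path to indexed revision from Solr. Both maps, the `plan_sync` function, and the `force_all` flag are hypothetical names used for illustration only, and no real Perforce or Solr calls are made.

```python
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    to_index: List[str]   # paths missing from the index or indexed at a different revision
    to_delete: List[str]  # paths still in the index but no longer present in the SCM


def plan_sync(scm_revs: Dict[str, int],
              indexed_revs: Dict[str, int],
              force_all: bool = False) -> SyncPlan:
    """Compare SCM state with index state and decide what to (re)index.

    scm_revs:     path -> head revision as reported by the SCM (assumed input)
    indexed_revs: path -> revision stored with the indexed document (assumed input)
    force_all:    reindex every SCM path, e.g. after a schema.xml change
    """
    to_index = sorted(
        path for path, rev in scm_revs.items()
        if force_all or indexed_revs.get(path) != rev
    )
    to_delete = sorted(path for path in indexed_revs if path not in scm_revs)
    return SyncPlan(to_index, to_delete)


if __name__ == "__main__":
    scm = {"//depot/a.txt": 3, "//depot/b.txt": 1}
    idx = {"//depot/a.txt": 2, "//depot/old.txt": 5}
    print(plan_sync(scm, idx))
```

Restricting the input maps to one depot subtree would cover the "specific areas" case, while `force_all=True` covers the complete reindex.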
I'm wondering if there are any initiatives, tools (like Nutch), or whitepapers out there that cover crawling an SCM. More specifically, I'm looking for a Perforce solution. I'm guessing there is nothing specific, and I'm prepared to design to our own requirements, but I wanted to check with the Solr community before getting too far in. I'm most likely going to build the solution to interact with the SCM directly (via its API) rather than syncing the SCM repository to the filesystem and crawling that, since syncing the data could run into filesystem problems and because the SCM may hold relevant metadata that can be retrieved directly.

Thanks in advance for any information you may have,
Kristian