Hello everyone,

I'm pulling together requirements for an SCM (source code manager) crawling 
mechanism for our Solr index. I probably don't need to argue the need for a 
crawler, but to be specific: our index currently receives its updates from a 
custom-built application. I would also like to periodically crawl the SCM to 
ensure the index is up to date. In addition, if changes are made that require 
a complete reindex (such as schema.xml modifications), I could use this 
crawler to reindex everything or just specific areas.

I'm wondering whether there are any initiatives, tools (like Nutch), or 
whitepapers out there that cover crawling an SCM. More specifically, I'm 
looking for a Perforce solution. I'm guessing there is nothing that specific, 
and I'm prepared to design to our own requirements, but I wanted to check with 
the Solr community before getting too far in.

I'm most likely going to build the solution to interact with the SCM directly 
(via its API) rather than syncing the SCM repository to the filesystem and 
crawling that, since syncing the data could run into filesystem problems and 
because there may be relevant metadata that can be retrieved from the SCM.
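
To make the idea concrete, here is a rough sketch of the reconciliation step 
such a crawler might perform. It is only an illustration under my own 
assumptions: ScmFile, plan_reindex, and the scope parameter are hypothetical 
names, the file listing is assumed to have already been fetched from the SCM 
API, and the indexed revisions are assumed to be stored in Solr alongside each 
document. It does not call Perforce or Solr itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScmFile:
    """One entry from the SCM listing (hypothetical shape, not a Perforce type)."""
    path: str              # depot path, e.g. //depot/project/README
    revision: int          # head revision reported by the SCM
    deleted: bool = False  # True if the head revision is a delete


def plan_reindex(scm_files, indexed_revisions, full_reindex=False, scope=""):
    """Compare the SCM listing with what the index holds.

    scm_files:         iterable of ScmFile for the crawled area
    indexed_revisions: dict mapping path -> revision currently in the index
    full_reindex:      if True, every live file is reindexed (e.g. after a
                       schema change), regardless of revision
    scope:             path prefix that was crawled; indexed paths outside it
                       are left alone

    Returns (to_index, to_delete) as two lists of paths.
    """
    to_index = []
    to_delete = []
    seen = set()

    for f in scm_files:
        seen.add(f.path)
        if f.deleted:
            # Deleted in the SCM but still present in the index.
            if f.path in indexed_revisions:
                to_delete.append(f.path)
            continue
        # New file, changed revision, or forced reindex.
        if full_reindex or indexed_revisions.get(f.path) != f.revision:
            to_index.append(f.path)

    # Indexed paths inside the crawled scope that the SCM no longer reports.
    to_delete.extend(
        sorted(p for p in indexed_revisions
               if p.startswith(scope) and p not in seen)
    )
    return to_index, to_delete
```

A periodic crawl would call plan_reindex with full_reindex=False, while a 
schema change would call it with full_reindex=True, optionally restricted to 
one area through scope.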

Thanks in advance for any information you may have,
Kristian
