Kristian,

For what it's worth, for http://search-lucene.com and http://search-hadoop.com we simply check out the source code from the SCM and index it from the file system. It works reasonably well. The only issues I can recall us having are with how the source code is organized under the SCM: modules get moved around, and sometimes that requires us to update things on our end to match those changes.
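
A minimal sketch of the check-out-then-index approach, to make it concrete. This is not the code behind search-lucene.com; it assumes the working copy has already been synced to disk by your SCM client, that Solr exposes a JSON update handler at the URL shown, and that the schema has dynamic fields matching `*_s` and `*_t`. The paths, core name, field names, and function names are all hypothetical.

```python
import json
import os
import urllib.request

# Hypothetical values: adjust to your own checkout location and Solr core.
CHECKOUT_DIR = "/path/to/working-copy"
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=true"

# SCM bookkeeping directories that should not be indexed.
SKIP_DIRS = {".git", ".svn", ".hg"}


def build_docs(root):
    """Walk a checked-out tree and yield one Solr document per text file."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into SCM metadata.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            try:
                with open(full, encoding="utf-8") as fh:
                    text = fh.read()
            except (UnicodeDecodeError, OSError):
                continue  # skip binaries and unreadable files
            # Field names are assumptions; they must exist in your schema.
            yield {"id": rel, "path_s": rel, "content_t": text}


def post_docs(docs, url=SOLR_UPDATE_URL):
    """Send all documents to Solr's JSON update handler in one request."""
    body = json.dumps(list(docs)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Example (requires a running Solr): post_docs(build_docs(CHECKOUT_DIR))
```

Note that the document id here is the path relative to the checkout root, so when a module moves in the SCM its files get new ids and the old documents stay in the index until they are deleted. That is one way the reorganization issue can surface in a setup like this.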

Otis
----
Performance Monitoring for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html

>________________________________
> From: "Van Tassell, Kristian" <kristian.vantass...@siemens.com>
>To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>Sent: Friday, April 20, 2012 3:26 PM
>Subject: Crawling an SCM to update a Solr index
>
>Hello everyone,
>
>I'm in the process of pulling together requirements for a SCM (source code
>manager) crawling mechanism for our Solr index. I probably don't need to argue
>the need for a crawler, but to be specific, we have an index which receives
>its updates from a custom built application. I would, however, like to
>periodically crawl the SCM to ensure the index is up to date. In addition, if
>updates are made which require a complete reindex (such as schema.xml
>modifications), I could utilize this crawler to update everything or specific
>areas.
>
>I'm wondering if there are any initiatives, tools (like Nutch) or whitepapers
>out there, which crawl an SCM. More specifically, I'm looking for a Perforce
>solution. I'm guessing that there is nothing specific and I'm prepared to
>design to our specific requirements, but wanted to check with the Solr
>community prior to getting too far in.
>
>I'm most likely going to build the solution to interact with the SCM directly
>(via their API) versus sync'ing the SCM repository to the filesystem and crawl
>that way, since there could be filesystem problem syncing the data and because
>there may be relevant metadata information that can be retrieved from the SCM.
>
>Thanks in advance for any information you may have,
>Kristian