There might be an OCR plugin for Apache Tika, which (I believe) does exactly this out of the box, extracting text from PDF, Word, and similar files for indexing, except for the OCR capability.
http://lucene.apache.org/tika/

-mike

2010/2/4 Kranti™ K K Parisa <kranti.par...@gmail.com>

> Hi,
>
> Can anyone list the best OCR APIs available to use in combination with
> SOLR.
>
> The idea is to take a scanned file (format could be pdf,word,image..etc) as
> input and give OCRd file which could be used to get the contents for the
> SOLR indexing.
>
> Best Regards,
> Kranti K K Parisa
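
For what it's worth, the pipeline described in the question (scanned file in, OCR text out, text sent to Solr) could be sketched roughly as below. This is only an illustration under several assumptions, not something from the thread: it assumes the Tesseract command-line tool is installed as the OCR engine, that the Solr instance accepts JSON on its update handler, and that the schema has `id` and `content` fields. The function names are made up for the example.

```python
import json
import subprocess
import urllib.request


def ocr_image(path):
    """Run the Tesseract CLI on an image file and return the recognized text.

    Assumes the `tesseract` binary is on PATH. Passing "stdout" as the
    output base makes Tesseract print the text instead of writing a file.
    """
    result = subprocess.run(
        ["tesseract", path, "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def build_update_payload(doc_id, text):
    """Build a JSON request body for Solr's update handler from OCR text."""
    # OCR output tends to contain stray line breaks; collapse all whitespace.
    cleaned = " ".join(text.split())
    # "id" and "content" are assumed field names; adjust to the real schema.
    return json.dumps([{"id": doc_id, "content": cleaned}]).encode("utf-8")


def post_to_solr(update_url, payload):
    """POST the payload to a Solr update URL and return the HTTP status."""
    request = urllib.request.Request(
        update_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Usage would look something like `post_to_solr(url, build_update_payload("scan-1", ocr_image("scan-1.png")))`, where `url` is the update endpoint of your own core (for example a path ending in `/update?commit=true`). PDF and Word inputs would need to be rendered to images, or passed through Tika first, before an OCR step like this applies.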