Hello,

I should start by saying that I am not a programmer and have very little
knowledge in this area.


What I would like to know is whether Apache would be capable of doing the following:

Take an extensive list (A) of unique strings of words (these are titles,
anywhere from 4 to 30 words long) saved in either an Excel worksheet or a text
file, and search for instances (B) where these can be found in PDF files saved
on a hard drive (over 100k files). The search would need to use fuzzy matching
rather than exact matching, and the output would be an Excel file listing the
unique string found (A), the name of the file in which the match was made (B),
the page number where the match was made, and the surrounding text on either
side of the match.

As well, would this be a complicated program? Could it be used by novices who
have been coached in how to input the title file (A) and point the search at
the relevant folder containing the PDF files (B)?
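For illustration, the matching step of this request could be sketched roughly
as below. This is a minimal sketch under stated assumptions, not a finished
tool: it assumes the text of each PDF page has already been extracted (for
example with a PDF text-extraction library such as Apache PDFBox), it uses
Python's standard difflib.SequenceMatcher similarity ratio over a sliding
window of words as one possible stand-in for "fuzzy logic", and it writes CSV,
which Excel can open, rather than a native .xlsx file. The function names
fuzzy_find and write_csv, the 0.85 threshold, the 10-word context size and the
sample data are all hypothetical.

```python
import csv
import difflib
import io


def fuzzy_find(titles, pages, threshold=0.85, context_words=10):
    """Return (title, file, page, score, text_before, text_after) rows.

    titles: the title strings from list (A).
    pages:  (file_name, page_number, page_text) tuples. ASSUMPTION: the page
            text was already extracted from the PDFs by some other tool.
    threshold and context_words are illustrative defaults, not tuned values.
    Only the best-scoring match per title per page is kept.
    """
    rows = []
    for title in titles:
        target = " ".join(title.lower().split())
        if not target:
            continue  # skip blank lines in the title list
        n = len(target.split())  # compare windows with the same word count
        for file_name, page_no, text in pages:
            words = text.split()
            lowered = [w.lower() for w in words]
            best = None  # (score, start index) of the best window on this page
            for i in range(len(words) - n + 1):
                window = " ".join(lowered[i:i + n])
                score = difflib.SequenceMatcher(None, target, window).ratio()
                if score >= threshold and (best is None or score > best[0]):
                    best = (score, i)
            if best is not None:
                score, i = best
                before = " ".join(words[max(0, i - context_words):i])
                after = " ".join(words[i + n:i + n + context_words])
                rows.append((title, file_name, page_no, round(score, 2), before, after))
    return rows


def write_csv(rows, out):
    """Write the rows as CSV (Excel can open this; it is not a native .xlsx)."""
    writer = csv.writer(out)
    writer.writerow(["title", "file", "page", "score", "text_before", "text_after"])
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical sample data standing in for list (A) and extracted PDF pages.
    titles = ["A Study of River Erosion"]
    pages = [("report1.pdf", 3, "As discussed in a study of river erosian published in 1998 the banks")]
    buf = io.StringIO()
    write_csv(fuzzy_find(titles, pages), buf)
    print(buf.getvalue())
```

Here the misspelled "erosian" on page 3 of report1.pdf still matches the title,
and the row records the words on either side of it. Comparing every title
against every window of every page is slow, so with over 100k files a real
tool would likely need some form of indexing or pre-filtering first.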


I eagerly await (hopefully) an affirmative answer.


Cheers!
