Hi,
I have been trying to use Tesseract through the DataImportHandler in Solr, and
it works very well with English. However, my documents are in Danish, so I
need to change the Tesseract language setting to Danish as well. Is that
possible from Solr?
I was using the /update/extract handler to import single files into Solr, and
it worked for a single file. How would I index several files from a
file system?
Here is the request handler I used:
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">false</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>
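For the multi-file case, one possible approach is a small client-side script that walks a directory and posts each file to the same handler. The following is only a sketch under assumptions, not a confirmed solution: the Solr URL, the core name "mycore", the use of the file path as literal.id, and all function names are hypothetical and not taken from my setup. It does not touch the Tesseract language question.

```python
import mimetypes
import os
import urllib.parse
import urllib.request

# Assumption: Solr runs locally and the core is called "mycore" (hypothetical).
SOLR_EXTRACT_URL = "http://localhost:8983/solr/mycore/update/extract"


def list_files(root):
    """Return every regular file below root, sorted for a stable order."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def build_extract_url(base_url, doc_id, commit=False):
    """Build the request URL; literal.id supplies the document's id field."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return base_url + "?" + urllib.parse.urlencode(params)


def post_file(base_url, path, commit=False):
    """POST one file's raw bytes to the extract handler and return the HTTP status."""
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        build_extract_url(base_url, path, commit),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_directory(base_url, root):
    """Post every file under root, asking Solr to commit only on the last one."""
    files = list_files(root)
    for position, path in enumerate(files):
        post_file(base_url, path, commit=(position == len(files) - 1))
    return len(files)
```

Calling index_directory(SOLR_EXTRACT_URL, "/path/to/documents") would send one request per file. Committing only on the last request is a design choice to avoid a commit per document; the directory path here is a placeholder.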
Martin Frank Hansen, Senior Data Analytiker
Data, IM & Analytics
Lautrupparken 40-42, DK-2750 Ballerup
E-mail [email protected] Web www.kmd.dk
Mobile +4525571418