I want to extract text from a PDF and preserve any tables, or at least
convert them to CSV. I am using the pdftotext-go package
(github.com/heussd/pdftotext-go), which relies on the Poppler software.
The text is extracted vertically (i.e. one column at a time) and each
piece of text is separated by a space. There are no line breaks, which
makes the output difficult to manipulate. I want to extract the text
horizontally (row by row) to preserve the table layout, and if possible
add line breaks to allow for further manipulation.
Your help in this matter is appreciated. Please suggest alternatives if
available.

Here is the Go code:

    package main

    import (
        "fmt"
        "log"
        "os"

        pdftotext "github.com/heussd/pdftotext-go"
    )

    func main() {
        // Replace "test.pdf" with the path to your PDF file
        pdfPath := "test.pdf"

        // Read the whole file into memory (os.ReadFile opens and closes
        // the file itself, so a separate os.Open is not needed)
        content, err := os.ReadFile(pdfPath)
        if err != nil {
            log.Fatalf("Failed to read PDF file: %v", err)
        }

        // Extract text from the PDF file
        text, err := pdftotext.Extract(content)
        if err != nil {
            log.Fatalf("Failed to extract text from PDF file: %v", err)
        }

        // Print the extracted text
        fmt.Println(text)
    }
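
One alternative I am considering, as a rough and untested sketch: Poppler's
pdftotext command has a -layout option that keeps the physical layout of
the page, so rows should come out as lines. The program below assumes the
pdftotext binary is on PATH, and it assumes that columns are separated by
two or more spaces in the -layout output, which is only a heuristic and
will not hold for every table. The names columnGap and splitColumns are my
own and do not come from any library.

```go
package main

import (
	"encoding/csv"
	"log"
	"os"
	"os/exec"
	"regexp"
	"strings"
)

// columnGap matches runs of two or more spaces. Treating such a run as a
// column boundary is an assumption about the -layout output, not a rule.
var columnGap = regexp.MustCompile(` {2,}`)

// splitColumns turns one layout-preserved line into table cells.
func splitColumns(line string) []string {
	return columnGap.Split(strings.TrimSpace(line), -1)
}

func main() {
	// "-layout" keeps the physical layout; "-" sends the text to stdout.
	out, err := exec.Command("pdftotext", "-layout", "test.pdf", "-").Output()
	if err != nil {
		log.Fatalf("Failed to run pdftotext: %v", err)
	}

	w := csv.NewWriter(os.Stdout)
	for _, line := range strings.Split(string(out), "\n") {
		if strings.TrimSpace(line) == "" {
			continue // skip blank lines and page-break-only lines
		}
		if err := w.Write(splitColumns(line)); err != nil {
			log.Fatalf("Failed to write CSV row: %v", err)
		}
	}
	w.Flush()
	if err := w.Error(); err != nil {
		log.Fatalf("Failed to flush CSV output: %v", err)
	}
}
```

With this heuristic a line such as "Name    Qty   Price" would become the
three cells Name, Qty and Price, while a cell containing single spaces
(e.g. "Unit price") stays in one piece. Cells that are empty in the PDF
would shift the remaining columns to the left, so this is a starting point
at best.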