Hi all,
I need to index and search words extracted from PDF files, together with their
coordinates and page numbers, so I have this structure:
- the document id is indexed
- a document has many pages
- a page has many words
- a word has a geometry [w, h, x, y] (relative to its page)
Is this possible with Solr?
If so, how?
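One common way to fit this hierarchy into Solr is to flatten it, so that each
word becomes its own Solr document carrying the document id, the page number
and the geometry. The sketch below only illustrates that idea; the field names
(id, doc_id, page, word, x, y, w, h) and the helper flatten_to_solr_docs are
hypothetical and would have to match whatever is defined in your schema.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word record: text plus its box on the page
    text: str
    x: float
    y: float
    w: float
    h: float


def flatten_to_solr_docs(doc_id, pages):
    """Turn document -> pages -> words into one flat dict per word.

    `pages` is a list of pages in order, each page being a list of Word.
    Page numbers start at 1. Field names are assumptions, not a fixed schema.
    """
    docs = []
    for page_no, words in enumerate(pages, start=1):
        for i, word in enumerate(words):
            docs.append({
                "id": f"{doc_id}-p{page_no}-w{i}",  # unique key per word
                "doc_id": doc_id,
                "page": page_no,
                "word": word.text,
                "x": word.x, "y": word.y, "w": word.w, "h": word.h,
            })
    return docs


docs = flatten_to_solr_docs(
    "doc1",
    [[Word("Hello", 10, 20, 40, 12)], [Word("world", 5, 5, 30, 12)]],
)
```

With documents like these, a query such as q=word:hello&fq=doc_id:doc1 would
return each matching word together with its page and coordinates, which is
what the highlight needs. The trade-off is many small documents per PDF.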
I have a project that involves words extracted by OCR: each page has words,
and each word has its geometry so that a highlight can be flashed to the end user.
I've been trying to represent this document structure in XML:
foo
bar
baz
qux
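If the goal is to send that structure to Solr as XML, Solr's XML update format
expects flat <add><doc><field name="...">...</field></doc></add> messages
rather than nested page/word elements. A minimal sketch, assuming one Solr
document per word and the same hypothetical field names as above
(id, doc_id, page, word, x, y, w, h):

```python
import xml.etree.ElementTree as ET


def words_to_solr_xml(doc_id, page_no, words):
    """Build a Solr <add> message for one page.

    `words` is a list of (text, x, y, w, h) tuples; the field names
    used here are assumptions and must exist in your schema.
    """
    add = ET.Element("add")
    for i, (text, x, y, w, h) in enumerate(words):
        doc = ET.SubElement(add, "doc")
        fields = {
            "id": f"{doc_id}-p{page_no}-w{i}",
            "doc_id": doc_id,
            "page": page_no,
            "word": text,
            "x": x, "y": y, "w": w, "h": h,
        }
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


xml_message = words_to_solr_xml("doc1", 1, [("foo", 1, 2, 3, 4)])
```

The resulting string can be posted to the core's update handler; the nesting
of document, page and word is kept only through the doc_id and page fields.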
Using the field 'fulltext_st',
> For example, the word
> Medizin is extracted as Me di zin.
> As a consequence the suggestions are often unusable and the search does not
> work as expected.
>
> Does anyone have a suggestion for how to extract the content of a PDF containing
> soft hyphens without fragmenting it?
>
> Best
> Dirk
>
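A soft hyphen is the character U+00AD, which only marks a possible line-break
point and is not part of the word. If the extracted text still contains it, one
option is to strip it before the text is sent to Solr. This is a sketch under
that assumption; if the extractor has already turned the soft hyphens into
spaces, as "Me di zin" may suggest, this step alone will not help and the
extraction itself has to be fixed.

```python
SOFT_HYPHEN = "\u00ad"


def remove_soft_hyphens(text: str) -> str:
    # U+00AD is an invisible, optional hyphenation point; dropping it
    # rejoins the word fragments it separated.
    return text.replace(SOFT_HYPHEN, "")
```

For example, remove_soft_hyphens("Me\u00addi\u00adzin") gives "Medizin".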
--
[ ]'s
Shairon Toledo
http://www.google.com/profiles/shairon.toledo