from:"Shairon"

Non-linear structure for search and index documents

2009-04-08 Thread Shairon

Hi all, I need to index and search words extracted from PDF files, together with their coordinates and page number, so I have this structure:
- index the document id
- a document has many pages
- a page has many words
- a word has geometry [w, h, x, y] (inside the page)
Is this possible with Solr? If yes, ho
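One way to picture the structure described above is to flatten it into one record per word, each carrying the document id, the page number and the geometry. This is only a minimal sketch: the `Word` class, the `flatten` helper and every field name (`doc_id`, `page`, `word`, `x`, `y`, `w`, `h`) are hypothetical and not taken from the thread or from any Solr schema, and it says nothing about how phrase search across words would work.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Word:
    """A word with its geometry inside a page (hypothetical model)."""
    text: str
    x: float
    y: float
    w: float
    h: float


def flatten(doc_id: str, pages: List[List[Word]]) -> List[Dict[str, Union[str, int, float]]]:
    """Turn document -> pages -> words into one flat record per word.

    pages[0] is treated as page 1. Field names are illustrative only.
    """
    records = []
    for page_number, words in enumerate(pages, start=1):
        for position, word in enumerate(words):
            records.append({
                # Unique key built from document id, page and word position.
                "id": f"{doc_id}-{page_number}-{position}",
                "doc_id": doc_id,
                "page": page_number,
                "word": word.text,
                "x": word.x,
                "y": word.y,
                "w": word.w,
                "h": word.h,
            })
    return records
```

Each record keeps enough information to draw a highlight rectangle on the right page once a word matches a query.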

Phrase search issue with XMLPayload? Is it the better solution?

2010-01-04 Thread Shairon

I have a project that involves words extracted by OCR. Each page has words, and each word has its geometry, which is used to blink a highlight for the end user. I've been trying to represent this document structure in XML: foo bar baz qux Using the field 'fulltext_st' ,

Re: Solr / Tika Integration

2012-02-10 Thread Shairon Toledo

ample the word
> Medizin is extracted as Me di zin.
> As a consequence the suggestions are often unusable and the search does not
> work as expected.
>
> Has anyone a suggestion how to extract the content of PDF containing
> soft hyphens without fragmenting it?
>
> Best
> Dirk
>
--
[ ]'s
Shairon Toledo
http://www.google.com/profiles/shairon.toledo
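A possible cleanup step for the soft-hyphen problem quoted above, as a sketch only: it assumes the extracted text still contains the Unicode soft hyphen character (U+00AD) rather than spaces. If the extractor has already turned the soft hyphens into spaces, as in "Me di zin", this will not help. The helper name `strip_soft_hyphens` is hypothetical and not part of Solr or Tika.

```python
SOFT_HYPHEN = "\u00ad"  # Unicode SOFT HYPHEN, an invisible hyphenation hint


def strip_soft_hyphens(text: str) -> str:
    """Remove soft hyphens so hyphenated words are indexed as a single token."""
    return text.replace(SOFT_HYPHEN, "")
```

Applied to the extracted text before it is sent for indexing, this would rejoin "Me", "di" and "zin" into "Medizin" whenever the three parts are separated only by U+00AD.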