Re: Unstructured/Structured data for indexing

2015-12-09 Thread Jack Krupansky
You can also use Solr Cell to send entire PDF or office documents: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika -- Jack Krupansky On Wed, Dec 9, 2015 at 3:09 AM, subinalex wrote: > Hi, > > I am a Solr newbie, just got a quick question. > > SOLR
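
A rough sketch of what a Solr Cell upload request could look like. It only builds the request URL for the extracting handler and does not contact a server. The host, the core name `mycore`, the document id `doc1`, and the helper name `solr_cell_url` are all hypothetical; the `/update/extract` path and the `literal.id` and `commit` parameters are the ones described on the Solr Cell page linked above.

```python
from urllib.parse import urlencode


def solr_cell_url(base_url, core, doc_id, commit=True):
    """Build the URL for posting one file to Solr's extracting handler.

    base_url, core and doc_id are illustrative values supplied by the caller.
    """
    # literal.<field> sets a field value directly on the resulting document.
    params = {"literal.id": doc_id}
    if commit:
        # Ask Solr to commit so the document becomes searchable right away.
        params["commit"] = "true"
    return f"{base_url.rstrip('/')}/{core}/update/extract?{urlencode(params)}"


# Hypothetical local Solr instance and core name.
url = solr_cell_url("http://localhost:8983/solr", "mycore", "doc1")
print(url)
```

The PDF or office file itself would then be sent as the body of a POST to that URL, and Tika extracts the text and metadata on the Solr side.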

Re: Unstructured/Structured data for indexing

2015-12-09 Thread Walter Underwood
Often Solr documents are “semi-structured”. They have some structured fields and some free-text fields. E-mail messages are like that, with structured headers and an unstructured body. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 9, 2015, at
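
To make the e-mail case concrete, here is a minimal sketch that splits a message into structured header fields and a free-text body using Python's standard `email` package. The sample message, the helper name `email_to_doc`, and the field names in the resulting dictionary are made up for illustration; a real Solr schema would define its own field names and types.

```python
from email import message_from_string

# A made-up message: structured headers, then a blank line, then free text.
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly report
Date: Wed, 09 Dec 2015 03:09:00 +0000

Hi Bob,
the report is attached. Let me know what you think.
"""


def email_to_doc(raw):
    """Turn one plain-text e-mail into a flat, semi-structured document."""
    msg = message_from_string(raw)
    return {
        # Structured fields: one value each, good for filtering and sorting.
        "from": msg["From"],
        "to": msg["To"],
        "subject": msg["Subject"],
        "date": msg["Date"],
        # Unstructured field: free text, good for full-text search.
        "body": msg.get_payload().strip(),
    }


doc = email_to_doc(RAW)
print(doc["subject"])
print(doc["body"])
```

The headers would map to exact-match or date fields, while the body would go into an analyzed text field.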

Re: Unstructured/Structured data for indexing

2015-12-09 Thread Alexandre Rafalovitch
Don't think about indexing so much, think about searching. Say you are searching a video. What does that mean? Do you want to match a random sequence of binary values that represent inter-frame change? Probably not. When you answer what you want to actually search (title? length? subscripts?), you w

Re: Unstructured/Structured data for indexing

2015-12-09 Thread subinalex
Thanks, Jürgen, for clarifying :-) On 9 Dec 2015 2:06 pm, "Jürgen Wagner (DVT) [via Lucene]" < ml-node+s472066n4244411...@n3.nabble.com> wrote: > Subin, > Only the envelope is structured. What's inside the individual fields of > the structure may be single values (possibly considered structure

Re: Unstructured/Structured data for indexing

2015-12-09 Thread DVT
Subin, Only the envelope is structured. What's inside the individual fields of the structure may be single values (possibly considered structured meta-data) or unstructured (like free text or other fields with informal semantics). Even if you pass a 5-hour video as a major case of unstructured d