Thanks a lot, Alexandre, for the response. Much appreciated.

Thanks
Saurabh

On Fri, Mar 28, 2014 at 8:56 AM, Alexandre Rafalovitch
<arafa...@gmail.com> wrote:
> 1. You don't actually put PDF/Word files into Solr. Instead, each file is
> run through a content and metadata extraction process, and the extracted
> text is what gets indexed. This is important because "a computer" does
> not understand what you are looking for when you open a PDF. It only
> understands whatever text it is possible to extract. In the case of PDF
> that is often not much at all, unless the file was generated with an
> accessibility layer in place. You can experiment with what can be
> extracted by downloading a standalone Apache Tika install, which has a
> command-line version, or by using Solr's extractOnly flag. Solr uses
> Tika internally, so the results should be the same.
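
To make the extractOnly suggestion concrete, here is a minimal sketch that only builds the request URL; it does not send anything. The host, port, and core name are hypothetical, and the `/update/extract` path is an assumption based on Solr's example configuration, so check your own solrconfig.xml.

```python
from urllib.parse import urlencode


def extract_only_url(base_url: str, core: str) -> str:
    """Build the URL for an extract-only request (nothing gets indexed)."""
    params = {
        "extractOnly": "true",    # return the extracted content instead of indexing it
        "extractFormat": "text",  # ask for plain text rather than XHTML
        "wt": "json",             # response format
    }
    # Assumption: the extracting handler is mapped to /update/extract.
    return f"{base_url}/{core}/update/extract?{urlencode(params)}"


# Hypothetical host and core name, for illustration only.
url = extract_only_url("http://localhost:8983/solr", "collection1")
print(url)
```

The PDF or Word file would then be POSTed to that URL as a multipart upload, and the response shows the text Solr would have had available to index.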
>
> 2. When you do a search you can use "field:(Keyword1 Keyword2 Keyword3
> Keyword4)" and you get back any document that matches at least one of
> those. I'm not sure about 1000 of them in one go, but certainly a large
> number.
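
A rough sketch of how a long keyword list could be turned into queries of that shape. The field name `text` and the batch size are assumptions. Batching is there because Solr's maxBooleanClauses setting (1024 in the stock solrconfig.xml) caps how many clauses one query may contain.

```python
def quote_term(term: str) -> str:
    """Wrap a keyword or short phrase in double quotes so it is matched as a phrase."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_queries(field: str, keywords: list, batch_size: int = 500) -> list:
    """Split the keywords into batches and build one field:(a OR b ...) query per batch."""
    queries = []
    for start in range(0, len(keywords), batch_size):
        batch = keywords[start:start + batch_size]
        clause = " OR ".join(quote_term(k) for k in batch)
        queries.append(field + ":(" + clause + ")")
    return queries


# Hypothetical field name and keywords.
queries = build_queries("text", ["foo", "bar baz", "qux"], batch_size=2)
for q in queries:
    print(q)
```

Each resulting string can be sent as the `q` parameter of a normal select request. The keyword list itself could come from an Excel export saved as CSV.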
>
> On the other hand, if you have the same keywords all the time and you are
> trying to match documents against them, you might be more interested
> in Elasticsearch's percolator
> (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html
> ) or in Luwak (https://github.com/flaxsearch/luwak).
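
The idea behind those tools is a reversed search: the keywords are stored once and each new document is run against them. The toy sketch below illustrates only that idea in plain Python; it is not the Elasticsearch or Luwak API. The line numbers it reports refer to lines of the extracted text, which will not necessarily match the layout of the original PDF or Word file.

```python
import re


def build_matchers(terms):
    """Compile one case-insensitive, whole-word pattern per stored term."""
    return {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }


def match_document(text, matchers):
    """Return (term, line_number) pairs for every stored term found in the text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for term, pattern in matchers.items():
            if pattern.search(line):
                hits.append((term, line_no))
    return hits


# Hypothetical dictionary terms and extracted document text.
matchers = build_matchers(["foo bar", "baz"])
hits = match_document("Foo bar here\nnothing\nBAZ!", matchers)
print(hits)
```

A phrase that wraps across two lines of extracted text would be missed by this line-by-line scan.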
>
> Regards,
>    Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr 
> proficiency
>
>
> On Fri, Mar 28, 2014 at 10:05 AM, Saurabh Agarwal
> <sagarwal1...@gmail.com> wrote:
>> Thanks a lot, Alex, for your reply. I appreciate it.
>>
>> So, if I leave out the line number part:
>> 1. I gather that PDF/Word documents can be put into Solr for searching,
>> so these documents will go into Solr.
>> 2. For searching, is there an automatic way to supply an Excel sheet or a
>> large list of keywords to search for?
>> That is, I have thousands of words that I want to search for in the docs.
>> Can I do it collectively, or do I have to send the search queries one by one?
>>
>> Thanks
>> Saurabh
>>
>>
>>
>> On Fri, Mar 28, 2014 at 6:48 AM, Alexandre Rafalovitch
>> <arafa...@gmail.com> wrote:
>>> This feels somewhat backwards. It's very hard to extract line-number
>>> information out of MS Word and next to impossible from PDF. So the
>>> question is not whether Solr is a good fit here; it is that your
>>> whole architecture may have a major issue. Can you do what you want by
>>> hand at least once? Down to the precision you want?
>>>
>>> If you can, then yes, you can probably automate the searching with
>>> Solr, though you will still have serious issues (sentences crossing
>>> line boundaries, etc.). But I suspect your whole approach will change
>>> once you try to do this manually.
>>>
>>> Regards,
>>>    Alex.
>>> Personal website: http://www.outerthoughts.com/
>>> Current project: http://www.solr-start.com/ - Accelerating your Solr 
>>> proficiency
>>>
>>>
>>> On Thu, Mar 27, 2014 at 11:46 PM, Saurabh Agarwal
>>> <sagarwal1...@gmail.com> wrote:
>>>> Can anyone help me, please?
>>>>
>>>> Hi All,
>>>>
>>>> I am new to Solr, and from my initial reading I am quite convinced Solr
>>>> will be of great help. Can anyone help me make that decision?
>>>>
>>>> Use case:
>>>> 1. I will have PDF and Word docs generated daily/weekly (a lot of them),
>>>> which get overwritten frequently.
>>>> 2. I have a kind of dictionary (a list of words/short sentences that
>>>> should appear in the above docs, words that must not appear, and
>>>> alternatives for some of them).
>>>> 3. Now I want Solr to search the docs produced in step 1 for the
>>>> words/short sentences from step 2 and give me the doc name and line
>>>> number in which they occur.
>>>>
>>>> Will Solr be a good fit for me? If anybody can help by giving some
>>>> examples, that would be great.
>>>>
>>>> Appreciate your help and patience.
>>>>
>>>> Thanks
>>>> Saurabh
