New to Solr can someone help me to know if Solr fits my use case
Hi All,

I am new to Solr, and from my initial reading I am fairly convinced it will be a great help. Can anyone help me make that decision?

Use case:
1. I will have PDF and Word documents generated daily or weekly (a lot of them), and they get overwritten frequently.
2. I have a dictionary of sorts: a list of words and short sentences that should appear in those documents, words that must not appear, and alternatives for some of them.
3. I want Solr to search the documents from step 1 for the words and short sentences from step 2, and give me the document name and line number where each one occurs.

Will Solr be a good fit for this? If anybody can share some examples, that would be great.

I appreciate your help and patience.

Thanks,
Saurabh
Re: New to Solr can someone help me to know if Solr fits my use case
Can anyone help me please.
Re: New to Solr can someone help me to know if Solr fits my use case
Thanks a lot for your reply, Alex. I appreciate it.

So, if I leave out the line number part:
1. I assume PDF and Word files can be put into Solr for searching, so these documents will go into Solr.
2. For searching, is there an automated way to supply an Excel sheet or a large list of keywords? That is, I have thousands of words that I want to search for in the documents. Can I search for them collectively, or do I have to send the queries one by one?

Thanks,
Saurabh

On Fri, Mar 28, 2014 at 6:48 AM, Alexandre Rafalovitch wrote:
> This feels somewhat backwards. It's very hard to extract line-number
> information out of MS Word and next to impossible from PDF. So the
> question is not whether Solr is a good fit; it is that your whole
> architecture may have a major issue. Can you do what you want by hand
> at least once, down to the precision you want?
>
> If you can, then yes, you can probably automate the searching with
> Solr, though you will still have serious issues (sentences crossing
> line boundaries, etc.). But I suspect your whole approach will change
> once you try to do this manually.
>
> Regards,
>    Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
Re: New to Solr can someone help me to know if Solr fits my use case
Thanks a lot for the response, Alexandre. Much appreciated.

Thanks,
Saurabh

On Fri, Mar 28, 2014 at 8:56 AM, Alexandre Rafalovitch wrote:
> 1. You don't actually put PDF/Word files into Solr. Instead, each file
> is run through a content and metadata extraction process, and the
> extracted text is what gets indexed. This is important because "a
> computer" does not understand what you are looking at when you open a
> PDF. It only understands whatever text it is possible to extract. In
> the case of PDF that is often not much at all, unless the file was
> generated with an accessibility layer in place. You can experiment
> with what you can extract by downloading a standalone Apache Tika
> install, which has a command-line version, or by using Solr's
> extractOnly flag. Solr uses Tika internally, so the results should be
> the same.
>
> 2. When you do a search, you can use "field:(Keyword1 Keyword2
> Keyword3 Keyword4)" and you get as results any document that matches
> one of those. Not sure about 1000 of them in one go, but certainly a
> large number.
>
> On the other hand, if you have the same keywords all the time and you
> are trying to match documents against them, you might be more
> interested in Elasticsearch's percolator
> (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html)
> or in Luwak (https://github.com/flaxsearch/luwak).
>
> Regards,
>    Alex.
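Editor's sketch of the "field:(Keyword1 Keyword2 ...)" approach from the reply above, for a keyword list too long to send in one go. It only builds query strings; it does not talk to Solr. The field name `content`, the batch size of 500, and all function names are assumptions made for illustration, not anything from the thread. The batch size is chosen to stay under Solr's `maxBooleanClauses` setting (1024 by default); adjust it to your configuration. Clauses are joined with an explicit OR so the result does not depend on the default operator, and multi-word entries are quoted so they are searched as phrases.

```python
from typing import Iterable, Iterator

# Characters that have special meaning in the Lucene/Solr query syntax.
_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(term: str) -> str:
    """Backslash-escape query-syntax characters in a single word."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in term)


def to_clause(keyword: str) -> str:
    """Turn one dictionary entry into a query clause."""
    words = keyword.split()
    if len(words) > 1:
        # Multi-word entry: send it as a quoted phrase. Inside quotes only
        # the backslash and the double quote need escaping.
        phrase = " ".join(words).replace("\\", "\\\\").replace('"', '\\"')
        return '"' + phrase + '"'
    return escape_term(words[0])


def build_queries(field: str, keywords: Iterable[str],
                  batch_size: int = 500) -> Iterator[str]:
    """Yield one `field:(a OR b OR ...)` query string per batch of keywords."""
    clauses = [to_clause(k) for k in keywords if k.strip()]
    for start in range(0, len(clauses), batch_size):
        batch = clauses[start:start + batch_size]
        yield f"{field}:({' OR '.join(batch)})"


if __name__ == "__main__":
    # Hypothetical dictionary entries; in practice these would be read from
    # the exported spreadsheet, one entry per row.
    entries = ["indemnify", "best efforts", "time-critical"]
    for q in build_queries("content", entries, batch_size=2):
        print(q)
```

Each yielded string would be sent as the `q` parameter of a separate search request. A match tells you which documents contain at least one keyword in the batch, not which keyword matched; if you need per-keyword results, one query per keyword (or the percolator-style tools mentioned above) may be a better fit.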
Question related to Solr LTR plugin
Hi,

I have a question about the Solr LTR plugin. I have a personalization use case and wonder whether you can help me with it.

I would like to rerank my query results based on the relationship between the searcher and the author of each returned document. I have the relationship scores in an external datastore, in the form: user1 (searcher), user2 (author), relationship score. In my query, I can pass the searcher ID as an external feature.

My question is: at query time, how do I retrieve the relationship score for each document as a feature and rerank the documents? Would I need to implement a custom feature to do so, and if so, how do I implement it?

Thanks,
Saurabh