Hi Dmitry;

I think that kind of hack may reduce search speed. Shouldn't this be done with a boundary scanner instead? Is bs.type=LINE what I am looking for?

There is one more point. I want to do this for Turkish. Do I need to customize the boundary scanner, or, if I insert special characters to mark the sentence boundaries, can I use the simple boundary scanner?

Thanks;
Furkan KAMACI

2014-03-24 21:14 GMT+02:00 Dmitry Kan <solrexp...@gmail.com>:

> Hi Furkan,
>
> I have done an implementation with a custom filler (special character)
> sequence in between sentences. A better solution I landed at was increasing
> the position of each sentence's first token by a large number, like 10000
> (perhaps, a smaller number could be used too). Then a user search can be
> conducted with a proximity query: "some tokens" ~5000 (the recently
> committed complexphrase parser supports rich phrase syntax, for example).
> This of course expects that a sentence fits the 5000 window size and the
> total number of sentences in the field * 10k does not exceed
> Integer.MAX_VALUE. Then on the highlighter side you'd get the hits within
> sentences naturally.
>
> Is this something you are looking for?
>
> Dmitry
>
> On Mon, Mar 24, 2014 at 5:43 PM, Furkan KAMACI <furkankam...@gmail.com>
> wrote:
>
> > Hi;
> >
> > When I generate snippet via Solr I do not want to remove beginning of any
> > sentence at the snippet. So I need to do a sentence detection. I think that
> > I can do it before I send documents into Solr. I can put some special
> > characters that signs beginning or end of a sentence. Then I can use that
> > information when generating snippet. On the other hand I should not show
> > that special character to the user.
> >
> > What do you think that how can I do it or do you have any other ideas for
> > my purpose?
> >
> > PS: I do not do it for English sentences.
> >
> > Thanks;
> > Furkan KAMACI
>
> --
> Dmitry
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
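
For reference, the position-gap idea from Dmitry's message can be simulated outside Solr. The sketch below is only an illustration in plain Python, not Solr or Lucene code: the function names (`split_sentences`, `assign_positions`, `within_window`, `max_sentences`) are hypothetical, the sentence splitter is a naive punctuation rule that is not Turkish-aware, and the window check is a simple position difference rather than Lucene's actual slop computation. The numbers 10000 and 5000 are the ones mentioned in the thread.

```python
import re

SENTENCE_GAP = 10_000   # gap added at each sentence start (value from the thread)
WINDOW = 5_000          # proximity window used in the query (value from the thread)
INT_MAX = 2**31 - 1     # Java's Integer.MAX_VALUE


def split_sentences(text):
    """Naive splitter on ., ! or ? followed by whitespace (assumption, not Turkish-aware)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def assign_positions(sentences, gap=SENTENCE_GAP):
    """Give each token a position; the first token of every sentence
    after the first one jumps ahead by `gap` instead of by 1."""
    positions = []
    pos = -1
    for s_idx, sentence in enumerate(sentences):
        tokens = [t.strip(".,!?;:").lower() for t in sentence.split()]
        tokens = [t for t in tokens if t]
        for t_idx, token in enumerate(tokens):
            jump = gap if (s_idx > 0 and t_idx == 0) else 1
            pos += jump
            positions.append((token, pos))
    return positions


def within_window(positions, a, b, window=WINDOW):
    """Simplified proximity test: do `a` and `b` occur at most `window` positions apart?"""
    pos_a = [p for tok, p in positions if tok == a]
    pos_b = [p for tok, p in positions if tok == b]
    return any(abs(pa - pb) <= window for pa in pos_a for pb in pos_b)


def max_sentences(gap=SENTENCE_GAP):
    """Upper bound on sentences per field before positions would pass INT_MAX."""
    return INT_MAX // gap


text = "Ankara is the capital. Istanbul is the largest city."
positions = assign_positions(split_sentences(text))
print(within_window(positions, "ankara", "capital"))    # same sentence
print(within_window(positions, "capital", "istanbul"))  # across a sentence boundary
print(max_sentences())
```

With a gap of 10000 and a window of 5000, two terms can only match when they sit in the same sentence, which is why the highlighter hits end up inside sentence boundaries. The same arithmetic gives the limit Dmitry mentions: the number of sentences in a field times the gap has to stay below Integer.MAX_VALUE.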