Hi Otis and Jack,

I have done some research on highlights and debugged the code. I see that highlights are query dependent and are not stored. Why does Solr use Lucene for storing text, i.e. the content of a web page? Is there any comparison of storing text in HBase or another database versus storing it in Lucene?
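A minimal sketch of why highlights are query dependent, in case it helps the discussion. This is a toy in plain Python, not how Solr's highlighter is implemented; the function name `highlight`, the `window` parameter, and the sample text are all made up for illustration. The point is only that the snippet is computed from the stored text and the query together, so the same page yields a different snippet for each query.

```python
import re


def highlight(text, query, window=30):
    """Return one snippet per query term found in `text`, with matches wrapped in <em>."""
    snippets = []
    for term in query.lower().split():
        # Take the first occurrence of the term (case-insensitive), if any.
        match = re.search(re.escape(term), text, flags=re.IGNORECASE)
        if match is None:
            continue
        # Cut a fragment of up to `window` characters on each side of the match.
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        fragment = text[start:end]
        # Mark every occurrence of the term inside the fragment.
        marked = re.sub(
            re.escape(term),
            lambda m: "<em>" + m.group(0) + "</em>",
            fragment,
            flags=re.IGNORECASE,
        )
        snippets.append(marked)
    return snippets


# Hypothetical page text; the same document produces different snippets per query.
page = "Solr uses Lucene to index text. HBase is a column store built on Hadoop."
print(highlight(page, "lucene"))
print(highlight(page, "hbase"))
```

Because the output depends on the query, there is nothing to precompute per document; what has to be available at query time is the original text itself.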
Also, has anybody on the Solr user list used something other than Lucene to store the text of documents?

2013/4/11 Otis Gospodnetic <otis.gospodne...@gmail.com>

> Source code is your best bet. Wiki has info about how to use it, but
> not how highlighting is implemented. But you don't need to understand
> the implementation details to understand that they are dynamic,
> computed specifically for each query for each matching document, so
> you cannot store them anywhere ahead of time.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
> On Thu, Apr 11, 2013 at 11:22 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:
> > Hi Otis;
> >
> > It seems that I should read more about highlights. Is there anywhere
> > that explains in detail how highlights are generated in Solr?
> >
> > 2013/4/11 Otis Gospodnetic <otis.gospodne...@gmail.com>
> >
> >> Hi,
> >>
> >> You can't store highlights ahead of time because they are query
> >> dependent. You could store documents in HBase and use Solr just for
> >> indexing. Is that what you want to do? If so, a custom
> >> SearchComponent executed after QueryComponent could fetch data from
> >> an external store like HBase. I'm not sure if I'd recommend that.
> >>
> >> Otis
> >>
> >> On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:
> >> > Actually, I am not planning to store documents in Solr. I want to
> >> > store just the highlights (snippets) in HBase and retrieve them
> >> > from HBase when needed.
> >> > What do you think about separating just the highlights from Solr
> >> > and storing them in HBase with SolrCloud? By the way, I would
> >> > welcome an explanation of at which step and how highlights are
> >> > generated in Solr.
> >> >
> >> > 2013/4/9 Otis Gospodnetic <otis.gospodne...@gmail.com>
> >> >
> >> >> You may also be interested in looking at things like solrbase
> >> >> (on Github).
> >> >>
> >> >> Otis
> >> >>
> >> >> On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI <furkankam...@gmail.com> wrote:
> >> >> > Hi;
> >> >> >
> >> >> > First of all, I should mention that I am new to Solr and am
> >> >> > researching it. What I am trying to do is crawl some websites
> >> >> > with Nutch and then index them with Solr. (Nutch 2.1,
> >> >> > Solr-SolrCloud 4.2)
> >> >> >
> >> >> > I wonder about something. I have a cloud of machines that
> >> >> > crawls websites and stores the documents. Then I send those
> >> >> > documents to SolrCloud. Solr indexes the documents, generates
> >> >> > the indexes, and saves them. I know from Information Retrieval
> >> >> > theory that it *may* not be efficient to store indexes in a
> >> >> > NoSQL database (they are something like linked lists, and if
> >> >> > you store them in that kind of database you *may* end up with
> >> >> > a sparse representation. By the way, there may be some
> >> >> > solutions for this; I would welcome an explanation of them.)
> >> >> >
> >> >> > However, Solr stores some documents too (i.e. for highlights),
> >> >> > so some of my documents will be duplicated. Since I will have
> >> >> > many documents, those duplicated documents may cause a problem
> >> >> > for me. So is there any way to avoid storing those documents
> >> >> > in Solr and point to them in HBase (where I save my crawled
> >> >> > documents) instead, or, rather than pointing, to store them
> >> >> > directly in HBase (and is that efficient or not)?
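
The split discussed in this thread (index in one place, document text in an external store, text fetched after the query has run) can be sketched roughly as below. This is a plain-Python illustration under stated assumptions, not Solr or HBase code: `ExternalStore` is a dict standing in for HBase, `IndexOnly` is a toy inverted index standing in for Solr, and `search_with_text` plays the role that a custom SearchComponent running after QueryComponent would play. All three names are hypothetical.

```python
from collections import defaultdict


class ExternalStore:
    """Stand-in for an external document store such as HBase (a plain dict here)."""

    def __init__(self):
        self._rows = {}

    def put(self, doc_id, text):
        self._rows[doc_id] = text

    def get(self, doc_id):
        return self._rows[doc_id]


class IndexOnly:
    """Toy inverted index mapping term -> document ids; it keeps no document text."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, term):
        return sorted(self._postings.get(term.lower(), set()))


def search_with_text(index, store, term):
    """Query the index first, then fetch the text of each hit from the store."""
    return [(doc_id, store.get(doc_id)) for doc_id in index.search(term)]


# Hypothetical crawled pages: the text goes to the store, only terms go to the index.
store = ExternalStore()
index = IndexOnly()
for doc_id, text in [
    ("page1", "solr indexes crawled pages"),
    ("page2", "hbase stores crawled pages"),
]:
    store.put(doc_id, text)
    index.add(doc_id, text)

print(search_with_text(index, store, "hbase"))
```

The trade-off this makes visible is that every hit costs an extra lookup in the external store, and anything that needs the original text at query time, such as highlighting, has to wait for that lookup.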