> y also be interested in looking at things like solrbase (on Github).
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
> On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI <furkankamaci@> wrote:
> > Hi;
> >
> > First of all, I should mention that I am new to Solr and am researching
> > it. What I am trying to do is crawl some websites with Nutch and then
> > index them with Solr. (Nutch 2.1, Solr-SolrCloud 4.2)
> >
> > I wonder about something. I have a cloud of machines that crawls websites
> > and stores the documents. Then I send those documents to SolrCloud. Solr
> > indexes the documents, generates indexes, and saves them. I know from
> > Information Retrieval theory that it *may* not be efficient to store
> > indexes in a NoSQL database (they are something like linked lists, and if
> > you store them in that kind of database you *may* have a sparse
> > representation. By the way, there may be some solutions for it; if you
> > can explain them, you are welcome to.)
> >
> > However, Solr stores some documents too (i.e. highlights), so some of my
> > documents will be duplicated somehow. Considering that I will have many
> > documents, those duplicated documents may cause a problem for me. So is
> > there any way to avoid storing those documents in Solr and point to them
> > in HBase (where I save my crawled documents) instead, or, rather than
> > pointing, to store them directly in HBase (is that efficient or not)?
Use Solr. It's pretty clear you don't yet have any problems that
would make you think about alternatives. Using Solr to store and not
just index will make your life simpler (and your app simpler and
likely faster).
Otis
--
Solr & ElasticSearch Support
http://sematext.com/
On Tue, Apr 16, 20
Thanks again for your answer. If there is any document about such comparisons,
I would like to read it.
By the way, is there any advantage to using Lucene instead of anything
else, such as this:
Lucene is natively supported by Solr, and if I use anything else I
may face some compatibili
People do sometimes use other data stores to retrieve data; Mongo, for
example, is popular for that. Like I hinted in another email, I wouldn't
necessarily recommend this for common cases. Don't do it unless you
really know you need it. Otherwise, just store in Solr.
Otis
--
Solr & ElasticSearch Support
Hi Otis and Jack;
I have done some research on highlights and debugged the code. I see that
highlights are query dependent and not stored. Why does Solr use Lucene for
storing text, i.e. the content of a web page? Is there any comparison
of storing text in HBase or any other database versus Lucene
Source code is your best bet. Wiki has info about how to use it, but
not how highlighting is implemented. But you don't need to understand
the implementation details to understand that they are dynamic,
computed specifically for each query for each matching document, so
you cannot store them anyw
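To illustrate the point that highlights are computed per query, here is a minimal sketch. It assumes a toy `highlight` function written for this example; it is not Solr's highlighter and only wraps the first match of the query term, but it shows the same document producing a different snippet for each query.

```python
import re


def highlight(text: str, query: str, window: int = 20) -> str:
    """Return a snippet around the first match of `query`, with the match in <em> tags.

    Hypothetical helper for illustration only; returns "" when nothing matches.
    """
    match = re.search(re.escape(query), text, flags=re.IGNORECASE)
    if match is None:
        return ""
    start = max(0, match.start() - window)
    end = min(len(text), match.end() + window)
    return (
        text[start:match.start()]
        + "<em>" + match.group(0) + "</em>"
        + text[match.end():end]
    )


# One stored document, two different queries, two different snippets.
doc = "Nutch crawls the web pages and Solr indexes the crawled documents."
snippet_a = highlight(doc, "solr")
snippet_b = highlight(doc, "nutch")
print(snippet_a)
print(snippet_b)
```

Because the snippet depends on the query term, there is nothing fixed to save ahead of time; only the underlying text can be stored.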
Hi Otis;
It seems that I should read more about highlights. Is there anywhere that
explains in detail how highlights are generated in Solr?
2013/4/11 Otis Gospodnetic
> Hi,
>
> You can't store highlights ahead of time because they are query
> dependent. You could store documents in HBase and
Hi,
You can't store highlights ahead of time because they are query
dependent. You could store documents in HBase and use Solr just for
indexing. Is that what you want to do? If so, a custom
SearchComponent executed after QueryComponent could fetch data from
external store like HBase. I'm not
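The arrangement described above (index only in the search engine, document bodies fetched from an external store after the query runs) might look roughly like the sketch below. Everything in it is an assumption made for illustration: a plain dict stands in for HBase, a toy inverted index stands in for Solr, and `build_index`, `search`, and `fetch_documents` are hypothetical names, not Solr's SearchComponent API.

```python
from collections import defaultdict

# Stand-in for HBase: row key -> raw document text (made-up sample data).
external_store = {
    "doc1": "Solr indexes documents crawled by Nutch.",
    "doc2": "HBase stores the raw crawled pages.",
}


def build_index(store):
    """Build a toy inverted index (term -> set of doc ids); no text is kept in it."""
    index = defaultdict(set)
    for doc_id, text in store.items():
        for term in text.lower().replace(".", " ").split():
            index[term].add(doc_id)
    return index


def search(index, term):
    """Query step: return only the matching doc ids, sorted for stable output."""
    return sorted(index.get(term.lower(), set()))


def fetch_documents(doc_ids, store):
    """Post-query step: look up the bodies of the matched ids in the external store."""
    return {doc_id: store[doc_id] for doc_id in doc_ids}


index = build_index(external_store)
hits = search(index, "crawled")
results = fetch_documents(hits, external_store)
print(hits)
print(results)
```

The trade-off is an extra lookup per result against the external store, in exchange for not keeping a second copy of each document in the index.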
Actually, I am not planning to store documents in Solr. I want to store just
the highlights (snippets) in HBase and retrieve them from HBase when
needed.
What do you think about separating just the highlights from Solr and storing
them in HBase with SolrCloud? By the way, if you explain at which process
You may also be interested in looking at things like solrbase (on Github).
Otis
--
Solr & ElasticSearch Support
http://sematext.com/
On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI wrote:
> Hi;
>
> First of all, I should mention that I am new to Solr and am researching
> it. What I am tr
s, one for
text index and metadata store, and the other for raw store of the original
document bytes.
-- Jack Krupansky
-Original Message-
From: Furkan KAMACI
Sent: Saturday, April 06, 2013 6:01 PM
To: solr-user@lucene.apache.org
Subject: Pointing to Hbase for Docuements or Directly S
Hi;
First of all, I should mention that I am new to Solr and am researching
it. What I am trying to do is crawl some websites with Nutch
and then index them with Solr. (Nutch 2.1, Solr-SolrCloud 4.2)
I wonder about something. I have a cloud of machines that crawls websites