Dear Shashi,

As I understand it, large data sets such as a Lucene index are not well suited to frequent updates. Frequent updates will hurt performance and consistency when the index has to be replicated across a large-scale cluster. Such a search engine is expected to work in a write-once, read-many environment, right? That is what HDFS (Hadoop Distributed File System) provides. In my experience, updating a Lucene index is really slow.
Why did you say I could update Lucene index frequently?

Thanks so much!
Bing

On Mon, Jan 23, 2012 at 11:02 PM, Shashi Kant <sk...@sloan.mit.edu> wrote:

> You can update the document in the index quite frequently. IDNK what
> your requirement is, another option would be to boost query time.
>
> On Sun, Jan 22, 2012 at 5:51 AM, Bing Li <lbl...@gmail.com> wrote:
> > Dear Shashi,
> >
> > Thanks so much for your reply!
> >
> > However, I think the value of PageRank is not a static one. It must update
> > on the fly. As I know, Lucene index is not suitable to be updated too
> > frequently. If so, how to deal with that?
> >
> > Best regards,
> > Bing
> >
> >
> > On Sun, Jan 22, 2012 at 12:43 PM, Shashi Kant <sk...@sloan.mit.edu> wrote:
> >>
> >> Lucene has a mechanism to "boost" up/down documents using your custom
> >> ranking algorithm. So if you come up with something like Pagerank
> >> you might do something like doc.SetBoost(myboost), before writing to
> >> index.
> >>
> >>
> >>
> >> On Sat, Jan 21, 2012 at 5:07 PM, Bing Li <lbl...@gmail.com> wrote:
> >> > Hi, Kai,
> >> >
> >> > Thanks so much for your reply!
> >> >
> >> > If the retrieving is done on a string field, not a text field, a complete
> >> > matching approach should be used according to my understanding, right? If
> >> > so, how does Lucene rank the retrieved data?
> >> >
> >> > Best regards,
> >> > Bing
> >> >
> >> > On Sun, Jan 22, 2012 at 5:56 AM, Kai Lu <lukai1...@gmail.com> wrote:
> >> >
> >> >> Solr is kind of retrieval step, you can customize the score formula in
> >> >> Lucene. But it supposes not to be too complicated, like it's better can be
> >> >> factorization. It also regards to the stored information, like
> >> >> TF,DF,position, etc. You can do 2nd phase rerank to the top N data you have
> >> >> got.
> >> >>
> >> >> Sent from my iPad
> >> >>
> >> >> On Jan 21, 2012, at 1:33 PM, Bing Li <lbl...@gmail.com> wrote:
> >> >>
> >> >> > Dear all,
> >> >> >
> >> >> > I am using SolrJ to implement a system that needs to provide users with
> >> >> > searching services. I have some questions about Solr searching as follows.
> >> >> >
> >> >> > As I know, Lucene retrieves data according to the degree of keyword
> >> >> > matching on text field (partial matching).
> >> >> >
> >> >> > But, if I search data by string field (complete matching), how does Lucene
> >> >> > sort the retrieved data?
> >> >> >
> >> >> > If I want to add new sorting ways, Solr's function query seems to support
> >> >> > this feature.
> >> >> >
> >> >> > However, for a complicated ranking strategy, such PageRank, can Solr
> >> >> > provide an interface for me to do that?
> >> >> >
> >> >> > My ranking ways are more complicated than PageRank. Now I have to load all
> >> >> > of matched data from Solr first by keyword and rank them again in my ways
> >> >> > before showing to users. It is correct?
> >> >> >
> >> >> > Thanks so much!
> >> >> > Bing
> >> >> >
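
For reference, here is a minimal sketch of the second-phase re-ranking suggested earlier in the thread: fetch the top N hits by keyword, then re-order only those with a custom score. It does not call SolrJ or Lucene. The function name `rerank_top_n`, the `external_scores` map (for example, PageRank-like values kept outside the index so they can change without reindexing), and the linear `weight` blend are all assumptions made for illustration, not something the thread or Solr prescribes.

```python
from typing import Dict, List, Tuple


def rerank_top_n(
    hits: List[Tuple[str, float]],
    external_scores: Dict[str, float],
    n: int = 100,
    weight: float = 0.5,
) -> List[str]:
    """Re-order the first n hits by a blend of engine score and external score.

    hits            -- (doc_id, relevance_score) pairs, already sorted by the
                       search engine, best first (hypothetical input shape)
    external_scores -- doc_id -> custom rank value maintained outside the index
    n               -- how many of the leading hits to re-rank
    weight          -- share of the external score in the blend, 0.0 to 1.0
    """
    top = hits[:n]

    def combined(hit: Tuple[str, float]) -> float:
        doc_id, relevance = hit
        # Documents without an external score fall back to 0.0.
        return (1.0 - weight) * relevance + weight * external_scores.get(doc_id, 0.0)

    # sorted() is stable, so ties keep the engine's original order.
    return [doc_id for doc_id, _ in sorted(top, key=combined, reverse=True)]


if __name__ == "__main__":
    hits = [("a", 0.9), ("b", 0.8), ("c", 0.1)]
    external = {"a": 0.0, "b": 1.0, "c": 1.0}
    print(rerank_top_n(hits, external, n=2))
```

Because the custom scores live outside the index in this sketch, changing them does not require rewriting any indexed document; the cost is that only the first N hits are ever reconsidered.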