Sort for Retrieved Data
Dear all, I have a question about sorting data retrieved from Solr. As I understand it, Lucene ranks retrieved data by the degree of keyword matching on a text field (partial matching). If I search on a string field instead (complete matching), how does Lucene sort the retrieved data? If I add filters, such as a time filter, how does that affect the sorting? If I only need the top results, is it proper simply to limit the rows parameter? And if I want to add new ways of sorting, how do I do that? Thanks so much! Bing
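For what it's worth, an exact match on a string field gives every hit the same relevance score, so an explicit sort is usually wanted. A minimal sketch of composing such a request URL; the host, query, and field names here are invented for illustration, not taken from the thread:

```java
// Sketch: building a Solr select URL with an explicit sort and a row limit.
// The base URL and field names (category, publishTime) are hypothetical.
public class SortQueryUrl {
    static String queryUrl(String base, String q, String sortField, boolean desc, int rows) {
        // An explicit sort parameter overrides the default relevance-score
        // ordering; rows limits the response to the top results only.
        return base + "/select?q=" + q
                + "&sort=" + sortField + (desc ? "%20desc" : "%20asc")
                + "&rows=" + rows;
    }

    public static void main(String[] args) {
        System.out.println(queryUrl("http://localhost:8983/solr",
                "category:news", "publishTime", true, 10));
    }
}
```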
How to Sort By a PageRank-Like Complicated Strategy?
Dear all, I am using SolrJ to implement a system that provides users with search services. I have some questions about Solr searching as follows. As I understand it, Lucene ranks retrieved data by the degree of keyword matching on a text field (partial matching). But if I search on a string field (complete matching), how does Lucene sort the retrieved data? If I want to add new ways of sorting, Solr's function query seems to support this. However, for a complicated ranking strategy such as PageRank, can Solr provide an interface for me to do that? My ranking is more complicated than PageRank. Right now I have to load all of the matched data from Solr by keyword first and re-rank it my own way before showing it to users. Is this correct? Thanks so much! Bing
Re: How to Sort By a PageRank-Like Complicated Strategy?
Hi, Kai, Thanks so much for your reply! If the retrieval is done on a string field rather than a text field, a complete-matching approach should be used, according to my understanding, right? If so, how does Lucene rank the retrieved data? Best regards, Bing On Sun, Jan 22, 2012 at 5:56 AM, Kai Lu wrote: > Solr is the retrieval step; you can customize the score formula in > Lucene. But it should not be too complicated; it is better if it can be > factorized. It also depends on the stored information, such as TF, DF, > position, etc. You can do a 2nd-phase rerank of the top N data you have > got. > > Sent from my iPad > > On Jan 21, 2012, at 1:33 PM, Bing Li wrote: > > [...]
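Kai's two-phase idea can be sketched in plain Java. This is an illustration only, not Solr API code: the weights and document values are invented, and in a real system the top-N window would come from a Solr query rather than a hard-coded list.

```java
import java.util.*;

// Second-phase rerank sketch: Solr returns the top-N documents by keyword
// score; we re-order that small window by a custom rank value combined with
// the retrieval score. Weights 0.3/0.7 are illustrative only.
public class Rerank {
    static class Doc {
        final String id;
        final double solrScore;  // keyword-matching score from Solr
        final double customRank; // e.g. a PageRank-like value we computed
        Doc(String id, double solrScore, double customRank) {
            this.id = id; this.solrScore = solrScore; this.customRank = customRank;
        }
    }

    // Combine the retrieval score with the custom rank value.
    static double combined(Doc d) {
        return 0.3 * d.solrScore + 0.7 * d.customRank;
    }

    // Re-sort the top-N window by the combined score, descending.
    static List<Doc> rerank(List<Doc> topN) {
        List<Doc> out = new ArrayList<>(topN);
        out.sort((a, b) -> Double.compare(combined(b), combined(a)));
        return out;
    }

    public static void main(String[] args) {
        List<Doc> topN = Arrays.asList(
                new Doc("a", 0.9, 0.1),
                new Doc("b", 0.5, 0.8));
        // "b" wins: 0.3*0.5 + 0.7*0.8 = 0.71 > 0.3*0.9 + 0.7*0.1 = 0.34
        System.out.println(rerank(topN).get(0).id);
    }
}
```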
Re: How to Sort By a PageRank-Like Complicated Strategy?
Dear Shashi, Thanks so much for your reply! However, I think the value of PageRank is not static. It must be updated on the fly. As far as I know, a Lucene index is not suitable for very frequent updates. If so, how do I deal with that? Best regards, Bing On Sun, Jan 22, 2012 at 12:43 PM, Shashi Kant wrote: > Lucene has a mechanism to "boost" documents up/down using your custom > ranking algorithm. So if you come up with something like PageRank, > you might do something like doc.SetBoost(myboost) before writing to the index. > > On Sat, Jan 21, 2012 at 5:07 PM, Bing Li wrote: > > [...]
Re: How to Sort By a PageRank-Like Complicated Strategy?
Dear Shashi, As I have learned, big data such as a Lucene index is not suitable for frequent updates. Frequent updating must affect performance and consistency when the Lucene index has to be replicated across a large-scale cluster. Such a search engine is expected to work in a write-once, read-many environment, right? That is what HDFS (Hadoop Distributed File System) provides. In my experience, updating a Lucene index is really slow. Why did you say I could update the Lucene index frequently? Thanks so much! Bing On Mon, Jan 23, 2012 at 11:02 PM, Shashi Kant wrote: > You can update documents in the index quite frequently. I don't know what > your requirement is; another option would be query-time boosting. > > On Sun, Jan 22, 2012 at 5:51 AM, Bing Li wrote: > > [...]
How is Data Indexed in HBase?
Dear all, I wonder how data in HBase is indexed. Solr is used in my system now because data is managed in an inverted index. Such an index is suitable for retrieving unstructured data in huge amounts. How does HBase deal with this issue? May I replace Solr with HBase? Thanks so much! Best regards, Bing
Re: Solr & HBase - Re: How is Data Indexed in HBase?
Mr Gupta, Thanks so much for your reply! Retrieving data by keyword is one of my use cases, and I think Solr is a proper choice for it. However, Solr does not provide rich enough support for ranking, and frequent updating is also not suitable in Solr, so it is difficult to retrieve data based on values other than keyword frequency in text. For that case, I am attempting to use HBase. But I don't know how HBase supports high performance when it needs to keep consistency in a large-scale distributed system. Now both of them are used in my system. I will check out ElasticSearch. Best regards, Bing On Thu, Feb 23, 2012 at 1:35 AM, T Vinod Gupta wrote: > Bing, > It's a classic battle whether to use Solr or HBase or a combination of > both. The two systems are very different, but there is some overlap in their > utility. They also differ vastly in computation power, storage needs, etc. > So in the end, it all boils down to your use case. You need to pick the > technology that is best suited to your needs. I'm still not clear on your > use case, though. > > BTW, if you haven't started using Solr yet, you might want to check out > ElasticSearch. I spent over a week researching Solr vs. ES and eventually > chose ES due to its merits. > > thanks > > On Wed, Feb 22, 2012 at 9:31 AM, Ted Yu wrote: >> There is no secondary index support in HBase at the moment. >> It's on our road map. >> FYI >> On Wed, Feb 22, 2012 at 9:28 AM, Bing Li wrote: >> > Jacques, >> > Yes. But I still have questions about that. >> > In my system, when users search with an arbitrary keyword, the query is >> > forwarded to Solr. There are no update operations on the Solr-managed data, >> > only appends of new indexes. >> > When I need to retrieve data based on ranking values, HBase is used, and >> > the ranking values need to be updated all the time. >> > Is that correct? >> > My question is that performance must be low if consistency is kept in a >> > large-scale distributed environment. How does HBase handle this issue? >> > Thanks so much! >> > Bing >> > On Thu, Feb 23, 2012 at 1:17 AM, Jacques wrote: >> > > It is highly unlikely that you could replace Solr with HBase. They're >> > > really apples and oranges. >> > > On Wed, Feb 22, 2012 at 1:09 AM, Bing Li wrote: >> > >> [...]
Re: Solr & HBase - Re: How is Data Indexed in HBase?
Dear Mr Gupta, Your understanding of my solution is correct. Now both HBase and Solr are used in my system. I hope it will work. Thanks so much for your reply! Best regards, Bing On Fri, Feb 24, 2012 at 3:30 AM, T Vinod Gupta wrote: > Regarding your question on HBase support for high performance and > consistency: I would say HBase is highly scalable and performant. How it > does what it does can be understood by reading the relevant chapters on > architecture and design in the HBase book. > > With regards to ranking, I see your problem. But if you split the problem > into an HBase-specific solution and a Solr-based solution, you can probably > achieve the results. Maybe you do the ranking, store the rank in HBase, and > then use Solr to get the results and use HBase as a lookup to get the rank. > Or you can put the rank into the document schema and index it too, for > range queries and such. Is my understanding of your scenario wrong? > > thanks > > On Wed, Feb 22, 2012 at 9:51 AM, Bing Li wrote: >> [...]
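The split Mr Gupta describes (Solr returns the matching IDs, HBase supplies the rank) can be sketched with a plain Map standing in for the HBase rank table; the document IDs and rank values below are invented for illustration, and a real system would issue HBase Get calls instead.

```java
import java.util.*;

// Sketch: order keyword hits from Solr by rank values kept in a lookup
// store. The Map stands in for an HBase table of ranks.
public class RankLookup {
    static List<String> orderByRank(List<String> solrHits, Map<String, Double> rankTable) {
        List<String> out = new ArrayList<>(solrHits);
        // Sort descending by rank; unknown documents default to rank 0.
        out.sort((a, b) -> Double.compare(
                rankTable.getOrDefault(b, 0.0),
                rankTable.getOrDefault(a, 0.0)));
        return out;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("doc1", "doc2", "doc3"); // from Solr
        Map<String, Double> ranks = new HashMap<>();               // from HBase
        ranks.put("doc1", 0.2);
        ranks.put("doc2", 0.9);
        ranks.put("doc3", 0.5);
        System.out.println(orderByRank(hits, ranks)); // [doc2, doc3, doc1]
    }
}
```

Because the rank lives outside the index, it can be updated as often as needed without rewriting Lucene segments, which addresses the frequent-update concern raised earlier in the thread.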
Re: pagerank??
According to my knowledge, Solr cannot support this directly. In my case, I get data by keyword matching from Solr and then rank the data by PageRank afterwards. Thanks, Bing On Wed, Apr 4, 2012 at 6:37 AM, Manuel Antonio Novoa Proenza < mano...@estudiantes.uci.cu> wrote: > Hello, > > I have many indexed documents in my Solr index. > > Let me know any efficient way or function to calculate the PageRank of > the indexed websites. > > http://www.uci.cu
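For reference, the PageRank computation itself is easy to run outside Solr and feed back in as a rank value. A minimal power-iteration sketch over a tiny, invented link graph (the damping factor 0.85 is the conventional choice):

```java
import java.util.Arrays;

// Minimal PageRank power iteration. outLinks[u] lists the nodes that node u
// links to; the three-node graph in main is invented for illustration.
public class PageRank {
    static double[] pagerank(int[][] outLinks, int iterations, double d) {
        int n = outLinks.length;
        double[] pr = new double[n];
        Arrays.fill(pr, 1.0 / n); // start from a uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - d) / n); // teleportation term
            for (int u = 0; u < n; u++)
                for (int v : outLinks[u])
                    next[v] += d * pr[u] / outLinks[u].length;
            pr = next;
        }
        return pr;
    }

    public static void main(String[] args) {
        // 0 -> 1, 1 -> 0, 2 -> 0: node 0 collects the most rank.
        int[][] g = {{1}, {0}, {0}};
        double[] pr = pagerank(g, 50, 0.85);
        System.out.println(pr[0] > pr[1] && pr[1] > pr[2]);
    }
}
```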
How to Transmit and Append Indexes
Hi, all, I am working on a distributed search system. Now I have only one server. It has to crawl pages from the Web, generate indexes locally, and respond to users' queries. I think this is too much load for it to work smoothly. I plan to use at least two servers. The jobs of crawling pages and generating indexes are done by one of them. After that, the newly available indexes should be transmitted to the other one, which is responsible for responding to users' queries. From the users' point of view, this system must be fast. However, I don't know how I can get the additional indexes to transmit. After transmission, how do I append them to the old indexes? Does the appending block searching? Thanks so much for your help! Bing Li
Is it fine to transmit indexes in this way?
Hi, all, Since I didn't find that Lucene exposes index updates to us, may I transmit indexes in the following way? 1) One indexing machine, A, is busy generating indexes; 2) After a certain time, the indexing process is terminated; 3) Then the new indexes are transmitted to the machines that serve users' queries; 4) It is possible that some index files have the same names, so the conflicting files should be renamed; 5) After the transmission is done, the transmitted indexes are removed from A; 6) After the removal, the indexing process is started again on A. The reason I am trying to do this is to balance the search load: one machine is responsible for generating indexes and the others are responsible for responding to queries. If the above approach does not work, may I see the updates of indexes in Lucene? May I transmit them? And may I append them to existing indexes? Does the appending affect querying? I am learning Solr, and it seems that Solr does this for me. However, I would have to set up Tomcat to use Solr, which I think is a little bit heavy. Thanks! Bing Li
Re: Is it fine to transmit indexes in this way?
Thanks so much, Gora! By "appending" I mean this: when updates are replicated to slave servers, I suppose the updates are merged with the existing indexes, and reads on them can proceed concurrently, so queries are answered without interruption. Does that happen in Solr? Best, Bing On Sat, Nov 20, 2010 at 1:58 AM, Gora Mohanty wrote: > On Fri, Nov 19, 2010 at 10:53 PM, Bing Li wrote: > > [...] > > Just replied to a similar question in another thread. The best way > is probably to use Solr replication: > http://wiki.apache.org/solr/SolrReplication > > You can set up replication to happen automatically upon commit on the > master server (where the new index was made). As a commit should > have been made when indexing is complete on the master server, this > will then ensure that the new index is replicated to the slave server. > > > 4) It is possible that some index files have the same names. So the > > conflicting files should be renamed; > > Replication will handle this for you. > > > 5) After the transmission is done, the transmitted indexes are removed from > > A. > > 6) After the removal, the indexing process is started again on A. > [...] > > These two items you have to do manually, i.e., delete all documents > on A and restart the indexing. > > > And, may I append them to existing indexes? > > Does the appending affect the querying? > [...] > > What do you mean by appending? If you mean adding to an existing index > (on reindexing, this would normally mean an update for an existing Solr > document ID, and a create for a new Solr document ID), the best way > probably is not to delete the index on the master server (what you call > machine A). Once the indexing is completed, a commit ensures that new > documents show up for any subsequent queries. > > Regards, > Gora
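For later readers: the built-in replication Gora points to is configured with a requestHandler in solrconfig.xml on both machines. A sketch following the SolrReplication wiki page; the host name, port, and confFiles list are examples, not values from this thread:

```xml
<!-- On the master (indexing) server, solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Publish a new index version to slaves after every commit -->
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each slave (query) server, solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- Poll the master for a new index version every 60 seconds -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```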
Re: How to Transmit and Append Indexes
Dear Erick, Thanks so much for your help! I am new to Solr, so I have no idea about the version. But I wonder: what are the differences between Solr and Hadoop? It seems that Solr does the same as what Hadoop promises. Best, Bing On Sat, Nov 20, 2010 at 2:28 AM, Erick Erickson wrote: > You haven't said what version of Solr you're using, but you're > asking about replication, which is built in. > See: http://wiki.apache.org/solr/SolrReplication > > And no, your slave doesn't block while the update is happening, > and it automatically switches to the updated index upon > successful replication. > > Older versions of Solr used rsync, etc. > > Best, > Erick > > On Fri, Nov 19, 2010 at 10:52 AM, Bing Li wrote: >> [...]
Re: How to Transmit and Append Indexes
Hi, Gora, No, I really wonder whether Solr is based on Hadoop. Hadoop is efficient for search engines since it suits the write-once-read-many model. After reading your emails, it looks like Solr's distributed file system does the same thing. Both are good for searching large indexes in a large-scale distributed environment, right? Thanks! Bing On Sat, Nov 20, 2010 at 3:01 AM, Gora Mohanty wrote: > On Sat, Nov 20, 2010 at 12:05 AM, Bing Li wrote: > > Dear Erick, > > Thanks so much for your help! I am new to Solr, so I have no idea about > > the version. > > The solr/admin/registry.jsp URL on your local Solr installation should show > you the version at the top. > > > But I wonder what are the differences between Solr and Hadoop? It seems > > that Solr has done the same as what Hadoop promises. > [...] > > Er, what? Solr and Hadoop are entirely different applications. Did you > mean Lucene or Nutch instead of Hadoop? > > Regards, > Gora
Import Data Into Solr
Hi, all, I am a new user of Solr. Before using it, all of my data was indexed by myself with Lucene. According to Chapter 3 of the book Solr 1.4 Enterprise Search Server by David Smiley and Eric Pugh, data in the formats of XML, CSV, and even PDF, etc., can be imported into Solr. If I wish to import my existing Lucene indexes into Solr, do I have any other approaches? I know that Solr is a serverized Lucene. Thanks, Bing Li
Solr Got Exceptions When "schema.xml" is Changed
Dear all, I am a new user of Solr. Now I am just trying some basic samples. Solr starts correctly with Tomcat. However, when I put a new schema.xml under SolrHome/conf and start Tomcat again, I get the following two exceptions. Solr cannot be started correctly unless I use the initial schema.xml from Solr. Why can't I change the schema.xml? Thanks so much! Bing

Dec 5, 2010 4:52:49 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:52)
        at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1146)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
-
SEVERE: Could not start SOLR. Check solr/home property
org.apache.solr.common.SolrException: QueryElevationComponent requires the schema to have a uniqueKeyField implemented using StrField
        at org.apache.solr.handler.component.QueryElevationComponent.inform(QueryElevationComponent.java:157)
        at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:508)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:588)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
        at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
        at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
        at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
        at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:98)
        at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4405)
        at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5037)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:812)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:787)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:570)
        at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:891)
        at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:683)
        at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:466)
        at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1267)
        at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:308)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
        at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:89)
        at org.apache.catalina.util.LifecycleBase.setState(LifecycleBase.java:328)
        at org.apache.catalina.util.LifecycleBase.setState(LifecycleBase.java:308)
        at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:1043)
        at org.apache.catalina.core.StandardHost.startInternal(StandardHost.java:738)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
        at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:1035)
        at org.apache.catalina.core.StandardEngine.startInternal(StandardEngine.java:289)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
        at org.apache.catalina.core.StandardService.startInternal(StandardService.java:442)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
        at org.apache.catalina.core.StandardServer.startInternal(StandardServer.java:674)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
        at org.apache.catalina.startup.Catalina.start(Catalina.java:596)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apach
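The second exception is the telling one: QueryElevationComponent requires the schema's uniqueKey field to be backed by solr.StrField. A schema.xml fragment along these lines usually resolves it; the field name id is the stock example from the default schema and an assumption here, and alternatively the elevation component can be removed from solrconfig.xml if it is not needed:

```xml
<!-- types section: a plain string type backed by solr.StrField -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- fields section: the unique key must use the StrField-based type -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>

<uniqueKey>id</uniqueKey>
```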
SolrHome and Solr Data Dir in solrconfig.xml
Dear all, I am a new user of Solr. When using Solr, SolrHome is set to /home/libing/Solr. When Tomcat is started, it must read solrconfig.xml to get the Solr data dir, which is used to contain the indexes. However, I have no idea how to associate SolrHome with the Solr data dir, so a mistake occurs: all the indexes are put under $TOMCAT_HOME/bin. This is NOT what I expect. I hope the indexes are under SolrHome. Could you please give me a hand? Best, Bing Li
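The index location is controlled by the dataDir element in solrconfig.xml; when the value is relative, it resolves against the working directory Tomcat was started from, which is why the indexes land under $TOMCAT_HOME/bin. A sketch pinning it under the SolrHome from the question (the /data suffix is a conventional choice, not a requirement):

```xml
<!-- solrconfig.xml: an absolute data directory, overridable at startup
     via the solr.data.dir system property -->
<dataDir>${solr.data.dir:/home/libing/Solr/data}</dataDir>
```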
Indexing and Searching Chinese
Hi, all, Right now I cannot search the index when querying with Chinese keywords. Before using Solr, I used Lucene for some time. Since I need to crawl some Chinese sites, I used ChineseAnalyzer in my Lucene code. I know Solr is a server for Lucene. However, I have no idea how to configure the analyzer in Solr. I appreciate your help so much! Best, LB
Indexing and Searching Chinese with SolrNet
Dear all, After reading some pages on the Web, I created the index with the following schema. .. .. It must be correct, right? However, when sending a query through SolrNet, no results are returned. Could you tell me what the reason is? Thanks, LB
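A field type along the following lines is the usual Solr 1.4-era schema.xml counterpart of Lucene's ChineseAnalyzer, using solr.ChineseTokenizerFactory; the type name text_zh and field name content are illustrative assumptions, and the field queried must actually use this type:

```xml
<!-- schema.xml sketch: a text type analyzed with the Chinese tokenizer -->
<fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ChineseTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="content" type="text_zh" indexed="true" stored="true"/>
```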
Re: Indexing and Searching Chinese with SolrNet
Dear Jelsma, My servlet container is Tomcat 7. I think it should accept Chinese characters, but I am not sure how to configure it. From the Tomcat console, I saw that the Chinese characters in the query are not displayed normally. However, they are fine in the Solr Admin page. I am also not sure whether SolrNet supports Chinese. If not, how can I interact with Solr on .NET? Thanks so much! LB On Wed, Jan 19, 2011 at 2:34 AM, Markus Jelsma wrote: > Why create two threads for the same problem? Anyway, is your servlet > container capable of accepting UTF-8 in the URL? Also, is SolrNet capable > of handling those characters? To confirm, try a tool like curl. > > > Dear all, > > After reading some pages on the Web, I created the index with the > > following schema. > > [...] > > It must be correct, right? However, when sending a query through SolrNet, > > no results are returned. Could you tell me what the reason is? > > Thanks, > > LB
Re: Indexing and Searching Chinese with SolrNet
Dear Jelsma, After configuring the Tomcat URIEncoding, Chinese characters are processed correctly. I appreciate your help so much! Best, LB On Wed, Jan 19, 2011 at 3:02 AM, Markus Jelsma wrote: > Hi, > > Yes, but Tomcat might need to be configured to accept it; see the wiki for > more information on this subject. > > http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config > > Cheers > > > Dear Jelsma, > > My servlet container is Tomcat 7. I think it should accept Chinese > > characters. But I am not sure how to configure it. > > [...]
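For later readers, the fix from the wiki page amounts to one attribute on the HTTP connector in Tomcat's conf/server.xml; the port and timeout values here are the stock defaults, not taken from this thread:

```xml
<!-- conf/server.xml: URIEncoding makes Tomcat decode URL query-string
     parameters as UTF-8 instead of the ISO-8859-1 default -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>
```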
SolrJ Tutorial
Hi, all,

In the past I always used SolrNet to interact with Solr, and it works great. Now I need to use SolrJ. Since Solr and SolrJ are homogeneous, I expected it to be even easier than SolrNet, but I cannot find a tutorial that is easy to follow: no tutorial explains SolrJ programming step by step, and no complete samples are available. Could anybody offer me some online resources to learn SolrJ?

I also noticed Solr Cell and the SolrJ POJO support. Do you have detailed resources on them?

Thanks so much!
LB
Re: SolrJ Tutorial
I got the solution. A complete working sample is attached below.

Thanks,
LB

package com.greatfree.Solr;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

import java.net.MalformedURLException;

public class SolrJExample
{
    public static void main(String[] args) throws MalformedURLException, SolrServerException
    {
        // Query an existing core.
        SolrServer solr = new CommonsHttpSolrServer("http://192.168.210.195:8080/solr/CategorizedHub");

        SolrQuery query = new SolrQuery();
        query.setQuery("*:*");
        QueryResponse rsp = solr.query(query);

        SolrDocumentList docs = rsp.getResults();
        System.out.println(docs.getNumFound());

        // Index a bean (a POJO whose fields are annotated with @Field)
        // into another core.
        try
        {
            SolrServer solrScore = new CommonsHttpSolrServer("http://192.168.210.195:8080/solr/score");

            Score score = new Score();
            score.id = "4";
            score.type = "modern";
            score.name = "iphone";
            score.score = 97;

            solrScore.addBean(score);
            solrScore.commit();
        }
        catch (Exception e)
        {
            System.out.println(e.toString());
        }
    }
}

On Sat, Jan 22, 2011 at 3:58 PM, Lance Norskog wrote:
> The unit tests are simple and show the steps.
>
> Lance
>
> On Fri, Jan 21, 2011 at 10:41 PM, Bing Li wrote:
> > Hi, all,
> >
> > In the past I always used SolrNet to interact with Solr, and it works
> > great. Now I need to use SolrJ. Since Solr and SolrJ are homogeneous, I
> > expected it to be even easier than SolrNet, but I cannot find a tutorial
> > that is easy to follow: no tutorial explains SolrJ programming step by
> > step, and no complete samples are available. Could anybody offer me some
> > online resources to learn SolrJ?
> >
> > I also noticed Solr Cell and the SolrJ POJO support. Do you have detailed
> > resources on them?
> >
> > Thanks so much!
> > LB
>
> --
> Lance Norskog
> goks...@gmail.com
SolrDocumentList Size vs NumFound
Dear all,

I ran into a strange problem. The number of matched documents is much more than 10, and getNumFound() reports the exact total, yet the size of the SolrDocumentList is only 10. When I iterate the results as follows, only 10 are displayed. How can I get the rest?

..
for (SolrDocument doc : docs)
{
    System.out.println(doc.getFieldValue(Fields.CATEGORIZED_HUB_TITLE_FIELD) + ": "
        + doc.getFieldValue(Fields.CATEGORIZED_HUB_URL_FIELD) + "; "
        + doc.getFieldValue(Fields.HUB_CATEGORY_NAME_FIELD) + "/"
        + doc.getFieldValue(Fields.HUB_PARENT_CATEGORY_NAME_FIELD));
}
..

Could you give me a hand?

Thanks,
LB
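Solr returns only `rows` documents per response (the default is 10), while numFound reports the total match count. To read everything, page through with the start and rows parameters (in SolrJ: query.setStart(...) and query.setRows(...) before each request). A minimal sketch of the paging arithmetic over the raw select URL; the host, core name, and match count of 43 are made up for illustration:

```shell
NUM_FOUND=43   # pretend value of rsp.getResults().getNumFound()
ROWS=10
START=0
# Five requests cover all 43 matches: start = 0, 10, 20, 30, 40.
while [ "$START" -lt "$NUM_FOUND" ]; do
  echo "http://localhost:8080/solr/CategorizedHub/select?q=*:*&start=${START}&rows=${ROWS}"
  START=$((START + ROWS))
done
```

For deep result sets, keep `rows` modest per request rather than asking for everything at once; very large single responses strain both Solr and the client.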
Too Many Open Files
Dear all,

I got an exception when querying the index in Solr. It tells me that too many files are open. How can I handle this problem?

Thanks so much!
LB

[java] org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Too many open files
[java]     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
[java]     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
[java]     at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
[java]     at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
[java]     at com.greatfree.Solr.Broker.Search(Broker.java:145)
[java]     at com.greatfree.Solr.SolrIndex.SelectHubPageHashByHubKey(SolrIndex.java:116)
[java]     at com.greatfree.Web.HubCrawler.Crawl(Unknown Source)
[java]     at com.greatfree.Web.Worker.run(Unknown Source)
[java]     at java.lang.Thread.run(Thread.java:662)
[java] Caused by: java.net.SocketException: Too many open files
[java]     at java.net.Socket.createImpl(Socket.java:397)
[java]     at java.net.Socket.<init>(Socket.java:371)
[java]     at java.net.Socket.<init>(Socket.java:249)
[java]     at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
[java]     at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
[java]     at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
[java]     at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
[java]     at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
[java]     at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
[java]     at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
[java]     at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
[java]     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
[java]     ... 8 more
[java] Exception in thread "Thread-96" java.lang.NullPointerException
[java]     at com.greatfree.Solr.SolrIndex.SelectHubPageHashByHubKey(SolrIndex.java:117)
[java]     at com.greatfree.Web.HubCrawler.Crawl(Unknown Source)
[java]     at com.greatfree.Web.Worker.run(Unknown Source)
[java]     at java.lang.Thread.run(Thread.java:662)
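Two common causes of this SocketException are worth checking: a low per-process file-descriptor limit, and constructing a new CommonsHttpSolrServer per request instead of sharing one thread-safe instance (each new instance opens fresh sockets, which can linger in TIME_WAIT). The limit can be inspected and raised from the shell before starting Tomcat; 65536 below is only an example value:

```shell
# Show the current open-file limit for this shell/process.
ulimit -n
# Try to raise it before launching Tomcat. This needs appropriate
# privileges, so failure is ignored here rather than aborting.
ulimit -n 65536 2>/dev/null || true
ulimit -n
```

For a permanent change, the limit is usually configured in /etc/security/limits.conf (or the init script that launches Tomcat) rather than an interactive shell.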
Re: Solr Out of Memory Error
Dear Adam,

I also got the OutOfMemory exception. I changed the JAVA_OPTS in catalina.sh as follows.

...
if [ -z "$LOGGING_MANAGER" ]; then
  JAVA_OPTS="$JAVA_OPTS -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager"
else
  JAVA_OPTS="$JAVA_OPTS -server -Xms8096m -Xmx8096m"
fi
...

Is this change correct? After that, I still got the same exception. The index is updated and searched frequently; I am trying to change the code to avoid the frequent updates, since I guess changing JAVA_OPTS alone does not work. Could you give me some help?

Thanks,
LB

On Wed, Jan 19, 2011 at 10:05 PM, Adam Estrada <estrada.adam.gro...@gmail.com> wrote:
> Is anyone familiar with the environment variable JAVA_OPTS? I set
> mine to a much larger heap size and never had any of these issues
> again.
>
> JAVA_OPTS = -server -Xms4048m -Xmx4048m
>
> Adam
>
> On Wed, Jan 19, 2011 at 3:29 AM, Isan Fulia wrote:
> > Hi all,
> > By adding more servers, do you mean sharding of the index? After
> > sharding, how will my query performance be affected? Will the query
> > execution time increase?
> >
> > Thanks,
> > Isan Fulia.
> >
> > On 19 January 2011 12:52, Grijesh wrote:
> > >
> > > Hi Isan,
> > >
> > > It seems your index size of 25 GB is much more than your total RAM of
> > > 4 GB. You can do two things to avoid the Out Of Memory problem:
> > > 1. Buy more RAM; add at least 12 GB more.
> > > 2. Increase the memory allocated to Solr by setting the -Xmx value;
> > >    allocate at least 12 GB to Solr.
> > >
> > > If your whole index fits into the cache memory, it will give you
> > > better results.
> > >
> > > Also add more servers to load-balance, as your QPS is high. Your 7
> > > lakh documents making a 25 GB index looks quite high; try to lower
> > > the index size. What are you indexing in your 25 GB index?
> > >
> > > -
> > > Thanx:
> > > Grijesh
> > > --
> > > View this message in context:
> > > http://lucene.472066.n3.nabble.com/Solr-Out-of-Memory-Error-tp2280037p2285779.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> > --
> > Thanks & Regards,
> > Isan Fulia.
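One detail in the catalina.sh snippet above: the -Xms/-Xmx flags sit in the else branch, so they apply only when LOGGING_MANAGER is already set, which is probably not what was intended. A less error-prone place for such options is bin/setenv.sh, which catalina.sh sources on startup. A sketch, reusing the heap sizes from the mail (tune them to the machine):

```shell
# bin/setenv.sh, sourced by catalina.sh at startup. Flags set here apply
# unconditionally, unlike edits buried inside catalina.sh branches.
JAVA_OPTS="$JAVA_OPTS -server -Xms8096m -Xmx8096m"
export JAVA_OPTS
```

Setting -Xms equal to -Xmx avoids heap resizing pauses; but as noted in the thread, if the working set (index plus caches) exceeds physical RAM, raising the heap alone will not cure OutOfMemory errors.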
Detailed Steps for Scaling Solr
Dear all,

I need to build a site that supports searching a large index, so I think scaling Solr is required. However, I haven't found a tutorial that walks through this step by step. I only have two references, and neither of them gives the exact operations.

1) http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr
2) David Smiley, Eric Pugh; Solr 1.4 Enterprise Search Server

If you have experience scaling Solr, could you point me to such tutorials?

Thanks so much!
LB
My Plan to Scale Solr
Dear all,

I started to learn how to use Solr three months ago, so my experience is still limited. Right now I crawl Web pages with my crawler and send the data to a single Solr server, and it runs fine.

Since the number of potential users is large, I have decided to scale Solr. After configuring replication, a single index can be replicated to multiple servers. I think shards are also required, so I plan to split the index according to data categories and priorities, and then apply the replication technique above to get high performance. The remaining work should not be so difficult.

I noticed some new terms, such as SolrCloud, Katta and ZooKeeper. As far as I currently understand, it seems I can ignore them. Am I right? What benefits would I get from using them?

Thanks so much!
LB
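For the sharding part of the plan above, distributed search in this generation of Solr is driven by the shards request parameter: the query is sent to any one node, which lists every shard, fans the query out, and merges the per-shard results. A sketch of the request URL; the hostnames and core names are placeholders:

```shell
# Comma-separated list of shard endpoints (host:port/path, no scheme).
SHARDS="solr1:8080/solr/news,solr2:8080/solr/sports"
# One logical query across both shards; the node receiving it merges
# and re-sorts the per-shard results before responding.
echo "http://solr1:8080/solr/select?q=web&shards=${SHARDS}"
```

This is exactly the kind of bookkeeping (shard lists, failover, adding nodes) that SolrCloud with ZooKeeper later automates, which is the main benefit of those terms rather than something safely ignored long-term.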
Selection Between Solr and Relational Database
Dear all,

I have been learning Solr for two months. At least right now, my system runs well on a Solr cluster. I have a question about implementing one feature in my system.

When retrieving documents by keyword, I believe Solr is faster than a relational database. However, if I do the following operations, I guess the performance must be lower. Is that right? What I am trying to do is:

1) All documents in Solr have one field that differentiates them, e.g., Group; different categories have different values in this field, so documents are classified as "news", "sports", "entertainment" and so on.

2) Retrieve all documents by the field Group.

3) Besides Group, there is another field called CreatedTime. I will filter the documents retrieved by Group according to the value of CreatedTime; the filtered documents are the final results I need.

I guess the performance of this operation is lower than a relational database, right? Could you please give me an explanation?

Best regards,
Li Bing
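The Group and CreatedTime restrictions described in steps 2 and 3 map naturally onto Solr filter queries (fq). Each fq is applied as a set intersection and cached independently in Solr's filterCache, so repeating the same Group filter across requests is cheap rather than slow. A sketch of the request; the field names come from the post, while the host and date range are made up (and the spaces in the range would need URL-encoding in a real request):

```shell
# q does the keyword match; each fq narrows the result set and is
# cached independently, so category + time filtering is a normal,
# fast pattern in Solr rather than a slow one.
Q='q=keyword&fq=Group:news&fq=CreatedTime:[2011-01-01T00:00:00Z TO NOW]'
echo "http://localhost:8080/solr/select?${Q}"
```

For this to work, CreatedTime should be indexed as a date type so range queries are efficient.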
Re: SolrJ Tutorial
Dear Lance,

Could you tell me where I can find the unit test code? I appreciate your help!

Best regards,
LB

On Sat, Jan 22, 2011 at 3:58 PM, Lance Norskog wrote:
> The unit tests are simple and show the steps.
>
> Lance
>
> On Fri, Jan 21, 2011 at 10:41 PM, Bing Li wrote:
> > Hi, all,
> >
> > In the past I always used SolrNet to interact with Solr, and it works
> > great. Now I need to use SolrJ. Since Solr and SolrJ are homogeneous, I
> > expected it to be even easier than SolrNet, but I cannot find a tutorial
> > that is easy to follow: no tutorial explains SolrJ programming step by
> > step, and no complete samples are available. Could anybody offer me some
> > online resources to learn SolrJ?
> >
> > I also noticed Solr Cell and the SolrJ POJO support. Do you have detailed
> > resources on them?
> >
> > Thanks so much!
> > LB
>
> --
> Lance Norskog
> goks...@gmail.com
When Index is Updated Frequently
Dear all,

In my experience, when a Lucene index is updated frequently, its performance becomes low. Is that correct?

In my system, most data crawled from the Web is indexed once and the corresponding index will NOT be updated any more. However, some indexes must be updated frequently, like records in a relational database. These indexes are not as large as the crawled data, and they will NOT be scaled out to many other nodes; most of the time they are located on a very limited number of machines.

In this case, can I use Lucene indexes, or do I need to replace them with a relational database?

Thanks so much!
LB
Re: When Index is Updated Frequently
Dear Michael,

Thanks so much for your answer! I have one question. Even if Lucene is good at updating, frequent updates must still put more load on the Solr cluster. So in my system I will leave the large amount of crawled data unchanged forever, and use a traditional database to keep the mutable data. Fortunately, in most Internet systems the amount of mutable data is much smaller than the immutable data. What do you think about this solution?

Best,
LB

On Sat, Mar 5, 2011 at 2:45 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> On Fri, Mar 4, 2011 at 10:09 AM, Bing Li wrote:
> > According to my experiences, when the Lucene index updated frequently,
> > its performance must become low. Is it correct?
>
> In fact Lucene can gracefully handle a high rate of updates with low
> latency turnaround on the readers, using the near-real-time (NRT) API
> -- IndexWriter.getReader() (or, in the soon-to-be 3.1,
> IndexReader.open(IndexWriter)).
>
> NRT is really a hybrid of "eventual consistency" and "immediate
> consistency", because it lets your app have full control over how
> quickly changes must be visible, by controlling when you pull a new
> NRT reader.
>
> That said, Lucene can't offer true immediate consistency at a high
> update rate -- the time to open a new NRT reader is usually too costly
> to do, e.g., for every search. But, say, every 100 msec is reasonable
> (depending on many variables...).
>
> So... for your app you should run some tests and see. And please
> report back.
>
> (But, unfortunately, NRT hasn't been exposed in Solr yet...)
>
> --
> Mike
>
> http://blog.mikemccandless.com