Multilingual search in multicore Solr
Hi all,

I am going to implement multilingual search in a multicore Solr setup. Specifically, the Solr server is designed like this: I have several cores corresponding to different languages, where each core has its own configuration files and data. I have the following questions:

1. While indexing a document, I use the ExtractingRequestHandler with Tika 0.10 (embedded in Solr 3.5.0), and I get a field "language_s" after indexing. Is it possible to get the value of "language_s" before indexing happens, so that I can put the document into the corresponding core?

2. When searching with a query, is it possible to use language detection to determine the language code of the query, so that I can direct the query to the corresponding core?

Thanks for your suggestions.

Note: in this thread I would like to stick with multicore Solr and see whether these problems can be solved there. Meanwhile, I am aware that multilingual search does not necessarily need multicore Solr, which I learned in a previous thread:
http://lucene.472066.n3.nabble.com/Tika0-10-language-identifier-in-Solr3-5-0-tt3671712.html#none
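A minimal sketch of question 1 from the client side, assuming a SolrJ client, one core per language named core_<code>, and that Tika's LanguageIdentifier covers the languages involved (its profile set in 0.10 is limited). The class name, core layout, and field names are hypothetical:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.language.LanguageIdentifier;

public class LanguageRouter {
    private static final String BASE_URL = "http://localhost:8983/solr/";

    // Detect the language of the extracted text first, then index the
    // document into the core that matches the detected language code.
    public void index(String id, String text) throws Exception {
        String lang = new LanguageIdentifier(text).getLanguage(); // e.g. "en"
        SolrServer server = new CommonsHttpSolrServer(BASE_URL + "core_" + lang);
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);
        doc.addField("text", text);
        server.add(doc);
        server.commit();
    }
}

The same detection call could be applied to the query string for question 2, with the caveat raised later in this thread: a few query words are often too short for reliable detection.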
Language-specific fields of "text"
Hi all,

In this thread I would like to ask some technical questions about how the schema is defined to achieve language-specific "text" fields. Currently I have the field "text" defined as follows:

<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>

After indexing a document, I can see the field extracted correctly in the document. My first attempt is to add a field named "text_en", defined exactly the same way as "text":

<field name="text_en" type="text_general" indexed="true" stored="true" multiValued="true"/>

However, after indexing the same document, why can I not see that field extracted? Is it because "text" is a reserved field that cannot be changed dynamically?
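For what it's worth, a plausible explanation and fix (an assumption, not confirmed in this thread): the extraction handler only writes into the fields it is configured to map to, so a newly declared "text_en" stays empty unless it is populated explicitly, e.g. with a copyField. A sketch, assuming the example schema's "text_en" field type is available:

<field name="text"    type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="text_en" type="text_en"      indexed="true" stored="true" multiValued="true"/>
<!-- populate text_en from text at index time -->
<copyField source="text" dest="text_en"/>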
Re: Multilingual search in multicore Solr
Hi Erick,

Your suggestions are sound.

For (1), if I use SolrJ as the client to access Solr, then the Java coding becomes the most challenging part. Technically, I want to achieve the same effect with highlighting, faceted search, language detection, etc. Do you know some example source code that I can refer to?

For (2), I agree with you on the difficulty of detecting the language from just a few words. Alternatively, I can suggest a set of results and let the users decide. You also mentioned scores. Say I have not so many cores, so for every query I direct it to all the cores and get back a set of scores. Is it safe to conclude that the highest score gives the most confident results?

Thanks.

Best Regards,
Ni Bing
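A sketch of the fan-out idea, with hypothetical core names; note that raw scores are computed against each core's own index statistics, so scores from different cores are generally not directly comparable:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FanOutQuery {
    public static void main(String[] args) throws Exception {
        String[] cores = {"core_en", "core_fr", "core_zh"}; // hypothetical
        for (String core : cores) {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/" + core);
            SolrQuery query = new SolrQuery(args[0]);
            query.setIncludeScore(true); // ask each core to return scores
            QueryResponse rsp = server.query(query);
            System.out.println(core + ": " + rsp.getResults().getNumFound()
                    + " hits, maxScore=" + rsp.getResults().getMaxScore());
        }
    }
}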
Re: Language-specific fields of "text"
Hi Paul,

I understand your point about "text_en" missing from the document. That is the case: "text" exists, but "text_en" does not. But then the question arises: isn't it possible to dynamically add language-specific suffixes to an existing field "text"? I am new here. As far as I know, for some field "title", people can create "title_en" and "title_fr" to incorporate different analyzers in the same schema. Even so, I am not seeing it happen. Thus, I am wondering whether I have overlooked some obvious point.

"Bing" is very common in Chinese names, as there are several Chinese characters corresponding to the same pronunciation.

Thanks for the reply.

Best Regards,
Bing
Source code of post.jar in the example package of Solr
Hi all,

I am using the following jar to index files in XML format, and I want to look into its source code. Where can I find it? Thanks.

\apache-solr-3.5.0\example\exampledocs>java -jar post.jar *.xml

Best Regards,
Bing
Indexing content in XML files
Hi all,

I am investigating indexing XML files. Currently I have two findings:

1. Use DataImportHandler. This requires creating one more configuration file for DIH, data-config.xml, which defines the fields specifically for my XML files.

2. Use the example package that comes with Solr. This only requires defining the fields in the schema; no additional configuration file is needed:

\apache-solr-3.5.0\example\exampledocs>java -jar post.jar *.xml

I don't know whether I understand the two methods correctly, but it seems to me that they are completely different. If I want to index XML files with many self-defined fields, possibly with embedded fields, which one makes more sense? Thanks.

Best,
Bing
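If the DataImportHandler route is taken, data-config.xml could look like the sketch below; the file path, XPaths, and field names are hypothetical. Note also that post.jar only accepts Solr's own <add><doc> update format, which is why arbitrary XML needs DIH or a client:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="record"
            processor="XPathEntityProcessor"
            url="/path/to/docs/sample.xml"
            forEach="/records/record">
      <field column="id"    xpath="/records/record/id"/>
      <field column="title" xpath="/records/record/title"/>
    </entity>
  </document>
</dataConfig>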
Re: Source code of post.jar in the example package of Solr
Hi iorixxx,

Thanks.
Re: Multilingual search in multicore Solr
Hi Erick,

Thanks for your comment. Though I have some experience with Solr, I am completely a newbie with SolrJ and haven't tried using SolrJ to access Solr. For now I have a source package of Solr 3.5.0, and some SolrJ source code downloaded from the web that I want to build against Solr and try out. How would I build and run it? Where should I put the source in the package? Is an IDE a must for that? I cannot find many beginner tutorials about this, so I would be grateful for any suggestions and hints.

Best,
Bing
Re: Indexing content in XML files
Hi all,

Thanks for the comments. Then I will abandon post.jar and try to learn SolrJ instead.

Best,
Bing
Closed -- Re: Multilingual search in multicore Solr
Hi Erick,

Thanks for commenting on this thread; I think my problem has been solved. I might start another thread raising technical questions about using SolrJ. Thank you again.

Best Regards,
Bing
Failed to compile Java code (trying to use SolrJ with Solr)
Hi all,

I am trying to write Java code that uses SolrJ to access Solr, but I failed on the first attempt. I have some experience with Solr, but I am a newbie with SolrJ. Below is a description of what I have, what I set, what I did, and what I got. I would be grateful if anyone could offer suggestions and point out my mistakes.

What I have (the necessary tools installed):
1. Java 1.6.0_26
2. apache-tomcat-6.0.32
3. apache-solr-3.5.0
4. apache-solr-3.5.0-src
5. apache-maven-2.2.1
6. apache-ant-1.8.2

What I set:
1. Classpath: c:\apache-solr-3.5.0\apache-solr-3.5.0\dist
   The following jars, which might be needed, are in the directory indicated by the classpath:
   apache-solr-solrj-3.5.0.jar
   solrj-lib/commons-httpclient-3.1.jar
   solrj-lib/commons-codec-1.5.jar
2. pom.xml in C:\apache-solr-3.5.0-src\apache-solr-3.5.0\, adding the following dependency:

   <dependency>
     <groupId>org.apache.solr</groupId>
     <artifactId>solr-solrj</artifactId>
     <version>3.5.0</version>
   </dependency>

What I did: tried to compile MySolrJTest.java. The source is simple enough:

import org.apache.solr.client.solrj.SolrServer;

class MySolrjTest {
    public void query(String q) {
        CommonsHttpSolrServer server = null;
        try {
            server = new CommonsHttpSolrServer("http://localhost:8983/solr/");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        MySolrjTest solrj = new MySolrjTest();
        solrj.query(args[0]);
    }
}

What I got: when I compile the code with the following command, errors arise:

C:\apache-solr-3.5.0-src>javac MySolrjTest.java
MySolrjTest.java:1: package org.apache.solr.client.solrj does not exist
import org.apache.solr.client.solrj.SolrServer;
^
MySolrjTest.java:7: cannot find symbol
symbol  : class CommonsHttpSolrServer
location: class MySolrjTest
        CommonsHttpSolrServer server = null;
^
MySolrjTest.java:11: cannot find symbol
symbol  : class CommonsHttpSolrServer
location: class MySolrjTest
        server = new CommonsHttpSolrServer("http://localhost:8983/solr/");
^
3 errors

Best,
Bing
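A likely cause, judging from the listing above (an assumption): a bare directory on the classpath only contributes .class files, so the jars under dist\ are never seen; each jar must be named individually, or pulled in with the * wildcard (Java 6+). The code also uses CommonsHttpSolrServer without importing org.apache.solr.client.solrj.impl.CommonsHttpSolrServer. With the import added, a compile command along these lines should work:

javac -cp "c:\apache-solr-3.5.0\apache-solr-3.5.0\dist\*;c:\apache-solr-3.5.0\apache-solr-3.5.0\dist\solrj-lib\*" MySolrjTest.java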
Re: Failed to compile Java code (trying to use SolrJ with Solr)
Hi all,

Following the previous message: if I abandon my own code and try to build a project from the original apache-solr-3.5.0-src package, I fail again. Below is a description of the technical details; I hope someone can help point out my mistakes.

What I have: besides the tools mentioned above, I installed the NetBeans 7 IDE.

What I set: pom.xml in C:\apache-solr-3.5.0-src\apache-solr-3.5.0\, adding the following dependency:

<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>3.5.0</version>
</dependency>

What I did: opened the project by loading the original apache-solr-3.5.0-src package and tried to build it in NetBeans.

What I got: the following is part of the output:

BUILD FAILURE
Total time: 5:39.460s
Finished at: Thu Feb 02 11:00:45 CST 2012
Final Memory: 28M/129M
Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.10:test (default-test) on project lucene-core: There are test failures.
Please refer to C:\apache-solr-3.5.0-src\apache-solr-3.5.0\lucene\build\surefire-reports for the individual test results. -> [Help 1]
To see the full stack trace of the errors, re-run Maven with the -e switch.
Re-run Maven using the -X switch to enable full debug logging.
For more information about the errors and possible solutions, please read the following articles:
[Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
After correcting the problems, you can resume the build with the command mvn <goals> -rf :lucene-core
'cmd' is not recognized as an internal or external command, operable program or batch file.
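One hedged suggestion: the failure above is the surefire test phase on lucene-core, not compilation, so if the goal is only to produce the jars, the tests can be skipped:

mvn -DskipTests install

(-DskipTests still compiles the tests without running them; -Dmaven.test.skip=true would skip compiling them as well.)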
Development inside or outside of Solr?
Hi all,

I am deploying a multicore Solr server running on Tomcat, where I want to achieve language detection during index/query. Solr 3.5.0 wraps a Tika API that can do language detection. Currently, the default behavior of Solr 3.5.0 is that every time I index a document, Solr calls the Tika API to produce the language detection result, i.e., indexing and detection happen at the same time. However, I would like to have the language detection result first, and then decide which core to put the document into, i.e., detection happens before indexing. It seems that I need to do development in one of the following ways:

1. Revise Solr itself, changing its default behavior;
2. Write a Java client outside Solr, and call the client from the server (JSP maybe) during index/query.

Can anyone who has met similar conditions give some suggestions about the advantages and disadvantages of the two approaches? Any other alternatives? Thank you.

Best,
Bing
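For approach 1, the usual hook inside Solr is a custom update request processor, which sees every document before it is indexed. The skeleton below is a sketch only; actually re-routing a document to a different core is not shown, and the class name is hypothetical:

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class DetectBeforeIndexFactory extends UpdateRequestProcessorFactory {
    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
            SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
            @Override
            public void processAdd(AddUpdateCommand cmd) throws IOException {
                SolrInputDocument doc = cmd.getSolrInputDocument();
                // Run language detection on the document's text here, then
                // decide whether to continue down this core's chain.
                super.processAdd(cmd);
            }
        };
    }
}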
Re: Development inside or outside of Solr?
I have looked into the TikaCLI with the -language option, and learned that Tika can output the language metadata only. It cannot help me solve my problem, though, as my main concern is whether to change Solr or not. Thank you all the same.
Re: Development inside or outside of Solr?
Hi François Schiettecatte,

Thank you for the reply all the same, but I choose to stick with Solr (wrapping the Tika language API) and make changes outside Solr.

Best Regards,
Bing
Re: Development inside or outside of Solr?
Hi Erick,

The example is impressive. Thank you.

On the first point, we decided not to do that, as Tika extraction is the time-consuming part of indexing large files, and a dual call makes the situation worse.

On the second, for now we choose DSpace to connect to the DB, and Discovery (Solr) for index/query. Thus we might make the revisions in DSpace.

Best Regards,
Bing
TikaLanguageIdentifierUpdateProcessorFactory (since Solr 3.5.0) to be used in Solr 3.3.0?
Hi all,

I am using org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory (since Solr 3.5.0) to do language detection, and it's cool.

An issue: if I deploy Solr 3.3.0, is it possible to import that factory from Solr 3.5.0 and use it in Solr 3.3.0? The reason I stick with Solr 3.3.0 is that I am working on DSpace (Discovery) to call Solr, and for now the highest version Solr can be upgraded to there is 3.3.0. I would like to do this while keeping DSpace + Solr intact as much as possible. So: is it possible to import that factory into Solr 3.3.0? Does anyone happen to know a way to solve this?

Best Regards,
Bing
Re: How to increase size of document in Solr
Hi Suneel,

There is a configuration in solrconfig.xml that you might need to look at. Below is how I set the limit to 2 GB.

Best Regards,
Bing
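The configuration snippet did not survive the archive. A likely candidate, given the 2 GB figure (2048000 KB), is the multipart upload limit in solrconfig.xml; this is an assumption about what the original post showed:

<requestDispatcher handleSelect="true">
  <!-- multipartUploadLimitInKB is in kilobytes: 2048000 KB = 2 GB -->
  <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000"/>
</requestDispatcher>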
Re: Failed to compile Java code (trying to use SolrJ with Solr)
Hi Dmitry,

Thank you. That solved my problem.

Best Regards,
Bing
Re: upgrading to Tika 0.9 on Solr 1.4.1
Hi all,

I tried to upgrade Tika 0.8 to Tika 0.10 on Solr 3.3.0, following similar steps, but failed.

1. Replace the following jars in /contrib/extraction/: fontbox-1.6.0, jempbox-1.6.0, pdfbox-1.6.0, tika-core-0.10, tika-parsers-0.10;
2. Copy all the jars in /contrib/langid/* from Solr 3.5.0;
3. Copy /dist/apache-solr-langid-3.5.0 from Solr 3.5.0;
4. Configure solrconfig.xml in Solr 3.3.0, adding the lib directives for the langid jars and the following updateRequestProcessorChain definition:

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <str name="langid.fl">text,title,author</str>
      <str name="langid.langField">language_s</str>
      <str name="langid.fallback">en</str>
    </lst>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Errors (typical errors when a factory is not found):

org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389)
        at

Has anyone tried similar things before? Please advise. Thank you.

Best Regards,
Bing
Failed to upgrade Tika 0.8 to Tika 0.10 in Solr 3.3.0
Hi all,

I tried to upgrade Tika 0.8 to Tika 0.10 on Solr 3.3.0, but failed. Below are some technical details. Has anyone tried similar things before? Please advise. Thank you.

1. Replace the following jars in /contrib/extraction/: fontbox-1.6.0, jempbox-1.6.0, pdfbox-1.6.0, tika-core-0.10, tika-parsers-0.10;
2. Copy all the jars in /contrib/langid/* from Solr 3.5.0;
3. Copy /dist/apache-solr-langid-3.5.0 from Solr 3.5.0;
4. Configure solrconfig.xml in Solr 3.3.0, adding the lib directives for the langid jars and the following updateRequestProcessorChain definition:

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <str name="langid.fl">text,title,author</str>
      <str name="langid.langField">language_s</str>
      <str name="langid.fallback">en</str>
    </lst>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Errors (typical errors when a factory is not found):

org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389)
        at

Best Regards,
Bing
Re: TikaLanguageIdentifierUpdateProcessorFactory (since Solr 3.5.0) to be used in Solr 3.3.0?
Hi Erick,

My idea is to use Tika 0.10 in DSpace 1.7.2, which rests on two steps:

1. Upgrade Solr 1.4.1 to Solr 3.3.0 in DSpace 1.7.2. In the following link, the upgrade to Solr & Lucene 3.3.0 has been resolved:
https://jira.duraspace.org/browse/DS-980

2. Upgrade to Tika 0.10 in Solr 3.3.0. In the following link, people have tried to upgrade Tika 0.8 to Tika 0.9:
http://lucene.472066.n3.nabble.com/upgrading-to-Tika-0-9-on-Solr-1-4-1-td2570526.html

I was thinking that if both of the above steps can be achieved, then maybe I can get it done. What is your suggestion? Thank you.

Best Regards,
Bing
How to define a multivalued string type "langid.langsField" in solrconfig.xml
Hi all,

I am using Tika language detection. It is said that if "langid.langsField" is set to a multivalued string field, then a list of languages can be stored for the fields specified in "langid.fl". Below is how I configure the processor in solrconfig.xml:

<processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
  <lst name="defaults">
    <str name="langid.fl">text,attr_stream_name</str>
    <str name="langid.langsField">language_s</str>
    <!-- one boolean option, set to true, was lost from the archived post -->
  </lst>
</processor>

When I tried using "text" only, the detected result is language_s="zh_tw"; for "attr_stream_name", the result is language_s="en". I was expecting that when adding both "text" and "attr_stream_name", the result would look like language_s="en,zh_tw". However, I failed to see that result.

I would be grateful if anyone could point out my mistake or give some hints on how to do this correctly. Thank you.

Best Regards,
Bing
Re: TikaLanguageIdentifierUpdateProcessorFactory (since Solr 3.5.0) to be used in Solr 3.3.0?
Hi Erick,

I can write a SolrJ client to call Tika, but I am not certain where to invoke the client. In my case, I work on DSpace calling Solr, and I suppose the client should be invoked in between DSpace and Solr. That is, DSpace invokes the SolrJ client when doing index/query, and the client calls Tika and Solr. Do you think this is reasonable?

Best Regards,
Bing
Re: TikaLanguageIdentifierUpdateProcessorFactory (since Solr 3.5.0) to be used in Solr 3.3.0?
Hi Erick,

I get your point. Thank you so much.

Best Regards,
Bing
Can solr-langid (Solr 3.5.0) detect multiple languages in one text?
Hi all,

I am using solr-langid (Solr 3.5.0) to do language detection, and I hope multiple languages within one text can be detected. The example text is:

咖哩起源於印度。印度民間傳說咖哩是佛祖釋迦牟尼所創，由於咖哩的辛辣與香味可以幫助遮掩羊肉的腥騷，此舉即為用以幫助不吃豬肉與牛肉的印度人。在泰米爾語中，「kari」是「醬」的意思。在馬來西亞，kari也稱dal(當在mamak檔)。早期印度被蒙古人所建立的莫臥兒帝國(Mughal Empire)所統治過，其間從波斯(現今的伊朗)帶來的飲食習慣，從而影響印度人的烹調風格直到現今。

Curry (plural, Curries) is a generic term primarily employed in Western culture to denote a wide variety of dishes originating in Indian, Pakistani, Bangladeshi, Sri Lankan, Thai or other Southeast Asian cuisines. Their common feature is the incorporation of more or less complex combinations of spices and herbs, usually (but not invariably) including fresh or dried hot capsicum peppers, commonly called "chili" or "cayenne" peppers.

I want the text to be separated into two parts, with the Chinese part going to "text_zh-tw" and the other to "text_en". Can I do something like that? Thank you.

Best Regards,
Bing
Re: Can solr-langid (Solr 3.5.0) detect multiple languages in one text?
Hi Jan Høydahl,

I forgot to mention: the identifier I use is an existing one wrapped in Solr 3.5.0, LangDetectLanguageIdentifier (http://wiki.apache.org/solr/LanguageDetection).

I looked into the source of the language identifier and found that the whole content of a text is parsed before detection, which is why the end result is a single language instead of multiple languages. I can then assume that if the content were processed section by section (or even line by line), the end result would consist of multiple languages. So the question is: could you plug this modification of the existing identifier into Solr?

Best Regards,
Bing
Re: Can solr-langid (Solr 3.5.0) detect multiple languages in one text?
Hi Tanguy,

> For the other implementation (http://code.google.com/p/language-detection/), it seems to be performing a first pass on the input, and tries to separate Latin characters from the others. If there are more non-Latin characters than Latin ones, then it will process the non-Latin characters only for language detection. Oddly, the other way around, non-Latin characters are not stripped from the input if there are more Latin characters than non-Latin ones...

The example case does simplify, but it simulates the normal conditions I need to handle: normally the task is to detect non-Latin languages, and mostly to separate Western and Eastern languages.

> Anyway, LangDetect's implementation ends up with a list of probabilities, and only the most accurate one is kept by Solr's langdetect processor, if the probability satisfies a certain threshold.

Yes, I agree with you on "a list of probabilities", and I think if those probabilities were all returned, then my problem would be partially solved.

> In this very particular case, something simple, based on Unicode ranges, could be used to provide hints on how to chunk the input, because we need to split Western and Eastern languages, both written in well-isolated Unicode character ranges. Using this, the language identifier could be fed with chunks that are (presumably) mostly made of one language only, and we could have different language identifications for each distinct chunk.

Intelligent chunk partitioning might be a different and comprehensive task. Is it possible for the text to be processed line by line (or in groups of lines)? If the detected language changes between two consecutive lines (or groups of lines), it indicates a different language range.

Thank you for the thoughtful comments.

Best Regards,
Bing
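A sketch of the idea discussed above: split the input into runs of CJK vs. non-CJK characters using Unicode blocks, then feed each run to a language identifier separately. This is an assumption-laden toy; digits, punctuation, and mixed scripts would need more care:

public class ScriptChunker {

    static boolean isCjk(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
            || block == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION
            || block == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS;
    }

    public static void main(String[] args) {
        String text = args[0];
        StringBuilder run = new StringBuilder();
        boolean runIsCjk = false;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            boolean cjk = isCjk(cp);
            if (run.length() > 0 && cjk != runIsCjk) {
                // Hand the finished run to the language identifier of your choice.
                System.out.println((runIsCjk ? "CJK: " : "non-CJK: ") + run);
                run.setLength(0);
            }
            runIsCjk = cjk;
            run.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (run.length() > 0) {
            System.out.println((runIsCjk ? "CJK: " : "non-CJK: ") + run);
        }
    }
}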
Can I use Field Aliasing/Renaming on Solr 3.3?
Hi all,

I am working with Solr 3.3. Recently I found a new feature (Field Aliasing/Renaming) in Solr 3.6, and I want to use it in Solr 3.3. Can I do that, and how? Thank you.

Best Regards,
Bing
Solr/Lucene Faceted Search: Too Many Unique Values?
Hi,

I am going to evaluate some Lucene/Solr capabilities for handling faceted queries, in particular with a single facet field that contains a large number (say up to 1 million) of distinct values. Does anyone have experience with how Lucene performs in this scenario? E.g.:

Doc1 has tags A B C D
Doc2 has tags B C D E

and so on: millions of docs, and there can be millions of distinct tag values.

Thanks
Sort for Retrieved Data
Dear all,

I have a question about sorting data retrieved from Solr.

As I know, Lucene retrieves data according to the degree of keyword matching on a text field (partial matching). If I search data by a string field (complete matching), how does Lucene sort the retrieved data?

If I add some filters, such as time, what happens to the sorting then? If I only need the top results, is it proper to just limit rows?

If I want to add new sorting ways, how do I do that?

Thanks so much!
Bing
How to Sort By a PageRank-Like Complicated Strategy?
Dear all,

I am using SolrJ to implement a system that needs to provide users with search services. I have some questions about Solr searching, as follows.

As I know, Lucene retrieves data according to the degree of keyword matching on a text field (partial matching). But if I search data by a string field (complete matching), how does Lucene sort the retrieved data?

If I want to add new sorting ways, Solr's function queries seem to support this feature. However, for a complicated ranking strategy such as PageRank, can Solr provide an interface for me to do that?

My ranking methods are more complicated than PageRank. For now I have to load all of the matched data from Solr by keyword first and re-rank it my own way before showing it to users. Is that correct?

Thanks so much!
Bing
Re: How to Sort By a PageRank-Like Complicated Strategy?
Hi Kai,

Thanks so much for your reply!

If the retrieval is done on a string field, not a text field, a complete-matching approach should be used according to my understanding, right? If so, how does Lucene rank the retrieved data?

Best regards,
Bing

On Sun, Jan 22, 2012 at 5:56 AM, Kai Lu wrote:
> Solr is kind of the retrieval step; you can customize the score formula in Lucene. But it is supposed not to be too complicated, i.e., it is better if it can be factorized. It also relates to the stored information, like TF, DF, position, etc. You can do a second-phase rerank of the top N data you have got.
Re: How to Sort By a PageRank-Like Complicated Strategy?
Dear Shashi,

Thanks so much for your reply!

However, I think the PageRank value is not static; it must be updated on the fly. As I know, a Lucene index is not suitable for very frequent updates. If so, how do I deal with that?

Best regards,
Bing

On Sun, Jan 22, 2012 at 12:43 PM, Shashi Kant wrote:
> Lucene has a mechanism to "boost" documents up/down using your custom ranking algorithm. So if you come up with something like PageRank, you might do something like doc.setBoost(myboost) before writing to the index.
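For reference, doc.setBoost() as suggested is the Lucene 3.x index-time document boost; a minimal sketch, where myPageRank is a hypothetical externally computed score:

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class BoostExample {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("index")),
                new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
        Document doc = new Document();
        doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("text", "some content", Field.Store.YES, Field.Index.ANALYZED));
        float myPageRank = 2.5f;  // hypothetical externally computed score
        doc.setBoost(myPageRank); // folded into the score of every match on this doc
        writer.addDocument(doc);
        writer.close();
    }
}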
SolrCell takes InputStream
Hi,

I am using:

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");

The two ways of adding a file are:

up.addFile(File)
up.addContentStream(ContentStream)

However, my raw files are stored on remote storage devices. I am able to get an InputStream object for the file to be indexed, and it seems awkward to store the file locally as a temporary copy. Is there a way of passing the InputStream in directly (e.g., constructing a ContentStream from the InputStream)?

Thanks.
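One way to avoid the temporary file (a sketch, not an official recipe): ContentStream is an interface, and SolrJ ships the abstract helper ContentStreamBase, so a thin wrapper around the existing InputStream can be passed to addContentStream():

import java.io.InputStream;

import org.apache.solr.common.util.ContentStreamBase;

public class InputStreamContentStream extends ContentStreamBase {
    private final InputStream in;

    public InputStreamContentStream(InputStream in, String contentType) {
        this.in = in;
        setContentType(contentType);
    }

    @Override
    public InputStream getStream() {
        return in; // consumed by the request; the caller should not reuse it
    }
}

Usage would then be along the lines of (remoteStream being the InputStream obtained from the storage device):

up.addContentStream(new InputStreamContentStream(remoteStream, "application/pdf"));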
Search match all tokens in Query Text
Hello,

I have a field "text" of type text_general here. When I query for text:a b, Solr returns results that contain only a but not b. That is, it uses the OR operator between the two tokens. Am I right? What should I do to force an AND operator between the two tokens?

Thanks
Re: Search match all tokens in Query Text
Thanks for the quick reply. You seem to be suggesting adding an explicit AND operator, which I don't think solves my problem. I found a setting somewhere, and it works.
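The setting referenced above did not survive the archive. The usual candidates (assumptions, since the original snippet is lost) are the schema-wide default operator or the q.op request parameter:

<!-- schema.xml: make all parsed queries use AND by default -->
<solrQueryParser defaultOperator="AND"/>

...or per request: q=text:(a b)&q.op=AND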
How to define a lowercase fieldtype without tokenizer
Hi,

I don't want the field to be tokenized, because Solr doesn't support sorting on a tokenized field. In order to do case-insensitive sorting, I need to copy a field to a lowercased but not tokenized field. How do I define this? I tried defining such a field type, but it says I need to specify a tokenizer or a class for the analyzer.

Thanks!
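The standard pattern for this (a sketch; the type and field names are arbitrary) is a TextField whose analyzer uses KeywordTokenizerFactory, which emits the entire input as a single token, followed by a lowercase filter; this is what the follow-up below refers to:

<fieldType name="lowercase_sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer does no tokenizing: the whole input stays one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_sort" type="lowercase_sort" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>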
Re: How to define a lowercase fieldtype without tokenizer
Works perfectly. Thank you. I didn't know before that this tokenizer does nothing :)
Re: How to Sort By a PageRank-Like Complicated Strategy?
Dear Shashi,

As I have learned, big data such as a Lucene index is not suitable for frequent updates. Frequent updating must affect performance and consistency when the Lucene index is replicated across a large-scale cluster. Such a search engine is expected to work in a write-once & read-many environment, right? That is what HDFS (Hadoop Distributed File System) provides. In my experience, updating a Lucene index is really slow. Why did you say I could update a Lucene index frequently?

Thanks so much!
Bing

On Mon, Jan 23, 2012 at 11:02 PM, Shashi Kant wrote:
> You can update documents in the index quite frequently. I don't know what your requirement is; another option would be boosting at query time.
How is Data Indexed in HBase?
Dear all,

I wonder how data in HBase is indexed? Solr is used in my system now because the data is managed in an inverted index. Such an index is suitable for retrieving unstructured and huge amounts of data. How does HBase deal with this issue? May I replace Solr with HBase?

Thanks so much!

Best regards,
Bing
Re: Solr & HBase - Re: How is Data Indexed in HBase?
Mr Gupta,

Thanks so much for your reply!

Retrieving data by keyword is one of my use cases, and I think Solr is a proper choice for that. However, Solr does not provide complex enough support for ranking, and frequent updating is not suitable in Solr either, so it is difficult to retrieve data based on values other than keyword frequency in text. For that case I attempt to use HBase. But I don't know how HBase supports high performance when it needs to keep consistency in a large-scale distributed system.

Now both of them are used in my system. I will check out ElasticSearch.

Best regards,
Bing

On Thu, Feb 23, 2012 at 1:35 AM, T Vinod Gupta wrote:
> Bing,
> It's a classic battle on whether to use Solr or HBase or a combination of both. The two systems are very different, but there is some overlap in utility; they also differ vastly in computation power, storage needs, etc. So in the end it all boils down to your use case: you need to pick the technology best suited to your needs. I'm still not clear on your use case, though.
>
> By the way, if you haven't started using Solr yet, you might want to check out ElasticSearch. I spent over a week researching Solr vs. ES and eventually chose ES due to its cool merits.
>
> On Wed, Feb 22, 2012 at 9:31 AM, Ted Yu wrote:
>> There is no secondary index support in HBase at the moment. It's on our road map.
>>
>> On Wed, Feb 22, 2012 at 9:28 AM, Bing Li wrote:
>>> Jacques,
>>>
>>> Yes. But I still have questions about that.
>>>
>>> In my system, when users search with a keyword arbitrarily, the query is forwarded to Solr. There are no updating operations, only appending of new indexes, in the Solr-managed data. When I need to retrieve data based on ranking values, HBase is used, and the ranking values need to be updated all the time. Is that correct?
>>>
>>> My question is that the performance must be low if consistency is kept in a large-scale distributed environment. How does HBase handle this issue?
>>>
>>> On Thu, Feb 23, 2012 at 1:17 AM, Jacques wrote:
>>>> It is highly unlikely that you could replace Solr with HBase. They're really apples and oranges.
Re: Solr & HBase - Re: How is Data Indexed in HBase?
Dear Mr Gupta,

Your understanding of my solution is correct. Both HBase and Solr are now used in my system, and I hope it will work.

Thanks so much for your reply!

Best regards,
Bing

On Fri, Feb 24, 2012 at 3:30 AM, T Vinod Gupta wrote:
> Regarding your question on HBase support for high performance and consistency: I would say HBase is highly scalable and performant. How it does what it does can be understood by reading the relevant chapters on architecture and design in the HBase book.
>
> With regards to ranking, I see your problem. But if you split the problem into an HBase-specific solution and a Solr-based solution, you can probably achieve the results. Maybe you do the ranking, store the rank in HBase, and then use Solr to get the results and HBase as a lookup to get the rank. Or you can put the rank into the document schema and index it too, for range queries and such. Is my understanding of your scenario wrong?
>
> Thanks
Re: pagerank??
To my knowledge, Solr cannot support this. In my case, I get data from Solr by keyword matching and then rank the data by PageRank afterwards.

Thanks,
Bing

On Wed, Apr 4, 2012 at 6:37 AM, Manuel Antonio Novoa Proenza <mano...@estudiantes.uci.cu> wrote:
> Hello,
>
> I have many indexed documents in my Solr index.
>
> Let me know any way or efficient function to calculate the PageRank of the indexed websites.
How to Transmit and Append Indexes
Hi all,

I am working on a distributed search system. Right now I have only one server. It has to crawl pages from the Web, generate indexes locally, and respond to users' queries. I think it is too busy to work smoothly.

I plan to use at least two servers. The jobs of crawling pages and generating indexes are done by one of them. After that, the newly available indexes should be transmitted to the other one, which is responsible for responding to users' queries. From the users' point of view, this system must be fast. However, I don't know how I can get the incremental indexes to transmit, and after transmission, how to append them to the old indexes. Does the appending block searching?

Thanks so much for your help!

Bing Li
Is it fine to transmit indexes in this way?
Hi all,

Since I haven't found that Lucene exposes index updates to us, may I transmit indexes in the following way?

1) One indexing machine, A, is busy generating indexes;
2) After a certain time, the indexing process is terminated;
3) Then the new indexes are transmitted to the machines that serve users' queries;
4) It is possible that some index files have the same names, so the conflicting files should be renamed;
5) After the transmission is done, the transmitted indexes are removed from A;
6) After the removal, the indexing process is started again on A.

The reason I am trying to do this is to balance the search load: one machine is responsible for generating indexes, and the others are responsible for responding to queries.

If the above approach does not work, can I see the updates of indexes in Lucene? May I transmit them? And may I append them to existing indexes? Does the appending affect querying?

I am learning Solr. It seems that Solr does this for me, but I would have to set up Tomcat to use Solr, which I think is a little bit heavy.

Thanks!
Bing Li
Re: Is it fine to transmit indexes in this way?
Thanks so much, Gora!

> What do you mean by appending? If you mean adding to an existing index (on reindexing, this would normally mean an update for an existing Solr document ID, and a create for a new Solr document ID), the best way probably is not to delete the index on the master server (what you call machine A). Once the indexing is completed, a commit ensures that new documents show up for any subsequent queries.

When updates are replicated to the slave servers, I suppose the updates are merged with the existing indexes, and reads on them can be done concurrently, so queries are responded to instantly. That is what I mean by "appending". Does that happen in Solr?

Best,
Bing

On Sat, Nov 20, 2010 at 1:58 AM, Gora Mohanty wrote:
> On Fri, Nov 19, 2010 at 10:53 PM, Bing Li wrote:
> > 3) Then, the new indexes are transmitted to machines which serve users' queries;
>
> Just replied to a similar question in another thread. The best way is probably to use Solr replication:
> http://wiki.apache.org/solr/SolrReplication
>
> You can set up replication to happen automatically upon commit on the master server (where the new index was made). As a commit should have been made when indexing is complete on the master server, this will then ensure that a new index is replicated on the slave server.
>
> > 4) It is possible that some index files have the same names. So the conflicting files should be renamed;
>
> Replication will handle this for you.
>
> > 5) After the transmission is done, the transmitted indexes are removed from A.
> > 6) After the removal, the indexing process is started again on A.
>
> These two items you have to do manually, i.e., delete all documents on A, and restart the indexing.
>
> Regards,
> Gora
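For reference, the replication Gora describes is configured in solrconfig.xml on both sides; a minimal sketch following the SolrReplication wiki pattern (the host name and poll interval are placeholders):

On the master:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

On each slave:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>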
Re: How to Transmit and Append Indexes
Dear Erick,

Thanks so much for your help! I am new to Solr, so I have no idea about the version. But I wonder: what are the differences between Solr and Hadoop? It seems that Solr does the same as what Hadoop promises.

Best,
Bing

On Sat, Nov 20, 2010 at 2:28 AM, Erick Erickson wrote:
> You haven't said what version of Solr you're using, but you're asking about replication, which is built in.
> See: http://wiki.apache.org/solr/SolrReplication
>
> And no, your slave doesn't block while the update is happening, and it automatically switches to the updated index upon successful replication.
>
> Older versions of Solr used rsync, etc.
>
> Best,
> Erick
Re: How to Transmit and Append Indexes
Hi Gora,

No, I really wonder whether Solr is based on Hadoop. Hadoop is efficient for search engines since it suits the write-once-read-many model. After reading your emails, it looks like Solr's distributed index handling does the same thing. Both of them are good for searching large indexes in a large-scale distributed environment, right?

Thanks!
Bing

On Sat, Nov 20, 2010 at 3:01 AM, Gora Mohanty wrote:
> The solr/admin/registry.jsp URL on your local Solr installation should show you the version at the top.
>
> Er, what? Solr and Hadoop are entirely different applications. Did you mean Lucene or Nutch, instead of Hadoop?
>
> Regards,
> Gora
Import Data Into Solr
Hi all,

I am a new user of Solr. Before using it, all of my data was indexed by myself with Lucene. According to Chapter 3 of the book Solr 1.4 Enterprise Search Server by David Smiley and Eric Pugh, data in formats such as XML, CSV, and even PDF can be imported into Solr. If I wish to import existing Lucene indexes into Solr, do I have any other approaches? I know that Solr is a serverized Lucene.

Thanks,
Bing Li
Solr Got Exceptions When "schema.xml" is Changed
Dear all,

I am a new user of Solr, and I am just trying some basic samples. Solr starts correctly with Tomcat. However, when I put a new schema.xml under SolrHome/conf and start Tomcat again, I get the following two exceptions. Solr cannot be started correctly unless I use the initial schema.xml from Solr. Why can't I change the schema.xml?

Thanks so much!
Bing

Dec 5, 2010 4:52:49 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:52)
        at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1146)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
-
SEVERE: Could not start SOLR. Check solr/home property
org.apache.solr.common.SolrException: QueryElevationComponent requires the schema to have a uniqueKeyField implemented using StrField
        at org.apache.solr.handler.component.QueryElevationComponent.inform(QueryElevationComponent.java:157)
        at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:508)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:588)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
        at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
        at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
        at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
        at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:98)
        at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4405)
        at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5037)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:812)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:787)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:570)
        at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:891)
        at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:683)
        at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:466)
        at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1267)
        at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:308)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
        at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:89)
        at org.apache.catalina.util.LifecycleBase.setState(LifecycleBase.java:328)
        at org.apache.catalina.util.LifecycleBase.setState(LifecycleBase.java:308)
        at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:1043)
        at org.apache.catalina.core.StandardHost.startInternal(StandardHost.java:738)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
        at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:1035)
        at org.apache.catalina.core.StandardEngine.startInternal(StandardEngine.java:289)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
        at org.apache.catalina.core.StandardService.startInternal(StandardService.java:442)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
        at org.apache.catalina.core.StandardServer.startInternal(StandardServer.java:674)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
        at org.apache.catalina.startup.Catalina.start(Catalina.java:596)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
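The second exception states the fix directly: the uniqueKey field must use a StrField-based type. A sketch of the relevant schema.xml pieces:

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<field name="id" type="string" indexed="true" stored="true" required="true"/>

<uniqueKey>id</uniqueKey>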
SolrHome and Solr Data Dir in solrconfig.xml
Dear all, I am a new user of Solr. My SolrHome is set to /home/libing/Solr. When Tomcat starts, it reads solrconfig.xml to find the Solr data dir, which holds the indexes. However, I have no idea how to associate SolrHome with the Solr data dir, so a mistake occurs: all the indexes end up under $TOMCAT_HOME/bin, which is NOT what I expect. I want the indexes under SolrHome. Could you please give me a hand? Best, Bing Li
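A likely cause: when the data directory is left relative, Solr resolves it against the JVM's working directory, which for Tomcat is its bin/ directory. A sketch of the solrconfig.xml setting, assuming the SolrHome path from the post:

<!-- solrconfig.xml: point the data directory at SolrHome explicitly
     instead of resolving it relative to Tomcat's working directory -->
<dataDir>${solr.data.dir:/home/libing/Solr/data}</dataDir>

The ${solr.data.dir:...} form also lets the path be overridden with a system property at startup.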
Indexing and Searching Chinese
Hi, all, Right now I cannot search the index when querying with Chinese keywords. Before using Solr, I used Lucene for some time; since I need to crawl some Chinese sites, I used ChineseAnalyzer in the code that ran Lucene. I know Solr is a server built on Lucene, but I have no idea how to configure the analyzer in Solr. I appreciate your help! Best, LB
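In Solr the analyzer is declared per field type in schema.xml. A sketch along the lines of what a later message in this archive quotes (the type and field names are illustrative):

<!-- schema.xml: field type using the same tokenization as Lucene's ChineseAnalyzer -->
<fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ChineseTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="content" type="text_zh" indexed="true" stored="true"/>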
Indexing and Searching Chinese with SolrNet
Dear all, After reading some pages on the Web, I created the index with the following schema. .. .. It should be correct, right? However, when sending a query through SolrNet, no results are returned. Could you tell me what the reason is? Thanks, LB
Re: Indexing and Searching Chinese with SolrNet
Dear Jelsma, My servlet container is Tomcat 7. I think it should accept Chinese characters, but I am not sure how to configure it. From the Tomcat console, I saw that the Chinese characters in the query are not displayed normally; however, they are fine in the Solr Admin page. I am also not sure whether SolrNet supports Chinese. If not, how can I interact with Solr from .NET? Thanks so much! LB

On Wed, Jan 19, 2011 at 2:34 AM, Markus Jelsma wrote:
> Why create two threads for the same problem? Anyway, is your servlet
> container capable of accepting UTF-8 in the URL? Also, is SolrNet capable of
> handling those characters? To confirm, try a tool like curl.
>
> > Dear all,
> >
> > After reading some pages on the Web, I created the index with the following
> > schema.
> >
> > ..
> > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> >   <analyzer>
> >     <tokenizer class="solr.ChineseTokenizerFactory"/>
> >   </analyzer>
> > </fieldType>
> > ..
> >
> > It should be correct, right? However, when sending a query through SolrNet, no
> > results are returned. Could you tell me what the reason is?
> >
> > Thanks,
> > LB
Re: Indexing and Searching Chinese with SolrNet
Dear Jelsma, After configuring the Tomcat URIEncoding, Chinese characters are processed correctly. I appreciate your help! Best, LB

On Wed, Jan 19, 2011 at 3:02 AM, Markus Jelsma wrote:
> Hi,
>
> Yes, but Tomcat might need to be configured to accept it; see the wiki for more
> information on this subject.
>
> http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
>
> Cheers,
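For the record, the setting that wiki page describes is the URIEncoding attribute on the HTTP connector in Tomcat's conf/server.xml; a sketch (the port and other attributes are illustrative):

<!-- conf/server.xml: decode query-string bytes as UTF-8 -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8"/>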
SolrJ Tutorial
Hi, all, In the past I always used SolrNet to interact with Solr, and it works great. Now I need to use SolrJ. I assumed it would be easier than SolrNet, since Solr and SolrJ are both Java, but I cannot find a tutorial that is easy to follow: no tutorials explain SolrJ programming step by step, and no complete samples are available. Could anybody point me to some online resources for learning SolrJ? I also noticed Solr Cell and SolrJ POJOs. Do you have detailed resources about them? Thanks so much! LB
Re: SolrJ Tutorial
I got the solution. A complete sample I wrote is attached below. Thanks, LB

package com.greatfree.Solr;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

import java.net.MalformedURLException;

public class SolrJExample
{
    public static void main(String[] args) throws MalformedURLException, SolrServerException
    {
        // Query an existing core for all documents.
        SolrServer solr = new CommonsHttpSolrServer("http://192.168.210.195:8080/solr/CategorizedHub");
        SolrQuery query = new SolrQuery();
        query.setQuery("*:*");
        QueryResponse rsp = solr.query(query);
        SolrDocumentList docs = rsp.getResults();
        System.out.println(docs.getNumFound());

        try
        {
            // Index a POJO (a class whose members carry @Field annotations) into another core.
            SolrServer solrScore = new CommonsHttpSolrServer("http://192.168.210.195:8080/solr/score");
            Score score = new Score();
            score.id = "4";
            score.type = "modern";
            score.name = "iphone";
            score.score = 97;
            solrScore.addBean(score);
            solrScore.commit();
        }
        catch (Exception e)
        {
            System.out.println(e.toString());
        }
    }
}

On Sat, Jan 22, 2011 at 3:58 PM, Lance Norskog wrote:
> The unit tests are simple and show the steps.
SolrDocumentList Size vs NumFound
Dear all, I ran into a puzzling problem. The number of matching documents is much more than 10, and getNumFound() reports the exact count of results, yet the size of the SolrDocumentList is 10. When I iterate over the results as follows, only 10 are displayed. How do I get the rest?

..
for (SolrDocument doc : docs)
{
    System.out.println(doc.getFieldValue(Fields.CATEGORIZED_HUB_TITLE_FIELD) + ": "
        + doc.getFieldValue(Fields.CATEGORIZED_HUB_URL_FIELD) + "; "
        + doc.getFieldValue(Fields.HUB_CATEGORY_NAME_FIELD) + "/"
        + doc.getFieldValue(Fields.HUB_PARENT_CATEGORY_NAME_FIELD));
}
..

Could you give me a hand? Thanks, LB
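The list holds only one page of results; the rows parameter defaults to 10. A sketch of paging through the full result set with SolrJ, reusing a server instance like the one in the earlier sample (page size and field name are illustrative):

// Page through all results: 'rows' is the page size, 'start' the offset.
SolrQuery query = new SolrQuery("*:*");
int pageSize = 100;
query.setRows(pageSize);
long numFound = Long.MAX_VALUE;
for (int start = 0; start < numFound; start += pageSize)
{
    query.setStart(start);
    QueryResponse rsp = solr.query(query); // may throw SolrServerException
    SolrDocumentList docs = rsp.getResults();
    numFound = docs.getNumFound(); // total matches, not the page size
    for (SolrDocument doc : docs)
    {
        System.out.println(doc.getFieldValue("id"));
    }
}

Alternatively, a single query.setRows((int) numFound) works for small result sets, but paging is safer for large ones.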
Open Too Many Files
Dear all, I got an exception when querying the index in Solr. It tells me that too many files are open. How can I handle this problem? Thanks so much! LB [java] org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Too many open files [java] at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483) [java] at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) [java] at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) [java] at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118) [java] at com.greatfree.Solr.Broker.Search(Broker.java:145) [java] at com.greatfree.Solr.SolrIndex.SelectHubPageHashByHubKey(SolrIndex.java:116) [java] at com.greatfree.Web.HubCrawler.Crawl(Unknown Source) [java] at com.greatfree.Web.Worker.run(Unknown Source) [java] at java.lang.Thread.run(Thread.java:662) [java] Caused by: java.net.SocketException: Too many open files [java] at java.net.Socket.createImpl(Socket.java:397) [java] at java.net.Socket.<init>(Socket.java:371) [java] at java.net.Socket.<init>(Socket.java:249) [java] at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80) [java] at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122) [java] at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707) [java] at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361) [java] at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387) [java] at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) [java] at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) [java] at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) [java] at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427) [java] ... 8 more [java] Exception in thread "Thread-96" java.lang.NullPointerException [java] at com.greatfree.Solr.SolrIndex.SelectHubPageHashByHubKey(SolrIndex.java:117) [java] at com.greatfree.Web.HubCrawler.Crawl(Unknown Source) [java] at com.greatfree.Web.Worker.run(Unknown Source) [java] at java.lang.Thread.run(Thread.java:662)
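A common culprit behind this exception is constructing a new CommonsHttpSolrServer per request, which leaks pooled HTTP connections and thus file descriptors. A sketch of the usual remedy, with a hypothetical holder class standing in for the post's Broker:

public final class SolrClientHolder
{
    // One shared instance per Solr endpoint: CommonsHttpSolrServer is
    // thread-safe and reuses its pooled HTTP connections.
    private static SolrServer server;

    public static synchronized SolrServer get() throws java.net.MalformedURLException
    {
        if (server == null)
        {
            server = new CommonsHttpSolrServer("http://192.168.210.195:8080/solr/CategorizedHub");
        }
        return server;
    }
}

Raising the OS file-descriptor limit (ulimit -n) only postpones the failure if new client instances keep being created per request.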
Re: Solr Out of Memory Error
Dear Adam, I also got the OutOfMemory exception. I changed JAVA_OPTS in catalina.sh as follows:

...
if [ -z "$LOGGING_MANAGER" ]; then
  JAVA_OPTS="$JAVA_OPTS -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager"
else
  JAVA_OPTS="$JAVA_OPTS -server -Xms8096m -Xmx8096m"
fi
...

Is this change correct? After that, I still got the same exception. The index is updated and searched frequently; I am trying to change the code to avoid the frequent updates, since I guess changing JAVA_OPTS alone does not work. Could you give me some help? Thanks, LB

On Wed, Jan 19, 2011 at 10:05 PM, Adam Estrada <estrada.adam.gro...@gmail.com> wrote:
> Is anyone familiar with the environment variable JAVA_OPTS? I set
> mine to a much larger heap size and never had any of these issues
> again.
>
> JAVA_OPTS = -server -Xms4048m -Xmx4048m
>
> Adam
>
> On Wed, Jan 19, 2011 at 3:29 AM, Isan Fulia wrote:
> > Hi all,
> > By adding more servers, do you mean sharding of the index? And after sharding,
> > how will my query performance be affected? Will the query execution time increase?
> >
> > Thanks,
> > Isan Fulia.
> >
> > On 19 January 2011 12:52, Grijesh wrote:
> >>
> >> Hi Isan,
> >>
> >> It seems your index size of 25GB is much more than your total RAM of 4GB.
> >> You have to do 2 things to avoid the Out Of Memory problem:
> >> 1 - Buy more RAM; add at least 12GB more.
> >> 2 - Increase the memory allocated to Solr by setting the Xmx value; allocate
> >> at least 12GB to Solr.
> >>
> >> But if your whole index fits into the cache memory, it will give you the
> >> better result.
> >>
> >> Also add more servers to load balance, as your QPS is high. Your 7 lakh
> >> records making a 25GB index looks quite high; try to lower the index size.
> >> What are you indexing in your 25GB of index?
> >>
> >> -
> >> Thanx:
> >> Grijesh
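Note that in the catalina.sh fragment above, the -Xms/-Xmx flags sit in the else branch, so they only take effect when LOGGING_MANAGER is already set. A sketch of the more conventional place for heap settings, assuming Tomcat 6 or later (the sizes are illustrative):

# $CATALINA_HOME/bin/setenv.sh -- sourced by catalina.sh on every start,
# so catalina.sh itself stays unmodified
JAVA_OPTS="$JAVA_OPTS -server -Xms4096m -Xmx4096m"
export JAVA_OPTS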
Detailed Steps for Scaling Solr
Dear all, I need to build a site that supports searching over a large index, so I think scaling Solr is required. However, I have not found a tutorial that walks through this step by step. I only have two references, and neither spells out the exact operations. 1) http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr 2) David Smiley, Eric Pugh; Solr 1.4 Enterprise Search Server. If you have experience scaling Solr, could you share such tutorials? Thanks so much! LB
My Plan to Scale Solr
Dear all, I started learning Solr three months ago, so my experience is still limited. Currently I crawl Web pages with my own crawler and send the data to a single Solr server, and it runs fine. Since the potential user base is large, I have decided to scale Solr. After configuring replication, a single index can be replicated to multiple servers. I think shards are also required: I plan to split the index according to data categories and priorities, then apply the replication setup above to get high performance. The remaining work should not be too difficult. I noticed some new terms, such as SolrCloud, Katta and ZooKeeper. From my current understanding, it seems I can ignore them. Am I right? What benefits would I get from using them? Thanks so much! LB
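For the replication part of the plan, the stock ReplicationHandler is configured in solrconfig.xml on both master and slaves; a sketch (the host, core and interval are illustrative):

<!-- master solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- slave solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8080/solr/core0/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>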
Selection Between Solr and Relational Database
Dear all, I have been learning Solr for two months. At least right now, my system runs well on a Solr cluster. I have a question about implementing one feature in my system. When retrieving documents by keyword, I believe Solr is faster than a relational database. However, if I do the following operations, I guess the performance must be lower. Is that right? What I am trying to do is listed as follows (a query sketch follows the list). 1) All of the documents in Solr have one field that is used to differentiate them; different categories have a different value in this field, e.g. Group; the documents are classified as "news", "sports", "entertainment" and so on. 2) Retrieve all of the documents by the field Group. 3) Besides the Group field, there is another field called CreatedTime; I will filter the documents retrieved by Group according to the value of CreatedTime, and the filtered documents are the final results I need. I guess the performance of this operation is lower than a relational database, right? Could you please give me an explanation? Best regards, Li Bing
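Steps 2) and 3) map naturally onto Solr filter queries, which are cached separately from the main query in the filterCache; a sketch in SolrJ (field names from the post, values illustrative):

SolrQuery query = new SolrQuery("*:*");
// Filter queries restrict the result set and are cached for reuse
// across searches, so repeated Group/CreatedTime filters stay cheap.
query.addFilterQuery("Group:news");
query.addFilterQuery("CreatedTime:[2011-01-01T00:00:00Z TO NOW]");
QueryResponse rsp = solr.query(query);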
Re: SolrJ Tutorial
Dear Lance, Could you tell me where I can find the unit test code? I appreciate your help! Best regards, LB

On Sat, Jan 22, 2011 at 3:58 PM, Lance Norskog wrote:
> The unit tests are simple and show the steps.
When Index is Updated Frequently
Dear all, In my experience, when a Lucene index is updated frequently, its performance drops. Is that correct? In my system, most data crawled from the Web is indexed once, and the corresponding index will NOT be updated any more. However, some indexes must be updated frequently, like records in a relational database. These indexes are not as large as the crawled data, and the updated index will NOT be scaled out to many other nodes; most of the time it lives on a very limited number of machines. In this case, may I still use Lucene indexes, or do I need to replace them with a relational database? Thanks so much! LB
Re: When Index is Updated Frequently
Dear Michael, Thanks so much for your answer! I have a question: even if Lucene is good at updating, frequent updates must still put more load on the Solr cluster. So in my system I will leave the large amount of crawled data unchanged forever, and use a traditional database to keep the mutable data. Fortunately, in most Internet systems the amount of mutable data is much smaller than the immutable data. What do you think of this solution? Best, LB

On Sat, Mar 5, 2011 at 2:45 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> On Fri, Mar 4, 2011 at 10:09 AM, Bing Li wrote:
>
> > In my experience, when a Lucene index is updated frequently, its
> > performance drops. Is that correct?
>
> In fact Lucene can gracefully handle a high rate of updates with low
> latency turnaround on the readers, using the near-real-time (NRT) API
> -- IndexWriter.getReader() (or in the soon-to-be 3.1,
> IndexReader.open(IndexWriter)).
>
> NRT is really something of a hybrid of "eventual consistency" and
> "immediate consistency", because it lets your app have full control
> over how quickly changes must be visible, by controlling when you
> pull a new NRT reader.
>
> That said, Lucene can't offer true immediate consistency at a high
> update rate -- the time to open a new NRT reader is usually too costly
> to do, e.g., for every search. But, e.g., every 100 msec (say) is
> reasonable (depending on many variables...).
>
> So... for your app you should run some tests and see. And please
> report back.
>
> (But, unfortunately, NRT hasn't been exposed in Solr yet...).
>
> --
> Mike
>
> http://blog.mikemccandless.com
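For reference, a minimal sketch of the NRT API Mike mentions, against the Lucene 3.x interface of that era (writer setup elided; IndexWriter.getReader() and IndexReader.reopen() were the current method names):

// Reuse the application's existing IndexWriter.
IndexReader reader = writer.getReader(); // NRT reader: sees not-yet-committed adds
// ... search with new IndexSearcher(reader) ...

// Periodically (e.g. every 100 msec), refresh the view:
IndexReader newReader = reader.reopen(); // cheap no-op if nothing changed
if (newReader != reader)
{
    reader.close();
    reader = newReader;
}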
how often do you boys restart your tomcat?
I find that if I do not restart the master's Tomcat for some days, the load average keeps rising to a high level and Solr becomes slow and unstable, so I added a crontab entry to restart Tomcat every day (sketched below). Do you boys restart your tomcat? And is there any way to avoid restarting Tomcat?
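A sketch of the kind of crontab entry described (the paths and time are illustrative, not from the post):

# /etc/crontab -- restart Tomcat at 04:30 every day
30 4 * * * root /usr/local/tomcat/bin/shutdown.sh && /usr/local/tomcat/bin/startup.sh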
Re: how often do you boys restart your tomcat?
I want to let the system do the job instead of the sysadmin, because I'm lazy ~ ^__^ But I just want a better way to fix the problem: restarting the server causes other problems, like having to rebuild the changes that happened during the restart.

2011/7/27 Dave Hall :
> On 27/07/11 11:42, Bing Yu wrote:
>>
>> do you boys restart your tomcat? and is there any way to avoid restart
>> tomcat?
>
> Our female sysadmin takes care of managing our server.
I can't pass the unit tests when compiling from apache-solr-3.3.0-src
I just go to apache-solr-3.3.0/solr and run 'ant test'. The JUnit tests always fail and report 'BUILD FAILED', but if I type 'ant dist', I get an apache-solr-3.3-SNAPSHOT.war with no warnings. Is this a problem only on my machine? My server: CentOS 5.6 64bit / apache-ant-1.8.2 / junit; both JRockit and Sun JDK 1.6 fail.
Multiple Embedded Servers Pointing to single solrhome/index
Hi, I'm trying to use two embedded Solr servers pointing to the same solrhome / index. That is, something like the following on both applications:

// Point the embedded server at the shared solrhome.
System.setProperty("solr.solr.home", "SomeSolrDir");
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();
m_server = new EmbeddedSolrServer(coreContainer, "");

The problem is, after I have done one add+commit of a SolrInputDocument on one embedded server, the other server can never obtain the write lock. I'm thinking there must be a way of releasing the write lock so other servers may pick it up. Is there an API that does so? Any input is appreciated. Bing
Re: Multiple Embedded Servers Pointing to single solrhome/index
Thanks Lance. The use case is a cluster of nodes that run the same application with an EmbeddedSolrServer on each of them, all pointing to the same index on NFS. Every application is designed to be equal, meaning that any of them may index and/or search. That way, after every commit the writer needs to be closed so it becomes available to the other nodes. Do you see any issues with this use case? Is the EmbeddedSolrServer able to release its write lock without shutting down?
Does Solr support 'Value Search'?
Hi folks, Just wondering if there is a query handler that simply takes a query string and searches all or part of the fields for field values? e.g. q=*admin* The response might look like: author: [admin, system_admin, sub_admin] last_modifier: [admin, system_admin, sub_admin] doctitle: [AdminGuide, AdminManual]
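The closest stock feature may be the TermsComponent, which returns indexed terms grouped per field; a sketch of a request against the default /terms handler (field names taken from the example above):

http://localhost:8983/solr/terms?terms.fl=author&terms.fl=doctitle&terms.prefix=admin

Note that it matches indexed terms rather than stored values, so the results reflect whatever analysis the fields use, and matching is by prefix (or terms.regex) over the term text.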
Solr index storage strategy on FileSystem
Hi folks, With StandardDirectoryFactory, the index is stored under data/index in the form of frq, tim, tip and a few other files. As the index grows, more files are generated, and sometimes a few of them are merged, so there appear to be segmentation and merging strategies at work. My question is: are these strategies configurable? Basically I want to add a size limit for any individual file. Is that feasible without changing Solr core code? Thanks! Bing
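Merging is governed by the merge policy, which is configurable from solrconfig.xml; a sketch that caps the size of merged segments, assuming a Solr 4.x-style <indexConfig> section (the number is illustrative, and note this bounds segments, not each individual file extension):

<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- stop producing merged segments larger than ~512MB -->
    <double name="maxMergedSegmentMB">512.0</double>
  </mergePolicy>
</indexConfig>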
Re: Does Solr support 'Value Search'?
Thanks for the response, but wait... is that related to my question about searching for field values? I was not asking how to use wildcards.
Re: Does Solr support 'Value Search'?
I don't quite understand, but let me explain the problem I had. The response would contain only fields and a list of field values that match the query; essentially it's querying for field values rather than documents. The underlying use case: when typing in a quick-search box, the drop-down menu may contain matches on authors, on doctitles, and potentially on other fields. Still, thanks for your response, and hopefully I'm making it clearer. Bing
Re: Multiple Embedded Servers Pointing to single solrhome/index
Makes sense. Thank you.
Re: Does Solr support 'Value Search'?
Thanks Kuli and Mikhail, Using either the TermsComponent or the Suggester I can get some suggested terms, but it's still not clear to me how to get the respective field names. To get those with the TermsComponent, I would need to run a terms query against every possible field, and similar applies to the SpellCheckComponent. A copyField won't help, since I want the original field name. Any suggestions? Bing
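For what it's worth, terms.fl can be repeated in a single TermsComponent request, and the response is keyed by field name, which may give the original field without a query per field. A sketch of the XML response shape for terms.fl=author&terms.fl=doctitle (the values and counts are illustrative):

<lst name="terms">
  <lst name="author">
    <int name="admin">42</int>
    <int name="sub_admin">7</int>
  </lst>
  <lst name="doctitle">
    <int name="adminguide">3</int>
  </lst>
</lst>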
Re: Multiple Embedded Servers Pointing to single solrhome/index
I agree. We chose embedded to minimize the maintenance cost of HTTP Solr servers. One more concern: even if only one node does the indexing, the other nodes need to reopen their index readers periodically to catch up with new changes, right? Is there a Solr request that does this? Thanks, Bing
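One candidate, assuming each reading node keeps hold of its CoreContainer as in the earlier snippet, is a core reload, which discards the old searcher and opens a new one over the current index files (this is heavier than a plain reader reopen, so whether it fits depends on the update rate):

// Hedged sketch of a periodic refresh on a read-only node; "" is the
// default core name used when constructing the EmbeddedSolrServer above.
coreContainer.reload("");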
Multiple SpellCheckComponents
Hello, The background is that I want to use both the Suggest and SpellCheck features in a single query, to get both kinds of alternatives returned at one time. Right now I can only select one of them at query time via spellcheck.dictionary ("default" .. "suggest"). Am I able to use two separate SpellCheckComponents, one for each, and add them to the same SearchHandler to achieve this? I tried, and it seems one overwrites the other.
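A commonly documented alternative keeps a single SpellCheckComponent holding both dictionaries, selected per request; a sketch of the solrconfig.xml (field and analysis details elided, names taken from the post):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- conventional spellchecking dictionary -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">text</str>
  </lst>
  <!-- suggester dictionary -->
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="field">text</str>
  </lst>
</searchComponent>

Whether one request can draw on both dictionaries at once depends on the Solr version; historically spellcheck.dictionary selected a single dictionary per request.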
SpellCheckComponent Collation query
Hello, >From spell check component I'm able to get the collation query and its # of hits. Is it possible to have solr execute the collated query automatically and return doc search results without resending it on client side? Thanks, Bing -- View this message in context: http://lucene.472066.n3.nabble.com/SpellCheckComponent-Collation-query-tp4000273.html Sent from the Solr - User mailing list archive at Nabble.com.
Tlog vs. buffer + softcommit.
Hello, I'm a bit confused about the purpose of transaction logs (update logs) in Solr. My understanding is: an update request comes in, and the new item is first put in the RAM buffer as well as the t-log. After a soft commit happens, the new item becomes searchable but is not yet hard-committed to stable storage; configuring the soft commit interval to 1 sec achieves NRT. Then what exactly is the t-log doing in this scenario? Why is it there, and under what circumstances is it cleared? I searched for online documentation with no success, and am trying to get something from the source code. Any hints would be appreciated. Thanks, Bing
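For reference, the pieces involved look roughly like this in solrconfig.xml (the intervals are illustrative). The t-log exists for durability and recovery of updates that are only soft-committed, and old logs are dropped after hard commits:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <autoCommit>
    <maxTime>15000</maxTime>            <!-- hard commit: flushes to disk, retires old tlogs -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>             <!-- soft commit: visibility only, ~NRT -->
  </autoSoftCommit>
</updateHandler>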
Re: Tlog vs. buffer + softcommit.
Thanks for the information; it definitely helps a lot. There are numDeletesToKeep = 1000 and numRecordsToKeep = 100 in UpdateLog, so that is probably what you're referring to. However, when I was indexing, the total size of the tlogs kept increasing; it doesn't look like there's a cap on the number of documents? Also, is there an intro to peersync available online?
Re: Tlog vs. buffer + softcommit.
I remember I did set the 15 sec autocommit and still saw the tlogs growing without bound, but it sounds like in theory they should not if I index at a constant rate; I'll probably try it again sometime. As for peersync, I think SolrCloud now uses push replication over pull, and it makes sense to keep some amount of tlog around for peers to sync up from. Thanks, Bing
Solr4.0 Partially update document
Hi, Several days ago I came across some SolrJ test code on partially updating document field values; sadly, I forgot where that was. In Solr 4.0, "/update" can take a document id and fields as maps, like {"id":"doc1","field1":{"set":"new_value"}}. I'm just trying to figure out the SolrJ client code that does this. Thanks for any help on this, Bing
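A sketch of the SolrJ side of such an atomic update (the server URL and field names are illustrative):

import java.util.Collections;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc1");
// A map value of {"set": ...} tells Solr to replace just this field
// on the existing document instead of reindexing the whole document.
doc.addField("field1", Collections.singletonMap("set", "new_value"));
server.add(doc);
server.commit();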
Re: Solr4.0 Partially update document
Got it at https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/solrj/src/test/org/apache/solr/client/solrj/SolrExampleTests.java Problem solved.
Getting Suggestions without Search Results
Hi, I have a spell check component that does auto-complete suggestions. It is part of the "last-components" of my /select search handler, so apart from the normal search results I also get a list of suggestions. Now I want to split things up: is there a way to get only the suggestions for a query, without the normal search results? I may need to create a new handler for this. Can anyone give me some ideas on that? Thanks, Bing
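One pattern from the Suggester documentation is a dedicated handler whose component list contains only the suggest component, so the normal query component never runs; a sketch (the dictionary name is illustrative):

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <!-- replace, rather than append to, the default component chain -->
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>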
Re: Getting Suggestions without Search Results
Great comments. Thanks to you all. Bing
Re: Indexing thousands file on solr
You could write a client using SolrJ and loop through all the files in that folder. Something like:

// Push each file through the ExtractingRequestHandler for Tika extraction.
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(new File(fileLocation), null);
ModifiableSolrParams p = new ModifiableSolrParams();
p.add("literal.id", str); // supply the unique key as a literal
...
up.setParams(p);
server.request(up);

Bing
Re: Are there any comparisons of Elastic Search specifically with SOLR 4?
Most existing comparisons were done against Solr 3.x or earlier. After Solr 4 added cloud concepts similar to ES's, there are really fewer differences. In my opinion, Solr is heavier and was not designed to maximize elasticity. It's not hard to decide which way to go, as long as you have a preference between better scalability and better stability and support. Bing
Send plain text file to solr for indexing
Hello, I used to use Solr Cell, which has built-in Tika support, to handle both extraction and indexing of raw documents. Now I have another text-extraction provider that converts raw documents to plain .txt files, so I want Solr to bypass the extraction phase. Is there a way I can send a plain txt file to Solr and simply index it as a full-text field, without running extraction on the file? Thanks, Bing
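A sketch of doing this from SolrJ by reading the file and posting its content as an ordinary field, so nothing parses or alters the text (the field names, URL and UTF-8 encoding are assumptions):

import java.io.*;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Read the already-extracted .txt file verbatim.
StringBuilder sb = new StringBuilder();
BufferedReader in = new BufferedReader(
    new InputStreamReader(new FileInputStream("doc1.txt"), "UTF-8"));
try {
    String line;
    while ((line = in.readLine()) != null) {
        sb.append(line).append('\n');
    }
} finally {
    in.close();
}

// Index the text into a regular full-text field; no Tika involved.
HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc1");
doc.addField("fulltext", sb.toString());
server.add(doc);
server.commit();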
Re: Send plain text file to solr for indexing
So in order to use Solr Cell I would have to add a number of dependent libraries, which is one of the things I'm trying to avoid. The second thing is that Solr Cell still parses the plain text files, and I don't want it to make any changes to my exported files. Any ideas? Bing
Re: Send plain text file to solr for indexing
Thanks, Mr. Yagami, I'll look into that. Jack, as for the latter two options, they both require reading the entire text file into memory, right? Bing