Simple Sort Is Not Working In Solr 4.7?
Hi,

I don't know whether it is my setup or something else, but a very simple sort is not working in my Solr 4.7 environment. The query is:

http://localhost:8983/solr/bibs/select?q=author:soros&fl=id,author,title&sort=title+asc&wt=xml&start=0&indent=true

And the output is NOT sorted by title:

0 1 title asc id,author,title true 0 author:soros xml

9018 Soros, George, 1930- The alchemy of finance : reading the mind of the market / George Soros
15785 Soros, George, 1930- Soros Foundations Bosnia / by George Soros
16281 Soros, George, 1930- Soros Foundations Prospect for European disintegration / by George Soros
25807 Soros, George Open society : reforming global capitalism / George Soros
27440 George Soros on globalization Soros, George, 1930-
22254 Soros, George, 1930- The crisis of global capitalism : open society endangered / George Soros
16914 Soros, George, 1930- Soros Fund Management The theory of reflexivity / by George Soros
17343 Financial turmoil in Europe and the United States : essays / George Soros Soros, George, 1930-
15542 Soros, George, 1930- Harvard Club of New York City Nationalist dictatorships versus open society / by George Soros
15891 Soros, George The new paradigm for financial markets : the credit crisis of 2008 and what it means / George Soros

Thank you in advance for the help,
Simon.
Re: Simple Sort Is Not Working In Solr 4.7?
Hi Alex,

It's simply defined like this in the schema.xml :

and it is cloned to the other multi-valued field o_title :

Should I simply change the type to be "string" instead?

Thanks again,
Simon.

On Wed, Feb 18, 2015 at 12:00 PM, Alexandre Rafalovitch wrote:
> What's the field definition for your "title" field? Is it just string
> or are you doing some tokenizing?
>
> It should be a string or a single token cleaned up (e.g. lower-cased)
> using KeywordTokenizer. In the example schema, you will normally see
> the original field tokenized and the sort field separately with
> copyField connection. In latest Solr, docValues are also recommended
> for sort fields.
>
> Regards,
> Alex.
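Note: a minimal sketch of the "tokenized original plus separate sort field" layout that Alex describes above. It assumes the searchable "title" field is single-valued and that a "string" type (solr.StrField) is defined, as in the Solr example schema. The field name s_title is the one Simon uses later in the thread; the attribute values are assumptions, not Simon's actual configuration.

```xml
<!-- Sketch only: an untokenized copy of the title, used purely for sorting -->
<field name="s_title" type="string" indexed="true" stored="false" multiValued="false"/>

<!-- Copy the raw title text into the sort field at index time -->
<copyField source="title" dest="s_title"/>
```

With something like this in place, the query would sort on the copy (sort=s_title+asc) rather than on the tokenized field. Existing documents have to be re-indexed before the new field contains any values.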
Re: Simple Sort Is Not Working In Solr 4.7?
Hi Alex,

It works after I added a new field "s_title" to the schema and re-indexed. But how can I ignore the articles ("A", "An", "The") when sorting? As you can see from the example below:

http://localhost:8983/solr/bibs/select?q=singapore&fl=id,title&sort=s_title+asc&wt=xml&start=0&rows=20&indent=true

0 0 singapore true id,title 0 s_title asc 20 xml

36 5th SEACEN-Toronto Centre Leadership Seminar for Senior Management of Central Banks on Financial System Oversight, 16-21 Oct 2005, Singapore
70 Anti-money laundering & counter-terrorism financing / Commercial Affairs Dept
15 China's anti-secession law : a legal perspective / Zou, Keyuan
12 China's currency peg : firm in the eye of the storm / Calla Wiemer
22 China's politics in 2004 : dawn of the Hu Jintao era / Zheng Yongnian & Lye Liang Fook
92 Goods and Services Tax Act [2005 ed.] (Chapter 117A)
13 Governing capacity in China : creating a contingent of qualified personnel / Kjeld Erik Brodsgaard
21 Health care marketization in urban China / Gu Xin
85 Lianhe Zaobao, Sunday
84 Singapore : vision of a global city / Jones Lang LaSalle
7 Singapore real estate investment trusts : leveraged value / Tony Darwell
96 Singapore's success : engineering economic growth / Henri Ghesquiere
23 The Chen-Soong meeting : the beginning of inter-party rapprochement in Taiwan? / Raymond R. Wu
17 The Haw Par saga in the 1970s / project sponsor, Low Kwok Mun; team leader, Sandy Ho; team members, Audrey Low ... et al
78 The New paper on Sunday
95 The little Red Dot : reflections by Singapore's diplomats / editors, Tommy Koh, Chang Li Lin
52 [Press releases and articles on policy changes affecting the Singapore property market] / compiled by the Information Resource Centre, Monetary Authority of Singapore
dataq Simon is testing Solr - This one is in English. Color of the Wind. 我是中国人 , БOΛbШ OЙ PYCCKO-KИTAЙCKИЙ CΛOBAPb , Français-Chinois
Re: Simple Sort Is Not Working In Solr 4.7?
Great help and thanks to you, Alex.

On Wed, Feb 18, 2015 at 2:48 PM, Alexandre Rafalovitch wrote:
> Like I mentioned before, you could use the string type if you just want
> the title as it is. Or you can use a custom type to normalize the indexed
> value, as long as you end up with a single token.
>
> So, if you want to strip leading A/An/The, you can use
> KeywordTokenizer, combined with whatever post-processing you need. I
> would suggest LowerCase filter and perhaps Regex filter to strip off
> those leading articles. You may need to iterate a couple of times on
> that specific chain.
>
> The good news is that you can just make a couple of type definitions
> with different values/order, reload the core (from the Cores screen of
> the Web Admin UI) and run some of your sample titles through those
> different definitions in the Analysis screen, without having to
> reindex.
>
> Regards,
> Alex.
>
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>
> On 17 February 2015 at 22:36, Simon Cheng wrote:
> > Hi Alex,
> >
> > It's okay after I added in a new field "s_title" in the schema and
> > re-indexed.
> >
> > multiValued="false"/>
> >
> > But how can I ignore the articles ("A", "An", "The") in the sorting. As you
> > can see from the below example :
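Note: a minimal sketch of the analysis chain Alex suggests above (KeywordTokenizer, then lower-casing, then a regex that strips a leading article). The type name title_sort and the exact pattern are hypothetical and would need the kind of iteration in the Analysis screen that Alex mentions; the factory classes are the standard Solr ones, and the layout is modelled on the alphaOnlySort type in the Solr example schema.

```xml
<!-- Sketch only: produces one lower-cased token per title, minus a leading article -->
<fieldType name="title_sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- Keep the whole title as a single token so the field stays sortable -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Make the sort case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Remove surrounding whitespace before matching the article -->
    <filter class="solr.TrimFilterFactory"/>
    <!-- Strip one leading "a", "an" or "the" followed by whitespace (assumed pattern) -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(a|an|the)\s+" replacement="" replace="first"/>
  </analyzer>
</fieldType>

<!-- The existing sort field would then switch to the new type -->
<field name="s_title" type="title_sort" indexed="true" stored="false" multiValued="false"/>
```

Under this sketch a title such as "The New paper on Sunday" would be indexed for sorting as "new paper on sunday". The pattern only covers English articles and does not touch leading punctuation such as "[", so titles like the bracketed press-release entry would still sort by that character.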
How to trace error records during POST?
Good morning,

I used Solr 4.7 to post 186,745 XML files, and 186,622 of them have been indexed. That means 123 XML files had errors. How can I find out which files these are?

Thank you in advance,
Simon Cheng.
Tracing Files Which Have Errors
Hi there,

I have posted 190,000 simple XML files using POST.JAR, and only 8 of them had errors. But how do I find out which ones have errors?

Thank you in advance,
Simon Cheng.
Re: ICUTokenizer or StandardTokenizer or ??? for "text_all" type field that might include non-whitespace langs
Hi Tim,

I'm working on a similar project, with some differences, and maybe we can share our knowledge in this area:

1) I have no problem with Chinese characters. You can try this link:
http://123.100.239.158:8983/solr/collection1/browse?q=%E4%B8%AD%E5%9B%BD
Solr can find the record even when the phrase 中国 (meaning China) is in the middle of a sentence.

2) My problem relates more to other Asian languages; Thai and Arabic are two examples. I read at https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters that solr.ICUTokenizerFactory can overcome the problem, and I am exploring this approach at the moment.

Simon.

On Sat, Jun 21, 2014 at 7:37 AM, T. Kuro Kurosaka wrote:
> On 06/20/2014 04:04 AM, Allison, Timothy B. wrote:
>> Let's say a predominantly English document contains a Chinese sentence.
>> If the English field uses the WhitespaceTokenizer with a basic
>> WordDelimiterFilter, the Chinese sentence could be tokenized as one big
>> token (if it doesn't have any punctuation, of course) and will be
>> effectively unsearchable...barring use of wildcards.
>
> In my experiment with Solr 4.6.1, both StandardTokenizer and ICUTokenizer
> generate a token per han character. So they are searchable, though
> precision suffers. But in your scenario, Chinese text is rare, so some
> precision loss may not be a real issue.
>
> Kuro
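Note: a minimal sketch of a field type built on the solr.ICUTokenizerFactory that Simon mentions above. The type name text_icu and the folding filter are assumptions chosen for illustration. The ICU classes are not part of the Solr core; this assumes the analysis-extras contrib jars (the Lucene ICU analyzers and icu4j) have been added to the core's classpath.

```xml
<!-- Sketch only: one general-purpose type for mixed-language text -->
<fieldType name="text_icu" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Segments text using Unicode rules, including scripts written without spaces such as Thai -->
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- Assumed extra step: case folding and accent/width normalization across scripts -->
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Sample Thai, Arabic and Chinese strings can be run through a type like this in the Analysis screen of the Admin UI to see the tokens it produces before committing to a re-index.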