Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Simon Cheng
Hi,

I don't know whether it is my setup or something else, but a very simple
sort is not working in my Solr 4.7 environment.

The query is very simple:
http://localhost:8983/solr/bibs/select?q=author:soros&fl=id,author,title&sort=title+asc&wt=xml&start=0&indent=true

And the output is NOT sorted by title:



responseHeader: status=0, QTime=1
params: sort=title asc, fl=id,author,title, indent=true, start=0, q=author:soros, wt=xml

id: 9018
author: Soros, George, 1930-
title: The alchemy of finance : reading the mind of the market / George Soros

id: 15785
author: Soros, George, 1930-
author: Soros Foundations
title: Bosnia / by George Soros

id: 16281
author: Soros, George, 1930-
author: Soros Foundations
title: Prospect for European disintegration / by George Soros

id: 25807
author: Soros, George
title: Open society : reforming global capitalism / George Soros

id: 27440
title: George Soros on globalization
author: Soros, George, 1930-

id: 22254
author: Soros, George, 1930-
title: The crisis of global capitalism : open society endangered / George Soros

id: 16914
author: Soros, George, 1930-
author: Soros Fund Management
title: The theory of reflexivity / by George Soros

id: 17343
title: Financial turmoil in Europe and the United States : essays / George Soros
author: Soros, George, 1930-

id: 15542
author: Soros, George, 1930-
author: Harvard Club of New York City
title: Nationalist dictatorships versus open society / by George Soros

id: 15891
author: Soros, George
title: The new paradigm for financial markets : the credit crisis of 2008 and what it means / George Soros





Thank you for the help in advance,
Simon.


Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Simon Cheng
Hi Alex,

It's simply defined like this in schema.xml:

   

and it is copied to the other multi-valued field o_title:

   

Should I simply change the type to be "string" instead?

Thanks again,
Simon.


On Wed, Feb 18, 2015 at 12:00 PM, Alexandre Rafalovitch wrote:

> What's the field definition for your "title" field? Is it just a string,
> or are you doing some tokenizing?
>
> A sort field should be a string or a single token cleaned up (e.g.
> lower-cased) using KeywordTokenizer. In the example schema, you will
> normally see the original field tokenized and the sort field defined
> separately, connected by a copyField. In recent Solr versions, docValues
> are also recommended for sort fields.
>
> Regards,
>    Alex.
>


Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Simon Cheng
Hi Alex,

It works after I added a new field "s_title" to the schema and
re-indexed.

   
   

But how can I ignore the leading articles ("A", "An", "The") when sorting?
As you can see from the example below:

http://localhost:8983/solr/bibs/select?q=singapore&fl=id,title&sort=s_title+asc&wt=xml&start=0&rows=20&indent=true



responseHeader: status=0, QTime=0
params: q=singapore, indent=true, fl=id,title, start=0, sort=s_title asc, rows=20, wt=xml

id: 36
title: 5th SEACEN-Toronto Centre Leadership Seminar for Senior Management of Central Banks on Financial System Oversight, 16-21 Oct 2005, Singapore

id: 70
title: Anti-money laundering & counter-terrorism financing / Commercial Affairs Dept

id: 15
title: China's anti-secession law : a legal perspective / Zou, Keyuan

id: 12
title: China's currency peg : firm in the eye of the storm / Calla Wiemer

id: 22
title: China's politics in 2004 : dawn of the Hu Jintao era / Zheng Yongnian & Lye Liang Fook

id: 92
title: Goods and Services Tax Act [2005 ed.] (Chapter 117A)

id: 13
title: Governing capacity in China : creating a contingent of qualified personnel / Kjeld Erik Brodsgaard

id: 21
title: Health care marketization in urban China / Gu Xin

id: 85
title: Lianhe Zaobao, Sunday

id: 84
title: Singapore : vision of a global city / Jones Lang LaSalle

id: 7
title: Singapore real estate investment trusts : leveraged value / Tony Darwell

id: 96
title: Singapore's success : engineering economic growth / Henri Ghesquiere

id: 23
title: The Chen-Soong meeting : the beginning of inter-party rapprochement in Taiwan? / Raymond R. Wu

id: 17
title: The Haw Par saga in the 1970s / project sponsor, Low Kwok Mun; team leader, Sandy Ho; team members, Audrey Low ... et al

id: 78
title: The New paper on Sunday

id: 95
title: The little Red Dot : reflections by Singapore's diplomats / editors, Tommy Koh, Chang Li Lin

id: 52
title: [Press releases and articles on policy changes affecting the Singapore property market] / compiled by the Information Resource Centre, Monetary Authority of Singapore

id: dataq
title: Simon is testing Solr - This one is in English. Color of the Wind. 我是中国人 , БOΛbШ OЙ PYCCKO-KИTAЙCKИЙ CΛOBAPb , Français-Chinois






Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-18 Thread Simon Cheng
Great help and thanks to you, Alex.


On Wed, Feb 18, 2015 at 2:48 PM, Alexandre Rafalovitch wrote:

> As I mentioned before, you could use the string type if you just want
> to sort on the title as it is. Or you can use a custom type to normalize
> the indexed value, as long as you end up with a single token.
>
> So, if you want to strip leading A/An/The, you can use
> KeywordTokenizer, combined with whatever post-processing you need. I
> would suggest the LowerCase filter and perhaps a Regex filter to strip
> off those leading articles. You may need to iterate a couple of times on
> that specific chain.
>
> The good news is that you can just make a couple of type definitions
> with different values/order, reload the core (from the Cores screen of
> the Web Admin UI) and run some of your sample titles through those
> different definitions in the Analysis screen, without having to
> reindex.
>
> Regards,
>   Alex.
>
> On 17 February 2015 at 22:36, Simon Cheng  wrote:
> > Hi Alex,
> >
> > It's okay after I added in a new field "s_title" in the schema and
> > re-indexed.
> >
> > > multiValued="false"/>
> >
> >
> > But how can I ignore the articles ("A", "An", "The") in the sorting. As you
> > can see from the below example :
>
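
The chain Alex describes (KeywordTokenizer, then a lower-case filter, then a regex filter that strips a leading article) amounts to computing one normalized sort key per title. Below is a minimal Python sketch of that key. It is not Solr configuration: the function name sort_key and the article list are assumptions made for illustration, and the sample titles are shortened from the results earlier in the thread.

```python
import re

# Leading articles to ignore when sorting (assumed list; extend as needed).
# The pattern is applied after lower-casing, so it only needs lower case.
_LEADING_ARTICLE = re.compile(r"^(a|an|the)\s+")


def sort_key(title: str) -> str:
    """Treat the whole title as one token: lower-case it, then drop one leading article."""
    lowered = title.strip().lower()
    return _LEADING_ARTICLE.sub("", lowered, count=1)


titles = [
    "The New paper on Sunday",
    "Singapore : vision of a global city / Jones Lang LaSalle",
    "Lianhe Zaobao, Sunday",
    "The Haw Par saga in the 1970s",
]

# Sorting on the normalized key instead of the raw string.
sorted_titles = sorted(titles, key=sort_key)
```

With this key, "The Haw Par saga in the 1970s" sorts under H and "The New paper on Sunday" under N, while a title such as "Anti-money laundering" is left alone because "An" is not followed by whitespace. The same idea in Solr would live in the field type's analysis chain, which can be tried out in the Analysis screen as suggested above.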


How to trace error records during POST?

2015-04-07 Thread Simon Cheng
Good morning,

I used Solr 4.7 to post 186,745 XML files, and 186,622 of them have been
indexed. That means there are 123 XML files with errors. How can I find out
which files these are?

Thank you in advance,
Simon Cheng.
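
One way to find the failing files is to send them one at a time and record every file whose request fails. The following is a minimal Python sketch under stated assumptions, not what post.jar does internally: the update URL (including the core name "bibs", borrowed from the earlier queries) and the function names post_xml and post_all are hypothetical and would need adjusting to the real setup.

```python
import urllib.request
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# Assumed update endpoint; host, port and core name must match your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/bibs/update"


def post_xml(path: Path, url: str = SOLR_UPDATE_URL) -> None:
    """POST one XML file to Solr. urlopen raises on HTTP 4xx/5xx and on connection errors."""
    request = urllib.request.Request(
        url,
        data=path.read_bytes(),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        response.read()


def post_all(
    paths: Iterable[Path],
    post_one: Callable[[Path], None] = post_xml,
) -> List[Tuple[Path, str]]:
    """Post each file separately and return (path, error message) for every failure."""
    failures = []
    for path in paths:
        try:
            post_one(path)
        except Exception as exc:  # keep going so that one bad file does not stop the run
            failures.append((path, str(exc)))
    return failures


# Example usage (commented out because it needs a running Solr instance):
# for path, error in post_all(sorted(Path("xml_dir").glob("*.xml"))):
#     print(path, error)
```

Posting file by file is slower than one bulk run, but each failure is tied to a file name. A commit still has to be sent afterwards for the documents to become searchable.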


Tracing Files Which Have Errors

2014-06-19 Thread Simon Cheng
Hi there,

I have posted 190,000 simple XML files using post.jar, and only 8 files
had errors. But how do I find out which ones had errors?

Thank you in advance,
Simon Cheng.


Re: ICUTokenizer or StandardTokenizer or ??? for "text_all" type field that might include non-whitespace langs

2014-06-20 Thread Simon Cheng
Hi Tim,

I'm working on a similar project with some differences, and maybe we can
share our knowledge in this area:

1) I have no problem with Chinese characters. You can try this link:

http://123.100.239.158:8983/solr/collection1/browse?q=%E4%B8%AD%E5%9B%BD

Solr can find the record even when the phrase 中国 (meaning China) is in the
middle of a sentence.

2) My problem relates more to other Asian languages; Thai and Arabic
are two examples. I read at
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters that
solr.ICUTokenizerFactory can overcome the problem, and I am exploring this
approach at the moment.

Simon.



On Sat, Jun 21, 2014 at 7:37 AM, T. Kuro Kurosaka wrote:

> On 06/20/2014 04:04 AM, Allison, Timothy B. wrote:
>
>> Let's say a predominantly English document contains a Chinese sentence.
>>  If the English field uses the WhitespaceTokenizer with a basic
>> WordDelimiterFilter, the Chinese sentence could be tokenized as one big
>> token (if it doesn't have any punctuation, of course) and will be
>> effectively unsearchable...barring use of wildcards.
>>
>
> In my experiment with Solr 4.6.1, both StandardTokenizer and ICUTokenizer
> generate a token per han character. So they are searchable, though
> precision suffers. But in your scenario, Chinese text is rare, so some
> precision loss may not be a real issue.
>
> Kuro
>
>
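
The difference discussed above can be illustrated without Solr. This is a rough Python sketch that only imitates the two behaviours (one big token from whitespace splitting versus one token per han character); it is not the actual StandardTokenizer or ICUTokenizer logic, the function names are made up for the example, and the character range covers only the basic CJK Unified Ideographs block.

```python
import re
from typing import List

# Basic CJK Unified Ideographs block only; a simplification for illustration.
_HAN = re.compile(r"[\u4e00-\u9fff]")


def whitespace_tokens(text: str) -> List[str]:
    """Split on whitespace only, so an unspaced Chinese sentence stays one token."""
    return text.split()


def per_han_tokens(text: str) -> List[str]:
    """Split on whitespace, then emit every han character as its own token."""
    tokens = []
    for chunk in text.split():
        buffer = ""
        for ch in chunk:
            if _HAN.match(ch):
                if buffer:
                    tokens.append(buffer)
                    buffer = ""
                tokens.append(ch)
            else:
                buffer += ch
        if buffer:
            tokens.append(buffer)
    return tokens


sentence = "Solr finds 我是中国人 easily"
big_token_view = whitespace_tokens(sentence)
per_char_view = per_han_tokens(sentence)
```

In the first view a query for 中国 has no matching token, because the only Chinese token is the whole sentence. In the second view 中 and 国 are separate adjacent tokens, so the phrase can be found, at the cost of also matching documents where the two characters merely occur next to each other in another word.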