help with using ngram analyser needed

2008-02-22 Thread Christian Wittern
achieve is a substring match that would match any sequence of characters in the target field. Any help appreciated, Christian -- Christian Wittern, Kyoto
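A minimal sketch of the idea behind n-gram based substring matching, for readers following the thread. It is not Solr's analyzer: the function names (`char_ngrams`, `build_index`, `find_substring`) and the sample documents are hypothetical, and the gram size of 2 is an assumption. Every field is split into overlapping character n-grams; a query is split the same way, and only documents containing all query grams are checked for a real match.

```python
def char_ngrams(text, n=2):
    """Return all overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(docs, n=2):
    """Map each n-gram to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for gram in char_ngrams(text, n):
            index.setdefault(gram, set()).add(doc_id)
    return index


def find_substring(index, docs, query, n=2):
    """Return ids of documents that contain query as a substring."""
    grams = char_ngrams(query, n)
    if not grams:
        # Query shorter than n: no grams to look up, so scan everything.
        candidates = set(docs)
    else:
        # A match must contain every gram of the query.
        candidates = set(docs)
        for gram in grams:
            candidates &= index.get(gram, set())
    # Shared grams do not guarantee a contiguous match, so verify.
    return {doc_id for doc_id in candidates if query in docs[doc_id]}


# Hypothetical sample data, for illustration only.
docs = {1: "京都大学人文科学研究所", 2: "東京大学"}
index = build_index(docs)
```

With this data, `find_substring(index, docs, "人文")` returns only document 1, while `"大学"` matches both. The final verification step matters because two documents can share all of a query's grams without containing the query as one contiguous run.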

Re: help with using ngram analyser needed

2008-02-22 Thread Christian Wittern
Otis Gospodnetic wrote: Great, this works and should give me a start for further experiments. Thanks a lot! Christian

no support for CJK characters from Extension B in Solr

2008-02-27 Thread Christian Wittern
sets, some of the characters in everyday use in Japan are now encoded in this area. It therefore seems highly desirable that this problem be solved. I am testing this on a Mac OS X 10.5.2 system, with Java 1.5.0_13 and Solr 1.2.0. Any hints appreciated, Christian Wittern -- Christian

Re: no support for CJK characters from Extension B in Solr

2008-02-27 Thread Christian Wittern
Leonardo Santagada wrote: On 28/02/2008, at 00:23, Christian Wittern wrote: The documents I am trying to index with Solr contain characters from the CJK Extension B, which was added to Unicode in version 3.1 (March 2001). Just to give more information, does Java support this? I

Re: no support for CJK characters from Extension B in Solr

2008-02-28 Thread Christian Wittern
characters, given the processes involved. Steve -- Christian Wittern Institute for Research in Humanities, Kyoto University 47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN

Re: no support for CJK characters from Extension B in Solr

2008-02-28 Thread Christian Wittern
Ken Krugler wrote: What was the actual format of the Extension B characters in the XML being posted? I tried both a binary (UTF-8) format and numeric character representation of the type 𠀀 -- the results were the same. Christian -- Christian Wittern Institute for Research in
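For reference, a small sketch of the representations being discussed, using U+20000 (the character 𠀀, the first code point of CJK Extension B) as the example. The helper names are hypothetical, and the hexadecimal form of the numeric character reference is an assumption, since the archive does not show which form was actually posted. The point is that the character lies outside the Basic Multilingual Plane, so it takes four bytes in UTF-8 and a surrogate pair in UTF-16.

```python
def surrogate_pair(code_point):
    """Split a supplementary code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def numeric_char_ref(char):
    """Return the hexadecimal XML numeric character reference for char."""
    return "&#x{:X};".format(ord(char))


ch = chr(0x20000)                 # first character of CJK Extension B
utf8_bytes = ch.encode("utf-8")   # four bytes in UTF-8
high, low = surrogate_pair(ord(ch))
ncr = numeric_char_ref(ch)
```

Code that treats each 16-bit UTF-16 unit as one character will see two units here instead of one, which is a common source of trouble with supplementary characters.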

Re: no support for CJK characters from Extension B in Solr

2008-02-28 Thread Christian Wittern
example directory -- I am just assuming that this is doing The Right Thing:-) The encoding is (also?) specified in the XML file itself as UTF-8. Christian

Re: invalid XML character

2008-03-01 Thread Christian Wittern
th its binary value. The easiest place to fix it is before the field values are serialized into XML. Indeed! All the best, Christian
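One way to act on that advice is sketched below. It assumes that simply dropping the offending characters is acceptable, which the thread does not confirm, and the function names are hypothetical. The allowed ranges follow the `Char` production of XML 1.0: tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF.

```python
def is_valid_xml_char(code_point):
    """True if code_point is allowed by the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def strip_invalid_xml_chars(text):
    """Drop characters that may not appear in an XML 1.0 document.

    Intended to run on each field value before it is serialized to XML.
    """
    return "".join(c for c in text if is_valid_xml_char(ord(c)))
```

Supplementary characters such as those in CJK Extension B fall in the last range and pass through unchanged; only control characters, lone surrogates, and the non-characters U+FFFE and U+FFFF are removed.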

Result based sorting for KWIC?

2008-03-10 Thread Christian Wittern
implement this? Any ideas appreciated, Christian

Re: Result based sorting for KWIC?

2008-03-11 Thread Christian Wittern
ers, Christian

Re: Result based sorting for KWIC?

2008-03-17 Thread Christian Wittern
search term and then the results are sorted by relevance. But I think I have enough information now to decide how to proceed. Christian

Highlighted field gets truncated

2008-04-18 Thread Christian Wittern
tent,variants&wt=xml&tr=solr-tei.xsl Any hint on how to debug this would be highly appreciated! All the best, Christian

Re: Highlighted field gets truncated

2008-04-18 Thread Christian Wittern
hat hl.simple.pre and *.post are for surrounding the match, not the snippet, right? Christian

Re: Highlighted field gets truncated

2008-04-19 Thread Christian Wittern
g the whole field and doing the fragmenting myself? Fragments are returned as an xml list; you can combine them together however you like in client code. Solr can merge adjacent fragments for you if you wish. I see. That is great. Thanks, Christian -- Christian Wittern Institute for Resear

Re: Highlighted field gets truncated

2008-04-22 Thread Christian Wittern
Mike Klaas wrote: On 19-Apr-08, at 3:02 AM, Christian Wittern wrote: So it could be that the match is not part of the fragment? This sounds a bit strange. Is there a way to make sure the fragment contains the match other than returning the whole field and doing the fragmenting myself