RE: Solr 4.0 and Maven SNAPSHOT artifacts

2012-10-04 Thread Steven A Rowe
http://wiki.apache.org/solr/NightlyBuilds -Original Message- From: Amit Nithian [mailto:anith...@gmail.com] Sent: Thursday, October 04, 2012 1:22 PM To: solr-user@lucene.apache.org Subject: Solr 4.0 and Maven SNAPSHOT artifacts Is there a maven repository location that contains the night

RE: Solr - Remove specific punctuation marks

2012-09-24 Thread Steven A Rowe
Hi Daisy, I can't see anything wrong with the regex or the XML syntax. One possibility: if it's Arabic you're matching against, you may want to add ARABIC FULL STOP U+06D4 to the set you subtract from \p{Punct}. If you give an example of your input and your expected output, I might be able to

RE: failure notice from zju.edu.cn

2012-09-12 Thread Steven A Rowe
I get the same thing, after nearly every email I send directly to the lucene/solr lists (as opposed to auto-sent JIRA posts). I don't think it delays my messages though. Steve -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Wednesday, September 12, 2012 1:24 PM T

RE: Solr - Lucene Debuging help

2012-09-10 Thread Steven A Rowe
Hi Badal, I don't use Eclipse, but I did notice a comment in the Tips section on that wiki page that described a very similar problem and a resolution: Under some conditions, I've seen this process have thousands of compile errors, something like "class XXX defined in multiple

RE: Solr - Lucene Debuging help

2012-09-10 Thread Steven A Rowe
http://wiki.apache.org/solr/HowToConfigureEclipse -Original Message- From: BadalChhatbar [mailto:badal...@yahoo.com] Sent: Monday, September 10, 2012 3:46 PM To: solr-user@lucene.apache.org Subject: Solr - Lucene Debuging help Hi All, I am new to solr and lucene, i have downloaded sourc

RE: cant build trunk

2012-09-10 Thread Steven A Rowe
Hi Radim, Thanks for the report. I just compiled using 'ant compile' and then ran 'ant javadocs' in both lucene/ and solr/, and everything went fine. I have trunk r1382886 and am using Sun JVM 1.6.0_21 on Windows 7. I suspect you're using a JVM with different Javadoc requirements? Anyway, I'v

RE: Solr4 distributed IDF

2012-08-30 Thread Steven A Rowe
Hi Ke, Have you seen ? Steve -Original Message- From: Eric Wu [mailto:eirik...@gmail.com] Sent: Thursday, August 30, 2012 3:05 AM To: solr-user@lucene.apache.org Subject: Solr4 distributed IDF Hi there, Does there exist any issue ticket

RE: Solr4.0 BETA - Error when StempelPolishStemFilterFactory

2012-08-16 Thread Steven A Rowe
I can reproduce - I agree, this seems like a bug. I've opened an issue: https://issues.apache.org/jira/browse/SOLR-3737 Thanks for reporting! Steve -Original Message- From: sausarkar [mailto:sausar...@ebay.com] Sent: Thursday, August 16, 2012 6:42 PM To: solr-user@lucene.apache.org Su

RE: Solr4.0 BETA - Error when StempelPolishStemFilterFactory

2012-08-16 Thread Steven A Rowe
Hi sausarkar, You've probably been hit by the local configuration equivalent of - the Solr example configuration directory added a path segment, so references have to be changed to include an extra "../". Steve -Original Message- From

RE: Any ideas on Solr 4.0 Release.

2012-07-05 Thread Steven A Rowe
Hi Sohail, Some of your questions are answered here: . See Chris Hostetter's blog post for more info, particularly on questions around stability: . Steve -Original Message--

RE: How to import this Json-line by DIH?

2012-06-20 Thread Steven A Rowe
Hi jueljust, Nabble removed the entire content of your email before sending it to the mailing list. Maybe use a different service that doesn't throw away your message? Steve From: jueljust [juelj...@gmail.com] Sent: Wednesday, June 20, 2012 10:56 AM To:

RE: Sort by length percentage match

2012-05-16 Thread Steven A Rowe
Hi Alejandro, N-grams might be a good fit. Using bigrams (n-grams of length 2) for "london", you'd get tokens "lo", "on", "nd", "do", "on". This should provide the hit ordering you want. Although it's not listed on Solr's analysis factories wiki page
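
A minimal sketch of the bigram idea above, in plain Java rather than any Lucene/Solr class. The names BigramSketch and bigrams are made up for illustration; in Solr the n-grams would come from an analysis factory configured in the schema, which this does not attempt to reproduce.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: plain-Java character bigrams, not a Lucene/Solr API.
final class BigramSketch {

    // Returns every run of two adjacent characters, in order, keeping duplicates.
    static List<String> bigrams(String term) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 2 <= term.length(); i++) {
            grams.add(term.substring(i, i + 2));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prints the five bigrams of "london": lo, on, nd, do, on
        System.out.println(bigrams("london"));
    }
}
```

A query term that shares more bigrams with an indexed term matches more tokens, which is why this kind of tokenization tends to rank closer matches higher.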

RE: FrenchLightStemFilterFactory : normalizing tokens longer than 4 characters and having repeated characters in it

2012-05-16 Thread Steven A Rowe
Hi Tanguy, I looked at the code, and I can see where the problem you describe is happening. I think it's a bug: if numbers are search terms, "stemming" them by compressing repeated digits makes little sense. Could you file a bug in JIRA? Please include the examples you gave in your earlier em

[MAVEN] Heads up: build changes

2012-05-08 Thread Steven A Rowe
If you use the Lucene/Solr Maven POMs to drive the build, I committed a major change last night (see https://issues.apache.org/jira/browse/LUCENE-3948 for more details): * 'ant get-maven-poms' no longer places pom.xml files under the lucene/ and solr/ directories. Instead, they are placed in a

RE: How does "start.jar" get build in the Solr trunk repository?

2012-05-07 Thread Steven A Rowe
Hi Neil, ivy-maven-plugin might work: http://evgeny-goldin.com/wiki/Ivy-maven-plugin If it does work for you, I'm interested in the details. Alternatively, you could use maven-antrun-plugin to invoke the Ant build functionality you want from Maven. Steve -Original Message- From: nhoo

RE: Implementing multiterm chain for ICUCollationKeyFilterFactory

2012-05-03 Thread Steven A Rowe
Hi Oliver, Nabble.com stripped out your analysis chain XML before sending your message to the mailing list. My suggestion: stop using Nabble. (I've described this problem to their support people a couple of times, and they apparently just don't care, since it still persists, years later.)

RE: StandardTokenizer and domain names containing digits

2012-04-23 Thread Steven A Rowe
: StandardTokenizer and domain names containing digits Steven A Rowe syr.edu> writes: > StandardTokenizer in Lucene/Solr v3.1+ implements the Word Boundary > rules from Unicode 6.0.0 Standard > Annex #29, a.k.a. UAX#29: <http://www.unicode.org/reports/tr29/tr29-17.html#Word_Bounda

RE: StandardTokenizer and domain names containing digits

2012-04-19 Thread Steven A Rowe
Hi Alex, TLDR; Try adding WordDelimiterFilter to your analyzer(s). StandardTokenizer in Lucene/Solr v3.1+ implements the Word Boundary rules from Unicode 6.0.0 Standard Annex #29, a.k.a. UAX#29: . These rules don't include reco

RE: Moving to Maven from Ant solr.build.dir Not Found

2012-04-10 Thread Steven A Rowe
on 4.0-SNAPSHOT org.apache.lucene lucene-queries 4.0-SNAPSHOT I know I could download the snapshot manually, but I'd much prefer to do that through Maven since I don't need to modify source at all. Eli On 4/10/12 3:14 PM, Steven A Rowe wrote: > You didn't answer my question a

RE: Moving to Maven from Ant solr.build.dir Not Found

2012-04-10 Thread Steven A Rowe
ing wrong here. I'm fine with maven being officially unsupported as long as I can get things working. I'm not doing anything too fancy or out of the ordinary, so I'm thinking this shouldn't be too bad. Thanks again for the help! Eli On 4/10/12 2:12 PM, Steven A Rowe wrote: >

RE: Moving to Maven from Ant solr.build.dir Not Found

2012-04-10 Thread Steven A Rowe
l I'm doing right now is just mvn installing and then trying mvn jetty:run-exploded. I'm not using ant at all, and would really like to keep it that way if at all possible. Eli On 4/10/12 11:56 AM, Steven A Rowe wrote: > Hi Eli, > > The author of the blog post you mentione

RE: Moving to Maven from Ant solr.build.dir Not Found

2012-04-10 Thread Steven A Rowe
Hi Eli, The author of the blog post you mentioned appears to be unaware of the Maven POMs that are already included in Subversion for both Lucene and Solr. See . Because of the complex nature of the Ant build, which the Maven POMs cannot enti

RE: reproducibility of query results

2012-04-01 Thread Steven A Rowe
If your results are only sorted by score, it's possible that some have exactly the same score. Unless you use a secondary sort, I don't think the order of returned results among same-scored hits is guaranteed. As a result, if you cut off hits at some fixed threshold, you could see different en
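
A rough sketch of the tie-breaking point above, in plain Java. The Hit type and the rank method are hypothetical stand-ins, not Solr classes. In Solr itself the equivalent would be a secondary sort on some unique field, e.g. sort=score desc,id asc, assuming the schema has such an id field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: shows why a secondary sort key makes the order of
// same-scored hits deterministic. Hit is a hypothetical type.
final class TieBreakSketch {

    static final class Hit {
        final String id;
        final float score;

        Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    // Orders by score descending, then by id ascending to break ties.
    static List<String> rank(List<Hit> hits) {
        Comparator<Hit> byScoreDesc = (a, b) -> Float.compare(b.score, a.score);
        Comparator<Hit> order = byScoreDesc.thenComparing((Hit h) -> h.id);
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(order);
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("b", 1.0f), new Hit("a", 1.0f), new Hit("c", 2.0f));
        // "a" and "b" have the same score; the id tie-break fixes their order.
        System.out.println(rank(hits));
    }
}
```

Without the second key, any cutoff that falls between equal-scored hits can include a different subset from one run to the next.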

RE: "ant test" and contribs

2012-03-26 Thread Steven A Rowe
parts that export jars. But there is no solr/contrib that needs one of those jars, is there? On Sat, Mar 24, 2012 at 5:05 PM, Steven A Rowe wrote: > Hi Lance, > > Are you adding a new solr/contrib/project/?  If so, why not use the build.xml > file from a sibling project?  E.g. try

RE: "ant test" and contribs

2012-03-24 Thread Steven A Rowe
Hi Lance, Are you adding a new solr/contrib/project/? If so, why not use the build.xml file from a sibling project? E.g. try starting from solr/contrib/velocity/build.xml - it is very simple and enables all build steps by importing solr/contrib/contrib-build.xml. solr/contrib/contrib-build.x

RE: mailto: scheme aware tokenizer

2012-03-18 Thread Steven A Rowe
Hi Kai, I have created an issue for this: https://issues.apache.org/jira/browse/LUCENE-3880 Thanks for reporting! Steve -Original Message- From: Kai Gülzau [mailto:kguel...@novomind.com] Sent: Friday, March 16, 2012 9:59 AM To: solr-user@lucene.apache.org Subject: mailto: scheme aware

RE: Combining ShingleFilter and DisMaxParser, with a twist

2012-02-27 Thread Steven A Rowe
On 2/27/2012 at 3:16 PM, Alexey Verkhovsky wrote: > By the way, I'm not sure that edismax interpreting 'wal mart' as 'wal' OR > 'mart' is really a bug that should be fixed. It's a counter-intuitive > behavior, for sure, but - per my understanding - edismax is supposed to > treat consecutive words a

RE: Combining ShingleFilter and DisMaxParser, with a twist

2012-02-27 Thread Steven A Rowe
Hi Alexey, Lucene's QueryParser, and at least some of Solr's query parsers - I'm not familiar with all of them - have the problem you mention: analyzers are fed queries word-by-word, instead of whole strings between operators. There is a JIRA issue for fixing this, but no work done yet:

RE: Trunk build errors

2012-02-23 Thread Steven A Rowe
Hi Darren, I use Ant 1.7.1. There have been some efforts to make the build work with Ant 1.8.X, but it is not (yet) the required version. So if you're not using Ant 1.7.1, I suggest you try it. Steve > -Original Message- > From: dar...@ontrenet.com [mailto:dar...@ontrenet.com] > Sent

RE: customizing standard tokenizer

2012-02-17 Thread Steven A Rowe
Hi Torsten, The Lucene StandardTokenizer is written in JFlex (http://jflex.de) - you can see the version 3.X specification at: You can m

RE: PatternReplaceFilterFactory group

2012-02-16 Thread Steven A Rowe
Hi O., PatternReplaceFilter(Factory) uses Matcher.replaceAll() or replaceFirst(), both of which take in a string that can include any or all groups using the syntax "$n", where n is the group number. See the Matcher.appendReplacement() javadocs for an explanation of the functionality and synta
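
A small sketch of the "$n" group syntax mentioned above, using java.util.regex directly. The pattern, the replacement string, and the class name GroupReplaceSketch are arbitrary examples chosen for illustration, not anything taken from the original thread or from the Solr filter's configuration.

```java
import java.util.regex.Pattern;

// Illustrative only: "$1" and "$2" in the replacement refer to the first and
// second capturing groups of the pattern.
final class GroupReplaceSketch {

    private static final Pattern PAIR = Pattern.compile("(\\w+)-(\\w+)");

    // Swaps the two hyphen-separated parts of every match.
    static String swapAll(String input) {
        return PAIR.matcher(input).replaceAll("$2_$1");
    }

    // Swaps only the first match and leaves the rest untouched.
    static String swapFirst(String input) {
        return PAIR.matcher(input).replaceFirst("$2_$1");
    }

    public static void main(String[] args) {
        System.out.println(swapAll("foo-bar baz-qux"));
        System.out.println(swapFirst("foo-bar baz-qux"));
    }
}
```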

RE: Sort by geoDist() syntax error on 3.5

2012-01-30 Thread Steven A Rowe
No, it's not. If I follow the link below your email to nabble.com, I can read your message, including your suggested correct syntax - I'll quote it below for you, and since I don't use nabble, everyone who reads this list will be able to see it: > Correct syntax may be : > > select?q=%20&s

RE: Sort by geoDist() syntax error on 3.5

2012-01-30 Thread Steven A Rowe
Hi darul, Nobody got your suggested "Correct syntax": Nabble.com strips markup, e.g. XML, from emails that you send through them. I have complained to them about this problem through their support channel repeatedly, and you can see the result: they have done nothing to fix the problem. My su

RE: HTMLStripCharFilterFactory not working in Solr4?

2012-01-25 Thread Steven A Rowe
tion to the following issue in JIRA: > > https://issues.apache.org/jira/browse/LUCENE-3721 > > I appreciate all the prompt responses! Looking forward to finding the > root > cause of this guy :) If there's something I'm doing incorrectly in the > configuration, ple

RE: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Steven A Rowe
Hi Mike, When I add the following test to TestHTMLStripCharFilterFactory.java on Solr trunk, it passes: public void testNumericCharacterEntities() throws Exception { final String text = "Bose® ™"; // |Bose® ™| HTMLStripCharFilterFactory htmlStripFactory = new HTMLStripCharFilterFactory()

RE: Just can't get Solritas to work, help!

2012-01-20 Thread Steven A Rowe
Erik, I've already backported SOLR-2718 - is that what you were referring to when you said you would fix 3.6? Steve > -Original Message- > From: Erik Hatcher [mailto:erik.hatc...@gmail.com] > Sent: Friday, January 20, 2012 4:23 PM > To: solr-user@lucene.apache.org > Subject: Re: Just ca

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Steven A Rowe
onitoring+SaaS+for+Solr%22 > > Though this was already partially discussed with Chris @ fucu.org > which according to him, should have already been moved to Lucene > General. > > On Wed, Jan 18, 2012 at 11:04 PM, Steven A Rowe wrote: > > Why Jason, I declare, whatev

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Steven A Rowe
I want to retract my objection to commercial messages. I think Ted's position is more reasonable: on-topic commercial messages that are responsive to (and maybe even anticipatory of) users' needs will likely be welcomed by many subscribed here. Producing a policy statement that perfectly captu

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-18 Thread Steven A Rowe
> > equally dished out or not at all. > > > > On Wed, Jan 18, 2012 at 6:38 PM, Steven A Rowe wrote: > >> Hi Peter, > >> > >> Commercial solicitations are taboo here, except in the context of a > request for help that is directly relevant to a

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-18 Thread Steven A Rowe
4x > > Steven, > > If you are going to admonish people for advertising, it should be > equally dished out or not at all. > > On Wed, Jan 18, 2012 at 6:38 PM, Steven A Rowe wrote: > > Hi Peter, > > > > Commercial solicitations are taboo here, except in the contex

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-18 Thread Steven A Rowe
Hi Peter, Commercial solicitations are taboo here, except in the context of a request for help that is directly relevant to a product or service. Please don’t do this again. Steve Rowe From: Peter Velikin [mailto:pe...@velobit.com] Sent: Wednesday, January 18, 2012 6:33 PM To: solr-user@lucene

RE: Trying to understand SOLR memory requirements

2012-01-18 Thread Steven A Rowe
Hi Dave, Try 'ant usage' from the solr/ directory. Steve > -Original Message- > From: Dave [mailto:dla...@gmail.com] > Sent: Wednesday, January 18, 2012 2:11 PM > To: solr-user@lucene.apache.org > Subject: Re: Trying to understand SOLR memory requirements > > Ok, I've been able to pull

RE: Is there an issue with hypens in SpellChecker with StandardTokenizer?

2011-12-15 Thread Steven A Rowe
> On Thu, Dec 15, 2011 at 4:17 PM, Brandon Fish > wrote: > > > Hi Steve, > > > > I was using branch 3.5. I will try this on tip of branch_3x too. > > > > Thanks. > > > > > > On Thu, Dec 15, 2011 at 4:14 PM, Steven A Rowe wrote: > > > &g

RE: Is there an issue with hypens in SpellChecker with StandardTokenizer?

2011-12-15 Thread Steven A Rowe
Hi Brandon, When I add the following to SpellingQueryConverterTest.java on the tip of branch_3x (will be released as Solr 3.6), the test succeeds: @Test public void testStandardAnalyzerWithHyphen() { SpellingQueryConverter converter = new SpellingQueryConverter(); converter.init(new NamedLis

RE: Removing whitespace

2011-12-12 Thread Steven A Rowe
Hi Devon, Something like this should work for you (untested!): Steve > -Original Message- > From: Devon Baumgarten [mailto:dbaumgar...@nationalcorp.com] > Sent: Monday, December 12, 2011 4:52 PM > To: 'solr-user@lucene.apache.org' > Subject: Removing whitespace > > Hel

RE: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Steven A Rowe
e WhitespaceTokenizerFactory instead of StandardTokenizerFactory, which > means that I had to use extra PatternReplaceCharFilterFactory filters to > get rid of leading/trailing punctuation. > > Again, thanks! > > Marian > > 2011/11/30 Steven A Rowe > > > Hi Ma

RE: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Steven A Rowe
Hi Marian, Extending the StandardTokenizer(Factory) java class is not the way to go if you want to change its behavior. StandardTokenizer is generated from a JFlex specification, so you would need to modify the specification to include your special slash-containing-word rule

RE: XML Manager for Solr

2011-11-25 Thread Steven A Rowe
Hi Stephane, Do you know about Solr's DataImportHandler, aka DIH?: http://wiki.apache.org/solr/DataImportHandler Steve > -Original Message- > From: KabooHahahein [mailto:stele...@hotmail.com] > Sent: Friday, November 25, 2011 10:33 AM > To: solr-user@lucene.apache.org > Subject: XML Man

RE: Mailing List

2011-11-02 Thread Steven A Rowe
Hi Carol, Solr mailing list subscription is self-service. Go here and click on the "Subscribe to List" link under the "Users" section. Steve > -Original Message- > From: Carol Kuzel [mailto:cku...@ebscohost.com] > Sent: Wednesday, Nov

RE: Filter Question

2011-10-14 Thread Steven A Rowe
Hi Monica, AFAIK there is nothing like the filter you've described, and I believe it would be generally useful. Maybe it could be called StopTermTypesFilter? (Plural on Types to signify that more than one type of term can be stopped by a single instance of the filter.) Such a filter should

RE: Suggestions feature

2011-10-04 Thread Steven A Rowe
Hi Milan, I have three ideas: 1. Boost by log(weight) instead of just by weight. This would reduce weight-to-weight ratios and so reduce the likelihood of hit list domination, while still retaining the user's relative preferences. Multiple log applications will further decrease the weight-to
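
A quick numeric sketch of the log(weight) idea above. The weights 1000 and 10 are invented for illustration, and the natural log is an assumption (any base gives the same ratio). Note that this only makes sense for weights large enough that the logarithm stays positive.

```java
// Illustrative only: compares the ratio of two raw weights with the ratio of
// their logarithms, and with the ratio after applying log twice.
final class LogBoostSketch {

    static double rawRatio(double high, double low) {
        return high / low;
    }

    static double logRatio(double high, double low) {
        return Math.log(high) / Math.log(low);
    }

    static double doubleLogRatio(double high, double low) {
        return Math.log(Math.log(high)) / Math.log(Math.log(low));
    }

    public static void main(String[] args) {
        double high = 1000.0; // hypothetical weights
        double low = 10.0;
        System.out.println("raw ratio:        " + rawRatio(high, low));
        System.out.println("log ratio:        " + logRatio(high, low));
        System.out.println("double-log ratio: " + doubleLogRatio(high, low));
    }
}
```

With these numbers the raw ratio is 100, the log ratio is about 3, and the double-log ratio is smaller again, while the ordering of the two weights is preserved in each case.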

RE: Analyzer Tokenizer for Exact and Contains search on single field

2011-10-04 Thread Steven A Rowe
Hi Satish, I don't think there is a single analyzer that does what you want. However, you could send the info to a second field with copyField, and use e.g. WhitespaceTokenizer on one field for contains-style queries, and KeywordTokenizer on the other field (or just use the "string" field type)

RE: solr searching for special characters?

2011-10-03 Thread Steven A Rowe
Yes. > -Original Message- > From: vighnesh [mailto:svighnesh...@gmail.com] > Sent: Monday, October 03, 2011 2:22 AM > To: solr-user@lucene.apache.org > Subject: solr searching for special characters? > > Hi all, > > I need to search special characters in solr . so > Is it possible to sea

RE: Lucene 3.4.0 Merging

2011-10-02 Thread Steven A Rowe
> To: solr-user@lucene.apache.org > Subject: Re: Lucene 3.4.0 Merging > > Hi Steve > > Still same problem > > Regards > Ahsan > >  ----- Original Message - > > From: Steven A Rowe > To: "solr-user@lucene.apache.org" > Cc: > Sent: Sunday, October

RE: Lucene 3.4.0 Merging

2011-10-01 Thread Steven A Rowe
y, October 01, 2011 12:51 PM > To: solr-user@lucene.apache.org > Subject: Re: Lucene 3.4.0 Merging > > Hi Steve > > Thank you very much for your valued response but adding space as you have > mentioned does not solve the problem. > > Regards > Ahsan > > __

RE: Lucene 3.4.0 Merging

2011-09-30 Thread Steven A Rowe
Hi Ahson, The wiki page you got your cmdline invocation from was missing a space character between the classpath and "org/apache/lucene/misc/IndexMergeTool". I've just updated that page. Steve > -Original Message- > From: Ahson Iqbal [

RE: question about StandardAnalyzer, differences between solr 1.4 and solr 3.3

2011-09-09 Thread Steven A Rowe
Hi Marc, StandardAnalyzer includes StopFilter. See the Javadocs for Lucene 3.3 here: This is not new behavior - StandardAnalyzer in Lucene 2.9.1 (the version of Lucene bundled with Solr 1.4)

RE: Newbie question, ant target for packaging source files from local copy?

2011-08-25 Thread Steven A Rowe
Hi sid, The current source packaging scheme aims to *avoid* including local changes :), so yes, there is no support currently for what you want to do. Prior to , the source packaging scheme used the current sources rather than pulling from Subv

RE: Hudson build issues

2011-08-11 Thread Steven A Rowe
Hi arian487, You apparently are not using the official Ant build? (Maven is officially unsupported.) The scripts used by the Lucene and Solr Jenkins builds at the ASF are available here: http://svn.apache.org/repos/asf/lucene/dev/nightly/ The ASF Jenkins jobs checkout the above direc

RE: Can't mix Synonyms with Shingles?

2011-08-10 Thread Steven A Rowe
Hi Jeff, You have configured ShingleFilterFactory with a token separator of "", so e.g. "International Corporation" will output the shingle "InternationalCorporation". If this is the form you want to use for synonym matching, it must exist in your synonym file. Does it? Steve > --
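
A simplified sketch of what an empty token separator does to two-word shingles, in plain Java. This is not the ShingleFilter implementation: it emits only the two-token shingles and ignores unigram output, positions, and every other filter option. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: joins each pair of adjacent tokens with the given separator.
final class ShingleSketch {

    static List<String> twoTokenShingles(String[] tokens, String separator) {
        List<String> shingles = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.length; i++) {
            shingles.add(tokens[i] + separator + tokens[i + 1]);
        }
        return shingles;
    }

    public static void main(String[] args) {
        String[] tokens = {"International", "Corporation"};
        // With an empty separator the two words are fused into one term.
        System.out.println(twoTokenShingles(tokens, ""));
        // With a single space they stay visibly separate.
        System.out.println(twoTokenShingles(tokens, " "));
    }
}
```

A synonym entry written as "International Corporation" with a space would therefore not match the fused term.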

RE: ShingleFilterFactory class error

2011-07-28 Thread Steven A Rowe
Pradeep, As indicated on the wiki , the minShingleSize option is not available in Solr versions prior to 3.1. What version of Solr are you using? (By the way, I am only replying on solr-user@lucene.apache.or

RE: Problem starting solr on jetty

2011-07-27 Thread Steven A Rowe
XP Professional Version 2002 Service Pack 3 Interestingly my colleagues who have the same environment are not facing this problem. Thanks & Regards Anand Nigam -Original Message- From: Steven A Rowe [mailto:sar...@syr.edu] Sent: 27 July 2011 20:21 To: solr-user@lucene.apache.org Sub

RE: Problem starting solr on jetty

2011-07-27 Thread Steven A Rowe
Hi Anand, Someone else reported this exact same error with Solr v1.4.0: http://www.lucidimagination.com/search/document/fd5b83f3595a1c6c/can_t_start_solr_by_java_jar_start_jar I downloaded the apache-solr-3.3.0.zip, unpacked it, then ran 'java -jar start.jar' from the cmdline. It worked. (Win

RE: solr.StandardTokenizerFactory: more info needed

2011-07-06 Thread Steven A Rowe
ubject: Re: solr.StandardTokenizerFactory: more info needed Hi Steven, This looks very good. Thanks. Do I understand correctly, that I were to change the tokenizer rules, I could go and change e.g. the token class definitions (like ) in this file and recompile the code? On Wed, Jul 6, 2011 at 3:45 PM

RE: solr.StandardTokenizerFactory: more info needed

2011-07-06 Thread Steven A Rowe
Hi Dmitry, The underlying Lucene implementation is here: http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_9_1/src/java/org/apache/lucene/analysis/standard/ StandardTokenizerImpl.jflex is probably where you should start. Steve -Original Message- From: Dmitry Kan [mailto:dmitry.

RE: How to index correctly a text save with tinyMCE

2011-06-23 Thread Steven A Rowe
Hi Ariel, On 6/23/2011 at 12:34 PM, Ariel wrote: > But it still doesn't convert the code to the correct character, for > instance: Espa&ntilde;a must be converted to España but it still > remains as Espa&ntilde;a. So it looks like your text processing tool(s) escape markup meta-characters (e.g.

RE: response time for pdf indexing

2011-06-22 Thread Steven A Rowe
Hi Rode, Have you seen http://wiki.apache.org/solr/SolrPerformanceFactors ? Steve > -Original Message- > From: Rode González (libnova) [mailto:r...@libnova.es] > Sent: Wednesday, June 22, 2011 11:30 AM > To: solr-user@lucene.apache.org > Cc: dan...@silvereme.com; Gonzalo Iglesias; Leo; M

RE: Building Solr 3.2 from sources - can't get war

2011-06-19 Thread Steven A Rowe
https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_2/ > -Original Message- > From: Yuriy Akopov [mailto:ako...@hotmail.co.uk] > Sent: Sunday, June 19, 2011 4:38 PM > To: solr-user@lucene.apache.org > Subject: Re: Building Solr 3.2 from sources - can't get war > > > In the chec

RE: How to index correctly a text save with tinyMCE

2011-06-16 Thread Steven A Rowe
Hi Ariel, As Shawn says, char filters come before tokenizers. You need to use a <charFilter> tag instead of a <filter> tag. I've updated the HTMLStripCharFilter documentation on the Solr wiki to include this information: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory St

RE: How to index correctly a text save with tinyMCE

2011-06-16 Thread Steven A Rowe
Hi Ariel, On 6/16/2011 at 10:45 AM, Ariel wrote: > I have the following problem: I am using the spanish analyzer to index > and query, but due to I am using tinymce some charactes of the text are > changed codified in html, for example the text: "En españa ... " it is > changed to "En espa&ntilde;a" so I

RE: ISOLatin1AccentFilterFactory vs ASCIIFoldingFilterFactory

2011-06-14 Thread Steven A Rowe
On 6/14/2011 at 7:12 AM, Ahmet Arslan wrote: > --- On Tue, 6/14/11, Nils Weinander wrote: > > The documentation states that ISOLatin1AccentFilterFactory > > is deprecated in favour of ASCIIFoldingFilterFactory: [...] > > Is there a way to limit which characters are folded? > > With MappingCharFil

RE: Tokenising based on known words?

2011-06-09 Thread Steven A Rowe
Hi Mark, Are you familiar with shingles aka token n-grams? http://lucene.apache.org/solr/api/org/apache/solr/analysis/ShingleFilterFactory.html Use the empty string for the tokenSeparator to get wordstogether style tokens in your index. I think you'll want to apply this filter only at index-t

RE: K-Stemmer for Solr 3.1

2011-05-28 Thread Steven A Rowe
Hi Mark, Yonik Seeley indicated on LUCENE-152 that he is considering contributing Lucid's KStemmer version to Lucene: You can vote

RE: How to test Solr Integartion - how to get EmbeddedSolrServer?

2011-05-17 Thread Steven A Rowe
Hi Gabriele, On 5/17/2011 at 9:34 AM, Gabriele Kahlout wrote: > Solr Core should declare a test dependency on Solr Test Framework. I agree: - Solr Core should have a test-scope dependency on Solr Test Framework. - Solr Test Framework should have a compile-scope dependency on Solr Core. But Mave

RE: K-Stemmer for Solr 3.1

2011-05-16 Thread Steven A Rowe
On 5/16/2011 at 5:33 PM, David W. Smiley wrote: > Lucid's KStemmer is LGPL and the Solr committers have shown that they > don't want LGPL libraries shipping with Solr. If you are intent on > releasing your changes, I suggest attaching both the modified source and > the compiled jar onto Solr's k-st

RE: Is it possible to build Solr as a maven project?

2011-05-10 Thread Steven A Rowe
On 5/10/2011 at 9:57 AM, Gabriele Kahlout wrote: > On Tue, May 10, 2011 at 3:50 PM, Steven A Rowe wrote: > > <http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/dev-tools/maven/README.maven> [...] > > svn co http://svn.apache.org/repos/asf/lucene/dev/tags/l

RE: Is it possible to build Solr as a maven project?

2011-05-10 Thread Steven A Rowe
Hi Ludovic, On 5/10/2011 at 10:02 AM, lboutros wrote: > Very nice Steve ! Thanks again. (I'm building from svn so that's perfect > for me) > Is this file referenced somewhere in the wiki ? Not yet, no. Probably should be linked from the HowToContribute pages for Lucene and Solr. Feel free to a

RE: Is it possible to build Solr as a maven project?

2011-05-10 Thread Steven A Rowe
/maven/. Please write back if you run into any problems. Steve From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com] Sent: Tuesday, May 10, 2011 8:37 AM To: boutr...@gmail.com Cc: solr-user@lucene.apache.org; Steven A Rowe; ryan...@gmail.com Subject: Re: Is it possible to build Solr as a maven p

RE: Is it possible to build Solr as a maven project?

2011-05-05 Thread Steven A Rowe
[INFO] > > [INFO] BUILD SUCCESS > [INFO] > > [INFO] Total time: 2:18.040s > [INFO] Finished at: Thu May 05 20:39:09 CEST 2011 > [INFO] Final Memory: 38M/90M > [I

RE: Is it possible to build Solr as a maven project?

2011-05-05 Thread Steven A Rowe
Hi Gabriele, On 5/5/2011 at 9:57 AM, Gabriele Kahlout wrote: > Okay, that sequence worked, but then shouldn't I be able to do $ mvn > install afterwards? This is what I get: ... > COMPILATION ERROR : > - > org/apache/solr/spelling/suggest

RE: Is it possible to build Solr as a maven project?

2011-05-05 Thread Steven A Rowe
Hi Gabriele, The sequence should be 1. svn update 2. ant get-maven-poms 3. mvn -N -Pbootstrap install I think you left out #2 - there was a very recent change to the POMs that affects the noggit jar name. Steve > -Original Message- > From: Gabriele Kahlout [mailto:gabri...@mysimpatico

RE: Getting field information inside a Tokenizer

2011-05-03 Thread Steven A Rowe
Hi FMC, On 5/3/2011 at 12:37 PM, FatMan Corp wrote: > Hi, I would like to get another's field information for the same document > within a Tekonizer class. > How can this be achieved? Use s in your schema , and associate different analysis pipe

RE: How to Update Value of One Field of a Document in Index?

2011-04-27 Thread Steven A Rowe
> There's the "limited join" patch, see: > https://issues.apache.org/jira/browse/SOLR-2272 > that hasn't been applied yet Correction: Yonik committed this feature in r1096978.

RE: Apache Solr 3.1.0

2011-04-26 Thread Steven A Rowe
Hi Wodek, UAX29URLEmailTokenizer includes all of StandardTokenizer's rules and adds rules to tokenize URLs and Emails: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.UAX29URLEmailTokenizerFactory Steve > -Original Message- > From: Wodek Siebor [mailto:siebor_wlo...@ba

RE: Split token

2011-04-15 Thread Steven A Rowe
This pattern split tokens *only* in the presence of parentheses with adjoining whitespace, and includes the parentheses with the tokens: (?<=\))\s+|\s+(?=\() So you'll get this kind of behavior: Tottenham Hotspur (London) F.C. Internationale (milan) FC Midtjylland (Herning) (Ikast)
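
A short sketch that runs the pattern above through String.split, assuming Java regex semantics (which is what Solr's pattern-based components use). The class name ParenSplitSketch is made up; the inputs are the example names from the message.

```java
import java.util.Arrays;

// Illustrative only: splits on whitespace that follows ")" or precedes "(",
// so the parentheses stay attached to their tokens.
final class ParenSplitSketch {

    private static final String SPLIT_PATTERN = "(?<=\\))\\s+|\\s+(?=\\()";

    static String[] split(String input) {
        return input.split(SPLIT_PATTERN);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("Tottenham Hotspur (London)")));
        System.out.println(Arrays.toString(split("FC Midtjylland (Herning) (Ikast)")));
    }
}
```

Whitespace that is not next to a parenthesis, such as the space inside "Tottenham Hotspur", is left alone, so the multi-word name stays one token.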

RE: XML not coming through from nabble to Gmail

2011-04-12 Thread Steven A Rowe
I've asked on Nabble if they know of a fix for the problem: http://nabble-support.1.n2.nabble.com/solr-dev-mailing-list-tp6023495p6264955.html Steve > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Tuesday, April 12, 2011 8:43 AM > To: Chris Hostetter

RE: Embedded Solr constructor not returning

2011-04-06 Thread Steven A Rowe
Hi Greg, > I need the servlet API in my app for it to work, despite being command > line. > So adding this to the maven POM fixed everything: > <dependency> > <groupId>javax.servlet</groupId> > <artifactId>servlet-api</artifactId> > <version>2.5</version> > </dependency> > > Perhaps this dependency could be listed on the wiki? Alon

RE: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread Steven A Rowe
I added this test method locally to TestASCIIFoldingFilter.java in the Lucene/Solr 3.1.0 source tree, and it passed, so the filter is not the problem (and the Solr factory certainly isn't either - it's just a wrapper) - I second Ludovic's question - you must have other filters configured: pub

RE: Question regarding XSLT

2011-03-30 Thread Steven A Rowe
Hi Marcelo, Try adding the 'method="text"' attribute to your <xsl:output> tag, e.g.: If that doesn't work, there is another attribute "omit-xml-declaration" that might do the trick. See http://www.w3.org/TR/xslt#output for more info. Steve > -Original Message- > From: Marcelo Iturbe [mailto:mar

RE: Help please - recursively indexing lots and lots of text files

2011-03-04 Thread Steven A Rowe
Hi Colin, Solr's DataImportHandler sounds like what you want: http://wiki.apache.org/solr/DataImportHandler In particular, take a look at FileListEntityProcessor: http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor Steve > -Original Message- > From: c

RE: Query on multivalue field

2011-03-01 Thread Steven A Rowe
Hi Scott, Querying against a multi-valued field just works - no special incantation required. Steve > -Original Message- > From: Scott Yeadon [mailto:scott.yea...@anu.edu.au] > Sent: Monday, February 28, 2011 11:50 PM > To: solr-user@lucene.apache.org > Subject: Query on multivalue fiel

RE: How to search for special chars like ä from ae?

2011-02-08 Thread Steven A Rowe
Hi Anithya, That's good to hear. Again, please consider donating your work: . Steve > -Original Message- > From: Anithya [mailto:surysha...@gmail.com] > Sent: Tuesday, February 08, 2011 5:16 PM > To: solr-user@lucene.apache.or

RE: How to search for special chars like ä from ae?

2011-02-08 Thread Steven A Rowe
Hi Anithya, Yes, that sounds right. You will want to edit mapping-FoldToASCII.txt, and my suggestion is that you rename mapping-FoldToASCII.txt to reflect your changes (for example, if your target language is German, you could rename it to mapping-German-FoldToASCII.txt); otherwise it would be

RE: How to search for special chars like ä from ae?

2011-02-07 Thread Steven A Rowe
Hi Anithya, There is a mapping file for MappingCharFilterFactory that behaves the same as ASCIIFoldingFilterFactory: mapping-FoldToASCII.txt, located in Solr's example conf/ directory in Solr 3.1+. You can rename and then edit this file to map "ä" to "ae", " ü" to "ue", etc. (look for "WITH DI
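
A plain-Java sketch of the effect such a mapping would have on text. It does not use MappingCharFilterFactory or read a mapping file; the class name and the three mappings are assumptions for illustration. As far as I recall, the entries in the mapping file itself take the form "source" => "target", but check the shipped file for the exact syntax.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: applies a few German-style character mappings
// (a-umlaut to "ae", and so on) by simple string replacement.
final class UmlautFoldSketch {

    private static final Map<String, String> MAPPINGS = new LinkedHashMap<>();

    static {
        MAPPINGS.put("\u00E4", "ae"); // a with diaeresis
        MAPPINGS.put("\u00F6", "oe"); // o with diaeresis
        MAPPINGS.put("\u00FC", "ue"); // u with diaeresis
    }

    static String fold(String text) {
        String out = text;
        for (Map.Entry<String, String> mapping : MAPPINGS.entrySet()) {
            out = out.replace(mapping.getKey(), mapping.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // The input is "Muller" spelled with u-umlaut; the output uses "ue".
        System.out.println(fold("M\u00FCller"));
    }
}
```

Applying the same mapping at both index and query time is what lets a search for the "ae" spelling find text written with the umlaut.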

RE: Indexing all permutations of words from the input

2011-01-20 Thread Steven A Rowe
Hi Martin, The co-occurrence filter I'm working on at https://issues.apache.org/jira/browse/LUCENE-2749 would do what you want (among other things). Still vaporware at this point, as I've only put a couple of hours into it, so don't hold your breath :) Steve > -Original Message- > Fro

RE: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Steven A Rowe
> [x] ASF Mirrors (linked in our release announcements or via the Lucene > website) > > [x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) > > [x] I/we build them from source via an SVN/Git checkout.

RE: solrj & http client 4

2011-01-17 Thread Steven A Rowe
Hi Stevo, Thanks for reviewing the Maven POMs in LUCENE-2657 - I appreciate it. > In those poms, not all modules have explicit version and groupId which > is a bad practice. Really? According to the "POM best practices" section in Sonatype's Maven book

RE: start value in queries zero or one based?

2011-01-13 Thread Steven A Rowe
> Please, read every wiki page you can find and write notes. NO!!! Once you start down this road, there is no turning back! Soon you will feel the need to turn your notes into a new wiki page or a blog post, and people will read those and write notes, and the process will repeat, ad infinitum

RE: Not storing, but highlighting from document sentences

2011-01-12 Thread Steven A Rowe
Hi Tomislav, > if I understand correctly, you are suggesting query execution in two > phases: first execute query on whole article index core (where whole > articles are indexed, but not stored) to get article IDs (for articles > which match original query). Then for each match in article core: >

RE: Not storing, but highlighting from document sentences

2011-01-12 Thread Steven A Rowe
> > I think you can get what you want by doing the first stage retrieval, > > and then in the second stage, add required constraint(s) to the query > > for the matching docid(s), and change the AND operators in the > > original query to OR. Coordination will cause the best snippet(s) to > > rise
