RE: question about StandardAnalyzer, differences between solr 1.4 and solr 3.3

2011-09-09 Thread Steven A Rowe
Hi Marc, StandardAnalyzer includes StopFilter. See the Javadocs for Lucene 3.3 here: This is not new behavior - StandardAnalyzer in Lucene 2.9.1 (the version of Lucene bundled with Solr 1.4)

RE: Lucene 3.4.0 Merging

2011-09-30 Thread Steven A Rowe
Hi Ahson, The wiki page you got your cmdline invocation from was missing a space character between the classpath and "org/apache/lucene/misc/IndexMergeTool". I've just updated that page. Steve > -Original Message- > From: Ahson Iqbal [

RE: Lucene 3.4.0 Merging

2011-10-01 Thread Steven A Rowe
y, October 01, 2011 12:51 PM > To: solr-user@lucene.apache.org > Subject: Re: Lucene 3.4.0 Merging > > Hi Steve > > Thank you very much for your valued response but adding space as you have > mentioned does not solve the problem. > > Regards > Ahsan > > __

RE: Lucene 3.4.0 Merging

2011-10-02 Thread Steven A Rowe
> To: solr-user@lucene.apache.org > Subject: Re: Lucene 3.4.0 Merging > > Hi Steve > > Still same problem > > Regards > Ahsan > > ----- Original Message - > > From: Steven A Rowe > To: "solr-user@lucene.apache.org" > Cc: > Sent: Sunday, October

RE: solr searching for special characters?

2011-10-03 Thread Steven A Rowe
Yes. > -Original Message- > From: vighnesh [mailto:svighnesh...@gmail.com] > Sent: Monday, October 03, 2011 2:22 AM > To: solr-user@lucene.apache.org > Subject: solr searching for special characters? > > Hi all, > > I need to search special characters in solr . so > Is it possible to sea

RE: Analyzer Tokenizer for Exact and Contains search on single field

2011-10-04 Thread Steven A Rowe
Hi Satish, I don't think there is a single analyzer that does what you want. However, you could send the info to a second field with copyField, and use e.g. WhitespaceTokenizer on one field for contains-style queries, and KeywordTokenizer on the other field (or just use the "string" field type)

RE: Suggestions feature

2011-10-04 Thread Steven A Rowe
Hi Milan, I have three ideas: 1. Boost by log(weight) instead of just by weight. This would reduce weight-to-weight ratios and so reduce the likelihood of hit list domination, while still retaining the user's relative preferences. Multiple log applications will further decrease the weight-to
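To illustrate idea 1 with made-up numbers (a minimal sketch; the weights are hypothetical and nothing here is Solr API): taking the log shrinks the ratio between two weights while keeping their relative order.

```python
import math


def log_boost(weight):
    """Dampen a raw weight. Assumes weight > 1 so the boost stays positive."""
    return math.log10(weight)


# Hypothetical user-assigned weights, chosen only for illustration.
heavy, light = 1000.0, 10.0

# Ratio between the two boosts when the raw weights are used directly.
raw_ratio = heavy / light

# Ratio between the two boosts after log damping: much smaller,
# but the heavier item still gets the larger boost.
damped_ratio = log_boost(heavy) / log_boost(light)

print(raw_ratio, damped_ratio)
```

With these numbers the raw weights differ by a factor of about 100, while the log-damped boosts differ by a factor of about 3.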

RE: Filter Question

2011-10-14 Thread Steven A Rowe
Hi Monica, AFAIK there is nothing like the filter you've described, and I believe it would be generally useful. Maybe it could be called StopTermTypesFilter? (Plural on Types to signify that more than one type of term can be stopped by a single instance of the filter.) Such a filter should

RE: Mailing List

2011-11-02 Thread Steven A Rowe
Hi Carol, Solr mailing list subscription is self-service. Go here and click on the "Subscribe to List" link under the "Users" section. Steve > -Original Message- > From: Carol Kuzel [mailto:cku...@ebscohost.com] > Sent: Wednesday, Nov

RE: XML Manager for Solr

2011-11-25 Thread Steven A Rowe
Hi Stephane, Do you know about Solr's DataImportHandler, aka DIH?: http://wiki.apache.org/solr/DataImportHandler Steve > -Original Message- > From: KabooHahahein [mailto:stele...@hotmail.com] > Sent: Friday, November 25, 2011 10:33 AM > To: solr-user@lucene.apache.org > Subject: XML Man

RE: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Steven A Rowe
Hi Marian, Extending the StandardTokenizer(Factory) java class is not the way to go if you want to change its behavior. StandardTokenizer is generated from a JFlex specification, so you would need to modify the specification to include your special slash-containing-word rule

RE: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Steven A Rowe
e WhitespaceTokenizerFactory instead of StandardTokenizerFactory, which > means that I had to use extra PatternReplaceCharFilterFactory filters to > get rid of leading/trailing punctuation. > > Again, thanks! > > Marian > > 2011/11/30 Steven A Rowe > > > Hi Ma

RE: Removing whitespace

2011-12-12 Thread Steven A Rowe
Hi Devon, Something like this should work for you (untested!): Steve > -Original Message- > From: Devon Baumgarten [mailto:dbaumgar...@nationalcorp.com] > Sent: Monday, December 12, 2011 4:52 PM > To: 'solr-user@lucene.apache.org' > Subject: Removing whitespace > > Hel

RE: Is there an issue with hypens in SpellChecker with StandardTokenizer?

2011-12-15 Thread Steven A Rowe
Hi Brandon, When I add the following to SpellingQueryConverterTest.java on the tip of branch_3x (will be released as Solr 3.6), the test succeeds: @Test public void testStandardAnalyzerWithHyphen() { SpellingQueryConverter converter = new SpellingQueryConverter(); converter.init(new NamedLis

RE: Is there an issue with hypens in SpellChecker with StandardTokenizer?

2011-12-15 Thread Steven A Rowe
> On Thu, Dec 15, 2011 at 4:17 PM, Brandon Fish > wrote: > > > Hi Steve, > > > > I was using branch 3.5. I will try this on tip of branch_3x too. > > > > Thanks. > > > > > > On Thu, Dec 15, 2011 at 4:14 PM, Steven A Rowe wrote: > > > &g

RE: Trying to understand SOLR memory requirements

2012-01-18 Thread Steven A Rowe
Hi Dave, Try 'ant usage' from the solr/ directory. Steve > -Original Message- > From: Dave [mailto:dla...@gmail.com] > Sent: Wednesday, January 18, 2012 2:11 PM > To: solr-user@lucene.apache.org > Subject: Re: Trying to understand SOLR memory requirements > > Ok, I've been able to pull

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-18 Thread Steven A Rowe
Hi Peter, Commercial solicitations are taboo here, except in the context of a request for help that is directly relevant to a product or service. Please don’t do this again. Steve Rowe From: Peter Velikin [mailto:pe...@velobit.com] Sent: Wednesday, January 18, 2012 6:33 PM To: solr-user@lucene

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-18 Thread Steven A Rowe
4x > > Steven, > > If you are going to admonish people for advertising, it should be > equally dished out or not at all. > > On Wed, Jan 18, 2012 at 6:38 PM, Steven A Rowe wrote: > > Hi Peter, > > > > Commercial solicitations are taboo here, except in the contex

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-18 Thread Steven A Rowe
> > equally dished out or not at all. > > > > On Wed, Jan 18, 2012 at 6:38 PM, Steven A Rowe wrote: > >> Hi Peter, > >> > >> Commercial solicitations are taboo here, except in the context of a > request for help that is directly relevant to a

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Steven A Rowe
I want to retract my objection to commercial messages. I think Ted's position is more reasonable: on-topic commercial messages that are responsive to (and maybe even anticipatory of) users' needs will likely be welcomed by many subscribed here. Producing a policy statement that perfectly captu

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Steven A Rowe
onitoring+SaaS+for+Solr%22 > > Though this was already partially discussed with Chris @ fucu.org > which according to him, should have already been moved to Lucene > General. > > On Wed, Jan 18, 2012 at 11:04 PM, Steven A Rowe wrote: > > Why Jason, I declare, whatev

RE: Just can't get Solritas to work, help!

2012-01-20 Thread Steven A Rowe
Erik, I've already backported SOLR-2718 - is that what you were referring to when you said you would fix 3.6? Steve > -Original Message- > From: Erik Hatcher [mailto:erik.hatc...@gmail.com] > Sent: Friday, January 20, 2012 4:23 PM > To: solr-user@lucene.apache.org > Subject: Re: Just ca

RE: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Steven A Rowe
Hi Mike, When I add the following test to TestHTMLStripCharFilterFactory.java on Solr trunk, it passes: public void testNumericCharacterEntities() throws Exception { final String text = "Bose® ™"; // |Bose® ™| HTMLStripCharFilterFactory htmlStripFactory = new HTMLStripCharFilterFactory()

RE: HTMLStripCharFilterFactory not working in Solr4?

2012-01-25 Thread Steven A Rowe
tion to the following issue in JIRA: > > https://issues.apache.org/jira/browse/LUCENE-3721 > > I appreciate all the prompt responses! Looking forward to finding the > root > cause of this guy :) If there's something I'm doing incorrectly in the > configuration, ple

RE: problem to indexing

2010-07-11 Thread Steven A Rowe
Hi Jörg, Just guessing what the problem is, the following looks like it's not well-formed XML: & If you want just the char "&", that should instead read: &amp; Similarly, you should escape "<" and ">" chars in text: &lt; and &gt; respectively. Steve > -Original Message- > From: Jörg Ag
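A minimal sketch of the escaping described in this reply, using Python's standard library rather than anything Solr-specific (the sample text and the "field" element name are made up for illustration):

```python
from xml.dom.minidom import parseString
from xml.sax.saxutils import escape

# Hypothetical field text containing the three characters that
# must be escaped inside XML character data.
raw_text = "Fish & Chips <fresh>"

# escape() replaces "&", "<" and ">" with their entity references.
escaped = escape(raw_text)

# The escaped text can now be embedded in a well-formed document.
doc = parseString("<field>" + escaped + "</field>")
round_tripped = doc.documentElement.firstChild.data

print(escaped)
```

Parsing the resulting document gives back the original text, which is a quick way to check that the escaping was applied exactly once.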

RE: Novice seeking help to change filters to search without diacritics

2010-07-23 Thread Steven A Rowe
Hi HSingh, Maybe the mapping file I attached to https://issues.apache.org/jira/browse/SOLR-2013 will help? Steve > -Original Message- > From: HSingh [mailto:hsin...@gmail.com] > Sent: Thursday, July 22, 2010 11:30 PM > To: solr-user@lucene.apache.org > Subject: Re: Novice seeking help t

RE: Novice seeking help to change filters to search without diacritics

2010-07-24 Thread Steven A Rowe
Hi HSingh, Usually people set up two fields, one with diacritics and one without. Then searches are against both fields. If you think a match against the field with diacritics is more valuable, you can give that field a boost. Steve > -Original Message- > From: HSingh [mailto:hsin...

RE: analysis tool vs. reality

2010-08-16 Thread Steven A Rowe
Hi Robert, You wrote in response to Hoss: > Maybe for once your argument isn't completely bogus Attacking people here is really uncalled for. -1 from me. Steve

RE: Solr synonyms format query time vs index time

2010-08-17 Thread Steven A Rowe
Hi Michael, I think the problem you're seeing is that no document contains "reebox", and you've used the "explicit" syntax (source=>dest) instead of the "equivalent" syntax (term,term,term). I'm guessing that if you convert your synonym file from: reebox => Reebok to: reebox

RE: shingles work in analyzer but not real data

2010-09-02 Thread Steven A Rowe
Hi Jeff, Have you seen PositionFilterFactory?: Steve > -Original Message- > From: Jeff Rose [mailto:j...@globalorange.nl] > Sent: Thursday, September 02, 2010 9:06 AM > To: solr-user@lucene.apache.

RE: shingles work in analyzer but not real data

2010-09-03 Thread Steven A Rowe
Hi Dennis, I took a stab at answering this question in the following java-user mailing list post: http://www.lucidimagination.com/search/document/6cb7b54cce6872b3/lucene_indexes Steve > -Original Message- > From: Dennis Gearon [mailto:gear...@sbcglobal.net] > Sent: Friday, September 03

RE: bi-grams for common terms - any analyzers do that?

2010-09-23 Thread Steven A Rowe
> -Original Message- > From: Andy [mailto:angelf...@yahoo.com] > Sent: Thursday, September 23, 2010 6:05 AM > To: solr-user@lucene.apache.org > Subject: bi-grams for common terms - any analyzers do

RE: RAM increase

2010-10-21 Thread Steven A Rowe
Memory limits info: http://www.oracle.com/technetwork/java/hotspotfaq-138619.html#gc_heap_32bit -d64 usage info: Steve > -Original Message- > From: Dennis Gearon [m

RE: How to index long words with StandardTokenizerFactory?

2010-10-22 Thread Steven A Rowe
Hi Sergey, I've opened an issue to add a maxTokenLength param to the StandardTokenizerFactory configuration: https://issues.apache.org/jira/browse/SOLR-2188 I'll work on it this weekend. Are you using Solr 1.4.1? I ask because of your mention of Lucene 2.9.3. I'm not sure there will

RE: How to index long words with StandardTokenizerFactory?

2010-10-22 Thread Steven A Rowe
24*1024, but I couldn't index a field with just size > of ~34kb. I understand that it's a little weird to index such a big > data, but I just want to know it doesn't work > > On 22 October 2010 20:36, Steven A Rowe wrote: > > Hi Sergey, > > > > I

RE: FieldCache

2010-10-25 Thread Steven A Rowe
Hi Mathias, > [...] I tried to use IndexableBinaryStringTools to re-encode my 11 byte > array. The size was increased to 7 characters (= 14 bytes) > which is still a gain of more than 50 percent compared to the UTF8 > encoding. BTW: I found no sample how to use the > IndexableBinaryStringTools cla

RE: FieldCache

2010-10-25 Thread Steven A Rowe
Hi Robert, On 10/25/2010 at 8:20 AM, Robert Muir wrote: > it is deprecated in trunk, because you can index binary terms (your > own byte[]) directly if you want. To do this, you need to use a custom > AttributeFactory. It's not actually deprecated yet. > See src/test/org/apache/lucene/index/Test

RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Hi Varun, I can't think of a way to do it without writing new analysis filters. But I think you could do what you want with two filters (this is untested): 1. An index-time filter that outputs a single token consisting of all of the input tokens, sorted in a consistent way, e.g.: "mobile wi
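The first filter's behavior (one output token built from all input tokens in a consistent order) can be sketched outside Lucene as follows. This is only an illustration of the idea, not an analysis filter implementation; the function name and sample tokens are hypothetical.

```python
def sorted_token_fingerprint(tokens):
    """Join all tokens into one string, sorted so that input order is irrelevant."""
    return " ".join(sorted(tokens))


# Two token streams with the same terms in different order
# collapse to the same single token.
a = sorted_token_fingerprint(["wifi", "mobile"])
b = sorted_token_fingerprint(["mobile", "wifi"])

print(a, b)
```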

RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Um, maybe I'm way off base, but when Varun said: > If I search with the text "samsung andriod GPS", > search results should only conain "samsung", "GPS", > "andriod" and "samsung andriod". I interpreted that to mean that hit documents should contain terms from the query, and nothing else. Makin

RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
. It is usually a > better idea to learn from others’ mistakes, so you do not have to make > them yourself. from > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > EARTH has a Right To Life, > otherwise we all die. > > > --- On Tue,

RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
ve to make > them yourself. from > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > EARTH has a Right To Life, > otherwise we all die. > > > --- On Tue, 10/26/10, Steven A Rowe wrote: > > > From: Steven A Rowe > > Subject: RE:

RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
mistakes. It is usually a > better idea to learn from others’ mistakes, so you do not have to make > them yourself. from > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > EARTH has a Right To Life, > otherwise we all die. > > > --- On

RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
> >> > >> It is always a good idea to learn from your own mistakes. It is > >> usually a better idea to learn from others’ mistakes, so you do not > >> have to make them yourself. from > >> 'http://blogs.techrepublic.com.com/se

RE: How do I this in Solr?

2010-10-27 Thread Steven A Rowe
I'm pretty sure the word-count strategy won't work. > If I search with the text "samsung andriod GPS", search results > should only conain "samsung", "GPS", "andriod" and "samsung andriod". Using the word-count strategy, a document containing "samsung andriod PDQ" would be a hit, but Varun doesn

RE: Does Solr support Natural Language Search

2010-11-04 Thread Steven A Rowe
Hi Jayant, I think you mean NL search as opposed to Boolean search: the ability to return ranked results from queries based on non-required term matches. Right? If that is what you meant, then the answer is: "Yes!". If not, then you should rephrase your question. Otherwise, the answer coul

RE: How do I this in Solr?

2010-11-05 Thread Steven A Rowe
Hi Varun, On 10/26/2010 at 11:26 PM, Varun Gupta wrote: > I will try to implement the two filters suggested by Steven and see how > the performance matches up. Have you made any progress? I was thinking about your use case, and it occurred to me that you could get what you want by reversing the

RE: Searching problem

2010-11-13 Thread Steven A Rowe
Hi Riz, You likely have some form of stemming in your indexing analysis chain - this may cause "panasonic", e.g., to be indexed as "panason". (The remainder of this email assumes that this is true.) When you search for "panasonic", presumably with the same stemming filter in your query analys

RE: IndexableBinaryStringTools (was FieldCache)

2010-11-13 Thread Steven A Rowe
Hi Mathias, > > > I assume that the char[] returned form > > > IndexableBinaryStringTools.encode is encoded in UTF-8 again > > > and then stored. At some point the information is lost and > > > cannot be recovered. > > > > Can you give an example? This should not happen. > > My character array r

RE: IndexableBinaryStringTools (was FieldCache)

2010-11-13 Thread Steven A Rowe
On 11/13/2010 at 2:04 PM, Yonik Seeley wrote: > On Sat, Nov 13, 2010 at 1:50 PM, Steven A Rowe wrote: > > Looks to me like the returned value is in a Solr-internal form of XML > > character escaping: \u0000 is represented as "#0;" and \u0008 is > > represented as &quo

RE: Sort by geoDist() syntax error on 3.5

2012-01-30 Thread Steven A Rowe
Hi darul, Nobody got your suggested "Correct syntax": Nabble.com strips markup, e.g. XML, from emails that you send through them. I have complained to them about this problem through their support channel repeatedly, and you can see the result: they have done nothing to fix the problem. My su

RE: Sort by geoDist() syntax error on 3.5

2012-01-30 Thread Steven A Rowe
No, it's not. If I follow the link below your email to nabble.com, I can read your message, including your suggested correct syntax - I'll quote it below for you, and since I don't use nabble, everyone who reads this list will be able to see it: > Correct syntax may be : > > select?q=%20&s

RE: PatternReplaceFilterFactory group

2012-02-16 Thread Steven A Rowe
Hi O., PatternReplaceFilter(Factory) uses Matcher.replaceAll() or replaceFirst(), both of which take in a string that can include any or all groups using the syntax "$n", where n is the group number. See the Matcher.appendReplacement() javadocs for an explanation of the functionality and synta

RE: customizing standard tokenizer

2012-02-17 Thread Steven A Rowe
Hi Torsten, The Lucene StandardTokenizer is written in JFlex (http://jflex.de) - you can see the version 3.X specification at: You can m

RE: Trunk build errors

2012-02-23 Thread Steven A Rowe
Hi Darren, I use Ant 1.7.1. There have been some efforts to make the build work with Ant 1.8.X, but it is not (yet) the required version. So if you're not using Ant 1.7.1, I suggest you try it. Steve > -Original Message- > From: dar...@ontrenet.com [mailto:dar...@ontrenet.com] > Sent

RE: Combining ShingleFilter and DisMaxParser, with a twist

2012-02-27 Thread Steven A Rowe
Hi Alexey, Lucene's QueryParser, and at least some of Solr's query parsers - I'm not familiar with all of them - have the problem you mention: analyzers are fed queries word-by-word, instead of whole strings between operators. There is a JIRA issue for fixing this, but no work done yet:

RE: Combining ShingleFilter and DisMaxParser, with a twist

2012-02-27 Thread Steven A Rowe
On 2/27/2012 at 3:16 PM, Alexey Verkhovsky wrote: > By the way, I'm not sure that edismax interpreting 'wal mart' as 'wal' OR > 'mart' is really a bug that should be fixed. It's a counter-intuitive > behavior, for sure, but - per my understanding - edismax is supposed to > treat consecutive words a

RE: mailto: scheme aware tokenizer

2012-03-18 Thread Steven A Rowe
Hi Kai, I have created an issue for this: https://issues.apache.org/jira/browse/LUCENE-3880 Thanks for reporting! Steve -Original Message- From: Kai Gülzau [mailto:kguel...@novomind.com] Sent: Friday, March 16, 2012 9:59 AM To: solr-user@lucene.apache.org Subject: mailto: scheme aware

RE: "ant test" and contribs

2012-03-24 Thread Steven A Rowe
Hi Lance, Are you adding a new solr/contrib/project/? If so, why not use the build.xml file from a sibling project? E.g. try starting from solr/contrib/velocity/build.xml - it is very simple and enables all build steps by importing solr/contrib/contrib-build.xml. solr/contrib/contrib-build.x

RE: "ant test" and contribs

2012-03-26 Thread Steven A Rowe
parts that export jars. But there is no solr/contrib that needs one of those jars, is there? On Sat, Mar 24, 2012 at 5:05 PM, Steven A Rowe wrote: > Hi Lance, > > Are you adding a new solr/contrib/project/?  If so, why not use the build.xml > file from a sibling project?  E.g. try

RE: reproducibility of query results

2012-04-01 Thread Steven A Rowe
If your results are only sorted by score, it's possible that some have exactly the same score. Unless you use a secondary sort, I don't think the order of returned results among same-scored hits is guaranteed. As a result, if you cut off hits at some fixed threshold, you could see different en
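A small sketch of why a secondary sort helps, in plain Python with hypothetical document ids and scores (no Solr API is used here): sorting by score alone leaves the order of tied hits dependent on input order, while adding the id as a tie-breaker makes it deterministic.

```python
def stable_order(hits):
    """Sort by score descending, then by document id ascending as a tie-breaker."""
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical (id, score) pairs; doc2 and doc7 have exactly the same score.
run_one = [("doc7", 1.5), ("doc2", 1.5), ("doc9", 2.0)]
run_two = [("doc2", 1.5), ("doc9", 2.0), ("doc7", 1.5)]

# Both runs produce the same ordering regardless of arrival order,
# so a fixed cutoff always keeps the same documents.
ordered_one = stable_order(run_one)
ordered_two = stable_order(run_two)

print(ordered_one)
```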

RE: Moving to Maven from Ant solr.build.dir Not Found

2012-04-10 Thread Steven A Rowe
Hi Eli, The author of the blog post you mentioned appears to be unaware of the Maven POMs that are already included in Subversion for both Lucene and Solr. See . Because of the complex nature of the Ant build, which the Maven POMs cannot enti

RE: Moving to Maven from Ant solr.build.dir Not Found

2012-04-10 Thread Steven A Rowe
l I'm doing right now is just mvn installing and then trying mvn jetty:run-exploded. I'm not using ant at all, and would really like to keep it that way if at all possible. Eli On 4/10/12 11:56 AM, Steven A Rowe wrote: > Hi Eli, > > The author of the blog post you mentione

RE: Moving to Maven from Ant solr.build.dir Not Found

2012-04-10 Thread Steven A Rowe
ing wrong here. I'm fine with maven being officially unsupported as long as I can get things working. I'm not doing anything too fancy or out of the ordinary, so I'm thinking this shouldn't be too bad. Thanks again for the help! Eli On 4/10/12 2:12 PM, Steven A Rowe wrote: >

RE: Moving to Maven from Ant solr.build.dir Not Found

2012-04-10 Thread Steven A Rowe
on 4.0-SNAPSHOT org.apache.lucene lucene-queries 4.0-SNAPSHOT I know I could download the snapshot manually, but I'd much prefer to do that through Maven since I don't need to modify source at all. Eli On 4/10/12 3:14 PM, Steven A Rowe wrote: > You didn't answer my question a

RE: StandardTokenizer and domain names containing digits

2012-04-19 Thread Steven A Rowe
Hi Alex, TLDR; Try adding WordDelimiterFilter to your analyzer(s). StandardTokenizer in Lucene/Solr v3.1+ implements the Word Boundary rules from Unicode 6.0.0 Standard Annex #29, a.k.a. UAX#29: . These rules don't include reco

RE: StandardTokenizer and domain names containing digits

2012-04-23 Thread Steven A Rowe
: StandardTokenizer and domain names containing digits Steven A Rowe syr.edu> writes: > StandardTokenizer in Lucene/Solr v3.1+ implements the Word Boundary > rules from Unicode 6.0.0 Standard > Annex #29, a.k.a. UAX#29: <http://www.unicode.org/reports/tr29/tr29- 17.html#Word_Bounda

RE: Implementing multiterm chain for ICUCollationKeyFilterFactory

2012-05-03 Thread Steven A Rowe
Hi Oliver, Nabble.com stripped out your analysis chain XML before sending your message to the mailing list. My suggestion: stop using Nabble. (I've described this problem to their support people a couple of times, and they apparently just don't care, since it still persists, years later.)

RE: How does "start.jar" get build in the Solr trunk repository?

2012-05-07 Thread Steven A Rowe
Hi Neil, ivy-maven-plugin might work: http://evgeny-goldin.com/wiki/Ivy-maven-plugin If it does work for you, I'm interested in the details. Alternatively, you could use maven-antrun-plugin to invoke the Ant build functionality you want from Maven. Steve -Original Message- From: nhoo

[MAVEN] Heads up: build changes

2012-05-08 Thread Steven A Rowe
If you use the Lucene/Solr Maven POMs to drive the build, I committed a major change last night (see https://issues.apache.org/jira/browse/LUCENE-3948 for more details): * 'ant get-maven-poms' no longer places pom.xml files under the lucene/ and solr/ directories. Instead, they are placed in a

RE: FrenchLightStemFilterFactory : normalizing tokens longer than 4 characters and having repeated characters in it

2012-05-16 Thread Steven A Rowe
Hi Tanguy, I looked at the code, and I can see where the problem you describe is happening. I think it's a bug: if numbers are search terms, "stemming" them by compressing repeated digits makes little sense. Could you file a bug in JIRA? Please include the examples you gave in your earlier em

RE: Sort by length percentage match

2012-05-16 Thread Steven A Rowe
Hi Alejandro, N-grams might be a good fit. Using bigrams (n-grams of length 2) for "london", you'd get tokens "lo", "on", "nd", "do", "on". This should provide the hit ordering you want. Although it's not listed on Solr's analysis factories wiki page
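The bigram breakdown of "london" above can be reproduced with a few lines of Python. This is a sketch of character n-gram generation in general, not of Solr's n-gram tokenizer; the function name is made up.

```python
def char_ngrams(text, n=2):
    """Return every contiguous substring of length n, in order of appearance."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


bigrams = char_ngrams("london")

print(bigrams)
```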

RE: How to import this Json-line by DIH?

2012-06-20 Thread Steven A Rowe
Hi jueljust, Nabble removed the entire content of your email before sending it to the mailing list. Maybe use a different service that doesn't throw away your message? Steve From: jueljust [juelj...@gmail.com] Sent: Wednesday, June 20, 2012 10:56 AM To:

RE: Shingles and Delimiter Help

2010-11-22 Thread Steven A Rowe
Hi Jessy, Several ShingleFilter(Factory) improvements, including the ability to specify minShingleSize, were introduced in Solr/Lucene 3.x, and so are not available in Solr 1.4.X/Lucene 2.9.X. (This is your #1 issue.) For details about the changes and when they were introduced: http://wiki

RE: SOLR Thesaurus

2010-12-02 Thread Steven A Rowe
Hi Lee, Can you describe your thesaurus format (it's not exactly self-descriptive) and how you would like it to be applied? I gather you're referring to a thesaurus feature in another product (or product class)? Maybe if you describe that it would help too. Steve > -Original Message-

RE: solrj & http client 4

2010-12-22 Thread Steven A Rowe
Stevo, You may be interested in LUCENE-2657, which provides full POMs for Lucene/Solr trunk. I don't use Eclipse, but I think it can use POMs to bootstrap project configuration. (I know IntelliJ can do this.) Steve > -Original Message-

RE: Problem while creating Polish supported SOLR artifact creation

2011-01-03 Thread Steven A Rowe
Hi Johnny, The patch at SOLR-2237 has already been applied to trunk (see the final comment on the issue, where Robert Muir stated that he committed the patch). So just check out trunk - don't use the "-r {2010-11-15}" option to "svn co". Good luck, Steve > -Original Message- > From: jo

RE: Apply a patch

2011-01-03 Thread Steven A Rowe
Hi Darx, On 1/3/2011 at 2:15 AM, Darx Oman wrote: > I checked out source code svn, and applied the patch > but when I build the source code I've got the following error > > C:\trunk\solr\common-build.xml:245: C\trunk\modules\analysis\phonetic > does not exist. You have to check out everything

RE: Problem while creating Polish supported SOLR artifact creation

2011-01-03 Thread Steven A Rowe
Johnny, On 1/3/2011 at 5:56 AM, johnnyisrael wrote: > I tried with the latest trunk and try tried the command "ant dist", Still > it is throwing the same 6 errors. > > https://svn.apache.org/repos/asf/lucene/dev/trunk/ The latest trunk Solr requires minimum Java 1.6 - maybe you're using 1.5?

RE: Sub query using SOLR?

2011-01-04 Thread Steven A Rowe
Hi Barani, I haven't tried it myself, but the limited JOIN functionality provided by SOLR-2272 sounds very similar to what you want to do: https://issues.apache.org/jira/browse/SOLR-2272 Steve > -Original Message- > From: bbarani [mailto:bbar...@gmail.com] > Sent: Tuesday, January 0

RE: Not storing, but highlighting from document sentences

2011-01-12 Thread Steven A Rowe
Hi Otis, I think you can get what you want by doing the first stage retrieval, and then in the second stage, add required constraint(s) to the query for the matching docid(s), and change the AND operators in the original query to OR. Coordination will cause the best snippet(s) to rise to the t

RE: Not storing, but highlighting from document sentences

2011-01-12 Thread Steven A Rowe
> > I think you can get what you want by doing the first stage retrieval, > > and then in the second stage, add required constraint(s) to the query > > for the matching docid(s), and change the AND operators in the > > original query to OR. Coordination will cause the best snippet(s) to > > rise

RE: Not storing, but highlighting from document sentences

2011-01-12 Thread Steven A Rowe
Hi Tomislav, > if I understand correctly, you are suggesting query execution in two > phases: first execute query on whole article index core (where whole > articles are indexed, but not stored) to get article IDs (for articles > which match original query). Then for each match in article core: >

RE: start value in queries zero or one based?

2011-01-13 Thread Steven A Rowe
> Please, read every wiki page you can find and write notes. NO!!! Once you start down this road, there is no turning back! Soon you will feel the need to turn your notes into a new wiki page or a blog post, and people will read those and write notes, and the process will repeat, ad infinitum

RE: solrj & http client 4

2011-01-17 Thread Steven A Rowe
Hi Stevo, Thanks for reviewing the Maven POMs in LUCENE-2657 - I appreciate it. > In those poms, not all modules have explicit version and groupId which > is a bad practice. Really? According to the "POM best practices" section in Sonatype's Maven book

RE: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Steven A Rowe
> [x] ASF Mirrors (linked in our release announcements or via the Lucene > website) > > [x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) > > [x] I/we build them from source via an SVN/Git checkout.

RE: Indexing all permutations of words from the input

2011-01-20 Thread Steven A Rowe
Hi Martin, The co-occurrence filter I'm working on at https://issues.apache.org/jira/browse/LUCENE-2749 would do what you want (among other things). Still vaporware at this point, as I've only put a couple of hours into it, so don't hold your breath :) Steve > -Original Message- > Fro

RE: How to search for special chars like ä from ae?

2011-02-07 Thread Steven A Rowe
Hi Anithya, There is a mapping file for MappingCharFilterFactory that behaves the same as ASCIIFoldingFilterFactory: mapping-FoldToASCII.txt, located in Solr's example conf/ directory in Solr 3.1+. You can rename and then edit this file to map "ä" to "ae", "ü" to "ue", etc. (look for "WITH DI
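The effect of such a character mapping can be sketched in plain Python. This is not the mapping file format and not Solr code; the table below is a hypothetical three-entry example, and the "ö" entry is an assumption added alongside the two mappings mentioned above.

```python
# Hypothetical character-to-string mapping table.
GERMAN_FOLD = str.maketrans({
    "\u00e4": "ae",  # a with diaeresis
    "\u00f6": "oe",  # o with diaeresis (assumed, not named in the message)
    "\u00fc": "ue",  # u with diaeresis
})


def fold(text):
    """Replace each mapped character with its multi-character expansion."""
    return text.translate(GERMAN_FOLD)


folded = fold("M\u00fcller B\u00e4r")

print(folded)
```

Applying the same mapping at both index time and query time is what lets a search for "Mueller" match text that was written with the umlaut.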

RE: How to search for special chars like ä from ae?

2011-02-08 Thread Steven A Rowe
Hi Anithya, Yes, that sounds right. You will want to edit mapping-FoldToASCII.txt, and my suggestion is that you rename mapping-FoldToASCII.txt to reflect your changes (for example, if your target language is German, you could rename it to mapping-German-FoldToASCII.txt); otherwise it would be

RE: How to search for special chars like ä from ae?

2011-02-08 Thread Steven A Rowe
Hi Anithya, That's good to hear. Again, please consider donating your work: . Steve > -Original Message- > From: Anithya [mailto:surysha...@gmail.com] > Sent: Tuesday, February 08, 2011 5:16 PM > To: solr-user@lucene.apache.or

RE: Query on multivalue field

2011-03-01 Thread Steven A Rowe
Hi Scott, Querying against a multi-valued field just works - no special incantation required. Steve > -Original Message- > From: Scott Yeadon [mailto:scott.yea...@anu.edu.au] > Sent: Monday, February 28, 2011 11:50 PM > To: solr-user@lucene.apache.org > Subject: Query on multivalue fiel

RE: Help please - recursively indexing lots and lots of text files

2011-03-04 Thread Steven A Rowe
Hi Colin, Solr's DataImportHandler sounds like what you want: http://wiki.apache.org/solr/DataImportHandler In particular, take a look at FileListEntityProcessor: http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor Steve > -Original Message- > From: c

RE: Question regarding XSLT

2011-03-30 Thread Steven A Rowe
Hi Marcelo, Try adding the 'method="text"' attribute to your <xsl:output> tag, e.g.: <xsl:output method="text"/> If that doesn't work, there is another attribute "omit-xml-declaration" that might do the trick. See http://www.w3.org/TR/xslt#output for more info. Steve > -Original Message- > From: Marcelo Iturbe [mailto:mar

RE: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread Steven A Rowe
I added this test method locally to TestASCIIFoldingFilter.java in the Lucene/Solr 3.1.0 source tree, and it passed, so the filter is not the problem (and the Solr factory certainly isn't either - it's just a wrapper) - I second Ludovic's question - you must have other filters configured: pub

RE: Embedded Solr constructor not returning

2011-04-06 Thread Steven A Rowe
Hi Greg, > I need the servlet API in my app for it to work, despite being command > line. > So adding this to the maven POM fixed everything: > <dependency> > <groupId>javax.servlet</groupId> > <artifactId>servlet-api</artifactId> > <version>2.5</version> > </dependency> > > Perhaps this dependency could be listed on the wiki? Alon

RE: XML not coming through from nabble to Gmail

2011-04-12 Thread Steven A Rowe
I've asked on Nabble if they know of a fix for the problem: http://nabble-support.1.n2.nabble.com/solr-dev-mailing-list-tp6023495p6264955.html Steve > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Tuesday, April 12, 2011 8:43 AM > To: Chris Hostetter

RE: Split token

2011-04-15 Thread Steven A Rowe
This pattern splits tokens *only* in the presence of parentheses with adjoining whitespace, and includes the parentheses with the tokens: (?<=\))\s+|\s+(?=\() So you'll get this kind of behavior: Tottenham Hotspur (London) F.C. Internationale (milan) FC Midtjylland (Herning) (Ikast)
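The pattern can be tried outside Solr. The sketch below uses Python's re module, which accepts the same lookbehind/lookahead syntax for this particular pattern; the function name is made up, and the inputs are the three team names from the message.

```python
import re

# Split on whitespace that follows ")" or precedes "(".
# The lookarounds consume no characters, so the parentheses
# stay attached to their tokens.
SPLIT_PATTERN = re.compile(r"(?<=\))\s+|\s+(?=\()")


def split_on_parens(text):
    """Split text only at whitespace adjacent to a parenthesis."""
    return SPLIT_PATTERN.split(text)


for name in (
    "Tottenham Hotspur (London)",
    "F.C. Internationale (milan)",
    "FC Midtjylland (Herning) (Ikast)",
):
    print(split_on_parens(name))
```

Whitespace between ordinary words is left alone, so "Tottenham Hotspur" stays a single token.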

RE: Apache Solr 3.1.0

2011-04-26 Thread Steven A Rowe
Hi Wodek, UAX29URLEmailTokenizer includes all of StandardTokenizer's rules and adds rules to tokenize URLs and Emails: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.UAX29URLEmailTokenizerFactory Steve > -Original Message- > From: Wodek Siebor [mailto:siebor_wlo...@ba

RE: How to Update Value of One Field of a Document in Index?

2011-04-27 Thread Steven A Rowe
> There's the "limited join" patch, see: > https://issues.apache.org/jira/browse/SOLR-2272 > that hasn't been applied yet Correction: Yonik committed this feature in r1096978.

RE: Getting field information inside a Tokenizer

2011-05-03 Thread Steven A Rowe
Hi FMC, On 5/3/2011 at 12:37 PM, FatMan Corp wrote: > Hi, I would like to get another's field information for the same document > within a Tekonizer class. > How can this be achieved? Use s in your schema , and associate different analysis pipe

RE: Is it possible to build Solr as a maven project?

2011-05-05 Thread Steven A Rowe
Hi Gabriele, The sequence should be 1. svn update 2. ant get-maven-poms 3. mvn -N -Pbootstrap install I think you left out #2 - there was a very recent change to the POMs that affects the noggit jar name. Steve > -Original Message- > From: Gabriele Kahlout [mailto:gabri...@mysimpatico
