Re: Unable to get offsets using AtomicReader.termPositionsEnum(Term)

2014-03-10 Thread Robert Muir
Hello, I think you are confused between two different index structures, probably because of the name of the options in solr. 1. indexing term vectors: this means given a document, you can go lookup a miniature "inverted index" just for that document. That means each document has "term vectors" whi
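A minimal sketch of the distinction, assuming whitespace tokenization and plain Java collections rather than any Lucene API. The main index maps each term to the documents that contain it, while term vectors map each document to its own small term-to-positions table. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class IndexStructuresSketch {

    /** Main inverted index: term -> ids of the documents that contain it. */
    static Map<String, SortedSet<Integer>> invertedIndex(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : docs.get(docId).split("\\s+")) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    /** Term vectors: document id -> (term -> positions of that term inside the document). */
    static Map<Integer, Map<String, List<Integer>>> termVectors(List<String> docs) {
        Map<Integer, Map<String, List<Integer>>> vectors = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            Map<String, List<Integer>> vector = new TreeMap<>();
            String[] terms = docs.get(docId).split("\\s+");
            for (int pos = 0; pos < terms.length; pos++) {
                vector.computeIfAbsent(terms[pos], k -> new ArrayList<>()).add(pos);
            }
            vectors.put(docId, vector);
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("solr uses lucene", "lucene is fast");
        System.out.println(invertedIndex(docs).get("lucene")); // [0, 1]
        System.out.println(termVectors(docs).get(1));          // {fast=[2], is=[1], lucene=[0]}
    }
}
```

The first lookup starts from a term and answers "which documents?"; the second starts from a document and answers "which terms, and where?".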

[ANNOUNCE] Apache Solr 4.7.2 released.

2014-04-15 Thread Robert Muir
April 2014, Apache Solr™ 4.7.2 available The Lucene PMC is pleased to announce the release of Apache Solr 4.7.2 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted sear

[ANNOUNCE] Apache Solr 4.8.1 released

2014-05-20 Thread Robert Muir
May 2014, Apache Solr™ 4.8.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.8.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search

[ANNOUNCE] Apache Solr 4.9.0 released

2014-06-25 Thread Robert Muir
25 June 2014, Apache Solr™ 4.9.0 available The Lucene PMC is pleased to announce the release of Apache Solr 4.9.0 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted se

Re: Background merge errors with Solr 4.4.0 on Optimize call

2013-10-29 Thread Robert Muir
I think it's a bug, but that's just my opinion. I sent a patch to dev@ for thoughts. On Tue, Oct 29, 2013 at 6:09 PM, Erick Erickson wrote: > Hmmm, so you're saying that merging indexes where a field > has been removed isn't handled. So you have some documents > that do have a "what" field, but you

Re: Why do people want to deploy to Tomcat?

2013-11-13 Thread Robert Muir
which example? there are so many. On Wed, Nov 13, 2013 at 1:00 PM, Mark Miller wrote: > RE: the example folder > > It’s something I’ve been pushing towards moving away from for a long time - > see https://issues.apache.org/jira/browse/SOLR-3619 Rename 'example' dir to > 'server' and pull exampl

Re: Bad fieldNorm when using morphologic synonyms

2013-12-06 Thread Robert Muir
Your analyzer needs to set positionIncrement correctly: sounds like it's broken. On Thu, Dec 5, 2013 at 1:53 PM, Isaac Hebsh wrote: > Hi, > we implemented a morphologic analyzer, which stems words at index time. > For some reason, we index both the original word and the stem (on the same > positi

Re: Bad fieldNorm when using morphologic synonyms

2013-12-06 Thread Robert Muir
ll right (for me). > 2) fieldNorm is determined by the size of the termVector, isn't it? the > termVector size isn't affected by the positions. > > > On Fri, Dec 6, 2013 at 10:46 AM, Robert Muir wrote: > >> Your analyzer needs to set positionIncrement correctly: so

Re: Bad fieldNorm when using morphologic synonyms

2013-12-08 Thread Robert Muir
it's accurate, you are wrong. please, look at setDiscountOverlaps in your similarity. This is really easy to understand. On Sun, Dec 8, 2013 at 7:23 AM, Manuel Le Normand wrote: > Robert, your last reply is not accurate. > It's true that the field norms and termVectors are independent. But this >

Re: Bad fieldNorm when using morphologic synonyms

2013-12-09 Thread Robert Muir
no, its turned on by default in the default similarity. as i said, all that is necessary is to fix your analyzer to emit the proper position increments. On Mon, Dec 9, 2013 at 12:27 PM, Manuel Le Normand wrote: > In order to set discountOverlaps to true you must have added the > to the schema.x
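A minimal sketch of why the position increments matter for the norm, assuming a length norm of 1/sqrt(number of terms) and a hypothetical Token type (plain Java, not Lucene classes). When overlaps are discounted, a token stacked on the previous position (increment 0) does not count toward the field length, so a stem indexed at the same position as its original word leaves the norm unchanged. An analyzer that emits increment 1 for every token doubles the counted length instead.

```java
import java.util.Arrays;
import java.util.List;

class LengthNormSketch {

    /** Hypothetical token: only the term text and its position increment. */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** 1/sqrt(counted terms); tokens with increment 0 are skipped when discountOverlaps is true. */
    static double lengthNorm(List<Token> tokens, boolean discountOverlaps) {
        int counted = 0;
        for (Token t : tokens) {
            if (discountOverlaps && t.positionIncrement == 0) {
                continue; // overlap: same position as the previous token
            }
            counted++;
        }
        return 1.0 / Math.sqrt(counted);
    }

    public static void main(String[] args) {
        // "running shoes", each stem stacked on its original word (increment 0)
        List<Token> stacked = Arrays.asList(
                new Token("running", 1), new Token("run", 0),
                new Token("shoes", 1), new Token("shoe", 0));
        // Same tokens from an analyzer that always emits increment 1
        List<Token> unstacked = Arrays.asList(
                new Token("running", 1), new Token("run", 1),
                new Token("shoes", 1), new Token("shoe", 1));

        System.out.println(lengthNorm(stacked, true));   // 1/sqrt(2), about 0.707
        System.out.println(lengthNorm(unstacked, true)); // 1/sqrt(4) = 0.5
    }
}
```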

Re: Tracking down the input that hits an analysis chain bug

2014-01-03 Thread Robert Muir
This exception comes from OffsetAttributeImpl (e.g. you dont need to index anything to reproduce it). Maybe you have a missing clearAttributes() call (your tokenizer 'returns true' without calling that first)? This could explain it, if something like a StopFilter is also present in the chain: basi

[ANNOUNCE] Apache Solr 4.6.1 released.

2014-01-28 Thread Robert Muir
January 2014, Apache Solr™ 4.6.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.6.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted searc

Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
you need the solr analysis-extras jar in your classpath, too. On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer wrote: > Hello, > > I'm migrating to solr 4.6.1 and have problems with the ICUCollationField > (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100). > > I get consistently the error message

Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
classes mentioned are > loaded. > > Do you know which jar is supposed to contain the ICUCollationField? > > Best regards > Thomas > > > > On 19.02.2014 at 13:54, Robert Muir wrote: > > > you need the solr analysis-extras jar in your classpath, too. > >

Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
HOME/lib in order to use it." > > is misleading insofar as this README.txt doesn't mention the > solr-analysis-extras-4.6.1.jar in dist. > > Best > Thomas > > > On 19.02.2014 at 14:27, Robert Muir wrote: > > > you need the solr analysis-extras

Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
On Wed, Feb 19, 2014 at 10:33 AM, Thomas Fischer wrote: > > > Hmm, for standardization of text fields, collation might be a little > > awkward. > > I arrived there after using custom rules for a while (see > "RuleBasedCollator" on http://wiki.apache.org/solr/UnicodeCollation) and > then being tol

Re: ANNOUNCE: Apache Solr Reference Guide for 4.7

2014-03-05 Thread Robert Muir
I debugged the PDF a little. FWIW, the following code (using iText) takes it to 9MB: public static void main(String args[]) throws Exception { Document document = new Document(); PdfSmartCopy copy = new PdfSmartCopy(document, new FileOutputStream("/home/rmuir/Downloads/test.pdf")); /

Re: Using per-segment FieldCache or DocValues in custom component?

2013-07-02 Thread Robert Muir
Where do you get the docid from? Usually its best to just look at the whole algorithm, e.g. docids come from per-segment readers by default anyway so ideally you want to access any per-document things from that same segmentreader. As far as supporting docvalues, FieldCache API "passes thru" to doc

Re: WikipediaTokenizer for Removing Unnecesary Parts

2013-07-23 Thread Robert Muir
If you use wikipediatokenizer it will tag different wiki elements with different types (you can see it in the admin UI). so then followup with typetokenfilter to only filter the types you care about, and i think it will do what you want. On Tue, Jul 23, 2013 at 7:53 AM, Furkan KAMACI wrote: > Hi

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Robert Muir
On Mon, Aug 5, 2013 at 11:42 AM, Chris Hostetter wrote: > > : I agree with you, 0xfffe is a special character, that is why I was asking > : how it's handled in solr. > : In my document, 0xfffe does not appear at the beginning, it's in the > : content. > > Unless i'm misunderstanding something (an

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Robert Muir
On Mon, Aug 5, 2013 at 3:03 PM, Chris Hostetter wrote: > > : > 0xfffe is not a special character -- it is explicitly *not* a character in > : > Unicode at all, it is set aside as "not a character" specifically so > : > that the character 0xfeff can be used as a BOM, and if the BOM is read > : >
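A small sketch of the relationship between the two code points, using only the Java standard library. It shows that reading the two bytes of the BOM (U+FEFF) in the wrong byte order yields U+FFFE, and what each code point looks like once encoded as UTF-8. The class and helper names are hypothetical; nothing here reflects how Solr itself handles the character.

```java
import java.nio.charset.StandardCharsets;

class BomSketch {

    static final char BOM = '\uFEFF';

    /** What a reader sees if it consumes a 16-bit unit in the wrong byte order. */
    static char byteSwapped(char c) {
        return Character.reverseBytes(c);
    }

    /** Hex dump of the UTF-8 encoding of a single char, bytes separated by spaces. */
    static String utf8Hex(char c) {
        StringBuilder sb = new StringBuilder();
        for (byte b : String.valueOf(c).getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.printf("byte-swapped BOM = U+%04X%n", (int) byteSwapped(BOM)); // U+FFFE
        System.out.println(utf8Hex(BOM));                // EF BB BF
        System.out.println(utf8Hex(byteSwapped(BOM)));   // EF BF BE
    }
}
```

Because U+FFFE is reserved as a noncharacter, a decoder that meets it can conclude that the byte order was misread rather than treat it as text.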

Re: Purging unused segments.

2013-08-09 Thread Robert Muir
On Fri, Aug 9, 2013 at 7:48 PM, Erick Erickson wrote: > > So is there a good way, without optimizing, to purge any segments not > referenced in the segments file? Actually I doubt that optimizing would > even do it if I _could_, any phantom segments aren't visible from the > segments file anyway..

Re: Is there a way to store binary data (byte[]) in DocValues?

2013-08-12 Thread Robert Muir
On Mon, Aug 12, 2013 at 8:38 AM, Mathias Lux wrote: > Hi! > > I'm basically searching for a method to put byte[] data into Lucene > DocValues of type BINARY (see [1]). Currently only primitives and > Strings are supported according to [1]. > > I know that this can be done with a custom update hand

Re: Is there a way to store binary data (byte[]) in DocValues?

2013-08-12 Thread Robert Muir
On Mon, Aug 12, 2013 at 12:25 PM, Mathias Lux wrote: > > Another reason for not using the SORTED_SET and SORTED > implementations is that Solr currently works with Strings on that and > I want to have a small memory footprint for millions of images ... > which does not go well with immutables.

Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Robert Muir
did you do a (real) commit before trying to use this? I am not sure how this splitting works, but at least the merge option requires that. i can't see this happening unless you are somehow splitting a 0 document index (or, if the splitter is creating 0 document splits) so this is likely just a sym

Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Robert Muir
Well, i meant before, but i just took a look and this is implemented differently than the "merge" one. In any case, i think its the same bug, because I think the only way this can happen is if somehow this splitter is trying to create a 0-document "split" (or maybe a split containing all deletions

Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Robert Muir
On Tue, Aug 13, 2013 at 11:39 AM, Shalin Shekhar Mangar wrote: > The splitting code calls commit before it starts the splitting. It creates > a LiveDocsReader using a bitset created by the split. This reader is merged > to an index using addIndexes. > > Shouldn't the addIndexes code then ignore al

Re: PostingsHighlighter returning fields which don't match

2013-08-14 Thread Robert Muir
On Wed, Aug 14, 2013 at 3:53 AM, ses wrote: > We are trying out the new PostingsHighlighter with Solr 4.2.1 and finding > that the highlighting section of the response includes self-closing tags > for > all the fields in hl.fl (by default for edismax it is all fields in qf) > where there are no h

Re: Who's cleaning the Fieldcache?

2013-08-14 Thread Robert Muir
On Wed, Aug 14, 2013 at 5:29 PM, Chris Hostetter wrote: > > : why? Those are my sort fields and they are occupying a lot of space (doubled > : in this case but I see that sometimes I have three or four "old" segment > : references) > : > : Is there something I can do to remove those old references

Re: Who's cleaning the Fieldcache?

2013-08-14 Thread Robert Muir
On Wed, Aug 14, 2013 at 5:58 PM, Chris Hostetter wrote: > > : > FieldCaches are managed using a WeakHashMap - so once the IndexReaders > : > associated with those FieldCaches are no longer used, they will be garbage > : > collected when and if the JVM's garbage collector gets around to it. > : > >

Re: Problems installing Solr4 in Jetty9

2013-08-17 Thread Robert Muir
On Sat, Aug 17, 2013 at 3:59 AM, Chris Collins wrote: > I am using 4.4 in an embedded mode and found that it has a dependency on > hadoop 2.0.5. alpha that in turn depends on jetty 6.1.26 which I think > pre-dates electricity :-} > I think this is only a "test dependency" ?

Re: Solr using a ridiculous amount of memory

2013-03-24 Thread Robert Muir
On Sun, Mar 24, 2013 at 4:19 AM, John Nielsen wrote: > Schema with DocValues attempt at solving problem: > http://pastebin.com/Ne23NnW4 > Config: http://pastebin.com/x1qykyXW > This schema isn't using docvalues, due to a typo in your config. it should not be DocValues="true" but docValues="true"
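The underlying reason the typo matters is that XML attribute names are case-sensitive. A minimal sketch with the JDK's DOM parser, assuming a hypothetical field named "price"; it only illustrates the lookup by exact attribute name and is not how Solr reads its schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

class AttributeCaseSketch {

    /** True only if the element carries docValues="true" with exactly that spelling. */
    static boolean docValuesEnabled(String fieldXml) {
        try {
            Element field = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(fieldXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            // getAttribute returns an empty string when the attribute is absent
            return "true".equals(field.getAttribute("docValues"));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML: " + fieldXml, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(docValuesEnabled("<field name=\"price\" docValues=\"true\"/>")); // true
        System.out.println(docValuesEnabled("<field name=\"price\" DocValues=\"true\"/>")); // false
    }
}
```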

Re: Requesting to add into a Contributor Group

2013-05-04 Thread Robert Muir
done. let us know if you have any problems. On Sat, May 4, 2013 at 10:12 AM, Krunal wrote: > Dear Sir, > > Kindly add me to the contributor group to help me contribute to the Solr > wiki. > > My Email id: jariwalakru...@gmail.com > Login Name: Krunal > > Specific changes I would like to make to

Re: Are there any plans to change example directory layout?

2013-06-11 Thread Robert Muir
If you have a good idea... Just do it. Open an issue On Jun 11, 2013 9:34 PM, "Alexandre Rafalovitch" wrote: > I think it is quite hard for beginners that basic solr example > directory is competing for attention with other - nested - examples. I > see quite a lot of questions on which directory

Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-16 Thread Robert Muir
On Fri, Sep 16, 2011 at 6:53 PM, Burton-West, Tom wrote: > Hello, > > The TieredMergePolicy has become the default with Solr 3.3, but the > configuration in the example uses the mergeFactor setting which applies to the > LogByteSizeMergePolicy. > > How is the mergeFactor interpreted by the Tiered

Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-20 Thread Robert Muir
On Mon, Sep 19, 2011 at 9:57 AM, Burton-West, Tom wrote: > Thanks Robert, > > Removing "set" from " setMaxMergedSegmentMB" and using "maxMergedSegmentMB" > fixed the problem. > ( Sorry about the multiple posts.  Our mail server was being flaky and the > client lied to me about whether the messag

Re: MMapDirectory failed to map a 23G compound index segment

2011-09-21 Thread Robert Muir
On Tue, Sep 20, 2011 at 12:32 PM, Michael McCandless wrote: > > Or: is it possible you reopened the reader several times against the > index (ie, after committing from Solr)?  If so, I think 2.9.x never > unmaps the mapped areas, and so this would "accumulate" against the > system limit. In order

Re: payloads - Inconsistency between the document score and the explain score

2011-09-27 Thread Robert Muir
https://issues.apache.org/jira/browse/LUCENE-3421 Note: if you are using this 'includeSpanScore=false' (which I think you are, as thats where the bug applies), be aware this means the score is *only* the result of your payload, boosts, tf, length normalization, idf, none of this is incorporated in

Re: Indexing PDF

2011-10-04 Thread Robert Muir
Your persian pdf problem is different, and already taken care of in pdfbox trunk https://issues.apache.org/jira/browse/PDFBOX-1127 On Tue, Oct 4, 2011 at 2:04 PM, ahmad ajiloo wrote: > I have this problem too, in indexing some of persian pdf files. > > 2011/10/4 Héctor Trujillo > >> Hi all, I'm

Re: New scoring models in LUCENE/SOLR (LUCENE-2959)

2011-10-05 Thread Robert Muir
On Wed, Oct 5, 2011 at 2:23 PM, David Ryan wrote: > Hi, > > According to the JIRA issue 2959, > https://issues.apache.org/jira/browse/LUCENE-2959 > > BM25 will be included in the next release of LUCENE. > > 1). Will BM25F be included in the next release as well as part > of LUCENE-2959? should be

Re: New scoring models in LUCENE/SOLR (LUCENE-2959)

2011-10-05 Thread Robert Muir
On Wed, Oct 5, 2011 at 3:03 PM, David Ryan wrote: > Do you mean both BM25 and BM25F? > > No, BM25F and other "fielded" or structured models are somewhat different. In these models, if you have two fields (body/title) you are saying that "dogs" in body is actually the same term as "dogs" in title.

Re: stemEnglishPossessive and contractions

2011-10-19 Thread Robert Muir
The word delimiter filter also does other things, it treats ' as punctuation by default. So it normally splits on ', except if its 's (in this case it removes the 's completely if you use this stemEnglishPossessive). There are a couple approaches you can use: 1. you can keep worddelimiterfilter wi

Re: changing omitNorms on an already built index

2011-10-27 Thread Robert Muir
On Thu, Oct 27, 2011 at 6:00 PM, Simon Willnauer wrote: > we are not actively removing norms. if you set omitNorms=true and > index documents they won't have norms for this field. Yet, other > segment still have norms until they get merged with a segment that has > no norms for that field ie. omit

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Robert Muir
On Fri, Oct 28, 2011 at 5:03 PM, Jason Rutherglen wrote: > +1 I suggested it should be backported a while back.  Or that Lucene > 4.x should be released.  I'm not sure what is holding up Lucene 4.x at > this point, bulk postings is only needed useful for PFOR. This is not true, most modern index

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Robert Muir
On Fri, Oct 28, 2011 at 8:10 PM, Jason Rutherglen wrote: >> Otherwise we have "flexible indexing" where "flexible" means "slower >> if you do anything but the default". > > The other encodings should exist as modules since they are pluggable. > 4.0 can ship with the existing codec.  4.1 with addit

Re: SolrCloud with large synonym files

2011-11-02 Thread Robert Muir
On Wed, Nov 2, 2011 at 8:53 AM, Phil Hoy wrote: > It is solr 4.0 and uses the new FSTSynonymFilterFactory i believe but defers > to ZkSolrResourceLoader to load the synonym file when in cloud mode. > Phil > FYI: The synonyms implementation supports multiple formats (currently "solr" and "wordnet

Re: [Solr-3.4] Norms file size is large in case of many unique indexed fields in index

2011-11-10 Thread Robert Muir
what is the point of a unique indexed field? If for all of your fields, there is only one possible document, you don't need length normalization, scoring, or a search engine at all... just use a HashMap? On Thu, Nov 10, 2011 at 7:42 AM, Ivan Hrytsyuk wrote: > Hello everyone, > > We have large in

Re: trouble with CollationKeyFilter

2011-11-23 Thread Robert Muir
hi, locale sensitive range queries don't work with these filters, only sort, although erick erickson has a patch that will enable this (the lowercasing wildcards patch, then you could add this filter to your multiterm chain). separately locale range queries and sort both work easily on trunk (wit

Re: trouble with CollationKeyFilter

2011-11-25 Thread Robert Muir
On Wed, Nov 23, 2011 at 11:22 PM, Michael Sokolov wrote: > Thanks for confirming that, and laying out the options, Robert. > FYI: Erick committed the multiterm stuff, so I opened an issue for this: https://issues.apache.org/jira/browse/SOLR-2919 -- lucidimagination.com

Re: trouble with CollationKeyFilter

2011-11-27 Thread Robert Muir
On Sat, Nov 26, 2011 at 8:43 PM, Michael Sokolov wrote: > That's great news!  We can't really track trunk, but it looks like this is > targeted for 3.6, right? As a short-term alternative, I was considering > using ICUFoldingFilter; this won't preserve some of the finer distinctions, > but will at

Re: DirectSolrSpellChecker on request specified field.

2011-11-28 Thread Robert Muir
technically it could? I'm just not sure if the current spellchecking apis allow for it? But maybe someone has a good idea on how to easily expose this. I think its a good idea. Care to open a JIRA issue? On Mon, Nov 28, 2011 at 1:31 PM, Phil Hoy wrote: > Hi, > > Can the DirectSolrSpellChecker b

Re: DirectSolrSpellChecker on request specified field.

2011-11-28 Thread Robert Muir
On Mon, Nov 28, 2011 at 4:36 PM, Phil Hoy wrote: > Added issue: https://issues.apache.org/jira/browse/SOLR-2926 > Please let me know if more information needs adding to JIRA. > > Phil > Thanks, I'll followup on the issue -- lucidimagination.com

Re: Solr 4.0 Levenshtein distance algorithm for DirectSpellChecker

2011-11-29 Thread Robert Muir
On Tue, Nov 29, 2011 at 8:07 AM, elisabeth benoit wrote: > Hello, > > I'd like to know if the Levenshtein distance algorithm used by Solr 4.0 > DirectSpellChecker (working quite well I must say) is considering an > inversion as distance = 1 or distance = 2? > > For instance, if I write Monteruil a

Re: Solr 4.0 Levenshtein distance algorithm for DirectSpellChecker

2011-11-29 Thread Robert Muir
On Tue, Nov 29, 2011 at 9:21 AM, elisabeth benoit wrote: > ok, thanks. > > I think it would be a nice improvement to consider inversion as distance = > 1, since it's such a common mistake. The distance = 2 makes it difficult to > correct transpositions on small words (for instance, the DirectSpellChe
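The two ways of counting an inversion can be compared with a small dynamic-programming sketch. This is a textbook implementation written for illustration (classic Levenshtein, plus the optimal-string-alignment variant that charges 1 for swapping two adjacent characters); it is not the code DirectSpellChecker uses, and the class name is hypothetical.

```java
class EditDistanceSketch {

    /**
     * Edit distance between a and b. With transpositions == false this is classic
     * Levenshtein (insert, delete, substitute). With transpositions == true, swapping
     * two adjacent characters also costs 1 (optimal string alignment).
     */
    static int distance(String a, String b, boolean transpositions) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all of a's first i characters
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all of b's first j characters
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1); // adjacent swap
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("Monteruil", "Montreuil", false)); // 2
        System.out.println(distance("Monteruil", "Montreuil", true));  // 1
    }
}
```

With a maximum distance of 2, the first definition spends the whole budget on a single swapped pair, which is why short words with a transposition are hard to correct.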

Re: RegexQuery performance

2011-12-08 Thread Robert Muir
On Thu, Dec 8, 2011 at 11:01 AM, Jay Luker wrote: > Hi, > > I am trying to provide a means to search our corpus of nearly 2 > million fulltext astronomy and physics articles using regular > expressions. A small percentage of our users need to be able to > locate, for example, certain types of iden

Re: Solr Lucene Index Version

2011-12-08 Thread Robert Muir
On Thu, Dec 8, 2011 at 10:46 AM, Mark Miller wrote: > > On Dec 8, 2011, at 8:50 AM, Jamie Johnson wrote: > >> Isn't the codec stuff merged with trunk now? > > Robert merged this recently AFAIK. > true but that issue only moved the majority of the rest of the index (stored fields, term vectors, fi

Re: Solr Lucene Index Version

2011-12-08 Thread Robert Muir
On Thu, Dec 8, 2011 at 12:55 PM, Jamie Johnson wrote: > Thanks Andrzej.  I'll continue to follow the portable format JIRA > along with 3622, are there any others that you're aware of that are > blockers that would be useful to watch? > There is a lot to be done, particularly norms and deleted doc

Re: codec="Pulsing" per field broken?

2011-12-11 Thread Robert Muir
On Sun, Dec 11, 2011 at 11:34 AM, eks dev wrote: > on the latest trunk, my schema.xml with field type declaration > containing //codec="Pulsing"// does not work any more (throws > exception from FieldType). It used to work with an approx. month-old > trunk version. > > I didn't dig deeper, can be th

Re: InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams

2011-12-12 Thread Robert Muir
On Mon, Dec 12, 2011 at 5:18 AM, Max wrote: > The end offset remains 11 even after folding and transforming "æ" to > "ae", which seems wrong to me. End offsets refer to the *original text* so this is correct. What is wrong, is EdgeNGramsFilter. See how it turns that 11 to a 12? > > I also stum
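A minimal sketch of the rule that offsets index the original text, using a hypothetical Token type and an example word chosen for illustration (plain Java, not the Lucene attribute API). Folding changes the term and makes it one character longer, but the start and end offsets are copied unchanged, so they still select the right span of the input.

```java
class OffsetSketch {

    /** Hypothetical token: term text plus offsets into the ORIGINAL input. */
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Folds the ligature U+00E6 to "ae"; offsets are deliberately left untouched. */
    static Token fold(Token t) {
        return new Token(t.term.replace("\u00E6", "ae"), t.start, t.end);
    }

    public static void main(String[] args) {
        String original = "K\u00E6rlighed";              // 9 chars in the input
        Token folded = fold(new Token(original, 0, original.length()));

        System.out.println(folded.term);                  // Kaerlighed (10 chars)
        System.out.println(folded.end);                   // 9, still the original length
        // A highlighter can safely slice the original text with these offsets
        System.out.println(original.substring(folded.start, folded.end).equals(original)); // true
    }
}
```

A filter that instead derived the end offset from the folded term length would point one character past the end of the input, which is the kind of error a highlighter rejects.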

Re: InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams

2011-12-12 Thread Robert Muir
On Mon, Dec 12, 2011 at 5:18 AM, Max wrote: > It seems like there is some weird stuff going on when folding the > string, it can be seen in the analysis view, too: > > http://i.imgur.com/6B2Uh.png > I created a bug here, https://issues.apache.org/jira/browse/LUCENE-3642 Thanks for the screensho

Re: feature of FST version of SynonymFilter affects Highlighter

2011-12-26 Thread Robert Muir
The old one didn't really handle this correctly either. Koji, what is the highlighting problem? Can we have a test case? 2011/12/26 Koji Sekiguchi : > I found that SynonymFilter javadoc says: > > "Matches single or multi word synonyms in a token stream. > This token stream cannot properly handle

Re: feature of FST version of SynonymFilter affects Highlighter

2011-12-26 Thread Robert Muir
On Mon, Dec 26, 2011 at 10:54 AM, Koji Sekiguchi wrote: > I don't have JUnit test case. What I tried was: > > I have indexing time synonym definition: > > nhl, national hockey league > > and I indexed "I like national hockey league". > > Then I searched nhl with hl=on, I got an unwanted highlight

Re: GermanAnalyzer

2012-01-14 Thread Robert Muir
On Sat, Jan 14, 2012 at 12:58 PM, wrote: > Hi, > > I'm switching from Lucene 2.3 to Solr 3.5. I want to reuse the existing > indexes (huge...). If you want to use a Lucene 2.3 index, then you should set this in your solrconfig.xml: <luceneMatchVersion>LUCENE_23</luceneMatchVersion> > > In Lucene I use an untweaked org.apache.lucene.a

Re: GermanAnalyzer

2012-01-14 Thread Robert Muir
On Sat, Jan 14, 2012 at 5:09 PM, Lance Norskog wrote: > Has the GermanAnalyzer behavior changed at all? This is another kind > of mismatch, and it can cause very subtle problems.  If text is > indexed and queried using different Analyzers, queries will not do > what you think they should. It acts

Re: Trying to understand SOLR memory requirements

2012-01-16 Thread Robert Muir
looks like https://issues.apache.org/jira/browse/SOLR-2888. Previously, FST would need to hold all the terms in RAM during construction, but with the patch it uses offline sorts/temporary files. I'll reopen the issue to backport this to the 3.x branch. On Mon, Jan 16, 2012 at 8:31 PM, Dave wrot

Re: Trying to understand SOLR memory requirements

2012-01-17 Thread Robert Muir
how long it will take to > get a fix? Would I be better switching to trunk? Is trunk stable enough for > someone who's very much a SOLR novice? > > Thanks, > Dave > > On Mon, Jan 16, 2012 at 10:08 PM, Robert Muir wrote: > >> looks like https://issues.apache.org/j

Re: Trying to understand SOLR memory requirements

2012-01-19 Thread Robert Muir
>> at >> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) >>  at >> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) >> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) >

Re: Trying to understand SOLR memory requirements

2012-01-19 Thread Robert Muir
countryid, > c.plainname as countryname, p.timezone as timezone, r.id as regionid, > r.plainname as regionname from places p, regions r, countries c, cities c2 > where c2.id = p.cityid AND p.settingid = 1 AND p.regionid > 1 AND > p.countryid=c.id AND r.id=p.regionid" >            transformer="TemplateTransformer"> >             >             >    

Re: Plural only stemmer

2010-06-17 Thread Robert Muir
make it work with the KStem jars? > > Thanks! > -- Robert Muir rcm...@gmail.com

Re: MappingCharFilterFactory equivalent for use after tokenizer?

2010-06-18 Thread Robert Muir
idea.) > > I don't think we should do this. how many tokens would make? (such malformed input exists in the wild, e.g. someone spills beer on their keyboard and the key gets sticky) -- Robert Muir rcm...@gmail.com

Re: fuzzy query performance

2010-06-23 Thread Robert Muir
On Wed, Jun 23, 2010 at 3:34 PM, Peter Karich wrote: > > So, you mean I should try it out her: > http://svn.apache.org/viewvc/lucene/dev/trunk/solr/ > > yes, the speedups are only in trunk. -- Robert Muir rcm...@gmail.com

Re: Stemmed and/or unStemmed field

2010-06-23 Thread Robert Muir
results with word > 'comfort' in the title. I assume it is because of stemming. What is the > right way to handle this? > from your examples, it seems a more lightweight stemmer might be an easy option: https://issues.apache.org/jira/browse/LUCENE-2503 -- Robert Muir rcm...@gmail.com

Re: NGramFilterFactory usage

2010-06-26 Thread Robert Muir
ost on the > nGram_text field. > > If I do a *:* on the Solr administration interface it shows the nGram_text > field to be populated. > However if I search for plan (Assume I indexed the word Plane) no results > are shown. > Is there any other configurations that needs to be done ? > > Thanks in advance, > > Regards, > Indika > -- Robert Muir rcm...@gmail.com

Re: Indexing slowdowns

2010-07-08 Thread Robert Muir
ght be something in analysis) -- Robert Muir rcm...@gmail.com

Re: Polish language support?

2010-07-09 Thread Robert Muir
-- > Peter M. Wolanin, Ph.D. > Momentum Specialist, Acquia. Inc. > peter.wola...@acquia.com > -- Robert Muir rcm...@gmail.com

Re: Foreign characters question

2010-07-14 Thread Robert Muir
Solr - User mailing list archive at Nabble.com. > -- Robert Muir rcm...@gmail.com

Re: Foreign characters question

2010-07-14 Thread Robert Muir
ache.org/jira/browse/SOLR-2003 In this case, the wrong encoding could have been detected and saved you some time... -- Robert Muir rcm...@gmail.com

Re: Error in building Solr-Cloud (ant example)

2010-07-15 Thread Robert Muir
cluded > >> ZooKeeper jar (java versioning issue) - so I had to download the source > and > >> build this. Now 'ant' gets a bit further, to the stage listed above. > >> > >> Any idea of the problem??? THANKS! > >> > >> [javac] Compiling 438 source files to > >> /Volumes/newpart/solrcloud/cloud/build/solr > >> [javac] > >> > /Volumes/newpart/solrcloud/cloud/src/java/org/apache/solr/cloud/ZkController.java:588: > >> cannot find symbol > >> [javac] symbol : method stringPropertyNames() > >> [javac] location: class java.util.Properties > >> [javac] for (String sprop : > >> System.getProperties().stringPropertyNames()) { > >> > > > > > > > -- Robert Muir rcm...@gmail.com

Re: How to get search results taking into account ortographies errors ???

2010-07-15 Thread Robert Muir
and I would like, when a user makes a search and forgets to accent > the words, the search results to show both possibilities: the results without > the > accent and the results with the accent. > > would you help me please ??? > Regards > Ariel > -- Robert Muir rcm...@gmail.com

Re: slovene language support

2010-07-19 Thread Robert Muir
wrote: > Hi, > > I want to setup an solr with support for several languages. > The language list includes slovene, unfortunately I found nothing about it > in the wiki. > Has some one experiences with solr 1.4 and slovene? > > thanks for help > Markus -- Robert Muir rcm...@gmail.com

Re: Stemming

2010-07-20 Thread Robert Muir
message in context: > http://lucene.472066.n3.nabble.com/Stemming-tp982690p982690.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Robert Muir rcm...@gmail.com

Re: Stemming

2010-07-20 Thread Robert Muir
Stemming-tp982690p982786.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Robert Muir rcm...@gmail.com

Re: Russian stemmer

2010-07-27 Thread Robert Muir
м, Коврове. > > Are there other stemming plugins for the russian language that can handle > this? > If not, what are the options. A simple solution may be to use the wildcard > queries in Standard mode instead of the DisMaxQueryHandler: > Ковров* > > but I'd like to avoid it. > > Thanks. > -- Robert Muir rcm...@gmail.com

Re: Russian stemmer

2010-07-27 Thread Robert Muir
might give you less problems on average, but I noticed it has this same problem with the example you gave. On Tue, Jul 27, 2010 at 4:25 AM, Robert Muir wrote: > All of your examples stem to "ковров": > >assertAnalyzesTo(a, "Коврова Коврову Ковровом Коврове", >

Re: Russian stemmer

2010-07-27 Thread Robert Muir
?lg=1&q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2%D0%B0 > > > > Немцов: 74 articles > > > > > http://www.sova-center.ru/search/?lg=1&q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2 > > > > > > > > > -- Robert Muir rcm...@gmail.com

Re: Russian stemmer

2010-07-27 Thread Robert Muir
.* stopwords is ideal for the english language, > although in russian nouns are inflected: Борис, Борису, Бориса, Борисом > > I'll try the RussianLightStemFilterFactory (the article in the PDF > mentioned > it's more accurate). > > Once again thanks, > Oleg Bu

Re: Good list of English words that get "butchered" by Porter Stemmer

2010-07-30 Thread Robert Muir
agine such a list could be added to the example protwords.txt > > Thanks, > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > -- Robert Muir rcm...@gmail.com

Re: analysis tool vs. reality

2010-08-04 Thread Robert Muir
lo, > > I have found the analysis tool in the admin page to be very useful in > understanding my schema. I've made changes to my schema so that a > particular case I'm looking at matches properly. I restarted solr, > deleted the document from the index, and added it again. But still, > when I do a query, the document does not get returned in the results. > > Does anyone have any tips for debugging this sort of issue? What is > different between what I see in analysis tool and new documents added > to the index? > > Thanks, > Justin > -- Robert Muir rcm...@gmail.com

Re: analysis tool vs. reality

2010-08-04 Thread Robert Muir
ing the > "match" highlighting will actually reduce confusion, but perhaps there is > verbiage/disclaimers that could be added to make it more clear? > As i said before, I think i disagree with you. I think for stuff like this the technicals are less important, what's important is this is a misleading checkbox that really confuses users. I suggest disabling it entirely, you are only going to remove confusion. -- Robert Muir rcm...@gmail.com

Re: analysis tool vs. reality

2010-08-04 Thread Robert Muir
rom 'Query Analyzer' is completely bogus. On Wed, Aug 4, 2010 at 1:57 PM, Robert Muir wrote: > > > On Wed, Aug 4, 2010 at 1:45 PM, Chris Hostetter > wrote: > >> >> it really only attempts to identify when there is overlap between >> analysis at

Re: Index compatibility 1.4 Vs 3.1 Trunk

2010-08-05 Thread Robert Muir
after and including version X-1.0, but may-or-may-not be able to read indexes generated by version X-2.N. (And personally I think there is stuff in 2.x, like modified-utf8, that I would object to adding support for now that terms are byte[]) -- Robert Muir rcm...@gmail.com

Re: Improve Query Time For Large Index

2010-08-11 Thread Robert Muir
tem won't form phrase queries unless the user explicitly puts double quotes around it. -- Robert Muir rcm...@gmail.com

Re: Improve Query Time For Large Index

2010-08-12 Thread Robert Muir
ll actually form slow phrase queries by default. > > > > do you mean that http://lucene.apache.org will be split up into "http > lucene apache org" and solr will perform a phrase query? > > Regards, > Peter. > -- Robert Muir rcm...@gmail.com

Re: analysis tool vs. reality

2010-08-12 Thread Robert Muir
on whitespace first. That's my > point: analysis.jsp doesn't make any assumptions about what query parser > *might* be used, it just tells you what your analyzers do with strings. > you're right, we should just fix the bug that the queryparser tokenizes on whitespace first. then

Re: analysis tool vs. reality

2010-08-12 Thread Robert Muir
sing. > even if you change the Lucene QueryParser so that whitespace isn't a meta > character it doesn't affect the underlying issue: analysis.jsp is agnostic > about QueryParsers. analysis.jsp isn't agnostic about queryparsers, it's ignorant of them, and your default queryparser is actually a de-facto whitespace tokenizer. Don't try to sugarcoat it. -- Robert Muir rcm...@gmail.com

Re: Index compatibility 1.4 Vs 3.1 Trunk

2010-08-12 Thread Robert Muir
r indexes will not be able to be read natively without conversion first (with maybe loss of analyzer compatibility)." The fact that 4.0 can read 3.x indexes *at all* without a converter tool is only because Mike McCandless went the extra mile. I don't see anything suggesting we should support any tools for 2.x indexes! -- Robert Muir rcm...@gmail.com

Re: analysis tool vs. reality

2010-08-16 Thread Robert Muir
ly why it comes up on the mailing list, it seems, at least every week [at this point you have to admit, there is a problem]. If you want to say the analysis tool is agnostic about queryparsers, that's fine; you can keep saying that. I'm saying it shouldn't be. -- Robert Muir rcm...@gmail.com

Re: analysis tool vs. reality

2010-08-16 Thread Robert Muir
ght, we should just fix the bug that the queryparser tokenizes on > whitespace first. then analysis.jsp will be significantly less confusing. >> dude .. not trying to get into a holy war here > -1 from me. > > well, that might be your opinion, but it doesn't change the facts. -- Robert Muir rcm...@gmail.com

Re: TurkishLowerCaseFilterFactory

2010-08-26 Thread Robert Muir
e analyzers jar! This way, in a single jar you have the TurkishLowerCaseFilter, but also the Turkish stemmer from snowball, a set of Turkish stopwords in resources/, and a Lucene TurkishAnalyzer that puts it all together. -- Robert Muir rcm...@gmail.com
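
For readers wondering why Turkish needs its own lowercase filter at all: Turkish has a dotted and a dotless "i", so generic lowercasing maps "I" to the wrong letter. The sketch below only illustrates that mapping in plain Python; `turkish_lower` is a hypothetical helper name and is not how Lucene's TurkishLowerCaseFilter is implemented.

```python
def turkish_lower(text):
    """Lowercase text using the Turkish i rules (illustrative helper only).

    In Turkish, capital dotted I (U+0130) lowercases to 'i', and capital
    dotless I (U+0049) lowercases to dotless 'ı' (U+0131).
    """
    # Handle the two Turkish-specific capitals first, then fall back to
    # the generic lowercase mapping for everything else.
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


# Generic lowercasing turns 'I' into dotted 'i', which is a different
# letter in Turkish and would not match correctly lowercased terms.
generic = "ISPARTA".lower()
turkish = turkish_lower("ISPARTA")
```

A term lowercased the generic way ("isparta") and the Turkish way ("ısparta") are different strings, so index and query sides both need the Turkish-aware step to match.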

Re: shingles work in analyzer but not real data

2010-09-01 Thread Robert Muir
uery of foo bar is processed as TokenStream(foo) + TokenStream(bar) so query-time shingling like this doesn't work as you expect for this reason. -- Robert Muir rcm...@gmail.com
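
The behaviour described in the snippet above can be sketched in a few lines. This is a toy model in plain Python, not Lucene or Solr code: `shingles`, `analyze`, `index_time`, and `query_time` are hypothetical names, and the "analyzer" is just lowercase plus whitespace split plus bigram shingles. It assumes, as the message says, that the query parser splits on whitespace before handing each piece to the analyzer.

```python
def shingles(tokens, size=2):
    """Return word n-grams (shingles) of the given size from a token list."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def analyze(text):
    """Stand-in analyzer: lowercase, split on whitespace, emit bigram shingles."""
    return shingles(text.lower().split())


def index_time(text):
    """At index time the analyzer sees the whole field value at once."""
    return analyze(text)


def query_time(query):
    """Model of the query side: the parser splits on whitespace first,
    so the analyzer is run separately on each chunk and never sees
    two adjacent words together."""
    terms = []
    for chunk in query.split():
        terms.extend(analyze(chunk))
    return terms


indexed = index_time("foo bar")   # ['foo bar']
queried = query_time("foo bar")   # [] : no chunk has two words to shingle
```

In this model the indexed document contains the shingle "foo bar" while the query produces no shingles at all, which is why the analysis page and real query results can disagree.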
