Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Pawel Rog
* 1st question (ls from index directory), Solr 1.4:
-rw-r--r-- 1 user user    2180582 Nov 30 07:26 _3g1_cf.del
-rw-r--r-- 1 user user 5190652802 Nov 28 17:57 _3g1.fdt
-rw-r--r-- 1 user user  139556724 Nov 28 17:57 _3g1.fdx
-rw-r--r-- 1 user user       4963 Nov 28 17:56 _3g1.fnm
-rw-r--r-- 1 user us

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Pawel Rog
I've attached a chart showing CPU usage. Solr 3.5 uses almost all of the CPU (left side of the chart). At the beginning of the chart there were about 60 rps, then about 100 rps (before Solr 3.5 was turned off). Then Solr 1.4 was turned on with 100 rps. -- Pawel On Wed, Nov 30, 2011 at 9:07 AM, Pawel Rog wrote: > * 1st

Re: Splitting Words but retaining offsets

2011-11-30 Thread lboutros
I think this is what you are looking for: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory Ludovic. - Jouve France.
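For readers unfamiliar with PositionFilterFactory: as I read the javadoc of the underlying Lucene PositionFilter, it sets the position increment of every token after the first to a fixed value (zero by default), so all tokens of the field land on the same position while their character offsets are left alone. The Python below is only a toy model of that behaviour, not Lucene code; the tuple layout and the "wi-fi" example are made up for illustration.

```python
from typing import List, Tuple

# (term, start_offset, end_offset, position_increment): a made-up stand-in for Lucene token attributes
Token = Tuple[str, int, int, int]


def position_filter(tokens: List[Token], increment: int = 0) -> List[Token]:
    """Keep the first token's increment; force every later token's increment to `increment`."""
    out = []
    for i, (term, start, end, inc) in enumerate(tokens):
        out.append((term, start, end, inc if i == 0 else increment))
    return out


def absolute_positions(tokens: List[Token]) -> List[Tuple[str, int, int, int]]:
    """Turn increments into absolute positions (first position is 0); offsets are untouched.

    Returns (term, position, start_offset, end_offset) tuples.
    """
    pos = -1
    result = []
    for term, start, end, inc in tokens:
        pos += inc
        result.append((term, pos, start, end))
    return result


# "wi-fi" split into two tokens, each keeping its offsets into the original text.
tokens = [("wi", 0, 2, 1), ("fi", 3, 5, 1)]
print(absolute_positions(tokens))                   # [('wi', 0, 0, 2), ('fi', 1, 3, 5)]
print(absolute_positions(position_filter(tokens)))  # [('wi', 0, 0, 2), ('fi', 0, 3, 5)]
```

Only the positions collapse; the offsets, which are what highlighting relies on, still point at the right characters.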

Re: how to apply fuzzy search with slop

2011-11-30 Thread vrpar...@gmail.com
Thanks Erick, I downloaded ComplexPhraseQueryParser from the link you gave, ran maven package to create the jar file, added it to the WEB-INF/lib folder, generated the war file and deployed it to the JBoss server. I also added the QueryParser to the solrconfig.xml file. Now when I do a normal search, it works fine, but

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Pawel Rog
I made a thread dump. The most active threads have this trace:
"471003383@qtp-536357250-245" - Thread t@270
   java.lang.Thread.State: RUNNABLE
        at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:702)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(Solr

Re: Don't snowball depending on terms

2011-11-30 Thread Rob Brown
I guess I could do a bit of pre-processing: look for any words that are quoted, and search in a different field for those. How is a query like this formulated? q=unstemmed:perl or java&q=stemmed:manager -- IntelCompute Web Design and Online Marketing http://www.intelcompute.com
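The pre-processing idea (quoted words go to the unstemmed field, everything else to the stemmed one) can be sketched in a few lines. This is a minimal sketch, assuming fields literally named `stemmed` and `unstemmed` as in the thread; `route_terms` and `build_q` are hypothetical helpers, every clause is made required with `+` (an assumption about the desired semantics), and no escaping of Lucene special characters or handling of unbalanced quotes is attempted.

```python
import re

# Matches either a double-quoted phrase (group 1) or a bare word (group 2).
TOKEN_RE = re.compile(r'"([^"]+)"|(\S+)')


def route_terms(user_query, stemmed_field="stemmed", unstemmed_field="unstemmed"):
    """Split a raw user query into per-field clauses.

    Quoted phrases go to the unstemmed field (exact wording);
    bare words go to the stemmed field. Field names are hypothetical.
    """
    clauses = []
    for phrase, word in TOKEN_RE.findall(user_query):
        if phrase:
            clauses.append(f'{unstemmed_field}:"{phrase}"')
        else:
            clauses.append(f"{stemmed_field}:{word}")
    return clauses


def build_q(user_query):
    # Require every clause; swap "+" for OR semantics if that matches the old engine better.
    return " ".join("+" + clause for clause in route_terms(user_query))


print(build_q('java "development manager"'))
# +stemmed:java +unstemmed:"development manager"
```

Everything ends up in a single `q` parameter, which also sidesteps the problem of sending `q` twice.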

Re: how to apply fuzzy search with slop

2011-11-30 Thread Erick Erickson
I have no idea whether it will work with 1.4, although I haven't looked at the underlying code. I actually doubt it. Newer solrconfig.xml files have a "luceneMatchVersion" entry that is referenced by that code but just doesn't exist in the 1.4 code base. I strongly recommend you upgra

Re: Don't snowball depending on terms

2011-11-30 Thread Erick Erickson
You can't have multiple "q" clauses (as opposed to "fq" clauses). You could form something like q=unstemmed:perl or java&fq=stemmed:manager, or q=+(unstemmed:perl or java) +stemmed:manager. BTW, this fragment of the query probably doesn't do what you expect: unstemmed:perl or java would be parsed as
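Both shapes (one `q` plus `fq`, or a single `q` with required clauses) can be built safely with the standard library. A minimal sketch; `select_query_string` is a hypothetical helper. It uses the `field:(a OR b)` grouping so the field applies to both terms, and uppercase `OR`, since the standard Lucene query parser only treats uppercase AND/OR/NOT as operators. This is my own variant, not necessarily what the truncated message goes on to recommend.

```python
from urllib.parse import parse_qs, urlencode


def select_query_string(q, filter_queries=()):
    """Build a /select query string with exactly one q and any number of fq params."""
    params = [("q", q)] + [("fq", fq) for fq in filter_queries]
    # urlencode percent-encodes ':', '(', ')' and, importantly, '+' (as %2B);
    # a raw '+' typed into a URL would be decoded by the server as a space.
    return urlencode(params)


# Option 1: main query plus a filter query.
qs1 = select_query_string("unstemmed:(perl OR java)", ["stemmed:manager"])
# Option 2: everything in one q, with required (+) clauses.
qs2 = select_query_string("+unstemmed:(perl OR java) +stemmed:manager")

print(qs1)                 # q=unstemmed%3A%28perl+OR+java%29&fq=stemmed%3Amanager
print(parse_qs(qs2)["q"])  # ['+unstemmed:(perl OR java) +stemmed:manager']
```

One practical difference between the two options: `fq` results are cached separately and do not contribute to scoring, so `stemmed:manager` in an `fq` only restricts the result set.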

Re: make fuzzy search for phrase

2011-11-30 Thread meghana
I installed ComplexPhraseQueryParser from https://issues.apache.org/jira/browse/SOLR-1604, as you suggested, by adding the latest version of it, and I am getting the error: HTTP Status 500 - luceneMatchVersion java.lang.NoSuchFieldError: luceneM

Terms Component with documents marked for deletion

2011-11-30 Thread qwamci
I have been playing around with the Terms Component in Solr and hit a situation I do not understand. When I index documents and then update them, the TermsComponent does not always have the correct count. Specifically, when I update a document, the TermsComponent keeps track of the former version of
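A likely explanation is that an update in Lucene is a delete plus an add, and the delete only marks the old document; per-term document counts keep including it until a merge (or an explicit expunge) physically removes it. The class below is a much-simplified toy model of that behaviour, not Lucene's actual data structures; `ToyIndex` and its methods are hypothetical.

```python
class ToyIndex:
    """Toy model: updates mark the old version deleted, and per-term document
    counts keep including marked-deleted documents until they are purged."""

    def __init__(self):
        self.docs = {}        # internal doc number -> (unique key, set of terms)
        self.deleted = set()  # internal doc numbers marked as deleted
        self.next_doc = 0

    def update(self, key, terms):
        # Mark every live older version with the same unique key as deleted...
        for doc_no, (k, _) in self.docs.items():
            if k == key and doc_no not in self.deleted:
                self.deleted.add(doc_no)
        # ...then append the new version under a fresh internal doc number.
        self.docs[self.next_doc] = (key, set(terms))
        self.next_doc += 1

    def doc_freq(self, term):
        # Like a raw term document frequency: marked-deleted docs are still counted.
        return sum(1 for _, terms in self.docs.values() if term in terms)

    def expunge_deletes(self):
        # Physically drop marked documents, so the counts become exact again.
        for doc_no in self.deleted:
            del self.docs[doc_no]
        self.deleted.clear()


idx = ToyIndex()
idx.update("doc1", ["red"])
idx.update("doc1", ["blue"])  # doc1 now says "blue"
print(idx.doc_freq("red"))    # 1 (stale count from the deleted version)
idx.expunge_deletes()
print(idx.doc_freq("red"))    # 0
```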

Re: Don't snowball depending on terms

2011-11-30 Thread Robert Brown
Boosts can be included there too, can't they? So is this valid? q=+(stemmed^2:perl or stemmed^3:java) +unstemmed^5:"development manager" Is it possible to have different boosts on the same field, btw? We currently search across 5 fields anyway, so my queries are gonna start getting messy. :-/

Re: Seek past EOF

2011-11-30 Thread Ruben Chadien
Happened again… I got 3 directories in my index dir:
4096 Nov  4 09:31 index.2004083156
4096 Nov 21 10:04 index.2021090440
4096 Nov 30 14:55 index.2029024919
As you can see, the first two are old and also empty; the last one, from today, contains 9 files, none of them 0 size

Re: Don't snowball depending on terms

2011-11-30 Thread Erick Erickson
First, watch the syntax: q=+(stemmed:perl^2 or stemmed:java^3) +unstemmed:"development manager"^5. It is a bit confusing when you see the dismax stuff, where the boost is put on the field name, but that's not how these queries are formed. BTW, have you looked at edismax queries? You can distri
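Since the boost goes after the term or phrase rather than after the field name, queries like this can be generated mechanically, which may keep the five-field case manageable. A minimal sketch; `clause` is a hypothetical helper. It uses uppercase `OR` (the message above writes lowercase) and does no escaping of Lucene special characters. It also shows that the same field can carry different boosts in different clauses.

```python
def clause(field, value, boost=None):
    """Build one Lucene query clause; values containing spaces become quoted phrases.

    The boost is appended after the term or phrase, never after the field name.
    """
    text = f'{field}:"{value}"' if " " in value else f"{field}:{value}"
    return text if boost is None else f"{text}^{boost}"


q = "+({}) +{}".format(
    " OR ".join([clause("stemmed", "perl", 2), clause("stemmed", "java", 3)]),
    clause("unstemmed", "development manager", 5),
)
print(q)  # +(stemmed:perl^2 OR stemmed:java^3) +unstemmed:"development manager"^5
```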

Re: Don't snowball depending on terms

2011-11-30 Thread Robert Brown
Thanks Erick, This is a required feature since we're swapping out an existing search engine for Solr - users have saved searches that need to behave the same. I'll look into the edismax stuff, that's the handler we're using anyway. --- IntelCompute Web Design & Local Online Marketing http://

Leaving certain tokens intact during indexing

2011-11-30 Thread Marian Steinbach
I have documents containing tokens of a certain format in arbitrary positions, like this: ... blah blahblah AB/1234/5678 blah blah blahblah ... I would like to enable "usual" keyword searching within these documents. In addition, I'd also like to enable users to find "AB/1234/5678", ideally w

Leaving certain tokens intact during indexing and search

2011-11-30 Thread Marian Steinbach
I have documents containing tokens of a certain format in arbitrary positions, like this: ... blah blahblah AB/1234/5678 blah blah blahblah ... I would like to enable "usual" keyword searching within these documents. In addition, I'd also like to enable users to find "AB/1234/5678", ideally w

Re: Don't snowball depending on terms

2011-11-30 Thread Erick Erickson
Ahhh, I hate making a new implementation match all of the old behavior, but sometimes ya' just got no choice. I *swear* there's a JIRA with an approach to creating a filter for this situation, but I can't find it. Best Erick On Wed, Nov 30, 2011 at 9:19 AM, Robert Brown wrote: > Thanks

Re: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Erick Erickson
There are about a zillion tokenizers; for what you're describing, WhitespaceTokenizerFactory is a good candidate. See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for a partial list; it has links to the authoritative docs. Best Erick On Wed, Nov 30, 2011 at 9:23 AM, Marian Stein

Re: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Marian Steinbach
Thanks for the quick response! Are you saying that I should extend WhitespaceTokenizerFactory to create my own, or should I simply use it? I guess tokenizing on spaces wouldn't be enough: I would need tokenizing on slashes in other positions, just not within strings matching ([A-Z]+/[0-9

Re: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Erick Erickson
Well, it depends (tm). No, in your case WhitespaceTokenizer wouldn't work, although it did satisfy your initial statement. You could consider PatternTokenizerFactory, but take a look at the link I provided, and follow it to the javadocs to see if there are better matches. Best Erick On Wed, Nov
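As far as I know, PatternTokenizerFactory can either split on its pattern or (with its `group` attribute set to a group number such as 0) emit each regex match as a token; check the javadocs linked above to confirm. The Python below mimics the match-extracting mode to show a regex that keeps the special tokens whole while still splitting other text on whitespace and slashes. The code format `[A-Z]+/[0-9]+/[0-9]+` is a hypothetical pattern inferred from the single example "AB/1234/5678".

```python
import re

# Hypothetical code format, inferred from the example "AB/1234/5678".
CODE = r"[A-Z]+/[0-9]+/[0-9]+"
# A token is either a whole code, or a run of characters that are neither whitespace nor '/'.
# The code alternative comes first, so it wins when both could match at the same position.
TOKEN_RE = re.compile(CODE + r"|[^\s/]+")


def tokenize(text):
    """Emit every regex match as a token, like a match-extracting pattern tokenizer."""
    return TOKEN_RE.findall(text)


print(tokenize("blah AB/1234/5678 and/or blahblah"))
# ['blah', 'AB/1234/5678', 'and', 'or', 'blahblah']
```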

Re: Terms Component with documents marked for deletion

2011-11-30 Thread lboutros
Hi, you have to use the additional 'expungeDeletes' parameter: http://wiki.apache.org/solr/UpdateXmlMessages and, depending on the version of Solr you are using, you may also have to use a merge policy like the LogByteSizeMergePolicy. See: https://issues.apache.org/jira/browse/SOLR-2725 Ludovic
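The UpdateXmlMessages page describes `expungeDeletes` as an attribute of the `<commit/>` message. A minimal sketch that only builds the POST request with the Python standard library; the URL is an assumption (the default single-core location in the Solr 3.x example setup), and `expunge_deletes_request` is a hypothetical helper.

```python
import urllib.request


def expunge_deletes_request(update_url):
    """Build (but do not send) an XML update message that commits and
    asks Solr to expunge deleted documents."""
    return urllib.request.Request(
        update_url,
        data=b'<commit expungeDeletes="true"/>',
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Assumed URL: the default single-core location of the Solr 3.x example setup.
req = expunge_deletes_request("http://localhost:8983/solr/update")
# urllib.request.urlopen(req)  # uncomment to actually send it to a running Solr
```

Expunging rewrites the segments that contain deletions, so on a large index it may be worth running it off-peak rather than after every update.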

RE: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Steven A Rowe
Hi Marian, Extending the StandardTokenizer(Factory) java class is not the way to go if you want to change its behavior. StandardTokenizer is generated from a JFlex specification, so you would need to modify the specification to include your special slash-containing-word rule

Re: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Marian Steinbach
That's pretty helpful, thanks! Especially since I hadn't realized until now that I could use a filter like PatternReplaceCharFilterFactory both as a charFilter and as a filter. In the meantime I had figured out another alternative, involving WordDelimiterFilterFactory. But I had to use WhitespaceTo

RE: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Steven A Rowe
Note that my example does not actually use PatternReplaceCharFilterFactory twice - the second one is actually a PatternReplaceFilterFactory - note that "Char" isn't present in the second one. CharFilters operate before tokenizers, and regular filters operate after tokenizers. Steve > -Ori
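The ordering point (char filters see raw text before the tokenizer; regular filters see tokens after it) can be illustrated with a toy three-stage pipeline that protects the special tokens and then restores them. This is a hypothetical protect-and-restore scheme in Python, not necessarily Steve's actual configuration; the code pattern and the `_SLASH_` placeholder are made up. Unlike this toy, real Lucene char filters also track offset corrections so highlighting still lines up.

```python
import re

# Hypothetical code pattern and placeholder.
CODE_RE = re.compile(r"\b([A-Z]+)/([0-9]+)/([0-9]+)\b")
PLACEHOLDER = "_SLASH_"


def char_filter(text):
    # Stage 1 (like a CharFilter): rewrites the raw character stream BEFORE tokenizing,
    # hiding the slashes inside codes so the tokenizer will not split on them.
    return CODE_RE.sub(lambda m: PLACEHOLDER.join(m.groups()), text)


def tokenizer(text):
    # Stage 2: split on whitespace and on every remaining slash.
    return [t for t in re.split(r"[\s/]+", text) if t]


def token_filter(tokens):
    # Stage 3 (like a regular token filter): works on tokens AFTER tokenizing,
    # restoring the original slashes.
    return [t.replace(PLACEHOLDER, "/") for t in tokens]


def analyze(text):
    return token_filter(tokenizer(char_filter(text)))


print(analyze("see AB/1234/5678 and/or x"))  # ['see', 'AB/1234/5678', 'and', 'or', 'x']
```

Swapping stages 1 and 3 would not work: once the tokenizer has split on the slashes, no token filter can see the whole code any more.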

Re: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Marian Steinbach
Got me right when Solr reported the error on restart :) Thanks! 2011/11/30 Steven A Rowe > Note that my example does not actually use PatternReplaceCharFilterFactory > twice - the second one is actually a PatternReplaceFilterFactory - note > that "Char" isn't present in the second one. > > CharF

mysolr python client

2011-11-30 Thread Marco Martinez
Hi all, For anyone interested, I've recently been using a new Solr client for Python. It's easy to use and pretty well documented. If you're interested, its site is: http://mysolr.redtuna.org/ Bye! Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta

Re: InvalidTokenOffsetsException when using MappingCharFilterFactory, DictionaryCompoundWordTokenFilterFactory and Highlighting

2011-11-30 Thread Jay Luker
I am having a similar issue with OffsetExceptions during highlighting. All of the explanations and bug reports I'm reading mention that this is the result of a problem with HTMLStripCharFilter. But my analysis chains don't (as far as I'm aware) make use of HTMLStripCharFilter, so can som

Solr indexing custom fields

2011-11-30 Thread VladislavLysov
Hello! I have a question. How can I make sure that, when I add a file with a specific field, the index keeps only part of the field rather than the entire field? For example, the field contains the text " VALUE TEXT FORMAT ", but in the index I want to keep only the text "TEXT". Thank you.
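In Solr, a rewrite like this would normally live in the field type's analyzer (for example a PatternReplaceCharFilterFactory whose replacement keeps a capture group). Analysis changes only the indexed terms; the stored value returned in results stays exactly as sent. The Python below only demonstrates the regex. The rule it encodes (keep what lies between a leading `VALUE` and a trailing `FORMAT`) is a hypothetical guess at the requirement, and `index_text` is a made-up helper.

```python
import re

# Hypothetical rule: keep only what sits between a leading "VALUE" and a trailing "FORMAT".
KEEP_MIDDLE = re.compile(r"^\s*VALUE\s+(.*?)\s+FORMAT\s*$")


def index_text(raw):
    """Return the part of the field value that should be indexed.

    Values that do not follow the expected layout are passed through unchanged.
    """
    m = KEEP_MIDDLE.match(raw)
    return m.group(1) if m else raw


print(index_text(" VALUE TEXT FORMAT "))  # TEXT
```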

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Mikhail Khludnev
Hello, I spotted the difference in the number of segments (4 vs 14). To me it explains the increased query time and CPU load, especially because you don't use filters via fq=, only q= in your queries. The first thing you need to do is make the segment chains the same length. The first clue is

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Simon Willnauer
I wonder whether you have explicitly configured a merge policy. In Solr 1.4 (i.e. Lucene 2.9) LogMergePolicy was the default, but in 3.5 TieredMergePolicy is used by default. This could explain the difference in segment counts, since from what I understand you are indexing the same data on 1.4 and 3.5? simon O

Re: Seek past EOF

2011-11-30 Thread Simon Willnauer
can you give us some details about what filesystem you are using? simon On Wed, Nov 30, 2011 at 3:07 PM, Ruben Chadien wrote: > Happened again…. > > I got 3 directories in my index dir > > 4096 Nov  4 09:31 index.2004083156 > 4096 Nov 21 10:04 index.2021090440 > 4096 Nov 30 14:55 index.2

Re: when using group=true facet numbers are "incorrect"

2011-11-30 Thread O. Klein
Yonik Seeley-2-2 wrote > > On Mon, Nov 7, 2011 at 8:55 PM, Chris Hostetter > wrote: >> >> : I understand that's a valid thing for faceting to do, I was just >> wondering >> : if there's any way to get it to do the faceting on the groups returned. >> : Otherwise I guess I'll need

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Chris Hostetter
: I tried to use index from 1.4 (load was the same as on index from 3.5) : but there was problem with synchronization with master (invalid : javabin format) : Then I built new index on 3.5 with luceneMatchVersion LUCENE_35 why would you need to re-replicate from the master? You already have a co

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Chris Hostetter
: I attach chart which presents cpu usage. Solr 3.5 uses almost all cpu : (left side of chart). FWIW: The mailing list software filters out most attachments (there are some exceptions for certain text mime types) -Hoss

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Pawel Rog
http://imageshack.us/photo/my-images/838/cpuusage.png/ On Wed, Nov 30, 2011 at 9:18 PM, Chris Hostetter wrote: > > : I attach chart which presents cpu usage. Solr 3.5 uses almost all cpu > : (left side of chart). > > FWIW: The mailing list software filters out most attachments (there are > some e

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Pawel Rog
On Wed, Nov 30, 2011 at 9:05 PM, Chris Hostetter wrote: > > : I tried to use index from 1.4 (load was the same as on index from 3.5) > : but there was problem with synchronization with master (invalid > : javabin format) > : Then I built new index on 3.5 with luceneMatchVersion LUCENE_35 > > why w

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Darren Govoni
Monitoring this thread makes me ask whether there are standardized performance benchmarks for Solr that are run and published with each new release. These would affirm its performance under known circumstances, which people could then try in their own environments and compa

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Yonik Seeley
On Wed, Nov 30, 2011 at 7:08 AM, Pawel Rog wrote: >        at > org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:702) >        at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1144) >        at > org.apache.solr.search.SolrIndexSearcher.s

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Pawel Rog
Yes, it works. Thanks a lot. But I still don't understand why that option was efficient in Solr 1.4 but not in Solr 3.5. On Wed, Nov 30, 2011 at 11:01 PM, Yonik Seeley wrote: > On Wed, Nov 30, 2011 at 7:08 AM, Pawel Rog wrote: >>        at >> org.apache.solr.search.SolrIndexSearcher.getDocSet(Solr

Blog you might find interesting

2011-11-30 Thread Erick Erickson
At the risk of committing a gaffe, I recently did a blog post about queries and "multi term aware" capabilities newly added to Solr. The short form is that the recurring problem of wildcard queries (and some other types, e.g. range) not automatically lower-casing (or accent folding or a few others)
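The recurring problem the post describes can be reproduced with a toy term list. If the analyzer lowercased terms at index time but the wildcard term is not analyzed at query time, a mixed-case wildcard finds nothing; applying the index-time normalization to the query term is roughly what the multi-term-aware handling is meant to provide. The sketch uses Python's `fnmatch.fnmatchcase` as a stand-in for Lucene wildcard matching (`*` and `?`; fnmatch also gives `[...]` a special meaning). The term list is made up.

```python
import fnmatch

# Terms as they would sit in the index after a lowercasing analyzer (made-up data).
INDEXED_TERMS = ["football", "foobar", "zebra"]


def wildcard_search(pattern, normalize_query=False):
    """Match a wildcard pattern against indexed terms, case-sensitively like a raw term lookup."""
    if normalize_query:
        # Apply the same normalization the analyzer applied at index time.
        pattern = pattern.lower()
    return [t for t in INDEXED_TERMS if fnmatch.fnmatchcase(t, pattern)]


print(wildcard_search("Foo*"))                        # []
print(wildcard_search("Foo*", normalize_query=True))  # ['football', 'foobar']
```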

is there a way using 1.4 index at 4.0 trunk?

2011-11-30 Thread Jason, Kim
Hello, I'm using Solr version 1.4. I want to use a plugin from the trunk version, but I get an IndexFormatTooOldException when I run the old-version index on trunk. Is there a way to use a 1.4 index with 4.0 trunk? Thanks, Jason

Re: is there a way using 1.4 index at 4.0 trunk?

2011-11-30 Thread Lance Norskog
No, you will have to upgrade your index. See the wiki for more information. (To my knowledge, you should be able to drop in your 1.4 (.1?) schema.xml and re-index.) On Wed, Nov 30, 2011 at 6:44 PM, Jason, Kim wrote: > Hello, > I'm using solr 1.4 version. > I want to use some plugin in trunk vers