You seem to be desperate to get out of the Solr mailing list :)
Send an email to solr-user-unsubscr...@lucene.apache.org
Cheers
Avlesh
On Fri, Sep 25, 2009 at 11:54 AM, Rafeek Raja wrote:
> Unsubscribe from this mailing-list
>
I am new to the whole highlighting API and have a few basic questions:
I have a "text" type field defined as underneath:
And the schema field is assoc
Hi Grant!
Thanks for the advice, I added the link to the list.
Regards,
Marian
On Fri, Sep 25, 2009 at 5:14 AM, Grant Ingersoll wrote:
> Hi Marian,
>
> Looks great! Wish I could order some wine. When you get a chance, please
> add the site to http://wiki.apache.org/solr/PublicServers!
>
>
Hello
I can't get HTMLStripStandardTokenizerFactory to remove the content of the
style tag, as the documentation says it should.
A search for 'mso' returns a document where the search term only appears in the
style tag (it's a Word document saved as HTML). Here is the highlight returned
by s
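A minimal test document for this would be something like the following (a
made-up sketch; the only occurrence of 'mso' is inside the style tag):

  <html>
    <head>
      <style type="text/css">
        p.MsoNormal { mso-style-parent: ""; }
      </style>
    </head>
    <body>Some visible body text without the term.</body>
  </html>

If the style content were stripped as documented, a search for 'mso' should not
match (or highlight) this document.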
Hi,
I want to index both the contents of a document/file and metadata associated
with that document. Since I also want to update the content and metadata
indexes independently, I believe that I need to use two separate Solr
documents per real/logical document. The question I have is how do I merg
I got the answer to my question.
The field needs to be "stored" (or "termVector" enabled) for highlighting to
work properly.
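As a sketch, a schema.xml declaration that highlighting can work against might
look like this (field name is hypothetical; stored="true" is the essential part,
the termVector* attributes only make highlighting faster):

  <field name="content" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>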
Cheers
Avlesh
On Fri, Sep 25, 2009 at 1:01 PM, Avlesh Singh wrote:
> I am new to the whole highlighting API and have a few basic questions:
> I have a "text" type field d
Hi solr addicts,
I know there's no one-size-fits-all set of options for the Sun JVM,
but I think it'd be useful for everyone to share their tips on using the
Sun JVM with solr.
For instance, I recently figured out that setting the tenured
generation garbage collection to Concurrent mark and sweep (
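For example, the switch being described presumably amounts to JVM options along
these lines (a sketch; exact flags vary by JVM version):

  JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"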
Can I expect the index to be left in a usable state after an out of
memory error during a merge or is it most likely to be corrupt? I'd
really hate to have to start this index build again from square one.
Thanks.
Thanks,
Phil
---
Exception in thread "http-8080-Processor2505"
java.lang.
Hello everybody,
we are using Solr to index some RSS feeds for a news aggregator application.
We've got some difficulties with the publication date of each item
because each site uses a homemade date format.
The fact is that we want to have the exact amount of time between the
date of publicati
Hi Ken,
I am using the WordDelimiterFilterFactory. I thought I needed it because I
thought that's what gave me control over how the words are split and
indexed. I did try taking it out completely, but that didn't seem to
help.
I'll try the analysis tool today. There has got t
Hi to all!
Lately my solr servers seem to stop responding once in a while. I'm using
solr 1.3.
Of course I'm having more traffic on the servers.
So I logged the Garbage Collection activity to check if it's because of
that. It seems like 11% of the time the application runs, it is stopped
because of
On Fri, Sep 25, 2009 at 9:30 AM, Jonathan Ariel wrote:
> Hi to all!
> Lately my solr servers seem to stop responding once in a while. I'm using
> solr 1.3.
> Of course I'm having more traffic on the servers.
> So I logged the Garbage Collection activity to check if it's because of
> that. It seems
In case it helps, here's what I have currently, but I've been messing with
different options:
-Original Message-
From: Carr, Adrian [mailto:adrian.c...@jtv.com]
Sent: Friday, September 25, 2009 9:28 AM
To: solr-user@lucene.apache.org
Subject: RE: Alphanumeric Wild Card Search Questio
On Fri, Sep 25, 2009 at 8:20 AM, Phillip Farber wrote:
> Can I expect the index to be left in a usable state after an out of memory
> error during a merge or is it most likely to be corrupt?
It should be in the state it was after the last successful commit.
-Yonik
http://www.lucidimagination.co
Are you storing (in addition to indexing) your data? Perhaps you could turn
off storage on data older than 7 days (requires reindexing), thus losing the
ability to return snippets but cutting down on your storage space and server
count. I've experienced a 10x decrease in space requirements and a la
Thank you Grant and Lance for your comments -- I've run into a separate snag
which puts this on hold for a bit, but I'll return to finish digging into
this and post my results. - Michael
On Thu, Sep 24, 2009 at 9:23 PM, Lance Norskog wrote:
> Are you on Java 5, 6 or 7? Each release sees some twea
> No- there are various analyzers. StandardAnalyzer is geared toward
> searching bodies of text for interesting words - punctuation is
> ripped out. Other analyzers are more useful for "concrete" text. You
> may have to work at finding one that leaves punctuation in.
>
My problem is not with the
I'm trying to perform a faceted query with the facet field referencing a
field that is not in the schema but matches a dynamicField with its suffix.
The query returns results but for some reason the facet list is always
empty. When I change the facet field to one that is explicitly named in the
On Sep 25, 2009, at 7:30 AM, Jérôme Etévé wrote:
Hi solr addicts,
I know there's no one-size-fits-all set of options for the Sun JVM,
but I think it'd be useful for everyone to share their tips on using the
Sun JVM with solr.
For instance, I recently figured out that setting the tenured
generat
Also, here is the field definition in the schema
Right, now I'm giving it 12GB of heap memory.
If I give it less (10GB) it throws the following exception:
Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
at
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCache
Hi,
Have you looked at tuning the garbage collection?
Take a look at the following articles
http://www.lucidimagination.com/blog/2009/09/19/java-garbage-collection-boot-camp-draft/
http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html
Changing to the concurrent or throughput collector shoul
I've got the start of a Garbage Collection article here:
http://www.lucidimagination.com/blog/2009/09/19/java-garbage-collection-boot-camp-draft/
I plan to tie it more into Lucene/Solr and add some more about the
theory/methods in the final version.
With so much RAM, I take it you prob have a han
Is there any value for the "f.my_year_facet.facet.sort" parameter that
will return the facet values in descending order? So far I only see
"index" and "count" as the choices.
http://lucene.apache.org/solr/api/org/apache/solr/common/params/FacetParams.html#FACET_SORT_INDEX
Thanks.
Gerald Sn
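For reference, the two supported values can also be set per field, roughly like
this (using the field name from the question): "count" orders facet values by
descending count, "index" orders them lexicographically (ascending only):

  ...&facet=true&facet.field=my_year_facet&f.my_year_facet.facet.sort=count
  ...&facet=true&facet.field=my_year_facet&f.my_year_facet.facet.sort=index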
> Bigger heaps lead to bigger GC pauses in general.
Opposite viewpoint:
1sec GC happening once an hour is MUCH BETTER than 30ms GC once-per-second.
To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!)
Use -server option.
-server option of JVM is 'native CPU code', I remember WebLogic
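Putting those suggestions together, a startup would look roughly like this (heap
sizes are just the example values from this message; adapt to your container):

  java -server -Xms4096m -Xmx4096m -jar start.jar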
Give it even more memory.
Lucene FieldCache is used to store non-tokenized single-value non-boolean
(DocumentId -> FieldValue) pairs, and it is used (in-full!) for instance for
sorting query results.
So that if you have 100,000,000 documents with specific heavily distributed
field values (cardina
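For example, a sorted query like the one below is what loads the FieldCache for
that field (field name is hypothetical; the field must be indexed,
single-valued and non-tokenized):

  q=*:*&sort=price desc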
You are saying that I should give more memory than 12GB?
When I was at 10GB I had the exceptions that I sent. Switching to 12GB
made them disappear.
So I think I don't have problems with FieldCache anymore. What does seem
like a problem is the 11% of application time dedicated to GC. Especially
wh
> You are saying that I should give more memory than 12GB?
Yes. Look at this:
> > SEVERE: java.lang.OutOfMemoryError: Java heap space
>
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:3
> 61
> > )
It can't find a few (!!!) contiguous bytes for .createValue(...)
It ca
Yes - more RAM is not a solution to your problem.
Jonathan Ariel wrote:
> You are saying that I should give more memory than 12GB?
> When I was with 10GB I had the exceptions that I sent. Switching to 12GB
> made them disappear.
> So I think I don't have problems with FieldCache any more. What it
Faceting, as of now, can only be done on explicit field names. Faceting on
field names matching wildcards (dynamic fields being one such scenario) is
yet to be supported. There are a lot of open issues aiming to achieve this.
Find a similar discussion here -
http://www.lucidimagination.com/search/d
I would look at the JVM. Have you tried switching to the concurrent low
pause collector?
Colin.
-Original Message-
From: Jonathan Ariel [mailto:ionat...@gmail.com]
Sent: Friday, September 25, 2009 12:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr and Garbage Collection
You ar
I can't really understand how increasing the heap will decrease the
11% dedicated to GC
On 9/25/09, Fuad Efendi wrote:
>> You are saying that I should give more memory than 12GB?
>
>
> Yes. Look at this:
>
>> > SEVERE: java.lang.OutOfMemoryError: Java heap space
>>
> org.apache.lucene.search.Fiel
BTW why would making them equal lower the frequency of GC?
On 9/25/09, Fuad Efendi wrote:
>> Bigger heaps lead to bigger GC pauses in general.
>
> Opposite viewpoint:
> 1sec GC happening once an hour is MUCH BETTER than 30ms GC once-per-second.
>
> To lower frequency of GC: -Xms4096m -Xmx4096m (ma
It won't really - it will just keep the JVM from wasting time resizing
the heap on you. Since you know you need so much RAM anyway, no reason
not to just pin it at what you need.
Not going to help you much with GC though.
Jonathan Ariel wrote:
> BTW why making them equal will lower the frequency o
On Fri, Sep 25, 2009 at 12:19 PM, Avlesh Singh wrote:
> Faceting, as of now, can only be done on explicit field names.
To further clarify, the fields you can facet on can include those
defined by dynamic fields. You just must specify the exact field name
when you facet.
Did you reall
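In other words, with made-up names: if schema.xml declares

  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>

and documents contain a field called color_s, faceting works by naming that
concrete field:

  ...&facet=true&facet.field=color_s

whereas something like facet.field=*_s is not supported.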
>-server option of JVM is 'native CPU code', I remember WebLogic 7 console
>with SUN JVM 1.3 not showing any GC (just horizontal line).
Not sure what that is all about either. -server and -client are just two
different versions of hotspot.
The -server version is optimized for long running applicat
30ms is not better or worse than 1s until you look at the service
requirements. For many applications, it is worth dedicating 10% of your
processing time to GC if that makes the worst-case pause short.
On the other hand, my experience with the IBM JVM was that the maximum query
rate was 2-3X bette
markrmiller wrote:
>
> michael8 wrote:
>> Hi,
>>
>> I know Solr 1.4 is going to be released any day now pending Lucene 2.9
>> release. Is there anywhere one can download a pre-release nightly
>> build of Solr 1.4 just for getting familiar with new features (e.g. field
>> collapsing)?
>>
Walter Underwood wrote:
> 30ms is not better or worse than 1s until you look at the service
> requirements. For many applications, it is worth dedicating 10% of your
> processing time to GC if that makes the worst-case pause short.
>
> On the other hand, my experience with the IBM JVM was that the
michael8 wrote:
>
> markrmiller wrote:
>
>> michael8 wrote:
>>
>>> Hi,
>>>
>>> I know Solr 1.4 is going to be released any day now pending Lucene 2.9
>>> release. Is there anywhere one can download a pre-release nightly
>>> build of Solr 1.4 just for getting familiar with new feature
hello *, I read on the wiki about using recip(rord(...)...) to boost
recent documents with a date field. Does anyone have a good function
for doing something similar with unix timestamps?
if not, is there a lot of overhead related to counting the number of
distinct values for rord() ?
thx much
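For reference, the wiki example referred to above is typically used as a dismax
boost function, roughly like this (field name is made up, and this assumes a
dismax handler registered as "dismax" in solrconfig.xml); whether it can be
applied directly to a raw unix-timestamp field is the open question here:

  ...&qt=dismax&q=foo&bf=recip(rord(created),1,1000,1000)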
As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low
pause" collector is only in the Sun JVM.
I just found this excellent article about the various IBM GC options for a
Lucene application with a 100GB heap:
http://www.nearinfinity.com/blogs/aaron_mccurry/tuning_the_ibm_jvm_for
Ok. I will try with the "concurrent low pause" collector and let you know
the results.
On Fri, Sep 25, 2009 at 2:23 PM, Walter Underwood wrote:
> As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low
> pause" collector is only in the Sun JVM.
>
> I just found this excellent arti
My bad - later, it looks as if you're giving general advice, and that's
what I took issue with.
Any Collector that is not doing generational collection is essentially
from the dark ages and shouldn't be used.
Any Collector that doesn't have concurrent options, unless possibly you're
running a tiny app
Y'all,
We're down to 8 open issues:
https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310230&versionId=12313351&showOpenIssuesOnly=true
2 are packaging related, one is dependent on the official 2.9 release
(so should be taken care of today or tomorrow I suspect) and then we
hav
For batch-oriented computing, like Hadoop, the most efficient GC is probably
a non-concurrent, non-generational GC. I doubt that there are many
batch-oriented applications of Solr, though.
The rest of the advice is intended to be general and it sounds like we agree
about sizing. If the nursery is
Hi
I am trying to use a custom transformer that extends
org.apache.solr.handler.dataimport.Transformer.
I have the CustomTransformer.jar and DataImportHandler.jar in
JBOSS/server/default/lib. I have the solr.war (as is from the distro) in
the JBOSS/server/default/deploy.
org.apache.solr.handler
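For context, a custom transformer is referenced from data-config.xml via the
entity's transformer attribute, roughly as below (entity and class names are
made up); the class then has to be visible to the webapp's classloader, which
is what placing the jars under JBOSS/server/default/lib is trying to achieve:

  <entity name="item" query="select * from item"
          transformer="com.example.CustomTransformer">
    ...
  </entity>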
Walter Underwood wrote:
> For batch-oriented computing, like Hadoop, the most efficient GC is probably
> a non-concurrent, non-generational GC.
Okay - for batch we somewhat agree I guess - if you can stand any length
of pausing, non-concurrent can be nice, because you don't pay for thread
sync com
This all applies to having more than one processor though - if you have
one processor, then non-concurrent can also make sense.
But especially with the young space, you want concurrency - with up to
98% of objects being short-lived, and multiple threads generally
creating new objects, it's a huge b
Fuad, you didn't read the thread right.
He is not having a problem with OOM. He got the OOM because he lowered
the heap to try and help GC.
He normally runs with a heap that can handle his FC.
Please re-read the thread. You are confusing the thread.
- Mark
Fuad Efendi wrote:
> Guys, thanks for
> He is not having a problem with OOM. He got the OOM because he lowered
> the heap to try and help GC.
That is very confusing!!!
Lowering heap helps GC? Someone mentioned it in this thread, but my
viewpoint is completely opposite.
1. Some RAM is needed to_be_reserved for FieldCache (it will be
Guys, thanks for the GC discussion; but the root of the problem is FieldCache
internals.
Not enough RAM for FieldCache will cause unpredictable OOM, and it does not
depend on GC. How much RAM does FieldCache need in the case of 2 different
values for a field, 200 bytes each (Unicode), and 100M documents? Wh
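A rough back-of-envelope answer, assuming Lucene's StringIndex representation
(one int ord per document plus one entry per distinct value):

  100,000,000 docs x 4 bytes/ord   ~= 400 MB for the ord array
  2 distinct values x ~200 bytes   ~= negligible
  total for such a field           ~= 400 MB

i.e. the cost is driven by document count, not by the size or number of the
distinct values.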
Mark,
what if a piece of code needs 10 contiguous KB to load a document field? How
are locked memory pieces optimized/moved (putting almost the whole
application on hold)?
Lowering the heap is a _bad_ idea; we will have extremely frequent GC
(optimization of live objects!!!) even if RAM is (theoretically) enough.
Hello all,
We are using the patch from SOLR-64 (http://issues.apache.org/jira/browse/SOLR-64
) to implement hierarchical facets for categories. We are trying to
use the facet.prefix to prevent all categories from coming back.
However, f.category.facet.prefix doesn't work. Using facet.prefix
I'm not planning on lowering the heap. I just want to lower the time
"wasted" on GC, which is 11% right now.So what I'll try is changing the GC
to -XX:+UseConcMarkSweepGC
On Fri, Sep 25, 2009 at 4:17 PM, Fuad Efendi wrote:
> Mark,
>
> what if piece of code needs 10 contiguous Kb to load a docume
Sorry for the long delay in responding, but I've just gotten back to
this problem...
I got the solr 1.4 nightly and the problem went away, so I guess it is a
solr 1.3 bug.
Thanks for all the input!
Lance Norskog wrote:
Paul, can you create an HTTP url that does this exact query? With
multip
But again, GC is not just "Garbage Collection" as many in this thread
think... it is also "memory defragmentation", which is much more costly than
"collection" just because it needs to move _live_objects_ somewhere (and
wait/lock until such objects get unlocked to be moved...) - obviously more
memory helps..
On Fri, Sep 25, 2009 at 2:52 PM, Fuad Efendi wrote:
> Lowering heap helps GC?
Yes. In general, lowering the heap can help or hurt.
Hurt: if one is running very low on memory, GC will be working harder
all of the time trying to find more memory and the % of time that GC
takes can go up.
Help: i
Maybe what's missing here is how I got the 11%. I just ran solr with the
following JVM params: -XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime. With those I can measure the amount of
time the application runs between collection pauses and the length of the
collection pauses
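For anyone wanting to reproduce the measurement: only the two Print flags come
from this message, the rest of the command line is a placeholder, and the
percentage is simply stopped time over total time:

  java -Xmx12g -XX:+PrintGCApplicationConcurrentTime \
       -XX:+PrintGCApplicationStoppedTime ...

  % time in GC ~= sum(stopped) / (sum(stopped) + sum(concurrent))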
Hi Michael,
We are storing all our data in addition to index, as we need to display those
values to the user. So unfortunately we cannot go with the option stored=false,
which could have potentially solved our issue.
Appreciate any other pointers/suggestions
Thanks,
sS
--- On Fri, 9/25/09, Mi
When we talk about Collectors, we are not just talking about
"collecting" - whatever that means. There isn't really a "collecting"
phase - the whole algorithm is garbage collecting - hence calling the
different implementations "collectors".
Usually, fragmentation is dealt with using a mark-compact
I already have a handful of solr instances running. However, I'm
trying to install solr (1.4) on a new linux server with tomcat using a
context file (same way I usually do):
However it throws an exception due to the following:
SEVERE: Could not start SOLR. Check solr/home propert
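For comparison, a typical Tomcat context file for this setup looks roughly like
the one below (paths are placeholders); the exception above usually means the
solr/home value does not point at a directory containing conf/solrconfig.xml:

  <Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String"
                 value="/opt/solr/home" override="true"/>
  </Context>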
Ok. I'll first change the GC and see if the time spent decreases. Then
I'll try increasing the heap as Fuad recommends.
On 9/25/09, Mark Miller wrote:
> When we talk about Collectors, we are not just talking about
> "collecting" - whatever that means. There isn't really a "collecting"
> phase - t
>> or IBM has used a mark-sweep-compact collector
Never mind - Sun's is also sometimes referred to as mark-sweep-compact.
I've just seen it referred to as mark-compact before as well. In either
case though, without some sort of sweep phase, there is no reclamation
of memory :)
It's interesting th
On Sep 25, 2009, at 9:30 AM, Jonathan Ariel wrote:
Hi to all!
Lately my solr servers seem to stop responding once in a while. I'm
using
solr 1.3.
Of course I'm having more traffic on the servers.
So I logged the Garbage Collection activity to check if it's because
of
that. It seems like 11
> Usually, fragmentation is dealt with using a mark-compact collector (or
> IBM has used a mark-sweep-compact collector).
> Copying collectors are not only super efficient at collecting young
> spaces, but they are also great for fragmentation - when you copy
> everything to the new space, you can
Jonathan Ariel wrote:
> How can I check which GC is being used? If I'm right, JVM
> Ergonomics should use the Throughput GC, but I'm not 100% sure. Do you have
> any recommendation on this?
>
>
Just to straighten out this one too - Ergonomics doesn't use throughput
- throughput is
Mark Miller wrote:
> Jonathan Ariel wrote:
>
>> How can I check which GC is being used? If I'm right, JVM
>> Ergonomics should use the Throughput GC, but I'm not 100% sure. Do you have
>> any recommendation on this?
>>
>>
>>
> Just to straighten out this one too - Ergonomic
That's a good point too - if you can reduce your need for such a large
heap, by all means, do so.
However, considering you already need at least 10GB or you get OOM, you
have a long way to go with that approach. Good luck :)
How many docs do you have? I'm guessing it's mostly FieldCache type
stuff
One more point and I'll stop - I've hit my email quota for the day ;)
While it's a pain to have to juggle GC params and tune - when you require
a heap that's more than a gig or two, I personally believe it's essential
to do so for good performance. The (default settings / ergonomics with
throughput)
I have around 8M documents.
I set up my server to use a different collector and it seems like it
decreased from 11% to 4%, of course I need to wait a bit more because it is
just a 1 hour old log. But it seems like it is much better now.
I will tell you on Monday the results :)
On Fri, Sep 25, 2009
Sorry for OFF-topic:
Create a dummy "Hello, World!" JSP, use Tomcat, execute load-stress
simulator(s) from separate machine(s), and measure... don't forget to
allocate necessary thread pools in Tomcat (if you have to)...
Although such a JSP doesn't use any memory, you will see how easy one can go
with
Can you give a small test file that demonstrates the problem?
-Yonik
http://www.lucidimagination.com
On Fri, Sep 25, 2009 at 5:34 AM, Kundig, Andreas
wrote:
> Hello
>
> I can't bring HTMLStripStandardTokenizerFactory to remove the content of the
> style tag, as the documentation says it shoul
Your indexing project is disk-bound. My modern midrange laptop gets
30MB/s doing "cat > /dev/null" (1 7200rpm disk). The Amazon instances
I'm playing with get 50-60 (I really want to know how it fits
together). Your laptop might be 10-20?
On Thu, Sep 24, 2009 at 11:54 PM, Constantijn Visinescu
wr
Have you seen this? It is another Solr/Typo3 integration project.
http://forge.typo3.org/projects/show/extension-solr
Would you consider open-sourcing your Solr/Typo3 integration?
On Fri, Sep 25, 2009 at 1:18 AM, Marian Steinbach
wrote:
> Hi Grant!
>
> Thanks for the advice, I added the link
Hello,
It looks like solr is not allowing me to change the default
MergePolicy/Scheduler classes.
Even if I change the default MergePolicy/
Scheduler (LogByteSizeMergePolicy and ConcurrentMergeScheduler) defined
in solrconfig.xml to a different one (LogDocMergePolicy and
SerialMergeSchedu
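For reference, in the 1.3/1.4-era solrconfig.xml these are set inside
<indexDefaults> (and/or <mainIndex>), roughly as follows (a sketch; double-check
the exact element syntax for your version):

  <indexDefaults>
    ...
    <mergePolicy>org.apache.lucene.index.LogDocMergePolicy</mergePolicy>
    <mergeScheduler>org.apache.lucene.index.SerialMergeScheduler</mergeScheduler>
  </indexDefaults>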
The DisMax parser essentially creates a set of queries against
different fields. These queries are analyzed as per each field.
I think this is what you are talking about - "The" in a movie title is
different from "the" in the movie description. Would you expect "The
Sound Of Music" to fetch every mov
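For example (field names made up), a dismax request spreading the query across
a title and a description field would look like:

  q=The Sound Of Music&qt=dismax&qf=title^2.0 description

and whether "The" matches then depends on the analyzer (e.g. stopword
filtering) configured for each of those fields.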
Hi Nasseam,
I think the per-field parameter for facet.prefix should work
on hierarchical facet fields, from briefly looking at the patch.
And I can get same facet results by:
&facet=on&facet.field=hiefacet&facet.prefix=A/B/
and
&facet=on&facet.field=hiefacet&f.hiefacet.facet.prefix=A/B/
when us