I have now optimized the index - down to 325MB; it compresses down to 20MB.
I think the new replication feature in Solr is great, but if it could compress
the files it's sending, it would be an awful lot more useful when replicating,
as we are, between sites.
---
Hi,
Is the new replication feature based on HTTP requests between sites?
If yes, then I guess it might be possible to configure an HTTP server
with mod_deflate so the data is compressed on the fly.
C.
Simon Collins wrote:
I have now optimized the index - down to 325MB; it compresses down to 20MB.
Hi,
I'm also trying to stress test Solr, and I would love some advice on how to
manage it properly.
I'm using Solr 1.3 and Tomcat 5.5.
Thanks a lot,
zqzuk wrote:
>
> Hi, I am doing a stress testing of my solr application to see how many
> concurrent requests it can handle and how long it takes. But I m
Open a JIRA issue. We will use gzip on both ends of the pipe. On
the slave side you can say
true
as an extra option to compress and send data from the server.
--Noble
On Wed, Oct 29, 2008 at 3:06 PM, Simon Collins
<[EMAIL PROTECTED]> wrote:
> I have now optimized the index - down to 325MB; it compresses down to 20MB.
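For the curious, the idea is roughly the following - a minimal, hypothetical
sketch using the standard java.util.zip classes, not the actual patch (file
handling and buffer size are illustrative):

import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipPipe {
    // Master side: stream an index file through gzip into the response.
    static void send(File file, OutputStream rawOut) throws IOException {
        InputStream in = new FileInputStream(file);
        GZIPOutputStream out = new GZIPOutputStream(rawOut);
        byte[] buf = new byte[8192];
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
        }
        in.close();
        out.finish(); // write the gzip trailer without closing rawOut
    }

    // Slave side: unwrap the gzip stream back into an index file.
    static void receive(InputStream rawIn, File file) throws IOException {
        GZIPInputStream in = new GZIPInputStream(rawIn);
        OutputStream out = new FileOutputStream(file);
        byte[] buf = new byte[8192];
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
        }
        in.close();
        out.close();
    }
}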
Hi,
first, have a look at the firstSearcher and warming section of
http://wiki.apache.org/solr/SolrCaching. Search engines rely on caching, so
first searches will be slow. I think for fair testing it is necessary to
warm up the search engine by sending the most frequently used and/or most
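A rough SolrJ sketch of such a warm-up pass (the query list and server URL
here are made up - substitute your own most frequent production queries):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class WarmUp {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Replace with your own most frequent production queries.
        String[] frequent = { "ipod", "video", "title:engineer" };
        for (String q : frequent) {
            long start = System.currentTimeMillis();
            server.query(new SolrQuery(q));
            System.out.println(q + ": " + (System.currentTimeMillis() - start) + " ms");
        }
    }
}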
Hi!
Out of curiosity: How would one implement "search by example" with Solr?
What I mean:
Say I have a result entry with these fields/attributes:
id: 1
title: blue big slow car
color: blue
size: 30
maxspeed: 100
make: buses inc.
What would I have to do in order to find "similar" items? Do a se
Hi,
This can be done with 'more like this' functionality in Solr:
http://wiki.apache.org/solr/MoreLikeThis
Bye,
Jaco.
2008/10/29 Marian Steinbach <[EMAIL PROTECTED]>
> Hi!
>
> Out of curiosity: How would one implement "search by example" with Solr?
>
> What I mean:
>
> Say I have a result ent
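For example, a minimal SolrJ sketch against the standard request handler,
using the fields from the example above (server URL and MLT parameter values
are just illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MltExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Start from the example document and ask for similar ones.
        SolrQuery query = new SolrQuery("id:1");
        query.set("mlt", "true");
        query.set("mlt.fl", "title,color,make"); // fields to mine for interesting terms
        query.set("mlt.mintf", "1");             // min term frequency in the source doc
        query.set("mlt.mindf", "1");             // min document frequency in the index
        QueryResponse rsp = server.query(query);
        System.out.println(rsp.getResponse().get("moreLikeThis"));
    }
}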
Hi
Now I am using Solr with two different types of data indexed and
searched. For example:
1) JobRec
2) JobSel
I store the data by specifying type:JobRec and, similarly, type:JobSel
while indexing. If I want to retrieve the data, I get it by
querying with type:JobRec.
This works perfectly.
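A small SolrJ sketch of such a query - the title field here is hypothetical,
only the type filter comes from the setup above:

import org.apache.solr.client.solrj.SolrQuery;

public class TypeFilter {
    public static void main(String[] args) {
        // The type restriction goes into a filter query (fq), so it is
        // cached independently of the main query.
        SolrQuery query = new SolrQuery("title:engineer"); // hypothetical field
        query.addFilterQuery("type:JobRec");
        System.out.println(query); // q=title%3Aengineer&fq=type%3AJobRec
    }
}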
Do keep in mind that compression is a CPU intensive process, so it is a trade
off between CPU utilization and network bandwidth. I have seen cases where
compressing the data before a network transfer ended up being slower than
without compression because the cost of compression and un-compression wa
Hello,
I'm doing some experiments with the morelikethis functionality using the
standard request handler to see if it also works with distributed search (I
saw that it will not yet work with the MoreLikeThis handler,
https://issues.apache.org/jira/browse/SOLR-788). As far as I can see, this
also d
I don't really get httpstone. My ant dist works fine,
but then when I do a java -jar I get an error:
~/httpstone-read-only/dist/lib$ java -jar httpstone.jar
Failed to load Main-Class manifest attribute from httpstone.jar
Any idea? Sorry, I'm new to Java.
zqzuk wrote:
>
Why invent something when compression is standard in HTTP? --wunder
On 10/29/08 4:35 AM, "Noble Paul നോബിള് नोब्ळ्" <[EMAIL PROTECTED]>
wrote:
> open a JIRA issue. we will use a gzip on both ends of the pipe . On
> the slave side you can say
> true
> as an extra option to compress and
> send data fr
Awesome! Thanks for the pointer, I will check this out.
Marian
On Wed, Oct 29, 2008 at 1:52 PM, Jaco wrote:
> Hi,
>
> This can be done with 'more like this' functionality in Solr:
> http://wiki.apache.org/solr/MoreLikeThis
: As far as our application goes, Commits and reads are done to the index
: during the normal business hours. However, we observed the max warmers
: error happening during a nightly job when the only operation is 4
: parallel threads commits data to index and Optimizes it finally. We
: increas
Just a question about your httpstone configuration:
I would like to know how you simulated several word searches.
Did you create a lot of different workers with lots of different word
searches?
Thanks,
zqzuk wrote:
>
> Hi,
>
> try to firstly have a look at http://wiki.apache.o
I think you may be right; I've opened SOLR-830.
: We may have identified the root cause but wanted to run it by the community.
: We figure there is a bug in the snappuller shell script, line 181:
-Hoss
Hi,
I'm doing the following query:
q=text:abc AND type:typeA
And I ask to return highlighting (query.setHighlight(true);). The search
term for field "type" (typeA) is also highlighted in the "text" field.
Any way to avoid this?
Thanks
Christophe
: 1) solr-core artifact contains org.apache.solr.client.solrj packages, and at
: the same time, the solr-core artifact depends on the solr-solrj artifact.
what you are seeing isn't specific to the maven jars; that's the way it is
in the standard release.
i believe the inclusion of solrj code in
Yes, I created different workers, with the same or different queries.
Sorry, it has been a while since I used it; the link to its source code
(no jars) should be here:
http://code.google.com/p/httpstone/source/checkout
sunnyfr wrote:
>
> just a question about your httpstone's configuration
On Wed, Oct 29, 2008 at 9:11 PM, Chris Hostetter
<[EMAIL PROTECTED]>wrote:
>
> i believe the inclusion of solrj code in the core jar is intentional, the
> core jar is intended (as i understand it) to encapsulate everything needed
> to run "Solr" (and because of the built in distributed search feat
Depends on your use cases. Having things in one index will generally
make things easier in the long run, and generally shouldn't be a
bottleneck. However, if the two types will be treated very differently
it may make sense to have two cores - say one type is not changed very
often, while the ot
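A minimal SolrJ sketch of the two-core option (the core names and server URL
here are hypothetical):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class TwoCores {
    public static void main(String[] args) throws Exception {
        // One client per core; each core keeps its own index, caches,
        // and commit/optimize schedule.
        SolrServer jobRec = new CommonsHttpSolrServer("http://localhost:8983/solr/jobrec");
        SolrServer jobSel = new CommonsHttpSolrServer("http://localhost:8983/solr/jobsel");
        jobRec.ping();
        jobSel.ping();
    }
}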
: > On the main lucene web page: http://lucene.apache.org/index.html
: > There is a list of news items spanning all the lucene subprojects. Does
FYI: that news section is just a manually maintained list of items as
regular forrest content (forrest is the tool used to generate the site and
buil
christophe wrote:
Hi,
I'm doing the following query:
q=text:abc AND type:typeA
And I ask to return highlighting (query.setHighlight(true);). The
search term for field "type" (typeA) is also highlighted in the "text"
field.
Any way to avoid this?
Thanks
Christophe
I haven't really used solrj,
: > I'm not sure if there's any reason for solr-core to declare a maven
: > dependency on solr-solrj.
: When creating the POMs, I had (incorrectly) assumed that the core jar does
: not contain SolrJ classes, hence the dependency.
I consider it a totally justifiable assumption. the current packa
we are not doing anything non-standard.
GZIPInputStream/GZIPOutputStream are standard. But asking users to
set up an extra Apache server is not fair if we can manage it with, say, 5
lines of code
On Wed, Oct 29, 2008 at 7:44 PM, Walter Underwood
<[EMAIL PROTECTED]> wrote:
> Why invent something when compres
You propose to do compressed transfers over HTTP ignoring the standard
support for compressed transfers in HTTP. Programming that with a
library doesn't make it "standard".
In Ultraseek, we implemented index synchronization over HTTP with
compression. It wasn't that hard.
I doubt that compression
: I want to partition my index based on category information. Also, while
: indexing I want to store particular category data to corresponding index
: partition. In the same way I need to search for category information on
: corresponding partition.. I found some information on wiki link
: h
I am getting this error quite frequently on my Solr installation:
SEVERE: org.apache.solr.common.SolrException: Error opening new
searcher. exceeded limit of maxWarmingSearchers=8, try again later.
I've done some googling but the common explanation of it being related
to autocommit doesn't a
Have you looked at how long your warm up is taking?
If it's taking longer to warm up a searcher than it does for you to do
an update, you will be behind the curve and eventually run into this no
matter how big that number.
-Original Message-
From: news [mailto:[EMAIL PROTECTED] On Behalf
I'm having the same issue.. have you had any progress with this?
> I'm doing the following query:
> q=text:abc AND type:typeA
> And I ask to return highlighting (query.setHighlight(true);). The search
> term for field "type" (typeA) is also highlighted in the "text" field.
> Any way to avoid this?
Use setHighlightRequireFieldMatch(true) on the query object [1]
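For example, with SolrJ (the server URL is illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightMatch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("text:abc AND type:typeA");
        query.setHighlight(true);
        // Only highlight a term in the field it actually matched,
        // so "typeA" is not marked up inside the "text" field.
        query.setHighlightRequireFieldMatch(true);
        QueryResponse rsp = server.query(query);
        System.out.println(rsp.getHighlighting());
    }
}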
I was just looking at Mark Miller's Qsol parser for Lucene (
http://www.myhardshadow.com/qsol.php), and my users would really like to
have a similar ability to combine proximity and boolean search in arbitrary,
nested ways. The simplest use case I'm interested in is "phrase proximity",
where you sa
Hi -- using solr 1.3 -- roughly 11M docs on a 64 gig 8 core machine.
Fairly simple schema -- no large text fields, standard request
handler. 4 small facet fields.
The index is an event log -- a primary search/retrieval requirement is
date range queries.
A simple query without a date rang
Do you need to search down to the minutes and seconds level? If searching by
date provides sufficient granularity, for instance, you can normalize all
the time-of-day portions of the timestamps to midnight while indexing. (So
index any event happening on Oct 01, 2008 as 2008-10-01T00:00:00Z.) That
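A small Java sketch of that normalization using only standard java.util
classes (how you feed the result into your indexing code is up to you):

import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.TimeZone;

public class MidnightNormalizer {
    public static void main(String[] args) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        // Zero out the time-of-day portion before indexing.
        cal.set(Calendar.HOUR_OF_DAY, 0);
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        System.out.println(fmt.format(cal.getTime())); // e.g. 2008-10-01T00:00:00Z
    }
}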
Well, no - we don't care so much about the seconds, but hours &
minutes are indeed crucial.
---
Alok K. Dhir
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8080
[EMAIL PROTECTED]
On Oct 29, 2008, at 4:41 PM, Chris Harris wrote:
Do you need to search down to the minutes and seconds
Feak, Todd wrote:
Have you looked at how long your warm up is taking?
If it's taking longer to warm up a searcher than it does for you to do
an update, you will be behind the curve and eventually run into this no
matter how big that number.
Most of them say warmupTime=0. It ranges from 0 to 37. I hope that is
msec and not seconds!!
It strikes me that removing just the seconds could very well reduce
overhead to 1/60 of original. 30 second query turns into 500ms query.
Just a swag though.
-Todd
-Original Message-
From: Alok Dhir [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 29, 2008 1:48 PM
To: solr-user@lucene.
My understanding of Noble's comment (and i could be wrong, i'm reading
between the lines) is that if you specify the new setting he's suggesting
when initializing the replication handler on the slave, then the slave
should start using an "Accept-Encoding: gzip" header when querying the
master,
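In plain Java that negotiation would look roughly like this (the master URL
is hypothetical, and this is just the standard HTTP mechanism, not code from
the handler):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

public class GzipFetch {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://master:8983/solr/replication"); // hypothetical URL
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Advertise gzip support the standard HTTP way...
        conn.setRequestProperty("Accept-Encoding", "gzip");
        InputStream in = conn.getInputStream();
        // ...and unwrap only if the server actually compressed the body.
        if ("gzip".equals(conn.getContentEncoding())) {
            in = new GZIPInputStream(in);
        }
        in.close();
    }
}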
Chris Harris wrote:
I was just looking at Mark Miller's Qsol parser for Lucene (
http://www.myhardshadow.com/qsol.php), and my users would really like to
have a similar ability to combine proximity and boolean search in arbitrary,
nested ways. The simplest use case I'm interested in is "phrase pr
: The doc of HashDocSet says "it can be a better choice if there are few
: docs in the set". What does 'few' mean in this context?
it's relative to the total size of your index. if you have a million docs,
but you are dealing with DocSets that are only going to contain 10 docs,
then both the m
: Tomcat is using about 98mb memory, mysql is about 500mb. Tomcat
: completely freezes up - can't do anything other than restart the
: service.
a thread dump from the jvm running tomcat would probably be helpful in
figuring out what's going on
: timing out well before getting to the commit. As
I've also seen the suggestion (more from a pure Lucene perspective) of
breaking apart your dates. Remember that the time/space issues are due to
the number of terms. So it's possible (although I haven't tried it) to
index many fewer distinct terms, e.g. break your dates into some number of
fields,
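For example, a hypothetical SolrJ sketch - the field names are made up, the
point is just that each field has far fewer distinct terms than a single
full-resolution timestamp field:

import org.apache.solr.common.SolrInputDocument;

public class SplitDate {
    public static void main(String[] args) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("event_date", "2008-10-29"); // day resolution
        doc.addField("event_hour", 16);           // 0-23
        doc.addField("event_minute", 45);         // 0-59
        System.out.println(doc);
    }
}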
I saw a similar subject posted earlier. This is not a continuation of that
thread, but the problem is similar. I have a large, fast, dedicated machine,
that despite boosting various parameters in solrconfig.xml (attached) and in
the JVM, utilizes at most 10% of the cpu while importing: (from t
On Wed, Oct 29, 2008 at 9:48 PM, Barnett, Jeffrey
<[EMAIL PROTECTED]> wrote:
> Reported import rates start at 70 docs per second, and decrease as more
> records are added.
It might just be segment merges (that takes more time as segments grow in size).
From the solrconfig.xml I see you have autoc
Hoss,
You are partially right. Instead of the HTTP header, we use a request
parameter. (RequestHandlers cannot read HTTP headers.) If the param is
present, it wraps the response in a zip output stream. It is configured
in the slave because every slave may not want compression. Slaves
which are nea
: You are partially right. Instead of the HTTP header, we use a request
: parameter. (RequestHandlers cannot read HTTP headers.) If the param is
hmmm, i'm with walter: we shouldn't invent new mechanisms for
clients to request compression over HTTP from servers.
replication is both special enoug
I thought it was turned off already. (Lucene vs Solr?) Where do I make this
change?
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Wednesday, October 29, 2008 11:28 PM
To: solr-user@lucene.apache.org
Subject: Re: where's the bottlen
On Thu, Oct 30, 2008 at 2:46 AM, Jon Drukman <[EMAIL PROTECTED]> wrote:
>
> Most of them say warmupTime=0. It ranges from 0 to 37. I hope that is
> msec and not seconds!!
>
Correct, that is in milliseconds.
--
Regards,
Shalin Shekhar Mangar.
Hi,
I've been running solr 1.3 on an ec2 instance for a couple of weeks and I've
had some stability issues. It seems like I need to bounce the app once a day.
That I could live with and ultimately maybe troubleshoot, but what's more
disturbing is that three times in the last 2 weeks my index ha