Thank you again.
Unfortunately the index files will not fit in RAM. I have to try using the
document cache. I am also moving my index to SSD again; we took our index
off when the Fusion-io cards failed twice during indexing and the index was
corrupted. Now with the BIOS upgrade and new driver, it is suppose
Thanks, Kai!
About removing non-nouns: the OpenNLP patch includes two simple
TokenFilters for manipulating terms with payloads. The
FilterPayloadFilter lets you keep or remove terms with given payloads.
In the demo schema.xml, there is an example type that keeps only
nouns and verbs.
There is a
It is possible to do this with IP Multicast. The query goes out on the
multicast and all query servers read it. The servers wait for a random
amount of time, then transmit the answer. Here's the trick: it's
multicast. All of the query servers listen to each other's responses,
and drop out when
For what it's worth, Google has done some pretty interesting research into
coping with the idea that particular shards might very well be busy doing
something else when your query comes in.
Check out this slide deck: http://research.google.com/people/jeff/latency.html
Lots of interesting ideas,
On 1/31/2013 3:21 PM, Michael Della Bitta wrote:
I do notice that it seems like the version of Jetty that ships with
Solr isn't the preferred one according to the wiki, so that would be
an extra dependency for a config management system like Chef.
Near as I can tell, the versions of jetty that
Thanks for confirming my suspicions. The custom
TokenLengthMarkerFilterFactory sounds like the best approach for doing this.
On Thu, Jan 31, 2013 at 5:12 PM, Jan Høydahl wrote:
> Hi,
>
> I believe each stemmer implementation decides that themselves. At least
> the MinimalNorwegianStemmer has a
That's surprising to me, mostly because a number of the Solr wiki
pages don't really make that strong of a case for it:
http://wiki.apache.org/solr/SolrInstall
http://wiki.apache.org/solr/SolrTomcat
http://wiki.apache.org/solr/SolrJetty
Would it make sense to spell that out somewhere?
I do notic
Hi,
I believe each stemmer implementation decides that itself. At least the
MinimalNorwegianStemmer has built-in logic that stems certain suffixes only
if the token is >N chars.
If you want external control, you can look at
http://wiki.apache.org/solr/LanguageAnalysis#Customizing_Stemmi
On Jan 31, 2013, at 10:15 AM, Michael Della Bitta
wrote:
> I'd really like some confirmation from the devs that there really is a
> blessed status for a given container that provides advantages over
> others.
IMO: Jetty is what all of our unit/integration tests are run in, Jetty is what
we co
The ping handler is how we tell our load balancers that our Solr cores
are healthy. I guess if you're running more than one core behind the
same balancer, it would make sense to drop a webapp in there that ran
the ping queries for all your cores and only responded OK if they all
came back OK.
Or i
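A minimal sketch of the "one check for all cores" idea above, assuming each core exposes the standard ping handler at `<base_url>/<core>/admin/ping`. The base URL, the core names, and the helper names `ping_core` and `all_cores_healthy` are hypothetical, and this is a standalone script rather than a webapp dropped into the container:

```python
from typing import Callable, Sequence
from urllib.error import URLError
from urllib.request import urlopen


def ping_core(base_url: str, core: str, timeout: float = 2.0) -> bool:
    """Return True only if the core's ping handler answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/{core}/admin/ping"
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: unhealthy.
        return False


def all_cores_healthy(cores: Sequence[str], ping: Callable[[str], bool]) -> bool:
    """Report OK only when there is at least one core and every core pings OK."""
    return bool(cores) and all(ping(core) for core in cores)
```

A balancer-facing endpoint could then call something like `all_cores_healthy(["core0", "core1"], lambda c: ping_core("http://localhost:8983/solr", c))` and answer OK only when it returns True.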
Shawn Heisey [s...@elyograg.org] wrote:
[...]
> If you have a total index size for this JVM of 240GB, then you may not
> have enough RAM to let the OS disk cache work efficiently. For that
> size of index, I would plan on a system with at least 128GB of RAM,
> 256GB would be better.
[...]
> On
Sorry about that - even if I switch the splitBy to "," it still
doesn't work. Here's the corrected unit test:
http://pastie.org/5995399
On Thu, Jan 31, 2013 at 12:30 PM, Dyer, James
wrote:
> In your unit test, you have:
>
> "" +
>
> And also:
>
> runner.update("INSERT INTO test VALUES 1, 'foo,bar
On 1/31/2013 12:47 PM, Mou wrote:
To clarify, the third shard is used to store the recently added/updated
data. The two main big cores take very long to replicate (when a full
replication is required), so the third one helps us to return the newly
indexed documents quickly. It gets deleted every hour
In your unit test, you have:
"" +
And also:
runner.update("INSERT INTO test VALUES 1, 'foo,bar,baz'");
So you need to decide if you want to delimit with a pipe or a comma.
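To illustrate the mismatch, here is a small Python analogue (not DIH itself), assuming splitBy is applied as a regular expression to the column value. The helper name `split_by` is hypothetical:

```python
import re
from typing import List


def split_by(value: str, pattern: str) -> List[str]:
    """Split a column value on a regex delimiter, as a splitBy-style step would."""
    return re.split(pattern, value)


row_value = "foo,bar,baz"

# Splitting on a pipe finds no delimiter, so the value stays in one piece.
print(split_by(row_value, r"\|"))

# Splitting on a comma matches the data that was actually inserted.
print(split_by(row_value, ","))
```

Note that a pipe is a regex metacharacter, so under this assumption it has to be escaped when it is meant as a literal delimiter.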
James Dyer
Ingram Content Group
(615) 213-4311
-----Original Message-----
From: Christopher Condit [mailto:con...@sdsc.e
I'm having an issue getting the splitBy construct from the regex
transformer to work in a very basic case (with either Solr 3.6 or
4.1).
I have a field defined like this:
The entity is defined like this:
Here's a POM:
http://pastie.org/5992725
A JUnit test case showing the problem:
http:/
Thank you Shawn for reading all of my previous entries and for a detailed
answer.
To clarify, the third shard is used to store the recently added/updated
data. The two main big cores take very long to replicate (when a full
replication is required), so the third one helps us to return the newly
indexe
On 1/31/2013 1:01 AM, Mou wrote:
I am running Solr 3.4 on Tomcat 7.
Our index is very big: two cores, each 120G. We are searching the slaves,
which are replicated every 30 min.
I am using the filterCache only, and we have more than 90% cache hits. We use
a lot of filter queries; queries are usually pr
Thanks for the quick reply. It seems you are suggesting adding an explicit
AND operator. I don't think this solves my problem.
I found it somewhere, and this
works.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Search-match-all-tokens-in-Query-Text-tp4037758p4037762.h
+text:a +b
-- Jack Krupansky
-----Original Message-----
From: Bing Hua
Sent: Thursday, January 31, 2013 12:59 PM
To: solr-user@lucene.apache.org
Subject: Search match all tokens in Query Text
Hello,
I have a field text with type text_general here.
When I query for text:a b, so
Hello,
I have a field text with type text_general here.
Thanks Shawn. Actually now that I think about it, Yonik also mentioned
something about lucene number representation once in reply to one of my
questions. Here it is:
Could you also tell me what these `#8;#0;#0;#0;#1; strings represent in the
debug output?
"That's internally how a number is e
UIMA:
I just found this issue https://issues.apache.org/jira/browse/SOLR-3013
Now I am able to use this analyzer for English texts and filter (un)wanted
token types :-)
Open issue -> How to set the ModelFile for the Tagger to
"german/TuebaModel.dat" ???
OpenNLP:
And a mod
>
> We have a Chef regime here, and I've written Tomcat and Solr recipes
> to be played against Ubuntu 12.04 Server.
We do mostly the same: chef to install Tomcat (with configuration
appropriate to Solr), but then instead of deploying Solr via chef, we use
an ant script to package and deploy a wa
Jack, thanks for your response.
We have a deals web application with free-text search in it. Here,
free text
means you can type anything in it.
We have deals in different categories, tagged at different
merchant locations.
As per the requirement, I have to do some tweaks in search
Are you using eDismax? Maybe your ID field is not part of the search fields
or not a high priority. And, just maybe, you are doing a copyField * to
text and the text splits the ID into parts. Enable the debug on your query
and you should be able to figure it out.
Regards,
Alex.
Personal blog:
Thanks for your reply.
No, there is no eviction, yet.
The time is spent mostly on org.apache.solr.handler.component.QueryComponent
to process the request.
Again, the time varies widely for the same query.
--
View this message in context:
http://lucene.472066.n3.nabble.com/long-QTime-for-big-inde
Fantastic! Thanks very much.. I will do so accordingly and will let you
know the results.
Thanks again,
Sandeep
On 31 January 2013 13:54, Felipe Lahti wrote:
> So, it depends of your business requirement, right? If a document has
> matches in more searchable fields, at least for me, this docum
> - How often, in your experience, and why, would solr crash?
Not very often. Typically if your heap is too small, you'll end up going OOM.
> - If I kill solr master and slave, usually do I need to also delete the
> indexes? Or everything should be fine upon restarting?
Restarts are fine. Orde
Hello Erick,
Thanks for your answer.
After reading previous subjects on the user list, we had already tried to
change the parameters we mentioned.
- concurrent warming searchers: we have set the maxWarmingSearchers attribute
to 2
<maxWarmingSearchers>2</maxWarmingSearchers>
- we have tried 32 and 64 for the ramBufferSizeMB attribute
On Thu, Jan 31, 2013 at 5:13 AM, Scott Stults
wrote:
> Right now that blessed container is Jetty version 8.1.2.v20120308.
I'd really like some confirmation from the devs that there really is a
blessed status for a given container that provides advantages over
others. From what I understand, Jetty
Hi people,
First of all, this forum is a godsend!!!
Second:
I have a master/slave configuration, using replication.
Currently in production I have only one server; there's no backup server
(really...).
The web application is public; everyone can see it.
- How often, in your
Hi,
I solved the issue by setting up two different virtual network adapters in
Ubuntu Server.
case closed ;)
thanks for the help!!
--
View this message in context:
http://lucene.472066.n3.nabble.com/setting-up-master-and-slave-in-same-machine-with-diff-ip-s-and-same-port-tp4035795p4037713.h
So, it depends on your business requirement, right? If a document has
matches in more searchable fields, at least for me, this document is more
important than another document that has fewer matches.
Example:
Put this in your schema:
And create a class on your Solr classpath:
package com.y
Hi,
I am stuck trying to index only the nouns of German and English texts.
(very similar to http://wiki.apache.org/solr/OpenNLP#Full_Example)
First try was to use UIMA with the HMMTagger:
/org/apache/uima/desc/AggregateSentenceAE.xml
false
false
albody
I'm really surprised you're hitting OOM errors. I suspect you have
something else pathological in your system. So, I'd start checking things
like
- how many concurrent warming searchers you allow
- How big your indexing RAM is set to (we find very little gain over 128M
BTW).
- Other load on your So
You can also do all this via HTTP commands, see:
http://wiki.apache.org/solr/SolrReplication#HTTP_API
that allows you to control _all_ replication from the master (i.e. tell the
master "don't do any replication") or just tell a slave "don't replicate
any more" as well as a lot of other stuff.
Bes
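As a rough sketch of the HTTP commands mentioned above: the replication handler takes a `command` parameter, with `disablereplication`/`enablereplication` aimed at the master and `disablepoll`/`enablepoll` aimed at a slave. The host names and the helper `replication_url` below are assumptions for illustration; check the wiki page for the full command list before relying on it.

```python
from urllib.parse import urlencode

# Commands documented for the replication handler (subset, by target role).
MASTER_COMMANDS = {"enablereplication", "disablereplication"}
SLAVE_COMMANDS = {"enablepoll", "disablepoll"}


def replication_url(solr_url: str, command: str) -> str:
    """Build the replication-handler URL for one command (hypothetical helper)."""
    if command not in MASTER_COMMANDS | SLAVE_COMMANDS:
        raise ValueError(f"unsupported command in this sketch: {command}")
    return f"{solr_url.rstrip('/')}/replication?{urlencode({'command': command})}"


# Tell a slave to stop polling the master (host name is an assumption).
print(replication_url("http://slave-host:8983/solr", "disablepoll"))
# Tell the master to stop serving replication to all slaves.
print(replication_url("http://master-host:8983/solr", "disablereplication"))
```

The resulting URLs can be issued with any HTTP client, e.g. curl or a browser.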
Is there a way to do an atomic update (inc by 1) and retrieve the updated value
in one operation?
Hi,
So am I correct in thinking that I add the JIRA myself? If so, can I add it to
the 4.2 release? Also, I have further questions about the scope of my patch;
should that be left to the comments of the JIRA itself?
Phil
-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodne...@
Part of this is a rant, part is a plea to others who've run successful
production deployments.
Solr is a second-class citizen when it comes to production deployment. Every
recipe I've seen (RPM, DEB, chef, or puppet) makes assumptions that in one way
or another run afoul of best-practices when
Which analyzer are you using to index that field? You can verify that
from the schema file.
Thanks
On Thu, Jan 31, 2013 at 2:35 PM, b.riez...@pixel-ink.de <
b.riez...@pixel-ink.de> wrote:
> Hi
>
> I have an id wich is a string like this.
> tx-20130130-4599
>
> i'm using a field without processi
Hi list,
I noticed that the result order is FIFO if documents have the same score.
I think this is because documents that are indexed later get a higher
internal document ID, and the output for documents with the same score starts
with the lowest internal document ID and goes up.
I
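The ordering described above (score first, ties resolved by ascending internal document ID) can be sketched in plain Python. This is an illustration of the observed behaviour, not Solr code; the `Hit` tuple and the `rank` function are hypothetical:

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    doc_id: int   # internal document ID; later-indexed documents get higher IDs
    score: float


def rank(hits: List[Hit]) -> List[Hit]:
    """Sort by descending score; equal scores fall back to ascending doc_id."""
    return sorted(hits, key=lambda hit: (-hit.score, hit.doc_id))


hits = [Hit(doc_id=7, score=1.0), Hit(doc_id=2, score=1.0), Hit(doc_id=5, score=2.0)]
for hit in rank(hits):
    print(hit.doc_id, hit.score)
```

With equal scores, the document indexed first (lowest ID) comes out first, which is the FIFO effect. Adding an explicit secondary sort field is the usual way to get a different tie-break.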
It could be a foolish question or concern, but I have no option :-). We
have an e-com site where we consume the feed from the CSE partners and
index it into Solr for our search. Instead of the traditional
auto-suggest, the predictive search in the header search box recommends the
categori
Does debugQuery=true tell you anything useful for these, such as which
component is taking most of the 30 seconds? Do you have evictions in your Solr
caches?
Dmitry
On Thu, Jan 31, 2013 at 10:01 AM, Mou wrote:
> I am running solr 3.4 on tomcat 7.
>
> Our index is very big , two cores each 120G. We
Hello,
After more tests, we could identify our problem in indexing (Solr 4.0.0).
Our problems are indeed OutOfMemoryErrors. Suspecting ZooKeeper connection
problems was a mistake. We thought of them because OOMEs sometimes
appear in the logs after errors in ZooKeeper leader election.
I
Hi
I have an id which is a string like this.
tx-20130130-4599
I'm using a field without processing, which I confirmed via the analysis tool.
But when I search for it, it gets split up, so instead of finding that specific
entry with that unique id,
it finds all entries with "tx" in it.
Any idea
Hi,
I am going to upgrade to Solr 4.1 from version 3.6, and I want to set up two
shards.
I use ConcurrentUpdateSolrServer to index the documents in Solr 3.6.
I saw the CloudSolrServer API in 4.1, BUT
1: CloudSolrServer uses LBHttpSolrServer to issue requests, but "*
LBHttpSolrServer should NOT be
Hi
After studying the Apache Solr documentation, I think the only way to learn
about updated records (modify, delete, and insert actions) is to develop a
class that extends org.apache.solr.servlet.SolrUpdateServlet.
In this class, I can access the updated record information going into the
Apache Solr server.
Somebody can co
I am running Solr 3.4 on Tomcat 7.
Our index is very big: two cores, each 120G. We are searching the slaves,
which are replicated every 30 min.
I am using the filterCache only, and we have more than 90% cache hits. We use
a lot of filter queries; queries are usually pretty big, with 10-20 fq
parameters. N