I have not tried it, but I would check the option of using the SynonymFilter
to duplicate certain query words. Another option - you can detect these words
at index time (e.g. in an UpdateProcessor) and give these documents a document
boost if that fits your logic. Or even make a copyField that contains a
wh
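For the SynonymFilter idea, this is roughly the query-analyzer wiring I have
in mind - a sketch only, where the field type name and the synonyms.txt
content are assumptions:

  <fieldType name="text_emph" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- synonyms.txt (made-up file) maps the query words you want to emphasize -->
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>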
Right, it works!
I was not aware of this functionality, nor that it can be customized via the
hl.requireFieldMatch param.
Thanks
in this case? Or is highlighting the 10 fields the
> slowdown?
>
> Best,
> Erick
>
Currently I use the classic, but I can change my postings format in order to
work with another highlighting component if that leads to any solution.
Hello,
I need to expose search and highlighting capabilities over a few tens of
fields. The edismax qf param makes it possible, but the performance of
searching tens of words over tens of fields is problematic.
I made a copyField (indexed, not stored) for these fields, which gives way
be
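For the record, the copyField setup is roughly the following - field and type
names are placeholders for whatever the schema already uses:

  <field name="all_fields" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <copyField source="title" dest="all_fields"/>
  <copyField source="body" dest="all_fields"/>
  <!-- one copyField per searched field, or a wildcard source such as *_txt -->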
Hello,
Many of our indexed documents are scanned and OCR'ed documents.
Unfortunately we were not able to improve the OCR quality much (less than
80% word accuracy) for various reasons, a fact which badly hurts the
retrieval quality.
As we use an open-source OCR, we are thinking of changing every scanned
Is the issue SOLR-5478 what you were looking for?
Why wouldn't you take advantage of your use case - the chars belong to
different char classes?
You can index this content into a single Solr field (no copyField) and apply an
analysis chain that includes both languages' analysis - stopwords, stemmers,
etc.
As every filter should apply to its specific la
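To illustrate, assuming for the sake of the example that the two languages
are English and Arabic (swap in the stopword files and filters for your
actual languages), the chain could look like:

  <fieldType name="text_mixed" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- each stopword list only ever matches tokens of its own language -->
      <filter class="solr.StopFilterFactory" words="stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.StopFilterFactory" words="stopwords_ar.txt" ignoreCase="true"/>
      <!-- stemmers of different char classes leave the other language's tokens untouched -->
      <filter class="solr.PorterStemFilterFactory"/>
      <filter class="solr.ArabicNormalizationFilterFactory"/>
      <filter class="solr.ArabicStemFilterFactory"/>
    </analyzer>
  </fieldType>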
Hi,
I have a performance and scoring problem with phrase queries:
1. Performance - phrase queries involving frequent terms are very slow
due to reading the large positions posting lists.
2. Scoring - I want to control the boost of phrase and entity (in
gazetteers) matches.
Indexing all
Hello,
I'm trying to handle a situation with taxonomy search - that is, for each
taxonomy I have a list of words with their boosts. These taxonomies are
updated frequently, so I retrieve these scored lists at query time from an
external service.
My expectation would be:
q={!some_query_parser}Cities
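i.e. the parser would expand the taxonomy name into its current weighted term
list fetched from the external service, conceptually equivalent to something
like this (field name and terms made up):

  q=city:("new york"^0.9 OR paris^0.7 OR london^0.5)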
In short, when running a distributed search every shard runs the query
separately. Each shard's collector returns the topN (rows param) internal
docIds of the matching documents.
These topN docIds are converted to their uniqueKeys in the
BinaryResponseWriter and sent to the frontend core (the one
Running Solr 4.3, sharded collection, Tomcat 7.0.39.
Faceting on multivalued fields works perfectly fine; I was describing this
log to emphasize the fact that the servlet failed right after a new searcher was
opened and the event listener finished running a warming faceting query.
The ZooKeeper client for Eclipse is the tool you're looking for. You can edit
the clusterstate directly.
http://www.massedynamic.org/mediawiki/index.php?title=Eclipse_Plug-in_for_ZooKeeper
Another option can be using the bundled zkCli (distributed with Solr
4.5 and above) and uploading a new cluster
In the last few days one of my Tomcat servlets, running only a Solr instance,
crashed unexpectedly twice.
Low memory usage, nothing written in the Tomcat log, and the last thing
happening in the Solr log is 'end_commit_flush' followed by 'UnInverted
multi-valued field' for the fields faceted during the new
In order to set discountOverlaps to true you must have added the <similarity>
definition to the schema.xml, which
is commented out by default!
As this param is false by default, the above situation is expected with
correct positioning, as said.
In order to fix the field norms you'd have to reindex with the similarity
c
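For reference, the schema.xml entry I mean is along these lines (a sketch -
adjust the factory class to the similarity you actually use):

  <similarity class="solr.DefaultSimilarityFactory">
    <bool name="discountOverlaps">true</bool>
  </similarity>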
Robert, your last reply is not accurate.
It's true that the field norms and termVectors are independent. But this
issue of higher norms for this case is expected even with well-assigned
positions. The length norm is assigned from FieldInvertState.length, which is
the count of incrementToken() calls and not the number of po
There it goes: https://issues.apache.org/jira/browse/SOLR-5478
Sure, I am out of office till the end of the week. I'll reply after I upload the patch.
In order to accelerate BinaryResponseWriter.write we extended this
writer class to implement the docid-to-id transformation via docValues (in
memory), with no need to access the stored fields for reading the id nor to
lazy-load other fields, which also has a cost. That should improve the read
rate as docValues are
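The only schema change that should be needed for this, as far as I can tell,
is enabling docValues on the uniqueKey field, followed by a reindex (sketch,
assuming the field is named "id" and is a string type):

  <field name="id" type="string" indexed="true" stored="true" required="true" docValues="true"/>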
It's surprising such a query takes a long time. I would assume that after
repeatedly trying q=*:* you should be getting cache hits and times should
be faster. Check in the admin UI how your query/document caches perform.
Moreover, the query in itself just asks for the first 5000 docs that were
ind
Hi
Any distributed lookup is basically composed of two stages: the first
collects all the matching documents from every shard, and the second
fetches additional information about specific ids (i.e. stored fields, termVectors).
It can be seen in the logs of each shard (isShard=true), where the first
requ
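Roughly, the two per-shard requests look like the following - an illustration
only, not copied from a real log, and the exact parameters depend on your
Solr version:

  .../select?q=foo&isShard=true&fl=id,score&rows=10    (phase 1: collect the top ids)
  .../select?q=foo&isShard=true&ids=doc1,doc7,doc42    (phase 2: fetch fields for the merged ids)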
I tried my last proposition: editing the clusterstate.json to add a dummy
frontend shard seems to work. I made sure the ranges were not overlapping.
Doesn't that resolve the SolrCloud issue as specified above?
Would adding a dummy shard instead of a dummy collection resolve the
situation? E.g. editing clusterstate.json from a ZooKeeper client and
adding a shard with a 0-range so no docs are routed to this core. This core
would be on a separate server and act as the collection gateway.
nce is the one that does not have its own index and
> is doing merging of the results. Is this the case? If yes, are all 36
> shards always queried?
>
> Dmitry
tell you more.
>
> I'd _really_ try to get more disk space. The amount of engineer time spent
> trying to tune this is way more expensive than a disk...
>
> Best,
> Erick
ter if results merging can be avoided.
>
> Dmitry
Hello all
Looking at the 10% slowest queries, I get very bad performance (~60 sec
per query).
These queries have lots of conditions on my main field (more than a
hundred), including phrase queries, and rows=1000. I do return only ids
though.
I can quite firmly say that this bad performance is due
Hi,
In order to delete part of my index I run a delete-by-query that intends to
erase 15% of the docs.
I added these params to the solrconfig.xml:
2
2
5000.0
10.0
15.0
The extra params were added in order to promote merging of old segments but
with a restriction on the transient disk
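The element names were stripped above; one plausible reading, with
TieredMergePolicy under indexConfig, would be the following - the mapping of
the values to parameter names is my guess, only the values come from the
original:

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">2</int>
    <int name="segmentsPerTier">2</int>
    <double name="maxMergedSegmentMB">5000.0</double>
    <double name="reclaimDeletesWeight">10.0</double>
    <double name="forceMergeDeletesPctAllowed">15.0</double>
  </mergePolicy>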
Hello,
My Solr cluster runs on RH Linux with a Tomcat 7 servlet.
numShards=40, replicationFactor=2, 40 servers, each hosting 2 replicas. Solr
4.3.
For experimental reasons I split my cluster into 2 sub-clusters, each
containing a single replica of each shard.
When connecting these sub-clusters back the
Hi,
I have a slow storage machine and insufficient RAM to hold the whole index.
This causes the first queries (~5000) to be very slow (they read from disk
and my CPU is most of the time in iowait); after that, reads from the index
become very fast and come mainly from
Use the PatternReplaceFilterFactory.
This will do exactly what you asked for:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceFilterFactory
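For instance, something along these lines in your field's analyzer chain -
the pattern and replacement here are placeholders for whatever you need to
rewrite:

  <filter class="solr.PatternReplaceFilterFactory"
          pattern="([^a-z0-9 ])" replacement="" replace="all"/>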
On Mon, Jul 22, 2013 at 12:22 PM, Scatman wrote:
> Hi,
>
> I was looking for an issue, in order to put some regular
Minfeng - this issue gets tougher as the number of shards you have rises; you
can read Erick Erickson's post:
http://grokbase.com/t/lucene/solr-user/131p75p833/how-distributed-queries-works.
If you have 100M docs I guess you are running into this issue.
The common way to deal with this issue is by filteri
Great explanation and article.
Yes, this buffer for merges seems very small, and still optimized. That's
impressive.
Hello,
As a result of frequent Java OOM exceptions, I am trying to investigate the
Solr JVM memory heap usage.
Please correct me if I am mistaken; this is my understanding of the heap usages
(per replica on a Solr instance):
1. Buffers for indexing - bounded by ramBufferSizeMB
2. Solr caches
By field aliasing I meant something like: f.all_fields.qf=*_txt+*_s+*_int,
which would sum up to 100 fields.
My schema contains about a hundred fields of various types (int,
strings, plain text, emails).
I was wondering what the common practice is for searching free text over
the index. Assuming there are no boosts related to field matching, these
are the options I see:
1. Index and query an "all_f
st. You are getting OOM because the JVM does not
> have enough memory to build a response with 100K documents.
>
> wunder
>
> On Jun 17, 2013, at 1:57 PM, Manuel Le Normand wrote:
>
> > One of my users requested it, they are less aware of what's allowed and I
> >
N
> processes running in the OS - they all get a slice of the CPU time to
> do their work. Not sure if that answers your question...?
>
> Otis
would not get the JVM heap flooded (for
example I already have everything cached and my RAM IOs are very fast)
On Mon, Jun 17, 2013 at 11:47 PM, Walter Underwood wrote:
> Don't request 100K docs in a single query. Fetch them in smaller batches.
>
> wunder
Hello again,
After a heavy query on my index (returning 100K docs in a single query) my
JVM heap floods and I get a Java OOM exception, and after that my
GC cannot collect anything (GC
overhead limit exceeded) as these memory chunks are not disposable.
I want to be able to afford queries like this; my conc
Hello all,
Assuming I have a single shard with a single core, how do I run
multi-threaded queries on Solr 4.x?
Specifically, if one user sends a heavy query (a legitimate wildcard query
running for 10 sec), what happens to all other users querying during this period?
If the response is that simultaneous queri
Ok! I will eventually check whether it's an ACE issue and will upload the stack
trace in case something else is throwing these exceptions...
Thanks meanwhile.
Hello,
Since I replicated my shards (I have 2 cores per shard now), I get a
remarkable decrease in qTime. I assume it happens since my memory has to
be split between twice as many cores as it used to.
In my low-QPS use case, I use replicas as shard backups only (in
case one of my servers goes
Hi there,
Looking at one of my shards (about 1M docs) I see a lot of unique terms, more
than 8M, which is a significant part of my total term count. These are very
likely useless terms, binaries or other meaningless numbers that come with
a few of my docs.
I am totally fine with deleting them so these t
This can happen for various reasons.
Can you recreate the situation, meaning after restarting the servlet or server
it would start with a good qTime and degrade from that point? How fast does
this happen?
Start by monitoring the JVM process, with Oracle VisualVM for example.
Monitor for frequent garbage collect
On the query side, another downside I see would be that for a given memory
pool, you'd have to share it with more cores because every replica uses
its own cache.
This is true for the internal Solr caching (JVM heap) and OS caching as well.
Adding a replicated core creates a new data set (index) that will
Hello,
After creating a distributed collection on several different servers I
sometimes have to deal with failing servers (cores appear "not available" =
grey) or failing cores ("Down / unable to recover" = brown / red).
In case I wish to delete this erroneous collection (through the Collections
API) on
Hi,
We have different working hours, sorry for the reply delay. Your assumed
numbers are right, about 25-30 KB per doc, giving a total of 15 GB per shard;
there are two shards per server (+2 slaves that normally should do no work).
An average query has about 30 conditions (OR and AND mixed), most of them
as it's a
"response-merge" (CPU resource) bottleneck?
Thanks in advance,
Manu
After taking a look at what I wrote earlier, I will try to rephrase it in a
clearer manner.
It seems that sharding my collection into many shards slowed it down
unreasonably, and I'm trying to investigate why.
First, I created "collection1" - a 4 shards * replicationFactor=1 collection on
2 servers. Second, I
Hello
After performing a benchmark session at small scale I moved to full scale
on 16 quad-core servers.
Observations at small scale gave me excellent qTime (about 150 ms) with up
to 2 servers, showing my searching thread was mainly CPU-bound. My query
set is not faceted.
Growing to full scale
Your question is typically use-case dependent; the bottleneck will change
from user to user.
These are the two main issues that will affect the answer:
1. How do you index: what is your indexing rate (how many docs a day)? How
big is a typical document? How many documents do you plan on indexing in
t
e. . This
> triggers Solr to merge any segments with deletes.
>
> Lastly, I'm not sure about your specific questions related to
> optimizations, but I think it's worth trying the suggestions above and
> avoid optimizations altogether. I'm pretty sure the answer to #1 is n
r with a single
> thread? Because Solr uses multiple threads to search AFAIK.
>
> Best
> Erick
> > More to it, i do see 75 more threads under the process of tomcat