I have not tried it, but I would check the option of using the SynonymFilter
to duplicate certain query words. Another option - you can detect these words
at index time (e.g. in an UpdateProcessor) and give these documents a document
boost if that fits your logic. Or even make a copyField that contains a
wh
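For the SynonymFilter idea, this is roughly the query-analyzer wiring I have
in mind - a sketch only, where the field type name and the synonyms.txt
content are assumptions:

  <fieldType name="text_emph" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- synonyms.txt (made-up file) maps the query words you want to emphasize -->
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>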
Right, it works!
I was not aware of this functionality, nor that it can be customized via the
hl.requireFieldMatch param.
Thanks
in this case? Or is highlighting the 10 fields the
> slowdown?
>
> Best,
> Erick
>
Currently I use the classic, but I can change my postings format in order to
work with another highlighting component if that leads to any solution.
Hello,
I need to expose search and highlighting capabilities over a few tens of
fields. The edismax qf param makes it possible, but the performance of
searching tens of words over tens of fields is problematic.
I made a copyField (indexed, not stored) for these fields, which gives way
be
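For the record, the copyField setup is roughly the following - field and type
names are placeholders for whatever the schema already uses:

  <field name="all_fields" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <copyField source="title" dest="all_fields"/>
  <copyField source="body" dest="all_fields"/>
  <!-- one copyField per searched field, or a wildcard source such as *_txt -->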
Hello,
Many of our indexed documents are scanned and OCR'ed documents.
Unfortunately we were not able to improve the OCR quality much (less than
80% word accuracy) for various reasons, a fact which badly hurts the
retrieval quality.
As we use an open-source OCR, we are thinking of changing every scanned
Is the issue SOLR-5478 what you were looking for?
Why wouldn't you take advantage of your use case - the chars belong to
different char classes?
You can index this content into a single Solr field (no copyField) and apply an
analysis chain that includes both languages' analysis - stopwords, stemmers,
etc.
As every filter should apply to its specific la
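To illustrate, assuming for the sake of the example that the two languages
are English and Arabic (swap in the stopword files and filters for your
actual languages), the chain could look like:

  <fieldType name="text_mixed" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- each stopword list only ever matches tokens of its own language -->
      <filter class="solr.StopFilterFactory" words="stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.StopFilterFactory" words="stopwords_ar.txt" ignoreCase="true"/>
      <!-- stemmers of different char classes leave the other language's tokens untouched -->
      <filter class="solr.PorterStemFilterFactory"/>
      <filter class="solr.ArabicNormalizationFilterFactory"/>
      <filter class="solr.ArabicStemFilterFactory"/>
    </analyzer>
  </fieldType>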
Hi,
I have a performance and scoring problem with phrase queries:
1. Performance - phrase queries involving frequent terms are very slow
due to reading the large positions posting lists.
2. Scoring - I want to control the boost of phrase and entity (in
gazetteers) matches.
Indexing all
Hello,
I'm trying to handle a situation with taxonomy search - that is, for each
taxonomy I have a list of words with their boosts. These taxonomies are
updated frequently, so I retrieve these scored lists at query time from an
external service.
My expectation would be:
q={!some_query_parser}Cities
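i.e. the parser would expand the taxonomy name into its current weighted term
list fetched from the external service, conceptually equivalent to something
like this (field name and terms made up):

  q=city:("new york"^0.9 OR paris^0.7 OR london^0.5)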
In short, when running a distributed search every shard runs the query
separately. Each shard's collector returns the topN (rows param) internal
docIds of the matching documents.
These topN docIds are converted to their uniqueKeys in the
BinaryResponseWriter and sent to the frontend core (the one
Running Solr 4.3, sharded collection, Tomcat 7.0.39.
Faceting on multivalued fields works perfectly fine; I was describing this
log to emphasize the fact that the servlet failed right after a new searcher was
opened and the event listener finished running a warming faceting query.
The ZooKeeper client for Eclipse is the tool you're looking for. You can edit
the clusterstate directly.
http://www.massedynamic.org/mediawiki/index.php?title=Eclipse_Plug-in_for_ZooKeeper
Another option can be using the bundled zkCli (distributed with Solr
4.5 and above) and uploading a new cluster
In the last few days one of my Tomcat servlets, running only a Solr instance,
crashed unexpectedly twice.
Low memory usage, nothing written in the Tomcat log, and the last thing
happening in the Solr log is 'end_commit_flush' followed by 'UnInverted
multi-valued field' for the fields faceted during the new
In order to set discountOverlaps to true you must have added the <similarity>
definition to the schema.xml, which
is commented out by default!
As this param is false by default, the above situation is expected with
correct positioning, as said.
In order to fix the field norms you'd have to reindex with the similarity
c
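For reference, the schema.xml entry I mean is along these lines (a sketch -
adjust the factory class to the similarity you actually use):

  <similarity class="solr.DefaultSimilarityFactory">
    <bool name="discountOverlaps">true</bool>
  </similarity>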
Robert, your last reply is not accurate.
It's true that the field norms and termVectors are independent. But this
issue of higher norms for this case is expected even with well-assigned
positions. The length norm is assigned from FieldInvertState.length, which is
the count of incrementToken() calls and not the number of po
There it goes: https://issues.apache.org/jira/browse/SOLR-5478
Sure, I am out of office till the end of the week. I'll reply after I upload the patch.
In order to accelerate BinaryResponseWriter.write we extended this
writer class to implement the docid-to-id transformation via docValues (in
memory), with no need to access the stored fields for reading the id nor to
lazy-load other fields, which also has a cost. That should improve the read
rate as docValues are
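The only schema change that should be needed for this, as far as I can tell,
is enabling docValues on the uniqueKey field, followed by a reindex (sketch,
assuming the field is named "id" and is a string type):

  <field name="id" type="string" indexed="true" stored="true" required="true" docValues="true"/>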
It's surprising such a query takes a long time. I would assume that after
repeatedly trying q=*:* you should be getting cache hits and times should
be faster. Check in the admin UI how your query/document caches perform.
Moreover, the query in itself just asks for the first 5000 docs that were
ind
Hi
Any distributed lookup is basically composed of two stages: the first
collects all the matching documents from every shard, and the second
fetches additional information about specific ids (i.e. stored fields, termVectors).
It can be seen in the logs of each shard (isShard=true), where the first
requ
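Roughly, the two per-shard requests look like the following - an illustration
only, not copied from a real log, and the exact parameters depend on your
Solr version:

  .../select?q=foo&isShard=true&fl=id,score&rows=10    (phase 1: collect the top ids)
  .../select?q=foo&isShard=true&ids=doc1,doc7,doc42    (phase 2: fetch fields for the merged ids)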
I tried my last proposition: editing the clusterstate.json to add a dummy
frontend shard seems to work. I made sure the ranges were not overlapping.
Doesn't that resolve the SolrCloud issue as specified above?
Would adding a dummy shard instead of a dummy collection resolve the
situation? E.g. editing clusterstate.json from a ZooKeeper client and
adding a shard with a 0-range so no docs are routed to this core. This core
would be on a separate server and act as the collection gateway.
nce is the one that does not have its own index and
> is doing merging of the results. Is this the case? If yes, are all 36
> shards always queried?
>
> Dmitry
tell you more.
>
> I'd _really_ try to get more disk space. The amount of engineer time spent
> trying to tune this is way more expensive than a disk...
>
> Best,
> Erick
ter if results merging can be avoided.
>
> Dmitry
Hello all
Looking at the 10% slowest queries, I get very bad performance (~60 sec
per query).
These queries have lots of conditions on my main field (more than a
hundred), including phrase queries, and rows=1000. I do return only ids
though.
I can quite firmly say that this bad performance is due
Hi,
In order to delete part of my index I run a delete-by-query that intends to
erase 15% of the docs.
I added these params to the solrconfig.xml:
2
2
5000.0
10.0
15.0
The extra params were added in order to promote merging of old segments but
with a restriction on the transient disk
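The element names were stripped above; one plausible reading, with
TieredMergePolicy under indexConfig, would be the following - the mapping of
the values to parameter names is my guess, only the values come from the
original:

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">2</int>
    <int name="segmentsPerTier">2</int>
    <double name="maxMergedSegmentMB">5000.0</double>
    <double name="reclaimDeletesWeight">10.0</double>
    <double name="forceMergeDeletesPctAllowed">15.0</double>
  </mergePolicy>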
Hello,
My Solr cluster runs on RH Linux with a Tomcat 7 servlet.
numShards=40, replicationFactor=2, 40 servers, each hosting 2 replicas. Solr
4.3.
For experimental reasons I split my cluster into 2 sub-clusters, each
containing a single replica of each shard.
When connecting these sub-clusters back the
Hi,
I have a slow storage machine and insufficient RAM to hold the whole index.
This causes the first queries (~5000) to be very slow (they read from disk
and my CPU is most of the time in iowait); after that, reads from the index
become very fast and come mainly from
Use the PatternReplaceFilterFactory.
This will do exactly what you asked for:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceFilterFactory
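For instance, something along these lines in your field's analyzer chain -
the pattern and replacement here are placeholders for whatever you need to
rewrite:

  <filter class="solr.PatternReplaceFilterFactory"
          pattern="([^a-z0-9 ])" replacement="" replace="all"/>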
On Mon, Jul 22, 2013 at 12:22 PM, Scatman wrote:
> Hi,
>
> I was looking for an issue, in order to put some regular
Minfeng - this issue gets tougher as the number of shards you have rises; you
can read Erick Erickson's post:
http://grokbase.com/t/lucene/solr-user/131p75p833/how-distributed-queries-works.
If you have 100M docs I guess you are running into this issue.
The common way to deal with this issue is by filteri
Great explanation and article.
Yes, this buffer for merges seems very small, and still optimized. That's
impressive.
Hello,
As a result of frequent Java OOM exceptions, I am trying to investigate the
Solr JVM memory heap usage.
Please correct me if I am mistaken; this is my understanding of the heap usages
(per replica on a Solr instance):
1. Buffers for indexing - bounded by ramBufferSizeMB
2. Solr caches
By field aliasing I meant something like: f.all_fields.qf=*_txt+*_s+*_int,
which would sum up to 100 fields.
My schema contains about a hundred fields of various types (int,
strings, plain text, emails).
I was wondering what the common practice is for searching free text over
the index. Assuming there are no boosts related to field matching, these
are the options I see:
1. Index and query an "all_f
st. You are getting OOM because the JVM does not
> have enough memory to build a response with 100K documents.
>
> wunder
>
> On Jun 17, 2013, at 1:57 PM, Manuel Le Normand wrote:
>
> > One of my users requested it, they are less aware of what's allowed and I
> >
N
> processes running in the OS - they all get a slice of the CPU time to
> do their work. Not sure if that answers your question...?
>
> Otis
would not get the JVM heap flooded (for
example I already have everything cached and my RAM IOs are very fast)
On Mon, Jun 17, 2013 at 11:47 PM, Walter Underwood wrote:
> Don't request 100K docs in a single query. Fetch them in smaller batches.
>
> wunder
Hello again,
After a heavy query on my index (returning 100K docs in a single query) my
JVM heap floods and I get a Java OOM exception, and after that my
GC cannot collect anything (GC
overhead limit exceeded) as these memory chunks are not disposable.
I want to be able to afford queries like this; my conc
Hello all,
Assuming I have a single shard with a single core, how do I run
multi-threaded queries on Solr 4.x?
Specifically, if one user sends a heavy query (a legitimate wildcard query
running for 10 sec), what happens to all other users querying during this period?
If the response is that simultaneous queri
Ok! I will eventually check whether it's an ACE issue and will upload the stack
trace in case something else is throwing these exceptions...
Thanks meanwhile.
Hello,
Since I replicated my shards (I have 2 cores per shard now), I get a
remarkable decrease in qTime. I assume it happens since my memory has to
be split between twice as many cores as it used to.
In my low-QPS use case, I use replicas as shard backups only (in
case one of my servers goes
Hi there,
Looking at one of my shards (about 1M docs) I see a lot of unique terms, more
than 8M, which is a significant part of my total term count. These are very
likely useless terms, binaries or other meaningless numbers that come with
a few of my docs.
I am totally fine with deleting them so these t
This can happen for various reasons.
Can you recreate the situation, meaning after restarting the servlet or server
it would start with a good qTime and degrade from that point? How fast does
this happen?
Start by monitoring the JVM process, with Oracle VisualVM for example.
Monitor for frequent garbage collect
On the query side, another downside I see would be that for a given memory
pool, you'd have to share it with more cores because every replica uses
its own cache.
This is true for the internal Solr caching (JVM heap) and OS caching as well.
Adding a replicated core creates a new data set (index) that will
Hello,
After creating a distributed collection on several different servers I
sometimes have to deal with failing servers (cores appear "not available" =
grey) or failing cores ("Down / unable to recover" = brown / red).
In case I wish to delete this erroneous collection (through the Collections
API) on
Hi,
We have different working hours, sorry for the reply delay. Your assumed
numbers are right, about 25-30 KB per doc, giving a total of 15 GB per shard;
there are two shards per server (+2 slaves that normally should do no work).
An average query has about 30 conditions (OR and AND mixed), most of them
as it's a
"response-merge" (CPU resource) bottleneck?
Thanks in advance,
Manu
After taking a look at what I wrote earlier, I will try to rephrase it in a
clearer manner.
It seems that sharding my collection into many shards slowed it down
unreasonably, and I'm trying to investigate why.
First, I created "collection1" - a 4 shards * replicationFactor=1 collection on
2 servers. Second, I
Hello
After performing a benchmark session at small scale I moved to full scale
on 16 quad-core servers.
Observations at small scale gave me excellent qTime (about 150 ms) with up
to 2 servers, showing my searching thread was mainly CPU-bound. My query
set is not faceted.
Growing to full scale
Your question is typically use-case dependent; the bottleneck will change
from user to user.
These are the two main issues that will affect the answer:
1. How do you index: what is your indexing rate (how many docs a day)? How
big is a typical document? How many documents do you plan on indexing in
t
e. . This
> triggers Solr to merge any segments with deletes.
>
> Lastly, I'm not sure about your specific questions related to
> optimizations, but I think it's worth trying the suggestions above and
> avoid optimizations altogether. I'm pretty sure the answer to #1 is n
r with a single
> thread? Because Solr uses multiple threads to search AFAIK.
>
> Best
> Erick
> > More to it, i do see 75 more threads under the process of tomcat