Re: Bad fieldNorm when using morphologic synonyms

2013-12-26 Thread Isaac Hebsh
Attached a patch to the JIRA issue. Reviews are welcome. On Thu, Dec 19, 2013 at 7:24 PM, Isaac Hebsh wrote: > Roman, do you have any results? > > created SOLR-5561 > > Robert, if I'm wrong, you are welcome to close that issue. > > > On Mon, Dec 9, 2013 at 10:50 PM

Re: Bad fieldNorm when using morphologic synonyms

2013-12-19 Thread Isaac Hebsh
Roman, do you have any results? Created SOLR-5561. Robert, if I'm wrong, you are welcome to close that issue. On Mon, Dec 9, 2013 at 10:50 PM, Isaac Hebsh wrote: > You can see the norm value, in the "explain" text, when setting > debugQuery=true. > If the same item ge

Re: LocalParam for nested query without escaping?

2013-12-19 Thread Isaac Hebsh
Created SOLR-5560. On Tue, Dec 10, 2013 at 8:48 AM, William Bell wrote: > Sounds like a bug. > > > On Mon, Dec 9, 2013 at 1:16 PM, Isaac Hebsh wrote: > > > If so, can someone suggest how a query should be escaped (securely and > > correctly)? > > Should I esc

Re: Global query parameters to facet query

2013-12-09 Thread Isaac Hebsh
My case is field aliasing in edismax. Consider this request, sent to the example configuration: http://localhost:8983/solr/collection1/select?defType=edismax&q.myalias.qf=text&q=myalias:1234&facet=true&facet.query=myalias:1234 The result is: undefined field myalias 400 when disabling th

Re: Bad fieldNorm when using morphologic synonyms

2013-12-09 Thread Isaac Hebsh
--roman > > > On Mon, Dec 9, 2013 at 2:42 PM, Isaac Hebsh > > > wrote: > > > Hi Robert and Manuel. > > > > The DefaultSimilarity indeed sets discountOverlaps to true by default. > > BUT, the *factory*, aka DefaultSimilarityFactory, when called by

Re: Global query parameters to facet query

2013-12-09 Thread Isaac Hebsh
Created SOLR-5542. Does anyone else want it? On Thu, Dec 5, 2013 at 8:55 PM, Isaac Hebsh wrote: > Hi, > > It seems that a facet query does not use the global query parameters (for > example, field aliasing for edismax parser). > We have an intensive use of facet queries (in some c

Re: LocalParam for nested query without escaping?

2013-12-09 Thread Isaac Hebsh
If so, can someone suggest how a query should be escaped (securely and correctly)? Should I escape only the quote character (and the backslash itself)? On Fri, Dec 6, 2013 at 2:59 PM, Isaac Hebsh wrote: > Obviously, there is the option of external parameter ({... > v=$nestedq}
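
A minimal sketch of the escaping asked about here, assuming the nested query is passed as a double-quoted local-param value (v="...") in which a backslash escapes the following character, as the working request in the original post suggests. Only the backslash and the double quote are escaped; quote_local_param is a hypothetical helper, and the whole q value is then URL-encoded by urlencode.

```python
from urllib.parse import urlencode


def quote_local_param(value):
    """Wrap value as a double-quoted local-param value (hypothetical helper).

    Backslashes are escaped first, so the backslashes added in front of
    quote characters are not doubled a second time.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


nested = 'TERM2 TERM3 "TERM4 TERM5"'
q = "TERM1 AND {!lucene df=text v=" + quote_local_param(nested) + "}"
# URL-encode the full request; the escaped value survives as part of q.
query_string = urlencode({"defType": "lucene", "df": "id", "q": q, "debugQuery": "true"})
```

This only covers the local-params layer: whatever the inner parser treats as syntax inside the value (AND, field:, wildcards) is still interpreted by that parser.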

Re: Bad fieldNorm when using morphologic synonyms

2013-12-09 Thread Isaac Hebsh
Hi Robert and Manuel. The DefaultSimilarity indeed sets discountOverlaps to true by default. BUT, the *factory*, aka DefaultSimilarityFactory, when called by IndexSchema (the getSimilarity method), explicitly sets this value to the value of its corresponding class member. This class member is initi
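
To see why discountOverlaps matters for stacked stems, here is a sketch of the length norm as Lucene 4.x DefaultSimilarity computes it before the lossy one-byte encoding: boost / sqrt(numTerms), where numTerms excludes overlapping tokens (position increment 0) when discountOverlaps is true. The encoding step is left out, and the token counts below are hypothetical.

```python
import math


def length_norm(length, num_overlap, discount_overlaps, boost=1.0):
    """Mirror of the DefaultSimilarity length-norm formula (before byte encoding).

    length: number of tokens indexed for the field, stacked stems included.
    num_overlap: tokens emitted at the same position as the previous token.
    """
    num_terms = length - num_overlap if discount_overlaps else length
    return boost * (1.0 / math.sqrt(num_terms))


# Hypothetical 4-word field in which 2 words also get a stem at the same
# position: 6 tokens in total, 2 of them overlapping.
with_discount = length_norm(6, 2, discount_overlaps=True)
without_discount = length_norm(6, 2, discount_overlaps=False)
```

With the discount, the stemmed document gets the same norm as an unstemmed 4-word document (0.5); without it, its norm drops to 1/sqrt(6), which penalizes documents in the stemmed language.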

Re: LocalParam for nested query without escaping?

2013-12-06 Thread Isaac Hebsh
Obviously, there is the option of an external parameter ({... v=$nestedq}&nestedq=...). This is a good solution, but it is not practical when there are many such nested queries. Any ideas? On Friday, December 6, 2013, Isaac Hebsh wrote: > We want to set a LocalParam on a nested quer
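
Writing the $param references by hand is what makes this approach impractical; generating them in the client removes that cost. A sketch, assuming the v=$name dereferencing shown in this message; nested_query_params and the nq0, nq1, ... parameter names are hypothetical. Because each raw nested query travels in its own parameter, it needs no local-param escaping, only the URL encoding that urlencode already performs.

```python
from urllib.parse import parse_qsl, urlencode


def nested_query_params(main_clause, nested):
    """Build request params where every nested query lives in its own parameter.

    main_clause: text of the top-level clause (e.g. "TERM1").
    nested: list of (local_params, raw_text) pairs; raw_text is passed as-is
    because Solr reads it from a separate parameter via v=$name.
    """
    params = []
    clauses = [main_clause]
    for i, (local_params, raw_text) in enumerate(nested):
        name = "nq%d" % i  # hypothetical naming scheme
        clauses.append("{!%s v=$%s}" % (local_params, name))
        params.append((name, raw_text))
    return [("q", " AND ".join(clauses))] + params


params = nested_query_params(
    "TERM1",
    [("lucene df=text", 'TERM2 TERM3 "TERM4 TERM5"'),
     ("lucene df=title", 'TERM6 "TERM7 TERM8"')],
)
query_string = urlencode(params)
```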

LocalParam for nested query without escaping?

2013-12-06 Thread Isaac Hebsh
We want to set a LocalParam on a nested query. When querying with the inline "v" parameter, it works fine: http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1 AND {!lucene df=text v="TERM2 TERM3 \"TERM4 TERM5\""} The parsedquery_toString is +id:TERM1 +(text:term2 t

Re: Bad fieldNorm when using morphologic synonyms

2013-12-06 Thread Isaac Hebsh
its > broken. > > On Thu, Dec 5, 2013 at 1:53 PM, Isaac Hebsh wrote: > > Hi, > > we implemented a morphologic analyzer, which stems words on index time. > > For some reasons, we index both the original word and the stem (on the > same > > position, of course).

Re: Bad fieldNorm when using morphologic synonyms

2013-12-05 Thread Isaac Hebsh
at 9:48 AM, Ahmet Arslan wrote: > Hi Isaac, > > Did you consider omitting norms completely for that field? omitNorms="true" > Are you using solr.RemoveDuplicatesTokenFilterFactory? > > > > On Thursday, December 5, 2013 8:55 PM, Isaac Hebsh > wrote: > > Hi,

Global query parameters to facet query

2013-12-05 Thread Isaac Hebsh
Hi, It seems that a facet query does not use the global query parameters (for example, field aliasing for the edismax parser). We make intensive use of facet queries (in some cases, we have many facet.query parameters for a single q), and using LocalParams for each facet.query is inconvenient. D
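
A client-side sketch of the LocalParams workaround, under two assumptions: that facet.query is parsed with the standard lucene parser rather than defType (consistent with the "undefined field myalias" error reported later in this thread), and that an explicit {!edismax} prefix still sees global request parameters such as edismax's per-field alias f.<alias>.qf. with_facet_queries and the alias values are hypothetical.

```python
from urllib.parse import urlencode


def with_facet_queries(base_params, facet_queries, parser="edismax"):
    """Append facet.query params, each prefixed with {!parser} (hypothetical helper)."""
    params = list(base_params) + [("facet", "true")]
    for query in facet_queries:
        params.append(("facet.query", "{!%s}%s" % (parser, query)))
    return params


base = [
    ("defType", "edismax"),
    ("f.myalias.qf", "text"),  # edismax per-field alias syntax: f.<alias>.qf
    ("q", "myalias:1234"),
]
params = with_facet_queries(base, ["myalias:1234", "myalias:5678"])
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
```

The prefix is added once in the client, so the many facet.query values stay written in the same alias syntax as q.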

Bad fieldNorm when using morphologic synonyms

2013-12-05 Thread Isaac Hebsh
Hi, we implemented a morphologic analyzer, which stems words at index time. For various reasons, we index both the original word and the stem (at the same position, of course). Stemming is done for one specific language only, so other languages are not stemmed at all. Because of that, two documents with

Re: Solr Result Tagging

2013-10-27 Thread Isaac Hebsh
Hi, Try using a facet.query for each part; you will get the total number of hits for every OR clause. If you need this info per document, the answers might appear when specifying debugQuery=true. If that info is useful, try adding "[explain]" to the fl param (probably requires registering the augmenter plugin

Re: Profiling Solr Lucene for query

2013-10-01 Thread Isaac Hebsh
e for reading the index, or more CPUs because the merging process might be more CPU intensive). Isn't it possible? On Wed, Oct 2, 2013 at 12:42 AM, Shawn Heisey wrote: > On 10/1/2013 2:35 PM, Isaac Hebsh wrote: > >> Hi Dmitry, >> >> I'm trying to examine your s

Re: Profiling Solr Lucene for query

2013-10-01 Thread Isaac Hebsh
Hi Dmitry, I'm trying to examine your suggestion to create a frontend node. It sounds pretty useful. I saw that every node in a Solr cluster can serve requests for any collection, even if it does not hold a core of that collection. Because of that, I thought that adding a new node to the cluster (ak

Re: Data duplication using Cloud+HDFS+Mirroring

2013-09-30 Thread Isaac Hebsh
Hi Greg, Did you get an answer? I'm interested in the same question. More generally, what are the benefits of HdfsDirectoryFactory, besides the transparent restore of the shard contents in case of a disk failure, and the ability to rebuild the index using MapReduce? Is the following statement accurate? blocks of a p

Considerations about setting maxMergedSegmentMB

2013-09-30 Thread Isaac Hebsh
Hi, While trying to solve a query performance issue, we suspect the number of index segments, which might slow queries down (due to I/O seeks, which happen for each term in the query, multiplied by the number of segments). We are on Solr 4.3 (TieredMergePolicy with a mergeFactor of 4). We can reduce the number of se

Re: Getting a query parameter in a TokenFilter

2013-09-21 Thread Isaac Hebsh
ira/browse/SOLR-5053 What would you do? On Tue, Sep 17, 2013 at 10:31 PM, Isaac Hebsh wrote: > Hi everyone, > > We developed a TokenFilter. > It should act differently, depends on a parameter supplied in the > query (for query chain only, not the index one, of course). >

Getting a query parameter in a TokenFilter

2013-09-17 Thread Isaac Hebsh
Hi everyone, We developed a TokenFilter. It should act differently depending on a parameter supplied in the query (for the query chain only, not the index chain, of course). We found no way to pass that parameter into the TokenFilter flow. I guess the root cause is that TokenFilter is a pure luce

Re: documentCache and lazyFieldLoading

2013-08-29 Thread Isaac Hebsh
Thanks Hoss. 1. We currently use Solr 4.3.0. 2. I understand this architecture of LazyFields, but I did not understand why multiple LazyFields should be created for a multivalued field. You can't load only part of them. If you request the field, you get ALL of its values, so 100 (or more) plac

documentCache and lazyFieldLoading

2013-08-29 Thread Isaac Hebsh
Hi, We've investigated a memory dump taken after some frequent OOM incidents. The main issue we found was many millions of LazyField instances, taking ~2GB of memory, even though queries request only about 10 small fields. We've found that LazyDocument creates a LazyField object fo

Re: Sending shard requests to all replicas

2013-07-31 Thread Isaac Hebsh
Thanks to Ryan Ernst: my issue is a duplicate of SOLR-4449. I think that this proposal might be very useful (some supporting links are attached there, worth reading.) On Tue, Jul 30, 2013 at 11:49 PM, Isaac Hebsh wrote: > Hi, > I submitted a new JIRA for this: > https://issues.apache

Re: Sending shard requests to all replicas

2013-07-30 Thread Isaac Hebsh
know that's where the request is sent out. > > I'd think that would be better than changing Solr itself > since if you found that this was useful you wouldn't > be patching your Solr release, just keeping your client > up to date. > > Best > Erick

Re: Sending shard requests to all replicas

2013-07-27 Thread Isaac Hebsh
olution or not.. :) On Sun, Jul 28, 2013 at 1:06 AM, Shawn Heisey wrote: > On 7/27/2013 3:33 PM, Isaac Hebsh wrote: > > I have about 40 shards. repFactor=2. > > The cause of slower shards is very interesting, and this is the main > > approach we took. > > Note that in e

Re: Sending shard requests to all replicas

2013-07-27 Thread Isaac Hebsh
y to figure out _why_ > you so often have a slow shard and whether the problem could > be cured with, say, better warming queries on the shards... > > Best > Erick > > On Fri, Jul 26, 2013 at 8:23 AM, Isaac Hebsh > wrote: > > Hi! > > > > When SolrClound

Sending shard requests to all replicas

2013-07-26 Thread Isaac Hebsh
Hi! When SolrCloud executes a query, it creates shard requests, which are sent to one replica of each shard. Total QTime is determined by the slowest shard response (plus some extra time). [For simplicity, let's assume that no stored fields are requested.] I suffer from a situation where in every
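
The general "hedged request" idea behind this thread (tracked in SOLR-4449, per the 2013-07-31 reply) can be sketched outside Solr: send the same shard request to every replica and keep whichever answer arrives first. Replicas are simulated with sleeping functions; this is not SolrCloud code, and the obvious cost is that every shard request is executed repFactor times.

```python
import concurrent.futures as cf
import time


def first_response(replicas, request):
    """Send the same shard request to every replica; return the first answer.

    replicas: callables standing in for per-replica HTTP calls (hypothetical).
    If the fastest replica raised, its exception is re-raised here.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(replica, request) for replica in replicas]
    done, _pending = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    # Do not block on slower replicas; their threads finish in the background.
    pool.shutdown(wait=False)
    return next(iter(done)).result()


def fake_replica(delay, name):
    """Simulated replica that answers after `delay` seconds."""
    def call(request):
        time.sleep(delay)
        return "%s answered %s" % (name, request)
    return call


start = time.monotonic()
answer = first_response([fake_replica(0.5, "replica1"), fake_replica(0.01, "replica2")], "q=*:*")
elapsed = time.monotonic() - start
```

Total latency then follows the fastest replica of each shard instead of whichever replica the load balancer happened to pick.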

MoinMoin Dump

2013-07-17 Thread Isaac Hebsh
Hi, About 6 months ago there was a thread about viewing the Solr Wiki offline. I'm interested, too. It seems that a manual (cron?) dump would do the job... Would it be too much to ask one of the admins to create such a dump manually? (http://moinmo.in/HelpOnMoinCommand/ExportDump) Otis, is t

Re: Wildcards and Phrase queries

2013-06-23 Thread Isaac Hebsh
. > > > You could try with higher solr versions too. If it does not work, please > lets us know. > > > https://issues.apache.org/jira/secure/attachment/12579832/ComplexPhrase-4.2.1.zip > > > > ____ > From: Isaac Hebsh > To: solr-use

Re: Wildcards and Phrase queries

2013-06-22 Thread Isaac Hebsh
terms of whether you wanted > to use these for production. > > I confess I don't know what state they were left in or why they were > never committed. > > FWIW, > Erick > > On Wed, Jun 19, 2013 at 10:08 AM, Isaac Hebsh > wrote: > > Hi, > > > >

Wildcards and Phrase queries

2013-06-19 Thread Isaac Hebsh
Hi, I'm trying to understand the status of enabling wildcards in phrase queries. Lucene JIRA issue: https://issues.apache.org/jira/browse/LUCENE-1486 Solr JIRA issue: https://issues.apache.org/jira/browse/SOLR-1604 It looks like these issues are not going to be solved in the near future

OutOfMemory while indexing (PROD environment!)

2013-06-06 Thread Isaac Hebsh
Hi everyone, My SolrCloud cluster (4.3.0) went into production a few days ago. Docs are indexed into Solr using the "/update" requestHandler, as POST requests with a text/xml content type. The collection is sharded into 36 pieces, and each shard has two replicas. There are 36 nodes (each

Re: Prevention of heavy wildcard queries

2013-06-02 Thread Isaac Hebsh
n Tue, May 28, 2013 at 7:08 AM, Isaac Hebsh wrote: > I don't want to affect on the (correctness of the) real query parsing, so > creating a QParserPlugin is risky. > Instead, If I'll parse the query in my search component, it will be > detached from the real query parsin

Re: Prevention of heavy wildcard queries

2013-05-27 Thread Isaac Hebsh
be at the same place, > or just above, the wildcard processor > > also make sure you are setting your qparser for FQ queries, ie. > fq="{!nw}foo" > > > On Mon, May 27, 2013 at 5:01 PM, Isaac Hebsh > wrote: > > > Thanks Roman. > > Based on some

Re: Prevention of heavy wildcard queries

2013-05-27 Thread Isaac Hebsh
> raise error etc > > this way, you are changing semantics - but don't need to touch the syntax > definition; of course, you may also change the grammar and allow only one > instance of wildcard (or some combination) but for that you should probably > use LUCENE-5014 >

Prevention of heavy wildcard queries

2013-05-27 Thread Isaac Hebsh
Hi. Searching for terms with a leading wildcard is solved by ReversedWildcardFilterFactory. But what about terms with a wildcard at both the start AND the end? Such queries are heavy, and I want to prevent my users from running them. I'm looking for a way to make these queries fail. I guess there i
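
A naive guard for this case, as a sketch: reject a query string if any whitespace-separated term both starts and ends with a wildcard. It runs before the query reaches Solr (for example in the client), so it does not touch the real query parser, but it does not understand quoted phrases or escaped characters; has_double_wildcard is a hypothetical name.

```python
def has_double_wildcard(query):
    """Return True if some term both starts and ends with a wildcard (* or ?).

    Naive whitespace tokenization: quoted phrases and escaped characters
    are not understood, so treat this as a guard, not a parser.
    """
    for token in query.split():
        term = token.strip("()+-")
        if ":" in term:
            term = term.split(":", 1)[1]  # drop a "field:" prefix
        term = term.strip("()")
        if len(term) >= 2 and term[0] in "*?" and term[-1] in "*?":
            return True
    return False
```

A leading-only wildcard such as *foo passes (leaving it to ReversedWildcardFilterFactory), and the match-all query *:* is not flagged.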

Bloom Filters

2013-05-17 Thread Isaac Hebsh
Hi everyone. I'm indexing docs into Solr using the update request handler, by POSTing data to the REST endpoint (not SolrJ, not DIH). My indexer should indicate whether the document already existed in the collection, based on its ID. The obvious solution is to perform a query
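
Given the subject line, one way to avoid a query per document is a Bloom filter over the IDs the indexer has already sent. A sketch, assuming the indexer sees every ID ever added (otherwise a "no" cannot be trusted): a "no" means the ID was definitely never added, while a "maybe" still has to be confirmed with a real lookup. The sizes and the double-hashing scheme over SHA-256 are arbitrary choices, not something Solr provides.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over string keys (sketch, not thread-safe)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit halves of SHA-256.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False: definitely never added. True: possibly added, verify in Solr.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


seen_ids = BloomFilter(num_bits=1024, num_hashes=4)
seen_ids.add("doc-1")
```

For n IDs, k hash functions and m bits, the false-positive rate is roughly (1 - e^(-k*n/m))^k, so num_bits should be sized for the expected collection size.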

Re: SurroundQParser does not analyze the query text

2013-05-17 Thread Isaac Hebsh
p a query in a way that leverages Solr's field > type analysis settings, but it is a technologically possible technique > maybe worth considering. > > Erik > > > > On May 16, 2013, at 16:38 , Isaac Hebsh wrote: > > Hi, >> >> I'm trying to use Surround Qu

SurroundQParser does not analyze the query text

2013-05-16 Thread Isaac Hebsh
Hi, I'm trying to use the Surround Query Parser for two needs that are not covered by proximity slops: 1. find documents with two words within a given distance, *unordered*; 2. given two lists of words, find documents with (at least) one word from list A and (at least) one word from list B, within

Re: Combining Solr Indexes at SolrCloud

2013-03-29 Thread Isaac Hebsh
Let's say you have machine A and machine B, and you want to shut down B. If all the shards on B have replicas (on A), you can shut down B instantly. If there is a shard on B that has no replica, you should create one on machine A (using the Core API), let it replicate the whole shard contents, and then you a
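
For the step of creating a replica on machine A with the Core API, a sketch of the request URL, assuming the Solr 4.x CoreAdmin CREATE action with its collection and shard parameters; the host, core, collection, and shard names are hypothetical.

```python
from urllib.parse import urlencode


def create_replica_url(host, core_name, collection, shard):
    """Build a CoreAdmin CREATE request that adds a replica of `shard` on `host`."""
    params = urlencode([
        ("action", "CREATE"),
        ("name", core_name),
        ("collection", collection),
        ("shard", shard),
    ])
    return "http://%s/solr/admin/cores?%s" % (host, params)


url = create_replica_url("machineA:8983", "collection1_shard3_replica2", "collection1", "shard3")
```

Once the new core has finished replicating the shard contents, B can be shut down as described above.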

Re: Basic auth on SolrCloud /admin/* calls

2013-03-29 Thread Isaac Hebsh
Hi Tim, Are you running Solr 4.2? (In 4.0 and 4.1, the Collections API didn't return any failure message; see SOLR-4043.) As far as I know, you can't tell Solr to use authentication credentials when communicating with other nodes. It's a bigger issue. For example, if you want to protect the "/up

Solr 4.2 - DocValues on id field

2013-03-13 Thread Isaac Hebsh
Hi, The example schema.xml in Solr 4.2 does not define the "id" field with docValues=true. Is there a good reason (other than backward compatibility with indexes from previous versions)? If my common case is fl=id (and no other field), DocValues seems like a classic fit for me. Am I right?

Any documentation on Solr MBeans?

2013-03-07 Thread Isaac Hebsh
Hi, I'm trying to monitor some Solr behaviour using JMX. It looks like a great job was done there, but I can't find any documentation on the MBeans themselves, for example the DirectUpdateHandler2 attributes. What is the difference between "adds" and "cumulative_adds"? Does "adds" count the last X se

Re: Timestamp field is changed on update

2013-02-28 Thread Isaac Hebsh
if exists). This solution exactly covers my case. Thank you! On Wed, Feb 20, 2013 at 11:33 PM, Isaac Hebsh wrote: > Nobody responded my JIRA issue :( > Should I commit this patch into SVN's trunk, and set the issue as Resolved? > > > On Sun, Feb 17, 2013 at 9:26 PM, Isaac He

update fails if one doc is wrong

2013-02-26 Thread Isaac Hebsh
Hi. I add documents to Solr by POSTing them to the UpdateHandler as bulks of commands (DIH is not used). If one document contains any invalid data (e.g. string data in a numeric field), Solr returns HTTP 400 Bad Request, and the whole bulk fails. I'm searching for a way to tell Solr to accept
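
One client-side workaround, sketched under the assumption that re-sending a document with the same uniqueKey simply overwrites it (documents preceding the bad one may already have been added): when a bulk is rejected, split it in halves and retry until the bad documents are isolated. send_batch stands in for whatever performs the POST and is hypothetical.

```python
def index_with_isolation(docs, send_batch):
    """Send docs in one bulk; on failure, split the bulk and retry the halves.

    send_batch: hypothetical callable that POSTs a list of docs and raises on a
    rejected request (e.g. HTTP 400). Returns the docs rejected on their own.
    """
    rejected = []

    def submit(batch):
        if not batch:
            return
        try:
            send_batch(batch)
        except Exception:  # in real code, catch only the "bad request" error
            if len(batch) == 1:
                rejected.append(batch[0])
            else:
                mid = len(batch) // 2
                submit(batch[:mid])
                submit(batch[mid:])

    submit(list(docs))
    return rejected


# Usage with a fake sender that rejects the whole bulk if any "price" is not numeric.
accepted = []


def fake_send(batch):
    if any(not isinstance(d["price"], int) for d in batch):
        raise ValueError("400 Bad Request")
    accepted.extend(batch)


docs = [{"id": str(i), "price": i} for i in range(8)]
docs[5]["price"] = "abc"
bad = index_with_isolation(docs, fake_send)
```

A single bad document in a bulk of n costs roughly 2*log2(n) extra requests, so large bulks stay cheap when bad data is rare.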

Re: Timestamp field is changed on update

2013-02-20 Thread Isaac Hebsh
Nobody has responded to my JIRA issue :( Should I commit this patch into SVN's trunk, and set the issue as Resolved? On Sun, Feb 17, 2013 at 9:26 PM, Isaac Hebsh wrote: > Thank you Alex. > Atomic Update allows you to "add" new values into multivalued field, for > example... It

Re: Timestamp field is changed on update

2013-02-17 Thread Isaac Hebsh
Thank you Alex. Atomic Update allows you to "add" new values to a multivalued field, for example... This means that the original document is read (using RealTimeGet, which depends on the updateLog). There is no reason the list of operations (add/set/inc) should not include a "create-only" operat

Re: Timestamp field is changed on update

2013-02-16 Thread Isaac Hebsh
7 AM, Upayavira wrote: > I think what Walter means is make the thing that sends it to Solr set > the timestamp when it does so. > > Upayavira > > On Sat, Feb 16, 2013, at 08:56 PM, Isaac Hebsh wrote: > > Hi, > > I do have an externally-created timestamp, but some minut

Re: Timestamp field is changed on update

2013-02-16 Thread Isaac Hebsh
tem? I think an external > create timestamp would be a lot more useful. > > wunder > > On Feb 16, 2013, at 12:37 PM, Isaac Hebsh wrote: > > > I opened a JIRA for this improvement request (attached a patch to > > DistributedUpdateProcessor). > > It's my firs

Re: Timestamp field is changed on update

2013-02-16 Thread Isaac Hebsh
I opened a JIRA for this improvement request (attached a patch to DistributedUpdateProcessor). It's my first JIRA, please review it... (Or, if someone has an easier solution, tell us...) https://issues.apache.org/jira/browse/SOLR-4468 On Fri, Feb 15, 2013 at 8:13 AM, Isaac Hebsh wrote:

Re: How to limit queries to specific IDs

2013-02-12 Thread Isaac Hebsh
them as a MUST clause, like > +(original query) +id:(1 2 3 4). > > Third possibility, see https://issues.apache.org/jira/browse/SOLR-2429, > but > the short form is: > fq={!cache=false}restoffq > > > On Mon, Feb 11, 2013 at 2:41 PM, Isaac Hebsh > wrote: > > > Hi e
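
A sketch of the filter-query approach quoted here ({!cache=false} plus an id clause), assuming each ID is quoted as a phrase term and the IDs are joined with an explicit OR so the result does not depend on the default operator; id_filter is a hypothetical helper.

```python
from urllib.parse import urlencode


def id_filter(ids, field="id"):
    """Build a non-cached filter query matching any of the given uniqueKey values."""
    quoted = []
    for value in ids:
        # Quote each ID so characters like ':' or '-' are not parsed as syntax.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"' + escaped + '"')
    return "{!cache=false}%s:(%s)" % (field, " OR ".join(quoted))


params = urlencode([("q", "some user query"), ("fq", id_filter(["1", "2", "3"])), ("fl", "id")])
```

Keeping the restriction in fq rather than in q leaves the relevance scores of the original query unchanged, and cache=false avoids filling the filterCache with one-off ID sets.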

How to limit queries to specific IDs

2013-02-11 Thread Isaac Hebsh
Hi everyone. I have queries that should be restricted to a set of IDs (the uniqueKey field of my schema). My client front-end sends two Solr requests: in the first one, it wants to get the top X IDs. This result should return very fast, with no time "wasted" on highlighting. This is a very standard query

Re: Trying to understand soft vs hard commit vs transaction log

2013-02-08 Thread Isaac Hebsh
Shawn, what about 'flush to disk' behaviour on MMapDirectoryFactory? On Fri, Feb 8, 2013 at 11:12 AM, Prakhar Birla wrote: > Great explanation Shawn! BTW soft commited documents will be not be > recovered on JVM crash. > > On 8 February 2013 13:27, Shawn Heisey wrote: > > > On 2/7/2013 9:29 PM,

Re: IP Address as number

2013-02-07 Thread Isaac Hebsh
Small addition: to support queries, I probably have to implement a (query-time) analyzer... Can an analyzer be configured on a numeric (i.e. non-TEXT) field? On Thu, Feb 7, 2013 at 6:48 PM, Isaac Hebsh wrote: > Hi. > > I have to index field which contains an IP address. > Users want t
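
Instead of a query-time analyzer, the conversion can be done in the client at both index time and query time. A sketch using Python's ipaddress module; the field name ip is hypothetical, and the field should be a long type because addresses above 127.255.255.255 exceed the signed 32-bit range.

```python
import ipaddress


def ip_to_long(ip):
    """IPv4 dotted quad -> unsigned 32-bit integer (needs a long field, not an int)."""
    return int(ipaddress.IPv4Address(ip))


def cidr_range_query(field, cidr):
    """Translate a CIDR block into an inclusive numeric range query."""
    net = ipaddress.IPv4Network(cidr)
    low = int(net.network_address)
    high = int(net.broadcast_address)
    return "%s:[%d TO %d]" % (field, low, high)
```

For example, 192.168.1.1 is indexed as 3232235777, and a search for 10.0.0.0/8 becomes the range query ip:[167772160 TO 184549375].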

Re: Servlet Filter for randomizing core names

2013-02-04 Thread Isaac Hebsh
2/4/2013 12:06 PM, Isaac Hebsh wrote: > >> LBHttpSolrServer is only solrj feature.. doesn't it? >> >> I think that Solr does not balance queries among cores in the same server. >> You can claim that it's a non-issue, if a single core can completely serve >

Re: Servlet Filter for randomizing core names

2013-02-04 Thread Isaac Hebsh
es nothing. I feel that we can achieve some improvement in this case... On Mon, Feb 4, 2013 at 12:45 AM, Shawn Heisey wrote: > On 2/3/2013 3:24 PM, Isaac Hebsh wrote: > >> Thanks Shawn for your quick answer. >> >> When using collection name, Solr will choose the leader

Re: Servlet Filter for randomizing core names

2013-02-03 Thread Isaac Hebsh
ithreading works well here, wouldn't utilizing all the cores be useful? On Sun, Feb 3, 2013 at 11:49 PM, Shawn Heisey wrote: > On 2/3/2013 1:18 PM, Isaac Hebsh wrote: > >> Hi. >> >> I have a SolrCloud cluster, which contains some servers. each server runs >> multip

Re: Distributed search

2013-01-28 Thread Isaac Hebsh
nd the boost is pretty > impressive (roughly 2-5x faster for a complicated query) > > Ming > > > On Mon, Jan 28, 2013 at 10:54 AM, Isaac Hebsh > wrote: > > > Does adding replicas (on additional servers) help to improve search > > performance? > > > >

Re: secure Solr server

2013-01-27 Thread Isaac Hebsh
You can define a security filter in WEB-INF\web.xml on specific URL patterns. You might want to set the URL pattern to "/admin/*". [Find examples here: http://stackoverflow.com/questions/7920092/how-can-i-bypass-security-filter-in-web-xml ] On Sun, Jan 27, 2013 at 8:07 PM, Mingfeng Yang wrote:

Re: uniqueKey field type

2013-01-23 Thread Isaac Hebsh
http://sematext.com/ > On Jan 23, 2013 2:53 PM, "Isaac Hebsh" wrote: > > > Hi, > > > > In my use case, Solr has to return only the "id" field, as a response > > for queries. However, it should return 1000 docs at once (rows=1000).

Re: Solr cache considerations

2013-01-20 Thread Isaac Hebsh
flagpole and try it. Rely on the OS to do its job > (http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html). > Find a bottleneck _then_ tune. Premature optimization and all > that > > Several tens of millions of docs isn't that large unless the text > field

Re: Solr cache considerations

2013-01-19 Thread Isaac Hebsh
it, so you won't see the new indexed data and caches > wont be flushed. openSearcher=false makes sense when you are using > hard-commits together with soft-commits, as the "soft-commit" is dealing > with opening/closing searchers, you don't need hard commit

Re: Solr cache considerations

2013-01-17 Thread Isaac Hebsh
ransaction log to > > assure index integrity. Not to mention that your tlog will be huge. > > Not to mention that there is some memory usage for each document in > > the tlog. Hard commits roll over the tlog, flush the in-memory tlog > > pointers, close index segments, etc. &g