Optimizing integer primary key lookup speed: optimal FieldType and Codec?

2019-06-17 Thread Gregg Donovan
Hello! We (Etsy) would like to optimize primary key lookup speed. Our primary key is a 32-bit integer, and we are wondering what the state of the art is for FieldType and Codec these days for maximizing the throughput of 32-bit ID lookups. Context: Specifically, we're looking to optimize the loadi
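As background for 32-bit ID lookups: Lucene's point fields store an int as a fixed 4-byte value that sorts correctly as raw bytes. As far as I know, NumericUtils.intToSortableBytes does this by flipping the sign bit and writing the result big-endian. The sketch below reproduces that encoding in plain JDK Java without calling Lucene; the class name SortableIntCodec is hypothetical, and the sketch says nothing about which FieldType or Codec is fastest.

```java
import java.util.Arrays;

/** Hypothetical helper mirroring the sign-flip, big-endian idea behind Lucene's NumericUtils.intToSortableBytes. */
final class SortableIntCodec {
    private SortableIntCodec() {}

    /** Encodes an int as 4 bytes whose unsigned byte order matches signed int order. */
    static byte[] encode(int value) {
        int flipped = value ^ 0x80000000; // flip the sign bit so negative ids sort before positive ones
        return new byte[] {
            (byte) (flipped >>> 24), (byte) (flipped >>> 16), (byte) (flipped >>> 8), (byte) flipped
        };
    }

    /** Reverses encode(). */
    static int decode(byte[] b) {
        int flipped = ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16) | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
        return flipped ^ 0x80000000;
    }

    /** Compares two encoded ids in unsigned lexicographic order, the order used for binary terms. */
    static int compareEncoded(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```

Because every id is exactly 4 bytes and byte order equals numeric order, a sorted structure can locate an id (or a range of ids) by comparing raw bytes without decoding them first.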

1969 vs 1960s: not-quite-synonyms in Solr

2019-03-06 Thread Gregg Donovan
For a search like "1969 shirt" I would like to return items with either 1969 or 1960s but boost 1969 items higher. For the query "1960s shirt", 1960s and 1960, 1961, ... 1969 should all match equally. Is there a standard technique for this? I'm struggling to do this with eDisMax without adding new
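One way to get the asymmetric matching described above is query-side expansion: a year expands to itself (boosted) plus its decade, while a decade expands to itself plus all ten years at equal weight. This is a minimal sketch, not an eDisMax feature. DecadeExpander and the EXACT_BOOST value of 2.0 are hypothetical, and the output is plain Lucene `term^boost` syntax that would still need to be fed into the query somehow.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Pattern;

/** Hypothetical query-side expansion of year and decade tokens into weighted alternatives. */
final class DecadeExpander {
    private static final Pattern YEAR = Pattern.compile("\\d{4}");     // e.g. 1969
    private static final Pattern DECADE = Pattern.compile("\\d{3}0s"); // e.g. 1960s
    static final float EXACT_BOOST = 2.0f; // assumed weight for an exact-year match

    private DecadeExpander() {}

    static Map<String, Float> expand(String token) {
        Map<String, Float> terms = new LinkedHashMap<>();
        if (YEAR.matcher(token).matches()) {
            // "1969": the exact year outranks its decade, but the decade still matches
            terms.put(token, EXACT_BOOST);
            terms.put(token.substring(0, 3) + "0s", 1.0f);
        } else if (DECADE.matcher(token).matches()) {
            // "1960s": the decade token and every year in it match equally
            String prefix = token.substring(0, 3);
            terms.put(token, 1.0f);
            for (int digit = 0; digit <= 9; digit++) {
                terms.put(prefix + digit, 1.0f);
            }
        } else {
            terms.put(token, 1.0f);
        }
        return terms;
    }

    /** Renders the expansion as a boosted disjunction in Lucene query syntax. */
    static String toQuery(Map<String, Float> terms) {
        StringJoiner joiner = new StringJoiner(" OR ", "(", ")");
        terms.forEach((term, boost) -> joiner.add(term + "^" + boost));
        return joiner.toString();
    }
}
```

With this rule, "1969" becomes `(1969^2.0 OR 1960s^1.0)`, and "1960s" becomes an eleven-term disjunction with equal boosts.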

Re: Query of Death Lucene/Solr 7.6

2019-02-22 Thread Gregg Donovan
FWIW: we have also seen serious Query of Death issues after our upgrade to Solr 7.6. Are there any open issues we can watch? Are Markus' findings around `pf` our best guess? We've seen these issues even with ps=0. We also use the WDF. On Fri, Feb 22, 2019 at 8:58 AM Markus Jelsma wrote: > Hello M

Compression for solrbin?

2015-11-13 Thread Gregg Donovan
We've had success with LZ4 compression in a custom ShardHandler to reduce network overhead, getting ~25% compression with low CPU impact. LZ4 or Snappy seem like reasonable choices[1] for minimizing combined compression + transfer + decompression time in the data center. Would it make sense to integrate c
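The compress, measure, decompress round trip for a serialized shard response can be sketched as follows. The JDK ships neither LZ4 nor Snappy, so this sketch swaps in java.util.zip.Deflater at BEST_SPEED purely as a stand-in; a real ShardHandler would use an LZ4 or Snappy library instead, and its ratio and CPU cost will differ. PayloadCompression is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Stand-in for an LZ4/Snappy wrapper: Deflater at BEST_SPEED, since the JDK ships neither codec. */
final class PayloadCompression {
    private PayloadCompression() {}

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory promptly
        }
    }

    static byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported compressed payload");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed payload", e);
        } finally {
            inflater.end();
        }
    }

    /** Fraction of bytes saved; 0.25 corresponds to a ~25% reduction. */
    static double savings(int originalSize, int compressedSize) {
        return 1.0 - (double) compressedSize / originalSize;
    }
}
```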

ShardHandler semantics

2015-04-02 Thread Gregg Donovan
We're starting work on adding backup requests to the ShardHandler. Roughly something like: 1. Send requests to 100 shards. 2. Wait for results from 75 to come back. 3. Wait for either a)
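The first two steps listed above (send to N shards, wait for K of them) can be sketched with the JDK's ExecutorCompletionService. Step 3 is cut off in this snippet, so the sketch stops at "first K results" and leaves the remaining futures running for whatever backup logic follows. Shard calls are simulated with hypothetical Callables; QuorumFanOut is not a Solr class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;

/** Hypothetical fan-out helper: submit every shard call, return as soon as k have answered. */
final class QuorumFanOut {
    private QuorumFanOut() {}

    /** Returns the first k results in completion order; slower calls keep running. */
    static <T> List<T> firstK(ExecutorService pool, List<Callable<T>> calls, int k) {
        CompletionService<T> completions = new ExecutorCompletionService<>(pool);
        for (Callable<T> call : calls) {
            completions.submit(call);
        }
        List<T> results = new ArrayList<>(k);
        try {
            for (int i = 0; i < k; i++) {
                results.add(completions.take().get()); // take() blocks until the next call finishes
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for shards", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("shard call failed", e.getCause());
        }
        return results;
    }
}
```

In the 100-shard example, `firstK(pool, calls, 75)` would return once 75 shards have answered.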

Re: How To Interrupt Solr Query Execution

2015-03-20 Thread Gregg Donovan
SOLR-5986 looks like a great enhancement for enforcing timeouts. I'm curious about how to handle *manual* cancellation. We're working on backup requests -- e.g. wait till 90% of shards have responded then send out a backup request for the lagging (e.g. GC, cache miss, overloaded, etc.) shards afte

Re: Enforcing a hard timeout on shard requests?

2014-06-02 Thread Gregg Donovan
occurrence? > > Jason > > On May 30, 2014, at 3:05 PM, Gregg Donovan wrote: > > > I'd like to add a hard timeout on some of my sharded requests. E.g.: > for > > about 30% of the requests, I want to wait no longer than 120ms before a > > response comes ba

Enforcing a hard timeout on shard requests?

2014-05-30 Thread Gregg Donovan
I'd like to add a hard timeout on some of my sharded requests. E.g.: for about 30% of the requests, I want to wait no longer than 120ms before a response comes back, while aggregating results from as many shards as possible in that 120ms. My first attempt was to use timeAllowed=120&shards.tolerant
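The "hard deadline, keep whatever arrived" behavior can be sketched with the JDK's `ExecutorService.invokeAll(tasks, timeout, unit)`, which returns when all tasks finish or the timeout expires and cancels any task still running. This is a generic sketch, not how Solr's ShardHandler is built; shard calls are hypothetical Callables, and failed shards are simply skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical deadline-bounded gather over shard calls. */
final class DeadlineGather {
    private DeadlineGather() {}

    /** Waits at most timeoutMs and returns only the results that completed successfully in time. */
    static <T> List<T> gatherWithin(ExecutorService pool, List<Callable<T>> calls, long timeoutMs) {
        List<T> completed = new ArrayList<>();
        List<Future<T>> futures;
        try {
            futures = pool.invokeAll(calls, timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return completed;
        }
        for (Future<T> f : futures) {
            if (f.isCancelled()) {
                continue; // still running at the deadline, so invokeAll cancelled it
            }
            try {
                completed.add(f.get()); // already done, so get() does not block
            } catch (ExecutionException e) {
                // a failed shard is skipped, similar in spirit to shards.tolerant=true
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return completed;
    }
}
```

For the case above, `gatherWithin(pool, shardCalls, 120)` would return whatever subset of shards answered within 120ms.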

How to optimize a DisMax of multiple cachable queries?

2014-04-25 Thread Gregg Donovan
dexSearcher and then merging the DocLists manually. My concern is that this would lose the query normalization that happens in DisjunctionMaxQuery. This seems like a common problem: how to cache parts of a complex Solr query individually. Any ideas or common patterns for solving it? Thanks. --Gregg
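Merging separately cached clause results by hand means reproducing DisjunctionMaxQuery's combination rule: a document's score is its best clause score plus the tie-breaker multiplier times the sum of its other matching clause scores. This minimal sketch applies only that rule to hypothetical per-clause `docId -> score` maps. It does not reproduce query normalization, which is the concern raised above. DisMaxMerge is a hypothetical name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical merge of individually cached clause results using the dismax combination rule. */
final class DisMaxMerge {
    private DisMaxMerge() {}

    /**
     * Each map is one clause's cached docId -> score. A doc's merged score is its best clause score
     * plus tieBreaker times the sum of its other matching clause scores.
     */
    static Map<Integer, Float> merge(List<Map<Integer, Float>> clauseScores, float tieBreaker) {
        Map<Integer, Float> max = new HashMap<>();
        Map<Integer, Float> sum = new HashMap<>();
        for (Map<Integer, Float> clause : clauseScores) {
            for (Map.Entry<Integer, Float> e : clause.entrySet()) {
                max.merge(e.getKey(), e.getValue(), (a, b) -> Math.max(a, b));
                sum.merge(e.getKey(), e.getValue(), (a, b) -> a + b);
            }
        }
        Map<Integer, Float> merged = new HashMap<>();
        for (Map.Entry<Integer, Float> e : max.entrySet()) {
            float best = e.getValue();
            merged.put(e.getKey(), best + tieBreaker * (sum.get(e.getKey()) - best));
        }
        return merged;
    }
}
```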

Re: Estimating RAM usage of SolrCache instances?

2014-04-16 Thread Gregg Donovan
is about "average size of a filter query" + maxdoc/8 > document cache is about "average size of the stored fields in bytes" * > size. > > HTH, > Erick > > On Mon, Apr 14, 2014 at 5:17 PM, Gregg Donovan wrote: > > We'd like to graph the approximate RA
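The rules of thumb quoted above turn into simple arithmetic: a filterCache entry costs roughly its key plus a bitset of maxDoc bits (maxDoc/8 bytes), and a documentCache entry costs roughly the average stored-field size. A sketch, with hypothetical method names and inputs you would have to measure or guess yourself:

```java
/** Back-of-the-envelope cache sizing from the quoted rules of thumb (an estimate, not a measurement). */
final class CacheRamEstimate {
    private CacheRamEstimate() {}

    /** filterCache: each entry costs roughly its key plus a bitset of maxDoc bits (maxDoc / 8 bytes). */
    static long filterCacheBytes(int entries, long avgFilterQueryBytes, long maxDoc) {
        return entries * (avgFilterQueryBytes + maxDoc / 8);
    }

    /** documentCache: each entry costs roughly the average size of a document's stored fields. */
    static long documentCacheBytes(int entries, long avgStoredFieldBytes) {
        return entries * avgStoredFieldBytes;
    }
}
```

For example, with maxDoc = 8,000,000 each bitset is about 1 MB, so a full 512-entry filterCache with 100-byte keys comes to 512,051,200 bytes. Small filters may be stored more compactly than a full bitset, so treat this as an upper bound.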

Estimating RAM usage of SolrCache instances?

2014-04-14 Thread Gregg Donovan
We'd like to graph the approximate RAM size of our SolrCache instances. Our first attempt at doing this was to use the Lucene RamUsageEstimator [1]. Unfortunately, this appears to give a bogus result. Every instance of FastLRUCache was judged to have the same exact size, down to the byte. I assume

Re: Distributed tracing for Solr via adding HTTP headers?

2014-04-07 Thread Gregg Donovan
t seems to get the job done - for me - it's not > a reusable component, but might serve as an illustration of one way to > handle the problem > > -Mike > > > On 04/07/2014 12:23 PM, Gregg Donovan wrote: > >> That was my first attempt, but it's much trickier tha

Re: Fetching uniqueKey and other int quickly from documentCache?

2014-04-07 Thread Gregg Donovan
Mar 3, 2014 at 11:14 AM, Gregg Donovan wrote: > > Yonik, > > > > That's a very clever idea. Unfortunately, I think that will skip the > > distributed query optimization we were hoping to take advantage of in > > SOLR-1880 [1], but it should work with the proposed

Re: Distributed tracing for Solr via adding HTTP headers?

2014-04-07 Thread Gregg Donovan
; > > > Regards, > >Alex. > > Personal website: http://www.outerthoughts.com/ > > Current project: http://www.solr-start.com/ - Accelerating your Solr > proficiency > > > > > > On Sat, Apr 5, 2014 at 3:16 AM, Gregg Donovan > wrote: > >>

Distributed tracing for Solr via adding HTTP headers?

2014-04-04 Thread Gregg Donovan
We have some metadata -- e.g. a request UUID -- that we log to every log line using Log4J's MDC [1]. The UUID logging allows us to connect any log lines we have for a given request across servers. Sort of like Zipkin [2]. Currently we're using EmbeddedSolrServer without sharding, so adding the UUI
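Propagating the request UUID across servers amounts to copying it from the per-thread logging context into an HTTP header on each outgoing shard request. The receiving side then reads that header and puts the value back into its own context. A minimal sketch using Java 11's `java.net.http.HttpRequest`, with a ThreadLocal standing in for Log4J's MDC. The header name `X-Request-Id` and the TracingHeaders class are assumptions, not a Solr convention, and Solr's own shard HTTP client is not touched here.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.UUID;

/** Hypothetical propagation of a per-request id to shard requests via an HTTP header. */
final class TracingHeaders {
    static final String REQUEST_ID_HEADER = "X-Request-Id"; // assumed header name

    // Plays the role Log4J's MDC plays in the email: a per-thread slot for the current request id.
    private static final ThreadLocal<String> CURRENT_ID =
            ThreadLocal.withInitial(() -> UUID.randomUUID().toString());

    private TracingHeaders() {}

    static String currentRequestId() { return CURRENT_ID.get(); }

    /** Called on the receiving side with the incoming header value, so its log lines share the id. */
    static void setRequestId(String id) { CURRENT_ID.set(id); }

    static void clear() { CURRENT_ID.remove(); }

    /** Builds an outgoing shard request that carries the current thread's request id. */
    static HttpRequest withRequestId(URI shardUri) {
        return HttpRequest.newBuilder(shardUri)
                .header(REQUEST_ID_HEADER, currentRequestId())
                .GET()
                .build();
    }
}
```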

Re: Fetching uniqueKey and other int quickly from documentCache?

2014-03-03 Thread Gregg Donovan
u're not requesting any stored fields, that *might* currently > skip that step. > > -Yonik > http://heliosearch.org - native off-heap filters and fieldcache for solr > > > On Mon, Feb 24, 2014 at 9:58 PM, Gregg Donovan wrote: > > We fetch a large number of documents --

Re: SolrCloud: heartbeat succeeding while node has failing SSD?

2014-03-03 Thread Gregg Donovan
some local readings and depending on the results, pulls itself out > of the mix as best it can (remove itself from clusterstate.json or simply > closes its zk connection). > > - Mark > > http://about.me/markrmiller > > On Mar 2, 2014, at 3:42 PM, Gregg Donovan wrote:

SolrCloud: heartbeat succeeding while node has failing SSD?

2014-03-02 Thread Gregg Donovan
We had a brief SolrCloud outage this weekend when a node's SSD began to fail but the node still appeared to be up to the rest of the SolrCloud cluster (i.e. still green in clusterstate.json). Distributed queries that reached this node would fail but whatever heartbeat keeps the node in the clustrst
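A node can only pull itself out of the cluster if it can notice its own disk failing. A minimal local probe writes a small file into the data directory, reads it back, and deletes it. This is a sketch under assumptions: DiskHealthCheck is a hypothetical name, the read-back may be served from the OS page cache and so will not catch every failure, and the step of actually leaving the cluster (e.g. closing the ZooKeeper session) is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical local disk probe a node could run before deciding to take itself out of rotation. */
final class DiskHealthCheck {
    private DiskHealthCheck() {}

    /** Returns true if a small file can be created, written, read back intact and deleted in dataDir. */
    static boolean isHealthy(Path dataDir) {
        byte[] probe = ("probe-" + System.nanoTime()).getBytes();
        Path tmp = null;
        try {
            tmp = Files.createTempFile(dataDir, "healthcheck", ".tmp");
            Files.write(tmp, probe);
            return Arrays.equals(probe, Files.readAllBytes(tmp));
        } catch (IOException | RuntimeException e) {
            return false; // any I/O error counts as unhealthy
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp);
                } catch (IOException ignored) {
                    // a failing delete is itself suspicious, but the result above already stands
                }
            }
        }
    }
}
```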

Fetching uniqueKey and other int quickly from documentCache?

2014-02-24 Thread Gregg Donovan
We fetch a large number of documents -- 1000+ -- for each search. Each request fetches only the uniqueKey or the uniqueKey plus one secondary integer key. Despite this, we find that we spend a sizable amount of time in SolrIndexSearcher#doc(int docId, Set<String> fields). Time is spent fetching the two sto

Re: DistributedSearch: Skipping STAGE_GET_FIELDS?

2014-02-24 Thread Gregg Donovan
Thank you Shalin and Yonik! Both SOLR-1880 and SOLR-5768 will be very helpful for our distributed search performance. On Mon, Feb 24, 2014 at 5:02 AM, Shalin Shekhar Mangar < shalinman...@gmail.co

DistributedSearch: Skipping STAGE_GET_FIELDS?

2014-02-23 Thread Gregg Donovan
In most of our Solr use-cases, we fetch only fl= or fl=,. I'd like to be able to do a distributed search and skip STAGE_GET_FIELDS -- i.e. the stage where each shard is queried for the documents found among the top ids -- as it seems like we could be collecting this information earlier in the pipeli
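If each shard returned the uniqueKey alongside the score in its first response, the coordinator could pick the global winners in one pass and never issue a second per-shard fetch. The sketch below shows only that merge step. The Hit type and ShardMerge class are hypothetical and do not correspond to Solr's ShardDoc or response format. Ties are broken by key here for determinism.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical coordinator merge where shards return uniqueKeys with their scores in one pass. */
final class ShardMerge {
    private ShardMerge() {}

    static final class Hit {
        final String uniqueKey;
        final float score;

        Hit(String uniqueKey, float score) {
            this.uniqueKey = uniqueKey;
            this.score = score;
        }
    }

    /** Returns the uniqueKeys of the global top {@code rows} hits across all shard responses. */
    static List<String> topKeys(List<List<Hit>> shardResults, int rows) {
        Comparator<Hit> order = (x, y) -> {
            int byScore = Float.compare(y.score, x.score); // higher score first
            return byScore != 0 ? byScore : x.uniqueKey.compareTo(y.uniqueKey);
        };
        PriorityQueue<Hit> queue = new PriorityQueue<>(order);
        for (List<Hit> shard : shardResults) {
            queue.addAll(shard);
        }
        List<String> keys = new ArrayList<>(rows);
        while (keys.size() < rows && !queue.isEmpty()) {
            keys.add(queue.poll().uniqueKey);
        }
        return keys; // no second per-shard request is needed to learn the winning ids
    }
}
```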

Caching Solr boost functions?

2014-02-18 Thread Gregg Donovan
We're testing out a new handler that uses edismax with three different "boost" functions. One has a random() function in it, so it is not very cacheable, but the other two boost functions do not change from query to query. I'd like to tell Solr to cache those boost queries for the life of the Searche
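The idea of caching the two static boosts "for the life of the searcher" can be sketched as a per-searcher map from function text to a precomputed `float[maxDoc]`. Because each instance is tied to one index snapshot, entries never need invalidation, and the random() boost would stay outside the cache and be evaluated per query. SearcherBoostCache, the string key, and the per-document function are hypothetical; this is not a Solr API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntToDoubleFunction;

/**
 * Hypothetical cache of static per-document boost values. One instance is meant to live exactly as
 * long as one searcher (index snapshot), so entries never need invalidation.
 */
final class SearcherBoostCache {
    private final int maxDoc;
    private final Map<String, float[]> cache = new ConcurrentHashMap<>();

    SearcherBoostCache(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Evaluates perDoc for every document the first time functionKey is seen, then reuses it. */
    float[] boosts(String functionKey, IntToDoubleFunction perDoc) {
        return cache.computeIfAbsent(functionKey, key -> {
            float[] values = new float[maxDoc];
            for (int doc = 0; doc < maxDoc; doc++) {
                values[doc] = (float) perDoc.applyAsDouble(doc);
            }
            return values;
        });
    }
}
```

The memory cost is 4 × maxDoc bytes per cached function, which is the price of skipping per-query evaluation.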

Re: Consistent relevance tie-breaking across clusters?

2013-03-02 Thread Gregg Donovan
> > i believe he wants a consistent ordering that resolves ties in docs > with identical scores in some way that doesn't favor documents based on > any externally visible property of the documents themselves. That's correct. If we were starting from scratch, we might start with a secondary sort o
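One way to get tie-breaking that is identical across clusters but unrelated to any visible document property is to sort equal scores by a seeded hash of the uniqueKey. This is a sketch under assumptions, not necessarily what the thread settled on. It uses String.hashCode (whose formula is part of its documented contract, so it is the same on every JVM) mixed through MurmurHash3's 64-bit finalizer. The SEED value and the TieBreak class are hypothetical; every cluster must use the same seed.

```java
/** Hypothetical score-then-hash ordering that is stable across clusters for the same index data. */
final class TieBreak {
    private static final long SEED = 0x9E3779B97F4A7C15L; // assumed fixed seed, identical on every cluster

    private TieBreak() {}

    /** MurmurHash3's 64-bit finalizer (fmix64): cheap, deterministic and well mixed. */
    static long mix(long x) {
        x ^= x >>> 33;
        x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33;
        x *= 0xc4ceb9fe1a85ec53L;
        x ^= x >>> 33;
        return x;
    }

    static long tieBreakKey(String uniqueKey) {
        return mix(uniqueKey.hashCode() ^ SEED);
    }

    /** Higher score first; equal scores ordered by the hashed key, then by the key itself on a collision. */
    static int compare(float scoreA, String keyA, float scoreB, String keyB) {
        int byScore = Float.compare(scoreB, scoreA);
        if (byScore != 0) {
            return byScore;
        }
        int byHash = Long.compare(tieBreakKey(keyA), tieBreakKey(keyB));
        return byHash != 0 ? byHash : keyA.compareTo(keyB);
    }
}
```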

Re: SolrCore#getIndexDir() contract change between 3.6 and 4.1?

2013-02-07 Thread Gregg Donovan
.. Anyway, I'll follow up in JIRA. --Gregg [1] https://issues.apache.org/jira/browse/SOLR-4413 On Wed, Feb 6, 2013 at 8:42 PM, Mark Miller wrote: > Thanks Gregg - can you file a JIRA issue? > > - Mark > > On Feb 6, 2013, at 5:57 PM, Gregg Donovan wrote: > > > M

Re: SolrCore#getIndexDir() contract change between 3.6 and 4.1?

2013-02-06 Thread Gregg Donovan
getIndexDir() method to know the active index directory"* is the behavior that we were reliant on. Since it's now hardcoded to dataDir + "index/", it doesn't always return the active index directory. --Gregg On Wed, Feb 6, 2013 at 5:13 PM, Mark Miller wrote: > >
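The contract at issue is "return the active index directory" rather than "return dataDir + "index/"". A minimal sketch of resolving the active directory, assuming (as I understand Solr 4.x replication to behave) that an `index.properties` file in dataDir names the live directory under the key `index`, with `dataDir/index` as the fallback. Verify the file name and key against SnapPuller before relying on this; ActiveIndexDir is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical resolver for the index directory currently in use under a core's dataDir. */
final class ActiveIndexDir {
    private ActiveIndexDir() {}

    /**
     * Assumes replication records the live directory in dataDir/index.properties under the key
     * "index" (e.g. index=index.20130206123456); falls back to dataDir/index when absent.
     */
    static Path resolve(Path dataDir) {
        Path props = dataDir.resolve("index.properties");
        if (Files.isRegularFile(props)) {
            Properties p = new Properties();
            try (InputStream in = Files.newInputStream(props)) {
                p.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            String name = p.getProperty("index");
            if (name != null && !name.trim().isEmpty()) {
                return dataDir.resolve(name.trim());
            }
        }
        return dataDir.resolve("index");
    }
}
```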

SolrCore#getIndexDir() contract change between 3.6 and 4.1?

2013-02-06 Thread Gregg Donovan
ed into the older code paths. I can certainly appreciate that it's tough to make the changes needed for SolrCloud while maintaining perfect compatibility in pre-Cloud code paths. Would restoring the previous contract of SolrCore#getIndexDir() break anything in SolrCloud? Thanks! --Gregg G

Re: replicateOnStartup not finding commits after SOLR-3911?

2013-01-29 Thread Gregg Donovan
Thanks, Mark -- that fixed the issue for us. I created https://issues.apache.org/jira/browse/SOLR-4380 to track it. On Tue, Jan 29, 2013 at 4:06 PM, Mark Miller wrote: > > On Jan 29, 2013, at 3:50 PM, Gregg Donovan wrote: > >> should we >> just try uncommenting that line

replicateOnStartup not finding commits after SOLR-3911?

2013-01-29 Thread Gregg Donovan
ry uncommenting that line in ReplicationHandler? Thanks! --Gregg Gregg Donovan Senior Software Engineer, Etsy.com gr...@etsy.com [1] https://issues.apache.org/jira/browse/SOLR-3911 https://issues.apache.org/jira/secure/attachment/12548596/SOLR-3911.patch [2] http://svn.apache.org/viewvc/luce

PK uniqueness aware Solr index merging?

2013-01-24 Thread Gregg Donovan
nner than re-adding all of the documents in each directory to a new Solr index to avoid PK duplicates? Thanks. --Gregg

Re: Solr 4.0 SnapPuller version vs. generation issue

2013-01-10 Thread Gregg Donovan
ration || forceReplication; and that fixed our post-reindexing HTTP replication issues. But I'm not sure if that check works for all of the cases that SnapPuller is designed for. --Gregg On Thu, Jan 10, 2013 at 4:28 PM, Mark Miller wrote: > > On Jan 10, 2013, at 4:11 PM, Gregg Donova

Solr 4.0 SnapPuller version vs. generation issue

2013-01-10 Thread Gregg Donovan
napPuller.java?r1=1144761&r2=1235888&pathrev=1235888&diff_format=h Thanks! --Gregg

Re: Is FileFloatSource's WeakHashMap cache only cleaned by GC?

2012-06-06 Thread Gregg Donovan
patch. > Although we've had some folks provide a Git-based rather than an SVN-based > patch. > > Anyone can open a JIRA, but you must create a signon to do that. It'd get more > attention that way > > Best > Erick > > On Tue, Jun 5, 2012 at 2:19 PM, Gregg Dono

Is FileFloatSource's WeakHashMap cache only cleaned by GC?

2012-06-05 Thread Gregg Donovan
We've encountered GC spikes at Etsy after adding new ExternalFileFields a decent number of times. I was always a little confused by this behavior -- isn't it just one big float[]? why does that cause problems for the GC? -- but looking at the FileFloatSource code a little more carefully, I wonder i

Good time for an upgrade to Solr/Lucene trunk?

2011-06-21 Thread Gregg Donovan
ut to land on trunk that are worth waiting a few weeks for? Thanks for the guidance! --Gregg Gregg Donovan Technical Lead, Search, Etsy.com gr...@etsy.com

Sorting and filtering on fluctuating multi-currency price data?

2010-10-20 Thread Gregg Donovan
SolrIndexReader, but could be per-segment. Perhaps a custom poly-field could accomplish something like this? Has anyone dealt with this sort of problem? Do any of these approaches sound more or less reasonable? Are we missing anything? Thanks for the help! Gregg Donovan Technical Lead, Search Etsy.com
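Whatever storage approach is chosen, sorting and filtering on fluctuating multi-currency prices comes down to converting each price into one base currency with the current rates and comparing the converted values. This sketch does the conversion at request time outside the index. The Listing type, the rate map, and the use of double for money are simplifications and assumptions. The per-segment or poly-field ideas in the email would move this work closer to the index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical request-time normalization of multi-currency prices into one base currency. */
final class CurrencySort {
    private CurrencySort() {}

    static final class Listing {
        final String id;
        final double price;    // in the listing's own currency, major units
        final String currency; // e.g. "USD", "EUR"

        Listing(String id, double price, String currency) {
            this.id = id;
            this.price = price;
            this.currency = currency;
        }
    }

    /** ratesToBase maps a currency code to base-currency units per one unit of that currency. */
    static double toBase(Listing l, Map<String, Double> ratesToBase) {
        Double rate = ratesToBase.get(l.currency);
        if (rate == null) {
            throw new IllegalArgumentException("no exchange rate for " + l.currency);
        }
        return l.price * rate;
    }

    static List<Listing> sortByBasePrice(List<Listing> listings, Map<String, Double> ratesToBase) {
        List<Listing> sorted = new ArrayList<>(listings);
        sorted.sort(Comparator.comparingDouble((Listing l) -> toBase(l, ratesToBase)));
        return sorted;
    }

    static List<Listing> filterMaxBasePrice(List<Listing> listings, Map<String, Double> ratesToBase, double maxBase) {
        List<Listing> kept = new ArrayList<>();
        for (Listing l : listings) {
            if (toBase(l, ratesToBase) <= maxBase) {
                kept.add(l);
            }
        }
        return kept;
    }
}
```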

Re: Difficulty with Multi-Word Synonyms

2009-09-17 Thread Gregg Donovan
hanks for creating such a precise bug report. > > +1 > > Thanks, I had missed this. This is serious, and looks due to a Lucene > back compat break. > I've added the testcase and can confirm the bug. > > -Yonik > http://www.lucidimagination.com > > > > >

Difficulty with Multi-Word Synonyms

2009-09-14 Thread Gregg Donovan
I'm running into an odd issue with multi-word synonyms in Solr (using the latest [9/14/09] nightly). Things generally seem to work as expected, but I sometimes see words that are the leading term in a multi-word synonym being replaced with the token that follows them in the stream when they should

Re: How to handle database replication delay when using DataImportHandler?

2009-01-29 Thread Gregg Donovan
Noble, Thanks for the suggestion. The unfortunate thing is that we really don't know ahead of time what sort of replication delay we're going to encounter -- it could be one millisecond or it could be one hour. So, we end up needing to do something like: For delta-import run N: 1. query DB slave