Re: ltr (reranking) in combination with cursorMarks

2020-08-30 Thread Dmitry Kan
th. > > freiheit.com technologies gmbh > Budapester Straße 45 > 20359 Hamburg / Germany > phone: +49 40 / 890584-0 > Hamburg HRB 70814 > > +++ Hamburg/ Germany + Lisbon/ Portugal +++ > > https://www.freiheit.com > https://www.facebook.com/freiheitcom > > B444 034F 9C95 A569 C5DA 087C E6B9 CCF9 5572 A904 > Managing directors: Claudia Dietze, Stefan Richter > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: https://semanticanalyzer.info

Re: Creating a phrase match feature in LTR

2020-08-28 Thread Dmitry Kan
with the exception > > Exception from createWeight for SolrFeature [name=phraseMatch, > params={q={!complexphrase inOrder=true}query(fieldName:${input})}] null > > But similar query works when used in the query reranking construct with > these params > > rqq: "{!complexphrase inOrder=true v=$v1}", > v1: "query(fieldName:"some text"~2^1.0,0)", > > What is the problem in the LTR configuration for the feature ? > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: https://semanticanalyzer.info

Re: Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

2020-08-28 Thread Dmitry Kan
": 13.956842, > > "score": 0.17357588 > > }, > > { "id": "6512", > > "$sort_score": 14.43907, > > "score": 0.11575622 > > }, > > > > We also tried with other simple re-rank queries apart from LTR, and the > > issue persisted. > > > > Could someone please help troubleshoot? Ideally, we would want to have > the > > re-rank results merged on the single node, and not re-apply sorting. > > > > Thank you! > > > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: https://semanticanalyzer.info

Re: Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

2020-08-28 Thread Dmitry Kan
: "2016", > > > "$sort_score": 14.612957, > > > "score": 0.19214153 > > > }, > > > { "id": "1523", > > > "$sort_score": 14.4093275, > > > "score": 0.26738763 > > >

Re: Rerank for distributed requests

2020-08-28 Thread Dmitry Kan
nctionally > equivalent to the single shard behavior. > > I'm curious if current behavior is intended or not, typically I would > expect either something I described above or at least ignoring sort during > the merge and using only doc.score that was generated by LTR rescorer.

Re: Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

2020-08-28 Thread Dmitry Kan
, > { "id": "6704", > "$sort_score": 13.956842, > "score": 0.17357588 > }, > { "id": "6512", > "$sort_score": 14.43907, > "score": 0.11575622 > }, > > We also tried with other simple re

Re: Issues deploying LTR into SolrCloud

2020-08-26 Thread Dmitry Kan
odel deployment status per collection in the admin UI? Thanks, Dmitry On Tue, Aug 25, 2020 at 6:20 PM Dmitry Kan wrote: > Hi, > > There is a recent thread "Replication of Solr Model and feature store" on > deploying LTR feature store and model into a master/slave Solr

Issues deploying LTR into SolrCloud

2020-08-25 Thread Dmitry Kan
tially a bug in SolrCloud? Is there any workaround I can try, like saving the feature store and model JSON files into the collection config path and creating the SolrCloud from there? Thanks, Dmitry -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogs

Re: Questions about corrupted Segments files.

2019-11-06 Thread Dmitry Kan
xpected extra argument '-fix' > > > > If anybody knows about either a way to fix corrupted segment files or a > way to use checkIndex '-fix' option correctly, could you please let me > know? > > Any clue will be very appreciated. >

question on MLT params

2019-05-20 Thread Dmitry Kan
org/solr/TermVector> support." Will the tokens be parsed in the order of appearance in the stored field (same as raw input) or some prioritization like TF*IDF is going to be applied? Thanks, Dmitry -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.c

Re: tf function query

2017-10-12 Thread Dmitry Kan
expect as output? tf(field, "a OR b AND c NOT d"). I'm > not sure what term frequency would even mean in that situation. > > tf is a pretty simple function, it expects a single term and there's > no way I know of to do what you're asking. > > Best,

tf function query

2017-10-05 Thread Dmitry Kan
don't use edismax parser to apply multifield boosts, but instead use a custom ranking function. Would appreciate any thoughts, Dmitry -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer:

ClassCastException in RelevanceComparator

2017-09-19 Thread Dmitry Kan
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532) at java.lang.Thread.run(Thread.java:745) Would tint fields be causing this? If so, should they be defined as Floats? Thanks, Dmitry -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http

null's in logging

2017-04-07 Thread Dmitry Kan
org.apache.solr.update.DirectUpdateHandler2 *null* - Reordered DBQs detected. Is this a known issue to have *null* or a misconfig on our part? Thanks, Dmitry -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan

sort by function with cursor based result fetching

2017-03-05 Thread Dmitry Kan
fixed in solr 6.x? Thanks! -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan Insider Solutions: https://semanticanalyzer.info

[ANNOUNCEMENT] Luke 6.4.1 released

2017-02-12 Thread Dmitry Kan
Download the release zip here: https://github.com/DmitryKey/luke/releases/tag/luke-6.4.1 Upgrade to Lucene 6.4.1. Supports: Apache Solr 6.4.1 Elasticsearch 5.2.0 Pull-requests: #79 <https://github.com/DmitryKey/luke/pull/79> and #80 <https://github.com/DmitryKey/luke/pull/80>. --

Re: [Result Query Solr] How to retrieve the content of pdfs

2016-09-20 Thread Dmitry Kan
Hi Alexandre, Could you add fl=* to your query and check the output? Alternatively, have a look at your schema file and check what could look like a content field: text or similar. Dmitry On Sep 14, 2016, at 1:27 AM, "Alexandre Martins" < alexandremart...@gmail.com> wrote: > Hi Guys, >

RE: Where is Stored values resides ?

2016-07-22 Thread Dmitry Kan
Hi, To the best of my knowledge, the getopt luke is no longer supported. Use this instead: https://github.com/DmitryKey/luke Regards, Dmitry Hi Prabaharan, You can use Luke to open an index. http://www.getopt.org/luke/ -Original Message- From: Rajendran, Prabaharan [mailto:rajendra...@d

Re: puzzling StemmerOverrideFilterFactory

2016-06-30 Thread Dmitry Kan
Hi, It appears, the issue was due to a mis-config I did in schema. After StemmerOverrideFilterFactory was added on both query and index sides, the problem has disappeared. Thanks, Dmitry On Thu, May 19, 2016 at 9:01 PM, Shawn Heisey wrote: > On 5/19/2016 5:26 AM, Dmitry Kan wrote: >

Re: puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Dmitry Kan
"solr.StemmerOverrideFilterFactory" > > > > dictionary="stemdict.txt" /> on query side, but not indexing. One > rule is > > > > mapping organization onto organiz (on query). On indexing > > > > SnowballPorterFilterFactory will

Re: puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Dmitry Kan
an. Still > > searching with organization finds it in the index. Anybody has an idea > why > > this happens? > > > > This is on solr 4.10.2. > > > > Thanks, > > Dmitry > > > > -- > > Dmitry Kan > > Luke Toolbox: http://github.com/DmitryKey/l

puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Dmitry Kan
? This is on solr 4.10.2. Thanks, Dmitry -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

[ANNOUNCEMENT] Luke 6.0.0 released

2016-04-18 Thread Dmitry Kan
Download the release zip here: https://github.com/DmitryKey/luke/releases/tag/luke-6.0.0 Major upgrade to new Lucene 6.0.0 API. #55 <https://github.com/DmitryKey/luke/pull/55> Enjoy! -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com T

[ANNOUNCEMENT] Luke 5.5.0 released

2016-03-19 Thread Dmitry Kan
Download the release zip here: https://github.com/DmitryKey/luke/releases/tag/luke-5.5.0 <https://github.com/DmitryKey/luke/releases/tag/luke-5.4.0> Fixed in this release: #50 <https://github.com/DmitryKey/luke/issues/50> (Literally, the upgrade to Lucene 5.5.0) Enjoy! -- Dmi

Re: [Migration Solr4 to Solr5] Collection reload error

2016-03-10 Thread Dmitry Kan
Thanks Shawn, Missed the openSearcher=false setting. So another thing to check really is whether there are concurrent commitWithin calls ever to the same shard. On March 10, 2016, at 4:39 PM, "Shawn Heisey" wrote: > On 3/10/2016 3:05 AM, Dmitry Kan wrote: > > Th

Re: [Migration Solr4 to Solr5] Collection reload error

2016-03-10 Thread Dmitry Kan
- Add or delete documents from the main collection >solrClient.add(doc, 180) // commitWithin > == 30 mn > solrClient.deleteById(doc, 180) // commitWithin == 30 mn > > Maybe you will spot something obviously wrong ? > > Thanks >

Re: [Migration Solr4 to Solr5] Collection reload error

2016-03-04 Thread Dmitry Kan
> > > Kelkoo SAS > Simplified joint-stock company > Share capital: € 4,168,964.30 > Registered office: 158 Ter Rue du Temple 75003 Paris > 425 093 069 RCS Paris > > This message and its attachments are confidential and intended > exclusively for their addressees. If you are not the > intended recipient of this message, please destroy it and notify > the sender. > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

change default id in results clustering

2016-02-18 Thread Dmitry Kan
Hi, Is it possible to change the id field, that defaults to 'id' in carrot based result clustering? I have another field, 'externalId', that is stamped on each document and would like to return it in clusters instead. -- Dmitry Kan Luke Toolbox: http://github.com/Dmitr

[ANNOUNCE] Luke 5.4.0 released

2016-02-14 Thread Dmitry Kan
earlier, but not announced separately on this list: luke running on Apache Pivot instead of the Thinlet library. It supports lucene 5.2.1. Grab it here: https://github.com/DmitryKey/luke/releases/tag/pivot-luke-5.2.1 Your feedback is appreciated! -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey

Re: similarity as a parameter

2015-12-15 Thread Dmitry Kan
structed query is going to be the > simplest/cleanest solution regardless of whether #1 or #2 makes the most > sense -- perhaps even achieving #2 by using #1 so that createWeight in > your new QueryWrapper class does the IndexSearcher wrapping before > delegating. > > > >

Re: similarity as a parameter

2015-12-15 Thread Dmitry Kan
se field but > then had the desired alternate similarity, using SchemaSimilarityFactory. > > See: > https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements > > > -- Jack Krupansky > > > On Tue, Dec 15, 2015 at 10:02 AM, Dmitry Kan wrote: > > > Hi guys

similarity as a parameter

2015-12-15 Thread Dmitry Kan
Hi guys, Is there a way to alter the similarity class at runtime, with a parameter? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

ways to affect on SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite

2015-11-02 Thread Dmitry Kan
Hi solr fans, Are there ways to influence the strategy behind SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite ? As it seems, at the moment, the rewrite method loads max N words that maximize term score. How can this be changed to loading top terms by frequency, for example? -- Dmitry Kan

[ANNOUNCE] Luke 5.3.0 released

2015-09-28 Thread Dmitry Kan
, please file an issue on the luke's github: https://github.com/DmitryKey/luke Luke Team -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: modular QueryParser in contrib

2015-09-21 Thread Dmitry Kan
pache/lucene/queryparser/flexible/standard/package-summary.html > > > > The original Jira: > > https://issues.apache.org/jira/browse/LUCENE-1567 > > > > This new query parser was dumped into Lucene some years ago, but I > haven't > > noticed any real ac

modular QueryParser in contrib

2015-09-21 Thread Dmitry Kan
modularity and customizability. Can you point to what the exact class is? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: how to extend JavaBinCodec and make it available in solrj api

2015-08-17 Thread Dmitry Kan
Shekhar Mangar > wrote: > > No, I'm afraid you will have to extend the XmlResponseWriter in that > case. > > > > On Sat, Aug 8, 2015 at 2:02 PM, Dmitry Kan wrote: > >> Shalin, > >> > >> Thanks, can I also introduce custom entity tags

Re: how to extend JavaBinCodec and make it available in solrj api

2015-08-08 Thread Dmitry Kan
changing the response writers. Instead, if you just used > nested maps/lists or SimpleOrderedMap/NamedList then every response > writer should be able to just directly write the output. Nesting is > not a problem. > > On Fri, Aug 7, 2015 at 6:09 PM, Dmitry Kan wrote: > > S

Re: how to extend JavaBinCodec and make it available in solrj api

2015-08-07 Thread Dmitry Kan
> What do you mean by a custom format? As long as your custom component > is writing primitives or NamedList/SimpleOrderedMap or collections > such as List/Map, any response writer should be able to handle them. > > On Wed, Aug 5, 2015 at 5:08 PM, Dmitry Kan wrote: > >

how to extend JavaBinCodec and make it available in solrj api

2015-08-05 Thread Dmitry Kan
lugin framework such that JavaBinCodec is extended and used for the new data structure? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

[JOB] Financial search engine company AlphaSense is looking for Search Engineers

2015-08-03 Thread Dmitry Kan
Send your CV over and let's have a chat. Please e-mail me, if you have any questions. -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

[JOB] Financial search engine company AlphaSense is looking for Search Engineers

2015-07-09 Thread Dmitry Kan
Revolution, ApacheCon, Berlin buzzwords), review books on Solr. Send your CV over and let's have a chat. -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

[ANNOUNCE] Luke 5.2.0 released

2015-07-07 Thread Dmitry Kan
m/DmitryKey/luke/pull/27> Lucene 5x support #28 <https://github.com/DmitryKey/luke/pull/28> Added LUKE_PATH env variable to luke.sh #30 <https://github.com/DmitryKey/luke/pull/30> Luke 5.2 -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blog

Re: issue with highlighting in solr 4.10.2

2015-06-29 Thread Dmitry Kan
nippet > size you've specified? > > Shot in the dark, > Erick > > On Fri, Jun 26, 2015 at 3:22 AM, Dmitry Kan wrote: > > Hi, > > > > When highlighting hits for the following query: > > > > (+Contents:apple +Contents:watch) Contents:iphone > >

issue with highlighting in solr 4.10.2

2015-06-26 Thread Dmitry Kan
feature? Is there any way to debug the highlighter using solr admin? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: MappingCharFilterFactory and start and end offsets

2015-06-25 Thread Dmitry Kan
intent is for offsets to > map to the *original* text. You can work around this by performing the > substitution prior to Solr analysis, e.g. in an update processor like > RegexReplaceProcessorFactory. > > Steve > www.lucidworks.com > > > On Jun 18, 2015,
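
To illustrate the point in the reply above (offsets refer to the *original* text even after a character mapping), here is a minimal Python sketch. It is not Lucene's or Solr's implementation; the function names map_chars and tokens_with_original_offsets, the whitespace tokenization, and the sample mapping are assumptions made purely for illustration.

```python
import re


def map_chars(text, mapping):
    """Apply literal substitutions (keys are assumed to be non-empty strings).

    Returns the filtered text and a list `origin` where origin[k] is the
    offset in the original text that filtered character k came from, plus
    one trailing entry equal to len(text).
    """
    keys = sorted(mapping, key=len, reverse=True)  # prefer the longest match
    out, origin, i = [], [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mapping[key])
                # every replacement character points back at the match start
                origin.extend([i] * len(mapping[key]))
                i += len(key)
                break
        else:
            # no mapping applies here: copy the character through unchanged
            out.append(text[i])
            origin.append(i)
            i += 1
    origin.append(len(text))
    return "".join(out), origin


def tokens_with_original_offsets(text, mapping):
    """Whitespace-tokenize the filtered text, reporting original offsets."""
    filtered, origin = map_chars(text, mapping)
    return [
        (m.group(), origin[m.start()], origin[m.end()])
        for m in re.finditer(r"\S+", filtered)
    ]


# The token "and" is three characters long, but its offsets (2, 3) cover
# only the single "&" it replaced in the original text.
print(tokens_with_original_offsets("a & b", {"&": "and"}))
```

In this model the token text changes while the start and end offsets keep pointing into the unmodified input, which is why substituting the text before analysis (as suggested in the reply) is the way to get offsets that match the remapped form.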

MappingCharFilterFactory and start and end offsets

2015-06-18 Thread Dmitry Kan
.jpg Ideally, we would like to have start and end offset respecting the remapped token. Can this be achieved with settings? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: bug in search with sloppy queries

2015-06-15 Thread Dmitry Kan
termStats); } } [/code] as query we get the above structure, from which all terms are extracted without keeping the query structure? Could someone shed light on the logic behind this weight calculation? On Mon, Jun 15, 2015 at 10:23 AM, Dmitry Kan wrote: > To clarify additionally: we use St

Re: bug in search with sloppy queries

2015-06-15 Thread Dmitry Kan
To clarify additionally: we use StandardTokenizer & StandardFilter in front of the WDF. Already following ST's transformations e-tail gets split into two consecutive tokens On Mon, Jun 15, 2015 at 10:08 AM, Dmitry Kan wrote: > Thanks, Erick. Analysis page shows the positions are gro

Re: bug in search with sloppy queries

2015-06-15 Thread Dmitry Kan
"verbose" box checked > and you'll see the position of each token after analysis to see if my guess > is accurate. > > Best, > Erick > > On Sun, Jun 14, 2015 at 4:34 AM, Dmitry Kan wrote: > > Hi guys, > > > > We observe some strange bug in solr

bug in search with sloppy queries

2015-06-14 Thread Dmitry Kan
panNear([Contents:eä, Contents:commerceä], 0, true)], 300, false) This query produces words as hits, like: From E-Tail In the inner spanNear query we expect that e and commerce will occur within 0 slop in that order. Can somebody shed light on what is going on? -- Dmitry Kan Luke Toolbox: http
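
The expectation stated in the message above (two terms within 0 slop, in order) can be sketched as a simplified position model in Python. This is only an assumption about the intended semantics, not Lucene's SpanNearQuery code; the helper names positions and ordered_near are hypothetical.

```python
def positions(tokens, term):
    """Token positions at which `term` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def ordered_near(positions_a, positions_b, slop):
    """Pairs (a, b) where b comes after a with at most `slop` positions
    between them; slop 0 means b sits at the very next position."""
    return [
        (a, b)
        for a in positions_a
        for b in positions_b
        if 0 <= b - a - 1 <= slop
    ]


doc = ["from", "e", "tail"]

# "e" directly followed by "tail" matches with slop 0 ...
print(ordered_near(positions(doc, "e"), positions(doc, "tail"), 0))

# ... while "e" followed by "commerce" has nothing to match in this text.
print(ordered_near(positions(doc, "e"), positions(doc, "commerce"), 0))
```

Under this model a document containing only "From E-Tail" would not satisfy the inner clause, which is what makes the reported hit surprising. Note that the model does not treat tokens sharing the same position as a match, so it says nothing about what an analysis chain that stacks tokens would do.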

storeOffsetsWithPositions does not reflect in the index

2015-05-11 Thread Dmitry Kan
e fine. Any ideas how to make storeOffsetsWithPositions work? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread Dmitry Kan
one after > another > (around 5-10minutes), I start getting many OOM exceptions. > > > Thank you. > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/severe-problems-with-soft-and-hard-commits-in-a-large-index-tp4204068.html > Sent fr

Re: Proximity Search

2015-04-30 Thread Dmitry Kan
> > SolrJ > > > > > Query API? > > > > > > > > > > Thanks & Regards > > > > > Vijay > > > > > > > > > > > > > -- > > > > The contents of this e-mail are confidential and for the exclusive > use > > of > > > > the intended recipient. If you receive this e-mail in error please > > delete > > > > it from your system immediately and notify us either by e-mail or > > > > telephone. You should not copy, forward or otherwise disclose the > > content > > > > of the e-mail. The views expressed in this communication may not > > > > necessarily be the view held by WHISHWORKS. > > > > > > > > > > > -- > > The contents of this e-mail are confidential and for the exclusive use of > > the intended recipient. If you receive this e-mail in error please delete > > it from your system immediately and notify us either by e-mail or > > telephone. You should not copy, forward or otherwise disclose the content > > of the e-mail. The views expressed in this communication may not > > necessarily be the view held by WHISHWORKS. > > > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: payload similarity

2015-04-25 Thread Dmitry Kan
see: > > http://lucidworks.com/blog/end-to-end-payload-example-in-solr/ > > Best, > Erick > > On Fri, Apr 24, 2015 at 6:33 AM, Dmitry Kan wrote: > > Ahmet, exactly. As I have just illustrated with code, simultaneously with > > your reply. Thanks! > > > > On Fri, Apr 2

Re: payload similarity

2015-04-24 Thread Dmitry Kan
Ahmet, exactly. As I have just illustrated with code, simultaneously with your reply. Thanks! On Fri, Apr 24, 2015 at 4:30 PM, Ahmet Arslan wrote: > Hi Dmitry, > > I think, it is activated by PayloadTermQuery. > > Ahmet > > > > On Friday, April 24, 2015 2:51 PM

Re: payload similarity

2015-04-24 Thread Dmitry Kan
eldNorm(doc=1) 1.0 = MaxPayloadFunction.docScore() On Fri, Apr 24, 2015 at 2:50 PM, Dmitry Kan wrote: > Hi, > > > Using the approach here > http://lucidworks.com/blog/getting-started-with-payloads/ I have > implemented my own PayloadSimilarity class. When debugging the code I have > noticed, tha

payload similarity

2015-04-24 Thread Dmitry Kan
TermQuery(new Term("body", "dogs")); termQuery.setBoost(1.1f); TopDocs topDocs = searcher.search(termQuery, 10); printResults(searcher, termQuery, topDocs); [/code] -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter

Re: Odp.: phraseFreq vs sloppyFreq

2015-04-22 Thread Dmitry Kan
1k? > > @LAFK_PL > Original message > From: Dmitry Kan > Sent: Wednesday, April 22, 2015 09:26 > To: solr-user@lucene.apache.org > Reply-To: solr-user@lucene.apache.org > Subject: phraseFreq vs sloppyFreq > > Hi guys. I'm executing the following proximity query: "

phraseFreq vs sloppyFreq

2015-04-22 Thread Dmitry Kan
er phraserFreq increase the final similarity score? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

[ANNOUNCE] Luke 4.10.4 released

2015-03-16 Thread Dmitry Kan
now distributed as a tar.gz with the luke binary and a launcher script. There is currently luke atop apache pivot cooking in its own branch. You can try it out already for some basic index loading and search operations: https://github.com/DmitryKey/luke/tree/pivot-luke -- Dmitry Kan Luke Toolbox

Re: solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-16 Thread Dmitry Kan
nd just to be really clear, you _only_ seeing more segments being > >> added, right? If you're only counting files in the index directory, it's > >> _possible_ that merging is happening, you're just seeing new files take > >> the place of old ones. > >> > >> Best, > >> Erick > >> > >> On Wed, Mar 4, 2015 at 7:12 PM, Shawn Heisey > wrote: > >>> On 3/4/2015 4:12 PM, Erick Erickson wrote: > >>>> I _think_, but don't know for sure, that the merging stuff doesn't get > >>>> triggered until you commit, it doesn't "just happen". > >>>> > >>>> Shot in the dark... > >>> > >>> I believe that new segments are created when the indexing buffer > >>> (ramBufferSizeMB) fills up, even without commits. I'm pretty sure that > >>> anytime a new segment is created, the merge policy is checked to see > >>> whether a merge is needed. > >>> > >>> Thanks, > >>> Shawn > >>> > > > > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: [Poll]: User need for Solr security

2015-03-13 Thread Dmitry Kan
a wildcard search I need to have the "run" in > > "run" match "running", "runner" "runs" etc. Any but trivial encryption > > will break that, and the trivial encryption is easy to break. > > > > So putting all this over an en

Re: [Poll]: User need for Solr security

2015-03-13 Thread Dmitry Kan
"run" in > "run" match "running", "runner" "runs" etc. Any but trivial encryption > will break that, and the trivial encryption is easy to break. > > So putting all this over an encrypting filesystem is an approach > that's often used.
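
The argument quoted above, that anything beyond trivial term-level obfuscation breaks prefix-style matching, can be shown with a small Python sketch. SHA-256 (a one-way hash, not encryption) is swapped in here only as a stand-in for a transformation that does not preserve prefixes; the helper names opaque and prefix_matches are hypothetical and nothing here reflects how Solr stores terms.

```python
import hashlib


def opaque(term):
    """Stand-in for a non-trivial per-term transformation (here: SHA-256)."""
    return hashlib.sha256(term.encode("utf-8")).hexdigest()


def prefix_matches(prefix, terms):
    """Terms that start with `prefix`, as a wildcard query like run* would need."""
    return [t for t in terms if t.startswith(prefix)]


terms = ["run", "running", "runner", "runs", "walk"]

# On plain terms the prefix "run" finds all four related words.
plain_hits = prefix_matches("run", terms)

# After the transformation, only the exact term "run" still matches:
# the digests of "running", "runner" and "runs" share no prefix with it.
opaque_hits = prefix_matches(opaque("run"), [opaque(t) for t in terms])

print(plain_hits)
print(len(opaque_hits))
```

This is why the thread points at encrypting the filesystem underneath the index rather than the indexed terms themselves.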

Re: [Poll]: User need for Solr security

2015-03-12 Thread Dmitry Kan
uture version of > Solr. > Examples: Local user management, AD/LDAP integration, SSL, authenticated > login to Admin UI, authorization for Admin APIs, e.g. admin user vs > read-only user etc > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com

Re: DocumentAnalysisRequestHandler

2015-03-12 Thread Dmitry Kan
Is /analysis/document deprecated in SOLR 5? > >class="solr.DocumentAnalysisRequestHandler" > startup="lazy" /> > > > What is the modern equivalent of Luke? > > Many thanks. > > Philippe > -- Dmitry Ka

Re: Missing doc fields

2015-03-12 Thread Dmitry Kan
":"*","rows":"3","wt":"json"}},"response":{"numFound":160238,"start":0,"docs":[{"id":"10","_version_":1495262519674011648},{"id":"1","_version_":1495262517261238272},{

Re: Missing doc fields

2015-03-11 Thread Dmitry Kan
":true}, > { > "name":"id", > "type":"string", > "multiValued":false, > "indexed":true, > "required":true, > "stored":true}, > { > "name":"ymd", > "type":"tdate", > "indexed":true, > "stored":true}], > > > > Yet, when I display $results in the richtext_doc.vm Velocity template, > documents only contain three fields (id, _version_, score): > > SolrDocument{id=3, _version_=1495262517955395584, score=1.0}, > > > How can I increase the number of doc fields? > > Many thanks. > > Philipppe > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-03-10 Thread Dmitry Kan
This freed up couple dozen GBs on the solr server! On Tue, Feb 17, 2015 at 1:47 PM, Dmitry Kan wrote: > Thanks Toke! > > Now I consistently see the saw-tooth pattern on two shards with new GC > parameters, next I will try your suggestion. > > The current params are:

Re: Conditional invocation of HTMLStripCharFactory

2015-03-02 Thread Dmitry Kan
View this message in context: > http://lucene.472066.n3.nabble.com/Conditional-invocation-of-HTMLStripCharFactory-tp4190010.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blo

Re: [ANNOUNCE] Luke 4.10.3 released

2015-03-01 Thread Dmitry Kan
's versions at same place, as you suggested. > > Thanks, > Tomoko > > 2015-02-26 22:15 GMT+09:00 Dmitry Kan : > > > Sure, it is: > > > > java version "1.7.0_76" > > Java(TM) SE Runtime Environment (build 1.7.0_76-b13) > > Java Ho

Re: [ANNOUNCE] Luke 4.10.3 released

2015-02-26 Thread Dmitry Kan
> Seems something wrong around Pivot's, but I have no idea about it. > Would you tell me java version you're using ? > > Tomoko > > 2015-02-26 21:15 GMT+09:00 Dmitry Kan : > > > Thanks, Tomoko, it compiles ok! > > >

Re: [ANNOUNCE] Luke 4.10.3 released

2015-02-26 Thread Dmitry Kan
> // compile and make jars and run > $ ant dist > ... > BUILD SUCCESSFUL > $ java -cp "dist/*" org.apache.lucene.luke.ui.LukeApplication > ... > > > Thanks, > Tomoko > > 2015-02-26 16:39 GMT+09:00 Dmitry Kan : > > > Hi Tomoko, > > > > Thanks for t

Re: [ANNOUNCE] Luke 4.10.3 released

2015-02-25 Thread Dmitry Kan
s, > Tomoko > > 2015-02-25 18:37 GMT+09:00 Dmitry Kan : > > > Ok, sure. The plan is to make the pivot branch in the current github repo > > and update its structure accordingly. > > Once it is there, I'll let you know. > > > > Thank you, > >

Re: highlighting the boolean query

2015-02-25 Thread Dmitry Kan
>> within the document. Been a while since I dug into the HighlightComponent, >> so maybe there’s some other options available out of the box? >> >> — >> Erik Hatcher, Senior Solutions Architect >> http://www.lucidworks.com <http://www.lucidworks.com/> >

Re: [ANNOUNCE] Luke 4.10.3 released

2015-02-25 Thread Dmitry Kan
long way to go for Pivot's version, of course, I'd like to also > make pull requests to enhance github's version if I can. > > Thanks, > Tomoko > > 2015-02-24 23:34 GMT+09:00 Dmitry Kan : > > > Hi, Tomoko! > > > > Thanks for being a fan of luke

Re: [ANNOUNCE] Luke 4.10.3 released

2015-02-24 Thread Dmitry Kan
't have any opinions, just want to understand current status and avoid > duplicate works. > > Apologize for a bit annoying post. > > Many thanks, > Tomoko > > > > 2015-02-24 0:00 GMT+09:00 Dmitry Kan : > > > Hello, > > > > Luke 4.10.3 has been r

Re: Integration Tests with SOLR 5

2015-02-24 Thread Dmitry Kan
so no local repository cache is present > - How to deploy your schema.xml, stopwords, solr plug-ins etc. for testing > in an isolated environment > - What does a maven boilerplate code look like? > > Any ideas would be appreciated. > > Kind regards, > > Thomas > --

Re: highlighting the boolean query

2015-02-24 Thread Dmitry Kan
oes look like something with the highlighter. Whether other > highlighters are better for this case.. no clue ;( > > Best, > Erick > > On Mon, Feb 23, 2015 at 9:36 AM, Dmitry Kan wrote: > > Erick, > > > > nope, we are using std lucene qparser with some customization

Re: highlighting the boolean query

2015-02-23 Thread Dmitry Kan
Erick, nope, we are using std lucene qparser with some customizations, that do not affect the boolean query parsing logic. Should we try some other highlighter? On Mon, Feb 23, 2015 at 6:57 PM, Erick Erickson wrote: > Are you using edismax? > > On Mon, Feb 23, 2015 at 3:28 AM, D

[ANNOUNCE] Luke 4.10.3 released

2015-02-23 Thread Dmitry Kan
iation changed from ASL 2.0 to ALv2 Thanks to respective contributors! P.S. waiting for lucene 5.0 artifacts to hit public maven repositories for the next major release of luke. -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitt

highlighting the boolean query

2015-02-23 Thread Dmitry Kan
the standard highlighter? Can it be mitigated? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: Internal document format for Solr 4.10.2

2015-02-18 Thread Dmitry Kan
ity to store this internal document in xml format ? > > -- > Best Regards, > Dinesh Naik > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Dmitry Kan
-XX:CMSInitiatingOccupancyFraction=40 Dmitry On Tue, Feb 17, 2015 at 1:34 PM, Toke Eskildsen wrote: > On Tue, 2015-02-17 at 11:05 +0100, Dmitry Kan wrote: > > Solr: 4.10.2 (high load, mass indexing) > > Java: 1.7.0_76 (Oracle) > > -Xmx25600m > > > > > > Solr: 4.3.1 (normal load, no ma

Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Dmitry Kan
> -Xmx25600m > > > > > > > > The RAM consumption remained the same after the load has stopped on > the > > > > 4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via > > > > jvisualvm dropped the used RAM from 8,5G to 0,5G. But t

Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Dmitry Kan
> > > This unusual spike happened during mass data indexing. > > > > What else could be the artifact of such a difference -- Solr or JVM? Can > it > > only be explained by the mass indexing? What is worrisome is that the > > 4.10.2 shard reserves

unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Dmitry Kan
indexing. What else could be the artifact of such a difference -- Solr or JVM? Can it only be explained by the mass indexing? What is worrisome is that the 4.10.2 shard reserves 8 times what it uses. What can be done about this? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http

Re: Weird Solr Replication Slave out of sync

2015-02-17 Thread Dmitry Kan
Because these type > of issues are going to > be hard to find especially when there are no errors. > > What could be happening. and how can I avoid this from happening ? > > > Thanks, > Summer > > -- Dmitry Kan Luke Toolbox: http://github.com/Dmi

Re: ApacheCon 2015 at Austin, TX

2015-02-12 Thread Dmitry Kan
ll be lucene/solr sessions in it. > > Anyone else planning to attend? > > Thanks, > CP > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: WordDelimiterFilterFactory and position increment.

2015-02-04 Thread Dmitry Kan
Filter on query side. > > Regards, > Modassar > > On Fri, Jan 30, 2015 at 5:12 PM, Dmitry Kan wrote: > > > Hi, > > > > Do you use WordDelimiterFilter on query side as well? > > > > On Fri, Jan 30, 2015 at 12:51 PM, Modassar Ather > > &

How deletes affect on QPS

2015-01-31 Thread Dmitry Kan
post is on Lucene level): https://www.elasticsearch.org/blog/lucenes-handling-of-deleted-documents/ -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: solrj returning no results but curl can get them

2015-01-31 Thread Dmitry Kan
ut-curl-can-get-them-tp4183053p4183119.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: WordDelimiterFilterFactory and position increment.

2015-01-30 Thread Dmitry Kan
h per my understanding causes the > > phrase search "3d image" fail. > > "3d image"~1 works fine. Same behavior is present for "wi-fi device" and > > other few queries starting with token which is tokenized as shown above > in > > the table. >

Re: solrj returning no results but curl can get them

2015-01-30 Thread Dmitry Kan
ring and run this command: > > curl "http://myserver/myapp/myproduct\ > > > fl=*,score&rows=500&qt=/myproduct&hl=on&hl.fl=title+snippet&hl.fragsize=50\ >&hl.simple.pre=&hl.simple.post=\ >&a

Re: SOS-help: How to store solr index data in hbase table???

2015-01-26 Thread Dmitry Kan
ve no idea how you would do that. You *can* store your indexes in > HDFS storage, but that's not the same thing. > > https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS > > I have never done this, so I have no idea whether this documentation is > complete. >

groups inside groups

2015-01-15 Thread Dmitry Kan
'UserId'=>68}, { 'UserId'=>68}] }}} Desired output: 'grouped'=>{ 'UserId'=>{ 'matches'=>22154, 'groups'=>[]}, 'Field1:[* TO *] AND Field2:[* TO *]'=>{ '

Re: SegmentInfos exposed to /admin/luke

2014-12-08 Thread Dmitry Kan
ese data. > > We'd be happy to push the changes to Solr afterwards. > > > Thank you, > Alexey Kozhemiakin > > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: Solr slow start up (tlog is small)

2014-11-03 Thread Dmitry Kan
> > > > 2) > > Solr home: 185G > > tlog: 5M > > 17 minutes to start up > > While starting up, disk read is constantly about 5MB/s (according to > > dstat). So it seems that Solr reads 17m * 60s *5MB/s = 5GB of data while > > starting up, which is abo
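The estimate quoted in the snippet above (17 minutes of start-up at a steady 5 MB/s of disk reads) can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name is made up for illustration, and it assumes the read rate reported by dstat stays constant for the whole start-up.

```python
def estimated_startup_read_mb(minutes: float, read_rate_mb_per_s: float) -> float:
    """Approximate volume read (in MB), assuming a constant read rate."""
    return minutes * 60 * read_rate_mb_per_s


# Figures taken from the quoted message: 17 minutes at about 5 MB/s.
total_mb = estimated_startup_read_mb(17, 5)
print(f"{total_mb:.0f} MB (~{total_mb / 1000:.1f} GB)")  # 5100 MB (~5.1 GB)
```

The result, about 5 GB, matches the figure in the message and is small compared with the 185G Solr home reported there.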

Re: dynamically change default update chain

2014-11-03 Thread Dmitry Kan
An update: Another idea comes from Erik Hatcher; sharing it for the benefit of anyone who's interested in the topic: maybe you can make a custom request handler that toggles which is the default chain? On Mon, Nov 3, 2014 at 4:08 PM, Dmitry Kan wrote: > Thanks, Mike, > > we

Re: dynamically change default update chain

2014-11-03 Thread Dmitry Kan
core. > > -Mike > > > > On 11/3/14 6:28 AM, Dmitry Kan wrote: > >> Hello solr fellows, >> >> I'm working on a project that involves using two update chains. One >> default >> chain is used most of the time and another one custom is used >&
