Re: Questions about Solr Search

2020-07-02 Thread Doug Turnbull
ng content using the Knowledge Graph entities. > > > *Your help will be appreciated highly.* > > Many thanks > Gautam Kanaujia > India > -- *Doug Turnbull **| CTO* | OpenSource Connections <http://opensourceconnections.com>, LLC | 240.476.9983 Author: Relevant Search

Re: RankLib model output format to Solr LTR model format

2020-06-17 Thread Doug Turnbull
2020 at 12:46 PM gnandre wrote: > Hi, > > Before I start writing my own implementation for converting RankLib's model > output format to Solr LTR model format for my own use cases, I just wanted > to check if there is any work done on this front already. Any references >

Re: Master Slave Terminology

2020-06-17 Thread Doug Turnbull
> > > > As the Github and Python will replace terminologies that relative to > > > > slavery, > > > > why don't we replace master-slave for Solr as well? > > > > > > > > https://developers.srad.jp/story/18/09/14/0935201/ > > >

Re: Why Did It Match?

2020-05-21 Thread Doug Turnbull
mstadt, Germany and any of its > subsidiaries do not guarantee that this message is free of viruses and does > not accept liability for any damages caused by any virus transmitted > therewith. > > > > Click http://www.merckgroup.com/disclaimer to access the German, French, >

Re: Dynamic Stopwords

2020-05-15 Thread Doug Turnbull
to configure stop words to be dynamic for each > > document > > > based on the language detected of a multilingual text field? Combining > > all > > > languages stop words in one set is a possibility however it introduces > > > false positives for some language

Re: JSON from Term Vectors Component

2020-02-06 Thread Doug Turnbull
# append to list > prev_v.append(v) > else: > # turn into list > new_v = [prev_v, v] > d[k] = new_v > else: > d[k] = v > return d > decoder = JSONDecoder(obj

Re: JSON from Term Vectors Component

2020-02-06 Thread Doug Turnbull
ayout could be redesigned to be more portable. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > On Feb 6, 2020, at 8:38 AM, Doug Turnbull < > dturnb...@opensourceconnections.com> wrote: > > > > T

Re: JSON from Term Vectors Component

2020-02-06 Thread Doug Turnbull
; a > query parameter for this case? > > Regards, > Munendra S N > > > > On Thu, Feb 6, 2020 at 10:01 PM Doug Turnbull < > dturnb...@opensourceconnections.com> wrote: > > > Hi all, > > > > I was curious if anyone had any tips on parsing the JSO

JSON from Term Vectors Component

2020-02-06 Thread Doug Turnbull
0", [ "uniqueKey", "D10", "body", [ "1", [ "positions", [ "position", 92, "position", 113 ] ], "10", [ ... -- *Doug Turnbull **| CTO* | OpenSource Connections <http://opensourceconnections.com>, L
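The response fragment above uses a flat layout in which keys and values alternate inside one JSON array, and a key such as "position" can repeat. Below is a minimal Python sketch of one way to fold that layout into nested dictionaries. The function name `named_list_to_dict` is made up for illustration, and the sketch assumes every nested list is itself an alternating key/value list, which may not hold for every part of a real response.

```python
def named_list_to_dict(flat):
    """Fold a flat [key, value, key, value, ...] list into a nested dict.

    Assumption: every nested list is also an alternating key/value list.
    Repeated keys are collected into a Python list of values.
    """
    result = {}
    # Even slots hold keys, odd slots hold the matching values.
    for key, value in zip(flat[0::2], flat[1::2]):
        if isinstance(value, list):
            value = named_list_to_dict(value)
        if key not in result:
            result[key] = value
        elif isinstance(result[key], list):
            # Key already seen at least twice: keep appending.
            result[key].append(value)
        else:
            # Key seen once before: promote the pair to a list.
            result[key] = [result[key], value]
    return result


# Sample shaped like the fragment quoted in the message.
sample = ["uniqueKey", "D10",
          "body", ["1", ["positions", ["position", 92, "position", 113]]]]
parsed = named_list_to_dict(sample)
print(parsed)
```

Collecting repeated keys into a list keeps both positions, which a plain dict conversion would silently overwrite.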

Re: Haystack CFP is open, come and tell us how you tune relevance for Lucene/Solr

2020-01-27 Thread Doug Turnbull
Lucene/Solr relevance stories. > > Cheers > > Charlie > -- > > Charlie Hull > Flax - Open Source Enterprise Search > > tel/fax: +44 (0)8700 118334 > mobile: +44 (0)7767 825828 > web: www.flax.co.uk > > -- *Doug Turnbull **| CTO* | OpenSource Connections <

Re: Solr is very slow with term vectors

2019-08-11 Thread Doug Turnbull
h request for 100 records. > How do I make it faster to get my results in ms ? > Please respond soon as its lil urgent. > > Note: All my values are stored and indexed. I am not using Solr Cloud. > -- *Doug Turnbull **| CTO* | OpenSource Connections <http://opensourceconnec

Re: Quepid, the relevance testing tool for Solr, released as open source

2019-07-26 Thread Doug Turnbull
> (also particularly pleased to see Luwak, the stored query engine we > built at Flax become part of Lucene - it's a great day for open source!) > > Cheers > > Charlie > > -- > Charlie Hull > Flax - Open Source Enterprise Search > > tel/fax: +44 (0)8700 118334 >

Solr Learning to Rank - Trailing slash in path behavior differences, expected?

2019-06-03 Thread Doug Turnbull
"QTime":0}, "featureStores":["title"]} Feels like this is a really easy mistake to make and lose a lot of time on. But is there a reason this is actually somehow expected behavior? It seems that with a / my request is interpreted as for a feature store (of

Re: regarding debugging solr in eclipse

2019-01-18 Thread Doug Turnbull
& Solutions Architect | OpenSource Connections, > LLC > > | 434.409.2780 > > http://www.opensourceconnections.com > > > -- > Lucene/Solr Search Committer (PMC), Developer, Author, Speaker > LinkedIn: http://linkedin.com/in/davidwsmiley | Boo

Re: Debugging Solr Search results & Issues with Distributed IDF

2019-01-02 Thread Doug Turnbull
parameter b and parameter K1? > > 3. Why there are lots of parameters included in myDoc15 rather than > Doc1? > > Is there any documentations I can refer to understand thesolr query > calculations in depth. > > We are using Solr 6.1in Cloud with 3 zookeepers and 3 maste

Re: Similarity plugins which are normalized

2018-11-29 Thread Doug Turnbull
by the Term Frequency component. Since I am using a threshold to filter the > results for a matched record based off the SOLR score, a somewhat > normalized score is needed. > Are there any similarity classes that are more suitable to my needs? > > Thanks, > Tanu > -- *Do

Re: Flatten term frequency

2018-11-29 Thread Doug Turnbull
I think the similarity way (setting k1 to 0) or a constant score query are probably the best ways. Omitting term freqs and position will also remove positions meaning phrase queries won’t work. This blog article might be useful for your use case. I discuss a similar prob. https://opensourceconnec
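To see why setting k1 to 0 flattens term frequency, here is a small Python sketch of the textbook BM25 term-frequency factor. It is an illustration under stated assumptions, not Lucene's actual code: the function name `bm25_tf_weight` and the default document lengths are invented for the example.

```python
def bm25_tf_weight(tf, k1=1.2, b=0.75, doc_len=100.0, avg_doc_len=100.0):
    """Textbook BM25 term-frequency factor (illustrative, not Lucene's source)."""
    if tf <= 0:
        return 0.0
    # Length normalization: 1.0 when the document has average length.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return (tf * (k1 + 1.0)) / (tf + k1 * length_norm)


# With k1 = 0 the factor reduces to tf / tf, so every match weighs the same.
flat = [bm25_tf_weight(tf, k1=0.0) for tf in (1, 5, 50)]

# With a non-zero k1, more occurrences still earn a higher weight.
default = [bm25_tf_weight(tf) for tf in (1, 5, 50)]
print(flat, default)
```

With k1 at 0 the term is either present or absent as far as this factor is concerned, while positions stay in the index, so phrase queries keep working.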

Re: Boosting score based off a match in a particular field

2018-11-28 Thread Doug Turnbull
class", > "org.apache.solr.common.SolrException", "root-error-class", > "org.apache.solr.search.SyntaxError"], > "msg":"org.apache.solr.search.SyntaxError: > Infinite Recursion detected parsing query > > Thank you, > Tanya >

Re: Boosting score based off a match in a particular field

2018-11-28 Thread Doug Turnbull
The terminology we use at my company is you want to *gate* the effect of boost to only very precise scenarios. A lot of this depends on how your Email and Phone numbers are being tokenized/analyzed (ie what analyzer is on the field type), because you really only want to boost when you have high con

Re: PSA: Activate 2018 videos are now available

2018-11-28 Thread Doug Turnbull
Thanks Alex, and thanks to everyone who was part of organizing the conference! On Wed, Nov 28, 2018 at 12:28 PM Alexandre Rafalovitch wrote: > For all those who wanted to be at the conference for the talks :-) but > could not: > > https://www.youtube.com/watch?v=Hm98XL0Mw5c&list=PLU6n9Voqu_1HW8-

Haystack Relevance Conference Announced; CFP ends Jan 9!

2018-11-27 Thread Doug Turnbull
Hey everyone, Many of you may know about/have been to Haystack - The Search Relevance Conference. http://haystackconf.com We're excited to announce 2019's Haystack, April 22-25 in Charlottesville, VA, USA. Our CFP is due January 9th. We want to bring together practitioners that work on really inter

Re: Synonyms relationships

2018-10-31 Thread Doug Turnbull
Synonyms in Solr are really a kind of "programmer's" tool, useful for mapping terms to other terms. This need not correspond to linguistic notions of a synonym or hypernymy/hyponymy. That being said, there's probably half a dozen approaches for doing these kinds of taxonomical relationships in Solr

Re: Integrating word2vec and glove results into Solr

2018-10-30 Thread Doug Turnbull
You may already know this, but just be very careful. Embeddings are useful, and people often think of them as detecting synonyms, but they really just encode contexts. For example, antonyms and words with similar functions often are seen as similar. There's also issues with terms that occur in sparsely

Re: Storing & using feature vectors

2018-10-19 Thread Doug Turnbull
This is a pretty big hole in Lucene-based search right now that many practitioners have struggled with. I know a couple of people who have worked on solutions. And I've used a couple of hacks: - You can hack together something that does cosine similarity using the term frequency & query boosts Del

Re: MLT in Cloud Mode - Not Returning Fields?

2018-09-07 Thread Doug Turnbull
/handler/component/MoreLikeThisComponent.java#L342 Happy to create a Jira ticket -Doug On Mon, Sep 3, 2018 at 5:23 AM Charlie Hull wrote: > On 31/08/2018 19:36, Doug Turnbull wrote: > > Hello, > > > > We're working on a Solr More Like This project (Solr 6.6.2), usin

Re: MLT in Cloud Mode - Not Returning Fields?

2018-09-03 Thread Doug Turnbull
Thanks Charlie, those are helpful. I think at this point we will attach a debugger and see what shakes out. Perhaps it's one of these cases you list. Perhaps we're missing something. We'll report back. -Doug On Mon, Sep 3, 2018 at 5:23 AM Charlie Hull wrote: > On 31/0

MLT in Cloud Mode - Not Returning Fields?

2018-08-31 Thread Doug Turnbull
Hello, We're working on a Solr More Like This project (Solr 6.6.2), using the More Like This searchComponent. What we note is in standalone Solr, when we request MLT using the search component, we get every more like this document fully formed with complete fields in the moreLikeThis section. In

Re: Boost matches occurring early in the field (offset)

2018-08-29 Thread Doug Turnbull
You can also insert a token at the beginning of the query during analysis using a char filter. I call these sort of boundary tokens "sentinel tokens". So a phrase search for "red shoes" becomes " red shoes". You can add some slop to allow for permissible distance (with You can also use the Limit T

Re: Contextual Synonym Filter

2018-08-17 Thread Doug Turnbull
Would one option be to change the query analyzer at query time? The Match Query Parser (https://github.com/o19s/match-query-parser), would let you do this -Doug On Fri, Aug 17, 2018 at 8:04 AM Vergantini Luca wrote: > I need to create a contextual Synonym Filter: > > > > I need that the Synonym

Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Doug Turnbull
Also share your fieldType settings for myfield as well from your schema On Wed, Aug 15, 2018 at 8:00 PM Doug Turnbull < dturnb...@opensourceconnections.com> wrote: > Aside from the screenshot issue, one thing to check: are you searching > with defType=edismax ? > > As in >

Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Doug Turnbull
Aside from the screenshot issue, one thing to check: are you searching with defType=edismax ? As in q=lcd&qf=myfield&sow=false&defType=edismax ? Also sow=false should be the default on Solr 7 and above Doug On Wed, Aug 15, 2018 at 6:27 PM Roy Lim wrote: > I'm trying to figure out why the m
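For anyone reproducing the check above from a script, here is a minimal Python sketch that assembles the same request parameters into a URL. The host, port, and collection name (`localhost:8983`, `mycollection`) and the helper name `build_select_url` are placeholders, not values from the thread.

```python
from urllib.parse import urlencode


def build_select_url(base_url, params):
    """Join a Solr collection URL and query parameters into a /select URL."""
    return f"{base_url}/select?{urlencode(params)}"


# Parameters copied from the message: edismax parser, one query field,
# and sow=false so whitespace splitting is left to the field's analyzer.
params = {"q": "lcd", "qf": "myfield", "sow": "false", "defType": "edismax"}

# Hypothetical host and collection name.
url = build_select_url("http://localhost:8983/solr/mycollection", params)
print(url)
```

Adding `debugQuery=true` to the parameters is one way to inspect the parsed query and confirm which parser handled the request.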

Solr Relevance Engineer Training, Sept 25 & 26

2018-08-03 Thread Doug Turnbull
Hey everyone, Many may know me, in the words of Will Hayes, as "Mr. Relevance". I'm the author of the book Relevant Search, and a prolific blogger about all things Solr relevance. I want to share that I'll be running a Solr 'Think Like a Relevance Engineer' course Sept 25 and 26 <

Re: common ecommerce use case

2018-07-07 Thread Doug Turnbull
I discuss nearly this exact use case in my Lucene Rev talk 'Taxonomical Semantical Magical Search' https://www.youtube.com/watch?v=90F30PS-884 A common theme in my work with search UX is that users (especially in B2C contexts) increasingly don't want to use filters/facets. They want highly relevant,

Re: Remove schema.xml in favor of managed-schema

2018-06-19 Thread Doug Turnbull
I actually prefer the classic config-files approach over managed schemas. Having done both Elasticsearch (where everything is configured through an API), managed and non-managed Solr, I prefer the legacy non-managed Solr way of doing things when it's possible - With 'managed' approaches, the config c

Re: Remove schema.xml in favor of managed-schema

2018-06-16 Thread Doug Turnbull
I'm not sure changing something from mutable -> unmutable means it suddenly becomes hand-editable. I don't know the details here, but I can imagine a case that unmutable implies some level of consistency, where the file is hashed, and later might be confirmed to still be the same 'unmutable' state

Re: Introducing a stopword in a query causes ExtendedDismaxQueryParser to produce a radically different parsed query

2018-05-02 Thread Doug Turnbull
This is a problem that we’ve noted too. This blog post discusses the underlying cause https://opensourceconnections.com/blog/2018/02/20/edismax-and-multiterm-synonyms-oddities/ Hope that helps On Wed, May 2, 2018 at 3:07 PM Chris Wilt wrote: > I began with a 7.2.1 solr instance using the techpr

Re: Team please help

2018-04-29 Thread Doug Turnbull
Morphlines is a cloudera specific tool. I suspect moving Solr platforms will require you to rework your indexing somewhat. You may need to step back and think about the requirements of what you’re doing and design how it would work with Solr/Azure tooling. On Sat, Apr 28, 2018 at 8:58 PM Erick Eric

Re: Search Analytics Help

2018-04-27 Thread Doug Turnbull
I hadn't time to publish it. > > > > Another option is use only logstash to feed e.g. graphite database and > > show results with grafana or combine all these options. > > > > You can also monitor SOLR instances by JMX logstash input plugin. > > > > R

Re: Search Analytics Help

2018-04-26 Thread Doug Turnbull
Honestly I haven’t seen anything satisfactory (yet). It’s a huge need in the open source community On Thu, Apr 26, 2018 at 3:38 PM Ennio Bozzetti wrote: > Hello, > > I'm setting up SOLR on an internal website for my company and I would like > to know if anyone can recommend an analytics that I c

Re: Loss of "Optimise" button

2018-04-21 Thread Doug Turnbull
I haven’t tracked this change, but can you still optimize through the API? Here’s an example using update XML https://stackoverflow.com/questions/6954358/how-to-optimize-solr-index There are so many cases hitting “optimize” causes a huge segment merge that brings down a Solr cluster that I think

Re: ZK CLI script giving IOException doing upconfig

2018-04-05 Thread Doug Turnbull
der too if there's anything that can be changed in zkcli to see if the confdir is a reasonable configuration directory? Thanks -Doug On Wed, Apr 4, 2018 at 3:51 PM Shawn Heisey wrote: > On 4/4/2018 12:13 PM, Doug Turnbull wrote: > > Thanks for the responses. Yeah I thought th

Re: ZK CLI script giving IOException doing upconfig

2018-04-04 Thread Doug Turnbull
:] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /0:0:0:0:0:0:0:1:55079 which had sessionid 0x10024db7e2800 On Wed, Apr 4, 2018 at 11:15 AM Shawn Heisey wrote: > On 4/4/2018 7:14 AM, Doug Turnbull wrote: > > I've been st

ZK CLI script giving IOException doing upconfig

2018-04-04 Thread Doug Turnbull
I've been struggling to do a basic upconfig both with embedded and actual Zookeeper in Solr 7.2.1 using the zkcli script on OSX. One variable, I recently upgraded to Java 9. I get slightly different errors on Java 8 vs 9 This is probably me being dumb, but googling / searching Jira hasn't really

Haystack - Search Relevance Conf - Agenda Announced

2018-02-16 Thread Doug Turnbull
Come to the hometown of some Solr gurus like Eric Pugh, Erik Hatcher, and Doug Turnbull for our search relevance conference :) Check out the agenda. Lots of good Solr talks from people doing innovative work at places like Wikimedia Foundation, Elsevier, Snagajob, LexisNexis, Lucidworks, Elastic

Re: Fusion or DIY w/Solr?

2018-02-06 Thread Doug Turnbull
using Solr On Tue, Feb 6, 2018 at 2:30 PM Doug Turnbull < dturnb...@opensourceconnections.com> wrote: > Used Fusion an a couple projects, > > *Pros:* > *-* Wraps Solr, so you should be able to do anything you can do in Solr > in Fusion > - 'Opinionated' Solr - Ann

Re: Fusion or DIY w/Solr?

2018-02-06 Thread Doug Turnbull
Used Fusion on a couple of projects, *Pros:* *-* Wraps Solr, so you should be able to do anything you can do in Solr in Fusion - 'Opinionated' Solr - Annoying problems I see on every team, solved with common tooling (experiments, signal collection, etc). Can save tons of work. - Relevance focused - g

Re: Haystack: The Search Relevance & Cognitive Search Conference

2018-01-12 Thread Doug Turnbull
- Doug On Fri, Dec 8, 2017 at 3:27 PM Doug Turnbull < dturnb...@opensourceconnections.com> wrote: > Join us at Haystack, April 10 & 11 where we discuss advanced technical > topics on search relevance and cognitive search! We'll discuss topics on > applied relevance

Re: request dependent analyzer

2017-12-18 Thread Doug Turnbull
Yes I would like to get around to implementing that. You might find our match query parser useful for selecting analyzers at query time https://github.com/o19s/match-query-parser -- Consultant, OpenSource Connections. Contact info at http://o19s.com/about-us/doug-turnbull/; Free/Busy (http

Re: Any Insights SOLR Rank tuning tool

2017-12-13 Thread Doug Turnbull
> www.wshein.com/contact-us > > > From: Doug Turnbull > Sent: Wednesday, December 13, 2017 3:47 PM > To: solr-user@lucene.apache.org > Cc: Sherrill, Marty; Lu, David T. > Subject: Re: An

Re: Any Insights SOLR Rank tuning tool

2017-12-13 Thread Doug Turnbull
Doug Turnbull < dturnb...@opensourceconnections.com> wrote: > There's a lot of tools out there that target different audiences > > Splainer - Open source, dev focused, but helps business users understand > results > http://splainer.io > > Quepid - Product (our compan

Re: Any Insights SOLR Rank tuning tool

2017-12-13 Thread Doug Turnbull
igrate from FAST ESP to SOLR. > I was just wondering if you Guys have any built-in Relevancy tool for the > Business Folks like what we have in FAST called SBC (Search Business > Center)? > > Thanks, Abhi >

Haystack: The Search Relevance & Cognitive Search Conference

2017-12-08 Thread Doug Turnbull
1 Click here to learn more. CFPs Needed! http://o19s.com/haystack Best -Doug

Re: Calling rest API from Solr custom tokenizer plugin

2017-12-06 Thread Doug Turnbull
; planning to not to customize manifold cf. > > Please suggest > > > > Regards, > Sreenivas >

Re: Skewed IDF in multi lingual index, again

2017-12-05 Thread Doug Turnbull
> Alessandro Benedetti > Search Consultant, R&D Software Engineer, Director > Sease Ltd. - www.sease.io > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >

Re: Skewed IDF in multi lingual index, again

2017-12-05 Thread Doug Turnbull
; > OR > > > > maxDocs(index)= max # of documents that appeared in the index ( field > > independent) > > The latter. > I imagine that's why docCount was introduced (to avoid changing the > meaning of an existing term). > FWIW, the scoring change was made in >

Re: Possible to disable SynonymQuery and get legacy behavior?

2017-11-21 Thread Doug Turnbull
ely on this technique. > > I've been going through QueryBuilder and I don't see where we could go > back to the legacy behavior. It seems to be based on position overlap. > > Thanks! > -Doug > > > > -- > Consultant, OpenSource Connections. Contact inf

Possible to disable SynonymQuery and get legacy behavior?

2017-11-21 Thread Doug Turnbull
Consultant, OpenSource Connections. Contact info at http://o19s.com/about-us/doug-turnbull/; Free/Busy (http://bit.ly/dougs_cal)

Re: vespa

2017-10-06 Thread Doug Turnbull
oes anyone know more about the framework? does it provide a new way to do > search? how does it compare with Solr? > > https://github.com/vespa-engine/vespa > http://vespa.ai > >

Re: edismax-with-bq-complexphrase-query not working in solrconfig.xml search handler

2017-09-04 Thread Doug Turnbull
her-parsers > > > stipulates > some required "escaping": Special care has to be given when escaping: > clauses between double quotes (usually whole query) is parsed twice, these > parts have to be escaped as twice. eg "foo\\: bar\\^" > > hence, it is possible that I have not used the proper escaping syntax. > > It is troubling that I cannot use the same URL parameter expression in a > search handler to accomplish the same effect, a strong assumption of mine > in how Solr can be used. > > Any suggestion, comment, or similar experience? Does it look like a bug? > > Thank you, > Bertrand >

Re: "What is Solr" in Google search results

2017-08-30 Thread Doug Turnbull
pular is not the right > > answer. > > > > So I want inform the community and search for an advice, if any, how to > > have a better description in the Google results page. > > > > If you have any comments or questions, please let me know. > > > > Best regards, > > Vincenzo > > > > > > -- > > Vincenzo D'Amore > > email: v.dam...@gmail.com > > skype: free.dev > > mobile: +39 349 8513251 > > >

Re: LambdaMART XML model to JSON

2017-07-23 Thread Doug Turnbull
Yes you're correct that the feature is the 1-based identifier from your training data. For a script. Not one to Solr exactly, but when developing the Elasticsearch plugin, I started to work on a JSON serialization format, and as part of that built a Python script for reading the Ranklib XML and ou

Re: Solr Web Crawler - Robots.txt

2017-06-01 Thread Doug Turnbull
Scrapy is fantastic, and I use it to scrape search results pages for clients to take quality snapshots for relevance work. Ignoring robots.txt sometimes legitimately comes up because a staging site might be telling Google not to crawl but doesn't care about a developer crawling for internal purposes. Doug On T

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Doug Turnbull
David I think it can be done, but a score has no real *meaning* to your domain other than the one you engineer into it. There's no 1-100 scale that guarantees at 100 that your users will love the results. Solr isn't really a turn key solution. It requires you to understand more deeply what relevan

Re: The downsides of not splitting on whitespace in edismax (the old albino elephant prob)

2017-03-29 Thread Doug Turnbull
ax queries: > > >> (title:albino | title:albino) OR (text:elephant | text:elephant) > > > This should instead be: > > (title:albino | text:albino) OR (title:elephant | text:elephant) > > -- > Steve > www.lucidworks.com > > > On Mar 29, 2017, at 10:49 AM, Do

Re: The downsides of not splitting on whitespace in edismax (the old albino elephant prob)

2017-03-29 Thread Doug Turnbull
f I'm wrong here) as opposed to per-term. If I understand this correctly, you may run into a different set of problems along the albino elephant spectrum when sow=true On Wed, Mar 29, 2017 at 10:45 AM Doug Turnbull < dturnb...@opensourceconnections.com> wrote: > So with regards to th

The downsides of not splitting on whitespace in edismax (the old albino elephant prob)

2017-03-29 Thread Doug Turnbull
So with regards to this JIRA (https://issues.apache.org/jira/browse/SOLR-9185), which makes Solr splitting on whitespace optional: I want to point out that there's not a simple fix to multi-term synonyms in part because of specific tradeoffs. Splitting on whitespace is *sometimes a good thing*. Not

Re: Multi word synonyms

2017-03-27 Thread Doug Turnbull
Fantastic! On Mon, Mar 27, 2017 at 9:56 AM alessandro.benedetti wrote: > In addition to what Doug has already pointed out, i would like to highlight > this contribution in Solr 6.5.0 . > It may seem like a small innocent patch but it actually open a new worlds > for one of the most controversi

Re: Multi word synonyms

2017-03-26 Thread Doug Turnbull
You might have stumbled on all these articles, but you can probably read our org's progression with this problem as a play in 3 acts Act I Introducing the characters http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so-hard-in-solr/ Act II Heroes Meet Despair http://open

Re: Announcing Marple, a RESTful API & GUI for inspecting Lucene indexes

2017-02-27 Thread Doug Turnbull
Marple looks great, and if you want to work on really interesting Solr problems, I can heartily vouch for a career in consulting and recommend Flax as a great firm to work for! Best -Doug On Fri, Feb 24, 2017 at 12:26 PM Charlie Hull wrote: > On 24/02/2017 17:24, Charlie Hull wrote: > > Hi all,

Re: CPU Intensive Scoring Alternatives

2017-02-21 Thread Doug Turnbull
With that many documents, why not start with an AND search and reissue an OR query if there's no results? My strategy is to prefer an AND for large collections (or a higher mm than 1) and prefer closer to an OR for smaller collections. -Doug On Tue, Feb 21, 2017 at 1:39 PM Fuad Efendi wrote: >
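The AND-first, OR-fallback strategy described above can be sketched in a few lines of Python. This is a hedged illustration: `run_query` stands for whatever function sends parameters to Solr and returns the matching documents, `search_with_fallback` and `fake_run_query` are names invented here, and `q.op` is used as the switch between the two modes.

```python
def search_with_fallback(run_query, user_query):
    """Try a strict AND query first; reissue as OR only if nothing matched.

    run_query is any callable taking a dict of request parameters and
    returning a list of documents (hypothetical, supplied by the caller).
    """
    strict = run_query({"q": user_query, "q.op": "AND"})
    if strict:
        return strict, "AND"
    relaxed = run_query({"q": user_query, "q.op": "OR"})
    return relaxed, "OR"


def fake_run_query(params):
    """Stand-in for a real Solr call: AND finds nothing, OR finds two docs."""
    canned = {"AND": [], "OR": ["doc1", "doc7"]}
    return canned[params["q.op"]]


docs, mode = search_with_fallback(fake_run_query, "red shoes")
print(mode, docs)
```

The second request is issued only on an empty strict result, so the common case on a large collection costs a single cheap AND query.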

Re: How to combine third party search data as top results ?

2017-02-01 Thread Doug Turnbull
I was going to say what Charlie said! I would trust Flax's work in this area :) -Doug On Wed, Feb 1, 2017 at 3:10 PM shamik wrote: > Charlie, thanks for sharing the information. I'm going to take a look and > get > back to you. > > > > -- > View this message in context: > http://lucene.472066.n

Re: Feedback on Match Query Parser (for fixing multiterm synonyms and other things)

2017-01-24 Thread Doug Turnbull
>Alex. > P.s. Cool enough for http://solr.cool/ ? > > Newsletter and resources for Solr beginners and intermediates: > http://www.solr-start.com/ > > > On 2 September 2016 at 07:45, Doug Turnbull > wrote: > > I wanted to solicit feedback on my query parser, the mat

Re: Apache Solr Question

2016-11-03 Thread Doug Turnbull
For general search use cases, it's generally not a good idea to index giant documents. A relevance score for an entire book is generally less meaningful than if you can break it up into chapters or sections. Those subdivisions are often much more useful to a user from a usability standpoint for und

Re: Public/Private data in Solr :: Metadata or ?

2016-10-18 Thread Doug Turnbull
You might want to talk to Kevin Waters or look at some of the work being done with the graph plugin. It's being used to model permissions with Solr. It's a bit of normalization within Solr whereby you could localize updates to a users shared-with document. Kevin can probably talk more intelligently

Re: Stream expressions: Break up multivalue field into usable tuples

2016-10-08 Thread Doug Turnbull
replace(field, null, withValue=item))) There is a ticket open to have scoreNodes operate directly on the facet() function so you don't have to deal with the select() function. https://issues.apache.org/jira/browse/SOLR-9537. I'd like to get to this soon. Joel Berns

Stream expressions: Break up multivalue field into usable tuples

2016-09-22 Thread Doug Turnbull
I have documents like the following in my search index { "shopper_id": 1234, "basket_id": 2512, "items_bought": ["eggs", "tacos", "nachos"] } { "shopper_id": 1236, "basket_id": 2515, "items_bought": ["eggs", "tacos", "chicken", "bubble gum"] } I would like to use some of the stream expr
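To make the goal in the subject line concrete, here is a plain-Python sketch of what breaking a multi-valued field into one tuple per value would produce for these two documents. It does not use streaming expressions at all; the helper name `explode` is invented here, and the sketch only illustrates the desired output shape.

```python
baskets = [
    {"shopper_id": 1234, "basket_id": 2512,
     "items_bought": ["eggs", "tacos", "nachos"]},
    {"shopper_id": 1236, "basket_id": 2515,
     "items_bought": ["eggs", "tacos", "chicken", "bubble gum"]},
]


def explode(docs, field):
    """Yield one flat tuple (dict) per value of a multi-valued field."""
    for doc in docs:
        for value in doc[field]:
            # Copy the single-valued fields, then attach one item.
            row = {k: v for k, v in doc.items() if k != field}
            row[field] = value
            yield row


tuples = list(explode(baskets, "items_bought"))
for row in tuples:
    print(row)
```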

Re: Feedback on Match Query Parser (for fixing multiterm synonyms and other things)

2016-09-02 Thread Doug Turnbull
u could also query multiple > fields at the same time, to give more edismax-like functionality. In fact, > you could probably extend this slightly to almost entirely replace edismax, > by allowing multiple fields and multiple analysis paths. > > Alan Woodward > www.flax.co.uk

Feedback on Match Query Parser (for fixing multiterm synonyms and other things)

2016-09-01 Thread Doug Turnbull
I wanted to solicit feedback on my query parser, the match query parser ( https://github.com/o19s/match-query-parser). It's a work in progress, so any thoughts from the community would be welcome. The point of this query parser is that it's not a query parser! Instead, it's a way of selecting any

Re: [ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-23 Thread Doug Turnbull
Thanks John, and Mary Joe Yeah it's definitely more about "relevance" than ES or Solr. So the choice in search engine is more an implementation detail. We chose ES because it's more book/educational friendly, not necessarily because it's the best choice as a search engine. Its query language is c

Re: [ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-23 Thread Doug Turnbull
For those who can't get enough of me and John, you can see us live at 2PM ET today talk about the book. Come bring your questions! :) https://blab.im/matthew-l-overstreet-relevant-search-and-building-a-search-practice-jfgn2g On Wed, Jun 22, 2016 at 8:45 AM Doug Turnbull <

Re: [ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-22 Thread Doug Turnbull
t's very well-written, and definitely worth the read. > > Congrats again, guys. > > Trey Grainger > Co-author, Solr in Action > SVP of Engineering @ Lucidworks > > On Tue, Jun 21, 2016 at 2:12 PM, Doug Turnbull < > dturnb...@opensourceconnections.com> wrote:

Re: [ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-21 Thread Doug Turnbull
he mailing list there!! Best -Doug On Tue, Jun 21, 2016 at 2:16 PM Will Hayes wrote: > W00t! Congrats! > On Jun 21, 2016 8:12 PM, "Doug Turnbull" < > dturnb...@opensourceconnections.com> wrote: > > > Not much more to add than my post here! This book

[ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-21 Thread Doug Turnbull
Not much more to add than my post here! This book is targeted towards Lucene-based search (Elasticsearch and Solr) relevance. Announcement with discount code: http://opensourceconnections.com/blog/2016/06/21/relevant-search-published/ Related hacker news thread: https://news.ycombinator.com/item?

Re: Solutions for Multi-word Synonyms

2016-06-09 Thread Doug Turnbull
Mary Jo, Honestly half the time I run into this problem, I end up creating a QParserPlugin because I need to do something specific. With a QParserPlugin I can run whatever analysis, slicing and dicing of the query string to manually construct whatever I need to http://www.supermind.org/blog/1134/

Re: Stemming Help

2016-06-05 Thread Doug Turnbull
What output are you seeing exactly from the analysis UI? It's also interesting you're not lowercasing after tokenization. On Sun, Jun 5, 2016 at 10:42 AM Georg Sorst wrote: > Without having more context: > > How do you know that it is not working? > What is the output you are getting in the anal

Re: Boost(bf) function in solr

2016-05-30 Thread Doug Turnbull
Let's say you're building search for your blog. If popularity is, say, number of page views, then a handful might have a million (they made it to Hacker News and Slashdot). A few dozen may have hundreds of thousands (they only made it to Slashdot). The vast majority might have less than 100 page views
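The message is cut off before any recommendation, so the following rests on an assumption: one common way to tame a distribution like this is to boost on the logarithm of page views rather than the raw count. The Python sketch below shows the effect; the function name `popularity_boost` and the specific view counts are illustrative only.

```python
import math


def popularity_boost(page_views):
    """Compress raw page views onto a log scale (+1 avoids log of zero)."""
    return math.log10(page_views + 1)


# Illustrative counts matching the three groups described in the message.
views = {"viral post": 1_000_000, "popular post": 300_000, "typical post": 80}
boosts = {name: popularity_boost(count) for name, count in views.items()}

for name, boost in boosts.items():
    print(f"{name}: {views[name]} views -> boost {boost:.2f}")
```

On the raw scale the viral post outweighs a typical one by more than ten thousand times; on the log scale the gap is roughly threefold, so popularity nudges the ranking instead of overwhelming text relevance.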

Re: deactivate coord scoring factor in pf2 pf3

2016-04-29 Thread Doug Turnbull
rc/java/org/apache/solr/search/ExtendedDismaxQParser.java -Doug On Thu, Apr 28, 2016 at 2:05 PM Doug Turnbull < dturnb...@opensourceconnections.com> wrote: > Glad to see you're using http://splainer.io! I recognize those explains! > (let me know if you have any ideas/thoughts/questi

Re: deactivate coord scoring factor in pf2 pf3

2016-04-28 Thread Doug Turnbull
Glad to see you're using http://splainer.io! I recognize those explains! (Let me know if you have any ideas/thoughts/questions/criticisms; I created the thing.) Some thoughts - You might consider using ps2 or ps3 to add slop to the two-word and three-word phrase searches. Slop adds a less strict
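To make the ps2/ps3 suggestion concrete, here is a sketch of the request parameters involved. The parameter names (`pf2`, `pf3`, `ps2`, `ps3`) are eDisMax parameters. The query text, the field names `title` and `body`, and the slop values are illustrative assumptions, not taken from the thread.

```python
from urllib.parse import urlencode

# eDisMax request parameters; field names and values are hypothetical.
params = {
    "defType": "edismax",
    "q": "apache solr relevance",
    "qf": "title body",   # fields searched term by term
    "pf2": "title",       # boost documents where adjacent word pairs match as phrases
    "pf3": "title",       # same idea for word triples
    "ps2": 1,             # slop allowed for the pf2 phrase queries
    "ps3": 2,             # slop allowed for the pf3 phrase queries
}

query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to a select handler URL on your own Solr host and collection.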

Re: need help with keyword spamming

2016-04-23 Thread Doug Turnbull
By keyword spamming, do you mean stuffing the same term over and over to game term frequency? If so, you might want to try tuning BM25 similarity for your needs. It has a saturation point for term frequency. http://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-releva
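The saturation behavior can be seen by computing just the term-frequency part of the standard BM25 formula. This is a sketch of the textbook formula, not Lucene's actual code; `bm25_tf` is a hypothetical helper, and the document lengths are made-up values. The defaults k1 = 1.2 and b = 0.75 match Lucene's BM25Similarity defaults.

```python
def bm25_tf(tf: float, k1: float = 1.2, b: float = 0.75,
            doc_len: float = 100.0, avg_doc_len: float = 100.0) -> float:
    """Term-frequency component of BM25 (IDF omitted).

    As tf grows, the value approaches k1 + 1 and never exceeds it, so
    repeating a term many times yields diminishing returns.
    """
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + length_norm)


# A keyword-stuffed document gains little over a modest one.
for tf in (1, 2, 5, 10, 100):
    print(tf, round(bm25_tf(tf), 3))
```

Lowering k1 makes the curve flatten sooner, which is the tuning knob most relevant to keyword stuffing.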

Re: ConcurrentUpdateSolrClient Invalid version (expected 2, but 60) or the data in not in 'javabin' format

2016-04-22 Thread Doug Turnbull
Joe, this might be _version_ as in Solr's optimistic concurrency, used in atomic updates, etc.: http://yonik.com/solr/optimistic-concurrency/ On Fri, Apr 22, 2016 at 5:24 PM Joe Lawson < jlaw...@opensourceconnections.com> wrote: > I'm updating from a basic Solr Client to the ConcurrentUpdateSolrClie

Re: Live Podcast on Solr 6 with Yonik and Erik Hatcher (Today, 2pm ET)

2016-04-20 Thread Doug Turnbull
Thanks to those that watched live. If you missed it, here's the audio recording if you'd like to listen in http://opensourceconnections.com/blog/2016/04/19/solr-6-release/ Best -Doug On Tue, Apr 19, 2016 at 12:32 PM Doug Turnbull < dturnb...@opensourceconnections.com> wrote

Re: Live Podcast on Solr 6 with Yonik and Erik Hatcher (Today, 2pm ET)

2016-04-19 Thread Doug Turnbull
hat-s-new > > -Yonik > > > On Tue, Apr 19, 2016 at 10:37 AM, Doug Turnbull > wrote: > > Hey Solristas: > > > > We do a regular podcast called Search Disco > > <http://opensourceconnections.com/podcast>. Today we'll be discussing > the >

Live Podcast on Solr 6 with Yonik and Erik Hatcher (Today, 2pm ET)

2016-04-19 Thread Doug Turnbull
in past episodes, check them out <http://opensourceconnections.com/podcast>. -Doug Turnbull http://opensourceconnections.com

Re: Cannot use Phrase Queries in eDisMax and filtering

2016-04-18 Thread Doug Turnbull
Also, you mentioned your field was a string? This means the field must match *exactly* to be considered a phrase match. Have you considered changing the field to a text field type with a tokenizer and doing phrase matching? It might work more like you'd expect. Thanks -Doug On Mon, Apr 18, 2016 at
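The difference between the two field types can be imitated in a few lines. This is a toy model, not Solr's analysis chain: a string field is treated as one untokenized value, while the "text" field is approximated with lowercasing and whitespace splitting. All function names here are hypothetical.

```python
def string_field_match(field_value: str, query: str) -> bool:
    """A string field is a single token: only the exact whole value matches."""
    return field_value == query


def tokenize(text: str) -> list[str]:
    """Crude stand-in for a tokenizer plus lowercase filter."""
    return text.lower().split()


def text_field_phrase_match(field_value: str, phrase: str) -> bool:
    """True if the phrase tokens occur consecutively and in order (slop 0)."""
    doc, query = tokenize(field_value), tokenize(phrase)
    if not query:
        return False
    return any(doc[i:i + len(query)] == query
               for i in range(len(doc) - len(query) + 1))


title = "Apache Solr in Action"
print(string_field_match(title, "solr in action"))       # exact match only
print(text_field_phrase_match(title, "solr in action"))  # phrase within tokens
```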

Re: what is opening realtime Searcher

2016-04-18 Thread Doug Turnbull
Erick can correct me. I think "searcher" here might just sound a bit misleading. Real time get is really about fetching by id, not issuing searches per-se. Only after a soft or hard commit does a document truly become searchable. On Mon, Apr 18, 2016 at 8:02 PM Erick Erickson wrote: > This is ab

Re: Solr Support for BM25F

2016-04-18 Thread Doug Turnbull
It's worth adding that Lucene's BlendedTermQuery (used in Elasticsearch's cross_fields search) attempts to blend the fields' document frequencies together. So I wonder what BlendedTermQuery plus per-field BM25 similarity would do? It might be close to true BM25F aside from the length issue. (You'd have

Re: Solr Support for BM25F

2016-04-14 Thread Doug Turnbull
ly appreciated. > > Regards, > > David > > Current Solr Version 5.4.1 > -- *Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections <http://opensourceconnections.com>, LLC | 240.476.9983 Author: Relevant Search <http://manning.com/turnbull> This e-ma

Re: understand scoring

2016-03-01 Thread Doug Turnbull
t; -- > View this message in context: > http://lucene.472066.n3.nabble.com/understand-scoring-tp4260837p4260860.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- *Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections <http://opensourcecon

Re: understand scoring

2016-03-01 Thread Doug Turnbull
LguNgAjFMbigbW_VqO4Z-YpMxBGWUc7-T3q25XnFyeijoNzY_Fi6gRzhs&sz=s0-l75&ats=1456852075298&rm=153332681af9c93f&zw > > I expected that the order would be 1,3,2 (because 1 is the shortest field [4 > words], and 3 before 2 because of the distance between the words...) > Thank you

Re: What search metrics are useful?

2016-02-24 Thread Doug Turnbull
I would also point you at many of Mr. Underwood's blog posts, as they have helped me quite a bit :) http://techblog.chegg.com/2012/12/12/measuring-search-relevance-with-mrr/ On Wed, Feb 24, 2016 at 11:37 AM, Doug Turnbull < dturnb...@opensourceconnections.com> wrote: > For rele
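For readers unfamiliar with the metric named in that link, mean reciprocal rank (MRR) averages 1/rank of the first relevant result over a set of queries. A minimal sketch using the standard definition; the function name and the sample ranks are made up for illustration.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over queries.

    Each entry is the 1-based rank of the first relevant result for one
    query, or None when no relevant result was returned (scored as 0).
    """
    if not first_relevant_ranks:
        return 0.0
    total = sum(1.0 / rank if rank else 0.0 for rank in first_relevant_ranks)
    return total / len(first_relevant_ranks)


# Four queries: hit at rank 1, hit at rank 2, no hit, hit at rank 4.
print(mean_reciprocal_rank([1, 2, None, 4]))
```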
