I wouldn’t worry about performance with that setup. I just checked on a production system with 13 million docs in four shards, so 3+ million per shard. I searched on the most common term in the title field and got a response in 31 milliseconds. This was probably not cached, because the collection
Well, it reminds me of the usual awkward parsing issues. Try experimenting with
&fq={!join to=...from=... v='field:12*'} or &fq={!join to=... from=...
v=$qq}&qq=field:12*
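A minimal sketch of how those two fq variants differ, built with Python's urllib so the encoding is explicit; the field names (to_field, from_field, part_no) are hypothetical stand-ins for the elided to=/from= values:

```python
from urllib.parse import urlencode

# Variant 1: the wildcard clause is inlined via the v local param,
# quoted inside the {!join} local-params string.
inline = urlencode({
    "q": "*:*",
    "fq": "{!join to=to_field from=from_field v='part_no:12*'}",
})

# Variant 2: parameter dereferencing (v=$qq) keeps the wildcard clause
# out of the local-params string entirely, sidestepping the quoting and
# parenthesis-parsing issues mentioned above.
deref = urlencode({
    "q": "*:*",
    "fq": "{!join to=to_field from=from_field v=$qq}",
    "qq": "part_no:12*",
})
print(deref)
```

The second form is generally easier to debug because the wildcard query lives in its own plain request parameter.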
No more questions to ask.
On Wed, Oct 9, 2019 at 4:39 PM Paresh wrote:
> E.g. In query, join with wild-card query using parenthesis I g
Yup. You're going to find Solr is WAY more efficient than you think when it
comes to complex queries.
On Wed, Oct 9, 2019 at 3:17 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com
wrote:
> True...I guess another rub here is that we're using the edismax parser, so
> all of our queries are inherent
True...I guess another rub here is that we're using the edismax parser, so all
of our queries are inherently OR queries. So for a query like 'the ibm way',
the search engine would have to:
1) retrieve a document list for:
--> "ibm" (this list is probably 80% of the documents)
--> "the" (th
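If the OR-by-default behavior becomes a problem, edismax's mm (minimum-should-match) parameter is the usual knob. A sketch, assuming hypothetical qf fields title and body:

```python
from urllib.parse import urlencode

# With no mm and q.op=OR, any single term is enough to match, so the
# huge posting list for "the" gets unioned into the candidate set.
loose = urlencode({
    "defType": "edismax",
    "q": "the ibm way",
    "qf": "title body",
})

# mm=100% requires every clause to match, shrinking the candidate set
# to roughly the intersection of the three posting lists.
strict = urlencode({
    "defType": "edismax",
    "q": "the ibm way",
    "qf": "title body",
    "mm": "100%",
})
print(strict)
```

mm also accepts graded expressions (e.g. "2<75%"), so the all-or-nothing choice above is just the simplest case.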
If you have anything close to a decent server you won't notice it at all. I'm
at about 21 million documents, the index varies between 450GB and 800GB
depending on merges, with about 60k searches a day, and it stays sub-second
non-stop, and this is on a single-core/non-cloud environment.
On Wed, Oct 9, 2019 at 2:5
Only in my More Like This tools, but they have a very specific purpose;
otherwise, no.
On Wed, Oct 9, 2019 at 2:31 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com
wrote:
> Wow, thank you so much, everyone. This is all incredibly helpful insight.
>
> So, would it be fair to say that the majority o
Oh, and by 'non-stop' I mean close enough for me :)
On Wed, Oct 9, 2019 at 2:59 PM David Hastings
wrote:
> if you have anything close to a decent server you wont notice it all. im
> at about 21 million documents, index varies between 450gb to 800gb
> depending on merges, and about 60k searches a
Also, in terms of computational cost, it would seem that including most
terms/not having a stop list would take a toll on the system. For instance,
right now we have "ibm" as a stop word because it appears everywhere in our
corpus. If we did not include it in the stop words file, we would have t
Yeah, I don't use it as a search, only, well, for finding more documents like
that one :). For my purposes I tested between 2- and 5-part shingles and
found that the 2-part ones actually gave me better results, for my use case,
than using any more.
I don't suppose you could point me to any of the phra
We did something like that with Infoseek and Ultraseek. We had a set of
“glue words” that made noun phrases and indexed patterns like “noun glue noun”
as single tokens.
I remember Doug Cutting saying that Nutch did something similar using pairs,
but using that as a prefilter instead of as a relev
Wow, thank you so much, everyone. This is all incredibly helpful insight.
So, would it be fair to say that the majority of you all do NOT use stop words?
--
Audrey Lorberfeld
Data Scientist, w3 Search
IBM
audrey.lorberf...@ibm.com
On 10/9/19, 11:14 AM, "David Hastings" wrote:
However,
Hi Suleiman,
As the Solr distribution is the same regardless of Linux/Windows, yes, it's
OK for Windows. To answer your specific question about a Windows service, we
personally use NSSM to wrap the solr.cmd command.
You then specify your arguments as you would when starting Solr on Linux.
Example *start
-f
Dear all,
I hope this email finds you well.
I was just wondering if there is a way to run Solr in production mode (as a
service) on Windows Server, not just on *nix systems. I'm working on a
project and I need Solr in production mode on Windows Server.
Regards
Suleiman Hassan
App4leg
I was going to file a bug in JIRA for this, but it said to discuss first on the
user mailing list:
I upgraded to Solr 8.2.0 and Zookeeper 3.5.5. I added all the System
properties and the missing "netty-all-4.1.29.Final.jar" file from zookeeper and
put it in the classpath for solr. Encrypted Zo
Use case
I am querying a catchall field and then would like to highlight that term in 3
other fields say a, b, and c. I already have full term vectors.
From my reading and limited testing, the fastest choice would be
hl.method unified
hl.termVectors true
hl.termPositions true
hl.termOffsets true
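With termVectors/termPositions/termOffsets enabled on those fields in the schema, a request along these lines seems like the natural starting point; the field names mirror the hypothetical catchall, a, b, c above, and the query term is a placeholder. Note that hl.requireFieldMatch=false is what allows a match on the catchall field to still produce snippets in other fields:

```python
from urllib.parse import urlencode

# Sketch of a unified-highlighter request for the use case above.
params = urlencode({
    "q": "catchall:solr",          # placeholder query term
    "hl": "true",
    "hl.method": "unified",
    "hl.fl": "a,b,c",              # highlight these fields, not catchall
    # Without this, only fields that actually matched the query clauses
    # would be highlighted.
    "hl.requireFieldMatch": "false",
})
print(params)
```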
However, with all that said, stopwords CAN be useful in some situations. I
combine stopwords with the shingle factory to create "interesting phrases"
(not really) that I use for my More Like This needs. For example,
europe for vacation
europe on vacation
will create the shingle
europe_vacation
w
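The stopword-plus-shingle trick above could be wired up with an analyzer chain along these lines; the fieldType name, stopwords file, and tokenSeparator are assumptions for illustration, not the poster's actual config:

```xml
<!-- Sketch of a schema fieldType combining stopwords with shingles. -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Dropping "for"/"on" first makes both example phrases
         tokenize to the same pair: europe, vacation. -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- 2-word shingles joined with "_" then yield the single
         token europe_vacation. -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
            outputUnigrams="false" tokenSeparator="_"/>
  </analyzer>
</fieldType>
```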
Another add-on, as the previous two were pretty much spot on:
https://www.google.com/search?rlz=1C5CHFA_enUS814US819&sxsrf=ACYBGNTi2tQTQH6TycDKwRNEn9g2km9awg%3A1570632176627&ei=8PGdXa7tJeem_QaatJ_oAg&q=drive+in&oq=drive+in&gs_l=psy-ab.3..0l10.35669.36730..37042...0.4..1.434.1152.4j3j4-1..0
The theory behind stopwords is that they are “safe” to remove when calculating
relevance, so we can squeeze every last bit of usefulness out of very
constrained hardware (think 64K of memory. Yes kilobytes). We’ve come a long
way since then and the necessity of removing stopwords from the indexe
Stopwords (it was discussed on the mailing list several times, I recall):
the idea is that removing them used to be one of the tricks to make the index
as small as possible to allow faster search, stopwords being the most
common words.
These days, disk space is not an issue most of the time and there have
been
My use case is this:
I'd like solr to return my indexed document including all nested children.
On top of that, some extra information about the root doc is added
dynamically (the subquery).
But I understand this is an advanced use case and probably not requested
frequently. I'll try to work around it.
Stopwords were used when we were running search engines on 16-bit computers
with 50 Megabyte disks, like the PDP-11. They avoided storing and processing
long posting lists.
Think of removing stopwords as a binary weighting on frequent terms, either on
or off (not in the index). With idf, we hav
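The binary-versus-graded weighting contrast can be seen in a toy idf calculation; the document counts below are invented for illustration:

```python
import math

# Invented numbers: an index where "the" appears in nearly every
# document and "shard" is rare.
N = 13_000_000
df = {"the": 10_400_000, "shard": 65_000}

# Classic idf = log(N / df): ubiquitous terms get weights near zero,
# rare terms get large ones -- a smooth down-weighting instead of the
# all-or-nothing effect of a stopword list.
idf = {term: math.log(N / d) for term, d in df.items()}
print(idf)
```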
E.g. In query, join with wild-card query using parenthesis I get error -
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.parser.ParseException"],
"msg":"org.apache.solr.search.SyntaxError: Cannot parse
'solrField:(12*': Encountered \"\" at line 1
Hey Alex,
Thank you!
Re: stopwords being a thing of the past due to the affordability of
hardware...can you expand? I'm not sure I understand.
--
Audrey Lorberfeld
Data Scientist, w3 Search
IBM
audrey.lorberf...@ibm.com
On 10/8/19, 1:01 PM, "David Hastings" wrote:
Another thing to ad
Try referencing the jar directly (by absolute path) with a statement
in the solrconfig.xml (and reloading the core).
The DIH example shipped with Solr shows how it works.
This will help to see whether the problem is with not finding the jar or something else.
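For reference, a direct-path declaration in solrconfig.xml looks like this; the absolute path and jar name below are made up for illustration:

```xml
<!-- In solrconfig.xml: load one jar by absolute path, so classpath
     ambiguity is ruled out. Reload the core after adding this. -->
<lib path="/opt/solr/custom-libs/neo4j-jdbc-driver.jar" />
```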
Regards,
Alex.
On Wed, 9 Oct 2019 at 09:14
Try starting Solr with the “-v” option. That will echo all the jars that are
loaded and the paths.
Where _exactly_ is the jar file? You say “in the lib folder of my core”, but
that leaves a lot of room for interpretation.
Are you running stand-alone or SolrCloud? Exactly how do you start Solr?
I might not fully understand how you would like to combine them. The
possible reason is that [subquery] expects a regular Solr response to act on,
but [child] might yield something hairy.
On Wed, Oct 9, 2019 at 2:40 PM Bram Biesbrouck <
bram.biesbro...@reinvention.be> wrote:
> Hi Mikhail,
>
> You'
Hi Mikhail,
You're right, I should file an issue for the doc thing, I'll look into it.
Thanks for pointing me towards parsing the _nest_path_ field. It's exactly
what ChildDocTransformer does, indeed.
Would you by any chance know why [child] and [subquery] can't be combined?
They don't look too
Hello, Bram.
I guess [child] was recently extended. Docs might be outdated; don't
hesitate to contribute a doc improvement.
[subquery] is a neat thing; it just runs queries without relying on a
particular use case. If my understanding is right, one may request something
like the _path_ field in [subquery], wh
Hi all,
I'm diving deep into the ChildDocTransformer and its
related SubQueryAugmenter.
First of all, I think there's a bug in the Solr docs about [child]. It
states:
"This transformer returns all descendant documents of each parent document
matching your query in a flat list nested inside the ma
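A rough sketch of what the two transformers' request parameters look like, assuming hypothetical names (the subquery label "extra" and the field "parent_id_s"):

```python
from urllib.parse import urlencode

# [child]: each matching root doc comes back with its descendant
# documents attached.
child_req = urlencode({"q": "*:*", "fl": "*,[child]"})

# [subquery]: a named second query runs once per result row and its
# results are attached to that row; here it fetches docs whose
# parent_id_s points at the row's id.
sub_req = urlencode({
    "q": "*:*",
    "fl": "*,extra:[subquery]",
    "extra.q": "{!terms f=parent_id_s v=$row.id}",
})
print(child_req)
```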
Hi,
Kindly help me solve the issue I am facing when connecting Neo4j with Solr.
I see this issue in my log file even though I have the Neo4j driver jar file
in the lib folder of my core.
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.Data