I wouldn’t worry about performance with that setup. I just checked on a production system with 13 million docs in four shards, so 3+ million per shard. I searched on the most common term in the title field and got a response in 31 milliseconds. This was probably not cached, because the collection
Well, it reminds me of the usual awkward parsing issues. Try experimenting with
&fq={!join to=...from=... v='field:12*'} or &fq={!join to=... from=...
v=$qq}&qq=field:12*
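A minimal sketch of how those two fq variants differ, built with Python's urllib so the encoding is explicit; the field names (to_field, from_field, part_no) are hypothetical stand-ins for the elided to=/from= values:

```python
from urllib.parse import urlencode

# Variant 1: the wildcard clause is inlined via the v local param,
# quoted inside the {!join} local-params string.
inline = urlencode({
    "q": "*:*",
    "fq": "{!join to=to_field from=from_field v='part_no:12*'}",
})

# Variant 2: parameter dereferencing (v=$qq) keeps the wildcard clause
# out of the local-params string entirely, sidestepping the quoting and
# parenthesis-parsing issues mentioned above.
deref = urlencode({
    "q": "*:*",
    "fq": "{!join to=to_field from=from_field v=$qq}",
    "qq": "part_no:12*",
})
print(deref)
```

The second form is generally easier to debug because the wildcard query lives in its own plain request parameter.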
No more questions to ask.
On Wed, Oct 9, 2019 at 4:39 PM Paresh wrote:
> E.g. In query, join with wild-card query using parenthesis I g
Yup. You're going to find Solr is WAY more efficient than you think when it
comes to complex queries.
On Wed, Oct 9, 2019 at 3:17 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com
wrote:
> True...I guess another rub here is that we're using the edismax parser, so
> all of our queries are inherent
True...I guess another rub here is that we're using the edismax parser, so all
of our queries are inherently OR queries. So for a query like 'the ibm way',
the search engine would have to:
1) retrieve a document list for:
--> "ibm" (this list is probably 80% of the documents)
--> "the" (th
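If the OR-by-default behavior becomes a problem, edismax's mm (minimum-should-match) parameter is the usual knob. A sketch, assuming hypothetical qf fields title and body:

```python
from urllib.parse import urlencode

# With no mm and q.op=OR, any single term is enough to match, so the
# huge posting list for "the" gets unioned into the candidate set.
loose = urlencode({
    "defType": "edismax",
    "q": "the ibm way",
    "qf": "title body",
})

# mm=100% requires every clause to match, shrinking the candidate set
# to roughly the intersection of the three posting lists.
strict = urlencode({
    "defType": "edismax",
    "q": "the ibm way",
    "qf": "title body",
    "mm": "100%",
})
print(strict)
```

mm also accepts graded expressions (e.g. "2<75%"), so the all-or-nothing choice above is just the simplest case.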
If you have anything close to a decent server you won't notice it at all. I'm
at about 21 million documents, the index varies between 450GB and 800GB
depending on merges, with about 60k searches a day, and it stays sub-second
non-stop, and this is on a single-core/non-cloud environment.
On Wed, Oct 9, 2019 at 2:5
Only in my More Like This tools, but they have a very specific purpose;
otherwise, no.
On Wed, Oct 9, 2019 at 2:31 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com
wrote:
> Wow, thank you so much, everyone. This is all incredibly helpful insight.
>
> So, would it be fair to say that the majority o
Oh, and by 'non-stop' I mean close enough for me :)
On Wed, Oct 9, 2019 at 2:59 PM David Hastings
wrote:
> if you have anything close to a decent server you wont notice it all. im
> at about 21 million documents, index varies between 450gb to 800gb
> depending on merges, and about 60k searches a
Also, in terms of computational cost, it would seem that including most
terms/not having a stop list would take a toll on the system. For instance,
right now we have "ibm" as a stop word because it appears everywhere in our
corpus. If we did not include it in the stop words file, we would have t
Yeah, I don't use it as a search, only, well, for finding more documents like
that one :). For my purposes I tested between 2- and 5-part shingles and
found that the 2-part ones actually gave me better results, for my use case,
than using any more.
I don't suppose you could point me to any of the phra
We did something like that with Infoseek and Ultraseek. We had a set of
“glue words” that made noun phrases and indexed patterns like “noun glue noun”
as single tokens.
I remember Doug Cutting saying that Nutch did something similar using pairs,
but using that as a prefilter instead of as a relev
Wow, thank you so much, everyone. This is all incredibly helpful insight.
So, would it be fair to say that the majority of you all do NOT use stop words?
--
Audrey Lorberfeld
Data Scientist, w3 Search
IBM
audrey.lorberf...@ibm.com
On 10/9/19, 11:14 AM, "David Hastings" wrote:
However,
Hi Suleiman,
As the Solr distribution is the same regardless of Linux/Windows, yes, it's
OK for Windows. To answer your specific question about a Windows service, we
personally use NSSM to wrap the solr.cmd command.
You then specify your arguments as you would when starting Solr on Linux.
Example *start
-f
Dear all,
I hope this email finds you well.
I was just wondering if there is a way to run Solr in production mode (as a
service) on Windows Server, not just on *nix systems. I'm working on a
project and I need Solr in production mode on Windows Server.
Regards
Suleiman Hassan
App4leg
I was going to file a bug in JIRA for this, but it said to discuss first on the
user mailing list:
I upgraded to Solr 8.2.0 and Zookeeper 3.5.5. I added all the System
properties and the missing "netty-all-4.1.29.Final.jar" file from zookeeper and
put it in the classpath for solr. Encrypted Zo
Use case
I am querying a catchall field and then would like to highlight that term in 3
other fields say a, b, and c. I already have full term vectors.
From my reading and limited testing, the fastest choice would be
hl.method unified
hl.termVectors true
hl.termPositions true
hl.termOffsets true
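With termVectors/termPositions/termOffsets enabled on those fields in the schema, a request along these lines seems like the natural starting point; the field names mirror the hypothetical catchall, a, b, c above, and the query term is a placeholder. Note that hl.requireFieldMatch=false is what allows a match on the catchall field to still produce snippets in other fields:

```python
from urllib.parse import urlencode

# Sketch of a unified-highlighter request for the use case above.
params = urlencode({
    "q": "catchall:solr",          # placeholder query term
    "hl": "true",
    "hl.method": "unified",
    "hl.fl": "a,b,c",              # highlight these fields, not catchall
    # Without this, only fields that actually matched the query clauses
    # would be highlighted.
    "hl.requireFieldMatch": "false",
})
print(params)
```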
However, with all that said, stopwords CAN be useful in some situations. I
combine stopwords with the shingle factory to create "interesting phrases"
(not really) that I use for my More Like This needs. For example,
europe for vacation
europe on vacation
will create the shingle
europe_vacation
w
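The stopword-plus-shingle trick above could be wired up with an analyzer chain along these lines; the fieldType name, stopwords file, and tokenSeparator are assumptions for illustration, not the poster's actual config:

```xml
<!-- Sketch of a schema fieldType combining stopwords with shingles. -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Dropping "for"/"on" first makes both example phrases
         tokenize to the same pair: europe, vacation. -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- 2-word shingles joined with "_" then yield the single
         token europe_vacation. -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
            outputUnigrams="false" tokenSeparator="_"/>
  </analyzer>
</fieldType>
```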
Another add-on, as the previous two were pretty much spot on:
https://www.google.com/search?rlz=1C5CHFA_enUS814US819&sxsrf=ACYBGNTi2tQTQH6TycDKwRNEn9g2km9awg%3A1570632176627&ei=8PGdXa7tJeem_QaatJ_oAg&q=drive+in&oq=drive+in&gs_l=psy-ab.3..0l10.35669.36730..37042...0.4..1.434.1152.4j3j4-1..0
The theory behind stopwords is that they are “safe” to remove when calculating
relevance, so we can squeeze every last bit of usefulness out of very
constrained hardware (think 64K of memory. Yes kilobytes). We’ve come a long
way since then and the necessity of removing stopwords from the indexe
Stopwords (it was discussed on the mailing list several times, I recall):
the idea is that removing them used to be one of the tricks to make the index
as small as possible to allow faster search, stopwords being the most
common words.
These days, disk space is not an issue most of the time and there have
been
My use case is this:
I'd like solr to return my indexed document including all nested children.
On top of that, some extra information about the root doc is added
dynamically (the subquery).
But I understand this is an advanced use case and probably not requested
frequently. I'll try to work around it.
Stopwords were used when we were running search engines on 16-bit computers
with 50 Megabyte disks, like the PDP-11. They avoided storing and processing
long posting lists.
Think of removing stopwords as a binary weighting on frequent terms, either on
or off (not in the index). With idf, we hav
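The binary-versus-graded weighting contrast can be seen in a toy idf calculation; the document counts below are invented for illustration:

```python
import math

# Invented numbers: an index where "the" appears in nearly every
# document and "shard" is rare.
N = 13_000_000
df = {"the": 10_400_000, "shard": 65_000}

# Classic idf = log(N / df): ubiquitous terms get weights near zero,
# rare terms get large ones -- a smooth down-weighting instead of the
# all-or-nothing effect of a stopword list.
idf = {term: math.log(N / d) for term, d in df.items()}
print(idf)
```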
E.g. In query, join with wild-card query using parenthesis I get error -
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.parser.ParseException"],
"msg":"org.apache.solr.search.SyntaxError: Cannot parse
'solrField:(12*': Encountered \"\" at line 1
Hey Alex,
Thank you!
Re: stopwords being a thing of the past due to the affordability of
hardware...can you expand? I'm not sure I understand.
--
Audrey Lorberfeld
Data Scientist, w3 Search
IBM
audrey.lorberf...@ibm.com
On 10/8/19, 1:01 PM, "David Hastings" wrote:
Another thing to ad
Try referencing the jar directly (by absolute path) with a statement
in the solrconfig.xml (and reloading the core).
The DIH example shipped with Solr shows how it works.
This will help to see whether the problem is with not finding the jar or something else.
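For reference, a direct-path declaration in solrconfig.xml looks like this; the absolute path and jar name below are made up for illustration:

```xml
<!-- In solrconfig.xml: load one jar by absolute path, so classpath
     ambiguity is ruled out. Reload the core after adding this. -->
<lib path="/opt/solr/custom-libs/neo4j-jdbc-driver.jar" />
```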
Regards,
Alex.
On Wed, 9 Oct 2019 at 09:14
Try starting Solr with the “-v” option. That will echo all the jars that are
loaded and the paths.
Where _exactly_ is the jar file? You say “in the lib folder of my core”, but
that leaves a lot of room for interpretation.
Are you running stand-alone or SolrCloud? Exactly how do you start Solr?
I might not fully understand how you would like to combine them. The
possible reason is that [subquery] expects a regular Solr response to act on,
but [child] might yield something hairy.
On Wed, Oct 9, 2019 at 2:40 PM Bram Biesbrouck <
bram.biesbro...@reinvention.be> wrote:
> Hi Mikhail,
>
> You'
Hi Mikhail,
You're right, I should file an issue for the doc thing, I'll look into it.
Thanks for pointing me towards parsing the _nest_path_ field. It's exactly
what ChildDocTransformer does, indeed.
Would you by any chance know why [child] and [subquery] can't be combined?
They don't look too
Hello, Bram.
I guess [child] was recently extended. Docs might be outdated; don't
hesitate to contribute a doc improvement.
[subquery] is a neat thing; it just runs queries without relying on a
particular use case. If my understanding is right, one may request something
like the _path_ field in [subquery], wh
Hi all,
I'm diving deep into the ChildDocTransformer and its
related SubQueryAugmenter.
First of all, I think there's a bug in the Solr docs about [child]. It
states:
"This transformer returns all descendant documents of each parent document
matching your query in a flat list nested inside the ma
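A rough sketch of what the two transformers' request parameters look like, assuming hypothetical names (the subquery label "extra" and the field "parent_id_s"):

```python
from urllib.parse import urlencode

# [child]: each matching root doc comes back with its descendant
# documents attached.
child_req = urlencode({"q": "*:*", "fl": "*,[child]"})

# [subquery]: a named second query runs once per result row and its
# results are attached to that row; here it fetches docs whose
# parent_id_s points at the row's id.
sub_req = urlencode({
    "q": "*:*",
    "fl": "*,extra:[subquery]",
    "extra.q": "{!terms f=parent_id_s v=$row.id}",
})
print(child_req)
```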
Hi,
Kindly help me solve the issue I am facing when connecting Neo4j with Solr.
I see this issue in my log file even though I have the Neo4j driver jar file
in the lib folder of my core.
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.Data