Related docs can be retrieved with the [subquery] document transformer:
https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents#TransformingResultDocuments-[subquery]
but searching on related docs is less well supported.
Here is a patch for query-time join across collections:
https://issues.apache.org/jira/browse/SOLR-8297.
Or, make one collection with denormalized data. What you describe looks like a
relational, multi-table schema in Solr. That will be slow and painful.
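For example, a [subquery] request might look like this (the field names here
are illustrative, and this assumes both document types live in one collection,
linked by a company_id_s field on the contact docs):

  q=type_s:company&fl=id,name_s,contacts:[subquery]
  &contacts.q={!terms f=company_id_s v=$row.id}&contacts.rows=5

Each company document then comes back with a nested "contacts" result set.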
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Mar 2, 2017, at 9:55 PM, Preeti Bhat wrote:
>
> Hi All,
>
> I hav
Hi All,
I have two collections in solrcloud namely contact and company, they are in
same solr instance. Company is relatively simpler document with id, Name,
address etc... Coming over to Contact, this has the nested document like below.
I would like to get the Company details using the "Compan
Solr gets the updated content from an external source (by calling a REST API
which returns XML content).
So my question is: how can I plug this logic
into DocExpirationUpdateProcessorFactory, telling it to poll from the external
source and update the index?
For now I'm thinking of using a custom 'autoDeleteChainName'
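A rough sketch of what I mean in solrconfig.xml, using the factory's stock
parameters (the chain name and field names below are made up, and the re-fetch
logic itself would still have to be a custom update processor in that chain,
which the factory does not provide):

  <updateRequestProcessorChain name="expire" default="true">
    <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
      <int name="autoDeletePeriodSeconds">300</int>
      <str name="ttlFieldName">ttl_s</str>
      <str name="expirationFieldName">expire_at_dt</str>
      <str name="autoDeleteChainName">refresh-from-rest</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>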
Where would Solr get the updated content? Do you mean would it poll
from external source to refresh? Then, no. And if it is pushed from
external sources to Solr, then you just replace it as normal.
Not sure if I understand your use-case exactly.
Regards,
Alex.
http://www.solr-start.com/ -
Hi folks,
in our case, we have content that needs to be refreshed periodically according
to the TTL of each document.
DocExpirationUpdateProcessorFactory looks like quite a good fit, except
that it only deletes the document; there is no way to update the index
with the new document.
I don't see th
On 3/1/2017 8:48 AM, Liu, Daphne wrote:
> Hello Solr experts, Is there a place in Solr (Delta Import
> Datasource?) where I can adjust the JDBC connection frame size to 256
> MB? I have adjusted the settings in Cassandra but I'm still getting
> this error. NonTransientConnectionException:
> org.ap
On 3/2/2017 8:04 AM, Caruana, Matthew wrote:
> I’m currently performing an optimise operation on a ~190GB index with about 4
> million documents. The process has been running for hours.
>
> This is surprising, because the machine is an EC2 r4.xlarge with four cores
> and 30GB of RAM, 24GB of whic
On 3/2/2017 6:44 PM, Alexandre Rafalovitch wrote:
> And if you are not using SolrCloud, you can have
> collection=shard=core, so the terminology gets confused. But you can
> definitely have many cores on one mail server. You can also make them
> lazy, so not all cores have to be loaded. That would
Hi Emir,
Thanks for your reply.
For the query:
q=_query_:"{!frange l=1}ms(startDate_dt,endDate_dt)" OR
_query_:"startDate:[2000-01-01T00:00:00Z TO *] AND
endDate:[* TO 2016-12-31T23:59:59Z]"
Must _query_ be one of the fields in the index? I do not have any
fields in the index that relate to th
And if you are not using SolrCloud, you can have
collection=shard=core, so the terminology gets confused. But you can
definitely have many cores on one mail server. You can also make them
lazy, so not all cores have to be loaded. That would definitely allow
you to have a core per user and only sear
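A sketch of the lazy-core setup in standalone mode (names are illustrative):
in solr.xml, cap how many transient cores stay loaded at once:

  <int name="transientCacheSize">64</int>

and in each per-user core's core.properties:

  name=user_jsmith
  transient=true
  loadOnStartup=false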
What do you have for merge configuration in solrconfig.xml? You should
be able to tune it to - approximately - whatever you want without
doing the grand optimize:
https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig#IndexConfiginSolrConfig-MergingIndexSegments
Regards,
Ale
Yes, the terms component will of course show me the same thing as the facet
query; I am sure the facet query is not wrong. It shows ` in the values, no
matter which unique product key, since there should be 0 of them because there
is a splitBy. Was there something else you wanted me to look
On 3/2/2017 2:58 PM, Daniel Miller wrote:
> One of the many features of the Dovecot IMAP server is Solr support.
> This obviously provides full-text-searching of stored mails - and it
> works great. But...the focus of the Dovecot team and mailing list is
> Dovecot configuration. I'm asking for s
"should" is the operative term here. My guess is that the data you're putting
in the index isn't what you think it is.
I'd suggest you use the TermsComponent to examine the data actually in
your index.
Best,
Erick
On Thu, Mar 2, 2017 at 3:18 PM, Sales
wrote:
> We are using Solr 4.10.4. I have a
We are using Solr 4.10.4. I have a few Solr fields in schema.xml defined as
follows:
Both of them are loaded in via data-config.xml import handler, and they are
defined there as:
This has been working for years, but, lately, we have noticed strange
One of the many features of the Dovecot IMAP server is Solr support.
This obviously provides full-text-searching of stored mails - and it
works great. But...the focus of the Dovecot team and mailing list is
Dovecot configuration. I'm asking for some guidance on how I might
optimize Solr.
A
Yes, we already do it outside Solr. See https://github.com/ICIJ/extract which
we developed for this purpose. My guess is that the documents are very large,
as you say.
Optimising was always an attempt to bring down the number of segments from 60+.
Not sure how else to do that.
> On 2 Mar 2017,
I typically end up with about 60-70 segments after indexing. What configuration
do you use to bring it down to 16?
> On 2 Mar 2017, at 7:42 pm, Michael Joyner wrote:
>
> You can solve the disk space and time issues by specifying multiple segments
> to optimize down to instead of a single segme
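For reference, a multi-segment optimize can be requested like this (the core
name here is made up):

  curl 'http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=16'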
Hello, Frank!
The closest equivalent would be q=+type:userAccount +givenName:test*
And make sure please that it's parsed correctly with debugQuery=true.
Can you also narrow the query to troubleshoot the difference?
Ahhh, I probably understand: shard results are merged by uniqueKey. Can
you share
Glad to hear it's working. The trick (as you've probably discovered)
is to properly
map the meta-data to Solr fields. The extracting request handler does
this, but the
real underlying issue is that there's no real standard. Word docs
might have "last_editor",
PDFs might have just "author". And on a
Got it all working with Tika and SolrJ. (Got the correct artifacts). Much
faster now too which is good. Thanks very much for your help.
You would absolutely want to read "Relevant Search" book first. It is
based on Elasticsearch examples, but the concepts map to Solr (and
there is an appendix).
(The following is mostly for names and phone numbers; I don't know about addresses.)
The core issue is that you will want to set up a bunch of c
When you restart, there are a bunch of threads that start up that can
chew up stack space.
If the message says something about "unable to start native thread"
then it's not raw memory
but the stack space.
Doesn't really sound like this is your error, but thought I'd mention it.
On Wed, Mar 1, 201
Hi All,
First off, what a fabulous job you all are doing creating and supporting an
open source solution! Great Work and many thanks for that.
I am reasonably new to Solr and our team is trying to integrate Solr with a
structured database to help with Searching Person Records (first name, last
nam
You can solve the disk space and time issues by specifying multiple
segments to optimize down to instead of a single segment.
When we reindex we have to optimize or we end up with hundreds of
segments and very horrible performance.
We optimize down to like 16 segments or so and it doesn't do
It's _very_ unlikely that optimize will help with OOMs, so that's
very probably a red herring. Likely the document that's causing
the issue is very large or, perhaps, you're using the extracting
processor and it might be a Tika issue; if so, consider doing the Tika
processing outside Solr, see:
htt
On 3/1/2017 6:59 PM, Phil Scadden wrote:
> Exceptions never triggered but metadata was essentially empty except
> for contentType, and content was always an empty string. I don’t know
> what parser was doing, but I gave up and went with the extractHandler route
> instead, which did at least build a full
Thank you, these are useful tips.
We were previously working with a 4GB heap and getting OOMs in Solr while
updating (probably from the analysers) that would cause the index writer to
close with what’s called a “tragic” error in the writer code. Only a hard
restart of the service could bring it
6.4.0 added a lot of metrics to low-level calls. That makes many operations
slow. Go back to 6.3.0 or wait for 6.4.2.
Meanwhile, stop running optimize. You almost certainly don’t need it.
24 GB is a huge heap. Do you really need that? We run a 15 million doc index
with an 8 GB heap (Java 8u121,
Hi,
how about q=code_text:bolt*&fq=code_text:bolt
Ahmet
On Thursday, March 2, 2017 4:41 PM, Сергей Твердохлеб
wrote:
Hi,
is there a way to separate exact match from wildcard match in the Solr response?
e.g. there are two documents: {code_text:bolt} and {code_text:bolter}. When
I search for "bolt
Our customers are running this query where they have a filter on the parent
objects (givenName, familyName etc) and then request the child objects
({!parent which etc)
q=+(givenName:(+UserSearchControllerUTFN +1180460672*)
familyName:(+UserSearchControllerUTFN +1180460672*)) +{!parent
which="t
I recommend the MULTIPOINT approach.
BTW if you go the route of multiple OR'ed sub-clauses, I recommend avoiding
the _query_ syntax, which predates Solr 4.x's (4.2?) ability to embed the
sub-clauses more naturally; though you need to beware of the gotcha of
needing to add a leading space. If
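A sketch of what that can look like, borrowing the date fields from the other
thread (note the space after q=, which keeps the first {!...} from being
treated as local params for the entire query, and the v='...' param, which
keeps the sub-clause self-contained):

  q= {!frange l=1 v='ms(startDate_dt,endDate_dt)'} OR startDate_dt:[2000-01-01T00:00:00Z TO *]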
Hi,
It's simply expensive. You are rewriting your whole index.
Why are you running optimize? Are you seeing performance problems you are
trying to fix with optimize?
Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sem
Thank you. The question remains however, if this is such a hefty operation then
why is it walking to the destination instead of running, so to speak?
Is the process throttled in some way?
> On 2 Mar 2017, at 16:20, David Hastings wrote:
>
> Agreed, and since it takes three times the space is p
Hi Matthew,
I'm guessing it's the EBS. With EBS we've seen:
* cpu.system going up in some kernels
* low read/write speeds and maxed out IO at times
Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
On Thu,
Agreed, and since it takes three times the space, that is part of the reason it
takes so long: that 190GB index ends up writing another 380GB until it
compresses down and deletes the leftover files. It's a pretty hefty
operation.
On Thu, Mar 2, 2017 at 10:13 AM, Alexandre Rafalovitch
wrote:
>
The optimize operation is no longer recommended for Solr, as the
background merges have gotten a lot smarter.
It is an extremely expensive operation that can require up to 3 times the
index size in disk space during processing.
This is not to say yours isn't a valid question; I am leaving that to
others to respond to.
Reg
I’m currently performing an optimise operation on a ~190GB index with about 4
million documents. The process has been running for hours.
This is surprising, because the machine is an EC2 r4.xlarge with four cores and
30GB of RAM, 24GB of which is allocated to the JVM.
The load average has been
Again, depending on your case, you can use functions in fl to return
additional indicator if doc is exact match or not:
q=code_text:bolt OR whatever&fl=*,isExact:tf(code_text_exact,'bolt')
It will return isExact field with values >0 for any doc that has term
'bolt' in code_text_exact field.
Hi Mohan,
> On Feb 26, 2017, at 1:37 AM, mohanmca01 wrote:
>
> I searched with (bizNameAr: شرطة ازكي) and am getting:
> […]
>
> the expected result is: "id": "82",
> "bizNameAr": "شرطة عمان السلطانية - قيادة
> شرطة محافظة الداخلية - - مركز *شرطة إزكي*",
>
>
This is Solr Cloud 5.3.1
I have a query like the following
q={!child of="type:userAccount" v="givenName:test*"}
Intent: Show me all children of the type:userAccount where
userAccount.givenName:test*
If I run the query multiple times I get a very different numFound difference
186,560 to 187,412
You could still use scoring with distinct bands of values and include
score field to see the assigned score. Then, on the client, you do
rough grouping.
You could try looking at highlighting, but that's probably
computationally irrational for this purpose.
You could try enabling debugging and see
If you liked my minimal config, you may also appreciate the last
presentation I did at the Lucene/Solr Revolution on deconstructing the
examples.
The slides are
https://www.slideshare.net/arafalov/rebuilding-solr-6-examples-layer-by-layer-lucenesolrrevolution-2016
(the video is embedded at the en
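The minimal config from those talks is roughly this (check the slides for the
exact file):

  <?xml version="1.0" encoding="UTF-8"?>
  <config>
    <luceneMatchVersion>6.4.1</luceneMatchVersion>
    <requestHandler name="/select" class="solr.SearchHandler"/>
  </config>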
Hi Emir,
Thanks for your answer.
However in my case I really need to separate results, because I need to
treat those resultsets differently.
Thanks.
2017-03-02 15:57 GMT+02:00 Emir Arnautovic :
> Hi Sergei,
>
> Usually you don't want to know which is which, but you do want to have
> exact match
Hi Sergei,
Usually you don't want to know which is which, but you do want to have
exact matches first. In case of simple queries and depending on your
usecase, you can use score to make distinction. If "bolter" matches
"bolt" because of some filters, you will need to index it in two fields
an
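A schema.xml sketch of that two-field approach (field and type names are
illustrative; "string" gives whole-value exact match, while a non-stemming
tokenized type would give per-term exact match):

  <field name="code_text" type="text_general" indexed="true" stored="true"/>
  <field name="code_text_exact" type="string" indexed="true" stored="false"/>
  <copyField source="code_text" dest="code_text_exact"/>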
Hi,
is there a way to separate exact match from wildcard match in the Solr response?
e.g. there are two documents: {code_text:bolt} and {code_text:bolter}. When
I search for "bolt" I want to get both results, but somehow grouped, so I
can determine either it was found with exact or non-exact match.
Tha
Hi all,
When indexing data i get in the gc log messages like:
2017-03-02T10:43:17.872+0000: 1088.957: Total time for which application
threads were stopped: 0.0002071 seconds, Stopping threads took: 0.888
seconds
2017-03-02T10:43:17.885+0000: 1088.970: Total time for which application
threads
Found this project and I'd like to know what would be involved with
exposing its RestrictedField type through Solr for indexing and querying as
a Solr field type.
https://github.com/roshanp/lucure-core
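If it helps frame the work: the end state would presumably be a custom
FieldType you write yourself that wraps lucure-core's RestrictedField, then a
declaration in schema.xml something like the below (the class name is entirely
hypothetical; nothing in lucure-core ships this):

  <fieldType name="restricted_text" class="com.example.LucureRestrictedFieldType"/>
  <field name="secure_body" type="restricted_text" indexed="true" stored="true"/>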
Thanks,
Mike
Hi Steve,
Any update on this?
Hi - don't bother anymore, it seems to work fine now. I don't know why, but it
kept hanging without an error message.
Thanks,
Markus
-Original message-
> From:Zheng Lin Edwin Yeo
> Sent: Thursday 2nd March 2017 4:55
> To: solr-user@lucene.apache.org
> Subject: Re: bin/solr -a doesn't work
Thanks Charly. This is what I was looking for.
On Thu, Mar 2, 2017 at 11:07 AM David Michael Gang
wrote:
I use the latest version. Solr 6.4.1
On Thu, Mar 2, 2017 at 9:15 AM Aravind Durvasula
wrote:
Hi David,
What is the solr version you are using?
To get started, it's better to use the config fi
I use the latest version. Solr 6.4.1
On Thu, Mar 2, 2017 at 9:15 AM Aravind Durvasula
wrote:
> Hi David,
>
> What is the solr version you are using?
> To get started, it's better to use the config file that comes out of the
> box.
>
> Thanks,
> Aravind
Hi Edwin,
You can use subqueries:
q=_query_:"{!frange l=1}ms(startDate_dt,endDate_dt)" OR
_query_:"startDate:[2000-01-01T00:00:00Z TO *] AND endDate:[* TO 2016-12-31T23:59:59Z]"
HTH,
Emir
On 02.03.2017 04:51, Zheng Lin Edwin Yeo wrote:
Hi,
Would like to check, how can we do an OR condition bet
On 02/03/2017 06:58, David Michael Gang wrote:
Hi all,
I want to create my first Solr collection.
I found an example of solrconfig here:
https://github.com/apache/lucene-solr/blob/master/solr/example/files/conf/solrconfig.xml
This is a file of more than a thousand lines.
As I understand, this file s