OK, so the next thing to do would be to index and store the rich text ...
is it HTML? Because then you can use HTMLStripCharFilterFactory in your
analyzer, and still get the correct highlight back with hl.fragsize=0.
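For reference, a minimal sketch of such a field type (the type and filter choices besides HTMLStripCharFilterFactory are my assumptions, not from the message):

```xml
<fieldType name="html_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- strip tags before tokenizing; the charFilter corrects offsets,
         so highlighting maps back into the original HTML -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```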
I would think that you will have a hard time using the term positions, if
what yo
Hi Alex
Thank you for pointing out the UpdateRequestProcessor option.
On 3/30/2017 11:43 AM, Alexandre Rafalovitch wrote:
I am not sure I can tell how to decide on one or another. However, I
wanted to mention that you also have an option of doing it in the
UpdateRequestProcessor chain. That's st
Hi Erick
So I could also not use the query analyzer stage to append the code to
the search keyword?
Have the front-end application append the code for every query it issues
instead?
On 3/30/2017 12:20 PM, Erick Erickson wrote:
I generally prefer index-time work to query-time work on the theo
I generally prefer index-time work to query-time work on the theory
that the index-time work is done once and the query time work is done
for each query.
That said, for a corpus this size (and presumably without a large
query rate) I doubt you'd be able to measure any difference.
So basically cho
Thanks for your reply.
From what I see, getting more hardware to do the OCR is inevitable?
Even if we run the OCR outside of Solr indexing stream, it will still take
a long time to process it if it is on just one machine. And we still need
to wait for the OCR to finish converting before we can r
I am not sure I can tell how to decide on one or another. However, I
wanted to mention that you also have an option of doing it in the
UpdateRequestProcessor chain. That's still within Solr (and therefore
is consistent with multiple clients feeding into Solr) but is before
individual field processing.
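A sketch of what such a chain might look like (the processor choice, field name, and code value are my assumptions, not from the thread):

```xml
<updateRequestProcessorChain name="add-code">
  <!-- wrap the incoming field value with the code before it hits the schema -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">my_field</str>
    <str name="pattern">^(.*)$</str>
    <str name="replacement">z01x$1z01x</str>
    <bool name="literalReplacement">false</bool>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```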
Hi
I need to create a field that will be prefixed and suffixed with the code
'z01x'. This field needs to have the code in the index and during query.
I can either
1. have the source data of the field formatted with the code before
indexing (outside Solr), or
2. use a charFilter in the query stage of the field
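A sketch of the charFilter option, applied on both the index and query analyzers so the code never has to live in the source data (the type name is made up; only the code 'z01x' comes from the question):

```xml
<fieldType name="coded_string" class="solr.TextField">
  <analyzer type="index">
    <!-- wrap the whole value with the code at analysis time -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^(.*)$" replacement="z01x$1z01x"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^(.*)$" replacement="z01x$1z01x"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```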
It's an LRU cache. See the docs for LinkedHashMap; this form of
the c'tor is used in SolrCores.allocateLazyCores:
transientCores = new LinkedHashMap(Math.min(cacheSize, 1000), 0.75f, true) {
which is a special form of the c'tor that creates an access-ordered map.
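The access-ordered constructor can be illustrated in isolation. This is a minimal sketch of an LRU cache built on LinkedHashMap (names are made up; this is not the SolrCores code itself):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruSketch {
    // Build an LRU cache: the third c'tor argument 'true' selects
    // access order (most recently used last) instead of insertion order.
    public static <K, V> Map<K, V> lruCache(final int maxSize) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // evict the least recently accessed entry when over capacity
                return size() > maxSize;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = lruCache(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // touching "a" makes "b" the eldest
        cache.put("c", 3); // evicts "b", not "a"
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Without the `true` flag the map evicts by insertion order (FIFO), which is the subtle difference the c'tor form above encodes.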
I had a terrible moment seein
Thanks again for the information Shawn.
1) The long running process I told earlier was about Backup. I have written a
custom BackupHandler to backup the index files to a Cloud storage following the
ReplicationHandler class. I’m just wondering how does switching between
transient state affect s
On 3/29/2017 4:50 PM, Shashank Pedamallu wrote:
> Thank you very much for the response. Is there no definite way of
> ensuring that Solr does not switch transient states by an api? Like
> solrCore.open() and solrCore.close()?
I am not aware of any way to tell Solr to NOT unload a core when all of
Hi Shawn,
Thank you very much for the response. Is there no definite way of ensuring that
Solr does not switch transient states by an api? Like solrCore.open() and
solrCore.close()?
Thanks,
Shashank Pedamallu
MTS MBU vCOps Dev
US-CA-Promontory E, E 1035
Email: spedama...@vmware.com
Office: 650.
You might be helped by "distributed IDF".
see: SOLR-1632
On Wed, Mar 29, 2017 at 1:56 PM, Chris Hostetter
wrote:
>
> The thing to keep in mind, is that w/o a fully deterministic sort,
> the underlying problem statement "doc may appear on multiple pages" can
> exist even in a single node solr inde
> For in-place updates, the documentation states that only the fields being
> modified are updated, but does that mean that all other fields don't need
> to be stored?
Correct, in general there's no need to store the other fields. However,
there's a niche case where if a simultaneous DeleteByQuery
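As a concrete illustration (field and document names are my own, not from the thread): an in-place update targets a field defined with docValues="true", indexed="false", stored="false", and uses the atomic-update request syntax, e.g.:

```json
{ "id": "doc1", "popularity": { "inc": 1 } }
```

Posted to /update, this rewrites only the docValues for `popularity`; the rest of doc1 is not re-indexed.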
The thing to keep in mind, is that w/o a fully deterministic sort,
the underlying problem statement "doc may appear on multiple pages" can
exist even in a single node solr index, even if no documents are
added/deleted between page requests: because background merges /
searcher re-opening may h
Great explanation, Alessandro!
Let me briefly explain my experience. I have a tiny test with 2 shards and
2 replicas, indexing about a hundred docs. And then when I fully paginate
search results with score ranking, I've got duplicates across pages. And
the reason is deletes, which occur probably d
Hi Alex,
Thanks for your time and please see response inline.
Thanks,
Jack
On Wed, Mar 29, 2017 at 11:36 AM, Alexandre Rafalovitch
wrote:
> There are too many things here. As far as I understand:
> *) You should not need to use Signature chain [Jack] I have the same
> feeling as well, but I j
I need some clarity on atomic vs in-place updates. For atomic I understand
that all fields need to be stored, either explicitly or through docValues,
since the entire document is re-indexed.
For in-place updates, the documentation states that only the fields being
modified are updated, but does that mean that all other fields don't need
to be stored?
There are too many things here. As far as I understand:
*) You should not need to use Signature chain
*) You should have a uniqueID assigned to the child record
*) You should not assign parentID to the child record, it will be
assigned automatically
*) Double check that your unique_key field type i
BTW, we only have one node and the collection has just one shard.
On Wed, Mar 29, 2017 at 10:52 AM, Wenjie Zhang
wrote:
> Hi there,
>
> We are in solr 6.0.1, here is our solr schema and config:
> [schema/config XML stripped by the archive; surviving fragments:
> _unique_key, solr.StrField, 32766, [^ ...]
On 3/29/2017 11:17 AM, Shashank Pedamallu wrote:
> I’m performing some long running operation on a Background thread on a
> Core and I observed that since the core has the property “transient”
> set to true, in between this operation completes, the core is being
> CLOSED and OPENED by Solr (even th
Hi there,
We are in solr 6.0.1, here is our solr schema and config:
[schema/config XML stripped by the archive; surviving fragments:
_unique_key, solr.StrField, 32766, pattern [^\w-\.] with replacement _]
When having above configuration, and doing following operations, we will
see duplicate documents (two documents have same _
Hi,
I’m performing some long-running operation on a background thread on a core,
and I observed that since the core has the property “transient” set to true,
before this operation completes the core is being CLOSED and OPENED by Solr
(even though the operation continues without interruption
The reason Mikhail mentioned that is probably related to:
*The way the number of documents is calculated changed (LUCENE-6711)*
/The number of documents (docCount) is used to calculate term specificity
(idf) and average document length (avdl). Prior to LUCENE-6711,
collectionStats.maxDoc() was used
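For reference (my paraphrase, not quoted from the JIRA), Lucene's default BM25 idf after LUCENE-6711 is computed from the per-field docCount rather than maxDoc:

```
idf(t) = log(1 + (docCount - docFreq(t) + 0.5) / (docFreq(t) + 0.5))
```

When replicas differ in deleted docs, maxDoc differs between them, which is why the pre-6711 formula could rank the same document differently on each replica.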
Thanks Shawn.
Regards,
Prateek Jain
-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: 29 March 2017 01:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr | cluster | behaviour
On 3/29/2017 3:21 AM, Prateek Jain J wrote:
> We are having solr deployment in ac
Good info-- Yes, thanks for your correction! And thanks for the very welcome
change to edismax!
Interesting point on finding consistent offsets in the text. That would be
an interesting approach.
Doug
On Wed, Mar 29, 2017 at 11:36 AM Steve Rowe wrote:
> Thanks Doug, excellent analysis!
>
> In im
Thanks Doug, excellent analysis!
In implementing the SOLR-9185 changes, I considered a compromise approach to
the term-centric / field-centric axis you describe in the case of differing
field analysis pipelines: finding common source-text-offset bounded slices in
all per-field queries, and the
Thanks Rick.
Does that mean I need to define managed-schema.xml? I thought it gets
created by default on installation, but only on later versions of Solr (6.0
or later).
Will managed-schema help in indexing the JSON-type fields in MongoDB?
How do I define the managed-schema in Solr 5.4.0?
What triggered me to send this was seeing this:
> When per-field query structures differ, e.g. when one field's analyzer
> removes stopwords and another's doesn't, edismax's DisjunctionMaxQuery
> structure when sow=false differs from that produced when sow=true. Briefly,
> sow=true produces a boolean query
So with regards to this JIRA
(https://issues.apache.org/jira/browse/SOLR-9185), which makes Solr's
splitting on whitespace optional:
I want to point out that there's not a simple fix to multi-term synonyms, in
part because of specific tradeoffs. Splitting on whitespace is *sometimes a
good thing*. Not
I can answer at least one bit...
If all the sort fields are equal, the _internal_ Lucene document ID
(not the uniqueKey) is used to break the tie. The kicker is that the
internal Lucene ID can change when merging segments. Further, the
internal ID for two given docs can change relative to each other. I.e.
star
Mikhail,
effectively maxDocs are different and also deletedDocs, but numDocs are ok.
I don't really get it, but can that be the problem?
2017-03-29 10:35 GMT-03:00 Mikhail Khludnev :
> Can it happen that replicas are different by deleted docs? I mean numDocs
> is the same, but maxDocs is different
Can it happen that replicas are different by deleted docs? I mean numDocs
is the same, but maxDocs is different by number of deleted docs, you can
see it in solr admin at the core page.
On Wed, Mar 29, 2017 at 4:16 PM, Pablo Anzorena
wrote:
> Shawn,
>
> Yes, the field has duplicate values and ye
Shawn,
Yes, the field has duplicate values, and yes, adding the secondary sort by
the uniqueKey solves the issue.
Those 2 situations you mentioned are not occurring, none of them. The index
is replicated, but not sharded.
Does solr sort by an internal id if no uniqueKey is present in the sort
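For reference, the usual fix is exactly that secondary sort on the uniqueKey, e.g. (the primary sort field here is hypothetical):

```
sort=price asc, id asc
```

With a fully deterministic total order, a document can no longer shift between pages when segment merges or deletes change the internal Lucene doc IDs between requests.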
On 3/29/2017 6:35 AM, Pablo Anzorena wrote:
> I was paginating the results of a query and noticed that some
> documents were repeated across pagination buckets of 100 rows. When I
> sort by the unique field there is no repeated document but when I sort
> by another field then repeated documents app
Let me try. It is really hard to replicate, but I will try it out and come
back when I get it.
2017-03-29 9:40 GMT-03:00 Erik Hatcher :
> Certainly not intended behavior. Can you show us a way to replicate the
> issue?
>
>
> > On Mar 29, 2017, at 8:35 AM, Pablo Anzorena
> wrote:
> >
> > Hey,
>
Certainly not intended behavior. Can you show us a way to replicate the issue?
> On Mar 29, 2017, at 8:35 AM, Pablo Anzorena wrote:
>
> Hey,
>
> I was paginating the results of a query and noticed that some documents
> were repeated across pagination buckets of 100 rows.
> When I sort by the
Hey,
I was paginating the results of a query and noticed that some documents
were repeated across pagination buckets of 100 rows.
When I sort by the unique field there is no repeated document but when I
sort by another field then repeated documents appear.
I assume it is a bug and not the intended behavior
On 3/29/2017 3:21 AM, Prateek Jain J wrote:
> We are having solr deployment in active passive mode. So, ideally only
> one instance should be up and running at a time. It's true that we only
> see one instance serving requests, but we do see some CPU activity
> for the standby Solr instance. These instances are writing to shared disk.
Thanks, Erick
I am not sure about hdfs transaction logs, it's intricate.
--
View this message in context:
http://lucene.472066.n3.nabble.com/why-leader-replica-does-not-call-HdfsTransactionLog-finish-tp4327139p4327399.html
Sent from the Solr - User mailing list archive at Nabble.com.
Just to update, we are using solr 4.8.1
Regards,
Prateek Jain
-Original Message-
From: Prateek Jain J
Sent: 29 March 2017 10:22 AM
To: solr-user@lucene.apache.org
Subject: Solr | cluster | behaviour
Hi All,
We are having solr deployment in active passive mode. So, ideally only one
instance should be up and running at a time.
Hi All,
We are having solr deployment in active passive mode. So, ideally only one
instance should be up and running at a time. It's true that we only see one
instance serving requests, but we do see some CPU activity for the standby
Solr instance. These instances are writing to shared disk.