Hi All,
I am trying to understand how we can have commits issued to Solr while
indexing documents. Around 200K to 300K documents per hour, with an average
size of 10 KB each, will be going into Solr. Java code fetches the documents
from MQ and streams them to Solr. The problem is the client
Thanks Walter,
the payload idea is something that I've never heard of... it seems interesting
but quite complex to implement. I think we'll have to write a custom filter
to add page numbers, and it's not clear to me how to retrieve payloads in
the query result. However, I'll try to go deeper into th
Thanks Jeff,
I understand your philosophy and it sounds correct.
Since we had many problems with ZooKeeper when switching to SolrCloud, we
couldn't use it as a source of truth and had to rely on a more stable
source.
The issue is that when we get such a ZooKeeper event, it brought our
s
Thank you, @Doug Turnbull. I tried http://splainer.io but it doesn't work for my
query (no explain output for the docs..).
Here is the picture again...
https://drive.google.com/file/d/0B-7dnH4rlntJc2ZWdmxMS3RDMGc/view?usp=sharing
On Tue, Mar 1, 2016 at 10:06 PM, Doug Turnbull <
dturnb...@opensourceconnections.com>
Thanks Emir,
a similar solution had already come to my mind too: searching on chapters,
highlighting the result and retrieving the matching pages by parsing the
highlighted result... surely not a very efficient approach, but it could work...
However, I think I'll try different approaches before this one.
Il giorno m
IT WAS MY FIRST POST ON THE MAILING LIST SO I'M NOT SURE IF YOU GOT IT, SO
I'M SENDING IT AGAIN
Hi,
I have Solr 5.4.1 and I'm trying to use the Block Join Query Parser to search
in children and return the parent.
I want to apply highlighting on the children but it returns empty.
My q parameter: "q={!parent which="is_p
Thanks Jack,
the chapter is definitely the optimal unit to search in, and your solution
seems quite a good approach. The downside is that, depending on how
we choose the amount of text shared between two adjacent pages, we will
experience some errors. For example, it will always be possible to find a
Thanks Alexandre,
your solution seems very good: I'll surely try it and let you know. I like
the idea of mixing block joins and grouping!
On Wed, 2 Mar 2016 at 04:46, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:
> Here is an - untested - possible approach. I might be missing
If any of you care about your Stack Overflow reputation and have time to
do it, I also opened a question there:
http://stackoverflow.com/questions/35722672/solr-schema-to-model-books-chapters-and-pages.
Thanks again to everybody.
On Wed, 2 Mar 2016 at 09:42, Zaccheo Bagnati
wrote:
Hi Team,
Can you please clarify the below? My understanding is that the tokenizer says
how the content should be physically indexed in the file system, and that
filters are applied to the query result. The lines below are from my setup. But
I have seen examples that include filters and a tokenizer in both, which confuse
Hi Rajesh,
The processing flow is the same for both indexing and querying; what is compared
at the end are the resulting tokens. In general the flow is: text -> char filter
-> filtered text -> tokenizer -> tokens -> filter1 -> tokens ... ->
filterN -> tokens.
You can read more about the analysis chain in the Solr wi
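For illustration, a minimal sketch of what such a chain looks like in schema.xml
(the type name, char filter, tokenizer and filters below are illustrative, not
taken from this thread):

<fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>  <!-- char filter: raw text -> filtered text -->
    <tokenizer class="solr.StandardTokenizerFactory"/>     <!-- tokenizer: filtered text -> tokens -->
    <filter class="solr.LowerCaseFilterFactory"/>          <!-- filter1: tokens -> tokens -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/> <!-- filterN: tokens -> tokens -->
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
  </analyzer>
</fieldType>

Both analyzers run the same kind of chain; it is the tokens coming out of each
that get compared at search time.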
Hi Michael,
Can you please run the query with debug and share the title field configuration?
Thanks,
Emir
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
On 02.03.2016 09:14, michael solomon wrote:
Thanks you, @Doug Turnbu
Hi Sangeetha,
What is sure is that it is not going to work: with 200-300K docs/hour,
there would be >50 commits/second, meaning there is <20ms for each
doc+commit.
What you can do is let Solr handle commits, and maybe use real-time get to
verify a doc is in Solr, or do some periodic sanity checks.
Are y
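As a rough SolrJ sketch of that idea (the ZooKeeper host, collection and
document values are illustrative; the real-time get is done with getById(),
which reads from the transaction log even before a commit has made the doc
searchable):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class IndexAndVerify {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("localhost:2181"); // illustrative ZK host
        client.setDefaultCollection("mycollection");                    // illustrative collection

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "example");
        client.add(doc);               // no explicit commit; autoCommit/commitWithin handles it

        // Real-time get: fetches the document by id from the tlog/index,
        // so it works even before the next commit opens a new searcher.
        SolrDocument fetched = client.getById("doc-1");
        System.out.println(fetched != null ? "doc reached Solr" : "doc missing");

        client.close();
    }
}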
Hi Emir,
This morning I deleted those documents and now added them again to re-run the
query... and now it behaves as I expect (0_0) and I can't reproduce the
problem... this is weird.. :\
On Wed, Mar 2, 2016 at 11:38 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:
> Hi Michael,
> Can you
Hi,
Varun, we actually ran the test on our restored data snapshot and it threw
an error saying "Broken segment".
How is it possible that the same test succeeds on the snapshot, but
not on the restored snapshot? Can you please shed some light on this, so
we can proceed and fix this issue?
Could you post the full output of the CheckIndex command on the restored
snapshot? Also what happens if you delete the snapshot indexes and attempt
to restore again? Does it get corrupted again or is it a one off scenario?
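For reference, CheckIndex can be run directly against the index directory with
the Lucene core jar on the classpath (the jar version and index path below are
illustrative):

java -cp lucene-core-5.4.1.jar org.apache.lucene.index.CheckIndex /path/to/restored/index

Running it with -fix would drop broken segments (losing their documents), so
only the read-only form is shown here.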
On Wed, Mar 2, 2016 at 3:44 PM, Janit Anjaria (Tech-IT) <
anjaria.ja...@fli
This is probably more of a Tika question now...
It sounds like Tika is not extracting dates from the .eml files that you are
generating? To confirm, you are able to extract dates with libpst...it is just
that Tika is not able to process the dates that you are sending it in your .eml
files?
If
I am running into this issue: https://issues.apache.org/jira/browse/SOLR-7606,
but I am not following all of the description in that ticket.
What I am not able to understand is when parent/child orthogonality
is broken, and what a child document without a parent means.
I hav
Hi Sangeetha,
Well, I don't think you need to commit after every document add.
You can rely on Solr's transaction log feature. If you are using SolrCloud
it's mandatory to have a transaction log, so every document gets written
to the tlog. Now say a node crashes: even if documents were not commi
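For context, this is the standard transaction log setting in solrconfig.xml
(the stock configuration; the dir property placeholder is the default):

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

With the tlog enabled, uncommitted updates survive a crash and are replayed on
restart, which is what makes per-document commits unnecessary.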
Hi,
I installed 3 instances of SolrCloud 5.4.1.
I'm building a little search engine for websites and I'm storing their info as
nested documents (one document for the website's general information, and its
children are the pages inside the website).
So when I'm querying this collection I'm using a BlockJoin par
Hi,
<analyzer> must have one and only one <tokenizer>, and
it can have zero or more <filter>s. From the point of view of these
rules, your index-time <analyzer> is not correct
because it has more than one <tokenizer>, and
your query-time <analyzer> is not correct as well because it has no <tokenizer>.
Koji
On 2016/03/02 20:25, G, Rajesh wrote:
Hi Team,
Can you please clarify the bel
Hi Edwin.
That was what I suspected, but I wanted to confirm. If we go down this
route I’ll do some testing and post the results.
We’re using 5.1 in production, but I’m testing with 5.4.1.
The index has 40,891,287 documents and is 3.01 GB, so it’s not big at all.
Many thanks,
Alfonso
On 01/
Hello,
It's really hard to find the exact case of why it happens. There is a brute-force
approach: sweep away all deleted documents, i.e. forceMerge until there are no
deleted docs.
Can it happen that standalone docs and parent blocks are mixed in the index?
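A rough SolrJ sketch of that brute-force sweep (ZooKeeper host and collection
name are illustrative); optimize() here is a forceMerge, and merging physically
drops documents that are only flagged as deleted:

import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class MergeAwayDeletes {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("localhost:2181"); // illustrative ZK host
        client.setDefaultCollection("mycollection");                    // illustrative collection
        // waitFlush=true, waitSearcher=true, merge down to 1 segment;
        // afterwards numDocs should equal maxDoc (no deleted docs left).
        client.optimize(true, true, 1);
        client.close();
    }
}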
On Wed, Mar 2, 2016 at 2:04 PM, Sathyakumar Seshachala
Hi,
my schema looks like this
multiValued="true"/>
stored="false" multiValued="true"/>
I'd like to get the tagIds of documents with a certain tagDescription
(and text). However, tagIds contains multiple ids in the same order as
tagDescription, and simple faceting would return all of them. Is there a
Thanks for your email, Koji. Can you please explain the roles of the
tokenizer and filters, so I can understand why I should not have two tokenizers
in the index analyzer and should have at least one tokenizer in the query analyzer?
My understanding is that the tokenizer says how the content should be indexed
physical
On 3/2/2016 9:55 AM, G, Rajesh wrote:
> Thanks for your email Koji. Can you please explain what is the role of
> tokenizer and filter so I can understand why I should not have two tokenizer
> in index and I should have at least one tokenizer in query?
You can't have two tokenizers. It's not all
Hi,
I noticed some strange behavior when deleting orphaned child documents in
Solr 5.3.1. I am indexing nested documents in a parent/child hierarchy. When I
delete a child document whose parent was already deleted previously, the child
document still shows up in search. I am using deleteById(
Well, with the understanding that someone who isn’t involved in the process is
describing something that isn’t built yet...
I could imagine changes like:
- Core discovery ignores cores that aren’t present in the ZK cluster state
- New cores are automatically created to bring a node in line with
When Solr indexes a document block it assigns a "_root_" field (holding the
parent's id) to every document in the block, but deleteById() is unaware of
it.
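A common workaround (sketched here with an illustrative ZooKeeper host,
collection and parent id) is to delete the whole block by the _root_ field
instead of by individual ids:

import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class DeleteBlock {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("localhost:2181"); // illustrative ZK host
        client.setDefaultCollection("mycollection");                    // illustrative collection
        // _root_ holds the parent's id for every document in the block,
        // so this removes the parent and all of its children together.
        client.deleteByQuery("_root_:parent-doc-1");                    // illustrative parent id
        client.close();
    }
}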
On Wed, Mar 2, 2016 at 8:16 PM, Naeem Tahir
wrote:
> Hi,
>
> I noticed some strange behavior when deleting orphaned child
> documents in Solr 5.3.1. I am indexing nested document
It makes no sense to facet on a “text_general” analyzed field. Can you give a
concrete example with a few dummy docs and show some queries (do you query the
tagDescription field?) and the desired facet output?
There may be several ways to solve the task, depending on the exact use case.
One solutio
Hello All
I am trying to import data from one SolrCloud cluster into another using
SolrEntityProcessor. My schema changed and I need to reindex.
1. Does SolrEntityProcessor work with SolrCloud to get data from SolrCloud?
It looks like it will not work, as the SolrEntityProcessor code is creating an
instance
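For context, a minimal sketch of a DataImportHandler config using
SolrEntityProcessor (the source URL, query and row size are illustrative);
note it talks to a single node over HTTP rather than being ZooKeeper-aware:

<dataConfig>
  <document>
    <!-- Pull documents from the source Solr over HTTP and re-index them
         into the collection this handler belongs to. -->
    <entity name="sourceSolr"
            processor="SolrEntityProcessor"
            url="http://source-host:8983/solr/source_collection"
            query="*:*"
            rows="500"/>
  </document>
</dataConfig>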
Hi,
I will try that approach: deleting and force merging before adding the
blocks.
In my case, yes, there are standalone docs (without any parents) and then
there are blocks with parents and their children in the same index.
Note however that docs in the blocks are unique in that the children,
there i
Hi,
For currency, as suggested in the wiki and guide, the field type
is currency, and by default it takes USD and the exchange
rates from the currency.xml file located in the conf dir. We have a script
that talks to Google APIs for the current currency exchange rates and
symlinked to
Has anyone tried -XX:ParGCCardsPerStrideChunk with Solr?
There have been reports of improved GC times.
--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
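For what it's worth, a sketch of where such a flag would typically be added
(assuming the standard solr.in.sh startup script; the stride value is
illustrative, and as a diagnostic VM option it also needs
-XX:+UnlockDiagnosticVMOptions):

# solr.in.sh
GC_TUNE="$GC_TUNE -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=4096"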
Hi,
We are using Solr 5.2 (on Windows 2012 Server / JDK 1.8) for document content
indexing/querying. We found that querying slows down intermittently under load
conditions.
In our analysis we found two issues.
1) Solr is not using its caches effectively.
Whenever a new document is indexed, it opens a new
1) Experiment with the autowarming settings in solrconfig.xml. Since in
your case you're indexing so frequently, consider setting the autowarm count to a
low number, so that not a lot of time is spent warming the caches.
Alternatively, if you're not very big on initial query response times being
small, you c
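For reference, a minimal sketch of the relevant cache settings in
solrconfig.xml (sizes and counts are illustrative, not a recommendation):

<query>
  <filterCache      class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="16"/>
  <queryResultCache class="solr.LRUCache"     size="512" initialSize="512" autowarmCount="16"/>
  <!-- documentCache entries are tied to internal doc ids and cannot be autowarmed -->
  <documentCache    class="solr.LRUCache"     size="512" initialSize="512"/>
</query>

A lower autowarmCount means each new searcher (opened on every commit) becomes
usable faster, at the cost of colder caches for the first queries.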
According to what you describe, I really don't see the need for core
discovery in SolrCloud. It would only be used to eagerly load a core on
startup.
If I understand correctly, when ZK = truth, this eager loading can/should
be done by consulting ZooKeeper instead of the local disk.
I agree that it is re
We do a soft commit when we insert/update a document.

// Insert document: the second argument to add() is commitWithin in ms
UpdateResponse resp = cloudServer.add(doc, 1000);
if (resp.getStatus() == 0)
{
    success = true;
}

// Update documents: same commitWithin, set on the UpdateRequest
UpdateRequest req = new UpdateRequest();
req.setCommitWithin(1000);
req.add(docs);
UpdateResponse resp =
Adding extra information.
Our index size is around 120 GB (2 shards + 2 replicas).
We have 400 GB of RAM on our Windows server. Solr is assigned 50 GB, so
a huge amount of free RAM (>300 GB) is available to the OS.
We have a very simple query which returns only 5 Solr documents. Under load
On Thu, Mar 3, 2016 at 7:18 AM, Sathyakumar Seshachalam <
sathyakumar_seshacha...@trimble.com> wrote:
> In my case, yes there are standalone docs (without any parents) and then
> there is blocks with parents and its children in the same index.
>
As far as I know you can't mix them. Can you try to
Can you share the cache stats from the admin panel?
Also how much load are you talking about here? (Queries/second)
How many documents do you have?
Are you fetching any large stored fields?
On Thu, 3 Mar 2016, 12:31 Maulin Rathod, wrote:
> Adding extra information.
>
> Our index size is around 1
Hi,
In SolrCloud you would want to upload your new currency.xml to ZK and then call
the Collections API for a reload.
Alternatively you could write your own exchange rate provider for Google by
implementing the ExchangeRateProvider interface.
The downside here is that each Solr node would then fetch
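A sketch of the upload-and-reload route (ZooKeeper host, config set, collection
name and paths are illustrative; zkcli.sh ships with Solr under
server/scripts/cloud-scripts):

# push the updated currency.xml into the collection's config set in ZooKeeper
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
  -cmd putfile /configs/myconfig/currency.xml /local/path/currency.xml

# reload the collection so the new exchange rates are picked up
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"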
Committing after every doc is an anti-pattern: all the in-memory structures
are thrown away after each update/insert.
Why do you think you need to do this? The usual pattern is to just let your
autocommit parameters in solrconfig.xml do this for you.
Ditto with specifying commitWithin on e
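For reference, a minimal sketch of those autocommit settings in solrconfig.xml
(the intervals are illustrative):

<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit every 60s: flushes segments, rolls the tlog -->
  <openSearcher>false</openSearcher>  <!-- don't open a new searcher on the hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>             <!-- soft commit every 5s: makes new documents visible to search -->
</autoSoftCommit>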