Hi All,
I am trying to understand how we can have commits issued to Solr while
indexing documents. Around 200K to 300K documents per hour, with an average
size of 10 KB each, will be going into Solr. Java code fetches the documents
from MQ and streams them to Solr. The problem is the client
Thanks Walter,
the payload idea is something that I've never heard of... it seems interesting
but quite complex to implement. I think we'll have to write a custom filter
to add page numbers, and it's not clear to me how to retrieve payloads in
the query result. However, I'll try to go deeper into th
Thanks Jeff,
I understand your philosophy and it sounds correct.
Since we had many problems with ZooKeeper when switching to SolrCloud, we
couldn't use it as a source of truth and had to rely on a more stable
source.
The issue is that when we get such a ZooKeeper event, it brought our
s
Thank you, @Doug Turnbull. I tried http://splainer.io but it doesn't work for my
query (no explain output for the docs..).
Here is the picture again...
https://drive.google.com/file/d/0B-7dnH4rlntJc2ZWdmxMS3RDMGc/view?usp=sharing
On Tue, Mar 1, 2016 at 10:06 PM, Doug Turnbull <
dturnb...@opensourceconnections.com>
Thanks Emir,
a similar solution had already come to my mind too: searching on chapters,
highlighting the result and retrieving the matching pages by parsing the
highlighted result... surely not a very efficient approach, but it could work...
However, I think I'll try different approaches before this one.
Il giorno m
IT WAS MY FIRST POST ON THE MAILING LIST SO I'M NOT SURE IF YOU GOT IT, SO
I'M SENDING IT AGAIN
Hi,
I have Solr 5.4.1 and I'm trying to use the Block Join Query Parser to search
in children and return the parent.
I want to apply highlighting on the children but it returns empty.
My q parameter: "q={!parent which="is_p
Thanks Jack,
the chapter is definitely the optimal unit to search in, and your solution
seems quite a good approach. The downside is that, depending on how
we choose the amount of text shared between two adjacent pages, we will
experience some errors. For example, it will always be possible to find a
Thanks Alexandre,
your solution seems very good: I'll surely try it and let you know. I like
the idea of mixing block joins and grouping!
On Wed, 2 Mar 2016 at 04:46, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:
> Here is an - untested - possible approach. I might be missing
If any of you care about your Stack Overflow reputation and have time to
do it, I also opened a question there:
http://stackoverflow.com/questions/35722672/solr-schema-to-model-books-chapters-and-pages.
Thanks again to everybody.
On Wed, 2 Mar 2016 at 09:42, Zaccheo Bagnati
wrote:
Hi Team,
Can you please clarify the below? My understanding is that the tokenizer says
how the content should be physically indexed in the file system, and that
filters are applied to the query result. The lines below are from my setup. But
I have seen examples that include filters and a tokenizer in both, which confuse
Hi Rajesh,
The processing flow is the same for both indexing and querying; what is compared
at the end are the resulting tokens. In general the flow is: text -> char filter
-> filtered text -> tokenizer -> tokens -> filter1 -> tokens ... ->
filterN -> tokens.
You can read more about the analysis chain in the Solr wi
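For illustration, a minimal sketch of what such a chain looks like in schema.xml
(the type name, char filter, tokenizer and filters below are illustrative, not
taken from this thread):

<fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>  <!-- char filter: raw text -> filtered text -->
    <tokenizer class="solr.StandardTokenizerFactory"/>     <!-- tokenizer: filtered text -> tokens -->
    <filter class="solr.LowerCaseFilterFactory"/>          <!-- filter1: tokens -> tokens -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/> <!-- filterN: tokens -> tokens -->
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
  </analyzer>
</fieldType>

Both analyzers run the same kind of chain; it is the tokens coming out of each
that get compared at search time.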
Hi Michael,
Can you please run the query with debug and share the title field configuration?
Thanks,
Emir
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
On 02.03.2016 09:14, michael solomon wrote:
Thanks you, @Doug Turnbu
Hi Sangeetha,
What is sure is that it is not going to work: with 200-300K docs/hour,
there would be >50 commits/second, meaning there is <20ms for each
doc+commit.
What you can do is let Solr handle commits, and maybe use real-time get to
verify a doc is in Solr, or do some periodic sanity checks.
Are y
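As a rough SolrJ sketch of that idea (the ZooKeeper host, collection and
document values are illustrative; the real-time get is done with getById(),
which reads from the transaction log even before a commit has made the doc
searchable):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class IndexAndVerify {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("localhost:2181"); // illustrative ZK host
        client.setDefaultCollection("mycollection");                    // illustrative collection

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "example");
        client.add(doc);               // no explicit commit; autoCommit/commitWithin handles it

        // Real-time get: fetches the document by id from the tlog/index,
        // so it works even before the next commit opens a new searcher.
        SolrDocument fetched = client.getById("doc-1");
        System.out.println(fetched != null ? "doc reached Solr" : "doc missing");

        client.close();
    }
}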
Hi Emir,
This morning I deleted those documents and now added them again to re-run the
query... and now it behaves as I expect (0_0) and I can't reproduce the
problem... this is weird.. :\
On Wed, Mar 2, 2016 at 11:38 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:
> Hi Michael,
> Can you
Hi,
Varun, we actually ran the test on our restored data snapshot and it threw
an error saying "Broken segment".
How is it possible that the same test succeeds on the snapshot, but
not on the restored snapshot? Can you please shed some light on this, so
we can proceed and fix this issue?
Could you post the full output of the CheckIndex command on the restored
snapshot? Also what happens if you delete the snapshot indexes and attempt
to restore again? Does it get corrupted again or is it a one off scenario?
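For reference, CheckIndex can be run directly against the index directory with
the Lucene core jar on the classpath (the jar version and index path below are
illustrative):

java -cp lucene-core-5.4.1.jar org.apache.lucene.index.CheckIndex /path/to/restored/index

Running it with -fix would drop broken segments (losing their documents), so
only the read-only form is shown here.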
On Wed, Mar 2, 2016 at 3:44 PM, Janit Anjaria (Tech-IT) <
anjaria.ja...@fli
This is probably more of a Tika question now...
It sounds like Tika is not extracting dates from the .eml files that you are
generating? To confirm, you are able to extract dates with libpst...it is just
that Tika is not able to process the dates that you are sending it in your .eml
files?
If
I am running into this issue: https://issues.apache.org/jira/browse/SOLR-7606,
but I am not following all of the description in that ticket.
What I am not able to understand is when parent/child orthogonality
is broken, and what a child document without a parent means.
I hav
Hi Sangeetha,
Well, I don't think you need to commit after every document add.
You can rely on Solr's transaction log feature. If you are using SolrCloud
it's mandatory to have a transaction log, so every document gets written
to the tlog. Now say a node crashes: even if documents were not commi
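For context, this is the standard transaction log setting in solrconfig.xml
(the stock configuration; the dir property placeholder is the default):

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

With the tlog enabled, uncommitted updates survive a crash and are replayed on
restart, which is what makes per-document commits unnecessary.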
Hi,
I installed 3 instances of SolrCloud 5.4.1.
I'm building a little search engine for websites and I'm storing their info as
nested documents (one document for the website's general information, and its
children are the pages inside the website).
So when I'm querying this collection I'm using a BlockJoin par
Hi,
<analyzer> must have one and only one <tokenizer>, and
it can have zero or more <filter>s. From the point of view of these
rules, your index-time <analyzer> is not correct
because it has more than one <tokenizer>, and
your query-time <analyzer> is not correct as well because it has no <tokenizer>.
Koji
On 2016/03/02 20:25, G, Rajesh wrote:
Hi Team,
Can you please clarify the bel
Hi Edwin.
That was what I suspected, but I wanted to confirm. If we go down this
route I’ll do some testing and post the results.
We’re using 5.1 in production, but I’m testing with 5.4.1.
The index has 40,891,287 documents and is 3.01 GB, so it’s not big at all.
Many thanks,
Alfonso
On 01/
Hello,
It's really hard to find the exact case of why it happens. There is a brute-force
approach: sweep away all deleted documents, i.e. forceMerge until there are no
deleted docs.
Can it happen that standalone docs and parent blocks are mixed in the index?
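A rough SolrJ sketch of that brute-force sweep (ZooKeeper host and collection
name are illustrative); optimize() here is a forceMerge, and merging physically
drops documents that are only flagged as deleted:

import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class MergeAwayDeletes {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("localhost:2181"); // illustrative ZK host
        client.setDefaultCollection("mycollection");                    // illustrative collection
        // waitFlush=true, waitSearcher=true, merge down to 1 segment;
        // afterwards numDocs should equal maxDoc (no deleted docs left).
        client.optimize(true, true, 1);
        client.close();
    }
}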
On Wed, Mar 2, 2016 at 2:04 PM, Sathyakumar Seshachala
Hi,
my schema looks like this
multiValued="true"/>
stored="false" multiValued="true"/>
I'd like to get the tagIds of documents with a certain tagDescription
(and text). However, tagIds contains multiple ids in the same order as
tagDescription, and simple faceting would return all of them. Is there a
Thanks for your email, Koji. Can you please explain the roles of the
tokenizer and filters, so I can understand why I should not have two tokenizers
in the index analyzer and should have at least one tokenizer in the query analyzer?
My understanding is that the tokenizer says how the content should be indexed
physical
On 3/2/2016 9:55 AM, G, Rajesh wrote:
> Thanks for your email Koji. Can you please explain what is the role of
> tokenizer and filter so I can understand why I should not have two tokenizer
> in index and I should have at least one tokenizer in query?
You can't have two tokenizers. It's not all
Hi,
I noticed some strange behavior when deleting orphaned child documents in
Solr 5.3.1. I am indexing nested documents in a parent/child hierarchy. When I
delete a child document whose parent was already deleted previously, the child
document still shows up in search. I am using deleteById(
Well, with the understanding that someone who isn’t involved in the process is
describing something that isn’t built yet...
I could imagine changes like:
- Core discovery ignores cores that aren’t present in the ZK cluster state
- New cores are automatically created to bring a node in line with
When Solr indexes a document block it assigns a "_root_" field (holding the
parent's id) to every document in the block, but deleteById() is unaware of
it.
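A common workaround (sketched here with an illustrative ZooKeeper host,
collection and parent id) is to delete the whole block by the _root_ field
instead of by individual ids:

import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class DeleteBlock {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("localhost:2181"); // illustrative ZK host
        client.setDefaultCollection("mycollection");                    // illustrative collection
        // _root_ holds the parent's id for every document in the block,
        // so this removes the parent and all of its children together.
        client.deleteByQuery("_root_:parent-doc-1");                    // illustrative parent id
        client.close();
    }
}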
On Wed, Mar 2, 2016 at 8:16 PM, Naeem Tahir
wrote:
> Hi,
>
> I noticed some strange behavior when deleting orphaned child
> documents in Solr 5.3.1. I am indexing nested document
It makes no sense to facet on a “text_general” analyzed field. Can you give a
concrete example with a few dummy docs and show some queries (do you query the
tagDescription field?) and the desired facet output?
There may be several ways to solve the task, depending on the exact use case.
One solutio
Hello All
I am trying to import data from one SolrCloud cluster into another using
SolrEntityProcessor. My schema changed and I need to reindex.
1. Does SolrEntityProcessor work with SolrCloud to get data from SolrCloud?
It looks like it will not work, as the SolrEntityProcessor code is creating an
instance
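For context, a minimal sketch of a DataImportHandler config using
SolrEntityProcessor (the source URL, query and row size are illustrative);
note it talks to a single node over HTTP rather than being ZooKeeper-aware:

<dataConfig>
  <document>
    <!-- Pull documents from the source Solr over HTTP and re-index them
         into the collection this handler belongs to. -->
    <entity name="sourceSolr"
            processor="SolrEntityProcessor"
            url="http://source-host:8983/solr/source_collection"
            query="*:*"
            rows="500"/>
  </document>
</dataConfig>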
Hi,
I will try that approach: deleting and force merging before adding the
blocks.
In my case, yes, there are standalone docs (without any parents) and then
there are blocks with parents and their children in the same index.
Note however that docs in the blocks are unique in that the children,
there i
Hi,
For currency, as suggested in the wiki and guide, the field type
is currency, and by default it takes USD and the exchange
rates from the currency.xml file located in the conf dir. We have a script
that talks to Google APIs for the current currency exchange rates and
symlinked to
Has anyone tried -XX:ParGCCardsPerStrideChunk with Solr?
There have been reports of improved GC times.
--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
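For what it's worth, a sketch of where such a flag would typically be added
(assuming the standard solr.in.sh startup script; the stride value is
illustrative, and as a diagnostic VM option it also needs
-XX:+UnlockDiagnosticVMOptions):

# solr.in.sh
GC_TUNE="$GC_TUNE -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=4096"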
Hi,
We are using Solr 5.2 (on Windows 2012 Server / JDK 1.8) for document content
indexing/querying. We found that querying slows down intermittently under load
conditions.
In our analysis we found two issues.
1) Solr is not using its caches effectively.
Whenever a new document is indexed, it opens a new
1) Experiment with the autowarming settings in solrconfig.xml. Since in
your case you're indexing so frequently, consider setting the autowarm count to a
low number, so that not a lot of time is spent warming the caches.
Alternatively, if you're not very big on initial query response times being
small, you c
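For reference, a minimal sketch of the relevant cache settings in
solrconfig.xml (sizes and counts are illustrative, not a recommendation):

<query>
  <filterCache      class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="16"/>
  <queryResultCache class="solr.LRUCache"     size="512" initialSize="512" autowarmCount="16"/>
  <!-- documentCache entries are tied to internal doc ids and cannot be autowarmed -->
  <documentCache    class="solr.LRUCache"     size="512" initialSize="512"/>
</query>

A lower autowarmCount means each new searcher (opened on every commit) becomes
usable faster, at the cost of colder caches for the first queries.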
According to what you describe, I really don't see the need for core
discovery in SolrCloud. It would only be used to eagerly load a core on
startup.
If I understand correctly, when ZK = truth, this eager loading can/should
be done by consulting ZooKeeper instead of the local disk.
I agree that it is re
We do a soft commit when we insert/update a document.

// Insert document: the second argument to add() is commitWithin in ms
UpdateResponse resp = cloudServer.add(doc, 1000);
if (resp.getStatus() == 0)
{
    success = true;
}

// Update documents: same commitWithin, set on the UpdateRequest
UpdateRequest req = new UpdateRequest();
req.setCommitWithin(1000);
req.add(docs);
UpdateResponse resp =
Adding extra information.
Our index size is around 120 GB (2 shards + 2 replicas).
We have 400 GB of RAM on our Windows server. Solr is assigned 50 GB, so
a huge amount of free RAM (>300 GB) is available to the OS.
We have a very simple query which returns only 5 Solr documents. Under load
On Thu, Mar 3, 2016 at 7:18 AM, Sathyakumar Seshachalam <
sathyakumar_seshacha...@trimble.com> wrote:
> In my case, yes there are standalone docs (without any parents) and then
> there is blocks with parents and its children in the same index.
>
As far as I know you can't mix them. Can you try to
Can you share the cache stats from the admin panel?
Also how much load are you talking about here? (Queries/second)
How many documents do you have?
Are you fetching any large stored fields?
On Thu, 3 Mar 2016, 12:31 Maulin Rathod, wrote:
> Adding extra information.
>
> Our index size is around 1
Hi,
In SolrCloud you would want to upload your new currency.xml to ZK and then call
the Collections API for a reload.
Alternatively you could write your own exchange rate provider for Google by
implementing the ExchangeRateProvider interface.
The downside here is that each Solr node would then fetch
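A sketch of the upload-and-reload route (ZooKeeper host, config set, collection
name and paths are illustrative; zkcli.sh ships with Solr under
server/scripts/cloud-scripts):

# push the updated currency.xml into the collection's config set in ZooKeeper
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
  -cmd putfile /configs/myconfig/currency.xml /local/path/currency.xml

# reload the collection so the new exchange rates are picked up
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"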
Committing after every doc is an anti-pattern: all the in-memory structures
are thrown away after each update/insert.
Why do you think you need to do this? The usual pattern is to just let your
autocommit parameters in solrconfig.xml do this for you.
Ditto with specifying commitWithin on e
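For reference, a minimal sketch of those autocommit settings in solrconfig.xml
(the intervals are illustrative):

<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit every 60s: flushes segments, rolls the tlog -->
  <openSearcher>false</openSearcher>  <!-- don't open a new searcher on the hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>             <!-- soft commit every 5s: makes new documents visible to search -->
</autoSoftCommit>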