Are you getting out of order scores? Or does the score change between
requests? Can you show us some results that you are getting so we might
see what's going on?
Upayavira
On Fri, Sep 11, 2015, at 05:07 AM, Modassar Ather wrote:
> Thanks Erick and Upayavira for the responses. One thing which I n
It sounds to me like you are wanting to *filter* your document to only
include terms within that medical dictionary. Or to have a keyword field
based upon those of your 100k terms that appear in that doc.
Synonyms are your saviour, if that's the case. Create a synonyms list
for your terms, they ca
I have secured SolrCloud via basic authentication.
Now I am having difficulties creating cores and getting status information.
Solr keeps telling me that the request is unauthorized. However, I have
access to the admin UI after login.
How do I configure Solr to use the basic authentication creden
Many thanks pals.
I will walk some of those ways (and return with new questions)
;)
Best regards,
Francisco
On Fri., Sep 11, 2015 at 5:41 a.m., Upayavira
wrote:
> It sounds to me like you are wanting to *filter* your document to only
> include terms within that medical dictionar
OK, I downgraded to solr 5.2.x
Unfortunately still no luck. I followed two approaches:
1. Secure it the old fashioned way like described here:
http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-admin-password
2. Using the Basic Authentication Plugin like described here:
http://luci
There were some bugs in the 5.3.0 release, and 5.3.1 is in the
process of being released.
Try out option #2 with the RC here:
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.3.1-RC1-rev1702389/solr/
On Fri, Sep 11, 2015 at 5:16 PM, Merlin Morgenstern
wrote:
> OK, I downgrade
I'll try that. Thanks, Upayavira.
From: Upayavira [u...@odoko.co.uk]
Sent: 09 September 2015 19:30
To: solr-user@lucene.apache.org
Subject: Re: Solr Join between two indexes taking too long.
I've never reviewed that join query debug info - very interesting.
It will take a little while to set up a 5.3 version; hopefully I'll have some
results later next week.
From: Mikhail Khludnev [mkhlud...@griddynamics.com]
Sent: 11 September 2015 12:59
To: Russell Taylor
Subject: Re: Solr Join between two indexes taking too long.
Thank you all for your precious advice.
For now I'll just stick with building a stemmer and test the solr search
results.
Imtiaz Shakil Siddique
On Sep 11, 2015 3:20 AM, "Davis, Daniel (NIH/NLM) [C]"
wrote:
> Stop words for international indexing seem not too useful to me at this
> point. To
Greetings!
So, I've created my first index and am able to search programmatically
(through SolrJ) and through the Web interface. (Yay!) I get non-empty
results for my searches!
My index was built from database records using
/dataimport?command=full-import. I have 9936 records in the table
Running 4.8.1. I am experiencing the same problem where I get duplicates on
index update despite using overwrite=true when adding existing documents.
My duplicate ratio is a lot higher with maybe 25 - 50% of records having
duplicates (and as the index continues to run the duplicates increase from
2
Thank you for the info.
I have already downgraded to 5.2.x as this is a production setup.
Unfortunately I have the same trouble there ... Any suggestions on how to fix
this? What is the recommended procedure in securing the admin gui on prod
setups?
2015-09-11 14:26 GMT+02:00 Noble Paul :
> There
On 9/11/2015 8:25 AM, Mr Havercamp wrote:
> Running 4.8.1. I am experiencing the same problem where I get duplicates on
> index update despite using overwrite=true when adding existing documents.
> My duplicate ratio is a lot higher with maybe 25 - 50% of records having
> duplicates (and as the ind
Assuming the medical dictionary is constant, I would do a copyField of
text into a separate field and have that separate field use:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/miscellaneous/KeepWordFilterFactory.html
with words coming from the dictionary (normalized).
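A minimal schema sketch of that approach (the field and type names here are hypothetical, not from the thread; keepwords.txt would hold the normalized dictionary terms, one per line):

```xml
<!-- Hypothetical sketch: copy the main text into a second field whose
     analyzer keeps only tokens found in the medical dictionary. -->
<field name="text" type="text_general" indexed="true" stored="true"/>
<field name="dict_terms" type="text_keepwords" indexed="true" stored="false"/>
<copyField source="text" dest="dict_terms"/>

<fieldType name="text_keepwords" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- keepwords.txt: the ~100k dictionary terms, normalized to lowercase -->
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```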
The authorization plugin is new in Solr 5.3. It is hard to describe a secure
Solr 5.2.1 environment simply - the basics are to protect /solr by placing it
behind Apache httpd or nginx, and also a port-based firewall. I am most
familiar with Apache httpd and Linux/RedHat family.
Within the
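The reverse-proxy approach above might look roughly like this in httpd (a sketch only; the paths, realm name, and backend port are example values, not from the thread):

```apache
# Hypothetical httpd sketch: front Solr with basic auth on /solr.
<Location "/solr">
  AuthType Basic
  AuthName "Solr"
  AuthUserFile /etc/httpd/solr.htpasswd
  Require valid-user
  ProxyPass "http://127.0.0.1:8983/solr"
  ProxyPassReverse "http://127.0.0.1:8983/solr"
</Location>
```

Combined with a firewall that blocks direct access to the Solr port, only authenticated requests through httpd reach Solr.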
Hi Shawn
Thanks for your response.
fieldType def:
It is not SolrCloud.
Cheers
Hayden
On 11 September 2015 at 16:35, Shawn Heisey wrote:
> On 9/11/2015 8:25 AM, Mr Havercamp wrote:
> > Running 4.8.1. I am experiencing the same problem where I get duplicates
> on
> > index
Hi,
I'm having trouble negotiating the steep Solr learning curve...
1. I'm trying to store scanned and OCRed newspapers in PDF format into Solr
for full-text searching.
I've tried most (all?) of the examples and sample configurations that come
with Solr 5.3.0 and I can upload the PDFs.
Searching
On 9/11/2015 9:10 AM, Mr Havercamp wrote:
> fieldType def:
>
> sortMissingLast="true" />
>
> It is not SolrCloud.
As long as it's not a distributed index, I can't think of any problem
those field/type definitions might cause. Even if it were distributed
and you had the same do
Yeah, there are a lot of moving parts to connect
Let's see the highlight configuration you're
using. Should be in your solrconfig.xml file for the request
handler you're using. Are you calling out the field you want
highlighted in the hl.fl list?
Unfortunately getting specific fields populat
At query time, you could externally roll in the dups when they have the
same signature.
If you define your use case, it might be easier.
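One way to "roll in" duplicates at query time, assuming a signature field exists in the index (the field name here is hypothetical), is Solr's result grouping, which collapses documents sharing a field value:

```text
q=your+query&group=true&group.field=signature&group.limit=1
```

Each group then returns one representative document per signature value.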
On 09/11/2015 11:55 AM, Shawn Heisey wrote:
On 9/11/2015 9:10 AM, Mr Havercamp wrote:
fieldType def:
It is not SolrCloud.
As long
Are you by any chance using the MERGEINDEXES
core admin call? Or using MapReduceIndexerTool?
Neither of those delete duplicates
This is a fundamental part of Solr though, so it's
virtually certain that there's some innocent-seeming
thing you're doing that's causing this...
Best,
Erick
On Fr
Several ideas, all shots in the dark because to analyze this we
need the schema definitions and the result of your query with
&debug=true added. In particular you'll see the "parsed query"
section near the bottom, and often the parsed query isn't
quite what you think it is. In particular this is of
Thanks for the suggestions. No, not using MERGEINDEXES nor
MapReduceIndexerTool.
I've pasted the XML in case there is something broken there (cut
down for brevity, i.e. the "..."):
123456789/3Test
SubmissionTest Submission11Test Collectiontest
collection|||Test CollectionTest
Collectionyoung,
ha
I'm wondering if the commitWithin is causing issues.
On 11 September 2015 at 18:52, Mr Havercamp wrote:
> Thanks for the suggestions. No, not using MERGEINDEXES nor
> MapReduceIndexerTool.
>
> I've pasted the XML in case there is something broken there (cut
> down for brevity, i.e. the "..."):
Additional experimenting led me to the discovery that /dataimport does
*not* index words with a preceding %20 (a URL-encoded space), or in fact
*any* preceding %xx encoding. I can probably replace each %20 with a
'+' in each record of my database -- the dataimporter/indexer doesn't
sneeze at
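Rather than hand-replacing each %20, one workaround (a sketch, applied outside DIH so the indexer only ever sees plain words) is to URL-decode the database values before import, e.g. with Python's standard library:

```python
from urllib.parse import unquote_plus

def decode_record(text):
    """Decode %xx escapes (and '+') so the indexer sees plain words."""
    return unquote_plus(text)

print(decode_record("front%20page%20news"))  # -> front page news
```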
OK, this makes no sense whatsoever, so I'm missing something.
commitWithin shouldn't matter at all; there's code to handle multiple
updates between commits.
I'm _really_ shooting in the dark here, but...
> did you perhaps change the definition from the default "id"
to "key" without blowing away
Hi Francisco,
>> I have many drug product leaflets, each corresponding to one product. On
the other hand we have a medical dictionary with about 10^5 terms.
I want to detect all the occurrences of those terms for any leaflet
document.
Take a look at SolrTextTagger for this use case.
https://github.
Oh my. I'll leave it to the DIH guys to suggest whether there's
something that can be done with pure DIH, and offer a couple
of alternatives:
1> You could put a MappingCharFilterFactory in your analysis
chain. In the mapping file you can map things like:
"%20" => " " that would work with DIH as we
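A sketch of that mapping-file approach (the type and file names are hypothetical):

```xml
<!-- Hypothetical analyzer sketch: a char filter rewrites %xx escapes
     before tokenization. -->
<fieldType name="text_mapped" class="solr.TextField">
  <analyzer>
    <!-- mapping.txt contains lines like:  "%20" => " "  -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```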
Thanks!
On Fri, Sep 11, 2015 at 14:39, Sujit Pal wrote:
> Hi Francisco,
>
> >> I have many drug product leaflets, each corresponding to one product. On
> the other hand we have a medical dictionary with about 10^5 terms.
> I want to detect all the occurrences of those terms for any leaflet
> do
Hi,
I have a huge dataset of about 600 million documents.
These documents are relational, and I need to maintain the relations in Solr.
So I am indexing them as nested documents. There are nested documents within
nested documents.
Now, my problem is how to index them.
We are on Cloudera Solr 4.4
+1 on Sujit's recommendation: we have a similar use case (detecting drug
names / disease entities / MeSH terms) and have been using the
SolrTextTagger with great success.
We run a separate Solr instance as a tagging service and add the detected
tags as metadata fields to a document before it is i
Hi,
I'm using Solr 5.3.0 and noticed that the following code does not work
with Solr Cloud:
CollectionAdminRequest.Reload reloadReq = new
CollectionAdminRequest.Reload();
reloadReq.process(client, collection);
It complains that the name parameter is required. When adding
reloadReq.set
Hello Lewin,
Block join support was released in Solr 4.5.
On Fri, Sep 11, 2015 at 9:05 PM, Lewin Joy (TMS)
wrote:
> Hi,
>
> I am having a huge data of about 600 Million documents.
> These documents are relational and I need to maintain the relation in solr.
>
> So, I am Indexing them as nested d
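For reference, nested documents in Solr's XML update format are expressed by nesting `doc` elements inside the parent `doc` (the field names below are hypothetical, not from the thread):

```xml
<!-- Hypothetical sketch: one parent with one nested child document. -->
<add>
  <doc>
    <field name="id">parent-1</field>
    <field name="type">parent</field>
    <doc>
      <field name="id">child-1</field>
      <field name="type">child</field>
    </doc>
  </doc>
</add>
```

Block-join queries then rely on parents and their children being indexed together as one block.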
Hi,
I have a simple Solr 5.3 cloud setup with two nodes using a managed
schema. I'm creating a collection using a schema that initially only
contains the id field. When documents get added I'm dynamically adding
the required fields. Currently this fails quite consistently as in bug
SOLR-7536 but ca
Oh Yes. We are upgrading Cloudera to get solr 4.10 just to get this block join
feature.
But how do I index a nested document to use for block join with a dataset
this large?
I could not find anyway to sculpt the morphline file for this use case.
Thank you for the reply, Mikhail
-Lewin
-Ori
On 9/11/2015 3:12 PM, Hendrik Haddorp wrote:
> I'm using Solr 5.3.0 and noticed that the following code does not work
> with Solr Cloud:
> CollectionAdminRequest.Reload reloadReq = new
> CollectionAdminRequest.Reload();
> reloadReq.process(client, collection);
>
> It complains that the name
The full stack is:
[9/11/15 23:36:17:406 CEST] 0216 SystemErr R Caused by:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://xxx.xxx.xxx.xxx:10001/solr: Missing required
parameter: name
[9/11/15 23:36:17:406 CEST] 0216 SystemErr R
This certainly can be fixed. Can you create a JIRA for it? There
may be other calls that need fixing along similar lines.
On Fri, Sep 11, 2015 at 2:32 PM, Shawn Heisey wrote:
> On 9/11/2015 3:12 PM, Hendrik Haddorp wrote:
> > I'm using Solr 5.3.0 and noticed that the following code d
Hi Merlin,
Solr 5.2.x only supported Kerberos out of the box and introduced a
framework to write your own authentication/authorization plugin. If you
don't use Kerberos, the only sensible way forward for you would be to wait
for the 5.3.1 release to come out and then move to it.
Until then, or wi
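For when 5.3.1 is in place: the Basic Authentication Plugin is enabled through a security.json uploaded to ZooKeeper. The sketch below mirrors the Solr reference guide's example; the credential hash is the documented sample for user "solr" with password "SolrRocks" and should be replaced with your own:

```json
{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": { "solr": "admin" },
    "permissions": [ { "name": "security-edit", "role": "admin" } ]
  }
}
```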
You need to override
org.apache.solr.morphlines.solr.LoadSolrBuilder.LoadSolr.doProcess(Record).
Currently, LoadSolrBuilder.LoadSolr.convert(Record) copies all record fields
into SolrInputDocument fields.
SolrInputDocument.addChildDocument(SolrInputDocument) nests a doc.
On Fri, Sep 11, 2015 at 11:27 PM
I created https://issues.apache.org/jira/browse/SOLR-8042
On 11/09/15 23:41, Anshum Gupta wrote:
> This certainly can be fixed. Can you create a JIRA for the same? There
> might be other calls which might need fixing on similar lines.
>
> On Fri, Sep 11, 2015 at 2:32 PM, Shawn Heisey wrote:
>
>>