Are you getting out of order scores? Or does the score change between
requests? Can you show us some results that you are getting so we might
see what's going on?
Upayavira
On Fri, Sep 11, 2015, at 05:07 AM, Modassar Ather wrote:
> Thanks Erick and Upayavira for the responses. One thing which I n
It sounds to me like you are wanting to *filter* your document to only
include terms within that medical dictionary. Or to have a keyword field
based upon those of your 100k terms that appear in that doc.
Synonyms are your saviour, if that's the case. Create a synonyms list
for your terms, they ca
I have secured SolrCloud via basic authentication.
Now I am having difficulties creating cores and getting status information.
Solr keeps telling me that the request is unauthorized. However, I have
access to the admin UI after login.
How do I configure Solr to use the basic authentication creden
Many thanks pals.
I will walk some of those ways (and return with new questions)
;)
Best regards,
Francisco
On Fri., Sep 11, 2015 at 5:41 a.m., Upayavira
wrote:
> It sounds to me like you are wanting to *filter* your document to only
> include terms within that medical dictionar
OK, I downgraded to solr 5.2.x
Unfortunately still no luck. I followed two approaches:
1. Secure it the old fashioned way like described here:
http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-admin-password
2. Using the Basic Authentication Plugin like described here:
http://luci
There were some bugs in the 5.3.0 release, and 5.3.1 is in the
process of being released.
Try out option #2 with the RC here:
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.3.1-RC1-rev1702389/solr/
On Fri, Sep 11, 2015 at 5:16 PM, Merlin Morgenstern
wrote:
> OK, I downgrade
I'll try that. Thanks, Upayavira.
From: Upayavira [u...@odoko.co.uk]
Sent: 09 September 2015 19:30
To: solr-user@lucene.apache.org
Subject: Re: Solr Join between two indexes taking too long.
I've never reviewed that join query debug info - very interesting.
It will take a little while to set up a 5.3 version; hopefully I'll have some
results later next week.
From: Mikhail Khludnev [mkhlud...@griddynamics.com]
Sent: 11 September 2015 12:59
To: Russell Taylor
Subject: Re: Solr Join between two indexes taking too long.
Thank you all for your precious advice.
For now I'll just stick with building a stemmer and test the solr search
results.
Imtiaz Shakil Siddique
On Sep 11, 2015 3:20 AM, "Davis, Daniel (NIH/NLM) [C]"
wrote:
> Stop words for international indexing seem not too useful to me at this
> point. To
Greetings!
So, I've created my first index and am able to search programmatically
(through SolrJ) and through the Web interface. (Yay!) I get non-empty
results for my searches!
My index was built from database records using
/dataimport?command=full-import. I have 9936 records in the table
Running 4.8.1. I am experiencing the same problem where I get duplicates on
index update despite using overwrite=true when adding existing documents.
My duplicate ratio is a lot higher with maybe 25 - 50% of records having
duplicates (and as the index continues to run the duplicates increase from
2
Thank you for the info.
I have already downgraded to 5.2.x as this is a production setup.
Unfortunately I have the same trouble there ... Any suggestions on how to fix
this? What is the recommended procedure in securing the admin gui on prod
setups?
2015-09-11 14:26 GMT+02:00 Noble Paul :
> There
On 9/11/2015 8:25 AM, Mr Havercamp wrote:
> Running 4.8.1. I am experiencing the same problem where I get duplicates on
> index update despite using overwrite=true when adding existing documents.
> My duplicate ratio is a lot higher with maybe 25 - 50% of records having
> duplicates (and as the ind
Assuming the medical dictionary is constant, I would do a copyField of
text into a separate field and have that separate field use:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/miscellaneous/KeepWordFilterFactory.html
with words coming from the dictionary (normalized).
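A minimal schema sketch of that approach (the field and type names here are hypothetical, not from the thread; keepwords.txt would hold the normalized dictionary terms, one per line):

```xml
<!-- Hypothetical sketch: copy the main text into a second field whose
     analyzer keeps only tokens found in the medical dictionary. -->
<field name="text" type="text_general" indexed="true" stored="true"/>
<field name="dict_terms" type="text_keepwords" indexed="true" stored="false"/>
<copyField source="text" dest="dict_terms"/>

<fieldType name="text_keepwords" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- keepwords.txt: the ~100k dictionary terms, normalized to lowercase -->
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```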
The authorization plugin is new in Solr 5.3. It is hard to describe a secure
Solr 5.2.1 environment simply - the basics are to protect /solr by placing it
behind Apache httpd or nginx, and also a port-based firewall. I am most
familiar with Apache httpd and Linux/RedHat family.
Within the
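The reverse-proxy approach above might look roughly like this in httpd (a sketch only; the paths, realm name, and backend port are example values, not from the thread):

```apache
# Hypothetical httpd sketch: front Solr with basic auth on /solr.
<Location "/solr">
  AuthType Basic
  AuthName "Solr"
  AuthUserFile /etc/httpd/solr.htpasswd
  Require valid-user
  ProxyPass "http://127.0.0.1:8983/solr"
  ProxyPassReverse "http://127.0.0.1:8983/solr"
</Location>
```

Combined with a firewall that blocks direct access to the Solr port, only authenticated requests through httpd reach Solr.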
Hi Shawn
Thanks for your response.
fieldType def:
It is not SolrCloud.
Cheers
Hayden
On 11 September 2015 at 16:35, Shawn Heisey wrote:
> On 9/11/2015 8:25 AM, Mr Havercamp wrote:
> > Running 4.8.1. I am experiencing the same problem where I get duplicates
> on
> > index
Hi,
I'm having trouble negotiating the steep Solr learning curve...
1. I'm trying to store scanned and OCRed newspapers in PDF format into Solr
for full-text searching.
I've tried most (all?) of the examples and sample configurations that come
with Solr 5.3.0 and I can upload the PDFs.
Searching
On 9/11/2015 9:10 AM, Mr Havercamp wrote:
> fieldType def:
>
> sortMissingLast="true" />
>
> It is not SolrCloud.
As long as it's not a distributed index, I can't think of any problem
those field/type definitions might cause. Even if it were distributed
and you had the same do
Yeah, there are a lot of moving parts to connect
Let's see the highlight configuration you're
using. Should be in your solrconfig.xml file for the request
handler you're using. Are you calling out the field you want
highlighted in the hl.fl list?
Unfortunately getting specific fields populat
At query time, you could externally roll in the dups when they have the
same signature.
If you define your use case, it might be easier.
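One way to "roll in" duplicates at query time, assuming a signature field exists in the index (the field name here is hypothetical), is Solr's result grouping, which collapses documents sharing a field value:

```text
q=your+query&group=true&group.field=signature&group.limit=1
```

Each group then returns one representative document per signature value.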
On 09/11/2015 11:55 AM, Shawn Heisey wrote:
On 9/11/2015 9:10 AM, Mr Havercamp wrote:
fieldType def:
It is not SolrCloud.
As long
Are you by any chance using the MERGEINDEXES
core admin call? Or using MapReduceIndexerTool?
Neither of those delete duplicates
This is a fundamental part of Solr though, so it's
virtually certain that there's some innocent-seeming
thing you're doing that's causing this...
Best,
Erick
On Fr
Several ideas, all shots in the dark because to analyze this we
need the schema definitions and the result of your query with
&debug=true added. In particular you'll see the "parsed query"
section near the bottom, and often the parsed query isn't
quite what you think it is. In particular this is of
Thanks for the suggestions. No, not using MERGEINDEXES nor
MapReduceIndexerTool.
I've pasted the XML in case there is something broken there (cut
down for brevity, i.e. the "..."):
123456789/3Test
SubmissionTest Submission11Test Collectiontest
collection|||Test CollectionTest
Collectionyoung,
ha
I'm wondering if the commitWithin is causing issues.
On 11 September 2015 at 18:52, Mr Havercamp wrote:
> Thanks for the suggestions. No, not using MERGEINDEXES nor
> MapReduceIndexerTool.
>
> I've pasted the XML in case there is something broken there (cut
> down for brevity, i.e. the "..."):
Additional experimenting led me to the discovery that /dataimport does
*not* index words with a preceding %20 (a URL-encoded space), or in fact
*any* preceding %xx encoding. I can probably replace each %20 with a
'+' in each record of my database -- the dataimporter/indexer doesn't
sneeze at
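Rather than hand-replacing each %20, one workaround (a sketch, applied outside DIH so the indexer only ever sees plain words) is to URL-decode the database values before import, e.g. with Python's standard library:

```python
from urllib.parse import unquote_plus

def decode_record(text):
    """Decode %xx escapes (and '+') so the indexer sees plain words."""
    return unquote_plus(text)

print(decode_record("front%20page%20news"))  # -> front page news
```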
OK, this makes no sense whatsoever, so I'm missing something.
commitWithin shouldn't matter at all; there's code to handle multiple
updates between commits.
I'm _really_ shooting in the dark here, but...
> did you perhaps change the definition from the default "id"
to "key" without blowing away
Hi Francisco,
>> I have many drug product leaflets, each corresponding to one product. On
the other hand we have a medical dictionary with about 10^5 terms.
I want to detect all the occurrences of those terms for any leaflet
document.
Take a look at SolrTextTagger for this use case.
https://github.
Oh my. I'll leave it to the DIH guys to suggest whether there's
something that can be done with pure DIH, and offer a couple
of alternatives:
1> You could put a MappingCharFilterFactory in your analysis
chain. In the mapping file you can map things like:
"%20" => " " that would work with DIH as we
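A sketch of that mapping-file approach (the type and file names are hypothetical):

```xml
<!-- Hypothetical analyzer sketch: a char filter rewrites %xx escapes
     before tokenization. -->
<fieldType name="text_mapped" class="solr.TextField">
  <analyzer>
    <!-- mapping.txt contains lines like:  "%20" => " "  -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```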
Thanks!
On Fri, Sep 11, 2015 at 14:39, Sujit Pal wrote:
> Hi Francisco,
>
> >> I have many drug product leaflets, each corresponding to one product. On
> the other hand we have a medical dictionary with about 10^5 terms.
> I want to detect all the occurrences of those terms for any leaflet
> do
Hi,
I have a huge dataset of about 600 million documents.
These documents are relational, and I need to maintain the relations in Solr.
So I am indexing them as nested documents. There are nested documents within
nested documents.
Now, my problem is how to index them.
We are on Cloudera Solr 4.4
+1 on Sujit's recommendation: we have a similar use case (detecting drug
names / disease entities / MeSH terms) and have been using the
SolrTextTagger with great success.
We run a separate Solr instance as a tagging service and add the detected
tags as metadata fields to a document before it is i
Hi,
I'm using Solr 5.3.0 and noticed that the following code does not work
with Solr Cloud:
CollectionAdminRequest.Reload reloadReq = new
CollectionAdminRequest.Reload();
reloadReq.process(client, collection);
It complains that the name parameter is required. When adding
reloadReq.set
Hello Lewin,
Block join support was released in Solr 4.5.
On Fri, Sep 11, 2015 at 9:05 PM, Lewin Joy (TMS)
wrote:
> Hi,
>
> I am having a huge data of about 600 Million documents.
> These documents are relational and I need to maintain the relation in solr.
>
> So, I am Indexing them as nested d
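For reference, nested documents in Solr's XML update format are expressed by nesting `doc` elements inside the parent `doc` (the field names below are hypothetical, not from the thread):

```xml
<!-- Hypothetical sketch: one parent with one nested child document. -->
<add>
  <doc>
    <field name="id">parent-1</field>
    <field name="type">parent</field>
    <doc>
      <field name="id">child-1</field>
      <field name="type">child</field>
    </doc>
  </doc>
</add>
```

Block-join queries then rely on parents and their children being indexed together as one block.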
Hi,
I have a simple Solr 5.3 cloud setup with two nodes using a managed
schema. I'm creating a collection using a schema that initially only
contains the id field. When documents get added I'm dynamically adding
the required fields. Currently this fails quite consistently as in bug
SOLR-7536 but ca
Oh Yes. We are upgrading Cloudera to get solr 4.10 just to get this block join
feature.
But how do I index a nested document to use for block join with a dataset
this large?
I could not find anyway to sculpt the morphline file for this use case.
Thank you for the reply, Mikhail
-Lewin
-Ori
On 9/11/2015 3:12 PM, Hendrik Haddorp wrote:
> I'm using Solr 5.3.0 and noticed that the following code does not work
> with Solr Cloud:
> CollectionAdminRequest.Reload reloadReq = new
> CollectionAdminRequest.Reload();
> reloadReq.process(client, collection);
>
> It complains that the name
The full stack is:
[9/11/15 23:36:17:406 CEST] 0216 SystemErr R Caused by:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://xxx.xxx.xxx.xxx:10001/solr: Missing required
parameter: name
[9/11/15 23:36:17:406 CEST] 0216 SystemErr R
This certainly can be fixed. Can you create a JIRA for it? There
may be other calls that need fixing along similar lines.
On Fri, Sep 11, 2015 at 2:32 PM, Shawn Heisey wrote:
> On 9/11/2015 3:12 PM, Hendrik Haddorp wrote:
> > I'm using Solr 5.3.0 and noticed that the following code d
Hi Merlin,
Solr 5.2.x only supported Kerberos out of the box and introduced a
framework to write your own authentication/authorization plugin. If you
don't use Kerberos, the only sensible way forward for you would be to wait
for the 5.3.1 release to come out and then move to it.
Until then, or wi
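For when 5.3.1 is in place: the Basic Authentication Plugin is enabled through a security.json uploaded to ZooKeeper. The sketch below mirrors the Solr reference guide's example; the credential hash is the documented sample for user "solr" with password "SolrRocks" and should be replaced with your own:

```json
{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": { "solr": "admin" },
    "permissions": [ { "name": "security-edit", "role": "admin" } ]
  }
}
```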
You need to override
org.apache.solr.morphlines.solr.LoadSolrBuilder.LoadSolr.doProcess(Record).
Currently, LoadSolrBuilder.LoadSolr.convert(Record) copies all record fields
into SolrInputDocument fields.
SolrInputDocument.addChildDocument(SolrInputDocument) nests a doc.
On Fri, Sep 11, 2015 at 11:27 PM
I created https://issues.apache.org/jira/browse/SOLR-8042
On 11/09/15 23:41, Anshum Gupta wrote:
> This certainly can be fixed. Can you create a JIRA for the same? There
> might be other calls which might need fixing on similar lines.
>
> On Fri, Sep 11, 2015 at 2:32 PM, Shawn Heisey wrote:
>
>>