Re: Metadata and HTML ending up in searchable text

2016-05-31 Thread Simon Blandford
Hi Alex, That sounds similar. I am puzzled by what I am seeing because it looks like a major bug and I am following the docs for curl as closely as possible, but hardly anyone else seems to have noticed it. To me it is a show-stopper. If I convert the docs to txt with html2text first then I

help need example code of solrj to get schema of a given core

2016-05-31 Thread Liu, Ming (Ming)
Hello, I am very new to Solr, and I want to write a simple Java program to get a core's schema information, such as how many fields there are and the details of each field. I spent some time searching on the internet but could not find much information about this. The SolrJ wiki does not seem to have been updated for a long time. I am using

Re: Can a DocTransformer access the whole results tree?

2016-05-31 Thread Upayavira
I was always under the impression that a search component couldn't modify the output of a previous search component. If it can, then the highlight component could add its results to the output of the query component, and we're done. Upayavira (who sees the confusion on people's faces often when he

Re: searching in two indices

2016-05-31 Thread Mikhail Khludnev
Hello Bernd, I recently committed the [subquery] document transformer, which sounds like pretty much the same thing. Find the details at https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents It's not yet released, so I'd appreciate it if you could take a nightly build from https://builds.apache.or

Re: help need example code of solrj to get schema of a given core

2016-05-31 Thread Georg Sorst
Querying the schema can be done with the Schema API ( https://cwiki.apache.org/confluence/display/solr/Schema+API), which is fully supported by SolrJ: http://lucene.apache.org/solr/6_0_0/solr-solrj/org/apache/solr/client/solrj/request/schema/package-summary.html . Liu, Ming (Ming) schrieb am Di.,
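
The reply above points to the Schema API. As a rough, non-authoritative sketch of what a field listing looks like over plain HTTP (not the SolrJ classes the reply links to), the Python below builds the `/schema/fields` URL for a core and summarizes a response. The base URL, the core name `mycore`, and the `SAMPLE` payload are assumptions made up for illustration; the payload is hand-written in the general shape of a Schema API response and is not real server output.

```python
import json
from urllib.parse import quote


def schema_fields_url(base_url, core):
    """Build the Schema API URL that lists the fields of one core."""
    return f"{base_url.rstrip('/')}/{quote(core, safe='')}/schema/fields?wt=json"


def summarize_fields(response_text):
    """Return (field count, {field name: field type}) from a fields response."""
    payload = json.loads(response_text)
    fields = payload.get("fields", [])
    return len(fields), {f["name"]: f.get("type") for f in fields}


# Hand-written sample payload (assumption), only to exercise the parser.
SAMPLE = (
    '{"responseHeader": {"status": 0}, '
    '"fields": [{"name": "id", "type": "string"}, '
    '{"name": "title", "type": "text_general"}]}'
)

if __name__ == "__main__":
    print(schema_fields_url("http://localhost:8983/solr", "mycore"))
    print(summarize_fields(SAMPLE))
```

In a real program the response text would come from an HTTP GET against the generated URL; that step is left out here so the sketch runs without a Solr server.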

RE: Metadata and HTML ending up in searchable text

2016-05-31 Thread Allison, Timothy B.
>> From the same page, extractFormat=text only applies when extractOnly >> is true, which just shows the output from tika without indexing the document. Y, sorry. I just looked through the source code. You're right. If you use DIH (TikaEntityProcessor) instead of Solr Cell (ExtractingDocumen

Re: Solr vs JDBC driver

2016-05-31 Thread Vachon, Jean-Sébastien
I am using Java 8 (JDK 1.8.091) and it’s an application layer on top of Solr 6 using SolrJ. Here is the section of my pom.xml: <groupId>org.apache.solr</groupId> <artifactId>solr-solrj</artifactId> <version>6.0.0</version> I had to manually load the driver ("org.apache.solr.client.solrj.io.sql.DriverImpl") to

Re: searching in two indices

2016-05-31 Thread Bernd Fehling
Hi Mikhail, I will check that out, thanks. Regards, Bernd Am 31.05.2016 um 10:53 schrieb Mikhail Khludnev: > Hello Bernd, > > I recently committed [subquery] document transformer which sounds pretty > much the same. > Find the details at > https://cwiki.apache.org/confluence/display/solr/Transf

Solr leaking references to deleted files

2016-05-31 Thread Gavin Harcourt
Hi All, I've noticed on some of my solr nodes that the disk usage is increasing over time. After checking the output of lsof I found hundreds of references to deleted index files being held by solr. This totaled 24GB on a 16GB index. A restart of solr can obviously fix this but this is not an

Re: Clarity on Sharding Concepts.

2016-05-31 Thread Siddhartha Singh Sandhu
Thank you Mugeesh. On Tue, May 31, 2016 at 12:19 AM, Mugeesh Husain wrote: > Hi, > > To read out this document > > https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud > for proper understanding. > > FYI, you are using implicit router, a document will be divided

Sorting documents in one core based on a field in another core

2016-05-31 Thread Mark Robinson
Hi, I have a requirement to sort records in one core/ collection based on a field in another core/collection. Could some one please advise how it can be done in SOLR. I have used !join to restrict documents in one core based on field values in another core. Is there some way to sort like that?

Add a new field dynamically to each of the result docs and sort on it

2016-05-31 Thread Mark Robinson
Hi, My core does not have a field, say *fieldnew*. *Case 1:* But in my results I would like to have *fieldnew* also, and my results should be sorted on only this new field. *Case 2:* Just adding one more case. Suppose I have other fields also in the sort criteria and *fieldnew* is one am

Re: Add a new field dynamically to each of the result docs and sort on it

2016-05-31 Thread Erick Erickson
I really don't understand this. If you don't have "fieldnew", where is the value coming from? It's not in the index. If you mean you're _adding_ a field after the index already has some docs in it, then the normal sort rules apply and you can specify sortMissingFirst/sortMissingLast to tell Solr where o

Re: Sorting documents in one core based on a field in another core

2016-05-31 Thread Erick Erickson
Join doesn't work like that, which is why it's referred to as "pseudo join". There's no way that I know of to do what you want here. I'd strongly recommend you flatten your data at index time. Best, Erick On Tue, May 31, 2016 at 7:41 AM, Mark Robinson wrote: > Hi, > > I have a requirement to so

Re: Solr leaking references to deleted files

2016-05-31 Thread Erick Erickson
Possibly: SOLR-9116 or SOLR-9117? Note those two require that the core be reloaded, so you have to be doing something a bit unusual for them to be the problem. Best, Erick On Tue, May 31, 2016 at 5:41 AM, Gavin Harcourt wrote: > Hi All, > > I've noticed on some of my solr nodes that the disk usa

Re: float or string type for a field with whole number and decimal number values?

2016-05-31 Thread Erick Erickson
First, when changing the topic of the thread, please start a new thread. This is called "thread hijacking" and makes it difficult to find threads later. Collection aliasing does not do _anything_ about adding/deleting/whatever. It's just a way to do exactly what you want. Your clients point to myc

Re: Solr leaking references to deleted files

2016-05-31 Thread Gavin Harcourt
Those two bugs would make sense as we have been reloading the cores quite frequently recently to apply new config and schema changes. I'll keep an eye on the situation now our reload spree has ended and see if it recurs. Thanks, Gavin. On 31/05/16 16:14, Erick Erickson wrote: Possibly: SOLR-

Re: Clarity on Sharding Concepts.

2016-05-31 Thread Siddhartha Singh Sandhu
Hi Mugeesh, I was wondering whether sharding is done on: 1. index terms, with each shard covering the whole document space, or 2. the document space, with each shard holding roughly (number of documents / number of shards) of the documents. Regards, Sid. On Tue, May 31, 2016 at 12:19 AM, Mugeesh Husain w

Re: SolrCloud Shard console shows roughly same number of documents?

2016-05-31 Thread Siddhartha Singh Sandhu
Hi, I was wondering whether sharding is done on: 1. index terms, with each shard covering the whole document space, or 2. the document space, with each shard holding roughly (number of documents / number of shards) of the documents. Regards, Sid. On Tue, May 31, 2016 at 9:27 AM, Siddhartha Singh Sandhu <

Re: Sorting documents in one core based on a field in another core

2016-05-31 Thread Mark Robinson
Thanks for the reply Eric! Can we write a custom sort component to achieve this?... I am thinking of normalizing as the last option as clear separation of the cores helps me. Thanks! Mark. On Tue, May 31, 2016 at 11:12 AM, Erick Erickson wrote: > Join doesn't work like that, which is why it's

RE: Clarity on Sharding Concepts.

2016-05-31 Thread Garth Grimm
Both. One shard will have roughly half the documents, and the indices built from them; the other shard will have the other half of the documents, and the indices built from those. There won't be one location that contains all the documents, nor all the indices. -Original Message- From
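
The answer above says each shard holds its own slice of the documents plus the indices built from that slice only. A minimal sketch of that idea follows, assuming a toy whitespace tokenizer and a stand-in hash (`zlib.crc32` modulo the shard count); Solr's own router uses a different hash and hash ranges, so the assignment of a given document to a given shard here is purely illustrative.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id, num_shards):
    """Pick a shard for a document id (stand-in hash, not Solr's router)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def build_shards(docs, num_shards):
    """Split docs across shards; each shard indexes only its own docs."""
    shards = [{"docs": {}, "index": defaultdict(set)} for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shard = shards[shard_for(doc_id, num_shards)]
        shard["docs"][doc_id] = text
        for term in text.lower().split():
            shard["index"][term].add(doc_id)
    return shards


def search(shards, term):
    """Ask every shard for its partial hits and merge them."""
    hits = set()
    for shard in shards:
        hits |= shard["index"].get(term.lower(), set())
    return hits


if __name__ == "__main__":
    docs = {"a": "solr sharding", "b": "lucene index", "c": "solr index"}
    shards = build_shards(docs, 2)
    print([sorted(s["docs"]) for s in shards])
    print(sorted(search(shards, "solr")))
```

No single shard contains every document or every posting, which is why a query has to fan out to all shards and merge the partial results.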

Re: Add a new field dynamically to each of the result docs and sort on it

2016-05-31 Thread Mark Robinson
sorry Eric... I did not phrase it right ... what I meant was the field is there in the schema, but I do not have values for it when normal indexing happens. When a query comes in, I want to populate value for this field in the results based on some values passed in the query. So what needs to be ac

ClusterState says we are the leader, but locally we don't think so

2016-05-31 Thread Jon Drews
We have seen the following error on four separate instances of Solr. The result is that all or most shards go into "Down" state and do not recover on restart of Solr. I'm hoping one of you has some insight into what might be causing it as we haven't been able to track down the issue or reproduce i

Re: Add a new field dynamically to each of the result docs and sort on it

2016-05-31 Thread Shawn Heisey
On 5/31/2016 10:16 AM, Mark Robinson wrote: > sorry Eric... I did not phrase it right ... what I meant was the field is > there in the schema, but I do not have values for it when normal indexing > happens. > When a query comes in, I want to populate value for this field in the > results based on s

Re: SolrCloud Shard console shows roughly same number of documents?

2016-05-31 Thread Shawn Heisey
On 5/31/2016 9:53 AM, Siddhartha Singh Sandhu wrote: > I was speculating whether sharding is done on: 1. index terms with > each shard having the whole document space. 2. document space with > each shard have num(documents/no. of shards) of the documents divided > between them. If the router for

Re: Faceting and Grouping Performance Degradation in Solr 5

2016-05-31 Thread Alessandro Benedetti
Interesting developments : https://issues.apache.org/jira/browse/SOLR-9176 I think we found why term Enum seems slower in recent Solr ! In our case it is likely to be related to the commit I mention in the Jira. Have a check Joel ! On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti < abenede

Re: [Solr 6] Migration from Solr 4.10.2

2016-05-31 Thread Alessandro Benedetti
I think we found our performance killer here: https://issues.apache.org/jira/browse/SOLR-9176 Basically we were planning to use Term Enum, but under the hood Solr forces you to use FCS with single-valued numeric fields. It was not like that in Solr 4. I checked the related commit, and it

Re: [Solr 6] Legacy faceting Term Enum method VS DocValues

2016-05-31 Thread Alessandro Benedetti
Further investigations lead to : https://issues.apache.org/jira/browse/SOLR-9176 On Tue, May 24, 2016 at 12:47 PM, Alessandro Benedetti < abenede...@apache.org> wrote: > Hi guys, > It has been a while I was thinking about this and yesterday I took a look > into the code : > > I was wondering if

Re: ClusterState says we are the leader, but locally we don't think so

2016-05-31 Thread Jon Drews
I forgot to add that this is Apache Solr 5.3.1. There are three collections, two of which have one shard and the other has 3-5 shards. Approximately 200,000 documents across all collections. Jon Drews jondrews.com On Tue, May 31, 2016 at 12:15 PM, Jon Drews wrote: > We have seen the follow

Re: Solr vs JDBC driver

2016-05-31 Thread Joel Bernstein
You mentioned that you had to use Class.forName() for other drivers as well. Possibly there is something in your setup that is suppressing the driver auto loading. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, May 31, 2016 at 8:30 AM, Vachon, Jean-Sébastien < jvac...@cebglobal.com> wrote:

Alternate Port Not Working for Solr 6.0.0

2016-05-31 Thread Teague James
Hello, I am trying to install Solr 6.0.0 and have been successful with the default installation, following the instructions provided on the Apache Solr website. However, I do not want Solr running on port 8983, I want it to run on port 80. I started a new Ubuntu 14.04 VM, installed open JDK 8, the

Re: Alternate Port Not Working for Solr 6.0.0

2016-05-31 Thread John Bickerstaff
This may be no help at all, but my first thought is to wonder if anything else is already running on port 80? That might explain the somewhat silent "fail"... Nicely said by the way - resisting the urge On Tue, May 31, 2016 at 2:02 PM, Teague James wrote: > Hello, > > I am trying to install S

Re: Solr leaking references to deleted files

2016-05-31 Thread Erick Erickson
Cool, please let us know what you find out. On Tue, May 31, 2016 at 8:34 AM, Gavin Harcourt wrote: > Those two bugs would make sense as we have been reloading the cores quite > frequently recently to apply new config and schema changes. I'll keep an eye > on the situation now our reload spree has

Re: Alternate Port Not Working for Solr 6.0.0

2016-05-31 Thread Shawn Heisey
On 5/31/2016 2:02 PM, Teague James wrote: > Hello, I am trying to install Solr 6.0.0 and have been successful with > the default installation, following the instructions provided on the > Apache Solr website. However, I do not want Solr running on port 8983, > I want it to run on port 80. I started

Why Doesn't Solr Really Quit on Zookeeper Exceptions?

2016-05-31 Thread jimtronic
When I try to launch Solr 6.0 in cloud mode and connect it to a specific chroot in zookeeper that doesn't exist, I get an error in my solr.log. That's expected, but the solr process continues to launch and succeeds. Why wouldn't we want the start process simply to fail and exit? There's no mechan

Re: Add a new field dynamically to each of the result docs and sort on it

2016-05-31 Thread Erick Erickson
To have Lucene/Solr do the sorting, your value must be in the docs at search time. Consider the clause "&sort=my_field asc". If rows=10, then only the top 10 docs are kept. So if a doc's score is non-zero, its value is compared against the 10 docs in the list and either replaces one or is discarde
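
The point above is that only the top `rows` documents are kept while results are collected, so the sort value has to exist on each document at that moment. A small sketch of such a bounded collector is given below, assuming numeric sort values and an ascending sort; the function name and the dictionary-based documents are illustrative and do not mirror Lucene's actual collector classes.

```python
import heapq


def top_n_ascending(docs, field, rows):
    """Keep only the `rows` docs with the smallest `field` value in one pass."""
    if rows <= 0:
        return []
    # Max-heap via negated keys: heap[0] is the worst doc currently kept.
    heap = []
    for position, doc in enumerate(docs):
        entry = (-doc[field], -position, doc)
        if len(heap) < rows:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # The new doc beats the worst kept doc, so it replaces it.
            heapq.heapreplace(heap, entry)
        # Otherwise the doc is discarded immediately.
    return [doc for _, _, doc in sorted(heap, reverse=True)]


if __name__ == "__main__":
    docs = [{"id": i, "my_field": v} for i, v in enumerate([7, 3, 9, 1, 5])]
    print(top_n_ascending(docs, "my_field", 3))
```

Because each document is compared once and then either kept or dropped, a value that is only computed after collection has finished cannot influence which documents survive.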

Re: Why Doesn't Solr Really Quit on Zookeeper Exceptions?

2016-05-31 Thread Shawn Heisey
On 5/31/2016 2:34 PM, jimtronic wrote: > When I try to launch Solr 6.0 in cloud mode and connect it to a specific > chroot in zookeeper that doesn't exist, I get an error in my solr.log. > That's expected, but the solr process continues to launch and succeeds. > > Why wouldn't we want the start pro

Re: Sorting documents in one core based on a field in another core

2016-05-31 Thread Mikhail Khludnev
Hello Mark, Does it sound like what's described at http://blog-archive.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html ? On Tue, May 31, 2016 at 5:41 PM, Mark Robinson wrote: > Hi, > > I have a requirement to sort records in one core/ collection based on a > field in > another core/c

Re: DIH Delete with Full Import

2016-05-31 Thread nikosmarinos
Thank you Kiran. Simple and nice. I lost a day today trying to make the delta-import work. -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-Delete-with-Full-Import-tp4040070p4279981.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread John Bickerstaff
All -- I'm now attempting to use the hon_lucene_synonyms project from github. I found the documents that were referenced by the dead links on the readme in the repository -- however, given that I'm using Solr 5.4.x, I no longer have the need to integrate into a war file (as far as I can see). The s

Re: Why Doesn't Solr Really Quit on Zookeeper Exceptions?

2016-05-31 Thread Dennis Gove
The retry logic for errors in construction of SolrZooKeeper was added in https://issues.apache.org/jira/browse/SOLR-8599 and is in 5.5.1 and 6.0. I wonder if either that is not working as expected during startup or if startup is following a different code path. - Dennis On Tue, May 31, 2016 at 4:

Re: Why Doesn't Solr Really Quit on Zookeeper Exceptions?

2016-05-31 Thread jimtronic
Thanks Shawn. I'm leaning towards a retry as well. So, there's no mechanism that currently exists within Solr that would allow me to automatically retry the zookeeper connection on launch? My options then would be: 1. Externally monitor the status of Solr (eg /solr/admin/collections?action=CLUST
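
The message above leans towards retrying the connection from outside Solr. One way such an external retry could be sketched is shown below, assuming the caller supplies a hypothetical `action` callable that raises an exception while the dependency is unavailable; the function name, parameters, and backoff values are illustrative and are not part of Solr or ZooKeeper.

```python
import time


def retry(action, attempts=5, initial_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call `action` until it succeeds or `attempts` (>= 1) tries are used up."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error to the caller.
            sleep(delay)
            delay *= backoff  # Wait longer before each further attempt.


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Hypothetical check that fails twice before succeeding.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("not ready yet")
        return "connected"

    print(retry(flaky, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the waiting policy replaceable, which is convenient when the wrapper is driven by a monitoring script rather than run interactively.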

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread Joe Lawson
The docs are out of date for the synonym_edismax but it does work. Check out the tests for working examples. I'll try to update it soon. I've run the plugin on Solr 5 and 6, solrcloud and standalone. For running in SolrCloud make sure you follow https://cwiki.apache.org/confluence/display/solr/Addi

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread John Bickerstaff
Many thanks Joe! I'll follow the instructions on the linked webpage. On Tue, May 31, 2016 at 4:05 PM, Joe Lawson < jlaw...@opensourceconnections.com> wrote: > The docs are out of date for the synonym_edismax but it does work. Check > out the tests for working examples. I'll try to update it soon

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread Jeff Wartes
I’ve generally been dropping foreign plugin jars in this dir: server/solr-webapp/webapp/WEB-INF/lib/ This is because it then gets loaded by the same classloader as Solr itself, which can be useful if you’re, say, overriding some solr-protected-space method. If you don’t care about the classloader

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread John Bickerstaff
Thanks Jeff, I believe I tried that, and it still refused to load. But I'd sure love it to work since the other process is a bit convoluted - although I see its value in a large Solr installation. When I "locate" the jar on the linux command line I get: /opt/solr-5.4.0/server/solr-webapp/weba

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread John Bickerstaff
Jeff - Looking at the page, I'm unclear exactly how to set things up. I get using the blob api and I get adding the blob/jar to the collection, but the bit about runtimeLib=true is confusing. Does that go on the entry in the solrconfig.xml file like this? Is anything else required? (The bit a

Re: Add a new field dynamically to each of the result docs and sort on it

2016-05-31 Thread Chris Hostetter
: When a query comes in, I want to populate value for this field in the : results based on some values passed in the query. : So what needs to be accommodated in the result depends on a parameter in : the query and I would like to sort the final results on this field also, : which is dynamically p

Re: After Solr 5.5, mm parameter doesn't work properly

2016-05-31 Thread Greg Pendlebury
I don't think it is 8812. q.op was completely ignored by edismax prior to 5.5, so it is not mm that changed. If you do the same 5.4 query with q.op=OR I suspect it will not change the debug query at all. On 30 May 2016 at 21:07, Jan Høydahl wrote: > Hi, > > This may be related to SOLR-8812, but

Re: float or string type for a field with whole number and decimal number values?

2016-05-31 Thread Derek Poh
Sorry about that. Thank you for your explanation. I still have some questions on using and setting up a collection alias for my current situation. I will start a new thread on this. On 5/31/2016 11:21 PM, Erick Erickson wrote: First, when changing the topic of the thread, please start a new thr

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread Shawn Heisey
On 5/31/2016 3:13 PM, John Bickerstaff wrote: > The suggestion on the readme is that I can drop the > hon_lucene_synonyms jar file into the $SOLR_HOME directory, but this > does not seem to be working - I'm getting class not found exceptions. What I typically do with *all* extra jars (dataimport,