How to update the index from C/C++?

2013-08-29 Thread Kevin

I know how to query for a keyword with
http://localhost:8983/solr/collection1/select?q=*solr*&wt=json&indent=true
How do I add/update records? Thanks.
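
For reference, adding or updating documents is just another HTTP request to the
update handler, which a C/C++ program can send with any HTTP client such as
libcurl. A minimal sketch against the example collection1, using a made-up
id/title document:

curl 'http://localhost:8983/solr/collection1/update?commit=true' \
     -H 'Content-Type: application/json' \
     -d '[{"id":"doc1","title":"hello world"}]'

Re-posting a document with the same unique id overwrites (updates) it.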




Kevin

Solr - facet fields that contain other facet fields

2015-12-28 Thread Kevin Lopez
*What I am trying to accomplish: *
Generate a facet based on the documents uploaded and a text file containing
terms from a domain/ontology such that a facet is shown if a term is in the
text file and in a document (key phrase extraction).

*The problem:*
When I select the facet for the term "*not necessarily*" (note the space), I get
the results for the term "*not*". The field is tokenized and multivalued, which
leads me to believe that I cannot use a tokenized field as a facet field. I
tried to copy the values of the field to a text field with a KeywordTokenizer,
but when checking the schema browser I am told: "Sorry, no Term Info available
:(" This is after I delete the old index and upload the documents again. The
facet is coming from a field that is already copied from another field, so I
cannot copy this field to a text field with a KeywordTokenizer or StrField.
What can I do to fix this? Is there an alternate way to accomplish this?

*Here is my configuration:*

[The schema XML was stripped by the mailing-list archive; a partially preserved copy appears quoted in the reply below.]

Regards,

Kevin


Re: Solr - facet fields that contain other facet fields

2015-12-28 Thread Kevin Lopez
I am not sure I am following correctly. The field I upload the document to
would be "content"; the analyzed field is "ColonCancerField". The "content"
field contains the entire text of the document, in my case a PubMed abstract.
This is a tokenized field. I made this field untokenized and I still received
the same results [the results for "not" instead of "not necessarily" (in my
current example I have 2 docs with "not" and 1 doc with "not necessarily";
"not" is of course in the document that contains "not necessarily")]:

http://imgur.com/a/1bfXT

I also tried this:

http://localhost:8983/solr/Cytokine/select?&q=ColonCancerField:"not+necessarily"

I still receive the two documents, which is the same as doing
ColonCancerField:"not"

Just to clarify, the structure looks like this: *content (untokenized,
unanalyzed)* [copied to] ==> *ColonCancerField* (tokenized, analyzed). I then
browse ColonCancerField and the facets state that there is 1 document for
"not necessarily", but when selecting it, Solr returns 2 results.

-Kevin

On Mon, Dec 28, 2015 at 10:22 AM, Jamie Johnson  wrote:

> Can you do the opposite?  Index into an unanalyzed field and copy into the
> analyzed?
>
> If I remember correctly facets are based off of indexed values so if you
> tokenize the field then the facets will be as you are seeing now.
> On Dec 28, 2015 9:45 AM, "Kevin Lopez"  wrote:
>
> > *What I am trying to accomplish: *
> > Generate a facet based on the documents uploaded and a text file
> containing
> > terms from a domain/ontology such that a facet is shown if a term is in
> the
> > text file and in a document (key phrase extraction).
> >
> > *The problem:*
> > When I select the facet for the term "*not necessarily*" (we see there
> is a
> > space) and I get the results for the term "*not*". The field is tokenized
> > and multivalued. This leads me to believe that I can not use a tokenized
> > field as a facet field. I tried to copy the values of the field to a text
> > field with a keywordtokenizer. I am told when checking the schema
> browser:
> > "Sorry, no Term Info available :(" This is after I delete the old index
> and
> > upload the documents again. The facet is coming from a field that is
> > already copied from another field, so I cannot copy this field to a text
> > field with a keywordtokenizer or strfield. What can I do to fix this? Is
> > there an alternate way to accomplish this?
> >
> > *Here is my configuration:*
> >
> > 
> >
> >  > multiValued="true" type="Cytokine_Pass"/>
> > 
> > 
> > 
> > 
> > 
> >
> >> stored="true" multiValued="true"
> >termPositions="true"
> >termVectors="true"
> >termOffsets="true"/>
> >  > sortMissingLast="true" omitNorms="true">
> > 
> >  > minShingleSize="2" maxShingleSize="5"
> > outputUnigramsIfNoShingles="true"
> > />
> >   
> >   
> >  > synonyms="synonyms_ColonCancer.txt" ignoreCase="true" expand="true"
> > tokenizerFactory="solr.KeywordTokenizerFactory"/>
> >  > words="prefLabels_ColonCancer.txt" ignoreCase="true"/>
> >   
> > 
> > 
> >
> > Regards,
> >
> > Kevin
> >
>


Re: Solr - facet fields that contain other facet fields

2015-12-29 Thread Kevin Lopez
Erick,

I am not sure it is totally correct that "the only available terms are 'not'
and 'necessarily'". When I go into the schema browser I can see that there are
two terms, "not" and "not necessarily", with the correct counts. Unless these
are not the terms you are talking about; can you explain exactly what these
are?

http://imgur.com/m82CH2f

I see what you are saying; it may be best for me to do the entity extraction
separately and put the terms into a special field, although I would like the
terms to be highlighted (or to have some type of position so I can highlight
them).

Regards,

Kevin

On Mon, Dec 28, 2015 at 12:49 PM, Erick Erickson 
wrote:

> bq:  so I cannot copy this field to a text field with a
> keywordtokenizer or strfield
>
> 1> There is no restriction on whether a field is analyzed or not as far as
> faceting is concerned. You can freely facet on an analyzed field
> or String field or KeywordTokenized field. As Binoy says, though,
> faceting on large analyzed text fields is dangerous.
>
> 2> copyField directives are not chained. As soon as the
> field is received, before _anything_ is done the raw contents are
> pushed to the copyField destinations. So in your case the source
> for both copyField directives should be "content". Otherwise you
> get into interesting behavior if you, say,  copyField from A to B and
> have another copyField from B to A. I _suspect_ this is
> why you have no term info available, but check
>
> 3> This is not going to work as you're trying to implement it. If you
> tokenize, the only available terms are "not" and "necessarily". There
> is no "not necessarily" _token_ to facet on. If you use a String
> or KeywordAnalyzed field, likewise there is no "not necessarily"
> token, there will be a _single_ token that's the entire content of the
> field
> (I'm leaving aside, for instance, WordDelimiterFilterFactory
> modifications...).
>
> One way to approach this would be to recognize and index synthetic
> tokens representing the concepts. You'd pre-analyze the text, do your
> entity recognition and add those entities to a special "entity" field or
> some such. This would be an unanalyzed field that you facet on. Let's
> say your entity was "colon cancer". Whenever you recognized that in
> the text during indexing, you'd index "colon_cancer", or "disease_234"
> in your special field.
>
> Of course your app would then have to present this pleasingly, and
> rather than the app needing access to your dictionary the "colon_cancer"
> form would be easier to unpack.
>
> The fragility here is that changing your text file of entities would
> require
> you to re-index to re-inject them into documents.
>
> You could also, assuming you know all the entities that should match
> a given query, form facet _queries_ on the phrases. This could get to be
> quite a large query, but has the advantage of not requiring re-indexing.
> So you'd have something like
> facet.query=field:"not necessarily"&facet.query=field:certainly
> etc.
>
> Best,
> Erick
>
>
> On Mon, Dec 28, 2015 at 9:13 AM, Binoy Dalal 
> wrote:
> > 1) When faceting use field of type string. That'll rid you of your
> > tokenization problems.
> > Alternatively do not use any tokenizers.
> > Also turn doc values on for the field. It'll improve performance.
> > 2) If however you do need to use a tokenized field for faceting, make
> sure
> > that they're pretty short in terms of number of tokens or else your app
> > will die real soon.
> >
> > On Mon, 28 Dec 2015, 22:24 Kevin Lopez  wrote:
> >
> >> I am not sure I am following correctly. The field I upload the document
> to
> >> would be "content" the analyzed field is "ColonCancerField". The
> "content"
> >> field contains the entire text of the document, in my case a pubmed
> >> abstract. This is a tokenized field. I made this field untokenized and I
> >> still received the same results [the results for not instead of not
> >> necessarily (in my current example I have 2 docs with not and 1 doc with
> >> not necessarily {not is of course in the document that contains not
> >> necessarily})]:
> >>
> >> http://imgur.com/a/1bfXT
> >>
> >> I also tried this:
> >>
> >> http://localhost:8983/solr/Cytokine/select?&q=ColonCancerField
> >> :"not+necessarily"
> >>
> >> I still receive the two documents, which is the same as doing

Re: Solr - facet fields that contain other facet fields

2015-12-31 Thread Kevin Lopez
Hi Erick,

I believe I have found a solution and I am including plenty of detail for
future reference. I have taken your previous advice and decided to add a
field (cancerTerms) and put the terms there. But I am not doing this outside
of Solr. I am using the analysis chain and passing it through a
ScriptUpdateProcessor. Here I can take the results of the analysis chain and
store them on the document (as a strField). Then I facet on this field
(cancerTerms). This actually gives me the correct results; it does not have
the issue with "not" vs. "not necessarily" or anything similar. Also, I am no
longer storing the analysis chain field (I was previously). It makes no sense
to store it because it is a copy field (apparently copy fields only copy the
source text and then pipe it to the analyzer, and cannot be chained). I am
only storing the results of the chain (which is what is useful for faceting).

Here is a simplified view of what I am doing:

*Content* [is copied to] -> *ColonCancerField* (analysis chain [not stored;
produces tokenized strings]) -> *passed to update-script* (processes each
token as a string) [added to] -> *CancerTerms* (strField)

Here is an example document:

id: 2040ee23-c5dc-459c-969f-2ebf6c728184
title: Immune profile modulation of blood and mucosal eosinophils in nasal polyposis with concomitant asthma.
content: BACKGROUND: Chronic rhinosinusitis with nasal polyps (CRSwNP) is frequently associated with asthma. Mucosal eosinophil (EO) infiltrate has been found to correlate with asthma and disease severity but not necessarily in ..SNIP.. and could explain the low benefit of anti-IL-5 therapy for some patients with asthma and nasal polyposis.
cytokineTerms: t cell replacing factor, type ii interferon, c7, chemokine, interleukin 17 precursor, leukocyte mediator, interleukins, t cell replacing factor, t cell replacing factor, il9 protein, interferon alpha-5, cytokines, il9 protein
cancerTerms: but, not, not necessarily, although
_version_: 1522116540216901632
score: 1.0
Here is some of the code (please forgive the mess. I have included changes
for Solr ver. 5):

/***UpdateScript*/
> function getAnalyzerResult(analyzer, fieldName, fieldValue) {
>   var result = [];
>   var token_stream = analyzer.tokenStream(fieldName, new
> java.io.StringReader(fieldValue));//null value?
>   var term_att =
> token_stream.getAttribute(Packages.org.apache.lucene.analysis.tokenattributes.CharTermAttribute.class);
>   token_stream.reset();
>   while (token_stream.incrementToken()) {
> result.push(term_att.toString());
>   }
>   token_stream.end();
>   token_stream.close();
>   return result;
> }
> function processAdd(cmd) {
>   doc = cmd.solrDoc;  // org.apache.solr.common.SolrInputDocument
>   id = doc.getFieldValue("id");
>   logger.warn("update-script#processAdd: id=" + id);
>
>   var content = doc.getFieldValue("content"); // Comes from /update/extract
>   //facetList contains the actual facet terms
>   //facetAnalyzerName contains the Analyzer name for the term vector list
> names. (i.e the field type)
>   var facetList = ["cytokineTerms", "cancerTerms"];
>   var facetAnalyzerName = ["key_phrases", "ColonCancer"];
>   /*
> Loop through all of the facets, and get the analyzer and the name for
> the field
> Then add the terms to the document
>   */
>   for(var i = 0; i < facetList.length; i++){
> var analyzer =
> req.getCore().getLatestSchema().getFieldTypeByName(facetAnalyzerName[i]).getIndexAnalyzer();
> var terms = getAnalyzerResult(analyzer, null, content);
> for(var index = 0; index < terms.length; index++){
>  doc.addField(facetList[i], terms[index]);
> }
>   }
> }
> // The functions below must be defined, but there's rarely a need to
> implement
> // anything in these.
> function processDelete(cmd) {
>   // no-op
> }
> function processMergeIndexes(cmd) {
>   // no-op
> }
> function processCommit(cmd) {
>   // no-op
> }
> function processRollback(cmd) {
>   // no-op
> }
> function finish() {
>   // no-op
> }
> /***UpdateScript*/
> /updateRequestProcessorChain ***/
> <updateRequestProcessorChain name="script">
>   <processor class="solr.StatelessScriptUpdateProcessorFactory">
>     <str name="script">update-script.js</str>
>     <lst name="params">
>       <str name="config_param">example config parameter</str>
>     </lst>
>   </processor>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> /updateRequestProcessorChain ***/



>  java -Durl=http://localhost:8983/solr/Cytokine/update -Dauto
> -Dparams=update.chain=script -jar bin/post.jar
> C:/Users/Kevin/Downloads/pubmed_result.json


Sources:

   1.
   http://lucidworks.com/blog/2013/06/27/poor-mans-entity-extraction-with-solr/
   2. https://www.youtube.com/watch?v

Use SqlEntityProcessor in cached mode to repeat a query for a nested child element

2016-02-04 Thread Kevin Colgan
Hi everyone,

Is it possible to use SqlEntityProcessor in cached mode to repeat a query for a 
nested child element? I'd like to use the entity query once to consolidate 
information from the children to the parent, then another to actually index the 
entities as children. 

Here's an example of what I'm trying to do in the db-config file. The 
EventsTransformer consolidates information from child events and adds fields to 
the parent row. I had to add the two entities as the EventsTransformer will 
only add fields to the parent if child=false:

This is NOT working - the child event entities aren't being created:

[entity configuration stripped by the mailing-list archive; a partially preserved copy appears quoted in the reply below]

This IS working, but the events query is being run twice so indexing is twice
as slow:

[entity configuration stripped by the mailing-list archive]


Anyone got any idea how to do this? I've already tried nesting the second child 
entity inside the other but this didn't work.
Thanks,
Kevin


Re: Use SqlEntityProcessor in cached mode to repeat a query for a nested child element

2016-02-04 Thread Kevin Colgan
You're right, that was a mistake in my code. I did actually use cacheKey, but
that didn't work, so I was looking at the Java class for DIHCacheSupport to see
if there were any other settings I could use:
https://lucene.apache.org/solr/5_4_0/solr-dataimporthandler/index.html?org/apache/solr/handler/dataimport/DIHCacheSupport.html

There doesn't seem to be a lot of documentation or examples around for using
cacheKey with SqlEntityProcessor.

Regards,
Kevin

On Thursday, February 4, 2016 9:31 PM, Alexandre Rafalovitch 
 wrote:
 
 

 Where did cachePrimaryKey come from? The documentation has cacheKey:
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler

Regards,
    Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 5 February 2016 at 02:53, Kevin Colgan  wrote:
> Hi everyone,
>
> Is it possible to use SqlEntityProcessor in cached mode to repeat a query for 
> a nested child element? I'd like to use the entity query once to consolidate 
> information from the children to the parent, then another to actually index 
> the entities as children.
>
> Here's an example of what I'm trying to do in the db-config file. The 
> EventsTransformer consolidates information from child events and adds fields 
> to the parent row. I had to add the two entities as the EventsTransformer 
> will only add fields to the parent if child=false:
>
> This is NOT working - the child event entities aren't being created
>  query="select  from houses">
>            transformer="EventsTransformer"
>        name="events"
>        query="select '${houses.uid}_events_' || e_id::text AS uuid,fields> from events">
>            child=true
>        processor="SqlEntityProcessor" cachePrimaryKey="events_e_id" 
>cacheLookup="events_parsed.events_e_id" cacheImpl="SortedMapBackedCache"
>        name="events"
>        query="select  from events">
> 
>
> This is IS working but the events query is being run twice so indexing is 
> twice as slow
>
>  query="select  from houses">
>            transformer="EventsTransformer"
>        name="events_parsed"
>        query="select '${houses.uid}_events_' || e_id::text AS uuid, 
>e_id::text AS events_e_id, from events">
>            child=true
>        processor="SqlEntityProcessor" cachePrimaryKey="events_e_id" 
>cacheLookup="events_parsed.events_e_id" cacheImpl="SortedMapBackedCache"
>        transformer="EventsTransformer"
>        name="events_child"
>        query="select  from events">
> 
>
> Anyone got any idea how to do this? I've already tried nesting the second 
> child entity inside the other but this didn't work.
> Thanks,Kevin

 
  

[ANNOUNCE] YCSB 0.7.0 Release

2016-02-26 Thread Kevin Risden
On behalf of the development community, I am pleased to announce the
release of YCSB 0.7.0.

Highlights:

* GemFire binding replaced with Apache Geode (incubating) binding
* Apache Solr binding was added
* OrientDB binding improvements
* HBase Kerberos support and use single connection
* Accumulo improvements
* JDBC improvements
* Couchbase scan implementation
* MongoDB improvements
* Elasticsearch version increase to 2.1.1

Full release notes, including links to source and convenience binaries:
https://github.com/brianfrankcooper/YCSB/releases/tag/0.7.0

This release covers changes from the last month.


Re: NoSuchFileException errors common on version 5.5.0

2016-03-10 Thread Kevin Risden
This sounds related to SOLR-8587, and there is a fix in SOLR-8793 that isn't in
a release yet since it was committed after 5.5 went out.

Kevin Risden
Hadoop Tech Lead | Avalon Consulting, LLC <http://www.avalonconsult.com/>
M: 732 213 8417
LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
<http://www.google.com/+AvalonConsultingLLC> | Twitter
<https://twitter.com/avalonconsult>

-
This message (including any attachments) contains confidential information
intended for a specific individual and purpose, and is protected by law. If
you are not the intended recipient, you should delete this message. Any
disclosure, copying, or distribution of this message, or the taking of any
action based on it, is strictly prohibited.

On Thu, Mar 10, 2016 at 11:02 AM, Shawn Heisey  wrote:

> I have a dev system running 5.5.0.  I am seeing a lot of
> NoSuchFileException errors (for segments_XXX filenames).
>
> Here's a log excerpt:
>
> 2016-03-10 09:52:00.054 INFO  (qtp1012570586-821) [   x:inclive]
> org.apache.solr.core.SolrCore.Request [inclive]  webapp=/solr
> path=/admin/luke
> params={qt=/admin/luke&show=schema&wt=javabin&version=2} status=500 QTime=1
> 2016-03-10 09:52:00.055 ERROR (qtp1012570586-821) [   x:inclive]
> org.apache.solr.servlet.HttpSolrCall
> null:java.nio.file.NoSuchFileException:
> /index/solr5/data/data/inc_0/index/segments_ias
> at
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at
>
> sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
> at
>
> sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
> at
>
> sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
> at java.nio.file.Files.readAttributes(Files.java:1737)
> at java.nio.file.Files.size(Files.java:2332)
> at
> org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:209)
> 
>
> I did not include the full stacktrace, only up to the first Lucene/Solr
> class.
>
> Most of the error logs are preceded by a request to the /admin/luke
> handler, like you see above, but there are also entries where a failed
> request is not logged right before the error.  My index maintenance
> program calls /admin/luke to programmatically determine the uniqueKey
> for the index.
>
> These errors do not seem to actually interfere with Solr operation, but
> they do concern me.
>
> Thanks,
> Shawn
>
>


Re: Which line is solr following in terms of a BI Tool?

2016-04-13 Thread Kevin Risden
For Solr 6, ParallelSQL and the Solr JDBC driver are going to see further
development, as will JSON facets. The Solr JDBC driver that is in Solr 6
contains SOLR-8502. There are further improvements coming in SOLR-8659 that
didn't make it into 6.0. The Solr JDBC piece leverages ParallelSQL and in some
cases uses JSON facets under the hood.

The Solr JDBC driver should enable BI tools to connect to Solr and use the
language of SQL. This is also a familiar interface for many Java developers.
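
A minimal SolrJ JDBC sketch of what that looks like from Java (the ZooKeeper
host, collection and field names below are placeholders, not from this thread;
solr-solrj and its dependencies need to be on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SolrJdbcExample {
    public static void main(String[] args) throws Exception {
        // The driver registers itself via META-INF/services, so no Class.forName is needed.
        String url = "jdbc:solr://localhost:9983?collection=mycollection&aggregationMode=facet";
        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("select id, name from mycollection limit 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("id") + " : " + rs.getString("name"));
            }
        }
    }
}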

Just a note: Solr is not an RDBMS and shouldn't be treated like one even
with a JDBC driver. The Solr JDBC driver is more of a convenience for
querying.

Kevin Risden

On Tue, Apr 12, 2016 at 6:24 PM, Erick Erickson 
wrote:

> The unsatisfactory answer is that they have different characteristics.
>
> The analytics contrib does not work in distributed mode. It's not
> receiving a lot of love at this point.
>
> The JSON facets are estimations. Generally very close but are not
> guaranteed to be 100% accurate. The variance, as I understand it,
> is something on the order of < 1% in most cases.
>
> The pivot facets are accurate, but more expensive than the JSON
> facets.
>
> And, to make matters worse, the ParallelSQL way of doing some
> aggregations is going to give yet another approach.
>
> Best,
> Erick
>
> On Tue, Apr 12, 2016 at 7:15 AM, Pablo  wrote:
> > Hello,
> > I think this topic is important for Solr users that are planning to use
> > Solr as a BI tool.
> > Speaking about facets, nowadays there are three major ways of doing
> > (more or less) the same thing in Solr.
> > First, you have the pivot facets; on the other hand you have the Analytics
> > component, and finally you have the JSON Facet API.
> > So, which line is Solr following? Which of these components is going to be
> > under constant development, and which one is going to be deprecated sooner?
> > On Yonik's page, there are some tests that show how the JSON Facet API
> > performs better than the legacy facets, and the API was also way simpler
> > than the pivot facets, so in my case that was enough to base my solution
> > around the JSON API. But I would like to know what the thoughts of the
> > Solr developers are.
> >
> > Thanks!
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Which-line-is-solr-following-in-terms-of-a-BI-Tool-tp4269597.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Question on Solr JDBC driver with SQL client like DB Visualizer

2016-04-15 Thread Kevin Risden
>
> Page 11, the screenshot specifies to select a
> "solr-solrj-6.0.0-SNAPSHOT.jar" which is equivalent into
> "solr-solrj-6.0.0.jar" shipped with released version, correct?
>

Correct, the PDF was generated before 6.0.0 was released. The documentation
from SOLR-8521 is being migrated to here:

https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface#ParallelSQLInterface-SQLClientsandDatabaseVisualizationTools


> When I try adding that jar, it doesn't show up driver class, DBVisualizer
> still shows "No new driver class". Does it mean the class is not added to
> this jar yet?
>

I checked the Solr 6.0.0 release and the driver is there. I was testing it
yesterday for a blog series that I'm putting together.

Just for reference here is the output for the Solr 6 release:

tar -tvf solr-solrj-6.0.0.jar | grep sql
drwxrwxrwx  0 0  0       0 Apr  1 14:40 org/apache/solr/client/solrj/io/sql/
-rwxrwxrwx  0 0  0     842 Apr  1 14:40 META-INF/services/java.sql.Driver
-rwxrwxrwx  0 0  0   10124 Apr  1 14:40 org/apache/solr/client/solrj/io/sql/ConnectionImpl.class
-rwxrwxrwx  0 0  0   23557 Apr  1 14:40 org/apache/solr/client/solrj/io/sql/DatabaseMetaDataImpl.class
-rwxrwxrwx  0 0  0    4459 Apr  1 14:40 org/apache/solr/client/solrj/io/sql/DriverImpl.class
-rwxrwxrwx  0 0  0   28333 Apr  1 14:40 org/apache/solr/client/solrj/io/sql/ResultSetImpl.class
-rwxrwxrwx  0 0  0    5167 Apr  1 14:40 org/apache/solr/client/solrj/io/sql/ResultSetMetaDataImpl.class
-rwxrwxrwx  0 0  0   10451 Apr  1 14:40 org/apache/solr/client/solrj/io/sql/StatementImpl.class
-rwxrwxrwx  0 0  0     141 Apr  1 14:40 org/apache/solr/client/solrj/io/sql/package-info.class
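
For what it's worth, the driver class that DBVisualizer needs is the one
registered in META-INF/services/java.sql.Driver inside that jar; assuming
unzip is available, you can print it with:

unzip -p solr-solrj-6.0.0.jar META-INF/services/java.sql.Driver

which should list org.apache.solr.client.solrj.io.sql.DriverImpl (visible in
the listing above).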


Kevin Risden
Apache Lucene/Solr Committer
Hadoop and Search Tech Lead | Avalon Consulting, LLC
<http://www.avalonconsult.com/>
M: 732 213 8417
LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
<http://www.google.com/+AvalonConsultingLLC> | Twitter
<https://twitter.com/avalonconsult>

-
This message (including any attachments) contains confidential information
intended for a specific individual and purpose, and is protected by law. If
you are not the intended recipient, you should delete this message. Any
disclosure, copying, or distribution of this message, or the taking of any
action based on it, is strictly prohibited.


Re: Parallel SQL Interface returns "java.lang.NullPointerException" after reloading collection

2016-05-03 Thread Kevin Risden
What I think is happening is that, since the CloudSolrClient comes from the
SolrCache and the collection was reloaded, zkStateReader is actually null
because there was no cloudSolrClient.connect() call after the reload. I think
that would cause the NPE on anything that uses the zkStateReader, like
getClusterState().

ZkStateReader zkStateReader = cloudSolrClient.getZkStateReader();
ClusterState clusterState = zkStateReader.getClusterState();
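
To illustrate that diagnosis, here is a sketch of the guard that appears to be
missing (an assumption on my part, not the actual patch):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.ZkStateReader;

public class ClusterStateHelper {
    // Re-establish the ZkStateReader on a cached client before reading cluster state.
    public static ClusterState clusterState(CloudSolrClient cachedClient) {
        cachedClient.connect(); // initializes the ZkStateReader if it is not already set up
        ZkStateReader zkStateReader = cachedClient.getZkStateReader();
        return zkStateReader.getClusterState();
    }
}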


Kevin Risden
Apache Lucene/Solr Committer
Hadoop and Search Tech Lead | Avalon Consulting, LLC
<http://www.avalonconsult.com/>
M: 732 213 8417
LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
<http://www.google.com/+AvalonConsultingLLC> | Twitter
<https://twitter.com/avalonconsult>

-
This message (including any attachments) contains confidential information
intended for a specific individual and purpose, and is protected by law. If
you are not the intended recipient, you should delete this message. Any
disclosure, copying, or distribution of this message, or the taking of any
action based on it, is strictly prohibited.

On Mon, May 2, 2016 at 9:58 PM, Joel Bernstein  wrote:

> Looks like the loop below is throwing a Null pointer. I suspect the
> collection has not yet come back online. In theory this should be self
> healing and when the collection comes back online it should start working
> again. If not then that would be a bug.
>
> for(String col : clusterState.getCollections()) {
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 2, 2016 at 10:06 PM, Ryan Yacyshyn 
> wrote:
>
> > Yes stack trace can be found here:
> >
> > http://pastie.org/10821638
> >
> >
> >
> > On Mon, 2 May 2016 at 01:05 Joel Bernstein  wrote:
> >
> > > Can you post your stack trace? I suspect this has to do with how the
> > > Streaming API is interacting with SolrCloud. We can probably also
> create
> > a
> > > jira ticket for this.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Sun, May 1, 2016 at 4:02 AM, Ryan Yacyshyn  >
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'm exploring with parallel SQL queries and found something strange
> > after
> > > > reloading the collection: the same query will return a
> > > > java.lang.NullPointerException error. Here are my steps on a fresh
> > > install
> > > > of Solr 6.0.0.
> > > >
> > > > *Start Solr in cloud mode with example*
> > > > bin/solr -e cloud -noprompt
> > > >
> > > > *Index some data*
> > > > bin/post -c gettingstarted example/exampledocs/*.xml
> > > >
> > > > *Send query, which works*
> > > > curl --data-urlencode 'stmt=select id,name from gettingstarted where
> > > > inStock = true limit 2'
> http://localhost:8983/solr/gettingstarted/sql
> > > >
> > > > *Reload the collection*
> > > > curl '
> > > >
> > > >
> > >
> >
> http://localhost:8983/solr/admin/collections?action=RELOAD&name=gettingstarted
> > > > '
> > > >
> > > > After reloading, running the exact query above will return the null
> > > pointer
> > > > exception error. Any idea why?
> > > >
> > > > If I stop all Solr severs and restart, then it's fine.
> > > >
> > > > *java -version*
> > > > java version "1.8.0_25"
> > > > Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
> > > > Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
> > > >
> > > > Thanks,
> > > > Ryan
> > > >
> > >
> >
>


Re: Solr 6 / Solrj RuntimeException: First tuple is not a metadata tuple

2016-05-04 Thread Kevin Risden
>
> java.sql.SQLException: java.lang.RuntimeException: First tuple is not a
> metadata tuple
>

That is a client-side error message meaning that the statement couldn't be
handled. There should be better error handling around this, but it's not in
place currently.

And on Solr side, the logs seem okay:


The logs you shared don't seem to be the full logs. There will be a related
exception on the Solr server side, and that exception will explain the cause
of the problem.

Kevin Risden

On Wed, May 4, 2016 at 2:57 AM, deniz  wrote:

> I am trying to go through the steps here
> <https://sematext.com/blog/2016/04/26/solr-6-as-jdbc-data-source/>
> to start playing with the new API, but I am getting:
>
> java.sql.SQLException: java.lang.RuntimeException: First tuple is not a
> metadata tuple
> at
>
> org.apache.solr.client.solrj.io.sql.StatementImpl.executeQuery(StatementImpl.java:70)
> at com.sematext.blog.App.main(App.java:28)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> Caused by: java.lang.RuntimeException: First tuple is not a metadata tuple
> at
>
> org.apache.solr.client.solrj.io.sql.ResultSetImpl.<init>(ResultSetImpl.java:75)
> at
>
> org.apache.solr.client.solrj.io.sql.StatementImpl.executeQuery(StatementImpl.java:67)
> ... 6 more
>
>
>
> My code is
>
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.SQLException;
> import java.sql.Statement;
>
>
> /**
>  * Hello world!
>  *
>  */
> public class App
> {
> public static void main( String[] args )
> {
>
>
> Connection connection = null;
> Statement statement = null;
> ResultSet resultSet = null;
>
> try{
> String connectionString =
>
> "jdbc:solr://zkhost:port?collection=test&aggregationMode=map_reduce&numWorkers=1";
> connection = DriverManager.getConnection(connectionString);
> statement  = connection.createStatement();
> resultSet = statement.executeQuery("select id, text from test
> where tits=1 limit 5");
> while(resultSet.next()){
> String id = resultSet.getString("id");
> String nickname = resultSet.getString("text");
>
> System.out.println(id + " : " + nickname);
> }
> }catch(Exception e){
> e.printStackTrace();
> }finally{
> if (resultSet != null) {
> try {
> resultSet.close();
> } catch (Exception ex) {
> }
> }
> if (statement != null) {
> try {
> statement.close();
> } catch (Exception ex) {
> }
> }
> if (connection != null) {
> try {
> connection.close();
> } catch (Exception ex) {
> }
> }
> }
>
>
> }
> }
>
>
> I tried to figure out what is happening, but there is no more logs other
> than the one above. And on Solr side, the logs seem okay:
>
> 2016-05-04 15:52:30.364 INFO  (qtp1634198-41) [c:test s:shard1 r:core_node1
> x:test] o.a.s.c.S.Request [test]  webapp=/solr path=/sql
>
> params={includeMetadata=true&numWorkers=1&wt=json&version=2.2&stmt=select+id,+text+from+test+where+tits%3D1+limit+5&aggregationMode=map_reduce}
> status=0 QTime=3
> 2016-05-04 15:52:30.382 INFO  (qtp1634198-46) [c:test s:shard1 r:core_node1
> x:test] o.a.s.c.S.Request [test]  webapp=/solr path=/select
>
> params={q=(tits:"1")&distrib=false&fl=id,text,score&sort=score+desc&rows=5&wt=json&version=2.2}
> hits=5624 status=0 QTime=1
>
>
> The error is happening because of some missing handlers on errors on the
> code or because of some strict checks on IDE(Ideaj)? Anyone had similar
> issues while using sql with solrj?
>
>
> Thanks
>
> Deniz
>
>
>
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-6-Solrj-RuntimeException-First-tuple-is-not-a-metadata-tuple-tp4274451.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Upload core.properties to ZooKeeper

2015-08-06 Thread Kevin Lee
You should be able to use user defined properties within core.properties.  
However, it sounds like you are uploading core.properties to Zookeeper.  In 
SolrCloud, core.properties is not uploaded to Zookeeper.  You place 
core.properties within your core’s top level directory and the cores are 
automatically discovered.  Your configuration set which includes your 
solrconfig.xml and schema.xml would be uploaded to Zookeeper.

See the following around core discovery; the very last sentence on the page at 
the first link speaks to “user defined” properties, and the second link 
discusses the different ways to use user-defined properties.
https://cwiki.apache.org/confluence/display/solr/Defining+core.properties 
<https://cwiki.apache.org/confluence/display/solr/Defining+core.properties>
https://cwiki.apache.org/confluence/display/solr/Configuring+solrconfig.xml#Configuringsolrconfig.xml-SubstitutingPropertiesinSolrConfigFiles
 
<https://cwiki.apache.org/confluence/display/solr/Configuring+solrconfig.xml#Configuringsolrconfig.xml-SubstitutingPropertiesinSolrConfigFiles>

I just tried a quick test and defining a property in core.properties does get 
picked up and substituted in solrconfig.xml.  However, I tried to use 
solrcore.properties as described at the link above, but it did not work.  So 
you may need to just ensure your config set is uploaded (without 
core.properties) and make sure your core.properties exist in the root directory 
of each core with your custom properties.

See the section on “Upload a configuration directory” below for how to upload 
to Zookeeper.
https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities 
<https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities>
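
For reference, a typical upload with that script looks like this (the paths and
config name are placeholders):

server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181,localhost:2182,localhost:2183 \
    -cmd upconfig -confdir /path/to/mycore/conf -confname mycore_config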

- Kevin

> On Aug 6, 2015, at 4:37 AM, Upayavira  wrote:
> 
> Have you looked at the collections API? It has the ability to set
> properties against collections. I wonder if that'll achieve the same
> thing as adding them to core.properties? I've never used it myself, but
> wonder if it'll solve your issue.
> 
> Upayavira
> 
> On Thu, Aug 6, 2015, at 12:35 PM, marotosg wrote:
>> Hi,
>> 
>> I am in the process of migrating my master, slave Solr infraestructure to
>> SolrCloud.
>> At the moment I have several cores inside a folder with this structure
>> /MyCores
>> /MyCores/Core1
>> /MyCores/Core1/conf
>> /MyCores/Core1/core.properties
>> /MyCores/Core2
>> /MyCores/Core2/conf
>> /MyCores/Core1/core.properties
>> 
>> As you can see I have several cores with a core.properties file inside
>> which
>> contain several variables to populate solrconfig.xml and schema.xml.
>> 
>> When uploading this info into Zookeeper it fails during the creation of a
>> collection because it's not able to resolve the core.properties
>> variables.
>> 
>> Do you have any idea? Is it possible to use core.properties with
>> SolrCloud?
>> 
>> Thanks
>> Sergio
>> 
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Upload-core-properties-to-ZooKeeper-tp4221259.html
>> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Difficulties in getting Solrcloud running

2015-08-19 Thread Kevin Lee
Hi,

Have you created a collection yet?  If not, then there won’t be a graph yet.  
It doesn’t show up until there is at least one collection.
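
If not, you can create one from either node once they are up; for example (the
collection name and counts here are just placeholders):

bin/solr create -c testcollection -shards 2 -replicationFactor 2 -p 8983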

- Kevin

> On Aug 19, 2015, at 5:48 AM, Merlin Morgenstern 
>  wrote:
> 
> Hi everybody,
> 
> I am trying to set up SolrCloud on Ubuntu and somehow the graph on the admin
> interface does not show up. It is simply blank. The tree is available.
> 
> This is a test installation on one machine.
> 
> There are 3 zookeepers running.
> 
> I start two solr nodes like this:
> 
> solr-5.2.1$ bin/solr start -cloud -s server/solr1 -p 8983 -z
> zk1:2181,zk1:2182,zk1:2183 -noprompt
> 
> solr-5.2.1$ bin/solr start -cloud -s server/solr2 -p 8984 -z
> zk1:2181,zk1:2182,zk1:2183 -noprompt
> 
> zk1 is a local interface with 10.0.0.120
> 
> it all looks OK, no error messages.
> 
> Thank you in advance for any help on this



Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-08-29 Thread Kevin Lee
Hi,

I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t seem to 
be working quite right.  Not sure if I’m missing steps or there is a bug.  I am 
able to get it to protect access to a URL under a collection, but am unable to 
get it to secure access to the Admin UI.  In addition, after stopping the Solr 
and Zookeeper instances, the security.json is still in Zookeeper, however Solr 
is allowing access to everything again like the security configuration isn’t in 
place.

Contents of security.json taken from wiki page, but edited to produce valid 
JSON.  Had to move comma after 3rd from last “}” up to just after the last “]”.

{
"authentication":{
   "class":"solr.BasicAuthPlugin",
   "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= 
Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
},
"authorization":{
   "class":"solr.RuleBasedAuthorizationPlugin",
   "permissions":[{"name":"security-edit",
  "role":"admin"}],
   "user-role":{"solr":"admin"}
}}

Here are the steps I followed:

Upload security.json to zookeeper
./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
/security.json ~/solr/security.json

Use zkCli.sh from Zookeeper to ensure the security.json is in Zookeeper at 
/security.json.  It is there and looks like what was originally uploaded.

Start Solr Instances

Attempt to create a permission, however get the following error:
{
  "responseHeader":{
"status":400,
"QTime":0},
  "error":{
"msg":"No authorization plugin configured",
"code":400}}

Upload security.json again.
./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
/security.json ~/solr/security.json

Issue the following to try to create the permission again and this time it’s 
successful.
// Create a permission for mysearch endpoint
curl --user solr:SolrRocks -H 'Content-type:application/json' -d
'{"set-permission": {"name":"mycollection-search","collection":
"mycollection","path":"/mysearch","role": "search-user"}}'
http://localhost:8983/solr/admin/authorization

{
"responseHeader":{
  "status":0,
 "QTime":7}}

Issue the following commands to add users
curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication -H
'Content-type:application/json' -d '{"set-user": {"admin" : "password" }}'
curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication -H
'Content-type:application/json' -d '{"set-user": {"user" : "password" }}'

Issue the following command to add permission to users
curl -u solr:SolrRocks -H 'Content-type:application/json' -d '{ "set-user-role" 
: {"admin": ["search-user", "admin"]}}' 
http://localhost:8983/solr/admin/authorization
curl -u solr:SolrRocks -H 'Content-type:application/json' -d '{ "set-user-role" 
: {"user": ["search-user"]}}' http://localhost:8983/solr/admin/authorization

After executing the above, access to /mysearch is protected until I restart the 
Solr and Zookeeper instances.  However, the admin UI is never protected like 
the Wiki page says it should be once activated.

https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin
 
<https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin>

Why do the authentication and authorization plugins not stay activated after 
restart, and why is the Admin UI never protected?  Am I missing any steps?

Thanks,
Kevin

Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-08-31 Thread Kevin Lee
Anyone else running into any issues trying to get the authentication and 
authorization plugins in 5.3 working?

> On Aug 29, 2015, at 2:30 AM, Kevin Lee  wrote:
> 
> Hi,
> 
> I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t seem 
> to be working quite right.  Not sure if I’m missing steps or there is a bug.  
> I am able to get it to protect access to a URL under a collection, but am 
> unable to get it to secure access to the Admin UI.  In addition, after 
> stopping the Solr and Zookeeper instances, the security.json is still in 
> Zookeeper, however Solr is allowing access to everything again like the 
> security configuration isn’t in place.
> 
> Contents of security.json taken from wiki page, but edited to produce valid 
> JSON.  Had to move comma after 3rd from last “}” up to just after the last 
> “]”.
> 
> {
> "authentication":{
>   "class":"solr.BasicAuthPlugin",
>   "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= 
> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
> },
> "authorization":{
>   "class":"solr.RuleBasedAuthorizationPlugin",
>   "permissions":[{"name":"security-edit",
>  "role":"admin"}],
>   "user-role":{"solr":"admin"}
> }}
> 
> Here are the steps I followed:
> 
> Upload security.json to zookeeper
> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
> /security.json ~/solr/security.json
> 
> Use zkCli.sh from Zookeeper to ensure the security.json is in Zookeeper at 
> /security.json.  It is there and looks like what was originally uploaded.
> 
> Start Solr Instances
> 
> Attempt to create a permission, however get the following error:
> {
>  "responseHeader":{
>"status":400,
>"QTime":0},
>  "error":{
>"msg":"No authorization plugin configured",
>"code":400}}
> 
> Upload security.json again.
> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
> /security.json ~/solr/security.json
> 
> Issue the following to try to create the permission again and this time it’s 
> successful.
> // Create a permission for mysearch endpoint
>curl --user solr:SolrRocks -H 'Content-type:application/json' -d 
> '{"set-permission": {"name":"mycollection-search","collection": 
> “mycollection","path":”/mysearch","role": "search-user"}}' 
> http://localhost:8983/solr/admin/authorization
>
>{
>  "responseHeader":{
>"status":0,
>"QTime":7}}
>
> Issue the following commands to add users
> curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication -H 
> 'Content-type:application/json' -d '{"set-user": {"admin" : “password" }}’
> curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication -H 
> 'Content-type:application/json' -d '{"set-user": {"user" : “password" }}'
> 
> Issue the following command to add permission to users
> curl -u solr:SolrRocks -H 'Content-type:application/json' -d '{ 
> "set-user-role" : {"admin": ["search-user", "admin"]}}' 
> http://localhost:8983/solr/admin/authorization
> curl -u solr:SolrRocks -H 'Content-type:application/json' -d '{ 
> "set-user-role" : {"user": ["search-user"]}}' 
> http://localhost:8983/solr/admin/authorization
> 
> After executing the above, access to /mysearch is protected until I restart 
> the Solr and Zookeeper instances.  However, the admin UI is never protected 
> like the Wiki page says it should be once activated.
> 
> https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin
>  
> <https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin>
> 
> Why does the authentication and authorization plugin not stay activated after 
> restart and why is the Admin UI never protected?  Am I missing any steps?
> 
> Thanks,
> Kevin


Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-01 Thread Kevin Lee
Thanks for the clarification!  

So is the wiki page incorrect at 
https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin 
which says that the admin ui will require authentication once the authorization 
plugin is activated?

"An authorization plugin is also available to configure Solr with permissions 
to perform various activities in the system. Once activated, access to the Solr 
Admin UI and all requests will need to be authenticated and users will be 
required to have the proper authorization for all requests, including using the 
Admin UI and making any API calls."

If activating the authorization plugin doesn't protect the admin ui, how does 
one protect access to it?

Also, the issue I'm having is not just at restart.  According to the docs, 
security.json should be uploaded to Zookeeper before starting any of the Solr 
instances.  However, I tried to upload security.json before starting any of the 
Solr instances, and Solr would not pick up the security config until the 
instances were already running and the security.json was uploaded again.  I can 
see in the logs at startup that the Solr instances don't see any plugin enabled 
even though security.json is already in Zookeeper; after they are started and 
the security.json is uploaded again, I see them reconfigure to use the plugin.

Thanks,
Kevin

> On Aug 31, 2015, at 11:22 PM, Noble Paul  wrote:
> 
> Admin UI is not protected by any of these permissions. Only if you try
> to perform a protected operation , it asks for a password.
> 
> I'll investigate the restart problem and report my  findings
> 
>> On Tue, Sep 1, 2015 at 3:10 AM, Kevin Lee  wrote:
>> Anyone else running into any issues trying to get the authentication and 
>> authorization plugins in 5.3 working?
>> 
>>> On Aug 29, 2015, at 2:30 AM, Kevin Lee  wrote:
>>> 
>>> Hi,
>>> 
>>> I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t 
>>> seem to be working quite right.  Not sure if I’m missing steps or there is 
>>> a bug.  I am able to get it to protect access to a URL under a collection, 
>>> but am unable to get it to secure access to the Admin UI.  In addition, 
>>> after stopping the Solr and Zookeeper instances, the security.json is still 
>>> in Zookeeper, however Solr is allowing access to everything again like the 
>>> security configuration isn’t in place.
>>> 
>>> Contents of security.json taken from wiki page, but edited to produce valid 
>>> JSON.  Had to move comma after 3rd from last “}” up to just after the last 
>>> “]”.
>>> 
>>> {
>>> "authentication":{
>>> "class":"solr.BasicAuthPlugin",
>>> "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= 
>>> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
>>> },
>>> "authorization":{
>>> "class":"solr.RuleBasedAuthorizationPlugin",
>>> "permissions":[{"name":"security-edit",
>>>"role":"admin"}],
>>> "user-role":{"solr":"admin"}
>>> }}
>>> 
>>> Here are the steps I followed:
>>> 
>>> Upload security.json to zookeeper
>>> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
>>> /security.json ~/solr/security.json
>>> 
>>> Use zkCli.sh from Zookeeper to ensure the security.json is in Zookeeper at 
>>> /security.json.  It is there and looks like what was originally uploaded.
>>> 
>>> Start Solr Instances
>>> 
>>> Attempt to create a permission, however get the following error:
>>> {
>>> "responseHeader":{
>>>  "status":400,
>>>  "QTime":0},
>>> "error":{
>>>  "msg":"No authorization plugin configured",
>>>  "code":400}}
>>> 
>>> Upload security.json again.
>>> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
>>> /security.json ~/solr/security.json
>>> 
>>> Issue the following to try to create the permission again and this time 
>>> it’s successful.
>>> // Create a permission for mysearch endpoint
>>>  curl --user solr:SolrRocks -H 'Content-type:application/json' -d 
>>> '{"set-permission": {"name":"mycollection-search","collection": 
>>> “mycollection","path":”/mysearch","role"

Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-01 Thread Kevin Lee
The restart issues aside, I’m trying to lock down usage of the Collections API, 
but that does not seem to be working either.

Here is my security.json.  I’m using the “collection-admin-edit” permission and 
assigning it to the “adminRole”.  However, after uploading the new 
security.json and restarting the web browser, it doesn’t seem to require 
credentials when calling the RELOAD action on the Collections API.  The only 
thing that does work is the custom permission “browse”, which requires 
authentication before allowing me to pull up the page.  Am I using the 
permissions correctly for the RuleBasedAuthorizationPlugin?

{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "admin": " ",
      "user": " "
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [
      {
        "name": "security-edit",
        "role": "adminRole"
      },
      {
        "name": "collection-admin-edit",
        "role": "adminRole"
      },
      {
        "name": "browse",
        "collection": "inventory",
        "path": "/browse",
        "role": "browseRole"
      }
    ],
    "user-role": {
      "admin": ["adminRole", "browseRole"],
      "user": ["browseRole"]
    }
  }
}

I also tried adding the permission using the Authorization API, but it had no 
effect; the Collections API can still be invoked without a username and 
password.  I do see in the Solr logs that it picks up the updates, because it 
outputs the messages “Updating /security.json …”, “Security node changed”, 
“Initializing authorization plugin: solr.RuleBasedAuthorizationPlugin” and 
“Authentication plugin class obtained from ZK: solr.BasicAuthPlugin”.

Thanks,
Kevin

> On Sep 1, 2015, at 12:31 AM, Noble Paul  wrote:
> 
> I'm investigating why restarts or first time start does not read the
> security.json
> 
> On Tue, Sep 1, 2015 at 1:00 PM, Noble Paul  wrote:
>> I removed that statement
>> 
>> "If activating the authorization plugin doesn't protect the admin ui,
>> how does one protect access to it?"
>> 
>> One does not need to protect the admin UI. You only need to protect
>> the relevant API calls . I mean it's OK to not protect the CSS and
>> HTML stuff.  But if you perform an action to create a core or do a
>> query through admin UI , it automatically will prompt you for
>> credentials (if those APIs are protected)
>> 
>> On Tue, Sep 1, 2015 at 12:41 PM, Kevin Lee  wrote:
>>> Thanks for the clarification!
>>> 
>>> So is the wiki page incorrect at
>>> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin
>>>  which says that the admin ui will require authentication once the 
>>> authorization plugin is activated?
>>> 
>>> "An authorization plugin is also available to configure Solr with 
>>> permissions to perform various activities in the system. Once activated, 
>>> access to the Solr Admin UI and all requests will need to be authenticated 
>>> and users will be required to have the proper authorization for all 
>>> requests, including using the Admin UI and making any API calls."
>>> 
>>> If activating the authorization plugin doesn't protect the admin ui, how 
>>> does one protect access to it?
>>> 
>>> Also, the issue I'm having is not just at restart.  According to the docs 
>>> security.json should be uploaded to Zookeeper before starting any of the 
>>> Solr instances.  However, I tried to upload security.json before starting 
>>> any of the Solr instances, but it would not pick up the security config 
>>> until after the Solr instances are already running and then uploading the 
>>> security.json again.  I can see in the logs at startup that the Solr 
>>> 

Re: DataImportHandler scheduling

2015-09-01 Thread Kevin Lee
While it may be useful to have a scheduler for simple cases, I think there are 
too many variables to make it useful for everyone's case.  For example, I 
recently wrote a script that uses the Data Import Handler API to get the 
status, kick off the import, etc.  However, before allowing it to kick off, the 
script needed to query the database where the data was coming from to make sure 
it had finished its daily load; if it hadn't finished, the script waits a while 
to see if it will, and only then starts the import.  After the load is finished 
it does another check to ensure the expected number of docs was actually loaded 
by Solr based on the data from the database.
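
For the simple status check and kick-off, those are just HTTP calls to the DIH
handler, roughly like this (the core name and handler path are the usual
defaults and may differ in your setup):

# check whether a previous import is still running
curl 'http://localhost:8983/solr/mycore/dataimport?command=status'

# start a full import once the upstream database load has finished
curl 'http://localhost:8983/solr/mycore/dataimport?command=full-import&clean=true'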

If a scheduler were built into Solr, it would probably only cover the simple 
case; for production you'd probably need to write your own scripts and use your 
own scheduler anyway to ensure the loads are starting and completing as 
expected.

> On Sep 1, 2015, at 1:09 PM, William Bell  wrote:
> 
> We should add a simple scheduler in the UI. It is very useful. To schedule
> various actions:
> 
> - Full index
> - Delta Index
> - Replicate
> 
> 
> 
> 
>> On Tue, Sep 1, 2015 at 12:41 PM, Shawn Heisey  wrote:
>> 
>>> On 9/1/2015 11:45 AM, Troy Edwards wrote:
>>> My initial thought was to use scheduling built with DIH:
>>> http://wiki.apache.org/solr/DataImportHandler#Scheduling
>>> 
>>> But I think just a cron job should do the same for me.
>> 
>> The dataimport scheduler does not exist in any Solr version.  This is a
>> proposed feature, with the enhancement issue open for more than four years:
>> 
>> https://issues.apache.org/jira/browse/SOLR-2305
>> 
>> I have updated the wiki page to state the fact that the scheduler is a
>> proposed improvement, not a usable feature.
>> 
>> Thanks,
>> Shawn
> 
> 
> -- 
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076


Using bq param for negative boost

2015-09-02 Thread Kevin Lee
Hi,

I’m trying to use the bq param with edismax to boost all results where termA and 
termB do not appear in the field; but if phraseC appears, it doesn’t matter 
whether termA and termB appear.

The following works and boosts everything that doesn’t have termA and termB in 
myField so the effect is that all documents with termA and termB are pushed to 
the bottom of the result list.

myField:(*:* -termA -termB)^1
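
For context, that clause is passed as a bq parameter on an edismax request,
roughly like this (the collection, query and field names are placeholders):

curl 'http://localhost:8983/solr/mycollection/select' \
     --data-urlencode 'q=some query' \
     --data-urlencode 'defType=edismax' \
     --data-urlencode 'bq=myField:(*:* -termA -termB)^1'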

How would you add the second part where if phraseC is present, then termA and 
termB can be present?

I tried doing something like the following, but it is not working.

myField:(*:* ((-termA -termB) OR +”phraseC”))^1

Thanks!

Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-02 Thread Kevin Lee
I’ve found that completely exiting Chrome or Firefox and opening it back up 
re-prompts for credentials when they are required.  It was re-prompting on the 
/browse path, where authentication was working, each time I completely exited 
and started the browser again; however, it won’t re-prompt unless you exit 
completely and close all running instances, so I closed all instances each time 
to test.

However, to make sure, I also ran it from the command line via curl as 
suggested, and it still does not give any authentication error when issuing the 
command.  I get a success response from all the Solr instances that the reload 
was successful.

Not sure why the pre-canned permissions aren’t working, but the one to the 
request handler at the /browse path is.


> On Sep 1, 2015, at 11:03 PM, Noble Paul  wrote:
> 
> " However, after uploading the new security.json and restarting the
> web browser,"
> 
> The browser remembers your login , So it is unlikely to prompt for the
> credentials again.
> 
> Why don't you try the RELOAD operation using command line (curl) ?
> 
> On Tue, Sep 1, 2015 at 10:31 PM, Kevin Lee  wrote:
>> The restart issues aside, I’m trying to lockdown usage of the Collections 
>> API, but that also does not seem to be working either.
>> 
>> Here is my security.json.  I’m using the “collection-admin-edit” permission 
>> and assigning it to the “adminRole”.  However, after uploading the new 
>> security.json and restarting the web browser, it doesn’t seem to be 
>> requiring credentials when calling the RELOAD action on the Collections API. 
>>  The only thing that seems to work is the custom permission “browse” which 
>> is requiring authentication before allowing me to pull up the page.  Am I 
>> using the permissions correctly for the RuleBasedAuthorizationPlugin?
>> 
>> {
>>"authentication":{
>>   "class":"solr.BasicAuthPlugin",
>>   "credentials": {
>>"admin”:” ",
>>"user": ” "
>>}
>>},
>>"authorization":{
>>   "class":"solr.RuleBasedAuthorizationPlugin",
>>   "permissions": [
>>{
>>"name":"security-edit",
>>"role":"adminRole"
>>},
>>{
>>"name":"collection-admin-edit”,
>>"role":"adminRole"
>>},
>>{
>>"name":"browse",
>>"collection": "inventory",
>>"path": "/browse",
>>"role":"browseRole"
>>}
>>],
>>   "user-role": {
>>"admin": [
>>"adminRole",
>>"browseRole"
>>],
>>"user": [
>>"browseRole"
>>]
>>}
>>}
>> }
>> 
>> Also tried adding the permission using the Authorization API, but no effect, 
>> still isn’t protecting the Collections API from being invoked without a 
>> username/password.  I do see in the Solr logs that it sees the updates 
>> because it outputs the messages “Updating /security.json …”, “Security node 
>> changed”, “Initializing authorization plugin: 
>> solr.RuleBasedAuthorizationPlugin” and “Authentication plugin class obtained 
>> from ZK: solr.BasicAuthPlugin”.
>> 
>> Thanks,
>> Kevin
>> 
>>> On Sep 1, 2015, at 12:31 AM, Noble Paul  wrote:
>>> 
>>> I'm investigating why restarts or first time start does not read the
>>> security.json
>>> 
>>> On Tue, Sep 1, 2015 at 1:00 PM, Noble Paul  wrote:
>>>> I removed that statement
>>>> 
>>>> "If activating the authorization plugin doesn't protect the admin ui,
>>>> how does one protect access to it?"
>>>> 
>>>> One does not need to protect the admin UI. You only need to protect
>>>> the relevant API calls

Re: Error in creating a new collection

2015-09-03 Thread Kevin Lee
Configuration upload to Zookeeper and collection creation are two separate 
things, although they can be accomplished at the same time using /bin/solr.  You 
can upload configurations before you create collections, and you can have 
multiple configurations uploaded to Zookeeper at the same time.  I typically 
upload my configurations using zkcli.sh in the server/scripts/cloud-scripts 
directory in Solr and then use curl to send a request to create the collection 
based on a configuration I uploaded to Zookeeper instead of using the /bin/solr 
script to do the upload and creation all at once.  So you may have several 
configs uploaded, but you may not necessarily have created collections for 
them.  
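
As a rough sketch of that workflow (the Zookeeper address, paths and names below 
are placeholders):

  # upload a configset to Zookeeper
  server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
    -cmd upconfig -confdir /path/to/myconf/conf -confname myconf

  # create a collection that references the uploaded configset
  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=1&replicationFactor=1&collection.configName=myconf'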

If you need to update the configuration, just re-upload using zkcli.sh (from 
Solr server/scripts/cloud-scripts not ZkCli.sh in zookeeper/bin) and then try 
the collection creation again using curl or the browser to issue the 
Collections API create command.  Re-uploading will overwrite the config in 
Zookeeper.

I don't believe what you are seeing is a bug, as you can upload anything you 
want to Zookeeper, but it doesn't mean it's valid.  Only Solr knows that once 
it tries to load what you've uploaded to Zookeeper.

- Kevin

> On Sep 3, 2015, at 12:36 AM, shacky  wrote:
> 
> Hi Shalin,
> thank you very much for your answer.
> 
> I found out and managed in recreating the problem.
> 
> I created a new collection, with the wrong configset. I got the error
> and the collection was not created, good.
> But after that I continue to see the "SolrCore Initialization
> Failures" in the Solr Admin web interface on all three nodes.
> I had to restart Solr to remove that error.
> 
> From this point if I try to change the configset and recreate the same
> collection I still continue to get the same previous error.
> 
> So I discovered that the configuration is still in Zookeeper:
> 
> root@index1:~# /usr/share/zookeeper/bin/zkCli.sh
> Connecting to localhost:2181
> Welcome to ZooKeeper!
> JLine support is enabled
> 
> WATCHER::
> 
> WatchedEvent state:SyncConnected type:None path:null
> [zk: localhost:2181(CONNECTED) 0] ls /solr/configs
> [test, test2]
> 
> Even if the "test2" collection was not created due to the previous error.
> 
> I had to remove (rmr) the configuration in Zookeeper to be able to
> recreate the collection.
> 
> I think this is a bug, isn't it?
> The configuration should be removed from Zookeeper if the collection
> was not created due to an error...


Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-04 Thread Kevin Lee
Thanks, I downloaded the source and compiled it and replaced the jar file in 
the dist and solr-webapp’s WEB-INF/lib directory.  It does seem to be 
protecting the Collections API reload command now as long as I upload the 
security.json after startup of the Solr instances.  If I shutdown and bring the 
instances back up, the security is no longer in place and I have to upload the 
security.json again for it to take effect.
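
For reference, the upload step here is the usual zkcli putfile call; a sketch 
with placeholder Zookeeper address and file path:

  server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
    -cmd putfile /security.json /path/to/security.json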

- Kevin

> On Sep 3, 2015, at 10:29 PM, Noble Paul  wrote:
> 
> Both these are committed. If you could test with the latest 5.3 branch
> it would be helpful
> 
> On Wed, Sep 2, 2015 at 5:11 PM, Noble Paul  wrote:
>> I opened a ticket for the same
>> https://issues.apache.org/jira/browse/SOLR-8004
>> 
>> On Wed, Sep 2, 2015 at 1:36 PM, Kevin Lee  wrote:
>>> I’ve found that completely exiting Chrome or Firefox and opening it back up 
>>> re-prompts for credentials when they are required.  It was re-prompting 
>>> with the /browse path where authentication was working each time I 
>>> completely exited and started the browser again, however it won’t re-prompt 
>>> unless you exit completely and close all running instances so I closed all 
>>> instances each time to test.
>>> 
>>> However, to make sure I ran it via the command line via curl as suggested 
>>> and it still does not give any authentication error when trying to issue 
>>> the command via curl.  I get a success response from all the Solr instances 
>>> that the reload was successful.
>>> 
>>> Not sure why the pre-canned permissions aren’t working, but the one to the 
>>> request handler at the /browse path is.
>>> 
>>> 
>>>> On Sep 1, 2015, at 11:03 PM, Noble Paul  wrote:
>>>> 
>>>> " However, after uploading the new security.json and restarting the
>>>> web browser,"
>>>> 
>>>> The browser remembers your login , So it is unlikely to prompt for the
>>>> credentials again.
>>>> 
>>>> Why don't you try the RELOAD operation using command line (curl) ?
>>>> 
>>>> On Tue, Sep 1, 2015 at 10:31 PM, Kevin Lee  
>>>> wrote:
>>>>> The restart issues aside, I’m trying to lockdown usage of the Collections 
>>>>> API, but that also does not seem to be working either.
>>>>> 
>>>>> Here is my security.json.  I’m using the “collection-admin-edit” 
>>>>> permission and assigning it to the “adminRole”.  However, after uploading 
>>>>> the new security.json and restarting the web browser, it doesn’t seem to 
>>>>> be requiring credentials when calling the RELOAD action on the 
>>>>> Collections API.  The only thing that seems to work is the custom 
>>>>> permission “browse” which is requiring authentication before allowing me 
>>>>> to pull up the page.  Am I using the permissions correctly for the 
>>>>> RuleBasedAuthorizationPlugin?
>>>>> 
>>>>> {
>>>>>   "authentication":{
>>>>>  "class":"solr.BasicAuthPlugin",
>>>>>  "credentials": {
>>>>>   "admin”:” ",
>>>>>   "user": ” "
>>>>>   }
>>>>>   },
>>>>>   "authorization":{
>>>>>  "class":"solr.RuleBasedAuthorizationPlugin",
>>>>>  "permissions": [
>>>>>   {
>>>>>   "name":"security-edit",
>>>>>   "role":"adminRole"
>>>>>   },
>>>>>   {
>>>>>   "name":"collection-admin-edit”,
>>>>>   "role":"adminRole"
>>>>>   },
>>>>>   {
>>>>>   "name":"browse",
>>>>>   "collection": "inventory",
>>>>>   "path": "/browse",
>>>>>   "role":"browseRole"
>>>>>   }
>>>>>   ],
>>>>>  "us

Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-04 Thread Kevin Lee
Noble,

Does SOLR-8000 need to be re-opened?  Has anyone else been able to test the 
restart fix?  

At startup, these are the log messages that say there is no security 
configuration and the plugins aren’t being used even though security.json is in 
Zookeeper:
2015-09-04 08:06:21.205 INFO  (main) [   ] o.a.s.c.CoreContainer Security conf 
doesn't exist. Skipping setup for authorization module.
2015-09-04 08:06:21.205 INFO  (main) [   ] o.a.s.c.CoreContainer No 
authentication plugin used.

Thanks,
Kevin

> On Sep 4, 2015, at 5:47 AM, Noble Paul  wrote:
> 
> There are no download links for 5.3.x branch  till we do a bug fix release
> 
> If you wish to download the trunk nightly (which is not same as 5.3.0)
> check here 
> https://builds.apache.org/job/Solr-Artifacts-trunk/lastSuccessfulBuild/artifact/solr/package/
> 
> If you wish to get the binaries for 5.3 branch you will have to make it
> (you will need to install svn and ant)
> 
> Here are the steps
> 
> svn checkout 
> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_3/
> cd lucene_solr_5_3/solr
> ant server
> 
> 
> 
> On Fri, Sep 4, 2015 at 4:11 PM, davidphilip cherian
>  wrote:
>> Hi Kevin/Noble,
>> 
>> What is the download link to take the latest? What are the steps to compile
>> it, test and use?
>> We also have a use case to have this feature in solr too. Therefore, wanted
>> to test and above info would help a lot to get started.
>> 
>> Thanks.
>> 
>> 
>> On Fri, Sep 4, 2015 at 1:45 PM, Kevin Lee  wrote:
>> 
>>> Thanks, I downloaded the source and compiled it and replaced the jar file
>>> in the dist and solr-webapp’s WEB-INF/lib directory.  It does seem to be
>>> protecting the Collections API reload command now as long as I upload the
>>> security.json after startup of the Solr instances.  If I shutdown and bring
>>> the instances back up, the security is no longer in place and I have to
>>> upload the security.json again for it to take effect.
>>> 
>>> - Kevin
>>> 
>>>> On Sep 3, 2015, at 10:29 PM, Noble Paul  wrote:
>>>> 
>>>> Both these are committed. If you could test with the latest 5.3 branch
>>>> it would be helpful
>>>> 
>>>> On Wed, Sep 2, 2015 at 5:11 PM, Noble Paul  wrote:
>>>>> I opened a ticket for the same
>>>>> https://issues.apache.org/jira/browse/SOLR-8004
>>>>> 
>>>>> On Wed, Sep 2, 2015 at 1:36 PM, Kevin Lee 
>>> wrote:
>>>>>> I’ve found that completely exiting Chrome or Firefox and opening it
>>> back up re-prompts for credentials when they are required.  It was
>>> re-prompting with the /browse path where authentication was working each
>>> time I completely exited and started the browser again, however it won’t
>>> re-prompt unless you exit completely and close all running instances so I
>>> closed all instances each time to test.
>>>>>> 
>>>>>> However, to make sure I ran it via the command line via curl as
>>> suggested and it still does not give any authentication error when trying
>>> to issue the command via curl.  I get a success response from all the Solr
>>> instances that the reload was successful.
>>>>>> 
>>>>>> Not sure why the pre-canned permissions aren’t working, but the one to
>>> the request handler at the /browse path is.
>>>>>> 
>>>>>> 
>>>>>>> On Sep 1, 2015, at 11:03 PM, Noble Paul  wrote:
>>>>>>> 
>>>>>>> " However, after uploading the new security.json and restarting the
>>>>>>> web browser,"
>>>>>>> 
>>>>>>> The browser remembers your login , So it is unlikely to prompt for the
>>>>>>> credentials again.
>>>>>>> 
>>>>>>> Why don't you try the RELOAD operation using command line (curl) ?
>>>>>>> 
>>>>>>> On Tue, Sep 1, 2015 at 10:31 PM, Kevin Lee 
>>> wrote:
>>>>>>>> The restart issues aside, I’m trying to lockdown usage of the
>>> Collections API, but that also does not seem to be working either.
>>>>>>>> 
>>>>>>>> Here is my security.json.  I’m using the “collection-admin-edit”
>>> permission and assigning it to the “adminRole”.  However, after uploading
>>> the new security.json and restarting the web browser, it doesn’t seem to be
>>> requiring credentials 

Re: Config error mystery

2015-09-04 Thread Kevin Lee
Are you using a single instance or cloud?  What version of Solr are you using?  
In your solrconfig.xml, is the path to where you copied your library specified 
in a <lib> tag?  Do you have a jar file for the Postgres JDBC driver in your 
lib directory as well?

For a simple setup in 5.x I copy the jars to a lib directory under the 
core/collection directory.  For example, under server/solr/<core>/lib I 
would have the solr-dataimporthandler-<version>.jar and the JDBC driver jar 
file.  These should be automatically picked up without having to add anything 
to the solrconfig.xml in terms of <lib> tags.

For a production cloud deployment, I create a lib directory outside of the 
core/collection directory somewhere else on the file system so that it is easy 
to install without having to wait for a directory to be created by the 
Collections CREATE command and add the appropriate entry to the solrconfig.xml. 
 Then stick both jars in that directory.

Your error may be different, but as long as I have both jars in one of the two 
places mentioned above with the appropriate entry in solrconfig.xml if needed, 
then it has been working in my setups.
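
A minimal sketch of the solrconfig.xml entries for the external-lib-directory 
case (the directory path and jar name patterns are assumptions; adjust them to 
the actual files):

  <!-- load the DIH jar and the Postgres JDBC driver from an external lib directory -->
  <lib dir="/opt/solr-extra-libs/" regex="solr-dataimporthandler-.*\.jar" />
  <lib dir="/opt/solr-extra-libs/" regex="postgresql-.*\.jar" />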

- Kevin


> On Sep 4, 2015, at 9:40 AM, Mark Fenbers  wrote:
> 
> Greetings,
> 
> I'm moving on from the tutorials and trying to setup an index for my own data 
> (from a database).  All I did was add the following to the solrconfig.xml 
> (taken verbatim from the example in Solr documentation, except for the 
> name="config" pathname) and I get an error in the web-based UI.
> 
>   class="org.apache.solr.handler.dataimport.DataImportHandler" >
>
>/localapps/dev/EventLog/data-config.xml
>
>  
> 
> Because of this error, no /dataimport page is available in the Admin user 
> interface; therefore, I cannot visit the page 
> http://localhost:8983/solr/dataimport.  The actual error is:
> 
> org.apache.solr.common.SolrException: Error Instantiating requestHandler, 
> org.apache.solr.handler.dataimport.DataImportHandler failed to instantiate 
> org.apache.solr.request.SolrRequestHandler
>at org.apache.solr.core.SolrCore.(SolrCore.java:820)
>at org.apache.solr.core.SolrCore.(SolrCore.java:659)
>at org.apache.solr.core.CoreContainer.create(CoreContainer.java:727)
>at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:447)
>at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:438)
>at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.solr.common.SolrException: Error Instantiating 
> requestHandler, org.apache.solr.handler.dataimport.DataImportHandler failed 
> to instantiate org.apache.solr.request.SolrRequestHandler
>at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:588)
>at org.apache.solr.core.PluginBag.createPlugin(PluginBag.java:122)
>at org.apache.solr.core.PluginBag.init(PluginBag.java:217)
>at 
> org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:130)
>at org.apache.solr.core.SolrCore.(SolrCore.java:773)
>... 9 more
> Caused by: java.lang.ClassCastException: class 
> org.apache.solr.handler.dataimport.DataImportHandler
>at java.lang.Class.asSubclass(Class.java:3208)
>at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:475)
>at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:422)
>at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:567)
>... 13 more
> 
> 
> If I remove the <requestHandler> section and restart Solr, the error goes 
> away.  As best I can tell, the contents of
> /localapps/dev/EventLog/data-config.xml look fine, too.  See it here:
> 
> <dataConfig>
>     <dataSource ... url="jdbc:postgresql://dx1f/OHRFC" user="awips" />
>     <document>
>         <entity ... deltaQuery="SELECT posttime FROM eventlogtext WHERE 
> lastmodtime > '${dataimporter.last_index_time}'">
>             <field ... />
>             <field ... />
>             <field ... />
>             <field ... />
>         </entity>
>     </document>
> </dataConfig>
> 
> It seems to me that this problem could be a classpath issue, but I copied the 
> appropriate jar file into the solr/lib directory to be sure.  This made the 
> (slightly different) initial error go away, but now I cannot make this one go 
> away.
> 
> Any ideas?
> 
> Mark
> 
> 
> 



Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-08 Thread Kevin Lee
Thanks Dan!  Please let us know what you find.  I’m interested to know if this 
is an issue with anyone else’s setup, or if I have an issue in my local 
configuration that is still preventing it from working on start/restart.

- Kevin

> On Sep 5, 2015, at 8:45 AM, Dan Davis  wrote:
> 
> Kevin & Noble,
> 
> I'll take it on to test this.   I've built from source before, and I've
> wanted this authorization capability for awhile.
> 
> On Fri, Sep 4, 2015 at 9:59 AM, Kevin Lee  wrote:
> 
>> Noble,
>> 
>> Does SOLR-8000 need to be re-opened?  Has anyone else been able to test
>> the restart fix?
>> 
>> At startup, these are the log messages that say there is no security
>> configuration and the plugins aren’t being used even though security.json
>> is in Zookeeper:
>> 2015-09-04 08:06:21.205 INFO  (main) [   ] o.a.s.c.CoreContainer Security
>> conf doesn't exist. Skipping setup for authorization module.
>> 2015-09-04 08:06:21.205 INFO  (main) [   ] o.a.s.c.CoreContainer No
>> authentication plugin used.
>> 
>> Thanks,
>> Kevin
>> 
>>> On Sep 4, 2015, at 5:47 AM, Noble Paul  wrote:
>>> 
>>> There are no download links for 5.3.x branch  till we do a bug fix
>> release
>>> 
>>> If you wish to download the trunk nightly (which is not same as 5.3.0)
>>> check here
>> https://builds.apache.org/job/Solr-Artifacts-trunk/lastSuccessfulBuild/artifact/solr/package/
>>> 
>>> If you wish to get the binaries for 5.3 branch you will have to make it
>>> (you will need to install svn and ant)
>>> 
>>> Here are the steps
>>> 
>>> svn checkout
>> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_3/
>>> cd lucene_solr_5_3/solr
>>> ant server
>>> 
>>> 
>>> 
>>> On Fri, Sep 4, 2015 at 4:11 PM, davidphilip cherian
>>>  wrote:
>>>> Hi Kevin/Noble,
>>>> 
>>>> What is the download link to take the latest? What are the steps to
>> compile
>>>> it, test and use?
>>>> We also have a use case to have this feature in solr too. Therefore,
>> wanted
>>>> to test and above info would help a lot to get started.
>>>> 
>>>> Thanks.
>>>> 
>>>> 
>>>> On Fri, Sep 4, 2015 at 1:45 PM, Kevin Lee 
>> wrote:
>>>> 
>>>>> Thanks, I downloaded the source and compiled it and replaced the jar
>> file
>>>>> in the dist and solr-webapp’s WEB-INF/lib directory.  It does seem to
>> be
>>>>> protecting the Collections API reload command now as long as I upload
>> the
>>>>> security.json after startup of the Solr instances.  If I shutdown and
>> bring
>>>>> the instances back up, the security is no longer in place and I have to
>>>>> upload the security.json again for it to take effect.
>>>>> 
>>>>> - Kevin
>>>>> 
>>>>>> On Sep 3, 2015, at 10:29 PM, Noble Paul  wrote:
>>>>>> 
>>>>>> Both these are committed. If you could test with the latest 5.3 branch
>>>>>> it would be helpful
>>>>>> 
>>>>>> On Wed, Sep 2, 2015 at 5:11 PM, Noble Paul 
>> wrote:
>>>>>>> I opened a ticket for the same
>>>>>>> https://issues.apache.org/jira/browse/SOLR-8004
>>>>>>> 
>>>>>>> On Wed, Sep 2, 2015 at 1:36 PM, Kevin Lee >> 
>>>>> wrote:
>>>>>>>> I’ve found that completely exiting Chrome or Firefox and opening it
>>>>> back up re-prompts for credentials when they are required.  It was
>>>>> re-prompting with the /browse path where authentication was working
>> each
>>>>> time I completely exited and started the browser again, however it
>> won’t
>>>>> re-prompt unless you exit completely and close all running instances
>> so I
>>>>> closed all instances each time to test.
>>>>>>>> 
>>>>>>>> However, to make sure I ran it via the command line via curl as
>>>>> suggested and it still does not give any authentication error when
>> trying
>>>>> to issue the command via curl.  I get a success response from all the
>> Solr
>>>>> instances that the reload was successful.
>>>>>>>> 
>>>>>>>> Not sure why the pre-canned permissions aren’t working, bu

Questions regarding indexing JSON data

2015-09-20 Thread Kevin Vasko
I am new to Apache Solr and have been struggling with indexing some JSON files.

I have several TB of twitter data in JSON format that I am having trouble 
posting/indexing. I am trying to use a schemaless configuration so I don't have to add 
200+ record fields manually.

1.

The first issue is that none of the records have '[' or ']' wrapped around 
them. So a record looks like this:

 { "created_at": "Sun Apr 19 23:45:45 + 2015","id": 5.899379634353e+17, 
"id_str": "589937963435302912",}


Just to validate that the schemaless portion was working, I used a single "tweet" and 
trimmed it down to the bare minimum. The brackets not being in the original appears 
to be a problem: when I tried to process just a small portion of one record, 
it required me to wrap the row in [ ] (I assume to make it an array) to index 
correctly, like the following:

[{ "created_at": "Sun Apr 19 23:45:45 + 2015","id": 5.899379634353e+17, 
"id_str": "589937963435302912",}]

Is there a way around this? I didn't want to preprocess the TBs of JSON data 
that is in this format to add '[', ',' and ']' around all of the data.
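
To illustrate the behaviour described above; a sketch only, with the document 
trimmed and the endpoint assumed:

  # a bare top-level object sent to /update is read as an update command, not a document
  curl 'http://localhost:8983/solr/collection1/update?commit=true' \
    -H 'Content-Type: application/json' \
    -d '{"id_str": "589937963435302912", "created_at": "..."}'

  # wrapped in [ ] it is treated as a list of documents and indexes
  curl 'http://localhost:8983/solr/collection1/update?commit=true' \
    -H 'Content-Type: application/json' \
    -d '[{"id_str": "589937963435302912", "created_at": "..."}]'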

2. 

The second issue is that some of the fields have null values. 
e.g. "in_reply_to_status_id": null,

I think I figured out a way to resolve this by manually adding the field as a 
"strings" type, but if I miss one it will kick the file out. Just wanted to see 
if there is something I could add to the schemaless configuration to have it 
pick up null fields and treat them as strings automatically. Or is there a 
better way to handle this?
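
A sketch of declaring such a field up front via the Schema API, using the field 
name from the example in item 2 (the endpoint, collection and stored flag are 
assumptions):

  curl -X POST -H 'Content-Type: application/json' \
    'http://localhost:8983/solr/collection1/schema' \
    -d '{"add-field": {"name": "in_reply_to_status_id", "type": "strings", "stored": true}}'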


3. 
The last issue is, I think, my most difficult one: dealing with "nested" 
or "children" fields in my JSON data.

The data looks like this: https://gist.github.com/gnip/764239. Is there any way 
to index this information, preferably automatically (schemaless method), without 
having to flatten all of my data?

Thanks.


Lucene/Solr Git Mirrors 5 day lag behind SVN?

2015-10-23 Thread Kevin Risden
It looks like both Apache Git mirror (git://git.apache.org/lucene-solr.git)
and GitHub mirror (https://github.com/apache/lucene-solr.git) are 5 days
behind SVN. This seems to have happened before:
https://issues.apache.org/jira/browse/INFRA-9182

Is this a known issue?

Kevin Risden


CloudSolrClient query /admin/info/system

2015-10-26 Thread Kevin Risden
I am trying to use CloudSolrClient to query information about the Solr
server including version information. I found /admin/info/system and it
seems to provide the information I am looking for. However, it looks like
CloudSolrClient cannot query /admin/info since INFO_HANDLER_PATH [1] is not
part of the ADMIN_PATHS in CloudSolrClient.java [2]. Was this possibly
missed as part of SOLR-4943 [3]?

Is this an issue or is there a better way to query this information?
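
For what it's worth, the kind of call being attempted looks roughly like this; a 
sketch only, with the zkHost and collection assumed and error handling omitted:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class InfoHandlerProbe {
      public static void main(String[] args) throws Exception {
          CloudSolrClient client = new CloudSolrClient("localhost:2181"); // placeholder zkHost
          client.setDefaultCollection("collection1");                     // placeholder collection

          SolrQuery query = new SolrQuery();
          query.setRequestHandler("/admin/info/system"); // INFO_HANDLER_PATH

          // Since /admin/info is not in CloudSolrClient's ADMIN_PATHS, the request is
          // routed like a collection request rather than a node-level admin call.
          QueryResponse rsp = client.query(query);
          System.out.println(rsp);

          client.close();
      }
  }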

As a side note, ZK_PATH also isn't listed in ADMIN_PATHS. I'm not sure what
issues that could cause. Is there a reason that ADMIN_PATHS in
CloudSolrClient would be different than the paths in CommonParams [1]?

[1]
https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/common/params/CommonParams.java#L168
[2]
https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L808
[3] https://issues.apache.org/jira/browse/SOLR-4943

Kevin Risden
Hadoop Tech Lead | Avalon Consulting, LLC <http://www.avalonconsult.com/>
M: 732 213 8417
LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
<http://www.google.com/+AvalonConsultingLLC> | Twitter
<https://twitter.com/avalonconsult>


Re: CloudSolrClient query /admin/info/system

2015-10-27 Thread Kevin Risden
Created https://issues.apache.org/jira/browse/SOLR-8216

Kevin Risden
Hadoop Tech Lead | Avalon Consulting, LLC <http://www.avalonconsult.com/>
M: 732 213 8417
LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
<http://www.google.com/+AvalonConsultingLLC> | Twitter
<https://twitter.com/avalonconsult>


On Tue, Oct 27, 2015 at 5:11 AM, Alan Woodward  wrote:

> Hi Kevin,
>
> This looks like a bug in CSC - could you raise an issue?
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 26 Oct 2015, at 22:21, Kevin Risden wrote:
>
> > I am trying to use CloudSolrClient to query information about the Solr
> > server including version information. I found /admin/info/system and it
> > seems to provide the information I am looking for. However, it looks like
> > CloudSolrClient cannot query /admin/info since INFO_HANDLER_PATH [1] is
> not
> > part of the ADMIN_PATHS in CloudSolrClient.java [2]. Was this possibly
> > missed as part of SOLR-4943 [3]?
> >
> > Is this an issue or is there a better way to query this information?
> >
> > As a side note, ZK_PATH also isn't listed in ADMIN_PATHS. I'm not sure
> what
> > issues that could cause. Is there a reason that ADMIN_PATHS in
> > CloudSolrClient would be different than the paths in CommonParams [1]?
> >
> > [1]
> >
> https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/common/params/CommonParams.java#L168
> > [2]
> >
> https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L808
> > [3] https://issues.apache.org/jira/browse/SOLR-4943
> >
> > Kevin Risden
> > Hadoop Tech Lead | Avalon Consulting, LLC <http://www.avalonconsult.com/
> >
> > M: 732 213 8417
> > LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> |
> Google+
> > <http://www.google.com/+AvalonConsultingLLC> | Twitter
> > <https://twitter.com/avalonconsult>
>
>


CloudSolrClient Connect To Zookeeper with ACL Protected files

2015-11-13 Thread Kevin Lee
Hi,

Is there a way to use CloudSolrClient and connect to a Zookeeper instance where 
ACL is enabled and resources/files like /live_nodes, etc are ACL protected?  
Couldn’t find a way to set the ACL credentials.

Thanks,
Kevin

Re: CloudSolrClient Connect To Zookeeper with ACL Protected files

2015-11-17 Thread Kevin Lee
Does anyone know if it is possible to set the ACL credentials in 
CloudSolrClient needed to access a protected resource in Zookeeper?

Thanks!

> On Nov 13, 2015, at 1:20 PM, Kevin Lee  wrote:
> 
> Hi,
> 
> Is there a way to use CloudSolrClient and connect to a Zookeeper instance 
> where ACL is enabled and resources/files like /live_nodes, etc are ACL 
> protected?  Couldn’t find a way to set the ACL credentials.
> 
> Thanks,
> Kevin



Re: CloudSolrClient Connect To Zookeeper with ACL Protected files

2015-11-18 Thread Kevin Lee
Thanks Alan!

That works!  I was looking for a programmatic way to do it, but this will work 
for now, as that doesn’t seem to be supported.
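
For anyone following along, the system-properties route on that page looks 
roughly like this, e.g. appended to the JVM arguments of the SolrJ client (the 
provider class names are the documented ones; usernames and passwords are 
placeholders):

  -DzkACLProvider=org.apache.solr.common.cloud.VMParamsAllAndReadonlyDigestZkACLProvider
  -DzkCredentialsProvider=org.apache.solr.common.cloud.VMParamsSingleSetCredentialsDigestZkCredentialsProvider
  -DzkDigestUsername=admin-user -DzkDigestPassword=ADMIN-PASSWORD
  -DzkDigestReadonlyUsername=readonly-user -DzkDigestReadonlyPassword=READONLY-PASSWORD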

- Kevin

> On Nov 18, 2015, at 1:24 AM, Alan Woodward  wrote:
> 
> At the moment it seems that it's only settable via System properties - see 
> https://cwiki.apache.org/confluence/display/solr/ZooKeeper+Access+Control.  
> But it would be nice to do this programmatically as well, maybe worth opening 
> a JIRA ticket?
> 
> Alan Woodward
> www.flax.co.uk
> 
> 
> On 17 Nov 2015, at 16:44, Kevin Lee wrote:
> 
>> Does anyone know if it is possible to set the ACL credentials in 
>> CloudSolrClient needed to access a protected resource in Zookeeper?
>> 
>> Thanks!
>> 
>>> On Nov 13, 2015, at 1:20 PM, Kevin Lee  wrote:
>>> 
>>> Hi,
>>> 
>>> Is there a way to use CloudSolrClient and connect to a Zookeeper instance 
>>> where ACL is enabled and resources/files like /live_nodes, etc are ACL 
>>> protected?  Couldn’t find a way to set the ACL credentials.
>>> 
>>> Thanks,
>>> Kevin
>> 
> 



syntax for increasing java memory

2015-02-23 Thread Kevin Laurie
Hi Guys,
I am a newbie on Solr and I am just using it for Dovecot's sake.
Could you advise the correct syntax to increase the Java heap size using
the -Xmx option (or point me to some easy-to-read literature on configuring it)?
I would much appreciate your help. I just need this to sort out the problem
with my Dovecot FTS.
Thanks
Kevin


Re: syntax for increasing java memory

2015-02-23 Thread Kevin Laurie
Hi Walter,

I am running :-
Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_65 24.65-b04)

I tried running with this command:-

java -jar start.jar -Xmx1024m
WARNING: System properties and/or JVM args set.  Consider using --dry-run
or --exec
0[main] INFO  org.eclipse.jetty.server.Server  ? jetty-8.1.10.v20130312
61   [main] INFO  org.eclipse.jetty.deploy.providers.ScanningAppProvider  ?
Deployment monitor /opt/solr/contexts at interval 0

Still getting 500m.

Any advice? Will check java -X out.


On Tue, Feb 24, 2015 at 12:49 AM, Walter Underwood 
wrote:

> That depends on the JVM you are using. For the Oracle JVMs, use this to
> get a list of extended options:
>
> java -X
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> On Feb 23, 2015, at 8:21 AM, Kevin Laurie 
> wrote:
>
> > Hi Guys,
> > I am a newbie on Solr and I am just using it for dovecot sake.
> > Could you help advise the correct syntax to increase java heap size using
> > the  -xmx option(or advise some easy-to-read literature for configuring)
> ?
> > Much appreciate if you could help. I just need this to sort out the
> problem
> > with my Dovecot FTS.
> > Thanks
> > Kevin
>
>


Re: syntax for increasing java memory

2015-02-23 Thread Kevin Laurie
Hi Walter
Got it.
java -Xmx1024m -jar start.jar
Thanks
Kevin

On Tue, Feb 24, 2015 at 1:00 AM, Kevin Laurie 
wrote:

> Hi Walter,
>
> I am running :-
> Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_65 24.65-b04)
>
> I tried running with this command:-
>
> java -jar start.jar -Xmx1024m
> WARNING: System properties and/or JVM args set.  Consider using --dry-run
> or --exec
> 0[main] INFO  org.eclipse.jetty.server.Server  ? jetty-8.1.10.v20130312
> 61   [main] INFO  org.eclipse.jetty.deploy.providers.ScanningAppProvider
> ? Deployment monitor /opt/solr/contexts at interval 0
>
> Still getting 500m.
>
> Any advise? Will check java -X out.
>
>
> On Tue, Feb 24, 2015 at 12:49 AM, Walter Underwood 
> wrote:
>
>> That depends on the JVM you are using. For the Oracle JVMs, use this to
>> get a list of extended options:
>>
>> java -X
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>> On Feb 23, 2015, at 8:21 AM, Kevin Laurie 
>> wrote:
>>
>> > Hi Guys,
>> > I am a newbie on Solr and I am just using it for dovecot sake.
>> > Could you help advise the correct syntax to increase java heap size
>> using
>> > the  -xmx option(or advise some easy-to-read literature for
>> configuring) ?
>> > Much appreciate if you could help. I just need this to sort out the
>> problem
>> > with my Dovecot FTS.
>> > Thanks
>> > Kevin
>>
>>
>


apache solr - dovecot - some search fields works some dont

2015-02-23 Thread Kevin Laurie
Hi,
I finally understand how Solr works (somewhat); it's a bit complicated as I am
new to the whole concept, but I understand it as a search engine. I am using
Solr with Dovecot,
and I found out that some search fields from the inbox work and others don't.
For example, if I search To and From, Apache Solr processes it in
its log and gives me an output; however, if I search something in the
Body, it stalls and produces no output.
I am guessing this is some schema.xml problem. Could you advise?
Oh, I already addressed the java heap size problem.
I have underlined the syntax that shows it.
I am guessing it's only the body search that fails, and it might be
schema.xml related.



*3374412 [qtp1728413448-16] INFO  org.apache.solr.core.SolrCore  ?
[collection1] webapp=/solr path=/select
params={sort=uid+asc&fl=uid,score&q=subject:"dave"+OR+from:"dave"+OR+to:"dave"&fq=%2Bbox:ac553604f7314b54e6233555fc1a+%2Buser:"b...@email.net
"&rows=107161} hits=571 status=0 QTime=706 *
3379438 [qtp1728413448-18] INFO  org.apache.solr.servlet.
SolrDispatchFilter  ? [admin] webapp=null path=/admin/info/logging
params={_=1424714397078&since=1424711021771&wt=json} status=0 QTime=0
3389791 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714407453&since=1424711021771&wt=json} status=0 QTime=1
3400172 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714417834&since=1424711021771&wt=json} status=0 QTime=1
3410544 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714428205&since=1424711021771&wt=json} status=0 QTime=0
3420895 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714438558&since=1424711021771&wt=json} status=0 QTime=0
3431247 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714448908&since=1424711021771&wt=json} status=0 QTime=1
3441671 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714459334&since=1424711021771&wt=json} status=0 QTime=1
3452017 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714469679&since=1424711021771&wt=json} status=0 QTime=1
3462363 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714480026&since=1424711021771&wt=json} status=0 QTime=0
3472707 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714490369&since=1424711021771&wt=json} status=0 QTime=0
3483139 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714500802&since=1424711021771&wt=json} status=0 QTime=1
3493590 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714511246&since=1424711021771&wt=json} status=0 QTime=0
3504027 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714521691&since=1424711021771&wt=json} status=0 QTime=0
3514477 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714532137&since=1424711021771&wt=json} status=0 QTime=1
3524933 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714542598&since=1424711021771&wt=json} status=0 QTime=0
3535288 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714552951&since=1424711021771&wt=json} status=0 QTime=0
3545634 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714563290&since=1424711021771&wt=json} status=0 QTime=0
3556077 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714573714&since=1424711021771&wt=json} status=0 QTime=0
3566496 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714584157&since=1424711021771&wt=json} status=0 QTime=1
3576937 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714594601&since=1424711021771&wt=json} status=0 QTime=0
3587273 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714604939&sin

Re: apache solr - dovecot - some search fields works some dont

2015-02-24 Thread Kevin Laurie
Dear Alex,
Nothing comes back when I do a "body search". It shows a searching
process on the client but then it just stops and no result comes up.
I am wondering if this is a schema-related problem.

When I search a "subject" on the mail client I get output as below and :-

8025 [main] INFO  org.eclipse.jetty.server.AbstractConnector  ?
Started SocketConnector@0.0.0.0:8983
9001 [searcherExecutor-6-thread-1] INFO  org.apache.solr.core.SolrCore
 ? [collection1] Registered new searcher Searcher@7dfcb28[collection1]
main{StandardDirectoryReader(segments_4g:789:nrt _6z(4.10.2):C16672
_44(4.10.2):C6996 _56(4.10.2):C3672 _64(4.10.2):C4000
_8y(4.10.2):C3143 _7v(4.10.2):C673 _7b(4.10.2):C830 _85(4.10.2):C3754
_7k(4.10.2):C3975 _8f(4.10.2):C1516 _7n(4.10.2):C67 _9a(4.10.2):C677
_8o(4.10.2):C38 _8v(4.10.2):C40 _9l(4.10.2):C2705 _8x(4.10.2):C43
_90(4.10.2):C16 _9b(4.10.2):C22 _9d(4.10.2):C44 _9f(4.10.2):C84
_9h(4.10.2):C83 _9i(4.10.2):C356 _9j(4.10.2):C84 _9k(4.10.2):C296
_9m(4.10.2):C83 _9n(4.10.2):C57)}
155092 [qtp433527567-13] INFO  org.apache.solr.core.SolrCore  ?
[collection1] webapp=/solr path=/select
params={sort=uid+asc&fl=uid,score&q=subject:"price"&fq=%2Bbox:ac553604f7314b54e6233555fc1a+%2Buser:"u...@domain.net"&rows=107178}
hits=1237 status=0 QTime=1918

The content is quite large: about 27,000 emails.

Could you advise what this problem could be?
How do we correct and fix this problem then?

I might have the wrong schema installed so the body search is not
working. Could this be it?
Might post this on dovecot to see if someone could answer about this.

Kindly advise if you have any idea on this

P.S. How do I check the body definition?
Thanks
Kevin

On Tue, Feb 24, 2015 at 9:36 PM, Alexandre Rafalovitch
 wrote:
> What specifically do you mean by "stall"? Very slow but comes back?
> Never comes back? Throws an error?
>
> What is your field definition for body? How big is the content in it?
> Do you change the fields returned if you search body and if you search
> just headers?
> How many rows do you request back?
>
>
> One hypothesis: You are storing (stored=true) your body, it is very
> large and the stall happens not during search but during reading very
> large amount of text from disk to reconstitute the body to send it
> back.
>
> Regards,
>Alex.
>
> 
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>
> On 24 February 2015 at 02:06, Kevin Laurie  
> wrote:
>> For example if I were to search To and From apache solr would process it in
>> its log and give me an output, however if I were to search something in the
>> Body it would stall and no output.


Re: apache solr - dovecot - some search fields works some dont

2015-02-24 Thread Kevin Laurie
Dear Alex,
I checked the log. When searching the fields From, To, and Subject, it records
them.
When searching Body, there is no log entry. I am assuming it is a problem
in the schema.

Will post schema.xml output in next mail.

On Wed, Feb 25, 2015 at 1:09 AM, Alexandre Rafalovitch 
wrote:

> Look for the line like this in your log with the search matching the
> body. Maybe put a nonsense string and look for that. This should tell
> you what the Solr-side search looks like.
>
> The thing that worries me here is: rows=107178 - that's most probably
> what's blowing up Solr. You should be paging, not getting everything.
> And that number being like that, it may mean your client makes two
> requests, once to get the result count and once to get the rows
> themselves. It's the second request that is most probably blowing up.
>
> Once you get the request, you should be able to tell what fields are
> being searched and check those fields in schema.xml for field type and
> then field type's definition. Which is what I asked for in the
> previous email.
>
> Regards,
>Alex.
>
> On 24 February 2015 at 11:55, Kevin Laurie 
> wrote:
> > 155092 [qtp433527567-13] INFO  org.apache.solr.core.SolrCore  ?
> > [collection1] webapp=/solr path=/select
> >
> params={sort=uid+asc&fl=uid,score&q=subject:"price"&fq=%2Bbox:ac553604f7314b54e6233555fc1a+%2Buser:"
> u...@domain.net"&rows=107178}
> > hits=1237 status=0 QTime=1918
>
>
>
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>


Re: apache solr - dovecot - some search fields works some dont

2015-02-24 Thread Kevin Laurie
Hi Alex,
Sorry for such a noob question.
But where does the schema file go in Solr? Is the directory below correct?
/opt/solr/solr/collection1/data
Correct?
Thanks
Kevin

On Wed, Feb 25, 2015 at 1:21 AM, Kevin Laurie
 wrote:
> Dear Alex,
> I checked the log. When searching the fields From , To, Subject. It records
> it
> When searching Body, there is no log showing. I am assuming it is a problem
> in the schema.
>
> Will post schema.xml output in next mail.
>
> On Wed, Feb 25, 2015 at 1:09 AM, Alexandre Rafalovitch 
> wrote:
>>
>> Look for the line like this in your log with the search matching the
>> body. Maybe put a nonsense string and look for that. This should tell
>> you what the Solr-side search looks like.
>>
>> The thing that worries me here is: rows=107178 - that's most probably
>> what's blowing up Solr. You should be paging, not getting everything.
>> And that number being like that, it may mean your client makes two
>> requests, once to get the result count and once to get the rows
>> themselves. It's the second request that is most probably blowing up.
>>
>> Once you get the request, you should be able to tell what fields are
>> being searched and check those fields in schema.xml for field type and
>> then field type's definition. Which is what I asked for in the
>> previous email.
>>
>> Regards,
>>Alex.
>>
>> On 24 February 2015 at 11:55, Kevin Laurie 
>> wrote:
>> > 155092 [qtp433527567-13] INFO  org.apache.solr.core.SolrCore  ?
>> > [collection1] webapp=/solr path=/select
>> >
>> > params={sort=uid+asc&fl=uid,score&q=subject:"price"&fq=%2Bbox:ac553604f7314b54e6233555fc1a+%2Buser:"u...@domain.net"&rows=107178}
>> > hits=1237 status=0 QTime=1918
>>
>>
>>
>> 
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>
>


Re: apache solr - dovecot - some search fields works some dont

2015-02-24 Thread Kevin Laurie
Hi Alex,

Below is where my schema is stored:-

/opt/solr/solr/collection1/conf#

File name: schema.xml

Below is the output for body:

[schema.xml excerpt for the "body" field and its fieldType/analyzer chain; the
XML markup was stripped in the archive]

Anything you see that I should be concerned about?




On Wed, Feb 25, 2015 at 1:27 AM, Kevin Laurie 
wrote:

> Hi Alex,
> Sorry for such noobness question.
> But where does the schema file go in Solr? Is the directory below correct?
> /opt/solr/solr/collection1/data
> Correct?
> Thanks
> Kevin
>
> On Wed, Feb 25, 2015 at 1:21 AM, Kevin Laurie
>  wrote:
> > Dear Alex,
> > I checked the log. When searching the fields From , To, Subject. It
> records
> > it
> > When searching Body, there is no log showing. I am assuming it is a
> problem
> > in the schema.
> >
> > Will post schema.xml output in next mail.
> >
> > On Wed, Feb 25, 2015 at 1:09 AM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >>
> >> Look for the line like this in your log with the search matching the
> >> body. Maybe put a nonsense string and look for that. This should tell
> >> you what the Solr-side search looks like.
> >>
> >> The thing that worries me here is: rows=107178 - that's most probably
> >> what's blowing up Solr. You should be paging, not getting everything.
> >> And that number being like that, it may mean your client makes two
> >> requests, once to get the result count and once to get the rows
> >> themselves. It's the second request that is most probably blowing up.
> >>
> >> Once you get the request, you should be able to tell what fields are
> >> being searched and check those fields in schema.xml for field type and
> >> then field type's definition. Which is what I asked for in the
> >> previous email.
> >>
> >> Regards,
> >>Alex.
> >>
> >> On 24 February 2015 at 11:55, Kevin Laurie  >
> >> wrote:
> >> > 155092 [qtp433527567-13] INFO  org.apache.solr.core.SolrCore  ?
> >> > [collection1] webapp=/solr path=/select
> >> >
> >> >
> params={sort=uid+asc&fl=uid,score&q=subject:"price"&fq=%2Bbox:ac553604f7314b54e6233555fc1a+%2Buser:"
> u...@domain.net"&rows=107178}
> >> > hits=1237 status=0 QTime=1918
> >>
> >>
> >>
> >> 
> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> http://www.solr-start.com/
> >
> >
>


Re: apache solr - dovecot - some search fields works some dont

2015-02-25 Thread Kevin Laurie
Hi Alex,

I get one error on startup.
Is the error below serious?


2/25/2015, 11:32:30 PM ERROR SolrCore
org.apache.solr.common.SolrException: undefined field text

org.apache.solr.common.SolrException: undefined field text
at org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1269)
at 
org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer.getWrappedAnalyzer(IndexSchema.java:434)
at 
org.apache.lucene.analysis.DelegatingAnalyzerWrapper$DelegatingReuseStrategy.getReusableComponents(DelegatingAnalyzerWrapper.java:74)
at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:175)
at org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:207)
at 
org.apache.solr.parser.SolrQueryParserBase.newFieldQuery(SolrQueryParserBase.java:374)
at 
org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:742)
at 
org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:541)
at org.apache.solr.parser.QueryParser.Term(QueryParser.java:299)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:185)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:107)
at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:96)
at 
org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:151)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
at org.apache.solr.search.QParser.getQuery(QParser.java:141)
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:148)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
at 
org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:64)
at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1739)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


On Wed, Feb 25, 2015 at 3:08 AM, Alexandre Rafalovitch
 wrote:
> The field definition looks fine. It's not storing any content
> (stored=false) but is indexing, so you should find the records but not
> see the body in them.
>
> Not seeing a log entry is more of a worry. Are you sure the request
> even made it to Solr?
>
> Can you see anything in Dovecot's logs? Or in Solr's access.logs
> (Actually Jetty/Tomcat's access logs that may need to be enabled
> first).
>
> At this point, you don't have enough information to fix anything. You
> need to understand what's different between request against "subject"
> vs. the request against "body". I would break the communication in
> three stages:
> 1) What Dovecote sent
> 2) What Solr received
> 3) What Solr sent back
>
> I don't know your skill levels or your system setup to advise
> specifically, but Network tracer (e.g. Wireshark) is good for 1. Logs
> are good for 2. Using the query from 1) and manually running it
> against Solr is good for 3).
>
> Hope this helps,
>Alex.
>
> On 24 February 2015 at 12:35, Kevin Laurie  
> wrote:
>> 
>
>
>
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/


Re: apache solr - dovecot - some search fields works some dont

2015-02-25 Thread Kevin Laurie
Hi Alex,

The output below shows that Solr is not getting anything from the text search.
I will try a From/To search and see how the performance compares.





select BAD Error in IMAP command INBOX: Unknown command.
. select inbox
* FLAGS (\Answered \Flagged \Deleted \Seen \Draft $Forwarded)
* OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft
$Forwarded \*)] Flags permitted.
* 49983 EXISTS
* 0 RECENT
* OK [UNSEEN 46791] First unseen.
* OK [UIDVALIDITY 1414214135] UIDs valid
* OK [UIDNEXT 107218] Predicted next UID
* OK [NOMODSEQ] No permanent modsequences
. OK [READ-WRITE] Select completed (0.002 secs).
search text dave
search BAD Error in IMAP command TEXT: Unknown command.
. search text "dave"
* OK Searched 6% of the mailbox, ETA 2:24
* OK Searched 13% of the mailbox, ETA 2:10
* OK Searched 20% of the mailbox, ETA 1:54
* OK Searched 27% of the mailbox, ETA 1:46
* OK Searched 34% of the mailbox, ETA 1:36
* OK Searched 41% of the mailbox, ETA 1:26
* OK Searched 49% of the mailbox, ETA 1:11
* OK Searched 56% of the mailbox, ETA 1:02
* OK Searched 63% of the mailbox, ETA 0:52
* OK Searched 69% of the mailbox, ETA 0:44
* OK Searched 77% of the mailbox, ETA 0:31
* OK Searched 85% of the mailbox, ETA 0:20
* OK Searched 92% of the mailbox, ETA 0:10
* OK Searched 98% of the mailbox, ETA 0:02

On Wed, Feb 25, 2015 at 11:39 PM, Kevin Laurie
 wrote:
> Hi Alex,
>
> I get 1 error on start up
> Is the error below serious:-
>
>
> 2/25/2015, 11:32:30 PM ERROR SolrCore
> org.apache.solr.common.SolrException: undefined field text
>
> org.apache.solr.common.SolrException: undefined field text
> at 
> org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1269)
> at 
> org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer.getWrappedAnalyzer(IndexSchema.java:434)
> at 
> org.apache.lucene.analysis.DelegatingAnalyzerWrapper$DelegatingReuseStrategy.getReusableComponents(DelegatingAnalyzerWrapper.java:74)
> at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:175)
> at org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:207)
> at 
> org.apache.solr.parser.SolrQueryParserBase.newFieldQuery(SolrQueryParserBase.java:374)
> at 
> org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:742)
> at 
> org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:541)
> at org.apache.solr.parser.QueryParser.Term(QueryParser.java:299)
> at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:185)
> at org.apache.solr.parser.QueryParser.Query(QueryParser.java:107)
> at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:96)
> at 
> org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:151)
> at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
> at org.apache.solr.search.QParser.getQuery(QParser.java:141)
> at 
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:148)
> at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
> at 
> org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:64)
> at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1739)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
>
> On Wed, Feb 25, 2015 at 3:08 AM, Alexandre Rafalovitch
>  wrote:
>> The field definition looks fine. It's not storing any content
>> (stored=false) but is indexing, so you should find the records but not
>> see the body in them.
>>
>> Not seeing a log entry is more of a worry. Are you sure the request
>> even made it to Solr?
>>
>> Can you see anything in Dovecot's logs? Or in Solr's access.logs
>> (Actually Jetty/Tomcat's access logs that may need to be enabled
>> first).
>>
>> At this point, you don't have enough information to fix anything. You
>> need to understand what's different between request against "subject"
>> vs. the request against "body". I would break the communication in
>> three stages:
>> 1) What Dovecote sent
>> 2) What Solr received
>> 3) What Solr sent back
>>
>> I don't know your skill levels or your system setup to advise
>> specifically, but Network tracer (e.g. Wireshark) is good for 1. Logs
>> are good for 2. Using the query from 1) and manually running it
>> against Solr is good for 3).
>>
>> Hope this helps,
>>Alex.
>>
>> On 24 February 2015 at 12:35, Kevin Laurie  
>> wrote:
>>> 
>>
>>
>>
>> 
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/


get Multi-Valued field data from DocValues

2015-03-13 Thread Kevin Osborn
If I am finding the values of a long field for a single numeric field, I
just do:

DocValues.getNumeric(context.reader(), "myField").get(docNumber). This
returns the value of the field and everything is good.

However, my field is a multi-valued long field. So, I need to do:

DocValues.getSortedSet(context.reader(), "myField")
This returns a SortedSetDocValues. And now I am a bit lost. I want to
generate a list of all the values in the field for a particular document. I
am getting into BytesRef and other unfamiliar areas. Any help would be
greatly appreciated.

As a follow-up question, I am doing this for a PostFilter, i.e.
DelegatingCollector.collect(int doc). The value of doc always seems to be
0, so I am assuming this is not the doc ID. Is this the index of the
reader at its current position?

Thanks.


Re: get Multi-Valued field data from DocValues

2015-03-13 Thread Kevin Osborn
getSortedNumeric throws the following exception:

unexpected docvalues type SORTED_SET for field 'space_list' (expected one
of [SORTED_NUMERIC, NUMERIC]). Use UninvertingReader or index with
docvalues.

If I am reading the documentation correctly, getSortedNumeric sorts the
values, but it is still for non-multivalued fields.

On Fri, Mar 13, 2015 at 1:55 PM, Chris Hostetter 
wrote:

>
> : If I am finding the values of a long field for a single numeric field, I
> : just do:
> :
> : DocValues.getNumeric(contex.reader(), "myField").get(docNumber). This
> : returns the value of the field and everything is good.
> :
> : However, my field is a multi-valued long field. So, I need to do:
> :
> : DocValues.getSortedSet(contex.reader(), "myField")
> : This returns a SortedSetDocValues. And now I am a bit lost. I want to
>
> I haven't looked into this closely, but isn't what you want just
> DocValues.getSortedNumeric() ?
>
>
> : As a followup question, I am doing this for a PostFiter. So,
> : DeletagtingCollector.collect(int doc). The value of doc always seems to
> be
> : 0. So, I am assuming this is not the doc ID. Is this the index of the
> : reader at it's current position?
>
> It shouldn't always be 0 -- it should be the docId relative the current
> (Leaf) reader context ... so if you have a lot of segments containing only
> a single document, then it would always be 0.
>
> If you always use the current LeafReaderContext to fetch the DocValues
> (ie: you can load the DocValues in doSetNextReader() and re-use until the
> next doSetNextReader() or finish()) then the docId collected and the docId
> you use to lookup the docValues can be identical, and you can ignore the
> details of where/how the current reader context is in relation to the
> entire index (ie: the docBase)
>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: get Multi-Valued field data from DocValues

2015-03-13 Thread Kevin Osborn
I figured it out. Here is what you want to do (excuse the Scala syntax).

val docValues = DocValues.getSortedSet(context.reader(), "myField")
docValues.setDocument(docNumber)
val values = Stream.continually(docValues.nextOrd)
  .takeWhile(_ != SortedSetDocValues.NO_MORE_ORDS)
  .map(ord => NumericUtils.prefixCodedToLong(docValues.lookupOrd(ord)))
  .toSet

Basically, we set the document, iterate through the ords, and then convert each
BytesRef back to a long.



On Fri, Mar 13, 2015 at 2:33 PM, Kevin Osborn 
wrote:

> getSortedNumeric throws the following exception:
>
> unexpected docvalues type SORTED_SET for field 'space_list' (expected one
> of [SORTED_NUMERIC, NUMERIC]). Use UninvertingReader or index with
> docvalues.
>
> If I am reading the doumentation correctly, getSortedNumeric sorts the
> values, but it is still for non-multivalued fields.
>
> On Fri, Mar 13, 2015 at 1:55 PM, Chris Hostetter  > wrote:
>
>>
>> : If I am finding the values of a long field for a single numeric field, I
>> : just do:
>> :
>> : DocValues.getNumeric(context.reader(), "myField").get(docNumber). This
>> : returns the value of the field and everything is good.
>> :
>> : However, my field is a multi-valued long field. So, I need to do:
>> :
>> : DocValues.getSortedSet(context.reader(), "myField")
>> : This returns a SortedSetDocValues. And now I am a bit lost. I want to
>>
>> I haven't looked into this closely, but isn't what you want just
>> DocValues.getSortedNumeric() ?
>>
>>
>> : As a followup question, I am doing this for a PostFilter. So,
>> : DelegatingCollector.collect(int doc). The value of doc always seems to
>> be
>> : 0. So, I am assuming this is not the doc ID. Is this the index of the
>> : reader at it's current position?
>>
>> It shouldn't always be 0 -- it should be the docId relative the current
>> (Leaf) reader context ... so if you have a lot of segments containing only
>> a single document, then it would always be 0.
>>
>> If you always use the current LeafReaderContext to fetch the DocValues
>> (ie: you can load the DocValues in doSetNextReader() and re-use until the
>> next doSetNextReader() or finish()) then the docId collected and the docId
>> you use to lookup the docValues can be identical, and you can ignore the
>> details of where/how the current reader context is in relation to the
>> entire index (ie: the docBase)
>>
>>
>>
>> -Hoss
>> http://www.lucidworks.com/
>>
>
>


copy field from boolean to int

2015-03-17 Thread Kevin Osborn
I was hoping to use DocValues, but one of my fields is a boolean, which is
not currently supported by DocValues. I can use a copyField to convert my
boolean to a string. Is there any way to use a copyField to convert from
a boolean to a tint?


Re: copy field from boolean to int

2015-03-18 Thread Kevin Osborn
I already use this field elsewhere, so I don't want to change its type. I
did implement an UpdateRequestProcessor to copy from a bool to an int. This
works, but even better would be to fix Solr so that I can use DocValues
with boolean. So, I am going to try to get that working as well.
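
A minimal sketch of that kind of update processor, assuming made-up field
names ("my_bool" copied into an int field "my_bool_i"); the factory still has
to be registered in an updateRequestProcessorChain in solrconfig.xml and wired
into the update handler.

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class BoolToIntUpdateProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object v = doc.getFieldValue("my_bool");                 // hypothetical boolean source field
        if (v != null) {
          // write a docValues-friendly int copy: true -> 1, false -> 0
          doc.setField("my_bool_i", Boolean.parseBoolean(v.toString()) ? 1 : 0);
        }
        super.processAdd(cmd);                                   // continue down the chain
      }
    };
  }
}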

On Tue, Mar 17, 2015 at 10:25 PM, William Bell  wrote:

> Can you reindex? Just use 1,0.
>
> On Tue, Mar 17, 2015 at 6:08 PM, Chris Hostetter  >
> wrote:
>
> >
> > Can you open a jira to add docValues support for BoolField? ... i can't
> > think of any good reason not to directly support that in Solr for
> > BoolField ... seems like just an oversight that slipped through the
> > cracks.
> >
> >
> > For now, your best bet is probably to use an UpdateProcessor ... maybe 2
> > instances of RegexReplaceProcessorFactory to match "true" and "false" and
> > replace them with "0" and "1" ?
> >
> >
> > : Date: Tue, 17 Mar 2015 17:57:03 -0700
> > : From: Kevin Osborn 
> > : Reply-To: solr-user@lucene.apache.org
> > : To: solr-user@lucene.apache.org
> > : Subject: copy field from boolean to int
> > :
> > : I was hoping to use DocValues, but one of my fields is a boolean, which
> > is
> > : not currently supported by DocValues. I can use a copyField to convert
> my
> > : boolean to a string. Is there any way to use a copyField to convert
> > from
> > : a boolean to a tint?
> >
> >
> > -Hoss
> > http://www.lucidworks.com/
> >
>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>


PostFilter does not seem to work across shards

2015-03-20 Thread Kevin Osborn
I developed a post filter. My documents to be filtered are on two different
shards. So, in a single-shard environment,
DelegatingCollector.doSetNextReader is called twice. And collect is called
the correct number of times. Everything went well and I got my correct
number of results back.

So, I then tried this filter in a two-shard environment. This time things
did not work well. I am still trying to figure out what is going on, but it
seems like just the first shard is being used. I get the same results no
matter what shard or replica I begin my query on. But it seems like the
results are not being merged.

Although I am still trying to figure out if the second shard is even being
queried.

Are there any known issues with DelegatingCollector and shards?

I don't know if this is related, but I did once get the following error
message as well.

java.lang.UnsupportedOperationException: Query {!cache=false cost=100}
does not implement
createWeight


Re: PostFilter does not seem to work across shards

2015-03-23 Thread Kevin Osborn
A little more information here. I have verified that the post filter is
giving me only documents that are in the first shard. Running two shards
and a single replica in debug mode also shows that the collect method is
only called for documents in the first shard. I never see any indication
that the filter is called for any documents on the second shard.

On Fri, Mar 20, 2015 at 4:12 PM, Kevin Osborn 
wrote:

> I developed a post filter. My documents to be filtered are on two
> different shards. So, in a single-shard environment,
> DelegatingCollector.doSetNextReader is called twice. And collect is called
> the correct number of times. Everything went well and I got my correct
> number of results back.
>
> So, I then tried this filter in a two-shard environment. This time things
> did not work well. I am still trying to figure out what is going on, but it
> seems like just the first shard is being used. I get the same results no
> matter what shard or replica I begin my query on. But it seems like the
> results are not being merged.
>
> Although I am still trying to figure out if the second shard is even being
> queried.
>
> Are there any known issues with DelegatingCollector and shards?
>
> I don't know if this is related, but I did once get the following error
> message as well.
>
> java.lang.UnsupportedOperationException: Query {!cache=false cost=100} does 
> not implement
> createWeight
>
>


Re: PostFilter does not seem to work across shards

2015-03-23 Thread Kevin Osborn
I think I found my issue. It has nothing to do with the post filter. In the
constructor of my post filter, I am doing a TermQuery to get a single user
document. I then later intersect this user's permissions with the collected
documents. So, if the user document is in the shard that I am filtering in,
it works fine. I retrieve the object and do my intersections. But, on the
other shard, I don't have my user document. So, I have nothing to intersect
with.

That is a separate issue that I need to figure out.

On Mon, Mar 23, 2015 at 8:09 AM, Kevin Osborn 
wrote:

> A little more information here. I have verified that the post filter is
> giving me only documents that are in the first shard. Running two shards
> and a single replica in debug mode also shows that the collect method is
> only called for documents in the first shard. I never see any indication
> that the filter is called for any documents on the second shard.
>
> On Fri, Mar 20, 2015 at 4:12 PM, Kevin Osborn 
> wrote:
>
>> I developed a post filter. My documents to be filtered are on two
>> different shards. So, in a single-shard environment,
>> DelegatingCollector.doSetNextReader is called twice. And collect is called
>> the correct number of times. Everything went well and I got my correct
>> number of results back.
>>
>> So, I then tried this filter in a two-shard environment. This time things
>> did not work well. I am still trying to figure out what is going on, but it
>> seems like just the first shard is being used. I get the same results no
>> matter what shard or replica I begin my query on. But it seems like the
>> results are not being merged.
>>
>> Although I am still trying to figure out if the second shard is even
>> being queried.
>>
>> Are there any known issues with DelegatingCollector and shards?
>>
>> I don't know if this is related, but I did once get the following error
>> message as well.
>>
>> java.lang.UnsupportedOperationException: Query {!cache=false cost=100} does 
>> not implement
>> createWeight
>>
>>
>


Query in Solr plugin across shards

2015-03-23 Thread Kevin Osborn
I have created a PostFilter. The PostFilter creates a DelegatingCollector,
and its getFilterCollector method is handed a Lucene IndexSearcher.

However, I need to query for an object that may or may not be located on
the shard that I am filtering on.

Normally, I would do something like:

searcher.search(new TermQuery(new Term("field", "value")), 1).scoreDocs

But this does not work across shards. So, if the document I am looking for
is on a different shard, I get no results.

Any idea how I would best do my search across all shards from within my
plugin?


Re: Query in Solr plugin across shards

2015-03-23 Thread Kevin Osborn
Thanks.

It is a fairly large ACL, so I am hoping to avoid any sort of application
redirect. That is sort of the problem we are trying to solve actually. Our
list was getting too large and we were maxing out maxBooleanQueries.

And I don't know which shard the user document is located on, just its
unique key. Although I suppose we could put all user objects on a single
shard. Not an ideal solution though.

I did see an option of using SolrJ CloudServer from within a plugin, but
that didn't seem very desirable to me.

-Kevin

On Mon, Mar 23, 2015 at 10:41 AM, Erick Erickson 
wrote:

> How much information do you need from this document? If it's a reasonably
> small
> amount, can you read it at the application layer and attach it as a
> set of parameters
> to the query that are then available to the post filter. Or is it a
> huge ACL list or something?
>
> In this latter case, if you know the URL of a shard with the doc, you
> could send a query
> (that you perhaps cache in your postFilter code) to that shard with
> &distrib=false and
> get the doc. Hmmm, I suppose if this is a well-known doc ID you don't
> even have to know
> what the shard is, just send the request
>
> BTW, there's a "userCache" that you can configure in solrconfig.xml
> that you might want
> to use, the advantage here is that it gets notified whenever a new
> searcher is opened
> so it can "do the right thing" in terms of refreshing itself.
>
> FWIW,
> Erick
>
> On Mon, Mar 23, 2015 at 9:23 AM, Kevin Osborn
>  wrote:
> > I have created a PostFilter. PostFilter creates a DelegatingCollector,
> > which provides a Lucene IndexSearcher.
> >
> > However, I need to query for an object that may or may not be located on
> > the shard that I am filtering on.
> >
> > Normally, I would do something like:
> >
> > searcher.search(new TermQuery(new Term("field", "value")), 1).scoreDocs
> >
> > But this does not work across shards. So, if the document I am looking
> for
> > is on a different shard, I get no results.
> >
> > Any idea how I would best do my search across all shards from within my
> > plugin?
>


Bug: replies mixed up with concurrent requests from the same host

2015-07-01 Thread Kevin Perros

Hello,

I'm new to the solr mailing list, and I have not used solr for very long, 
so I might be wrong.


I may have found a bug whose symptoms look like that one in jetty: 
https://bugs.eclipse.org/bugs/show_bug.cgi?id=392936


I am using solr 5.0.0 (the one with the great packaging and deployment 
work :). I use a lucene index to store audio fingerprints for audio 
tracks, with a custom search system, which reads directly from leaf readers.


When I query solr with either curl or wget, with multiple parallel 
requests from the same client host to the server, the answers come mixed 
up. From my logs, I've seen that if I send 1 requests with 24-fold 
parallelism, I often get, as the answer to a request, the answer to 
the first one.


Hence, I have tried to bypass jetty and launch the same batch work from 
"inside solr", by writing a dummy request that simulates the 1 
requests, with the same 24 fold parallelism. In that case, everything 
works well.


I had already noticed that bug in Mozilla. I have a bookmark folder with 
a bunch of test requests, and when I click on the "open all in tabs" 
button, the result of one request appears in the tab for another one, in 
a random fashion.


Regards,
Kevin







Re: Bug: replies mixed up with concurrent requests from the same host

2015-07-02 Thread Kevin Perros

Thanks for the answers,

I also found that blog post about such issues:
http://techbytes.anuragkapur.com/2014/08/potential-jetty-concurrency-bug-seen-in.html

On 01/07/15 20:26, Chris Hostetter wrote:

: Hmm, interesting. That particular bug was fixed by upgrading to Jetty
: 4.1.7 in https://issues.apache.org/jira/browse/SOLR-4031

1st) Typo - Shalin meant 8.1.7 above.

2nd) If you note the details of both issues, no root cause was ever
identified as being "fixed" -- all that happened was that Per tried
upgrading to 8.1.7 and found he could no longer reproduce with his
particular test cases.

That doesn't mean the bug went away in 8.1.7, it means something
changed in 8.1.7 that caused the bug to no longer surface in the same way
for the same person.

It's very possible this is in fact the same bug, but some other minor
change in 8.1.7 just changed the input needed to trigger the bug (eg:
maybe a buffer size increase/decrease, or a change in the default size of
a HashMap, ... anything like that that could tweak the necessary input
size / request count / etc... necessary to trigger the bug)

: > When I query solr with either curl or wget, with multiple parallel requests
: > from the same client host to the server, the answers come mixed up. From my
: > logs, I've seen that if I send 1 requests, with a 24 fold parallelism, I
: > often get as an answer to a request, the answer to the first one.

can you reproduce this against a controlled set of data/configs/queries
that you can bundle up in a zip file and make available to other people
for testing?  (ie: non proprietary/confidential configs + data + queries,
preferably with a data set small enough that it can be downloaded
quickly, ideally under 10MB so it can be attached to jira)


-Hoss
http://www.lucidworks.com/






Re: Bug: replies mixed up with concurrent requests from the same host

2015-07-06 Thread Kevin Perros

Hi,

I have good news (for me :), I have resolved my bug.

As always it was my own fault.

I did a few tests some time ago to understand how instances of 
various objects were instantiated in Lucene/Solr, and made a mistake in 
understanding how SearchComponents were managed.


I believed that each time a request was to be handled by a 
SearchComponent, a new instance of that SearchComponent was 
instantiated. So I did not bother making my SearchComponent thread 
safe... I should have.
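
A minimal sketch of the distinction, assuming the Solr 5.x SearchComponent
API (component and field names made up): Solr creates one instance of a
SearchComponent and shares it across all concurrent requests, so per-request
state belongs in local variables rather than instance fields.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class FingerprintComponent extends SearchComponent {

  // NOT thread safe: a mutable instance field like this would be shared and
  // overwritten by concurrent requests, which mixes up responses.
  // private List<Long> matches;

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // keep no per-request state on the instance
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // Thread safe: per-request state lives in local variables.
    List<Long> matches = new ArrayList<>();
    // ... read from rb.req.getSearcher() and fill matches ...
    rb.rsp.add("matches", matches);
  }

  @Override
  public String getDescription() {
    return "example fingerprint component";
  }

  @Override
  public String getSource() {
    return null;
  }
}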


Thank You for your help,
Kevin

On 01/07/15 11:26, Kevin Perros wrote:

Hello,

I'm new to the solr mailing list, and I have not used solr for much 
time, so I might be wrong.


I may have found a bug whose symptoms look like that one in jetty: 
https://bugs.eclipse.org/bugs/show_bug.cgi?id=392936


I am using solr 5.0.0 (the one with the great packaging and deployment 
work :). I use a lucene index to store audio fingerprints for audio 
tracks, with a custom search system, which reads directly from leaf 
readers.


When I query solr with either curl or wget, with multiple parallel 
requests from the same client host to the server, the answers come 
mixed up. From my logs, I've seen that if I send 1 requests, with 
a 24 fold parallelism, I often get as an answer to a request, the 
answer to the first one.


Hence, I have tried to bypass jetty and launch the same batch work 
from "inside solr", by writing a dummy request that simulates the 
1 requests, with the same 24 fold parallelism. In that case, 
everything works well.


I had already noticed that bug in mozilla. I have a bookmark folder 
with a bunch of test requests, and when I click on the "open all in 
tabs" button, the result from requests appears in the tab for another 
one, in a random fashion.


Regards,
Kevin











Re: NoNode error on -downconfig when node does exist?

2016-08-08 Thread Kevin Risden
Just a quick guess: do you have a period (.) in your zk connection string
chroot when you meant an underscore (_)?

When you do the ls you use /solr6_1/configs, but you have /solr6.1 in your
zk connection string chroot.

Kevin Risden

On Mon, Aug 8, 2016 at 4:44 PM, John Bickerstaff 
wrote:

> First, the caveat:  I understand this is technically a zookeeper error.  It
> is an error that occurs when trying to deal with Solr however, so I'm
> hoping someone on the list may have some insight.  Also, I'm getting the
> error via the zkcli.sh tool that comes with Solr...
>
> I have created a collection in SolrCloud (6.1) giving the "techproducts"
> sample directory as the location of the conf files.
>
> I then wanted to download those files from zookeeper to the local machine
> via the -cmd downconfig command, so I issue this command:
>
> sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd downconfig
> -confdir /home/john/conf/ -confname statdx -z 192.168.56.5/solr6.1
>
> Instead of the files, I get a stacktrace / error back which says :
>
> exception in thread "main" java.io.IOException: Error downloading files
> from zookeeper path /configs/statdx to /home/john/conf
> at
> org.apache.solr.common.cloud.ZkConfigManager.downloadFromZK(
> ZkConfigManager.java:117)
> at
> org.apache.solr.common.cloud.ZkConfigManager.downloadConfigDir(
> ZkConfigManager.java:153)
> at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:237)
> *Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for /configs/statdx*
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
> at
> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:331)
> at
> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:328)
> at
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(
> ZkCmdExecutor.java:60)
> at
> org.apache.solr.common.cloud.SolrZkClient.getChildren(
> SolrZkClient.java:328)
> at
> org.apache.solr.common.cloud.ZkConfigManager.downloadFromZK(
> ZkConfigManager.java:101)
> ... 2 more
>
> However, when I actually look in Zookeeper, I find that the "directory"
> does exist and that inside it are listed all the files.
>
> Here is the output from zookeeper:
>
> [zk: localhost:2181(CONNECTED) 0] *ls /solr6_1/configs*
> [statdx]
>
> and...
>
> [zk: localhost:2181(CONNECTED) 1] *ls /solr6_1/configs/statdx*
> [mapping-FoldToASCII.txt, currency.xml, managed-schema, protwords.txt,
> synonyms.txt, stopwords.txt, _schema_analysis_synonyms_english.json,
> velocity, admin-extra.html, update-script.js,
> _schema_analysis_stopwords_english.json, solrconfig.xml,
> admin-extra.menu-top.html, elevate.xml, clustering, xslt,
> _rest_managed.json, mapping-ISOLatin1Accent.txt, spellings.txt, lang,
> admin-extra.menu-bottom.html]
>
> I've rebooted all my zookeeper nodes and restarted them - just in case...
> Same deal.
>
> Has anyone seen anything like this?
>


Re: Unable to connect to correct port in solr 6.2.0

2016-09-12 Thread Kevin Risden
Jan - the issue you are hitting is Docker and /proc/version is getting the
underlying OS kernel and not what you would expect from the Docker
container. The errors for update-rc.d and service are because the docker
image you are using is trimmed down.

Kevin Risden

On Mon, Sep 12, 2016 at 3:19 PM, Jan Høydahl  wrote:

> I tried it on a Docker RHEL system (gidikern/rhel-oracle-jre) and the
> install failed with errors
>
> ./install_solr_service.sh: line 322: update-rc.d: command not found
> ./install_solr_service.sh: line 326: service: command not found
> ./install_solr_service.sh: line 328: service: command not found
>
> Turns out that /proc/version returns “Ubuntu” this on the system:
> Linux version 4.4.19-moby (root@3934ed318998) (gcc version 5.4.0 20160609
> (Ubuntu 5.4.0-6ubuntu1~16.04.2) ) #1 SMP Thu Sep 1 09:44:30 UTC 2016
> There is also a /etc/redhat-release file:
> Red Hat Enterprise Linux Server release 7.1 (Maipo)
>
> So the install of rc.d failed completely because of this. Don’t know if
> this is common on RHEL systems, perhaps we need to improve distro detection
> in installer?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 12. sep. 2016 kl. 21.31 skrev Shalin Shekhar Mangar <
> shalinman...@gmail.com>:
> >
> > I just tried this out on ubuntu (sorry I don't have access to a red hat
> > system) and it works fine.
> >
> > One thing that you have to take care of is that if you install the
> service
> > on the default 8983 port then, trying to upgrade with the same tar to a
> > different port does not work. So please ensure that you hadn't already
> > installed the service before already.
> >
> > On Tue, Sep 13, 2016 at 12:53 AM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> >> Which version of red hat? Is lsof installed on this system?
> >>
> >> On Mon, Sep 12, 2016 at 4:30 PM, Preeti Bhat 
> >> wrote:
> >>
> >>> HI All,
> >>>
> >>> I am trying to setup the solr in Redhat Linux, using the
> >>> install_solr_service.sh script of solr.6.2.0  tgz. The script runs and
> >>> starts the solr on port 8983 even when the port is specifically
> specified
> >>> as 2016.
> >>>
> >>> /root/install_solr_service.sh solr-6.2.0.tgz -i /opt -d /var/solr -u
> root
> >>> -s solr -p 2016
> >>>
> >>> Is this correct way to setup solr in linux? Also, I have observed that
> if
> >>> I go to the /bin/solr and start with the port number its working as
> >>> expected but not as service.
> >>>
> >>> I would like to setup the SOLR in SOLRCloud mode with external
> zookeepers.
> >>>
> >>> Could someone please advise on this?
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
> >>
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
>
>


Solr Special Character Search

2016-09-20 Thread Cheatham, Kevin
Hello - Has anyone out there had success with anything similar to our issue 
below, and would you be kind enough to share?

We posted several files as text and we're able to search for alphanumeric 
characters, but not able to search for special characters such as @ or © 
through Solrcloud Admin 5.2 UI.  
We've searched through lots of documentation but haven't had success yet.  

We also tried posting files not as text, but it seems we're not able to search for 
any special characters below hexadecimal 20.

Any assistance would be greatly appreciated!

Thanks!

Kevin Cheatham | Office (314) 573-5534 | kevin.cheat...@graybar.com 
www.graybar.com - Graybar Works to Your Advantage 
  


Convert BytesRef to long in Solr 6.2

2016-09-29 Thread Osborn, Kevin
I have the following code inside a Solr post filter.


SortedSetDocValues docValues = DocValues.getSortedSet(context.reader(), "my_field");

long x = LegacyNumericUtils.prefixCodedToLong(docValues.lookupOrd(b));


I am in the process of upgrading from Solr 5.5 to 6.2, so I changed 
NumericUtils to LegacyNumericUtils.


Basically, I am taking the BytesRef from a field and extracting the long from 
it.


However, LegacyNumericUtils is the deprecated form of NumericUtils. It says 
that I should use the PointValues class instead. However, unless I am missing 
something, it does not seem to support converting BytesRef to long/int/etc.


Is there a better method to do this? I would rather not use deprecated code.


-Kevin
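
One way to avoid the deprecated helper entirely, if the field can eventually
be reindexed as a numeric docValues type that Lucene stores as SORTED_NUMERIC
rather than the SORTED_SET used by multiValued trie fields (as far as I know,
the point-based numeric types introduced after 6.2 do this): the values then
come back as plain longs and there is no prefix-coded BytesRef to decode. A
sketch against the 6.x doc values API, field name assumed.

import java.io.IOException;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SortedNumericDocValues;

public class SortedNumericExample {
  // Reads every long of a multi-valued SORTED_NUMERIC docValues field for one document.
  public static long[] valuesFor(LeafReaderContext context, int doc) throws IOException {
    SortedNumericDocValues dv = DocValues.getSortedNumeric(context.reader(), "my_field");
    dv.setDocument(doc);
    long[] out = new long[dv.count()];
    for (int i = 0; i < dv.count(); i++) {
      out[i] = dv.valueAt(i);   // already a long; no BytesRef decoding involved
    }
    return out;
  }
}

For an existing SORTED_SET trie field, decoding the prefix-coded bytes with
the legacy utility is still the applicable route; PointValues only covers the
newer point-based encodings.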


Re: CheckHdfsIndex with Kerberos not working

2016-10-03 Thread Kevin Risden
You need to have the hadoop pieces on the classpath. Like core-site.xml and
hdfs-site.xml. There is an hdfs classpath command that would help but it
may have too many pieces. You may just need core-site and hdfs-site so you
don't get conflicting jars.

Something like this may work for you:

java -cp
"$(hdfs classpath):./server/solr-webapp/webapp/WEB-INF/lib/*:./server/lib/
ext/*:/hadoop/hadoop-client/lib/servlet-api-2.5.jar"
-ea:org.apache.lucene... org.apache.solr.index.hdfs.CheckHdfsIndex
hdfs://:8020/apps/solr/data/ExampleCollection/
core_node1/data/index

Kevin Risden

On Mon, Oct 3, 2016 at 1:38 PM, Rishabh Patel <
rishabh.mahendra.pa...@gmail.com> wrote:

> Hello,
>
> My SolrCloud 5.5 installation has Kerberos enabled. The CheckHdfsIndex test
> fails to run. However, without Kerberos, I am able to run the test with no
> issues.
>
> I ran the following command:
>
> java -cp
> "./server/solr-webapp/webapp/WEB-INF/lib/*:./server/lib/
> ext/*:/hadoop/hadoop-client/lib/servlet-api-2.5.jar"
> -ea:org.apache.lucene... org.apache.solr.index.hdfs.CheckHdfsIndex
> hdfs://:8020/apps/solr/data/ExampleCollection/
> core_node1/data/index
>
> The error is:
>
> ERROR: could not open hdfs directory "
> hdfs://:8020/apps/solr/data/ExampleCollection/
> core_node1/data/index
> ";
> exiting org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.
> AccessControlException):
> SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
> Does this error message imply that the test cannot run with Kerberos
> enabled?
>
> For reference, I followed this blog
> http://yonik.com/solr-5-5/
>
> --
> Regards,
> *Rishabh Patel*
>


Re: Problem with Password Decryption in Data Import Handler

2016-10-06 Thread Kevin Risden
I haven't tried this, but is it possible there is a newline at the end of
the file?

If you did something like echo "" > file.txt then there would be a newline.
Use echo -n "" > file.txt instead.

Also you should be able to check how many characters are in the file.

Kevin Risden

On Wed, Oct 5, 2016 at 5:00 PM, Jamie Jackson  wrote:

> Hi Folks,
>
> (Using Solr 5.5.3.)
>
> As far as I know, the only place where encrypted password use is documented
> is in
> https://cwiki.apache.org/confluence/display/solr/
> Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler,
> under the "Configuring the DIH Configuration File", in a comment in the
> sample XML file:
>
> 
>
> Anyway, I can encrypt just fine:
>
> $ openssl enc -aes-128-cbc -a -salt -in stgps.txt
> enter aes-128-cbc encryption password:
> Verifying - enter aes-128-cbc encryption password:
> U2FsdGVkX1+VtVoQtmEREvB5qZjn3131+N4jRXmjyIY=
>
>
> I can also decrypt just fine from the command line.
>
> However, if I use the encrypted password and encryptKeyFile in the config
> file, I end up with an error: "String length must be a multiple of four."
>
> https://gist.github.com/jamiejackson/3852dacb03432328ea187d43ade5e4d9
>
> How do I get this working?
>
> Thanks,
> Jamie
>


Re: How to substract numeric value stored in 2 documents related by correlation id one-to-one

2016-10-19 Thread Kevin Risden
The Parallel SQL support for what you are asking for doesn't exist quite
yet. The use case you described is close to what I was envisioning for the
Solr SQL support. This would allow full text searches and then some
analytics on top of it (like call duration).

I'm not sure if subtracting fields (c2.time-c1.time) is supported in
streaming expressions yet. The leftOuterJoin is, but I'm not sure about
arbitrary math equations. The Parallel SQL side has an issue w/ 1!=0 right
now so I'm guessing adding/subtracting is also out for now.

The ticket you will want to follow is SOLR-8593 (
https://issues.apache.org/jira/browse/SOLR-8593) This is the Calcite
integration and should enable a lot more SQL syntax as a result.

Kevin Risden
Apache Lucene/Solr Committer
Hadoop and Search Tech Lead | Avalon Consulting, LLC
<http://www.avalonconsult.com/>
M: 732 213 8417
LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
<http://www.google.com/+AvalonConsultingLLC> | Twitter
<https://twitter.com/avalonconsult>


On Wed, Oct 19, 2016 at 8:23 AM,  wrote:

> Hello,
> I have 2 documents recorded at request or response of a service call  :
> Entity Request
>  {
>   "type":"REQ",
>   "reqid":"MES0",
>"service":"service0",
>"time":1,
>  }
> Entity response
>  {
>   "type":"RES",
>   "reqid":"MES0",
>"time":10,
>  }
>
> I need to create following statistics:
> Total service call duration for each call (reqid is unique for each
> service call) :
> similar to query :
> select c1.reqid,c1.service,c1.time as REQTime, c2.time as RESTime ,
> c2.time - c1.time as TotalTime from collection c1 left join collection c2
> on c1.reqid = c2.reqid and c2.type = 'RES'
>
>  {
>"reqid":"MES0",
>"service":service0,
>"REQTime":1,
>"RESTime":10,
>"TotalTime":9
>  }
>
> Average service call duration :
> similar to query :
> select c1.service,  avg(c2.time - c1.time) as AvgTime, count(*) from
> collection c1 left join collection c2 on c1.reqid = c2.reqid and c2.type =
> 'RES' group by c1.service
>
>  {
>"service":service0,
>"AvgTime":9,
>"Count": 1
>  }
>
> I Tried to find solution in archives, I experimented  with !join,
> subquery, _query_ etc. but not succeeded..
> I can probably use streaming and leftOuterJoin, but in my understanding
> this functionality is not ready for production.
> Is SOLR capable to fulfill these use cases?  What are the key functions to
> focus on ?
>
> Thanks' Pavel
>
>
>
>
>
>
>
>
>


Re: Sorl shards: very sensitive to swap space usage !?

2016-11-10 Thread Kevin Risden
Agreed with what Shawn and Erick said.

If you don't see anything in the Solr logs and your servers are swapping a
lot, this could mean the Linux OOM killer is killing the Solr process (and
maybe others). There is usually a log of this depending on your Linux
distribution.

Kevin Risden

On Thu, Nov 10, 2016 at 6:42 PM, Shawn Heisey  wrote:

> On 11/10/2016 3:20 PM, Chetas Joshi wrote:
> > I have a SolrCloud (Solr 5.5.0) of 50 nodes. The JVM heap memory usage
> > of my solr shards is never more than 50% of the total heap. However,
> > the hosts on which my solr shards are deployed often run into 99% swap
> > space issue. This causes the solr shards go down. Why solr shards are
> > so sensitive to the swap space usage? The JVM heap is more than enough
> > so the shards should never require the swap space. What could be the
> > reason? Where can find the reason why the solr shards go down. I don't
> > see anything on the solr logs.
>
> If the machine that Solr is installed on is using swap, that means
> you're having serious problems, and your performance will be TERRIBLE.
> This kind of problem cannot be caused by Solr if it is properly
> configured for the machine it's running on.
>
> Solr is a Java program.  That means its memory usage is limited to the
> Java heap, plus a little bit for Java itself, and absolutely cannot go
> any higher.  If the Java heap is set too large, then the operating
> system might utilize swap to meet Java's memory demands.  The solution
> is to set your Java heap to a value that's significantly smaller than
> the amount of available physical memory.  Setting the heap to a value
> that's close to (or more than) the amount of physical memory, is a
> recipe for very bad performance.
>
> You need to also limit the memory usage of other software installed on
> the machine, or you might run into a situation where swap is required
> that is not Solr's fault.
>
> Thanks,
> Shawn
>
>


Re: Basic Auth for Solr Streaming Expressions

2016-11-16 Thread Kevin Risden
Was a JIRA ever created for this? I couldn't find it searching.

One that is semi related is SOLR-8213 for SolrJ JDBC auth.

Kevin Risden

On Wed, Nov 9, 2016 at 8:25 PM, Joel Bernstein  wrote:

> Thanks for digging into this, let's create a jira ticket for this.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Nov 9, 2016 at 6:23 PM, sandeep mukherjee <
> wiredcit...@yahoo.com.invalid> wrote:
>
> > I have more progress since my last mail. I figured out that  in the
> > StreamContext object there is a way to set the SolrClientCache object
> which
> > keep reference to all the CloudSolrClient where I can set a reference to
> > HttpClient which sets the Basic Auth header. However the problem is,
> inside
> > the SolrClientCache there is no way to set your own version of
> > CloudSolrClient with BasicAuth enabled. Unfortunately, SolrClientCache
> has
> > no set method which takes a CloudSolrClient object.
> > So long story short we need an API in SolrClientCache to
> > accept CloudSolrClient object from user.
> > Please let me know if there is a better way to enable Basic Auth when
> > using StreamFactory as mentioned in my previous email.
> > Thanks much,Sandeep
> >
> > On Wednesday, November 9, 2016 11:44 AM, sandeep mukherjee
> >  wrote:
> >
> >
> >  Hello everyone,
> > I trying to find the documentation for Basic Auth plugin for Solr
> > Streaming expressions. But I'm not able to find it in the documentation
> > anywhere. Could you please point me in right direction of how to enable
> > Basic auth for Solr Streams?
> > I'm creating StreamFactory as follows: I wonder how and where can I
> > specify Basic Auth username and password
> > @Bean
> > public StreamFactory streamFactory() {
> > SolrConfig solrConfig = ConfigManager.getNamedConfig("solr",
> > SolrConfig.class);
> >
> > return new StreamFactory().withDefaultZkHost(solrConfig.
> > getConnectString())
> > .withFunctionName("gatherNodes", GatherNodesStream.class);
> > }
> >
> >
> >
>


Re: Hardware size in solrcloud

2016-11-16 Thread Kevin Risden
First question: is your initial sizing correct?

7GB/1 billion = 7 bytes per document? That would be basically 7 characters?

Anyway there are lots of variables regarding sizing. The typical response
is:

https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Kevin Risden

On Wed, Nov 16, 2016 at 1:12 PM, Mugeesh Husain  wrote:

> I have lots of documents; I don't know how much it will be in the future. For
> the initial stage, I am looking for hardware details (assumptions).
>
> We are looking forward to setting up a billion-document (1 billion approx.)
> solr index, and the size is 7GB.
>
> Can you please suggest the hardware details as per experience.
> 1. OS(32/64bit):
> 2. Processor:
> 3. RAM:
> 4. No of physical servers/systems :
>
>
> Thanks
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Hardware-size-in-solrcloud-tp4306169.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Basic Auth for Solr Streaming Expressions

2016-11-16 Thread Kevin Risden
Thanks Sandeep!

Kevin Risden

On Wed, Nov 16, 2016 at 3:33 PM, sandeep mukherjee <
wiredcit...@yahoo.com.invalid> wrote:

> [SOLR-9779] Basic auth in not supported in Streaming Expressions - ASF JIRA
>
> I have created the above jira ticket for the basic auth support in solr
> streaming expressions.
> Thanks, Sandeep
>
> On Wednesday, November 16, 2016 8:22 AM, sandeep mukherjee
>  wrote:
>
>
> Nope never got past the login screen.
> Will create one today.
>
>
> Sent from Yahoo Mail for iPhone
>
>
> On Wednesday, November 16, 2016, 8:17 AM, Kevin Risden <
> compuwizard...@gmail.com> wrote:
>
> Was a JIRA ever created for this? I couldn't find it searching.
>
> One that is semi related is SOLR-8213 for SolrJ JDBC auth.
>
> Kevin Risden
>
> On Wed, Nov 9, 2016 at 8:25 PM, Joel Bernstein  wrote:
>
> > Thanks for digging into this, let's create a jira ticket for this.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Wed, Nov 9, 2016 at 6:23 PM, sandeep mukherjee <
> > wiredcit...@yahoo.com.invalid> wrote:
> >
> > > I have more progress since my last mail. I figured out that  in the
> > > StreamContext object there is a way to set the SolrClientCache object
> > which
> > > keep reference to all the CloudSolrClient where I can set a reference
> to
> > > HttpClient which sets the Basic Auth header. However the problem is,
> > inside
> > > the SolrClientCache there is no way to set your own version of
> > > CloudSolrClient with BasicAuth enabled. Unfortunately, SolrClientCache
> > has
> > > no set method which takes a CloudSolrClient object.
> > > So long story short we need an API in SolrClientCache to
> > > accept CloudSolrClient object from user.
> > > Please let me know if there is a better way to enable Basic Auth when
> > > using StreamFactory as mentioned in my previous email.
> > > Thanks much,Sandeep
> > >
> > >On Wednesday, November 9, 2016 11:44 AM, sandeep mukherjee
> > >  wrote:
> > >
> > >
> > >  Hello everyone,
> > > I trying to find the documentation for Basic Auth plugin for Solr
> > > Streaming expressions. But I'm not able to find it in the documentation
> > > anywhere. Could you please point me in right direction of how to enable
> > > Basic auth for Solr Streams?
> > > I'm creating StreamFactory as follows: I wonder how and where can I
> > > specify Basic Auth username and password
> > > @Bean
> > > public StreamFactory streamFactory() {
> > >SolrConfig solrConfig = ConfigManager.getNamedConfig("solr",
> > > SolrConfig.class);
> > >
> > >return new StreamFactory().withDefaultZkHost(solrConfig.
> > > getConnectString())
> > >.withFunctionName("gatherNodes", GatherNodesStream.class);
> > > }
> > >
> > >
> > >
> >
>
>
>
>
>
>


Request to be added to the ContributorsGroup

2017-08-23 Thread Kevin Grimes
Hi there,

I would like to contribute to the Solr wiki. My username is KevinGrimes, and my 
e-mail is kevingrim...@me.com.

Thanks,
Kevin



Re: Solr uses lots of shared memory!

2017-09-02 Thread Kevin Risden
I haven't looked at reproducing this locally, but since it seems like
there haven't been any new ideas, I decided to share this in case it
helps:

I noticed in Travis CI [1] they are adding the environment variable
MALLOC_ARENA_MAX=2 and so I googled what that configuration did. To my
surprise, I came across a stackoverflow post [2] about how glibc could
actually be the cause and report memory differently. I then found a
Hadoop issue HADOOP-7154 [3] about setting this as well to reduce
virtual memory usage. I found some more cases where this has helped as
well [4], [5], and [6].

[1] https://docs.travis-ci.com/user/build-environment-updates/2017-09-06/#Added
[2] 
https://stackoverflow.com/questions/10575342/what-would-cause-a-java-process-to-greatly-exceed-the-xmx-or-xss-limit
[3] https://issues.apache.org/jira/browse/HADOOP-7154?focusedCommentId=14505792
[4] https://github.com/cloudfoundry/java-buildpack/issues/320
[5] https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior
[6] 
https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en
Kevin Risden


On Thu, Aug 24, 2017 at 10:19 AM, Markus Jelsma
 wrote:
> Hello Bernd,
>
> According to the man page, i should get a list of stuff in shared memory if i 
> invoke it with just a PID. Which shows a list of libraries that together 
> account for about 25 MB of shared memory usage. According to ps and top, the 
> JVM uses 2800 MB shared memory (not virtual), that leaves 2775 MB unaccounted 
> for. Any ideas? Anyone else able to reproduce it on a freshly restarted node?
>
> Thanks,
> Markus
>
>
>   PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
> 18901 markus20   0 14,778g 4,965g 2,987g S 891,1 31,7  20:21.63 java
>
> 0x55b9a17f1000  6K  /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
> 0x7fdf1d314000  182K
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libsunec.so
> 0x7fdf1e548000  38K 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libmanagement.so
> 0x7fdf1e78e000  94K 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnet.so
> 0x7fdf1e9a6000  75K 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnio.so
> 0x7fdf5cd6e000  34K 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libzip.so
> 0x7fdf5cf77000  46K /lib/x86_64-linux-gnu/libnss_files-2.24.so
> 0x7fdf5d189000  46K /lib/x86_64-linux-gnu/libnss_nis-2.24.so
> 0x7fdf5d395000  90K /lib/x86_64-linux-gnu/libnsl-2.24.so
> 0x7fdf5d5ae000  34K /lib/x86_64-linux-gnu/libnss_compat-2.24.so
> 0x7fdf5d7b7000  187K
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libjava.so
> 0x7fdf5d9e6000  70K 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libverify.so
> 0x7fdf5dbf8000  30K /lib/x86_64-linux-gnu/librt-2.24.so
> 0x7fdf5de0  90K /lib/x86_64-linux-gnu/libgcc_s.so.1
> 0x7fdf5e017000  1063K   /lib/x86_64-linux-gnu/libm-2.24.so
> 0x7fdf5e32  1553K   /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22
> 0x7fdf5e6a8000  15936K  
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> 0x7fdf5f5ed000  139K/lib/x86_64-linux-gnu/libpthread-2.24.so
> 0x7fdf5f80b000  14K /lib/x86_64-linux-gnu/libdl-2.24.so
> 0x7fdf5fa0f000  110K/lib/x86_64-linux-gnu/libz.so.1.2.11
> 0x7fdf5fc2b000  1813K   /lib/x86_64-linux-gnu/libc-2.24.so
> 0x7fdf5fff2000  58K 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/jli/libjli.so
> 0x7fdf60201000  158K/lib/x86_64-linux-gnu/ld-2.24.so
>
> -Original message-
>> From:Bernd Fehling 
>> Sent: Thursday 24th August 2017 15:39
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr uses lots of shared memory!
>>
>> Just an idea, how about taking a dump with jmap and using
>> MemoryAnalyzerTool to see what is going on?
>>
>> Regards
>> Bernd
>>
>>
>> Am 24.08.2017 um 11:49 schrieb Markus Jelsma:
>> > Hello Shalin,
>> >
>> > Yes, the main search index has DocValues on just a few fields, they are 
>> > used for facetting and function queries, we started using DocValues when 
>> > 6.0 was released. Most fields are content fields for many languages. I 
>> > don't think it is going to be DocValues because the max shared memory 
>> > consumption is reduced my searching on fields fewer languages, and by 
>> > disabling highlighting, both not using DocValues.
>> >
>> > But it tried the option regardless, and because i didn't know about it. 
>> > But it appears the op

solr 7.0.1: exception running post to crawl simple website

2017-10-11 Thread Kevin Layer
I want to use solr to index a markdown website.  The files
are in native markdown, but they are served in HTML (by markserv).

Here's what I did:

docker run --name solr -d -p 8983:8983 -t solr
docker exec -it --user=solr solr bin/solr create_core -c handbook

Then, to crawl the site:

quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook 
http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes md
/docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar 
-Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web 
org.apache.solr.util.SimplePostTool http://quadra.franz.com:9091/index.md
SimplePostTool version 5.0.0
Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
Entering auto mode. Indexing pages with content-types corresponding to file 
endings md
SimplePostTool: WARNING: Never crawl an external web site faster than every 10 
seconds, your IP will probably be blocked
Entering recursive mode, depth=10, delay=0s
Entering crawl at level 0 (1 links total, 1 new)
Exception in thread "main" java.lang.NullPointerException
at 
org.apache.solr.util.SimplePostTool$PageFetcher.readPageFromUrl(SimplePostTool.java:1138)
at org.apache.solr.util.SimplePostTool.webCrawl(SimplePostTool.java:603)
at 
org.apache.solr.util.SimplePostTool.postWebPages(SimplePostTool.java:563)
at 
org.apache.solr.util.SimplePostTool.doWebMode(SimplePostTool.java:365)
at org.apache.solr.util.SimplePostTool.execute(SimplePostTool.java:187)
at org.apache.solr.util.SimplePostTool.main(SimplePostTool.java:172)
quadra[git:master]$ 


Any ideas on what I did wrong?

Thanks.

Kevin


Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote:

>> Kevin,
>> 
>> You are getting NPE at:
>> 
>> String type = rawContentType.split(";")[0]; //HERE - rawContentType is NULL
>> 
>> // related code
>> 
>> String rawContentType = conn.getContentType();
>> 
>> public String getContentType() {
>> return getHeaderField("content-type");
>> }
>> 
>> HttpURLConnection conn = (HttpURLConnection) u.openConnection();
>> 
>> Can you check that your webpage's headers are properly set and that they
>> include a "content-type" key.

Amrit, this is markserv, and I just used wget to prove you are
correct, there is no Content-Type header.

Thanks for the help!  I'll see if I can hack markserv to add that, and
try again.

Kevin
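
A stand-alone illustration of the failure mode, using the URL from this
thread: HttpURLConnection.getContentType() simply returns null when the server
sends no Content-Type header, and SimplePostTool splits that value without a
null check, which is where the NullPointerException in the first message comes
from.

import java.net.HttpURLConnection;
import java.net.URL;

public class ContentTypeProbe {
  public static void main(String[] args) throws Exception {
    URL u = new URL("http://quadra.franz.com:9091/index.md");  // the markserv page from this thread
    HttpURLConnection conn = (HttpURLConnection) u.openConnection();
    String rawContentType = conn.getContentType();             // null if no Content-Type header is sent
    System.out.println("Content-Type: " + rawContentType);
    // SimplePostTool then does rawContentType.split(";")[0], hence the NPE.
  }
}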


Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
OK, so I hacked markserv to add Content-Type text/html, but now I get

SimplePostTool: WARNING: Skipping URL with unsupported type text/html

What is it expecting?

$ docker exec -it --user=solr solr bin/post -c handbook 
http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
/docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar 
-Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web 
org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
SimplePostTool version 5.0.0
Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
Entering auto mode. Indexing pages with content-types corresponding to file 
endings md
SimplePostTool: WARNING: Never crawl an external web site faster than every 10 
seconds, your IP will probably be blocked
Entering recursive mode, depth=10, delay=0s
Entering crawl at level 0 (1 links total, 1 new)
SimplePostTool: WARNING: Skipping URL with unsupported type text/html
SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP 
result status of 415
0 web pages indexed.
COMMITting Solr index changes to 
http://localhost:8983/solr/handbook/update/extract...
Time spent: 0:00:03.882
$ 

Thanks.

Kevin


Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote:

>> Strange,
>> 
>> Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's
>> Content-Type. Let's see what it says now.

Same thing.  Verified Content-Type:

quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |& grep 
Content-Type
  Content-Type: text/html;charset=utf-8
quadra[git:master]$ ]

quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook 
http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
/docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar 
-Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web 
org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
SimplePostTool version 5.0.0
Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
Entering auto mode. Indexing pages with content-types corresponding to file 
endings md
SimplePostTool: WARNING: Never crawl an external web site faster than every 10 
seconds, your IP will probably be blocked
Entering recursive mode, depth=10, delay=0s
Entering crawl at level 0 (1 links total, 1 new)
SimplePostTool: WARNING: Skipping URL with unsupported type text/html
SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP 
result status of 415
0 web pages indexed.
COMMITting Solr index changes to 
http://localhost:8983/solr/handbook/update/extract...
Time spent: 0:00:00.531
quadra[git:master]$ 

Kevin

>> 
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> 
>> On Fri, Oct 13, 2017 at 6:44 PM, Kevin Layer  wrote:
>> 
>> > OK, so I hacked markserv to add Content-Type text/html, but now I get
>> >
>> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
>> >
>> > What is it expecting?
>> >
>> > $ docker exec -it --user=solr solr bin/post -c handbook
>> > http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
>> > /docker-java-home/jre/bin/java -classpath 
>> > /opt/solr/dist/solr-core-7.0.1.jar
>> > -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web
>> > org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
>> > SimplePostTool version 5.0.0
>> > Posting web pages to Solr url http://localhost:8983/solr/
>> > handbook/update/extract
>> > Entering auto mode. Indexing pages with content-types corresponding to
>> > file endings md
>> > SimplePostTool: WARNING: Never crawl an external web site faster than
>> > every 10 seconds, your IP will probably be blocked
>> > Entering recursive mode, depth=10, delay=0s
>> > Entering crawl at level 0 (1 links total, 1 new)
>> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
>> > SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a
>> > HTTP result status of 415
>> > 0 web pages indexed.
>> > COMMITting Solr index changes to http://localhost:8983/solr/
>> > handbook/update/extract...
>> > Time spent: 0:00:03.882
>> > $
>> >
>> > Thanks.
>> >
>> > Kevin
>> >


Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote:

>> Reference to the code:
>> 
>> .
>> 
>> String rawContentType = conn.getContentType();
>> String type = rawContentType.split(";")[0];
>> if(typeSupported(type) || "*".equals(fileTypes)) {
>>   String encoding = conn.getContentEncoding();
>> 
>> .
>> 
>> protected boolean typeSupported(String type) {
>>   for(String key : mimeMap.keySet()) {
>> if(mimeMap.get(key).equals(type)) {
>>   if(fileTypes.contains(key))
>> return true;
>> }
>>   }
>>   return false;
>> }
>> 
>> .
>> 
>> It has another check for fileTypes, I can see the page ending with .md
>> (which you are indexing) and not .html. Let's hope now this is not the
>> issue.

Did you see the "-filetypes md" at the end of the post command line?
Shouldn't that handle it?

Kevin
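
For what it's worth, a quick stand-alone re-creation of the check quoted
above (mimeMap entries and values copied from this thread) suggests why
"-filetypes md" by itself does not get past it: nothing in mimeMap maps an
"md" extension to a content type, and "text/html" only maps to the "html" and
"htm" keys, which are not in the -filetypes list.

import java.util.HashMap;
import java.util.Map;

public class TypeSupportedCheck {
  public static void main(String[] args) {
    Map<String, String> mimeMap = new HashMap<>();
    mimeMap.put("html", "text/html");   // entries copied from SimplePostTool's map quoted earlier
    mimeMap.put("htm", "text/html");
    // ... there is no key that maps markdown ("md") to any content type ...

    String fileTypes = "md";            // from "-filetypes md"
    String type = "text/html";          // what markserv now sends

    boolean supported = false;
    for (String key : mimeMap.keySet()) {
      if (mimeMap.get(key).equals(type) && fileTypes.contains(key)) {
        supported = true;               // would require "html" (or "*") in -filetypes
      }
    }
    System.out.println(supported);      // false -> "Skipping URL with unsupported type text/html"
  }
}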

>> 
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> 
>> On Fri, Oct 13, 2017 at 7:04 PM, Amrit Sarkar 
>> wrote:
>> 
>> > Kevin,
>> >
>> > Just put "html" too and give it a shot. These are the types it is
>> > expecting:
>> >
>> > mimeMap = new HashMap<>();
>> > mimeMap.put("xml", "application/xml");
>> > mimeMap.put("csv", "text/csv");
>> > mimeMap.put("json", "application/json");
>> > mimeMap.put("jsonl", "application/json");
>> > mimeMap.put("pdf", "application/pdf");
>> > mimeMap.put("rtf", "text/rtf");
>> > mimeMap.put("html", "text/html");
>> > mimeMap.put("htm", "text/html");
>> > mimeMap.put("doc", "application/msword");
>> > mimeMap.put("docx", 
>> > "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
>> > mimeMap.put("ppt", "application/vnd.ms-powerpoint");
>> > mimeMap.put("pptx", 
>> > "application/vnd.openxmlformats-officedocument.presentationml.presentation");
>> > mimeMap.put("xls", "application/vnd.ms-excel");
>> > mimeMap.put("xlsx", 
>> > "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
>> > mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
>> > mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
>> > mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
>> > mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
>> > mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
>> > mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
>> > mimeMap.put("txt", "text/plain");
>> > mimeMap.put("log", "text/plain");
>> >
>> > The keys are the types supported.
>> >
>> >
>> > Amrit Sarkar
>> > Search Engineer
>> > Lucidworks, Inc.
>> > 415-589-9269
>> > www.lucidworks.com
>> > Twitter http://twitter.com/lucidworks
>> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> >
>> > On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar 
>> > wrote:
>> >
>> >> Ah!
>> >>
>> >> Only supported type is: text/html; encoding=utf-8
>> >>
>> >> I am not confident of this either :) but this should work.
>> >>
>> >> See the code-snippet below:
>> >>
>> >> ..
>> >>
>> >> if(res.httpStatus == 200) {
>> >>   // Raw content type of form "text/html; encoding=utf-8"
>> >>   String rawContentType = conn.getContentType();
>> >>   String type = rawContentType.split(";")[0];
>> >>   if(typeSupported(type) || "*".equals(fileTypes)) {
>> >> String encoding = conn.getContentEncoding();
>> >>
>> >> 
>> >>
>> >>
>> >> Amrit Sarkar
>> >> Search Engineer
>> >> Lucidworks, Inc.
>> >> 415-589-9269
>> >> www.lucidworks.com
>> >&g

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote:

>> Kevin,
>> 
>> Just put "html" too and give it a shot. These are the types it is expecting:

Same thing.

>> 
>> mimeMap = new HashMap<>();
>> mimeMap.put("xml", "application/xml");
>> mimeMap.put("csv", "text/csv");
>> mimeMap.put("json", "application/json");
>> mimeMap.put("jsonl", "application/json");
>> mimeMap.put("pdf", "application/pdf");
>> mimeMap.put("rtf", "text/rtf");
>> mimeMap.put("html", "text/html");
>> mimeMap.put("htm", "text/html");
>> mimeMap.put("doc", "application/msword");
>> mimeMap.put("docx",
>> "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
>> mimeMap.put("ppt", "application/vnd.ms-powerpoint");
>> mimeMap.put("pptx",
>> "application/vnd.openxmlformats-officedocument.presentationml.presentation");
>> mimeMap.put("xls", "application/vnd.ms-excel");
>> mimeMap.put("xlsx",
>> "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
>> mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
>> mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
>> mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
>> mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
>> mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
>> mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
>> mimeMap.put("txt", "text/plain");
>> mimeMap.put("log", "text/plain");
>> 
>> The keys are the types supported.
>> 
>> 
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> 
>> On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar 
>> wrote:
>> 
>> > Ah!
>> >
>> > Only supported type is: text/html; encoding=utf-8
>> >
>> > I am not confident of this either :) but this should work.
>> >
>> > See the code-snippet below:
>> >
>> > ..
>> >
>> > if(res.httpStatus == 200) {
>> >   // Raw content type of form "text/html; encoding=utf-8"
>> >   String rawContentType = conn.getContentType();
>> >   String type = rawContentType.split(";")[0];
>> >   if(typeSupported(type) || "*".equals(fileTypes)) {
>> > String encoding = conn.getContentEncoding();
>> >
>> > 
>> >
>> >
>> > Amrit Sarkar
>> > Search Engineer
>> > Lucidworks, Inc.
>> > 415-589-9269
>> > www.lucidworks.com
>> > Twitter http://twitter.com/lucidworks
>> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> >
>> > On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer  wrote:
>> >
>> >> Amrit Sarkar wrote:
>> >>
>> >> >> Strange,
>> >> >>
>> >> >> Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's
>> >> >> Content-Type. Let's see what it says now.
>> >>
>> >> Same thing.  Verified Content-Type:
>> >>
>> >> quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |&
>> >> grep Content-Type
>> >>   Content-Type: text/html;charset=utf-8
>> >> quadra[git:master]$ ]
>> >>
>> >> quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook
>> >> http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
>> >> /docker-java-home/jre/bin/java -classpath 
>> >> /opt/solr/dist/solr-core-7.0.1.jar
>> >> -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web
>> >> org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
>> >> SimplePostTool version 5.0.0
>> >> Posting web pages to Solr url http://localhost:8983/solr/han
>> >> dbook/update/extract
>> >> Entering auto mode. Indexing pages with content-types corresponding to
>> >> file endings md
>> >> SimplePostTo

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote:

>> Hi Kevin,
>> 
>> Can you post the solr log in the mail thread? I don't think it handles
>> .md by itself, at first glance at the code.

How do I extract the log you want?


>> 
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> 
>> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer  wrote:
>> 
>> > Amrit Sarkar wrote:
>> >
>> > >> Kevin,
>> > >>
>> > >> Just put "html" too and give it a shot. These are the types it is
>> > expecting:
>> >
>> > Same thing.
>> >
>> > >>
>> > >> mimeMap = new HashMap<>();
>> > >> mimeMap.put("xml", "application/xml");
>> > >> mimeMap.put("csv", "text/csv");
>> > >> mimeMap.put("json", "application/json");
>> > >> mimeMap.put("jsonl", "application/json");
>> > >> mimeMap.put("pdf", "application/pdf");
>> > >> mimeMap.put("rtf", "text/rtf");
>> > >> mimeMap.put("html", "text/html");
>> > >> mimeMap.put("htm", "text/html");
>> > >> mimeMap.put("doc", "application/msword");
>> > >> mimeMap.put("docx",
>> > >> "application/vnd.openxmlformats-officedocument.
>> > wordprocessingml.document");
>> > >> mimeMap.put("ppt", "application/vnd.ms-powerpoint");
>> > >> mimeMap.put("pptx",
>> > >> "application/vnd.openxmlformats-officedocument.
>> > presentationml.presentation");
>> > >> mimeMap.put("xls", "application/vnd.ms-excel");
>> > >> mimeMap.put("xlsx",
>> > >> "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
>> > >> mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
>> > >> mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
>> > >> mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
>> > >> mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
>> > >> mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
>> > >> mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
>> > >> mimeMap.put("txt", "text/plain");
>> > >> mimeMap.put("log", "text/plain");
>> > >>
>> > >> The keys are the types supported.
>> > >>
>> > >>
>> > >> Amrit Sarkar
>> > >> Search Engineer
>> > >> Lucidworks, Inc.
>> > >> 415-589-9269
>> > >> www.lucidworks.com
>> > >> Twitter http://twitter.com/lucidworks
>> > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> > >>
>> > >> On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar 
>> > >> wrote:
>> > >>
>> > >> > Ah!
>> > >> >
>> > >> > Only supported type is: text/html; encoding=utf-8
>> > >> >
>> > >> > I am not confident of this either :) but this should work.
>> > >> >
>> > >> > See the code-snippet below:
>> > >> >
>> > >> > ..
>> > >> >
>> > >> > if(res.httpStatus == 200) {
>> > >> >   // Raw content type of form "text/html; encoding=utf-8"
>> > >> >   String rawContentType = conn.getContentType();
>> > >> >   String type = rawContentType.split(";")[0];
>> > >> >   if(typeSupported(type) || "*".equals(fileTypes)) {
>> > >> > String encoding = conn.getContentEncoding();
>> > >> >
>> > >> > 
>> > >> >
>> > >> >
>> > >> > Amrit Sarkar
>> > >> > Search Engineer
>> > >> > Lucidworks, Inc.
>> > >> > 415-589-9269
>> > >> > www.lucidworks.com
>
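For the log question above, a minimal sketch of pulling Solr's log out of the Docker container, assuming the container is named `solr` as in the `docker exec` commands earlier in the thread:

```bash
# Whatever Solr writes to stdout/stderr inside the container
docker logs --tail 200 solr

# Or copy the log file itself out of the container (path confirmed later in this thread)
docker cp solr:/opt/solr/server/logs/solr.log ./solr.log
```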

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote:

>> Hi Kevin,
>> 
>> Can you post the solr log in the mail thread? I don't think it handles
>> .md by itself, at first glance at the code.

Note that when I use the admin web interface, and click on "Logging"
on the left, I just see a spinner that implies it's trying to retrieve
the logs (I see headers "Time (Local)   Level   CoreLogger  Message"),
but no log entries.  It's been like this for 10 minutes.

>> 
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> 
>> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer  wrote:
>> 
>> > Amrit Sarkar wrote:
>> >
>> > >> Kevin,
>> > >>
>> > >> Just put "html" too and give it a shot. These are the types it is
>> > expecting:
>> >
>> > Same thing.
>> >
>> > >>
>> > >> mimeMap = new HashMap<>();
>> > >> mimeMap.put("xml", "application/xml");
>> > >> mimeMap.put("csv", "text/csv");
>> > >> mimeMap.put("json", "application/json");
>> > >> mimeMap.put("jsonl", "application/json");
>> > >> mimeMap.put("pdf", "application/pdf");
>> > >> mimeMap.put("rtf", "text/rtf");
>> > >> mimeMap.put("html", "text/html");
>> > >> mimeMap.put("htm", "text/html");
>> > >> mimeMap.put("doc", "application/msword");
>> > >> mimeMap.put("docx",
>> > >> "application/vnd.openxmlformats-officedocument.
>> > wordprocessingml.document");
>> > >> mimeMap.put("ppt", "application/vnd.ms-powerpoint");
>> > >> mimeMap.put("pptx",
>> > >> "application/vnd.openxmlformats-officedocument.
>> > presentationml.presentation");
>> > >> mimeMap.put("xls", "application/vnd.ms-excel");
>> > >> mimeMap.put("xlsx",
>> > >> "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
>> > >> mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
>> > >> mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
>> > >> mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
>> > >> mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
>> > >> mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
>> > >> mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
>> > >> mimeMap.put("txt", "text/plain");
>> > >> mimeMap.put("log", "text/plain");
>> > >>
>> > >> The keys are the types supported.
>> > >>
>> > >>
>> > >> Amrit Sarkar
>> > >> Search Engineer
>> > >> Lucidworks, Inc.
>> > >> 415-589-9269
>> > >> www.lucidworks.com
>> > >> Twitter http://twitter.com/lucidworks
>> > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> > >>
>> > >> On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar 
>> > >> wrote:
>> > >>
>> > >> > Ah!
>> > >> >
>> > >> > Only supported type is: text/html; encoding=utf-8
>> > >> >
>> > >> > I am not confident of this either :) but this should work.
>> > >> >
>> > >> > See the code-snippet below:
>> > >> >
>> > >> > ..
>> > >> >
>> > >> > if(res.httpStatus == 200) {
>> > >> >   // Raw content type of form "text/html; encoding=utf-8"
>> > >> >   String rawContentType = conn.getContentType();
>> > >> >   String type = rawContentType.split(";")[0];
>> > >> >   if(typeSupported(type) || "*".equals(fileTypes)) {
>> > >> > String encoding = conn.getContentEncoding();
>> > >> >
>> > >> > 
>> 

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
mp;_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:48:38.831 INFO  (qtp1911006827-16) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/logging 
params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:48:48.833 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/logging 
params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:48:58.833 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/logging 
params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:49:08.834 INFO  (qtp1911006827-15) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/logging 
params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:49:18.832 INFO  (qtp1911006827-17) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/logging 
params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:49:28.835 INFO  (qtp1911006827-11) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/logging 
params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:49:38.861 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/logging 
params={wt=json&_=1507905257696&since=0} status=0 QTime=14
2017-10-13 14:49:48.853 INFO  (qtp1911006827-18) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/logging 
params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:49:58.837 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/logging 
params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:50:08.833 INFO  (qtp1911006827-16) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/logging 
params={wt=json&_=1507905257696&since=0} status=0 QTime=0



>> 
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> 
>> On Fri, Oct 13, 2017 at 8:08 PM, Kevin Layer  wrote:
>> 
>> > Amrit Sarkar wrote:
>> >
>> > >> Hi Kevin,
>> > >>
>> > >> Can you post the solr log in the mail thread? I don't think it handles
>> > >> .md by itself, at first glance at the code.
>> >
>> > How do I extract the log you want?
>> >
>> >
>> > >>
>> > >> Amrit Sarkar
>> > >> Search Engineer
>> > >> Lucidworks, Inc.
>> > >> 415-589-9269
>> > >> www.lucidworks.com
>> > >> Twitter http://twitter.com/lucidworks
>> > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> > >>
>> > >> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer  wrote:
>> > >>
>> > >> > Amrit Sarkar wrote:
>> > >> >
>> > >> > >> Kevin,
>> > >> > >>
>> > >> > >> Just put "html" too and give it a shot. These are the types it is
>> > >> > expecting:
>> > >> >
>> > >> > Same thing.
>> > >> >
>> > >> > >>
>> > >> > >> mimeMap = new HashMap<>();
>> > >> > >> mimeMap.put("xml", "application/xml");
>> > >> > >> mimeMap.put("csv", "text/csv");
>> > >> > >> mimeMap.put("json", "application/json");
>> > >> > >> mimeMap.put("jsonl", "application/json");
>> > >> > >> mimeMap.put("pdf", "application/pdf");
>> > >> > >> mimeMap.put("rtf", "text/rtf");
>> > >> > >> mimeMap.put("html", "text/html");
>> > >> > >> mimeMap.put("htm", "text/html");
>> > >> > >> mimeMap.put("doc", "application/msword");
>> > >> > >> mimeMap.put("docx",
>> > >> > >> "application/vnd.openxmlformats-officedocument.
>> > >> > wordprocessingml.document");
>> > >> > >> mimeMap.put("ppt", "application/vnd.ms-powerpoint");
>> > >> > >> mimeMap.put("pptx",
>> > >> > >> "application/vnd.openxmlformats-officedocument.
>> > >> > pr
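The repeated /admin/info/logging requests in the log above are the admin UI polling for buffered log events. If the UI only shows a spinner, the same endpoint can be queried directly to see what it actually returns; the host and port below are assumptions:

```bash
# since=0 asks for all buffered log events; the response is JSON
curl "http://localhost:8983/solr/admin/info/logging?wt=json&since=0"
```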

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote:

>> Kevin,
>> 
>> I am not able to replicate the issue on my system, which is a bit annoying
>> for me. Try this out one last time:
>> 
>> docker exec -it --user=solr solr bin/post -c handbook
>> http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes html
>> 
>> and have Content-Type: "html" and "text/html", try with both.

With text/html and your command I get:

quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook 
http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes html
/docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar 
-Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=html -Dc=handbook -Ddata=web 
org.apache.solr.util.SimplePostTool http://quadra.franz.com:9091/index.md
SimplePostTool version 5.0.0
Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
Entering auto mode. Indexing pages with content-types corresponding to file 
endings html
SimplePostTool: WARNING: Never crawl an external web site faster than every 10 
seconds, your IP will probably be blocked
Entering recursive mode, depth=10, delay=0s
Entering crawl at level 0 (1 links total, 1 new)
POSTed web resource http://quadra.franz.com:9091/index.md (depth: 0)
[Fatal Error] :1:1: Content is not allowed in prolog.
Exception in thread "main" java.lang.RuntimeException: 
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not 
allowed in prolog.
at 
org.apache.solr.util.SimplePostTool$PageFetcher.getLinksFromWebPage(SimplePostTool.java:1252)
at org.apache.solr.util.SimplePostTool.webCrawl(SimplePostTool.java:616)
at 
org.apache.solr.util.SimplePostTool.postWebPages(SimplePostTool.java:563)
at 
org.apache.solr.util.SimplePostTool.doWebMode(SimplePostTool.java:365)
at org.apache.solr.util.SimplePostTool.execute(SimplePostTool.java:187)
at org.apache.solr.util.SimplePostTool.main(SimplePostTool.java:172)
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; 
Content is not allowed in prolog.
at 
com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
at 
com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
at org.apache.solr.util.SimplePostTool.makeDom(SimplePostTool.java:1061)
at 
org.apache.solr.util.SimplePostTool$PageFetcher.getLinksFromWebPage(SimplePostTool.java:1232)
... 5 more


When I use "-filetype md" back to the regular output that doesn't scan
anything.


>> 
>> If you get past this hurdle, let me know.
>> 
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> 
>> On Fri, Oct 13, 2017 at 8:22 PM, Kevin Layer  wrote:
>> 
>> > Amrit Sarkar wrote:
>> >
>> > >> ah oh, dockers. They are placed under [solr-home]/server/log/solr/log
>> > in
>> > >> the machine. I haven't played much with docker, any way you can get that
>> > >> file from that location.
>> >
>> > I see these files:
>> >
>> > /opt/solr/server/logs/archived
>> > /opt/solr/server/logs/solr_gc.log.0.current
>> > /opt/solr/server/logs/solr.log
>> > /opt/solr/server/solr/handbook/data/tlog
>> >
>> > The 3rd one has very little info.  Attached:
>> >
>> >
>> > 2017-10-11 15:28:09.564 INFO  (main) [   ] o.e.j.s.Server
>> > jetty-9.3.14.v20161028
>> > 2017-10-11 15:28:10.668 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
>> > ___  _   Welcome to Apache Solr™ version 7.0.1
>> > 2017-10-11 15:28:10.669 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter /
>> > __| ___| |_ _   Starting in standalone mode on port 8983
>> > 2017-10-11 15:28:10.670 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter \__
>> > \/ _ \ | '_|  Install dir: /opt/solr, Default config dir:
>> > /opt/solr/server/solr/configsets/_default/conf
>> > 2017-10-11 15:28:10.707 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
>> > |___/\___/_|_|Start time: 2017-10-11T15:28:10.674Z
>> > 2017-10-11 15:28:10.747 INFO  (main) [   ] o.a.s.c.SolrResourceLoader
>> > Using system property solr.solr.home: /opt/solr/server/solr
>> > 2017-10-11 15:28:10.763 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading
>> > container configuration from /opt/solr/server/solr/solr.xml
>> > 2017-10-11 15:28:11.062 INFO  
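For context on the stack trace above: SimplePostTool builds a DOM of each fetched page (getLinksFromWebPage -> makeDom) so it can follow links, and "Content is not allowed in prolog" generally means the response body is not well-formed XML/XHTML, for example raw Markdown served as text/html. A quick way to see what the parser sees, assuming xmllint from libxml2 is installed:

```bash
# Strict XML parse of the page the crawler chokes on
curl -s http://quadra.franz.com:9091/index.md | xmllint --noout -

# Lenient HTML parse, reporting only errors
curl -s http://quadra.franz.com:9091/index.md | xmllint --html --noout -
```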

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote:

>> Kevin,
>> 
>> fileType => md is not a recognizable format in SimplePostTool; anyway, moving
>> on.

OK, thanks.  Looks like I'll have to abandon using solr for this
project (or find another way to crawl the site).

Thank you for all the help, though.  I appreciate it.

>> The above is a SAXParse runtime exception. Nothing can be done at the Solr end
>> except curating your own data.
>> Some helpful links:
>> https://stackoverflow.com/questions/2599919/java-parsing-xml-document-gives-content-not-allowed-in-prolog-error
>> https://stackoverflow.com/questions/3030903/content-is-not-allowed-in-prolog-when-parsing-perfectly-valid-xml-on-gae
>> 
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> 
>> On Fri, Oct 13, 2017 at 8:48 PM, Kevin Layer  wrote:
>> 
>> > Amrit Sarkar wrote:
>> >
>> > >> Kevin,
>> > >>
>> > >> I am not able to replicate the issue on my system, which is a bit annoying
>> > >> for me. Try this out one last time:
>> > >>
>> > >> docker exec -it --user=solr solr bin/post -c handbook
>> > >> http://quadra.franz.com:9091/index.md -recursive 10 -delay 0
>> > -filetypes html
>> > >>
>> > >> and have Content-Type: "html" and "text/html", try with both.
>> >
>> > With text/html and your command I get:
>> >
>> > quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook
>> > http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes
>> > html
>> > /docker-java-home/jre/bin/java -classpath 
>> > /opt/solr/dist/solr-core-7.0.1.jar
>> > -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=html -Dc=handbook
>> > -Ddata=web org.apache.solr.util.SimplePostTool
>> > http://quadra.franz.com:9091/index.md
>> > SimplePostTool version 5.0.0
>> > Posting web pages to Solr url http://localhost:8983/solr/
>> > handbook/update/extract
>> > Entering auto mode. Indexing pages with content-types corresponding to
>> > file endings html
>> > SimplePostTool: WARNING: Never crawl an external web site faster than
>> > every 10 seconds, your IP will probably be blocked
>> > Entering recursive mode, depth=10, delay=0s
>> > Entering crawl at level 0 (1 links total, 1 new)
>> > POSTed web resource http://quadra.franz.com:9091/index.md (depth: 0)
>> > [Fatal Error] :1:1: Content is not allowed in prolog.
>> > Exception in thread "main" java.lang.RuntimeException:
>> > org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is
>> > not allowed in prolog.
>> > at org.apache.solr.util.SimplePostTool$PageFetcher.
>> > getLinksFromWebPage(SimplePostTool.java:1252)
>> > at org.apache.solr.util.SimplePostTool.webCrawl(
>> > SimplePostTool.java:616)
>> > at org.apache.solr.util.SimplePostTool.postWebPages(
>> > SimplePostTool.java:563)
>> > at org.apache.solr.util.SimplePostTool.doWebMode(
>> > SimplePostTool.java:365)
>> > at org.apache.solr.util.SimplePostTool.execute(
>> > SimplePostTool.java:187)
>> > at org.apache.solr.util.SimplePostTool.main(
>> > SimplePostTool.java:172)
>> > Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1;
>> > Content is not allowed in prolog.
>> > at com.sun.org.apache.xerces.internal.parsers.DOMParser.
>> > parse(DOMParser.java:257)
>> > at com.sun.org.apache.xerces.internal.jaxp.
>> > DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
>> > at javax.xml.parsers.DocumentBuilder.parse(
>> > DocumentBuilder.java:121)
>> > at org.apache.solr.util.SimplePostTool.makeDom(
>> > SimplePostTool.java:1061)
>> > at org.apache.solr.util.SimplePostTool$PageFetcher.
>> > getLinksFromWebPage(SimplePostTool.java:1232)
>> > ... 5 more
>> >
>> >
>> > When I use "-filetype md" back to the regular output that doesn't scan
>> > anything.
>> >
>> >
>> > >>
>> > >> If you get past this hurdle, let me know.
>> > >>
>> > >> Amrit Sarkar
>> > >> Search Engineer
>> > >> Lucidworks
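Before giving up on Solr for this, one workaround worth sketching: mirror the site to disk with wget, rename the .md pages so they carry a file ending SimplePostTool accepts, and index the local files. Paths and the rename step below are assumptions, not a tested recipe, and the mirrored directory has to be visible inside the container (for example via a volume mount):

```bash
# Let wget do the crawling so SimplePostTool never has to parse the pages for links
wget --mirror --no-parent -P /tmp/handbook http://quadra.franz.com:9091/

# The pages are HTML served under an .md extension; give them an ending bin/post recognizes
find /tmp/handbook -name '*.md' -exec sh -c 'mv "$1" "${1%.md}.html"' _ {} \;

# Index the mirrored files (assumes /tmp/handbook is mounted into the container)
docker exec -it --user=solr solr bin/post -c handbook /tmp/handbook -filetypes html
```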

Re: Parallel SQL: GROUP BY throws exception

2017-10-17 Thread Kevin Risden
Calcite might support this in 0.14. I know group by support was improved
lately. It might be as simple as upgrading the dependency? A test case
showing the NPE would be helpful. We are using MySQL dialect under the hood
with Calcite.

Kevin Risden

On Tue, Oct 17, 2017 at 8:09 AM, Joel Bernstein  wrote:

> This would be a good jira to create at (
> https://issues.apache.org/jira/projects/SOLR)
>
> Interesting that the query works in MySQL. I'm assuming MySQL automatically
> adds the group by field to the field list. We can look at doing this as
> well.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Oct 17, 2017 at 6:48 AM, Dmitry Gerasimov <
> dgerasi...@kommunion.com>
> wrote:
>
> > Joel,
> >
> > Thanks for the tip. That worked. I was confused since this query works
> > just fine in MySQL.
> > It would of course be very helpful if SOLR was responding with a
> > proper error. What’s the process here? Where do I post this request?
> >
> > Dmitry
> >
> >
> >
> >
> > > -- Forwarded message --
> > > From: Joel Bernstein 
> > > To: solr-user@lucene.apache.org
> > > Cc:
> > > Bcc:
> > > Date: Mon, 16 Oct 2017 11:16:28 -0400
> > > Subject: Re: Parallel SQL: GROUP BY throws exception
> > > Ok, I just read the query again.
> > >
> > > Try the failing query like this:
> > >
> > > SELECT people_person_id, sum(amount) as total FROM donation GROUP BY
> > > people_person_id
> > >
> > > That is the correct syntax for the SQL group by aggregation.
> > >
> > > It looks like you found a null pointer though where a proper error
> > message
> > > is needed.
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Mon, Oct 16, 2017 at 9:49 AM, Joel Bernstein 
> > wrote:
> > >
> > > > Also what version are you using?
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Mon, Oct 16, 2017 at 9:49 AM, Joel Bernstein 
> > > > wrote:
> > > >
> > > >> Can you provide the stack trace?
> > > >>
> > > >> Are you in SolrCloud mode?
> > > >>
> > > >>
> > > >>
> > > >> Joel Bernstein
> > > >> http://joelsolr.blogspot.com/
> > > >>
> > > >> On Mon, Oct 16, 2017 at 9:20 AM, Dmitry Gerasimov <
> > > >> dgerasi...@kommunion.com> wrote:
> > > >>
> > > >>> Hi all!
> > > >>>
> > > >>> This query works as expected:
> > > >>> SELECT sum(amount) as total FROM donation
> > > >>>
> > > >>> Adding GROUP BY:
> > > >>> SELECT sum(amount) as total FROM donation GROUP BY people_person_id
> > > >>>
> > > >>> Now I get response:
> > > >>> {
> > > >>>   "result-set":{
> > > >>> "docs":[{
> > > >>> "EXCEPTION":"Failed to execute sqlQuery 'SELECT sum(amount)
> > as
> > > >>> total  FROM donation GROUP BY people_person_id' against JDBC
> > connection
> > > >>> 'jdbc:calcitesolr:'.\nError while executing SQL \"SELECT
> sum(amount)
> > as
> > > >>> total  FROM donation GROUP BY people_person_id\": null",
> > > >>> "EOF":true,
> > > >>> "RESPONSE_TIME":279}]}
> > > >>> }
> > > >>>
> > > >>> Any ideas on what is causing this? Or how to debug?
> > > >>>
> > > >>>
> > > >>> Here is the collection structure:
> > > >>>
> > > >>>  > > >>> required="true"
> > > >>> multiValued="false"/>
> > > >>>  > stored="true"
> > > >>> required="true" multiValued="false" docValues="true"/>
> > > >>>  > > >>> required="true" multiValued="false"/>
> > > >>>  > > >>> multiValued="false" docValues="true"/>
> > > >>>
> > > >>>
> > > >>> Thanks!
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> >
>
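For reference, the corrected statement from this thread can be submitted to the /sql handler like any other query; the collection name is taken from the FROM clause and the host/port are assumptions:

```bash
curl --data-urlencode \
  'stmt=SELECT people_person_id, sum(amount) AS total FROM donation GROUP BY people_person_id' \
  "http://localhost:8983/solr/donation/sql?aggregationMode=map_reduce"
```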


Trouble using Jython script as ScriptTransformer

2017-11-02 Thread Kevin Grimes
Hey all,

I’m running v6.3.0. I’ve been trying to configure a Jython ScriptTransformer in 
my data-config.xml (pulls from JdbcDataSource). But when I run the full import, 
it tries to interpret the script as JavaScript, even though I added the 
language=Jython attribute to the 

Re: Solr7: Bad query throughput around commit time

2017-11-11 Thread Kevin Risden
> One machine runs with a 3TB drive, running 3 solr processes (each with
one core as described above).

How much total memory on the machine?

Kevin Risden

On Sat, Nov 11, 2017 at 1:08 PM, Nawab Zada Asad Iqbal 
wrote:

> Thanks for a quick and detailed response, Erick!
>
> Unfortunately I don't have proof, but our servers with Solr 4.5 are
> running really nicely with the above config. I had assumed that the same or
> similar settings would also perform well with Solr 7, but that assumption
> didn't hold, as a lot has changed across 3 major releases.
> I have tweaked the cache values as you suggested but increasing or
> decreasing doesn't seem to do any noticeable improvement.
>
> At the moment, my one core has 800GB index, ~450 Million documents, 48 G
> Xmx. GC pauses haven't been an issue though.  One machine runs with a 3TB
> drive, running 3 solr processes (each with one core as described above).  I
> agree that it is a very atypical system so i should probably try different
> parameters with a fresh eye to find the solution.
>
>
> I tried with autocommits (commit with openSearcher=false every half minute,
> and softCommit every 5 minutes). That supported the hypothesis that the
> query throughput decreases after opening a new searcher and **not** after
> committing the index. Cache hit ratios are all in 80+% (even when I
> decreased the filterCache to 128, so I will keep it at this lower value).
> The document cache hit ratio is really bad; it drops to around 40% after
> newSearcher. But I guess that is expected, since it cannot be warmed up
> anyway.
>
>
> Thanks
> Nawab
>
>
>
> On Thu, Nov 9, 2017 at 9:11 PM, Erick Erickson 
> wrote:
>
> > What evidence to you have that the changes you've made to your configs
> > are useful? There's lots of things in here that are suspect:
> >
> >   1
> >
> > First, this is useless unless you are forceMerging/optimizing. Which
> > you shouldn't be doing under most circumstances. And you're going to
> > be rewriting a lot of data every time. See:
> >
> > https://lucidworks.com/2017/10/13/segment-merging-deleted-
> > documents-optimize-may-bad/
> >
> > A filterCache size of "10240" is far in excess of what we usually
> > recommend. Each entry can be up to maxDoc/8 and you have 10K of them.
> > Why did you choose this? On the theory that "more is better?" If
> > you're using NOW then you may not be using the filterCache well, see:
> >
> > https://lucidworks.com/2012/02/23/date-math-now-and-filter-queries/
> >
> > autowarmCount="1024"
> >
> > Every time you commit you're firing off 1024 queries which is going to
> > spike the CPU a lot. Again, this is super-excessive. I usually start
> > with 16 or so.
> >
> > Why are you committing from a cron job? Why not just set your
> > autocommit settings and forget about it? That's what they're for.
> >
> > Your queryResultCache is likewise kind of large, but it takes up much
> > less space than the filterCache per entry so it's probably OK. I'd
> > still shrink it and set the autowarm to 16 or so to start, unless
> > you're seeing a pretty high hit ratio, which is pretty unusual but
> > does happen.
> >
> > 48G of memory is just asking for long GC pauses. How many docs do you
> > have in each core anyway? If you're really using this much heap, then
> > it'd be good to see what you can do to shrink it. Enabling docValues
> > for all fields you facet, sort or group on will help that a lot if you
> > haven't already.
> >
> > How much memory on your entire machine? And how much is used by _all_
> > the JVMs you running on a particular machine? MMapDirectory needs as
> > much OS memory space as it can get, see:
> > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >
> > Lately we've seen some structures that consume memory until a commit
> > happens (either soft or hard). I'd shrink my autocommit down to 60
> > seconds or even less (openSearcher=false).
> >
> > In short, I'd go back mostly to the default settings and build _up_ as
> > you can demonstrate improvements. You've changed enough things here
> > that untangling which one is the culprit will be hard. You want the
> > JVM to have as little memory as possible, unfortunately that's
> > something you figure out by experimentation.
> >
> > Best,
> > Erick
> >
> > On Thu, Nov 9, 2017 at 8:42 PM, Nawab Zada Asad Iqbal 
> > wrote:
> > > Hi,
> >
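Since much of the tuning above hinges on cache hit ratios, one way to watch them outside the admin UI is the Metrics API (Solr 6.4+); the host/port and the prefix filter are assumptions and may need adjusting:

```bash
# Searcher cache metrics (hit ratios, evictions, warmup times) for every core on this node
curl "http://localhost:8983/solr/admin/metrics?group=core&prefix=CACHE.searcher&wt=json"
```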

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-22 Thread Kevin Risden
Joe,

I have a few questions about your Solr and HDFS setup that could help
improve the recovery performance.

* Is HDFS part of a distribution from Hortonworks, Cloudera, etc?
* Is Solr colocated with HDFS data nodes?
* What is the output of "ps aux | grep solr"? (specifically looking for the
Java arguments that are being set.)

Depending on how Solr on HDFS was setup, there are some potentially simple
settings that can help significantly improve performance.

1) Short circuit reads

If Solr is colocated with an HDFS datanode, short circuit reads can improve
read performance since it skips a network hop if the data is local to that
node. This requires HDFS native libraries to be added to Solr.

2) HDFS block cache in Solr

Solr without HDFS uses the OS page cache to handle caching data for
queries. With HDFS, Solr has a special HDFS block cache which allows for
caching HDFS blocks. This significantly helps query performance. There are
a few configuration parameters that can help here.

Kevin Risden

On Wed, Nov 22, 2017 at 4:20 AM, Hendrik Haddorp 
wrote:

> Hi Joe,
>
> sorry, I have not seen that problem. I would normally not delete a replica
> if the shard is down but only if there is an active shard. Without an
> active leader the replica should not be able to recover. I also just had a
> case where all replicas of a shard stayed in down state and restarts didn't
> help. This was however also caused by lock files. Once I cleaned them up
> and restarted all Solr instances that had a replica they recovered.
>
> For the lock files I discovered that the index is not always in the
> "index" folder but can also be in an index. folder. There can be
> an "index.properties" file in the "data" directory in HDFS and this
> contains the correct index folder name.
>
> If you are really desperate you could also delete all but one replica so
> that the leader election is quite trivial. But this does of course increase
> the risk of finally losing the data quite a bit. So I would try looking
> into the code and figure out what the problem is here and maybe compare the
> state in HDFS and ZK with a shard that works.
>
> regards,
> Hendrik
>
>
> On 21.11.2017 23:57, Joe Obernberger wrote:
>
>> Hi Hendrick - the shards in question have three replicas.  I tried
>> restarting each one (one by one) - no luck.  No leader is found. I deleted
>> one of the replicas and added a new one, and the new one also shows as
>> 'down'.  I also tried the FORCELEADER call, but that had no effect.  I
>> checked the OVERSEERSTATUS, but there is nothing unusual there.  I don't
>> see anything useful in the logs except the error:
>>
>> org.apache.solr.common.SolrException: Error getting leader from zk for
>> shard shard21
>> at org.apache.solr.cloud.ZkController.getLeader(ZkController.
>> java:996)
>> at org.apache.solr.cloud.ZkController.register(ZkController.java:902)
>> at org.apache.solr.cloud.ZkController.register(ZkController.java:846)
>> at org.apache.solr.core.ZkContainer.lambda$registerInZk$0(
>> ZkContainer.java:181)
>> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolE
>> xecutor.lambda$execute$0(ExecutorUtil.java:229)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1149)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.solr.common.SolrException: Could not get leader
>> props
>> at org.apache.solr.cloud.ZkController.getLeaderProps(ZkControll
>> er.java:1043)
>> at org.apache.solr.cloud.ZkController.getLeaderProps(ZkControll
>> er.java:1007)
>> at org.apache.solr.cloud.ZkController.getLeader(ZkController.
>> java:963)
>> ... 7 more
>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>> KeeperErrorCode = NoNode for /collections/UNCLASS/leaders/shard21/leader
>> at org.apache.zookeeper.KeeperException.create(KeeperException.
>> java:111)
>> at org.apache.zookeeper.KeeperException.create(KeeperException.
>> java:51)
>> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
>> at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkCl
>> ient.java:357)
>> at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkCl
>> ient.java:354)
>> at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(Zk
>> CmdExecutor.java:60)
>> at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClie
>> nt.java:354)
>> at org.apache.solr.cloud.ZkController.getLeaderPro

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-22 Thread Kevin Risden
Thanks for the detailed answers Joe. Definitely sounds like you covered
most of the easy HDFS performance items.

Kevin Risden

On Wed, Nov 22, 2017 at 7:44 AM, Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Hi Kevin -
> * HDFS is part of Cloudera 5.12.0.
> * Solr is co-located in most cases.  We do have several nodes that run on
> servers that are not data nodes, but most do. Unfortunately, our nodes are
> not the same size.  Some nodes have 8TBytes of disk, while our largest
> nodes are 64TBytes.  This results in a lot of data that needs to go over
> the network.
>
> * Command is:
> /usr/lib/jvm/jre-1.8.0/bin/java -server -Xms12g -Xmx16g -Xss2m
> -XX:+UseG1GC -XX:MaxDirectMemorySize=11g -XX:+PerfDisableSharedMem
> -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=16m
> -XX:MaxGCPauseMillis=300 -XX:InitiatingHeapOccupancyPercent=75
> -XX:+UseLargePages -XX:ParallelGCThreads=16 -XX:-ResizePLAB
> -XX:+AggressiveOpts -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
> -Xloggc:/opt/solr6/server/logs/solr_gc.log -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M -DzkClientTimeout=30
> -DzkHost=frodo.querymasters.com:2181,bilbo.querymasters.com:2181,
> gandalf.querymasters.com:2181,cordelia.querymasters.com:2181,cressida.
> querymasters.com:2181/solr6.6.0 -Dsolr.log.dir=/opt/solr6/server/logs
> -Djetty.port=9100 -DSTOP.PORT=8100 -DSTOP.KEY=solrrocks -Dhost=tarvos
> -Duser.timezone=UTC -Djetty.home=/opt/solr6/server
> -Dsolr.solr.home=/opt/solr6/server/solr -Dsolr.install.dir=/opt/solr6
> -Dsolr.clustering.enabled=true -Dsolr.lock.type=hdfs
> -Dsolr.autoSoftCommit.maxTime=12 -Dsolr.autoCommit.maxTime=180
> -Dsolr.solr.home=/etc/solr6 -Djava.library.path=/opt/cloud
> era/parcels/CDH/lib/hadoop/lib/native -Xss256k -Dsolr.log.muteconsole
> -XX:OnOutOfMemoryError=/opt/solr6/bin/oom_solr.sh 9100
> /opt/solr6/server/logs -jar start.jar --module=http
>
> * We have enabled short circuit reads.
>
> Right now, we have a relatively small block cache due to the requirements
> that the servers run other software.  We tried to find the best balance
> between block cache size, and RAM for programs, while still giving enough
> for local FS cache.  This came out to be 84 128M blocks - or about 10G for
> the cache per node (45 nodes total).
>
>  class="solr.HdfsDirectoryFactory">
> true
> true
> 84
> true bool>
> 16384
> true
> true
> 128
> 1024
> hdfs://nameservice1:8020/solr6.6.0 r>
> /etc/hadoop/conf.cloudera.hdfs1 r>
> 
>
> Thanks for reviewing!
>
> -Joe
>
>
>
> On 11/22/2017 8:20 AM, Kevin Risden wrote:
>
>> Joe,
>>
>> I have a few questions about your Solr and HDFS setup that could help
>> improve the recovery performance.
>>
>> * Is HDFS part of a distribution from Hortonworks, Cloudera, etc?
>> * Is Solr colocated with HDFS data nodes?
>> * What is the output of "ps aux | grep solr"? (specifically looking for
>> the
>> Java arguments that are being set.)
>>
>> Depending on how Solr on HDFS was setup, there are some potentially simple
>> settings that can help significantly improve performance.
>>
>> 1) Short circuit reads
>>
>> If Solr is colocated with an HDFS datanode, short circuit reads can
>> improve
>> read performance since it skips a network hop if the data is local to that
>> node. This requires HDFS native libraries to be added to Solr.
>>
>> 2) HDFS block cache in Solr
>>
>> Solr without HDFS uses the OS page cache to handle caching data for
>> queries. With HDFS, Solr has a special HDFS block cache which allows for
>> caching HDFS blocks. This significantly helps query performance. There are
>> a few configuration parameters that can help here.
>>
>> Kevin Risden
>>
>> On Wed, Nov 22, 2017 at 4:20 AM, Hendrik Haddorp > >
>> wrote:
>>
>> Hi Joe,
>>>
>>> sorry, I have not seen that problem. I would normally not delete a
>>> replica
>>> if the shard is down but only if there is an active shard. Without an
>>> active leader the replica should not be able to recover. I also just had
>>> a
>>> case where all replicas of a shard stayed in down state and restarts
>>> didn't
>>> help. This was however also caused by lock files. Once I cleaned them up
>>> and restarted all Solr instances that had a replica they recovered.
>>>
>&g
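A small sketch of the lock-file check Hendrik describes, using the solr.hdfs.home root from the directoryFactory config above; the collection and core path components are guesses and need to be adjusted to whatever `hdfs dfs -ls` actually shows:

```bash
# Which index directory does the core really use?
hdfs dfs -cat /solr6.6.0/UNCLASS/core_node1/data/index.properties

# Any stale write locks left behind in the index directories?
hdfs dfs -ls /solr6.6.0/UNCLASS/core_node1/data/index*/write.lock
```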

Re: Solr 6.3.0 SQL question

2016-11-28 Thread Kevin Risden
Is there a longer error/stack trace in your Solr server logs? I wonder if
the real error is being masked.

Kevin Risden

On Mon, Nov 28, 2016 at 3:24 PM, Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> I'm running this query:
>
> curl --data-urlencode 'stmt=SELECT avg(TextSize) from UNCLASS'
> http://cordelia:9100/solr/UNCLASS/sql?aggregationMode=map_reduce
>
> The error that I get back is:
>
> {"result-set":{"docs":[
> {"EXCEPTION":"org.apache.solr.common.SolrException: Collection not found:
> unclass","EOF":true,"RESPONSE_TIME":2}]}}
>
> TextSize is defined as:
>  indexed="true" stored="true"/>
>
> This query works fine:
> curl --data-urlencode 'stmt=SELECT TextSize from UNCLASS'
> http://cordelia:9100/solr/UNCLASS/sql?aggregationMode=map_reduce
>
> Any idea what I'm doing wrong?
> Thank you!
>
> -Joe
>
>


Re: Documentation of Zookeeper's specific roles and functions in Solr Cloud?

2016-11-29 Thread Kevin Risden
If using CloudSolrClient or another zookeeper aware client, then a request
gets sent to Zookeeper to determine the live nodes. If indexing,
CloudSolrClient can find the leader and send documents directly there. The
client then uses that information to query the correct nodes directly.

Zookeeper is not forwarding requests to Solr. The client requests from
Zookeeper and then the client uses that information to query Solr directly.

Kevin Risden

On Tue, Nov 29, 2016 at 10:49 AM, John Bickerstaff  wrote:

> All,
>
> I've thought I understood that Solr search requests are made to the Solr
> servers and NOT Zookeeper directly.  (I.E. Zookeeper doesn't decide which
> Solr server responds to requests and requests are made directly to Solr)
>
> My new place tells me they're sending requests to Zookeeper - and those are
> getting sent on to Solr by Zookeeper - -- this is news to me if it's
> true...
>
> Is there any documentation of exactly the role(s) played by Zookeeper in a
> SolrCloud setup?
>
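A quick way to see the piece of this that actually lives in ZooKeeper, and to confirm that queries themselves do not pass through it, is to list the registered live nodes with ZooKeeper's own CLI. The connection string and the /solr chroot below are assumptions:

```bash
# The Solr nodes a ZooKeeper-aware client discovers here and then queries directly
zkCli.sh -server zk1.example.com:2181 ls /solr/live_nodes
```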


Re: Highlighting, offsets -- external doc store

2016-11-29 Thread Kevin Risden
For #3 specifically, I've always found this page useful:

https://cwiki.apache.org/confluence/display/solr/Field+Properties+by+Use+Case

It lists out what properties are necessary on each field based on a use
case.

Kevin Risden

On Tue, Nov 29, 2016 at 11:49 AM, Erick Erickson 
wrote:

> (1) None that I have readily at hand. And to make it
> worse, there's the UnifiedHighlighter coming out soon
>
> I don't think there's a good way for (2).
>
> for (3) at least yes. The reason is simple. For analyzed text,
> the only thing in the index is what's made it through the
> analysis chains. So stopwords are missing. Stemming
> has been done. You could even have put a phonetic filter
> in there and have terms like ARDT KNTR which would
> be...er...not very useful to show the end user so the original
> text must be available.
>
>
>
>
> Not much help...
> Erick
>
> On Tue, Nov 29, 2016 at 8:43 AM, John Bickerstaff
>  wrote:
> > All,
> >
> > One of the questions I've been asked to answer / prove out is around the
> > question of highlighting query matches in responses.
> >
> > BTW - One assumption I'm making is that highlighting is basically a
> > function of storing offsets for terms / tokens at index time.  If that's
> > not right, I'd be grateful for pointers in the right direction.
> >
> > My underlying need is to get highlighting on search term matches for
> > returned documents.  I need to choose between doing this in Solr and
> using
> > an external document store, so I'm interested in whether Solr can provide
> > the doc store with the information necessary to identify which section(s)
> > of the doc to highlight in a query response...
> >
> > A few questions:
> >
> > 1. This page doesn't say a lot about how things work - is there somewhere
> > with more information on dealing with offsets and highlighting? On
> offsets
> > and how they're handled?
> > https://cwiki.apache.org/confluence/display/solr/Highlighting
> >
> > 2. Can I return offset information with a query response or is that
> > internal only?  If yes, can I return offset info if I have NOT stored the
> > data in Solr but indexed only?
> >
> > (Explanation: Currently my project is considering indexing only and
> storing
> > the entire text elsewhere -- using Solr to return only doc ID's for
> > searches.  If Solr could also return offsets, these could be used in
> > processing the text stored elsewhere to provide highlighting)
> >
> > 3. Do I assume correctly that in order for Solr highlighting to work
> > correctly, the text MUST also be stored in Solr (I.E. not indexed only,
> but
> > stored=true)
> >
> > Many thanks...
>


Re: Highlighting, offsets -- external doc store

2016-11-29 Thread Kevin Risden
For #2 you might be able to get away with the following:

https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component

The Term Vector component can return offsets and positions. Not sure how
useful they would be to you, but it is at least a starting point. I'm assuming
this requires only termVectors and termPositions and won't require stored
to be true.

Kevin Risden

On Tue, Nov 29, 2016 at 12:00 PM, Kevin Risden 
wrote:

> For #3 specifically, I've always found this page useful:
>
> https://cwiki.apache.org/confluence/display/solr/Field+
> Properties+by+Use+Case
>
> It lists out what properties are necessary on each field based on a use
> case.
>
> Kevin Risden
>
> On Tue, Nov 29, 2016 at 11:49 AM, Erick Erickson 
> wrote:
>
>> (1) None that I have readily at hand. And to make it
>> worse, there's the UnifiedHighlighter coming out soon
>>
>> I don't think there's a good way for (2).
>>
>> for (3) at least yes. The reason is simple. For analyzed text,
>> the only thing in the index is what's made it through the
>> analysis chains. So stopwords are missing. Stemming
>> has been done. You could even have put a phonetic filter
>> in there and have terms like ARDT KNTR which would
>> be...er...not very useful to show the end user so the original
>> text must be available.
>>
>>
>>
>>
>> Not much help...
>> Erick
>>
>> On Tue, Nov 29, 2016 at 8:43 AM, John Bickerstaff
>>  wrote:
>> > All,
>> >
>> > One of the questions I've been asked to answer / prove out is around the
>> > question of highlighting query matches in responses.
>> >
>> > BTW - One assumption I'm making is that highlighting is basically a
>> > function of storing offsets for terms / tokens at index time.  If that's
>> > not right, I'd be grateful for pointers in the right direction.
>> >
>> > My underlying need is to get highlighting on search term matches for
>> > returned documents.  I need to choose between doing this in Solr and
>> using
>> > an external document store, so I'm interested in whether Solr can
>> provide
>> > the doc store with the information necessary to identify which
>> section(s)
>> > of the doc to highlight in a query response...
>> >
>> > A few questions:
>> >
>> > 1. This page doesn't say a lot about how things work - is there
>> somewhere
>> > with more information on dealing with offsets and highlighting? On
>> offsets
>> > and how they're handled?
>> > https://cwiki.apache.org/confluence/display/solr/Highlighting
>> >
>> > 2. Can I return offset information with a query response or is that
>> > internal only?  If yes, can I return offset info if I have NOT stored
>> the
>> > data in Solr but indexed only?
>> >
>> > (Explanation: Currently my project is considering indexing only and
>> storing
>> > the entire text elsewhere -- using Solr to return only doc ID's for
>> > searches.  If Solr could also return offsets, these could be used in
>> > processing the text stored elsewhere to provide highlighting)
>> >
>> > 3. Do I assume correctly that in order for Solr highlighting to work
>> > correctly, the text MUST also be stored in Solr (I.E. not indexed only,
>> but
>> > stored=true)
>> >
>> > Many thanks...
>>
>
>
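A sketch of what a Term Vector Component request could look like, assuming the /tvrh handler from the sample configs is enabled and the field was indexed with termVectors, termPositions, and termOffsets; the collection, document id, and field name are placeholders:

```bash
# Term vectors with positions and character offsets for the "content" field of one document
curl "http://localhost:8983/solr/mycollection/tvrh?q=id:doc1&tv=true&tv.fl=content&tv.positions=true&tv.offsets=true&wt=json"
```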


Re: Adding DocExpirationUpdateProcessorFactory causes "Overlapping onDeckSearchers" warnings

2016-12-09 Thread Kevin Risden
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/update/processor/DocExpirationUpdateProcessorFactory.java#L407

Based on that it looks like this would definitely trigger additional
commits. Specifically with openSearcher being true.

Not sure the best way around this.

Kevin Risden

On Fri, Dec 9, 2016 at 5:15 PM, Brent  wrote:

> I'm using Solr Cloud 6.1.0, and my client application is using SolrJ 6.1.0.
>
> Using this Solr config, I get none of the dreaded "PERFORMANCE WARNING:
> Overlapping onDeckSearchers=2" log messages:
> https://dl.dropboxusercontent.com/u/49733981/solrconfig-no_warnings.xml
>
> However, I start getting them frequently after I add an expiration update
> processor to the update request processor chain, as seen in this config (at
> the bottom):
> https://dl.dropboxusercontent.com/u/49733981/solrconfig-warnings.xml
>
> Do I have something configured wrong in the way I've tried to add the
> function of expiring documents? My client application sets the "expire_at"
> field with the date to remove the document being added, so I don't need
> anything on the Solr Cloud side to calculate the expiration date using a
> TTL. I've confirmed that the documents are getting removed as expected
> after
> the TTL duration.
>
> Is it possible that the expiration processor is triggering additional
> commits? Seems like the warning is usually the result of commits happening
> too frequently. If the commit spacing is fine without the expiration
> processor, but not okay when I add it, it seems like maybe each update is
> now triggering a (soft?) commit. Although, that'd actually be crazy and I'm
> sure I'd see a lot more errors if that were the case... is it triggering a
> commit every 30 seconds, because that's what I have the
> autoDeletePeriodSeconds set to? Maybe if I try to offset that a bit from
> the
> 10 second auto soft commit I'm using? Seems like it'd be better (if that is
> the case) if the processor simply didn't have to do a commit when it
> expires
> documents, and instead let the auto commit settings handle that.
>
> Do I still need the line:
>  name="/update">
> when I have the
>  default="true">
> element?
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Adding-
> DocExpirationUpdateProcessorFactory-causes-Overlapping-
> onDeckSearchers-warnings-tp4309155.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: 6.4 in a 6.2.1 cluster?

2017-01-31 Thread Kevin Risden
Just my two cents: I wouldn't trust that it completely works to be honest.
It works for the very small test case that was put together (select q=*:*).
I would love to add more tests to it. If you have ideas of things that
should be tested, it would be great if you commented on the JIRA
(ideally covering everything, but prioritizing some examples would be nice).

Kevin Risden

On Tue, Jan 31, 2017 at 11:19 AM, Walter Underwood 
wrote:

> I’m sure people need to do this, so I’ll share that it worked for me.
>
> I just noticed that there is a new integration test being written to
> verify that this works. Great!
>
> https://issues.apache.org/jira/browse/SOLR-8581 <
> https://issues.apache.org/jira/browse/SOLR-8581>
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Jan 25, 2017, at 11:18 AM, Walter Underwood 
> wrote:
> >
> > Has anybody done this? Not for long term use of course, but does it work
> well enough
> > for a rolling upgrade?
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org <mailto:wun...@wunderwood.org>
> > http://observer.wunderwood.org/  (my blog)
> >
> >
>
>


Re: How long for autoAddReplica?

2017-02-02 Thread Kevin Risden
>
> so migrating by replacing nodes is going to be a bother.


Not sure what you mean by migrating and replacing nodes, but these two new
actions on the Collections API as of Solr 6.2 may be of use:

   -
   
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-REPLACENODE:MoveAllReplicasinaNodetoAnother
   -
   
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-DELETENODE:DeleteReplicasinaNode



Kevin Risden

On Thu, Feb 2, 2017 at 11:46 AM, Erick Erickson 
wrote:

> bq: I don’t see a way to add replicas through the UI, so migrating by
> replacing nodes is going to be a bother
>
> There's a lot of improvements in the admin UI for SolrCloud that I'd
> love to see. Drag/drop replicas would be really cool for instance.
>
> At present though using
> ADDREPLICA/wait-for-new-replica-to-be-active/DELETEREPLICA through the
> collections API is what's available.
>
> Best,
> Erick
>
> On Thu, Feb 2, 2017 at 8:37 AM, Walter Underwood 
> wrote:
> > Oh, missed that limitation.
> >
> > Seems like something that would be very handy in all installations. I
> don’t see a way to add replicas through the UI, so migrating by replacing
> nodes is going to be a bother.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> >> On Feb 2, 2017, at 12:25 AM, Hendrik Haddorp 
> wrote:
> >>
> >> Hi,
> >>
> >> are you using HDFS? According to the documentation the feature should
> be only available if you are using HDFS. For me it did however also fail on
> that. See the thread "Solr on HDFS: AutoAddReplica does not add a replica"
> from about two weeks ago.
> >>
> >> regards,
> >> Hendrik
> >>
> >> On 02.02.2017 07:21, Walter Underwood wrote:
> >>> I added a new node an shut down a node with a shard replica on it. It
> has been an hour and I don’t see any activity toward making a new replica.
> >>>
> >>> The new node and the one I shut down are both 6.4. The rest of the
> 16-node cluster is 6.2.1.
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wun...@wunderwood.org
> >>> http://observer.wunderwood.org/  (my blog)
> >>>
> >>>
> >>>
> >>
> >
>
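Hedged sketches of the Collections API calls mentioned above, using the 6.2-era parameter names (they were renamed in later releases, so check the linked reference pages); host, collection, shard, node, and replica names are placeholders:

```bash
# Move every replica off one node onto another (REPLACENODE, Solr 6.2+)
curl "http://localhost:8983/solr/admin/collections?action=REPLACENODE&source=oldhost:8983_solr&target=newhost:8983_solr"

# Or by hand: add a replica on the new node, wait for it to go active, then delete the old one
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=newhost:8983_solr"
curl "http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycoll&shard=shard1&replica=core_node3"
```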


Re: bin/post and self-signed SSL

2017-02-05 Thread Kevin Risden
Last time I looked at this, there was no way to pass any Java properties to
the bin/post command. This made it impossible to even set the SSL
properties manually. I checked master just now and still there is no place
to enter Java properties that would make it to the Java command.

I came up with a chart of commands previously that worked with standard (no
SSL or Kerberos), SSL only, and SSL with Kerberos. Only the standard solr
setup worked for the bin/solr and bin/post commands. Errors popped up that
I couldn't work around. I've been meaning to get back to it just haven't
had a chance.

I'll try to share that info when I get back to my laptop.

Kevin Risden

On Feb 5, 2017 12:31, "Jan Høydahl"  wrote:

> Hi,
>
> I’m trying to post a document to Solr using bin/post after enabling SSL
> with self signed certificate. Result is:
>
> $ post -url https://localhost:8983/solr/sslColl *.html
> /usr/lib/jvm/java-8-openjdk-amd64/bin/java -classpath
> /opt/solr/dist/solr-core-6.4.0.jar -Dauto=yes -Durl=
> https://localhost:8983/solr/sslColl -Dc= -Ddata=files
> org.apache.solr.util.SimplePostTool lab-index.html lab-ops1.html
> lab-ops2.html lab-ops3.html lab-ops4.html lab-ops6.html lab-ops8.html
> SimplePostTool version 5.0.0
> Posting files to [base] url https://localhost:8983/solr/sslColl...
> Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,
> docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> POSTing file lab-index.html (text/html) to [base]/extract
> SimplePostTool: FATAL: Connection error (is Solr running at
> https://localhost:8983/solr/sslColl ?): javax.net.ssl.SSLHandshakeException:
> sun.security.validator.ValidatorException: PKIX path building failed:
> sun.security.provider.certpath.SunCertPathBuilderException: unable to
> find valid certification path to requested target
>
>
> Do anyone know a workaround for letting bin/post accept self-signed cert?
> Have not tested it against a CA signed Solr...
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
>
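One workaround sketch for the self-signed certificate, given that bin/post offers no way to pass extra Java properties: import the certificate into a trust store and run SimplePostTool directly with the standard JSSE properties. The certificate file, trust store path, and password below are assumptions:

```bash
# One-time step: trust the self-signed certificate
keytool -importcert -alias solr-ssl -file solr-ssl.pem \
        -keystore /tmp/solr-truststore.jks -storepass secret -noprompt

# Invoke SimplePostTool by hand so the trust store properties actually reach the JVM
java -Djavax.net.ssl.trustStore=/tmp/solr-truststore.jks \
     -Djavax.net.ssl.trustStorePassword=secret \
     -Dauto=yes -Durl=https://localhost:8983/solr/sslColl -Ddata=files \
     -classpath /opt/solr/dist/solr-core-6.4.0.jar \
     org.apache.solr.util.SimplePostTool *.html
```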


Re: bin/post and self-signed SSL

2017-02-05 Thread Kevin Risden
Originally formatted as Markdown. This was tested against Solr 5.5.x
packaged as Lucidworks HDP Search; it should behave the same as stock Solr 5.5.x.

# Using Solr
*
https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference
* https://cwiki.apache.org/confluence/display/solr/Running+Solr
* https://cwiki.apache.org/confluence/display/solr/Collections+API

## Create collection (w/o Kerberos)
```bash
/opt/lucidworks-hdpsearch/solr/bin/solr create -c test
```

## Upload configuration directory (w/ SSL and Kerberos)
```bash
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh
-zkhost ZK_CONNECTION_STRING -cmd upconfig -confname basic_config -confdir
/opt/lucidworks-hdpsearch/solr/server/solr/configsets/basic_configs/conf
```

## Create Collection (w/ SSL and Kerberos)
```bash
curl -k --negotiate -u : "
https://SOLR_HOST:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=1&replicationFactor=1&collection.configName=basic_config
"
```

## Delete collection (w/o Kerberos)
```bash
/opt/lucidworks-hdpsearch/solr/bin/solr delete -c test
```

## Delete Collection (w/ SSL and Kerberos)
```bash
curl -k --negotiate -u : "
https://SOLR_HOST:8983/solr/admin/collections?action=DELETE&name=newCollection
"
```

## Adding some test docs (w/o SSL)
```bash
/opt/lucidworks-hdpsearch/solr/bin/post -c test
/opt/lucidworks-hdpsearch/solr/example/exampledocs/*.xml
```

## Adding documents (w/ SSL and Kerberos)
```bash
curl -k --negotiate -u : "
https://SOLR_HOST:8983/solr/newCollection/update?commit=true"; -H
"Content-Type: application/json" --data-binary
@/opt/lucidworks-hdpsearch/solr/example/exampledocs/books.json
```

## List Collections (w/ SSL and Kerberos)
```bash
curl -k --negotiate -u : "
https://SOLR_HOST:8983/solr/admin/collections?action=LIST";
```

Kevin Risden

On Sun, Feb 5, 2017 at 5:55 PM, Kevin Risden 
wrote:

> Last time I looked at this, there was no way to pass any Java properties
> to the bin/post command. This made it impossible to even set the SSL
> properties manually. I checked master just now and still there is no place
> to enter Java properties that would make it to the Java command.
>
> I came up with a chart of commands previously that worked with standard
> (no SSL or Kerberos), SSL only, and SSL with Kerberos. Only the standard
> solr setup worked for the bin/solr and bin/post commands. Errors popped up
> that I couldn't work around. I've been meaning to get back to it just
> haven't had a chance.
>
> I'll try to share that info when I get back to my laptop.
>
> Kevin Risden
>
> On Feb 5, 2017 12:31, "Jan Høydahl"  wrote:
>
>> Hi,
>>
>> I’m trying to post a document to Solr using bin/post after enabling SSL
>> with self signed certificate. Result is:
>>
>> $ post -url https://localhost:8983/solr/sslColl *.html
>> /usr/lib/jvm/java-8-openjdk-amd64/bin/java -classpath
>> /opt/solr/dist/solr-core-6.4.0.jar -Dauto=yes -Durl=
>> https://localhost:8983/solr/sslColl -Dc= -Ddata=files
>> org.apache.solr.util.SimplePostTool lab-index.html lab-ops1.html
>> lab-ops2.html lab-ops3.html lab-ops4.html lab-ops6.html lab-ops8.html
>> SimplePostTool version 5.0.0
>> Posting files to [base] url https://localhost:8983/solr/sslColl...
>> Entering auto mode. File endings considered are
>> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,
>> ods,ott,otp,ots,rtf,htm,html,txt,log
>> POSTing file lab-index.html (text/html) to [base]/extract
>> SimplePostTool: FATAL: Connection error (is Solr running at
>> https://localhost:8983/solr/sslColl ?): javax.net.ssl.SSLHandshakeException:
>> sun.security.validator.ValidatorException: PKIX path building failed:
>> sun.security.provider.certpath.SunCertPathBuilderException: unable to
>> find valid certification path to requested target
>>
>>
> >> Does anyone know a workaround for letting bin/post accept a self-signed
> >> cert? I have not tested it against a CA-signed Solr...
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>>


Re: Re: bin/post and self-signed SSL

2017-02-06 Thread Kevin Risden
I expect the commands to work the same, or very nearly so, from 5.5.x through
6.4.x. There has been some cleanup of the bin/solr and bin/post commands, but
not many security changes. If you find otherwise, please let us know.
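
In the meantime, one possible workaround (an untested sketch, not something
this thread has verified): bin/post is just a wrapper around SimplePostTool,
so you can run the java command it prints yourself and add the standard JSSE
trust-store properties. The trust-store path and password below are
placeholders:

```bash
# Untested sketch: invoke SimplePostTool directly (mirroring the command line
# bin/post prints in Jan's output) and point the JVM at a trust store that
# contains the self-signed certificate. Path and password are placeholders.
java -Djavax.net.ssl.trustStore=etc/truststore.jks \
     -Djavax.net.ssl.trustStorePassword=secret \
     -Dauto=yes -Durl=https://localhost:8983/solr/sslColl -Ddata=files \
     -classpath /opt/solr/dist/solr-core-6.4.0.jar \
     org.apache.solr.util.SimplePostTool *.html
```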

Kevin Risden

On Feb 5, 2017 21:02, "alias" <524839...@qq.com> wrote:

> Do you mean this can only be used with version 5.5.x? Are other versions
> invalid?
>
>
>
>
> -- Original Message --
> From: "Kevin Risden";;
> Sent: Monday, February 6, 2017, 9:44 AM
> To: "solr-user";
>
> Subject: Re: bin/post and self-signed SSL
>
>
>
> Originally formatted as MarkDown. This was tested against Solr 5.5.x
> packaged as Lucidworks HDP Search. It would be the same as Solr 5.5.x.
>


Re: SSL using signed client certificate not working

2017-02-15 Thread Kevin Risden
It sounds like Edge, Firefox, and Chrome aren't set up on your computer to do
client authentication. You can set SOLR_SSL_NEED_CLIENT_AUTH=false and
SOLR_SSL_WANT_CLIENT_AUTH=true in solr.in.sh. This will allow browsers that
don't present a client certificate to connect. Otherwise you need to
configure your browsers to present the certificate.

Client authentication is an extra part of SSL and not usually required.
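
Concretely, something like this in solr.in.sh (a sketch based on the settings
you posted; everything else stays as it is):

```bash
# Request a client certificate but do not require one, so browsers that
# don't present a certificate can still connect.
SOLR_SSL_NEED_CLIENT_AUTH=false
SOLR_SSL_WANT_CLIENT_AUTH=true
```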

Kevin Risden

On Feb 15, 2017 4:43 AM, "Espen Rise Halstensen"  wrote:

>
> Hi,
>
> I have some problems with client certificates. By the look of it, it works
> with curl, and Safari prompts for and accepts my certificate. It does not
> work with Edge, Firefox or Chrome. The certificates are requested from our
> CA.
>
> When requesting https://s02/solr in the browser, it doesn't
> prompt for certificate and I get the following error message in Chrome:
> >This site can't provide a secure connection
> >s02 didn't accept your login certificate, or one may not have been
> provided.
> >Try contacting the system admin.
>
> When debugging with Wireshark I can see the s01t9 certificate in the
> "certificate request" part of the handshake, but the browser answers
> without a certificate.
>
>
> Setup as follows:
>
> solr.in.sh:
> SOLR_SSL_KEY_STORE=etc/keystore.jks
> SOLR_SSL_KEY_STORE_PASSWORD=secret
> SOLR_SSL_TRUST_STORE=etc/truststore.jks
> SOLR_SSL_TRUST_STORE_PASSWORD=secret
> SOLR_SSL_NEED_CLIENT_AUTH=true
> SOLR_SSL_WANT_CLIENT_AUTH=false
>
> Content of truststore.jks:
> [solruser@s02 etc]# keytool -list -keystore 
> /opt/solr-6.4.0/server/etc/truststore.jks
> -storepass secret
>
> Keystore type: JKS
> Keystore provider: SUN
>
> Your keystore contains 1 entry
>
> s01t9, 15.feb.2017, trustedCertEntry,
> Certificate fingerprint (SHA1): CF:BD:02:71:64:F0:BA:65:71:10:
> A1:23:42:34:E0:3C:37:75:E1:BF
>
>
>
> Curl (returns the HTML of the admin page with the -L option):
>
> curl -v -E  s01t9.pem:secret --cacert  rootca.pem 'https://vs02/solr'
> * Hostname was NOT found in DNS cache
> *   Trying 10.0.121.132...
> * Connected to s02 (10.0.121.132) port 443 (#0)
> * successfully set certificate verify locations:
> *   CAfile: rootca.pem
>   CApath: /etc/ssl/certs
> * SSLv3, TLS handshake, Client hello (1):
> * SSLv3, TLS handshake, Server hello (2):
> * SSLv3, TLS handshake, CERT (11):
> * SSLv3, TLS handshake, Request CERT (13):
> * SSLv3, TLS handshake, Server finished (14):
> * SSLv3, TLS handshake, CERT (11):
> * SSLv3, TLS handshake, Client key exchange (16):
> * SSLv3, TLS handshake, CERT verify (15):
> * SSLv3, TLS change cipher, Client hello (1):
> * SSLv3, TLS handshake, Finished (20):
> * SSLv3, TLS change cipher, Client hello (1):
> * SSLv3, TLS handshake, Finished (20):
> * SSL connection using AES256-SHA256
> * Server certificate:
> *subject: CN=s01t9
> *start date: 2017-01-09 11:31:49 GMT
> *expire date: 2022-01-08 11:31:49 GMT
> *subjectAltName: s02 matched
> *issuer: DC=local; DC=com; CN=Root CA
> *SSL certificate verify ok.
> > GET /solr HTTP/1.1
> > User-Agent: curl/7.35.0
> > Host: s02
> > Accept: */*
> >
> < HTTP/1.1 302 Found
> < Location: https://s02/solr/
> < Content-Length: 0
> <
> * Connection #0 to host s02 left intact
>
> Thanks,
> Espen
>


JSON Facet API - Range Query - Missing field parameter NPE

2017-02-24 Thread Kevin Risden
One of my colleagues ran into this while testing the JSON Facet API. A
malformed JSON Facet API range query hits an NPE, which then devolves into a
"no live servers to handle the request" error. It looks like
FacetRangeProcessor should validate its inputs before calling getField. Does
this seem reasonable?

The problematic query:

json.facet={price:{type:range,start:0,end:600,gap:50}}

The fixed query:

json.facet={prices:{field:price,type:range,start:0,end:600,gap:50}}
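
As a full request, the corrected facet looks roughly like this (a sketch
against the gettingstarted collection from the log below; curl handles the
URL encoding):

```bash
# Sketch: send the corrected range facet as a POSTed form, letting curl
# URL-encode the JSON facet parameter.
curl "http://localhost:8983/solr/gettingstarted/select" \
  --data-urlencode "q=*:*" \
  --data-urlencode "json.facet={prices:{field:price,type:range,start:0,end:600,gap:50}}"
```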

The stack trace:

INFO  - 2017-02-24 20:54:52.217; [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica1] org.apache.solr.core.SolrCore; [gettingstarted_shard1_replica1]  webapp=/solr path=/select params={df=_text_&distrib=false&_facet_={}&fl=id&fl=score&shards.purpose=1048580&start=0&fsv=true&shard.url=http://localhost:8983/solr/gettingstarted_shard1_replica1/|http://localhost:7574/solr/gettingstarted_shard1_replica2/&rows=10&version=2&q=*:*&json.facet={price:{type:range,start:0,end:600,gap:50}}&NOW=1487969692214&isShard=true&wt=javabin} hits=2328 status=500 QTime=1
ERROR - 2017-02-24 20:54:52.218; [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica1] org.apache.solr.common.SolrException; null:java.lang.NullPointerException
at org.apache.solr.schema.IndexSchema$DynamicReplacement$DynamicPattern$NameEndsWith.matches(IndexSchema.java:1043)
at org.apache.solr.schema.IndexSchema$DynamicReplacement.matches(IndexSchema.java:1057)
at org.apache.solr.schema.IndexSchema.getFieldOrNull(IndexSchema.java:1213)
at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1230)
at org.apache.solr.search.facet.FacetRangeProcessor.process(FacetRange.java:96)
at org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:439)
at org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:396)
at org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:60)
at org.apache.solr.search.facet.FacetModule.process(FacetModule.java:96)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)

Kevin Risden


Re: Add fieldType from Solr API

2017-02-26 Thread Kevin Risden
As Alex said there is no Admin UI support. The API is called the Schema API:

https://cwiki.apache.org/confluence/display/solr/Schema+API

That allows you to modify the schema programmatically. You will have to
reload the collection either way.
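
For example, adding a field type looks roughly like this (a sketch; the
collection name and the field type definition are placeholders, not from this
thread):

```bash
# Sketch: add a field type through the Schema API; reload the collection
# afterwards as noted above.
curl -X POST -H "Content-Type: application/json" \
  "http://localhost:8983/solr/mycollection/schema" \
  --data-binary '{
    "add-field-type": {
      "name": "text_keyword_lower",
      "class": "solr.TextField",
      "analyzer": {
        "tokenizer": { "class": "solr.KeywordTokenizerFactory" },
        "filters": [ { "class": "solr.LowerCaseFilterFactory" } ]
      }
    }
  }'
```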

Kevin Risden

On Sun, Feb 26, 2017 at 1:33 PM, Alexandre Rafalovitch 
wrote:

> You can hand edit it, just make sure to reload the collection after.
>
> Otherwise, I believe, there is an API.
>
> Not the Admin UI yet, unfortunately.
>
> Regards,
> Alex
>
> On 26 Feb 2017 1:50 PM, "OTH"  wrote:
>
> Hello,
>
> I am new to Solr, and am using Solr v. 6.4.1.
>
> I need to add a new "fieldType" to my schema.  My version of Solr is using
> the "managed-schema" XML file, which I gather one is not supposed to modify
> directly.  Is it possible to add a new fieldType using the Solr Admin via
> the browser?  The "schema" page doesn't seem to provide this option, at
> least from what I can tell.
>
> Thanks
>


Re: Searchable archive of this mailing list

2017-03-31 Thread Kevin Risden
Google usually does a pretty good job of indexing this mailing list.

The other place I'll usually go is here:
http://search-lucene.com/?project=Solr

Kevin Risden

On Fri, Mar 31, 2017 at 4:18 PM, OTH  wrote:

> Hi all,
>
> Is there a searchable archive of this mailing list?
>
> I'm asking just so I don't have to post a question in the future which may
> have been answered before already.
>
> Thanks
>

