indexing data to solrcloud with "implicit" is not distributing across cluster.

2015-10-06 Thread Steve
I’ve been unable to get SolrCloud to distribute data across 4 Solr nodes
with the “router.name=implicit” feature of the Collections API.

The nodes are live, and the graphs are green.  All the data (the “Films”
example data) shows up on one node, the node that received the CREATE
command.





My CREATE command is:

curl
http://host-192-168-0-60.openstacklocal:8081/solr/admin/collections?action=CREATE&name=CollectionFilms&replicationFactor=2&router.name=implicit&shards=shard-1,shard-2,shard-3,shard-4&maxShardsPerNode=2&collection.configName=configAlpha
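
(When running this from a shell, the URL should be wrapped in quotes; otherwise the
shell treats each "&" as a background operator and Solr only sees the first
parameter. The same call, quoted, with everything else unchanged:

curl "http://host-192-168-0-60.openstacklocal:8081/solr/admin/collections?action=CREATE&name=CollectionFilms&replicationFactor=2&router.name=implicit&shards=shard-1,shard-2,shard-3,shard-4&maxShardsPerNode=2&collection.configName=configAlpha")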



solr version 5.3.1

zookeeper version 3.4.6

indexing with:

   cd /opt/solr/example/films;

/opt/solr/bin/post -c CollectionFilms -port 8081  films.json





Thanks,

strick


Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

2015-10-06 Thread Steve
Thanks Shawn, that fixed it !

The documentation in the Collections API says "The value can be ...
*implicit*, which uses an internal default hash".
I think most people would assume the "hash" would be used to route the
data.
Meanwhile the description of compositeId in the "Document Routing" section
only discusses how to modify your document IDs, which I did not want to do.
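
For reference, a minimal sketch of the same CREATE call switched to the
compositeId router, which hashes the uniqueKey and needs no changes to the
document IDs (host, port and names are the ones from the original post):

curl "http://host-192-168-0-60.openstacklocal:8081/solr/admin/collections?action=CREATE&name=CollectionFilms&replicationFactor=2&router.name=compositeId&numShards=4&maxShardsPerNode=2&collection.configName=configAlpha"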

thanks again,
.strick



On Tue, Oct 6, 2015 at 8:15 AM, Shawn Heisey  wrote:

> On 10/6/2015 7:58 AM, Steve wrote:
> > I’ve been unable to get solrcloud to distribute data across 4 solr nodes
> > with the “route.name=implicit”  feature of the collections API.
> >
> > The nodes are live, and the graphs are green.  All the data (the “Films”
> > example data) shows up on one node, the node that received the CREATE
> > command.
>
> A better name for the implicit router is "manual."  The implicit router
> doesn't actually route.  It assumes that you know what you are doing and
> have sent the request to the shard where you want it to be indexed.
>
> You want the compositeId router.
>
> Even though the name "implicit" makes sense in the context of Solr
> *code*, it is a confusing name when it comes to user expectations.
> You're not the first one to be confused by this, which is why I opened
> this issue:
>
> https://issues.apache.org/jira/browse/SOLR-6630
>
> Thanks,
> Shawn
>
>


No live SolrServers available to handle this request

2015-10-08 Thread Steve
I've loaded the Films data into a 4 node cluster.  Indexing went well, but
when I issue a query, I get this:

"error": {
"msg": "org.apache.solr.client.solrj.SolrServerException: No live
SolrServers available to handle this request:
[
http://host-192-168-0-63.openstacklocal:8081/solr/CollectionFilms_shard1_replica2
,

http://host-192-168-0-62.openstacklocal:8081/solr/CollectionFilms_shard2_replica2
,

http://host-192-168-0-60.openstacklocal:8081/solr/CollectionFilms_shard2_replica1
]",
...

and further down in the stacktrace:

Server Error
Caused by:
java.lang.NoSuchMethodError:
org.apache.lucene.index.TermsEnum.postings(Lorg/apache/lucene/index/PostingsEnum;I)Lorg/apache/lucene/index/PostingsEnum;\n\tat
org.apache.solr.search.SolrIndexSearcher.getFirstMatch(SolrIndexSearcher.java:802)\n\tat
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:333)\n\tat
...


I'm using:

solr version 5.3.1

lucene 5.2.1

zookeeper version 3.4.6

indexing with:

   cd /opt/solr/example/films;

/opt/solr/bin/post -c CollectionFilms -port 8081  films.json



thx,
.strick


Re: No live SolrServers available to handle this request

2015-10-12 Thread Steve
Thanks Mark,

I rebuilt and made sure the versions matched.  It works.
Not sure how that happened tho..
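
For anyone hitting the same mismatch, a running node reports both versions in
one place (host and port are assumptions):

   curl "http://localhost:8081/solr/admin/info/system?wt=json"

The response includes solr-spec-version and lucene-spec-version, which should
be identical in a stock install.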

thx.
.strick

On Thu, Oct 8, 2015 at 4:31 PM, Mark Miller  wrote:

> Your Lucene and Solr versions must match.
>
> On Thu, Oct 8, 2015 at 4:02 PM Steve  wrote:
>
> > I've loaded the Films data into a 4 node cluster.  Indexing went well,
> but
> > when I issue a query, I get this:
> >
> > "error": {
> > "msg": "org.apache.solr.client.solrj.SolrServerException: No live
> > SolrServers available to handle this request:
> > [
> >
> >
> http://host-192-168-0-63.openstacklocal:8081/solr/CollectionFilms_shard1_replica2
> > ,
> >
> >
> >
> http://host-192-168-0-62.openstacklocal:8081/solr/CollectionFilms_shard2_replica2
> > ,
> >
> >
> >
> http://host-192-168-0-60.openstacklocal:8081/solr/CollectionFilms_shard2_replica1
> > ]",
> > ...
> >
> > and further down in the stacktrace:
> >
> > Server Error
> > Caused by:
> > java.lang.NoSuchMethodError:
> >
> >
> org.apache.lucene.index.TermsEnum.postings(Lorg/apache/lucene/index/PostingsEnum;I)Lorg/apache/lucene/index/PostingsEnum;\n\tat
> >
> >
> org.apache.solr.search.SolrIndexSearcher.getFirstMatch(SolrIndexSearcher.java:802)\n\tat
> >
> >
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:333)\n\tat
> > ...
> >
> >
> > I'm using:
> >
> > solr version 5.3.1
> >
> > lucene 5.2.1
> >
> > zookeeper version 3.4.6
> >
> > indexing with:
> >
> >cd /opt/solr/example/films;
> >
> > /opt/solr/bin/post -c CollectionFilms -port 8081  films.json
> >
> >
> >
> > thx,
> > .strick
> >
> --
> - Mark
> about.me/markrmiller
>


RE: Verifying solr installation

2015-02-15 Thread steve
+1

> Date: Mon, 16 Feb 2015 11:53:47 +0530
> Subject: Verifying solr installation
> From: karimkhan...@gmail.com
> To: solr-user@lucene.apache.org
> 
> Is there any linux command to verify and see whether solr installed or not
> and if yes, then which version of solr?
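
A couple of commands that usually answer both questions, assuming a standard
/opt/solr install (path, host and port are assumptions):

   /opt/solr/bin/solr version
   curl "http://localhost:8983/solr/admin/info/system?wt=json"

The first prints the version of the installed scripts; the second asks a
running instance, and also reports the Lucene version and JVM details.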
  

RE: Performing DIH on predefined list of IDS

2015-02-21 Thread steve
Careful with the GETs! There is a real, hard limit on the length of a GET url 
(in the low hundreds of characters). That's why a POST is so much better for 
complex queries; the limit is in the hundreds of MegaBytes.
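
For Solr specifically, the select handler accepts the same parameters in a POST
body, so a long list of ids never has to ride in the URL; a minimal sketch
(host, collection and field names are assumptions):

   curl "http://localhost:8983/solr/collection1/select" --data "q=id:(101 OR 102 OR 103)&rows=100&wt=json"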

> Date: Sat, 21 Feb 2015 01:42:03 -0700
> From: osta...@gmail.com
> To: solr-user@lucene.apache.org
> Subject: Re: Performing DIH on predefined list of IDS
> 
> Yes,  you right,  I am not using a DB. 
>  SolrEntityProcessor is using a GET method,  so I will need to send
> relatively big URL ( something like a hundreds of ids ) hope it will be
> possible. 
> 
> Any way I think it is the only method to perform reindex if I want to
> control it and be able to continue from any point in case of failure.  
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Performing-DIH-on-predefined-list-of-IDS-tp4187589p4187835.html
> Sent from the Solr - User mailing list archive at Nabble.com.
  

RE: Performing DIH on predefined list of IDS

2015-02-21 Thread steve
And I'm familiar with the setup and configuration using Python, JavaScript, and 
PHP; not at all with Java.

> Date: Sat, 21 Feb 2015 01:52:07 -0700
> From: osta...@gmail.com
> To: solr-user@lucene.apache.org
> Subject: RE: Performing DIH on predefined list of IDS
> 
> That's right, but I am not sure that if it is works with Get I will able to
> use Post without changing it. 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Performing-DIH-on-predefined-list-of-IDS-tp4187589p4187838.html
> Sent from the Solr - User mailing list archive at Nabble.com.
  

RE: Performing DIH on predefined list of IDS

2015-02-21 Thread steve
Thank you! Another 4xx error that makes sense. Quoting from the Book of StackOverflow
(http://stackoverflow.com/questions/2659952/maximum-length-of-http-get-request):

"Most webservers have a limit of 8192 bytes (8KB), which is usually configureable
somewhere in the server configuration. As to the client side matter, the HTTP
1.1 specification even warns about this, here's an extract of chapter 3.2.1:
Note: Servers ought to be cautious about depending on URI lengths above
255 bytes, because some older client or proxy implementations might not
properly support these lengths. The limit is in MSIE and Safari about 2KB, in
Opera about 4KB and in Firefox about 8KB. We may thus assume that 8KB is the
maximum possible length and that 2KB is a more affordable length to rely on at
the server side and that 255 bytes is the safest length to assume that the
entire URL will come in. If the limit is exceeded in either the browser or the
server, most will just truncate the characters outside the limit without any
warning. Some servers however may send a HTTP 414 error. If you need to send
large data, then better use POST instead of GET. Its limit is much higher, but
more dependent on the server used than the client. Usually up to around 2GB is
allowed by the average webserver. This is also configureable somewhere in the
server settings. The average server will display a server-specific
error/exception when the POST limit is exceeded, usually as HTTP 500 error."
> From: wun...@wunderwood.org
> Subject: Re: Performing DIH on predefined list of IDS
> Date: Sat, 21 Feb 2015 09:50:46 -0800
> To: solr-user@lucene.apache.org
> 
> The HTTP protocol does not set a limit on GET URL size, but individual web 
> servers usually do. You should get a response code of “414 Request-URI Too 
> Long” when the URL is too long.
> 
> This limit is usually configurable.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
> On Feb 21, 2015, at 12:46 AM, steve  wrote:
> 
> > Careful with the GETs! There is a real, hard limit on the length of a GET 
> > url (in the low hundreds of characters). That's why a POST is so much 
> > better for complex queries; the limit is in the hundreds of MegaBytes.
> > 
> >> Date: Sat, 21 Feb 2015 01:42:03 -0700
> >> From: osta...@gmail.com
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Performing DIH on predefined list of IDS
> >> 
> >> Yes,  you right,  I am not using a DB. 
> >> SolrEntityProcessor is using a GET method,  so I will need to send
> >> relatively big URL ( something like a hundreds of ids ) hope it will be
> >> possible. 
> >> 
> >> Any way I think it is the only method to perform reindex if I want to
> >> control it and be able to continue from any point in case of failure.  
> >> 
> >> 
> >> 
> >> --
> >> View this message in context: 
> >> http://lucene.472066.n3.nabble.com/Performing-DIH-on-predefined-list-of-IDS-tp4187589p4187835.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >   
> 
  

RE: Solr synonyms logic

2015-02-21 Thread steve
SEO is a fun search subject!
http://www.academia.edu/1033371/Hyponymy_extraction_and_web_search_behavior_analysis_based_on_query_reformulation
"planeta terra (planet earth), planeta (planet). Conclusion: planet earth is a
hyponym of planet."

> Date: Sat, 21 Feb 2015 08:12:33 -0800
> Subject: Re: Solr synonyms logic
> From: rjo...@gmail.com
> To: solr-user@lucene.apache.org
> 
> What you are describing is hyponymy.  Pastry is the hypernym.  You can
> accomplish this by not using expansion, for example:
> cannelloni => cannelloni, pastry
> 
> This has the result of adding pastry to the index.
> 
> Ryan
> 
> On Saturday, February 21, 2015, Mikhail Khludnev 
> wrote:
> 
> > Hello,
> >
> > usually debugQuery=true output explains a lot of such details.
> >
> > On Sat, Feb 21, 2015 at 10:52 AM, davym >
> > wrote:
> >
> > > Hi all,
> > >
> > > I'm querying a recipe database in Solr. By using synonyms, I'm trying to
> > > make my search a little smarter.
> > >
> > > What I'm trying to do here, is that a search for pastry returns all
> > > lasagne,
> > > penne & cannelloni recipes.
> > > However a search for lasagne should only return lasagne recipes.
> > >
> > > In my synonyms.txt, I have these lines:
> > > -
> > > lasagne,pastry
> > > penne,pastry
> > > cannelloni,pastry
> > > -
> > >
> > > Filter in my scheme.xml looks like this:
> > >  > > ignoreCase="true" expand="true"
> > > tokenizerFactory="solr.WhitespaceTokenizerFactory" />
> > > Only in the index analyzer, not in the query.
> > >
> > > When using the Solr analysis tool, I can see that my index for lasagne
> > has
> > > a
> > > synonym pastry and my query only queries lasagne. Same for penne and
> > > cannelloni, they both have the synonym pastry.
> > >
> > > Currently my Solr query for lasagne also returns all penne and cannelloni
> > > recipes. I cannot understand why this is the case.
> > >
> > > Can someone explain this behaviour to me please?
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > > http://lucene.472066.n3.nabble.com/Solr-synonyms-logic-tp4187827.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > >
> 
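
Putting Ryan's suggestion together, the synonyms.txt lines become explicit
one-way mappings (the existing index-time SynonymFilter configuration can stay
as it is):

   lasagne => lasagne, pastry
   penne => penne, pastry
   cannelloni => cannelloni, pastry

Each entry adds "pastry" to the indexed terms for that word, so a query for
pastry matches all three while a query for lasagne still matches only lasagne
recipes.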
  

RE: Select and Update after with DataImportHandler

2015-03-17 Thread steve
Hi, maybe I'm missing the point here, but there could be a separate 
table/database that has a record inserted after the full import is completed; 
this could be part of the same "batch" or script file, or one that is "chained" 
after the original query completes.
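
A rough sketch of the "chained" variant, assuming DIH is driven over HTTP and the
bookkeeping lives in a separate table (URLs, core, database, table and column
names are all assumptions):

   curl "http://localhost:8983/solr/collection1/dataimport?command=full-import"
   # poll until the import handler reports idle, then stamp the import time
   until curl -s "http://localhost:8983/solr/collection1/dataimport?command=status&wt=json" | grep -q '"status":"idle"'; do sleep 30; done
   mysql mydb -e "INSERT INTO import_log (imported_at) VALUES (NOW());"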

> From: g...@idieikon.com
> To: solr-user@lucene.apache.org
> Subject: Select and Update after with DataImportHandler
> Date: Tue, 17 Mar 2015 09:47:25 +0100
> 
> Hello, we have a sql to retrieve fields from DB and they are indexed in a 
> full-import defined in a data-config.xml.
> 
> It works as expected but now we want to update the database fields so we know 
> when data was imported.
> So we add to the table being indexed a timestamp. Now we have table1 field1 
> field2... and fieldN (the timestamp).
> 
> Is there any way to indicate "after you finish the select of the full-import, 
> now you have to do this sql update of the timestamp in the database table"?
> 
> We use Solr 4.6.0
> 
> 
> We've tried in the same sql, with cursors or transactions, etc, but the sql 
> select is pretty complex and we can't make it work in such way, but perhaps 
> the only way to do it is to substitute the select with some kind of 
> transaction but we don’t know if this is possible to do in the query for the 
> full-import.
> 
> Regards.
> 
> 
> ---
> El software de antivirus Avast ha analizado este correo electrónico en busca 
> de virus.
> http://www.avast.com
> 
> 
  

RE: Which one is it "cs" or "cz" for Czech language?

2015-03-18 Thread steve
FYI: http://www.w3schools.com/tags/ref_country_codes.asp
CZECH REPUBLIC: CZ
No entry for CS.
> From: md...@apache.org
> Date: Tue, 17 Mar 2015 12:45:57 -0500
> Subject: Re: Which one is it "cs" or "cz" for Czech language?
> To: solr-user@lucene.apache.org
> 
> Probably a historical artifact.
> 
> cz is the country code for the Czech Republic, cs is the language code for
> Czech. Once, cs was also the country code for Czechosolvakia, leading some
> folks to accidentally conflate the two.
> 
> On Tue, Mar 17, 2015 at 12:35 PM, Eduard Moraru 
> wrote:
> 
> > Hi,
> >
> > First of all, a bit of a disclaimer: I am not a Czech language speaker, at
> > all.
> >
> > We are using Solr's dynamic fields in our project (XWiki), and we have
> > recently noticed a problem [1] with the Czech language.
> >
> > Basically, our mapping says something like this:
> >
> >  > multiValued="true" />
> >
> > ...but at runtime, we ask for the language code "cs" (which is the ISO
> > language code for Czech [2]) and it obviously fails (due to the mapping).
> >
> > Now, we can easily fix this on our end by fixing the mapping to
> > name="*_cs",
> > but what we are really wondering now is why does Lucene/Solr use "cz"
> > (country code) instead of "cs" (language code) in both its "text_cz" field
> > and its "stopwords_cz.txt" file?
> >
> > Is that a mistake on the Solr/Lucene side? Is it some kind of convention?
> > Is it going to be fixed?
> >
> > Thanks,
> > Eduard
> >
> > --
> > [1] http://jira.xwiki.org/browse/XWIKI-11897
> > [2] http://en.wikipedia.org/wiki/Czech_language
> >
  

RE: Unable to perform search query after changing uniqueKey

2015-04-01 Thread steve
Gently walking into rough waters here, but if you use any API with GET, you're
sending a URI which must be properly encoded. This has nothing to do with
the programming language that generates key/value pairs on the browser or
the one(s) used on the server. Lots and lots of good folks have tripped over
this one: http://www.w3schools.com/tags/ref_urlencode.asp
Play hard, but play safe!

> Date: Wed, 1 Apr 2015 13:58:55 +0800
> Subject: Re: Unable to perform search query after changing uniqueKey
> From: edwinye...@gmail.com
> To: solr-user@lucene.apache.org
> 
> Thanks Erick.
> 
> Yes, it is able to work correct if I do not use spaces for the field names,
> especially for the uniqueKey.
> 
> Regards,
> Edwin
> 
> 
> On 31 March 2015 at 13:58, Erick Erickson  wrote:
> 
> > I would never put spaces in my field names! Frankly I have no clue
> > what Solr does with that, but it can't be good. Solr explicitly
> > supports Java naming conventions, camel case, underscores and numbers.
> > Special symbols are frowned upon, I never use anything but upper case,
> > lower case and underscores. Actually, I don't use upper case either
> > but that's a personal preference. Other things might work, but only by
> > chance.
> >
> > Best,
> > Erick
> >
> > On Mon, Mar 30, 2015 at 8:59 PM, Zheng Lin Edwin Yeo
> >  wrote:
> > > Latest information that I've found for this is that the error only occurs
> > > for shard2.
> > >
> > > If I do a search for just shard1, those records that are assigned to
> > shard1
> > > will be able to be displayed. Only when I search for shard2 will the
> > > NullPointerException error occurs. Previously I was doing a search for
> > both
> > > shards.
> > >
> > > Is there any settings that I required to do for shard2 in order to solve
> > > this issue? Currently I have not made any changes to the shards since I
> > > created it using
> > >
> > http://localhost:8983/solr/admin/collections?action=CREATE&name=nps1&numShards=2&collection.configName=collection1
> > >
> > >
> > > Regards,
> > > Edwin
> > >
> > > On 31 March 2015 at 09:42, Zheng Lin Edwin Yeo 
> > wrote:
> > >
> > >> Hi Erick,
> > >>
> > >> I've changed the uniqueKey from id to Item No.
> > >>
> > >> Item No
> > >>
> > >>
> > >> Below are my definitions for both the id and Item No.
> > >>
> > >>  > >> required="false" multiValued="false" />
> > >> 
> > >>
> > >> Regards,
> > >> Edwin
> > >>
> > >>
> > >> On 30 March 2015 at 23:05, Erick Erickson 
> > wrote:
> > >>
> > >>> Well, let's see the definition of your ID field, 'cause I'm puzzled.
> > >>>
> > >>> It's definitely A Bad Thing to have it be any kind of tokenized field
> > >>> though, but that's a shot in the dark.
> > >>>
> > >>> Best,
> > >>> Erick
> > >>>
> > >>> On Mon, Mar 30, 2015 at 2:17 AM, Zheng Lin Edwin Yeo
> > >>>  wrote:
> > >>> > Hi Mostafa,
> > >>> >
> > >>> > Yes, I've defined all the fields in schema.xml. It is able to work on
> > >>> the
> > >>> > version without SolrCloud, but it is not working for the one with
> > >>> SolrCloud.
> > >>> > Both of them are using the same schema.xml.
> > >>> >
> > >>> > Regards,
> > >>> > Edwin
> > >>> >
> > >>> >
> > >>> >
> > >>> > On 30 March 2015 at 14:34, Mostafa Gomaa 
> > >>> wrote:
> > >>> >
> > >>> >> Hi Zheng,
> > >>> >>
> > >>> >> It's possible that there's a problem with your schema.xml. Are all
> > >>> fields
> > >>> >> defined and have appropriate options enabled?
> > >>> >>
> > >>> >> Regards,
> > >>> >>
> > >>> >> Mostafa.
> > >>> >>
> > >>> >> On Mon, Mar 30, 2015 at 7:49 AM, Zheng Lin Edwin Yeo <
> > >>> edwinye...@gmail.com
> > >>> >> >
> > >>> >> wrote:
> > >>> >>
> > >>> >> > Hi Erick,
> > >>> >> >
> > >>> >> > I've tried that, and removed the data directory from both the
> > >>> shards. But
> > >>> >> > the same problem still occurs, so we probably can rule out the
> > >>> "memory"
> > >>> >> > issue.
> > >>> >> >
> > >>> >> > Regards,
> > >>> >> > Edwin
> > >>> >> >
> > >>> >> > On 30 March 2015 at 12:39, Erick Erickson <
> > erickerick...@gmail.com>
> > >>> >> wrote:
> > >>> >> >
> > >>> >> > > I meant shut down Solr and physically remove the entire data
> > >>> >> > > directory. Not saying this is the cure, but it can't hurt to
> > rule
> > >>> out
> > >>> >> > > the index having "memory"...
> > >>> >> > >
> > >>> >> > > Best,
> > >>> >> > > Erick
> > >>> >> > >
> > >>> >> > > On Sun, Mar 29, 2015 at 6:35 PM, Zheng Lin Edwin Yeo
> > >>> >> > >  wrote:
> > >>> >> > > > Hi Erick,
> > >>> >> > > >
> > >>> >> > > > I used the following query to delete all the index.
> > >>> >> > > >
> > >>> >> > > > http://localhost:8983/solr/update?stream.body=
> > >>> >> > > *:*
> > >>> >> > > http://localhost:8983/solr/update?stream.body=
> > >>> >> > > >
> > >>> >> > > >
> > >>> >> > > > Or is it better to physically delete the entire data
> > directory?
> > >>> >> > > >
> > >>> >> > > >
> > >>> >> > > > Regards,
> > >>> >> > > > Edwin
> > >>> >> > > >
> > >>> >> > > >
> > >>> >> > > > On 28 March 2015 at 02:27, E

New article on ZK "Poison Packet"

2015-05-08 Thread steve
While very technical and unusual, a very interesting view of the world of Linux 
and ZooKeeper Clusters...
http://www.pagerduty.com/blog/the-discovery-of-apache-zookeepers-poison-packet/ 
  

RE: Dealing with bad apples in a SolrCloud cluster

2014-11-21 Thread steve
"Last Gasp" is the last message that Sun Storage controllers would send to each 
other when things whet sideways...
For what it's worth.

> Date: Fri, 21 Nov 2014 14:07:12 -0500
> From: michael.della.bi...@appinions.com
> To: solr-user@lucene.apache.org
> Subject: Re: Dealing with bad apples in a SolrCloud cluster
> 
> Good discussion topic.
> 
> I'm wondering if Solr doesn't need some sort of "shoot the other node in 
> the head" functionality.
> 
> We ran into one of failure modes that only AWS can dream up recently, 
> where for an extended amount of time, two nodes in the same placement 
> group couldn't talk to one another, but they could both see Zookeeper, 
> so nothing was marked as down.
> 
> I've written a basic monitoring script that periodically tries to access 
> every node in the cluster from every other node, but I haven't gotten to 
> the point that I've automated anything based on that. It does trigger 
> now and again for brief moments of time.
> 
> It'd be nice if there was some way the cluster could achieve some 
> consensus that a particular node is a bad apple, and evict it from 
> collections that have other active replicas. Not sure what the logic 
> would be that would allow it to rejoin those collections after the 
> situation passed, however.
> 
> Michael
> 
> On 11/21/14 13:54, Timothy Potter wrote:
> > Just soliciting some advice from the community ...
> >
> > Let's say I have a 10-node SolrCloud cluster and have a single collection
> > with 2 shards with replication factor 10, so basically each shard has one
> > replica on each of my nodes.
> >
> > Now imagine one of those nodes starts getting into a bad state and starts
> > to be slow about serving queries (not bad enough to crash outright though)
> > ... I'm sure we could ponder any number of ways a box might slow down
> > without crashing.
> >
> >  From my calculations, about 2/10ths of the queries will now be affected
> > since
> >
> > 1/10 queries from client apps will hit the bad apple
> >+
> > 1/10 queries from other replicas will hit the bad apple (distrib=false)
> >
> >
> > If QPS is high enough and the bad apple is slow enough, things can start to
> > get out of control pretty fast, esp. since we've set max threads so high to
> > avoid distributed dead-lock.
> >
> > What have others done to mitigate this risk? Anything we can do in Solr to
> > help deal with this? It seems reasonable that nodes can identify a bad
> > apple by keeping track of query times and looking for nodes that are
> > significantly outside (>=2 stddev) what the other nodes are doing. Then
> > maybe mark the node as being down in ZooKeeper so clients and other nodes
> > stop trying to send requests to it; or maybe a simple policy of just don't
> > send requests to that node for a few minutes.
> >
> 
  

RE: Too much data after closed for HttpChannelOverHttp

2014-11-23 Thread steve
For what it's worth, depending on the type of PC/Mac you're using, you can use
Wireshark to look at the actual HTTP headers (sent and received) that are being
created for the request.
https://www.wireshark.org/
I don't have any financial interest in them, but the stuff works!
Steve

> Date: Sun, 23 Nov 2014 20:47:05 +0100
> Subject: Re: Too much data after closed for HttpChannelOverHttp
> From: h.benoud...@gmail.com
> To: solr-user@lucene.apache.org
> 
> Actually I'm using a php client (I think it sends a HTTP request to Solr),
> but you're right tomorrow once I'll get to the office, I'll set chunk size
> to a smaller value, and will tell you if that was the reason.
> 
> Thanks.
> 
> 2014-11-23 19:35 GMT+01:00 Alexandre Rafalovitch :
> 
> > Most probably just a request that's too large. Have you tried dropping
> > down to 500 items and seeing what happens?
> >
> > Are you using SolrJ to send content to Solr? Or a direct HTTP request?
> >
> > Regards,
> >Alex.
> > P.s. You may also find it useful to read up on the Solr commit and
> > hard vs. soft commits. Check solrconfig.xml in the example
> > distribution.
> > Personal: http://www.outerthoughts.com/ and @arafalov
> > Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
> >
> >
> > On 23 November 2014 at 12:31, Hakim Benoudjit 
> > wrote:
> > > Hi there,
> > >
> > > I have deployed solr with Jetty, and I'm trying to index a quite large
> > > amount of items (300K), retreived from a MySQL database (unfortunately
> > I'm
> > > not using DIH; I'm doing it manually, by getting items from MySQL and
> > then
> > > index them it in Solr).
> > >
> > > But, I'm not indexing all of those items at the same time; I'm indexing
> > > them by chunks of 3K.
> > > So, I get the first 3K, index them, then goes to the next 3K chunk to
> > index
> > > it.
> > >
> > > Here is the error I got in jetty logs, I guess it has nothing to do with
> > > Mysql:
> > > *Does anyone know the meaning of the error 'badMessage:
> > > java.lang.IllegalStateException: too much data after closed for
> > > HttpChannelOverHttp@5432494a' ?*
> > >
> > > Thanks for your help, if anything isnt very precise please tell me to
> > > explain it (and sorry for my bad english).
> > >
> > > --
> > > Cordialement,
> > > Best regards,
> > > Hakim Benoudjit
> >
> 
> 
> 
> -- 
> Cordialement,
> Best regards,
> Hakim Benoudjit

  

RE: Tika HTTP 400 Errors with DIH

2014-12-05 Thread steve
Likely a good http debugger would help (wireshark, or fiddler2, for example)
http://www.telerik.com/fiddler
https://www.wireshark.org/download.html
For example, it could show the HTTP headers that the "client" uses to request
info from an API, then show the results of that query. One small caveat: I have
not tried this with a "standalone" server or with any SOLR-type project.
Cheers!
Steve

> From: teag...@insystechinc.com
> To: solr-user@lucene.apache.org
> Subject: RE: Tika HTTP 400 Errors with DIH
> Date: Fri, 5 Dec 2014 12:03:23 -0500
> 
> Alex,
> 
> Your suggestion might be a solution, but the issue isn't that the resource 
> isn't found. Like Walter said 400 is a "bad request" which makes me wonder, 
> what is the DIH/Tika doing when trying to access the documents? What is the 
> "request" that is bad? Is there any other way to suss this out? Placing a 
> network monitor in this case would be on the extreme end of difficult.
> 
> I know that the URL stored is good and that the resource exists by copying it 
> out of a Solr query and pasting it into the browser, so that eliminates 404 
> and 500 errors. Is the format of the URL correct? Is there some other setting 
> I've missed?
> 
> I appreciate the suggestions!
> 
> -Teague
> 
> 
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
> Sent: Thursday, December 04, 2014 12:22 PM
> To: solr-user
> Subject: Re: Tika HTTP 400 Errors with DIH
> 
> Right. Resource not found (on server).
> 
> The end result is the same. If it works in the browser but not from the 
> application than either not the same URL is being requested or - somehow - 
> not even the same server.
> 
> The solution (watching network traffic) is still the same, right?
> 
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and 
> newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers 
> community: https://www.linkedin.com/groups?gid=6713853
> 
> 
> On 4 December 2014 at 11:51, Walter Underwood  wrote:
> > No, 400 should mean that the request was bad. When the server fails, that 
> > is a 500.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/
> >
> >
> > On Dec 4, 2014, at 8:43 AM, Alexandre Rafalovitch  
> > wrote:
> >
> >> 400 error means something wrong on the server (resource not found).
> >> So, it would be useful to see what URL is actually being requested.
> >>
> >> Can you run some sort of network tracer to see the actual network 
> >> request (dtrace, Wireshark, etc)? That will dissect the problem into 
> >> half for you.
> >>
> >> Regards,
> >>   Alex.
> >> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources 
> >> and newsletter: http://www.solr-start.com/ and @solrstart Solr 
> >> popularizers community: https://www.linkedin.com/groups?gid=6713853
> >>
> >>
> >> On 4 December 2014 at 09:42, Teague James  wrote:
> >>> The database stores the URL as a CLOB. Querying Solr shows that the field 
> >>> value is "http://www.someaddress.com/documents/document1.docx";
> >>> The URL works if I copy and paste it to the browser, but Tika gets a 400 
> >>> error.
> >>>
> >>> Any ideas?
> >>>
> >>> Thanks!
> >>> -Teague
> >>> -Original Message-
> >>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> >>> Sent: Tuesday, December 02, 2014 1:45 PM
> >>> To: solr-user
> >>> Subject: Re: Tika HTTP 400 Errors with DIH
> >>>
> >>> On 2 December 2014 at 13:19, Teague James  
> >>> wrote:
> >>>> clob="true"
> >>>
> >>> What does ClobTransformer is doing on the DownloadURL field? Is it 
> >>> possible it is corrupting the value somehow?
> >>>
> >>> Regards,
> >>>   Alex.
> >>>
> >>> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources 
> >>> and newsletter: http://www.solr-start.com/ and @solrstart Solr 
> >>> popularizers community: https://www.linkedin.com/groups?gid=6713853
> >>>
> >
> 
  

RE: To understand SolrCloud configurations

2014-12-15 Thread steve
+1

> Date: Mon, 15 Dec 2014 21:44:44 +1300
> Subject: Re: To understand SolrCloud configurations
> From: esj.f...@gmail.com
> To: solr-user@lucene.apache.org
> 
> HI Shawn,
> 
> Thanks, You have answered my question to a certain extend, But I wanted to
> Isolate Solr Cloud from application and do some load testing by setting up
> Jmeter Script. I could hit Solr instances, but it will not simulate how
> Application (Client) will deal with Solr Cloud. Any suggestions for a
> better way of achieving this?
> 
> I want to do this, Solr keeps failing when we index large data and can't
> find anything in the logs as well, I wanted to identify where it's failing,
> I thought of using the above approach to make sure Solr and zookeeper setup
> is correct ,
> Also, I would like to know a better way to debug Zookeeper and Solr, What I
> have done so far is,
> 
> 1. Make sure Zookeeper picking a Solr leader when the existing sun goes
> down.
> 2. Setup is working when one (lead) ZooKeeper is down etc ...
> 3. Access Server runtime and see the data
> 
> Thanks,
> Shanaka
> 
> On 15 December 2014 at 19:22, Shawn Heisey  wrote:
> >
> > On 12/14/2014 12:41 PM, E S J wrote:
> > > This question is related to the same configurations I've posted. How
> > should
> > > I manually test indexing via Zookeeper, I mean not directly accessing
> > solr
> > > nodes like,
> > > curl http://solr1.internal:7083/solr/c-ins/update?commit=true -H
> > > "Content-Type: text/xml" -d "@mem.xml"
> > >
> > > I have a solr client which uses CloudSolrServer to send request to
> > > SolrCloud, But my intention is to isolate my SolrCloud and send index &
> > > search requests to make sure Solr Cloud setup is working fine. Later I
> > can
> > > do Solrclient integration testing. How should I send index requests
> > > manually ( like curl) to index data to solrcloud such a way
> > CloudSolrServer
> > > use ZooKeeper to LB/Pick Solr instance ?
> >
> > If you have either single-shard collections or multi-shard collections
> > with automatic routing, SolrCloud is designed so that you can send any
> > kind of request to any machine in the entire cloud, and it will be sent
> > where it needs to go.  If the collection uses manual (implicit) routing,
> > then queries can go anywhere, but updates must be directed to the
> > correct shard.
> >
> > If you are not using CloudSolrServer, then you must either set up a load
> > balancer in front of SolrCloud, or your application will need to know
> > where your Solr servers are.  Curl cannot talk to zookeeper, because
> > zookeeper does not speak HTTP.
> >
> > CloudSolrServer allows your application to specify only the zookeeper
> > hosts, it doesn't need to know where the Solr servers are.  This is
> > because it includes a full zookeeper client.
> >
> > There is an API in Solr at /solr/zookeeper that can, with appropriate
> > parameters, return various pieces of information from zookeeper in JSON
> > format.  This is the place where the admin UI gathers the information
> > necessary to create the various options on the Cloud tab.  Once your
> > application has that information, it can use it to find out the Solr
> > URLs to use.
> >
> > If this doesn't answer your question, please clarify it.
> >
> > Thanks,
> > Shawn
> >
> >
  

RE: poor performance when connecting to CloudSolrServer(zkHosts) using solrJ

2015-01-01 Thread steve
While I'm not a net optimization whiz, a properly configured DNS client will
"cache" recently resolved lookups; this way, even though you are referring to
the Fully Qualified Domain Name (FQDN), the local DNS client will return the
recently acquired IP address (within the constraints of the domain's
configuration). In other words, while there is "overhead" between the local
workstation/computer and the DNS client, it will NOT require access to the
configured DNS server "upstream".
Enjoy,
Steve

> Date: Thu, 1 Jan 2015 14:30:19 -0800
> Subject: Re: poor performance when connecting to CloudSolrServer(zkHosts) 
> using solrJ
> From: mohd.huss...@gmail.com
> To: solr-user@lucene.apache.org
> 
> My two cents, do check network connectivity. In past I remember changing
> the zookeeper server name to actual IP improved the speed a bit.
> DNS sometimes take time to resolve hostname. Could be worth trying this
> option.
> 
> 
> Thanks
> -Hussain
> 
> On Mon, Dec 29, 2014 at 6:31 PM, Shawn Heisey  wrote:
> 
> > On 12/29/2014 6:52 PM, zhangjia...@dcits.com wrote:
> > >   I setups a SolrCloud, and code a simple solrJ program to query solr
> > > data as below, but it takes about 40 seconds to new CloudSolrServer
> > > instance,less than 100 miliseconds is acceptable. what is going on when
> > new
> > > CloudSolrServer? and how to fix this issue?
> > >
> > >   String zkHost = "bicenter1.dcc:2181,datanode2.dcc:2181";
> > >   String defaultCollection = "hdfsCollection";
> > >
> > >   long startms=System.currentTimeMillis();
> > >   CloudSolrServer server = new CloudSolrServer(zkHost);
> > >   server.setDefaultCollection(defaultCollection);
> > >   server.setZkConnectTimeout(3000);
> > >   server.setZkClientTimeout(6000);
> > >   long endms=System.currentTimeMillis();
> > >   System.out.println(endms-startms);
> > >
> > >   ModifiableSolrParams params = new ModifiableSolrParams();
> > >   params.set("q", "id:*hbase*");
> > >   params.set("sort", "price desc");
> > >   params.set("start", "0");
> > >   params.set("rows", "10");
> > >
> > >   try {
> > >   QueryResponse response=server.query(params);
> > >   SolrDocumentList results = response.getResults();
> > >   for (SolrDocument doc:results) {
> > >   String rowkey=doc.getFieldValue("id").toString();
> > >   }
> > >
> > >   } catch (SolrServerException e) {
> > >   // TODO Auto-generated catch block
> > >   e.printStackTrace();
> > >   }
> > >
> > >   server.shutdown();
> >
> > The only part of the constructor for CloudSolrServer that I cannot
> > easily look at is the part that creates the httpclient, because
> > ultimately that calls code outside of Solr, in the HttpComponents
> > project.  Everything that I *can* see is code that should happen
> > extremely quickly, and the httpclient creation code is something that I
> > have used myself and never had any noticeable delay.  The constructor
> > for CloudSolrServer does *NOT* contact zookeeper or Solr, it merely sets
> > up the instance.  Nothing is contacted until a request is made.  I
> > examined the CloudSolrServer code from branch_5x.
> >
> > I tried out your code (with SolrJ 4.6.0 against a SolrCloud 4.2.1
> > cluster).  Although the query itself encountered an exception in
> > zookeeper (probably from the version discrepancy between Solr and
> > SolrJ), the elapsed time printed out from the CloudSolrServer
> > initialization was 240 milliseconds on the first run, 60 milliseconds on
> > a second run, and 64 milliseconds on a third run.  Those are all MUCH
> > less than the 1000 milliseconds that would represent one second, and
> > incredibly less than the 4 milliseconds that would represent 40
> > seconds.
> >
> > Side issue:  I hope that you have more than two zookeeper servers in
> > your ensemble.  A two-node zookeeper ensemble is actually *less*
> > reliable than a single node, because a failure of EITHER of those two
> > nodes will result in a loss of quorum.  Three nodes is the minimum
> > required for a redundant zookeeper ensemble.
> >
> > Thanks,
> > Shawn
> >
> >
  

RE: De Duplication using Solr

2015-01-03 Thread steve
One possible "match" is using Python's FuzzyWuzzy
https://github.com/seatgeek/fuzzywuzzy
http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/

> Date: Sat, 3 Jan 2015 13:24:17 +0530
> Subject: De Duplication using Solr
> From: shanuu@gmail.com
> To: solr-user@lucene.apache.org
> 
> I am trying to find out duplicate records based on distance and phonetic
> algorithms. Can I utilize solr for that? I have following fields and
> conditions to identify exact or possible duplicates.
> 
> 1. Fields
> prefix
> suffix
> firstname
> lastname
> email(primary_email1, email2, email3)
> phone(primary_phone1, phone2, phone3)
> 2. Conditions:
> Two records said to be exact duplicates if
> 
> 1. IsExactMatchFunction(record1_prefix, record2_prefix) AND
> IsExactMatchFunction(record1_suffix, record2_suffix) AND
> IsExactMatchFunction(record1_firstname,record2_firstname) AND
> IsExactMatchFunction(record1_lastname,record2_lastname) AND
> IsExactMatchFunction(record1_primary_email,record2_primary_email) OR
> IsExactMatchFunction(record1_primary_phone,record2_primary_primary)
> Two records said to be possible duplicates if
> 
> 1. IsExactMatchFunction(record1_prefix, record2_prefix) OR
> IsExactMatchFunction(record1_suffix, record2_suffix) OR
> IsExactMatchFunction(record1_firstname,record2_firstname) AND
> IsExactMatchFunction(record1_lastname,record2_lastname) AND
> IsExactMatchFunction(record1_primary_email,record2_primary_email) OR
> IsExactMatchFunction(record1_primary_phone,record2_primary_primary)
>  ELSE
>  2. IsFuzzyMatchFunction(record1_firstname,record2_firstname) AND
> IsExactMatchFunction(record1_lastname,record2_lastname) AND
> IsExactMatchFunction(record1_primary_email,record2_primary_email) OR
> IsExactMatchFunction(record1_primary_phone,record2_primary_primary)
>  ELSE
>  3. IsFuzzyMatchFunction(record1_firstname,record2_firstname) AND
> IsExactMatchFunction(record1_lastname,record2_lastname) AND
> IsExactMatchFunction(record1_any_email,record2_any_email) OR
> IsExactMatchFunction(record1_any_phone,record2_any_primary)
> 
> IsFuzzyMatchFunction() will perform distance and phonetic algorithms
> calculation and compare it with predefined threshold.
> 
> For example:
> 
> if threshold defined for firsname is 85 and IsFuzzyMatchFunction() function
> only return "ture" only and only if one of the algorithms(distance or
> phonetic) return the similarity socre >= 85.
> 
> Can I use solr to perform this job. Or Can you guys suggest how can I
> approach to this problem. I have seen the duke(De duplication API) but I
> can not use duke out of the box.
  

OR query with multiple fields

2012-10-03 Thread Steve
If I search for

q=!categoryid:3876021, solr correctly tells me there are two million plus hits.

If I search for

q=mfrid:18678, solr tells me there are 50,314 hits

I want to combine those two results, so I try

q=!categoryid:3876021 OR mfrid:18678

I would have expected two million plus results, instead, I get 50,314.

Is there a way to do a union of results from searching two different fields?

Curiously, if I remove the ! I do get the union of the results. But I want all
categories except that one, plus the mfr id I specified.

Re: OR query with multiple fields

2012-10-03 Thread Steve
On Oct 3, 2012, at 3:16 PM, Michael Della Bitta 
 wrote:

> Leading off with a negation does weird things. Try
> 
> (*:* AND NOT categoryid:387602) OR mfrid:18678
> 
> Michael Della Bitta
> 

Yep, works fine in this manner. So the problem indeed is the leading negation;
even with parentheses, it ignores them when the clause leads with a negate.
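
For completeness, the working form as a full request (collection name is an
assumption; spaces URL-encoded as +):

   curl "http://localhost:8983/solr/collection1/select?q=(*:*+AND+NOT+categoryid:3876021)+OR+mfrid:18678"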

Thanks!

Re: Running Lucene/SOLR on Hadoop

2016-01-09 Thread Steve Davids
You might consider trying to get the de-duplication done at index time:
https://cwiki.apache.org/confluence/display/solr/De-Duplication that way
the map reduce job wouldn't even be necessary.

When it comes to the map reduce job, you would need to be more specific
with *what* you are doing for people to try and help: are you attempting to
query for every record of all 40 million rows, and how many mapper tasks? But
right off the bat I see you are using Java's HttpURLConnection; you should
really use SolrJ for querying purposes:
https://cwiki.apache.org/confluence/display/solr/Using+SolrJ you won't need
to deal with xml parsing and it uses Apache's HttpClient with much more
reasonable defaults.

-Steve

On Thu, Dec 24, 2015 at 11:28 PM, Dino Chopins 
wrote:

> Hi Erick,
>
> Thank you for your response and pointer. What I mean by running Lucene/SOLR
> on Hadoop is to have Lucene/SOLR index available to be queried using
> mapreduce or any best practice recommended.
>
> I need to have this mechanism to do large scale row deduplication. Let me
> elaborate why I need this:
>
>1. I have two data sources with 35 and 40 million records of customer
>profile - the data come from two systems (SAP and MS CRM)
>2. Need to index and compare row by row of the two data sources using
>name, address, birth date, phone and email field. For birth date and
> email
>it will use exact comparison, but for the other fields will use
>probabilistic comparison. Btw, the data has been normalized before they
> are
>being indexed.
>3. Each finding will be categorized under same person, and will be
>deduplicated automatically or under user intervention depending on the
>score.
>
> I usually use it using Lucene index on local filesystem and use term
> vector, but since this will be repeated task and then challenged by
> management to do this on top of Hadoop cluster I need to have a framework
> or best practice to do this.
>
> I understand that to have Lucene index on HDFS is not very appropriate
> since HDFS is designed for large block operation. With that understanding,
> I use SOLR and hope to query it using http call from mapreduce job.  The
> snippet code is below.
>
> url = new URL(SOLR-Query-URL);
>
> HttpURLConnection connection = (HttpURLConnection)
> url.openConnection();
> connection.setRequestMethod("GET");
>
> The later method turns out to perform very bad. The simple mapreduce job
> that only read the data sources and write to hdfs takes 15 minutes, but
> once I do the http request it takes three hours now and still ongoing.
>
> What went wrong? And what will be solution to my problem?
>
> Thanks,
>
> Dino
>
> On Mon, Dec 14, 2015 at 12:30 AM, Erick Erickson 
> wrote:
>
> > First, what do you mean "run Lucene/Solr on Hadoop"?
> >
> > You can use the HdfsDirectoryFactory to store Solr/Lucene
> > indexes on Hadoop, at that point the actual filesystem
> > that holds the index is transparent to the end user, you just
> > use Solr as you would if it was using indexes on the local
> > file system. See:
> > https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
> >
> > If you want to use Map-Reduce to _build_ indexes, see the
> > MapReduceIndexerTool in the Solr contrib area.
> >
> > Best,
> > Erick
> >
>
>
>
>
> --
> Regards,
>
> Dino
>


Re: Solr search and index rate optimization

2016-01-09 Thread Steve Davids
bq. There's no good reason to have 5 with a small cluster and by "small" I
mean < 100s of nodes.

Well, a good reason would be if you want your system to continue to operate
if 2 ZK nodes lose communication with the rest of the cluster or go down
completely. Just to be clear though, the ZK nodes definitely don't need to
be beefy machines compared to your Solr data nodes since they are just
doing light-weight orchestration. But yea, for a 2 data node system one
might be willing to go with a 3 node ensemble to tolerate a single ZK
node dying, just depends on how much cash you are willing to spend and
availability level you are looking for.

-Steve


On Fri, Jan 8, 2016 at 12:07 PM, Erick Erickson 
wrote:

> Here's a longer form of Toke's answer:
>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> BTW, on the surface, having 5 ZK nodes isn't doing you any real good.
> Zookeeper isn't really involved in serving queries or handling
> updates, it's purpose is to have the state of the cluster (nodes up,
> recovering, down, etc) and notify Solr listeners when that state
> changes. There's no good reason to have 5 with a small cluster and by
> "small" I mean < 100s of nodes.
>
> Best,
> Erick
>
> On Fri, Jan 8, 2016 at 2:40 AM, Toke Eskildsen 
> wrote:
> > On Fri, 2016-01-08 at 10:55 +0500, Zap Org wrote:
> >> i wanted to ask that i need to index after evey 15 min with hard commit
> >> (real time records) and currently have 5 zookeeper instances and 2 solr
> >> instances in one machine serving 200 users with 32GB RAM. whereas i
> wanted
> >> to serve more than 10,000 users so what should be my machine specs and
> what
> >> should be my architecture for this much serve rate along with index
> rate.
> >
> > It depends on your system and if we were forced to guess, our guess
> > would be very loose.
> >
> >
> > Fortunately you do have a running system with real queries: Make a copy
> > on two similar machines (you will probably need more hardware anyway)
> > and simulate growing traffic, measuring response times at appropriate
> > points: 200 users, 500, 1000, 2000 etc.
> >
> > If you are very lucky, your current system scales all the way. If not,
> > you should have enough data to make an educated guess of the amount of
> > machines you need. You should have at least 3 measuring point to
> > extrapolate from as scaling is not always linear.
> >
> > - Toke Eskildsen, State and University Library, Denmark
> >
> >
>


Re: schemaless vs schema based core

2016-01-22 Thread Steve Rowe
Yes, and also underflow in the case of double/float.

--
Steve
www.lucidworks.com

> On Jan 22, 2016, at 12:25 PM, Shyam R  wrote:
> 
> I think, schema-less mode might allocate double instead of float, long
> instead of int to guard against overflow, which increases index size. Is my
> assumption valid?
> 
> Thanks
> 
> 
> 
> 
> On Thu, Jan 21, 2016 at 10:48 PM, Erick Erickson 
> wrote:
> 
>> I guess it's all about whether schemaless really supports
>> 1> all the docs you index.
>> 2> all the use-cases for search.
>> 3> the assumptions it makes scale to you needs.
>> 
>> If you've established rigorous tests and schemaless does all of the
>> above, I'm all for shortening the cycle by using schemaless.
>> 
>> But if it's just being sloppy and "success" is "I managed to index 50
>> docs and get some results back by searching", expect to find some
>> "interesting" issues down the road.
>> 
>> And finally, if it's "we use schemaless to quickly try things in the
>> UI and for the _real_ prod environment we need to be more rigorous
>> about the schema", well shortening development time is A Good Thing.
>> Part of moving to prod could be taking the schema generated by
>> schemaless and tweaking it for instance.
>> 
>> Best,
>> Erick
>> 
>> On Thu, Jan 21, 2016 at 8:54 AM, Shawn Heisey  wrote:
>>> On 1/21/2016 2:22 AM, Prateek Jain J wrote:
>>>> Thanks Erick,
>>>> 
>>>> Yes, I took same approach as suggested by you. The issue is some
>> developers started with schemaless configuration and now they have started
>> liking it and avoiding restrictions (including increased time to deploy
>> application, in managed enterprise environment). I was more concerned about
>> pushing best practices around this in team, because allowing anyone to new
>> attributes will become overhead in terms of management, security and
>> maintainability. Regarding your concern about not storing documents on
>> separate disk; we are storing them in solr but not as backup copies. One
>> doubt still remains in mind w.r.t auto-detection of types in  solr:
>>>> 
>>>> Is there a performance benefit of using defined types (schema based)
>> vs un-defined types while adding documents? Does "solrj" ships this
>> meta-information like type of attributes to solr, because code looks
>> something like?
>>>> 
>>>> SolrInputDocument doc = new SolrInputDocument();
>>>>  doc.addField("category", "book"); // String
>>>>  doc.addField("id", 1234); //Long
>>>>  doc.addField("name", "Trying solrj"); //String
>>>> 
>>>> In my opinion, any auto-detector code will have some overhead vs the
>> other; any thoughts around this?
>>> 
>>> Although the true reality may be more complex, you should consider that
>>> everything Solr receives from SolrJ will be text -- as if you had sent
>>> the JSON or XML indexing format manually, which has no type information.
>>> 
>>> When you are building a document with SolrInputDocument, SolrJ has no
>>> knowledge of the schema in Solr.  It doesn't know whether the target
>>> field is numeric, string, date, or something else.
>>> 
>>> Using different object types for input to SolrJ just gives you general
>>> Java benefits -- things like detecting certain programming errors at
>>> compile time.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>> 
> 
> 
> 
> -- 
> Ph: 9845704792



Re: How to convert string field to date

2016-01-28 Thread Steve Rowe
Hi Sreenivasa,

This is a known bug: https://issues.apache.org/jira/browse/SOLR-8607

(though the problem is not just about catch-all fields as the issue currently 
indicates - all dynamic fields are affected)

Two workarounds (neither tested):

1. Add attr_date via add-dynamic-field instead of add-field (even though the 
name has no asterisk)
2. Remove the attr_* dynamic field, add attr_date, then add attr_* back; these 
can be done with a single request.

I’ll update SOLR-8607 to reflect these things.

--
Steve
www.lucidworks.com

> On Jan 28, 2016, at 3:58 PM, Kallu, Sreenivasa (HQP) 
>  wrote:
> 
> Hi,
>   I am new to solr.
> 
> I am using managed-schema. I am not using schema.xml.  I am indexing outlook 
> email messages.
> I can see only see three fields ( id,_version_,_text_) defined in 
> managed-schema. Remaining fields are
> handled by following dynamic field
>  multiValued="true"/>
> 
> I have field name attr_date with type string. I want convert this field type 
> to date. Currently date range is not
> working on this field. I tried schema API to add new field attr_date and got 
> following error message
> "Field 'attr_date' already exists".  I tried to replace field type to date 
> and got following error message
> "The field 'attr_date' is not present in this schema, and so cannot be 
> replaced".
> 
> Please help me to convert "attr_date"  field type to date.
> 
> Advanced Thanks.
> --sreenivasa kallu
> 
> 



Re: How to convert string field to date

2016-01-28 Thread Steve Rowe
Try workaround 2; I did and it worked for me.  See my comment on the issue: 
<https://issues.apache.org/jira/browse/SOLR-8607?focusedCommentId=15122751&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15122751>

--
Steve
www.lucidworks.com

> On Jan 28, 2016, at 6:45 PM, Kallu, Sreenivasa (HQP) 
>  wrote:
> 
> Thanks steve for prompt response.
> 
> I tried workaround one. 
> i.e.  1. Add attr_date via add-dynamic-field instead of add-field (even 
> though the name has no asterisk)
> 
> I am able to add dynamic field  attr_date. But while starting the solr , I am 
> getting following message.
> Could not load conf for core sreenimsg: Dynamic field name 'attr_date' should 
> have either a leading or a trailing asterisk, and no others.
> 
> So solr looking for either leading * or trailing * in the dynamic field name.
> 
> I can see similar problems in workaround 2.
> 
> Any other suggestions?
> 
> Advanced Thanks.
> --sreenivasa kallu
> 
> -Original Message-
> From: Steve Rowe [mailto:sar...@gmail.com] 
> Sent: Thursday, January 28, 2016 1:17 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to convert string field to date
> 
> Hi Sreenivasa,
> 
> This is a known bug: 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_SOLR-2D8607&d=CwIFaQ&c=19TEyCb-E0do3cLmFgm9ItTXlbGQ5gmhRAlAtE256go&r=ZV-VnW_JFfcZo8vYJrpehzAvJFfw1xE42YRKpSHHqLg&m=ZJBCYIV-H5H3u5j_Rrhaex68Eb9dgqZmlO6fzKNfr8s&s=qmQIR8akquwcJ83E7HZgK38lTfSug8QifJEH1_ljJkk&e=
>  
> 
> (though the problem is not just about catch-all fields as the issue currently 
> indicates - all dynamic fields are affected)
> 
> Two workarounds (neither tested):
> 
> 1. Add attr_date via add-dynamic-field instead of add-field (even though the 
> name has no asterisk) 2. Remove the attr_* dynamic field, add attr-date, then 
> add attr_* back; these can be done with a single request.
> 
> I’ll update SOLR_8607 to reflect these things.
> 
> --
> Steve
> www.lucidworks.com
> 
>> On Jan 28, 2016, at 3:58 PM, Kallu, Sreenivasa (HQP) 
>>  wrote:
>> 
>> Hi,
>>  I am new to solr.
>> 
>> I am using managed-schema. I am not using schema.xml.  I am indexing outlook 
>> email messages.
>> I can see only see three fields ( id,_version_,_text_) defined in 
>> managed-schema. Remaining fields are handled by following dynamic 
>> field > stored="true" multiValued="true"/>
>> 
>> I have field name attr_date with type string. I want convert this 
>> field type to date. Currently date range is not working on this field. 
>> I tried schema API to add new field attr_date and got following error 
>> message "Field 'attr_date' already exists".  I tried to replace field type 
>> to date and got following error message "The field 'attr_date' is not 
>> present in this schema, and so cannot be replaced".
>> 
>> Please help me to convert "attr_date"  field type to date.
>> 
>> Advanced Thanks.
>> --sreenivasa kallu
>> 
>> 
> 



RE: Spatial Search on Postal Code

2016-03-05 Thread steve shepard
re: Postal Codes and polygons. I've heard of basic techniques that use Commerce
Department data (or was it Census within Commerce??) that give the basic points, but
the real rub is deciding what the "center" of that polygon is. There is likely
a commercial solution available, and certainly you can buy a spreadsheet with
the zip codes and their guesstimated centers. Fun project!
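
Once a center point (or a per-document lat/lon) is in hand, the point-plus-radius
case discussed further down in the thread is a single filter query; a minimal
sketch (field name, point and distance are assumptions):

   curl 'http://localhost:8983/solr/collection1/select?q=*:*&fq={!geofilt}&sfield=location&pt=40.71,-74.00&d=10'

where sfield is a LatLonType field, pt is the postal code's center and d is the
radius in kilometers.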

> Subject: Re: Spatial Search on Postal Code
> To: solr-user@lucene.apache.org
> From: emir.arnauto...@sematext.com
> Date: Fri, 4 Mar 2016 21:18:10 +0100
> 
> Hi Manohar,
> I don't think there is such functionality in Solr - you need to do it on 
> client side:
> 1. find some postal code polygons (you can use open street map - 
> http://wiki.openstreetmap.org/wiki/Key:postal_code)
> 2. create zip to polygon lookup
> 3. create code that will expand zip code polygon by some distance (you 
> can use JTS buffer api)
> 
> On query time you get zip code and distance:
> 1. find polygon for zip
> 2. expand polygon
> 3. send resulting polygon to Solr and use Intersects function to filter 
> results
> 
> Regards,
> Emir
> 
> On 04.03.2016 19:49, Manohar Sripada wrote:
> > Thanks Emir,
> >
> > Obviously #2 approach is much better. I know its not straight forward. But,
> > is it really acheivable in Solr? Like building a polygon for a postal code.
> > If so, can you throw some light how to do?
> >
> > Thanks,
> > Manohar
> >
> > On Friday, March 4, 2016, Emir Arnautovic 
> > wrote:
> >
> >> Hi Manohar,
> >> This depends on your requirements/usecase. If postal code is interpreted
> >> as point than it is expected to have radius that is significantly larger
> >> than postal code diameter. In such case you can go with first approach. In
> >> order to avoid missing results from postal code in case of small search
> >> radius and large postal code, you can reverse geocode records and store
> >> postal code with each document.
> >> If you need to handle distance from postal code precisely - distance from
> >> its border, you have to get postal code polygon, expand it by search
> >> distance and use resulting polygon to find matches.
> >>
> >> HTH,
> >> Emir
> >>
> >> On 04.03.2016 13:09, Manohar Sripada wrote:
> >>
> >>> Here's my requirement -  User enters postal code and provides the radius.
> >>> I
> >>> need to find the records with in the radius from the provided postal code.
> >>>
> >>> There are few ways I thought through after going through the "Spatial
> >>> Search" Solr wiki
> >>>
> >>> 1. As Latitude and Longitude positions are required for spatial search.
> >>> Get
> >>> Latitude Longitude position (may be using GeoCoding API) of a postal code
> >>> and use "LatLonType" field type and query accordingly. As the GeoCoding
> >>> API
> >>> returns one point and if the postal code area is too big, then I may end
> >>> up
> >>> not getting any results (apart from the records from the same postal code)
> >>> if the radius provided is small.
> >>>
> >>> 2. Get the latitude longitude points of the postal code which forms a
> >>> border (not sure yet on how to get) and build a polygon (using RPT). While
> >>> querying use this polygon and provide the distance. Can this be achieved?
> >>> Or Am I ruminating too much? :(
> >>>
> >>> Appreciate any help on this.
> >>>
> >>> Thanks
> >>>
> >>>
> >> --
> >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> >> Solr & Elasticsearch Support * http://sematext.com/
> >>
> >>
> 
> -- 
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
> 
  

Re: Failed to set SSL solr 5.2.1 Windows OS

2016-03-08 Thread Steve Rowe
Hi Ilan,

Looks like you’re modifying solr.in.sh instead of solr.in.cmd?

FYI running under Cygwin is not supported.

--
Steve
www.lucidworks.com

> On Mar 8, 2016, at 11:51 AM, Ilan Schwarts  wrote:
> 
> Hi all, I am trying to integrate solr with SSL on Windows 7 OS
> I followed the enable ssl guide at
> https://cwiki.apache.org/confluence/display/solr/Enabling+SSL
> 
> I created the keystore and placed in on etc folder. I un-commented the
> lines and set:
> SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
> SOLR_SSL_KEY_STORE_PASSWORD=password
> SOLR_SSL_TRUST_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
> SOLR_SSL_TRUST_STORE_PASSWORD=password
> SOLR_SSL_NEED_CLIENT_AUTH=false
> 
> When i test the storekey using
> keytool -list -alias solr-ssl -keystore
> C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks -storepass password -keypass
> password
> It is okay, and print me there is 1 entry in keystore.
> 
> When i am running in from solr, it will write:
> "Keystore was tampered with, or password was incorrect"
> I get this exception after JavaKeyStore.engineLoad(JavaKeyStore.java:780)
> 
> 
> If i replace
> SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks with
> SOLR_SSL_KEY_STORE=NOTHING_REALISTIC
> it will write the same error, i suspect i dont deliver the path as it
> should be.
> 
> Any suggestions ?
> 
> Thanks
> 
> 
> -- 
> 
> 
> -
> Ilan Schwarts



Re: Failed to set SSL solr 5.2.1 Windows OS

2016-03-08 Thread Steve Rowe
Hmm, not sure what’s happening.  Have you tried converting the backslashes in 
your paths to forward slashes?
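
For example (untested), the two keystore paths in solr.in.cmd would become:

  set SOLR_SSL_KEY_STORE=C:/solr-5.2.1/server/etc/solr-ssl.keystore.jks
  set SOLR_SSL_TRUST_STORE=C:/solr-5.2.1/server/etc/solr-ssl.keystore.jks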

--
Steve
www.lucidworks.com

> On Mar 8, 2016, at 3:39 PM, Ilan Schwarts  wrote:
> 
> Hi, thanks for reply.
> I am using solr.in.cmd
> I even put some pause in the cmd with echo to see the parameters are ok.. 
> This is the original file as found in 
> https://www.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.zip
> 
> 
> 
> On Tue, Mar 8, 2016 at 10:25 PM, Steve Rowe  wrote:
> Hi Ilan,
> 
> Looks like you’re modifying solr.in.sh instead of solr.in.cmd?
> 
> FYI running under Cygwin is not supported.
> 
> --
> Steve
> www.lucidworks.com
> 
> > On Mar 8, 2016, at 11:51 AM, Ilan Schwarts  wrote:
> >
> > Hi all, I am trying to integrate solr with SSL on Windows 7 OS
> > I followed the enable ssl guide at
> > https://cwiki.apache.org/confluence/display/solr/Enabling+SSL
> >
> > I created the keystore and placed in on etc folder. I un-commented the
> > lines and set:
> > SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
> > SOLR_SSL_KEY_STORE_PASSWORD=password
> > SOLR_SSL_TRUST_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
> > SOLR_SSL_TRUST_STORE_PASSWORD=password
> > SOLR_SSL_NEED_CLIENT_AUTH=false
> >
> > When i test the storekey using
> > keytool -list -alias solr-ssl -keystore
> > C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks -storepass password -keypass
> > password
> > It is okay, and print me there is 1 entry in keystore.
> >
> > When i am running in from solr, it will write:
> > "Keystore was tampered with, or password was incorrect"
> > I get this exception after JavaKeyStore.engineLoad(JavaKeyStore.java:780)
> >
> >
> > If i replace
> > SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks with
> > SOLR_SSL_KEY_STORE=NOTHING_REALISTIC
> > it will write the same error, i suspect i dont deliver the path as it
> > should be.
> >
> > Any suggestions ?
> >
> > Thanks
> >
> >
> > --
> >
> >
> > -
> > Ilan Schwarts
> 
> 
> 
> 
> -- 
> 
> 
> -
> Ilan Schwarts



Re: Failed to set SSL solr 5.2.1 Windows OS

2016-03-09 Thread Steve Rowe
So, did you try converting the backslashes to forward slashes?

You could try to increase logging to get more information: 
<http://eclipse.org/jetty/documentation/current/configuring-logging.html>

Can you provide a larger snippet of your log around the error?

Sounds like at a minimum Solr could do better at reporting errors 
locating/loading SSL stores.

Yes, the files in server/etc are being used in solr 5.2.1.

--
Steve
www.lucidworks.com

> On Mar 9, 2016, at 2:14 AM, Ilan Schwarts  wrote:
> 
> How would one try to solve this issue? What would you suggest me to do?
> Debug that module? I will try only to install clean jetty with ssl first.
> 
> Another question. The files jetty.xml\jetty-ssl.xml and the rest of files
> in /etc are being used in solr 5.2.1?
> On Mar 9, 2016 12:08 AM, "Steve Rowe"  wrote:
> 
>> Hmm, not sure what’s happening.  Have you tried converting the backslashes
>> in your paths to forward slashes?
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Mar 8, 2016, at 3:39 PM, Ilan Schwarts  wrote:
>>> 
>>> Hi, thanks for reply.
>>> I am using solr.in.cmd
>>> I even put some pause in the cmd with echo to see the parameters are
>> ok.. This is the original file as found in
>> https://www.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.zip
>>> 
>>> 
>>> 
>>> On Tue, Mar 8, 2016 at 10:25 PM, Steve Rowe  wrote:
>>> Hi Ilan,
>>> 
>>> Looks like you’re modifying solr.in.sh instead of solr.in.cmd?
>>> 
>>> FYI running under Cygwin is not supported.
>>> 
>>> --
>>> Steve
>>> www.lucidworks.com
>>> 
>>>> On Mar 8, 2016, at 11:51 AM, Ilan Schwarts  wrote:
>>>> 
>>>> Hi all, I am trying to integrate solr with SSL on Windows 7 OS
>>>> I followed the enable ssl guide at
>>>> https://cwiki.apache.org/confluence/display/solr/Enabling+SSL
>>>> 
>>>> I created the keystore and placed in on etc folder. I un-commented the
>>>> lines and set:
>>>> SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
>>>> SOLR_SSL_KEY_STORE_PASSWORD=password
>>>> SOLR_SSL_TRUST_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
>>>> SOLR_SSL_TRUST_STORE_PASSWORD=password
>>>> SOLR_SSL_NEED_CLIENT_AUTH=false
>>>> 
>>>> When i test the storekey using
>>>> keytool -list -alias solr-ssl -keystore
>>>> C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks -storepass password
>> -keypass
>>>> password
>>>> It is okay, and print me there is 1 entry in keystore.
>>>> 
>>>> When i am running in from solr, it will write:
>>>> "Keystore was tampered with, or password was incorrect"
>>>> I get this exception after
>> JavaKeyStore.engineLoad(JavaKeyStore.java:780)
>>>> 
>>>> 
>>>> If i replace
>>>> SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks with
>>>> SOLR_SSL_KEY_STORE=NOTHING_REALISTIC
>>>> it will write the same error, i suspect i dont deliver the path as it
>>>> should be.
>>>> 
>>>> Any suggestions ?
>>>> 
>>>> Thanks
>>>> 
>>>> 
>>>> --
>>>> 
>>>> 
>>>> -
>>>> Ilan Schwarts
>>> 
>>> 
>>> 
>>> 
>>> --
>>> 
>>> 
>>> -
>>> Ilan Schwarts
>> 
>> 



Re: SolrCloud App Unit Testing

2016-03-19 Thread Steve Davids
Naveen,

The Solr codebase generally uses the base “SolrTestCaseJ4” class and sometimes 
mixes in the cloud cluster. I personally write a generic abstract base test 
class to fit my needs and have an abstract `getSolrServer` method with an 
EmbeddedSolrServer implementation along with a separate implementation for the 
CloudSolrServer. I use the EmbeddedSolrServer for almost all of my test cases 
since it is a lot faster to setup, I’ll pull in the Cloud implementation if 
there is some distributed logic that is necessary for testing. Here is a simple 
example project (https://gitlab.com/bti360/solr-exercise/tree/example-solution 
<https://gitlab.com/bti360/solr-exercise/tree/example-solution>) which has a 
base test 
<https://gitlab.com/bti360/solr-exercise/blob/example-solution/src/test/java/com/bti360/gt/search/BaseSolrTestCase.java>
 which piggy-backs off the SolrTestCase class. If you don’t want to complete 
the “exercise”, switch over to the example-solution branch.
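
A stripped-down version of the abstract base class idea (untested sketch; the solr
home path and core name are made up):

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.core.CoreContainer;

  public abstract class BaseSolrTest {
    // Each concrete test class decides which implementation it runs against:
    // an EmbeddedSolrServer for fast local tests, or a CloudSolrClient pointed
    // at a MiniSolrCloudCluster when distributed behavior matters.
    protected abstract SolrClient getSolrClient() throws Exception;
  }

  // Fast, in-process flavor used for the bulk of the tests
  abstract class EmbeddedSolrTestBase extends BaseSolrTest {
    @Override
    protected SolrClient getSolrClient() {
      CoreContainer container = new CoreContainer("src/test/resources/solr");
      container.load();
      return new EmbeddedSolrServer(container, "collection1");
    }
  }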

Hopefully that points you in the right direction,

-Steve


> On Mar 17, 2016, at 1:03 PM, Davis, Daniel (NIH/NLM) [C] 
>  wrote:
> 
> MiniSolrCloudCluster is intended for building unit tests for cloud commands 
> within Solr itself.
> 
> What most people do to test applications based on Solr (and their Solr 
> configurations) is to start solr either on their CI server or in the cloud 
> (more likely the latter), and then point their application at that Solr 
> instance through configuration for the unit tests.   They may also have 
> separate tests to test the Solr collection/core configuration itself.
> 
> You can have your CI tool (Travis/etc.) or unit test scripts start-up Solr 
> locally, or in the cloud, using various tools and concoctions.   Part of the 
> core of that is the solr command-line in SOLR_HOME/bin, post tool in 
> SOLR_HOME/bin, and zkcli in SOLR_HOME/server/scripts/cloud-scripts.
> 
> To start Solr in the cloud, you should look towards something that exists:
>   https://github.com/lucidworks/solr-scale-tk 
>   https://github.com/vkhatri/chef-solrcloud
> 
> Hope this helps,
> 
> -Dan
> 
> -Original Message-
> From: Madhire, Naveen [mailto:naveen.madh...@capitalone.com] 
> Sent: Thursday, March 17, 2016 11:24 AM
> To: solr-user@lucene.apache.org
> Subject: FW: SolrCloud App Unit Testing
> 
> 
> Hi,
> 
> I am writing a Solr Application, can anyone please let me know how to Unit 
> test the application?
> 
> I see we have MiniSolrCloudCluster class available in Solr, but I am confused 
> about how to use that for Unit testing.
> 
> How should I create a embedded server for unit testing?
> 
> 
> 
> Thanks,
> Naveen
> 
> 



Re: Paging and cursorMark

2016-03-22 Thread Steve Rowe
Hi Tom,

There is an outstanding JIRA issue to directly support what you want (with a 
patch even!) but no work on it recently: 
<https://issues.apache.org/jira/browse/SOLR-6635>.  If you’re so inclined, 
please pitch in: bring the patch up-to-date, test it, contribute improvements, 
etc.
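
In the meantime, the workaround Tom describes below (an extra one-row query on the
last returned id, just to capture a cursor) might look roughly like this in SolrJ.
This is an untested sketch, and it assumes a plain field sort such as the uniqueKey;
a score-based sort would not carry over, since scores differ between the two queries:

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocumentList;
  import org.apache.solr.common.params.CursorMarkParams;

  static String fetchPageAndCursor(SolrClient solr, int start, int rows) throws Exception {
    SolrQuery page = new SolrQuery("*:*");
    page.setStart(start);
    page.setRows(rows);
    page.setSort("id", SolrQuery.ORDER.asc);            // cursor sorts must include the uniqueKey
    QueryResponse rsp = solr.query(page);
    SolrDocumentList docs = rsp.getResults();
    String lastId = (String) docs.get(docs.size() - 1).getFieldValue("id");

    // Follow-up query whose only purpose is to obtain a cursorMark positioned at that doc
    SolrQuery probe = new SolrQuery("id:\"" + lastId + "\"");
    probe.setRows(1);
    probe.setSort("id", SolrQuery.ORDER.asc);
    probe.set(CursorMarkParams.CURSOR_MARK_PARAM, CursorMarkParams.CURSOR_MARK_START);
    return solr.query(probe).getNextCursorMark();       // use this for the next page
  }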

--
Steve
www.lucidworks.com

> On Mar 22, 2016, at 10:27 AM, Tom Evans  wrote:
> 
> Hi all
> 
> With Solr 5.5.0, we're trying to improve our paging performance. When
> we are delivering results using infinite scrolling, cursorMark is
> perfectly fine - one page is followed by the next. However, we also
> offer traditional paging of results, and this is where it gets a
> little tricky.
> 
> Say we have 10 results per page, and a user wants to jump from page 1
> to page 20, and then wants to view page 21, there doesn't seem to be a
> simple way to get the nextCursorMark. We can make an inefficient
> request for page 20 (start=190, rows=10), but we cannot give that
> request a cursorMark=* as it contains start=190.
> 
> Consequently, if the user clicks to page 21, we have to continue along
> using start=200, as we have no cursorMark. The only way I can see to
> get a cursorMark at that point is to omit the start=200, and instead
> say rows=210, and ignore the first 200 results on the client side.
> Obviously, this gets more and more inefficient the deeper we page - I
> know that internally to Solr, using start=200&rows=10 has to do the
> same work as rows=210, but less data is sent over the wire to the
> client.
> 
> As I understand it, the cursorMark is a hash of the sort values of the
> last document returned, so I don't really see why it is forbidden to
> specify start=190&rows=10&cursorMark=* - why is it not possible to
> calculate the nextCursorMark from the last document returned?
> 
> I was also thinking a possible temporary workaround would be to
> request start=190&rows=10, note the last document returned, and then
> make a subsequent query for q=id:""&rows=1&cursorMark=*.
> This seems to work, but means an extra Solr query for no real reason.
> Is there any other problem to doing this?
> 
> Is there some other simple trick I am missing that we can use to get
> both the page of results we want and a nextCursorMark for the
> subsequent page?
> 
> Cheers
> 
> Tom



Re: Requesting to be added to ContributorsGroup

2016-05-03 Thread Steve Rowe
Welcome Sheece,

I’ve added you to the ContributorsGroup.

--
Steve
www.lucidworks.com

> On May 3, 2016, at 10:03 AM, Syed Gardezi  wrote:
> 
> Hello,
> I am a Master student as part of Free and Open Source Software 
> Development COMP8440 - http://programsandcourses.anu.edu.au/course/COMP8440 
> at Australian National University. I have selected 
> http://wiki.apache.org/solr/ to contribute to. Kindly add me to the 
> ContributorsGroup. Thank you.
> 
> wiki username: sheecegardezi
> 
> Regards,
> Sheece
> 



Do not match on high frequency terms

2015-07-31 Thread Swedish, Steve
Hello,

I'm hoping someone might be able to help me out with this as I do not have very 
much solr experience. Basically, I am wondering if it is possible to not match 
on terms that have a document frequency above a certain threshold. For my 
situation, a stop word list will be unrealistic to maintain, so I was wondering 
if there may be an alternative solution using term document frequency to 
identify common terms.

What would actually be ideal is if I could somehow use the CommonTermsQuery. 
The problem I ran across when looking at this option was that the 
CommonTermsQuery seems to only work for queries on one field at a time (unless 
I'm mistaken). However, I have a query of the structure q=(field1:(blah) AND 
(field2:(blah) OR field3:(blah))) OR field1:(blah) OR (field2:(blah) AND 
field3:(blah)). If there are any ideas on how to use the CommonTermsQuery with 
this query structure, that would be great.

If it's possible to extract the document frequency for terms in my query before 
the query is run, allowing me to remove the high frequency terms from the query 
first, that could also be a valid solution. I'm using solrj as well, so a 
solution that works with solrj would be appreciated.

Thanks,
Steve


RE: Do not match on high frequency terms

2015-08-03 Thread Swedish, Steve
Thanks for your response. For TermsComponent, I am able to get a list of all 
terms in a field that have a document frequency under a certain threshold, but 
I was wondering if I could instead pass a list of terms, and get back only the 
terms from that list that have a document frequency under a certain threshold 
in a field. I can't find an easy way to do this, do you know if this is 
possible?

Thanks,
Steve

-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: Saturday, August 1, 2015 6:35 AM
To: solr-user 
Subject: Re: Do not match on high frequency terms

It seems like you need to develop custom query or query parser. Regarding
SolrJ: you can try to call http://wiki.apache.org/solr/TermsComponent
https://cwiki.apache.org/confluence/display/solr/The+Terms+Component I'm not 
sure how exactly call TermsComponent in SolrJ, I just found 
https://lucene.apache.org/solr/5_2_1/solr-solrj/org/apache/solr/client/solrj/response/TermsResponse.html
to read its' response.
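
A rough SolrJ sketch along those lines (untested; it assumes a /terms request handler
is configured): ask the terms handler for every term in the field whose document
frequency is above the threshold, then strip any of those terms out of the user's
query on the client side before searching.

  import java.util.HashSet;
  import java.util.Set;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.response.TermsResponse;

  static Set<String> highFrequencyTerms(SolrClient solr, String field, int maxDocFreq) throws Exception {
    SolrQuery q = new SolrQuery();
    q.setRequestHandler("/terms");
    q.setTerms(true);
    q.addTermsField(field);
    q.setTermsMinCount(maxDocFreq + 1);   // only terms occurring in more docs than the threshold
    q.setTermsLimit(-1);                  // no cap on the number of terms returned
    Set<String> common = new HashSet<>();
    TermsResponse terms = solr.query(q).getTermsResponse();
    for (TermsResponse.Term t : terms.getTerms(field)) {
      common.add(t.getTerm());
    }
    return common;
  }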

On Fri, Jul 31, 2015 at 11:31 PM, Swedish, Steve 
wrote:

> Hello,
>
> I'm hoping someone might be able to help me out with this as I do not 
> have very much solr experience. Basically, I am wondering if it is 
> possible to not match on terms that have a document frequency above a 
> certain threshold. For my situation, a stop word list will be 
> unrealistic to maintain, so I was wondering if there may be an 
> alternative solution using term document frequency to identify common terms.
>
> What would actually be ideal is if I could somehow use the 
> CommonTermsQuery. The problem I ran across when looking at this option 
> was that the CommonTermsQuery seems to only work for queries on one 
> field at a time (unless I'm mistaken). However, I have a query of the 
> structure
> q=(field1:(blah) AND (field2:(blah) OR field3:(blah))) OR 
> field1:(blah) OR
> (field2:(blah) AND field3:(blah)). If there are any ideas on how to 
> use the CommonTermsQuery with this query structure, that would be great.
>
> If it's possible to extract the document frequency for terms in my 
> query before the query is run, allowing me to remove the high 
> frequency terms from the query first, that could also be a valid 
> solution. I'm using solrj as well, so a solution that works with solrj would 
> be appreciated.
>
> Thanks,
> Steve
>



--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>



Re: Supported languages

2015-08-04 Thread Steve Rowe
Hi Steve,

This page may be useful: 
<https://cwiki.apache.org/confluence/display/solr/Language+Analysis#LanguageAnalysis-Language-SpecificFactories>

In most cases the configurations described there are the only OOTB alternative, 
so optimality isn’t discussed.  I think the path most people take is to try 
those out, iterate with users who can provide feedback about quality, then if 
necessary investigate alternative solutions, including commercial ones.

Steve
www.lucidworks.com

> On Aug 4, 2015, at 12:55 PM, Steven White  wrote:
> 
> Hi Everyone,
> 
> I see Solr comes pre-configured with text analyzers for a list of supported
> languages e.g.: "text_ar", "text_bq", "text_ca", "text_cjk", "text_ckb",
> "text_cz", etc.
> 
> My questions are:
> 
> 1) How well optimized are those languages for general usage?  This is
> something I need help with because other then English, I cannot judge how
> well the current pre-configured setting works for best quality.  Yes,
> "quality" means different thing for each customer, but still I'm curious to
> know if the out-of-the-box setting is optimal.
> 
> 2) Is there a landing link that talks about each of the
> supported languages, what is available and how to tune that fieldType for
> the said language?
> 
> 3) What do you do when a language I need is not on the list?  The obvious
> answer is to write my own plug-in "fieldType" (or even customize one off
> existing fieldType), but short of that, is there a "general" fieldType that
> can be used?  Even if it means this fieldType will function as if it is
> SQL's LIKE feature.
> 
> Thanks
> 
> Steve



Re: Indexing Fixed length file

2015-08-28 Thread Steve Rowe
Hi Tim,

I haven’t heard of people indexing this kind of input with Solr, but the format 
is quite similar to CSV/TSV files, with the exception that the field separators 
have fixed positions and are omitted.

You could write a short script to insert separators (e.g. commas) at these 
points (but be sure to escape quotation marks and the separators) and then use 
Solr’s CSV update functionality: 
<https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates>.
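
For example, a quick (untested) converter sketch - the column widths here are made up,
so adjust the WIDTHS array to your record layout:

  import java.io.BufferedReader;
  import java.io.IOException;
  import java.io.PrintWriter;
  import java.nio.file.Files;
  import java.nio.file.Paths;
  import java.util.ArrayList;
  import java.util.List;

  public class FixedWidthToCsv {
    static final int[] WIDTHS = {13, 14, 20, 13, 30, 25, 20};  // example layout only

    public static void main(String[] args) throws IOException {
      try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]));
           PrintWriter out = new PrintWriter(Files.newBufferedWriter(Paths.get(args[1])))) {
        String line;
        while ((line = in.readLine()) != null) {
          List<String> cells = new ArrayList<>();
          int pos = 0;
          for (int w : WIDTHS) {
            int end = Math.min(pos + w, line.length());
            String cell = pos < line.length() ? line.substring(pos, end).trim() : "";
            // escape for CSV: double embedded quotes and wrap the cell in quotes
            cells.add("\"" + cell.replace("\"", "\"\"") + "\"");
            pos = end;
          }
          out.println(String.join(",", cells));
        }
      }
    }
  }

The resulting file (with a header line added for the field names) can then be sent to
the CSV update handler or bin/post.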

I think dealing with fixed-width fields directly would be a nice addition to 
Solr’s CSV update capabilities - feel free to make an issue - see 
<http://wiki.apache.org/solr/HowToContribute>.

Steve
www.lucidworks.com

> On Aug 28, 2015, at 3:19 AM, timmsn  wrote:
> 
> Hello,
> 
> I use Solr 5.2.1 and the bin/post tool. I am trying to index some files
> that have a fixed length and no whitespace to separate the words. 
> How can I define a template or something similar for my fields?
> Or can I edit the schema.xml to solve my problem?
> 
> This is one record from one file; each file holds 40 - 100 records.
> 
> AB134364312   58553521789   245678923521234130311G11222345610711MUELLER,
> MAX -00014680Q1-24579021-204052667980002 EEUR  0223/123835062 
> 130445 
> 
> 
> Thanks! 
> 
> Tim
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Indexing-Fixed-length-file-tp4225807.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Loading Solr Analyzer from RuntimeLib Blob

2015-09-10 Thread Steve Davids
Accidentally sent this on the java-users list instead of solr-users...


Hi,

I am attempting to migrate our deployment process over to using the
recently added "Blob Store API" which should simplify things a bit when it
comes to cloud infrastructures for us. Unfortunately, after loading the jar
> in the .system collection and adding it to our runtimelib config overlay, the
> analyzers in our schema don't appear to be aware of our custom code. Is
there a way to specify runtimeLib="true" on the schema or perhaps an
alternate method to make sure that jar is loaded on the classpath before
the schema is loaded?

Thanks for the help,

-Steve


RE: Google didn't help on this one!

2015-09-15 Thread steve shepard
For this type of "walking" error, I'd suggest installing on the "client" pc a 
HTTP, HTTPS packet inspector like Fiddler; this allows you to see exactly what 
information you are sending to the server, and the response that you receive; 
format, image (if any) and the like.
http://fiddlerbook.com/book/
http://www.telerik.com/fiddler
Steve

> From: mark.fenb...@noaa.gov
> Subject: Re: Google didn't help on this one!
> To: solr-user@lucene.apache.org
> Date: Tue, 15 Sep 2015 13:43:47 -0400
> 
> So I ran "nc -l 8983" then restarted solr, and then ran my app with my 
> query.   nc reported the following:
> 
> GET 
> /solr/EventLog/spellCheckCompRH?qt=%2FspellCheckCompRH&q=Some+more+text+wit+some+missspelled+wordz.&spellcheck=on&spellcheck.build=true&wt=javabin&version=2
>  
> HTTP/1.1
> User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0
> Host: dell9-tir:8983
> Connection: Keep-Alive
> 
> I'm not sure if this is good, or indicates an error of any kind.
> 
> Anyway, when I ran my app again, I got a completely different error, 
> although I didn't change anything!  So, I guess I get to move on from 
> this and see what other hurdles I run into!
> 
> Thanks for the help!
> Mark
> 
> 
> On 9/15/2015 11:13 AM, Yonik Seeley wrote:
> > On Tue, Sep 15, 2015 at 11:08 AM, Mark Fenbers  
> > wrote:
> >> I'm working with the spellcheck component of Solr for the first time.  I'm
> >> using SolrJ, and when I submit my query, I get a Solr Exception:  "Expected
> >> mime type octet/stream but got text/html."
> >>
> >> What in the world is this telling me??
> > You're probably hitting an endpoint on Solr that doesn't exist and
> > getting an HTML 404 error page rather than the response (which would
> > be in binary by default).
> >
> > An easy way to see what SolrJ is sending is to kill your solr server, then 
> > do
> >
> > nc -l 8983
> >
> > And then run your SolrJ program to see what it sends... if it look OK,
> > then try sending the request from curl to Solr.
> >
> > -Yonik
> >
> 
  

Re: ctargett commented on http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html

2015-09-21 Thread Steve Rowe
I logged into comments.a.o and then disabled emailing of comments to this
list.

When we set up the "solrcwiki" site on comments.apache.org, the requirement
was that the PMC chair be the (sole) manager, and though I am no longer
chair, I'm still the manager of the "solrcwiki" site for the ASF commenting
system.

Tomorrow I'll ask ASF Infra about whether the managership should be
transferred to the current PMC chair.  (If they don't care, I don't mind
continuing to manage it.)

On Mon, Sep 21, 2015 at 5:43 PM, Cassandra Targett 
wrote:

> Hey folks,
>
> I'm doing some experiments with other formats for the Ref Guide and playing
> around with options for comments. I didn't realize this old experiment from
> https://issues.apache.org/jira/browse/SOLR-4889 would send email - I'm
> talking to Steve Rowe to see if we can get that disabled.
>
> Cassandra
>
> On Mon, Sep 21, 2015 at 2:06 PM,  wrote:
>
> > Hello,
> > ctargett has commented on
> >
> http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html
> > .
> > You can find the comment here:
> >
> >
> http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html#comment_4535
> > Please note that if the comment contains a hyperlink, it must be
> > approved
> > before it is shown on the site.
> >
> > Below is the reply that was posted:
> > 
> > This is a test of the comments system.
> > 
> >
> > With regards,
> > Apache Solr Cwiki.
> >
> > You are receiving this email because you have subscribed to changes
> > for the solrcwiki site.
> > To stop receiving these emails, unsubscribe from the mailing list
> that
> > is providing these notifications.
> >
> >
>


Cloud Deployment Strategy... In the Cloud

2015-09-22 Thread Steve Davids
Hi,

I am trying to come up with a repeatable process for deploying a Solr Cloud
cluster from scratch along with the appropriate security groups, auto
scaling groups, and custom Solr plugin code. I saw that LucidWorks created
a Solr Scale Toolkit but that seems to be more of a one-shot deal than
really setting up your environment for the long-haul. Here is were we are
at right now:

   1. ZooKeeper ensemble is easily brought up via a Cloud Formation Script
   2. We have an RPM built to lay down the Solr distribution + Custom
   plugins + Configuration
   3. Solr machines come up and connect to ZK

Now, we are using Puppet which could easily create the core.properties file
for the corresponding core and have ZK get bootstrapped but that seems to
be a no-no these days... So, can anyone think of a way to get ZK
bootstrapped automatically with pre-configured Collection configurations?
Also, is there a recommendation on how to deal with machines that are
coming/going? As I see it machines will be getting spun up and terminated
from time to time and we need to have a process of dealing with that, the
first idea was to just use a common node name so if a machine was
terminated a new one can come up and replace that particular node but on
second thought it would seem to require an auto scaling group *per* node
(so it knows what node name it is). For a large cluster this seems crazy
from a maintenance perspective, especially if you want to be elastic with
regard to the number of live replicas for peak times. So, then the next
idea was to have some outside observer listen to when new ec2 instances are
created or terminated (via CloudWatch SQS) and make the appropriate API
calls to either add the replica or delete it, this seems doable but perhaps
not the simplest solution that could work.

I was hoping others have already gone through this and have valuable advice
to give, we are trying to setup Solr Cloud the "right way" so we don't get
nickel-and-dimed to death from an O&M perspective.

Thanks,

-Steve


Re: Cloud Deployment Strategy... In the Cloud

2015-09-23 Thread Steve Davids
What tools do you use for the "auto setup"? How do you get your config
automatically uploaded to zk?

On Tue, Sep 22, 2015 at 2:35 PM, Gili Nachum  wrote:

> Our auto setup sequence is:
> 1.deploy 3 zk nodes
> 2. Deploy solr nodes and start them connecting to zk.
> 3. Upload collection config to zk.
> 4. Call create collection rest api.
> 5. Done. SolrCloud ready to work.
>
> Don't yet have automation for replacing or adding a node.
> On Sep 22, 2015 18:27, "Steve Davids"  wrote:
>
> > Hi,
> >
> > I am trying to come up with a repeatable process for deploying a Solr
> Cloud
> > cluster from scratch along with the appropriate security groups, auto
> > scaling groups, and custom Solr plugin code. I saw that LucidWorks
> created
> > a Solr Scale Toolkit but that seems to be more of a one-shot deal than
> > really setting up your environment for the long-haul. Here is were we are
> > at right now:
> >
> >1. ZooKeeper ensemble is easily brought up via a Cloud Formation
> Script
> >2. We have an RPM built to lay down the Solr distribution + Custom
> >plugins + Configuration
> >3. Solr machines come up and connect to ZK
> >
> > Now, we are using Puppet which could easily create the core.properties
> file
> > for the corresponding core and have ZK get bootstrapped but that seems to
> > be a no-no these days... So, can anyone think of a way to get ZK
> > bootstrapped automatically with pre-configured Collection configurations?
> > Also, is there a recommendation on how to deal with machines that are
> > coming/going? As I see it machines will be getting spun up and terminated
> > from time to time and we need to have a process of dealing with that, the
> > first idea was to just use a common node name so if a machine was
> > terminated a new one can come up and replace that particular node but on
> > second thought it would seem to require an auto scaling group *per* node
> > (so it knows what node name it is). For a large cluster this seems crazy
> > from a maintenance perspective, especially if you want to be elastic with
> > regard to the number of live replicas for peak times. So, then the next
> > idea was to have some outside observer listen to when new ec2 instances
> are
> > created or terminated (via CloudWatch SQS) and make the appropriate API
> > calls to either add the replica or delete it, this seems doable but
> perhaps
> > not the simplest solution that could work.
> >
> > I was hoping others have already gone through this and have valuable
> advice
> > to give, we are trying to setup Solr Cloud the "right way" so we don't
> get
> > nickel-and-dimed to death from an O&M perspective.
> >
> > Thanks,
> >
> > -Steve
> >
>


Re: Cloud Deployment Strategy... In the Cloud

2015-09-30 Thread Steve Davids
Our project built a custom "admin" webapp that we use for various O&M
activities so I went ahead and added the ability to upload a Zip
distribution which then uses SolrJ to forward the extracted contents to ZK,
this package is built and uploaded via a Gradle build task which makes life
easy on us by allowing us to jam stuff into ZK which is sitting in a
private network (local VPC) without necessarily needing to be on a ZK
machine. We then moved on to creating a collection (trivial), and
adding/removing replicas. As for adding replicas, I am rather confused as to
why I would need to specify a specific shard for replica placement; before,
when I threw down a core.properties file, the machine would automatically
come up and figure out which shard it should join based on reasonable
assumptions - why wouldn't the same logic apply here? I then saw that
a Rule-based
Replica Placement
<https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement>
feature was added which I thought would be reasonable but after looking at
the tests <https://issues.apache.org/jira/browse/SOLR-7577> it appears to
still require a shard parameter for adding a replica which seems to defeat
the entire purpose. So after getting bummed out about that, I took a look
at the delete replica request since we are having machines come/go we need
to start dropping them and found that the delete replica requires a
collection, shard, and replica name and if I have the name of the machine
it appears the only way to figure out what to remove is by walking the
clusterstate tree for all collections and determine which replicas are a
candidate for removal which seems unnecessarily complicated.
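
For concreteness, the two calls being discussed look roughly like this (untested;
parameter names as of 5.x):

  /admin/collections?action=ADDREPLICA&collection=myColl&shard=shard1&node=192.168.1.21:8983_solr
  /admin/collections?action=DELETEREPLICA&collection=myColl&shard=shard1&replica=core_node5

ADDREPLICA wants the target shard named up front, and DELETEREPLICA additionally wants
the replica's core_node name, which is what forces the walk over the cluster state.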

Hopefully I don't come off as complaining, but rather looking at it from a
client perspective, the Collections API doesn't seem simple to use and
really the only reason I am messing around with it now is because there is
repeated threats to make "zk as truth" the default in the 5.x branch at
some point in the future. I would personally advocate that something like
the autoManageReplicas <https://issues.apache.org/jira/browse/SOLR-5748> be
introduced to make life much simpler on clients as this appears to be the
thing I am trying to implement externally.

If anyone has happened to to build a system to orchestrate Solr for cloud
infrastructure and have some pointers it would be greatly appreciated.

Thanks,

-Steve

On Thu, Sep 24, 2015 at 10:15 AM, Dan Davis  wrote:

> ant is very good at this sort of thing, and easier for Java devs to learn
> than Make.  Python has a module called fabric that is also very fine, but
> for my dev. ops. it is another thing to learn.
> I tend to divide things into three categories:
>
>  - Things that have to do with system setup, and need to be run as root.
> For this I write a bash script (I should learn puppet, but...)
>  - Things that have to do with one time installation as a solr admin user
> with /bin/bash, including upconfig.   For this I use an ant build.
>  - Normal operational procedures.   For this, I typically use Solr admin or
> scripts, but I wish I had time to create a good webapp (or money to
> purchase Fusion).
>
>
> On Thu, Sep 24, 2015 at 12:39 AM, Erick Erickson 
> wrote:
>
> > bq: What tools do you use for the "auto setup"? How do you get your
> config
> > automatically uploaded to zk?
> >
> > Both uploading the config to ZK and creating collections are one-time
> > operations, usually done manually. Currently uploading the config set is
> > accomplished with zkCli (yes, it's a little clumsy). There's a JIRA to
> put
> > this into solr/bin as a command though. They'd be easy enough to script
> in
> > any given situation though with a shell script or wizard
> >
> > Best,
> > Erick
> >
> > On Wed, Sep 23, 2015 at 7:33 PM, Steve Davids  wrote:
> >
> > > What tools do you use for the "auto setup"? How do you get your config
> > > automatically uploaded to zk?
> > >
> > > On Tue, Sep 22, 2015 at 2:35 PM, Gili Nachum 
> > wrote:
> > >
> > > > Our auto setup sequence is:
> > > > 1.deploy 3 zk nodes
> > > > 2. Deploy solr nodes and start them connecting to zk.
> > > > 3. Upload collection config to zk.
> > > > 4. Call create collection rest api.
> > > > 5. Done. SolrCloud ready to work.
> > > >
> > > > Don't yet have automation for replacing or adding a node.
> > > > On Sep 22, 2015 18:27, "Steve Davids"  wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am trying to come up with a repeatable process for deploying a
> Solr
> > > > Cl

Re: Can I use tokenizer twice ?

2015-10-14 Thread Steve Rowe
Hi,

Analyzers must have exactly one tokenizer, no more and no less.

You could achieve what you want by copying to another field and defining a 
separate analyzer for each.  One would create shingles, and the other edge 
ngrams.  
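
A rough schema sketch of that (untested; the field and type names are made up):

  <fieldType name="text_shingles" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
              outputUnigrams="false" outputUnigramsIfNoShingles="true"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    </analyzer>
  </fieldType>

  <field name="body_shingles" type="text_shingles" indexed="true" stored="false"/>
  <field name="body_edge"     type="text_edge"     indexed="true" stored="false"/>
  <copyField source="body" dest="body_shingles"/>
  <copyField source="body" dest="body_edge"/>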

Steve

> On Oct 14, 2015, at 11:58 AM, vit  wrote:
> 
> I have Solr 4.2
> I need to do the following:
> 
> 1. white space tokenize
> 2. create shingles
> 3. use EdgeNGramFilter for each word in shingles, but not in a shingle as a
> string
> 
> So can I do this?
> 
> * *
> 
>  maxShingleSize="4" outputUnigrams="false" outputUnigramsIfNoShingles="true"
> />
> * *
>  maxGramSize="25"/>
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Can-I-use-tokenizer-twice-tp4234438.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Tokenize ShingleFilterFactory results and apply filters to tokens

2015-10-19 Thread Steve Rowe
Hi Vitaliy,

I don’t know of any combination of built-in Lucene/Solr analysis components 
that would do what you want, but there used to be filter called 
ShingleMatrixFilter that (if I understand both that filter and what you want 
correctly), would do what you want, following an EdgeNGramFilter: 
<https://lucene.apache.org/core/3_6_2/api/all/org/apache/lucene/analysis/shingle/ShingleMatrixFilter.html>

It was deprecated in v3.1 and removed in v4.0 (see 
<https://issues.apache.org/jira/browse/LUCENE-2920>) because it wasn’t being 
maintained by the original creator and nobody else understood it :).  Uwe 
Schindler put up a patch that rewrote it and fixed some problems on 
<https://issues.apache.org/jira/browse/LUCENE-1391>, but that was never 
finished/committed.

What you want could create a huge number of terms, depending on the # of 
documents, terms in the field, and term length.  What do you want to use these 
terms for?

Steve

> On Oct 17, 2015, at 10:33 AM, vitaly bulgakov  wrote:
> 
> /why don't you put EdgeNGramFilter just after ShingleFilter?/
> 
> Because it will do Edge Ngrams over a shingle as a string:
> for "Home Improvement" shingle it will do:  Hom, Home, Home , Home I,
> Home Im, Home Imp .. 
> 
> But I need:
> ... Hom Imp, Hom Impr ..
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Tokenize-ShingleFilterFactory-results-and-apply-filters-to-tokens-tp4234574p4234872.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: contributor request

2015-11-02 Thread Steve Rowe
Yes, sorry, the wiki took so long to come back after changing it to include 
Alex’s username that I forgot to send notification…  Thanks Erick.
 
> On Oct 31, 2015, at 11:27 PM, Erick Erickson  wrote:
> 
> Looks like Steve added you today, you should be all set.
> 
> On Sat, Oct 31, 2015 at 12:50 PM, Alex  wrote:
>> Oh, shoot, forgot to include my wiki username. Its "AlexYumas" sorry about
>> that stupid me
>> 
>> On Sat, Oct 31, 2015 at 10:48 PM, Alex  wrote:
>> 
>>> Hi,
>>> 
>>> Please kindly add me to the Solr wiki contributors list. The app we're
>>> developing (Jitbit Help) is using Apache Solr to power our knowledge-base
>>> search engine, customers love it. (we were using MS Fulltext indexing
>>> service before, but it's a huge PITA).
>>> 
>>> Thanks
>>> 



Re: how to change uniqueKey?

2015-11-04 Thread Steve Rowe
Hi Oleksandr,

> On Nov 3, 2015, at 9:24 AM, Oleksandr Yermolenko  wrote:
> 
> Hello, All,
> 
> I can't find the way to change uniqueKey in "managed-schema" environment!!!

[…]

> 7. The first and last question: what is the correct way to change uniqueKey in a 
> schemaless environment? What did I miss?

There is an open issue to provide this capability: 
https://issues.apache.org/jira/browse/SOLR-7242 but no work done on it yet.

Re: Solr 5: data_driven_schema_config's solrconfig causing error

2015-03-10 Thread Steve Rowe
Hi Aman,

The stack trace shows that the AddSchemaFieldsUpdateProcessorFactory specified 
in data_driven_schema_configs’s solrconfig.xml expects the “booleans” field 
type to exist.

Solr 5’s data_driven_schema_configs includes the “booleans” field type:

<http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_5_0_0/solr/server/solr/configsets/data_driven_schema_configs/conf/managed-schema?view=markup#l249>
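
For reference, the stock definition there is roughly:

  <fieldType name="booleans" class="solr.BoolField" sortMissingLast="true" multiValued="true"/>

Restoring that line (or pointing the update processor's typeMapping at a boolean type
that does exist in your schema) should clear the error.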

So you must have removed it when you modified the schema?  Did you do this 
intentionally?  If so, why?

Steve

> On Mar 10, 2015, at 5:25 AM, Aman Tandon  wrote:
> 
> Hi,
> 
> For the sake of using the new schema.xml and solrconfig.xml with solr 5, I
> put my old required field type & fields names (being used with solr 4.8.1)
> in the schema.xml given in *basic_configs* & configurations setting given
> in solrconfig.xml present in *data_driven_schema_configs*, and I put
> these configuration files in the configs of zookeeper.
> 
> But when i am creating the core it is giving the error as booleans
> fieldType is not found in schema. So correct me if i am doing something
> wrong.
> 
> ERROR - 2015-03-10 08:20:16.788; org.apache.solr.core.CoreContainer; Error
>> creating core [core1]: fieldType 'booleans' not found in the schema
>> org.apache.solr.common.SolrException: fieldType 'booleans' not found in
>> the schema
>> at org.apache.solr.core.SolrCore.(SolrCore.java:896)
>> at org.apache.solr.core.SolrCore.(SolrCore.java:662)
>> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:513)
>> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:488)
>> at
>> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:573)
>> at
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:197)
>> at
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
>> at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:736)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
>> at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>> at
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>> at
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>> at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>> at
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>> at
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>> at
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>> at
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>> at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>> at org.eclipse.jetty.server.Server.handle(Server.java:368)
>> at
>> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>> at
>> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>> at
>> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
>> at
>> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
>> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
>> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>> at
>> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>> at
>> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>> at
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>> at
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>> at java.lan

Re: Solr 5: data_driven_schema_config's solrconfig causing error

2015-03-11 Thread Steve Rowe
Hi Aman,

So you (randomly?) chose an example configset, commented out parts you didn’t 
understand, and now things don’t work?

… Maybe you should review the process you’re using?

Like, don’t start with a configset that will auto-populate the schema for you 
with guessed field types if you don’t want to do that.  (That’s the focus of 
the data_driven_schema_configs configset.)

AFAICT, what you’re trying to do is take a configset you’ve used in the past 
with an older version of Solr and get it to work with a newer Solr version.  If 
that’s so, perhaps you should start with a configset like 
sample_techproducts_configs?

Steve

> On Mar 11, 2015, at 1:05 PM, Aman Tandon  wrote:
> 
> I removed/commented it out as it was not understandable and not for our use.
> 
> With Regards
> Aman Tandon
> 
> On Tue, Mar 10, 2015 at 8:04 PM, Steve Rowe  wrote:
> 
>> Hi Aman,
>> 
>> The stack trace shows that the AddSchemaFieldsUpdateProcessorFactory
>> specified in data_driven_schema_configs’s solrconfig.xml expects the
>> “booleans” field type to exist.
>> 
>> Solr 5’s data_driven_schema_configs includes the “booleans” field type:
>> 
>> <
>> http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_5_0_0/solr/server/solr/configsets/data_driven_schema_configs/conf/managed-schema?view=markup#l249
>>> 
>> 
>> So you must have removed it when you modified the schema?  Did you do this
>> intentionally?  If so, why?
>> 
>> Steve
>> 
>>> On Mar 10, 2015, at 5:25 AM, Aman Tandon 
>> wrote:
>>> 
>>> Hi,
>>> 
>>> For the sake of using the new schema.xml and solrconfig.xml with solr 5,
>> I
>>> put my old required field type & fields names (being used with solr
>> 4.8.1)
>>> in the schema.xml given in *basic_configs* & configurations setting given
>>> in solrconfig.xml present in *data_driven_schema_configs*, and I put
>>> these configuration files in the configs of zookeeper.
>>> 
>>> But when i am creating the core it is giving the error as booleans
>>> fieldType is not found in schema. So correct me if i am doing something
>>> wrong.
>>> 
>>> ERROR - 2015-03-10 08:20:16.788; org.apache.solr.core.CoreContainer;
>> Error
>>>> creating core [core1]: fieldType 'booleans' not found in the schema
>>>> org.apache.solr.common.SolrException: fieldType 'booleans' not found in
>>>> the schema
>>>> at org.apache.solr.core.SolrCore.(SolrCore.java:896)
>>>> at org.apache.solr.core.SolrCore.(SolrCore.java:662)
>>>> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:513)
>>>> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:488)
>>>> at
>>>> 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:573)
>>>> at
>>>> 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:197)
>>>> at
>>>> 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
>>>> at
>>>> 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
>>>> at
>>>> 
>> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:736)
>>>> at
>>>> 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
>>>> at
>>>> 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
>>>> at
>>>> 
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>>>> at
>>>> 
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>>>> at
>>>> 
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>>>> at
>>>> 
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>>>> at
>>>> 
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>>>> at
>>>> 
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>>>> at
>>>> 
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>>>> at
>>>> 
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>>>> at
>>>> 
>> org.eclipse.jet

Re: schemaless slow indexing

2015-03-23 Thread Steve Rowe

> On Mar 23, 2015, at 11:51 AM, Alexandre Rafalovitch  
> wrote:
> For example, I am not even sure if we can create a copyField
> definition via REST API yet.





Re: schemaless slow indexing

2015-03-23 Thread Steve Rowe
> On Mar 23, 2015, at 11:09 AM, Yonik Seeley  wrote:
> 
> On Mon, Mar 23, 2015 at 1:54 PM, Alexandre Rafalovitch
>  wrote:
>> I looked at SOLR-7290, but I think the discussion should stay on the
>> mailing list for at least one more iteration.
>> 
>> My understanding that the reason copyField exists is so that a search
>> actually worked out of the box. Without knowing the field names, one
>> cannot say what to search.
> 
> Some points:
> - Schemaless is often just to make it easier to get started.
> - If one assumes a lack of knowledge of field names, that's an issue
> for non-schemaless too.
> - Full-text search is only one use-case that people use Solr for...
> there's lots of sorting/faceting/analytics use cases.

Under SOLR-6779, Erik Hatcher changed the data_driven_schema_configs's 
auto-guessed default field type from text_general to strings in order to 
support features other than full-text search:

<https://svn.apache.org/viewvc/lucene/dev/trunk/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml?r1=1648456&r2=1648455&pathrev=1648456>

It’s for exactly this reason (as Alex pointed out) that the catch-all field 
makes sense: there is no other full-text available.

Yonik, can you suggest a path that supports both these possibilities?  Because 
having zero fields with full text search in the default Solr configuration 
seems like a really bad idea to me.

Steve

Re: Add Entry to Support Page

2015-04-21 Thread Steve Rowe
Hi Christoph,

I’ve added your wiki name to the ContributorsGroup page, so you should now be 
able to edit pages on the wiki.

Steve
 
> On Apr 21, 2015, at 8:15 AM, Christoph Schmidt 
>  wrote:
> 
> Solr Community,
> 
> I’m Christoph Schmidt (http://www.moresophy.com/de/management), CEO of the 
> german company moresophy GmbH.
> 
> My Solr Wiki name is:
> 
>  
> 
> -  ChristophSchmidt
> 
>  
> 
> We are working with Lucene since 2003 and Solr 2012 and are building 
> linguistic token filters and plugins for Solr.
> 
> We would like to add the following entry to the Solr Support page:
> 
>  
> 
> moresophy GmbH: consulting in Lucene, Solr, elasticsearch, specialization in 
> linguistic and semantic enrichment and high scalability content clouds 
> (DE/AT/CH)  i...@moresophy.com
> 
>  
> 
> Best regards
> 
> Christoph Schmidt
> 
>  
> 
> ___
> 
> Dr. Christoph Schmidt | Geschäftsführer
> 
>  
> 
> P +49-89-523041-72
> 
> M +49-171-1419367
> 
> Skype: cs_moresophy
> 
> christoph.schm...@moresophy.de
> 
> www.moresophy.com
> 
> moresophy GmbH | Fraunhoferstrasse 15 | 82152 München-Martinsried
> 
>  
> 
> Handelsregister beim Amtsgericht München, NR. HRB 136075
> 
> Umsatzsteueridentifikationsnummer: DE813188826
> 
> Vertreten durch die Geschäftsführer: Prof. Dr. Heiko Beier | Dr. Christoph 
> Schmidt
> 
>  
> 
> 
>  
> 
> 
>  
> 



Re: Attributes in <field> and <fieldType>

2015-04-28 Thread Steve Rowe
Hi Steve,

From 
<https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties>:

> The properties that can be specified for a given field type fall into
> three major categories:
>   • Properties specific to the field type's class.
>   • General Properties Solr supports for any field type.
>   • Field Default Properties that can be specified on the field type
> that will be inherited by fields that use this type instead of
> the default behavior.

“indexed” and “stored” are among the Field Default Properties listed as 
specifiable on <fieldType>s.

<field> properties override <fieldType> properties, not the reverse.

Steve

> On Apr 28, 2015, at 9:25 AM, Steven White  wrote:
> 
> Hi Everyone,
> 
> Looking at the out-of-the box schema.xml of Solr 5.1, I see this:
> 
> class="solr.TextField" >
>  
> 
> Is it valid to have "stored" and "indexed" on <fieldType>?  My
> understanding is that those are on <field> only.  If not, does the value in
> <field> override what's in <fieldType>?
> 
> Thanks
> 
> Steve



Re: Schema API: add-field-type

2015-05-05 Thread Steve Rowe
Hi Steve, responses inline below:

> On Apr 29, 2015, at 6:50 PM, Steven White  wrote:
> 
> Hi Everyone,
> 
> When I pass the following:
> http://localhost:8983/solr/db/schema/fieldtypes?wt=xml
> 
> I see this (as one example):
> 
>  
>date
>solr.TrieDateField
>0
>0
>
>  last_modified
>
>
>  *_dts
>  *_dt
>
>  
> 
> See how there is "fields" and "dynamicfields"?  However, when I look in
> schema.xml, I see this:
> 
>  <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
> 
> See how there is nothing about "fields" and "dynamicfields".
> 
> Now, when I look further into the schema.xml, I see they are coming from:
> 
>  
>  
>   multiValued="true"/>
> 
> So it all makes sense.
> 
> Does this means the response of "fieldtypes" includes "fields" and
> "dynamicfields" as syntactic-sugar to let me know of the relationship this
> field-type has or is there more to it?

It’s FYI: this is the full list of fields and dynamic fields that use the given 
fieldtype.

> The reason why I care about this question is because I'm using Solr's
> Schema API (see: https://cwiki.apache.org/confluence/display/solr/Schema+API)
> to make changes to my schema.  Per this link:
> https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-AddaNewFieldType
> it shows how to add a field-type via "add-field-type" but there is no
> mention of "fields" or "dynamicfields" in this API.  My assumption is
> "fields" and "dynamicfields" need not be part of this API, instead it is
> done via "add-field" and "add-dynamic-field", thus what I see in the XML of
> "fieldtypes" response is just syntactic-sugar.  Did I get all this right?
> 

Yes, as you say, to add (dynamic) fields after adding a field type, you must 
use the “add-field” and “add-dynamic-field” commands.  Note that you can do so 
in a single request if you like, as long as “add-field-type” is ordered before 
any referencing “add-field”/“add-dynamic-field” command.

To be clear, the “add-field-type” command does not support passing in a set of 
fields and/or dynamic fields to be added with the new field type.

Steve



Re: A defect in Schema API with Add a New Copy Field Rule?

2015-05-06 Thread Steve Rowe
Hi Steve,

It’s by design that you can copyField the same source/dest multiple times - 
according to Yonik (not sure where this was discussed), this capability has 
been used in the past to effectively boost terms in the source field.  

The API isn’t symmetric here though: I’m guessing deleting a mutiply specified 
copy field rule will delete all of them, but this isn’t tested, so I’m not sure.

There is no replace-copy-field command because copy field rules don’t have 
dependencies (i.e., nothing else in the schema refers to copy field rules), 
unlike fields, dynamic fields and field types, so 
delete-copy-field/add-copy-field works as one would expect.

For fields, dynamic fields and field types, a delete followed by an add is not 
the same as a replace, since (dynamic) fields could have dependent copyFields, 
and field types could have dependent (dynamic) fields.  delete-* commands are 
designed to fail if there are any existing dependencies, while the replace-* 
commands will maintain the dependencies if they exist.
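
So a “replace” of a copy field rule is just a delete followed by an add, e.g.
(core and field names below are made up):

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "delete-copy-field": { "source": "title", "dest": "text" },
  "add-copy-field": { "source": "title", "dest": "text", "maxChars": 256 }
}' http://localhost:8983/solr/mycore/schema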

Steve

> On May 6, 2015, at 6:44 PM, Steven White  wrote:
> 
> Hi Everyone,
> 
> I am using the Schema API to add a new copy field per:
> https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-AddaNewCopyFieldRule
> 
> Unlike the other "Add" APIs, this one will not fail if you add an existing
> copy field object.  In fact, after when I call the API over and over, the
> item will appear over and over in schema.xml file like so:
> 
>  
>  
>  
>  
> 
> Is this the expected behaviour or a bug?  As a side question, is there any
> harm in having multiple "copyField" like I ended up with?
> 
> A final question, why there is no Replace a Copy Field?  Is this by design
> for some limitation or was the API just never implemented?
> 
> Thanks
> 
> Steve



Re: A defect in Schema API with Add a New Copy Field Rule?

2015-05-07 Thread Steve Rowe

> On May 6, 2015, at 8:25 PM, Yonik Seeley  wrote:
> 
> On Wed, May 6, 2015 at 8:10 PM, Steve Rowe  wrote:
>> It’s by design that you can copyField the same source/dest multiple times - 
>> according to Yonik (not sure where this was discussed), this capability has 
>> been used in the past to effectively boost terms in the source field.
> 
> Yep, used to be relatively common.
> Perhaps the API could be cleaner though if we supported that by
> passing an optional "numTimes" or "numCopies"?  Seems like a sane
> delete / overwrite options would thus be easier?

+1

Re: schema modification issue

2015-05-11 Thread Steve Rowe
Hi,

Thanks for reporting, I’m working on a test to reproduce.  

Can you please create a Solr JIRA issue for this?:  
https://issues.apache.org/jira/browse/SOLR/

Thanks,
Steve

> On May 7, 2015, at 5:40 AM, User Zolr  wrote:
> 
> Hi there,
> 
> I have come accross a problem that  when using managed schema in SolrCloud,
> adding fields into schema would SOMETIMES end up prompting "Can't find
> resource 'schema.xml' in classpath or '/configs/collectionName',
> cwd=/export/solr/solr-5.1.0/server", there is of course no schema.xml in
> configs, but 'schema.xml.bak' and 'managed-schema'
> 
> i use solrj to create a collection:
> 
>Path tempPath = getConfigPath();
> client.uploadConfig(tempPath, name); //customized configs with
> solrconfig.xml using ManagedIndexSchemaFactory
> if(numShards==0){
> numShards = getNumNodes(client);
> }
> Create request = new CollectionAdminRequest.Create();
> request.setCollectionName(name);
> request.setNumShards(numShards);
> replicationFactor =
> (replicationFactor==0?DEFAULT_REPLICA_FACTOR:replicationFactor);
> request.setReplicationFactor(replicationFactor);
> request.setMaxShardsPerNode(maxShardsPerNode==0?replicationFactor:maxShardsPerNode);
> CollectionAdminResponse response = request.process(client);
> 
> 
> and adding fields to schema, either by curl or by httpclient,  would
> sometimes yield the following error, but the error can be fixed by
> RELOADING the newly created collection once or several times:
> 
> INFO  - [{  "responseHeader":{"status":500,"QTime":5},
> "errors":["Error reading input String Can't find resource 'schema.xml' in
> classpath or '/configs/collectionName',
> cwd=/export/solr/solr-5.1.0/server"],  "error":{"msg":"Can't find
> resource 'schema.xml' in classpath or '/configs/collectionName',
> cwd=/export/solr/solr-5.1.0/server","trace":"java.io.IOException: Can't
> find resource 'schema.xml' in classpath or '/configs/collectionName',
> cwd=/export/solr/solr-5.1.0/server
> 
> at
> org.apache.solr.cloud.ZkSolrResourceLoader.openResource(ZkSolrResourceLoader.java:98)
> at
> org.apache.solr.schema.SchemaManager.getFreshManagedSchema(SchemaManager.java:421)
> at org.apache.solr.schema.SchemaManager.doOperations(SchemaManager.java:104)
> at
> org.apache.solr.schema.SchemaManager.performOperations(SchemaManager.java:94)
> at
> org.apache.solr.handler.SchemaHandler.handleRequestBody(SchemaHandler.java:57)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
> at org.ec

Re: schema.xml & xi:include -> copyField source :'_my_title' is not a glob and doesn't match any explicit field or dynamicField

2015-05-15 Thread Steve Rowe
Hi Clemens,

I think the problem is the structure of the composite schema - you’ll end up 
with:

  <schema>    <- your other schema file
    <schema>  <- the included schema-common.xml

One fix is to remove the <schema> tags from your schema-common.xml.  You won’t be able to use 
it alone in that case, but if you need to do that, you could just create 
another schema file that includes it inside wrapping <schema> tags.

Steve

> On May 15, 2015, at 4:01 AM, Clemens Wyss DEV  wrote:
> 
> Given the following schema.xml
> 
> 
>  _my_id
>  
>  
>  
> stored="true" type="string"/>
>  stored="true" type="string"/>
> type="string"/> 
>  
>   
>
>  positionIncrementGap="0" precisionStep="0"/>
>  
> 
> 
> When I try to include the very schema from another schema file, e.g.:
> 
> 
>   <xi:include href="schema-common.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/> 
> 
> 
> I get SolrException
> copyField source :'_my_title' is not a glob and doesn't match any explicit 
> field or dynamicField
> 
> Am I facing a bug or a feature?
> 
> Thanks
> - Clemens



Re: schema.xml & xi:include -> copyField source :'_my_title' is not a glob and doesn't match any explicit field or dynamicField

2015-05-15 Thread Steve Rowe
Hi Clemens,

I forgot that XInclude requires well-formed XML, so schema-common.xml without 
<schema> tags won’t work, since it will have multiple root elements.

But instead of XInclude, you can define external entities for files you want to 
include, and then include a reference to them where you want the contents to be 
included.

This worked for me:

——
schema.xml
——

 ]>

  &schema_common;

——

——
schema-common.incl
——
 _my_id
 
 
 
   
   

 
  
   

 
——

Here’s what I get back from curl 
"http://localhost:8983/solr/mycore/schema?wt=schema.xml&indent=on”:

——


  _my_id
  
  
  
  
  
  
  

——
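
Spelling the idea out once more, a rough sketch of the entity-inclusion
approach (schema name, version and field definitions are placeholders):

-----
schema.xml
-----
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE schema [
  <!ENTITY schema_common SYSTEM "schema-common.incl">
]>
<schema name="example" version="1.5">
  &schema_common;
</schema>
-----

-----
schema-common.incl
-----
<uniqueKey>_my_id</uniqueKey>
<fieldType name="string" class="solr.StrField"/>
<field name="_my_id" type="string" indexed="true" stored="true"/>
<field name="_my_title" type="string" indexed="true" stored="true"/>
-----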

Steve

> On May 15, 2015, at 8:57 AM, Clemens Wyss DEV  wrote:
> 
> Thought about that too (should have written ;) ).
> When I remove the schema-tag from the composite xml I get:
> org.apache.solr.common.SolrException: Unable to create core [test]
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:533)
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:493)
> ...
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
> Caused by: org.apache.solr.common.SolrException: Could not load conf for core 
> test: org.apache.solr.common.SolrException: org.xml.sax.SAXParseException; 
> systemId: solrres:/schema.xml; lineNumber: 3; columnNumber: 84; Error 
> attempting to parse XML file (href='schema-common.xml').. Schema file is 
> C:\source\search\search-impl\WebContent\WEB-INF\solr\configsets\test\conf\schema.xml
>   at 
> org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:78)
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:516)
>   ... 12 more
> Caused by: com.google.common.util.concurrent.UncheckedExecutionException: 
> org.apache.solr.common.SolrException: org.xml.sax.SAXParseException; 
> systemId: solrres:/schema.xml; lineNumber: 3; columnNumber: 84; Error 
> attempting to parse XML file (href='schema-common.xml').. Schema file is 
> C:\source\search\search-impl\WebContent\WEB-INF\solr\configsets\test\conf\schema.xml
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2199)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
>   at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4721)
>   at 
> org.apache.solr.core.ConfigSetService$SchemaCaching.createIndexSchema(ConfigSetService.java:206)
>   at 
> org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:74)
>   ... 13 more
> Caused by: org.apache.solr.common.SolrException: 
> org.xml.sax.SAXParseException; systemId: solrres:/schema.xml; lineNumber: 3; 
> columnNumber: 84; Error attempting to parse XML file 
> (href='schema-common.xml').. Schema file is 
> C:\source\search\search-impl\WebContent\WEB-INF\solr\configsets\test\conf\schema.xml
>   at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:596)
>   at org.apache.solr.schema.IndexSchema.(IndexSchema.java:175)
>   at 
> org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
>   at 
> org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
>   at 
> org.apache.solr.core.ConfigSetService$SchemaCaching$1.call(ConfigSetService.java:210)
>   at 
> org.apache.solr.core.ConfigSetService$SchemaCaching$1.call(ConfigSetService.java:206)
>   at 
> com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4724)
>   at 
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
>   ... 17 more
> Caused by: org.apache.solr.common.SolrException: 
> org.xml.sax.SAXParseException; systemId: solrres:/schema.xml; lineNumber: 3; 
> columnNumber: 84; Error attempting to parse XML file 
> (href='schema-common.xml').
>   at org.apache.solr.core.Config.(Config.java:156)
>   at org.apache.solr.core.Config.(Config.java:92)
>   at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:455)
>   ... 27 more
> Caused by: org.xml.sax.SAXParseException; systemId: solrres:/schema.xml; 
> lineNumber: 3; columnNumber: 84; Error attempting to parse XML file 
> (href='schema-common.xml').
>

Ability to load solrcore.properties from zookeeper

2015-05-27 Thread Steve Davids
I am attempting to override some properties in my solrconfig.xml file by
specifying properties in a solrcore.properties file which is uploaded in
Zookeeper's collections/conf directory, though when I go to create a new
collection those properties are never loaded. One work-around is to specify
properties at collection creation time but then there doesn't seem to be an
easy way of updating those properties cluster-wide, I did attempt to
specify a request parameter of 'property.properties=solrcore.properties' in
the collection creation request but that also fails.

Does anyone have any ideas on how I can get this working? If not the
solrcore.properties route, really I just need any way to specify properties
within ZK that all replicas can pickup and read + update itself
appropriately.

Thanks,

-Steve


Re: Deleting Fields

2015-05-30 Thread Steve Rowe
Hi Joseph,

> On May 30, 2015, at 8:18 AM, Joseph Obernberger  
> wrote:
> 
> Thank you Erick.  I was thinking that it actually went through and removed 
> the index data; that you for the clarification.

I added more info to the Schema API page about this not being true.  Here’s 
what I’ve got so far - let me know if you think we should add more warnings 
about this:

-
Re-index after schema modifications!

If you modify your schema, you will likely need to re-index all documents. If 
you do not, you may lose access to documents, or not be able to interpret them 
properly, e.g. after replacing a field type.

Modifying your schema will never modify any documents that are already indexed. 
Again, you must re-index documents in order to apply schema changes to them.

[…]

When modifying the schema with the API, a core reload will automatically occur 
in order for the changes to be available immediately for documents indexed 
thereafter.  Previously indexed documents will not be automatically handled - 
they must be re-indexed if they used schema elements that you changed.
-

Steve

Re: Ability to load solrcore.properties from zookeeper

2015-05-30 Thread Steve Davids
Sorry for not responding back earlier, I went ahead and created a ticket
here:

https://issues.apache.org/jira/browse/SOLR-7613

It does look somewhat trivial if you just update the current loading
mechanism as Chris describes, I can provide a patch for that if you want.
Though, if you want to go the refactoring route I can leave it to Alan to
take a crack at it.

Thanks,

-Steve

On Fri, May 29, 2015 at 3:29 AM, Alan Woodward  wrote:

> Yeah, you could do it like that.  But looking at it further, I think
> solrcore.properties is actually being loaded in entirely the wrong place -
> it should be done by whatever is creating the CoreDescriptor, and then
> passed in as a Properties object to the CD constructor.  At the moment, you
> can't refer to a property defined in solrcore.properties within your
> core.properties file.
>
> I'll open a JIRA if Steve hasn't already done so
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 28 May 2015, at 17:57, Chris Hostetter wrote:
>
> >
> > : certainly didn't intend to write it like this!).  The problem here will
> > : be that CoreDescriptors are currently built entirely from
> > : core.properties files, and the CoreLocators that construct them don't
> > : have any access to zookeeper.
> >
> > But they do have access to the CoreContainer which is passed to the
> > CoreDescriptor constructor -- it has all the ZK access you'd need at the
> > time when loadExtraProperties() is called.
> >
> > correct?
> >
> > as fleshed out in my last emil...
> >
> > : > patch:  IIUC CoreDescriptor.loadExtraProperties is the relevent
> method ...
> > : > it would need to build up the path including the core name and get
> the
> > : > system level resource loader (CoreContainer.getResourceLoader()) to
> access
> > : > it since the core doesn't exist yet so there is no core level
> > : > ResourceLoader to use.
> >
> >
> > -Hoss
> > http://www.lucidworks.com/
>
>


Re: ManagedStopFilterFactory not accepting ignoreCase

2015-06-17 Thread Steve Rowe
Hi Mike,

Looks like a bug to me - would you please create a JIRA?

Thanks,
Steve

> On Jun 17, 2015, at 10:29 AM, Mike Thomsen  wrote:
> 
> We're running Solr 4.10.4 and getting this...
> 
> Caused by: java.lang.IllegalArgumentException: Unknown parameters:
> {ignoreCase=true}
>at
> org.apache.solr.rest.schema.analysis.BaseManagedTokenFilterFactory.(BaseManagedTokenFilterFactory.java:46)
>at
> org.apache.solr.rest.schema.analysis.ManagedStopFilterFactory.(ManagedStopFilterFactory.java:47)
> 
> This is the filter definition I used:
> 
>   ignoreCase="true"
>  managed="english"/>
> 
> Any ideas?
> 
> Thanks,
> 
> Mike



Re: ManagedStopFilterFactory not accepting ignoreCase

2015-06-17 Thread Steve Rowe
Oh, I see you already did :) - thanks. - Steve

> On Jun 17, 2015, at 11:10 AM, Steve Rowe  wrote:
> 
> Hi Mike,
> 
> Looks like a bug to me - would you please create a JIRA?
> 
> Thanks,
> Steve
> 
>> On Jun 17, 2015, at 10:29 AM, Mike Thomsen  wrote:
>> 
>> We're running Solr 4.10.4 and getting this...
>> 
>> Caused by: java.lang.IllegalArgumentException: Unknown parameters:
>> {ignoreCase=true}
>>   at
>> org.apache.solr.rest.schema.analysis.BaseManagedTokenFilterFactory.(BaseManagedTokenFilterFactory.java:46)
>>   at
>> org.apache.solr.rest.schema.analysis.ManagedStopFilterFactory.(ManagedStopFilterFactory.java:47)
>> 
>> This is the filter definition I used:
>> 
>> > ignoreCase="true"
>> managed="english"/>
>> 
>> Any ideas?
>> 
>> Thanks,
>> 
>> Mike
> 



Re: MappingCharFilterFactory and start and end offsets

2015-06-18 Thread Steve Rowe
Hi Dmitry,

It’s weird that start and end offsets are the same - what do you see for the 
start/end of ‘$’, i.e. if you take out MCFF?  (I think it should be start:5, 
end:6.)

As far as offsets “respecting the remapped token”, are you asking for offsets 
to be set as if ‘dollarsign' were part of the original text?  If so, there is 
no setting that would do that - the intent is for offsets to map to the 
*original* text.  You can work around this by performing the substitution prior 
to Solr analysis, e.g. in an update processor like RegexReplaceProcessorFactory.
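
A sketch of that kind of chain in solrconfig.xml (the chain and field names are
made up):

<updateRequestProcessorChain name="replace-dollar">
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">text</str>
    <str name="pattern">\$</str>
    <str name="replacement"> dollarsign </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

and then select it per request with update.chain=replace-dollar (or make it the
default chain on your update handler).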

Steve
www.lucidworks.com

> On Jun 18, 2015, at 3:07 AM, Dmitry Kan  wrote:
> 
> Hi,
> 
> It looks like MappingCharFilter sets start and end offset to the same
> value. Can this be affected on by some setting?
> 
> For a string: test $ test2 and mapping "$" => " dollarsign " (we insert
> extra space to separate $ into its own token)
> 
> we get: http://snag.gy/eJT1H.jpg
> 
> Ideally, we would like to have start and end offset respecting the remapped
> token. Can this be achieved with settings?
> 
> -- 
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info



Re: Help: Problem in customized token filter

2015-06-18 Thread Steve Rowe
Hi Aman,

The admin UI screenshot you linked to is from an older version of Solr - what 
version are you using?

Lots of extraneous angle brackets and asterisks got into your email and made 
for a bunch of cleanup work before I could read or edit it.  In the future, 
please put your code somewhere people can easily read it and copy/paste it into 
an editor: into a github gist or on a paste service, etc.

Looks to me like your use of “exhausted” is unnecessary, and is likely the 
cause of the problem you saw (only one document getting processed): you never 
set exhausted to false, and when the filter got reused, it incorrectly carried 
state from the previous document.

Here’s a simpler version that’s hopefully more correct and more efficient (2 
fewer copies from the StringBuilder to the final token).  Note: I didn’t test 
it:

https://gist.github.com/sarowe/9b9a52b683869ced3a17

Steve
www.lucidworks.com

> On Jun 18, 2015, at 11:33 AM, Aman Tandon  wrote:
> 
> Please help, what wrong I am doing here. please guide me.
> 
> With Regards
> Aman Tandon
> 
> On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon 
> wrote:
> 
>> Hi,
>> 
>> I created a *token concat filter* to concat all the tokens from token
>> stream. It creates the concatenated token as expected.
>> 
>> But when I am posting the xml containing more than 30,000 documents, then
>> only first document is having the data of that field.
>> 
>> *Schema:*
>> 
>> *>> required="false" omitNorms="false" multiValued="false" />*
>> 
>> 
>> 
>> 
>> 
>> 
>>> *>> positionIncrementGap="100">*
>>> *  *
>>> **
>>> **
>>> *>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>*
>>> **
>>> *>> outputUnigrams="true" tokenSeparator=""/>*
>>> *>> language="English" protected="protwords.txt"/>*
>>> *>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>>> *>> synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
>>> expand="true"/>*
>>> *  *
>>> *  *
>>> **
>>> *>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>*
>>> *>> words="stopwords_text_prime_search.txt" enablePositionIncrements="true" />*
>>> *>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>*
>>> **
>>> *>> language="English" protected="protwords.txt"/>*
>>> *>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>>> *  ***
>> 
>> 
>> Please help me, The code for the filter is as follows, please take a look.
>> 
>> Here is the picture of what filter is doing
>> <http://i.imgur.com/THCsYtG.png?1>
>> 
>> The code of concat filter is :
>> 
>> *package com.xyz.analysis.concat;*
>>> 
>>> *import java.io.IOException;*
>>> 
>>> 
>>>> *import org.apache.lucene.analysis.TokenFilter;*
>>> 
>>> *import org.apache.lucene.analysis.TokenStream;*
>>> 
>>> *import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;*
>>> 
>>> *import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;*
>>> 
>>> *import
>>>> org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;*
>>> 
>>> *import org.apache.lucene.analysis.tokenattributes.TypeAttribute;*
>>> 
>>> 
>>>> *public class ConcatenateWordsFilter extends TokenFilter {*
>>> 
>>> 
>>>> *  private CharTermAttribute charTermAttribute =
>>>> addAttribute(CharTermAttribute.class);*
>>> 
>>> *  private OffsetAttribute offsetAttribute =
>>>> addAttribute(OffsetAttribute.class);*
>>> 
>>> *  PositionIncrementAttribute posIncr =
>>>> addAttribute(PositionIncrementAttribute.class);*
>>> 
>>> *  TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);*
>>> 
>>> 
>>

Re: Help: Problem in customized token filter

2015-06-18 Thread Steve Rowe
Aman,

My version won’t produce anything at all, since incrementToken() always returns 
false…

I updated the gist (at the same URL) to fix the problem by returning true from 
incrementToken() once and then false until reset() is called.  It also handles 
the case when the concatenated token is zero length by not emitting a token.
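
In outline, the fix looks something like this (a sketch along the same lines,
not the exact gist code; offset/position/type handling omitted for brevity):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class ConcatenateWordsFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final StringBuilder buffer = new StringBuilder();
  private boolean done = false; // must be cleared in reset(), unlike the original "exhausted" flag

  public ConcatenateWordsFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (done) {
      return false;
    }
    // Drain the whole input stream, appending each term to the buffer.
    while (input.incrementToken()) {
      buffer.append(termAtt.buffer(), 0, termAtt.length());
    }
    done = true;
    if (buffer.length() == 0) {
      return false; // nothing to emit for an empty stream
    }
    clearAttributes();
    termAtt.setEmpty().append(buffer);
    return true; // emit the single concatenated token, then false until reset()
  }

  @Override
  public void reset() throws IOException {
    super.reset();     // resets the wrapped stream
    done = false;      // forgetting this is what froze the original filter
    buffer.setLength(0);
  }
}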

Steve
www.lucidworks.com

> On Jun 19, 2015, at 12:55 AM, Steve Rowe  wrote:
> 
> Hi Aman,
> 
> The admin UI screenshot you linked to is from an older version of Solr - what 
> version are you using?
> 
> Lots of extraneous angle brackets and asterisks got into your email and made 
> for a bunch of cleanup work before I could read or edit it.  In the future, 
> please put your code somewhere people can easily read it and copy/paste it 
> into an editor: into a github gist or on a paste service, etc.
> 
> Looks to me like your use of “exhausted” is unnecessary, and is likely the 
> cause of the problem you saw (only one document getting processed): you never 
> set exhausted to false, and when the filter got reused, it incorrectly 
> carried state from the previous document.
> 
> Here’s a simpler version that’s hopefully more correct and more efficient (2 
> fewer copies from the StringBuilder to the final token).  Note: I didn’t test 
> it:
> 
>https://gist.github.com/sarowe/9b9a52b683869ced3a17
> 
> Steve
> www.lucidworks.com
> 
>> On Jun 18, 2015, at 11:33 AM, Aman Tandon  wrote:
>> 
>> Please help, what wrong I am doing here. please guide me.
>> 
>> With Regards
>> Aman Tandon
>> 
>> On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon 
>> wrote:
>> 
>>> Hi,
>>> 
>>> I created a *token concat filter* to concat all the tokens from token
>>> stream. It creates the concatenated token as expected.
>>> 
>>> But when I am posting the xml containing more than 30,000 documents, then
>>> only first document is having the data of that field.
>>> 
>>> *Schema:*
>>> 
>>> *>>> required="false" omitNorms="false" multiValued="false" />*
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> *>>> positionIncrementGap="100">*
>>>> *  *
>>>> **
>>>> **
>>>> *>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>*
>>>> **
>>>> *>>> outputUnigrams="true" tokenSeparator=""/>*
>>>> *>>> language="English" protected="protwords.txt"/>*
>>>> *>>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>>>> *>>> synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
>>>> expand="true"/>*
>>>> *  *
>>>> *  *
>>>> **
>>>> *>>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>*
>>>> *>>> words="stopwords_text_prime_search.txt" enablePositionIncrements="true" />*
>>>> *>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>*
>>>> **
>>>> *>>> language="English" protected="protwords.txt"/>*
>>>> *>>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>>>> *  ***
>>> 
>>> 
>>> Please help me, The code for the filter is as follows, please take a look.
>>> 
>>> Here is the picture of what filter is doing
>>> <http://i.imgur.com/THCsYtG.png?1>
>>> 
>>> The code of concat filter is :
>>> 
>>> *package com.xyz.analysis.concat;*
>>>> 
>>>> *import java.io.IOException;*
>>>> 
>>>> 
>>>>> *import org.apache.lucene.analysis.TokenFilter;*
>>>> 
>>>> *import org.apache.lucene.analysis.TokenStream;*
>>>> 
>>>> *import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;*
>>>> 
>>>> *import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;*
>

Re: Help: Problem in customized token filter

2015-06-18 Thread Steve Rowe
Aman,

Solr uses the same Token filter instances over and over, calling reset() before 
sending each document through.  Your code sets “exhausted" to true and then 
never sets it back to false, so the next time the token filter instance is 
used, its “exhausted" value is still true, so no input stream tokens are 
concatenated ever again.

Does that make sense?

Steve
www.lucidworks.com

> On Jun 19, 2015, at 1:10 AM, Aman Tandon  wrote:
> 
> Hi Steve,
> 
> 
>> you never set exhausted to false, and when the filter got reused, *it
>> incorrectly carried state from the previous document.*
> 
> 
> Thanks for replying, but I am not able to understand this.
> 
> With Regards
> Aman Tandon
> 
> On Fri, Jun 19, 2015 at 10:25 AM, Steve Rowe  wrote:
> 
>> Hi Aman,
>> 
>> The admin UI screenshot you linked to is from an older version of Solr -
>> what version are you using?
>> 
>> Lots of extraneous angle brackets and asterisks got into your email and
>> made for a bunch of cleanup work before I could read or edit it.  In the
>> future, please put your code somewhere people can easily read it and
>> copy/paste it into an editor: into a github gist or on a paste service, etc.
>> 
>> Looks to me like your use of “exhausted” is unnecessary, and is likely the
>> cause of the problem you saw (only one document getting processed): you
>> never set exhausted to false, and when the filter got reused, it
>> incorrectly carried state from the previous document.
>> 
>> Here’s a simpler version that’s hopefully more correct and more efficient
>> (2 fewer copies from the StringBuilder to the final token).  Note: I didn’t
>> test it:
>> 
>>https://gist.github.com/sarowe/9b9a52b683869ced3a17
>> 
>> Steve
>> www.lucidworks.com
>> 
>>> On Jun 18, 2015, at 11:33 AM, Aman Tandon 
>> wrote:
>>> 
>>> Please help, what wrong I am doing here. please guide me.
>>> 
>>> With Regards
>>> Aman Tandon
>>> 
>>> On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon 
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I created a *token concat filter* to concat all the tokens from token
>>>> stream. It creates the concatenated token as expected.
>>>> 
>>>> But when I am posting the xml containing more than 30,000 documents,
>> then
>>>> only first document is having the data of that field.
>>>> 
>>>> *Schema:*
>>>> 
>>>> *>>>> required="false" omitNorms="false" multiValued="false" />*
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> *>>>> positionIncrementGap="100">*
>>>>> *  *
>>>>> **
>>>>> **
>>>>> *>>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>*
>>>>> **
>>>>> *>>>> outputUnigrams="true" tokenSeparator=""/>*
>>>>> *>>>> language="English" protected="protwords.txt"/>*
>>>>> *>>>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>>>>> *>>>> synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
>>>>> expand="true"/>*
>>>>> *  *
>>>>> *  *
>>>>> **
>>>>> *>>>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>*
>>>>> *>>>> words="stopwords_text_prime_search.txt"
>> enablePositionIncrements="true" />*
>>>>> *>>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>*
>>>>> **
>>>>> *>>>> language="English" protected="protwords.txt"/>*
>>>>> *>>>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>>>>> *  ***
>>>> 
>>>> 
>>>> Pleas

Re: accent insensitive field-type

2015-07-02 Thread Steve Rowe
Hi Søren,

“charFilter” should be “charFilters”, and “filter” should be “filters”; and 
both their values should be arrays - try this:

{
  "add-field-type”: {
"name":"myTxtField",
"class":"solr.TextField",
"positionIncrementGap":"100",
"analyzer”: {
  "charFilters": [ {"class":"solr.MappingCharFilterFactory", 
"mapping":"mapping-ISOLatin1Accent.txt”} ],
  "tokenizer": [ {"class":"solr.StandardTokenizerFactory”} ],
  "filters": {"class":"solr.LowerCaseFilterFactory"}
}
  }
}

There should be better error messages for misspellings here.  I’ll file a JIRA 
issue.

(I also moved “filters” after “tokenizer” since that’s the order in which 
they’re executed in an analysis pipeline, but Solr will interpret the 
out-of-order version correctly.)

FYI, if you want to *correct* a field type, rather than create a new one, you 
should use the “replace-field-type” command instead of the “add-field-type” 
command.  You’ll get an error if you attempt to add a field type that already 
exists in the schema.

Steve

> On Jul 2, 2015, at 1:17 AM, Søren  wrote:
> 
> Hi Solr users
> 
> I'm new to Solr and I need to be able to search in structured data in a case 
> and accent insensitive manner. E.g. find "Crème brûlée", both when quering 
> with "Crème brûlée" and "creme brulee".
> 
> It seems that none of the build-in text types support this, or am I wrong?
> So I try to add my own inspired by another post, although it was old.
> 
> I'm running solr-5.2.1.
> 
> Curl to http://localhost:8983/solr/mycore/schema
> {
> "add-field-type":{
> "name":"myTxtField",
> "class":"solr.TextField",
> "positionIncrementGap":"100",
> "analyzer":{
>"charFilter": {"class":"solr.MappingCharFilterFactory", 
> "mapping":"mapping-ISOLatin1Accent.txt"},
>"filter": {"class":"solr.LowerCaseFilterFactory"},
>"tokenizer": {"class":"solr.StandardTokenizerFactory"}
>}
>}
> }
> 
> But it doesn't work and when I look in '[... 
> ]\solr-5.2.1\server\solr\mycore\conf\managed-schema'
> the analyzer section is reduced to this:
>   positionIncrementGap="100">
>
>  
>
>  
> 
> I'm I almost there or am I on a completely wrong track?
> 
> Thanks in advance
> Søren
> 



Re: accent insensitive field-type

2015-07-02 Thread Steve Rowe
See https://issues.apache.org/jira/browse/SOLR-7749

> On Jul 2, 2015, at 8:31 AM, Steve Rowe  wrote:
> 
> Hi Søren,
> 
> “charFilter” should be “charFilters”, and “filter” should be “filters”; and 
> both their values should be arrays - try this:
> 
> {
>  "add-field-type”: {
>"name":"myTxtField",
>"class":"solr.TextField",
>"positionIncrementGap":"100",
>"analyzer”: {
>  "charFilters": [ {"class":"solr.MappingCharFilterFactory", 
> "mapping":"mapping-ISOLatin1Accent.txt”} ],
>  "tokenizer": [ {"class":"solr.StandardTokenizerFactory”} ],
>  "filters": {"class":"solr.LowerCaseFilterFactory"}
>}
>  }
> }
> 
> There should be better error messages for misspellings here.  I’ll file a 
> JIRA issue.
> 
> (I also moved “filters” after “tokenizer” since that’s the order in which 
> they’re executed in an analysis pipeline, but Solr will interpret the 
> out-of-order version correctly.)
> 
> FYI, if you want to *correct* a field type, rather than create a new one, you 
> should use the “replace-field-type” command instead of the “add-field-type” 
> command.  You’ll get an error if you attempt to add a field type that already 
> exists in the schema.
> 
> Steve
> 
>> On Jul 2, 2015, at 1:17 AM, Søren  wrote:
>> 
>> Hi Solr users
>> 
>> I'm new to Solr and I need to be able to search in structured data in a case 
>> and accent insensitive manner. E.g. find "Crème brûlée", both when quering 
>> with "Crème brûlée" and "creme brulee".
>> 
>> It seems that none of the build-in text types support this, or am I wrong?
>> So I try to add my own inspired by another post, although it was old.
>> 
>> I'm running solr-5.2.1.
>> 
>> Curl to http://localhost:8983/solr/mycore/schema
>> {
>> "add-field-type":{
>>"name":"myTxtField",
>>"class":"solr.TextField",
>>"positionIncrementGap":"100",
>>"analyzer":{
>>   "charFilter": {"class":"solr.MappingCharFilterFactory", 
>> "mapping":"mapping-ISOLatin1Accent.txt"},
>>   "filter": {"class":"solr.LowerCaseFilterFactory"},
>>   "tokenizer": {"class":"solr.StandardTokenizerFactory"}
>>   }
>>   }
>> }
>> 
>> But it doesn't work and when I look in '[... 
>> ]\solr-5.2.1\server\solr\mycore\conf\managed-schema'
>> the analyzer section is reduced to this:
>> > positionIncrementGap="100">
>>   
>> 
>>   
>> 
>> 
>> I'm I almost there or am I on a completely wrong track?
>> 
>> Thanks in advance
>> Søren
>> 
> 



Re: accent insensitive field-type

2015-07-03 Thread Steve Rowe
Hi Søren,

> On Jul 3, 2015, at 4:27 AM, Søren  wrote:
> 
> Thanks Steve! Everything works now.
> A little modification:
> 
>"analyzer":{
>"charFilters": [ {"class":"solr.MappingCharFilterFactory", 
> "mapping":"mapping-ISOLatin1Accent.txt"} ],
>"tokenizer": {"class":"solr.StandardTokenizerFactory"},
>"filters": [{"class":"solr.LowerCaseFilterFactory"}]
>}

I’m glad you got it to work.

Yeah, I put square brackets in the wrong place, cool you figured it out and 
fixed it.

> Thankfully, when key is a plural word, the value is an array.
> 
> It was still teasing me when I tested with various queries. But specifying 
> field solved that for me too.
> 
> ...q=brulee   didn't find anything. It goes into to the raw index I guess
> 
> ...q=desert:brulee   did find "Crème brûlée”!

In your query request handler you should specify a “df” param (default field), 
likely under the defaults section (so that it can be overridden via per-request 
param) - this param will work with the dismax, edismax, or standard query 
parsers.  The “qf” param, which supports a list of query fields (and field 
aliasing)[1], also works in the dismax and edismax query parsers.
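
e.g. something along these lines in solrconfig.xml (handler name and fields are
just examples):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="df">desert</str>
    <!-- or search several fields at once: <str name="qf">desert name_t</str> -->
  </lst>
</requestHandler>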

Steve

[1] See section "Field aliasing using per-field qf overrides” on the edismax 
ref guide page: 
<https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser>
 and the qf param  description on the dismax ref guide page: 
<https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser#TheDisMaxQueryParser-DisMaxParameters>.

Re: unsubscribe

2015-07-07 Thread Steve Rowe
Hi Jacob,

See https://lucene.apache.org/solr/resources.html#mailing-lists for unsubscribe 
info

Notice also that every email from the solr-user mailing list contains the 
following header:

List-Unsubscribe: <mailto:solr-user-unsubscr...@lucene.apache.org>

Steve

> On Jul 7, 2015, at 11:46 AM, Jacob Singh  wrote:
> 
> Unsubscribe
> On Jul 7, 2015 11:39 AM, "Jacob Singh"  wrote:
> 
>> 
>> 
>> --
>> +1 512-522-6281
>> twitter: @JacobSingh ( http://twitter.com/#!/JacobSingh )
>> web: http://www.jacobsingh.name
>> Skype: pajamadesign
>> gTalk: jacobsi...@gmail.com
>> 



Re: Querying Nested documents

2015-07-13 Thread Steve Rowe
Hi rameshn,

Nabble has a nasty habit of stripping out HTML and XML markup before sending 
your mail out to the mailing list - see your message quoted below for how it 
appears to people who aren’t reading via Nabble.

My suggestion: directly subscribe to the solr-user mailing list[1] and avoid 
Nabble.  (They’ve known about the problem for many years and AFAICT have done 
nothing about it.)

Steve

[1] https://lucene.apache.org/solr/resources.html#mailing-lists

> On Jul 13, 2015, at 12:03 PM, rameshn  wrote:
> 
> Hi, I have question regarding nested documents.My document looks like below,  
>   
> 1234xger00parent  
>  
> 2015-06-15T13:29:07ZegeDuperhttp://www.domain.com 
>   
> zoome1234-images   
> http://somedomain.com/some.jpg1:1   
> 1234-platform-iosios   
> https://somedomain.comsomelinkfalse   
> 2015-03-23T10:58:00Z-12-30T19:00:00Z  
> 
> 1234-platform-androidandroid   
> somedomain.comsomelinkfalse   
> 2015-03-23T10:58:00Z-12-30T19:00:00Z  
> Right now I can query like
> thishttp://localhost:8983/solr/demo/select?q={!parent%20which=%27type:parent%27}&fl=*,[child%20parentFilter=type:parent%20childFilter=image_uri_s:*]&indent=trueand
> get the parent and child document with matching criteria (just parent and
> image child document).*But, I want to get all other children*
> (1234-platform-ios and 1234-platform-andriod) even if i query based on
> image_uri_s (1234-images) although they are other children which are part of
> the parent document.Is it possible ?Appreciate your help !Thanks,Ramesh
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Querying-Nested-documents-tp4217088.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Specifying dynamic field type without polluting actual field names with type indicators

2016-05-17 Thread Steve Rowe
Hi Peter,

Are you familiar with the Schema API?: 
<https://cwiki.apache.org/confluence/display/solr/Schema+API>

You can use it to create fields, field types, etc. prior to ingesting your data.
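
e.g. something like the following before you ingest (collection and field names
are just examples):

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field": { "name": "color", "type": "string", "stored": true }
}' http://localhost:8983/solr/mycollection/schema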

--
Steve
www.lucidworks.com

> On May 17, 2016, at 11:05 AM, Horváth Péter Gergely 
>  wrote:
> 
> Hi All,
> 
> By default Solr allows you to define the type of a dynamic field by
> appending a post-fix to the name itself. E.g. creating a color_s field
> instructs Solr to create a string field. My understanding is that if we do
> this, all queries must refer the post-fixed field name as well. So
> instead of a query like color:"red", we will have to write something like
> color_s:"red" -- and so on for other field types as well.
> 
> I am wondering if it is possible to specify the data type used for a field
> in Solr 6.0.0, without having to modify the field name. (Or at least in a
> way that would allow us to use the original field name) Do you have any
> idea, how to achieve this? I am fine, if we have to specify the field type
> during the insertion of a document, however, I do not want to keep using
> post-fixes while running queries...
> 
> Thanks,
> Peter



Re: Specifying dynamic field type without polluting actual field names with type indicators

2016-05-19 Thread Steve Rowe
Peter,

It’s an interesting idea.  Could you make a Solr JIRA?

I don’t know where the field type specification would go, but providing a 
mechanism to specify field type for previously non-existent fields, outside of 
the field names themselves, seems useful.

In the meantime, do you know about field aliasing?  

1. You can get results back that rename fields to whatever you want: see the 
section “Field Name Aliases” here: 
<https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters>.

2. On the query side, eDisMax can perform aliasing so that user-specified field 
names in queries get mapped to one or more indexed fields: look for “alias” in 
<https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser>.
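
For example, assuming an indexed field named color_s (names are just examples):

1. rename it in results:   ...&fl=color:color_s
2. alias it in queries:    ...&defType=edismax&q=color:red&f.color.qf=color_s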

--
Steve
www.lucidworks.com

> On May 19, 2016, at 4:43 AM, Horváth Péter Gergely 
>  wrote:
> 
> Hi Steve,
> 
> Yes, I know the schema API, however I do not want to specify the field type
> problematically for every single field.
> 
> I would like to be able to specify the field type when it is being added
> (similar to the name postfixes, but without affecting the field names).
> 
> Thanks,
> Peter
> 
> 
> 2016-05-17 17:08 GMT+02:00 Steve Rowe :
> 
>> Hi Peter,
>> 
>> Are you familiar with the Schema API?: <
>> https://cwiki.apache.org/confluence/display/solr/Schema+API>
>> 
>> You can use it to create fields, field types, etc. prior to ingesting your
>> data.
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On May 17, 2016, at 11:05 AM, Horváth Péter Gergely <
>> peter.gergely.horv...@gmail.com> wrote:
>>> 
>>> Hi All,
>>> 
>>> By default Solr allows you to define the type of a dynamic field by
>>> appending a post-fix to the name itself. E.g. creating a color_s field
>>> instructs Solr to create a string field. My understanding is that if we
>> do
>>> this, all queries must refer the post-fixed field name as well. So
>>> instead of a query like color:"red", we will have to write something like
>>> color_s:"red" -- and so on for other field types as well.
>>> 
>>> I am wondering if it is possible to specify the data type used for a
>> field
>>> in Solr 6.0.0, without having to modify the field name. (Or at least in a
>>> way that would allow us to use the original field name) Do you have any
>>> idea, how to achieve this? I am fine, if we have to specify the field
>> type
>>> during the insertion of a document, however, I do not want to keep using
>>> post-fixes while running queries...
>>> 
>>> Thanks,
>>> Peter
>> 
>> 



Re: Requesting to be added to ContributorsGroup

2016-05-20 Thread Steve Rowe
Hi Sheece,

I have CC’d your address for this email, but ordinarily all discussion goes 
only to the mailing list, so you have to either subscribe to this mailing list 
- see <https://lucene.apache.org/solr/resources.html#mailing-lists> - or follow 
the discussion on a service like Nabble.

I added you to the ContributorsGroup on the same day you requested it - see 
<https://lists.apache.org/thread.html/Z413qwks24kgnm3>.

You should now be able to contribute.  Please let us know if there’s a problem.

--
Steve
www.lucidworks.com

> On May 20, 2016, at 6:30 PM, Syed Gardezi  wrote:
> 
> Hello,
> 
>  There are couple of things that need to be updated on the wiki page. 
> I would like to get it done. Can you kindly update.
> 
> Cheers,
> 
> Sheece
> 
> 
> From: Syed Gardezi
> Sent: Wednesday, 4 May 2016 12:03:01 AM
> To: solr-user@lucene.apache.org
> Subject: Requesting to be added to ContributorsGroup
> 
> Hello,
> I am a Master student as part of Free and Open Source Software 
> Development COMP8440 - http://programsandcourses.anu.edu.au/course/COMP8440 
> at Australian National University. I have selected 
> http://wiki.apache.org/solr/ to contribute too. Kindly add me too 
> ContributorsGroup. Thank you.
> 
> wiki username: sheecegardezi
> 
> Regards,
> Sheece
> 



Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-27 Thread Steve Rowe
I’m working on addressing problems using multi-term synonyms at query time in 
Lucene and Solr.

I recommend these two blogs for understanding the issues (the second one was 
mentioned earlier in this thread):

<http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html>
<https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/>

In addition to the already-mentioned projects, there is also:

<https://issues.apache.org/jira/browse/SOLR-5379>

All of these projects try in various ways to work around the fact that Lucene’s 
QueryParser splits on whitespace before sending text to analysis, one token at 
a time, so in a synonym filter, multi-word synonyms can never match and add 
alternatives.  See <https://issues.apache.org/jira/browse/LUCENE-2605>, where 
I’ve posted a patch to directly address that problem - note that it’s still a 
work in progress.

Once LUCENE-2605 has been fixed, there is still work to do getting (e)dismax to 
work with the modified Lucene QueryParser, and addressing problems with how 
queries are constructed from Lucene’s “sausagized” token stream.

--
Steve
www.lucidworks.com

> On May 26, 2016, at 2:21 PM, John Bickerstaff  
> wrote:
> 
> Thanks Chris --
> 
> The two projects I'm aware of are:
> 
> https://github.com/healthonnet/hon-lucene-synonyms
> 
> and the one referenced from the Lucidworks page here:
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> 
> ... which is here : https://github.com/LucidWorks/auto-phrase-tokenfilter
> 
> Is there anything else out there that you would recommend I look at?
> 
> On Thu, May 26, 2016 at 12:01 PM, Chris Morley  wrote:
> 
>> Chris Morley here, from Wayfair.  (Depahelix = my domain)
>> 
>> Suyash Sonawane and I have worked on multiple word synonyms at Wayfair.
>> We worked mostly off of Ted Sullivan's work and also off of some
>> suggestions from Koorosh Vakhshoori.  We have gotten to a point where we
>> have a more sophisticated internal implementation, however, we've found
>> that it is very difficult to make it do what you want it to do, and also be
>> sufficiently performant.  Watch out for exceptional situations with mm
>> (minimum should match).
>> 
>> Trey Grainger (now at Lucidworks) and Simon Hughes of Dice.com have also
>> done work in this area.
>> 
>> It should be very possible to get this kind of thing working on
>> SolrCloud.  I haven't tried it yet but I think theoretically, it should
>> just work.  The synonyms stuff is mostly about doing things at index time
>> and query time.  The index time stuff should translate to SolrCloud
>> directly, while the query time stuff might pose some issues, but probably
>> not too bad, if there are any issues at all.
>> 
>> I've had decent luck porting our various plugins from 4.10.x to 5.5.0
>> because a lot of stuff is just Java, and it still works within the Jetty
>> context.
>> 
>> -Chris.
>> 
>> 
>> 
>> 
>> 
>> From: "John Bickerstaff" 
>> Sent: Thursday, May 26, 2016 1:51 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
>> Hey Jeff (or anyone interested in multi-word synonyms) here are some
>> potentially interesting links...
>> 
>> http://wiki.apache.org/solr/QueryParser (search the page for
>> synonum_edismax)
>> 
>> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ (blog
>> post about what became the synonym_edissmax Query Parser)
>> 
>> 
>> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>> 
>> This last was useful for lots of reasons and contains links to other
>> interesting, related web pages...
>> 
>> On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes 
>> wrote:
>> 
>>> Oh, interesting. I've certainty encountered issues with multi-word
>>> synonyms, but I hadn't come across this. If you end up using it with a
>>> recent solr verison, I'd be glad to hear your experience.
>>> 
>>> I haven't used it, but I am aware of one other project in this vein that
>>> you might be interested in looking at:
>>> https://github.com/LucidWorks/auto-phrase-tokenfilter
>>> 
>>> 
>>> On 5/26/16, 9:29 AM, "John Bickerstaff" 
>> wrote:
>>> 
>>>> Ahh - for question #3 I may have spoken too soon. This line from the
>>>> github repos

[ANNOUNCE] Apache Solr 6.0.1 released

2016-05-28 Thread Steve Rowe
28 May 2016, Apache Solr™ 6.0.1 available 

The Lucene PMC is pleased to announce the release of Apache Solr 6.0.1 

Solr is the popular, blazing fast, open source NoSQL search platform 
from the Apache Lucene project. Its major features include powerful 
full-text search, hit highlighting, faceted search, dynamic 
clustering, database integration, rich document (e.g., Word, PDF) 
handling, and geospatial search. Solr is highly scalable, providing 
fault tolerant distributed search and indexing, and powers the search 
and navigation features of many of the world's largest internet sites. 

This release includes 31 bug fixes, documentation updates, etc., 
since the 6.0.0 release. 

The release is available for immediate download at: 

http://www.apache.org/dyn/closer.lua/lucene/solr/6.0.1 

Please read CHANGES.txt for a detailed list of changes: 

https://lucene.apache.org/solr/6_0_1/changes/Changes.html 

Please report any feedback to the mailing lists 
(http://lucene.apache.org/solr/discussion.html) 

Note: The Apache Software Foundation uses an extensive mirroring 
network for distributing releases. It is possible that the mirror you 
are using may not have replicated the release yet. If that is the 
case, please try another mirror. This also goes for Maven access.

Re: Solr 6.1.x Release Date ??

2016-06-16 Thread Steve Rowe
Tomorrow-ish.

--
Steve
www.lucidworks.com

> On Jun 16, 2016, at 4:14 AM, Ramesh shankar  wrote:
> 
> Hi,
> 
> Yes, i used the solr-6.1.0-79 nightly builds and [subquery] transformer is
> working fine in, any idea of the expected release date for 6.1 ?
> 
> Regards
> Ramesh
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-6-1-x-Release-Date-tp4280945p4282562.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Fail to load org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer for fieldType "preanalyzed"

2016-06-24 Thread Steve Rowe
Hi Liu Peng,

Did you mix parts of an older Solr installation into your 6.0.0 installation?  
There were changes to PreAnalyzedField recently (in 5.5.0: 
<https://issues.apache.org/jira/browse/SOLR-4619>), and so if you mix old Solr 
jars with newer ones, you might see things like the error you showed.  (The 
PreAnalyzedAnalyzer class was not present in older Solr versions.)

If you see this problem with a clean install of Solr 6.0:

* How did you add fields?  By directly modifying schema.xml? Or via the Schema 
API?

* Do any of your documents contain fields that use the "preanalyzed" field type?
 
* Which Java version/vendor are you using?

--
Steve
www.lucidworks.com

> On Jun 23, 2016, at 10:21 PM, t...@sina.com wrote:
> 
> Hi,
> 
> I use Solr 6.0 on Windows. And try the example techproducts. At first I run 
> bin\solr -e techproducts -s "example\techproducts" and it works fine. But 
> when I add several fields, and try to restart it, I get some failures. From 
> the log, it should be fail to load the PreAnalyzedAnalyzer for fieldType 
> "preanalyzed". The call stack is as follow:
> 
> INFO  - 2016-06-24 02:02:29.866; [   ] org.apache.solr.schema.IndexSchema; 
> [techproducts] Schema name=example
> ERROR - 2016-06-24 02:02:30.122; [   ] 
> org.apache.solr.schema.FieldTypePluginLoader; Cannot load analyzer: 
> org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer
> java.lang.InstantiationException: 
> org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer
>at java.lang.Class.newInstance(Class.java:427)
>at 
> org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:271)
>at 
> org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:104)
>at 
> org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:53)
>at 
> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152)
>at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:474)
>at org.apache.solr.schema.IndexSchema.(IndexSchema.java:163)
>at 
> org.apache.solr.schema.ManagedIndexSchema.(ManagedIndexSchema.java:104)
>at 
> org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:172)
>at 
> org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:45)
>at 
> org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:75)
>at 
> org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:108)
>at 
> org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:79)
>at org.apache.solr.core.CoreContainer.create(CoreContainer.java:815)
>at org.apache.solr.core.CoreContainer.access$000(CoreContainer.java:88)
>at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:468)
>at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:459)
>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer.<init>()
>at java.lang.Class.getConstructor0(Class.java:3082)
>at java.lang.Class.newInstance(Class.java:412)
>... 21 more
> ERROR - 2016-06-24 02:02:30.128; [   ] org.apache.solr.core.CoreContainer; 
> Error creating core [techproducts]: Could not load conf for core 
> techproducts: Can't load schema 
> C:\WORK\Solr\Solr6.0\solr-6.0.0\example\techproducts\solr\techproducts\conf\managed-schema:
>  Plugin init failure for [schema.xml] fieldType "preanalyzed": Cannot load 
> analyzer: org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer.
> 
> What could be the reason? On stackoverflow, some answers mentioned that to 
> fix the error like "java.lang.NoSuchMethodException: .YYY.()", we 
> could add a constructor without arguments. But for this issue, I don't think 
> so.
> 
> Thanks
> Liu Peng



[ANNOUNCE] Apache Solr 5.5.2 released

2016-06-25 Thread Steve Rowe
25 June 2016, Apache Solr™ 5.5.2 available

The Lucene PMC is pleased to announce the release of Apache Solr 5.5.2

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search. Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

This release includes 38 bug fixes, documentation updates, etc.,
since the 5.5.1 release.

The release is available for immediate download at:

  http://www.apache.org/dyn/closer.lua/lucene/solr/5.5.2

Please read CHANGES.txt for a detailed list of changes:

  https://lucene.apache.org/solr/5_5_2/changes/Changes.html

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases. It is possible that the mirror you
are using may not have replicated the release yet. If that is the
case, please try another mirror. This also goes for Maven access.



Re: analyzer for _text_ field

2016-07-15 Thread Steve Rowe
Hi Waldyr,

An example of changing the _text_ analyzer by first creating a new field type, 
and then changing the _text_ field to use the new field type (after starting 
Solr 6.1 with “bin/solr start -e schemaless”):

-
PROMPT$ curl -X POST -H 'Content-type: application/json’ \
http://localhost:8983/solr/gettingstarted/schema --data-binary '{
  "add-field-type": {
"name": "my_new_field_type",
"class": "solr.TextField",
"analyzer": {
  "charFilters": [{
"class": "solr.HTMLStripCharFilterFactory"
  }],
  "tokenizer": {
"class": "solr.StandardTokenizerFactory"
  },
  "filters":[{
  "class": "solr.WordDelimiterFilterFactory"
}, {
  "class": "solr.LowerCaseFilterFactory"
  }]}},
  "replace-field": {
"name": "_text_",
"type": "my_new_field_type",
"multiValued": "true",
"indexed": "true",
"stored": "false"
  }}’
-

PROMPT$ curl http://localhost:8983/solr/gettingstarted/schema/fields/_text_

-
{
  "responseHeader”:{ […] },
  "field":{
"name":"_text_",
"type":"my_new_field_type",
"multiValued":true,
"indexed":true,
"stored":false}}
-

--
Steve
www.lucidworks.com

> On Jul 15, 2016, at 12:54 PM, Waldyr Neto  wrote:
> 
> Hy, How can i configure the analyzer for the _text_ field?



Re: analyzer for _text_ field

2016-07-15 Thread Steve Rowe
Waldyr, maybe it got mangled by my email client or yours?  

Here’s the same command:

  <https://gist.github.com/sarowe/db2fcd168eb77d7278f716ac75bfb9e9>

--
Steve
www.lucidworks.com

> On Jul 15, 2016, at 2:16 PM, Waldyr Neto  wrote:
> 
> Hy Steves, tks for the help
> unfortunately i'm making some mistake
> 
> when i try to run
>>> 
> curl -X POST -H 'Content-type: application/json’ \
> http://localhost:8983/solr/gettingstarted/schema --data-binary
> '{"add-field-type": { "name": "my_new_field_type", "class":
> "solr.TextField","analyzer": {"charFilters": [{"class":
> "solr.HTMLStripCharFilterFactory"}], "tokenizer": {"class":
> "solr.StandardTokenizerFactory"},"filters":[{"class":
> "solr.WordDelimiterFilterFactory"}, {"class":
> "solr.LowerCaseFilterFactory"}]}},"replace-field": { "name":
> "_text_","type": "my_new_field_type", "multiValued": "true","indexed":
> "true","stored": "false"}}’
> 
> i receave the folow error msg from curl program
> :
> 
> curl: (3) [globbing] unmatched brace in column 1
> 
> curl: (6) Could not resolve host: name
> 
> curl: (6) Could not resolve host: my_new_field_type,
> 
> curl: (6) Could not resolve host: class
> 
> curl: (6) Could not resolve host: solr.TextField,analyzer
> 
> curl: (3) [globbing] unmatched brace in column 1
> 
> curl: (3) [globbing] bad range specification in column 2
> 
> curl: (3) [globbing] unmatched close brace/bracket in column 32
> 
> curl: (6) Could not resolve host: tokenizer
> 
> curl: (3) [globbing] unmatched brace in column 1
> 
> curl: (3) [globbing] unmatched close brace/bracket in column 30
> 
> curl: (3) [globbing] unmatched close brace/bracket in column 32
> 
> curl: (3) [globbing] unmatched brace in column 1
> 
> curl: (3) [globbing] unmatched close brace/bracket in column 28
> 
> curl: (3) [globbing] unmatched brace in column 1
> 
> curl: (6) Could not resolve host: name
> 
> curl: (6) Could not resolve host: _text_,type
> 
> curl: (6) Could not resolve host: my_new_field_type,
> 
> curl: (6) Could not resolve host: multiValued
> 
> curl: (6) Could not resolve host: true,indexed
> 
> curl: (6) Could not resolve host: true,stored
> 
> curl: (3) [globbing] unmatched close brace/bracket in column 6
> 
> cvs1:~ vvisionphp1$
> 
> On Fri, Jul 15, 2016 at 2:45 PM, Steve Rowe  wrote:
> 
>> Hi Waldyr,
>> 
>> An example of changing the _text_ analyzer by first creating a new field
>> type, and then changing the _text_ field to use the new field type (after
>> starting Solr 6.1 with “bin/solr start -e schemaless”):
>> 
>> -
>> PROMPT$ curl -X POST -H 'Content-type: application/json’ \
>>http://localhost:8983/solr/gettingstarted/schema --data-binary '{
>>  "add-field-type": {
>>"name": "my_new_field_type",
>>"class": "solr.TextField",
>>"analyzer": {
>>  "charFilters": [{
>>"class": "solr.HTMLStripCharFilterFactory"
>>  }],
>>      "tokenizer": {
>>"class": "solr.StandardTokenizerFactory"
>>  },
>>  "filters":[{
>>  "class": "solr.WordDelimiterFilterFactory"
>>}, {
>>  "class": "solr.LowerCaseFilterFactory"
>>  }]}},
>>  "replace-field": {
>>"name": "_text_",
>>"type": "my_new_field_type",
>>"multiValued": "true",
>>"indexed": "true",
>>"stored": "false"
>>  }}’
>> -
>> 
>> PROMPT$ curl
>> http://localhost:8983/solr/gettingstarted/schema/fields/_text_
>> 
>> -
>> {
>>  "responseHeader”:{ […] },
>>  "field":{
>>"name":"_text_",
>>"type":"my_new_field_type",
>>"multiValued":true,
>>"indexed":true,
>>"stored":false}}
>> -
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Jul 15, 2016, at 12:54 PM, Waldyr Neto  wrote:
>>> 
>>> Hy, How can i configure the analyzer for the _text_ field?
>> 
>> 



Re: analyzer for _text_ field

2016-07-16 Thread Steve Rowe
Waldyr,

I don’t understand your first question - are you asking how to change the 
schema without using the Schema API?

About phonetic matching: there are several different phonetic token filters 
provided with Solr - see 
<https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching>.

--
Steve
www.lucidworks.com

> On Jul 16, 2016, at 5:26 AM, Waldyr Neto  wrote:
> 
> tks, it works :)
> 
> but do you know how i could do this, thange the _text_ analyzer using
> schemas? maybe in any point i could change the default analyzer. what i
> really need is to use any analyzer that work with phonetic search in the
> content of my files;
> 
> On Fri, Jul 15, 2016 at 10:11 PM, Waldyr Neto  wrote:
> 
>> tks a lot, i'll try soon and give u a feed back :)
>> 
>> On Fri, Jul 15, 2016 at 4:07 PM, David Santamauro <
>> david.santama...@gmail.com> wrote:
>> 
>>> 
>>> The opening and closing single quotes don't match
>>> 
>>> -data-binary '{ ... }’
>>> 
>>> it should be:
>>> 
>>> -data-binary '{ ... }'
>>> 
>>> 
>>> 
>>> On 07/15/2016 02:59 PM, Steve Rowe wrote:
>>> 
>>>> Waldyr, maybe it got mangled by my email client or yours?
>>>> 
>>>> Here’s the same command:
>>>> 
>>>>   <https://gist.github.com/sarowe/db2fcd168eb77d7278f716ac75bfb9e9>
>>>> 
>>>> --
>>>> Steve
>>>> www.lucidworks.com
>>>> 
>>>> On Jul 15, 2016, at 2:16 PM, Waldyr Neto  wrote:
>>>>> 
>>>>> Hy Steves, tks for the help
>>>>> unfortunately i'm making some mistake
>>>>> 
>>>>> when i try to run
>>>>> 
>>>>>> 
>>>>>>> curl -X POST -H 'Content-type: application/json’ \
>>>>> http://localhost:8983/solr/gettingstarted/schema --data-binary
>>>>> '{"add-field-type": { "name": "my_new_field_type", "class":
>>>>> "solr.TextField","analyzer": {"charFilters": [{"class":
>>>>> "solr.HTMLStripCharFilterFactory"}], "tokenizer": {"class":
>>>>> "solr.StandardTokenizerFactory"},"filters":[{"class":
>>>>> "solr.WordDelimiterFilterFactory"}, {"class":
>>>>> "solr.LowerCaseFilterFactory"}]}},"replace-field": { "name":
>>>>> "_text_","type": "my_new_field_type", "multiValued": "true","indexed":
>>>>> "true","stored": "false"}}’
>>>>> 
>>>>> i receave the folow error msg from curl program
>>>>> :
>>>>> 
>>>>> curl: (3) [globbing] unmatched brace in column 1
>>>>> 
>>>>> curl: (6) Could not resolve host: name
>>>>> 
>>>>> curl: (6) Could not resolve host: my_new_field_type,
>>>>> 
>>>>> curl: (6) Could not resolve host: class
>>>>> 
>>>>> curl: (6) Could not resolve host: solr.TextField,analyzer
>>>>> 
>>>>> curl: (3) [globbing] unmatched brace in column 1
>>>>> 
>>>>> curl: (3) [globbing] bad range specification in column 2
>>>>> 
>>>>> curl: (3) [globbing] unmatched close brace/bracket in column 32
>>>>> 
>>>>> curl: (6) Could not resolve host: tokenizer
>>>>> 
>>>>> curl: (3) [globbing] unmatched brace in column 1
>>>>> 
>>>>> curl: (3) [globbing] unmatched close brace/bracket in column 30
>>>>> 
>>>>> curl: (3) [globbing] unmatched close brace/bracket in column 32
>>>>> 
>>>>> curl: (3) [globbing] unmatched brace in column 1
>>>>> 
>>>>> curl: (3) [globbing] unmatched close brace/bracket in column 28
>>>>> 
>>>>> curl: (3) [globbing] unmatched brace in column 1
>>>>> 
>>>>> curl: (6) Could not resolve host: name
>>>>> 
>>>>> curl: (6) Could not resolve host: _text_,type
>>>>> 
>>>>> curl: (6) Could not resolve host: my_new_field_type,
>>>>> 
>>>>> curl: (6) Could not resolve host: multiValued
>>>>> 
>>>>> curl: (6) Could no
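
To make the phonetic suggestion above concrete, here is a rough SolrJ sketch (not from the thread) that creates a phonetic field type with the stock solr.PhoneticFilterFactory and points the catch-all _text_ field at it, much like Steve's earlier curl example. The URL, collection name ("gettingstarted") and type name ("text_phonetic") are only illustrative, and a 6.x SolrJ client is assumed (older SolrJ versions construct HttpSolrClient directly):

-
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.AnalyzerDefinition;
import org.apache.solr.client.solrj.request.schema.FieldTypeDefinition;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;

public class AddPhoneticFieldType {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/gettingstarted").build()) {

      // Analysis chain: standard tokenizer -> lowercase -> phonetic codes.
      AnalyzerDefinition analyzer = new AnalyzerDefinition();
      Map<String, Object> tokenizer = new LinkedHashMap<>();
      tokenizer.put("class", "solr.StandardTokenizerFactory");
      analyzer.setTokenizer(tokenizer);

      Map<String, Object> lowercase = new LinkedHashMap<>();
      lowercase.put("class", "solr.LowerCaseFilterFactory");
      Map<String, Object> phonetic = new LinkedHashMap<>();
      phonetic.put("class", "solr.PhoneticFilterFactory");
      phonetic.put("encoder", "DoubleMetaphone");
      phonetic.put("inject", "true");   // keep original tokens alongside the phonetic codes
      analyzer.setFilters(Arrays.asList(lowercase, phonetic));

      // "add-field-type" equivalent of the JSON in the curl example above.
      Map<String, Object> typeAttrs = new LinkedHashMap<>();
      typeAttrs.put("name", "text_phonetic");
      typeAttrs.put("class", "solr.TextField");
      FieldTypeDefinition typeDef = new FieldTypeDefinition();
      typeDef.setAttributes(typeAttrs);
      typeDef.setAnalyzer(analyzer);
      new SchemaRequest.AddFieldType(typeDef).process(client);

      // "replace-field" so the catch-all _text_ field uses the new type.
      Map<String, Object> textField = new LinkedHashMap<>();
      textField.put("name", "_text_");
      textField.put("type", "text_phonetic");
      textField.put("multiValued", true);
      textField.put("indexed", true);
      textField.put("stored", false);
      new SchemaRequest.ReplaceField(textField).process(client);
    }
  }
}
-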

Re: analyzer for _text_ field

2016-07-16 Thread Steve Rowe
Waldyr, I recommend you start reading the Solr Reference Guide here: 
<https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters>.
  In the following sections, there are many examples of schema.xml 
configuration of field types and fields.

In general: what you’ll want to do is either modify the field type that the 
_text_ field uses, or create a new field type and change the _text_ field 
definition to use it instead.

--
Steve
www.lucidworks.com

> On Jul 16, 2016, at 1:38 PM, Waldyr Neto  wrote:
> 
> Yep,
> 
> I'm looking for a way to specify in schema.xml the analyzer for the _text_
> field.
> 
> On Sat, Jul 16, 2016 at 12:22 PM, Steve Rowe  wrote:
> 
>> Waldyr,
>> 
>> I don’t understand your first question - are you asking how to change the
>> schema without using the Schema API?
>> 
>> About phonetic matching: there are several different phonetic token
>> filters provided with Solr - see <
>> https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching>.
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Jul 16, 2016, at 5:26 AM, Waldyr Neto  wrote:
>>> 
>>> tks, it works :)
>>> 
>>> but do you know how i could do this, thange the _text_ analyzer using
>>> schemas? maybe in any point i could change the default analyzer. what i
>>> really need is to use any analyzer that work with phonetic search in the
>>> content of my files;
>>> 
>>> On Fri, Jul 15, 2016 at 10:11 PM, Waldyr Neto 
>> wrote:
>>> 
>>>> tks a lot, i'll try soon and give u a feed back :)
>>>> 
>>>> On Fri, Jul 15, 2016 at 4:07 PM, David Santamauro <
>>>> david.santama...@gmail.com> wrote:
>>>> 
>>>>> 
>>>>> The opening and closing single quotes don't match
>>>>> 
>>>>> -data-binary '{ ... }’
>>>>> 
>>>>> it should be:
>>>>> 
>>>>> -data-binary '{ ... }'
>>>>> 
>>>>> 
>>>>> 
>>>>> On 07/15/2016 02:59 PM, Steve Rowe wrote:
>>>>> 
>>>>>> Waldyr, maybe it got mangled by my email client or yours?
>>>>>> 
>>>>>> Here’s the same command:
>>>>>> 
>>>>>>  <https://gist.github.com/sarowe/db2fcd168eb77d7278f716ac75bfb9e9>
>>>>>> 
>>>>>> --
>>>>>> Steve
>>>>>> www.lucidworks.com
>>>>>> 
>>>>>> On Jul 15, 2016, at 2:16 PM, Waldyr Neto  wrote:
>>>>>>> 
>>>>>>> Hy Steves, tks for the help
>>>>>>> unfortunately i'm making some mistake
>>>>>>> 
>>>>>>> when i try to run
>>>>>>> 
>>>>>>>> 
>>>>>>>>> curl -X POST -H 'Content-type: application/json’ \
>>>>>>> http://localhost:8983/solr/gettingstarted/schema --data-binary
>>>>>>> '{"add-field-type": { "name": "my_new_field_type", "class":
>>>>>>> "solr.TextField","analyzer": {"charFilters": [{"class":
>>>>>>> "solr.HTMLStripCharFilterFactory"}], "tokenizer": {"class":
>>>>>>> "solr.StandardTokenizerFactory"},"filters":[{"class":
>>>>>>> "solr.WordDelimiterFilterFactory"}, {"class":
>>>>>>> "solr.LowerCaseFilterFactory"}]}},"replace-field": { "name":
>>>>>>> "_text_","type": "my_new_field_type", "multiValued":
>> "true","indexed":
>>>>>>> "true","stored": "false"}}’
>>>>>>> 
>>>>>>> i receave the folow error msg from curl program
>>>>>>> :
>>>>>>> 
>>>>>>> curl: (3) [globbing] unmatched brace in column 1
>>>>>>> 
>>>>>>> curl: (6) Could not resolve host: name
>>>>>>> 
>>>>>>> curl: (6) Could not resolve host: my_new_field_type,
>>>>>>> 
>>>>>>> curl: (6) Could not resolve host: class
>>>>>>> 
>>>>>>> curl: (6) Could not resolve host: s

Re: How to Add New Fields and Fields Types Programmatically Using Solrj

2016-07-18 Thread Steve Rowe
Hi Jeniba,

You can add fields and field types using Solrj with SchemaRequest.Update 
subclasses - see here for a list: 
<http://lucene.apache.org/solr/6_1_0/solr-solrj/org/apache/solr/client/solrj/request/schema/SchemaRequest.Update.html>

There are quite a few examples of doing both in the tests: 
<https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=solr/solrj/src/test/org/apache/solr/client/solrj/request/SchemaTest.java;h=72051b123aadb2df57f4bf19abfedb0ac0deb6cd;hb=refs/heads/branch_6_1>

--
Steve
www.lucidworks.com

> On Jul 18, 2016, at 1:59 AM, Jeniba Johnson  
> wrote:
> 
> 
> Hi,
> 
> I have configured Solr 5.3.1 and started Solr in schemaless mode. Using 
> SolrInputDocument, I am able to add new fields in solrconfig.xml using Solrj.
> How do I specify the field type of a field using Solrj?
> 
> E.g. <field ... required="true" multivalued="false" />
> 
> How can I add field type properties programmatically using SolrInputDocument 
> and Solrj? Can anyone help with it?
> 
> 
> 
> Regards,
> Jeniba Johnson
> 
> 
> 
> 
> The contents of this e-mail and any attachment(s) may contain confidential or 
> privileged information for the intended recipient(s). Unintended recipients 
> are prohibited from taking action on the basis of information in this e-mail 
> and using or disseminating the information, and must notify the sender and 
> delete it from their system. L&T Infotech will not accept responsibility or 
> liability for the accuracy or completeness of, or the presence of any virus 
> or disabling code in this e-mail"
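
A rough SolrJ sketch of what Steve describes above: creating a field type and then a field that uses it via the SchemaRequest.Update subclasses. The core URL, type name and field name are only illustrative, and a 6.x SolrJ client is assumed (older SolrJ versions construct HttpSolrClient directly):

-
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.FieldTypeDefinition;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class AddFieldAndType {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {

      // 1) Add a field type; the attribute map corresponds to the <fieldType .../> attributes.
      Map<String, Object> typeAttrs = new LinkedHashMap<>();
      typeAttrs.put("name", "my_string");
      typeAttrs.put("class", "solr.StrField");
      typeAttrs.put("sortMissingLast", true);
      FieldTypeDefinition typeDef = new FieldTypeDefinition();
      typeDef.setAttributes(typeAttrs);
      SchemaResponse.UpdateResponse typeResponse =
          new SchemaRequest.AddFieldType(typeDef).process(client);

      // 2) Add a field of that type; the map corresponds to the <field .../> attributes,
      //    including required and multiValued.
      Map<String, Object> fieldAttrs = new LinkedHashMap<>();
      fieldAttrs.put("name", "department");
      fieldAttrs.put("type", "my_string");
      fieldAttrs.put("indexed", true);
      fieldAttrs.put("stored", true);
      fieldAttrs.put("required", true);
      fieldAttrs.put("multiValued", false);
      SchemaResponse.UpdateResponse fieldResponse =
          new SchemaRequest.AddField(fieldAttrs).process(client);

      System.out.println("add-field-type status: " + typeResponse.getStatus());
      System.out.println("add-field status: " + fieldResponse.getStatus());
    }
  }
}
-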



Re: EmbeddedSolrServer problem when using one-jar-with-dependency including solr

2016-08-02 Thread Steve Rowe
solr-core[1] and solr-solrj[2] POMs have parent POM solr-parent[3], which in 
turn has parent POM lucene-solr-grandparent[4], which has a 
 section that specifies dependency versions & exclusions 
*for all direct dependencies*.

The intent is for all Lucene/Solr’s internal dependencies to be managed 
directly, rather than through Maven’s transitive dependency mechanism.  For 
background, see summary & comments on JIRA issue LUCENE-5217[5].

I haven’t looked into how this affects systems that depend on Lucene/Solr 
artifacts, but it appears to be the case that you can’t use Maven’s transitive 
dependency mechanism to pull in all required dependencies for you.

BTW, if you look at the grandparent POM, the httpclient version for Solr 6.1.0 
is declared as 4.4.1.  I don’t know if depending on version 4.5.2 is causing 
problems, but if you don’t need a feature in 4.5.2, I suggest that you depend 
on the same version as Solr does.

For error #2, you should depend on lucene-core[6].

My suggestion as a place to start: copy/paste the dependencies from 
solr-core[1] and solr-solrj[2] POMs, and leave out stuff you know you won’t 
need.

[1] 
<https://repo1.maven.org/maven2/org/apache/solr/solr-core/6.1.0/solr-core-6.1.0.pom>
[2] 
<https://repo1.maven.org/maven2/org/apache/solr/solr-solrj/6.1.0/solr-solrj-6.1.0.pom>
[3] 
<https://repo1.maven.org/maven2/org/apache/solr/solr-parent/6.1.0/solr-parent-6.1.0.pom>
[4] 
<https://repo1.maven.org/maven2/org/apache/lucene/lucene-solr-grandparent/6.1.0/lucene-solr-grandparent-6.1.0.pom>
[5] <https://issues.apache.org/jira/browse/LUCENE-5217>
[6] 
<http://search.maven.org/#artifactdetails|org.apache.lucene|lucene-core|6.1.0|jar>

--
Steve
www.lucidworks.com

> On Aug 2, 2016, at 12:03 PM, Ziqi Zhang  wrote:
> 
> Hi, I am using Solr, Solrj 6.1, and Maven to manage my project. I use maven 
> to build a jar-with-dependency and run a java program pointing its classpath 
> to this jar. However I keep getting errors even when I just try to create an 
> instance of EmbeddedSolrServer:
> 
> */code/
> *String solrHome = "/home/solr/";
> String solrCore = "fw";
> solrCores = new EmbeddedSolrServer(
>Paths.get(solrHome), solrCore
>).getCoreContainer();
> ///
> 
> 
> My project has dependencies defined in the pom shown below:  **When block A 
> is not present**, running the code that calls:
> 
> * pom /*
> <dependency>
>   <groupId>org.apache.jena</groupId>
>   <artifactId>jena-arq</artifactId>
>   <version>3.0.1</version>
> </dependency>
> 
> <!-- BLOCK A -->
> <dependency>
>   <groupId>org.apache.httpcomponents</groupId>
>   <artifactId>httpclient</artifactId>
>   <version>4.5.2</version>
> </dependency>
> <!-- BLOCK A ENDS -->
> 
> <dependency>
>   <groupId>org.apache.solr</groupId>
>   <artifactId>solr-core</artifactId>
>   <version>6.1.0</version>
>   <exclusions>
>     <exclusion>
>       <groupId>org.slf4j</groupId>
>       <artifactId>slf4j-log4j12</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>log4j</groupId>
>       <artifactId>log4j</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.slf4j</groupId>
>       <artifactId>slf4j-jdk14</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
> 
> <dependency>
>   <groupId>org.apache.solr</groupId>
>   <artifactId>solr-solrj</artifactId>
>   <version>6.1.0</version>
>   <exclusions>
>     <exclusion>
>       <groupId>org.slf4j</groupId>
>       <artifactId>slf4j-log4j12</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>log4j</groupId>
>       <artifactId>log4j</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.slf4j</groupId>
>       <artifactId>slf4j-jdk14</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
> ///
> 
> 
> Block A is added because when it is missing, the following error is thrown on 
> the java code above:
> 
> * ERROR 1 ///*
> 
>Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/http/impl/client/CloseableHttpClient
>at 
> org.apache.solr.handler.component.HttpShardHandlerFactory.init(HttpShardHandlerFactory.java:167)
>at 
> org.apache.solr.handler.component.ShardHandlerFactory.newInstance(ShardHandlerFactory.java:47)
>at org.apache.solr.core.CoreContainer.load(CoreContainer.java:404)
>at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.load(EmbeddedSolrServer.java:84)
>at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.(EmbeddedSolrServer.java:70)
>at 
> uk.ac.ntu.sac.sense.SenseProperty.initSolrServer(SenseProperty.java:103)
>at 
> uk.ac.ntu.sac.sense.SenseProperty.getClassIndex(SenseProperty.java:81)
>at 
> uk.ac

Re: EmbeddedSolrServer problem when using one-jar-with-dependency including solr

2016-08-03 Thread Steve Rowe
Oh, then likely the problem is that your uberjar packing tool doesn’t know how 
to (or maybe isn’t configured to?) include/merge/translate resources under 
META-INF/services/.  E.g. lucene/core module has SPI files there.

Info on the maven shade plugin’s configuration for this stuff is here: 
<https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer>

--
Steve
www.lucidworks.com

> On Aug 3, 2016, at 5:26 AM, Ziqi Zhang  wrote:
> 
> Thanks
> 
> I am not sure if Steve's suggestion was the right solution. Even when I did 
> not have explicitly defined the dependency on lucene, I can see in the 
> packaged jar it still contains org.apache.lucene.
> 
> What solved my problem is to not pack a single jar but use a folder of 
> individual jars. I am not sure why though.
> 
> Regards
> 
> 
> On 02/08/2016 21:53, Rohit Kanchan wrote:
>> We also faced same issue when we were running embedded solr 6.1 server.
>> Actually I faced the same in our integration environment after deploying
>> project. Solr 6.1 is using http client 4.4.1 which I think  embedded solr
>> server is looking for. I think when solr core is getting loaded then old
>> http client is getting loaded from some where in your maven. Check
>> dependency tree of your pom.xml and see if you can exclude this jar getting
>> loaded from anywhere else. Just exclude them in your pom.xml. I hope this
>> solves your issue,
>> 
>> 
>> Thanks
>> Rohit
>> 
>> 
>> On Tue, Aug 2, 2016 at 9:44 AM, Steve Rowe  wrote:
>> 
>>> solr-core[1] and solr-solrj[2] POMs have parent POM solr-parent[3], which
>>> in turn has parent POM lucene-solr-grandparent[4], which has a
>>>  section that specifies dependency versions &
>>> exclusions *for all direct dependencies*.
>>> 
>>> The intent is for all Lucene/Solr’s internal dependencies to be managed
>>> directly, rather than through Maven’s transitive dependency mechanism.  For
>>> background, see summary & comments on JIRA issue LUCENE-5217[5].
>>> 
>>> I haven’t looked into how this affects systems that depend on Lucene/Solr
>>> artifacts, but it appears to be the case that you can’t use Maven’s
>>> transitive dependency mechanism to pull in all required dependencies for
>>> you.
>>> 
>>> BTW, if you look at the grandparent POM, the httpclient version for Solr
>>> 6.1.0 is declared as 4.4.1.  I don’t know if depending on version 4.5.2 is
>>> causing problems, but if you don’t need a feature in 4.5.2, I suggest that
>>> you depend on the same version as Solr does.
>>> 
>>> For error #2, you should depend on lucene-core[6].
>>> 
>>> My suggestion as a place to start: copy/paste the dependencies from
>>> solr-core[1] and solr-solrj[2] POMs, and leave out stuff you know you won’t
>>> need.
>>> 
>>> [1] <
>>> https://repo1.maven.org/maven2/org/apache/solr/solr-core/6.1.0/solr-core-6.1.0.pom
>>> [2] <
>>> https://repo1.maven.org/maven2/org/apache/solr/solr-solrj/6.1.0/solr-solrj-6.1.0.pom
>>> [3] <
>>> https://repo1.maven.org/maven2/org/apache/solr/solr-parent/6.1.0/solr-parent-6.1.0.pom
>>> [4] <
>>> https://repo1.maven.org/maven2/org/apache/lucene/lucene-solr-grandparent/6.1.0/lucene-solr-grandparent-6.1.0.pom
>>> [5] <https://issues.apache.org/jira/browse/LUCENE-5217>
>>> [6] <
>>> http://search.maven.org/#artifactdetails|org.apache.lucene|lucene-core|6.1.0|jar
>>> --
>>> Steve
>>> www.lucidworks.com
>>> 
>>>> On Aug 2, 2016, at 12:03 PM, Ziqi Zhang 
>>> wrote:
>>>> Hi, I am using Solr, Solrj 6.1, and Maven to manage my project. I use
>>> maven to build a jar-with-dependency and run a java program pointing its
>>> classpath to this jar. However I keep getting errors even when I just try
>>> to create an instance of EmbeddedSolrServer:
>>>> */code/
>>>> *String solrHome = "/home/solr/";
>>>> String solrCore = "fw";
>>>> solrCores = new EmbeddedSolrServer(
>>>>Paths.get(solrHome), solrCore
>>>>).getCoreContainer();
>>>> ///
>>>> 
>>>> 
>>>> My project has dependencies defined in the pom shown below:  **When
>>> block A is not present**, running the code that calls:
>>>> * pom /*
>>&

Re: Difference in boolean query parsing. Solr-5.4.0 VS Solr.6.1.0

2016-08-04 Thread Steve Rowe
It’s fairly likely these differences are as a result of SOLR-2649[1] (released 
with 5.5) and SOLR-8812[2] (released with 6.1).

If you haven’t seen it, I recommend you read Hoss’s blog “Why Not AND, OR, And 
NOT?” <https://lucidworks.com/blog/2011/12/28/why-not-and-or-and-not/>.

If you can, add parentheses to explicitly specify precedence.

[1] https://issues.apache.org/jira/browse/SOLR-2649
[2] https://issues.apache.org/jira/browse/SOLR-8812

--
Steve
www.lucidworks.com

> On Aug 4, 2016, at 2:23 AM, Modassar Ather  wrote:
> 
> Hi,
> 
> During migration from Solr-5.4.1 to Solr-6.1.0 I saw a difference in the
> behavior of few of my boolean queries.
> As per my current understanding the default operator comes in when there is
> no operator present in between two terms.
> Also both the ANDed terms are marked mandatory if not, any of them is
> introduced as NOT. Same is the case with OR.
> Please correct me if my understanding is wrong.
> 
> The below queries are parsed differently and causes a lot of difference in
> search result.
> The default operator used is AND and no mm is set.
> 
> 
> *Query  : *fl:(network hardware AND device OR system)
> *Solr.6.1.0 :* "+(+fl:network +fl:hardware fl:device fl:system)"
> *Solr-5.4.0 : *"+(fl:network +fl:hardware +fl:device fl:system)"
> 
> *Query  : *fl:(network OR hardware device system)
> *Solr.6.1.0 : *"+(fl:network fl:hardware +fl:device +fl:system)"
> *Solr-5.4.0 : *"+(fl:network fl:hardware fl:device fl:system)"
> 
> *Query  : *fl:(network OR hardware AND device OR system)
> *Solr.6.1.0 : *"+(fl:network +fl:hardware fl:device fl:system)"
> *Solr-5.4.0 : *"+(fl:network +fl:hardware +fl:device fl:system)"
> 
> *Query  : *fl:(network AND hardware AND device OR system)"
> *Solr.6.1.0 : *"+(+fl:network +fl:hardware fl:device fl:system)"
> *Solr-5.4.0 : *"+(+fl:network +fl:hardware +fl:device fl:system)"
> 
> Please help me understand the difference in parsing and its effect on
> search.
> 
> Thanks,
> Modassar
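
One way to avoid depending on the changed operator precedence is to group terms explicitly, as Steve suggests above. For example, if the intent behind the first query is "network and hardware must match, and at least one of device or system must match" (only one possible reading), it can be written unambiguously as:

*Query  : *fl:(+network +hardware +(device OR system))

or, spelled out with explicit AND/OR and parentheses:

*Query  : *fl:((network AND hardware) AND (device OR system))

With every operator and grouping made explicit, 5.4 and 6.1 should produce the same parse.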



Re: How can I set the defaultOperator to be AND?

2016-08-05 Thread Steve Rowe
Hi Bastien,

Have you tried upgrading to 6.1?  SOLR-8812, mentioned earlier in the thread, 
was released with 6.1, and is directly aimed at fixing the problem you are 
having in 6.0 (also a problem in 5.5): when mm is not explicitly provided and 
the query contains explicit operators (except for AND), edismax now sets mm=0.

--
Steve
www.lucidworks.com

> On Aug 5, 2016, at 2:34 AM, Bastien Latard | MDPI AG 
>  wrote:
> 
> Hi Eric & others,
> Is there any way to overwrite the default OP when we use edismax?
> Because adding the following line to solrconfig.xml doesn't solve the problem:
> 
> 
> (Then if I do "q=black OR white", this always gives the results for "black 
> AND white")
> 
> I did not find a way to define a default OP, which is automatically 
> overwritten by the AND/OR from a query.
> 
> 
> Example - Debug: defaultOP in solrconfig = AND / q=a or b
> 
> 
> ==> results for black AND white
> The correct result should be the following (but I had to force the q.op):
> 
> ==> I cannot do this in case I want to do "(a AND b) OR c"...
> 
> 
> Kind regards,
> Bastien
> 
> On 27/04/2016 05:30, Erick Erickson wrote:
>> Defaulting to "OR" has been the behavior since forever, so changing the 
>> behavior now is just not going to happen. Making it fit a new version of 
>> "correct" will change the behavior for every application out there that has 
>> not specified the default behavior.
>> 
>> There's no a-priori reason to expect "more words to equal fewer docs", I can 
>> just as easily argue that "more words should return more docs". Which you 
>> expect depends on your mental model.
>> 
>> And providing the default op in your solrconfig.xml request handlers allows 
>> you to implement whatever model your application chooses...
>> 
>> Best,
>> Erick
>> 
>> On Mon, Apr 25, 2016 at 11:32 PM, Bastien Latard - MDPI AG 
>>  wrote:
>> Thank you Shawn, Jan and Georg for your answers.
>> 
>> Yes, it seems that if I simply remove the defaultOperator it works well for 
>> "composed queries" like '(a:x AND b:y) OR c:z'.
>> But I think that the default Operator should/could be the AND.
>> 
>> Because when I add an extra search word, I expect that the results get more 
>> accurate...
>> (It seems to be what google is also doing now)
>> 
>> Otherwise, if you make a search and apply another filter (e.g.: sort by 
>> publication date, facets, ...) , user can get the less relevant item (only 1 
>> word in 4 matches) in first position only because of its date...
>> 
>> What do you think?
>> 
>> 
>> Kind regards,
>> Bastien
>> 
>> 
>> On 25/04/2016 14:53, Shawn Heisey wrote:
>>> On 4/25/2016 6:39 AM, Bastien Latard - MDPI AG wrote:
>>> 
>>>> Remember:
>>>> If I add the following line to the schema.xml, even if I do a search
>>>> 'title:"test" OR author:"me"', it will returns documents matching
>>>> 'title:"test" AND author:"me"':
>>>>  
>>>> 
>>> The settings in the schema for default field and default operator were
>>> deprecated a long time ago.  I actually have no idea whether they are
>>> even supported in newer Solr versions.
>>> 
>>> The q.op parameter controls the default operator, and the df parameter
>>> controls the default field.  These can be set in the request handler
>>> definition in solrconfig.xml -- usually in "defaults" but there might be
>>> reason to put them in "invariants" instead.
>>> 
>>> If you're using edismax, you'd be better off using the mm parameter
>>> rather than the q.op parameter.  The behavior you have described above
>>> sounds like a change in behavior (some call it a bug) introduced in the
>>> 5.5 version:
>>> 
>>> 
>>> https://issues.apache.org/jira/browse/SOLR-8812
>>> 
>>> 
>>> If you are using edismax, I suspect that if you set mm=100% instead of
>>> q.op=AND (or the schema default operator) that the problem might go away
>>> ... but I am not sure.  Someone who is more familiar with SOLR-8812
>>> probably should comment.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> 
>>> 
> 
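
As a small illustration of the q.op/mm interplay discussed above, a per-request override with SolrJ might look like this (a sketch assuming Solr 6.1+ with edismax; the URL, collection and field names are made up):

-
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QueryOperatorExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      SolrQuery q = new SolrQuery("black OR white");
      q.set("defType", "edismax");
      q.set("df", "title");
      q.set("q.op", "AND");   // default operator for this request
      // With SOLR-8812 (6.1+), because the query contains an explicit operator
      // other than AND and no mm is given, edismax uses mm=0, so the OR keeps
      // its usual meaning instead of effectively requiring both terms.
      QueryResponse rsp = client.query(q);
      System.out.println("hits: " + rsp.getResults().getNumFound());
    }
  }
}
-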



Re: Getting dynamic fields using LukeRequest.

2016-08-09 Thread Steve Rowe
Not sure what the issue is with LukeRequest, but Solrj has Schema API support: 
<http://lucene.apache.org/solr/6_1_0/solr-solrj/org/apache/solr/client/solrj/request/schema/SchemaRequest.DynamicFields.html>

You can see which options are supported here: 
<https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-ListDynamicFields>

--
Steve
www.lucidworks.com

> On Aug 9, 2016, at 8:52 AM, Pranaya Behera  wrote:
> 
> Hi,
> I have the following script to retrieve all the fields in the collection. 
> I am using SolrCloud 6.1.0.
> LukeRequest lukeRequest = new LukeRequest();
> lukeRequest.setNumTerms(0);
> lukeRequest.setShowSchema(false);
> LukeResponse lukeResponse = lukeRequest.process(cloudSolrClient);
> Map<String, LukeResponse.FieldInfo> fieldInfoMap = lukeResponse.getFieldInfo();
> for (Map.Entry<String, LukeResponse.FieldInfo> entry : fieldInfoMap.entrySet()) {
>   entry.getKey(); // Here fieldInfoMap has size 0 some of the time, and at other times it contains incomplete data.
> }
> 
> 
> Setting showSchema to true doesn't yield any result. Only making it false 
> yields results, and even then the data is incomplete. As I can see in the 
> documents, there are more fields than it reports.
> 
> LukeRequest hits /solr/product/admin/luke?numTerms=0&wt=javabin&version=2 
> HTTP/1.1 .
> 
> How it should be configured for solrcloud ?
> I have already mentioned
> 
> <requestHandler name="..." class="org.apache.solr.handler.admin.LukeRequestHandler" />
> 
> in the solrconfig.xml. It doesn't matter whether it is present in the 
> solrconfig or not as I am requesting it from solrj.
> 
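
A sketch of the Schema API route Steve points to above, using SchemaRequest.DynamicFields from SolrJ against SolrCloud. The ZooKeeper address and collection name are only placeholders, and a 6.x SolrJ client is assumed:

-
import java.util.Map;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class ListDynamicFields {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client =
        new CloudSolrClient.Builder().withZkHost("localhost:9983").build()) {
      client.setDefaultCollection("product");

      // Equivalent of GET /solr/product/schema/dynamicfields
      SchemaResponse.DynamicFieldsResponse rsp =
          new SchemaRequest.DynamicFields().process(client);
      for (Map<String, Object> df : rsp.getDynamicFields()) {
        System.out.println(df.get("name") + " -> " + df.get("type"));
      }
    }
  }
}
-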



Re: How can I set the defaultOperator to be AND?

2016-09-05 Thread Steve Rowe
Hi Bast, 

Good to know you got it to work - thanks for letting us know!

--
Steve
www.lucidworks.com

> On Sep 2, 2016, at 4:30 AM, Bastien Latard | MDPI AG 
>  wrote:
> 
> Thanks Steve for your advice (i.e.: upgrade to Solr 6.2).
> I finally had time to upgrade and can now use "&q.op=AND" together with "&q=a 
> OR b" and this works as expected.
> 
> I even defined the following line in the defaults settings in the 
> requestHandler, to overwrite the default behavior:
> <str name="q.op">AND</str>
> 
> Issue fixed :)
> 
> Kind regards,
> Bast
> 
> On 05/08/2016 14:57, Bastien Latard | MDPI AG wrote:
>> Hi Steve,
>> 
>> I read the thread you sent me (SOLR-8812) and it seems that the 6.1 includes 
>> this fix, as you said.
>> I will upgrade.
>> Thank you!
>> 
>> Kind regards,
>> Bast
>> 
>> On 05/08/2016 14:37, Steve Rowe wrote:
>>> Hi Bastien,
>>> 
>>> Have you tried upgrading to 6.1?  SOLR-8812, mentioned earlier in the 
>>> thread, was released with 6.1, and is directly aimed at fixing the problem 
>>> you are having in 6.0 (also a problem in 5.5): when mm is not explicitly 
>>> provided and the query contains explicit operators (except for AND), 
>>> edismax now sets mm=0.
>>> 
>>> -- 
>>> Steve
>>> www.lucidworks.com
>>> 
>>>> On Aug 5, 2016, at 2:34 AM, Bastien Latard | MDPI AG 
>>>>  wrote:
>>>> 
>>>> Hi Eric & others,
>>>> Is there any way to overwrite the default OP when we use edismax?
>>>> Because adding the following line to solrconfig.xml doesn't solve the 
>>>> problem:
>>>> 
>>>> 
>>>> (Then if I do "q=black OR white", this always gives the results for "black 
>>>> AND white")
>>>> 
>>>> I did not find a way to define a default OP, which is automatically 
>>>> overwritten by the AND/OR from a query.
>>>> 
>>>> 
>>>> Example - Debug: defaultOP in solrconfig = AND / q=a or b
>>>> 
>>>> 
>>>> ==> results for black AND white
>>>> The correct result should be the following (but I had to force the q.op):
>>>> 
>>>> ==> I cannot do this in case I want to do "(a AND b) OR c"...
>>>> 
>>>> 
>>>> Kind regards,
>>>> Bastien
>>>> 
>>>> On 27/04/2016 05:30, Erick Erickson wrote:
>>>>> Defaulting to "OR" has been the behavior since forever, so changing the 
>>>>> behavior now is just not going to happen. Making it fit a new version of 
>>>>> "correct" will change the behavior for every application out there that 
>>>>> has not specified the default behavior.
>>>>> 
>>>>> There's no a-priori reason to expect "more words to equal fewer docs", I 
>>>>> can just as easily argue that "more words should return more docs". Which 
>>>>> you expect depends on your mental model.
>>>>> 
>>>>> And providing the default op in your solrconfig.xml request handlers 
>>>>> allows you to implement whatever model your application chooses...
>>>>> 
>>>>> Best,
>>>>> Erick
>>>>> 
>>>>> On Mon, Apr 25, 2016 at 11:32 PM, Bastien Latard - MDPI AG 
>>>>>  wrote:
>>>>> Thank you Shawn, Jan and Georg for your answers.
>>>>> 
>>>>> Yes, it seems that if I simply remove the defaultOperator it works well 
>>>>> for "composed queries" like '(a:x AND b:y) OR c:z'.
>>>>> But I think that the default Operator should/could be the AND.
>>>>> 
>>>>> Because when I add an extra search word, I expect that the results get 
>>>>> more accurate...
>>>>> (It seems to be what google is also doing now)
>>>>> 
>>>>> Otherwise, if you make a search and apply another filter (e.g.: sort by 
>>>>> publication date, facets, ...) , user can get the less relevant item 
>>>>> (only 1 word in 4 matches) in first position only because of its date...
>>>>> 
>>>>> What do you think?
>>>>> 
>>>>> 
>>>>> Kind regards,
>>>>> Bastien
>>>>> 
>>>>> 
>>>>> On 25/04/2016 14:53, Sha

Re: Tutorial not working for me

2016-09-19 Thread Steve Rowe
In the data driven configset, autoguessing text fields as the “strings” field 
type is intended to enable faceting.  The catch-all _text_ field enables search 
on all fields, but this may not be a good alternative to fielded search. 

I’m going to start working on updating the quick start tutorial - nobody has 
updated it since 5.0 AFAICT.

--
Steve
www.lucidworks.com

> On Sep 16, 2016, at 8:34 PM, Chris Hostetter  wrote:
> 
> 
> : I apologize if this is a really stupid question. I followed all
> 
> It's not a stupid question, the tutorial is completely broken -- and for 
> that matter, in my opinion, the data_driven_schema_configs used by that 
> tutorial (and recommended for new users) are largely useless for the same 
> underlying reason...
> 
> https://issues.apache.org/jira/browse/SOLR-9526
> 
> Thank you very much for asking about this - hopefully the folks who 
> understand this more (and don't share my opinion that the entire concept 
> of data_driven schemas are a terrible idea) can chime in and explain WTF 
> is going on here)
> 
> 
> -Hoss
> http://www.lucidworks.com/



Re: Tutorial not working for me

2016-09-19 Thread Steve Rowe
Hi Alex,

Sure - I assume you mean independently from SOLR-9526 and SOLR-6871?

--
Steve
www.lucidworks.com

> On Sep 19, 2016, at 12:40 PM, Alexandre Rafalovitch  
> wrote:
> 
> On 19 September 2016 at 23:37, Steve Rowe  wrote:
>> I’m going to start working on updating the quick start tutorial - nobody has 
>> updated it since 5.0 AFAICT.
> 
> Is that something that's worth discussing in a group/JIRA/etc?
> 
> Regards,
>   Alex.
> 
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/



Re: Tutorial not working for me

2016-09-19 Thread Steve Rowe
For now, I was thinking of making it reflect current reality as much as 
possible, without changing coverage.

--
Steve
www.lucidworks.com

> On Sep 19, 2016, at 1:13 PM, Alexandre Rafalovitch  wrote:
> 
> Whatever works. If JIRA, SOLR-6871 is probably a reasonable place.
> Depends on the scope of "updating" you want to do.
> 
> Regards,
>   Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
> 
> 
> On 20 September 2016 at 00:02, Steve Rowe  wrote:
>> Hi Alex,
>> 
>> Sure - I assume you mean independently from SOLR-9526 and SOLR-6871?
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Sep 19, 2016, at 12:40 PM, Alexandre Rafalovitch  
>>> wrote:
>>> 
>>> On 19 September 2016 at 23:37, Steve Rowe  wrote:
>>>> I’m going to start working on updating the quick start tutorial - nobody 
>>>> has updated it since 5.0 AFAICT.
>>> 
>>> Is that something that's worth discussing in a group/JIRA/etc?
>>> 
>>> Regards,
>>>  Alex.
>>> 
>>> 
>>> Newsletter and resources for Solr beginners and intermediates:
>>> http://www.solr-start.com/
>> 



Re: Problem with Han character in ICUFoldingFilter

2016-10-30 Thread Steve Rowe
Among several other foldings, ICUFoldingFilter performs the Unicode NFC 
transform, which consists of canonical decomposition (NFD) followed by 
canonical composition.  NFD transforms U+FA04 to U+5B85, and canonical 
composition leaves U+5B85 as-is.

U+FA04 is in the “Pronunciation variants from KS X 1001:1998” sub-block - KS X 
1001 is a Korean encoding standard - in the "CJK Compatibility Ideographs" 
block <http://www.unicode.org/charts/PDF/UF900.pdf>.  I don’t know why these 
variants were included in Unicode, but the NFD transform includes the 
compatibility->canonical tranform, so it’s likely many other compatibility 
characters in your data will be affected, not just this one.  If the 
compatibility->canonical tranform is problematic, why are you using 
ICUFoldingFilter?

If you like some of the foldings included in ICUFoldingFilter but not others, 
check out the “gennorm2” and “gen-utr30-data-files” targets in the Lucene/Solr 
source code at lucene/analysis/icu/build.xml - you could build and use a 
modified binary tranform data file - this file is distributed as part of the 
lucene-analyzers-icu jar at org/apache/lucene/analysis/icu/utr30.nrm.
 
--
Steve
www.lucidworks.com

> On Oct 30, 2016, at 10:29 AM, Ahmet Arslan  wrote:
> 
> Hi Eyal,
> 
> ICUFoldingFilter uses http://site.icu-project.org under the hood.
> If you think there is a bug, it is better to ask its mailing list.
> 
> Ahmet
> 
> 
> 
> On Sunday, October 30, 2016 3:41 PM, "eyal.naam...@exlibrisgroup.com" 
>  wrote:
> Hi,
> 
> I was wondering if anyone ran into the following issue, or a similar one:
> In Han script there are two separate characters - 宅 (FA04) and 宅 (5B85).
> It seems that ICUFoldingFilter converts FA04 to 5B85, which results in the 
> wrong character being indexed.
> Does anyone have any idea if and how this can be resolved? Is there an option 
> to add an exception rule to ICUFoldingFilter?
> Thanks,
> Eyal
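
A quick way to see the decomposition Steve describes above, using only the JDK (an illustration, not something from the thread):

-
import java.text.Normalizer;

public class Fa04Normalization {
  public static void main(String[] args) {
    String original = "\uFA04";
    String nfd = Normalizer.normalize(original, Normalizer.Form.NFD);
    String nfc = Normalizer.normalize(original, Normalizer.Form.NFC);
    // Expected: both print 5b85, because U+FA04 has a singleton canonical
    // decomposition to U+5B85 and canonical composition does not map it back.
    System.out.printf("NFD: %04x%n", (int) nfd.charAt(0));
    System.out.printf("NFC: %04x%n", (int) nfc.charAt(0));
  }
}
-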



Re: Issue with SynonymGraphFilterFactory

2017-06-29 Thread Steve Rowe
Hi Diogo,

That sounds like a bug to me.  Would you mind filing a JIRA?

--
Steve
www.lucidworks.com

> On Jun 29, 2017, at 4:46 PM, diogo  wrote:
> 
> I just checked debug=query
> 
> Seems like the spanNearQuery function is getting the slop parameter as 0, no
> matter what comes after the tilde:
> 
> "parsedquery":"SpanNearQuery(spanNear([laudo:mother,
> spanOr([laudo:hipoatenuaca, laudo:hipodens])], 0, true))"
> 
> For searching: "mother grandmother"~8 or "mother grandmother"~1000
> 
> synonyms.txt has: 
> mother, grand mother
> 
> When I search for words whose synonyms are not multi-word, MultiPhraseQuery
> is used, instead of SpanNearQuery:
> "MultiPhraseQuery(laudo:\"father (grandfather granddad)\"~10)"
> 
> synonyms.txt has:
> grandfather, granddad
> 
> Is there a way to change the slop in the first case with the Solr API?
> 
> Thanks
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Issue-with-SynonymGraphFilterFactory-tp4343400p4343544.html
> Sent from the Solr - User mailing list archive at Nabble.com.



custom search component process method not called

2017-06-30 Thread Steve Pruitt
I have a custom search component that registered in the last-components list 
for /select.  The component does some bookkeeping.  I got it working using a 
helloworld core using one of the example datasets.
I have a few logging statements to monitor the custom processing.  I have the 
jar with my components in the default server/solr/lib folder.
I created two new cores for my real datasets.  They are very small, around 60 
documents.
I duplicated the custom helloworld solrconfig.xml parts in two new core 
solrconfig.xml files.

I executed a /select on one of the new cores and nothing happened.  The 
init(...) function in my custom component for all three cores is executed ok.  
But, the process(...) and prepare(...) methods are never executed.
I retried the helloworld core and it works fine.

I can't determine why it doesn't work for the two new cores, the process method 
that is.

Why does the init method get called, but not the process method.  The prepare 
method is not called also.

The /select entry request handler config is:




  explicit
  10


   saveStateComponent




   savedState
   ${solr.solr.home}
   ${solr.core.name}



Thanks in advance.

-S
Steve Pruitt
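
For reference (not part of Steve's message), a stripped-down last-components component against the Solr 6.x API looks roughly like this; the class name matches the config above, but the logger and the "stateDir" init parameter are purely illustrative:

-
import java.io.IOException;

import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SaveStateComponent extends SearchComponent {
  private static final Logger log = LoggerFactory.getLogger(SaveStateComponent.class);
  private String stateDir;

  @Override
  public void init(NamedList args) {
    // Called once per core when the core loads, which is why init() can fire
    // even when prepare()/process() never do.
    stateDir = (String) args.get("stateDir");   // illustrative init parameter name
    log.info("SaveStateComponent init, stateDir={}", stateDir);
  }

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    log.info("SaveStateComponent prepare, core={}", rb.req.getCore().getName());
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // Bookkeeping goes here; runs for every request served by a handler
    // that lists this component (e.g. in last-components).
    log.info("SaveStateComponent process, q={}", rb.getQueryString());
  }

  @Override
  public String getDescription() {
    return "example bookkeeping component";
  }

  @Override
  public String getSource() {
    return null;
  }
}
-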


RE: [EXTERNAL] - Re: custom search component process method not called

2017-06-30 Thread Steve Pruitt
Sigh.  Best way to find a stupid mistake is post it.

I had this...



  explicit
  10

saveStateComponent
  
  


Instead of this




  explicit
  10


saveStateComponent


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, June 30, 2017 1:26 PM
To: solr-user
Subject: [EXTERNAL] - Re: custom search component process method not called

I believe the init bit is called on startup, so that leaves the second part of 
your puzzle. I find this a bit suspicious though:

   ${solr.core.name}

Are you sure this is different for all three cores? My thought is that the 
component is being called for all three cores but it's hard to see b/c the name 
is the same.

Total guess though.

Erick

On Fri, Jun 30, 2017 at 10:08 AM, Steve Pruitt  wrote:
> I have a custom search component that registered in the last-components list 
> for /select.  The component does some bookkeeping.  I got it working using a 
> helloworld core using one of the example datasets.
> I have a few logging statements to monitor the custom processing.  I have the 
> jar with my components in the default server/solr/lib folder.
> I created two new cores for my real datasets.  They are very small, around 60 
> documents.
> I duplicated the custom helloworld solrconfig.xml parts in two new core 
> solrconfig.xml files.
>
> I executed a /select on one of the new cores and nothing happened.  The 
> init(...) function in my custom component for all three cores is executed ok. 
>  But, the process(...) and prepare(...) methods are never executed.
> I retried the helloworld core and it works fine.
>
> I can't determine why it doesn't work for the two new cores, the process 
> method that is.
>
> Why does the init method get called, but not the process method.  The prepare 
> method is not called also.
>
> The /select entry request handler config is:
>
> 
> 
> 
>   explicit
>   10
> 
> 
>saveStateComponent
> 
> 
>
>  class="mycomponent.SaveStateComponent">
>savedState
>${solr.solr.home}
>${solr.core.name}
> 
>
>
> Thanks in advance.
>
> -S
> Steve Pruitt


RE: [EXTERNAL] - Re: custom search component process method not called

2017-06-30 Thread Steve Pruitt
It works ok for me now.  Both cores execute as expected.
The ${solr.core.name} resolves correctly for both cores.  As in I get the right 
name for each. Is it still something I shouldn't do?


-S

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Friday, June 30, 2017 1:34 PM
To: solr-user
Subject: [EXTERNAL] - Re: custom search component process method not called


: 
: I believe the init bit is called on startup, so that leaves the second
: part of your puzzle. I find this a bit suspicious though:
: 
:${solr.core.name}
: 
: Are you sure this is different for all three cores? My thought is that
: the component is being called for all three cores but it's hard to see
: b/c the name is the same.


solr.core.name is one of the implicit properties solr provides -- it should be 
impossible for 2 cores to have the same value...

https://lucene.apache.org/solr/guide/6_6/configuring-solrconfig-xml.html#Configuringsolrconfig.xml-ImplicitCoreProperties

Steve: can you share with us more details of what exactly your SearchComponent 
code looks like (or prune it down to a really trivial example w/only some 
logging line) and your entire solrconfig.xml for a problematic core?


: 
: Total guess though.
: 
: Erick
: 
: On Fri, Jun 30, 2017 at 10:08 AM, Steve Pruitt  wrote:
: > I have a custom search component that registered in the last-components 
list for /select.  The component does some bookkeeping.  I got it working using 
a helloworld core using one of the example datasets.
: > I have a few logging statements to monitor the custom processing.  I have 
the jar with my components in the default server/solr/lib folder.
: > I created two new cores for my real datasets.  They are very small, around 
60 documents.
: > I duplicated the custom helloworld solrconfig.xml parts in two new core 
solrconfig.xml files.
: >
: > I executed a /select on one of the new cores and nothing happened.  The 
init(...) function in my custom component for all three cores is executed ok.  
But, the process(...) and prepare(...) methods are never executed.
: > I retried the helloworld core and it works fine.
: >
: > I can't determine why it doesn't work for the two new cores, the process 
method that is.
: >
: > Why does the init method get called, but not the process method.  The 
prepare method is not called also.
: >
: > The /select entry request handler config is:
: >
: > 
: > 
: > 
: >   explicit
: >   10
: > 
: > 
: >saveStateComponent
: > 
: > 
: >
: > 
: >    savedState
: >${solr.solr.home}
: >${solr.core.name}
: > 
: >
: >
: > Thanks in advance.
: >
: > -S
: > Steve Pruitt
: 

-Hoss
http://www.lucidworks.com/

