Re: schemaless vs schema based core

2016-01-22 Thread Steve Rowe
Yes, and also underflow in the case of double/float.

--
Steve
www.lucidworks.com

> On Jan 22, 2016, at 12:25 PM, Shyam R  wrote:
> 
> I think schema-less mode might allocate double instead of float, and long
> instead of int, to guard against overflow, which increases index size. Is my
> assumption valid?
> 
> Thanks
> 
> 
> 
> 
> On Thu, Jan 21, 2016 at 10:48 PM, Erick Erickson 
> wrote:
> 
>> I guess it's all about whether schemaless really supports
>> 1> all the docs you index.
>> 2> all the use-cases for search.
>> 3> the assumptions it makes scale to your needs.
>> 
>> If you've established rigorous tests and schemaless does all of the
>> above, I'm all for shortening the cycle by using schemaless.
>> 
>> But if it's just being sloppy and "success" is "I managed to index 50
>> docs and get some results back by searching", expect to find some
>> "interesting" issues down the road.
>> 
>> And finally, if it's "we use schemaless to quickly try things in the
>> UI and for the _real_ prod environment we need to be more rigorous
>> about the schema", well shortening development time is A Good Thing.
>> Part of moving to prod could be taking the schema generated by
>> schemaless and tweaking it for instance.
>> 
>> Best,
>> Erick
>> 
>> On Thu, Jan 21, 2016 at 8:54 AM, Shawn Heisey  wrote:
>>> On 1/21/2016 2:22 AM, Prateek Jain J wrote:
 Thanks Erick,
 
 Yes, I took the same approach as suggested by you. The issue is that some
>> developers started with schemaless configuration and now they have started
>> liking it and avoiding restrictions (including increased time to deploy
>> application, in a managed enterprise environment). I was more concerned about
>> pushing best practices around this in the team, because allowing anyone to add new
>> attributes will become an overhead in terms of management, security and
>> maintainability. Regarding your concern about not storing documents on a
>> separate disk; we are storing them in Solr but not as backup copies. One
>> doubt still remains in my mind w.r.t. auto-detection of types in Solr:
 
 Is there a performance benefit of using defined types (schema based)
>> vs un-defined types while adding documents? Does "solrj" ship this
>> meta-information like the type of attributes to Solr, because the code looks
>> something like this?
 
 SolrInputDocument doc = new SolrInputDocument();
  doc.addField("category", "book"); // String
  doc.addField("id", 1234); //Long
  doc.addField("name", "Trying solrj"); //String
 
 In my opinion, any auto-detector code will have some overhead vs the
>> other; any thoughts around this?
>>> 
>>> Although the true reality may be more complex, you should consider that
>>> everything Solr receives from SolrJ will be text -- as if you had sent
>>> the JSON or XML indexing format manually, which has no type information.
>>> 
>>> When you are building a document with SolrInputDocument, SolrJ has no
>>> knowledge of the schema in Solr.  It doesn't know whether the target
>>> field is numeric, string, date, or something else.
>>> 
>>> Using different object types for input to SolrJ just gives you general
>>> Java benefits -- things like detecting certain programming errors at
>>> compile time.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>> 
> 
> 
> 
> -- 
> Ph: 9845704792
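
To make Shawn's point concrete: for schema purposes, the SolrJ snippet above is equivalent to posting an XML update message like the one below. Nothing in it marks "id" as numeric - the schema alone decides how each value is interpreted. (A sketch; as Shawn notes, the wire reality is more complex, but the conclusion is the same.)

  <add>
    <doc>
      <field name="category">book</field>
      <field name="id">1234</field>
      <field name="name">Trying solrj</field>
    </doc>
  </add>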



Re: How to convert string field to date

2016-01-28 Thread Steve Rowe
Hi Sreenivasa,

This is a known bug: https://issues.apache.org/jira/browse/SOLR-8607

(though the problem is not just about catch-all fields as the issue currently 
indicates - all dynamic fields are affected)

Two workarounds (neither tested):

1. Add attr_date via add-dynamic-field instead of add-field (even though the 
name has no asterisk)
2. Remove the attr_* dynamic field, add attr_date, then add attr_* back; these 
can be done with a single request.

I’ll update SOLR-8607 to reflect these things.

--
Steve
www.lucidworks.com

> On Jan 28, 2016, at 3:58 PM, Kallu, Sreenivasa (HQP) 
>  wrote:
> 
> Hi,
>   I am new to solr.
> 
> I am using managed-schema. I am not using schema.xml.  I am indexing Outlook 
> email messages.
> I can only see three fields (id, _version_, _text_) defined in 
> managed-schema. The remaining fields are
> handled by the following dynamic field:
> <dynamicField name="attr_*" ... stored="true" multiValued="true"/>
> 
> I have a field named attr_date with type string. I want to convert this field type 
> to date. Currently date range queries are not
> working on this field. I tried the schema API to add a new field attr_date and got 
> the following error message
> "Field 'attr_date' already exists".  I tried to replace the field type with date 
> and got the following error message
> "The field 'attr_date' is not present in this schema, and so cannot be 
> replaced".
> 
> Please help me to convert "attr_date"  field type to date.
> 
> Advanced Thanks.
> --sreenivasa kallu
> 
> 
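
For readers hitting the same problem: workaround 2 can be issued as a single Schema API request along these lines (untested, like the workarounds above; the core name is taken from later in the thread, and the "date" type and attr_* attributes are assumptions - match them to your schema):

  curl -X POST -H 'Content-type:application/json' \
    http://localhost:8983/solr/sreenimsg/schema --data-binary '{
      "delete-dynamic-field": { "name": "attr_*" },
      "add-field":            { "name": "attr_date", "type": "date", "indexed": true, "stored": true },
      "add-dynamic-field":    { "name": "attr_*", "type": "string", "indexed": true, "stored": true, "multiValued": true }
    }'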



Re: How to convert string field to date

2016-01-28 Thread Steve Rowe
Try workaround 2, I did and it worked for me.  See my comment on the issue: 
<https://issues.apache.org/jira/browse/SOLR-8607?focusedCommentId=15122751&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15122751>

--
Steve
www.lucidworks.com

> On Jan 28, 2016, at 6:45 PM, Kallu, Sreenivasa (HQP) 
>  wrote:
> 
> Thanks Steve for the prompt response.
> 
> I tried workaround one. 
> i.e.  1. Add attr_date via add-dynamic-field instead of add-field (even 
> though the name has no asterisk)
> 
> I am able to add the dynamic field attr_date. But while starting Solr, I am 
> getting the following message:
> Could not load conf for core sreenimsg: Dynamic field name 'attr_date' should 
> have either a leading or a trailing asterisk, and no others.
> 
> So Solr is looking for either a leading * or a trailing * in the dynamic field name.
> 
> I can see similar problems in workaround 2.
> 
> Any other suggestions?
> 
> Advanced Thanks.
> --sreenivasa kallu
> 
> -Original Message-
> From: Steve Rowe [mailto:sar...@gmail.com] 
> Sent: Thursday, January 28, 2016 1:17 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to convert string field to date
> 
> Hi Sreenivasa,
> 
> This is a known bug: https://issues.apache.org/jira/browse/SOLR-8607
> 
> 
> (though the problem is not just about catch-all fields as the issue currently 
> indicates - all dynamic fields are affected)
> 
> Two workarounds (neither tested):
> 
> 1. Add attr_date via add-dynamic-field instead of add-field (even though the 
> name has no asterisk)
> 2. Remove the attr_* dynamic field, add attr_date, then 
> add attr_* back; these can be done with a single request.
> 
> I’ll update SOLR-8607 to reflect these things.
> 
> --
> Steve
> www.lucidworks.com
> 
>> On Jan 28, 2016, at 3:58 PM, Kallu, Sreenivasa (HQP) 
>>  wrote:
>> 
>> Hi,
>>  I am new to solr.
>> 
>> I am using managed-schema. I am not using schema.xml.  I am indexing Outlook 
>> email messages.
>> I can only see three fields (id, _version_, _text_) defined in 
>> managed-schema. The remaining fields are handled by the following dynamic 
>> field: <dynamicField name="attr_*" ... stored="true" multiValued="true"/>
>> 
>> I have a field named attr_date with type string. I want to convert this 
>> field type to date. Currently date range queries are not working on this field. 
>> I tried the schema API to add a new field attr_date and got the following error 
>> message "Field 'attr_date' already exists".  I tried to replace the field type 
>> with date and got the following error message "The field 'attr_date' is not 
>> present in this schema, and so cannot be replaced".
>> 
>> Please help me to convert "attr_date"  field type to date.
>> 
>> Advanced Thanks.
>> --sreenivasa kallu
>> 
>> 
> 



Re: Failed to set SSL solr 5.2.1 Windows OS

2016-03-08 Thread Steve Rowe
Hi Ilan,

Looks like you’re modifying solr.in.sh instead of solr.in.cmd?

FYI running under Cygwin is not supported.

--
Steve
www.lucidworks.com

> On Mar 8, 2016, at 11:51 AM, Ilan Schwarts  wrote:
> 
> Hi all, I am trying to integrate solr with SSL on Windows 7 OS
> I followed the enable ssl guide at
> https://cwiki.apache.org/confluence/display/solr/Enabling+SSL
> 
> I created the keystore and placed it in the etc folder. I un-commented the
> lines and set:
> SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
> SOLR_SSL_KEY_STORE_PASSWORD=password
> SOLR_SSL_TRUST_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
> SOLR_SSL_TRUST_STORE_PASSWORD=password
> SOLR_SSL_NEED_CLIENT_AUTH=false
> 
> When I test the keystore using
> keytool -list -alias solr-ssl -keystore
> C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks -storepass password -keypass
> password
> it is okay, and prints that there is 1 entry in the keystore.
> 
> When I run it from Solr, it writes:
> "Keystore was tampered with, or password was incorrect"
> I get this exception after JavaKeyStore.engineLoad(JavaKeyStore.java:780)
> 
> 
> If I replace
> SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks with
> SOLR_SSL_KEY_STORE=NOTHING_REALISTIC
> it writes the same error; I suspect I am not passing the path as it
> should be.
> 
> Any suggestions ?
> 
> Thanks
> 
> 
> -- 
> 
> 
> -
> Ilan Schwarts



Re: Failed to set SSL solr 5.2.1 Windows OS

2016-03-08 Thread Steve Rowe
Hmm, not sure what’s happening.  Have you tried converting the backslashes in 
your paths to forward slashes?

--
Steve
www.lucidworks.com

> On Mar 8, 2016, at 3:39 PM, Ilan Schwarts  wrote:
> 
> Hi, thanks for reply.
> I am using solr.in.cmd
> I even put some pauses in the cmd with echo to see that the parameters are OK. 
> This is the original file as found in 
> https://www.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.zip
> 
> 
> 
> On Tue, Mar 8, 2016 at 10:25 PM, Steve Rowe  wrote:
> Hi Ilan,
> 
> Looks like you’re modifying solr.in.sh instead of solr.in.cmd?
> 
> FYI running under Cygwin is not supported.
> 
> --
> Steve
> www.lucidworks.com
> 
> > On Mar 8, 2016, at 11:51 AM, Ilan Schwarts  wrote:
> >
> > Hi all, I am trying to integrate solr with SSL on Windows 7 OS
> > I followed the enable ssl guide at
> > https://cwiki.apache.org/confluence/display/solr/Enabling+SSL
> >
> > I created the keystore and placed in on etc folder. I un-commented the
> > lines and set:
> > SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
> > SOLR_SSL_KEY_STORE_PASSWORD=password
> > SOLR_SSL_TRUST_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
> > SOLR_SSL_TRUST_STORE_PASSWORD=password
> > SOLR_SSL_NEED_CLIENT_AUTH=false
> >
> > When i test the storekey using
> > keytool -list -alias solr-ssl -keystore
> > C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks -storepass password -keypass
> > password
> > It is okay, and print me there is 1 entry in keystore.
> >
> > When i am running in from solr, it will write:
> > "Keystore was tampered with, or password was incorrect"
> > I get this exception after JavaKeyStore.engineLoad(JavaKeyStore.java:780)
> >
> >
> > If i replace
> > SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks with
> > SOLR_SSL_KEY_STORE=NOTHING_REALISTIC
> > it will write the same error, i suspect i dont deliver the path as it
> > should be.
> >
> > Any suggestions ?
> >
> > Thanks
> >
> >
> > --
> >
> >
> > -
> > Ilan Schwarts
> 
> 
> 
> 
> -- 
> 
> 
> -
> Ilan Schwarts
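
For reference, the relevant solr.in.cmd lines with the forward-slash suggestion applied would look roughly like this (a sketch; the paths and passwords are the ones from Ilan's mail, not recommendations):

  set SOLR_SSL_KEY_STORE=C:/solr-5.2.1/server/etc/solr-ssl.keystore.jks
  set SOLR_SSL_KEY_STORE_PASSWORD=password
  set SOLR_SSL_TRUST_STORE=C:/solr-5.2.1/server/etc/solr-ssl.keystore.jks
  set SOLR_SSL_TRUST_STORE_PASSWORD=password
  set SOLR_SSL_NEED_CLIENT_AUTH=false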



Re: Failed to set SSL solr 5.2.1 Windows OS

2016-03-09 Thread Steve Rowe
So, did you try converting the backslashes to forward slashes?

You could try to increase logging to get more information: 
<http://eclipse.org/jetty/documentation/current/configuring-logging.html>

Can you provide a larger snippet of your log around the error?

Sounds like at a minimum Solr could do better at reporting errors 
locating/loading SSL stores.

Yes, the files in server/etc are being used in solr 5.2.1.

--
Steve
www.lucidworks.com

> On Mar 9, 2016, at 2:14 AM, Ilan Schwarts  wrote:
> 
> How would one try to solve this issue? What would you suggest I do?
> Debug that module? I will first try to install a clean Jetty with SSL only.
> 
> Another question: are the files jetty.xml\jetty-ssl.xml and the rest of the files
> in /etc being used in Solr 5.2.1?
> On Mar 9, 2016 12:08 AM, "Steve Rowe"  wrote:
> 
>> Hmm, not sure what’s happening.  Have you tried converting the backslashes
>> in your paths to forward slashes?
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Mar 8, 2016, at 3:39 PM, Ilan Schwarts  wrote:
>>> 
>>> Hi, thanks for reply.
>>> I am using solr.in.cmd
>>> I even put some pause in the cmd with echo to see the parameters are
>> ok.. This is the original file as found in
>> https://www.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.zip
>>> 
>>> 
>>> 
>>> On Tue, Mar 8, 2016 at 10:25 PM, Steve Rowe  wrote:
>>> Hi Ilan,
>>> 
>>> Looks like you’re modifying solr.in.sh instead of solr.in.cmd?
>>> 
>>> FYI running under Cygwin is not supported.
>>> 
>>> --
>>> Steve
>>> www.lucidworks.com
>>> 
>>>> On Mar 8, 2016, at 11:51 AM, Ilan Schwarts  wrote:
>>>> 
>>>> Hi all, I am trying to integrate solr with SSL on Windows 7 OS
>>>> I followed the enable ssl guide at
>>>> https://cwiki.apache.org/confluence/display/solr/Enabling+SSL
>>>> 
>>>> I created the keystore and placed in on etc folder. I un-commented the
>>>> lines and set:
>>>> SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
>>>> SOLR_SSL_KEY_STORE_PASSWORD=password
>>>> SOLR_SSL_TRUST_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
>>>> SOLR_SSL_TRUST_STORE_PASSWORD=password
>>>> SOLR_SSL_NEED_CLIENT_AUTH=false
>>>> 
>>>> When i test the storekey using
>>>> keytool -list -alias solr-ssl -keystore
>>>> C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks -storepass password
>> -keypass
>>>> password
>>>> It is okay, and print me there is 1 entry in keystore.
>>>> 
>>>> When i am running in from solr, it will write:
>>>> "Keystore was tampered with, or password was incorrect"
>>>> I get this exception after
>> JavaKeyStore.engineLoad(JavaKeyStore.java:780)
>>>> 
>>>> 
>>>> If i replace
>>>> SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks with
>>>> SOLR_SSL_KEY_STORE=NOTHING_REALISTIC
>>>> it will write the same error, i suspect i dont deliver the path as it
>>>> should be.
>>>> 
>>>> Any suggestions ?
>>>> 
>>>> Thanks
>>>> 
>>>> 
>>>> --
>>>> 
>>>> 
>>>> -
>>>> Ilan Schwarts
>>> 
>>> 
>>> 
>>> 
>>> --
>>> 
>>> 
>>> -
>>> Ilan Schwarts
>> 
>> 



Re: Paging and cursorMark

2016-03-22 Thread Steve Rowe
Hi Tom,

There is an outstanding JIRA issue to directly support what you want (with a 
patch even!) but no work on it recently: 
.  If you’re so inclined, 
please pitch in: bring the patch up-to-date, test it, contribute improvements, 
etc.

--
Steve
www.lucidworks.com

> On Mar 22, 2016, at 10:27 AM, Tom Evans  wrote:
> 
> Hi all
> 
> With Solr 5.5.0, we're trying to improve our paging performance. When
> we are delivering results using infinite scrolling, cursorMark is
> perfectly fine - one page is followed by the next. However, we also
> offer traditional paging of results, and this is where it gets a
> little tricky.
> 
> Say we have 10 results per page, and a user wants to jump from page 1
> to page 20, and then wants to view page 21, there doesn't seem to be a
> simple way to get the nextCursorMark. We can make an inefficient
> request for page 20 (start=190, rows=10), but we cannot give that
> request a cursorMark=* as it contains start=190.
> 
> Consequently, if the user clicks to page 21, we have to continue along
> using start=200, as we have no cursorMark. The only way I can see to
> get a cursorMark at that point is to omit the start=200, and instead
> say rows=210, and ignore the first 200 results on the client side.
> Obviously, this gets more and more inefficient the deeper we page - I
> know that internally to Solr, using start=200&rows=10 has to do the
> same work as rows=210, but less data is sent over the wire to the
> client.
> 
> As I understand it, the cursorMark is a hash of the sort values of the
> last document returned, so I don't really see why it is forbidden to
> specify start=190&rows=10&cursorMark=* - why is it not possible to
> calculate the nextCursorMark from the last document returned?
> 
> I was also thinking a possible temporary workaround would be to
> request start=190&rows=10, note the last document returned, and then
> make a subsequent query for q=id:"<id of last document>"&rows=1&cursorMark=*.
> This seems to work, but means an extra Solr query for no real reason.
> Is there any other problem to doing this?
> 
> Is there some other simple trick I am missing that we can use to get
> both the page of results we want and a nextCursorMark for the
> subsequent page?
> 
> Cheers
> 
> Tom
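
A sketch of Tom's proposed workaround as concrete requests (hypothetical collection, field, and id values). One caveat worth noting: the sort must not depend on score, because the id lookup scores documents differently than the original query and would yield a misleading cursor:

  # jump to page 20 the expensive way, noting the id of the last document
  curl 'http://localhost:8983/solr/products/select?q=widget&sort=price+asc,id+asc&start=190&rows=10'
  # fetch that one document with cursorMark=* to obtain a nextCursorMark
  curl 'http://localhost:8983/solr/products/select?q=id:"LAST_ID"&sort=price+asc,id+asc&rows=1&cursorMark=*'
  # use the returned nextCursorMark for page 21
  curl 'http://localhost:8983/solr/products/select?q=widget&sort=price+asc,id+asc&rows=10&cursorMark=NEXT_MARK'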



Re: Requesting to be added to ContributorsGroup

2016-05-03 Thread Steve Rowe
Welcome Sheece,

I’ve added you to the ContributorsGroup.

--
Steve
www.lucidworks.com

> On May 3, 2016, at 10:03 AM, Syed Gardezi  wrote:
> 
> Hello,
> I am a Master's student, as part of Free and Open Source Software 
> Development COMP8440 - http://programsandcourses.anu.edu.au/course/COMP8440 
> at the Australian National University. I have selected 
> http://wiki.apache.org/solr/ to contribute to. Kindly add me to the 
> ContributorsGroup. Thank you.
> 
> wiki username: sheecegardezi
> 
> Regards,
> Sheece
> 



Re: Supported languages

2015-08-04 Thread Steve Rowe
Hi Steve,

This page may be useful: 


In most cases the configurations described there are the only OOTB alternative, 
so optimality isn’t discussed.  I think the path most people take is to try 
those out, iterate with users who can provide feedback about quality, then if 
necessary investigate alternative solutions, including commercial ones.

Steve
www.lucidworks.com

> On Aug 4, 2015, at 12:55 PM, Steven White  wrote:
> 
> Hi Everyone,
> 
> I see Solr comes pre-configured with text analyzers for a list of supported
> languages e.g.: "text_ar", "text_bq", "text_ca", "text_cjk", "text_ckb",
> "text_cz", etc.
> 
> My questions are:
> 
> 1) How well optimized are those languages for general usage?  This is
> something I need help with because, other than English, I cannot judge how
> well the current pre-configured setting works for best quality.  Yes,
> "quality" means a different thing for each customer, but still I'm curious to
> know if the out-of-the-box setting is optimal.
> 
> 2) Is there a landing page that talks about each of the
> supported languages, what is available and how to tune that fieldType for
> the said language?
> 
> 3) What do you do when a language I need is not on the list?  The obvious
> answer is to write my own plug-in "fieldType" (or even customize an
> existing fieldType), but short of that, is there a "general" fieldType that
> can be used?  Even if it means this fieldType will function as if it is
> SQL's LIKE feature.
> 
> Thanks
> 
> Steve



Re: Indexing Fixed length file

2015-08-28 Thread Steve Rowe
Hi Tim,

I haven’t heard of people indexing this kind of input with Solr, but the format 
is quite similar to CSV/TSV files, with the exception that the field separators 
have fixed positions and are omitted.

You could write a short script to insert separators (e.g. commas) at these 
points (but be sure to escape quotation marks and the separators) and then use 
Solr’s CSV update functionality: 
.

I think dealing with fixed-width fields directly would be a nice addition to 
Solr’s CSV update capabilities - feel free to make an issue - see 
.

Steve
www.lucidworks.com

> On Aug 28, 2015, at 3:19 AM, timmsn  wrote:
> 
> Hello,
> 
> I use Solr 5.2.1 and the bin/post tool. I am trying to index some files
> that have a fixed length and no whitespace to separate the words.
> How can I program a template or something similar for my fields?
> Or can I edit the schema.xml for my problem?
> 
> This ist one record from one file, in this file are 40 - 100 records.
> 
> AB134364312   58553521789   245678923521234130311G11222345610711MUELLER,
> MAX -00014680Q1-24579021-204052667980002 EEUR  0223/123835062 
> 130445 
> 
> 
> Thanks! 
> 
> Tim
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Indexing-Fixed-length-file-tp4225807.html
> Sent from the Solr - User mailing list archive at Nabble.com.
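
One way to realize the "short script" idea in shell (a sketch, untested; the column widths and field names below are invented - Tim's real records have many more fields):

  #!/usr/bin/env bash
  # Slice each fixed-width record into fields, CSV-escape any embedded
  # quotes, wrap each field in quotes, and emit one CSV row per record.
  while IFS= read -r line; do
    f1=${line:0:13}      # e.g. record id, columns 1-13
    f2=${line:13:16}     # e.g. account number, columns 14-29
    f3=${line:29}        # remainder of the record
    f1=${f1//\"/\"\"}; f2=${f2//\"/\"\"}; f3=${f3//\"/\"\"}
    printf '"%s","%s","%s"\n' "$f1" "$f2" "$f3"
  done < records.txt > records.csv

  # Then load it with Solr's CSV update support, naming the columns:
  curl 'http://localhost:8983/solr/mycore/update?commit=true&header=false&fieldnames=id,acct,rest' \
    -H 'Content-type: application/csv' --data-binary @records.csv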



Re: ctargett commented on http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html

2015-09-21 Thread Steve Rowe
I logged into comments.a.o and then disabled emailing of comments to this
list.

When we set up the "solrcwiki" site on comments.apache.org, the requirement
was that the PMC chair be the (sole) manager, and though I am no longer
chair, I'm still the manager of the "solrcwiki" site for the ASF commenting
system.

Tomorrow I'll ask ASF Infra about whether the managership should be
transferred to the current PMC chair.  (If they don't care, I don't mind
continuing to manage it.)

On Mon, Sep 21, 2015 at 5:43 PM, Cassandra Targett 
wrote:

> Hey folks,
>
> I'm doing some experiments with other formats for the Ref Guide and playing
> around with options for comments. I didn't realize this old experiment from
> https://issues.apache.org/jira/browse/SOLR-4889 would send email - I'm
> talking to Steve Rowe to see if we can get that disabled.
>
> Cassandra
>
> On Mon, Sep 21, 2015 at 2:06 PM,  wrote:
>
> > Hello,
> > ctargett has commented on
> >
> http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html
> > .
> > You can find the comment here:
> >
> >
> http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html#comment_4535
> > Please note that if the comment contains a hyperlink, it must be
> > approved
> > before it is shown on the site.
> >
> > Below is the reply that was posted:
> > 
> > This is a test of the comments system.
> > 
> >
> > With regards,
> > Apache Solr Cwiki.
> >
> > You are receiving this email because you have subscribed to changes
> > for the solrcwiki site.
> > To stop receiving these emails, unsubscribe from the mailing list
> that
> > is providing these notifications.
> >
> >
>


Re: Can I use tokenizer twice ?

2015-10-14 Thread Steve Rowe
Hi,

Analyzers must have exactly one tokenizer, no more and no less.

You could achieve what you want by copying to another field and defining a 
separate analyzer for each.  One would create shingles, and the other edge 
ngrams.  

Steve

> On Oct 14, 2015, at 11:58 AM, vit  wrote:
> 
> I have Solr 4.2
> I need to do the following:
> 
> 1. white space tokenize
> 2. create shingles
> 3. use EdgeNGramFilter for each word in shingles, but not in a shingle as a
> string
> 
> So can I do this?
> 
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.ShingleFilterFactory"
>  maxShingleSize="4" outputUnigrams="false" outputUnigramsIfNoShingles="true"
> />
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.EdgeNGramFilterFactory" maxGramSize="25"/>
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Can-I-use-tokenizer-twice-tp4234438.html
> Sent from the Solr - User mailing list archive at Nabble.com.
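
A sketch of that copyField approach in schema.xml terms (untested; the field/type names and minGramSize are made up):

  <fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
              outputUnigrams="false" outputUnigramsIfNoShingles="true"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
    </analyzer>
  </fieldType>

  <field name="body_shingle" type="text_shingle" indexed="true" stored="false"/>
  <field name="body_edge" type="text_edge" indexed="true" stored="false"/>
  <copyField source="body" dest="body_shingle"/>
  <copyField source="body" dest="body_edge"/>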



Re: Tokenize ShingleFilterFactory results and apply filters to tokens

2015-10-19 Thread Steve Rowe
Hi Vitaliy,

I don’t know of any combination of built-in Lucene/Solr analysis components 
that would do what you want, but there used to be filter called 
ShingleMatrixFilter that (if I understand both that filter and what you want 
correctly), would do what you want, following an EdgeNGramFilter: 


It was deprecated in v3.1 and removed in v4.0 (see 
) because it wasn’t being 
maintained by the original creator and nobody else understood it :).  Uwe 
Schindler put up a patch that rewrote it and fixed some problems on 
, but that was never 
finished/committed.

What you want could create a huge number of terms, depending on the # of 
documents, terms in the field, and term length.  What do you want to use these 
terms for?

Steve

> On Oct 17, 2015, at 10:33 AM, vitaly bulgakov  wrote:
> 
> /why don't you put EdgeNGramFilter just after ShingleFilter?/
> 
> Because it will do Edge Ngrams over a shingle as a string:
> for "Home Improvement" shingle it will do:  Hom, Home, Home , Home I,
> Home Im, Home Imp .. 
> 
> But I need:
> ... Hom Imp, Hom Impr ..
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Tokenize-ShingleFilterFactory-results-and-apply-filters-to-tokens-tp4234574p4234872.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: contributor request

2015-11-02 Thread Steve Rowe
Yes, sorry, the wiki took so long to come back after changing it to include 
Alex’s username that I forgot to send notification…  Thanks Erick.
 
> On Oct 31, 2015, at 11:27 PM, Erick Erickson  wrote:
> 
> Looks like Steve added you today, you should be all set.
> 
> On Sat, Oct 31, 2015 at 12:50 PM, Alex  wrote:
>> Oh, shoot, forgot to include my wiki username. It's "AlexYumas" - sorry about
>> that, stupid me
>> 
>> On Sat, Oct 31, 2015 at 10:48 PM, Alex  wrote:
>> 
>>> Hi,
>>> 
>>> Please kindly add me to the Solr wiki contributors list. The app we're
>>> developing (Jitbit Help) is using Apache Solr to power our knowledge-base
>>> search engine, customers love it. (we were using MS Fulltext indexing
>>> service before, but it's a huge PITA).
>>> 
>>> Thanks
>>> 



Re: how to change uniqueKey?

2015-11-04 Thread Steve Rowe
Hi Oleksandr,

> On Nov 3, 2015, at 9:24 AM, Oleksandr Yermolenko  wrote:
> 
> Hello, All,
> 
> I can't find the way to change uniqueKey in "managed-schema" environment!!!

[…]

> 7. The first and the last question: the correct way changing uniqueKey in 
> schemaless environment? what I missed?

There is an open issue to provide this capability: 
https://issues.apache.org/jira/browse/SOLR-7242 but no work done on it yet.

Re: Solr 5: data_driven_schema_config's solrconfig causing error

2015-03-10 Thread Steve Rowe
Hi Aman,

The stack trace shows that the AddSchemaFieldsUpdateProcessorFactory specified 
in data_driven_schema_configs’s solrconfig.xml expects the “booleans” field 
type to exist.

Solr 5’s data_driven_schema_configs includes the “booleans” field type:

<http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_5_0_0/solr/server/solr/configsets/data_driven_schema_configs/conf/managed-schema?view=markup#l249>

So you must have removed it when you modified the schema?  Did you do this 
intentionally?  If so, why?

Steve
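
For reference, the definition there reads like this (quoted from memory of the 5.0 managed-schema - verify against your copy):

  <fieldType name="booleans" class="solr.BoolField" sortMissingLast="true" multiValued="true"/>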

> On Mar 10, 2015, at 5:25 AM, Aman Tandon  wrote:
> 
> Hi,
> 
> For the sake of using the new schema.xml and solrconfig.xml with Solr 5, I
> put my old required field types & field names (being used with Solr 4.8.1)
> into the schema.xml given in *basic_configs* & the configuration settings given
> in the solrconfig.xml present in *data_driven_schema_configs*, and I put
> these configuration files in the configs of ZooKeeper.
> 
> But when I am creating the core it gives the error that the booleans
> fieldType is not found in the schema. So correct me if I am doing something
> wrong.
> 
> ERROR - 2015-03-10 08:20:16.788; org.apache.solr.core.CoreContainer; Error
>> creating core [core1]: fieldType 'booleans' not found in the schema
>> org.apache.solr.common.SolrException: fieldType 'booleans' not found in
>> the schema
>> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:896)
>> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:662)
>> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:513)
>> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:488)
>> at
>> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:573)
>> at
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:197)
>> at
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
>> at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:736)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
>> at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>> at
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>> at
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>> at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>> at
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>> at
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>> at
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>> at
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>> at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>> at org.eclipse.jetty.server.Server.handle(Server.java:368)
>> at
>> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>> at
>> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>> at
>> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
>> at
>> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
>> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
>> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>> at
>> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>> at
>> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>> at
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>> at
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: org.apache.solr.common.SolrException: fieldType 'booleans' not
>> found in the schema
>> at
>> org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$TypeMapping.populateValueClasses(AddSchemaFieldsUpdateProcessorFactory.java:244)
>> at
>> org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory.inform(AddSchemaFieldsUpdateProcessorFactory.java:170)
>> at
>> org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.j

Re: Solr 5: data_driven_schema_config's solrconfig causing error

2015-03-11 Thread Steve Rowe
Hi Aman,

So you (randomly?) chose an example configset, commented out parts you didn’t 
understand, and now things don’t work?

… Maybe you should review the process you’re using?

Like, don’t start with a configset that will auto-populate the schema for you 
with guessed field types if you don’t want to do that.  (That’s the focus of 
the data_driven_schema_configs configset.)

AFAICT, what you’re trying to do is take a configset you’ve used in the past 
with an older version of Solr and get it to work with a newer Solr version.  If 
that’s so, perhaps you should start with a configset like 
sample_techproducts_configs?

Steve

> On Mar 11, 2015, at 1:05 PM, Aman Tandon  wrote:
> 
> I removed/commented it out as it was not understandable and not for our use.
> 
> With Regards
> Aman Tandon
> 
> On Tue, Mar 10, 2015 at 8:04 PM, Steve Rowe  wrote:
> 
>> Hi Aman,
>> 
>> The stack trace shows that the AddSchemaFieldsUpdateProcessorFactory
>> specified in data_driven_schema_configs’s solrconfig.xml expects the
>> “booleans” field type to exist.
>> 
>> Solr 5’s data_driven_schema_configs includes the “booleans” field type:
>> 
>> <
>> http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_5_0_0/solr/server/solr/configsets/data_driven_schema_configs/conf/managed-schema?view=markup#l249
>>> 
>> 
>> So you must have removed it when you modified the schema?  Did you do this
>> intentionally?  If so, why?
>> 
>> Steve
>> 
>>> On Mar 10, 2015, at 5:25 AM, Aman Tandon 
>> wrote:
>>> 
>>> Hi,
>>> 
>>> For the sake of using the new schema.xml and solrconfig.xml with solr 5,
>> I
>>> put my old required field type & fields names (being used with solr
>> 4.8.1)
>>> in the schema.xml given in *basic_configs* & configurations setting given
>>> in solrconfig.xml present in *data_driven_schema_configs* and put I put
>>> these configuration files in the configs of zookeeper.
>>> 
>>> But when i am creating the core it is giving the error as booleans
>>> fieldType is not found in schema. So correct me if i am doing something
>>> wrong.
>>> 
>>> ERROR - 2015-03-10 08:20:16.788; org.apache.solr.core.CoreContainer;
>> Error
>>>> creating core [core1]: fieldType 'booleans' not found in the schema
>>>> org.apache.solr.common.SolrException: fieldType 'booleans' not found in
>>>> the schema
>>>> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:896)
>>>> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:662)
>>>> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:513)
>>>> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:488)
>>>> at
>>>> 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:573)
>>>> at
>>>> 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:197)
>>>> at
>>>> 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
>>>> at
>>>> 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
>>>> at
>>>> 
>> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:736)
>>>> at
>>>> 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
>>>> at
>>>> 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
>>>> at
>>>> 
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>>>> at
>>>> 
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>>>> at
>>>> 
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>>>> at
>>>> 
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>>>> at
>>>> 
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>>>> at
>>>> 
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>>>> at
>>>> 
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>>>> at
>>>> 
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>>>> at
>>>> 
>> org.eclipse.jet

Re: schemaless slow indexing

2015-03-23 Thread Steve Rowe

> On Mar 23, 2015, at 11:51 AM, Alexandre Rafalovitch  
> wrote:
> For example, I am not even sure if we can create a copyField
> definition via REST API yet.





Re: schemaless slow indexing

2015-03-23 Thread Steve Rowe
> On Mar 23, 2015, at 11:09 AM, Yonik Seeley  wrote:
> 
> On Mon, Mar 23, 2015 at 1:54 PM, Alexandre Rafalovitch
>  wrote:
>> I looked at SOLR-7290, but I think the discussion should stay on the
>> mailing list for at least one more iteration.
>> 
>> My understanding that the reason copyField exists is so that a search
>> actually worked out of the box. Without knowing the field names, one
>> cannot say what to search.
> 
> Some points:
> - Schemaless is often just to make it easier to get started.
> - If one assumes a lack of knowledge of field names, that's an issue
> for non-schemaless too.
> - Full-text search is only one use-case that people use Solr for...
> there's lots of sorting/faceting/analytics use cases.

Under SOLR-6779, Erik Hatcher changed the data_driven_schema_configs's 
auto-guessed default field type from text_general to strings in order to 
support features other than full-text search:



It’s for exactly this reason (as Alex pointed out) that the catch-all field 
makes sense: there is no other full-text available.

Yonik, can you suggest a path that supports both these possibilities?  Because 
having zero fields with full text search in the default Solr configuration 
seems like a really bad idea to me.

Steve
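
As an aside on Alex's copyField question: the Schema API does support this, at least in current releases, via the add-copy-field command (a sketch; the collection name is made up, and copying "*" to the _text_ catch-all mirrors the data_driven configuration):

  curl -X POST -H 'Content-type:application/json' \
    http://localhost:8983/solr/gettingstarted/schema --data-binary '{
      "add-copy-field": { "source": "*", "dest": "_text_" }
    }'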

Re: Add Entry to Support Page

2015-04-21 Thread Steve Rowe
Hi Christoph,

I’ve added your wiki name to the ContributorsGroup page, so you should now be 
able to edit pages on the wiki.

Steve
 
> On Apr 21, 2015, at 8:15 AM, Christoph Schmidt 
>  wrote:
> 
> Solr Community,
> 
> I’m Christoph Schmidt (http://www.moresophy.com/de/management), CEO of the 
> german company moresophy GmbH.
> 
> My Solr Wiki name is:
> 
>  
> 
> -  ChristophSchmidt
> 
>  
> 
> We are working with Lucene since 2003 and Solr 2012 and are building 
> linguistic token filters and plugins for Solr.
> 
> We would like to add the following entry to the Solr Support page:
> 
>  
> 
> moresophy GmbH: consulting in Lucene, Solr, elasticsearch, specialization in 
> linguistic and semantic enrichment and high scalability content clouds 
> (DE/AT/CH) - i...@moresophy.com
> 
>  
> 
> Best regards
> 
> Christoph Schmidt
> 
>  
> 
> ___
> 
> Dr. Christoph Schmidt | Geschäftsführer
> 
>  
> 
> P +49-89-523041-72
> 
> M +49-171-1419367
> 
> Skype: cs_moresophy
> 
> christoph.schm...@moresophy.de
> 
> www.moresophy.com
> 
> moresophy GmbH | Fraunhoferstrasse 15 | 82152 München-Martinsried
> 
>  
> 
> Handelsregister beim Amtsgericht München, NR. HRB 136075
> 
> Umsatzsteueridentifikationsnummer: DE813188826
> 
> Vertreten durch die Geschäftsführer: Prof. Dr. Heiko Beier | Dr. Christoph 
> Schmidt
> 
>  
> 
> 
>  
> 
> 
>  
> 



Re: Attributes in <fieldType> and <field>

2015-04-28 Thread Steve Rowe
Hi Steve,

From 
:

> The properties that can be specified for a given field type fall into
> three major categories:
>   • Properties specific to the field type's class.
>   • General Properties Solr supports for any field type.
>   • Field Default Properties that can be specified on the field type
> that will be inherited by fields that use this type instead of
> the default behavior.

“indexed” and “stored” are among the Field Default Properties listed as 
specifiable on <fieldType>s.

<field> properties override <fieldType> properties, not the reverse.

Steve

> On Apr 28, 2015, at 9:25 AM, Steven White  wrote:
> 
> Hi Everyone,
> 
> Looking at the out-of-the-box schema.xml of Solr 5.1, I see this:
> 
> <fieldType name="..." stored="..." indexed="..." class="solr.TextField">
>   ...
> 
> Is it valid to have "stored" and "indexed" on <fieldType>?  My
> understanding is that those are on <field> only.  If not, does the value in
> <fieldType> override what's in <field>?
> 
> Thanks
> 
> Steve
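
A small illustration of the inheritance rule (hypothetical names):

  <fieldType name="text_example" class="solr.TextField" indexed="true" stored="false">
    ...
  </fieldType>
  <!-- inherits indexed="true" stored="false" from the type -->
  <field name="body" type="text_example"/>
  <!-- overrides the type's stored default -->
  <field name="title" type="text_example" stored="true"/>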



Re: Schema API: add-field-type

2015-05-05 Thread Steve Rowe
Hi Steve, responses inline below:

> On Apr 29, 2015, at 6:50 PM, Steven White  wrote:
> 
> Hi Everyone,
> 
> When I pass the following:
> http://localhost:8983/solr/db/schema/fieldtypes?wt=xml
> 
> I see this (as one example):
> 
> <lst>
>   <str name="name">date</str>
>   <str name="class">solr.TrieDateField</str>
>   <str name="positionIncrementGap">0</str>
>   <str name="precisionStep">0</str>
>   <arr name="fields">
>     <str>last_modified</str>
>   </arr>
>   <arr name="dynamicFields">
>     <str>*_dts</str>
>     <str>*_dt</str>
>   </arr>
> </lst>
> 
> See how there are "fields" and "dynamicfields"?  However, when I look in
> schema.xml, I see this:
> 
> <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
> 
> See how there is nothing about "fields" and "dynamicfields".
> 
> Now, when I look further into the schema.xml, I see they are coming from:
> 
> <field name="last_modified" type="date" indexed="true" stored="true"/>
> <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
> <dynamicField name="*_dts" type="date" indexed="true" stored="true" multiValued="true"/>
> 
> So it all makes sense.
> 
> Does this mean the response of "fieldtypes" includes "fields" and
> "dynamicfields" as syntactic-sugar to let me know of the relationship this
> field-type has or is there more to it?

It’s FYI: this is the full list of fields and dynamic fields that use the given 
fieldtype.

> The reason why I care about this question is because I'm using Solr's
> Schema API (see: https://cwiki.apache.org/confluence/display/solr/Schema+API)
> to make changes to my schema.  Per this link:
> https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-AddaNewFieldType
> it shows how to add a field-type via "add-field-type" but there is no
> mention of "fields" or "dynamicfields" in this API.  My assumption is
> "fields" and "dynamicfields" need not be part of this API, instead it is
> done via "add-field" and "add-dynamic-field", thus what I see in the XML of
> "fieldtypes" response is just syntactic-sugar.  Did I get all this right?
> 

Yes, as you say, to add (dynamic) fields after adding a field type, you must 
use the “add-field” and “add-dynamic-field” commands.  Note that you can do so 
in a single request if you like, as long as “add-field-type” is ordered before 
any referencing “add-field”/“add-dynamic-field” command.

To be clear, the “add-field-type” command does not support passing in a set of 
fields and/or dynamic fields to be added with the new field type.

Steve
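
A sketch of such a combined request (untested; the type and field names are invented), with the add-field-type command ordered before the commands that reference it:

  curl -X POST -H 'Content-type:application/json' \
    http://localhost:8983/solr/db/schema --data-binary '{
      "add-field-type":    { "name": "my_date", "class": "solr.TrieDateField", "precisionStep": "0" },
      "add-field":         { "name": "created_on", "type": "my_date", "indexed": true, "stored": true },
      "add-dynamic-field": { "name": "*_my_dt", "type": "my_date", "indexed": true, "stored": true }
    }'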



Re: A defect in Schema API with Add a New Copy Field Rule?

2015-05-06 Thread Steve Rowe
Hi Steve,

It’s by design that you can copyField the same source/dest multiple times - 
according to Yonik (not sure where this was discussed), this capability has 
been used in the past to effectively boost terms in the source field.  

The API isn’t symmetric here though: I’m guessing deleting a multiply specified 
copy field rule will delete all of them, but this isn’t tested, so I’m not sure.

There is no replace-copy-field command because copy field rules don’t have 
dependencies (i.e., nothing else in the schema refers to copy field rules), 
unlike fields, dynamic fields and field types, so 
delete-copy-field/add-copy-field works as one would expect.

For fields, dynamic fields and field types, a delete followed by an add is not 
the same as a replace, since (dynamic) fields could have dependent copyFields, 
and field types could have dependent (dynamic) fields.  delete-* commands are 
designed to fail if there are any existing dependencies, while the replace-* 
commands will maintain the dependencies if they exist.

Steve

> On May 6, 2015, at 6:44 PM, Steven White  wrote:
> 
> Hi Everyone,
> 
> I am using the Schema API to add a new copy field per:
> https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-AddaNewCopyFieldRule
> 
> Unlike the other "Add" APIs, this one will not fail if you add an existing
> copy field object.  In fact, when I call the API over and over, the
> item will appear over and over in the schema.xml file like so:
> 
> <copyField source="..." dest="..."/>
> <copyField source="..." dest="..."/>
> <copyField source="..." dest="..."/>
> <copyField source="..." dest="..."/>
> 
> Is this the expected behaviour or a bug?  As a side question, is there any
> harm in having multiple "copyField" like I ended up with?
> 
> A final question, why there is no Replace a Copy Field?  Is this by design
> for some limitation or was the API just never implemented?
> 
> Thanks
> 
> Steve



Re: A defect in Schema API with Add a New Copy Field Rule?

2015-05-07 Thread Steve Rowe

> On May 6, 2015, at 8:25 PM, Yonik Seeley  wrote:
> 
> On Wed, May 6, 2015 at 8:10 PM, Steve Rowe  wrote:
>> It’s by design that you can copyField the same source/dest multiple times - 
>> according to Yonik (not sure where this was discussed), this capability has 
>> been used in the past to effectively boost terms in the source field.
> 
> Yep, used to be relatively common.
> Perhaps the API could be cleaner though if we supported that by
> passing an optional "numTimes" or "numCopies"?  Seems like a sane
> delete / overwrite options would thus be easier?

+1

Re: schema modification issue

2015-05-11 Thread Steve Rowe
Hi,

Thanks for reporting, I’m working a test to reproduce.  

Can you please create a Solr JIRA issue for this?:  
https://issues.apache.org/jira/browse/SOLR/

Thanks,
Steve

> On May 7, 2015, at 5:40 AM, User Zolr  wrote:
> 
> Hi there,
> 
> I have come across a problem that, when using managed schema in SolrCloud,
> adding fields into the schema would SOMETIMES end up prompting "Can't find
> resource 'schema.xml' in classpath or '/configs/collectionName',
> cwd=/export/solr/solr-5.1.0/server", there is of course no schema.xml in
> configs, but 'schema.xml.bak' and 'managed-schema'
> 
> i use solrj to create a collection:
> 
>Path tempPath = getConfigPath();
> client.uploadConfig(tempPath, name); //customized configs with
> solrconfig.xml using ManagedIndexSchemaFactory
> if(numShards==0){
> numShards = getNumNodes(client);
> }
> Create request = new CollectionAdminRequest.Create();
> request.setCollectionName(name);
> request.setNumShards(numShards);
> replicationFactor =
> (replicationFactor==0?DEFAULT_REPLICA_FACTOR:replicationFactor);
> request.setReplicationFactor(replicationFactor);
> request.setMaxShardsPerNode(maxShardsPerNode==0?replicationFactor:maxShardsPerNode);
> CollectionAdminResponse response = request.process(client);
> 
> 
> and adding fields to schema, either by curl or by httpclient,  would
> sometimes yield the following error, but the error can be fixed by
> RELOADING the newly created collection once or several times:
> 
> INFO  - [{  "responseHeader":{"status":500,"QTime":5},
> "errors":["Error reading input String Can't find resource 'schema.xml' in
> classpath or '/configs/collectionName',
> cwd=/export/solr/solr-5.1.0/server"],  "error":{"msg":"Can't find
> resource 'schema.xml' in classpath or '/configs/collectionName',
> cwd=/export/solr/solr-5.1.0/server","trace":"java.io.IOException: Can't
> find resource 'schema.xml' in classpath or '/configs/collectionName',
> cwd=/export/solr/solr-5.1.0/server
> 
> at
> org.apache.solr.cloud.ZkSolrResourceLoader.openResource(ZkSolrResourceLoader.java:98)
> at
> org.apache.solr.schema.SchemaManager.getFreshManagedSchema(SchemaManager.java:421)
> at org.apache.solr.schema.SchemaManager.doOperations(SchemaManager.java:104)
> at
> org.apache.solr.schema.SchemaManager.performOperations(SchemaManager.java:94)
> at
> org.apache.solr.handler.SchemaHandler.handleRequestBody(SchemaHandler.java:57)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at j

Re: schema.xml & xi:include -> copyField source :'_my_title' is not a glob and doesn't match any explicit field or dynamicField

2015-05-15 Thread Steve Rowe
Hi Clemens,

I think the problem is the structure of the composite schema - you’ll end up 
with:

<schema>   <- your other schema file
  <schema> <- the included schema-common.xml
  ...
  </schema>
</schema>

My suggestion: remove the <schema> tags from your schema-common.xml.  You won’t be able to use 
it alone in that case, but if you need to do that, you could just create 
another schema file that includes it inside wrapping <schema> tags.

Steve

> On May 15, 2015, at 4:01 AM, Clemens Wyss DEV  wrote:
> 
> Given the following schema.xml:
> 
> <schema name="..." version="1.5">
>   <uniqueKey>_my_id</uniqueKey>
>   <field name="_my_id" indexed="true" stored="true" type="string"/>
>   <field name="_my_title" indexed="true" stored="true" type="string"/>
>   <copyField source="_my_title" dest="..."/>
>   <fieldType name="string" class="solr.StrField"/>
>   <fieldType name="..." class="..." positionIncrementGap="0" precisionStep="0"/>
> </schema>
> 
> 
> When I try to include the very schema from another schema file, e.g.:
> 
> <schema name="..." version="1.5">
>   <xi:include href="schema-common.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
> </schema>
> 
> 
> I get a SolrException:
> copyField source :'_my_title' is not a glob and doesn't match any explicit 
> field or dynamicField
> 
> Am I facing a bug or a feature?
> 
> Thanks
> - Clemens



Re: schema.xml & xi:include -> copyField source :'_my_title' is not a glob and doesn't match any explicit field or dynamicField

2015-05-15 Thread Steve Rowe
Hi Clemens,

I forgot that XInclude requires well-formed XML, so schema-common.xml without 
 tags won’t work, since it will have multiple root elements.

But instead of XInclude, you can define external entities for files you want to 
include, and then include a reference to them where you want the contents to be 
included.

This worked for me:

——
schema.xml
——

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE schema [
  <!ENTITY schema_common SYSTEM "schema-common.incl">
]>
<schema name="..." version="1.5">
  &schema_common;
</schema>

——

——
schema-common.incl
——
<uniqueKey>_my_id</uniqueKey>
<field name="_my_id" indexed="true" stored="true" type="string"/>
<field name="_my_title" indexed="true" stored="true" type="string"/>
...
<copyField source="_my_title" dest="..."/>
<fieldType name="string" class="solr.StrField"/>
——

Here’s what I get back from curl 
"http://localhost:8983/solr/mycore/schema?wt=schema.xml&indent=on”:

——


<schema name="..." version="1.5">
  <uniqueKey>_my_id</uniqueKey>
  <field name="_my_id" indexed="true" stored="true" type="string"/>
  <field name="_my_title" indexed="true" stored="true" type="string"/>
  ...
</schema>

——

Steve

> On May 15, 2015, at 8:57 AM, Clemens Wyss DEV  wrote:
> 
> Thought about that too (should have written ;) ).
> When I remove the schema-tag from the composite xml I get:
> org.apache.solr.common.SolrException: Unable to create core [test]
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:533)
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:493)
> ...
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
> Caused by: org.apache.solr.common.SolrException: Could not load conf for core 
> test: org.apache.solr.common.SolrException: org.xml.sax.SAXParseException; 
> systemId: solrres:/schema.xml; lineNumber: 3; columnNumber: 84; Error 
> attempting to parse XML file (href='schema-common.xml').. Schema file is 
> C:\source\search\search-impl\WebContent\WEB-INF\solr\configsets\test\conf\schema.xml
>   at 
> org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:78)
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:516)
>   ... 12 more
> Caused by: com.google.common.util.concurrent.UncheckedExecutionException: 
> org.apache.solr.common.SolrException: org.xml.sax.SAXParseException; 
> systemId: solrres:/schema.xml; lineNumber: 3; columnNumber: 84; Error 
> attempting to parse XML file (href='schema-common.xml').. Schema file is 
> C:\source\search\search-impl\WebContent\WEB-INF\solr\configsets\test\conf\schema.xml
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2199)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
>   at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4721)
>   at 
> org.apache.solr.core.ConfigSetService$SchemaCaching.createIndexSchema(ConfigSetService.java:206)
>   at 
> org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:74)
>   ... 13 more
> Caused by: org.apache.solr.common.SolrException: 
> org.xml.sax.SAXParseException; systemId: solrres:/schema.xml; lineNumber: 3; 
> columnNumber: 84; Error attempting to parse XML file 
> (href='schema-common.xml').. Schema file is 
> C:\source\search\search-impl\WebContent\WEB-INF\solr\configsets\test\conf\schema.xml
>   at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:596)
>   at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:175)
>   at 
> org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
>   at 
> org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
>   at 
> org.apache.solr.core.ConfigSetService$SchemaCaching$1.call(ConfigSetService.java:210)
>   at 
> org.apache.solr.core.ConfigSetService$SchemaCaching$1.call(ConfigSetService.java:206)
>   at 
> com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4724)
>   at 
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
>   ... 17 more
> Caused by: org.apache.solr.common.SolrException: 
> org.xml.sax.SAXParseException; systemId: solrres:/schema.xml; lineNumber: 3; 
> columnNumber: 84; Error attempting to parse XML file 
> (href='schema-common.xml').
>   at org.apache.solr.core.Config.<init>(Config.java:156)
>   at org.apache.solr.core.Config.<init>(Config.java:92)
>   at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:455)
>   ... 27 more
> Caused by: org.xml.sax.SAXParseException; systemId: solrres:/schema.xml; 
> lineNumber: 3; columnNumber: 84; Error attempting to parse XML file 
> (href='schema-common.xml').
>

Re: Deleting Fields

2015-05-30 Thread Steve Rowe
Hi Joseph,

> On May 30, 2015, at 8:18 AM, Joseph Obernberger  
> wrote:
> 
> Thank you Erick.  I was thinking that it actually went through and removed 
> the index data; thank you for the clarification.

I added more info to the Schema API page about this not being true.  Here’s 
what I’ve got so far - let me know if you think we should add more warnings 
about this:

-
Re-index after schema modifications!

If you modify your schema, you will likely need to re-index all documents. If 
you do not, you may lose access to documents, or not be able to interpret them 
properly, e.g. after replacing a field type.

Modifying your schema will never modify any documents that are already indexed. 
Again, you must re-index documents in order to apply schema changes to them.

[…]

When modifying the schema with the API, a core reload will automatically occur 
in order for the changes to be available immediately for documents indexed 
thereafter.  Previously indexed documents will not be automatically handled - 
they must be re-indexed if they used schema elements that you changed.
-

Steve

Re: ManagedStopFilterFactory not accepting ignoreCase

2015-06-17 Thread Steve Rowe
Hi Mike,

Looks like a bug to me - would you please create a JIRA?

Thanks,
Steve

> On Jun 17, 2015, at 10:29 AM, Mike Thomsen  wrote:
> 
> We're running Solr 4.10.4 and getting this...
> 
> Caused by: java.lang.IllegalArgumentException: Unknown parameters:
> {ignoreCase=true}
>at
> org.apache.solr.rest.schema.analysis.BaseManagedTokenFilterFactory.<init>(BaseManagedTokenFilterFactory.java:46)
>at
> org.apache.solr.rest.schema.analysis.ManagedStopFilterFactory.<init>(ManagedStopFilterFactory.java:47)
> 
> This is the filter definition I used:
> 
> <filter class="solr.ManagedStopFilterFactory" ignoreCase="true"
>         managed="english"/>
> 
> Any ideas?
> 
> Thanks,
> 
> Mike



Re: ManagedStopFilterFactory not accepting ignoreCase

2015-06-17 Thread Steve Rowe
Oh, I see you already did :) - thanks. - Steve

> On Jun 17, 2015, at 11:10 AM, Steve Rowe  wrote:
> 
> Hi Mike,
> 
> Looks like a bug to me - would you please create a JIRA?
> 
> Thanks,
> Steve
> 
>> On Jun 17, 2015, at 10:29 AM, Mike Thomsen  wrote:
>> 
>> We're running Solr 4.10.4 and getting this...
>> 
>> Caused by: java.lang.IllegalArgumentException: Unknown parameters:
>> {ignoreCase=true}
>>   at
>> org.apache.solr.rest.schema.analysis.BaseManagedTokenFilterFactory.<init>(BaseManagedTokenFilterFactory.java:46)
>>   at
>> org.apache.solr.rest.schema.analysis.ManagedStopFilterFactory.<init>(ManagedStopFilterFactory.java:47)
>> 
>> This is the filter definition I used:
>> 
>> <filter class="solr.ManagedStopFilterFactory" ignoreCase="true"
>> managed="english"/>
>> 
>> Any ideas?
>> 
>> Thanks,
>> 
>> Mike
> 



Re: MappingCharFilterFactory and start and end offsets

2015-06-18 Thread Steve Rowe
Hi Dmitry,

It’s weird that start and end offsets are the same - what do you see for the 
start/end of ‘$’, i.e. if you take out MCFF?  (I think it should be start:5, 
end:6.)

As far as offsets “respecting the remapped token”, are you asking for offsets 
to be set as if ‘dollarsign' were part of the original text?  If so, there is 
no setting that would do that - the intent is for offsets to map to the 
*original* text.  You can work around this by performing the substitution prior 
to Solr analysis, e.g. in an update processor like RegexReplaceProcessorFactory.
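
For illustration, a rough sketch of such a chain in solrconfig.xml - hedged: the
chain name and field name here are made up for the example, not taken from your
setup:

-----
<updateRequestProcessorChain name="replace-dollar">
  <processor class="solr.RegexReplaceProcessorFactory">
    <!-- assumed field name; repeat fieldName for each field to rewrite -->
    <str name="fieldName">content</str>
    <str name="pattern">\$</str>
    <str name="replacement"> dollarsign </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
-----

You would then reference the chain from your update handler, e.g. via the
update.chain request parameter.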

Steve
www.lucidworks.com

> On Jun 18, 2015, at 3:07 AM, Dmitry Kan  wrote:
> 
> Hi,
> 
> It looks like MappingCharFilter sets start and end offset to the same
> value. Can this be affected on by some setting?
> 
> For a string: test $ test2 and mapping "$" => " dollarsign " (we insert
> extra space to separate $ into its own token)
> 
> we get: http://snag.gy/eJT1H.jpg
> 
> Ideally, we would like to have start and end offset respecting the remapped
> token. Can this be achieved with settings?
> 
> -- 
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info



Re: Help: Problem in customized token filter

2015-06-18 Thread Steve Rowe
Hi Aman,

The admin UI screenshot you linked to is from an older version of Solr - what 
version are you using?

Lots of extraneous angle brackets and asterisks got into your email and made 
for a bunch of cleanup work before I could read or edit it.  In the future, 
please put your code somewhere people can easily read it and copy/paste it into 
an editor: into a github gist or on a paste service, etc.

Looks to me like your use of “exhausted” is unnecessary, and is likely the 
cause of the problem you saw (only one document getting processed): you never 
set exhausted to false, and when the filter got reused, it incorrectly carried 
state from the previous document.

Here’s a simpler version that’s hopefully more correct and more efficient (2 
fewer copies from the StringBuilder to the final token).  Note: I didn’t test 
it:

https://gist.github.com/sarowe/9b9a52b683869ced3a17

Steve
www.lucidworks.com

> On Jun 18, 2015, at 11:33 AM, Aman Tandon  wrote:
> 
> Please help, what am I doing wrong here? Please guide me.
> 
> With Regards
> Aman Tandon
> 
> On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon 
> wrote:
> 
>> Hi,
>> 
>> I created a *token concat filter* to concat all the tokens from token
>> stream. It creates the concatenated token as expected.
>> 
>> But when I am posting the xml containing more than 30,000 documents, then
>> only the first document ends up with data in that field.
>> 
>> *Schema:*
>> 
>> *>> required="false" omitNorms="false" multiValued="false" />*
>> 
>> 
>> 
>> 
>> 
>> 
>>> *>> positionIncrementGap="100">*
>>> *  *
>>> **
>>> **
>>> *>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>*
>>> **
>>> *>> outputUnigrams="true" tokenSeparator=""/>*
>>> *>> language="English" protected="protwords.txt"/>*
>>> *>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>>> *>> synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
>>> expand="true"/>*
>>> *  *
>>> *  *
>>> **
>>> *>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>*
>>> *>> words="stopwords_text_prime_search.txt" enablePositionIncrements="true" />*
>>> *>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>*
>>> **
>>> *>> language="English" protected="protwords.txt"/>*
>>> *>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>>> *  ***
>> 
>> 
>> Please help me, The code for the filter is as follows, please take a look.
>> 
>> Here is the picture of what the filter is doing
>> <http://i.imgur.com/THCsYtG.png?1>
>> 
>> The code of concat filter is :
>> 
>> *package com.xyz.analysis.concat;*
>>> 
>>> *import java.io.IOException;*
>>> 
>>> 
 *import org.apache.lucene.analysis.TokenFilter;*
>>> 
>>> *import org.apache.lucene.analysis.TokenStream;*
>>> 
>>> *import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;*
>>> 
>>> *import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;*
>>> 
>>> *import
 org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;*
>>> 
>>> *import org.apache.lucene.analysis.tokenattributes.TypeAttribute;*
>>> 
>>> 
 *public class ConcatenateWordsFilter extends TokenFilter {*
>>> 
>>> 
 *  private CharTermAttribute charTermAttribute =
 addAttribute(CharTermAttribute.class);*
>>> 
>>> *  private OffsetAttribute offsetAttribute =
 addAttribute(OffsetAttribute.class);*
>>> 
>>> *  PositionIncrementAttribute posIncr =
 addAttribute(PositionIncrementAttribute.class);*
>>> 
>>> *  TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);*
>>> 
>>> 
 *  private StringBuilder stringBuilder = new StringBuilder();*
>>> 
>>> *  private boolean exhausted = false;*
>>> 
>>> 
 *  /***
>>> 
>>> *   * Creates a new ConcatenateWordsFilter*
>>> 
>>> *   * @param input TokenStream that will be filtered*
>>> 
>>> *   */*
>>> 
>>> *  public ConcatenateWordsFilter(TokenStream input) {*
>>> 
>>> *super(input);*
>>> 
>>> *  }*
>>> 
>>> 
 *  /***
>>> 
>>> *   * {@inheritDoc}*
>>> 
>>> *   */*
>>> 
>>> *  @Override*
>>> 
>>> *  public final boolean incrementToken() throws IOException {*
>>> 
>>> *while (!exhausted && input.incrementToken()) {*
>>> 
>>> *  char terms[] = charTermAttribute.buffer();*
>>> 
>>> *  int termLength = charTermAttribute.length();*
>>> 
>>> *  if(typeAtrr.type().equals("")){*
>>> 
>>> * stringBuilder.append(terms, 0, termLength);*
>>> 
>>> *  }*
>>> 
>>> *  charTermAttribute.copyBuffer(terms, 0, termLength);*
>>> 
>>> *  return true;*
>>> 
>>> *}*
>>> 
>>> 
 *if (!exhausted) {*
>>> 
>>> *  exhausted = true;*
>>> 
>>> *  String sb = stringBuilder.toString();*
>>> 
>>> *  System.err.println("The Data got is "+sb);*
>>> 
>>> *  int

Re: Help: Problem in customized token filter

2015-06-18 Thread Steve Rowe
Aman,

My version won’t produce anything at all, since incrementToken() always returns 
false…

I updated the gist (at the same URL) to fix the problem by returning true from 
incrementToken() once and then false until reset() is called.  It also handles 
the case when the concatenated token is zero length by not emitting a token.
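
For reference, a minimal sketch (untested, and not the literal gist contents) of
the emit-once pattern just described, assuming a boolean "emitted" field that
reset() sets back to false:

-----
@Override
public final boolean incrementToken() throws IOException {
  if (emitted) {                    // already produced the single output token
    return false;
  }
  while (input.incrementToken()) {  // drain the input stream, accumulating terms
    stringBuilder.append(charTermAttribute.buffer(), 0, charTermAttribute.length());
  }
  emitted = true;
  if (stringBuilder.length() == 0) {
    return false;                   // zero-length concatenation: emit nothing
  }
  clearAttributes();
  charTermAttribute.setEmpty().append(stringBuilder);
  return true;
}
-----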

Steve
www.lucidworks.com

> On Jun 19, 2015, at 12:55 AM, Steve Rowe  wrote:
> 
> Hi Aman,
> 
> The admin UI screenshot you linked to is from an older version of Solr - what 
> version are you using?
> 
> Lots of extraneous angle brackets and asterisks got into your email and made 
> for a bunch of cleanup work before I could read or edit it.  In the future, 
> please put your code somewhere people can easily read it and copy/paste it 
> into an editor: into a github gist or on a paste service, etc.
> 
> Looks to me like your use of “exhausted” is unnecessary, and is likely the 
> cause of the problem you saw (only one document getting processed): you never 
> set exhausted to false, and when the filter got reused, it incorrectly 
> carried state from the previous document.
> 
> Here’s a simpler version that’s hopefully more correct and more efficient (2 
> fewer copies from the StringBuilder to the final token).  Note: I didn’t test 
> it:
> 
>https://gist.github.com/sarowe/9b9a52b683869ced3a17
> 
> Steve
> www.lucidworks.com
> 
>> On Jun 18, 2015, at 11:33 AM, Aman Tandon  wrote:
>> 
>> Please help, what am I doing wrong here? Please guide me.
>> 
>> With Regards
>> Aman Tandon
>> 
>> On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon 
>> wrote:
>> 
>>> Hi,
>>> 
>>> I created a *token concat filter* to concat all the tokens from token
>>> stream. It creates the concatenated token as expected.
>>> 
>>> But when I am posting the xml containing more than 30,000 documents, then
>>> only the first document ends up with data in that field.
>>> 
>>> *Schema:*
>>> 
>>> *>>> required="false" omitNorms="false" multiValued="false" />*
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> *>>> positionIncrementGap="100">*
>>>> *  *
>>>> **
>>>> **
>>>> *>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>*
>>>> **
>>>> *>>> outputUnigrams="true" tokenSeparator=""/>*
>>>> *>>> language="English" protected="protwords.txt"/>*
>>>> *>>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>>>> *>>> synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
>>>> expand="true"/>*
>>>> *  *
>>>> *  *
>>>> **
>>>> *>>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>*
>>>> *>>> words="stopwords_text_prime_search.txt" enablePositionIncrements="true" />*
>>>> *>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>*
>>>> **
>>>> *>>> language="English" protected="protwords.txt"/>*
>>>> *>>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>>>> *  ***
>>> 
>>> 
>>> Please help me, The code for the filter is as follows, please take a look.
>>> 
>>> Here is the picture of what the filter is doing
>>> <http://i.imgur.com/THCsYtG.png?1>
>>> 
>>> The code of concat filter is :
>>> 
>>> *package com.xyz.analysis.concat;*
>>>> 
>>>> *import java.io.IOException;*
>>>> 
>>>> 
>>>>> *import org.apache.lucene.analysis.TokenFilter;*
>>>> 
>>>> *import org.apache.lucene.analysis.TokenStream;*
>>>> 
>>>> *import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;*
>>>> 
>>>> *import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;*
>

Re: Help: Problem in customized token filter

2015-06-18 Thread Steve Rowe
Aman,

Solr uses the same Token filter instances over and over, calling reset() before 
sending each document through.  Your code sets “exhausted" to true and then 
never sets it back to false, so the next time the token filter instance is 
used, its “exhausted" value is still true, so no input stream tokens are 
concatenated ever again.
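
A minimal sketch of the missing piece, reusing the field names from the code you
posted:

-----
@Override
public void reset() throws IOException {
  super.reset();
  exhausted = false;           // clear the per-stream flag
  stringBuilder.setLength(0);  // drop state left over from the previous document
}
-----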

Does that make sense?

Steve
www.lucidworks.com

> On Jun 19, 2015, at 1:10 AM, Aman Tandon  wrote:
> 
> Hi Steve,
> 
> 
>> you never set exhausted to false, and when the filter got reused, *it
>> incorrectly carried state from the previous document.*
> 
> 
> Thanks for replying, but I am not able to understand this.
> 
> With Regards
> Aman Tandon
> 
> On Fri, Jun 19, 2015 at 10:25 AM, Steve Rowe  wrote:
> 
>> Hi Aman,
>> 
>> The admin UI screenshot you linked to is from an older version of Solr -
>> what version are you using?
>> 
>> Lots of extraneous angle brackets and asterisks got into your email and
>> made for a bunch of cleanup work before I could read or edit it.  In the
>> future, please put your code somewhere people can easily read it and
>> copy/paste it into an editor: into a github gist or on a paste service, etc.
>> 
>> Looks to me like your use of “exhausted” is unnecessary, and is likely the
>> cause of the problem you saw (only one document getting processed): you
>> never set exhausted to false, and when the filter got reused, it
>> incorrectly carried state from the previous document.
>> 
>> Here’s a simpler version that’s hopefully more correct and more efficient
>> (2 fewer copies from the StringBuilder to the final token).  Note: I didn’t
>> test it:
>> 
>>https://gist.github.com/sarowe/9b9a52b683869ced3a17
>> 
>> Steve
>> www.lucidworks.com
>> 
>>> On Jun 18, 2015, at 11:33 AM, Aman Tandon 
>> wrote:
>>> 
>>> Please help, what am I doing wrong here? Please guide me.
>>> 
>>> With Regards
>>> Aman Tandon
>>> 
>>> On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon 
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I created a *token concat filter* to concat all the tokens from token
>>>> stream. It creates the concatenated token as expected.
>>>> 
>>>> But when I am posting the xml containing more than 30,000 documents, then
>>>> only the first document ends up with data in that field.
>>>> 
>>>> *Schema:*
>>>> 
>>>> *>>>> required="false" omitNorms="false" multiValued="false" />*
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> *>>>> positionIncrementGap="100">*
>>>>> *  *
>>>>> **
>>>>> **
>>>>> *>>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>*
>>>>> **
>>>>> *>>>> outputUnigrams="true" tokenSeparator=""/>*
>>>>> *>>>> language="English" protected="protwords.txt"/>*
>>>>> *>>>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>>>>> *>>>> synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
>>>>> expand="true"/>*
>>>>> *  *
>>>>> *  *
>>>>> **
>>>>> *>>>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>*
>>>>> *>>>> words="stopwords_text_prime_search.txt"
>> enablePositionIncrements="true" />*
>>>>> *>>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>*
>>>>> **
>>>>> *>>>> language="English" protected="protwords.txt"/>*
>>>>> *>>>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>>>>> *  ***
>>>> 
>>>> 
>>>> Pleas

Re: accent insensitive field-type

2015-07-02 Thread Steve Rowe
Hi Søren,

“charFilter” should be “charFilters”, and “filter” should be “filters”; and 
both their values should be arrays - try this:

{
  "add-field-type”: {
"name":"myTxtField",
"class":"solr.TextField",
"positionIncrementGap":"100",
"analyzer”: {
  "charFilters": [ {"class":"solr.MappingCharFilterFactory", 
"mapping":"mapping-ISOLatin1Accent.txt”} ],
  "tokenizer": [ {"class":"solr.StandardTokenizerFactory”} ],
  "filters": {"class":"solr.LowerCaseFilterFactory"}
}
  }
}

There should be better error messages for misspellings here.  I’ll file a JIRA 
issue.

(I also moved “filters” after “tokenizer” since that’s the order in which 
they’re executed in an analysis pipeline, but Solr will interpret the 
out-of-order version correctly.)

FYI, if you want to *correct* a field type, rather than create a new one, you 
should use the “replace-field-type” command instead of the “add-field-type” 
command.  You’ll get an error if you attempt to add a field type that already 
exists in the schema.

Steve

> On Jul 2, 2015, at 1:17 AM, Søren  wrote:
> 
> Hi Solr users
> 
> I'm new to Solr and I need to be able to search in structured data in a case 
> and accent insensitive manner. E.g. find "Crème brûlée", both when querying 
> with "Crème brûlée" and "creme brulee".
> 
> It seems that none of the built-in text types support this, or am I wrong?
> So I try to add my own inspired by another post, although it was old.
> 
> I'm running solr-5.2.1.
> 
> Curl to http://localhost:8983/solr/mycore/schema
> {
> "add-field-type":{
> "name":"myTxtField",
> "class":"solr.TextField",
> "positionIncrementGap":"100",
> "analyzer":{
>"charFilter": {"class":"solr.MappingCharFilterFactory", 
> "mapping":"mapping-ISOLatin1Accent.txt"},
>"filter": {"class":"solr.LowerCaseFilterFactory"},
>"tokenizer": {"class":"solr.StandardTokenizerFactory"}
>}
>}
> }
> 
> But it doesn't work and when I look in '[... 
> ]\solr-5.2.1\server\solr\mycore\conf\managed-schema'
> the analyzer section is reduced to this:
>   positionIncrementGap="100">
>
>  
>
>  
> 
> Am I almost there or am I on a completely wrong track?
> 
> Thanks in advance
> Søren
> 



Re: accent insensitive field-type

2015-07-02 Thread Steve Rowe
See https://issues.apache.org/jira/browse/SOLR-7749

> On Jul 2, 2015, at 8:31 AM, Steve Rowe  wrote:
> 
> Hi Søren,
> 
> “charFilter” should be “charFilters”, and “filter” should be “filters”; and 
> both their values should be arrays - try this:
> 
> {
>  "add-field-type”: {
>"name":"myTxtField",
>"class":"solr.TextField",
>"positionIncrementGap":"100",
>"analyzer”: {
>  "charFilters": [ {"class":"solr.MappingCharFilterFactory", 
> "mapping":"mapping-ISOLatin1Accent.txt”} ],
>  "tokenizer": [ {"class":"solr.StandardTokenizerFactory”} ],
>  "filters": {"class":"solr.LowerCaseFilterFactory"}
>}
>  }
> }
> 
> There should be better error messages for misspellings here.  I’ll file a 
> JIRA issue.
> 
> (I also moved “filters” after “tokenizer” since that’s the order in which 
> they’re executed in an analysis pipeline, but Solr will interpret the 
> out-of-order version correctly.)
> 
> FYI, if you want to *correct* a field type, rather than create a new one, you 
> should use the “replace-field-type” command instead of the “add-field-type” 
> command.  You’ll get an error if you attempt to add a field type that already 
> exists in the schema.
> 
> Steve
> 
>> On Jul 2, 2015, at 1:17 AM, Søren  wrote:
>> 
>> Hi Solr users
>> 
>> I'm new to Solr and I need to be able to search in structured data in a case 
>> and accent insensitive manner. E.g. find "Crème brûlée", both when querying 
>> with "Crème brûlée" and "creme brulee".
>> 
>> It seems that none of the built-in text types support this, or am I wrong?
>> So I try to add my own inspired by another post, although it was old.
>> 
>> I'm running solr-5.2.1.
>> 
>> Curl to http://localhost:8983/solr/mycore/schema
>> {
>> "add-field-type":{
>>"name":"myTxtField",
>>"class":"solr.TextField",
>>"positionIncrementGap":"100",
>>"analyzer":{
>>   "charFilter": {"class":"solr.MappingCharFilterFactory", 
>> "mapping":"mapping-ISOLatin1Accent.txt"},
>>   "filter": {"class":"solr.LowerCaseFilterFactory"},
>>   "tokenizer": {"class":"solr.StandardTokenizerFactory"}
>>   }
>>   }
>> }
>> 
>> But it doesn't work and when I look in '[... 
>> ]\solr-5.2.1\server\solr\mycore\conf\managed-schema'
>> the analyzer section is reduced to this:
>> > positionIncrementGap="100">
>>   
>> 
>>   
>> 
>> 
>> Am I almost there or am I on a completely wrong track?
>> 
>> Thanks in advance
>> Søren
>> 
> 



Re: accent insensitive field-type

2015-07-03 Thread Steve Rowe
Hi Søren,

> On Jul 3, 2015, at 4:27 AM, Søren  wrote:
> 
> Thanks Steve! Everything works now.
> A little modification:
> 
>"analyzer":{
>"charFilters": [ {"class":"solr.MappingCharFilterFactory", 
> "mapping":"mapping-ISOLatin1Accent.txt"} ],
>"tokenizer": {"class":"solr.StandardTokenizerFactory"},
>"filters": [{"class":"solr.LowerCaseFilterFactory"}]
>}

I’m glad you got it to work.

Yeah, I put square brackets in the wrong place, cool you figured it out and 
fixed it.

> Thankfully, when the key is a plural word, the value is an array.
> 
> It was still teasing me when I tested with various queries. But specifying 
> the field solved that for me too.
> 
> ...q=brulee didn't find anything. It goes to the raw index I guess
> 
> ...q=desert:brulee did find "Crème brûlée"!

In your query request handler you should specify a “df” param (default field), 
likely under the defaults section (so that it can be overridden via per-request 
param) - this param will work with the dismax, edismax, or standard query 
parsers.  The “qf" param, which supports a list of query fields (and field 
aliasing)[1], also works in the dismax and edismax query parsers.
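
For example, a sketch (the handler name is standard, but "desert" is just the
field from your earlier mail):

-----
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- un-fielded query terms will be searched against this field -->
    <str name="df">desert</str>
  </lst>
</requestHandler>
-----

With that in place, ...q=brulee would be interpreted as ...q=desert:brulee.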

Steve

[1] See section "Field aliasing using per-field qf overrides" on the edismax 
ref guide page: 
<https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser> 
and the qf param description on the dismax ref guide page: 
<https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser>.

Re: unsubscribe

2015-07-07 Thread Steve Rowe
Hi Jacob,

See https://lucene.apache.org/solr/resources.html#mailing-lists for unsubscribe 
info

Notice also that every email from the solr-user mailing list contains the 
following header:

List-Unsubscribe: 

Steve

> On Jul 7, 2015, at 11:46 AM, Jacob Singh  wrote:
> 
> Unsubscribe
> On Jul 7, 2015 11:39 AM, "Jacob Singh"  wrote:
> 
>> 
>> 
>> --
>> +1 512-522-6281
>> twitter: @JacobSingh ( http://twitter.com/#!/JacobSingh )
>> web: http://www.jacobsingh.name
>> Skype: pajamadesign
>> gTalk: jacobsi...@gmail.com
>> 



Re: Querying Nested documents

2015-07-13 Thread Steve Rowe
Hi rameshn,

Nabble has a nasty habit of stripping out HTML and XML markup before sending 
your mail out to the mailing list - see your message quoted below for how it 
appears to people who aren’t reading via Nabble.

My suggestion: directly subscribe to the solr-user mailing list[1] and avoid 
Nabble.  (They’ve known about the problem for many years and AFAICT have done 
nothing about it.)

Steve

[1] https://lucene.apache.org/solr/resources.html#mailing-lists

> On Jul 13, 2015, at 12:03 PM, rameshn  wrote:
> 
> Hi, I have question regarding nested documents.My document looks like below,  
>   
> 1234xger00parent  
>  
> 2015-06-15T13:29:07ZegeDuperhttp://www.domain.com 
>   
> zoome1234-images   
> http://somedomain.com/some.jpg1:1   
> 1234-platform-iosios   
> https://somedomain.comsomelinkfalse   
> 2015-03-23T10:58:00Z-12-30T19:00:00Z  
> 
> 1234-platform-androidandroid   
> somedomain.comsomelinkfalse   
> 2015-03-23T10:58:00Z-12-30T19:00:00Z  
> Right now I can query like
> thishttp://localhost:8983/solr/demo/select?q={!parent%20which=%27type:parent%27}&fl=*,[child%20parentFilter=type:parent%20childFilter=image_uri_s:*]&indent=trueand
> get the parent and child document with matching criteria (just parent and
> image child document).*But, I want to get all other children*
> (1234-platform-ios and 1234-platform-andriod) even if i query based on
> image_uri_s (1234-images) although they are other children which are part of
> the parent document.Is it possible ?Appreciate your help !Thanks,Ramesh
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Querying-Nested-documents-tp4217088.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Specifying dynamic field type without polluting actual field names with type indicators

2016-05-17 Thread Steve Rowe
Hi Peter,

Are you familiar with the Schema API?: 
<https://cwiki.apache.org/confluence/display/solr/Schema+API>

You can use it to create fields, field types, etc. prior to ingesting your data.

--
Steve
www.lucidworks.com

> On May 17, 2016, at 11:05 AM, Horváth Péter Gergely 
>  wrote:
> 
> Hi All,
> 
> By default Solr allows you to define the type of a dynamic field by
> appending a post-fix to the name itself. E.g. creating a color_s field
> instructs Solr to create a string field. My understanding is that if we do
> this, all queries must refer the post-fixed field name as well. So
> instead of a query like color:"red", we will have to write something like
> color_s:"red" -- and so on for other field types as well.
> 
> I am wondering if it is possible to specify the data type used for a field
> in Solr 6.0.0, without having to modify the field name. (Or at least in a
> way that would allow us to use the original field name) Do you have any
> idea, how to achieve this? I am fine, if we have to specify the field type
> during the insertion of a document, however, I do not want to keep using
> post-fixes while running queries...
> 
> Thanks,
> Peter



Re: Specifying dynamic field type without polluting actual field names with type indicators

2016-05-19 Thread Steve Rowe
Peter,

It’s an interesting idea.  Could you make a Solr JIRA?

I don’t know where the field type specification would go, but providing a 
mechanism to specify field type for previously non-existent fields, outside of 
the field names themselves, seems useful.

In the meantime, do you know about field aliasing?  

1. You can get results back that rename fields to whatever you want: see the 
section “Field Name Aliases” here: 
<https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters>.

2. On the query side, eDisMax can perform aliasing so that user-specified field 
names in queries get mapped to one or more indexed fields: look for “alias” in 
<https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser>.
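
For example, a hypothetical request that exposes a user-facing "color" field
backed by the indexed color_s field:

-----
q=color:red&defType=edismax&f.color.qf=color_s&fl=color:color_s
-----

Here f.color.qf maps the queried "color" field onto color_s, and fl=color:color_s
renames color_s back to "color" in the response.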

--
Steve
www.lucidworks.com

> On May 19, 2016, at 4:43 AM, Horváth Péter Gergely 
>  wrote:
> 
> Hi Steve,
> 
> Yes, I know the schema API, however I do not want to specify the field type
> programmatically for every single field.
> 
> I would like to be able to specify the field type when it is being added
> (similar to the name postfixes, but without affecting the field names).
> 
> Thanks,
> Peter
> 
> 
> 2016-05-17 17:08 GMT+02:00 Steve Rowe :
> 
>> Hi Peter,
>> 
>> Are you familiar with the Schema API?: <
>> https://cwiki.apache.org/confluence/display/solr/Schema+API>
>> 
>> You can use it to create fields, field types, etc. prior to ingesting your
>> data.
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On May 17, 2016, at 11:05 AM, Horváth Péter Gergely <
>> peter.gergely.horv...@gmail.com> wrote:
>>> 
>>> Hi All,
>>> 
>>> By default Solr allows you to define the type of a dynamic field by
>>> appending a post-fix to the name itself. E.g. creating a color_s field
>>> instructs Solr to create a string field. My understanding is that if we
>> do
>>> this, all queries must refer to the post-fixed field name as well. So
>>> instead of a query like color:"red", we will have to write something like
>>> color_s:"red" -- and so on for other field types as well.
>>> 
>>> I am wondering if it is possible to specify the data type used for a
>> field
>>> in Solr 6.0.0, without having to modify the field name. (Or at least in a
>>> way that would allow us to use the original field name) Do you have any
>>> idea, how to achieve this? I am fine, if we have to specify the field
>> type
>>> during the insertion of a document, however, I do not want to keep using
>>> post-fixes while running queries...
>>> 
>>> Thanks,
>>> Peter
>> 
>> 



Re: Requesting to be added to ContributorsGroup

2016-05-20 Thread Steve Rowe
Hi Sheece,

I have CC’d your address for this email, but ordinarily all discussion goes 
only to the mailing list, so you have to either subscribe to this mailing list 
- see  - or follow 
the discussion on a service like Nabble.

I added you to the ContributorsGroup on the same day you requested it - see 
.

You should now be able to contribute.  Please let us know if there’s a problem.

--
Steve
www.lucidworks.com

> On May 20, 2016, at 6:30 PM, Syed Gardezi  wrote:
> 
> Hello,
> 
>  There are a couple of things that need to be updated on the wiki page. 
> I would like to get it done. Can you kindly update?
> 
> Cheers,
> 
> Sheece
> 
> 
> From: Syed Gardezi
> Sent: Wednesday, 4 May 2016 12:03:01 AM
> To: solr-user@lucene.apache.org
> Subject: Requesting to be added to ContributorsGroup
> 
> Hello,
> I am a Master's student taking Free and Open Source Software 
> Development COMP8440 - http://programsandcourses.anu.edu.au/course/COMP8440 - 
> at Australian National University. I have selected 
> http://wiki.apache.org/solr/ to contribute to. Kindly add me to the 
> ContributorsGroup. Thank you.
> 
> wiki username: sheecegardezi
> 
> Regards,
> Sheece
> 



Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-27 Thread Steve Rowe
I’m working on addressing problems using multi-term synonyms at query time in 
Lucene and Solr.

I recommend these two blogs for understanding the issues (the second one was 
mentioned earlier in this thread):




In addition to the already-mentioned projects, there is also:



All of these projects try in various ways to work around the fact that Lucene’s 
QueryParser splits on whitespace before sending text to analysis, one token at 
a time, so in a synonym filter, multi-word synonyms can never match and add 
alternatives.  See <https://issues.apache.org/jira/browse/LUCENE-2605>, where 
I’ve posted a patch to directly address that problem - note that it’s still a 
work in progress.

Once LUCENE-2605 has been fixed, there is still work to do getting (e)dismax to 
work with the modified Lucene QueryParser, and addressing problems with how 
queries are constructed from Lucene’s “sausagized” token stream.

--
Steve
www.lucidworks.com

> On May 26, 2016, at 2:21 PM, John Bickerstaff  
> wrote:
> 
> Thanks Chris --
> 
> The two projects I'm aware of are:
> 
> https://github.com/healthonnet/hon-lucene-synonyms
> 
> and the one referenced from the Lucidworks page here:
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> 
> ... which is here : https://github.com/LucidWorks/auto-phrase-tokenfilter
> 
> Is there anything else out there that you would recommend I look at?
> 
> On Thu, May 26, 2016 at 12:01 PM, Chris Morley  wrote:
> 
>> Chris Morley here, from Wayfair.  (Depahelix = my domain)
>> 
>> Suyash Sonawane and I have worked on multiple word synonyms at Wayfair.
>> We worked mostly off of Ted Sullivan's work and also off of some
>> suggestions from Koorosh Vakhshoori.  We have gotten to a point where we
>> have a more sophisticated internal implementation, however, we've found
>> that it is very difficult to make it do what you want it to do, and also be
>> sufficiently performant.  Watch out for exceptional situations with mm
>> (minimum should match).
>> 
>> Trey Grainger (now at Lucidworks) and Simon Hughes of Dice.com have also
>> done work in this area.
>> 
>> It should be very possible to get this kind of thing working on
>> SolrCloud.  I haven't tried it yet but I think theoretically, it should
>> just work.  The synonyms stuff is mostly about doing things at index time
>> and query time.  The index time stuff should translate to SolrCloud
>> directly, while the query time stuff might pose some issues, but probably
>> not too bad, if there are any issues at all.
>> 
>> I've had decent luck porting our various plugins from 4.10.x to 5.5.0
>> because a lot of stuff is just Java, and it still works within the Jetty
>> context.
>> 
>> -Chris.
>> 
>> 
>> 
>> 
>> 
>> From: "John Bickerstaff" 
>> Sent: Thursday, May 26, 2016 1:51 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
>> Hey Jeff (or anyone interested in multi-word synonyms) here are some
>> potentially interesting links...
>> 
>> http://wiki.apache.org/solr/QueryParser (search the page for
>> synonum_edismax)
>> 
>> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ (blog
>> post about what became the synonym_edissmax Query Parser)
>> 
>> 
>> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>> 
>> This last was useful for lots of reasons and contains links to other
>> interesting, related web pages...
>> 
>> On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes 
>> wrote:
>> 
>>> Oh, interesting. I've certainty encountered issues with multi-word
>>> synonyms, but I hadn't come across this. If you end up using it with a
>>> recent solr verison, I'd be glad to hear your experience.
>>> 
>>> I haven't used it, but I am aware of one other project in this vein that
>>> you might be interested in looking at:
>>> https://github.com/LucidWorks/auto-phrase-tokenfilter
>>> 
>>> 
>>> On 5/26/16, 9:29 AM, "John Bickerstaff" 
>> wrote:
>>> 
 Ahh - for question #3 I may have spoken too soon. This line from the
 github repository readme suggests a way.
 
 Update: We have tested to run with the jar in $SOLR_HOME/lib as well,
>> and
 it works (Jetty).
 
 I'll try that and only respond back if that doesn't work.
 
 Questions 1 and 2 still stand of course... If anyone on the list has
 experience in this area...
 
 Thanks.
 
 On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff <
>>> j...@johnbickerstaff.com
> wrote:
 
> Hi all,
> 
> I'm creating a Solr Cloud that will index and search medical text.
> Multi-word synonyms are a pretty imp

[ANNOUNCE] Apache Solr 6.0.1 released

2016-05-28 Thread Steve Rowe
28 May 2016, Apache Solr™ 6.0.1 available 

The Lucene PMC is pleased to announce the release of Apache Solr 6.0.1 

Solr is the popular, blazing fast, open source NoSQL search platform 
from the Apache Lucene project. Its major features include powerful 
full-text search, hit highlighting, faceted search, dynamic 
clustering, database integration, rich document (e.g., Word, PDF) 
handling, and geospatial search. Solr is highly scalable, providing 
fault tolerant distributed search and indexing, and powers the search 
and navigation features of many of the world's largest internet sites. 

This release includes 31 bug fixes, documentation updates, etc., 
since the 6.0.0 release. 

The release is available for immediate download at: 

http://www.apache.org/dyn/closer.lua/lucene/solr/6.0.1 

Please read CHANGES.txt for a detailed list of changes: 

https://lucene.apache.org/solr/6_0_1/changes/Changes.html 

Please report any feedback to the mailing lists 
(http://lucene.apache.org/solr/discussion.html) 

Note: The Apache Software Foundation uses an extensive mirroring 
network for distributing releases. It is possible that the mirror you 
are using may not have replicated the release yet. If that is the 
case, please try another mirror. This also goes for Maven access.

Re: Solr 6.1.x Release Date ??

2016-06-16 Thread Steve Rowe
Tomorrow-ish.

--
Steve
www.lucidworks.com

> On Jun 16, 2016, at 4:14 AM, Ramesh shankar  wrote:
> 
> Hi,
> 
> Yes, i used the solr-6.1.0-79 nightly builds and [subquery] transformer is
> working fine in it; any idea of the expected release date for 6.1?
> 
> Regards
> Ramesh
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-6-1-x-Release-Date-tp4280945p4282562.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Fail to load org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer for fieldType "preanalyzed"

2016-06-24 Thread Steve Rowe
Hi Liu Peng,

Did you mix parts of an older Solr installation into your 6.0.0 installation?  
There were changes to PreAnalyzedField recently (in 5.5.0: 
), and so if you mix old Solr 
jars with newer ones, you might see things like the error you showed.  (The 
PreAnalyzedAnalyzer class was not present in older Solr versions.)

If you see this problem with a clean install of Solr 6.0:

* How did you add fields?  By directly modifying schema.xml? Or via the Schema 
API?

* Do any of your documents contain fields that use the "preanalyzed" field type?
 
* Which Java version/vendor are you using?

--
Steve
www.lucidworks.com

> On Jun 23, 2016, at 10:21 PM, t...@sina.com wrote:
> 
> Hi,
> 
> I use Solr 6.0 on Windows and am trying the techproducts example. At first I ran 
> bin\solr -e techproducts -s "example\techproducts" and it worked fine. But 
> when I added several fields and tried to restart it, I got some failures. From 
> the log, it seems to fail to load the PreAnalyzedAnalyzer for fieldType 
> "preanalyzed". The call stack is as follows:
> 
> INFO  - 2016-06-24 02:02:29.866; [   ] org.apache.solr.schema.IndexSchema; 
> [techproducts] Schema name=example
> ERROR - 2016-06-24 02:02:30.122; [   ] 
> org.apache.solr.schema.FieldTypePluginLoader; Cannot load analyzer: 
> org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer
> java.lang.InstantiationException: 
> org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer
>at java.lang.Class.newInstance(Class.java:427)
>at 
> org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:271)
>at 
> org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:104)
>at 
> org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:53)
>at 
> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152)
>at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:474)
>at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:163)
>at 
> org.apache.solr.schema.ManagedIndexSchema.<init>(ManagedIndexSchema.java:104)
>at 
> org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:172)
>at 
> org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:45)
>at 
> org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:75)
>at 
> org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:108)
>at 
> org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:79)
>at org.apache.solr.core.CoreContainer.create(CoreContainer.java:815)
>at org.apache.solr.core.CoreContainer.access$000(CoreContainer.java:88)
>at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:468)
>at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:459)
>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer.<init>()
>at java.lang.Class.getConstructor0(Class.java:3082)
>at java.lang.Class.newInstance(Class.java:412)
>... 21 more
> ERROR - 2016-06-24 02:02:30.128; [   ] org.apache.solr.core.CoreContainer; 
> Error creating core [techproducts]: Could not load conf for core 
> techproducts: Can't load schema 
> C:\WORK\Solr\Solr6.0\solr-6.0.0\example\techproducts\solr\techproducts\conf\managed-schema:
>  Plugin init failure for [schema.xml] fieldType "preanalyzed": Cannot load 
> analyzer: org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer.
> 
> What could be the reason? On stackoverflow, some answers mentioned that to 
> fix the error like "java.lang.NoSuchMethodException: XXX.YYY.<init>()", we 
> could add a constructor without arguments. But for this issue, I don't think 
> so.
> 
> Thanks
> Liu Peng



[ANNOUNCE] Apache Solr 5.5.2 released

2016-06-25 Thread Steve Rowe
25 June 2016, Apache Solr™ 5.5.2 available

The Lucene PMC is pleased to announce the release of Apache Solr 5.5.2

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search. Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

This release includes 38 bug fixes, documentation updates, etc.,
since the 5.5.1 release.

The release is available for immediate download at:

  http://www.apache.org/dyn/closer.lua/lucene/solr/5.5.2

Please read CHANGES.txt for a detailed list of changes:

  https://lucene.apache.org/solr/5_5_2/changes/Changes.html

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases. It is possible that the mirror you
are using may not have replicated the release yet. If that is the
case, please try another mirror. This also goes for Maven access.



Re: analyzer for _text_ field

2016-07-15 Thread Steve Rowe
Hi Waldyr,

An example of changing the _text_ analyzer by first creating a new field type, 
and then changing the _text_ field to use the new field type (after starting 
Solr 6.1 with “bin/solr start -e schemaless”):

-
PROMPT$ curl -X POST -H 'Content-type: application/json’ \
http://localhost:8983/solr/gettingstarted/schema --data-binary '{
  "add-field-type": {
"name": "my_new_field_type",
"class": "solr.TextField",
"analyzer": {
  "charFilters": [{
"class": "solr.HTMLStripCharFilterFactory"
  }],
  "tokenizer": {
"class": "solr.StandardTokenizerFactory"
  },
  "filters":[{
  "class": "solr.WordDelimiterFilterFactory"
}, {
  "class": "solr.LowerCaseFilterFactory"
  }]}},
  "replace-field": {
"name": "_text_",
"type": "my_new_field_type",
"multiValued": "true",
"indexed": "true",
"stored": "false"
  }}’
-

PROMPT$ curl http://localhost:8983/solr/gettingstarted/schema/fields/_text_

-
{
  "responseHeader”:{ […] },
  "field":{
"name":"_text_",
"type":"my_new_field_type",
"multiValued":true,
"indexed":true,
"stored":false}}
-

--
Steve
www.lucidworks.com

> On Jul 15, 2016, at 12:54 PM, Waldyr Neto  wrote:
> 
> Hy, How can i configure the analyzer for the _text_ field?



Re: analyzer for _text_ field

2016-07-15 Thread Steve Rowe
Waldyr, maybe it got mangled by my email client or yours?  

Here’s the same command:

  <https://gist.github.com/sarowe/db2fcd168eb77d7278f716ac75bfb9e9>

--
Steve
www.lucidworks.com

> On Jul 15, 2016, at 2:16 PM, Waldyr Neto  wrote:
> 
> Hy Steves, tks for the help
> unfortunately i'm making some mistake
> 
> when i try to run
>>> 
> curl -X POST -H 'Content-type: application/json’ \
> http://localhost:8983/solr/gettingstarted/schema --data-binary
> '{"add-field-type": { "name": "my_new_field_type", "class":
> "solr.TextField","analyzer": {"charFilters": [{"class":
> "solr.HTMLStripCharFilterFactory"}], "tokenizer": {"class":
> "solr.StandardTokenizerFactory"},"filters":[{"class":
> "solr.WordDelimiterFilterFactory"}, {"class":
> "solr.LowerCaseFilterFactory"}]}},"replace-field": { "name":
> "_text_","type": "my_new_field_type", "multiValued": "true","indexed":
> "true","stored": "false"}}’
> 
> i receive the following error msg from the curl program:
> 
> curl: (3) [globbing] unmatched brace in column 1
> 
> curl: (6) Could not resolve host: name
> 
> curl: (6) Could not resolve host: my_new_field_type,
> 
> curl: (6) Could not resolve host: class
> 
> curl: (6) Could not resolve host: solr.TextField,analyzer
> 
> curl: (3) [globbing] unmatched brace in column 1
> 
> curl: (3) [globbing] bad range specification in column 2
> 
> curl: (3) [globbing] unmatched close brace/bracket in column 32
> 
> curl: (6) Could not resolve host: tokenizer
> 
> curl: (3) [globbing] unmatched brace in column 1
> 
> curl: (3) [globbing] unmatched close brace/bracket in column 30
> 
> curl: (3) [globbing] unmatched close brace/bracket in column 32
> 
> curl: (3) [globbing] unmatched brace in column 1
> 
> curl: (3) [globbing] unmatched close brace/bracket in column 28
> 
> curl: (3) [globbing] unmatched brace in column 1
> 
> curl: (6) Could not resolve host: name
> 
> curl: (6) Could not resolve host: _text_,type
> 
> curl: (6) Could not resolve host: my_new_field_type,
> 
> curl: (6) Could not resolve host: multiValued
> 
> curl: (6) Could not resolve host: true,indexed
> 
> curl: (6) Could not resolve host: true,stored
> 
> curl: (3) [globbing] unmatched close brace/bracket in column 6
> 
> cvs1:~ vvisionphp1$
> 
> On Fri, Jul 15, 2016 at 2:45 PM, Steve Rowe  wrote:
> 
>> Hi Waldyr,
>> 
>> An example of changing the _text_ analyzer by first creating a new field
>> type, and then changing the _text_ field to use the new field type (after
>> starting Solr 6.1 with “bin/solr start -e schemaless”):
>> 
>> -
>> PROMPT$ curl -X POST -H 'Content-type: application/json’ \
>>http://localhost:8983/solr/gettingstarted/schema --data-binary '{
>>  "add-field-type": {
>>"name": "my_new_field_type",
>>"class": "solr.TextField",
>>"analyzer": {
>>  "charFilters": [{
>>"class": "solr.HTMLStripCharFilterFactory"
>>  }],
>>  "tokenizer": {
>>"class": "solr.StandardTokenizerFactory"
>>  },
>>  "filters":[{
>>  "class": "solr.WordDelimiterFilterFactory"
>>}, {
>>  "class": "solr.LowerCaseFilterFactory"
>>  }]}},
>>  "replace-field": {
>>"name": "_text_",
>>"type": "my_new_field_type",
>>"multiValued": "true",
>>"indexed": "true",
>>"stored": "false"
>>  }}’
>> -
>> 
>> PROMPT$ curl
>> http://localhost:8983/solr/gettingstarted/schema/fields/_text_
>> 
>> -
>> {
>>  "responseHeader”:{ […] },
>>  "field":{
>>"name":"_text_",
>>"type":"my_new_field_type",
>>"multiValued":true,
>>"indexed":true,
>>"stored":false}}
>> -
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Jul 15, 2016, at 12:54 PM, Waldyr Neto  wrote:
>>> 
>>> Hy, How can i configure the analyzer for the _text_ field?
>> 
>> 



Re: analyzer for _text_ field

2016-07-16 Thread Steve Rowe
Waldyr,

I don’t understand your first question - are you asking how to change the 
schema without using the Schema API?

About phonetic matching: there are several different phonetic token filters 
provided with Solr - see 
<https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching>.
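
For example, a sketch of a field type using one of them (Double Metaphone; the
type name here is made up):

-----
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- inject="true" keeps the original tokens alongside the phonetic codes -->
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="true"/>
  </analyzer>
</fieldType>
-----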

--
Steve
www.lucidworks.com

> On Jul 16, 2016, at 5:26 AM, Waldyr Neto  wrote:
> 
> tks, it works :)
> 
> but do you know how i could do this, change the _text_ analyzer using
> schemas? maybe at some point i could change the default analyzer. what i
> really need is to use an analyzer that works with phonetic search in the
> content of my files;
> 
> On Fri, Jul 15, 2016 at 10:11 PM, Waldyr Neto  wrote:
> 
>> tks a lot, i'll try soon and give u a feed back :)
>> 
>> On Fri, Jul 15, 2016 at 4:07 PM, David Santamauro <
>> david.santama...@gmail.com> wrote:
>> 
>>> 
>>> The opening and closing single quotes don't match
>>> 
>>> -data-binary '{ ... }’
>>> 
>>> it should be:
>>> 
>>> -data-binary '{ ... }'
>>> 
>>> 
>>> 
>>> On 07/15/2016 02:59 PM, Steve Rowe wrote:
>>> 
>>>> Waldyr, maybe it got mangled by my email client or yours?
>>>> 
>>>> Here’s the same command:
>>>> 
>>>>   <https://gist.github.com/sarowe/db2fcd168eb77d7278f716ac75bfb9e9>
>>>> 
>>>> --
>>>> Steve
>>>> www.lucidworks.com
>>>> 
>>>> On Jul 15, 2016, at 2:16 PM, Waldyr Neto  wrote:
>>>>> 
>>>>> Hy Steves, tks for the help
>>>>> unfortunately i'm making some mistake
>>>>> 
>>>>> when i try to run
>>>>> 
>>>>>> 
>>>>>>> curl -X POST -H 'Content-type: application/json’ \
>>>>> http://localhost:8983/solr/gettingstarted/schema --data-binary
>>>>> '{"add-field-type": { "name": "my_new_field_type", "class":
>>>>> "solr.TextField","analyzer": {"charFilters": [{"class":
>>>>> "solr.HTMLStripCharFilterFactory"}], "tokenizer": {"class":
>>>>> "solr.StandardTokenizerFactory"},"filters":[{"class":
>>>>> "solr.WordDelimiterFilterFactory"}, {"class":
>>>>> "solr.LowerCaseFilterFactory"}]}},"replace-field": { "name":
>>>>> "_text_","type": "my_new_field_type", "multiValued": "true","indexed":
>>>>> "true","stored": "false"}}’
>>>>> 
>>>>> i receive the following error msg from the curl program:
>>>>> 
>>>>> curl: (3) [globbing] unmatched brace in column 1
>>>>> 
>>>>> curl: (6) Could not resolve host: name
>>>>> 
>>>>> curl: (6) Could not resolve host: my_new_field_type,
>>>>> 
>>>>> curl: (6) Could not resolve host: class
>>>>> 
>>>>> curl: (6) Could not resolve host: solr.TextField,analyzer
>>>>> 
>>>>> curl: (3) [globbing] unmatched brace in column 1
>>>>> 
>>>>> curl: (3) [globbing] bad range specification in column 2
>>>>> 
>>>>> curl: (3) [globbing] unmatched close brace/bracket in column 32
>>>>> 
>>>>> curl: (6) Could not resolve host: tokenizer
>>>>> 
>>>>> curl: (3) [globbing] unmatched brace in column 1
>>>>> 
>>>>> curl: (3) [globbing] unmatched close brace/bracket in column 30
>>>>> 
>>>>> curl: (3) [globbing] unmatched close brace/bracket in column 32
>>>>> 
>>>>> curl: (3) [globbing] unmatched brace in column 1
>>>>> 
>>>>> curl: (3) [globbing] unmatched close brace/bracket in column 28
>>>>> 
>>>>> curl: (3) [globbing] unmatched brace in column 1
>>>>> 
>>>>> curl: (6) Could not resolve host: name
>>>>> 
>>>>> curl: (6) Could not resolve host: _text_,type
>>>>> 
>>>>> curl: (6) Could not resolve host: my_new_field_type,
>>>>> 
>>>>> curl: (6) Could not resolve host: multiValued
>>>>> 
>>>>> curl: (6) Could no

Re: analyzer for _text_ field

2016-07-16 Thread Steve Rowe
Waldyr, I recommend you start reading the Solr Reference Guide here: 
<https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters>.
  In the following sections, there are many examples of schema.xml 
configuration of field types and fields.

In general: what you’ll want to do is either modify the field type that the 
_text_ field uses, or create a new field type and change the _text_ field 
definition to use it instead.
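
For example, something along these lines in schema.xml - the type name is made
up, the analyzer body is elided, and the _text_ attributes match the defaults
shown earlier in this thread:

-----
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- tokenizer plus a phonetic filter of your choice goes here -->
  </analyzer>
</fieldType>

<field name="_text_" type="text_phonetic" multiValued="true" indexed="true" stored="false"/>
-----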

--
Steve
www.lucidworks.com

> On Jul 16, 2016, at 1:38 PM, Waldyr Neto  wrote:
> 
> yeap,
> 
> i'm looking for a way to specify in schema.xml the analyzer for the _text_
> field
> 
> On Sat, Jul 16, 2016 at 12:22 PM, Steve Rowe  wrote:
> 
>> Waldyr,
>> 
>> I don’t understand your first question - are you asking how to change the
>> schema without using the Schema API?
>> 
>> About phonetic matching: there are several different phonetic token
>> filters provided with Solr - see <
>> https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching>.
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Jul 16, 2016, at 5:26 AM, Waldyr Neto  wrote:
>>> 
>>> tks, it works :)
>>> 
>>> but do you know how i could do this, change the _text_ analyzer using
>>> schemas? maybe at some point i could change the default analyzer. what i
>>> really need is to use an analyzer that works with phonetic search in the
>>> content of my files;
>>> 
>>> On Fri, Jul 15, 2016 at 10:11 PM, Waldyr Neto 
>> wrote:
>>> 
>>>> tks a lot, i'll try soon and give u a feed back :)
>>>> 
>>>> On Fri, Jul 15, 2016 at 4:07 PM, David Santamauro <
>>>> david.santama...@gmail.com> wrote:
>>>> 
>>>>> 
>>>>> The opening and closing single quotes don't match
>>>>> 
>>>>> -data-binary '{ ... }’
>>>>> 
>>>>> it should be:
>>>>> 
>>>>> -data-binary '{ ... }'
>>>>> 
>>>>> 
>>>>> 
>>>>> On 07/15/2016 02:59 PM, Steve Rowe wrote:
>>>>> 
>>>>>> Waldyr, maybe it got mangled by my email client or yours?
>>>>>> 
>>>>>> Here’s the same command:
>>>>>> 
>>>>>>  <https://gist.github.com/sarowe/db2fcd168eb77d7278f716ac75bfb9e9>
>>>>>> 
>>>>>> --
>>>>>> Steve
>>>>>> www.lucidworks.com
>>>>>> 
>>>>>> On Jul 15, 2016, at 2:16 PM, Waldyr Neto  wrote:
>>>>>>> 
>>>>>>> Hy Steves, tks for the help
>>>>>>> unfortunately i'm making some mistake
>>>>>>> 
>>>>>>> when i try to run
>>>>>>> 
>>>>>>>> 
>>>>>>>>> curl -X POST -H 'Content-type: application/json’ \
>>>>>>> http://localhost:8983/solr/gettingstarted/schema --data-binary
>>>>>>> '{"add-field-type": { "name": "my_new_field_type", "class":
>>>>>>> "solr.TextField","analyzer": {"charFilters": [{"class":
>>>>>>> "solr.HTMLStripCharFilterFactory"}], "tokenizer": {"class":
>>>>>>> "solr.StandardTokenizerFactory"},"filters":[{"class":
>>>>>>> "solr.WordDelimiterFilterFactory"}, {"class":
>>>>>>> "solr.LowerCaseFilterFactory"}]}},"replace-field": { "name":
>>>>>>> "_text_","type": "my_new_field_type", "multiValued":
>> "true","indexed":
>>>>>>> "true","stored": "false"}}’
>>>>>>> 
>>>>>>> i receive the following error msg from the curl program:
>>>>>>> 
>>>>>>> curl: (3) [globbing] unmatched brace in column 1
>>>>>>> 
>>>>>>> curl: (6) Could not resolve host: name
>>>>>>> 
>>>>>>> curl: (6) Could not resolve host: my_new_field_type,
>>>>>>> 
>>>>>>> curl: (6) Could not resolve host: class
>>>>>>> 
>>>>>>> curl: (6) Could not resolve host: s

Re: How to Add New Fields and Fields Types Programmatically Using Solrj

2016-07-18 Thread Steve Rowe
Hi Jeniba,

You can add fields and field types using Solrj with SchemaRequest.Update 
subclasses - see here for a list: 


There are quite a few examples of doing both in the tests: 
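
For example, a minimal (untested) sketch of adding a field with an explicit type
via SolrJ - the client URL and field attributes are made up for the example:

-----
// imports: org.apache.solr.client.solrj.impl.HttpSolrClient,
//          org.apache.solr.client.solrj.request.schema.SchemaRequest,
//          java.util.LinkedHashMap, java.util.Map
HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");

Map<String, Object> attrs = new LinkedHashMap<>();
attrs.put("name", "title");
attrs.put("type", "string");
attrs.put("stored", true);
attrs.put("required", true);
attrs.put("multiValued", false);

// sends a Schema API add-field request to the core
new SchemaRequest.AddField(attrs).process(client);
-----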


--
Steve
www.lucidworks.com

> On Jul 18, 2016, at 1:59 AM, Jeniba Johnson  
> wrote:
> 
> 
> Hi,
> 
> I have configured Solr 5.3.1 and started Solr in schemaless mode. Using 
> SolrInputDocument, I am able to add new fields in solrconfig.xml using Solrj.
> How do I specify the field type of a field using Solrj?
> 
> E.g. <field ... required="true" multivalued="false" />
> 
> How can I add field type properties using SolrInputDocument programmatically 
> using Solrj? Can anyone help with it?
> 
> 
> 
> Regards,
> Jeniba Johnson
> 
> 
> 
> 
> The contents of this e-mail and any attachment(s) may contain confidential or 
> privileged information for the intended recipient(s). Unintended recipients 
> are prohibited from taking action on the basis of information in this e-mail 
> and using or disseminating the information, and must notify the sender and 
> delete it from their system. L&T Infotech will not accept responsibility or 
> liability for the accuracy or completeness of, or the presence of any virus 
> or disabling code in this e-mail"



Re: EmbeddedSolrServer problem when using one-jar-with-dependency including solr

2016-08-02 Thread Steve Rowe
solr-core[1] and solr-solrj[2] POMs have parent POM solr-parent[3], which in 
turn has parent POM lucene-solr-grandparent[4], which has a 
 section that specifies dependency versions & exclusions 
*for all direct dependencies*.

The intent is for all Lucene/Solr’s internal dependencies to be managed 
directly, rather than through Maven’s transitive dependency mechanism.  For 
background, see summary & comments on JIRA issue LUCENE-5217[5].

I haven’t looked into how this affects systems that depend on Lucene/Solr 
artifacts, but it appears to be the case that you can’t use Maven’s transitive 
dependency mechanism to pull in all required dependencies for you.

BTW, if you look at the grandparent POM, the httpclient version for Solr 6.1.0 
is declared as 4.4.1.  I don’t know if depending on version 4.5.2 is causing 
problems, but if you don’t need a feature in 4.5.2, I suggest that you depend 
on the same version as Solr does.
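
For example, pinning it in your pom to the version the grandparent POM declares:

-----
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <version>4.4.1</version> <!-- the version declared for Solr 6.1.0 -->
</dependency>
-----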

For error #2, you should depend on lucene-core[6].

My suggestion as a place to start: copy/paste the dependencies from 
solr-core[1] and solr-solrj[2] POMs, and leave out stuff you know you won’t 
need.

[1] <https://repo1.maven.org/maven2/org/apache/solr/solr-core/6.1.0/solr-core-6.1.0.pom>
[2] <https://repo1.maven.org/maven2/org/apache/solr/solr-solrj/6.1.0/solr-solrj-6.1.0.pom>
[3] <https://repo1.maven.org/maven2/org/apache/solr/solr-parent/6.1.0/solr-parent-6.1.0.pom>
[4] <https://repo1.maven.org/maven2/org/apache/lucene/lucene-solr-grandparent/6.1.0/lucene-solr-grandparent-6.1.0.pom>
[5] <https://issues.apache.org/jira/browse/LUCENE-5217>
[6] <http://search.maven.org/#artifactdetails|org.apache.lucene|lucene-core|6.1.0|jar>


--
Steve
www.lucidworks.com

> On Aug 2, 2016, at 12:03 PM, Ziqi Zhang  wrote:
> 
> Hi, I am using Solr, Solrj 6.1, and Maven to manage my project. I use maven 
> to build a jar-with-dependency and run a java program pointing its classpath 
> to this jar. However I keep getting errors even when I just try to create an 
> instance of EmbeddedSolrServer:
> 
> /code/
> String solrHome = "/home/solr/";
> String solrCore = "fw";
> solrCores = new EmbeddedSolrServer(
>Paths.get(solrHome), solrCore
>).getCoreContainer();
> ///
> 
> 
> My project has dependencies defined in the pom shown below.  **When block A 
> is not present**, running the code above fails with ERROR 1 (shown further below):
> 
> /pom/
> <dependency>
>     <groupId>org.apache.jena</groupId>
>     <artifactId>jena-arq</artifactId>
>     <version>3.0.1</version>
> </dependency>
>
> <!-- BLOCK A -->
> <dependency>
>     <groupId>org.apache.httpcomponents</groupId>
>     <artifactId>httpclient</artifactId>
>     <version>4.5.2</version>
> </dependency>
> <!-- BLOCK A ENDS -->
>
> <dependency>
>     <groupId>org.apache.solr</groupId>
>     <artifactId>solr-core</artifactId>
>     <version>6.1.0</version>
>     <exclusions>
>         <exclusion>
>             <groupId>org.slf4j</groupId>
>             <artifactId>slf4j-log4j12</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>log4j</groupId>
>             <artifactId>log4j</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.slf4j</groupId>
>             <artifactId>slf4j-jdk14</artifactId>
>         </exclusion>
>     </exclusions>
> </dependency>
>
> <dependency>
>     <groupId>org.apache.solr</groupId>
>     <artifactId>solr-solrj</artifactId>
>     <version>6.1.0</version>
>     <exclusions>
>         <exclusion>
>             <groupId>org.slf4j</groupId>
>             <artifactId>slf4j-log4j12</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>log4j</groupId>
>             <artifactId>log4j</artifactId>
>         </exclusion>
>         <exclusion>
>             <groupId>org.slf4j</groupId>
>             <artifactId>slf4j-jdk14</artifactId>
>         </exclusion>
>     </exclusions>
> </dependency>
> ///
> 
> 
> Block A is added because when it is missing, the following error is thrown by 
> the java code above:
> 
> * ERROR 1 ///*
> 
>Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/http/impl/client/CloseableHttpClient
>at 
> org.apache.solr.handler.component.HttpShardHandlerFactory.init(HttpShardHandlerFactory.java:167)
>at 
> org.apache.solr.handler.component.ShardHandlerFactory.newInstance(ShardHandlerFactory.java:47)
>at org.apache.solr.core.CoreContainer.load(CoreContainer.java:404)
>at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.load(EmbeddedSolrServer.java:84)
>at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.<init>(EmbeddedSolrServer.java:70)
>at 
> uk.ac.ntu.sac.sense.SenseProperty.initSolrServer(SenseProperty.java:103)
>at 
> uk.ac.ntu.sac.sense.SenseProperty.getClassIndex(SenseProperty.java:81)
>at 
> uk.ac.ntu.sac.sense.kb.indexer.IndexMaster.<init>(IndexMaster.java:31)
>at uk.ac.ntu.sac.sense.test.TestIndexer.main(TestIndexer.java:14)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorI

Re: EmbeddedSolrServer problem when using one-jar-with-dependency including solr

2016-08-03 Thread Steve Rowe
Oh, then likely the problem is that your uberjar packing tool doesn’t know how 
to (or maybe isn’t configured to?) include/merge/translate resources under 
META-INF/services/.  E.g. lucene/core module has SPI files there.

Info on the maven shade plugin’s configuration for this stuff is here: 
<https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer>
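
Something like this (untested sketch; the plugin version is illustrative):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- merges META-INF/services/* SPI files instead of letting
               one jar's copy clobber the others -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>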

--
Steve
www.lucidworks.com

> On Aug 3, 2016, at 5:26 AM, Ziqi Zhang  wrote:
> 
> Thanks
> 
> I am not sure if Steve's suggestion was the right solution. Even when I had 
> not explicitly defined the dependency on lucene, I could see that the 
> packaged jar still contained org.apache.lucene.
> 
> What solved my problem was not packing a single jar but using a folder of 
> individual jars. I am not sure why, though.
> 
> Regards
> 
> 
> On 02/08/2016 21:53, Rohit Kanchan wrote:
>> We also faced the same issue when we were running an embedded Solr 6.1 server.
>> Actually I faced the same in our integration environment after deploying the
>> project. Solr 6.1 is using http client 4.4.1, which I think the embedded Solr
>> server is looking for. I think when the Solr core is getting loaded, an old
>> http client is getting loaded from somewhere in your Maven build. Check the
>> dependency tree of your pom.xml and see if you can exclude this jar getting
>> loaded from anywhere else. Just exclude it in your pom.xml. I hope this
>> solves your issue.
>> 
>> 
>> Thanks
>> Rohit
>> 
>> 
>> On Tue, Aug 2, 2016 at 9:44 AM, Steve Rowe  wrote:
>> 
>>> solr-core[1] and solr-solrj[2] POMs have parent POM solr-parent[3], which
>>> in turn has parent POM lucene-solr-grandparent[4], which has a
>>>  section that specifies dependency versions &
>>> exclusions *for all direct dependencies*.
>>> 
>>> The intent is for all Lucene/Solr’s internal dependencies to be managed
>>> directly, rather than through Maven’s transitive dependency mechanism.  For
>>> background, see summary & comments on JIRA issue LUCENE-5217[5].
>>> 
>>> I haven’t looked into how this affects systems that depend on Lucene/Solr
>>> artifacts, but it appears to be the case that you can’t use Maven’s
>>> transitive dependency mechanism to pull in all required dependencies for
>>> you.
>>> 
>>> BTW, if you look at the grandparent POM, the httpclient version for Solr
>>> 6.1.0 is declared as 4.4.1.  I don’t know if depending on version 4.5.2 is
>>> causing problems, but if you don’t need a feature in 4.5.2, I suggest that
>>> you depend on the same version as Solr does.
>>> 
>>> For error #2, you should depend on lucene-core[6].
>>> 
>>> My suggestion as a place to start: copy/paste the dependencies from
>>> solr-core[1] and solr-solrj[2] POMs, and leave out stuff you know you won’t
>>> need.
>>> 
>>> [1] <
>>> https://repo1.maven.org/maven2/org/apache/solr/solr-core/6.1.0/solr-core-6.1.0.pom
>>> [2] <
>>> https://repo1.maven.org/maven2/org/apache/solr/solr-solrj/6.1.0/solr-solrj-6.1.0.pom
>>> [3] <
>>> https://repo1.maven.org/maven2/org/apache/solr/solr-parent/6.1.0/solr-parent-6.1.0.pom
>>> [4] <
>>> https://repo1.maven.org/maven2/org/apache/lucene/lucene-solr-grandparent/6.1.0/lucene-solr-grandparent-6.1.0.pom
>>> [5] <https://issues.apache.org/jira/browse/LUCENE-5217>
>>> [6] <
>>> http://search.maven.org/#artifactdetails|org.apache.lucene|lucene-core|6.1.0|jar
>>> --
>>> Steve
>>> www.lucidworks.com
>>> 
>>>> On Aug 2, 2016, at 12:03 PM, Ziqi Zhang 
>>> wrote:
>>>> Hi, I am using Solr, Solrj 6.1, and Maven to manage my project. I use
>>> maven to build a jar-with-dependency and run a java program pointing its
>>> classpath to this jar. However I keep getting errors even when I just try
>>> to create an instance of EmbeddedSolrServer:
>>>> /**** code ****/
>>>> String solrHome = "/home/solr/";
>>>> String solrCore = "fw";
>>>> solrCores = new EmbeddedSolrServer(
>>>>     Paths.get(solrHome), solrCore
>>>> ).getCoreContainer();
>>>> /****/
>>>> 
>>>> 
>>>> My project has dependencies defined in the pom shown below:  **When
>>> block A is not present**, running the code that calls:
>>>> /**** pom ****/

Re: Difference in boolean query parsing. Solr-5.4.0 VS Solr.6.1.0

2016-08-04 Thread Steve Rowe
It’s fairly likely these differences are as a result of SOLR-2649[1] (released 
with 5.5) and SOLR-8812[2] (released with 6.1).

If you haven’t seen it, I recommend you read Hoss’s blog “Why Not AND, OR, And 
NOT?”.

If you can, add parentheses to explicitly specify precedence.
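
For example (an illustrative sketch only - pick the grouping you actually 
intend), the last query quoted below could be written as:

fl:((network AND hardware AND device) OR system)

which removes the dependence on operator-precedence rules.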

[1] https://issues.apache.org/jira/browse/SOLR-2649
[2] https://issues.apache.org/jira/browse/SOLR-8812

--
Steve
www.lucidworks.com

> On Aug 4, 2016, at 2:23 AM, Modassar Ather  wrote:
> 
> Hi,
> 
> During migration from Solr-5.4.1 to Solr-6.1.0 I saw a difference in the
> behavior of few of my boolean queries.
> As per my current understanding the default operator comes in when there is
> no operator present in between two terms.
> Also, both ANDed terms are marked mandatory unless one of them is
> introduced with NOT. Same is the case with OR.
> Please correct me if my understanding is wrong.
> 
> The below queries are parsed differently and causes a lot of difference in
> search result.
> The default operator used is AND and no mm is set.
> 
> 
> *Query  : *fl:(network hardware AND device OR system)
> *Solr.6.1.0 :* "+(+fl:network +fl:hardware fl:device fl:system)"
> *Solr-5.4.0 : *"+(fl:network +fl:hardware +fl:device fl:system)"
> 
> *Query  : *fl:(network OR hardware device system)
> *Solr.6.1.0 : *"+(fl:network fl:hardware +fl:device +fl:system)"
> *Solr-5.4.0 : *"+(fl:network fl:hardware fl:device fl:system)"
> 
> *Query  : *fl:(network OR hardware AND device OR system)
> *Solr.6.1.0 : *"+(fl:network +fl:hardware fl:device fl:system)"
> *Solr-5.4.0 : *"+(fl:network +fl:hardware +fl:device fl:system)"
> 
> *Query  : *fl:(network AND hardware AND device OR system)"
> *Solr.6.1.0 : *"+(+fl:network +fl:hardware fl:device fl:system)"
> *Solr-5.4.0 : *"+(+fl:network +fl:hardware +fl:device fl:system)"
> 
> Please help me understand the difference in parsing and its effect on
> search.
> 
> Thanks,
> Modassar



Re: How can I set the defaultOperator to be AND?

2016-08-05 Thread Steve Rowe
Hi Bastien,

Have you tried upgrading to 6.1?  SOLR-8812, mentioned earlier in the thread, 
was released with 6.1, and is directly aimed at fixing the problem you are 
having in 6.0 (also a problem in 5.5): when mm is not explicitly provided and 
the query contains explicit operators (except for AND), edismax now sets mm=0.

--
Steve
www.lucidworks.com

> On Aug 5, 2016, at 2:34 AM, Bastien Latard | MDPI AG 
>  wrote:
> 
> Hi Eric & others,
> Is there any way to overwrite the default OP when we use edismax?
> Because adding the following line to solrconfig.xml doesn't solve the problem:
> 
> 
> (Then if I do "q=black OR white", this always gives the results for "black 
> AND white")
> 
> I did not find a way to define a default OP, which is automatically 
> overwritten by the AND/OR from a query.
> 
> 
> Example - Debug: defaultOP in solrconfig = AND / q=a or b
> 
> 
> ==> results for black AND white
> The correct result should be the following (but I had to force the q.op):
> 
> ==> I cannot do this in case I want to do "(a AND b) OR c"...
> 
> 
> Kind regards,
> Bastien
> 
> On 27/04/2016 05:30, Erick Erickson wrote:
>> Defaulting to "OR" has been the behavior since forever, so changing the 
>> behavior now is just not going to happen. Making it fit a new version of 
>> "correct" will change the behavior for every application out there that has 
>> not specified the default behavior.
>> 
>> There's no a-priori reason to expect "more words to equal fewer docs", I can 
>> just as easily argue that "more words should return more docs". Which you 
>> expect depends on your mental model.
>> 
>> And providing the default op in your solrconfig.xml request handlers allows 
>> you to implement whatever model your application chooses...
>> 
>> Best,
>> Erick
>> 
>> On Mon, Apr 25, 2016 at 11:32 PM, Bastien Latard - MDPI AG 
>>  wrote:
>> Thank you Shawn, Jan and Georg for your answers.
>> 
>> Yes, it seems that if I simply remove the defaultOperator it works well for 
>> "composed queries" like '(a:x AND b:y) OR c:z'.
>> But I think that the default Operator should/could be the AND.
>> 
>> Because when I add an extra search word, I expect that the results get more 
>> accurate...
>> (It seems to be what google is also doing now)
>>|   |   
>> 
>> Otherwise, if you make a search and apply another filter (e.g.: sort by 
>> publication date, facets, ...) , user can get the less relevant item (only 1 
>> word in 4 matches) in first position only because of its date...
>> 
>> What do you think?
>> 
>> 
>> Kind regards,
>> Bastien
>> 
>> 
>> On 25/04/2016 14:53, Shawn Heisey wrote:
>>> On 4/25/2016 6:39 AM, Bastien Latard - MDPI AG wrote:
>>> 
 Remember:
 If I add the following line to the schema.xml, even if I do a search
 'title:"test" OR author:"me"', it will returns documents matching
 'title:"test" AND author:"me"':
  <solrQueryParser defaultOperator="AND"/>
 
>>> The settings in the schema for default field and default operator were
>>> deprecated a long time ago.  I actually have no idea whether they are
>>> even supported in newer Solr versions.
>>> 
>>> The q.op parameter controls the default operator, and the df parameter
>>> controls the default field.  These can be set in the request handler
>>> definition in solrconfig.xml -- usually in "defaults" but there might be
>>> reason to put them in "invariants" instead.
>>> 
>>> If you're using edismax, you'd be better off using the mm parameter
>>> rather than the q.op parameter.  The behavior you have described above
>>> sounds like a change in behavior (some call it a bug) introduced in the
>>> 5.5 version:
>>> 
>>> 
>>> https://issues.apache.org/jira/browse/SOLR-8812
>>> 
>>> 
>>> If you are using edismax, I suspect that if you set mm=100% instead of
>>> q.op=AND (or the schema default operator) that the problem might go away
>>> ... but I am not sure.  Someone who is more familiar with SOLR-8812
>>> probably should comment.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> 
>>> 
> 



Re: Getting dynamic fields using LukeRequest.

2016-08-09 Thread Steve Rowe
Not sure what the issue is with LukeRequest, but Solrj has Schema API support - 
see the SchemaRequest class and its nested request classes for which options 
are supported.
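
Something like this (untested sketch; "client" stands in for your 
CloudSolrClient and "product" for your collection name):

import java.util.Map;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public static void printFields(SolrClient client) throws Exception {
  // Static fields
  SchemaResponse.FieldsResponse fields =
      new SchemaRequest.Fields().process(client, "product");
  for (Map<String, Object> field : fields.getFields()) {
    System.out.println("field: " + field.get("name"));
  }
  // Dynamic fields
  SchemaResponse.DynamicFieldsResponse dynFields =
      new SchemaRequest.DynamicFields().process(client, "product");
  for (Map<String, Object> df : dynFields.getDynamicFields()) {
    System.out.println("dynamic field: " + df.get("name"));
  }
}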

--
Steve
www.lucidworks.com

> On Aug 9, 2016, at 8:52 AM, Pranaya Behera  wrote:
> 
> Hi,
> I have the following script to retrieve all the fields in the collection. 
> I am using SolrCloud 6.1.0.
> LukeRequest lukeRequest = new LukeRequest();
> lukeRequest.setNumTerms(0);
> lukeRequest.setShowSchema(false);
> LukeResponse lukeResponse = lukeRequest.process(cloudSolrClient);
> Map<String, LukeResponse.FieldInfo> fieldInfoMap = 
> lukeResponse.getFieldInfo();
> for (Map.Entry<String, LukeResponse.FieldInfo> entry : 
> fieldInfoMap.entrySet()) {
>  entry.getKey(); // Here fieldInfoMap sometimes has size 0, and sometimes 
> it holds incomplete data.
> }
> 
> 
> Setting showSchema to true doesn't yield any result. Only making it false 
> yields results, and even then incomplete data. As I can see in the document, 
> it has more fields than what it is reporting.
> 
> LukeRequest hits /solr/product/admin/luke?numTerms=0&wt=javabin&version=2 
> HTTP/1.1 .
> 
> How it should be configured for solrcloud ?
> I have already mentioned
> 
> <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />
> 
> in the solrconfig.xml. It doesn't matter whether it is present in the 
> solrconfig or not as I am requesting it from solrj.
> 



Re: How can I set the defaultOperator to be AND?

2016-09-05 Thread Steve Rowe
Hi Bast, 

Good to know you got it to work - thanks for letting us know!

--
Steve
www.lucidworks.com

> On Sep 2, 2016, at 4:30 AM, Bastien Latard | MDPI AG 
>  wrote:
> 
> Thanks Steve for your advice (i.e.: upgrade to Solr 6.2).
> I finally had time to upgrade and can now use "&q.op=AND" together with "&q=a 
> OR b" and this works as expected.
> 
> I even defined the following line in the defaults settings in the 
> requestHandler, to overwrite the default behavior:
> AND
> 
> Issue fixed :)
> 
> Kind regards,
> Bast
> 
> On 05/08/2016 14:57, Bastien Latard | MDPI AG wrote:
>> Hi Steve,
>> 
>> I read the thread you sent me (SOLR-8812) and it seems that the 6.1 includes 
>> this fix, as you said.
>> I will upgrade.
>> Thank you!
>> 
>> Kind regards,
>> Bast
>> 
>> On 05/08/2016 14:37, Steve Rowe wrote:
>>> Hi Bastien,
>>> 
>>> Have you tried upgrading to 6.1?  SOLR-8812, mentioned earlier in the 
>>> thread, was released with 6.1, and is directly aimed at fixing the problem 
>>> you are having in 6.0 (also a problem in 5.5): when mm is not explicitly 
>>> provided and the query contains explicit operators (except for AND), 
>>> edismax now sets mm=0.
>>> 
>>> -- 
>>> Steve
>>> www.lucidworks.com
>>> 
>>>> On Aug 5, 2016, at 2:34 AM, Bastien Latard | MDPI AG 
>>>>  wrote:
>>>> 
>>>> Hi Eric & others,
>>>> Is there any way to overwrite the default OP when we use edismax?
>>>> Because adding the following line to solrconfig.xml doesn't solve the 
>>>> problem:
>>>> 
>>>> 
>>>> (Then if I do "q=black OR white", this always gives the results for "black 
>>>> AND white")
>>>> 
>>>> I did not find a way to define a default OP, which is automatically 
>>>> overwritten by the AND/OR from a query.
>>>> 
>>>> 
>>>> Example - Debug: defaultOP in solrconfig = AND / q=a or b
>>>> 
>>>> 
>>>> ==> results for black AND white
>>>> The correct result should be the following (but I had to force the q.op):
>>>> 
>>>> ==> I cannot do this in case I want to do "(a AND b) OR c"...
>>>> 
>>>> 
>>>> Kind regards,
>>>> Bastien
>>>> 
>>>> On 27/04/2016 05:30, Erick Erickson wrote:
>>>>> Defaulting to "OR" has been the behavior since forever, so changing the 
>>>>> behavior now is just not going to happen. Making it fit a new version of 
>>>>> "correct" will change the behavior for every application out there that 
>>>>> has not specified the default behavior.
>>>>> 
>>>>> There's no a-priori reason to expect "more words to equal fewer docs", I 
>>>>> can just as easily argue that "more words should return more docs". Which 
>>>>> you expect depends on your mental model.
>>>>> 
>>>>> And providing the default op in your solrconfig.xml request handlers 
>>>>> allows you to implement whatever model your application chooses...
>>>>> 
>>>>> Best,
>>>>> Erick
>>>>> 
>>>>> On Mon, Apr 25, 2016 at 11:32 PM, Bastien Latard - MDPI AG 
>>>>>  wrote:
>>>>> Thank you Shawn, Jan and Georg for your answers.
>>>>> 
>>>>> Yes, it seems that if I simply remove the defaultOperator it works well 
>>>>> for "composed queries" like '(a:x AND b:y) OR c:z'.
>>>>> But I think that the default Operator should/could be the AND.
>>>>> 
>>>>> Because when I add an extra search word, I expect that the results get 
>>>>> more accurate...
>>>>> (It seems to be what google is also doing now)
>>>>>|   |
>>>>> 
>>>>> Otherwise, if you make a search and apply another filter (e.g.: sort by 
>>>>> publication date, facets, ...) , user can get the less relevant item 
>>>>> (only 1 word in 4 matches) in first position only because of its date...
>>>>> 
>>>>> What do you think?
>>>>> 
>>>>> 
>>>>> Kind regards,
>>>>> Bastien
>>>>> 
>>>>> 
>>>>> On 25/04/2016 14:53, Shawn Heisey wrote:

Re: Tutorial not working for me

2016-09-19 Thread Steve Rowe
In the data driven configset, autoguessing text fields as the “strings” field 
type is intended to enable faceting.  The catch-all _text_ field enables search 
on all fields, but this may not be a good alternative to fielded search. 

I’m going to start working on updating the quick start tutorial - nobody has 
updated it since 5.0 AFAICT.

--
Steve
www.lucidworks.com

> On Sep 16, 2016, at 8:34 PM, Chris Hostetter  wrote:
> 
> 
> : I apologize if this is a really stupid question. I followed all
> 
> It's not a stupid question, the tutorial is completely broken -- and for 
> that matter, in my opinion, the data_driven_schema_configs used by that 
> tutorial (and recommended for new users) are largely useless for the same 
> underlying reason...
> 
> https://issues.apache.org/jira/browse/SOLR-9526
> 
> Thank you very much for asking about this - hopefully the folks who 
> understand this more (and don't share my opinion that the entire concept 
> of data_driven schemas is a terrible idea) can chime in and explain WTF 
> is going on here.
> 
> 
> -Hoss
> http://www.lucidworks.com/



Re: Tutorial not working for me

2016-09-19 Thread Steve Rowe
Hi Alex,

Sure - I assume you mean independently from SOLR-9526 and SOLR-6871?

--
Steve
www.lucidworks.com

> On Sep 19, 2016, at 12:40 PM, Alexandre Rafalovitch  
> wrote:
> 
> On 19 September 2016 at 23:37, Steve Rowe  wrote:
>> I’m going to start working on updating the quick start tutorial - nobody has 
>> updated it since 5.0 AFAICT.
> 
> Is that something that's worth discussing in a group/JIRA/etc?
> 
> Regards,
>   Alex.
> 
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/



Re: Tutorial not working for me

2016-09-19 Thread Steve Rowe
For now, I was thinking of making it reflect current reality as much as 
possible, without changing coverage.

--
Steve
www.lucidworks.com

> On Sep 19, 2016, at 1:13 PM, Alexandre Rafalovitch  wrote:
> 
> Whatever works. If JIRA, SOLR-6871 is probably a reasonable place.
> Depends on the scope of "updating" you want to do.
> 
> Regards,
>   Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
> 
> 
> On 20 September 2016 at 00:02, Steve Rowe  wrote:
>> Hi Alex,
>> 
>> Sure - I assume you mean independently from SOLR-9526 and SOLR-6871?
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Sep 19, 2016, at 12:40 PM, Alexandre Rafalovitch  
>>> wrote:
>>> 
>>> On 19 September 2016 at 23:37, Steve Rowe  wrote:
>>>> I’m going to start working on updating the quick start tutorial - nobody 
>>>> has updated it since 5.0 AFAICT.
>>> 
>>> Is that something that's worth discussing in a group/JIRA/etc?
>>> 
>>> Regards,
>>>  Alex.
>>> 
>>> 
>>> Newsletter and resources for Solr beginners and intermediates:
>>> http://www.solr-start.com/
>> 



Re: Problem with Han character in ICUFoldingFilter

2016-10-30 Thread Steve Rowe
Among several other foldings, ICUFoldingFilter performs the Unicode NFC 
transform, which consists of canonical decomposition (NFD) followed by 
canonical composition.  NFD transforms U+FA04 to U+5B85, and canonical 
composition leaves U+5B85 as-is.

U+FA04 is in the “Pronunciation variants from KS X 1001:1998” sub-block - KS X 
1001 is a Korean encoding standard - in the “CJK Compatibility Ideographs” 
block.  I don’t know why these 
variants were included in Unicode, but the NFD transform includes the 
compatibility->canonical transform, so it’s likely many other compatibility 
characters in your data will be affected, not just this one.  If the 
compatibility->canonical transform is problematic, why are you using 
ICUFoldingFilter?
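
You can confirm the NFC behavior described above with ICU4J directly - an 
untested sketch:

import com.ibm.icu.text.Normalizer2;

public class Fa04Demo {
  public static void main(String[] args) {
    // NFC = canonical decomposition followed by canonical composition
    Normalizer2 nfc = Normalizer2.getNFCInstance();
    System.out.println(nfc.normalize("\uFA04"));  // prints 宅 (U+5B85)
  }
}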

If you like some of the foldings included in ICUFoldingFilter but not others, 
check out the “gennorm2” and “gen-utr30-data-files” targets in the Lucene/Solr 
source code at lucene/analysis/icu/build.xml - you could build and use a 
modified binary transform data file - this file is distributed as part of the 
lucene-analyzers-icu jar at org/apache/lucene/analysis/icu/utr30.nrm.
 
--
Steve
www.lucidworks.com

> On Oct 30, 2016, at 10:29 AM, Ahmet Arslan  wrote:
> 
> Hi Eyal,
> 
> ICUFoldingFilter uses http://site.icu-project.org under the hood.
> If you think there is a bug, it is better to ask its mailing list.
> 
> Ahmet
> 
> 
> 
> On Sunday, October 30, 2016 3:41 PM, "eyal.naam...@exlibrisgroup.com" 
>  wrote:
> Hi,
> 
> I was wondering if anyone ran into the following issue, or a similar one:
> In Han script there are two separate characters - 宅 (FA04) and 宅 (5B85).
> It seems that ICUFoldingFilter converts FA04 to 5B85, which results in the 
> wrong character being indexed.
> Does anyone have any idea if and how this can be resolved? Is there an option 
> to add an exception rule to ICUFoldingFilter?
> Thanks,
> Eyal



Re: Issue with SynonymGraphFilterFactory

2017-06-29 Thread Steve Rowe
Hi Diogo,

That sounds like a bug to me.  Would you mind filing a JIRA?

--
Steve
www.lucidworks.com

> On Jun 29, 2017, at 4:46 PM, diogo  wrote:
> 
> I just checked debug=query
> 
> Seems like the SpanNearQuery is getting the slop parameter as 0, no
> matter what comes after the tilde:
> 
> "parsedquery":"SpanNearQuery(spanNear([laudo:mother,
> spanOr([laudo:hipoatenuaca, laudo:hipodens])],* 0*, true))"
> 
> For searching: "mother grandmother"~8 or "mother grandmother"~1000
> 
> synonyms.txt has: 
> mother, grand mother
> 
> When I search for words whose synonyms are not multi-word, MultiPhraseQuery
> is used, instead of SpanNearQuery:
> "MultiPhraseQuery(laudo:\"father (grandfather granddad)\"~10)"
> 
> synonyms.txt has:
> grandfather, granddad
> 
> Is there a way to change the slop in the first case with the Solr API?
> 
> Thanks
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Issue-with-SynonymGraphFilterFactory-tp4343400p4343544.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 6.6 test failure: TestSolrCloudWithKerberosAlt.testBasics

2017-07-20 Thread Steve Rowe
Does it look like this?: 


I see failures like that on my Jenkins once or twice a week.

--
Steve
www.lucidworks.com

> On Jul 20, 2017, at 3:53 PM, Nawab Zada Asad Iqbal  wrote:
> 
> Hi,
> 
> I cloned solr 6.6 branch today and I see this failure consistently.
> 
> TestSolrCloudWithKerberosAlt.testBasics
> 
> 
> I had done some script changes but after seeing this failure I reverted
> them and ran: `ant -Dtestcase=TestSolrCloudWithKerberosAlt clean test` but
> this test still fails with this error:-
> 
>   [junit4]> Throwable #1: java.lang.NoSuchFieldError: id_aes128_CBC
>   [junit4]> at
> __randomizedtesting.SeedInfo.seed([453D16027AC52FD9:78E5B82E422B71A9]:0)
> 
> 
> I see the jenkins build are all clean, so not sure what I am hitting.
> 
> https://builds.apache.org/job/Lucene-Solr-Maven-6.x/
> 
> https://builds.apache.org/job/Solr-Artifacts-6.x/
> 
> Regards
> Nawab



Re: How to remove Scripts and Styles in content of SOLR Indexes[content field] while indexed through URL?

2017-08-10 Thread Steve Rowe
Hi Daniel,

HTMLStripCharFilterFactory in your index analyzer should do the trick.
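
Something like this (untested sketch; "text_html" is just an illustrative 
field type name):

<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- strips tags, scripts, styles and comments before tokenization -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>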

--
Steve
www.lucidworks.com

> On Aug 10, 2017, at 4:13 AM, Daniel von der Helm 
>  wrote:
> 
> Hi,
> if a fetched HTML page (using SimplePostTool: -Ddata=web) contains 

Re: QueryParser changes query by itself [solved]

2017-08-22 Thread Steve Rowe
Hi Bernd,

> On Aug 22, 2017, at 4:31 AM, Bernd Fehling  
> wrote:
> 
> But the QueryBuilder only calls "stream.reset()", it never calls 
> "stream.end()" so that Filters
> in the Analyzer chain can't do any cleanup (like my Filter wanted to do).
> I moved my "cleanup" into reset() which feels like a dirty hack.
> 
> 
> My opinion, in lucene QueryBuilder there should be a "stream.end()" after 
> consuming the stream:
> ...
>   stream.reset();
>   while (stream.incrementToken()) {
>   numTokens++;
>   ...
>   }
>   stream.end();
> ...

The stream here is a CachingTokenFilter wrapping the passed-in TokenStream. On 
first call to cache.incrementToken(), CachingTokenFilter's cache is populated 
by exhausting the wrapped stream and then calling its end() method.
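
For reference, the standard consumer workflow calls end() explicitly once the 
stream is exhausted - an untested sketch ("analyzer" and "text" are assumed to 
exist):

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

try (TokenStream stream = analyzer.tokenStream("field", text)) {
  CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
  stream.reset();
  while (stream.incrementToken()) {
    System.out.println(term.toString());
  }
  stream.end();  // filters get their end-of-stream callback here
}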

--
Steve
www.lucidworks.com

Re: Possible memory leak with VersionBucket objects

2017-09-25 Thread Steve Rowe
Hi Sundeep,

This looks to me like a known issue that was fixed in Solr 7.0.

--
Steve
www.lucidworks.com

> On Sep 25, 2017, at 2:42 PM, Sundeep T  wrote:
> 
> Hello,
> 
> We are running our solr 6.4.2 instance on a single node without zookeeper. 
> So, we are not using solr cloud. We have been ingesting about 50k messages 
> per second into this instance spread over 4 cores. 
> 
> When we looked at the heapdump we see that it has there are around 385 
> million instances of VersionBucket objects taking about 8gb memory. This 
> number seems to grow based on the number of cores into which we are ingesting 
> data into.PFA a screen cap of heap recording.
> 
> Browsing through the jira list we saw a similar issue 
> -https://issues.apache.org/jira/browse/SOLR-9803
> 
> This issue is recently resolved by Erick. But this issue seems be 
> specifically tied to SolrCloud mode and Zookeeper. We are not using any of 
> these.
> 
> So, we are thinking this could be another issue. Any one has ideas on what 
> this could be and if there is a fix for it?
> 
> Thanks
> Sundeep



Re: When will be solr 7.1 released?

2017-09-26 Thread Steve Rowe
Hi Nawab,

Committership is a prerequisite for the Lucene/Solr release manager role.

Some info about the release process is in the project’s ReleaseToDo document. 


--
Steve
www.lucidworks.com

> On Sep 26, 2017, at 11:28 AM, Nawab Zada Asad Iqbal  wrote:
> 
> Where can I learn more about this process? I am not a committer but I am
> wondering if I know enough to do it.
> 
> 
> Thanks
> Nawab
> 
> 
> On Mon, Sep 25, 2017 at 9:23 PM, Erick Erickson 
> wrote:
> 
>> In a word "no". Basically whenever a committer feels like there are
>> enough changes to warrant spinning a new version, they volunteer.
>> Nobody has stepped up to do that yet, although I expect it to be in
>> the next 2-3 months, but that's only a guess.
>> 
>> Best,
>> Erick
>> 
>> On Mon, Sep 25, 2017 at 5:21 PM, Nawab Zada Asad Iqbal 
>> wrote:
>>> Hi,
>>> 
>>> How are the release dates decided for new versions, are they known in
>>> advance?
>>> 
>>> Thanks
>>> Nawab
>> 



Re: When will be solr 7.1 released?

2017-09-26 Thread Steve Rowe
Hi Nawab,

> On Sep 26, 2017, at 8:04 PM, Nawab Zada Asad Iqbal  wrote:
> 
> Thanks ,  another question(s):
> 
> why is this released marked 'unreleased' ?
> https://issues.apache.org/jira/projects/SOLR/versions/12335718

The 7.0 release manager hasn’t gotten around to marking it as released; note 
that there is an item to do this in the ReleaseToDo. 


> how is it different from :
> https://issues.apache.org/jira/projects/SOLR/versions/12341601 (i guess
> this is duplicate and will not be used)

Yes, the 7.0.0 version is a duplicate of the 7.0 version.  I switched issues 
with 7.0.0 to 7.0 and removed the 7.0.0 version in JIRA.

> I was expecting to see https://issues.apache.org/jira/browse/SOLR-11297 in
> https://issues.apache.org/jira/projects/SOLR/versions/12341052 but couldn't
> locate it.

I see SOLR-11297 in the JIRA-generated “release note”. 
  (Note that Lucene and Solr maintain their own release notes: solr/CHANGES.txt 
and lucene/CHANGES.txt.)

--
Steve
www.lucidworks.com



Re: Maven build error (Was: Jenkins setup for continuous build)

2017-10-04 Thread Steve Rowe
Hi Nawab,

> On Oct 4, 2017, at 7:39 PM, Nawab Zada Asad Iqbal  wrote:
> 
> I am hitting following error with maven build:
> Is that expected?

No.  What commands did you use?

> Can someone share me the details about how
> https://builds.apache.org/job/Lucene-Solr-Maven-master is configured.

The Jenkins job runs the equivalent of the following:

ant jenkins-maven-nightly -Dm2.repository.id=apache.snapshots.https
-Dm2.repository.url=https://repository.apache.org/content/repositories/snapshots
-DskipTests=true

This in turn runs the equivalent of the following:

ant get-maven-poms
mvn -f maven-build/pom.xml -fae  -Dm2.repository.id=apache.snapshots.https
-Dm2.repository.url=https://repository.apache.org/content/repositories/snapshots
-DskipTests=true install

Note that tests are not run, and that artifacts are published to the Apache 
sandbox repository.

--
Steve
www.lucidworks.com

Re: Maven build error (Was: Jenkins setup for continuous build)

2017-10-04 Thread Steve Rowe
When I run those commands (on Debian Linux 8.9, with Maven v3.0.5 and Oracle 
JDK 1.8.0.77), I get:

-
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 1:19.741s
-

Are you on the master branch?  Have you modified the source? 

--
Steve
www.lucidworks.com

> On Oct 4, 2017, at 8:25 PM, Nawab Zada Asad Iqbal  wrote:
> 
> Hi Steve,
> 
> I did this:
> 
> ant get-maven-poms
>  cd maven-build/
>  mvn -DskipTests install
> 
> On Wed, Oct 4, 2017 at 4:56 PM, Steve Rowe  wrote:
> 
>> Hi Nawab,
>> 
>>> On Oct 4, 2017, at 7:39 PM, Nawab Zada Asad Iqbal 
>> wrote:
>>> 
>>> I am hitting following error with maven build:
>>> Is that expected?
>> 
>> No.  What commands did you use?
>> 
>>> Can someone share me the details about how
>>> https://builds.apache.org/job/Lucene-Solr-Maven-master is configured.
>> 
>> The Jenkins job runs the equivalent of the following:
>> 
>> ant jenkins-maven-nightly -Dm2.repository.id=apache.snapshots.https
>> -Dm2.repository.url=https://repository.apache.org/content/
>> repositories/snapshots
>> -DskipTests=true
>> 
>> This in turn runs the equivalent of the following:
>> 
>> ant get-maven-poms
>> mvn -f maven-build/pom.xml -fae  -Dm2.repository.id=apache.snapshots.https
>> -Dm2.repository.url=https://repository.apache.org/content/
>> repositories/snapshots
>> -DskipTests=true install
>> 
>> Note that tests are not run, and that artifacts are published to the
>> Apache sandbox repository.
>> 
>> --
>> Steve
>> www.lucidworks.com



[ANNOUNCE] Apache Solr 7.0.1 released

2017-10-06 Thread Steve Rowe
6 October 2017, Apache Solr™ 7.0.1 available 

Solr is the popular, blazing fast, open source NoSQL search platform from the 
Apache Lucene project. Its major features include powerful full-text search, 
hit highlighting, faceted search and analytics, rich document parsing, 
geospatial search, extensive REST APIs as well as parallel SQL. Solr is 
enterprise grade, secure and highly scalable, providing fault tolerant 
distributed search and indexing, and powers the search and navigation 
features of many of the world's largest internet sites. 

This release includes 2 bug fixes since the 7.0.0 release: 

* Solr 7.0 cannot read indexes from 6.x versions. 

* Message "Lock held by this virtual machine" during startup. 
Solr is trying to start some cores twice. 

Furthermore, this release includes Apache Lucene 7.0.1 which includes 1 bug 
fix since the 7.0.0 release. 

The release is available for immediate download at: 

http://www.apache.org/dyn/closer.lua/lucene/solr/7.0.1 

Please read CHANGES.txt for a detailed list of changes: 

https://lucene.apache.org/solr/7_0_1/changes/Changes.html 

Please report any feedback to the mailing lists 
(http://lucene.apache.org/solr/discussion.html) 

Note: The Apache Software Foundation uses an extensive mirroring 
network for distributing releases. It is possible that the mirror you 
are using may not have replicated the release yet. If that is the 
case, please try another mirror. This also goes for Maven access.

[ANNOUNCE] Apache Solr 5.5.5 released

2017-10-24 Thread Steve Rowe
24 October 2017, Apache Solr™ 5.5.5 available 

The Lucene PMC is pleased to announce the release of Apache Solr 5.5.5. 

Solr is the popular, blazing fast, open source NoSQL search platform from the 
Apache Lucene project. Its major features include powerful full-text search, 
hit highlighting, faceted search and analytics, rich document parsing, 
geospatial search, extensive REST APIs as well as parallel SQL. Solr is 
enterprise grade, secure and highly scalable, providing fault tolerant 
distributed search and indexing, and powers the search and navigation features 
of many of the world's largest internet sites. 

This release contains one bugfix. 

This release includes one critical and one important security fix. Details: 

* Fix for a 0-day exploit (CVE-2017-12629), details: https://s.apache.org/FJDl. 
RunExecutableListener has been disabled by default (can be enabled by 
-Dsolr.enableRunExecutableListener=true) and resolving external entities in the 
XML query parser (defType=xmlparser or {!xmlparser ... }) is disabled by 
default. 

* Fix for CVE-2017-7660: Security Vulnerability in secure inter-node 
communication 
in Apache Solr, details: https://s.apache.org/APTY 

Furthermore, this release includes Apache Lucene 5.5.5 which includes one 
security 
fix since the 5.5.4 release. 

The release is available for immediate download at: 

http://www.apache.org/dyn/closer.lua/lucene/solr/5.5.5 

Please read CHANGES.txt for a detailed list of changes: 

https://lucene.apache.org/solr/5_5_5/changes/Changes.html 

Please report any feedback to the mailing lists 
(http://lucene.apache.org/solr/discussion.html) 

Note: The Apache Software Foundation uses an extensive mirroring 
network for distributing releases. It is possible that the mirror you 
are using may not have replicated the release yet. If that is the 
case, please try another mirror. This also goes for Maven access.

Re: [ANNOUNCE] Apache Solr 5.5.5 released

2017-10-24 Thread Steve Rowe
Yes.  

--
Steve
www.lucidworks.com

> On Oct 24, 2017, at 12:25 PM, Moenieb Davids  wrote:
> 
> Solr 5.5.5?
> 
> On 24 Oct 2017 17:34, "Steve Rowe"  wrote:
> 
>> 24 October 2017, Apache Solr™ 5.5.5 available
>> 
>> The Lucene PMC is pleased to announce the release of Apache Solr 5.5.5.
>> 
>> Solr is the popular, blazing fast, open source NoSQL search platform from
>> the
>> Apache Lucene project. Its major features include powerful full-text
>> search,
>> hit highlighting, faceted search and analytics, rich document parsing,
>> geospatial search, extensive REST APIs as well as parallel SQL. Solr is
>> enterprise grade, secure and highly scalable, providing fault tolerant
>> distributed search and indexing, and powers the search and navigation
>> features
>> of many of the world's largest internet sites.
>> 
>> This release contains one bugfix.
>> 
>> This release includes one critical and one important security fix. Details:
>> 
>> * Fix for a 0-day exploit (CVE-2017-12629), details:
>> https://s.apache.org/FJDl.
>> RunExecutableListener has been disabled by default (can be enabled by
>> -Dsolr.enableRunExecutableListener=true) and resolving external entities
>> in the
>> XML query parser (defType=xmlparser or {!xmlparser ... }) is disabled by
>> default.
>> 
>> * Fix for CVE-2017-7660: Security Vulnerability in secure inter-node
>> communication
>> in Apache Solr, details: https://s.apache.org/APTY
>> 
>> Furthermore, this release includes Apache Lucene 5.5.5 which includes one
>> security
>> fix since the 5.5.4 release.
>> 
>> The release is available for immediate download at:
>> 
>> http://www.apache.org/dyn/closer.lua/lucene/solr/5.5.5
>> 
>> Please read CHANGES.txt for a detailed list of changes:
>> 
>> https://lucene.apache.org/solr/5_5_5/changes/Changes.html
>> 
>> Please report any feedback to the mailing lists
>> (http://lucene.apache.org/solr/discussion.html)
>> 
>> Note: The Apache Software Foundation uses an extensive mirroring
>> network for distributing releases. It is possible that the mirror you
>> are using may not have replicated the release yet. If that is the
>> case, please try another mirror. This also goes for Maven access.



Re: mvn test failing

2017-10-30 Thread Steve Rowe
Hi Tariq,

It’s difficult to tell what happened without seeing the logs from the failed 
test(s).  (The commands you issued look fine.)

--
Steve
www.lucidworks.com

> On Oct 29, 2017, at 1:48 AM, Tarique Anwer  wrote:
> 
> hi,
> 
> I am new to Solr.
> I am trying to build Solr from source code using Maven.
> So I performed the following steps:
> 
> 1. Download the source code zip from https://github.com/apache/lucene-solr
> 2. unzip & run from top level dir:
>  $ ant get-maven-poms
> $ cd maven-build
> 
> 3. then build:
>  $ mvn -DskipTests install
> 
> Which shows that build is successful.
> 
> So I tried to run the tests afterwords:
>  $ mvn test
> 
> But tests are failing:
> 
> [INFO] Apache Solr Analysis Extras  FAILURE [02:48
> min]
> [INFO] Apache Solr Core tests . SKIPPED
> [INFO] Apache Solr Core aggregator POM  SKIPPED
> [INFO] Apache Solr Solrj tests  SKIPPED
> [INFO] Apache Solr Solrj aggregator POM ... SKIPPED
> [INFO] Apache Solr Analytics Package .. SKIPPED
> [INFO] Apache Solr Clustering . SKIPPED
> [INFO] Apache Solr DataImportHandler .. SKIPPED
> [INFO] Apache Solr DataImportHandler Extras ... SKIPPED
> [INFO] Apache Solr Content Extraction Library . SKIPPED
> [INFO] Apache Solr Language Identifier  SKIPPED
> [INFO] Apache Solr Learning to Rank Package ... SKIPPED
> [INFO] Apache Solr UIMA integration ... SKIPPED
> [INFO] Apache Solr Velocity ... SKIPPED
> [INFO] Apache Solr Contrib aggregator POM . SKIPPED
> [INFO]
> 
> [INFO] BUILD FAILURE
> [INFO]
> 
> [INFO] Total time: 17:45 min
> [INFO] Finished at: 2017-10-29T05:46:43Z
> [INFO] Final Memory: 194M/1999M
> [INFO]
> 
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on
> project solr-analysis-extras: There are test failures.
> [ERROR]
> [ERROR] Please refer to
> /home/ec2-user/tariq/lucene-solr-master/maven-build/solr/contrib/analysis-extras/target/surefire-reports
> for the individual test results.
> [ERROR] -> [Help 1]
> 
> 
> 
> Did I do something wrong? Or did I miss some steps before the build?
> Any help is highly appreciated.
> 
> 
> ​With Regards,​
> 
> Tariq



Re: SynonymGraphFilterFactory with edismax

2017-11-02 Thread Steve Rowe
Hi Amar,

What version of Solr are you using?  This looks like a bug that was fixed in 
Solr 6.6.1.

--
Steve
www.lucidworks.com

> On Nov 2, 2017, at 8:31 AM, Amar Raja  
> wrote:
> 
> Hello,
> 
> I have the following field definition:
> 
> 
>  
>
> ignoreCase="true" expand="true"/>
> words="lang/stopwords_en.txt" />
>
>
> protected="protwords.txt"/>
>
>  
> 
> 
> And the following two synonym definitions:
> 
> kids => boys,girls
> metallic => rose gold,metallic
> 
> The intent being a user searching for "kids" should get girls or boys
> results, but searching for "boys" will not bring back girls results.
> Similarly searching for "metallic" should bring back results for either
> "metallic" or "rose gold", but the search for "rose gold" should not bring
> back "metallic".
> 
> Another property I have set is q.op=AND. I.e. "boys tops" should return
> where only both terms exist.
> 
> The first synonym works well, producing the following dismax query:
> 
> (+(+DisjunctionMaxQuery((Synonym(web_name:boi
> web_name:girl))~1.0)))/no_coord
> 
> However, for the second I get this:
> 
> (+(+DisjunctionMaxQuery(+web_name:rose +web_name:gold)
> web_name:metal)~2))~1.0)))/no_coord
> 
> But for any terms where any of the terms in the RHS have multiple terms, it
> seems to want to match both synonyms, so in this case only documents with
> both "metallic" and "rose gold" will match.
> 
> Any ideas where I am going wrong?



Re: Solr mm is field Level in case sow is false

2017-11-28 Thread Steve Rowe
Hi Aman,

From the last bullet in the “Caveats and remaining issues” section of my 
query-time multi-word synonyms blog post, in part:

> sow=false changes the queries edismax produces over multiple fields when
> any of the fields’ query-time analysis differs from the other fields’ [...]
> This can change results in general, but quite significantly when combined
> with the mm (min-should-match) request parameter: since min-should-match
> applies per field instead of per term, missing terms in one field’s analysis
> won’t disqualify docs from matching.

One effective way of addressing this issue is to make all queried fields use 
the same analysis, e.g. by copy-fielding the subset of fields that are 
different into ones that are the same, and then querying against the target 
fields instead.
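
Something like this in the schema (untested sketch; "text_common" is an 
illustrative field type holding the shared analysis chain):

<field name="nameSearch_common"  type="text_common" indexed="true" stored="false"/>
<field name="brandSearch_common" type="text_common" indexed="true" stored="false"/>
<copyField source="nameSearch"  dest="nameSearch_common"/>
<copyField source="brandSearch" dest="brandSearch_common"/>

and then query with qf=nameSearch_common^7 brandSearch_common.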

--
Steve
www.lucidworks.com

> On Nov 28, 2017, at 5:25 AM, Aman Deep singh  
> wrote:
> 
> Hi,
> When sow is set to false, the Solr query is generated a little differently as 
> compared to sow=true.
> 
> Solr version -6.6.1
> 
> User query -Asus ZenFone Go ZB5 Smartphone
> mm is set to 100%
> qf=nameSearch^7 brandSearch
> 
> field definition
> 
> 1. nameSearch—
>  autoGeneratePhraseQueries="false" positionIncrementGap="100">
>
>
> replacement="and"/>
>
> generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" 
> preserveOriginal="1" catenateAll="1" catenateWords="1"/>
>
>
>
>
>
> replacement="and"/>
>
> synonyms=“synonyms.txt"/>
> generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" 
> splitOnNumerics="1" preserveOriginal="0" catenateAll="0" catenateWords="0"/>
>
>
>
> 
> 
> 2. brandSearch
>  autoGeneratePhraseQueries="true" positionIncrementGap="100">
>
>
>
>
>
>
>
>
> 
> 
> 
> with sow=false
> "parsedquery":"(+DisjunctionMaxQuerybrandSearch:asus brandSearch:zenfone 
> brandSearch:go brandSearch:zb5 brandSearch:smartphone)~5) | ((nameSearch:asus 
> nameSearch:zen nameSearch:fone nameSearch:go nameSearch:zb nameSearch:5 
> nameSearch:smartphone)~7)^7.0)))/no_coord",
> 
> with sow=true
> "parsedquery":"(+(DisjunctionMaxQuery((brandSearch:asus | 
> (nameSearch:asus)^7.0)) DisjunctionMaxQuery((brandSearch:zenfone | 
> ((nameSearch:zen nameSearch:fone)~2)^7.0)) 
> DisjunctionMaxQuery((brandSearch:go | (nameSearch:go)^7.0)) 
> DisjunctionMaxQuery((brandSearch:zb5 | ((nameSearch:zb nameSearch:5)~2)^7.0)) 
> DisjunctionMaxQuery((brandSearch:smartphone | 
> (nameSearch:smartphone)^7.0)))~5)/no_coord",
> 
> 
> 
> If you see the difference in the parsed queries: in the sow=false case mm is 
> working at the field level, while in the sow=true case mm is working across 
> the fields.
> 
> We need to use sow=false as it is the only way to use multiword synonyms.
> Any idea why it is behaving in this manner, and is there any way to fix it 
> so that mm will work across the fields in qf?
> 
> Thanks,
> Aman Deep Singh



Re: Does the schema api support xml requests?

2017-11-28 Thread Steve Rowe
Hi,

The schema api does not support XML requests, and there are currently no plans 
I’m aware of to add support. 

--
Steve
www.lucidworks.com

> On Nov 28, 2017, at 8:23 AM, startrekfan  wrote:
> 
> Hey
> 
> Does the schema API support XML requests? I tried to post an XML-formatted
> "add-field" but got a parser exception. Is there no XML support? Is it
> planned to add xml support within the next few months?
> 
> Thanks



Re: Solr mm is field Level in case sow is false

2017-11-28 Thread Steve Rowe
Hi Aman, see my responses inline below.

> On Nov 28, 2017, at 9:11 PM, Aman Deep Singh  
> wrote:
> 
> Thanks steve,
> I got it, but my problem is you can't make every field use the same analysis.

I don’t understand: why can’t you use copy fields with all the same analysis?

> Is there any chance that sow and mm will work properly? I don't see this in
> the future pipeline either, as there is no JIRA related to this.

I wrote up a description of an idea I had about addressing it in a reply to 
Doug Turnbull's thread on this subject, linked from my blog:

> In implementing the SOLR-9185 changes, I considered a compromise approach to 
> the term-centric
> / field-centric axis you describe in the case of differing field analysis 
> pipelines: finding
> common source-text-offset bounded slices in all per-field queries, and then 
> producing dismax
> queries over these slices; this is a generalization of what happens in the 
> sow=true case,
> where slice points are pre-determined by whitespace.  However, it looked 
> really complicated
> to maintain source text offsets with queries (if you’re interested, you can 
> see an example
> of the kind of thing I’m talking about in my initial patch on that issue, 
> which I ultimately 
> decided against committing), so I decided to go with per-field dismax when
> structural differences are encountered in the per-field queries.  While I 
> won’t be doing
> any work on this short term, I still think the above-described approach could 
> improve the
> situation in the sow=false/differing-field-analysis case.  Patches welcome!

--
Steve
www.lucidworks.com



Re: No Live SolrServer available to handle this request

2017-12-07 Thread Steve Rowe
Hi Selvam,

This sounds like it may be a bug - could you please create a JIRA?

Thanks,

--
Steve
www.lucidworks.com

> On Dec 6, 2017, at 9:56 PM, Selvam Raman  wrote:
> 
> Yes, you are right. We are using a preanalyzed field and that is causing the
> problem.
> The actual problem is preanalyzed with the highlight option. If I disable the
> highlight option it works fine. Please let me know if there is a workaround
> to solve it.
> 
> On Wed, Dec 6, 2017 at 10:19 PM, Erick Erickson 
> wrote:
> 
>> This looks like you're using "pre analyzed fields" which have a very
>> specific format. PreAnalyzedFields are actually pretty rarely used,
>> did you enable them by mistake?
>> 
>> On Tue, Dec 5, 2017 at 11:37 PM, Selvam Raman  wrote:
>>> When i look at the solr logs i find the below exception
>>> 
>>> Caused by: java.io.IOException: Invalid JSON type java.lang.String,
>>> expected Map
>>> at
>>> org.apache.solr.schema.JsonPreAnalyzedParser.parse(
>> JsonPreAnalyzedParser.java:86)
>>> at
>>> org.apache.solr.schema.PreAnalyzedField$PreAnalyzedTokenizer.
>> decodeInput(PreAnalyzedField.java:345)
>>> at
>>> org.apache.solr.schema.PreAnalyzedField$PreAnalyzedTokenizer.access$
>> 000(PreAnalyzedField.java:280)
>>> at
>>> org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer$1.
>> setReader(PreAnalyzedField.java:375)
>>> at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:202)
>>> at
>>> org.apache.lucene.search.uhighlight.AnalysisOffsetStrategy.tokenStream(
>> AnalysisOffsetStrategy.java:58)
>>> at
>>> org.apache.lucene.search.uhighlight.MemoryIndexOffsetStrategy.
>> getOffsetsEnums(MemoryIndexOffsetStrategy.java:106)
>>> ... 37 more
>>> 
>>> 
>>> 
>>> I am setting up lot of fields (fq, score, highlight,etc) then put it
>> into
>>> solrquery.
>>> 
>>> On Wed, Dec 6, 2017 at 11:22 AM, Selvam Raman  wrote:
>>> 
 When i am firing query it returns the doc as expected. (Example:
 q=synthesis)
 
 I am facing the problem when i include wildcard character in the query.
 (Example: q=synthesi*)
 
 
 org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
 Error from server at http://localhost:8983/solr/Metadata2:
 org.apache.solr.client.solrj.SolrServerException:
 
 No live SolrServers available to handle this request:[/solr/Metadata2_
 shard1_replica1,
  solr/Metadata2_shard2_replica2,
  solr/Metadata2_shard1_replica2]
 
 --
 Selvam Raman
 "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
 
>>> 
>>> 
>>> 
>>> --
>>> Selvam Raman
>>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>> 
> 
> 
> 
> -- 
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"



Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter

2017-12-20 Thread Steve Rowe
Hi Markus,

My suggestion: rewrite your synonyms to include the triggering word in the 
expanded synonyms list.  That way you won’t need KeywordRepeat/RemoveDuplicates 
filters, and mm=100% will work as you expect.

I don’t think this situation is a bug, since mm applies to the built query, not 
to the original query terms.

--
Steve
www.lucidworks.com

> On Dec 20, 2017, at 5:02 PM, Markus Jelsma  wrote:
> 
> Hello,
> 
> Yes of course, index time synonyms lessens the query time complexity and will 
> solve the mm problem. It also screws IDF and the flexibility of adding 
> synonyms on demand. The first we do not want, the second is impossible for us 
> (very large main search index).
> 
> We are looking for a solution with mm that takes KeywordRepeat, stemming and 
> synonym expansion into consideration. To me the current working of mm in this 
> case is a bug, i input one term so treat it as one term in mm, regardless of 
> expanded query terms.
> 
> Any query time ideas to share? I am not well versed with the actual code 
> dealing with this specific subject, the code doesn't like me. I am fine if 
> someone points me to the code that tells mm about the number of original 
> input terms, and what to do. If someone does, please also explain why the 
> change i want to make is a bad one, what to be aware of or what to beware of, 
> or what to take into account.
> 
> Also, am i the only one who regards this behaviour as a bug, or more subtle, 
> a weird unexpected behaviour?
> 
> Many many thanks!
> Markus
> 
> -Original message-
>> From:Shawn Heisey 
>> Sent: Wednesday 20th December 2017 22:39
>> To: solr-user@lucene.apache.org
>> Subject: Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter
>> 
>> On 12/19/2017 4:38 AM, Markus Jelsma wrote:
>>> I have an interesting issue with mm and SynonymQuery and 
>>> KeywordRepeatFilter. We do query time synonym expansion and use 
>>> KeywordRepeat for not only finding stemmed tokens. Our synonyms are already 
>>> preprocessed and contain only stemmed tokens. Synonym file contains: 
>>> traject,verbind
>>> 
>>> So, any non-root stem that ends up in a synonym is actually a search for 
>>> three terms: +DisjunctionMaxQuery(((title_nl:trajecten 
>>> Synonym(title_nl:traject title_nl:verbind
>>> 
>>> But, our default mm requires that two terms must match if the input query 
>>> consists of two terms: 2<-1 5<-2 6<90%
>>> 
>>> So, a simple query looking for a plural (trajecten) will not match a 
>>> document where the title contains only its singular form: q=trajecten will 
>>> not match document with title_nl:"een traject"
>> 
>> I would think that doing synonym expansion at index time would remove
>> any possible confusion about the number of terms at query time.  Queries
>> that involve synonyms will be slightly less complex, but the index would
>> be larger, so it's difficult to say whether those kinds of queries would
>> be any faster or not.
>> 
>> There is one clear disadvantage to index-time synonym expansion: If you
>> change your synonyms, you have to reindex.
>> 
>> Thanks,
>> Shawn
>> 
>> 



Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter

2017-12-21 Thread Steve Rowe
Markus,

I’m confused about exactly what operations you’re performing - could you 
provide your field type?

In particular, I don’t understand why you can’t just rewrite the synonyms file 
entry

  word1 => word2

to:

  word1 => word1, word2

(Clearly I’m missing something about how stemming is involved.)

--
Steve
www.lucidworks.com

> On Dec 21, 2017, at 9:28 AM, Markus Jelsma  wrote:
> 
> Hello Steve,
> 
> Well, that is an interesting approach to the topic indeed. But i do not think 
> it is possible to obtain a list of all inflected forms for all words that 
> also have roots in some synonym file, the stemmers are not reversible. 
> 
> Any other ideas?
> 
> Thanks,
> Markus
> 
> -Original message-
>> From:Steve Rowe 
>> Sent: Thursday 21st December 2017 0:10
>> To: solr-user@lucene.apache.org
>> Subject: Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter
>> 
>> Hi Markus,
>> 
>> My suggestion: rewrite your synonyms to include the triggering word in the 
>> expanded synonyms list.  That way you won’t need 
>> KeywordRepeat/RemoveDuplicates filters, and mm=100% will work as you expect.
>> 
>> I don’t think this situation is a bug, since mm applies to the built query, 
>> not to the original query terms.
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Dec 20, 2017, at 5:02 PM, Markus Jelsma  
>>> wrote:
>>> 
>>> Hello,
>>> 
>>> Yes of course, index time synonyms lessens the query time complexity and 
>>> will solve the mm problem. It also screws IDF and the flexibility of adding 
>>> synonyms on demand. The first we do not want, the second is impossible for 
>>> us (very large main search index).
>>> 
>>> We are looking for a solution with mm that takes KeywordRepeat, stemming 
>>> and synonym expansion into consideration. To me the current working of mm 
>>> in this case is a bug, i input one term so treat it as one term in mm, 
>>> regardless of expanded query terms.
>>> 
>>> Any query time ideas to share? I am not well versed with the actual code 
>>> dealing with this specific subject, the code doesn't like me. I am fine if 
>>> someone points me to the code that tells mm about the number of original 
>>> input terms, and what to do. If someone does, please also explain why the 
>>> change i want to make is a bad one, what to be aware of or what to beware 
>>> of, or what to take into account.
>>> 
>>> Also, am i the only one who regards this behaviour as a bug, or more 
>>> subtle, a weird unexpected behaviour?
>>> 
>>> Many many thanks!
>>> Markus
>>> 
>>> -Original message-
 From:Shawn Heisey 
 Sent: Wednesday 20th December 2017 22:39
 To: solr-user@lucene.apache.org
 Subject: Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter
 
 On 12/19/2017 4:38 AM, Markus Jelsma wrote:
> I have an interesting issue with mm and SynonymQuery and 
> KeywordRepeatFilter. We do query time synonym expansion and use 
> KeywordRepeat for not only finding stemmed tokens. Our synonyms are 
> already preprocessed and contain only stemmed tokens. Synonym file 
> contains: traject,verbind
> 
> So, any non-root stem that ends up in a synonym is actually a search for 
> three terms: +DisjunctionMaxQuery(((title_nl:trajecten 
> Synonym(title_nl:traject title_nl:verbind
> 
> But, our default mm requires that two terms must match if the input query 
> consists of two terms: 2<-1 5<-2 6<90%
> 
> So, a simple query looking for a plural (trajecten) will not match a 
> document where the title contains only its singular form: q=trajecten 
> will not match document with title_nl:"een traject"
 
 I would think that doing synonym expansion at index time would remove
 any possible confusion about the number of terms at query time.  Queries
 that involve synonyms will be slightly less complex, but the index would
 be larger, so it's difficult to say whether those kinds of queries would
 be any faster or not.
 
 There is one clear disadvantage to index-time synonym expansion: If you
 change your synonyms, you have to reindex.
 
 Thanks,
 Shawn
 
 
>> 
>> 



Re: Solr Analyzer Language

2016-11-27 Thread Steve Rowe
Hi Chien,

By “unsigned” I think you mean without diacritics, for example ‘D’ instead of 
‘Đ’.

I think you can get what you want by including ICUFoldingFilterFactory in your 
analyzer.
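
Something like this (untested sketch; "text_vi_folded" is just an illustrative 
name, and note that ICUFoldingFilterFactory lives in the analysis-extras 
contrib):

<fieldType name="text_vi_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- folds diacritics and case, so unsigned input matches signed text -->
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>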

--
Steve
www.lucidworks.com

> On Nov 27, 2016, at 11:24 AM, Chien Nguyen  wrote:
> 
> Hi, everyone. 
> Now, I want to search with input that is unsigned Vietnamese. How can I get
> the same results as with signed Vietnamese input? I hope to get some help. 
> Thank you so much. 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Analyzer-Language-tp4307585.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to use the StandardTokenizer with currency

2016-11-30 Thread Steve Rowe
Hi Vinay,

You should be able to use a char filter to convert “$” characters into 
something that will survive tokenization, and then a token filter to convert it 
back.

Something like this (untested):

  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="\$" replacement="__dollar__"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.PatternReplaceFilterFactory"
          pattern="__dollar__" replacement="$" replace="all"/>
> http://stackoverflow.com/questions/40877567/using-standardtokenizerfactory-with-currency
> 
> I'd like to maintain other aspects of the StandardTokenizer functionality
> but I'm wondering if to do what I want, the task boils down to be able to
> instruct the StandardTokenizer not to discard the $ symbol ? Or is there
> another way? I'm hoping that this is possible with configuration, rather
> than code changes.
> 
> Thanks



Re: Solr Doc Site Down?

2016-12-01 Thread Steve Rowe
Yup, same for me - see 

--
Steve
www.lucidworks.com

> On Dec 1, 2016, at 11:11 AM, Matt Kuiper  wrote:
> 
> FYI - This morning I am no longer able to access - 
> https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
> 
> Matt



Re: How to use the StandardTokenizer with currency

2016-12-06 Thread Steve Rowe
Cool, thanks for letting us know (and sorry about the typo!)

--
Steve
www.lucidworks.com

> On Dec 6, 2016, at 4:15 PM, Vinay B,  wrote:
> 
> Yes, that works (apart from the typo in PatternReplaceCharFilterFactory)
> 
> Here is my config
> 
> 
>  positionIncrementGap="100" autoGeneratePhraseQueries="true">
>  
> mapping="mapping.txt"/>
> replacement="xxdollarxx"/>
>
> replacement="\$" replace="all"/>
> words="stopwords.txt" enablePositionIncrements="true"/>
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt" />
>
> 
>  
> mapping="mapping.txt"/>
> replacement="xxdollarxx"/>
>
> replacement="\$" replace="all"/>
> ignoreCase="true" expand="true"/>
> words="stopwords.txt" enablePositionIncrements="true"/>
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"  types="word-delim-types.txt" />
>
>  
> 
> 
> On Wed, Nov 30, 2016 at 2:08 PM, Steve Rowe  wrote:
> 
>> Hi Vinay,
>> 
>> You should be able to use a char filter to convert “$” characters into
>> something that will survive tokenization, and then a token filter to
>> convert it back.
>> 
>> Something like this (untested):
>> 
>>  
>>>pattern=“\$”
>>replacement=“__dollar__”/>
>>
>>http://stackoverflow.com/questions/40877567/using-
>> standardtokenizerfactory-with-currency
>>> 
>>> I'd like to maintain other aspects of the StandardTokenizer functionality
>>> but I'm wondering if to do what I want, the task boils down to be able to
>>> instruct the StandardTokenizer not to discard the $ symbol ? Or is there
>>> another way? I'm hoping that this is possible with configuration, rather
>>> than code changes.
>>> 
>>> Thanks
>> 
>> 



Re: Arabic words search in solr

2017-01-29 Thread Steve Rowe
Hi Mohan,

The analyzer in your text_ar field type looks like an expanded version of the 
one suggested in the Solr Reference Guide[1].

Can you give an example of a query and the indexed text you expect to match but 
doesn't?

ArabicNormalizationFilterFactory, which uses Lucene’s ArabicNormalizer[2] 
should convert alefs with hamza to plain alef, among several other 
normalizations.

The Light 10 stemming algorithm implemented by ArabicNormalizer and 
ArabicStemmer[3] is described here: 
.

[1] Solr Ref Guide: Language Analysis: Arabic 

[2] ArabicNormalizer javadocs 

[3] ArabicStemmer javadocs 


--
Steve
www.lucidworks.com

> On Jan 29, 2017, at 2:12 PM, mohan sundaram  wrote:
> 
> Hi,
> 
> In Solr search I want to search for product names using Arabic letters.
> While searching, Arabic users can find it a little difficult to search for some
> product names, because some characters need to be typed in a specific form while searching.
> 
> Ex: إ أ آ
> 
> 
> The characters mentioned above require a Shift-key combination to type.
> Usually Arabic users will just type the “ ا “ character and expect to get the
> combined words below.
> 
> Ex: إبرا
> 
> 
> In my solr schema.xml I defined product arabic name field as below
> 
> <field name="bizNameAr" type="text_ar" indexed="true" stored="true"/>
> 
> <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" />
>     <filter class="solr.ArabicNormalizationFilterFactory"/>
>     <filter class="solr.ArabicStemFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> What changes do I have to make in schema.xml? Please help me with this.
> 
> 
> 
> --
> Regards,
> Mohan.N
> 096896429683



Re: Arabic words search in solr

2017-01-31 Thread Steve Rowe
Mohan,

I downloaded and started Solr 4.9.0 and entered your example indexed and 
queried words into the Admin UI’s Analysis pane using the text_ar field type.  
You can see the results here: 
<http://sarowe.net:8080/solr-4.9.0.admin.ui.text_ar.analysis.png>.

Each of the indexed words and the query word are analyzed to the same string.  
They should match and return docs containing them as hits for the query word.

So, what exactly is the problem you are having?  What specifically doesn’t work?

FYI, in general you should be using the most recent release of Solr (6.4.0 
right now) unless there are reasons why you can't.  It’s the most 
stable/performant/supported version.
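
If you'd rather script this kind of check than use the Admin UI, the field
analysis request handler (registered as /analysis/field in the example
solrconfig.xml) returns the same information; the core name here is illustrative:

  http://localhost:8983/solr/collection1/analysis/field?analysis.fieldtype=text_ar&analysis.fieldvalue=TEXT_TO_INDEX&analysis.query=QUERY_TEXT&wt=json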

--
Steve
www.lucidworks.com

> On Jan 31, 2017, at 1:19 AM, mohan sundaram  wrote:
> 
> Hi,
> 
> I went through the Solr reference document which you shared in the link.
> The reference document you shared points to Solr version 6.4.0.
> 
> The implemented Solr version in my project is 4.9.0.
> 
> 
> As I mentioned earlier, in my Solr schema.xml I defined the product Arabic
> name field as below:
> 
> /*--*/
> 
> <field name="bizNameAr" type="text_ar" indexed="true" stored="true"/>
> 
> <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" />
>     <filter class="solr.ArabicNormalizationFilterFactory"/>
>     <filter class="solr.ArabicStemFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> 
> 
> /*--*/
> 
> 
> 
> I am indexing the Arabic content using “text_ar” field type.
> 
> 
> 
> 
> *Characters*
> 
> *ا*
> 
> *أ*
> 
> *إ*
> 
> *آ*
> 
> A Shift-key combination is required to type the characters above.
> 
> Table 1
> 
> 
> These are examples of the characters where I’m facing the search
> difficulty.
> 
> 
> 
> 
> *Example Indexed words*
> 
> *ابرا*
> 
> *أبرا*
> 
> *إبرا*
> 
> *آبرا*
> 
> Table 2
> 
> These are examples of indexed words in Solr.
> 
> 
> 
> *Searching word*
> 
> *ابرا*
> 
> Table 3
> 
> 
> Now my problem is: by searching for the above word (Table 3), I should get
> all the indexed words in Table 2 in the output.
> 
> 
> 
> Is Solr version 4.9.0 compatible with Arabic search, or do I need to upgrade
> to a higher version?
> 
> 
> Kindly let me know if I need to give examples of all the characters, since I
> gave an example for only one character, which is hamza with alef.
> 
> 
> Thanks,
> 
> Mohan
> 
> 
> 
> 
> On Mon, Jan 30, 2017 at 9:21 PM, Steve Rowe  wrote:
> 
>> Hi Mohan,
>> 
>> I answered your question on the solr-user list.  Did you see my response?
>> 
>> I CC’d you on this email, but you should know that Apache mailing lists
>> won’t automatically send you email unless you have subscribed to the list.
>> For more information, see <http://lucene.apache.org/solr
>> /community.html#mailing-lists-irc>.
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Jan 29, 2017, at 2:16 PM, mohan sundaram 
>> wrote:
>>> 
>>> Hi,
>>> 
>>> In Solr search I want to search for product names using Arabic letters.
>>> While searching, Arabic users can find it a little difficult to search for some
>>> product names, because some characters need to be typed in a specific form
>>> while searching.
>>> 
>>> Ex: إ أ آ
>>> 
>>> 
>>> The characters mentioned above require a Shift-key combination to type.
>>> Usually Arabic users will just type the “ ا “ character and expect to get the
>>> combined words below.
>>> 
>>> Ex: إبرا
>>> 
>>> 
>>> In my solr schema.xml I defined product arabic name field as below
>>> 
>>> <field name="bizNameAr" type="text_ar" indexed="true" stored="true"/>
>>> 
>>> <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
>>>   <analyzer>
>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" />
>>>     <filter class="solr.ArabicNormalizationFilterFactory"/>
>>>     <filter class="solr.ArabicStemFilterFactory"/>
>>>   </analyzer>
>>> </fieldType>
>>> 
>>> 
>>> What changes do I have to make in schema.xml? Please help me with this.
>>> 
>>> 
>>> 
>>> --
>>> Regards,
>>> Mohan.N
>>> 096896429683
>> 
>> 



Re: Solr 6.4 new SynonymGraphFilter help for multi-word synonyms

2017-02-02 Thread Steve Rowe
Hi Cliff,

The Solr query parsers (standard/“Lucene” and e/dismax anyway) have a problem 
that prevents SynonymGraphFilter from working: the text fed to your query 
analyzer is first split on whitespace.  So e.g. a query containing “United 
States” will never match multi-word synonym “United States”->”US”, since the 
analyzer will first see “United” and then, separately, “States”.

I fixed the whitespace splitting problem in the classic Lucene query parser in 
.  (Note that this is *not* 
the same as Solr’s standard/“Lucene” query parser, which is actually a fork of 
Lucene’s query parser with added functionality.)

There is a Solr JIRA I’m working on to fix the whitespace splitting problem: 
.  I hope to get it committed 
in time for inclusion in Solr 6.5.
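
For illustration, the kind of synonyms.txt entry that is affected - equivalent
multi-word forms on one comma-separated line:

  United States, United States of America, US, USA

With query-time synonyms and a whitespace-splitting query parser, the analyzer
receives "United" and "States" separately, so a rule like this can never fire.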

--
Steve
www.lucidworks.com

> On Feb 2, 2017, at 9:50 AM, Shawn Heisey  wrote:
> 
> On 2/2/2017 7:36 AM, Cliff Dickinson wrote:
>> The SynonymGraphFilter API documentation contains the following statement
>> at the end:
>> 
>> "To get fully correct positional queries when your synonym replacements are
>> multiple tokens, you should instead apply synonyms using this TokenFilter
>> at query time and translate the resulting graph to a TermAutomatonQuery
>> e.g. using TokenStreamToTermAutomatonQuery."
> 
> Lucene is a programming API for search.  That documentation is intended
> for people who are writing Lucene programs.  Those users would be
> constructing query objects in their own code, so they would most likely
> know exactly which object needs to be changed to TermAutomatonQuery.
> 
> Solr is a Lucene program ... and an immensely complicated one.  Many
> Lucene improvements require changes in the end program for full
> support.  I suspect that Solr's capability has not been updated to use
> this new feature in Lucene.  I cannot say for sure, I hope someone who
> is familiar with this Lucene change and Solr internals can comment.
> 
> Thanks,
> Shawn
> 



Re: Arabic words search in solr

2017-02-02 Thread Steve Rowe
Hi Mohan,

I ran your Case #1 through Solr 4.9.0’s Admin UI Analysis pane and I can see 
the analyzer for the field type “text_ar" analyzer does not remove all 
diacritics:

Indexed original: المؤسسة التجارية العمانية
Indexed analyzed: مؤسس تجار عمان

Query original: الموسسة التجارية
Query analyzed: موسس تجار

The analyzed query terms are the same as the first two analyzed indexed terms, 
with one exception: the hamza on the waw in the analyzed indexed term “مؤسس” 
was not stripped off by the analyzer, and so won’t match the analyzed query 
term “موسس”, which was entered by the user without the hamza.

Adding ICUFoldingFilterFactory to the “text_ar” field type fixed case #1 for me 
by stripping the hamza from the waw.  You can read more about this filter in 
the Solr Reference Guide (yes, this is basically for Solr 6.4, but I don’t 
think this functionality has changed between 4.9 and 6.4): 
.
  If you do this, you can remove the LowerCaseFilterFactory since 
ICUFoldingFilterFactory performs lowercasing as part of its work.
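
A sketch of one way to wire it in, placed last in the chain so the Arabic
normalization and stemming filters still see the original words:

  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" />
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>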

Note that to use ICUFoldingFilterFactory you must add three jars to the lib/ 
directory in your solr home dir.  Here’s how I did it:

$ mkdir example/solr/lib
$ cp dist/solr-analysis-extras-4.9.0.jar example/solr/lib/
$ cp contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.9.0.jar 
example/solr/lib/
$ cp contrib/analysis-extras/lib/icu4j-53.1.jar example/solr/lib/
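
Alternatively (untested), you can leave the jars in place and load them with
<lib> directives in solrconfig.xml - the paths below assume the 4.x example
layout and may need adjusting:

  <lib dir="../../../dist/" regex="solr-analysis-extras-.*\.jar" />
  <lib dir="../../../contrib/analysis-extras/lucene-libs/" regex="lucene-analyzers-icu-.*\.jar" />
  <lib dir="../../../contrib/analysis-extras/lib/" regex="icu4j-.*\.jar" />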

--
Steve
www.lucidworks.com 

> On Feb 1, 2017, at 6:50 AM, mohanmca01  wrote:
> 
> Dear Steve, thanks for investigating our problem. Our project is basically a
> business directory search platform, and we have details for more than 100K
> businesses. I’m providing you some examples of Arabic words to
> reproduce the problem. Please find the attached Word file where I explained
> everything along with screenshots. arabicSearch.docx
>  
> Regarding upgrading to the latest version: our project is running on Java
> 1.7, and if I need to upgrade then we have to upgrade Java, the application
> server (JBoss), etc., and this is not the right time for that activity.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4318227.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Arabic words search in solr

2017-02-08 Thread Steve Rowe
Hi Mohan,

I haven’t looked at the latest problems, but the ICU folding filter should be 
the last filter, to allow the Arabic normalization and stemming filters to see 
the original words.

--
Steve
www.lucidworks.com

> On Feb 8, 2017, at 10:58 PM, mohanmca01  wrote:
> 
> Hi Steve,
> 
> Thanks for your continued investigation of this issue.
> 
> I added the ICU Folding Filter in the schema.xml file and re-indexed all the data
> again. I noticed some improvements in search, but it's not really as expected.
> 
> below is the configuration changed in schema file:
> 
> -
> 
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.ICUFoldingFilterFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" />
>     <filter class="solr.ArabicNormalizationFilterFactory"/>
>     <filter class="solr.ArabicStemFilterFactory"/>
>   </analyzer>
> -
> 
> I attached the document for your reference; the cases highlighted in red are
> not working as expected.
> 
> Also, I have raised one point regarding jQuery autocomplete with unique
> records. Kindly let me know if you have any background on how to implement
> the same.
> 
> arabicSearch.docx
>   
> 
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4319436.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Unable to build Solr 5.5.3 from source

2017-02-13 Thread Steve Rowe
Hi Sahil,

I downloaded the Solr 5.5.3 source, deleted my Ivy cache, and successfully ran 
“ant compile” from the solr/ directory.

My Ant version is the same as yours.

Do you have ivy-2.3.0.jar in your ~/.ant/lib/ directory?  (I do.)
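
If it isn't there, the build can install it for you - from the top level of
the source checkout:

  ant ivy-bootstrap

which downloads the Ivy jar into ~/.ant/lib/.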

Are you attempting to compile the unmodified released source, or have you made 
modifications?

AFAICT, those are warnings, not errors - can you post the full output somewhere 
and give the link?

--
Steve
www.lucidworks.com

> On Feb 13, 2017, at 1:52 AM, Sahil Agarwal  wrote:
> 
> I have not been able to build Solr 5.5.3 from the source.
> 
> - I was able to build Solr 6.4 successfully but haven't been able to build 
> solr
> 5.5.3 (which I need) successfully.
> - I have tried deleting the cache and building again. Same errors.
> 
> I've been getting unresolved dependencies error. I get the following output
> when using ant compile -v
> 
> 
> Apache Ant(TM) version 1.9.6 compiled on July 8 2015
> Trying the default build file: build.xml
> Buildfile: /home/sahil/Work/solr/solr-5.5.3/build.xml
> Detected Java version: 1.8 in: /usr/lib/jvm/jdk1.8.0_121/jre
> Detected OS: Linux
> parsing buildfile /home/sahil/Work/solr/solr-5.5.3/build.xml with URI =
> file:/home/sahil/Work/solr/solr-5.5.3/build.xml
> Project base dir set to: /home/sahil/Work/solr/solr-5.5.3
> parsing buildfile
> jar:file:/usr/share/ant/lib/ant.jar!/org/apache/tools/ant/antlib.xml
> with URI = 
> jar:file:/usr/share/ant/lib/ant.jar!/org/apache/tools/ant/antlib.xml
> from a zip file
> Importing file /home/sahil/Work/solr/solr-5.5.3/lucene/common-build.xml
> from /home/sahil/Work/solr/solr-5.5.3/build.xml
> Overriding previous definition of reference to ant.projectHelper
> parsing buildfile /home/sahil/Work/solr/solr-5.5.3/lucene/common-build.xml
> with URI = file:/home/sahil/Work/solr/solr-5.5.3/lucene/common-build.xml
> 
> .
> .
> .
> // Deleted to make file smaller. Please tell if anything is needed.
> .
> .
> .
> 
> resolve:
> [ivy:retrieve] no resolved descriptor found: launching default resolve
> Overriding previous definition of property "ivy.version"
> [ivy:retrieve] using ivy parser to parse file:/home/sahil/Work/solr/
> solr-5.5.3/solr/core/ivy.xml
> [ivy:retrieve] :: resolving dependencies :: org.apache.solr#core;working@
> D-PGB7YZ
> [ivy:retrieve]  confs: [compile, compile.hadoop]
> [ivy:retrieve]  validate = true
> [ivy:retrieve]  refresh = false
> [ivy:retrieve] resolving dependencies for configuration 'compile'
> [ivy:retrieve] == resolving dependencies for
> org.apache.solr#core;working@D-PGB7YZ
> [compile]
> [ivy:retrieve] == resolving dependencies org.apache.solr#core;working@
> D-PGB7YZ->commons-codec#commons-codec;1.10 [compile->master]
> [ivy:retrieve] default: Checking cache for: dependency:
> commons-codec#commons-codec;1.10 {compile=[master]}
> [ivy:retrieve] don't use cache for commons-codec#commons-codec;1.10:
> checkModified=true
> [ivy:retrieve]  tried /home/sahil/.ivy2/local/commons-codec/commons-codec/1.
> 10/ivys/ivy.xml
> [ivy:retrieve]  tried /home/sahil/.ivy2/local/commons-codec/commons-codec/1.
> 10/jars/commons-codec.jar
> [ivy:retrieve]  local: no ivy file nor artifact found for
> commons-codec#commons-codec;1.10
> [ivy:retrieve] main: Checking cache for: dependency:
> commons-codec#commons-codec;1.10 {compile=[master]}
> [ivy:retrieve] main: module revision found in cache:
> commons-codec#commons-codec;1.10
> [ivy:retrieve]  found commons-codec#commons-codec;1.10 in public
> [ivy:retrieve] == resolving dependencies org.apache.solr#core;working@
> D-PGB7YZ->org.apache.commons#commons-exec;1.3 [compile->master]
> [ivy:retrieve] default: Checking cache for: dependency:
> org.apache.commons#commons-exec;1.3 {compile=[master]}
> [ivy:retrieve] don't use cache for org.apache.commons#commons-exec;1.3:
> checkModified=true
> [ivy:retrieve]  tried /home/sahil/.ivy2/local/org.
> apache.commons/commons-exec/1.3/ivys/ivy.xml
> [ivy:retrieve]  tried /home/sahil/.ivy2/local/org.
> apache.commons/commons-exec/1.3/jars/commons-exec.jar
> [ivy:retrieve]  local: no ivy file nor artifact found for
> org.apache.commons#commons-exec;1.3
> [ivy:retrieve] main: Checking cache for: dependency:
> org.apache.commons#commons-exec;1.3 {compile=[master]}
> [ivy:retrieve] main: module revision found in cache:
> org.apache.commons#commons-exec;1.3
> [ivy:retrieve]  found org.apache.commons#commons-exec;1.3 in public
> [ivy:retrieve] == resolving dependencies org.apache.solr#core;working@
> D-PGB7YZ->commons-fileupload#commons-fileupload;1.3.1 [compile->master]
> [ivy:retrieve] default: Checking cache for: dependency:
> commons-fileupload#commons-fileupload;1.3.1 {compile=[master]}
> [ivy:retrieve] don't use cache for 
> commons-fileupload#commons-fileupload;1.3.1:
> checkModified=true
> [ivy:retrieve]  local: revision in cache: commons-fileupload#commons-
> fileupload;1.3.1
> [ivy:retrieve]  found commons-fileupload#commons-fileupload;1.3.1 in local
> [ivy:retrieve] == resolving dependencies org.apache.

Re: Unable to build Solr 5.5.3 from source

2017-02-13 Thread Steve Rowe
Sahil,

Dependency versions are in lucene/ivy-versions.properties.  When we upgrade, we 
change the version there instead of in each ivy.xml file with the dependency.
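
For example, the 5.5.3 entry for the dependency you changed has this form in
lucene/ivy-versions.properties:

  /commons-fileupload/commons-fileupload = 1.3.1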

--
Steve
www.lucidworks.com

> On Feb 13, 2017, at 11:00 AM, Sahil Agarwal  wrote:
> 
> The issue has been fixed. It seems there is a problem in solr/core/ivy.xml:
> 
> <dependency org="commons-fileupload" name="commons-fileupload" rev="${/commons-fileupload/commons-fileupload}" conf="compile"/>
> 
> In this line, I replaced the ${/commons-fileupload/commons-fileupload} with
> 1.3.2 as the variable seemed to be downloading version 1.3.1 of the
> commons-fileupload instead of the latest 1.3.2 version.
> 
> Once this was done, ant built the sources successfully.
> 
> Thanks!
> Sahil
> 
> On 13 February 2017 at 19:30, Shawn Heisey  wrote:
> 
>> On 2/12/2017 11:52 PM, Sahil Agarwal wrote:
>>> I have not been able to build Solr 5.5.3 from the source.
>> 
>>> Detected Java version: 1.8 in: /usr/lib/jvm/jdk1.8.0_121/jre
>> 
>> The unresolved dependency error is unusual, I'm not really sure what's
>> going on there.  My best idea would be to delete the ivy cache entirely
>> and try again.  These would be the commands I would use, from the top
>> level of the source code:
>> 
>> rm -rf ~/.ivy2
>> ant clean clean-jars
>> 
>> This will cause ivy to re-download all dependent jars when you do the
>> compile, and if you are using ivy with any other java source code, might
>> cause some temporary issues for those builds.
>> 
>> Even if you get ivy to work right, you're going to run into another
>> problem due to the JDK version you've got.  Oracle changed the javadoc
>> compiler to be more strict in that version, which broke the build.
>> 
>> https://issues.apache.org/jira/browse/LUCENE-7651
>> 
>> The fix has been backported to the 5.5 branch, so it will be available
>> in the 5.5.4 tag when it is created.  The 5.5.3 build will continue to
>> be broken with Java 8u121.
>> 
>> You'll need to either get the branch_5_5 source code from git to build
>> 5.5.4, or downgrade your JDK version.  Alternatively, you can wait for
>> the 5.5.4 release to be available to get the source code, or get the
>> patch and apply it to your 5.5.3 code.  I do not know if the patch will
>> apply cleanly -- it may require manual work.
>> 
>> Thanks,
>> Shawn
>> 
>> 



Re: Arabic words search in solr

2017-02-14 Thread Steve Rowe
Hi Mohan,

Did you change the order of the filters as I suggested?

--
Steve
www.lucidworks.com

On Tue, Feb 14, 2017 at 8:05 AM mohanmca01  wrote:

> Hi Steve,
>
> Any update on this? I am waiting for your inputs.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4320253.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Arabic words search in solr

2017-02-15 Thread Steve Rowe
Hi Mohan,

When I said "the ICU folding filter should be the last filter, to allow the 
Arabic normalization and stemming filters to see the original words”, I meant 
that no filter should follow it.  

You did not make that change.

Here’s what I mean:

  <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" />
      <filter class="solr.ArabicNormalizationFilterFactory"/>
      <filter class="solr.ArabicStemFilterFactory"/>
      <filter class="solr.ICUFoldingFilterFactory"/>
    </analyzer>
  </fieldType>

--
Steve
www.lucidworks.com

> On Feb 15, 2017, at 12:23 AM, mohanmca01  wrote:
> 
> Hi Steve,
> 
> As per your suggestion, I added ICUFoldingFilterFactory in schema.xml as
> below:
> 
> 
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.ICUFoldingFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" />
>     <filter class="solr.ArabicNormalizationFilterFactory"/>
>     <filter class="solr.ArabicStemFilterFactory"/>
>   </analyzer>
> 
> I attached the expected-results document in the previous mail thread for
> your reference.
> 
> Kindly check and let me know.
> 
> Thanks
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4320427.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Arabic words search in solr

2017-02-21 Thread Steve Rowe
Hi Mohan,

It looks to me like the example query should match, since the analyzed query 
terms look like a subset of the analyzed document terms.

Did you re-index your documents after you changed your schema?  If not, then
the indexed documents won’t have the same terms as the ones you see on the 
Admin UI Analysis pane.

If you have re-indexed, and are still not getting matches you expect, please 
include textual examples of the remaining problems, so that I can copy/paste to 
reproduce the problem - I can’t copy/paste Arabic from images you pointed to.

--
Steve
www.lucidworks.com

> On Feb 21, 2017, at 1:28 AM, mohanmca01  wrote:
> 
> Hi Steve,
> 
> I changed the ICU folding filter order and re-indexed the entire Arabic content. But
> the problem is still present. I am not able to get the expected result.
> 
> I attached screenshots for your reference.
>  
>  
>  
> 
> Kindly check and let me know.
> 
> Thanks
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4321397.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Arabic words search in solr

2017-02-23 Thread Steve Rowe
Hi Mohan,

I indexed your 9 examples as simple documents after mapping dynamic field 
“*_ar” to the “text_ar” field type:

-
[{"id":"1", "name_ar":"المؤسسة التجارية العمانية"},
{"id":"2", "name_ar":"شركة التأمين الأهلية ش.م.ع.م"},
{"id":"3", "name_ar":"شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية - - 
مركز شرطة إبراء"},
{"id":"4", "name_ar":"شركة ظفار للتأمين ش.م.ع.ع"},
{"id":"5", "name_ar":"طوارئ المستشفيات   - طوارئ مستشفى صحار"},
{"id":"6", "name_ar":"شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية - - مركز 
شرطة إزكي"},
{"id":"7", "name_ar":"المؤسسة التجارية العمانية"},
{"id":"8", "name_ar":"وزارة الصحة - المديرية العامة للخدمات الصحية  محافظة 
الداخلية -  - مستشفى إزكي (البدالة)  - الطوارئ"},
{"id":"9", "name_ar":"أسعار المكالمات الدولية - مونتسرات -  - مونتسرات”}]
-
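
The “*_ar” mapping is a one-line schema.xml addition (a sketch - adjust the
attributes to suit your schema):

  <dynamicField name="*_ar" type="text_ar" indexed="true" stored="true"/>

and the documents above can be indexed with, e.g. (core and file names are
illustrative):

  curl 'http://localhost:8983/solr/collection1/update?commit=true' -H 'Content-Type: application/json' --data-binary @docs.json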

Then when I search from the Admin UI for “name_ar:شرطة ازكي” (the query in one 
of your screenshots with numFound=0) I get the following results:

-
{
  "responseHeader": {
"status": 0,
"QTime": 1,
"params": {
  "indent": "true",
  "q": "name_ar:شرطة ازكي",
  "_": "1487912340325",
  "wt": "json"
}
  },
  "response": {
"numFound": 2,
"start": 0,
"docs": [
  {
"id": "6",
"name_ar": [
  "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية - - مركز شرطة إزكي"
],
"_version_": 1560170434794619000
  },
  {
"id": "3",
"name_ar": [
  "شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية - - مركز شرطة 
إبراء"
],
"_version_": 1560170434793570300
  }
]
  }
}
-

So I cannot reproduce the failures you’re seeing.  In fact, I tried all 9 of 
the queries you listed as not working, and all of them matched at least one of 
the above 9 documents, except for case 5 (which I give details for below).  Are 
you absolutely sure that you reindexed your data with the ICUFF last?

The one query that didn’t return any matches for me is “name_ar:طوارى صحار”.  
Here’s why:

Indexed original: طوارئ صحار
Indexed analyzed: طواري صحار

Query original: طوارى صحار
Query analyzed: طوار صحار

In the analyzed indexed form, the “ئ” (yeh with hamza above) is left intact by 
ArabicNormalizationFilter and ArabicStemFilter, and then the ICUFoldingFilter 
converts it to “ي” (yeh without the hamza).

In the analyzed query, ArabicNormalizationFilter converts “طوارى” to “طواري” 
(alef maksura->yeh), which ArabicStemFilter converts to “طوار” by removing the 
trailing yeh.

I don’t know what the correct thing to do is to make alef maksura and yeh match 
each other, but one possibility is adding a char filter that converts all alefs 
maksura into yehs with hamza, like this:

  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="ى" replacement="ئ"/>

Re: Susbcribe

2017-03-01 Thread Steve Rowe
Hi Pankaj,

To subscribe, send an email to .

More info here: 
.

--
Steve
www.lucidworks.com



Re: Arabic words search in solr

2017-03-02 Thread Steve Rowe
Hi Mohan,

> On Feb 26, 2017, at 1:37 AM, mohanmca01  wrote:
> 
> I searched with (bizNameAr: شرطة ازكي), and am getting:
> […]
> 
> the expected result is:   "id": "82",
>  "bizNameAr": "شرطة عمان السلطانية - قيادة
> شرطة محافظة الداخلية - - مركز *شرطة إزكي*",
> 
> as the above has both the words mentioned in the query (marked as Bold),
> where the rest have the following:
> 
>"id": "63",
>"bizNameAr": "شركة ظفار للتأمين ش.م.ع.ع - فرع ازكي"
> 
> it has only one word of the query (ازكي)
> 
>"id": "56",
>"bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية 
> -  - مركز شرطة إبراء"
> 
> it has only one word of the query (شرطة)
> 
> "id": "79",
> "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية - - مركز
> شرطة إبراء"
> 
> It has only one word of the query (شرطة)
> 
> The above 3 records should not come back in the result, since 2 words were
> mentioned in the query and only one record has these two words.

Solr's standard query language includes two mechanisms for requiring terms: ‘+’ 
before a required term, and ‘AND’ between two required terms.  ‘+’ is better - 
see  for more 
information.

You can also set the default operator to ‘AND’, e.g. via request parameter 
“&q.op=AND” (if this is always what you want, you can include this in the 
/select request handler’s definition in solrconfig.xml).  See 
 
for more information.  
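
For example, a /select definition with AND as the default operator might look
like this (a sketch, not your actual config):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="q.op">AND</str>
    </lst>
  </requestHandler>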

> I would really like to suggest that we give you a real-time demo of our
> system with my Arab colleague so it can be clearer for you. Let us know if we
> can do that.

I prefer to keep discussion on this public mailing list so that others can 
benefit.  If you find that you need faster or more interactive help, you can 
check out the list of people who have indicated that they provide Solr support: 
.

--
Steve
www.lucidworks.com



Re: Arabic words search in solr

2017-03-09 Thread Steve Rowe
Hi Mohan,

Your examples refer to documents I don’t have in my 9 document set, so I recast 
the problem to a query/doc combo I have from earlier in this thread, and I was 
able to restrict hits to only documents that contained all terms from the query.

If I use the query “name_ar:(شرطة ازكي)” I get 3 hits (I’ve left out some 
details):

-
{ "responseHeader": { ... "params": { "q":"name_ar:(شرطة ازكي)”, ... } },
  "response": { "numFound":3, "start":0,
"docs": [
  { "id":"6", "name_ar":["شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية 
- - مركز شرطة إزكي"], ... },
  { "id":"3", "name_ar":["شرطة عمان السلطانية - قيادة شرطة محافظة شمال 
الشرقية - - مركز شرطة إبراء”], ... },
  { "id":"8", "name_ar":["وزارة الصحة - المديرية العامة للخدمات الصحية  
محافظة الداخلية -  - مستشفى إزكي (البدالة)  - الطوارئ”], ... }]}
-

If I add “q.op=AND” to the request, only one of these documents matches - note 
that I’ve also checked the “debugQuery” option on the Admin UI:

-
{ "responseHeader": { … 
  "params": { "q":"name_ar:(شرطة ازكي)”, "q.op":"AND”, "debugQuery":“true”, ... 
} },
  "response": { "numFound":1, "start":0,
"docs": [
  { "id":"6", "name_ar":["شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية 
- - مركز شرطة إزكي”], ... }]},
  "debug": {
"rawquerystring": "name_ar:(شرطة ازكي)",
"querystring": "name_ar:(شرطة ازكي)",
"parsedquery": "+name_ar:شرط +name_ar:ازك",
"parsedquery_toString": "+name_ar:شرط +name_ar:ازك",
-

Note the “parsedquery" above - it shows how to require individual terms when 
specifying the field for each term.  This is how the "name_ar:(شرطة ازكي)” 
query is interpreted when the "q.op=AND” request param is used.

The equivalent query using ‘+’ signs is: "name_ar:(+شرطة +ازكي)”.  This *looks* 
strange because of how the Unicode bidirectional algorithm works.  This W3C 
writeup uses Arabic to drive its discussion of display of strings that contain 
both RTL and LTR character runs, and I found it quite helpful here: 
.

Here’s the output from the "name_ar:(+شرطة +ازكي)” query:

-
{ "responseHeader": { ... "params": { "q":"name_ar:(+شرطة +ازكي)", 
"debugQuery":“true” ... } },
  "response": { "numFound":1, "start":0,
"docs": [
  { "id":"6", "name_ar":["شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية 
- - مركز شرطة إزكي”], ... }]},
  "debug": {
"rawquerystring": "name_ar:(+شرطة +ازكي)",
"querystring": "name_ar:(+شرطة +ازكي)",
"parsedquery": "+name_ar:شرط +name_ar:ازك",
"parsedquery_toString": "+name_ar:شرط +name_ar:ازك",
-

The above is the same result (and has the same parsedQuery) as query 
"name_ar:(شرطة ازكي)” with request param “q.op=AND”.

I won’t show it here, but I get the same 1-hit result for this query when I use 
AND instead of ‘+’: "name_ar:(شرطة AND ازكي)” - note that the terms only 
*appear* to be in reverse order because of how the Unicode bidirectional 
algorithm works.

> On Mar 9, 2017, at 2:30 AM, mohanmca01  wrote:
> 
> I saw your products on the Lucidworks website. Do you have any customized
> product for Arabic support in Solr?

Lucidworks doesn’t have a specifically Arabic-focused product, but we have 
helped people enable Arabic search in the past.  Click on the “Contact Us” link 
on the website if you’d like to talk to us about getting involved.

--
Steve
www.lucidworks.com


