Facets

2014-01-14 Thread dmacuk
First, excuse me if I do not use the correct terminology.

I have some records in a Solr document with a field called icDesc_en.

The contents of this field are a sentence or two, e.g. "2.4l engine
automatic 5 speed", "Left front door, electric windows", etc.

When I perform a search to retrieve facets on the field, it comes back as
individual words, e.g. engine:10, front:15, etc.

I would like the facet count to return "2.4l engine automatic 5 speed":20,
"Left front door, electric windows":15.

How can I make this happen?

TIA

David.





Re: Facets

2014-01-14 Thread Karan jindal
What's the field type of "icDesc_en"?
See it in schema.xml in the conf directory of your Solr setup.

I guess it must be tokenized by a tokenizer.
If that is the case, then change the type of this field to the "string" type.
By doing this, tokens won't be created and you will get the desired results.

-Karan


On Tue, Jan 14, 2014 at 2:15 PM, dmacuk wrote:

> First, excuse me if I do not use the correct terminology.
>
> I have some records in a Solr document with a field called icDesc_en.
>
> The contents of this field are a sentence or two, e.g. "2.4l engine
> automatic 5 speed", "Left front door, electric windows", etc.
>
> When I perform a search to retrieve facets on the field, it comes back as
> individual words, e.g. engine:10, front:15, etc.
>
> I would like the facet count to return "2.4l engine automatic 5 speed":20,
> "Left front door, electric windows":15.
>
> How can I make this happen?
>
> TIA
>
> David.
>
>
>
>


RE: Simple payloads example not working

2014-01-14 Thread michael.boom
Investigating, it looks like the payload.bytes property is where the problem
is. payload.toString() outputs correct values, but the .bytes property seems
to behave a little weird:

public class CustomSimilarity extends DefaultSimilarity {

    @Override
    public float scorePayload(int doc, int start, int end, BytesRef payload) {
        if (payload != null) {
            Float pscore = PayloadHelper.decodeFloat(payload.bytes);
            System.out.println("payload : " + payload.toString()
                    + ", payload bytes: " + payload.bytes.toString()
                    + ", decoded value is " + pscore);
            return pscore;
        }
        return 1.0f;
    }
}

outputs on query:
http://localhost:8983/solr/collection1/pds-search?q=payloads:testone&wt=json&indent=true&debugQuery=true

payload : [41 26 66 66], payload bytes: [B@149c678, decoded value is 10.4
payload : [41 f0 0 0], payload bytes: [B@149c678, decoded value is 10.4
payload : [42 4a cc cd], payload bytes: [B@149c678, decoded value is 10.4
payload : [42 c6 0 0], payload bytes: [B@149c678, decoded value is 10.4
payload : [41 26 66 66], payload bytes: [B@850fb7, decoded value is 10.4
payload : [41 f0 0 0], payload bytes: [B@1cad357, decoded value is 10.4
payload : [42 4a cc cd], payload bytes: [B@f922cf, decoded value is 10.4
payload : [42 c6 0 0], payload bytes: [B@5c4dc4, decoded value is 10.4


Something doesn't seem right here. Any idea why this behaviour occurs?
Is anyone using payloads with Solr 4.6.0?




-
Thanks,
Michael
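
One plausible culprit, offered as a hedged guess rather than a confirmed
diagnosis: BytesRef.bytes is a shared, reused buffer, and a payload's bytes
begin at payload.offset, which the one-argument decodeFloat() ignores - it
always reads from index 0, which would explain different payload bytes
decoding to the identical value. PayloadHelper has an offset-aware overload:

    // Hedged sketch of a fix: decode starting at the BytesRef offset rather
    // than at index 0 of the (shared) underlying array.
    @Override
    public float scorePayload(int doc, int start, int end, BytesRef payload) {
        if (payload != null) {
            return PayloadHelper.decodeFloat(payload.bytes, payload.offset);
        }
        return 1.0f;
    }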


Re: Cancel Solr query?

2014-01-14 Thread Mikhail Khludnev
If you are interested, here is a brief sketch of a possible hack:
- the client adds a special query ID parameter to the request: &interruptID=DEADBEEF
- create an InterruptionComponent and add it to the search handler's component
list before the query component
- this component adds a PostFilter which yields a DelegatingCollector that
checks a volatile interruption flag; if it's triggered, the collector throws
TimeLimitingCollector.TimeExceededException, which is handled by the current
timeAllowed logic (see TimeLimitingCollector)
- the DelegatingCollector is registered in a globally accessible map under the
given interruptID. The InterruptionComponent can hold this map.
- SolrRequestInfo.addCloseHook(Closeable) can wipe the entry from this map.

- the InterruptionComponent is also registered in another SearchHandler without
any sibling components
- when that handler receives an interruption request with the specified
&interruptID=DEADBEEF, it can find the PostFilter in the map and trigger the
interruption flag (a rough sketch of the collector follows).
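
A rough, hypothetical sketch of the collector piece of this idea (the class
name is made up, and since TimeLimitingCollector.TimeExceededException has a
non-public constructor, a plain runtime exception stands in for the hand-off
described above):

    import java.io.IOException;

    import org.apache.solr.search.DelegatingCollector;

    public class InterruptingCollector extends DelegatingCollector {

        // Flipped by the (hypothetical) InterruptionComponent that receives
        // the second request carrying the same &interruptID value.
        private volatile boolean interrupted = false;

        public void interrupt() {
            interrupted = true;
        }

        @Override
        public void collect(int doc) throws IOException {
            if (interrupted) {
                // Stand-in for the TimeExceededException hand-off sketched above.
                throw new RuntimeException("query interrupted by client");
            }
            super.collect(doc);
        }
    }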



On Mon, Jan 13, 2014 at 10:47 PM, Luis Lebolo  wrote:

> Hi All,
>
> Is it possible to cancel a Solr query/request currently in progress?
>
> Suppose the user starts searching for something (that takes a long time for
> Solr to process), then decides to modify the query. I can simply ignore
> the previous request and create a new request, but Solr is still processing
> the old request, correct?
>
> Is there any way to cancel that first request?
>
> Thanks,
> Luis
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


RE: Simple payloads example not working

2014-01-14 Thread Markus Jelsma
Strange, is it really floats you are inserting as payloads? We use payloads
too, but we write them via PayloadAttribute in custom token filters, as floats.
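
Markus's actual filter is not shown in this thread; as a hedged illustration
of the approach he describes, a minimal token filter attaching a float payload
to every token might look like this (the class name is made up):

    import java.io.IOException;

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.payloads.PayloadHelper;
    import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
    import org.apache.lucene.util.BytesRef;

    public final class FloatPayloadFilter extends TokenFilter {

        private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);
        private final BytesRef payload;

        public FloatPayloadFilter(TokenStream input, float value) {
            super(input);
            // encodeFloat() produces the 4-byte encoding that
            // PayloadHelper.decodeFloat() reads back at query time.
            this.payload = new BytesRef(PayloadHelper.encodeFloat(value));
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (!input.incrementToken()) {
                return false;
            }
            payloadAtt.setPayload(payload);
            return true;
        }
    }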
 
-Original message-
> From:michael.boom 
> Sent: Tuesday 14th January 2014 11:59
> To: solr-user@lucene.apache.org
> Subject: RE: Simple payloads example not working
> 
> Investigating, it looks like the payload.bytes property is where the problem
> is. payload.toString() outputs correct values, but the .bytes property seems
> to behave a little weird:
> 
> public class CustomSimilarity extends DefaultSimilarity {
> 
>     @Override
>     public float scorePayload(int doc, int start, int end, BytesRef payload) {
>         if (payload != null) {
>             Float pscore = PayloadHelper.decodeFloat(payload.bytes);
>             System.out.println("payload : " + payload.toString()
>                     + ", payload bytes: " + payload.bytes.toString()
>                     + ", decoded value is " + pscore);
>             return pscore;
>         }
>         return 1.0f;
>     }
> }
> 
> outputs on query:
> http://localhost:8983/solr/collection1/pds-search?q=payloads:testone&wt=json&indent=true&debugQuery=true
> 
> payload : [41 26 66 66], payload bytes: [B@149c678, decoded value is 10.4
> payload : [41 f0 0 0], payload bytes: [B@149c678, decoded value is 10.4
> payload : [42 4a cc cd], payload bytes: [B@149c678, decoded value is 10.4
> payload : [42 c6 0 0], payload bytes: [B@149c678, decoded value is 10.4
> payload : [41 26 66 66], payload bytes: [B@850fb7, decoded value is 10.4
> payload : [41 f0 0 0], payload bytes: [B@1cad357, decoded value is 10.4
> payload : [42 4a cc cd], payload bytes: [B@f922cf, decoded value is 10.4
> payload : [42 c6 0 0], payload bytes: [B@5c4dc4, decoded value is 10.4
> 
> 
> Something doesn't seem right here. Any idea why this behaviour occurs?
> Is anyone using payloads with Solr 4.6.0?
> 
> 
> 
> 
> -
> Thanks,
> Michael
> 


RE: Simple payloads example not working

2014-01-14 Thread michael.boom
Yes, it's float:

[fieldType definition stripped by the list archive]
The scenario is simple to replicate - the default solr-4.6.0 example, with a
custom Similarity class (the one above) and a custom query parser (again,
listed above).
I posted the docs in XML format (docs also listed above) using the
exampledocs/post.sh utility.

Indeed it looks weird, and I can't explain it.




-
Thanks,
Michael


Re: Facets

2014-01-14 Thread dmacuk
Karan,

The field was a "text" type, which by experimentation I changed to "string"
and all was OK.

Thanks for your prompt reply.

David





Re: Facets

2014-01-14 Thread Aruna Kumar Pamulapati
Hi David,

As Karan suggested, your current icDesc_en is tokenized (understandably, you
need that if you want to search on it in a powerful way). So the solution is
to create another field, say icDesc_en_facet, define "string" as its type
(like Karan suggested), and copy icDesc_en into it with a copyField (the
declaration was stripped from the archived message; see the sketch below).
Now you can use icDesc_en_facet to facet. Look for an example here:
http://serverfault.com/questions/463047/not-getting-desired-results-from-solr-facets
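
A minimal sketch of the combined suggestion; the copyField declaration is
reconstructed here as an assumption, since it was stripped from the archived
message (type names follow the stock example schema):

    <!-- tokenized field, kept for full-text search -->
    <field name="icDesc_en" type="text_general" indexed="true" stored="true"/>
    <!-- untokenized copy for faceting; the "string" type is not analyzed -->
    <field name="icDesc_en_facet" type="string" indexed="true" stored="false"/>
    <copyField source="icDesc_en" dest="icDesc_en_facet"/>

With this in place you facet with facet.field=icDesc_en_facet while still
searching on icDesc_en.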



On Tue, Jan 14, 2014 at 4:09 AM, Karan jindal wrote:

> What's the field type of "icDesc_en"?
> See it in schema.xml in the conf directory of your Solr setup.
>
> I guess it must be tokenized by a tokenizer.
> If that is the case, then change the type of this field to the "string" type.
> By doing this, tokens won't be created and you will get the desired results.
>
> -Karan
>
>
> On Tue, Jan 14, 2014 at 2:15 PM, dmacuk  >wrote:
>
> > First, excuse me if I do not use the correct terminology.
> >
> > I have some records in a Solr document with a field called icDesc_en.
> >
> > The contents of this field are a sentence or two, e.g. "2.4l engine
> > automatic 5 speed", "Left front door, electric windows", etc.
> >
> > When I perform a search to retrieve facets on the field, it comes back as
> > individual words, e.g. engine:10, front:15, etc.
> >
> > I would like the facet count to return "2.4l engine automatic 5 speed":20,
> > "Left front door, electric windows":15.
> >
> > How can I make this happen?
> >
> > TIA
> >
> > David.
> >
> >
> >
> >
>


Re: background merge hit exception while optimizing index (SOLR 4.4.0)

2014-01-14 Thread Ralf Matulat

I checked the index without any problems being found.
So it is not obvious what is going wrong here, since the index itself
looks okay.


The next step, updating Java, is a work in progress.
I will come back after sorting out the Java version as the cause of
the failing optimize.


The checkIndex output:

# java -cp lucene-core-4.4.0.jar -ea:org.apache.lucene... 
org.apache.lucene.index.CheckIndex /opt/solr/core-archiv/data/index/


Opening index @ /opt/solr/core-archiv/data/index/

Segments file=segments_44u numSegments=7 version=4.4 format= 
userData={commitTimeMSec=1389680580721}

  1 of 7: name=_dc8 docCount=9876366
codec=Lucene42
compound=false
numFiles=13
size (MB)=39,878.536
diagnostics = {os.arch=amd64, os=Linux, java.vendor=IBM 
Corporation, timestamp=1389390082688, mergeFactor=6, 
mergeMaxNumSegments=1, source=merge, lucene.version=4.4.0 1504776 - 
sarowe - 2013-07-19 02:53:42, java.version=1.6.0, 
os.version=2.6.27.19-5-default}

has deletions [delGen=50]
test: open reader.OK [2355 deleted docs]
test: fields..OK [18 fields]
test: field norms.OK [11 fields]
test: terms, freq, prox...OK [67539365 terms; 2886174773 terms/docs 
pairs; 3908230597 tokens]
test (ignoring deletes): terms, freq, prox...OK [67545837 terms; 
2886362782 terms/docs pairs; 3908385036 tokens]
test: stored fields...OK [129471633 total field count; avg 
13.112 fields per doc]
test: term vectorsOK [29360029 total vector count; avg 
2.974 term/freq vector fields per doc]

test: docvalues...OK [0 total doc count; 0 docvalues fields]

  2 of 7: name=_e8u docCount=4250
codec=Lucene42
compound=false
numFiles=13
size (MB)=10.137
diagnostics = {os.arch=amd64, os=Linux, java.vendor=IBM 
Corporation, timestamp=1389469381574, mergeFactor=3, 
mergeMaxNumSegments=-1, source=merge, lucene.version=4.4.0 1504776 - 
sarowe - 2013-07-19 02:53:42, java.version=1.6.0, 
os.version=2.6.27.19-5-default}

has deletions [delGen=1]
test: open reader.OK [7 deleted docs]
test: fields..OK [18 fields]
test: field norms.OK [11 fields]
test: terms, freq, prox...OK [106619 terms; 736990 terms/docs 
pairs; 899712 tokens]
test (ignoring deletes): terms, freq, prox...OK [106849 terms; 
739599 terms/docs pairs; 903504 tokens]
test: stored fields...OK [56355 total field count; avg 13.282 
fields per doc]
test: term vectorsOK [11933 total vector count; avg 2.812 
term/freq vector fields per doc]

test: docvalues...OK [0 total doc count; 0 docvalues fields]

  3 of 7: name=_fv0 docCount=4603
codec=Lucene42
compound=false
numFiles=12
size (MB)=15.763
diagnostics = {os.arch=amd64, os=Linux, java.vendor=IBM 
Corporation, timestamp=1389592952262, mergeFactor=3, 
mergeMaxNumSegments=1, source=merge, lucene.version=4.4.0 1504776 - 
sarowe - 2013-07-19 02:53:42, java.version=1.6.0, 
os.version=2.6.27.19-5-default}

no deletions
test: open reader.OK
test: fields..OK [18 fields]
test: field norms.OK [11 fields]
test: terms, freq, prox...OK [142338 terms; 1095467 terms/docs 
pairs; 1467387 tokens]
test: stored fields...OK [61486 total field count; avg 13.358 
fields per doc]
test: term vectorsOK [13686 total vector count; avg 2.973 
term/freq vector fields per doc]

test: docvalues...OK [0 total doc count; 0 docvalues fields]

  4 of 7: name=_g6z docCount=2853
codec=Lucene42
compound=false
numFiles=12
size (MB)=8.187
diagnostics = {os.arch=amd64, os=Linux, java.vendor=IBM 
Corporation, timestamp=1389610041349, mergeFactor=3, 
mergeMaxNumSegments=1, source=merge, lucene.version=4.4.0 1504776 - 
sarowe - 2013-07-19 02:53:42, java.version=1.6.0, 
os.version=2.6.27.19-5-default}

no deletions
test: open reader.OK
test: fields..OK [18 fields]
test: field norms.OK [11 fields]
test: terms, freq, prox...OK [97344 terms; 549425 terms/docs pairs; 
705964 tokens]
test: stored fields...OK [38013 total field count; avg 13.324 
fields per doc]
test: term vectorsOK [8059 total vector count; avg 2.825 
term/freq vector fields per doc]

test: docvalues...OK [0 total doc count; 0 docvalues fields]

  5 of 7: name=_gzx docCount=3647
codec=Lucene42
compound=false
numFiles=12
size (MB)=13.878
diagnostics = {os.arch=amd64, os=Linux, java.vendor=IBM 
Corporation, timestamp=1389640939123, mergeFactor=3, 
mergeMaxNumSegments=-1, source=merge, lucene.version=4.4.0 1504776 - 
sarowe - 2013-07-19 02:53:42, java.version=1.6.0, 
os.version=2.6.27.19-5-default}

no deletions
test: open reader.OK
test: fields..OK [18 fields]
test: field norms.OK [11 fields]
test: terms, freq, prox...OK [133598 terms; 923902 terms/docs 
pairs; 1243419 tokens

Re: Can I define the copy field like title_*

2014-01-14 Thread rachun
thank you very much Mr. Sumit





Can I define the copy field like title_*

2014-01-14 Thread rachun
I just wonder, can I define a copy field like this:

[copyField declaration stripped by the list archive]

instead of:

[copyField declarations stripped by the list archive]

Millions of thanks,
Rachun
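
Judging from the subject line, the stripped snippets presumably asked whether
a single wildcard declaration like this (the dest name is hypothetical):

    <copyField source="title_*" dest="text"/>

can replace enumerating every field by hand:

    <copyField source="title_en" dest="text"/>
    <copyField source="title_fr" dest="text"/>

Solr does accept a glob in the copyField source attribute, so the wildcard
form works.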





RE: Simple payloads example not working

2014-01-14 Thread michael.boom
Hi Markus, 

Do you have any example/tutorials of your payloads in custom filter
implementation ?

I really want to get payloads working, in any way.
Thanks!



-
Thanks,
Michael


Splitting strings in Java - how to escape delimiter characters?

2014-01-14 Thread Shawn Heisey
I have a Java question, for a custom update processor I'm developing. It
takes an input field of the following format:

field:value;mvfield:value1;mvfield:value2

With an inner delimiter set to a colon and an outer delimiter set to a
semicolon, this results in two new fields going into the document. The
field named 'field' has one value and the field named mvfield has two.

This code uses the String#split method, so it can't deal with the
delimiter characters being escaped with a backslash.

How can I make the code deal with an escape character (backslash) on the
two delimiters and the escape character itself? Unless it's absolutely
necessary or super easy, I do not plan to deal with the full set of regex
escaped characters.

I can move this discussion to the development list, but I thought I would
start here.

Thanks,
Shawn




Re: Splitting strings in Java - how to escape delimiter characters?

2014-01-14 Thread Steve Rowe
Hi Shawn,

Solrj’s StrUtils.splitSmart() should do exactly what you want - in the first 
pass, split on semicolon and don’t decode backslash escaping, and then in the 
inner loop, use the same method to split on colons and decode backslash 
escaping.  I think :).

Steve
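
A minimal sketch of the two-pass call Steve describes, assuming Solrj's
StrUtils is on the classpath (the class and input values are made up for
illustration):

    import java.util.List;

    import org.apache.solr.common.util.StrUtils;

    public class SplitDemo {
        public static void main(String[] args) {
            // The last value contains an escaped ';' that must survive pass 1.
            String input = "field:value;mvfield:value1;mvfield:value\\;2";

            // Pass 1: split on the outer delimiter without decoding, so the
            // backslash escape is kept and the escaped ';' does not split.
            for (String pair : StrUtils.splitSmart(input, ";", false)) {
                // Pass 2: split on the inner delimiter and decode escapes now.
                List<String> kv = StrUtils.splitSmart(pair, ":", true);
                System.out.println("field=" + kv.get(0) + " value=" + kv.get(1));
            }
        }
    }

The last line printed should show value;2 with its semicolon restored.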
 
On Jan 14, 2014, at 10:07 AM, Shawn Heisey  wrote:

> I have a Java question, for a custom update processor I'm developing. It
> takes an input field of the following format:
> 
> field:value;mvfield:value1;mvfield:value2
> 
> With an inner delimiter set to a colon and an outer delimiter set to a
> semicolon, this results in two new fields going into the document. The
> field named 'field' has one value and the field named mvfield has two.
> 
> This code uses the String#split method,  so it can't deal with the
> delimiter characters being escaped with a backslash.
> 
> How can I make the code deal with an escape character (backslash) on the
> two delimiters and the escape character itself? Unless it's absolutely
> necessary or super easy, I do not plan to deal with the full set of regex
> escaped characters.
> 
> I can move this discussion to the development list, but I thought I would
> start here.
> 
> Thanks,
> Shawn
> 
> 



Re: Splitting strings in Java - how to escape delimiter characters?

2014-01-14 Thread Yonik Seeley
Look at the StrUtils.splitSmart methods... the first variant treats
quotes specially,
the second variant doesn't (that's the one you probably want).

-Yonik
http://heliosearch.org -- off-heap filters for solr


On Tue, Jan 14, 2014 at 10:07 AM, Shawn Heisey  wrote:
> I have a Java question, for a custom update processor I'm developing. It
> takes an input field of the following format:
>
> field:value;mvfield:value1;mvfield:value2
>
> With an inner delimiter set to a colon and an outer delimiter set to a
> semicolon, this results in two new fields going into the document. The
> field named 'field' has one value and the field named mvfield has two.
>
> This code uses the String#split method,  so it can't deal with the
> delimiter characters being escaped with a backslash.
>
> How can I make the code deal with an escape character (backslash) on the
> two delimiters and the escape character itself? Unless it's absolutely
> necessary or super easy, I do not plan to deal with the full set of regex
> escaped characters.
>
> I can move this discussion to the development list, but I thought I would
> start here.
>
> Thanks,
> Shawn
>
>


How to override rollback behavior in DIH

2014-01-14 Thread Peter Keegan
I have a custom data import handler that creates an ExternalFileField from
a source that is different from the main index. If the import fails (in my
case, a connection refused in URLDataSource), I don't want to roll back any
uncommitted changes to the main index. However, this seems to be the
default behavior. Is there a way to override the IndexWriter rollback?

Thanks,
Peter


Re: Splitting strings in Java - how to escape delimiter characters?

2014-01-14 Thread Shawn Heisey

On 1/14/2014 8:20 AM, Steve Rowe wrote:

Solrj’s StrUtils.splitSmart() should do exactly what you want - in the first 
pass, split on semicolon and don’t decode backslash escaping, and then in the 
inner loop, use the same method to split on colons and decode backslash 
escaping.  I think :).


Thank you, Yonik and Steve! This seems to work perfectly. Here's what I 
did.  Naturally the whole thing is in a try/catch:


http://apaste.info/4beg

Shawn



[SolR 3.0] Boost score by string position in field

2014-01-14 Thread Sébastien LAMAISON
Hi all,
I'm almost new to SolR, and I have to make a improvment on a existing project, 
but despite some hours of searching, I'm stuck.
We have an index containing products, which the user can search by reference, 
or name.By now, when the user make a search by product name, the score is the 
same for all products containing the search string in the name.
For example, if the search string is "TEST", the following products have the 
same score : - BLAHBLAH TEST BLAH- TEST BLAHBLAH- BLAHBLAHBLAHBLAHBLAHBLAH TEST
My question is : how can I make TEST BLAHBLAH have a better score than BLAHBLAH 
TEST BLAH have a better score than BLAHBLAHBLAHBLAHBLAHBLAH TEST if the user 
search "TEST" ?
Thanks in advance.
Seb   

Re: [SolR 3.0] Boost score by string position in field

2014-01-14 Thread Erick Erickson
It's usually a mistake to try to tune at this level. The tf/idf
calculations _already_ take into account the field length (measured in
tokens) when scoring. Matches on shorter fields add more to the score
than matches on longer fields, which seems to be what you're looking
for.

That said, the length of the field is encoded and loses some precision
in order to save space (although, if memory serves, you can use higher
precision lately). So Solr/Lucene will tend to think that most fields
with, say, fewer than n tokens are all the same length.

Very often, people spend lots of time chasing this down in artificial
test cases only to discover that when looking at real data, the
tweaking is unnecessary.

So I'm not really giving you any guidance to do what you asked, rather
I'm suggesting that you don't even try :)..

Best,
Erick
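
A hedged illustration of the precision loss Erick mentions, assuming
DefaultSimilarity's single-byte norm encoding (it stores lengthNorm =
1/sqrt(numTokens) via SmallFloat.floatToByte315); the token counts are
arbitrary:

    import org.apache.lucene.util.SmallFloat;

    public class NormPrecisionDemo {
        public static void main(String[] args) {
            // Nearby field lengths can encode to the same byte and therefore
            // decode to the same norm, making them indistinguishable at scoring.
            for (int tokens : new int[] {25, 26, 27, 28}) {
                float norm = (float) (1.0 / Math.sqrt(tokens));
                byte encoded = SmallFloat.floatToByte315(norm);
                System.out.printf("tokens=%d norm=%.4f decoded=%.4f%n",
                        tokens, norm, SmallFloat.byte315ToFloat(encoded));
            }
        }
    }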

On Tue, Jan 14, 2014 at 12:11 PM, Sébastien LAMAISON  wrote:
> Hi all,
> I'm almost new to SolR, and I have to make an improvement to an existing
> project, but despite some hours of searching, I'm stuck.
> We have an index containing products, which the user can search by reference
> or name. For now, when the user makes a search by product name, the score is
> the same for all products containing the search string in the name.
> For example, if the search string is "TEST", the following products have the
> same score:
> - BLAHBLAH TEST BLAH
> - TEST BLAHBLAH
> - BLAHBLAHBLAHBLAHBLAHBLAH TEST
> My question is: how can I make TEST BLAHBLAH get a better score than
> BLAHBLAH TEST BLAH, which in turn gets a better score than
> BLAHBLAHBLAHBLAHBLAHBLAH TEST, if the user searches for "TEST"?
> Thanks in advance.
> Seb


core.properties and solr.xml

2014-01-14 Thread Steven Bower
Are there any plans/tickets to allow for pluggable SolrConf and
CoreLocator? In my use case my solr.xml is totally static, I have a
separate dataDir, and my core.properties are derived from a separate
configuration (living in ZK) but totally outside of SolrCloud.

I'd like to be able to not have any instance directories and/or no solr.xml
or core.properties files lying around, as right now I just regenerate them
on startup each time in my start scripts.

Obviously I can just hack my stuff in, and clearly this could break the
write side of the collections API (which I don't care about for my case),
but having a way to plug these in would be nice.

steve


Re: question about DIH solr-data-config.xml and XML include

2014-01-14 Thread Bill Au
The problem is with the admin UI not following the XML include to find the
entities, so it found none. DIH itself does support XML include, as I can
issue the DIH commands via HTTP on the included entities successfully.

Bill


On Mon, Jan 13, 2014 at 8:03 PM, Shawn Heisey  wrote:

> On 1/13/2014 3:31 PM, Bill Au wrote:
>
>> But when I use XML include, the Entity pull-down in the Dataimport section
>> of the Solr admin UI is empty.  I know that happens when there is a syntax
>> error in solr-data-config.xml.  Does DIH support XML include?  Also, I am
>> not seeing any error message in the log even if I set the log level to ALL.
>> Is there any way to get DIH to log what it thinks is wrong in
>> solr-data-config.xml?
>>
>
> Paying it forward.  Someone on this mailing list helped me with this.  I
> have tested this DIH config and found that it works:
>
> <dataConfig xmlns:xi="http://www.w3.org/2001/XInclude">
>   <dataSource driver="com.mysql.jdbc.Driver"
>       encoding="UTF-8"
>       url="jdbc:mysql://${dih.request.dbHost}:3306/${dih.request.dbSchema}?zeroDateTimeBehavior=convertToNull"
>       batchSize="-1"
>       user="REDACTED"
>       password="REDACTED"/>
>   [xi:include entity elements stripped by the list archive]
> </dataConfig>
>
> The xmlns:xi attribute in the outer tag makes it possible to use the
> xi:include syntax later.
>
> I make extensive use of this in my solrconfig.xml file. There's almost no
> actual config in that file, everything is included from other files.
>
> When you look at the config in the admin UI, you will not see the included
> text, you'll only see the xi:include tag.
>
> Thanks,
> Shawn
>
>


Re: Query time join with conditions

2014-01-14 Thread heaven
Can someone shed some light on this?





Re: core.properties and solr.xml

2014-01-14 Thread Erick Erickson
The work done as part of "new style" solr.xml, particularly by
romsegeek should make this a lot easier. But no, there's no formal
support for such a thing.

There's also a desire to make ZK "the one source of truth" in Solr 5,
although that effort is in early stages.

Which is a long way of saying that I think this would be a good thing
to add. Currently there's no formal way to specify one though. We'd
have to give some thought as to what abstract methods are required.
The current "old style" and "new style" classes . There's also the
chicken-and-egg question; how does one specify the new class? This
seems like something that would be in a (very small) solr.xml or
specified as a sysprop. And knowing where to load the class from could
be "interesting".

A pluggable SolrConfig I think is a stickier wicket, it hasn't been
broken out into nice interfaces like coreslocator has been. And it's
used all over the place, passed in and recorded in constructors etc,
as well as being possibly unique for each core. There's been some talk
of sharing a single config object, and there's also talk about using
"config sets" that might address some of those concerns, but neither
one has gotten very far in 4x land.

FWIW,
Erick

On Tue, Jan 14, 2014 at 1:41 PM, Steven Bower  wrote:
> Are there any plans/tickets to allow for pluggable SolrConf and
> CoreLocator? In my use case my solr.xml is totally static, i have a
> separate dataDir and my core.properties are derived from a separate
> configuration (living in ZK) but totally outside of the SolrCloud..
>
> I'd like to be able to not have any instance directories and/or no solr.xml
> or core.properties files laying around as right now I just regenerate them
> on startup each time in my start scripts..
>
> Obviously I can just hack my stuff in and clearly this could break the
> write side of the collections API (which i don't care about for my case)...
> but having a way to plug these would be nice..
>
> steve


Re: leading wildcard characters

2014-01-14 Thread Peter Keegan
I created SOLR-5630.
Although WildcardQuery is much, much faster now with AutomatonQuery, it can
still result in slow queries when used with multiple keywords. From my
testing, I think I will need to disable all WildcardQuerys and only allow
PrefixQuery.

Peter


On Sat, Jan 11, 2014 at 4:17 AM, Ahmet Arslan  wrote:

> Hi Peter,
>
> Yes you are correct. There is no way to disable it.
>
> The weird thing is that the javadoc says the default is false, but it is
> enabled by default in SolrQueryParserBase:
> boolean allowLeadingWildcard = true;
>
>
>
> http://search-lucene.com/jd/solr/solr-core/org/apache/solr/parser/SolrQueryParserBase.html#setAllowLeadingWildcard(boolean)
>
>
> There is an effort for making such properties (allowLeadingWildcard,
> fuzzyMinSim, fuzzyPrefixLength) configurable:
> https://issues.apache.org/jira/browse/SOLR-218
>
> But this one is somehow old. Since its description is stale, do you want
> to open a new one?
>
> Ahmet
>
>
> On Friday, January 10, 2014 6:12 PM, Peter Keegan 
> wrote:
> Removing ReversedWildcardFilterFactory  had no effect.
>
>
>
> On Fri, Jan 10, 2014 at 10:48 AM, Ahmet Arslan  wrote:
>
> > Hi Peter,
> >
> > Can you remove any occurrence of ReversedWildcardFilterFactory in
> > schema.xml? (even if you don't use it)
> >
> > Ahmet
> >
> >
> >
> > On Friday, January 10, 2014 3:34 PM, Peter Keegan <
> peterlkee...@gmail.com>
> > wrote:
> > How do you disable leading wildcards in 4.X? The setAllowLeadingWildcard
> > method is there in the parser, but nothing references the getter. Also,
> the
> > Edismax parser always enables it and provides no way to override.
> >
> > Thanks,
> > Peter
> >
> >
>
>


Re: core.properties and solr.xml

2014-01-14 Thread Alan Woodward
Hi Steve,

I think this is a great idea.  Currently the implementation of CoresLocator is 
picked depending on the type of solr.xml you have (new- vs old-style), but it 
should be easy enough to extend the new-style logic to optionally look up and 
instantiate a plugin implementation.

Core loading and new core creation is all done through the CL now, so as long 
as the plugin implemented all methods, it shouldn't break the Collections API 
either.

Do you want to open a JIRA?

Alan Woodward
www.flax.co.uk


On 14 Jan 2014, at 19:20, Erick Erickson wrote:

> The work done as part of "new style" solr.xml, particularly by
> romsegeek should make this a lot easier. But no, there's no formal
> support for such a thing.
> 
> There's also a desire to make ZK "the one source of truth" in Solr 5,
> although that effort is in early stages.
> 
> Which is a long way of saying that I think this would be a good thing
> to add. Currently there's no formal way to specify one though. We'd
> have to give some thought as to what abstract methods are required.
> The current "old style" and "new style" classes . There's also the
> chicken-and-egg question; how does one specify the new class? This
> seems like something that would be in a (very small) solr.xml or
> specified as a sysprop. And knowing where to load the class from could
> be "interesting".
> 
> A pluggable SolrConfig I think is a stickier wicket, it hasn't been
> broken out into nice interfaces like coreslocator has been. And it's
> used all over the place, passed in and recorded in constructors etc,
> as well as being possibly unique for each core. There's been some talk
> of sharing a single config object, and there's also talk about using
> "config sets" that might address some of those concerns, but neither
> one has gotten very far in 4x land.
> 
> FWIW,
> Erick
> 
> On Tue, Jan 14, 2014 at 1:41 PM, Steven Bower  wrote:
>> Are there any plans/tickets to allow for pluggable SolrConf and
>> CoreLocator? In my use case my solr.xml is totally static, i have a
>> separate dataDir and my core.properties are derived from a separate
>> configuration (living in ZK) but totally outside of the SolrCloud..
>> 
>> I'd like to be able to not have any instance directories and/or no solr.xml
>> or core.properties files laying around as right now I just regenerate them
>> on startup each time in my start scripts..
>> 
>> Obviously I can just hack my stuff in and clearly this could break the
>> write side of the collections API (which i don't care about for my case)...
>> but having a way to plug these would be nice..
>> 
>> steve



Re: Simple payloads example not working

2014-01-14 Thread Ahmet Arslan
Hi Michael
 
Did you re-index after you registered your custom similarity?


Ahmet



On Tuesday, January 14, 2014 4:36 PM, michael.boom  wrote:
Hi Markus, 

Do you have any example/tutorials of your payloads in custom filter
implementation ?

I really want to get payloads working, in any way.
Thanks!



-
Thanks,
Michael



Re: Simple payloads example not working

2014-01-14 Thread michael.boom
Hi Ahmet,

Yes, I did; I also tried various scenarios with the same outcome. I used the
stock example, with minimal customization (custom similarity and query
parser).



-
Thanks,
Michael


Re: [SolR 3.0] Boost score by string position in field

2014-01-14 Thread Ahmet Arslan
Hi Sebastien,

I think you want to boost product names that start with the query term, right?
Or, in other words, boost if the query term occurs within the first N words of
a document.

SpanFirstQuery seems an elegant way to do it.
https://issues.apache.org/jira/browse/SOLR-3925

Alternatively, one can add an artificial token at the beginning of the text:

someArtificialToken TEST BLAHBLAH

and use a phrase query to boost: "someArtificialToken queryTerm"~N

Ahmet
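
A hedged, concrete version of that second idea, assuming the artificial token
was prepended at index time and the product name lives in a field called name;
the slop and boost values are arbitrary:

    q=TEST&defType=edismax&bq=name:"someArtificialToken TEST"~2^10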
On Tuesday, January 14, 2014 7:13 PM, Sébastien LAMAISON  
wrote:
Hi all,
I'm almost new to SolR, and I have to make an improvement to an existing
project, but despite some hours of searching, I'm stuck.
We have an index containing products, which the user can search by reference
or name. For now, when the user makes a search by product name, the score is
the same for all products containing the search string in the name.
For example, if the search string is "TEST", the following products have the
same score:
- BLAHBLAH TEST BLAH
- TEST BLAHBLAH
- BLAHBLAHBLAHBLAHBLAHBLAH TEST
My question is: how can I make TEST BLAHBLAH get a better score than BLAHBLAH
TEST BLAH, which in turn gets a better score than BLAHBLAHBLAHBLAHBLAHBLAH
TEST, if the user searches for "TEST"?
Thanks in advance.
Seb


SolrCloud Result Grouping vs CollapsingQParserPlugin

2014-01-14 Thread Shamik Bandopadhyay
Hi,

  I'm planning to upgrade to Solr 4.6 to move from using Result Grouping to
CollapsingQParserPlugin. I'm currently using SolrCloud; a couple of issues
with Result Grouping are:

1. Slow performance
2. Incorrect result count from ngroup

My understanding is that CollapsingQParserPlugin is aimed at addressing the
performance issue with Result Grouping. Based on the available
documentation, I'm not sure if CollapsingQParserPlugin addresses the result
count when the collapse field is spread across shards. The Result Grouping
ngroups count currently works only if the groups are not distributed and are
confined to a dedicated shard. Just wondering if this applies to
CollapsingQParserPlugin as well? Will the result count be incorrect if the
collapsed field is distributed?

I'll really appreciate if someone can provide pointers on this.

Thanks,
Shamik


Re: dataimport.properties files

2014-01-14 Thread samsolr
It's last_index_time, which is written after the data import finishes
successfully. In case of an error, the file is unchanged and nothing is
updated.



-
Sumit Arora
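
For illustration, a dataimport.properties typically looks something like this
(timestamps made up; the colons are backslash-escaped by the Java Properties
format Solr uses to write the file):

    #Tue Jan 14 10:15:00 UTC 2014
    last_index_time=2014-01-14 10\:15\:00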


Re: SolrCloud Result Grouping vs CollapsingQParserPlugin

2014-01-14 Thread Joel Bernstein
Shamik,

You still need to keep docs in the same group on the same shard with the
CollapsingQParserPlugin. If you use the group id as the shard-key with
SolrCloud's automatic document routing (
http://searchhub.org/2013/06/13/solr-cloud-document-routing/), the groups
will automatically end up on the same shard.




Joel Bernstein
Search Engineer at Heliosearch
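
A hedged illustration of that routing scheme with made-up values: under the
composite-id syntax from the referenced post, everything before the '!' is
hashed to pick the shard, so ids sharing a prefix land on the same shard.

    ABCD-XYZ!http://example.com/page1   -> shard chosen by the hash of "ABCD-XYZ"
    ABCD-XYZ!http://example.com/page2   -> same shard as page1
    MNOP-QRS!http://example.com/page3   -> possibly a different shard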


On Tue, Jan 14, 2014 at 6:17 PM, Shamik Bandopadhyay wrote:

> Hi,
>
>   I'm planning to upgrade to Solr 4.6 to move from using Result Grouping to
> CollapsingQParserPlugin. I'm currently using SolrCloud, couple of issues
> with Result Grouping are :
>
> 1. Slow performance
> 2. Incorrect result count from ngroup
>
> My understanding is that CollapsingQParserPlugin is aimed at addressing the
> performance issue with Result Grouping. Based on the available
> documentation, I'm not sure if CollapsingQParserPlugin addresses the result
> count when the collapse field is spread across shards. The Result Grouping
> ngroups count currently works only if the groups are not distributed and are
> confined to a dedicated shard. Just wondering if this applies to
> CollapsingQParserPlugin as well? Will the result count be incorrect if the
> collapsed field is distributed?
>
> I'll really appreciate if someone can provide pointers on this.
>
> Thanks,
> Shamik
>


Re: SolrCloud Result Grouping vs CollapsingQParserPlugin

2014-01-14 Thread Joel Bernstein
Also, there are a number of bugs in the CollapsingQParserPlugin in Solr 4.6
that are resolved in Solr 4.6.1 which should be out soon.

Joel Bernstein
Search Engineer at Heliosearch


On Tue, Jan 14, 2014 at 10:00 PM, Joel Bernstein  wrote:

> Shamik,
>
> You still need to keep docs in the same group on the same shard with the
> CollapsingQParserPlugin. If you use the group id as the shard-key with
> SolrCloud's automatic document routing (
> http://searchhub.org/2013/06/13/solr-cloud-document-routing/), the groups
> will automatically end up on the same shard.
>
>
>
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Tue, Jan 14, 2014 at 6:17 PM, Shamik Bandopadhyay wrote:
>
>> Hi,
>>
>>   I'm planning to upgrade to Solr 4.6 to move from using Result Grouping
>> to
>> CollapsingQParserPlugin. I'm currently using SolrCloud, couple of issues
>> with Result Grouping are :
>>
>> 1. Slow performance
>> 2. Incorrect result count from ngroup
>>
>> My understanding is that CollapsingQParserPlugin is aimed at addressing the
>> performance issue with Result Grouping. Based on the available
>> documentation, I'm not sure if CollapsingQParserPlugin addresses the result
>> count when the collapse field is spread across shards. The Result Grouping
>> ngroups count currently works only if the groups are not distributed and are
>> confined to a dedicated shard. Just wondering if this applies to
>> CollapsingQParserPlugin as well? Will the result count be incorrect if the
>> collapsed field is distributed?
>>
>> I'll really appreciate if someone can provide pointers on this.
>>
>> Thanks,
>> Shamik
>>
>
>


Re: Questionon CollapsingQParserPlugin

2014-01-14 Thread Joel Bernstein
Something is off but I'm not sure what. A couple of questions.

1) You mention updating the solr.xml. Did you also update the schema.xml?
2) Did you load only those 4 docs?

Joel

Joel Bernstein
Search Engineer at Heliosearch


On Mon, Jan 13, 2014 at 4:21 PM, Shamik Bandopadhyay wrote:

> Hi,
>
>   I'm looking for some clarification on CollapsingQParserPlugin feature.
>
> Here's what I tried. I downloaded 4.6, updated "solr.xml" under the
> exampledocs folder and added the following entries. I've added a new field,
> "adskdedup", on which I'm planning to test field collapsing. As you can see,
> out of four documents, three have identical adskdedup values while the last
> one is different.
>
> <doc>
>   <field name="id">SOLR1000</field>
>   <field name="name">Solr, the Enterprise Search Server</field>
>   <field name="price">0</field>
>   <field name="popularity">10</field>
>   <field name="inStock">true</field>
>   <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
>   <field name="adskdedup">ABCD-XYZ</field>
> </doc>
> <doc>
>   <field name="id">SOLR1001</field>
>   <field name="name">Solr, the Enterprise Search Server</field>
>   <field name="price">0</field>
>   <field name="popularity">10</field>
>   <field name="inStock">true</field>
>   <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
>   <field name="adskdedup">ABCD-XYZ</field>
> </doc>
> <doc>
>   <field name="id">SOLR1002</field>
>   <field name="name">Solr, the Enterprise Search Server</field>
>   <field name="price">0</field>
>   <field name="popularity">10</field>
>   <field name="inStock">true</field>
>   <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
>   <field name="adskdedup">ABCD-XYZ</field>
> </doc>
> <doc>
>   <field name="id">SOLR1003</field>
>   <field name="name">Solr, the Enterprise Search Server</field>
>   <field name="price">0</field>
>   <field name="popularity">10</field>
>   <field name="inStock">true</field>
>   <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
>   <field name="adskdedup">MNOP-QRS</field>
> </doc>
>
> Here's my query :
>
>
> http://localhost:8983/solr/collection1/select?q=solr&wt=xml&fq={!collapse%20field=adskdedup}
>
> Based on my understanding of using group by, I was expecting couple of
> results from the query. One with id=SOLR1000 and the second with
> id=SOLR1003. Instead, its returning only 1 result based on the field
> collapsing, i.e. id=SOLR1000.
>
> Am I missing something here ?
>
> Any pointer will be appreciated.
>
> -Thanks
>


Re: Questionon CollapsingQParserPlugin

2014-01-14 Thread Joel Bernstein
I just did a quick test with the 4 docs and got the proper result.
All I did was change the adskdedup field to adskdedup_s so it would load as
a dynamic string field. You can see the output below.

Can you provide more details on the exact steps you took?

{
  "responseHeader":{
"status":0,
"QTime":24,
"params":{
  "indent":"true",
  "q":"*:*",
  "wt":"json",
  "fq":"{!collapse field=adskdedup_s}"}},
  "response":{"numFound":2,"start":0,"docs":[
  {
"id":"SOLR1000",
"name":"Solr, the Enterprise Search Server",
"price":0.0,
"price_c":"0,USD",
"popularity":10,
"inStock":true,
"incubationdate_dt":"2006-01-17T00:00:00Z",
"adskdedup_s":"ABCD-XYZ",
"_version_":1457264913719230464},
  {
"id":"SOLR1003",
"name":"Solr, the Enterprise Search Server",
"price":0.0,
"price_c":"0,USD",
"popularity":10,
"inStock":true,
"incubationdate_dt":"2006-01-17T00:00:00Z",
"adskdedup_s":"MNOP-QRS",
"_version_":1457264913752784896}]
  }}


Joel Bernstein
Search Engineer at Heliosearch


On Tue, Jan 14, 2014 at 10:10 PM, Joel Bernstein  wrote:

> Something is off but I'm not sure what. A couple of questions.
>
> 1) You mention updating the solr.xml. Did you also update the schema.xml?
> 2) Did you load only those 4 docs?
>
> Joel
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Mon, Jan 13, 2014 at 4:21 PM, Shamik Bandopadhyay wrote:
>
>> Hi,
>>
>>   I'm looking for some clarification on CollapsingQParserPlugin feature.
>>
>> Here's what I tried. I downloaded 4.6, updated "solr.xml" under
>> exampledocs
>> folder and added the following entry. I've added a new field "adskdedup"
>> on which I'm planning to test field collapsing. As you can see, out of
>> four
>> documents, three have similar adskdedup values while the last one is
>> different.
>>
>> <doc>
>>   <field name="id">SOLR1000</field>
>>   <field name="name">Solr, the Enterprise Search Server</field>
>>   <field name="price">0</field>
>>   <field name="popularity">10</field>
>>   <field name="inStock">true</field>
>>   <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
>>   <field name="adskdedup">ABCD-XYZ</field>
>> </doc>
>> <doc>
>>   <field name="id">SOLR1001</field>
>>   <field name="name">Solr, the Enterprise Search Server</field>
>>   <field name="price">0</field>
>>   <field name="popularity">10</field>
>>   <field name="inStock">true</field>
>>   <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
>>   <field name="adskdedup">ABCD-XYZ</field>
>> </doc>
>> <doc>
>>   <field name="id">SOLR1002</field>
>>   <field name="name">Solr, the Enterprise Search Server</field>
>>   <field name="price">0</field>
>>   <field name="popularity">10</field>
>>   <field name="inStock">true</field>
>>   <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
>>   <field name="adskdedup">ABCD-XYZ</field>
>> </doc>
>> <doc>
>>   <field name="id">SOLR1003</field>
>>   <field name="name">Solr, the Enterprise Search Server</field>
>>   <field name="price">0</field>
>>   <field name="popularity">10</field>
>>   <field name="inStock">true</field>
>>   <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
>>   <field name="adskdedup">MNOP-QRS</field>
>> </doc>
>>
>> Here's my query :
>>
>>
>> http://localhost:8983/solr/collection1/select?q=solr&wt=xml&fq={!collapse%20field=adskdedup}
>>
>> Based on my understanding of using group by, I was expecting couple of
>> results from the query. One with id=SOLR1000 and the second with
>> id=SOLR1003. Instead, its returning only 1 result based on the field
>> collapsing, i.e. id=SOLR1000.
>>
>> Am I missing something here ?
>>
>> Any pointer will be appreciated.
>>
>> -Thanks
>>
>
>


Re: Questionon CollapsingQParserPlugin

2014-01-14 Thread Joel Bernstein
Just tried it with q=solr as well:

{
  "responseHeader":{
"status":0,
"QTime":1,
"params":{
  "indent":"true",
  "q":"solr",
  "wt":"json",
  "fq":"{!collapse field=adskdedup_s}"}},
  "response":{"numFound":2,"start":0,"docs":[
  {
"id":"SOLR1000",
"name":"Solr, the Enterprise Search Server",
"price":0.0,
"price_c":"0,USD",
"popularity":10,
"inStock":true,
"incubationdate_dt":"2006-01-17T00:00:00Z",
"adskdedup_s":"ABCD-XYZ",
"_version_":1457264913719230464},
  {
"id":"SOLR1003",
"name":"Solr, the Enterprise Search Server",
"price":0.0,
"price_c":"0,USD",
"popularity":10,
"inStock":true,
"incubationdate_dt":"2006-01-17T00:00:00Z",
"adskdedup_s":"MNOP-QRS",
"_version_":1457264913752784896}]
  }}


Joel Bernstein
Search Engineer at Heliosearch


On Tue, Jan 14, 2014 at 10:25 PM, Joel Bernstein  wrote:

> I just did a quick test with the 4 docs and got the proper result.
> All I did was change the adskdedup field to adskdedup_s so it would load
> as a dynamic string field. You can see the output below.
>
> Can you provide more details on the exact steps you took?
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":24,
> "params":{
>   "indent":"true",
>   "q":"*:*",
>   "wt":"json",
>   "fq":"{!collapse field=adskdedup_s}"}},
>   "response":{"numFound":2,"start":0,"docs":[
>   {
> "id":"SOLR1000",
> "name":"Solr, the Enterprise Search Server",
> "price":0.0,
> "price_c":"0,USD",
> "popularity":10,
> "inStock":true,
> "incubationdate_dt":"2006-01-17T00:00:00Z",
> "adskdedup_s":"ABCD-XYZ",
> "_version_":1457264913719230464},
>   {
> "id":"SOLR1003",
> "name":"Solr, the Enterprise Search Server",
> "price":0.0,
> "price_c":"0,USD",
> "popularity":10,
> "inStock":true,
> "incubationdate_dt":"2006-01-17T00:00:00Z",
> "adskdedup_s":"MNOP-QRS",
> "_version_":1457264913752784896}]
>   }}
>
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Tue, Jan 14, 2014 at 10:10 PM, Joel Bernstein wrote:
>
>> Something is off but I'm not sure what. A couple of questions.
>>
>> 1) You mention updating the solr.xml. Did you also update the schema.xml?
>> 2) Did you load only those 4 docs?
>>
>> Joel
>>
>> Joel Bernstein
>> Search Engineer at Heliosearch
>>
>>
>> On Mon, Jan 13, 2014 at 4:21 PM, Shamik Bandopadhyay 
>> wrote:
>>
>>> Hi,
>>>
>>>   I'm looking for some clarification on CollapsingQParserPlugin feature.
>>>
>>> Here's what I tried. I downloaded 4.6, updated "solr.xml" under
>>> exampledocs
>>> folder and added the following entry. I've added a new field "adskdedup"
>>> on which I'm planning to test field collapsing. As you can see, out of
>>> four
>>> documents, three have similar adskdedup values while the last one is
>>> different.
>>>
>>> <doc>
>>>   <field name="id">SOLR1000</field>
>>>   <field name="name">Solr, the Enterprise Search Server</field>
>>>   <field name="price">0</field>
>>>   <field name="popularity">10</field>
>>>   <field name="inStock">true</field>
>>>   <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
>>>   <field name="adskdedup">ABCD-XYZ</field>
>>> </doc>
>>> <doc>
>>>   <field name="id">SOLR1001</field>
>>>   <field name="name">Solr, the Enterprise Search Server</field>
>>>   <field name="price">0</field>
>>>   <field name="popularity">10</field>
>>>   <field name="inStock">true</field>
>>>   <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
>>>   <field name="adskdedup">ABCD-XYZ</field>
>>> </doc>
>>> <doc>
>>>   <field name="id">SOLR1002</field>
>>>   <field name="name">Solr, the Enterprise Search Server</field>
>>>   <field name="price">0</field>
>>>   <field name="popularity">10</field>
>>>   <field name="inStock">true</field>
>>>   <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
>>>   <field name="adskdedup">ABCD-XYZ</field>
>>> </doc>
>>> <doc>
>>>   <field name="id">SOLR1003</field>
>>>   <field name="name">Solr, the Enterprise Search Server</field>
>>>   <field name="price">0</field>
>>>   <field name="popularity">10</field>
>>>   <field name="inStock">true</field>
>>>   <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
>>>   <field name="adskdedup">MNOP-QRS</field>
>>> </doc>
>>>
>>> Here's my query :
>>>
>>>
>>> http://localhost:8983/solr/collection1/select?q=solr&wt=xml&fq={!collapse%20field=adskdedup}
>>>
>>> Based on my understanding of using group by, I was expecting couple of
>>> results from the query. One with id=SOLR1000 and the second with
>>> id=SOLR1003. Instead, its returning only 1 result based on the field
>>> collapsing, i.e. id=SOLR1000.
>>>
>>> Am I missing something here ?
>>>
>>> Any pointer will be appreciated.
>>>
>>> -Thanks
>>>
>>
>>
>


Re: Index size - to determine storage

2014-01-14 Thread Sumit Arora
Hi Amit,

This Excel sheet will help you estimate the index size:

size-estimator-lucene-solr.xls




-
Sumit Arora


Re: Query time join with conditions

2014-01-14 Thread Kranti Parisa
you should be able to do the following
/ProfileCore/select?q=*:*&fq={!join fromIndex=RssCore from=profile_id to=id
v=$rssQuery}&rssQuery=(type:'RssEntry')

There is also a new join impl
https://issues.apache.org/jira/browse/SOLR-4787 which allows you to use fq
within join, which will support Nested Joins and obviously hit filter cache.

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Tue, Jan 14, 2014 at 2:20 PM, heaven  wrote:

> Can someone shed some light on this?
>
>
>
>


Re: background merge hit exception while optimizing index (SOLR 4.4.0)

2014-01-14 Thread Ralf Matulat

It just gets spookier.

The optimize run this night was successful.
Yesterday I did two things:

1. Checked the index, without any result (no problems found).
2. Did an expungeDeletes on the mentioned index.

So I have no idea what is going on here.
Btw: the Java version is still the old 1.6.0 from 2008.

Best regards
Ralf

Am 13.01.14 21:15, schrieb Michael McCandless:

I have trouble understanding J9's version strings ... but, is it
really from 2008?  You could be hitting a JVM bug; can you test
upgrading?

I don't have much experience with Solr faceting on optimized vs
unoptimized indices; maybe someone else can answer your question.

Lucene's facet module (not yet exposed through Solr) performance
shouldn't change much for optimized vs unoptimized indices.

Mike McCandless

http://blog.mikemccandless.com


On Mon, Jan 13, 2014 at 10:09 AM, Ralf Matulat
 wrote:

java -version

java version "1.6.0"
Java(TM) SE Runtime Environment (build
pxa6460sr3ifix-20090218_02(SR3+IZ43791+IZ43798))
IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 Linux amd64-64
jvmxa6460-20081105_25433 (JIT enabled, AOT enabled)
J9VM - 20081105_025433_LHdSMr
JIT  - r9_20081031_1330
GC   - 20081027_AB)
JCL  - 20090218_01

A question regarding optimizing the index:
As of SOLR 3.x we encountered massive performance improvements with faceted
queries after optimizing an index. So at some point we started optimizing the
indexes on a daily basis.
With SOLR 4.x and the new index format, is that no longer true?

Btw: the checkIndex failed with 'java.io.FileNotFoundException', I guess
because I did not stop the Tomcat while checking. So SOLR created, merged
and deleted some segments while checking. I will restart the check after
stopping SOLR.

Kind regards
Ralf Matulat




Which version of Java are you using?

That root cause exception is somewhat spooky: it's in the
ByteBufferIndexInput code that handles a BufferUnderflowException, i.e. when a
small (maybe a few hundred bytes) read happens to span the 1 GB page
boundary, and specifically the exception happens on the final read
(curBuf.get(b, offset, len)).  Such page-spanning reads are very rare.

The code looks fine to me though, and it's hard to explain how NPE (b
= null) could happen: that byte array is allocated in the
Lucene41PostingsReader.BlockDocsAndPositionsEnum class's ctor: encoded
= new byte[MAX_ENCODED_SIZE].

Separately, you really should not have to optimize daily, if ever.

Mike McCandless

http://blog.mikemccandless.com






--
Ralf Matulat
Deutscher Bundestag
Platz der Republik 1
11011 Berlin
Referat IT 1 - Anwendungsadministration
ralf.matu...@bundestag.de
Tel.: 030 - 227 34260



Re: SolrCloud Result Grouping vs CollapsingQParserPlugin

2014-01-14 Thread shamik
Joel,

  Thanks for the pointer. I went through your blog on Document routing, very
informative. I do need some clarifications on the implementation. I'll try
to run it based on my use case. 

I'm indexing documents from multiple source systems, a bunch of which
consist of duplicate content. I'm trying to remove them by applying Result
Grouping / CollapsingQParserPlugin. For example, let's say I have sources
ABC, MNO and XYZ. Now, ABC and MNO contain the duplicate documents, which
are identified by a field, say adskdedup. I have a couple of shards, the id
being the URL of the documents. Now, to make field collapsing work, I need
to update the id field to include "adskdedup!url". Documents having
identical adskdedup values should route to a dedicated shard, e.g. shard1.
The ones which are not identical will be routed to either shard1 or shard2.
After the indexing is done, shard1 should have all documents on which
grouping needs to be applied.

At query time, depending on the query, results can be returned from both
shards. For example, a query
q=solr&group=true&group.field=adskdedup&group.ngroups=true would ideally
return data from both shards and apply the grouping on shard1 based on the
adskdedup field. This will also ensure that group.ngroups=true returns
the right count.

The other clarification I wanted was based on this statement: "When a
tenant is too large to fit on a single shard it can be spread across
multiple shards by specifying the number of bits to use from the shard key."
If we split shards, will Result Grouping / CollapsingQParserPlugin and the
number of results still work?

Last but not least, when are you planning to release 4.6.1?

Again, appreciate your help on this.

- Thanks





Re: Questionon CollapsingQParserPlugin

2014-01-14 Thread shamik
Thanks Joel, I found the issue. It had to do with the schema definition for
the adskdedup field. I had defined it as text_general, which was tokenizing
it on "-". After I changed it to type string, it worked as expected.
Thanks for looking into this.


