Re: Solrj problem

2014-05-17 Thread Shawn Heisey
On 5/7/2014 7:41 AM, blach wrote:
> according to this : https://issues.apache.org/jira/browse/SOLR-5590
> 
> I understand that SolrJ still depends on the old HttpClient shipped with
> the Android tools, and this is my problem too. Karl has made a patch; could
> you please explain what that patch is for?

I committed this patch to Solr.  It upgraded the HttpClient jars used in
Solr to 4.3.1.

This change reached users with the 4.7.0 release.  No code changes were
required -- Solr and SolrJ were already compatible with the newer
HttpClient version.  It should also work with HttpClient 4.2.x, as well
as the newest release.
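
As a side note, if you want to control the HttpClient version explicitly
rather than rely on whatever jars happen to be on the classpath, SolrJ will
accept a client you build yourself.  This is only a minimal sketch against
the SolrJ 4.x API -- the base URL, core name, and pool sizes are placeholders:

import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocumentList;

public class ExplicitHttpClient {
    public static void main(String[] args) throws Exception {
        // Build an HttpClient 4.3.x instance ourselves (placeholder pool sizes).
        HttpClient httpClient = HttpClientBuilder.create()
                .setMaxConnTotal(128)
                .setMaxConnPerRoute(32)
                .build();

        // Hand it to SolrJ; the base URL is a placeholder.
        HttpSolrServer solr =
                new HttpSolrServer("http://localhost:8983/solr/collection1", httpClient);
        try {
            SolrDocumentList docs = solr.query(new SolrQuery("*:*")).getResults();
            System.out.println("numFound=" + docs.getNumFound());
        } finally {
            solr.shutdown(); // releases SolrJ resources; the HttpClient lifecycle stays with us
        }
    }
}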

Thanks,
Shawn



Re: Solr Committer

2014-05-17 Thread Shawn Heisey
On 5/15/2014 6:10 AM, Mukundaraman valakumaresan wrote:
> How to become a solr committer? Any suggestions?

For me, this question has personal relevance.

In 2010, I began to integrate Solr into our environment.  I joined the
mailing list, asked questions, stumbled around quite a lot.  Eventually
I got my install working very well, and I discovered that when others
would ask questions, I sometimes knew the answer, so I started answering
a lot more questions than I asked.

Eventually, I also joined the dev list, began to learn Java, and started
contributing patches, mostly to issues that I would file myself, but
sometimes for other issues.  My name ended up in the CHANGES.txt more
than once.  A little over a year ago, the Lucene PMC asked me to become
a committer.  I was not pursuing this as a goal, so it was completely
unexpected.  I accepted the offer.

My advice would be to put some serious time and effort into making Solr
better.  As the following wiki page says, this involves a lot more than
writing code.

http://wiki.apache.org/solr/HowToContribute

Thanks,
Shawn



Solr 4.8: Does eDisMax parser call analyzer chain to tokenize?

2014-05-17 Thread Alexandre Rafalovitch
Hello,

I am getting weird results that seem to come from eDisMax using the
analyzer chain to break up the input text. I have
WordDelimiterFilterFactory in my chain, which does a lot of
interesting things I did not expect the query parser to be involved in.

Specifically, the string "abc123XYZ" gets split into 3 components on
digits and gets lowercased as well. I thought all that was happening
later, inside individual fields.

All documentation talks about query parsers splitting on space, so I
don't know where this "full chain" business is coming from. Or maybe I
am misunderstanding which phase the debug output is from.

Here is the field definition:

(field type XML for the wdText and wsText fields was stripped by the list archive)

And here is the debug output:
http://localhost:9000/solr/collection1/select?q=hello+big+world+abc123XYZ&wt=json&indent=true&debugQuery=true&defType=edismax&qf=wdText+wsText&stopwords=true&lowercaseOperators=true

   "rawquerystring":"hello big world abc123XYZ",
"querystring":"hello big world abc123XYZ",
"parsedquery":"(+(DisjunctionMaxQuery((wdText:hello |
wsText:hello)) DisjunctionMaxQuery((wdText:big | wsText:big))
DisjunctionMaxQuery((wdText:world | wsText:world))
DisjunctionMaxQuery((((wdText:abc123xyz wdText:abc) wdText:123
wdText:xyz) | wsText:abc123XYZ))))/no_coord",
"parsedquery_toString":"+((wdText:hello | wsText:hello)
(wdText:big | wsText:big) (wdText:world | wsText:world)
(((wdText:abc123xyz wdText:abc) wdText:123 wdText:xyz) |
wsText:abc123XYZ))",

Oh, and enabling phrase search on the field type gets even more
weird. But one problem at a time.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


Re: retrieve all the fields in join

2014-05-17 Thread Ahmet Arslan



Hi Kranti,

I was thinking the same. DocTransformer looks like a good candidate for such an
implementation. What do you think? Maybe we can implement this and contribute
it back?
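
For what it's worth, a bare-bones sketch of that idea against the Solr 4.x
DocTransformer/TransformerFactory API is below. The transformer name, the
joinkey field and the second-core lookup are hypothetical placeholders, not
working join code:

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;

/**
 * Would be registered as a <transformer> in solrconfig.xml and requested as
 * fl=*,[joined], letting the response writer enrich each outer-core document
 * with fields fetched from the joined core.
 */
public class JoinFieldsTransformerFactory extends TransformerFactory {

  @Override
  public DocTransformer create(final String field, SolrParams params, SolrQueryRequest req) {
    return new DocTransformer() {
      @Override
      public String getName() {
        return field; // e.g. "[joined]"
      }

      @Override
      public void transform(SolrDocument doc, int docid) {
        Object joinKey = doc.getFieldValue("joinkey"); // hypothetical join field
        // TODO: look up joinKey in the second core (for example via an fq that
        // hits the filter cache, as Kranti suggests) and copy the wanted fields.
        doc.setField("joined_placeholder", "value-for-" + joinKey);
      }
    };
  }
}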




On Saturday, May 17, 2014 9:13 AM, Kranti Parisa  
wrote:
Aman,

The options you have are:
- write custom components like request handlers, collectors & response
writers
- first you would do the join, then apply the pagination
- you will get the docList in the response writer; you would need to make a
call to the second core (you could be smart and use FQs so that you
hit the cache and hence the second call will be fast) and fetch the
documents
- use them for building the response

Out of the box, Solr won't do this for you.

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa




On Mon, May 12, 2014 at 7:05 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> On Sun, May 11, 2014 at 12:14 PM, Aman Tandon wrote:
>
> > Is it possible?
>
>
> no.
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
>  
>



Re: What is the usage of solr.NumericPayloadTokenFilterFactory

2014-05-17 Thread Ahmet Arslan
Hi,


Payloads are used to store arbitrary data along with terms. You can influence
the score with this data.
See: http://sujitpal.blogspot.com.tr/2013/07/porting-payloads-to-solr4.html

But remember that there is ongoing work to nuke Spans.
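
For a concrete picture, here is a minimal sketch using the Lucene 4.x classes
(the field name, delimiter and weights are made up): terms are indexed as
"term|weight" through DelimitedPayloadTokenFilter, and a PayloadTermQuery can
then pull those weights into the score. Note that Similarity.scorePayload()
must also be overridden to decode the payload bytes; the default
implementation simply returns 1.

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
import org.apache.lucene.analysis.payloads.FloatEncoder;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.payloads.AveragePayloadFunction;
import org.apache.lucene.search.payloads.PayloadTermQuery;
import org.apache.lucene.util.Version;

public class PayloadSketch {

  /** Index-time analyzer: "solr|5.0 lucene|1.0" becomes terms carrying float payloads. */
  static Analyzer payloadAnalyzer() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        WhitespaceTokenizer source = new WhitespaceTokenizer(Version.LUCENE_48, reader);
        TokenStream sink = new DelimitedPayloadTokenFilter(source, '|', new FloatEncoder());
        return new TokenStreamComponents(source, sink);
      }
    };
  }

  public static void main(String[] args) {
    // Query-time: score matches on the hypothetical "tags" field by their payloads.
    Query q = new PayloadTermQuery(new Term("tags", "solr"), new AveragePayloadFunction());
    System.out.println(q);
  }
}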

Ahmet



On Saturday, May 17, 2014 8:24 AM, ienjreny  wrote:
Regarding to your question: "That said, are you sure you want to be using
the payload feature of Lucene? "

I don't know because I don't know what is the benefits from this tokenizer,
and what Payload means here!


On Sat, May 17, 2014 at 2:45 AM, Jack Krupansky-2 [via Lucene] <
ml-node+s472066n4136467...@n3.nabble.com> wrote:

> I do have basic coverage for that filter (and all other filters) and the
> parameter values in my e-book:
>
>
> http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
>
> That said, are you sure you want to be using the payload feature of
> Lucene?
>
> -- Jack Krupansky
>
> -Original Message-
> From: ienjreny
> Sent: Monday, May 12, 2014 12:51 PM
> To: [hidden email] 
> Subject: What is the usage of solr.NumericPayloadTokenFilterFactory
>
> Dears:
> Can any body explain at easy way what is the benefits of
> solr.NumericPayloadTokenFilterFactory and what is acceptable values for
> typeMatch
>
> Thanks in advance
>
>
>






Re: Solr performance: multiValued filed vs separate fields

2014-05-17 Thread Yonik Seeley
On Thu, May 15, 2014 at 10:29 AM, danny teichthal  wrote:
> I wonder about performance difference of 2 indexing options: 1- multivalued
> field 2- separate fields
>
> The case is as follows: Each document has 100 “properties”: prop1..prop100.
> The values are strings and there is no relation between different
> properties. I would like to search by exact match on several properties by
> known values (like ids). For example: search for all docs having
> prop1=”blue” and prop6=”high”
>
> I can choose to build the indexes in 1 of 2 ways: 1- the trivial way – 100
> separate fields, 1 for each property, multiValued=false. the values are
> just property values. 2- 1 field (named “properties”) multiValued=true. The
> field will have 100 values: value1=”prop1:blue”.. value6=”prop6:high” etc
>
> Is it correct to say that option1 will have much better performance in
> searching?  How about indexing performance?

For straight exact-match searching (matching properties) there should
be no difference.  A single field should be slightly faster at
indexing.

If you need fast numeric range queries, faceting, or sorting on any
properties, you would want those as separate fields.
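
To make the two layouts concrete, here is a small SolrJ-flavored sketch
(field names and values are made up). Option 1 keeps one single-valued field
per property; option 2 packs everything into a single multiValued field and
encodes the property name inside the value, which also means exact-match
filters are easiest with the term query parser because of the embedded colon:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrInputDocument;

public class PropertyLayouts {
  public static void main(String[] args) {
    // Option 1: one single-valued field per property.
    SolrInputDocument separate = new SolrInputDocument();
    separate.addField("id", "doc-1");
    separate.addField("prop1", "blue");
    separate.addField("prop6", "high");
    SolrQuery q1 = new SolrQuery("*:*");
    q1.addFilterQuery("prop1:blue", "prop6:high");

    // Option 2: a single multiValued "properties" field, property name in the value.
    SolrInputDocument packed = new SolrInputDocument();
    packed.addField("id", "doc-1");
    packed.addField("properties", "prop1:blue");
    packed.addField("properties", "prop6:high");
    SolrQuery q2 = new SolrQuery("*:*");
    // The values contain ':', so exact match is easiest with the term query parser.
    q2.addFilterQuery("{!term f=properties}prop1:blue", "{!term f=properties}prop6:high");

    System.out.println(q1 + "\n" + q2);
  }
}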

-Yonik
http://heliosearch.org - facet functions, subfacets, off-heap filters&fieldcache


Re: Solr 4.8: Does eDisMax parser call analyzer chain to tokenize?

2014-05-17 Thread Michael Sokolov
Alex - the query parsers generally accept an analyzer, which they must 
apply after they perform their own tokenization.  Consider: how would a 
capitalized query term match lower-cased terms in the index without 
query analysis?
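
A quick way to see this with Lucene's classic query parser (a sketch against
the Lucene 4.x API; the field name is arbitrary) -- the parser splits on its
own syntax and whitespace first, then runs each chunk through the analyzer it
was given:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class QueryAnalysisDemo {
  public static void main(String[] args) throws Exception {
    QueryParser qp = new QueryParser(Version.LUCENE_47, "text",
        new StandardAnalyzer(Version.LUCENE_47));
    // "Hello" is lower-cased by the analyzer even though the parser itself
    // only split the string on whitespace and query syntax.
    Query q = qp.parse("Hello WORLD");
    System.out.println(q);   // prints: text:hello text:world
  }
}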


-Mike

On 5/17/2014 4:05 AM, Alexandre Rafalovitch wrote:

Hello,

I am getting weird results that seem to come from eDisMax using
analyzer chain to break the input text. I have
WordDelimiterFilterFactory in my chain, which does a lot of
interesting things I did not expect query parser to be involved in.

Specifically, the string "abc123XYZ" gets split into 3 components on
digits and gets lowercased as well. I thought all that was happening
later, inside individual fields.

All documentation talks about query parsers splitting on space, so I
don't know where this "full chain" business is coming from. Or maybe I
am misunderstanding which phase debug output is from.

Here is the field definition:

(field type XML for the wdText and wsText fields was stripped by the list archive)

And here is the debug output:
http://localhost:9000/solr/collection1/select?q=hello+big+world+abc123XYZ&wt=json&indent=true&debugQuery=true&defType=edismax&qf=wdText+wsText&stopwords=true&lowercaseOperators=true

"rawquerystring":"hello big world abc123XYZ",
 "querystring":"hello big world abc123XYZ",
 "parsedquery":"(+(DisjunctionMaxQuery((wdText:hello |
wsText:hello)) DisjunctionMaxQuery((wdText:big | wsText:big))
DisjunctionMaxQuery((wdText:world | wsText:world))
DisjunctionMaxQuery((((wdText:abc123xyz wdText:abc) wdText:123
wdText:xyz) | wsText:abc123XYZ))))/no_coord",
 "parsedquery_toString":"+((wdText:hello | wsText:hello)
(wdText:big | wsText:big) (wdText:world | wsText:world)
(((wdText:abc123xyz wdText:abc) wdText:123 wdText:xyz) |
wsText:abc123XYZ))",

Or, and enabling phrase search on the field type, gets even more
weird. But one problem at a time.

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency




Re: What is the usage of solr.NumericPayloadTokenFilterFactory

2014-05-17 Thread Ahmet Arslan
Hi,

I forgot to include Grant's write-up: 
http://searchhub.org/2009/08/05/getting-started-with-payloads/





On Saturday, May 17, 2014 3:53 PM, Ahmet Arslan  wrote:
Hi,


Payloads are used to store arbitrary data along with terms. You can influence 
score with these arbitrary data.
See : http://sujitpal.blogspot.com.tr/2013/07/porting-payloads-to-solr4.html

But remember that there is an ongoing work to nuke Spans.

Ahmet




On Saturday, May 17, 2014 8:24 AM, ienjreny  wrote:
Regarding to your question: "That said, are you sure you want to be using
the payload feature of Lucene? "

I don't know because I don't know what is the benefits from this tokenizer,
and what Payload means here!


On Sat, May 17, 2014 at 2:45 AM, Jack Krupansky-2 [via Lucene] <
ml-node+s472066n4136467...@n3.nabble.com> wrote:

> I do have basic coverage for that filter (and all other filters) and the
> parameter values in my e-book:
>
>
> http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
>
> That said, are you sure you want to be using the payload feature of
> Lucene?
>
> -- Jack Krupansky
>
> -Original Message-
> From: ienjreny
> Sent: Monday, May 12, 2014 12:51 PM
> To: [hidden email] 
> Subject: What is the usage of solr.NumericPayloadTokenFilterFactory
>
> Dears:
> Can any body explain at easy way what is the benefits of
> solr.NumericPayloadTokenFilterFactory and what is acceptable values for
> typeMatch
>
> Thanks in advance
>
>
>






Re: What is the usage of solr.NumericPayloadTokenFilterFactory

2014-05-17 Thread Roman Chyla
Hi, what will replace Spans if they are nuked?
Roman
On 17 May 2014 09:15, "Ahmet Arslan"  wrote:

> Hi,
>
>
> Payloads are used to store arbitrary data along with terms. You can
> influence score with these arbitrary data.
> See :
> http://sujitpal.blogspot.com.tr/2013/07/porting-payloads-to-solr4.html
>
> But remember that there is an ongoing work to nuke Spans.
>
> Ahmet
>
>
>
> On Saturday, May 17, 2014 8:24 AM, ienjreny 
> wrote:
> Regarding to your question: "That said, are you sure you want to be using
> the payload feature of Lucene? "
>
> I don't know because I don't know what is the benefits from this tokenizer,
> and what Payload means here!
>
>
> On Sat, May 17, 2014 at 2:45 AM, Jack Krupansky-2 [via Lucene] <
> ml-node+s472066n4136467...@n3.nabble.com> wrote:
>
> > I do have basic coverage for that filter (and all other filters) and the
> > parameter values in my e-book:
> >
> >
> >
> http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
> >
> > That said, are you sure you want to be using the payload feature of
> > Lucene?
> >
> > -- Jack Krupansky
> >
> > -Original Message-
> > From: ienjreny
> > Sent: Monday, May 12, 2014 12:51 PM
> > To: [hidden email]  >
> > Subject: What is the usage of solr.NumericPayloadTokenFilterFactory
> >
> > Dears:
> > Can any body explain at easy way what is the benefits of
> > solr.NumericPayloadTokenFilterFactory and what is acceptable values for
> > typeMatch
> >
> > Thanks in advance
> >
> >
> >
>
>
>
>
>


Re: deep paging without sorting / keep IRs open

2014-05-17 Thread Yonik Seeley
On Wed, May 14, 2014 at 8:34 AM, Tommaso Teofili
 wrote:
> Basically I need the ability to keep running searches against a specified
> commit point / index reader / state of the Lucene / Solr index.

I think searcher leases would fit the bill here?
https://issues.apache.org/jira/browse/SOLR-2809

Not yet implemented though...

-Yonik
http://heliosearch.org - facet functions, subfacets, off-heap filters&fieldcache


Re: What is the usage of solr.NumericPayloadTokenFilterFactory

2014-05-17 Thread Jack Krupansky
I hate to say this, but if you have to ask, then it is highly likely that 
the feature is inappropriate for you.


It may in fact be true that Payload is precisely the feature you need, but 
Solr support for this Lucene feature is rather limited, so you may have to 
do a lot of work on your own to use the Payload feature.


So, let me revise my question: what application requirement led you to 
suspect that the Lucene payload feature might be beneficial? There are 
likely to be other approaches to meeting your application requirements 
than payloads.


All of that said, it sure would be nice to see more substantial and easier 
to use support for Payload in Solr.


-- Jack Krupansky

-Original Message- 
From: ienjreny

Sent: Saturday, May 17, 2014 1:24 AM
To: solr-user@lucene.apache.org
Subject: Re: What is the usage of solr.NumericPayloadTokenFilterFactory

Regarding to your question: "That said, are you sure you want to be using
the payload feature of Lucene? "

I don't know because I don't know what is the benefits from this tokenizer,
and what Payload means here!


On Sat, May 17, 2014 at 2:45 AM, Jack Krupansky-2 [via Lucene] <
ml-node+s472066n4136467...@n3.nabble.com> wrote:


I do have basic coverage for that filter (and all other filters) and the
parameter values in my e-book:


http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

That said, are you sure you want to be using the payload feature of
Lucene?

-- Jack Krupansky

-Original Message-
From: ienjreny
Sent: Monday, May 12, 2014 12:51 PM
To: [hidden email] 
Subject: What is the usage of solr.NumericPayloadTokenFilterFactory

Dears:
Can any body explain at easy way what is the benefits of
solr.NumericPayloadTokenFilterFactory and what is acceptable values for
typeMatch

Thanks in advance












Re: deep paging without sorting / keep IRs open

2014-05-17 Thread Yonik Seeley
On Sat, May 17, 2014 at 10:30 AM, Yonik Seeley  wrote:
> I think searcher leases would fit the bill here?
> https://issues.apache.org/jira/browse/SOLR-2809
>
> Not yet implemented though...

FYI, I just put up a simple LeaseManager implementation on that issue.

-Yonik
http://heliosearch.org - facet functions, subfacets, off-heap filters&fieldcache


RE: Question regarding the lastest version of HeliosSearch

2014-05-17 Thread Jean-Sebastien Vachon
Thanks for the information Yonik. 

> -Original Message-
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
> Seeley
> Sent: May-16-14 8:52 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Question regarding the lastest version of HeliosSearch
> 
> On Thu, May 15, 2014 at 3:44 PM, Jean-Sebastien Vachon  sebastien.vac...@wantedanalytics.com> wrote:
> > I spent some time today playing around with the subfacets and facet functions
> now available in Heliosearch 0.05 and I have some concerns... They look
> very promising.
> 
> Thanks, glad for the feedback!
> 
> [...]
> > the response looks good except for one little thing... the mincount is not
> respected whenever I specify the facet.stat parameter. Removing it will
> cause the mincount to be respected but then I need this parameter.
> 
> Right, the mincount parameter is not yet implemented.   Hopefully soon!
> 
> > {
> >
> >   "val":1133,
> >
> >   "unique(job_id)":0, <== what is this?
> >
> >   "count":0},
> >  Many zero entries following...
> >
> > I was wondering where the extra entries were coming from... the
> > position_id = 1133 above is not even a match for my query (its title is 
> > "Audit
> Consultant") I`ve also noticed a similar behaviour when using subfacets. It
> looks like the number of items returned always match the "facet.limit"
> parameter.
> > If not enough values are present for a given entry then the bucket is filled
> with documents not matching the original query.
> 
> Right... straight Solr faceting will do this too (unless you have a
> mincount>0).  We're just looking at terms in the field and we don't
> have enough context to know if some 0's make more sense than others to
> return.
> 
> -Yonik
> http://heliosearch.org - facet functions, subfacets, off-heap
> filters&fieldcache
> 


Re: Solr 4.8: Does eDisMax parser call analyzer chain to tokenize?

2014-05-17 Thread Alexandre Rafalovitch
My understanding was that the lower-casing and other per-field things happen
as a step after the dismax formula is applied. In
this case, however, this seems to be happening before:
DisjunctionMaxQuery((((wdText:abc123xyz wdText:abc) wdText:123 wdText:xyz)

Hence the question to someone who actually understands these guts. For
eDisMax, what's the correct/expected call sequence between the query
parser and the field type's analysis? Or maybe just a slightly more in-depth
explanation of Michael's statement.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Sat, May 17, 2014 at 8:28 PM, Michael Sokolov
 wrote:
> Alex - the query parsers generally accept an analyzer, which they must apply
> after they perform their own tokenization.  Consider: how would a
> capitalized query term match lower-cased terms in the index without query
> analysis?
>
> -Mike
>
>
> On 5/17/2014 4:05 AM, Alexandre Rafalovitch wrote:
>>
>> Hello,
>>
>> I am getting weird results that seem to come from eDisMax using
>> analyzer chain to break the input text. I have
>> WordDelimiterFilterFactory in my chain, which does a lot of
>> interesting things I did not expect query parser to be involved in.
>>
>> Specifically, the string "abc123XYZ" gets split into 3 components on
>> digits and gets lowercased as well. I thought all that was happening
>> later, inside individual fields.
>>
>> All documentation talks about query parsers splitting on space, so I
>> don't know where this "full chain" business is coming from. Or maybe I
>> am misunderstanding which phase debug output is from.
>>
>> Here is the field definition:
>>
>> (field type XML stripped by the list archive; the surviving attributes
>> include preserveOriginal="1" and positionIncrementGap="100")
>>
>> And here is the debug output:
>>
>> http://localhost:9000/solr/collection1/select?q=hello+big+world+abc123XYZ&wt=json&indent=true&debugQuery=true&defType=edismax&qf=wdText+wsText&stopwords=true&lowercaseOperators=true
>>
>> "rawquerystring":"hello big world abc123XYZ",
>>  "querystring":"hello big world abc123XYZ",
>>  "parsedquery":"(+(DisjunctionMaxQuery((wdText:hello |
>> wsText:hello)) DisjunctionMaxQuery((wdText:big | wsText:big))
>> DisjunctionMaxQuery((wdText:world | wsText:world))
>> DisjunctionMaxQuery((((wdText:abc123xyz wdText:abc) wdText:123
>> wdText:xyz) | wsText:abc123XYZ))))/no_coord",
>>  "parsedquery_toString":"+((wdText:hello | wsText:hello)
>> (wdText:big | wsText:big) (wdText:world | wsText:world)
>> (((wdText:abc123xyz wdText:abc) wdText:123 wdText:xyz) |
>> wsText:abc123XYZ))",
>>
>> Or, and enabling phrase search on the field type, gets even more
>> weird. But one problem at a time.
>>
>> Regards,
>> Alex.
>>
>> Personal website: http://www.outerthoughts.com/
>> Current project: http://www.solr-start.com/ - Accelerating your Solr
>> proficiency
>
>


Re: What is the usage of solr.NumericPayloadTokenFilterFactory

2014-05-17 Thread Ahmet Arslan
Hi Roman,

I was referring to this https://issues.apache.org/jira/browse/LUCENE-2878 
ticket.



Ahmet

On Saturday, May 17, 2014 5:50 PM, Roman Chyla  wrote:
Hi, What will replace spans, if spans are nuked ?
Roman

On 17 May 2014 09:15, "Ahmet Arslan"  wrote:

> Hi,
>
>
> Payloads are used to store arbitrary data along with terms. You can
> influence score with these arbitrary data.
> See :
> http://sujitpal.blogspot.com.tr/2013/07/porting-payloads-to-solr4.html
>
> But remember that there is an ongoing work to nuke Spans.
>
> Ahmet
>
>
>
> On Saturday, May 17, 2014 8:24 AM, ienjreny 
> wrote:
> Regarding to your question: "That said, are you sure you want to be using
> the payload feature of Lucene? "
>
> I don't know because I don't know what is the benefits from this tokenizer,
> and what Payload means here!
>
>
> On Sat, May 17, 2014 at 2:45 AM, Jack Krupansky-2 [via Lucene] <
> ml-node+s472066n4136467...@n3.nabble.com> wrote:
>
> > I do have basic coverage for that filter (and all other filters) and the
> > parameter values in my e-book:
> >
> >
> >
> http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
> >
> > That said, are you sure you want to be using the payload feature of
> > Lucene?
> >
> > -- Jack Krupansky
> >
> > -Original Message-
> > From: ienjreny
> > Sent: Monday, May 12, 2014 12:51 PM
> > To: [hidden email]  >
> > Subject: What is the usage of solr.NumericPayloadTokenFilterFactory
> >
> > Dears:
> > Can any body explain at easy way what is the benefits of
> > solr.NumericPayloadTokenFilterFactory and what is acceptable values for
> > typeMatch
> >
> > Thanks in advance
> >
> >
> >
> >
>
>
>
>
>


PostingsHighlighter with prefix queries

2014-05-17 Thread Puneet Pawaia
Hi all
The PostingsHighlighter in Solr 4.7 is supposed to be able to highlight prefix
queries.  However, you are supposed to subclass it and override getAnalyzer
to return the analyzer used at index time.
Any examples to show how this is done when using Solr?
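
Not a tested recipe, but a sketch of the Lucene-level half: in Lucene 4.7 the
hook is, if I remember correctly, the protected getIndexAnalyzer(String field)
method, which returns null by default; returning the analyzer that was used at
index time turns on highlighting of prefix/wildcard (MultiTermQuery) terms.
The StandardAnalyzer below is only a stand-in for your real index-time chain.

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.postingshighlight.PostingsHighlighter;
import org.apache.lucene.util.Version;

public class PrefixAwarePostingsHighlighter extends PostingsHighlighter {

  // Stand-in: must match whatever analyzer your field actually uses at index time.
  private final Analyzer indexAnalyzer = new StandardAnalyzer(Version.LUCENE_47);

  @Override
  protected Analyzer getIndexAnalyzer(String field) {
    return indexAnalyzer;
  }
}

On the Solr side, my understanding is that you would subclass
PostingsSolrHighlighter so that it creates this highlighter instead of the
stock one, and register your subclass in solrconfig.xml; I have not verified
the exact factory method name, so please treat that part as an assumption and
check the 4.7 javadocs.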

Regards
Puneet


Re: Solr 4.8: Does eDisMax parser call analyzer chain to tokenize?

2014-05-17 Thread Jack Krupansky
Your surprising results come mostly from the default values of the WDF 
attributes. In particular, the generateWordParts and generateNumberParts 
attributes default to "1" (true), resulting in the discrete "abc", "123", 
and "xyz" tokens, and the catenateAll attribute defaults to "0" (false), 
which means that no catenated token is generated by that attribute. The 
"abc123xyz" token is generated anyway because you explicitly specified the 
preserveOriginal attribute to be "1" (the lower-casing comes from later in 
the analysis chain).


Generally, you need to have asymmetric WDF analyzers, one for indexing that 
generates multiple terms for better recall, and one for query that generates 
only a sequence of the sub-terms (as if a quoted phrase) for more precise 
matching. So, it's fine to use preserveOriginal="1" for indexing, as well as 
catenateAll="1" and generateNumberParts="1" and generateWordParts="1", but 
for query analysis you should have preserveOriginal="0", catenateAll="0", 
catenateWords="0", catenateNumbers="0", generateNumberParts="1" and 
generateWordParts="1".


The distinction between preserveOriginal and catenateAll is whether 
punctuation should be included (for the former) or stripped out (the 
latter):


abc. => abc. vs. abc

(xyz). => (xyz). vs. xyz

401(k). => 401(k). vs. 401k

CD-ROM. => CD-ROM. vs. CDROM

Finally, the default for the splitOnNumerics attribute is "1" (true), which 
is why "abc123xyz" is split into three terms. If you don't want that split, 
set splitOnNumerics="0".
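
If it helps to see the two configurations side by side, the XML attributes map 
directly onto the Lucene 4.x WordDelimiterFilter flag constants; the flag sets 
below are just one way to mirror the index/query split described above, not 
the only reasonable choice.

import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;

public class WdfFlagSets {
  // Index side: generate parts, catenate everything, and preserve the original,
  // so "abc123XYZ" yields abc / 123 / XYZ plus the catenated and original tokens
  // (lower-casing comes from a later LowerCaseFilter in the chain).
  static final int INDEX_FLAGS =
        WordDelimiterFilter.GENERATE_WORD_PARTS
      | WordDelimiterFilter.GENERATE_NUMBER_PARTS
      | WordDelimiterFilter.SPLIT_ON_NUMERICS
      | WordDelimiterFilter.SPLIT_ON_CASE_CHANGE
      | WordDelimiterFilter.CATENATE_ALL
      | WordDelimiterFilter.PRESERVE_ORIGINAL;

  // Query side: only the plain parts, so the query analyzes to the narrower
  // "abc 123 XYZ" sequence and matches more precisely.
  static final int QUERY_FLAGS =
        WordDelimiterFilter.GENERATE_WORD_PARTS
      | WordDelimiterFilter.GENERATE_NUMBER_PARTS
      | WordDelimiterFilter.SPLIT_ON_NUMERICS
      | WordDelimiterFilter.SPLIT_ON_CASE_CHANGE;

  public static void main(String[] args) {
    System.out.println("index=" + Integer.toBinaryString(INDEX_FLAGS)
        + " query=" + Integer.toBinaryString(QUERY_FLAGS));
  }
}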


There are more details on WDF in my e-book:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html


-- Jack Krupansky

-Original Message- 
From: Alexandre Rafalovitch

Sent: Saturday, May 17, 2014 1:13 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.8: Does eDisMax parser call analyzer chain to tokenize?

My understanding was that the lower-case and other things happen on
per-field basis and is a step after the dismax formula is applied. In
this case, however, this seems to be happening before:
DisjunctionMaxQuery((((wdText:abc123xyz wdText:abc) wdText:123 wdText:xyz)

Hence to question to someone who actually understands those guts. For
eDisMax, what's the correct/expected call sequence between query
parser and field-type parser? Or maybe just a slightly more in-depth
explanation of Michael's statement.

Regards,
  Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr 
proficiency



On Sat, May 17, 2014 at 8:28 PM, Michael Sokolov
 wrote:
Alex - the query parsers generally accept an analyzer, which they must
apply after they perform their own tokenization.  Consider: how would a
capitalized query term match lower-cased terms in the index without query
analysis?

-Mike


On 5/17/2014 4:05 AM, Alexandre Rafalovitch wrote:


Hello,

I am getting weird results that seem to come from eDisMax using
analyzer chain to break the input text. I have
WordDelimiterFilterFactory in my chain, which does a lot of
interesting things I did not expect query parser to be involved in.

Specifically, the string "abc123XYZ" gets split into 3 components on
digits and gets lowercased as well. I thought all that was happening
later, inside individual fields.

All documentation talks about query parsers splitting on space, so I
don't know where this "full chain" business is coming from. Or maybe I
am misunderstanding which phase debug output is from.

Here is the field definition:

(field type XML for the wdText and wsText fields was stripped by the list archive)

And here is the debug output:

http://localhost:9000/solr/collection1/select?q=hello+big+world+abc123XYZ&wt=json&indent=true&debugQuery=true&defType=edismax&qf=wdText+wsText&stopwords=true&lowercaseOperators=true

"rawquerystring":"hello big world abc123XYZ",
 "querystring":"hello big world abc123XYZ",
 "parsedquery":"(+(DisjunctionMaxQuery((wdText:hello |
wsText:hello)) DisjunctionMaxQuery((wdText:big | wsText:big))
DisjunctionMaxQuery((wdText:world | wsText:world))
DisjunctionMaxQuery((((wdText:abc123xyz wdText:abc) wdText:123
wdText:xyz) | wsText:abc123XYZ))))/no_coord",
 "parsedquery_toString":"+((wdText:hello | wsText:hello)
(wdText:big | wsText:big) (wdText:world | wsText:world)
(((wdText:abc123xyz wdText:abc) wdText:123 wdText:xyz) |
wsText:abc123XYZ))",

Or, and enabling phrase search on the field type, gets even more
weird. But one problem at a time.

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr
proficiency