Re: Implicit Router Configurations

2015-06-24 Thread Upayavira
You can use the zkcli.sh script (or its Windows counterpart, zkcli.bat) in
server/scripts/cloud-scripts. Note that in older versions it is in
example/scripts/cloud-scripts.

I just used this command to get the file from zookeeper:

server/scripts/cloud-scripts/zkcli.sh -z localhost:9983 -cmd getfile /clusterstate.json clusterstate.json

You can use -cmd putfile to push it back to ZooKeeper. As Erick says,
have all the nodes in your cluster down when you do this. And, as Erick
also says, this is not something people are generally recommended to do.
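
As a rough illustration of the edit step between getfile and putfile, here is
a minimal Python sketch. It assumes the downloaded clusterstate.json is a JSON
object keyed by collection name, each entry holding a "router" object; check
that against your own file. The collection name "mycollection" and the helper
set_router are hypothetical; router.field=user is taken from Arnon's message.

```python
import json

def set_router(state, collection, name, field=None):
    """Return a copy of the cluster state with the collection's router replaced.

    Assumes the layout {"<collection>": {"router": {...}, ...}}; verify this
    against your own clusterstate.json before relying on it.
    """
    if collection not in state:
        raise KeyError("collection not found in cluster state: " + collection)
    updated = json.loads(json.dumps(state))  # cheap deep copy of plain JSON data
    router = {"name": name}
    if field is not None:
        router["field"] = field
    updated[collection]["router"] = router
    return updated

# Hypothetical stand-in for the file fetched with "-cmd getfile".
state = {"mycollection": {"shards": {}, "router": {"name": "compositeId"}}}
fixed = set_router(state, "mycollection", "implicit", field="user")
print(json.dumps(fixed, indent=2))
```

In practice you would load the real file with json.load, write the result back
to disk, and only then push it with -cmd putfile while all nodes are down.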

Upayavira

On Wed, Jun 24, 2015, at 07:54 AM, Arnon Yogev wrote:
> Thank you Erick,
> 
> What is the recommended way to manually change clusterstate.json?
> Is there a java code \ script way of editing a file in ZK?
> 
> Best,
> Arnon
> 
> 
> 
> From:   Erick Erickson 
> To: solr-user@lucene.apache.org
> Date:   23/06/2015 09:09 PM
> Subject:Re: Implicit Router Configurations
> 
> 
> 
> Please raise a JIRA for this, I can see why this would occur.
> You can manually change the clusterstate.json file when this
> happens as a stop-gap, I'd have all the Solr instances down
> when doing this though.
> 
> Best,
> Erick
> 
> 
> 
> On Tue, Jun 23, 2015 at 8:19 AM, Arnon Yogev  wrote:
> > We have a use case where documents are indexed in shards according to a
> > specific field (shard per user), and the number of shards is unknown 
> when
> > creating the collection.
> > For that purpose we use the implicit router and define 
> router.field=user.
> >
> > From what we've seen, the only way to define an implicit router is 
> during
> > the collection creation.
> > Moreover, the router definitions (router.name and router.field) are kept
> > only in clusterstate.json and not in any solr configuration file on 
> disk.
> >
> > In some cases solr state becomes inconsistent and we need to delete the
> > configs from ZK and restart the solr server. The behavior we see is the
> > new clusterstate.json generated by solr on startup has the default
> > router.name=compositeId, which is not what we defined during creation.
> >
> > Are we missing something? Is there a place to configure the implicit
> > router on disk such that it will be persistent?
> >
> > Thanks,
> > Arnon
> >
> >
> 
> 


Attaching payload on spatial RPT?

2015-06-24 Thread Markus Jelsma
Hi, we have a multiValued spatial RPT field. Each document has zero or more 
coordinate pairs attached to it. I derived the coordinate pairs from the 
spatial faceting heatmap, together with the count that comes with each one; to 
do so I had to translate the heatmap grid into a list of coordinate pairs with 
their accompanying counts. Now I want to use the count as a measure of 
popularity, so ideally I would attach the count to the coordinate pair as a 
payload attribute. That way I could boost on the most popular item in the vicinity.

But RPT is not a regular field type that I can attach a payload to, nor do I 
know how to score payloads in a geodist valueSource. I did notice 
SpatialTermQueryPrefixTreeFieldType, which suggests that it uses TermQuery 
internally, which would mean it could take PayloadTermQuery as well?

Any ideas?

Many thanks,
Markus


Re: Creating A User Interface On Top of Solr

2015-06-24 Thread Alessandro Benedetti
Erick,
related to that, I have noticed that developers often need an
intermediate API that proxies the search UI's requests to Solr.

Of course there are scenarios where it is necessary to build this intermediate
API (for example, if you customise how the results are processed after
Solr returns them).
But sometimes the UI can communicate directly with
Solr using the REST endpoints provided.

In your opinion, does it make sense to provide the intermediate search API
in simple use cases as well?
By doing that, aren't we introducing scalability issues or delays at query time?

I always thought the intermediate REST API was necessary but, day by day,
Solr's REST APIs are becoming more and more complete.

Coming back to the user's mail, maybe he can simply use the Solr REST endpoints.
But what about production environments?

Cheers

2015-06-24 2:18 GMT+01:00 Erick Erickson :

> First, the Velocity UI was never intended to be a user-facing application,
> if
> for no other reason than it has direct access to Solr. And I can delete
> all the
> docs in the collection, delete collections, create new collections and
> all manner
> of other really bad stuff.
>
> So almost _every_ application has its own UI for search.
>
> You don't need to worry in the least about the Solr code base. You
> shouldn't
> have to care. All you need to do is send HTTP requests to Solr and deal
> with
> the response.
>
> For instance, try these two commands from a browser:
> http://server:port/solr/collection/query?q=*:*
> (that's star colon star in case your e-mail makes the colon just a bold).
> http://server:port/solr/collection/select?q=*:*
>
> You should see a JSON and an XML response packet. _That's_ all you
> really have to deal with. The rest of the UI has nothing to do with
> Solr and everything to do with your UI skills, you "merely" have to parse
> the response and display it pleasingly.
>
> If you don't want to curl the commands or use your own HTTP client, you
> can use SolrJ to query Solr and not have to parse the response because
> there are a bunch of helper methods that make navigating the response
> easier, google "solrj query example" and you'll see a bunch of examples. Be
> a little careful to make sure the example you use is compatible with your
> Solr version.
>
> HTH,
> Erick
>
> On Tue, Jun 23, 2015 at 5:20 PM, Paden  wrote:
> > Hello,
> >
> > I'm trying to custom build my own Solr interface in Visual Studios
> instead
> > of using/modifying the original Velocity interface. I'm mostly doing
> this as
> > a learning exercise for building UI that's why I'm opting out of using
> it.
> >
> > The problem is I'm pretty new and not sure where to begin. Most of the
> posts
> > I've seen end in people opting to modify the default UI. Are there any
> good
> > links to tutorials or good Solr classes I should look at as I'm trying to
> > build this UI?
> >
> > This interface isn't to replace the Solr Admin page, it's simply to
> handle
> > the querying of the indexed documents that I have.
> >
> > If this seems more an "exercise in futility" (due to vast amounts of code
> > that needs written or whatnot) then please let me know. I don't really
> have
> > any reason to believe it should be but I've been wrong before.
> >
> > Thanks in advance
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Creating-A-User-Interface-On-Top-of-Solr-tp4213541.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


RE: Solr Unexpected Query Parser Exception

2015-06-24 Thread Vishnu Mishra
I don't think there is an issue with query escaping. I am sending a shard query to my
main Solr server, and from inside the main Solr server I am sending a simple *:*
query to another Solr server using SolrJ. But most of the time I get the
following error: Cannot parse '*:*': Encountered "" at line 1, column
1. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Unexpected-Query-Parser-Exception-tp4194156p4213621.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Unexpected Query Parser Exception

2015-06-24 Thread Upayavira


On Wed, Jun 24, 2015, at 11:19 AM, Vishnu Mishra wrote:
> I think there is no issue with query escaping. I am doing shard query to
> my
> main solr server, and from inside the main solr server I am doing simple
> *:*
> query to another solr server by using solrj.  But most of the time I get
> following error.  Cannot parse '*:*': Encountered "" at line 1,
> column
> 1. 

Same answer as last time. Please give us more information.

Please give us the full query that you are doing. If you are using
SolrJ, you can get it out of the logs of the server you are hitting.

Please also give us the full stack trace of the error you are seeing,
not just the error message itself. 

Without this extra information, anything we say will be speculation.

Thanks,

Upayavira


fq versus q

2015-06-24 Thread Esther Goldbraich
Hi,

We are comparing the performance of fq versus q for queries that are 
actually filters and should not be cached.
For some queries we see strange behavior where q performs 5-10x better 
than fq. The question is: why?

Example 1:
q=maildate:{DATE1 to DATE2} COMPARED TO fq={!cache=false}maildate:{DATE1 
to DATE2}
sort=maildate_sort* desc
rows=50
start=0
group=true
group.query=some query (without dates)
group.query=*:*
group.sort=maildate_sort desc
additional fqs

Schema:



Thank you,
Esther
-
Esther Goldbraich
Social Technologies & Analytics - IBM Haifa Research Lab
Phone: +972-4-8281059

Re: fq versus q

2015-06-24 Thread Esther Goldbraich
Some clarification:
I would like to understand how solr processes fq (without cache) versus q 
when sort and group are required.







Re: Creating A User Interface On Top of Solr

2015-06-24 Thread Shawn Heisey
On 6/24/2015 4:16 AM, Alessandro Benedetti wrote:
> related that I noticed that a lot of times, a developer would need an
> intermediate API that will proxy the Search UI requests to Solr.
> 
> Of course there are scenarios where is necessary to build this intermediate
> API ( for example if you customise how the results must be processed after
> Solr returns them) .
> But sometimes it can happen a straightforward communication from the UI to
> Solr using the REST endpoints provided .
> 
> In your opinion, does make any sense to provide the intermediate Search API
> in simple use cases as well ?
> Doing that, aren't we introducing  scalability issues/ delay in query time ?
> 
> I always thought the intermediate Rest API to be necessary, but day by day,
> Solr REST API are becoming more and more complete.
> 
> To refer to the the user mail, maybe he can simply use Solr REST endpoints.
> But what about production environments ?

Solr's REST-like interface is indeed becoming more and more complete ...
which is part of the problem.  It's a bad idea to expose Solr directly
to end users, because as Erick said, users with direct access to Solr
can wipe out your index, change it in subtle ways, and access things you
may not have wanted them to access.  When we add another feature to Solr
that allows administrators to modify the Solr config or index, it makes
end-user access that much more dangerous.

Even if you implement a proxy for Solr that blocks access to the admin
UI and the update handler, a user can still craft queries with enough
complexity to keep the Solr server busy and cause denial of service.
Configuring a proxy for that role requires time, which might be better
spent on an intermediate UI.

Chances are that if you're building a website, you're going to invest
heavily in writing a user interface, part of which will run on the
server side.  It might be in a language like PHP, Ruby, or Java.  Adding
Solr support to such an interface is normally pretty easy, because Solr
clients are available for several languages that are commonly used for
website design.

https://wiki.apache.org/solr/IntegratingSolr

A modern interface will include plenty of Javascript, which runs in the
user's browser, but I expect that the sensitive parts will run
server-side.  As an example, there are Solr plugins available for
applications like Wordpress and Drupal.  Those plugins are server-side
code, and do not require end-user access to Solr.
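
To make the server-side idea concrete, here is a minimal Python sketch of an
intermediate layer that builds the Solr request itself and passes through only
a few values from the end user. The Solr URL, the row cap and the function name
are hypothetical, chosen for illustration only.

```python
from urllib.parse import urlencode

# Hypothetical values for illustration only.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection/select"
MAX_ROWS = 50

def build_search_url(user_text, page=0, rows=10):
    """Build a Solr select URL from untrusted input, exposing only q, start and rows."""
    rows = max(1, min(int(rows), MAX_ROWS))   # cap the page size
    start = max(0, int(page)) * rows          # no negative offsets
    params = {
        "q": user_text.strip() or "*:*",      # empty input falls back to match-all
        "start": start,
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)

print(build_search_url("apple pie", page=2, rows=500))
```

This only limits which parameters reach Solr. It does not restrict the query
syntax itself, so an expensive user-written query is still possible and would
need separate handling.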

Thanks,
Shawn



Re: fq versus q

2015-06-24 Thread Shawn Heisey
On 6/24/2015 5:28 AM, Esther Goldbraich wrote:
> We are comparing the performance of fq versus q for queries that are 
> actually filters and should not be cached.
> In part of queries we see strange behavior where q performs 5-10x better 
> than fq. The question is why?
> 
> An example1:
> q=maildate:{DATE1 to DATE2} COMPARED TO fq={!cache=false}maildate:{DATE1 
> to DATE2}
> sort=maildate_sort* desc



> 
>  docValues="true"/>

For simplicity, I would probably just use one field for that, rather
than a separate sort field.  The disk space required would probably be
the same either way, but your interaction with the index will not be as
complex.  There's nothing wrong with doing it the way you have, though.

I'm not at all an expert, but I've been a member of this community for a
long time.  Here's my guess about why your query is faster in the q
parameter than a non-cached filter:

The result of a standard query is the stored fields from the top N
documents, where N is the value in the rows parameter.  The default for
N is typically set to 10, and for most people will normally be 200 or less.

The result of a filter is very different -- it is a bitset of all the
documents in your entire index, with binary 0 for documents that don't
match the filter and binary 1 for documents that do match.

If your index has 100 million documents, every single one of those 100
million documents must be checked against the filter query to produce a
filter bitset, but when it's in the q parameter, shortcuts can be taken
which will get the top N results quickly.

The filterCache levels the playing field when filters are re-used.  If a
requested filter is already in the cache, it can be retrieved and
applied to a result VERY quickly.

You have turned off the caching for your filter.  I'm not sure why you
did this, but you know your use case a lot better than I do.  If it were
me, I would use filter queries and do everything possible to re-use the
same filters, and I would cache them.
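
For anyone who wants to reproduce the comparison, the two request variants
could be built roughly as below. This is a sketch under assumptions: the
original message does not say what q was in the fq variant, so *:* is a guess;
DATE1 and DATE2 remain placeholders; and the range is written with an
upper-case TO, which Solr's range syntax requires.

```python
from urllib.parse import urlencode

def as_main_query(range_clause):
    """Variant 1: the range restriction is the main query."""
    return {"q": range_clause}

def as_uncached_filter(range_clause):
    """Variant 2: match everything, then restrict with a non-cached filter."""
    return {"q": "*:*", "fq": "{!cache=false}" + range_clause}

# DATE1 and DATE2 are placeholders, as in the original message.
clause = "maildate:{DATE1 TO DATE2}"
common = {"rows": 50, "start": 0, "sort": "maildate_sort desc"}

for variant in (as_main_query(clause), as_uncached_filter(clause)):
    print(urlencode({**variant, **common}))
```

Running both variants with debugQuery=true added and comparing the timing
sections is one way to see where the difference comes from.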

Thanks,
Shawn



Re: /suggest through SolrJ?

2015-06-24 Thread Alessandro Benedetti
https://issues.apache.org/jira/browse/SOLR-7719

I will work on it as soon as I can, it is very simple.
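
Until a dedicated response class exists, the manual JSON parsing mentioned
further down the thread can be sketched roughly as follows in Python. The
response layout is an assumption (suggest, then dictionary name, then query
text, then a "suggestions" list), and the sample, including the dictionary
name "mySuggester", is hand-written rather than captured from a server.

```python
import json

def extract_suggestions(response):
    """Collect suggestion terms from the "suggest" section of a parsed response.

    Assumed layout: suggest -> dictionary name -> query text -> "suggestions",
    a list of objects that each carry a "term".
    """
    terms = []
    for per_dictionary in response.get("suggest", {}).values():
        for per_query in per_dictionary.values():
            for suggestion in per_query.get("suggestions", []):
                terms.append(suggestion["term"])
    return terms

# Hand-written sample, not captured from a real server.
sample = json.loads("""
{"suggest": {"mySuggester": {"atmo": {"numFound": 2, "suggestions": [
    {"term": "atmosphere", "weight": 10, "payload": ""},
    {"term": "atmospheric pressure", "weight": 5, "payload": ""}]}}}}
""")
print(extract_suggestions(sample))
```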

Cheers

2015-05-06 13:38 GMT+01:00 Alessandro Benedetti 
:

> Exactly Tommaso,
> I was referring to that !
>
> I wrote another mail in the dev mailing list, I will open a Jira Issue for
> that !
>
> Cheers
>
> 2015-04-29 12:16 GMT+01:00 Tommaso Teofili :
>
>> 2015-04-27 19:22 GMT+02:00 Alessandro Benedetti <
>> benedetti.ale...@gmail.com>
>> :
>>
>> > Just had the very same problem, and I confirm that currently is quite a
>> > mess to manage suggestions in SolrJ !
>> > I have to go with manual Json parsing.
>> >
>>
>> or very not nice NamedList API mess (see an example in JR Oak [1][2]).
>>
>> Regards,
>> Tommaso
>>
>> p.s.:
>> note that this applies to Solr 4.7.1 API, but reading the thread it seems
>> the problem is still there.
>>
>> [1] :
>>
>> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-solr-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/solr/query/SolrQueryIndex.java#L318
>> [2] :
>>
>> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-solr-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/solr/query/SolrQueryIndex.java#L370
>>
>>
>>
>> >
>> > Cheers
>> >
>> > 2015-02-02 12:17 GMT+00:00 Jan Høydahl :
>> >
>> > > Using the /suggest handler wired to SuggestComponent, the
>> > > SpellCheckResponse objects are not populated.
>> > > Reason is that QueryResponse looks for a top-level element named
>> > > "spellcheck"
>> > >
>> > >   else if ( "spellcheck".equals( n ) )  {
>> > > _spellInfo = (NamedList) res.getVal( i );
>> > > extractSpellCheckInfo( _spellInfo );
>> > >   }
>> > >
>> > > Earlier the suggester was the same as the Spell component, but now
>> with
>> > > its own component, suggestions are put in "suggest".
>> > >
>> > > I think we're lacking a SuggestResponse.java for parsing suggest
>> > > responses..??
>> > >
>> > > --
>> > > Jan Høydahl, search solution architect
>> > > Cominvent AS - www.cominvent.com
>> > >
>> > > > 26. sep. 2014 kl. 07.27 skrev Clemens Wyss DEV <
>> clemens...@mysign.ch>:
>> > > >
>> > > > Thx to you two.
>> > > >
>> > > > Just in case anybody else is trying to do "this". The following
>> SolrJ
>> > > code corresponds to the http request
>> > > > GET http://localhost:8983/solr/solrpedia/suggest?q=atmo
>> > > > of  "Solr in Action" (chapter 10):
>> > > > ...
>> > > > SolrServer server = new HttpSolrServer("http://localhost:8983/solr/solrpedia");
>> > > > SolrQuery query = new SolrQuery( "atmo" );
>> > > > query.setRequestHandler( "/suggest" );
>> > > > QueryResponse queryresponse = server.query( query );
>> > > > ...
>> > > > queryresponse.getSpellCheckResponse().getSuggestions();
>> > > > ...
>> > > >
>> > > >
>> > > > -Ursprüngliche Nachricht-
>> > > > Von: Shawn Heisey [mailto:s...@elyograg.org]
>> > > > Gesendet: Donnerstag, 25. September 2014 17:37
>> > > > An: solr-user@lucene.apache.org
>> > > > Betreff: Re: /suggest through SolrJ?
>> > > >
>> > > > On 9/25/2014 8:43 AM, Erick Erickson wrote:
>> > > >> You can call anything from SolrJ that you can call from a URL.
>> > > >> SolrJ has lots of convenience stuff to set particular parameters,
>> > > >> parse the response, etc... But in the end it's communicating with
>> Solr
>> > > >> via a URL.
>> > > >>
>> > > >> Take a look at something like SolrQuery for instance. It has a nice
>> > > >> command setFacetPrefix. Here's the entire method:
>> > > >>
>> > > >> public SolrQuery setFacetPrefix( String field, String prefix ) {
>> > > >>this.set( FacetParams.FACET_PREFIX, prefix );
>> > > >>return this;
>> > > >> }
>> > > >>
>> > > >> which is really
>> > > >>this.set( "facet.prefix", prefix ); All it's really doing is
>> > > >> setting a SolrParams key/value pair which is equivalent to
>> > > >> &facet.prefix=blahblah on a URL.
>> > > >>
>> > > >> As I remember, there's a "setPath" method that you can use to set
>> the
>> > > >> destination for the request to "suggest" (or maybe "/suggest").
>> It's
>> > > >> something like that.
>> > > >
>> > > > Yes, like Erick says, just use SolrQuery for most accesses to Solr
>> on
>> > > arbitrary URL paths with arbitrary URL parameters.  The "set" method
>> is
>> > how
>> > > you include those parameters.
>> > > >
>> > > > The SolrQuery method Erick was talking about at the end of his
>> email is
>> > > setRequestHandler(String), and you would set that to "/suggest".  Full
>> > > disclosure about what this method actually does: it also sets the "qt"
>> > > > parameter, but with the modern example Solr config, the qt parameter
>> > > doesn't do anything -- you must actually change the URL path on the
>> > > request, which this method will do if the value starts with a forward
>> > slash.
>> > > >
>> > > > Thanks,
>> > > > Shawn
>> > > >
>> > >
>> > >
>> >
>> >
>> > --
>> > --
>> >
>> > Benedetti Alessandro
>> > Visiting card : http://about.me/alessandro_benedetti
>

Term Vector and Optimization

2015-06-24 Thread sudeep kumar
I want to know the impact of disabling term vectors in an existing production 
environment: how will new segments be created, and how will old segments, which 
were written with term vectors enabled, merge with the new ones?

I have one more question: is the schema.xml file read during Solr core optimization? 

Thanks,
Sudeep



Re: How to do a Data sharding for data in a database table

2015-06-24 Thread wwang525
Hi All,

I built the Solr index with 14 M records.

I have more than 20 GB of RAM on my local machine, and the Solr instance was started
with -Xms1024m -Xmx8196m.

The following query:

http://localhost:8983/solr/db-mssql/select?q=*:*&fq=GatewayCode:(YYZ)&fq=DestCode:(CUN)&fq=Duration:(5 OR 6 OR 7 OR 8)&fq=DateDep:([20150610 TO 20150810])&facet=true&facet.field=DestCode&facet.field=DateDep&facet.field=GatewayCode&facet.field=HotelName&facet.sort=count&facet.limit=40&facet.mincount=1&rows=30&group=true&group.field=HotelCode&group.ngroups=true&group.facet=true&debugQuery=true

The response reported 98105 matching base records in total. These records were
grouped at the HotelCode level, giving ngroups: 143. However, the query only
retrieves the first base record of each group, and only 30 groups were
retrieved.

The performance statistics:

Total response time in solr.log: 1791 ms
From the query response page: the query took 764 ms and faceting took 1007 ms.
Debug took 13 ms.

This is a typical query that the business needs. Previously, I was testing with a
data size of 6 M records and no faceted search; the typical response time for a
single request was around 200 ms.

Please let me know if additional information is needed.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4213648.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Term Vector and Optimization

2015-06-24 Thread Upayavira


On Wed, Jun 24, 2015, at 02:50 PM, sudeep kumar wrote:
> I want to know what is impact to disable term vector to existing
> production environment, I mean how new segments create and how old
> segments will merge with new segments because before this term vector was
> enable.
> 
> I have one more question Is schema.xml file read during solr core
> optimization? 

Generally, if you disable something like that, the data will stay in the
index but become inaccessible. Then, as segments are merged, the new
segments will exclude that data.

schema.xml is read at startup and when cores are reloaded. Generally,
you shouldn't be optimizing these days - it actually, ironically, makes
things much less optimal.

Upayavira


Re: Term Vector and Optimization

2015-06-24 Thread Upayavira


On Wed, Jun 24, 2015, at 03:27 PM, Upayavira wrote:
> 
> 
> On Wed, Jun 24, 2015, at 02:50 PM, sudeep kumar wrote:
> > I want to know what is impact to disable term vector to existing
> > production environment, I mean how new segments create and how old
> > segments will merge with new segments because before this term vector was
> > enable.
> > 
> > I have one more question Is schema.xml file read during solr core
> > optimization? 
> 
> Generally, if you disable something like that, the data will stay in the
> index but become inaccessible. Then, as segments are merged, the new
> segments will exclude that data.
> 
> schema.xml is read at startup and when cores are reloaded. Generally,
> you shouldn't be optimizing these days - it actually, ironically, makes
> things much less optimal.

To say a little more - if you are really sure you want to get rid of the
raw data that is term vectors, then you should edit your schema, reload
the core, then optimize.
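
The reload and optimize steps are plain HTTP requests, so they can be scripted.
Here is a minimal Python sketch that only builds the two URLs for a
non-SolrCloud core; the base URL and the core name "mycore" are placeholders,
and the helper name is made up for this example.

```python
from urllib.parse import urlencode

def reload_and_optimize_urls(base_url, core):
    """Return the two URLs to request, in order, after editing schema.xml."""
    reload_url = base_url + "/admin/cores?" + urlencode({"action": "RELOAD", "core": core})
    optimize_url = base_url + "/" + core + "/update?" + urlencode({"optimize": "true"})
    return [reload_url, optimize_url]

# "http://localhost:8983/solr" and "mycore" are placeholders.
for url in reload_and_optimize_urls("http://localhost:8983/solr", "mycore"):
    print(url)
```

Each URL can then be fetched with curl or any HTTP client, in that order.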

But the real question becomes, why is storing term vectors causing you
so much trouble? What problem are you trying to solve with this effort?

Upayavira


Re: Term Vector and Optimization

2015-06-24 Thread sudeepgarg
As I understand it, merging with old segments that were generated with
termVectors=true won't cause any trouble, i.e. index corruption or
index mismatch, and new segments will be merged with old segments
regardless of whether we have disabled the term vector feature. We are fine with
the data staying in the index. We are just trying to disable the term vector
feature in our environment as we don't require it any more.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Term-Vector-and-Optimization-tp4213647p4213657.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fq versus q

2015-06-24 Thread Shai Erera
Thanks Shawn,

What's Solr's equivalent of ConstantScoreQuery? I.e., what if you want to
run a query that does not score, but only filters? The rationale behind
using a non-cached 'fq' was just that.

Shai

On Wed, Jun 24, 2015 at 4:29 PM, Shawn Heisey  wrote:

> On 6/24/2015 5:28 AM, Esther Goldbraich wrote:
> > We are comparing the performance of fq versus q for queries that are
> > actually filters and should not be cached.
> > In part of queries we see strange behavior where q performs 5-10x better
> > than fq. The question is why?
> >
> > An example1:
> > q=maildate:{DATE1 to DATE2} COMPARED TO fq={!cache=false}maildate:{DATE1
> > to DATE2}
> > sort=maildate_sort* desc
>
> 
>
> > 
> >  > docValues="true"/>
>
> For simplicity, I would probably just use one field for that, rather
> than a separate sort field.  The disk space required would probably be
> the same either way, but your interaction with the index will not be as
> complex.  There's nothing wrong with doing it the way you have, though.
>
> I'm not at all an expert, but I've been a member of this community for a
> long time.  Here's my guess about why your query is faster in the q
> parameter than a non-cached filter:
>
> The result of a standard query is the stored fields from the top N
> documents, where N is the value in the rows parameter.  The default for
> N is typically set to 10, and for most people will normally be 200 or less.
>
> The result of a filter is very different -- it is a bitset of all the
> documents in your entire index, with binary 0 for documents that don't
> match the filter and binary 1 for documents that do match.
>
> If your index has 100 million documents, every single one of those 100
> million documents must be checked against the filter query to produce a
> filter bitset, but when it's in the q parameter, shortcuts can be taken
> which will get the top N results quickly.
>
> The filterCache levels the playing field when filters are re-used.  If a
> requested filter is already in the cache, it can be retrieved and
> applied to a result VERY quickly.
>
> You have turned off the caching for your filter.  I'm not sure why you
> did this, but you know your use case a lot better than I do.  If it were
> me, I would use filter queries and do everything possible to re-use the
> same filters, and I would cache them.
>
> Thanks,
> Shawn
>
>


Re: fq versus q

2015-06-24 Thread jim ferenczi
> In part of queries we see strange behavior where q performs 5-10x better
> than fq. The question is why?
Are you sure that the query result cache is disabled ?

2015-06-24 13:28 GMT+02:00 Esther Goldbraich :

> Hi,
>
> We are comparing the performance of fq versus q for queries that are
> actually filters and should not be cached.
> In part of queries we see strange behavior where q performs 5-10x better
> than fq. The question is why?
>
> An example1:
> q=maildate:{DATE1 to DATE2} COMPARED TO fq={!cache=false}maildate:{DATE1
> to DATE2}
> sort=maildate_sort* desc
> rows=50
> start=0
> group=true
> group.query=some query (without dates)
> group.query=*:*
> group.sort=maildate_sort desc
> additional fqs
>
> Schema:
> 
>  docValues="true"/>
>
> Thank you,
> Esther
> -
> Esther Goldbraich
> Social Technologies & Analytics - IBM Haifa Research Lab
> Phone: +972-4-8281059


Re: fq versus q

2015-06-24 Thread Jack Krupansky
Yonik added syntax to request a constant score query in Solr with the ^=
operator.

For example: +color:blue^=1 text:shoes

See:
https://issues.apache.org/jira/browse/SOLR-7218
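
If the query string is assembled in code, the clause can be wrapped with a tiny
helper. This is a minimal Python sketch; the helper name is made up, and the
parentheses are added so that multi-term clauses are covered as a whole.

```python
def constant_score(clause, score=1):
    """Wrap a query clause so every match gets the same fixed score."""
    return "(" + clause + ")^=" + str(score)

# Required filter-like clause with a constant score, plus a normally scored clause.
query = "+" + constant_score("color:blue") + " text:shoes"
print(query)
```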

-- Jack Krupansky

On Wed, Jun 24, 2015 at 1:41 PM, Shai Erera  wrote:

> Thanks Shawn,
>
> What's Solr equivalence to ConstantScoreQuery? I.e., what if you want to
> run a query that does not score, but only filter. The rationale behind
> using a non-cached 'fq' was just that.
>
> Shai
>
> On Wed, Jun 24, 2015 at 4:29 PM, Shawn Heisey  wrote:
>
> > On 6/24/2015 5:28 AM, Esther Goldbraich wrote:
> > > We are comparing the performance of fq versus q for queries that are
> > > actually filters and should not be cached.
> > > In part of queries we see strange behavior where q performs 5-10x
> better
> > > than fq. The question is why?
> > >
> > > An example1:
> > > q=maildate:{DATE1 to DATE2} COMPARED TO
> fq={!cache=false}maildate:{DATE1
> > > to DATE2}
> > > sort=maildate_sort* desc
> >
> > 
> >
> > > 
> > >  > > docValues="true"/>
> >
> > For simplicity, I would probably just use one field for that, rather
> > than a separate sort field.  The disk space required would probably be
> > the same either way, but your interaction with the index will not be as
> > complex.  There's nothing wrong with doing it the way you have, though.
> >
> > I'm not at all an expert, but I've been a member of this community for a
> > long time.  Here's my guess about why your query is faster in the q
> > parameter than a non-cached filter:
> >
> > The result of a standard query is the stored fields from the top N
> > documents, where N is the value in the rows parameter.  The default for
> > N is typically set to 10, and for most people will normally be 200 or
> less.
> >
> > The result of a filter is very different -- it is a bitset of all the
> > documents in your entire index, with binary 0 for documents that don't
> > match the filter and binary 1 for documents that do match.
> >
> > If your index has 100 million documents, every single one of those 100
> > million documents must be checked against the filter query to produce a
> > filter bitset, but when it's in the q parameter, shortcuts can be taken
> > which will get the top N results quickly.
> >
> > The filterCache levels the playing field when filters are re-used.  If a
> > requested filter is already in the cache, it can be retrieved and
> > applied to a result VERY quickly.
> >
> > You have turned off the caching for your filter.  I'm not sure why you
> > did this, but you know your use case a lot better than I do.  If it were
> > me, I would use filter queries and do everything possible to re-use the
> > same filters, and I would cache them.
> >
> > Thanks,
> > Shawn
> >
> >
>


solr help

2015-06-24 Thread Seunghun . Han
Hi,
I am new to Solr and am learning Solr 5.2.1. I am using Windows without a 
servlet container (using post.jar).
I have managed to index some files and tried searching at 
http://localhost:8983/solr/test/browse

I have a few questions.
1. Can I modify "browse" to show a little of the content that matched my search?
e.g. searching for: apple
result: this apple is red...
2. Can I index JAR or Java files?
I have indexed a JAR file by converting it to text form using Tika, but 
I wonder whether Solr can do this itself, because I have a bunch of JAR files.
3. Can Solr update the index automatically
if I change the content of the test folder (which I have indexed)?

I have done several searches but am still having trouble.
Thank you!

Seunghun Han

IT Intern | LeasePlan USA 
Direct: 678-202-8860 | Toll-free: 800 457 8721 (ext. 8860)
seunghun@leaseplan.com 
1165 Sanctuary Parkway | Alpharetta, GA 30009
www.us.leaseplan.com

It's easier to leaseplan 




Re: fq versus q

2015-06-24 Thread Shai Erera
Ah thanks. I see it was added in 5.1 - is there any other way prior to that
(like 4.7)?

If not, I guess the only option is to not use fq if we don't intend to
cache it, and on 5.1 use the ^= syntax.
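
To make the two options concrete, here is a minimal sketch that only builds
the request parameters with Python's standard library. The field name
maildate and the brace-style range come from this thread; the helper names
and the date bounds are made up for illustration, and I am assuming the ^=
operator can be attached directly to a range clause (wrapping the clause in
parentheses first may be safer). It has not been run against a real Solr.

```python
from urllib.parse import urlencode

def range_clause(field, lower, upper):
    # Exclusive range, written with braces as in the thread's example.
    return field + ":{" + lower + " TO " + upper + "}"

def constant_score_params(field, lower, upper, rows=50):
    # Solr 5.1+ (SOLR-7218): ^=1 requests a constant score of 1 for the clause.
    return {"q": range_clause(field, lower, upper) + "^=1", "rows": rows}

def plain_query_params(field, lower, upper, rows=50):
    # Pre-5.1 fallback discussed above: keep the clause in q, scored normally.
    return {"q": range_clause(field, lower, upper), "rows": rows}

if __name__ == "__main__":
    # Hypothetical bounds, only for illustration.
    lower, upper = "2015-06-01T00:00:00Z", "2015-06-24T00:00:00Z"
    print(urlencode(constant_score_params("maildate", lower, upper)))
    print(urlencode(plain_query_params("maildate", lower, upper)))
```

The only difference between the two requests is the ^=1 suffix, so switching
between them after an upgrade should be a one-line change on the client side.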

Shai

On Wed, Jun 24, 2015 at 9:21 PM, Jack Krupansky 
wrote:

> Yonik added syntax to request a constant score query in Solr with the ^=
> operator.
>
> For example: +color:blue^=1 text:shoes
>
> See:
> https://issues.apache.org/jira/browse/SOLR-7218
>
> -- Jack Krupansky
>
> On Wed, Jun 24, 2015 at 1:41 PM, Shai Erera  wrote:
>
> > Thanks Shawn,
> >
> > What's Solr equivalence to ConstantScoreQuery? I.e., what if you want to
> > run a query that does not score, but only filter. The rationale behind
> > using a non-cached 'fq' was just that.
> >
> > Shai
> >
> > On Wed, Jun 24, 2015 at 4:29 PM, Shawn Heisey 
> wrote:
> >
> > > On 6/24/2015 5:28 AM, Esther Goldbraich wrote:
> > > > We are comparing the performance of fq versus q for queries that are
> > > > actually filters and should not be cached.
> > > > In part of queries we see strange behavior where q performs 5-10x
> > better
> > > > than fq. The question is why?
> > > >
> > > > An example1:
> > > > q=maildate:{DATE1 to DATE2} COMPARED TO
> > fq={!cache=false}maildate:{DATE1
> > > > to DATE2}
> > > > sort=maildate_sort* desc
> > >
> > > 
> > >
> > > > 
> > > >  type="tdate"
> > > > docValues="true"/>
> > >
> > > For simplicity, I would probably just use one field for that, rather
> > > than a separate sort field.  The disk space required would probably be
> > > the same either way, but your interaction with the index will not be as
> > > complex.  There's nothing wrong with doing it the way you have, though.
> > >
> > > I'm not at all an expert, but I've been a member of this community for
> a
> > > long time.  Here's my guess about why your query is faster in the q
> > > parameter than a non-cached filter:
> > >
> > > The result of a standard query is the stored fields from the top N
> > > documents, where N is the value in the rows parameter.  The default for
> > > N is typically set to 10, and for most people will normally be 200 or
> > less.
> > >
> > > The result of a filter is very different -- it is a bitset of all the
> > > documents in your entire index, with binary 0 for documents that don't
> > > match the filter and binary 1 for documents that do match.
> > >
> > > If your index has 100 million documents, every single one of those 100
> > > million documents must be checked against the filter query to produce a
> > > filter bitset, but when it's in the q parameter, shortcuts can be taken
> > > which will get the top N results quickly.
> > >
> > > The filterCache levels the playing field when filters are re-used.  If
> a
> > > requested filter is already in the cache, it can be retrieved and
> > > applied to a result VERY quickly.
> > >
> > > You have turned off the caching for your filter.  I'm not sure why you
> > > did this, but you know your use case a lot better than I do.  If it
> were
> > > me, I would use filter queries and do everything possible to re-use the
> > > same filters, and I would cache them.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>


Re: Term Vector and Optimization

2015-06-24 Thread sudeepgarg
Hi,

can someone help me in this regard?


Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Term-Vector-and-Optimization-tp4213647p4213732.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Term Vector and Optimization

2015-06-24 Thread Upayavira


On Wed, Jun 24, 2015, at 08:41 PM, sudeepgarg wrote:
> Hi,
> 
> can someone help me in this regard?


What additional help do you need?

Upayavira


Re: fq versus q

2015-06-24 Thread Upayavira
Are you wanting to do no scoring at all, or just have a portion of the
query not contribute to the score?

If you don't want scoring at all, just sort by another field. If you
don't have a field, I just tried "&sort=1 desc", and it worked! This
should, if I'm right, pull documents out of the index in index order.

Upayavira

On Wed, Jun 24, 2015, at 08:26 PM, Shai Erera wrote:
> Ah thanks. I see it was added in 5.1 - is there any other way prior to
> that
> (like 4.7)?
> 
> if not, I guess the only option is to not use fq if we don't intend to
> cache it, and on 5.1 use the ^= syntax.
> 
> Shai
> 
> On Wed, Jun 24, 2015 at 9:21 PM, Jack Krupansky
> 
> wrote:
> 
> > Yonik added syntax to request a constant score query in Solr with the ^=
> > operator.
> >
> > For example: +color:blue^=1 text:shoes
> >
> > See:
> > https://issues.apache.org/jira/browse/SOLR-7218
> >
> > -- Jack Krupansky
> >
> > On Wed, Jun 24, 2015 at 1:41 PM, Shai Erera  wrote:
> >
> > > Thanks Shawn,
> > >
> > > What's Solr equivalence to ConstantScoreQuery? I.e., what if you want to
> > > run a query that does not score, but only filter. The rationale behind
> > > using a non-cached 'fq' was just that.
> > >
> > > Shai
> > >
> > > On Wed, Jun 24, 2015 at 4:29 PM, Shawn Heisey 
> > wrote:
> > >
> > > > On 6/24/2015 5:28 AM, Esther Goldbraich wrote:
> > > > > We are comparing the performance of fq versus q for queries that are
> > > > > actually filters and should not be cached.
> > > > > In part of queries we see strange behavior where q performs 5-10x
> > > better
> > > > > than fq. The question is why?
> > > > >
> > > > > An example1:
> > > > > q=maildate:{DATE1 to DATE2} COMPARED TO
> > > fq={!cache=false}maildate:{DATE1
> > > > > to DATE2}
> > > > > sort=maildate_sort* desc
> > > >
> > > > 
> > > >
> > > > > 
> > > > >  > type="tdate"
> > > > > docValues="true"/>
> > > >
> > > > For simplicity, I would probably just use one field for that, rather
> > > > than a separate sort field.  The disk space required would probably be
> > > > the same either way, but your interaction with the index will not be as
> > > > complex.  There's nothing wrong with doing it the way you have, though.
> > > >
> > > > I'm not at all an expert, but I've been a member of this community for
> > a
> > > > long time.  Here's my guess about why your query is faster in the q
> > > > parameter than a non-cached filter:
> > > >
> > > > The result of a standard query is the stored fields from the top N
> > > > documents, where N is the value in the rows parameter.  The default for
> > > > N is typically set to 10, and for most people will normally be 200 or
> > > less.
> > > >
> > > > The result of a filter is very different -- it is a bitset of all the
> > > > documents in your entire index, with binary 0 for documents that don't
> > > > match the filter and binary 1 for documents that do match.
> > > >
> > > > If your index has 100 million documents, every single one of those 100
> > > > million documents must be checked against the filter query to produce a
> > > > filter bitset, but when it's in the q parameter, shortcuts can be taken
> > > > which will get the top N results quickly.
> > > >
> > > > The filterCache levels the playing field when filters are re-used.  If
> > a
> > > > requested filter is already in the cache, it can be retrieved and
> > > > applied to a result VERY quickly.
> > > >
> > > > You have turned off the caching for your filter.  I'm not sure why you
> > > > did this, but you know your use case a lot better than I do.  If it
> > were
> > > > me, I would use filter queries and do everything possible to re-use the
> > > > same filters, and I would cache them.
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > > >
> > >
> >


Re: Term Vector and Optimization

2015-06-24 Thread sudeepgarg
I would like to confirm that merging new segments with old segments that
were generated with term vectors=true won't cause any trouble, i.e. index
corruption or index mismatch, and that new segments will be merged with old
segments irrespective of whether we have disabled the term vector feature.
We are fine with the old data staying in the index. We are just trying to
disable the term vector feature in our environment as we don't require it
any more.

Is what I understood correct? Please confirm.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Term-Vector-and-Optimization-tp4213647p4213755.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Term Vector and Optimization

2015-06-24 Thread Upayavira


On Wed, Jun 24, 2015, at 10:51 PM, sudeepgarg wrote:
>  I would like to ask that after merging with old segments which was
>  generated
> with term vectors=true this won't cause any trouble i.e. index corruption
> or
> index mismatch. and new segments will be merge with old segments
> irrespective we have disable the term feature or not. 
> And we are fine with this that data stay in index. We are just try to
> disable term vector feature in our environment as we don't require it any
> more. 
> 
> Is this correct what i understood please confirm this?

I would expect that this will be absolutely fine. What I'd suggest is
simply that you try it. You'd know pretty quickly if it were to cause
you issues.

Upayavira


Re: Sorting documents by nested / child docs with FunctionQueries

2015-06-24 Thread Mikhail Khludnev
Just pulled and launched Solr 5.2.1

Dropped a multivalued child in; see the data below. The response is quite correct:

"id":"22", "COLOR_s":"Blue","SIZE_ss":["XL","XXL"]}]}]

http://localhost:8983/solr/solr/select?q={!parent+which%3Dtype_s%3Aparent}%2BCOLOR_s%3ABlue+%2BSIZE_ss%3AXL&fl=id%2C[child+parentFilter%3Dtype_s%3Aparent+childFilter%3D-type_s%3Aparent+limit%3D100]&wt=json&indent=true

{
  "responseHeader":{
"status":0,
"QTime":3,
"params":{
  "q":"{!parent which=type_s:parent}+COLOR_s:Blue +SIZE_ss:XL",
  "indent":"true",
  "fl":"id,[child parentFilter=type_s:parent
childFilter=-type_s:parent limit=100]",
  "wt":"json"}},
  "response":{"numFound":2,"start":0,"docs":[
  {
"id":"10",
"_childDocuments_":[
{
  "id":"11",
  "COLOR_s":"Red",
  "SIZE_ss":["XL"]},
{
  "id":"12",
  "COLOR_s":"Blue",
  "SIZE_ss":["XL"]}]},
  {
"id":"20",
"_childDocuments_":[
{
  "id":"21",
  "COLOR_s":"Red",
  "SIZE_ss":["M"]},
{
  "id":"22",
  "COLOR_s":"Blue",
  "SIZE_ss":["XL",
"XXL"]}]}]
  }}
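
If you want to check this on your own index without reading the JSON by eye,
a small sketch like the following could help. It walks a response shaped like
the one above and collects the values returned for one child field. The
function name is made up, and the sample is a trimmed copy of the response
above, not fresh output.

```python
import json

def child_field_values(response, field):
    """Map each child document id to the list of values returned for `field`."""
    values = {}
    for parent in response["response"]["docs"]:
        # Parents without children simply have no _childDocuments_ key.
        for child in parent.get("_childDocuments_", []):
            values[child["id"]] = child.get(field, [])
    return values

# Trimmed copy of the response shown above (parent 20 only).
SAMPLE = json.loads("""
{"response": {"numFound": 2, "start": 0, "docs": [
  {"id": "20", "_childDocuments_": [
    {"id": "21", "COLOR_s": "Red", "SIZE_ss": ["M"]},
    {"id": "22", "COLOR_s": "Blue", "SIZE_ss": ["XL", "XXL"]}]}]}}
""")

if __name__ == "__main__":
    for child_id, sizes in sorted(child_field_values(SAMPLE, "SIZE_ss").items()):
        print(child_id, sizes)
```

Comparing this mapping against what a direct query by id returns for the same
child should show quickly whether any values go missing in the block join
response.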


  *:*
  

  10
  parent
  Nike
  
11
Red
XL
  
  
12
Blue
XL
  


  20
  parent
  Nike
  
21
Red
M
  
  
22
Blue
XL
XXL
  


  30
  parent
  Puma
  
31
Red
XL
  
  
32
Blue
M
  

  
  




On Mon, Jun 22, 2015 at 2:04 PM, Maya G  wrote:

> I've tried your solution and encountered a problem.
>
> My child document has a multi-valued field.
>
> When I query the doc by its' guid, all of the field's values are returned.
> When I use the join block query only one value is returned for the
> multi-value field.
>
> Do you have any suggestions?
> Thank you,
> Maya
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Sorting-documents-by-nested-child-docs-with-FunctionQueries-tp4209940p4213242.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Implicit Router Configurations

2015-06-24 Thread Erick Erickson
There's also a nifty plugin for IntelliJ that'll allow you to edit files on ZK
if you use that IDE.
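
If you'd rather script the edit between the getfile and putfile steps quoted
below, something along these lines might do. This is only a sketch: I am
assuming clusterstate.json is a JSON object keyed by collection name with a
"router" object per collection, the collection name "users" is hypothetical,
and the function only rewrites text that you have already fetched with
zkcli. As said earlier in the thread, have all Solr nodes down while doing
this.

```python
import json

def set_router(clusterstate_text, collection, name="implicit", field=None):
    """Return clusterstate JSON text with the router of one collection replaced."""
    state = json.loads(clusterstate_text)
    router = {"name": name}
    if field is not None:
        router["field"] = field
    # Assumed layout: {"<collection>": {"shards": {...}, "router": {...}}}
    state[collection]["router"] = router
    return json.dumps(state, indent=2)

if __name__ == "__main__":
    # Hypothetical state as regenerated with the default router.
    before = '{"users": {"shards": {}, "router": {"name": "compositeId"}}}'
    print(set_router(before, "users", name="implicit", field="user"))
```

Write the result over the local clusterstate.json and push it back with
-cmd putfile.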

On Wed, Jun 24, 2015 at 4:51 AM, Upayavira  wrote:
> You can use the server/scripts/cloud-scripts/zkcli.sh script (or the cmd
> one) in server/scripts/cloud-scripts. Note, in older versions this is in
> example/scripts/cloud-scripts.
>
> I just used this command to get the file from zookeeper:
>
> server/scripts/cloud-scripts/zkcli.sh -z localhost:9983 -cmd getfile
> /clusterstate.json clusterstate.json
>
> You can use -cmd putfile to push it back to Zookeeper. As Erick says,
> have all nodes on your cluster down at the time. And as Erick says, this
> is not something that people are recommended to be doing generally.
>
> Upayavira
>
> On Wed, Jun 24, 2015, at 07:54 AM, Arnon Yogev wrote:
>> Thank you Erick,
>>
>> What is the recommended way to manually change clusterstate.json?
>> Is there a java code \ script way of editing a file in ZK?
>>
>> Best,
>> Arnon
>>
>>
>>
>> From:   Erick Erickson 
>> To: solr-user@lucene.apache.org
>> Date:   23/06/2015 09:09 PM
>> Subject:Re: Implicit Router Configurations
>>
>>
>>
>> Please raise a JIRA for this, I can see why this would occur.
>> You can manually change the clusterstate.json file when this
>> happens as a stop-gap, I'd have all the Solr instances down
>> when doing this though.
>>
>> Best,
>> Erick
>>
>>
>>
>> On Tue, Jun 23, 2015 at 8:19 AM, Arnon Yogev  wrote:
>> > We have a use case where documents are indexed in shards according to a
>> > specific field (shard per user), and the number of shards is unknown
>> when
>> > creating the collection.
>> > For that purpose we use the implicit router and define
>> router.field=user.
>> >
>> > From what we've seen, the only way to define an implicit router is
>> during
>> > the collection creation.
>> > Moreover, the router definitions (router.name and router.field) are kept
>> > only in clusterstate.json and not in any solr configuration file on
>> disk.
>> >
>> > In some cases solr state becomes inconsistent and we need to delete the
>> > configs from ZK and restart the solr server. The behavior we see is the
>> > new clusterstate.json generated by solr on startup has the default
>> > router.name=compositeId, which is not what we defined during creation.
>> >
>> > Are we missing something? Is there a place to configure the implicit
>> > router on disk such that it will be persistent?
>> >
>> > Thanks,
>> > Arnon
>> >
>> >
>>
>>


Re: solr help

2015-06-24 Thread Erick Erickson
See inline.

On Wed, Jun 24, 2015 at 1:49 PM,  wrote:

> Hi
> I am new and learning Solr-5.2.1. I am using windows without
> servlet.(using post.jar)
> i have manage to index some files and try searching on
> http://localhost:8983/solr/test/browse
>
> I have few question to ask.
> 1. Can i modify "browse" to show little bit of the content that i search?
> ex) searching apple
> results)  this apple is red...
>

Sure, just look in the velocity subdirectory and modify the appropriate .vm
files.


> 2. can i index JAR or JAVA file?
> i have indexed jar file by changing it to txt form using Tika but
> i wonder if solr can do it because i have bunch of jar files.
>

Sure, to the extent that Tika supports it, which it does. You'll have to
familiarize yourself with what Tika actually does with these types of files.


> 3.Can solr update index automatically?
> if i change content of test folder( that i have indexed)
>
>
No. You'll have to write something that monitors or periodically checks the
files and re-indexes.
There are various commercial products that will handle this.
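
As a rough idea of what "something that monitors" could look like, here is a
minimal polling sketch using only the Python standard library. It compares
modification times between two scans of a folder; the function names are
made up, and the re-indexing step itself is left out on purpose since it
depends on how you post documents (post.jar in your case).

```python
import os

def snapshot(root):
    """Return {path: modification time} for every file under root."""
    seen = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            seen[path] = os.path.getmtime(path)
    return seen

def changed_files(old, new):
    """Files that are new, or whose mtime differs from the previous snapshot."""
    return sorted(p for p, mtime in new.items() if old.get(p) != mtime)

def removed_files(old, new):
    """Files that were present in the previous snapshot but are gone now."""
    return sorted(p for p in old if p not in new)
```

A scheduled job could keep the previous snapshot, take a new one every few
minutes, re-post whatever changed_files reports and delete from the index
whatever removed_files reports.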

Best,
Erick


> i have done several searches but still having trouble.
> Thankyou!
>  Seunghun Han
>
> IT Intern | LeasePlan USA
> Direct: 678-202-8860 | Toll-free: 800 457 8721 (ext. 8860)
> seunghun@leaseplan.com
> 1165 Sanctuary Parkway | Alpharetta, GA 30009
> *www.us.leaseplan.com*
>
> It's easier to leaseplan
>
>
>


Re: fq versus q

2015-06-24 Thread Erick Erickson
Tell us a bit more about your test setup. One or two tests
don't mean much. For instance, if the fq query has to
load the low-level caches from disk, and then the q-only
query is run and doesn't, that could skew the results.
Or if somehow you're hitting the queryResultCache. Or...

Frankly I'd disable all my caches for running tests like
this, and be sure to mix-n-match the tests so I wasn't
getting bitten by caches.

And please tell us what the actual numbers are. 5-10X
doesn't mean much at all if it's 25ms vs. 5ms. It means
a lot (and something's very wrong) if it means
200ms vs. 1,000ms.
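
Something like the following is what I mean by mixing the tests. It is only
a sketch: run_query is a hypothetical callable that you would write to send
the parameters to Solr and return the elapsed milliseconds, and the two
variants assume the main query is *:* in the fq case. The stand-in used in
the demo returns fixed numbers so the harness can be tried without a server.

```python
import random
import statistics

def compare_variants(variants, run_query, rounds=20, seed=0):
    """Run every variant once per round, in shuffled order; return median ms."""
    rng = random.Random(seed)
    names = list(variants)
    timings = {name: [] for name in names}
    for _ in range(rounds):
        rng.shuffle(names)  # interleave so no variant always runs on warm caches
        for name in names:
            timings[name].append(run_query(variants[name]))
    return {name: statistics.median(ms) for name, ms in timings.items()}

VARIANTS = {
    "q": {"q": "maildate:{DATE1 TO DATE2}"},
    "fq": {"q": "*:*", "fq": "{!cache=false}maildate:{DATE1 TO DATE2}"},
}

def fake_run_query(params):
    # Stand-in for a real request; the numbers are invented.
    return 25.0 if "fq" in params else 5.0

if __name__ == "__main__":
    print(compare_variants(VARIANTS, fake_run_query))
```

Reporting the medians in milliseconds, rather than a ratio, makes it much
easier to tell whether the difference is worth chasing.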

Best,
Erick

On Wed, Jun 24, 2015 at 5:30 PM, Upayavira  wrote:
> Are you wanting to do no scoring at all, or just have a portion of the
> query not contribute to the score?
>
> If you don't want scoring at all, just sort by another field. If you
> don't have a field, I just tried "&sort=1 desc", and it worked! This
> should, if I'm right, pull documents out of the index in index order.
>
> Upayavira
>
> On Wed, Jun 24, 2015, at 08:26 PM, Shai Erera wrote:
>> Ah thanks. I see it was added in 5.1 - is there any other way prior to
>> that
>> (like 4.7)?
>>
>> if not, I guess the only option is to not use fq if we don't intend to
>> cache it, and on 5.1 use the ^= syntax.
>>
>> Shai
>>
>> On Wed, Jun 24, 2015 at 9:21 PM, Jack Krupansky
>> 
>> wrote:
>>
>> > Yonik added syntax to request a constant score query in Solr with the ^=
>> > operator.
>> >
>> > For example: +color:blue^=1 text:shoes
>> >
>> > See:
>> > https://issues.apache.org/jira/browse/SOLR-7218
>> >
>> > -- Jack Krupansky
>> >
>> > On Wed, Jun 24, 2015 at 1:41 PM, Shai Erera  wrote:
>> >
>> > > Thanks Shawn,
>> > >
>> > > What's Solr equivalence to ConstantScoreQuery? I.e., what if you want to
>> > > run a query that does not score, but only filter. The rationale behind
>> > > using a non-cached 'fq' was just that.
>> > >
>> > > Shai
>> > >
>> > > On Wed, Jun 24, 2015 at 4:29 PM, Shawn Heisey 
>> > wrote:
>> > >
>> > > > On 6/24/2015 5:28 AM, Esther Goldbraich wrote:
>> > > > > We are comparing the performance of fq versus q for queries that are
>> > > > > actually filters and should not be cached.
>> > > > > In part of queries we see strange behavior where q performs 5-10x
>> > > better
>> > > > > than fq. The question is why?
>> > > > >
>> > > > > An example1:
>> > > > > q=maildate:{DATE1 to DATE2} COMPARED TO
>> > > fq={!cache=false}maildate:{DATE1
>> > > > > to DATE2}
>> > > > > sort=maildate_sort* desc
>> > > >
>> > > > 
>> > > >
>> > > > > 
>> > > > > > > type="tdate"
>> > > > > docValues="true"/>
>> > > >
>> > > > For simplicity, I would probably just use one field for that, rather
>> > > > than a separate sort field.  The disk space required would probably be
>> > > > the same either way, but your interaction with the index will not be as
>> > > > complex.  There's nothing wrong with doing it the way you have, though.
>> > > >
>> > > > I'm not at all an expert, but I've been a member of this community for
>> > a
>> > > > long time.  Here's my guess about why your query is faster in the q
>> > > > parameter than a non-cached filter:
>> > > >
>> > > > The result of a standard query is the stored fields from the top N
>> > > > documents, where N is the value in the rows parameter.  The default for
>> > > > N is typically set to 10, and for most people will normally be 200 or
>> > > less.
>> > > >
>> > > > The result of a filter is very different -- it is a bitset of all the
>> > > > documents in your entire index, with binary 0 for documents that don't
>> > > > match the filter and binary 1 for documents that do match.
>> > > >
>> > > > If your index has 100 million documents, every single one of those 100
>> > > > million documents must be checked against the filter query to produce a
>> > > > filter bitset, but when it's in the q parameter, shortcuts can be taken
>> > > > which will get the top N results quickly.
>> > > >
>> > > > The filterCache levels the playing field when filters are re-used.  If
>> > a
>> > > > requested filter is already in the cache, it can be retrieved and
>> > > > applied to a result VERY quickly.
>> > > >
>> > > > You have turned off the caching for your filter.  I'm not sure why you
>> > > > did this, but you know your use case a lot better than I do.  If it
>> > were
>> > > > me, I would use filter queries and do everything possible to re-use the
>> > > > same filters, and I would cache them.
>> > > >
>> > > > Thanks,
>> > > > Shawn
>> > > >
>> > > >
>> > >
>> >


Re: fq versus q

2015-06-24 Thread Yonik Seeley
Why is cache=false set for the filter?
Grouping uses a 2 pass algorithm by default, so that means that the
filter will need to be generated twice (I think) if caching is turned
off.

Also, when you try to use the "fq" version, what are you using for the
main query?

-Yonik


On Wed, Jun 24, 2015 at 7:28 AM, Esther Goldbraich
 wrote:
> Hi,
>
> We are comparing the performance of fq versus q for queries that are
> actually filters and should not be cached.
> In part of queries we see strange behavior where q performs 5-10x better
> than fq. The question is why?
>
> An example1:
> q=maildate:{DATE1 to DATE2} COMPARED TO fq={!cache=false}maildate:{DATE1
> to DATE2}
> sort=maildate_sort* desc
> rows=50
> start=0
> group=true
> group.query=some query (without dates)
> group.query=*:*
> group.sort=maildate_sort desc
> additional fqs
>
> Schema:
> 
>  docValues="true"/>
>
> Thank you,
> Esther
> -
> Esther Goldbraich
> Social Technologies & Analytics - IBM Haifa Research Lab
> Phone: +972-4-8281059


Re: fq versus q

2015-06-24 Thread Esther Goldbraich
cache=false is set because the use case requires distinct time ranges, so
there is no reuse.
When using fq, q is set to *:*.
Are there any alternatives to the grouping algorithm?
If not, is there a way to reuse filter results between the 2 passes?

Thank you,
Esther



From:
Yonik Seeley 
To:
"solr-user@lucene.apache.org" 
Cc:
Arnon Yogev/Haifa/IBM@IBMIL, Shai Erera/Haifa/IBM@IBMIL
Date:
25/06/2015 02:50 AM
Subject:
Re: fq versus q



Why is cache=false set for the filter?
Grouping uses a 2 pass algorithm by default, so that means that the
filter will need to be generated twice (I think) if caching is turned
off.

Also, when you try to use the "fq" version, what are you using for the
main query?

-Yonik


On Wed, Jun 24, 2015 at 7:28 AM, Esther Goldbraich
 wrote:
> Hi,
>
> We are comparing the performance of fq versus q for queries that are
> actually filters and should not be cached.
> In part of queries we see strange behavior where q performs 5-10x better
> than fq. The question is why?
>
> An example1:
> q=maildate:{DATE1 to DATE2} COMPARED TO fq={!cache=false}maildate:{DATE1
> to DATE2}
> sort=maildate_sort* desc
> rows=50
> start=0
> group=true
> group.query=some query (without dates)
> group.query=*:*
> group.sort=maildate_sort desc
> additional fqs
>
> Schema:
> 
>  docValues="true"/>
>
> Thank you,
> Esther
> -
> Esther Goldbraich
> Social Technologies & Analytics - IBM Haifa Research Lab
> Phone: +972-4-8281059





Re: Sorting documents by nested / child docs with FunctionQueries

2015-06-24 Thread Mikhail Khludnev
no way. it's SOLR-6096 aka SOLR-6700

On Thu, Jun 25, 2015 at 9:16 AM, מאיה גלעד  wrote:

> Hey
> Your example works on my cloud but my problem wasn't resolved.
>
> I've checked and found the following:
> 1. When a child is created with multiple values, it can be queried correctly
> with the URL you've given me.
> 2. If you add a new value to a field in an existing child, it isn't
> returned in the parent-child query but can be queried individually.
>
> Thank you,
> Maya
> Just pulled and launched Solr 5.2.1
>
> dropped multivalued child into see data below. Response is quite correct:
>
> "id":"22", "COLOR_s":"Blue","SIZE_ss":["XL","XXL"]}]}]
>
>
> http://localhost:8983/solr/solr/select?q={!parent+which%3Dtype_s%3Aparent}%2BCOLOR_s%3ABlue+%2BSIZE_ss%3AXL&fl=id%2C[child+parentFilter%3Dtype_s%3Aparent+childFilter%3D-type_s%3Aparent+limit%3D100]&wt=json&indent=true
> 
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":3,
> "params":{
>   "q":"{!parent which=type_s:parent}+COLOR_s:Blue +SIZE_ss:XL",
>   "indent":"true",
>   "fl":"id,[child parentFilter=type_s:parent childFilter=-type_s:parent 
> limit=100]",
>   "wt":"json"}},
>   "response":{"numFound":2,"start":0,"docs":[
>   {
> "id":"10",
> "_childDocuments_":[
> {
>   "id":"11",
>   "COLOR_s":"Red",
>   "SIZE_ss":["XL"]},
> {
>   "id":"12",
>   "COLOR_s":"Blue",
>   "SIZE_ss":["XL"]}]},
>   {
> "id":"20",
> "_childDocuments_":[
> {
>   "id":"21",
>   "COLOR_s":"Red",
>   "SIZE_ss":["M"]},
> {
>   "id":"22",
>   "COLOR_s":"Blue",
>   "SIZE_ss":["XL",
> "XXL"]}]}]
>   }}
>
> 
>   *:*
>   
> 
>   10
>   parent
>   Nike
>   
> 11
> Red
> XL
>   
>   
> 12
> Blue
> XL
>   
> 
> 
>   20
>   parent
>   Nike
>   
> 21
> Red
> M
>   
>   
> 22
> Blue
> XL
> XXL
>   
> 
> 
>   30
>   parent
>   Puma
>   
> 31
> Red
> XL
>   
>   
> 32
> Blue
> M
>   
> 
>   
>   
> 
>
>
>
> On Mon, Jun 22, 2015 at 2:04 PM, Maya G  wrote:
>
>> I've tried your solution and encountered a problem.
>>
>> My child document has a multi-valued field.
>>
>> When I query the doc by its' guid, all of the field's values are returned.
>> When I use the join block query only one value is returned for the
>> multi-value field.
>>
>> Do you have any suggestions?
>> Thank you,
>> Maya
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Sorting-documents-by-nested-child-docs-with-FunctionQueries-tp4209940p4213242.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: How to do a Data sharding for data in a database table

2015-06-24 Thread William Bell
1GB is too small a starting heap. Try setting the same value for both:

 -Xms8196m -Xmx8196m

We use 12GB for these on a similarly sized index and it works well.

Send schema.xml and solrconfig.xml.

Try to store as few fields as possible.

On Wed, Jun 24, 2015 at 8:08 AM, wwang525  wrote:

> Hi All,
>
> I built the Solr index with 14 M records.
>
> I have > 20 G RAM in my local machine, and the Solr instance was started
> with -Xms1024m -Xmx8196m
>
> The following query:
>
>
> http://localhost:8983/solr/db-mssql/select?q=*:*&fq=GatewayCode:(YYZ)&fq=DestCode:(CUN)&fq=Duration:(5
> OR 6 OR 7 OR 8)&fq=DateDep:([20150610 TO
>
> 20150810])&facet=true&facet.field=DestCode&facet.field=DateDep&facet.field=GatewayCode&facet.field=HotelName&facet.sort=count&facet.limit=40&facet.mincount=1&rows=30&group=true&group.field=HotelCode&group.ngroups=true&group.facet=true&debugQuery=true
>
> The response found a total matched base records of 98105, these records
> were
> grouped at hotelcode level to give the ngroups: 143, however, the query
> only
> retrieve the first base record of each group, and only 30 groups were
> retrieved.
>
> The performance statistics:
>
> Total response time in solr.log: 1791 ms
> From the query response page: the query took 764 ms and facet took 1007 ms.
> Debug took 13 ms
>
> This is a typical query that business need. Previously, I was testing the
> data size of 6 M and no faceted search, the typical response time at single
> request scenario was around 200 ms.
>
> Please let me know if additional information is needed.
>
> Thanks
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4213648.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076