Re: xml indexing
Thanks for your reply, but it only works when I get no response. As I said, I'm working on arrays. As soon as I get an array, it doesn't matter whether the array's length is 1 or 105, it returns what I described earlier.

#1 JSON response: "detailComment": [ "100.01", null, "102.01", null ]
#1 indexed as: "detailComment": [ "100.01", "102.01" ]

#2 JSON response: "detailComment": [ null ]
#2 indexed as: "detailComment": [ "0.0" ]

The result I want to see:

#3 JSON response: "detailComment": [ "100.01", null, "102.01", null ]
#3 indexed as: "detailComment": [ "100.01", "0.0", "102.01", "0.0" ]

(Attachment references stripped by the mail archive: detailComment, 0.0, dih-config.xml, upd.) -- View this message in context: http://lucene.472066.n3.nabble.com/xml-indexing-tp4344191p4344298.html Sent from the Solr - User mailing list archive at Nabble.com.
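For reference, one way to get the #3 output above is a small custom DIH transformer that keeps the null array elements and substitutes the default instead of dropping them. This is only a sketch under the assumption that a custom Java transformer is acceptable; the class name and the hard-coded column name are placeholders, and it would have to be registered on the entity in dih-config.xml (transformer="com.example.dih.NullToDefaultTransformer").

package com.example.dih;

import java.util.List;
import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// Replaces null elements of a multi-valued column with "0.0" so positions are preserved.
public class NullToDefaultTransformer extends Transformer {
  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Object value = row.get("detailComment");          // placeholder column name
    if (value instanceof List) {
      @SuppressWarnings("unchecked")
      List<Object> values = (List<Object>) value;
      for (int i = 0; i < values.size(); i++) {
        if (values.get(i) == null) {
          values.set(i, "0.0");                       // substitute the default, keep the position
        }
      }
    }
    return row;
  }
}

(This assumes the entity processor hands the transformer a mutable list; if not, build a new list and put it back into the row map.)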
Re: Strange boolean query behaviour on 5.5.4
On 04/07/17 18:10, Erick Erickson wrote: > I think you'll get what you expect by something like: > (*:* -someField:Foo) AND (otherField: (Bar OR Baz)) Yeah that's what I figured. It's not a big deal since we generate Solr syntax using a parser/generator on top of our own query syntax. Still a little strange! Thanks for the heads up, - Bram
Re: Did /export use to emit tuples and now does not?
Thanks, Joel. I just wanted to confirm, as I was having trouble tracking down when the change occurred. -R On 04/07/2017, 23:51, "Joel Bernstein" wrote: In the very early releases (5x) the /export handler had a different format than the /search handler. Later the /export handler was changed to have the same basic response format as the /search handler. This was done in anticipation of unifying /search and /export at a later date. The /export handler still powers the parallel relational algebra expressions. In Solr 7.0 there is a shuffle expression that always uses the /export handler to sort and partition result sets. In 6x the search expression can be used with the qt=/export param to use the /export handler. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Jul 4, 2017 at 11:38 AM, Ronald Wood wrote: > 9 months ago I did a proof of concept for Solr streaming using the /export > handler. At that time, I got tuples back. > > Now when I try 6.x, I get results in a format similar to /search > (including a count), instead of tuples (with an EOF). > > Did something change between 5.x and 6.x in this regard? > > I am trying to stream results in a non-cloud scenario, and I was under the > impression that /export was the primitive handler for the more advanced > streaming operations only possible under Solr Cloud. > > I am using official docker images for testing. I tried to retest under > 5.5.4 but I need to do some more work as docValues aren’t the default when > using the gettingstarted index. > > -Ronald Wood > >
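For anyone comparing the handlers, here is a rough SolrJ sketch of reading tuples from /export in 6.x (assuming a 6.4+ SolrJ client, ZooKeeper on localhost:9983 and an id field with docValues, since /export can only sort on and return docValues fields); the qt=/export parameter is what forces the export handler:

import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
import org.apache.solr.common.params.ModifiableSolrParams;

public class ExportReadExample {
  public static void main(String[] args) throws Exception {
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", "*:*");
    params.set("fl", "id");
    params.set("sort", "id asc");
    params.set("qt", "/export");              // use the /export handler instead of /select

    CloudSolrStream stream = new CloudSolrStream("localhost:9983", "gettingstarted", params);
    try {
      stream.open();
      while (true) {
        Tuple tuple = stream.read();
        if (tuple.EOF) {                      // streams terminate with an EOF tuple
          break;
        }
        System.out.println(tuple.getString("id"));
      }
    } finally {
      stream.close();
    }
  }
}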
High disk write usage
Hi, We are implementing a SolrCloud cluster (version 6.6) with NRT requirements. We are indexing 600 docs/sec with peaks of 1500 docs/sec, and we are serving about 1500 qps. Our documents have 300 fields, some with docValues, and are about 4 KB each; we have 3 million documents. Hard commit is set to 15 minutes, but disk writing is about 15 mbps all the time (60 mbps at peaks), without higher disk write rates every 15 minutes... Is this the expected behaviour?
RE: Solr Prod Issue | KeeperErrorCode = ConnectionLoss for /overseer_elect/leader
Hi I'm not sure if any of you have had a chance to see this email yet. We had a reoccurrence of the Issue Today, and I'm attaching the Logs from today as well inline below. Please let me know if any of you have seen this issue before as this would really help me to get to the root of the problem to fix it. I'm a little lost here and not entirely sure what to do. Thanks, Rahat Bhalla 8696248 [qtp778720569-28] [ WARN] 2017-07-04 01:40:20 (HttpParser.java:parseNext:1391) - parse exception: java.lang.IllegalArgumentException: No Authority for HttpChannelOverHttp@30a86e14{r=0,c=false,a=IDLE,uri=null} java.lang.IllegalArgumentException: No Authority at org.eclipse.jetty.http.HostPortHttpField.(HostPortHttpField.java:43) at org.eclipse.jetty.http.HttpParser.parsedHeader(HttpParser.java:877) at org.eclipse.jetty.http.HttpParser.parseHeaders(HttpParser.java:1050) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1266) at org.eclipse.jetty.server.HttpConnection.parseRequestBuffer(HttpConnection.java:344) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:227) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) at java.lang.Thread.run(Unknown Source) 8697308 [qtp778720569-21] [ WARN] 2017-07-04 01:40:21 (HttpParser.java:parseNext:1364) - bad HTTP parsed: 400 Bad URI for HttpChannelOverHttp@1276{r=16,c=false,a=IDLE,uri=/../../../../../../../../../../etc/passwd} 8697338 [qtp778720569-29] [ WARN] 2017-07-04 01:40:21 (HttpParser.java:parseNext:1364) - bad HTTP parsed: 400 No Host for HttpChannelOverHttp@50a994ce{r=29,c=false,a=IDLE,uri=null} 8697388 [qtp778720569-21] [ WARN] 2017-07-04 01:40:22 (HttpParser.java:parseNext:1364) - bad HTTP parsed: 400 Bad URI for HttpChannelOverHttp@19a624ec{r=1,c=false,a=IDLE,uri=//prod-solr-node01.healthplan.com:9080/solr/admin/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/etc/passwd} 8697401 [qtp778720569-27] [ WARN] 2017-07-04 01:40:22 (URIUtil.java:decodePath:348) - /solr/admin/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/etc/passwd org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte C0 in state 0 8697444 [qtp778720569-25] [ WARN] 2017-07-04 01:40:22 (URIUtil.java:decodePath:348) - /solr/admin/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/etc/passwd org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! 
byte 80 in state 4 8697475 [qtp778720569-26] [ WARN] 2017-07-04 01:40:22 (URIUtil.java:decodePath:348) - /solr/admin/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/etc/passwd org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte 80 in state 6 8697500 [qtp778720569-29] [ WARN] 2017-07-04 01:40:22 (URIUtil.java:decodePath:348) - /solr/admin/%f8%80%80%80%ae%f8%80%80%80%ae/%f8%80%80%80%ae%f8%80%80%80%ae/%f8%80%80%80%ae%f8%80%80%80%ae/%f8%80%80%80%ae%f8%80%80%80%ae/%f8%80%80%80%ae%f8%80%80%80%ae/%f8%80%80%80%ae%f8%80%80%80%ae/%f8%80%80%80%ae%f8%80%80%80%ae/%f8%80%80%80%ae%f8%80%80%80%ae/%f8%80%80%80%ae%f8%80%80%80%ae/%f8%80%80%80%ae%f8%80%80%80%ae/%f8%80%80%80%ae%f8%80%80%80%ae/%f8%80%80%80%ae%f8%80%80%80%ae/etc/passwd org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte F8 in state 0 8706641 [qtp778720569-27] [ WARN] 2017-07-04 01:40:31 (HttpParser.java:parseNext:1364) - bad HTTP parsed: 400 Unknown Version for HttpChannelOverHttp@7fcd594a{r=54,c=false,a=IDLE,uri=null} 8707033 [qtp778720569-20] [ WARN] 2017-07-04 01:40:31 (HttpParser.java:parseNext:1364) - bad HTTP parsed: 400 Unknown Version for HttpChannelOverHttp@66740d77{r=54,c=false,a=IDLE,uri=null} 8719390 [qtp778720569-23] [ WARN] 2017-07-04 01:40:44 (HttpParser.java::1740) - Illegal character 0xA in state=HEADER_IN_N
Re: Solr Prod Issue | KeeperErrorCode = ConnectionLoss for /overseer_elect/leader
From the fact that someone has tried to access /etc/passwd file via your Solr (see all those WARN messages), it seems you have it exposed to the world, unless of course it's a security scanner you use internally. Internet is a hostile place, and the very first thing I would do is shield Solr from external traffic. Even if it's your own security scanning, I wouldn't do it until you have the system stable. Doing the above you'll reduce noise in the logs and might be able to better identify the issue. Losing the Zookeeper connection is typically a Java garbage collection issue. If GC causes too long pauses, the connection may time out. So I would recommend you start by reading https://wiki.apache.org/solr/SolrPerformanceProblems and https://wiki.apache.org/solr/ShawnHeisey. Also make sure that Zookeeper's Java settings are good. --Ere Bhalla, Rahat kirjoitti 5.7.2017 klo 11.05: Hi I’m not sure if any of you have had a chance to see this email yet. We had a reoccurrence of the Issue Today, and I’m attaching the Logs from today as well inline below. Please let me know if any of you have seen this issue before as this would really help me to get to the root of the problem to fix it. I’m a little lost here and not entirely sure what to do. Thanks, Rahat Bhalla 8696248 [qtp778720569-28] [ WARN] 2017-07-04 01:40:20 (HttpParser.java:parseNext:1391) - parse exception: java.lang.IllegalArgumentException: No Authority for HttpChannelOverHttp@30a86e14{r=0,c=false,a=IDLE,uri=null} java.lang.IllegalArgumentException: No Authority at org.eclipse.jetty.http.HostPortHttpField.(HostPortHttpField.java:43) at org.eclipse.jetty.http.HttpParser.parsedHeader(HttpParser.java:877) at org.eclipse.jetty.http.HttpParser.parseHeaders(HttpParser.java:1050) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1266) at org.eclipse.jetty.server.HttpConnection.parseRequestBuffer(HttpConnection.java:344) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:227) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) at java.lang.Thread.run(Unknown Source) 8697308 [qtp778720569-21] [ WARN] 2017-07-04 01:40:21 (HttpParser.java:parseNext:1364) - bad HTTP parsed: 400 Bad URI for HttpChannelOverHttp@1276{r=16,c=false,a=IDLE,uri=/../../../../../../../../../../etc/passwd} 8697338 [qtp778720569-29] [ WARN] 2017-07-04 01:40:21 (HttpParser.java:parseNext:1364) - bad HTTP parsed: 400 No Host for HttpChannelOverHttp@50a994ce{r=29,c=false,a=IDLE,uri=null} 8697388 [qtp778720569-21] [ WARN] 2017-07-04 01:40:22 (HttpParser.java:parseNext:1364) - bad HTTP parsed: 400 Bad URI for HttpChannelOverHttp@19a624ec{r=1,c=false,a=IDLE,uri=//prod-solr-node01.healthplan.com:9080/solr/admin/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/etc/passwd} 8697401 [qtp778720569-27] [ WARN] 2017-07-04 01:40:22 (URIUtil.java:decodePath:348) - 
/solr/admin/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/etc/passwd org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte C0 in state 0 8697444 [qtp778720569-25] [ WARN] 2017-07-04 01:40:22 (URIUtil.java:decodePath:348) - /solr/admin/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/%e0%80%ae%e0%80%ae/etc/passwd org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte 80 in state 4 8697475 [qtp778720569-26] [ WARN] 2017-07-04 01:40:22 (URIUtil.java:decodePath:348) - /solr/admin/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/%f0%80%80%ae%f0%80%80%ae/etc/passwd org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte 80 in state 6 8697500 [qtp778720569-29] [ WARN] 2017-07-04 01:40:22 (URIUtil.java:decodePath:348) - /solr/admin/%f8%80%80%80%
help on implicit routing
I am trying out the document routing feature in Solr 6.4.1. I am unable to comprehend the documentation where it states that “The 'implicit' router does not automatically route documents to different shards. Whichever shard you indicate on the indexing request (or within each document) will be used as the destination for those documents”. How do you specify the shard inside a document? E.g. if I have a basic collection with two shards called day_1 and day_2, what value should be populated in the router field to ensure the document is routed to the respective shard? Regards, Imran Sent from Mail for Windows 10
index new discovered fileds of different types
Hi, We are trying to index documents of different types. Documents have different fields; the fields are known at indexing time. We run a query on a database and we index what comes back, using the query variables as field names in Solr. Our current solution: we use dynamic fields with a prefix, for example feature_i_*. The issues with that: 1) we need to define the type of the dynamic field, and to cover the types of the discovered fields we define the following: feature_i_* for integers, feature_t_* for strings, feature_d_* for doubles; 1.a) this means we need to check the type of the discovered field and then put it in the corresponding dynamic field; 2) at search time, we need to know the right prefix. We are looking for help to find a way to ignore the prefix and the check of the type. regards, Thaer
Re: Solr dynamic "on the fly fields"
Thanks Erick for the answer. Function Queries are great, but for my use case what I really do is making aggregations (using Json Facet for example) with this functions. I have tried using Function Queries with Json Facet but it does not support it. Any other idea you can imagine? 2017-07-03 21:57 GMT-03:00 Erick Erickson : > I don't know how one would do this. But I would ask what the use-case > is. Creating such fields at index time just seems like it would be > inviting abuse by creating a zillion fields as you have no control > over what gets created. I'm assuming your tenants don't talk to each > other > > Have you thought about using function queries to pull this data out as > needed at _query_ time? See: > https://cwiki.apache.org/confluence/display/solr/Function+Queries > > Best, > Erick > > On Mon, Jul 3, 2017 at 12:06 PM, Pablo Anzorena > wrote: > > Thanks Erick, > > > > For my use case it's not possible any of those solutions. I have a > > multitenancy scheme in the most basic level, that is I have a single > > collection with fields (clientId, field1, field2, ..., field50) attending > > many clients. > > > > Clients can create custom fields based on arithmetic operations of any > > other field. > > > > So, is it possible to update let's say field49 with the follow operation: > > log(field39) + field25 on clientId=43? > > > > Do field39 and field25 need to be stored to accomplish this? Is there any > > other way to avoid storing them? > > > > Thanks! > > > > > > 2017-07-03 15:00 GMT-03:00 Erick Erickson : > > > >> There are two ways: > >> 1> define a dynamic field pattern, i.e. > >> > >> > >> > >> Now just add any field in the doc you want. If it ends in "_sum" and > >> no other explicit field matches you have a new field. > >> > >> 2> Use the managed schema to add these on the fly. I don't recommend > >> this from what I know of your use case, this is primarily intended for > >> front-ends to be able to modify the schema and/or "field guessing". > >> > >> I do caution you though that either way don't go over-the-top. If > >> you're thinking of thousands of different fields that can lead to > >> performance issues. > >> > >> You can either put stuff in the field on your indexing client or > >> create a custom update component, perhaps the simplest would be a > >> "StatelessScriptUpdateProcessorFactory: > >> > >> see: https://cwiki.apache.org/confluence/display/solr/ > >> Update+Request+Processors#UpdateRequestProcessors- > >> UpdateRequestProcessorFactories > >> > >> Best, > >> Erick > >> > >> On Mon, Jul 3, 2017 at 10:52 AM, Pablo Anzorena < > anzorena.f...@gmail.com> > >> wrote: > >> > Hey, > >> > > >> > I was wondering if there is some way to add fields "on the fly" based > on > >> > arithmetic operations on other fields. For example add a new field > >> > "custom_field" = log(field1) + field2 -5. > >> > > >> > Thanks. > >> >
Re: High disk write usage
Is the physical machine dedicated? Is it a dedicated VM on shared metal? Apart from these operational checks, I will assume the machine is dedicated. In Solr a write to the disk does not happen only on commit; I can think of other scenarios: 1) *Transaction log* [1] 2) RAM buffer flushes (see the follow-up message) 3) Spellcheck and SuggestComponent building (this depends on the config, in case you use them) 4) Memory swapping? 5) Merges (they are potentially triggered by a segment write or an explicit optimize call, and they can last a while) Maybe other edge cases, but I would first check this list! [1] https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- View this message in context: http://lucene.472066.n3.nabble.com/High-disk-write-usage-tp4344356p4344383.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: High disk write usage
Point 2 was the RAM buffer size: *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene indexing for buffering added documents and deletions before they are flushed to the Directory. maxBufferedDocs sets a limit on the number of documents buffered before flushing. If both ramBufferSizeMB and maxBufferedDocs are set, then Lucene will flush based on whichever limit is hit first. <ramBufferSizeMB>100</ramBufferSizeMB> <maxBufferedDocs>1000</maxBufferedDocs> - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- View this message in context: http://lucene.472066.n3.nabble.com/High-disk-write-usage-tp4344356p4344386.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Unique() metrics not supported in Solr Streaming facet stream source
Does "uniq" expression sounds good to use for UniqueMetric class? Thanks, Susheel On Tue, Jul 4, 2017 at 5:45 PM, Susheel Kumar wrote: > Hello Joel, > > I tried to create a patch to add UniqueMetric and it works, but soon > realized, we have UniqueStream as well and can't load both of them (like > below) when required, since both uses "unique" keyword. > > Any advice how we can handle this. Come up with different keyword for > UniqueMetric or rename UniqueStream etc..? > >StreamFactory factory = new StreamFactory() > .withCollectionZkHost (...) >.withFunctionName("facet", FacetStream.class) > .withFunctionName("sum", SumMetric.class) > .withFunctionName("unique", UniqueStream.class) > .withFunctionName("unique", UniqueMetric.class) > > On Thu, Jun 29, 2017 at 9:32 AM, Joel Bernstein > wrote: > >> This is mainly due to focus on other things. It would great to support all >> the aggregate functions in facet, rollup and timeseries expressions. >> >> Joel Bernstein >> http://joelsolr.blogspot.com/ >> >> On Thu, Jun 29, 2017 at 8:23 AM, Zheng Lin Edwin Yeo < >> edwinye...@gmail.com> >> wrote: >> >> > Hi, >> > >> > We are working on the Solr Streaming expression, using the facet stream >> > source. >> > >> > As the underlying structure is using JSON Facet, would like to find out >> why >> > the unique() metrics is not supported? Currently, it only supports >> sum(col) >> > , avg(col), min(col), max(col), count(*) >> > >> > I'm using Solr 6.5.1 >> > >> > Regards, >> > Edwin >> > >> > >
Re: cursorMark / Deep Paging and SEO
On 6/30/2017 1:30 AM, Jacques du Rand wrote: > I'm not quite sure I understand the deep paging / cursorMark internals > > We have implemented it on our search pages like so: > > http://mysite.com/search?foobar&page=1 > http://mysite.com/search?foobar&page=2&cmark=djkldskljsdsa > http://mysite.com/search?foobar&page=3&cmark=uoieuwqjdlsa > > So if we reindex the data the cursorMark for search "foobar" and page2 will > the cursorMark value change ??? > > But google might have already index our page as " > http://mysite.com/search?foobar&page=2&cmark=djkldskljsdsa"; so that > cursorMark will keep changing ?? The cursorMark feature does not use page numbers, so your "page" parameter won't provide any useful information to Solr. Presumably you're using that for your own application. The string values used in cursorMark have a tendency to lose their usefulness the more you index after making the query, so they are not useful to have in Google's index. The cursorMark value points at a specific document ... if you index new documents or delete old documents, that specific document might end up on a completely different page number than where it was when you initially made the query. Thanks, Shawn
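To illustrate Shawn's point that cursor marks are meant to be consumed right away rather than stored in an SEO-visible URL, this is roughly how a cursor is walked with SolrJ (collection and field names are placeholders). The sort must include the uniqueKey as a tie-breaker, and iteration stops when the returned cursor mark no longer changes:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorWalk {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
    SolrQuery q = new SolrQuery("foobar");
    q.setRows(50);
    q.setSort("score", SolrQuery.ORDER.desc);
    q.addSort("id", SolrQuery.ORDER.asc);      // uniqueKey tie-break, required for cursors

    String cursorMark = CursorMarkParams.CURSOR_MARK_START;   // "*"
    while (true) {
      q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
      QueryResponse rsp = client.query(q);
      for (SolrDocument doc : rsp.getResults()) {
        System.out.println(doc.getFieldValue("id"));
      }
      String next = rsp.getNextCursorMark();
      if (cursorMark.equals(next)) {           // unchanged mark means the result set is exhausted
        break;
      }
      cursorMark = next;
    }
    client.close();
  }
}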
Re: High disk write usage
Thanks a lot Alessandro! Yes, we have very big physical dedicated machines, with a topology of 5 shards and 10 replicas per shard. 1. Transaction log files are increasing, but not at this rate. 2. We've tried values between 300 and 2000 MB... without any visible results. 3. We don't use those features. 4. No. 5. I've tried low and high merge factors and I think that is the point. With a low merge factor (around 4) we have a high disk write rate, as I said previously. With a merge factor of 20 the disk write rate decreases, but now, with high qps rates (over 1000 qps), the system is overloaded. I think that's the expected behaviour :( 2017-07-05 15:49 GMT+02:00 alessandro.benedetti : > Point 2 was the ram Buffer size : > > *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene > indexing for buffering added documents and deletions before they > are > flushed to the Directory. > maxBufferedDocs sets a limit on the number of documents buffered > before flushing. > If both ramBufferSizeMB and maxBufferedDocs is set, then > Lucene will flush based on whichever limit is hit first. > > 100 > 1000 > > > > > - > --- > Alessandro Benedetti > Search Consultant, R&D Software Engineer, Director > Sease Ltd. - www.sease.io > -- > View this message in context: http://lucene.472066.n3. > nabble.com/High-disk-write-usage-tp4344356p4344386.html > Sent from the Solr - User mailing list archive at Nabble.com. >
RE: High disk write usage
Try mergeFactor of 10 (default) which should be fine in most cases. If you got an extreme case, either create more shards and consider better hardware (SSD's) -Original message- > From:Antonio De Miguel > Sent: Wednesday 5th July 2017 16:48 > To: solr-user@lucene.apache.org > Subject: Re: High disk write usage > > Thnaks a lot alessandro! > > Yes, we have very big physical dedicated machines, with a topology of 5 > shards and10 replicas each shard. > > > 1. transaction log files are increasing but not with this rate > > 2. we 've probed with values between 300 and 2000 MB... without any > visible results > > 3. We don't use those features > > 4. No. > > 5. I've probed with low and high mergefacors and i think that is the point. > > With low merge factor (over 4) we 've high write disk rate as i said > previously > > with merge factor of 20, writing disk rate is decreasing, but now, with > high qps rates (over 1000 qps) system is overloaded. > > i think that's the expected behaviour :( > > > > > 2017-07-05 15:49 GMT+02:00 alessandro.benedetti : > > > Point 2 was the ram Buffer size : > > > > *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene > > indexing for buffering added documents and deletions before they > > are > > flushed to the Directory. > > maxBufferedDocs sets a limit on the number of documents buffered > > before flushing. > > If both ramBufferSizeMB and maxBufferedDocs is set, then > > Lucene will flush based on whichever limit is hit first. > > > > 100 > > 1000 > > > > > > > > > > - > > --- > > Alessandro Benedetti > > Search Consultant, R&D Software Engineer, Director > > Sease Ltd. - www.sease.io > > -- > > View this message in context: http://lucene.472066.n3. > > nabble.com/High-disk-write-usage-tp4344356p4344386.html > > Sent from the Solr - User mailing list archive at Nabble.com. > > >
Re: index new discovered fileds of different types
Hi Thaer, Do you use schemeless mode [1] ? Kind Regards, Furkan KAMACI [1] https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode On Wed, Jul 5, 2017 at 4:23 PM, Thaer Sammar wrote: > Hi, > We are trying to index documents of different types. Document have > different fields. fields are known at indexing time. We run a query on a > database and we index what comes using query variables as field names in > solr. Our current solution: we use dynamic fields with prefix, for example > feature_i_*, the issue with that > 1) we need to define the type of the dynamic field and to be able to cover > the type of discovered fields we define the following > feature_i_* for integers, feature_t_* for string, feature_d_* for double, > > 1.a) this means we need to check the type of the discovered field and then > put in the corresponding dynamic field > 2) at search time, we need to know the right prefix > We are looking for help to find away to ignore the prefix and check of the > type > > regards, > Thaer
Re: High disk write usage
thanks Markus! We already have SSD. About changing topology we probed yesterday with 10 shards, but system goes more inconsistent than with the current topology (5x10). I dont know why... too many traffic perhaps? About merge factor.. we set default configuration for some days... but when a merge occurs system overload. We probed with mergefactor of 4 to improbe query times and trying to have smaller merges. 2017-07-05 16:51 GMT+02:00 Markus Jelsma : > Try mergeFactor of 10 (default) which should be fine in most cases. If you > got an extreme case, either create more shards and consider better hardware > (SSD's) > > -Original message- > > From:Antonio De Miguel > > Sent: Wednesday 5th July 2017 16:48 > > To: solr-user@lucene.apache.org > > Subject: Re: High disk write usage > > > > Thnaks a lot alessandro! > > > > Yes, we have very big physical dedicated machines, with a topology of 5 > > shards and10 replicas each shard. > > > > > > 1. transaction log files are increasing but not with this rate > > > > 2. we 've probed with values between 300 and 2000 MB... without any > > visible results > > > > 3. We don't use those features > > > > 4. No. > > > > 5. I've probed with low and high mergefacors and i think that is the > point. > > > > With low merge factor (over 4) we 've high write disk rate as i said > > previously > > > > with merge factor of 20, writing disk rate is decreasing, but now, with > > high qps rates (over 1000 qps) system is overloaded. > > > > i think that's the expected behaviour :( > > > > > > > > > > 2017-07-05 15:49 GMT+02:00 alessandro.benedetti : > > > > > Point 2 was the ram Buffer size : > > > > > > *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene > > > indexing for buffering added documents and deletions before > they > > > are > > > flushed to the Directory. > > > maxBufferedDocs sets a limit on the number of documents > buffered > > > before flushing. > > > If both ramBufferSizeMB and maxBufferedDocs is set, then > > > Lucene will flush based on whichever limit is hit first. > > > > > > 100 > > > 1000 > > > > > > > > > > > > > > > - > > > --- > > > Alessandro Benedetti > > > Search Consultant, R&D Software Engineer, Director > > > Sease Ltd. - www.sease.io > > > -- > > > View this message in context: http://lucene.472066.n3. > > > nabble.com/High-disk-write-usage-tp4344356p4344386.html > > > Sent from the Solr - User mailing list archive at Nabble.com. > > > > > >
Re: index new discovered fileds of different types
Hi Furkan, No, In the schema we also defined some static fields such as uri and geo field. On 5 July 2017 at 17:07, Furkan KAMACI wrote: > Hi Thaer, > > Do you use schemeless mode [1] ? > > Kind Regards, > Furkan KAMACI > > [1] https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode > > On Wed, Jul 5, 2017 at 4:23 PM, Thaer Sammar wrote: > > > Hi, > > We are trying to index documents of different types. Document have > > different fields. fields are known at indexing time. We run a query on a > > database and we index what comes using query variables as field names in > > solr. Our current solution: we use dynamic fields with prefix, for > example > > feature_i_*, the issue with that > > 1) we need to define the type of the dynamic field and to be able to > cover > > the type of discovered fields we define the following > > feature_i_* for integers, feature_t_* for string, feature_d_* for double, > > > > 1.a) this means we need to check the type of the discovered field and > then > > put in the corresponding dynamic field > > 2) at search time, we need to know the right prefix > > We are looking for help to find away to ignore the prefix and check of > the > > type > > > > regards, > > Thaer >
Re: index new discovered fileds of different types
I really have no idea what "to ignore the prefix and check of the type" means. When? How? Can you give an example of inputs and outputs? You might want to review: https://wiki.apache.org/solr/UsingMailingLists And to add to what Furkan mentioned, in addition to schemaless you can use "managed schema" which will allow you to add fields and types on the fly. Best, Erick On Wed, Jul 5, 2017 at 8:12 AM, Thaer Sammar wrote: > Hi Furkan, > > No, In the schema we also defined some static fields such as uri and geo > field. > > On 5 July 2017 at 17:07, Furkan KAMACI wrote: > >> Hi Thaer, >> >> Do you use schemeless mode [1] ? >> >> Kind Regards, >> Furkan KAMACI >> >> [1] https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode >> >> On Wed, Jul 5, 2017 at 4:23 PM, Thaer Sammar wrote: >> >> > Hi, >> > We are trying to index documents of different types. Document have >> > different fields. fields are known at indexing time. We run a query on a >> > database and we index what comes using query variables as field names in >> > solr. Our current solution: we use dynamic fields with prefix, for >> example >> > feature_i_*, the issue with that >> > 1) we need to define the type of the dynamic field and to be able to >> cover >> > the type of discovered fields we define the following >> > feature_i_* for integers, feature_t_* for string, feature_d_* for double, >> > >> > 1.a) this means we need to check the type of the discovered field and >> then >> > put in the corresponding dynamic field >> > 2) at search time, we need to know the right prefix >> > We are looking for help to find away to ignore the prefix and check of >> the >> > type >> > >> > regards, >> > Thaer >>
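If the managed schema route is taken, newly discovered fields can be added programmatically through the Schema API instead of relying on prefixed dynamic fields. A hedged SolrJ sketch (collection, field name and field type here are examples only, and this assumes the config set uses the managed schema):

import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;

public class AddDiscoveredField {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient.Builder().withZkHost("localhost:9983").build();
    client.setDefaultCollection("mycollection");

    // Describe the newly discovered field; the type is chosen from the inspected value.
    Map<String, Object> field = new LinkedHashMap<>();
    field.put("name", "feature_price");
    field.put("type", "tdouble");
    field.put("indexed", true);
    field.put("stored", true);

    new SchemaRequest.AddField(field).process(client);   // adds the field to the managed schema
    client.close();
  }
}

At search time the field is then addressed by its plain name, with no type prefix.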
Re: High disk write usage
What is your soft commit interval? That'll cause I/O as well. How much physical RAM and how much is dedicated to _all_ the JVMs on a machine? One cause here is that Lucene uses MMapDirectory which can be starved for OS memory if you use too much JVM, my rule of thumb is that _at least_ half of the physical memory should be reserved for the OS. Your transaction logs should fluctuate but even out. By that I mean they should increase in size but every hard commit should truncate some of them so I wouldn't expect them to grow indefinitely. One strategy is to put your tlogs on a separate drive exactly to reduce contention. You could disable them too at a cost of risking your data. That might be a quick experiment you could run though, disable tlogs and see what that changes. Of course I'd do this on my test system ;). But yeah, Solr will use a lot of I/O in the scenario you are outlining I'm afraid. Best, Erick On Wed, Jul 5, 2017 at 8:08 AM, Antonio De Miguel wrote: > thanks Markus! > > We already have SSD. > > About changing topology we probed yesterday with 10 shards, but system > goes more inconsistent than with the current topology (5x10). I dont know > why... too many traffic perhaps? > > About merge factor.. we set default configuration for some days... but when > a merge occurs system overload. We probed with mergefactor of 4 to improbe > query times and trying to have smaller merges. > > 2017-07-05 16:51 GMT+02:00 Markus Jelsma : > >> Try mergeFactor of 10 (default) which should be fine in most cases. If you >> got an extreme case, either create more shards and consider better hardware >> (SSD's) >> >> -Original message- >> > From:Antonio De Miguel >> > Sent: Wednesday 5th July 2017 16:48 >> > To: solr-user@lucene.apache.org >> > Subject: Re: High disk write usage >> > >> > Thnaks a lot alessandro! >> > >> > Yes, we have very big physical dedicated machines, with a topology of 5 >> > shards and10 replicas each shard. >> > >> > >> > 1. transaction log files are increasing but not with this rate >> > >> > 2. we 've probed with values between 300 and 2000 MB... without any >> > visible results >> > >> > 3. We don't use those features >> > >> > 4. No. >> > >> > 5. I've probed with low and high mergefacors and i think that is the >> point. >> > >> > With low merge factor (over 4) we 've high write disk rate as i said >> > previously >> > >> > with merge factor of 20, writing disk rate is decreasing, but now, with >> > high qps rates (over 1000 qps) system is overloaded. >> > >> > i think that's the expected behaviour :( >> > >> > >> > >> > >> > 2017-07-05 15:49 GMT+02:00 alessandro.benedetti : >> > >> > > Point 2 was the ram Buffer size : >> > > >> > > *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene >> > > indexing for buffering added documents and deletions before >> they >> > > are >> > > flushed to the Directory. >> > > maxBufferedDocs sets a limit on the number of documents >> buffered >> > > before flushing. >> > > If both ramBufferSizeMB and maxBufferedDocs is set, then >> > > Lucene will flush based on whichever limit is hit first. >> > > >> > > 100 >> > > 1000 >> > > >> > > >> > > >> > > >> > > - >> > > --- >> > > Alessandro Benedetti >> > > Search Consultant, R&D Software Engineer, Director >> > > Sease Ltd. - www.sease.io >> > > -- >> > > View this message in context: http://lucene.472066.n3. >> > > nabble.com/High-disk-write-usage-tp4344356p4344386.html >> > > Sent from the Solr - User mailing list archive at Nabble.com. >> > > >> > >>
Optimization/Merging space
Hi all, I am curious to know what happens when Solr begins a merge/optimize operation but then runs out of physical disk space. I haven't had the chance to try this out yet, but I was wondering if anyone knows how the underlying code would respond to the situation if it happened. Thanks -David
Re: Solr dynamic "on the fly fields"
Some aggregations are supported by combining stats with pivot facets? See: https://lucidworks.com/2015/01/29/you-got-stats-in-my-facets/ Don't quite think that works for your use case though. the other thing that _might_ help is all the Streaming Expression/Streaming Aggregation work. Best, Erick On Wed, Jul 5, 2017 at 6:23 AM, Pablo Anzorena wrote: > Thanks Erick for the answer. Function Queries are great, but for my use > case what I really do is making aggregations (using Json Facet for example) > with this functions. > > I have tried using Function Queries with Json Facet but it does not support > it. > > Any other idea you can imagine? > > > > > > 2017-07-03 21:57 GMT-03:00 Erick Erickson : > >> I don't know how one would do this. But I would ask what the use-case >> is. Creating such fields at index time just seems like it would be >> inviting abuse by creating a zillion fields as you have no control >> over what gets created. I'm assuming your tenants don't talk to each >> other >> >> Have you thought about using function queries to pull this data out as >> needed at _query_ time? See: >> https://cwiki.apache.org/confluence/display/solr/Function+Queries >> >> Best, >> Erick >> >> On Mon, Jul 3, 2017 at 12:06 PM, Pablo Anzorena >> wrote: >> > Thanks Erick, >> > >> > For my use case it's not possible any of those solutions. I have a >> > multitenancy scheme in the most basic level, that is I have a single >> > collection with fields (clientId, field1, field2, ..., field50) attending >> > many clients. >> > >> > Clients can create custom fields based on arithmetic operations of any >> > other field. >> > >> > So, is it possible to update let's say field49 with the follow operation: >> > log(field39) + field25 on clientId=43? >> > >> > Do field39 and field25 need to be stored to accomplish this? Is there any >> > other way to avoid storing them? >> > >> > Thanks! >> > >> > >> > 2017-07-03 15:00 GMT-03:00 Erick Erickson : >> > >> >> There are two ways: >> >> 1> define a dynamic field pattern, i.e. >> >> >> >> >> >> >> >> Now just add any field in the doc you want. If it ends in "_sum" and >> >> no other explicit field matches you have a new field. >> >> >> >> 2> Use the managed schema to add these on the fly. I don't recommend >> >> this from what I know of your use case, this is primarily intended for >> >> front-ends to be able to modify the schema and/or "field guessing". >> >> >> >> I do caution you though that either way don't go over-the-top. If >> >> you're thinking of thousands of different fields that can lead to >> >> performance issues. >> >> >> >> You can either put stuff in the field on your indexing client or >> >> create a custom update component, perhaps the simplest would be a >> >> "StatelessScriptUpdateProcessorFactory: >> >> >> >> see: https://cwiki.apache.org/confluence/display/solr/ >> >> Update+Request+Processors#UpdateRequestProcessors- >> >> UpdateRequestProcessorFactories >> >> >> >> Best, >> >> Erick >> >> >> >> On Mon, Jul 3, 2017 at 10:52 AM, Pablo Anzorena < >> anzorena.f...@gmail.com> >> >> wrote: >> >> > Hey, >> >> > >> >> > I was wondering if there is some way to add fields "on the fly" based >> on >> >> > arithmetic operations on other fields. For example add a new field >> >> > "custom_field" = log(field1) + field2 -5. >> >> > >> >> > Thanks. >> >> >>
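For reference, the stats-in-pivot-facets combination looks roughly like this from SolrJ (field names follow the clientId/field1 placeholders used earlier in the thread); whether it stretches to the arithmetic/log() requirement would need checking against the version in use:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class StatsInPivot {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();

    SolrQuery q = new SolrQuery("*:*");
    q.setRows(0);
    q.setFacet(true);
    // Tag a stats component and hang it off a pivot facet: one stats block per clientId bucket.
    q.set("stats", "true");
    q.set("stats.field", "{!tag=piv sum=true mean=true}field1");
    q.set("facet.pivot", "{!stats=piv}clientId");

    QueryResponse rsp = client.query(q);
    System.out.println(rsp.getFacetPivot());   // each pivot bucket carries the attached stats
    client.close();
  }
}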
Re: Optimization/Merging space
Bad Things Can Happen. Solr (well, Lucene in this case) tries very hard to keep disk full operations from having repercussions., but it's kind of like OOMs. What happens next? It's not so much the merge/optimize, but what happens in the future when the _next_ segment is written... The merge or optimize goes something like this: 1> copy and merge all the segments you intend to 2> when all that is successful, update the segments file 3> delete the old segments. So theoretically if your disk fills up during <1> or <2> your old index is intact and usable. It isn't until the segments file has been successfully changed that the new snapshot of the index is active. Which, in a nutshell, is why you need to have at least as much free space on your disk as your index occupies since you can't control when a merge happens which may copy _all_ of your segments to new ones. Let's say that during <1> your disk fills up _and_ you're indexing new documents at the same time. Solr/Lucene can't guarantee that the new documents got written to disk in that case. So while your current snapshot is probably OK, your index may not be in the state you want. Meanwhile if you're using transaction logs Solr is trying to write tlogs to disk. It's unknown what happened to them (another good argument for putting them on a separate disk!). Best, Erick On Wed, Jul 5, 2017 at 9:07 AM, David Hastings wrote: > Hi all, I am curious to know what happens when solr begins a merge/optimize > operation, but then runs out of physical disk space. I havent had the > chance to try this out yet but I was wondering if anyone knows what the > underlying codes response to the situation would be if it happened. Thanks > -David
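As a rough way to act on the "at least as much free space as the index occupies" rule of thumb, a pre-flight check before triggering an optimize could look like this in plain Java (the index path is an assumption; point it at the core's data/index directory on the volume Solr writes to):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OptimizeSpaceCheck {
  public static void main(String[] args) throws IOException {
    Path indexDir = Paths.get("/var/solr/data/collection1/data/index");  // placeholder path

    // Sum the size of all current segment files.
    long indexBytes = Files.walk(indexDir)
        .filter(Files::isRegularFile)
        .mapToLong(p -> p.toFile().length())
        .sum();

    // Free space on the file store backing the index directory.
    long freeBytes = Files.getFileStore(indexDir).getUsableSpace();

    System.out.printf("index=%d bytes, free=%d bytes%n", indexBytes, freeBytes);
    if (freeBytes < indexBytes) {
      System.out.println("Not enough headroom for a full merge/optimize.");
    }
  }
}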
Re: help on implicit routing
Use the _route_ field and put in "day_1" or "day_2". You've presumably named the shards (the "shard" parameter) when you added them with the CREATESHARD command so use the value you specified there. Best, Erick On Wed, Jul 5, 2017 at 6:15 PM, wrote: > I am trying out the document routing feature in Solr 6.4.1. I am unable to > comprehend the documentation where it states that > “The 'implicit' router does not > automatically route documents to different > shards. Whichever shard you indicate on the > indexing request (or within each document) will > be used as the destination for those documents” > > How do you specify the shard inside a document? E.g If I have basic > collection with two shards called day_1 and day_2. What value should be > populated in the router field that will ensure the document routing to the > respective shard? > > Regards, > Imran > > Sent from Mail for Windows 10 >
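A minimal SolrJ sketch of that, assuming the collection was created with router.name=implicit and shards named day_1 and day_2 (if a router.field was defined at collection creation, the shard name goes into that field instead of _route_):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ImplicitRoutingExample {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient.Builder().withZkHost("localhost:9983").build();
    client.setDefaultCollection("mycollection");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    doc.addField("_route_", "day_1");   // with the implicit router this names the destination shard
    client.add(doc);
    client.commit();
    client.close();
  }
}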
Re: High disk write usage
Hi Erik! thanks for your response! Our soft commit is 5 seconds. Why generates I/0 a softcommit? first notice. We have enough physical RAM to store full collection and 16Gb for each JVM. The collection is relatively small. I've tried (for testing purposes) disabling transactionlog (commenting )... but cluster does not go up. I'll try writing into separated drive, nice idea... 2017-07-05 18:04 GMT+02:00 Erick Erickson : > What is your soft commit interval? That'll cause I/O as well. > > How much physical RAM and how much is dedicated to _all_ the JVMs on a > machine? One cause here is that Lucene uses MMapDirectory which can be > starved for OS memory if you use too much JVM, my rule of thumb is > that _at least_ half of the physical memory should be reserved for the > OS. > > Your transaction logs should fluctuate but even out. By that I mean > they should increase in size but every hard commit should truncate > some of them so I wouldn't expect them to grow indefinitely. > > One strategy is to put your tlogs on a separate drive exactly to > reduce contention. You could disable them too at a cost of risking > your data. That might be a quick experiment you could run though, > disable tlogs and see what that changes. Of course I'd do this on my > test system ;). > > But yeah, Solr will use a lot of I/O in the scenario you are outlining > I'm afraid. > > Best, > Erick > > On Wed, Jul 5, 2017 at 8:08 AM, Antonio De Miguel > wrote: > > thanks Markus! > > > > We already have SSD. > > > > About changing topology we probed yesterday with 10 shards, but > system > > goes more inconsistent than with the current topology (5x10). I dont know > > why... too many traffic perhaps? > > > > About merge factor.. we set default configuration for some days... but > when > > a merge occurs system overload. We probed with mergefactor of 4 to > improbe > > query times and trying to have smaller merges. > > > > 2017-07-05 16:51 GMT+02:00 Markus Jelsma : > > > >> Try mergeFactor of 10 (default) which should be fine in most cases. If > you > >> got an extreme case, either create more shards and consider better > hardware > >> (SSD's) > >> > >> -Original message- > >> > From:Antonio De Miguel > >> > Sent: Wednesday 5th July 2017 16:48 > >> > To: solr-user@lucene.apache.org > >> > Subject: Re: High disk write usage > >> > > >> > Thnaks a lot alessandro! > >> > > >> > Yes, we have very big physical dedicated machines, with a topology of > 5 > >> > shards and10 replicas each shard. > >> > > >> > > >> > 1. transaction log files are increasing but not with this rate > >> > > >> > 2. we 've probed with values between 300 and 2000 MB... without any > >> > visible results > >> > > >> > 3. We don't use those features > >> > > >> > 4. No. > >> > > >> > 5. I've probed with low and high mergefacors and i think that is the > >> point. > >> > > >> > With low merge factor (over 4) we 've high write disk rate as i said > >> > previously > >> > > >> > with merge factor of 20, writing disk rate is decreasing, but now, > with > >> > high qps rates (over 1000 qps) system is overloaded. > >> > > >> > i think that's the expected behaviour :( > >> > > >> > > >> > > >> > > >> > 2017-07-05 15:49 GMT+02:00 alessandro.benedetti >: > >> > > >> > > Point 2 was the ram Buffer size : > >> > > > >> > > *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene > >> > > indexing for buffering added documents and deletions before > >> they > >> > > are > >> > > flushed to the Directory. 
> >> > > maxBufferedDocs sets a limit on the number of documents > >> buffered > >> > > before flushing. > >> > > If both ramBufferSizeMB and maxBufferedDocs is set, then > >> > > Lucene will flush based on whichever limit is hit first. > >> > > > >> > > 100 > >> > > 1000 > >> > > > >> > > > >> > > > >> > > > >> > > - > >> > > --- > >> > > Alessandro Benedetti > >> > > Search Consultant, R&D Software Engineer, Director > >> > > Sease Ltd. - www.sease.io > >> > > -- > >> > > View this message in context: http://lucene.472066.n3. > >> > > nabble.com/High-disk-write-usage-tp4344356p4344386.html > >> > > Sent from the Solr - User mailing list archive at Nabble.com. > >> > > > >> > > >> >
Re: High disk write usage
bq: We have enough physical RAM to store full collection and 16Gb for each JVM. That's not quite what I was asking for. Lucene uses MMapDirectory to map part of the index into the OS memory space. If you've over-allocated the JVM space relative to your physical memory that space can start swapping. Frankly I'd expect your query performance to die if that was happening so this is a sanity check. How much physical memory does the machine have and how much memory is allocated to _all_ of the JVMs running on that machine? see: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html Best, Erick On Wed, Jul 5, 2017 at 9:41 AM, Antonio De Miguel wrote: > Hi Erik! thanks for your response! > > Our soft commit is 5 seconds. Why generates I/0 a softcommit? first notice. > > > We have enough physical RAM to store full collection and 16Gb for each > JVM. The collection is relatively small. > > I've tried (for testing purposes) disabling transactionlog (commenting > )... but cluster does not go up. I'll try writing into separated > drive, nice idea... > > > > > > > > > 2017-07-05 18:04 GMT+02:00 Erick Erickson : > >> What is your soft commit interval? That'll cause I/O as well. >> >> How much physical RAM and how much is dedicated to _all_ the JVMs on a >> machine? One cause here is that Lucene uses MMapDirectory which can be >> starved for OS memory if you use too much JVM, my rule of thumb is >> that _at least_ half of the physical memory should be reserved for the >> OS. >> >> Your transaction logs should fluctuate but even out. By that I mean >> they should increase in size but every hard commit should truncate >> some of them so I wouldn't expect them to grow indefinitely. >> >> One strategy is to put your tlogs on a separate drive exactly to >> reduce contention. You could disable them too at a cost of risking >> your data. That might be a quick experiment you could run though, >> disable tlogs and see what that changes. Of course I'd do this on my >> test system ;). >> >> But yeah, Solr will use a lot of I/O in the scenario you are outlining >> I'm afraid. >> >> Best, >> Erick >> >> On Wed, Jul 5, 2017 at 8:08 AM, Antonio De Miguel >> wrote: >> > thanks Markus! >> > >> > We already have SSD. >> > >> > About changing topology we probed yesterday with 10 shards, but >> system >> > goes more inconsistent than with the current topology (5x10). I dont know >> > why... too many traffic perhaps? >> > >> > About merge factor.. we set default configuration for some days... but >> when >> > a merge occurs system overload. We probed with mergefactor of 4 to >> improbe >> > query times and trying to have smaller merges. >> > >> > 2017-07-05 16:51 GMT+02:00 Markus Jelsma : >> > >> >> Try mergeFactor of 10 (default) which should be fine in most cases. If >> you >> >> got an extreme case, either create more shards and consider better >> hardware >> >> (SSD's) >> >> >> >> -Original message- >> >> > From:Antonio De Miguel >> >> > Sent: Wednesday 5th July 2017 16:48 >> >> > To: solr-user@lucene.apache.org >> >> > Subject: Re: High disk write usage >> >> > >> >> > Thnaks a lot alessandro! >> >> > >> >> > Yes, we have very big physical dedicated machines, with a topology of >> 5 >> >> > shards and10 replicas each shard. >> >> > >> >> > >> >> > 1. transaction log files are increasing but not with this rate >> >> > >> >> > 2. we 've probed with values between 300 and 2000 MB... without any >> >> > visible results >> >> > >> >> > 3. We don't use those features >> >> > >> >> > 4. No. >> >> > >> >> > 5. 
I've probed with low and high mergefacors and i think that is the >> >> point. >> >> > >> >> > With low merge factor (over 4) we 've high write disk rate as i said >> >> > previously >> >> > >> >> > with merge factor of 20, writing disk rate is decreasing, but now, >> with >> >> > high qps rates (over 1000 qps) system is overloaded. >> >> > >> >> > i think that's the expected behaviour :( >> >> > >> >> > >> >> > >> >> > >> >> > 2017-07-05 15:49 GMT+02:00 alessandro.benedetti > >: >> >> > >> >> > > Point 2 was the ram Buffer size : >> >> > > >> >> > > *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene >> >> > > indexing for buffering added documents and deletions before >> >> they >> >> > > are >> >> > > flushed to the Directory. >> >> > > maxBufferedDocs sets a limit on the number of documents >> >> buffered >> >> > > before flushing. >> >> > > If both ramBufferSizeMB and maxBufferedDocs is set, then >> >> > > Lucene will flush based on whichever limit is hit first. >> >> > > >> >> > > 100 >> >> > > 1000 >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > - >> >> > > --- >> >> > > Alessandro Benedetti >> >> > > Search Consultant, R&D Software Engineer, Director >> >> > > Sease Ltd. - www.sease.io >> >> > > -- >> >> > > View this message in context: http://lucene.472066.n3. >> >> > > nabble.com/High-disk-write-usa
Best way to split text
We are working on a search application for large PDFs (~10 - 100 MB), which have been correctly indexed. However, we want to do some training in the pipeline, so we are implementing some Spark MLlib algorithms. But now, some requirements are to split documents into either paragraphs or pages. Some alternatives we found are to split via Tika/PDFBox or to make a custom processor to catch words. In terms of performance, which option is preferred? A custom Tika class that extracts just paragraphs, or filtering the paragraphs that match our vocabulary from the whole document? Thanks for your advice. -- View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-split-text-tp4344498.html Sent from the Solr - User mailing list archive at Nabble.com.
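If splitting by page turns out to be enough, PDFBox (which Tika uses underneath for PDFs) can do it directly before the text is handed to Solr or to the Spark pipeline. A rough sketch, with the file path as a placeholder:

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfPageSplitter {
  public static void main(String[] args) throws Exception {
    List<String> pages = new ArrayList<>();
    try (PDDocument pdf = PDDocument.load(new File("/path/to/large.pdf"))) {
      PDFTextStripper stripper = new PDFTextStripper();
      for (int page = 1; page <= pdf.getNumberOfPages(); page++) {
        stripper.setStartPage(page);   // restrict extraction to a single page
        stripper.setEndPage(page);
        pages.add(stripper.getText(pdf));
      }
    }
    System.out.println("Extracted " + pages.size() + " page-sized chunks");
    // Each chunk could then become its own Solr document, e.g. with parentId and pageNumber fields.
  }
}

Paragraph-level splitting would need an extra pass over each page's text (for example splitting on blank lines), which is also where the vocabulary filtering could be applied.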
solr alias not working on streaming query search
Has anyone faced a similar issue? I have a collection named “solr_test”. I created an alias to it as “solr_alias”. This alias works well when I do a simple search: http://localhost:8983/solr/solr_alias/select?indent=on&q=*:*&wt=json But this will not work when used in a streaming expression: http://localhost:8983/solr/solr_alias/stream?expr=search(solr_alias, q=*:*, fl="p_PrimaryKey, p_name", qt="/select", sort="p_name asc") This gives me an error: "EXCEPTION": "java.lang.Exception: Collection not found:solr_alias" The same streaming query works when I use the actual collection name: “solr_test”. Is this a limitation of aliases in Solr? Or am I doing something wrong? Thanks, Lewin
Re: Unique() metrics not supported in Solr Streaming facet stream source
There are a number of functions that are currently being held up because of conflicting duplicate function names. We haven't come to an agreement yet on the best way forward for this yet. I think we should open a separate ticket to discuss how best to handle this issue. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Jul 5, 2017 at 10:04 AM, Susheel Kumar wrote: > Does "uniq" expression sounds good to use for UniqueMetric class? > > Thanks, > Susheel > > On Tue, Jul 4, 2017 at 5:45 PM, Susheel Kumar > wrote: > > > Hello Joel, > > > > I tried to create a patch to add UniqueMetric and it works, but soon > > realized, we have UniqueStream as well and can't load both of them (like > > below) when required, since both uses "unique" keyword. > > > > Any advice how we can handle this. Come up with different keyword for > > UniqueMetric or rename UniqueStream etc..? > > > >StreamFactory factory = new StreamFactory() > > .withCollectionZkHost (...) > >.withFunctionName("facet", FacetStream.class) > > .withFunctionName("sum", SumMetric.class) > > .withFunctionName("unique", UniqueStream.class) > > .withFunctionName("unique", UniqueMetric.class) > > > > On Thu, Jun 29, 2017 at 9:32 AM, Joel Bernstein > > wrote: > > > >> This is mainly due to focus on other things. It would great to support > all > >> the aggregate functions in facet, rollup and timeseries expressions. > >> > >> Joel Bernstein > >> http://joelsolr.blogspot.com/ > >> > >> On Thu, Jun 29, 2017 at 8:23 AM, Zheng Lin Edwin Yeo < > >> edwinye...@gmail.com> > >> wrote: > >> > >> > Hi, > >> > > >> > We are working on the Solr Streaming expression, using the facet > stream > >> > source. > >> > > >> > As the underlying structure is using JSON Facet, would like to find > out > >> why > >> > the unique() metrics is not supported? Currently, it only supports > >> sum(col) > >> > , avg(col), min(col), max(col), count(*) > >> > > >> > I'm using Solr 6.5.1 > >> > > >> > Regards, > >> > Edwin > >> > > >> > > > > >
Re: solr alias not working on streaming query search
This should be fixed in Solr 6.4: https://issues.apache.org/jira/browse/SOLR-9077 Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Jul 5, 2017 at 2:40 PM, Lewin Joy (TMS) wrote: > ** PROTECTED 関係者外秘 > > Have anyone faced a similar issue? > > I have a collection named “solr_test”. I created an alias to it as > “solr_alias”. > This alias works well when I do a simple search: > http://localhost:8983/solr/solr_alias/select?indent=on&q=*:*&wt=json > > But, this will not work when used in a streaming expression: > > http://localhost:8983/solr/solr_alias/stream?expr=search(solr_alias, > q=*:*, fl="p_PrimaryKey, p_name", qt="/select", sort="p_name asc") > > This gives me an error: > "EXCEPTION": "java.lang.Exception: Collection not found:solr_alias" > > The same streaming query works when I use the actual collection name: > “solr_test” > > > Is this a limitation for aliases in solr? Or am I doing something wrong? > > Thanks, > Lewin >
Re: Allow Join over two sharded collection
How are you planning to do the manual routing? What key(s) are you thinking of using? Second, the link I shared was about collection aliasing, and if you use that, you will end up with multiple collections. I just want to clarify, as you said above "...manual routing and creating alias". Again, until the join feature is available across shards, you can still continue with one shard (and replicas if needed). 20M + 1M per month shouldn't be a big deal. Thanks, Susheel On Mon, Jul 3, 2017 at 11:16 PM, mganeshs wrote: > Hi Susheel, > > To make use of Joins only option is I should go for manual routing. If I go > for manual routing based on time, we miss the power of distributing the load > while indexing. It will end up with all indexing happens in newly created > shard, which we feel this will not be efficient approach and degrades the > performance of indexing as we have lot of jvms running, but still all > indexing going to one single shard for indexing and we are also expecting > 1M+ docs per month in coming days. > > For your question on whether we will query old aged document... ? Mostly we > won't query old aged documents. With querying pattern, it's clear we should > go for manual routing and creating alias. But when it comes to indexing, in > order to distribute the load of indexing, we felt default routing is the > best option, but Join will not work. And that's the reason for asking when > this feature will be in place ? > > Regards, > > > > -- > View this message in context: http://lucene.472066.n3. > nabble.com/Allow-Join-over-two-sharded-collection-tp4343443p4344098.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Unique() metrics not supported in Solr Streaming facet stream source
Hello Joel, Opened the ticket https://issues.apache.org/jira/browse/SOLR-11017 Thanks, Susheel On Wed, Jul 5, 2017 at 2:46 PM, Joel Bernstein wrote: > There are a number of functions that are currently being held up because of > conflicting duplicate function names. We haven't come to an agreement yet > on the best way forward for this yet. I think we should open a separate > ticket to discuss how best to handle this issue. > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Wed, Jul 5, 2017 at 10:04 AM, Susheel Kumar > wrote: > > > Does "uniq" expression sounds good to use for UniqueMetric class? > > > > Thanks, > > Susheel > > > > On Tue, Jul 4, 2017 at 5:45 PM, Susheel Kumar > > wrote: > > > > > Hello Joel, > > > > > > I tried to create a patch to add UniqueMetric and it works, but soon > > > realized, we have UniqueStream as well and can't load both of them > (like > > > below) when required, since both uses "unique" keyword. > > > > > > Any advice how we can handle this. Come up with different keyword for > > > UniqueMetric or rename UniqueStream etc..? > > > > > >StreamFactory factory = new StreamFactory() > > > .withCollectionZkHost (...) > > >.withFunctionName("facet", FacetStream.class) > > > .withFunctionName("sum", SumMetric.class) > > > .withFunctionName("unique", UniqueStream.class) > > > .withFunctionName("unique", UniqueMetric.class) > > > > > > On Thu, Jun 29, 2017 at 9:32 AM, Joel Bernstein > > > wrote: > > > > > >> This is mainly due to focus on other things. It would great to support > > all > > >> the aggregate functions in facet, rollup and timeseries expressions. > > >> > > >> Joel Bernstein > > >> http://joelsolr.blogspot.com/ > > >> > > >> On Thu, Jun 29, 2017 at 8:23 AM, Zheng Lin Edwin Yeo < > > >> edwinye...@gmail.com> > > >> wrote: > > >> > > >> > Hi, > > >> > > > >> > We are working on the Solr Streaming expression, using the facet > > stream > > >> > source. > > >> > > > >> > As the underlying structure is using JSON Facet, would like to find > > out > > >> why > > >> > the unique() metrics is not supported? Currently, it only supports > > >> sum(col) > > >> > , avg(col), min(col), max(col), count(*) > > >> > > > >> > I'm using Solr 6.5.1 > > >> > > > >> > Regards, > > >> > Edwin > > >> > > > >> > > > > > > > > >
Re: High disk write usage
Hi Erick. What I wanted to say is that we have enough memory for the shards and, on top of that, for the JVM heaps. The machine has 400 GB of RAM; I think we have enough. We have 10 JVMs running on the machine, each one using 16 GB. Shard size is about 8 GB. When we have query or indexing peaks our problems are CPU usage and disk I/O, but we have a lot of unused memory. On 5/7/2017 19:04, "Erick Erickson" wrote: > bq: We have enough physical RAM to store full collection and 16Gb for each > JVM. > > That's not quite what I was asking for. Lucene uses MMapDirectory to > map part of the index into the OS memory space. If you've > over-allocated the JVM space relative to your physical memory that > space can start swapping. Frankly I'd expect your query performance to > die if that was happening so this is a sanity check. > > How much physical memory does the machine have and how much memory is > allocated to _all_ of the JVMs running on that machine? > > see: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory- > on-64bit.html > > Best, > Erick > > > On Wed, Jul 5, 2017 at 9:41 AM, Antonio De Miguel > wrote: > > Hi Erick! Thanks for your response! > > > > Our soft commit is 5 seconds. Why does a soft commit generate I/O? First time I've heard that. > > > > > > We have enough physical RAM to store the full collection and 16Gb for each > > JVM. The collection is relatively small. > > > > I've tried (for testing purposes) disabling the transaction log (commenting > > )... but the cluster does not come up. I'll try writing into a > separate > > drive, nice idea... > > > > > > > > > > > > > > > > > > 2017-07-05 18:04 GMT+02:00 Erick Erickson : > > > >> What is your soft commit interval? That'll cause I/O as well. > >> > >> How much physical RAM and how much is dedicated to _all_ the JVMs on a > >> machine? One cause here is that Lucene uses MMapDirectory which can be > >> starved for OS memory if you use too much JVM, my rule of thumb is > >> that _at least_ half of the physical memory should be reserved for the > >> OS. > >> > >> Your transaction logs should fluctuate but even out. By that I mean > >> they should increase in size but every hard commit should truncate > >> some of them so I wouldn't expect them to grow indefinitely. > >> > >> One strategy is to put your tlogs on a separate drive exactly to > >> reduce contention. You could disable them too at a cost of risking > >> your data. That might be a quick experiment you could run though, > >> disable tlogs and see what that changes. Of course I'd do this on my > >> test system ;). > >> > >> But yeah, Solr will use a lot of I/O in the scenario you are outlining > >> I'm afraid. > >> > >> Best, > >> Erick > >> > >> On Wed, Jul 5, 2017 at 8:08 AM, Antonio De Miguel > >> wrote: > >> > Thanks Markus! > >> > > >> > We already have SSDs. > >> > > >> > About changing the topology: we tried yesterday with 10 shards, but the > >> system > >> > was more inconsistent than with the current topology (5x10). I don't know > >> > why... too much traffic perhaps? > >> > > >> > About the merge factor: we kept the default configuration for some days... but > >> when > >> > a merge occurs the system overloads. We tried a mergeFactor of 4 to > >> improve > >> > query times and to have smaller merges. > >> > > >> > 2017-07-05 16:51 GMT+02:00 Markus Jelsma >: > >> > > >> >> Try a mergeFactor of 10 (default), which should be fine in most cases.
> If > >> you > >> >> got an extreme case, either create more shards and consider better > >> hardware > >> >> (SSDs) > >> >> > >> >> -Original message- > >> >> > From:Antonio De Miguel > >> >> > Sent: Wednesday 5th July 2017 16:48 > >> >> > To: solr-user@lucene.apache.org > >> >> > Subject: Re: High disk write usage > >> >> > > >> >> > Thanks a lot, Alessandro! > >> >> > > >> >> > Yes, we have very big physical dedicated machines, with a topology > of > >> 5 > >> >> > shards and 10 replicas per shard. > >> >> > > >> >> > > >> >> > 1. transaction log files are increasing, but not at this rate > >> >> > > >> >> > 2. we've tried values between 300 and 2000 MB... without > any > >> >> > visible results > >> >> > > >> >> > 3. We don't use those features > >> >> > > >> >> > 4. No. > >> >> > > >> >> > 5. I've tried low and high merge factors and I think that is > the > >> >> point. > >> >> > > >> >> > With a low merge factor (around 4) we have a high disk write rate, as I > said > >> >> > previously > >> >> > > >> >> > with a merge factor of 20, the disk write rate decreases, but now, > >> with > >> >> > high qps rates (over 1000 qps) the system is overloaded. > >> >> > > >> >> > I think that's the expected behaviour :( > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > 2017-07-05 15:49 GMT+02:00 alessandro.benedetti < > a.benede...@sease.io > >> >: > >> >> > > >> >> > > Point 2 was the RAM buffer size: > >> >> > > > >> >> > > *ramBufferSizeMB* sets the amount of RAM that may be used by > Lucene > >> >> > > indexing for buffering added docume
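For reference, the commit cadence described in this thread (15-minute hard commits, 5-second soft commits) corresponds roughly to the solrconfig.xml sketch below. The values are simply the ones mentioned above, not a recommendation, and the tlog directory override is only an example of the "separate drive for transaction logs" idea:

    <updateHandler class="solr.DirectUpdateHandler2">
      <updateLog>
        <!-- example only: point tlogs at a separate drive to reduce write contention -->
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
      <autoCommit>
        <maxTime>900000</maxTime>          <!-- 15 minutes: flushes segments, lets old tlogs be truncated -->
        <openSearcher>false</openSearcher> <!-- hard commits do not open a new searcher -->
      </autoCommit>
      <autoSoftCommit>
        <maxTime>5000</maxTime>            <!-- 5 seconds: opens a searcher for NRT visibility, still causes some I/O -->
      </autoSoftCommit>
    </updateHandler>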
Placing different collections on different hard disk/folder
Hi, I would like to check: how can we place the indexed files of different collections on different hard disks/folders, while they are on the same node? For example, I want collection1 to be placed on the C: drive, collection2 on the D: drive, and collection3 on the E: drive. I am using Solr 6.5.1. Regards, Edwin
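One way to do this (a sketch only, with made-up paths and core names, and assuming a standalone or single-core-per-collection setup) is to give each core its own dataDir, either in its core.properties file or at creation time; in SolrCloud the same dataDir property has to be supplied when the replica/core is created rather than edited afterwards:

    # core.properties of the core backing collection2 (restart or reload the core after editing)
    name=collection2_shard1_replica1
    dataDir=D:/solr-data/collection2

    # or set it when the core is created via the CoreAdmin API:
    http://localhost:8983/solr/admin/cores?action=CREATE&name=collection3_core&configSet=myconfig&dataDir=E:/solr-data/collection3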
Re: Unique() metrics not supported in Solr Streaming facet stream source
Thanks for your help, Joel and Susheel. Regards, Edwin On 6 July 2017 at 05:49, Susheel Kumar wrote: > Hello Joel, > > Opened the ticket > > https://issues.apache.org/jira/browse/SOLR-11017 > > Thanks, > Susheel > > On Wed, Jul 5, 2017 at 2:46 PM, Joel Bernstein wrote: > > > There are a number of functions that are currently being held up because > of > > conflicting duplicate function names. We haven't come to an agreement yet > > on the best way forward for this yet. I think we should open a separate > > ticket to discuss how best to handle this issue. > > > > > > Joel Bernstein > > http://joelsolr.blogspot.com/ > > > > On Wed, Jul 5, 2017 at 10:04 AM, Susheel Kumar > > wrote: > > > > > Does "uniq" expression sounds good to use for UniqueMetric class? > > > > > > Thanks, > > > Susheel > > > > > > On Tue, Jul 4, 2017 at 5:45 PM, Susheel Kumar > > > wrote: > > > > > > > Hello Joel, > > > > > > > > I tried to create a patch to add UniqueMetric and it works, but soon > > > > realized, we have UniqueStream as well and can't load both of them > > (like > > > > below) when required, since both uses "unique" keyword. > > > > > > > > Any advice how we can handle this. Come up with different keyword > for > > > > UniqueMetric or rename UniqueStream etc..? > > > > > > > >StreamFactory factory = new StreamFactory() > > > > .withCollectionZkHost (...) > > > >.withFunctionName("facet", FacetStream.class) > > > > .withFunctionName("sum", SumMetric.class) > > > > .withFunctionName("unique", UniqueStream.class) > > > > .withFunctionName("unique", UniqueMetric.class) > > > > > > > > On Thu, Jun 29, 2017 at 9:32 AM, Joel Bernstein > > > > wrote: > > > > > > > >> This is mainly due to focus on other things. It would great to > support > > > all > > > >> the aggregate functions in facet, rollup and timeseries expressions. > > > >> > > > >> Joel Bernstein > > > >> http://joelsolr.blogspot.com/ > > > >> > > > >> On Thu, Jun 29, 2017 at 8:23 AM, Zheng Lin Edwin Yeo < > > > >> edwinye...@gmail.com> > > > >> wrote: > > > >> > > > >> > Hi, > > > >> > > > > >> > We are working on the Solr Streaming expression, using the facet > > > stream > > > >> > source. > > > >> > > > > >> > As the underlying structure is using JSON Facet, would like to > find > > > out > > > >> why > > > >> > the unique() metrics is not supported? Currently, it only supports > > > >> sum(col) > > > >> > , avg(col), min(col), max(col), count(*) > > > >> > > > > >> > I'm using Solr 6.5.1 > > > >> > > > > >> > Regards, > > > >> > Edwin > > > >> > > > > >> > > > > > > > > > > > > > >
Joins in Parallel SQL?
Is it possible to join documents from different collections through Parallel SQL? In addition to the LIMIT feature in Parallel SQL, can we also use OFFSET to implement paging? Thanks, Imran Sent from Mail for Windows 10
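For context, a minimal Parallel SQL request in 6.x looks like the sketch below (the collection and field names are made up). LIMIT is supported as shown; as far as I know, the SQL interface at this point has no cross-collection JOIN and no OFFSET clause, so this only shows where such clauses would go:

    curl --data-urlencode 'stmt=SELECT id, price FROM products ORDER BY price DESC LIMIT 50' \
         'http://localhost:8983/solr/products/sql?aggregationMode=facet'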