Re: Get results in multiple orders (multiple boosts)

2017-07-19 Thread Luca Dall'Osto
Hello,
The problem of building an index is that each user has a custom source order 
and category order: these are not static orders (for example, user X could have 
category:5 as the most important category, but user Y could have category:9 as 
the most important).
Has anyone ever written a custom sort function in Solr? Maybe a link to a 
tutorial or an example would be very helpful. Thanks 

Luca
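
For reference, a rough sketch of what such a custom sort function could look
like as a ValueSourceParser plugin, assuming Solr 6.x plugin APIs (the class
name, function name, and the cat.prio request parameter are illustrative, not
an existing tutorial):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.IntDocValues;
import org.apache.lucene.queries.function.valuesource.IntFieldSource;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

// Registered in solrconfig.xml as:
//   <valueSourceParser name="userprio" class="com.example.UserPriorityParser"/>
// Used at query time as: sort=userprio(category) asc&cat.prio=5:0,9:1
// where cat.prio carries the current user's category ranking.
public class UserPriorityParser extends ValueSourceParser {

  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    final String field = fp.parseArg();           // e.g. "category"
    final String spec = fp.getParam("cat.prio");  // e.g. "5:0,9:1"
    final Map<Integer, Integer> priorities = new HashMap<>();
    if (spec != null) {
      for (String pair : spec.split(",")) {
        String[] kv = pair.split(":");
        priorities.put(Integer.parseInt(kv[0]), Integer.parseInt(kv[1]));
      }
    }
    final ValueSource category = new IntFieldSource(field);
    return new ValueSource() {
      @Override
      public FunctionValues getValues(Map context, LeafReaderContext leaf)
          throws IOException {
        final FunctionValues vals = category.getValues(context, leaf);
        return new IntDocValues(this) {
          @Override
          public int intVal(int doc) {
            // Categories the user has not ranked sort last.
            Integer p = priorities.get(vals.intVal(doc));
            return p == null ? Integer.MAX_VALUE : p;
          }
        };
      }
      @Override public boolean equals(Object o) { return o == this; }
      @Override public int hashCode() { return System.identityHashCode(this); }
      @Override public String description() { return "userprio(" + field + ")"; }
    };
  }
}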

On Tuesday, July 18, 2017 4:18 PM, alessandro.benedetti 
 wrote:
 

 "I have different "sort preferences", so I can't build a index and use for
sorting.Maybe I have to sort by category then by source and by language or
by source, then by category and by date"

I would like to focus on this bit.
It is OK to go for a custom function and sort at query time, but I am
curious to explore why an index-time solution would not be OK.
You can have these distinct fields:
source_priority
language_priority
category_priority
etc.

These values can be assigned to the documents at indexing time (using, for
example, a custom update request processor).
Then at query time you can easily sort on those values in a multi-layered
approach:
sort=source_priority desc, category_priority desc
Of course, if the priority for a source changes quite often, or if it's user
dependent, a query-time solution would be preferred.
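
For illustration, a minimal sketch of such an update request processor,
assuming Solr 6.x APIs (the class name, field names, and the lookup table are
illustrative; a real deployment also needs a matching
UpdateRequestProcessorFactory registered in solrconfig.xml):

import java.io.IOException;
import java.util.Map;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class PriorityStampingProcessor extends UpdateRequestProcessor {

  private final Map<String, Integer> sourcePriorities; // e.g. {"feedA"=2, "feedB"=1}

  public PriorityStampingProcessor(Map<String, Integer> sourcePriorities,
                                   UpdateRequestProcessor next) {
    super(next);
    this.sourcePriorities = sourcePriorities;
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    String source = String.valueOf(doc.getFieldValue("source"));
    // Unknown sources get the lowest priority.
    doc.setField("source_priority", sourcePriorities.getOrDefault(source, 0));
    super.processAdd(cmd);
  }
}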





-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-results-in-multiple-orders-multiple-boosts-tp4346304p4346559.html
Sent from the Solr - User mailing list archive at Nabble.com.


   

Re: Highlighting words with special characters

2017-07-19 Thread Lasitha Wattaladeniya
Update,

I changed the UAX29URLEmailTokenizerFactory to StandardTokenizerFactory and
now it shows highlighted text fragments in the indexed email text.

But I don't understand this behavior. Can someone shed some light, please?

On 18 Jul 2017 14:18, "Lasitha Wattaladeniya"  wrote:

> Further more, ngram field has following tokenizer/filter chain in index
> and query
>
> UAX29URLEmailTokenizerFactory (only in index)
> stopFilterFactory
> LowerCaseFilterFactory
> ASCIIFoldingFilterFactory
> EnglishPossessiveFilterFactory
> StemmerOverrideFilterFactory (only in query)
> NgramTokenizerFactory (only in index)
>
> Regards,
> Lasitha
>
> On 18 Jul 2017 14:11, "Lasitha Wattaladeniya"  wrote:
>
>> Hi devs,
>>
>> I have setup solr highlighting with default setup (only changed the
>> fragsize to 0 to match any field length). It worked fine but recently I
>> discovered it doesn't highlight for words with special characters in the
>> middle.
>>
>> For an example, let's say I have indexed email address test.f...@ran.com
>> to a ngram field. And when I search for the partial text fsdg, I get the
>> results but it's not highlighted. It works in all other scenarios as
>> expected.
>>
>> The ngram field has termVectors, termPositions, termOffsets set to true.
>>
>> Can somebody please suggest me, what may be wrong here?
>>
>> (sorry for the unstructured text. Typed using a mobile phone )
>>
>> Regards
>> Lasitha
>>
>


Re: Solr Issue While indexing Data

2017-07-19 Thread rajat rastogi
Hi Erick,

Thanks for your reply.

I tried the solution given, but it did not work.

Please help me narrow down the problem.

Please let me know if any more inputs are required from my end, viz. schema,
configs, etc.

Can this problem be related to GC?

Regards,

Rajat



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Issue-While-indexing-Data-tp4339417p4346745.html
Sent from the Solr - User mailing list archive at Nabble.com.


6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
Hello,

Another peculiarity here: our six-node (2 shards / 3 replicas) cluster is 
going crazy after a good part of the day has passed. It starts eating CPU for 
no good reason and its latency goes up. Grafana graphs show the problem really 
well.

After restarting 2/6 nodes, there is also quite a distinction in the VisualVM 
monitor views and in the VisualVM CPU sampler reports (sorted on self time 
(CPU)). The busy nodes are deeply red in 
o.a.h.impl.io.AbstractSessionInputBuffer.fillBuffer (as usual); the restarted 
nodes are not.

The real distinction between busy and calm nodes is that busy nodes all have 
o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms() second to 
fillBuffer(). What are they doing?! Why? The calm nodes don't show this at all. 
Busy nodes all have o.a.l.codec stuff on top; restarted nodes don't.

So, actually, I don't have a clue! Any ideas? 

Thanks,
Markus

Each replica is underpowered but performs really well after restart (and JVM 
warmup): 4 CPUs, 900M heap, 8 GB RAM, maxDoc 2.8 million, index size 18 GB.


Returning unique values for suggestion

2017-07-19 Thread Zheng Lin Edwin Yeo
Hi,

Is there any configuration that we can set for the /suggest handler, so
that the suggestion output will only return unique records, and not
duplicates?

Below is my /suggest handler (the XML parameter names were stripped by the
list archive; only the values remain):

all
json
true
content
100
id, score
on
content
true
false
html
100
204800
true

Regards,
Edwin


Re: Get results in multiple orders (multiple boosts)

2017-07-19 Thread Rick Leir
Luca,
You can pass a sort parameter in the query. User A could use sort=date%20desc
and user B could use sort=foofield%20asc.
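
With SolrJ this is just a different sort per request; a minimal sketch (the
field names are illustrative):

import org.apache.solr.client.solrj.SolrQuery;

public class PerUserSortExample {
  public static void main(String[] args) {
    // User A prefers newest first; user B prefers foofield ascending.
    SolrQuery userA = new SolrQuery("*:*");
    userA.setSort("date", SolrQuery.ORDER.desc);

    SolrQuery userB = new SolrQuery("*:*");
    userB.setSort("foofield", SolrQuery.ORDER.asc);

    System.out.println(userA); // this request now carries sort=date desc
    System.out.println(userB); // this one carries sort=foofield asc
  }
}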

Maybe query functions can also help with this. Cheers -- Rick

On July 19, 2017 4:39:59 AM EDT, Luca Dall'Osto  
wrote:
>Hello,The problem of build an index is that each user has a custom
>source order and category order: are not static orders (for example
>user X could have category:5 as most important category but user Y
>could have category:9 as most important).
>Has anyone ever written a custom sort function in solr?Maybe a link of
>a tutorial or an example could be very helpful. Thanks 
>
>Luca
>
>On Tuesday, July 18, 2017 4:18 PM, alessandro.benedetti
> wrote:
> 
>
>"I have different "sort preferences", so I can't build a index and use
>for
>sorting.Maybe I have to sort by category then by source and by language
>or
>by source, then by category and by date"
>
>I would like to focus on this bit.
>It is ok to go for a custom function and sort at query time, but I am
>curious to explore why an index time solution should not be ok.
>You can have these distinct fields :
>source_priority
>language_priority
>category_priority 
>ect
>
>This values can be assigned at the documents at indexing time ( using
>for
>example a custom update request processor).
>Then at query time you can easily sort on those values in a multi
>layered
>approach :
>sort:source_priority desc, category_priority  desc
>Of course, if the priority for a source changes quite often or if it's
>user
>dependent, a query time solution would be preferred.
>
>
>
>
>
>-
>---
>Alessandro Benedetti
>Search Consultant, R&D Software Engineer, Director
>Sease Ltd. - www.sease.io
>--
>View this message in context:
>http://lucene.472066.n3.nabble.com/Get-results-in-multiple-orders-multiple-boosts-tp4346304p4346559.html
>Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>   

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Rick Leir
Markus, 
What does iostat(1) tell you? Cheers -- Rick

On July 19, 2017 5:35:32 AM EDT, Markus Jelsma  
wrote:
>Hello,
>
>Another peculiarity here, our six node (2 shards / 3 replica's) cluster
>is going crazy after a good part of the day has passed. It starts
>eating CPU for no good reason and its latency goes up. Grafana graphs
>show the problem really well
>
>After restarting 2/6 nodes, there is also quite a distinction in the
>VisualVM monitor views, and the VisualVM CPU sampler reports (sorted on
>self time (CPU)). The busy nodes are deeply red in
>o.a.h.impl.io.AbstractSessionInputBuffer.fillBuffer (as usual), the
>restarted nodes are not.
>
>The real distinction between busy and calm nodes is that busy nodes all
>have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms()
>as second to fillBuffer(), what are they doing?! Why? The calm nodes
>don't show this at all. Busy nodes all have o.a.l.codec stuff on top,
>restarted nodes don't.
>
>So, actually, i don't have a clue! Any, any ideas? 
>
>Thanks,
>Markus
>
>Each replica is underpowered but performing really well after restart
>(and JVM warmup), 4 CPU's, 900M heap, 8 GB RAM, maxDoc 2.8 million,
>index size 18 GB.

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
Hello,

Not too much actually:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  10.55    0.00    0.25    0.03    0.95   88.22

Device:    tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda   3.26    78.34   218.67  188942841  527408404

These are all SSDs.

Thanks,
Markus

-Original message-
> From:Rick Leir 
> Sent: Wednesday 19th July 2017 12:48
> To: solr-user@lucene.apache.org
> Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours
> 
> Markus, 
> What does iostat(1) tell you? Cheers -- Rick
> 
> On July 19, 2017 5:35:32 AM EDT, Markus Jelsma  
> wrote:
> >Hello,
> >
> >Another peculiarity here, our six node (2 shards / 3 replica's) cluster
> >is going crazy after a good part of the day has passed. It starts
> >eating CPU for no good reason and its latency goes up. Grafana graphs
> >show the problem really well
> >
> >After restarting 2/6 nodes, there is also quite a distinction in the
> >VisualVM monitor views, and the VisualVM CPU sampler reports (sorted on
> >self time (CPU)). The busy nodes are deeply red in
> >o.a.h.impl.io.AbstractSessionInputBuffer.fillBuffer (as usual), the
> >restarted nodes are not.
> >
> >The real distinction between busy and calm nodes is that busy nodes all
> >have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms()
> >as second to fillBuffer(), what are they doing?! Why? The calm nodes
> >don't show this at all. Busy nodes all have o.a.l.codec stuff on top,
> >restarted nodes don't.
> >
> >So, actually, i don't have a clue! Any, any ideas? 
> >
> >Thanks,
> >Markus
> >
> >Each replica is underpowered but performing really well after restart
> >(and JVM warmup), 4 CPU's, 900M heap, 8 GB RAM, maxDoc 2.8 million,
> >index size 18 GB.
> 
> -- 
> Sorry for being brief. Alternate email is rickleir at yahoo dot com 


Re: Solr Issue While indexing Data

2017-07-19 Thread rajat rastogi
Hi Erick,

Some logs from Solr:


2017-07-19 08:14:09.104 INFO  (qtp434091818-6937) [   x:cda]
o.a.s.u.p.LogUpdateProcessorFactory [cda]  webapp=/solr path=/update
params={wt=javabin&version=2}{add=[54945918f81f4e218b994b75]} 0 24362730
2017-07-19 08:14:09.181 DEBUG (qtp434091818-10997) [   x:cda6m]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
add{,id=54945918f81f4e218b994b75} {wt=javabin&version=2&df=text}
2017-07-19 08:25:06.333 DEBUG (qtp434091818-7165) [   x:cda]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE FINISH
{df=text&qt=edismax&wt=javabin&version=2}
2017-07-19 08:25:06.334 INFO  (qtp434091818-7165) [   x:cda]
o.a.s.u.p.LogUpdateProcessorFactory [cda]  webapp=/solr path=/update
params={wt=javabin&version=2}{add=[541011730e39384b04ba64ed]} 0 24005213
2017-07-19 08:25:06.454 DEBUG (qtp434091818-10958) [   x:cda6m]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
add{,id=541011730e39384b04ba64ed} {wt=javabin&version=2&df=text}
2017-07-19 09:14:57.321 DEBUG (qtp434091818-10682) [   x:cda]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE FINISH
{df=text&qt=edismax&wt=javabin&version=2}
2017-07-19 09:14:57.321 INFO  (qtp434091818-10682) [   x:cda]
o.a.s.u.p.LogUpdateProcessorFactory [cda]  webapp=/solr path=/update
params={wt=javabin&version=2}{add=[54c92064a9b16e2fafeff7a0]} 0 8594125
2017-07-19 09:14:57.373 DEBUG (qtp434091818-12118) [   x:cda6m]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
add{,id=54c92064a9b16e2fafeff7a0} {wt=javabin&version=2&df=text}
2017-07-19 09:23:26.477 DEBUG (qtp434091818-10770) [   x:cda6m]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE FINISH
{wt=javabin&version=2&df=text}
2017-07-19 09:23:26.477 INFO  (qtp434091818-10770) [   x:cda6m]
o.a.s.u.p.LogUpdateProcessorFactory [cda6m]  webapp=/solr path=/update
params={wt=javabin&version=2}{add=[520f8aaf53941a4044cdbe86]} 0 8555637
2017-07-19 10:17:29.822 DEBUG (qtp434091818-9274) [   x:cda]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE FINISH
{df=text&qt=edismax&wt=javabin&version=2}
2017-07-19 10:17:29.822 INFO  (qtp434091818-9274) [   x:cda]
o.a.s.u.p.LogUpdateProcessorFactory [cda]  webapp=/solr path=/update
params={wt=javabin&version=2}{add=[559623155ffdf728657573ef]} 0 19477949
2017-07-19 10:17:29.896 DEBUG (qtp434091818-13046) [   x:cda6m]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
add{,id=559623155ffdf728657573ef} {wt=javabin&version=2&df=text}
2017-07-19 10:18:47.465 INFO  (qtp434091818-7737) [   x:cda]
o.a.s.u.DirectUpdateHandler2 end_commit_flush
2017-07-19 10:18:47.465 INFO  (qtp434091818-10998) [   x:cda]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2017-07-19 10:18:47.465 INFO  (qtp434091818-10998) [   x:cda]
o.a.s.u.SolrIndexWriter Calling setCommitData with
IW:org.apache.solr.update.SolrIndexWriter@4843d921
2017-07-19 10:18:47.469 DEBUG (qtp434091818-7737) [   x:cda]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE FINISH
{commit=true&df=text&qt=edismax}
2017-07-19 10:18:47.469 INFO  (qtp434091818-7737) [   x:cda]
o.a.s.u.p.LogUpdateProcessorFactory [cda]  webapp=/solr path=/update
params={commit=true}{commit=} 0 28038519
2017-07-19 10:18:48.620 INFO  (qtp434091818-10998) [   x:cda]
o.a.s.u.DirectUpdateHandler2 end_commit_flush
2017-07-19 10:18:48.728 DEBUG (qtp434091818-10998) [   x:cda]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE FINISH
{df=text&qt=edismax&waitSearcher=true&commit=true&softCommit=false&wt=javabin&version=2}
2017-07-19 10:18:48.728 INFO  (qtp434091818-10998) [   x:cda]
o.a.s.u.p.LogUpdateProcessorFactory [cda]  webapp=/solr path=/update
params={waitSearcher=true&commit=true&softCommit=false&wt=javabin&version=2}{commit=}
0 10735236
2017-07-19 10:18:48.731 DEBUG (qtp434091818-13082) [   x:cda6m]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
{commit=true&softCommit=false&df=text&waitSearcher=true&wt=javabin&version=2}
root@shineRecSolrMast:~/solr-6.4.2/server/logs# tail -f solr.log
2017-07-19 10:17:29.896 DEBUG (qtp434091818-13046) [   x:cda6m]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
add{,id=559623155ffdf728657573ef} {wt=javabin&version=2&df=text}
2017-07-19 10:18:47.465 INFO  (qtp434091818-7737) [   x:cda]
o.a.s.u.DirectUpdateHandler2 end_commit_flush
2017-07-19 10:18:47.465 INFO  (qtp434091818-10998) [   x:cda]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2017-07-19 10:18:47.465 INFO  (qtp434091818-10998) [   x:cda]
o.a.s.u.SolrIndexWriter Calling setCommitData with
IW:org.apache.solr.update.SolrIndexWriter@4843d921
2017-07-19 10:18:47.469 DEBUG (qtp434091818-7737) [   x:cda]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE FINISH
{commit=true&df=text&qt=edismax}
2017-07-19 10:18:47.469 INFO  (qtp434091818-7737) [   x:cda]
o.a.s.u.p.LogUpdateProcessorFactory

Re: Solr Issue While indexing Data

2017-07-19 Thread Susheel Kumar
What are your current:
a) soft commit and hard commit settings? You can share them as-is from the
config. Also, how are you committing?
b) How much heap is allocated out of the 124 GB?
c) How many documents are you adding that take this long, and approximately
how many fields, including copy fields?

Thnx

On Wed, Jul 19, 2017 at 7:32 AM, rajat rastogi <
rajat.rast...@hindustantimes.com> wrote:

> Hi Erik,
>
> Some Logs of solr
>
>
> 2017-07-19 08:14:09.104 INFO  (qtp434091818-6937) [   x:cda]
> o.a.s.u.p.LogUpdateProcessorFactory [cda]  webapp=/solr path=/update
> params={wt=javabin&version=2}{add=[54945918f81f4e218b994b75]} 0 24362730
> 2017-07-19 08:14:09.181 DEBUG (qtp434091818-10997) [   x:cda6m]
> o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
> add{,id=54945918f81f4e218b994b75} {wt=javabin&version=2&df=text}
> 2017-07-19 08:25:06.333 DEBUG (qtp434091818-7165) [   x:cda]
> o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE FINISH
> {df=text&qt=edismax&wt=javabin&version=2}
> 2017-07-19 08:25:06.334 INFO  (qtp434091818-7165) [   x:cda]
> o.a.s.u.p.LogUpdateProcessorFactory [cda]  webapp=/solr path=/update
> params={wt=javabin&version=2}{add=[541011730e39384b04ba64ed]} 0 24005213
> 2017-07-19 08:25:06.454 DEBUG (qtp434091818-10958) [   x:cda6m]
> o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
> add{,id=541011730e39384b04ba64ed} {wt=javabin&version=2&df=text}
> 2017-07-19 09:14:57.321 DEBUG (qtp434091818-10682) [   x:cda]
> o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE FINISH
> {df=text&qt=edismax&wt=javabin&version=2}
> 2017-07-19 09:14:57.321 INFO  (qtp434091818-10682) [   x:cda]
> o.a.s.u.p.LogUpdateProcessorFactory [cda]  webapp=/solr path=/update
> params={wt=javabin&version=2}{add=[54c92064a9b16e2fafeff7a0]} 0 8594125
> 2017-07-19 09:14:57.373 DEBUG (qtp434091818-12118) [   x:cda6m]
> o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
> add{,id=54c92064a9b16e2fafeff7a0} {wt=javabin&version=2&df=text}
> 2017-07-19 09:23:26.477 DEBUG (qtp434091818-10770) [   x:cda6m]
> o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE FINISH
> {wt=javabin&version=2&df=text}
> 2017-07-19 09:23:26.477 INFO  (qtp434091818-10770) [   x:cda6m]
> o.a.s.u.p.LogUpdateProcessorFactory [cda6m]  webapp=/solr path=/update
> params={wt=javabin&version=2}{add=[520f8aaf53941a4044cdbe86]} 0 8555637
> 2017-07-19 10:17:29.822 DEBUG (qtp434091818-9274) [   x:cda]
> o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE FINISH
> {df=text&qt=edismax&wt=javabin&version=2}
> 2017-07-19 10:17:29.822 INFO  (qtp434091818-9274) [   x:cda]
> o.a.s.u.p.LogUpdateProcessorFactory [cda]  webapp=/solr path=/update
> params={wt=javabin&version=2}{add=[559623155ffdf728657573ef]} 0 19477949
> 2017-07-19 10:17:29.896 DEBUG (qtp434091818-13046) [   x:cda6m]
> o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
> add{,id=559623155ffdf728657573ef} {wt=javabin&version=2&df=text}
> 2017-07-19 10:18:47.465 INFO  (qtp434091818-7737) [   x:cda]
> o.a.s.u.DirectUpdateHandler2 end_commit_flush
> 2017-07-19 10:18:47.465 INFO  (qtp434091818-10998) [   x:cda]
> o.a.s.u.DirectUpdateHandler2 start
> commit{,optimize=false,openSearcher=true,waitSearcher=true,
> expungeDeletes=false,softCommit=false,prepareCommit=false}
> 2017-07-19 10:18:47.465 INFO  (qtp434091818-10998) [   x:cda]
> o.a.s.u.SolrIndexWriter Calling setCommitData with
> IW:org.apache.solr.update.SolrIndexWriter@4843d921
> 2017-07-19 10:18:47.469 DEBUG (qtp434091818-7737) [   x:cda]
> o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE FINISH
> {commit=true&df=text&qt=edismax}
> 2017-07-19 10:18:47.469 INFO  (qtp434091818-7737) [   x:cda]
> o.a.s.u.p.LogUpdateProcessorFactory [cda]  webapp=/solr path=/update
> params={commit=true}{commit=} 0 28038519
> 2017-07-19 10:18:48.620 INFO  (qtp434091818-10998) [   x:cda]
> o.a.s.u.DirectUpdateHandler2 end_commit_flush
> 2017-07-19 10:18:48.728 DEBUG (qtp434091818-10998) [   x:cda]
> o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE FINISH
> {df=text&qt=edismax&waitSearcher=true&commit=true&
> softCommit=false&wt=javabin&version=2}
> 2017-07-19 10:18:48.728 INFO  (qtp434091818-10998) [   x:cda]
> o.a.s.u.p.LogUpdateProcessorFactory [cda]  webapp=/solr path=/update
> params={waitSearcher=true&commit=true&softCommit=false&
> wt=javabin&version=2}{commit=}
> 0 10735236
> 2017-07-19 10:18:48.731 DEBUG (qtp434091818-13082) [   x:cda6m]
> o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
> commit{,optimize=false,openSearcher=true,waitSearcher=true,
> expungeDeletes=false,softCommit=false,prepareCommit=false}
> {commit=true&softCommit=false&df=text&waitSearcher=true&wt=
> javabin&version=2}
> root@shineRecSolrMast:~/solr-6.4.2/server/logs# tail -f solr.log
> 2017-07-19 10:17:29.896 DEBUG (qtp434091818-13046) [   x:cda6m]
> o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
> add{,id=559623155ffdf728657573ef} {wt=javabin&version=2&df=text}
> 2017-07-19 10:18:47.465 INFO  (qtp434091818-7737) [   x:cda]
> o.a.s.u.DirectUpdateHandler2 end_commit_flush
> 2017-07-19 10:18:47.465 INFO  (qtp434091818-10998) [   x:cda]
> o.a.s.u.DirectUpdateHandler

Re: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Mikhail Khludnev
>
> The real distinction between busy and calm nodes is that busy nodes all
> have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms() as
> second to fillBuffer(), what are they doing?


Can you expose the stack deeper?
Could they be starting to sync shards for some reason?

On Wed, Jul 19, 2017 at 12:35 PM, Markus Jelsma 
wrote:

> Hello,
>
> Another peculiarity here, our six node (2 shards / 3 replica's) cluster is
> going crazy after a good part of the day has passed. It starts eating CPU
> for no good reason and its latency goes up. Grafana graphs show the problem
> really well
>
> After restarting 2/6 nodes, there is also quite a distinction in the
> VisualVM monitor views, and the VisualVM CPU sampler reports (sorted on
> self time (CPU)). The busy nodes are deeply red in o.a.h.impl.io.
> AbstractSessionInputBuffer.fillBuffer (as usual), the restarted nodes are
> not.
>
> The real distinction between busy and calm nodes is that busy nodes all
> have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms() as
> second to fillBuffer(), what are they doing?! Why? The calm nodes don't
> show this at all. Busy nodes all have o.a.l.codec stuff on top, restarted
> nodes don't.
>
> So, actually, i don't have a clue! Any, any ideas?
>
> Thanks,
> Markus
>
> Each replica is underpowered but performing really well after restart (and
> JVM warmup), 4 CPU's, 900M heap, 8 GB RAM, maxDoc 2.8 million, index size
> 18 GB.
>



-- 
Sincerely yours
Mikhail Khludnev


RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
Hello,

No, I cannot expose the stack; VisualVM's samples won't show it to me.

I am not sure if they're syncing all the time, but every 15 minutes some 
documents are indexed (3-4k). For some reason, indexing time does increase 
along with latency / CPU usage.

This situation runs fine for many hours, then it slowly starts to go bad, 
until nodes are restarted (or the index size is decreased).

Thanks,
Markus 
 
-Original message-
> From:Mikhail Khludnev 
> Sent: Wednesday 19th July 2017 14:18
> To: solr-user 
> Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours
> 
> >
> > The real distinction between busy and calm nodes is that busy nodes all
> > have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms() as
> > second to fillBuffer(), what are they doing?
> 
> 
> Can you expose the stack deeper?
> Can they start to sync shards due to some reason?
> 
> On Wed, Jul 19, 2017 at 12:35 PM, Markus Jelsma 
> wrote:
> 
> > Hello,
> >
> > Another peculiarity here, our six node (2 shards / 3 replica's) cluster is
> > going crazy after a good part of the day has passed. It starts eating CPU
> > for no good reason and its latency goes up. Grafana graphs show the problem
> > really well
> >
> > After restarting 2/6 nodes, there is also quite a distinction in the
> > VisualVM monitor views, and the VisualVM CPU sampler reports (sorted on
> > self time (CPU)). The busy nodes are deeply red in o.a.h.impl.io.
> > AbstractSessionInputBuffer.fillBuffer (as usual), the restarted nodes are
> > not.
> >
> > The real distinction between busy and calm nodes is that busy nodes all
> > have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms() as
> > second to fillBuffer(), what are they doing?! Why? The calm nodes don't
> > show this at all. Busy nodes all have o.a.l.codec stuff on top, restarted
> > nodes don't.
> >
> > So, actually, i don't have a clue! Any, any ideas?
> >
> > Thanks,
> > Markus
> >
> > Each replica is underpowered but performing really well after restart (and
> > JVM warmup), 4 CPU's, 900M heap, 8 GB RAM, maxDoc 2.8 million, index size
> > 18 GB.
> >
> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> 


Re: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Mikhail Khludnev
You can get a stack with kill -3 or jstack, or even from the Solr admin UI.
Overall, this behavior looks like a typical heavy merge kicking off from time
to time.
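
For example, assuming <pid> is the Solr process id:

kill -3 <pid>            # the JVM writes a full thread dump to Solr's console log
jstack <pid> > dump.txt  # or capture the same dump to a file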

On Wed, Jul 19, 2017 at 3:31 PM, Markus Jelsma 
wrote:

> Hello,
>
> No i cannot expose the stack, VisualVM samples won't show it to me.
>
> I am not sure if they're about to sync all the time, but every 15 minutes
> some documents are indexed (3 - 4k). For some reason, index time does
> increase with latency / CPU usage.
>
> This situation runs fine for many hours, then it will slowly start to go
> bad, until nodes are restarted (or index size decreased).
>
> Thanks,
> Markus
>
> -Original message-
> > From:Mikhail Khludnev 
> > Sent: Wednesday 19th July 2017 14:18
> > To: solr-user 
> > Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours
> >
> > >
> > > The real distinction between busy and calm nodes is that busy nodes all
> > > have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms()
> as
> > > second to fillBuffer(), what are they doing?
> >
> >
> > Can you expose the stack deeper?
> > Can they start to sync shards due to some reason?
> >
> > On Wed, Jul 19, 2017 at 12:35 PM, Markus Jelsma <
> markus.jel...@openindex.io>
> > wrote:
> >
> > > Hello,
> > >
> > > Another peculiarity here, our six node (2 shards / 3 replica's)
> cluster is
> > > going crazy after a good part of the day has passed. It starts eating
> CPU
> > > for no good reason and its latency goes up. Grafana graphs show the
> problem
> > > really well
> > >
> > > After restarting 2/6 nodes, there is also quite a distinction in the
> > > VisualVM monitor views, and the VisualVM CPU sampler reports (sorted on
> > > self time (CPU)). The busy nodes are deeply red in o.a.h.impl.io.
> > > AbstractSessionInputBuffer.fillBuffer (as usual), the restarted nodes
> are
> > > not.
> > >
> > > The real distinction between busy and calm nodes is that busy nodes all
> > > have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms()
> as
> > > second to fillBuffer(), what are they doing?! Why? The calm nodes don't
> > > show this at all. Busy nodes all have o.a.l.codec stuff on top,
> restarted
> > > nodes don't.
> > >
> > > So, actually, i don't have a clue! Any, any ideas?
> > >
> > > Thanks,
> > > Markus
> > >
> > > Each replica is underpowered but performing really well after restart
> (and
> > > JVM warmup), 4 CPU's, 900M heap, 8 GB RAM, maxDoc 2.8 million, index
> size
> > > 18 GB.
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>



-- 
Sincerely yours
Mikhail Khludnev


RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
Oh, of course, I didn't think about that. I will do that the next time this 
happens (which might take a few weeks, since we purged the index).

It could indeed be merging, but I don't understand why the scheduler would 
wait so long. Shouldn't it schedule the same way after running for a long time 
as after a fresh start?

Thanks,
Markus
 
-Original message-
> From:Mikhail Khludnev 
> Sent: Wednesday 19th July 2017 14:41
> To: solr-user 
> Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours
> 
> You can get stack from kill -3 jstack even from solradmin. Overall, this
> behavior looks like typical heavy merge kicking off from time to time.
> 
> On Wed, Jul 19, 2017 at 3:31 PM, Markus Jelsma 
> wrote:
> 
> > Hello,
> >
> > No i cannot expose the stack, VisualVM samples won't show it to me.
> >
> > I am not sure if they're about to sync all the time, but every 15 minutes
> > some documents are indexed (3 - 4k). For some reason, index time does
> > increase with latency / CPU usage.
> >
> > This situation runs fine for many hours, then it will slowly start to go
> > bad, until nodes are restarted (or index size decreased).
> >
> > Thanks,
> > Markus
> >
> > -Original message-
> > > From:Mikhail Khludnev 
> > > Sent: Wednesday 19th July 2017 14:18
> > > To: solr-user 
> > > Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours
> > >
> > > >
> > > > The real distinction between busy and calm nodes is that busy nodes all
> > > > have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms()
> > as
> > > > second to fillBuffer(), what are they doing?
> > >
> > >
> > > Can you expose the stack deeper?
> > > Can they start to sync shards due to some reason?
> > >
> > > On Wed, Jul 19, 2017 at 12:35 PM, Markus Jelsma <
> > markus.jel...@openindex.io>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > Another peculiarity here, our six node (2 shards / 3 replica's)
> > cluster is
> > > > going crazy after a good part of the day has passed. It starts eating
> > CPU
> > > > for no good reason and its latency goes up. Grafana graphs show the
> > problem
> > > > really well
> > > >
> > > > After restarting 2/6 nodes, there is also quite a distinction in the
> > > > VisualVM monitor views, and the VisualVM CPU sampler reports (sorted on
> > > > self time (CPU)). The busy nodes are deeply red in o.a.h.impl.io.
> > > > AbstractSessionInputBuffer.fillBuffer (as usual), the restarted nodes
> > are
> > > > not.
> > > >
> > > > The real distinction between busy and calm nodes is that busy nodes all
> > > > have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms()
> > as
> > > > second to fillBuffer(), what are they doing?! Why? The calm nodes don't
> > > > show this at all. Busy nodes all have o.a.l.codec stuff on top,
> > restarted
> > > > nodes don't.
> > > >
> > > > So, actually, i don't have a clue! Any, any ideas?
> > > >
> > > > Thanks,
> > > > Markus
> > > >
> > > > Each replica is underpowered but performing really well after restart
> > (and
> > > > JVM warmup), 4 CPU's, 900M heap, 8 GB RAM, maxDoc 2.8 million, index
> > size
> > > > 18 GB.
> > > >
> > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > >
> >
> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> 


Re: Returning unique values for suggestion

2017-07-19 Thread Walter Underwood
I was surprised to see duplicate suggestions coming from my 4.10.4 suggester. 
This is the analyzing infix suggester, with terms loaded from the index.

"titles_infix": {
"chemistry": {
"numFound": 10,
"suggestions": [
{
"term": "Chemistry",
"weight": 5285,
"payload": ""
},
{
"term": "Chemistry",
"weight": 4548,
"payload": ""
},
{
"term": "Chemistry",
"weight": 3002,
"payload": ""
},
{
"term": "Introductory Chemistry",
"weight": 2823,
"payload": ""
},
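
Until there is a suggester-side option, a client-side workaround is to keep
only the first (highest-weight) occurrence of each term; a minimal plain-Java
sketch, with a hard-coded list standing in for the suggester response:

import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class DedupeSuggestions {
  public static void main(String[] args) {
    // Terms as returned by the suggester, already ordered by weight.
    // LinkedHashSet keeps first-seen order and drops later duplicates.
    Set<String> unique = new LinkedHashSet<>(Arrays.asList(
        "Chemistry", "Chemistry", "Chemistry", "Introductory Chemistry"));
    System.out.println(unique); // [Chemistry, Introductory Chemistry]
  }
}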


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jul 19, 2017, at 3:33 AM, Zheng Lin Edwin Yeo  wrote:
> 
> Hi,
> 
> Is there any configuration that we can set for the /suggest handler, so
> that the suggestion output will only return unique records, and not
> duplicated?
> 
> Below is my /suggest handler.
> 
>  
> 
> all
>   json
>   true
> content
> 100
> id, score
>  on
>  content
>  true
>  false
>  html
>  100
>204800
>  true
> 
> 
> Regards,
> Edwin



Getting IO Exception while Indexing

2017-07-19 Thread subbarao
Hi all,

We have a SolrCloud setup with 2 shards.

We are trying to index documents by taking JSON-format input, creating a
SolrDocument, and pushing it to Solr through SolrJ. 

It is throwing an exception saying *"SolrUpdate got error: IOException
occured when talking to server"*

and Apache is returning a 400 response code. Only a few documents have this
problem; some documents index fine.

Can anybody help us with this issue?


Thanks,
Subbarao.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-IO-Exception-while-Indexing-tp4346801.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting IO Exception while Indexing

2017-07-19 Thread Susheel Kumar
Is it always the same documents, or is it random? If it is always the same
documents, look at what is different in those docs.

Usually I have seen these errors once in a while when SolrJ is unable to
connect/communicate with Solr.

On Wed, Jul 19, 2017 at 10:23 AM, subbarao 
wrote:

> Hi all,
>
> we have solr cloud setup with 2 shards.
>
> In this we trying to index documents by taking a json format input, and
> creating a SolrDocument. and pushing to solr through solrJ.
>
> Then it is throwing exception  saying *"SolrUpdate got error: IOException
> occured when talking to server" *
>
> and in apache it is throwing 400 response code. But only few documents are
> having this problem.Some documents are indexing fine.
>
> Can anybody help us in this issue?
>
>
> Thanks,
> Subbarao.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Getting-IO-Exception-while-Indexing-tp4346801.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Erick Erickson
Also, jstack can give you a full stack trace.

On Wed, Jul 19, 2017 at 5:47 AM, Markus Jelsma
 wrote:
> Oh of course, didn't think about it. Will do next time this happens (which 
> might take a few weeks since we purged the index).
>
> It could be merging indeed, but i don't understand why the scheduler would 
> wait so long, should it not schedule the same when running a long time vs. a 
> fresh start?
>
> Thanks,
> Markus
>
> -Original message-
>> From:Mikhail Khludnev 
>> Sent: Wednesday 19th July 2017 14:41
>> To: solr-user 
>> Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours
>>
>> You can get stack from kill -3 jstack even from solradmin. Overall, this
>> behavior looks like typical heavy merge kicking off from time to time.
>>
>> On Wed, Jul 19, 2017 at 3:31 PM, Markus Jelsma 
>> wrote:
>>
>> > Hello,
>> >
>> > No i cannot expose the stack, VisualVM samples won't show it to me.
>> >
>> > I am not sure if they're about to sync all the time, but every 15 minutes
>> > some documents are indexed (3 - 4k). For some reason, index time does
>> > increase with latency / CPU usage.
>> >
>> > This situation runs fine for many hours, then it will slowly start to go
>> > bad, until nodes are restarted (or index size decreased).
>> >
>> > Thanks,
>> > Markus
>> >
>> > -Original message-
>> > > From:Mikhail Khludnev 
>> > > Sent: Wednesday 19th July 2017 14:18
>> > > To: solr-user 
>> > > Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours
>> > >
>> > > >
>> > > > The real distinction between busy and calm nodes is that busy nodes all
>> > > > have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms()
>> > as
>> > > > second to fillBuffer(), what are they doing?
>> > >
>> > >
>> > > Can you expose the stack deeper?
>> > > Can they start to sync shards due to some reason?
>> > >
>> > > On Wed, Jul 19, 2017 at 12:35 PM, Markus Jelsma <
>> > markus.jel...@openindex.io>
>> > > wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > Another peculiarity here, our six node (2 shards / 3 replica's)
>> > cluster is
>> > > > going crazy after a good part of the day has passed. It starts eating
>> > CPU
>> > > > for no good reason and its latency goes up. Grafana graphs show the
>> > problem
>> > > > really well
>> > > >
>> > > > After restarting 2/6 nodes, there is also quite a distinction in the
>> > > > VisualVM monitor views, and the VisualVM CPU sampler reports (sorted on
>> > > > self time (CPU)). The busy nodes are deeply red in o.a.h.impl.io.
>> > > > AbstractSessionInputBuffer.fillBuffer (as usual), the restarted nodes
>> > are
>> > > > not.
>> > > >
>> > > > The real distinction between busy and calm nodes is that busy nodes all
>> > > > have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms()
>> > as
>> > > > second to fillBuffer(), what are they doing?! Why? The calm nodes don't
>> > > > show this at all. Busy nodes all have o.a.l.codec stuff on top,
>> > restarted
>> > > > nodes don't.
>> > > >
>> > > > So, actually, i don't have a clue! Any, any ideas?
>> > > >
>> > > > Thanks,
>> > > > Markus
>> > > >
>> > > > Each replica is underpowered but performing really well after restart
>> > (and
>> > > > JVM warmup), 4 CPU's, 900M heap, 8 GB RAM, maxDoc 2.8 million, index
>> > size
>> > > > 18 GB.
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Sincerely yours
>> > > Mikhail Khludnev
>> > >
>> >
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>


Re: regarding cursorMark feature for deep pagination

2017-07-19 Thread Erick Erickson
Chris Hostetter has a writeup here that has a good explanation:

https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

Best,
Erick
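
In short: the cursorMark value itself encodes the sort values of the last
document returned, so each follow-up request can skip straight past everything
that sorted before it instead of collecting and discarding the first N hits.
No server-side state is kept, and the next request does not have to hit the
same aggregator node. A minimal SolrJ sketch (the collection URL is
illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorMarkExample {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts").build()) {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(100);
      // The sort must be deterministic and include the uniqueKey field.
      q.setSort(SolrQuery.SortClause.asc("id"));

      String cursorMark = CursorMarkParams.CURSOR_MARK_START; // "*"
      while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
        QueryResponse rsp = client.query(q);
        // process rsp.getResults() here
        String next = rsp.getNextCursorMark();
        if (cursorMark.equals(next)) {
          break; // cursor did not advance: no more results
        }
        cursorMark = next;
      }
    }
  }
}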

On Tue, Jul 18, 2017 at 10:00 PM, suresh pendap  wrote:
> Hi,
>
> This question is more about the Implementation detail of the cursorMark
> feature.
>
> I was reading about using the cursorMark feature for deep pagination in
> Solr mentioned in this blog http://yonik.com/solr/paging-and-deep-paging/
>
> It is not clear to me as to how it is more efficient as compared to the
> regular pagination.
>
> The blog says that there is no state maintained on the server side.
>
> If there is no state maintained then where does it get its efficiency from?
>
> Assuming that it does maintain the state on the server side, does the next
> page request has to go the same aggregator node which had served the first
> page?
>
>
> Thanks
> Suresh


Re: Getting IO Exception while Indexing

2017-07-19 Thread Walter Underwood
A 400 would not be a failure to connect. A 400 means that the client is sending 
a bad request.

Look at the Solr logs. Most likely, the document is invalid.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jul 19, 2017, at 7:54 AM, Susheel Kumar  wrote:
> 
> Is that always the problem with those documents or is it random. If it is
> the same documents always, look what is different in those docs.
> 
> Usually i have seen these errors once in a while when SolrJ unable to
> connect/communicate with Solr.
> 
> On Wed, Jul 19, 2017 at 10:23 AM, subbarao 
> wrote:
> 
>> Hi all,
>> 
>> we have solr cloud setup with 2 shards.
>> 
>> In this we trying to index documents by taking a json format input, and
>> creating a SolrDocument. and pushing to solr through solrJ.
>> 
>> Then it is throwing exception  saying *"SolrUpdate got error: IOException
>> occured when talking to server" *
>> 
>> and in apache it is throwing 400 response code. But only few documents are
>> having this problem.Some documents are indexing fine.
>> 
>> Can anybody help us in this issue?
>> 
>> 
>> Thanks,
>> Subbarao.
>> 
>> 
>> 
>> --
>> View this message in context: http://lucene.472066.n3.
>> nabble.com/Getting-IO-Exception-while-Indexing-tp4346801.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 



Re: regarding cursorMark feature for deep pagination

2017-07-19 Thread suresh pendap
Erick,
Thanks for the link!

-suresh

On Wed, Jul 19, 2017 at 8:11 AM, Erick Erickson 
wrote:

> Chris Hostetter has a writeup here that has a good explanation:
>
> https://lucidworks.com/2013/12/12/coming-soon-to-solr-
> efficient-cursor-based-iteration-of-large-result-sets/
>
> Best,
> Erick
>
> On Tue, Jul 18, 2017 at 10:00 PM, suresh pendap 
> wrote:
> > Hi,
> >
> > This question is more about the Implementation detail of the cursorMark
> > feature.
> >
> > I was reading about using the cursorMark feature for deep pagination in
> > Solr mentioned in this blog http://yonik.com/solr/paging-
> and-deep-paging/
> >
> > It is not clear to me as to how it is more efficient as compared to the
> > regular pagination.
> >
> > The blog says that there is no state maintained on the server side.
> >
> > If there is no state maintained then where does it get its efficiency
> from?
> >
> > Assuming that it does maintain the state on the server side, does the
> next
> > page request has to go the same aggregator node which had served the
> first
> > page?
> >
> >
> > Thanks
> > Suresh
>


Re: Solr Issue While indexing Data

2017-07-19 Thread Shawn Heisey
On 6/7/2017 5:10 AM, rajat.rast...@hindustantimes.com wrote:
> My enviorment 
>
> os :Ubuntu 14.04.1 LTS
> java : Orcale hotspot 1.8.0_121
> solr version :6.4.2
> cpu :16 cores
> ram :124 gb

Everybody seems to want different information from you.  Here's my
contribution:

On the linux commandline, run the "top" utility (not htop, or anything
else, actually type "top").  Press shift-M to sort the list by memory
usage, then grab a screenshot or a photo of that display.  Share the
image with us in some way.  Typically a file-sharing website is the best
option.

That will provide a wealth of information that can be useful for
narrowing down performance issues.

Thanks,
Shawn



Re: Copy field a source of copy field

2017-07-19 Thread tstusr
Well, our documents consist of PDF files (between 20 and 200 pages).

So we take the words from the whole file; for that, we use the extract
handler, which is why we have these fields:

(the field definitions were stripped by the list archive)

We catch species in all the PDF content (in the attr_content field).

The captured species are used for ranking purposes, so we have to have the
whole name; that's why we use shingles. As an example, we catch from the PDF:

abelmoschus achanioides
abies colimensis
abies concolor

Because that information is important, we provide a facet of those species,
grouped by genus (just the first word of the species). So in the facet we have
to have:

abelmoschus (1)
abies (2)

Nevertheless, we need a sort of subquery, because first we need the complete
species and then, from those results, a facet by genus. For example:

the abies something else (this phrase should not be captured)
the abies concolor something else (this phrase should be captured) ->
it ends up as just "abies concolor" and is consequently counted under its genus

I realized that all the genera are contained in the species names.

So, is there a way to make a facet with just the first word of a field, given
values like:

abelmoschus achanioides
abies colimensis
abies concolor

i.e., to use just the first word of those?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Copy-field-a-source-of-copy-field-tp4346425p4346846.html
Sent from the Solr - User mailing list archive at Nabble.com.


'ant test' gets stuck after aborting one run

2017-07-19 Thread Nawab Zada Asad Iqbal
Hi


I stopped the 'ant test' target before it finished, and now whenever I run it
again, it gets stuck at 'install-junit4-taskdef'.

I have tried 'ant clean' but it didn't help. I guessed that it could be some
locking issue in Ivy or Ant, so I set ivy.sync to false in common-build.xml:

<property name="ivy.sync" value="false" />

I also deleted the .cache folder.

But that didn't help either.

What should I do?

When run with '-v', the execution halts at the following logs:

...
install-junit4-taskdef:
Overriding previous definition of property "ivy.version"
[ivy:cachepath] using inline mode to resolve
com.carrotsearch.randomizedtesting junit4-ant 2.5.0 (*(public))
[ivy:cachepath] no resolved descriptor found: launching default resolve
Overriding previous definition of property "ivy.version"
[ivy:cachepath] default: Checking cache for: dependency:
com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 {}
[ivy:cachepath] don't use cache for
com.carrotsearch.randomizedtesting#junit4-ant;2.5.0: checkModified=true
[ivy:cachepath] tried
/Users/niqbal/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/2.5.0/ivys/ivy.xml
[ivy:cachepath] tried
/Users/niqbal/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/2.5.0/jars/junit4-ant.jar
[ivy:cachepath] local: no ivy file nor artifact found for
com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
[ivy:cachepath] main: Checking cache for: dependency:
com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 {}
[ivy:cachepath] main: module revision found in cache:
com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
[ivy:cachepath] :: resolving dependencies ::
com.carrotsearch.randomizedtesting#junit4-ant-caller;working
[ivy:cachepath] confs: [default, master, compile, provided, runtime,
system, sources, javadoc, optional]
[ivy:cachepath] validate = true
[ivy:cachepath] refresh = false
[ivy:cachepath] resolving dependencies for configuration 'default'
[ivy:cachepath] == resolving dependencies for
com.carrotsearch.randomizedtesting#junit4-ant-caller;working [default]
[ivy:cachepath] == resolving dependencies
com.carrotsearch.randomizedtesting#junit4-ant-caller;working->com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
[default->default]
[ivy:cachepath] default: Checking cache for: dependency:
com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 {default=[default],
master=[master], compile=[compile], provided=[provided], runtime=[runtime],
system=[system], sources=[sources], javadoc=[javadoc], optional=[optional]}
[ivy:cachepath] don't use cache for
com.carrotsearch.randomizedtesting#junit4-ant;2.5.0: checkModified=true
[ivy:cachepath] tried
/Users/niqbal/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/2.5.0/ivys/ivy.xml
[ivy:cachepath] tried
/Users/niqbal/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/2.5.0/jars/junit4-ant.jar
[ivy:cachepath] local: no ivy file nor artifact found for
com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
[ivy:cachepath] main: Checking cache for: dependency:
com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 {default=[default],
master=[master], compile=[compile], provided=[provided], runtime=[runtime],
system=[system], sources=[sources], javadoc=[javadoc], optional=[optional]}
[ivy:cachepath] main: module revision found in cache:
com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
[ivy:cachepath] found
com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 in public
[ivy:cachepath] == resolving dependencies
com.carrotsearch.randomizedtesting#junit4-ant-caller;working->com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
[default->runtime]
[ivy:cachepath] == resolving dependencies
com.carrotsearch.randomizedtesting#junit4-ant-caller;working->com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
[default->compile]
[ivy:cachepath] == resolving dependencies
com.carrotsearch.randomizedtesting#junit4-ant-caller;working->com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
[default->master]
[ivy:cachepath] resolving dependencies for configuration 'master'
[ivy:cachepath] == resolving dependencies for
com.carrotsearch.randomizedtesting#junit4-ant-caller;working [master]
[ivy:cachepath] == resolving dependencies
com.carrotsearch.randomizedtesting#junit4-ant-caller;working->com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
[master->master]
[ivy:cachepath] resolving dependencies for configuration 'compile'
[ivy:cachepath] == resolving dependencies for
com.carrotsearch.randomizedtesting#junit4-ant-caller;working [compile]
[ivy:cachepath] == resolving dependencies
com.carrotsearch.randomizedtesting#junit4-ant-caller;working->com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
[compile->compile]
[ivy:cachepath] resolving dependencies for configuration 'provided'
[ivy:cachepath] == resolving dependencies for
com.carrotsearch.randomizedtesting#junit4-ant-caller;working [provided]
[ivy:cachepath] == resolving dependencies
com.carrotsearch.randomizedtesting#junit4-ant-caller;working->com.carrotsearch.randomizedtesting#junit4-ant

Re: 'ant test' gets stuck after aborting one run

2017-07-19 Thread Erick Erickson
This is often an issue with ivy, one of my least favorite "features"
of Ivy. To cure it I delete all the *.lck files in my ivy cache. On my
mac:

cd ~/.ivy2
find . -name "*.lck" | xargs rm

Best,
Erick


On Wed, Jul 19, 2017 at 11:21 AM, Nawab Zada Asad Iqbal
 wrote:
> Hi
>
>
> I stopped 'ant test' target before it finished, and now whenever I run it
> again, it is stuck at 'install-junit4-taskdef'.
>
> I have tried 'ant clean' but it didn't help. I guessed that it could be
> some locking thing in ivy or ant so I set ivy.sync to false in the
> common-build.xml
>
>  ""
>
> I also deleted the .cache folder.
>
> But that didn't help either.
>
> What should I do?
>
> When run with '-v', the execution halts at following logs:-
>
> ...
> install-junit4-taskdef:
> Overriding previous definition of property "ivy.version"
> [ivy:cachepath] using inline mode to resolve
> com.carrotsearch.randomizedtesting junit4-ant 2.5.0 (*(public))
> [ivy:cachepath] no resolved descriptor found: launching default resolve
> Overriding previous definition of property "ivy.version"
> [ivy:cachepath] default: Checking cache for: dependency:
> com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 {}
> [ivy:cachepath] don't use cache for
> com.carrotsearch.randomizedtesting#junit4-ant;2.5.0: checkModified=true
> [ivy:cachepath] tried
> /Users/niqbal/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/2.5.0/ivys/ivy.xml
> [ivy:cachepath] tried
> /Users/niqbal/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/2.5.0/jars/junit4-ant.jar
> [ivy:cachepath] local: no ivy file nor artifact found for
> com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
> [ivy:cachepath] main: Checking cache for: dependency:
> com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 {}
> [ivy:cachepath] main: module revision found in cache:
> com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
> [ivy:cachepath] :: resolving dependencies ::
> com.carrotsearch.randomizedtesting#junit4-ant-caller;working
> [ivy:cachepath] confs: [default, master, compile, provided, runtime,
> system, sources, javadoc, optional]
> [ivy:cachepath] validate = true
> [ivy:cachepath] refresh = false
> [ivy:cachepath] resolving dependencies for configuration 'default'
> [ivy:cachepath] == resolving dependencies for
> com.carrotsearch.randomizedtesting#junit4-ant-caller;working [default]
> [ivy:cachepath] == resolving dependencies
> com.carrotsearch.randomizedtesting#junit4-ant-caller;working->com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
> [default->default]
> [ivy:cachepath] default: Checking cache for: dependency:
> com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 {default=[default],
> master=[master], compile=[compile], provided=[provided], runtime=[runtime],
> system=[system], sources=[sources], javadoc=[javadoc], optional=[optional]}
> [ivy:cachepath] don't use cache for
> com.carrotsearch.randomizedtesting#junit4-ant;2.5.0: checkModified=true
> [ivy:cachepath] tried
> /Users/niqbal/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/2.5.0/ivys/ivy.xml
> [ivy:cachepath] tried
> /Users/niqbal/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/2.5.0/jars/junit4-ant.jar
> [ivy:cachepath] local: no ivy file nor artifact found for
> com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
> [ivy:cachepath] main: Checking cache for: dependency:
> com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 {default=[default],
> master=[master], compile=[compile], provided=[provided], runtime=[runtime],
> system=[system], sources=[sources], javadoc=[javadoc], optional=[optional]}
> [ivy:cachepath] main: module revision found in cache:
> com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
> [ivy:cachepath] found
> com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 in public
> [ivy:cachepath] == resolving dependencies
> com.carrotsearch.randomizedtesting#junit4-ant-caller;working->com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
> [default->runtime]
> [ivy:cachepath] == resolving dependencies
> com.carrotsearch.randomizedtesting#junit4-ant-caller;working->com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
> [default->compile]
> [ivy:cachepath] == resolving dependencies
> com.carrotsearch.randomizedtesting#junit4-ant-caller;working->com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
> [default->master]
> [ivy:cachepath] resolving dependencies for configuration 'master'
> [ivy:cachepath] == resolving dependencies for
> com.carrotsearch.randomizedtesting#junit4-ant-caller;working [master]
> [ivy:cachepath] == resolving dependencies
> com.carrotsearch.randomizedtesting#junit4-ant-caller;working->com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
> [master->master]
> [ivy:cachepath] resolving dependencies for configuration 'compile'
> [ivy:cachepath] == resolving dependencies for
> com.carrotsearch.randomizedtesting#junit4-ant-caller;working [compile]
> [ivy:cachepath] == resolving dependencies
> com.carrotsearch.random

Re: 'ant test' gets stuck after aborting one run

2017-07-19 Thread Nawab Zada Asad Iqbal
Thanks, Erick, for the fix.

Meanwhile, I had restarted the terminal, then the machine, and cloned the repo
again, before realizing that the problematic state lives somewhere else on the
drive, though I don't know where.


Nawab

On Wed, Jul 19, 2017 at 12:57 PM, Erick Erickson 
wrote:

> This is often an issue with ivy, one of my least favorite "features"
> of Ivy. To cure it I delete all the *.lck files in my ivy cache. On my
> mac:
>
> cd ~/.ivy2
> find . -name "*.lck" | xargs rm
>
> Best,
> Erick
>
>
> On Wed, Jul 19, 2017 at 11:21 AM, Nawab Zada Asad Iqbal
>  wrote:
> > Hi
> >
> >
> > I stopped 'ant test' target before it finished, and now whenever I run it
> > again, it is stuck at 'install-junit4-taskdef'.
> >
> > I have tried 'ant clean' but it didn't help. I guessed that it could be
> > some locking thing in ivy or ant so I set ivy.sync to false in the
> > common-build.xml
> >
> >  ""
> >
> > I also deleted the .cache folder.
> >
> > But that didn't help either.
> >
> > What should I do?
> >
> > When run with '-v', the execution halts at following logs:-
> >
> > ...
> > install-junit4-taskdef:
> > Overriding previous definition of property "ivy.version"
> > [ivy:cachepath] using inline mode to resolve com.carrotsearch.randomizedtesting junit4-ant 2.5.0 (*(public))
> > [ivy:cachepath] no resolved descriptor found: launching default resolve
> > Overriding previous definition of property "ivy.version"
> > [ivy:cachepath] default: Checking cache for: dependency: com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 {}
> > [ivy:cachepath] don't use cache for com.carrotsearch.randomizedtesting#junit4-ant;2.5.0: checkModified=true
> > [ivy:cachepath] tried /Users/niqbal/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/2.5.0/ivys/ivy.xml
> > [ivy:cachepath] tried /Users/niqbal/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/2.5.0/jars/junit4-ant.jar
> > [ivy:cachepath] local: no ivy file nor artifact found for com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
> > [ivy:cachepath] main: Checking cache for: dependency: com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 {}
> > [ivy:cachepath] main: module revision found in cache: com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
> > [ivy:cachepath] :: resolving dependencies :: com.carrotsearch.randomizedtesting#junit4-ant-caller;working
> > [ivy:cachepath] confs: [default, master, compile, provided, runtime, system, sources, javadoc, optional]
> > [ivy:cachepath] validate = true
> > [ivy:cachepath] refresh = false
> > [ivy:cachepath] resolving dependencies for configuration 'default'
> > [ivy:cachepath] == resolving dependencies for com.carrotsearch.randomizedtesting#junit4-ant-caller;working [default]
> > [ivy:cachepath] == resolving dependencies com.carrotsearch.randomizedtesting#junit4-ant-caller;working->com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 [default->default]
> > [ivy:cachepath] default: Checking cache for: dependency: com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 {default=[default], master=[master], compile=[compile], provided=[provided], runtime=[runtime], system=[system], sources=[sources], javadoc=[javadoc], optional=[optional]}
> > [ivy:cachepath] don't use cache for com.carrotsearch.randomizedtesting#junit4-ant;2.5.0: checkModified=true
> > [ivy:cachepath] tried /Users/niqbal/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/2.5.0/ivys/ivy.xml
> > [ivy:cachepath] tried /Users/niqbal/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/2.5.0/jars/junit4-ant.jar
> > [ivy:cachepath] local: no ivy file nor artifact found for com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
> > [ivy:cachepath] main: Checking cache for: dependency: com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 {default=[default], master=[master], compile=[compile], provided=[provided], runtime=[runtime], system=[system], sources=[sources], javadoc=[javadoc], optional=[optional]}
> > [ivy:cachepath] main: module revision found in cache: com.carrotsearch.randomizedtesting#junit4-ant;2.5.0
> > [ivy:cachepath] found com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 in public
> > [ivy:cachepath] == resolving dependencies com.carrotsearch.randomizedtesting#junit4-ant-caller;working->com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 [default->runtime]
> > [ivy:cachepath] == resolving dependencies com.carrotsearch.randomizedtesting#junit4-ant-caller;working->com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 [default->compile]
> > [ivy:cachepath] == resolving dependencies com.carrotsearch.randomizedtesting#junit4-ant-caller;working->com.carrotsearch.randomizedtesting#junit4-ant;2.5.0 [default->master]
> > [ivy:cachepath] resolving dependencies for configuration 'master'
> > [ivy:cachepath] == resolving dependencies for com.carro

Issues trying to boost phrase containing stop word

2017-07-19 Thread Shamik Bandopadhyay
Hi,

  I'm trying to show titles with an exact query phrase match at the top of
the result. That includes supporting stop words as part of the phrase. For
e.g. if I'm using "about dynamic block", I expect the title with "About
Dynamic Blocks" to appear at the top. Since the title field uses a
stopword filter factory as part of its analysis chain, I decided to create
a copyfield of title and use that in search with a higher boost. That
didn't seem to work either. Although it brought back the expected document
at the top, it excluded documents with the title "Dynamic Block Grip
Reference", to be precise content which doesn't have "about" in the title or
subject. Even setting the default operator to OR didn't make any
difference. Here's the entry from the config.

[the schema field and fieldType XML was stripped by the mailing list archive]
Request handler:

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="title">Solritas</str>
    <str name="q.op">AND</str>
    <str name="defType">edismax</str>
    <str name="qf">title^5 titleExact^15 subject^3 description^2</str>
    <str name="mm">100%</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
  </lst>
</requestHandler>

Sample data:


SOLR1000
About Dynamic Blocks
Dynamic blocks contain rules, or parameters, for how
to change the appearance of the block reference when it is inserted in the
drawing. With dynamic blocks you can insert one block that can change
shape, size, or configuration instead of inserting one of many static block
definitions. For example, instead of creating multiple interior door blocks
of different sizes, you can create one resizable door block. You author
dynamic blocks with either constraint parameters or action parameters.
Note: Using both constraint parameters and action parameters in the same
block definition is not recommended. Add Constraints In a block definition,
constraint parameters Associate objects with one another Restrict geometry
or dimensions The following example shows a block reference with a
constraint (in gray) and a constraint parameter (blue with grip). Once the
block is inserted into the drawing, the constraint parameters can be edited
as properties by using the Properties palette. Add Actions and Parameters
In a block definition, actions and parameters provide rules for the
behavior of a block once it is inserted into the drawing. Depending on the
specified block geometry or parameter, you can associate an action to that
parameter. The parameter is represented as a grip in the drawing. When you
edit the grip, the associated action determines what will change in the
block reference. Like constraint parameters, action parameters can be
changed using the Properties palette.
Dynamic blocks contain rules, or parameters, for
how to change the appearance of the block reference when it is inserted in
the drawing.


SOLR1001
About Creating Dynamic Blocks
This table gives an overview of the steps required
add behaviors that make blocks dynamic. Plan the block content. Know how
the block should change or move, and what parts will depend on the others.
Example: The block will be resizable, and after it is resized, additional
geometry is displayed. Draw the geometry. Draw the block geometry in the
drawing area or the Block Editor. Note: If you will use visibility states
to change how geometry is displayed, you may not want to include all the
geometry at this point. Add parameters. Add either individual parameters or
parameter sets to define geometry that will be affected by an action or
manipulation. Keep in mind the objects that will be dependent on one
another. Add actions. If you are working with action parameters, if
necessary, add actions to define what will happen to the geometry when it
is manipulated. Define custom properties. Add properties that determine how
the block is displayed in the drawing area. Custom properties affect grips,
labels, and preset values for block geometry. Test the block. On the
ribbon, in the Block Editor contextual tab, Open/Save panel, click Test
Block to test the block before you save it.
This table gives an overview of the steps
required add behaviors that make blocks dynamic.


SOLR1002
About Modifying Dynamic Block Definitions
Use the Block Editor to edit, correct, and save a
block definition. Correct Errors in Action Parameters A yellow alert icon (
) is displayed when A parameter is not associated with an action An action
is not associated with a parameter or selection set To correct these
errors, hover over the yellow alert icon until the tooltip displays a
description of the problem. Then double-click the constraint and follow the
prompts. Save Dynamic Blocks When you save a block definition, the current
values of the geometry and parameters in the block become the default
values for the block reference. The default visibility state for the block
reference is the visibility state at the top of the list in the Manage
Visibility States dialog box. Note: If you click File menu Save while you
are in the Block Editor, you will save the drawing but not the block
definition. You must specifically save the block definition while

Re: Copy field a source of copy field

2017-07-19 Thread Erick Erickson
OK, you'll need two fields pretty much for certain. The trick is
getting _only_ genus names in the genus field.

The simplest thing to do would be a straight copyField with a single
keep word filter that contains a list of all the genera. That
presupposes that the genera are disjoint sets from all other words.
You search on your species field and facet on the genus field.
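
For concreteness, a hedged sketch of that copyField + keep-word setup (the
field, type, and file names are illustrative, not from your schema):

<fieldType name="genus_only" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- genera.txt lists one genus per line; every other token is dropped -->
    <filter class="solr.KeepWordFilterFactory" words="genera.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<field name="genus" type="genus_only" indexed="true" stored="false"/>
<copyField source="attr_content" dest="genus"/>

With that in place, faceting on genus only ever sees tokens from the keep list.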

But assuming your genera are not disjoint from all other words, hmmm.
Do you have a way of unambiguously identifying genus/species pairs in
the text you're processing? If you do we can work with that, but
without that you're talking entity recognition of some sort.

BTW, there's no real need to shingle the species field, just search
for "genus species" as a phrase. Unless those two appear next to each
other in order you won't get a hit.

Best,
Erick

On Wed, Jul 19, 2017 at 11:07 AM, tstusr  wrote:
> Well, our documents consist of pdf files (between 20 to 200 pages).
>
> So, we capture words from the whole file; for that, we use the extract
> handler, which is why we have these fields:
>
> [the dynamicField definitions were stripped by the mailing list archive]
>
> We capture species anywhere in the pdf content (in the attr_content field)
>
> The captured species are used for ranking purposes, so we need the whole
> name; that's why we use shingles. As an example, we capture from the
> pdf:
>
> abelmoschus achanioides
> abies colimensis
> abies concolor
>
> Because that information is important, we provide a facet of those species,
> grouped by genus (just the first word of the species). So, in the facet we
> need to have:
>
> abelmoschus (1)
> abies (2)
>
> Nevertheless, we need a sort of subquery, because first we need the
> complete species, and then we facet those results by genus. For example:
>
> the abies something else (this phrase should not be captured)
> the abies concolor something else (this phrase should be captured) ->
> we end up with just "abies concolor", which is then counted under its genus
>
> I realized that all genus are contained on species.
>
> So, is there a way to make a facet with just the first word of a field?
> For example, given the field values:
>
> abelmoschus achanioides
> abies colimensis
> abies concolor
>
> Just use the first word of each?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Copy-field-a-source-of-copy-field-tp4346425p4346846.html
> Sent from the Solr - User mailing list archive at Nabble.com.


DateRangeField and Timezone

2017-07-19 Thread Ulul
Hi everyone

I'm trying to query on dates with time zone taken into account. I have
the following document

{"date" : "2016-12-31T04:15:00Z", "desc" : "winter time day before" }
date being of type DateRangeField

I would like to be able to perform a query based on the local date. For
instance, the above date corresponds to 2016-12-30 in New York (UTC-5 in
winter), so I would expect the following query NOT to retrieve the document:

http://127.0.1.1:7574/solr/date_test/select?TZ=America/New_York&indent=on&q=date:2016-12-31&wt=json

Unfortunately it does... and it's the same using a filter query.

https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
describes how to use TZ in facets; why doesn't it work with simple queries?

I'm using Solr 6.5.1
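
If TZ really only affects date math rounding (NOW/DAY, facet gaps and the
like), a workaround would be to spell out the local day's UTC window
explicitly; a sketch (offsets assume America/New_York in winter, UTC-5, and
ignore DST edge cases):

http://127.0.1.1:7574/solr/date_test/select?indent=on&wt=json&q=date:[2016-12-31T05:00:00Z TO 2017-01-01T05:00:00Z}

The exclusive } upper bound keeps midnight of the next local day out of the
window; the brackets and braces need URL encoding (%5B, %7D) when sent via curl.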

I had to add DateRangeField type myself to the collection schema. I did
it with:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type" : {
 "name":"DateRangeField",
 "class":"solr.DateRangeField"
  }
}' http://localhost:7574/solr/date_test/schema

Thank you for your help

Ulul



Re: Highlighting words with special characters

2017-07-19 Thread Ahmet Arslan
Hi,
Maybe the name of the UAX29URLEmailTokenizer is deceiving you? It does *not*
tokenize URLs and emails. Actually it recognises them and emits each as a
single token.
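
Roughly, for a made-up address (the Analysis screen in the admin UI will show
the exact output for your chain):

Input: user.name@example.com
UAX29URLEmailTokenizer -> [user.name@example.com]    (one email token)
StandardTokenizer      -> [user.name] [example.com]  (breaks at the '@')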
Ahmet

On Wednesday, July 19, 2017, 12:00:05 PM GMT+3, Lasitha Wattaladeniya 
 wrote:

Update,

I changed the UAX29URLEmailTokenizerFactory to StandardTokenizerFactory and
now it shows highlighted text fragments in the indexed email text.

But I don't understand this behavior. Can someone shed some light please

On 18 Jul 2017 14:18, "Lasitha Wattaladeniya"  wrote:

> Further more, ngram field has following tokenizer/filter chain in index
> and query
>
> UAX29URLEmailTokenizerFactory (only in index)
> stopFilterFactory
> LowerCaseFilterFactory
> ASCIIFoldingFilterFactory
> EnglishPossessiveFilterFactory
> StemmerOverrideFilterFactory (only in query)
> NgramTokenizerFactory (only in index)
>
> Regards,
> Lasitha
>
> On 18 Jul 2017 14:11, "Lasitha Wattaladeniya"  wrote:
>
>> Hi devs,
>>
>> I have setup solr highlighting with default setup (only changed the
>> fragsize to 0 to match any field length). It worked fine but recently I
>> discovered it doesn't highlight for words with special characters in the
>> middle.
>>
>> For an example, let's say I have indexed email address test.f...@ran.com
>> to a ngram field. And when I search for the partial text fsdg, I get the
>> results but it's not highlighted. It works in all other scenarios as
>> expected.
>>
>> The ngram field has termVectors, termPositions, termOffsets set to true.
>>
>> Can somebody please suggest me, what may be wrong here?
>>
>> (sorry for the unstructured text. Typed using a mobile phone )
>>
>> Regards
>> Lasitha
>>
>


Re: Issues trying to boost phrase containing stop word

2017-07-19 Thread Koji Sekiguchi

Hi Shamik,

How about using ShingleFilter which constructs token n-grams from a token 
stream?

http://lucene.apache.org/core/6_6_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html

As for "about dynamic block", ShingleFilter produces "about dynamic" and "dynamic 
block".

Thanks,

Koji

On 2017/07/20 5:54, Shamik Bandopadhyay wrote:

Hi,

   I'm trying to show titles with an exact query phrase match at the top of
the result. That includes supporting stop words as part of the phrase. For
e.g. if I'm using "about dynamic block", I expect the title with "About
Dynamic Blocks" to appear at the top. [...]

Re: Get results in multiple orders (multiple boosts)

2017-07-19 Thread Susheel Kumar
Let me try to put together an example for a custom sort.
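
In the meantime, a query-time approach that needs no custom code is to build
the sort from function queries per user. A hedged sketch, assuming
single-valued numeric category and source fields (the field names and weights
are illustrative):

sort=sum(map(category,5,5,3,0),map(category,9,9,2,0)) desc, map(source,7,7,1,0) desc, date desc

Here map(field,min,max,target,default) returns target when the value falls in
[min,max], so each user's preference list can be translated client-side into a
per-request weighting, and nothing has to be reindexed when preferences change.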

On Wed, Jul 19, 2017 at 6:34 AM, Rick Leir  wrote:

> Luca,
> You can pass a sort parameter in the query. User A could sort=date%20desc
> and user b could sort=foofield%20asc.
>
> Maybe query functions can also help with this. Cheers -- Rick
>
> On July 19, 2017 4:39:59 AM EDT, Luca Dall'Osto
>  wrote:
> >Hello,The problem of build an index is that each user has a custom
> >source order and category order: are not static orders (for example
> >user X could have category:5 as most important category but user Y
> >could have category:9 as most important).
> >Has anyone ever written a custom sort function in solr?Maybe a link of
> >a tutorial or an example could be very helpful. Thanks
> >
> >Luca
> >
> >On Tuesday, July 18, 2017 4:18 PM, alessandro.benedetti wrote: [...]
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


Re: Returning unique values for suggestion

2017-07-19 Thread Zheng Lin Edwin Yeo
I am getting something similar to yours too, but I'm using Solr 6.5.1.


  "highlighting":{
"1":{
  "content":["Incoming Call"]},
"2":{
  "content":["Incoming Call"]},
"3":{
  "content":["Outgoing Call"]},
"4":{
  "content":["Outgoing Call"]},

Regards,
Edwin
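
Until the suggester can de-duplicate on its own, one workaround is to collapse
duplicates on the client. A minimal Java sketch, assuming the ranked suggestion
terms have already been read into a list (class and method names are
illustrative):

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class SuggestionDedup {
    /** Collapse duplicate terms while preserving rank order (first = highest weight). */
    public static List<String> dedupe(List<String> ranked) {
        // LinkedHashSet keeps insertion order, so the top-weighted duplicate survives
        return new ArrayList<>(new LinkedHashSet<>(ranked));
    }
}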


On 19 July 2017 at 22:21, Walter Underwood  wrote:

> I was surprised to see duplicate suggestions coming from my 4.10.4
> suggester. This is analyzing infix with terms loaded from the index.
>
> "titles_infix": {
> "chemistry": {
> "numFound": 10,
> "suggestions": [
> {
> "term": "Chemistry",
> "weight": 5285,
> "payload": ""
> },
> {
> "term": "Chemistry",
> "weight": 4548,
> "payload": ""
> },
> {
> "term": "Chemistry",
> "weight": 3002,
> "payload": ""
> },
> {
> "term": "Introductory Chemistry",
> "weight": 2823,
> "payload": ""
> },
>
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Jul 19, 2017, at 3:33 AM, Zheng Lin Edwin Yeo 
> wrote:
> >
> > Hi,
> >
> > Is there any configuration that we can set for the /suggest handler, so
> > that the suggestion output will only return unique records, and not
> > duplicated?
> >
> > Below is my /suggest handler.
> >
> > [the handler XML tags were stripped by the mailing list archive; only the
> > parameter values survived: all, json, true, content, 100, id, score, on,
> > content, true, false, html, 100, 204800, true]
> >
> > Regards,
> > Edwin
>
>


Re: Highlighting words with special characters

2017-07-19 Thread Lasitha Wattaladeniya
Hi Ahmet,

But I have NgramTokenizerFactory at the end of the indexing analyzer chain,
so the email address should still be tokenized. How does this affect the
highlighting? That's what I'm confused about.

Solr version : 4.10.4

Regards,
Lasitha

On 20 Jul 2017 08:28, "Ahmet Arslan"  wrote:

Hi,
Maybe name of the UAX29URLEMailTokenizer is deceiving you?It does *not*
tokenize URLs and Emails. Actually it recognises them and emits them as a
single token.
Ahmet

On Wednesday, July 19, 2017, 12:00:05 PM GMT+3, Lasitha Wattaladeniya <
watt...@gmail.com> wrote:

Update,

I changed the UAX29URLEmailTokenizerFactory to StandardTokenizerFactory and
now it shows highlighted text fragments in the indexed email text.

But I don't understand this behavior. Can someone shed some light please

On 18 Jul 2017 14:18, "Lasitha Wattaladeniya"  wrote:

> Further more, ngram field has following tokenizer/filter chain in index
> and query
>
> UAX29URLEmailTokenizerFactory (only in index)
> stopFilterFactory
> LowerCaseFilterFactory
> ASCIIFoldingFilterFactory
> EnglishPossessiveFilterFactory
> StemmerOverrideFilterFactory (only in query)
> NgramTokenizerFactory (only in index)
>
> Regards,
> Lasitha
>
> On 18 Jul 2017 14:11, "Lasitha Wattaladeniya"  wrote:
>
>> Hi devs,
>>
>> I have setup solr highlighting with default setup (only changed the
>> fragsize to 0 to match any field length). It worked fine but recently I
>> discovered it doesn't highlight for words with special characters in the
>> middle.
>>
>> For an example, let's say I have indexed email address test.f...@ran.com
>> to a ngram field. And when I search for the partial text fsdg, I get the
>> results but it's not highlighted. It works in all other scenarios as
>> expected.
>>
>> The ngram field has termVectors, termPositions, termOffsets set to true.
>>
>> Can somebody please suggest me, what may be wrong here?
>>
>> (sorry for the unstructured text. Typed using a mobile phone )
>>
>> Regards
>> Lasitha
>>
>


Re: Issues trying to boost phrase containing stop word

2017-07-19 Thread shamik
Thanks Koji, I've tried KeywordRepeatFilterFactory, which keeps the original
term, but the stopword filter in the analysis chain will remove it
nonetheless. That's why I thought of creating a separate field devoid of
stopwords/stemmers. Let me know if I'm missing something here.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-trying-to-boost-phrase-containing-stop-word-tp4346860p4346909.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Issues trying to boost phrase containing stop word

2017-07-19 Thread Koji Sekiguchi

Hi Shamik,

I'm sorry but I don't understand why you use KeywordRepeatFilter.

I think it's normal to create separate fields to solve this kind of problems.
Why don't you have another separate field which has ShingleFilter as I 
mentioned in the previous reply?

Koji

On 2017/07/20 12:13, shamik wrote:

Thanks Koji, I've tried KeywordRepeatFilterFactory, which keeps the original
term, but the stopword filter in the analysis chain will remove it
nonetheless. That's why I thought of creating a separate field devoid of
stopwords/stemmers. Let me know if I'm missing something here.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-trying-to-boost-phrase-containing-stop-word-tp4346860p4346909.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Issues trying to boost phrase containing stop word

2017-07-19 Thread shamik
Hi Koji,

   I'm using a copy field to preserve the original term with stopwords. It's
mapped to titleExact:

[the copyField declaration was stripped by the mailing list archive]

textExact definition:

[the fieldType XML was stripped by the mailing list archive]

I'm using minimal analysis to keep the original query terms in titleExact,
which is exactly what it is doing. I'm not sure how adding a shingle filter is
going to benefit here.

adsktext does all the heavy lifting of removing the stopwords and applying
stemmers.
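
For reference, one plausible shape for such a minimal "exact" field type,
since the archive stripped the XML (purely illustrative, not the actual
schema):

<fieldType name="textExact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<copyField source="title" dest="titleExact"/>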



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-trying-to-boost-phrase-containing-stop-word-tp4346860p4346915.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Issue While indexing Data

2017-07-19 Thread rajat rastogi
Hi Shawn,

Top output is as follows:

[the screenshot was not preserved in the archive]
regards

Rajat
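
If the image gets lost again, a plain-text capture works as well; a generic
shell one-liner (not specific to this setup):

# one-shot list of the top memory consumers, sorted by resident %MEM
ps aux --sort=-%mem | head -n 20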



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Issue-While-indexing-Data-tp4339417p4346917.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Issue While indexing Data

2017-07-19 Thread rajat rastogi
Hi Shawn,

I shared the code base, config, and schema with you. Were they of any help, or
can you point out what I am doing wrong in them?

regards

Rajat

On 19-Jul-2017, at 21:41, Shawn Heisey-2 [via Lucene]
<ml+s472066n4346826...@n3.nabble.com> wrote:

On 6/7/2017 5:10 AM, [hidden email] wrote:
> My enviorment
>
> os :Ubuntu 14.04.1 LTS
> java : Orcale hotspot 1.8.0_121
> solr version :6.4.2
> cpu :16 cores
> ram :124 gb

Everybody seems to want different information from you.  Here's my
contribution:

On the linux commandline, run the "top" utility (not htop, or anything
else, actually type "top").  Press shift-M to sort the list by memory
usage, then grab a screenshot or a photo of that display.  Share the
image with us in some way.  Typically a file-sharing website is the best
option.

That will provide a wealth of information that can be useful for
narrowing down performance issues.

Thanks,
Shawn









--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Issue-While-indexing-Data-tp4339417p4346931.html
Sent from the Solr - User mailing list archive at Nabble.com.