Re: Match in MultiValueField

2014-06-18 Thread Ahmet Arslan
Hi,

So you have a query q=foo bar and you want to retrieve the value "yes foo"?

In other words, you want to learn which fields (or values) caused a match.

Maybe you can use dynamic fields (instead of a multivalued field) and use the
debugQuery=true response to extract that information:

<arr name="f1">
  <str>no foo1</str>
  <str>yes foo</str>
  <str>no foo1</str>
</arr>
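A rough sketch of that approach (the dynamic-field pattern and field type here
are illustrative, not from the original message): declare a dynamic field in
schema.xml,

<dynamicField name="f1_*" type="text_general" indexed="true" stored="true"/>

index each value into its own field (f1_1=no foo1, f1_2=yes foo, f1_3=no foo1),
and query with

/select?q=f1_1:(foo bar) OR f1_2:(foo bar) OR f1_3:(foo bar)&debugQuery=true

The "explain" section of the debug output then shows which of the f1_* fields
contributed to the match.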






On Monday, June 16, 2014 7:08 PM, StickHello  wrote:
If I am searching for a query, say "foo bar", and "foo" matches in a value of
a multivalued field, can I get that value of the multivalued field?
Suppose I have a multivalued field like
<arr name="f1">
  <str>no foo1</str>
  <str>yes foo</str>
  <str>no foo1</str>
</arr>

Can I find out that in f1 the match was at the 2nd value, which is "yes foo"? How
do we get this?
One thing that I know is that the highlighter will give me this, but it
increases the latency.

Any other suggested method?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Match-in-MultiValueField-tp4142085.html
Sent from the Solr - User mailing list archive at Nabble.com.



Basic Authentication for Admin GUI

2014-06-18 Thread Thomas Fischer
Hello,

I'm trying to set up basic authentication for the admin functions in the new
Solr GUI.
For this I have to give the appropriate url-pattern, e.g.
/
will match every URL on my Solr server.
But the GUI now runs all administrative tasks under /#/ and there is no
particular /admin/ branch anymore.
Does anybody know how to deal with that situation?
Can I move the administration to a new admin directory?

Best regards
Thomas Fischer
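For reference, a minimal sketch of the kind of web.xml constraint being
discussed (the url-pattern and role name are illustrative). Note that the /#/
part of the new admin URL is a browser-side fragment and is never sent to the
server, so no url-pattern can match on it; protecting everything under / is
the blunt workaround:

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr admin</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr</realm-name>
</login-config>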





VelocityResponseWriter in solr

2014-06-18 Thread Vivekanand Ittigi
Hi,

I want to use the VelocityResponseWriter in Solr.

I've indexed a website (for example http://www.biginfolabs.com/). If I type
a query
http://localhost:8983/solr/collection1/select?q=santhos&wt=xml&indent=true

I will get all the fields related to that document (content, host, title, url,
etc.), but if I put the query through Velocity
http://localhost:8983/solr/collection1/browse?q=santhosh I will see only 3
fields (id, url, content) instead of all the fields.

How can I display all the fields?

This is in solrconfig.xml:

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>

    <!-- VelocityResponseWriter settings -->
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="title">Solritas</str>

    <!-- Query settings -->
    <str name="defType">edismax</str>
    <str name="qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
      title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
    </str>
    <str name="df">text</str>
    <str name="mm">100%</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>

    <str name="mlt.qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
      title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
    </str>
    <str name="mlt.fl">text,features,name,sku,id,manu,cat,title,description,keywords,author,resourcename</str>
    <int name="mlt.count">3</int>

    <!-- Faceting defaults -->
    <str name="facet">on</str>
    <str name="facet.field">cat</str>
    <str name="facet.field">manu_exact</str>
    <str name="facet.field">content_type</str>
    <str name="facet.field">author_s</str>
    <str name="facet.query">ipod</str>
    <str name="facet.query">GB</str>
    <str name="facet.mincount">1</str>
    <str name="facet.pivot">cat,inStock</str>
    <str name="facet.range.other">after</str>
    <str name="facet.range">price</str>
    <int name="f.price.facet.range.start">0</int>
    <int name="f.price.facet.range.end">600</int>
    <int name="f.price.facet.range.gap">50</int>
    <str name="facet.range">popularity</str>
    <int name="f.popularity.facet.range.start">0</int>
    <int name="f.popularity.facet.range.end">10</int>
    <int name="f.popularity.facet.range.gap">3</int>
    <str name="facet.range">manufacturedate_dt</str>
    <str name="f.manufacturedate_dt.facet.range.start">NOW/YEAR-10YEARS</str>
    <str name="f.manufacturedate_dt.facet.range.end">NOW</str>
    <str name="f.manufacturedate_dt.facet.range.gap">+1YEAR</str>
    <str name="f.manufacturedate_dt.facet.range.other">before</str>
    <str name="f.manufacturedate_dt.facet.range.other">after</str>

    <!-- Highlighting defaults -->
    <str name="hl">on</str>
    <str name="hl.fl">content features title name</str>
    <str name="hl.encoder">html</str>
    <str name="hl.simple.pre">&lt;b&gt;</str>
    <str name="hl.simple.post">&lt;/b&gt;</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.name.hl.fragsize">0</str>
    <str name="f.name.hl.alternateField">name</str>
    <str name="f.content.hl.snippets">3</str>
    <str name="f.content.hl.fragsize">200</str>
    <str name="f.content.hl.alternateField">content</str>
    <str name="f.content.hl.maxAlternateFieldLength">750</str>

    <!-- Spell checking defaults -->
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>
  </lst>

  <!-- append spellchecking to our results -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>


Thanks,
Vivek


Solr and OpenNLP - Error loading class 'solr.OpenNLPTokenizerFactory'

2014-06-18 Thread Bhadra Mani
Hi All,

Using link
https://wiki.apache.org/solr/OpenNLP#Deployment_to_Solr

Followed the steps.
bin/trainall.sh - it kept running while I executed the next step, "go to
trunk-dir/solr and run 'ant test-contrib'"
(two test suites failed).

Later created the war file using 'ant dist'
(build successful).

Then ran Solr; it was working fine.

Added this to schema.xml:

<fieldType name="text_opennlp" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory"
               sentenceModel="opennlp/en-sent.bin"
               tokenizerModel="opennlp/en-token.bin"/>
  </analyzer>
</fieldType>

Now I am getting this error. Please help.

4961 [coreLoadExecutor-4-thread-1] WARN
org.apache.solr.core.SolrResourceLoader  – Solr loaded a deprecated
plugin/analysis class [solr.DoubleField]. Please consult documentation how
to replace it accordingly.
4970 [coreLoadExecutor-4-thread-1] WARN
org.apache.solr.core.SolrResourceLoader  – Solr loaded a deprecated
plugin/analysis class [solr.DateField]. Please consult documentation how to
replace it accordingly.
5829 [coreLoadExecutor-4-thread-1] WARN
org.apache.solr.core.SolrResourceLoader  – Solr loaded a deprecated
plugin/analysis class [solr.ThaiWordFilterFactory]. Please consult
documentation how to replace it accordingly.
5875 [coreLoadExecutor-4-thread-1] ERROR
org.apache.solr.core.CoreContainer  – Unable to create core: collection1
org.apache.solr.common.SolrException: Could not load core configuration for
core collection1
at
org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:66)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:554)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:261)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] fieldType "text_opennlp": Plugin init failure for [schema.xml]
analyzer/tokenizer: Error loading class 'solr.OpenNLPTokenizerFactory'.
Schema file is /home/bhadra/svn3/solr/example/solr/collection1/schema.xml
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:616)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:166)
at
org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
at
org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
at
org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:89)
at
org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:62)
... 9 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] fieldType "text_opennlp": Plugin init failure for [schema.xml]
analyzer/tokenizer: Error loading class 'solr.OpenNLPTokenizerFactory'
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:470)
... 14 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] analyzer/tokenizer: Error loading class
'solr.OpenNLPTokenizerFactory'
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at
org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:362)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
... 15 more
Caused by: org.apache.solr.common.SolrException: Error loading class
'solr.OpenNLPTokenizerFactory'
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:490)
at
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:593)
at
org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:342)
at
org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:335)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
... 19 more
Caused by: java.lang.ClassNotFoundException: solr.OpenNLPTokenizerFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lan

Fwd: Solr and OpenNLP - Error loading class 'solr.OpenNLPTokenizerFactory'

2014-06-18 Thread Bhadra Mani
I am facing this issue:
https://issues.apache.org/jira/browse/SOLR-3625
but adding it to solrconfig.xml does not work.
This is on 4.8.1.
Thanks,
Bhadra

-- Forwarded message --
From: Bhadra Mani 
Date: Wed, Jun 18, 2014 at 4:46 PM
Subject: Solr and OpenNLP - Error loading class
'solr.OpenNLPTokenizerFactory'
To: solr-user@lucene.apache.org


Hi All,

Using link
https://wiki.apache.org/solr/OpenNLP#Deployment_to_Solr

Followed the steps.
bin/trainall.sh - it kept running while I executed the next step, "go to
trunk-dir/solr and run 'ant test-contrib'"
(two test suites failed).

Later created the war file using 'ant dist'
(build successful).

Then ran Solr; it was working fine.

Added this to schema.xml:

<fieldType name="text_opennlp" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory"
               sentenceModel="opennlp/en-sent.bin"
               tokenizerModel="opennlp/en-token.bin"/>
  </analyzer>
</fieldType>

Now I am getting this error. Please help.

4961 [coreLoadExecutor-4-thread-1] WARN
org.apache.solr.core.SolrResourceLoader  – Solr loaded a deprecated
plugin/analysis class [solr.DoubleField]. Please consult documentation how
to replace it accordingly.
4970 [coreLoadExecutor-4-thread-1] WARN
org.apache.solr.core.SolrResourceLoader  – Solr loaded a deprecated
plugin/analysis class [solr.DateField]. Please consult documentation how to
replace it accordingly.
5829 [coreLoadExecutor-4-thread-1] WARN
org.apache.solr.core.SolrResourceLoader  – Solr loaded a deprecated
plugin/analysis class [solr.ThaiWordFilterFactory]. Please consult
documentation how to replace it accordingly.
5875 [coreLoadExecutor-4-thread-1] ERROR
org.apache.solr.core.CoreContainer  – Unable to create core: collection1
org.apache.solr.common.SolrException: Could not load core configuration for
core collection1
at
org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:66)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:554)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:261)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] fieldType "text_opennlp": Plugin init failure for [schema.xml]
analyzer/tokenizer: Error loading class 'solr.OpenNLPTokenizerFactory'.
Schema file is /home/bhadra/svn3/solr/example/solr/collection1/schema.xml
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:616)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:166)
at
org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
at
org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
at
org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:89)
at
org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:62)
... 9 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] fieldType "text_opennlp": Plugin init failure for [schema.xml]
analyzer/tokenizer: Error loading class 'solr.OpenNLPTokenizerFactory'
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:470)
... 14 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] analyzer/tokenizer: Error loading class
'solr.OpenNLPTokenizerFactory'
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at
org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:362)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
... 15 more
Caused by: org.apache.solr.common.SolrException: Error loading class
'solr.OpenNLPTokenizerFactory'
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:490)
at
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:593)
at
org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:342)
at
org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:335)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
... 19 more
Caused by: java.lang.ClassNotFoundException: solr.OpenNLPTokenizerFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController

Re: Converting XML response of Search query into HTML.

2014-06-18 Thread Venkata krishna
Thanks for the quick responses.

Ahmet, I have tried removing the ampersand; the XML response was still not
converted to an HTML response, it is just XML.

Erik, according to your suggestion I used the VelocityResponseWriter,
in this manner:
query.set("&wt", "velocity");
query.set("&v.template", "browse");
query.set("&v.layout", "layout");
but it is still throwing the same exception as before:
Exception in thread "main"
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Expected mime type application/xml but got text/html. 
at
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:516)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)


But when I do the search through the Solr admin interface, the response is
converted to HTML:
1. using the XSLTResponseWriter

http://localhost:8983/solr/collection1/select?q=coby&df=text&wt=xslt&indent=true&tr=example.xsl&hl=true&hl.fl=content&hl.fragsize=1000&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

2. using the VelocityResponseWriter

http://localhost:8983/solr/collection1/select?q=coby&wt=velocity&indent=true&v.template=browse&v.layout=layout&hl=true&hl.fl=content&hl.fragsize=1000&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

It seems like a coding issue in SolrJ (the HttpSolrServer class).

So could you please provide me suggestions?

Thanks,

Venkata krishna Tolusuri.
 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Converting-XML-response-of-Search-query-into-HTML-tp4141456p4142490.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Document security filtering in distributed solr (with multi shard)

2014-06-18 Thread Ali Nazemian
Any idea would be appropriate.



On Tue, Jun 17, 2014 at 5:44 PM, Ali Nazemian  wrote:

> Dear Alexandre,
> Yeah, I saw that, but what is the best way of doing that from the
> performance point of view?
> I can think of one solution myself:
> Suppose we have an RDBMS for users that contains the category and group for
> each user (it could be in hierarchical format). Suppose there is a field
> named "security" in the Solr index that contains the list of each group or
> category that applies to each document. So the query would filter
> only documents whose category or group matches the specific one for that
> user.
> Does this solution work in a distributed setup? What if we are concerned
> about performance?
> Also I was wondering how LucidWorks does that?
> Best regards.
>
>
> On Tue, Jun 17, 2014 at 4:08 PM, Alexandre Rafalovitch  > wrote:
>
>> Have you looked at Post Filters? I think this was one of the use cases.
>>
>> An old article:
>> http://java.dzone.com/articles/custom-security-filtering-solr . Google
>> search should bring a couple more.
>>
>> Regards,
>>Alex.
>> Personal website: http://www.outerthoughts.com/
>> Current project: http://www.solr-start.com/ - Accelerating your Solr
>> proficiency
>>
>>
>> On Tue, Jun 17, 2014 at 6:24 PM, Ali Nazemian 
>> wrote:
>> > Dears,
>> > Hi,
>> > I am going to apply custom security filtering for each document per
>> > user (using a custom profile for each user). I was thinking of adding
>> > user fields to the index and using a Solr join for filtering. But it
>> > seems that for distributed Solr this is not a solution. Could you please
>> > tell me what the solution would be in this case?
>> > Best regards.
>> >
>> > --
>> > A.Nazemian
>>
>
>
>
> --
> A.Nazemian
>



-- 
A.Nazemian
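For reference, the field-based approach described above boils down to attaching
a per-request filter query built from the current user's groups (field and
group names are illustrative):

fq=security:(groupA OR groupB)

Each shard applies the filter to its own documents, so this part works
distributed; note that every distinct group combination becomes its own
filterCache entry.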


Re: Document security filtering in distributed solr (with multi shard)

2014-06-18 Thread Alexandre Rafalovitch
The performance you should usually test yourself, especially since
you probably want some sort of caching.

But post-filters were specifically designed to be used for expensive
operations (and you can order them, too, to apply in sequence). They
should also work distributed, though each shard will need the
information separately, so probably n requests to the database for
updates.

I think ManifoldCF also has some security filters, and there might
be commercial implementations too.

No idea about LucidWorks; did you check their documentation? They are
usually pretty good with that.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Wed, Jun 18, 2014 at 8:10 PM, Ali Nazemian  wrote:
> Any idea would be appropriate.
>
>
>
> On Tue, Jun 17, 2014 at 5:44 PM, Ali Nazemian  wrote:
>
>> Dear Alexandre,
>> Yeah, I saw that, but what is the best way of doing that from the
>> performance point of view?
>> I can think of one solution myself:
>> Suppose we have an RDBMS for users that contains the category and group for
>> each user (it could be in hierarchical format). Suppose there is a field
>> named "security" in the Solr index that contains the list of each group or
>> category that applies to each document. So the query would filter
>> only documents whose category or group matches the specific one for that
>> user.
>> Does this solution work in a distributed setup? What if we are concerned
>> about performance?
>> Also I was wondering how LucidWorks does that?
>> Best regards.
>>
>>
>> On Tue, Jun 17, 2014 at 4:08 PM, Alexandre Rafalovitch > > wrote:
>>
>>> Have you looked at Post Filters? I think this was one of the use cases.
>>>
>>> An old article:
>>> http://java.dzone.com/articles/custom-security-filtering-solr . Google
>>> search should bring a couple more.
>>>
>>> Regards,
>>>Alex.
>>> Personal website: http://www.outerthoughts.com/
>>> Current project: http://www.solr-start.com/ - Accelerating your Solr
>>> proficiency
>>>
>>>
>>> On Tue, Jun 17, 2014 at 6:24 PM, Ali Nazemian 
>>> wrote:
>>> > Dears,
>>> > Hi,
>>> > I am going to apply custom security filtering for each document per
>>> > user (using a custom profile for each user). I was thinking of adding
>>> > user fields to the index and using a Solr join for filtering. But it
>>> > seems that for distributed Solr this is not a solution. Could you please
>>> > tell me what the solution would be in this case?
>>> > Best regards.
>>> >
>>> > --
>>> > A.Nazemian
>>>
>>
>>
>>
>> --
>> A.Nazemian
>>
>
>
>
> --
> A.Nazemian


Re: Converting XML response of Search query into HTML.

2014-06-18 Thread Erik Hatcher
The ‘&’ is only for separating parameters when building a URL, but omit the ‘&’ 
when using SolrJ.

You’ll probably need to do a little bit of SolrJ trickery to get the response 
back as text, such that SolrJ doesn’t try to interpret the response as XML or 
javabin.

Erik
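A minimal sketch of that trickery, assuming you bypass SolrJ's response parsing
entirely and fetch the rendered page over plain HTTP (the URL and parameters
are illustrative):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class BrowseHtmlClient {
    public static void main(String[] args) throws Exception {
        String q = URLEncoder.encode("coby", "UTF-8");
        // Note: no '&' inside the parameter names; '&' only separates them in the URL.
        URL url = new URL("http://localhost:8983/solr/collection1/select?q=" + q
                + "&wt=velocity&v.template=browse&v.layout=layout");
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        // Raw HTML, exactly what the admin-interface URL returns
        System.out.println(html);
    }
}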


On Jun 18, 2014, at 9:04 AM, Venkata krishna  wrote:

> Thanks for the quick responses.
> 
> Ahmet, I have tried removing the ampersand; the XML response was still not
> converted to an HTML response, it is just XML.
> 
> Erik, according to your suggestion I used the VelocityResponseWriter,
> in this manner:
>    query.set("&wt", "velocity");
>    query.set("&v.template", "browse");
>    query.set("&v.layout", "layout");
> but it is still throwing the same exception as before:
> Exception in thread "main"
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> Expected mime type application/xml but got text/html. 
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:516)
>   at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
>   at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
>   at
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
> 
> 
> But when I do the search through the Solr admin interface, the response is
> converted to HTML:
> 1. using the XSLTResponseWriter
> 
> http://localhost:8983/solr/collection1/select?q=coby&df=text&wt=xslt&indent=true&tr=example.xsl&hl=true&hl.fl=content&hl.fragsize=1000&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
> 
> 2. using the VelocityResponseWriter
> 
> http://localhost:8983/solr/collection1/select?q=coby&wt=velocity&indent=true&v.template=browse&v.layout=layout&hl=true&hl.fl=content&hl.fragsize=1000&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
> 
> It seems like a coding issue in SolrJ (the HttpSolrServer class).
> 
> So could you please provide me suggestions?
> 
> Thanks,
> 
> Venkata krishna Tolusuri.
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Converting-XML-response-of-Search-query-into-HTML-tp4141456p4142490.html
> Sent from the Solr - User mailing list archive at Nabble.com.



What is causing snapshots?

2014-06-18 Thread Tom Van den Abbeele
Hi all,

I'm trying to figure out what is causing several snapshots a day on a
multi-core Solr 3.5.0 instance deployed in JBoss, with replication via the
ReplicationHandler.
Currently the parameters indicate that a backup will be made after an
optimize, but at the moment there are no automatic optimizations scheduled
anywhere, nor any backup requests. The snapshots also appear at strange
intervals on all cores; for instance, these are the snapshot files for the
last 2 days:

core1:
o   16/6 – 15u27
o   17/6 – 15u29
core2:
o   16/6 – 17u04
o   17/6 – 17u04
o   17/6 – 17u09
core3:
o   16/6 – 09u59
o   16/6 – 19u29
o   17/6 – 10u59
o   17/6 – 19u34
core4:
o   16/6 – 14u54
o   16/6 – 15u04
o   17/6 – 15u14
o   17/6 – 15u24
o   17/6 – 15u34
core5:
o   16/6 – 21u24
o   17/6 – 21u29
core6:
o   16/6 – 21u59
o   17/6 – 21u59

core4 is the biggest.
The logs don't show anything relevant around those timestamps. snapshooter is
not configured anywhere as far as I'm aware, and commits happen much more
frequently.

Is there a way to figure out what causes the snapshots, or is there still some
automatic snapshot mechanism I'm not aware of? I can't seem to find anything
related to this topic on the web. None of our other instances show this
behavior; the difference is that they are all at least 3.6 and deployed in
Tomcat, while this one is 3.5.0 in JBoss.

tx,
Tom


Re: VelocityResponseWriter in solr

2014-06-18 Thread Erik Hatcher
You’ll see that all the fields are visible when you either use /browse?wt=xml 
or /browse?debugQuery=true, so the values are all there (via fl=*).  The 
default non-debug view only shows a few fields, but you can adjust the 
template(s) used to render things how you’d like.  The main template to render 
each document is hit.vm*, and I generally replace what’s in there with what’s 
in hit_plain.vm that you’ll find also in the example templates.

Erik

* 
https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/example/solr/collection1/conf/velocity/hit.vm
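As a rough sketch of the idea (paraphrased, not the verbatim template):
hit_plain.vm renders every stored field of each document with a Velocity loop
along these lines:

#foreach($fieldName in $doc.fieldNames)
  $fieldName: #field($fieldName) <br/>
#end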


On Jun 18, 2014, at 5:51 AM, Vivekanand Ittigi  wrote:

> Hi,
> 
> I want to use the VelocityResponseWriter in Solr.
> 
> I've indexed a website (for example http://www.biginfolabs.com/). If I type
> a query
> http://localhost:8983/solr/collection1/select?q=santhos&wt=xml&indent=true
> 
> I will get all the fields related to that document (content, host, title, url,
> etc.), but if I put the query through Velocity
> http://localhost:8983/solr/collection1/browse?q=santhosh I will see only 3
> fields (id, url, content) instead of all the fields.
> 
> How can I display all the fields?
> 
> This is in solrconfig.xml:
> 
> <requestHandler name="/browse" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
> 
>     <str name="wt">velocity</str>
>     <str name="v.template">browse</str>
>     <str name="v.layout">layout</str>
>     <str name="title">Solritas</str>
> 
>     <str name="defType">edismax</str>
>     <str name="qf">
>       text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
>       title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
>     </str>
>     <str name="df">text</str>
>     <str name="mm">100%</str>
>     <str name="q.alt">*:*</str>
>     <str name="rows">10</str>
>     <str name="fl">*,score</str>
> 
>     <str name="mlt.qf">
>       text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
>       title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
>     </str>
>     <str name="mlt.fl">text,features,name,sku,id,manu,cat,title,description,keywords,author,resourcename</str>
>     <int name="mlt.count">3</int>
> 
>     <str name="facet">on</str>
>     <str name="facet.field">cat</str>
>     <str name="facet.field">manu_exact</str>
>     <str name="facet.field">content_type</str>
>     <str name="facet.field">author_s</str>
>     <str name="facet.query">ipod</str>
>     <str name="facet.query">GB</str>
>     <str name="facet.mincount">1</str>
>     <str name="facet.pivot">cat,inStock</str>
>     <str name="facet.range.other">after</str>
>     <str name="facet.range">price</str>
>     <int name="f.price.facet.range.start">0</int>
>     <int name="f.price.facet.range.end">600</int>
>     <int name="f.price.facet.range.gap">50</int>
>     <str name="facet.range">popularity</str>
>     <int name="f.popularity.facet.range.start">0</int>
>     <int name="f.popularity.facet.range.end">10</int>
>     <int name="f.popularity.facet.range.gap">3</int>
>     <str name="facet.range">manufacturedate_dt</str>
>     <str name="f.manufacturedate_dt.facet.range.start">NOW/YEAR-10YEARS</str>
>     <str name="f.manufacturedate_dt.facet.range.end">NOW</str>
>     <str name="f.manufacturedate_dt.facet.range.gap">+1YEAR</str>
>     <str name="f.manufacturedate_dt.facet.range.other">before</str>
>     <str name="f.manufacturedate_dt.facet.range.other">after</str>
> 
>     <str name="hl">on</str>
>     <str name="hl.fl">content features title name</str>
>     <str name="hl.encoder">html</str>
>     <str name="hl.simple.pre">&lt;b&gt;</str>
>     <str name="hl.simple.post">&lt;/b&gt;</str>
>     <str name="f.title.hl.fragsize">0</str>
>     <str name="f.title.hl.alternateField">title</str>
>     <str name="f.name.hl.fragsize">0</str>
>     <str name="f.name.hl.alternateField">name</str>
>     <str name="f.content.hl.snippets">3</str>
>     <str name="f.content.hl.fragsize">200</str>
>     <str name="f.content.hl.alternateField">content</str>
>     <str name="f.content.hl.maxAlternateFieldLength">750</str>
> 
>     <str name="spellcheck">on</str>
>     <str name="spellcheck.extendedResults">false</str>
>     <str name="spellcheck.count">5</str>
>     <str name="spellcheck.alternativeTermCount">2</str>
>     <str name="spellcheck.maxResultsForSuggest">5</str>
>     <str name="spellcheck.collate">true</str>
>     <str name="spellcheck.collateExtendedResults">true</str>
>     <str name="spellcheck.maxCollationTries">5</str>
>     <str name="spellcheck.maxCollations">3</str>
>   </lst>
> 
>   <arr name="last-components">
>     <str>spellcheck</str>
>   </arr>
> </requestHandler>
> 
> 
> Thanks,
> Vivek



Solr 4.8 result page display changes and highlighting

2014-06-18 Thread vicky
Hi Everyone,

I just installed the Solr 4.8 release and am playing with the DIH and Velocity
configuration.

I am trying to change the result page columns to display more fields, in a
tabular format, since I have 1 rows to display on one page, if
I can with the out-of-the-box configuration.

I also tried the highlighting feature in 4.8, and out of the box it is not
working.

Has anyone run into this issue? Please advise.

All help is appreciated in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-8-result-page-desplay-changes-and-highlighting-tp4142504.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlighting not working

2014-06-18 Thread vicky
Were you ever able to resolve this issue? I am having the same issue, and
highlighting is not working for me on Solr 4.8.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-not-working-tp4112659p4142513.html
Sent from the Solr - User mailing list archive at Nabble.com.


add new Fields with SolrJ without changing schema.xml

2014-06-18 Thread benjelloun
Hello,

I need to add new fields with SolrJ without changing schema.xml.
This is my Java code:

HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", id);
doc.addField("Titre", nomdocument);

The id and Titre fields are already in schema.xml, but what I need to do is add
new fields like firstname, lastname, age...

thanks for help,
Best regards,
Anass BENJELLOUN



--
View this message in context: 
http://lucene.472066.n3.nabble.com/add-new-Fields-with-SolrJ-without-changing-schema-xml-tp4142515.html
Sent from the Solr - User mailing list archive at Nabble.com.


Calculating filterCache size

2014-06-18 Thread Benjamin Wiens
Hi,
I'm looking for a formula to calculate the filterCache size in RAM.

The best estimation I can find is here
http://stackoverflow.com/questions/2004/solr-filter-cache-fastlrucache-takes-too-much-memory-and-results-in-out-of-mem

An index of 1.000.000 would thus take 12,5 GB of RAM with this formula:

100.000.000.000 bit / 8 (to byte) / 1000 (to kb) / 1000 (to mb) / 1000 (to
gb) = 12,5 GB

Can anyone confirm this formula? I am aware that if the result of the
filter query is low, it can just create something else which takes up less
memory.

I know I can just start with a low filterCache size and kick it up in my
environment, but I'd like to come up with a scientific formula.

Thanks,
Ben


Leader Selection Error

2014-06-18 Thread Gurfan
Hi,

We have a SolrCloud 4.7.1 setup having some leader and some replica. If a
leader goes down then it tries to elect the leader between the replica`s.
Between the replica`s some replica`s gets into recovery mode. In this
activity an error is thrown "we are not the leader". The server went into 40
minute loop to recover and still did not recover completely. 


,,"INFO  - 2014-06-18 01:26:45.820; org.apache.solr.cloud.RecoveryStrategy;
Wait 2.0 seconds before trying to recover again
(1)","2014-06-18T01:26:45.820+",1,18,26,june,45,wednesday,2014,local"nix-all-logs""renew-sdb-1.int.ssi-cloud.com",database,1,"__-_--_::.;_;__.___()",,"/var/log/tomcat7/solr.log",sdb,"splunkindexer-1.int.ssi-cloud.com",,31,,8,,,
,,"ERROR - 2014-06-18 01:26:45.820; org.apache.solr.cloud.RecoveryStrategy;
Recovery failed - trying again... (0)
core=app.quotes_shard1_replica1","2014-06-18T01:26:45.820+",,,"app.quotes_shard1_replica1",,1,18,26,june,45,wednesday,2014,local"nix-all-logs
nix_errors""renew-sdb-1.int.ssi-cloud.com",database,1,"_-_--_::.;_;___-__..._()_=.",,"/var/log/tomcat7/solr.log",sdb,"splunkindexer-1.int.ssi-cloud.com"error,error,31,,8,,,
,,"ERROR - 2014-06-18 01:26:45.820; org.apache.solr.common.SolrException;
Error while trying to recover.
core=app.quotes_shard1_replica1:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
We are not the leader
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
at
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:224)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:371)
at
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:247)","2014-06-18T01:26:45.820+",,,"app.quotes_shard1_replica1:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:",,1,18,26,june,45,wednesday,2014,local"nix-all-logs
nix_errors""renew-sdb-1.int.ssi-cloud.com",database,6,"_-_--_::.;_;_._=.:..$:_t_...(.",,"/var/log/tomcat7/solr.log",sdb,"splunkindexer-1.int.ssi-cloud.com"error,error,31,,8,,,
,0,"INFO  - 2014-06-18 01:26:45.818;
org.apache.solr.servlet.SolrDispatchFilter; [admin] webapp=null
path=/admin/cores
params={coreNodeName=core_node1&onlyIfLeaderActive=true&state=recovering&nodeName=10.4.30.89:8080_solr&action=PREPRECOVERY&checkLive=true&core=app.quotes_shard1_replica2&wt=javabin&onlyIfLeader=true&version=2}
status=400 QTime=0
","2014-06-18T01:26:45.818+",PREPRECOVERYtrue,,"app.quotes_shard1_replica2","core_node1",1,18,26,june,45,wednesday,2014,local"nix-all-logs""renew-sdb-3.int.ssi-cloud.com",database,1,,,"10.4.30.89:8080_solr",,,true,true,,,"{coreNodeName=core_node1&onlyIfLeaderActive=true&state=recovering&nodeName=10.4.30.89:8080_solr&action=PREPRECOVERY&checkLive=true&core=app.quotes_shard1_replica2&wt=javabin&onlyIfLeader=true&version=2}","/admin/cores",,"__-_--_::.;_;_[]_=_=//_={=&=&=&=...:&=&=&=.&=&",,"/var/log/tomcat7/solr.log",sdb,"splunkindexer-3.int.ssi-cloud.com",recovering,40031,,8,,2,,,null,javabin,
,,"ERROR - 2014-06-18 01:26:45.818; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: We are not the leader
at
org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:905)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:198)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:732)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:268)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
at
org.a

Re: Calculating filterCache size

2014-06-18 Thread Erick Erickson
You pretty much have it. Actually, the number you want is the "maxDoc"
figure from the admin UI screen. The formula will be maxDoc/8 bytes +
(some overhead but not enough to matter), for EVERY entry.

You'll never fit 100B docs on a single machine anyway. Lucene has a
hard limit of 2B docs, and I've never heard of anyone fitting even 2B
docs on a single machine in a performant manner. So under any
circumstance this won't all be on one machine. You have to figure it
locally for each shard. And at this size there's no doubt you'll be
sharding!

Also be very careful here: the "size" parameter in the cache
definition is the number of _entries_, NOT the number of _bytes_.

_Each_ entry is that size! So the cache requirements will be close to
((maxDoc/8) + 128) * (size_defined_in_the_config_file), where 128 is
an approximation of the storage necessary for the text of the fq
clause.

Best,
Erick
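As a quick sanity check, the formula is easy to evaluate directly (the maxDoc
and size values below are illustrative):

// Rough per-core filterCache RAM estimate:
// ((maxDoc / 8) + 128) bytes per cached entry, times the configured size.
long maxDoc = 1000000L;   // the "maxDoc" figure from the admin UI
long entries = 10000L;    // the "size" attribute of the filterCache
long bytes = ((maxDoc / 8) + 128) * entries;
System.out.printf("%d bytes ~= %.2f GB%n", bytes, bytes / 1e9);
// prints: 1251280000 bytes ~= 1.25 GB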

On Wed, Jun 18, 2014 at 8:00 AM, Benjamin Wiens
 wrote:
> Hi,
> I'm looking for a formula to calculate the filterCache size in RAM.
>
> The best estimation I can find is here
> http://stackoverflow.com/questions/2004/solr-filter-cache-fastlrucache-takes-too-much-memory-and-results-in-out-of-mem
>
> An index of 1.000.000 would thus take 12,5 GB of RAM with this formula:
>
> 100.000.000.000 bit / 8 (to byte) / 1000 (to kb) / 1000 (to mb) / 1000 (to
> gb) = 12,5 GB
>
> Can anyone confirm this formula? I am aware that if the result of the
> filter query is low, it can just create something else which takes up less
> memory.
>
> I know I can just start with a low filterCache size and kick it up in my
> environment, but I'd like to come up with a scientific formula.
>
> Thanks,
> Ben


Re: add new Fields with SolrJ without changing schema.xml

2014-06-18 Thread Erick Erickson
bq: I need to add new Fields with SolrJ without changing schema.xml

You have three options:
1> change the schema.xml file, but you say you can't. Why not?
2> use dynamic fields. If you're lucky, you have the stock schema.xml
and can use them. They'll require some suffix to match the pattern,
though.
3> use "managed schemas": either the schemaless variant, which just tries
to figure it out, or the managed variant, where you use the API to add
fields before using them.

But under any circumstance, you probably have to change your configs.

Best,
Erick

On Wed, Jun 18, 2014 at 7:31 AM, benjelloun  wrote:
> Hello,
>
> I need to add new fields with SolrJ without changing schema.xml.
> This is my Java code:
>
> HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr");
> SolrInputDocument doc = new SolrInputDocument();
> doc.addField("id", id);
> doc.addField("Titre", nomdocument);
>
> The id and Titre fields are already in schema.xml, but what I need to do is
> add new fields like firstname, lastname, age...
>
> thanks for help,
> Best regards,
> Anass BENJELLOUN
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/add-new-Fields-with-SolrJ-without-changing-schema-xml-tp4142515.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: add new Fields with SolrJ without changing schema.xml

2014-06-18 Thread benjelloun
Hello,

I'm using this configuration in solrconfig.xml:

<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

Can you please give me an example in SolrJ for adding a new field?
Thanks,
best regards,
Anass BENJELLOUN



--
View this message in context: 
http://lucene.472066.n3.nabble.com/add-new-Fields-with-SolrJ-without-changing-schema-xml-tp4142515p4142533.html
Sent from the Solr - User mailing list archive at Nabble.com.
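For what it's worth, a sketch of one way to do this from Java against the
managed schema, assuming Solr 4.4+ with the ManagedIndexSchemaFactory config
above (SolrJ 4.x has no schema API of its own, so this POSTs the field
definition to the Schema REST endpoint; the URL, field name, and type are
illustrative):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class AddSchemaField {
    public static void main(String[] args) throws Exception {
        // The Schema REST API accepts a JSON array of new field definitions.
        String json = "[{\"name\":\"firstname\",\"type\":\"string\","
                + "\"indexed\":true,\"stored\":true}]";
        URL url = new URL("http://localhost:8080/solr/collection1/schema/fields");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes("UTF-8"));
        }
        System.out.println("HTTP " + conn.getResponseCode()); // 200 on success
    }
}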


RE: Highlighting not working

2014-06-18 Thread Teague James
Vicky,

I resolved this by making sure that the field that is searched has
"stored=true". By default "text" is searched, which is the destination of
the copyFields and is not stored. If you change your copyField destination
to a field that is stored and use that field as the default search field
then highlighting should work - or at least it did for me.

As a super fast check, change the text field to "stored=true" and test.
Remember that you'll have to restart Solr and re-index first! HTH!

-Teague
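A sketch of the schema change being described (the type name is illustrative);
the point is that the copyField destination you search and highlight on must
be stored, since the highlighter builds snippets from stored text:

<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>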

-Original Message-
From: vicky [mailto:vi...@raytheon.com] 
Sent: Wednesday, June 18, 2014 10:28 AM
To: solr-user@lucene.apache.org
Subject: Re: Highlighting not working

Were you ever able to resolve this issue? I am having the same issue, and
highlighting is not working for me on Solr 4.8.



--
View this message in context:
http://lucene.472066.n3.nabble.com/Highlighting-not-working-tp4112659p414251
3.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: add new Fields with SolrJ without changing schema.xml

2014-06-18 Thread benjelloun
Hello,

This is what I want to do:

public static void addNewField(Boolean uniqueId, String type, Boolean indexed,
        Boolean stored, Boolean multivalued, Boolean sortMissingLast,
        Boolean required) {
    ...
}

Any example please,
thanks,
Best regards,
Anass BENJELLOUN




--
View this message in context: 
http://lucene.472066.n3.nabble.com/add-new-Fields-with-SolrJ-without-changing-schema-xml-tp4142515p4142555.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Converting XML response of Search query into HTML.

2014-06-18 Thread Venkata krishna
Hi Erik,

I have tried removing the '&' and I got the response in text format, but I
don't want the response in text form. We need to get the response in HTML form
without any exception (Expected mime type application/xml but got
text/html). So could you please provide any suggestion?

Thanks,

venkata krishna tolusuri.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Converting-XML-response-of-Search-query-into-HTML-tp4141456p4142546.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: add new Fields with SolrJ without changing schema.xml

2014-06-18 Thread Walter Underwood
Why can't you change schema.xml?  --wunder

On Jun 18, 2014, at 8:56 AM, benjelloun  wrote:

> Hello,
> 
> This is what I want to do:
> 
> public static void addNewField(Boolean uniqueId, String type, Boolean indexed,
>         Boolean stored, Boolean multivalued, Boolean sortMissingLast,
>         Boolean required) {
>     ...
> }
> 
> Any example please,
> thanks,
> Best regards,
> Anass BENJELLOUN
> 
> 
> 
> 




Re: Calculating filterCache size

2014-06-18 Thread Benjamin Wiens
Thanks Erick!
So let's say I have a config of

<filterCache class="solr.FastLRUCache"
             size="10000"
             initialSize="10000"
             autowarmCount="5000"/>
MaxDocuments = 1,000,000

So according to your formula, the filterCache roughly has the potential
to consume this much RAM:
((1,000,000 / 8) + 128) * 10,000 = 1,251,280,000 bytes / 1,000 =
1,251,280 KB / 1,000 = 1,251.28 MB / 1,000 = 1.25 GB

Thanks,
Ben





On Wed, Jun 18, 2014 at 11:13 AM, Erick Erickson 
wrote:

> You pretty much have it. Actually, the number you want is the "maxDoc"
> figure from the admin UI screen. The formula will be maxDoc/8 bytes +
> (some overhead but not enough to matter), for EVERY entry.
>
> You'll never fit 100B docs on a single machine anyway. Lucene has a
> hard limit of 2B docs, and I've never heard of anyone fitting even 2B
> docs on a single machine in a performant manner. So under any
> circumstance this won't all be on one machine. You have to figure it
> locally for each shard. And at this size there's no doubt you'll be
> sharding!
>
> Also be very careful here: the "size" parameter in the cache
> definition is the number of _entries_, NOT the number of _bytes_.
>
> _Each_ entry is that size! So the cache requirements will be close to
> ((maxDoc/8) + 128) * (size_defined_in_the_config_file), where 128 is
> an approximation of the storage necessary for the text of the fq
> clause.
>
> Best,
> Erick
>
> On Wed, Jun 18, 2014 at 8:00 AM, Benjamin Wiens
>  wrote:
> > Hi,
> > I'm looking for a formula to calculate the filterCache size in RAM.
> >
> > The best estimation I can find is here
> >
> http://stackoverflow.com/questions/2004/solr-filter-cache-fastlrucache-takes-too-much-memory-and-results-in-out-of-mem
> >
> > An index of 1.000.000 would thus take 12,5 GB of RAM with this
> formula:
> >
> > 100.000.000.000 bit / 8 (to byte) / 1000 (to kb) / 1000 (to mb) / 1000
> (to
> > gb) = 12,5 GB
> >
> > Can anyone confirm this formula? I am aware that if the result of the
> > filter query is low, it can just create something else which takes up less
> > memory.
> >
> > I know I can just start with a low filterCache size and kick it up in my
> > environment, but I'd like to come up with a scientific formula.
> >
> > Thanks,
> > Ben
>


Limit Porter stemmer to plural stemming only?

2014-06-18 Thread Jacob, Jerry (RIS-ATL)
Hi,

Can you please share the Java code for Plural Only Porter Stemmer for English 
if you don't mind?

Thanks,
Jerry




Re: Calculating filterCache size

2014-06-18 Thread Shawn Heisey
On 6/18/2014 10:57 AM, Benjamin Wiens wrote:
> Thanks Erick!
> So let's say I have a config of
>
> <filterCache class="solr.FastLRUCache"
>              size="10000"
>              initialSize="10000"
>              autowarmCount="5000"/>
>
> MaxDocuments = 1,000,000
>
> So according to your formula, the filterCache roughly has the potential
> to consume this much RAM:
> ((1,000,000 / 8) + 128) * 10,000 = 1,251,280,000 bytes / 1,000 =
> 1,251,280 KB / 1,000 = 1,251.28 MB / 1,000 = 1.25 GB

Yes, this is essentially correct.  If you want to arrive at a number
that's more accurate for the way that OS tools will report memory,
you'll divide by 1024 instead of 1000 for each of the larger units. 
That results in a size of 1.16GB instead of 1.25.  Computers think in
powers of 2; dividing by 1000 reflects how people think, in
powers of 10.  It's the same thing that causes your computer to report
931GB for a 1TB hard drive.

Thanks,
Shawn



Re: MergeReduceIndexerTool takes a lot of time for a limited number of documents

2014-06-18 Thread Wolfgang Hoschek
Consider giving the MR tasks more RAM, for example via

hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar
org.apache.solr.hadoop.MapReduceIndexerTool -D
'mapred.child.java.opts=-Xmx2000m' ...

Wolfgang.

On May 26, 2014, at 10:48 AM, Costi Muraru  wrote:

> Hey Erick,
> 
> The job reducers began to die with "Error: Java heap space" after being
> stuck at ~80% for 1h and 22 minutes.
> 
> I did a few more tests:
> 
> Test 1.
> 80,000 documents
> Each document had *20* fields. The field names were* the same *for all the
> documents. Values were different.
> Job status: successful
> Execution time: 33 seconds.
> 
> Test 2.
> 80,000 documents
> Each document had *20* fields. The field names were *different* for all the
> documents. Values were also different.
> Job status: successful
> Execution time: 643 seconds.
> 
> Test 3.
> 80,000 documents
> Each document had *50* fields. The field names were *the same* for all the
> documents. Values were different.
> Job status: successful
> Execution time: 45.96 seconds.
> 
> Test 4.
> 80,000 documents
> Each document had *50* fields. The field names were *different* for all the
> documents. Values were also different.
> Job status: failed
> Execution time: after 1h reducers failed.
> Unfortunately, this is my use case.
> 
> My guess is that the reduce time (to perform the merges) depends on whether
> the field names are the same across the documents. If they are different,
> the merge time increases greatly. I don't have any knowledge of the Solr
> merge operation, but is it possible that it tries to group the fields with
> the same name across all the documents?
> In the first case, when the field names are the same across documents, the
> number of buckets is equal to the number of unique field names which is 20.
> In the second case, where all the field names are different (my use case),
> it creates a lot more buckets (80k documents * 50 different field names = 4
> million buckets) and the process gets slowed down significantly.
> Is this assumption correct / Is there any way to get around it?
> 
> Thanks again for reaching out. Hope this is more clear now.
> 
> This is how one of the 80k documents looks like (json format):
> {
> "id" : "442247098240414508034066540706561683636",
> "items" : {
>   "IT49597_1180_i" : 76,
>   "IT25363_1218_i" : 4,
>   "IT12418_1291_i" : 95,
>   "IT55979_1051_i" : 31,
>   "IT9841_1224_i" : 36,
>   "IT40463_1010_i" : 87,
>   "IT37932_1346_i" : 11,
>   "IT17653_1054_i" : 37,
>   "IT59414_1025_i" : 96,
>   "IT51080_1133_i" : 5,
>   "IT7369_1395_i" : 90,
>   "IT59974_1245_i" : 25,
>   "IT25374_1345_i" : 75,
>   "IT16825_1458_i" : 28,
>   "IT56643_1050_i" : 76,
>   "IT46274_1398_i" : 50,
>   "IT47411_1275_i" : 11,
>   "IT2791_1000_i" : 97,
>   "IT7708_1053_i" : 96,
>   "IT46622_1112_i" : 90,
>   "IT47161_1382_i" : 64
>   }
> }
> 
> Costi
> 
> 
> On Mon, May 26, 2014 at 7:45 PM, Erick Erickson 
> wrote:
> 
>> The MapReduceIndexerTool is really intended for very large data sets,
>> and by today's standards 80K doesn't qualify :).
>> 
>> Basically, MRIT creates N sub-indexes, then merges them, which it
>> may do in a tiered fashion. That is, it may merge gen1 to gen2, then
>> merge gen2 to gen3 etc. Which is great when indexing a bazillion
>> documents into 20 shards, but all that copying around may take
>> more time than you really gain for 80K docs.
>> 
>> Also be aware that MRIT does NOT update docs with the same ID, this
>> is due to the inherent limitation of the Lucene mergeIndex process.
>> 
>> How long is "a long time"? attachments tend to get filtered out, so if you
>> want us to see the graph you might paste it somewhere and provide a link.
>> 
>> Best,
>> Erick
>> 
>> On Mon, May 26, 2014 at 8:51 AM, Costi Muraru 
>> wrote:
>>> Hey guys,
>>> 
>>> I'm using the MergeReduceIndexerTool to import data into a SolrCloud
>>> cluster made out of 3 decent machines.
>>> Looking in the JobTracker, I can see that the mapper jobs finish quite
>>> fast. The reduce jobs get to ~80% quite fast as well. It is here where
>>> they get stucked for a long period of time (picture + log attached).
>>> I'm only trying to insert ~80k documents with 10-50 different fields
>>> each. Why is this happening? Am I not setting something correctly? Is
>>> the fact that most of the documents have different field names, or too
>>> many for that matter?
>>> Any tips are gladly appreciated.
>>> 
>>> Thanks,
>>> Costi
>>> 
>>> From the reduce logs:
>>> 60208 [main] INFO  org.apache.solr.update.UpdateHandler  - start
>>> 
>> commit{,optimize=false,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false,prepareCommit=false}
>>> 60208 [main] INFO  org.apache.solr.update.LoggingInfoStream  -
>>> [IW][main]: commit: start
>>> 60208 [main] INFO  org.apache.solr.update.LoggingInfoStream  -
>>> [IW][main]: commit: enter lock
>>> 60208 [main] INFO  org.apache.solr.update.LoggingInfoStream  -
>>> [IW][main]: commit: now prepare
>>> 602

Synonyms - 20th and 20

2014-06-18 Thread Jae Joo
I have a synonyms.txt file which has
20th,twentieth

Once I apply the synonym, I see "20th", "twentieth" and "20" for "20th".
Does anyone know where "20" comes from? How can I have only "20th" and
"twentieth"?

Thanks,

Jae


Re: Synonyms - 20th and 20

2014-06-18 Thread Diego Fernandez
What tokenizer and filters are you using?

Diego Fernandez - 爱国
Software Engineer
US GSS Supportability - Diagnostics


- Original Message -
> I have a synonyms.txt file which has
> 20th,twentieth
> 
> Once I apply the synonym, I see "20th", "twentieth" and "20" for "20th".
> Does anyone know where "20" comes from? How can I have only "20th" and
> "twentieth"?
> 
> Thanks,
> 
> Jae
> 


Looking for migration stories to an HDFS-backed Solr Cloud

2014-06-18 Thread Michael Della Bitta
Hi everyone,

We're considering a migration to an HDFS-backed Solr Cloud, both from our
4.2-based Solr Cloud, and a legacy 3.6 classic replication setup. In the
end, we hope to unify these two and upgrade to 4.8.1, or 4.9 if that's out
in time.

I'm wondering how many of you have experience with migrating to HDFS, and
if you managed to do something a little more crafty than a bulk reindex
against a new installation.

For example, is it possible to do something like join some 4.8, HDFS-backed
nodes to your 4.2 setup, add replicas to the new nodes, have things sync
over, and then terminate the 4.2 nodes?

For our older setup, could I bodge together collections by simply copying
the index data into HDFS and building a single shard collection from each
one? Would the HDFSDirectoryFactory do OK against an index written using an
older codec and on a random access disk?

Any information or experiences you might be able to share would be helpful.
In the meantime, I'm going to start experimenting with some of these
approaches.

Thanks!

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions

w: appinions.com 


RE: Solr maximum Optimal Index Size per Shard

2014-06-18 Thread Toke Eskildsen
Toke Eskildsen [t...@statsbiblioteket.dk] wrote:

[Toke: SSDs with 2.7TB of index on a 256GB machine]

> tl;dr: for small result sets (< 1M hits) on unwarmed searches with
> simple queries, response time is below 100ms. If we enable faceting with
> plain Solr, this jumps to about 1 second.

> I did a top on the machine and it says that 50GB is currently used for
> caching, so an 80GB (and probably less) machine would work fine for our
> 2.7TB index.

So we actually tried this: 3.6TB (4 shards) and 80GB of RAM, leaving a little 
less than 40GB for caching: 40GB / 3,600GB ~= 1% of the index size.

This performed quite well, with faceting times still around 1 second and 
non-faceted search a lot lower. There's a writeup at
http://sbdevel.wordpress.com/2014/06/17/terabyte-index-search-and-faceting-with-solr/

- Toke Eskildsen, State and University Library, Denmark




Re: Warning message logs on startup after upgrading to 4.8.1

2014-06-18 Thread Chris Hostetter

: WARN  o.a.s.r.ManagedResource- No stored data found for
: /schema/analysis/stopwords/english
: WARN  o.a.s.r.ManagedResource- No stored data found for
: /schema/analysis/synonyms/english
: 
: I fixed these by commenting out the managed_en field type in my
: schema, see 
https://github.com/xwiki/xwiki-platform/commit/d41580c383f40d2aa4e4f551971418536a3f3a20#diff-44d79e64e45f3b05115aebcd714bd897L486

FWIW: Unless I'm missing something, you should have only gotten those
warnings in the situation where you started using the 4.8
example schema.xml (or cut/pasted those field types from it into your existing
schema) but didn't use the rest of the conf files that came with 4.8 --
so you didn't have the stored data JSON file that goes with them -- in which
case that is a legitimate warning that you have an analysis factory
configured to use a "managed" resource but there is no managed data file
available.

: WARN  o.a.s.r.ManagedResource- No stored data found for /rest/managed
: WARN  o.a.s.r.ManagedResource- No registered observers for 
/rest/managed
: 
: How can I get rid of these 2?
: 
: This jira issue is related https://issues.apache.org/jira/browse/SOLR-6128 .

I agree, there's no reason I can see for those to be warnings -- so as to
keep SOLR-6128 focused on just one thing, I've created SOLR-6179 to track
the ManagedResource WARNs...

https://issues.apache.org/jira/browse/SOLR-6179


-Hoss
http://www.lucidworks.com/


Why aren't my nested documents nesting?

2014-06-18 Thread Vinay B,
Probably a silly error. Can someone point out my mistake? Code and output
gists at https://gist.github.com/anonymous/fb9cdb5b44e76b2c308d

Thanks

Code:
SolrInputDocument solrDoc = new SolrInputDocument();
solrDoc.addField("id", documentId);
solrDoc.addField("content_type", "parentDocument");
solrDoc.addField(Constants.REMOTE_FILE_PATH, filePath == null ? ""
: filePath);
solrDoc.addField(Constants.REMOTE_FILE_LOAD, Constants.TRUE);

SolrInputDocument childDoc = new SolrInputDocument();
childDoc.addField(Constants.ID, documentId+"-A");
childDoc.addField("ATTRIBUTES.STATE", "LA");
childDoc.addField("ATTRIBUTES.STATE", "TX");
solrDoc.addChildDocument(childDoc);

solrServer.add(solrDoc);
solrServer.commit();


What's the best way to specify nested child documents using UpdateRequest

2014-06-18 Thread Vinay B,
SolrJ allows a direct linkage between parent and child documents using
SolrInputDocument.addChildDocument(...).

We, however, construct our request via a raw UpdateRequest() as that gives
us a bit more flexibility. I'm investigating how best to add nested docs
using this approach.

From my understanding (correct me if I'm wrong), the child doc has to be
created at the same time as the parent (see
https://www.youtube.com/watch?v=74Wyk4OEtv8 @ 21 minutes). I'm hoping that,
despite this, the child can be modified/updated later on without
affecting the main document. Our interest in this is that our main document
would contain indexed text (time-consuming to index at scale), whereas if we
could separately update the (smaller) child document, we could save a
significant amount of time.

Thoughts?

Thanks


Re: Debug different Results from different Request Handlers

2014-06-18 Thread O. Olson
Thank you Erik (and steffkes, who helped me on the IRC #solr chat). Sorry
for the delay in responding, but I got this to work.

Your suggestion about adding debug=true to the query helped me. Since I was
adding this to the Velocity request handler, I could not see the debug
results, but when I added wt=xml, i.e. /products?q=hp|lync&debug=true&wt=xml,
I could see the parsed query as well as the parser used for each handler.

Thanks also to steffkes, who answered the question in my original post (on
IRC): both of my handlers go through
org.apache.solr.servlet.SolrDispatchFilter; particularly, it's the doFilter()
method that I was looking for.

Also, as steffkes pointed out (from my original post), the /products
request handler uses the ExtendedDismaxQParser whereas the second /search or
/select request handler uses the LuceneQParser. It seems that these two
parsers handle the | sign very differently. For my limited private
installation, I decided to go to the base class of ExtendedDismaxQParser and
LuceneQParser, i.e. QParser. There, in the constructor, I strip out the | sign
from the qstr parameter. This is probably the dirtiest way to get this to
work, but it works for now.

Thanks again to you all.
O. O. 

 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Debug-different-Results-from-different-Request-Handlers-tp4141804p4142716.html
Sent from the Solr - User mailing list archive at Nabble.com.


ICUTokenizer or StandardTokenizer or ??? for "text_all" type field that might include non-whitespace langs

2014-06-18 Thread Allison, Timothy B.
All,

In one index I’m working with, the setup is the typical langid mapping to 
language specific fields.  There is also a text_all field that everything is 
copied to.  The documents can contain a wide variety of languages including 
non-whitespace languages.  We’ll be using the ICUTokenFilter in the analysis 
chain, but what should we use for the tokenizer for the “text_all” field?  My 
inclination is to go with the ICUTokenizer.  Are there any reasons to prefer 
the StandardTokenizer or another tokenizer for this field?

Thank you.

   Best,

  Tim
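For reference, a sketch of such a field type (the class names are from the
analysis-extras contrib; the folding filter is one plausible choice, not a
given):

<fieldType name="text_all" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>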


Re: ICUTokenizer or StandardTokenizer or ??? for "text_all" type field that might include non-whitespace langs

2014-06-18 Thread Alexandre Rafalovitch
I don't think the text_all field would work too well for multilingual
setup. Any reason you cannot use edismax to search over a bunch of
fields instead?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Thu, Jun 19, 2014 at 8:31 AM, Allison, Timothy B.  wrote:
> All,
>
> In one index I’m working with, the setup is the typical langid mapping to 
> language specific fields.  There is also a text_all field that everything is 
> copied to.  The documents can contain a wide variety of languages including 
> non-whitespace languages.  We’ll be using the ICUTokenFilter in the analysis 
> chain, but what should we use for the tokenizer for the “text_all” field?  My 
> inclination is to go with the ICUTokenizer.  Are there any reasons to prefer 
> the StandardTokenizer or another tokenizer for this field?
>
> Thank you.
>
>Best,
>
>   Tim


Re: What's the best way to specify nested child documents using UpdateRequest

2014-06-18 Thread Mikhail Khludnev
On Thu, Jun 19, 2014 at 3:32 AM, Vinay B,  wrote:

> I'm hoping that
> despite this,  that the child can be modified updated later on, without
> affecting the main document.
>

Nope. You have to rewrite the whole block.


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Why aren't my nested documents nesting?

2014-06-18 Thread Mikhail Khludnev
Because you need to query with a special query parser:
http://blog.griddynamics.com/2013/09/solr-block-join-support.html
To nest the output you need https://issues.apache.org/jira/browse/SOLR-5285
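A minimal sketch of such a block-join query, reusing the content_type value
and child field from the code quoted below:

q={!parent which="content_type:parentDocument"}ATTRIBUTES.STATE:LA

This returns the parent documents whose children match ATTRIBUTES.STATE:LA.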


On Thu, Jun 19, 2014 at 3:20 AM, Vinay B,  wrote:

> Probably a silly error. Can someone point out my mistake? Code and output
> gists at https://gist.github.com/anonymous/fb9cdb5b44e76b2c308d
>
> Thanks
>
> Code:
> SolrInputDocument solrDoc = new SolrInputDocument();
> solrDoc.addField("id", documentId);
> solrDoc.addField("content_type", "parentDocument");
> solrDoc.addField(Constants.REMOTE_FILE_PATH, filePath == null ? ""
> : filePath);
> solrDoc.addField(Constants.REMOTE_FILE_LOAD, Constants.TRUE);
>
> SolrInputDocument childDoc = new SolrInputDocument();
> childDoc.addField(Constants.ID, documentId+"-A");
> childDoc.addField("ATTRIBUTES.STATE", "LA");
> childDoc.addField("ATTRIBUTES.STATE", "TX");
> solrDoc.addChildDocument(childDoc);
>
> solrServer.add(solrDoc);
> solrServer.commit();
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Bug in Collapsing QParserPlugin : Sort by 3 or more fields is broken

2014-06-18 Thread Umesh Prasad
Continuing the discussion on mailing list from Jira.

An example:

id  group  f1  f2
1   g1     5   10
2   g1     5   1000
3   g1     5   1000
4   g1     10  100
5   g2     5   10
6   g2     5   1000
7   g2     5   1000
8   g2     10  100

sort=f1 asc, f2 desc, id desc

Without collapsing, this will give:
(7,g2), (6,g2), (3,g1), (2,g1), (5,g2), (1,g1), (8,g2), (4,g1)

On collapsing by group_s, the expected output is: (7,g2), (3,g1)

Solr's standard grouping does give this output with
group=on, group.field=group_s, group.main=true.

Collapsing with the CollapsingQParserPlugin, fq={!collapse field=group_s}, gives:
(5,g2), (1,g1)



Summarizing the Jira discussion:

1. CollapsingQParserPlugin picks up the group heads from the matching results
and passes those further, in essence filtering out some of the matching
documents so that subsequent collectors never see them. It can also pass
on the score to subsequent collectors using a dummy scorer.

2. TopDocCollector comes later in the hierarchy and will sort on the
collapsed set. That works fine.

The issue is with step 1. Collapsing is done by a single comparator, which
can take its value from a field or function and defaults to score.
Function queries do allow us to combine multiple fields / value sources;
however, it would be difficult to construct a function for the given sort
fields, primarily because:
a) The range of values for a given sort field is not known in advance.
It is possible for one sort field to be unbounded, but another to be bounded
within a small range.
b) The sort field can itself hold custom logic.

Because of (a), the group head selected by CollapsingQParserPlugin will be
incorrect and the subsequent sorting will break.



On 14 June 2014 12:38, Umesh Prasad  wrote:

> Thanks Joel for the quick response. I have opened a new jira ticket.
>
> https://issues.apache.org/jira/browse/SOLR-6168
>
>
>
>
> On 13 June 2014 17:45, Joel Bernstein  wrote:
>
>> Let's open a new ticket.
>>
>> Joel Bernstein
>> Search Engineer at Heliosearch
>>
>>
>> On Fri, Jun 13, 2014 at 8:08 AM, Umesh Prasad 
>> wrote:
>>
>> > The patch in SOLR-5408 fixes the issue with sorting only for two sort
>> > fields. Sorting still breaks when 3 or more sort fields are used.
>> >
>> > I have attached a test case, which demonstrates the broken behavior
>> when 3
>> > sort fields are used.
>> >
>> > The failing test case patch is against Lucene/Solr 4.7 revision  number
>> > 1602388
>> >
>> > Can someone apply and verify the bug ?
>> >
>> > Also, should I re-open SOLR-5408  or open a new ticket ?
>> >
>> >
>> > ---
>> > Thanks & Regards
>> > Umesh Prasad
>> >
>>
>
>
>
> --
> ---
> Thanks & Regards
> Umesh Prasad
>



-- 
---
Thanks & Regards
Umesh Prasad


Re: Warning message logs on startup after upgrading to 4.8.1

2014-06-18 Thread Marius Dumitru Florea
On Thu, Jun 19, 2014 at 12:49 AM, Chris Hostetter
 wrote:
>
> : WARN  o.a.s.r.ManagedResource- No stored data found for
> : /schema/analysis/stopwords/english
> : WARN  o.a.s.r.ManagedResource- No stored data found for
> : /schema/analysis/synonyms/english
> :
> : I fixed these by commenting out the managed_en field type in my
> : schema, see 
> https://github.com/xwiki/xwiki-platform/commit/d41580c383f40d2aa4e4f551971418536a3f3a20#diff-44d79e64e45f3b05115aebcd714bd897L486
>

> FWIW: Unless I'm missing something, you should have only gotten those
> warnings in the situation where you started using the 4.8
> example schema.xml (or cut/pasted those field types from it into your existing
> schema) but didn't use the rest of the conf files that came with 4.8 --
> so you didn't have the stored data JSON file that goes with them -- in which
> case that is a legitimate warning that you have an analysis factory
> configured to use a "managed" resource but there is no managed data file
> available.

Yes, you're right, I've merged my schema with the one provided with 4.8.

>
> : WARN  o.a.s.r.ManagedResource- No stored data found for 
> /rest/managed
> : WARN  o.a.s.r.ManagedResource- No registered observers for 
> /rest/managed
> :
> : How can I get rid of these 2?
> :
> : This jira issue is related https://issues.apache.org/jira/browse/SOLR-6128 .
>
> I agree, there's no reason I can see for those to be warnings -- so as to
> keep SOLR-6128 focused on just one thing, I've created SOLR-6179 to track
> the ManagedResource WARNs...
>

> https://issues.apache.org/jira/browse/SOLR-6179

Thanks,
Marius

>
>
> -Hoss
> http://www.lucidworks.com/