Re: Solarium Extension

2014-08-07 Thread Shawn Heisey
On 8/7/2014 12:34 AM, pushkar sawant wrote:
> I have installed the Solarium Search extension on Magento 1.7, and my Solr
> 4.9 is also working in the background.
> My base OS is Ubuntu 13.10, on which Solr 4.9 is running.
> When I go and check the extension in the Magento admin, it only shows "Test
> Connection".
> Please find the attached image.
> 
> Note - When installing the extension on Magento through the Content Manager it shows
> installation done, but it gives an error; please find the attached image for the same.

The list will eat most attachments.  Yours did not make it.

Chances are that you won't be able to get much help with Solarium or
Magento here.  You'll need to find a mailing list or another support
venue for those programs.  They were not created by the Apache Solr project.

Thanks,
Shawn



Cannot finish recovery due to always met ReplicationHandler SnapPull failed: Unable to download xxx.fdt completely

2014-08-07 Thread forest_soup
I have 2 Solr nodes (solr1 and solr2) in a SolrCloud.
After some issue happened, solr2 is in the recovering state. The peersync
cannot finish within about 15 minutes, so it turns to snappull.
But while it is doing the snap pull, it always hits the issue below. Meanwhile,
there are still update requests being sent to the recovering node (solr2) and the
good node (solr1), and the index on the recovering node is deleted and
rebuilt again and again, so it takes a lot of time to finish.

Is this a bug, or is it by Solr design?
And could anyone help me accelerate the recovery process?

Thanks! 

2014-07-17 17:12:50 ERROR   ReplicationHandler  SnapPull failed
:org.apache.solr.common.SolrException: Unable to download _vdq.fdt
completely. Downloaded 0!=182945 
SnapPull failed :org.apache.solr.common.SolrException: Unable to download
_vdq.fdt completely. Downloaded 0!=182945 
   at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1305)
 
   at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1185)
 
   at
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:771) 
   at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:421) 
   at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:322) 
   at
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:155) 
   at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437) 
   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:247) 


We have the settings below in solrconfig.xml:

   <autoCommit>
     <maxDocs>1000</maxDocs>
     <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
     <openSearcher>true</openSearcher>
   </autoCommit>

   <autoSoftCommit>
     <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
   </autoSoftCommit>

and the 8 is the default.

My solrconfig.xml is attached: solrconfig.xml
  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Cannot-finish-recovery-due-to-always-met-ReplicationHandler-SnapPull-failed-Unable-to-download-xxx-fy-tp4151611.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Cannot finish recovery due to always met ReplicationHandler SnapPull failed: Unable to download xxx.fdt completely

2014-08-07 Thread Shalin Shekhar Mangar
Why does PeerSync take so much time? Are these two nodes in different data
centers or are they connected by a slow link?


On Thu, Aug 7, 2014 at 12:41 PM, forest_soup  wrote:

> I have 2 solr nodes(solr1 and solr2) in a SolrCloud.
> After some issue happened, solr2 are in recovering state. The peersync
> cannot finish in about 15 min, so it turn to snappull.
> But when it's doing snap pull, it always met this issue below. Meanwhile,
> there are still update requests sent to this recovering node(solr2) and the
> good node(solr1). And the index in the recovering node is deleted and
> rebuild again and again. So it takes lots of time to finish.
>
> Is it a bug or as solr design?
> And could anyone help me on accelerate the progress of recovery?
>
> Thanks!
>
> 2014-07-17 17:12:50 ERROR   ReplicationHandler  SnapPull failed
> :org.apache.solr.common.SolrException: Unable to download _vdq.fdt
> completely. Downloaded 0!=182945
> SnapPull failed :org.apache.solr.common.SolrException: Unable to download
> _vdq.fdt completely. Downloaded 0!=182945
>at
>
> org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1305)
>at
>
> org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1185)
>at
> org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:771)
>at
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:421)
>at
>
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:322)
>at
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:155)
>at
>
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
>at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:247)
>
>
> We have below settings in solrconfig.xml:
>  
>1000
>${solr.autoCommit.maxTime:15000}
>true
>  
>
>  
>
>${solr.autoSoftCommit.maxTime:-1}
>  
>
> and the 8 is as default.
>
> my solrconfig.xml is as attached.  solrconfig.xml
> 
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Cannot-finish-recovery-due-to-always-met-ReplicationHandler-SnapPull-failed-Unable-to-download-xxx-fy-tp4151611.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Cannot finish recovery due to always met ReplicationHandler SnapPull failed: Unable to download xxx.fdt completely

2014-08-07 Thread forest_soup
Thanks.
My environment is 2 VMs with a good network connection, so I'm not sure why it happened.
We are trying to reproduce it. The peersync failure log is:
2014-07-25 06:30:48
WARN
SnapPuller
Error in fetching packets
java.io.EOFException
at
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:154)
at
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:146)
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1211)
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1174)
at
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:771)
at 
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:421)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:322)
at
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:155)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:247)




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Cannot-finish-recovery-due-to-always-met-ReplicationHandler-SnapPull-failed-Unable-to-download-xxx-fy-tp4151611p4151621.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Cannot finish recovery due to always met ReplicationHandler SnapPull failed: Unable to download xxx.fdt completely

2014-08-07 Thread forest_soup
I have opened a JIRA issue for it:
https://issues.apache.org/jira/browse/SOLR-6333



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Cannot-finish-recovery-due-to-always-met-ReplicationHandler-SnapPull-failed-Unable-to-download-xxx-fy-tp4151611p4151631.html
Sent from the Solr - User mailing list archive at Nabble.com.


org.apache.solr.common.SolrException: no servers hosting shard

2014-08-07 Thread forest_soup
I have 2 Solr nodes (solr1 and solr2) in a SolrCloud.
After this issue happened, solr2 is in the recovering state. And after it takes a
long time to finish recovery, the issue occurs again, and it goes into
recovery again. It happens again and again.

ERROR - 2014-08-04 21:12:27.917; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: no servers hosting shard: 
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:148)
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:118)
at java.util.concurrent.FutureTask.run(FutureTask.java:273)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482)
at java.util.concurrent.FutureTask.run(FutureTask.java:273)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626)
at java.lang.Thread.run(Thread.java:804)

We have these settings in solrconfig.xml that differ from the defaults:

24  
200
1 

  
   <autoCommit>
     <maxDocs>1000</maxDocs>
     <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
     <openSearcher>true</openSearcher>
   </autoCommit>

   <autoSoftCommit>
     <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
   </autoSoftCommit>





   
   50

The full solrconfig.xml is attached: solrconfig_perf0804.xml
  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/org-apache-solr-common-SolrException-no-servers-hosting-shard-tp4151637.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-07 Thread Ali Nazemian
Thank you very much. But why should we go for Solr distributed with Hadoop?
There is already SolrCloud, which is quite applicable in the case of a big
index. Is there any advantage to building indexes over MapReduce that
SolrCloud cannot provide?
Regards.


On Wed, Aug 6, 2014 at 9:09 PM, Erick Erickson 
wrote:

> bq: Are you aware of Cloudera search? I know they provide an integrated
> Hadoop ecosystem.
>
> What Cloudera Search does via the MapReduceIndexerTool (MRIT) is create N
> sub-indexes for
> each shard in the M/R paradigm via EmbeddedSolrServer. Eventually, these
> sub-indexes for
> each shard are merged (perhaps through some number of levels) in the reduce
> phase and
> maybe merged into a live Solr instance (--go-live). You'll note that this
> tool requires the
> address of the ZK ensemble from which it can get the network topology,
> configuration files,
> all that rot. If you don't use the --go-live option, the output is still a
> Solr index, it's just that
> the index for each shard is left in a specific directory on HDFS. Being on
> HDFS allows
> this kind of M/R paradigm for massively parallel indexing operations, and
> perhaps massively
> complex analysis.
>
> Nowhere is there any low-level non-Solr manipulation of the indexes.
>
> The Flume fork just writes directly to the Solr nodes. It knows about the
> ZooKeeper
> ensemble and the collection too and communicates via SolrJ I'm pretty sure.
>
> As far as integrating with HDFS, you're right, HA is part of the package.
> As far as using
> the Solr indexes for analysis, well you can write anything you want to use
> the Solr indexes
> from anywhere in the M/R world and have them available from anywhere in the
> cluster. There's
> no real need to even have Solr running, you could use the output from MRIT
> and access the
> sub-shards with the EmbeddedSolrServer if you wanted, leaving out all the
> pesky servlet
> container stuff.
>
> bq: So why we go for HDFS in the case of analysis if we want to use SolrJ
> for this purpose?
> What is the point?
>
> Scale and data access in a nutshell. In the HDFS world, you can scale
> pretty linearly
> with the number of nodes you can rack together.
>
> Frankly though, if your data set is small enough to fit on a single machine
> _and_ you can get
> through your analysis in a reasonable time (reasonable here is up to you),
> then HDFS
> is probably not worth the hassle. But in the big data world where we're
> talking petabyte scale,
> having HDFS as the underpinning opens up possibilities for working on data
> that were
> difficult/impossible with Solr previously.
>
> Best,
> Erick
>
>
>
> On Tue, Aug 5, 2014 at 9:37 PM, Ali Nazemian 
> wrote:
>
> > Dear Erick,
> > I remembered some times ago, somebody asked about what is the point of
> > modify Solr to use HDFS for storing indexes. As far as I remember
> somebody
> > told him integrating Solr with HDFS has two advantages. 1) having hadoop
> > replication and HA. 2) using indexes and Solr documents for other
> purposes
> > such as Analysis. So why we go for HDFS in the case of analysis if we
> want
> > to use SolrJ for this purpose? What is the point?
> > Regards.
> >
> >
> > On Wed, Aug 6, 2014 at 8:59 AM, Ali Nazemian 
> > wrote:
> >
> > > Dear Erick,
> > > Hi,
> > > Thank you for you reply. Yeah I am aware that SolrJ is my last option.
> I
> > > was thinking about raw I/O operation. So according to your reply
> probably
> > > it is not applicable somehow. What about the Lily project that Michael
> > > mentioned? Is that consider SolrJ too? Are you aware of Cloudera
> search?
> > I
> > > know they provide an integrated Hadoop ecosystem. Do you know what is
> > their
> > > suggestion?
> > > Best regards.
> > >
> > >
> > >
> > > On Wed, Aug 6, 2014 at 12:28 AM, Erick Erickson <
> erickerick...@gmail.com
> > >
> > > wrote:
> > >
> > >> What you haven't told us is what you mean by "modify the
> > >> index outside Solr". SolrJ? Using raw Lucene? Trying to modify
> > >> things by writing your own codec? Standard Java I/O operations?
> > >> Other?
> > >>
> > >> You could use SolrJ to connect to an existing Solr server and
> > >> both read and modify at will from your M/R jobs. But if you're
> > >> thinking of trying to write/modify the segment files by raw I/O
> > >> operations, good luck! I'm 99.99% certain that's going to cause
> > >> you endless grief.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >>
> > >> On Tue, Aug 5, 2014 at 9:55 AM, Ali Nazemian 
> > >> wrote:
> > >>
> > >> > Actually I am going to do some analysis on the solr data using map
> > >> reduce.
> > >> > For this purpose it might be needed to change some part of data or
> add
> > >> new
> > >> > fields from outside solr.
> > >> >
> > >> >
> > >> > On Tue, Aug 5, 2014 at 5:51 PM, Shawn Heisey 
> > wrote:
> > >> >
> > >> > > On 8/5/2014 7:04 AM, Ali Nazemian wrote:
> > >> > > > I changed solr 4.9 to write index and data on hdfs. Now I am
> going
> > >> to
> > >> > > > connect to those data from 

solr in classic asp project

2014-08-07 Thread Sandeep Bohra
I am using a classic ASP 3.0 application and would like to implement Solr
in it. My database is SQL Server, and it also connects to AS/400 using
batch processing. Can someone suggest a starting point?



Regards,
Sandeep


Re: solr in classic asp project

2014-08-07 Thread parnab kumar
Can you elaborate on how you plan to use SOLR in your project?

Parnab..
CSE, IIT Kharagpur



On Thu, Aug 7, 2014 at 12:51 PM, Sandeep Bohra <
sandeep.bo...@3pillarglobal.com> wrote:

> I am using an classic ASP 3.0 application and would like to implement SOLR
> onto it. My database is SQL server and also it connects to AS/400 using
> batch processing. Can someone suggest a starting point?
>
>
>
> *RegardsSandeep*
>


Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-07 Thread Erick Erickson
If SolrCloud meets your needs, without Hadoop, then
there's no real reason to introduce the added complexity.

There are a bunch of problems that do _not_ work
well with SolrCloud over non-Hadoop file systems. For
those problems, the combination of SolrCloud and Hadoop
makes tackling them possible.

Best,
Erick


On Thu, Aug 7, 2014 at 3:55 AM, Ali Nazemian  wrote:

> Thank you very much. But why we should go for solr distributed with hadoop?
> There is already solrCloud which is pretty applicable in the case of big
> index. Is there any advantage for sending indexes over map reduce that
> solrCloud can not provide?
> Regards.
>
>
> On Wed, Aug 6, 2014 at 9:09 PM, Erick Erickson 
> wrote:
>
> > bq: Are you aware of Cloudera search? I know they provide an integrated
> > Hadoop ecosystem.
> >
> > What Cloudera Search does via the MapReduceIndexerTool (MRIT) is create N
> > sub-indexes for
> > each shard in the M/R paradigm via EmbeddedSolrServer. Eventually, these
> > sub-indexes for
> > each shard are merged (perhaps through some number of levels) in the
> reduce
> > phase and
> > maybe merged into a live Solr instance (--go-live). You'll note that this
> > tool requires the
> > address of the ZK ensemble from which it can get the network topology,
> > configuration files,
> > all that rot. If you don't use the --go-live option, the output is still
> a
> > Solr index, it's just that
> > the index for each shard is left in a specific directory on HDFS. Being
> on
> > HDFS allows
> > this kind of M/R paradigm for massively parallel indexing operations, and
> > perhaps massively
> > complex analysis.
> >
> > Nowhere is there any low-level non-Solr manipulation of the indexes.
> >
> > The Flume fork just writes directly to the Solr nodes. It knows about the
> > ZooKeeper
> > ensemble and the collection too and communicates via SolrJ I'm pretty
> sure.
> >
> > As far as integrating with HDFS, you're right, HA is part of the package.
> > As far as using
> > the Solr indexes for analysis, well you can write anything you want to
> use
> > the Solr indexes
> > from anywhere in the M/R world and have them available from anywhere in
> the
> > cluster. There's
> > no real need to even have Solr running, you could use the output from
> MRIT
> > and access the
> > sub-shards with the EmbeddedSolrServer if you wanted, leaving out all the
> > pesky servlet
> > container stuff.
> >
> > bq: So why we go for HDFS in the case of analysis if we want to use SolrJ
> > for this purpose?
> > What is the point?
> >
> > Scale and data access in a nutshell. In the HDFS world, you can scale
> > pretty linearly
> > with the number of nodes you can rack together.
> >
> > Frankly though, if your data set is small enough to fit on a single
> machine
> > _and_ you can get
> > through your analysis in a reasonable time (reasonable here is up to
> you),
> > then HDFS
> > is probably not worth the hassle. But in the big data world where we're
> > talking petabyte scale,
> > having HDFS as the underpinning opens up possibilities for working on
> data
> > that were
> > difficult/impossible with Solr previously.
> >
> > Best,
> > Erick
> >
> >
> >
> > On Tue, Aug 5, 2014 at 9:37 PM, Ali Nazemian 
> > wrote:
> >
> > > Dear Erick,
> > > I remembered some times ago, somebody asked about what is the point of
> > > modify Solr to use HDFS for storing indexes. As far as I remember
> > somebody
> > > told him integrating Solr with HDFS has two advantages. 1) having
> hadoop
> > > replication and HA. 2) using indexes and Solr documents for other
> > purposes
> > > such as Analysis. So why we go for HDFS in the case of analysis if we
> > want
> > > to use SolrJ for this purpose? What is the point?
> > > Regards.
> > >
> > >
> > > On Wed, Aug 6, 2014 at 8:59 AM, Ali Nazemian 
> > > wrote:
> > >
> > > > Dear Erick,
> > > > Hi,
> > > > Thank you for you reply. Yeah I am aware that SolrJ is my last
> option.
> > I
> > > > was thinking about raw I/O operation. So according to your reply
> > probably
> > > > it is not applicable somehow. What about the Lily project that
> Michael
> > > > mentioned? Is that consider SolrJ too? Are you aware of Cloudera
> > search?
> > > I
> > > > know they provide an integrated Hadoop ecosystem. Do you know what is
> > > their
> > > > suggestion?
> > > > Best regards.
> > > >
> > > >
> > > >
> > > > On Wed, Aug 6, 2014 at 12:28 AM, Erick Erickson <
> > erickerick...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > >> What you haven't told us is what you mean by "modify the
> > > >> index outside Solr". SolrJ? Using raw Lucene? Trying to modify
> > > >> things by writing your own codec? Standard Java I/O operations?
> > > >> Other?
> > > >>
> > > >> You could use SolrJ to connect to an existing Solr server and
> > > >> both read and modify at will from your M/R jobs. But if you're
> > > >> thinking of trying to write/modify the segment files by raw I/O
> > > >> operations, good luck! I'm 99.99% certain that's going to cause
> > > >>

RE: Data Import handler and join select

2014-08-07 Thread Dyer, James
Alejandro,

You can use a sub-entity with a cache using DIH.  This will solve the 
"n+1-select" problem and make it run quickly.  Unfortunately, the only built-in 
cache implementation is in-memory so it doesn't scale.  There is a fast, 
disk-backed cache using bdb-je, which I use in production.  See 
https://issues.apache.org/jira/browse/SOLR-2613 .  You will need to build this 
yourself and include it on the classpath, and obtain a copy of bdb-je from 
Oracle.  While bdb-je is open source, its license is incompatible with ASL so 
this will never officially be part of Solr.

Once you have a disk-backed cache, you can specify it on the child entity like 
this:
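Roughly, with the cache class from SOLR-2613 and DIH's standard cacheKey/cacheLookup
attributes, it could look like the sketch below; the table and column names just
follow the example in the original question and are illustrative, not prescriptive:

<entity name="product" query="SELECT * FROM PRODUCTS">
  <entity name="descriptions"
          query="SELECT productKey, LANGUAGE, DESCRIPTION FROM DESCRIPTIONS"
          processor="SqlEntityProcessor"
          cacheImpl="BerkleyBackedCache"
          cacheKey="productKey"
          cacheLookup="product.uniqueKey"/>
</entity>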




If you don't want to go down this path, you can achieve this all with one 
query, if you include an ORDER BY to sort by whatever field is used as Solr's 
uniqueKey, and add a dummy row at the end with a UNION:

SELECT p.uniqueKey, ..., 'A' as lastInd from PRODUCTS p 
INNER JOIN DESCRIPTIONS d ON p.uniqueKey = d.productKey
UNION SELECT 0 as uniqueKey, ... , 'B' as lastInd from dual 
ORDER BY uniqueKey, lastInd

Then your transformer would need to keep the "lastUniqueKey" in an instance 
variable and keep a running map of everything it has seen for that key.  When the 
key changes, or if on the last row, send that map as the document.  Otherwise, 
the transformer returns null.  This will collect data from each row seen onto 
one document.
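A minimal sketch of such a transformer, using DIH's reflection-based transformRow
hook; the column names are taken from the query above, the "description_" field
naming is just an example, and the return-null-to-skip behavior follows the
description here (verify it against your DIH version):

import java.util.HashMap;
import java.util.Map;

// Collapses consecutive rows sharing the same uniqueKey into one document.
// Assumes the SQL is ordered by uniqueKey and ends with the dummy 'B' row.
public class RowCollapsingTransformer {
    private Object lastKey = null;
    private Map<String, Object> pending = new HashMap<String, Object>();

    public Object transformRow(Map<String, Object> row) {
        Object key = row.get("uniqueKey");
        Map<String, Object> toEmit = null;
        if (lastKey != null && !lastKey.equals(key)) {
            toEmit = pending;                  // key changed: emit the collected document
            pending = new HashMap<String, Object>();
        }
        lastKey = key;
        pending.put("id", key);                // merge the current row into the pending document
        Object lang = row.get("LANGUAGE");
        if (lang != null) {
            pending.put("description_" + lang, row.get("DESCRIPTION"));
        }
        return toEmit;                         // null means "skip this row"
    }
}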

Keep in mind also, that in a lot of cases like this, it might just be easiest 
to write a program that uses solrj to send your documents rather than trying to 
make DIH's features fit your use-case.  
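If you go the SolrJ route, the idea is simply to run the join yourself and build one
document per product; a bare-bones sketch against a 4.x SolrJ client, where the URL
and field names are placeholders matching the example in the original question:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexProducts {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // One document per product, with a multivalued language field
        // and one dynamic description field per language.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("name", "Product");
        doc.addField("languages", "es");
        doc.addField("languages", "en");
        doc.addField("description_es", "Descripción en español");
        doc.addField("description_en", "English description");

        server.add(doc);
        server.commit();
        server.shutdown();
    }
}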

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Alejandro Marqués Rodríguez [mailto:amarq...@paradigmatecnologico.com] 
Sent: Thursday, August 07, 2014 1:43 AM
To: solr-user@lucene.apache.org
Subject: Data Import handler and join select

Hi,

I have a problem while indexing with the Data Import Handler while doing a
join select. I have two tables, one with products and another with
descriptions for each product in several languages.

So it would be:

Products: ID, NAME, BRAND, PRICE, ...
Descriptions: ID, LANGUAGE, DESCRIPTION

I would like to have every product indexed as a document with a multivalued
field "language", which contains every language that has an associated
description, and several dynamic fields "description_", one for each language.

So it would be for example:

Id: 1
Name: Product
Brand: Brand
Price: 10
Languages: [es,en]
Description_es: Descripción en español
Description_en: English description

Our first approach was using sub-entities for the data import handler and
after implementing some transformers we had everything indexed as we
wanted. The sub-entity process added the descriptions for each language to
the solr document and then indexed them.

The problem was performance. I've read that using sub-entities affected
performance greatly, so we changed our process in order to use a join
instead.

Performance was greatly improved this way, but now we have a problem. Each
time a row is processed a Solr document is generated and indexed into Solr,
but the data is not added to any previous data; it replaces it.

If we had the previous example the query resulting from the join would be:

Id - Name - Brand - Price - Language - Description
1 - Product - Brand - 10 - es - Descripción en español
1 - Product - Brand - 10 - en - English description

So when indexing, as both rows have the same id, the only information I get is from
the second row.

Is there any way for the data import handler to manage this and allow the
documents to be indexed, updating any previous data?

Thanks in advance



-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


Disabling transaction logs

2014-08-07 Thread KNitin
Hello

I am using Solr 4.6.1 with over 1000 collections and 8 nodes. Restarting
nodes takes a long time (especially if we have indexing running against them).
I want to see if disabling transaction logs can help with a more robust
restart. However, I can't see any docs about disabling transaction logs in SolrCloud.

Can anyone help with info on how to disable transaction logs?


Thanks
Nitin


Re: Character encoding problems

2014-08-07 Thread Chris Hostetter

It's not clear to me from any of the comments you've made in this thread 
whether you've ever confirmed *exactly* what you are getting back from 
solr, ignoring the PHP completely. (ie: you refer to "UTF-8 for all of the 
web pages" suggesting you are only looking at some web application which 
is consuming data from solr)

What do you see when you use something like curl to talk to solr directly 
and inspect the raw bytes (in both directions)?

For example...

$ echo '[{"id":"HOSS","fr_s":"téléphone"}]' > french.json
$ # sanity check that my shell didn't bork the utf8
$ cat french.json | uniname -ap
character  byte   UTF-32   encoded as glyph   name
   23 23  E9   C3 A9  é  LATIN SMALL LETTER E WITH 
ACUTE
   25 26  E9   C3 A9  é  LATIN SMALL LETTER E WITH 
ACUTE
$ curl -sS -X POST 'http://localhost:8983/solr/collection1/update?commit=true' 
-H 'Content-Type: application/json' -d @french.json 
{"responseHeader":{"status":0,"QTime":445}}
$ curl -sS 
'http://localhost:8983/solr/collection1/select?q=id:HOSS&wt=json&omitHeader=true&indent=true'
{
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":"HOSS",
"fr_s":"téléphone",
"_version_":1475795659384684544}]
  }}
$ curl -sS 
'http://localhost:8983/solr/collection1/select?q=id:HOSS&wt=json&omitHeader=true&indent=true'
 | uniname -ap
character  byte   UTF-32   encoded as glyph   name
   94 94  E9   C3 A9  é  LATIN SMALL LETTER E WITH 
ACUTE
   96 97  E9   C3 A9  é  LATIN SMALL LETTER E WITH 
ACUTE



One other cool diagnostic trick you can use, if the data coming back 
over the wire is definitely no longer utf8, is to leverage the "python" 
response writer, because it generates "\uXXXX" escape sequences for 
non-ASCII strings at the solr level -- if those are correct, that helps 
you clearly identify that it's the HTTP layer where your values are 
getting corrupted...

$ curl -sS 
'http://localhost:8983/solr/collection1/select?q=id:HOSS&wt=python&omitHeader=true&indent=true'
{
  'response':{'numFound':1,'start':0,'docs':[
  {
'id':'HOSS',
'fr_s':u't\u00e9l\u00e9phone',
'_version_':1475795807492898816}]
  }}


-Hoss
http://www.lucidworks.com/

Re: Disabling transaction logs

2014-08-07 Thread Anshum Gupta
Hi Nitin,

To answer your question first, yes, you can disable the transaction log by
commenting out/removing the <updateLog> section of solrconfig.xml.
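For reference, that section normally sits inside the <updateHandler> block; in the
stock example config it looks like this (the dir value may differ in your setup):

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- autoCommit, autoSoftCommit, etc. -->
</updateHandler>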

At the same time, I'd highly recommend not disabling transaction logs. They
are needed for NRT, peer sync, high availability/disaster recovery parts of
SolrCloud i.e. a lot of what makes SolrCloud depends on these logs. When
you say you want a robust restart, I think that is what you're getting
right now. If you mean to make the entire process faster, read the post
below and you should be in a much better position.

Here's a writeup by Erick Erickson on soft/hard commits and transaction logs
in Solr that would help you understand this better.
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/


On Thu, Aug 7, 2014 at 9:12 AM, KNitin  wrote:

> Hello
>
> I am using solr 4.6.1 with over 1000 collections and 8 nodes. Restarting of
> nodes takes a long time (especially if we have indexing running against it)
> . I want to see if disabling transaction logs can help with a better robust
> restart. However I can't see any docs around disabling txn logs in
> solrcloud
>
> Can anyone help with info on how to disable transaction logs ?
>
>
> Thanks
> Nitin
>



-- 

Anshum Gupta
http://www.anshumgupta.net


Change order of spell checker suggestions issue

2014-08-07 Thread Corey Gerhardt
Solr Rev: 4.6 Lucidworks: 2.6.3

This is sort of a repeat question, sorry.

In the solrconfig.xml, will changing the value for the comparatorClass affect 
the sort of suggestions returned?

This is my spellcheck component:


false
true
5


textSpell


org.apache.solr.spelling.DirectSolrSpellChecker
default
spell
internal
0.5
2
1
5
score
1
4
0.01
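
For orientation, a DirectSolrSpellChecker definition with a comparatorClass typically
looks like the sketch below; the parameter names here follow the stock example
solrconfig.xml and are assumptions mapped onto the values above, not a copy of the
actual file. Setting comparatorClass to "freq" instead of "score" sorts candidates by
frequency rather than by score.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="classname">org.apache.solr.spelling.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <str name="comparatorClass">score</str>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>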

  

Searching for unie produces the following suggestions, but the suggestions 
appear to me to be ordered by frequency (I've indicated the Levenshtein distance in []):



suggestion [edit distance]   frequency
unity [ 3 ]                  1200
unger [ 3 ]                  119
unick [ 3 ]                  16
united [ 4 ]                 16
unique [ 4 ]                 10
unity [ 3 ]                  7
unser [ 3 ]                  7
unyi [ 2 ]                   7



Is something configured incorrectly or am I just needing more coffee?


Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-07 Thread Ali Nazemian
Dear Erick,
Could you please name those problems that SolrCloud cannot tackle
alone? Maybe I need SolrCloud + Hadoop and am not aware of it yet.
Regards.


On Thu, Aug 7, 2014 at 7:37 PM, Erick Erickson 
wrote:

> If SolrCloud meets your needs, without Hadoop, then
> there's no real reason to introduce the added complexity.
>
> There are a bunch of problems that do _not_ work
> well with SolrCloud over non-Hadoop file systems. For
> those problems, the combination of SolrCloud and Hadoop
> make tackling them possible.
>
> Best,
> Erick
>
>
> On Thu, Aug 7, 2014 at 3:55 AM, Ali Nazemian 
> wrote:
>
> > Thank you very much. But why we should go for solr distributed with
> hadoop?
> > There is already solrCloud which is pretty applicable in the case of big
> > index. Is there any advantage for sending indexes over map reduce that
> > solrCloud can not provide?
> > Regards.
> >
> >
> > On Wed, Aug 6, 2014 at 9:09 PM, Erick Erickson 
> > wrote:
> >
> > > bq: Are you aware of Cloudera search? I know they provide an integrated
> > > Hadoop ecosystem.
> > >
> > > What Cloudera Search does via the MapReduceIndexerTool (MRIT) is
> create N
> > > sub-indexes for
> > > each shard in the M/R paradigm via EmbeddedSolrServer. Eventually,
> these
> > > sub-indexes for
> > > each shard are merged (perhaps through some number of levels) in the
> > reduce
> > > phase and
> > > maybe merged into a live Solr instance (--go-live). You'll note that
> this
> > > tool requires the
> > > address of the ZK ensemble from which it can get the network topology,
> > > configuration files,
> > > all that rot. If you don't use the --go-live option, the output is
> still
> > a
> > > Solr index, it's just that
> > > the index for each shard is left in a specific directory on HDFS. Being
> > on
> > > HDFS allows
> > > this kind of M/R paradigm for massively parallel indexing operations,
> and
> > > perhaps massively
> > > complex analysis.
> > >
> > > Nowhere is there any low-level non-Solr manipulation of the indexes.
> > >
> > > The Flume fork just writes directly to the Solr nodes. It knows about
> the
> > > ZooKeeper
> > > ensemble and the collection too and communicates via SolrJ I'm pretty
> > sure.
> > >
> > > As far as integrating with HDFS, you're right, HA is part of the
> package.
> > > As far as using
> > > the Solr indexes for analysis, well you can write anything you want to
> > use
> > > the Solr indexes
> > > from anywhere in the M/R world and have them available from anywhere in
> > the
> > > cluster. There's
> > > no real need to even have Solr running, you could use the output from
> > MRIT
> > > and access the
> > > sub-shards with the EmbeddedSolrServer if you wanted, leaving out all
> the
> > > pesky servlet
> > > container stuff.
> > >
> > > bq: So why we go for HDFS in the case of analysis if we want to use
> SolrJ
> > > for this purpose?
> > > What is the point?
> > >
> > > Scale and data access in a nutshell. In the HDFS world, you can scale
> > > pretty linearly
> > > with the number of nodes you can rack together.
> > >
> > > Frankly though, if your data set is small enough to fit on a single
> > machine
> > > _and_ you can get
> > > through your analysis in a reasonable time (reasonable here is up to
> > you),
> > > then HDFS
> > > is probably not worth the hassle. But in the big data world where we're
> > > talking petabyte scale,
> > > having HDFS as the underpinning opens up possibilities for working on
> > data
> > > that were
> > > difficult/impossible with Solr previously.
> > >
> > > Best,
> > > Erick
> > >
> > >
> > >
> > > On Tue, Aug 5, 2014 at 9:37 PM, Ali Nazemian 
> > > wrote:
> > >
> > > > Dear Erick,
> > > > I remembered some times ago, somebody asked about what is the point
> of
> > > > modify Solr to use HDFS for storing indexes. As far as I remember
> > > somebody
> > > > told him integrating Solr with HDFS has two advantages. 1) having
> > hadoop
> > > > replication and HA. 2) using indexes and Solr documents for other
> > > purposes
> > > > such as Analysis. So why we go for HDFS in the case of analysis if we
> > > want
> > > > to use SolrJ for this purpose? What is the point?
> > > > Regards.
> > > >
> > > >
> > > > On Wed, Aug 6, 2014 at 8:59 AM, Ali Nazemian 
> > > > wrote:
> > > >
> > > > > Dear Erick,
> > > > > Hi,
> > > > > Thank you for you reply. Yeah I am aware that SolrJ is my last
> > option.
> > > I
> > > > > was thinking about raw I/O operation. So according to your reply
> > > probably
> > > > > it is not applicable somehow. What about the Lily project that
> > Michael
> > > > > mentioned? Is that consider SolrJ too? Are you aware of Cloudera
> > > search?
> > > > I
> > > > > know they provide an integrated Hadoop ecosystem. Do you know what
> is
> > > > their
> > > > > suggestion?
> > > > > Best regards.
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Aug 6, 2014 at 12:28 AM, Erick Erickson <
> > > erickerick...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > >> 

RE: Change order of spell checker suggestions issue

2014-08-07 Thread Dyer, James
Corey,

Looking more carefully at your responses than I did last time I answered this 
question, it looks like every correction is 2 edits in this example.  

unie > unity (e>t , insert y)
unie > unger (i>g , insert r)
unie > unick (e>c , insert k)
unie > united (insert t , insert d)
unie > unique (insert q , insert u)
unie > unity (e>t , insert y)
unie > unser (i>s , insert r)
unie > unyi (i>y , e>i)

So both "score" and "freq" will give it to you by frequency.  Usually when I'm 
in doubt of something like this working like it should, I try to come up with 
more than 1 clear-cut example.
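
If it helps to double-check, a plain Levenshtein computation confirms the two-edit
counts above; here is a quick stand-alone sketch in Java (not Solr's internal
DirectSpellChecker scoring, just the textbook dynamic-programming distance):

public class EditDistance {
    // Classic dynamic-programming Levenshtein distance.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        String query = "unie";
        String[] suggestions = {"unity", "unger", "unick", "united",
                                "unique", "unser", "unyi"};
        for (String s : suggestions) {
            // Each of these prints a distance of 2.
            System.out.println(query + " -> " + s + " : " + distance(query, s));
        }
    }
}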

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Corey Gerhardt [mailto:corey.gerha...@directwest.com] 
Sent: Thursday, August 07, 2014 11:31 AM
To: Solr User List
Subject: Change order of spell checker suggestions issue

Solr Rev: 4.6 Lucidworks: 2.6.3

This is sort of a repeat question, sorry.

In the solrconfig.xml, will changing the value for the comparatorClass affect 
the sort of suggestions returned?

This is my spellcheck component:


false
true
5


textSpell


org.apache.solr.spelling.DirectSolrSpellChecker
default
spell
internal
0.5
2
1
5
score
1
4
0.01

  

Searching for unie produces the following suggestions. But the suggestions 
appear to me to be by frequency (I've indicated Levenshtein distance in []):



unity [ 3  ]

1200





unger [ 3  ]

119





unick [ 3 ]

16





united [ 4 ]

16





unique [ 4 ]

10





unity [ 3 ]

7





unser [ 3 ]

7





unyi [ 2 ]

7



Is something configured incorrectly or am I just needing more coffee?



Wrong XSLT used in translation

2014-08-07 Thread Christopher Gross
Solr 4.1, in SolrCloud mode.  3 nodes configured, running in Tomcat 7 w/
Java 7.

I have a few cores set up, let's just call them A, B, C and D.   They have
some uniquely named xslt files, but they all have a "rss.xsl" file.

Sometimes, on just 1 of the nodes, if I do a query for something in A and
translate it with the rss.xsl, it will do the query just fine and give the
right number of results (solr logged the query and had it going to the
correct core), but it uses B or C's rss.xsl.  Since the schemas are
different, the xml is mostly empty.  A refresh will have it go back to
using the correct rss.xsl.

Has anyone run into a problem like this?  Is it a problem with the 4.1
Solr?  Will upgrading fix it?

Is it a better practice to uniquely name the xslt files for each core
(having a-rss.xsl, b-rss.xsl, etc)?

Any help/thoughts would be appreciated.

-- Chris


Re: Wrong XSLT used in translation

2014-08-07 Thread Shawn Heisey
On 8/7/2014 1:46 PM, Christopher Gross wrote:
> Solr 4.1, in SolrCloud mode.  3 nodes configured, Running in Tomcat 7 w/
> Java 7.
>
> I have a few cores set up, let's just call them A, B, C and D.   They have
> some uniquely named xslt files, but they all have a "rss.xsl" file.
>
> Sometimes, on just 1 of the nodes, if I do a query for something in A and
> translate it with the rss.xsl, it will do the query just fine and give the
> right number of results (solr logged the query and had it going to the
> correct core), but it uses B or C's rss.xsl.  Since the schemas are
> different, the xml is mostly empty.  A refresh will have it go back to
> using the correct rss.xsl.
>
> Has anyone run into a problem like this?  Is it a problem with the 4.1
> Solr?  Will upgrading fix it?
>
> Is it a better practice to uniquely name the xslt files for each core
> (having a-rss.xsl, b-rss.xsl, etc)?

I wonder if Solr might have a bug with XSLT caching, where the cache is
global and simply looks at the base filename, not the full path.  If it
works when you use xsl files with different names, then that is the most
likely problem.
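
If the per-core renaming route is taken, each core would keep its own file under
conf/xslt/ and each request would select it explicitly with the tr parameter, for
example (host, core, and file names below are placeholders):

http://localhost:8983/solr/coreA/select?q=*:*&wt=xslt&tr=a-rss.xsl
http://localhost:8983/solr/coreB/select?q=*:*&wt=xslt&tr=b-rss.xsl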

If you determine that the bug I mentioned is what's happening, before
filing a bug in Jira, we need to determine whether it's still a problem
in the latest version.  Version 4.1 came out in January 2013.  Upgrading
is definitely advised, if you can do it.

Thanks,
Shawn



Re: Anybody uses Solr JMX?

2014-08-07 Thread Otis Gospodnetic
Hi Paul,

There are lots of people/companies using SPM for Solr/SolrCloud and I don't
recall anyone saying SPM agent collecting metrics via JMX had a negative
impact on Solr performance.  That said, some people really dislike JMX and
some open source projects choose to expose metrics via custom stats APIs or
even files.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Aug 6, 2014 at 11:18 PM, Paul Libbrecht  wrote:

> Hello Otis,
>
> this looks like an excellent idea!
> I'm in need of that, erm… last week and probably this one too.
>
> Is there not a risk that reading certain JMX properties actually hogs the
> process? (or is it by design that MBeans are supposed to be read without
> any lock effect?).
>
> thanks for the hint.
>
> paul
>
>
>
> On 6 mai 2014, at 04:43, Otis Gospodnetic 
> wrote:
>
> > Alexandre, you could use something like
> > http://blog.sematext.com/2012/09/25/new-tool-jmxc-jmx-console/ to
> quickly
> > dump everything out of JMX and see if there is anything there Solr Admin
> UI
> > doesn't expose.  I think you'll find there is more in JMX than Solr Admin
> > UI shows.
> >
> > Otis
> > --
> > Performance Monitoring * Log Analytics * Search Analytics
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> > On Mon, May 5, 2014 at 1:56 AM, Alexandre Rafalovitch <
> arafa...@gmail.com>wrote:
> >
> >> Thank you everybody for the links and explanations.
> >>
> >> I am still curious whether JMX exposes more details than the Admin UI?
> >> I am thinking of a troubleshooting context, rather than long-term
> >> monitoring one.
> >>
> >> Regards,
> >>   Alex.
> >> Personal website: http://www.outerthoughts.com/
> >> Current project: http://www.solr-start.com/ - Accelerating your Solr
> >> proficiency
> >>
> >>
> >> On Mon, May 5, 2014 at 12:21 PM, Gora Mohanty 
> wrote:
> >>> On May 5, 2014 7:09 AM, "Alexandre Rafalovitch" 
> >> wrote:
> 
 I have religiously kept the <jmx/> statement in my solrconfig.xml, thinking
>  it was enabling the web interface statistics output.
> 
>  But looking at the server logs really closely, I can see that JMX is
>  actually disabled without server present. And the Admin UI does not
>  actually seem to care after a quick test.
> 
>  Does anybody have a real experience with Solr JMX? Does it expose more
>  information than Admin UI's Plugins/Stats page? Is it good for
> 
> >>>
> >>> Have not been using JMX lately, but we were using it in the past. It
> does
> >>> allow monitoring many useful details. As others have commented, it also
> >>> integrates well with other monitoring  tools as JMX is a standard.
> >>>
> >>> Regards,
> >>> Gora
> >>
>
>


Re: Anybody uses Solr JMX?

2014-08-07 Thread rulinma
useful.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Anybody-uses-Solr-JMX-tp4134598p4151820.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to sync lib directory in SolrCloud?

2014-08-07 Thread rulinma
mark.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-sync-lib-directory-in-SolrCloud-tp4150405p4151821.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: why solr commit with serval docs

2014-08-07 Thread rulinma
code error by my colleague.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/why-solr-commit-with-serval-docs-tp4150583p4151822.html
Sent from the Solr - User mailing list archive at Nabble.com.