Re: Optimize SolrCloud without downtime

2015-03-31 Thread Pavel Hladik
While we are indexing I see the number of deleted docs changing a bit. I was
surprised that when a developer reindexed the 120M-doc index, we had around
110M deleted docs and that number was not falling. As you wrote, the typical
behavior should be that merging keeps deleted docs at around 10-20% of the
whole index? So after two weeks it should be around 20M deleted docs.

I'm not sure of settings in our solrconfig.xml:


<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>

<mergeFactor>10</mergeFactor>

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>

Should we change some of them? The mergeScheduler class is empty.

When I go to Core Admin, select our core I see:

maxCacheMB=48.0 maxMergeSizeMB=4.0

Is that ok, or the values are too low?

Best,
Pavel



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Optimize-SolrCloud-without-downtime-tp4195170p4196506.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Same schema.xml is loaded for different cores in SolrCloud

2015-03-31 Thread Zheng Lin Edwin Yeo
Yes, I've deleted my previous collections, re-uploaded the configs using zkcli,
and created my collections using the collections API thereafter. It's working
now.

Thanks Erick.

Regards,
Edwin


On 31 March 2015 at 13:55, Erick Erickson  wrote:

> By now, I wouldn't particularly trust my setup. I'd blow it away and start
> over.
>
> bootstrapping is _only_ required to get the configs up to Zookeeper
> the first time. In fact I suggest you don't use it at all. Just start
> SolrCloud, and use zkcli to push the configs up. Thereafter, create
> your collections using the collections API.
>
> Zookeeper is just a central repository for your configs and the
> overall state of your cluster. As far as config sets are concerned,
> think of the upconfig (or bootstrap) as copying the config files to a
> place where they can be found by any random Solr instance that starts
> up.
>
> And the same applies to parameters like numShards. It's _only_ used as
> a convenience for creating a cluster for demo purposes. Thereafter,
> any time you start up that particular cloud, it'll read the old
> cluster state and completely ignore the numShards parameter.
>
> Rather than try to untangle what you've done, I'd re-install and work
> through the tutorial step-by-step. I think you've jumped ahead and
> gotten some things mixed up as far as your cluster state is concerned.
>
> Best,
> Erick
>
> On Mon, Mar 30, 2015 at 6:52 PM, Zheng Lin Edwin Yeo
>  wrote:
> > Hi Erick,
> >
> > I've started shard2 with the following command instead, but it's still
> the
> > same problem.
> > java -DzkHost=localhost:9983 -Djetty.port=8984 -jar start.jar
> >
> > But you mean for shard1 we do not have to include "
> > -Dbootstrap_confdir=./solr/logmill/conf" for subsequent startup?
> >
> > Regards,
> > Edwin
> >
> >
> > On 31 March 2015 at 00:46, Erick Erickson 
> wrote:
> >
> >> OK, this is a bit confused:
> >>
> >> 1> You're starting two embedded Zookeepers but they don't know about
> >> each other. So looking for the configsets is a bit confused.
> >> 2> There's no need to do the bootstrap thing after the first time. The
> >> _very_ first time you do this it pushes the configs up to Zookeeper,
> >> but after that you should just reference the config name.
> >> 3> you specify the config name when you _create_ the collection, not
> >> when you start it up. You may already be doing this, but it's not clear
> >> from your startup commands.
> >> 4> I think you're confusing shards with collections. The equivalent of
> >> older-style cores would be just single-shard _collections_. Configs
> >> are associated on the collection level, not the shard level as all
> >> shards in a collection are presumed (indeed, _must_) use the same
> >> configuration.
> >>
> >> HTH,
> >> Erick
> >>
> >> On Mon, Mar 30, 2015 at 2:20 AM, Zheng Lin Edwin Yeo
> >>  wrote:
> >> > I've roughly know what is the problem from here.
> >> >
> >>
> http://stackoverflow.com/questions/23338324/zookeeper-multiple-collection-different-schema
> >> >
> >> > However, I couldn't find the zoo_data directory in all of my solr
> folder.
> >> > What could be the problem or where is the directory supposed to be
> >> located?
> >> >
> >> > Regards,
> >> > Edwin
> >> >
> >> >
> >> > On 30 March 2015 at 11:56, Zheng Lin Edwin Yeo 
> >> wrote:
> >> >
> >> >> Hi everyone,
> >> >>
> >> >> I've created a SolrCloud with multiple core, and I have different
> >> >> schema.xml for each of the core. However, when I start Solr, there's
> >> only
> >> >> one version of the schema.xml that is loaded onto Solr. Regardless of
> >> which
> >> >> core I go to, the schema.xml that is shown is the first one which I
> have
> >> >> loaded.
> >> >>
> >> >> What I did was, I have 3 cores: logmill, collection1 and collection2.
> >> >> Each of the core has 2 shrads: shard1 and shard2
> >> >>
> >> >> I first started the Solr with shard1 using the following command:
> >> >> java -Dcollection.configName=logmill -DzkRun -DnumShards=2
> >> >> -Dbootstrap_confdir=./solr/logmill/conf -jar start.jar
> >> >>
> >> >> After that I start shard2 using the following command:
> >> >> java -Dcollection.configName=logmill -DzkRun -DnumShards=2
> >> >> -Dbootstrap_confdir=./solr/logmill/conf -jar start.jar
> >> >>
> >> >> All the schema.xml loaded are from logmill core, even for the
> >> collection1
> >> >> and collection2.
> >> >>
> >> >> Even after I change the command to start shard1 with the following
> >> >> command, all the schema.xml are still from logmill
> >> >> java -Dcollection.configName=collection1 -DzkRun
> >> >> -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -jar
> start.jar
> >> >>
> >> >>
> >> >> How do I get Solr to read the different schema.xml for the different
> >> cores?
> >> >>
> >> >> Regards,
> >> >> Edwin
> >> >>
> >>
>
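As a concrete sketch of the workflow Erick describes (the script location and
names below are illustrative, not Edwin's exact paths):

# push a configuration set up to ZooKeeper under the name "collection1conf"
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 \
  -cmd upconfig -confdir ./solr/collection1/conf -confname collection1conf

# then create the collection against that config set via the Collections API
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&collection.configName=collection1conf"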


Re: SolrJ commit with openSearcher=false

2015-03-31 Thread vidit.asthana
Thanks for the reply, Shawn. I will try it out.

The reason that I am forced to do a hard commit through code is to handle a
problem I am facing with transaction logs.

I am forced to delete tlogs manually at regular interval and hence I want to
issue a hard commit before deleting them to ensure that no data loss happens
in case of node failure.

I have explained the issue in detail in another thread -
http://lucene.472066.n3.nabble.com/Transaction-logs-not-getting-deleted-td4184635.html

If you can provide me some help in finding the fix for the issue, then it
would be a huge help for me. 






--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-commit-with-openSearcher-false-tp4196499p4196527.html
Sent from the Solr - User mailing list archive at Nabble.com.
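
For reference, a hard commit with openSearcher=false can be issued from SolrJ
roughly like this; it is only a sketch, and "server" stands in for whatever
SolrServer (SolrClient in 5.x) instance the application already holds:

import org.apache.solr.client.solrj.SolrServer;                     // SolrClient in 5.x
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

void hardCommitNoNewSearcher(SolrServer server) throws Exception {
    UpdateRequest commit = new UpdateRequest();
    commit.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); // waitFlush, waitSearcher
    commit.setParam("openSearcher", "false");  // hard commit, but keep the current searcher
    commit.process(server);
}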


Re: Collapse and Expand behaviour on result with 1 document.

2015-03-31 Thread Joel Bernstein
You should be able to use collapse/expand with one result.

Does the document in the main result set have group members that aren't
being expanded?



Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 31, 2015 at 2:00 AM, Derek Poh  wrote:

> If I want to group the results (by a certain field) even if there is only
> 1 document, I should use the group parameter instead?
> The requirement is to group the result of product documents by their
> supplier id.
> "&group=true&group.field=P_SupplierId&group.limit=5"
>
> Is it true that the performance of collapse is better than group parameter
> on large data set, say 10-20 million documents?
>
> -Derek
>
>
> On 3/31/2015 10:03 AM, Joel Bernstein wrote:
>
>> The expanded section will only include groups that have expanded
>> documents.
>>
>> So, if the document that in the main result set has no documents to
>> expand,
>> then this is working as expected.
>>
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Mon, Mar 30, 2015 at 8:43 PM, Derek Poh 
>> wrote:
>>
>>  Hi
>>>
>>> I have a query which return 1 document.
>>> When I add the collapse and expand parameters to it,
>>> "&expand=true&expand.rows=5&fq={!collapse%20field=P_SupplierId}", the
>>> expanded section is empty ().
>>>
>>> Is this the behaviour of collapse and expand parameters on result which
>>> contain only 1 document?
>>>
>>> -Derek
>>>
>>>
>>>
>>>
>


Re: Collapse and Expand behaviour on result with 1 document.

2015-03-31 Thread Joel Bernstein
The way that collapse/expand is designed to be used is as follows:

The main result set will contain the collapsed group heads.

The expanded section will contain the expanded groups for the page of
results.

To render the page you iterate the main result set. For each document check
to see if there is an expanded group.




Joel Bernstein
http://joelsolr.blogspot.com/
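
A sketch of that render loop in SolrJ, using the field name from Derek's
example (the query setup here is illustrative, not code from this thread):

import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;              // SolrClient in 5.x
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

void renderPage(SolrServer server) throws Exception {
    SolrQuery q = new SolrQuery("some query");
    q.addFilterQuery("{!collapse field=P_SupplierId}");
    q.set("expand", "true");
    q.set("expand.rows", "5");

    QueryResponse rsp = server.query(q);
    Map<String, SolrDocumentList> expanded = rsp.getExpandedResults();

    for (SolrDocument head : rsp.getResults()) {              // collapsed group heads
        Object supplierId = head.getFieldValue("P_SupplierId");
        SolrDocumentList members =
            (supplierId == null) ? null : expanded.get(supplierId.toString());
        // render 'head'; if 'members' is non-null, render the expanded group under it
    }
}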

On Tue, Mar 31, 2015 at 7:37 AM, Joel Bernstein  wrote:

> You should be able to use collapse/expand with one result.
>
> Does the document in the main result set have group members that aren't
> being expanded?
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Mar 31, 2015 at 2:00 AM, Derek Poh  wrote:
>
>> If I want to group the results (by a certain field) even if there is only
>> 1 document, I should use the group parameter instead?
>> The requirement is to group the result of product documents by their
>> supplier id.
>> "&group=true&group.field=P_SupplierId&group.limit=5"
>>
>> Is it true that the performance of collapse is better than group
>> parameter on large data set, say 10-20 million documents?
>>
>> -Derek
>>
>>
>> On 3/31/2015 10:03 AM, Joel Bernstein wrote:
>>
>>> The expanded section will only include groups that have expanded
>>> documents.
>>>
>>> So, if the document that in the main result set has no documents to
>>> expand,
>>> then this is working as expected.
>>>
>>>
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>> On Mon, Mar 30, 2015 at 8:43 PM, Derek Poh 
>>> wrote:
>>>
>>>  Hi

 I have a query which return 1 document.
 When I add the collapse and expand parameters to it,
 "&expand=true&expand.rows=5&fq={!collapse%20field=P_SupplierId}", the
 expanded section is empty ().

 Is this the behaviour of collapse and expand parameters on result which
 contain only 1 document?

 -Derek




>>
>


What's the need for <copyField> when you have "fq"

2015-03-31 Thread Steven White
Hi folks,

I'm new to Solr and I have a question about <copyField>, "q" and "fq".

If I have 50 fields in a Solr doc and I index them without doing any
<copyField> to a catch-all-field called "all_text".  During search I use
"fq" to list all the 50 fields to search on.  Now how different is this
from not using "fq" and searching against my catch-all-field of "all_text"
using "q"?

It seems to me that using <copyField> is a waste of space, and it also seems
to me that with "fq" I have better control over which fields will be
searched against.  Also, using "fq" I'm assuming my search terms will be
analyzed using each field's analyzer, in effect giving me better control
over scoring and results.

Have I got this right, or am I missing something?

The problem that I'm trying to solve is this: user-A can search on a set of
fields which is different from user-B's.  Given this, why should I bother to
use <copyField> when my search will *always* be against a specific set of fields?

Note: I may be mixing up "fq" with "qf" or even "uf".  Is "uf" what I should
be using vs. "fq"?

Thanks!

Steve


RE: What's the need for <copyField> when you have "fq"

2015-03-31 Thread Toke Eskildsen
Steven White [swhite4...@gmail.com] wrote:
> If I have 50 fields in a Solr doc and I index them without doing any
> <copyField> to a catch-all-field called "all_text".  During search I use
> "fq" to list all the 50 fields to search on.  Now how different is this
> from not using "fq" and searching against my catch-all-field of "all_text"
> using "q"?

One potential use is to have the catch-all-field perform severe normalization 
to match more queries but rank those extra matches lower than a direct hit in a 
specific field. The same effect can be accomplished by having differently 
analyzed versions of the same logical field: Having a single catch-all is just 
easy to do.

Another reason can be performance: fq-matching against all fields is heavier 
than matching against a few fields and the catch-all.

- Toke Eskildsen


RE: how do you replicate solr-cloud between datacenters?

2015-03-31 Thread Davis, Daniel (NIH/NLM) [C]
I got the answer to my most recent question without even asking it!
Thanks

-Original Message-
From: Jack Krupansky [mailto:jack.krupan...@gmail.com] 
Sent: Monday, March 30, 2015 6:40 PM
To: solr-user@lucene.apache.org
Subject: Re: how do you replicate solr-cloud between datacenters?

That's an open issue. See:
https://issues.apache.org/jira/browse/SOLR-6273

-- Jack Krupansky

On Mon, Mar 30, 2015 at 5:45 PM, Timothy Ehlers  wrote:

> Can you use /replication ??? How would you do this between datacenters?
>
> --
> Tim Ehlers
>


FYI: danizen and me

2015-03-31 Thread Davis, Daniel (NIH/NLM) [C]
In the wake of Hillary Clinton's email, I'll be asking questions about work 
related stuff as daniel.da...@nih.gov.   
https://github.com/danizen/ is both work and personal, as is the norm for 
github.

Disclaimer - posts made by daniel.da...@nih.gov do 
not represent Medical Sciences and Computing or the National Library of 
Medicine, but you can assume that questions posted as 
daniel.da...@nih.gov still have something to do 
with a work project.   d...@danizen.net will probably 
be a lurker at this point.

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH



Re: Optimize SolrCloud without downtime

2015-03-31 Thread Erick Erickson
I really don't have a good explanation here, those are the default
values and the folks who set them up no doubt chose them with some
care. Afraid I'll have to defer to people who actually know the
code...

Erick

On Mon, Mar 30, 2015 at 11:59 PM, Pavel Hladik
 wrote:
> When we indexing I see the deleted docs are a bit changing.. I was surprised
> when developer reindex 120M index, we had around 110M of deleted docs and
> this number was not falling. As you wrote, the typical behavior should be
> merging deleted docs to 10-20% of whole index? So it should be after two
> weeks around 20M of deleted docs.
>
> I'm not sure of settings in our solrconfig.xml:
>
> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>   <int name="maxMergeAtOnce">10</int>
>   <int name="segmentsPerTier">10</int>
> </mergePolicy>
>
> <mergeFactor>10</mergeFactor>
>
> <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
>
> Should we change some of them? The mergeScheduler class is empty.
>
> When I go to Core Admin, select our core I see:
>
> maxCacheMB=48.0 maxMergeSizeMB=4.0
>
> Is that ok, or the values are too low?
>
> Best,
> Pavel
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Optimize-SolrCloud-without-downtime-tp4195170p4196506.html
> Sent from the Solr - User mailing list archive at Nabble.com.
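
One knob worth experimenting with for the deleted-docs build-up, though not
something recommended in this thread, is TieredMergePolicy's
reclaimDeletesWeight; raising it makes the merge policy favor segments that
carry many deleted documents. A sketch on top of the defaults above:

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
  <double name="reclaimDeletesWeight">3.0</double>
</mergePolicy>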


solr.DictionaryCompoundWordTokenFilterFactory extracts words in string

2015-03-31 Thread Simon Martinelli
Hi,

I configured solr.DictionaryCompoundWordTokenFilterFactory using a
dictionary with the following content:

- lindor
- schlitten
- dorsch
- filet

I want to index the compound words

- dorschfilet
- lindorschlitten

dorschfilet is processed as expected

dorsch filet

but lindorschlitten is a compound of

lindor and schlitten

yet I get

lindor dorsch schlitten

so the filter is also extracting dorsch, even though the parts before (lin)
and after (litten) are not valid word parts.

Is there any better compound word filter for German?

Thanks, Simon
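
For reference, the stock factory does expose a few knobs that may help in
cases like this; whether onlyLongestMatch avoids the spurious "dorsch" here is
something to verify on the analysis screen, and the dictionary file name below
is just a placeholder:

<filter class="solr.DictionaryCompoundWordTokenFilterFactory"
        dictionary="german-compounds.txt"
        minWordSize="5"
        minSubwordSize="4"
        maxSubwordSize="15"
        onlyLongestMatch="true"/>

The other stock option for German is solr.HyphenationCompoundWordTokenFilterFactory,
which decompounds using a hyphenation grammar plus an optional dictionary.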


Re: SolrJ commit with openSearcher=false

2015-03-31 Thread Shawn Heisey
On 3/31/2015 2:56 AM, vidit.asthana wrote:
> Thanks for reply Shawn. I will try it out.
>
> The reason that I am forced to do a hard commit through code is to handle a
> problem I am facing with transaction logs.
>
> I am forced to delete tlogs manually at regular interval and hence I want to
> issue a hard commit before deleting them to ensure that no data loss happens
> in case of node failure.
>
> I have explained the issue in detail in another thread -
> http://lucene.472066.n3.nabble.com/Transaction-logs-not-getting-deleted-td4184635.html
>
> If you can provide me some help in finding the fix for the issue, then it
> would be a huge help for me. 

The first thing I would try is to set up autoCommit with a maxTime of
300000 (five minutes) and openSearcher set to false, as shown in the
comments of the example solrconfig.xml, although that example may
have a value of 15000 (15 seconds).  If that doesn't bring your
transaction logs under control, then you definitely are facing an
unusual situation or a bug where old and outdated transaction logs are
not being automatically deleted.  If it does appear that you've got a
bug, one of the first steps I would take is upgrading from 4.10.0 to
4.10.4 - it should be a drop-in replacement of your .war file and any
contrib jars, and I would delete the extracted version of the war before
restarting.

Is your data directory on a network filesystem, like NFS or SMB?  That
can sometimes cause weird problems with Solr.

Are you seeing any ERROR or WARN entries in your solr log?

Thanks,
Shawn
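
For reference, the autoCommit block Shawn describes looks like this in
solrconfig.xml (maxTime is in milliseconds, so 300000 is five minutes):

<autoCommit>
  <maxTime>300000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>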



Re: What's the need for <copyField> when you have "fq"

2015-03-31 Thread Erick Erickson
Yet a third is that <copyField> is often used when you want to treat
the same data different ways. For instance, consider a "title" field.
You might want to sort by title, but sorting on a tokenized field is
undefined so I might use a copyField from "title" to "title_sort" and
analyze the sort field with some kind of normalized sorting (including
lowercasing, removing leading articles, etc).

Another thing to consider is dynamic fields. You may not _know_ all
the fields up-front and putting them all in a single field to search
may make sense.

You're right though, if you want to construct entire clauses across
individual fields, you can do that explicitly or use edismax and
there's no need to copyField anything.

Best,
Erick


On Tue, Mar 31, 2015 at 6:47 AM, Toke Eskildsen  
wrote:
> Steven White [swhite4...@gmail.com] wrote:
>> If I have 50 fields in a Solr doc and I index them without doing any
>>  to a catch-all-field called "all_text".  During search I use
>> "fq" to list all the 50 fields to search on.  Now how different is this
>> from not using "fq" and searching against my catch-all-field of "all_text"
>> using "q"?
>
> One potential use it to have the catch-all-field perform severe normalization 
> to match more queries but rank those extra matches lower than a direct hit in 
> a specific field. The same effect can be accomplished by having differently 
> analyzed versions of the same logical field: Having a single catch-all is 
> just easy to do.
>
> Another reason can be performance: fq-matching against all fields is heavier 
> than matching against a few fields and the catch-all.
>
> - Toke Eskildsen
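
A minimal schema.xml sketch of the title/title_sort pattern Erick describes
might look like this (the "lowercase" field type is assumed to be a
KeywordTokenizer-plus-LowerCaseFilter type like the one in the example schema):

<field name="title"      type="text_general" indexed="true" stored="true"/>
<field name="title_sort" type="lowercase"    indexed="true" stored="false"/>

<copyField source="title" dest="title_sort"/>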


Re: SolrJ commit with openSearcher=false

2015-03-31 Thread Erick Erickson
Hmmm, you really shouldn't have to do this. What have you tried to
figure out why the strange node isn't getting cleaned up? Is there
anything in the Solr logs that might help?

Is it a Windows machine? Some of the delete semantics for Windows can
leave things around. What happens if you restart the server and
continue indexing, do some of the tlogs disappear (the hypothesis here
is that somehow the files are being held open).

Best,
Erick

On Tue, Mar 31, 2015 at 8:39 AM, Shawn Heisey  wrote:
> On 3/31/2015 2:56 AM, vidit.asthana wrote:
>> Thanks for reply Shawn. I will try it out.
>>
>> The reason that I am forced to do a hard commit through code is to handle a
>> problem I am facing with transaction logs.
>>
>> I am forced to delete tlogs manually at regular interval and hence I want to
>> issue a hard commit before deleting them to ensure that no data loss happens
>> in case of node failure.
>>
>> I have explained the issue in detail in another thread -
>> http://lucene.472066.n3.nabble.com/Transaction-logs-not-getting-deleted-td4184635.html
>>
>> If you can provide me some help in finding the fix for the issue, then it
>> would be a huge help for me.
>
> The first thing I would try is to set up autoCommit with a maxTime of
> 300000 (five minutes) and openSearcher set to false, as shown in the
> comments of the example solrconfig.xml, although that example may
> have a value of 15000 (15 seconds).  If that doesn't bring your
> transaction logs under control, then you definitely are facing an
> unusual situation or a bug where old and outdated transaction logs are
> not being automatically deleted.  If it does appear that you've got a
> bug, one of the first steps I would take is upgrading from 4.10.0 to
> 4.10.4 - it should be a drop-in replacement of your .war file and any
> contrib jars, and I would delete the extracted version of the war before
> restarting.
>
> Is your data directory on a network filesystem, like NFS or SMB?  That
> can sometimes cause weird problems with Solr.
>
> Are you seeing any ERROR or WARN entries in your solr log?
>
> Thanks,
> Shawn
>


Filtering in Solr

2015-03-31 Thread Steven White
Hi folks,

I need filtering capability just as described here for Lucene:
http://www.javaranch.com/journal/2009/02/filtering-a-lucene-search.html

"Filtering is a mechanism of narrowing the search space, allowing only a
subset of the documents to be considered as possible hits. They can be used
to implement search-within-search features to successively search within a
previous set of results *or to constrain the document search space for
security or external data reasons.* A security filter is a powerful
example, *allowing users to only see search results of documents they own
even if their query technically matches other documents that are off
limits;* we provide an example of a security filter in the section
"Security filters".

How do I get this behavior using Solr?

If there is an example, that's great.

Thanks

Steve


Re: Solr 5.0.0 and HDFS

2015-03-31 Thread Joseph Obernberger
I've tried to replicate the issue starting from scratch, but so far it 
hasn't happened again.


-Joe

On 3/28/2015 2:10 PM, Mark Miller wrote:

Hmm...can you file a JIRA issue with this info?

- Mark

On Fri, Mar 27, 2015 at 6:09 PM Joseph Obernberger 
wrote:


I just started up a two shard cluster on two machines using HDFS. When I
started to index documents, the log shows errors like this. They repeat
when I execute searches.  All seems well - searches and indexing appear
to be working.
Possibly a configuration issue?
My HDFS config:

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">160</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
  <bool name="solr.hdfs.blockcache.write.enabled">false</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">64</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">512</int>
  <str name="solr.hdfs.home">hdfs://nameservice1:8020/solr5</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf.cloudera.hdfs1</str>
</directoryFactory>
Thank you!

-Joe


java.lang.IllegalStateException: file:
BlockDirectory(HdfsDirectory@799d5a0e
lockFactory=org.apache.solr.store.hdfs.HdfsLockFactory@49838b82) appears
both in delegate and in cache: cache=[_25.fnm, _2d.si, _2e.nvd, _2b.si,
_28.tvx, _2c.tvx, _1t.si, _27.nvd, _2b.tvd, _2d_Lucene50_0.pos, _23.nvd,
_28_Lucene50_0.doc, _28_Lucene50_0.dvd, _2d.fdt, _2c_Lucene50_0.pos,
_23.fdx, _2b_Lucene50_0.doc, _2d.nvm, _28.nvd, _23.fnm,
_2b_Lucene50_0.tim, _2e.fdt, _2d_Lucene50_0.doc, _2b_Lucene50_0.dvd,
_2d_Lucene50_0.dvd, _2b.nvd, _2g.tvx, _28_Lucene50_0.dvm,
_1v_Lucene50_0.tip, _2e_Lucene50_0.dvm, _2e_Lucene50_0.pos, _2g.fdx,
_2e.nvm, _2f.fdx, _1s.tvd, _23.nvm, _27.nvm, _1s_Lucene50_0.tip,
_2c.fnm, _2b.fdt, _2d.fdx, _2c.fdx, _2c.nvm, _2e.fnm,
_2d_Lucene50_0.dvm, _28.nvm, _28.fnm, _2b_Lucene50_0.tip,
_2e_Lucene50_0.dvd, _2c.si, _2f.fdt, _2b.fnm, _2e_Lucene50_0.tip,
_28.si, _28_Lucene50_0.tip, _2f.tvd, _2d_Lucene50_0.tim, _2f.tvx,
_2b_Lucene50_0.pos, _2e.fdx, _28.fdx, _2c_Lucene50_0.dvd, _2g.tvd,
_2c_Lucene50_0.tim, _2b.nvm, _23.fdt, _1s_Lucene50_0.tim,
_28_Lucene50_0.tim, _2c_Lucene50_0.doc, _28.tvd, _2b.tvx, _2c.nvd,
_2b.fdx, _2c_Lucene50_0.tip, _2e_Lucene50_0.doc, _2e_Lucene50_0.tim,
_2c.fdt, _27.tvd, _2d.tvd, _2d.tvx, _28_Lucene50_0.pos,
_2b_Lucene50_0.dvm, _2e.si, _2e.tvd, _2d.fnm, _2c.tvd, _2g.fdt, _2e.tvx,
_28.fdt, _2d_Lucene50_0.tip, _2c_Lucene50_0.dvm,
_2d.nvd],delegate=[_10.fdt, _10.fdx, _10.fnm, _10.nvd, _10.nvm, _10.si,
_10.tvd, _10.tvx, _10_Lucene50_0.doc, _10_Lucene50_0.dvd,
_10_Lucene50_0.dvm, _10_Lucene50_0.pos, _10_Lucene50_0.tim,
_10_Lucene50_0.tip, _11.fdt, _11.fdx, _11.fnm, _11.nvd, _11.nvm, _11.si,
_11.tvd, _11.tvx, _11_Lucene50_0.doc, _11_Lucene50_0.dvd,
_11_Lucene50_0.dvm, _11_Lucene50_0.pos, _11_Lucene50_0.tim,
_11_Lucene50_0.tip, _12.fdt, _12.fdx, _12.fnm, _12.nvd, _12.nvm, _12.si,
_12.tvd, _12.tvx, _12_Lucene50_0.doc, _12_Lucene50_0.dvd,
_12_Lucene50_0.dvm, _12_Lucene50_0.pos, _12_Lucene50_0.tim,
_12_Lucene50_0.tip, _13.fdt, _13.fdx, _13.fnm, _13.nvd, _13.nvm, _13.si,
_13.tvd, _13.tvx, _13_Lucene50_0.doc, _13_Lucene50_0.dvd,
_13_Lucene50_0.dvm, _13_Lucene50_0.pos, _13_Lucene50_0.tim,
_13_Lucene50_0.tip, _14.fdt, _14.fdx, _14.fnm, _14.nvd, _14.nvm, _14.si,
_14.tvd, _14.tvx, _14_Lucene50_0.doc, _14_Lucene50_0.dvd,
_14_Lucene50_0.dvm, _14_Lucene50_0.pos, _14_Lucene50_0.tim,
_14_Lucene50_0.tip, _15.fdt, _15.fdx, _15.fnm, _15.nvd, _15.nvm, _15.si,
_15.tvd, _15.tvx, _15_Lucene50_0.doc, _15_Lucene50_0.dvd,
_15_Lucene50_0.dvm, _15_Lucene50_0.pos, _15_Lucene50_0.tim,
_15_Lucene50_0.tip, _1f.fdt, _1f.fdx, _1f.fnm, _1f.nvd, _1f.nvm, _1f.si,
_1f.tvd, _1f.tvx, _1f_Lucene50_0.doc, _1f_Lucene50_0.dvd,
_1f_Lucene50_0.dvm, _1f_Lucene50_0.pos, _1f_Lucene50_0.tim,
_1f_Lucene50_0.tip, _1g.fdt, _1g.fdx, _1g.fnm, _1g.nvd, _1g.nvm, _1g.si,
_1g.tvd, _1g.tvx, _1g_Lucene50_0.doc, _1g_Lucene50_0.dvd,
_1g_Lucene50_0.dvm, _1g_Lucene50_0.pos, _1g_Lucene50_0.tim,
_1g_Lucene50_0.tip, _1h.fdt, _1h.fdx, _1h.fnm, _1h.nvd, _1h.nvm, _1h.si,
_1h.tvd, _1h.tvx, _1h_Lucene50_0.doc, _1h_Lucene50_0.dvd,
_1h_Lucene50_0.dvm, _1h_Lucene50_0.pos, _1h_Lucene50_0.tim,
_1h_Lucene50_0.tip, _1i.fdt, _1i.fdx, _1i.fnm, _1i.nvd, _1i.nvm, _1i.si,
_1i.tvd, _1i.tvx, _1i_Lucene50_0.doc, _1i_Lucene50_0.dvd,
_1i_Lucene50_0.dvm, _1i_Lucene50_0.pos, _1i_Lucene50_0.tim,
_1i_Lucene50_0.tip, _1j.fdt, _1j.fdx, _1j.fnm, _1j.nvd, _1j.nvm, _1j.si,
_1j.tvd, _1j.tvx, _1j_Lucene50_0.doc, _1j_Lucene50_0.dvd,
_1j_Lucene50_0.dvm, _1j_Lucene50_0.pos, _1j_Lucene50_0.tim,
_1j_Lucene50_0.tip, _1k.fdt, _1k.fdx, _1k.fnm, _1k.nvd, _1k.nvm, _1k.si,
_1k.tvd, _1k.tvx, _1k_Lucene50_0.doc, _1k_Lucene50_0.dvd,
_1k_Lucene50_0.dvm, _1k_Lucene50_0.pos, _1k_Lucene50_0.tim,
_1k_Lucene50_0.tip, _1l.fdt, _1l.fdx, _1l.fnm, _1l.nvd, _1l.nvm, _1l.si,
_1l.tvd, _1l.tvx, _1l_Lucene50_0.doc, _1l_Lucene50_0.dvd,
_1l_Lucene50_0.dvm, _1l_Lucene50_0.pos, _1l_Lucene50_0.tim,
_1l_Lucene50_0.tip, _1m.fdt, _1m.fdx, _1m.fnm, _1m.nvd, _1m.nvm, _1m.si,
_1m.tvd, _1m.tvx, _1m_Lucene50_0.doc, _1m_Lucene50_0.dvd,
_1m_Lucene50_0.dvm, _1m_Lucene50_0.pos, _1m_Lucene50_0.tim,
_1m_Lucene50_0.tip, _1n.fdt, _1n.fdx, _1n.fnm, _1n.nvd, _1n.nvm, _1n.si,
_1n.tvd, _1n.tvx, _1n_Lucene50_0.doc, _1n_Lucene50_

Re: Filtering in Solr

2015-03-31 Thread Shawn Heisey
On 3/31/2015 12:25 PM, Steven White wrote:
> I need filtering capability just as described here for Lucene:
> http://www.javaranch.com/journal/2009/02/filtering-a-lucene-search.html
>
> "Filtering is a mechanism of narrowing the search space, allowing only a
> subset of the documents to be considered as possible hits. They can be used
> to implement search-within-search features to successively search within a
> previous set of results *or to constrain the document search space for
> security or external data reasons.* A security filter is a powerful
> example, *allowing users to only see search results of documents they own
> even if their query technically matches other documents that are off
> limits;* we provide an example of a security filter in the section
> "Security filters".
>
> How do I get this behavior using Solr?

https://wiki.apache.org/solr/CommonQueryParameters#fq
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-Thefq%28FilterQuery%29Parameter

Both of these include an explanation and examples.

The user-editable wiki talks about caching quite a lot, because when a
filter comes from the cache instead of being executed, it can REALLY
make things go faster.

Thanks,
Shawn
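
As a concrete illustration of the security-filter case in the quoted passage,
the application layer simply appends an fq for the current user on every
request (the owner_id field name is invented for the example):

q=any user query&fq=owner_id:user42

Because filter queries are cached independently of q, a per-user or per-group
filter like this gets the caching benefit Shawn mentions.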



Re: Spark-Solr in python

2015-03-31 Thread Timothy Potter
You'll need a python lib that uses a python ZooKeeper client to be
SolrCloud-aware so that you can do RDD like things, such as reading
from all shards in a collection in parallel. I'm not aware of any Solr
py libs that are cloud-aware yet, but it would be a good contribution
to upgrade https://github.com/toastdriven/pysolr to be SolrCloud-aware

On Mon, Mar 30, 2015 at 11:31 PM, Chaushu, Shani
 wrote:
> Hi,
> I saw there is a tool for reading solr into Spark RDD in JAVA
> I want to do something like this in python, is there any package in python 
> for reading solr into spark RDD?
>
> Thanks ,
> Shani
>
>
> -
> Intel Electronics Ltd.
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.


RE: Spark-Solr in python

2015-03-31 Thread Davis, Daniel (NIH/NLM) [C]
There is a pull request for that - 
https://github.com/toastdriven/pysolr/pull/138.   Depending on how you install 
Python modules, you could grab the code for that feature and run that version.

-Original Message-
From: Timothy Potter [mailto:thelabd...@gmail.com] 
Sent: Tuesday, March 31, 2015 4:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Spark-Solr in python

You'll need a python lib that uses a python ZooKeeper client to be 
SolrCloud-aware so that you can do RDD like things, such as reading from all 
shards in a collection in parallel. I'm not aware of any Solr py libs that are 
cloud-aware yet, but it would be a good contribution to upgrade 
https://github.com/toastdriven/pysolr to be SolrCloud-aware

On Mon, Mar 30, 2015 at 11:31 PM, Chaushu, Shani  
wrote:
> Hi,
> I saw there is a tool for reading solr into Spark RDD in JAVA I want 
> to do something like this in python, is there any package in python for 
> reading solr into spark RDD?
>
> Thanks ,
> Shani
>
>
> -
> Intel Electronics Ltd.
>
> This e-mail and any attachments may contain confidential material for 
> the sole use of the intended recipient(s). Any review or distribution 
> by others is strictly prohibited. If you are not the intended 
> recipient, please contact the sender and delete all copies.


Stopwords magic

2015-03-31 Thread Alex Sylka
My stopwords don't work as expected.
Here is part of my schema:
 











 














 

In stopwords.txt I have the following words: the, is, a;
I also have the following data in my fields:

deal_description - This is the my description
deal_title_terms - This is the deal title a terms (will be split into
terms)

When I try to search deal_description:
Example 1: "deal_description: *his is the m*" - I expect that document with
deal_description "This is the my description" will be returned
Example 2: "deal_description: *is th*" - I expect that nothing will be
found because "is" and "the" are stopwords.

When I try to search deal_title_terms:
Example 1: "deal_title_terms: *is*" - I expect that nothing will be found
because "is" is a stopword.
Example 2: "deal_title_terms: *is the deal*" - I expect that "is" and "the"
will be ignored and term "deal" will be found.
Example 3: "deal_title_terms: *title a terms*" - I expect that "a" will be
ignored and term "title terms" will be found.

Question 1: Why don't stopwords work for the "deal_description" field?
Question 2: Why aren't stopwords removed from my query for the
"deal_title_terms" field? (When I try to find *title a terms* it does not
find the "title terms" term.)
Question 3: Is there any way to show stopwords in search results but prevent
them from being searched? Example:

data: This is cool search engine
search query : "*is coo*" -> return "This is cool search engine"
search query : "*is*" -> return nothing
search query : "*This coll*" -> return "This is cool search engine"

Question 4: *Where can I find a detailed description (maybe with examples) of
how stopwords work in Solr? Because it looks like magic.*


How to find out which fields a search came from

2015-03-31 Thread Steven White
Hi folks,

When I get my hits back from Solr, is there a way to find out which
fields my search terms matched in?

For example, if the indexed document is:

  doc_1:
title = From Russia with Love
director = Terence Young
starting = Sean Connery, Redro Amendariz, Lotte Lenya,
music_by = John Barry
doc_2:
title = Goldfinger
director = Guy Hamilton
starting = Sean Connery, Honor Blackman, Gert Frobe
music_by = John Barry
doc_3:
title = Skyfall
director = Sam Mendes
starting = Daniel Craig, Javier Bardem, Ralph Fiennes
music_by = Thomas Newman

If my search term is "love john barry guy", Solr will tell me I have a hit
in doc_1 and doc_2.  But I also need to know in which fields my search
terms matched.  How can Solr tell me that doc_1::title and doc_1::music_by
and doc_2::music_by are where my search terms matched?

It looks to me that the highlighter does this, but I need this feature
without enabling the highlighter.

Thanks!

Steve


Re: Stopwords magic

2015-03-31 Thread Jack Krupansky
Use the Solr Admin UI analysis page to see how the text is analyzed at both
index and query time.

My e-book does have more narrative and examples for stop word processing:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

On Tue, Mar 31, 2015 at 5:41 PM, Alex Sylka  wrote:

> My stopwords don't works as expected.
> Here is part of my schema:
>  
> 
> 
>  words="stopwords.txt" enablePositionIncrements="true"/>
> 
> 
> 
> 
>  words="stopwords.txt" enablePositionIncrements="true"/>
> 
> 
> 
>  
> 
> 
> 
>  words="stopwords.txt" enablePositionIncrements="false"/>
> 
> 
>  outputUnigrams="true" outputUnigramsIfNoShingles="false"/>
> 
> 
> 
> 
>  words="stopwords.txt" enablePositionIncrements="false"/>
> 
> 
>   stored="false" required="false" multiValued="true"/>
>  stored="true" required="false" multiValued="false"/>
> In stopwords.txt I have next words: the, is, a;
> Also I have next data in my fields:
>
> deal_description - This is the my description
> deal_title_terms - This is the deal title a terms (will be splitted in
> terms)
>
> When I try to search deal_description:
> Example 1: "deal_description: *his is the m*" - I expect that document with
> deal_description "This is the my description" will be returned
> Example 2: "deal_description: *is th*" - I expect that nothing will be
> found because "is" and "the" are stopwords.
>
> When I try to search deal_title_terms:
> Example 1: "deal_title_terms: *is*" - I expect that nothing will be found
> because "is" is stopword.
> Example 2: "deal_title_terms: *is the deal*" - I expect that "is" and "the"
> will be ignored and term "deal" will be found.
> Example 3: "deal_title_terms: *title a terms*" - I expect that "a" will be
> ignored and term "title terms" will be found.
>
> Question 1: Why stopwords don't works for "deal_description" field ?
> Question 2: Why for field "deal_title_terms" stopwords not removed for my
> query ?(When I am trying to find *title a terms* it will not find "title
> terms" term)
> Question 3: Is there any way to show stopwords in search result but prevent
> them from searching ? Example:
>
> data: This is cool search engine
> search query : "*is coo*" -> return "This is cool search engine"
> search query : "*is*" -> return nothing
> search query : "*This coll*" -> return "This is cool search engine"
>
> Question 4: *Where I can find detailed description (maybe with examples)
> how stopwords works in solr ? Because it looks like magic.*
>
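
As a generic point of reference (this is a textbook field type, not Alex's
actual schema), a field type that strips stop words at both index and query
time looks like this; the analysis page Jack mentions will show what each
filter does at every stage:

<fieldType name="text_stop" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>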


RE: How to find out which fields a search came from

2015-03-31 Thread Reitzel, Charles
Highlighting is the way to go.  Note, you have options to make it better suit 
your application.  e.g. You can control the delimiters the highlighter uses.   
You can also choose from a couple different implementations.   We have been 
able to use the highlight results, as is, to pull data from fields which match 
the query.  Works fine.

That said, Damian Dykman was asking for highlight results in object form 
recently.  And Simon (Rosenthal?) responded with a link to SOLR-4722, which 
includes a patch for such a highlighter.   Might be worth a look.

https://issues.apache.org/jira/browse/SOLR-4722
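
For the movie example in the original message, the minimal parameter set would
be something along these lines; hl.requireFieldMatch keeps each field's
snippets limited to terms that actually matched in that field:

q=love john barry guy&hl=true&hl.fl=title,director,starting,music_by&hl.requireFieldMatch=true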


-Original Message-
From: Steven White [mailto:swhite4...@gmail.com] 
Sent: Tuesday, March 31, 2015 5:42 PM
To: solr-user@lucene.apache.org
Subject: How to find out which fields a search came from

Hi folks,

When I get my hits back from Solr, is there a way to find out into which fields 
my search term matched in?

For example, if the indexed document is:

  doc_1:
title = From Russia with Love
director = Terence Young
starting = Sean Connery, Redro Amendariz, Lotte Lenya,
music_by = John Barry
doc_2:
title = Goldfinger
director = Guy Hamilton
starting = Sean Connery, Honor Blackman, Gert Frobe
music_by = John Barry
doc_3:
title = Skyfall
director = Sam Mendes
starting = Daniel Craig, Javier Bardem, Ralph Fiennes
music_by = Thomas Newman

If my search term is "love john barry guy", Solr will tell me I have a hit in 
doc_1 and doc_2.  But what I also need to know in which field my search terms 
match.  How can Solr tell me that doc_1::title and doc_1::music_by and 
doc_2::music_by are where my search terms matched?

It looks to me that the highlighter does this, but I need this feature without 
enabling the highlighter.

Thanks!

Steve

*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA-CREF
*


Re: how do you replicate solr-cloud between datacenters?

2015-03-31 Thread Timothy Ehlers
Yes, thank you.

On Tue, Mar 31, 2015 at 9:54 AM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> I got the answer to my most recent question without even asking it!
> Thanks
>
> -Original Message-
> From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
> Sent: Monday, March 30, 2015 6:40 PM
> To: solr-user@lucene.apache.org
> Subject: Re: how do you replicate solr-cloud between datacenters?
>
> That's an open issue. See:
> https://issues.apache.org/jira/browse/SOLR-6273
>
> -- Jack Krupansky
>
> On Mon, Mar 30, 2015 at 5:45 PM, Timothy Ehlers  wrote:
>
> > Can you use /replication ??? How would you do this between datacenters?
> >
> > --
> > Tim Ehlers
> >
>



-- 
Tim Ehlers


Re: Collapse and Expand behaviour on result with 1 document.

2015-03-31 Thread Derek Poh
There is only 1 document in the main result set. The expanded section is 
empty.



On 3/31/2015 7:37 PM, Joel Bernstein wrote:

You should be able to use collapse/expand with one result.

Does the document in the main result set have group members that aren't
being expanded?



Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 31, 2015 at 2:00 AM, Derek Poh  wrote:


If I want to group the results (by a certain field) even if there is only
1 document, I should use the group parameter instead?
The requirement is to group the result of product documents by their
supplier id.
"&group=true&group.field=P_SupplierId&group.limit=5"

Is it true that the performance of collapse is better than group parameter
on large data set, say 10-20 million documents?

-Derek


On 3/31/2015 10:03 AM, Joel Bernstein wrote:


The expanded section will only include groups that have expanded
documents.

So, if the document that in the main result set has no documents to
expand,
then this is working as expected.



Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Mar 30, 2015 at 8:43 PM, Derek Poh 
wrote:

  Hi

I have a query which return 1 document.
When I add the collapse and expand parameters to it,
"&expand=true&expand.rows=5&fq={!collapse%20field=P_SupplierId}", the
expanded section is empty ().

Is this the behaviour of collapse and expand parameters on result which
contain only 1 document?

-Derek








Re: Unable to perform search query after changing uniqueKey

2015-03-31 Thread Zheng Lin Edwin Yeo
Thanks Erick.

Yes, it works correctly if I do not use spaces in the field names,
especially for the uniqueKey.

Regards,
Edwin


On 31 March 2015 at 13:58, Erick Erickson  wrote:

> I would never put spaces in my field names! Frankly I have no clue
> what Solr does with that, but it can't be good. Solr explicitly
> supports Java naming conventions, camel case, underscores and numbers.
> Special symbols are frowned upon, I never use anything but upper case,
> lower case and underscores. Actually, I don't use upper case either
> but that's a personal preference. Other things might work, but only by
> chance.
>
> Best,
> Erick
>
> On Mon, Mar 30, 2015 at 8:59 PM, Zheng Lin Edwin Yeo
>  wrote:
> > Latest information that I've found for this is that the error only occurs
> > for shard2.
> >
> > If I do a search for just shard1, those records that are assigned to
> shard1
> > will be able to be displayed. Only when I search for shard2 will the
> > NullPointerException error occurs. Previously I was doing a search for
> both
> > shards.
> >
> > Is there any settings that I required to do for shard2 in order to solve
> > this issue? Currently I have not made any changes to the shards since I
> > created it using
> >
> http://localhost:8983/solr/admin/collections?action=CREATE&name=nps1&numShards=2&collection.configName=collection1
> >
> >
> > Regards,
> > Edwin
> >
> > On 31 March 2015 at 09:42, Zheng Lin Edwin Yeo 
> wrote:
> >
> >> Hi Erick,
> >>
> >> I've changed the uniqueKey from id to Item No.
> >>
> >> <uniqueKey>Item No</uniqueKey>
> >>
> >>
> >> Below are my definitions for both the id and Item No.
> >>
> >>  >> required="false" multiValued="false" />
> >> 
> >>
> >> Regards,
> >> Edwin
> >>
> >>
> >> On 30 March 2015 at 23:05, Erick Erickson 
> wrote:
> >>
> >>> Well, let's see the definition of your ID field, 'cause I'm puzzled.
> >>>
> >>> It's definitely A Bad Thing to have it be any kind of tokenized field
> >>> though, but that's a shot in the dark.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Mon, Mar 30, 2015 at 2:17 AM, Zheng Lin Edwin Yeo
> >>>  wrote:
> >>> > Hi Mostafa,
> >>> >
> >>> > Yes, I've defined all the fields in schema.xml. It is able to work on
> >>> the
> >>> > version without SolrCloud, but it is not working for the one with
> >>> SolrCloud.
> >>> > Both of them are using the same schema.xml.
> >>> >
> >>> > Regards,
> >>> > Edwin
> >>> >
> >>> >
> >>> >
> >>> > On 30 March 2015 at 14:34, Mostafa Gomaa 
> >>> wrote:
> >>> >
> >>> >> Hi Zheng,
> >>> >>
> >>> >> It's possible that there's a problem with your schema.xml. Are all
> >>> fields
> >>> >> defined and have appropriate options enabled?
> >>> >>
> >>> >> Regards,
> >>> >>
> >>> >> Mostafa.
> >>> >>
> >>> >> On Mon, Mar 30, 2015 at 7:49 AM, Zheng Lin Edwin Yeo <
> >>> edwinye...@gmail.com
> >>> >> >
> >>> >> wrote:
> >>> >>
> >>> >> > Hi Erick,
> >>> >> >
> >>> >> > I've tried that, and removed the data directory from both the
> >>> shards. But
> >>> >> > the same problem still occurs, so we probably can rule out the
> >>> "memory"
> >>> >> > issue.
> >>> >> >
> >>> >> > Regards,
> >>> >> > Edwin
> >>> >> >
> >>> >> > On 30 March 2015 at 12:39, Erick Erickson <
> erickerick...@gmail.com>
> >>> >> wrote:
> >>> >> >
> >>> >> > > I meant shut down Solr and physically remove the entire data
> >>> >> > > directory. Not saying this is the cure, but it can't hurt to
> rule
> >>> out
> >>> >> > > the index having "memory"...
> >>> >> > >
> >>> >> > > Best,
> >>> >> > > Erick
> >>> >> > >
> >>> >> > > On Sun, Mar 29, 2015 at 6:35 PM, Zheng Lin Edwin Yeo
> >>> >> > >  wrote:
> >>> >> > > > Hi Erick,
> >>> >> > > >
> >>> >> > > > I used the following query to delete all the index.
> >>> >> > > >
> >>> >> > > > http://localhost:8983/solr/update?stream.body=
> >>> >> > > *:*
> >>> >> > > http://localhost:8983/solr/update?stream.body=
> >>> >> > > >
> >>> >> > > >
> >>> >> > > > Or is it better to physically delete the entire data
> directory?
> >>> >> > > >
> >>> >> > > >
> >>> >> > > > Regards,
> >>> >> > > > Edwin
> >>> >> > > >
> >>> >> > > >
> >>> >> > > > On 28 March 2015 at 02:27, Erick Erickson <
> >>> erickerick...@gmail.com>
> >>> >> > > wrote:
> >>> >> > > >
> >>> >> > > >> You say you re-indexed, did you _completely_ remove the data
> >>> >> directory
> >>> >> > > >> first, i.e. the parent of the "index" and, maybe, "tlog"
> >>> >> directories?
> >>> >> > > >> I've occasionally seen remnants of old definitions "pollute"
> >>> the new
> >>> >> > > >> one, and since the  key is so fundamental I can
> see
> >>> it
> >>> >> > > >> being a problem.
> >>> >> > > >>
> >>> >> > > >> Best,
> >>> >> > > >> Erick
> >>> >> > > >>
> >>> >> > > >> On Fri, Mar 27, 2015 at 1:42 AM, Andrea Gazzarini <
> >>> >> > > a.gazzar...@gmail.com>
> >>> >> > > >> wrote:
> >>> >> > > >> > Hi Edwin,
> >>> >> > > >> > please provide some other detail about your context, (e.g.
> >>> >> complete
> >>> >> > > >> > stacktrace, query you're issuing)
> >>> >> > > >> >
> >>> >> 

Solr Cloud Security not working for internal authentication

2015-03-31 Thread Swaraj Kumar
I am trying to use Solr Security on Solr 5.0 Cloud. The following is the
process I have used:

 1. Modifying web.xml:

    <security-constraint>
      <web-resource-collection>
        <web-resource-name>AdminAllowedQueries</web-resource-name>
        <url-pattern>/admin/*</url-pattern>
      </web-resource-collection>
      <auth-constraint>
        <role-name>admin</role-name>
      </auth-constraint>
    </security-constraint>

    <login-config>
      <auth-method>BASIC</auth-method>
      <realm-name>Solr Realm</realm-name>
    </login-config>

    <security-role>
      <description>Admin</description>
      <role-name>admin</role-name>
    </security-role>

 2. Changes in jetty.xml:

    <Call name="addBean">
      <Arg>
        <New class="org.eclipse.jetty.security.HashLoginService">
          <Set name="name">Solr Realm</Set>
          <Set name="config">/etc/realm.properties</Set>
          <Set name="refreshInterval">0</Set>
        </New>
      </Arg>
    </Call>

 3. Creating realm.properties:

    solradmin: solradmin,admin

 4. Set SOLR_OPTS in solr.in.sh:

    SOLR_OPTS="$SOLR_OPTS -DinternalAuthCredentialsBasicAuthUsername=solradmin"
    SOLR_OPTS="$SOLR_OPTS -DinternalAuthCredentialsBasicAuthPassword=solradmin"

I am getting an Unauthorized error while creating a collection using the
following command:

curl -i -X GET \
   -H "Authorization:Basic c29scmFkbWluOnNvbHJhZG1pbg==" \
 
'http://localhost:8080/solr/admin/collections?action=CREATE&name=test&collection.configName=testconf&numShards=1'

Kindly help or suggest the best way to get this done.

Thanx in advance.


Regards,


Swaraj Kumar
Senior Software Engineer I
MakeMyTrip.com
✆ +91-9811774497


RE: Spark-Solr in python

2015-03-31 Thread Chaushu, Shani
There is a Python package for SolrCloud:
https://pypi.python.org/pypi/solrcloudpy

but I don't know if it is possible to connect it to Spark.


-Original Message-
From: Timothy Potter [mailto:thelabd...@gmail.com] 
Sent: Tuesday, March 31, 2015 23:15
To: solr-user@lucene.apache.org
Subject: Re: Spark-Solr in python

You'll need a python lib that uses a python ZooKeeper client to be 
SolrCloud-aware so that you can do RDD like things, such as reading from all 
shards in a collection in parallel. I'm not aware of any Solr py libs that are 
cloud-aware yet, but it would be a good contribution to upgrade 
https://github.com/toastdriven/pysolr to be SolrCloud-aware

On Mon, Mar 30, 2015 at 11:31 PM, Chaushu, Shani  
wrote:
> Hi,
> I saw there is a tool for reading solr into Spark RDD in JAVA I want 
> to do something like this in python, is there any package in python for 
> reading solr into spark RDD?
>
> Thanks ,
> Shani
>
>
> -
> Intel Electronics Ltd.
>
> This e-mail and any attachments may contain confidential material for 
> the sole use of the intended recipient(s). Any review or distribution 
> by others is strictly prohibited. If you are not the intended 
> recipient, please contact the sender and delete all copies.
-
Intel Electronics Ltd.

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.