Re: Updating solrconfig and schema.xml for solrcloud in multicore setup

2013-06-25 Thread Jan Høydahl
Hi,

As I understand it, your initial bootstrap works OK (bootstrap_conf). What you want 
help with is *changing* the config on a live system.
That's when you are encouraged to use zkCli rather than trying to let 
Solr bootstrap things - after all, it's not a bootstrap anymore, it's a change.
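
For reference, pushing a changed config with the zkcli script that ships in 
Solr's cloud-scripts directory looks roughly like this (ZK address, local path 
and config name are examples):

cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig \
  -confdir /path/to/mycollection/conf -confname mycollection

After the upload, reload the collection (or restart the nodes) so the cores 
pick up the new config.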

Did you try updating schema.xml for a specific collection using zkCli? Any 
issues?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 25 June 2013 at 11:24, Utkarsh Sengar wrote:

> But when I launch a Solr instance without "-Dbootstrap_conf=true", just
> one core is launched and I cannot see the other core.
> 
> This behavior is the same as Mark's reply here:
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201205.mbox/%3cbb7ad9bf-389b-4b94-8c1b-bbfc4028a...@gmail.com%3E
> 
> - bootstrap_conf: you pass it true and it reads solr.xml and uploads
> the conf set for each
> SolrCore it finds, gives the conf set the name of the collection and
> associates each collection
> with the same named config set.
> 
> So the first just lets you bootstrap one collection easily... but what
> if you start with a
> multi-core, multi-collection setup that you want to bootstrap into
> SolrCloud? And they don't
> share a common config set? That's what the second command is for. You
> can set up 30 local SolrCores
> in solr.xml and then just bootstrap all 30 different config sets up
> and have them fully linked
> with each collection just by passing bootstrap_conf=true.
> 
> 
> 
> Note: I am using -Dbootstrap_conf=true and not -Dbootstrap_confdir
> 
> 
> Thanks,
> -Utkarsh
> 
> 
> On Tue, Jun 25, 2013 at 2:14 AM, Jan Høydahl  wrote:
> 
>> Hi,
>> 
>> The -Dbootstrap_confdir option is really only meant for a first-time
>> bootstrap for your development environment, not for serious use.
>> 
>> Once you've got your config into ZK, you should modify the config directly
>> in ZK.
>> There are many tools (also 3rd party) for this. But your best choice is
>> probably the zkCli that ships with Solr.
>> See http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
>> This means you will NOT need to start Solr with -Dbootstrap_confdir at all.
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>> On 25 June 2013 at 10:29, Utkarsh Sengar wrote:
>> 
>>> Hello,
>>> 
>>> I am trying to update schema.xml for a core in a multicore setup and this
>>> is what I do to update it:
>>> 
>>> I have 3 nodes in my solr cluster.
>>> 
>>> 1. Pick node1 and manually update schema.xml
>>> 
>>> 2. Restart node1 with -Dbootstrap_conf=true
>>> java -Dsolr.solr.home=multicore -DnumShards=3 -Dbootstrap_conf=true
>>> -DzkHost=localhost:2181 -DSTOP.PORT=8079 -DSTOP.KEY=mysecret -jar
>> start.jar
>>> 
>>> 3. Restart the other 2 nodes using this command (without
>>> -Dbootstrap_conf=true, since these should pull from ZK):
>>> java -Dsolr.solr.home=multicore -DnumShards=3 -DzkHost=localhost:2181
>>> -DSTOP.PORT=8079 -DSTOP.KEY=mysecret -jar start.jar
>>> 
>>> But when I do that, node1 displays all of my cores and the other 2 nodes
>>> display just one core.
>>> 
>>> Then, I found this:
>>> 
>> http://mail-archives.apache.org/mod_mbox/lucene-dev/201205.mbox/%3cbb7ad9bf-389b-4b94-8c1b-bbfc4028a...@gmail.com%3E
>>> Which says bootstrap_conf is used for a multicore setup.
>>> 
>>> 
>>> But if I use bootstrap_conf on every node, then I will have to manually
>>> update schema.xml (or any config file) everywhere? That does not sound
>>> like an efficient way of managing configuration, right?
>>> 
>>> 
>>> --
>>> Thanks,
>>> -Utkarsh
>> 
>> 
> 
> 
> -- 
> Thanks,
> -Utkarsh



Re: Updating solrconfig and schema.xml for solrcloud in multicore setup

2013-06-25 Thread Utkarsh Sengar
Yes, I have tried zkCli and it works.
But I also need to restart Solr after the schema change, right?

I tried to reload the core, but I think there is an open bug where a core
reload is successful but a shard goes down for that core. I just tried it
out, i.e. I reloaded a core after a config change via zkCli, and a shard
went down.
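
For reference, the reload I attempted was a plain core reload via the CoreAdmin
API, along these lines (host and core name are examples):

curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=mycollection"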

Since I am not able to reload a core, I am restarting the whole Solr
process to make the change.

Thanks,
-Utkarsh




Re: Updating solrconfig and schema.xml for solrcloud in multicore setup

2013-06-25 Thread Utkarsh Sengar
I believe I am hitting this bug:
https://issues.apache.org/jira/browse/SOLR-4805
I am using Solr 4.3.1.


-Utkarsh




Pivot-Facets with ranges

2013-06-25 Thread Jakob Frank
Hi all,

Is it possible, using Solr 4.3, to combine pivot facets with (date) range facets?

Currently, what I get is something like:

<lst name="facet_pivot">
  <arr name="date,cat">
    <lst>
      <str name="field">date</str>
      <date name="value">2001-06-19T20:31:12Z</date>
      <int name="count">1</int>
    </lst>
    <lst>
      <str name="field">date</str>
      <date name="value">2001-06-20T09:40:35Z</date>
      <int name="count">1</int>
      <arr name="pivot">
        <lst>
          <str name="field">cat</str>
          <str name="value">public</str>
          <int name="count">1</int>
        </lst>
      </arr>
    </lst>
    ...
  </arr>
</lst>

I'd like to use the same mechanism as in facet.range, with
facet.range.start, facet.range.end and facet.range.gap. Is there
something similar that would give a result like:

<lst name="facet_pivot">
  <arr name="date,cat">
    <lst>
      <str name="field">date</str>
      <date name="value">2001-06-01T00:00:00Z</date>
      <int name="count">36</int>
      <arr name="pivot">
        <lst>
          <str name="field">cat</str>
          <str name="value">public</str>
          <int name="count">21</int>
        </lst>
        <lst>
          <str name="field">cat</str>
          <str name="value">private</str>
          <int name="count">15</int>
        </lst>
      </arr>
    </lst>
    <lst>
      <str name="field">date</str>
      <date name="value">2001-07-01T00:00:00Z</date>
      <int name="count">25</int>
      <arr name="pivot">
        <lst>
          <str name="field">cat</str>
          <str name="value">public</str>
          <int name="count">19</int>
        </lst>
        <lst>
          <str name="field">cat</str>
          <str name="value">private</str>
          <int name="count">6</int>
        </lst>
      </arr>
    </lst>
    ...
  </arr>
</lst>


Thanks,
Jakob


URL search and indexing

2013-06-25 Thread Flavio Pompermaier
Hi to everybody,
I'm quite new to Solr, so maybe my question is trivial for you.
In my use case I have to index stuff contained in some URLs, so I use the URL
as the key of my document and I treat it like a string.

However, I'd like to be able to query by domain name, like *.it or
*.somesite.com. What's the best strategy? I thought of doing a URL-to-path
transformation and indexing with solr.PathHierarchyTokenizerFactory, but
maybe there's a simpler solution, isn't there?

Best,
Flavio

-- 

Flavio Pompermaier
Development Department
OKKAM Srl - www.okkam.it

Phone: +(39) 0461 283 702
Fax: +(39) 0461 186 6433
Email: f.pomperma...@okkam.it
Headquarters: Trento (Italy), fraz. Villazzano, Salita dei Molini 2
Registered office: Trento (Italy), via Segantini 23

Confidentiality notice. This e-mail transmission may contain legally
privileged and/or confidential information. Please do not read it if you
are not the intended recipient(s). Any use, distribution, reproduction or
disclosure by any other person is strictly prohibited. If you have received
this e-mail in error, please notify the sender and destroy the original
transmission and its attachments without reading or saving it in any manner.


Re: URL search and indexing

2013-06-25 Thread Jan Høydahl
Probably a good match for the regexp query feature of Solr (given that your
url field is not tokenized),
e.g. q=url:/.*\.it$/

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com




Several Machines Communication Failure

2013-06-25 Thread Ophir Michaeli
Hi,

I have 2 Solr shards and 2 replicas running OK on the same machine.
When I try to put each shard/replica on a separate machine (and set the IPs
accordingly), it fails, or works slowly and fails sometimes.
Any explanation for this behavior? 

Thanks



Re: Book progress (Solr 4.x Deep Dive) - see my blog

2013-06-25 Thread Jack Krupansky
Please report any comments or issues to my email address or comment on my 
blog. Comments on the blog will benefit other readers, but the choice is 
yours.


Thanks!

-- Jack Krupansky

-Original Message- 
From: Bernd Fehling

Sent: Tuesday, June 25, 2013 2:06 AM
To: solr-user@lucene.apache.org
Subject: Re: Book progress (Solr 4.x Deep Dive) - see my blog


On 24.06.2013 16:37, Jack Krupansky wrote:
I won't continue to bore or annoy anybody on this list with tedious comments 
about my new Solr book on Lulu.com... please bookmark my blog, 
http://basetechnology.blogspot.com/, for further updates on the book.


The book itself is here:
http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-1/ebook/product-21079719.html

Feel free to comment on the blog with any questions, issues, or mistakes 
in the book.


Thanks, especially for your support!

-- Jack Krupansky



Where to report errata and corrections to EAR #1?

Regards,
Bernd 



Re: Solr indexer and Hadoop

2013-06-25 Thread Jack Krupansky
Solr does not have any integrated Hadoop/HDFS crawling or indexing support 
today. Sorry.


LucidWorks Search does have HDFS crawling support:
http://docs.lucidworks.com/display/lweug/Using+the+High+Volume+HDFS+Crawler

Cloudera Search has HDFS support as well.

-- Jack Krupansky

-Original Message- 
From: engy.morsy

Sent: Tuesday, June 25, 2013 3:14 AM
To: solr-user@lucene.apache.org
Subject: Solr indexer and Hadoop

Hi All,

I have TBs of data that need to be indexed. I am trying to use Hadoop to
index those TBs. I am still a newbie.
I thought that the Map function would read data from hard disks and the
Reduce function would index it. The problem I am facing is how to read
that data from hard disks which are not HDFS.

I understand that the data to be indexed must be on HDFS, doesn't it? Or am
I missing something here.

I can't convert the nodes on which the data resides to HDFS. Can anyone
please help.

I would also appreciate it if you can provide a good tutorial for Solr
indexing using Hadoop. I googled a lot but did not find a sufficient one.

Thanks






Re: Pivot-Facets with ranges

2013-06-25 Thread Upayavira
You can only do this with some index time work. If you index the date
field rounded to the various levels you need, then you can pivot facet
on your rounded date. At present you will need to do this rounding in
your own indexing code before it gets near to Solr. However, I have
created some rounding UpdateProcessors [1] that would allow you to do
this within Solr itself.
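
As a sketch of the idea (the field name is made up): if each document gets an
extra date_month field holding the month-truncated date, e.g.
2001-06-19T20:31:12Z indexed as 2001-06-01T00:00:00Z, then a request with

facet=true&facet.pivot=date_month,cat

returns per-month counts with the cat breakdown nested under each month.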

It'd be great if these were included in 4.4 :-) (hint to committers)

Upayavira

[1] https://issues.apache.org/jira/browse/SOLR-4772





Re: Pivot-Facets with ranges

2013-06-25 Thread Jack Krupansky
No, facet.pivot takes a comma-separated list of "fields", with no support 
for "ranges".


But, you can have a combination of field and range facets without pivoting.
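
For example, a single request can ask for both at once (field names, dates and
gap are illustrative; %2B is a URL-encoded '+'):

q=*:*&facet=true
  &facet.field=cat
  &facet.range=date
  &facet.range.start=2001-06-01T00:00:00Z
  &facet.range.end=2001-08-01T00:00:00Z
  &facet.range.gap=%2B1MONTH

The two facets come back side by side, but the range buckets are not broken
down by cat.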

-- Jack Krupansky




Re: Solr indexer and Hadoop

2013-06-25 Thread engy.morsy
Thank you Jack. So, I need to convert those nodes holding data to HDFS.





Re: URL search and indexing

2013-06-25 Thread Flavio Pompermaier
Sorry, but maybe I'm missing something here: could I declare url as the key
field and query it too?
At the moment, my schema.xml looks like:

<fields>
  <field name="url" type="string" indexed="true" stored="true"
         required="true" multiValued="false" />
  ...
</fields>

<uniqueKey>url</uniqueKey>

Is that OK? Or should I add a "baseurl" field of some kind, to be able to
query all URLs coming from a certain domain (1st or 2nd level as well)?

Best,
Flavio




-- 
Flavio Pompermaier
OKKAM Srl - www.okkam.it


Re: Several Machines Communication Failure

2013-06-25 Thread Jan Høydahl
Hi,

We cannot help you based on this brief email.

Please provide a much more detailed description. Version of Solr, SolrCloud or 
not. How exactly have you done this move? Relevant configuration snippets, 
relevant log snippets of what goes wrong...

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com




Re: URL search and indexing

2013-06-25 Thread Jan Høydahl
Sure, you can query the url field directly. Or, if you choose, you can split it 
up into multiple components, e.g. using 
http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessor.html
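
As a sketch, wiring it into an update chain in solrconfig.xml could look like
this (it reads the "url" field and writes url_domain, url_canonical etc. by
default; check the javadoc for the exact field-name parameters):

<updateRequestProcessorChain name="urlclassify">
  <processor class="solr.URLClassifyProcessorFactory">
    <bool name="enabled">true</bool>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Then send your updates with update.chain=urlclassify so the processor populates
the derived fields.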

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com




Re: URL search and indexing

2013-06-25 Thread Flavio Pompermaier
That sounds like exactly what I'm looking for! However, I cannot find an
example of how to use it; could you help me, please?
Moreover, about the id field: isn't it true that the id field shouldn't be
analyzed, as suggested in
http://wiki.apache.org/solr/UniqueKey#Text_field_in_the_document?




-- 
Flavio Pompermaier
OKKAM Srl - www.okkam.it


RE: Solr indexer and Hadoop

2013-06-25 Thread James Thomas
>> The problem I am facing is how to read those data from hard disks which are 
>> not HDFS

If you are planning to use a Map-Reduce job to do the indexing then the source 
data will definitely have to be on HDFS.
The Map function can transform the source data to Solr documents and send them 
to Solr  (e.g. via CloudSolrServer Java API) for indexing.
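
A bare-bones sketch of that map-side indexing with SolrJ's CloudSolrServer (the
ZooKeeper address, collection and field names are examples):

// sketch: turn a source record into a SolrInputDocument and send it to SolrCloud
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexSketch {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181");
        solr.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("text", "source record transformed into a Solr document");

        solr.add(doc);   // in a real job, batch many docs per add() call
        solr.commit();   // and commit once at the end, not per document
        solr.shutdown();
    }
}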

-- James



Re: URL search and indexing

2013-06-25 Thread Jack Krupansky

There are examples in my book:
http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-1/ebook/product-21079719.html

But... I still think you should use a tokenized text field as well - use all 
three: raw string, tokenized text, and URL classification fields.
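
A minimal schema sketch of the first two (field and type names are just
examples):

<field name="url" type="string" indexed="true" stored="true" />
<field name="url_text" type="text_general" indexed="true" stored="false" />
<copyField source="url" dest="url_text" />

The string field gives you exact matches and regexp queries; the tokenized copy
lets users match names inside the URL without wildcards.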


-- Jack Krupansky




Re: Solr indexer and Hadoop

2013-06-25 Thread Otis Gospodnetic
But note that MapReduce and HDFS are not the only way to go.
For example, can you split your source data? If you can, you could do
that, put the pieces on N machines, and run an indexer on each of them, each
with some number of threads. Of course, your Solr(Cloud?) cluster had better
have enough servers/CPU cores and fast enough disk and network to
handle the input and get maxed out only at a fairly high N. What that
N should be depends on how quickly you need to index your 1 TB of
data.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm





Re: Solr, Shards, multi cores and (reverse proxy)

2013-06-25 Thread medley
Thanks.

It is working now and the QTime has been divided by 10.

I would like to put the shards parameter in the requestHandler. I have one
solrconfig.xml file per core.

Is it possible to have a common solrconfig.xml file and, in that case, a
common requestHandler?

Regards
Medley





Re: Can we use Solr to serve data to web analytics & Dashboard charts

2013-06-25 Thread pradeep kumar
Any help?
On 25 Jun 2013 13:35, "pradeep kumar"  wrote:

> Sure,
>
> First of all thanks a lot everyone for very quick reply.
>
> We have an ordering system which has lakhs of records so far in
> normalized RDBMS tables, say Order, Item, Details etc. We are planning to
> have an offline database (star schema) and develop reports, data-analytics
> charts with drill-down, and a dashboard with data from the offline database.
>
> I am planning to propose Solr as a solution instead of an offline database,
> i.e. through DIH to import data from the DB into Solr indexes. Since Solr
> indexes are stored in a denormalized manner and querying is faster, with
> faceted search, I assumed that Solr can be used to solve my requirement.
>
> Please correct me if I am wrong.
>
> Thanks,
> Pradeep
>
>
>
>
> On Tue, Jun 25, 2013 at 3:43 AM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
>> Yeah, perhaps, yet people keep using it for this. So, Pradeep, it
>> may work for you, and if you share some numbers with us we may be able
>> to tell you "no way" or "very likely OK". :)
>>
>>
>> Otis
>> --
>> Solr & ElasticSearch Support -- http://sematext.com/
>> Performance Monitoring -- http://sematext.com/spm
>>
>>
>>
>> On Mon, Jun 24, 2013 at 4:14 PM, Walter Underwood 
>> wrote:
>> > I expect it won't be fast enough for general use. Most analytics stores
>> implement functions inside the server to aggregate large amounts of data.
>> There is always some query that returns the whole database in order to
>> calculate an average.
>> >
>> > I'm sure it will work fine for some things and for small data sets, but
>> it probably won't scale for most real analytics applications.
>> >
>> > wunder
>> >
>> > On Jun 24, 2013, at 12:47 PM, pradeep kumar wrote:
>> >
>> >> Hello everyone,
>> >>
>> >> Apart from text search, can we use Solr as data store to serve data to
>> form
>> >> analytics with drilldown charts or charts to add as widgets on
>> dashboards?
>> >>
>> >> Any suggestion, examples?
>> >>
>> >> Thanks,
>> >> Pradeep
>> >
>> >
>> >
>> >
>>
>
>


Re: Solr, Shards, multi cores and (reverse proxy)

2013-06-25 Thread Upayavira
Create a new RequestHandler config, say /distrib. Requests will be
forwarded to /select, which doesn't have the shards parameter, and
everything will be just fine.
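
Something along these lines in solrconfig.xml, with the shard list as an
example:

<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">host1:8983/solr/core1,host2:8983/solr/core2</str>
  </lst>
</requestHandler>

Queries sent to /distrib then fan out over the listed shards, while /select
stays non-distributed.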

Upayavira



Re: URL search and indexing

2013-06-25 Thread Flavio Pompermaier
I bought the book, and looking at the examples I still don't understand whether
it is possible to query all sub-URLs of my URL.
For example, if the URLClassifyProcessorFactory takes as input
"url_s":"http://lucene.apache.org/solr/4_0_0/changes/Changes.html" and produces
outputs like
 - "url_domain_s":"lucene.apache.org"
 - "url_canonical_s":"http://lucene.apache.org/solr/4_0_0/changes/Changes.html"
how should I configure url_domain_s in order to be able to make queries like
'*.apache.org'?
How should I configure url_canonical_s in order to be able to make queries
like 'http://lucene.apache.org/solr/*'?
Is it better to have two different fields for the two queries, or could I
create just one field for both kinds of queries (obviously, for the former
case I should then query something like *://*.apache.org/*)?



Re: URL search and indexing

2013-06-25 Thread Jack Krupansky
As Jan indicates, your users could perform regular expression queries on a 
URL string field, but maybe you should tell us more about your use case and 
how your users really want to search.


One technique is to copy the URL to a tokenized text field. Then, users can 
search for names and sub-sequences that occur in the URL without the need 
for wildcards or regular expressions.


-- Jack Krupansky





Re: Solr indexer and Hadoop

2013-06-25 Thread Jack Krupansky

???

Hadoop=HDFS

If the data is not in Hadoop/HDFS, just use the normal Solr indexing tools, 
including SolrCell and Data Import Handler, and possibly ManifoldCF.


-- Jack Krupansky




RE: Help with synonyms

2013-06-25 Thread Peter Kirk
Thanks, I'm looking into it.

Somehow it appears that the first line in the synonyms file is not registered 
as a synonym. Can this be correct - is the first line ignored?
/Peter



-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: 24. juni 2013 15:22
To: solr-user@lucene.apache.org
Subject: Re: Help with synonyms

Try the Solr Admin UI analysis page and see how finagle and æggeblomme are 
analyzed at BOTH index and query time.

The "=>" rule does a replacement, while the pure comma rules support 
equivalence.

Your query-time and index-time analyzers need to be "compatible", which 
sometimes means that they can't be identical - since replacement rules mean 
that a term will not appear in the index.
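
Concretely, with the rules above and the default expand=true:

finagle => æggeblomme    replacement: "finagle" is rewritten to "æggeblomme",
                         so "finagle" itself never appears in the index
canard, æggeblomme       equivalence: each term is expanded to both, so either
                         one matches documents containing the other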

-- Jack Krupansky

-Original Message-
From: Peter Kirk
Sent: Monday, June 24, 2013 4:10 AM
To: solr-user@lucene.apache.org
Subject: Help with synonyms

Hi

I have a synonyms file that looks like this:

finagle => æggeblomme
frumpy => spiste
canard, æggeblomme
corpse, spiste

(It's just an example, and has no real meaning).

The issue I don't understand is that a search for "finagle" does not find 
documents containing "æggeblomme" (which means "egg yolk").
But all the others find the corresponding words. (For example, "frumpy" 
finds "spiste", and "canard" finds "æggeblomme").

Why could this be?

Thanks,
Peter








Re: Can we use Solr to serve data to web analytics & Dashboard charts

2013-06-25 Thread Otis Gospodnetic
Hi Pradeep,

5-6 hours between the email and "Any help?" == "not enough patience" :)

The advantage of something like Solr over an RDBMS with a star schema may
be that it is easier to scale horizontally than MySQL, or at least
that was the case when I last looked at horizontal RDBMS partitioning. But
if you are planning to have both the RDBMS w/ star schema for reporting
AND Solr for reporting (via facets and such), that seems redundant.
You need just one of these two.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm





Re: Solr indexer and Hadoop

2013-06-25 Thread Michael Della Bitta
Jack,

Sorry, but I don't agree that it's that cut and dried. I've very
successfully worked with terabytes of data in Hadoop that was stored on an
Isilon mounted via NFS, for example. In cases like this, you're using
MapReduce purely for its execution model (which existed far before Hadoop
and HDFS ever did).


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Tue, Jun 25, 2013 at 8:58 AM, Jack Krupansky wrote:

> ???
>
> Hadoop=HDFS
>
> If the data is not in Hadoop/HDFS, just use the normal Solr indexing
> tools, including SolrCell and Data Import Handler, and possibly ManifoldCF.
>
>
> -- Jack Krupansky
>
> -Original Message- From: engy.morsy
> Sent: Tuesday, June 25, 2013 8:10 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr indexer and Hadoop
>
>
> Thank you Jack. So, I need to convert those nodes holding data to HDFS.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-indexer-and-Hadoop-tp4072951p4073013.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: URL search and indexing

2013-06-25 Thread Flavio Pompermaier
Basically I have to design the solr document and I was thinking that
actually users could be more interested in filtering by domain (*.it or
*.com), however I cannot exclude more site-related queries (like '
http://lucene.apache.org/solr/*').
From what I understood I should configure my schema.xml like:

<fields>
   <field name="url" type="string" indexed="true" stored="true" required="true" multiValued="false" />
   <field name="url_txt" type="text_general" indexed="true" stored="false" multiValued="false"/>
   ...
</fields>
<uniqueKey>url</uniqueKey>
<copyField source="url" dest="url_txt"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Does the text_general field type fit my needs for URLs? Or should I use a
more specific tokenizer?


On Tue, Jun 25, 2013 at 2:00 PM, Jack Krupansky wrote:

> As Jan indicates, your users could perform regular expression queries on a
> URL string field, but maybe you should tell us more about your use case and
> how your users really want to search.
>
> One technique is to copy the URL to a tokenized text field. Then, users
> can search for names and sub-sequences that occur in the URL without the
> need for wildcards or regular expressions.
>
> -- Jack Krupansky
>
> -Original Message- From: Jan Høydahl
> Sent: Tuesday, June 25, 2013 6:28 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: URL search and indexing
>
> Probably a good match for the RegExp feature of Solr (given that your url
> is not tokenized)
> e.g. q=url:/.*\.it$/
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> 25. juni 2013 kl. 12:17 skrev Flavio Pompermaier :
>
>  Hi to everybody,
>> I'm quite new to Solr so maybe my question could be trivial for you..
>> In my use case I have to index stuff contained in some URL, so I use the url as
>> the key of my document and I treat it like a string.
>>
>> However I'd like to be able to query by domain name, like *.it or *.
>> somesite.com, what's the best strategy? I thought to make a URL-to-path
>> transformation and index it using solr.PathHierarchyTokenizerFactory but
>> maybe there's a simpler solution.. isn't there?
>>
>> Best,
>> Flavio
>>
>> --
>>
>> Flavio Pompermaier
>> Development Department
>> ___
>> OKKAM Srl - www.okkam.it
>> 
>> Phone: +(39) 0461 283 702
>> Fax: +(39) 0461 186 6433
>> Email: f.pomperma...@okkam.it
>> Headquarters: Trento (Italy), fraz. Villazzano, Salita dei Molini 2
>> Registered office: Trento (Italy), via Segantini 23
>> 
>> Confidentiality notice. This e-mail transmission may contain legally
>> privileged and/or confidential information. Please do not read it if you
>> are not the intended recipient(s). Any use, distribution, reproduction or
>> disclosure by any other person is strictly prohibited. If you have received
>> this e-mail in error, please notify the sender and destroy the original
>> transmission and its attachments without reading or saving it in any manner.
>>
>
>


-- 

Flavio Pompermaier
Development Department
___
OKKAM Srl - www.okkam.it

Phone: +(39) 0461 283 702
Fax: +(39) 0461 186 6433
Email: f.pomperma...@okkam.it
Headquarters: Trento (Italy), fraz. Villazzano, Salita dei Molini 2
Registered office: Trento (Italy), via Segantini 23

Confidentiality notice. This e-mail transmission may contain legally
privileged and/or confidential information. Please do not read it if you
are not the intended recipient(s). Any use, distribution, reproduction or
disclosure by any other person is strictly prohibited. If you have received
this e-mail in error, please notify the sender and destroy the original
transmission and its attachments without reading or saving it in any manner.


RE: shardkey

2013-06-25 Thread Joshi, Shital
Thanks so much for answering! 

"it looks like you're doing time based sharding, and one would normally not use 
the compositeId router for that."

What would be the recommended router or alternative if we wanted to do time-based
sharding? We are using the business date to build the composite key (it's a String,
without a timestamp), and per business date we're expecting about 3 million
documents.


-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Friday, June 21, 2013 8:50 PM
To: solr-user@lucene.apache.org
Subject: Re: shardkey

On Fri, Jun 21, 2013 at 6:08 PM, Joshi, Shital  wrote:
> But now Solr stores composite id in the document id

Correct, it's the document id itself that contains everything needed
for the compositeId router to determine the hash.

> It would only use it to calculate hash key but while storing

compositeId routing is when it makes sense to make the routing part of
the unique id so that an id is all the information needed to find the
document in the cluster.  For example customer_id!document_name.  From
your example of 20130611!test_14 it looks like you're doing time based
sharding, and one would normally not use the compositeId router for
that.

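As a concrete sketch of what compositeId routing looks like from SolrJ (the
ZooKeeper address, collection name, and field values below are invented for
illustration):

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CompositeIdExample {
        public static void main(String[] args) throws Exception {
            CloudSolrServer server = new CloudSolrServer("localhost:2181");
            server.setDefaultCollection("collection1");
            SolrInputDocument doc = new SolrInputDocument();
            // the part before '!' is hashed to pick the shard, so all docs
            // sharing the prefix are co-located
            doc.addField("id", "customer42!doc_1001");
            doc.addField("name_s", "example");
            server.add(doc);
            server.commit();
            server.shutdown();
        }
    }
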
-Yonik
http://lucidworks.com


Re: URL search and indexing

2013-06-25 Thread Erik Hatcher
If you want to query by domain, then index the domain (or just the last piece 
of it).  I'd suggest you somehow (either in your indexer code or via clever 
analysis tricks) peel off the last piece of the domain as its own string field 
so you get "com", "it", "edu", "gov", etc all as indexed values in a single 
field.

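A tiny indexer-side sketch of peeling off that last label (the field name
"domain_tld" is invented for the example; it assumes absolute http(s) URLs):

    import java.net.URI;
    import org.apache.solr.common.SolrInputDocument;

    static void addTld(SolrInputDocument doc, String url) throws Exception {
        String host = new URI(url).getHost();                    // e.g. "lucene.apache.org"
        String tld = host.substring(host.lastIndexOf('.') + 1);  // "org"
        doc.addField("domain_tld", tld);
    }
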
Erik

On Jun 25, 2013, at 10:37 , Flavio Pompermaier wrote:

> Basically I have to design the solr document and I was thinking that
> actually users could be more interested in filtering by domain (*.it or
> *.com), however I cannot exclude more site-related queries (like '
> http://lucene.apache.org/solr/*').
> From what I understood I should configure my schema.xml like:
> 
> <fields>
>    <field name="url" type="string" indexed="true" stored="true" required="true" multiValued="false" />
>    <field name="url_txt" type="text_general" indexed="true" stored="false" multiValued="false"/>
>    ...
> </fields>
> <uniqueKey>url</uniqueKey>
> <copyField source="url" dest="url_txt"/>
> 
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> 
> Does the text_general field type fit to my needs on URLs? Or should I use a
> more specific tokenizer?
> 
> 
> On Tue, Jun 25, 2013 at 2:00 PM, Jack Krupansky 
> wrote:
> 
>> As Jan indicates, your users could perform regular expression queries on a
>> URL string field, but maybe you should tell us more about your use case and
>> how your users really want to search.
>> 
>> One technique is to copy the URL to a tokenized text field. Then, users
>> can search for names and sub-sequences that occur in the URL without the
>> need for wildcards or regular expressions.
>> 
>> -- Jack Krupansky
>> 
>> -Original Message- From: Jan Høydahl
>> Sent: Tuesday, June 25, 2013 6:28 AM
>> 
>> To: solr-user@lucene.apache.org
>> Subject: Re: URL search and indexing
>> 
>> Probably a good match for the RegExp feature of Solr (given that your url
>> is not tokenized)
>> e.g. q=url:/.*\.it$/
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>> 25. juni 2013 kl. 12:17 skrev Flavio Pompermaier :
>> 
>> Hi to everybody,
>>> I'm quite new to Solr so maybe my question could be trivial for you..
>>> In my use case I have to index stuff contained in some URL so i use url as
>>> key of my document and I treat it like a string.
>>> 
>>> However I'd like to be able to query by domain name, like *.it or *.
>>> somesite.com, what's the best strategy? I tought to made a URL to path
>>> transfromation and indexed using solr.**PathHierarchyTokenizerFactory but
>>> maybe there's a simpler solution..isn't it?
>>> 
>>> Best,
>>> Flavio
>>> 
>>> --
>>> 
>>> Flavio Pompermaier
>>> Development Department
>>> ___
>>> OKKAM Srl - www.okkam.it
>>> 
>>> Phone: +(39) 0461 283 702
>>> Fax: +(39) 0461 186 6433
>>> Email: f.pomperma...@okkam.it
>>> Headquarters: Trento (Italy), fraz. Villazzano, Salita dei Molini 2
>>> Registered office: Trento (Italy), via Segantini 23
>>> 
>>> Confidentiality notice. This e-mail transmission may contain legally
>>> privileged and/or confidential information. Please do not read it if you
>>> are not the intended recipient(s). Any use, distribution, reproduction or
>>> disclosure by any other person is strictly prohibited. If you have received
>>> this e-mail in error, please notify the sender and destroy the original
>>> transmission and its attachments without reading or saving it in any manner.
>>> 
>> 
>> 
> 
> 
> -- 
> 
> Flavio Pompermaier
> Development Department
> ___
> OKKAM Srl - www.okkam.it
> 
> Phone: +(39) 0461 283 702
> Fax: +(39) 0461 186 6433
> Email: f.pomperma...@okkam.it
> Headquarters: Trento (Italy), fraz. Villazzano, Salita dei Molini 2
> Registered office: Trento (Italy), via Segantini 23
> 
> Confidentiality notice. This e-mail transmission may contain legally
> privileged and/or confidential information. Please do not read it if you
> are not the intended recipient(s). Any use, distribution, reproduction or
> disclosure by any other person is strictly prohibited. If you have received
> this e-mail in error, please notify the sender and destroy the original
> transmission and its attachments without reading or saving it in any manner.



Re: Help with synonyms

2013-06-25 Thread Shawn Heisey
On 6/25/2013 8:25 AM, Peter Kirk wrote:
> Thanks. I'm looking in to it.
> 
> Somehow it appears that the first line in the synonyms file is not registered 
> as a synonym. Can this be correct, the first line is ignored?

I will look into this later and file an issue if necessary, but if this
is actually true, just put a comment (starting with the # character) as
the first line.

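In other words, a workaround along these lines (the comment line is just a
placeholder so that no real rule sits on line 1):

    # placeholder comment so the first rule is not on the first line
    finagle => æggeblomme
    frumpy => spiste
    canard, æggeblomme
    corpse, spiste
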
Thanks,
Shawn



Solr 4.3 problems with embedded jetty from maven cargo

2013-06-25 Thread Daniel Exner
Hi all,

I'm currently trying to adapt my Solr Maven project to version
4.3, but it keeps complaining about missing SLF4J jars.

The relevant part from my pom.xml looks like this:

> <plugin>
>   <groupId>org.codehaus.cargo</groupId>
>   <artifactId>cargo-maven2-plugin</artifactId>
>   <version>1.3.1</version>
>   <configuration>
>     <container>
>       <containerId>jetty6x</containerId>
>       <type>embedded</type>
>       <systemProperties>
>         <solr.solr.home>${basedir}/target/classes</solr.solr.home>
>       </systemProperties>
>       <dependencies>
>         <dependency>
>           <groupId>log4j</groupId>
>           <artifactId>log4j</artifactId>
>         </dependency>
>         <dependency>
>           <groupId>org.slf4j</groupId>
>           <artifactId>slf4j-log4j12</artifactId>
>         </dependency>
>         <dependency>
>           <groupId>org.slf4j</groupId>
>           <artifactId>jul-to-slf4j</artifactId>
>         </dependency>
>         <dependency>
>           <groupId>org.slf4j</groupId>
>           <artifactId>jcl-over-slf4j</artifactId>
>         </dependency>
>         <dependency>
>           <groupId>org.slf4j</groupId>
>           <artifactId>slf4j-api</artifactId>
>         </dependency>
>       </dependencies>
>     </container>
>     <configuration>
>       <properties>
>         <cargo.servlet.port>${solrPort}</cargo.servlet.port>
>       </properties>
>     </configuration>
>     <wait>true</wait>
>   </configuration>
> </plugin>

If I run my maven goals with "-X" for debug I see things like:
> Classpath location = 
> [/home/dex/.m2/repository/org/slf4j/slf4j-api/1.7.5/slf4j-api-1.7.5.jar]

So I think those jars are getting added to the classpath correctly.

I tried adding a log4j.xml file via a property and see:
> log4j: Preferred configurator class: org.apache.log4j.xml.DOMConfigurator
> log4j: System property is :null
> log4j: Standard DocumentBuilderFactory search succeded.
> log4j: DocumentBuilderFactory is: 
> org.apache.xerces.jaxp.DocumentBuilderFactoryImpl
> log4j: debug attribute= "null".
> log4j: Ignoring debug attribute.
> log4j: Threshold ="null".
> log4j: Level value for root is  [info].
> log4j: root level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Setting property [target] to [System.err].
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%-5p %c{1} - %m%n].
> log4j: Adding appender named [console] to category [root].

Nevertheless Jetty is unable to deploy Solr.
Exactly the same config works if I just change the Solr version to 4.2.

I'm out of ideas now. Someone got some things I could try?

Greetings
Daniel
-- 
Daniel Exner
Software developer & application support
ESEMOS GmbH

Leipziger-Str. 14
D-04442 Zwenkau
Telefon +49 34203433014
E-Mail d.ex...@esemos.de

www.esemos.de



Commercial register: Amtsgericht Leipzig HRB 24025; registered office:
Zwenkau

Tax number: 235/108/04503

VAT ID: DE 259460837

Managing directors: Ulrike Hübner, Peter Rieger, Dirk Scholz, Dr. Lutz
Tischendorf






This e-mail may contain confidential and/or privileged information. If
you are not the intended recipient (or have received this email in
error) please notify the sender immediately and delete this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.



Re: Can we use Solr to serve data to web analytics & Dashboard charts

2013-06-25 Thread Walter Underwood
With only 10K records (lakhs), a regular RDBMS should be just fine. I don't see
any need for Solr with a small dataset like that.

Increase the cache sizes on your RDBMS so that all the tables fit in memory.
Even with 10Kbytes per record, that is only 100Mbytes of data.

wunder

On Jun 25, 2013, at 7:28 AM, Otis Gospodnetic wrote:

> Hi Pradeep,
> 
> 5-6 hours between email and "Any help?" == "not enough patience" :)
> 
> The advantage of something like Solr over RDBMS with star schema may
> be that it is easier to scale horizontally than MySQL, or at least
> that was the case I last looked at horizontal RDBMS partitioning.  But
> if you are planning to have both RDBMS w/ star schema for reporting
> AND Solr for reporting (via facets and such), that seems redundant.
> You need just one of these two.
> 
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
> 
> 
> 
> On Tue, Jun 25, 2013 at 9:50 AM, pradeep kumar  
> wrote:
>> Any help?
>> On 25 Jun 2013 13:35, "pradeep kumar"  wrote:
>> 
>>> Sure,
>>> 
>>> First of all thanks a lot everyone for very quick reply.
>>> 
>>> We have a Ordering system which has a lakhs of records so far in a
>>> normalized RDBMS tables, say Order, Item, Details etc. We are planning to
>>> have a offline database (star schema) and develop reports, data analytical
>>> charts with drill down, dashboard with data from offline database.
>>> 
>>> I am planning to propose solr as a solution instead of a offline database
>>> ie through DIH to import data from DB into solr indexes. Since Solr indexes
>>> are stored denormalized manner and querying is faster, faceting search, i
>>> assumed that solr can be used to solve my requirement.
>>> 
>>> Please correct me if i am wrong.
>>> 
>>> Thanks,
>>> Pradeep
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Jun 25, 2013 at 3:43 AM, Otis Gospodnetic <
>>> otis.gospodne...@gmail.com> wrote:
>>> 
 Yeah, perhaps yet people keep using it for this.  So, Pradeep, it
 may work for you and if you share some numbers with us we may be able
 to tell you "no way" or "very likely OK". :)
 
 
 Otis
 --
 Solr & ElasticSearch Support -- http://sematext.com/
 Performance Monitoring -- http://sematext.com/spm
 
 
 
 On Mon, Jun 24, 2013 at 4:14 PM, Walter Underwood 
 wrote:
> I expect it won't be fast enough for general use. Most analytics stores
 implement functions inside the server to aggregate large amounts of data.
 There is always some query that returns the whole database in order to
 calculate an average.
> 
> I'm sure it will work fine for some things and for small data sets, but
 it probably won't scale for most real analytics applications.
> 
> wunder
> 
> On Jun 24, 2013, at 12:47 PM, pradeep kumar wrote:
> 
>> Hello everyone,
>> 
>> Apart from text search, can we use Solr as data store to serve data to
 form
>> analytics with drilldown charts or charts to add as widgets on
 dashboards?
>> 
>> Any suggestion, examples?
>> 
>> Thanks,
>> Pradeep
> 
> 
> 
> 
 
>>> 
>>> 

--
Walter Underwood
wun...@wunderwood.org





Re: Can we use Solr to serve data to web analytics & Dashboard charts

2013-06-25 Thread pradeep kumar
Thanks for your reply Otis,

I think I have not explained clearly in my previous email.

We are thinking of 2 options for our new reports/analytics/dashboard
implementation.

*1st option:* Have an offline database with a star schema, which makes
querying easy for generating reports using any report engine like
Crystal or JasperReports. This was accepted by everyone in my technical team, as
it's proven in other projects that we developed.

*2nd option:* I was impressed with Solr's features at a very high level, so I
proposed Solr and was asked to evaluate it. In that process I am having a few
doubts, which I am asking in my emails to the solr-user list.

Just FYI, we don't have to scale our data store horizontally as of now.
Along with reports/analytics we have a search with complex filters &
product comparison functionality (like any shopping store) coming soon.
This is the main reason I am attracted to Solr: its faceted search.
But the idea is to use the same denormalized data, either a star
schema or Solr indexes.


Just copying content from my previous email to give you background on what
we are developing:
==
We have an ordering system which has lakhs of records so far in
normalized RDBMS tables, say Order, Item, Details etc. We are planning to
have an offline database (star schema) and develop reports, data analytical
charts with drill down, and a dashboard with data from the offline database.

OR

I am planning to propose Solr as a solution instead of an offline database,
i.e. using DIH to import data from the DB into Solr indexes. Since Solr indexes
are stored in a denormalized manner and querying is fast, with faceted search, I
assumed that Solr can be used to solve my requirement.
===
General questions:

1. Can Solr be used only for text search and text analytics/highlighting, or
can it be used for requirements like mine?
2. How can we achieve data analytics from Solr?
3. How flexible is Solr in handling changing fields in the indexes?
4. Can we do search on multiple indexes/documents?

Hope I made my point clear. And sorry for not being patient :)

Thanks,
Pradeep



On Tue, Jun 25, 2013 at 7:58 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi Pradeep,
>
> 5-6 hours between email and "Any help?" == "not enough patience" :)
>
> The advantage of something like Solr over RDBMS with star schema may
> be that it is easier to scale horizontally than MySQL, or at least
> that was the case I last looked at horizontal RDBMS partitioning.  But
> if you are planning to have both RDBMS w/ star schema for reporting
> AND Solr for reporting (via facets and such), that seems redundant.
> You need just one of these two.
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Tue, Jun 25, 2013 at 9:50 AM, pradeep kumar 
> wrote:
> > Any help?
> > On 25 Jun 2013 13:35, "pradeep kumar"  wrote:
> >
> >> Sure,
> >>
> >> First of all thanks a lot everyone for very quick reply.
> >>
> >> We have a Ordering system which has a lakhs of records so far in a
> >> normalized RDBMS tables, say Order, Item, Details etc. We are planning
> to
> >> have a offline database (star schema) and develop reports, data
> analytical
> >> charts with drill down, dashboard with data from offline database.
> >>
> >> I am planning to propose solr as a solution instead of a offline
> database
> >> ie through DIH to import data from DB into solr indexes. Since Solr
> indexes
> >> are stored denormalized manner and querying is faster, faceting search,
> i
> >> assumed that solr can be used to solve my requirement.
> >>
> >> Please correct me if i am wrong.
> >>
> >> Thanks,
> >> Pradeep
> >>
> >>
> >>
> >>
> >> On Tue, Jun 25, 2013 at 3:43 AM, Otis Gospodnetic <
> >> otis.gospodne...@gmail.com> wrote:
> >>
> >>> Yeah, perhaps yet people keep using it for this.  So, Pradeep, it
> >>> may work for you and if you share some numbers with us we may be able
> >>> to tell you "no way" or "very likely OK". :)
> >>>
> >>>
> >>> Otis
> >>> --
> >>> Solr & ElasticSearch Support -- http://sematext.com/
> >>> Performance Monitoring -- http://sematext.com/spm
> >>>
> >>>
> >>>
> >>> On Mon, Jun 24, 2013 at 4:14 PM, Walter Underwood <
> wun...@wunderwood.org>
> >>> wrote:
> >>> > I expect it won't be fast enough for general use. Most analytics
> stores
> >>> implement functions inside the server to aggregate large amounts of
> data.
> >>> There is always some query that returns the whole database in order to
> >>> calculate an average.
> >>> >
> >>> > I'm sure it will work fine for some things and for small data sets,
> but
> >>> it probably won't scale for most real analytics applications.
> >>> >
> >>> > wunder
> >>> >
> >>> > On Jun 24, 2013, at 12:47 PM, pradeep kumar wrote:
> >>> >
> >>> >> Hello everyone,
> >>> >>
> >>> >> Apart from text search, can we use Solr as data store to s

Re: Solr 4.3 problems with embedded jetty from maven cargo

2013-06-25 Thread Alexandre Rafalovitch
On Tue, Jun 25, 2013 at 11:03 AM, Daniel Exner  wrote:
> I'm currently trying to adapt my Solr Maven project to version
> 4.3 but it keeps complaining about missing SLF4J jars.

Have you gone through the page describing logging issues:
https://wiki.apache.org/solr/SolrLogging . It has been a long time
since I looked at Maven, so I cannot tell if your problem is pre- or
post- the fixes the wiki suggests.

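For background: starting with Solr 4.3 the war no longer bundles the SLF4J
binding jars, so they have to be on the servlet container's classpath rather
than inside the webapp. If the cargo container dependencies above are not
reaching the embedded Jetty, one thing to try is declaring them as ordinary
project dependencies as well (versions here are illustrative):

    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.7.5</version>
    </dependency>
    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>1.2.16</version>
    </dependency>
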
Regards,
   Alex.


Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


RE: Several Machines Communication Failure

2013-06-25 Thread Ophir Michaeli
Solr Version: 4.3
Solr Cloud

Machine 1: 
running 2 shards - 
shard 1: java -Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
shard 2: java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar

Machine 2:
Running 2 replicas - 
Shard 1 replica: java -Djetty.port=8900 -DzkHost=machine_1_IP:9983 -jar
start.jar
Shard 2 replica: java -Djetty.port=7500 -DzkHost= machine_1_IP:9983 -jar
start.jar

Replica 1 is identified in the cloud.
Replica 2 has an error:  

ERROR - 2013-06-25 15:17:27.807; org.apache.solr.common.SolrException; 
Error while trying to recover.
core=collection1:org.apache.solr.client.solrj.SolrServerException: 
Server refused connection at: MACHINE_1_IP:7574/solr
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
        at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:202)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:346)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)

Thanks

-Original Message-
From: Jan Høydahl [mailto:jan@cominvent.com] 
Sent: Tuesday, June 25, 2013 3:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Several Machines Communication Failure

Hi,

We cannot help you based on this brief email.

Please provide a much more detailed description. Version of Solr, SolrCloud
or not. How exactly have you done this move? Relevant configuration
snippets, relevant log snippets of what goes wrong...

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

25. juni 2013 kl. 12:36 skrev "Ophir Michaeli" :

> Hi,
> 
> I have a 2 Solr shards and 2 replicas running on the same machine ok.
> When I try to put each shard/replica on another machine (and set the ips
> accordingly) it fails, or work slowly, and fails sometimes.
> Any explanation for this behavior? 
> 
> Thanks
> 



Re: Can we use Solr to serve data to web analytics & Dashboard charts

2013-06-25 Thread pradeep kumar
Well.. just FYI: 10 lakhs in each normalized table. Won't the query time be
big if they are linked together? And the data is growing.
On 25 Jun 2013 20:49, "Walter Underwood"  wrote:

> With only 10K records (lahks), a regular RDBMS should be just fine. I
> don't see any need for Solr with a small dataset like that.
>
> Increase the caches sizes on your RDBMS so that all the tables fit in
> memory. Even with 10Kbytes per record, that is only 100Mbytes of data.
>
> wunder
>
> On Jun 25, 2013, at 7:28 AM, Otis Gospodnetic wrote:
>
> > Hi Pradeep,
> >
> > 5-6 hours between email and "Any help?" == "not enough patience" :)
> >
> > The advantage of something like Solr over RDBMS with star schema may
> > be that it is easier to scale horizontally than MySQL, or at least
> > that was the case I last looked at horizontal RDBMS partitioning.  But
> > if you are planning to have both RDBMS w/ star schema for reporting
> > AND Solr for reporting (via facets and such), that seems redundant.
> > You need just one of these two.
> >
> > Otis
> > --
> > Solr & ElasticSearch Support -- http://sematext.com/
> > Performance Monitoring -- http://sematext.com/spm
> >
> >
> >
> > On Tue, Jun 25, 2013 at 9:50 AM, pradeep kumar 
> wrote:
> >> Any help?
> >> On 25 Jun 2013 13:35, "pradeep kumar"  wrote:
> >>
> >>> Sure,
> >>>
> >>> First of all thanks a lot everyone for very quick reply.
> >>>
> >>> We have a Ordering system which has a lakhs of records so far in a
> >>> normalized RDBMS tables, say Order, Item, Details etc. We are planning
> to
> >>> have a offline database (star schema) and develop reports, data
> analytical
> >>> charts with drill down, dashboard with data from offline database.
> >>>
> >>> I am planning to propose solr as a solution instead of a offline
> database
> >>> ie through DIH to import data from DB into solr indexes. Since Solr
> indexes
> >>> are stored denormalized manner and querying is faster, faceting
> search, i
> >>> assumed that solr can be used to solve my requirement.
> >>>
> >>> Please correct me if i am wrong.
> >>>
> >>> Thanks,
> >>> Pradeep
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Jun 25, 2013 at 3:43 AM, Otis Gospodnetic <
> >>> otis.gospodne...@gmail.com> wrote:
> >>>
>  Yeah, perhaps yet people keep using it for this.  So, Pradeep, it
>  may work for you and if you share some numbers with us we may be able
>  to tell you "no way" or "very likely OK". :)
> 
> 
>  Otis
>  --
>  Solr & ElasticSearch Support -- http://sematext.com/
>  Performance Monitoring -- http://sematext.com/spm
> 
> 
> 
>  On Mon, Jun 24, 2013 at 4:14 PM, Walter Underwood <
> wun...@wunderwood.org>
>  wrote:
> > I expect it won't be fast enough for general use. Most analytics
> stores
>  implement functions inside the server to aggregate large amounts of
> data.
>  There is always some query that returns the whole database in order to
>  calculate an average.
> >
> > I'm sure it will work fine for some things and for small data sets,
> but
>  it probably won't scale for most real analytics applications.
> >
> > wunder
> >
> > On Jun 24, 2013, at 12:47 PM, pradeep kumar wrote:
> >
> >> Hello everyone,
> >>
> >> Apart from text search, can we use Solr as data store to serve data
> to
>  form
> >> analytics with drilldown charts or charts to add as widgets on
>  dashboards?
> >>
> >> Any suggestion, examples?
> >>
> >> Thanks,
> >> Pradeep
> >
> >
> >
> >
> 
> >>>
> >>>
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


Re: [solr cloud] solr hangs when indexing large number of documents from multiple threads

2013-06-25 Thread Vinay Pothnis
Jason and Scott,

Thanks for the replies and pointers!
Yes, I will consider the 'maxDocs' value as well. How do I monitor the
transaction logs during the interval between commits?

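For reference: the tlog files can be watched directly on disk under each core's
data/tlog directory; their count and total size between commits show how much
uncommitted data is accumulating. A hard autoCommit with both triggers might
look like this in solrconfig.xml (the numbers are illustrative, not a
recommendation):

    <autoCommit>
      <maxTime>30000</maxTime>    <!-- at most 30 seconds between hard commits -->
      <maxDocs>10000</maxDocs>    <!-- or every 10,000 docs, whichever comes first -->
      <openSearcher>false</openSearcher>
    </autoCommit>
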
Thanks
Vinay


On Mon, Jun 24, 2013 at 8:48 PM, Jason Hellman <
jhell...@innoventsolutions.com> wrote:

> Scott,
>
> My comment was meant to be a bit tongue-in-cheek, but my intent in the
> statement was to represent hard failure along the lines Vinay is seeing.
>  We're talking about OutOfMemoryException conditions, total cluster
> paralysis requiring restart, or other similar and disastrous conditions.
>
> Where that line is is impossible to generically define, but trivial to
> accomplish.  What any of us running Solr has to achieve is a realistic
> simulation of our desired production load (probably well above peak) and to
> see what limits are reached.  Armed with that information we tweak.  In
> this case, we look at finding the point where data ingestion reaches a
> natural limit.  For some that may be JVM GC, for others memory buffer size
> on the client load, and yet others it may be I/O limits on multithreaded
> reads from a database or file system.
>
> In old Solr days we had a little less to worry about.  We might play with
> a commitWithin parameter, ramBufferSizeMB tweaks, or contemplate partial
> commits and rollback recoveries.  But with 4.x we now have more durable
> write options and NRT to consider, and SolrCloud begs to use this.  So we
> have to consider transaction logs, the file handles they leave open until
> commit operations occur, and how we want to manage writing to all cores
> simultaneously instead of a more narrow master/slave relationship.
>
> It's all manageable, all predictable (with some load testing) and all
> filled with many possibilities to meet our specific needs.  Considering that
> each person's data model, ingestion pipeline, request processors, and field
> analysis steps will be different, 5 threads of input at face value doesn't
> really contemplate the whole problem.  We have to measure our actual data
> against our expectations and find where the weak chain links are to
> strengthen them.  The symptoms aren't necessarily predictable in advance of
> this testing, but they're likely addressable and not difficult to decipher.
>
> For what it's worth, SolrCloud is new enough that we're still experiencing
> some "uncharted territory with unknown ramifications" but with continued
> dialog through channels like these there are fewer territories without good
> cartography :)
>
> Hope that's of use!
>
> Jason
>
>
>
> On Jun 24, 2013, at 7:12 PM, Scott Lundgren <
> scott.lundg...@carbonblack.com> wrote:
>
> > Jason,
> >
> > Regarding your statement "push you over the edge"- what does that mean?
> > Does it mean "uncharted territory with unknown ramifications" or
> something
> > more like specific, known symptoms?
> >
> > I ask because our use is similar to Vinay's in some respects, and we want
> > to be able to push the capabilities of write perf - but not over the
> edge!
> > In particular, I am interested in knowing the symptoms of failure, to
> help
> > us troubleshoot the underlying problems if and when they arise.
> >
> > Thanks,
> >
> > Scott
> >
> > On Monday, June 24, 2013, Jason Hellman wrote:
> >
> >> Vinay,
> >>
> >> You may wish to pay attention to how many transaction logs are being
> >> created along the way to your hard autoCommit, which should truncate the
> >> open handles for those files.  I might suggest setting a maxDocs value
> in
> >> parallel with your maxTime value (you can use both) to ensure the commit
> >> occurs at either breakpoint.  30 seconds is plenty of time for 5
> parallel
> >> processes of 20 document submissions to push you over the edge.
> >>
> >> Jason
> >>
> >> On Jun 24, 2013, at 2:21 PM, Vinay Pothnis  wrote:
> >>
> >>> I have 'softAutoCommit' at 1 second and 'hardAutoCommit' at 30 seconds.
> >>>
> >>> On Mon, Jun 24, 2013 at 1:54 PM, Jason Hellman <
> >>> jhell...@innoventsolutions.com> wrote:
> >>>
>  Vinay,
> 
>  What autoCommit settings do you have for your indexing process?
> 
>  Jason
> 
>  On Jun 24, 2013, at 1:28 PM, Vinay Pothnis  wrote:
> 
> > Here is the ulimit -a output:
> >
> > core file size          (blocks, -c)  0
> > data seg size           (kbytes, -d)  unlimited
> > scheduling priority             (-e)  0
> > file size               (blocks, -f)  unlimited
> > pending signals                 (-i)  179963
> > max locked memory       (kbytes, -l)  64
> > max memory size         (kbytes, -m)  unlimited
> > open files                      (-n)  32769
> > pipe size            (512 bytes, -p)  8
> > POSIX message queues     (bytes, -q)  819200
> > real-time priority              (-r)  0
> > stack size              (kbytes, -s)  10240
> > cpu time               (seconds, -t)  unlimited
> > max user processes              (-u)  14
> > virtual memory          (kbytes, -v)  unlimited

SOLR online reference document - WIKI

2013-06-25 Thread Learner
I just came across a wonderful online reference wiki for SOLR and thought of
sharing it with the community..

https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-online-reference-document-WIKI-tp4073110.html
Sent from the Solr - User mailing list archive at Nabble.com.


how to replicate Solr Cloud

2013-06-25 Thread Kevin Osborn
We are going to have two datacenters, each with their own SolrCloud and
ZooKeeper quorums. The end result will be that they should be replicas of
each other.

One method that has been mentioned is that we should add documents to each
cluster separately. For various reasons, this may not be ideal for us.
Instead, we are playing around with the idea of always indexing to one
datacenter. And then having that replicate to the other datacenter. And
this is where I am having some trouble on how to proceed.

The nice thing about SolrCloud is that there are no masters and slaves. Each
node is equal, has the same configs, etc. But in this case, I want to have
a node in one datacenter poll for changes in another data center. Before
SolrCloud, I would have used slave/master replication. But in the SolrCloud
world, I am not sure how to configure this setup?

Or is there any better ideas on how to use replication to push or pull data
from one datacenter to another?

In my case, NRT is not a requirement. And I will also be dealing with about
3 collections and 5 or 6 shards.

Thanks.

-- 
*KEVIN OSBORN*
LEAD SOFTWARE ENGINEER
CNET Content Solutions
OFFICE 949.399.8714
CELL 949.310.4677  SKYPE osbornk
5 Park Plaza, Suite 600, Irvine, CA 92614
[image: CNET Content Solutions]


Is there a way to capture div tag by id?

2013-06-25 Thread eShard
let's say I have a div with id="myDiv"
Is there a way to set up the solr update/extract handler to capture just that
particular div?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-capture-div-tag-by-id-tp4073120.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can we use Solr to serve data to web analytics & Dashboard charts

2013-06-25 Thread Shawn Heisey

On 6/25/2013 9:19 AM, Walter Underwood wrote:

With only 10K records (lakhs), a regular RDBMS should be just fine. I don't see
any need for Solr with a small dataset like that.

Increase the cache sizes on your RDBMS so that all the tables fit in memory.
Even with 10Kbytes per record, that is only 100Mbytes of data.


Lakh is 10^5 (100,000), so 10 of them is one million.

http://en.wikipedia.org/wiki/Lakh

Thanks,
Shawn



Re: how to replicate Solr Cloud

2013-06-25 Thread Otis Gospodnetic
I think what is needed is a Leader that, while being a Leader for its
own Slice in its local Cluster and Collection (I think I'm using all
the latest terminology correctly here), is at the same time a Replica
of its own Leader counterpart in the "Primary Cluster".

Not currently possible, AFAIK.
Or maybe there is a better way?

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Tue, Jun 25, 2013 at 1:07 PM, Kevin Osborn  wrote:
> We are going to have two datacenters, each with their own SolrCloud and
> ZooKeeper quorums. The end result will be that they should be replicas of
> each other.
>
> One method that has been mentioned is that we should add documents to each
> cluster separately. For various reasons, this may not be ideal for us.
> Instead, we are playing around with the idea of always indexing to one
> datacenter. And then having that replicate to the other datacenter. And
> this is where I am having some trouble on how to proceed.
>
> The nice thing about SolrCloud is that there is no masters and slaves. Each
> node is equals, has the same configs, etc. But in this case, I want to have
> a node in one datacenter poll for changes in another data center. Before
> SolrCloud, I would have used slave/master replication. But in the SolrCloud
> world, I am not sure how to configure this setup?
>
> Or is there any better ideas on how to use replication to push or pull data
> from one datacenter to another?
>
> In my case, NRT is not a requirement. And I will also be dealing with about
> 3 collections and 5 or 6 shards.
>
> Thanks.
>
> --
> *KEVIN OSBORN*
> LEAD SOFTWARE ENGINEER
> CNET Content Solutions
> OFFICE 949.399.8714
> CELL 949.310.4677  SKYPE osbornk
> 5 Park Plaza, Suite 600, Irvine, CA 92614
> [image: CNET Content Solutions]


Re: Can we use Solr to serve data to web analytics & Dashboard charts

2013-06-25 Thread pradeep kumar
So Solr is not a solution for my requirement? Please let me know.

Thanks
Pradeep


On Tue, Jun 25, 2013 at 10:52 PM, Shawn Heisey  wrote:

> On 6/25/2013 9:19 AM, Walter Underwood wrote:
>
>> With only 10K records (lahks), a regular RDBMS should be just fine. I
>> don't see any need for Solr with a small dataset like that.
>>
>> Increase the caches sizes on your RDBMS so that all the tables fit in
>> memory. Even with 10Kbytes per record, that is only 100Mbytes of data.
>>
>
> Lakh is 10^5 (100,000), so 10 of them is one million.
>
> http://en.wikipedia.org/wiki/Lakh
>
> Thanks,
> Shawn
>
>


Re: Can we use Solr to serve data to web analytics & Dashboard charts

2013-06-25 Thread Walter Underwood
Sorry, thinking of "man" from Japanese, which is 10K.

Using language-specific numbers in an international forum is not a good idea.

wunder

On Jun 25, 2013, at 10:22 AM, Shawn Heisey wrote:

> On 6/25/2013 9:19 AM, Walter Underwood wrote:
>> With only 10K records (lahks), a regular RDBMS should be just fine. I don't 
>> see any need for Solr with a small dataset like that.
>> 
>> Increase the caches sizes on your RDBMS so that all the tables fit in 
>> memory. Even with 10Kbytes per record, that is only 100Mbytes of data.
> 
> Lakh is 10^5 (100,000), so 10 of them is one million.
> 
> http://en.wikipedia.org/wiki/Lakh
> 
> Thanks,
> Shawn
> 






Re: Can we use Solr to serve data to web analytics & Dashboard charts

2013-06-25 Thread Walter Underwood
We do not know whether Solr will work for you. The only way to find out is to 
build it and try.

You already have a solution that works. Use that.

wunder

On Jun 25, 2013, at 10:28 AM, pradeep kumar wrote:

> Solr is not a solution for my requirement? Please let me know
> 
> Thanks
> Pradeep
> 
> 
> On Tue, Jun 25, 2013 at 10:52 PM, Shawn Heisey  wrote:
> 
>> On 6/25/2013 9:19 AM, Walter Underwood wrote:
>> 
>>> With only 10K records (lahks), a regular RDBMS should be just fine. I
>>> don't see any need for Solr with a small dataset like that.
>>> 
>>> Increase the caches sizes on your RDBMS so that all the tables fit in
>>> memory. Even with 10Kbytes per record, that is only 100Mbytes of data.
>>> 
>> 
>> Lakh is 10^5 (100,000), so 10 of them is one million.
>> 
>> http://en.wikipedia.org/wiki/Lakh
>> 
>> Thanks,
>> Shawn
>> 
>> 

--
Walter Underwood
wun...@wunderwood.org





Re: Need Help in migrating Solr version 1.4 to 4.3

2013-06-25 Thread Erick Erickson
bq: I'm not sure if Solr 4.3 will be able to read Solr 1.4 indexes

Solr/Lucene explicitly try to read _one_ major revision backwards.
Solr 3.x should be able to read 1.4 indexes. Solr 4.x should be
able to read Solr 3.x. No attempt is made to allow Solr 4.x to read
Solr 1.4 indexes, so I wouldn't even try.

Shalin's comment is best. If at all possible I'd just forget about
reading the old index and re-index from scratch. But if you _do_
try upgrading 1.4 -> 3.x -> 4.x, you probably want to optimize
at each step. That'll (I think) rewrite all the segments in the
current format.

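If you go that route, an optimize can be triggered over HTTP against each core,
e.g. (host and core name are illustrative):

    curl 'http://localhost:8983/solr/collection1/update?optimize=true'
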
Good luck!
Erick

On Tue, Jun 25, 2013 at 12:59 AM, Shalin Shekhar Mangar
 wrote:
> You must carefully go through the upgrade instructions starting from
> 1.4 upto 4.3. In particular the instructions for 1.4 to 3.1 and from
> 3.1 to 4.0 should be given special attention.
>
> On Tue, Jun 25, 2013 at 11:43 AM, Sandeep Gupta  wrote:
>> Hello All,
>>
>> We are planning to migrate solr 1.4 to Solr 4.3 version.
>> And I am seeking some help in this side.
>>
>> Considering Schema file change:
>> By default there are lots of changes if I compare original Solr 1.4 schema
>> file to Sol 4.3 schema file.
>> And that is the reason we are not copying paste of schema file.
>> In our Solr 1.4 schema implementation, we have some custom fields with type
>> "textgen" and "text"
>> So in migration of these custom fields to Solr 4.3,  should I use type of
>> "text_general" as replacement of "textgen" and
>> "text_en" as replacement of "text"?
>> Please confirm the same.
>
> Please check the text_general definition in 4.3 against the textgen
> fieldtype in Solr 1.4 to see if they're equivalent. Same for text_en
> and text.
>
>>
>> Considering Solrconfig change:
>> As we didn't have lots of changes in 1.4 solrconfig file except the
>> dataimport request handler.
>> And therefore in migration side, we are simply modifying the Solr 4.3
>> solrconfig file with his request handler.
>
> And you need to add the dataimporthandler jar into Solr's lib
> directory. DIH is not added automatically anymore.
>
>>
>> Considering the application development:
>>
>> We used all the queries as BOOLEAN type style (was not good)  I mean put
>> all the parameter in query fields i.e
>> *:* AND EntityName: <<>> AND : AND .
>>
>> I think we should simplify our queries using other fields like df, qf 
>>
>
> Probably. AND queries are best done by filter queries (fq).
>
>> We also used to create Solr server object via CommonsHttpSolrServer() so I
>> am planning to use now HttpSolrServer API>
>
> Yes. Also, there was a compatibility break between Solr 1.4 and 3.1 in
> the javabin format so old clients using javabin won't be able to
> communicate with Solr until you upgrade both solr client and solr
> servers.
>
>>
>> Please let me know the suggestion for above points also what are the other
>> factors I need to take care while considering the migration.
>
> There is no substitute for reading the upgrade sections in the changes.txt.
>
> I'm not sure if Solr 4.3 will be able to read Solr 1.4 indexes. You
> will most likely need to re-index your documents.
>
> You should also think about switching to SolrCloud to take advantage
> of its features.
>
> --
> Regards,
> Shalin Shekhar Mangar.

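To illustrate the filter-query point above: a lucene-style boolean query such as

    q=*:* AND EntityName:order AND type:detail

(field names invented for the example) is usually better expressed as

    q=*:*&fq=EntityName:order&fq=type:detail

since each fq clause is cached independently and reused across queries.
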

Re: Shard identification

2013-06-25 Thread Erick Erickson
Try sending requests to your shards with &distrib=false. See if the
results agree with the SolrCloud graph or whether the docs you get
back are inconsistent with the shard labels in the admin page. The
&distrib=false bit keeps the query from going to other shards and
will tell you if the current state is consistent or not.

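For example (host, port and core name are illustrative):

    http://localhost:8983/solr/collection1/select?q=*:*&distrib=false
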
Best
Erick

On Tue, Jun 25, 2013 at 1:02 AM, Shalin Shekhar Mangar
 wrote:
> Firstly, using 1 zookeeper machine is not at all ideal. See
> http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7
>
> I've never personally seen such an issue. Can you give screen shots of
> the cloud graph on each node? Use an image hosting service because the
> mailing list won't allow attachments.
>
> On Tue, Jun 18, 2013 at 2:07 PM, Ophir Michaeli  wrote:
>> Hi,
>>
>> I built a 2 shards and 2 replicas system that works ok on a local machine, 1
>> zookeeper on shard 1.
>> It appears ok on the Solr monitoring page, cloud tab
>> (http://localhost:8983/solr/#/~cloud).
>> When I move to using different machines, each shard/replica on a different
>> machine I get a wrong cloud-graph on the Solr monitoring page.
>> The machine that has Shard 2 appears on the graph on shard 1, and the
>> replicas are also mixed, shard 2 appears as 1 and shard 1 appears as 2.
>>
>> Any ideas why this happens?
>>
>> Thanks,
>> Ophir
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.


AW: Need Help in migrating Solr version 1.4 to 4.3

2013-06-25 Thread André Widhani
fwiw, I can confirm that Solr 4.x can definitely not read indexes created with 
1.4.

You'll get an exception like the following:

Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format version 
is not supported (resource: segment _16ofy in resource 
ChecksumIndexInput(MMapIndexInput(path="/var/opt/dcx/solr2/core-tex60l254lpachcjhtz4se-index2/data/index/segments_1dlof"))):
 2.x. This version of Lucene only supports indexes created with release 3.0 and 
later.

But as Erick mentioned, you could get away with optimizing the index with 3.x 
instead of re-indexing from scratch before moving on to 4.x - I think I did 
that once and it worked.

Regards,
André


From: Erick Erickson [erickerick...@gmail.com]
Sent: Tuesday, June 25, 2013 19:37
To: solr-user@lucene.apache.org
Subject: Re: Need Help in migrating Solr version 1.4 to 4.3

bq: I'm not sure if Solr 4.3 will be able to read Solr 1.4 indexes

Solr/Lucene explicitly try to read _one_ major revision backwards.
Solr 3.x should be able to read 1.4 indexes. Solr 4.x should be
able to read Solr 3.x. No attempt is made to allow Solr 4.x to read
Solr 1.4 indexes, so I wouldn't even try.

Shalin's comment is best. If at all possible I'd just forget about
reading the old index and re-index from scratch. But if you _do_
try upgrading 1.4 -> 3.x -> 4.x, you probably want to optimize
at each step. That'll (I think) rewrite all the segments in the
current format.

Good luck!
Erick

On Tue, Jun 25, 2013 at 12:59 AM, Shalin Shekhar Mangar
 wrote:
> You must carefully go through the upgrade instructions starting from
> 1.4 upto 4.3. In particular the instructions for 1.4 to 3.1 and from
> 3.1 to 4.0 should be given special attention.
>
> On Tue, Jun 25, 2013 at 11:43 AM, Sandeep Gupta  wrote:
>> Hello All,
>>
>> We are planning to migrate solr 1.4 to Solr 4.3 version.
>> And I am seeking some help in this side.
>>
>> Considering Schema file change:
>> By default there are lots of changes if I compare original Solr 1.4 schema
>> file to Sol 4.3 schema file.
>> And that is the reason we are not copying paste of schema file.
>> In our Solr 1.4 schema implementation, we have some custom fields with type
>> "textgen" and "text"
>> So in migration of these custom fields to Solr 4.3,  should I use type of
>> "text_general" as replacement of "textgen" and
>> "text_en" as replacement of "text"?
>> Please confirm the same.
>
> Please check the text_general definition in 4.3 against the textgen
> fieldtype in Solr 1.4 to see if they're equivalent. Same for text_en
> and text.
>
>>
>> Considering Solrconfig change:
>> As we didn't have lots of changes in 1.4 solrconfig file except the
>> dataimport request handler.
>> And therefore in migration side, we are simply modifying the Solr 4.3
>> solrconfig file with his request handler.
>
> And you need to add the dataimporthandler jar into Solr's lib
> directory. DIH is not added automatically anymore.
>
>>
>> Considering the application development:
>>
>> We used all the queries as BOOLEAN type style (was not good)  I mean put
>> all the parameter in query fields i.e
>> *:* AND EntityName: <<>> AND : AND .
>>
>> I think we should simplify our queries using other fields like df, qf 
>>
>
> Probably. AND queries are best done by filter queries (fq).
>
>> We also used to create Solr server object via CommonsHttpSolrServer() so I
>> am planning to use now HttpSolrServer API>
>
> Yes. Also, there was a compatibility break between Solr 1.4 and 3.1 in
> the javabin format so old clients using javabin won't be able to
> communicate with Solr until you upgrade both solr client and solr
> servers.
>
>>
>> Please let me know the suggestion for above points also what are the other
>> factors I need to take care while considering the migration.
>
> There is no substitute for reading the upgrade sections in the changes.txt.
>
> I'm not sure if Solr 4.3 will be able to read Solr 1.4 indexes. You
> will most likely need to re-index your documents.
>
> You should also think about switching to SolrCloud to take advantage
> of its features.
>
> --
> Regards,
> Shalin Shekhar Mangar.


Re: Solr indexer and Hadoop

2013-06-25 Thread Erick Erickson
You might be interested in following:
https://issues.apache.org/jira/browse/SOLR-4916

Best
Erick

On Tue, Jun 25, 2013 at 7:28 AM, Michael Della Bitta
 wrote:
> Jack,
>
> Sorry, but I don't agree that it's that cut and dried. I've very
> successfully worked with terabytes of data in Hadoop that was stored on an
> Isilon mounted via NFS, for example. In cases like this, you're using
> MapReduce purely for it's execution model (which existed far before Hadoop
> and HDFS ever did).
>
>
> Michael Della Bitta
>
> Applications Developer
>
> o: +1 646 532 3062  | c: +1 917 477 7906
>
> appinions inc.
>
> “The Science of Influence Marketing”
>
> 18 East 41st Street
>
> New York, NY 10017
>
> t: @appinions  | g+:
> plus.google.com/appinions
> w: appinions.com 
>
>
> On Tue, Jun 25, 2013 at 8:58 AM, Jack Krupansky 
> wrote:
>
>> ???
>>
>> Hadoop=HDFS
>>
>> If the data is not in Hadoop/HDFS, just use the normal Solr indexing
>> tools, including SolrCell and Data Import Handler, and possibly ManifoldCF.
>>
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: engy.morsy
>> Sent: Tuesday, June 25, 2013 8:10 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr indexer and Hadoop
>>
>>
>> Thank you Jack. So, I need to convert those nodes holding data to HDFS.
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Solr-indexer-and-Hadoop-tp4072951p4073013.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>


Re: how to replicate Solr Cloud

2013-06-25 Thread Jason Hellman
Kevin,

I can imagine this working if you consider your second data center a pure slave 
relationship to your SolrCloud cluster.  I haven't tried it, but I don't see 
why the solrconfig.xml can't identify as a master allowing you to call any of 
your cores in the cluster to replicate out.  That being said, this idea doesn't 
facilitate a SolrCloud cluster in the second data center…just a slave that 
could be a repeater.

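A sketch of what such a slave/repeater core's solrconfig.xml could contain (the
master URL and poll interval are illustrative):

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
      </lst>
      <lst name="slave">
        <str name="masterUrl">http://dc1-solr:8983/solr/collection1</str>
        <str name="pollInterval">00:05:00</str>
      </lst>
    </requestHandler>
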
You say that sending the data in both directions is not ideal, but it works and 
is conceptually very simple.  What is the reasoning behind wanting to get away 
from that approach?

Jason

On Jun 25, 2013, at 10:07 AM, Kevin Osborn  wrote:

> We are going to have two datacenters, each with their own SolrCloud and
> ZooKeeper quorums. The end result will be that they should be replicas of
> each other.
> 
> One method that has been mentioned is that we should add documents to each
> cluster separately. For various reasons, this may not be ideal for us.
> Instead, we are playing around with the idea of always indexing to one
> datacenter. And then having that replicate to the other datacenter. And
> this is where I am having some trouble on how to proceed.
> 
> The nice thing about SolrCloud is that there is no masters and slaves. Each
> node is equals, has the same configs, etc. But in this case, I want to have
> a node in one datacenter poll for changes in another data center. Before
> SolrCloud, I would have used slave/master replication. But in the SolrCloud
> world, I am not sure how to configure this setup?
> 
> Or is there any better ideas on how to use replication to push or pull data
> from one datacenter to another?
> 
> In my case, NRT is not a requirement. And I will also be dealing with about
> 3 collections and 5 or 6 shards.
> 
> Thanks.
> 
> -- 
> *KEVIN OSBORN*
> LEAD SOFTWARE ENGINEER
> CNET Content Solutions
> OFFICE 949.399.8714
> CELL 949.310.4677  SKYPE osbornk
> 5 Park Plaza, Suite 600, Irvine, CA 92614
> [image: CNET Content Solutions]



Re: how to replicate Solr Cloud

2013-06-25 Thread Kevin Osborn
Otis,

I did actually stumble upon this link.

http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/74870

This was from you. You were attempting to replicate data from SolrCloud to
some other slaves for heavy-duty queries. You said that you accomplished
this. Can you provide a few pointers on how you did this? Thanks.


On Tue, Jun 25, 2013 at 10:25 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> I think what is needed is a Leader that, while being a Leader for its
> own Slice in its local Cluster and Collection (I think I'm using all
> the latest terminology correctly here), is at the same time a Replica
> of its own Leader counterpart in the "Primary Cluster".
>
> Not currently possible, AFAIK.
> Or maybe there is a better way?
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Tue, Jun 25, 2013 at 1:07 PM, Kevin Osborn 
> wrote:
> > We are going to have two datacenters, each with their own SolrCloud and
> > ZooKeeper quorums. The end result will be that they should be replicas of
> > each other.
> >
> > One method that has been mentioned is that we should add documents to
> each
> > cluster separately. For various reasons, this may not be ideal for us.
> > Instead, we are playing around with the idea of always indexing to one
> > datacenter. And then having that replicate to the other datacenter. And
> > this is where I am having some trouble on how to proceed.
> >
> > The nice thing about SolrCloud is that there is no masters and slaves.
> Each
> > node is equals, has the same configs, etc. But in this case, I want to
> have
> > a node in one datacenter poll for changes in another data center. Before
> > SolrCloud, I would have used slave/master replication. But in the
> SolrCloud
> > world, I am not sure how to configure this setup?
> >
> > Or is there any better ideas on how to use replication to push or pull
> data
> > from one datacenter to another?
> >
> > In my case, NRT is not a requirement. And I will also be dealing with
> about
> > 3 collections and 5 or 6 shards.
> >
> > Thanks.
> >
> > --
> > *KEVIN OSBORN*
> > LEAD SOFTWARE ENGINEER
> > CNET Content Solutions
> > OFFICE 949.399.8714
> > CELL 949.310.4677  SKYPE osbornk
> > 5 Park Plaza, Suite 600, Irvine, CA 92614
> > [image: CNET Content Solutions]
>



-- 
*KEVIN OSBORN*
LEAD SOFTWARE ENGINEER
CNET Content Solutions
OFFICE 949.399.8714
CELL 949.310.4677  SKYPE osbornk
5 Park Plaza, Suite 600, Irvine, CA 92614
[image: CNET Content Solutions]


Re: Solr indexer and Hadoop

2013-06-25 Thread Michael Della Bitta
zomghowcanihelp? :)

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Tue, Jun 25, 2013 at 2:08 PM, Erick Erickson wrote:

> You might be interested in following:
> https://issues.apache.org/jira/browse/SOLR-4916
>
> Best
> Erick
>
> On Tue, Jun 25, 2013 at 7:28 AM, Michael Della Bitta
>  wrote:
> > Jack,
> >
> > Sorry, but I don't agree that it's that cut and dried. I've very
> > successfully worked with terabytes of data in Hadoop that was stored on
> an
> > Isilon mounted via NFS, for example. In cases like this, you're using
> > MapReduce purely for it's execution model (which existed far before
> Hadoop
> > and HDFS ever did).
> >
> >
> > Michael Della Bitta
> >
> > Applications Developer
> >
> > o: +1 646 532 3062  | c: +1 917 477 7906
> >
> > appinions inc.
> >
> > “The Science of Influence Marketing”
> >
> > 18 East 41st Street
> >
> > New York, NY 10017
> >
> > t: @appinions  | g+:
> > plus.google.com/appinions
> > w: appinions.com 
> >
> >
> > On Tue, Jun 25, 2013 at 8:58 AM, Jack Krupansky  >wrote:
> >
> >> ???
> >>
> >> Hadoop=HDFS
> >>
> >> If the data is not in Hadoop/HDFS, just use the normal Solr indexing
> >> tools, including SolrCell and Data Import Handler, and possibly
> ManifoldCF.
> >>
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: engy.morsy
> >> Sent: Tuesday, June 25, 2013 8:10 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Solr indexer and Hadoop
> >>
> >>
> >> Thank you Jack. So, I need to convert those nodes holding data to HDFS.
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/Solr-indexer-and-Hadoop-tp4072951p4073013.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
>


Re: how to replicate Solr Cloud

2013-06-25 Thread Kevin Osborn
Jason,

My initial reluctance about indexing directly to both data centers is that we
are doing a lot of bulk loading through the CSV handler. We never get just 1
document at a time. It comes in large batch updates. And now we would have
to send the batch updates twice.

That is not to say that we won't go this way. But I am exploring other
solutions as well.
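
For reference, the classic pre-SolrCloud wiring that Jason suggests below
goes through the ReplicationHandler in solrconfig.xml. A minimal sketch
(host name, core name and poll interval are made-up placeholders):

On the indexing datacenter:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
  </requestHandler>

On the other datacenter (a slave, or repeater, polling the remote master):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://dc1-node1:8983/solr/core1/replication</str>
      <str name="pollInterval">00:05:00</str>
    </lst>
  </requestHandler>

Whether this coexists cleanly with a SolrCloud cluster on the receiving side
is exactly the untested part Jason mentions.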


On Tue, Jun 25, 2013 at 11:21 AM, Jason Hellman <
jhell...@innoventsolutions.com> wrote:

> Kevin,
>
> I can imagine this working if you consider your second data center a pure
> slave relationship to your SolrCloud cluster.  I haven't tried it, but I
> don't see why the solrconfig.xml can't identify as a master allowing you to
> call any of your cores in the cluster to replicate out.  That being said,
> this idea doesn't facilitate a SolrCloud cluster in the second data
> center…just a slave that could be a repeater.
>
> You say that sending the data in both directions is not ideal, but it works
> and is conceptually very simple.  What is the reasoning behind wanting to
> get away from that approach?
>
> Jason
>
> On Jun 25, 2013, at 10:07 AM, Kevin Osborn  wrote:
>
> > We are going to have two datacenters, each with their own SolrCloud and
> > ZooKeeper quorums. The end result will be that they should be replicas of
> > each other.
> >
> > One method that has been mentioned is that we should add documents to
> each
> > cluster separately. For various reasons, this may not be ideal for us.
> > Instead, we are playing around with the idea of always indexing to one
> > datacenter. And then having that replicate to the other datacenter. And
> > this is where I am having some trouble on how to proceed.
> >
> > The nice thing about SolrCloud is that there are no masters and slaves.
> > Each node is equal, has the same configs, etc. But in this case, I want
> > to have a node in one datacenter poll for changes in another data
> > center. Before SolrCloud, I would have used slave/master replication.
> > But in the SolrCloud world, I am not sure how to configure this setup.
> >
> > Or are there any better ideas on how to use replication to push or pull
> > data from one datacenter to another?
> >
> > In my case, NRT is not a requirement. And I will also be dealing with
> about
> > 3 collections and 5 or 6 shards.
> >
> > Thanks.
> >
> > --
> > *KEVIN OSBORN*
> > LEAD SOFTWARE ENGINEER
> > CNET Content Solutions
> > OFFICE 949.399.8714
> > CELL 949.310.4677  SKYPE osbornk
> > 5 Park Plaza, Suite 600, Irvine, CA 92614
> > [image: CNET Content Solutions]
>
>


-- 
*KEVIN OSBORN*
LEAD SOFTWARE ENGINEER
CNET Content Solutions
OFFICE 949.399.8714
CELL 949.310.4677  SKYPE osbornk
5 Park Plaza, Suite 600, Irvine, CA 92614
[image: CNET Content Solutions]


Re: how to replicate Solr Cloud

2013-06-25 Thread Walter Underwood
Also, you have to track two sets of batches, failures, and retries.  --wunder


On Jun 25, 2013, at 11:30 AM, Kevin Osborn wrote:

> Jason,
> 
> My initial reluctance about indexing directly to both data centers is that we
> are doing a lot of bulk loading through the CSV handler. We never get just 1
> document at a time. It comes in large batch updates. And now we would have
> to send the batch updates twice.
> 
> That is not to say that we won't go this way. But I am exploring other
> solutions as well.
> 
> 
> On Tue, Jun 25, 2013 at 11:21 AM, Jason Hellman <
> jhell...@innoventsolutions.com> wrote:
> 
>> Kevin,
>> 
>> I can imagine this working if you consider your second data center a pure
>> slave relationship to your SolrCloud cluster.  I haven't tried it, but I
>> don't see why the solrconfig.xml can't identify as a master allowing you to
>> call any of your cores in the cluster to replicate out.  That being said,
>> this idea doesn't facilitate a SolrCloud cluster in the second data
>> center…just a slave that could be a repeater.
>> 
>> You say that sending the data in both directions is not ideal, but it works
>> and is conceptually very simple.  What is the reasoning behind wanting to
>> get away from that approach?
>> 
>> Jason
>> 
>> On Jun 25, 2013, at 10:07 AM, Kevin Osborn  wrote:
>> 
>>> We are going to have two datacenters, each with their own SolrCloud and
>>> ZooKeeper quorums. The end result will be that they should be replicas of
>>> each other.
>>> 
>>> One method that has been mentioned is that we should add documents to
>> each
>>> cluster separately. For various reasons, this may not be ideal for us.
>>> Instead, we are playing around with the idea of always indexing to one
>>> datacenter. And then having that replicate to the other datacenter. And
>>> this is where I am having some trouble on how to proceed.
>>> 
>>> The nice thing about SolrCloud is that there are no masters and slaves.
>>> Each node is equal, has the same configs, etc. But in this case, I want
>>> to have a node in one datacenter poll for changes in another data
>>> center. Before SolrCloud, I would have used slave/master replication.
>>> But in the SolrCloud world, I am not sure how to configure this setup.
>>> 
>>> Or are there any better ideas on how to use replication to push or pull
>>> data from one datacenter to another?
>>> 
>>> In my case, NRT is not a requirement. And I will also be dealing with
>> about
>>> 3 collections and 5 or 6 shards.
>>> 
>>> Thanks.
>>> 
>>> --
>>> *KEVIN OSBORN*
>>> LEAD SOFTWARE ENGINEER
>>> CNET Content Solutions
>>> OFFICE 949.399.8714
>>> CELL 949.310.4677  SKYPE osbornk
>>> 5 Park Plaza, Suite 600, Irvine, CA 92614
>>> [image: CNET Content Solutions]
>> 
>> 
> 
> 
> -- 
> *KEVIN OSBORN*
> LEAD SOFTWARE ENGINEER
> CNET Content Solutions
> OFFICE 949.399.8714
> CELL 949.310.4677  SKYPE osbornk
> 5 Park Plaza, Suite 600, Irvine, CA 92614
> [image: CNET Content Solutions]

--
Walter Underwood
wun...@wunderwood.org





Re: Name of the couple of popular app/web sites using Solr as search engine

2013-06-25 Thread Learner
Check the list here..

http://wiki.apache.org/solr/PublicServers



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Name-of-the-couple-of-popular-app-web-sites-using-solar-as-search-engine-tp4073157p4073160.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to replicate Solr Cloud

2013-06-25 Thread Otis Gospodnetic
Uh, I remember that email, but can't recall where we did it. I will
try to recall it some more and reply if I can manage to dig it out of
my brain...

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Tue, Jun 25, 2013 at 2:24 PM, Kevin Osborn  wrote:
> Otis,
>
> I did actually stumble upon this link.
>
> http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/74870
>
> This was from you. You were attempting to replicate data from SolrCloud to
> some other slaves for heavy-duty queries. You said that you accomplished
> this. Can you provide a few pointers on how you did this? Thanks.
>
>
> On Tue, Jun 25, 2013 at 10:25 AM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
>> I think what is needed is a Leader that, while being a Leader for its
>> own Slice in its local Cluster and Collection (I think I'm using all
>> the latest terminology correctly here), is at the same time a Replica
>> of its own Leader counterpart in the "Primary Cluster".
>>
>> Not currently possible, AFAIK.
>> Or maybe there is a better way?
>>
>> Otis
>> --
>> Solr & ElasticSearch Support -- http://sematext.com/
>> Performance Monitoring -- http://sematext.com/spm
>>
>>
>>
>> On Tue, Jun 25, 2013 at 1:07 PM, Kevin Osborn 
>> wrote:
>> > We are going to have two datacenters, each with their own SolrCloud and
>> > ZooKeeper quorums. The end result will be that they should be replicas of
>> > each other.
>> >
>> > One method that has been mentioned is that we should add documents to
>> each
>> > cluster separately. For various reasons, this may not be ideal for us.
>> > Instead, we are playing around with the idea of always indexing to one
>> > datacenter. And then having that replicate to the other datacenter. And
>> > this is where I am having some trouble on how to proceed.
>> >
>> > The nice thing about SolrCloud is that there are no masters and slaves.
>> > Each node is equal, has the same configs, etc. But in this case, I want
>> > to have a node in one datacenter poll for changes in another data
>> > center. Before SolrCloud, I would have used slave/master replication.
>> > But in the SolrCloud world, I am not sure how to configure this setup.
>> >
>> > Or are there any better ideas on how to use replication to push or pull
>> > data from one datacenter to another?
>> >
>> > In my case, NRT is not a requirement. And I will also be dealing with
>> about
>> > 3 collections and 5 or 6 shards.
>> >
>> > Thanks.
>> >
>> > --
>> > *KEVIN OSBORN*
>> > LEAD SOFTWARE ENGINEER
>> > CNET Content Solutions
>> > OFFICE 949.399.8714
>> > CELL 949.310.4677  SKYPE osbornk
>> > 5 Park Plaza, Suite 600, Irvine, CA 92614
>> > [image: CNET Content Solutions]
>>
>
>
>
> --
> *KEVIN OSBORN*
> LEAD SOFTWARE ENGINEER
> CNET Content Solutions
> OFFICE 949.399.8714
> CELL 949.310.4677  SKYPE osbornk
> 5 Park Plaza, Suite 600, Irvine, CA 92614
> [image: CNET Content Solutions]


Re: Name of the couple of popular app/web sites using Solr as search engine

2013-06-25 Thread soumikghosh05
Thanks a lot.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Name-of-the-couple-of-popular-app-web-sites-using-solar-as-search-engine-tp4073157p4073162.html
Sent from the Solr - User mailing list archive at Nabble.com.


Name of the couple of popular app/web sites using Solr as search engine

2013-06-25 Thread soumikghosh05
Hi All,

I am planning to use Solr as a search solution for the new application of my
company.

Can anyone give me a couple of names of popular web sites/applications
where Solr is being used as the search solution? I know Eclipse is using Solr.

It will help me to convince people.

Thanks in Advance,
Soumik Ghosh





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Name-of-the-couple-of-popular-app-web-sites-using-solar-as-search-engine-tp4073157.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Name of the couple of popular app/web sites using Solr as search engine

2013-06-25 Thread Otis Gospodnetic
How much time have you got? :)

http://wiki.apache.org/solr/PublicServers

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Tue, Jun 25, 2013 at 2:45 PM, soumikghosh05
 wrote:
> Hi All,
>
> I am planning to use Solr as a search solution for the new application of my
> company.
>
> Can anyone give me a couple of names of popular web sites/applications
> where Solr is being used as the search solution? I know Eclipse is using Solr.
>
> It will help me to convince people.
>
> Thanks in Advance,
> Soumik Ghosh
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Name-of-the-couple-of-popular-app-web-sites-using-solar-as-search-engine-tp4073157.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Common practice for free text field

2013-06-25 Thread Manuel Le Normand
My schema contains about a hundred fields of various types (int,
strings, plain text, emails).
I was wondering what the common practice is for searching free text over
the index. Assuming there are no boosts related to field matching, these
are the options I see:

   1. Index and query an "all_fields" copyField (source=*)
      1. advantage - only one query flow against a single index
      2. disadvantage - the tokenizing is not necessarily adapted to this
      kind of field, and it requires more storage and memory
   2. Field aliasing (f.myalias.qf=realfield)
      1. advantage - the opposite of the above
      2. disadvantage - a single query term would query 100 different
      fields; a multi-term query might be a serious performance issue

Any common practices?
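
For reference, the schema.xml side of option 1 is just a catch-all field
plus a glob copyField. A minimal sketch (the field name and field type here
are assumptions):

  <field name="all_fields" type="text_general" indexed="true"
         stored="false" multiValued="true"/>
  <copyField source="*" dest="all_fields"/>

all_fields then becomes the single field to query, at the storage and
memory cost described in the disadvantage above.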


Re: Common practice for free text field

2013-06-25 Thread Manuel Le Normand
By field aliasing I meant something like: f.all_fields.qf=*_txt+*_s+*_int
that would sum up to 100 fields


On Wed, Jun 26, 2013 at 12:00 AM, Manuel Le Normand <
manuel.lenorm...@gmail.com> wrote:

> My schema contains about a hundred fields of various types (int,
> strings, plain text, emails).
> I was wondering what the common practice is for searching free text over
> the index. Assuming there are no boosts related to field matching, these
> are the options I see:
>
>    1. Index and query an "all_fields" copyField (source=*)
>       1. advantage - only one query flow against a single index
>       2. disadvantage - the tokenizing is not necessarily adapted to this
>       kind of field, and it requires more storage and memory
>    2. Field aliasing (f.myalias.qf=realfield)
>       1. advantage - the opposite of the above
>       2. disadvantage - a single query term would query 100 different
>       fields; a multi-term query might be a serious performance issue
>
> Any common practices?
>
>


Re: Common practice for free text field

2013-06-25 Thread Otis Gospodnetic
Hi,

Look up the edismax parser on the Wiki.  The advantage of using it is that
you can set different weights on different fields (qf param) and boost
phrase/shingle matches (pf/pf2/pf3 params).

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
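
A minimal sketch of what Otis suggests (the field names and weights are
hypothetical; qf spreads the query over weighted fields, pf/pf2/pf3 boost
phrase and shingle matches):

  q=some free text
  &defType=edismax
  &qf=title^5 body^2 comments
  &pf=title^10
  &pf2=body^3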



On Tue, Jun 25, 2013 at 5:00 PM, Manuel Le Normand
 wrote:
> My schema contains about a hundred fields of various types (int,
> strings, plain text, emails).
> I was wondering what the common practice is for searching free text over
> the index. Assuming there are no boosts related to field matching, these
> are the options I see:
>
>    1. Index and query an "all_fields" copyField (source=*)
>       1. advantage - only one query flow against a single index
>       2. disadvantage - the tokenizing is not necessarily adapted to this
>       kind of field, and it requires more storage and memory
>    2. Field aliasing (f.myalias.qf=realfield)
>       1. advantage - the opposite of the above
>       2. disadvantage - a single query term would query 100 different
>       fields; a multi-term query might be a serious performance issue
>
> Any common practices?


Result Grouping

2013-06-25 Thread Bryan Bende
I was reading this documentation on Result Grouping...
http://docs.lucidworks.com/display/solr/Result+Grouping

which says...

sort - sortspec - Specifies how Solr sorts the groups relative to each
other. For example, sort=popularity desc will cause the groups to be sorted
according to the highest popularity document in each group. The default
value is score desc.

group.sort - sort.spec - Specifies how Solr sorts documents within a single
group. The default value is score desc.

Is it possible to use these parameters such that group.sort would first
sort within each group, and then the overall sort would be applied
according to the first element of each sorted group?

For example, using the scenario above where it has "sort=popularity desc",
could you also have "group.sort=date asc", resulting in the most recent
document of each group being sorted by decreasing popularity?

It seems to work the way I described when running a single node Solr 4.3
instance, but in a 2 shard configuration it appears to work differently.

-Bryan
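
For concreteness, the parameter combination Bryan describes would be
something like this (field names are hypothetical):

  q=*:*&group=true&group.field=category&group.sort=date asc&sort=popularity desc

Bryan reports this behaving as described on a single node; as he notes, the
sharded case appears to merge differently and is worth testing separately.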


Re: URL search and indexing

2013-06-25 Thread Jack Krupansky
Yeah, URL Classify only does so much. That's why you need to combine 
multiple methods.


As a fourth method, you could code up a short JavaScript 
"StatelessScriptUpdateProcessor" that did something like take a full domain 
name (such as output by URL Classify) and turn it into multiple values, each 
with more of the prefix removed, so that "lucene.apache.org" would index as:


lucene.apache.org
apache.org
apache
.org
org

And then the user could query by any of those partial domain names.

But, if you simply tokenize the URL (copy the URL string to a text field), 
you automatically get most of that. The user can query by a URL fragment, 
such as "apache.org", ".org", "lucene.apache.org", etc. and the tokenization 
will strip out the punctuation.


I'll add this script to my list of examples to add in the next rev of my 
book.


-- Jack Krupansky
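
For anyone looking for the wiring Flavio asks about below: URL Classify runs
as an update request processor, so it has to be placed in a chain in
solrconfig.xml. A minimal sketch (the chain name is arbitrary, and the
factory's field-name parameters are left at their defaults here; check the
URLClassifyProcessor javadoc linked later in this thread for the exact
parameter names in your version):

  <updateRequestProcessorChain name="urlclassify">
    <processor class="solr.URLClassifyProcessorFactory"/>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

Updates are then sent with update.chain=urlclassify, or the chain is set as
a default on the update handler.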

-Original Message- 
From: Flavio Pompermaier

Sent: Tuesday, June 25, 2013 10:06 AM
To: solr-user@lucene.apache.org
Subject: Re: URL search and indexing

I bought the book, and looking at the example I still don't understand if it
is possible to query all sub-URLs of my URL.
For example, if the URLClassifyProcessorFactory takes as input "url_s":"
http://lucene.apache.org/solr/4_0_0/changes/Changes.html" and produces
outputs like
- "url_domain_s":"lucene.apache.org"
- "url_canonical_s":"
http://lucene.apache.org/solr/4_0_0/changes/Changes.html"
how should I configure url_domain_s in order to be able to make queries like
'*.apache.org'?
How should I configure url_canonical_s in order to be able to make queries
like 'http://lucene.apache.org/solr/*'?
Is it better to have two different fields for the two queries, or could I
create just one field for both kinds of queries (obviously in the former
case I would then query something like *://.apache.org/*)?


On Tue, Jun 25, 2013 at 3:15 PM, Jack Krupansky 
wrote:



There are examples in my book:
http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-1/ebook/product-21079719.html

But... I still think you should use a tokenized text field as well - use
all three: raw string, tokenized text, and URL classification fields.

-- Jack Krupansky

-Original Message- From: Flavio Pompermaier
Sent: Tuesday, June 25, 2013 9:02 AM
To: solr-user@lucene.apache.org
Subject: Re: URL search and indexing


That sounds exactly like what I'm looking for! However I cannot find an example
example

of how to use it..could you help me please?
Moreover, about id field, isn't true that id field shouldn't be analyzed 
as

suggested in
http://wiki.apache.org/solr/**UniqueKey#Text_field_in_the_**document
?


On Tue, Jun 25, 2013 at 2:47 PM, Jan Høydahl 
wrote:

Sure you can query the url directly. Or, if you choose, you can split it up
into multiple components, e.g. using
http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessor.html

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

25. juni 2013 kl. 14:10 skrev Flavio Pompermaier :

> Sorry, but maybe I miss something here... could I declare url as the key
> field and query it too?
> At the moment, my schema.xml looks like:
>
> <fields>
>   <field name="url" type="string" indexed="true" stored="true"
> required="true" multiValued="false" />
>   ...
> </fields>
>
> <uniqueKey>url</uniqueKey>
>
> Is it ok? Or should I add a "baseurl" field of some kind to be able to
> query all urls coming from a certain domain (1st or 2nd level as well)?
>
> Best,
> Flavio
>
>
> On Tue, Jun 25, 2013 at 12:28 PM, Jan Høydahl 
wrote:
>
>> Probably a good match for the RegExp feature of Solr (given that your
url
>> is not tokenized)
>> e.g. q=url:/.*\.it$/
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>> 25. juni 2013 kl. 12:17 skrev Flavio Pompermaier :
>>
>>> Hi to everybody,
>>> I'm quite new to Solr, so maybe my question could be trivial for you...
>>> In my use case I have to index stuff contained in some URL, so I use
>>> the url as the key of my document and I treat it like a string.
>>>
>>> However I'd like to be able to query by domain name, like *.it or
>>> *.somesite.com; what's the best strategy? I thought to make a URL-to-path
>>> transformation and index it using solr.PathHierarchyTokenizerFactory,
>>> but maybe there's a simpler solution, isn't there?
>>>
>>> Best,
>>> Flavio
>>>
>>> --
>>>
>>> Flavio Pompermaier
>>> *Development Department
>>> *_**__
>>> *OKKAM**Srl **- www.okkam.it*
>>>
>>> *Phone:* +(39) 0461 283 702
>>> *Fax:* + (39) 0461 186 6433
>>> *Email:* f.pomperma...@okkam.it
>>> *Headquarters:* Trento (Italy), fraz. Villazzano, Salita dei Molini 2
>>> *Registered office:* Trento (Italy), via Segantini 23
>>>

Querying multiple collections in SolrCloud

2013-06-25 Thread Chris Toomey
Hi, I'm investigating using SolrCloud for querying documents of different
but similar/related types, and have read through docs. on the wiki and done
many searches in these archives, but still have some questions.  Thanks in
advance for your help.

Setup:
* Say that I have N distinct types of documents and I want to do queries
that return the best matches regardless of document type.  I.e., something
akin to a Google search where I'd like to get the best matches from the
web, news, images, and maps.

* Our main use case is supporting simple user-entered searches, which would
just contain terms / phrases and wouldn't specify fields.

* The document types will not all have the same fields, though there may be
some overlap in the fields.

* We plan to use a separate collection for each document type, and to use
the eDisMax query parser.  Each collection would have a document-specific
schema configuration with appropriate defaults for query fields and boosts,
etc.

Questions:
* Would the above setup qualify as "multiple compatible collections", such
that we could search all N collections with a single SolrCloud query, as in
the example query "
http://localhost:8983/solr/collection1/select?q=apple%20pie&collection=c1,c2,...,cN"?
 Again, we're not querying against specific fields.

* How does SolrCloud combine the query results from multiple collections?
 Does it re-sort the combined result set, or does it just return the
concatenation of the (unmerged) results from each of the collections?

* Does SolrCloud impose any restrictions on querying multiple, sharded
collections?  I know it supports querying say all 3 shards of a single
collection, so want to make sure it would also support say all Nx3 shards
of N collections.

* When SolrCloud queries multiple shards/collections, it queries them
concurrently vs. serially, correct?

thanks much,
Chris


Re: Querying multiple collections in SolrCloud

2013-06-25 Thread Jack Krupansky
One simple scenario to consider: N+1 collections - one collection per 
document type with detailed fields for that document type, and one common 
collection that indexes a subset of the fields. The main user query would be 
an edismax over the common fields in that "main" collection. You can then 
display summary results from the common collection. You can also then 
support "drill down" into the type-specific collection based on a "type" 
field for each document in the main collection.


Or, sure, you actually CAN index multiple document types in the same 
collection - add all the fields to one schema - there is no time or space 
penalty if most of the fields are empty for most documents.


-- Jack Krupansky

-Original Message- 
From: Chris Toomey

Sent: Tuesday, June 25, 2013 6:08 PM
To: solr-user@lucene.apache.org
Subject: Querying multiple collections in SolrCloud

Hi, I'm investigating using SolrCloud for querying documents of different
but similar/related types, and have read through docs. on the wiki and done
many searches in these archives, but still have some questions.  Thanks in
advance for your help.

Setup:
* Say that I have N distinct types of documents and I want to do queries
that return the best matches regardless of document type.  I.e., something
akin to a Google search where I'd like to get the best matches from the
web, news, images, and maps.

* Our main use case is supporting simple user-entered searches, which would
just contain terms / phrases and wouldn't specify fields.

* The document types will not all have the same fields, though there may be
some overlap in the fields.

* We plan to use a separate collection for each document type, and to use
the eDisMax query parser.  Each collection would have a document-specific
schema configuration with appropriate defaults for query fields and boosts,
etc.

Questions:
* Would the above setup qualify as "multiple compatible collections", such
that we could search all N collections with a single SolrCloud query, as in
the example query "
http://localhost:8983/solr/collection1/select?q=apple%20pie&collection=c1,c2,...,cN"?
Again, we're not querying against specific fields.

* How does SolrCloud combine the query results from multiple collections?
Does it re-sort the combined result set, or does it just return the
concatenation of the (unmerged) results from each of the collections?

* Does SolrCloud impose any restrictions on querying multiple, sharded
collections?  I know it supports querying say all 3 shards of a single
collection, so want to make sure it would also support say all Nx3 shards
of N collections.

* When SolrCloud queries multiple shards/collections, it queries them
concurrently vs. serially, correct?

thanks much,
Chris 



Joins with SolrCloud

2013-06-25 Thread Chris Toomey
What are the restrictions/limitations w.r.t. joins when using SolrCloud?

Say I have a 3-node cluster and both my "outer" and "inner" collections are
sharded 3 ways across the cluster.  Could I do a query such as
"select?q={!join+from=inner_id+fromIndex=innerCollection+to=outer_id}xx:foo&collection=outerCollection"?

Or if the above isn't supported, would it be if the "inner" collection was
not sharded and was replicated across all 3 nodes, so that it existed in
its entirety on each node?

thx,
Chris


Re: Querying multiple collections in SolrCloud

2013-06-25 Thread Chris Toomey
Thanks Jack for the alternatives.  The first is interesting but has the
downside of requiring multiple queries to get the full matching docs.  The
second is interesting and very simple, but has the downsides of not being
modular and of making field boosting difficult to configure when the
collections have overlapping field names that need different boosts in
different document types.

I'd still like to know about the viability of my original approach though
too.

Chris


On Tue, Jun 25, 2013 at 3:19 PM, Jack Krupansky wrote:

> One simple scenario to consider: N+1 collections - one collection per
> document type with detailed fields for that document type, and one common
> collection that indexes a subset of the fields. The main user query would
> be an edismax over the common fields in that "main" collection. You can
> then display summary results from the common collection. You can also then
> support "drill down" into the type-specific collection based on a "type"
> field for each document in the main collection.
>
> Or, sure, you actually CAN index multiple document types in the same
> collection - add all the fields to one schema - there is no time or space
> penalty if most of the fields are empty for most documents.
>
> -- Jack Krupansky
>
> -Original Message- From: Chris Toomey
> Sent: Tuesday, June 25, 2013 6:08 PM
> To: solr-user@lucene.apache.org
> Subject: Querying multiple collections in SolrCloud
>
>
> Hi, I'm investigating using SolrCloud for querying documents of different
> but similar/related types, and have read through docs. on the wiki and done
> many searches in these archives, but still have some questions.  Thanks in
> advance for your help.
>
> Setup:
> * Say that I have N distinct types of documents and I want to do queries
> that return the best matches regardless of document type.  I.e., something
> akin to a Google search where I'd like to get the best matches from the
> web, news, images, and maps.
>
> * Our main use case is supporting simple user-entered searches, which would
> just contain terms / phrases and wouldn't specify fields.
>
> * The document types will not all have the same fields, though there may be
> some overlap in the fields.
>
> * We plan to use a separate collection for each document type, and to use
> the eDisMax query parser.  Each collection would have a document-specific
> schema configuration with appropriate defaults for query fields and boosts,
> etc.
>
> Questions:
> * Would the above setup qualify as "multiple compatible collections", such
> that we could search all N collections with a single SolrCloud query, as in
> the example query "
> http://localhost:8983/solr/collection1/select?q=apple%20pie&collection=c1,c2,...,cN"?
> Again, we're not querying against specific fields.
>
> * How does SolrCloud combine the query results from multiple collections?
> Does it re-sort the combined result set, or does it just return the
> concatenation of the (unmerged) results from each of the collections?
>
> * Does SolrCloud impose any restrictions on querying multiple, sharded
> collections?  I know it supports querying say all 3 shards of a single
> collection, so want to make sure it would also support say all Nx3 shards
> of N collections.
>
> * When SolrCloud queries multiple shards/collections, it queries them
> concurrently vs. serially, correct?
>
> thanks much,
> Chris
>


Re: Varnish

2013-06-25 Thread Learner
Check this link..
http://lucene.472066.n3.nabble.com/SolrJ-HTTP-caching-td490063.html



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Varnish-tp4072057p4073205.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Joins with SolrCloud

2013-06-25 Thread Upayavira
I have never heard mention that joins support distributed search, so you
cannot do a join against a sharded core.

However, if from your example, innerCollection was replicated across all
nodes, I would think that should work, because all that comes back from
each server when a distributed search happens is the best 'n' matches,
so exactly how those 'n' matches were located doesn't matter
particularly.

Simpler answer: try it!

Upayavira

On Tue, Jun 25, 2013, at 11:25 PM, Chris Toomey wrote:
> What are the restrictions/limitations w.r.t. joins when using SolrCloud?
> 
> Say I have a 3-node cluster and both my "outer" and "inner" collections
> are
> sharded 3 ways across the cluster.  Could I do a query such as
> "select?q={!join+from=inner_id+fromIndex=innerCollection+to=outer_id}xx:foo&collection=outerCollection"?
> 
> Or if the above isn't supported, would it be if the "inner" collection
> was
> not sharded and was replicated across all 3 nodes, so that it existed in
> its entirety on each node?
> 
> thx,
> Chris


RE: Is it possible to search Solr with a longer query string?

2013-06-25 Thread yang, gang
Hi,

I'm using a Solr server to develop a search service, and I encountered a
problem when trying to submit a long query string.

Here is the code:


// Required SolrJ imports:
// import org.apache.solr.client.solrj.SolrQuery;
// import org.apache.solr.client.solrj.impl.HttpSolrServer;
// import org.apache.solr.client.solrj.response.QueryResponse;

StringBuffer stringBuffer = new StringBuffer();

... ...
try {
    // search the Pubmed server (an NCBI server); it returns a list of IDs
    EFetchPubmedServiceStub service = new EFetchPubmedServiceStub();
    EFetchPubmedServiceStub.EFetchRequest req =
            new EFetchPubmedServiceStub.EFetchRequest();
    req.setWebEnv( WebEnv );
    req.setQuery_key( query_key );
    req.setRetstart( "1110" );

    // return 295 IDs
    req.setRetmax( "295" );
    EFetchPubmedServiceStub.EFetchResult res = service.run_eFetch( req );

    // connect the returned IDs with " OR " and query my local Solr server
    for ( int i = 0; i < res.getPubmedArticleSet().getPubmedArticleSetChoice().length; i++ ) {
        EFetchPubmedServiceStub.PubmedArticleType art =
                res.getPubmedArticleSet().getPubmedArticleSetChoice()[ i ].getPubmedArticle();

        if ( i > 0 ) {
            stringBuffer.append( " OR " );
        }
        stringBuffer.append( "( pmid:" + art.getMedlineCitation().getPMID().getString() + " )" );
    }

    HttpSolrServer solrServer = new HttpSolrServer( "http://127.0.0.1:8087/solr430/medline" );

    String q = stringBuffer.toString();

    // when the input query has more than 300 IDs, the query throws
    // org.apache.solr.client.solrj.SolrServerException: Server at
    // http://127.0.0.1:8087/solr430/medline returned non ok status:400, message:Bad Request
    QueryResponse solrRes = solrServer.query( new SolrQuery( q ) );
    long found = solrRes.getResults().getNumFound();
    System.out.println( found );
}
catch ( Exception e ) {
    e.printStackTrace();
}
... ...

Do you think it's possible to change the query string length limit so that Solr 
can accept more IDs?

Thanks.

-Gary


Re: Is it possible to search Solr with a longer query string?

2013-06-25 Thread Jack Krupansky

Are you using Tomcat?

See:
http://wiki.apache.org/solr/SolrTomcat#Enabling_Longer_Query_Requests

Enabling Longer Query Requests

If you try to submit too long a GET query to Solr, then Tomcat will reject 
your HTTP request on the grounds that the HTTP header is too large; symptoms 
may include an HTTP 400 Bad Request error or (if you execute the query in a 
web browser) a blank browser window.


If you need to enable longer queries, you can set the maxHttpHeaderSize 
attribute on the HTTP Connector element in your server.xml file. The default 
value is 4K. (See http://tomcat.apache.org/tomcat-5.5-doc/config/http.html)


---

If you're not using Tomcat, your container may have a similar limit.

-- Jack Krupansky
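
For example, in Tomcat's server.xml (the port and size values here are
placeholders; the size is in bytes):

  <Connector port="8080" protocol="HTTP/1.1"
             connectionTimeout="20000"
             maxHttpHeaderSize="65536"
             redirectPort="8443" />

Alternatively, since the query is built in SolrJ anyway, it can be sent as
an HTTP POST instead of a GET, which avoids the header-size limit entirely:
solrServer.query( new SolrQuery( q ), SolrRequest.METHOD.POST ).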

-Original Message- 
From: yang, gang

Sent: Tuesday, June 25, 2013 5:47 PM
To: solr-user@lucene.apache.org
Cc: Meng, Fan
Subject: RE: Is it possible to search Solr with a longer query string?

Hi,

I'm using a Solr server to develop a search service, and I encountered a
problem when trying to submit a long query string.

Here is the code:


// Required SolrJ imports:
// import org.apache.solr.client.solrj.SolrQuery;
// import org.apache.solr.client.solrj.impl.HttpSolrServer;
// import org.apache.solr.client.solrj.response.QueryResponse;

StringBuffer stringBuffer = new StringBuffer();

... ...
try {
    // search the Pubmed server (an NCBI server); it returns a list of IDs
    EFetchPubmedServiceStub service = new EFetchPubmedServiceStub();
    EFetchPubmedServiceStub.EFetchRequest req =
            new EFetchPubmedServiceStub.EFetchRequest();
    req.setWebEnv( WebEnv );
    req.setQuery_key( query_key );
    req.setRetstart( "1110" );

    // return 295 IDs
    req.setRetmax( "295" );
    EFetchPubmedServiceStub.EFetchResult res = service.run_eFetch( req );

    // connect the returned IDs with " OR " and query my local Solr server
    for ( int i = 0; i < res.getPubmedArticleSet().getPubmedArticleSetChoice().length; i++ ) {
        EFetchPubmedServiceStub.PubmedArticleType art =
                res.getPubmedArticleSet().getPubmedArticleSetChoice()[ i ].getPubmedArticle();

        if ( i > 0 ) {
            stringBuffer.append( " OR " );
        }
        stringBuffer.append( "( pmid:" + art.getMedlineCitation().getPMID().getString() + " )" );
    }

    HttpSolrServer solrServer = new HttpSolrServer( "http://127.0.0.1:8087/solr430/medline" );

    String q = stringBuffer.toString();

    // when the input query has more than 300 IDs, the query throws
    // org.apache.solr.client.solrj.SolrServerException: Server at
    // http://127.0.0.1:8087/solr430/medline returned non ok status:400, message:Bad Request
    QueryResponse solrRes = solrServer.query( new SolrQuery( q ) );
    long found = solrRes.getResults().getNumFound();
    System.out.println( found );
}
catch ( Exception e ) {
    e.printStackTrace();
}
... ...

Do you think it's possible to change the query string length limit so that 
Solr can accept more IDs?


Thanks.

-Gary 



Re: Is it possible to search Solr with a longer query string?

2013-06-25 Thread Kevin Osborn
If your query is arriving on the server correctly, but throwing an
exception, adjust maxBooleanClauses in your solrconfig.xml. I'm not sure
what the consequences are of making it too large, but we had to adjust it
from the default of 1024 to 5000 in one implementation.

Basically, each ID in your query is a separate clause. So, you may have
exceeded maxBooleanClauses.

-Kevin
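
The setting lives in the <query> section of solrconfig.xml:

  <maxBooleanClauses>5000</maxBooleanClauses>

Note that it maps onto a global (per-JVM) Lucene limit, so in a multicore
setup the value from the last core loaded is the one that applies.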


On Tue, Jun 25, 2013 at 5:15 PM, Jack Krupansky wrote:

> Are you using Tomcat?
>
> See:
> http://wiki.apache.org/solr/SolrTomcat#Enabling_Longer_Query_Requests
>
> Enabling Longer Query Requests
>
> If you try to submit too long a GET query to Solr, then Tomcat will reject
> your HTTP request on the grounds that the HTTP header is too large;
> symptoms may include an HTTP 400 Bad Request error or (if you execute the
> query in a web browser) a blank browser window.
>
> If you need to enable longer queries, you can set the maxHttpHeaderSize
> attribute on the HTTP Connector element in your server.xml file. The
> default value is 4K. (See
> http://tomcat.apache.org/tomcat-5.5-doc/config/http.html)
>
> ---
>
> If you're not using Tomcat, your container may have a similar limit.
>
> -- Jack Krupansky
>
> -Original Message- From: yang, gang
> Sent: Tuesday, June 25, 2013 5:47 PM
> To: solr-user@lucene.apache.org
> Cc: Meng, Fan
> Subject: RE: Is it possible to search Solr with a longer query string?
>
>
> Hi,
>
> I'm using a Solr server to develop a search service, and I encountered a
> problem when trying to submit a long query string.
>
> Here is the code:
>
>
> // Required SolrJ imports:
> // import org.apache.solr.client.solrj.SolrQuery;
> // import org.apache.solr.client.solrj.impl.HttpSolrServer;
> // import org.apache.solr.client.solrj.response.QueryResponse;
>
> StringBuffer stringBuffer = new StringBuffer();
>
> ... ...
> try {
>     // search the Pubmed server (an NCBI server); it returns a list of IDs
>     EFetchPubmedServiceStub service = new EFetchPubmedServiceStub();
>     EFetchPubmedServiceStub.EFetchRequest req =
>             new EFetchPubmedServiceStub.EFetchRequest();
>     req.setWebEnv( WebEnv );
>     req.setQuery_key( query_key );
>     req.setRetstart( "1110" );
>
>     // return 295 IDs
>     req.setRetmax( "295" );
>     EFetchPubmedServiceStub.EFetchResult res = service.run_eFetch( req );
>
>     // connect the returned IDs with " OR " and query my local Solr server
>     for ( int i = 0; i < res.getPubmedArticleSet().getPubmedArticleSetChoice().length; i++ ) {
>         EFetchPubmedServiceStub.PubmedArticleType art =
>                 res.getPubmedArticleSet().getPubmedArticleSetChoice()[ i ].getPubmedArticle();
>
>         if ( i > 0 ) {
>             stringBuffer.append( " OR " );
>         }
>         stringBuffer.append( "( pmid:" + art.getMedlineCitation().getPMID().getString() + " )" );
>     }
>
>     HttpSolrServer solrServer = new HttpSolrServer( "http://127.0.0.1:8087/solr430/medline" );
>
>     String q = stringBuffer.toString();
>
>     // when the input query has more than 300 IDs, the query throws
>     // org.apache.solr.client.solrj.SolrServerException: Server at
>     // http://127.0.0.1:8087/solr430/medline returned non ok status:400, message:Bad Request
>     QueryResponse solrRes = solrServer.query( new SolrQuery( q ) );
>     long found = solrRes.getResults().getNumFound();
>     System.out.println( found );
> }
> catch ( Exception e ) {
>     e.printStackTrace();
> }
> ... ...
>
> Do you think it's possible to change the query string length limit so that
> Solr can accept more IDs?
>
> Thanks.
>
> -Gary
>



-- 
*KEVIN OSBORN*
LEAD SOFTWARE ENGINEER
CNET Content Solutions
OFFICE 949.399.8714
CELL 949.310.4677  SKYPE osbornk
5 Park Plaza, Suite 600, Irvine, CA 92614
[image: CNET Content Solutions]


RE: Joins with SolrCloud

2013-06-25 Thread James Thomas
My understanding is the same: "{!join...}" does not work in SolrCloud (aka
distributed search), based on:
1.  https://issues.apache.org/jira/browse/LUCENE-3759
2. http://wiki.apache.org/solr/DistributedSearch
--- see "Limitations" section which refers to the JIRA above


-- James

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Tuesday, June 25, 2013 7:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Joins with SolrCloud

I have never heard mention that joins support distributed search, so you cannot 
do a join against a sharded core.

However, if from your example, innerCollection was replicated across all nodes, 
I would think that should work, because all that comes back from each server 
when a distributed search happens is the best 'n' matches, so exactly how those 
'n' matches were located doesn't matter particularly.

Simpler answer: try it!

Upayavira

On Tue, Jun 25, 2013, at 11:25 PM, Chris Toomey wrote:
> What are the restrictions/limitations w.r.t. joins when using SolrCloud?
> 
> Say I have a 3-node cluster and both my "outer" and "inner" 
> collections are sharded 3 ways across the cluster.  Could I do a query 
> such as 
> "select?q={!join+from=inner_id+fromIndex=innerCollection+to=outer_id}xx:foo&collection=outerCollection"?
> 
> Or if the above isn't supported, would it be if the "inner" collection 
> was not sharded and was replicated across all 3 nodes, so that it 
> existed in its entirety on each node?
> 
> thx,
> Chris


Re: Need Help in migrating Solr version 1.4 to 4.3

2013-06-25 Thread Sandeep Gupta
Thanks for all the answers.
Sure I am going to create new index again with Solr 4.3.

Also, on the application development side:
as I said, I am going to use the HttpSolrServer API, and I found that we
shouldn't create this object multiple times
(as per the wiki document http://wiki.apache.org/solr/Solrj#HttpSolrServer).
So I am planning to have my server class be a singleton.
Please advise a little bit on this front also.

Regards
Sandeep
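
A minimal sketch of such a holder (the class name and URL are made up;
per the wiki above, HttpSolrServer is thread-safe, so one shared instance
per Solr endpoint is enough):

// import org.apache.solr.client.solrj.impl.HttpSolrServer;
public final class SolrServerHolder {
    // One shared, thread-safe client for the whole application.
    private static final HttpSolrServer SERVER =
            new HttpSolrServer( "http://localhost:8983/solr/mycore" );

    private SolrServerHolder() {}

    public static HttpSolrServer get() {
        return SERVER;
    }
}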



On Tue, Jun 25, 2013 at 11:16 PM, André Widhani wrote:

> fwiw, I can confirm that Solr 4.x can definitely not read indexes created
> with 1.4.
>
> You'll get an exception like the following:
>
> Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
> version is not supported (resource: segment _16ofy in resource
> ChecksumIndexInput(MMapIndexInput(path="/var/opt/dcx/solr2/core-tex60l254lpachcjhtz4se-index2/data/index/segments_1dlof"))):
> 2.x. This version of Lucene only supports indexes created with release 3.0
> and later.
>
> But as Erick mentioned, you could get away with optimizing the index with
> 3.x instead of re-indexing from scratch before moving on to 4.x - I think I
> did that once and it worked.
>
> Regards,
> André
>
> 
> Von: Erick Erickson [erickerick...@gmail.com]
> Gesendet: Dienstag, 25. Juni 2013 19:37
> An: solr-user@lucene.apache.org
> Betreff: Re: Need Help in migrating Solr version 1.4 to 4.3
>
> bq: I'm not sure if Solr 4.3 will be able to read Solr 1.4 indexes
>
> Solr/Lucene explicitly try to read _one_ major revision backwards.
> Solr 3.x should be able to read 1.4 indexes. Solr 4.x should be
> able to read Solr 3.x. No attempt is made to allow Solr 4.x to read
> Solr 1.4 indexes, so I wouldn't even try.
>
> Shalin's comment is best. If at all possible I'd just forget about
> reading the old index and re-index from scratch. But if you _do_
> try upgrading 1.4 -> 3.x -> 4.x, you probably want to optimize
> at each step. That'll (I think) rewrite all the segments in the
> current format.
>
> Good luck!
> Erick
>
> On Tue, Jun 25, 2013 at 12:59 AM, Shalin Shekhar Mangar
>  wrote:
> > You must carefully go through the upgrade instructions starting from
> > 1.4 upto 4.3. In particular the instructions for 1.4 to 3.1 and from
> > 3.1 to 4.0 should be given special attention.
> >
> > On Tue, Jun 25, 2013 at 11:43 AM, Sandeep Gupta 
> wrote:
> >> Hello All,
> >>
> >> We are planning to migrate solr 1.4 to Solr 4.3 version.
> >> And I am seeking some help in this side.
> >>
> >> Considering Schema file change:
> >> By default there are lots of changes if I compare original Solr 1.4
> schema
> >> file to Solr 4.3 schema file.
> >> And that is the reason we are not copying paste of schema file.
> >> In our Solr 1.4 schema implementation, we have some custom fields with
> type
> >> "textgen" and "text"
> >> So in migration of these custom fields to Solr 4.3,  should I use type
> of
> >> "text_general" as replacement of "textgen" and
> >> "text_en" as replacement of "text"?
> >> Please confirm the same.
> >
> > Please check the text_general definition in 4.3 against the textgen
> > fieldtype in Solr 1.4 to see if they're equivalent. Same for text_en
> > and text.
> >
> >>
> >> Considering Solrconfig change:
> >> As we didn't have lots of changes in 1.4 solrconfig file except the
> >> dataimport request handler.
> >> And therefore in migration side, we are simply modifying the Solr 4.3
> >> solrconfig file with this request handler.
> >
> > And you need to add the dataimporthandler jar into Solr's lib
> > directory. DIH is not added automatically anymore.
> >
> >>
> >> Considering the application development:
> >>
> >> We used all the queries as BOOLEAN-type style (which was not good), I mean
> >> we put all the parameters in the query field, i.e.
> >> *:* AND EntityName: <<>> AND : AND .
> >>
> >> I think we should simplify our queries using other fields like df, qf
> 
> >>
> >
> > Probably. AND queries are best done by filter queries (fq).
> >
> >> We also used to create the Solr server object via CommonsHttpSolrServer(),
> >> so I am now planning to use the HttpSolrServer API.
> >
> > Yes. Also, there was a compatibility break between Solr 1.4 and 3.1 in
> > the javabin format so old clients using javabin won't be able to
> > communicate with Solr until you upgrade both solr client and solr
> > servers.
> >
> >>
> >> Please let me know the suggestion for above points also what are the
> other
> >> factors I need to take care while considering the migration.
> >
> > There is no substitute for reading the upgrade sections in the
> changes.txt.
> >
> > I'm not sure if Solr 4.3 will be able to read Solr 1.4 indexes. You
> > will most likely need to re-index your documents.
> >
> > You should also think about switching to SolrCloud to take advantage
> > of its features.
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
>


Re: Is there a way to capture div tag by id?

2013-06-25 Thread Jack Krupansky
Sorry, but not only can you not capture that specific <div>, but you cannot 
capture ANY <div>. Really. For some mysterious reason, Tika silently eats 
<div> HTML parsing events. Plenty of other HTML tags can be captured, but 
not <div>.


Both the Solr Wiki for Solr Cell and the new/Lucid Apache Solr Reference 
Guide mislead people with examples that clearly can never run as expected 
with real data.


-- Jack Krupansky
-Original Message- 
From: eShard

Sent: Tuesday, June 25, 2013 1:17 PM
To: solr-user@lucene.apache.org
Subject: Is there a way to capture div tag by id?

Let's say I have a div with id="myDiv".
Is there a way to set up the Solr update/extract handler to capture just that
particular div?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-capture-div-tag-by-id-tp4073120.html
Sent from the Solr - User mailing list archive at Nabble.com. 





Re: Restarting SOLR will remove all cache?

2013-06-25 Thread Toke Eskildsen
On Tue, 2013-06-25 at 07:35 +0200, William Bell wrote:
> It does restart the MMap stuff though.

You cannot be sure of that. It is not mandatory, but the system should
share memory mapped files with the disk cache.
https://en.wikipedia.org/wiki/Memory-mapped_file#Benefits

- Toke Eskildsen, State and University Library, Denmark



RE: Shard identification

2013-06-25 Thread Ophir Michaeli
Thanks for the response.
I use 4.3 and have this issue. 

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Tuesday, June 18, 2013 12:57 PM
To: solr-user@lucene.apache.org
Subject: Re: Shard identification

What version of Solr? I had something like this on 4.2.1. Upgraging to
4.3 sorted it.

Upayavira

On Tue, Jun 18, 2013, at 09:37 AM, Ophir Michaeli wrote:
> Hi,
> 
> I built a system with 2 shards and 2 replicas that works ok on a local 
> machine, with 1 ZooKeeper on shard 1.
> It appears ok on the Solr monitor page, cloud tab 
> (http://localhost:8983/solr/#/~cloud).
> When I move to using different machines, each shard/replica on a 
> different machine I get a wrong cloud-graph on the Solr monitoring 
> page.
> The machine that has Shard 2 appears on the graph on shard 1, and the 
> replicas are also mixed, shard 2 appears as 1 and shard 1 appears as 2.
> 
> Any ideas why this happens?
> 
> Thanks,
> Ophir



Solr indexer and Hadoop

2013-06-25 Thread engy.morsy
Hi All, 

I have TBs of data that need to be indexed. I am trying to use Hadoop to
index them. I am still a newbie. 
I thought that the Map function would read data from the hard disks and the
Reduce function would index it. The problem I am facing is how to read
that data from hard disks which are not HDFS. 

I understand that the data to be indexed must be on HDFS, mustn't it? Or am
I missing something here? 

I can't convert the nodes on which the data resides to HDFS. Can anyone
please help.

I would also appreciate it if you could provide a good tutorial for Solr
indexing using Hadoop. I googled a lot but did not find a sufficient one. 
 
Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexer-and-Hadoop-tp4072951.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need Help in migrating Solr version 1.4 to 4.3

2013-06-25 Thread Shalin Shekhar Mangar
You must carefully go through the upgrade instructions starting from
1.4 upto 4.3. In particular the instructions for 1.4 to 3.1 and from
3.1 to 4.0 should be given special attention.

On Tue, Jun 25, 2013 at 11:43 AM, Sandeep Gupta  wrote:
> Hello All,
>
> We are planning to migrate solr 1.4 to Solr 4.3 version.
> And I am seeking some help in this side.
>
> Considering Schema file change:
> By default there are lots of changes if I compare original Solr 1.4 schema
> file to Solr 4.3 schema file.
> And that is the reason we are not copying paste of schema file.
> In our Solr 1.4 schema implementation, we have some custom fields with type
> "textgen" and "text"
> So in migration of these custom fields to Solr 4.3,  should I use type of
> "text_general" as replacement of "textgen" and
> "text_en" as replacement of "text"?
> Please confirm the same.

Please check the text_general definition in 4.3 against the textgen
fieldtype in Solr 1.4 to see if they're equivalent. Same for text_en
and text.

>
> Considering Solrconfig change:
> As we didn't have lots of changes in 1.4 solrconfig file except the
> dataimport request handler.
> And therefore on the migration side, we are simply modifying the Solr 4.3
> solrconfig file with this request handler.

And you need to add the dataimporthandler jar into Solr's lib
directory. DIH is not added automatically anymore.

>
> Considering the application development:
>
> We used all the queries as BOOLEAN-type style (which was not good), I mean
> we put all the parameters in the query field, i.e.
> *:* AND EntityName: <<>> AND : AND .
>
> I think we should simplify our queries using other fields like df, qf 
>

Probably. AND queries are best done by filter queries (fq).
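
For example (the field names here are hypothetical):

  q=ipod&fq=EntityName:product&fq=inStock:true

Each fq is cached independently in the filter cache, so repeated filters
are cheap.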

> We also used to create the Solr server object via CommonsHttpSolrServer(),
> so I am now planning to use the HttpSolrServer API.

Yes. Also, there was a compatibility break between Solr 1.4 and 3.1 in
the javabin format so old clients using javabin won't be able to
communicate with Solr until you upgrade both solr client and solr
servers.

>
> Please let me know the suggestion for above points also what are the other
> factors I need to take care while considering the migration.

There is no substitute for reading the upgrade sections in the changes.txt.

I'm not sure if Solr 4.3 will be able to read Solr 1.4 indexes. You
will most likely need to re-index your documents.

You should also think about switching to SolrCloud to take advantage
of its features.

--
Regards,
Shalin Shekhar Mangar.


Re: Shard identification

2013-06-25 Thread Shalin Shekhar Mangar
Firstly, using 1 zookeeper machine is not at all ideal. See
http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7

I've never personally seen such an issue. Can you give screen shots of
the cloud graph on each node? Use an image hosting service because the
mailing list won't allow attachments.

On Tue, Jun 18, 2013 at 2:07 PM, Ophir Michaeli  wrote:
> Hi,
>
> I built a 2-shard, 2-replica system that works OK on a local machine, with
> 1 ZooKeeper instance on shard 1.
> It appears OK on the Solr admin page, Cloud tab
> (http://localhost:8983/solr/#/~cloud).
> When I move to using different machines, each shard/replica on a different
> machine, I get a wrong cloud graph on the Solr admin page.
> The machine that has shard 2 appears on the graph under shard 1, and the
> replicas are also mixed up: shard 2 appears as 1 and shard 1 appears as 2.
>
> Any ideas why this happens?
>
> Thanks,
> Ophir



-- 
Regards,
Shalin Shekhar Mangar.


Re: Can we use Solr to serve data to web analytics & Dashboard charts

2013-06-25 Thread pradeep kumar
Sure,

First of all thanks a lot everyone for very quick reply.

We have an ordering system which so far has lakhs of records in
normalized RDBMS tables, say Order, Item, Details etc. We are planning to
have an offline database (star schema) and to develop reports, analytical
charts with drill-down, and dashboards with data from that offline database.

I am planning to propose Solr as a solution instead of an offline database,
i.e. using DIH to import data from the DB into Solr indexes. Since Solr
indexes are stored in a denormalized manner and offer fast querying and
faceted search, I assumed that Solr can be used to solve my requirement.
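
For instance, a dashboard widget could be backed by a facet query along
these lines (the core and field names are made up):

http://localhost:8983/solr/orders/select?q=*:*&rows=0&facet=true&facet.field=order_status&facet.field=item_category

which returns counts per status and category without fetching any documents.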

Please correct me if I am wrong.

Thanks,
Pradeep




On Tue, Jun 25, 2013 at 3:43 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Yeah, perhaps, yet people keep using it for this. So, Pradeep, it
> may work for you and if you share some numbers with us we may be able
> to tell you "no way" or "very likely OK". :)
>
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Mon, Jun 24, 2013 at 4:14 PM, Walter Underwood 
> wrote:
> > I expect it won't be fast enough for general use. Most analytics stores
> implement functions inside the server to aggregate large amounts of data.
> There is always some query that returns the whole database in order to
> calculate an average.
> >
> > I'm sure it will work fine for some things and for small data sets, but
> it probably won't scale for most real analytics applications.
> >
> > wunder
> >
> > On Jun 24, 2013, at 12:47 PM, pradeep kumar wrote:
> >
> >> Hello everyone,
> >>
> >> Apart from text search, can we use Solr as data store to serve data to
> form
> >> analytics with drilldown charts or charts to add as widgets on
> dashboards?
> >>
> >> Any suggestion, examples?
> >>
> >> Thanks,
> >> Pradeep
> >
> >
> >
> >
>


Updating solrconfig and schema.xml for solrcloud in multicore setup

2013-06-25 Thread Utkarsh Sengar
Hello,

I am trying to update schema.xml for a core in a multicore setup and this
is what I do to update it:

I have 3 nodes in my solr cluster.

1. Pick node1 and manually update schema.xml

2. Restart node1 with -Dbootstrap_conf=true
java -Dsolr.solr.home=multicore -DnumShards=3 -Dbootstrap_conf=true
-DzkHost=localhost:2181 -DSTOP.PORT=8079 -DSTOP.KEY=mysecret -jar start.jar

3. Restart the other 2 nodes using this command (without
-Dbootstrap_conf=true since these should pull from zk).:
java -Dsolr.solr.home=multicore -DnumShards=3 -DzkHost=localhost:2181
-DSTOP.PORT=8079 -DSTOP.KEY=mysecret -jar start.jar

But when I do that, node1 displays all of my cores and the other 2 nodes
display just one core.

Then, I found this:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201205.mbox/%3cbb7ad9bf-389b-4b94-8c1b-bbfc4028a...@gmail.com%3E
Which says bootstrap_conf is used for multicore setup.


But if I use bootstrap_conf for every node, then I will have to manually
update schema.xml (or any other config file) everywhere? That does not sound
like an efficient way of managing configuration, right?


-- 
Thanks,
-Utkarsh


Solr Document inside the document

2013-06-25 Thread Siva Prasad Janapati

Hi,
 
I have a requirement where I need to create a document inside a document.
For example:

<doc>
  <field name="...">...</field>
  <doc>
    <field name="...">...</field>
    ...
  </doc>
</doc>
 
Is there any way to configure the document like this?
 
Regards,
Siva
http://smarttechies.wordpress.com/

Re: Solr Document inside the document

2013-06-25 Thread Gora Mohanty
On 25 June 2013 14:02, Siva Prasad Janapati  wrote:
>
> Hi,
>
> I have a requirement where I need to create a document inside a document.
[...]

What do you mean by this? Create it inside which document?
One being POSTed to Solr?

Regards,
Gora


Re: String field does not yield partial match result using qf parameter

2013-06-25 Thread Jan Høydahl
fieldType "string" is not tokenized, so your observation is correct. You need 
to use a fieldType with analysis and tokenization to get the behavior you want.
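
For example, something along these lines (only a sketch; tune the gram
sizes to your data):

<fieldType name="text_partial" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>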

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

25. juni 2013 kl. 02:35 skrev "Mugoma Joseph O." :

> 
> It looks like partial search works only with the copied-to field. This works:
> 
> $ curl
> "http://localhost:8282/solr/links/select?q=text_ngrams:yengas&wt=json&indent=on&fl=id,domain,score";
> 
> On Tue, June 25, 2013 12:39 am, Mugoma Joseph O. wrote:
>> Hello,
>> 
>> I am newbie to solr.
>> 
>> I am trying out partial search (match). My experience is opposite of
>> http://lucene.472066.n3.nabble.com/string-field-does-not-yield-exact-match-result-using-qf-parameter-td4060096.html
>> 
>> When I add 'qf' to the dismax query I get no results unless there's a full
>> match.
>> 
>> I am using NGramFilterFactory as follows:
>> 
>> <fieldType name="text_ngrams" class="solr.TextField">
>>   <analyzer type="index">
>>     <tokenizer class="..."/>
>>     <filter class="solr.NGramFilterFactory" minGramSize="..."
>> maxGramSize="15"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="..."/>
>>   </analyzer>
>> </fieldType>
>> 
>> ...
>> 
>> <field name="domain" type="string" indexed="true" stored="true" />
>> <field name="text_ngrams" type="text_ngrams" indexed="true"
>> stored="false" multiValued="true" />
>> 
>> ...
>> 
>> <copyField source="domain" dest="text_ngrams" />
>> 
>> If I have yengas.com indexed, I can search for yengas.com but not for
>> yengas. However, if I drop 'qf' I can search for yengas.
>> 
>> 
>> Example searches:
>> 
>> $ curl
>> "http://localhost:8282/solr/links/select?q=domain:yengas&wt=json&indent=on&fl=id,domain,score";
>> => "response":{"numFound":0,"start":0,"docs":[]
>> 
>> 
>> $ curl
>> "http://localhost:8282/solr/links/select?q=domain:yengas.com&wt=json&indent=on&fl=id,domain,score";
>> => "response":{"numFound":3,"start":0,"docs":[]
>> 
>> $ curl
>> "http://localhost:8282/solr/links/select?defType=dismax&q=yengas&qf=domain^4&pf=domain&ps=0&fl=id,domain,score";
>> => "response":{"numFound":0,"start":0,"docs":[]
>> 
>> 
>> $ curl
>> "http://localhost:8282/solr/links/select?defType=dismax&q=yengas.com&pf=domain&ps=0&fl=id,domain,score";
>> => "response":{"numFound":3,"start":0,"docs":[]
>> 
>> 
>> The partial match fails on both the dismax and the normal query.
>> 
>> What could I be missing?
>> 
>> 
>> Thanks.
>> 
>> Mugoma.
>> 
>> 
>> 
>> 
> 
> 



Re: Updating solrconfig and schema.xml for solrcloud in multicore setup

2013-06-25 Thread Jan Høydahl
Hi,

The -Dbootstrap_confdir option is really only meant for a first-time bootstrap 
for your development environment, not for serious use.

Once you got your config into ZK you should modify the config directly in ZK.
There are many tools (also 3rd party) for this. But your best choice is 
probably zkCli shipping with Solr.
See http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
This means you will NOT need to start Solr with -Dbootstrap_confdir at all.
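
For example, to push an updated config directory for one collection (the
classpath and paths below are made up and will vary with your layout):

java -classpath example/solr-webapp/webapp/WEB-INF/lib/* \
  org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2181 \
  -confdir multicore/core1/conf -confname core1

Then reload the collection so the running cores pick up the change.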

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

25. juni 2013 kl. 10:29 skrev Utkarsh Sengar :

> Hello,
> 
> I am trying to update schema.xml for a core in a multicore setup and this
> is what I do to update it:
> 
> I have 3 nodes in my solr cluster.
> 
> 1. Pick node1 and manually update schema.xml
> 
> 2. Restart node1 with -Dbootstrap_conf=true
> java -Dsolr.solr.home=multicore -DnumShards=3 -Dbootstrap_conf=true
> -DzkHost=localhost:2181 -DSTOP.PORT=8079 -DSTOP.KEY=mysecret -jar start.jar
> 
> 3. Restart the other 2 nodes using this command (without
> -Dbootstrap_conf=true since these should pull from zk).:
> java -Dsolr.solr.home=multicore -DnumShards=3 -DzkHost=localhost:2181
> -DSTOP.PORT=8079 -DSTOP.KEY=mysecret -jar start.jar
> 
> But when I do that, node1 displays all of my cores and the other 2 nodes
> display just one core.
> 
> Then, I found this:
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201205.mbox/%3cbb7ad9bf-389b-4b94-8c1b-bbfc4028a...@gmail.com%3E
> Which says bootstrap_conf is used for multicore setup.
> 
> 
> But if I use bootstrap_conf for every node, then I will have to manually
> update schema.xml (or any other config file) everywhere? That does not sound
> like an efficient way of managing configuration, right?
> 
> 
> -- 
> Thanks,
> -Utkarsh



Re: Solr Document inside the document

2013-06-25 Thread Jan Høydahl
Documents in Solr are flat - they contain a flat list of fields.

Depending on your requirements for the "doc inside doc", there may be several 
workarounds:

* field naming, i.e. "mysubdoc.title", "mysubdoc.author"... to flatten the 
sub-document inside the main one (see the sketch after this list)
* query time join as demonstrated in the Tutorial data (manufacturer sub 
documents as separate docs)
* block-joins (faster way to index/search parent-child relationship sub docs)
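
For instance, the field-naming workaround could look like this in an update
message (just a sketch; the field names are made up):

<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Main document</field>
    <field name="mysubdoc.title">Sub document title</field>
    <field name="mysubdoc.author">Sub document author</field>
  </doc>
</add>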

Please elaborate on your exact need.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

25. juni 2013 kl. 10:32 skrev "Siva Prasad Janapati" 
:

> 
> Hi,
> 
> I have a requirement where I need to create a document inside a document.
> For example:
> 
> <doc>
>   <field name="...">...</field>
>   <doc>
>     <field name="...">...</field>
>     ...
>   </doc>
> </doc>
> Is there any way to configure the document like this?
> 
> Regards,
> Siva
> http://smarttechies.wordpress.com/



Re: Updating solrconfig and schema.xml for solrcloud in multicore setup

2013-06-25 Thread Utkarsh Sengar
But when I launch a Solr instance without "-Dbootstrap_conf=true", just
one core is launched and I cannot see the other core.

This behavior is the same as Mark's reply here:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201205.mbox/%3cbb7ad9bf-389b-4b94-8c1b-bbfc4028a...@gmail.com%3E

- bootstrap_conf: you pass it true and it reads solr.xml and uploads
the conf set for each
SolrCore it finds, gives the conf set the name of the collection and
associates each collection
with the same named config set.

So the first just lets you bootstrap one collection easily...but what
if you start with a
multi-core, multi-collection setup that you want to bootstrap into
SolrCloud? And they don't
share a common config set? That's what the second command is for. You
can set up 30 local SolrCores
in solr.xml and then just bootstrap all 30 different config sets up
and have them fully linked
with each collection just by passing bootstrap_conf=true.



Note: I am using -Dbootstrap_conf=true and not -Dbootstrap_confdir


Thanks,
-Utkarsh


On Tue, Jun 25, 2013 at 2:14 AM, Jan Høydahl  wrote:

> Hi,
>
> The -Dbootstrap_confdir option is really only meant for a first-time
> bootstrap for your development environment, not for serious use.
>
> Once you got your config into ZK you should modify the config directly in
> ZK.
> There are many tools (also 3rd party) for this. But your best choice is
> probably zkCli shipping with Solr.
> See http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
> This means you will NOT need to start Solr with -Dbootstrap_confdir at all.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> 25. juni 2013 kl. 10:29 skrev Utkarsh Sengar :
>
> > Hello,
> >
> > I am trying to update schema.xml for a core in a multicore setup and this
> > is what I do to update it:
> >
> > I have 3 nodes in my solr cluster.
> >
> > 1. Pick node1 and manually update schema.xml
> >
> > 2. Restart node1 with -Dbootstrap_conf=true
> > java -Dsolr.solr.home=multicore -DnumShards=3 -Dbootstrap_conf=true
> > -DzkHost=localhost:2181 -DSTOP.PORT=8079 -DSTOP.KEY=mysecret -jar
> start.jar
> >
> > 3. Restart the other 2 nodes using this command (without
> > -Dbootstrap_conf=true since these should pull from zk).:
> > java -Dsolr.solr.home=multicore -DnumShards=3 -DzkHost=localhost:2181
> > -DSTOP.PORT=8079 -DSTOP.KEY=mysecret -jar start.jar
> >
> > But when I do that, node1 displays all of my cores and the other 2 nodes
> > display just one core.
> >
> > Then, I found this:
> >
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201205.mbox/%3cbb7ad9bf-389b-4b94-8c1b-bbfc4028a...@gmail.com%3E
> > Which says bootstrap_conf is used for multicore setup.
> >
> >
> > But if I use bootstrap_conf for every node, then I will have to manually
> > update schema.xml (or any other config file) everywhere? That does not sound
> > like an efficient way of managing configuration, right?
> >
> >
> > --
> > Thanks,
> > -Utkarsh
>
>


-- 
Thanks,
-Utkarsh