Re: [blogpost] Memory is overrated, use SSDs

2013-06-07 Thread Toke Eskildsen
On Fri, 2013-06-07 at 07:15 +0200, Andy wrote:
> One question I have is did you precondition the SSD ( 
> http://www.sandforce.com/userfiles/file/downloads/FMS2009_F2A_Smith.pdf )? 
> SSD performance tends to take a very deep dive once all blocks are written at 
> least once and the garbage collector kicks in. 

Not explicitly so. The machine is our test server with the SSDs in RAID
0 with - to my knowledge - no TRIM support. They are 2½ years old, have
had a fair amount of data written to them, and have been 3/4 full most
of the time. At one point we experimented with 10M+ relatively small
files and a couple of 40GB databases, so the drives are definitely not
in pristine condition.

Anyway, as Solr searching is heavy on tiny random reads, I suspect that
search performance will be largely unaffected by SSD fragmentation. It
would be interesting to examine, but for now I cannot prioritize another
large performance test.


Thank you for your input. I will update the blog post accordingly,
Toke Eskildsen, State and University Library, Denmark



Re: nutch 1.4, solr 3.4 configuration error

2013-06-07 Thread Tuğcem Oral
I had a similar error. I couldn't find any documentation on which Nutch
and Solr versions are compatible. For instance, we're using Nutch 1.6 on
Hadoop 1.0.4 with SolrJ 3.4.0 and index crawled segments to Solr 4.2.0. But
I remember that I could find a compatible version of SolrJ for Nutch 1.4
(because of using Hadoop). You can upgrade your Nutch from 1.4 to 1.6
easily. I also suggest you check the solrindex-mapping.xml in
your /conf directory.

Best,

Tugcem.


On Fri, Jun 7, 2013 at 12:58 AM, Chris Hostetter
wrote:

> : ./nutch crawl urls -dir myCrawl2 -solr http://localhost:8080 -depth 2
> -topN
> ...
> : Caused by: org.apache.solr.common.SolrException: Not Found
> :
> : Not Found
> :
> : request: http://localhost:8080/select?q=id:[* TO
> : *]&fl=id&rows=1&wt=javabin&version=2
> ...
> : Other possibly helpful information:
> : 1) The solr admin screen comes up fine in the browser.
>
> At which URL does the Solr admin screen come up fine in your browser?
>
> Best guess...
>
> 1) you have solr installed such that it uses the webcontext "/solr" but
> you gave the wrong url to nutch (ie: try "-solr
> http://localhost:8080/solr")
>
> 2) you are using multiple collections, and you may need to configure nutch
> to know about which collection you are using (ie: try "-solr
> http://localhost:8080/solr/collection1")
>
> ...if neither of those helps, i would suggest you follow up with the
> nutch-user list, as the nutch community is probably in the best position
> to help you configure nutch to work with Solr and vice versa.
>
>
> -Hoss
>



-- 
TO


Clear cache used by Solr

2013-06-07 Thread Varsha Rani
Hi

I'm trying to compare the performance of different Solr queries. In order
to get a fair test, I want to clear the cache between queries.

How is this done? Of course, one can restart the server, but I wanted to
know if there is a quicker way.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Clear-cache-used-by-Solr-tp4068817.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr.NoOpDistributingUpdateProcessorFactory in SOLR CLOUD

2013-06-07 Thread sathish_ix
Hi ,

Need more information on how NoOpDistributingUpdateProcessorFactory works.
Below is the cloud setup:

collection1 --- shard1 --- node1:8983 (leader)
             |          |__ node2:8984
             |
             |__ shard2 -- node3:7585 (leader)
                        |__ node4:7586


node1, node2, node3 and node4 are 4 separate Solr instances running on 4
Tomcat containers.

We have included the following tag in solrconfig.xml, for not distributing
the index across shards:

<updateRequestProcessorChain name="nodistrib">
  <processor class="solr.NoOpDistributingUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

We are able to accomplish the task of loading an index onto only a single
shard by using NoOpDistributingUpdateProcessorFactory.


>> Loaded data into node:8984 of shard 1
After indexing, the size of the index on node 8984 was 94MB, whereas the
index size on the leader node for shard 1 was 4KB.
It seems that for shard 1 the leader is not performing the index building,
and replication is not working.
>> But on a good note, the index was not distributed to shard 2 (node 3,
node 4)

When I removed the above updateRequestProcessorChain tag:
>> The index is distributed across shards
>> Replication works fine.

My requirement is to store a specific region's index in a single shard, so
that region data is not distributed across shards.

Can you help with this?

Thanks,
Sathish








--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-NoOpDistributingUpdateProcessorFactory-in-SOLR-CLOUD-tp4068818.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Configuring seperate db-data-config.xml per shard

2013-06-07 Thread sathish_ix
Hi,

We were able to accomplish this with a single collection.

ZooKeeper:

Create a separate node for each shard, and upload the dbconfig file under
it, e.g.:

  /config/config1/shard1
  /config/config1/shard2
  /config/config1/shard3

In solrconfig.xml, the DataImportHandler config is referenced through the
${dbconfig} property; in solr.xml, that property is set per core. (The XML
snippets were stripped by the mail archive.)

This way you can configure a dbconfig file per shard.
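
A minimal sketch of what such a setup can look like (handler path, core
names, and file names here are illustrative, not from the original post):

<!-- solrconfig.xml: DIH config file resolved from a core property -->
<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">${dbconfig}</str>
  </lst>
</requestHandler>

<!-- solr.xml: each core points at its own db-data-config file -->
<cores adminPath="/admin/cores">
  <core name="shard1" instanceDir="shard1">
    <property name="dbconfig" value="db-data-config-shard1.xml" />
  </core>
  <core name="shard2" instanceDir="shard2">
    <property name="dbconfig" value="db-data-config-shard2.xml" />
  </core>
</cores>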

Thanks,
Sathish




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Configuring-seperate-db-data-config-xml-per-shard-tp4068383p4068819.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is there a way to load multiple schema when using zookeeper?

2013-06-07 Thread sathish_ix
Hi,

We were able to accomplish this with a single collection.

ZooKeeper:

Create a separate node for each shard, and upload the dbconfig file under
it, e.g.:

  /config/config1/shard1
  /config/config1/shard2
  /config/config1/shard3

In solrconfig.xml, the DataImportHandler config is referenced through the
${dbconfig} property; in solr.xml, that property is set per core. (The XML
snippets were stripped by the mail archive.)

This way you can configure a dbconfig file per shard.

Thanks,
Sathish 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-load-multiple-schema-when-using-zookeeper-tp4058358p4068821.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Clear cache used by Solr

2013-06-07 Thread Toke Eskildsen
On Fri, 2013-06-07 at 09:24 +0200, Varsha Rani wrote:
> I'm trying to compare the performance of different Solr queries. In order
> to get a fair test, I want to clear the cache between queries.
> 
> How is this done? Of course, one can restart the server, but I wanted to
> know if there is a quicker way.

That depends on your system. On Linux, this should work:
echo 1 | sudo tee /proc/sys/vm/drop_caches
(note that a plain "sudo echo 1 > /proc/sys/vm/drop_caches" does not work,
as the redirect is performed by the unprivileged shell). On OS X, the purge
command serves the same purpose. For Windows, CacheSet seems to provide the
functionality:
http://technet.microsoft.com/en-us/sysinternals/bb897561.aspx


To avoid any leftover from memory mapping vs. cache trickery, I stop
Solr, issue the drop_caches call and start Solr again.

- Toke Eskildsen



Re: LotsOfCores feature

2013-06-07 Thread Aleksey
> A use case would be a web site or service that had millions of users, each of
> whom would have an active Solr core when they are active, but inactive
> otherwise. Of course those cores would not all reside on one node and
> ZooKeeper is out of the question for managing anything that is in the
> millions. This would be a true "cloud" or "data center" and even multi-data
> center app, not a "cluster" app.

I am getting a little bit confused again. It seems now the answer to
my question is a "clear no"?
Also, instead of managing cores, is it not possible to manage servers,
which will number in the tens and hundreds? As for which core goes to
which server, that could be based on some hashing scheme.


Using Solr Scripts

2013-06-07 Thread Furkan KAMACI
I have a SolrCloud setup and I want to maintain some important things on
it, e.g. back up indexes, start and stop Solr nodes individually, send an
optimize request to the cloud, etc. However, I see that there is a scripts
folder that comes with Solr. Can I use some of those scripts for my
purposes, or should I implement something that connects to the ZooKeeper
quorum via SolrJ and does what I want?


How to stop index distribution among shards in solr cloud

2013-06-07 Thread sathish_ix
Hi,

I have two shards; logically, each shard corresponds to a region. Currently
the index is distributed across shards in SolrCloud. How can I load an
index into a specific shard in SolrCloud?

Any thoughts?

Thanks,
Sathish



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-stop-index-distribution-among-shards-in-solr-cloud-tp4068831.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr4.3 Internationalization.

2013-06-07 Thread bsargurunathan
Guys,

Please clarify the following questions regarding Solr internationalization.

1) Initially my requirement is to support 2 languages (English & French)
for a web application. And we are using a MySQL DB.

2) So please share a good and easy approach to achieve it, with some sample
configs.

3) My question is whether I need to index the data in both
languages (English & French) in different cores?

4) Or is indexing in English alone enough? Does Solr have any mechanism to
handle multiple languages while retrieving? If so, please share some
sample configs.

Thanks
Guru



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr4-3-Internationalization-tp4068834.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: LotsOfCores feature

2013-06-07 Thread Noble Paul നോബിള്‍ नोब्ळ्
The wiki page was not written with Cloud Solr in mind.

We have done such a deployment where less than a tenth of the cores were
active at any given point in time. Though there were tens of millions of
indices, they were split among a large number of hosts.


If you don't insist on a Cloud deployment, it is possible. I'm not sure if
it is possible with Cloud.


On Fri, Jun 7, 2013 at 12:38 AM, Aleksey  wrote:

> I was looking at this wiki and linked issues:
> http://wiki.apache.org/solr/LotsOfCores
>
> they talk about a limit being 100K cores. Is that per server or per
> entire fleet because zookeeper needs to manage that?
>
> I was considering a use case where I have tens of millions of indices
> but less than a million need to be active at any time, so they need
> to be loaded on demand and evicted when not used for a while.
> Also, since the number one requirement is efficient loading, of course I
> assume I will store a prebuilt index somewhere so Solr will just
> download it and strap it in, right?
>
> The root issue is marked as "won't fix" but some other important
> subissues are marked as resolved. What's the overall status of the
> effort?
>
> Thank you in advance,
>
> Aleksey
>



-- 
-
Noble Paul


Re: SOLR CSV output in custom order

2013-06-07 Thread Noble Paul നോബിള്‍ नोब्ळ्
Have you tried explicitly giving the field names (fl) as a parameter?
 http://wiki.apache.org/solr/CommonQueryParameters#fl
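
The CSV writer emits columns in fl order, so something like this (the field
names are made up) pins the column order:

http://localhost:8983/solr/select?q=*:*&wt=csv&fl=id,name,price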


On Thu, Jun 6, 2013 at 12:41 PM, anurag.jain  wrote:

> I want the output of the CSV file in the proper order. When I use wt=csv
> it gives the output in random order. Is there any way to get the output
> in the proper format?
>
> Thanks
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SOLR-CSV-output-in-custom-order-tp4068527.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
-
Noble Paul


Re: [blogpost] Memory is overrated, use SSDs

2013-06-07 Thread Erick Erickson
Thanks for this, hard data is always welcome!

Another blog post for my reference list!

Erick

On Fri, Jun 7, 2013 at 2:59 AM, Toke Eskildsen  wrote:
> On Fri, 2013-06-07 at 07:15 +0200, Andy wrote:
>> One question I have is did you precondition the SSD ( 
>> http://www.sandforce.com/userfiles/file/downloads/FMS2009_F2A_Smith.pdf )? 
>> SSD performance tends to take a very deep dive once all blocks are written 
>> at least once and the garbage collector kicks in.
>
> Not explicitly so. The machine is our test server with the SSDs in RAID
> 0 with - to my knowledge - no TRIM support. They are 2½ years old, have
> had a fair amount of data written to them, and have been 3/4 full most
> of the time. At one point we experimented with 10M+ relatively small
> files and a couple of 40GB databases, so the drives are definitely not
> in pristine condition.
>
> Anyway, as Solr searching is heavy on tiny random reads, I suspect that
> search performance will be largely unaffected by SSD fragmentation. It
> would be interesting to examine, but for now I cannot prioritize another
> large performance test.
>
>
> Thank you for your input. I will update the blog post accordingly,
> Toke Eskildsen, State and University Library, Denmark
>


Re: solr.NoOpDistributingUpdateProcessorFactory in SOLR CLOUD

2013-06-07 Thread Erick Erickson
I don't think you want the noop bits, I'd go back to the
standard definitions here.


What you _do_ want, I think, is the "custom hashing" option, see:
https://issues.apache.org/jira/browse/SOLR-2592
which has been in place since Solr 4.1. It allows you to
send documents to the shard of your choice, which I believe is
what you're really after here.
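
For example, with the composite-id approach the part of the document id
before the "!" separator determines the shard, so all documents sharing a
route key land together. A sketch with made-up ids and field names:

<add>
  <doc>
    <field name="id">east!doc42</field>
    <field name="region">east</field>
  </doc>
  <doc>
    <field name="id">east!doc43</field>
    <field name="region">east</field>
  </doc>
</add>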

Best
Erick

On Fri, Jun 7, 2013 at 3:31 AM, sathish_ix  wrote:
> Hi ,
>
> Need more information on how NoOpDistributingUpdateProcessorFactory works.
> Below is the cloud setup:
>
> collection1 --- shard1 --- node1:8983 (leader)
>              |          |__ node2:8984
>              |
>              |__ shard2 -- node3:7585 (leader)
>                         |__ node4:7586
>
>
> node1, node2, node3 and node4 are 4 separate Solr instances running on 4
> Tomcat containers.
>
> We have included the following tag in solrconfig.xml, for not distributing
> the index across shards:
>
> <updateRequestProcessorChain name="nodistrib">
>   <processor class="solr.NoOpDistributingUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
>
> We are able to accomplish the task of loading an index onto only a single
> shard by using NoOpDistributingUpdateProcessorFactory.
>
>
>>> Loaded data into node:8984 of shard 1
> After indexing, the size of the index on node 8984 was 94MB, whereas the
> index size on the leader node for shard 1 was 4KB.
> It seems that for shard 1 the leader is not performing the index building,
> and replication is not working.
>>> But on a good note, the index was not distributed to shard 2 (node 3,
> node 4)
>
> When I removed the above updateRequestProcessorChain tag:
>>> The index is distributed across shards
>>> Replication works fine.
>
> My requirement is to store a specific region's index in a single shard, so
> that region data is not distributed across shards.
>
> Can you help with this?
>
> Thanks,
> Sathish
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solr-NoOpDistributingUpdateProcessorFactory-in-SOLR-CLOUD-tp4068818.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Clear cache used by Solr

2013-06-07 Thread Erick Erickson
I really question whether this is valuable. Much of Solr's performance
comes explicitly from caches, so what you're measuring
is disk I/O to fill caches plus any other latency. I'm just not sure
what operational information you'll get here.

But assuming that you're really getting actionable data, you can
comment out all of the caches in the solrconfig.xml file to at least
remove those. The underlying Lucene caches will not be emptied,
but they'll always be filled anyway for all queries after the first
few; you can't avoid them.

Best
Erick

On Fri, Jun 7, 2013 at 3:24 AM, Varsha Rani  wrote:
> Hi
>
> I'm trying to compare the performance of different Solr queries. In order
> to get a fair test, I want to clear the cache between queries.
>
> How is this done? Of course, one can restart the server, but I wanted to
> know if there is a quicker way.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Clear-cache-used-by-Solr-tp4068817.html
> Sent from the Solr - User mailing list archive at Nabble.com.


solr facet query on multiple search term

2013-06-07 Thread vrparekh
Hello All,

I require facet counts for multiple search terms.
Currently I am doing two separate facet queries, one per search term, with
facet.range=dateField

e.g.

 http://solrserver/select?q=1stsearchTerm&facet=on&facet-parameters

 http://solrserver/select?q=2ndsearchTerm&facet=on&facet-parameters

Note: the SearchTerm field is of type text_en_splitting.

Now I have found another way to do a facet query on multiple search terms,
by tagging and excluding:

e.g.

http://solrurl/select?start=0&rows=10&hl=off&
facet=on&
facet.range.start=2013-06-06T16%3a00%3a00Z&
facet.range.end=2013-06-07T16%3a00%3a01Z&
facet.range.gap=%2B1HOUR&
wt=xml&
sort=dateField+desc&
facet.range={!key=music+ex=movie}dateField&
fq={!tag=music}content:"music"&
facet.range={!key=movie+ex=music}dateField&
fq={!tag=movie}content:"movie"&
q=(col2:1+)&
fq=+dateField:[2013-06-05T16:00:00Z+TO+2013-06-07T16:00:00Z]+AND+(+Col1:"test"+)&
fl=col1,col2,col3

I have tested this for a few search terms, and it gives the same results as
running a separate query for each search term.
Is this the proper way (with respect to results and performance)?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-facet-query-on-multiple-search-term-tp4068856.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: LotsOfCores feature

2013-06-07 Thread Erick Erickson
I should have been clearer, and others have mentioned... the "lots of cores"
stuff is really outside Zookeeper/SolrCloud at present. I don't think it's
incompatible, but it wasn't part of the design so it'll need some effort to
make it play nice with SolrCloud. I'm not sure there's actually a compelling
use-case for combining the two.

bq: Also, instead of managing cores is it not possible to manage servers
which will be in tens and hundreds?

Well, tens to hundreds of servers will work with SolrCloud. You could
theoretically take over routing documents (i.e. custom hashing) and
simply use SolrCloud without the "lots of cores" stuff. So the scenario
is that you have, say, 250 machines that will hold all your data and use
custom routing to get the right docs to the right core. The upcoming
SolrJ capability of sending requests only to the proper shard would
certainly help here. But this too is rather unexplored territory. I don't think
Zookeeper would really have a problem here because it's not moving much
data back and forth, the 1M limitation for data in ZK is on a per-core basis
and really applies only to the conf data, NOT the index.

But the current approach does lend itself to Jack's scenario. Essentially your
ClusterKeeper could send the index to one of the machines and create the
core there.

The current approach addresses the case where you are essentially doing
what Jack outlined semi-manually. That is, you're distributing your cores
around your cluster based on historical access patterns. It's pretty easy to
move the cores around by copying the dirs and using the auto-discovery
stuff to keep things in balance, but it's in no way automatic and probably
requires a restart (or at least core unload/load). Jack's idea
of doing this dynamically should work in that kind of scenario.

I can imagine, for instance, some relatively small number of physical
machines and all the user's indexes actually being kept on a networked
filesystem. The startup process is simply finding a machine with spare
capacity and telling it to create the core and pointing it at the pre-existing
index. On the assumption that the indexes fit into memory, you'd pay a
small penalty for start-up but wouldn't need to copy indexes around. You
could elaborate this as necessary, tuning the transient caches such that
you "fit" the number/size of users to particular hardware. If the store were
an HDFS file system, redundancy/backup/error recovery would come along
"for free".

But under any scenario, one of the hurdles will be figuring out how many
simultaneous users of whatever size can actually be comfortably handled
by a particular piece of hardware. And usually there's some kind of long
tail just to make it worse. Most of your users will be under X documents,
and some users will be 100X. And updating would be "interesting".

But I should emphasize that anything elaborate like this dynamic shuffling
is kind of theoretical at this point, meaning we haven't actually tested it. It
_should_ work, but I'm sure there will be some issues to flush out.

Best
Erick

On Fri, Jun 7, 2013 at 6:38 AM, Noble Paul നോബിള്‍  नोब्ळ्
 wrote:
> The wiki page was not written with Cloud Solr in mind.
>
> We have done such a deployment where less than a tenth of the cores were
> active at any given point in time. Though there were tens of millions of
> indices, they were split among a large number of hosts.
>
>
> If you don't insist on a Cloud deployment, it is possible. I'm not sure
> if it is possible with Cloud.
>
>
> On Fri, Jun 7, 2013 at 12:38 AM, Aleksey  wrote:
>
>> I was looking at this wiki and linked issues:
>> http://wiki.apache.org/solr/LotsOfCores
>>
>> they talk about a limit being 100K cores. Is that per server or per
>> entire fleet because zookeeper needs to manage that?
>>
>> I was considering a use case where I have tens of millions of indices
>> but less than a million need to be active at any time, so they need
>> to be loaded on demand and evicted when not used for a while.
>> Also, since the number one requirement is efficient loading, of course I
>> assume I will store a prebuilt index somewhere so Solr will just
>> download it and strap it in, right?
>>
>> The root issue is marked as "won't fix" but some other important
>> subissues are marked as resolved. What's the overall status of the
>> effort?
>>
>> Thank you in advance,
>>
>> Aleksey
>>
>
>
>
> --
> -
> Noble Paul


Documents

2013-06-07 Thread acasaus
Good morning,

I would like to know how I can modify an XML file so that it accesses my
information rather than the example information. I have one file from which
I obtain the information that I show to the user with Blacklight.

Sorry about my english,

Alex


Re: Documents

2013-06-07 Thread Dmitry Kan
hi,

you need to parse your custom XML file and transform it into an XML file
in the format Solr understands. If you are familiar with XSLT, you could
do that in a few lines, depending on the complexity of the input XML file.
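
A minimal sketch, assuming a made-up input format with <record> elements
carrying <id> and <title> children:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- match the (hypothetical) root element of the custom input file -->
  <xsl:template match="/records">
    <add>
      <xsl:for-each select="record">
        <doc>
          <field name="id"><xsl:value-of select="id"/></field>
          <field name="title"><xsl:value-of select="title"/></field>
        </doc>
      </xsl:for-each>
    </add>
  </xsl:template>
</xsl:stylesheet>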

Dmitry


On Fri, Jun 7, 2013 at 3:34 PM,  wrote:

> Good morning,
>
> I would like to know how I can modify an XML file so that it accesses my
> information rather than the example information. I have one file from
> which I obtain the information that I show to the user with Blacklight.
>
> Sorry about my english,
>
> Alex
>


Re: Doubt Regarding Shards Index

2013-06-07 Thread sathish_ix
Hi ,

How did you distribute the index by year to different shards?
Do we need to write any code?

Thanks,
Sathish



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Doubt-Regarding-Shards-Index-tp3629964p4068869.html
Sent from the Solr - User mailing list archive at Nabble.com.


[CROSS-POSTING] SOLR-4903 and SOLR-4904

2013-06-07 Thread Dmitry Kan
CROSS-POSTING from dev list.

Hi guys,

As discussed with Grant and Andrzej, I have created two JIRA issues related
to an inefficiency in distributed faceting. This affects 3.4, but my gut
feeling is that 4.x is affected as well.

Regards,

Dmitry Kan

P.S. Asking this question won yours truly second prize on Stump the chump
this year. :)


Re: HdfsDirectoryFactory

2013-06-07 Thread Mark Miller
Eagle eye man.

Yeah, we plan on contributing hdfs support for Solr. I'm flying home today and 
will create a JIRA issue for it shortly after I get there.

- Mark

On Jun 6, 2013, at 6:16 PM, Jamie Johnson  wrote:

> I've seen reference to an HdfsDirectoryFactory in the new Cloudera Search
> along with a commit in the Solr SVN (
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/solrconfig-tlog.xml?view=markup),
> is this something that is being made part of the core?  I've seen
> discussions in the past where folks have recommended not using an HDFS
> based DirectoryFactory for reasons like speed, any details/information that
> can be provided would be really appreciated.



Re: Doubt Regarding Shards Index

2013-06-07 Thread Dmitry Kan
Hi,

Sharding by time does not by itself require any custom code on the Solr
side: just index your data to a shard depending on the timestamp of the
document.

The querying part is trickier if you want to have one front-end Solr: it
should know which shards to query. If querying all shards for each query
is fine for you, then you are good and no changes are needed.
Alternatively, you can send a query to a particular year's shard, knowing
the year of the user query.
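
For example (the host and core names are made up), a query restricted to a
2012 shard could look like:

http://frontend:8983/solr/select?q=some+query&shards=solr2012:8983/solr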

Dmitry


On Fri, Jun 7, 2013 at 3:54 PM, sathish_ix wrote:

> Hi ,
>
> How did you distribute the index by year to different shards,
> do we need to write any code ?
>
> Thanks,
> Sathish
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Doubt-Regarding-Shards-Index-tp3629964p4068869.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr 4.2.1 higher memory footprint vs Solr 3.5

2013-06-07 Thread Otis Gospodnetic
This is exactly what we did for a client (albeit using Elasticsearch). We
then observed better performance through SPM. We used the latest Oracle
JVM.
Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jun 7, 2013 2:55 AM, "Bernd Fehling" 
wrote:

> Hi Shawn,
>
> I also had CMS with tons of tuning options but still had occasional
> bigger GC pauses. After switching to JDK7 I tried G1GC with no other
> options and it runs perfectly.
> With CMS I saw that the old and young generations were growing until they
> "had to do" a GC. This produces the sawtooth and also takes longer GC
> pause times.
> With G1GC the GC runs more frequently and is better timed; it is softer,
> more flexible.
> I just removed all the old tuning and GC options and have only the G1GC
> option.
>
> ulimit -c unlimited
> ulimit -l 256
> ulimit -m unlimited
> ulimit -n 8192
> ulimit -s unlimited
> ulimit -v unlimited
>
> JAVA_OPTS="-server -d64 -Xmx20g -Xms20g -XX:+UseG1GC
>   -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:gc.log"
>
> java version "1.7.0_07"
> Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
> Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
>
> Maybe I am just lucky with it, but for big heaps it works fine.
>
> Regards
> Bernd
>
>
> Am 06.06.2013 16:23, schrieb Shawn Heisey:
> > On 6/6/2013 3:50 AM, Bernd Fehling wrote:
> >> What helped me a lot was switching to G1GC.
> >> Faster, smoother, very little ripple, nearly no sawtooth.
> >
> > When I tried G1, it did indeed produce a better looking memory graph,
> > but it didn't do anything about my GC pauses.  They were several seconds
> > with just CMS and NewRatio, and they actually seemed to get slightly
> > worse when I tried G1 instead.
> >
> > To solve the GC pause problem, I've had to switch back to CMS and tack
> > on several more tuning options, most of which are CMS-specific.  I'm not
> > sure how to tune G1.  Have you done any additional tuning?
> >
> > Thanks,
> > Shawn
> >
>


Re: Clear cache used by Solr

2013-06-07 Thread Yonik Seeley
On Fri, Jun 7, 2013 at 7:32 AM, Erick Erickson  wrote:
> I really question whether this is valuable. Much of Solr performance
> is there explicitly because of caches

Right, and it's also the case that certain Solr features are coded
with the cache in mind (i.e. the cache will be utilized within a single
request for things like highlighting, multi-select faceting, etc.).

On Fri, Jun 7, 2013 at 3:24 AM, Varsha Rani  wrote:
> I'm trying to compare the performance of different Solr queries. In order
> to get a fair test, I want to clear the cache between queries.

If you are using/testing lucene query syntax, you can just add an
additional term that doesn't match anything and then keep changing
it... that will prevent the query/filter cache from recognizing it as
the same.

q=(my big query I'm testing) ab

And then next time change the "b" to a "c", etc.

Or you could explicitly tell solr not to cache it:

http://yonik.com/posts/advanced-filter-caching-in-solr/

q={!cache=false}(my big query I'm testing)

-Yonik
http://lucidworks.com


Re: LotsOfCores feature

2013-06-07 Thread Jack Krupansky
AFAICT, SolrCloud addresses the use case of distributed update for a 
relatively smaller number of collections (dozens?) that have a relatively 
larger number of rows - billions over a modest to moderate number of nodes 
(a handful to a dozen or dozens). So, maybe dozens of collections (some 
people still call these "cores") that distribute hundreds of millions if not 
billions of rows over dozens (or potentially low hundreds) of nodes. 
Technically, ZK was designed for thousands of nodes, but I don't think that 
was for the use case of distributed query that constantly fans out to all 
shards.


Aleksey: What would you say is the average core size for your use case - 
thousands or millions of rows? And how sharded would each of your 
collections be, if at all?


-- Jack Krupansky

-Original Message- 
From: Noble Paul നോബിള്‍ नोब्ळ्

Sent: Friday, June 07, 2013 6:38 AM
To: solr-user@lucene.apache.org
Subject: Re: LotsOfCores feature

The wiki page was not written with Cloud Solr in mind.

We have done such a deployment where less than a tenth of the cores were
active at any given point in time. Though there were tens of millions of
indices, they were split among a large number of hosts.


If you don't insist on a Cloud deployment, it is possible. I'm not sure if
it is possible with Cloud.


On Fri, Jun 7, 2013 at 12:38 AM, Aleksey  wrote:


I was looking at this wiki and linked issues:
http://wiki.apache.org/solr/LotsOfCores

they talk about a limit being 100K cores. Is that per server or per
entire fleet because zookeeper needs to manage that?

I was considering a use case where I have tens of millions of indices
but less than a million need to be active at any time, so they need
to be loaded on demand and evicted when not used for a while.
Also, since the number one requirement is efficient loading, of course I
assume I will store a prebuilt index somewhere so Solr will just
download it and strap it in, right?

The root issue is marked as "won't fix" but some other important
subissues are marked as resolved. What's the overall status of the
effort?

Thank you in advance,

Aleksey





--
-
Noble Paul 



Re: Schema Change: Int -> String (i am the original poster, new email address)

2013-06-07 Thread Jack Krupansky

Right, a search for "442" would not match "1442".
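
For reference, a whitespace-tokenized field gives exactly that behavior.
A minimal sketch (not your actual schema; the field name is assumed from
the thread), using the example schema's text_ws type:

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="user_ids" type="text_ws" indexed="true" stored="true" />

Each whitespace-separated id becomes its own term, so user_ids:1442 matches
"20 1442 35" but user_ids:442 does not.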

-- Jack Krupansky

-Original Message- 
From: z z

Sent: Friday, June 07, 2013 2:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Schema Change: Int -> String (i am the original poster, new 
email address)


Maybe if I were to say that the column "user_id" will become "user_ids",
that would clarify things?

user_id:2002+AND+created:[${from}+TO+${until}]+data:"more"

becomes

user_ids:2002+AND+created:[${from}+TO+${until}]+data:"more"

where I want 2002 to be an exact positive match on one of the user_ids
embedded in the TEXT ... not string :)  If I am totally off or making no
sense, feedback is very welcome.  I am just seeing lots of similar data
going into my db and it feels like Solr should be able to handle this.

I just want to know if transforming the data like that will still allow
exact searches against a user_id.  My language, from a Solr guru's point
of view, is probably *very* poorly phrased ... "exact" and TEXT might not
go hand in hand.

Is the TEXT "20 1442 35" parsed as "20" "1442" "35", so that a search
against it for "1442" will yield "exact" results?  A search against "442"
won't match, right?

1. "20 1442 35"
2. "20 442 35"
3. "20 1442"

user_ids:1442 -> yields #1 & #3 always?
user_ids:442 -> yields only #2 always?

My lack of understanding about what solr does when it indexes is shining
through :)


On Fri, Jun 7, 2013 at 1:43 PM, z z  wrote:


My language might be a bit off (I am saying "string" when I probably mean
"text" in the context of solr), but I'm pretty sure that my story is
unwavering ;)

`id` int(11) NOT NULL AUTO_INCREMENT
`created` int(10)
`data` varbinary(255)
`user_id` int(11)

So, imagine that we have 1000 entries come in where "data" above is
exactly the same for all 1000 entries, but user_id is different (id and
created being different is irrelevant).  I am thinking that prior to
inserting into mysql, I should be able to concatenate the user_ids
together with whitespace and then insert them into something like:

`id` int(11) NOT NULL AUTO_INCREMENT
`created` int(10)
`data` varbinary(255)
`user_id` blob

Then on Solr's end it will treat the user_id as TEXT and parse it (I want
to say tokenize, but maybe my language is incorrect here?).

Then when I search

user_id:2002+AND+created:[${from}+TO+${until}]+data:"more"

I want to be sure that if I look for user_id "2002", I will get data that
only has a value "2002" in the user_id column, and that a separate user
with id "20" cannot accidentally pull data for user_id "2002" as a result
of a fuzzy (my language ok?) match of 20 against (20)02.

Current schema definition:

<field name="user_id" type="int" indexed="true" stored="true" />

New schema definition:

<field name="user_ids" type="text_ws" indexed="true" stored="true" />


Re: OR query with null value and non-null value(s)

2013-06-07 Thread Jack Krupansky
Yes, it SHOULD! And in the LucidWorks Search query parser it does. Why 
doesn't it in Solr? Ask Yonik to explain that!


-- Jack Krupansky

-Original Message- 
From: Rahul R

Sent: Friday, June 07, 2013 1:21 AM
To: solr-user@lucene.apache.org
Subject: Re: OR query with null value and non-null value(s)

Thank you Shawn. This does work. To help me understand better, why do
we need the *:* ? Shouldn't it be implicit ?
Shouldn't
fq=(price:4+OR+(-price:[* TO *]))  //does not work
mean the same as
fq=(price:4+OR+(*:* -price:[* TO *]))   //works

Why does Solr need the *:* there ?




On Fri, Jun 7, 2013 at 12:07 AM, Shawn Heisey  wrote:


On 6/6/2013 12:28 PM, Rahul R wrote:


I have recently enabled facet.missing=true in solrconfig.xml which gives
null facet values also. As I understand it, the syntax to do a faceted
search on a null value is something like this:
&fq=-price:[* TO *]
So when I want to search on a particular value (for example : 4)  OR null
value, I would expect the syntax to be something like this:
&fq=(price:4+OR+(-price:[* TO *]))
But this does not work. After searching around some more, I read somewhere
that the right way to achieve this would be:
fq=-(-price:4+AND+price:[*+TO+*])
Now this does work but seems like a very roundabout way. Is there a better
way to achieve this?



Pure negative queries don't work -- you have to have results in the query
before you can subtract.  For some top-level queries, Solr is able to
detect this situation and fix it internally, but on inner queries you must
explicitly state your intentions.  It is best if you always use '*:*
-query' syntax, just to be safe.

fq=(price:4+OR+(*:* -price:[* TO *]))

Thanks,
Shawn






Re: Solr 4.2.1 higher memory footprint vs Solr 3.5

2013-06-07 Thread adityab
Hi All,
I work with Sandeep M, so this continues his comments. We did observe
memory growth.
We use jdk1.6.0_45 with CMS. We see this issue because of large document
size; by large I mean our single documents have large multivalued fields.
We found that JIRA LUCENE-4995 is what we experienced, and the patch seems
to resolve our issue. We are performing more tests around it.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-2-1-higher-memory-footprint-vs-Solr-3-5-tp4067879p4068886.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Documents

2013-06-07 Thread Alexandre Rafalovitch
If you are trying to import an external XML file into your system, you
may want to look at DataImportHandler. It is a good way to start. Look
at Wikipedia examples.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Jun 7, 2013 at 8:34 AM,   wrote:
> Good morning,
>
> I would like to know how I can modify a xml file to access to my information
> and not to the example information because I have one file from I obtains the
> information that I use to show the user with Blacklight.
>
> Sorry about my english,
>
> Alex


Re: Solr4.3 Internationalization.

2013-06-07 Thread Alexandre Rafalovitch
It may be helpful to approach this from the other side. Specifically search.

Are you:
1) Expecting to search across both French and English content (e.g.
French, but fallback to English if translation is missing)? If yes,
you want a single collection
2) Is French content completely separate from English content, or are
they just a couple of translated fields in an otherwise shared entity? If
the latter, you want a single collection.
3) Are you accessing all languages at once when you retrieve a record
or just one language at a time? If all languages at once, you want a
single collection.

And so on. If your content is completely separate, you could do
different collections. Otherwise, you probably want the same
collection.
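
For instance, a single collection can simply carry one field per language.
A minimal schema sketch (field names assumed for illustration, using
per-language field types such as the text_en/text_fr examples that ship
with the stock schema):

<field name="title_en" type="text_en" indexed="true" stored="true" />
<field name="title_fr" type="text_fr" indexed="true" stored="true" />

The front end then searches title_en or title_fr depending on the user's
locale.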

If you do want a single collection, there are a couple of things you
can do to make it transparent for the frontend code to switch between
languages and make search transparent. While not for production use, this
is explored in detail in my just-released book:
http://www.packtpub.com/apache-solr-for-indexing-data/book . The
corresponding example is at:
https://github.com/arafalov/solr-indexing-book/tree/master/published/languages
but I am not sure how easy it is to understand without the walkthrough
in the book.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Jun 7, 2013 at 6:08 AM, bsargurunathan  wrote:
> Guys,
>
> Please clarify the following questions regarding Solr Internationalization.
>
> 1) Initially my requirement is need to support 2 languages(English & French)
> for a Web application.
> And we are using Mysql DB.
>
> 2) So please share good and easy approach to achieve it with some sample
> configs.
>
> 3) And my question is whether I need to index the data with both
> languages(English & French) with different cores?
>
> 4) Or indexing with English is only enough? So solr have any mechanism to
> handle multiple languages while retrieving? If there anything share with
> some sample configs.
>
> Thanks
> Guru
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr4-3-Internationalization-tp4068834.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Solr 4.3.0 Cloud Issue indexing pdf documents

2013-06-07 Thread Mark Wilson
Hi

I am having an issue with adding pdf documents to a SolrCloud index I have
setup.

I can index pdf documents fine using 4.3.0 on my local box, but I have a
SolrCloud instance set up on the Amazon Cloud (using 2 servers) and I get
an error.

It seems that it is not loading org.apache.pdfbox.pdmodel.PDPage. However,
the jar is in the directory, and referenced in the solrconfig.xml file
with <lib/> directives along the lines of (the exact lines were stripped
by the mail archive):

<lib dir="/www/solr/lib/contrib/extraction/lib" regex=".*\.jar" />
When I start Tomcat, I can see that the file has loaded.

2705 [coreLoadExecutor-4-thread-3] INFO
org.apache.solr.core.SolrResourceLoader - Adding
'file:/www/solr/lib/contrib/extraction/lib/pdfbox-1.7.1.jar' to classloader

But when I try to add a document.

java
-Durl=http://ec2-blah-blaheu-west-1.compute.amazonaws.com:8080/solr/quosa2-collection/update/extract
-Dparams=literal.id=pdf1 -Dtype=text/pdf -jar post.jar 2008.Genomics.pdf


I get this error. I'm running on an Ubuntu machine.

Linux ip-10-229-125-163 3.5.0-21-generic #32-Ubuntu SMP Tue Dec 11 18:51:59
UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Error log.

88168 [http-bio-8080-exec-1] INFO
org.apache.solr.update.processor.LogUpdateProcessor -
[quosa2-collection_shard1_replica1] webapp=/solr path=/update/extract
params={literal.id=pdf1} {} 0 1534
88180 [http-bio-8080-exec-1] ERROR
org.apache.solr.servlet.SolrDispatchFilter -
null:java.lang.RuntimeException: java.lang.UnsatisfiedLinkError:
/usr/lib/jvm/java-7-oracle/jre/lib/amd64/xawt/libmawt.so: libXrender.so.1:
cannot open shared object file: No such file or directory
at 
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java
:670)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
380)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
155)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Application
FilterChain.java:243)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterCh
ain.java:210)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.ja
va:222)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.ja
va:123)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171
)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java
:118)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Proce
ssor.java:1009)
at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(Abstrac
tProtocol.java:589)
at 
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:
310)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:11
45)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
15)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.UnsatisfiedLinkError:
/usr/lib/jvm/java-7-oracle/jre/lib/amd64/xawt/libmawt.so: libXrender.so.1:
cannot open shared object file: No such file or directory
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1939)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1864)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1825)
at java.lang.Runtime.load0(Runtime.java:792)
at java.lang.System.load(System.java:1059)
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1939)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1864)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1846)
at java.lang.Runtime.loadLibrary0(Runtime.java:845)
at java.lang.System.loadLibrary(System.java:1084)
at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:67)
at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:47)
at java.security.AccessController.doPrivileged(Native Method)
at java.awt.Toolkit.loadLibraries(Toolkit.java:1648)
at java.awt.Toolkit.(Toolkit.java:1670)
at java.awt.Color.(Color.java:275)
at org.apache.pdfbox.pdmodel.PDPage.(PDPage.java:72)
at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:212)
at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:184)
at 
org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.ja
va:211)
at 
org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:328)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:72)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:153)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at 
org.apa

Custom Data Clustering

2013-06-07 Thread Raheel Hasan
Hi,

Can someone please tell me if there is a way to do custom clustering of the
data from Solr query results? I am facing 2 issues currently:

 1. Carrot clustering only applies clustering to the "paged" results (i.e.
the results on the current pagination page).

 2. I need custom clustering that classifies results into certain classes
only (i.e. only a few very specific words in the search results), for
example "Red", "Green", "Blue" etc., and not "hello World", "Known World",
"green world" etc. (if you know what I mean here), where words from both
the Do and DoNot sets exist in the search results.

Please tell me how to achieve this. Perhaps Carrot clustering is not needed
here and some other classifier is needed. So what should I do here?

Basically, I cannot receive 1 million results and then process them via a
PHP array to classify them as per need. The classification must be done in
Solr only.

Thanks

-- 
Regards,
Raheel Hasan


RE: How to stop index distribution among shards in solr cloud

2013-06-07 Thread James Thomas
This may help:

http://docs.lucidworks.com/display/solr/Shards+and+Indexing+Data+in+SolrCloud
--- See "Document Routing" section.


-Original Message-
From: sathish_ix [mailto:skandhasw...@inautix.co.in] 
Sent: Friday, June 07, 2013 5:27 AM
To: solr-user@lucene.apache.org
Subject: How to stop index distribution among shards in solr cloud

Hi,

I have two shards, logically each shards corresponds to a region. Currently 
index is distributed in solr cloud to shards, how to load index to specific 
shard in solr cloud,

Any thoughts ?

Thanks,
Sathish



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-stop-index-distribution-among-shards-in-solr-cloud-tp4068831.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr.NoOpDistributingUpdateProcessorFactory in SOLR CLOUD

2013-06-07 Thread Chris Hostetter

: I don't think you want the noop bits, I'd go back to the
: standard definitions here.

Correct.

the NoOpDistributingUpdateProcessorFactory is for telling the update 
processor chain that you do not want it to do any distribution of updates 
at all -- whatever SolrCore you send the doc to is the only one that gets 
it, and RunUpdateProcessor will write it to its local index.



-Hoss


Re: Solr 4.3.0 Cloud Issue indexing pdf documents

2013-06-07 Thread Michael Della Bitta
Hi Mark,

This is a total shot in the dark, but does
passing  -Djava.awt.headless=true when you run the server help at all?

More on awt headless mode:
http://www.oracle.com/technetwork/articles/javase/headless-136834.html

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Fri, Jun 7, 2013 at 11:31 AM, Mark Wilson  wrote:

> Hi
>
> I am having an issue with adding pdf documents to a SolrCloud index I have
> setup.
>
> I can index pdf documents fine using 4.3.0 on my local box, but I have a
> SolrCloud instance setup on the Amazon Cloud (Using 2 servers) and I get
> Error.
>
> It seems that it is not loading org.apache.pdfbox.pdmodel.PDPage. However,
> the jar is in the directory, and referenced in the solrconfig.xml file
>
>   
>   
>
>   
>   
>
>   
>   
>
>   
>   
>
> When I start Tomcat, I can see that the file has loaded.
>
> 2705 [coreLoadExecutor-4-thread-3] INFO
> org.apache.solr.core.SolrResourceLoader  ­ Adding
> 'file:/www/solr/lib/contrib/extraction/lib/pdfbox-1.7.1.jar' to classloader
>
> But when I try to add a document.
>
> java
> -Durl=
> http://ec2-blah-blaheu-west-1.compute.amazonaws.com:8080/solr/quosa2-c
> ollection/update/extract -Dparams=literal.id=pdf1 -Dtype=text/pdf -jar
> post.jar 2008.Genomics.pdf
>
>
> I get this error. I'm running on an Ubuntu machine.
>
> Linux ip-10-229-125-163 3.5.0-21-generic #32-Ubuntu SMP Tue Dec 11 18:51:59
> UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> Error log.
>
> 88168 [http-bio-8080-exec-1] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  ­
> [quosa2-collection_shard1_replica1] webapp=/solr path=/update/extract
> params={literal.id=pdf1} {} 0 1534
> 88180 [http-bio-8080-exec-1] ERROR
> org.apache.solr.servlet.SolrDispatchFilter  ­
> null:java.lang.RuntimeException: java.lang.UnsatisfiedLinkError:
> /usr/lib/jvm/java-7-oracle/jre/lib/amd64/xawt/libmawt.so: libXrender.so.1:
> cannot open shared object file: No such file or directory
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java
> :670)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
> 380)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
> 155)
> at
>
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Application
> FilterChain.java:243)
> at
>
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterCh
> ain.java:210)
> at
>
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.ja
> va:222)
> at
>
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.ja
> va:123)
> at
>
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171
> )
> at
>
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
> at
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
> at
>
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java
> :118)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
> at
>
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Proce
> ssor.java:1009)
> at
>
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(Abstrac
> tProtocol.java:589)
> at
>
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:
> 310)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:11
> 45)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
> 15)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.UnsatisfiedLinkError:
> /usr/lib/jvm/java-7-oracle/jre/lib/amd64/xawt/libmawt.so: libXrender.so.1:
> cannot open shared object file: No such file or directory
> at java.lang.ClassLoader$NativeLibrary.load(Native Method)
> at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1939)
> at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1864)
> at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1825)
> at java.lang.Runtime.load0(Runtime.java:792)
> at java.lang.System.load(System.java:1059)
> at java.lang.ClassLoader$NativeLibrary.load(Native Method)
> at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1939)
> at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1864)
> at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1846)
> at java.lang.Runtime.loadLibrary0(Runtime.java:845)
> at java.lang.System.loadLibrary(System.java:1084)
> at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:67)
> at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:47)
> 

Re: OR query with null value and non-null value(s)

2013-06-07 Thread Rahul R
Thank you for the clarification, Shawn.


On Fri, Jun 7, 2013 at 7:34 PM, Jack Krupansky wrote:

> Yes, it SHOULD! And in the LucidWorks Search query parser it does. Why
> doesn't it in Solr? Ask Yonik to explain that!
>
> -- Jack Krupansky
>
> -Original Message- From: Rahul R
> Sent: Friday, June 07, 2013 1:21 AM
> To: solr-user@lucene.apache.org
> Subject: Re: OR query with null value and non-null value(s)
>
>
> Thank you Shawn. This does work. To help me understand better, why do
> we need the *:* ? Shouldn't it be implicit ?
> Shouldn't
> fq=(price:4+OR+(-price:[* TO *]))  //does not work
> mean the same as
> fq=(price:4+OR+(*:* -price:[* TO *]))   //works
>
> Why does Solr need the *:* there ?
>
>
>
>
> On Fri, Jun 7, 2013 at 12:07 AM, Shawn Heisey  wrote:
>
>  On 6/6/2013 12:28 PM, Rahul R wrote:
>>
>>  I have recently enabled facet.missing=true in solrconfig.xml which gives
>>> null facet values also. As I understand it, the syntax to do a faceted
>>> search on a null value is something like this:
>>> &fq=-price:[* TO *]
>>> So when I want to search on a particular value (for example : 4)  OR null
>>> value, I would expect the syntax to be something like this:
>>> &fq=(price:4+OR+(-price:[* TO *]))
>>> But this does not work. After searching around for more, read somewhere
>>> that the right way to achieve this would be:
>>> fq=-(-price:4+AND+price:[*+TO+*])
>>>
>>> Now this does work but seems like a very roundabout way. Is there a
>>> better
>>> way to achieve this ?
>>>
>>>
>> Pure negative queries don't work -- you have to have results in the query
>> before you can subtract.  For some top-level queries, Solr is able to
>> detect this situation and fix it internally, but on inner queries you must
>> explicitly state your intentions.  It is best if you always use '*:*
>> -query' syntax, just to be safe.
>>
>> fq=(price:4+OR+(*:* -price:[* TO *]))
>>
>> Thanks,
>> Shawn
>>
>>
>>
>


Re: LotsOfCores feature

2013-06-07 Thread Aleksey
> Aleksey: What would you say is the average core size for your use case -
> thousands or millions of rows? And how sharded would each of your
> collections be, if at all?

Average core/collection size wouldn't even be thousands; hundreds, more
likely. And the largest would be half a million or so, but that's a
pathological case. I don't need sharding or queries that fan out to
different machines. In fact, I'd like to avoid that so I don't have to
collate the results.


> The wiki page was not written with Cloud Solr in mind.
>
> We have done such a deployment where less than a tenth of the cores were
> active at any given point in time. Though there were tens of millions of
> indices, they were split among a large number of hosts.
>
> If you don't insist on a Cloud deployment, it is possible. I'm not sure
> if it is possible with Cloud.

By Cloud do you mean specifically SolrCloud? I don't have to have it if I
can do without it. Bottom line is I want a bunch of small cores to be
distributed over a fleet, each core completely fitting on one server.
Would you be willing to provide a little more detail on your setup?
In particular, how are you managing the cores?
How do you route requests to the proper server?
If you scale the fleet up and down, does reshuffling of the cores
happen automatically, or is it an involved manual process?

Thanks,

Aleksey


Re: LotsOfCores feature

2013-06-07 Thread Jack Krupansky

Thanks. That's what I suspected. Yes, MegaMiniCores.

My scenario is purely hypothetical. But it is also relevant for 
"multi-tenant" use cases, where the users and schemas are not known in 
advance and are only online intermittently.


Users could fit three rough size categories: very small, medium, and very 
large. Over time a user might move from very small to medium to very large. 
Very large users could require their own dedicated clusters. Medium size 
could occasionally require a dedicated node, but not always. And very small 
is mostly offline but occasionally a fair number are online for short 
periods of time.


-- Jack Krupansky

-Original Message- 
From: Aleksey

Sent: Friday, June 07, 2013 3:44 PM
To: solr-user
Subject: Re: LotsOfCores feature


Aleksey: What would you say is the average core size for your use case -
thousands or millions of rows? And how sharded would each of your
collections be, if at all?


Average core/collection size wouldn't even be thousands; hundreds, more
likely. And the largest would be half a million or so, but that's a
pathological case. I don't need sharding or queries that fan out to
different machines. In fact, I'd like to avoid that so I don't have to
collate the results.



The wiki page was not written with Cloud Solr in mind.

We have done such a deployment where less than a tenth of the cores were
active at any given point in time. Though there were tens of millions of
indices, they were split among a large number of hosts.

If you don't insist on a Cloud deployment, it is possible. I'm not sure if
it is possible with Cloud.


By Cloud you mean specifically SolrCloud? I don't have to have it if I
can do without it. Bottom line is I want a bunch of small cores to be
distributed over a fleet, each core completely fitting on one server.
Would you be willing to provide a little more details on your setup?
In particular, how are you managing the cores?
How do you route requests to the proper server?
If you scale the fleet up and down, does reshuffling of the cores
happen automatically or is it an involved manual process?

Thanks,

Aleksey 



RE: SolrCloud Load Balancer "weight"

2013-06-07 Thread Vaillancourt, Tim
Cool!

Having those values influenced by stats is a neat idea too. I'll get on that 
soon.

Tim

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Monday, June 03, 2013 5:07 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud Load Balancer "weight"


On Jun 3, 2013, at 3:33 PM, Tim Vaillancourt  wrote:

> Should I JIRA this? Thoughts?

Yeah - it's always been in the back of my mind - it's come up a few times - 
eventually we would like nodes to report some stats to zk to influence load 
balancing. 

- mark


translating a character code to an ordinal?

2013-06-07 Thread geeky2
hello all,

environment: solr 3.5, centos

problem statement:  i have several character codes that i want to translate
to ordinal (integer) values (for sorting), while retaining the original code
field in the document.

i was thinking that i could use a copyField from my "code" field to my "ord"
field - then employ a pattern replace filter factory during indexing.

but won't the copyfield fail because the two field types are different?

ps: i also read the wiki about
http://wiki.apache.org/solr/DataImportHandler#Transformer the script
transformer and regex transformer - but was hoping to avoid this - if i
could.




thx
mark




--
View this message in context: 
http://lucene.472066.n3.nabble.com/translating-a-character-code-to-an-ordinal-tp4068966.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr facet query on multiple search term

2013-06-07 Thread Erick Erickson
I'm a little confused here. Faceting is about counting docs that meet
your query restrictions. I.e. the "q=" and "fq=" clauses. So your original
problem statement simply cannot be combined into a single query
since your q= clauses are different. You could do something like
q=(firstterm OR secondterm)&facet.query=firstterm&facet.query=secondTerm
That would give you accurate facet counts for the terms, but it
certainly doesn't preserve the original intent of
q=firstterm&facet.query=blahblah.

But facet.query is only counted over the docs that match
the "q=" clause (well, the q= clause and any fq clauses). So perhaps
you can supply a few example input docs and desired counts on the other side.
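
For illustration, here's roughly what that combined request looks like from
SolrJ, borrowing the content/music/movie names from your post below (a sketch
only; HttpSolrServer assumes SolrJ 4.x):

  import java.util.Map;

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class FacetQueryCounts {
    public static void main(String[] args) throws Exception {
      HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

      // q matches docs containing either term; each facet.query is then
      // counted over exactly that result set
      SolrQuery q = new SolrQuery("content:(music OR movie)");
      q.setFacet(true);
      q.addFacetQuery("content:music");
      q.addFacetQuery("content:movie");
      q.setRows(0);  // only the counts are needed

      QueryResponse rsp = server.query(q);
      Map<String, Integer> counts = rsp.getFacetQuery();  // facet.query -> count
      System.out.println(counts);
    }
  }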

Best
Erick

On Fri, Jun 7, 2013 at 8:01 AM, vrparekh  wrote:
> Hello All,
>
> I required facet counts for multiple SearchTerms.
> Currently I am doing two separate facet query on each search term with
> facet.range="dateField"
>
> e.g.
>
>  http://solrserver/select?q=1stsearchTerm&fq=on&facet-parameters
>
>  http://solrserver/select?q=2ndsearchTerm&fq=on&facet-parameters
>
> Note :: SearchTerm field will be text_en_splitting
>
> Now I have found another way to do facet query on multiple search term by
> tagging and excluding
>
> e.g.
>
> http://solrurl/select?start=0&rows=10&hl=off&
> facet=on&
> facet.range.start=2013-06-06T16%3a00%3a00Z&
> facet.range.end=2013-06-07T16%3a00%3a01Z&
> facet.range.gap=%2B1HOUR&
> wt=xml&
> sort=dateField+desc&
> facet.range={!key=music+ex=movie}dateField&
>
> fq={!tag=music}content:"music"&facet.range={!key=movie+ex=music}dateField&
> fq={!tag=movie}content:"movie"&q=(col2:1+)&
>
> fq=+dateField:[2013-06-05T16:00:00Z+TO+2013-06-07T16:00:00Z]+AND+(+Col1:"test"+)&
> fl=col1,col2,col3
>
>
> I have tested this for a few search terms, and it provides the same results
> as separate queries for each search term.
> Is this the proper way to do it (in terms of results and performance)?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solr-facet-query-on-multiple-search-term-tp4068856.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: translating a character code to an ordinal?

2013-06-07 Thread Jack Krupansky
This won't help you unless you move to Solr 4.0, but here's an update 
processor script from the book that can take the first character of a string 
field and add it as an integer value for another field:

 <updateRequestProcessorChain name="script-add-char-code">
   <processor class="solr.StatelessScriptUpdateProcessorFactory">
     <str name="script">add-char-code.js</str>
     <lst name="params">
       <str name="fieldName">content</str>
       <str name="codeFieldName">content_code_i</str>
     </lst>
   </processor>
   <processor class="solr.LogUpdateProcessorFactory" />
   <processor class="solr.RunUpdateProcessorFactory" />
 </updateRequestProcessorChain>

Here is the JavaScript script that should be placed in the
"add-char-code.js" file in the "conf" directory for the Solr collection:

 function processAdd(cmd) {
   var fieldName;
   var codeFieldName;
   if (typeof params !== "undefined") {
 fieldName = params.get("fieldName");
 codeFieldName = params.get("codeFieldName");
   }
   if (fieldName == null)
 fieldName = "content";
   if (codeFieldName == null)
 codeFieldName = "content_code_i";

   // Get value for named field, no-op if empty
   var value = cmd.getSolrInputDocument().getField(fieldName);
   if (value != null){
 var str = value.getFirstValue();

 // No-op if string is empty
 if (str != null && str.length() != 0){
   // Get code for first character
   var code = str.charCodeAt(0);
    logger.info("String: \"" + str + "\" len: " + str.length() +
                " code: " + code);


   // Set the character code output field value
   cmd.getSolrInputDocument().addField(codeFieldName, code);
 }
   }
 }

 function processDelete() {
   // Dummy - add if needed
 }

 function processCommit() {
   // Dummy - add if needed
 }

 function processRollback() {
   // Dummy - add if needed
 }

 function processMergeIndexes() {
   // Dummy - add if needed
 }

 function finish() {
   // Dummy - add if needed
 }

Test it:

 curl "http://localhost:8983/solr/update?commit=true&update.chain=script-add-char-code" \
 -H 'Content-type:application/json' -d '
 [{"id": "doc-1", "content": "abc"},
  {"id": "doc-2", "content": "1"},
  {"id": "doc-3", "content": ""},
  {"id": "doc-4"},
  {"id": "doc-5", "content": "\u0002 abc"},
  {"id": "doc-6", "content": ["And, this", "is the end", "of this 
test."]}]'


Results:

 "id":"doc-1",
 "content":["abc"],
 "content_code_i":97,

 "id":"doc-2",
 "content":["1"],
 "content_code_i":49,

 "id":"doc-3",
 "content":[""],

 "id":"doc-4",

 "id":"doc-5",
 "content":["\u0002 abc"],
 "content_code_i":2,

 "id":"doc-6",
 "content":["And, this",
   "is the end",
   "of this test."],
 "content_code_i":65,

-- Jack Krupansky

-Original Message- 
From: geeky2

Sent: Friday, June 07, 2013 6:27 PM
To: solr-user@lucene.apache.org
Subject: translating a character code to an ordinal?

hello all,

environment: solr 3.5, centos

problem statement:  i have several character codes that i want to translate
to ordinal (integer) values (for sorting), while retaining the original code
field in the document.

i was thinking that i could use a copyField from my "code" field to my "ord"
field - then employ a pattern replace filter factory during indexing.

but won't the copyfield fail because the two field types are different?

ps: i also read the wiki about
http://wiki.apache.org/solr/DataImportHandler#Transformer the script
transformer and regex transformer - but was hoping to avoid this - if i
could.




thx
mark




--
View this message in context: 
http://lucene.472066.n3.nabble.com/translating-a-character-code-to-an-ordinal-tp4068966.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Filtering on results with more than N words.

2013-06-07 Thread Jack Krupansky
Also from the book, here's an alternative update request processor that uses
a JavaScript script to do the counting and field creation:

 <updateRequestProcessorChain name="script-add-word-count">
   <processor class="solr.StatelessScriptUpdateProcessorFactory">
     <str name="script">add-word-count.js</str>
     <lst name="params">
       <str name="fieldName">content</str>
       <str name="wordCountFieldName">content_wc_i</str>
     </lst>
   </processor>
   <processor class="solr.LogUpdateProcessorFactory" />
   <processor class="solr.RunUpdateProcessorFactory" />
 </updateRequestProcessorChain>

Here is the JavaScript script that should be placed in the
"add-word-count.js" file in the "conf" directory for the Solr collection:

 function processAdd(cmd) {
   var fieldName;
   var wordCountFieldName;
   if (typeof params !== "undefined") {
 fieldName = params.get("fieldName");
 wordCountFieldName = params.get("wordCountFieldName");
   }
   if (fieldName == null)
 fieldName = "content";
   if (wordCountFieldName == null)
 wordCountFieldName = "content_wc_i";

   // Get value(s) for named field
   var values = cmd.getSolrInputDocument().getField(fieldName).getValues();

   // Combine values into one string
   var str = "";
   var n = values.size();
   for (i = 0; i < n; i++)
 str += ' ' + values.get(i);

   // Compress out hyphens and underscores to join words
    var str_no_dash = str.replace(/-|_/g, '');

   // Replace words with simply "X"
   var str_x_words = str_no_dash.replace(/\w+/g, 'X');

   // Remove punctuation and white space, leaving just the "X"es.
   var str_final = str_x_words.replace(/[^X]+/g, '');

   // A count of the "X"es is a good proxy for the word count.
   var wordCount = str_final.length;

   // Set the word count output field value
   cmd.getSolrInputDocument().addField(wordCountFieldName, wordCount);
 }

 function processDelete() {
   // Dummy - add if needed
 }

 function processCommit() {
   // Dummy - add if needed
 }

 function processRollback() {
   // Dummy - add if needed
 }

 function processMergeIndexes() {
   // Dummy - add if needed
 }

 function finish() {
   // Dummy - add if needed
 }

A test:

 curl "http://localhost:8983/solr/update?commit=true&update.chain=script-add-word-count" \
 -H 'Content-type:application/json' -d '
 [{"id": "doc-1", "content": "Hello World"},
  {"id": "doc-2", "content": ""},
  {"id": "doc-3", "content": " -- --- !"},
  {"id": "doc-4", "content": "This is some more."},
  {"id": "doc-5", "content": "The CD-ROM, (and num_events_seen.)"},
  {"id": "doc-6", "content": "Four score and seven years ago our fathers
 brought forth on this continent a new nation, conceived in liberty,
 and dedicated to the proposition that all men are created equal.
 Now we are engaged in a great civil war, testing whether that nation,
 or any nation so conceived and so dedicated, can long endure. "},
  {"id": "doc-7", "content": "401(k)"},
  {"id": "doc-8", "content": ["And, this", "is the end", "of this 
test."]}]'


Results:

 "id":"doc-1",
 "content":["Hello World"],
 "content_wc_i":2,

 "id":"doc-2",
 "content":[""],
 "content_wc_i":0,

 "id":"doc-3",
 "content":[" -- --- !"],
 "content_wc_i":0,

 "id":"doc-4",
 "content":["This is some more."],
 "content_wc_i":4,

 "id":"doc-5",
 "content":["The CD-ROM, (and num_events_seen.)"],
 "content_wc_i":4,

 "id":"doc-6",
 "content":["Four score and seven years ago our fathers\n
 brought forth on this continent a new nation, conceived in liberty,\n
 and dedicated to the proposition that all men are created equal.\n
  Now we are engaged in a great civil war, testing whether that nation,\n
  or any nation so conceived and so dedicated, can long endure. "],
 "content_wc_i":54,

 "id":"doc-7",
 "content":["401(k)"],
 "content_wc_i":2,

 "id":"doc-8",
 "content":["And, this",
   "is the end",
   "of this test."],
 "content_wc_i":8,




-- Jack Krupansky
-Original Message- 
From: Jack Krupansky

Sent: Thursday, June 06, 2013 5:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Filtering on results with more than N words.


From the book, here's an update request processor chain which will count the
words in the "content" field and place the count in the "content_len_i"
field. Then you could do a range query on that count.

<updateRequestProcessorChain name="regex-count-words">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">content</str>
    <str name="dest">content_len_i</str>
  </processor>

  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">content_len_i</str>
    <str name="delimiter"> </str>
  </processor>

  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content_len_i</str>
    <str name="pattern">-|_</str>
    <str name="replacement"></str>
  </processor>

  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content_len_i</str>
    <str name="pattern">\w+</str>
    <str name="replacement">X</str>
  </processor>

  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content_len_i</str>
    <str name="pattern">[^X]</str>
    <str name="replacement"></str>
  </processor>

  <processor class="solr.FieldLengthUpdateProcessorFactory">
    <str name="fieldName">content_len_i</str>
  </processor>

  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>


Here's a test update using the Solr example schema, assuming you add the
above URP chain to solrconfig:

curl "http://localhost:8983/solr/update?commit=true&update.chain=regex-count-words" \
-H 'Content-type:application/json' -d '
[{"id": "doc-1", "content": "Hello World"},
{"id": "doc-2", "content": ""},
{"id": "doc-3", "content": " -- --- !"},
{"id": "doc-4", "content": "This is some more."},
{"id": "doc-5", "content": "The CD-ROM, (and num_events_seen.)"},
{"id": "doc-6", "content": "Four score and seven years ago our fathers
   brought forth on this continent a new nation, conceived in liberty,
   and dedicated to the proposition that all men are created equal.
   Now we are engaged in a great civil war, testing whether that nation,
   or any nation so conceived and so dedicated, can long endure. "},
{"id": "doc-

Re: translating a character code to an ordinal?

2013-06-07 Thread geeky2
hello jack,

thank you for the code ;)

what "book" are you referring to?  AFAICT - all of the 4.0 books are "future
order".

we won't be moving to 4.0 (soon enough).

so i take it - copyfield will not work, eg - i cannot take a code like ABC
and copy it to an int field and then use the regex to turn it into an
ordinal?

thx
mark




--
View this message in context: 
http://lucene.472066.n3.nabble.com/translating-a-character-code-to-an-ordinal-tp4068966p4068984.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: translating a character code to an ordinal?

2013-06-07 Thread Jack Krupansky
Correct, you need either an update request processor, a custom field type, 
or to preprocess your input before you give it to Solr.


You can't do analysis on a non-text field.
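
Since you're on 3.5, preprocessing is the option that needs no new Solr
machinery. A minimal client-side sketch (the field names and the
code-to-ordinal table here are made up for illustration):

  import java.util.HashMap;
  import java.util.Map;

  import org.apache.solr.common.SolrInputDocument;

  public class CodeToOrdinal {
    private static final Map<String, Integer> ORDINALS =
        new HashMap<String, Integer>();
    static {
      ORDINALS.put("boo", 1);
      ORDINALS.put("baz", 2);
      ORDINALS.put("bar", 3);
    }

    public static SolrInputDocument buildDoc(String id, String code) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", id);
      doc.addField("code", code);         // keep the original code field
      Integer ord = ORDINALS.get(code);
      if (ord != null) {
        doc.addField("code_ord_i", ord);  // sortable int companion field
      }
      return doc;
    }
  }

Feed documents built this way through whatever client you already index
with, and sort on code_ord_i.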

"The book" is my new Solr reference/guide that I will be self-publishing. We 
hope to make an "Alpha" draft available later next week.


-- Jack Krupansky
-Original Message- 
From: geeky2

Sent: Friday, June 07, 2013 8:08 PM
To: solr-user@lucene.apache.org
Subject: Re: translating a character code to an ordinal?

hello jack,

thank you for the code ;)

what "book" are you referring to?  AFAICT - all of the 4.0 books are "future
order".

we won't be moving to 4.0 (soon enough).

so i take it - copyfield will not work, eg - i cannot take a code like ABC
and copy it to an int field and then use the regex to turn it into an
ordinal?

thx
mark




--
View this message in context: 
http://lucene.472066.n3.nabble.com/translating-a-character-code-to-an-ordinal-tp4068966p4068984.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Lucene/Solr Filesystem tunings

2013-06-07 Thread Tim Vaillancourt
I figured as much for atime, thanks Otis!

I haven't run benchmarks just yet, but I'll be sure to share whatever I
find. I plan to try ext4 vs xfs.

I am also curious what effect disabling journaling (ext2) would have,
relying on SolrCloud to manage 'consistency' over many instances vs FS
journaling. Anyone have opinions there? If I test I'll share the results.

Cheers,

Tim


On 4 June 2013 16:11, Otis Gospodnetic  wrote:

> Hi,
>
> You can use noatime, nodiratime, nothing in Solr depends on that as
> far as I know.  We tend to use ext4.  Some people love xfs.  Want to
> run some benchmarks and publish the results? :)
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Tue, Jun 4, 2013 at 6:48 PM, Tim Vaillancourt 
> wrote:
> > Hey all,
> >
> > Does anyone have any advice or special filesytem tuning to share for
> > Lucene/Solr, and which file systems they like more?
> >
> > Also, does Lucene/Solr care about access times if I turn them off (I
> > think it doesn't care)?
> >
> > A bit unrelated: What are people's opinions on reducing some consistency
> > things like filesystem journaling, etc (ext2?) due to SolrCloud's
> additional
> > HA with replicas? How about RAID 0 x 3 replicas or so?
> >
> > Thanks!
> >
> > Tim Vaillancourt
>


Re: Two instances of solr - the same datadir?

2013-06-07 Thread Tim Vaillancourt
If it makes you feel better, I also considered this approach when I was in
the same situation with a separate indexer and searcher on one Physical
linux machine.

My main concern was "re-using" the FS cache between both instances - If I
replicated to myself there would be two independent copies of the index,
FS-cached separately.

I like the suggestion of using autoCommit to reload the index. If I'm
reading that right, you'd set an autoCommit on 'zero docs changing', or
just 'every N seconds'? Did that work?
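
For reference, Peter's "call commit() periodically within your own code"
option below would look something like this SolrJ sketch. The searcher URL
and the 60-second interval are placeholders, and HttpSolrServer assumes
SolrJ 4.x:

  import java.util.concurrent.Executors;
  import java.util.concurrent.ScheduledExecutorService;
  import java.util.concurrent.TimeUnit;

  import org.apache.solr.client.solrj.impl.HttpSolrServer;

  public class SearcherRefresher {
    public static void main(String[] args) {
      final HttpSolrServer searcher =
          new HttpSolrServer("http://localhost:8984/solr/collection1");
      ScheduledExecutorService exec =
          Executors.newSingleThreadScheduledExecutor();
      exec.scheduleAtFixedRate(new Runnable() {
        public void run() {
          try {
            // zero-document commit: the searcher instance reopens and sees
            // the writer's index changes
            searcher.commit();
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      }, 0, 60, TimeUnit.SECONDS);
    }
  }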

Best of luck!

Tim


On 5 June 2013 10:19, Roman Chyla  wrote:

> So here it is, for the record, how I am solving it right now:
>
> Write-master is started with: -Dmontysolr.warming.enabled=false
> -Dmontysolr.write.master=true -Dmontysolr.read.master=
> http://localhost:5005
> Read-master is started with: -Dmontysolr.warming.enabled=true
> -Dmontysolr.write.master=false
>
>
> solrconfig.xml changes:
>
> 1. all index changing components have this bit,
> enable="${montysolr.master:true}" - ie.
>
>   <requestHandler name="/update" ... enable="${montysolr.master:true}">
>
> 2. for cache warming de/activation
>
>   <listener event="newSearcher" class="solr.QuerySenderListener"
>      enable="${montysolr.enable.warming:true}">...
>
> 3. to trigger refresh of the read-only-master (from write-master):
>
>   <listener event="postCommit" class="solr.RunExecutableListener"
>      enable="${montysolr.master:true}">
>     <str name="exe">curl</str>
>     <str name="dir">.</str>
>     <bool name="wait">false</bool>
>     <arr name="args"><str>${montysolr.read.master:http://localhost
> }/solr/admin/cores?wt=json&action=RELOAD&core=collection1</str></arr>
>   </listener>
>
> This works, I still don't like the reload of the whole core, but it seems
> like the easiest thing to do now.
>
> -- roman
>
>
> On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla 
> wrote:
>
> > Hi Peter,
> >
> > Thank you, I am glad to read that this usecase is not alien.
> >
> > I'd like to make the second instance (searcher) completely read-only, so
> I
> > have disabled all the components that can write.
> >
> > (being lazy ;)) I'll probably use
> > http://wiki.apache.org/solr/CollectionDistribution to call the curl
> after
> > commit, or write some IndexReaderFactory that checks for changes
> >
> > The problem with calling the 'core reload' - is that it seems lots of
> work
> > for just opening a new searcher, eeekkk...somewhere I read that it is
> cheap
> > to reload a core, but re-opening the index searchers must be definitely
> > cheaper...
> >
> > roman
> >
> >
> > On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge  >wrote:
> >
> >> Hi,
> >> We use this very same scenario to great effect - 2 instances using the
> >> same
> >> dataDir with many cores - 1 is a writer (no caching), the other is a
> >> searcher (lots of caching).
> >> To get the searcher to see the index changes from the writer, you need
> the
> >> searcher to do an empty commit - i.e. you invoke a commit with 0
> >> documents.
> >> This will refresh the caches (including autowarming), [re]build the
> >> relevant searchers etc. and make any index changes visible to the RO
> >> instance.
> >> Also, make sure to use <lockType>native</lockType> in solrconfig.xml to
> >> ensure the two instances don't try to commit at the same time.
> >> There are several ways to trigger a commit:
> >> Call commit() periodically within your own code.
> >> Use autoCommit in solrconfig.xml.
> >> Use an RPC/IPC mechanism between the 2 instance processes to tell the
> >> searcher the index has changed, then call commit when called (more
> complex
> >> coding, but good if the index changes on an ad-hoc basis).
> >> Note, doing things this way isn't really suitable for an NRT
> environment.
> >>
> >> HTH,
> >> Peter
> >>
> >>
> >>
> >> On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla 
> >> wrote:
> >>
> >> > Replication is fine, I am going to use it, but I wanted it for
> instances
> >> > *distributed* across several (physical) machines - but here I have one
> >> > physical machine, it has many cores. I want to run 2 instances of solr
> >> > because I think it has these benefits:
> >> >
> >> > 1) I can give less RAM to the writer (4GB), and use more RAM for the
> >> > searcher (28GB)
> >> > 2) I can deactivate warming for the writer and keep it for the
> searcher
> >> > (this considerably speeds up indexing - each time we commit, the
> server
> >> is
> >> > rebuilding a citation network of 80M edges)
> >> > 3) saving disk space and better OS caching (OS should be able to use
> >> more
> >> > RAM for the caching, which should result in faster operations - the
> two
> >> > processes are accessing the same index)
> >> >
> >> > Maybe I should just forget it and go with the replication, but it
> >> doesn't
> >> > 'feel right' IFF it is on the same physical machine. And Lucene
> >> > specifically has a method for discovering changes and re-opening the
> >> index
> >> > (DirectoryReader.openIfChanged)
> >> >
> >> > Am I not seeing something?
> >> >
> >> > roman
> >> >
> >> >
> >> >
> >> > On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman <
> >> > jhell...@innoventsolutions.com> wrote:
> >> >
> >> > > Roman,
> >

Re: Two instances of solr - the same datadir?

2013-06-07 Thread Roman Chyla
I have autoCommit after 40k recs/1800 secs. I only tested with manual
commits, but I don't see why it should work differently.
Roman
On 7 Jun 2013 20:52, "Tim Vaillancourt"  wrote:

> If it makes you feel better, I also considered this approach when I was in
> the same situation with a separate indexer and searcher on one Physical
> linux machine.
>
> My main concern was "re-using" the FS cache between both instances - If I
> replicated to myself there would be two independent copies of the index,
> FS-cached separately.
>
> I like the suggestion of using autoCommit to reload the index. If I'm
> reading that right, you'd set an autoCommit on 'zero docs changing', or
> just 'every N seconds'? Did that work?
>
> Best of luck!
>
> Tim
>
>
> On 5 June 2013 10:19, Roman Chyla  wrote:
>
> > So here it is, for the record, how I am solving it right now:
> >
> > Write-master is started with: -Dmontysolr.warming.enabled=false
> > -Dmontysolr.write.master=true -Dmontysolr.read.master=
> > http://localhost:5005
> > Read-master is started with: -Dmontysolr.warming.enabled=true
> > -Dmontysolr.write.master=false
> >
> >
> > solrconfig.xml changes:
> >
> > 1. all index changing components have this bit,
> > enable="${montysolr.master:true}" - ie.
> >
> >   <requestHandler name="/update" ... enable="${montysolr.master:true}">
> >
> > 2. for cache warming de/activation
> >
> >   <listener event="newSearcher" class="solr.QuerySenderListener"
> >      enable="${montysolr.enable.warming:true}">...
> >
> > 3. to trigger refresh of the read-only-master (from write-master):
> >
> >   <listener event="postCommit" class="solr.RunExecutableListener"
> >      enable="${montysolr.master:true}">
> >     <str name="exe">curl</str>
> >     <str name="dir">.</str>
> >     <bool name="wait">false</bool>
> >     <arr name="args"><str>${montysolr.read.master:http://localhost
> > }/solr/admin/cores?wt=json&action=RELOAD&core=collection1</str></arr>
> >   </listener>
> >
> > This works, I still don't like the reload of the whole core, but it seems
> > like the easiest thing to do now.
> >
> > -- roman
> >
> >
> > On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla 
> > wrote:
> >
> > > Hi Peter,
> > >
> > > Thank you, I am glad to read that this usecase is not alien.
> > >
> > > I'd like to make the second instance (searcher) completely read-only,
> so
> > I
> > > have disabled all the components that can write.
> > >
> > > (being lazy ;)) I'll probably use
> > > http://wiki.apache.org/solr/CollectionDistribution to call the curl
> > after
> > > commit, or write some IndexReaderFactory that checks for changes
> > >
> > > The problem with calling the 'core reload' - is that it seems lots of
> > work
> > > for just opening a new searcher, eeekkk...somewhere I read that it is
> > cheap
> > > to reload a core, but re-opening the index searchers must be definitely
> > > cheaper...
> > >
> > > roman
> > >
> > >
> > > On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge  > >wrote:
> > >
> > >> Hi,
> > >> We use this very same scenario to great effect - 2 instances using the
> > >> same
> > >> dataDir with many cores - 1 is a writer (no caching), the other is a
> > >> searcher (lots of caching).
> > >> To get the searcher to see the index changes from the writer, you need
> > the
> > >> searcher to do an empty commit - i.e. you invoke a commit with 0
> > >> documents.
> > >> This will refresh the caches (including autowarming), [re]build the
> > >> relevant searchers etc. and make any index changes visible to the RO
> > >> instance.
> > >> Also, make sure to use <lockType>native</lockType> in solrconfig.xml to
> > >> ensure the two instances don't try to commit at the same time.
> > >> There are several ways to trigger a commit:
> > >> Call commit() periodically within your own code.
> > >> Use autoCommit in solrconfig.xml.
> > >> Use an RPC/IPC mechanism between the 2 instance processes to tell the
> > >> searcher the index has changed, then call commit when called (more
> > complex
> > >> coding, but good if the index changes on an ad-hoc basis).
> > >> Note, doing things this way isn't really suitable for an NRT
> > environment.
> > >>
> > >> HTH,
> > >> Peter
> > >>
> > >>
> > >>
> > >> On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla 
> > >> wrote:
> > >>
> > >> > Replication is fine, I am going to use it, but I wanted it for
> > instances
> > >> > *distributed* across several (physical) machines - but here I have
> one
> > >> > physical machine, it has many cores. I want to run 2 instances of
> solr
> > >> > because I think it has these benefits:
> > >> >
> > >> > 1) I can give less RAM to the writer (4GB), and use more RAM for the
> > >> > searcher (28GB)
> > >> > 2) I can deactivate warming for the writer and keep it for the
> > searcher
> > >> > (this considerably speeds up indexing - each time we commit, the
> > server
> > >> is
> > >> > rebuilding a citation network of 80M edges)
> > >> > 3) saving disk space and better OS caching (OS should be able to use
> > >> more
> > >> > RAM for the caching, which should result in faster operations - the
> > two
> > >> > processes are accessing the same index)
> > >> >
> > >> > Maybe I sho

Re: translating a character code to an ordinal?

2013-06-07 Thread geeky2
thx,


please send me a link to the book so i can get/purchase it.


thx
mark





--
View this message in context: 
http://lucene.472066.n3.nabble.com/translating-a-character-code-to-an-ordinal-tp4068966p4068997.html
Sent from the Solr - User mailing list archive at Nabble.com.


custom field tutorial

2013-06-07 Thread geeky2
can someone point me to a "custom field" tutorial.

i checked the wiki and this list - but still a little hazy on how i would do
this.

essentially - when the user issues a query, i want my class to interrogate a
string field (containing several codes - example boo, baz, bar) 

and return a single integer field that maps to the string field (containing
the code).

example: 

boo=1
baz=2
bar=3

thx
mark





--
View this message in context: 
http://lucene.472066.n3.nabble.com/custom-field-tutorial-tp4068998.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: LotsOfCores feature

2013-06-07 Thread Noble Paul നോബിള്‍ नोब्ळ्
We set it up like this:
+ individual solr instances are set up
+ external mapping/routing allocates users to instances. This information
can be stored in an external data store
+ all cores are created with transient=true and loadOnStartup=false
+ cores come online on demand
+ as and when users' data gets bigger (or hosts are hot) they are migrated
to less-hit hosts using the built-in replication

Keep in mind we had the same schema for all users. Currently there is no way
to upload a new schema to solr.
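
For what it's worth, on recent 4.x releases the transient/loadOnStartup
flags can also be passed when creating a core through the CoreAdmin API.
Parameter support varies by version, so treat this as a sketch:

http://host:8983/solr/admin/cores?action=CREATE&name=user123&instanceDir=user123&transient=true&loadOnStartup=false
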
On Jun 8, 2013 1:15 AM, "Aleksey"  wrote:

> > Aleksey: What would you say is the average core size for your use case -
> > thousands or millions of rows? And how sharded would each of your
> > collections be, if at all?
>
> Average core/collection size wouldn't even be thousands, more like
> hundreds. And the largest would be half a million or so, but that's a
> pathological case. I don't need sharding or queries that fan out to
> different machines. In fact I'd like to avoid that so I don't have to
> collate the results.
>
>
> > The Wiki page was not built for Cloud Solr.
> >
> > We have done such a deployment where less than a tenth of the cores were
> > active at any given point in time. Though there were tens of millions of
> > indices, they were split among a large no. of hosts.
> >
> > If you don't insist on a Cloud deployment it is possible. I'm not sure if
> > it is possible with Cloud.
>
> By Cloud you mean specifically SolrCloud? I don't have to have it if I
> can do without it. Bottom line is I want a bunch of small cores to be
> distributed over a fleet, each core completely fitting on one server.
> Would you be willing to provide a little more details on your setup?
> In particular, how are you managing the cores?
> How do you route requests to the proper server?
> If you scale the fleet up and down, does reshuffling of the cores
> happen automatically or is it an involved manual process?
>
> Thanks,
>
> Aleksey
>


Re: custom field tutorial

2013-06-07 Thread Walter Underwood
What are you trying to do? This seems really odd. I've been working in search 
for fifteen years and I've never heard this request.

You could always return all the fields to the client and ignore the ones you 
don't want.

wunder

On Jun 7, 2013, at 8:24 PM, geeky2 wrote:

> can someone point me to a "custom field" tutorial.
> 
> i checked the wiki and this list - but still a little hazy on how i would do
> this.
> 
> essentially - when the user issues a query, i want my class to interrogate a
> string field (containing several codes - example boo, baz, bar) 
> 
> and return a single integer field that maps to the string field (containing
> the code).
> 
> example: 
> 
> boo=1
> baz=2
> bar=3
> 
> thx
> mark
> 






Re: Multitable import - uniqueKey

2013-06-07 Thread sodoo
Thank you to all the members who replied. That solved the issue.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multitable-import-uniqueKey-tp4067796p4069007.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: custom field tutorial

2013-06-07 Thread Anria Billavara

You seem to know what you want the words to map to, so index the map.  Have one 
field for the word, one field for the mapped value, and at query time, search 
the words and return the mapped field. If it is comma separated, so be it and 
split it up in your code post search.
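
A minimal SolrJ sketch of that idea (the field names here are made up):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class IndexedMap {
    public static void main(String[] args) throws Exception {
      HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

      // index the word together with its mapped value
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");
      doc.addField("code", "boo");
      doc.addField("code_ord_i", 1);
      server.add(doc);
      server.commit();

      // search the word, return only the mapped field
      SolrQuery q = new SolrQuery("code:boo");
      q.setFields("code_ord_i");
      System.out.println(server.query(q).getResults());
    }
  }
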
Otherwise, same as Wunder: in my many years in search this is an odd request.
Anria

Sent from my Samsung smartphone on AT&T

 Original message 
Subject: Re: custom field tutorial 
From: Walter Underwood  
To: solr-user@lucene.apache.org 
CC:  

What are you trying to do? This seems really odd. I've been working in search 
for fifteen years and I've never heard this request.

You could always return all the fields to the client and ignore the ones you 
don't want.

wunder

On Jun 7, 2013, at 8:24 PM, geeky2 wrote:

> can someone point me to a "custom field" tutorial.
> 
> i checked the wiki and this list - but still a little hazy on how i would do
> this.
> 
> essentially - when the user issues a query, i want my class to interrogate a
> string field (containing several codes - example boo, baz, bar) 
> 
> and return a single integer field that maps to the string field (containing
> the code).
> 
> example: 
> 
> boo=1
> baz=2
> bar=3
> 
> thx
> mark
>