AW: Solr indexing slows down

2013-06-10 Thread Sebastian Steinfeld
Hi Michael,

The database I am using is Oracle. That's right, I am selecting from a view.
What do you mean by selecting from outside of Solr? I thought the batchSize
would do the pagination?

The load of the database server is not increasing during the import. It seems 
that the database is doing nothing.

Thanks,
Sebastian



-Original Message-
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
Sent: Thursday, 6 June 2013 18:29
To: solr-user@lucene.apache.org
Subject: Re: Solr indexing slows down

Hi Sebastian,

What database are you using? How much RAM is available on your machine? It 
looks like you're selecting from a view... Have you tried paging through the 
view outside of Solr? Does that slow down as well? Do you notice any increased 
load on the Solr box or the database server?



Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

"The Science of Influence Marketing"

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Thu, Jun 6, 2013 at 6:13 AM, Sebastian Steinfeld < 
sebastian.steinf...@mgm-tp.com> wrote:

> Hi,
>
> I am new to solr and we want to use Solr to speed up our product search.
> And it is working really nice, but I think I have a problem with the 
> indexing.
> It slows down after a few minutes.
>
> I am using the DataImportHandler to import the products from the database.
> And I start the import by executing the following HTTP request:
> /dataimport?command=full-import&clean=true&commit=true
>
> I guess these are the important parts of my configuration:
>
> schema.xml:
> --
> 
>  stored="true" required="true"  />
>  stored="true" required="true"  />
>  stored="false"  />
>  stored="false"  />
> multiValued="true"/>
> 
>  
>  positionIncrementGap="100">
>   
> 
> 
>   
> 
> --
>
> solrconfig.xml:
> --
>class="org.apache.solr.handler.dataimport.DataImportHandler">
> 
> dataimport-handler.xml
> 
>   
> --
>
> dataimport-handler.xml:
> --
> 
>  url="*"
> user="*" "
> password="*"
> />
>
>  query="SELECT   PRODUCTS_PK, PRODUCTS_CODE,
> PRODUCTS_EAN, PRODUCTSLP_NAME FROM V_SOLR_IMPORT4PRODUCT_SEARCH">
> 
> 
> 
> 
> 
> 
> 
> --
>
> The amount of documents I want to index is 8 million; the first 1.6
> million are indexed in 2 min, but completing the import takes nearly 2
> hours.
> The size of the index on the hard drive is 610MB.
> I started the solr server with 2GB memory.
>
>
> I read that the duration of indexing might be connected to the batch
> size, so I increased the batchSize in the dataSource to 10,000, but
> this didn't make any difference.
> I also tried to disable the autocommit, which is configured in
> solrconfig.xml, by commenting it out, but this also didn't
> make any difference.
>
> It would be really nice if one of you could help me with this problem.
>
> Thank you very much,
> Sebastian
>
>


Re: Get Statistics With CloudSolrServer?

2013-06-10 Thread Furkan KAMACI
I think that it is related to LukeRequest

2013/6/10 Mark Miller 

>
> On Jun 9, 2013, at 7:52 PM, Furkan KAMACI  wrote:
>
> > There is a statistics section on the admin page that gives information
> > like:
> >
> > Last Modified, Num Docs, Max Doc, etc. How can I get that kind of
> > information using CloudSolrServer with Solrj?
>
> There is an admin request handler that exposes them as one option: the
> /admin/mbeans admin request handler - you can use solrj to hit that handler.
>
> - Mark
>
>
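
For illustration, a minimal SolrJ sketch of hitting that /admin/mbeans handler -- a hedged
example assuming SolrJ 4.x and an already-built CloudSolrServer named "server"; the
"solr-mbeans" key is the name that handler gives its payload in the response:

SolrQuery query = new SolrQuery();
query.setRequestHandler("/admin/mbeans"); // route the request to the admin handler
query.set("stats", "true");               // ask for the statistics section
QueryResponse rsp = server.query(query);
NamedList<Object> mbeans = (NamedList<Object>) rsp.getResponse().get("solr-mbeans");
System.out.println(mbeans);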


Re: LIMIT on number of OR in fq

2013-06-10 Thread Raymond Wiker
A better option would be to use POST instead of GET.
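
For illustration, here is what that looks like from SolrJ -- a hedged sketch assuming
SolrJ 4.x, an existing SolrServer named "solrServer", and a truncated version of the
fq from the thread:

// POST puts the long fq in the request body instead of the URL, so the
// container's URL-length limit no longer applies.
SolrQuery q = new SolrQuery("*:*");
q.addFilterQuery("locations:(5000 OR 15000 OR 75100)"); // truncated list
QueryResponse rsp = solrServer.query(q, SolrRequest.METHOD.POST);
System.out.println(rsp.getResults().getNumFound());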


On Mon, Jun 10, 2013 at 8:50 AM, Aloke Ghoshal  wrote:

> True, the container's request header size limit must be the reason then.
> Try:
>
> http://serverfault.com/questions/136249/how-do-we-increase-the-maximum-allowed-http-get-query-length-in-jetty
>
>
>
> On Sun, Jun 9, 2013 at 11:04 PM, Jack Krupansky  >wrote:
>
> > Maybe it is hitting some kind of container limit on URL length, like more
> > than 2048?
> >
> > Add &debugQuery=true to your query and see what query is both received
> and
> > parsed and generated.
> >
> > Also, if the default query operator is set to or, fq={! q.op=OR}..., then
> > you can drop the " OR " operators for a shorter query string.
> >
> > That said, as with most features of Lucene and Solr, the #1 rule is: Use
> > them in moderation. A few dozen IDs are fine. A hundred immediately
> raising
> > suspicion - what are you really trying to do? 200?! 250??!! Over 300?!!
> > 1,000?!?! 5,000?!?! I mean, do you really need to do all of this on a
> > single "query"? If you find yourself saying "Yes", go back to the drawing
> > board and think a lot more carefully what your data model is. I mean, the
> > application data model is supposed to simplify queries. Your case does
> not
> > seem simple at all.
> >
> > Tell us what you are really trying to do with this extreme filter query.
> > The fact that you stumbled into an apparent problem should just be a
> wakeup
> > call that you need to reconsider your basic design assumptions.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Kamal Palei
> > Sent: Sunday, June 09, 2013 9:07 AM
> > To: solr-user@lucene.apache.org
> > Subject: LIMIT on number of OR in fq
> >
> >
> > Dear All
> > I am using below syntax to check for a particular field.
> > &fq=locations:(5000 OR 1 OR 15000 OR 2 OR 75100)
> > With this I get the expected result properly.
> >
> > In a particular situations the number of ORs are more (looks around 280)
> > something as below.
> >
> > &fq=pref_work_locations:(5000 OR 1 OR 15000 OR 2 OR 75100 OR
> 125300
> > OR 25300 OR 141100 OR 100700 OR 50300 OR 132100 OR 25000 OR 25100 OR
> 25200
> > OR 25400 OR 25500 OR 25600 OR 25700 OR 25800 OR 25900 OR 26000 OR 26100
> OR
> > 26200 OR 26300 OR 26400 OR 26500 OR 3 OR 30100 OR 35000 OR 35100 OR
> > 35200 OR 35300 OR 35400 OR 35500 OR 35600 OR 35700 OR 35800 OR 4 OR
> > 45000 OR 45100 OR 45200 OR 45300 OR 45400 OR 45500 OR 5 OR 50100 OR
> > 50200 OR 55000 OR 55100 OR 55200 OR 55300 OR 55400 OR 55500 OR 55600 OR
> > 55700 OR 6 OR 60100 OR 60200 OR 60300 OR 60400 OR 60500 OR 65000 OR
> > 65100 OR 65200 OR 7 OR 70100 OR 70200 OR 70300 OR 70400 OR 75000 OR
> > 75200 OR 75300 OR 75400 OR 75500 OR 75600 OR 75700 OR 75800 OR 75900 OR
> > 76000 OR 76100 OR 76200 OR 76300 OR 76400 OR 8 OR 80100 OR 80200 OR
> > 80300 OR 80400 OR 80500 OR 85000 OR 85100 OR 85200 OR 85300 OR 85400 OR
> > 85500 OR 85600 OR 85700 OR 85800 OR 85900 OR 86000 OR 86100 OR 86200 OR
> > 9 OR 90100 OR 90200 OR 90300 OR 90400 OR 90500 OR 90600 OR 90700 OR
> > 90800 OR 90900 OR 91000 OR 91100 OR 91200 OR 91300 OR 91400 OR 91500 OR
> > 91600 OR 91700 OR 91800 OR 91900 OR 92000 OR 92100 OR 92200 OR 92300 OR
> > 92400 OR 92500 OR 92600 OR 92700 OR 92800 OR 92900 OR 95000 OR 95100 OR
> > 10 OR 100100 OR 105000 OR 105100 OR 105200 OR 105300 OR 105400 OR
> > 105500 OR 105600 OR 105700 OR 105800 OR 105900 OR 106000 OR 106100 OR
> > 106200 OR 11 OR 110100 OR 115000 OR 115100 OR 115200 OR 115300 OR
> > 115400 OR 115500 OR 12 OR 120100 OR 120200 OR 120300 OR 120400 OR
> > 120500 OR 120600 OR 120700 OR 120800 OR 120900 OR 121000 OR 121100 OR
> > 125000 OR 125100 OR 125200 OR 125400 OR 125500 OR 125600 OR 125700 OR
> > 125800 OR 125900 OR 126000 OR 126100 OR 13 OR 130100 OR 130200 OR
> > 130300 OR 130400 OR 130500 OR 130600 OR 130700 OR 130800 OR 130900 OR
> > 131000 OR 131100 OR 131200 OR 131300 OR 131400 OR 131500 OR 131600 OR
> > 131700 OR 131800 OR 131900 OR 132000 OR 132200 OR 132300 OR 132400 OR
> > 132500 OR 135000 OR 135100 OR 14 OR 140100 OR 140200 OR 140300 OR
> > 140400 OR 140500 OR 140600 OR 140700 OR 140800 OR 140900 OR 141000 OR
> > 141200 OR 141300 OR 141400 OR 141500 OR 141600 OR 141700 OR 141800 OR
> > 141900 OR 142000 OR 142100 OR 145000 OR 15 OR 155000 OR 16 OR
> > 165000 OR 17 OR 175000 OR 18 OR 185000 OR 19 OR 195000 OR
> > 20 OR 205000 OR 21 OR 215000 OR 22 OR 225000 OR 23 OR
> > 235000 OR 24 OR 245000 OR 25 OR 255000 OR 26 OR 265000 OR
> > 27 OR 275000 OR 28 OR 285000 OR 29 OR 295000 OR 30 OR
> > 305000 OR 31 OR 315000 OR 32 OR 325000 OR 33 OR 335000 OR
> > 34 OR 345000 OR 35 OR 355000 OR 36 OR 365000 OR 37 OR
> > 375000 OR 38 OR 385000 OR 39)
> >
> >
> > When we have such a high number of ORs, it gives me 0 records, whereas I
> > expected all possible records.
> >
> > So I am wond

AW: Solr 4.3 - Schema Parsing Failed: Invalid field property: compressed

2013-06-10 Thread André Widhani
From what version are you upgrading? The compressed attribute has been unsupported
since the 3.x releases.

The change log (CHANGES.txt) has a section "Upgrading from Solr 1.4" in the 
notes for Solr 3.1:

"Field compression is no longer supported. Fields that were formerly compressed 
will be uncompressed as index segments are merged. For shorter fields, this may 
actually be an improvement, as the compression used was not very good for short 
text. Some indexes may get larger though."

Also, indices created with 1.4 cannot be opened with 4.x, only 3.x.

Regards,
André


From: Uomesh [uom...@gmail.com]
Sent: Monday, 10 June 2013 06:19
To: solr-user@lucene.apache.org
Subject: Solr 4.3 - Schema Parsing Failed: Invalid field property: compressed

Hi,

I am getting the error below after upgrading to Solr 4.3. Is the compressed
attribute no longer supported in Solr 4.3, or is it a bug in 4.3?

org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Schema Parsing Failed: Invalid field property: compressed

Thanks,
Umesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-3-Schema-Parsing-Failed-Invalid-field-property-compressed-tp4069254.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to Reach LukeRequestHandler From Solrj?

2013-06-10 Thread Furkan KAMACI
I want to get statistics from Solr via Solrj. I think that I should reach
LukeRequestHandler (*if that is not right, you can explain the proper way*). I use
Solr 4.2.1 and CloudSolrServer to reach Solr via Solrj. How can I do that?

This URL's response has exactly what I want:

:8983/solr/collection1/admin/luke?wt=json&show=index&numTerms=0&_=1370851203426
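
A minimal SolrJ sketch of reaching that same handler -- hedged, assuming Solr/SolrJ 4.2.1
and an existing (Cloud)SolrServer named "solrServer"; note the response describes whichever
core the request actually lands on, not the whole cloud:

LukeRequest lukeRequest = new LukeRequest(); // targets /admin/luke by default
lukeRequest.setNumTerms(0);                  // same as numTerms=0 in the URL above
LukeResponse lukeResponse = lukeRequest.process(solrServer);
System.out.println(lukeResponse.getNumDocs());
System.out.println(lukeResponse.getMaxDoc());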


AW: Solr indexing slows down

2013-06-10 Thread Sebastian Steinfeld
Hi Shawn,

thank you for your answer.

I am using Oracle. This is the configuration I am using:
-



There is 12GB of free memory on the server; I hope this is enough.
I will test the import with 4GB of JVM memory.

Do you know if the "autocommit" configuration inside solrconfig.xml works when
using the DIH with the URL:
/dataimport?command=full-import&clean=true&commit=true

I read that "commit=true" will only make one commit at the end of the import,
so "autocommit" won't work.

I am using Solr 4.3

Thank you,
Sebastian


-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Thursday, 6 June 2013 19:06
To: solr-user@lucene.apache.org
Subject: Re: Solr indexing slows down

On 6/6/2013 4:13 AM, Sebastian Steinfeld wrote:
> The amout of documents I want to index is 8 million, the first 1,6 million 
> are indexed in 2min, but to complete the Import it takes nearly 2 hours.
> The size of the index on the hard drive is 610MB.
> I started the solr server with 2GB memory.
>
> I read that the duration of indexing might be connected to the batch size, so 
> I increased the batchSize in the dataSource to 10.000, but this didn't make 
> any differences.
> I also tried to disable the autocommit, which is configured in the 
> solrconfig.xml. I disabled it by uncommenting it, but this also didn't made 
> any differences.

If you are importing from MySQL, you actually want the batchSize to be -1.  
This streams the results so they don't take up large blocks of memory.  Other 
JDBC drivers have different ways of configuring this mode of operation.  You 
fully redacted the driver and URL in your config file, so I don't know what you 
are using.
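
For reference, that MySQL streaming setup is configured on the DIH dataSource roughly
like this (a sketch; host, credentials and database name are placeholders, and other
databases need their own fetch-size settings):

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/products"
            user="reader" password="***"
            batchSize="-1"/> <!-- -1 makes MySQL stream rows instead of buffering the whole result -->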

2GB of Java heap for Solr is probably not enough.  It's likely that once your 
index gets big enough, Solr is starved for memory and has to perform constant 
garbage collections to free up enough for basic operation.  I would bet that 
you also don't have enough free memory for the OS to cache the index well:

http://wiki.apache.org/solr/SolrPerformanceProblems

If you are using 4.x with the updateLog turned on, then you want autoCommit 
enabled with openSearcher to be false.  This is covered on the wiki page I 
linked.

Thanks,
Shawn



Re: LotsOfCores feature

2013-06-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
Aleksey,

It was a less-than-ideal situation, because we did not have a choice. We
had external systems/scripts to manage this. A new custom implementation is
being built on SolrCloud which would take care of most of those
issues.

SolrReplication is hidden once you move to cloud. But it will continue to work in
the same way if you have a stand-alone deployment.


On Mon, Jun 10, 2013 at 1:20 AM, Aleksey  wrote:

> Thanks Paul. Just a little clarification:
>
> You mention that you migrate data using built-in replication, but if
> you map and route users yourself, doesn't that mean that you also need
> to manage replication yourself? Your routing logic needs to be aware
> of how to map both replicas for each user, and if one hosts goes down,
> then it needs to distribute traffic that it was receiving over other
> hosts. Same thing for adding more hosts.
> I did a couple of quick searches and found mostly older wikis that say
> solr replication will change in the future. Would you be able to point
> me to the right one?
>
>
> -
>
> On Fri, Jun 7, 2013 at 8:34 PM, Noble Paul നോബിള്‍  नोब्ळ्
>  wrote:
> > We set it up like this
> > + individual solr instances are setup
> > + external mapping/routing to allocate users to instances. This
> information
> > can be stored in an external data store
> > + all cores are created as transient and loadonstart as false
> > + cores come online on demand
> > + as and when users data get bigger (or hosts are hot)they are migrated
> > between less hit hosts using in built replication
> >
> > Keep in mind we had the schema for all users. Currently there is no way
> to
> > upload a new schema to solr.
> > On Jun 8, 2013 1:15 AM, "Aleksey"  wrote:
> >
> >> > Aleksey: What would you say is the average core size for your use
> case -
> >> > thousands or millions of rows? And how sharded would each of your
> >> > collections be, if at all?
> >>
> >> Average core/collection size wouldn't even be thousands, hundreds more
> >> like. And the largest would be half a million or so but that's a
> >> pathological case. I don't need sharding and queries than fan out to
> >> different machines. If fact I'd like to avoid that so I don't have to
> >> collate the results.
> >>
> >>
> >> > The Wiki page was built not for Cloud Solr.
> >> >
> >> > We have done such a deployment where less than a tenth of cores were
> >> active
> >> > at any given point in time. though there were tens of million indices
> >> they
> >> > were split among a large no:of hosts.
> >> >
> >> > If you don't insist of Cloud deployment it is possible. I'm not sure
> if
> >> it
> >> > is possible with cloud
> >>
> >> By Cloud you mean specifically SolrCloud? I don't have to have it if I
> >> can do without it. Bottom line is I want a bunch of small cores to be
> >> distributed over a fleet, each core completely fitting on one server.
> >> Would you be willing to provide a little more details on your setup?
> >> In particular, how are you managing the cores?
> >> How do you route requests to proper server?
> >> If you scale the fleet up and down, does reshuffling of the cores
> >> happen automatically or is it an involved manual process?
> >>
> >> Thanks,
> >>
> >> Aleksey
> >>
>



-- 
-
Noble Paul
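
For reference, the transient/loadOnStartup=false cores described above map to per-core
flags in a pre-SolrCloud style solr.xml, roughly like this (a sketch with made-up names):

<cores adminPath="/admin/cores" transientCacheSize="512">
  <core name="user_12345" instanceDir="users/user_12345"
        transient="true" loadOnStartup="false"/>
  <!-- one such entry per user core; only a bounded number stay loaded at a time -->
</cores>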


Re: Dataless nodes in SolrCloud?

2013-06-10 Thread Shalin Shekhar Mangar
No, there's no such notion in SolrCloud. Each node that is part of a
collection/shard is a replica and will handle indexing/querying. Even
though you can send a request to a node containing a different collection,
the request would just be forwarded to the right node and will be executed
there.

That being said, do people find such a feature useful? Is aggregation
expensive enough to warrant a separate box? In a distributed search, the
local index is used. One would just be adding a couple of extra network
requests if you don't have a local index.


On Sun, Jun 9, 2013 at 11:18 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> Is there a notion of a data-node vs. non-data node in SolrCloud?
> Something a la http://www.elasticsearch.org/guide/reference/modules/node/
>
>
> Thanks,
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 4.3.0 Cloud Issue indexing pdf documents

2013-06-10 Thread Mark Wilson
Hi Michael

Thanks very much for that, it did indeed solve the problem.

I had it setup on my internal servers, as I have a separate script for
tomcat startup, but forgot all about it on the Amazon Cloud servers.

For info

I added 
CATALINA_OPTS="-Djava.awt.headless=true"
export CATALINA_OPTS

to $tomcat_home/bin/setenv.sh

Thanks again

Regards Mark


On 07/06/2013 19:29, "Michael Della Bitta"
 wrote:

> Hi Mark,
> 
> This is a total shot in the dark, but does
> passing  -Djava.awt.headless=true when you run the server help at all?
> 
> More on awt headless mode:
> http://www.oracle.com/technetwork/articles/javase/headless-136834.html
> 
> Michael Della Bitta
> 
> Applications Developer
> 
> o: +1 646 532 3062  | c: +1 917 477 7906
> 
> appinions inc.
> 
> “The Science of Influence Marketing”
> 
> 18 East 41st Street
> 
> New York, NY 10017
> 
> t: @appinions  | g+:
> plus.google.com/appinions
> w: appinions.com 
> 
> 
> On Fri, Jun 7, 2013 at 11:31 AM, Mark Wilson  wrote:
> 
>> Hi
>> 
>> I am having an issue with adding pdf documents to a SolrCloud index I have
>> setup.
>> 
>> I can index pdf documents fine using 4.3.0 on my local box, but I have a
>> SolrCloud instance setup on the Amazon Cloud (Using 2 servers) and I get
>> Error.
>> 
>> It seems that it is not loading org.apache.pdfbox.pdmodel.PDPage. However,
>> the jar is in the directory, and referenced in the solrconfig.xml file
>> 
>>   
>>   
>> 
>>   
>>   
>> 
>>   
>>   
>> 
>>   
>>   
>> 
>> When I start Tomcat, I can see that the file has loaded.
>> 
>> 2705 [coreLoadExecutor-4-thread-3] INFO
>> org.apache.solr.core.SolrResourceLoader  ­ Adding
>> 'file:/www/solr/lib/contrib/extraction/lib/pdfbox-1.7.1.jar' to classloader
>> 
>> But when I try to add a document.
>> 
>> java
>> -Durl=
>> http://ec2-blah-blaheu-west-1.compute.amazonaws.com:8080/solr/quosa2-c
>> ollection/update/extract -Dparams=literal.id=pdf1 -Dtype=text/pdf -jar
>> post.jar 2008.Genomics.pdf
>> 
>> 
>> I get this error. I'm running on an Ubuntu machine.
>> 
>> Linux ip-10-229-125-163 3.5.0-21-generic #32-Ubuntu SMP Tue Dec 11 18:51:59
>> UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>> 
>> Error log.
>> 
>> 88168 [http-bio-8080-exec-1] INFO
>> org.apache.solr.update.processor.LogUpdateProcessor  ­
>> [quosa2-collection_shard1_replica1] webapp=/solr path=/update/extract
>> params={literal.id=pdf1} {} 0 1534
>> 88180 [http-bio-8080-exec-1] ERROR
>> org.apache.solr.servlet.SolrDispatchFilter  ­
>> null:java.lang.RuntimeException: java.lang.UnsatisfiedLinkError:
>> /usr/lib/jvm/java-7-oracle/jre/lib/amd64/xawt/libmawt.so: libXrender.so.1:
>> cannot open shared object file: No such file or directory
>> at
>> 
>> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java
>> :670)
>> at
>> 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
>> 380)
>> at
>> 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
>> 155)
>> at
>> 
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Application
>> FilterChain.java:243)
>> at
>> 
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterCh
>> ain.java:210)
>> at
>> 
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.ja
>> va:222)
>> at
>> 
>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.ja
>> va:123)
>> at
>> 
>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171
>> )
>> at
>> 
>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
>> at
>> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
>> at
>> 
>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java
>> :118)
>> at
>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
>> at
>> 
>> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Proce
>> ssor.java:1009)
>> at
>> 
>> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(Abstrac
>> tProtocol.java:589)
>> at
>> 
>> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:
>> 310)
>> at
>> 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:11
>> 45)
>> at
>> 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
>> 15)
>> at java.lang.Thread.run(Thread.java:722)
>> Caused by: java.lang.UnsatisfiedLinkError:
>> /usr/lib/jvm/java-7-oracle/jre/lib/amd64/xawt/libmawt.so: libXrender.so.1:
>> cannot open shared object file: No such file or directory
>> at java.lang.ClassLoader$NativeLibrary.load(Native Method)
>> at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1939)
>> at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1864)
>> at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1825)
>> at java

What to do with CloudSolrServer if Internal Ips are different at my SolrCloud?

2013-06-10 Thread Furkan KAMACI
I want to use CloudSolrServer via Solrj in my application. However, I get
this error:

org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request:[http://10.236.**.***:8983/solr/collection1,
http://10.240.**.**:8983/solr/collection1 ...

I think the problem is this: my Solr nodes are located at Amazon AWS.
Their internal IPs are different from the ones I connect to via my browser. What should
I do?


Admin Page Segment Count is Different than LukeRequest's Segment Count?

2013-06-10 Thread Furkan KAMACI
I have these lines of code:

CloudSolrServer solrServer = SolrCloudServerFactory.getCloudSolrServer();
NamedList namedList = solrServer.request(new LukeRequest());
NamedList index = (NamedList) namedList.get("index");
System.out.println(index.get("segmentCount"));

It prints 5 to system out. However, when I open the admin page and select
collection1 in the core selector list, under the Statistics part I see:

Segment Count: 1

Which one is true?


What is directory and userdata at LukeRequest?

2013-06-10 Thread Furkan KAMACI
I have these lines of code:

CloudSolrServer solrServer = SolrCloudServerFactory.getCloudSolrServer();
NamedList namedList = solrServer.request(new LukeRequest());
NamedList index = (NamedList) namedList.get("index");
System.out.println(index.get("directory"));
System.out.println(index.get("userData"));


I know that if I get maxDocs, lastModified, etc., they are cloud-specific.
However, what do directory and userData mean? Are they node-specific? If
yes, why are they returned to me even though I pass a CloudSolrServer type solrServer?


Re: translating a character code to an ordinal?

2013-06-10 Thread Erick Erickson
You can use copyField. All it does is send the raw data to
the second field, the fact that they're different types is
irrelevant.
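
A minimal sketch of that, with made-up field and type names:

<field name="mycode"     type="string" indexed="true" stored="true"/>
<field name="mycode_ord" type="int"    indexed="true" stored="false"/>
<copyField source="mycode" dest="mycode_ord"/>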

Why not just give it a try?

Erick

On Fri, Jun 7, 2013 at 8:08 PM, geeky2  wrote:
> hello jack,
>
> thank you for the code ;)
>
> what "book" are you referring to?  AFAICT - all of the 4.0 books are "future
> order".
>
> we won't be moving to 4.0 (soon enough).
>
> so i take it - copyfield will not work, eg - i cannot take a code like ABC
> and copy it to an int field and then use the regex to turn it in to an
> ordinal?
>
> thx
> mark
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/translating-a-character-code-to-an-ordinal-tp4068966p4068984.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom Data Clustering

2013-06-10 Thread Raheel Hasan
I wonder how to do that; shouldn't this already be part of Solr?

Also, I read on the Internet that it is possible to use Mahout and Solr
for this purpose, so how can I achieve that?


On Sun, Jun 9, 2013 at 7:57 AM, Otis Gospodnetic  wrote:

> Hello,
>
> This sounds like a custom SearchComponent.
> Which clustering library you want to use or DIY is up to you, but go
> with the SearchComponent approach.  You will still need to process N
> hits, but you won't need to first send them all over the wire.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Fri, Jun 7, 2013 at 11:48 AM, Raheel Hasan 
> wrote:
> > Hi,
> >
> > Can someone please tell me if there is a way to have a custom
> *`clustering
> > of the data`* from `solr` 'query' results? I am facing 2 issues
> currently:
> >
> >  1. The `*Carrot*` clustering only applies clustering to the "paged"
> > results (i.e. in the current pagination's page results).
> >
> >  2. I need to have custom clustering and classify results into certain
> > classes only (i.e. only few very specific words in the search results).
> > Like for example "Red", "Green", "Blue" etc... and not "hello World",
> > "Known World", "green world" etc -(if you know what I mean here) -
> > Where all these words in both Do and DoNot existing in the search
> results.
> >
> > Please tell me how to achieve this. Perhaps Carrot/clustering is not
> needed
> > here and some other classifier is needed. So what to do here?
> >
> > Basically, I cannot receive 1 million results, then process them via
> > PHP-Array to classify them as per need. The classification must be done
> > here in solr only.
> >
> > Thanks
> >
> > --
> > Regards,
> > Raheel Hasan
>



-- 
Regards,
Raheel Hasan


Re: Query-node+shard stickiness?

2013-06-10 Thread Erick Erickson
Nothing I've seen. It would get really tricky
though. Each node in the cluster would have
to have a copy of all queries received by
_any_ node which would result in all
queries being sent to all nodes along with
an indication of what node that query was
actually supposed to be serviced by.

And now suppose there were 100 shards;
then the list of correct nodes would get
quite large.

Seems overly complex for the benefit, but
what do I know?

FWIW
Erick



On Sat, Jun 8, 2013 at 10:38 PM, Otis Gospodnetic
 wrote:
> Hi,
>
> Is there anything in SolrCloud that would support query-node/shard
> affinity/stickiness?
>
> What I mean by that is a mechanism that is smart enough to keep
> sending the same query X to the same node(s)+shard(s)... with the goal
> being better utilization of Solr and OS caches?
>
> Example:
> * Imagine a Collection with 2 shards and 3 replicas: s1r1, s1r2, s1r3,
> s2r1, s2r2, s2r3
> * Query for "Foo Bar" comes in and hits one of the nodes, say s1r1
> * Since shard 2 needs to be queried, too, one of its 3 replicas needs
> to be searched.  Say s2r1 gets searched
> * 5 minutes later the same query for "Foo Bar" comes in, say it hits s1r1 
> again
> * Again shard 2 needs to be searched.  But which of the 3 replicas
> should be searched?
> * Ideally that same s2r1 would be searched
>
> Is there anything in SolrCloud that can accomplish this?
> Or if there a place in SolrCloud where such "query hash ==>
> node/shard" mapping could be implemented?
>
> Thanks,
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/


Re: does solr support query time only stopwords?

2013-06-10 Thread Erick Erickson
My _guess_ is that you're perhaps using
edismax or similar and getting matches from
fields you don't expect on terms that are
not stopwords. Try adding &debug=query and
seeing what the parsed query actually is.

And, of course, I have no idea what Datastax is
doing.

And, you have to at least reload the core
to pick up the new stopwords.

Best
Erick

On Sat, Jun 8, 2013 at 6:33 PM, jchen2000  wrote:
> I wanted to analyze high frequency terms using Solr's Luke request handler
> and keep updating the stopwords file for new queries from time to time.
> Obviously I have to index all terms whether they belong to stopwords list or
> not.
>
> So I configured query analyzer stopwords list but disabled index analyzer
> stopwords list, However, it seems like the query would return all records
> containing stopwords after this.
>
> Anybody has an idea why this would happen?
>
> ps. I am using Datastax Enterprise 3.0.2 and the solr version is 4.0
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Adding pdf/word file using JSON/XML

2013-06-10 Thread Roland Everaert
Hi,

Based on the wiki, below is an example of how I am currently adding a pdf
file with an extra field called name:
curl "
http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text";
--data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"

Is it possible to add a file plus any extra fields using a JSON or XML request?


Thanks,



Roland Everaert.


Re: solr facet query on multiple search term

2013-06-10 Thread Erick Erickson
There's nothing like that built in that I know
of, the closest in concept is "pivot faceting"
but that doesn't work in this case.

Best
Erick

On Mon, Jun 10, 2013 at 2:13 AM, vrparekh  wrote:
> Thanks Erick,
>
> yes example url i provided is bit confusing, sorry for that.
>
> Actual requirement is to get day wise total no. of counts for multiple
> terms.
>
> if we use q=(firstterm OR
> secondterm)&facet.query=firstterm&facet.query=secondTerm. It will provide
> total no. of records count for both search term, but not day wise
> (facet.range will have combine results of both.)
>
> need something like below (just sample),
>
> 
>   
>  
>
>  
>   10551
>   20802
>   
> 
>  
> 
>
>  
>
>  
>   100
>   5
>   
> 
>  
> 
> 
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solr-facet-query-on-multiple-search-term-tp4068856p4069259.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: translating a character code to an ordinal?

2013-06-10 Thread geeky2
i will try it.

i guess i made a "poor" assumption that you would not get predictable
results when copying a code like "mycode" to an int field where the
desired end result in the int field is, say, "1".

i was worried that some sort of ascii conversion or "wrap around" would
happen in the int field.

thx for the insight.

mark




--
View this message in context: 
http://lucene.472066.n3.nabble.com/translating-a-character-code-to-an-ordinal-tp4068966p4069335.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: translating a character code to an ordinal?

2013-06-10 Thread Erick Erickson
Hmmm, that may be a wrinkle. I'm actually not sure
what'll happen if the _raw_ thing you copy to the
int field is not an int (or whatever). You spoke of
character code translation, so it may blow up. In which
case I'd consider a custom update processor that reads
the source field, performs whatever mods you want
to it and adds the dest field.

You _might_ get away with the dest field doing the
translation with a PatternReplaceCharFilterFactory,
which processes the input stream before it gets
analyzed as well

FWIW,
Erick

On Mon, Jun 10, 2013 at 8:43 AM, geeky2  wrote:
> i will try it.
>
> i guess i made a "poor" assumption that you would not get predictable
> results when copying a code like "mycode" to an int field where where the
> desired end result in the int field is say, "1".
>
> i was worried that some sort of ascii conversion or "wrap around" would
> happen in the int field.
>
> thx for the insight.
>
> mark
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/translating-a-character-code-to-an-ordinal-tp4068966p4069335.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Adding pdf/word file using JSON/XML

2013-06-10 Thread Gora Mohanty
On 10 June 2013 17:47, Roland Everaert  wrote:
> Hi,
>
> Based on the wiki, below is an example of how I am currently adding a pdf
> file with an extra field called name:
> curl "
> http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text";
> --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"
>
> Is it possible to add a file + any extra fields using a JSON or XML request.

It is not entirely clear what you are asking. Do you mean
can one do the same as your example above for a PDF
file, but with a XML or JSON file? If so, yes. Please see
the examples in example/exampledocs/ of a Solr source
tree, and http://wiki.apache.org/solr/ExtractingRequestHandler

Regards,
Gora


Re: translating a character code to an ordinal?

2013-06-10 Thread geeky2
i will try it out and let you know - 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/translating-a-character-code-to-an-ordinal-tp4068966p4069339.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Adding pdf/word file using JSON/XML

2013-06-10 Thread Roland Everaert
Sorry if it was not clear.

What I would like to know is how to construct an XML/JSON request that
provides any necessary information (presumably the full path on disk) for
Solr to retrieve and index a PDF/MS Word document.

So, an XML request could look like this:



<add>
  <doc>
    <field name="id">doc10</field>
    <field name="name">BLAH</field>
    <field name="file">/path/to/file.pdf</field>
  </doc>
</add>




Regards,


Roland.


On Mon, Jun 10, 2013 at 3:12 PM, Gora Mohanty  wrote:

> On 10 June 2013 17:47, Roland Everaert  wrote:
> > Hi,
> >
> > Based on the wiki, below is an example of how I am currently adding a pdf
> > file with an extra field called name:
> > curl "
> >
> http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text
> "
> > --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"
> >
> > Is it possible to add a file + any extra fields using a JSON or XML
> request.
>
> It is not entirely clear what you are asking. Do you mean
> can one do the same as your example above for a PDF
> file, but with a XML or JSON file? If so, yes. Please see
> the examples in example/exampledocs/ of a Solr source
> tree, and http://wiki.apache.org/solr/ExtractingRequestHandler
>
> Regards,
> Gora
>


How to Get Cloud Statistics and Why It is Permitted to Use CloudSolrServer and LukeRequest?

2013-06-10 Thread Furkan KAMACI
I have two shards. One of them has 46 documents, the other one has 42. My
default core name is collection1.

When I select a node from first shard I see that:

Last Modified:about a minute ago
Num Docs:42
Max Doc:42
Deleted Docs:0
Version:27
Segment Count:1

When I select a node from second shard I see that:

Last Modified:2 minutes ago
Num Docs:46
Max Doc:46
Deleted Docs:0
Version:11
Segment Count:1

I want to see the total number of documents in my cloud. I have written these
lines of code:

public int getMaxDocs(CloudSolrServer solrServer) {
    NamedList namedList = null;
    try {
        namedList = solrServer.request(new LukeRequest());
    } catch (SolrServerException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return (int) ((NamedList) namedList.get("index")).get("numDocs");
}

However, I get 46 as the result. I think that CloudSolrServer load balances the
request and hits the second shard. So how can I get distributed cluster
statistics? On the other hand, isn't it logically wrong to let people
use CloudSolrServer and LukeRequest together?
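
A hedged alternative sketch: an ordinary query with rows=0 is distributed across the
whole collection, so its numFound is a cloud-wide count, unlike LukeRequest, which only
reports on the single core it happens to hit:

SolrQuery q = new SolrQuery("*:*");
q.setRows(0); // only the count is needed, no documents
long totalDocs = solrServer.query(q).getResults().getNumFound();
System.out.println(totalDocs);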


How to check does index needs optimize or not?

2013-06-10 Thread Furkan KAMACI
On the admin page an optimize button appears if needed. Is it related to the
"current" label? I mean, does current being true mean there is no need to optimize, and
current being false mean it needs to be optimized? If not, how can I check whether it
needs optimizing or not from Solrj with CloudSolrServer?


Re: Adding pdf/word file using JSON/XML

2013-06-10 Thread Gora Mohanty
On 10 June 2013 18:53, Roland Everaert  wrote:
> Sorry if it was not clear.
>
> What I would like is to know how to construct an XML/JSON request that
> provide any necessary information (supposedly the full path on disk) to
> solr to retrieve and index a pdf/ms word document.
>
> So, an XML request could look like this:
>
> 
> 
> doc10
> BLAH
> /path/to/file.pdf
> 
> 
[...]

You cannot directly do this with the ExtractingRequestHandler.
One possibility is to use the DataImportHandler, with
XPathEntityProcessor or FileListEntityProcessor to get the filename,
and then use TikaEntityProcessor to actually process the file.
Please see http://wiki.apache.org/solr/DataImportHandler and
the various sections within it.

Regards,
Gora
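
A minimal sketch of the DIH configuration Gora outlines (paths, the file-name pattern
and field names are placeholders):

<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- FileListEntityProcessor walks the directory and emits one row per file -->
    <entity name="files" processor="FileListEntityProcessor" rootEntity="false"
            dataSource="null" baseDir="/path/to/docs"
            fileName=".*\.(pdf|docx?)$" recursive="true">
      <!-- TikaEntityProcessor extracts the text of each file -->
      <entity name="tika" processor="TikaEntityProcessor" dataSource="bin"
              url="${files.fileAbsolutePath}" format="text">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>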


Re: AW: Solr indexing slows down

2013-06-10 Thread Shawn Heisey
On 6/10/2013 2:32 AM, Sebastian Steinfeld wrote:
> Hi Shawn,
> 
> thank you for your answer.
> 
> I am using Oracle. This is the configuration I am using:
> -
>  name="local" 
> driver="oracle.jdbc.driver.OracleDriver" 
> url="jdbc:oracle:thin:@localhost:1521:XE" 
> user="" 
> password=""
> batchSize="2"
> />
> 
> 
> There are 12GB free memory on the server I hope this is enough.
> I will test the import with 4GB vm memory.

I don't know how to ensure streaming results with Oracle.  It is likely
that someone here does, though.  The default for most JDBC drivers is to
buffer the entire SQL result.

> Do you know if the "autocommit" inside solrconfig.xml configuration works 
> when using the DIH with the url:
> /dataimport?command=full-import&clean=true&commit=true
> 
> I read, that "commit=true" will only make one commit in the end of the import 
> and so "autocommit" won't work.

The autoCommit settings always work, but exactly what that means will
depend on what you want from autoCommit.  The autoCommit settings that
are in the example config will result in a hard commit every fifteen
seconds, but that commit will NOT open a new searcher, so the added
documents will not be visible in search results.  This is IMHO the best
way to go, although I would probably increase the interval to a minute
or five minutes.  You *DO* want these hard commits happening if you're
on Solr 4.x, to control the size of the updateLog.

If you want the index changes to become visible on a regular basis, then
uncomment and use the autoSoftCommit settings.  This defaults to once a
second, which I would probably increase, although that's up to you.  A
soft commit opens a new searcher, so index changes become visible.
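
Put together, that section of solrconfig.xml looks roughly like this (the intervals
below are only illustrative):

<autoCommit>
  <maxTime>60000</maxTime>           <!-- hard commit every minute, to keep the updateLog bounded -->
  <openSearcher>false</openSearcher> <!-- do not make the changes visible yet -->
</autoCommit>
<autoSoftCommit>
  <maxTime>30000</maxTime>           <!-- soft commit: opens a new searcher, changes become visible -->
</autoSoftCommit>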

Thanks,
Shawn



Re: Adding pdf/word file using JSON/XML

2013-06-10 Thread Jack Krupansky

Sorry, but you are STILL not being clear!

Are you asking if you can pass Solr parameters as XML fields? No.

Are you asking if the file name and path can be indexed as metadata? To some 
degree:


curl "http://localhost:8983/solr/update/extract?literal.id=doc-1\
&commit=true&uprefix=attr_" -F "HelloWorld.docx=@HelloWorld.docx"

Then the stream has a name that is indexed as metadata:


 stream_source_info
 HelloWorld.docx
 stream_content_type
 application/octet-stream
 stream_size
 10096
 stream_name
 HelloWorld.docx
 Content-Type
 
application/vnd.openxmlformats-officedocument.wordprocessingml.document


and


 HelloWorld.docx



 HelloWorld.docx


Or, what is it that you are really trying to do?

Simply tell us in plain language what problem you are trying to solve.

-- Jack Krupansky

-Original Message- 
From: Roland Everaert

Sent: Monday, June 10, 2013 9:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Adding pdf/word file using JSON/XML

Sorry if it was not clear.

What I would like is to know how to construct an XML/JSON request that
provide any necessary information (supposedly the full path on disk) to
solr to retrieve and index a pdf/ms word document.

So, an XML request could look like this:



doc10
BLAH
/path/to/file.pdf




Regards,


Roland.


On Mon, Jun 10, 2013 at 3:12 PM, Gora Mohanty  wrote:


On 10 June 2013 17:47, Roland Everaert  wrote:
> Hi,
>
> Based on the wiki, below is an example of how I am currently adding a 
> pdf

> file with an extra field called name:
> curl "
>
http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text
"
> --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"
>
> Is it possible to add a file + any extra fields using a JSON or XML
request.

It is not entirely clear what you are asking. Do you mean
can one do the same as your example above for a PDF
file, but with a XML or JSON file? If so, yes. Please see
the examples in example/exampledocs/ of a Solr source
tree, and http://wiki.apache.org/solr/ExtractingRequestHandler

Regards,
Gora





Re: Dataless nodes in SolrCloud?

2013-06-10 Thread Shawn Heisey
On 6/10/2013 3:32 AM, Shalin Shekhar Mangar wrote:
> No, there's no such notion in SolrCloud. Each node that is part of a
> collection/shard is a replica and will handle indexing/querying. Even
> though you can send a request to a node containing a different collection,
> the request would just be forwarded to the right node and will be executed
> there.
> 
> That being said, do people find such a feature useful? Is aggregation
> expensive enough to warrant a separate box? In a distributed search, the
> local index is used. One'd would just be adding a couple of extra network
> requests if you don't have a local index.

I use this concept in non-SolrCloud distributed search, only it's not a
separate node, it's a separate core, which contains the shards and
shards.qt parameters in the request handler definitions.
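
A sketch of such a handler definition (host and core names are placeholders):

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- this core holds no data of its own; every request fans out to the listed shards -->
    <str name="shards">idx1:8983/solr/shard_a,idx2:8983/solr/shard_b</str>
    <!-- handler the shard cores should use for the sub-requests -->
    <str name="shards.qt">/select</str>
  </lst>
</requestHandler>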

Thanks,
Shawn



Facet count for "others" after facet.limit

2013-06-10 Thread Raheel Hasan
Hi,

Is there any way to use facets such that the results show "Others" (or any
default value) and group all the rest under it?

For example:

on
category_code
count
6
1
false

This will show the top 6 product counts divided into categories.
However, there are, say, 20 different categories, and I want the rest of the
counts to go into "Others", so we have a total of only 7 facet counts: 6
categories and all the rest in "Others".

Please let me know how to do this. thanks..

-- 
Regards,
Raheel Hasan


Re: Solr 4.3.0 Cloud Issue indexing pdf documents

2013-06-10 Thread Michael Della Bitta
Glad that helped. I'm going to go buy a lottery ticket now! :)

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Mon, Jun 10, 2013 at 5:56 AM, Mark Wilson  wrote:

> Hi Michael
>
> Thanks very much for that, it did indeed solve the problem.
>
> I had it setup on my internal servers, as I have a separate script for
> tomcat startup, but forgot all about it on the Amazon Cloud servers.
>
> For info
>
> I added
> CATALINA_OPTS="-Djava.awt.headless=true"
> export CATALINA_OPTS
>
> to $tomcat_home/bin/setenv.sh
>
> Thanks again
>
> Regards Mark
>
>
> On 07/06/2013 19:29, "Michael Della Bitta"
>  wrote:
>
> > Hi Mark,
> >
> > This is a total shot in the dark, but does
> > passing  -Djava.awt.headless=true when you run the server help at all?
> >
> > More on awt headless mode:
> > http://www.oracle.com/technetwork/articles/javase/headless-136834.html
> >
> > Michael Della Bitta
> >
> > Applications Developer
> >
> > o: +1 646 532 3062  | c: +1 917 477 7906
> >
> > appinions inc.
> >
> > “The Science of Influence Marketing”
> >
> > 18 East 41st Street
> >
> > New York, NY 10017
> >
> > t: @appinions  | g+:
> > plus.google.com/appinions
> > w: appinions.com 
> >
> >
> > On Fri, Jun 7, 2013 at 11:31 AM, Mark Wilson  wrote:
> >
> >> Hi
> >>
> >> I am having an issue with adding pdf documents to a SolrCloud index I
> have
> >> setup.
> >>
> >> I can index pdf documents fine using 4.3.0 on my local box, but I have a
> >> SolrCloud instance setup on the Amazon Cloud (Using 2 servers) and I get
> >> Error.
> >>
> >> It seems that it is not loading org.apache.pdfbox.pdmodel.PDPage.
> However,
> >> the jar is in the directory, and referenced in the solrconfig.xml file
> >>
> >>   
> >>   
> >>
> >>   
> >>   
> >>
> >>   
> >>   
> >>
> >>   
> >>   
> >>
> >> When I start Tomcat, I can see that the file has loaded.
> >>
> >> 2705 [coreLoadExecutor-4-thread-3] INFO
> >> org.apache.solr.core.SolrResourceLoader  ­ Adding
> >> 'file:/www/solr/lib/contrib/extraction/lib/pdfbox-1.7.1.jar' to
> classloader
> >>
> >> But when I try to add a document.
> >>
> >> java
> >> -Durl=
> >> http://ec2-blah-blaheu-west-1.compute.amazonaws.com:8080/solr/quosa2-c
> >> ollection/update/extract -Dparams=literal.id=pdf1 -Dtype=text/pdf -jar
> >> post.jar 2008.Genomics.pdf
> >>
> >>
> >> I get this error. I¹m running on an Ubuntu machine.
> >>
> >> Linux ip-10-229-125-163 3.5.0-21-generic #32-Ubuntu SMP Tue Dec 11
> 18:51:59
> >> UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
> >>
> >> Error log.
> >>
> >> 88168 [http-bio-8080-exec-1] INFO
> >> org.apache.solr.update.processor.LogUpdateProcessor  ­
> >> [quosa2-collection_shard1_replica1] webapp=/solr path=/update/extract
> >> params={literal.id=pdf1} {} 0 1534
> >> 88180 [http-bio-8080-exec-1] ERROR
> >> org.apache.solr.servlet.SolrDispatchFilter  ­
> >> null:java.lang.RuntimeException: java.lang.UnsatisfiedLinkError:
> >> /usr/lib/jvm/java-7-oracle/jre/lib/amd64/xawt/libmawt.so:
> libXrender.so.1:
> >> cannot open shared object file: No such file or directory
> >> at
> >>
> >>
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java
> >> :670)
> >> at
> >>
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
> >> 380)
> >> at
> >>
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
> >> 155)
> >> at
> >>
> >>
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Application
> >> FilterChain.java:243)
> >> at
> >>
> >>
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterCh
> >> ain.java:210)
> >> at
> >>
> >>
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.ja
> >> va:222)
> >> at
> >>
> >>
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.ja
> >> va:123)
> >> at
> >>
> >>
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171
> >> )
> >> at
> >>
> >>
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
> >> at
> >>
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
> >> at
> >>
> >>
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java
> >> :118)
> >> at
> >>
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
> >> at
> >>
> >>
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Proce
> >> ssor.java:1009)
> >> at
> >>
> >>
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(Abstrac
> >> tProtocol.java:589)
> >> at
> >>
> >>
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoin

Re: How to check does index needs optimize or not?

2013-06-10 Thread Michael Della Bitta
I'm pretty sure you can just check this URL:
http://hasthelargehadroncolliderdestroyedtheworldyet.com/

;)

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Mon, Jun 10, 2013 at 9:47 AM, Furkan KAMACI wrote:

> At admin page there occurs an optimize button if needed. Does it related to
> current label? I mean does current is true means no need to optimize and
> current is false means needs to optimeze? If not how can I check whether it
> needs optimize or not from Solrj with CloudSolrServer?
>


Re: What to do with CloudSolrServer if Internal Ips are different at my SolrCloud?

2013-06-10 Thread Michael Della Bitta
You need to specify the public interface IP or hostname in the host
parameter in solr.xml:

http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params
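
For example, in a legacy-style solr.xml on Solr 4.3 the relevant attributes sit on the
<cores> element -- a sketch with a placeholder hostname:

<solr persistent="true">
  <cores adminPath="/admin/cores"
         host="ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com"
         hostPort="8983" hostContext="solr">
    <core name="collection1" instanceDir="collection1"/>
  </cores>
</solr>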

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Mon, Jun 10, 2013 at 6:47 AM, Furkan KAMACI wrote:

> I want to use CloudSolrServer via Solrj at my application. However I get
> that error:
>
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request:[http://10.236.
> **.***:8983/solr/collection1,
> http://10.240.**.**:8983/solr/collection1 ...
>
> I think that problem is that: my Solr Nodes are located at Amazon AWS.
> Their internal ip are different that I connect via my browser. What should
> I do?
>


Re: Solr indexing slows down

2013-06-10 Thread Walter Underwood
8 million documents in two hours is over 1000/sec. That is a pretty fast 
indexing rate. It may be hard to go faster than that.

wunder

On Jun 10, 2013, at 7:12 AM, Shawn Heisey wrote:

> On 6/10/2013 2:32 AM, Sebastian Steinfeld wrote:
>> Hi Shawn,
>> 
>> thank you for your answer.
>> 
>> I am using Oracle. This is the configuration I am using:
>> -
>> > name="local" 
>> driver="oracle.jdbc.driver.OracleDriver" 
>> url="jdbc:oracle:thin:@localhost:1521:XE" 
>> user="" 
>> password=""
>> batchSize="2"
>> />
>> 
>> 
>> There are 12GB free memory on the server I hope this is enough.
>> I will test the import with 4GB vm memory.
> 
> I don't know how to ensure streaming results with Oracle.  It is likely
> that someone here does, though.  The default for most JDBC drivers is to
> buffer the entire SQL result.
> 
>> Do you know if the "autocommit" inside solrconfig.xml configuration works 
>> when using the DIH with the url:
>> /dataimport?command=full-import&clean=true&commit=true
>> 
>> I read, that "commit=true" will only make one commit in the end of the 
>> import and so "autocommit" won't work.
> 
> The autoCommit settings always work, but exactly what that means will
> depend on what you want from autoCommit.  The autoCommit settings that
> are in the example config will result in a hard commit every fifteen
> seconds, but that commit will NOT open a new searcher, so the added
> documents will not be visible in search results.  This is IMHO the best
> way to go, although I would probably increase the interval to a minute
> or five minutes.  You *DO* want these hard commits happening if you're
> on Solr 4.x, to control the size of the updateLog.
> 
> If you want the index changes to become visible on a regular basis, then
> uncomment and use the autoSoftCommit settings.  This defaults to once a
> second, which I would probably increase, although that's up to you.  A
> soft commit open a new searcher, so index changes become visible.
> 
> Thanks,
> Shawn
> 

--
Walter Underwood
wun...@wunderwood.org





Re: Facet count for "others" after facet.limit

2013-06-10 Thread Jack Krupansky
Not directly for a field facet. Range and date facets do have the concept of 
"other" to give you more details, but field facet doesn't have that.


But, you can calculate that number easily - it is numFound minus the sum of 
the facet counts for the field, minus "missing".
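
In SolrJ that calculation is only a few lines -- a sketch, assuming a single-valued
facet field and ignoring facet.missing:

QueryResponse rsp = solrServer.query(query);
FacetField ff = rsp.getFacetField("category_code");
long shown = 0;
for (FacetField.Count c : ff.getValues()) {
    shown += c.getCount(); // counts actually returned under facet.limit
}
long others = rsp.getResults().getNumFound() - shown; // everything else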


Still, I agree that it would be nice to enable it directly, like 
"facet.others=true".


-- Jack Krupansky

-Original Message- 
From: Raheel Hasan

Sent: Monday, June 10, 2013 10:56 AM
To: solr-user@lucene.apache.org
Subject: Facet count for "others" after facet.limit

Hi,

Is there anyway to use facet such that the results shows "Others" (or any
default value) and show all the others?

For example:

on
category_code
count
6
1
false

This will show top 6 different products counts divided into the categories.
However, there are say 20 different categories and I want the rest of the
counts to into "Others". so we have a total of 7 facet counts only: 6
categories and all the rest in "Others".

Please let me know how to do this. thanks..

--
Regards,
Raheel Hasan 



Re: Facet count for "others" after facet.limit

2013-06-10 Thread Raheel Hasan
Yea, I just thought about the calculation from [total results - all facet
results]... But I wish there was a simple "Others" option as well ...

Thanks anyway for your help.


On Mon, Jun 10, 2013 at 8:20 PM, Jack Krupansky wrote:

> Not directly for a field facet. Range and date facets do have the concept
> of "other" to give you more details, but field facet doesn't have that.
>
> But, you can calculate that number easily - it is numFound minus the sum
> of the facet counts for the field, minus "missing".
>
> Still, I agree that it would be nice to enable it directly, like
> "facet.others=true".
>
> -- Jack Krupansky
>
> -Original Message- From: Raheel Hasan
> Sent: Monday, June 10, 2013 10:56 AM
> To: solr-user@lucene.apache.org
> Subject: Facet count for "others" after facet.limit
>
>
> Hi,
>
> Is there anyway to use facet such that the results shows "Others" (or any
> default value) and show all the others?
>
> For example:
>
> on
> category_**code
> count
> 6
> 1
> false
>
> This will show top 6 different products counts divided into the categories.
> However, there are say 20 different categories and I want the rest of the
> counts to into "Others". so we have a total of 7 facet counts only: 6
> categories and all the rest in "Others".
>
> Please let me know how to do this. thanks..
>
> --
> Regards,
> Raheel Hasan
>



-- 
Regards,
Raheel Hasan


Re: Solr indexing slows down

2013-06-10 Thread Michael Della Bitta
Sorry, with the "paging through the results outside of Solr," I meant
writing a test to see how long it takes to get through all the results in a
test harness that doesn't use Solr.

I agree with Shawn that you might need to do some JVM tuning to get things
going quicker. You might want to try to monitor your Solr instance with
something like VisualVM to see if it's garbage collecting too much. Also, I
agree with Walter that this might be a pretty decent rate for a single
thread to be importing documents into a single instance, and you might need
to investigate other options to parallelize it if you want it to go faster.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Mon, Jun 10, 2013 at 3:05 AM, Sebastian Steinfeld <
sebastian.steinf...@mgm-tp.com> wrote:

> Hi Michael,
>
> the database I am using is Oracle. That's right, I am selecting from a
> view.
> What do you mean by selecting from outside of solr? I thought the
> batchsize will do the pagination?
>
> The load of the database server is not increasing during the import. It
> seems that the database is doing nothing.
>
> Thanks,
> Sebastian
>
>
>
> -Original Message-
> From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
> Sent: Thursday, 6 June 2013 18:29
> To: solr-user@lucene.apache.org
> Subject: Re: Solr indexing slows down
>
> Hi Sebastian,
>
> What database are you using? How much RAM is available on your machine? It
> looks like you're selecting from a view... Have you tried paging through
> the view outside of Solr? Does that slow down as well? Do you notice any
> increased load on the Solr box or the database server?
>
>
>
> Michael Della Bitta
>
> Applications Developer
>
> o: +1 646 532 3062  | c: +1 917 477 7906
>
> appinions inc.
>
> "The Science of Influence Marketing"
>
> 18 East 41st Street
>
> New York, NY 10017
>
> t: @appinions  | g+:
> plus.google.com/appinions
> w: appinions.com 
>
>
> On Thu, Jun 6, 2013 at 6:13 AM, Sebastian Steinfeld <
> sebastian.steinf...@mgm-tp.com> wrote:
>
> > Hi,
> >
> > I am new to solr and we want to use Solr to speed up our product search.
> > And it is working really nice, but I think I have a problem with the
> > indexing.
> > It slows down after a few minutes.
> >
> > I am using the DataImportHandler to import the products from the
> database.
> > And I start the import by executing the following HTTP request:
> > /dataimport?command=full-import&clean=true&commit=true
> >
> > I guess this are the importend parts of my configuration:
> >
> > schema.xml:
> > --
> > 
> > >  stored="true" required="true"  />
> > >  stored="true" required="true"  />
> > >  stored="false"  />
> > >  stored="false"  />
> > > multiValued="true"/>
> >
> >  
> >  > positionIncrementGap="100">
> >   
> > 
> > 
> >   
> > 
> > --
> >
> > solrconfig.xml:
> > --
> >> class="org.apache.solr.handler.dataimport.DataImportHandler">
> > 
> > dataimport-handler.xml
> > 
> >   
> > --
> >
> > dataimport-handler.xml:
> > --
> > 
> >  > url="*"
> > user="*" "
> > password="*"
> > />
> >
> >  > query="SELECT   PRODUCTS_PK, PRODUCTS_CODE,
> > PRODUCTS_EAN, PRODUCTSLP_NAME FROM V_SOLR_IMPORT4PRODUCT_SEARCH">
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > --
> >
> > The amout of documents I want to index is 8 million, the first 1,6
> > million are indexed in 2min, but to complete the Import it takes nearly
> 2 hours.
> > The size of the index on the hard drive is 610MB.
> > I started the solr server with 2GB memory.
> >
> >
> > I read that the duration of indexing might be connected to the batch
> > size, so I increased the batchSize in the dataSource to 10.000, but
> > this didn't make any differences.
> > I also tried to disable the autocommit, which is configured in the
> > solrconfig.xml. I disabled it by uncommenting it, but this also didn't
> > made any differences.
> >
> > It would be realy nice if someone of you could help me with this problem.
> >
> > Thank you very much,
> > Sebastian
> >
> >
>


Re: What to do with CloudSolrServer if Internal Ips are different at my SolrCloud?

2013-06-10 Thread Furkan KAMACI
Can I do it with Solrj?

2013/6/10 Michael Della Bitta 

> You need to specify the public interface IP or hostname in the host
> parameter in solr.xml:
>
> http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params
>
> Michael Della Bitta
>
> Applications Developer
>
> o: +1 646 532 3062  | c: +1 917 477 7906
>
> appinions inc.
>
> “The Science of Influence Marketing”
>
> 18 East 41st Street
>
> New York, NY 10017
>
> t: @appinions  | g+:
> plus.google.com/appinions
> w: appinions.com 
>
>
> On Mon, Jun 10, 2013 at 6:47 AM, Furkan KAMACI  >wrote:
>
> > I want to use CloudSolrServer via Solrj at my application. However I get
> > that error:
> >
> > org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> > available to handle this request:[http://10.236.
> > **.***:8983/solr/collection1,
> > http://10.240.**.**:8983/solr/collection1 ...
> >
> > I think that problem is that: my Solr Nodes are located at Amazon AWS.
> > Their internal ip are different that I connect via my browser. What
> should
> > I do?
> >
>


Re: How to check does index needs optimize or not?

2013-06-10 Thread Furkan KAMACI
How does the Solr admin page understand it?

2013/6/10 Michael Della Bitta 

> I'm pretty sure you can just check this URL:
> http://hasthelargehadroncolliderdestroyedtheworldyet.com/
>
> ;)
>
> Michael Della Bitta
>
> Applications Developer
>
> o: +1 646 532 3062  | c: +1 917 477 7906
>
> appinions inc.
>
> “The Science of Influence Marketing”
>
> 18 East 41st Street
>
> New York, NY 10017
>
> t: @appinions  | g+:
> plus.google.com/appinions
> w: appinions.com 
>
>
> On Mon, Jun 10, 2013 at 9:47 AM, Furkan KAMACI  >wrote:
>
> > At admin page there occurs an optimize button if needed. Does it related
> to
> > current label? I mean does current is true means no need to optimize and
> > current is false means needs to optimeze? If not how can I check whether
> it
> > needs optimize or not from Solrj with CloudSolrServer?
> >
>


How to ignore folder collection1 when running single instance of SOLR?

2013-06-10 Thread bbarani
I am in the process of migrating from Solr 3.x to 4.3.0.

I am trying to figure out a way to run a single instance of Solr without
modifying the directory structure. Is it mandatory to have a folder named
collection1 in order for the new Solr server to work? I see that by default
it always looks for the config files in the collection1 folder. Is there a way to
force it to ignore the collection1 directory?

My directory structure is as below

SOLR 
|___ conf
|___ lib
|___ index

I tried setting the home property (-Dsolr.solr.home) to point to the solr directory, but it
still searches for a folder named collection1 under the solr directory.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-ignore-folder-collection1-when-running-single-instance-of-SOLR-tp4069416.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to check does index needs optimize or not?

2013-06-10 Thread Michael Della Bitta
Hi Furkan,

That was my flip way of saying that "Optimize" is a highly optional
procedure that should not be undertaken under ordinary circumstances. Solr
has no way of detecting whether this is necessary because in the vast
majority of cases, it's not. If you do a search of this list, you'll find a
lot of discussion of why it's no longer necessary. There's even a JIRA to
remove it from Solr: https://issues.apache.org/jira/browse/SOLR-3141


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Mon, Jun 10, 2013 at 11:41 AM, Furkan KAMACI wrote:

> How Solr admin page understands it?
>
> 2013/6/10 Michael Della Bitta 
>
> > I'm pretty sure you can just check this URL:
> > http://hasthelargehadroncolliderdestroyedtheworldyet.com/
> >
> > ;)
> >
> > Michael Della Bitta
> >
> > Applications Developer
> >
> > o: +1 646 532 3062  | c: +1 917 477 7906
> >
> > appinions inc.
> >
> > “The Science of Influence Marketing”
> >
> > 18 East 41st Street
> >
> > New York, NY 10017
> >
> > t: @appinions  | g+:
> > plus.google.com/appinions
> > w: appinions.com 
> >
> >
> > On Mon, Jun 10, 2013 at 9:47 AM, Furkan KAMACI  > >wrote:
> >
> > > At admin page there occurs an optimize button if needed. Does it
> related
> > to
> > > current label? I mean does current is true means no need to optimize
> and
> > > current is false means needs to optimeze? If not how can I check
> whether
> > it
> > > needs optimize or not from Solrj with CloudSolrServer?
> > >
> >
>


Re: What to do with CloudSolrServer if Internal Ips are different at my SolrCloud?

2013-06-10 Thread Michael Della Bitta
No, it's a Solr instance config.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Mon, Jun 10, 2013 at 11:40 AM, Furkan KAMACI wrote:

> Can I do it with Solrj?
>
> 2013/6/10 Michael Della Bitta 
>
> > You need to specify the public interface IP or hostname in the host
> > parameter in solr.xml:
> >
> > http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params
> >
> > Michael Della Bitta
> >
> > Applications Developer
> >
> > o: +1 646 532 3062  | c: +1 917 477 7906
> >
> > appinions inc.
> >
> > “The Science of Influence Marketing”
> >
> > 18 East 41st Street
> >
> > New York, NY 10017
> >
> > t: @appinions  | g+:
> > plus.google.com/appinions
> > w: appinions.com 
> >
> >
> > On Mon, Jun 10, 2013 at 6:47 AM, Furkan KAMACI  > >wrote:
> >
> > > I want to use CloudSolrServer via Solrj at my application. However I
> get
> > > that error:
> > >
> > > org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> > > available to handle this request:[http://10.236.
> > > **.***:8983/solr/collection1,
> > > http://10.240.**.**:8983/solr/collection1 ...
> > >
> > > I think that problem is that: my Solr Nodes are located at Amazon AWS.
> > > Their internal ip are different that I connect via my browser. What
> > should
> > > I do?
> > >
> >
>


Re: How to ignore folder collection1 when running single instance of SOLR?

2013-06-10 Thread bbarani
Not sure if this is the right way:

I just moved solr.xml outside of the solr directory and changed solr.xml
to make it point to the solr directory, and it seems to work fine as before. Can
someone confirm whether this is the right way to configure things when running a single
instance of Solr?

[the solr.xml snippet showing this configuration was stripped by the mail archive]




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-ignore-folder-collection1-when-running-single-instance-of-SOLR-tp4069416p4069423.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to check does index needs optimize or not?

2013-06-10 Thread Shawn Heisey

On 6/10/2013 10:18 AM, Michael Della Bitta wrote:

Hi Furkan,

That was my flip way of saying that "Optimize" is a highly optional
procedure that should not be undertaken under ordinary circumstances. Solr
has no way of detecting whether this is necessary because in the vast
majority of cases, it's not. If you do a search of this list, you'll find a
lot of discussion of why it's no longer necessary. There's even a JIRA to
remove it from Solr: https://issues.apache.org/jira/browse/SOLR-3141


I really liked the LHC page. :)  Michael is correct here.  If you look 
through that JIRA, you'll see that there are still very valid reasons 
for doing an optimize, but the age-old reason of "improving performance" 
is not one of them.


I've put some additional thoughts on the JIRA.

Thanks,
Shawn



Re: Solrj Stats encoding problem

2013-06-10 Thread ethereal
Yeah, that's right: I had just put all the params into the "q" param. Stupid mistake.
Thanks, Chris.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrj-Stats-encoding-problem-tp4068429p4069431.html
Sent from the Solr - User mailing list archive at Nabble.com.


EmbeddedSolrServer reference

2013-06-10 Thread Alex Sarco
Hi,

I'm running Solr 4.3 embedded in Tomcat, so there's a Solr server starting when 
Tomcat starts.
In the same webapp, I also have a process to recreate the Lucene index when 
Solr starts. To do this, I have a singleton instance of EmbeddedSolrServer 
provided by Spring. This same instance is also used to update the index every 
once in a while (when a message advising of a DB update is received).
This works quite well, but my problem is that the adds/updates I make to the index
using EmbeddedSolrServer are not visible to the Solr instance started by
Tomcat unless I restart Tomcat (which is obviously not an option).
So, my questions are:


1)   Is there any way to get a reference to the Solr server started by Tomcat,
without having to start another one?

2)   Any other suggestions on how to implement this kind of architecture?




Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

2013-06-10 Thread Walter Underwood
Why do you think that is useful? That will give terrible search results. 

Here are the first twenty words in /usr/share/dict/words that contain the 
substring "cat".

abacate
abdicate
abdication
abdicative
abdicator
aberuncator
abjudicate
abjudication
acacatechin
acacatechol
acatalectic
acatalepsia
acatalepsy
acataleptic
acatallactic
acatamathesia
acataphasia
acataposis
acatastasia
acatastatic

wunder

On Jun 9, 2013, at 10:56 PM, Prathik Puthran wrote:

> Hi,
> 
> @Walter
> I'm trying to implement the below feature for the user.
> User types in any "substring" of the strings in the dictionary (i.e. the
> indexed string) .
> SOLR Suggester should return all the strings in the dictionary which has
> the input string as substring.
> 
> Thanks,
> Prathik
> 
> 
> 
> On Fri, Jun 7, 2013 at 4:01 AM, Otis Gospodnetic > wrote:
> 
>> Hi
>> 
>> Ngrams *will* do this for you.
>> 
>> Otis
>> Solr & ElasticSearch Support
>> http://sematext.com/
>> On Jun 6, 2013 7:53 AM, "Prathik Puthran" 
>> wrote:
>> 
>>> Basically I want the Suggester to return for "Jason Bourne" as suggestion
>>> for ".*Bour.*" regex.
>>> 
>>> Thanks,
>>> Prathik
>>> 
>>> 
>>> On Thu, Jun 6, 2013 at 12:52 PM, Prathik Puthran <
>>> prathik.puthra...@gmail.com> wrote:
>>> 
 This works even now i.e. when I search for "Jas" it suggests "Jason
 Bourne". What I want is when I search for "Bour" or "ason" (any
>>> substring)
 it should suggest me "Jason Bourne" .
 
 
 On Thu, Jun 6, 2013 at 12:34 PM, Upayavira  wrote:
 
> Can you se the ShingleFilterFactory? It is ngrams for terms rather
>> than
> characters. If you limited it to two term ngrams, when the user
>> presses
> space after their first word, you could do a suggested query against
> your two term ngram field, which would suggest Jason Bourne, Jason
> Statham, etc then you press space after "Jason".
> 
> Upayavira
> 
> On Thu, Jun 6, 2013, at 07:25 AM, Prathik Puthran wrote:
>> My use case is I want to search for any substring of the indexed
>>> string
>> and
>> the Suggester should suggest the indexed string. What can I do to
>> make
>> this
>> work?
>> 
>> Thanks,
>> Prathik
>> 
>> 
>> On Thu, Jun 6, 2013 at 2:05 AM, Mikhail Khludnev
>> >> wrote:
>> 
>>> Please excuse my misunderstanding, but I always wonder why this
>>> index
> time
>>> processing is suggested usually. from my POV is the case for
> query-time
>>> processing i.e. PrefixQuery aka wildcard query Jason* .
>>> Ultra-fast term retrieval also provided by TermsComponent.
>>> 
>>> 
>>> On Wed, Jun 5, 2013 at 8:09 PM, Jack Krupansky <
> j...@basetechnology.com
 wrote:
>>> 
 ngrams?
 
 See:
 http://lucene.apache.org/core/**4_3_0/analyzers-common/org/**
 apache/lucene/analysis/ngram/**NGramFilterFactory.html<
>>> 
> 
>>> 
>> http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html
 
 
 -- Jack Krupansky
 
 -Original Message- From: Prathik Puthran
 Sent: Wednesday, June 05, 2013 11:59 AM
 To: solr-user@lucene.apache.org
 Subject: Configuring lucene to suggest the indexed string for
>> all
> the
 searches of the substring of the indexed string
 
 
 Hi,
 
 Is it possible to configure solr to suggest the indexed string
>> for
> all
>>> the
 searches of the substring of the string?
 
 Thanks,
 Prathik
 
>>> 
>>> 
>>> 
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>> Principal Engineer,
>>> Grid Dynamics
>>> 
>>> 
>>> 
>>> 
> 
 
 
>>> 
>> 

--
Walter Underwood
wun...@wunderwood.org





Curious why Solr Jetty URL has a # sign?

2013-06-10 Thread O. Olson
Hi,

This may be a dumb question, but I am curious why the sample Solr/Jetty setup
results in a URL with a # sign, e.g. http://localhost:8983/solr/#/~logging ?
Is there any way to get rid of it, so I could have something like:
http://localhost:8983/solr/~logging ? 

Thank you,
O. O. 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Curious-why-Solr-Jetty-URL-has-a-sign-tp4069434.html
Sent from the Solr - User mailing list archive at Nabble.com.


Chinese to Pinyin transliteration : homophone matching

2013-06-10 Thread Catala, Francois
Hi,

I've been looking for ways to do homophone matching in Solr for CJK languages. 
I am digging into Chinese for a start.
My inputs are words made of simplified characters, and I need to match words 
that use different characters, but are pronounced the same way.

My conclusion is that I need to index all the possible pinyin representations 
for a given word. Then at query time, generate all pinyin representations for 
the searched word, and match all documents containing any one of them.

My question is: which components can do that in Solr? I've been looking at
ICUTransformFilterFactory, but with id="Han-Latin" it seems to do a 1-to-1
mapping between characters and pinyin, while in reality it should be a 1-to-many
mapping.

Do you know of any Analyzer that could do something like :


-   input :
长


-   output :
cháng | zhǎng | zháng


Thanks so much for your help!



Re: EmbeddedSolrServer reference

2013-06-10 Thread Michael Della Bitta
Hi Alex,

Why not just use two webapps and not use EmbeddedSolrServer, but do all
your indexing as requests from your application to the Solr context next
door?

One advantage of doing it this way is that EmbeddedSolrServer has been
deemphasized by the Solr team, so you might not get the maintenance you
need to keep this system going.
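[Editor's note] For illustration, sending updates to the already-running Solr webapp instead of through an EmbeddedSolrServer could look roughly like the SolrJ sketch below. The URL, core name, and field names are placeholders, not taken from the thread.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexViaHttp {
    public static void main(String[] args) throws Exception {
        // Point at the Solr context that Tomcat already started (placeholder URL/core).
        SolrServer solr = new HttpSolrServer("http://localhost:8080/solr/collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "example-1");            // placeholder fields
        doc.addField("name", "example document");

        solr.add(doc);
        solr.commit();   // the change becomes visible to the searching instance
        solr.shutdown();
    }
}

Because the writer and the searcher then share the same core, there is no second index to keep in sync, which is the root of the visibility problem described above.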


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Mon, Jun 10, 2013 at 1:22 PM, Alex Sarco  wrote:

> Hi,
>
> I'm running Solr 4.3 embedded in Tomcat, so there's a Solr server starting
> when Tomcat starts.
> In the same webapp, I also have a process to recreate the Lucene index
> when Solr starts. To do this, I have a singleton instance of
> EmbeddedSolrServer provided by Spring. This same instance is also used to
> update the index every once in a while (when a message advising of a DB
> update is received).
> This works quite well, but my problem is the adds/updates I make to the
> index using EmbeddedSolrServer are not visible by the Solr instance started
> by Tomcat, unless I restart Tomcat (which is obviously not an option).
> So, my questions are:
>
>
> 1)   Is there any way to get a reference to the Solr server started by
> Tomcat, and not having to start another one?
>
> 2)   Any other suggestion as how to implement this kind of architecture?
>
>


RE: EmbeddedSolrServer reference

2013-06-10 Thread Alex Sarco
Michael, thank you for your answer.

You mean using HttpSolrServer? I thought of that, but I don't see the
point of going through the network when I'm running in the same JVM/box as the
main Solr server.
I would still like a solution to my issue, since so far EmbeddedSolrServer
works fine for me.

Alex.





Re: Curious why Solr Jetty URL has a # sign?

2013-06-10 Thread Chris Hostetter

:   This may be a dumb question but I am curious why the sample Solr Jetty
: results in a URL with a # sign e.g. http://localhost:8983/solr/#/~logging ?

You're looking at the Solr admin UI, which is a single-page JavaScript/AJAX-based
system that uses URL fragments (after the hash) to record state about what
you are looking at in the UI.

some background...

https://issues.apache.org/jira/browse/SOLR-4431?focusedCommentId=13596596&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13596596

: Is there any way to get rid of it, so I could have something like:
: http://localhost:8983/solr/~logging ? 

Why specifically does it concern/bother you to have a "#" in the UI
URL?   Smells like an XY Problem...

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341



-Hoss


SolrEntityProcessor gets slower and slower

2013-06-10 Thread Mingfeng Yang
I am trying to migrate 100M documents from a Solr index (v3.6) to a SolrCloud
index (v4.1, 4 shards) by using SolrEntityProcessor.  My data-config.xml is
like

  <entity processor="SolrEntityProcessor" url="http://10.64.35.117:8995/solr/" query="*:*" rows="2000"
fl="author_class,authorlink,author_location_text,author_text,author,category,date,dimension,entity,id,language,md5_text,op_dimension,opinion_text,query_id,search_source,sentiment,source_domain_text,source_domain,text,textshingle,title,topic,topic_text,url" />

Initially, the data import rate is about 1K docs/second, but it eventually
decreases to 20 docs/second after running for tens of hours.

Last time I tried a data import with SolrEntityProcessor, the transfer rate
was as high as 3K docs/second.

Does anyone have any clues about what could cause the slowdown?

Thanks,
Ming-


Solr developer IRC channel

2013-06-10 Thread Yonik Seeley
FYI, I've created a #solr-dev IRC channel for those who contribute to
Solr's development.

There used to be more of a "community" feel on some of the IRC
channels that's since been lost, so I'm trying to get some of that
back with a smaller subset of people interested in developing Solr.
The channel is unlogged, and I've set the topic to
"solr dev watercooler. rule #1: be nice"

-Yonik
http://lucidworks.com


Re: SolrEntityProcessor gets slower and slower

2013-06-10 Thread Shalin Shekhar Mangar
SolrEntityProcessor is fine for small amounts of data but not useful for
such a large index. The problem is that deep paging in search results is
expensive. As the "start" value for a query increases so does the cost of
the query. You are much better off just re-indexing the data.
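[Editor's note] For completeness, one common workaround at the time (a sketch, not part of this reply) is to walk the source index by its unique key instead of by increasing start offsets, so every request is a cheap "first 2000 docs after the last id seen" query. It assumes the unique key field is called id and that the id values need no query escaping.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class WalkByUniqueKey {
    public static void main(String[] args) throws Exception {
        HttpSolrServer source = new HttpSolrServer("http://10.64.35.117:8995/solr");
        String lastId = null;  // unique key of the last document handled so far
        while (true) {
            SolrQuery q = new SolrQuery("*:*");
            if (lastId != null) {
                // inclusive lower bound; the boundary document is skipped below
                q.addFilterQuery("id:[" + lastId + " TO *]");
            }
            q.setSortField("id", ORDER.asc);
            q.setRows(2000);
            QueryResponse rsp = source.query(q);

            boolean progressed = false;
            for (SolrDocument doc : rsp.getResults()) {
                String id = doc.getFieldValue("id").toString();
                if (id.equals(lastId)) {
                    continue;  // already processed on the previous page
                }
                lastId = id;
                progressed = true;
                // ... convert and send the document to the target SolrCloud collection here ...
            }
            if (!progressed) {
                break;  // no new documents left: the whole index has been walked
            }
        }
        source.shutdown();
    }
}

Unlike start-based paging, the cost of each request here stays roughly constant no matter how deep into the 100M documents you are.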


On Mon, Jun 10, 2013 at 11:19 PM, Mingfeng Yang wrote:

> I trying to migrate 100M documents from a solr index (v3.6) to a solrcloud
> index (v4.1, 4 shards) by using SolrEntityProcessor.  My data-config.xml is
> like
>
>url="http://10.64.35.117:8995/solr/"; query="*:*" rows="2000" fl=
>
> "author_class,authorlink,author_location_text,author_text,author,category,date,dimension,entity,id,language,md5_text,op_dimension,opinion_text,query_id,search_source,sentiment,source_domain_text,source_domain,text,textshingle,title,topic,topic_text,url"
> />  
>
> Initially, the data import rate is about 1K docs/second, but it eventually
> decrease to 20docs/second after running for tens of hours.
>
> Last time I tried data import with solorentityprocessor, the transfer rate
> can be as high as 3K docs/seconds.
>
> Anyone has any clues what can cause the slowdown?
>
> Thanks,
> Ming-
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: How to check does index needs optimize or not?

2013-06-10 Thread Cosimo Streppone

On 10/6/2013 19:15, Shawn Heisey wrote:


On 6/10/2013 10:18 AM, Michael Della Bitta wrote:


Hi all,

first post to this really useful list.
My experience with Solr (4.0) started just a few months ago.
I had no prior exposure to Solr 3.x.


That was my flip way of saying that "Optimize" is a highly optional
procedure that should not be undertaken under ordinary circumstances.

>> [...]

There's even a JIRA to remove it from Solr: 
https://issues.apache.org/jira/browse/SOLR-3141


I really liked the LHC page. :)  Michael is correct here.  If you look
through that JIRA, you'll see that there are still very valid reasons
for doing an optimize, but the age-old reason of "improving performance"
is not one of them.


That is interesting, because I am not running any manual optimize,
but I can clearly see that the Solr master is doing something to the index
that periodically brings its size down by about 80%.

After that, the query response time is much lower, and more
importantly, has a much lower variance too.

I assumed this was an on-demand automatic optimize,
but maybe it wasn't.

--
Cosimo



Re: does solr support query time only stopwords?

2013-06-10 Thread jchen2000
Thanks to you all and finally it seems that I figured out a workaround.

Yes, I used edismax, but my test query was very simple: it only queries one
field and uses only one stopword. So I see no chance it would hit another
field (but DataStax might have done something we don't know about). &debug didn't
yield useful information either.

So what I did was to keep the StopFilterFactory element for the index analyzer
but without specifying our stopword file. I reindexed all Solr cores. This
time it seems like I could get stopword frequency info from Luke, while
querying for stopwords returned 0 matches.

My wild guess is that the StopFilterFactory for the index analyzer serves as an
overall "on" switch for the stopwords feature.


Erick Erickson wrote
> My _guess_ is that you're perhaps using
> edismax or similar and getting matches from
> fields you don't expect on terms you that are
> not stopwords. Try adding &debug=query and
> seeing what the parsed query actually is.
> 
> And, of course, I have no idea what Datastax is
> doing.
> 
> And, you have to at least reload the core
> to pick up the new stopwords.
> 
> Best
> Erick
> 
> On Sat, Jun 8, 2013 at 6:33 PM, jchen2000 <

> jchen200@

> > wrote:
>> I wanted to analyze high frequency terms using Solr's Luke request
>> handler
>> and keep updating the stopwords file for new queries from time to time.
>> Obviously I have to index all terms whether they belong to stopwords list
>> or
>> not.
>>
>> So I configured query analyzer stopwords list but disabled index analyzer
>> stopwords list, However, it seems like the query would return all records
>> containing stopwords after this.
>>
>> Anybody has an idea why this would happen?
>>
>> ps. I am using Datastax Enterprise 3.0.2 and the solr version is 4.0
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087.html
>> Sent from the Solr - User mailing list archive at Nabble.com.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087p4069464.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.3 - Schema Parsing Failed: Invalid field property: compressed

2013-06-10 Thread Uomesh
I am upgrading from Solr 4.2 to 4.3. Up through 4.2 I was not seeing any error.

Thanks,
Umesh


On Mon, Jun 10, 2013 at 2:41 AM, André Widhani [via Lucene] <
ml-node+s472066n4069276...@n3.nabble.com> wrote:

> From what version are you upgrading? The compressed attribute is
> unsupported since the 3.x releases.
>
> The change log (CHANGES.txt) has a section "Upgrading from Solr 1.4" in
> the notes for Solr 3.1:
>
> "Field compression is no longer supported. Fields that were formerly
> compressed will be uncompressed as index segments are merged. For shorter
> fields, this may actually be an improvement, as the compression used was
> not very good for short text. Some indexes may get larger though."
>
> Also, indices created with 1.4 cannot be opened with 4.x, only 3.x.
>
> Regards,
> André
>
> 
> Von: Uomesh [[hidden 
> email]]
>
> Gesendet: Montag, 10. Juni 2013 06:19
> An: [hidden email] 
> Betreff: Solr 4.3 - Schema Parsing Failed: Invalid field property:
> compressed
>
> Hi,
>
> I am getting below after upgrading to Solr 4.3. Is compressed attribute no
> longer supported in Solr 4.3 or it is a bug in 4.3?
>
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> Schema Parsing Failed: Invalid field property: compressed
>
> Thanks,
> Umesh
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-4-3-Schema-Parsing-Failed-Invalid-field-property-compressed-tp4069254.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
> --
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://lucene.472066.n3.nabble.com/Solr-4-3-Schema-Parsing-Failed-Invalid-field-property-compressed-tp4069254p4069276.html
>  To unsubscribe from Solr 4.3 - Schema Parsing Failed: Invalid field
> property: compressed, click 
> here
> .
> NAML
>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-3-Schema-Parsing-Failed-Invalid-field-property-compressed-tp4069254p4069465.html
Sent from the Solr - User mailing list archive at Nabble.com.

Hoss commented on https://people.apache.org/~hossman/comments.html

2013-06-10 Thread no-reply
Hello,
Hoss has commented on https://people.apache.org/~hossman/comments.html. 
You can find the comment here:
https://people.apache.org/~hossman/comments.html#comment_1351
Please note that if the comment contains a hyperlink, it must be approved
before it is shown on the site.

Below is the reply that was posted:

Test Comment


With regards,
Apache Solr Cwiki.

You are receiving this email because you have subscribed to changes for the 
solrcwiki site.
To stop receiving these emails, unsubscribe from the mailing list that is 
providing these notifications.



SOLR 4.3.0 synonym filter - parse error - SOLR 4.3.0

2013-06-10 Thread bbarani
For some reason I am getting the error below when parsing synonyms using a
synonyms file.

Synonyms File:

http://www.pastebin.ca/2395108

The server encountered an internal error ({msg=SolrCore 'solr' is not
available due to init failure: java.io.IOException: Error parsing synonyms
file:,trace=org.apache.solr.common.SolrException: SolrCore 'solr' is not
available due to init failure: java.io.IOException: Error parsing synonyms
file: at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1212)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at 

I couldn't really find any issue with the synonyms file. Can someone let me
know where I am going wrong?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-4-3-0-synonym-filter-parse-error-SOLR-4-3-0-tp4069469.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to check does index needs optimize or not?

2013-06-10 Thread Shawn Heisey

On 6/10/2013 12:31 PM, Cosimo Streppone wrote:

On 10/6/2013 19:15, Shawn Heisey wrote:

I really liked the LHC page. :)  Michael is correct here.  If you look
through that JIRA, you'll see that there are still very valid reasons
for doing an optimize, but the age-old reason of "improving performance"
is not one of them.


That is interesting, because I am not running any manual optimize,
but I can clearly see that Solr master is doing something to the index
that periodically brings it down about 80% in size.

After that, the query response time is much lower, and more
importantly, has a much lower variance too.


Solr does merge segments according to your configuration, with a default 
mergeFactor of 10, so when 10 candidate segments exist, those specific 
segments will be merged into one larger segment.  When that has happened 
ten times, the ten larger segments will be merged into an even larger 
segment, and so on.  When segments are merged, deleted documents are not 
copied to the new segment.


An optimize is just a special explicit merge that in most cases merges 
all segments down to one.


If you are seeing an 80% index size reduction on a regular basis just 
from merging, then it sounds like you do a lot of document deletes, 
reindexes, and/or atomic updates.  When you reindex or do an atomic 
update on a document, the old one is deleted and the new one is inserted.


Document deletes don't actually remove anything from the index, they 
just mark specific document IDs as deleted.  The index doesn't get any 
smaller.  Searches will still look at the deleted docs, but they get 
removed from the results after the search is done.  Merging (or an 
optimize) is the only way that deleted documents actually get removed.


Thanks,
Shawn



Re: Hoss commented on https://people.apache.org/~hossman/comments.html

2013-06-10 Thread Chris Hostetter

FYI: this is a test as part of SOLR-4889 to get commenting enabled on the
(hopefully soon to launch) new Solr Reference Guide wiki.

the comments.apache.org system is currently setup to notify solr-user when 
new comments are posted.

(For now I'm testing on people.apache.org because it's a bit easier to
muck with the HTML in a flat page than in the cwiki templates.)



: Date: Mon, 10 Jun 2013 18:43:53 + (UTC)
: From: no-re...@comments.apache.org
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Hoss commented on https://people.apache.org/~hossman/comments.html
: 
: Hello,
: Hoss has commented on https://people.apache.org/~hossman/comments.html. 
: You can find the comment here:
: https://people.apache.org/~hossman/comments.html#comment_1351
: Please note that if the comment contains a hyperlink, it must be approved
: before it is shown on the site.
: 
: Below is the reply that was posted:
: 
: Test Comment
: 
: 
: With regards,
: Apache Solr Cwiki.
: 
: You are receiving this email because you have subscribed to changes for the 
solrcwiki site.
: To stop receiving these emails, unsubscribe from the mailing list that is 
providing these notifications.
:   
: 

-Hoss


Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

2013-06-10 Thread Prathik Puthran
Our dictionary has very few words, so it is more of a feature for the
user than a nuisance.

Thanks,
Prathik


On Mon, Jun 10, 2013 at 10:52 PM, Walter Underwood wrote:

> Why do you think that is useful? That will give terrible search results.
>
> Here are the first twenty words in /usr/share/dict/words that contain the
> substring "cat".
>
> abacate
> abdicate
> abdication
> abdicative
> abdicator
> aberuncator
> abjudicate
> abjudication
> acacatechin
> acacatechol
> acatalectic
> acatalepsia
> acatalepsy
> acataleptic
> acatallactic
> acatamathesia
> acataphasia
> acataposis
> acatastasia
> acatastatic
>
> wunder
>
> On Jun 9, 2013, at 10:56 PM, Prathik Puthran wrote:
>
> > Hi,
> >
> > @Walter
> > I'm trying to implement the below feature for the user.
> > User types in any "substring" of the strings in the dictionary (i.e. the
> > indexed string) .
> > SOLR Suggester should return all the strings in the dictionary which has
> > the input string as substring.
> >
> > Thanks,
> > Prathik
> >
> >
> >
> > On Fri, Jun 7, 2013 at 4:01 AM, Otis Gospodnetic <
> otis.gospodne...@gmail.com
> >> wrote:
> >
> >> Hi
> >>
> >> Ngrams *will* do this for you.
> >>
> >> Otis
> >> Solr & ElasticSearch Support
> >> http://sematext.com/
> >> On Jun 6, 2013 7:53 AM, "Prathik Puthran" 
> >> wrote:
> >>
> >>> Basically I want the Suggester to return for "Jason Bourne" as
> suggestion
> >>> for ".*Bour.*" regex.
> >>>
> >>> Thanks,
> >>> Prathik
> >>>
> >>>
> >>> On Thu, Jun 6, 2013 at 12:52 PM, Prathik Puthran <
> >>> prathik.puthra...@gmail.com> wrote:
> >>>
>  This works even now i.e. when I search for "Jas" it suggests "Jason
>  Bourne". What I want is when I search for "Bour" or "ason" (any
> >>> substring)
>  it should suggest me "Jason Bourne" .
> 
> 
>  On Thu, Jun 6, 2013 at 12:34 PM, Upayavira  wrote:
> 
> > Can you se the ShingleFilterFactory? It is ngrams for terms rather
> >> than
> > characters. If you limited it to two term ngrams, when the user
> >> presses
> > space after their first word, you could do a suggested query against
> > your two term ngram field, which would suggest Jason Bourne, Jason
> > Statham, etc then you press space after "Jason".
> >
> > Upayavira
> >
> > On Thu, Jun 6, 2013, at 07:25 AM, Prathik Puthran wrote:
> >> My use case is I want to search for any substring of the indexed
> >>> string
> >> and
> >> the Suggester should suggest the indexed string. What can I do to
> >> make
> >> this
> >> work?
> >>
> >> Thanks,
> >> Prathik
> >>
> >>
> >> On Thu, Jun 6, 2013 at 2:05 AM, Mikhail Khludnev
> >>  >>> wrote:
> >>
> >>> Please excuse my misunderstanding, but I always wonder why this
> >>> index
> > time
> >>> processing is suggested usually. from my POV is the case for
> > query-time
> >>> processing i.e. PrefixQuery aka wildcard query Jason* .
> >>> Ultra-fast term retrieval also provided by TermsComponent.
> >>>
> >>>
> >>> On Wed, Jun 5, 2013 at 8:09 PM, Jack Krupansky <
> > j...@basetechnology.com
>  wrote:
> >>>
>  ngrams?
> 
>  See:
>  http://lucene.apache.org/core/**4_3_0/analyzers-common/org/**
>  apache/lucene/analysis/ngram/**NGramFilterFactory.html<
> >>>
> >
> >>>
> >>
> http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html
> 
> 
>  -- Jack Krupansky
> 
>  -Original Message- From: Prathik Puthran
>  Sent: Wednesday, June 05, 2013 11:59 AM
>  To: solr-user@lucene.apache.org
>  Subject: Configuring lucene to suggest the indexed string for
> >> all
> > the
>  searches of the substring of the indexed string
> 
> 
>  Hi,
> 
>  Is it possible to configure solr to suggest the indexed string
> >> for
> > all
> >>> the
>  searches of the substring of the string?
> 
>  Thanks,
>  Prathik
> 
> >>>
> >>>
> >>>
> >>> --
> >>> Sincerely yours
> >>> Mikhail Khludnev
> >>> Principal Engineer,
> >>> Grid Dynamics
> >>>
> >>> 
> >>> 
> >>>
> >
> 
> 
> >>>
> >>
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


Re: Solr 4.3 - Schema Parsing Failed: Invalid field property: compressed

2013-06-10 Thread Shalin Shekhar Mangar
That is because starting with 4.3, Solr started throwing errors if the
schema had an illegal field parameter.

See https://issues.apache.org/jira/browse/SOLR-4641


On Tue, Jun 11, 2013 at 12:05 AM, Uomesh  wrote:

> I am upgrading from Solr 4.2 to 4.3. Till 4.2 i was not seeing any error.
>
> Thanks,
> Umesh
>
>
> On Mon, Jun 10, 2013 at 2:41 AM, André Widhani [via Lucene] <
> ml-node+s472066n4069276...@n3.nabble.com> wrote:
>
> > From what version are you upgrading? The compressed attribute is
> > unsupported since the 3.x releases.
> >
> > The change log (CHANGES.txt) has a section "Upgrading from Solr 1.4" in
> > the notes for Solr 3.1:
> >
> > "Field compression is no longer supported. Fields that were formerly
> > compressed will be uncompressed as index segments are merged. For shorter
> > fields, this may actually be an improvement, as the compression used was
> > not very good for short text. Some indexes may get larger though."
> >
> > Also, indices created with 1.4 cannot be opened with 4.x, only 3.x.
> >
> > Regards,
> > André
> >
> > 
> > Von: Uomesh [[hidden email]<
> http://user/SendEmail.jtp?type=node&node=4069276&i=0>]
> >
> > Gesendet: Montag, 10. Juni 2013 06:19
> > An: [hidden email]  >
> > Betreff: Solr 4.3 - Schema Parsing Failed: Invalid field property:
> > compressed
> >
> > Hi,
> >
> > I am getting below after upgrading to Solr 4.3. Is compressed attribute
> no
> > longer supported in Solr 4.3 or it is a bug in 4.3?
> >
> >
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > Schema Parsing Failed: Invalid field property: compressed
> >
> > Thanks,
> > Umesh
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Solr-4-3-Schema-Parsing-Failed-Invalid-field-property-compressed-tp4069254.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
> > --
> >  If you reply to this email, your message will be added to the discussion
> > below:
> >
> >
> http://lucene.472066.n3.nabble.com/Solr-4-3-Schema-Parsing-Failed-Invalid-field-property-compressed-tp4069254p4069276.html
> >  To unsubscribe from Solr 4.3 - Schema Parsing Failed: Invalid field
> > property: compressed, click here<
> http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=4069254&code=VW9tZXNoQGdtYWlsLmNvbXw0MDY5MjU0fDIyODkyODYxMg==
> >
> > .
> > NAML<
> http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml
> >
> >
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-4-3-Schema-Parsing-Failed-Invalid-field-property-compressed-tp4069254p4069465.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Curious why Solr Jetty URL has a # sign?

2013-06-10 Thread O. Olson
Thank you Chris.

No, I do not have an XY Problem. I am new to Solr, Jetty and related
technology and was just playing around. I did not like the /#/ in the URL and felt that
it had no purpose. So, if I understand this correctly, is Solr using the # as
a jQuery hook to decide which view to show? Am I correct in this
interpretation?

If what I said above is correct, could I write a Jetty rewrite rule to
eliminate the #? I could certainly write a rule to map /solr to the root /,
but I am not sure about the #. I don't really have a need; I just wanted to
know what was possible.

Thanks again,
O. O.



Chris Hostetter-3 wrote
> You're looking at the Solr UI which is a single page javascript/AJAX based 
> system that uses url fragments (after the hash) to record state about what 
> you are looking at in the UI
> 
> some background...
> 
> https://issues.apache.org/jira/browse/SOLR-4431?focusedCommentId=13596596&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13596596
> 
> Why specifically does it concern/bother you about having a "#" in the UI 
> URL?   Smells like an XY Problem...
> 
> https://people.apache.org/~hossman/#xyproblem
> XY Problem
> 
> Your question appears to be an "XY Problem" ... that is: you are dealing
> with "X", you are assuming "Y" will help you, and you are asking about "Y"
> without giving more details about the "X" so that we can understand the
> full issue.  Perhaps the best solution doesn't involve "Y" at all?
> See Also: http://www.perlmonks.org/index.pl?node_id=542341
> 
> 
> 
> -Hoss





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Curious-why-Solr-Jetty-URL-has-a-sign-tp4069434p4069481.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Curious why Solr Jetty URL has a # sign?

2013-06-10 Thread Alexandre Rafalovitch
The # part is a JavaScript-side URL fragment. It is not seen by the server. It is part
of a standard single-page-application design approach, so it is not
visible to Jetty rewrite rules, etc.

If you don't have a problem here, I would suggest just taking this
part on faith and continuing on to other parts of Solr.

Regards,
   Alex.

On Mon, Jun 10, 2013 at 3:29 PM, O. Olson  wrote:
> If what I said above is correct, could I write a Jetty Rewrite rule to
> eliminate the #. I could certainly write a rule to map /solr to the root /,
> but I am not sure about the #.



Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


Re: SOLR 4.3.0 synonym filter - parse error - SOLR 4.3.0

2013-06-10 Thread Jack Krupansky

You just have to look further down the stack trace "cause" chain to find it:

Caused by: java.text.ParseException: Invalid synonym rule at line 10
   at 
org.apache.lucene.analysis.synonym.SolrSynonymParser.add(SolrSynonymParser.java:72)
   at 
org.apache.lucene.analysis.synonym.FSTSynonymFilterFactory.loadSolrSynonyms(FSTSynonymFilterFactory.java:127)
   at 
org.apache.lucene.analysis.synonym.FSTSynonymFilterFactory.inform(FSTSynonymFilterFactory.java:98)

   ... 17 more
Caused by: java.lang.IllegalArgumentException: term:  was completely 
eliminated by analyzer
   at 
org.apache.lucene.analysis.synonym.SynonymMap$Builder.analyze(SynonymMap.java:140)
   at 
org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:99)
   at 
org.apache.lucene.analysis.synonym.SolrSynonymParser.add(SolrSynonymParser.java:70)

   ... 19 more

Your line 10:

n.e., nordest, nord-est, north-east, => northeast

You have a trailing comma before the "=>" operator. That resulted in an empty
term. You have at least one other like it as well.


-- Jack Krupansky

-Original Message- 
From: bbarani

Sent: Monday, June 10, 2013 2:47 PM
To: solr-user@lucene.apache.org
Subject: SOLR 4.3.0 synonym filter - parse error - SOLR 4.3.0

For some reason I am getting the below error when parsing synonyms using
synonyms file.

Synonyms File:

http://www.pastebin.ca/2395108

The server encountered an internal error ({msg=SolrCore 'solr' is not
available due to init failure: java.io.IOException: Error parsing synonyms
file:,trace=org.apache.solr.common.SolrException: SolrCore 'solr' is not
available due to init failure: java.io.IOException: Error parsing synonyms
file: at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1212)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at

I couldn't really find any issue with the synonyms file..Can someone let me
know where I am wrong?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-4-3-0-synonym-filter-parse-error-SOLR-4-3-0-tp4069469.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: SOLR 4.3.0 synonym filter - parse error - SOLR 4.3.0

2013-06-10 Thread bbarani
Thanks a lot for your response, Jack. I figured out the issue: this file is
currently generated by a Perl program, and it seems to be a bug in that program.
Thanks anyway.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Re-SOLR-4-3-0-synonym-filter-parse-error-SOLR-4-3-0-tp4069487p4069500.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to check does index needs optimize or not?

2013-06-10 Thread Otis Gospodnetic
Here is one way to tell if the index is optimized. Look at this graph
for example:

https://apps.sematext.com/spm/s/Dxn6SHjSLB

See the purple line labeled "delta"?  If it's not 0 it means your
index has deletions.  This index has over 100K deleted docs that have
not been expunged.  That's because we never optimize it.
See the difference between max docs and num docs?  That's the delta.
See that green jagged line?  That's the number of segments.  It goes
up and down as Lucene merges segments.  There are close to 30 segments
in this index.  If the index were optimized, it would have just 1
segment and that green line would be down close to the X axis.

So that's one way to see if your index is optimized or not.
But as you can see here, we don't optimize this index at all.  That's
the index that's behind http://search-hadoop.com/ btw.  As you can
see, it's constantly growing (slowly, but growing), so we don't
optimize.
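[Editor's note] The same two numbers can also be read programmatically. Below is a small, hypothetical SolrJ sketch (the URL is a placeholder) that asks the Luke handler for numDocs and maxDoc and prints the delta described above.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.util.NamedList;

public class DeletedDocsDelta {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: point this at the core you want to inspect.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery();
        q.setQueryType("/admin/luke");  // the Luke handler reports index-level stats

        NamedList<Object> index =
                (NamedList<Object>) server.query(q).getResponse().get("index");
        long numDocs = ((Number) index.get("numDocs")).longValue();
        long maxDoc  = ((Number) index.get("maxDoc")).longValue();

        System.out.println("delta (deleted docs not yet merged away): " + (maxDoc - numDocs));
        server.shutdown();
    }
}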

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Mon, Jun 10, 2013 at 2:49 PM, Shawn Heisey  wrote:
> On 6/10/2013 12:31 PM, Cosimo Streppone wrote:
>>
>> On 10/6/2013 19:15, Shawn Heisey wrote:
>>>
>>> I really liked the LHC page. :)  Michael is correct here.  If you look
>>> through that JIRA, you'll see that there are still very valid reasons
>>> for doing an optimize, but the age-old reason of "improving performance"
>>> is not one of them.
>>
>>
>> That is interesting, because I am not running any manual optimize,
>> but I can clearly see that Solr master is doing something to the index
>> that periodically brings it down about 80% in size.
>>
>> After that, the query response time is much lower, and more
>> importantly, has a much lower variance too.
>
>
> Solr does merge segments according to your configuration, with a default
> mergeFactor of 10, so when 10 candidate segments exist, those specific
> segments will be merged into one larger segment.  When that has happened ten
> times, the ten larger segments will be merged into an even larger segment,
> and so on.  When segments are merged, deleted documents are not copied to
> the new segment.
>
> An optimize is just a special explicit merge that in most cases merges all
> segments down to one.
>
> If you are seeing an 80% index size reduction on a regular basis just from
> merging, then it sounds like you do a lot of document deletes, reindexes,
> and/or atomic updates.  When you reindex or do an atomic update on a
> document, the old one is deleted and the new one is inserted.
>
> Document deletes don't actually remove anything from the index, they just
> mark specific document IDs as deleted.  The index doesn't get any smaller.
> Searches will still look at the deleted docs, but they get removed from the
> results after the search is done.  Merging (or an optimize) is the only way
> that deleted documents actually get removed.
>
> Thanks,
> Shawn
>


Re: Curious why Solr Jetty URL has a # sign?

2013-06-10 Thread O. Olson
Thank you, Alex, for the explanation. I was not aware of the single-page
application design approach. After a bit of googling, it seems to be more popular than
I expected.
O. O.



Alexandre Rafalovitch wrote
> The # part is JavaScript URL. It is not seen by the server. It is part
> of a standard single-page-application design approach. So, it is not
> visible to Jetty rules, etc.
> 
> If you don't have a problem here, I would suggest just taking this
> part on faith and continue to other parts of Solr
> 
> Regards,
>Alex.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Curious-why-Solr-Jetty-URL-has-a-sign-tp4069434p4069509.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr developer IRC channel

2013-06-10 Thread Otis Gospodnetic
Mucho good! +1
Why unlogged though?  Just curious.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Mon, Jun 10, 2013 at 1:52 PM, Yonik Seeley  wrote:
> FYI, I've created a #solr-dev IRC channel for those who contribute to
> Solr's development.
>
> There used to be more of a "community" feel on some of the IRC
> channels that's since been lost, so I'm trying to get some of that
> back with a smaller subset of people interested in developing Solr.
> The channel is unlogged, and I've set the topic to
> "solr dev watercooler. rule #1: be nice"
>
> -Yonik
> http://lucidworks.com


Re: How to Reach LukeRequestHandler From Solrj?

2013-06-10 Thread bbarani
Try the code below:

SolrQuery query = new SolrQuery();
query.setQueryType("/admin/luke");                    // route the request to the LukeRequestHandler
QueryResponse rsp = server.query(query, METHOD.GET);  // server is your SolrServer/CloudSolrServer instance
System.out.println(rsp.getResponse());



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-Reach-LukeRequestHandler-From-Solrj-tp4069280p4069512.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Most common query

2013-06-10 Thread Otis Gospodnetic
Reply to an ancient email...

On Thu, Feb 14, 2013 at 7:49 AM, Ahmet Arslan  wrote:
> Hi,
>
> If I am not mistaken I saw some open jira to collect queries and calculate 
> popular searches etc.
>
> Some commercial solutions exist:
>
> http://sematext.com/search-analytics/index.html

The above is actually still free.

Otis


> http://soleami.com/blog/soleami-start_en.html
>
>
> --- On Wed, 2/13/13, ROSENBERG, YOEL (YOEL)** CTR ** 
>  wrote:
>
> From: ROSENBERG, YOEL (YOEL)** CTR ** 
> Subject: Most common query
> To: "solr-user@lucene.apache.org" 
> Date: Wednesday, February 13, 2013, 5:27 PM
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Hi,
>
>
>
> I have a question, hope you can help me.
>
> I would like to get a report, using the Solr admin tools, that returns all
> the searches made on the system between two dates.
>
> What is the correct way to do it?
>
>
>
> BR,
>
> Yoel
>
>
>
>
>
> Yoel Rosenberg
>
> ALCATEL-LUCENT
>
> Support Engineer
>
> T: +972 77 9088584
>
> M: +972 54 239 5204
>
> yoel.rosenb...@alcatel-lucent.com
>
>
>
>
>
>
>
>


Re: Query-node+shard stickiness?

2013-06-10 Thread Otis Gospodnetic
Yeah, that sounds complicated and messy.
Just the other day I was looking at performance metrics for a customer
using a master-slave setup and this sort of query->slave mapping behind
the load balancer.
After switching from such a setup to a round-robin setup, their
performance noticeably suffered...

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Mon, Jun 10, 2013 at 8:06 AM, Erick Erickson  wrote:
> Nothing I've seen. It would get really tricky
> though. Each node in the cluster would have
> to have a copy of all queries received by
> _any_ node which would result in all
> queries being sent to all nodes along with
> an indication of what node that query was
> actually supposed to be serviced by.
>
> And now suppose there were 100 shards,
> then the list of the correct node would get
> quite large.
>
> Seems overly complex for the benefit, but
> what do I know?
>
> FWIW
> Erick
>
>
>
> On Sat, Jun 8, 2013 at 10:38 PM, Otis Gospodnetic
>  wrote:
>> Hi,
>>
>> Is there anything in SolrCloud that would support query-node/shard
>> affinity/stickiness?
>>
>> What I mean by that is a mechanism that is smart enough to keep
>> sending the same query X to the same node(s)+shard(s)... with the goal
>> being better utilization of Solr and OS caches?
>>
>> Example:
>> * Imagine a Collection with 2 shards and 3 replicas: s1r1, s1r2, s1r3,
>> s2r1, s2r2, s2r3
>> * Query for "Foo Bar" comes in and hits one of the nodes, say s1r1
>> * Since shard 2 needs to be queried, too, one of its 3 replicas needs
>> to be searched.  Say s2r1 gets searched
>> * 5 minutes later the same query for "Foo Bar" comes in, say it hits s1r1 
>> again
>> * Again shard 2 needs to be searched.  But which of the 3 replicas
>> should be searched?
>> * Ideally that same s2r1 would be searched
>>
>> Is there anything in SolrCloud that can accomplish this?
>> Or if there a place in SolrCloud where such "query hash ==>
>> node/shard" mapping could be implemented?
>>
>> Thanks,
>> Otis
>> --
>> Solr & ElasticSearch Support
>> http://sematext.com/


Re: Solr developer IRC channel

2013-06-10 Thread Yonik Seeley
On Mon, Jun 10, 2013 at 5:32 PM, Otis Gospodnetic
 wrote:
> Mucho good! +1
> Why unlogged though?  Just curious.

Personal preference: it gives it a more informal / slightly more private feel.
Some people don't want casual watercooler chat recorded & publicized forever.

-Yonik
http://lucidworks.com


Re: Dataless nodes in SolrCloud?

2013-06-10 Thread Otis Gospodnetic
I think it would be useful.  I know people using ElasticSearch use it
relatively often.

>  Is aggregation expensive enough to warrant a separate box?

I think it can get expensive if the X in rows=X is highish.  We've seen
this reported here on the Solr ML before.
So to make sorting/merging of N result sets from N "data nodes" fast on this
"aggregator node", you may want all the CPU you can get, and not
have that CPU simultaneously also trying to handle incoming queries.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Mon, Jun 10, 2013 at 5:32 AM, Shalin Shekhar Mangar
 wrote:
> No, there's no such notion in SolrCloud. Each node that is part of a
> collection/shard is a replica and will handle indexing/querying. Even
> though you can send a request to a node containing a different collection,
> the request would just be forwarded to the right node and will be executed
> there.
>
> That being said, do people find such a feature useful? Is aggregation
> expensive enough to warrant a separate box? In a distributed search, the
> local index is used. One would just be adding a couple of extra network
> requests if you don't have a local index.
>
>
> On Sun, Jun 9, 2013 at 11:18 AM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
>> Hi,
>>
>> Is there a notion of a data-node vs. non-data node in SolrCloud?
>> Something a la http://www.elasticsearch.org/guide/reference/modules/node/
>>
>>
>> Thanks,
>> Otis
>> Solr & ElasticSearch Support
>> http://sematext.com/
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.


Re: Dataless nodes in SolrCloud?

2013-06-10 Thread Tim Vaillancourt
To answer Otis' question of whether or not this would be useful, the
trouble is, I don't know! :) It very well could be useful for my use case.

Is there any way to determine the impact of result merging (time spent?
Etc?) aside from just 'trying it'?

Cheers,

Tim


On 10 June 2013 14:48, Otis Gospodnetic  wrote:

> I think it would be useful.  I know people using ElasticSearch use it
> relatively often.
>
> >  Is aggregation expensive enough to warrant a separate box?
>
> I think it can get expensive if X in rows=X is highish.  We've seen
> this reported here on the Solr ML before.
> So to do the sorting/merging of N result sets from N "data nodes" on this
> "aggregator node" you may want all the CPU you can get, and not
> have that CPU simultaneously also trying to handle incoming queries.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Mon, Jun 10, 2013 at 5:32 AM, Shalin Shekhar Mangar
>  wrote:
> > No, there's no such notion in SolrCloud. Each node that is part of a
> > collection/shard is a replica and will handle indexing/querying. Even
> > though you can send a request to a node containing a different
> collection,
> > the request would just be forwarded to the right node and will be
> executed
> > there.
> >
> > That being said, do people find such a feature useful? Is aggregation
> > expensive enough to warrant a separate box? In a distributed search, the
> > local index is used. One would just be adding a couple of extra network
> > requests if you don't have a local index.
> >
> >
> > On Sun, Jun 9, 2013 at 11:18 AM, Otis Gospodnetic <
> > otis.gospodne...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> Is there a notion of a data-node vs. non-data node in SolrCloud?
> >> Something a la
> http://www.elasticsearch.org/guide/reference/modules/node/
> >>
> >>
> >> Thanks,
> >> Otis
> >> Solr & ElasticSearch Support
> >> http://sematext.com/
> >>
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
>


RE: Solr developer IRC channel

2013-06-10 Thread Vaillancourt, Tim
I agree with Yonik. It is great to see an IRC for Solr!

Tim

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Monday, June 10, 2013 2:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr developer IRC channel

On Mon, Jun 10, 2013 at 5:32 PM, Otis Gospodnetic  
wrote:
> Mucho good! +1
> Why unlogged though?  Just curious.

Personal preference: it gives it a more informal / slightly more private feel.
Some people don't want casual watercooler chat recorded & publicized forever.

-Yonik
http://lucidworks.com



external zookeeper with SolrCloud

2013-06-10 Thread Joshi, Shital
Hi,



We're setting up a 5-shard SolrCloud with an external ZooKeeper. When we bring up 
Solr nodes while the ZooKeeper instance is not up and running, we see this 
error in the Solr logs.



java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)

at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)



INFO  - 2013-06-10 15:03:35.422; 
org.apache.solr.common.cloud.ConnectionManager; Watcher 592147 
[main-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager  ? 
Watcher org.apache.solr.common.cloud.ConnectionManager@530d0eae 
name:ZooKeeperConnection Watcher: . got event WatchedEvent 
state:SyncConnected type:None path:null path:null type:None



INFO  - 2013-06-10 15:03:35.423; 
org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status change 
trigger but we are already closed

592148 [main-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager  
? Client->ZooKeeper status change trigger but we are already closed



After we bring up the ZooKeeper instance, the node never connects to ZooKeeper and 
we can't see the Solr admin page until we restart the node.



Does the ZooKeeper instance have to be up when we bring up a Solr node? That's not 
what the documentation says, though.



Thanks.


Re: Query-node+shard stickiness?

2013-06-10 Thread Otis Gospodnetic
Actually, it doesn't really have to be messy.
Couldn't one have a custom handler that knows how to compute a query
hash and map it to a specific node?
When the same query comes in again, the same computation will be done
and the same node will be selected to execute the query.
No need for any nodes to have any "physical" query->node mapping and
no need for nodes to broadcast this sort of info around.

Sounds doable to me.
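
For illustration only, a minimal sketch of that idea (nothing like this exists
in Solr out of the box; the class and method names are made up, and the list of
replica URLs is assumed to come from the cluster state):

import java.util.List;

public class StickyReplicaChooser {
    // Pick the same replica of a shard for the same query every time,
    // so repeats of a query hit the same (warm) caches.
    // Assumes replicaUrls is non-empty and in a stable order on every node.
    public static String pickReplica(String query, List<String> replicaUrls) {
        int hash = query.trim().toLowerCase().hashCode();
        int idx = (hash & 0x7fffffff) % replicaUrls.size();
        return replicaUrls.get(idx);
    }
}

Since every node computes the same hash for the same query, they all pick the
same replica without having to share or broadcast any query->node mapping.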

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Mon, Jun 10, 2013 at 5:44 PM, Otis Gospodnetic
 wrote:
> Yeah, that sounds complique and messy.
> Just the other day I was looking at performance metrics for a customer
> using a master-slave setup and this sort of query->slave mapping behind
> the load balancer.
> After switching from such a setup to a round-robin setup their
> performance noticeably suffered...
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Mon, Jun 10, 2013 at 8:06 AM, Erick Erickson  
> wrote:
>> Nothing I've seen. It would get really tricky
>> though. Each node in the cluster would have
>> to have a copy of all queries received by
>> _any_ node which would result in all
>> queries being sent to all nodes along with
>> an indication of what node that query was
>> actually supposed to be serviced by.
>>
>> And now suppose there were 100 shards,
>> then the list of the correct node would get
>> quite large.
>>
>> Seems overly complex for the benefit, but
>> what do I know?
>>
>> FWIW
>> Erick
>>
>>
>>
>> On Sat, Jun 8, 2013 at 10:38 PM, Otis Gospodnetic
>>  wrote:
>>> Hi,
>>>
>>> Is there anything in SolrCloud that would support query-node/shard
>>> affinity/stickiness?
>>>
>>> What I mean by that is a mechanism that is smart enough to keep
>>> sending the same query X to the same node(s)+shard(s)... with the goal
>>> being better utilization of Solr and OS caches?
>>>
>>> Example:
>>> * Imagine a Collection with 2 shards and 3 replicas: s1r1, s1r2, s1r3,
>>> s2r1, s2r2, s2r3
>>> * Query for "Foo Bar" comes in and hits one of the nodes, say s1r1
>>> * Since shard 2 needs to be queried, too, one of its 3 replicas needs
>>> to be searched.  Say s2r1 gets searched
>>> * 5 minutes later the same query for "Foo Bar" comes in, say it hits s1r1 
>>> again
>>> * Again shard 2 needs to be searched.  But which of the 3 replicas
>>> should be searched?
>>> * Ideally that same s2r1 would be searched
>>>
>>> Is there anything in SolrCloud that can accomplish this?
>>> Or if there a place in SolrCloud where such "query hash ==>
>>> node/shard" mapping could be implemented?
>>>
>>> Thanks,
>>> Otis
>>> --
>>> Solr & ElasticSearch Support
>>> http://sematext.com/


Hoss commented on http://cwiki.apache.org/SOLR/test-page-1.html

2013-06-10 Thread no-reply
Hello,
Hoss has commented on http://cwiki.apache.org/SOLR/test-page-1.html. 
You can find the comment here:
http://cwiki.apache.org/SOLR/test-page-1.html#comment_1359
Please note that if the comment contains a hyperlink, it must be approved
before it is shown on the site.

Below is the reply that was posted:

Test comment in Test Page 2 (attempt 2)



With regards,
Apache Solr Cwiki.

You are receiving this email because you have subscribed to changes for the 
solrcwiki site.
To stop receiving these emails, unsubscribe from the mailing list that is 
providing these notifications.



Re: external zookeeper with SolrCloud

2013-06-10 Thread Mark Miller
This might be https://issues.apache.org/jira/browse/SOLR-4899

- Mark

On Jun 10, 2013, at 5:59 PM, "Joshi, Shital"  wrote:

> Hi,
> 
> 
> 
> We're setting up a 5-shard SolrCloud with an external ZooKeeper. When we bring up 
> Solr nodes while the ZooKeeper instance is not up and running, we see this 
> error in the Solr logs.
> 
> 
> 
> java.net.ConnectException: Connection refused
> 
>at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 
>at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> 
>at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
> 
>at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 
> 
> 
> INFO  - 2013-06-10 15:03:35.422; 
> org.apache.solr.common.cloud.ConnectionManager; Watcher 592147 
> [main-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager  ? 
> Watcher org.apache.solr.common.cloud.ConnectionManager@530d0eae 
> name:ZooKeeperConnection Watcher: . got event WatchedEvent 
> state:SyncConnected type:None path:null path:null type:None
> 
> 
> 
> INFO  - 2013-06-10 15:03:35.423; 
> org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status 
> change trigger but we are already closed
> 
> 592148 [main-EventThread] INFO  
> org.apache.solr.common.cloud.ConnectionManager  ? Client->ZooKeeper status 
> change trigger but we are already closed
> 
> 
> 
> After we bring up the ZooKeeper instance, the node never connects to ZooKeeper 
> and we can't see the Solr admin page until we restart the node.
> 
> 
> 
> Does the ZooKeeper instance have to be up when we bring up a Solr node? That's 
> not what the documentation says, though.
> 
> 
> 
> Thanks.



Re: index merge question

2013-06-10 Thread Jamie Johnson
Thanks Mark.  My question stems from the new Cloudera Search stuff.
My concern is that if, while rebuilding the index, someone updates a doc,
that update could be lost from a Solr perspective.  I guess what would need
to happen to ensure the correct information was indexed would be to record
the start time and reindex the information that changed since then?
On Jun 8, 2013 2:37 PM, "Mark Miller"  wrote:

>
> On Jun 8, 2013, at 12:52 PM, Jamie Johnson  wrote:
>
> > When merging through the core admin (
> > http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy for
> > conflicts during the merge?  So for instance if I am merging core 1 and
> > core 2 into core 0 (first example), what happens if core 1 and core 2
> both
> > have a document with the same key, say core 1 has a newer version of core
> > 2?  Does the merge fail, does the newer document remain?
>
> You end up with both documents, both with that ID - not generally a
> situation you want to end up in. You need to ensure unique id's in the
> input data or replace the index rather than merging into it.
>
> >
> > Also if using the srcCore method if a document with key 1 is written
> while
> > an index also with key 1 is being merged what happens?
>
> It depends on the order I think - if the doc is written after the merge
> and it's an update, it will update the doc that was just merged in. If the
> merge comes second, you have the doc twice and it's a problem.
>
> - Mark


Field Names

2013-06-10 Thread PeriS

I was wondering if there was a way to define field names that are more or less 
dynamic in nature but follow a regular expression pattern. I know you can have 
an asterisk either as a prefix or a suffix, but not both, and not somewhere in 
the middle of a name.

Goal: to define a field that takes a form like 10*_*, which would translate 
to 100_whatever through 109_whatever. Any ideas, please?

Thanks
-PeriS 



Re: Field Names

2013-06-10 Thread Gora Mohanty
On 11 June 2013 07:24, PeriS  wrote:
>
> I was wondering if there was a way to define field names that are more or less 
> dynamic in nature but follow a regular expression pattern. I know you can 
> have an asterisk either as a prefix or a suffix, but not both, and not somewhere 
> in the middle of a name.
>

No. As http://wiki.apache.org/solr/SchemaXml#Dynamic_fields notes
"*" can appear only at the beginning or the end.

> Goal: to define a field that takes a form like 10*_*, which would 
> translate to 100_whatever through 109_whatever. Any ideas, please?

What is wrong with, e.g.,
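
(The dynamicField example seems to have been eaten by the list archive;
presumably it was something along the lines of a plain prefix pattern. The
attribute values below are only an illustration:)

<dynamicField name="10*" type="string" indexed="true" stored="true"/>  <!-- illustrative; choose the type to suit the data -->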


Regards,
Gora


Re: Lucene/Solr Filesystem tunings

2013-06-10 Thread Ryan Zezeski
Just to add to the pile...use the Deadline or NOOP I/O scheduler.

-Z


On Sat, Jun 8, 2013 at 4:40 PM, Mark Miller  wrote:

> Turning swappiness down to 0 can have some decent performance impact.
>
> - http://en.wikipedia.org/wiki/Swappiness
>
> In the past, I've seen better performance with ext3 over ext4 around
> commits/fsync. Tests were actually enough slower (lots of these operations),
> that I made a special ext3 partition workspace for lucene/solr dev. (Still
> use ext4 for root and home).
>
> Have not checked that recently, and it may not be a large concern for many
> use cases.
>
> - Mark
>
> On Jun 4, 2013, at 6:48 PM, Tim Vaillancourt  wrote:
>
> > Hey all,
> >
> > Does anyone have any advice or special filesytem tuning to share for
> Lucene/Solr, and which file systems they like more?
> >
> > Also, does Lucene/Solr care about access times if I turn them off (I
> think I doesn't care)?
> >
> > A bit unrelated: What are people's opinions on reducing some consistency
> things like filesystem journaling, etc (ext2?) due to SolrCloud's
> additional HA with replicas? How about RAID 0 x 3 replicas or so?
> >
> > Thanks!
> >
> > Tim Vaillancourt
>
>


Re: Field Names

2013-06-10 Thread Jack Krupansky

One idea: DON'T DO IT!

Seriously, if you find yourself trying to "play games" with field names, it 
says that you probably have a data model that is grossly out of line with 
the strengths (and weaknesses) of Solr.


Dynamic fields are fine - when used in moderation, but not when pushed to 
extremes.


What are you really trying to model and why does it seem to depend on 
dynamic fields?


In particular, how would users and the application query those dynamic 
fields?


-- Jack Krupansky

-Original Message- 
From: PeriS

Sent: Monday, June 10, 2013 9:54 PM
To: solr-user@lucene.apache.org
Subject: Field Names


I was wondering if there was a way to define field names that are more or less 
dynamic in nature but follow a regular expression pattern. I know you can 
have an asterisk either as a prefix or a suffix, but not both, and not somewhere 
in the middle of a name.


Goal: to define a field that takes a form like 10*_*, which would 
translate to 100_whatever through 109_whatever. Any ideas, please?


Thanks
-PeriS 



shard splitting

2013-06-10 Thread Mingfeng Yang
From the Solr wiki, I saw this command (
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=&shard=shardId)
which splits one index into 2 shards.  However, is there some way to split
into more shards?

Thanks,
Ming-


reg: efficient querying using solr

2013-06-10 Thread gururaj kosuru
Hello,
 I have recently started using solr 3.4 and have a standalone
system deployed that has 40,000,000 data rows with 3 indexed fields
totalling around 9 GB. I have given a Heap size of 2GB and I run the
instance on Tomcat on an i7 system with 4 GB RAM. My queries involve
searching among the indexed fields using the keyword 'OR' to search for
multiple words in a single field and the keyword 'AND' to get the
intersection of multiple matches from different fields. The problem I face
is an out of memory exception that happens when the solr core has been
queried for a long time and I am forced to restart the solr instance. My
primary questions are :

1. Can I make any changes in my query? I have noticed that if I divide my
query into parts, e.g. if my query has "A and B and C", executing only A
gives 75,000 results. However, if I run the whole query, I get an out
of memory error in SegmentNorms.java at line 156, which allocates a new byte
array of size count.

2. Does my index need to be able to fit into the RAM in one go?

3. Will moving to Solr Cloud solve the out of memory exception issue and if
so, what will be the ratio of my RAM size to the shard size that should be
used?

Thanks a lot,
Gururaj


Re: shard splitting

2013-06-10 Thread Shalin Shekhar Mangar
No, it is hard coded to split into two shards only. You can call it
recursively on a sub shard to split into more pieces. Please note that some
serious bugs were found in that command which will be fixed in the next
(4.3.1) release of Solr.
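
For example, to go from one shard to four on a collection (the collection name
below is a placeholder; the sub-shard names follow the shard1_0 / shard1_1
convention that SPLITSHARD uses):

http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1_0
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1_1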


On Tue, Jun 11, 2013 at 9:43 AM, Mingfeng Yang wrote:

> From the solr wiki, I saw this command (
> http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=
> &shard=shardId)
> which split one index into 2 shards.  However, is there someway to split
> into more shards?
>
> Thanks,
> Ming-
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: reg: efficient querying using solr

2013-06-10 Thread Walter Underwood
2GB is a rather small heap. Our production systems run with 8GB and smaller 
indexes than that. Our dev and test systems run with 6GB heaps.
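
(For a Tomcat install, the usual place to raise the heap is CATALINA_OPTS, e.g.
in bin/setenv.sh; the numbers below are only an illustration and have to fit in
the machine's physical RAM:)

CATALINA_OPTS="$CATALINA_OPTS -Xms6g -Xmx6g"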

wunder

On Jun 10, 2013, at 9:52 PM, gururaj kosuru wrote:

> Hello,
> I have recently started using solr 3.4 and have a standalone
> system deployed that has 40,000,000 data rows with 3 indexed fields
> totalling around 9 GB. I have given a Heap size of 2GB and I run the
> instance on Tomcat on an i7 system with 4 GB RAM. My queries involve
> searching among the indexed fields using the keyword 'OR' to search for
> multiple words in a single field and the keyword 'AND' to get the
> intersection of multiple matches from different fields. The problem I face
> is an out of memory exception that happens when the solr core has been
> queried for a long time and I am forced to restart the solr instance. My
> primary questions are :
> 
> 1. Can I make any changes in my query as I have noticed that if I divide my
> query into parts eg- if my query has "A and B and C", executing only A
> gives out 75,000 results. However, If I run the whole query, I get an out
> of memory error in SegmentNorms.java at line 156 which allocates a new byte
> array of size count.
> 
> 2. Does my index need to be able to fit into the RAM in one go?
> 
> 3. Will moving to Solr Cloud solve the out of memory exception issue and if
> so, what will be the ratio of my RAM size to the shard size that should be
> used?
> 
> Thanks a lot,
> Gururaj

--
Walter Underwood
wun...@wunderwood.org





Re: shard splitting

2013-06-10 Thread Mingfeng Yang
Hi Shalin,

Do you mean that we can do 1->2, 2->4, 4->8 to get 8 shards eventually?

After splitting, if we want to set up a solrcloud with all 8 shards, how
shall we allocate the shards then?

Thanks,
Ming-


On Mon, Jun 10, 2013 at 9:55 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> No, it is hard coded to split into two shards only. You can call it
> recursively on a sub shard to split into more pieces. Please note that some
> serious bugs were found in that command which will be fixed in the next
> (4.3.1) release of Solr.
>
>
> On Tue, Jun 11, 2013 at 9:43 AM, Mingfeng Yang  >wrote:
>
> > From the solr wiki, I saw this command (
> >
> http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=
> > &shard=shardId)
> > which split one index into 2 shards.  However, is there someway to split
> > into more shards?
> >
> > Thanks,
> > Ming-
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: reg: efficient querying using solr

2013-06-10 Thread gururaj kosuru
Hi Walter,
 thanks for replying. Do you mean that it is necessary for
the index to fit into the heap? if so, will a heap size that is greater
than the actual RAM size slow down the queries?

Thanks,
Gururaj


On 11 June 2013 10:36, Walter Underwood  wrote:

> 2GB is a rather small heap. Our production systems run with 8GB and
> smaller indexes than that. Our dev and test systems run with 6GB heaps.
>
> wunder
>
> On Jun 10, 2013, at 9:52 PM, gururaj kosuru wrote:
>
> > Hello,
> > I have recently started using solr 3.4 and have a standalone
> > system deployed that has 40,000,000 data rows with 3 indexed fields
> > totalling around 9 GB. I have given a Heap size of 2GB and I run the
> > instance on Tomcat on an i7 system with 4 GB RAM. My queries involve
> > searching among the indexed fields using the keyword 'OR' to search for
> > multiple words in a single field and the keyword 'AND' to get the
> > intersection of multiple matches from different fields. The problem I
> face
> > is an out of memory exception that happens when the solr core has been
> > queried for a long time and I am forced to restart the solr instance. My
> > primary questions are :
> >
> > 1. Can I make any changes in my query as I have noticed that if I divide
> my
> > query into parts eg- if my query has "A and B and C", executing only A
> > gives out 75,000 results. However, If I run the whole query, I get an out
> > of memory error in SegmentNorms.java at line 156 which allocates a new
> byte
> > array of size count.
> >
> > 2. Does my index need to be able to fit into the RAM in one go?
> >
> > 3. Will moving to Solr Cloud solve the out of memory exception issue and
> if
> > so, what will be the ratio of my RAM size to the shard size that should
> be
> > used?
> >
> > Thanks a lot,
> > Gururaj
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


Re: reg: efficient querying using solr

2013-06-10 Thread Walter Underwood
An index does not need to fit into the heap. But a 4GB machine is almost 
certainly too small to run Solr with 40 million documents.

wunder

On Jun 10, 2013, at 10:36 PM, gururaj kosuru wrote:

> Hi Walter,
> thanks for replying. Do you mean that it is necessary for
> the index to fit into the heap? if so, will a heap size that is greater
> than the actual RAM size slow down the queries?
> 
> Thanks,
> Gururaj
> 
> 
> On 11 June 2013 10:36, Walter Underwood  wrote:
> 
>> 2GB is a rather small heap. Our production systems run with 8GB and
>> smaller indexes than that. Our dev and test systems run with 6GB heaps.
>> 
>> wunder
>> 
>> On Jun 10, 2013, at 9:52 PM, gururaj kosuru wrote:
>> 
>>> Hello,
>>>I have recently started using solr 3.4 and have a standalone
>>> system deployed that has 40,000,000 data rows with 3 indexed fields
>>> totalling around 9 GB. I have given a Heap size of 2GB and I run the
>>> instance on Tomcat on an i7 system with 4 GB RAM. My queries involve
>>> searching among the indexed fields using the keyword 'OR' to search for
>>> multiple words in a single field and the keyword 'AND' to get the
>>> intersection of multiple matches from different fields. The problem I
>> face
>>> is an out of memory exception that happens when the solr core has been
>>> queried for a long time and I am forced to restart the solr instance. My
>>> primary questions are :
>>> 
>>> 1. Can I make any changes in my query as I have noticed that if I divide
>> my
>>> query into parts eg- if my query has "A and B and C", executing only A
>>> gives out 75,000 results. However, If I run the whole query, I get an out
>>> of memory error in SegmentNorms.java at line 156 which allocates a new
>> byte
>>> array of size count.
>>> 
>>> 2. Does my index need to be able to fit into the RAM in one go?
>>> 
>>> 3. Will moving to Solr Cloud solve the out of memory exception issue and
>> if
>>> so, what will be the ratio of my RAM size to the shard size that should
>> be
>>> used?
>>> 
>>> Thanks a lot,
>>> Gururaj
>> 
>> --
>> Walter Underwood
>> wun...@wunderwood.org
>> 
>> 
>> 
>> 

--
Walter Underwood
wun...@wunderwood.org





Re: reg: efficient querying using solr

2013-06-10 Thread gururaj kosuru
How can one calculate an ideal max shard size for a solr core instance if I
am running a cloud with multiple systems of 4GB?

Thanks


On 11 June 2013 11:18, Walter Underwood  wrote:

> An index does not need to fit into the heap. But a 4GB machine is almost
> certainly too small to run Solr with 40 million documents.
>
> wunder
>
> On Jun 10, 2013, at 10:36 PM, gururaj kosuru wrote:
>
> > Hi Walter,
> > thanks for replying. Do you mean that it is necessary for
> > the index to fit into the heap? if so, will a heap size that is greater
> > than the actual RAM size slow down the queries?
> >
> > Thanks,
> > Gururaj
> >
> >
> > On 11 June 2013 10:36, Walter Underwood  wrote:
> >
> >> 2GB is a rather small heap. Our production systems run with 8GB and
> >> smaller indexes than that. Our dev and test systems run with 6GB heaps.
> >>
> >> wunder
> >>
> >> On Jun 10, 2013, at 9:52 PM, gururaj kosuru wrote:
> >>
> >>> Hello,
> >>>I have recently started using solr 3.4 and have a standalone
> >>> system deployed that has 40,000,000 data rows with 3 indexed fields
> >>> totalling around 9 GB. I have given a Heap size of 2GB and I run the
> >>> instance on Tomcat on an i7 system with 4 GB RAM. My queries involve
> >>> searching among the indexed fields using the keyword 'OR' to search for
> >>> multiple words in a single field and the keyword 'AND' to get the
> >>> intersection of multiple matches from different fields. The problem I
> >> face
> >>> is an out of memory exception that happens when the solr core has been
> >>> queried for a long time and I am forced to restart the solr instance.
> My
> >>> primary questions are :
> >>>
> >>> 1. Can I make any changes in my query as I have noticed that if I
> divide
> >> my
> >>> query into parts eg- if my query has "A and B and C", executing only A
> >>> gives out 75,000 results. However, If I run the whole query, I get an
> out
> >>> of memory error in SegmentNorms.java at line 156 which allocates a new
> >> byte
> >>> array of size count.
> >>>
> >>> 2. Does my index need to be able to fit into the RAM in one go?
> >>>
> >>> 3. Will moving to Solr Cloud solve the out of memory exception issue
> and
> >> if
> >>> so, what will be the ratio of my RAM size to the shard size that should
> >> be
> >>> used?
> >>>
> >>> Thanks a lot,
> >>> Gururaj
> >>
> >> --
> >> Walter Underwood
> >> wun...@wunderwood.org
> >>
> >>
> >>
> >>
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


Re: shard splitting

2013-06-10 Thread Shalin Shekhar Mangar
Hi Ming,

Yes, that's exactly what I meant. Referring to your last email about
SolrEntityProcessor -- If you're trying to migrate from a 3.x installation
to SolrCloud, then I think that you should create a SolrCloud installation
with numShards=1 and copy over your previous (3.x) index. Then you can use
shard splitting to increase the number of shards.

On Tue, Jun 11, 2013 at 10:53 AM, Mingfeng Yang wrote:

> Hi Shalin,
>
> Do you mean that we can do 1->2, 2->4, 4->8 to get 8 shards eventually?
>
> After splitting, if we want to set up a solrcloud with all 8 shards, how
> shall we allocate the shards then?
>
> Thanks,
> Ming-
>
>
> On Mon, Jun 10, 2013 at 9:55 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > No, it is hard coded to split into two shards only. You can call it
> > recursively on a sub shard to split into more pieces. Please note that
> some
> > serious bugs were found in that command which will be fixed in the next
> > (4.3.1) release of Solr.
> >
> >
> > On Tue, Jun 11, 2013 at 9:43 AM, Mingfeng Yang  > >wrote:
> >
> > > From the solr wiki, I saw this command (
> > >
> >
> http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=
> > > &shard=shardId)
> > > which split one index into 2 shards.  However, is there someway to
> split
> > > into more shards?
> > >
> > > Thanks,
> > > Ming-
> > >
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>



-- 
Regards,
Shalin Shekhar Mangar.