Re: spellcheck causing Core Reload to hang
Hi,

Basically, it hangs only on "Core Reload" and not during queries. Furthermore, there is never any error reported in the logs; in fact, the log only records up until the Core Reload call. If I shut down and restart Solr, the next time it won't start, and still no errors in the log.

On Sat, Sep 14, 2013 at 1:53 AM, Chris Hostetter wrote:
>
> : after a lot of investigation today, I found that its the spellcheck
> : component which is causing the issue. If its turned off, all will run well
> : and core can easily reload. However, when the spellcheck is on, the core
> : wont reload instead hang forever.
>
> Can you take some stack traces while the server is hung?
>
> Do you have any firstSearcher or newSearcher warming queries configured?
> If so can you try adding "spellcheck=false" to those warming queries and
> see if it eliminates the problem?
>
> Smells like this thread...
>
> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201309.mbox/%3Calpine.DEB.2.02.1309061149310.10818@frisbee%3E
>
> ...would be good to get a jira open with a reproducible set of configs
> that demonstrates the problem semi-reliably..
>
> -Hoss

--
Regards,
Raheel Hasan
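For anyone hitting the same thing, Hoss's warming-query suggestion would translate into something roughly like the following solrconfig.xml entry. This is only a sketch -- the query string and field name are made-up placeholders, not taken from the poster's actual configuration:

  <!-- Sketch: a warming query with spellchecking explicitly disabled.
       "some_field:warmup" is a placeholder query, not the real one. -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">some_field:warmup</str>
        <str name="spellcheck">false</str>
      </lst>
    </arr>
  </listener>

Every parameter inside the <lst> is simply passed along with the warming request, so the same spellcheck=false can be added to a firstSearcher listener too.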
Re: spellcheck causing Core Reload to hang
Yes I have tried Spellcheck=false and with that everything works just fine. But I do need Spell check component so I cant just leave it off. On Mon, Sep 16, 2013 at 12:24 PM, Raheel Hasan wrote: > Hi, > > Basically, it hangs only on "core Reload" and not during queries. > Furthermore, there is never any error reported in the logs, in fact the log > only records until Core-Reload call. If I shut down and restart Solr, the > next time it wont start, and still no errors in the log. > > > > > On Sat, Sep 14, 2013 at 1:53 AM, Chris Hostetter > wrote: > >> >> : after a lot of investigation today, I found that its the spellcheck >> : component which is causing the issue. If its turned off, all will run >> well >> : and core can easily reload. However, when the spellcheck is on, the core >> : wont reload instead hang forever. >> >> Can you take some stack traces while the server is hung? >> >> Do you have any firstSearcher or newSearcher warming queries configured? >> If so can you try adding "spellcheck=false" to those warming queries and >> see if it eliminates the problem? >> >> Smells like this thread... >> >> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201309.mbox/%3Calpine.DEB.2.02.1309061149310.10818@frisbee%3E >> >> >> ...would be good to get a jira open with a reproducible set of configs >> that demonstrates the problem semi-reliably.. >> >> >> -Hoss >> > > > > -- > Regards, > Raheel Hasan > -- Regards, Raheel Hasan
Re: what does "UnInvertedField; UnInverted multi-valued field" means and how to fix it
Hey, thanks for the reply. So after a full day spent only on trying to figure this out, I have found the cause (the spellcheck component)... but not the solution. See my other post with the subject "spellcheck causing Core Reload to hang". I have explained it there. Thanks a lot.

On Sun, Sep 15, 2013 at 2:35 AM, Erick Erickson wrote:
> This is totally weird. Can you give us the exact
> command you are using?
>
> Best
> Erick
>
> On Fri, Sep 13, 2013 at 8:15 AM, Raheel Hasan wrote:
>
> > Hi guys,
> >
> > I have an issue here in between Solr Core and Data Indexing:
> >
> > When I build some index from a fresh setup, everything is fine: all queries
> > and additional/update indexing, everything runs fine. But when I reload
> > the Core, Solr stops from that point onward, forever.
> >
> > All I get is this line as the last line of the Solr log after the issue has
> > occurred:
> >
> > UnInvertedField; UnInverted multi-valued field
> > {field=prod_cited_id,memSize=4880,tindexSize=40,time=4,phase1=4,nTerms=35,bigTerms=4,termInstances=36,uses=0}
> >
> > Furthermore, the only way to get things working again would be to delete
> > the "data" folder inside "solr/{myCore}/"...
> >
> > So can anyone help me beat this issue and get things working again? I can't
> > afford this issue when the system is LIVE..
> >
> > Thanks a lot.
> >
> > --
> > Regards,
> > Raheel Hasan

--
Regards,
Raheel Hasan
Re: spellcheck causing Core Reload to hang
Please see the log (after solr restart) in the other msg I posted on this forum with the subject: "*Unable to connect" to "http://localhost:8983/solr/ *" Thanks. On Mon, Sep 16, 2013 at 12:25 PM, Raheel Hasan wrote: > Yes I have tried Spellcheck=false and with that everything works just > fine. But I do need Spell check component so I cant just leave it off. > > > On Mon, Sep 16, 2013 at 12:24 PM, Raheel Hasan > wrote: > >> Hi, >> >> Basically, it hangs only on "core Reload" and not during queries. >> Furthermore, there is never any error reported in the logs, in fact the log >> only records until Core-Reload call. If I shut down and restart Solr, the >> next time it wont start, and still no errors in the log. >> >> >> >> >> On Sat, Sep 14, 2013 at 1:53 AM, Chris Hostetter < >> hossman_luc...@fucit.org> wrote: >> >>> >>> : after a lot of investigation today, I found that its the spellcheck >>> : component which is causing the issue. If its turned off, all will run >>> well >>> : and core can easily reload. However, when the spellcheck is on, the >>> core >>> : wont reload instead hang forever. >>> >>> Can you take some stack traces while the server is hung? >>> >>> Do you have any firstSearcher or newSearcher warming queries configured? >>> If so can you try adding "spellcheck=false" to those warming queries and >>> see if it eliminates the problem? >>> >>> Smells like this thread... >>> >>> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201309.mbox/%3Calpine.DEB.2.02.1309061149310.10818@frisbee%3E >>> >>> >>> ...would be good to get a jira open with a reproducible set of configs >>> that demonstrates the problem semi-reliably.. >>> >>> >>> -Hoss >>> >> >> >> >> -- >> Regards, >> Raheel Hasan >> > > > > -- > Regards, > Raheel Hasan > -- Regards, Raheel Hasan
Re: solr/document/select not available
If you have two cores, then the core name should be in your URL:

http://host:8983/solr/<corename>/select?q=blah

Or you can set a default core in solr.xml.

Upayavira

On Sun, Sep 15, 2013, at 12:16 PM, Nutan wrote:
> I get this error: solr/select not available. I am using two cores, document
> and contract. Solrconfig.xml of the document core is:
>
> [solrconfig.xml snippet lost in the archive: it showed luceneMatchVersion
> LUCENE_42, a dataDir of ${solr.collection1.data.dir:}, a requestDispatcher
> section, a default "/select" SearchHandler (echoParams=explicit, rows=20),
> and an ExtractingRequestHandler mapping last_modified and contents with
> uprefix ignored_]
>
> I have defined a standard request handler but still why do I get this
> error?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-document-select-not-available-tp4090171.html
> Sent from the Solr - User mailing list archive at Nabble.com.
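For the "default core in solr.xml" option mentioned above, the legacy solr.xml format still used in this Solr version supports a defaultCoreName attribute on the <cores> element. A rough sketch, assuming the two cores are named "document" and "contract" (the instanceDir values are guesses):

  <solr persistent="true">
    <cores adminPath="/admin/cores" defaultCoreName="document">
      <core name="document" instanceDir="document" />
      <core name="contract" instanceDir="contract" />
    </cores>
  </solr>

With that in place, /solr/select?q=... would go to the "document" core, while /solr/contract/select?q=... still addresses the other core explicitly.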
Re: Storing/indexing speed drops quickly
On Fri, 2013-09-13 at 17:32 +0200, Shawn Heisey wrote:
> Put your OS and Solr itself on regular disks in RAID1 and your Solr data
> on the SSD. Due to the eventual decay caused by writes, SSD will
> eventually die, so be ready for SSD failures to take out shard replicas.

One of the very useful properties of wear-levelling on SSDs is that the wear status of the drive can be queried. When the drive nears its EOL, replace it.

As Lucene mainly uses bulk writes when updating the index, I will add that wearing out an SSD by using it primarily for Lucene/Solr is pretty hard to do, unless one constructs a pathological setup.

Your failure argument is thus really a claim that SSDs are not reliable technology. That is a fair argument, as there have been some really rotten apples among the offerings. This is coupled with the fact that it is still a very rapidly changing technology, which makes it hard to pick an older, proven drive that is not markedly surpassed by the bleeding edge.

> So far I'm not aware of any RAID solutions that offer TRIM support,
> and without TRIM support, an SSD eventually has performance problems.

Search speed is not affected, as "only" write performance suffers without TRIM, but index update speed will be affected. Also, while it is possible to get TRIM in RAID, there is currently only a single hardware option:
http://www.anandtech.com/show/6161/intel-brings-trim-to-raid0-ssd-arrays-on-7series-motherboards-we-test-it

Regards,
- Toke Eskildsen, State and University Library, Denmark
Stop zookeeper from batch
Hi,

We have set up SolrCloud with ZooKeeper and 2 Tomcats. We are using a batch file to start ZooKeeper, upload/link the config files and start the Tomcats. Now I need to stop ZooKeeper from the batch file. How is this possible? I am using Windows Server, ZooKeeper version 3.4.5.

Please help.

Thanks,
Prasi
How to make Solr complex Join Query patch in java
Hi,

Does anyone have any idea how to write a patch in Java which will add support for complex join queries in Solr? I have the Solr source code. If you have any sample code for this, please share it with me.

Thanks
Ashim

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-make-Solr-complex-Join-Query-patch-in-java-tp4090314.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Spellcheck compounded words
Hi guys,

Did anyone solve this issue? I am having it also. It took me 3 days to figure out exactly that it is coming from "spellcheck.maxCollationTries"... Even with 1 it hangs forever. The only way to restart is to stop Solr, delete the "data" folder and then start Solr again (i.e. the index is lost!).

Regards,
Raheel

--
View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p4090320.html
Sent from the Solr - User mailing list archive at Nabble.com.
2 question about solr and lucene
Hi guys,

I have run into two questions about Solr and Lucene, and would appreciate some help.

1. Payload queries do NOT work in combination with a numerical field type. For example, I implemented my own request handler, following
http://hnagtech.wordpress.com/2013/04/19/using-payloads-with-solr-4-x/

I query in Solr: sinaTag:operate

Solr responds:

"numFound": 2, "start": 0, "maxScore": 99, "docs": [
  { "id": "1628209010",
    "followersCount": 752,
    "sinaTag": "operate|99 C2C|98 B2C|97 OnlineShopping|96 E-commercial|94",
    "score": 99 },
  { "id": "1900546410",
    "followersCount": 1002,
    "sinaTag": "Startup|99 Benz|98 PublicRelation|97 operate|96 Activity|95 Media|94 AD|93 Vehicle|92 ",
    "score": 96 } ]

This works well. But a query combined with another numerical condition, such as:

sinaTag:operate and followersCount:[752 TO 752]

returns:

{ "responseHeader": { "status": 0, "QTime": 40 },
  "response": { "numFound": 0, "start": 0, "maxScore": 0, "docs": [] } }

According to this dataset, the first record should be returned rather than NOT FOUND. I do not know why.

2. About string-field fuzzy match filtering: how do I get the score? What is the formula? When I use two or several string fuzzy matches, possibly combined with AND or OR, how is the score computed? And may I implement my own score formula class -- which interface or abstract class should I extend?

Thanks in advance.
Re-Ranking results based on DocValues with custom function.
Hi! I'm having quite an index with a lot of text and some binary data in the documents (numeric vectors of arbitrary size with associated dissimilarity functions). What I want to do is to search using common text search and then (optionally) re-rank using some custom function like http://localhost:8983/solr/select?q=*:*&sort=myCustomFunction(var1) asc I've seen that there are hooks in solrconfig.xml, but I did not find an example or some documentation. I'd be most grateful if anyone could either point me to one or give me a hint for another way to go :) Btw. Using just the DocValues for search is handled by a custom RequestHandler, which works great, but using text as a main search feature, and my DocValues for re-ranking, I'd rather just add a function for sorting and use the current, stable and well performing request handler. cheers, Mathias ps. a demo of the current system is available at: http://demo-itec.uni-klu.ac.at/liredemo/ -- Dr. Mathias Lux Assistant Professor, Klagenfurt University, Austria http://tinyurl.com/mlux-itec
RE: Spellcheck compounded words
Which version of Solr are you running? (the post you replied to was about Solr 3.3, but the latest version now is 4.4.) Please provide configuration details and the query you are running that causes the problem. Also explain exactly what the problem is (query never returns?). Also explain why you have to delete the "data" dir when you restart. With a little background information, maybe someone can help. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Rah1x [mailto:raheel_itst...@yahoo.com] Sent: Monday, September 16, 2013 5:47 AM To: solr-user@lucene.apache.org Subject: Re: Spellcheck compounded words Hi guyz, Did anyone solve this issue? I am having it also, it took me 3 days to exactly figure it out that its coming from "spellcheck.maxCollationTries"... Even with 1 it hangs forewver. The only way to restart is to stop solr, delete "data" folder and then start solr again (i.e. index lost !). Regards, Raheel -- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p4090320.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Frequent softCommits leading to high faceting times?
Soft commits are not free, they invalidate certain caches which then have to be reloaded. I suspect you're hitting this big time. The question is always "do you really, really _need_ 1 second latency?". Set the soft commit interval to be as long as your application can stand IMO. And it may have nothing to do with facets, since things like your filterCache autowarming is done on soft commit. Ditto for the other caches you've configured in solrconfig.xml. But one thing to watch is the size of your tlog. Transaction logs are only truncated on hard commits, and can get replayed on restart. So you're risking long restart times here. See: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ FWIW, Erick On Mon, Sep 16, 2013 at 1:43 AM, Rohit Kumar wrote: > Hi, > > We are running *SOLR 4.3* with 8 Gb of index on > > Ubuntu 12.04 64 bits > Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz Single core. > 16GB RAM > > > We just started using the autoSoftCommit feature and noticed the facet > queries slowed down from milliseconds taking earlier to a minute. We have > *8 > facet fields*. > > We add close to 300 documents per second during peak interval. > > > 60 > false > > > > 1000 > > > > Here is some information i got with debugQuery. Please note that *facet > time is more than 50 seconds.* > > > 50779.0 > > 0.0 > > > 41.0 > > * > 50590.0 > * > > 0.0 > > > 0.0 > > > 0.0 > > > 5.0 > > > 143.0 > > > > Please help. > > Thanks, > Rohit Kumar >
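As a concrete illustration of Erick's advice, the relevant solrconfig.xml block could be shaped roughly like this -- the numbers are placeholders, not a recommendation for this particular index:

  <!-- Hard commit: flushes and truncates the tlog, does not open a searcher. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: opens a new searcher (and triggers cache warming),
       so set this interval as long as the application can stand. -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>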
how soft-commit works
Can anyone explain the following things about soft commit to me?

- For searches to access new documents, I think a new searcher is opened after a soft commit. How does the near-real-time requirement for soft commit match with the potentially long time taken to warm up caches for the new searcher?

- Is it a good idea to set openSearcher=false in auto commit and rely on soft auto commit to see new data in searches?

thanks
Matteo Grolla
Re: How to make Solr complex Join Query patch in java
I'd start by taking a very hard look at my data model and seeing if I can redefine the model (as translated from the DB you may be coming from). Solr does not excel at what RDBMSs are designed to do. Really. I predict that if you just try to make Solr into an RDBMS, you'll expend a lot of effort and not be satisfied with the results.

For instance, do you expect to support joins across shards, i.e. distributed support? What about "block joins"? Sub-selects (again, if so, what about distributed)? Grouping?

But if you absolutely insist on trying this, look at the existing Join code. Take a look through any classes in the Solr/Lucene source tree starting with either Join or BlockJoin. Note that they are two very different capabilities, so be aware of that as you look through them.

Best,
Erick

On Mon, Sep 16, 2013 at 6:18 AM, ashimbose wrote:

> Hi,
>
> Does anyone have any idea how to write a patch in Java which will add support
> for complex join queries in Solr? I have the Solr source code. If you have any
> sample code for this, please share it with me.
>
> Thanks
> Ashim
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-make-Solr-complex-Join-Query-patch-in-java-tp4090314.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: 2 question about solr and lucene
Are you saying you're trying to put payloads on the numeric data? If that's the case, I don't know how that works. But a couple of things:

sinaTag:operate and followersCount:[752 TO 752]

is incorrect; you must capitalize the AND, as in

sinaTag:operate AND followersCount:[752 TO 752]

Your syntax _should_ work since [] is inclusive, but I'd just try with [751 TO 753] once to be sure.

Attach &debug=all to your query and you'll see exactly how the query is parsed. You'll also see exactly how the scores are calculated in a long, complex bit of output. I _think_ that fuzzy and wildcards do a "constant score query", so don't be surprised if the calculations show you that the fuzzy matches don't change the score.

Best,
Erick

On Mon, Sep 16, 2013 at 3:08 AM, Robin Wei wrote:

> Hi, guys:
> I met two questions about solr and lucene, wish people to help out.
>
> 1. use payload query but can NOT with numerical field type. for example:
> I implemented my own requesthandler, refer to
> http://hnagtech.wordpress.com/2013/04/19/using-payloads-with-solr-4-x/
> I query in solr: sinaTag:operate
> solr response:
>
> "numFound": 2, "start": 0, "maxScore": 99, "docs": [
> { "id": "1628209010",
> "followersCount": 752,
> "sinaTag": "operate|99 C2C|98 B2C|97 OnlineShopping|96 E-commercial|94",
> "score": 99 },
> { "id": "1900546410",
> "followersCount": 1002,
> "sinaTag": "Startup|99 Benz|98 PublicRelation|97 operate|96 Activity|95 Media|94 AD|93 Vehicle|92 ",
> "score": 96 } ]
>
> This works well.
> But a query combined with another numerical condition, such as:
> sinaTag:operate and followersCount:[752 TO 752]
> { "responseHeader": { "status": 0, "QTime": 40 },
> "response": { "numFound": 0, "start": 0, "maxScore": 0, "docs": [] } }
>
> According to this dataset, the first record should be returned rather
> than NOT FOUND. I do not know why.
>
> 2. About string field fuzzy match filtering, how to get the score? What
> is the formula?
> When I use two or several string fuzzy matches, possibly AND or OR,
> how to get the score? What is the formula?
> Might I implement my own score formula class -- which interface or
> abstract class to extend?
>
> Thanks in advance.
Slow query at first time
Hi,

I'm trying to make a search with Solr 4.4, but the first time, the search is too slow. I have studied pre-warm queries, but the query response time is the same after adding one. Can anyone help me? Here's a piece of solrconfig.xml (the listener XML tags were lost in the archive; the warming query it contained was):

codigoRoteiro:95240816
0
20

and in the schema.xml (field and uniqueKey tags likewise lost):

codigoRoteiro

When I start Solr, the following message is shown:

$ java -server -Xms2048m -Xmx4096m -Dsolr.solr.home="./oracleCore/solr" -jar start.jar
.
.
.
8233 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore - QuerySenderListener done.
8235 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore - [db] Registered new searcher Searcher@30b6b67d main{StandardDirectoryReader(segments_6:34 _f(4.4):C420060)}

And here's my SolrJ sample code:

SolrServer solrServer = new HttpSolrServer(solrServerUrl);

SolrQuery query = new SolrQuery();
query.setQuery("codigoRoteiro:95240816");
query.set("start", "0");
query.set("rows", "20");
query.addField("codigoRoteiro");
query.addField("rowidString");
query.addField("descricaoRoteiro");
query.addField("numeroDias");
query.addField("numeroNoites");
query.addField("dataSaida");

Date initialTime = new Date();
QueryResponse rsp = solrServer.query(query);
SolrDocumentList docs = rsp.getResults();
Date finalTime = new Date();
System.out.println("Total time: " + (finalTime.getTime() - initialTime.getTime()) + " ms");

The response time is around 200 ms. If I remove the pre-warm query, the response time doesn't change. Shouldn't the response time be lower when using a pre-warm query?

Thanks in advance,

--
Sergio Stateri Jr.
stat...@gmail.com
Re: Solr Java Client
On 16 September 2013 02:47, Baskar Sikkayan wrote: [...] > Have a question now. > > I know in solr its flat file system and the data will be in denormalized > form. > > My question : > > Have 3 tables, > > 1) user (userid, firstname, lastname, ...) > 2) master (masterid, skills, ...) > 3) child (childid, masterid, userid, ...) > > In solr, i have added all these field for each document. > > Example, > > childid,masterid,userid,skills,firstname,lastname > > Real Data Example, > > 1(childid),1(masterid),1(userid),"java,jsp","baskar","sks" > 2(childid),1(masterid),1(userid),"java,jsp","baskar","sks" > 3(childid),1(masterid),1(userid),"java,jsp","baskar","sks" As people have already advised you, the best way to decide how to organise your data in the Solr index depends on the searches that you want to make. This is not entirely clear from your description above. The flattening sample that you show above would be suitable if the user is to search by 'child' attributes, but can be simplified otherwise. > The above data sample is from solr document. > In my search result, i will have to show all these fields. > > User may change the name at any time.The same has to be updated in solr. > > In this case, i need to find all the child id that belongs to the user and > update the username with those child ids. > > Please tell me if there is any other better approach than this. How would you know that the user name has been changed? Is there a modification date for that table. If so, it would make sense to check that against the last time indexing to Solr was done. A DIH delta-import makes this straightforward. Updates as you suggest above would be the normal way to handle things. You should batch your updates, say by running an update script at periodic intervals. Regards, Gora
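To make the delta-import route a little more concrete, a minimal data-config.xml sketch could look like the following. The table and column names come from the schema described in this thread (child/user, childid, firstname, lastname); the JDBC details and the modified_at timestamp column are assumptions, and the skills/master join is omitted for brevity:

  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://dbhost/db" user="u" password="p"/>
    <document>
      <entity name="child"
              pk="childid"
              query="SELECT c.childid, c.masterid, c.userid, u.firstname, u.lastname
                     FROM child c JOIN user u ON u.userid = c.userid"
              deltaQuery="SELECT c.childid FROM child c JOIN user u ON u.userid = c.userid
                          WHERE u.modified_at &gt; '${dataimporter.last_index_time}'"
              deltaImportQuery="SELECT c.childid, c.masterid, c.userid, u.firstname, u.lastname
                                FROM child c JOIN user u ON u.userid = c.userid
                                WHERE c.childid = '${dataimporter.delta.childid}'">
      </entity>
    </document>
  </dataConfig>

A periodic /dataimport?command=delta-import then re-indexes only the child rows whose user record changed since the last run.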
Re: Spellcheck compounded words
Hi, I m running 4.3.. I have posted all the details in another threat... do you want me to copy it here? or could you see that? The subject is "*spellcheck causing Core Reload to hang*". On Mon, Sep 16, 2013 at 5:50 PM, Dyer, James wrote: > Which version of Solr are you running? (the post you replied to was about > Solr 3.3, but the latest version now is 4.4.) Please provide configuration > details and the query you are running that causes the problem. Also > explain exactly what the problem is (query never returns?). Also explain > why you have to delete the "data" dir when you restart. With a little > background information, maybe someone can help. > > James Dyer > Ingram Content Group > (615) 213-4311 > > -Original Message- > From: Rah1x [mailto:raheel_itst...@yahoo.com] > Sent: Monday, September 16, 2013 5:47 AM > To: solr-user@lucene.apache.org > Subject: Re: Spellcheck compounded words > > Hi guyz, > > Did anyone solve this issue? > > I am having it also, it took me 3 days to exactly figure it out that its > coming from "spellcheck.maxCollationTries"... > > Even with 1 it hangs > forewver. The only way to restart is to stop solr, delete "data" folder and > then start solr again (i.e. index lost !). > > Regards, > Raheel > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p4090320.html > Sent from the Solr - User mailing list archive at Nabble.com. > > > -- Regards, Raheel Hasan
Re: Solr Java Client
Hi Gora, Thanks a lot for your reply. "As people have already advised you, the best way to decide how to organise your data in the Solr index depends on the searches that you want to make. This is not entirely clear from your description above. The flattening sample that you show above would be suitable if the user is to search by 'child' attributes, but can be simplified otherwise." *Yes, the search is based on the child attributes.* "How would you know that the user name has been changed? Is there a modification date for that table. If so, it would make sense to check that against the last time indexing to Solr was done. A DIH delta-import makes this straightforward." *As of now, there is no special column to know if the username has been changed. * *But, whenever the user update his name, i can track that in my java code and send the update to Solr. * *Here, I am planning to use Solr java client. * *But, all the above things are possible with Java client and also with delta-import. * *I am looking for changing the solr data whenever there is a change in the database.* *Even there is a small delay i am fine with that. * *Which one you will suggest? * *Solr Java client or DIH delta-import * *My application running on server A, database on server B and solr will be on server C. * *If i am supposed to use, Solr Java client, i may need to hit the database sometimes to get some parent data and then need to send the same to Solr. * *Guess, its a unnecessary trip. * *So confused here, if i need to go with Java client or DIH delta import. * Thanks, Baskar.S On Mon, Sep 16, 2013 at 9:23 AM, Gora Mohanty wrote: > On 16 September 2013 02:47, Baskar Sikkayan wrote: > [...] > > Have a question now. > > > > I know in solr its flat file system and the data will be in denormalized > > form. > > > > My question : > > > > Have 3 tables, > > > > 1) user (userid, firstname, lastname, ...) > > 2) master (masterid, skills, ...) > > 3) child (childid, masterid, userid, ...) > > > > In solr, i have added all these field for each document. > > > > Example, > > > > childid,masterid,userid,skills,firstname,lastname > > > > Real Data Example, > > > > 1(childid),1(masterid),1(userid),"java,jsp","baskar","sks" > > 2(childid),1(masterid),1(userid),"java,jsp","baskar","sks" > > 3(childid),1(masterid),1(userid),"java,jsp","baskar","sks" > > As people have already advised you, the best way to decide > how to organise your data in the Solr index depends on the > searches that you want to make. This is not entirely clear > from your description above. The flattening sample that you > show above would be suitable if the user is to search by > 'child' attributes, but can be simplified otherwise. > > > The above data sample is from solr document. > > In my search result, i will have to show all these fields. > > > > User may change the name at any time.The same has to be updated in solr. > > > > In this case, i need to find all the child id that belongs to the user > and > > update the username with those child ids. > > > > Please tell me if there is any other better approach than this. > > How would you know that the user name has been changed? > Is there a modification date for that table. If so, it would make > sense to check that against the last time indexing to Solr was > done. A DIH delta-import makes this straightforward. > > Updates as you suggest above would be the normal way to handle > things. You should batch your updates, say by running an update > script at periodic intervals. > > Regards, > Gora >
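Whichever route is chosen (SolrJ client or DIH), the per-document change itself can be sent as an atomic update rather than re-posting the whole document, provided all fields in the schema are stored (a requirement for atomic updates). A hedged sketch of the XML update message, assuming childid is the uniqueKey and using made-up values:

  <add>
    <doc>
      <field name="childid">1</field>
      <field name="firstname" update="set">NewName</field>
      <field name="lastname" update="set">NewLastName</field>
    </doc>
    <doc>
      <field name="childid">2</field>
      <field name="firstname" update="set">NewName</field>
      <field name="lastname" update="set">NewLastName</field>
    </doc>
  </add>

Posted to /update, each doc only rewrites the named fields for that id, so one user's name change becomes one small update per child document.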
RE: Spellcheck compounded words
I would investigate Hoss's suggestion and look at warming queries. In some cases I've seen "maxCollationTries" in warming queries to cause a hang. Unless you're trying to build your spellcheck dictionary during warming, you can safely turn spellcheck off for all warming queries. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Raheel Hasan [mailto:raheelhasan@gmail.com] Sent: Monday, September 16, 2013 8:29 AM To: solr-user@lucene.apache.org Subject: Re: Spellcheck compounded words Hi, I m running 4.3.. I have posted all the details in another threat... do you want me to copy it here? or could you see that? The subject is "*spellcheck causing Core Reload to hang*". On Mon, Sep 16, 2013 at 5:50 PM, Dyer, James wrote: > Which version of Solr are you running? (the post you replied to was about > Solr 3.3, but the latest version now is 4.4.) Please provide configuration > details and the query you are running that causes the problem. Also > explain exactly what the problem is (query never returns?). Also explain > why you have to delete the "data" dir when you restart. With a little > background information, maybe someone can help. > > James Dyer > Ingram Content Group > (615) 213-4311 > > -Original Message- > From: Rah1x [mailto:raheel_itst...@yahoo.com] > Sent: Monday, September 16, 2013 5:47 AM > To: solr-user@lucene.apache.org > Subject: Re: Spellcheck compounded words > > Hi guyz, > > Did anyone solve this issue? > > I am having it also, it took me 3 days to exactly figure it out that its > coming from "spellcheck.maxCollationTries"... > > Even with 1 it hangs > forewver. The only way to restart is to stop solr, delete "data" folder and > then start solr again (i.e. index lost !). > > Regards, > Raheel > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p4090320.html > Sent from the Solr - User mailing list archive at Nabble.com. > > > -- Regards, Raheel Hasan
Re: Spellcheck compounded words
I am building it on Commit.. true Please see my other thread for all Logs and Schema + Solrconfig settings. On Mon, Sep 16, 2013 at 7:03 PM, Dyer, James wrote: > I would investigate Hoss's suggestion and look at warming queries. In > some cases I've seen "maxCollationTries" in warming queries to cause a > hang. Unless you're trying to build your spellcheck dictionary during > warming, you can safely turn spellcheck off for all warming queries. > > James Dyer > Ingram Content Group > (615) 213-4311 > > > -Original Message- > From: Raheel Hasan [mailto:raheelhasan@gmail.com] > Sent: Monday, September 16, 2013 8:29 AM > To: solr-user@lucene.apache.org > Subject: Re: Spellcheck compounded words > > Hi, > > I m running 4.3.. > > I have posted all the details in another threat... do you want me to copy > it here? or could you see that? The subject is "*spellcheck causing Core > Reload to hang*". > > > > > On Mon, Sep 16, 2013 at 5:50 PM, Dyer, James > wrote: > > > Which version of Solr are you running? (the post you replied to was about > > Solr 3.3, but the latest version now is 4.4.) Please provide > configuration > > details and the query you are running that causes the problem. Also > > explain exactly what the problem is (query never returns?). Also explain > > why you have to delete the "data" dir when you restart. With a little > > background information, maybe someone can help. > > > > James Dyer > > Ingram Content Group > > (615) 213-4311 > > > > -Original Message- > > From: Rah1x [mailto:raheel_itst...@yahoo.com] > > Sent: Monday, September 16, 2013 5:47 AM > > To: solr-user@lucene.apache.org > > Subject: Re: Spellcheck compounded words > > > > Hi guyz, > > > > Did anyone solve this issue? > > > > I am having it also, it took me 3 days to exactly figure it out that its > > coming from "spellcheck.maxCollationTries"... > > > > Even with 1 it hangs > > forewver. The only way to restart is to stop solr, delete "data" folder > and > > then start solr again (i.e. index lost !). > > > > Regards, > > Raheel > > > > > > > > -- > > View this message in context: > > > http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p4090320.html > > Sent from the Solr - User mailing list archive at Nabble.com. > > > > > > > > > -- > Regards, > Raheel Hasan > > -- Regards, Raheel Hasan
Re: sorting using org.apache.solr.client.solrj.SolrQuery not working
Shawn, I am doing exactly the same. The output is not sorted on the "LAST_NAM" column; it is always sorted on a different column, "CLAIM_NUM", and I am not adding that sorting condition (sort on CLAIM_NUM).

solrQuery.setQuery("*:*");
solrQuery.setSort("LAST_NAM", SolrQuery.ORDER.asc);
solrQuery.setFilterQueries("String Query");

In the log I see the sorting column as "LAST_NAM". Is there a difference between "LAST_NAM asc" and "LAST_NAM+asc"? I see only this diff:

"params={sort=LAST_NAM+asc&start=0&q=*:*&wt=javabin&fq=(LAST_NAM:*D*)+AND++-CLAI_RISK_MNGT_FLG+:+Y+&version=2&rows=30} hits=196 status=0 QTime=2"

I also tried addOrUpdateSort and addSort, but it is always sorted on CLAI_CLM_NUM, not sure why?

--
View this message in context: http://lucene.472066.n3.nabble.com/sorting-using-org-apache-solr-client-solrj-SolrQuery-not-working-tp4089985p4090364.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr PingQuery
I want to add one more thing for Shawn about Zookeeper. In order to have quorum, you need to have half the servers plus one available. Because of that let's assume you have 4 machine of Zookeeper and two of them communicating within them and other two of them communicating within them. Assume that this two zookeeper sets (each of them has two zookeeper node) can not communicate with each other. This will result with a brain split. So the rule is simple. In order to have quorum, you need to have half the servers plus one available because there can not be two different sets at any time that has a number of half the servers plus one. There can be only one. 2013/9/15 Shawn Heisey > On 9/14/2013 6:57 AM, Prasi S wrote: > > I use SolrPingResponse.getStatus method to start indexing to solr. I use > > SolrCloud with external zookeeper > > > > If i send it to the Zookeeper, if zookeeper is down, it returns NOTOK. > > > > But if one of my solr is up and second solr is down, the Ping returns OK > > status. > > If your zookeeper is completely down or does not have quorum, then > SolrCloud isn't going to work right, so a ping response of NOTOK is > correct. > > A fully redundant zookeeper ensemble is at least three machines, > preferably an odd number. You can run zookeeper on the same hardware as > Solr, but it is recommended that it be a standalone process. You should > not run the solr embedded zookeeper (-DzkRun) for production, because > when you shutdown or restart Solr, the embedded zookeeper also goes down. > > With three machines in the zookeeper ensemble, you can have one of them > go down and everything keeps working perfectly. > > If you want to know why an odd number is recommended, consider a > scenario with four zookeepers instead of three. In order to have > quorum, you need to have half the servers plus one available. On a > four-server ensemble, that works out so that three of them have to be > running. You are no better off than if you have three servers, because > in either scenario you can only have one failure. On top of that, you > have an extra possible point of failure and you're using more resources, > like switchports and power. With five servers, two can go down and > quorum will be maintained. > > If you only have two zookeepers, they both must be operational in order > to have quorum. If one of them were to fail, quorum would be lost and > SolrCloud would stop working correctly. > > SolrCloud itself is also designed to deal with a failure of a single > machine. A replicationFactor of at least two is required for that to > work correctly. > > Thanks, > Shawn > >
Re: sorting using org.apache.solr.client.solrj.SolrQuery not working
: In the log i see the sorting column as "LAST_NAM".
: Is there a difference between "LAST_NAM asc" and "LAST_NAM+asc"...I see only
: this diff?

The log message you are looking at is showing you the request params received by the handler from the client, with URL escaping -- so the "+" you see is the URL escaping of the " " sent by the client.

Can you show us the declaration for the handler name you are using? If you are seeing the results sorted by a different field than the one you specified in the client, then it has to be specified somewhere -- I'm guessing, since it's not explicit in that log message, that it's "/select", but it could also be whatever you have configured as the default="true" handler. My best guess is that the requestHandler has some init params that set the sort option as an invariant, so that you can't override it.

: "params={sort=LAST_NAM+asc&start=0&q=*:*&wt=javabin&fq=(LAST_NAM:*D*)+AND++-CLAI_RISK_MNGT_FLG+:+Y+&version=2&rows=30}
: hits=196 status=0 QTime=2 "

One other thing to sanity check: try loading your requestHandler, with all of those params (except for "wt=javabin"), in a browser window, and double check which order the results come back in -- just to verify that the results really are getting sorted incorrectly on the Solr side, and that the problem isn't some other bit of Java code you have re-sorting the results that get returned.

If you load the URL in your browser, you can also add echoParams=all to see every param used in the request, even if it is an invariant specified in the requestHandler config.

-Hoss
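To make that guess concrete, the kind of handler declaration that would silently override a client-supplied sort looks roughly like this -- a purely hypothetical config, using a field name from this thread as the example invariant:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
    <!-- If an invariants block like this exists, the sort sent by SolrJ is ignored. -->
    <lst name="invariants">
      <str name="sort">CLAI_CLM_NUM asc</str>
    </lst>
  </requestHandler>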
SOLR 4.2, slaves replicating reporting higher version number than master
Having a strange intermittent issue with my 1 master, 3 slave solr 4.2 setup. On occasion, after indexing the master and replicating across the three slaves, each slave will start reporting they are one generation ahead (525 vs. 524 on the master) and thus out of sync. Replication runs appear to do nothing, and it seems to not really be affecting performance, it's just tickling my admin nerves. Any suggestions of what to look at? Just upgrade solr perhaps? 4.2 might be getting rather old...
Re: SOLR 4.2, slaves replicating reporting higher version number than master
Sounds like perhaps you are getting confused by this...

https://issues.apache.org/jira/browse/SOLR-4661

...if that is the situation then it's not a bug you need to worry about, just a confusion in how the ReplicationHandler reports its stats -- the newer UI makes it more clear what numbers you are looking at.

If that doesn't look like the problem you are seeing, then more detail on how to reproduce what you are seeing would be helpful (replication configs, logs from master & slave, etc...)

-Hoss
Re: how soft-commit works
On 9/16/2013 7:01 AM, Matteo Grolla wrote: > Can anyone explain me the following things about soft-commit? > -For searches o access new documents I think a new searcher is opened after a > soft commit. > How does the near realtime requirement for soft commit match with the > potentially long time taken to warm up caches for the new searcher? > -Is it a good idea to set > openSearcher=false in auto commit > and rely on soft auto commit to see new data in searches? That is a very common way for installs requiring NRT updates to get configured. NRTCachingDirectoryFactory, which is the directory class used in the example since 4.0, is a wrapper around MMapDirectoryFactory, which is the old default in 3.x. For soft commits, the NRT directory keeps small commits in RAM rather than writing it to the disk, which makes the process of opening a new searcher happen a lot faster. http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/store/NRTCachingDirectory.html If your index rate is very fast or you index large amounts of data, the NRT directory doesn't gain you much over MMap, but because we made it the default in the example, it probably doesn't have any performance detriment. Thanks, Shawn
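For reference, the stock example solrconfig.xml from this era enables that directory implementation with a line along these lines:

  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

So unless the directoryFactory has been overridden in a custom config, soft commits already get the NRT-friendly behaviour Shawn describes.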
Re: Slow query at first time
What is query time of your search? I mean as like that: QueryResponse solrResponse = query(solrParams); solrResponse.getQTime(); 2013/9/16 Sergio Stateri > Hi, > > I´m trying to make a search with Solr 4.4, but in the first time the search > is too slow. I have studied about pre-warm queries, but the query response > is the same after putting it. Can anyone help me? Here´s a piece of > solrconfig.xml: > > > > > codigoRoteiro:95240816 > 0 > 20 > > > > > in the schema.xml: > > required="true" multiValued="false" /> > > > codigoRoteiro > > When I start Solr, the following message is shown: > > $ java -server -Xms2048m -Xmx4096m -Dsolr.solr.home="./oracleCore/solr" > -jar start.jar > . > . > . > 8233 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore û > QuerySenderListener done. > 8235 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore û > [db] Registered new searcher > Searcher@30b6b67dmain{StandardDirectoryReader(segments_6:34 > _f(4.4):C420060)} > > And here´s my solrj sample code: > > SolrServer solrServer = new HttpSolrServer(solrServerUrl); > > SolrQuery query = new SolrQuery(); > query.setQuery("codigoRoteiro:95240816"); > > query.set("start", "0"); query.set("rows", "20"); > query.addField("codigoRoteiro"); query.addField("rowidString"); > query.addField("descricaoRoteiro"); query.addField("numeroDias"); > query.addField("numeroNoites"); query.addField("dataSaida"); > > Date initialTime = new Date(); QueryResponse rsp = server.query( query ); > SolrDocumentList docs = rsp.getResults(); Date finalTime = new Date(); > System.out.println("Total timel: " + > (finalTime.getTime()-initialTime.getTime()) + " ms"); > > > The response time is arround 200 ms. If I remove the prewarm query, the > response time doesn´t change. Shouldn´t the response time be minor when > using pre-warm query? > > > Thanks in advance, > > -- > Sergio Stateri Jr. > stat...@gmail.com >
Re: Slow query at first time
On 9/16/2013 7:15 AM, Sergio Stateri wrote: > I´m trying to make a search with Solr 4.4, but in the first time the search > is too slow. I have studied about pre-warm queries, but the query response > is the same after putting it. Can anyone help me? Here´s a piece of > solrconfig.xml: > > You've configured a firstSearcher. Basically what this means is that this query will be run when Solr first starts up, and never run again after that. Make it a newSearcher instead of firstSearcher, and it will get run every time a new searcher gets created, and it might solve your problem. For further troublsehooting if the change above doesn't help, how big is your index, and how much RAM does the machine have? We already know what your java heap is (2GB minimum, 4GB maximum). Thanks, Shawn
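A sketch of the change Shawn is suggesting, reusing the query values from the original message (the element names are assumed to follow the standard QuerySenderListener form, since the original XML was lost in the archive):

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">codigoRoteiro:95240816</str>
        <str name="start">0</str>
        <str name="rows">20</str>
      </lst>
    </arr>
  </listener>

With the event set to newSearcher, the warming query runs every time a commit opens a new searcher, not just once at startup.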
Dynamic row sizing for documents via UpdateCSV
Hello, I am using UpdateCSV to load data in solr. Currently I load this schema with a static set of values: userid,name,age,location john8322,John,32,CA tom22,Tom,30,NY But now I have this usecase where john8322 might have a state specific dynamic field for example: userid,name,age,location, ca_count_i john8322,John,32,CA, 7 And tom22 might have different dynamic fields: userid,name,age,location, ny_count_i,oh_count_i tom22,Tom,30,NY, 981,11 So is it possible to pass different columns sizes for each row, something like this: john8322,John,32,CA,ca_count_i:7 tom22,Tom,30,NY, ny_count_i:981,oh_count_i:11 I understand that the above syntax is not possible, but is there any other way of solving this problem? -- Thanks, -Utkarsh
dih delete doc per $deleteDocById
I am using DIH and want to delete indexed documents via an XML file of ids. I have seen $deleteDocById used in data-config.xml (the configuration snippet was lost in the archive). The XML file contains the ids to delete, e.g.:

2345
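For what it's worth, a hedged data-config.xml sketch of how $deleteDocById is typically wired up for a plain XML file of ids -- the file path and XML layout below are assumptions, e.g. a file shaped like <delete><id>2345</id></delete>:

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <entity name="deletes"
              processor="XPathEntityProcessor"
              url="/path/to/deletes.xml"
              forEach="/delete/id"
              stream="true">
        <!-- Mapping the id text onto the special $deleteDocById column
             turns each row into a delete-by-id command. -->
        <field column="$deleteDocById" xpath="/delete/id"/>
      </entity>
    </document>
  </dataConfig>

The import still needs a commit before the deletes become visible to searches.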
Re: SOLR 4.2, slaves replicating reporting higher version number than master
Looks much like what I'm encountering. Guessing that will go away once I update solr, just wanted to make sure it wasn't a real bug. Entirely possible we are getting some "empty commits" given the nature of the index maintenance. Thanks for the pointer! On Mon, Sep 16, 2013 at 2:00 PM, Chris Hostetter wrote: > > Sounds like perhaps you are getting confused by this... > > https://issues.apache.org/jira/browse/SOLR-4661 > > ...if that is the situation then it's not a bug you need to worry about, > just a confusion in how the ReplicaitonHandler reports it's stats -- the > newer UI makes it more clear what numbers you are looking at. > > If that doesn't looke like the problem you are seeing, then more detail on > how to reproduce what you are seeing would be helpful (replicaiton > configs, logs from amster & slave, etc...) > > > -Hoss >
Re: Best configuration for 2 servers
At the moment I can't think of any reason why queries could not be served w/o ZK up and running. Otis -- Solr & ElasticSearch Support -- http://sematext.com/ Solr Performance Monitoring -- http://sematext.com/spm On Mon, Sep 16, 2013 at 4:58 PM, Branham, Jeremy [HR] wrote: > I may be interpreting this incorrectly, but shouldn't the cloud still serve > requests if ZK crashes? > > http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble > > " The problem with example B is that while there are enough Solr servers to > survive any one of them crashing, there is only one zookeeper server that > contains the state of the cluster. If that zookeeper server crashes, > distributed queries will still work since the solr servers remember the state > of the cluster last reported by zookeeper. The problem is that no new servers > or clients will be able to discover the cluster state, and no changes to the > cluster state will be possible." > > > > > > Jeremy D. Branham > Performance Technologist II > Sprint University Performance Support > Fort Worth, TX | Tel: **DOTNET > Office: +1 (972) 405-2970 | Mobile: +1 (817) 791-1627 > http://JeremyBranham.Wordpress.com > http://www.linkedin.com/in/jeremybranham > > > -Original Message- > From: Shawn Heisey [mailto:s...@elyograg.org] > Sent: Friday, September 13, 2013 2:40 PM > To: solr-user@lucene.apache.org > Subject: Re: Best configuration for 2 servers > > On 9/13/2013 12:50 PM, Branham, Jeremy [HR] wrote: >> Does this sound appropriate then? [assuming no 3rd server] >> >> Server A: >> Zoo Keeper >> SOLR with 1 shard >> >> Server B: >> SOLR with ZK Host parameter set to Server A > > Yes, that will work, but if the ZK on server A goes down, the entire cloud is > down. > > When you create a collection with replicationFactor=2, one replica will be on > server A and one replica will be on server B. > > If you want to break the index up into multiple shards, you can, you'll also > need the maxShardsPerNode parameter when you create the collection, and all > shards will have replicas on both machines. > > A note about zookeeper and redundancy, and an explanation about why 3 hosts > are required: To form a quorum, zookeeper must have the votes of a majority > of the hosts in the ensemble. If there are only two hosts, it's not possible > for there to be a majority unless both hosts are up, so two hosts is actually > worse than one. You need to either have one ZK node or at least three, > preferably an odd number. > > Thanks, > Shawn > > > > > > This e-mail may contain Sprint proprietary information intended for the sole > use of the recipient(s). Any use by others is prohibited. If you are not the > intended recipient, please contact the sender and delete all copies of the > message. >
RE: Best configuration for 2 servers
I may be interpreting this incorrectly, but shouldn't the cloud still serve requests if ZK crashes? http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble " The problem with example B is that while there are enough Solr servers to survive any one of them crashing, there is only one zookeeper server that contains the state of the cluster. If that zookeeper server crashes, distributed queries will still work since the solr servers remember the state of the cluster last reported by zookeeper. The problem is that no new servers or clients will be able to discover the cluster state, and no changes to the cluster state will be possible." Jeremy D. Branham Performance Technologist II Sprint University Performance Support Fort Worth, TX | Tel: **DOTNET Office: +1 (972) 405-2970 | Mobile: +1 (817) 791-1627 http://JeremyBranham.Wordpress.com http://www.linkedin.com/in/jeremybranham -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Friday, September 13, 2013 2:40 PM To: solr-user@lucene.apache.org Subject: Re: Best configuration for 2 servers On 9/13/2013 12:50 PM, Branham, Jeremy [HR] wrote: > Does this sound appropriate then? [assuming no 3rd server] > > Server A: > Zoo Keeper > SOLR with 1 shard > > Server B: > SOLR with ZK Host parameter set to Server A Yes, that will work, but if the ZK on server A goes down, the entire cloud is down. When you create a collection with replicationFactor=2, one replica will be on server A and one replica will be on server B. If you want to break the index up into multiple shards, you can, you'll also need the maxShardsPerNode parameter when you create the collection, and all shards will have replicas on both machines. A note about zookeeper and redundancy, and an explanation about why 3 hosts are required: To form a quorum, zookeeper must have the votes of a majority of the hosts in the ensemble. If there are only two hosts, it's not possible for there to be a majority unless both hosts are up, so two hosts is actually worse than one. You need to either have one ZK node or at least three, preferably an odd number. Thanks, Shawn This e-mail may contain Sprint proprietary information intended for the sole use of the recipient(s). Any use by others is prohibited. If you are not the intended recipient, please contact the sender and delete all copies of the message.
Atomic commit across shards?
Is a commit (hard or soft) atomic across shards? In other words, can I guarantee that any given search on a multi-shard collection will hit the same index generation of each shard?

Thanks,
Damien
Re: Re-Ranking results based on DocValues with custom function.
: dissimilarity functions). What I want to do is to search using common
: text search and then (optionally) re-rank using some custom function
: like
:
: http://localhost:8983/solr/select?q=*:*&sort=myCustomFunction(var1) asc

Can you describe what you want your custom function to look like? It may already be possible using the existing functions provided out of the box -- you just need to combine them to build up the math expression...

https://wiki.apache.org/solr/FunctionQuery

...if you really want to write your own, just implement ValueSourceParser and register it in solrconfig.xml...

https://wiki.apache.org/solr/SolrPlugins#ValueSourceParser

: I've seen that there are hooks in solrconfig.xml, but I did not find
: an example or some documentation. I'd be most grateful if anyone could
: either point me to one or give me a hint for another way to go :)

When writing a custom plugin like this, the best thing to do is look at the existing examples of that plugin. Almost all of the built-in ValueSourceParsers are really trivial, and can be found in tiny anonymous classes right inside ValueSourceParser.java... For example, the function to divide the results of two other functions...

    addParser("div", new ValueSourceParser() {
      @Override
      public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        ValueSource a = fp.parseValueSource();
        ValueSource b = fp.parseValueSource();
        return new DivFloatFunction(a, b);
      }
    });

...or, if you were trying to bundle that up in your own plugin jar and register it in solrconfig.xml, you might write it something like...

    public class DivideValueSourceParser extends ValueSourceParser {
      public DivideValueSourceParser() { }
      public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        ValueSource a = fp.parseValueSource();
        ValueSource b = fp.parseValueSource();
        return new DivFloatFunction(a, b);
      }
    }

and then register it in solrconfig.xml with a valueSourceParser element (see the sketch below).

Depending on your needs, you may also want to write a custom ValueSource implementation (ie: instead of DivFloatFunction above), in which case, again, the best examples to look at are all of the existing ValueSource functions...

https://lucene.apache.org/core/4_4_0/queries/org/apache/lucene/queries/function/ValueSource.html

-Hoss
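The registration element referenced above would be a one-liner in solrconfig.xml, something along these lines (the name and class are illustrative -- point the class attribute at wherever the custom parser actually lives):

  <valueSourceParser name="mydiv" class="com.example.solr.DivideValueSourceParser"/>

After that, the function is usable in sort or boost expressions as mydiv(a,b), just like the built-in functions.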
Re: requested url solr/update/extract not available on this server
: Is /solr/update working?

More importantly: does "/solr/" work in your browser and return anything useful? (Nothing you've told us yet gives us any way of knowing if Solr is even up and running.)

If 'http://localhost:8080/solr/' shows you the Solr admin UI, and you are using the stock Solr 4.2 example configs, then http://localhost:8080/solr/update/extract should not give you a 404 error. If however you are using some other configs, it might not work unless those configs register a handler with the path /update/extract.

Using the jetty setup provided with 4.2, and the example configs (from 4.2), I was able to index a sample PDF just fine using your curl command...

hossman@frisbee:~/tmp$ curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F "myfile=@stump.winners.san.diego.2013.pdf"
01839

: Check solrconfig to see that /update/extract is configured as in the standard
: Solr example.
:
: Does /solr/update/extract work for you using the standard Solr example?
:
: -- Jack Krupansky
:
: -Original Message- From: Nutan
: Sent: Sunday, September 15, 2013 2:37 AM
: To: solr-user@lucene.apache.org
: Subject: requested url solr/update/extract not available on this server
:
: I am working on Solr 4.2 on Windows 7. I am trying to index pdf files. I
: referred Solr Cookbook 4. Tomcat is using 8080 port number. I get this
: error: requested url solr/update/extract not available on this server
: When my curl is:
:
: curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F
: "myfile=@cookbook.pdf"
: There is no entry in log files. Please help.
:
: --
: View this message in context:
: http://lucene.472066.n3.nabble.com/requested-url-solr-update-extract-not-available-on-this-server-tp4090153.html
: Sent from the Solr - User mailing list archive at Nabble.com.

-Hoss
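In case the configs in play are not the stock example, the handler Hoss refers to is registered in the example solrconfig.xml along these lines (defaults trimmed; the <lib> paths depend on where the Solr distribution sits relative to the core's instanceDir, so treat them as placeholders):

  <lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
  <lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />

  <requestHandler name="/update/extract"
                  class="solr.extraction.ExtractingRequestHandler"
                  startup="lazy">
    <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="fmap.content">text</str>
    </lst>
  </requestHandler>

If no handler is registered at that path, a 404 on /update/extract is the expected result.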
TokenizerFactory from 4.2.0 to 4.3.0
TokenizerFactory changed, incompatibly with subclasses, from 4.2.0 to 4.3.0. Subclasses must now implement a different overload of create, and may not implement the old one. Has anyone got any devious strategies other than multiple copies of code to deal with this when supporting multiple versions of Solr?
CREATEALIAS does not work with more than one collection (Error 503: no servers hosting shard)
Hello Solr experts, For some strange reason, collection alias does not work in my Solr instance when more than one collection is used. I would appreciate your help. # Here is my setup, which is quite simple: Zookeeper: 3.4.5 (used to upconfig/linkconfig collections and configs for c1 and c2) Solr: version 4.4.0, with two collections c1 and c2 (solr.xml included) created using remote core API calls # Symptoms: 1. Solr queries to each individual collection works fine: http://localhost:8983/solr/c1/select?q=*:* http://localhost:8983/solr/c2/select?q=*:* 2. CREATEALIAS name=cx for c1 or c2 alone (e.g. 1-1 mapping) works fine: http://localhost:8983/solr/cx/select?q=*:* 3. CREATEALIAS name=cx for c1 and c2 does not work: # Solr request/response to the collection alias (success): http://localhost:8983/solr/cx/select?q=*:* 5032*:*no servers hosting shard: 503 # Solr query using the alias fails with Error 503: "no servers hosting shard" curl -s "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=cx&collections=c1,c2"; 0134 # Solr logs: 3503223 [qtp724646150-11] ERROR org.apache.solr.core.SolrCore ? org.apache.solr.common.SolrException: no servers hosting shard: at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 3503224 [qtp724646150-11] INFO org.apache.solr.core.SolrCore ? [c1] webapp=/solr path=/select params={q=*:*} status=503 QTime=2 # solr.xml # zookeeper alias (same from solr/cloud UI): [zk: localhost:2181(CONNECTED) 10] get /myroot/aliases.json {"collection":{ "cx":"c1,c2"}} cZxid = 0x110d ctime = Fri Sep 13 17:25:18 PDT 2013 mZxid = 0x18d1 mtime = Mon Sep 16 16:31:21 PDT 2013 pZxid = 0x110d cversion = 0 dataVersion = 19 aclVersion = 0 ephemeralOwner = 0x0 dataLength = 119 numChildren = 0 BTW, I've spent a lot of time figuring out how to make zookeeper and solr work together. The commands are not complex, but making them work sometimes requires a lot of digging online, to figure out missing jars for zkCli.sh, etc. I know a lot of things are changing since Solr 4.0, but I really hope the Solr documentation can be better maintained, so that people won't have to spend tons of hours figuring out simple steps (albeit complex under the hood) like this. Thanks! -- Regards, HaiXin = AIM : tivohtie Work : 408.914.9835 Mobile : 408.368.9289 Schedule : http://htie-linux/ = This email and any attachments may contain confidential and privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments) by others is prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete this email and any attachments. No employee or agent of TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo Inc. may only be made by a signed written agreement.
Updated: CREATEALIAS does not work with more than one collection (Error 503: no servers hosting shard)
Sorry, I've fixed some typos; updated text below:

Hello Solr experts,

For some strange reason, a collection alias does not work in my Solr instance when more than one collection is used. I would appreciate your help.

# Here is my setup, which is quite simple:

Zookeeper: 3.4.5 (used to upconfig/linkconfig collections and configs for c1 and c2)
Solr: version 4.4.0, with two collections c1 and c2 (solr.xml included), created using remote core API calls

# Symptoms:

1. Solr queries to each individual collection work fine:
http://localhost:8983/solr/c1/select?q=*:*
http://localhost:8983/solr/c2/select?q=*:*

2. CREATEALIAS name=cx for c1 or c2 alone (i.e. a 1-1 mapping) works fine:
http://localhost:8983/solr/cx/select?q=*:*

3. CREATEALIAS name=cx for c1 and c2 does not work:

# Solr request/response to the collection alias (success):
curl -s "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=cx&collections=c1,c2"
0134

# Solr query using the alias fails with Error 503: "no servers hosting shard"
http://localhost:8983/solr/cx/select?q=*:*
5032*:*no servers hosting shard: 503

# Solr logs:
3503223 [qtp724646150-11] ERROR org.apache.solr.core.SolrCore ? org.apache.solr.common.SolrException: no servers hosting shard:
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
3503224 [qtp724646150-11] INFO org.apache.solr.core.SolrCore ? [c1] webapp=/solr path=/select params={q=*:*} status=503 QTime=2

# solr.xml

# zookeeper alias (same from solr/cloud UI):
[zk: localhost:2181(CONNECTED) 10] get /myroot/aliases.json
{"collection":{
    "cx":"c1,c2"}}
cZxid = 0x110d
ctime = Fri Sep 13 17:25:18 PDT 2013
mZxid = 0x18d1
mtime = Mon Sep 16 16:31:21 PDT 2013
pZxid = 0x110d
cversion = 0
dataVersion = 19
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 119
numChildren = 0

BTW, I've spent a lot of time figuring out how to make ZooKeeper and Solr work together. The commands are not complex, but making them work sometimes requires a lot of digging online, e.g. to figure out the missing jars for zkCli.sh. I know a lot of things have been changing since Solr 4.0, but I really hope the Solr documentation can be better maintained, so that people won't have to spend tons of hours figuring out simple steps (albeit complex under the hood) like this.

Thanks!
--
Regards,
HaiXin
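One cross-check that may help localize this kind of failure, though it is not shown in the thread, is to issue the same distributed query without going through the alias, using the SolrCloud "collection" request parameter. The host, port, and collection names below are simply the ones from the post; this is only a sketch, assuming a SolrCloud deployment where that parameter is honored. If this form also returns "no servers hosting shard", the problem is in the distributed request itself rather than in alias resolution.

    # Query both collections directly, bypassing the alias (sketch; names
    # taken from the post above).
    curl -s "http://localhost:8983/solr/c1/select?q=*:*&collection=c1,c2"

    # And re-check what the alias resolves to in ZooKeeper, as in the post:
    #   zkCli.sh: get /myroot/aliases.json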
SPLITSHARD failure right before publishing the new sub-shards
Hi Solr experts,

I am using Solr 4.4 with ZK 3.4.5, trying to split "shard1" of a collection named "body". There is only one core on one machine for this collection. When I call SPLITSHARD on this collection, Solr is able to create the two sub-shards, but it fails with an NPE in SolrCore.java while publishing the new shards. It seems that either the updateHandler or its updateLog is null, though they work fine in the original shard:

SolrCore.java:
      if (cc != null && cc.isZooKeeperAware()
          && Slice.CONSTRUCTION.equals(cd.getCloudDescriptor().getShardState())) {
        // set update log to buffer before publishing the core
862:    getUpdateHandler().getUpdateLog().bufferUpdates();
        cd.getCloudDescriptor().setShardState(null);
        cd.getCloudDescriptor().setShardRange(null);
      }

Here are the details. Any pointers to aid in debugging this issue are greatly appreciated!

# curl request/response to split the shard:
curl -s "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=body&shard=shard1"

5002688
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'body_shard1_0_replica1': Unable to create core: body_shard1_0_replica1 Caused by: null
org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: SPLTSHARD failed to create subshard leaders
SPLTSHARD failed to create subshard leaders
500
SPLTSHARD failed to create subshard leaders
org.apache.solr.common.SolrException: SPLTSHARD failed to create subshard leaders
    at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:171)
    at org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:322)
    at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:136)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:218)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:662)
500

# Full solr log for the split shard call:
384779 [qtp334936591-10] INFO org.apache.solr.handler.admin.CollectionsHandler ? Splitting shard : shard=shard1&action=SPLITSHARD&collection=body
384791 [main-EventThread] INFO org.apache.solr.cloud.DistributedQueue ? Watcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
384791 [main-EventThread] INFO org.apache.solr.cloud.DistributedQueue ? Watcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
384797 [main-EventThread] INFO org.apache.solr.cloud.Di
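One detail worth checking, though the thread does not confirm it: the NPE happens on getUpdateHandler().getUpdateLog(), and getUpdateLog() returns null when the transaction log is not configured for the core. Shard splitting depends on the update log, so a solrconfig.xml section along the following lines is the first thing to verify. This is only a sketch; the poster's actual solrconfig.xml was not included in the thread.

    <!-- Sketch: enable the transaction log in solrconfig.xml. If this block
         is absent, SolrCore.getUpdateLog() can return null, which would match
         the NPE at line 862 in the quoted code. -->
    <updateHandler class="solr.DirectUpdateHandler2">
      <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
    </updateHandler>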
how to make sure all the index docs flushed to the index files
Hi,

I'm using the DIH to import data from an Oracle database with Solr 4.4. In the end I get 2.7GB of index data and 4.1GB of tlog data, and the number of docs was 1090.

At first, I moved the 2.7GB of index data to another new Solr server running in Tomcat 7. After I started Tomcat, I found that the total number of docs was just half of the original number. So I thought that maybe the remaining docs had not been committed to the index files, and the tlog needed to be replayed.

Subsequently, I moved both the 2.7GB of index data and the 4.1GB of tlog data to the new Solr server in Tomcat 7. After I started Tomcat, an exception came up as in [1]. Then it halted; I could not access the Tomcat server URL. I noticed that the CPU utilization was high, using the command: top -d 1 | grep tomcatPid. I assumed Solr was replaying the update log. I waited a long time and it was still replaying, so I gave up.

So I want to make sure that after the DIH import process finishes, the whole index has been flushed into the index data files. Is there any step I missed? How can I make sure all of the indexed data has been committed to the index files?

[1]--
19380 [recoveryExecutor-6-thread-1] WARN org.apache.solr.update.UpdateLog ?.REPLAY_ERR: Exception replaying log
java.lang.UnsupportedOperationException
    at org.apache.lucene.queries.function.FunctionValues.longVal(FunctionValues.java:46)
    at org.apache.solr.update.VersionInfo.getVersionFromIndex(VersionInfo.java:200)
    at org.apache.solr.update.UpdateLog.lookupVersion(UpdateLog.java:736)
    at org.apache.solr.update.VersionInfo.lookupVersion(VersionInfo.java:183)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:672)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
    at org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1313)
    at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1202)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
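For reference, a hard commit can also be issued explicitly after the import finishes, before the data directory is copied anywhere. This is only a sketch; the core name "collection1" is assumed here, matching the paths that appear later in this thread.

    # Force a hard commit so everything still sitting in the tlog is written
    # to the index segments (sketch; core name is an assumption).
    curl "http://localhost:8983/solr/collection1/update?commit=true"

    # Then verify the document count straight from the index:
    curl "http://localhost:8983/solr/collection1/select?q=*:*&rows=0"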
2 question about solr and lucene
Hi guys,

I have run into two questions about Solr and Lucene and would appreciate some help.

1. A payload query works on its own, but NOT when combined with a numerical field condition.

For example, I implemented my own request handler, following
http://hnagtech.wordpress.com/2013/04/19/using-payloads-with-solr-4-x/

I query in Solr: sinaTag:operate

Solr response:

    "numFound": 2,
    "start": 0,
    "maxScore": 99,
    "docs": [
      {
        "id": "1628209010",
        "followersCount": 752,
        "sinaTag": "operate|99 C2C|98 B2C|97 OnlineShopping|96 E-commercial|94",
        "score": 99
      },
      {
        "id": "1900546410",
        "followersCount": 1002,
        "sinaTag": "Startup|99 Benz|98 PublicRelation|97 operate|96 Activity|95 Media|94 AD|93 Vehicle|92 ",
        "score": 96
      }
    ]

This works well. But when the query is combined with another numerical condition, such as:

    sinaTag:operate and followersCount:[752 TO 752]

the response is:

    {
      "responseHeader": {
        "status": 0,
        "QTime": 40
      },
      "response": {
        "numFound": 0,
        "start": 0,
        "maxScore": 0,
        "docs": []
      }
    }

Given this dataset, the first record should be returned rather than NOT FOUND. I don't know why.

2. For string-field fuzzy match filtering, how do I get the score, and what is the formula? When I combine two or several string fuzzy matches, possibly with AND or OR, how is the score computed? And if I want to implement my own scoring formula, which interface or abstract class should I extend?

Thanks in advance.
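For the first question, the thread does not show how the combined query is actually parsed; a quick way to see that is debugQuery=true. The request below is only a sketch (the core name and handler path are assumptions, since the poster uses a custom request handler), and note that Lucene query syntax expects an upper-case AND, so a lower-case "and" may be treated as an ordinary term rather than an operator.

    # Sketch: ask Solr to show the parsed query and per-document score
    # explanations for the combined query.
    curl "http://localhost:8983/solr/collection1/select" \
         --data-urlencode "q=sinaTag:operate AND followersCount:[752 TO 752]" \
         --data-urlencode "debugQuery=true"

The same debugQuery output is also relevant to the second question: the "explain" section shows exactly how each matching document's score was computed.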
Re: how to make sure all the index docs flushed to the index files
On 9/16/2013 8:26 PM, YouPeng Yang wrote:
> I'm using the DIH to import data from oracle database with Solr4.4
> Finally I get 2.7GB index data and 4.1GB tlog data. And the number of
> docs was 1090.
>
> At first, I move the 2.7GB index data to another new Solr Server in
> tomcat7. After I start the tomcat, I find the total number of docs was just
> half of the orginal number.
> So I thought that maybe the left docs were not commited to index
> files, and the tlog needed to be replayed.

You need to turn on autoCommit in your solrconfig.xml so that there are hard commits happening on a regular basis that flush all indexed data to disk and start new transaction log files. I will give you a link with some information about that below.

> Sequently, I moved the 2.7GB index data and 4.1GB tlog data to the new
> Solr Server in tomcat7.
> After I start the tomcat, an exception comes up as [1].
> Then it halts. I can not access the tomcat server URL.
> I noticed that the CPU utilization was high by using the comand: top
> -d 1 | grep tomcatPid.
> I thought solr was replaying the updatelog. And I wait a long time and it
> still was replaying. As results, I give up.

I don't know what the exception was about, but it is likely that it WAS replaying the log. With 4.1GB of transaction log, that's going to take a LONG time, during which Solr will be unavailable. It always replays the entire transaction log. The key, as mentioned above, is in keeping that log small.

Here's a wiki page about the slow startup problem and an example of how to configure autoCommit to deal with it:

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

There's a lot of other good information on that page.

Thanks,
Shawn
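The kind of autoCommit block Shawn is referring to looks roughly like the following. The values are illustrative only, not a recommendation from the thread; the linked wiki page discusses how to choose them.

    <!-- Sketch of an autoCommit configuration in solrconfig.xml. A hard
         commit flushes index data to disk and starts a new tlog;
         openSearcher=false keeps it from affecting query visibility.
         Values are illustrative. -->
    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxTime>60000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
    </updateHandler>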
Re: how to make sure all the index docs flushed to the index files
Hi Shawn,

Thank you very much for your response.

I launch the full-import task on the Solr admin web page, and I do check the commit option, so the new docs should be committed after the operation. The commit option is different from autoCommit, right? If the import dataset is too large, does that lead to poor performance or other problems, such as [1]?

The exceptions in [1] indicate "Too many open files", which we thought was because of the ulimit.

[1]
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149d.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149e.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149f.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149g.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149h.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149i.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149j.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149k.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149l.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149m.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149n.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149o.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149p.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149q.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149r.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149s.fdx (Too many open files)

2013/9/17 Shawn Heisey

> On 9/16/2013 8:26 PM, YouPeng Yang wrote:
> > I'm using the DIH to import data from oracle database with Solr4.4
> > Finally I get 2.7GB index data and 4.1GB tlog data. And the number of
> > docs was 1090.
> >
> > At first, I move the 2.7GB index data to another new Solr Server in
> > tomcat7. After I start the tomcat, I find the total number of docs was just
> > half of the orginal number.
> > So I thought that maybe the left docs were not commited to index
> > files, and the tlog needed to be replayed.
>
> You need to turn on autoCommit in your solrconfig.xml so that there are
> hard commits happening on a regular basis that flush all indexed data to
> disk and start new transaction log files. I will give you a link with
> some information about that below.
>
> > Sequently, I moved the 2.7GB index data and 4.1GB tlog data to the new
> > Solr Server in tomcat7.
> > After I start the tomcat, an exception comes up as [1].
> > Then it halts. I can not access the tomcat server URL.
> > I noticed that the CPU utilization was high by using the comand: top
> > -d 1 | grep tomcatPid.
> > I thought solr was replaying the updatelog. And I wait a long time and it
> > still was replaying. As results, I give up.
>
> I don't know what the exception was about, but it is likely that it WAS
> replaying the log. With 4.1GB of transaction log, that's going to take
> a LONG time, during which Solr will be unavailable. It always replays
> the entire transaction log. The key, as mentioned above, is in keeping
> that log small.
>
> Here's a wiki page about the slow startup problem and an example of how
> to configure autoCommit to deal with it:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup
>
> There's a lot of other good information on that page.
>
> Thanks,
> Shawn
>
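If the "Too many open files" errors are indeed a ulimit issue, the usual checks look something like the following sketch. The user name, limit values, and file locations are assumptions and vary by Linux distribution.

    # Check the limit actually in effect for the running Tomcat process
    # (replace <tomcatPid> with the real PID).
    cat /proc/<tomcatPid>/limits | grep "open files"

    # Raise the limit for the service user, e.g. in /etc/security/limits.conf
    # (assuming the service user is "tomcat"):
    #   tomcat  soft  nofile  65536
    #   tomcat  hard  nofile  65536
    # Then restart Tomcat from a fresh login shell so the new limit applies.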
Problem with SynonymFilter and StopFilterFactory
Hi,

I have encountered a problem when applying StopFilterFactory and SynonymFilterFactory together. The problem is that SynonymFilter removes the position gaps that were previously introduced by the StopFilterFactory. I'm applying these filters at query time, because users need to change the synonym lists frequently.

This is my schema, and an example of the issue, for the string "documentacion para agentes":

org.apache.solr.analysis.WhitespaceTokenizerFactory {luceneMatchVersion=LUCENE_35}
position      1              2      3
term text     documentación  para   agentes
startOffset   0              14     19
endOffset     13             18     26

org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_35}
position      1              2      3
term text     documentación  para   agentes
startOffset   0              14     19
endOffset     13             18     26

org.apache.solr.analysis.StopFilterFactory {words=stopwords_intranet.txt, ignoreCase=true, enablePositionIncrements=true, luceneMatchVersion=LUCENE_35}
position      1              3
term text     documentación  agentes
startOffset   0              19
endOffset     13             26

org.apache.solr.analysis.SynonymFilterFactory {synonyms=sinonimos_intranet.txt, expand=true, ignoreCase=true, luceneMatchVersion=LUCENE_35}
position      1              2
term text     documentación  agente
              archivo        agentes
type          SYNONYM        SYNONYM
              SYNONYM        SYNONYM
startOffset   0              19
              0              19
endOffset     13             26
              13             26

As you can see, the positions should be 1 and 3, but SynonymFilter removes the gap and moves the token from position 3 to position 2.

I've got the same problem with Solr 3.5 and 4.0. I don't know whether it's a bug or an error in my configuration. In other schemas I have worked with, I had always put the SynonymFilter before the StopFilter, but in this one I preferred this order because of the large number of synonyms in the list (i.e. I don't want to generate a lot of synonyms for a word that I actually want to remove).

Thanks,

David Dávila Atienza
AEAT - Departamento de Informática Tributaria
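For comparison, the alternative ordering David mentions (synonyms applied before stopwords) would look roughly like this in the schema. The field type name is made up here; the filter attributes are simply copied from the analysis output above.

    <!-- Sketch: query-time chain with SynonymFilter ahead of StopFilter, the
         ordering David says he normally uses. Field type name is illustrative. -->
    <fieldType name="text_intranet" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="sinonimos_intranet.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" words="stopwords_intranet.txt"
                ignoreCase="true" enablePositionIncrements="true"/>
      </analyzer>
    </fieldType>

The trade-off he describes still applies, of course: with this ordering, synonyms are generated even for terms that the stopword list would later remove.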
Re: how to make sure all the index docs flushed to the index files
Hi,

Another weird problem. When we set up the autoCommit properties, we expect index files to be created on every commit, and we would like those index files to be reasonably large. We do not want to keep too many small files, as in [1]. How can we control the size of the index files?

[1]
...omitted
548KB  index/_28w_Lucene41_0.doc
289KB  index/_28w_Lucene41_0.pos
1.1M   index/_28w_Lucene41_0.tim
24K    index/_28w_Lucene41_0.tip
2.1M   index/_28w.fdt
766B   index/_28w.fdx
5KB    index/_28w.fnm
40K    index/_28w.nvd
79K    index/_28w.nvm
364B   index/_28w.si
518KB  index/_28x_Lucene41_0.doc
290KB  index/_28x_Lucene41_0.pos
1.2M   index/_28x_Lucene41_0.tim
28K    index/_28x_Lucene41_0.tip
2.1M   index/_28x.fdt
843B   index/_28x.fdx
5KB    index/_28x.fnm
40K    index/_28x.nvd
79K    index/_28x.nvm
386B   index/_28x.si
...omitted

2013/9/17 YouPeng Yang

> Hi Shawn
>
> Thank your very much for your reponse.
>
> I lauch the full-import task on the web page of solr/admin . And I do
> check the commit option.
> The new docs would be committed after the operation.
> The commit option is defferent with the autocommit, right? If the import
> datasets are too large that leads to poor performance or
> other problems, such as [1].
>
> The exception that indicate that -Too many open files-, we thought is
> because of the ulimit.
>
> [1]
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149d.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149e.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149f.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149g.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149h.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149i.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149j.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149k.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149l.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149m.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149n.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149o.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149p.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149q.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149r.fdx (Too many open files)
> java.io.FileNotFoundException:
> /data/apache-tomcat/webapps/solr/collection1/data/index/_149s.fdx (Too many open files)
>
> 2013/9/17 Shawn Heisey
>
>> On 9/16/2013 8:26 PM, YouPeng Yang wrote:
>> > I'm using the DIH to import data from oracle database with Solr4.4
>> > Finally I get 2.7GB index data and 4.1GB tlog data. And the number of
>> > docs was 1090.
>> >
>> > At first, I move the 2.7GB index data to another new Solr Server in
>> > tomcat7. After I start the tomcat, I find the total number of docs was
>> just
>> > half of the orginal number.
>> > So I thought that maybe the left docs were not commited to index
>> > files, and the tlog needed to be replayed.
>>
>> You need to turn on autoCommit in your solrconfig.xml so that there are
>> hard commits happening on a regular basis that flush all indexed data to
>> disk and start new transaction log files. I will give you a link with
>> some information about that below.
>>
>> > Sequently, I moved the 2.7GB index data and 4.1GB tlog data to the new
>> > Solr Server in tomcat7.
>> > After I start the tomcat, an exception comes up as [1].
>> > Then it halts. I can not access the tomcat server URL.
>> > I noticed that the CPU utilization was high by using the comand: top
>> > -d 1 | grep tomcatPid.
>> > I thought solr was replaying the updatelog. And I wait a long time and it
>> > still was replaying. As results, I g
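Segment sizes are governed less by the commit settings themselves than by the indexing buffers and the merge policy. The snippet below is a sketch of the relevant solrconfig.xml knobs, with purely illustrative values, not something taken from this thread.

    <!-- Sketch: indexConfig settings that influence how many small segments
         accumulate. Larger RAM buffers mean fewer tiny flushed segments, and
         the merge policy decides how aggressively small segments are merged
         into bigger ones. Values are illustrative only. -->
    <indexConfig>
      <ramBufferSizeMB>128</ramBufferSizeMB>
      <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
        <int name="maxMergeAtOnce">10</int>
        <int name="segmentsPerTier">10</int>
      </mergePolicy>
    </indexConfig>

An explicit optimize (forceMerge) would also collapse the many small segments into a few large ones, at the cost of a heavy rewrite of the whole index.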