SqlEntityProcessor

2014-08-10 Thread Christof Lorenz
Hi folks,

I am looking for a way to update a certain column in the RDBMS for each
item as soon as the item has been indexed by Solr.
The column will serve as the indicator in the delta query to select
un-indexed items.
We don't want to use the default timestamp-based mechanism.

Any ideas on how we could implement this?

Regards,
Lochri



Re: SqlEntityProcessor

2014-08-10 Thread Alexandre Rafalovitch
Use a custom UpdateRequestProcessor that collects the IDs of submitted
documents and updates the database on commit.

DataImportHandler itself is strictly a one-way read operation, but you
can add an UpdateRequestProcessor (URP) chain after it.
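
The collect-then-flush pattern described here can be sketched in plain Python. This is not Solr's UpdateRequestProcessor API (a real URP would be a Java class in the update chain); it only mimics the logic of remembering IDs on each add and flagging the rows on commit. The table name `item`, the columns `id` and `indexed`, and the class and function names are all assumptions made for illustration, with sqlite3 standing in for the real RDBMS.

```python
import sqlite3


class IndexedFlagTracker:
    """Hypothetical stand-in for a custom URP: collect IDs, flag rows on commit."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = []

    def process_add(self, doc_id):
        # Called once per submitted document; only remember the ID here.
        self.pending.append(doc_id)

    def process_commit(self):
        # Called when the index commit happens; flag all collected rows
        # in a single transaction. Table/column names are assumptions.
        with self.conn:
            self.conn.executemany(
                "UPDATE item SET indexed = 1 WHERE id = ?",
                [(doc_id,) for doc_id in self.pending],
            )
        flagged = len(self.pending)
        self.pending.clear()
        return flagged


def unindexed_ids(conn):
    # Counterpart of the delta query: rows not yet flagged as indexed.
    rows = conn.execute("SELECT id FROM item WHERE indexed = 0 ORDER BY id")
    return [row[0] for row in rows]
```

Flagging only on commit means a row is never marked as indexed before Solr has actually committed it; if the import fails before the commit, the rows stay un-flagged and the next delta query picks them up again.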

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853




Re: How can I request a big list of values ?

2014-08-10 Thread Jack Krupansky
Generally, "large requests" are an anti-pattern in modern distributed 
systems. Better to have a number of smaller requests executing in parallel 
and then merge the results in the application layer.
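
A minimal sketch of that approach, assuming a caller-supplied `fetch` function that sends one query string to Solr and returns the matching document IDs (the function names, the batch size of 50, and the worker count are illustrative assumptions, not anything from SolrJ). Values containing query-syntax characters would need escaping first.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(values, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def build_query(field, batch):
    # Produces e.g. x:(AAA BBB CCC); relies on OR being the default operator.
    return f"{field}:({' '.join(batch)})"


def search_all(field, values, fetch, batch_size=50, workers=8):
    """Run one query per batch in parallel and merge the document IDs.

    `fetch` is a hypothetical callable: query string in, list of IDs out.
    """
    queries = [build_query(field, batch) for batch in chunks(values, batch_size)]
    merged = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for ids in pool.map(fetch, queries):
            merged.update(ids)  # de-duplicate across batches
    return merged
```

With 2000 values and a batch size of 50 this issues 40 queries, at most 8 of them in flight at a time.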


-- Jack Krupansky

-Original Message- 
From: Bruno Mannina

Sent: Saturday, August 9, 2014 7:18 PM
To: solr-user@lucene.apache.org
Subject: How can I request a big list of values ?

Hi All,

I'm currently using Solr 3.6 with around 91,000,000 docs in the index.

Everything works fine, it's great :)

But now I would like to query a list of values in the same field
(more than 2000 values).

I know I can use ?q=x:(AAA BBB CCC ...) (my default operator is OR),

but I have a list of 2000 values! I don't think this method is a good
idea.

Can someone help me find a good solution?
Can I send a JSON structure using a POST request?

Thanks a lot,
Bruno





Re: How can I request a big list of values ?

2014-08-10 Thread Anshum Gupta
Hi Bruno,

If you were on a more recent release,
https://issues.apache.org/jira/browse/SOLR-6318 might have come in
handy.
You might want to look at patching your version with it, though, as a
workaround.




-- 

Anshum Gupta
http://www.anshumgupta.net


Re: How can I request a big list of values ?

2014-08-10 Thread Bruno Mannina

Hi Jack,

OK, but for 2000 values that means I must send 40 requests if I choose 
50 values per request :'(
And in my case the user can choose about 8 topics, so that could generate 
8 times 40 requests... hmm...


Isn't it possible to send a text, JSON, or XML file?
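
On the POST question: Solr's select handler accepts the usual query parameters as a form-encoded POST body, which avoids URL length limits. The sketch below only builds such a request and does not send it; the URL, field name, and function name are assumptions for illustration. Note that a single query with 2000 clauses would still exceed the default maxBooleanClauses limit of 1024 in solrconfig.xml unless that setting is raised.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_select_post(base_url, field, values, rows=10):
    """Build (but do not send) a form-encoded POST for a Solr select handler."""
    params = {"q": f"{field}:({' '.join(values)})", "rows": rows, "wt": "json"}
    body = urlencode(params).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}
    return Request(base_url, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it; the servlet container's own POST size limit still applies.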




Re: How can I request a big list of values ?

2014-08-10 Thread Bruno Mannina

Hi Anshum,

Can I do that with the 3.6 release?

My main problem is that I have around 2000 values, so I can't put them 
all in one request; it's too large. :'(


I will look at generating several requests (as Jack suggested), 
but even that doesn't seem safe to me...





Re: How can I request a big list of values ?

2014-08-10 Thread Jack Krupansky

Not safe? In what way?

It might be nice to have a specialized SolrJ API for this particular kind of 
request, so the API can do the merge. Maybe do it as a class so that you 
could have a method that gets invoked as documents trickle back from the 
various requests, again so that it is not a massive, blocking request.
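
No such SolrJ class is confirmed in this thread; as a rough illustration of the idea only, the sketch below runs the sub-requests in parallel and invokes a callback as each batch of documents comes back. The names `stream_results`, `fetch`, and `on_docs` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def stream_results(queries, fetch, on_docs, workers=8):
    """Run fetch(query) for each query in parallel; call on_docs per finished batch.

    `fetch` and `on_docs` are hypothetical caller-supplied callables.
    Returns the number of batches processed.
    """
    done = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch, query) for query in queries]
        for future in as_completed(futures):
            # Invoked in the calling thread, in completion order.
            on_docs(future.result())
            done += 1
    return done
```

Because the callback runs as each sub-request finishes, the caller can start merging or displaying documents before the slowest request has returned.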


-- Jack Krupansky




Re: How can I request a big list of values ?

2014-08-10 Thread Jack Krupansky
The issue is not whether or how to do a massive request, but to recognize 
that a single massive request across the network is very clearly an 
anti-pattern for modern distributed systems.


Instead of searching for ways to do something "bad", it is better to figure 
out how to exploit the positive potential of a system, which in this case is 
parallel execution of distributed components.


-- Jack Krupansky




Re: How to grab matching stats in Similarity class

2014-08-10 Thread Hafiz Mian M Hamid
Follow-up, hoping someone will help.


On Wednesday, August 6, 2014 5:40 PM, Hafiz Mian M Hamid wrote:

We're using Solr 4.2.1 with an extension of Lucene's DefaultSimilarity as 
our similarity class. I am trying to figure out how we could get hold of the 
matching stats (i.e. how many and which query terms matched on different 
fields in the retrieved document set) in our similarity class, since we want to 
add a custom boost to our scoring function. The scoring logic needs to know 
the number of query terms matched on each field to determine the boost 
value.

Basically, we want our similarity class to be aware of the global matching stats 
even when scoring a single term in its TFIDFDocScorer.score() method. I was 
wondering how we could get hold of that information. The 
exactSimScorer() and sloppySimScorer() methods get an instance of 
AtomicReaderContext as their second parameter, but it doesn't look like we can 
retrieve matching stats from that object. Is there any other way to make 
the similarity class aware of the global matching stats?

I'd highly appreciate any help.

Thanks,
Hamid

what os env you use to develop lucene or solr?

2014-08-10 Thread rulinma
Hi everybody,

  I would like to know: is Linux the best choice? What does Doug Cutting
use? CentOS, Ubuntu, something else, or even a Mac?

  Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-os-env-you-use-to-develop-lucene-or-solr-tp4152219.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Delta Import - Cleaning Index

2014-08-10 Thread rulinma
good.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Delta-Import-Cleaning-Index-tp4151217p4152221.html


Re: Disabling transaction logs

2014-08-10 Thread rulinma
good.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Disabling-transaction-logs-tp4151721p415.html


RE: Content-Charset header in HttpSolrServer

2014-08-10 Thread Michael Ryan
Done. https://issues.apache.org/jira/browse/SOLR-6360

-Michael

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Wednesday, August 06, 2014 7:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Content-Charset header in HttpSolrServer


: I was reviewing the httpclient code in HttpSolrServer and noticed that
: it sets a "Content-Charset" header. As far as I know this is not a real
: header and is not necessary. Anyone know a reason for this to be there? 
: I'm guessing this was just a mistake when converting from httpclient3 to
: httpclient4.

Yeah ... looking at the diffs, this was added in r1327635 as part of
SOLR-2020. If you compare with the old CommonsHttpSolrServer.java, the line 
of code Sami seemed to be trying to replicate was...

post.getParams().setContentCharset("UTF-8");

...I suspect we've just been getting lucky that the default is already UTF-8, 
and/or the subsequent code is always specific about the charset when adding 
streams, but fixing that to call the equivalent "new" method in httpclient4 
would be a good idea.

Michael: Would you mind opening a jira for this? (I don't suppose you know 
what the correct equivalent method/option in httpclient4 is, do you?)



-Hoss
http://www.lucidworks.com/


Re: SolrCloud Scale Struggle

2014-08-10 Thread rulinma
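You should not autoCommit with openSearcher enabled too frequently.

[solrconfig.xml fragment; the XML tags were lost in the archive. Surviving
values, in order: 360, true, 1000, 100, 1]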



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Scale-Struggle-tp4150592p4152229.html


Re: what os env you use to develop lucene or solr?

2014-08-10 Thread Shawn Heisey
On 8/10/2014 7:49 PM, rulinma wrote:
>   I want know this, if linux is the best choosen? and Doug Cutting use what,
> and centos or ubuntu or others, even mac?

My clients are all Windows, my servers are all Linux.  I've used Linux
clients, but I do enough stuff with my computers that is either hard or
impossible on Linux that I haven't taken the full plunge.

For Java development, I used Eclipse.  I've done some poking around in
IntelliJ IDEA, but it's not at all familiar yet.

I've heard awesome things about newer Macs, but I probably won't be
trying that until somebody else pays for it.

Thanks,
Shawn



Re: SolrCloud Scale Struggle

2014-08-10 Thread anand.mahajan
Hello all,

Thank you for your suggestions. With the autoCommit (every 10 mins) and
softCommit (every 10 secs) frequencies reduced, things work much better now.
CPU usage has gone down considerably too (by about 60%), and
read/write throughput is showing considerable improvement as well.
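
For reference, commit intervals like these would typically be expressed in solrconfig.xml roughly as follows. This is a sketch, not the poster's actual configuration: the 10-minute and 10-second values come from the message above, while the openSearcher setting is an assumption, since it is not stated here.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit every 10 minutes (600000 ms) -->
  <autoCommit>
    <maxTime>600000</maxTime>
    <!-- Assumption: not stated in the thread -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit every 10 seconds (10000 ms) -->
  <autoSoftCommit>
    <maxTime>10000</maxTime>
  </autoSoftCommit>
</updateHandler>
```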

Certain shards are giving poor response times; these have over 10M
listings. I guess this is because they are starved for RAM? Would it help
if I split them into smaller shards on the existing hardware? (I cannot
allocate more machines to the cloud yet.)

Thanks,
Anand



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Scale-Struggle-tp4150592p4152239.html