Re: index the links having a certain website

2012-04-01 Thread Marcelo Carvalho Fernandes
Hi Manuel,

Do you mean you need to index html files?
What kind of search do you imagine doing?

---
Marcelo Carvalho Fernandes

On Friday, March 30, 2012, Manuel Antonio Novoa Proenza <
mano...@estudiantes.uci.cu> wrote:
>
>
>
>
>
> Hello
>
> I'm not good with English, and therefore I had to resort to a translator.
>
> I have the following question ...
>
> How can I index the links that a certain website contains?
>
> regards
>
> ManP
>
>
>
> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS
INFORMATICAS...
> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
>
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci

-- 

Marcelo Carvalho Fernandes
+55 21 8272-7970
+55 21 2205-2786


Re: Position Solr results

2012-04-01 Thread Marcelo Carvalho Fernandes
Try using the "score" field in the search results.

---
Marcelo Carvalho Fernandes

On Friday, March 30, 2012, Manuel Antonio Novoa Proenza <
mano...@estudiantes.uci.cu> wrote:
>
>
>
>
>
> Hi
>
> I'm not good with English, and for this reason I had to resort to a
translator.
>
> I have the following question ...
>
> How can I get the position at which a certain website appears in the Solr
results generated for a given search criterion?
>
> regards
>
> ManP
>
>
>
>
>
>

-- 

Marcelo Carvalho Fernandes
+55 21 8272-7970
+55 21 2205-2786


Re: Content privacy, search & index

2012-04-01 Thread dbenjamin
Hi Paul,

You lost me :-)

You mean implementing a specific RequestHandler just for my needs?

Also, when you say "It'd transform a query for "a b" into "+(a b)
+(authorizedBit)"", that's not so clear to me; do you mind explaining it
like I was six years old? ;-) (even if I think it's just a matter of
syntax...)

Indeed, the friend list will obviously be cached.

Thanks.

Br,
Benjamin.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Content-privacy-search-index-tp3873462p3874961.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Content privacy, search & index

2012-04-01 Thread dbenjamin
Hi spring,

Solution 1 is what I had in mind.

So I can't do the whole thing directly in Solr? (Except maybe by
implementing a new RequestHandler, as Paul suggested.)

Concerning the auto-complete of friends in the search box, you won't use the
auto-complete feature from Solr then, will you? Because the friend list
would not be indexed in Solr but retrieved from the application cache. (Well,
if you don't have an answer to that, it's OK; that's really secondary, the
privacy level being my primary concern.)

Concerning the size of the data, we have to consider that it could grow
exponentially.
The hypothesis is: 300K users, an average of 100 friends each, and 200
documents each.


Br,
Benjamin.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Content-privacy-search-index-tp3873462p3874982.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Content privacy, search & index

2012-04-01 Thread Paul Libbrecht
Hello Benjamin,

Le 1 avr. 2012 à 11:48, dbenjamin a écrit :
> You lost me :-)
> You mean implementing a specific RequestHandler just for my needs ?

I think a QueryComponent is enough; your component would extend QueryComponent.
Its prepare method reads all the params and calls the ResponseBuilder's
setQuery with the redefined query.

> Also, when you say "It'd transform a query for "a b"

this is an example query from the client.
If you run the QueryParser on it, you get a BooleanQuery with a TermQuery
clause for a (in the "default field") and a TermQuery for b (in the "default
field"). This is done for you if you call super.prepare and then collect the
query: it's probably a BooleanQuery, or you wrap it.

> into "+(a b) +(authorizedBit)"", that's not so clear to me, do you mind 
> explaining this
> like i was a 6 years old ? ;-) (even if I think that's just a matter of
> syntax...)

you'd do something such as the following:

// assemble a BooleanQuery bq2 with all the necessary bits (e.g. the
// TermQuerys that say owner:)

bq = new BooleanQuery();
bq1 = new BooleanQuery();
// add TermQuerys for a and b into bq1
bq.add(bq1, BooleanClause.Occur.MUST);  // that's the +
bq.add(bq2, BooleanClause.Occur.MUST);  // and another +
// assemble bq3 that would "prefer" particular things, e.g. prefer things of
// users in my group
bq.add(bq3, BooleanClause.Occur.SHOULD); // no +, just impacts weight but is
// not required

That's the way I implement query-expansion.
I'm afraid I do not know of a place where this is documented.

paul

> 
> Indeed, the friend list will obviously be cached.
> 
> Thanks.
> 
> Br,
> Benjamin.
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Content-privacy-search-index-tp3873462p3874961.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: reproducibility of query results

2012-04-01 Thread Ahmet Arslan
> I appear to be observing some
> unpredictability in query results, and I
> wanted to eliminate Solr itself as a possible cause.
> 
> Using 1.4 at the moment. I insert a stack of document (using
> the
> EmbeddedSolrServer) and then run a query, retrieving 200
> results. (A
> significant fraction of the docs in the index). Should I
> expect to get
> precisely the same docs in the same order with the same
> scores every time
> that I do this?

If your index does not change, yes, you can expect this. If you add/delete
docs, score and order can change.


Re: reproducibility of query results

2012-04-01 Thread Benson Margulies
I make a new index each iteration. If I insert the same docs in the
same order, should I expect the same query results? Note that I shut
down entirely after the adds, then run the queries in a new process.

On Apr 1, 2012, at 11:37 AM, Ahmet Arslan  wrote:

>> I appear to be observing some
>> unpredictability in query results, and I
>> wanted to eliminate Solr itself as a possible cause.
>>
>> Using 1.4 at the moment. I insert a stack of document (using
>> the
>> EmbeddedSolrServer) and then run a query, retrieving 200
>> results. (A
>> significant fraction of the docs in the index). Should I
>> expect to get
>> precisely the same docs in the same order with the same
>> scores every time
>> that I do this?
>
> If your index does not change, yes you can expect this. If you add/delete 
> docs score and order can change.


Re: 'foruns' don't match 'forum' with NGramFilterFactory (or EdgeNGramFilterFactory)

2012-04-01 Thread Bráulio Bhavamitra
Using edismax as the defType made it work; could anybody explain why?

(No stemming is enabled now.)

bráulio

2012/2/14 Bráulio Bhavamitra 

> Hello all,
>
> I'm experimenting with NGramFilterFactory and EgdeNGramFilterFactory.
>
> Both of them show a match in my Solr admin analysis, but when I query
> 'foruns'
> it doesn't find any 'forum'.
> analysis
> http://bhakta.casadomato.org:8982/solr/admin/analysis.jsp?nt=type&name=text&verbose=on&highlight=on&val=f%C3%B3runs&qverbose=on&qval=f%C3%B3runs
> search
> http://bhakta.casadomato.org:8982/solr/select/?q=foruns&version=2.2&start=0&rows=10&indent=on
>
> Anybody knows what's the problem?
>
> bráulio
>


Re: index the links having a certain website

2012-04-01 Thread Manuel Antonio Novoa Proenza
Hi Marcelo,

I certainly want to index HTML documents, but I would like to store separately
from the text the links they contain, so that I can, for example, ask: which
links to external websites does a given page have?

I reiterate that my English is very bad, so I use a translator.

Thank you very much

Manuel

Regards...
Manuel Antonio Novoa Proenza
Universidad de las Ciencias Informáticas
Email: mano...@estudiantes.uci.cu


----- Original Message -----

From: "Marcelo Carvalho Fernandes" 
To: solr-user@lucene.apache.org
Sent: Sunday, April 1, 2012 5:12:34
Subject: Re: index the links having a certain website

Hi Manuel,

Do you mean you need to index html files?
What kind of search do you imagine doing?

---
Marcelo Carvalho Fernandes

On Friday, March 30, 2012, Manuel Antonio Novoa Proenza <
mano...@estudiantes.uci.cu> wrote:
>
>
>
>
>
> Hello
>
> I'm not good with English, and therefore I had to resort to a translator.
>
> I have the following question ...
>
> How can I index the links that a certain website contains?
>
> regards
>
> ManP
>
>
>

--

Marcelo Carvalho Fernandes
+55 21 8272-7970
+55 21 2205-2786





Re: Position Solr results

2012-04-01 Thread Manuel Antonio Novoa Proenza


Hi Marcelo,

In that sense I think the score does not help me. The score is a number, and
it does not tell me at which position in the generated results a given site
appears.

For example:

I perform the following query: q=university

Solr generates several results, among which is the one for a certain website.
Does Solr have some mechanism that lets me know at which position this result
appears?

I reiterate that my English is very bad, so I use a translator.

Thank you very much

Manuel

Regards...
Manuel Antonio Novoa Proenza
Universidad de las Ciencias Informáticas
Email: mano...@estudiantes.uci.cu
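
Solr itself does not return a rank field; the position of a document is implicit in the order of the docs array, so the client usually computes it as the start offset plus the index within the returned page. A minimal sketch (site values and numbers below are hypothetical, not from the poster's index):

```java
import java.util.Arrays;
import java.util.List;

public class ResultRank {

    // 1-based position of the first result whose site field matches the
    // target, or -1 if it is not on the returned page. "start" is the page
    // offset that was sent to Solr (the start parameter).
    static int rankOf(List<String> pageOfSites, int start, String target) {
        for (int i = 0; i < pageOfSites.size(); i++) {
            if (pageOfSites.get(i).equals(target)) {
                return start + i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical page (start=10, rows=3) of results for q=university
        List<String> page = Arrays.asList(
                "www.example.edu", "www.uci.cu", "www.other.org");
        System.out.println(rankOf(page, 10, "www.uci.cu")); // prints 12
    }
}
```

If the site can appear beyond the first page, the client has to keep paging (increasing start) until it finds the document or exhausts the results.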


----- Original Message -----

From: "Marcelo Carvalho Fernandes" 
To: solr-user@lucene.apache.org
Sent: Sunday, April 1, 2012 5:14:50
Subject: Re: Position Solr results

Try using the "score" field in the search results.

---
Marcelo Carvalho Fernandes

On Friday, March 30, 2012, Manuel Antonio Novoa Proenza <
mano...@estudiantes.uci.cu> wrote:
>
>
>
>
>
> Hi
>
> I'm not good with English, and for this reason I had to resort to a
translator.
>
> I have the following question ...
>
> How can I get the position at which a certain website appears in the Solr
results generated for a given search criterion?
>
> regards
>
> ManP
>
>
>
>
>
>

--

Marcelo Carvalho Fernandes
+55 21 8272-7970
+55 21 2205-2786





Re: reproducibility of query results

2012-04-01 Thread Ahmet Arslan
> i make a new index each iteration. if
> I insert the same docs in the
> same order, should I expect the same query results? Note
> that I shut
> down entirely after the adds, then in a new process run the
> queries.

By saying a new index, you mean you create an empty, new index? Yes, you
should see the same query results (if you insert the same docs and use the
same analysis).



RE: reproducibility of query results

2012-04-01 Thread Steven A Rowe
If your results are only sorted by score, it's possible that some have exactly 
the same score.  Unless you use a secondary sort, I don't think the order of 
returned results among same-scored hits is guaranteed.  As a result, if you cut 
off hits at some fixed threshold, you could see different entries at the 
low-scoring end of the hit list. - Steve
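The effect Steve describes can be reproduced outside Solr: with only a score comparator, equal-scored hits have no single defined order, while a secondary sort on the unique key pins them down. A self-contained sketch (class and field names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TieBreakSort {

    static final class Hit {
        final String id;
        final float score;
        Hit(String id, float score) { this.id = id; this.score = score; }
    }

    // Primary sort: score descending. Secondary sort: unique id ascending,
    // which makes the order of same-scored hits deterministic.
    static List<String> order(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(Comparator.comparingDouble((Hit h) -> -h.score)
                            .thenComparing(h -> h.id));
        List<String> ids = new ArrayList<>();
        for (Hit h : copy) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("doc2", 1.0f), new Hit("doc1", 1.0f), new Hit("doc3", 2.0f));
        // doc3 first (highest score); doc1 before doc2 via the tie-break
        System.out.println(order(hits)); // prints [doc3, doc1, doc2]
    }
}
```

In Solr request terms this corresponds to something like sort=score desc, id asc (assuming id is the unique key).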

-Original Message-
From: Benson Margulies [mailto:bimargul...@gmail.com] 
Sent: Sunday, April 01, 2012 12:09 PM
To: solr-user@lucene.apache.org
Subject: Re: reproducibility of query results

i make a new index each iteration. if I insert the same docs in the same order, 
should I expect the same query results? Note that I shut down entirely after 
the adds, then in a new process run the queries.

On Apr 1, 2012, at 11:37 AM, Ahmet Arslan  wrote:

>> I appear to be observing some
>> unpredictability in query results, and I wanted to eliminate Solr 
>> itself as a possible cause.
>>
>> Using 1.4 at the moment. I insert a stack of document (using the
>> EmbeddedSolrServer) and then run a query, retrieving 200 results. (A 
>> significant fraction of the docs in the index). Should I expect to 
>> get precisely the same docs in the same order with the same scores 
>> every time that I do this?
>
> If your index does not change, yes you can expect this. If you add/delete 
> docs score and order can change.


Re: reproducibility of query results

2012-04-01 Thread Benson Margulies
On Sun, Apr 1, 2012 at 1:05 PM, Steven A Rowe  wrote:

> If your results are only sorted by score, it's possible that some have
> exactly the same score.  Unless you use a secondary sort, I don't think the
> order of returned results among same-scored hits is guaranteed.  As a
> result, if you cut off hits at some fixed threshold, you could see
> different entries at the low-scoring end of the hit list. - Steve
>

Thanks.


>
> -Original Message-
> From: Benson Margulies [mailto:bimargul...@gmail.com]
> Sent: Sunday, April 01, 2012 12:09 PM
> To: solr-user@lucene.apache.org
> Subject: Re: reproducibility of query results
>
> i make a new index each iteration. if I insert the same docs in the same
> order, should I expect the same query results? Note that I shut down
> entirely after the adds, then in a new process run the queries.
>
> On Apr 1, 2012, at 11:37 AM, Ahmet Arslan  wrote:
>
> >> I appear to be observing some
> >> unpredictability in query results, and I wanted to eliminate Solr
> >> itself as a possible cause.
> >>
> >> Using 1.4 at the moment. I insert a stack of document (using the
> >> EmbeddedSolrServer) and then run a query, retrieving 200 results. (A
> >> significant fraction of the docs in the index). Should I expect to
> >> get precisely the same docs in the same order with the same scores
> >> every time that I do this?
> >
> > If your index does not change, yes you can expect this. If you
> add/delete docs score and order can change.
>


Re: ExtractingRequestHandler

2012-04-01 Thread Erick Erickson
Yes, you can, but generally storing the raw input in Solr is
not the best approach. The problem here is that pretty soon
you get a huge index that contains *everything*. Solr was not
intended to be a data store.

Besides, you then need to store the binary form of the file. Solr
only deals with text, not markup.

Most people index the text in Solr, plus enough information
so the application knows where to go to fetch the original
document when the user drills down (e.g. file path, database
PK, etc.). Would that work for your situation?
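As a sketch of that pattern, the schema could keep the extracted text indexed-only and store just the pointer back to the original (field and type names here are illustrative, not from the poster's schema):

```xml
<!-- searchable body: indexed but not stored, so the index stays small -->
<field name="content"   type="text_general" indexed="true"  stored="false"/>
<!-- pointers the application uses to fetch the original document -->
<field name="id"        type="string"       indexed="true"  stored="true" required="true"/>
<field name="file_path" type="string"       indexed="false" stored="true"/>
```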

Best
Erick

On Sat, Mar 31, 2012 at 3:55 PM,   wrote:
> Hi,
>
> I want to index various filetypes in Solr; this can easily be done with
> ExtractingRequestHandler. But I also need the extracted content back.
> I know ext.extract.only, but then nothing gets indexed, right?
>
> Can I index the document AND get the content back as with ext.extract.only?
> In a single request?
>
> Thank you
>
>


RE: ExtractingRequestHandler

2012-04-01 Thread spring
Hi Erick,

I think we have some misunderstanding.

I want to index the text of the docs in Solr (only indexed, NOT stored).

But I want the text (Tika output) back for:

* faster reindexing later (some text extraction, like OCR, takes really long)
* using the text for other processing

The original doc is NOT stored in solr.


So my question was whether I can index the original doc via
ExtractingRequestHandler in Solr AND get back the text output, in a single
call.

AFAIK I can do it only in 2 calls:

1) ExtractingRequestHandler?ext.extract.only=true -> Text
2) Index the text from 1) in solr


Thx 
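
For reference, the two-call flow being described looks roughly like this (paths and parameter names follow the thread; the exact parameter spelling may differ between Solr versions, and the field names are examples only):

```
# Call 1: extraction only -- Tika output is returned, nothing is indexed
POST /solr/update/extract?ext.extract.only=true
     (binary document in the request body)
  -> response body contains the extracted text

# Call 2: index the text from call 1 as a regular document
POST /solr/update
  <add><doc>
    <field name="id">doc-1</field>
    <field name="content">...text from call 1...</field>
  </doc></add>
```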

> Yes, you can. but Generally, storing the raw input in Solr is
> not the best approach. The problem here is that pretty soon
> you get a huge index that contains *everything*. Solr was not
> intended to be a data store.
> 
> Besides, you then need to store the binary form of the file. Solr
> only deals with text, not markup.
> 
> Most people index the text in Solr, and enough information
> so the application knows where to go to fetch the original
> document when the user drills down (e.g. file path, database
> PK, etc). Would that work for your situation?
> 
> Best
> Erick
> 
> On Sat, Mar 31, 2012 at 3:55 PM,   wrote:
> > Hi,
> >
> > I want to index various filetypes in solr, this can easily done with
> > ExtractingRequestHandler. But I also need the extracted 
> content back.
> > I know ext.extract.only but then nothing gets indexed, right?
> >
> > Can I index the document AND get the content back as with 
> ext.extract.only?
> > In a single request?
> >
> > Thank you
> >
> >
> 



Re: 'foruns' don't match 'forum' with NGramFilterFactory (or EdgeNGramFilterFactory)

2012-04-01 Thread Erick Erickson
It's really hard to explain when there's not much background info. You
haven't provided much to analyze. You might review:

http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

2012/4/1 Bráulio Bhavamitra :
> Using edismax as the defType made it work; could anybody explain why?
>
> (No stemming is enabled now.)
>
> bráulio
>
> 2012/2/14 Bráulio Bhavamitra 
>
>> Hello all,
>>
>> I'm experimenting with NGramFilterFactory and EgdeNGramFilterFactory.
>>
>> Both of them show a match in my Solr admin analysis, but when I query
>> 'foruns'
>> it doesn't find any 'forum'.
>> analysis
>> http://bhakta.casadomato.org:8982/solr/admin/analysis.jsp?nt=type&name=text&verbose=on&highlight=on&val=f%C3%B3runs&qverbose=on&qval=f%C3%B3runs
>> search
>> http://bhakta.casadomato.org:8982/solr/select/?q=foruns&version=2.2&start=0&rows=10&indent=on
>>
>> Anybody knows what's the problem?
>>
>> bráulio
>>


Re: ExtractingRequestHandler

2012-04-01 Thread Erick Erickson
Ahhh, OK. Sure, anything you store in Solr you can get back. The key
is not Tika but your schema.xml file, and setting 'stored="true"'.

bq: So my question was if I can index the original doc via
ExtractingRequestHandler in Solr AND get back the text output, in a single
call.

I know of no way to do this using Solr Cell. That said, you can always
use SolrJ and Tika on the client to separate the Tika parsing from
the indexing steps. Then you have all the parts available on the
client to do whatever you want.

Solr Cell is great for proof-of-concept, but for heavy-duty applications
you're offloading all the processing onto the Solr server, which can be a
problem.

Here's a writeup describing how to use Tika independently of
Solr while indexing data to Solr that might help:

http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/

Hope that helps
Erick

On Sun, Apr 1, 2012 at 1:27 PM,   wrote:
> Hi Erik,
>
> I think we have some misunderstanding.
>
> I want to index the text of the docs in Solr (only indexed, NOT stored).
>
> But I want the text (Tika output) back for:
>
> * later faster reindexing (some text extraction like OCR takes really long)
> * use the text for other processings
>
> The original doc is NOT stored in solr.
>
>
> So my question was if I can index the original doc via
> ExtractingRequestHandler in Solr AND get back the text output, in a single
> call.
>
> AFAIK I can do it only in 2 calls:
>
> 1) ExtractingRequestHandler?ext.extract.only=true -> Text
> 2) Index the text from 1) in solr
>
>
> Thx
>
>> Yes, you can. but Generally, storing the raw input in Solr is
>> not the best approach. The problem here is that pretty soon
>> you get a huge index that contains *everything*. Solr was not
>> intended to be a data store.
>>
>> Besides, you then need to store the binary form of the file. Solr
>> only deals with text, not markup.
>>
>> Most people index the text in Solr, and enough information
>> so the application knows where to go to fetch the original
>> document when the user drills down (e.g. file path, database
>> PK, etc). Would that work for your situation?
>>
>> Best
>> Erick
>>
>> On Sat, Mar 31, 2012 at 3:55 PM,   wrote:
>> > Hi,
>> >
>> > I want to index various filetypes in solr, this can easily done with
>> > ExtractingRequestHandler. But I also need the extracted
>> content back.
>> > I know ext.extract.only but then nothing gets indexed, right?
>> >
>> > Can I index the document AND get the content back as with
>> ext.extract.only?
>> > In a single request?
>> >
>> > Thank you
>> >
>> >
>>
>


Re: ExtractingRequestHandler

2012-04-01 Thread Bill Bell
I have had good luck with creating a separate core for just the stored data.
This is a different core from the indexed core.

Very fast.

Bill Bell
Sent from mobile


On Apr 1, 2012, at 11:15 AM, Erick Erickson  wrote:

> Yes, you can. but Generally, storing the raw input in Solr is
> not the best approach. The problem here is that pretty soon
> you get a huge index that contains *everything*. Solr was not
> intended to be a data store.
> 
> Besides, you then need to store the binary form of the file. Solr
> only deals with text, not markup.
> 
> Most people index the text in Solr, and enough information
> so the application knows where to go to fetch the original
> document when the user drills down (e.g. file path, database
> PK, etc). Would that work for your situation?
> 
> Best
> Erick
> 
> On Sat, Mar 31, 2012 at 3:55 PM,   wrote:
>> Hi,
>> 
>> I want to index various filetypes in solr, this can easily done with
>> ExtractingRequestHandler. But I also need the extracted content back.
>> I know ext.extract.only but then nothing gets indexed, right?
>> 
>> Can I index the document AND get the content back as with ext.extract.only?
>> In a single request?
>> 
>> Thank you
>> 
>> 


Open deleted index file failing jboss shutdown with Too many open files Error

2012-04-01 Thread Gopal Patwa
I am using a Solr 4.0 nightly build with NRT, and I often get this
error during auto commit: "Too many open files". I have searched this forum,
and what I found is that it is related to the OS ulimit setting; please see my
ulimit settings below. I am not sure what ulimit setting I should have for
open files. ulimit -n unlimited?

Even if I set it to a higher number, that will just delay the issue until it
reaches the new open-file limit. What I have seen is that Solr keeps deleted
index files open in the Java process, which prevents our application server
(JBoss) from shutting down gracefully.
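
For what it's worth, a common way to raise the limit on CentOS looks like the following ("unlimited" is usually not accepted for -n; the user name and value here are examples only):

```shell
# check the limit in effect for the user running JBoss/Solr
ulimit -n

# raise it for the current shell session
ulimit -n 65536

# make it persistent in /etc/security/limits.conf:
#   jboss  soft  nofile  65536
#   jboss  hard  nofile  65536
```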

I have seen that this issue was recently resolved in Lucene; is that true?

https://issues.apache.org/jira/browse/LUCENE-3855


I have 3 cores with index sizes: Core1 - 70GB, Core2 - 50GB and Core3 - 15GB,
with a single shard.

We update the index every 5 seconds, soft commit every 1 second and hard
commit every 15 minutes

Environment: JBoss 4.2, JDK 1.6 64-bit, CentOS, JVM heap size = 24GB


ulimit:

core file size  (blocks, -c) 0

data seg size   (kbytes, -d) unlimited

scheduling priority (-e) 0

file size   (blocks, -f) unlimited

pending signals (-i) 401408

max locked memory   (kbytes, -l) 1024

max memory size (kbytes, -m) unlimited

open files  (-n) 4096

pipe size(512 bytes, -p) 8

POSIX message queues (bytes, -q) 819200

real-time priority  (-r) 0

stack size  (kbytes, -s) 10240

cpu time   (seconds, -t) unlimited

max user processes  (-u) 401408

virtual memory  (kbytes, -v) unlimited

file locks  (-x) unlimited


ERROR:

2012-04-01 20:08:35,323 [] priority=ERROR app_name= thread=pool-10-thread-1 location=CommitTracker line=93 auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1138)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1251)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:409)
        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:197)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /opt/mci/data/srwp01mci001/inventory/index/_4q1y_0.tip (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:449)
        at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:288)
        at org.apache.lucene.codecs.BlockTreeTermsWriter.<init>(BlockTreeTermsWriter.java:161)
        at org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:66)
        at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:118)
        at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:322)
        at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:92)
        at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
        at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
        at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
        at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:475)
        at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
        at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:553)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:354)
        at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:258)
        at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:243)
        at org.apache.lucene.index.Director

Re: Luke using shards

2012-04-01 Thread Lance Norskog
> (http://localhost:8983/solr/admin/luke?shards=localhost:8983/solr,localhost:7574/solr)

If Luke did distributed search, it would go into an infinite loop :)

On Thu, Mar 29, 2012 at 5:03 AM, Dmitry Kan  wrote:
> One option to try here (not verified) is to set up a Solr front that will
> point to these two shards. Then try accessing its luke interface via admin
> as you did on one of the shards.
>
> But as Erick already pointed out, Luke operates on a lower level than Solr,
> so this does not necessarily work.
>
> Dmitry
>
> On Wed, Mar 28, 2012 at 11:02 PM, Dennis Brundage
> wrote:
>
>> Is there a way to get Solr/Luke to return the aggregated results across
>> shards? I tried setting the shards parameter
>> (
>> http://localhost:8983/solr/admin/luke?shards=localhost:8983/solr,localhost:7574/solr
>> )
>> but only got the results for localhost:8983. I am able to search across the
>> shards so my url's are correct.
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Luke-using-shards-tp3865816p3865816.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Regards,
>
> Dmitry Kan



-- 
Lance Norskog
goks...@gmail.com


Re: Trouble handling Unit symbol

2012-04-01 Thread Rajani Maski
Thank you for the reply.



On Sat, Mar 31, 2012 at 3:38 AM, Chris Hostetter
wrote:

>
> : We have data having such symbols like :  µ
> : Indexed data has  -Dose:"0 µL"
> : Now , when  it is searched as  - Dose:"0 µL"
>...
> : Query Q value observed  : S257:"0 ÂµL/injection"
>
> First off: your "when searched as" example does not match up to your
> "Query Q" observed value (ie: field queries, extra "/injection" text at
> the end) suggesting that you maybe cut/paste something you didn't mean to
> -- so take the rest of this advice with a grain of salt.
>
> If i ignore your "when it is searched as" example and focus entirely on
> what you say you've indexed the data as, and the Q value you are using (in
> what looks like the echoParams output), then the first thing that jumps out
> at me is that it looks like your servlet container (or perhaps your web
> browser, if that's where you tested this) is not dealing with the unicode
> correctly -- because although i see a "µ" in the first three lines i
> quoted above (UTF8: 0xC2 0xB5), in your value observed i'm seeing it
> preceded by a "Â" (UTF8: 0xC3 0x82) ... suggesting that perhaps the "µ"
> did not get URL encoded properly when the request was made to your servlet
> container?
>
> In particular, you might want to take a look at...
>
>
> https://wiki.apache.org/solr/FAQ#Why_don.27t_International_Characters_Work.3F
> http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
> The example/exampledocs/test_utf8.sh script included with solr
>
>
>
>
> -Hoss
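
A "µ" showing up as "µ" (0xC3 0x82 followed by the original second byte) is the classic sign of UTF-8 bytes being decoded as Latin-1 somewhere on the way in. One way to rule out the client side is to percent-encode the query value as UTF-8 explicitly before building the URL; a small sketch:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8Query {

    // Percent-encode a parameter value as UTF-8 so that the micro sign
    // travels as %C2%B5 and survives the trip to the servlet container
    // (assuming the container is configured to decode URIs as UTF-8).
    static String encode(String value) throws Exception {
        return URLEncoder.encode(value, StandardCharsets.UTF_8.name());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(encode("0 µL")); // prints 0+%C2%B5L
    }
}
```

With Tomcat this only helps if URIEncoding="UTF-8" is also set on the connector, as the SolrTomcat wiki page linked above describes.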


Re: SolrCloud

2012-04-01 Thread asia
Then what exactly does SolrCloud do? Because when I fire a query, I am getting
a response even without ZooKeeper.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-tp3867086p3876820.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud

2012-04-01 Thread asia
Thanks for replying.
So if I make a replica of each shard, should I use ZooKeeper for every shard
and replica, or only for the replica? One more question I want to ask: I am
using Solr in a Tomcat and Eclipse environment using SolrJ, so I am a bit
confused about how to use ZooKeeper in it along with Tomcat. I have downloaded
the ZooKeeper jar files but need a little help with it.
-Asia

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-tp3867086p3876869.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud new....

2012-04-01 Thread asia
Hello,
I am working on the same thing. I have tried the wiki example but I am
getting errors. I want to use ZooKeeper with SolrJ in Eclipse using Tomcat,
and need a little help with how to integrate ZooKeeper in Eclipse for
SolrCloud.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-new-tp1528872p3876928.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Responding to Requests with Chunks/Streaming

2012-04-01 Thread Mikhail Khludnev
Hello,

Small update: reading the streamed response is done via a callback, so no
SolrDocumentList is held in memory.
https://github.com/m-khl/solr-patches/tree/streaming
here is the test
https://github.com/m-khl/solr-patches/blob/d028d4fabe0c20cb23f16098637e2961e9e2366e/solr/core/src/test/org/apache/solr/response/ResponseStreamingTest.java#L138

no progress in distributed search via streaming yet.

Please let me know if you don't want to receive updates from my playground.

Regards
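(Editorial aside for readers skimming the thread: the callback idea above can be sketched in plain Java. This is an illustrative toy, not Solr's actual API; `DocCallback` and `search` are invented names.)

```java
public class CallbackStreamingSketch {
    // Invoked once per matching document; the consumer never sees a full list
    interface DocCallback {
        void onDoc(int docId);
    }

    // Pretend search: pushes each hit to the callback as soon as it is found,
    // instead of accumulating a SolrDocumentList-style buffer in memory.
    static void search(int numFound, DocCallback cb) {
        for (int id = 0; id < numFound; id++) {
            cb.onDoc(id);
        }
    }

    public static void main(String[] args) {
        final int[] seen = {0};
        search(1000000, new DocCallback() {
            public void onDoc(int docId) { seen[0]++; }
        });
        // A million hits consumed with O(1) memory on the reading side
        System.out.println(seen[0]);
    }
}
```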

On Thu, Mar 29, 2012 at 1:02 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> @All
> Why does nobody want such a pretty cool feature?
>
> Nicholas,
> I have made a little progress: I'm able to stream in JavaBin codec format
> while searching; it implies sorting by _docid_.
>
> here is the diff
>
> https://github.com/m-khl/solr-patches/commit/2f9ff068c379b3008bb983d0df69dff714ddde95
>
> The current issue is that reading the response with SolrJ is done as a
> whole. Reading via a callback is supported by EmbeddedServer only; anyway,
> it should not be a big deal. ResponseStreamingTest.java somehow works.
> I'm stuck on introducing response streaming in distributed search; it's
> actually more challenging, and RespStreamDistributedTest fails.
>
> Regards
>
>
> On Fri, Mar 16, 2012 at 3:51 PM, Nicholas Ball 
> wrote:
>
>>
>> Mikhail & Ludovic,
>>
>> Thanks for both your replies, very helpful indeed!
>>
>> Ludovic, I was actually looking into just that and did some tests with
>> SolrJ, it does work well but needs some changes on the Solr server if we
>> want to send out individual documents at various times. This could be done
>> with a write() and flush() to the FastOutputStream (daos) in JavaBinCodec.
>> I therefore think that a combination of this and Mikhail's solution would
>> work best!
>>
>> Mikhail, you mention that your solution doesn't currently work and you're
>> not sure why; could it be that you haven't flushed the data (os.flush())
>> you've written in the collect method of DocSetStreamer? I think placing
>> the output stream into the SolrQueryRequest is the way to go,
>> so that we can access it and write to it how we intend. However, I think
>> using the JavaBinCodec would be ideal so that we can work with SolrJ
>> directly, and not mess around with the encoding of the docs/data etc...
>>
>> At the moment the entry point to JavaBinCodec is through the
>> BinaryResponseWriter which calls the highest level marshal() method which
>> decodes and sends out the entire SolrQueryResponse (line 49 @
>> BinaryResponseWriter). What would be ideal is to be able to break up the
>> response and call the JavaBinCodec for pieces of it with a flush after
>> each
>> call. Did a few tests with a simple Thread.sleep and a flush to see if
>> this
>> would actually work and looks like it's working out perfectly. Just trying
>> to figure out the best way to actually do it now :) any ideas?
>>
>> On another note, for a solution to work with chunked transfer encoding
>> (and therefore web browsers), a lot more development is going to be
>> needed. Not sure if it's worth trying yet, but I might look into it later
>> down the line.
>>
>> Nick
>>
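(Editorial aside, not part of the original email: the chunked transfer encoding mentioned above has a simple wire format per RFC 2616 §3.6.1: each chunk is its payload size in hex, CRLF, the data, CRLF, and a zero-length chunk terminates the stream. A toy encoder, with an invented class name:)

```java
import java.nio.charset.StandardCharsets;

public class ChunkedEncodingSketch {
    // One HTTP/1.1 chunk: hex byte-length, CRLF, payload, CRLF
    static String chunk(String data) {
        int len = data.getBytes(StandardCharsets.UTF_8).length;
        return Integer.toHexString(len) + "\r\n" + data + "\r\n";
    }

    public static void main(String[] args) {
        StringBuilder body = new StringBuilder();
        // Each document could be flushed to the client as its own chunk
        for (String doc : new String[]{"{\"id\":1}", "{\"id\":22}"}) {
            body.append(chunk(doc));
        }
        body.append("0\r\n\r\n"); // zero-length chunk ends the stream
        System.out.print(body.toString().replace("\r\n", "\\r\\n"));
    }
}
```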
>> On Fri, 16 Mar 2012 07:29:20 +0300, Mikhail Khludnev
>>  wrote:
>> > Ludovic,
>> >
>> > I looked through it. First of all, it seems to me you don't amend the
>> > regular "servlet" Solr server, but only the embedded one.
>> > Anyway, the difference is that you stream the DocList via a callback, but
>> > that means you've instantiated it in memory and keep it there until it is
>> > completely consumed. Think about a billion numFound. The core idea of my
>> > approach is to keep almost zero memory for the response.
>> >
>> > Regards
>> >
>> > On Fri, Mar 16, 2012 at 12:12 AM, lboutros  wrote:
>> >
>> >> Hi,
>> >>
>> >> I was looking for something similar.
>> >>
>> >> I tried this patch :
>> >>
>> >> https://issues.apache.org/jira/browse/SOLR-2112
>> >>
>> >> it's working quite well (I've back-ported the code in Solr 3.5.0...).
>> >>
>> >> Is it really different from what you are trying to achieve ?
>> >>
>> >> Ludovic.
>> >>
>> >> -
>> >> Jouve
>> >> France.
>> >> --
>> >> View this message in context:
>> >>
>>
>> http://lucene.472066.n3.nabble.com/Responding-to-Requests-with-Chunks-Streaming-tp3827316p3829909.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> ge...@yandex.ru
>
> 
>  
>
>


-- 
Sincerely yours
Mikhail Khludnev
ge...@yandex.ru