Re: pagerank??

2012-04-04 Thread Bing Li
According to my knowledge, Solr cannot support this.

In my case, I get data by keyword-matching from Solr and then rank the data
by PageRank after that.
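For illustration, a rough SolrJ sketch of that flow (the server URL and field
names are placeholders, and the pageRank map is assumed to be precomputed
elsewhere, e.g. by a crawler):

import java.util.*;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class PageRankReRank {
    // fetch keyword matches from Solr, then re-order them by a precomputed
    // PageRank score keyed on the uniqueKey field
    public static List<SolrDocument> search(String keywords,
            final Map<String, Double> pageRank) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        List<SolrDocument> docs = new ArrayList<SolrDocument>(
                server.query(new SolrQuery(keywords)).getResults());
        Collections.sort(docs, new Comparator<SolrDocument>() {
            public int compare(SolrDocument a, SolrDocument b) {
                Double ra = pageRank.get((String) a.getFieldValue("id"));
                Double rb = pageRank.get((String) b.getFieldValue("id"));
                // documents without a known PageRank sort last
                return Double.compare(rb == null ? 0.0 : rb, ra == null ? 0.0 : ra);
            }
        });
        return docs;
    }
}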

Thanks,
Bing

On Wed, Apr 4, 2012 at 6:37 AM, Manuel Antonio Novoa Proenza <
mano...@estudiantes.uci.cu> wrote:

> Hello,
>
> I have many indexed documents in my Solr index.
>
> Please let me know of any way, or an efficient function, to calculate the
> PageRank of the indexed websites.
>


Re: pagerank??

2012-04-04 Thread Ravish Bhagdev
You might want to look into Nutch and its LinkRank instead of Solr for
this.  For obtaining such information, you need a crawler to crawl through
the links.  Not what Solr is meant for.

Rav

On Wed, Apr 4, 2012 at 8:46 AM, Bing Li  wrote:

> According to my knowledge, Solr cannot support this.
>
> In my case, I get data by keyword-matching from Solr and then rank the data
> by PageRank after that.
>
> Thanks,
> Bing
>
> On Wed, Apr 4, 2012 at 6:37 AM, Manuel Antonio Novoa Proenza <
> mano...@estudiantes.uci.cu> wrote:
>
> > Hello,
> >
> > I have many indexed documents in my Solr index.
> >
> > Please let me know of any way, or an efficient function, to calculate the
> > PageRank of the indexed websites.
> >
>


Re: UTF-8 encoding

2012-04-04 Thread henri
I have finally solved my problem!!

Did the following:

added two lines in the /browse requestHandler
   <str name="v.properties">velocity.properties</str>
   <str name="v.contentType">text/html;charset=UTF-8</str>

Moved velocity.properties from solr/conf/velocity to solr/conf
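For reference, a sketch of how those two lines can sit in the handler config
(the surrounding /browse definition is assumed from the 3.x example
solrconfig.xml, and the exact content-type parameter name may vary by version):

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.properties">velocity.properties</str>
    <str name="v.contentType">text/html;charset=UTF-8</str>
  </lst>
</requestHandler>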

Not being an expert, I am not 100% sure this is the "best" solution, and
where and how it should be documented in the solr/velocity package. I will
leave this doc update to aficionados.

Cheers to all,
Henri

--
View this message in context: 
http://lucene.472066.n3.nabble.com/UTF-8-encoding-tp3867885p3883485.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Incrementally updating a VERY LARGE field - Is this possible?

2012-04-04 Thread Ravish Bhagdev
Updating a single field is not possible in solr.  The whole record has to
be rewritten.

300 MB is still not that big a file.  Have you tried doing the indexing (if
it's only a one-time thing) by giving it ~2 GB of Xmx?

A single file with that size is strange!  May I ask what it is?

Rav

On Tue, Apr 3, 2012 at 7:32 PM, vybe3142  wrote:

>
> Some days ago, I posted about an issue with SOLR running out of memory when
> attempting to index large text files (say 300 MB ). Details at
>
> http://lucene.472066.n3.nabble.com/Solr-Tika-crashing-when-attempting-to-index-large-files-td3846939.html
>
> Two things I need to point out:
>
> 1. I don't need Tika for content extraction as the files are already in
> plain text format.
> 2. The heap space error was caused by a futile Tika/SOLR attempt at
> creating
> the corresponding huge XML document in memory
>
> I've decided to develop a custom handler that
> 1. reads the file text directly
> 2. attempts to create a SOLR document and directly add the text data to the
> corresponding field.
>
> One approach I've taken is to read manageable chunks of text data
> sequentially from the file and process. We've used this approach
> successfully
> with Lucene in the past and I'm attempting to make it work with SOLR too. I
> got most of the work done yesterday, but need a bit of guidance w.r.t.
> point
> 2.
>
> How can I achieve updating the same field multiple times? Looking at the
> SOLR source, processor.addField() merely
> a. adds to the in-memory field map and
> b. attempts to write EVERYTHING to the index later on.
>
> In my situation, (a) eventually causes a heap space error.
>
>
>
>
> Here's part of the handler code.
>
>
>
> Thanks much
>
> Thanks
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Incremantally-updating-a-VERY-LARGE-field-Is-this-possibe-tp3881945p3881945.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


query time customized boosting

2012-04-04 Thread monmohan
Hi,
My index is composed of documents with an "author" field. My system is a
users portal where they can have a friend relationship among each other.
When a user searches for documents, I would like to boost score of docs in
which  author is friend of the user doing the search. Note that the list of
friends for a user can be potentially big and dynamic (changing as the user
makes more friends)

Is there a way to do this kind of boosting at query time? I have looked at
External field, query elevator and function queries, but it appears that none
of them fit this need.

Since the list of friends for a user is dynamic and per user based, it can't
really be added as a field in the index for each document so I am not
considering that option at all.
Regards
Monmohan

--
View this message in context: 
http://lucene.472066.n3.nabble.com/query-time-customized-boosting-tp3883743p3883743.html
Sent from the Solr - User mailing list archive at Nabble.com.


Choosing tokenizer based on language of document

2012-04-04 Thread Prakashganesh, Prabhu
Hi,
  I have documents in different languages and I want to choose the 
tokenizer to use for a document based on the language of the document. The 
language of the document is already known and is indexed in a field. What I 
want to do is when I index the text in the document, I want to choose the 
tokenizer to use based on the value of the language field. I want to use one 
field for the text in the document (defining multiple fields for each language 
is not an option). It seems like I can define a tokenizer for a field, so I 
guess what I need to do is to write a custom tokenizer that looks at the 
language field value of the document and calls the appropriate tokenizer for 
that language (e.g. StandardTokenizer for English, CJKTokenizer for CJK 
languages etc.). From what I have read, it seems quite straightforward to
write a custom tokenizer, but how would this custom tokenizer know the language
of the document? Is there some way I can pass this value in to the tokenizer?
Or is there some way the tokenizer will have access to other fields in the
document? It would be really helpful if someone could provide an answer.

Thanks
Prabhu


Re: Incrementally updating a VERY LARGE field - Is this possible?

2012-04-04 Thread Mikhail Khludnev
There is https://issues.apache.org/jira/browse/LUCENE-3837 but I suppose
it's too far from completion.

On Wed, Apr 4, 2012 at 2:48 PM, Ravish Bhagdev wrote:

> Updating a single field is not possible in solr.  The whole record has to
> be rewritten.
>
> 300 MB is still not that big a file.  Have you tried doing the indexing (if
> it's only a one-time thing) by giving it ~2 GB of Xmx?
>
> A single file with that size is strange!  May I ask what it is?
>
> Rav
>
> On Tue, Apr 3, 2012 at 7:32 PM, vybe3142  wrote:
>
> >
> > Some days ago, I posted about an issue with SOLR running out of memory
> when
> > attempting to index large text files (say 300 MB ). Details at
> >
> >
> http://lucene.472066.n3.nabble.com/Solr-Tika-crashing-when-attempting-to-index-large-files-td3846939.html
> >
> > Two things I need to point out:
> >
> > 1. I don't need Tika for content extraction as the files are already in
> > plain text format.
> > 2. The heap space error was caused by a futile Tika/SOLR attempt at
> > creating
> > the corresponding huge XML document in memory
> >
> > I've decided to develop a custom handler that
> > 1. reads the file text directly
> > 2. attempts to create a SOLR document and directly add the text data to
> the
> > corresponding field.
> >
> > One approach I've taken is to read manageable chunks of text data
> > sequentially from the file and process. We've used this approach
> > successfully
> > with Lucene in the past and I'm attempting to make it work with SOLR
> too. I
> > got most of the work done yesterday, but need a bit of guidance w.r.t.
> > point
> > 2.
> >
> > How can I achieve updating the same field multiple times? Looking at the
> > SOLR source, processor.addField() merely
> > a. adds to the in-memory field map and
> > b. attempts to write EVERYTHING to the index later on.
> >
> > In my situation, (a) eventually causes a heap space error.
> >
> >
> >
> >
> > Here's part of the handler code.
> >
> >
> >
> > Thanks much
> >
> > Thanks
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Incremantally-updating-a-VERY-LARGE-field-Is-this-possibe-tp3881945p3881945.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>



-- 
Sincerely yours
Mikhail Khludnev
ge...@yandex.ru


Re: Incrementally updating a VERY LARGE field - Is this possible?

2012-04-04 Thread Ravish Bhagdev
Yes, I think there are good reasons why it works like that.  The focus of a
search system is to be efficient on the query side, at the cost of being less
efficient on storage.

You must however also note that by default a field's length is limited to
10000 tokens by the maxFieldLength setting in solrconfig.xml, which you may
also need to modify.  But I guess if it's going out of memory you might have
already done this?
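For reference, in the 3.x example solrconfig.xml that setting looks roughly
like this (10000 being the shipped default):

<indexDefaults>
  <!-- fields are silently truncated after this many tokens -->
  <maxFieldLength>10000</maxFieldLength>
</indexDefaults>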

Ravish

On Wed, Apr 4, 2012 at 1:34 PM, Mikhail Khludnev  wrote:

> There is https://issues.apache.org/jira/browse/LUCENE-3837 but I suppose
> it's too far from completion.
>
> On Wed, Apr 4, 2012 at 2:48 PM, Ravish Bhagdev  >wrote:
>
> > Updating a single field is not possible in solr.  The whole record has to
> > be rewritten.
> >
> > 300 MB is still not that big a file.  Have you tried doing the indexing
> > (if it's only a one-time thing) by giving it ~2 GB of Xmx?
> >
> > A single file with that size is strange!  May I ask what it is?
> >
> > Rav
> >
> > On Tue, Apr 3, 2012 at 7:32 PM, vybe3142  wrote:
> >
> > >
> > > Some days ago, I posted about an issue with SOLR running out of memory
> > when
> > > attempting to index large text files (say 300 MB ). Details at
> > >
> > >
> >
> http://lucene.472066.n3.nabble.com/Solr-Tika-crashing-when-attempting-to-index-large-files-td3846939.html
> > >
> > > Two things I need to point out:
> > >
> > > 1. I don't need Tika for content extraction as the files are already in
> > > plain text format.
> > > 2. The heap space error was caused by a futile Tika/SOLR attempt at
> > > creating
> > > the corresponding huge XML document in memory
> > >
> > > I've decided to develop a custom handler that
> > > 1. reads the file text directly
> > > 2. attempts to create a SOLR document and directly add the text data to
> > the
> > > corresponding field.
> > >
> > > One approach I've taken is to read manageable chunks of text data
> > > sequentially from the file and process. We've used this approach
> > > successfully
> > > with Lucene in the past and I'm attempting to make it work with SOLR
> > too. I
> > > got most of the work done yesterday, but need a bit of guidance w.r.t.
> > > point
> > > 2.
> > >
> > > How can I achieve updating the same field multiple times? Looking at
> the
> > > SOLR source, processor.addField() merely
> > > a. adds to the in-memory field map and
> > > b. attempts to write EVERYTHING to the index later on.
> > >
> > > In my situation, (a) eventually causes a heap space error.
> > >
> > >
> > >
> > >
> > > Here's part of the handler code.
> > >
> > >
> > >
> > > Thanks much
> > >
> > > Thanks
> > >
> > > --
> > > View this message in context:
> > >
> >
> http://lucene.472066.n3.nabble.com/Incremantally-updating-a-VERY-LARGE-field-Is-this-possibe-tp3881945p3881945.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> ge...@yandex.ru
>
> 
>  
>


LBHttpSolrServer to query a preferred server

2012-04-04 Thread Martin Grotzke
Hi,

we want to use the LBHttpSolrServer (4.0/trunk) and specify a preferred
server. Our use case is that for one user request we make several solr
requests with some heavy caching (using a custom request handler with a
special cache) and want to make sure that the subsequent solr requests
are hitting the same solr server.

A possible solution with LBHttpSolrServer would look like this:
- LBHttpSolrServer provides a method getSolrServer() that returns a
ServerWrapper
- LBHttpSolrServer provides a method
   request(final SolrRequest request, ServerWrapper preferredServer)
  that returns the response (NamedList).

This method first tries the specified preferredServer and if this fails
queries all others (first alive servers then zombies).
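For illustration, calling code against the proposed API could look like this
(a sketch of the proposal above, not an existing API):

// getSolrServer() and the two-argument request() are the proposed additions
LBHttpSolrServer lb = new LBHttpSolrServer("http://solr1:8983/solr",
                                           "http://solr2:8983/solr");
ServerWrapper preferred = lb.getSolrServer();  // pin one server per user request
NamedList<Object> rsp1 = lb.request(new QueryRequest(new SolrQuery("q1")), preferred);
NamedList<Object> rsp2 = lb.request(new QueryRequest(new SolrQuery("q2")), preferred);
// both requests hit the same server; only if it fails are the others tried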

What do you think of this solution? Any other solution preferred?

I'll start implementing this and submit an issue/patch hoping that it
makes it into trunk.

Cheers,
Martin





Commitwithin

2012-04-04 Thread Jens Ellenberg

Hello,

I am trying to use commitWithin in Java but there seems to be no commit
at all with this option.


1. Example Code:

UpdateRequest request = new UpdateRequest();
request.deleteByQuery("fild:value");
request.setCommitWithin(1);
System.out.println(request.getCommitWithin());
server.request(request);

2. Example Code:

server.add(aSolrDocument,1);

Only after an explicit commit ( server.commit(); ) are the changes 
available.


I have deleted the "autocommit" option in the solrconfig. Has anyone an 
idea?


Greetings
Jens

--
Jens Ellenberg,
Master of Science (Informatik)

Tel   +49 (40) 39 99 76 45
Fax   +49 (40) 39 99 76 40
EMail ellenb...@silpion.de

Silpion IT-Solutions GmbH
Firmensitz: Brandshofer Deich 48, 20539 Hamburg
Registergericht: Amtsgericht Hamburg, HRB 78585
Geschäftsführer: Patrick Postel



Re: solrcloud is deleteByQuery stored in transactions and forwarded like other operations?

2012-04-04 Thread Mark Miller

On Apr 3, 2012, at 10:35 PM, Jamie Johnson wrote:

> I haven't personally seen this issue but I have been told by another
> developer that he ran a deleteByQuery("*:*").  This deleted the index,
> but on restart there was information still in the index.  Should this
> be possible?  I had planned to setup something to test this locally
> but wanted to know if anyone is aware of anything like this.


It *shouldn't* be possible, but it depends on many things. At this point, I 
suspect either a bug or its an old build. Towards the end of the recent 
SolrCloud spurt, Yonik did some work on DBQ and recovery. A build before that 
could have had issues like this. If you see it on a recent build, I'd def file 
a JIRA.

- Mark Miller
lucidimagination.com


Re: Commitwithin

2012-04-04 Thread Mark Miller
Solr version? I think that for a while now, deletes were not triggering
commitWithin. I think this was recently fixed - if I remember right it will be
part of 3.6 and then 4.


- Mark Miller
lucidimagination.com

On Apr 4, 2012, at 10:12 AM, Jens Ellenberg wrote:

> Hello,
> 
> I am trying to use commitWithin in Java but there seems to be no commit at
> all with this option.
> 
> 1. Example Code:
> 
>UpdateRequest request = new UpdateRequest();
>request.deleteByQuery("fild:value");
>request.setCommitWithin(1);
>System.out.println(request.getCommitWithin());
>server.request(request);
> 
> 2. Example Code:
> 
>server.add(aSolrDocument,1);
> 
> Only after an explicit commit ( server.commit(); ) are the changes available.
> 
> I have deleted the "autocommit" option in the solrconfig. Has anyone an idea?
> 
> Greetings
> Jens
> 
> -- 
> Jens Ellenberg,
> Master of Science (Informatik)
> 
> Tel   +49 (40) 39 99 76 45
> Fax   +49 (40) 39 99 76 40
> EMail ellenb...@silpion.de
> 
> Silpion IT-Solutions GmbH
> Firmensitz: Brandshofer Deich 48, 20539 Hamburg
> Registergericht: Amtsgericht Hamburg, HRB 78585
> Geschäftsführer: Patrick Postel
> 


Search for "library" returns 0 results, but search for "marion library" returns many results

2012-04-04 Thread Sean Adams-Hiett
This is cross posted on Drupal.org: http://drupal.org/node/1515046

Summary: I have a fairly clean install of Drupal 7 with
Apachesolr-1.0-beta18. I have created a content type called document with a
number of fields. I am working with 30k+ records, most of which are related
to "Marion, IA" in some way. A search for "library" (without the quotes)
returns no results, while a search for "marion library" returns thousands
of results. That doesn't make any sense to me at all.

Details:

  Drupal 7 (latest stable version)
  Apachesolr-1.0-beta18
  Custom content type with many fields
  LAMP stack running on Centos Linode
  PHP 5.2.x


I also checked this through the solr admin interface, running the same
searches with similar results, so I can't rule out the possibility that
something is configured wrong... but since I am using the solrconfig.xml
and schema.xml files provided with the modules, it is also a possibility
that the issue lies here as well. I have watched the logs and during the
searches that produce no results but should, there is no output in the log
besides the regular [INFO] about the query.

I am stumped and I am past a deadline with this project, so any help would
be greatly appreciated.

-- 
Sean Adams-Hiett
Director of Development
The Advantage Companies
s...@advantage-companies.com
www.advantage-companies.com


RE: Search for "library" returns 0 results, but search for "marion library" returns many results

2012-04-04 Thread Joshua Sumali
Did you try to append &debugQuery=on to get more information?

> -Original Message-
> From: Sean Adams-Hiett [mailto:s...@advantage-companies.com]
> Sent: Wednesday, April 04, 2012 10:43 AM
> To: solr-user@lucene.apache.org
> Subject: Search for "library" returns 0 results, but search for "marion 
> library"
> returns many results
> 
> This is cross posted on Drupal.org: http://drupal.org/node/1515046
> 
> Summary: I have a fairly clean install of Drupal 7 with 
> Apachesolr-1.0-beta18. I
> have created a content type called document with a number of fields. I am
> working with 30k+ records, most of which are related to "Marion, IA" in some
> way. A search for "library" (without the quotes) returns no results, while a
> search for "marion library" returns thousands of results. That doesn't make
> any sense to me at all.
> 
> Details:
> 
>   Drupal 7 (latest stable version)
>   Apachesolr-1.0-beta18
>   Custom content type with many fields
>   LAMP stack running on Centos Linode
>   PHP 5.2.x
> 
> 
> I also checked this through the solr admin interface, running the same
> searches with similar results, so I can't rule out the possibility that 
> something
> is configured wrong... but since I am using the solrconfig.xml and schema.xml
> files provided with the modules, it is also a possibility that the issue lies 
> here
> as well. I have watched the logs and during the searches that produce no
> results but should, there is no output in the log besides the regular
> [INFO] about the query.
> 
> I am stumped and I am past a deadline with this project, so any help would
> be greatly appreciated.
> 
> --
> Sean Adams-Hiett
> Director of Development
> The Advantage Companies
> s...@advantage-companies.com
> www.advantage-companies.com


Re: Search for "library" returns 0 results, but search for "marion library" returns many results

2012-04-04 Thread Ravish Bhagdev
Yes, can you check if results you get with "marion library" match on marion
or library?  By default solr uses OR between words (specified in
solrconfig.xml).  You can also easily check this by enabling highlighting.

Ravish

On Wed, Apr 4, 2012 at 4:11 PM, Joshua Sumali  wrote:

> Did you try to append &debugQuery=on to get more information?
>
> > -Original Message-
> > From: Sean Adams-Hiett [mailto:s...@advantage-companies.com]
> > Sent: Wednesday, April 04, 2012 10:43 AM
> > To: solr-user@lucene.apache.org
> > Subject: Search for "library" returns 0 results, but search for "marion
> library"
> > returns many results
> >
> > This is cross posted on Drupal.org: http://drupal.org/node/1515046
> >
> > Summary: I have a fairly clean install of Drupal 7 with
> Apachesolr-1.0-beta18. I
> > have created a content type called document with a number of fields. I am
> > working with 30k+ records, most of which are related to "Marion, IA" in
> some
> > way. A search for "library" (without the quotes) returns no results,
> while a
> > search for "marion library" returns thousands of results. That doesn't
> make
> > any sense to me at all.
> >
> > Details:
> > 
> >   Drupal 7 (latest stable version)
> >   Apachesolr-1.0-beta18
> >   Custom content type with many fields
> >   LAMP stack running on Centos Linode
> >   PHP 5.2.x
> > 
> >
> > I also checked this through the solr admin interface, running the same
> > searches with similar results, so I can't rule out the possibility that
> something
> > is configured wrong... but since I am using the solrconfig.xml and
> schema.xml
> > files provided with the modules, it is also a possibility that the issue
> lies here
> > as well. I have watched the logs and during the searches that produce no
> > results but should, there is no output in the log besides the regular
> > [INFO] about the query.
> >
> > I am stumped and I am past a deadline with this project, so any help
> would
> > be greatly appreciated.
> >
> > --
> > Sean Adams-Hiett
> > Director of Development
> > The Advantage Companies
> > s...@advantage-companies.com
> > www.advantage-companies.com
>


Re: PageRank

2012-04-04 Thread Manuel Antonio Novoa Proenza
Hi Rav,
Thank you for your answer.

In my case I use Nutch for crawling the web; with Nutch I am a true rookie.
How do I configure Nutch to return that information? And how do I make Solr
index that information, or is that information built into the score of the
indexed documents?

Thank you very much.

Regards,

Manuel Antonio Novoa Proenza
Universidad de las Ciencias Informáticas
Email: mano...@estudiantes.uci.cu



Re: PageRank

2012-04-04 Thread Markus Jelsma

Hi,

Please subscribe to the Nutch mailing list. Scoring is straightforward,
and calculated scores can be written to the CrawlDB or exported as an
external file field for Solr.
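For reference, the Solr side of that looks roughly like this (a sketch modeled
on the stock example schema; the field name is illustrative):

<fieldType name="pfloatFile" class="solr.ExternalFileField"
           keyField="id" defVal="0" stored="false" indexed="false" valType="pfloat"/>
<field name="pagerank" type="pfloatFile"/>

The scores then live outside the index in a plain text file named
external_pagerank in the index data directory, one id=score line per document,
and can be applied at query time via function queries.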


Cheers

On Wed, 04 Apr 2012 10:22:46 -0500 (COT), Manuel Antonio Novoa Proenza 
 wrote:

Hi Rav,
Thank you for your answer.

In my case I use Nutch for crawling the web; with Nutch I am a true
rookie. How do I configure Nutch to return that information? And how
do I make Solr index that information, or is that information built
into the score of the indexed documents?

Thank you very much.


Regards,

Manuel Antonio Novoa Proenza
Universidad de las Ciencias Informáticas
Email: mano...@estudiantes.uci.cu


--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Search for "library" returns 0 results, but search for "marion library" returns many results

2012-04-04 Thread Sean Adams-Hiett
Here are some of the XML results with the debug on:

<lst name="debug">
  <str name="rawquerystring">library</str>
  <str name="querystring">library</str>
  <str name="parsedquery">+DisjunctionMaxQuery((content:librari)~0.01)
      DisjunctionMaxQuery((content:librari^2.0)~0.01)</str>
  <str name="parsedquery_toString">+(content:librari)~0.01
      (content:librari^2.0)~0.01</str>
  <str name="QParser">DisMaxQParser</str>
  <lst name="timing">
    ... (every prepare/process component reports 0.0) ...
  </lst>
</lst>

It looks like somehow the query is getting converted from "library" to
"librari". Any idea how that would happen?

Sean

On Wed, Apr 4, 2012 at 10:13 AM, Ravish Bhagdev wrote:

> Yes, can you check if results you get with "marion library" match on marion
> or library?  By default solr uses OR between words (specified in
> solrconfig.xml).  You can also easily check this by enabling highlighting.
>
> Ravish
>
> On Wed, Apr 4, 2012 at 4:11 PM, Joshua Sumali  wrote:
>
> > Did you try to append &debugQuery=on to get more information?
> >
> > > -Original Message-
> > > From: Sean Adams-Hiett [mailto:s...@advantage-companies.com]
> > > Sent: Wednesday, April 04, 2012 10:43 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: Search for "library" returns 0 results, but search for "marion
> > library"
> > > returns many results
> > >
> > > This is cross posted on Drupal.org: http://drupal.org/node/1515046
> > >
> > > Summary: I have a fairly clean install of Drupal 7 with
> > Apachesolr-1.0-beta18. I
> > > have created a content type called document with a number of fields. I
> am
> > > working with 30k+ records, most of which are related to "Marion, IA" in
> > some
> > > way. A search for "library" (without the quotes) returns no results,
> > while a
> > > search for "marion library" returns thousands of results. That doesn't
> > make
> > > any sense to me at all.
> > >
> > > Details:
> > > 
> > >   Drupal 7 (latest stable version)
> > >   Apachesolr-1.0-beta18
> > >   Custom content type with many fields
> > >   LAMP stack running on Centos Linode
> > >   PHP 5.2.x
> > > 
> > >
> > > I also checked this through the solr admin interface, running the same
> > > searches with similar results, so I can't rule out the possibility that
> > something
> > > is configured wrong... but since I am using the solrconfig.xml and
> > schema.xml
> > > files provided with the modules, it is also a possibility that the
> issue
> > lies here
> > > as well. I have watched the logs and during the searches that produce
> no
> > > results but should, there is no output in the log besides the regular
> > > [INFO] about the query.
> > >
> > > I am stumped and I am past a deadline with this project, so any help
> > would
> > > be greatly appreciated.
> > >
> > > --
> > > Sean Adams-Hiett
> > > Director of Development
> > > The Advantage Companies
> > > s...@advantage-companies.com
> > > www.advantage-companies.com
> >
>



-- 
Sean Adams-Hiett
Owner, Web Geeks For Hire
phone: (361) 433.5748
email: s...@webgeeksforhire.com
twitter: @geekbusiness 


Evaluating Solr

2012-04-04 Thread Joseph Werner
Hi,

I'm evaluating Solr for use in a project. In the Solr FAQ, under "How can I
rebuild my index from scratch if I change my schema?", after restarting the
server, step 5 is to "Re-Index your data", but no mention is made of how this
is done.

For more routine changes, are record updates supported without the
necessity to rebuild the index? For example, if a description field for an
item needs to be changed, am I correct in reading that the record need only
be resubmitted?
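For illustration, a minimal SolrJ sketch of such an update (URL and field names
are placeholders); re-adding a document with the same uniqueKey replaces the
stored record:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class UpdateRecord {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "item-42");  // same uniqueKey as the existing record
        doc.addField("description", "the corrected description");
        server.add(doc);                // the whole document is resent, not a delta
        server.commit();
    }
}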


-- 
Best Regards,
[Joseph] Christian Werner Sr


Re: Evaluating Solr

2012-04-04 Thread Glen Newton
"Re-Index your data"  ~= Reload your data

On Wed, Apr 4, 2012 at 12:46 PM, Joseph Werner  wrote:
> Hi,
>
> I'm evaluating Solr for use in a project. In the Solr FAQ, under "How can I
> rebuild my index from scratch if I change my schema?", after restarting the
> server, step 5 is to "Re-Index your data", but no mention is made of how this
> is done.
>
> For more routine changes, are record updates supported without the
> necessity to rebuild the index? For example, if a description field for an
> item needs to be changed, am I correct in reading that the record need only
> be resubmitted?
>
>
> --
> Best Regards,
> [Joseph] Christian Werner Sr



-- 
-
http://zzzoot.blogspot.com/
-


Re: Evaluating Solr

2012-04-04 Thread Yonik Seeley
On Wed, Apr 4, 2012 at 12:46 PM, Joseph Werner  wrote:
> For more routine changes, are record updates supported without the
> necessity to rebuild the index? For example, if a description field for an
> item needs to be changed, am I correct in reading that the record need only
> be resubmitted?

Correct.

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


JNDI in db-data-config.xml websphere

2012-04-04 Thread tech20nn
I am trying to use the jndiName attribute in db-data-config.xml. This works
great in Tomcat; however, I am having issues in WebSphere.

The following exception is thrown:

"Make sure that a J2EE application does not execute JNDI operations on
"java:" names within static code blocks or in threads created by that J2EE
application.  Such code does not necessarily run on the thread of a server
application request and therefore is not supported by JNDI operations on
"java:" names. [Root exception is javax.naming.NameNotFoundException: Name
comp/env/jdbc not found in context "java:"."


It seems like WebSphere has issues accessing JNDI resources from static code.
Has anyone experienced this?
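For reference, the data-config that works under Tomcat looks roughly like this
(the JNDI name and query are placeholders):

<dataConfig>
  <dataSource type="JdbcDataSource" jndiName="java:comp/env/jdbc/myDS"/>
  <document>
    <entity name="item" query="select * from item"/>
  </document>
</dataConfig>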

Thanks & Regards


--
View this message in context: 
http://lucene.472066.n3.nabble.com/JNDI-in-db-data-config-xml-websphere-tp3884787p3884787.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Does anyone know when Solr 4.0 will be released?

2012-04-04 Thread Darren Govoni
No one knows. But if you ask the devs, they will say "when it's done".

One clue might be to monitor the bugs/issues scheduled for 4.0. When
they are all resolved, then it's ready.

On Wed, 2012-04-04 at 09:41 -0700, srinivas konchada wrote:
> Hello everyone
> Does anyone know when Solr 4.0 will be released? There is a specific
> feature that exists in 4.0 which we want to take advantage of. The problem is
> we cannot deploy something into production from trunk. We need to use an
> official release.
> 
> 
> Thanks
> Srinivas Konchada




Re: UTF-8 encoding

2012-04-04 Thread Erik Hatcher
Apologies for not replying sooner on this thread, I just noticed it today...

To add insight into where velocity.properties can reside, it is used this way 
in VelocityResponseWriter.java:

    SolrVelocityResourceLoader resourceLoader =
        new SolrVelocityResourceLoader(
            request.getCore().getSolrConfig().getResourceLoader());
    String propFile = request.getParams().get("v.properties");
    is = resourceLoader.getResourceStream(propFile);
    Properties props = new Properties();
    props.load(is);
    engine.init(props);

SolrVelocityResourceLoader is a pass-through hook to Solr's ResourceLoader, 
allowing it to load resources from:

  /** Opens any resource by its name.
   * By default, this will look in multiple locations to load the resource:
   * $configDir/$resource (if resource is not absolute)
   * $CWD/$resource
   * otherwise, it will look for it in any jar accessible through the class loader.
   * Override this method to customize loading resources.
   * @return the stream for the named resource
   */
  public InputStream openResource(String resource)

So that file could conceivably live in many places, but conf/ is where I'd put 
it.

I've just updated the wiki documentation to say this instead:

v.properties: specifies a Velocity properties file to be applied, found using 
the Solr resource loader mechanism. If not specified, no .properties file is 
loaded. Example: v.properties=velocity.properties where velocity.properties can 
be found using Solr's resource loader mechanism, for example in the conf/ 
directory (not conf/velocity which is for templates only). The .properties file 
could also be located inside a JAR in the lib/ directory, or other locations.

Feel free to modify that if it needs improving.

Thanks,
Erik


On Apr 4, 2012, at 04:29 , henri wrote:

> I have finally solved my problem!!
> 
> Did the following:
> 
> added two lines in the /browse requestHandler
> <str name="v.properties">velocity.properties</str>
> <str name="v.contentType">text/html;charset=UTF-8</str>
> 
> Moved velocity.properties from solr/conf/velocity to solr/conf
> 
> Not being an expert, I am not 100% sure this is the "best" solution, and
> where and how it should be documented in the solr/velocity package. I will
> leave this doc update to aficionados.
> 
> Cheers to all,
> Henri
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/UTF-8-encoding-tp3867885p3883485.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: solrcloud is deleteByQuery stored in transactions and forwarded like other operations?

2012-04-04 Thread Jamie Johnson
Thanks Mark.  The delete by query is a very rare operation for us and
I really don't have the liberty to update to current trunk right now.
Do you happen to know about when the fix was made so I can see if we
are before or after that time?

On Wed, Apr 4, 2012 at 10:25 AM, Mark Miller  wrote:
>
> On Apr 3, 2012, at 10:35 PM, Jamie Johnson wrote:
>
>> I haven't personally seen this issue but I have been told by another
>> developer that he ran a deleteByQuery("*:*").  This deleted the index,
>> but on restart there was information still in the index.  Should this
>> be possible?  I had planned to setup something to test this locally
>> but wanted to know if anyone is aware of anything like this.
>
>
> It *shouldn't* be possible, but it depends on many things. At this point, I 
> suspect either a bug or its an old build. Towards the end of the recent 
> SolrCloud spurt, Yonik did some work on DBQ and recovery. A build before that 
> could have had issues like this. If you see it on a recent build, I'd def 
> file a JIRA.
>
> - Mark Miller
> lucidimagination.com


Re: Incrementally updating a VERY LARGE field - Is this possible?

2012-04-04 Thread vybe3142
Thanks.

Increasing max. heap space is not a scalable option as it reduces the
ability of the system to scale with multiple concurrent index requests.

The use case is indexing a set of text files which we have no control over
i.e. could be small or large. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Incrementally-updating-a-VERY-LARGE-field-Is-this-possibe-tp3881945p3885233.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solrcloud is deleteByQuery stored in transactions and forwarded like other operations?

2012-04-04 Thread Yonik Seeley
On Wed, Apr 4, 2012 at 3:04 PM, Jamie Johnson  wrote:
> Thanks Mark.  The delete by query is a very rare operation for us and
> I really don't have the liberty to update to current trunk right now.
> Do you happen to know about when the fix was made so I can see if we
> are before or after that time?

Not definitive, but a grep of "svn log" in solr/core shows:

r1295665 | yonik | 2012-03-01 11:41:54 -0500 (Thu, 01 Mar 2012) | 1 line
cloud: fix distributed deadlock w/ deleteByQuery

r1243773 | yonik | 2012-02-13 22:00:22 -0500 (Mon, 13 Feb 2012) | 1 line
dbq: fix param rename

r1243768 | yonik | 2012-02-13 21:45:41 -0500 (Mon, 13 Feb 2012) | 1 line
solrcloud: send deleteByQuery to all shard leaders to version and
forward to replicas


-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Re: Incrementally updating a VERY LARGE field - Is this possible?

2012-04-04 Thread vybe3142

> Updating a single field is not possible in solr.  The whole record has to 
> be rewritten. 

Unfortunate. Lucene allows it.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Incrementally-updating-a-VERY-LARGE-field-Is-this-possibe-tp3881945p3885253.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Incrementally updating a VERY LARGE field - Is this possible?

2012-04-04 Thread Yonik Seeley
On Wed, Apr 4, 2012 at 3:14 PM, vybe3142  wrote:
>
>> Updating a single field is not possible in solr.  The whole record has to
>> be rewritten.
>
> Unfortunate. Lucene allows it.

I think you're mistaken - the same limitations apply to Lucene.

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Re: Incrementally updating a VERY LARGE field - Is this possible?

2012-04-04 Thread Walter Underwood
I believe we are talking about two different things. The original question was 
about incrementally building up a field during indexing, right? 

After a document is committed, a field cannot be separately updated, that is 
true in both Lucene and Solr.

wunder

On Apr 4, 2012, at 12:20 PM, Yonik Seeley wrote:

> On Wed, Apr 4, 2012 at 3:14 PM, vybe3142  wrote:
>> 
>>> Updating a single field is not possible in solr.  The whole record has to
>>> be rewritten.
>> 
>> Unfortunate. Lucene allows it.
> 
> I think you're mistaken - the same limitations apply to Lucene.
> 
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10


SolrCloud on multiple appservers

2012-04-04 Thread solr user
Does anyone have a blog or wiki with detailed step-by-step instructions on
setting up SolrCloud on multiple JBoss instances?

Thanks in advance,


SOLRCloud on appserver

2012-04-04 Thread SOLRUSER
Does anyone have any instructions on setting up SOLRCloud on multiple
appservers? Ideally a wiki, blog, or step-by-step guide I can follow.



Re: space making it hard to use wildcard with lucene parser

2012-04-04 Thread jmlucjav
thanks, that will work I think

--
View this message in context: 
http://lucene.472066.n3.nabble.com/space-making-it-hard-tu-use-wilcard-with-lucene-parser-tp3882534p3885460.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr with UIMA

2012-04-04 Thread Tommaso Teofili
Hi again Chris,

I finally managed to find some proper time to test your configuration.
The first thing to note is that it worked for me, assuming the following
prerequisites were satisfied:
- you had the jar containing the AnalysisEngine for RoomAnnotator.xml
in your libraries section (this is actually the uimaj-examples.jar which is
shipped with the UIMA SDK under lib [1])
- you had the solr-uima jar in your libraries

the above are done by adding the following lines to the solrconfig (usually at
the top of the file, just beneath the <config> element), e.g.:

  <lib path="/path/to/uima-sdk/lib/uimaj-examples.jar" />
  <lib dir="../../dist/" regex="apache-solr-uima-.*\.jar" />

If you want to know what's going wrong I'd advise not ignoring errors
within the UIMAUpdateProcessor configuration:
<bool name="ignoreErrors">false</bool>

What I get if I run your same curl command and then make a *:* query is:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="wt">xml</str>
      <str name="start">0</str>
      <str name="q">*:*</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">4</str>
      <str name="...">Test Room HAW GN-K35</str>
      <str name="...">Hawthorne</str>
    </doc>
  </result>
</response>

which looks OK to me.
Tommaso

[1] : http://mirror.switch.ch/mirror/apache/dist//uima///uimaj-2.3.1-bin.zip

2012/3/28 chris3001 

> Tommaso,
> Thank you so much for looking into this, I am very grateful!
>
> Chris
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3865291.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Using UIMA in Solr behind a firewall

2012-04-04 Thread Tommaso Teofili
Hello Peter,

I think that is more related to the UIMA AlchemyAPIAnnotator [1] or to the
AlchemyAPI services themselves [2], because Solr just uses the out-of-the-box
UIMA AnalysisEngine for that.
Thus it may make sense to ask on d...@uima.apache.org (or even directly to
AlchemyAPI guys).
HTH,
Tommaso

[1] : http://uima.apache.org/sandbox.html#alchemy.annotator
[2] : http://www.alchemyapi.com/api/calling.html

2012/4/2 kodo 

> Hi!
>
> I'm desperately trying to work out how to configure Solr in order to allow
> it to make calls to the Alchemy service through the UIMA analysis engines.
> Is there anybody who has been able to accomplish this?
>
> Cheers
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Using-UIMA-in-Solr-behind-a-firewall-tp3877143p3877143.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Incrementally updating a VERY LARGE field - Is this possible?

2012-04-04 Thread jmlucjav
depending on your JVM version, -XX:+UseCompressedStrings would help alleviate
the problem. It did help me before.
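(For reference, that flag goes on the JVM command line of whatever container
runs Solr, e.g. something like:

java -Xmx2g -XX:+UseCompressedStrings -jar start.jar

Note that it is only recognized by certain Sun/Oracle Java 6 builds.)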

xab

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Incrementally-updating-a-VERY-LARGE-field-Is-this-possibe-tp3881945p3885493.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Distributed grouping issue

2012-04-04 Thread Young, Cody
Hi Martijn,

I created a JIRA issue and attached a test that fails. It seems to exhibit the 
same issue that I see on my local box. (If you run it multiple times you can 
see that the group value of the top doc changes between runs.)

Also, I had to add fixShardCount = true; in the constructor of the 
TestDistributedGrouping class, which caused another test case to fail. (It's 
commented out in the patch with a TODO above it.)

Please let me know if you need any other information.

https://issues.apache.org/jira/browse/SOLR-3316

Thanks!!
Cody

-Original Message-
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of 
Martijn v Groningen
Sent: Monday, April 02, 2012 10:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

I tried to reproduce this. However, the "matches" value always returns 4 in my case
(when using rows=1 and rows=2).
In your case the 2 documents on each core do belong to the same group, right?

I did find something else. If I use rows=0 then an error occurs. I think we 
need to further investigate this.
Can you open an issue in Jira? I'm a bit busy today. We can then further look 
into this in the coming days.

Martijn

On 2 April 2012 23:00, Young, Cody  wrote:

> Okay, I've played with this a bit more. Found something interesting:
>
> When the groups returned do not include results from a core, then the 
> core is excluded from the count. (I have 1 group, 2 documents per 
> core)
>
> Example:
>
>
> http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/s
> olr/core0,localhost:8983/solr/core1&group=true&group.field=group_field
> &group.limit=10&rows=1
>
> 
> 
> 2
>
> Then, just by changing rows=2
>
>
> http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/s
> olr/core0,localhost:8983/solr/core1&group=true&group.field=group_field
> &group.limit=10&rows=2
>
> 
> 
> 4
>
> Let me know if you have any luck reproducing.
>
> Thanks,
> Cody
>
> -Original Message-
> From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On 
> Behalf Of Martijn v Groningen
> Sent: Monday, April 02, 2012 1:48 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Distributed grouping issue
>
> >
> > All documents of a group exist on a single shard, there are no 
> > cross-shard groups.
> >
> You only have to partition documents by group when the groupCount and 
> some other features need to be accurate. For the "matches" this is not 
> necessary. The matches are summed up during merging of the shard responses.
>
> I can't reproduce the error you are describing on a small local setup 
> I have here. I have two Solr cores with a simple schema. Each core has 
> 3 documents. When grouping the matches element returns 6. I'm running 
> on a trunk that I have updated 30 minutes ago. Can you try to isolate 
> the problem by testing with a small subset of your data?
>
> Martijn
>



--
Kind regards,

Martijn van Groningen


Re: Incrementally updating a VERY LARGE field - Is this possible?

2012-04-04 Thread vybe3142
  
Yonik Seeley wrote:
> 
> On Wed, Apr 4, 2012 at 3:14 PM, vybe3142  wrote:
>>
>>> Updating a single field is not possible in solr.  The whole record has
>>> to
>>> be rewritten.
>>
>> Unfortunate. Lucene allows it.
> 
> I think you're mistaken - the same limitations apply to Lucene.
> 
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10
> 

You're correct (and I stand corrected). 

I looked at our older codebase that used Lucene. I need to dig deeper to
understand why it doesn't crash when invoking addField() multiple times
on each portion of the large text data, whereas Solr does. Speaking to the
developer who wrote that code, we resorted to multiple addField()
invocations to address the heap space issue.

I'll post back



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Incrementally-updating-a-VERY-LARGE-field-Is-this-possibe-tp3881945p3885711.html
Sent from the Solr - User mailing list archive at Nabble.com.


waitFlush and waitSearcher with SolrServer.add(docs, commitWithinMs)

2012-04-04 Thread Mike O'Leary
If you index a set of documents with SolrJ and use
StreamingUpdateSolrServer.add(Collection<SolrInputDocument> docs, int
commitWithinMs),
it will perform a commit within the time specified, and it seems to use default 
values for waitFlush and waitSearcher.

Is there a place where you can specify different values for waitFlush and 
waitSearcher, or if you want to use different values do you have to call 
StreamingUpdateSolrServer.add(Collection<SolrInputDocument> docs)
and then call
StreamingUpdateSolrServer.commit(waitFlush, waitSearcher)
explicitly?
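For illustration, the explicit variant would look something like this in SolrJ
(URL and tuning numbers are placeholders):

import java.util.Collection;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ExplicitCommit {
    static void indexAndCommit(Collection<SolrInputDocument> docs) throws Exception {
        StreamingUpdateSolrServer server =
                new StreamingUpdateSolrServer("http://localhost:8983/solr", 20, 4);
        server.add(docs);              // no commitWithin
        server.commit(false, false);   // waitFlush=false, waitSearcher=false
    }
}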
Thanks,
Mike


Re: waitFlush and waitSearcher with SolrServer.add(docs, commitWithinMs)

2012-04-04 Thread Mark Miller

On Apr 4, 2012, at 6:50 PM, Mike O'Leary wrote:

> If you index a set of documents with SolrJ and use
> StreamingUpdateSolrServer.add(Collection<SolrInputDocument> docs, int
> commitWithinMs),
> it will perform a commit within the time specified, and it seems to use 
> default values for waitFlush and waitSearcher.
> 
> Is there a place where you can specify different values for waitFlush and 
> waitSearcher, or if you want to use different values do you have to call 
> StreamingUpdateSolrServer.add(Collection<SolrInputDocument> docs)
> and then call
> StreamingUpdateSolrServer.commit(waitFlush, waitSearcher)
> explicitly?
> Thanks,
> Mike


waitFlush actually does nothing in recent versions of Solr. waitSearcher 
doesn't seem so important when the commit is not done explicitly by the user or 
a client.

- Mark Miller
lucidimagination.com


Re: LBHttpSolrServer to query a preferred server

2012-04-04 Thread Martin Grotzke
Hi,

I just submitted an issue with patch for this:
https://issues.apache.org/jira/browse/SOLR-3318

Cheers,
Martin


On 04/04/2012 03:53 PM, Martin Grotzke wrote:
> Hi,
> 
> we want to use the LBHttpSolrServer (4.0/trunk) and specify a preferred
> server. Our use case is that for one user request we make several solr
> requests with some heavy caching (using a custom request handler with a
> special cache) and want to make sure that the subsequent solr requests
> are hitting the same solr server.
> 
> A possible solution with LBHttpSolrServer would look like this:
> - LBHttpSolrServer provides a method getSolrServer() that returns a
> ServerWrapper
> - LBHttpSolrServer provides a method
>request(final SolrRequest request, ServerWrapper preferredServer)
>   that returns the response (NamedList).
> 
> This method first tries the specified preferredServer and if this fails
> queries all others (first alive servers then zombies).
> 
> What do you think of this solution? Any other solution preferred?
> 
> I'll start implementing this and submit an issue/patch hoping that it
> makes it into trunk.
> 
> Cheers,
> Martin
> 





RE: waitFlush and waitSearcher with SolrServer.add(docs, commitWithinMs)

2012-04-04 Thread Mike O'Leary
I am indexing some database contents using add(docs, commitWithinMs), and those 
add calls are taking over 80% of the time once the database begins returning 
results. I was wondering if setting waitSearcher to false would speed this up. 
Many of the calls take 1 to 6 seconds, with one outlier that took over 11 
minutes.
Thanks,
Mike

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Wednesday, April 04, 2012 4:15 PM
To: solr-user@lucene.apache.org
Subject: Re: waitFlush and waitSearcher with SolrServer.add(docs, 
commitWithinMs)


On Apr 4, 2012, at 6:50 PM, Mike O'Leary wrote:

> If you index a set of documents with SolrJ and use 
> StreamingUpdateSolrServer.add(Collection<SolrInputDocument> docs, int 
> commitWithinMs), it will perform a commit within the time specified, and it 
> seems to use default values for waitFlush and waitSearcher.
> 
> Is there a place where you can specify different values for waitFlush 
> and waitSearcher, or if you want to use different values do you have 
> to call StreamingUpdateSolrServer.add(Collection<SolrInputDocument>
> docs) and then call StreamingUpdateSolrServer.commit(waitFlush, waitSearcher) 
> explicitly?
> Thanks,
> Mike


waitFlush actually does nothing in recent versions of Solr. waitSearcher 
doesn't seem so important when the commit is not done explicitly by the user or 
a client.

- Mark Miller
lucidimagination.com


Is there any performance cost of using lots of OR in the solr query

2012-04-04 Thread roz dev
Hi All,

I am working on an application which makes a few Solr calls to get the data.

At a high level, we have a requirement like this:


   - Make first call to Solr, to get the list of products which are
   children of a given category
   - Make 2nd solr call to get product documents based on a list of product
   ids

The 2nd query will look like:

q=document_type:SKU&fq=product_id:(34 OR 45 OR 56 OR 77)

We can have close to 100 product ids in fq.
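For illustration, a SolrJ sketch of building that filter (names taken from the
example above):

import java.util.Arrays;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;

public class SkuQuery {
    public static SolrQuery build(List<String> productIds) {
        SolrQuery q = new SolrQuery("document_type:SKU");
        StringBuilder fq = new StringBuilder("product_id:(");
        for (int i = 0; i < productIds.size(); i++) {
            if (i > 0) fq.append(" OR ");
            fq.append(productIds.get(i));
        }
        q.addFilterQuery(fq.append(")").toString());
        return q;
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("34", "45", "56", "77")));
    }
}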

Is there a performance cost of doing these Solr calls which have lots of ORs?

As per slide 41 of the presentation "The Seven Deadly Sins of Solr", it is a
bad idea to have these kinds of queries.

http://www.slideshare.net/lucenerevolution/hill-jay-7-sins-of-solrpdf

But it is not made clear why this is bad.

Any inputs will be welcome.

Thanks

Saroj


Duplicates in Facets

2012-04-04 Thread Jamie Johnson
I am currently indexing some information and am wondering why I am
getting duplicates in facets.  From what I can tell they are the same,
but is there any case that could cause this that I may not be thinking
of?  Could this be some non-printable character making its way into
the index?


Sample output from luke


  
<lst name="...">
  <str name="type">string</str>
  <str name="schema">I--M---OFl</str>
  <str name="dynamicBase">*_umvs</str>
  <str name="index">(unstored field)</str>
  <int name="docs">332</int>
  <int name="distinct">-1</int>
  <lst name="topTerms">
    <int name="...">328</int>
    <int name="...">124</int>
    <int name="...">36</int>
    <int name="...">20</int>
    <int name="...">4</int>
  </lst>
</lst>


Re: Duplicates in Facets

2012-04-04 Thread Darren Govoni
Try using Luke to look at your index and see if there are multiple
similar TFV's. You can browse them easily in Luke.

On Wed, 2012-04-04 at 23:35 -0400, Jamie Johnson wrote:
> I am currently indexing some information and am wondering why I am
> getting duplicates in facets.  From what I can tell they are the same,
> but is there any case that could cause this that I may not be thinking
> of?  Could this be some non-printable character making its way into
> the index?
> 
> 
> Sample output from luke
> 
> <lst name="...">
>   <str name="type">string</str>
>   <str name="schema">I--M---OFl</str>
>   <str name="dynamicBase">*_umvs</str>
>   <str name="index">(unstored field)</str>
>   <int name="docs">332</int>
>   <int name="distinct">-1</int>
>   <lst name="topTerms">
>     <int name="...">328</int>
>     <int name="...">124</int>
>     <int name="...">36</int>
>     <int name="...">20</int>
>     <int name="...">4</int>
>   </lst>
> </lst>
> 




Re: Duplicates in Facets

2012-04-04 Thread Jamie Johnson
Yes, thanks for the reply.  It turns out there are whitespace differences
in these fields. Thank you for the quick reply!
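For reference, one way to head this off at index time is to trim values before
they are indexed; a sketch, assuming a 3.x-style schema, using a
keyword-tokenized type in place of a plain string field:

<fieldType name="string_trimmed" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>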

On Wed, Apr 4, 2012 at 11:45 PM, Darren Govoni  wrote:
> Try using Luke to look at your index and see if there are multiple
> similar TFV's. You can browse them easily in Luke.
>
> On Wed, 2012-04-04 at 23:35 -0400, Jamie Johnson wrote:
>> I am currently indexing some information and am wondering why I am
>> getting duplicates in facets.  From what I can tell they are the same,
>> but is there any case that could cause this that I may not be thinking
>> of?  Could this be some non-printable character making its way into
>> the index?
>>
>>
>> Sample output from luke
>>
>> <lst name="...">
>>   <str name="type">string</str>
>>   <str name="schema">I--M---OFl</str>
>>   <str name="dynamicBase">*_umvs</str>
>>   <str name="index">(unstored field)</str>
>>   <int name="docs">332</int>
>>   <int name="distinct">-1</int>
>>   <lst name="topTerms">
>>     <int name="...">328</int>
>>     <int name="...">124</int>
>>     <int name="...">36</int>
>>     <int name="...">20</int>
>>     <int name="...">4</int>
>>   </lst>
>> </lst>
>>
>
>


Re: solrcloud is deleteByQuery stored in transactions and forwarded like other operations?

2012-04-04 Thread Jamie Johnson
My snapshot was taken 2/27.  That would seem to indicate that the
deleteByQuery should be getting versioned; I am not sure if the other
issues that were resolved would change the operation.  I'll keep an
eye on it and if it pops up I'll try to push the update.  Thanks.

On Wed, Apr 4, 2012 at 3:12 PM, Yonik Seeley  wrote:
> On Wed, Apr 4, 2012 at 3:04 PM, Jamie Johnson  wrote:
>> Thanks Mark.  The delete by query is a very rare operation for us and
>> I really don't have the liberty to update to current trunk right now.
>> Do you happen to know about when the fix was made so I can see if we
>> are before or after that time?
>
> Not definitive, but a grep of "svn log" in solr/core shows:
>
> r1295665 | yonik | 2012-03-01 11:41:54 -0500 (Thu, 01 Mar 2012) | 1 line
> cloud: fix distributed deadlock w/ deleteByQuery
>
> r1243773 | yonik | 2012-02-13 22:00:22 -0500 (Mon, 13 Feb 2012) | 1 line
> dbq: fix param rename
>
> r1243768 | yonik | 2012-02-13 21:45:41 -0500 (Mon, 13 Feb 2012) | 1 line
> solrcloud: send deleteByQuery to all shard leaders to version and
> forward to replicas
>
>
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10


Re: SolrCloud replica and leader out of Sync somehow

2012-04-04 Thread Jamie Johnson
Not sure if this got lost in the shuffle; were there any thoughts on this?

On Wed, Mar 21, 2012 at 11:02 AM, Jamie Johnson  wrote:
> Given that in a distributed environment the docids are not guaranteed
> to be the same across shards, should the sorting use the uniqueId field
> as the tie-breaker by default?
>
> On Tue, Mar 20, 2012 at 2:10 PM, Yonik Seeley
>  wrote:
>> On Tue, Mar 20, 2012 at 2:02 PM, Jamie Johnson  wrote:
>>> I'll try to dig for the JIRA.  Also I'm assuming this could happen on
>>> any sort, not just score correct?  Meaning if we sorted by a date
>>> field and there were duplicates in that date field order wouldn't be
>>> guaranteed for the same reasons right?
>>
>> Correct - internal docid is the tiebreaker for all sorts.
>>
>> -Yonik
>> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
>> Boston May 7-10


alt attribute img tag

2012-04-04 Thread Manuel Antonio Novoa Proenza
Hello, 

I would like to know how to extract the alt attribute data from the images
in HTML documents.

Re: query time customized boosting

2012-04-04 Thread Monmohan Singh
Hi,
Any inputs or experience that others have come across would be really
helpful to know.
Basically, it's the same as page ranking, but the information used to decide
the rank is much more dynamic in nature.
Appreciate any inputs.
Regards
Monmohan

On Wed, Apr 4, 2012 at 4:22 PM, monmohan  wrote:

> Hi,
> My index is composed of documents with an "author" field. My system is a
> users portal where they can have a friend relationship among each other.
> When a user searches for documents, I would like to boost score of docs in
> which  author is friend of the user doing the search. Note that the list of
> friends for a user can be potentially big and dynamic (changing as the user
> makes more friends)
>
> Is there a way to do this kind of boosting at query time? I have looked at
> External field, query elevator and function queries, but it appears that
> none of them fit this need.
>
> Since the list of friends for a user is dynamic and per user based, it
> can't
> really be added as a field in the index for each document so I am not
> considering that option at all.
> Regards
> Monmohan
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/query-time-customized-boosting-tp3883743p3883743.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: query time customized boosting

2012-04-04 Thread William Bell
If you have a degree-of-separation field (like friend), you could do something like:

...defType=dismax&bq=degree_of_separation:1^100

Thanks.

On Thu, Apr 5, 2012 at 12:55 AM, Monmohan Singh  wrote:
> Hi,
> Any inputs or experience that others have come across will be really
> helpful to know.
> Basically, its the same as page ranking but the information used to decide
> the rank is much more dynamic in nature..
> Appreciate any inputs.
> Regards
> Monmohan
>
> On Wed, Apr 4, 2012 at 4:22 PM, monmohan  wrote:
>
>> Hi,
>> My index is composed of documents with an "author" field. My system is a
>> users portal where they can have a friend relationship among each other.
>> When a user searches for documents, I would like to boost score of docs in
>> which  author is friend of the user doing the search. Note that the list of
>> friends for a user can be potentially big and dynamic (changing as the user
>> makes more friends)
>>
>> Is there a way to do this kind of boosting at query time? I have looked at
>> External field, query elevator and function queries, but it appears that
>> none of them fit this need.
>>
>> Since the list of friends for a user is dynamic and per user based, it
>> can't
>> really be added as a field in the index for each document so I am not
>> considering that option at all.
>> Regards
>> Monmohan
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/query-time-customized-boosting-tp3883743p3883743.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076