Re: change sort order for MoreLikeThis

2009-08-05 Thread Renz Daluz
Thanks guys.
I tried boosting instead (as sort doesn't look to be supported), but it's not
taking effect. Here are the parameters that I'm using:

I want to boost by the time_published field, and I enabled mlt.boost:
&bf=recip(rord(time_published),1,1000,165)^1500&qt=mlt&mlt.boost=true
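
(For reference, the Solr FunctionQuery recip(x,m,a,b) evaluates to

    recip(x, m, a, b) = a / (m*x + b)

so recip(rord(time_published),1,1000,165) works out to
1000 / (rord(time_published) + 165): the newest documents have the smallest
reverse-ordinal rank and therefore receive the largest boost.)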


Regards,
/Renz



2009/8/4 Avlesh Singh 

> >
> > You lost me.
> >
> Absolutely sorry about that Bill :(
>
> How does boosting change the sort order?
>
> What I really meant here is that if you have more than one "similarity"
> field in your MLT query, you can boost the results found due to one over
> the other. It was not at all aimed to be an answer for sort. Actually, I
> was too prompt to respond!
>
> What about sorting on a field that is not the mlt field?
> >
> Haven't tried this yet. It would be surprising if it does not work as
> expected.
>
> Cheers
> Avlesh
>
> On Tue, Aug 4, 2009 at 3:24 AM, Bill Au  wrote:
>
> > Avlesh,
> > You lost me.  How does boosting change the sort order?  What about
> > sorting on a field that is not the mlt field?
> >
> > Bill
> >
> > On Mon, Aug 3, 2009 at 3:13 AM, Avlesh Singh  wrote:
> >
> > > You can boost the similarity field matches, if you want. Look for
> > mlt.boost
> > > at http://wiki.apache.org/solr/MoreLikeThis
> > >
> > > Cheers
> > > Avlesh
> > >
> > > On Mon, Aug 3, 2009 at 11:33 AM, Renz Daluz 
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm looking at changing the result order when searching by MLT. I
> > > > tried the sort= parameter, but it's not working. I checked the wiki
> > > > and can't find anything. Is there a way to do this?
> > > >
> > > > Thanks,
> > > > /Laurence
> > > >
> > >
> >
>


Re: change sort order for MoreLikeThis

2009-08-05 Thread Renz Daluz
Oh, and yes, I tried sorting on a field that is not an mlt field and it's not
taking effect.
Here are the whole parameters that I'm using:

mlt.fl=text,title&tie=0.01&mlt.mintf=1&mlt.match.include=true
&fl=tagged_bucket,tagged_entities
&bf=recip(rord(time_published),1,1000,165)^1500
&qt=mlt&mlt.minwl=3&mm=5&mlt.boost=true
&qf=text^0.5+title^0.4+description^0.01+keywords^0.01+bestlink_keywords^0.1+authors_t^0.05
&mlt.maxwl=20&mlt.maxntp=200&mlt.maxqt=10&mlt.interestingTerms=details
&rows=200&mlt.mindf=3
&pf=text^300+title^10+tagged_entities^200+inbound_text^1+bestlink_keywords^1
&q=id:story|25584945c&ps=1&sort=time_published+desc


Thanks,
Renz


2009/8/5 Renz Daluz 

> Thanks guys.
> I tried boosting instead (as sort doesn't look to be supported), but it's
> not taking effect. Here are the parameters that I'm using:
>
> I want to boost by the time_published field, and I enabled mlt.boost:
> &bf=recip(rord(time_published),1,1000,165)^1500&qt=mlt&mlt.boost=true
>
>
> Regards,
> /Renz
>
>
>
> 2009/8/4 Avlesh Singh 
>
> >
>> > You lost me.
>> >
>> Absolutely sorry about that Bill :(
>>
>> How does boosting change the sort order?
>>
>> What I really meant here is that if you have more than one "similarity"
>> field in your MLT query, you can boost the results found due to one over
>> the other. It was not at all aimed to be an answer for sort. Actually, I
>> was too prompt to respond!
>>
>> What about sorting on a field that is not the mlt field?
>> >
>> Haven't tried this yet. It would be surprising if it does not work as
>> expected.
>>
>> Cheers
>> Avlesh
>>
>> On Tue, Aug 4, 2009 at 3:24 AM, Bill Au  wrote:
>>
>> > Avlesh,
>> > You lost me.  How does boosting change the sort order?  What about
>> > sorting on a field that is not the mlt field?
>> >
>> > Bill
>> >
>> > On Mon, Aug 3, 2009 at 3:13 AM, Avlesh Singh  wrote:
>> >
>> > > You can boost the similarity field matches, if you want. Look for
>> > mlt.boost
>> > > at http://wiki.apache.org/solr/MoreLikeThis
>> > >
>> > > Cheers
>> > > Avlesh
>> > >
>> > > On Mon, Aug 3, 2009 at 11:33 AM, Renz Daluz 
>> > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I'm looking at changing the result order when searching by MLT. I
>> > > > tried the sort= parameter, but it's not working. I checked the wiki
>> > > > and can't find anything. Is there a way to do this?
>> > > >
>> > > > Thanks,
>> > > > /Laurence
>> > > >
>> > >
>> >
>>
>
>


query matching issue

2009-08-05 Thread Radha C.
Hello list,
 
I have documents containing the word "Richard Nass". I need to match the
"Richard Nass" documents for the query strings "richard", "nass", and "rich".
The search works for the following queries:
 
http://localhost:8983/solr/select?q=author:Richard nass
http://localhost:8983/solr/select?q=author:Richard Nass
http://localhost:8983/solr/select?q=author:richard nass
 
But it does not work for q=author:Richard, q=author:nass, or q=author:rich.
 
I also tried a wildcard search like q=author:rich*.
 
Can anyone help me get flexible matching like the above?
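
A common schema.xml approach for this kind of within-word prefix matching,
offered here as an untested sketch with illustrative type/field names, is to
copyField author into a field analyzed with the EdgeNGramFilterFactory that
ships with Solr:

<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "richard" is indexed as ri, ric, rich, richa, ... so "rich" matches -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Queries against that copy of the field would then match "richard", "nass",
and "rich" without wildcards.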
 
Thanks in advance..
 
Radha.C
 


Re: DataImportHandler: Partial Delete and Update (Hacking "deleteQuery" in SOLR 1.3?)

2009-08-05 Thread Chantal Ackermann

Hi Paul,

yes, I did and I just verified in the code. The deletedPkQuery is used 
to collect all primary keys of the root entity that shall be deleted 
from the index.


The deletion is done on the SOLR writer by unique ID:
  writer.deleteDoc(deletedKey.get(root.pk)); //DocBuilder

  delCmd.id = id.toString(); // SOLR Writer deleteDoc()
  delCmd.fromPending = true;
  delCmd.fromCommitted = true;
  processor.processDelete(delCmd);

// RunUpdateProcessorFactory
  @Override
  public void processDelete(DeleteUpdateCommand cmd) throws IOException {
    if (cmd.id != null) {
      updateHandler.delete(cmd); // writer.deleteDoc() uses that
    } else {
      updateHandler.deleteByQuery(cmd); // I would like to use that
    }
    super.processDelete(cmd);
  }

My problem is that the ids I have to delete are those that do not exist 
in the database anymore. So, I have no means to return them by DB query. 
That is why I would like to use a different field that a group of 
documents has in common, and that would allow me to get hold of the 
outdated documents in the index. (But I have to find out the value of 
that other field by DB query.)


Cheers,
Chantal


Noble Paul നോബിള്‍ नोब्ळ् schrieb:

did you explore the deletedPkQuery ?

On Wed, Aug 5, 2009 at 11:46 AM, Chantal
Ackermann wrote:

Hi all,

the database from which I populate the SOLR index is refreshed
"partially". Subsets of the data are deleted and re-added for a certain
group identifier. Is it possible to do something similar in a (delta) import
of the DataImportHandler?

Example:
SOLR-Index:
groupID: 1, PK: 1, refreshDate: [before last_index_time]
groupID: 1, PK: 2, refreshDate: [before last_index_time]
groupID: 1, PK: 3, refreshDate: [before last_index_time]

Refreshed DB:
groupID: 1, PK: 1, refreshDate: [after last_index_time]
groupID: 1, PK: 5, refreshDate: [after last_index_time]
groupID: 1, PK: 30, refreshDate: [after last_index_time]
(PKs 2 and 3 are not there anymore. PK is unique across all groupIDs.)

deleteQuery="groupID:1"
(An attribute of the entity element that the DocBuilder (1.3) reads and
sends as query once, before the delta import, unchanged to the SOLR
writer to delete documents.)

After that, the delta import loads data with groupID=1 from the DB.

Could I plug into SOLR with maybe a custom processor to achieve
something in the direction of:

deleteInput="select FIELD_VALUE from TABLE where CHANGED_DATE >
'${dataimporter.last_index_time}' group by FIELD_VALUE"
deleteQuery="field:${my_entity.FIELD_VALUE}"

FIELD_VALUE is not the primary key, and the "deleteInput" query can
return multiple rows.


I am aware of SOLR-1060 and SOLR-1059, but I am not sure that those will
help me. In those cases it looks like the delete is run per entity. I
want the delete to run once, before the (delta) import.
If that impression is wrong, I'll happily switch to 1.4, of course.

Cheers!
Chantal


--
Chantal Ackermann







--
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: DataImportHandler: Partial Delete and Update (Hacking "deleteQuery" in SOLR 1.3?)

2009-08-05 Thread Chantal Ackermann

Thanks, Paul! :-)

The wiki doesn't mark $deleteDocByQuery (and the other special commands) 
as 1.4, as it usually does. Maybe it's worth correcting that?


Noble Paul നോബിള്‍ नोब्ळ् schrieb:

OK, writing an EntityProcessor/Transformer may help here. Use the special command:
http://wiki.apache.org/solr/DataImportHandler#head-5e9ebf5a2aaa1dc54464102c395ed1bf7cdb98c3

$deleteDocByQuery is what you need.
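
For the archives, a minimal sketch of a custom Transformer that emits this
special command (it assumes the Solr 1.4 DIH Transformer API; the
FIELD_VALUE column and groupID field names follow the deleteInput example
earlier in the thread):

import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// For every row returned by the delete-input entity, emit a
// $deleteDocByQuery command so that all documents sharing the
// group value are purged before the fresh rows are re-imported.
public class GroupDeleteTransformer extends Transformer {
  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Object groupValue = row.get("FIELD_VALUE"); // column from the delete-input query
    if (groupValue != null) {
      row.put("$deleteDocByQuery", "groupID:" + groupValue);
    }
    return row;
  }
}

The class would be attached to the delete-input entity via its
transformer="..." attribute in data-config.xml.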



Re: ClassCastException from custom request handler

2009-08-05 Thread James Brady
OK, problem solved! Well, worked around.

I gave up on the new style plugin loading in a multicore Jetty setup, and
packaged up my plugin in a rebuilt solr.war.

I had tried this before, but had only put the class files in WEB-INF/lib. If
I put a jar file in there, it works.

2009/8/4 Chantal Ackermann 

>
>
> James Brady schrieb:
>
>> Yeah I was thinking T would be SolrRequestHandler too. Eclipse's debugger
>> can't tell me...
>>
>
> You could try disassembling. Or Eclipse opens classes in a very rudimentary
> format when there is no source code attached. Maybe it shows the actual
> return value there, instead of T.
>
>
>> Lots of other handlers are created with no problem before my plugin falls
>> over, so I don't think it's a problem with T not being what we expected.
>>
>> Do you know of any working examples of plugins I can download and build in
>> my environment to see what happens?
>>
>
> No sorry. I've only overwritten the EntityProcessor from DataImportHandler,
> and that is not configured in solrconfig.xml.
>
>
>
>
>> 2009/8/4 Chantal Ackermann 
>>
>>  Code is from AbstractPluginLoader in the solr plugin package, 1.3 (the
>>> regular stable release, no svn checkout).
>>>
>>>
>>>  80-84
>>>
 @SuppressWarnings("unchecked")
 protected T create( ResourceLoader loader, String name, String className,
     Node node ) throws Exception
 {
   return (T) loader.newInstance( className, getDefaultPackages() );
 }


>>
>> --
>> http://twitter.com/goodgravy
>> 512 300 4210
>> http://webmynd.com/
>> Sent from Bury, United Kingdom
>>
>


-- 
http://twitter.com/goodgravy
512 300 4210
http://webmynd.com/
Sent from Bury, United Kingdom


mergeContiguous for multiple search terms

2009-08-05 Thread Hachmann, Bjoern
Hello,
 
we would like to use the highlightingComponent with the mergeContiguous 
parameter set to true. 
 
We have a field with the value: Ökonom Charles Goodhart.
 
If we search for all three words, they are found correctly:
<em>Ökonom</em> <em>Charles</em> <em>Goodhart</em>
 
But, as I set the mergeContiguous parameter to true, I expected:
<em>Ökonom Charles Goodhart</em>. Am I misunderstanding the behaviour of
this parameter? We are using the dismax query parser and Solr 1.3.
 
Thank you very much for your time.
Björn Hachmann
 
 
 


help getting started with spell check dictionary

2009-08-05 Thread Ian Connor
Hi,

I have downloaded a dictionary in plain text format from
http://icon.shef.ac.uk/Moby/mwords.html and added it to my /mnt directory.

When I tried to add:

 
<lst name="spellchecker">
  <str name="name">external</str>
  <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
  <str name="sourceLocation">/mnt/dictionary.txt</str>
  <str name="fieldType">text</str>
</lst>

within the <searchComponent name="spellcheck"> block, I thought it
would be as easy as running a query like:

http://localhost:8983/solr/select/?q=cancr&spellcheck=true&spellcheck.build=true

to get it to work. Can anyone tell me what steps I am missing here? Thanks
for any help here.

I was trying to get the idea from the example here:
https://issues.apache.org/jira/browse/SOLR-572 after reading through
http://wiki.apache.org/solr/SpellCheckComponent

-- 
Regards,

Ian Connor


SolrJ and ISO-8859-1

2009-08-05 Thread Schilperoort , René
Hello,

Is it possible to change the encoding of the SolrJ request and response?

Regards, Rene


Index rebuilding.

2009-08-05 Thread caezar

Hi All,

Am I right in saying that the Solr index is rebuilt when the 'commit' command
is sent?
Let's suppose yes. For instance, I have a Solr index with 1M documents, and
then I commit one more million documents. Here are some questions:
- will this (second) commit take longer than the first one? much longer?
- will it use some drive space for temporary data while rebuilding the index,
which is then freed? how much?
- is it possible to perform searches while this rebuilding is in progress?

Thanks!
-- 
View this message in context: 
http://www.nabble.com/Index-rebuilding.-tp24829220p24829220.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DisMax - fetching dynamic fields

2009-08-05 Thread Alexey Serba
My bad! Please disregard this post.

Alex

On Tue, Aug 4, 2009 at 9:21 PM, Alexey Serba wrote:
> Solr 1.4 built from trunk revision 790594 ( 02 Jul 2009 )
>
> On Tue, Aug 4, 2009 at 9:19 PM, Alexey Serba wrote:
>> Hi everybody,
>>
>> I have a couple of dynamic fields in my schema, e.g. rating_* popularity_*
>>
>> The problem I have is that if I try to specify existing fields
>> "rating_1 popularity_1" in "fl" parameter - DisMax handler just
>> ignores them whereas StandardRequestHandler works fine.
>>
>> Any clues what's wrong?
>>
>> Thanks in advance,
>> Alex
>>
>


Re: Index rebuilding.

2009-08-05 Thread Shalin Shekhar Mangar
On Wed, Aug 5, 2009 at 8:21 PM, caezar  wrote:

>
> Hi All,
>
> Am I right in saying that the Solr index is rebuilt when the 'commit'
> command is sent?
> Let's suppose yes. For instance, I have a Solr index with 1M documents, and
> then I commit one more million documents. Here are some questions:
> - will this (second) commit take longer than the first one? much longer?


When you do the second commit, the auto-warming of caches and/or queries on
newSearcher may take longer. Also, during indexing, segments may get merged,
which may add some time.
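
For context, that warming is driven by the newSearcher listener in
solrconfig.xml; a typical sketch, with an illustrative warming query:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- each query here is run against the new searcher to warm its caches -->
    <lst>
      <str name="q">some popular query</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>

The more warming queries, and the larger the autowarmCount on the caches,
the longer a commit takes to expose the new searcher.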


> - will it use some drive space for temporary data while rebuilding the
> index, which is then freed? how much?


No. Commit should not need extra drive space. An optimize may need
additional space temporarily. But it is always good to have extra free space
on the disk.


>
> - is it possible to perform searches while this rebuilding is in progress?
>

Yes.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Wild card search does not return any result

2009-08-05 Thread Mohamed Parvez
Thanks Otis and Avlesh,

Below is the configuration I have

1] solrconfig.xml

.
  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck.extendedResults">false</str>
      <str name="spellcheck.count">1</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>
.
.
  <requestHandler name="/dataimport"
      class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-import.xml</str>
    </lst>
  </requestHandler>
..
..
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">SPELL</str>
      <str name="spellcheckIndexDir">./spellcheckerIndex</str>
      <str name="buildOnCommit">true</str>
      <str name="buildOnOptimize">true</str>
    </lst>
  </searchComponent>

2] data-import.xml

.
..
<document>
  <entity name="user" query="select * from user">
    <field column="ID" name="id"/>
    <field column="BUS" name="BUS"/>
    <field column="ROLE" name="ROLE"/>
  </entity>
</document>
.
.

3] schema.xml
..
..
<field name="id" type="string" indexed="true" stored="true"/>
<field name="BUS" type="text" indexed="true" stored="true"/>
<field name="ROLE" type="text" indexed="true" stored="true"/>
..
..
<field name="SPELL" type="textSpell" indexed="true" stored="true"
  multiValued="true"/>
..
..
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
      words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
      generateNumberParts="1" catenateWords="1" catenateNumbers="1"
      catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
      protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
      ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
      words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
      generateNumberParts="1" catenateWords="0" catenateNumbers="0"
      catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
      protected="protwords.txt"/>
  </analyzer>
</fieldType>
Date: Tue, Aug 4, 2009 at 8:25 PM
Subject: Re: Wild card search does not return any result
To: solr-user@lucene.apache.org


Hi,

I doubt it's a bug.  It's probably working correctly based on the config,
etc., I just don't have enough details about the configuration, your request
handler, query rewriting, the data in your index, etc. to tell you what
exactly is happening.

 Otis


On Tue, Aug 4, 2009 at 11:13 PM, Avlesh Singh  wrote:

> You read it incorrectly, Parvez.
> The "bug" that Bill seems to have found is with the analysis tool and NOT
> the search handler itself. Results in your case are as expected. Wildcard
> queries are not analyzed, hence the inconsistency.
> A workaround is suggested on the same thread, here -
> A workaround is suggested, on the same thread, here -
>
> http://markmail.org/message/ts65a6jok3ii6nva#query:+page:1+mid:i5zxdbnvspgek2bp+state:results
>
> Cheers
> Avlesh
>
> On Wed, Aug 5, 2009 at 12:52 AM, Mohamed Parvez  wrote:
>
> > Thanks Otis, The thread suggests that this is a bug
> >
> >
> >
> http://markmail.org/message/ts65a6jok3ii6nva#query:+page:1+mid:qinymqdn6mkocv4k
> >
> > Both SSE and ICS are 3-letter words, and neither is part of the English
> > language.
> > SSE* works fine and ICS* does not work, so this is surely a bug.
> >
> > Any idea when this bug will be fixed, or if there is any workaround?
> >
> > 
> > Thanks/Regards,
> > Parvez
> > GV : 786-693-2228
> >
> >
> > On Tue, Aug 4, 2009 at 11:48 AM, Otis Gospodnetic <
> > otis_gospodne...@yahoo.com> wrote:
> >
> > > Could it be the same reason as described here:
> > >
> > > http://markmail.org/message/ts65a6jok3ii6nva
> > >
> > > Otis
> > > --
> > > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> > > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> > >
> > >
> > >
> > > - Original Message 
> > > > From: Mohamed Parvez 
> > > > To: solr-user@lucene.apache.org
> > > > Sent: Tuesday, August 4, 2009 11:26:45 AM
> > > > Subject: Wild card search does not return any result
> > > >
> > > > Hello All,
> > > >
> > > >I have two fields.
> > > >
> > > >
> > > >
> > > >
> > > > I have a document (which has been indexed) that has a value of "ICS"
> > > > for the BUS field and "SSE" for the ROLE field.
> > > >
> > > > When I search for q=BUS:ics i get the result, but if i search for
> > > q=BUS:ics*
> > > > i don't get any match (or result)
> > > >
> > > > when I search for q=ROLE:sse or q=ROLE:sse*, both the times I get the
> > > > result.
> > > >
> > > > why BUS:ics* does not return any result ?
> > > >
> > > >
> > > > I have the default configuration for the text field, see below.
> > > >
> > > > <fieldType name="text" class="solr.TextField"
> > > >     positionIncrementGap="100">
> > > >   <analyzer type="index">
> > > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > > >         words="stopwords.txt" enablePositionIncrements="true" />
> > > >     <filter class="solr.WordDelimiterFilterFactory"
> > > >         generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > > >         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> > > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >     <filter class="solr.EnglishPorterFilterFactory"
> > > >         protected="protwords.txt"/>
> > > >   </analyzer>
> > > >   <analyzer type="query">
> > > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> > > >         ignoreCase="true" expand="true"/>
> > > >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > > >         words="stopwords.txt"/>
> > > >     <filter class="solr.WordDelimiterFilterFactory"
> > > >         generateWordParts="1" generateNumberParts="1" catenateWords="0"
> > > >         catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> > > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >     <filter class="solr.EnglishPorterFilterFactory"
> > > >         protected="protwords.txt"/>
> > > >   </analyzer>
> > > > </fieldType>
> > > >
> > > > 
> > > > Thanks/Regards,
> > > > Parvez
> > > >
> > > > Note: This is a re-post; it looks like something went wrong the
> > > > first time around.
> > >
> > >
> >
>


Re: Wild card search does not return any result

2009-08-05 Thread Mohamed Parvez
Looks like the earlier schema.xml had some typos.
Below is the correct schema.xml:

3] schema.xml
..
..
<field name="id" type="string" indexed="true" stored="true"/>
<field name="BUS" type="text" indexed="true" stored="true"/>
<field name="ROLE" type="text" indexed="true" stored="true"/>
..
..
<field name="SPELL" type="textSpell" indexed="true" stored="true"
  multiValued="true"/>
..
..
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
      words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
      generateNumberParts="1" catenateWords="1" catenateNumbers="1"
      catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
      protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
      ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
      words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
      generateNumberParts="1" catenateWords="0" catenateNumbers="0"
      catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
      protected="protwords.txt"/>
  </analyzer>
</fieldType>
On Wed, Aug 5, 2009, Mohamed Parvez wrote:

> Thanks Otis and Avlesh,
>
> Below is the configuration I have
>
> 1] solrconfig.xml
>
> .
>   <requestHandler name="standard" class="solr.SearchHandler" default="true">
>     <lst name="defaults">
>       <str name="echoParams">explicit</str>
>       <str name="spellcheck.onlyMorePopular">false</str>
>       <str name="spellcheck.extendedResults">false</str>
>       <str name="spellcheck.count">1</str>
>     </lst>
>     <arr name="last-components">
>       <str>spellcheck</str>
>     </arr>
>   </requestHandler>
> .
> .
>   <requestHandler name="/dataimport"
>       class="org.apache.solr.handler.dataimport.DataImportHandler">
>     <lst name="defaults">
>       <str name="config">data-import.xml</str>
>     </lst>
>   </requestHandler>
> ..
> ..
>   <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>     <str name="queryAnalyzerFieldType">textSpell</str>
>     <lst name="spellchecker">
>       <str name="name">default</str>
>       <str name="field">SPELL</str>
>       <str name="spellcheckIndexDir">./spellcheckerIndex</str>
>       <str name="buildOnCommit">true</str>
>       <str name="buildOnOptimize">true</str>
>     </lst>
>   </searchComponent>
>
> 2] data-import.xml
>
> .
> ..
> <document>
>   <entity name="user" query="select * from user">
>     <field column="ID" name="id"/>
>     <field column="BUS" name="BUS"/>
>     <field column="ROLE" name="ROLE"/>
>   </entity>
> </document>
> .
> .
>
> 3] schema.xml
> ..
> ..
> <field name="id" type="string" indexed="true" stored="true"/>
> <field name="BUS" type="text" indexed="true" stored="true"/>
> <field name="ROLE" type="text" indexed="true" stored="true"/>
> ..
> ..
> <field name="SPELL" type="textSpell" indexed="true" stored="true"
>   multiValued="true"/>
> ..
> ..
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>       words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>       generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>       catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory"
>       protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>       ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>       words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>       generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>       catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory"
>       protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
>
> To make it simple, I have only one record in the table:
> ID=1
> BUS=ICS
> ROLE=SSE
>
>
> Like I said before,
> I don't get any match if I search for q=ics*
> I do get the match, which is the correct result, if I search for q=sse*
>
> I have not done any query rewriting; I am just using the default
> configuration that comes with Solr.
>
> Otis, Let me know if you need any more information.
>
> Avlesh, the above setup is just a stripped-down version to figure out what
> the issue is. In my real application, I have hundreds of columns in the
> table that I use for building the search index. I don't think it's a good
> option to copy over all the fields and create another 100-odd fields with
> just a lowercase filter applied.
>
> 
> Parvez
>
>
> From: Otis Gospodnetic 
> Date: Tue, Aug 4, 2009 at 8:25 PM
> Subject: Re: Wild card search does not return any result
> To: solr-user@lucene.apache.org
>
>
> Hi,
>
> I doubt it's a bug.  It's probably working correctly based on the config,
> etc., I just don't have enough details about the configuration, your request
> handler, query rewriting, the data in your index, etc. to tell you what
> exactly is happening.
>
>  Otis
>
>
> On Tue, Aug 4, 2009 at 11:13 PM, Avlesh Singh  wrote:
>
>> You read it incorrectly, Parvez.
>> The "bug" that Bill seems to have found is with the analysis tool and NOT
>> the search handler itself. Results in your case are as expected. Wildcard
>> queries are not analyzed, hence the inconsistency.
>> A workaround is suggested on the same thread, here -
>>
>> http://markmail.org/message/ts65a6jok3ii6nva#query:+page:1+mid:i5zxdbnvspgek2bp+state:results
>>
>> Cheers
>> Avlesh
>>
>> On Wed, Aug 5, 2009 at 12:52 AM, Mohamed Parvez  wrote:
>>
>> > Thanks Otis, The thread suggests that this is a bug
>> >
>> >
>> >
>> http://markmail.org/message/ts65a6jok3ii6nva#query:+page:1+mid:qinymqdn6mkocv4k
>> >
>> > Both SSE and ICS are 3-letter words, and neither is part of the English
>> > language.
>> > SSE* works fine and ICS* does not work, so this is surely a bug.
>> >
>> > Any idea when this bug will be fixed, or if there is any workaround?
>> >
>> > 
>> > Thanks/Regards,
>> > Parvez
>> > GV : 786-693-2228
>> >
>> >
>> > On Tue, Aug 4, 2009 at 11:48 AM, Otis Gospodnetic <
>> > otis_gospodne...@yahoo.com> wrote:
>> >
>> > > Could it be the same reason as described here:
>> > >
>> > > http://markmail.org/message/ts65a6jok3ii6nva
>> > >
>> > > Otis
>> > > --
>> > > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>> > > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>> > >
>> > >
>> > >
>> > > - Original Message 
>> > > > From: Mohamed Parvez 
>> > > > To: solr-user@lucene.apache.org
>> > > > Sent: Tuesday, August 4, 2009 11:26:45 AM
>> > > > Subject: Wild card search does not return any result
>> > > >
>> > > > Hello All,
>> > > >
>> > > >I have two fields.
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > I have a document (which has been indexed) that has a value of "ICS"
>> > > > for the BUS field and "SSE" for the ROLE field.
>> > > >
>> > > > When I search for q=BUS:ics i get the result, but if i search for
>> > > q=BUS:ics*
>> > > > i don't get any match (or result)
>> > > >
>> > > > when I search

Re: THIS WEEK: PNW Hadoop, HBase / Apache Cloud Stack Users' Meeting, Wed Jul 29th, Seattle

2009-08-05 Thread Bradford Stephens
A big "thanks" to everyone who came out despite the heat! Hope to see
you again the last week of August, probably at UW.

On Wed, Jul 29, 2009 at 4:52 PM, Bradford
Stephens wrote:
> Don't forget this is tonight! Excited to see everyone there.
>
> On Tue, Jul 28, 2009 at 11:25 AM, Bradford
> Stephens wrote:
>> Hey everyone,
>>
>> SLIGHT change of plans.
>>
>> A few people have asked me to move to a place with Air Conditioning,
>> since the temperature's in the 90's this week. So, here we go:
>>
>> Big Time Brewing Company
>> 4133 University Way NE
>> Seattle, WA 98105
>>
>> Call me at 904-415-3009 if you have any questions.
>>
>>
>> On Mon, Jul 27, 2009 at 12:16 PM, Bradford
>> Stephens wrote:
>>> Hello again!
>>>
>>> Yes, I know some of us are still recovering from OSCON. It's time for
>>> another delicious meetup to chat about Hadoop, HBase, Solr, Lucene,
>>> and more!
>>>
>>> UW is quite a pain for us to access until August, so we're changing
>>> the venue to one pretty close:
>>>
>>> Piccolo's Pizza
>>> 5301 Roosevelt Way NE
>>> (between 53rd St & 55th St)
>>>
>>> 6:45pm - 8:30 (or when we get bored)!
>>>
>>> As usual, people are more than welcome to give talks, whether they're
>>> long-format or lightning. I'd also really like to start thinking about
>>> hackathons, perhaps we could have one next month?
>>>
>>> I'll be talking about HBase .20 and the possibility of low-latency
>>> HBase Analytics. I'd be very excited to hear what people are up to!
>>>
>>> Contact me if there's any questions: 904-415-3009
>>>
>>> Cheers,
>>> Bradford
>>>
>>> --
>>> http://www.roadtofailure.com -- The Fringes of Scalability, Social
>>> Media, and Computer Science
>>>
>>
>>
>>
>> --
>> http://www.roadtofailure.com -- The Fringes of Scalability, Social
>> Media, and Computer Science
>>
>
>
>
> --
> http://www.roadtofailure.com -- The Fringes of Scalability, Social
> Media, and Computer Science
>



-- 
http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science


Limit of Index size per machine..

2009-08-05 Thread Silent Surfer

Hi ,

We are planning to use Solr for indexing the server log contents.
The expected processed log file size per day: 100 GB
We are expecting to retain these indexes for 30 days (100*30 ~ 3 TB).

Can anyone advise what the optimal size of index is that I can
store on a single server without hampering the search performance, etc.?

We are planning to use OS X servers with a configuration of 16 GB of RAM
(can go to 24 GB).

We need to figure out how many servers are required to handle such an
amount of data.

Any help would be greatly appreciated.

Thanks
SilentSurfer


  



Re: sole 1.3: bug in phps response writer

2009-08-05 Thread Poohneat

Hey Otis,
I don't think this issue has been solved yet. I am working with the Solr 1.3
release, and yet I get the same exception as the original post.
I have the Solr 1.3 release with the localsolr jars.

Any advice is helpful ... for now I will use the JSON response writer and
work around this bug.

Thanks
--
take care


Otis Gospodnetic wrote:
> 
> Hi Alok,
> 
> I don't think it's a known issue and 2. a) sounds like the best and most
> appreciated approach! :)
> 
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> 
> 
> From: Alok Dhir 
> To: solr-user@lucene.apache.org
> Sent: Monday, November 17, 2008 12:36:25 PM
> Subject: sole 1.3: bug in phps response writer
> 
> Distributed queries:
> 
> curl
> 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=php'
> 
> curl
> 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=xml
> 
> curl
> 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=json''
> 
> All work fine, providing identical results in their respective formats
> (note the change in the wt param).
> 
> curl
> 'http://devxen0:8983/solr/core0/select?shards=search3:8983/solr/core0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=phps'
> 
> fails with:
> 
> java.lang.IllegalArgumentException: Map size must not be negative
> at
> org.apache.solr.request.PHPSerializedWriter.writeMapOpener(PHPSerializedResponseWriter.java:195)
> at
> org.apache.solr.request.JSONWriter.writeSolrDocument(JSONResponseWriter.java:392)
> at
> org.apache.solr.request.JSONWriter.writeSolrDocumentList(JSONResponseWriter.java:547)
> at
> org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:147)
> at
> org.apache.solr.request.JSONWriter.writeNamedListAsMapMangled(JSONResponseWriter.java:150)
> at
> org.apache.solr.request.PHPSerializedWriter.writeNamedList(PHPSerializedResponseWriter.java:71)
> at
> org.apache.solr.request.PHPSerializedWriter.writeResponse(PHPSerializedResponseWriter.java:66)
> at
> org.apache.solr.request.PHPSerializedResponseWriter.write(PHPSerializedResponseWriter.java:47)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
> at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
> at
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
> at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
> at
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
> at
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
> at
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> at
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
> at org.mortbay.jetty.Server.handle(Server.java:285)
> at
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
> at
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
> at
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
> at
> org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> 
> Questions:
> 
> 1) Is this known?  I didn't see it in the issue tracker.
> 
> 2) What's the better course of action: a) download source, fix, submit
> patch, wait for new release; b) drop phps and use json instead?
> 
> Thanks
> 

-- 
View this message in context: 
http://www.nabble.com/sole-1.3%3A-bug-in-phps-response-writer-tp20544146p24834570.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Limit of Index size per machine..

2009-08-05 Thread Ian Connor
I try to keep the index directory size less than the amount of RAM and rely
on the OS to cache as it needs. Linux does a pretty good job here and I am
sure OS X will do a good job also.

Distributed search here will be your friend, so you can chunk it up across a
number of servers to keep your cost down (2 GB RAM sticks are much cheaper
than 4 GB RAM sticks: roughly $20 vs. $100).

Ian.

On Wed, Aug 5, 2009 at 1:44 PM, Silent Surfer wrote:

>
> Hi ,
>
> We are planning to use Solr for indexing the server log contents.
> The expected processed log file size per day: 100 GB
> We are expecting to retain these indexes for 30 days (100*30 ~ 3 TB).
>
> Can any one provide what would be the optimal size of the index that I can
> store on a single server, without hampering the search performance etc.
>
> We are planning to use OSX server with a configuration of 16 GB (Can go to
> 24 GB).
>
> We need to figure out how many servers are required to handle such amount
> of data..
>
> Any help would be greatly appreciated.
>
> Thanks
> SilentSurfer
>
>
>
>
>


-- 
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1(770) 818 5697
Skype: ian.connor


RE: 99.9% uptime requirement

2009-08-05 Thread Robert Petersen
Maintenance Questions:  In a two slave one master setup where the two
slaves are behind load balancers what happens if I have to restart solr?
If I have to restart solr say for a schema update where I have added a
new field then what is the recommended procedure?

If I can guarantee no commits or optimizes happen on the master during
the schema update, so no new snapshots become available, then can I safely
leave rsyncd enabled?  When I stop and start a slave server, should I
first pull it out of the load balancer's list, or will Solr gracefully
release connections as it shuts down so no searches are lost?

What do you guys do to push out updates?

Thanks for any thoughts,
Robi


-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Tuesday, August 04, 2009 8:57 AM
To: solr-user@lucene.apache.org
Subject: Re: 99.9% uptime requirement

Right. You don't get to 99.9% by assuming that an 8 hour outage is OK.  
Design for continuous uptime, with plans for how long it takes to  
patch around a single point of failure. For example, if your load  
balancer is a single point of failure, make sure that you can redirect  
the front end servers to a single Solr server in much less than 8 hours.

Also, think about your SLA. Can the search index be more than 8 hours  
stale? How quickly do you need to be able to replace a failed indexing  
server? You might be able to run indexing locally on each search  
server if they are lightly loaded.

wunder

On Aug 4, 2009, at 7:11 AM, Norberto Meijome wrote:

> On Mon, 3 Aug 2009 13:15:44 -0700
> "Robert Petersen"  wrote:
>
>> Thanks all, I figured there would be more talk about daemontools if  
>> there
>> were really a need.  I appreciate the input and for starters we'll  
>> put two
>> slaves behind a load balancer and grow it from there.
>>
>
> Robert,
> not taking away from daemon tools, but daemon tools won't help you  
> if your
> whole server goes down.
>
> don't put all your eggs in one basket - several
> servers, load balancer (hardware load balancers x 2, haproxy, etc)
>
> and sure, use daemon tools to keep your services running within each  
> server...
>
> B
> _
> {Beto|Norberto|Numard} Meijome
>
> "Why do you sit there looking like an envelope without any address  
> on it?"
>  Mark Twain
>
> I speak for myself, not my employer. Contents may be hot. Slippery  
> when wet.
> Reading disclaimers makes you go blind. Writing them is worse. You  
> have been
> Warned.
>



enablereplication does not work

2009-08-05 Thread solr jay
Hi,

http://localhost:8549/solr/replication?command=enablereplication

does not seem to be working. After making the request, I run

http://localhost:8549/solr/replication?command=indexversion

and here is the response:




<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <long name="indexversion">0</long>
  <long name="generation">0</long>
</response>



Notice the indexversion is 0, which is the value after you disable
replication. On the other hand

http://localhost:8549/solr/replication?command=details

returns:





<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">7</int>
  </lst>
  <lst name="details">
    <str name="indexSize">692 bytes</str>
    <str name="indexPath">/tmp/solr/solrdata/index</str>
    <str name="isMaster">true</str>
    <str name="isSlave">false</str>
    <long name="indexVersion">1249517184279</long>
    <long name="generation">2</long>
    <lst name="master">
      <str name="replicateAfter">commit</str>
    </lst>
  </lst>
  <str name="WARNING">This response format is experimental.  It is likely to change in the future.</str>
</response>




Notice that the indexversion is 1249517184279.

thanks,

-- 
J


Re: Limit of Index size per machine..

2009-08-05 Thread Silent Surfer

Hi,

That means we need approximately 3000 GB (index size) / 24 GB (RAM) = 125
servers.

It would be very hard to convince my org to go for 125 servers for log
management of 3 terabytes of indexes.

Has anyone used Solr for processing and handling indexes on the order of
3 TB? If so, how many servers were used for indexing alone?

Thanks,
sS


--- On Wed, 8/5/09, Ian Connor  wrote:

> From: Ian Connor 
> Subject: Re: Limit of Index size per machine..
> To: solr-user@lucene.apache.org
> Date: Wednesday, August 5, 2009, 9:38 PM
> I try to keep the index directory
> size less than the amount of RAM and rely
> on the OS to cache as it needs. Linux does a pretty good
> job here and I am
> sure OS X will do a good job also.
> 
> Distributed search here will be your friend so you can
> chunk it up to a
> number of servers to keep your cost down (2GB RAM sticks
> are much cheaper
> than 4GB RAM sticks $20 < $100).
> 
> Ian.
> 
> On Wed, Aug 5, 2009 at 1:44 PM, Silent Surfer wrote:
> 
> >
> > Hi ,
> >
> > We are planning to use Solr for indexing the server
> log contents.
> > The expected processed log file size per day: 100 GB
> > We are expecting to retain these indexes for 30 days
> (100*30 ~ 3 TB).
> >
> > Can any one provide what would be the optimal size of
> the index that I can
> > store on a single server, without hampering the search
> performance etc.
> >
> > We are planning to use OSX server with a configuration
> of 16 GB (Can go to
> > 24 GB).
> >
> > We need to figure out how many servers are required to
> handle such amount
> > of data..
> >
> > Any help would be greatly appreciated.
> >
> > Thanks
> > SilentSurfer
> >
> >
> >
> >
> >
> 
> 
> -- 
> Regards,
> 
> Ian Connor
> 1 Leighton St #723
> Cambridge, MA 02141
> Call Center Phone: +1 (714) 239 3875 (24 hrs)
> Fax: +1(770) 818 5697
> Skype: ian.connor
> 


  



Re: Limit of Index size per machine..

2009-08-05 Thread Walter Underwood
That is why people don't use search engines to manage logs. Look at a  
Hadoop cluster.


wunder

On Aug 5, 2009, at 10:08 PM, Silent Surfer wrote:



Hi,

That means we need approximately 3000 GB (Index Size)/24 GB (RAM) =  
125 servers.


It would be very hard to convince my org to go for 125 servers for  
log management of 3 Terabytes of indexes.


Has any one used, solr for processing and handling of the indexes of  
the order of 3 TB ? If so how many servers were used for indexing  
alone.


Thanks,
sS


--- On Wed, 8/5/09, Ian Connor  wrote:


From: Ian Connor 
Subject: Re: Limit of Index size per machine..
To: solr-user@lucene.apache.org
Date: Wednesday, August 5, 2009, 9:38 PM
I try to keep the index directory
size less than the amount of RAM and rely
on the OS to cache as it needs. Linux does a pretty good
job here and I am
sure OS X will do a good job also.

Distributed search here will be your friend so you can
chunk it up to a
number of servers to keep your cost down (2GB RAM sticks
are much cheaper
than 4GB RAM sticks $20 < $100).

Ian.

On Wed, Aug 5, 2009 at 1:44 PM, Silent Surfer wrote:




Hi ,

We are planning to use Solr for indexing the server

log contents.

The expected processed log file size per day: 100 GB
We are expecting to retain these indexes for 30 days

(100*30 ~ 3 TB).


Can any one provide what would be the optimal size of

the index that I can

store on a single server, without hampering the search

performance etc.


We are planning to use OSX server with a configuration

of 16 GB (Can go to

24 GB).

We need to figure out how many servers are required to

handle such amount

of data..

Any help would be greatly appreciated.

Thanks
SilentSurfer








--
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1(770) 818 5697
Skype: ian.connor










Re: Limit of Index size per machine..

2009-08-05 Thread Silent Surfer

Hi,

We initially went down the Hadoop path, but as it is one more software-based
file system on top of the OS file system, we didn't get buy-in from our
systems engineers, i.e., in case we run into any HDFS issues, the SEs won't
be supporting us :(

Regards,
sS

--- On Thu, 8/6/09, Walter Underwood  wrote:

> From: Walter Underwood 
> Subject: Re: Limit of Index size per machine..
> To: solr-user@lucene.apache.org
> Date: Thursday, August 6, 2009, 5:12 AM
> That is why people don't use search
> engines to manage logs. Look at a  
> Hadoop cluster.
> 
> wunder
> 
> On Aug 5, 2009, at 10:08 PM, Silent Surfer wrote:
> 
> >
> > Hi,
> >
> > That means we need approximately 3000 GB (Index
> Size)/24 GB (RAM) =  
> > 125 servers.
> >
> > It would be very hard to convince my org to go for 125
> servers for  
> > log management of 3 Terabytes of indexes.
> >
> > Has any one used, solr for processing and handling of
> the indexes of  
> > the order of 3 TB ? If so how many servers were used
> for indexing  
> > alone.
> >
> > Thanks,
> > sS
> >
> >
> > --- On Wed, 8/5/09, Ian Connor 
> wrote:
> >
> >> From: Ian Connor 
> >> Subject: Re: Limit of Index size per machine..
> >> To: solr-user@lucene.apache.org
> >> Date: Wednesday, August 5, 2009, 9:38 PM
> >> I try to keep the index directory
> >> size less than the amount of RAM and rely
> >> on the OS to cache as it needs. Linux does a
> pretty good
> >> job here and I am
> >> sure OS X will do a good job also.
> >>
> >> Distributed search here will be your friend so you
> can
> >> chunk it up to a
> >> number of servers to keep your cost down (2GB RAM
> sticks
> >> are much cheaper
> >> than 4GB RAM sticks $20 < $100).
> >>
> >> Ian.
> >>
> >> On Wed, Aug 5, 2009 at 1:44 PM, Silent Surfer
> >> wrote:
> >>
> >>>
> >>> Hi ,
> >>>
> >>> We are planning to use Solr for indexing the
> server
> >> log contents.
> >>> The expected processed log file size per day:
> 100 GB
> >>> We are expecting to retain these indexes for
> 30 days
> >> (100*30 ~ 3 TB).
> >>>
> >>> Can any one provide what would be the optimal
> size of
> >> the index that I can
> >>> store on a single server, without hampering
> the search
> >> performance etc.
> >>>
> >>> We are planning to use OSX server with a
> configuration
> >> of 16 GB (Can go to
> >>> 24 GB).
> >>>
> >>> We need to figure out how many servers are
> required to
> >> handle such amount
> >>> of data..
> >>>
> >>> Any help would be greatly appreciated.
> >>>
> >>> Thanks
> >>> SilentSurfer
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >> -- 
> >> Regards,
> >>
> >> Ian Connor
> >> 1 Leighton St #723
> >> Cambridge, MA 02141
> >> Call Center Phone: +1 (714) 239 3875 (24 hrs)
> >> Fax: +1(770) 818 5697
> >> Skype: ian.connor
> >>
> >
> >
> >
> >
> 
>







Re: enablereplication does not work

2009-08-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
How is the ReplicationHandler configured? If there was no
commit/optimize, then it would show the version as '0'.
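
For reference, a typical master-side setup in solrconfig.xml looks
something like this (a sketch; the confFiles list is illustrative):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- snapshot the index for slaves after every commit -->
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>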

On Thu, Aug 6, 2009 at 5:50 AM, solr jay wrote:
> Hi,
>
> http://localhost:8549/solr/replication?command=enablereplication
>
> does not seem to be working. After making the request, I run
>
> http://localhost:8549/solr/replication?command=indexversion
>
> and here is the response:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">0</int>
>   </lst>
>   <long name="indexversion">0</long>
>   <long name="generation">0</long>
> </response>
>
>
> Notice the indexversion is 0, which is the value after you disable
> replication. On the other hand
>
> http://localhost:8549/solr/replication?command=details
>
> returns:
>
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">7</int>
>   </lst>
>   <lst name="details">
>     <str name="indexSize">692 bytes</str>
>     <str name="indexPath">/tmp/solr/solrdata/index</str>
>     <str name="isMaster">true</str>
>     <str name="isSlave">false</str>
>     <long name="indexVersion">1249517184279</long>
>     <long name="generation">2</long>
>     <lst name="master">
>       <str name="replicateAfter">commit</str>
>     </lst>
>   </lst>
>   <str name="WARNING">This response format is experimental.  It is likely to change in the future.</str>
> </response>
>
>
> Notice that the indexversion is 1249517184279.
>
> thanks,
>
> --
> J
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: 99.9% uptime requirement

2009-08-05 Thread Shalin Shekhar Mangar
On Thu, Aug 6, 2009 at 4:10 AM, Robert Petersen  wrote:

> Maintenance Questions:  In a two slave one master setup where the two
> slaves are behind load balancers what happens if I have to restart solr?
> If I have to restart solr say for a schema update where I have added a
> new field then what is the recommended procedure?
>
> If I can guarantee no commits or optimizes happen on the master during
> the schema update so no new snapshots become available then can I safely
> leave rsyncd enabled?  When I stop and start a slave server, should I
> first pull it out of the load balancers list or will solr gracefully
> release connections as it shuts down so no searches are lost?
>

We pull slaves out of the load balancer, wait for 15-20 seconds and then
stop the tomcat process.


>
> What do you guys do to push out updates?
>

Disable the cron job on all slaves (which calls snappuller).
Update schema on master and re-index.
For each slave: Take it out of rotation, stop tomcat, update the schema,
start tomcat, call snappuller, start cron.

This is now a piece of cake with the Java-based replication in Solr 1.4,
which also supports replicating configuration without downtime.

-- 
Regards,
Shalin Shekhar Mangar.


Transfer of Index Vs HTTP GET Vs Embedded Solr -- Urgent Help

2009-08-05 Thread Ninad Raut
Hi,
I have a search engine on Solr. I also have a remote web application which
will be using the Solr indexes for search.
I have three scenarios:
1) Transfer the indexes to the remote application.

   - This will reduce load on the actual Solr server and make searches
   faster.
   - Need to write some code to transfer the index
   - Need to double my effort to update, merge, and optimize the index

2) Use HTTP GET

   - Will increase load on the Solr server
   - No extra code needed for transfer

3) Embedded Search

   - Use SolrJ for querying
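
For what it's worth, options 2 and 3 share the same SolrJ query code; only
the server construction differs. A minimal sketch, with an illustrative host
URL and query field, using SolrJ 1.3 class names:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SearchClient {
  public static void main(String[] args) throws Exception {
    // Option 2: the remote webapp queries the Solr server over HTTP.
    // For option 3, construct an EmbeddedSolrServer instead; the query
    // code below stays identical.
    SolrServer server = new CommonsHttpSolrServer("http://solrhost:8983/solr");
    QueryResponse rsp = server.query(new SolrQuery("title:lucene"));
    System.out.println("hits: " + rsp.getResults().getNumFound());
  }
}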

I want to know which is the best approach.
Regards,
Ninad Raut.