issue with commit

2008-09-25 Thread sunnyfr

Hi,

I can't find a way to sort out my issue; can somebody please help me?
My problem: all my log files look empty and no snapshot gets created, but
everything else seems to work. Autocommit looks OK according to the stats page,
but there is nothing in the logs, and snapshots are only created if I run
snapshooter manually from the command line.


When I check my commit.log, nothing has been run, but my config file seems OK
for activating commits:

<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>1000</maxTime>
</autoCommit>

My snapshooter listener too, but there is no log in snapshooter.log:

<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir"></str>
  <bool name="wait">true</bool>
  <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
  <arr name="env"> <str>MYVAR=val1</str> </arr>
</listener>

<listener event="postOptimize" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir"></str>
  <bool name="wait">true</bool>
</listener>

My scripts.conf
data_dir=/data/solr/book/data 

And documents were updated; the dataimport status page shows:

status: busy
A command is still running...
Time Elapsed: 0:1:33.251
Total Requests made to DataSource: 39453
Total Rows Fetched: 17980
Total Documents Skipped: 0
Full Dump Started: 2008-09-24 09:50:01, 2008-09-24 09:50:01
Committed/Optimized: 2008-09-24 09:50:37, 2008-09-24 09:50:37
Total Documents Processed: 5636

Indexing completed. Added/Updated: 5636 documents. Deleted 0 documents.



I've got nothing in my commit.log, but when I go to my update handler stats it
looks like commits are fired, no?

Update Handlers
 
name: updateHandler  
class: org.apache.solr.update.DirectUpdateHandler2  
version: 1.0  
description: Update handler that efficiently directly updates the on-disk
main lucene index  
stats: commits : 2174
autocommit maxDocs : 1
autocommit maxTime : 1000ms
autocommits : 2172
optimizes : 321
docsPending : 0
adds : 0
deletesById : 0
deletesByQuery : 0
errors : 0
cumulative_adds : 323848
cumulative_deletesById : 0
cumulative_deletesByQuery : 0
cumulative_errors : 0
docsDeleted : 0



Any idea? What should I check?
thanks a lot,
-- 
View this message in context: 
http://www.nabble.com/issue-with-commit-tp19664249p19664249.html
Sent from the Solr - User mailing list archive at Nabble.com.



most searched keyword in solr

2008-09-25 Thread sanraj25

Hi,
How can we find the most searched keyword in Solr?
If anybody can suggest a good solution, it would be helpful.
Thank you.

with  Regards,
P.Parkavi

-- 
View this message in context: 
http://www.nabble.com/most-searched-keyword-in-solr-tp19664387p19664387.html
Sent from the Solr - User mailing list archive at Nabble.com.



cannot allocate memory ?? snapshooter

2008-09-25 Thread sunnyfr

Hi, 
Any idea ? 

Sep 25 06:50:41 solr-test jsvc.exec[23286]: Sep 25, 2008 6:50:41 AM
org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: Cannot run program "snapshooter":
java.io.IOException: error=12, Cannot allocate memory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
    at java.lang.Runtime.exec(Runtime.java:593)
    at org.apache.solr.core.RunExecutableListener.exec(RunExecutableListener.java:73)
    at org.apache.solr.core.RunExecutableListener.postCommit(RunExecutableListener.java:100)
    at org.apache.solr.update.UpdateHandler.callPostCommitCallbacks(UpdateHandler.java:101)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:370)
    at org.apache.solr.update.DirectUpdateHandler2$CommitTracker.run(DirectUpdateHandler2.java:525)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledTh...


Thanks a lot guys,

-- 
View this message in context: 
http://www.nabble.com/cannot-allocate-memorysnapshooter-tp19664817p19664817.html
Sent from the Solr - User mailing list archive at Nabble.com.



Memory error - snapshooter help

2008-09-25 Thread sunnyfr

Hi, 
Any idea ? 

Sep 25 06:50:41 solr-test jsvc.exec[23286]: Sep 25, 2008 6:50:41 AM
org.apache.solr.common.SolrException log SEVERE: java.io.IOException: Cannot
run program "snapshooter": java.io.IOException: error=12, Cannot allocate
memory 

My memory for java is 
JAVA_OPTS="-Xms6000m -Xmx6000m -XX:+UseParallelGC -XX:+AggressiveOpts
-XX:NewRatio=5 -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

My memory is not that bad; I've got 8GB:
[EMAIL PROTECTED]:/# free -m
                     total   used   free  shared  buffers  cached
Mem:                  7998   7095    903       0        6     950
-/+ buffers/cache:            6138   1860
Swap:                 2000    295   1705
-
Mem:   8190864k total,  7266832k used,   924032k free, 7280k buffers
Swap:  2048248k total,   302244k used,  1746004k free,   974040k cached



My cache is:

<filterCache class="solr.LRUCache"
  size="16384"
  initialSize="4096"
  autowarmCount="4096"/>

<queryResultCache class="solr.LRUCache"
  size="16384"
  initialSize="4096"
  autowarmCount="1024"/>

<documentCache class="solr.LRUCache"
  size="4096"
  initialSize="5"
  autowarmCount="4096"/>

My index is quite big:
[EMAIL PROTECTED]:/# ls -ll data/solr/book/data
drwxr-xr-x 2 tomcat55 nogroup 8192 Sep 25 10:08 index
drwxr-xr-x 2 root     root    4096 Sep 22 16:20 snapshot.20080922162058



What would you reckon?
Thanks a lot guys,

-- 
View this message in context: 
http://www.nabble.com/Memory-error---snapshooter-help-tp19665074p19665074.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Not enough space

2008-09-25 Thread sunnyfr

Hi,
I obviously have the same error; I just don't know how you add swap space.
Thanks a lot,


Yonik Seeley wrote:
> 
> On 7/5/07, Xuesong Luo <[EMAIL PROTECTED]> wrote:
>> Thanks, Chris and Yonik. You are right. I remember the heap size was
>> over 500m when I got the Not enough space error message.
>> Is there a best practice to avoid this kind of problem?
> 
> add more swap space.
> 
> -Yonik
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Not-enough-space-tp11423199p19665707.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Not enough space

2008-09-25 Thread Brian Carmalt
Search Google for "swap file linux" or "swap file" plus your distro name.

There is tons of info out there.

 
Am Donnerstag, den 25.09.2008, 02:07 -0700 schrieb sunnyfr:
> Hi,
> I've obviously the same error, I just don't know how do you add swap space ? 
> Thanks a lot,
> 
> 
> Yonik Seeley wrote:
> > 
> > On 7/5/07, Xuesong Luo <[EMAIL PROTECTED]> wrote:
> >> Thanks, Chris and Yonik. You are right. I remember the heap size was
> >> over 500m when I got the Not enough space error message.
> >> Is there a best practice to avoid this kind of problem?
> > 
> > add more swap space.
> > 
> > -Yonik
> > 
> > 
> 



Re: most searched keyword in solr

2008-09-25 Thread Mark Miller

sanraj25 wrote:

hi,
how will we find most searched keyword in solr?
If anybody can suggest us a good solution, it would be helpful
thank you

with  Regards,
P.Parkavi

  
Write some code to record every query/keyword. Could be done at 
different places depending on how you define 'keyword' compared to how 
things are tokenized.


Or, you should also be able to parse the solr logs and extract query 
information and figure it out based on that.


Or...? Haven't seen any code to help with this out there, but maybe there
is some?


- Mark
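
For the log-parsing route Mark mentions, a minimal sketch (assuming the request
log lines contain a params={q=...} block, which is what Solr's default logging
emits; adjust the regex to whatever your servlet container actually writes):

import java.io.BufferedReader;
import java.io.FileReader;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Count how often each q= value appears in a Solr request log.
public class QueryCounter {
    private static final Pattern Q = Pattern.compile("[{&]q=([^&}]+)");

    public static void main(String[] args) throws Exception {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        for (String line; (line = in.readLine()) != null; ) {
            Matcher m = Q.matcher(line);
            if (m.find()) {
                String q = URLDecoder.decode(m.group(1), "UTF-8");
                Integer n = counts.get(q);
                counts.put(q, n == null ? 1 : n + 1);
            }
        }
        in.close();
        // Print tab-separated count/query pairs; sort externally if needed.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}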


Re: Memory error - snapshooter help

2008-09-25 Thread Bill Au
The OS is refusing the fork: when Java runs an external program it forks the
JVM first, and the fork momentarily needs as much virtual memory as the whole
JVM process, so with a 6GB heap and little free swap it fails with error=12.
Add swap space:
http://www.nabble.com/Not-enough-space-to11423199.html#a11432978



On Thu, Sep 25, 2008 at 4:20 AM, sunnyfr <[EMAIL PROTECTED]> wrote:

>
> Hi,
> Any idea ?
>
> Sep 25 06:50:41 solr-test jsvc.exec[23286]: Sep 25, 2008 6:50:41 AM
> org.apache.solr.common.SolrException log SEVERE: java.io.IOException:
> Cannot
> run program "snapshooter": java.io.IOException: error=12, Cannot allocate
> memory
>
> My memory for java is
> JAVA_OPTS="-Xms6000m -Xmx6000m -XX:+UseParallelGC -XX:+AggressiveOpts
> -XX:NewRatio=5 -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
>
> My memory is not that bad, I've 8Gb
> [EMAIL PROTECTED]:/# free -m
> total   used   free sharedbuffers cached
> Mem:  7998   7095903  0  6950
> -/+ buffers/cache:   6138   1860
> Swap: 2000295   1705
>
> -
> Mem:   8190864k total,  7266832k used,   924032k free, 7280k buffers
> Swap:  2048248k total,   302244k used,  1746004k free,   974040k cached
>
>
>
> My cache is :
> <filterCache class="solr.LRUCache"
>   size="16384"
>   initialSize="4096"
>   autowarmCount="4096"/>
>
> <queryResultCache class="solr.LRUCache"
>   size="16384"
>   initialSize="4096"
>   autowarmCount="1024"/>
>
> <documentCache class="solr.LRUCache"
>   size="4096"
>   initialSize="5"
>   autowarmCount="4096"/>
>
> My index are quite big:
> [EMAIL PROTECTED]:/# ls -ll data/solr/book/data
> drwxr-xr-x 2 tomcat55 nogroup 8192 Sep 25 10:08 index
> drwxr-xr-x 2 root root4096 Sep 22 16:20 snapshot.20080922162058
>
>
>
> What would you reckon ?
> Thanks a lot guys,
>
> --
> View this message in context:
> http://www.nabble.com/Memory-error---snapshooter-help-tp19665074p19665074.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


How to select one entity at a time?

2008-09-25 Thread con

Hi
I have got two entities in my data-config.xml file, entity1 and entity2. 
For condition-A I need to execute only entity1 and for condition-B only the
entity2 needs to get executed.
How can I mention it while accessing the search index in the REST way.
Is there any option that i can give along with this query:
http://localhost:8983/solr/select/?q=physics&version=2.2&start=0&rows=10&indent=on&wt=json

Thanks
con


-- 
View this message in context: 
http://www.nabble.com/How-to-select-one-entity-at-a-time--tp19668759p19668759.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Memory error - snapshooter help

2008-09-25 Thread Mark Miller
It is a mistake to give Java 6GB out of the 8GB available. First, when you
say 6GB, that's just the heap: the Java process will use memory beyond
6GB. That leaves hardly any RAM for any other process,
and almost *nothing* for the filesystem cache.
Effectively, you are starving yourself of memory. Lower the heap by a
lot. Start low: giving Java more RAM than it needs is not beneficial.
Start low and monitor the heap usage using your container, jconsole,
or visualgc. If you don't pin the min and max, you will see how much the
JVM actually wants versus what actually gets allocated (start with a low
min for this). Lower your max accordingly.


- Mark
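
As an illustration only (the right numbers depend on what the monitoring above
shows), a much more conservative starting point than 6GB would be something
like:

JAVA_OPTS="-Xms512m -Xmx2048m -XX:+UseParallelGC -Xloggc:gc.log"

and then raise -Xmx only if the heap actually fills up.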


sunnyfr wrote:
Hi, 
Any idea ? 


Sep 25 06:50:41 solr-test jsvc.exec[23286]: Sep 25, 2008 6:50:41 AM
org.apache.solr.common.SolrException log SEVERE: java.io.IOException: Cannot
run program "snapshooter": java.io.IOException: error=12, Cannot allocate
memory 

My memory for java is 
JAVA_OPTS="-Xms6000m -Xmx6000m -XX:+UseParallelGC -XX:+AggressiveOpts

-XX:NewRatio=5 -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

My memory is not that bad, I've 8Gb
[EMAIL PROTECTED]:/# free -m
 total   used   free sharedbuffers cached
Mem:  7998   7095903  0  6950
-/+ buffers/cache:   6138   1860
Swap: 2000295   1705
-
Mem:   8190864k total,  7266832k used,   924032k free, 7280k buffers
Swap:  2048248k total,   302244k used,  1746004k free,   974040k cached



My cache is :

<filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
<queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="1024"/>
<documentCache class="solr.LRUCache" size="4096" initialSize="5" autowarmCount="4096"/>


My index are quite big:
[EMAIL PROTECTED]:/# ls -ll data/solr/book/data
drwxr-xr-x 2 tomcat55 nogroup 8192 Sep 25 10:08 index
drwxr-xr-x 2 root root4096 Sep 22 16:20 snapshot.20080922162058



What would you reckon ? 
Thanks a lot guys,


  




Re: Memory error - snapshooter help

2008-09-25 Thread sunnyfr

But according to you, how much should I increase it? 512MB?


Bill Au wrote:
> 
> The OS is checking that there is enough memory... add swap space:
> http://www.nabble.com/Not-enough-space-to11423199.html#a11432978
> 
> 
> 
> On Thu, Sep 25, 2008 at 4:20 AM, sunnyfr <[EMAIL PROTECTED]> wrote:
> 
>>
>> Hi,
>> Any idea ?
>>
>> Sep 25 06:50:41 solr-test jsvc.exec[23286]: Sep 25, 2008 6:50:41 AM
>> org.apache.solr.common.SolrException log SEVERE: java.io.IOException:
>> Cannot
>> run program "snapshooter": java.io.IOException: error=12, Cannot allocate
>> memory
>>
>> My memory for java is
>> JAVA_OPTS="-Xms6000m -Xmx6000m -XX:+UseParallelGC -XX:+AggressiveOpts
>> -XX:NewRatio=5 -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
>>
>> My memory is not that bad, I've 8Gb
>> [EMAIL PROTECTED]:/# free -m
>> total   used   free sharedbuffers cached
>> Mem:  7998   7095903  0  6950
>> -/+ buffers/cache:   6138   1860
>> Swap: 2000295   1705
>>
>> -
>> Mem:   8190864k total,  7266832k used,   924032k free, 7280k buffers
>> Swap:  2048248k total,   302244k used,  1746004k free,   974040k cached
>>
>>
>>
>> My cache is :
>> <filterCache class="solr.LRUCache"
>>   size="16384"
>>   initialSize="4096"
>>   autowarmCount="4096"/>
>>
>> <queryResultCache class="solr.LRUCache"
>>   size="16384"
>>   initialSize="4096"
>>   autowarmCount="1024"/>
>>
>> <documentCache class="solr.LRUCache"
>>   size="4096"
>>   initialSize="5"
>>   autowarmCount="4096"/>
>>
>> My index are quite big:
>> [EMAIL PROTECTED]:/# ls -ll data/solr/book/data
>> drwxr-xr-x 2 tomcat55 nogroup 8192 Sep 25 10:08 index
>> drwxr-xr-x 2 root root4096 Sep 22 16:20 snapshot.20080922162058
>>
>>
>>
>> What would you reckon ?
>> Thanks a lot guys,
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Memory-error---snapshooter-help-tp19665074p19665074.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Memory-error---snapshooter-help-tp19665074p19668769.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Memory error - snapshooter help

2008-09-25 Thread sunnyfr

Thanks a lot Mark, I will try that.


markrmiller wrote:
> 
> It is a mistake to give java 6gb out of 8gb available. First, when you 
> say 6gb, thats just the heap - the java process will use memory beyond 
> 6gb. That doesn't leave you with hardly any RAM for any other process. 
> And it leaves you almost *nothing* for the filesystem cache. 
> Effectively, you are starving yourself of memory. Lower the heap by a 
> lot - start low- giving java more ram than it needs is not beneficial. 
> So start low and monitor the heap usage using your container, jconsole, 
> or visualgc. If you don't pin the min and max, you will see how much the 
> jvm actually wants but what actually gets allocated (start with a low 
> min for this). Lower your max accordingly.
> 
> - Mark
> 
> 
> sunnyfr wrote:
>> Hi, 
>> Any idea ? 
>>
>> Sep 25 06:50:41 solr-test jsvc.exec[23286]: Sep 25, 2008 6:50:41 AM
>> org.apache.solr.common.SolrException log SEVERE: java.io.IOException:
>> Cannot
>> run program "snapshooter": java.io.IOException: error=12, Cannot allocate
>> memory 
>>
>> My memory for java is 
>> JAVA_OPTS="-Xms6000m -Xmx6000m -XX:+UseParallelGC -XX:+AggressiveOpts
>> -XX:NewRatio=5 -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
>>
>> My memory is not that bad, I've 8Gb
>> [EMAIL PROTECTED]:/# free -m
>>  total   used   free sharedbuffers cached
>> Mem:  7998   7095903  0  6950
>> -/+ buffers/cache:   6138   1860
>> Swap: 2000295   1705
>> -
>> Mem:   8190864k total,  7266832k used,   924032k free, 7280k buffers
>> Swap:  2048248k total,   302244k used,  1746004k free,   974040k cached
>>
>>
>>
>> My cache is :
>> <filterCache class="solr.LRUCache"
>>   size="16384"
>>   initialSize="4096"
>>   autowarmCount="4096"/>
>>
>> <queryResultCache class="solr.LRUCache"
>>   size="16384"
>>   initialSize="4096"
>>   autowarmCount="1024"/>
>>
>> <documentCache class="solr.LRUCache"
>>   size="4096"
>>   initialSize="5"
>>   autowarmCount="4096"/>
>>
>> My index are quite big:
>> [EMAIL PROTECTED]:/# ls -ll data/solr/book/data
>> drwxr-xr-x 2 tomcat55 nogroup 8192 Sep 25 10:08 index
>> drwxr-xr-x 2 root root4096 Sep 22 16:20 snapshot.20080922162058
>>
>>
>>
>> What would you reckon ? 
>> Thanks a lot guys,
>>
>>   
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Memory-error---snapshooter-help-tp19665074p19668799.html
Sent from the Solr - User mailing list archive at Nabble.com.



solr filesystem dependencies

2008-09-25 Thread Erlend Hamnaberg
Hi list.
I am using the EmbeddedSolrServer to embed solr in my application, however I
have run into a snag.

The only filesystem dependency that I want is the index itself.

The current implementation of the SolrResource seems to suggest that I need
a filesystem dependency to keep my configuration in.
I managed to work around this using the code below, but it feels kind of
wrong.


import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.SolrConfig;
import org.apache.solr.core.SolrCore;
import org.apache.solr.schema.IndexSchema;

// Load config and schema from the classpath instead of the filesystem.
SolrConfig config = new SolrConfig(null, null,
        getClass().getResourceAsStream(SOLR_CONFIG));
IndexSchema schema = new IndexSchema(config, null,
        getClass().getResourceAsStream(SOLR_SCHEMA));

CoreContainer coreContainer = new CoreContainer();

// Only the index directory itself lives on disk.
SolrCore core = new SolrCore("EMS", indexPath.getAbsolutePath(),
        config, schema, new CoreDescriptor(coreContainer, "EMS", SOLR_BASE));
coreContainer.register("EMS", core, false);
SolrServer solrServer = new EmbeddedSolrServer(coreContainer, "EMS");


Is there a recommended way of embedding the solr server?


Thanks

- Erlend


Re: Refresh of synonyms.txt without reload

2008-09-25 Thread Batzenmann

Hi again,


Walter Underwood wrote:
> 
> More details on index-time vs. query-time synonyms are here:
> 
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter
> 

Thanks for pointing that out; that's definitely something worth revising. But IMHO
the issue of a changing synonyms.txt remains, since the same FilterFactory
is used to create the filters for the index-time analyzer. So the question
remains: how do I make sure that, when I reindex documents affected by new
synonyms, the updated synonyms.txt is used, without having to restart Solr?

cheers, Axel
-- 
View this message in context: 
http://www.nabble.com/Refresh-of-synonyms.txt-without-reload-tp19629361p19669366.html
Sent from the Solr - User mailing list archive at Nabble.com.



Pre-processing text in custom FilterFactory / TokenizerFactory

2008-09-25 Thread Jaco
Hello,

I need to work with an external stemmer in Solr. This stemmer is accessible
as a COM object (running Solr in tomcat on Windows platform). I managed to
integrate this using the com4j library. I tried two scenarios:
1. Create a custom FilterFactory and Filter class for this. The external
stemmer is then invoked for every token
2. Create a custom TokenizerFactory, that invokes the external stemmer for
the entire search text, then puts the result of this into a StringReader,
and finally returns new WhitespaceTokenizer(stringReader), so the stemmed
text gets tokenized by the whitespace tokenizer.

Looking at search results, both scenarios appear to work from a functional
point of view. The first scenario however is too slow because of the
overhead of calling the external COM object for each token.

The second scenario is much faster, and also gives correct search results.
However, this then gives problems with highlighting - sometimes, errors are
reported (String out of Range), in other cases, I get incorrect highlight
fragments. Without knowing all details about this stuff, this makes sense
because of the change done to the text to be processed (I guess positions
get messed up then).  Maybe my second scenario is totally insane?

Any ideas on how to overcome this or any other suggestions on how to realise
this?

Cheers,

Jaco.

PS I posted this message yesterday, but it didn't come through, so this is
the 2nd try..
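
For what it's worth, a per-token filter along the lines of scenario 1 would
look roughly like this against the Lucene 2.x Token API that Solr 1.3 uses
(a sketch: ComStemmer is a hypothetical stand-in for the com4j wrapper, and
batching several tokens per COM call, instead of one call per token, is the
usual way to claw back the per-call overhead):

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;

public class ExternalStemFilterFactory extends BaseTokenFilterFactory {
    public TokenStream create(TokenStream input) {
        return new ExternalStemFilter(input, new ComStemmer());
    }
}

// Stems each token in place; start/end offsets are left untouched,
// so highlighting keeps working.
final class ExternalStemFilter extends TokenFilter {
    private final ComStemmer stemmer;

    ExternalStemFilter(TokenStream input, ComStemmer stemmer) {
        super(input);
        this.stemmer = stemmer;
    }

    public Token next(final Token reusableToken) throws IOException {
        Token token = input.next(reusableToken);
        if (token == null) return null;
        token.setTermBuffer(stemmer.stem(token.term()));
        return token;
    }
}

// Hypothetical wrapper around the external COM stemmer (via com4j).
class ComStemmer {
    String stem(String term) {
        return term; // replace with the actual COM call
    }
}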


NullPointerException

2008-09-25 Thread Dinesh Gupta

Hi All,

I have attached my file.

I am getting exception.

Please suggest how I can sort out this issue.



WARNING: Error creating document : SolrInputDocumnt[{id=id(1.0)={93146}, 
ttl=ttl(1.0)={Majestic from Pushpams.com}, cdt=cdt(1.0)={2001-09-04 
15:40:40.0}, mdt=mdt(1.0)={2008-09-23 17:47:44.0}, prc=prc(1.0)={600.00}}]
java.lang.NullPointerException
at org.apache.lucene.document.Document.getField(Document.java:140)
at 
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:283)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
at 
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:288)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at 
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:190)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at 
org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
at 
org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
at 
org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)

_
Search for videos of Bollywood, Hollywood, Mollywood and every other wood, only 
on Live.com 
http://www.live.com/?scope=video&form=MICOAL

[Attachment: schema.xml and solrconfig.xml, with all XML markup stripped by the
archive. Only scattered values survive: the uniqueKey field id, dataDir
D:/dev/solr/data, index defaults such as mergeFactor 10 and maxBufferedDocs
1000, cache and newSearcher warming sections, standard and dismax request
handlers with boosts like "text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0
manu^1.1 cat^1.4" and mm "2<-1 5<-2 6<90%", sample facet fields (cat,
manu_exact, price ranges), and a spellchecker configured on field spell.]

Re: most searched keyword in solr

2008-09-25 Thread Jon Baer

Why even do any of the work :-)

I'm not sure any of the free analytics apps (a la Google) can, but the
paid ones do; just drop the query into one of those and let them
analyze...


http://www.google.com/analytics/

Then just parse the reports.

- Jon

On Sep 25, 2008, at 8:39 AM, Mark Miller wrote:


sanraj25 wrote:

hi,
how will we find most searched keyword in solr?
If anybody can suggest us a good solution, it would be helpful
thank you

with  Regards,
P.Parkavi


Write some code to record every query/keyword. Could be done at  
different places depending on how you define 'keyword' compared to  
how things are tokenized.


Or, you should also be able to parse the solr logs and extract query  
information and figure it out based on that.


Or...? Havn't seen any code to help with this out there, but maybe  
there is some?


- Mark




Re: Standard analyzer and acronyms

2008-09-25 Thread Luca Molteni
The schema browser is a section in the admin panel of Solr. I don't know if
I'm looking at original values; I think there are only filtered values in
there.

Thank you for the reply.

Bye

L.M.


2008/9/22 Otis Gospodnetic <[EMAIL PROTECTED]>

> Hi,
>
> Are you sure you are not looking at the original field values? (what is the
> schema browser are you referring to?)
> Yes, tokenizer + filters are applied in the order they are defined in, so
> the order is important.  For example, you typically want to lower-case
> tokens before removing stop words because, presumably, your stop words are
> all lower-case.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
> > From: Luca Molteni <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Monday, September 22, 2008 4:43:43 AM
> > Subject: Standard analyzer and acronyms
> >
> > Hello, list.
> >
> > I found some strange results using the standard analyzer.
> >
> > I've put it in both query and index time,  but when I use the schema
> browser
> > to see the commond values for field, i find:
> >
> > spa 1558, s.p.a. 833
> > Which is pretty strange, since I've used the analyzer to remove the dots
> > from the acronyms.
> >
> > My hypothesis is that the StandardAnalyzer remove dots from only the
> > uppercase acronyms.
> >
> > Can anyone confirm this to me?
> >
> > Regarding this, I was wondering if the filter and the tokenizers are
> applied
> > sequencely using the order in which they are written.
> > For example, if I use the StandardAnalyzer, the StopFilter for the words
> > "IBM" and the whitespace tokenizer
> >
> > "I.B.M Company"
> >
> > 1. The standard removes the dot
> >
> > "IBM Company"
> >
> > 2. The stopfilter removes the word "IBM"
> >
> > "Company"
> >
> > 3. The analyzer returns only one token
> >
> > "Company".
> >
> > I know, this is not a great example, but I think that not all the
> analyzer
> > are commutative, then there should be an order in which they are applied.
> >
> > Thank you very much.
> >
> > L.M.
>
>


Re: most searched keyword in solr

2008-09-25 Thread Walter Underwood
I process our HTTP logs. I'm sure there are log analyzers that
handle search terms, though I wrote a bit of Python to do it.
If you extract the search queries to a file, then use a Unix
pipe to get a list:

  sort < queries.txt | uniq -c | sort -rn > counted-queries.txt

wunder

On 9/25/08 12:29 AM, "sanraj25" <[EMAIL PROTECTED]> wrote:

> 
> hi,
> how will we find most searched keyword in solr?
> If anybody can suggest us a good solution, it would be helpful
> thank you
> 
> with  Regards,
> P.Parkavi



Re: Refresh of synonyms.txt without reload

2008-09-25 Thread Walter Underwood
First, define separate analyzer/filter chains for index and query.
Do not include synonyms in the query chain.

Second, use a separate indexing system and use Solr index distribution
to sync the indexes to one or more query systems. This will create a new
Searcher and caches on the query systems, but it is less drastic than
a restart.

wunder


On 9/25/08 6:19 AM, "Batzenmann" <[EMAIL PROTECTED]> wrote:

> thx, for pointing that out - That's definitely s.th. worth revising. Butimho
> the issue of a changing synonyms.txt remainse, since the same FilterFactory
> is used to create the Filters for the index-time analyzer - so the question
> remains: How do I take care that if I reindex documents affected by new
> synonyms with the updated synonyms.txt without having to restart solr ?



snappuller not fired

2008-09-25 Thread sunnyfr

Hi everybody,
Any idea why? Might it be the path?

Conf file:

<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir"></str>
  <bool name="wait">true</bool>
  <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
  <arr name="env"> <str>MYVAR=val1</str> </arr>
</listener>

<listener event="postOptimize" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir"></str>
  <bool name="wait">true</bool>
</listener>

scripts.conf:
user=root
solr_hostname=localhost
solr_port=8180
rsyncd_port=18180
data_dir=/data/solr/book/data
webapp_name=solr/book
master_host=10.97.1.151
master_data_dir=/data/solr/book/data
master_status_dir=/data/solr/book/logs

thanks a lot,
Sunny 

-- 
View this message in context: 
http://www.nabble.com/snappuller-not-fired-tp19671251p19671251.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Refresh of synonyms.txt without reload

2008-09-25 Thread Batzenmann


Walter Underwood wrote:
> 
> First, define separate analyzer/filter chains for index and query.
> Do not include synonyms in the query chain.
> 
> Second, use a separate indexing system and use Solr index distribution
> to sync the indexes to one or more query systems. This will create a new
> Searcher and caches on the query systems, but it is less drastic than
> a restart.
> 

We already have separate analyzers for index/query-time as well as index
distribution for master/slave searchers in place.

Synonyms can be maintained via a web app by our customer. Assuming you
suggest using Solr as the 'separate indexing system', I'll run into the same
issue; I still think restarting the app every time someone
adds/removes/alters a synonym is not a desirable solution.

Axel
-- 
View this message in context: 
http://www.nabble.com/Refresh-of-synonyms.txt-without-reload-tp19629361p19671877.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Refresh of synonyms.txt without reload

2008-09-25 Thread Otis Gospodnetic
Depending on how often synonyms are added, you may or may not want to make
Solr reload your synonyms. If you use index-time synonyms, you definitely don't
want to reindex every time they change if they change more frequently than the
time it takes to reindex.
I believe you can use new MultiCore methods for loading/reloading cores to get 
synonyms to reload.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Batzenmann <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, September 25, 2008 11:24:02 AM
> Subject: Re: Refresh of synonyms.txt without reload
> 
> 
> 
> Walter Underwood wrote:
> > 
> > First, define separate analyzer/filter chains for index and query.
> > Do not include synonyms in the query chain.
> > 
> > Second, use a separate indexing system and use Solr index distribution
> > to sync the indexes to one or more query systems. This will create a new
> > Searcher and caches on the query systems, but it is less drastic than
> > a restart.
> > 
> 
> We already have separate analyzers for index/query-time as well as index
> distribution for master/slave searchers in place.
> 
> Synonyms can be maintained via a web-app by our customer. Assuming you
> suggest to use solr as 'separate indexing system', I'll run into the same
> issue, where I still think restarting the app every time s.o.
> adds/removes/alters a synonym is not an desirable solution.
> 
> Axel
> -- 
> View this message in context: 
> http://www.nabble.com/Refresh-of-synonyms.txt-without-reload-tp19629361p19671877.html
> Sent from the Solr - User mailing list archive at Nabble.com.
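
The multicore reload Otis mentions is a single call to the CoreAdminHandler,
roughly (a sketch; assumes a multicore setup with a core named core0 in
solr.xml):

http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0

The reloaded core re-reads synonyms.txt and is swapped in without restarting
the container.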



Re: java.io.IOException: cannot read directory org.apache.lucene.store.FSDirectory@/home/solr/src/apache-solr-nightly/example/solr/data/index: list() returned null

2008-09-25 Thread Erik Holstad
Ran some more tests and when I'm only using 25000 I get (the same stack trace
appears twice; the first copy is truncated at the top):

Lock obtain timed out: SingleInstanceLock: write.lock --
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:
SingleInstanceLock: write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:85)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1140)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:116)
    at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
    at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:167)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:221)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
    at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
    at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:13...

request: http://ss0:8983/solr/update?wt=javabin&version=2.2
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:343)

at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
at 
org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)

at SolrTasks.insertSetupEnd(SolrTasks.java:176)
at SolrTasks.insert(SolrTasks.java:158)
at SolrImportMR.map(SolrImportMR.java:81)
at org.apache.hadoop.hbase.mapred.TableMap.map(TableMap.java:42)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
at 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)



This is when using 6 mappers as input to Solr.

So I lowered the number of mappers to 3, and then everything worked, but this
was not an optimal solution. What we ended up doing was to send all the
mappers to one reducer, which did the commit for all the mappers, and this
seems to work fine even for more than 3 mappers.

Regards Erik






On Wed, Sep 24, 2008 at 2:36 PM, Erik Holstad <[EMAIL PROTECTED]> wrote:

> That is exactly what we are doing now added all the documents to the server
> in the Map phase of the job and send them all to on reducer, which commits
> them all.
> Seems to be working.
>
> Thanks Erik
>
>
> On Wed, Sep 24, 2008 at 2:27 PM, Otis Gospodnetic <
> [EMAIL PROTECTED]> wrote:
>
>> Erik,
>> There is little benefit from having mor

RE: Shingles , min size?

2008-09-25 Thread Steven A Rowe
Hi Norberto,

ShingleMatrixFilter is capable of this, but ShingleFilter is not.  It should be 
though - I think if ShingleFilter continues to exist, it should learn a few 
things from ShingleMatrixFilter's one-dimensional functionality.

Steve

On 09/25/2008 at 2:23 AM, Norberto Meijome wrote:
> hi guys,
> I may have missed it ,but is it possible to tell the
> solr.ShingleFilterFactory the minimum number of grams to
> generate per shingle?  Similar to NGramTokenizerFactory's
> minGramSize="3" maxGramSize="3"
> 
> thanks!
> B
> _
> {Beto|Norberto|Numard} Meijome
> 
> "Ask not what's inside your head, but what your head's inside of."
>J. J. Gibson
> 
> I speak for myself, not my employer. Contents may be hot.
> Slippery when wet. Reading disclaimers makes you go blind.
> Writing them is worse. You have been Warned.
>

 



Best practice advice needed!

2008-09-25 Thread sundar shankar
Hi,
  We have an index of courses (about 4 million docs in prod), and we have a
nightly job that picks up newly added courses and updates the index
accordingly. There is another enterprise system that shares the same table and
could delete data from the table too.

I just want to know the best practice for finding deleted records and removing
them from my index. Unfortunately for us, we don't maintain a history of the
deleted records, and that's a big bane.

Please do advice on what might be the best way to implement this?

-Sundar

_
Movies, sports & news! Get your daily entertainment fix, only on live.com
http://www.live.com/?scope=video&form=MICOAL

Re: Best practice advice needed!

2008-09-25 Thread Fuad Efendi
I am guessing your Enterprise system deletes/updates tables in RDBMS,  
and your SOLR indexes that data. Additionally to that, you have  
front-end interacting with SOLR and with RDBMS. At front-end level, in  
case of a search sent to SOLR returning primary keys for data, you may  
check your database using primary keys returned by SOLR before  
committing output to end users.


To remove records from an index... best-by performance is to have  
Master-Slave SOLR instances, remove data from Master SOLR, and  
commit/synchronize with Slave nightly (when traffic is lowest). SOLR  
won't be in-sync with database, but you can always retrieve PKs from  
SOLR, check database for those PKs, and 'filter' output...


--
Thanks,

Fuad Efendi
416-993-2060(cell)
Tokenizer Inc.
==
http://www.linkedin.com/in/liferay


Quoting sundar shankar <[EMAIL PROTECTED]>:


Hi,
  We have an index of courses (about 4 million docs in prod) and  
 we have a nightly that would pick up newly added courses and update  
 the index accordingly. There is another Enterprise system that   
shares the same table and that could delete data from the table too.


I just want to know what would be the best practice to find out   
deleted records and remove it from my index. Unfortunately for us,   
we dont maintain a history of the deleted records and thats a big   
bane.


Please do advice on what might be the best way to implement this?

-Sundar

_
Movies, sports & news! Get your daily entertainment fix, only on live.com
http://www.live.com/?scope=video&form=MICOAL






Re: Best practice advice needed!

2008-09-25 Thread Erick Erickson
How long does it take to build the entire index? Can you just rebuild it
from scratch every night? That would be the simplest.

Best
Erick

On Thu, Sep 25, 2008 at 12:48 PM, sundar shankar
<[EMAIL PROTECTED]>wrote:

> Hi,
>  We have an index of courses (about 4 million docs in prod) and we have
> a nightly that would pick up newly added courses and update the index
> accordingly. There is another Enterprise system that shares the same table
> and that could delete data from the table too.
>
> I just want to know what would be the best practice to find out deleted
> records and remove it from my index. Unfortunately for us, we dont maintain
> a history of the deleted records and thats a big bane.
>
> Please do advice on what might be the best way to implement this?
>
> -Sundar
>
> _
> Movies, sports & news! Get your daily entertainment fix, only on live.com
> http://www.live.com/?scope=video&form=MICOAL


spellcheck: buildOnOptimize?

2008-09-25 Thread Jason Rennie
I see that there's an option to automatically rebuild the spelling index on
a commit.  That's a nice feature that we'll consider using, but we run
commits every few thousand document updates, which would yield ~100 spelling
index rebuilds a day.  OTOH, we run an optimize about once/day which seems
like a more appropriate schedule for rebuilding the spelling index.

Is there or could there be an option to rebuild the spelling index on
optimize?

Thanks,

Jason
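
For reference, the commit-time option Jason refers to sits on the component
config, roughly like this (a sketch from the Solr 1.3 example; the field and
spellchecker names are illustrative):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

A buildOnOptimize switch alongside buildOnCommit is what is being asked for.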


Re: Best practice advice needed!

2008-09-25 Thread Walter Underwood
This will cause the result counts to be wrong and the "deleted" docs
will stay in the search index forever.

Some approaches for incremental update:

* full sweep garbage collection: fetch every ID in the Solr DB and
check whether that exists in the source DB, then delete the ones
that don't exist.

* mark for deletion: change the DB to leave the record but flag it
as deleted in a boolean row, then delete from Solr all deleted
items in the source DB. The items marked for deletion can be
deleted from the source DB at a later time.

* indexer scratchpad DB: a database used by the indexing code which
shows all the IDs currently in the index, usually with a last modified
time. This is similar to the full sweep, but may be much faster with
a dedicated DB. This can get arbitrarily fancy. Web spiders work like this.

wunder

On 9/25/08 10:08 AM, "Fuad Efendi" <[EMAIL PROTECTED]> wrote:

> I am guessing your Enterprise system deletes/updates tables in RDBMS,
> and your SOLR indexes that data. Additionally to that, you have
> front-end interacting with SOLR and with RDBMS. At front-end level, in
> case of a search sent to SOLR returning primary keys for data, you may
> check your database using primary keys returned by SOLR before
> committing output to end users.
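
The full-sweep pass above fits in a small SolrJ program; a minimal sketch,
assuming a uniqueKey field named id and with existsInSource() standing in for
your own lookup against the source database:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

// Walk every id in the index and delete the ones missing from the source DB.
public class FullSweep {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        final int rows = 1000;
        int start = 0;
        while (true) {
            SolrQuery q = new SolrQuery("*:*");
            q.setFields("id");
            q.setStart(start);
            q.setRows(rows);
            SolrDocumentList page = solr.query(q).getResults();
            for (SolrDocument doc : page) {
                String id = (String) doc.getFieldValue("id");
                if (!existsInSource(id)) {
                    solr.deleteById(id); // not visible until the commit below
                }
            }
            start += rows;
            if (start >= page.getNumFound()) break;
        }
        solr.commit();
    }

    // Placeholder: e.g. SELECT 1 FROM courses WHERE id = ?
    private static boolean existsInSource(String id) {
        return true;
    }
}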



Re: Best practice advice needed!

2008-09-25 Thread Walter Underwood
That should be "flag it in a boolean column". --wunder


On 9/25/08 11:51 AM, "Walter Underwood" <[EMAIL PROTECTED]> wrote:

> This will cause the result counts to be wrong and the "deleted" docs
> will stay in the search index forever.
> 
> Some approaches for incremental update:
> 
> * full sweep garbage collection: fetch every ID in the Solr DB and
> check whether that exists in the source DB, then delete the ones
> that don't exist.
> 
> * mark for deletion: change the DB to leave the record but flag it
> as deleted in a boolean row, then delete from Solr all deleted
> items in the source DB. The items marked for deletion can be
> deleted from the source DB at a later time.
> 
> * indexer scratchpad DB: a database used by the indexing code which
> shows all the IDs currently in the index, usually with a last modified
> time. This is similar to the full sweep, but may be much faster with
> a dedicated DB. This can get arbitrarily fancy. Web spiders work like this.
> 
> wunder
> 
> On 9/25/08 10:08 AM, "Fuad Efendi" <[EMAIL PROTECTED]> wrote:
> 
>> I am guessing your Enterprise system deletes/updates tables in RDBMS,
>> and your SOLR indexes that data. Additionally to that, you have
>> front-end interacting with SOLR and with RDBMS. At front-end level, in
>> case of a search sent to SOLR returning primary keys for data, you may
>> check your database using primary keys returned by SOLR before
>> committing output to end users.
> 



RE: Best practice advice needed!

2008-09-25 Thread sundar shankar
Great Thanks. 



> Date: Thu, 25 Sep 2008 11:54:32 -0700
> Subject: Re: Best practice advice needed!
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> 
> That should be "flag it in a boolean column". --wunder
> 
> 
> On 9/25/08 11:51 AM, "Walter Underwood" <[EMAIL PROTECTED]> wrote:
> 
> > This will cause the result counts to be wrong and the "deleted" docs
> > will stay in the search index forever.
> > 
> > Some approaches for incremental update:
> > 
> > * full sweep garbage collection: fetch every ID in the Solr DB and
> > check whether that exists in the source DB, then delete the ones
> > that don't exist.
> > 
> > * mark for deletion: change the DB to leave the record but flag it
> > as deleted in a boolean row, then delete from Solr all deleted
> > items in the source DB. The items marked for deletion can be
> > deleted from the source DB at a later time.
> > 
> > * indexer scratchpad DB: a database used by the indexing code which
> > shows all the IDs currently in the index, usually with a last modified
> > time. This is similar to the full sweep, but may be much faster with
> > a dedicated DB. This can get arbitrarily fancy. Web spiders work like this.
> > 
> > wunder
> > 
> > On 9/25/08 10:08 AM, "Fuad Efendi" <[EMAIL PROTECTED]> wrote:
> > 
> >> I am guessing your Enterprise system deletes/updates tables in RDBMS,
> >> and your SOLR indexes that data. Additionally to that, you have
> >> front-end interacting with SOLR and with RDBMS. At front-end level, in
> >> case of a search sent to SOLR returning primary keys for data, you may
> >> check your database using primary keys returned by SOLR before
> >> committing output to end users.
> > 
> 

_
Searching for the best deals on travel? Visit MSN Travel.
http://in.msn.com/coxandkings

Re: spellcheck: buildOnOptimize?

2008-09-25 Thread Grant Ingersoll


On Sep 25, 2008, at 2:17 PM, Jason Rennie wrote:

I see that there's an option to automatically rebuild the spelling index on
a commit.  That's a nice feature that we'll consider using, but we run
commits every few thousand document updates, which would yield ~100 spelling
index rebuilds a day.  OTOH, we run an optimize about once/day which seems
like a more appropriate schedule for rebuilding the spelling index.

Is there or could there be an option to rebuild the spelling index on
optimize?


Seems reasonable. You could almost do it via the postOptimize callback
already in the config, except that the SpellCheckComponent's EventListener
is private static and has an empty postCommit implementation (which is
what is called after optimization, since an optimize is just like a commit
in many ways).

Thus, a patch would be needed.
 


Re: spellcheck: buildOnOptimize?

2008-09-25 Thread Shalin Shekhar Mangar
On Fri, Sep 26, 2008 at 12:43 AM, Grant Ingersoll <[EMAIL PROTECTED]>wrote:

>
> On Sep 25, 2008, at 2:17 PM, Jason Rennie wrote:
>
>  I see that there's an option to automatically rebuild the spelling index
>> on
>> a commit.  That's a nice feature that we'll consider using, but we run
>> commits every few thousand document updates, which would yield ~100
>> spelling
>> index rebuilds a day.  OTOH, we run an optimize about once/day which seems
>> like a more appropriate schedule for rebuilding the spelling index.
>>
>> Is there or could there be an option to rebuild the spelling index on
>> optimize?
>>
>
> Seems reasonable, could almost do it via the postOptimize call back already
> in the config, except the SpellCheckComponent's EvenListener is private
> static and has an empty postCommit implementation (which is what is called
> after optimization, since it is just like a commit in many ways)
>
> Thus, a patch would be needed.
>

postCommit/postOptimize callbacks happen after commit/optimize but before a
new searcher is opened. Therefore, it is not possible to re-build the spellcheck
index on those events without opening an IndexReader directly on the solr
index. That is why the event listener in SpellCheckComponent uses the
newSearcher listener to build on commits.

I don't think there is anything in the API currently to do what Jason wants.

-- 
Regards,
Shalin Shekhar Mangar.


Re: How to select one entity at a time?

2008-09-25 Thread Shalin Shekhar Mangar
On Thu, Sep 25, 2008 at 6:13 PM, con <[EMAIL PROTECTED]> wrote:

>
> Hi
> I have got two entities in my data-config.xml file, entity1 and entity2.
> For condition-A I need to execute only entity1 and for condition-B only the
> entity2 needs to get executed.
> How can I mention it while accessing the search index in the REST way.
> Is there any option that i can give along with this query:
>
> http://localhost:8983/solr/select/?q=physics&version=2.2&start=0&rows=10&indent=on&wt=json
>

I suppose that you are using multiple root entities and the solr document
contains some field which tells us the entity it came from.

If yes, you can use filter queries (fq parameter) to filter on those fields.
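
For example (assuming a field named entity that holds entity1 or entity2):

http://localhost:8983/solr/select/?q=physics&fq=entity:entity1&version=2.2&start=0&rows=10&indent=on&wt=json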

-- 
Regards,
Shalin Shekhar Mangar.


Re: snappuller not fired

2008-09-25 Thread Shalin Shekhar Mangar
I think you have asked the question before too. I have the same answer, try
giving the full (absolute) path to snapshooter and the bin directory. Check
the logs to see if there are any errors.
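
With absolute paths, the listener would look roughly like this (a sketch;
/data/solr/book/bin is only a guess based on the data_dir below, so substitute
the directory that actually holds the scripts):

<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">/data/solr/book/bin/snapshooter</str>
  <str name="dir">/data/solr/book/bin</str>
  <bool name="wait">true</bool>
</listener>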

On Thu, Sep 25, 2008 at 8:24 PM, sunnyfr <[EMAIL PROTECTED]> wrote:

>
> Hi everybody,
> Any idea why, it might be the path ??
>
> Conf file :
>
> <listener event="postCommit" class="solr.RunExecutableListener">
>   <str name="exe">snapshooter</str>
>   <str name="dir"></str>
>   <bool name="wait">true</bool>
>   <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
>   <arr name="env"> <str>MYVAR=val1</str> </arr>
> </listener>
>
> <listener event="postOptimize" class="solr.RunExecutableListener">
>   <str name="exe">snapshooter</str>
>   <str name="dir"></str>
>   <bool name="wait">true</bool>
> </listener>
>
> Scripts.php :
> user=root
> solr_hostname=localhost
> solr_port=8180
> rsyncd_port=18180
> data_dir=/data/solr/book/data
> webapp_name=solr/book
> master_host=10.97.1.151
> master_data_dir=/data/solr/book/data
> master_status_dir=/data/solr/book/logs
>
> thanks a lot,
> Sunny
>
> --
> View this message in context:
> http://www.nabble.com/snappuller-not-fired-tp19671251p19671251.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Shalin Shekhar Mangar.


RE: Best practice advice needed!

2008-09-25 Thread sundar shankar
Hi Fuad,
Since I don't have too much data (4 million docs), I don't have a master/slave
setup yet. How big a change would that be?



> Date: Thu, 25 Sep 2008 10:08:51 -0700
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Re: Best practice advice needed!
> 
> I am guessing your Enterprise system deletes/updates tables in RDBMS,  
> and your SOLR indexes that data. Additionally to that, you have  
> front-end interacting with SOLR and with RDBMS. At front-end level, in  
> case of a search sent to SOLR returning primary keys for data, you may  
> check your database using primary keys returned by SOLR before  
> committing output to end users.
> 
> To remove records from an index... best-by performance is to have  
> Master-Slave SOLR instances, remove data from Master SOLR, and  
> commit/synchronize with Slave nightly (when traffic is lowest). SOLR  
> won't be in-sync with database, but you can always retrieve PKs from  
> SOLR, check database for those PKs, and 'filter' output...
> 
> -- 
> Thanks,
> 
> Fuad Efendi
> 416-993-2060(cell)
> Tokenizer Inc.
> ==
> http://www.linkedin.com/in/liferay
> 
> 
> Quoting sundar shankar <[EMAIL PROTECTED]>:
> 
> > Hi,
> >   We have an index of courses (about 4 million docs in prod) and  
> >  we have a nightly that would pick up newly added courses and update  
> >  the index accordingly. There is another Enterprise system that   
> > shares the same table and that could delete data from the table too.
> >
> > I just want to know what would be the best practice to find out   
> > deleted records and remove it from my index. Unfortunately for us,   
> > we dont maintain a history of the deleted records and thats a big   
> > bane.
> >
> > Please do advice on what might be the best way to implement this?
> >
> > -Sundar
> >
> > _
> > Movies, sports & news! Get your daily entertainment fix, only on live.com
> > http://www.live.com/?scope=video&form=MICOAL
> 
> 
> 

_
Movies, sports & news! Get your daily entertainment fix, only on live.com
http://www.live.com/?scope=video&form=MICOAL

Re: NullPointerException

2008-09-25 Thread Shalin Shekhar Mangar
I'm not sure why the NullPointerException is occurring. Is that the whole
stack trace?

The mdt and cdt fields are dates in schema.xml, but the format in the log is
wrong. Look at the DateFormatTransformer in DataImportHandler, which can
format strings from your database to the correct date format needed for Solr.
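
A DateFormatTransformer hookup in data-config.xml looks roughly like this (a
sketch; the entity name and query are placeholders, while the column names and
pattern come from the log above):

<entity name="item" transformer="DateFormatTransformer" query="...">
  <field column="cdt" dateTimeFormat="yyyy-MM-dd HH:mm:ss"/>
  <field column="mdt" dateTimeFormat="yyyy-MM-dd HH:mm:ss"/>
</entity>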

On Thu, Sep 25, 2008 at 7:09 PM, Dinesh Gupta <[EMAIL PROTECTED]>wrote:

>  Hi All,
>
> I have attached my file.
>
> I am getting exception.
>
> Please suggest me how to short-out this issue.
>
>
>
> WARNING: Error creating document : SolrInputDocumnt[{id=id(1.0)={93146},
> ttl=ttl(1.0)={Majestic from Pushpams.com}, cdt=cdt(1.0)={2001-09-04
> 15:40:40.0}, mdt=mdt(1.0)={2008-09-23 17:47:44.0}, prc=prc(1.0)={600.00}}]
> java.lang.NullPointerException
> at org.apache.lucene.document.Document.getField(Document.java:140)
> at
> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:283)
> at
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
> at
> org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
> at
> org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:288)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
> at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
> at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
> at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
> at
> org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:190)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
> at
> org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
> at
> org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
> at
> org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)
>
> --
> MSN Technology brings you the latest on gadgets, gizmos and the new hits in
> the gaming market. Try it now! 
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Best practice advice needed!

2008-09-25 Thread Fuad Efendi
About web spiders: I simply use a "last modified timestamp" field in
SOLR, and I expire items after 30 days. If an item was updated (timestamp
changed), it won't be deleted. If I delete it from the database, it will be
deleted from SOLR within 30 days. Spiders don't need
'transactional' updates.

Recently I moved to HBase from MySQL. Its "row::column" structure is a
physically sorted, column-oriented structure. SOLR lazily follows
database updates; it's a very specific case...



Quoting Walter Underwood <[EMAIL PROTECTED]>:


That should be "flag it in a boolean column". --wunder


On 9/25/08 11:51 AM, "Walter Underwood" <[EMAIL PROTECTED]> wrote:


This will cause the result counts to be wrong and the "deleted" docs
will stay in the search index forever.

Some approaches for incremental update:

* full sweep garbage collection: fetch every ID in the Solr DB and
check whether that exists in the source DB, then delete the ones
that don't exist.

* mark for deletion: change the DB to leave the record but flag it
as deleted in a boolean row, then delete from Solr all deleted
items in the source DB. The items marked for deletion can be
deleted from the source DB at a later time.

* indexer scratchpad DB: a database used by the indexing code which
shows all the IDs currently in the index, usually with a last modified
time. This is similar to the full sweep, but may be much faster with
a dedicated DB. This can get arbitrarily fancy. Web spiders work like this.

wunder

On 9/25/08 10:08 AM, "Fuad Efendi" <[EMAIL PROTECTED]> wrote:


I am guessing your Enterprise system deletes/updates tables in RDBMS,
and your SOLR indexes that data. Additionally to that, you have
front-end interacting with SOLR and with RDBMS. At front-end level, in
case of a search sent to SOLR returning primary keys for data, you may
check your database using primary keys returned by SOLR before
committing output to end users.











Bunch of questions regarding enterprise configuration

2008-09-25 Thread Dev Team
Hi everybody,

I'm new to Solr, and have been reading through documentation off-and-on for
days, but still have some unanswered basic/fundamental questions that have a
huge impact on my implementation approach.
I am thinking of moving my company's web app's main search engine over to
Solr. My goal is to index 5M user records of a social networking website
(most of which have a free-form text portion, but the majority of the data
is non-text) and have complex searches against those records come back in
the sub-0.5s range. I have just under 10 application servers each running
my web-app, which is mostly stateless except for things like users'
online status.
Forgive me for asking so many in one email; feel free to change subject line
and reply about individual items. Here's the questions:

1. How to best organize a web-app that normally goes to a search-db to use
Solr instead?
a) Set up independent Solr instance, make app search it just like it used to
search database.
b) Integrate Solr right into app, so that app+solr get deployed together
(this is very possible, as our app is Java). But we run  several instances
of the app so we'd be running several Solr instances too.
c) Set up independent Solr instance + our code (plugins or whatever?), have
web clients request DIRECTLY to the Solr app and have  Solr return search
results directly.
d) Other configuration...?

2. How to best handle Enums?
We have a bunch of enumerated data (say, for example, shoe types). What
"fieldType" should we use to index them?
Should I index them as text? If somebody searches for the keyword "sandals",
I'd want the documents that have shoeType=Sandals (e.g., an enum value of
"07") to show up.

3. Enums are related, sort-of:
Sometimes our enumerated data is somewhat related. For example (in the "shoe
types" example), let's say we have "sandals", well,  "crocs" are not
sandals, but are sort-of like sandals, so we'd like them to match but score
lower than an exact sandal match. How do  we do this? (Is this "Changing
Similarity" or is that barking up the wrong tree?)

4. How to manage "Tags" data?
Users on my site can enter "tags", and we want to be able to build
tag-clouds, follow tag-links, and whatnot. Should I index tags as just a
fieldType of "text"?

5. How do I load the data?
Loading all the data from the database (to anything!) takes a big chunk of
time. Should I export it from the database once and then load it into Solr
using CSV?
Follow-up: How would I manage loading this/new data on an ongoing basis? The
site's users are creating data all the time, the bulk of  which is old (i.e.
before today; could be bulk loaded), but after an initial bulk load it's
ongoing data. Should I be just building  a huge Solr index on the filesystem
and making sure I don't lose it?

6. How do I manage real-time data?
For example, let's say I have users coming online and offline all the time,
and I need to be able to search my set of "online  users". How should I go
about this? Can this just be handled through index updates?

I'd appreciate any advice.

Sincerely,

Daryl.


How to get count of different groups of items in a single query

2008-09-25 Thread Choi, David
Hi everyone,  I tried looking in the mailing list archive, but couldn't find a 
good answer for what I'm trying to do.

Say I have an index of data about cars.  I want to search for all red cars, so 
I do something like: q=colour:red.  This returns 100 results, of which 40 are 
"model:Toyota", 30 are "model:Chrysler", and 30 are "model:Ford".  How can I 
get the count of each type of car, without doing 3 separate queries?

Thanks in advance!
- David Choi



Re: How to get count of different groups of items in a single query

2008-09-25 Thread Bess Sadler

Hi, David.

In this case it looks like you're looking for the faceting  
functionality. You can read more about this on the wiki, here:

http://wiki.apache.org/solr/SimpleFacetParameters?highlight=%28facet%29

In your case, you're going to want something like:

http://yoursolr.com/solr/select?q=colour:red&facet=true&facet.field=model

(You can also add a &rows=0 if you want to just see the facets, while  
you're trying to get things working.)
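
For SolrJ users, the same request might look roughly like this (a sketch;
the URL and field names simply mirror the example above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ModelCounts {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://yoursolr.com/solr");
        SolrQuery q = new SolrQuery("colour:red");
        q.setFacet(true);
        q.addFacetField("model");
        q.setRows(0); // only the counts, not the documents
        QueryResponse rsp = solr.query(q);
        for (FacetField.Count c : rsp.getFacetField("model").getValues()) {
            System.out.println(c.getName() + ": " + c.getCount()); // e.g. Toyota: 40
        }
    }
}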


Hope this helps!

Bess


On 25-Sep-08, at 7:09 PM, Choi, David wrote:

Hi everyone,  I tried looking in the mailing list archive, but  
couldn't find a good answer for what I'm trying to do.


Say I have an index of data about cars.  I want to search for all  
red cars, so I do something like: q=colour:red.  This returns 100  
results, of which 40 are "model:Toyota", 30 are "model:Chrysler",  
and 30 are "model:Ford".  How can I get the count of each type of  
car, without doing 3 separate queries?


Thanks in advance!
- David Choi





Searching Question

2008-09-25 Thread Jake Conk
Hello,

We are using Solr for our new forums search feature. If possible, when
searching for the word "Halo", we would like threads that contain the
word "Halo" the most, relative to the number of posts in the thread, to
have a higher score.

For instance, if we have a thread with 10 posts and the word "Halo"
shows up 5 times then that should have a lower score than a thread
that has the word "Halo" 3 times within its posts and has 5 replies.
Basically the thread that shows the search string most frequently
amongst the number of posts in the thread should be the one with the
highest score.

Is something like this possible?

Thanks,

- JC


why index auto change?

2008-09-25 Thread 李学健
hi, all

recently, I have encountered this problem several times.
the index in solr automatically changes: I never posted any data today, but
part of the index files look as below:

-rw-r--r-- 1 root root 3202872 Sep 24 22:23 _f.prx
-rw-r--r-- 1 root root 14595 Sep 24 22:23 _f.tii
-rw-r--r-- 1 root root 1072202 Sep 24 22:23 _f.tis
-rw-r--r-- 1 root root 95 Sep 26 10:43 _g.fnm
-rw-r--r-- 1 root root 2220768 Sep 26 10:43 _g.frq
-rw-r--r-- 1 root root 46229 Sep 26 10:43 _g.nrm
-rw-r--r-- 1 root root 2467451 Sep 26 10:43 _g.prx
-rw-r--r-- 1 root root 11108 Sep 26 10:43 _g.tii
-rw-r--r-- 1 root root 799903 Sep 26 10:43 _g.tis
-rw-r--r-- 1 root root 258 Sep 26 10:43 segments_4
-rw-r--r-- 1 root root 20 Sep 26 10:43 segments.gen


notice, the _g files were modified today; the others were created on the
24th when I posted data.

can somebody explain this?
thanks!


Re: spellcheck: buildOnOptimize?

2008-09-25 Thread Chris Hostetter

: postCommit/postOptimize callbacks happen after commit/optimize but before a
: new searcher is opened. Therefore, it is not possible to re-build spellcheck
: index on those events without opening a IndexReader directly on the solr

FWIW: I believe it has to work that way because postCommit events might 
modify the index. (but I'm just guessing)

: index. That is why the event listener in SpellCheckComponent uses the
: newSearcher listener to build on commits.
: 
: I don't think there is anything in the API currently to do what Jason wants.

couldn't the Listener's newSearcher() method just do something like 
this...

if (rebuildOnlyAfterOptimize && 
    ! (newSearcher.getReader().isOptimized() && 
       ! oldSearcher.getReader().isOptimized())) {
  return;
} else {
  // current impl: rebuild the spellcheck index as on any commit
}

...assuming a new "rebuildOnlyAfterOptimize" option was added?

-Hoss



Re: a question about solr queryparser

2008-09-25 Thread Chris Hostetter

: (correctly) in the solrconfig.xml.  Could you paste the relevant part of 
: solrconfig.xml?  I don't recall a bug related to this, but you could 
: also try Solr 1.3 if you believe you configured things correctly.

also check the Analysis Tool (link from the admin page) and see what it 
says your analyzer produces for your field and a *query* for...

oneworld

...keep in mind that the query parser only associates "chunks" of text 
with something like "title:" if the text is quoted or the whitespace 
between the chunks is escaped, so ...

title:oneworld onedream

...will cause "oneworld" to be passed to the analyzer for your title field 
and "onedream" to the analyzer for whatever your default field is.



-Hoss



Re: error when indexing null value of slong field

2008-09-25 Thread Chris Hostetter

"Missing" is different then "null" ... in truth what i suspect yo uare 
doing is indexing something like this...



...that is an empty string (""), and the error is because an empty string 
can't be converted to a number

: when i indexed a doc with null value of this field, an error happened:
: 
: SEVERE: org.apache.solr.common.SolrException: Error while creating field
: 'pubdate{type=slong,properties=indexed,stored,omitNorms,sortMissingLast}'
: from value ''
: 
: and slong type defined as below:
: 
: <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
: 
: since it's permitted that this field is null when sorting, why not when
: indexing?



-Hoss



Re: Searching for future or "null" dates

2008-09-25 Thread Chris Hostetter

: I would also like to follow your advice but don't know how to do it with
: defaultOperator="AND". What I am missing is the equivalent to OR:
: AND: +
: NOT: -
: OR: ???
: I didn't find anything on the Solr or Lucene query syntax pages. If

that's true, regrettably there is no prefix operator to indicate a "SHOULD" 
clause in the Lucene query language, so if you set the default op to "AND" 
you can't then override it on individual clauses.

this is one of the reasons I never make the default op AND.

If I'm dealing with structured queries generated programmatically or by 
"advanced" users (i.e.: people who know they are querying Solr), I leave the 
default op alone and let them specify the full syntax with total control.  
If I'm dealing with "novice" users who just want to search for stuff, I use 
dismax with its shiny sexy "mm" param (disclaimer: I wrote it) and the 
default op doesn't matter (even if I want to make every term a user types 
mandatory).

: I switched to the AND-default because that is the default in my web
: frontend so I don't have to change logic. What should I do in this
: situation? Go back to the OR-default?

it depends on what exactly your goals are ... you could always leave the 
default OR in the schema but have your front end send q.op when needed -- 
or set q.op as a default in a handler only used by your front end while 
other queries use handlers without it (and get the default behavior) ...

...or you could just ignore the ramblings of a crazy person like me, who 
thinks AND and OR are abominations in a non-boolean logic system, since 
they make sense for you, and go about your day.

i'm sure your food will still taste pretty good :)




-Hoss



RE: deleting record from the index using deleteByQuery method

2008-09-25 Thread Chris Hostetter

: confused about is the field cumulative_delete. Does this have any
: significance to whether the delete was a success or not? Also shouldn't

cumulative_delete is just the count of all delete commands since the 
SolrCore was started up (as opposed to "delete", which is the count since 
the last commit)

: the method deleteByQuery return a diff status code based on if the
: delete was successful or not?

I'm not certain, but i don't think so -- consider a query for 
"asdflkakjadf" ... if there are no docs that match that query, then no 
docs get deleted -- but the operation is still successful (just like a 
search is still a success even if no docs are found)
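
A minimal SolrJ sketch of that point (the URL is assumed; the query
matches nothing, yet the response status is still 0, i.e. success):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.UpdateResponse;

public class DeleteStatus {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        UpdateResponse rsp = solr.deleteByQuery("asdflkakjadf"); // matches no docs
        solr.commit();
        System.out.println(rsp.getStatus()); // 0 even though nothing was deleted
    }
}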



-Hoss



Re: why index auto change?

2008-09-25 Thread Otis Gospodnetic
Somebody must have run some index-modifying command.  I can't think of 
anything else that would touch the index.  Have you triple-checked your logs?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: 李学健 <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, September 25, 2008 11:33:54 PM
> Subject: why index auto change?
> 
> hi, all
> 
> recently, I have encountered this problem several times.
> the index in solr automatically changes: I never posted any data today, but
> part of the index files look as below:
> 
> -rw-r--r-- 1 root root 3202872 Sep 24 22:23 _f.prx
> -rw-r--r-- 1 root root 14595 Sep 24 22:23 _f.tii
> -rw-r--r-- 1 root root 1072202 Sep 24 22:23 _f.tis
> -rw-r--r-- 1 root root 95 Sep 26 10:43 _g.fnm
> -rw-r--r-- 1 root root 2220768 Sep 26 10:43 _g.frq
> -rw-r--r-- 1 root root 46229 Sep 26 10:43 _g.nrm
> -rw-r--r-- 1 root root 2467451 Sep 26 10:43 _g.prx
> -rw-r--r-- 1 root root 11108 Sep 26 10:43 _g.tii
> -rw-r--r-- 1 root root 799903 Sep 26 10:43 _g.tis
> -rw-r--r-- 1 root root 258 Sep 26 10:43 segments_4
> -rw-r--r-- 1 root root 20 Sep 26 10:43 segments.gen
> 
> 
> notice, the _g files were modified today; the others were created on the
> 24th when I posted data.
> 
> can somebody explain this?
> thanks!



Re: Searching Question

2008-09-25 Thread Otis Gospodnetic
Sounds like a case for a function query where you use the field that stores the 
number of posts for a thread to adjust the score.
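
A hedged sketch of one way to wire that up, assuming a dismax-style
handler and a num_posts field on each thread document (the field names
and the recip() constants are illustrative, not a recipe):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class HaloSearch {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("halo");
        q.set("defType", "dismax");
        q.set("qf", "post_text"); // hypothetical field holding the posts
        // boost threads with fewer posts: recip(x,m,a,b) = a / (m*x + b)
        q.set("bf", "recip(num_posts,1,10,10)");
        System.out.println(solr.query(q).getResults().getNumFound());
    }
}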


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Jake Conk <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, September 25, 2008 8:51:02 PM
> Subject: Searching Question
> 
> Hello,
> 
> We are using Solr for our new forums search feature. If possible, when
> searching for the word "Halo", we would like threads that contain the
> word "Halo" the most, relative to the number of posts in the thread, to
> have a higher score.
> 
> For instance, if we have a thread with 10 posts and the word "Halo"
> shows up 5 times then that should have a lower score than a thread
> that has the word "Halo" 3 times within its posts and has 5 replies.
> Basically the thread that shows the search string most frequently
> amongst the number of posts in the thread should be the one with the
> highest score.
> 
> Is something like this possible?
> 
> Thanks,
> 
> - JC



Re: Bunch of questions regarding enterprise configuration

2008-09-25 Thread Otis Gospodnetic
Hi,

Your questions don't have simple answers, but here are some quick ones.




- Original Message 
> I'm new to Solr, and have been reading through documentation off-and-on for
> days, but still have some unanswered basic/fundamental questions that have a
> huge impact on my implementation approach.
> I am thinking of moving my company's web app's main search engine over to
> Solr. My goal is to index 5M user records of a social networking website
> (most of which have a free-form text portion, but the majority of the data
> is non-text) and have complex searches against those records come back in
> the sub-0.5s range. I have just under 10 application servers each running
> my web-app, which is mostly stateless except for things like users'
> online status.

How many servers have you got for running Solr? (assuming you don't intend to 
put Solr on the same servers as your webapp, as it sounds like each webapp is 
maxing out its server)

> Forgive me for asking so many in one email; feel free to change subject line
> and reply about individual items. Here's the questions:
> 
> 1. How to best organize a web-app that normally goes to a search-db to use
> Solr instead?
> a) Set up independent Solr instance, make app search it just like it used to
> search database.
> b) Integrate Solr right into app, so that app+solr get deployed together
> (this is very possible, as our app is Java). But we run  several instances
> of the app so we'd be running several Solr instances too.
> c) Set up independent Solr instance + our code (plugins or whatever?), have
> web clients request DIRECTLY to the Solr app and have  Solr return search
> results directly.
> d) Other configuration...?

a) Set up Solr master + N slaves on a separate set of boxes and access them 
remotely from your webapp.  If your webapp is a Java webapp, use SolrJ.  
Alternatively, if your webapp servers have enough spare CPU cycles and enough 
RAM, you could make those same servers your 10 Solr slaves.
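
A sketch of what the remote-access option could look like with SolrJ
(the slave host and field names are hypothetical):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class UserSearch {
    public static void main(String[] args) throws Exception {
        // in practice you'd load-balance across the N slaves
        SolrServer solr = new CommonsHttpSolrServer("http://solr-slave-1:8983/solr");
        SolrQuery q = new SolrQuery("profile_text:hiking");
        q.setRows(20);
        for (SolrDocument doc : solr.query(q).getResults()) {
            System.out.println(doc.getFieldValue("id"));
        }
    }
}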

> 2. How to best handle Enums?
> We have a bunch of enumerated data (say, for example, shoe types). What
> "fieldType" should we use to index them?
> Should I index them as text? If somebody searches for the keyword "sandals",
> I'd want the documents that have shoeType=Sandals (e.g., an enum value of
> "07") to show up.

Sounds like "string" type.

> 3. Enums are related, sort-of:
> Sometimes our enumerated data is somewhat related. For example (in the "shoe
> types" example), let's say we have "sandals", well,  "crocs" are not
> sandals, but are sort-of like sandals, so we'd like them to match but score
> lower than an exact sandal match. How do  we do this? (Is this "Changing
> Similarity" or is that barking up the wrong tree?)

One option is to have a separate sort_of_like field where you stick various 
sort-of-like "synonyms".  If you are using DisMax you can include that 
sort_of_like field in the config but give it less boost than the "main" field.  
You could use index-time synonym injection for that sort_of_like field.

> 4. How to manage "Tags" data?
> Users on my site can enter "tags", and we want to be able to build
> tag-clouds, follow tag-links, and whatnot. Should I index tags as just a
> fieldType of "text"?

"text" is fine if you don't want tags to be exact.  Assume "photography" and 
"photo" have the same stem.  Do you want a user clicing on "photo" to get items 
tagged as "photography", too?  If so, use text, else consider string.  Treat 
multi-word tags as phrases.  Example: 
http://www.simpy.com/user/otis/tag/%22information+retrieval%22
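
For the tag-cloud part, faceting on the tag field gives you the counts in
one query; a sketch (the "tags" field name is an assumption):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;

public class TagCloud {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("tags");
        q.setFacetLimit(50); // top 50 tags for the cloud
        q.setRows(0);
        for (FacetField.Count tag : solr.query(q).getFacetField("tags").getValues()) {
            System.out.println(tag.getName() + " (" + tag.getCount() + ")");
        }
    }
}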

> 5. How do I load the data?
> Loading all the data from the database (to anything!) takes a big chunk of
> time. Should I export it from the database once and then load it into Solr
> using CSV?

If export is not slow, then upload via CSV should be faster than adding docs to 
Solr "the usual way".  But judging from your question below, you probably don't 
need the CSV approach.

> Follow-up: How would I manage loading this/new data on an ongoing basis? The
> site's users are creating data all the time, the bulk of  which is old (i.e.
> before today; could be bulk loaded), but after an initial bulk load it's
> ongoing data. Should I be just building  a huge Solr index on the filesystem
> and making sure I don't lose it?

Sounds like one-time bulk indexing followed by continuous incremental indexing.  
You can have 2 masters to make things more fault-tolerant.  Or you can store 
your index on a SAN.  Or you can just count on your N Solr slaves acting as the 
"backup" (replicas) of your index, though they'll always be a little behind the 
master index.

> 6. How do I manage real-time data?
> For example, let's say I have users coming online and offline all the time,
> and I need to be able to search my set of "online  users". How should I go
> about this? Can this just be handled through index updates?

Yes, though there is no real-time search in Solr just yet.
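
On the query side that can be as simple as a filter query, assuming each
user document carries an indexed boolean field named "online" that gets
re-indexed on status changes (the field name is hypothetical):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class OnlineUsers {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("hiking");   // the user's search terms
        q.addFilterQuery("online:true");         // restrict to online users
        System.out.println(solr.query(q).getResults().getNumFound());
    }
}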

Any problem in running two solr instances on the same machine using the same directory?

2008-09-25 Thread Jagadish Rath
Hi

  I am running two solr instances on the same machine using the same data
directory, one on port 8982 and the other on 8984.

   - The 1st one only accepts commits (indexer) -- port 8982.
     It has all the cache sizes set to 0, to get rid of warming up
     searchers.

   - The 2nd one accepts all the queries (searcher) -- port 8984.
     It has non-zero cache sizes as it needs to handle queries.

   - I have a cron job which does a dummy commit to the 2nd instance (on
     port 8984) to update its searcher every 1 minute:

     curl http://localhost:8984/solr/update -s -H
     'Content-type:text/xml; charset=utf-8' -d "<commit/>"

 I am trying to use this as a solution to the "maxWarmingSearchers limit
exceeded" error that occurs as a result of a large number of commits. I am
trying to use it as an alternative to the conventional master/slave
solution.

  I have the following questions:

   - Is there any known issue with this solution, or any issues that can be
     foreseen for it? In particular, can it result in a corrupted index?

   - What are the other solutions to the problem of the "maxWarmingSearchers
     limit exceeded" error?

 I would really appreciate a quick response.

Thanks
Jagadish Rath


Re: Searching for future or "null" dates

2008-09-25 Thread Michael Lackhoff
On 26.09.2008 06:17 Chris Hostetter wrote:

> that's true, regrettably there is no prefix operator to indicate a "SHOULD" 
> clause in the Lucene query language, so if you set the default op to "AND" 
> you can't then override it on individual clauses.
> 
> this is one of the reasons I never make the default op AND.

Just for symmetry or to get rid of this restriction wouldn't it be a
good idea to add such a prefix operator?

> i'm sure your food will still taste pretty good :)

That's what my wife keeps telling me ;-)

Many thanks. I think I will leave it as is for the current application
but use OR-Default plus prefix operators for new projects.

-Michael