Integrating external stemmer in Solr and pre-processing text

2008-09-26 Thread Jaco
Hello,

I need to work with an external stemmer in Solr. This stemmer is accessible
as a COM object (running Solr in tomcat on Windows platform). I managed to
integrate this using the com4j library. I tested two scenarios:
1. Create a custom FilterFactory and Filter class for this. The external
stemmer is then invoked for every token
2. Create a custom TokenizerFactory (extending BaseTokenizerFactory), that
invokes the external stemmer for the entire search text, then puts the
result of this into a StringReader, and finally returns new
WhitespaceTokenizer(stringReader), so the stemmed text gets tokenized by the
whitespace tokenizer.

Looking at search results, both scenarios appear to work from a functional
point of view. The first scenario, however, is too slow because of the
overhead of calling the external COM object for each token.

The second scenario is much faster and also gives correct search results.
However, it causes problems with highlighting - sometimes errors are
reported (String index out of range), and in other cases I get incorrect
highlight fragments. Without knowing all the details of this stuff, that
makes sense, because the text is changed before it's tokenized. Maybe my
second scenario does not make sense at all..?

Any ideas on how to overcome this or any other suggestions on how to realise
this?

Thanks, bye,

Jaco.

PS I posted this message twice before but it didn't come through (spam
filtering..??), so this is the 2nd try with text changed a bit


CLOSING SolrCore! >> ???

2008-09-26 Thread sunnyfr

Hello everybody,

I have a big issue with the website. I don't know how, but I can't start it
again. This is my catalina.log:

[EMAIL PROTECTED]:/# tail -f /var/log/tomcat5.5/catalina.2008-09-25.log
INFO: [book] CLOSING SolrCore!
Sep 25, 2008 5:56:16 PM org.apache.solr.core.SolrCore closeSearcher
INFO: [book] Closing main searcher on request.
Sep 25, 2008 5:56:16 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing [EMAIL PROTECTED] main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=2,evictions=0,size=2,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

documentCache{lookups=0,hits=0,hitratio=0.00,inserts=10,evictions=0,size=10,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Sep 25, 2008 5:56:16 PM org.apache.solr.update.DirectUpdateHandler2 close
INFO: closing DirectUpdateHandler2{commits=1,autocommit
maxDocs=1,autocommit
maxTime=1000ms,autocommits=0,optimizes=0,docsPending=0,adds=46,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=46,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0,docsDeleted=0}


Does somebody have an idea??? I'm lost.

Thanks a lot,

-- 
View this message in context: 
http://www.nabble.com/CLOSING-SolrCore%21-%3E%3E-tp19683719p19683719.html
Sent from the Solr - User mailing list archive at Nabble.com.



Create Indexes

2008-09-26 Thread Dinesh Gupta

Hi All,

Please give me some links so that I can start from basics.

I have a large database of products.

1) A product can be associated with multiple categories.
2) A category can be associated with multiple catalogs.
3) Category & catalog associations go in a table called category-catalog
relation.
4) Products are associated with category-catalog entries.

We have been using Lucene for a long time, but the database has now grown.

Now it takes approximately 6-10 hours. We create the index every day.

I have some questions:

Is it OK to create the whole index with the Solr web app?
If not, how can I create the index?

I have attached a file that creates the index now.





Re: How to select one entity at a time?

2008-09-26 Thread con

To be more specific:
I have the data-config.xml just like: 

[data-config.xml contents stripped by the list archive]

I have 3 search conditions. When the client wants to search all the users,
only the entity 'user' must be executed. And if he wants to search all
managers, the entity 'manager' must be executed.

How can I accomplish this through the URL?

Thanks
con 




Shalin Shekhar Mangar wrote:
> 
> On Thu, Sep 25, 2008 at 6:13 PM, con <[EMAIL PROTECTED]> wrote:
> 
>>
>> Hi
>> I have got two entities in my data-config.xml file, entity1 and entity2.
>> For condition-A I need to execute only entity1 and for condition-B only
>> the
>> entity2 needs to get executed.
>> How can I mention it while accessing the search index in the REST way.
>> Is there any option that i can give along with this query:
>>
>> http://localhost:8983/solr/select/?q=physics&version=2.2&start=0&rows=10&indent=on&wt=json
>>
> 
> I suppose that you are using multiple root entities and the solr document
> contains some field which tells us the entity it came from.
> 
> If yes, you can use filter queries (fq parameter) to filter on those
> fields.
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/How-to-select-one-entity-at-a-time--tp19668759p19683927.html
Sent from the Solr - User mailing list archive at Nabble.com.
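
[Ed.: the list archive stripped the XML from con's message. A purely
hypothetical reconstruction of a data-config.xml with two root entities,
following Shalin's suggestion of a field that records which entity each
document came from - the entity names, table names and the "type" field are
all assumptions:]

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
  <document>
    <entity name="user" query="select ID, NAME from USERS"
            transformer="TemplateTransformer">
      <field column="type" template="user"/>
    </entity>
    <entity name="manager" query="select ID, NAME from USERS where ROLE='MANAGER'"
            transformer="TemplateTransformer">
      <field column="type" template="manager"/>
    </entity>
  </document>
</dataConfig>

[Ed.: with such a field in the index, the filter-query form of the URL is:

http://localhost:8983/solr/select/?q=physics&fq=type:user&version=2.2&start=0&rows=10&indent=on&wt=json ]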



Re: NullPointerException

2008-09-26 Thread Noble Paul നോബിള്‍ नोब्ळ्
I dunno if the problem is w/ date. are cdt and mdt date fields in the DB?

On Fri, Sep 26, 2008 at 12:58 AM, Shalin Shekhar Mangar
<[EMAIL PROTECTED]> wrote:
> I'm not sure about why the NullPointerException is coming. Is that the whole
> stack trace?
>
> The mdt and cdt are date in schema.xml but the format that is in the log is
> wrong. Look at the DateFormatTransformer in DataImportHandler which can
> format strings in your database to the correct date format needed for Solr.
>
> On Thu, Sep 25, 2008 at 7:09 PM, Dinesh Gupta <[EMAIL PROTECTED]>wrote:
>
>>  Hi All,
>>
>> I have attached my file.
>>
>> I am getting exception.
>>
>> Please suggest how to sort out this issue.
>>
>>
>>
>> WARNING: Error creating document : SolrInputDocumnt[{id=id(1.0)={93146},
>> ttl=ttl(1.0)={Majestic from Pushpams.com}, cdt=cdt(1.0)={2001-09-04
>> 15:40:40.0}, mdt=mdt(1.0)={2008-09-23 17:47:44.0}, prc=prc(1.0)={600.00}}]
>> java.lang.NullPointerException
>> at org.apache.lucene.document.Document.getField(Document.java:140)
>> at
>> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:283)
>> at
>> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
>> at
>> org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
>> at
>> org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:288)
>> at
>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
>> at
>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>> at
>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>> at
>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>> at
>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>> at
>> org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:190)
>> at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>> at
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
>> at
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
>> at
>> org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
>> at
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
>> at
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
>> at
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>> at
>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
>> at
>> org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
>> at
>> org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)
>>
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
--Noble Paul
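
[Ed.: a sketch of the DateFormatTransformer usage Shalin points to, matching
the "2008-09-23 17:47:44.0" format visible in the log. The entity name and
query are assumptions; cdt/mdt are the columns from the log:]

<entity name="product" query="select * from PRODUCTS"
        transformer="DateFormatTransformer">
  <field column="cdt" dateTimeFormat="yyyy-MM-dd HH:mm:ss"/>
  <field column="mdt" dateTimeFormat="yyyy-MM-dd HH:mm:ss"/>
</entity>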


Re: Create Indexes

2008-09-26 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Sep 26, 2008 at 1:02 PM, Dinesh Gupta
<[EMAIL PROTECTED]> wrote:
>
>
> Hi All,
>
> Please give me some links so that I can start from basics.
>
> I have a large database of products.
>
> 1) A product can be associated with multiple categories.
> 2) A category can be associated with multiple catalogs.
> 3) Category & catalog associations go in a table called category-catalog
> relation.
> 4) Products are associated with category-catalog entries.
>
> We have been using Lucene for a long time, but the database has now grown.
>
> Now it takes approximately 6-10 hours. We create the index every day.
>
> I have some questions:
>
> Is it OK to create the whole index with the Solr web app?
It is OK. That is how everyone does it. We do not create a Lucene
index separately and load it into Solr.
> If not, how can I create the index?
>
> I have attached a file that creates the index now.
The file is missing.



-- 
--Noble Paul
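
[Ed.: since Dinesh's attachment never made it through, here is a rough
sketch of how DIH's nested entities are typically used to index a join like
the product/category/catalog layout described above. All table and column
names are assumptions:]

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/shop"/>
  <document>
    <entity name="product" query="select ID, TITLE, PRICE from PRODUCT">
      <!-- one row per category-catalog relation the product belongs to -->
      <entity name="catcat"
              query="select CATEGORY_ID, CATALOG_ID from CATEGORY_CATALOG_RELATION
                     where PRODUCT_ID='${product.ID}'"/>
    </entity>
  </document>
</dataConfig>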


Re: How to select one entity at a time?

2008-09-26 Thread Norberto Meijome
On Fri, 26 Sep 2008 00:46:07 -0700 (PDT)
con <[EMAIL PROTECTED]> wrote:

> To be more specific:
> I have the data-config.xml just like: 
> [data-config.xml contents stripped by the list archive]

Con, I may be confused here...are you asking how to load only data from your 
USERS SQL table into SOLR, or how to search in your SOLR index for data about 
'USERS'.

data-config.xml is only relevant for the Data Import Handler...but your 
following question:

> 
> I have 3 search conditions. when the client wants to search all the users,
> only the entity, 'user' must be executed. And if he wants to search all
> managers, the entity, 'manager' must be executed.
> 
> How can i accomplish this through url?

*seems* to indicate you want to search on this.

If you want to search on a particular field from your SOLR schema, DIH is not 
involved. If you use the standard QH, you say ?q=user:Bob 

If I misunderstood your question, please explain...

cheers,
b
_
{Beto|Norberto|Numard} Meijome

"Everything is interesting if you go into it deeply enough"
  Richard Feynman

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: How to select one entity at a time?

2008-09-26 Thread con

What you meant is correct. Please excuse me, I am new to Solr. :-(
I want to index all the query results. (I think this will be done by the
data-config.xml)
Now, while accessing this indexed data, I need this filtering, i.e. either
user or manager.
I tried your suggestion:
http://localhost:8983/solr/select/?q=user:bob&version=2.2&start=0&rows=10&indent=on&wt=json
But it didn't work. Is the above URL correct, or is there some other
option?


Thanks for your reply.. 
con





Norberto Meijome-6 wrote:
> 
> On Fri, 26 Sep 2008 00:46:07 -0700 (PDT)
> con <[EMAIL PROTECTED]> wrote:
> 
>> To be more specific:
>> I have the data-config.xml just like: 
>> [data-config.xml contents stripped by the list archive]
> 
> Con, I may be confused here...are you asking how to load only data from
> your USERS SQL table into SOLR, or how to search in your SOLR index for
> data about 'USERS'.
> 
> data-config.xml is only relevant for the Data Import Handler...but your
> following question:
> 
>> 
>> I have 3 search conditions. when the client wants to search all the
>> users,
>> only the entity, 'user' must be executed. And if he wants to search all
>> managers, the entity, 'manager' must be executed.
>> 
>> How can i accomplish this through url?
> 
> *seems* to indicate you want to search on this .
> 
> If you want to search on a particular field from your SOLR schema, DIH is
> not involved. If you use the standard QH, you say ?q=user:Bob 
> 
> If I misunderstood your question, please explain...
> 
> cheers,
> b
> _
> {Beto|Norberto|Numard} Meijome
> 
> "Everything is interesting if you go into it deeply enough"
>   Richard Feynman
> 
> I speak for myself, not my employer. Contents may be hot. Slippery when
> wet. Reading disclaimers makes you go blind. Writing them is worse. You
> have been Warned.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/How-to-select-one-entity-at-a-time--tp19668759p19685292.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: NullPointerException

2008-09-26 Thread Dinesh Gupta

Hi,

Yes, cdt & mdt are date columns in the MySQL DB.

> Date: Fri, 26 Sep 2008 13:58:24 +0530
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Re: NullPointerException
> 
> I dunno if the problem is w/ date. are cdt and mdt date fields in the DB?
> 
> On Fri, Sep 26, 2008 at 12:58 AM, Shalin Shekhar Mangar
> <[EMAIL PROTECTED]> wrote:
> > I'm not sure about why the NullPointerException is coming. Is that the whole
> > stack trace?
> >
> > The mdt and cdt are date in schema.xml but the format that is in the log is
> > wrong. Look at the DateFormatTransformer in DataImportHandler which can
> > format strings in your database to the correct date format needed for Solr.
> >
> > On Thu, Sep 25, 2008 at 7:09 PM, Dinesh Gupta <[EMAIL PROTECTED]>wrote:
> >
> >>  Hi All,
> >>
> >> I have attached my file.
> >>
> >> I am getting exception.
> >>
> >> Please suggest how to sort out this issue.
> >>
> >>
> >>
> >> WARNING: Error creating document : SolrInputDocumnt[{id=id(1.0)={93146},
> >> ttl=ttl(1.0)={Majestic from Pushpams.com}, cdt=cdt(1.0)={2001-09-04
> >> 15:40:40.0}, mdt=mdt(1.0)={2008-09-23 17:47:44.0}, prc=prc(1.0)={600.00}}]
> >> java.lang.NullPointerException
> >> at org.apache.lucene.document.Document.getField(Document.java:140)
> >> at
> >> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:283)
> >> at
> >> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
> >> at
> >> org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
> >> at
> >> org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:288)
> >> at
> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
> >> at
> >> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
> >> at
> >> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
> >> at
> >> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
> >> at
> >> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
> >> at
> >> org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:190)
> >> at
> >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
> >> at
> >> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
> >> at
> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
> >> at
> >> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
> >> at
> >> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
> >> at
> >> org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
> >> at
> >> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
> >> at
> >> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
> >> at
> >> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
> >> at
> >> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
> >> at
> >> org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
> >> at
> >> org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)
> >>
> >>
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
> 
> 
> 
> -- 
> --Noble Paul


Create Indexes

2008-09-26 Thread Dinesh Gupta

Hi All,

Please give me some links so that I can start from basics.

I have a large database of products.

1) A product can be associated with multiple categories.
2) A category can be associated with multiple catalogs.
3) Category & catalog associations go in a table called category-catalog
relation.
4) Products are associated with category-catalog entries.

We have been using Lucene for a long time, but the database has now grown.

Now it takes approximately 6-10 hours. We create the index every day.

I have some questions:

Is it OK to create the whole index with the Solr web app?
If not, how can I create the index?

I have attached a file that creates the index now.







Re: Create Indexes

2008-09-26 Thread Norberto Meijome
On Fri, 26 Sep 2008 16:32:05 +0530
Dinesh Gupta <[EMAIL PROTECTED]> wrote:

> Is it OK to create whole index by Solr web-app?
> If not than ,How can I create index?
> 
> I have attached some file that create index now.
> 

Dinesh,
you sent the same email 2 1/2 hours ago. sending it again will not give you 
more answers.

If you have a file you want to share, you should upload it to a webserver and 
share the URL - most mailing lists drop any file attachments.


_
{Beto|Norberto|Numard} Meijome

Never take Life too seriously, no one gets out alive anyway.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: How to select one entity at a time?

2008-09-26 Thread Norberto Meijome
On Fri, 26 Sep 2008 02:35:18 -0700 (PDT)
con <[EMAIL PROTECTED]> wrote:

> What you meant is correct only. Please excuse for that I am new to solr. :-(

Con, have a read here :

http://www.ibm.com/developerworks/java/library/j-solr1/

it helped me pick up the basics a while back. it refers to 1.2, but the core 
concepts are relevant to 1.3 too.

b
_
{Beto|Norberto|Numard} Meijome

Hildebrant's Principle:
If you don't know where you are going,
any road will get you there.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: How to select one entity at a time?

2008-09-26 Thread Norberto Meijome
On Fri, 26 Sep 2008 02:35:18 -0700 (PDT)
con <[EMAIL PROTECTED]> wrote:

> What you meant is correct only. Please excuse for that I am new to solr. :-(

hi Con,
nothing to be excused for... but you may want to read the wiki, as it provides
quite a lot of information that should answer your questions. DIH is great, but
I wouldn't go near it until you understand how to create your own schema.xml
and solrconfig.xml.

http://wiki.apache.org/solr/FrontPage is the wiki

( everyone else ... is there a guide on getting started on SOLR ? step by step,
taking the example and changing it for your own use?  )

> I want to index all the query results. (I think this will be done by the
> data-config.xml) 

hmm...terminology :-) 
you index documents (similar to records in a database).

when you send a query to Solr, you will get results if your query matches any
of them.

> Now while accessing this indexed data, i need this filtering. ie. Either
> user or manager.
> I tried your suggestion:
> http://localhost:8983/solr/select/?q=user:bob&version=2.2&start=0&rows=10&indent=on&wt=json

the url LOOKS ok. do you have any document in your index with field user
containing 'bob' ?

try this to get all results (xml format, first 3 results only):

http://localhost:8983/solr/select/?q=*:*&rows=3

then, find a field with a value, then search for that value and see if you get
that document back - it should work... (with lots of caveats, yes).

If you send us the result we can help you understand better why it isn't
working as you intend.
b
_
{Beto|Norberto|Numard} Meijome

"First they ignore you, then they laugh at you, then they fight you, then you
win." Mahatma Gandhi.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: spellcheck: buildOnOptimize?

2008-09-26 Thread Grant Ingersoll

That seems reasonable.

Another thing to think about: maybe it is useful to provide the events with
some metadata containing information about what triggered them.  Something
like a SolrEvent class, such that postCommit looks like

postCommit(SolrEvent evt)

and

public void newSearcher(SolrEvent evt, SolrIndexSearcher newSearcher,
SolrIndexSearcher currentSearcher);

Of course, since SolrEventListener is an interface...

On Sep 25, 2008, at 11:57 PM, Chris Hostetter wrote:



: postCommit/postOptimize callbacks happen after commit/optimize but before a
: new searcher is opened. Therefore, it is not possible to re-build spellcheck
: index on those events without opening an IndexReader directly on the solr
: index. That is why the event listener in SpellCheckComponent uses the
: newSearcher listener to build on commits.

FWIW: I believe it has to work that way because postCommit events might
modify the index. (but i'm just guessing)

: I don't think there is anything in the API currently to do what Jason wants.


couldn't the Listener's newSearcher() method just do something like
this...

if (rebuildOnlyAfterOptimize &&
    ! (newSearcher.getReader().isOptimized() &&
       ! oldSearcher.getReader().isOptimized())) {
  return;
} else {
  // current impl
}

...assuming a new "rebuildOnlyAfterOptimize" option was added?

-Hoss






Re: Searching Question

2008-09-26 Thread Grant Ingersoll
Is a thread and all of its posts a single document?  In other words, how
are you modeling your posts as Solr documents?  Also, where are you keeping
track of the number of replies?  Is that in Solr or in a DB?


-Grant

On Sep 25, 2008, at 8:51 PM, Jake Conk wrote:


Hello,

We are using Solr for our new forums search feature. If possible when
searching for the word "Halo" we would like threads that contain the
word "Halo" the most with the least amount of posts in that thread to
have a higher score.

For instance, if we have a thread with 10 posts and the word "Halo"
shows up 5 times then that should have a lower score than a thread
that has the word "Halo" 3 times within its posts and has 5 replies.
Basically the thread that shows the search string most frequently
amongst the number of posts in the thread should be the one with the
highest score.

Is something like this possible?

Thanks,

- JC


--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









Re: CLOSING SolrCore! >> ???

2008-09-26 Thread Grant Ingersoll
Can you provide more information?  What happened right before seeing  
this msg?  What version of Solr are you on?


-Grant

On Sep 26, 2008, at 3:26 AM, sunnyfr wrote:



Hello everybody,

I have a big issue with the website. I don't know how, but I can't start it
again. This is my catalina.log:

[EMAIL PROTECTED]:/# tail -f /var/log/tomcat5.5/catalina.2008-09-25.log
INFO: [book] CLOSING SolrCore!
Sep 25, 2008 5:56:16 PM org.apache.solr.core.SolrCore closeSearcher
INFO: [book] Closing main searcher on request.
Sep 25, 2008 5:56:16 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing [EMAIL PROTECTED] main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=2,evictions=0,size=2,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

documentCache{lookups=0,hits=0,hitratio=0.00,inserts=10,evictions=0,size=10,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Sep 25, 2008 5:56:16 PM org.apache.solr.update.DirectUpdateHandler2 close
INFO: closing DirectUpdateHandler2{commits=1,autocommit
maxDocs=1,autocommit
maxTime=1000ms,autocommits=0,optimizes=0,docsPending=0,adds=46,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=46,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0,docsDeleted=0}



Does somebody have an idea??? I'm lost.

Thanks a lot,

--
View this message in context: 
http://www.nabble.com/CLOSING-SolrCore%21-%3E%3E-tp19683719p19683719.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









Re: Integrating external stemmer in Solr and pre-processing text

2008-09-26 Thread Grant Ingersoll
How are you creating the tokens?  What are you setting for the offsets  
and the positions?


One thing that is helpful is Solr's built-in Analysis tool via the
Admin interface (http://localhost:8983/solr/admin/).  From there, you
can turn on verbose mode and see what the positions and offsets are
for every piece of your Analyzer.


-Grant

On Sep 26, 2008, at 3:10 AM, Jaco wrote:


Hello,

I need to work with an external stemmer in Solr. This stemmer is accessible
as a COM object (running Solr in tomcat on Windows platform). I managed to
integrate this using the com4j library. I tested two scenarios:
1. Create a custom FilterFactory and Filter class for this. The external
stemmer is then invoked for every token
2. Create a custom TokenizerFactory (extending BaseTokenizerFactory), that
invokes the external stemmer for the entire search text, then puts the
result of this into a StringReader, and finally returns new
WhitespaceTokenizer(stringReader), so the stemmed text gets tokenized by the
whitespace tokenizer.

Looking at search results, both scenarios appear to work from a functional
point of view. The first scenario, however, is too slow because of the
overhead of calling the external COM object for each token.

The second scenario is much faster and also gives correct search results.
However, it causes problems with highlighting - sometimes errors are
reported (String index out of range), and in other cases I get incorrect
highlight fragments. Without knowing all the details of this stuff, that
makes sense, because the text is changed before it's tokenized. Maybe my
second scenario does not make sense at all..?

Any ideas on how to overcome this or any other suggestions on how to realise
this?

Thanks, bye,

Jaco.

PS I posted this message twice before but it didn't come through (spam
filtering..??), so this is the 2nd try with text changed a bit


--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









RE: Create Indexes

2008-09-26 Thread Dinesh Gupta

Hi,

Please tell me where to upload the files.


Regard,
Dinesh Gupta

> Date: Fri, 26 Sep 2008 21:23:58 +1000
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Re: Create Indexes
> 
> On Fri, 26 Sep 2008 16:32:05 +0530
> Dinesh Gupta <[EMAIL PROTECTED]> wrote:
> 
> > Is it OK to create whole index by Solr web-app?
> > If not than ,How can I create index?
> > 
> > I have attached some file that create index now.
> > 
> 
> Dinesh,
> you sent the same email 2 1/2 hours ago. sending it again will not give you 
> more answers.
> 
> If you have a file you want to share, you should upload it to a webserver and 
> share the URL - most mailing lists drop any file attachments.
> 
> 
> _
> {Beto|Norberto|Numard} Meijome
> 
> Never take Life too seriously, no one gets out alive anyway.
> 
> I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
> Reading disclaimers makes you go blind. Writing them is worse. You have been 
> Warned.


Re: spellcheck: buildOnOptimize?

2008-09-26 Thread Shalin Shekhar Mangar
On Fri, Sep 26, 2008 at 9:27 AM, Chris Hostetter
<[EMAIL PROTECTED]>wrote:

>
> couldn't the Listener's newSearcher() method just do something like
> this...
>
> if (rebuildOnlyAfterOptimize &&
>     ! (newSearcher.getReader().isOptimized() &&
>        ! oldSearcher.getReader().isOptimized())) {
>   return;
> } else {
>   // current impl
> }
>
> ...assuming a new "rebuildOnlyAfterOptimize" option was added?
>

Yup, that will work.

Jason, can you please open a jira issue to add this feature?

-- 
Regards,
Shalin Shekhar Mangar.
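
[Ed.: assuming the option lands as a flag next to the existing buildOnCommit
in the spellchecker configuration, usage might look like this - hypothetical
until the issue is actually implemented:]

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="buildOnCommit">true</str>
    <!-- proposed: rebuild only when the new searcher is on an optimized index -->
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>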


Re: Integrating external stemmer in Solr and pre-processing text

2008-09-26 Thread Jaco
Hi,

Here's some of the code of my Tokenizer:

public class MyTokenizerFactory extends BaseTokenizerFactory
{
    public WhitespaceTokenizer create(Reader input)
    {
        String text, normalizedText;

        try {
            text = IOUtils.toString(input);
            normalizedText = *invoke my stemmer(text)*;
        }
        catch( IOException ex ) {
            throw new SolrException( SolrException.ErrorCode.SERVER_ERROR, ex );
        }

        StringReader stringReader = new StringReader(normalizedText);

        return new WhitespaceTokenizer(stringReader);
    }
}

I see what's going on in the analysis tool now, and I think I understand the
problem. For instance, take the text: abcdxxx defgxxx. Let's assume the stemmer
gets rid of xxx.

I would then see this in the analysis tool after the tokenizer stage:
- abcd - term position 1; start: 1; end: 3
- defg - term position 2; start: 4; end: 7

These positions are not in line with the initial search text - this must be
why the highlighting goes wrong. I guess my little trick to do this was a
bit too simple: it messes up the positions because something different from
the original source text is tokenized.

Any suggestions would be very welcome...

Cheers,

Jaco.


2008/9/26 Grant Ingersoll <[EMAIL PROTECTED]>

> How are you creating the tokens?  What are you setting for the offsets and
> the positions?
>
> One thing that is helpful is Solr's built in Analysis tool via the Admin
> interface (http://localhost:8983/solr/admin/)  From there, you can plug in
> verbose mode, and see what the position and offsets are for every piece of
> your Analyzer.
>
> -Grant
>
>
> On Sep 26, 2008, at 3:10 AM, Jaco wrote:
>
>  Hello,
>>
>> I need to work with an external stemmer in Solr. This stemmer is
>> accessible
>> as a COM object (running Solr in tomcat on Windows platform). I managed to
>> integrate this using the com4j library. I tested two scenario's:
>> 1. Create a custom FilterFactory and Filter class for this. The external
>> stemmer is then invoked for every token
>> 2. Create a custom TokenizerFactory (extending BaseTokenizerFactory), that
>> invokes the external stemmer for the entire search text, then puts the
>> result of this into a StringReader, and finally returns new
>> WhitespaceTokenizer(stringReader), so the stemmed text gets tokenized by
>> the
>> whitespace tokenizer.
>>
>> Looking at search results, both scenario's appear to work from a
>> functional
>> point of view. The first scenario however is too slow because of the
>> overhead of calling the external COM object for each token.
>>
>> The second scenario is much faster, and also gives correct search results.
>> However, this then gives problems with highlighting - sometimes, errors
>> are
>> reported (String out of Range), in other cases, I get incorrect highlight
>> fragments. Without knowing all details about this stuff, this makes sense
>> because of the change done to the text to be processed before it's
>> tokenized.  Maybe my second scenario does not make sense at all..?
>>
>> Any ideas on how to overcome this or any other suggestions on how to
>> realise
>> this?
>>
>> Thanks, bye,
>>
>> Jaco.
>>
>> PS I posted this message twice before but it didn't come through (spam
>> filtering..??), so this is the 2nd try with text changed a bit
>>
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
>
>
>
>
>


Re: Bunch of questions regarding enterprise configuration

2008-09-26 Thread Dev Team
Hi Otis,
 First off, thanks for your complete reply! It certainly has a lot of
good info in it.
 To address some of the questions you asked, please see below:

On Fri, Sep 26, 2008 at 1:36 AM, Otis Gospodnetic <
[EMAIL PROTECTED]> wrote:

> Hi,
>
> Your questions don't have simple answers, but here are some quick one.
>
>
>
>
> - Original Message 
> > I'm new to Solr, and have been reading through documentation off-and-on
> for
> > days, but still have some unanswered basic/fundamental questions that
> have a
> > huge impact on my implementation approach.
> > I am thinking of moving my company's web app's main search engine over to
> > Solr. My goal is to index 5M user records of a social networking website
> > (most of which have a free-form text portion, but the majority of  data
> is
> > non-text) and have complex searches against those records back in the
> > sub-0.5s range of time. I have just under 10  application servers each
> > running my web-app, which is mostly stateless except for things like
> users'
> > online status.
>
> How many servers have you got for running Solr? (assuming you don't intend
> to put Solr on the same servers as your webapp, as it sounds like each
> webapp is maxing out its server)


Right now, 0. I'm still investigating how to get it working, not yet close
to estimating load. Once we get things to the testing stage, I'm sure we'll
have an idea of what kind of production hardware we'll need to
purchase/reuse/whatever.


>
>
> > Forgive me for asking so many in one email; feel free to change subject
> line
> > and reply about individual items. Here's the questions:
> >
> > 1. How to best organize a web-app that normally goes to a search-db to
> use
> > Solr instead?
> > a) Set up independent Solr instance, make app search it just like it used
> to
> > search database.
> > b) Integrate Solr right into app, so that app+solr get deployed together
> > (this is very possible, as our app is Java). But we run  several
> instances
> > of the app so we'd be running several Solr instances too.
> > c) Set up independent Solr instance + our code (plugins or whatever?),
> have
> > web clients request DIRECTLY to the Solr app and have  Solr return search
> > results directly.
> > d) Other configuration...?
>
> a) Set up Solr master + N slaves on a separate set of boxes and access them
> remotely from your webapp.  If your webapp is a Java webapp, use SolrJ.
>  Alternatively, if your webapp servers have enough spare CPU cycles and
> enough RAM, you could make those same servers your 10 Solr slaves.


I see, thank you.

Out of curiosity, how much RAM are we talking? (My database has about 6 gigs
of data that we'd want to index for search.) The reason I ask is because my
webapp servers do have CPU/RAM to spare.


>
>
> > 2. How to best handle Enums?
> > We have a bunch of enumerated data (say, for example, shoe types). What
> > "fieldType" should we use to index them?
> > Should I index them as text? If I index "sandals" then if somebody
> searches
> > for the keyword "sandals" then the documents that have shoeType=Sandals
> (eg,
> > enum-value of "07") I'd want those documents to show up.
>
> Sounds like "string" type.


Okay I'll look into it more, thanks.


>
>
> > 3. Enums are related, sort-of:
> > Sometimes our enumerated data is somewhat related. For example (in the
> "shoe
> > types" example), let's say we have "sandals", well,  "crocs" are not
> > sandals, but are SORT-oF like sandals, so we'd like them to match but
> score
> > lower than an exact sandal match. How do  we do this? (Is this "Changing
> > Similarity" or is that barking up the wrong tree?)
>
> One option is to have a separate sort_of_like field where you stick various
> sort-of-like "synonyms".  If you are using DisMax you can include that
> sort_of_like field in the config but give it less boost than the "main"
> field.  You could use index-time synonym injection for that sort_of_like
> field.


Wow, okay... I'll definitely have to do a bit more reading to understand
what you just said. ;)


>
>
> > 4. How to manage "Tags" data?
> > Users on my site can enter "tags", and we want to be able to build
> > tag-clouds, follow tag-links, and whatnot. Should I index tags as just a
> > fieldType of "text"?
>
> "text" is fine if you don't want tags to be exact.  Assume "photography"
> and "photo" have the same stem.  Do you want a user clicing on "photo" to
> get items tagged as "photography", too?  If so, use text, else consider
> string.  Treat multi-word tags as phrases.  Example:
> http://www.simpy.com/user/otis/tag/%22information+retrieval%22


Hmm... You raise a good question in there. Thanks for that info, I'll look
into it more.


> 
>
> > 5. How do I load the data?
> > Loading all the data from the database (to anything!) takes a big chunk
> of
> > time. Should I export it from the database once and then load it into
> Solr
> > using CSV?
>
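
[Ed.: a rough sketch of the sort_of_like approach Otis outlines above, with
assumed field and file names. An index-time SynonymFilterFactory makes
"crocs" also produce "sandals" in the fuzzy field, and dismax weights the
exact field above it:]

In schema.xml, a lower-boosted copy of the field, analyzed with synonyms
at index time:

  <field name="sort_of_like" type="text_syn" indexed="true" stored="false"/>
  <copyField source="shoe_type" dest="sort_of_like"/>

  <!-- inside the index analyzer of the "text_syn" fieldType -->
  <filter class="solr.SynonymFilterFactory" synonyms="sort_of_like.txt"
          ignoreCase="true" expand="true"/>

where sort_of_like.txt contains lines like:

  crocs => crocs, sandals

and the dismax handler in solrconfig.xml boosts the exact field higher:

  <str name="qf">shoe_type^2.0 sort_of_like^0.5</str>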

Re: Integrating external stemmer in Solr and pre-processing text

2008-09-26 Thread Grant Ingersoll


On Sep 26, 2008, at 9:40 AM, Jaco wrote:


Hi,

Here's some of the code of my Tokenizer:

public class MyTokenizerFactory extends BaseTokenizerFactory
{
    public WhitespaceTokenizer create(Reader input)
    {
        String text, normalizedText;

        try {
            text = IOUtils.toString(input);
            normalizedText = *invoke my stemmer(text)*;
        }
        catch( IOException ex ) {
            throw new SolrException( SolrException.ErrorCode.SERVER_ERROR, ex );
        }

        StringReader stringReader = new StringReader(normalizedText);

        return new WhitespaceTokenizer(stringReader);
    }
}

I see what's going on in the analysis tool now, and I think I understand the
problem. For instance, take the text: abcdxxx defgxxx. Let's assume the
stemmer gets rid of xxx.

I would then see this in the analysis tool after the tokenizer stage:
- abcd - term position 1; start: 1; end: 3
- defg - term position 2; start: 4; end: 7

These positions are not in line with the initial search text - this must be
why the highlighting goes wrong. I guess my little trick to do this was a
bit too simple: it messes up the positions because something different from
the original source text is tokenized.


Yes, this is exactly the problem.  I don't know enough about com4j or
your stemmer, but some things come to mind:

1. Are you having to restart/initialize the stemmer every time for your
"slow" approach?  Does that really need to happen?
2. Can the stemmer return something other than a String?  Say a String
array of all the stemmed words?  Or maybe even some type of object that
tells you the original word and the stemmed word?


-Grant


Re: Bunch of questions regarding enterprise configuration

2008-09-26 Thread Otis Gospodnetic
Hi Daryl,

Re RAM amount - depends on your particular index (DB size doesn't help - who 
knows how you'll analyze/tokenize/index data, what term distribution is like, 
etc.)

Re master-slave - look for Collection Replication page on the Wiki

Re real-time IM-like presence - perhaps you can do it all in RAM(Directory), or 
even InstantiatedIndex (in Lucene contrib), perhaps you can post-process 
results with "is the user X online?" type of looking in some fast in-memory 
data structure that's not necessarily Solr/Lucene.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Dev Team <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, September 26, 2008 10:38:39 AM
> Subject: Re: Bunch of questions regarding enterprise configuration
> 
> Hi Otis,
>  First off, thanks for your complete reply! It certainly has a lot of
> good info in it.
>  To address some of the questions you asked, please see below:
> 
> On Fri, Sep 26, 2008 at 1:36 AM, Otis Gospodnetic <
> [EMAIL PROTECTED]> wrote:
> 
> > Hi,
> >
> > Your questions don't have simple answers, but here are some quick one.
> >
> >
> >
> >
> > - Original Message 
> > > I'm new to Solr, and have been reading through documentation off-and-on
> > for
> > > days, but still have some unanswered basic/fundamental questions that
> > have a
> > > huge impact on my implementation approach.
> > > I am thinking of moving my company's web app's main search engine over to
> > > Solr. My goal is to index 5M user records of a social networking website
> > > (most of which have a free-form text portion, but the majority of  data
> > is
> > > non-text) and have complex searches against those records back in the
> > > sub-0.5s range of time. I have just under 10  application servers each
> > > running my web-app, which is mostly stateless except for things like
> > users'
> > > online status.
> >
> > How many servers have you got for running Solr? (assuming you don't intend
> > to put Solr on the same servers as your webapp, as it sounds like each
> > webapp is maxing out its server)
> 
> 
> Right now, 0. I'm still investigating how to get it working, not yet close
> to estimating load. Once we get things to the testing stage, I'm sure we'll
> have an idea of what kind of production hardware we'll need to
> purchase/reuse/whatever.
> 
> 
> >
> >
> > > Forgive me for asking so many in one email; feel free to change subject
> > line
> > > and reply about individual items. Here's the questions:
> > >
> > > 1. How to best organize a web-app that normally goes to a search-db to
> > use
> > > Solr instead?
> > > a) Set up independent Solr instance, make app search it just like it used
> > to
> > > search database.
> > > b) Integrate Solr right into app, so that app+solr get deployed together
> > > (this is very possible, as our app is Java). But we run  several
> > instances
> > > of the app so we'd be running several Solr instances too.
> > > c) Set up independent Solr instance + our code (plugins or whatever?),
> > have
> > > web clients request DIRECTLY to the Solr app and have  Solr return search
> > > results directly.
> > > d) Other configuration...?
> >
> > a) Set up Solr master + N slaves on a separate set of boxes and access them
> > remotely from your webapp.  If your webapp is a Java webapp, use SolrJ.
> >  Alternatively, if your webapp servers have enough spare CPU cycles and
> > enough RAM, you could make those same servers your 10 Solr slaves.
> 
> 
> I see, thank you.
> 
> Out of curiosity, how much RAM are we talking? (My database has about 6 gigs
> of data that we'd want to index for search.) The reason I ask is because my
> webapp servers do have CPU/RAM to spare.
> 
> 
> >
> >
> > > 2. How to best handle Enums?
> > > We have a bunch of enumerated data (say, for example, shoe types). What
> > > "fieldType" should we use to index them?
> > > Should I index them as text? If I index "sandals" then if somebody
> > searches
> > > for the keyword "sandals" then the documents that have shoeType=Sandals
> > (eg,
> > > enum-value of "07") I'd want those documents to show up.
> >
> > Sounds like "string" type.
> 
> 
> Okay I'll look into it more, thanks.
> 
> 
> >
> >
> > > 3. Enums are related, sort-of:
> > > Sometimes our enumerated data is somewhat related. For example (in the
> > "shoe
> > > types" example), let's say we have "sandals", well,  "crocs" are not
> > > sandals, but are SORT-oF like sandals, so we'd like them to match but
> > score
> > > lower than an exact sandal match. How do  we do this? (Is this "Changing
> > > Similarity" or is that barking up the wrong tree?)
> >
> > One option is to have a separate sort_of_like field where you stick various
> > sort-of-like "synonyms".  If you are using DisMax you can include that
> > sort_of_like field in the config but give it less boost than the "main"
> > field.  You could use index-time synonym injection for that sort_of_like
> > f

Is it possible to specify a pattern of Ranking while querying the indexes?

2008-09-26 Thread tushar kapoor

I want to specify a particular pattern in which results are retrieved for a
query. Can a pattern of ranks be specified in the query?
-- 
View this message in context: 
http://www.nabble.com/Is-it-possible-to-specify-a-pattern-of-Ranking-while-querying-the-indexes--tp19690731p19690731.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Bunch of questions regarding enterprise configuration

2008-09-26 Thread Dev Team
Hi Otis,
 Ah, okay those are all great pointers, thanks. I will certainly have to
do more research, and then I'll certainly have more questions later.

 I have thought of using some kind of non-lucene/solr distributed cache
to narrow-down the online search... but the problem comes when there's
millions of users who could be online at a given time. I don't want to
(forgive database-speak here) tack on a huge in-clause to the end of my
search query, like:  search for "baseball" and "sandals" where members in
(...millions of ids?...).
 The in-memory data structure of presence data is definitely necessary
for certain tasks, and useful for others, however it doesn't help me in two
places:
1) non-trivial searches that include presence criteria,
2) sorting all search results by presence info.
 --But thank you, I will look into the things you mentioned. Given how
little I know about Solr right now, maybe I'm worried about problems that
have already been solved.

 Thanks again very much for your help.

Sincerely,

 Daryl.


Re: Integrating external stemmer in Solr and pre-processing text

2008-09-26 Thread Jaco
Hi Grant,

In reply to your questions:

1. Are you having to restart/initialize the stemmer every time for your
"slow" approach?  Does that really need to happen?

It is invoking a COM object in Windows. The object is instantiated once for
a token stream, and then invoked once for each token. The invoke always has
an overhead, not much to do about that (sigh...)

2. Can the stemmer return something other than a String?  Say a String array
of all the stemmed words?  Or maybe even some type of object that tells you
the original word and the stemmed word?

The stemmer can only return a String. But, I do know that the returned
string always has exactly the same number of words as the input string. So
logically, it would be possible to:
a) first calculate the position/start/end of each token in the input string
(usual tokenization by Whitespace), resulting in token list 1
b) then invoke the stemmer, and tokenize that result by Whitespace,
resulting in token list 2
c) 'merge' the token values of token list 2 into token list 1, which is
possible because each token's position is the same in both lists...
d) return that 'merged' token list 2 for further processing

Would this work in Solr?

I can do some Java coding to achieve that from logical point of view, but I
wouldn't know how to structure this flow into the MyTokenizerFactory, so
some hints to achieve that would be great!

Thanks for helping out!

Jaco.
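
[Ed.: a minimal sketch of the merge approach Jaco describes in steps a)-d),
against the Lucene 2.x-era API that Solr 1.3 uses. The stemmer call is a
placeholder for the single COM invocation, and the code assumes, as stated
above, that the stemmer returns exactly one word per input word:]

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.io.IOUtils;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.solr.analysis.BaseTokenizerFactory;
import org.apache.solr.common.SolrException;

public class MergingStemTokenizerFactory extends BaseTokenizerFactory {

    public Tokenizer create(Reader input) {
        try {
            // a) tokenize the ORIGINAL text, so every token carries
            //    source-accurate offsets and positions
            String text = IOUtils.toString(input);
            List<Token> originals = new ArrayList<Token>();
            Tokenizer ws = new WhitespaceTokenizer(new StringReader(text));
            for (Token t = ws.next(); t != null; t = ws.next()) {
                originals.add(t);
            }

            // b) one external call for the whole text
            String[] stemmed = invokeMyStemmer(text).split("\\s+");

            // c) merge: stemmed term text on top of the original offsets
            List<Token> merged = new ArrayList<Token>();
            for (int i = 0; i < originals.size() && i < stemmed.length; i++) {
                Token orig = originals.get(i);
                merged.add(new Token(stemmed[i], orig.startOffset(), orig.endOffset()));
            }

            // d) return the merged list as a token stream
            return new ListTokenizer(merged);
        } catch (IOException ex) {
            throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, ex);
        }
    }

    private String invokeMyStemmer(String text) {
        return text; // placeholder for the single COM call
    }

    /** Trivial Tokenizer that replays a pre-built token list. */
    static class ListTokenizer extends Tokenizer {
        private final List<Token> tokens;
        private int pos = 0;
        ListTokenizer(List<Token> tokens) { this.tokens = tokens; }
        public Token next() {
            return pos < tokens.size() ? tokens.get(pos++) : null;
        }
    }
}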


2008/9/26 Grant Ingersoll <[EMAIL PROTECTED]>

>
> On Sep 26, 2008, at 9:40 AM, Jaco wrote:
>
>  Hi,
>>
>> Here's some of the code of my Tokenizer:
>>
>> public class MyTokenizerFactory extends BaseTokenizerFactory
>> {
>>     public WhitespaceTokenizer create(Reader input)
>>     {
>>         String text, normalizedText;
>>
>>         try {
>>             text = IOUtils.toString(input);
>>             normalizedText = *invoke my stemmer(text)*;
>>         }
>>         catch( IOException ex ) {
>>             throw new SolrException( SolrException.ErrorCode.SERVER_ERROR, ex );
>>         }
>>
>>         StringReader stringReader = new StringReader(normalizedText);
>>
>>         return new WhitespaceTokenizer(stringReader);
>>     }
>> }
>>
>> I see what's going on in the analysis tool now, and I think I understand the
>> problem. For instance, take the text: abcdxxx defgxxx. Let's assume the stemmer
>> gets rid of xxx.
>>
>> I would then see this in the analysis tool after the tokenizer stage:
>> - abcd - term position 1; start: 1; end: 3
>> - defg - term position 2; start: 4; end: 7
>>
>> These positions are not in line with the initial search text - this must be
>> why the highlighting goes wrong. I guess my little trick to do this was a
>> bit too simple: it messes up the positions because something different from
>> the original source text is tokenized.
>>
>
> Yes, this is exactly the problem.  I don't know enough about com4J or your
> stemmer, but some things come to mind:
>
> 1. Are you having to restart/initialize the stemmer every time for your
> "slow" approach?  Does that really need to happen?
> 2. Can the stemmer return something other than a String?  Say a String
> array of all the stemmed words?  Or maybe even some type of object that
> tells you the original word and the stemmed word?
>
> -Grant
>


Re: Is it possible to specify a pattern of Ranking while querying the indexes?

2008-09-26 Thread Grant Ingersoll

Can you give an example of what you mean?

On Sep 26, 2008, at 11:28 AM, tushar kapoor wrote:



I want to specify a particular pattern in which results are retrieved for a
query. Can a pattern of ranks be specified in the query?
--
View this message in context: 
http://www.nabble.com/Is-it-possible-to-specify-a-pattern-of-Ranking-while-querying-the-indexes--tp19690731p19690731.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









ANNOUNCE: Application Period Opens for Travel Assistance to ApacheCon US 2008

2008-09-26 Thread Chris Hostetter


NOTE: This is a cross posted announcement to all Lucene sub-projects, 
please confine any replies to [EMAIL PROTECTED]


-

The Travel Assistance Committee is taking in applications for those wanting
to attend ApacheCon US 2008 between the 3rd and 7th November 2008 in New
Orleans.

The Travel Assistance Committee is looking for people who would like to be
able to attend ApacheCon US 2008 who need some financial support in order to
get there. There are VERY few places available and the criteria are high;
that aside, applications are open to all open source developers who feel that
their attendance would benefit themselves, their project(s), the ASF and
open source in general.

Financial assistance is available for flights, accommodation and entrance
fees either in full or in part, depending on circumstances. It is intended
that all our ApacheCon events are covered, so it may be prudent for those in
Europe and/or Asia to wait until an event closer to them comes up - you are
all welcome to apply for ApacheCon US of course, but there must be
compelling reasons for you to attend an event further away than your home
location for your application to be considered above those closer to the
event location.

More information can be found on the main Apache website at
http://www.apache.org/travel/index.html - where you will also find a link to
the application form and details for submitting.

Time is very tight for this event, so applications are open now and will end
on the 2nd October 2008 - to give enough time for travel arrangements to be
made.

Good luck to all those that will apply.

Regards,

The Travel Assistance Committee


Re: Integrating external stemmer in Solr and pre-processing text

2008-09-26 Thread Chris Hostetter

: It is invoking a COM object in Windows. The object is instantiated once for
: a token stream, and then invoked once for each token. The invoke always has
: an overhead, not much to do about that (sigh...)

I also know nothing about COM, but based on your comments it sounds like 
instantiating your COM object is expensive ... so why do it for every
token?  Why not have a TokenFilter where the COM object is constructed
when the TokenFilter is constructed, and then the same object will be 
invoked for each token in the stream for a given field value.

Or better still: if your COM object is threadsafe, construct one in the 
init method for your TokenFilterFactory and reuse it in every TokenFilter 
instance.



-Hoss
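
[Ed.: a minimal sketch of Hoss's second suggestion. ComStemmer and
ComStemFilter are hypothetical stand-ins for the COM wrapper and the
TokenFilter that uses it; the stemmer is built once in init() and shared by
every TokenFilter instance, which is only safe if the COM object really is
threadsafe:]

import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;

public class ComStemFilterFactory extends BaseTokenFilterFactory {
    private ComStemmer stemmer; // hypothetical wrapper around the COM object

    public void init(Map<String, String> args) {
        super.init(args);
        stemmer = new ComStemmer(); // constructed once per factory, not per stream
    }

    public TokenStream create(TokenStream input) {
        // every filter instance reuses the shared stemmer
        return new ComStemFilter(input, stemmer); // hypothetical TokenFilter
    }
}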



Re: Dismax , "query phrases"

2008-09-26 Thread Chris Hostetter

I'm not fully following everything you've got here, but one thing jumped 
out at me...

:

Re: Searching Question

2008-09-26 Thread Jake Conk
Grant,

Each post is its own document but I can merge them all into a single
document under one  thread if that will allow me to do what I want.
The number of replies is stored both in Solr and the DB.

Thanks,

- JC

On Fri, Sep 26, 2008 at 5:24 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> Is a thread and all of it's posts a single document?  In other words, how
> are you modeling your posts as Solr documents?  Also, where are you keeping
> track of the number of replies?  Is that in Solr or in a DB?
>
> -Grant
>
> On Sep 25, 2008, at 8:51 PM, Jake Conk wrote:
>
>> Hello,
>>
>> We are using Solr for our new forums search feature. If possible when
>> searching for the word "Halo" we would like threads that contain the
>> word "Halo" the most with the least amount of posts in that thread to
>> have a higher score.
>>
>> For instance, if we have a thread with 10 posts and the word "Halo"
>> shows up 5 times then that should have a lower score than a thread
>> that has the word "Halo" 3 times within its posts and has 5 replies.
>> Basically the thread that shows the search string most frequently
>> amongst the number of posts in the thread should be the one with the
>> highest score.
>>
>> Is something like this possible?
>>
>> Thanks,
>>
>> - JC
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
>
>
>
>
>


Re: Searching Question

2008-09-26 Thread Otis Gospodnetic
It might be easiest to store the thread ID and the number of replies in the 
thread in each post Document in Solr.

Otherwise it sounds like you'll have to combine some search results or data 
post-search.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Jake Conk <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, September 26, 2008 1:50:37 PM
> Subject: Re: Searching Question
> 
> Grant,
> 
> Each post is its own document but I can merge them all into a single
> document under one  thread if that will allow me to do what I want.
> The number of replies is stored both in Solr and the DB.
> 
> Thanks,
> 
> - JC
> 
> On Fri, Sep 26, 2008 at 5:24 AM, Grant Ingersoll wrote:
> > Is a thread and all of it's posts a single document?  In other words, how
> > are you modeling your posts as Solr documents?  Also, where are you keeping
> > track of the number of replies?  Is that in Solr or in a DB?
> >
> > -Grant
> >
> > On Sep 25, 2008, at 8:51 PM, Jake Conk wrote:
> >
> >> Hello,
> >>
> >> We are using Solr for our new forums search feature. If possible when
> >> searching for the word "Halo" we would like threads that contain the
> >> word "Halo" the most with the least amount of posts in that thread to
> >> have a higher score.
> >>
> >> For instance, if we have a thread with 10 posts and the word "Halo"
> >> shows up 5 times then that should have a lower score than a thread
> >> that has the word "Halo" 3 times within its posts and has 5 replies.
> >> Basically the thread that shows the search string most frequently
> >> amongst the number of posts in the thread should be the one with the
> >> highest score.
> >>
> >> Is something like this possible?
> >>
> >> Thanks,
> >>
> >> - JC
> >
> > --
> > Grant Ingersoll
> > http://www.lucidimagination.com
> >
> > Lucene Helpful Hints:
> > http://wiki.apache.org/lucene-java/BasicsOfPerformance
> > http://wiki.apache.org/lucene-java/LuceneFAQ
> >
> >
> >
> >
> >
> >
> >
> >



Re: Integrating external stemmer in Solr and pre-processing text

2008-09-26 Thread Jaco
The overhead is not in the instantiation, but in the actual call to the COM
object. The approach with one time instantiation in the TokenFilterFactory,
and the use of that object in the TokenFilter is exactly what I tried. There
is a factor of 10 performance gain from doing a single call instead of
calling token-by-token (I also tried this in another environment, Perl,
which gave the same result).

So I guess I'll need to do this with the other approach I suggested.

Bye,

Jaco.

2008/9/26 Chris Hostetter <[EMAIL PROTECTED]>

>
> : It is invoking a COM object in Windows. The object is instantiated once
> for
> : a token stream, and then invoked once for each token. The invoke always
> has
> : an overhead, not much to do about that (sigh...)
>
> I also know nothing about COM, but based on your comments it sounds like
> instantiating your COM object is expensive ... so why do it for every
> token?  Why not have a TokenFilter where the COM object is constructed
> when the TokenFilter is constructed, and then the same object will be
> invoked for each token in the stream for a given field value.
>
> Or better still: if your COM object is threadsafe, construct one in the
> init method for your TokenFilterFactory and reuse it in every TokenFilter
> instance.
>
>
>
> -Hoss
>
>


ApacheCon US promo

2008-09-26 Thread Grant Ingersoll

Cross-posting...

Just wanted to let everyone know that there will be a number of
Lucene/Solr/Mahout/Tika-related talks, training sessions, and Birds of a
Feather (BOF) gatherings at ApacheCon New Orleans this fall.


Details:
When: November 3-7
Where:  Sheraton, New Orleans, USA
URL: http://us.apachecon.com/c/acus2008/

Lucene:

Advanced Indexing Techniques by Michael Busch: 
http://us.apachecon.com/c/acus2008/sessions/7

Lucene Boot Camp (2 day hands-on training by me): 
http://us.apachecon.com/c/acus2008/sessions/69

Solr:

Solr out of the Box by Chris Hostetter: 
http://us.apachecon.com/c/acus2008/sessions/9

Beyond the Box by Hoss: http://us.apachecon.com/c/acus2008/sessions/10

Solr Boot Camp (1 day hands-on training by Erik Hatcher): 
http://us.apachecon.com/c/acus2008/sessions/91

Mahout:

Intro to Mahout and Machine Learning (by me): 
http://us.apachecon.com/c/acus2008/sessions/11

Tika:

Content Analysis for ECM with Apache Tika by Paolo Mottadelli:
http://us.apachecon.com/c/acus2008/sessions/12


There's also one more Lucene session that is TBD, but it will be on
that same Wednesday as everything else.  Chances are it will be an
intro-to-Lucene type talk.



BOFs:  http://wiki.apache.org/apachecon/BirdsOfaFeatherUs08


Cheers,
Grant


Solr performance for Instance updates

2008-09-26 Thread mahendra mahendra
Hi,
 
We want to update the index based on a TIB listener: whenever a database change
happens, we want to update the index instantly, and this may happen very
frequently for a large number of records.
 
Could anyone please tell me what the performance would be like for these scenarios?
 
Question related to linguistic support:
How good is the linguistic support in Solr, and is it compatible with adding
third-party packages (mass Boston, Cambridge, etc.) for linguistic support?
 
Any help would be appreciated!!

Thanks & Regards,
Mahendra


  

Re: Integrating external stemmer in Solr and pre-processing text

2008-09-26 Thread Grant Ingersoll


On Sep 26, 2008, at 12:05 PM, Jaco wrote:


Hi Grant,

In reply to your questions:

1. Are you having to restart/initialize the stemmer every time for your
"slow" approach?  Does that really need to happen?

It is invoking a COM object in Windows. The object is instantiated once for
a token stream, and then invoked once for each token. The invoke always has
an overhead, not much to do about that (sigh...)

2. Can the stemmer return something other than a String?  Say a String array
of all the stemmed words?  Or maybe even some type of object that tells you
the original word and the stemmed word?

The stemmer can only return a String. But, I do know that the returned
string always has exactly the same number of words as the input string. So
logically, it would be possible to:
a) first calculate the position/start/end of each token in the input string
(usual tokenization by Whitespace), resulting in token list 1
b) then invoke the stemmer, and tokenize that result by Whitespace,
resulting in token list 2
c) 'merge' the token values of token list 2 into token list 1, which is
possible because each token's position is the same in both lists...
d) return that 'merged' token list 2 for further processing

Would this work in Solr?


I think so, assuming your stemmer tokenizes on whitespace as well.




I can do some Java coding to achieve that from a logical point of view, but I
wouldn't know how to structure this flow into the MyTokenizerFactory, so
some hints on how to achieve that would be great!



One thought:
Don't create an all-in-one Tokenizer.  Instead, keep the Whitespace
Tokenizer as is.  Then, create a TokenFilter that buffers the whole
document into memory (via the next() implementation) and also builds,
using a StringBuilder, a string containing the whole text.  Once you've
read it all in, send the string to your stemmer, parse the result back
out, and associate it back to your token buffer.  If you are guaranteed
position, you could even keep a (linked) hash, such that it is really
quick to look up tokens after stemming.


Pseudocode looks something like:

int position = 0;
Token token;
while ((token = input.next()) != null) {
    position += token.getPositionIncrement();
    tokenMap.put(position, token);
    stringBuilder.append(' ').append(token.termText());
}
stemmedText = comObj.stem(stringBuilder.toString());
correlateStemmedText(stemmedText, tokenMap);

spit out the tokens one by one...


I think this approach should be fast (but maybe not as fast as your
all-in-one tokenizer) and will provide the correct positions and
offsets.  You do have to be careful w/ really big documents, as that
map can be big.  You also want to be careful about map reuse, token
reuse, etc.


I believe there are a couple of buffering TokenFilters in Solr that you
could examine for inspiration.  I think the RemoveDuplicatesTokenFilter
(or whatever it's called) does buffering.
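
For what it's worth, correlateStemmedText above is just a name; a hedged
sketch of it (assuming the stemmer really does return exactly one word per
input word, and that the buffered tokens live in an insertion-ordered map)
could look like:

import java.util.LinkedHashMap;
import org.apache.lucene.analysis.Token;

class StemCorrelator
{
    // Overwrite each buffered token's text with the corresponding word of
    // the stemmed output; the original offsets and positions stay untouched,
    // which is what keeps highlighting correct.
    static void correlateStemmedText(String stemmedText,
                                     LinkedHashMap<Integer, Token> tokenMap)
    {
        String[] stemmedWords = stemmedText.trim().split("\\s+");
        int i = 0;
        for (Token token : tokenMap.values()) {
            if (i >= stemmedWords.length) break;  // defensive: counts should match
            token.setTermText(stemmedWords[i++]);
        }
    }
}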


-Grant






Thanks for helping out!

Jaco.


2008/9/26 Grant Ingersoll <[EMAIL PROTECTED]>



On Sep 26, 2008, at 9:40 AM, Jaco wrote:

Hi,


Here's some of the code of my Tokenizer:

public class MyTokenizerFactory extends BaseTokenizerFactory
{
    public WhitespaceTokenizer create(Reader input)
    {
        String text, normalizedText;

        try {
            text           = IOUtils.toString(input);
            normalizedText = *invoke my stemmer(text)*;
        }
        catch( IOException ex ) {
            throw new SolrException( SolrException.ErrorCode.SERVER_ERROR, ex );
        }

        StringReader stringReader = new StringReader(normalizedText);

        return new WhitespaceTokenizer(stringReader);
    }
}

I see what's going on in the analysis tool now, and I think I understand the
problem. For instance, take the text: abcdxxx defgxxx. Let's assume the
stemmer gets rid of xxx.

I would then see this in the analysis tool after the tokenizer stage:

- abcd - term position 1; start: 1; end: 3
- defg - term position 2; start: 4; end: 7

These positions are not in line with the initial search text - this must be
why the highlighting goes wrong. I guess my little trick to do this was a
bit too simple, because it messes up the positions, basically because
something different from the original source text is tokenized.



Yes, this is exactly the problem.  I don't know enough about com4j or your
stemmer, but some things come to mind:

1. Are you having to restart/initialize the stemmer every time for your
"slow" approach?  Does that really need to happen?
2. Can the stemmer return something other than a String?  Say a String
array of all the stemmed words?  Or maybe even some type of object that
tells you the original word and the stemmed word?

-Grant



--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









Re: Solr performance for Instance updates

2008-09-26 Thread Otis Gospodnetic
Hi,



- Original Message 
> From: mahendra mahendra <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, September 26, 2008 3:52:57 PM
> Subject: Solr performance for Instance updates
> 
> Hi,
>  
> We want to update the index based on TIB listener, whenever database changes 
> happens we want to update my index instantly this may happen very frequently 
> for 
> number of records.
>  
> Could anyone please tell me how would be the performance for these scenarios?

It depends on the complexity of your analyzers, but I've seen indexing at over 
500 docs/second on modest hardware.

> Question related linguistic support
> How is linguistic support for Solr and is it compatibility to add third party 
> packages(mass Boston, Cambridge,..etc for linguistic support)

Could you please be more specific about what type of linguistic support you are 
after?  Are you after NE extraction?  If so, no such thing built into Solr, but 
it can certainly be built and integrated with Solr. ;)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


Re: Integrating external stemmer in Solr and pre-processing text

2008-09-26 Thread Jaco
Thanks for these suggestions, will try it in the coming days and post my
findings in this thread.

Bye,

Jaco.

2008/9/26 Grant Ingersoll <[EMAIL PROTECTED]>

>
> On Sep 26, 2008, at 12:05 PM, Jaco wrote:
>
>  Hi Grant,
>>
>> In reply to your questions:
>>
>> 1. Are you having to restart/initialize the stemmer every time for your
>> "slow" approach?  Does that really need to happen?
>>
>> It is invoking a COM object in Windows. The object is instantiated once
>> for
>> a token stream, and then invoked once for each token. The invoke always
>> has
>> an overhead, not much to do about that (sigh...)
>>
>> 2. Can the stemmer return something other than a String?  Say a String
>> array
>> of all the stemmed words?  Or maybe even some type of object that tells
>> you
>> the original word and the stemmed word?
>>
>> The stemmer can only return a String. But, I do know that the returned
>> string always has exactly the same number of words as the input string. So
>> logically, it would be possible to :
>> a) first calculate the position/start/end of each token in the input
>> string
>> (usual tokenization by Whitespace), resulting in token list 1
>> b) then invoke the stemmer, and tokenize that result by Whitespace,
>> resulting in token list 2
>> c) 'merge' the token values of token list 2 into token list 1, which is
>> possible because each token's position is the same in both lists...
>> d) return that 'merged' token list 2 for further processing
>>
>> Would this work in Solr?
>>
>
> I think so, assuming your stemmer tokenizes on whitespace as well.
>
>
>>
>> I can do some Java coding to achieve that from logical point of view, but
>> I
>> wouldn't know how to structure this flow into the MyTokenizerFactory, so
>> some hints to achieve that would be great!
>>
>
>
> One thought:
> Don't create an all in one Tokenizer.  Instead, keep the Whitespace
> Tokenizer as is.  Then, create a TokenFilter that buffers the whole document
> into memory (via the next() implementation) and also creates, using
> StringBuilder, a string containing the whole text.  Once you've read it all
> in, then send the string to your stemmer, parse it back out and associate it
> back to your token buffer.  If you are guaranteed position, you could even
> keep a (linked) hash, such that it is really quick to look up tokens after
> stemming.
>
> Pseudocode looks something like:
>
> int position = 0;
> Token token;
> while ((token = input.next()) != null) {
>     position += token.getPositionIncrement();
>     tokenMap.put(position, token);
>     stringBuilder.append(' ').append(token.termText());
> }
> stemmedText = comObj.stem(stringBuilder.toString())
> correlateStemmedText(stemmedText, tokenMap);
>
> spit out the tokens one by one...
>
>
> I think this approach should be fast (but maybe not as fast as your all in
> one tokenizer) and will provide the correct position and offsets.  You do
> have to be careful w/ really big documents, as that map can be big.  You
> also want to be careful about map reuse, token reuse, etc.
>
> I believe there are a couple of buffering TokenFilters in Solr that you
> could examine for inspiration.  I think the RemoveDuplicatesTokenFilter (or
> whatever it's called) does buffering.
>
> -Grant
>
>
>
>
>
>>
>> Thanks for helping out!
>>
>> Jaco.
>>
>>
>> 2008/9/26 Grant Ingersoll <[EMAIL PROTECTED]>
>>
>>
>>> On Sep 26, 2008, at 9:40 AM, Jaco wrote:
>>>
>>> Hi,
>>>

>>>> Here's some of the code of my Tokenizer:
>>>>
>>>> public class MyTokenizerFactory extends BaseTokenizerFactory
>>>> {
>>>>     public WhitespaceTokenizer create(Reader input)
>>>>     {
>>>>         String text, normalizedText;
>>>>
>>>>         try {
>>>>             text           = IOUtils.toString(input);
>>>>             normalizedText = *invoke my stemmer(text)*;
>>>>         }
>>>>         catch( IOException ex ) {
>>>>             throw new SolrException( SolrException.ErrorCode.SERVER_ERROR,
>>>>                                      ex );
>>>>         }
>>>>
>>>>         StringReader stringReader = new StringReader(normalizedText);
>>>>
>>>>         return new WhitespaceTokenizer(stringReader);
>>>>     }
>>>> }
>>>>
>>>> I see what's going on in the analysis tool now, and I think I understand
>>>> the problem. For instance, take the text: abcdxxx defgxxx. Let's assume
>>>> the stemmer gets rid of xxx.
>>>>
>>>> I would then see this in the analysis tool after the tokenizer stage:
>>>> - abcd - term position 1; start: 1; end: 3
>>>> - defg - term position 2; start: 4; end: 7
>>>>
>>>> These positions are not in line with the initial search text - this must
>>>> be why the highlighting goes wrong. I guess my little trick to do this
>>>> was a bit too simple, because it messes up the positions, basically
>>>> because something different from the original source text is tokenized.


>>> Yes, this is exactly the problem.  I don't know enough about com4j or your
>>> stemmer, but some things come to mind:
>>>
>>> 1. Are you having to restart/initialize the stemmer every time for your
>>> "slow" approach?  Does that really need to happen?
>>> 2. Can the stemmer return something other than a String?  

Re: Any problem in running two solr instances on the same machine using the same directory?

2008-09-26 Thread Yonik Seeley
On Fri, Sep 26, 2008 at 2:18 AM, Jagadish Rath <[EMAIL PROTECTED]> wrote:
>   - What are the other solutions to the problem of the "maxWarmingSearchers
>   limit exceeded" error?

Don't commit so rapidly?
What is the reason for your high commit rate?
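
For reference, both knobs live in solrconfig.xml; a sketch (the values are
illustrative only, and raising maxWarmingSearchers usually just masks an
overly aggressive commit rate rather than fixing it):

<!-- inside the <query> section -->
<maxWarmingSearchers>2</maxWarmingSearchers>

<!-- inside <updateHandler class="solr.DirectUpdateHandler2">:
     batch commits instead of committing per document -->
<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>60000</maxTime> <!-- ms -->
</autoCommit>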

-Yonik


Re: Solr performance for Instance updates

2008-09-26 Thread mahendra mahendra
Hi,
 
I want to update each doc instantly (based on db changes) and commit; I suppose
every commit will take more time. I don't want to post docs in bulk and then commit.
 
How would the performance be for this scenario?
Also, every time I update the docs, the index size is going to increase,
since the old doc is not physically deleted.
 
Any idea on this?
 

Thanks & Regards,
Mahendra

--- On Sat, 9/27/08, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

From: Otis Gospodnetic <[EMAIL PROTECTED]>
Subject: Re: Solr performance for Instance updates
To: solr-user@lucene.apache.org
Date: Saturday, September 27, 2008, 2:19 AM

Hi,



- Original Message 
> From: mahendra mahendra <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, September 26, 2008 3:52:57 PM
> Subject: Solr performance for Instance updates
> 
> Hi,
>  
> We want to update the index based on TIB listener, whenever database
changes 
> happens we want to update my index instantly this may happen very
frequently for 
> number of records.
>  
> Could anyone please tell me how would be the performance for these
scenarios?

It depends on the complexity of your analyzers, but I've seen indexing at
over 500 docs/second on modest hardware.

> Question related linguistic support
> How is linguistic support for Solr and is it compatibility to add third
party 
> packages(mass Boston, Cambridge,..etc for linguistic support)

Could you please be more specific about what type of linguistic support you are
after?  Are you after NE extraction?  If so, no such thing built into Solr, but
it can certainly be built and integrated with Solr. ;)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



  

Re: Solr performance for Instance updates

2008-09-26 Thread Otis Gospodnetic
You can add in real-time.  You are thinking of "commit" as an RDBMS commit, I
assume.  That happens "automatically".  Solr has a notion of "commit", too, but
it's different from the DB one.  I have a feeling you haven't really looked at
the Solr tutorial yet.  Want to give that a try first?
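
As a concrete illustration (standard Solr 1.x update handler; the URL assumes
the tutorial's example port): an explicit commit makes pending changes visible
to searchers, and an occasional optimize merges segments and physically purges
the old versions of updated/deleted docs, which addresses the index-growth
concern:

POST to http://localhost:8983/solr/update with body:  <commit/>
POST to http://localhost:8983/solr/update with body:  <optimize/>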


Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: mahendra mahendra <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, September 26, 2008 6:30:27 PM
> Subject: Re: Solr performance for Instance updates
> 
> Hi,
>  
> Instantly I want to update each doc(based on db changes) and commit, I hope 
> for 
> every commit it takes more time.I don't want to post some bulk docs and 
> commit.
>  
> How can be the performance for this scenario...
> Also every time if I am going to update the docs the index size is going to 
> increase, the old doc is physically not going to delete.
>  
> Any Idea on this..
>  
> 
> Thanks & Regards,
> Mahendra
> 
> --- On Sat, 9/27/08, Otis Gospodnetic wrote:
> 
> From: Otis Gospodnetic 
> Subject: Re: Solr performance for Instance updates
> To: solr-user@lucene.apache.org
> Date: Saturday, September 27, 2008, 2:19 AM
> 
> Hi,
> 
> 
> 
> - Original Message 
> > From: mahendra mahendra 
> > To: solr-user@lucene.apache.org
> > Sent: Friday, September 26, 2008 3:52:57 PM
> > Subject: Solr performance for Instance updates
> > 
> > Hi,
> >  
> > We want to update the index based on TIB listener, whenever database
> changes 
> > happens we want to update my index instantly this may happen very
> frequently for 
> > number of records.
> >  
> > Could anyone please tell me how would be the performance for these
> scenarios?
> 
> It depends on the complexity of your analyzers, but I've seen indexing at
> over 500 docs/second on modest hardware.
> 
> > Question related linguistic support
> > How is linguistic support for Solr and is it compatibility to add third
> party 
> > packages(mass Boston, Cambridge,..etc for linguistic support)
> 
> Could you please be more specific about what type of linguistic support you 
> are
> after?  Are you after NE extraction?  If so, no such thing built into Solr, 
> but
> it can certainly be built and integrated with Solr. ;)
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch