Query boosting

2010-02-18 Thread deepak agrawal
Hi,

I want to boost results through the query.
I have 4 fields in our schema.






If I search for *deepak*, then results should come back in this order:


All *UPDBY* containing deepak, then
All *To* containing deepak, then
All *CC* containing deepak, then
All *BCC* containing deepak.

I am using the standard request handler. Please help me with this.
-- 
DEEPAK AGRAWAL
+91-9379433455
GOOD LUCK.


Re: labeling facets and highlighting question

2010-02-18 Thread gwk

There's a ! missing in there, try {!key=label}.
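For example, a labelled facet field (the field and label names here are only placeholders) looks like:

facet.field={!key=Owner}owner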

Regards,

gwk

On 2/18/2010 5:01 AM, adeelmahmood wrote:

okay so if I don't want to do any excludes then I am assuming I should just
put in {key=label}field .. I tried that and it doesn't work .. it says
undefined field {key=label}field


Lance Norskog-2 wrote:
   

Here's the problem: the wiki page is confusing:

http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters

The line:
q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype

is standalone, but the later line:

facet.field={!ex=dt key=mylabel}doctype

means 'change the facet.field in the long query from {!ex=dt}doctype to {!ex=dt
key=mylabel}doctype'

'tag=dt' creates a tag (name) for a filter query, and 'ex=dt' means
'exclude this filter query'.
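Putting those two pieces together, the full request with the tagged filter and the labelled, excluded facet would look something like:

q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt key=mylabel}doctype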

On Wed, Feb 17, 2010 at 4:30 PM, adeelmahmood
wrote:
 

Simple question: I want to give a label to my facet queries instead of the
name of the facet field. I found documentation on the Solr site saying I can
do that by specifying the key local param, with syntax something like
facet.field={!ex=dt%20key='By%20Owner'}owner

I am just not sure what the ex=dt part does. If I take it out, it throws
an error, so it seems important, but what for?

Also, I tried turning on highlighting and I can see that it adds the
highlighting items list at the end of the XML, but it only points out the
ids of all the matching results. It doesn't actually show the text data
that it is matching against, so I am getting something like this back


  
  
...

instead of the actual text that is being matched. Isn't it supposed to
wrap the search terms in an em tag? How come it's not doing that in
my case?

here is my schema





--
View this message in context:
http://old.nabble.com/labeling-facets-and-highlighting-question-tp27632747p27632747.html
Sent from the Solr - User mailing list archive at Nabble.com.


   



--
Lance Norskog
goks...@gmail.com


 
   




Re: Upgrading Tika in Solr

2010-02-18 Thread Christian Vogler
Just a word of caution: I've been bitten by this bug, which affects Tika 0.6: 
https://issues.apache.org/jira/browse/PDFBOX-541

It causes the parser to go into an infinite loop, which isn't exactly great 
for server stability. Tika 0.4 is not affected in the same way - as far as I 
remember, the parser just fails on such PDF files.

According to the Tika folks, PDFBox and Tika releases need to be synchronized, 
so it might be wise to hold off upgrading until the next Tika version has been 
released that contains the fixed PDFBox.

Best regards
- Christian


On Wednesday 17 February 2010 11:40:50 am Liam O'Boyle wrote:
> I just copied in the newer .jars and got rid of the old ones and
> everything seemed to work smoothly enough.
> 
> Liam
> 
> On Tue, 2010-02-16 at 13:11 -0500, Grant Ingersoll wrote:
> > I've got a task open to upgrade to 0.6.  Will try to get to it this week.
> >  Upgrading is usually pretty trivial.
> >
> > On Feb 14, 2010, at 12:37 AM, Liam O'Boyle wrote:
> > > Afternoon,
> > >
> > > I've got a large collections of documents which I'm attempting to add
> > > to a Solr index using Tika via the ExtractingRequestHandler, but there
> > > are a large number that it has problems with (PDFs, PPTX and XLS
> > > documents mainly).
> > >
> > > I've tried them with the most recent stand alone version of Tika and it
> > > handles most of the failing documents correctly.  I tried using a
> > > recent nightly build of Solr, but the same problems seem to occur.
> > >
> > > Are there instructions somewhere on installing a more recent Tika build
> > > into Solr?
> > >
> > > Thanks,
> > > Liam
> >
> > --
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem using Solr/Lucene:
> > http://www.lucidimagination.com/search
> 

-- 
Christian Vogler, Ph.D.
Institute for Language and Speech Processing, Athens, Greece


Re: Query boosting

2010-02-18 Thread Paul Dhaliwal
Try using the dismax handler
http://wiki.apache.org/solr/DisMaxRequestHandler

This would be a very good read for you.

You would use the bq (boost query) parameter, and it should look something like:

&bq=UPDBY:deepak^5.0+TO:deepak^4.0+CC:deepak^3.0+BCC:deepak^2.0
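A complete request might then look something like the following (the defType parameter and the qf field list are my assumptions, not part of the original question; adjust the boosts to taste):

http://localhost:8983/solr/select?defType=dismax&q=deepak&qf=UPDBY+TO+CC+BCC&bq=UPDBY:deepak^5.0+TO:deepak^4.0+CC:deepak^3.0+BCC:deepak^2.0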

Paul

On Thu, Feb 18, 2010 at 12:28 AM, deepak agrawal  wrote:

> Hi,
>
> i want to boost the result through query.
> i have 4 fields in our schema.
>
> 
> 
> 
> 
>
> If i search *deepak* then result should come in that order  -
>
>
> All *UPDBY* having deepak then
> All *To* having deepak then
> All *CC* having deepak
> All *BCC* having deepak
>
> I am using Standard request handler. Please help me on this.
> --
> DEEPAK AGRAWAL
> +91-9379433455
> GOOD LUCK.
>


Re: getting unexpected statscomponent values

2010-02-18 Thread Koji Sekiguchi

solr-user wrote:
Hossman, what do you mean by including a "TestCase"?  


Will create issue in Jira asap; I will include the URL, schema and some code
to generate sample data
  

I think those would be good enough for a test case.

Koji

--
http://www.rondhuit.com/en/



java.io.IOException: read past EOF after Solr 1.4.0

2010-02-18 Thread Koji Sekiguchi
Using release-1.4.0 or trunk Solr, indexing the
example data, and searching for a zero-boosted word:

http://localhost:8983/solr/select/?q=usb^0.0

I got the following exception:

java.io.IOException: read past EOF
at
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:163)
at
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
at
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:70)
at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:93)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:210)
at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948)
at
org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506)
at org.apache.lucene.index.IndexReader.document(IndexReader.java:947)
at org.apache.solr.search.SolrIndexReader.document(SolrIndexReader.java:444)
at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:427)
at
org.apache.solr.util.SolrPluginUtils.optimizePreFetchDocs(SolrPluginUtils.java:267)
at
org.apache.solr.handler.component.QueryComponent.doPrefetch(QueryComponent.java:278)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:185)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1313)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

This cannot be reproduced with release-1.3.0...

Koji

-- 
http://www.rondhuit.com/en/



some scores to 0 using omitNorns=false

2010-02-18 Thread Raimon Bosch


Hi,

We did some tests with omitNorms=false. We have seen that on the last
page of results we have some scores set to 0.0. These scores set to 0 are
problematic for our sorters.

Could it be some kind of bug?

Regards,
Raimon Bosch.
-- 
View this message in context: 
http://old.nabble.com/some-scores-to-0-using-omitNorns%3Dfalse-tp27637436p27637436.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: java.io.IOException: read past EOF after Solr 1.4.0

2010-02-18 Thread Yonik Seeley
2010/2/18 Koji Sekiguchi :
> Using release-1.4.0 or trunk branch Solr and indexing
> example data and search 0 boosted word:
>
> http://localhost:8983/solr/select/?q=usb^0.0

Confirmed - looks like Solr is requesting an incorrect docid.
I'm looking into it.

-Yonik
http://www.lucidimagination.com


score computation for dismax handler

2010-02-18 Thread bharath venkatesh
Hi,
  When a query is made across multiple fields in the dismax handler using the
qf parameter, I have observed (with debugQuery enabled) that the resultant
score is the maximum of the per-field scores, but I want the resultant score
to be the sum of the scores across fields (like the standard handler).
Can anyone tell me how this can be achieved?
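One knob that may apply here (an assumption on my part, not something confirmed in this thread): dismax builds a DisjunctionMaxQuery, and its tie parameter controls how much the non-maximum field scores contribute. With tie=1.0 the per-field scores are effectively summed instead of only the maximum being used, e.g.

q=ipod&defType=dismax&qf=name^2+features+text&tie=1.0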


Re: Realtime search and facets with very frequent commits

2010-02-18 Thread Otis Gospodnetic
Hi Janne,

I *think*  Ocean Realtime Search has been superseded by Lucene NRT search.

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: Janne Majaranta 
> To: solr-user@lucene.apache.org
> Sent: Thu, February 18, 2010 2:12:37 AM
> Subject: Re: Realtime search and facets with very frequent commits
> 
> Hi,
> 
> Yes, I did play with mergeFactor.
> I didn't play with mergePolicy.
> 
> Wouldn't that affect indexing speed and possibly memory usage ?
> I don't have any problems with indexing speed ( 1000 - 2000 docs / sec via
> the standard HTTP API ).
> 
> My problem is that I need very warm caches to get fast faceting, and the
> autowarming of the caches takes too long compared to the frequency of
> commits I'm having.
> So a commit every minute means less than a minute time to warm the caches.
> 
> To give you a idea of what kind of queries needs to be autowarmed in my app,
> the logevents indexed as documents have timestamps with different
> granularity used for faceting.
> For example, to get count of logevents for every hour using faceting there's
> a timestamp field with the format mmddhh ( for example: 2010021808
> meaning 2010-02-18 8am).
> One use case is to get hourly counts over the whole index. A non-cached
> query counting the hourly counts over the 40M documents index takes a
> while..
> And to my understanding autowarming means something like that this kind of
> query would be basically re-executed against a cold cache. Probably not
> exactly how it works, but it "feels" like it would.
> 
> Moving the commits to a smaller index while using sharding to have a
> transparent view to the index from the client app seems to solve my problem.
> 
> I'm not sure if the (upcoming?) NRT features would keep the caches more
> persistent, probably not in a environment where docs get frequent updates /
> deletes.
> 
> Also, I'm closely following the Ocean Realtime Search project AND it's SOLR
> integration. It sounds like it has the "dream features" to enable realtime
> updates to the index.
> 
> -Janne
> 
> 
> 2010/2/18 Jan Høydahl / Cominvent 
> 
> > Hi,
> >
> > Have you tried playing with mergeFactor or even mergePolicy?
> >
> > --
> > Jan Høydahl  - search architect
> > Cominvent AS - www.cominvent.com
> >
> > On 16. feb. 2010, at 08.26, Janne Majaranta wrote:
> >
> > > Hey Dipti,
> > >
> > > Basically query optimizations + setting cache sizes to a very high level.
> > > Other than that, the config is about the same as the out-of-the-box
> > config
> > > that comes with the Solr download.
> > >
> > > I haven't found a magic switch to get very fast query responses + facet
> > > counts with the frequency of commits I'm having using one single SOLR
> > > instance.
> > > Adding some TOP queries for a certain type of user to static warming
> > queries
> > > just moved the time of autowarming the caches to the time it took to warm
> > > the caches with static queries.
> > > I've been staging a setup where there's a small solr instance receiving
> > all
> > > the updates and a large instance which doesn't receive the live feed of
> > > updates.
> > > The small index will be merged with the large index periodically (once a
> > > week or once a month).
> > > The two instances are seen by the client app as one instance using the
> > > sharding features of SOLR.
> > > The instances are running on the same server inside their own JVM /
> > jetty.
> > >
> > > In this setup the caches are very HOT for the large index and queries are
> > > extremely fast, and the small index is small enough to get extremely fast
> > > queries without having to warm up the caches too much.
> > >
> > > Basically I'm able to have a commit frequency of 10 seconds in a 40M docs
> > > index while counting TOP5 facets over 14 fields in 200ms.
> > > In reality the commit frequency of 10 seconds comes from the fact that
> > the
> > > updates are going into a 1M - 2M documents index, and the fast facet
> > counts
> > > from the fact that the 38M documents index has hot caches and doesn't
> > > receive any updates.
> > >
> > > Also, not running updates to the large index means that the SOLR instance
> > > reading the large index uses about half the memory it used before when
> > > running the updates to the large index. At least it does so on Win2k3.
> > >
> > > -Janne
> > >
> > >
> > > 2010/2/15 dipti khullar 
> > >
> > >> Hey Janne
> > >>
> > >> Can you please let me know what other optimizations are you talking
> > about
> > >> here. Because in our application we are committing in about 5 mins but
> > >> still
> > >> the response time is very low and at times there are some connection
> > time
> > >> outs also.
> > >>
> > >> Just wanted to confirm if you have done some major configuration changes
> > >> which have proved beneficial.
> > >>
> > >> Thanks
> > >> Dipti
> > >>
> > >>
> >
> >



Re: parsing strings into phrase queries

2010-02-18 Thread Robert Muir
I gave it a rough shot, Lance; if there's a better way to explain it, please edit.

On Wed, Feb 17, 2010 at 10:23 PM, Lance Norskog  wrote:

> That would be great. After reading this and the PositionFilter class I
> still don't know how to use it.
>
> On Wed, Feb 17, 2010 at 12:38 PM, Robert Muir  wrote:
> > i think we can improve the docs/wiki to show this example use case, i
> > noticed the wiki explanation for this filter gives a more complex
> shingles
> > example, which is interesting, but this seems to be a common problem and
> > maybe we should add this use case.
> >
> > On Wed, Feb 17, 2010 at 1:54 PM, Chris Hostetter
> > wrote:
> >
> >>
> >> : take a look at PositionFilter
> >>
> >> Right, there was another thread recently where almost the exact same
> issue
> >> was discussed...
> >>
> >> http://old.nabble.com/Re%3A-Tokenizer-question-p27120836.html
> >>
> >> ..except that i was ignorant of the existence of PositionFilter when i
> >> wrote that message.
> >>
> >>
> >>
> >> -Hoss
> >>
> >>
> >
> >
> > --
> > Robert Muir
> > rcm...@gmail.com
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Robert Muir
rcm...@gmail.com


Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-18 Thread Otis Gospodnetic
Hi Tom,

32MB is very low, 320MB is medium, and I think you could go higher, just pick 
whichever garbage collector is good for throughput.  I know Java 1.6 update 18 
also has some Hotspot and maybe also GC fixes, so I'd use that.  Finally, this 
sounds like a good use case for reindexing with Hadoop!
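For reference, the buffer is set in solrconfig.xml; a minimal sketch, with 320 used only because it is the figure under discussion:

<indexDefaults>
  ...
  <ramBufferSizeMB>320</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
</indexDefaults>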

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: "Burton-West, Tom" 
> To: "solr-user@lucene.apache.org" 
> Sent: Wed, February 17, 2010 5:16:26 PM
> Subject: What is largest reasonable setting for ramBufferSizeMB?
> 
> Hello all,
> 
> At some point we will need to re-build an index that totals about 2 
> terrabytes 
> in size (split over 10 shards).  At our current indexing speed we estimate 
> that 
> this will take about 3 weeks.  We would like to reduce that time.  It appears 
> that our main bottleneck is disk I/O.
> We currently have ramBufferSizeMB set to 32 and our merge factor is 10.  If 
> we 
> increase ramBufferSizeMB to 320, we avoid a merge and the 9 disk writes and 
> reads to merge 9+1 32MB segments into a 320MB segment.
> 
> Assuming we allocate enough memory to the JVM, would it make sense to 
> increase 
> ramBufferSize to 3200MB?   What are people's experiences with very large 
> ramBufferSizeMB sizes?
> 
> Tom Burton-West
> University of Michigan Library
> www.hathitrust.org



Re: optimize is taking too much time

2010-02-18 Thread Jagdish Vasani
Hi,

You should not optimize the index after each document insert; instead, you
should optimize after inserting a good number of documents, because an
optimize merges all segments into one according to the settings
of the Lucene index.
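A minimal sketch of that pattern: post your batches of documents first, then send a single commit and (optionally) an optimize to the update handler, for example:

curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<commit/>'
curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<optimize/>'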

thanks,
Jagdish
On Fri, Feb 12, 2010 at 4:01 PM, mklprasad  wrote:

>
> hi
> in my solr u have 1,42,45,223 records having some 50GB .
> Now when iam loading a new record and when its trying optimize the docs its
> taking 2 much memory and time
>
>
> can any body please tell do we have any property in solr to get rid of
> this.
>
> Thanks in advance
>
> --
> View this message in context:
> http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27561570.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: getting unexpected statscomponent values

2010-02-18 Thread Erick Erickson
SOLR makes heavy use of JUnit for testing. The real advantage
of a JUnit testcase being attached is that it can then be
permanently incorporated into the SOLR builds. If you're
unfamiliar with JUnit, then providing the raw data that illustrates
the bug allows people who work on SOLR to save a bunch
of time trying to reproduce the problem. It also ensures that
they are addressing what you're seeing ...

It's especially helpful if you can take a bit of time to pare away
all the unnecessary stuff in your example files and/or comment
what you think the important bits are.

HTH
Erick

On Wed, Feb 17, 2010 at 5:46 PM, solr-user  wrote:

>
>
> hossman wrote:
> >
> >
> > That does look really weird, and definitely seems like a bug.
> >
> > Can you open an issue in Jira? ... ideally with a TestCase (even if it's
> > not a JUnit test case, just having some sample docs that can be indexed
> > against the example schema and a URL showing the problem would be
> helpful)
> >
> >
>
> Hossman, what do you mean by including a "TestCase"?
>
> Will create issue in Jira asap; I will include the URL, schema and some
> code
> to generate sample data
> --
> View this message in context:
> http://old.nabble.com/getting-unexpected-statscomponent-values-tp27599248p27631633.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: some scores to 0 using omitNorns=false

2010-02-18 Thread adeelmahmood

I was going to ask a question about this, but you seem like you might have the
answer for me. What exactly does the omitNorms option do (or what is it
expected to do)? Also, could you please help me understand what the
termVectors and multiValued options do?
Thanks for your help


Raimon Bosch wrote:
> 
> 
> Hi,
> 
> We did some tests with omitNorms=false. We have seen that in the last
> result's page we have some scores set to 0.0. This scores setted to 0 are
> problematic to our sorters.
> 
> It could be some kind of bug?
> 
> Regrads,
> Raimon Bosch.
> 

-- 
View this message in context: 
http://old.nabble.com/some-scores-to-0-using-omitNorns%3Dfalse-tp27637436p27637819.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Realtime search and facets with very frequent commits

2010-02-18 Thread Janne Majaranta
Hi Otis,

Ok, now I'm confused ;)
There seems to be a bit of activity though, looking at the "last updated"
timestamps in the Google Code project wiki:
http://code.google.com/p/oceansearch/w/list

The Tag Index feature sounds very interesting.

-Janne


2010/2/18 Otis Gospodnetic 

> Hi Janne,
>
> I *think*  Ocean Realtime Search has been superseded by Lucene NRT search.
>
>  Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
>
>
>
> - Original Message 
> > From: Janne Majaranta 
> > To: solr-user@lucene.apache.org
> > Sent: Thu, February 18, 2010 2:12:37 AM
> > Subject: Re: Realtime search and facets with very frequent commits
> >
> > Hi,
> >
> > Yes, I did play with mergeFactor.
> > I didn't play with mergePolicy.
> >
> > Wouldn't that affect indexing speed and possibly memory usage ?
> > I don't have any problems with indexing speed ( 1000 - 2000 docs / sec
> via
> > the standard HTTP API ).
> >
> > My problem is that I need very warm caches to get fast faceting, and the
> > autowarming of the caches takes too long compared to the frequency of
> > commits I'm having.
> > So a commit every minute means less than a minute time to warm the
> caches.
> >
> > To give you a idea of what kind of queries needs to be autowarmed in my
> app,
> > the logevents indexed as documents have timestamps with different
> > granularity used for faceting.
> > For example, to get count of logevents for every hour using faceting
> there's
> > a timestamp field with the format mmddhh ( for example: 2010021808
> > meaning 2010-02-18 8am).
> > One use case is to get hourly counts over the whole index. A non-cached
> > query counting the hourly counts over the 40M documents index takes a
> > while..
> > And to my understanding autowarming means something like that this kind
> of
> > query would be basically re-executed against a cold cache. Probably not
> > exactly how it works, but it "feels" like it would.
> >
> > Moving the commits to a smaller index while using sharding to have a
> > transparent view to the index from the client app seems to solve my
> problem.
> >
> > I'm not sure if the (upcoming?) NRT features would keep the caches more
> > persistent, probably not in a environment where docs get frequent updates
> /
> > deletes.
> >
> > Also, I'm closely following the Ocean Realtime Search project AND it's
> SOLR
> > integration. It sounds like it has the "dream features" to enable
> realtime
> > updates to the index.
> >
> > -Janne
> >
> >
> > 2010/2/18 Jan Høydahl / Cominvent
> >
> > > Hi,
> > >
> > > Have you tried playing with mergeFactor or even mergePolicy?
> > >
> > > --
> > > Jan Høydahl  - search architect
> > > Cominvent AS - www.cominvent.com
> > >
> > > On 16. feb. 2010, at 08.26, Janne Majaranta wrote:
> > >
> > > > Hey Dipti,
> > > >
> > > > Basically query optimizations + setting cache sizes to a very high
> level.
> > > > Other than that, the config is about the same as the out-of-the-box
> > > config
> > > > that comes with the Solr download.
> > > >
> > > > I haven't found a magic switch to get very fast query responses +
> facet
> > > > counts with the frequency of commits I'm having using one single SOLR
> > > > instance.
> > > > Adding some TOP queries for a certain type of user to static warming
> > > queries
> > > > just moved the time of autowarming the caches to the time it took to
> warm
> > > > the caches with static queries.
> > > > I've been staging a setup where there's a small solr instance
> receiving
> > > all
> > > > the updates and a large instance which doesn't receive the live feed
> of
> > > > updates.
> > > > The small index will be merged with the large index periodically
> (once a
> > > > week or once a month).
> > > > The two instances are seen by the client app as one instance using
> the
> > > > sharding features of SOLR.
> > > > The instances are running on the same server inside their own JVM /
> > > jetty.
> > > >
> > > > In this setup the caches are very HOT for the large index and queries
> are
> > > > extremely fast, and the small index is small enough to get extremely
> fast
> > > > queries without having to warm up the caches too much.
> > > >
> > > > Basically I'm able to have a commit frequency of 10 seconds in a 40M
> docs
> > > > index while counting TOP5 facets over 14 fields in 200ms.
> > > > In reality the commit frequency of 10 seconds comes from the fact
> that
> > > the
> > > > updates are going into a 1M - 2M documents index, and the fast facet
> > > counts
> > > > from the fact that the 38M documents index has hot caches and doesn't
> > > > receive any updates.
> > > >
> > > > Also, not running updates to the large index means that the SOLR
> instance
> > > > reading the large index uses about half the memory it used before
> when
> > > > running the updates to the large index. At least it does so on
> Win2k3.
> > > >
> > > > -Janne
> > > >
> > > >
> > > > 2010/2/15 dipti

Re: some scores to 0 using omitNorns=false

2010-02-18 Thread Raimon Bosch


I am not an expert in the Lucene scoring formula, but omitNorms=false makes the
scoring formula a little bit more complex, taking into account boosting for
fields and documents. If I'm not wrong (if I am, please correct me), I think
that with omitNorms=false the queryNorm(q) and norm(t,d) factors are taken
into account in the formula:

score(q,d) = coord(q,d) · queryNorm(q) · ∑ over t in q ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )

so the formula will be more complex.

See
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html,
and
http://old.nabble.com/scores-are-the-same-for-many-diferent-documents-td27623039.html#a27623039

The multiValued option is used to create fields that hold multiple values.

We use it in one of our indexes by modifying schema.xml and adding a new field
(a sketch is shown below).
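A minimal sketch of that field declaration (the type and the other attribute values here are assumptions; adjust them to your schema):

<field name="s_similar_name" type="string" indexed="true" stored="true" multiValued="true"/>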

This field is populated in a custom UpdateRequestProcessorFactory (written
by us) from a comma-separated field called 's_similar_names':
...
public void processAdd(AddUpdateCommand cmd) throws IOException {
  SolrInputDocument doc = cmd.getSolrInputDocument();

  // split the comma-separated source field into individual values
  String v = (String) doc.getFieldValue("s_similar_names");
  if (v != null) {
    String[] s_similar_names = v.split(",");
    for (String s_similar_name : s_similar_names) {
      if (!s_similar_name.equals("")) {
        // add each value to the multiValued s_similar_name field
        doc.addField("s_similar_name", s_similar_name);
      }
    }
  }

  // pass it up the chain
  super.processAdd(cmd);
}
...

The processor factory is registered in an update request processor chain
(ours is named "mychain") in solrconfig.xml, and that chain is then attached
to the XmlUpdateRequestHandler so it runs when documents are added. A sketch
of both pieces is shown below.
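A rough sketch of that configuration; only the chain name "mychain" comes from the text above, and the custom factory class name is a placeholder:

<updateRequestProcessorChain name="mychain">
  <processor class="com.example.SimilarNamesProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">mychain</str>
  </lst>
</requestHandler>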

termVectors is used to store more information about the terms of a document in
the index and to save computational time in features like MoreLikeThis:
http://wiki.apache.org/solr/TermVectorComponent. We don't use it.


adeelmahmood wrote:
> 
> I was gonna ask a question about this but you seem like you might have the
> answer for me .. wat exactly is the omitNorms field do (or is expected to
> do) .. also if you could please help me understand what termVectors and
> multiValued options do ??
> Thanks for ur help
> 
> 
> Raimon Bosch wrote:
>> 
>> 
>> Hi,
>> 
>> We did some tests with omitNorms=false. We have seen that in the last
>> result's page we have some scores set to 0.0. This scores setted to 0 are
>> problematic to our sorters.
>> 
>> It could be some kind of bug?
>> 
>> Regrads,
>> Raimon Bosch.
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/some-scores-to-0-using-omitNorns%3Dfalse-tp27637436p27637827.html
Sent from the Solr - User mailing list archive at Nabble.com.



including 'the' dismax query kills results

2010-02-18 Thread Nagelberg, Kallin
I've noticed some peculiar behavior with the dismax search handler.

In my case I'm making the search "The British Open", and am getting 0 results. 
When I change it to "British Open" I get many hits. I looked at the query 
analyzer and it should be broken down to "british" and "open" tokens ('the' is 
a stopword). I imagine it is doing an 'and' type search, and by setting the 
'mm' parameter to 1 I once again get results for 'the british open'. I would 
like mm to be 100% however, but just not care about stopwords. Is there a way 
to do this?

Thanks,
-Kal


Re: optimize is taking too much time

2010-02-18 Thread NarasimhaRaju
Hi,
You can also make use of the autocommit feature of Solr.
You have two possibilities: either based on the max number of uncommitted docs,
or based on time.
See the autoCommit section of your solrconfig.xml.

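A minimal sketch of such a configuration (the numbers are illustrative assumptions; pick limits that match your load):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many uncommitted docs -->
    <maxTime>60000</maxTime>  <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>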

Once you're done with adding, run a final optimize/commit.

Regards, 
P.N.Raju, 





From: Jagdish Vasani 
To: solr-user@lucene.apache.org
Sent: Thu, February 18, 2010 3:12:15 PM
Subject: Re: optimize is taking too much time

Hi,

you should not optimize index after each insert of document.insted you
should optimize it after inserting some good no of documents.
because in optimize it will merge  all segments to one according to setting
of lucene index.

thanks,
Jagdish
On Fri, Feb 12, 2010 at 4:01 PM, mklprasad  wrote:

>
> hi
> in my solr u have 1,42,45,223 records having some 50GB .
> Now when iam loading a new record and when its trying optimize the docs its
> taking 2 much memory and time
>
>
> can any body please tell do we have any property in solr to get rid of
> this.
>
> Thanks in advance
>
> --
> View this message in context:
> http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27561570.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



  

Re: dataimporthandler and expungeDeletes=false

2010-02-18 Thread Jorg Heymans
I found the error. The uniqueKey definition in schema.xml was not set to
the primary key field/column as returned by the deletedPkQuery.
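For anyone hitting the same thing, a minimal sketch of the alignment that matters (the field, entity, and query names below are illustrative, not from my real configuration):

In schema.xml:
<uniqueKey>id</uniqueKey>

In the DIH data-config.xml:
<entity name="item" pk="id"
        query="SELECT id, title FROM item"
        deletedPkQuery="SELECT id FROM item WHERE removed = 1"/>

The column returned by deletedPkQuery must be the same field that uniqueKey names, otherwise the deletes do not remove the documents you expect.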

Jorg

On Wed, Feb 17, 2010 at 11:38 AM, Jorg Heymans wrote:

> Looking closer at the documentation, it appears that expungeDeletes in fact
> has nothing to do with 'removing deleted documents from the index' as i
> thought before:
>
>
> http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22_and_.22optimize.22
>
>
> expungeDeletes = "true" | "false" — default is false — merge segments with
> deletes away.
>
> Is this correct ?
>
> FWIW I worked around the issue by adding a removed flag to my data and
> sending delete and commit commands after the delta import, but it would have
> been so much nicer to be able to do this all from DIH.
>
> Has anybody been able to get deletedPkQuery to work for deleting documents
> during delta import ?
>
> Jorg
>
> On Tue, Feb 16, 2010 at 3:57 PM, Jorg Heymans wrote:
>
>> Hi,
>>
>> Can anybody tell me if [1] still applies as of version trunk 03/02/2010 ?
>> I am removing documents from my index using deletedPkQuery and a
>> deltaimport. I can tell from the logs that the removal seems to be working:
>>
>> 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.DocBuilder
>> collectDelta
>> INFO: Completed parentDeltaQuery for Entity: attachment
>> 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.DocBuilder
>> deleteAll
>> INFO: Deleting stale documents
>> 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.SolrWriter
>> deleteDoc
>> INFO: Deleting document: 33053
>> 16-Feb-2010 15:32:54 org.apache.solr.core.SolrDeletionPolicy onInit
>> INFO: SolrDeletionPolicy.onInit: commits:num=1
>>
>>  
>> commit{dir=D:\lib\apache-solr-1.5-dev\example\solr\project\data\index,segFN=segments_1y,version=1265210107838,generation=70,filenames=[_2v.prx,
>> _2v.fnm, _2v.tis, _2v.fdt, _2v.frq, segments_1y, _2v.fdx, _2v.tii]
>> 16-Feb-2010 15:32:54 org.apache.solr.core.SolrDeletionPolicy updateCommits
>> INFO: newest commit = 1265210107838
>> 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.DocBuilder doDelta
>> INFO: Delta Import completed successfully
>> 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.DocBuilder finish
>> INFO: Import completed successfully
>> 16-Feb-2010 15:32:54 org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: start
>> commit(optimize=true,waitFlush=false,waitSearcher=true,expungeDeletes=false)
>> 16-Feb-2010 15:32:54 org.apache.solr.search.SolrIndexSearcher 
>> INFO: Opening searc...@182c2d9 main
>> 16-Feb-2010 15:32:54 org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: end_commit_flush
>>
>> However when i search the index the removed data is still present,
>> presumably because the DirectUpdateHandler2 does not automatically do
>> expungeDeletes ? Can i configure this somewhere in solrconfig.xml (SOLR-1275
>> was not very clear exactly what needs to be done to activate this behaviour)
>> ?
>>
>> Thanks
>> Jorg
>>
>> [1] http://marc.info/?l=solr-user&m=125962049425151&w=2
>>
>
>


Re: Realtime search and facets with very frequent commits

2010-02-18 Thread Jason Rutherglen
Janne,

I don't think there's any activity happening there.

SOLR-1606 is the tracking issue for moving to per segment facets and
docsets.  I haven't had an immediate commercial need to implement
those.

Jason

On Thu, Feb 18, 2010 at 7:04 AM, Janne Majaranta
 wrote:
> Hi Otis,
>
> Ok, now I'm confused ;)
> There seems to be a bit activity though when looking at the "last updated"
> timestamps in the google code project wiki:
> http://code.google.com/p/oceansearch/w/list
>
> The Tag Index feature sounds very interesting.
>
> -Janne
>
>
> 2010/2/18 Otis Gospodnetic 
>
>> Hi Janne,
>>
>> I *think*  Ocean Realtime Search has been superseded by Lucene NRT search.
>>
>>  Otis
>> 
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Hadoop ecosystem search :: http://search-hadoop.com/
>>
>>
>>
>> - Original Message 
>> > From: Janne Majaranta 
>> > To: solr-user@lucene.apache.org
>> > Sent: Thu, February 18, 2010 2:12:37 AM
>> > Subject: Re: Realtime search and facets with very frequent commits
>> >
>> > Hi,
>> >
>> > Yes, I did play with mergeFactor.
>> > I didn't play with mergePolicy.
>> >
>> > Wouldn't that affect indexing speed and possibly memory usage ?
>> > I don't have any problems with indexing speed ( 1000 - 2000 docs / sec
>> via
>> > the standard HTTP API ).
>> >
>> > My problem is that I need very warm caches to get fast faceting, and the
>> > autowarming of the caches takes too long compared to the frequency of
>> > commits I'm having.
>> > So a commit every minute means less than a minute time to warm the
>> caches.
>> >
>> > To give you a idea of what kind of queries needs to be autowarmed in my
>> app,
>> > the logevents indexed as documents have timestamps with different
>> > granularity used for faceting.
>> > For example, to get count of logevents for every hour using faceting
>> there's
>> > a timestamp field with the format mmddhh ( for example: 2010021808
>> > meaning 2010-02-18 8am).
>> > One use case is to get hourly counts over the whole index. A non-cached
>> > query counting the hourly counts over the 40M documents index takes a
>> > while..
>> > And to my understanding autowarming means something like that this kind
>> of
>> > query would be basically re-executed against a cold cache. Probably not
>> > exactly how it works, but it "feels" like it would.
>> >
>> > Moving the commits to a smaller index while using sharding to have a
>> > transparent view to the index from the client app seems to solve my
>> problem.
>> >
>> > I'm not sure if the (upcoming?) NRT features would keep the caches more
>> > persistent, probably not in a environment where docs get frequent updates
>> /
>> > deletes.
>> >
>> > Also, I'm closely following the Ocean Realtime Search project AND it's
>> SOLR
>> > integration. It sounds like it has the "dream features" to enable
>> realtime
>> > updates to the index.
>> >
>> > -Janne
>> >
>> >
>> > 2010/2/18 Jan Høydahl / Cominvent
>> >
>> > > Hi,
>> > >
>> > > Have you tried playing with mergeFactor or even mergePolicy?
>> > >
>> > > --
>> > > Jan Høydahl  - search architect
>> > > Cominvent AS - www.cominvent.com
>> > >
>> > > On 16. feb. 2010, at 08.26, Janne Majaranta wrote:
>> > >
>> > > > Hey Dipti,
>> > > >
>> > > > Basically query optimizations + setting cache sizes to a very high
>> level.
>> > > > Other than that, the config is about the same as the out-of-the-box
>> > > config
>> > > > that comes with the Solr download.
>> > > >
>> > > > I haven't found a magic switch to get very fast query responses +
>> facet
>> > > > counts with the frequency of commits I'm having using one single SOLR
>> > > > instance.
>> > > > Adding some TOP queries for a certain type of user to static warming
>> > > queries
>> > > > just moved the time of autowarming the caches to the time it took to
>> warm
>> > > > the caches with static queries.
>> > > > I've been staging a setup where there's a small solr instance
>> receiving
>> > > all
>> > > > the updates and a large instance which doesn't receive the live feed
>> of
>> > > > updates.
>> > > > The small index will be merged with the large index periodically
>> (once a
>> > > > week or once a month).
>> > > > The two instances are seen by the client app as one instance using
>> the
>> > > > sharding features of SOLR.
>> > > > The instances are running on the same server inside their own JVM /
>> > > jetty.
>> > > >
>> > > > In this setup the caches are very HOT for the large index and queries
>> are
>> > > > extremely fast, and the small index is small enough to get extremely
>> fast
>> > > > queries without having to warm up the caches too much.
>> > > >
>> > > > Basically I'm able to have a commit frequency of 10 seconds in a 40M
>> docs
>> > > > index while counting TOP5 facets over 14 fields in 200ms.
>> > > > In reality the commit frequency of 10 seconds comes from the fact
>> that
>> > > the
>> > > > updates are going into a 1M - 2M documents index, and the fast facet
>> > > counts
>> > > >

replications issue

2010-02-18 Thread giskard
Hi all,

I've set up Solr replication as described in the wiki.

When I start the replication, a directory called index.$numbers is created;
after a while it disappears and a new index.$othernumbers is created.

index/ remains untouched, with an empty index.

Any clue?

thank you in advance,
Riccardo

--
ciao,
giskard





Schema error unknown field

2010-02-18 Thread Pulkit Singhal
I'm getting the following exception
SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'desc'

I'm wondering what I need to do in order to add the "desc" field to
the Solr schema for indexing?


@Field annotation support

2010-02-18 Thread Pulkit Singhal
Hello All,

When I use Maven or Eclipse to try and compile my bean which has the
@Field annotation as specified in http://wiki.apache.org/solr/Solrj
page ... the compiler doesn't find any class to support the
annotation. What jar should we use to bring in this custom Solr
annotation?


Re: Schema error unknown field

2010-02-18 Thread Erick Erickson
Adding desc as a <field> in your schema.xml
file would be my first guess.
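Something along these lines (the type and attributes are assumptions; pick whatever matches how you want to search the field):

<field name="desc" type="text" indexed="true" stored="true"/>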

Providing some explanation of what you're trying to do
would help diagnose your issues.

HTH
Erick

On Thu, Feb 18, 2010 at 12:21 PM, Pulkit Singhal wrote:

> I'm getting the following exception
> SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'desc'
>
> I'm wondering what I need to do in order to add the "desc" field to
> the Solr schema for indexing?
>


Re: parsing strings into phrase queries

2010-02-18 Thread Kevin Osborn
The PositionFilter worked great for my purpose, along with another filter that I
built.

In my case, my indexed data may be something like "X150". So, a query for 
"Nokia X150" should match. But I don't want random matches on "x". However, if 
my indexed data is "G7", I do want a query on "PowerShot G7" to match on "g" 
and "7". So, a simple length filter will not do. Instead I build a custom 
filter (that I am willing to contribute back) that filters out singletons that 
are surrounded by longer tokens (3 or more by default). So, "PowerShot G7" 
becomes "power" "shot" "g" "7", but "Nokia X150" becomes "nokia" "150".

And then I put the results of this into a PositionFilter. This allows "Nokia 
X150ABC" to match against the "X150" part. So far I really like this for 
partial part number searches. And then to boost exact matches, I used copyField 
to create another field without PositionFilter. And then did an optional phrase 
query on that.
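For anyone who wants to try the same approach, a rough sketch of the analyzer chain in schema.xml; the singleton filter is my own class (the name below is a placeholder), the other factories are stock Solr ones, and whether you want this on the index side, the query side, or both is up to you:

<fieldType name="text_partnum" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- split X150 into "x" and "150", PowerShot into "power" and "shot" -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- custom filter: drop single-character tokens adjacent to longer ones -->
    <filter class="com.example.SurroundedSingletonFilterFactory"/>
    <!-- put the remaining tokens at the same position -->
    <filter class="solr.PositionFilterFactory"/>
  </analyzer>
</fieldType>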





From: Lance Norskog 
To: solr-user@lucene.apache.org
Sent: Wed, February 17, 2010 7:23:23 PM
Subject: Re: parsing strings into phrase queries

That would be great. After reading this and the PositionFilter class I
still don't know how to use it.

On Wed, Feb 17, 2010 at 12:38 PM, Robert Muir  wrote:
> i think we can improve the docs/wiki to show this example use case, i
> noticed the wiki explanation for this filter gives a more complex shingles
> example, which is interesting, but this seems to be a common problem and
> maybe we should add this use case.
>
> On Wed, Feb 17, 2010 at 1:54 PM, Chris Hostetter
> wrote:
>
>>
>> : take a look at PositionFilter
>>
>> Right, there was another thread recently where almost the exact same issue
>> was discussed...
>>
>> http://old.nabble.com/Re%3A-Tokenizer-question-p27120836.html
>>
>> ..except that i was ignorant of the existence of PositionFilter when i
>> wrote that message.
>>
>>
>>
>> -Hoss
>>
>>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com



  

Re: Faceting

2010-02-18 Thread José Moreira
Have you used UIMA? I did a quick read of the docs and it seems to do what
I'm looking for.

2010/2/11 Otis Gospodnetic 

> Note that UIMA doesn't doe NER itself (as far as I know), but instead
> relies on GATE or OpenNLP or OpenCalais, AFAIK :)
>
> Those interested in UIMA and living close to New York should go to
> http://www.meetup.com/NYC-Search-and-Discovery/calendar/12384559/
>
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
>
>
>
> - Original Message 
> > From: Jan Høydahl / Cominvent 
> > To: solr-user@lucene.apache.org
> > Sent: Tue, February 9, 2010 9:57:26 AM
> > Subject: Re: Faceting
> >
> > NOTE: Please start a new email thread for a new topic (See
> > http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking)
> >
> > Your strategy could work. You might want to look into dedicated entity
> > extraction frameworks like
> > http://opennlp.sourceforge.net/
> > http://nlp.stanford.edu/software/CRF-NER.shtml
> > http://incubator.apache.org/uima/index.html
> >
> > Or if that is too much work, look at
> > http://issues.apache.org/jira/browse/SOLR-1725 for a way to plug in your
> entity
> > extraction code into Solr itself using a scripting language.
> >
> > --
> > Jan Høydahl  - search architect
> > Cominvent AS - www.cominvent.com
> >
> > On 5. feb. 2010, at 20.10, José Moreira wrote:
> >
> > > Hello,
> > >
> > > I'm planning to index a 'content' field for search and from that
> > > fields text content i would like to facet (probably) according to if
> > > the content has e-mails, urls and within urls, url's to pictures,
> > > videos and others.
> > >
> > > As i'm a relatively new user to Solr, my plan was to regexp the
> > > content in my application and add tags to a Solr field according to
> > > the content, so for example the content "m...@email.com
> > > http://www.site.com"; would have the tags "email, link".
> > >
> > > If i follow this path can i then facet on "email" and/or "link" ? For
> > > example combining facet field with facet value params?
> > >
> > > Best
> > >
> > > --
> > > http://pt.linkedin.com/in/josemoreira
> > > josemore...@irc.freenode.net
> > > http://djangopeople.net/josemoreira/
>
>


-- 
josemore...@irc.freenode.net
http://pt.linkedin.com/in/josemoreira
http://djangopeople.net/josemoreira/


Re: Realtime search and facets with very frequent commits

2010-02-18 Thread Janne Majaranta
Ok, thanks.

-Janne


2010/2/18 Jason Rutherglen 

> Janne,
>
> I don't think there's any activity happening there.
>
> SOLR-1606 is the tracking issue for moving to per segment facets and
> docsets.  I haven't had an immediate commercial need to implement
> those.
>
> Jason
>
> On Thu, Feb 18, 2010 at 7:04 AM, Janne Majaranta
>  wrote:
> > Hi Otis,
> >
> > Ok, now I'm confused ;)
> > There seems to be a bit activity though when looking at the "last
> updated"
> > timestamps in the google code project wiki:
> > http://code.google.com/p/oceansearch/w/list
> >
> > The Tag Index feature sounds very interesting.
> >
> > -Janne
> >
> >
> > 2010/2/18 Otis Gospodnetic 
> >
> >> Hi Janne,
> >>
> >> I *think*  Ocean Realtime Search has been superseded by Lucene NRT
> search.
> >>
> >>  Otis
> >> 
> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >> Hadoop ecosystem search :: http://search-hadoop.com/
> >>
> >>
> >>
> >> - Original Message 
> >> > From: Janne Majaranta 
> >> > To: solr-user@lucene.apache.org
> >> > Sent: Thu, February 18, 2010 2:12:37 AM
> >> > Subject: Re: Realtime search and facets with very frequent commits
> >> >
> >> > Hi,
> >> >
> >> > Yes, I did play with mergeFactor.
> >> > I didn't play with mergePolicy.
> >> >
> >> > Wouldn't that affect indexing speed and possibly memory usage ?
> >> > I don't have any problems with indexing speed ( 1000 - 2000 docs / sec
> >> via
> >> > the standard HTTP API ).
> >> >
> >> > My problem is that I need very warm caches to get fast faceting, and
> the
> >> > autowarming of the caches takes too long compared to the frequency of
> >> > commits I'm having.
> >> > So a commit every minute means less than a minute time to warm the
> >> caches.
> >> >
> >> > To give you a idea of what kind of queries needs to be autowarmed in
> my
> >> app,
> >> > the logevents indexed as documents have timestamps with different
> >> > granularity used for faceting.
> >> > For example, to get count of logevents for every hour using faceting
> >> there's
> >> > a timestamp field with the format mmddhh ( for example: 2010021808
> >> > meaning 2010-02-18 8am).
> >> > One use case is to get hourly counts over the whole index. A
> non-cached
> >> > query counting the hourly counts over the 40M documents index takes a
> >> > while..
> >> > And to my understanding autowarming means something like that this
> kind
> >> of
> >> > query would be basically re-executed against a cold cache. Probably
> not
> >> > exactly how it works, but it "feels" like it would.
> >> >
> >> > Moving the commits to a smaller index while using sharding to have a
> >> > transparent view to the index from the client app seems to solve my
> >> problem.
> >> >
> >> > I'm not sure if the (upcoming?) NRT features would keep the caches
> more
> >> > persistent, probably not in a environment where docs get frequent
> updates
> >> /
> >> > deletes.
> >> >
> >> > Also, I'm closely following the Ocean Realtime Search project AND it's
> >> SOLR
> >> > integration. It sounds like it has the "dream features" to enable
> >> realtime
> >> > updates to the index.
> >> >
> >> > -Janne
> >> >
> >> >
> >> > 2010/2/18 Jan Høydahl / Cominvent
> >> >
> >> > > Hi,
> >> > >
> >> > > Have you tried playing with mergeFactor or even mergePolicy?
> >> > >
> >> > > --
> >> > > Jan Høydahl  - search architect
> >> > > Cominvent AS - www.cominvent.com
> >> > >
> >> > > On 16. feb. 2010, at 08.26, Janne Majaranta wrote:
> >> > >
> >> > > > Hey Dipti,
> >> > > >
> >> > > > Basically query optimizations + setting cache sizes to a very high
> >> level.
> >> > > > Other than that, the config is about the same as the
> out-of-the-box
> >> > > config
> >> > > > that comes with the Solr download.
> >> > > >
> >> > > > I haven't found a magic switch to get very fast query responses +
> >> facet
> >> > > > counts with the frequency of commits I'm having using one single
> SOLR
> >> > > > instance.
> >> > > > Adding some TOP queries for a certain type of user to static
> warming
> >> > > queries
> >> > > > just moved the time of autowarming the caches to the time it took
> to
> >> warm
> >> > > > the caches with static queries.
> >> > > > I've been staging a setup where there's a small solr instance
> >> receiving
> >> > > all
> >> > > > the updates and a large instance which doesn't receive the live
> feed
> >> of
> >> > > > updates.
> >> > > > The small index will be merged with the large index periodically
> >> (once a
> >> > > > week or once a month).
> >> > > > The two instances are seen by the client app as one instance using
> >> the
> >> > > > sharding features of SOLR.
> >> > > > The instances are running on the same server inside their own JVM
> /
> >> > > jetty.
> >> > > >
> >> > > > In this setup the caches are very HOT for the large index and
> queries
> >> are
> >> > > > extremely fast, and the small index is small enough to get
> extremely
> >> fast
> >> > > > queries without having to warm up the caches t

Re: including 'the' dismax query kills results

2010-02-18 Thread Joe Calderon
Use the common grams filter; it'll create tokens for stopwords combined with
their adjacent terms.
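A rough sketch of what that looks like in schema.xml (the surrounding tokenizer and filters are assumptions; the CommonGrams factories are the stock Solr ones):

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.CommonGramsQueryFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

With that in place, a query like "the british open" produces a token such as the_british alongside the remaining terms, so the stopword still participates in matching instead of being dropped.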

On Thu, Feb 18, 2010 at 7:16 AM, Nagelberg, Kallin
 wrote:
> I've noticed some peculiar behavior with the dismax searchhandler.
>
> In my case I'm making the search "The British Open", and am getting 0 
> results. When I change it to "British Open" I get many hits. I looked at the 
> query analyzer and it should be broken down to "british" and "open" tokens 
> ('the' is a stopword). I imagine it is doing an 'and' type search, and by 
> setting the 'mm' parameter to 1 I once again get results for 'the british 
> open'. I would like mm to be 100% however, but just not care about stopwords. 
> Is there a way to do this?
>
> Thanks,
> -Kal
>


Re: Deleting spelll checker index

2010-02-18 Thread darniz

Thanks
If this is really the case, I declared a new field called mySpellTextDup and
retired the original field.
Now I have a new field powering my dictionary with no words in it, and
I am free to index whichever terms I want.

This is not the best solution, but I can't think of a more reasonable workaround.

Thanks
darniz


Lance Norskog-2 wrote:
> 
> This is a quirk of Lucene - when you delete a document, the indexed
> terms for the document are not deleted. That is, if 2 documents have
> the word 'frampton' in an indexed field, the term dictionary contains
> the entry 'frampton' and pointers to those two documents. When you
> delete those two documents, the index contains the entry 'frampton'
> with an empty list of pointers. So, the terms are still there even
> when you delete all of the documents.
> 
> Facets and the spellchecking dictionary build from this term
> dictionary, not from the text string that are 'stored' and returned
> when you search for the documents.
> 
> The optimize command throws away these remnant terms.
> 
> http://www.lucidimagination.com/blog/2009/03/18/exploring-lucenes-indexing-code-part-2/
> 
> On Wed, Feb 17, 2010 at 12:24 PM, darniz  wrote:
>>
>> Please bear with me on the limitted understanding.
>> i deleted all documents and i made a rebuild of my spell checker  using
>> the
>> command
>> spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default
>>
>> After this i went to the schema browser and i saw that mySpellText still
>> has
>> around 2000 values.
>> How can i make sure that i clean up that field.
>> We had the same issue with facets too, even though we delete all the
>> documents, and if we do a facet on make we still see facets but we can
>> filter out facets by saying facet.mincount>0.
>>
>> Again coming back to my question how can i make mySpellText fields get
>> rid
>> of all previous terms
>>
>> Thanks a lot
>> darniz
>>
>>
>>
>> hossman wrote:
>>>
>>> : But still i cant stop thinking about this.
>>> : i deleted my entire index and now i have 0 documents.
>>> :
>>> : Now if i make a query with accrd i still get a suggestion of accord
>>> even
>>> : though there are no document returned since i deleted my entire index.
>>> i
>>> : hope it also clear the spell check index field.
>>>
>>> there are two Lucene indexes when you use spell checking.
>>>
>>> there is the "main" index which is goverend by your schema.xml and is
>>> what
>>> you add your own documents to, and what searches are run agains for the
>>> result section of solr responses.
>>>
>>> There is also the "spell" index which has only two fields and in
>>> which each "document" corrisponds to a "word" that might be returend as
>>> a
>>> spelling suggestion, and the other fields contain various
>>> start/end/middle
>>> ngrams that represent possible misspellings.
>>>
>>> When you use the spellchecker component it builds the "spell" index
>>> makinga document out of every word it finds in whatever field name you
>>> configure it to use.
>>>
>>> deleting your entire "main" index won't automaticly delete the "spell"
>>> index (allthough you should be able rebuild the "spell" index using the
>>> *empty* "main" index, that should work).
>>>
>>> : i am copying both fields to a field called
>>> : 
>>> : 
>>>
>>> ..at this point your "main" index has a field named mySpellText, and for
>>> ever document it contains a copy of make and model.
>>>
>>> :         
>>> :             default
>>> :             mySpellText
>>> :             true
>>> :             true
>>>
>>> ...so whenever you commit or optimize your "main" index it will take
>>> every
>>> word from the mySpellText and use them all as individual documents in
>>> the
>>> "spell" index.
>>>
>>> In your previous email you said you changed hte copyField declaration,
>>> and
>>> then triggered a commit -- that rebuilt your "spell" index, but the data
>>> was still all there in the mySpellText field of the "main" index, so the
>>> rebuilt "spell" index was exactly the same.
>>>
>>> : i have buildOnOPtmize and buildOnCommit as true so when i index new
>>> document
>>> : i want my dictionary to be created but how can i make sure i remove
>>> the
>>> : preivious indexed terms.
>>>
>>> everytime the spellchecker component "builds" it will create a
>>> completley
>>> new "spell" index .. but if the old data is still in the "main" index
>>> then
>>> it will also be in the "spell" index.
>>>
>>> The only reason i can think of why you'd be seeing words in your "spell"
>>> index after deleting documents from your "main" index is that even if
>>> you
>>> delete documents, the Terms are still there in the underlying index
>>> untill
>>> the segments are merged ... so if you do an optimize that will force
>>> them
>>> to be expunged --- but i honestly have no idea if that is what's causing
>>> your problem, because quite frankly i really don't understand what your
>>> problem is ... you have to provide specifics: reproducible steps anyone
>>> can take using a clean 

Re: Schema error unknown field

2010-02-18 Thread Pulkit Singhal
I guess my n00b-ness is showing :)

I started off using the instructions directly from
http://wiki.apache.org/solr/Solrj and there was no mention of schema
there, and even after getting this error and searching for schema.xml
in the wiki ... I found no meaningful hits so I thought it best to
ask.

With your advice, I searched for schema.xml and found 13 instances of it:

\solr_1.4.0\client\ruby\solr-ruby\solr\conf\schema.xml
\solr_1.4.0\client\ruby\solr-ruby\test\conf\schema.xml
\solr_1.4.0\contrib\clustering\src\test\resource\schema.xml
\solr_1.4.0\contrib\extraction\src\test\resource\schema.xml
\solr_1.4.0\contrib\velocity\src\main\solr\conf\schema.xml
\solr_1.4.0\example\example-DIH\solr\db\conf\schema.xml
\solr_1.4.0\example\example-DIH\solr\mail\conf\schema.xml
\solr_1.4.0\example\example-DIH\solr\rss\conf\schema.xml
\solr_1.4.0\example\multicore\core0\conf\schema.xml
\solr_1.4.0\example\multicore\core1\conf\schema.xml
\solr_1.4.0\example\solr\conf\schema.xml
\solr_1.4.0\src\test\test-files\solr\conf\schema.xml
\solr_1.4.0\src\test\test-files\solr\shared\conf\schema.xml

I took a wild guess and added the field I wanted ("desc") into this
file since its name seemed to be the most generic one:
C:\apps\solr_1.4.0\example\solr\conf\schema.xml

And it worked ... a bit strange that an example directory is used but
I suppose it is configurable somewhere?
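
For reference, a quick way to confirm the new field is wired up is to index a
document into it with SolrJ. A minimal, untested sketch (SolrJ 1.4 style; the
URL and the field values are just placeholders):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AddDescExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "example-1");               // assumes the example schema's "id" uniqueKey
    doc.addField("desc", "some description text"); // the newly declared field
    server.add(doc);
    server.commit();
  }
}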

Thanks for your help, Erick!

Cheers,
- Pulkit

On Thu, Feb 18, 2010 at 9:53 AM, Erick Erickson  wrote:
> Add desc as a <field> in your schema.xml
> file would be my first guess.
>
> Providing some explanation of what you're trying to do
> would help diagnose your issues.
>
> HTH
> Erick
>
> On Thu, Feb 18, 2010 at 12:21 PM, Pulkit Singhal 
> wrote:
>
>> I'm getting the following exception
>> SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'desc'
>>
>> I'm wondering what I need to do in order to add the "desc" field to
>> the Solr schema for indexing?
>>
>


Re: Schema error unknown field

2010-02-18 Thread Erick Erickson
NP. And I see why you'd be confused... What's actually happening
is that if you're using the tutorial to make things run, a lot
is happening under the covers. In particular, you're switching to
the solr/example directory where you're invoking the start.jar, which
is pre-configured to bring up the...you guessed it... example project.

This is purposely designed so you have to do the absolute minimal
work to get something to play with up and running, but once you start
trying anything new, you quickly need to get deeper into what's
really happening to make progress. It's the usual trade-off between
having to set up an entire installation before doing your first "hello
world" program (which would include setting up a servlet container,
placing everything in the right place, creating a schema from
scratch, ad nauseam) and having something that "just works" but
conceals lots and lots of details.

The Manning book for SOLR or LucidWorks are good resources

Erick

On Thu, Feb 18, 2010 at 3:03 PM, Pulkit Singhal wrote:

> I guess my n00b-ness is showing :)
>
> I started off using the instructions directly from
> http://wiki.apache.org/solr/Solrj and there was no mention of schema
> there and even after getting this error and searching for schema.xml
> in the wiki ... I found no meaningful hits so I thought it best to
> ask.
>
> With your advice, I searched for schema.xml and found 13 instances of it:
>
> \solr_1.4.0\client\ruby\solr-ruby\solr\conf\schema.xml
> \solr_1.4.0\client\ruby\solr-ruby\test\conf\schema.xml
> \solr_1.4.0\contrib\clustering\src\test\resource\schema.xml
> \solr_1.4.0\contrib\extraction\src\test\resource\schema.xml
> \solr_1.4.0\contrib\velocity\src\main\solr\conf\schema.xml
> \solr_1.4.0\example\example-DIH\solr\db\conf\schema.xml
> \solr_1.4.0\example\example-DIH\solr\mail\conf\schema.xml
> \solr_1.4.0\example\example-DIH\solr\rss\conf\schema.xml
> \solr_1.4.0\example\multicore\core0\conf\schema.xml
> \solr_1.4.0\example\multicore\core1\conf\schema.xml
> \solr_1.4.0\example\solr\conf\schema.xml
> \solr_1.4.0\src\test\test-files\solr\conf\schema.xml
> \solr_1.4.0\src\test\test-files\solr\shared\conf\schema.xml
>
> I took a wild guess and added the field I wanted ("desc") into this
> file since its name seemed to be the most generic one:
> C:\apps\solr_1.4.0\example\solr\conf\schema.xml
>
> And it worked ... a bit strange that an example directory is used but
> I suppose it is configurable somewhere?
>
> Thanks for your help, Erick!
>
> Cheers,
> - Pulkit
>
> On Thu, Feb 18, 2010 at 9:53 AM, Erick Erickson 
> wrote:
> > Add desc as a <field> in your schema.xml
> > file would be my first guess.
> >
> > Providing some explanation of what you're trying to do
> > would help diagnose your issues.
> >
> > HTH
> > Erick
> >
> > On Thu, Feb 18, 2010 at 12:21 PM, Pulkit Singhal <
> pulkitsing...@gmail.com>wrote:
> >
> >> I'm getting the following exception
> >> SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'desc'
> >>
> >> I'm wondering what I need to do in order to add the "desc" field to
> >> the Solr schema for indexing?
> >>
> >
>


Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-18 Thread Tom Burton-West

Thanks Otis,

I don't know enough about Hadoop to understand the advantage of using Hadoop
in this use case.  How would using Hadoop differ from distributing the
indexing over 10 shards on 10 machines with Solr?

Tom



Otis Gospodnetic wrote:
> 
> Hi Tom,
> 
> 32MB is very low, 320MB is medium, and I think you could go higher, just
> pick whichever garbage collector is good for throughput.  I know Java 1.6
> update 18 also has some Hotspot and maybe also GC fixes, so I'd use that. 
> Finally, this sounds like a good use case for reindexing with Hadoop!
> 
>  Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
> 
> 




Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-18 Thread Yonik Seeley
On Thu, Feb 18, 2010 at 8:52 AM, Otis Gospodnetic
 wrote:
> 32MB is very low, 320MB is medium, and I think you could go higher, just pick 
> whichever garbage collector is good for throughput.  I know Java 1.6 update 
> 18 also has some Hotspot and maybe also GC fixes, so I'd use that.

I think you misread Tom's email - it sounds like you are talking about
JVM heap sizes, not ramBufferSizeMB?
32MB is certainly not low, and 320MB is not medium.

-Yonik
http://www.lucidimagination.com


Re: Schema error unknown field

2010-02-18 Thread Smiley, David W.
On Feb 18, 2010, at 3:27 PM, Erick Erickson wrote:

> The Manning book for SOLR or LucidWorks are good resources

And of course the PACKT book ;-)

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/







Run Solr within my war

2010-02-18 Thread Pulkit Singhal
Hello Everyone,

I do NOT want to host Solr separately. I want to run it within my war
with the Java Application which is using it. How easy/difficult is
that to setup? Can anyone with past experience on this topic, please
comment.

thanks,
- Pulkit


Re: Run Solr within my war

2010-02-18 Thread Dave Searle
Why would you want to? Surely having it separate increases scalability?

On 18 Feb 2010, at 22:23, "Pulkit Singhal"   
wrote:

> Hello Everyone,
>
> I do NOT want to host Solr separately. I want to run it within my war
> with the Java Application which is using it. How easy/difficult is
> that to setup? Can anyone with past experience on this topic, please
> comment.
>
> thanks,
> - Pulkit


Re: Schema error unknown field

2010-02-18 Thread Erick Erickson
Oops, got my Manning MEAP edition of LIA II mixed up with my PACKT SOLR 1.4
book.

But some author guy caught my gaffe ...

Erick

On Thu, Feb 18, 2010 at 5:13 PM, Smiley, David W.  wrote:

> On Feb 18, 2010, at 3:27 PM, Erick Erickson wrote:
>
> > The Manning book for SOLR or LucidWorks are good resources
>
> And of course the PACKT book ;-)
>
> ~ David Smiley
> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
>
>
>
>
>
>


Re: Run Solr within my war

2010-02-18 Thread Pulkit Singhal
Yeah, I have been pitching that, but I want all the functionality of
Solr in a small package; scalability is not a concern given the
specifically limited data set being searched. I understand that
the number of users is another part of this equation, but there just
aren't that many at this time, and hosting Solr separately would add
deployment complexity and kill the product before it ever takes off.
Adoption is key for me.

On Thu, Feb 18, 2010 at 2:25 PM, Dave Searle  wrote:
> Why would you want to? Surely having it seperate increases scalablity?
>
> On 18 Feb 2010, at 22:23, "Pulkit Singhal" 
> wrote:
>
>> Hello Everyone,
>>
>> I do NOT want to host Solr separately. I want to run it within my war
>> with the Java Application which is using it. How easy/difficult is
>> that to setup? Can anyone with past experience on this topic, please
>> comment.
>>
>> thanks,
>> - Pulkit
>


spellcheck.build=true has no effect

2010-02-18 Thread darniz

Hello All.
After doing a lot of research I came to this conclusion; please correct me if
I am wrong.
I noticed that if you have buildOnCommit and buildOnOptimize as true in your
spellcheck component, then the spellcheck index builds whenever a commit or
optimize happens, which is the desired behaviour and correct.
Please read on.

I am using the index-based spell checker and I am copying make and model to my
spellcheck field. I index some documents and the make and model are being
copied to the spellcheck field when I commit.
Now I stopped my Solr server and
added one more field, bodytype, to be copied to my spellcheck field.
I don't want to reindex the data, so I issued an HTTP request to rebuild my
spellchecker:
&spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default.
It looks like the above command has no effect; the bodyType is not being copied
to the spellcheck field.

The only time the spellcheck field has the bodyType value copied into it is when
I reindex the documents again and do a commit.

Is this the desired behaviour?
Adding buildOnCommit and buildOnOptimize will force the spellchecker to
rebuild only if a commit or optimize happens.
Please let me know if there are some configurable parameters.


thanks
darniz




Range Searches in Collections

2010-02-18 Thread cjkadakia

Hi, I'm trying to do a search on a range of floats that are part of my solr
schema. Basically we have a collection of "fees" that are associated with
each document in our index.

The query I tried was:

q=fees:[3 TO 10]

This should return me documents with Fee values between 3 and 10
inclusively, which it does. However, I need it to check for ALL items in
this collection, not just one that satisfies it. Currently, this is
returning me documents with fee values above 10 and below 3 as long as it
contains at least one other within.

Any suggestions on how to accomplish this?



How does one sort facet queries?

2010-02-18 Thread Kelly Taylor

All sorting of facets works great at the field level (count/index)...all good
there...but how is sorting accomplished with range queries? The solrj
response doesn't seem to maintain the order the queries are sent in, and the
order is not in index or count order. What's the trick?

http://localhost:8983/solr/select?q=someterm
  &rows=0
  &facet=true
  &facet.limit=-1
  &facet.query=price:[* TO 100]
  &facet.query=price:[100 TO 200]
  &facet.query=price:[200 TO 300]
  &facet.query=price:[300 TO 400]
  &facet.query=price:[400 TO 500]
  &facet.query=price:[500 TO 600]
  &facet.query=price:[600 TO 700]
  &facet.query=price:[700 TO *]  
  &facet.mincount=1
  &collapse.field=dedupe_hash
  &collapse.threshold=1
  &collapse.type=normal
  &collapse.facet=before
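
One way to get the counts back in a predictable order (untested sketch against
SolrJ 1.4; the collapse parameters are left out) is to keep the facet.query
strings in a list of your own and use them as lookup keys into the response
map, since the map is keyed by the exact query strings you sent:

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PriceRangeFacets {
  public static void printInOrder(SolrServer server) throws SolrServerException {
    List<String> ranges = Arrays.asList(
        "price:[* TO 100]", "price:[100 TO 200]", "price:[200 TO 300]",
        "price:[300 TO 400]", "price:[400 TO 500]", "price:[500 TO 600]",
        "price:[600 TO 700]", "price:[700 TO *]");

    SolrQuery q = new SolrQuery("someterm");
    q.setRows(0);
    q.setFacet(true);
    q.setFacetMinCount(1);
    for (String r : ranges) {
      q.addFacetQuery(r);           // same strings as the facet.query params above
    }

    QueryResponse rsp = server.query(q);
    Map<String, Integer> counts = rsp.getFacetQuery();
    for (String r : ranges) {       // iterate in your own order, not the map's
      System.out.println(r + " => " + counts.get(r));
    }
  }
}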




Re: Run Solr within my war

2010-02-18 Thread Richard Frovarp

On 2/18/2010 4:22 PM, Pulkit Singhal wrote:

Hello Everyone,

I do NOT want to host Solr separately. I want to run it within my war
with the Java Application which is using it. How easy/difficult is
that to setup? Can anyone with past experience on this topic, please
comment.

thanks,
- Pulkit

   
So basically you're talking about running an embedded version of Solr,
i.e. the EmbeddedSolrServer? I have no experience with this, but that
should give you the right search term to find documentation on its use.
From what little code I've seen that runs test cases against Solr, it looks
relatively straightforward to get running. You would then go through the
SolrJ library to communicate with the embedded Solr server.
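
For anyone searching later, a minimal sketch of that setup (untested, Solr/SolrJ
1.4 style; the solr home path is a placeholder and must contain conf/solrconfig.xml
and conf/schema.xml):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.core.CoreContainer;

public class EmbeddedSolrExample {
  public static void main(String[] args) throws Exception {
    System.setProperty("solr.solr.home", "/path/to/solr/home");
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer coreContainer = initializer.initialize();

    // an empty core name addresses the single default core
    EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
    QueryResponse rsp = server.query(new SolrQuery("*:*"));
    System.out.println("hits: " + rsp.getResults().getNumFound());

    coreContainer.shutdown();
  }
}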


Richard


Re: Range Searches in Collections

2010-02-18 Thread Otis Gospodnetic
Hm, yes, it sounds like your "fees" field has multiple values/tokens, one for 
each fee.  That's full-text search for you. :)
How about having multiple fee fields, each with just one fee value?
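One concrete way to apply that idea (untested sketch; fee_min and fee_max are
hypothetical single-valued fields you would populate at index time with each
document's lowest and highest fee) is to turn "every fee between 3 and 10
inclusive" into a conjunction of two single-value range checks:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AllFeesInRange {
  // all fees in [3, 10]  <=>  smallest fee >= 3 AND largest fee <= 10
  public static QueryResponse feesBetween3And10(SolrServer server)
      throws SolrServerException {
    SolrQuery q = new SolrQuery("fee_min:[3 TO *] AND fee_max:[* TO 10]");
    return server.query(q);
  }
}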

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/





From: cjkadakia 
To: solr-user@lucene.apache.org
Sent: Thu, February 18, 2010 7:58:23 PM
Subject: Range Searches in Collections


Hi, I'm trying to do a search on a range of floats that are part of my solr
schema. Basically we have a collection of "fees" that are associated with
each document in our index.

The query I tried was:

q=fees:[3 TO 10]

This should return me documents with Fee values between 3 and 10
inclusively, which it does. However, I need it to check for ALL items in
this collection, not just one that satisfies it. Currently, this is
returning me documents with fee values above 10 and below 3 as long as it
contains at least one other within.

Any suggestions on how to accomplish this?


Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-18 Thread Otis Gospodnetic
Hi Tom,

It wouldn't.  I didn't see the mention of parallel indexing in the original 
email. :)

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: Tom Burton-West 
> To: solr-user@lucene.apache.org
> Sent: Thu, February 18, 2010 3:30:05 PM
> Subject: Re: What is largest reasonable setting for ramBufferSizeMB?
> 
> 
> Thanks Otis,
> 
> I don't know enough about Hadoop to understand the advantage of using Hadoop
> in this use case.  How would using Hadoop differ from distributing the
> indexing over 10 shards on 10 machines with Solr?
> 
> Tom
> 
> 
> 
> Otis Gospodnetic wrote:
> > 
> > Hi Tom,
> > 
> > 32MB is very low, 320MB is medium, and I think you could go higher, just
> > pick whichever garbage collector is good for throughput.  I know Java 1.6
> > update 18 also has some Hotspot and maybe also GC fixes, so I'd use that. 
> > Finally, this sounds like a good use case for reindexing with Hadoop!
> > 
> >  Otis
> > 
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Hadoop ecosystem search :: http://search-hadoop.com/
> > 
> > 
> 
> -- 
> View this message in context: 
> http://old.nabble.com/What-is-largest-reasonable-setting-for-ramBufferSizeMB--tp27631231p27645167.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: parsing strings into phrase queries

2010-02-18 Thread Otis Gospodnetic
This sounds useful to me!
Here's a pointer: http://wiki.apache.org/solr/HowToContribute


Thanks!
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/





From: Kevin Osborn 
To: solr-user@lucene.apache.org
Sent: Thu, February 18, 2010 1:15:11 PM
Subject: Re: parsing strings into phrase queries

The PositionFilter worked great for my purpose, along with another filter that I 
built.

In my case, my indexed data may be something like "X150". So, a query for 
"Nokia X150" should match. But I don't want random matches on "x". However, if 
my indexed data is "G7", I do want a query on "PowerShot G7" to match on "g" 
and "7". So, a simple length filter will not do. Instead I built a custom 
filter (that I am willing to contribute back) that filters out singletons that 
are surrounded by longer tokens (3 or more characters by default). So, "PowerShot G7" 
becomes "power" "shot" "g" "7", but "Nokia X150" becomes "nokia" "150".

And then I put the results of this into a PositionFilter. This allows "Nokia 
X150ABC" to match against the "X150" part. So far I really like this for 
partial part number searches. And then to boost exact matches, I used copyField 
to create another field without PositionFilter. And then did an optional phrase 
query on that.
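
For readers curious what such a filter might look like, here is an untested
sketch of the idea (not Kevin's actual code, and written against a newer Lucene
analysis API than the 2.9 that ships with Solr 1.4): buffer the whole stream,
then drop one-character tokens whose neighbours on both sides are at least
three characters long. Position increments are left untouched in this sketch.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class DropIsolatedSingletonsFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private List<State> buffered;     // captured attribute state of every input token
  private List<Integer> lengths;    // term length of every input token
  private int pos;

  public DropIsolatedSingletonsFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (buffered == null) {         // first call: read the whole stream so we can look both ways
      buffered = new ArrayList<State>();
      lengths = new ArrayList<Integer>();
      while (input.incrementToken()) {
        buffered.add(captureState());
        lengths.add(termAtt.length());
      }
      pos = 0;
    }
    while (pos < buffered.size()) {
      int i = pos++;
      if (lengths.get(i) == 1 && isLong(i - 1) && isLong(i + 1)) {
        continue;                   // drop a single-char token squeezed between two long tokens
      }
      restoreState(buffered.get(i));
      return true;
    }
    return false;
  }

  private boolean isLong(int i) {
    return i >= 0 && i < lengths.size() && lengths.get(i) >= 3;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    buffered = null;                // allow the filter to be reused for another document
  }
}

With "PowerShot G7" analyzed as "power" "shot" "g" "7", the "g" survives because
its right-hand neighbour "7" is short; with "Nokia X150" analyzed as "nokia" "x"
"150", the "x" is dropped because both neighbours are three characters or more.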





From: Lance Norskog 
To: solr-user@lucene.apache.org
Sent: Wed, February 17, 2010 7:23:23 PM
Subject: Re: parsing strings into phrase queries

That would be great. After reading this and the PositionFilter class I
still don't know how to use it.

On Wed, Feb 17, 2010 at 12:38 PM, Robert Muir  wrote:
> i think we can improve the docs/wiki to show this example use case, i
> noticed the wiki explanation for this filter gives a more complex shingles
> example, which is interesting, but this seems to be a common problem and
> maybe we should add this use case.
>
> On Wed, Feb 17, 2010 at 1:54 PM, Chris Hostetter
> wrote:
>
>>
>> : take a look at PositionFilter
>>
>> Right, there was another thread recently where almost the exact same issue
>> was discussed...
>>
>> http://old.nabble.com/Re%3A-Tokenizer-question-p27120836.html
>>
>> ..except that i was ignorant of the existence of PositionFilter when i
>> wrote that message.
>>
>>
>>
>> -Hoss
>>
>>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com

Re: replications issue

2010-02-18 Thread Otis Gospodnetic
giskard,

Is this on the master or on the slave(s)?
Maybe you can paste your replication handler config for the master and your 
replication handler config for the slave.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/





From: giskard 
To: solr-user@lucene.apache.org
Sent: Thu, February 18, 2010 12:16:37 PM
Subject: replications issue

Hi all,

I've set up Solr replication as described in the wiki.

When I start the replication, a directory called index.$numbers is created;
after a while
it disappears and a new index.$othernumbers is created.

index/ remains untouched with an empty index.

any clue?

thank you in advance,
Riccardo

--
ciao,
giskard

Re: @Field annotation support

2010-02-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
solrj jar
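(i.e. the apache-solr-solrj jar, which should be under the dist directory of the
1.4 download; the annotation itself is org.apache.solr.client.solrj.beans.Field)

A minimal bean sketch, with field names assumed for illustration:

import org.apache.solr.client.solrj.beans.Field;

public class Item {
  @Field
  String id;

  @Field("cat")        // maps this member to the "cat" field in the schema
  String[] categories;
}

With that jar on the classpath, server.addBean(item) and
queryResponse.getBeans(Item.class) should work as described on the wiki page.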

On Thu, Feb 18, 2010 at 10:52 PM, Pulkit Singhal
 wrote:
> Hello All,
>
> When I use Maven or Eclipse to try and compile my bean which has the
> @Field annotation as specified in http://wiki.apache.org/solr/Solrj
> page ... the compiler doesn't find any class to support the
> annotation. What jar should we use to bring in this custom Solr
> annotation?
>



-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


Re: optimize is taking too much time

2010-02-18 Thread mklprasad



Jagdish Vasani-2 wrote:
> 
> Hi,
> 
> you should not optimize the index after each insert of a document; instead you
> should optimize it after inserting a good number of documents,
> because optimize will merge all segments into one according to the
> settings of the Lucene index.
> 
> thanks,
> Jagdish
> On Fri, Feb 12, 2010 at 4:01 PM, mklprasad  wrote:
> 
>>
>> hi
>> in my Solr we have 1,42,45,223 records taking some 50GB.
>> Now when I am loading a new record and it is trying to optimize the docs,
>> it is taking too much memory and time.
>>
>>
>> can any body please tell do we have any property in solr to get rid of
>> this.
>>
>> Thanks in advance
>>
>> --
>> View this message in context:
>> http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27561570.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

Yes,
Thanks for the reply.
I have removed the optimize() call from the code, but I have a doubt ..
1. Will mergeFactor internally do any optimization, or do we have to specify it?

2. Even if Solr initiates an optimize, if I have large data like 52GB, will
that take a huge amount of time?

Thanks,
Prasad


