Re: How to configure Solr PostingsFormat block size

2015-03-12 Thread Tom Burton-West
Hi Hoss, I created a wrapper class, compiled a jar and included an org.apache.lucene.codecs.Codec file in META-INF/services in the jar file with an entry for the wrapper class: HTPostingsFormatWrapper. I created a collection1/lib directory and put the jar there. (see below) I'm getting the drea

Re: PostingsFormat block size

2015-01-28 Thread Trym Møller
Hi Thanks for your input. I do not do updates to the existing docs, so that is not relevant in my case, and I have just skipped that test case :-) I have not been able to measure any significant changes to the distributed searches or just doing a direct search for an id. Did I miss something

Re: PostingsFormat block size

2015-01-27 Thread Mikhail Khludnev
Hm.. These are not the blocks I'm familiar with. Regarding the performance impact of bigger ID blocks: it matters if you have an ID field and send updates for existing docs. IDs are also used for some of the distributed search stages, I suppose. Here it is. On Tue, Jan 27, 2015 at 4:33 PM, Trym Møller wrote: > Hi >

Re: PostingsFormat block size

2015-01-27 Thread Trym Møller
Hi Thanks for your clarifying questions. In the constructor of the Lucene41PostingsFormat class the minimum and maximum block sizes are provided. These sizes are used when creating the BlockTreeTermsWriter (responsible for writing the .tim and .tip files of the Lucene index). It is the blocksiz
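
To make the block-size question concrete, here is a small sketch in plain Java (no Lucene dependency) of what "8 times the default block size" works out to. It assumes the BlockTree defaults of 25 (minimum) and 48 (maximum) terms per block, and it assumes the writer rejects any pair where the maximum is smaller than 2*(min-1). Both assumptions are recalled from the Lucene 4.x sources and should be verified against the version in use. The class and method names are hypothetical.

```java
public class BlockSizeCheck {
    // Assumed Lucene 4.x BlockTree defaults; verify against your Lucene version.
    static final int DEFAULT_MIN_BLOCK_SIZE = 25;
    static final int DEFAULT_MAX_BLOCK_SIZE = 48;

    // Mirrors the argument check assumed to be made by BlockTreeTermsWriter:
    // min >= 2, max >= min, and max >= 2 * (min - 1).
    static boolean isValid(int min, int max) {
        return min >= 2 && max >= min && max >= 2 * (min - 1);
    }

    // Smallest maximum block size that passes the check for a given minimum.
    static int smallestValidMax(int min) {
        return Math.max(min, 2 * (min - 1));
    }

    public static void main(String[] args) {
        int factor = 8;
        int min = DEFAULT_MIN_BLOCK_SIZE * factor;      // scaled minimum
        int naiveMax = DEFAULT_MAX_BLOCK_SIZE * factor; // scaled maximum
        System.out.println("min=" + min + " naiveMax=" + naiveMax
                + " valid=" + isValid(min, naiveMax));
        System.out.println("smallest valid max for min=" + min
                + " is " + smallestValidMax(min));
    }
}
```

Under these assumptions, multiplying both defaults by 8 does not give a valid pair, so the maximum has to be raised slightly above 8 times the default before it is passed to the Lucene41PostingsFormat constructor.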

Re: PostingsFormat block size

2015-01-27 Thread Mikhail Khludnev
Hello Trym, Can you clarify which blockSize you mean? And a second question, just to avoid unnecessary explanation: do you know what Pulsing is? On Tue, Jan 27, 2015 at 2:28 PM, Trym Møller wrote: > Hi > > I have successfully created a really cool Lucene41x8PostingsFormat class (a > copy of the Lu

PostingsFormat block size

2015-01-27 Thread Trym Møller
Hi I have successfully created a really cool Lucene41x8PostingsFormat class (a copy of the Lucene41PostingsFormat class modified to use 8 times the default block size) and registered the format as required. In the schema.xml I have created a field type string with this postings format and lastly I

Re: How to configure Solr PostingsFormat block size

2015-01-14 Thread Chris Hostetter
: As a foolish dev (not malicious I hope!), I did mess around with something : like this once; I was writing my own Codec. I found I had to create a file : called META-INF/services/org.apache.lucene.codecs.Codec in my solr plugin jar : that contained the fully-qualified class name of my codec: I

Re: How to configure Solr PostingsFormat block size

2015-01-14 Thread Michael Sokolov
As a foolish dev (not malicious I hope!), I did mess around with something like this once; I was writing my own Codec. I found I had to create a file called META-INF/services/org.apache.lucene.codecs.Codec in my solr plugin jar that contained the fully-qualified class name of my codec: I guess

Re: How to configure Solr PostingsFormat block size

2015-01-13 Thread Chris Hostetter
: This is starting to sound pretty complicated. Are you saying this is not : doable with Solr 4.10? it should be doable in 4.10, using a wrapper class like the one i mentioned below (delegating to Lucene41PostingsFormat instead of Lucene50PostingsFormat) ... it's just that the 4.10 APIs are dan

Re: How to configure Solr PostingsFormat block size

2015-01-13 Thread Tom Burton-West
Thanks Hoss, This is starting to sound pretty complicated. Are you saying this is not doable with Solr 4.10? >>...or at least: that's how it *should* work :) makes me a bit nervous about trying this on my own. Should I open a JIRA issue or am I probably the only person with a use case for repla

Re: How to configure Solr PostingsFormat block size

2015-01-13 Thread Chris Hostetter
: ...the nuts & bolts of it is that the PostingsFormat base class should take : care of all the SPI "name" registration that you need based on what you : pass to the super() construction ... although now that i think about it, : i'm not sure how you'd go about specifying your own name for the :

Re: How to configure Solr PostingsFormat block size

2015-01-13 Thread Chris Hostetter
: assuming I've written the subclass of the postings format, I need to tell : Solr to use it. : : Do I just do something like: : : the postingsFormat xml attribute in schema.xml just refers to the "name" of the PostingsFormat in SPI -- which is discussed in the PostingsFormat javadocs... https://luc

Re: How to configure Solr PostingsFormat block size

2015-01-13 Thread Tom Burton-West
Thanks Michael and Hoss, assuming I've written the subclass of the postings format, I need to tell Solr to use it. Do I just do something like: Is there a way to set this for all fieldtypes or would that require writing a custom CodecFactory? Tom On Mon, Jan 12, 2015 at 4:46 PM, Chris Hoste

Re: How to configure Solr PostingsFormat block size

2015-01-12 Thread Chris Hostetter
: It looks like this is a good starting point: : : http://wiki.apache.org/solr/SolrConfigXml#codecFactory The default "SchemaCodecFactory" already supports defining a different postings format per fieldType - but there isn't much in Solr to let you "tweak" individual options on specific posting form

Re: How to configure Solr PostingsFormat block size

2015-01-12 Thread Michael Sokolov
It looks like this is a good starting point: http://wiki.apache.org/solr/SolrConfigXml#codecFactory -Mike On 01/12/2015 03:37 PM, Tom Burton-West wrote: Hello all, Our indexes have around 3 billion unique terms, so for Solr 3, we set TermIndexInterval to about 8 times the default. The net ef

How to configure Solr PostingsFormat block size

2015-01-12 Thread Tom Burton-West
Hello all, Our indexes have around 3 billion unique terms, so for Solr 3, we set TermIndexInterval to about 8 times the default. The net effect of this is to reduce the size of the in-memory index to about 1/8th of what it would otherwise be. (For background see http://www.hathitrust.org/blogs/large-scale-search/too-many
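
As a rough illustration of the arithmetic behind that setting, the sketch below estimates how many terms end up in the in-memory terms index for a given interval. It assumes the Lucene 3.x behaviour of keeping every Nth term in memory and a default interval of 128, and it counts entries only, ignoring the bytes each entry occupies. The class and method names are hypothetical.

```java
public class TermIndexEstimate {
    // Assumed Lucene 3.x default term index interval.
    static final int DEFAULT_INTERVAL = 128;

    // Number of terms held in memory when every `interval`-th term is indexed
    // (ceiling division, so a partial final group still counts as one entry).
    static long indexedTerms(long uniqueTerms, int interval) {
        return (uniqueTerms + interval - 1) / interval;
    }

    public static void main(String[] args) {
        long uniqueTerms = 3_000_000_000L;        // "around 3 billion unique terms"
        int scaledInterval = DEFAULT_INTERVAL * 8; // "about 8 times the default"

        long withDefault = indexedTerms(uniqueTerms, DEFAULT_INTERVAL);
        long withScaled = indexedTerms(uniqueTerms, scaledInterval);

        System.out.println("in-memory terms, default interval: " + withDefault);
        System.out.println("in-memory terms, 8x interval:      " + withScaled);
        System.out.println("ratio: " + (double) withScaled / withDefault);
    }
}
```

The ratio comes out at roughly one eighth, which is the memory saving the question wants to reproduce with the block-based terms dictionary in Solr 4.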