Hi Lance,

The use case is "keyword extraction", and the keywords can be 2- and
3-grams (2- and 3-word phrases). With a vocabulary of 10,000 words we
could theoretically have 10,000^3 = 1,000,000,000,000 3-grams for English
alone. My suggestion, of course, is to use statistics to build a
dictionary of such 3-word combinations (remove the top, remove the tail,
based on frequencies) and to hard-limit this dictionary to 1,000,000
entries. That was a business requirement which is technically impossible
to implement (as real-time query results); we don't even use word
stemming, etc.
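
To make the dictionary idea concrete, here is a minimal Python sketch of frequency-based pruning. It is only an illustration: the function names, the whitespace tokenizer, and the cut-off parameters (`drop_top`, `min_count`) are assumptions, not anything we run in production; only the 1,000,000 hard limit comes from the text above.

```python
from collections import Counter
from itertools import islice


def ngrams(tokens, n):
    """Yield n-grams as tuples of n consecutive tokens."""
    return zip(*(islice(tokens, i, None) for i in range(n)))


def build_dictionary(docs, sizes=(2, 3), drop_top=1000, min_count=2,
                     max_size=1_000_000):
    """Count 2- and 3-grams, then prune by frequency.

    drop_top  -- number of most frequent n-grams to discard ("remove top")
    min_count -- n-grams seen fewer times are discarded ("remove tail")
    max_size  -- hard limit on the number of dictionary entries
    """
    counts = Counter()
    for text in docs:
        tokens = text.lower().split()  # naive tokenizer, no stemming
        for n in sizes:
            counts.update(ngrams(tokens, n))

    ranked = counts.most_common()  # most frequent first
    kept = [(gram, c) for gram, c in ranked[drop_top:] if c >= min_count]
    return dict(kept[:max_size])
```

Indexing only the n-grams that survive this pruning would keep the number of distinct terms in the field bounded by `max_size`, whatever the theoretical 10,000^3 upper bound is.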




-Fuad




On 12-08-20 7:22 PM, "Lance Norskog" <goks...@gmail.com> wrote:

>Is this required by your application? Is there any way to reduce the
>number of terms?
>
>A workaround is to use shards. If your terms follow Zipf's Law, each
>shard will have fewer than the complete number of terms. For N shards,
>each shard will have ~1/N of the singleton terms. For 2-count terms,
>1/N or 2/N of the shards will have that term.
>
>Now I'm interested but not mathematically capable: what is the general
>probabilistic formula for splitting Zipf's Law across shards?
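
One possible answer, sketched under a simplifying assumption: if every occurrence of a term lands on one of N shards uniformly and independently, a term that occurs c times is present on a given shard with probability 1 - (1 - 1/N)^c. For c = 1 that is 1/N, which matches the singleton case above. Summing this probability over all terms gives the expected number of distinct terms per shard. The helper names and the idealized Zipf counts (count proportional to 1/rank) below are illustrative assumptions.

```python
def p_term_on_shard(count, shards):
    """Probability that a term with `count` occurrences appears on one
    given shard, assuming occurrences are placed uniformly at random."""
    return 1.0 - (1.0 - 1.0 / shards) ** count


def expected_terms_per_shard(counts, shards):
    """Expected number of distinct terms on a single shard."""
    return sum(p_term_on_shard(c, shards) for c in counts)


def zipf_counts(vocab_size, top_count):
    """Idealized Zipf distribution: the term of rank r occurs about
    top_count / r times (at least once)."""
    return [max(1, round(top_count / rank))
            for rank in range(1, vocab_size + 1)]


if __name__ == "__main__":
    counts = zipf_counts(vocab_size=1_000_000, top_count=100_000)
    for n in (1, 2, 4, 8, 16):
        print(n, round(expected_terms_per_shard(counts, n)))
```

Because the long tail of a Zipf distribution is dominated by low-count terms, the per-shard term count drops quickly as N grows, while the few high-count terms still show up on nearly every shard.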
>
>On Mon, Aug 20, 2012 at 3:51 PM, Jack Krupansky <j...@basetechnology.com>
>wrote:
>> It appears that there is a hard limit of 24 bits, or 16M, for the number
>> of bytes to reference the terms in a single field of a single document.
>> It takes 1, 2, 3, 4, or 5 bytes to reference a term. If it took 4 bytes,
>> that would allow 16/4 or 4 million unique terms per document. Do you have
>> such large documents? This appears to be a hard limit based on 24 bits in
>> a Java int.
>>
>> You can try facet.method=enum, but that may be too slow.
>>
>> What release of Solr are you running?
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Fuad Efendi
>> Sent: Monday, August 20, 2012 4:34 PM
>> To: Solr-User@lucene.apache.org
>> Subject: UnInvertedField limitations
>>
>>
>> Hi All,
>>
>>
>> I have a problem... (Yonik, please!) Help me: what are the term count
>> limits? I possibly have 256,000,000 different terms in a field... or
>> 16,000,000?
>>
>> Thanks!
>>
>>
>> 2012-08-20 16:20:19,262 ERROR [solr.core.SolrCore] - [pool-1-thread-1] - :
>> org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field enrich_keywords_string_mv
>>        at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:179)
>>        at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:668)
>>        at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:326)
>>        at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:423)
>>        at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:206)
>>        at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:85)
>>        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
>>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
>>
>>
>>
>>
>> --
>> Fuad Efendi
>> http://www.tokenizer.ca
>>
>>
>>
>
>
>
>-- 
>Lance Norskog
>goks...@gmail.com

