Hi Lance,
Use case is "keyword extraction", and it could be 2- and 3-grams (2- and 3-word phrases), so with a 10,000-word vocabulary we could theoretically have 10,000^3 = 1,000,000,000,000 3-grams for English alone... Of course, my suggestion is to use statistics and build a dictionary of such 3-word combinations (remove the top, remove the tail, using frequencies)... and to hard-limit this dictionary to 1,000,000 entries... That was a business requirement which is technically impossible to implement (as real-time query results); we don't even use word stemming, etc...

-Fuad

On 12-08-20 7:22 PM, "Lance Norskog" <goks...@gmail.com> wrote:

>Is this required by your application? Is there any way to reduce the
>number of terms?
>
>A workaround is to use shards. If your terms follow Zipf's Law, each
>shard will have fewer than the complete number of terms. For N shards,
>each shard will have ~1/N of the singleton terms. For 2-count terms,
>1/N or 2/N of the shards will have that term.
>
>Now I'm interested but not mathematically capable: what is the general
>probabilistic formula for splitting Zipf's Law across shards?
>
>On Mon, Aug 20, 2012 at 3:51 PM, Jack Krupansky <j...@basetechnology.com>
>wrote:
>> It appears that there is a hard limit of 24 bits, or 16M, for the number
>> of bytes used to reference the terms in a single field of a single
>> document. It takes 1, 2, 3, 4, or 5 bytes to reference a term. If it took
>> 4 bytes, that would allow 16M/4, or 4 million, unique terms per document.
>> Do you have such large documents? This appears to be a hard limit based
>> on 24 bits of a Java int.
>>
>> You can try facet.method=enum, but that may be too slow.
>>
>> What release of Solr are you running?
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: Fuad Efendi
>> Sent: Monday, August 20, 2012 4:34 PM
>> To: Solr-User@lucene.apache.org
>> Subject: UnInvertedField limitations
>>
>> Hi All,
>>
>> I have a problem... (Yonik, please!) Help me: what is the term count
>> limit? Can I possibly have 256,000,000 different terms in a field... or
>> only 16,000,000?
>>
>> Thanks!
>>
>> 2012-08-20 16:20:19,262 ERROR [solr.core.SolrCore] - [pool-1-thread-1] - :
>> org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field enrich_keywords_string_mv
>>     at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:179)
>>     at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:668)
>>     at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:326)
>>     at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:423)
>>     at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:206)
>>     at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:85)
>>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
>>
>> --
>> Fuad Efendi
>> http://www.tokenizer.ca
>
>--
>Lance Norskog
>goks...@gmail.com
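Fuad's suggestion at the top of the thread (count 2- and 3-grams, drop the most frequent and the rarest, then cap the dictionary at 1,000,000 entries) can be sketched in Java roughly as follows. This is a hypothetical illustration, not code from the thread: the class name NgramDictionary and the dropTop/minCount/maxSize parameters are assumptions, and "remove top" is modeled simply as dropping a fixed number of the most frequent n-grams.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: a bounded 2-/3-gram dictionary pruned by frequency
 * ("remove top, remove tail"), then hard-limited to maxSize entries.
 */
public class NgramDictionary {

    /** Counts every 2-gram and 3-gram (tokens joined by a space) across all documents. */
    static Map<String, Integer> countNgrams(List<List<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> tokens : docs) {
            for (int n = 2; n <= 3; n++) {
                for (int i = 0; i + n <= tokens.size(); i++) {
                    String gram = String.join(" ", tokens.subList(i, i + n));
                    counts.merge(gram, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /**
     * Sorts by descending count (ties broken alphabetically), skips the dropTop most
     * frequent n-grams, discards those seen fewer than minCount times, and keeps at
     * most maxSize of what remains.
     */
    static List<String> build(Map<String, Integer> counts, int dropTop, int minCount, int maxSize) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .skip(dropTop)
                .filter(e -> e.getValue() >= minCount)
                .limit(maxSize)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("solr", "facet", "query"),
                List.of("solr", "facet", "query"),
                List.of("solr", "facet", "limit"));
        Map<String, Integer> counts = countNgrams(docs);
        // Hard limit of 1,000,000 entries, as in the business requirement.
        System.out.println(build(counts, 1, 2, 1_000_000)); // prints [facet query, solr facet query]
    }
}
```

On a real corpus, dropTop and minCount would have to be tuned from the observed frequency distribution; the values above only exercise the logic.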
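On Lance's question: one simple model (an assumption, not something established in the thread) is that every occurrence of a term is in a different document and documents are assigned to N shards uniformly at random. A term found in c documents is then present in a given shard with probability 1 - (1 - 1/N)^c, which is 1/N for singletons and 2/N - 1/N^2 (between 1/N and 2/N) for 2-count terms, in line with Lance's estimate. Summing that probability over a Zipf-like count distribution gives the expected number of distinct terms per shard; the class and parameter names below are hypothetical.

```java
/**
 * Hypothetical sketch: expected distinct terms per shard when term document
 * counts follow a Zipf-like law. Assumes each occurrence of a term is in a
 * different document and documents land on shards uniformly at random.
 */
public class ZipfShards {

    /** Probability that a term occurring in c documents appears in one given shard out of n. */
    static double presence(long c, int n) {
        return 1.0 - Math.pow(1.0 - 1.0 / n, c);
    }

    /**
     * Expected distinct terms in one shard for a vocabulary of vocab terms, where
     * the term at rank r occurs in max(1, round(topCount / r^s)) documents.
     */
    static double expectedTermsPerShard(int vocab, long topCount, double s, int n) {
        double sum = 0.0;
        for (int rank = 1; rank <= vocab; rank++) {
            long c = Math.max(1L, Math.round(topCount / Math.pow(rank, s)));
            sum += presence(c, n);
        }
        return sum;
    }

    public static void main(String[] args) {
        int vocab = 1_000_000;
        for (int n : new int[] {1, 2, 4, 8}) {
            double perShard = expectedTermsPerShard(vocab, 1_000_000L, 1.0, n);
            System.out.printf("N=%d: ~%.0f distinct terms per shard (%.1f%% of %d)%n",
                    n, perShard, 100.0 * perShard / vocab, vocab);
        }
    }
}
```

Frequent terms still land on nearly every shard, so under this model sharding mainly thins out the long tail of rare terms.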
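Jack's arithmetic can be checked with a short sketch. It assumes each term reference is written as a variable-length int with 7 payload bits per byte, which is one encoding that yields the 1 to 5 byte range he mentions; the thread does not say exactly how UnInvertedField encodes term numbers, and TermRefBudget is a hypothetical name.

```java
/**
 * Hypothetical sketch of the arithmetic behind the 24-bit limit, assuming term
 * references are variable-length ints with 7 payload bits per byte.
 */
public class TermRefBudget {

    /** Bytes needed to write a non-negative int at 7 bits per byte: 1 to 5. */
    static int vIntLength(int value) {
        if (value < 0) throw new IllegalArgumentException("negative: " + value);
        int bytes = 1;
        while ((value & ~0x7F) != 0) { // more than 7 significant bits remain
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    /** How many references of the given size fit in a 24-bit (16 MiB) byte budget. */
    static int maxRefs(int bytesPerRef) {
        final int budget = 1 << 24; // 16,777,216 bytes addressable with a 24-bit offset
        return budget / bytesPerRef;
    }

    public static void main(String[] args) {
        int[] samples = {0, 127, 128, 16_383, 16_384, 2_097_152, 256_000_000, Integer.MAX_VALUE};
        for (int v : samples) {
            System.out.printf("%,d -> %d byte(s)%n", v, vIntLength(v));
        }
        System.out.printf("4-byte refs in 16M: %,d%n", maxRefs(4));
    }
}
```

Under this assumption, a term number as large as 256,000,000 would take 4 bytes if stored as an absolute value, which is the case that leads to the roughly 4 million figure.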
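For Jack's facet.method=enum suggestion, a faceting request might carry parameters like the ones below. The q, rows, facet, facet.field, facet.method, facet.limit and facet.mincount names are standard Solr request parameters; the helper class EnumFacetParams and the chosen values are illustrative assumptions, and whether enum is fast enough for this many terms is exactly Jack's caveat.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper that builds a URL-encoded /select query string for enum faceting. */
public class EnumFacetParams {

    /** Joins parameters as key=value pairs, URL-encoding both sides, in map iteration order. */
    static String queryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("rows", "0");                                // only facet counts are wanted
        p.put("facet", "true");
        p.put("facet.field", "enrich_keywords_string_mv"); // field from the stack trace
        // enum walks the field's terms and intersects each term's documents with the
        // results (using the filterCache) instead of building an UnInvertedField.
        p.put("facet.method", "enum");
        p.put("facet.limit", "100");
        p.put("facet.mincount", "1");
        System.out.println("/select?" + queryString(p));
    }
}
```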