Re: SolrCloud indexing

2015-05-09 Thread Shawn Heisey
On 5/9/2015 8:41 PM, Bill Au wrote: > Is the behavior of document being indexed independently on each node in a > SolrCloud cluster new in 5.x or is that true in 4.x also? > > If the document is indexed independently on each node, then if I query the > document from each node directly, a timestamp

Re: SolrCloud indexing

2015-05-09 Thread Bill Au
Is the behavior of document being indexed independently on each node in a SolrCloud cluster new in 5.x or is that true in 4.x also? If the document is indexed independently on each node, then if I query the document from each node directly, a timestamp could hold different values since the documen

Re: determine "big" documents in the index?

2015-05-09 Thread Erick Erickson
1> Right, shingles (and you've set max size to 3) produce a bazillion possibilities, so the sky's the limit. It's usually smaller than that since some patterns of words aren't very likely, but it's still a big number. I'd really take a look at the terms that are actually indexed with TermsComponent or sim
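To make the point about shingles concrete, here is a minimal sketch of how word n-grams multiply the number of distinct terms. It is plain Python and not Solr's shingle filter; the `shingles` helper and its `min_size`/`max_size` parameters are hypothetical names chosen for illustration, and single-word terms are deliberately left out.

```python
def shingles(tokens, min_size=2, max_size=3):
    """Return word n-grams (shingles) of min_size..max_size tokens.

    Illustrative only: this mimics the idea of shingling, not the
    exact output of any Solr/Lucene filter.
    """
    out = []
    for n in range(min_size, max_size + 1):
        # Slide a window of n tokens across the input.
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


tokens = "the quick brown fox".split()
for term in shingles(tokens):
    print(term)
```

Even four words yield five shingles here, and across a large corpus most two- and three-word combinations are distinct, which is why the term count grows so quickly.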

Re: JSON Facet & Analytics API in Solr 5.1

2015-05-09 Thread Yonik Seeley
curl -g "http://localhost:8983/solr/techproducts/query?q=*:*&json.facet={cats:{terms:{field:cat,sort:'count+asc'}}}" Using curl with everything in the URL is definitely trickier. Everything needs to be URL escaped. If it's not, curl will often silently do nothing. For example, when I had sort:'c
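One way to sidestep hand-escaping is to let a library build the query string. The sketch below uses Python's standard `urllib.parse` against the same techproducts URL from the message; the `facet_url` helper is a hypothetical name, and the decoded sort value `'count asc'` assumes the `+` in the original URL stands for an encoded space.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from the curl example above.
BASE = "http://localhost:8983/solr/techproducts/query"


def facet_url(base, q, json_facet):
    """Build a request URL with q and json.facet fully URL-escaped."""
    return base + "?" + urlencode({"q": q, "json.facet": json_facet})


# Assumption: 'count+asc' in the original URL is the encoded form of 'count asc'.
facet = "{cats:{terms:{field:cat,sort:'count asc'}}}"
url = facet_url(BASE, "*:*", facet)
print(url)

# Round-trip check: decoding the query string gives back the facet spec.
print(parse_qs(urlsplit(url).query)["json.facet"][0])
```

Because every brace, colon, and quote is percent-encoded, the resulting URL can be passed to an HTTP client without relying on the client's own globbing or escaping rules.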

Re: indexing java byte code in classes / jars

2015-05-09 Thread Mark
Hi Alexandre, Solr & ASM is the exact problem I'm looking to hack about with, so I'm keen to consider any code no matter how ugly or broken. Regards Mark On 9 May 2015 at 10:21, Alexandre Rafalovitch wrote: > If you only have classes/jars, use ASM. I have done this before, have some > ugly cod

Re: indexing java byte code in classes / jars

2015-05-09 Thread Alexandre Rafalovitch
If you only have classes/jars, use ASM. I have done this before, have some ugly code to share if you want. If you have sources, javadoc 8 is a good way too. I am doing that now for solr-start.com, code on Github. Regards, Alex On 9 May 2015 7:09 am, "Mark" wrote: > To answer why bytecode -

AW: determine "big" documents in the index?

2015-05-09 Thread Clemens Wyss DEV
> If you used shingles I do: >http://lucidworks.com/blog/indexing-with-solrj/ This is more or less what I do >2> you have a lot of garbage in your input. >OCR is notorious for this,as are binary blobs. What does the AutodetectParser return in case of