Re: Exception when optimizing index

2012-06-10 Thread Rok Rejc
Hi all, I have run CheckIndex. It seems that the index is corrupted. I've got plenty of exceptions like: test: terms, freq, prox...ERROR: java.lang.ArrayIndexOutOfBoundsException java.lang.ArrayIndexOutOfBoundsException at org.apache.lucene.store.ByteArrayDataInput.readBytes(ByteArrayDa

Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-10 Thread Aaron Daubman
Hoss, The new FieldValueSubsetUpdateProcessorFactory classes look phenomenal. I haven't looked yet, but what are the chances these will be back-ported to 3.6 (or how hard would it be to backport them?)... I'll have to check out the source in more detail. If stuck on 3.6, what would be the best wa

Re: What would cause: "SEVERE: java.lang.ClassCastException: com.company.MyCustomTokenizerFactory cannot be cast to org.apache.solr.analysis.TokenizerFactory"

2012-06-10 Thread Aaron Daubman
Jack, Thanks - this was indeed the issue. I still don't understand exactly why (the same local-nexus-hosted Solr jars were the ones being duplicated on the classpath: included in my custom -with-dependencies jars as well as in the solr war, which was built, distributed, and hosted from the same nexu

Issues with whitespace tokenization in QueryParser

2012-06-10 Thread John Berryman
According to https://issues.apache.org/jira/browse/LUCENE-2605, the Lucene QueryParser tokenizes on white space before giving any text to the Analyzer. This makes it impossible to use multi-term synonyms because the SynonymFilter only receives one word at a time. Resolution to this would really he
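The effect described above can be sketched with a toy model. This is not Lucene code: `synonym_filter`, `analyze_whole`, `analyze_presplit`, and the synonym table are hypothetical names made up for illustration, and the only assumption carried over from the message is that the parser splits on whitespace before the analysis chain sees the text.

```python
# Hypothetical synonym table: one multi-term rule, for illustration only.
SYNONYMS = {("domain", "name", "system"): ["dns"]}


def synonym_filter(tokens):
    """Toy stand-in for a synonym filter.

    At each position, try the longest run of tokens that matches a rule;
    on a match, emit the matched tokens followed by their synonyms.
    """
    out = []
    i = 0
    while i < len(tokens):
        for length in range(len(tokens) - i, 0, -1):
            key = tuple(tokens[i:i + length])
            if key in SYNONYMS:
                out.extend(key)
                out.extend(SYNONYMS[key])
                i += length
                break
        else:
            # No rule starts here; pass the token through unchanged.
            out.append(tokens[i])
            i += 1
    return out


def analyze_whole(text):
    """Analyze the full text at once, so multi-term rules can match."""
    return synonym_filter(text.lower().split())


def analyze_presplit(text):
    """Mimic a parser that splits on whitespace first and analyzes each
    chunk separately: the filter only ever sees one word at a time."""
    out = []
    for chunk in text.split():
        out.extend(synonym_filter([chunk.lower()]))
    return out


whole = analyze_whole("Domain Name System")
presplit = analyze_presplit("Domain Name System")
```

In this sketch `whole` picks up the `dns` synonym while `presplit` does not, because no single chunk ever matches the three-word rule.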

Re: How to do custom sorting in Solr?

2012-06-10 Thread roz dev
Yes, these documents have lots of unique values, as the same product could be assigned to lots of other categories, each with a different sort order. We did some evaluation of heap usage and found that with the kind of queries we generate, heap usage was going up to 24-26 GB. I could trace it to

Re: How to do custom sorting in Solr?

2012-06-10 Thread Erick Erickson
2M docs is actually pretty small. Sorting is sensitive to the number of _unique_ values in the sort fields, not necessarily the number of documents. And sorting only works on fields with a single value (i.e. it can't have more than one token after analysis). So for each field you're only talking 2

Re: How to do custom sorting in Solr?

2012-06-10 Thread roz dev
Thanks Erick for your quick feedback. When Products are assigned to a category or Sub-Category then they can be in any order and price type can be regular or markdown. So, reg and markdown products are intermingled as per their assignment but I want to sort them in such a way that we ensure that al

Building a heat map from geo data in index

2012-06-10 Thread Jamie Johnson
I had a request from a customer that is unlike much I have seen so far, so I figured I'd pose the question here. I've been asked if it was possible to build a heat map from the results of a query. I can imagine a process to do this through some post processing, but that sounds very expen

Re: x most similar documents

2012-06-10 Thread Jack Krupansky
Oops, I said "MLT will use the first search result from the original query", but that is for the MLT handler. For the MLT component you get a separate set of documents for each document in the results of the original query. -- Jack Krupansky -Original Message- From: Jack Krupansky Se

Re: x most similar documents

2012-06-10 Thread Jack Krupansky
Yes, it sounds like MLT is the way to go, but sometimes you have to get creative in figuring out how to set the numerous parameters. And sometimes you have to use the MLT request handler rather than /select with the MLT component. You might also encounter issues related to the shortness of the

Re: How to do custom sorting in Solr?

2012-06-10 Thread Erick Erickson
Skimming this, two options come to mind: 1> Simply apply primary, secondary, etc. sorts. Something like &sort=subcategory asc,markdown_or_regular desc,sort_order asc 2> You could also use grouping to arrange things in groups and sort within those groups. This has the advantage of return
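A minimal sketch of option 1 as a client-side helper, assuming the field names quoted in the message (`subcategory`, `markdown_or_regular`, `sort_order`) exist in the schema. `build_sort_query` is a hypothetical helper, not part of any Solr client library; it only assembles the URL-encoded query string.

```python
from urllib.parse import urlencode, parse_qs


def build_sort_query(q, sort_clauses):
    """Build URL-encoded query parameters with a multi-level sort.

    sort_clauses is an ordered list of (field, direction) pairs; the
    first pair is the primary sort, the next the secondary, and so on.
    """
    sort_value = ",".join(f"{field} {direction}" for field, direction in sort_clauses)
    return urlencode({"q": q, "sort": sort_value})


# Field names taken from the message above; the match-all query is an assumption.
params = build_sort_query(
    "*:*",
    [("subcategory", "asc"), ("markdown_or_regular", "desc"), ("sort_order", "asc")],
)

# Decode again to check what the server would receive.
decoded = parse_qs(params)
```

The resulting string can be appended to a select URL; the spaces and commas in the sort value are percent-encoded by `urlencode`, so `decoded["sort"]` round-trips to the readable form.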

x most similar documents

2012-06-10 Thread Benjamin Murauer
Hi there, I have a Solr server running that contains tweets. My schema.xml contains the following fields: My problem is actually quite simple; somewhere in my GUI the user types text and I want to retrieve the tweets that are most similar to it. Therefore, I tried the "morelikethis" functionality. M

Re: How to do custom sorting in Solr?

2012-06-10 Thread roz dev
Hi All, I have an index which contains a Catalog of Products and Categories, with Solr 4.0 from trunk. Data is organized like this: Category: Books, Sub Category: Programming, Products: Product # 1, Price: Regular, Sort Order: 1; Product # 2, Price: Markdown, Sort Order: 2