find documents based on specific term frequency

2015-08-26 Thread Tang, Rebecca
Hi there, We have an index built on Solr 5.0. We received a user question: "Is there a way to search for documents that have a word appearing more than a certain number of times? For example, I want to find documents that only have more than 10 instances of the word "genetics" …" I'm not sure
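
A minimal sketch of one way such a search is often expressed in Solr: a function range filter (`frange`) over the `termfreq` function. The field name `text` and the core URL `http://localhost:8983/solr/docs` are assumptions for illustration and do not come from the thread. `termfreq` counts the indexed term, so the result depends on how the field is analyzed (lowercasing, stemming, and so on).

```python
from urllib.parse import urlencode


def termfreq_filter(field, term, min_count):
    """Return an fq string keeping documents where `term` occurs
    at least `min_count` times in `field`."""
    if "'" in term or "\\" in term:
        # Keep the sketch simple: refuse terms that would need escaping.
        raise ValueError("term must not contain quotes or backslashes")
    return "{!frange l=%d}termfreq(%s,'%s')" % (min_count, field, term)


def build_query_url(base_url, field, term, more_than):
    """Build a select URL for documents with MORE than `more_than` occurrences."""
    params = {
        "q": "*:*",
        # "more than 10" on an integer count is the same as "at least 11".
        "fq": termfreq_filter(field, term, more_than + 1),
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical core URL and field name.
    print(build_query_url("http://localhost:8983/solr/docs", "text", "genetics", 10))
```

The lower bound is passed as `l=11` rather than relying on an exclusive bound, which keeps the filter string easy to read.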

boolean operators OR/NOT get highlighted by solr

2015-05-11 Thread Tang, Rebecca
Hi, We have a SOLR query like this q=ddmdate%3A2012-05-01T00%3A00%3A00Z+NOT+dddate%3A2010-06-11T00%3A00%3A00Z&wt=json&indent=true&hl=true&hl.simple.pre=%3Ch1%3E&hl.simple.post=%3C%2Fh1%3E&hl.requireFieldMatch=true&hl.preserveMulti=true&hl.fl=ot&f.ot.hl.fragsize=300&f.ot.hl.alternateField=ot&f.ot.

solr bug 6143 (facet count and CollapsingQParserPlugin)

2015-03-02 Thread Tang, Rebecca
We use the CollapsingQParser to group possible duplicate records. We are running into the issue reported by bug 6143. CollapsingQParser only supports facet.truncate but it returns counts that confuse our customers. What we need is group.facets. I wanted to check if a "new feature" bug has b

Re: how to debug solr performance degradation

2015-02-27 Thread Tang, Rebecca
eries, but if you want to >> force/preload the index into memory you could try doing something like >> > >> > cat `find /path/to/solr/index` > /dev/null >> > >> > >> > if you haven't already reviewed the following, you might take a look >>here
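
The `cat ... > /dev/null` idea quoted above reads every index file once so the operating system can keep it in its page cache. A rough Python equivalent is sketched below, assuming the index directory is readable by the current user; the function name and chunk size are illustrative choices. It only helps if the machine has enough free memory to hold the files in cache.

```python
import os


def warm_page_cache(index_dir, chunk_size=1 << 20):
    """Read every regular file under index_dir and discard the bytes.

    Returns the total number of bytes read, which is a quick sanity
    check that the expected index directory was walked.
    """
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as fh:
                while True:
                    chunk = fh.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total


if __name__ == "__main__":
    # Placeholder path taken from the quoted message.
    print(warm_page_cache("/path/to/solr/index"))
```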

RE: how to debug solr performance degradation

2015-02-25 Thread Tang, Rebecca
_ From: Shawn Heisey [apa...@elyograg.org] Sent: Tuesday, February 24, 2015 5:23 PM To: solr-user@lucene.apache.org Subject: Re: how to debug solr performance degradation On 2/24/2015 5:45 PM, Tang, Rebecca wrote: > We gave the machine 180G mem to see if it improves performance.

Re: how to debug solr performance degradation

2015-02-24 Thread Tang, Rebecca
)? Rebecca Tang Applications Developer, UCSF CKM Industry Documents Digital Libraries E: rebecca.t...@ucsf.edu On 2/24/15 12:44 PM, "Shawn Heisey" wrote: >On 2/24/2015 1:09 PM, Tang, Rebecca wrote: >> Our solr index used to perform OK on our beta production box (any

how to debug solr performance degradation

2015-02-24 Thread Tang, Rebecca
Our solr index used to perform OK on our beta production box (anywhere between 0-3 seconds to complete any query), but today I noticed that the performance is very bad (queries take between 12 – 15 seconds). I haven't updated the solr index configuration (schema.xml/solrconfig.xml) lately. All

RE: edismax removes query string: (pg_int:-1) becomes ()

2015-02-21 Thread Tang, Rebecca
Or escape the minus with a backslash: (pg_int:\-1) Also, what is the field and field type for pg_int? The edismax query parser has a few too many parsing heuristics, causing way too many odd combinations that are not exhaustively tested. -- Jack Krupansky On Sat, Feb 21, 2015 at 5:43 PM,
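
The backslash escape suggested above can be applied on the client before the query is sent. A minimal sketch, assuming the value is a single term rather than a full query expression; the helper name `escape_term` is hypothetical, and the character set follows the special characters listed in the Lucene query syntax.

```python
import re

# Characters that have special meaning in the Lucene/Solr query syntax.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(value):
    """Prefix every query-syntax special character with a backslash."""
    return _SPECIAL.sub(r"\\\1", value)


if __name__ == "__main__":
    # Builds the escaped form suggested in the reply: pg_int:\-1
    print("pg_int:" + escape_term("-1"))
```

Only the value is escaped; the field name and the colon that separates it from the value are added afterwards so they keep their syntactic meaning.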

edismax removes query string: (pg_int:-1) becomes ()

2015-02-21 Thread Tang, Rebecca
Hi there, I have a field pg_int which is number of pages stored as integer. There are 118 records in my index with pg_int = -1. If I search the index with pg_int:-1, I get the correct records returned in the results. { "responseHeader": { "status": 0, "QTime": 1, "params": { "debugQuery": "tr

what order does solr return the results in if the search is *:*

2015-02-19 Thread Tang, Rebecca
If user searches for *:*, what order does solr return the results in? I expected the results to be returned in index order. (I indexed the documents in the order of the numeric document id from 0 -> ~15,000,000). So when I searched with *:*, I expected the first 10 documents returned to have

Re: Old facet value doesn't go away after index update

2014-12-22 Thread Tang, Rebecca
Thank you for the explanation! Rebecca Tang Applications Developer, UCSF CKM Industry Documents Digital Libraries E: rebecca.t...@ucsf.edu On 12/19/14 12:37 PM, "Shawn Heisey" wrote: >On 12/19/2014 11:22 AM, Tang, Rebecca wrote: >> I have an index that has a field call

Old facet value doesn't go away after index update

2014-12-19 Thread Tang, Rebecca
Hi there, I have an index that has a field called collection_facet. There was a value 'Ness Motley Law Firm Documents' that we wanted to update to 'Ness Motley Law Firm'. There were 36,132 records with this value. So I re-indexed just the 36,132 records. After the update, I ran a facet query

Is there a way to stop some hyphenated terms from being tokenized

2014-11-05 Thread Tang, Rebecca
Hi there, For some hyphenated terms, I want them to stay as is instead of being tokenized. For example: e-cigarette, e-cig, I-pad. I don't want them to be split into e and cig or I and pad because the single letters e and I produce too many false-positive matches. Is there a way to tell the

plans for CollapsingQParser to support untruncated facet count

2014-10-24 Thread Tang, Rebecca
We are using collapsingQParser to group results for collapsing possible duplicate records. We have a signature field that the collapsingQParser acts on to group the results. However, when we facet on top of it, we get facet counts that are seemingly wrong. For example, if we have 4 records:

Is it possible to replicate just the solrconfig.xml file

2014-10-09 Thread Tang, Rebecca
I have a master-slave setup. Most of the time when I replicate, I want to replicate the index as well as some of the config files like schema.xml, solrconfig.xml, etc. I have this set up and it works well. But sometimes, I make a small tweak to solrconfig.xml and deploy it to the master. Af

Re: OOM during indexing nested docs

2014-06-25 Thread Tang, Rebecca
How big is your request size from client to server? I ran into OOM problems too. For me the reason was that I was sending big requests (1+ docs) at too fast a pace. So I put a throttle on the client to control the throughput of the request it sends to the server, and that got rid of the OOM e
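
The message does not say how the throttle was implemented. One simple way to pace a client is to split the documents into batches and enforce a minimum interval between sends, as sketched below. The `send` callable, the batch size, and the interval are all hypothetical; in practice `send` would wrap whatever client call posts a batch to Solr.

```python
import time


def chunked(docs, size):
    """Yield consecutive slices of `docs` with at most `size` items each."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]


def send_throttled(batches, send, min_interval=0.5,
                   sleep=time.sleep, clock=time.monotonic):
    """Call send(batch) for each batch, waiting so that consecutive sends
    are at least `min_interval` seconds apart. Returns the number of
    batches sent. `sleep` and `clock` are injectable for testing."""
    sent = 0
    last = None
    for batch in batches:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        send(batch)
        last = clock()
        sent += 1
    return sent


if __name__ == "__main__":
    docs = [{"id": str(i)} for i in range(10)]
    # Stand-in for a real update request: just report the batch size.
    send_throttled(chunked(docs, 4), lambda b: print(len(b)), min_interval=0.1)
```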

Re: faceting performance on fields with high-cardinality

2014-06-13 Thread Tang, Rebecca
to see if it further improves the performance. Rebecca Tang Applications Developer, UCSF CKM Legacy Tobacco Document Library E: rebecca.t...@ucsf.edu On 6/13/14 1:24 PM, "Toke Eskildsen" wrote: >Tang, Rebecca [rebecca.t...@ucsf.edu] wrote: >> I have a Solr index with 1

Re: solr server early EOF errors

2014-06-13 Thread Tang, Rebecca
tions Developer, UCSF CKM Legacy Tobacco Document Library E: rebecca.t...@ucsf.edu On 6/13/14 11:57 AM, "Shawn Heisey" wrote: >On 6/13/2014 12:06 PM, Tang, Rebecca wrote: >> I've been working with this issue for a while and I really don’t know >>what the root cause

solr server early EOF errors

2014-06-13 Thread Tang, Rebecca
Hi there, I've been working with this issue for a while and I really don’t know what the root cause is. Any insight would be great! I have 14 million records in a mysql DB. I grab 100,000 records from the DB at a time and then use ConcurrentUpdateSolrServer (with queue size = 50 and thread c

faceting performance on fields with high-cardinality

2014-06-12 Thread Tang, Rebecca
Hi there, I have a Solr index with 14+ million records. We facet on quite a few fields with very high cardinality such as author, person, organization, brand and document type. Some of the records contain thousands of persons and organizations. So the person and organization fields can be v

How to search one field and highlight another

2014-04-02 Thread Tang, Rebecca
Hi there, For dates we create two Solr fields: date_display and date. date_display: stored = true, indexed = false; it's for display purposes only. date: stored = false, indexed = true; it's used for searching, ordering and faceting. When users search on date, I need to be able to highlight date_di