PatternTokenizer failure

2011-11-28 Thread Jay Luker
Hi all, I'm trying to use PatternTokenizer and not getting the expected results. Not sure where the failure lies. What I'm trying to do is split my input on whitespace except in cases where the whitespace is preceded by a hyphen character. So to do this I'm using a negative lookbehind assertion in th
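The split described in this message (break on whitespace unless a hyphen comes immediately before it) can be sketched with plain java.util.regex. This is only an illustration under assumptions: the poster's actual pattern and PatternTokenizer configuration are cut off in the archive, and the class name HyphenAwareSplit and the sample input are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class HyphenAwareSplit {
    // A run of whitespace that is NOT immediately preceded by a hyphen.
    // Caveat: with two or more spaces after a hyphen, the lookbehind only
    // protects the first space, so the match starts at the second one.
    static final Pattern SPLIT = Pattern.compile("(?<!-)\\s+");

    // Hypothetical helper: splits the input using the pattern above.
    static String[] split(String input) {
        return SPLIT.split(input);
    }

    public static void main(String[] args) {
        // prints [foo, bar- baz, qux]
        System.out.println(Arrays.toString(split("foo bar- baz qux")));
    }
}
```

Whether PatternTokenizer behaves the same way depends on how its pattern and group options are set, which the truncated message does not show.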

Re: InvalidTokenOffsetsException when using MappingCharFilterFactory, DictionaryCompoundWordTokenFilterFactory and Highlighting

2011-11-30 Thread Jay Luker
I am having a similar issue with OffsetExceptions during highlighting. In all of the explanations and bug reports I'm reading there is a mention this is all the result of a problem with HTMLStripCharFilter. But my analysis chains don't (that I'm aware of) make use of HTMLStripCharFilter, so can som

RegexQuery performance

2011-12-08 Thread Jay Luker
Hi, I am trying to provide a means to search our corpus of nearly 2 million fulltext astronomy and physics articles using regular expressions. A small percentage of our users need to be able to locate, for example, certain types of identifiers that are present within the fulltext (grant numbers, d

Re: RegexQuery performance

2011-12-10 Thread Jay Luker
ising, but I haven't built an instance of trunk yet to try it out. Any other suggestions appreciated. Thanks! --jay > In other words, this could be an "XY problem" > > Best > Erick > > On Thu, Dec 8, 2011 at 11:14 AM, Robert Muir wrote: >> On Thu, Dec 8,

Re: RegexQuery performance

2011-12-12 Thread Jay Luker
On Sat, Dec 10, 2011 at 9:25 PM, Erick Erickson wrote: > My off-the-top-of-my-head notion is you implement a > Filter whose job is to emit some "special" tokens when > you find strings like this that allow you to search without > regexes. For instance, in the example you give, you could > index so

NumericRangeQuery: what am I doing wrong?

2011-12-14 Thread Jay Luker
I can't get NumericRangeQuery or TermQuery to work on my integer "id" field. I feel like I must be missing something obvious. I have a test index that has only two documents, id:9076628 and id:8003001. The id field is defined like so: A MatchAllDocsQuery will return the 2 documents, but any que

Re: NumericRangeQuery: what am I doing wrong?

2011-12-14 Thread Jay Luker
On Wed, Dec 14, 2011 at 2:04 PM, Erick Erickson wrote: > Hmmm, seems like it should work, but there are two things you might try: > 1> just execute the query in Solr. id:[1 TO 100]. Does that work? Yep, that works fine. > 2> I'm really grasping at straws here, but it's *possible* that you >

Re: NumericRangeQuery: what am I doing wrong?

2011-12-15 Thread Jay Luker
On Wed, Dec 14, 2011 at 5:02 PM, Chris Hostetter wrote: > > I'm a little lost in this thread ... if you are programmatically constructing > a NumericRangeQuery object to execute in the JVM against a Solr index, > that suggests you are writing some sort of Solr plugin (or embedding > solr in some wa

Re: Autocommit not happening

2010-07-23 Thread Jay Luker
For the sake of any future googlers I'll report my own clueless but thankfully brief struggle with autocommit. There are two parts to the story: Part One is where I realize my config was not contained within my . In Part Two I realized I had typed "" rather than "". --jay On Fri, Jul 23, 2010 a

documentCache clarification

2010-10-27 Thread Jay Luker
Hi all, The solr wiki says this about the documentCache: "The more fields you store in your documents, the higher the memory usage of this cache will be." OK, but if I have enableLazyFieldLoading set to true and in my request parameters specify "fl=id", then the number of fields per document shou

Re: documentCache clarification

2010-10-27 Thread Jay Luker
che.org/jira/browse/SOLR-52 > [2]: http://www.mail-archive.com/solr-...@lucene.apache.org/msg01185.html > > On Wednesday 27 October 2010 16:39:44 Jay Luker wrote: >> Hi all, >> >> The solr wiki says this about the documentCache: "The more fields you >> store in

Re: documentCache clarification

2010-10-28 Thread Jay Luker
On Wed, Oct 27, 2010 at 9:13 PM, Chris Hostetter wrote: > > : schema.) My evidence for this is the documentCache stats reported by > : solr/admin. If I request "rows=10&fl=id" followed by > : "rows=10&fl=id,title" I would expect to see the 2nd request result in > : a 2nd insert to the cache, but i

Re: documentCache clarification

2010-10-29 Thread Jay Luker
On Thu, Oct 28, 2010 at 7:27 PM, Chris Hostetter wrote: > The queryResultCache is keyed on and the > value is a "DocList" object ... > > http://lucene.apache.org/solr/api/org/apache/solr/search/DocList.html > > Unlike the Document objects in the documentCache, the DocLists in the > queryResultCa

Using jetty's GzipFilter in the example solr.war

2010-11-13 Thread Jay Luker
Hi, I thought I'd try turning on gzip compression but I can't seem to get jetty's GzipFilter to actually compress my responses. I unpacked the example solr.war and tried adding variations of the following to the web.xml (and then rejar-ed), but as far as I can tell, jetty isn't actually compressin

Re: Using jetty's GzipFilter in the example solr.war

2010-11-15 Thread Jay Luker
On Sun, Nov 14, 2010 at 12:49 AM, Kiwi de coder wrote: > try to put u filter on top of web.xml (instead of middle or bottom), i try > this few day and it just only a simple solution (not sure is a spec to put > on top or is a bug) Thank you. An explanation of why this worked is probably better e

Sending binary data as part of a query

2011-01-28 Thread Jay Luker
Hi all, Here is what I am interested in doing: I would like to send a compressed integer bitset as a query to solr. The bitset integers represent my document ids and the result I want to get back is the facet data for those documents. I have successfully created a QueryComponent class that, assu
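One way to picture "a compressed integer bitset" sent as part of a request is the sketch below. It is an assumption-laden illustration, not the poster's code: the archive does not say which compression or encoding was used, so gzip plus URL-safe Base64 is swapped in here, and the class name BitSetCodec and both method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.BitSet;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BitSetCodec {
    // Assumed scheme: BitSet bytes -> gzip -> URL-safe Base64 without padding.
    static String encode(BitSet docIds) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(docIds.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.toByteArray());
    }

    // Reverses encode(): Base64 -> gunzip -> BitSet.
    static BitSet decode(String encoded) {
        byte[] compressed = Base64.getUrlDecoder().decode(encoded);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return BitSet.valueOf(out.toByteArray());
    }

    public static void main(String[] args) {
        BitSet ids = new BitSet();
        ids.set(1);
        ids.set(5);
        ids.set(11);
        // Round trip: the decoded set should contain the same document ids.
        System.out.println(decode(encode(ids)).equals(ids));
    }
}
```

The URL-safe alphabet keeps the encoded value usable in a query string; for large bitsets a request body would likely be a better fit than a URL parameter.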

Re: Sending binary data as part of a query

2011-02-01 Thread Jay Luker
On Mon, Jan 31, 2011 at 9:22 PM, Chris Hostetter wrote: > that class should probably have been named ContentStreamUpdateHandlerBase > or something like that -- it tries to encapsulate the logic that most > RequestHandlers using ContentStreams (for updating) need to worry about. > > Your QueryComp

Help with parsing configuration using SolrParams/NamedList

2011-02-16 Thread Jay Luker
Hi, I'm trying to use a CustomSimilarityFactory and pass in per-field options from the schema.xml, like so: 500 1 0.5 500 2 0.5 My problem is I am utterly failing to figure out how to parse this nested option structu

Highlight snippets for a set of known documents

2011-03-31 Thread Jay Luker
Hi all, I'm trying to get highlight snippets for a set of known documents and I must be doing something wrong because it's only sort of working. Say my query is "foobar" and I already know that docs 1, 5 and 11 are matches. Now I want to retrieve the highlight snippets for the term "foobar" fo

Re: Highlight snippets for a set of known documents

2011-04-01 Thread Jay Luker
> q=foobar&fq={!q.op=OR}(id:1 id:5 id:11) > > Regards > Stefan > > On Thu, Mar 31, 2011 at 6:40 PM, Jay Luker wrote: >> Hi all, >> >> I'm trying to get highlight snippets for a set of known documents and >> I must be doing something wrong becau
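The filter query suggested in this reply can be assembled from a list of known ids roughly as follows. This is a minimal sketch: the q and fq values come from the quoted message, the hl=true parameter is added here as an assumption about how highlighting would be switched on, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class KnownDocsHighlightQuery {
    // Builds "{!q.op=OR}(id:1 id:5 id:11)" from the given ids.
    static String filterQuery(List<Integer> ids) {
        return ids.stream()
                .map(id -> "id:" + id)
                .collect(Collectors.joining(" ", "{!q.op=OR}(", ")"));
    }

    // Assembles a URL-encoded query string; hl=true is an assumed addition.
    static String queryString(String term, List<Integer> ids) {
        return "q=" + encode(term) + "&fq=" + encode(filterQuery(ids)) + "&hl=true";
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(queryString("foobar", Arrays.asList(1, 5, 11)));
    }
}
```

Putting the id restriction in fq rather than q keeps the highlighted term ("foobar") separate from the document selection.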

UIMA example setup w/o OpenCalais

2011-04-07 Thread Jay Luker
Hi, I'd like to experiment with the UIMA contrib package, but I have issues with the OpenCalais service's ToS and would rather not use it. Is there a way to adapt the UIMA example setup to use only the AlchemyAPI service? I tried simply leaving out the OpenCalais api key but I get exceptions

Re: UIMA example setup w/o OpenCalais

2011-04-08 Thread Jay Luker
uld be able to do so by simply removing the OpenCalaisAnnotator from > the execution pipeline commenting the line 124 of the file: > solr/contrib/uima/src/main/resources/org/apache/uima/desc/OverridingParamsExtServicesAE.xml > Hope this helps, > Tommaso > > 2011/4/7 Jay Luker > >

tika/pdfbox knobs & levers

2011-04-13 Thread Jay Luker
Hi all, I'm wondering if there are any knobs or levers I can set in solrconfig.xml that affect how pdfbox text extraction is performed by the extraction handler. I would like to take advantage of pdfbox's ability to normalize diacritics and ligatures [1], but that doesn't seem to be the default be

Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Jay Luker
Hi Emyr, You could try using the "extractOnly=true" parameter [1]. Of course, you'll need to repost the extracted text manually. --jay [1] http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only On Thu, May 5, 2011 at 9:36 AM, Emyr James wrote: > Hi All, > > I have solr and tika ins

Re: Solr performance

2011-05-11 Thread Jay Luker
On Wed, May 11, 2011 at 7:07 AM, javaxmlsoapdev wrote: > I have some 25 odd fields with "stored=true" in schema.xml. Retrieving back > 5,000 records back takes a few secs. I also tried passing "fl" and only > include one field in the response but still response time is same. What are > the things

Re: Document has fields with different update frequencies: how best to model

2011-06-10 Thread Jay Luker
Take a look at ExternalFileField [1]. It's meant for exactly what you want to do here. FYI, there is an issue with caching of the external values introduced in v1.4 but, thankfully, resolved in v3.2 [2] --jay [1] http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html [2

Re: Document has fields with different update frequencies: how best to model

2011-06-11 Thread Jay Luker
update frequencies. It does not > seem external file field is the use case for this. > > > > On 10 June 2011 20:13, Jay Luker wrote: >> Take a look at ExternalFileField [1]. It's meant for exactly what you >> want to do here. >> >> FYI, there is an issue wit

WordDelimiterFilter preserveOriginal & position increment

2012-10-23 Thread Jay Luker
Hi, I'm having an issue with the WDF preserveOriginal="1" setting and the matching of a phrase query. Here's an example of the text that is being indexed: "...obtained with the Southern African Large Telescope,SALT..." A lot of our text is extracted from PDFs, so this kind of formatting junk is

Re: WordDelimiterFilter preserveOriginal & position increment

2012-10-23 Thread Jay Luker
t seems to not be a problem in 4.x. Thanks, --jay On Tue, Oct 23, 2012 at 10:45 AM, Shawn Heisey wrote: > On 10/23/2012 8:16 AM, Jay Luker wrote: >> >> From looking at the analysis debugger I can see that the WDF is >> getting the term "Telescope,SALT" and correct