Hi all,
I'm trying to use PatternTokenizer and not getting expected results.
Not sure where the failure lies. What I'm trying to do is split my
input on whitespace except in cases where the whitespace is preceded
by a hyphen character. So to do this I'm using a negative lookbehind
assertion in th
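
For illustration, the splitting rule described above (break on whitespace unless a hyphen comes right before it) can be sketched with plain java.util.regex, the regex engine that PatternTokenizer is built on. This is only a sketch: the class name, method name, and sample input are hypothetical, and it does not reproduce the poster's analysis chain or schema.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class HyphenAwareSplit {
    // A run of whitespace whose first character is NOT immediately
    // preceded by a hyphen (negative lookbehind).
    private static final Pattern SPLIT = Pattern.compile("(?<!-)\\s+");

    // Split the input on the pattern above (hypothetical helper).
    public static List<String> split(String input) {
        return Arrays.asList(SPLIT.split(input));
    }

    public static void main(String[] args) {
        // prints [foo, bar- baz, qux]
        System.out.println(split("foo bar- baz qux"));
    }
}
```

One thing worth checking with a pattern like this: if two or more whitespace characters follow the hyphen, the match can still start at the second one (which is preceded by whitespace, not a hyphen), so the input is split there anyway.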
I am having a similar issue with OffsetExceptions during highlighting.
All of the explanations and bug reports I'm reading mention that
this is the result of a problem with HTMLStripCharFilter.
But my analysis chains don't (that I'm aware of) make use of
HTMLStripCharFilter, so can som
Hi,
I am trying to provide a means to search our corpus of nearly 2
million fulltext astronomy and physics articles using regular
expressions. A small percentage of our users need to be able to
locate, for example, certain types of identifiers that are present
within the fulltext (grant numbers, d
ising, but I
haven't built an instance of trunk yet to try it out. Any other
suggestions appreciated.
Thanks!
--jay
> In other words, this could be an "XY problem"
>
> Best
> Erick
>
> On Thu, Dec 8, 2011 at 11:14 AM, Robert Muir wrote:
>> On Thu, Dec 8,
On Sat, Dec 10, 2011 at 9:25 PM, Erick Erickson wrote:
> My off-the-top-of-my-head notion is you implement a
> Filter whose job is to emit some "special" tokens when
> you find strings like this that allow you to search without
> regexes. For instance, in the example you give, you could
> index so
I can't get NumericRangeQuery or TermQuery to work on my integer "id"
field. I feel like I must be missing something obvious.
I have a test index that has only two documents, id:9076628 and
id:8003001. The id field is defined like so:
A MatchAllDocsQuery will return the 2 documents, but any que
On Wed, Dec 14, 2011 at 2:04 PM, Erick Erickson wrote:
> Hmmm, seems like it should work, but there are two things you might try:
> 1> just execute the query in Solr. id:[1 TO 100]. Does that work?
Yep, that works fine.
> 2> I'm really grasping at straws here, but it's *possible* that you
>
On Wed, Dec 14, 2011 at 5:02 PM, Chris Hostetter
wrote:
>
> I'm a little lost in this thread ... if you are programmatically constructing
> a NumericRangeQuery object to execute in the JVM against a Solr index,
> that suggests you are writing some sort of Solr plugin (or embedding
> solr in some wa
For the sake of any future googlers I'll report my own clueless but
thankfully brief struggle with autocommit.
There are two parts to the story: Part One is where I realize my
config was not contained within my . In
Part Two I realized I had typed "" rather than
"".
--jay
On Fri, Jul 23, 2010 a
Hi all,
The solr wiki says this about the documentCache: "The more fields you
store in your documents, the higher the memory usage of this cache
will be."
OK, but if I have enableLazyFieldLoading set to true and in my request
parameters specify "fl=id", then the number of fields per document
shou
che.org/jira/browse/SOLR-52
> [2]: http://www.mail-archive.com/solr-...@lucene.apache.org/msg01185.html
>
> On Wednesday 27 October 2010 16:39:44 Jay Luker wrote:
>> Hi all,
>>
>> The solr wiki says this about the documentCache: "The more fields you
>> store in
On Wed, Oct 27, 2010 at 9:13 PM, Chris Hostetter
wrote:
>
> : schema.) My evidence for this is the documentCache stats reported by
> : solr/admin. If I request "rows=10&fl=id" followed by
> : "rows=10&fl=id,title" I would expect to see the 2nd request result in
> : a 2nd insert to the cache, but i
On Thu, Oct 28, 2010 at 7:27 PM, Chris Hostetter
wrote:
> The queryResultCache is keyed on and the
> value is a "DocList" object ...
>
> http://lucene.apache.org/solr/api/org/apache/solr/search/DocList.html
>
> Unlike the Document objects in the documentCache, the DocLists in the
> queryResultCa
Hi,
I thought I'd try turning on gzip compression but I can't seem to get
jetty's GzipFilter to actually compress my responses. I unpacked the
example solr.war and tried adding variations of the following to the
web.xml (and then rejar-ed), but as far as I can tell, jetty isn't
actually compressin
On Sun, Nov 14, 2010 at 12:49 AM, Kiwi de coder wrote:
> try putting your filter at the top of web.xml (instead of the middle or
> bottom). I tried this a few days ago and it was a simple solution (not sure
> whether putting it on top is per the spec or a bug)
Thank you.
An explanation of why this worked is probably better e
Hi all,
Here is what I am interested in doing: I would like to send a
compressed integer bitset as a query to solr. The bitset integers
represent my document ids and the result I want to get back is the
facet data for those documents.
I have successfully created a QueryComponent class that, assu
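
As a rough illustration of the client side of this idea, the sketch below packs a set of document ids into a java.util.BitSet, compresses the bytes, and encodes them as a string that could travel in a request. Everything specific here is an assumption: the thread does not say which compression or encoding is used, so gzip plus URL-safe Base64 is just one plausible choice, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.BitSet;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BitsetParam {
    // Set one bit per document id, gzip the bytes, Base64-encode the result.
    // (gzip + URL-safe Base64 is an assumed wire format, not the poster's.)
    public static String encode(int... docIds) {
        BitSet bits = new BitSet();
        for (int id : docIds) {
            bits.set(id);
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(bits.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getUrlEncoder().encodeToString(buf.toByteArray());
    }

    // Reverse of encode: Base64-decode, gunzip, rebuild the BitSet.
    public static BitSet decode(String param) {
        byte[] compressed = Base64.getUrlDecoder().decode(param);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return BitSet.valueOf(out.toByteArray());
    }
}
```

A server-side component would need the matching decode step before turning the set bits back into document ids; how that maps onto a QueryComponent is not shown here.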
On Mon, Jan 31, 2011 at 9:22 PM, Chris Hostetter
wrote:
> that class should probably have been named ContentStreamUpdateHandlerBase
> or something like that -- it tries to encapsulate the logic that most
> RequestHandlers using ContentStreams (for updating) need to worry about.
>
> Your QueryComp
Hi,
I'm trying to use a CustomSimilarityFactory and pass in per-field
options from the schema.xml, like so:
500
1
0.5
500
2
0.5
My problem is I am utterly failing to figure out how to parse this
nested option structu
Hi all,
I'm trying to get highlight snippets for a set of known documents and
I must be doing something wrong because it's only sort of working.
Say my query is "foobar" and I already know that docs 1, 5 and 11 are
matches. Now I want to retrieve the highlight snippets for the term
"foobar" fo
> q=foobar&fq={!q.op=OR}(id:1 id:5 id:11)
>
> Regards
> Stefan
>
> On Thu, Mar 31, 2011 at 6:40 PM, Jay Luker wrote:
>> Hi all,
>>
>> I'm trying to get highlight snippets for a set of known documents and
>> I must be doing something wrong becau
Hi,
I'd like to experiment with the UIMA contrib package, but I have
issues with the OpenCalais service's ToS and would rather not use it.
Is there a way to adapt the UIMA example setup to use only the
AlchemyAPI service? I tried simply leaving out the OpenCalais api key
but I get exceptions
uld be able to do so by simply removing the OpenCalaisAnnotator from
> the execution pipeline by commenting out line 124 of the file:
> solr/contrib/uima/src/main/resources/org/apache/uima/desc/OverridingParamsExtServicesAE.xml
> Hope this helps,
> Tommaso
>
> 2011/4/7 Jay Luker
>
>
Hi all,
I'm wondering if there are any knobs or levers I can set in
solrconfig.xml that affect how pdfbox text extraction is performed by
the extraction handler. I would like to take advantage of pdfbox's
ability to normalize diacritics and ligatures [1], but that doesn't
seem to be the default be
Hi Emyr,
You could try using the "extractOnly=true" parameter [1]. Of course,
you'll need to repost the extracted text manually.
--jay
[1] http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only
On Thu, May 5, 2011 at 9:36 AM, Emyr James wrote:
> Hi All,
>
> I have solr and tika ins
On Wed, May 11, 2011 at 7:07 AM, javaxmlsoapdev wrote:
> I have some 25-odd fields with "stored=true" in schema.xml. Retrieving
> 5,000 records takes a few secs. I also tried passing "fl" to include only
> one field in the response, but the response time is still the same. What are
> the things
Take a look at ExternalFileField [1]. It's meant for exactly what you
want to do here.
FYI, there is an issue with caching of the external values introduced
in v1.4 but, thankfully, resolved in v3.2 [2].
--jay
[1]
http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
[2
update frequencies. It does not
> seem external file field is the use case for this.
>
>
>
> On 10 June 2011 20:13, Jay Luker wrote:
>> Take a look at ExternalFileField [1]. It's meant for exactly what you
>> want to do here.
>>
>> FYI, there is an issue wit
Hi,
I'm having an issue with the WDF preserveOriginal="1" setting and the
matching of a phrase query. Here's an example of the text that is
being indexed:
"...obtained with the Southern African Large Telescope,SALT..."
A lot of our text is extracted from PDFs, so this kind of formatting
junk is
t seems to not be a problem in 4.x.
Thanks,
--jay
On Tue, Oct 23, 2012 at 10:45 AM, Shawn Heisey wrote:
> On 10/23/2012 8:16 AM, Jay Luker wrote:
>>
>> From looking at the analysis debugger I can see that the WDF is
>> getting the term "Telescope,SALT" and correct