Hi Evert,
I recently needed help with phrase highlighting and was pointed to the
FastVectorHighlighter which worked out great. I just made a change to the
configuration to add generateWordParts="0" and generateNumberParts="0" so that
searches for things like "1a" would get highlighted correctly
.pdf"
with the same result: no highlight in the response, as below:
"highlighting": { "pdf1": {} }
=(
Really... do not know what to do...
Thanks for your time, if you have any more suggestion where I could be missing
something... please let me know.
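
To spot this symptom across a whole result set, a small check over the parsed JSON response can help. This is a minimal sketch, assuming the response was requested as JSON and has the "highlighting" shape shown above; the function name is hypothetical and not part of Solr.

```python
import json


def docs_without_highlights(response: dict) -> list:
    """Return IDs whose entry in the "highlighting" section is empty."""
    highlighting = response.get("highlighting", {})
    return [doc_id for doc_id, fragments in highlighting.items() if not fragments]


# Sample taken from the response quoted above.
sample = json.loads('{"highlighting": {"pdf1": {}}}')
print(docs_without_highlights(sample))
```

An empty entry means the document matched the query but no snippet was generated for the requested highlight fields.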
> insight into why that would be. The most frequent reason is that the
> > field is a "string" type which is not broken up into words. Another
> > possibility is that your analysis chain is leaving in the quotes or
> > something similar. As James says, looking at a
Mark! For what it's worth - up
vote. Thanks!
Cheers!
-Teague James
> On Oct 1, 2015, at 6:12 PM, Koji Sekiguchi
> wrote:
>
> Hi Mark,
>
> I think I saw similar requirement recently in mailing list. The feature
> sounds reasonable to me.
>
> > If not, how
Hi everyone!
Does anyone have any suggestions on how to URL encode URLs that I'm
importing from SQL using the DIH? The importer pulls in something like
"http://www.downloadsite.com/document that is being downloaded.doc" and then
the Tika parser can't download the document because it ends up trying
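
For reference, the encoding step itself can be sketched in Python as below. It shows only how the spaces in the path would be percent-encoded; it does not show how to hook this into the DIH, and the helper name is hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(raw_url: str) -> str:
    """Percent-encode unsafe characters in the path, leaving the rest intact."""
    parts = urlsplit(raw_url)
    # Keep "/" as a separator and "%" so already-encoded paths are not double-encoded.
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


print(encode_url("http://www.downloadsite.com/document that is being downloaded.doc"))
# http://www.downloadsite.com/document%20that%20is%20being%20downloaded.doc
```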
Hello everyone,
I am having difficulty enabling phrase highlighting and am hoping someone
here can offer some help. This is what I have currently:
Solr 4.9
solrconfig.xml (partial snip)
xml
explicit
10
tiguous=true ?
>
> On Tue, Dec 1, 2015 at 3:36 PM, Teague James
> wrote:
>
>> Hello everyone,
>>
>> I am having difficulty enabling phrase highlighting and am hoping someone
>> here can offer some help. This is what I have currently:
the phrase
correct. It is the method of highlighting (I'm trying to get one set of tags
per phrase) and the occasional breaking of a single phrase into 2 hits.
Given that setup, what do you recommend? I'm not sure I understand the approach
you're describing. I appreciate the help!
Thanks everyone who replied! The FastVectorHighlighter did the trick. Here
is how I configured it:
In solrconfig.xml:
In the requestHandler I added:
on
text
true
100
In schema.xml:
I modified the text field:
I restarted Solr, re-indexed the documents and tested. All phrases are
correctly highli
Hello,
I am trying to install Solr 6.0.0 and have been successful with the default
installation, following the instructions provided on the Apache Solr
website. However, I do not want Solr running on port 8983, I want it to run
on port 80. I started a new Ubuntu 14.04 VM, installed open JDK 8, the
d
be appreciated. If this is a "feature" of the application, it'd be nice to see
that in the documentation.
Thanks Shawn!
-Teague
-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Tuesday, May 31, 2016 4:31 PM
To: solr-user@lucene.apache.org
Su
Can someone please help me troubleshoot my Solr 6.0 highlighting issue? I
have a production Solr 4.9.0 unit configured to highlight responses and it
has worked for a long time now without issues. I have recently been testing
Solr 6.0 and have been unable to get highlighting to work. I used my 4.9
c
Hi - Thanks for the reply, I'll give that a try.
-Original Message-
From: jimtronic [mailto:jimtro...@gmail.com]
Sent: Monday, October 24, 2016 3:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 6.0 Highlighting Not Working
Perhaps you need to wrap your inner "" and "" tags in t
pha-numeric two character value, such as n0, b1, 1z, etc.:
...
Where "8667" is the document ID of the record that had the hit, but no
highlight. Other searches, "ms" for example, return:
...
MS
Why does highlighting fail for "1a" type searches? Any help is appreciated!
Thanks!
-Teague James
mind you but possible.
Best,
Erick
On Wed, Feb 1, 2017 at 8:24 AM, Teague James wrote:
> Hello everyone! I'm still stuck on this issue and could really use
> some help. I have a Solr 6.0.0 instance that is storing documents
> peppered with text like "1a", "2e",
My Solr 4.8.0 index includes a field called 'dom_title'. The field is
displayed in the result set. I want to be able to highlight keywords from
this field in the displayed results. I have tried configuring solrconfig.xml
and I have tried adding parameters to the query "&hl=true&hl.fl=dom_title"
but
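
For illustration, the highlighting request described above could be built as in the following sketch. The host, port, core name, and search keyword are assumptions; only the hl and hl.fl parameters and the dom_title field come from the message.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port, and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def highlight_query_url(keyword: str, field: str = "dom_title") -> str:
    """Build a select URL that asks Solr to highlight matches in one field."""
    params = {
        "q": f"{field}:{keyword}",  # search the same field that is highlighted
        "hl": "true",               # turn highlighting on
        "hl.fl": field,             # field to generate snippets from
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


print(highlight_query_url("example"))
```

Querying the highlighted field directly, rather than a default catch-all field, makes it easier to tell whether an empty highlight comes from the query or from the field definition.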
Vicky,
I resolved this by making sure that the field that is searched has
"stored=true". By default "text" is searched, which is the destination of
the copyFields and is not stored. If you change your copyField destination
to a field that is stored and use that field as the default search field
th
Hello all,
I am working with Solr 4.9.0 and am searching for phrases that contain words
like "of" or "to" that Solr seems to be ignoring at index time. Here's what
I tried:
curl http://localhost/solr/update?commit=true -H "Content-Type: text/xml"
--data-binary '100blah blah blah knowledge of scie
is the stopword.txt. You could either empty that file
out or change the field type for your field.
On Mon, Jul 14, 2014 at 12:53 PM, Teague James wrote:
> Hello all,
>
> I am working with Solr 4.9.0 and am searching for phrases that contain
> words like "of" or "to"
ing) by default uses
lang/stopwords_en.txt (which wouldn't be empty if you check).
What you're looking at is the stopword.txt. You could either empty that file
out or change the field type for your field.
On Mon, Jul 14, 2014 at 12:53 PM, Teague James
wrote:
> Hello all,
>
>
t.com/ and @solrstart Solr popularizers community:
https://www.linkedin.com/groups?gid=6713853
On Tue, Jul 15, 2014 at 8:10 AM, Teague James wrote:
> Hi Anshum,
>
> Thanks for replying and suggesting this, but the field type I am using (a
> modified text_general) in my schema ha
Hi everyone!
Does anyone have any good examples of generating a contiguous highlight for
a phrase? Here's what I have done:
curl http://localhost/solr/collection1/update?commit=true -H "Content-Type:
text/xml" --data-binary '100blah blah blah knowledge of science blah blah
blah'
Then, using a br
I am using Nutch 1.7 with Solr 4.6.0 to index websites that have links to
binary files, such as Word, PDF, etc. The crawler crawls the site but I am
not getting the URLs of the links for the binary files no matter how deep I
set the settings for the site. I see the labels for the links in the
conte
: Re: Indexing URLs for Binaries
Check suffix-urlfilter.txt in your conf directory for Nutch. You might be
prohibiting those filetypes from the crawl.
- Mark
On 1/3/14, 10:29 AM, "Teague James" wrote:
>I am using Nutch 1.7 with Solr 4.6.0 to index websites that have links
>
I am trying to index a website that contains links to documents such as PDF,
Word, etc. The intent is to be able to store the URLs for the links to the
documents.
For example, when indexing www.example.com which has links on the page like
"Example Document" which points to www.example.com/docs/ex
I am still unsuccessful in getting this to work. My expectation is that the
index-anchor plugin should produce values for the field anchor. However this
field is not showing up in my Solr index no matter what I try.
Here's what I have in my nutch-site.xml for plugins:
protocol-http|urlfilter-regex
Hello Markus,
I do get a linkdb folder in the crawl folder that gets created - but it is
created at the time that I execute the command automatically by Nutch. I just
tried to use solrindex against yesterday's crawl and did not get any errors, but
did not get the anchor field or any of the outli
Okay. I changed my solrindex to this:
bin/nutch solrindex http://localhost/solr/ crawl/crawldb crawl/linkdb
crawl/segments/20140115143147
I got the same errors:
Indexer: org.apache.hadoop.mapred.InvalidInputException: Input path does not
exist: file:/.../crawl/linkdb/crawl_fetch
Input path does
Okay. I had used that previously and I just tried it again. The following
generated no errors:
bin/nutch solrindex http://localhost/solr/ crawl/crawldb -linkdb crawl/linkdb
-dir crawl/segments/
Solr is still not getting an anchor field and the outlinks are not appearing in
the index anywhere e
Progress!
I changed the value of that property in nutch-default.xml and I am getting the
anchor field now. However, the stuff going in there is a bit random and doesn't
seem to correlate to the pages I'm crawling. The primary objective is that when
there is something on the page that is a link
What I'm getting is just the anchor text. In cases where there are multiple
anchors I am getting a comma separated list of anchor text - which is fine.
However, I am not getting all of the anchors that are on the page, nor am I
getting any of the URLs. The anchors I am getting back never include
Markus,
With some help from another user on the Nutch list I did a dump and found that
the URLs I am trying to capture are in Nutch. However, when I index them with
Solr I am not getting them. What I get in the dump is this:
http://www.example.com/pdfs/article1.pdf
Status: 2 (db_fetched)
Fetch
I cannot get Solr 4.6.0 to do partial word search on a particular field that
is used for faceting. Most of the information I have found suggests
modifying the fieldType "text" to include either the NGramFilterFactory or
EdgeNGramFilterFactory in the filter. However since I am copying many other
fie
enerally work since it will query for all
the sub-terms, but AND will only work if all the sub-terms occur in the
document field.
-- Jack Krupansky
-Original Message-
From: Teague James
Sent: Wednesday, February 5, 2014 4:52 PM
To: solr-user@lucene.apache.org
Subject: Partial Word Search
l
the sub-terms, but AND will only work if all the sub-terms occur in the
document field.
-- Jack Krupansky
-Original Message-
From: Teague James
Sent: Wednesday, February 5, 2014 4:52 PM
To: solr-user@lucene.apache.org
Subject: Partial Word Search
I cannot get Solr 4.6.0 to do partial
Is there a way to specify the document types that Tika parses? In my DIH I
index the content of a SQL database which has a field that points to the SQL
record's binary file (which could be Word, PDF, JPG, MOV, etc.). Tika then
uses the document URL to index that document's content. However there ar
Hello!
I am indexing Solr 4.9.0 using the /update request handler and am getting
errors from Tika - Illegal IOException from
org.apache.tika.parser.xml.DcXMLParser@74ce3bea which is caused by
MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence. I
believe that this is the result
Hi all,
I am using Solr 4.9.0 to index a DB with DIH. In the DB there is a URL
field. In the DIH Tika uses that field to fetch and parse the documents. The
URL from the field is valid and will download the document in the browser
just fine. But Tika is getting HTTP response code 400. Any ideas why
Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Tuesday, December 02, 2014 1:45 PM
To: solr-user
Subject: Re: Tika HTTP 400 Errors with DIH
On 2 December 2014 at 13:19, Teague James wrote:
> clob="true"
What is ClobTransformer doing on the DownloadURL field? Is it possibl
Regards,
>> Alex.
>> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources
>> and newsletter: http://www.solr-start.com/ and @solrstart Solr
>> popularizers community: https://www.linkedin.com/groups?gid=6713853
>>
>>
>> On 4 December 2014 at