Hey, thanks a lot for the hint with pdfbox-app.jar.
For testing purposes I extracted an affected PDF form and a normal PDF file.
The result is the following:
Normal PDF file:
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
tempor invidunt ut
labore et d
pdf form:
Hi,
Does anyone know of a faster method of populating the synonyms.txt file
than manually typing the words into the file? There could be
thousands of synonyms.
Regards,
Edwin
Hi Hoss,
thank you for your help. This helps a lot. I can't see the plugin in the log or in
the plugin list, but it "works" now (got an exception from our class, so I know
it'll be called).
Thanks a lot!
Oliver
Am 29.04.2015 um 18:40 schrieb Chris Hostetter:
: snippet to
: vufind/so
Hello,
I have a situation and I'm a little stuck on how to fix it.
For example the following data structure:
*Deal*
All Coca Cola 20% off
*Products*
Coca Cola light
Coca Cola Zero 1L
Coca Cola Zero 20CL
Coca Cola 1L
When somebody searches for a "Cola" discount, I want the result of the d
Hi,
I have created my index with the default configurations. Now I am trying to
use proximity search. However, I am a bit unsure about the results and where
it's going wrong.
For example, I want to find two phrases "this is phrase one" and another
phrase "this is the second phrase" with not more than
I just tried with simple proximity search like "word1 word2" ~3 and it is
not working. Just wondering whether I have to make any configuration
changes to solrconfig.xml to make proximity search work?
Thanks
Vijay
On 30 April 2015 at 14:32, Vijaya Narayana Reddy Bhoomi Reddy <
vijaya.bhoomire...@
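(For reference, in the standard Lucene/Solr query syntax the slop attaches directly to a quoted phrase, with no space before the tilde; the field name below is made up:)

```
"word1 word2"~3          at most 3 position moves between the two terms
title:"phrase one"~2     the same, scoped to a specific field
```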
I am facing the same problem; currently I am resorting to a custom program
to create this file. Hopefully there is a better solution out there.
Thanks,
Kaushik
On Thu, Apr 30, 2015 at 3:58 AM, Zheng Lin Edwin Yeo
wrote:
> Hi,
>
> Does anyone know of a faster method of populating the synonyms.tx
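A tiny script is often enough to generate the file from whatever structured source you have. A minimal Python sketch (the mapping here is a made-up placeholder; in practice it would come from a thesaurus dump or database):

```python
# Emit Solr-format synonym lines from a word -> synonyms mapping.
synonyms = {
    "tv": ["television", "televisions"],
    "laptop": ["notebook"],
}

def to_solr_lines(mapping):
    # SynonymFilterFactory accepts comma-separated equivalence groups,
    # one group per line of synonyms.txt.
    return [",".join([word] + alts) for word, alts in sorted(mapping.items())]

lines = to_solr_lines(synonyms)
print("\n".join(lines))
```

Writing `"\n".join(lines)` to synonyms.txt then gives one equivalence group per line.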
Hi,
My Solr documents contain descriptions of products, similar to a
BestBuy or
a NewEgg catalog. I'm wondering if it is possible to push a product down
the ranking if it contains a certain word. By this I mean it would still
appear in the search results. However, instead of appearing n
There is a possible solution here:
https://issues.apache.org/jira/browse/LUCENE-2347 (Dump WordNet to SOLR
Synonym format).
I don't have personal experience with it. I only know about it because it's
mentioned on page 184 of the 'Solr in Action' book by Trey Grainger and
Timothy Potter.
Maybe som
Hi Vijaya,
I just quickly tried proximity search with the example set shipped with
Solr 5 and it seemed to work for me.
Perhaps what you could do is debug the query by enabling debugQuery=true.
Here are the steps that I tried. (Assuming you are on Solr 5, though this
term proximity functionali
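For example, a debug request might look like this (collection name and query are assumptions):

```
http://localhost:8983/solr/techproducts/select?q="this is phrase one"~3&debugQuery=true
```

The parsedquery and explain sections of the debug output show how the slop was actually applied.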
Which version of solr?
On Thu, Apr 30, 2015 at 9:58 AM, Zheng Lin Edwin Yeo
wrote:
> Hi,
>
> Does anyone know of a faster method of populating the synonyms.txt file
> than manually typing the words into the file? There could be
> thousands of synonyms.
>
> Regards,
> Edwin
Hi,
I am interested in indexing some documents in Solr, as I did in Lucene.
I mean: specifying via SolrJ all the information about the field I am adding
(tokenize, store, facet, etc.)
can we do that? Or is it mandatory to define a schema on the collection?
Thanks a lot!
Benjamin
Hi Doug, nice write-up and 2 questions:
- You write your own QParser plugins - can one keep the features of edismax
for field boosting/phrase-match boosting by subclassing edismax? Assuming
yes...
- What do pf2 and pf3 do in the edismax query parser?
hon-lucene-synonyms plugin links correction
On 4/30/2015 8:43 AM, Sznajder ForMailingList wrote:
> I am interested in indexing some documents in Solr, as I did in Lucene.
>
> I mean: specifying via SolrJ all the information about the field I am adding
> (tokenize, store, facet, etc.)
>
> can we do that? Or is it mandatory to define a schema on the
OK, given all that, Tika _is_ sending the weird characters to Solr. You
can get them out of the index by using something like
PatternReplaceTokenFilterFactory or PatternReplaceCharFilterFactory in
your analysis chain. However, you'll still be stuck with the odd
characters showing up in your browser.
Or use a Solr update processor to scrub the source values. The regex
pattern replacement processor could do the trick:
http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html
-- Jack Krupansky
On Thu, Apr 30, 2015 at 11:17 AM, Erick Erickso
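A sketch of what such a chain could look like in solrconfig.xml (the chain name, field name, and pattern are assumptions; check the javadoc linked above for the exact parameters):

```xml
<updateRequestProcessorChain name="scrub-chars">
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content</str>
    <!-- strip anything that is not printable or whitespace -->
    <str name="pattern">[^\p{Print}\s]</str>
    <str name="replacement"></str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```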
Jack:
I keep forgetting those things exist, thanks for the reminder!
On Thu, Apr 30, 2015 at 8:23 AM, Jack Krupansky
wrote:
> Or use a Solr update processor to scrub the source values. The regex
> pattern replacement processor could do the trick:
> http://lucene.apache.org/solr/5_1_0/solr-core/o
- You write your own QParser plugins - can one keep the features of edismax
for field boosting/phrase-match boosting by subclassing edismax? Assuming
yes...
hon-lucene-synonyms does this, but largely by copy-pasting the code (sorry
about the broken link!)
pf2 and pf3 take the query "hello my na
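Roughly: given q=hello my name is, pf2 adds phrase boosts over adjacent word pairs, and pf3 does the same for triples (field names below are assumed):

```
defType=edismax&q=hello my name is&qf=title body
&pf2=title^5    phrase boosts on "hello my", "my name", "name is"
&pf3=title^2    phrase boosts on "hello my name", "my name is"
```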
Could you explain a bit more _why_ you want to do this? As you're
probably well aware, there
are multiple ways to shoot yourself in the foot in lower-level Lucene.
If you have some situation where you're creating indexes on the fly
that may vary then
you could consider the "managed schema" that le
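For reference, the "managed schema" is switched on in solrconfig.xml along these lines (Solr 5.x; treat the exact defaults as version-dependent):

```xml
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
```

Fields can then be added at runtime through the Schema API instead of hand-editing schema.xml.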
Thank you.
-Original Message-
From: Doug Turnbull [mailto:dturnb...@opensourceconnections.com]
Sent: Thursday, April 30, 2015 11:33 AM
To: solr-user@lucene.apache.org; Dan Davis
Subject: Re: analyzer, indexAnalyzer and queryAnalyzer
- You write your own QParser plugins - can one keep the
I'm using Solr-5.0.0 and ZooKeeper-3.4.6.
I've gotten some samples from the Moby Treasure List
http://www.gutenberg.org/catalog/world/results?title=moby+list to try it
out.
However, currently I can only have up to around 2100 lines in my
synonyms.txt when I load the configuration into ZooKeepe
Steve,
Another possibility is to use the Linux pdftotext command-line utility or a
software daemon linked with the libraries it uses, usually part of the
poppler-utils package. pdfbox should have the same basic capabilities, but
may run a little slower.
If you have very many "filled pdf" for
Thanks Rajani.
I could get proximity search to work for individual words. However, I still
could not make it work for two phrases, each containing more than one word.
Also, results seem to be unexpected for proximity queries with wildcards.
Thanks & Regards
Vijay
On 30 April 2015 at 15:19, Rajani Ma
You'll need the ComplexPhraseQueryParser [1] to handle multiterm
(wildcard/fuzzy/regex) terms in proximity. Beware, though, that it does not
perform analysis on fuzzy/wildcard terms, IIRC.
The SurroundQueryParser can probably do both phrase near phrase and multiterm
within proximity. Same warning
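For example, the complex phrase parser can be invoked per-query with local params (the field name is an assumption):

```
q={!complexphrase inOrder=true}name:"this is phras*"~3
```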
What time unit is the Solr collections API overseerstatus action using
in the returned data?
For example, given the following XML: <str name="avgTimePerRequest">0.15491020578778136</str>
Is the avgTimePerRequest in seconds?
Thanks,
Ryan
---
Thanks Tim for the information. I shall have a look at them.
Thanks & Regards
Vijay
On 30 April 2015 at 18:13, Allison, Timothy B. wrote:
> You'll need the ComplexPhraseQueryParser [1] to handle multiterm
> (wildcard/fuzzy/regex) terms in proximity. Beware, though, that that does
> not perfo
Hi Vijay,
I haven't tried this myself, but perhaps you could build the two phrases as
PhraseQueries and connect them up with a SpanQuery? Something like this
(using your original example).
PhraseQuery p1 = new PhraseQuery();
for (String word : "this is phrase 1".split(" ")) {
p1.add(new Term("my
Hi,
If adding PhraseQuery objects does not work, then using SpanNearQuery with
slop 0 and order true for p1 and p2 should work (tried).
Dmitry
On Thu, Apr 30, 2015 at 8:43 PM, Sujit Pal wrote:
> Hi Vijay,
>
> I haven't tried this myself, but perhaps you could build the two phrases as
> PhraseQ
On 4/30/2015 11:22 AM, Ryan Steele wrote:
> What time unit is the Solr collections API overseerstatus action using
> in the returned data?
>
> For example, given the following XML: <str name="avgTimePerRequest">0.15491020578778136</str>
>
> Is the avgTimePerRequest in seconds?
Most timing data in Solr is re
: My Solr documents contain descriptions of products, similar to a
BestBuy or
: a NewEgg catalog. I'm wondering if it were possible to push a product down
: the ranking if it contains a certain word. By this I mean it would still
https://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_
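One pattern from that FAQ is a boost query that rewards every document *not* containing the term, which pushes matches down the ranking without removing them from the results (field and term below are made up):

```
defType=edismax&q=camera&bq=(*:* -description:refurbished)^10
```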
Hello,
I have a usecase with the following characteristics:
- High index update rate (adds/updates)
- High query rate
- Low index size (~800MB for 2.4Million docs)
- The documents that are created at the high rate eventually "expire" and
are deleted regularly at half hour intervals
I current
Thanks All, I shall try out the options and see how the results are.
Thanks & Regards
Vijay
-Original Message-
From: Dmitry Kan [mailto:solrexp...@gmail.com]
Sent: 30 April 2015 18:58
To: solr-user@lucene.apache.org
Subject: Re: Proximity Search
Hi,
If adding PhraseQuery objects does n
(cross posted, please confine any replies to general@lucene)
A quick reminder and/or heads-up for those who haven't heard yet: this
year's Lucene/Solr Revolution is happening in Austin, Texas in October. The
CFP and Early Bird registration are currently open. (CFP ends May 8,
Early Bird ends
: There is a possible solution here:
: https://issues.apache.org/jira/browse/LUCENE-2347 (Dump WordNet to SOLR
: Synonym format).
If you have WordNet synonyms you don't need any special code/tools to
convert them -- the current solr.SynonymFilterFactory supports wordnet
files (just specify forma
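i.e., something along these lines (the file name is the usual WordNet prolog dump; confirm against your copy):

```xml
<filter class="solr.SynonymFilterFactory" synonyms="wn_s.pl" format="wordnet"/>
```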
Dear all,
I have defined two dynamic fields:
for documents in English and in Portuguese, with the following index and
query analyzers:
Just to populate it with the general synonym words. I've managed to
populate it with some source online, but is there a limit to what it can
contain?
I can't load the configuration into zookeeper if the synonyms.txt file
contains more than 2100 lines.
Regards,
Edwin
On 1 May 2015 05:44, "Chris H
Split your synonyms into multiple files and set the SynonymFilterFactory
with a comma-separated list of files, e.g.:
synonyms="syn1.txt,syn2.txt,syn3.txt"
On Thu, Apr 30, 2015 at 8:07 PM, Zheng Lin Edwin Yeo
wrote:
> Just to populate it with the general synonym words. I've managed to
> populate
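In the analyzer that would look something like:

```xml
<filter class="solr.SynonymFilterFactory"
        synonyms="syn1.txt,syn2.txt,syn3.txt"
        ignoreCase="true" expand="true"/>
```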
Thank you for the info. Yup, this works. I found out that we can't load
files that are more than 1MB into ZooKeeper; this happens with any file
larger than 1MB, not just the synonyms files.
But I'm not sure if there will be an impact on the system, as the number of
synonym text file c
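The 1MB cap is ZooKeeper's default znode size limit (the jute.maxbuffer property). It can be raised, but it has to be set to the same value on every ZooKeeper server and on the client JVMs too; a sketch (4MB chosen arbitrarily as an example):

```
SERVER_JVMFLAGS="-Djute.maxbuffer=4194304"
```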
Hi Ryan,
That is in milliseconds.
On Thu, Apr 30, 2015 at 10:52 PM, Ryan Steele wrote:
> What time unit is the Solr collections API overseerstatus action using in
> the returned data?
>
> For example, given the following XML: <str name="avgTimePerRequest">0.15491020578778136</str>
>
> Is the avgTimePerRe