Hey, thanks a lot for the hint with pdfbox-app.jar.
For testing purposes I extracted an affected PDF form and a normal PDF file.
The result is the following:
Normal PDF file:
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
tempor invidunt ut
labore et d
pdf form:
Hi,
Does anyone know of a faster method of populating the synonyms.txt file
than manually typing the words into the file? There could be
thousands of synonyms.
Regards,
Edwin
Hi Hoss,
thank you for your help. This helps a lot. I can't see the plugin in the log or in
the plugin list, but it "works" now (got an exception from our class, so I know
it'll be called).
Thanks a lot!
Oliver
Am 29.04.2015 um 18:40 schrieb Chris Hostetter:
: snippet to
: vufind/so
Hello,
I have a situation and I'm a little stuck on how to fix it.
For example the following data structure:
*Deal*
All Coca Cola 20% off
*Products*
Coca Cola light
Coca Cola Zero 1L
Coca Cola Zero 20CL
Coca Cola 1L
When somebody searches for a "Cola" discount, I want the result of the d
Hi,
I have created my index with the default configurations. Now I am trying to
use proximity search. However, I am a bit unsure about the results and where
it's going wrong.
For example, I want to find two phrases "this is phrase one" and another
phrase "this is the second phrase" with not more than
I just tried with simple proximity search like "word1 word2" ~3 and it is
not working. Just wondering whether I have to make any configuration
changes to solrconfig.xml to make proximity search work?
Thanks
Vijay
On 30 April 2015 at 14:32, Vijaya Narayana Reddy Bhoomi Reddy <
vijaya.bhoomire...@
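(For reference, in the standard Lucene/Solr query syntax the slop attaches directly to a quoted phrase, with no space before the tilde; the field name below is made up:)

```
"word1 word2"~3          at most 3 position moves between the two terms
title:"phrase one"~2     the same, scoped to a specific field
```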
I am facing the same problem; currently I am resorting to a custom program
to create this file. Hopefully there is a better solution out there.
Thanks,
Kaushik
On Thu, Apr 30, 2015 at 3:58 AM, Zheng Lin Edwin Yeo
wrote:
> Hi,
>
> Does anyone know of a faster method of populating the synonyms.tx
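A tiny script is often enough to generate the file from whatever structured source you have. A minimal Python sketch (the mapping here is a made-up placeholder; in practice it would come from a thesaurus dump or database):

```python
# Emit Solr-format synonym lines from a word -> synonyms mapping.
synonyms = {
    "tv": ["television", "televisions"],
    "laptop": ["notebook"],
}

def to_solr_lines(mapping):
    # SynonymFilterFactory accepts comma-separated equivalence groups,
    # one group per line of synonyms.txt.
    return [",".join([word] + alts) for word, alts in sorted(mapping.items())]

lines = to_solr_lines(synonyms)
print("\n".join(lines))
```

Writing `"\n".join(lines)` to synonyms.txt then gives one equivalence group per line.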
Hi,
My Solr documents contain descriptions of products, similar to a
BestBuy or
a NewEgg catalog. I'm wondering if it is possible to push a product down
the ranking if it contains a certain word. By this I mean it would still
appear in the search results. However, instead of appearing n
There is a possible solution here:
https://issues.apache.org/jira/browse/LUCENE-2347 (Dump WordNet to SOLR
Synonym format).
I don't have personal experience with it. I only know about it because it's
mentioned on page 184 of the 'Solr in Action' book by Trey Grainger and
Timothy Potter.
Maybe som
Hi Vijaya,
I just quickly tried proximity search with the example set shipped with
Solr 5 and it seemed to work for me.
Perhaps what you could do is debug the query by enabling debugQuery=true.
Here are the steps that I tried. (Assuming you are on Solr 5, though this
term proximity functionali
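For example, a debug request might look like this (collection name and query are assumptions):

```
http://localhost:8983/solr/techproducts/select?q="this is phrase one"~3&debugQuery=true
```

The parsedquery and explain sections of the debug output show how the slop was actually applied.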
Which version of solr?
On Thu, Apr 30, 2015 at 9:58 AM, Zheng Lin Edwin Yeo
wrote:
> Hi,
>
> Does anyone know of a faster method of populating the synonyms.txt file
> than manually typing the words into the file? There could be
> thousands of synonyms.
>
> Regards,
> Edwin
Hi,
I am interested in indexing some documents in Solr, as I did in Lucene.
I mean: specifying via SolrJ all the information about the field I am adding
(tokenize, store, facet, etc.)
can we do that? Or is it mandatory to define a schema on the collection?
Thanks a lot!
Benjamin
Hi Doug, nice write-up and 2 questions:
- You write your own QParser plugins - can one keep the features of edismax
for field boosting/phrase-match boosting by subclassing edismax? Assuming
yes...
- What do pf2 and pf3 do in the edismax query parser?
hon-lucene-synonyms plugin links correction
On 4/30/2015 8:43 AM, Sznajder ForMailingList wrote:
> I am interested in indexing some documents in Solr, as I did in Lucene.
>
> I mean: specifying via SolrJ all the information about the field I am adding
> (tokenize, store, facet, etc.)
>
> can we do that? Or is it mandatory to define a schema on the
OK, given all that, Tika _is_ sending the weird characters to Solr. You
can get them out of the index by using something like
PatternReplaceTokenFilterFactory or PatternReplaceCharFilterFactory in
your analysis chain. However, you'll still be stuck with the odd
characters showing up in your browser.
Or use a Solr update processor to scrub the source values. The regex
pattern replacement processor could do the trick:
http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html
-- Jack Krupansky
On Thu, Apr 30, 2015 at 11:17 AM, Erick Erickso
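A sketch of what such a chain could look like in solrconfig.xml (the chain name, field name, and pattern are assumptions; check the javadoc linked above for the exact parameters):

```xml
<updateRequestProcessorChain name="scrub-chars">
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content</str>
    <!-- strip anything that is not printable or whitespace -->
    <str name="pattern">[^\p{Print}\s]</str>
    <str name="replacement"></str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```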
Jack:
I keep forgetting those things exist, thanks for the reminder!
On Thu, Apr 30, 2015 at 8:23 AM, Jack Krupansky
wrote:
> Or use a Solr update processor to scrub the source values. The regex
> pattern replacement processor could do the trick:
> http://lucene.apache.org/solr/5_1_0/solr-core/o
- You write your own QParser plugins - can one keep the features of edismax
for field boosting/phrase-match boosting by subclassing edismax? Assuming
yes...
hon-lucene-synonyms does this, but largely by copy-pasting the code (sorry
about the broken link!)
pf2 and pf3 take the query "hello my na
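Roughly: given q=hello my name is, pf2 adds phrase boosts over adjacent word pairs, and pf3 does the same for triples (field names below are assumed):

```
defType=edismax&q=hello my name is&qf=title body
&pf2=title^5    phrase boosts on "hello my", "my name", "name is"
&pf3=title^2    phrase boosts on "hello my name", "my name is"
```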
Could you explain a bit more _why_ you want to do this? As you're
probably well aware, there
are multiple ways to shoot yourself in the foot in lower-level Lucene.
If you have some situation where you're creating indexes on the fly
that may vary then
you could consider the "managed schema" that le
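For reference, the "managed schema" is switched on in solrconfig.xml along these lines (Solr 5.x; treat the exact defaults as version-dependent):

```xml
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
```

Fields can then be added at runtime through the Schema API instead of hand-editing schema.xml.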
Thank you.
-Original Message-
From: Doug Turnbull [mailto:dturnb...@opensourceconnections.com]
Sent: Thursday, April 30, 2015 11:33 AM
To: solr-user@lucene.apache.org; Dan Davis
Subject: Re: analyzer, indexAnalyzer and queryAnalyzer
- You write your own QParser plugins - can one keep the
I'm using Solr-5.0.0 and ZooKeeper-3.4.6.
I've gotten some samples from the Moby Treasure List
http://www.gutenberg.org/catalog/world/results?title=moby+list to try it
out.
However, currently I can only have up to around 2100 lines in my
synonyms.txt when I load the configuration into ZooKeepe
Steve,
Another possibility is to use the Linux pdftotext command-line utility or a
software daemon linked with the libraries it uses, usually part of the
poppler-utils package. pdfbox should have the same basic capabilities, but
may run a little slower.
If you have very many "filled pdf" for
Thanks Rajani.
I could get proximity search to work for individual words. However, I still
could not make it work for two phrases, each containing more than one word.
Also, results seem to be unexpected for proximity queries with wildcards.
Thanks & Regards
Vijay
On 30 April 2015 at 15:19, Rajani Ma
You'll need the ComplexPhraseQueryParser [1] to handle multiterm
(wildcard/fuzzy/regex) terms in proximity. Beware, though, that it does not
perform analysis on fuzzy/wildcard terms, IIRC.
The SurroundQueryParser can probably do both phrase near phrase and multiterm
within proximity. Same warning
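For example, the complex phrase parser can be invoked per-query with local params (the field name is an assumption):

```
q={!complexphrase inOrder=true}name:"this is phras*"~3
```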
What time unit is the Solr collections API overseerstatus action using
in the returned data?
For example, given the following XML: <str name="avgTimePerRequest">0.15491020578778136</str>
Is the avgTimePerRequest in seconds?
Thanks,
Ryan
---
Thanks Tim for the information. I shall have a look at them.
Thanks & Regards
Vijay
On 30 April 2015 at 18:13, Allison, Timothy B. wrote:
> You'll need the ComplexPhraseQueryParser [1] to handle multiterm
> (wildcard/fuzzy/regex) terms in proximity. Beware, though, that that does
> not perfo
Hi Vijay,
I haven't tried this myself, but perhaps you could build the two phrases as
PhraseQueries and connect them up with a SpanQuery? Something like this
(using your original example).
PhraseQuery p1 = new PhraseQuery();
for (String word : "this is phrase 1".split(" ")) {
p1.add(new Term("my
Hi,
If adding PhraseQuery objects does not work, then using SpanNearQuery with
slop 0 and order true for p1 and p2 should work (tried).
Dmitry
On Thu, Apr 30, 2015 at 8:43 PM, Sujit Pal wrote:
> Hi Vijay,
>
> I haven't tried this myself, but perhaps you could build the two phrases as
> PhraseQ
On 4/30/2015 11:22 AM, Ryan Steele wrote:
> What time unit is the Solr collections API overseerstatus action using
> in the returned data?
>
> For example, given the following XML: <str name="avgTimePerRequest">0.15491020578778136</str>
>
> Is the avgTimePerRequest in seconds?
Most timing data in Solr is re
: My Solr documents contain descriptions of products, similar to a
BestBuy or
: a NewEgg catalog. I'm wondering if it were possible to push a product down
: the ranking if it contains a certain word. By this I mean it would still
https://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_
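One pattern from that FAQ is a boost query that rewards every document *not* containing the term, which pushes matches down the ranking without removing them from the results (field and term below are made up):

```
defType=edismax&q=camera&bq=(*:* -description:refurbished)^10
```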
Hello,
I have a usecase with the following characteristics:
- High index update rate (adds/updates)
- High query rate
- Low index size (~800MB for 2.4Million docs)
- The documents that are created at the high rate eventually "expire" and
are deleted regularly at half hour intervals
I current
Thanks All, I shall try out the options and see how the results are.
Thanks & Regards
Vijay
-Original Message-
From: Dmitry Kan [mailto:solrexp...@gmail.com]
Sent: 30 April 2015 18:58
To: solr-user@lucene.apache.org
Subject: Re: Proximity Search
Hi,
If adding PhraseQuery objects does n
(cross posted, please confine any replies to general@lucene)
A quick reminder and/or heads-up for those who haven't heard yet: this
year's Lucene/Solr Revolution is happening in Austin, Texas in October. The
CFP and Early Bird registration are currently open. (CFP ends May 8,
Early Bird ends
: There is a possible solution here:
: https://issues.apache.org/jira/browse/LUCENE-2347 (Dump WordNet to SOLR
: Synonym format).
If you have WordNet synonyms you don't need any special code/tools to
convert them -- the current solr.SynonymFilterFactory supports wordnet
files (just specify forma
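i.e., something along these lines (the file name is the usual WordNet prolog dump; confirm against your copy):

```xml
<filter class="solr.SynonymFilterFactory" synonyms="wn_s.pl" format="wordnet"/>
```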
Dear all,
I have defined two dynamic fields:
for documents in English and in Portuguese, with the following index and
query analyzers:
Just to populate it with the general synonym words. I've managed to
populate it with some source online, but is there a limit to what it can
contain?
I can't load the configuration into zookeeper if the synonyms.txt file
contains more than 2100 lines.
Regards,
Edwin
On 1 May 2015 05:44, "Chris H
Split your synonyms into multiple files and set the SynonymFilterFactory
with a comma-separated list of files, e.g.:
synonyms="syn1.txt,syn2.txt,syn3.txt"
On Thu, Apr 30, 2015 at 8:07 PM, Zheng Lin Edwin Yeo
wrote:
> Just to populate it with the general synonym words. I've managed to
> populate
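In the analyzer that would look something like:

```xml
<filter class="solr.SynonymFilterFactory"
        synonyms="syn1.txt,syn2.txt,syn3.txt"
        ignoreCase="true" expand="true"/>
```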
Thank you for the info. Yup, this works. I found out that we can't load
files that are more than 1MB into ZooKeeper; this happens with any file
larger than 1MB, not just the synonyms files.
But I'm not sure if there will be an impact on the system, as the number of
synonym text file c
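The 1MB cap is ZooKeeper's default znode size limit (the jute.maxbuffer property). It can be raised, but it has to be set to the same value on every ZooKeeper server and on the client JVMs too; a sketch (4MB chosen arbitrarily as an example):

```
SERVER_JVMFLAGS="-Djute.maxbuffer=4194304"
```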
Hi Ryan,
That is in milliseconds.
On Thu, Apr 30, 2015 at 10:52 PM, Ryan Steele wrote:
> What time unit is the Solr collections API overseerstatus action using in
> the returned data?
>
> For example, given the following XML: <str name="avgTimePerRequest">0.15491020578778136</str>
>
> Is the avgTimePerRe