AW: Odp.: solr issue with pdf forms

2015-04-29 Thread Steve.Scholl
Thank you very much for the detailed information. I have now checked the properties of the content field. In my opinion it is indexed, right? Field: content Properties: Indexed, Tokenized, Stored, TermVector Stored Schema: Indexed, Tokenized, Stored, TermVector Stored Index: Indexed, Tokenized, Store

Re: Mutli term synonyms

2015-04-29 Thread Roman Chyla
Brackets are range operators for the parser; you need to escape them (\[) or enclose the term in quotes. On Apr 29, 2015 10:27 PM, "Kaushik" wrote: > Hi Roman, > > "Tween 20" also did not retrieve me results. So I replaced the whitespaces > in the synonyms.txt with 'x' and now when I search, I get the resu
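
Roman's two suggestions (escape the brackets, or quote the term) can be sketched in Python. This is a minimal illustration and not part of the thread: the helper names `escape_lucene_term` and `quote_lucene_phrase` are hypothetical, and the character set is assumed from the special characters documented for the classic Lucene query parser (the two-character operators `&&` and `||` are handled by escaping each character individually).

```python
# Hypothetical helpers (not from the thread): two ways to keep characters such
# as [ and ] from being interpreted as query-parser operators.

# Assumed set of single characters that the classic Lucene query parser
# treats as syntax.
LUCENE_SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_lucene_term(text: str) -> str:
    """Return text with every special character preceded by a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in text)


def quote_lucene_phrase(text: str) -> str:
    """Wrap text in double quotes, escaping only backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


if __name__ == "__main__":
    print(escape_lucene_term("POLYSORBATE 20[MART.]"))
    print(quote_lucene_phrase("POLYSORBATE 20[MART.]"))
```

Note that escaping does not touch whitespace, so an escaped multi-word value is still split into separate terms by the parser, whereas the quoted form is sent as a single phrase.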

SolrCloud+HDFS disappointed indexing performance

2015-04-29 Thread xinwu
Hi, I have a huge number of log files stored in HDFS that need to be indexed. I did some tests; the results follow: Then several questions: 1. Why did HDFS show such disappointing performance? 2. If each doc size is about 2

Re: How to export the list of terms indexed in Solr?

2015-04-29 Thread Koji Sekiguchi
Hi brent3600, You can use NLP4L for this purpose. NLP4L is good at counting the number of words not only in whole index but also in a set of documents. There is a tutorial for this function. Count the number of words http://nlp4l.github.io/tutorial_ja.html#useNLP Sorry but the tutorial is writ

Re: analyzer, indexAnalyzer and queryAnalyzer

2015-04-29 Thread Kaushik
Hi Doug, Nice explanation of the query parsers. If you get a chance, can you please take a quick look at the issue I am facing with multi term synonyms as well? http://lucene.472066.n3.nabble.com/Mutli-term-synonyms-tt4200960.html#none is the problem I am facing. I am now able to perform multi ter

Re: Mutli term synonyms

2015-04-29 Thread Kaushik
Hi Roman, "Tween 20" also did not return any results. So I replaced the whitespaces in the synonyms.txt with 'x' and now when I search, I get the results back. One problem, however, still exists, i.e. when I search for POLYSORBATE 20[MART.] which is a synonym for POLYSORBATE 20, I get the error below

Re: On the fly reloading of solr core properties

2015-04-29 Thread KNitin
Hi I would really appreciate it if any of you can share your insights with such a use case. Thanks much Nitin On Tuesday, April 28, 2015, KNitin wrote: > Hi > > In Solrcloud (4.6.1) every time a property/value is changed in > solrcore.properties file, a core/collection reload is needed to pic

Re: analyzer, indexAnalyzer and queryAnalyzer

2015-04-29 Thread Doug Turnbull
So Solr has the idea of a query parser. The query parser is a convenient way of passing a search string to Solr and having Solr parse it into underlying Lucene queries: You can see a list of query parsers here http://wiki.apache.org/solr/QueryParser What this means is that the query parser does wo

Loading lineshape data into Solr

2015-04-29 Thread Arthur Zubarev
Hi Solr community, My immediate task at hand is to load lineshape data into Solr (the lineshape data is a set of points on a curve in the form of lat. + long. coordinates). The data sits in a SQL Server 2012 table. Extracting the data to a flat file is impossible as it is becoming binary (not readabl

Schema API: add-field-type

2015-04-29 Thread Steven White
Hi Everyone, When I pass the following: http://localhost:8983/solr/db/schema/fieldtypes?wt=xml I see this (as one example): date solr.TrieDateField 0 0 last_modified *_dts *_dt See how there is "fields" and "dynamicfields"? However, w

How to export the list of terms indexed in Solr?

2015-04-29 Thread brent3600
We are indexing collections of documents (files) with SOLR, and would like the following capability: Export or pull from SOLR the list of terms that have been indexed for a document or set of documents, along with the term frequency count. 1. Does SOLR already provide an API or method to acco

Re: analyzer, indexAnalyzer and queryAnalyzer

2015-04-29 Thread Steven White
Hi Doug, I don't understand what you mean by the following: > For example, if a user searches for q=hot dogs&defType=edismax&qf=title > body the *query parser* *not* the *analyzer* first turns the query into: If I have indexAnalyzer and queryAnalyzer in a fieldType that are 100% identical, the e

Re: Choosing order of fields in response with fl=field_1, field_2

2015-04-29 Thread Raphaël Tournoy
Fair enough, I like the underlying philosophy and I'm glad to understand the choices. Thank you very much for these explanations, Raphaël On 28/04/2015 19:19, Chris Hostetter wrote: because of the nature of the CSV format, the order of the fields *has* to be deterministic and consistent for

Re: Mutli term synonyms

2015-04-29 Thread Roman Chyla
Hi Kaushik, I meant to compare tween 20 against "tween 20". Your autophrase filter replaces whitespace with x, but your synonym filter expects whitespaces. Try that. Roman On Apr 29, 2015 2:27 PM, "Kaushik" wrote: > Hi Roman, > > When I used the debugQuery using > > http://localhost:8983/solr/c

Re: Mutli term synonyms

2015-04-29 Thread Kaushik
Hi Roman, When I used the debugQuery using http://localhost:8983/solr/collection1/autophrase?q=tween+20&wt=json&indent=true&debugQuery=true I see the following in the response. The autophrase plugin seems to be doing its part. Just not the synonym expansion. When you say use phrase queries, what d

Re: analyzer, indexAnalyzer and queryAnalyzer

2015-04-29 Thread Chris Hostetter
: 1) If the content of indexAnalyzer and queryAnalyzer are exactly the same, : that's the same as if I have an analyzer only, right? Effectively yes. Subtle nuance: if you declare 1 analyzer, there is one Analyzer object in ram. If you declare both, then there are 2 Analyzer objects in RAM -

Re: analyzer, indexAnalyzer and queryAnalyzer

2015-04-29 Thread Doug Turnbull
*> 1) If the content of indexAnalyzer and queryAnalyzer are exactly the same, that's the same as if I have an analyzer only, right?* 1) Yes *> 2) Under the hood, all three are the same thing when it comes to what kind of data and configuration attributes can take, right?* 2) Yes. Both take in te

Re: Mutli term synonyms

2015-04-29 Thread Roman Chyla
Please post the output of the request with debugQuery=true. Do you see the synonyms being expanded? Probably not. You can go to the admin interface and, in the analyzer section, play with the input until you see the synonyms. Use phrase queries too. That will help to eliminate the autophrase filter On Apr

analyzer, indexAnalyzer and queryAnalyzer

2015-04-29 Thread Steven White
Hi Everyone, Looking at Solr's schema.xml, there are three kinds of analyzers: analyzer, indexAnalyzer and queryAnalyzer. I have two questions about them: 1) If the content of indexAnalyzer and queryAnalyzer are exactly the same, that's the same as if I have an analyzer only, right? 2) Under the

Managed synonyms and Solr Java API

2015-04-29 Thread Mike Thomsen
Is there a way to manage synonyms through Solr's Java API? Google doesn't turn up any good results, and I didn't see anything in the javadocs that looked promising. Thanks, Mike

Re: Errors after upgrade from Solr 4.5.1 to Solr 4.10.4

2015-04-29 Thread Chris Hostetter
: After we upgraded Solr from 4.5.1 to 4.10.4, we started seeing the : following UnsupportedOperationException logged repeatedly. We do not : have highlighting configured to useFastVectorHighlighter. The logged : stack trace has given me little to go on. I was hoping this is a : problem oth

Re: How to register a custom QParserPlugin

2015-04-29 Thread Chris Hostetter
: snippet to : vufind/solr/biblio/conf/sorconfig.xml. the correct syntax should be... ...note the "P" If it's loaded properly, you should see mention of MyQParserPlugin in your logs at startup, and it should appear in the list of query parser plugins in the admin ui... https://cwiki.apa

Re: luceneMatchVersion

2015-04-29 Thread Chris Hostetter
https://issues.apache.org/jira/browse/SOLR-7487 : Date: Wed, 29 Apr 2015 12:23:13 -0400 : From: Scott Dawson : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Re: luceneMatchVersion : : Thanks Shawn. There's a closed JIRA ticket related to this - SOLR-5048 - :

Re: Why are these two queries different?

2015-04-29 Thread Chris Hostetter
: We did two SOLR queries and they were supposed to return the same results but : did not: the short answer is: if you want those queries to return the same results, then you need to adjust your query time analyzer for the all_text field to not split intra numeric tokens on "," i don't know *why* ex

Re: luceneMatchVersion

2015-04-29 Thread Scott Dawson
Thanks Shawn. There's a closed JIRA ticket related to this - SOLR-5048 - "fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion". Regards, Scott On Wed, Apr 29, 2015 at 12:15 PM, Shawn Heisey wrote: > On 4/29/2015 9:56 AM, Scott Dawson wrote: > > In Solr

Errors after upgrade from Solr 4.5.1 to Solr 4.10.4

2015-04-29 Thread Rich Hume
After we upgraded Solr from 4.5.1 to 4.10.4, we started seeing the following UnsupportedOperationException logged repeatedly. We do not have highlighting configured to useFastVectorHighlighter. The logged stack trace has given me little to go on. I was hoping this is a problem others have see

How to register a custom QParserPlugin

2015-04-29 Thread Oliver Obenland
Hi, we are trying to implement a custom QParserPlugin following this tutorial: http://spykem.blogspot.de/2013/06/plug-in-external-score-to-solr.html. We are using SOLR with VuFind. The implementation is done, the jar is located at vufind/solr/lib/ where other jars seem to be located. Then we a

Re: luceneMatchVersion

2015-04-29 Thread Shawn Heisey
On 4/29/2015 9:56 AM, Scott Dawson wrote: > In Solr 5.1, I've noticed that luceneMatchVersion is set to 5.0.0 in the > sample and any newly generated solrconfig.xml files. Is this an oversight > or by design? Any reason I shouldn't bump it to 5.1.0 for new cores I'm > creating? I'm pretty sure tha

luceneMatchVersion

2015-04-29 Thread Scott Dawson
Hello, In Solr 5.1, I've noticed that luceneMatchVersion is set to 5.0.0 in the sample and any newly generated solrconfig.xml files. Is this an oversight or by design? Any reason I shouldn't bump it to 5.1.0 for new cores I'm creating? Thanks, Scott

Re: Embedded Solr, event for "cores up-and-running"?

2015-04-29 Thread Shawn Heisey
On 4/29/2015 9:26 AM, Erick Erickson wrote: > I'm not sure there _is_ a good way short of sending a query at it. > Since great efforts are made to have the embedded Solr act just like > an external version, there's not much in the way of back-doors that I > know of. Do the calls which create the E

Re: Embedded Solr, event for "cores up-and-running"?

2015-04-29 Thread Erick Erickson
I'm not sure there _is_ a good way short of sending a query at it. Since great efforts are made to have the embedded Solr act just like an external version, there's not much in the way of back-doors that I know of. Best, Erick On Wed, Apr 29, 2015 at 6:49 AM, Clemens Wyss DEV wrote: > If I run S

Re: How to improve the performance of query with expand query

2015-04-29 Thread yliu
One more question. The 70516592 includes everything in the core, not only those that have the same _root_. My expand.field limits the results to the same root. There are only 382 documents with the same _root_. Would that have ruled out most of the other documents? Maybe I have misunderstood your que

Re: How to improve the performance of query with expand query

2015-04-29 Thread yliu
I will try it out with Solr 5. Thanks for your help, yliu -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-improve-the-performance-of-query-with-expand-query-tp4202895p4203030.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to improve the performance of query with expand query

2015-04-29 Thread Joel Bernstein
Ok, this explains the slowness. The expand.q returns 380 documents but it's actually postfiltering the entire 70 million+ result set in the case of expand.q=*:*. The problem is the postfilter only approach that Expand uses in Solr 4.x. Expand was originally designed to use the main query as the ex

Re: How to improve the performance of query with expand query

2015-04-29 Thread yliu
There are 70516592 documents in total in the index, and 380 are returned from the expand.q query. How do I include licenseNumber in the expand.q query? None of the expand documents have that field. Thanks, yliu

Re: How to improve the performance of query with expand query

2015-04-29 Thread Joel Bernstein
Is it possible to use the licenseNumber in the expand.q? In Solr 4.x, Expand brings back all the results in the expand.q and maps them to groups. So large result sets generated by expand.q can be slow. In Solr 5, Expand limits the results coming back from expand.q to only include records that have

custom collector

2015-04-29 Thread Robust Links
Hi, I need help porting my Lucene code from 4 to 5. In particular, I need to customize a collector (to collect all doc IDs in the index, which can be >30MM docs). Below is how I achieved this in Lucene 4. Are there any guidelines on how to do this in Lucene 5, especially on semantics changes of Atom

Re: How to improve the performance of query with expand query

2015-04-29 Thread yliu
What I have here is the complete request. I am not using any "collapse" in my query. Before the queries I listed in my previous post, I have only: http:///solr-slave/entitlement/select/? Thanks. yliu

Re: How to improve the performance of query with expand query

2015-04-29 Thread Joel Bernstein
Can you paste in the entire http request? Wondering if you're using both collapse and expand. Both of these have an effect on performance. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Apr 29, 2015 at 8:48 AM, yliu wrote: > Hi, Joel, > > Here are some details: > > 1) I am using solr ve

Re: Odp.: solr issue with pdf forms

2015-04-29 Thread Erick Erickson
Steve: I'd just look at one field at a time. Presumably you have a field that's displaying poorly, "content"? Just look at _that_ field, as http://IP:8080/solr/core_de/terms?terms.fl=content or http://IP:8080/solr/core_de/terms?terms.fl=content&terms.prefix=d Now, that should show you terms
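
The two requests Erick describes can also be assembled programmatically. What follows is a minimal Python sketch, not something from the thread: the helper name `build_terms_url` is hypothetical, while the base URL `http://IP:8080/solr/core_de` and the `terms.fl` and `terms.prefix` parameters are taken from the message above (`IP` is a placeholder for the real host).

```python
from typing import Optional
from urllib.parse import urlencode


def build_terms_url(core_url: str, field: str, prefix: Optional[str] = None) -> str:
    """Build a /terms request URL for one field, optionally limited to a prefix.

    Hypothetical helper: it only assembles the URL and does not contact Solr.
    """
    params = {"terms.fl": field}
    if prefix is not None:
        params["terms.prefix"] = prefix
    # urlencode percent-encodes values, so field names or prefixes containing
    # special characters stay valid in the query string.
    return core_url.rstrip("/") + "/terms?" + urlencode(params)


if __name__ == "__main__":
    base = "http://IP:8080/solr/core_de"  # placeholder host from the message
    print(build_terms_url(base, "content"))
    print(build_terms_url(base, "content", prefix="d"))
```

Looking at one field per request, as suggested, keeps the response small enough to read by eye.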

Re: TIKA OCR not working

2015-04-29 Thread Erick Erickson
Yes, the critical bit for knowing what release a JIRA is in is the "Fix Version/s" entry. You have to be a little careful though to only read that when the Resolution is "Fixed", as the "fix version" is sometimes set while the JIRA is still open. On Tue, Apr 28, 2015 at 8:52 PM, trung.ht wrote: >

Embedded Solr, event for "cores up-and-running"?

2015-04-29 Thread Clemens Wyss DEV
If I run Solr in embedded mode (which I shouldn't, I know ;) ) how do I know (event?) that the cores are up-and-running, i.e. all is initialized? Thx Clemens

Re: Antwort: Custom Scoring Question

2015-04-29 Thread Johannes Ruscheinski
Hi Stephan, On 29/04/15 14:37, Stephan Schubert wrote: > Hi Johannes, > > did you have a look on Solr edismax and function queries? > https://cwiki.apache.org/confluence/display/solr/Function+Queries Just read it. > > If I got you right, for the case you just want to ignore fields which have > n

Re: Mutli term synonyms

2015-04-29 Thread Kaushik
Hi Roman, Following is my use case: *Schema.xml*... *SolrConfig.xml...* name="/autophrase" class="solr.SearchHandler"> explicit 10 name autophrasingPa

Re: How to improve the performance of query with expand query

2015-04-29 Thread yliu
Hi, Joel, Here are some details: 1) I am using solr version 4.10.4. When my request is: q=contentType:EntitlementBean AND licenseNumber:40281071. It returns in 414ms. And the response size is 4728 bytes. One document returned. When my request is: q=contentType:EntitlementBean AND license

Antwort: Custom Scoring Question

2015-04-29 Thread Stephan Schubert
Hi Johannes, did you have a look at Solr edismax and function queries? https://cwiki.apache.org/confluence/display/solr/Function+Queries If I got you right, for the case you just want to ignore fields which do not have a value set on a specific field you can filter them out with a filter query.

RE: Odp.: solr issue with pdf forms

2015-04-29 Thread Allison, Timothy B.
I completely agree with Erick about the utility of the TermsComponent to see what is actually being indexed. If you find problems there and if you haven't done so already, you might also investigate further down the stack. It might make sense to run the tika-app.jar (whichever version you are

Custom Scoring Question

2015-04-29 Thread Johannes Ruscheinski
Hi, I am entirely new to the world of SOLR programming and I have the following questions: In addition to our regular searches we need to implement a specialised form of range search and ranking. We have implemented a CustomScoreQuery and a CustomScoreProvider. I now have a few questions: 1)

Re: Simple search low speed

2015-04-29 Thread Norgorn
In case someone faces the same problem: the cause was that the SOLR caches were turned off, and I had underestimated the importance of caches in my desire to save as much RAM as possible.

Re: /suggest through SolrJ?

2015-04-29 Thread Tommaso Teofili
2015-04-27 19:22 GMT+02:00 Alessandro Benedetti : > Just had the very same problem, and I confirm that currently is quite a > mess to manage suggestions in SolrJ ! > I have to go with manual Json parsing. > or very not nice NamedList API mess (see an example in JR Oak [1][2]). Regards, Tommaso

AW: Odp.: solr issue with pdf forms

2015-04-29 Thread Steve.Scholl
Sorry, but there really isn't... :-/ I never used the terms component. So I first looked if it is configured, and it really is. Then I tried to get an idea how it works and tried the examples described in the documentation. After that I tried to figure out how to get the output from the "miscoded" pdf

Re: /suggest through SolrJ?

2015-04-29 Thread Jan Høydahl
Alessandro, can you open a JIRA issue for this? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 27. apr. 2015 kl. 19.22 skrev Alessandro Benedetti > : > > Just had the very same problem, and I confirm that currently is quite a > mess to manage suggestions in SolrJ !

Re: Mutli term synonyms

2015-04-29 Thread Roman Chyla
I'm not sure I understand - the autophrasing filter will allow the parser to see all the tokens, so that they can be parsed (and multi-token synonyms identified). So if you are using the same analyzer at query and index time, they should be able to see the same stuff. Are you using multi-token syn