Re: Search with punctuations

2013-07-14 Thread kobe.free.wo...@gmail.com
Hi Erick, Thanks for your reply! I have tried both of the suggestions that you mentioned, i.e., 1. Using WhitespaceTokenizerFactory 2. Using WordDelimiterFilterFactory with catenateWords="1" But I still face the same issue. Must the tokenizers/factories used be the same for both "

Re: Norms

2013-07-14 Thread Mark Miller
On Jul 10, 2013, at 4:39 AM, Daniel Collins wrote: > QueryNorm is what I'm still trying to get to the bottom of exactly :) If you have not seen it, some reading from the past here… https://issues.apache.org/jira/browse/LUCENE-1896 - Mark

Re: How to from solr facet exclude specific “Tag”!

2013-07-14 Thread Upayavira
Make your two fq clauses separate fq params? Would be better for your caches, and would mean the tag is easily associated with the whole fq querystring. Upayavira On Sun, Jul 14, 2013, at 03:14 AM, 张智 wrote: > solr 4.3 > > this is my query request params: > > 0 name="QTime">15true name="indent"
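
Upayavira's suggestion of separate fq params can be sketched as a request builder. This is a minimal illustration, not a tested Solr client: the endpoint URL and the function name are assumptions, while the CityId/CompanyId fields and the tag/ex local params come from the original question.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical endpoint; adjust host, port, and core to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_tagged_facet_query(city_id, company_id):
    """Build a select URL with one tagged fq parameter per filter."""
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        # Each filter is its own fq, so each tag covers the whole clause.
        ("fq", "{!tag=city}CityId:%d" % city_id),
        ("fq", "{!tag=company}CompanyId:%d" % company_id),
        # Each facet excludes the filter that carries the matching tag.
        ("facet.field", "{!ex=city}CityId"),
        ("facet.field", "{!ex=company}CompanyId"),
        ("wt", "xml"),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_tagged_facet_query(729, 16122))
```

Passing the parameters as a list of pairs rather than a dict keeps both fq entries, since a dict would allow only one value per key.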

Re: Solr caching clarifications

2013-07-14 Thread Manuel Le Normand
Alright, thanks Erick. For the question about memory usage of merges, taken from Mike McCandless's blog: The big thing that stays in RAM is a logical int[] mapping old docIDs to new docIDs, but in more recent versions of Lucene (4.x) we use a much more efficient structure than a simple int[] ... see

Re: Apache Solr 4 - after 1st commit the index does not grow

2013-07-14 Thread Erick Erickson
Well, that's one. OutOfMemoryErrors will stop things from happening for sure; the cure is to give the JVM more memory. Additionally, multiple updates of a doc with the same <uniqueKey> will replace the old copy with a new one; that might be what you're seeing. But get rid of the OOM first. Best Erick On Su
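
Erick's second point, that re-sending a document replaces the old copy, can be pictured with a toy model. This is only a sketch of the overwrite semantics and not how Solr or Lucene store documents; it assumes the documents share a unique key field, called `id` here for illustration.

```python
def index_documents(docs, unique_key="id"):
    """Toy index: a later document with the same key replaces the earlier one."""
    index = {}
    for doc in docs:
        index[doc[unique_key]] = doc  # overwrite, never append
    return index


if __name__ == "__main__":
    crawled = [
        {"id": "http://example.com/a", "title": "first crawl"},
        {"id": "http://example.com/b", "title": "another page"},
        {"id": "http://example.com/a", "title": "second crawl"},
    ]
    # Three updates were sent, but only two distinct keys exist.
    print(len(index_documents(crawled)))
```

If a crawl keeps re-submitting the same keys, the document count stays flat even though every commit succeeds.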

Re: solr autodetectparser tikaconfig dataimporter error

2013-07-14 Thread Jack Krupansky
"Caused by: java.lang.NoSuchMethodError:" That means you have some out of date jars or some newer jars mixed in with the old ones. -- Jack Krupansky -Original Message- From: Andreas Owen Sent: Sunday, July 14, 2013 3:07 PM To: solr-user@lucene.apache.org Subject: Re: solr autodetect

Re: solr autodetectparser tikaconfig dataimporter error

2013-07-14 Thread Andreas Owen
Hi, is there no one with an idea what this error is, or who can even give me a pointer where to look? If not, is there an alternative way to import documents from an xml-file with meta-data and the filename to parse? Thanks for any help. On 12. Jul 2013, at 10:38 PM, Andreas Owen wrote: > i am using solr

Re: HTTP Status 503 - Server is shutting down

2013-07-14 Thread PeterKerk
Ok, still getting the same error "HTTP Status 503 - Server is shutting down", so here's what I did now: - reinstalled tomcat - deployed solr-4.3.1.war in C:\Program Files\Apache Software Foundation\Tomcat 6.0\webapps - copied log4j-1.2.16.jar,slf4j-api-1.6.6.jar,slf4j-log4j12-1.6.6.jar to C:\Progr

Re: Apache Solr 4 - after 1st commit the index does not grow

2013-07-14 Thread glumet
When I look into the log, there is: SEVERE: auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2668) at org.apache.lucene.index.IndexWriter.commitInte

Re: external file field and fl parameter

2013-07-14 Thread Chris Collins
Yes that worked, thanks Alan. The consistency of this api is "challenging". C On Jul 14, 2013, at 11:03 AM, Alan Woodward wrote: > Hi Chris, > > Try wrapping the field name in a field() function in your fl parameter list, > like so: > fl=field(eff_field_name) > > Alan Woodward > www.flax.co.

Re: external file field and fl parameter

2013-07-14 Thread Alan Woodward
Hi Chris, Try wrapping the field name in a field() function in your fl parameter list, like so: fl=field(eff_field_name) Alan Woodward www.flax.co.uk On 14 Jul 2013, at 18:41, Chris Collins wrote: > Why would I be re-indexing an external file field? The whole purpose is that > it's brought in
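
Alan's fl=field(...) tip can be sketched as a small helper that assembles the fl value. The helper name and the `id` field are assumptions for illustration; `eff_field_name` is the placeholder from Alan's message.

```python
def build_fl(stored_fields, function_fields):
    """Return an fl value where function_fields are wrapped in field()."""
    wrapped = ["field(%s)" % name for name in function_fields]
    return ",".join(list(stored_fields) + wrapped)


if __name__ == "__main__":
    # Ordinary stored fields are listed by name; the external file field
    # is requested through the field() function instead.
    print(build_fl(["id"], ["eff_field_name"]))
```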

Re: ACL implementation: Pseudo-join performance & Atomic Updates

2013-07-14 Thread Oleg Burlaca
Hello Erick, > Join performance is most sensitive to the number of values > in the field being joined on. So if you have lots and lots of > distinct values in the corpus, join performance will be affected. Yep, we have a list of unique Id's that we get by first searching for records where loggedIn

Re: external file field and fl parameter

2013-07-14 Thread Chris Collins
Why would I be re-indexing an external file field? The whole purpose is that it's brought in at runtime and not part of the index? C On Jul 14, 2013, at 10:13 AM, Shawn Heisey wrote: > On 7/14/2013 7:05 AM, Chris Collins wrote: >> Yep I did switch on stored=true in the field type. I was able to

Apache Solr 4 - after 1st commit the index does not grow

2013-07-14 Thread glumet
I have written my own plugin for Apache Nutch 2.2.1 to crawl images, videos and podcasts from selected sites (I have 180 urls in my seed). I put this metadata to a hBase store and now I want to save it to the index (Solr). I have a lot of metadatas to save (webpages + images + videos + podcast). I

Re: external file field and fl parameter

2013-07-14 Thread Shawn Heisey
On 7/14/2013 7:05 AM, Chris Collins wrote: > Yep I did switch on stored=true in the field type. I was able to confirm a > few ways that there are values for the eff by two methods: > > 1) changing desc to asc produced drastically different results. > > 2) debugging FileFloatSource the following

Re: SolrCloud leader

2013-07-14 Thread Shawn Heisey
On 7/14/2013 6:42 AM, kowish.adamosh wrote: > The problem is that I don't want to invoke data import on 8 server nodes but > to choose only one for scheduling. Of course, if this server shuts down > then another one needs to take the scheduler role. I can see that there is > a task for scheduling h

Re: ACL implementation: Pseudo-join performance & Atomic Updates

2013-07-14 Thread Oleg Burlaca
Hello Jack, Thanks for so many links, my comments are below. I've found a way to rephrase all my questions in one: How to implement a DAC (Discretionary Access Control) similar to Windows OS using SOLR? What we have: a hierarchical filesystem, users and groups, permissions applied at the level of

Re: HTTP Status 503 - Server is shutting down

2013-07-14 Thread Shawn Heisey
On 7/14/2013 6:43 AM, PeterKerk wrote: > Hi Shawn, > > I'm also getting the HTTP Status 503 - Server is shutting down error when > navigating to http://localhost:8080/solr-4.3.1/ > INFO: Deploying web application archive solr-4.3.1.war > log4j:WARN No appenders could be found for logger > (org.

Re: ACL implementation: Pseudo-join performance & Atomic Updates

2013-07-14 Thread Erick Erickson
Join performance is most sensitive to the number of values in the field being joined on. So if you have lots and lots of distinct values in the corpus, join performance will be affected. bq: I suppose the delete/reindex approach will not change soon There is ongoing work (search the JIRA for "Sta

Re: Getting indexed content of files using ExtractingRequestHandler

2013-07-14 Thread Erick Erickson
I'm completely ignorant of all things PHP, including the state of any Solr client code, so I'm afraid I can't help with that... Best Erick On Sun, Jul 14, 2013 at 11:03 AM, xan wrote: > Thanks for the link. Also, having gone quite far with my work using the PHP > Solr client, isn't there anythin

Re: ACL implementation: Pseudo-join performance & Atomic Updates

2013-07-14 Thread Jack Krupansky
Take a look at LucidWorks Search and its access control: http://docs.lucidworks.com/display/help/Search+Filters+for+Access+Control Role-based security is an easier nut to crack. Karl Wright of ManifoldCF had a Solr patch for document access control at one point: SOLR-1895 - ManifoldCF SearchCom

Re: Getting indexed content of files using ExtractingRequestHandler

2013-07-14 Thread xan
Thanks for the link. Also, having gone quite far with my work using the PHP Solr client, isn't there anything that could be done using the PHP Solr client only? -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-indexed-content-of-files-using-ExtractingRequestHandler-tp

ACL implementation: Pseudo-join performance & Atomic Updates

2013-07-14 Thread Oleg Burlaca
Hello all, Situation: We have a collection of files in SOLR with ACL applied: each file has a multi-valued field that contains the list of userID's that can read it: here is sample data: Id | content | userId 1 | text text | 4,5,6,2 2 | text text | 4,5,9 3 | text text | 4,2 Problem: when ACL
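
The ACL layout Oleg describes can be modelled in a few lines. This is a toy sketch over his three sample rows, not Solr code: the function names are hypothetical, and the filter string assumes the multi-valued field is named `userId` as in his table.

```python
# The three sample rows from the message, with userId as a multi-valued field.
DOCS = [
    {"id": 1, "content": "text text", "userId": [4, 5, 6, 2]},
    {"id": 2, "content": "text text", "userId": [4, 5, 9]},
    {"id": 3, "content": "text text", "userId": [4, 2]},
]


def readable_ids(docs, user_id):
    """Return the ids of the documents whose ACL list contains user_id."""
    return [doc["id"] for doc in docs if user_id in doc["userId"]]


def acl_filter_query(user_id):
    """Filter query string that would restrict results to one user."""
    return "userId:%d" % user_id


if __name__ == "__main__":
    print(readable_ids(DOCS, 2), acl_filter_query(2))
```

The cost of this layout shows up on the write side: changing one user's rights means rewriting the userId list of every affected document.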

Re: Getting indexed content of files using ExtractingRequestHandler

2013-07-14 Thread Erick Erickson
Right, sorry... http://searchhub.org/dev/2012/02/14/indexing-with-solrj/ On Sun, Jul 14, 2013 at 8:31 AM, xan wrote: > Sorry, but did you forget to send me the example's link? > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Getting-indexed-content-of-files-using-

Re: SolrCloud leader

2013-07-14 Thread Jack Krupansky
In theory, each of the nodes uses the same configuration, right? So, in theory, ANY of the nodes can do a DIH import. It is only way down low in the update processing chain that an individual Solr input document needs to have its key hashed and then the request is routed to the leader of the ap

Re: external file field and fl parameter

2013-07-14 Thread Chris Collins
Yep I did switch on stored=true in the field type. I was able to confirm that there are values for the EFF by two methods: 1) changing desc to asc produced drastically different results. 2) debugging FileFloatSource, the following was getting triggered, filling the vals array:

Re: HTTP Status 503 - Server is shutting down

2013-07-14 Thread PeterKerk
Hi Shawn, I'm also getting the HTTP Status 503 - Server is shutting down error when navigating to http://localhost:8080/solr-4.3.1/ I already copied the logging.properties file from C:\Dropbox\Databases\solr-4.3.1\example\etc to C:\Dropbox\Databases\solr-4.3.1\example\lib Here's my Tomcat consol

Re: SolrCloud leader

2013-07-14 Thread kowish.adamosh
The problem is that I don't want to invoke data import on 8 server nodes but to choose only one for scheduling. Of course, if this server shuts down then another one needs to take the scheduler role. I can see that there is a task for scheduling https://issues.apache.org/jira/browse/SOLR-2305 . I h

Re: Getting indexed content of files using ExtractingRequestHandler

2013-07-14 Thread xan
Sorry, but did you forget to send me the example's link? -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-indexed-content-of-files-using-ExtractingRequestHandler-tp4077856p4077877.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Problem using Term Component in solr

2013-07-14 Thread Erick Erickson
by "regularizing the title" I meant either indexing and searching exactly: Medical Engineering and Physics or Medical Eng. and Phys. Or you could remove the stopwords yourself at both index and query time, which would fix your "Physics of Fluids" example. The problem here is that you'll be foreve
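
Removing stopwords yourself at both index and query time, as Erick suggests, could look like the following sketch. The stopword list and the function name are assumptions for illustration; what matters is that the same function is applied to the indexed title and to the user's query.

```python
# Illustrative stopword list; a real deployment would choose its own.
STOPWORDS = {"and", "of", "the", "in", "for"}


def normalize_title(title):
    """Lowercase a title and drop stopwords so both sides compare equal."""
    tokens = title.lower().split()
    return " ".join(token for token in tokens if token not in STOPWORDS)


if __name__ == "__main__":
    # The normalized form is what gets indexed and what gets searched.
    print(normalize_title("Physics of Fluids"))
    print(normalize_title("Medical Engineering and Physics"))
```

This does nothing for abbreviations such as "Eng." versus "Engineering"; those would still need a known vocabulary.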

Re: Getting indexed content of files using ExtractingRequestHandler

2013-07-14 Thread Erick Erickson
Well, cURL is generally not what people use for production. What I'd consider is using SolrJ (which you can access Tika from) and then store the raw pdf (or whatever) document as a binary data type in Solr. Here's an example (with DB indexing mixed in, but you should be able to pull that part out)

Re: external file field and fl parameter

2013-07-14 Thread Erick Erickson
Did you store the field? I.e. set stored="true"? And does the EFF contain values for the docs you're returning? Best Erick On Sun, Jul 14, 2013 at 3:32 AM, Chris Collins wrote: > I am playing with external file field for sorting. I created a dynamic field > using the ExternalFileField type. >

Re: Custom processing in Solr Request Handler plugin and its debugging ?

2013-07-14 Thread Erick Erickson
Not sure how to do the "pass to another request handler" thing, but the debugging part is pretty straightforward. I use IntelliJ, but as far as I know Eclipse has very similar capabilities. First, I cheat and path to the jar that's the output from my IDE, that saves copying the jar around. So my s

Re: add to ContributorsGroup - Instructions for setting up SolrCloud on jboss

2013-07-14 Thread Erick Erickson
Done, sorry it took so long, hadn't looked at the list in a couple of days. Erick On Fri, Jul 12, 2013 at 5:46 PM, Ali, Saqib wrote: > username: saqib > > > On Fri, Jul 12, 2013 at 2:35 PM, Ali, Saqib wrote: > >> Hello, >> >> Can you please add me to the ContributorsGroup? I would like to add

Re: Multiple queries or Filtering Queries in Solr

2013-07-14 Thread Erick Erickson
Isn't this just a filter query (fq=)? Something like q=query2&fq=query1 Although I don't quite understand the 500 > 50, you can always tack on additional fq clauses, it's basically set intersection. As for limiting the results a user sees, that's what the &rows parameter is for. So another
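
The q plus fq plus rows combination Erick describes can be sketched as a query-string builder. The function name is hypothetical and `query1`/`query2` are the placeholders from his message; each extra fq narrows the result set further.

```python
from urllib.parse import urlencode


def build_filtered_query(main_query, filter_queries, rows=10):
    """Build a query string with one fq parameter per filter clause."""
    params = [("q", main_query)]
    params += [("fq", fq) for fq in filter_queries]  # results are intersected
    params.append(("rows", str(rows)))  # cap what the user sees
    return urlencode(params)


if __name__ == "__main__":
    print(build_filtered_query("query2", ["query1"], rows=50))
```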

Re: Does Solrj Batch Processing Querying May Confuse?

2013-07-14 Thread Erick Erickson
Well, if you can find one of the docs, or you know one of the IDs that's missing, try explainOther, see: http://wiki.apache.org/solr/CommonQueryParameters#explainOther Best Erick On Fri, Jul 12, 2013 at 8:29 AM, Furkan KAMACI wrote: > I've crawled some webpages and indexed them at Solr. I've que
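
An explainOther request of the kind Erick points to can be sketched as below. The function name, the query text, and the `id:doc42` value are hypothetical; explainOther is sent together with debugQuery=true, and its value is a query that identifies the document you expected to see.

```python
from urllib.parse import urlencode


def build_explain_other(query, missing_doc_query):
    """Query string asking Solr to explain the score of another document."""
    params = [
        ("q", query),
        ("debugQuery", "true"),  # explainOther is part of the debug output
        ("explainOther", missing_doc_query),
    ]
    return urlencode(params)


if __name__ == "__main__":
    # Hypothetical id of a crawled page that is missing from the results.
    print(build_explain_other("solr", "id:doc42"))
```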

How to from solr facet exclude specific “Tag”!

2013-07-14 Thread 张智
Solr 4.3. These are my query request params: 0 15 true true *:* 1373713374569 {!ex=city}CityId {!ex=company}CompanyId xml {!tag=city}CityId:729 AND {!tag=company}CompanyId:16122 This is the query response "Facet" content: 100171894067747765780580922921328975...808776772765668667402401390971...

Re: Problem using Term Component in solr

2013-07-14 Thread Parul Gupta(Knimbus)
Hi, The vocabulary is not known; that's the main issue, otherwise I would implement synonyms instead. What do you mean by 'regularizing the title'? Please let me know some solution... -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-using-Term-Component-in-solr-tp4077200p4077865.htm

Getting indexed content of files using ExtractingRequestHandler

2013-07-14 Thread xan
Hi, I'm using the PHP Solr client (ver: 1.0.2). I'm indexing the contents through my database. Suppose $data is a stdClass object having id, name, title, etc. from a database entry. Next, I declare a Solr document and assign fields to it: $doc = new SolrInputDocument(); $doc->addField ('id' ,

external file field and fl parameter

2013-07-14 Thread Chris Collins
I am playing with external file field for sorting. I created a dynamic field using the ExternalFileField type. I naively assumed that the "fl" argument would allow me to return the value of the external field, but it doesn't seem to do so. For instance I have defined a dynamic field: *_efloat th