Any way to get reference to original request object from within Solr component?

2012-03-16 Thread SUJIT PAL
Hello, I have a custom component which depends on the ordering of a multi-valued parameter. Unfortunately it looks like the values do not come back in the same order as they were put in the URL. Here is some code to explain the behavior: URL: /solr/my_custom_handler?q=something&myparam=foo&mypa

Re: suggestions on automated testing for solr output

2012-03-16 Thread Gora Mohanty
On 17/03/2012, geeky2 wrote: > hello all, > > i know this is never a fun topic for people, but our SDLC mandates that we > have unit test cases that attempt to validate the output from specific solr > queries. > > i have some ideas on how to do this, but would really appreciate feedback > from any

Sorting Index Results by User's Score

2012-03-16 Thread Phill Tornroth
I'm puzzled on whether or not Solr is the right system for solving this problem I've got. I'm using some Solr indexes for autocompletion, and I have a desire to rank the results by their value to the requesting user. Essentially, I'll tally the number of times the user has chosen particular results

Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-16 Thread vybe3142
Hi, Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming? Use case: * Text Files to be indexed are on file server (A) (some potentially large - several 100 MB) * SOLRJ client is on server (B) * SOLR server is on server (C) running with dynamically created SOLR cores

suggestions on automated testing for solr output

2012-03-16 Thread geeky2
hello all, i know this is never a fun topic for people, but our SDLC mandates that we have unit test cases that attempt to validate the output from specific solr queries. i have some ideas on how to do this, but would really appreciate feedback from anyone that has done this or is doing it now.

Extract terms of a query to do highlighting

2012-03-16 Thread Nicolas Labrot
Hello, I want to do highlighting by "hand" into my indexed document which can be XML, HTML, PDF, SVG, CGM... Given a search query I want to be able to extract all the terms occurring in this query to be able to do custom highlighting on the results. The returned terms should be coherent with the

Re: Error while trying to load JSON

2012-03-16 Thread Pulkit Singhal
It seems that you are using the bbyopen data. If you have made up your mind on using the JSON data then simply store it in ElasticSearch instead of Solr as they do take any valid JSON structure. Otherwise, you can download the xml archive from bbyopen and prepare a schema: Here are some generic instru

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-16 Thread Matthew Parker
I'm still having issues replicating in my work environment. Can anyone explain how the replication mechanism works? Is it communicating across ports or through zookeeper to manage the process? On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker < mpar...@apogeeintegration.com> wrote: > All, > > I

Re: Performance Question

2012-03-16 Thread Mikhail Khludnev
Hello, Frankly speaking, the computational complexity of a Lucene search depends on the size of the search result, numFound*log(start+rows), not on the size of the index. Regards On Fri, Mar 16, 2012 at 9:34 PM, Jamie Johnson wrote: > I'm curious if anyone can tell me how Solr/Lucene performs in a situation > wh
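As a rough illustration of the numFound*log(start+rows) estimate mentioned in this reply, here is a small Python sketch. The function name and the sample numbers are made up for the example, the natural logarithm is an assumption (only relative values matter), and the result is a unitless estimate rather than a measured timing.

```python
import math


def estimated_search_cost(num_found: int, start: int, rows: int) -> float:
    """Relative cost estimate numFound * log(start + rows) (hypothetical helper)."""
    window = start + rows  # size of the priority queue needed to collect the page
    if num_found < 0 or window < 1:
        raise ValueError("num_found must be >= 0 and start + rows must be >= 1")
    return num_found * math.log(window)


if __name__ == "__main__":
    # Same number of hits, first page versus a deep page.
    first_page = estimated_search_cost(num_found=50_000, start=0, rows=10)
    deep_page = estimated_search_cost(num_found=50_000, start=10_000, rows=10)
    print(f"first page: {first_page:.0f}, deep page: {deep_page:.0f}")
```

Under this estimate, the number of matching documents and the paging depth drive the cost, which is why the two index layouts in the question could behave similarly if they produce similar result sets.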

Re: problems with DisjunctionMaxQuery and early-termination

2012-03-16 Thread Mikhail Khludnev
On Fri, Mar 16, 2012 at 8:38 PM, Carlos Gonzalez-Cadenas < c...@experienceon.com> wrote: > On Fri, Mar 16, 2012 at 9:26 AM, Mikhail Khludnev < > mkhlud...@griddynamics.com> wrote: > >> Hello Carlos, >> > >> so, search all terms with MUST first, you've got the best result in terms >> of precision a

Re: Error while trying to load JSON

2012-03-16 Thread Erick Erickson
bq: Shouldn't it be able to take any valid JSON structure? No, that was never the intent. The intent here was just to provide a JSON-compatible format for indexing data for those who don't like/want to use XML or SolrJ. Solr doesn't index arbitrary XML either. And I have a hard time imaginin

Re: Error while trying to load JSON

2012-03-16 Thread Chambeda
Ok, so my issue is that it must be a flat structure. Why isn't the JSON parser able to deconstruct the object into a flatter structure for indexing? Shouldn't it be able to take any valid JSON structure?

Re: Error while trying to load JSON

2012-03-16 Thread Erick Erickson
I don't believe Solr indexes arbitrary JSON, just as it does not index arbitrary XML. You need the input to be quite specific to how Solr expects the data, it's a relatively flat structure. There is an example in /solr/example/exampledocs/books.json that will give you an idea of the expected format
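Since Solr expects a relatively flat document, one option is to flatten nested JSON on the client before posting it. The following Python sketch shows one possible way to do that; the `flatten` helper, the underscore-joined field names, and the decision to drop null values are all assumptions made for illustration, and the resulting names would still have to exist in the schema.

```python
import json


def flatten(doc, parent_key="", sep="_"):
    """Flatten nested dicts (and lists of dicts) into a single-level dict.

    Hypothetical helper: lists of objects become multi-valued fields named
    parent_child; null values are skipped. Deeper nesting inside list items
    is not handled specially.
    """
    flat = {}
    for key, value in doc.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if value is None:
            continue  # assumption: omit nulls instead of sending them
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            for item in value:
                for child_key, child_value in flatten(item, name, sep).items():
                    flat.setdefault(child_key, []).append(child_value)
        else:
            flat[name] = value
    return flat


if __name__ == "__main__":
    nested = {
        "accessoriesImage": None,
        "department": "ET",
        "shipping": [{"nextDay": 10.19, "secondDay": 6.45, "ground": 1.69}],
        "preowned": False,
        "format": "CD",
    }
    print(json.dumps(flatten(nested), indent=2))
```

The sample input mirrors the nested "shipping" structure quoted in the original question of this thread; whether multi-valued fields or separate single-valued fields are the better target depends on the schema.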

Adding a 'Topics' pulldown for refined initial searches.

2012-03-16 Thread Valentin, AJ
Hello all, Yesterday was my first time using this (or any) email list and I think I did something wrong. Anyways, I will try this again. I have installed Solr search on my Drupal 7 installation. Currently, it works as an 'All' search tool. I'd like to limit the scope of the search with an

Performance Question

2012-03-16 Thread Jamie Johnson
I'm curious if anyone can tell me how Solr/Lucene performs in a situation where you have 100,000 documents each with 100 tokens vs having 1,000,000 documents each with 10 tokens. Should I expect the performance to be the same? Any information would be greatly appreciated.

Error while trying to load JSON

2012-03-16 Thread Chambeda
I am trying to load a json document that has the following structure: ... "accessoriesImage": null, "department": "ET", "shipping": [ { "nextDay": 10.19, "secondDay": 6.45, "ground": 1.69 } ], "preowned": false, "format": "CD", ... When executing the curl reques

Re: Java Server and PHP server

2012-03-16 Thread Erick Erickson
It's really up to you. All any app needs to connect to Solr is the HTTP connection, even if you use something like SolrJ. Yes, there'll be some latency but I suspect you'll only really notice that if you're trying to index massive amounts of data across the wire. Best Erick On Fri, Mar 16, 2012 a

Re: problems with DisjunctionMaxQuery and early-termination

2012-03-16 Thread Carlos Gonzalez-Cadenas
On Fri, Mar 16, 2012 at 9:26 AM, Mikhail Khludnev < mkhlud...@griddynamics.com> wrote: > Hello Carlos, > Hello Mikhail: Thanks for your answer. > > I have two concerns about your approach. First-K (not top-K honestly) > collector approach impacts recall of your search and using disjunctive > q

Java Server and PHP server

2012-03-16 Thread Spadez
Hi, Call me crazy, but I don’t like the idea of having a single server which not only runs my PHP site on Apache, but also runs SOLR and Nutch, inclusive of Tomcat. Is it a terrible idea to have one Rackspace VPS account which runs the PHP site with MYSQL database, and another rackspace account w

Re: Maybe switching to Solr Cores

2012-03-16 Thread Michael Kuhlmann
On 16.03.2012 16:42, Mike Austin wrote: It seems that the biggest real-world advantage is the ability to control core creation and replacement with no downtime. The negative would be the isolation however they are still somewhat isolated. What other benefits and common real-world situations wo

Maybe switching to Solr Cores

2012-03-16 Thread Mike Austin
I'm trying to understand the difference between multiple Tomcat indexes using context fragments versus using one application with multiple cores? Since I'm currently using tomcat context fragments to run 7 different indexes, could I get help understanding more why I would want to use solr cores ins

Re: Field Value Substitution

2012-03-16 Thread Erick Erickson
I guess I don't quite understand. If the description field is single valued, simply specifying that field on the fl parameter should return it. It would help if you showed some sample documents, because I can't tell whether you only have one descriptor per document or several. By the way, you'

Re: Filter Queries: Intersection

2012-03-16 Thread Alexander Golubowitsch
http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/ -> That's an excellent read - thanks a lot for the heads-up! Kind regards, Alex On 16.03.2012 14:08, Erick Erickson wrote: Your problem is that you're saying with the -myField:* "Remove from the result set all documents

Re: Apache solr issue after configuration

2012-03-16 Thread Richard Noble
Solr newbie here, but this looks familiar. Another thing to make sure of is that the plugin jars are not already loaded from the standard java classpath. I had a problem with this in that some jars were being loaded by the standard java classloader, and some other plugins were being loaded by

Spellchecker problem

2012-03-16 Thread Finotti Simone
Hello, I have this configuration where a single master builds the Solr index and it replicates to two slave Solr instances. Regular queries are sent only to those two slaves. Configurations are the same for everyone (except of replication section, of course). My problem: it's happened that, in

SolrJ Request issue when trying to add a PDF file to Index

2012-03-16 Thread Jones, Rhys
Hello, I'm having trouble adding a pdf file to my index. It's multicored. My server object instantiates properly (StreamingUpdateSolrServer). In my request object (ContentStreamUpdateRequest) I add a couple of literals to populate fields in the index that the parsed content of the PDF won't

Re: Master/Slave switch on teh fly. Replication

2012-03-16 Thread Michael Kuhlmann
On 16.03.2012 15:05, stockii wrote: i have 8 cores ;-) i thought that replication is defined in solrconfig.xml and this file is only load on startup and i cannot change master to slave and slave to master without restarting the servlet-container ?!?!?! No, you can reload the whole core at an

Re: Master/Slave switch on teh fly. Replication

2012-03-16 Thread stockii
i have 8 cores ;-) i thought that replication is defined in solrconfig.xml and this file is only load on startup and i cannot change master to slave and slave to master without restarting the servlet-container ?!?!?!

mailto: scheme aware tokenizer

2012-03-16 Thread Kai Gülzau
Is there any analyzer out there which handles the mailto: scheme? UAX29URLEmailTokenizer seems to split at the wrong place: mailto:t...@example.org -> mailto:test example.org As a workaround I use mailto:"; replacement="mailto: "/> Regards, Kai Gülzau novomind AG ___
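The workaround described here amounts to inserting a space after the "mailto:" scheme before the text is tokenized. The Python sketch below only illustrates that text transformation on the client side; it is not Solr analyzer configuration, and the function name and regular expression are assumptions, since the actual filter definition is not legible in the snippet above.

```python
import re

# Match "mailto:" only when it is directly followed by a non-space character.
_MAILTO = re.compile(r"(mailto:)(?=\S)", re.IGNORECASE)


def separate_mailto_scheme(text: str) -> str:
    """Insert a space after 'mailto:' so the address can be tokenized on its own."""
    return _MAILTO.sub(r"\1 ", text)


if __name__ == "__main__":
    print(separate_mailto_scheme("Contact: mailto:test@example.org"))
```

With the scheme separated, an email-aware tokenizer sees the address as a standalone token instead of splitting it at the "@".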

Re: Solr in Windows XP + JDK 5 + Tomcat 6.0.13

2012-03-16 Thread danchoithuthiet
Hi Alejandro, I followed your instructions step by step, but it still isn't working HTTP Status 404 - /solr/admin type Status report message /solr/admin description The requested resource (/solr/admin) is not available. I used Apache Tomcat/6.0.35 Xampp 1.7.7 Sun JDK 7

Re: Indexing Halts for long time and then restarts

2012-03-16 Thread Erick Erickson
Flattery will get you a lot ... Yeah, I expect you're hitting a merge issue. To test, set up autocommit to only trigger after a lot of docs are committed. You should see the time before the big pause change radically (perhaps disappear if you don't commit until the run is done). Note that it'll s

Re: Index-time field boost with DIH

2012-03-16 Thread Erick Erickson
I'd go ahead and do the query time boosts. The "penalty" will be a single multiplication per doc (I think), and probably not noticeable. And it's much more flexible/easier... Best Erick On Thu, Mar 15, 2012 at 9:21 PM, Arcadius Ahouansou wrote: > Hello. > > I have an SQL database with documents

Re: Solr 3.5.0 - different behaviour on rows?

2012-03-16 Thread Erick Erickson
Well, a lot depends upon the query analysis. Are you using the *exact* same analysis chains in both? Look at the admin/analysis page and see how your term evaluates. I'm guessing that WordDelimiterFilterFactory is being used in the 3.5 case and not in the 1.4.1 case so the 3.5 case is matching ever

RE: Regarding Indexing Multiple Columns Best Practise

2012-03-16 Thread Husain, Yavar
Thanks Erick!! -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, March 16, 2012 6:58 PM To: solr-user@lucene.apache.org Subject: Re: Regarding Indexing Multiple Columns Best Practise I would *guess* you won't notice much/any difference. Note that, if

Indexing Halts for long time and then restarts

2012-03-16 Thread Husain, Yavar
Since Erick is really active answering now, posting a quick question :) I am using: DIH Solr 3.5 on Windows Building Auto Recommendation Utility Having around 1 Billion Query Strings (3-6 words each) in database. Indexing them using NGram. Merge Factor = 30 Auto Commit not set. DIH halted a

Re: Regarding Indexing Multiple Columns Best Practise

2012-03-16 Thread Erick Erickson
I would *guess* you won't notice much/any difference. Note that, if you use a fieldType with the increment gap > 1 (the default is often set to 100), phrase queries (slop) will perform differently depending upon which option you choose. Best Erick On Thu, Mar 15, 2012 at 10:49 AM, Husain, Yavar
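To make the remark about the increment gap and phrase queries concrete, here is a simplified Python model of how token positions are assigned across the values of a multi-valued field. It is a sketch under stated assumptions: whitespace tokenization, every token advancing the position by one, and a reduced notion of slop (in-order distance only). Exact position arithmetic may differ between Lucene versions; `token_positions` and `phrase_matches` are hypothetical helpers, not Solr APIs.

```python
def token_positions(values, gap):
    """Assign positions to whitespace tokens, adding `gap` between field values."""
    positions = []
    next_pos = 0
    for index, value in enumerate(values):
        if index > 0:
            next_pos += gap  # the position increment gap separates consecutive values
        for token in value.split():
            positions.append((token, next_pos))
            next_pos += 1
    return positions


def phrase_matches(positions, first, second, slop=0):
    """True if `second` follows `first` with at most `slop` positions in between."""
    firsts = [pos for tok, pos in positions if tok == first]
    seconds = [pos for tok, pos in positions if tok == second]
    return any(0 <= b - a - 1 <= slop for a in firsts for b in seconds)


if __name__ == "__main__":
    values = ["red apple", "pie recipe"]
    for gap in (0, 100):
        pos = token_positions(values, gap)
        print(gap, pos, phrase_matches(pos, "apple", "pie"))
```

In this model a gap of 0 lets the phrase "apple pie" match across two values, while a gap of 100 prevents it unless the slop is at least as large as the gap.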

Re: Apache solr issue after configuration

2012-03-16 Thread Erick Erickson
At a guess, you don't have any paths to solr dist. Try copying all the other lib directives from the example (not core) dir (adjusting paths as necessary). The error message indicates you aren't getting to /dist/apache-solr-velocity-3.5.0.jar Best Erick On Thu, Mar 15, 2012 at 9:48 AM, ViruS

Re: PorterStemmer using example schema and data

2012-03-16 Thread Erick Erickson
What you think the results of stemming should be and what they actually are sometimes differ ... Look at the admin/analysis page, check the "verbose" boxes and try recharging rechargeable and you'll see, step by step, the results of each element of the analysis chain. Since the Porter stemmer is a

Re: Master/Slave switch on teh fly. Replication

2012-03-16 Thread Erick Erickson
What's the use-case? Presumably you have different configs... I'm actually not sure if you can do a reload see: http://wiki.apache.org/solr/CoreAdmin#RELOAD without a core, but you could try. Best Erick On Thu, Mar 15, 2012 at 4:59 AM, stockii wrote: > Hello. > > Is it possible to switch master

Re: Filter Queries: Intersection

2012-03-16 Thread Erick Erickson
Your problem is that you're saying with the -myField:* "Remove from the result set all documents with any value in myField", which is not what you want. Lucene query language is not strictly boolean logic, here's an excellent writeup: http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-

Re: utf8 encoding for solr not working

2012-03-16 Thread Tanguy Moal
I think you're using PHP to request solr. You can ask solr to respond in several different formats (xml, json, php, ...), see http://wiki.apache.org/solr/QueryResponseWriter . Depending on how you connect to solr from php, you may want to use html_entity_decode before using mb_substr. -- Ta
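The order of operations suggested here (decode entities first, then cut the string) can be illustrated with a short Python analogue. The thread itself concerns PHP's html_entity_decode and mb_substr; the sketch below uses Python's standard `html.unescape` and string slicing instead, and the helper name is made up for the example.

```python
import html


def decode_then_truncate(text: str, max_chars: int) -> str:
    """Decode HTML entities first, then cut to at most max_chars characters."""
    return html.unescape(text)[:max_chars]


if __name__ == "__main__":
    raw = "M&auml;rchen"
    print(raw[:3])                       # cutting first splits the entity
    print(decode_then_truncate(raw, 3))  # decoding first keeps the character intact
```

Cutting before decoding can slice an entity in half, which is the kind of breakage described in the original question.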

Re: Query results

2012-03-16 Thread Tanguy Moal
That's because of the space. If you want to include the space in the search query (performing exact match), then use double quotes around your search terms : q=multiplex_name:"Agent Vinod" Online documentation : * http://wiki.apache.org/solr/SolrQuerySyntax * http://lucene.apache.org/core/ol
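When such a quoted query is sent over HTTP, the quotes and the space also need URL encoding. The Python sketch below builds the q parameter for the example above using only the standard library; the `phrase_query` helper is an assumption for illustration, and it escapes only backslashes and double quotes inside the phrase.

```python
from urllib.parse import urlencode


def phrase_query(field: str, phrase: str) -> str:
    """Build field:"phrase", escaping backslashes and double quotes in the phrase."""
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


if __name__ == "__main__":
    query = phrase_query("multiplex_name", "Agent Vinod")
    print(query)
    print(urlencode({"q": query}))
```

The encoded string can be appended to the select URL of whichever handler is being queried.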

utf8 encoding for solr not working

2012-03-16 Thread Merlin Morgenstern
I am running solr 3.5 with a mysql data connector. Solr is configured to use UTF8 as encoding: unfortunately solr does encode special characters like "ä" into htmlentities: ä which leads to problems when cutting strings with php mb_substr(..) How can I configure solr to deliver UTF-8 instea

Re: Responding to Requests with Chunks/Streaming

2012-03-16 Thread Nicholas Ball
Mikhail & Ludovic, Thanks for both your replies, very helpful indeed! Ludovic, I was actually looking into just that and did some tests with SolrJ, it does work well but needs some changes on the Solr server if we want to send out individual documents at various times. This could be done with a w

Request Timeout Parameter in update queries

2012-03-16 Thread samarth s
Hi, Does an update query to solr work well when sent with a timeout parameter ? https://issues.apache.org/jira/browse/SOLR-502 For example, consider an update query was fired with a timeout of 30 seconds, and the request got aborted half way due to the timeout. Can this corrupt the index in any wa

Re: Problem witch adding classpath

2012-03-16 Thread Chantal Ackermann
Hi, I put all those jars into SOLR_HOME/lib. I do not specify them in solrconfig.xml explicitly, and they are all found all right. Would that be an option for you? Chantal On Thu, 2012-03-15 at 17:43 +0100, ViruS wrote: > Hello, > > I just now try to switch from 3.4.0 to 3.5.0 ... i make ne

Re: Field Value Substitution

2012-03-16 Thread Jan Høydahl
You could use the MappingUpdateProcessor for this, doing the mapping through a simple synonyms-like config file at index time, indexing the description in a String field. https://issues.apache.org/jira/browse/SOLR-2151 Or you could make a SearchComponent plugin doing the same thing "live" at que

Re: exact match with id field (represented as url) in solr 3.5

2012-03-16 Thread Tanguy Moal
Hello Roberto, Exact match needs extra " (double-quotes) surrounding the exact thing you want to query in the id field. Give a try to a query like this : id:"http://127.0.0.1:/my/personal/testuser/Personal Documents/cal9.pdf" See this wiki page :

Re: problems with DisjunctionMaxQuery and early-termination

2012-03-16 Thread Mikhail Khludnev
Hello Carlos, I have two concerns about your approach. First-K (not top-K honestly) collector approach impacts recall of your search and using disjunctive queries impacts precision e.g. I want to find some fairly small and quiet, and therefore unpopular "Lemond Hotel" you parse my phrase into Lemo

exact match with id field (represented as url) in solr 3.5

2012-03-16 Thread Roberto Iannone
Dear all, I've got an issue querying for the "id" field in Solr. The "id" field is filled with the document URL taken from a SharePoint library using the ManifoldCF repository connector. In my index there are these document ids: http: