exact match with id field (represented as url) in solr 3.5

2012-03-16 Thread Roberto Iannone

  
  
Dear all, 
 I've got an issue querying for the "id" field in Solr. The "id"
field is filled with document URLs taken from a SharePoint library
using the ManifoldCF repository connector.

in my index there are these document ids:

  http://localhost:/my/personal/testuser/Personal%20Documents/cal9.pdf
  http://localhost:/my/personal/testuser/Personal%20Documents/cal1.pdf
  http://localhost:/my/personal/testuser/Personal%20Documents/cal3.pdf
  http://localhost:/my/personal/testuser/Personal%20Documents/cal17.pdf
  http://localhost:/my/personal/testuser/Personal%20Documents/cal6.pdf

When I submit this query:

  id:http\://127.0.0.1\:/my/personal/testuser/Personal Documents/cal9.pdf

I got the following (wrong) response from Solr:

  status: 0
  QTime: 15
  params: q=id:http\://127.0.0.1\:/my/personal/testuser/Personal Documents/cal9.pdf
          debugQuery=true
          fl=id

  numFound: 1
  doc: http://127.0.0.1:/my/personal/testuser/Personal%20Documents/cal17.pdf

  rawquerystring: id:http\://127.0.0.1\:/my/personal/testuser/Personal
  Documents/cal9.pdf
  querystring: id:http\://127.0.0.1\:/my/personal/testuser/Personal
  Documents/cal9.pdf
  parsedquery: id:http://127.0.0.1:/my/personal/testuser/Personal
  (text:documents text:cal9 text:pdf)
  parsedquery_toString: id:http://127.0.0.1:/my/personal/testuser/Personal
  (text:documents text:cal9 text:pdf)
  explain for http://127.0.0.1:/my/personal/testuser/Personal%20Documents/cal17.pdf:
    0.0024349838 = (MATCH) product of: 0.0048699677 = (MATCH) sum
    of: 0.0048699677 = (MATCH) product of: 0.014609902 = (MATCH)
    sum of: 0.014609902 = (MATCH) weight(text:pdf in 0), product
    of: 0.39035153 = queryWeight(text:pdf), product of: 1.9162908
    = idf(docFreq=1, maxDocs=5) 0.20370162 = queryNorm
    0.037427552 = (MATCH) fieldWeight(text:pdf in 0), product of:
    1.0 = tf(termFreq(text:pdf)=1) 1.9162908 = idf(docFreq=1,
    maxDocs=5) 0.01953125 = fieldNorm(field=text, doc=0)
    0.3334 = coord(1/3) 0.5 = coord(1/2)
  QParser: LuceneQParser
  timing: 15.0 ms total, all of it in the query component's process phase

Taking into account the debug info, I escaped the space character
with "\" but I got empty results with this response:

  status: 0
  QTime: 0
  params: q=id:http\://127.0.0.1\:/my/personal/testuser/Personal\ Documents/cal9.pdf
          debugQuery=true
          fl=id

  numFound: 0

  rawquerystring: id:http\://127.0.0.1\:/my/personal/testuser/Personal\
  Documents/cal9.pdf
  querystring: id:http\://127.0.0.1\:/my/personal/testuser/Personal\
  Documents/cal9.pdf
  parsedquery: id:http://127.0.0.1:/my/personal/testuser/Personal
  Documents/cal9.pdf
  parsedquery_toString: id:http://127.0.0.1:/my/personal/testuser/Personal
  Documents/cal9.pdf
  QParser: LuceneQParser
  timing: 0.0 ms total

Any suggestions?

Cheers 

Rob

--
Dott. Roberto Iannone

Ubuntu is ...
"A traveller through a country would stop at a village and
he didn't have to ask for food or for water. Once he stops,
the people give him food, entertain him. That is one aspect
of Ubuntu, but it will have various aspects. Ubuntu does not

Re: problems with DisjunctionMaxQuery and early-termination

2012-03-16 Thread Mikhail Khludnev
Hello Carlos,

I have two concerns about your approach. The first-K (not top-K, honestly)
collector approach hurts the recall of your search, and using disjunctive
queries hurts precision. E.g., I want to find some fairly small, quiet,
and therefore unpopular "Lemond Hotel"; you parse my phrase into Lemond OR
Hotel and return 1K of popular hotels but not the Lemond one, because it's
nearly a hapax. So I don't believe that makes a great search.
My other concern is about the idea at the end of your letter of joining
separate query results. I'd like to remind you that absolute scores from
different queries are not comparable at all; maybe the relative ones,
scaled by the max score, are comparable, but I'm not sure.
I suppose you need conjunctive queries instead. The great thing about
them is "not-found for free": the cost of getting a zero-result response is
proportional to the number of query terms, i.e. negligible.
So, search with all terms as MUST first; if you get something, you've got
the best result in terms of precision and recall. Otherwise you still have
a lot of time: you need to drop one of the words or switch some of them to
SHOULD. Enumerating all combinations is an NP-complete task, I believe, but
you have good heuristics:
* a zero docFreq means you can drop that term off or pass it through
spell correction
* if you have an instant-suggest-like app and get zero results for some
phrase, dropping the last word may give you a phrase that had some
results before and is present in the cache
* otherwise, excluding the least frequent term from the conjunction
probably gives non-zero results
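
A rough sketch of that drop-the-rarest-term strategy in Lucene 3.x terms
(field name, reader and searcher setup are all assumed; an illustration
rather than production code):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class RelaxingSearch {
    // Try all terms as MUST first; while nothing matches, drop the rarest term.
    public static TopDocs search(IndexSearcher searcher, IndexReader reader,
                                 String field, String[] words) throws IOException {
        List<Term> terms = new ArrayList<Term>();
        for (String w : words) {
            Term t = new Term(field, w);
            if (reader.docFreq(t) > 0) {  // zero docFreq: drop it (or spell-correct)
                terms.add(t);
            }
        }
        while (!terms.isEmpty()) {
            BooleanQuery bq = new BooleanQuery();
            for (Term t : terms) {
                bq.add(new TermQuery(t), BooleanClause.Occur.MUST);
            }
            TopDocs hits = searcher.search(bq, 10);
            if (hits.totalHits > 0) {
                return hits;  // best precision/recall we can get
            }
            // the "not-found" was cheap; exclude the least frequent term and retry
            Term rarest = terms.get(0);
            for (Term t : terms) {
                if (reader.docFreq(t) < reader.docFreq(rarest)) {
                    rarest = t;
                }
            }
            terms.remove(rarest);
        }
        return null;  // none of the words occurs in the index
    }
}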

Regards

On Thu, Mar 15, 2012 at 12:01 AM, Carlos Gonzalez-Cadenas <
c...@experienceon.com> wrote:

> Hello all,
>
> We have a SOLR index filled with user queries and we want to retrieve the
> ones that are more similar to a given query entered by an end-user. It is
> kind of a "related queries" system.
>
> The index is pretty big and we're using early-termination of queries (with
> the index sorted so that the "more popular" queries have lower docids and
> therefore the termination yields higher-quality results)
>
> Clearly, when the user enters a user-level query into the search box, i.e.
> "cheap hotels barcelona offers", we don't know whether there exists a
> document (query) in the index that contains these four words or not.
>  Therefore, when we're building the SOLR query, the first intuition would
> be to do a query like this "cheap OR hotels OR barcelona OR offers".
>
> If all the documents in the index were evaluated, the results of this
> query would be good. For example, if there is no query in the index with
> these four words but there's a query in the index with the text "cheap
> hotels barcelona", it will probably be one of the top results, which is
> precisely what we want.
>
> The problem is that we're doing early termination and therefore this query
> will exhaust very fast the top-K result limit (our custom collector limits
> on the number of evaluated documents), given that queries like "hotels in
> madrid" or "hotels in NYC" will match the OR expression described above
> (because they all match "hotels").
>
> Our next step was to think in a DisjunctionMaxQuery, trying to write a
> query like this:
>
> DisjunctionMaxQuery:
>  1) +cheap +hotels +barcelona +offers
>  2) +cheap +hotels +barcelona
>  3) +cheap +hotels
>  4) +hotels
>
> We were thinking that perhaps the sub-queries within the
> DisjunctionMaxQuery were going to get evaluated in "parallel" given that
> they're separate queries, but in fact from a runtime perspective it does
> behave in a similar way to the OR query that we described above.
>
> Our desired behavior is to try match documents with each subquery within
> the DisjunctionMaxQuery (up to a per-subquery limit that we put) and then
> score them and return them all together (therefore we don't want all the
> matches being done by a single sub-query, like it's happening now).
>
> Clearly, we could create a script external to SOLR that just runs the
> several sub-queries as standalone queries and then joins all the results
> together, but before going for this we'd like to know if you have any ideas
> on how to solve this problem within SOLR. We do have our own QParser, and
> therefore we'd be able to implement any arbitrary query construction that
> you can come up with, or even create a new Query type if it's needed.
>
> Thanks a lot for your help,
> Carlos
>
>
> Carlos Gonzalez-Cadenas
> CEO, ExperienceOn - New generation search
> http://www.experienceon.com
>
> Mobile: +34 652 911 201
> Skype: carlosgonzalezcadenas
> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


 


Re: exact match with id field (represented as url) in solr 3.5

2012-03-16 Thread Tanguy Moal

  
  
Hello Roberto,

Exact matching needs extra double-quotes surrounding the exact
value you want to query in the id field.
Try a query like this:
id:"http://127.0.0.1:/my/personal/testuser/Personal Documents/cal9.pdf"

See this wiki page:

http://wiki.apache.org/solr/SolrQuerySyntax
and more precisely:

http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/queryparsersyntax.html
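
With SolrJ, the same query could be built like this (server URL assumed;
just a sketch, the phrase-quoting is the important part):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class ExactIdQuery {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        String id = "http://127.0.0.1:/my/personal/testuser/Personal Documents/cal9.pdf";
        // Quote the whole value so the query parser treats it as a single
        // term against the string-typed id field (escape embedded quotes).
        SolrQuery q = new SolrQuery("id:\"" + id.replace("\"", "\\\"") + "\"");
        System.out.println(server.query(q).getResults().getNumFound());
    }
}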

From the debug output:
The first part was used to query the id field:
id:http://127.0.0.1:/my/personal/testuser/Personal
The query was only split on the space, not on characters like
':', '/', '.', etc., because your id field must be of the string type,
so no tokenization occurs apart from that done by the query parser.
The other parts, lacking a field operator, were queried against the
default search field (text) and were split according to the
tokenization of the text field as:
text:Documents
text:cal9
text:pdf

More precisely, nothing matched on the id:something part; only the
textual content after the space matched in the text field.

Since the query got split into several parts, the default query
operator was used to perform a disjunction between the query terms.
You should provide your Solr configuration and schema (solrconfig.xml
and schema.xml) when using the mailing list, as it helps greatly in
debugging this.

Hope this helps,

--
Tanguy

On 16/03/2012 08:55, Roberto Iannone wrote:

  Dear all,
  I've got an issue querying for the "id" field in Solr. The "id"
  field is filled with document URLs taken from a SharePoint library
  using the ManifoldCF repository connector.
  [...]
  

Re: Field Value Substitution

2012-03-16 Thread Jan Høydahl
You could use the MappingUpdateProcessor for this, doing the mapping through a 
simple synonyms-like config file at index time, indexing the description in a 
String field. https://issues.apache.org/jira/browse/SOLR-2151

Or you could make a SearchComponent plugin doing the same thing "live" at query 
time?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 15. mars 2012, at 08:13, tosenthu wrote:

> Hi
> 
> I have a scenario, where I store a field which is an Id, 
> 
> ID field 
> --
> 1
> 3
> 4
> 
> Descrption mapping 
> ---
> 1 = "Options 1"
> 2 = "Options A"
> 3 = "Options 3"
> 4 = "Options 4a"
> 
> Is there a way in solr when ever i query this field should return me the
> description instead of the id. And help me with the procedure to setup solr
> to do this..
> 
> Regards
> Senthil Kumar M R
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Field-Value-Substitution-tp3828028p3828028.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Problem witch adding classpath

2012-03-16 Thread Chantal Ackermann

Hi,

I put all those jars into SOLR_HOME/lib. I do not specify them in
solrconfig.xml explicitly, and they are all found all right.
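
For example, the layout I mean (the jar names here are just the ones from
your log; adjust to whatever you need):

  SOLR_HOME/
    lib/
      lucene-stempel-3.5.0.jar
      apache-solr-analysis-extras-3.5.0.jar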

Would that be an option for you?

Chantal


On Thu, 2012-03-15 at 17:43 +0100, ViruS wrote:
> Hello,
> 
> I am just now trying to switch from 3.4.0 to 3.5.0 ... I made a new instance and
> when I try to use the same config for adding libraries I get an error.
> SEVERE: java.lang.NoClassDefFoundError:
> org/apache/lucene/analysis/TokenStream
> This error only shows when I use the Polish Stempel stemmer.
> In the config I have set (solr/vrs/conf/solrconfig.xml):
>   
>   
> 
> 
> When I start, Solr is adding the paths:
> INFO: Adding specified lib dirs to ClassLoader
> 2012-03-15 17:35:51 org.apache.solr.core.SolrResourceLoader
> replaceClassLoader
> INFO: Adding
> 'file:/home/virus/appl/apache-solr-3.5.0/dist/lucene-stempel-3.5.0.jar' to
> classloader
> 2012-03-15 17:35:51 org.apache.solr.core.SolrResourceLoader
> replaceClassLoader
> INFO: Adding
> 'file:/home/virus/appl/apache-solr-3.5.0/dist/apache-solr-analysis-extras-3.5.0.jar'
> to classloader
> 
> I have the same problem with Velocity.
> In config (solr/ac/conf/solrconfig.xml:
> 
> ...
>   <queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" enable="true"/>
> 
> When I start I have this error:
> SEVERE: org.apache.solr.common.SolrException: Error Instantiating
> QueryResponseWriter, solr.VelocityResponseWriter is not a
> org.apache.solr.response.QueryResponseWriter
> INFO: Adding specified lib dirs to ClassLoader
> 2012-03-15 17:40:17 org.apache.solr.core.SolrResourceLoader
> replaceClassLoader
> INFO: Adding
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-tools-2.0.jar'
> to classloader
> 
> 
> 
> Full start log here: http://piotrsikora.pl/solr.log
> 
> 
> Thanks in advance!
> 



Request Timeout Parameter in update queries

2012-03-16 Thread samarth s
Hi,

Does an update query to solr work well when sent with a timeout
parameter? https://issues.apache.org/jira/browse/SOLR-502
For example, consider an update query was fired with a timeout of 30
seconds, and the request got aborted half way due to the timeout. Can
this corrupt the index in any way ?

-- 
Regards,
Samarth


Re: Responding to Requests with Chunks/Streaming

2012-03-16 Thread Nicholas Ball

Mikhail & Ludovic,

Thanks for both your replies, very helpful indeed!

Ludovic, I was actually looking into just that and did some tests with
SolrJ. It does work well but needs some changes on the Solr server if we
want to send out individual documents at various times. This could be done
with a write() and flush() to the FastOutputStream (daos) in JavaBinCodec. I
therefore think that a combination of this and Mikhail's solution would
work best!

Mikhail, you mention that your solution doesn't currently work and you're not
sure why that is, but could it be that you haven't flushed the
data (os.flush()) you've written in the collect method of DocSetStreamer? I
think placing the output stream into the SolrQueryRequest is the way to go,
so that we can access it and write to it how we intend. However, I think
using the JavaBinCodec would be ideal so that we can work with SolrJ
directly, and not mess around with the encoding of the docs/data etc... 

At the moment the entry point to JavaBinCodec is through the
BinaryResponseWriter which calls the highest level marshal() method which
encodes and sends out the entire SolrQueryResponse (line 49 @
BinaryResponseWriter). What would be ideal is to be able to break up the
response and call the JavaBinCodec for pieces of it with a flush after each
call. Did a few tests with a simple Thread.sleep and a flush to see if this
would actually work and looks like it's working out perfectly. Just trying
to figure out the best way to actually do it now :) any ideas?

On another note, for a solution to work with the chunked transfer encoding
(and therefore web browsers), a lot more development is going to be needed.
Not sure if it's worth trying yet but might look into it later down the
line.

Nick

On Fri, 16 Mar 2012 07:29:20 +0300, Mikhail Khludnev
 wrote:
> Ludovic,
> 
> I looked through. First of all, it seems to me you don't amend the regular
> "servlet" solr server, but only the embedded one.
> Anyway, the difference is that you stream the DocList via callback, but it
> means that you've instantiated it in memory and keep it there until it
> will be completely consumed. Think about a billion numFound. The core idea
> of my approach is to keep almost zero memory for the response.
> 
> Regards
> 
> On Fri, Mar 16, 2012 at 12:12 AM, lboutros  wrote:
> 
>> Hi,
>>
>> I was looking for something similar.
>>
>> I tried this patch :
>>
>> https://issues.apache.org/jira/browse/SOLR-2112
>>
>> it's working quite well (I've back-ported the code in Solr 3.5.0...).
>>
>> Is it really different from what you are trying to achieve ?
>>
>> Ludovic.
>>
>> -
>> Jouve
>> France.
>> --
>> View this message in context:
>>
http://lucene.472066.n3.nabble.com/Responding-to-Requests-with-Chunks-Streaming-tp3827316p3829909.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>


utf8 encoding for solr not working

2012-03-16 Thread Merlin Morgenstern
I am running solr 3.5 with a mysql data connector. Solr is configured to
use UTF8 as encoding:




unfortunately solr does encode special characters like "ä" into
html entities:

&#228;

which leads to problems when cutting strings with php mb_substr(..)

How can I configure solr to deliver UTF-8 instead of htmlentities?

Thank you for any help.


Re: Query results

2012-03-16 Thread Tanguy Moal

That's because of the space.

If you want to include the space in the search query (performing exact 
match), then use double quotes around your search terms:


q=multiplex_name:"Agent Vinod"

Online documentation :
* http://wiki.apache.org/solr/SolrQuerySyntax
* 
http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/queryparsersyntax.html


--
Tanguy

On 16/03/2012 13:16, Abhishek tiwari wrote:

Please point me where I am wrong:

query: *multiplex_name:Agent Vinod*

headline1

when I search the above I get a few results matching *headline1* (the
default search field)...
why is it so?


Re: utf8 encoding for solr not working

2012-03-16 Thread Tanguy Moal

I think you're using PHP to request solr.

You can ask solr to respond in several different formats (xml, json, 
php, ...), see http://wiki.apache.org/solr/QueryResponseWriter .


Depending on how you connect to solr from php, you may want to use 
html_entity_decode before using mb_substr.


--
Tanguy

On 16/03/2012 13:00, Merlin Morgenstern wrote:

I am running solr 3.5 with a mysql data connector. Solr is configured to
use UTF8 as encoding:




unfortunately solr does encode special characters like "ä" into
html entities:

&#228;

which leads to problems when cutting strings with php mb_substr(..)

How can I configure solr to deliver UTF-8 instead of htmlentities?

Thank you for any help.





Re: Filter Queries: Intersection

2012-03-16 Thread Erick Erickson
Your problem is that you're saying with the -myField:* "Remove from
the result set all documents with any value in myField", which is not
what you want. Lucene query language is not strictly boolean logic,
here's an excellent writeup:

http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/

you want something like
(myfield:bob myfield:alice) (*:* -myfield:*)
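
Applied back to your tagged filter from Example 3, that would be something
like this (assuming your default operator is OR):

{!tag=myfieldtag}((myfield:"Bio" myfield:"Alexa") (*:* -myfield:*))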

Best
Erick

On Thu, Mar 15, 2012 at 10:23 AM, Alexander Golubowitsch
 wrote:
> Hi all,
>
>  I'm facing problems regarding multiple Filter Queries in SOLR 1.4.1 - I
> hope someone will be able to help.
>
> Example 1 - works fine: {!tag=myfieldtag}(-(myfield:*))
> Example 2 - works fine: {!tag=myfieldtag}((myfield:"Bio" | myfield:"Alexa"))
>
> Please note that in Example 2, result sets of individual filter queries do
> not intersect; 'OR' just works as expected.
>
> Example 3 - no results: {!tag=myfieldtag}((myfield:"Bio" | myfield:"Alexa")
> | -(myfield:*))
>
> What I am trying to generate is results having either no value for myfield,
> or any of {"Alexa", "Bio"}.
>
> How would I accomplish that?
> I have tried all combinations/positions of brackets, +/- etc., without
> success.
>
> Thanks a lot for any advice!
>
> Kind regards,
>  Alex


Re: Master/Slave switch on the fly. Replication

2012-03-16 Thread Erick Erickson
What's the use-case? Presumably you have different configs...

I'm actually not sure if you can do a reload
see: http://wiki.apache.org/solr/CoreAdmin#RELOAD
without a core, but you could try.
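
The per-core reload looks like this (default example port, core name assumed):

  http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0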

Best
Erick

On Thu, Mar 15, 2012 at 4:59 AM, stockii  wrote:
> Hello.
>
> Is it possible to switch master/slave on the fly without restarting the
> server?
>
> -
> --- System 
> 
>
> One Server, 12 GB RAM, 2 Solr Instances, 8 Cores,
> 1 Core with 45 Million Documents other Cores < 200.000
>
> - Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
> - Solr2 for Update-Request  - delta every Minute - 4GB Xmx
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Master-Slave-switch-on-teh-fly-Replication-tp3828313p3828313.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: PorterStemmer using example schema and data

2012-03-16 Thread Erick Erickson
What you think the results of stemming should be and what they
actually are sometimes differ ...

Look at the admin/analysis page, check the "verbose" boxes
and try "recharging" and "rechargeable", and you'll see, step by step,
the results of each element of the analysis chain. Since
the Porter stemmer is algorithmic, I'm betting that
these don't stem to the same root.

Best
Erick

On Thu, Mar 15, 2012 at 7:05 AM, Birkmann, Magdalena
 wrote:
>
> Hey there,
> I've been working through the Solr Tutorial
> (http://lucene.apache.org/solr/tutorial.html), using the example schema and
> documents, just working through step by step trying everything out. Everything
> worked out the way it should (just using the example queries and stuff), 
> except
> for the stemming (A search for features:recharging
> 
> should match Rechargeable due to stemming with the EnglishPorterFilter, but
> doesn't). I've been using the example directory exactly the way it was 
> when
> downloading it, without changing anything. Since I'm fairly new to all of this
> and don't quite understand yet how all of it works or should work, I don't
> really know where the problem lies or how to configure anything to make it 
> work,
> so I just thought I'd ask here, since you all seem so nice :)
> Thanks a lot in advance,
> Magda


Re: Apache solr issue after configuration

2012-03-16 Thread Erick Erickson
At a guess, you don't have any paths to solr dist. Try copying all the other lib
directives from the example (not core) dir (adjusting paths as necessary). The
error message indicates you aren't getting to
/dist/apache-solr-velocity-3.5.0.jar
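
For example, the stock example solrconfig.xml carries lib directives along
these lines (relative to the core's instanceDir, so adjust the paths for
your multicore layout):

  <lib dir="../../contrib/velocity/lib" regex=".*\.jar" />
  <lib dir="../../dist/" regex="apache-solr-velocity-\d.*\.jar" />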

Best
Erick

On Thu, Mar 15, 2012 at 9:48 AM, ViruS  wrote:
> Hello,
>
> I have still same problem after installation.
> Files are loaded:
>
> ~/appl/apache-solr-3.5.0/example $ java -Dsolr.solr.home=multicore/ -jar
> start.jar 2>&1 | grep contrib
> INFO: Adding
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-tools-2.0.jar'
> to classloader
> INFO: Adding
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-beanutils-NOTICE.txt'
> to classloader
> INFO: Adding
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-tools-NOTICE.txt'
> to classloader
> INFO: Adding
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-collections-NOTICE.txt'
> to classloader
> INFO: Adding
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-1.6.4.jar'
> to classloader
> INFO: Adding
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-NOTICE.txt'
> to classloader
> INFO: Adding
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-tools-LICENSE-ASL.txt'
> to classloader
> INFO: Adding
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-collections-3.2.1.jar'
> to classloader
> INFO: Adding
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-beanutils-1.7.0.jar'
> to classloader
> INFO: Adding
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-beanutils-LICENSE-ASL.txt'
> to classloader
> INFO: Adding
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-LICENSE-ASL.txt'
> to classloader
> INFO: Adding
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-collections-LICENSE-ASL.txt'
> to classloader
>
>
> my config multicore/ac/conf/solrconfig.xml
>
> <config>
>   <luceneMatchVersion>LUCENE_35</luceneMatchVersion>
>   ...
>   <queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" enable="true"/>
> </config>
>
> And I still get error:
>
> INFO: [ac] Opening new SolrCore at multicore/ac/,
> dataDir=multicore/ac/data/
> 2012-03-15 13:18:11 org.apache.solr.core.JmxMonitoredMap 
> INFO: No JMX servers found, not exposing Solr information with JMX.
> 2012-03-15 13:18:11 org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error Instantiating
> QueryResponseWriter, solr.VelocityResponseWriter is not a
> org.apache.solr.response.QueryResponseWriter
>        at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:427)
> ...
>
> What's wrong?
>
> Thanks in advance for help!
>
>
>
> --
> Piotr (ViruS) Sikora
> vi...@cdt.pl
> svi...@gmail.com
> JID: vi...@ipc.net.pl


Re: Regarding Indexing Multiple Columns Best Practise

2012-03-16 Thread Erick Erickson
I would *guess* you won't notice much/any difference. Note that, if you use
a fieldType with the increment gap > 1 (the default is often set to 100),
phrase queries (slop) will perform differently depending upon which option
you choose.
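
For reference, the copyField setup being discussed looks like this in
schema.xml (type and field names invented here; note the positionIncrementGap):

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    ...
  </fieldType>

  <field name="all_text" type="text" indexed="true" stored="false" multiValued="true"/>
  <copyField source="title" dest="all_text"/>
  <copyField source="description" dest="all_text"/>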

Best
Erick

On Thu, Mar 15, 2012 at 10:49 AM, Husain, Yavar  wrote:
> Say I have around 30-40 fields (SQL Table Columns) indexed using Solr from 
> the database. I concatenate those fields into one field by using the Solr 
> copyField directive and then make it the default search field, which I search.
>
> If at the database level itself I perform the concatenation of all those 
> fields into one field and then index that field directly (it will avoid the 
> copy operation of Solr for each field into that concatenated field), will it 
> be an indexing performance improvement? I am sure it will be, but will it 
> make a big/huge change in indexing running time?
>
> Thanks
> **
> This message may contain confidential or proprietary information intended 
> only for the use of the
> addressee(s) named above or may contain information that is legally 
> privileged. If you are
> not the intended addressee, or the person responsible for delivering it to 
> the intended addressee,
> you are hereby notified that reading, disseminating, distributing or copying 
> this message is strictly
> prohibited. If you have received this message by mistake, please immediately 
> notify us by
> replying to the message and delete the original message and any copies 
> immediately thereafter.
>
> Thank you.-
> **
> FAFLD
>


Indexing Halts for long time and then restarts

2012-03-16 Thread Husain, Yavar
Since Erick is really active answering now, I'm posting a quick question :)

I am using:
DIH
Solr 3.5 on Windows

Building Auto Recommendation Utility

Having around 1 Billion Query Strings (3-6 words each) in database. Indexing 
them using NGram.

Merge Factor = 30
Auto Commit not set.

DIH halted after indexing 7 million for around 25 minutes and was not showing
any increment in the Total Documents Processed/Fetched; of course it was doing
some stuff, was it some merge stuff? After 25 minutes it started moving again.

Due to this indexing time has increased a lot. Any help will be appreciated.

Thanks.


** This message may contain confidential or proprietary information intended
only for the use of the addressee(s) named above or may contain information
that is legally privileged. If you are not the intended addressee, or the
person responsible for delivering it to the intended addressee, you are hereby
notified that reading, disseminating, distributing or copying this message is
strictly prohibited. If you have received this message by mistake, please
immediately notify us by replying to the message and delete the original
message and any copies immediately thereafter.

Thank you.~
**
FAFLD



RE: Regarding Indexing Multiple Columns Best Practise

2012-03-16 Thread Husain, Yavar
Thanks Erick!!

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, March 16, 2012 6:58 PM
To: solr-user@lucene.apache.org
Subject: Re: Regarding Indexing Multiple Columns Best Practise

I would *guess* you won't notice much/any difference. Note that, if you use a 
fieldType with the increment gap > 1 (the default is often set to 100), phrase 
queries (slop) will perform differently depending upon which option you choose.

Best
Erick

On Thu, Mar 15, 2012 at 10:49 AM, Husain, Yavar  wrote:
> Say I have around 30-40 fields (SQL Table Columns) indexed using Solr from 
> the database. I concatenate those fields into one field by using the Solr 
> copyField directive and then make it the default search field, which I search.
>
> If at the database level itself I perform the concatenation of all those 
> fields into one field and then index that field directly (it will avoid the 
> copy operation of Solr for each field into that concatenated field), will it 
> be an indexing performance improvement? I am sure it will be, but will it 
> make a big/huge change in indexing running time?
>
> Thanks
> **
>  This message may contain confidential or 
> proprietary information intended only for the use of the
> addressee(s) named above or may contain information that is legally 
> privileged. If you are not the intended addressee, or the person 
> responsible for delivering it to the intended addressee, you are 
> hereby notified that reading, disseminating, distributing or copying 
> this message is strictly prohibited. If you have received this message by 
> mistake, please immediately notify us by replying to the message and delete 
> the original message and any copies immediately thereafter.
>
> Thank you.-
> **
> 
> FAFLD
>


Re: Solr 3.5.0 - different behaviour on rows?

2012-03-16 Thread Erick Erickson
Well, a lot depends upon the query analysis. Are you using
the *exact* same analysis chains in both? Look at the admin/analysis
page and see how your term evaluates. I'm guessing that
WordDelimiterFilterFactory is being used in the 3.5 case and not
in the 1.4.1 case, so the 3.5 case is matching everything in your
index and trying to return a huge number of rows.

How many documents are found in the 1.4.1 case as opposed to the 3.5 case?

Best
Erick

On Thu, Mar 15, 2012 at 12:22 PM, Frederico Azeiteiro
 wrote:
> Hi all,
>
>
>
> Just testing SOLR 3.5.0. and notice a different behavior on this new
> version:
>
> select?rows=10&q=sig%3a("54ba3e8fd3d5d8371f0e01c403085a0c")&?
>
>
>
> this query returns no results on my indexes, but works for SOLR 1.4.0
> and returns "Java heap space java.lang.OutOfMemoryError: Java heap
> space" on SOLR 3.5.0
>
>
>
> Is this normal? As there are no results, why the OutOfMemoryError?
>
> Is it some memory allocated based on the rows number?
>
>
>
> Regards,
>
> Frederico
>
>
>


Re: Index-time field boost with DIH

2012-03-16 Thread Erick Erickson
I'd go ahead and do the query time boosts. The "penalty" will
be a single multiplication per doc (I think), and probably not
noticeable. And it's much more flexible/easier...
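
With dismax, for example, the query-time boost is just a request parameter
(field names taken from your mail, values illustrative):

  defType=dismax
  qf=title^2 summary
  q=some user query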

Best
Erick

On Thu, Mar 15, 2012 at 9:21 PM, Arcadius Ahouansou
 wrote:
> Hello.
>
> I have an SQL database with documents having an ID, TITLE and SUMMARY.
> I am using the DIH to index the data.
>
> In the DIH dataConfig, for every document, I would like to do something
> like:
>
> 
>
> In other words,  "A match on any document's title is worth twice as much as
> a match on other fields"
>
> In my schema, I have omitNorms set to false.
>
> 1) How can I do this in the DIH?
>
> 2) Apart from omitNorms making the index bigger,  I thought that index-time
> boost would give us more performance than doing the very same boosting at
> query time over and over again.
> Is that correct?
>
> 3) I also came across the Lucene FAQ at
> http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_difference_between_field_.28or_document.29_boosting_and_query_boosting.3F
>
> where the following interesting statement seems to contradict what I'm
> trying to achieve:
>
> *Index time field boosts are worthless if you set them on every document. *
>
> Any hint would be much appreciated.
>
>
> Thanks.
>
> Arcadius.


Re: Indexing Halts for long time and then restarts

2012-03-16 Thread Erick Erickson
Flattery will get you a lot ...

Yeah, I expect you're hitting a merge issue. To test, set up autocommit
to only trigger after a lot of docs are committed. You should see the
time before the big pause change radically (perhaps disappear if
you don't commit until the run is done).
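
For the test, something like this in solrconfig.xml (the threshold is
arbitrary, just make it large):

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>1000000</maxDocs>
    </autoCommit>
  </updateHandler>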

Note that it'll still happen, just not as often. This problem is changed
in 4.0 with the DocumentsWriterPerThread stuff (Mike McCandless
wrote a cool blog post on it).


Best
Erick

On Fri, Mar 16, 2012 at 8:27 AM, Husain, Yavar  wrote:
> Since Erick is really active answering now, I'm posting a quick question :)
>
> I am using:
> DIH
> Solr 3.5 on Windows
>
> Building Auto Recommendation Utility
>
> Having around 1 Billion Query Strings (3-6 words each) in database. Indexing 
> them using NGram.
>
> Merge Factor = 30
> Auto Commit not set.
>
> DIH halted after indexing 7 million for around 25 minutes and was not showing 
> any increment in the Total Documents Processed/Fetched; of course it was doing 
> some stuff, was it some merge stuff? After 25 minutes it started moving 
> again.
>
> Due to this indexing time has increased a lot. Any help will be appreciated.
>
> Thanks.


Re: Solr in Windows XP + JDK 5 + Tomcat 6.0.13

2012-03-16 Thread danchoithuthiet
Hi  Alejandro, 
I followed your instructions step by step, but it still isn't working

HTTP Status 404 - /solr/admin 
type Status report 
message /solr/admin 
description The requested resource (/solr/admin) is not available. 

I used
Apache Tomcat/6.0.35
Xampp 1.7.7
Sun JDK 7 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-in-Windows-XP-JDK-5-Tomcat-6-0-13-tp484010p3831908.html
Sent from the Solr - User mailing list archive at Nabble.com.


mailto: scheme aware tokenizer

2012-03-16 Thread Kai Gülzau
Is there any analyzer out there which handles the mailto: scheme?

UAX29URLEmailTokenizer seems to split at the wrong place:

mailto:t...@example.org ->
mailto:test
example.org

As a workaround I use

<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="mailto:" replacement="mailto: "/>

Regards,

Kai Gülzau

novomind AG
__

Bramfelder Straße 121 • 22305 Hamburg

phone +49 (0)40 808071138 • fax +49 (0)40 808071-100
email kguel...@novomind.com • http://www.novomind.com

Vorstand : Peter Samuelsen (Vors.) • Stefan Grieben • Thomas Köhler
Aufsichtsratsvorsitzender: Werner Preuschhof
Gesellschaftssitz: Hamburg • HR B93508 Amtsgericht Hamburg


Re: Master/Slave switch on the fly. Replication

2012-03-16 Thread stockii
I have 8 cores ;-)

I thought that replication is defined in solrconfig.xml and this file is
only loaded on startup, so I cannot change master to slave and slave to
master without restarting the servlet container?!

-
--- System 

One Server, 12 GB RAM, 2 Solr Instances, 8 Cores, 
1 Core with 45 Million Documents other Cores < 200.000

- Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
- Solr2 for Update-Request  - delta every Minute - 4GB Xmx
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Master-Slave-switch-on-the-fly-Replication-tp3828313p3831948.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Master/Slave switch on the fly. Replication

2012-03-16 Thread Michael Kuhlmann

On 16.03.2012 15:05, stockii wrote:

I have 8 cores ;-)

I thought that replication is defined in solrconfig.xml and this file is
only loaded on startup, so I cannot change master to slave and slave to
master without restarting the servlet container?!


No, you can reload the whole core at any time, without interruption. 
Even with a new solrconfig.xml.


You can even add a new core at runtime, fill it with data and switch 
cores afterwards.


See http://wiki.apache.org/solr/CoreAdmin for details.
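
For example (port, core names and instanceDir are assumed):

  http://localhost:8983/solr/admin/cores?action=CREATE&name=core1&instanceDir=core1
  http://localhost:8983/solr/admin/cores?action=SWAP&core=core0&other=core1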

-Kuli


SolrJ Request issue when trying to add a PDF file to Index

2012-03-16 Thread Jones, Rhys
Hello,

I'm having trouble adding a pdf file to my index.  It's multicored.  My server 
object instantiates properly (StreamingUpdateSolrServer).  In my request object 
(ContentStreamUpdateRequest) I add a couple of literals to populate fields in 
the index that the parsed content of the PDF won't populate.  I've verified 
that the fields are spelled in accordance with the fields in the schema.xml.

I use the addFile() method of the ContentStreamUpdateRequest object and that 
isn't an issue.

However, when I use the .request() method off of the server 
(StreamingUpdateSolrServer)object, I get a 404 Not Found Not Found error.

I've checked the solrconfig.xml and the /update request handler is there...so 
what am I doing incorrectly?  By the examples, this should work.
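
The code is shaped roughly like the wiki example (core name, file name and
literal fields changed here):

import java.io.File;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class PdfIndexer {
    public static void main(String[] args) throws Exception {
        // with multiple cores the base URL must include the core name
        StreamingUpdateSolrServer server =
            new StreamingUpdateSolrServer("http://localhost:8983/solr/core0", 10, 2);
        // the ExtractingRequestHandler path, per the wiki example
        ContentStreamUpdateRequest req =
            new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("document.pdf"));
        req.setParam("literal.id", "doc1");
        req.setParam("literal.category", "manuals");
        server.request(req);
        server.commit();
    }
}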

Thanks,
Rhys

Rhys Jones
Associate Software Engineer
The TriZetto Group, Inc.
Phone (630) 428-5038
CONFIDENTIALITY NOTICE: This electronic message transmission is intended only 
for the person or the entity to which it is addressed and may contain 
information that is privileged, confidential or otherwise protected from 
disclosure, including personal health or other information which may be 
protected by federal or state law. If you have received this transmission, but 
are not the intended recipient, you are hereby notified that any disclosure, 
copying, distribution or use of the contents of this information is strictly 
prohibited. If you have received this e-mail in error, please contact the 
sender of the e-mail and destroy the original message and all copies.



Spellchecker problem

2012-03-16 Thread Finotti Simone
Hello,
I have this configuration where a single master builds the Solr index and it 
replicates to two slave Solr instances. Regular queries are sent only to those 
two slaves. Configurations are the same for everyone (except of replication 
section, of course).

My problem: it's happened that, in a particular query, I expected spellchecker 
to give me a suggestion. Fact is that only one of the two instances answers as 
I had expected! I checked the data directory and discovered that the failing 
instance had a data/spellchecker directory almost empty (12 KB against 7 MB of 
the other working instance). I don't understand this behaviour.

I tried to issue a spellcheck.build=true command, and this is what I've got:


Problem accessing /solr/yoox_slave/select. Reason:

org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: 
NativeFSLock@C:\Users\sqladmin\LucidImagination\LucidWorksEnterprise\data\solr\cores\yoox_slave_1\spellchecker\write.lock

java.lang.RuntimeException: org.apache.lucene.store.LockObtainFailedException: 
Lock obtain timed out: 
NativeFSLock@C:\Users\sqladmin\LucidImagination\LucidWorksEnterprise\data\solr\cores\yoox_slave_1\spellchecker\write.lock
at 
org.apache.solr.spelling.IndexBasedSpellChecker.build(IndexBasedSpellChecker.java:92)
at 
org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:110)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1406)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:129)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:59)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:122)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:110)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed 
out: 
NativeFSLock@C:\Users\sqladmin\LucidImagination\LucidWorksEnterprise\data\solr\cores\yoox_slave_1\spellchecker\write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:840)
at 
org.apache.lucene.search.spell.SpellChecker.clearIndex(SpellChecker.java:470)
at 
org.apache.solr.spelling.IndexBasedSpellChecker.build(IndexBasedSpellChecker.java:88)
... 27 more


Does anybody faced a similar problem? Can you point me to the solution?


Thank you in advance


Re: Apache solr issue after configuration

2012-03-16 Thread Richard Noble
Solr newbie here, but this looks familiar.

Another thing to make sure of is that the plugin jars are not already
loaded from the standard java classpath.
I had a problem with this in that some jars were being loaded by the
standard java classloader,
and some other plugins were being loaded by Solr,
so VelocityResponseWriter was not an instance of
QueryResponseWriter due to the classloader differences.

They should be loaded by Solr's classloader.

Regards

Richard

On Fri, Mar 16, 2012 at 1:24 PM, Erick Erickson wrote:

> At a guess, you don't have any paths to solr dist. Try copying all the
> other lib
> directives from the example (not core) dir (adjusting paths as necessary).
> The
> error message indicates you aren't getting to
> /dist/apache-solr-velocity-3.5.0.jar
>
> Best
> Erick
>
> On Thu, Mar 15, 2012 at 9:48 AM, ViruS  wrote:
> > Hello,
> >
> > I have still same problem after installation.
> > Files are loaded:
> >
> > ~/appl/apache-solr-3.5.0/example $ java -Dsolr.solr.home=multicore/ -jar
> > start.jar 2>&1 | grep contrib
> > INFO: Adding
> >
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-tools-2.0.jar'
> > to classloader
> > INFO: Adding
> >
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-beanutils-NOTICE.txt'
> > to classloader
> > INFO: Adding
> >
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-tools-NOTICE.txt'
> > to classloader
> > INFO: Adding
> >
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-collections-NOTICE.txt'
> > to classloader
> > INFO: Adding
> >
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-1.6.4.jar'
> > to classloader
> > INFO: Adding
> >
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-NOTICE.txt'
> > to classloader
> > INFO: Adding
> >
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-tools-LICENSE-ASL.txt'
> > to classloader
> > INFO: Adding
> >
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-collections-3.2.1.jar'
> > to classloader
> > INFO: Adding
> >
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-beanutils-1.7.0.jar'
> > to classloader
> > INFO: Adding
> >
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-beanutils-LICENSE-ASL.txt'
> > to classloader
> > INFO: Adding
> >
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-LICENSE-ASL.txt'
> > to classloader
> > INFO: Adding
> >
> 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-collections-LICENSE-ASL.txt'
> > to classloader
> >
> >
> > my config multicore/ac/conf/solrconfig.xml
> >
> > 
> >  LUCENE_35
> >  
> > ...
> >  > enable="true"/>
> > 
> >
> > And I still get error:
> >
> > INFO: [ac] Opening new SolrCore at multicore/ac/,
> > dataDir=multicore/ac/data/
> > 2012-03-15 13:18:11 org.apache.solr.core.JmxMonitoredMap 
> > INFO: No JMX servers found, not exposing Solr information with JMX.
> > 2012-03-15 13:18:11 org.apache.solr.common.SolrException log
> > SEVERE: org.apache.solr.common.SolrException: Error Instantiating
> > QueryResponseWriter, solr.VelocityResponseWriter is not a
> > org.apache.solr.response.QueryResponseWriter
> >at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:427)
> > ...
> >
> > What's wrong?
> >
> > Thanks in advanced for help!
> >
> >
> >
> > --
> > Piotr (ViruS) Sikora
> > vi...@cdt.pl
> > svi...@gmail.com
> > JID: vi...@ipc.net.pl
>



-- 
*nix has users, Mac has fans, Windows has victims.


Re: Filter Queries: Intersection

2012-03-16 Thread Alexander Golubowitsch

http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/

->  That's an excellent read - thanks a lot for the heads-up!

Kind regards,
 Alex

On 16.03.2012 14:08, Erick Erickson wrote:

Your problem is that you're saying with the -myField:* "Remove from
the result set all documents with any value in myField", which is not
what you want. Lucene query language is not strictly boolean logic,
here's an excellent writeup:

http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/

you want something like
(myfield:bob myfield:alice) (*:* -myfield:*)

Best
Erick

On Thu, Mar 15, 2012 at 10:23 AM, Alexander Golubowitsch
  wrote:

Hi all,

  I'm facing problems regarding multiple Filter Queries in SOLR 1.4.1 - I
hope someone will be able to help.

Example 1 - works fine: {!tag=myfieldtag}(-(myfield:*))
Example 2 - works fine: {!tag=myfieldtag}((myfield:"Bio" | myfield:"Alexa"))

Please note that in Example 2, result sets of individual filter queries do
not intersect; 'OR' just works as expected.

Example 3 - no results: {!tag=myfieldtag}((myfield:"Bio" | myfield:"Alexa")
| -(myfield:*))

What I am trying to generate is results having either no value for myfield,
or any of {"Alexa", "Bio"}.

How would I accomplish that?
I have tried all combinations/positions of brackets, +/- etc., without
success.

Thanks a lot for any advice!

Kind regards,
  Alex


Re: Field Value Substitution

2012-03-16 Thread Erick Erickson
I guess I don't quite understand. If the description field
is single valued, simply specifying that field on the
fl parameter should return it.

It would help if you showed some sample documents,
because I can't tell whether you only have one descriptor
per document or several

By the way, you'll save yourself some trouble in the future if
you let your "id" field be your <uniqueKey> and use some
different field for non-uniqueKey data

Best
Erick

On Thu, Mar 15, 2012 at 2:13 AM, tosenthu  wrote:
> Hi
>
> I have a scenario, where I store a field which is an Id,
>
> ID field
> --
> 1
> 3
> 4
>
> Descrption mapping
> ---
> 1 = "Options 1"
> 2 = "Options A"
> 3 = "Options 3"
> 4 = "Options 4a"
>
> Is there a way in solr when ever i query this field should return me the
> description instead of the id. And help me with the procedure to setup solr
> to do this..
>
> Regards
> Senthil Kumar M R
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Field-Value-Substitution-tp3828028p3828028.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Maybe switching to Solr Cores

2012-03-16 Thread Mike Austin
I'm trying to understand the difference between multiple Tomcat indexes
using context fragments versus using one application with multiple cores?
Since I'm currently using tomcat context fragments to run 7 different
indexes, could I get help understanding more why I would want to use solr
cores instead? or if I would?

From reading the documentation, here are the main points that I see...

- manage them as a single application
- create new indexes on the fly by spinning up new SolrCores
- even make one SolrCore replace another SolrCore without ever restarting
your Servlet Container.

It seems that the biggest real-world advantage is the ability to control
core creation and replacement with no downtime.  The negative would be the
isolation, however they are still somewhat isolated.  What other benefits and
common real-world situations would you use to talk me into switching to
Solr cores?

I'm guessing the replication works the same..

Thanks,
Mike


Re: Maybe switching to Solr Cores

2012-03-16 Thread Michael Kuhlmann

On 16.03.2012 16:42, Mike Austin wrote:

It seems that the biggest real-world advantage is the ability to control
core creation and replacement with no downtime.  The negative would be the
isolation however the are still somewhat isolated.  What other benefits and
common real-world situations would you use to talk me into switching to
Solr cores?


Different Solr cores already are quite isolated: They use different 
configs, different caches, different readers, different handlers...


In fact, there is not much more common between Solr cores except the 
solr.xml configuration.


One additional advantage is that cores need less footprint in Tomcat 
than fully deployed Solr web applications.


I don't see a single drawback of multiple cores in contrast to multiple 
web apps


...except one, but that has nothing to do with Solr, only with the JVM 
itself: If you have a large hardware environment with lots of RAM, then it 
might be better to have multiple Tomcat instances running in different 
OS processes. The reason is Java's garbage collector, which works better 
with not-so-huge memory.


Sometimes it might be even better to have two or four replicated Solr 
instances in different Tomcat processes than just one. You'll avoid 
longer stop-the-world pauses with Java's GC as well.


However, this depends on the environment and needs to be evaluated as 
well...


-Kuli


Java Server and PHP server

2012-03-16 Thread Spadez
Hi,

Call me crazy, but I don’t like the idea of having a single server which not
only runs my PHP site on Apache, but also runs SOLR and Nutch, inclusive of
Tomcat.

Is it a terrible idea to have one Rackspace VPS account which runs the PHP
site with MYSQL database, and another rackspace account which runs Tomcat,
Solr and Nutch? Then access SOLR via HTTP from the PHP server.

I know there may be some increase latency from the fact that there are two
servers, but it just seems like I might end up with a more stable platform
this way, and less prone to conflict. 

Regards,

James


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Java-Server-and-PHP-server-tp3832290p3832290.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: problems with DisjunctionMaxQuery and early-termination

2012-03-16 Thread Carlos Gonzalez-Cadenas
On Fri, Mar 16, 2012 at 9:26 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Hello Carlos,
>

Hello Mikhail:

Thanks for your answer.


>
> I have two concerns about your approach. The first-K (not top-K, honestly)
> collector approach hurts the recall of your search, and using disjunctive
> queries hurts precision. E.g., I want to find some fairly small, quiet,
> and therefore unpopular "Lemond Hotel"; you parse my phrase into Lemond OR
> Hotel and return 1K of popular hotels but not the Lemond one, because it's
> nearly a hapax. So I don't believe that makes a great search.
>

Yes, I agree that OR queries combined with top-K (or first-K as you say)
don't work very well (your results will be full of very popular yet not
very precise matches) and this is also what I tried to explain in my email.


> My other concern is about the idea at the end of your letter of joining
> separate query results. I'd like to remind you that absolute scores from
> different queries are not comparable at all; maybe the relative ones,
> scaled by the max score, are comparable, but I'm not sure.

 I suppose you need conjunctive queries instead. The great thing about
> them is "not-found for free": the cost of getting a zero-result response is
> proportional to the number of query terms, i.e. negligible.
> So, search with all terms as MUST first; if you get something, you've got
> the best result in terms of precision and recall. Otherwise you still have
> a lot of time: you need to drop one of the words or switch some of them to
> SHOULD.


Agree, this is precisely what we're trying to do (the idea of having
multiple queries, from narrow to broad). My question was more of a
practical nature, that is, how can we do these queries without really
having to do independent SOLR queries. Now we use DisjunctionMaxQuery, but
it has the problems that I described in my former email w.r.t.
early-termination.

This morning we found two potential directions that might work (we're
testing them as of now):

   1. Implement a custom RequestHandler and execute several queries within
   SOLR (https://issues.apache.org/jira/browse/SOLR-1093). This is better
   than executing them from outside and having all the network / HTTP / ...
   overhead, but still not very good.
   2. Modify DisjunctionMaxQuery. In particular, modifying DisjunctionMaxScorer
   so that it doesn't use a min heap for the subscorers. We'll try several
   strategies to collect documents from the child subscorers, like round-robin
   or collecting the narrower subscorers first and then go broader until the
   upstream collector stops the collection. This looks like the most
   interesting option.


Enumerating all combinations is an NP-complete task, I believe, but you have
> good heuristics:
> * a zero docFreq means you can drop that term off or pass it through
> spell correction
> * if you have an instant-suggest-like app and get zero results for some
> phrase, dropping the last word may give you a phrase that had some
> results before and is present in the cache
> * otherwise, excluding the least frequent term from the conjunction
> probably gives non-zero results
>

This is not a problem in practice. We're using a bunch of heuristics in our
QueryParser (including a lot of info extracted from the TermsEnum, stopword
lists, etc ...) to severely cut the space.

Thanks
Carlos



>
> Regards
>
>
> On Thu, Mar 15, 2012 at 12:01 AM, Carlos Gonzalez-Cadenas <
> c...@experienceon.com> wrote:
>
>> Hello all,
>>
>> We have a SOLR index filled with user queries and we want to retrieve the
>> ones that are more similar to a given query entered by an end-user. It is
>> kind of a "related queries" system.
>>
>> The index is pretty big and we're using early-termination of queries (with
>> the index sorted so that the "more popular" queries have lower docids and
>> therefore the termination yields higher-quality results)
>>
>> Clearly, when the user enters a user-level query into the search box, e.g.
>> "cheap hotels barcelona offers", we don't know whether there exists a
>> document (query) in the index that contains these four words or not.
>> Therefore, when we're building the SOLR query, the first intuition would
>> be to do a query like this: "cheap OR hotels OR barcelona OR offers".
>>
>> If all the documents in the index were evaluated, the results of this
>> query would be good. For example, if there is no query in the index with
>> these four words but there's a query in the index with the text "cheap
>> hotels barcelona", it will probably be one of the top results, which is
>> precisely what we want.
>>
>> The problem is that we're doing early termination, and therefore this query
>> will very quickly exhaust the top-K result limit (our custom collector
>> limits the number of evaluated documents), given that queries like "hotels
>> in madrid" or "hotels in NYC" will match the OR expression described above
>> (because they all match "hotels").
>>
>> Our next step was to think of a DisjunctionMaxQuery, trying

Re: Java Server and PHP server

2012-03-16 Thread Erick Erickson
It's really up to you. All any app needs to connect to Solr is an HTTP
connection, even if you use something like SolrJ. Yes, there'll
be some latency, but I suspect you'll only really notice it if you're
trying to index massive amounts of data across the wire.

Best
Erick

On Fri, Mar 16, 2012 at 11:03 AM, Spadez  wrote:
> Hi,
>
> Call me crazy, but I don’t like the idea of having a single server which not
> only runs my PHP site on Apache, but also runs SOLR and Nutch, inclusive of
> Tomcat.
>
> Is it a terrible idea to have one Rackspace VPS account which runs the PHP
> site with a MySQL database, and another Rackspace account which runs Tomcat,
> Solr and Nutch, and then access Solr via HTTP from the PHP server?
>
> I know there may be some increased latency from the fact that there are two
> servers, but it just seems like I might end up with a more stable platform
> this way, one less prone to conflict.
>
> Regards,
>
> James
>
>


Error while trying to load JSON

2012-03-16 Thread Chambeda
I am trying to load a json document that has the following structure:
...
"accessoriesImage": null,
  "department": "ET",
  "shipping": [
{
  "nextDay": 10.19,
  "secondDay": 6.45,
  "ground": 1.69
}
  ],
  "preowned": false,
  "format": "CD",
...

When executing the curl request to store the document in solr I get the
following error:

Problem accessing /solr/update/json. Reason:
    invalid key: nextDay [948]
Powered by Jetty://

The JSON is valid, so I am not sure what I need to do to get this to pass.
Any ideas?



Performance Question

2012-03-16 Thread Jamie Johnson
I'm curious if anyone can tell me how Solr/Lucene performs in a situation
where you have 100,000 documents each with 100 tokens vs. having
1,000,000 documents each with 10 tokens.  Should I expect the
performance to be the same?  Any information would be greatly
appreciated.


Adding a 'Topics' pulldown for refined initial searches.

2012-03-16 Thread Valentin, AJ
Hello all,

Yesterday was my first time using this (or any) email list and I think I did 
something wrong.  Anyway, I will try this again.


I have installed Solr search on my Drupal 7 installation.  Currently, it works 
as an 'All' search tool.  I'd like to limit the scope of the search with an 
available pull-down to set the topic for searching.

If I've researched correctly, I think my term 'scope' or 'topic' is the same as 
'clustering'...I may be wrong.
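From what I can tell so far, each pulldown choice may map to a plain Solr
filter query on some topic-like field rather than to clustering proper;
something like this hypothetical example (the field name is only my guess):

/solr/select?q=building+permits&fq=topic:permits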

Here is a link to a screenshot for what I have to get implemented soon.
http://imageupload.org/en/file/200809/solr-scope.png.html

Regards,
AJ





Re: Error while trying to load JSON

2012-03-16 Thread Erick Erickson
I don't believe Solr indexes arbitrary JSON, just as it
does not index arbitrary XML. You need the input
to be in exactly the shape Solr expects,
which is a relatively flat structure. There is an example
in /solr/example/exampledocs/books.json that
will give you an idea of the expected format.
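
For instance, a hand-flattened version of your document might look something
like this (the flattened field names are invented and would need to exist in
your schema.xml), posted with curl:

[
  {
    "id": "1234",
    "department": "ET",
    "format": "CD",
    "preowned": false,
    "shipping_nextDay": 10.19,
    "shipping_secondDay": 6.45,
    "shipping_ground": 1.69
  }
]

curl 'http://localhost:8983/solr/update/json?commit=true' \
  --data-binary @docs.json -H 'Content-type:application/json'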

Best
Erick

On Fri, Mar 16, 2012 at 12:19 PM, Chambeda  wrote:
> I am trying to load a json document that has the following structure:
> ...
> "accessoriesImage": null,
>  "department": "ET",
>  "shipping": [
>    {
>      "nextDay": 10.19,
>      "secondDay": 6.45,
>      "ground": 1.69
>    }
>  ],
>  "preowned": false,
>  "format": "CD",
> ...
>
> When executing the curl request to store the document in solr I get the
> following error:
>
> Problem accessing /solr/update/json. Reason:
>     invalid key: nextDay [948]
> Powered by Jetty://
>
> The JSON is valid, so I am not sure what I need to do to get this to pass.
> Any ideas?


Re: Error while trying to load JSON

2012-03-16 Thread Chambeda
Ok, so my issue is that it must be a flat structure.  Why isn't the JSON
parser able to deconstruct the object into a flatter structure for indexing?
Shouldn't it be able to take any valid JSON structure?



Re: Error while trying to load JSON

2012-03-16 Thread Erick Erickson
bq: Shouldn't it be able to take any valid JSON structure?

No, that was never the intent. The intent here was just to provide
a JSON-compatible format for indexing data for those who
don't like/want to use XML or SolrJ. Solr doesn't index arbitrary
XML either, and I have a hard time imagining what the
schema.xml file would look like when trying to map
arbitrary JSON (or XML, or ...) into fields.

Best
Erick

On Fri, Mar 16, 2012 at 12:54 PM, Chambeda  wrote:
> Ok, so my issue is that it must be a flat structure.  Why isn't the JSON
> parser able to deconstruct the object into a flatter structure for indexing?
> Shouldn't it be able to take any valid JSON structure?
>


Re: problems with DisjunctionMaxQuery and early-termination

2012-03-16 Thread Mikhail Khludnev
On Fri, Mar 16, 2012 at 8:38 PM, Carlos Gonzalez-Cadenas <
c...@experienceon.com> wrote:

> On Fri, Mar 16, 2012 at 9:26 AM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
>> Hello Carlos,
>>
>
>> So, search all terms with MUST first; if you get something, you've got the
>> best result in terms of precision and recall. Otherwise you still have a
>> lot of time: you can drop one of the words or switch some of them to
>> SHOULD.
>
>
> Agree, this is precisely what we're trying to do (the idea of having
> multiple queries, from narrow to broad). My question was of a more
> practical nature: how can we run these queries without actually issuing
> independent SOLR queries?
>

Sorry, I forgot this question. I did a tiny DelegateRequestHandler
which sequentially iterates through a list of other request handlers until
the first non-empty result has been found. Each of the slave RequestHandlers
has its own QParser params, e.g. I search for a conjunction, then apply
spelling correction, and after that I search for a disjunction. You can see
from my "theory" that the total time is roughly the time of the last
successful search.
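
In rough outline the idea looks like the sketch below (not my real code,
just the shape of it; Solr 3.x is assumed, where the "response" entry is a
DocList, and the delegate handler names are hypothetical ones you would
register in solrconfig.xml, ordered from narrowest to broadest):

import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.search.DocList;

public class DelegateRequestHandler extends RequestHandlerBase {

  // hypothetical handler names, ordered from all-MUST to disjunction
  private static final String[] DELEGATES =
      {"/conjunction", "/spellcorrected", "/disjunction"};

  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
      throws Exception {
    SolrQueryResponse attempt = null;
    for (String name : DELEGATES) {
      attempt = new SolrQueryResponse();
      req.getCore().getRequestHandler(name).handleRequest(req, attempt);
      DocList docs = (DocList) attempt.getValues().get("response");
      if (docs != null && docs.matches() > 0) {
        break; // first non-empty result wins
      }
    }
    if (attempt != null) {
      // either the first non-empty result or the last (empty) attempt
      rsp.setAllValues(attempt.getValues());
    }
  }

  @Override public String getDescription() { return "delegating handler"; }
  @Override public String getSource() { return ""; }
  @Override public String getSourceId() { return ""; }
  @Override public String getVersion() { return ""; }
}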


> Right now we use DisjunctionMaxQuery, but it has the problems w.r.t.
> early-termination that I described in my former email.
>
> This morning we found two potential directions that might work (we're
> testing them as of now):
>
>1. Implement a custom RequestHandler and execute several queries
>within SOLR (https://issues.apache.org/jira/browse/SOLR-1093). This is
>better than executing them from outside and having all the network / HTTP /
>... overhead, but still not very good.
>2. Modify DisjunctionMaxQuery. In particular, modifying 
> DisjunctionMaxScorer
>so that it doesn't use a min heap for the subscorers. We'll try several
>strategies to collect documents from the child subscorers, like round-robin
>or collecting the narrower subscorers first and then go broader until the
>upstream collector stops the collection. This looks like the most
>interesting option.
>
>
>> Enumerating all combinations is an NP-complete task, I believe. But you
>> have some good heuristics:
>> * a zero docFreq means that you can drop the term or pass it through
>> spell correction
>> * if you have an instant-suggest-like app and get zero results for some
>> phrase, maybe dropping the last word gives you a phrase which had some
>> results before and is present in the cache
>> * otherwise, excluding the least frequent term from the conjunction
>> probably gives non-zero results
>>
>
> This is not a problem in practice. We're using a bunch of heuristics in
> our QueryParser (including a lot of info extracted from the TermsEnum,
> stopword lists, etc ...) to severely cut the space.
>
> Thanks
> Carlos
>
>
>
>>
>> Regards
>>
>>
>> On Thu, Mar 15, 2012 at 12:01 AM, Carlos Gonzalez-Cadenas <
>> c...@experienceon.com> wrote:
>>
>>> Hello all,
>>>
>>> We have a SOLR index filled with user queries and we want to retrieve the
>>> ones that are more similar to a given query entered by an end-user. It is
>>> kind of a "related queries" system.
>>>
>>> The index is pretty big and we're using early-termination of queries
>>> (with
>>> the index sorted so that the "more popular" queries have lower docids and
>>> therefore the termination yields higher-quality results)
>>>
>>> Clearly, when the user enters a user-level query into the search box,
>>> e.g. "cheap hotels barcelona offers", we don't know whether there exists
>>> a document (query) in the index that contains these four words or not.
>>> Therefore, when we're building the SOLR query, the first intuition would
>>> be to do a query like this: "cheap OR hotels OR barcelona OR offers".
>>>
>>> If all the documents in the index were evaluated, the results of this
>>> query would be good. For example, if there is no query in the index with
>>> these four words but there's a query in the index with the text "cheap
>>> hotels barcelona", it will probably be one of the top results, which is
>>> precisely what we want.
>>>
>>> The problem is that we're doing early termination, and therefore this
>>> query will very quickly exhaust the top-K result limit (our custom
>>> collector limits the number of evaluated documents), given that queries
>>> like "hotels in madrid" or "hotels in NYC" will match the OR expression
>>> described above (because they all match "hotels").
>>>
>>> Our next step was to think of a DisjunctionMaxQuery, trying to write a
>>> query like this:
>>>
>>> DisjunctionMaxQuery:
>>>  1) +cheap +hotels +barcelona +offers
>>>  2) +cheap +hotels +barcelona
>>>  3) +cheap +hotels
>>>  4) +hotels
>>>
>>> We were thinking that perhaps the sub-queries within the
>>> DisjunctionMaxQuery were going to get evaluated in "parallel" given that
>>> they're separated queries, but in fact from a runtime perspective it does
>>> behave in a similar way than the OR query that we described above.
>>>
>>> Our desired behavior is to try m

Re: Performance Question

2012-03-16 Thread Mikhail Khludnev
Hello,

Frankly speaking, the computational complexity of a Lucene search depends
on the size of the search result, roughly numFound*log(start+rows), not on
the size of the index: e.g., collecting the top 10 of 1,000 hits costs on
the order of 1,000*log(10) operations, whether the index holds 100,000 or
1,000,000 documents.

Regards

On Fri, Mar 16, 2012 at 9:34 PM, Jamie Johnson  wrote:

> I'm curious if anyone can tell me how Solr/Lucene performs in a situation
> where you have 100,000 documents each with 100 tokens vs. having
> 1,000,000 documents each with 10 tokens.  Should I expect the
> performance to be the same?  Any information would be greatly
> appreciated.
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


 


Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-16 Thread Matthew Parker
I'm still having issues replicating in my work environment. Can anyone
explain how the replication mechanism works? Is it communicating across
ports or through ZooKeeper to manage the process?




On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker <
mpar...@apogeeintegration.com> wrote:

> All,
>
> I recreated the cluster on my machine at home (Windows 7, Java 1.6.0.23,
> apache-solr-4.0-2012-02-29_09-07-30), sent some documents through Manifold
> using its crawler, and it looks like it's replicating fine once the
> documents are committed.
>
> This must be related to my environment somehow. Thanks for your help.
>
> Regards,
>
> Matt
>
> On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson wrote:
>
>> Matt:
>>
>> Just for paranoia's sake, when I was playing around with this (the
>> _version_ thing was one of my problems too) I removed the entire data
>> directory as well as the zoo_data directory between experiments (and
>> recreated just the data dir). This included various index.2012
>> files and the tlog directory on the theory that *maybe* there was some
>> confusion happening on startup with an already-wonky index.
>>
>> If you have the energy to try that, it might yield helpful information,
>> but it may also be a total red herring...
>>
>> FWIW
>> Erick
>>
>> On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller 
>> wrote:
>> >> I'm assuming the Windows configuration looked correct?
>> >
>> > Yeah, so far I cannot spot any smoking gun... I'm confounded at the
>> > moment. I'll re-read through everything once more...
>> >
>> > - Mark
>>
>
>


Re: Error while trying to load JSON

2012-03-16 Thread Pulkit Singhal
It seems that you are using the bbyopen data. If you have made up your mind
on using the JSON data, then simply store it in ElasticSearch instead of
Solr, as it does take any valid JSON structure. Otherwise, you can download
the XML archive from bbyopen and prepare a schema.

Here are some generic instructions to familiarize you with building a schema
given arbitrary data; they should help speed things up, though they don't
apply directly to the bbyopen data:
http://pulkitsinghal.blogspot.com/2011/10/import-dynamic-fields-from-xml-into.html
http://pulkitsinghal.blogspot.com/2011/09/import-data-from-amazon-rss-feeds-into.html

Keep in mind, ES also does you a favor by building the right schema
dynamically on the fly as you feed it the JSON data. So it is much easier
to work with.

On Fri, Mar 16, 2012 at 1:26 PM, Erick Erickson wrote:

> bq: Shouldn't it be able to take any valid JSON structure?
>
> No, that was never the intent. The intent here was just to provide
> a JSON-compatible format for indexing data for those who
> don't like/want to use XML or SolrJ or Solr doesn't index arbitrary
> XML either. And I have a hard time imagining what the
> schema.xml file would look like when trying to map
> arbitrary JSON (or XML or) into fields.
>
> Best
> Erick
>
> On Fri, Mar 16, 2012 at 12:54 PM, Chambeda  wrote:
> > Ok, so my issue is that it must be a flat structure.  Why isn't the JSON
> > parser able to deconstruct the object into a flatter structure for
> indexing?
> > Shouldn't it be able to take any valid JSON structure?
> >


Extract terms of a query to do highlighting

2012-03-16 Thread Nicolas Labrot
Hello,

I want to do highlighting by "hand" on my indexed documents, which can be
XML, HTML, PDF, SVG, CGM...

Given a search query, I want to be able to extract all the terms occurring
in the query so that I can do custom highlighting on the results. The
returned terms should be coherent with the Analyzer.

I cannot find any method to do it. When I was using pure Lucene I was
using Query.extractTerms.

Is there any method that answers my need?
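
For reference, this is roughly what I am hoping for on the Solr side (only a
sketch, assuming Solr/Lucene 3.x and a custom SearchComponent whose
process(ResponseBuilder rb) method runs this; exception handling omitted):

import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.solr.search.QParser;

// parse the query string with the configured QParser, rewrite it so
// multi-term queries (wildcard, fuzzy, ...) are expanded, then extract
QParser parser = QParser.getParser(rb.getQueryString(), null, rb.req);
Query query = parser.getQuery();
Set<Term> terms = new HashSet<Term>();
query.rewrite(rb.req.getSearcher().getIndexReader()).extractTerms(terms);
// 'terms' now holds field/text pairs consistent with the index Analyzers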

Thanks for your help,

Nicolas


suggestions on automated testing for solr output

2012-03-16 Thread geeky2
hello all,

i know this is never a fun topic for people, but our SDLC mandates that we
have unit test cases that attempt to validate the output from specific solr
queries.

i have some ideas on how to do this, but would really appreciate feedback
from anyone that has done this or is doing it now.

the ideal situation (for this environment) would be something script based
and automated.

thanks for any input,
mark




Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-16 Thread vybe3142
Hi,
Is there a way for SOLR / SOLRJ to index files directly, bypassing HTTP
streaming?

Use case: 
* Text Files to be indexed are on file server (A) (some potentially large -
several 100 MB)
* SOLRJ client is on server (B)
* SOLR server is on server (C) running with dynamically created SOLR cores

Looking at how ContentStreamUpdateRequest is typically used in SOLRJ, it
looks like the files would be read from A to the client on B (across the
wire) and then sent across the wire via an HTTP request (in the body) to C
to be indexed. 

Is there a more efficient way to accomplish this, i.e. pass a path to the
file when making the request from B, so that the SOLR server on C can read
directly from file server A?
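
One direction I'm looking at (just a sketch; it assumes remote streaming is
enabled and that C can actually reach the file, e.g. via a mount or URL, and
the host/path names below are made up) is Solr's stream.file / stream.url
support, where B sends only a path and C reads the bytes itself:

<!-- solrconfig.xml on server C, inside <requestDispatcher>;
     note this lets any client make C read arbitrary paths/URLs -->
<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" />

# from B: no file content crosses the B-to-C wire, only the path
curl 'http://serverC:8983/solr/update/extract?literal.id=doc1&commit=true&stream.file=/mnt/fileserverA/big.txt'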

Thanks




Sorting Index Results by User's Score

2012-03-16 Thread Phill Tornroth
I'm puzzled about whether or not Solr is the right system for solving this
problem I've got. I'm using some Solr indexes for autocompletion, and I
want to rank the results by their value to the requesting user.
Essentially, I'll tally the number of times the user has chosen particular
results, and I need to include that value in the process of sorting
and limiting results.

This doesn't seem like an atypical request, but I'm wondering how Solr
experts suggest it be done. It seems impractical to hold my scores
elsewhere, ask Solr for unlimited results, and then do the
ordering/limiting on my side, but I don't see an obvious way to do this
within Solr itself, though the JOIN functionality and the Function Query
stuff look like they might be part of the right solution.
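
For what it's worth, the closest built-in mechanism I've found so far is
ExternalFileField, which keeps a per-document float outside the index (in an
external_<fieldname> file under the data directory, reloadable without
reindexing) and exposes it to function queries. A rough sketch for a single
global tally (the field and query names are mine, and the per-user dimension
is exactly the part I don't see how to cover):

<!-- schema.xml -->
<fieldType name="extfile" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="userScore" type="extfile" indexed="false" stored="false"/>

Then boost the autocomplete query with it, e.g.:

/solr/select?q={!boost b=userScore v=$qq}&qq=prefix:cho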

Any help would be greatly appreciated.

Thanks!

Phill


Re: suggestions on automated testing for solr output

2012-03-16 Thread Gora Mohanty
On 17/03/2012, geeky2  wrote:
> hello all,
>
> i know this is never a fun topic for people, but our SDLC mandates that we
> have unit test cases that attempt to validate the output from specific solr
> queries.
>
> i have some ideas on how to do this, but would really appreciate feedback
> from anyone that has done this or is doing it now.
[...]

Query responses are XML (you can also get JSON), so
parsing these for validity should be straightforward. If
your test case has to include particular results, that
will be specific to your index and search terms.
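
As a sketch of what a scripted/automated check could look like with SolrJ
3.x and JUnit (the URL, query, and expected count below are placeholders for
your own):

import static org.junit.Assert.assertEquals;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.junit.Test;

public class SolrOutputTest {
  @Test
  public void knownQueryReturnsExpectedCount() throws Exception {
    // point at the Solr instance under test
    CommonsHttpSolrServer solr =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    QueryResponse rsp = solr.query(new SolrQuery("id:doc1"));
    // validate the output of this specific query
    assertEquals(1L, rsp.getResults().getNumFound());
  }
}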

Regards,
Gora


Any way to get reference to original request object from within Solr component?

2012-03-16 Thread SUJIT PAL
Hello,

I have a custom component which depends on the ordering of a multi-valued 
parameter. Unfortunately it looks like the values do not come back in the same 
order as they were put in the URL. Here is some code to explain the behavior:

URL: /solr/my_custom_handler?q=something&myparam=foo&myparam=bar&myparam=baz

Inside my component's process(ResponseBuilder) method, I do the following:

public void process(ResponseBuilder rb) throws IOException {
  // expecting ["foo", "bar", "baz"], in URL order
  String[] myparams = rb.req.getParams().getParams("myparam");
  System.out.println("myparams=" + ArrayUtils.toString(myparams));
  ...
}

and I notice that the values are ordered differently from the ["foo", "bar",
"baz"] I would have expected. I am guessing it's because SolrParams is a
multimap structure, so the order is destroyed on the way in.

My question is:
1) is there a setting in Solr I can use to enforce the ordering of
multi-valued parameters? I suppose I could use a single parameter with
comma-separated values, but it's a bit late to do that now...
2) is it possible to use a specific SolrParams object that preserves order?
If so, how?
3) is it possible to get a reference to the HTTP request object from within
a component? If so, how?

I am on Solr version 3.2.0.
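
For completeness, the comma-separated fallback from question 1 would look
roughly like this (same hypothetical "myparam" as above):

// URL: /solr/my_custom_handler?q=something&myparam=foo,bar,baz
// a single-valued parameter keeps exactly the order the client sent
String joined = rb.req.getParams().get("myparam");
String[] myparams = (joined == null) ? new String[0] : joined.split(",");
// myparams == ["foo", "bar", "baz"]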

Thanks in advance for any help you can provide,

Sujit