Solr Authentication Problem

2009-06-22 Thread Allahbaksh Asadullah
Hi All,
I am getting an error when I use authentication in Solr. I
followed the wiki. The error does not appear when I am searching. Below are
the code snippet and the error.

Please note I am using Solr 1.4 Development build from SVN.


HttpClient client = new HttpClient();

AuthScope scope = new AuthScope(AuthScope.ANY_HOST, AuthScope.ANY_PORT, null, null);

client.getState().setCredentials(scope, new UsernamePasswordCredentials("guest", "guest"));

SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr", client);

SolrInputDocument doc1 = new SolrInputDocument();

// Add fields to the document
doc1.addField("employeeid", "1237");
doc1.addField("employeename", "Ann");
doc1.addField("employeeunit", "etc");
doc1.addField("employeedoj", "1995-11-31T23:59:59Z");

server.add(doc1);





Exception in thread "main" org.apache.solr.client.solrj.SolrServerException:
org.apache.commons.httpclient.ProtocolException: Unbuffered entity
enclosing request can not be repeated.
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:468)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
    at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)
    at test.SolrAuthenticationTest.<init>(SolrAuthenticationTest.java:49)
    at test.SolrAuthenticationTest.main(SolrAuthenticationTest.java:113)
Caused by: org.apache.commons.httpclient.ProtocolException: Unbuffered
entity enclosing request can not be repeated.
    at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487)
    at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:415)
    ... 5 more

Thanks and regards,
Allahbaksh


TermsComponent and filter queries

2009-06-22 Thread Ingo Renner

Hi *,

Currently the terms component does not support filter queries.
However, without them the returned count for the terms might differ from
the actual results the user gets when conducting a search with a
suggested word and the (automatically) applied filter queries.


So, are there any plans to add filter query support to the terms  
component?



best
Ingo

--
Ingo Renner
TYPO3 Core Developer, Release Manager TYPO3 4.2





How to understand Solr stats

2009-06-22 Thread Julian Davchev
Hi,
Where can I read about understanding Solr stats?
I got this in the cache section, but it doesn't tell me much.

lookups : 149272
hits : 135267
hitratio : 0.90
inserts : 14018
evictions : 13506
size : 512
warmupTime : 0
cumulative_lookups : 7188459
cumulative_hits : 5429817
cumulative_hitratio : 0.75
cumulative_inserts : 1758642
cumulative_evictions : 812185


Re: Auto suggest.. how to do mixed case

2009-06-22 Thread Shalin Shekhar Mangar
On Fri, Jun 19, 2009 at 12:50 PM, Ian Holsman  wrote:

> I've noticed that one of the new features in Solr 1.4 is the TermsComponent
> which enables the Autosuggest.
>

TermsComponent *can* be used for autosuggest, though I don't think that was
the original motivation. In the end it is just the same thing as a prefix
query, but it returns only the indexed tokens rather than the stored field
values. I think that by naming it /autoSuggest, a lot of users have been
misled, since there are other techniques available.


>
> but what puzzles me is how to actually use it in an application.
>
> most autosuggests are case insensitive, so there is no difference if I type
> in 'San Francisco' or 'san francisco'.
>
> now I've tried with a 'text' field, and a 'string' field, with no joy, with
> string providing the best result, but still with case sensitivity.
>
> at the moment I'm using a custom field type
>
> [fieldType definition stripped by the list archive; per the description
> below, it lower-cases the entire field value]
>
> which converts the whole field to lower case, which allows me to submit
> the query in lower case and get better results.
>
> so the point of the email is to find out: how do I get the autosuggest to
> return mixed-case results without requiring me to lower-case the query
> before I send it?
>

There is no way to do this right now using TermsComponent. You can index
lower-cased terms and store the mixed-case terms. Then you can use a prefix
query, which will return documents (and hence stored field values).
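
For illustration, a minimal schema.xml sketch of that setup (field and type
names here are made up, not from this thread): the indexed terms are
lower-cased while the stored value keeps its original case, so a lower-cased
prefix query returns mixed-case suggestions.

<fieldType name="suggest_lc" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- keep the whole value as a single token, then lower-case it -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- indexed lower-cased for matching, stored as originally typed -->
<field name="suggestion" type="suggest_lc" indexed="true" stored="true"/>

A prefix query such as suggestion:san* would then match "San Francisco" and
return the stored, mixed-case value.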

-- 
Regards,
Shalin Shekhar Mangar.


Re: Auto suggest.. how to do mixed case

2009-06-22 Thread Ingo Renner


On 22.06.2009, at 11:09, Shalin Shekhar Mangar wrote:

Hi Shalin,


I think that by naming it /autoSuggest, a lot of users have been misled,
since there are other techniques available.


what would you suggest?


Ingo

--
Ingo Renner
TYPO3 Core Developer, Release Manager TYPO3 4.2





Re: Auto suggest.. how to do mixed case

2009-06-22 Thread Shalin Shekhar Mangar
On Mon, Jun 22, 2009 at 2:55 PM, Ingo Renner  wrote:

>
> Hi Shalin,
>
>  I think
>> that by naming it as /autoSuggest, a lot of users have been misled since
>> there are other techniques available.
>>
>
> what would you suggest?
>
>
There are many techniques. Personally, I've used

   1. Prefix search on shingles
   2. Exact (phrase) search on n-grams

The regular prefix search also works. The good thing with these approaches
is that you can apply filter queries, and returning a different stored value
is also possible.
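
As a sketch of technique 2, an edge-n-gram field type (a hedged example
using the stock Solr 1.4 factories; the gram sizes are arbitrary):

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index every prefix of each token: s, sa, san, ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Because these are ordinary queries against ordinary fields, fq filtering
works as usual.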

-- 
Regards,
Shalin Shekhar Mangar.


Re: Auto suggest...

2009-06-22 Thread Paul Libbrecht

I'm not sure I fully understand this thread.

On the one hand it speaks about tuning the appropriate analyzer to get
mixed-case matching... This part I am not addressing, so I dropped that
part of the subject.

On the other hand it seems to speak about an auto-suggestion facility?
Is this http://wiki.apache.org/solr/SolrJS ?
That page doesn't describe much of the server interface (e.g. the
field types, the type of queries, how to fuzzify them).

Are there other such plans in Solr?
If it might be useful: we have such an auto-completion with GWT under the
APL at http://i2geo.net/ , where we intend to move to Solr soon.


paul


On 22 June 2009, at 13:11, Shalin Shekhar Mangar wrote:


On Mon, Jun 22, 2009 at 2:55 PM, Ingo Renner  wrote:



Hi Shalin,

I think that by naming it /autoSuggest, a lot of users have been misled,
since there are other techniques available.



what would you suggest?



There are many techniques. Personally, I've used

  1. Prefix search on shingles
  2. Exact (phrase) search on n-grams

The regular prefix search also works. The good thing with these approaches
is that you can apply filter queries, and returning a different stored value
is also possible.

--
Regards,
Shalin Shekhar Mangar.






Re: Auto suggest...

2009-06-22 Thread Shalin Shekhar Mangar
On Mon, Jun 22, 2009 at 4:55 PM, Paul Libbrecht  wrote:

> I'm not sure I'm understanding fully this thread,
>
> on the one hand it speaks about tuning the appropriate analyzer to get
> mixed case matching...
> This part I am not addressing and I zapped that part of the suject.
>
> on the other hand it seems to speak about an auto-suggestion facility?
> Is this http://wiki.apache.org/solr/SolrJS ?


No. In the past the TermsComponent was registered in the example
solrconfig.xml at /autoSuggest, which seems to suggest that it is *the* way
to get auto-suggest support in Solr. This is what I was referring to when I
said that users may have been misled.

-- 
Regards,
Shalin Shekhar Mangar.


multi-word synonyms with multiple matches

2009-06-22 Thread Ensdorf Ken
We have a field with index-time synonyms called "title".  Among the entries in 
the synonyms file are

vp,vice president
svp,senior vice president

However, a search for "vp" does not return results where the title is "senior
vice president". It appears that the term "vp" is not indexed when there is a
longer string that matches a different synonym. Is this by design, and is
there any way to make Solr index all synonyms that match a term, even if it is
contained in a longer synonym? Thanks!
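
One workaround (a hedged sketch; whether it fits depends on how the
SynonymFilterFactory is configured) is to list the shorter synonym
explicitly inside the longer group in synonyms.txt, so a match on the long
rule still emits the short token:

# illustrative synonyms.txt entries
vp,vice president
svp,senior vice president,vice president,vp

With expand="true" at index time, a title of "senior vice president" then
also indexes the tokens "vice president" and "vp", so a search for "vp"
matches.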

-Ken



Data Import Handler

2009-06-22 Thread Mukerjee, Neiloy (Neil)
After setting up a working Solr 1.3 example with a Tomcat 6 container, I have 
been trying to figure out the Data Import Handler so I can work with a MySQL 
database. However, after following the guidelines at 
http://wiki.apache.org/solr/DataImportHandler#head-b3518c890e46befa05c9242c8fc329517c1ea61b,
 I end up with the following message displayed in my browser when I go to 
http://localhost:8080/solr/dataimport:

HTTP Status 404 - /solr/dataimport
type Status report
message /solr/dataimport
description The requested resource (/solr/dataimport) is not available.
Apache Tomcat/6.0.20

I have tried creating a dataimport directory in the hopes that /solr/dataimport 
would work like /solr/admin, and I put the dataimport.jsp file into this 
directory, but I still receive the same error message. When trying to go to 
http://localhost:8080/solr/admin/dataimport.jsp, I see two frames, the left 
frame having what I think I am supposed to see in order to deliver commands to 
the handler, and the right frame having the same error message as before.

Is there something I am doing wrong? Does anyone know of a clearer set of 
guidelines I might be able to use? [Google hasn't pointed me to any as of yet.]


Re: Data Import Handler

2009-06-22 Thread Shalin Shekhar Mangar
On Mon, Jun 22, 2009 at 7:52 PM, Mukerjee, Neiloy (Neil) <
neil.muker...@alcatel-lucent.com> wrote:

> After setting up a working Solr 1.3 example with a Tomcat 6 container, I
> have been trying to figure out the Data Import Handler so I can work with a
> MySQL database. However, after following the guidelines at
> http://wiki.apache.org/solr/DataImportHandler#head-b3518c890e46befa05c9242c8fc329517c1ea61b,
> I end up with the following message displayed in my browser when I go to
> http://localhost:8080/solr/dataimport:
>
> HTTP Status 404 - /solr/dataimport
> type Status report
> message /solr/dataimport
> description The requested resource (/solr/dataimport) is not available.
> Apache Tomcat/6.0.20
>

That usually means that the DataImportHandler is not registered at
/dataimport in solrconfig.xml. Another possibility is that you are using
multiple Solr cores. Did you restart Solr after changing solrconfig.xml?
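
For reference, the registration looks roughly like this in solrconfig.xml
(the config value names your own data-config file):

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>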

-- 
Regards,
Shalin Shekhar Mangar.


spellcheck. limit the suggested words by some field

2009-06-22 Thread Julian Davchev
Hi,
I have built a spellcheck dictionary based on the name field.
It works like a charm, but I'd like to limit the returned suggestions.
For example, we have the following structure:

id  name    type
1   Berlin  city
2   bergan  phony


So when I search for suggested words for "ber" I get both Berlin
and bergan, but I want to limit the suggestions to only those of type city.
I tried fq=type:city, but this didn't help either.

Any pointers are more than welcome. The other approach would be making
different spellcheck dictionaries based on type and just using the specific
dictionary, but then again I didn't see an option for how to build a
dictionary based on type.
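
A sketch of that second approach, assuming you can populate a per-type copy
of the name field at index time (all names here are illustrative): register
one spellchecker per type in solrconfig.xml and pick one at query time with
spellcheck.dictionary.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">city</str>
    <!-- a field that only ever holds city names -->
    <str name="field">name_city</str>
    <str name="spellcheckIndexDir">./spellchecker_city</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">other</str>
    <str name="field">name_other</str>
    <str name="spellcheckIndexDir">./spellchecker_other</str>
  </lst>
</searchComponent>

A request with spellcheck.dictionary=city would then suggest only from the
city dictionary.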

Thanks.


Re: ExtractRequestHandler - not properly indexing office docs?

2009-06-22 Thread cloax

Yep, I've tried both of those and still no joy. Here's both my curl statement
and the resulting Solr log output. 

curl "http://localhost:8983/solr/update/extract?ext.def.fl=text&ext.literal.id=1&ext.map.div=text&ext.capture=div" \
-F "myfile=@dj_character.doc"

Curl's output:

<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">317</int></lst>
</response>

Solr log:
Jun 22, 2009 12:21:42 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract
params={ext.map.div=text&ext.def.fl=text&ext.capture=div&ext.literal.id=1}
status=0 QTime=544 
Jun 22, 2009 12:22:26 PM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: {add=[1]} 0 317
Jun 22, 2009 12:22:26 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract
params={ext.map.div=text&ext.def.fl=text&ext.capture=div&ext.literal.id=1}
status=0 QTime=317 
Jun 22, 2009 12:22:37 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select
params={wt=standard&rows=10&start=0&explainOther=&hl.fl=&indent=on&q=kondel&fl=*,score&qt=standard&version=2.2}
hits=0 status=0 QTime=2

The submitted document has "kondel" in it numerous times, so Solr should
have a hit. Yet it returns nothing. I also made sure I committed, but that
didn't seem to help either.


Grant Ingersoll-6 wrote:
> 
> Do you have a default field declared?  &ext.default.fl=
> Either that, or you need to explicitly capture the fields you are  
> interested in using &ext.capture=
> 
> You could add this to your curl statement to try out.
> 
> -Grant
> 


-- 
View this message in context: 
http://www.nabble.com/ExtractRequestHandler---not-properly-indexing-office-docs--tp24120125p24150763.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Slowness during submit the index

2009-06-22 Thread Francis Yakin
No VM.

-Original Message-
From: Bruno [mailto:brun...@gmail.com]
Sent: Saturday, June 20, 2009 10:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Slowness during submit the index

We were having performance issues using servers running on VM. Are you
running QA or Prod in a VM?

2009/6/21, Stephen Weiss :
> Isn't it possible that the production equipment is simply under much
> higher load (given that, since it's in production, your various users
> are all actually using it), vs the QA equipment, which is only in use
> by the people doing QA?
>
> We've found the same thing at one point - we had a very small index (<
> 4 rows), so small it didn't seem worth the effort to do delta
> updates.  So we would just refresh the whole thing every time - or so
> we planned.  In the test environment it updated within a minute.  In
> production, it would take as long as 15 minutes.  What we finally
> realized was, because the DB was under much higher load in production
> than in the test environment, especially considering the amount of
> joins that needed to take place to pull out the data properly, various
> writes from the users to the affected tables would slow down the data
> selection process dramatically as the indexer would have to wait for
> locks to clear.  Now of course we do delta updates and everything's
> fine (and blazingly fast in both environments).
>
> Try simulating higher load (involving a "normal" amount of writes to
> the DB) against your QA equipment and then building the index.  See if
> the QA equipment still runs so quickly.
>
> --
> Steve
>
> On Jun 20, 2009, at 11:29 PM, Otis Gospodnetic wrote:
>
>>
>> Hi Francis,
>>
>> I can't tell what the problem is from the information you've
>> provided so far.  My gut instinct is that this is due to some
>> difference in QA vs. PROD environments that isn't Solr-specific.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>>
>> - Original Message 
>>> From: Francis Yakin 
>>> To: "solr-user@lucene.apache.org" 
>>> Sent: Saturday, June 20, 2009 2:18:07 AM
>>> Subject: RE: Slowness during submit the index
>>>
>>> The amount of data in Prod is about 20% more than in QA.
>>> We tested the network; the speed is fine. The hardware in Prod is
>>> larger and more powerful than in QA.
>>> But QA is faster during reload: it takes QA only one hour versus 6
>>> hours in Prod.
>>>
>>> That's why we don't understand the reason: the amount of data is only
>>> 20% more, and that alone should not make it 5 times slower.
>>>
>>> So we looked into the Solr config files, but they're not much
>>> different, except that Prod has a master/slave environment while QA
>>> has only a master.
>>>
>>> Thanks for the response.
>>>
>>> Francis
>>>
>>>
>>> -Original Message-
>>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>>> Sent: Friday, June 19, 2009 8:58 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Slowness during submit the index
>>>
>>>
>>> Francis,
>>>
>>> So it could easily be that your QA and PROD DBs are really just
>>> simply different
>>> (different amount of data, different network speed, different
>>> hardware...)
>>>
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>>
>>>
>>> - Original Message 
 From: Francis Yakin
 To: "solr-user@lucene.apache.org"
 Sent: Friday, June 19, 2009 10:39:48 PM
 Subject: RE: Slowness during submit the index

 * is the java version the same on both machines (QA vs. PROD)  - YES
 * are the same java parameters being used on both machines  -
 YES
 * is the connection to the DB the same on both machines -
 Not sure,
>>> need
 to ask the network guy
 * are both the PROD and QA DB servers the same and are both DB
 instances the
 same - they are not from the same DB

 -Original Message-
 From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
 Sent: Friday, June 19, 2009 6:23 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Slowness during submit the index


 Francis,

 I'm not sure if I understood your email correctly, but I think you
 are saying
 you are indexing your DB content into a Solr index.  If this is
 correct, here
 are things to look at:
 * is the java version the same on both machines (QA vs. PROD)
 * are the same java parameters being used on both machines
 * is the connection to the DB the same on both machines
 * are both the PROD and QA DB servers the same and are both DB
 instances the
 same
 ...


 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
> From: Francis Yakin
> To: "solr-user@lucene.apache.org"
> Sent: Friday, June 19, 2009 5:27:59 PM
> Subject: Slowness during submit the index
>
>
> We are experiencin

Re: How to understand Solr stats

2009-06-22 Thread Otis Gospodnetic

Julian,

Explanations below.

 --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Julian Davchev 
> To: solr-user@lucene.apache.org
> Sent: Monday, June 22, 2009 5:01:12 AM
> Subject: How to understand Solr stats
> 
> Hi
> Where can I read about understanding Solr stats?
> I got this in the cache section, but it doesn't tell me much.
> 
> lookups : 149272

number of lookups made against the cache

> hits : 135267

cache hits - successful lookups - value found for the key

> hitratio : 0.90

hit ratio (hits/lookups) - 0.90 is pretty good.

> inserts : 14018

number of items inserted

> evictions : 13506

number of items evicted - this looks high - your cache is likely too small

> size : 512

current number of entries in the cache - looks smallish

> warmupTime : 0

time taken to warm up the new cache when a new searcher is opened

> cumulative_lookups : 7188459
> cumulative_hits : 5429817
> cumulative_hitratio : 0.75
> cumulative_inserts : 1758642
> cumulative_evictions : 812185

cumulative/aggregate numbers over the whole/current lifespan of the Solr 
instance/JVM.
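
If evictions stay high relative to inserts, the usual remedy is to enlarge
that cache in solrconfig.xml. A hedged sketch (which cache element to change
depends on which stats section this came from; the numbers are illustrative):

<filterCache
    class="solr.LRUCache"
    size="4096"
    initialSize="1024"
    autowarmCount="256"/>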

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


Re: Solrj when to commit?

2009-06-22 Thread Otis Gospodnetic

Hi,

If you don't need the searcher to see index changes (new docs) during your 
indexing, just wait until you are done and commit/optimize at the end.
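
A minimal SolrJ sketch of that pattern (assuming an already-configured
EmbeddedSolrServer in "server" and some document source "docs"):

import java.util.ArrayList;
import java.util.Collection;
import org.apache.solr.common.SolrInputDocument;

// add in reasonably sized batches, make changes visible once at the end
Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
for (SolrInputDocument doc : docs) {
    batch.add(doc);
    if (batch.size() == 1000) {
        server.add(batch);   // one round trip per 1000 docs
        batch.clear();
    }
}
if (!batch.isEmpty()) server.add(batch);
server.commit();    // open a new searcher once
server.optimize();  // optional: merge segments after a large rebuild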

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: pof 
> To: solr-user@lucene.apache.org
> Sent: Monday, June 22, 2009 2:31:53 AM
> Subject: Solrj when to commit?
> 
> 
> Hi, I am doing a large batch (thousands) of insertions to my index using an
> EmbeddedSolrServer. I was wondering how often I should use server.commit(),
> as I am trying to avoid unnecessary bottlenecks.
> 
> Thanks, Brett.
> -- 
> View this message in context: 
> http://www.nabble.com/Sorlj-when-to-commit--tp24142326p24142326.html
> Sent from the Solr - User mailing list archive at Nabble.com.



RE: Data Import Handler

2009-06-22 Thread Mukerjee, Neiloy (Neil)
I am not using multiple Solr cores, but I hadn't restarted after making changes 
to the solrconfig file or adding a data-config file, so I did that and got a 
"severe errors" warning in my browser, with the below text in my logs. When I 
delete the data-config file and remove the DataImportHandler section from the 
solrconfig file, I restart and see Solr running fine (although, of course, 
without the data import handler), and when I go in and repeat the process, I 
get the same errors. 

I suspect that the fact that the data-config file is blank is causing these 
issues, but per the documentation on the website, there is no indication of 
what, if anything, should go there - is there an alternate resource that anyone 
knows of which I could use? 


Jun 22, 2009 1:07:48 PM org.apache.solr.handler.dataimport.DataImportHandler inform
SEVERE: Exception while loading DataImporter
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
    at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:165)
    at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:99)
    at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:97)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:415)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:572)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:128)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:630)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:556)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:491)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
    at org.apache.catalina.core.StandardService.start(StandardService.java:516)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)
Caused by: org.xml.sax.SAXParseException: Premature end of file.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:239)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:283)
    at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:153)
    ... 33 more
Jun 22, 2009 1:07:48 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@134ce4a main
Jun 22, 2009 1:07:48 PM org.apache.solr.servlet.SolrDispatchFilter init
SEVERE: Could not start SOLR. Check solr/home property
org.apache.solr.common.SolrException: FATAL: Could not create importer. DataImporter config invalid
    at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:105)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:415)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:572)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContain

Re: Data Import Handler

2009-06-22 Thread Shalin Shekhar Mangar
On Mon, Jun 22, 2009 at 10:51 PM, Mukerjee, Neiloy (Neil) <
neil.muker...@alcatel-lucent.com> wrote:

>
> I suspect that the fact that the data-config file is blank is causing these
> issues, but per the documentation on the website, there is no indication of
> what, if anything, should go there - is there an alternate resource that
> anyone knows of which I could use?
>
>
The data-config.xml file is what specifies how, and from where, Solr can
pull data.

For example look at the full-import from a database data-config.xml at
http://wiki.apache.org/solr/DataImportHandler#head-c24dc86472fa50f3e87f744d3c80ebd9c31b791c

Or, look at the Slashdot feed example at
http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
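
As a starting point, a minimal database data-config.xml might look like this
(driver, URL, credentials, and table/column names are placeholders, not from
this thread):

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="db_user" password="db_pass"/>
  <document>
    <entity name="item" query="SELECT id, name FROM item">
      <!-- map result-set columns to schema fields -->
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
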
-- 
Regards,
Shalin Shekhar Mangar.


Re: Keyword Density

2009-06-22 Thread Chris Hostetter
: Date: Wed, 3 Jun 2009 10:19:06 -0700 (PDT)
: From: Otis Gospodnetic
: Subject: Re: Keyword Density

: > > But I don't need to sort using this value. I need to cut results, where
: > > this value (for particular term of query!) not in some range.

: I don't think this is possible without changing Solr. Or maybe it's 
: possible with a custom Search Component that looks at all hits and 
: checks the "tf" (term frequency) for a term in each document?  
: Sounds like a very costly operation...

FWIW: The best place to try and tackle something like this would probably 
be to write a new subclass of FilteredTermDocs that only returns 
docs/frequencies where the freq is in the range you are interested 
in.  Then use your new FilteredTermDocs class in a subclass of TermQuery 
when constructing a TermScorer.  *Then* use your new TermQuery subclass in 
a custom Solr QParser.

It can be done efficiently, but it definitely requires making some low 
level changes to the code.
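
A rough, hedged sketch of the first step, built on Lucene's
FilterIndexReader.FilterTermDocs (the TermQuery/TermScorer and QParser
wiring is left out):

import java.io.IOException;
import org.apache.lucene.index.FilterIndexReader;
import org.apache.lucene.index.TermDocs;

/** Skips documents whose within-document frequency falls outside [min, max]. */
public class FreqRangeTermDocs extends FilterIndexReader.FilterTermDocs {
  private final int min, max;

  public FreqRangeTermDocs(TermDocs in, int min, int max) {
    super(in);
    this.min = min;
    this.max = max;
  }

  public boolean next() throws IOException {
    while (in.next()) {
      int f = in.freq();
      if (f >= min && f <= max) return true;  // keep only in-range docs
    }
    return false;
  }
}

A complete version would apply the same check in skipTo(), since scorers use
both.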



-Hoss



Re: Sending Mlt POST request

2009-06-22 Thread Chris Hostetter
: I wish to send an MLT request to Solr and filter the result by a list of 
: values for a specific field.  The problem is sometimes the list can include 
: thousands of values and it's impossible to send such a GET request.
: 
: Sending this request as POST didn't work well... Is POST supported by 
: MLT? If not, is it supposed to be added in one of the next versions? 
: Or is there a different solution maybe?

POST to any RequestHandler should work fine ... provided the POST is 
structured correctly.

What exactly is the behavior you are seeing (i.e., an error message?)
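
For reference, a form-encoded POST from curl would look something like this
(a sketch; it assumes the MoreLikeThisHandler is registered at /mlt):

curl http://localhost:8983/solr/mlt \
     --data-urlencode 'q=id:12345' \
     --data-urlencode 'mlt.fl=title,body' \
     --data-urlencode 'fq=myfield:(v1 OR v2 OR v3)'

curl sends this as application/x-www-form-urlencoded, which Solr's request
handlers parse just like query-string parameters.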




-Hoss



Re: searchcomponent howto ...

2009-06-22 Thread Chris Hostetter
: and then ask,
: - how can I set the value of the query so that it is reflected in the 'q'
: node of the search results, e.g. solr.
: The example 'process' method above works, but the original query is still
: written to the search results page.

If you're talking about the param values that get written out in the 
header section, those always contain the "original" params (either from 
the URL or from defaults in configs) ... I don't think you can modify 
those easily.

Your component can always add your new "q" value to the response as a 
new object (with whatever name you want), and your client code can get at 
it that way.
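
A minimal sketch of that inside a SearchComponent's process method (the
response key name is your choice; rewrite() stands in for your own logic):

public void process(ResponseBuilder rb) throws IOException {
    String rewritten = rewrite(rb.getQueryString());
    // the echoed params in the header keep the original q;
    // this exposes the rewritten query as a separate response object
    rb.rsp.add("rewrittenQuery", rewritten);
}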


-Hoss



Re: Schema vs Dynamic Fields

2009-06-22 Thread Chris Hostetter
: Date: Mon, 08 Jun 2009 16:44:45 -0700
: From: Phil Hagelberg
: Subject: Schema vs Dynamic Fields

: Is the use of a predefined schema primarily a "type safety" feature?
: We're considering using Solr for a data set that is very free-form; will
: we get much slower results if the majority of our data is in a dynamic
: field such as:
: 
:   [dynamicField declaration stripped by the list archive]
: 
: I'm a little unclear on the trade-offs involved and would appreciate
: a hint.

There is some cost involved in every new "field" that exists in your index 
(regardless of whether it was explicitly declared or sprang into existence 
because of a dynamicField declaration), but there are ways to mitigate some 
of those costs (omitNorms=true being a big one).

In general the big advantage to explicitly declaring fields is that you 
can customize their analysis/datatypes ... you can do similar things by 
having "type specific" dynamic fields, but then your field names must 
follow set conventions based on data type.
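
The conventional sketch of type-specific dynamic fields (as in the stock
example schema; the suffixes are just a naming convention):

<!-- the suffix picks the data type: foo_s is a string, foo_i an int, ... -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_i" type="sint"   indexed="true" stored="true"/>
<dynamicField name="*_t" type="text"   indexed="true" stored="true"/>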



-Hoss



Re: no .war with ubuntu release ?

2009-06-22 Thread Chris Hostetter

: Date: Thu, 18 Jun 2009 19:00:18 -0400
: From: Jonathan Vanasco
: Subject: no .war with ubuntu release ?


: after countless searching, it seems that there is no .war file in the distro

: http://packages.ubuntu.com/hardy/all/solr-common/filelist
: http://packages.ubuntu.com/hardy/all/solr-jetty/filelist
: 
: as you can see, there is no .war

I'm not familiar with the Ubuntu packaging, but by the looks of those file 
lists, they have unzipped the solr.war into /usr/share/solr/ (note the 
WEB-INF directory).

The interesting thing about Java "webapps" is that they can be distributed 
as a "war" file or as a directory ... most servlet containers actually 
unzip the war file into a directory on local disk anyway (so they don't 
have to keep the whole thing in memory), and it looks like the Ubuntu 
packagers just decided to package the uncompressed webapp in the .deb 
instead of having a war in there that would get uncompressed on first 
usage.

That's just a theory, however, and doesn't explain why it isn't working for 
you.

Presumably somewhere in one of the Jetty config files there should be a 
reference to /usr/share/ as the place to find webapps, and a reference to 
/etc/solr as the Solr home directory.



-Hoss



THIS WEEK: PNW Hadoop / Apache Cloud Stack Users' Meeting, Wed Jun 24th, Seattle

2009-06-22 Thread Bradford Stephens
Hey all, just a friendly reminder that this is Wednesday! I hope to see
everyone there again. Please let me know if there's something interesting
you'd like to talk about -- I'll help however I can. You don't even need a
PowerPoint presentation -- there are many whiteboards. I'll try to have a
video cam, but no promises.
Feel free to call at 904-415-3009 if you need directions or have any questions :)

~~`

Greetings,

On the heels of our smashing success last month, we're going to be
convening the Pacific Northwest (Oregon and Washington)
Hadoop/HBase/Lucene/etc. meetup on the last Wednesday of June, the
24th.  The meeting should start at 6:45, organized chats will end
around  8:00, and then there shall be discussion and socializing :)

The meeting will be at the University of Washington in
Seattle again. It's in the Computer Science building (not electrical
engineering!), room 303, located here:
http://www.washington.edu/home/maps/southcentral.html?80,70,792,660

If you've ever wanted to learn more about distributed computing, or
just see how other people are innovating with Hadoop, you can't miss
this opportunity. Our focus is on learning and education, so every
presentation must end with a few questions for the group to research
and discuss. (But if you're an introvert, we won't mind).

The format is two or three 15-minute "deep dive" talks, followed by
several 5 minute "lightning chats". We had a few interesting topics
last month:

-Building a Social Media Analysis company on the Apache Cloud Stack
-Cancer detection in images using Hadoop
-Real-time OLAP on HBase -- is it possible?
-Video and Network Flow Analysis in Hadoop vs. Distributed RDBMS
-Custom Ranking in Lucene

We already have one "deep dive" scheduled this month, on truly
scalable Lucene with Katta. If you've been looking for a way to handle
those large Lucene indices, this is a must-attend!

Looking forward to seeing everyone there again.

Cheers,
Bradford

http://www.roadtofailure.com -- The Fringes of Distributed Computing,
Computer Science, and Social Media.


Re: DataImportHandler configuration - externalizing environment-specific settings?

2009-06-22 Thread Erik Hatcher
Ah, thanks Noble.  I should have figured that one out myself - I think 
the built-in capability of setting a parameter from the handler mapping 
will do the trick nicely, indirecting it from a system property.
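
Concretely, the FAQ's trick is to substitute a system property in the
handler mapping (where ${...} substitution does work) and read it back in
data-config.xml via ${dataimporter.request.*}. A hedged sketch with made-up
property names:

<!-- solrconfig.xml -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="invariants">
    <str name="jdbcurl">${dih.jdbc.url:jdbc:mysql://localhost/test}</str>
  </lst>
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

<!-- data-config.xml -->
<dataSource driver="com.mysql.jdbc.Driver"
            url="${dataimporter.request.jdbcurl}"/>

Starting Solr with -Ddih.jdbc.url=... then selects the environment-specific
value.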


Erik

On Jun 21, 2009, at 11:49 PM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



There is no straight way but there is a way
http://wiki.apache.org/solr/DataImportHandlerFaq#head-c4003ab5af86a200b35cf6846a58913839a5a096

On Mon, Jun 22, 2009 at 6:23 AM, Erik Hatcher
 wrote:


In an environment where there are developer machines, test,  
staging, and production servers there is a need to externalize DIH  
configuration options like JDBC connections strings (at least the  
database server name), username, password, and base paths for XML  
and plain text files.


How are folks handling this currently?  Didn't seem to be a way to  
use system properties like we can in solrconfig/schema.xml files  
using ${sys.property[:defaultValue]} syntax.  Having system  
properties be available in the variable resolver would be quite  
useful.  Is this already there and I missed it?


Thanks,
   Erik





--
-
Noble Paul | Principal Engineer| AOL | http://aol.com




Re: ExtractRequestHandler - not properly indexing office docs?

2009-06-22 Thread Grant Ingersoll

What's your default search field?

On Jun 22, 2009, at 12:29 PM, cloax wrote:



Yep, I've tried both of those and still no joy. Here's both my curl
statement and the resulting Solr log output.

curl "http://localhost:8983/solr/update/extract?ext.def.fl=text&ext.literal.id=1&ext.map.div=text&ext.capture=div" \
-F "myfile=@dj_character.doc"

Curl's output:

<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">317</int></lst>
</response>

Solr log:
Jun 22, 2009 12:21:42 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract params={ext.map.div=text&ext.def.fl=text&ext.capture=div&ext.literal.id=1} status=0 QTime=544
Jun 22, 2009 12:22:26 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[1]} 0 317
Jun 22, 2009 12:22:26 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract params={ext.map.div=text&ext.def.fl=text&ext.capture=div&ext.literal.id=1} status=0 QTime=317
Jun 22, 2009 12:22:37 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={wt=standard&rows=10&start=0&explainOther=&hl.fl=&indent=on&q=kondel&fl=*,score&qt=standard&version=2.2} hits=0 status=0 QTime=2

The submitted document has "kondel" in it numerous times, so Solr should
have a hit. Yet it returns nothing. I also made sure I committed, but that
didn't seem to help either.


Grant Ingersoll-6 wrote:


Do you have a default field declared?  &ext.default.fl=
Either that, or you need to explicitly capture the fields you are
interested in using &ext.capture=

You could add this to your curl statement to try out.

-Grant




--
View this message in context: 
http://www.nabble.com/ExtractRequestHandler---not-properly-indexing-office-docs--tp24120125p24150763.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Solr Authentication Problem

2009-06-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
I have raised an issue https://issues.apache.org/jira/browse/SOLR-1238

There is a patch attached to the issue.
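
Until the patch lands, a workaround often suggested for this HttpClient 3.x
error is preemptive authentication, so the unbuffered POST body never has to
be replayed after a 401 challenge. A minimal sketch (untested against this
exact build):

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

HttpClient client = new HttpClient();
// send credentials with the first request instead of waiting for a
// 401 challenge that would force the POST body to be repeated
client.getParams().setAuthenticationPreemptive(true);
client.getState().setCredentials(
    new AuthScope(AuthScope.ANY_HOST, AuthScope.ANY_PORT),
    new UsernamePasswordCredentials("guest", "guest"));
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr", client);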


On Mon, Jun 22, 2009 at 1:40 PM, Allahbaksh Asadullah
 wrote:
>
> Hi All,
> I am getting an error when I use authentication in Solr. I
> followed the wiki. The error does not appear when I am searching. Below
> are the code snippet and the error.
>
> Please note I am using Solr 1.4 Development build from SVN.
>
>
>                        HttpClient client=new HttpClient();
>
>                        AuthScope scope = new AuthScope(AuthScope.ANY_HOST,
> AuthScope.ANY_PORT,null, null);
>
>                        client.getState().setCredentials(
>
>                               scope,
>
>                                new UsernamePasswordCredentials("guest", 
> "guest")
>
>                                );
>
>                        SolrServer server =new
> CommonsHttpSolrServer("http://localhost:8983/solr",client);
>
>
>
>
>
>                        SolrInputDocument doc1=new SolrInputDocument();
>
>                        //Add fields to the document
>
>                        doc1.addField("employeeid", "1237");
>
>                        doc1.addField("employeename", "Ann");
>
>                        doc1.addField("employeeunit", "etc");
>
>                        doc1.addField("employeedoj", "1995-11-31T23:59:59Z");
>
>                        server.add(doc1);
>
>
>
>
>
> Exception in thread "main"
> org.apache.solr.client.solrj.SolrServerException:
> org.apache.commons.httpclient.ProtocolException: Unbuffered entity
> enclosing request can not be repeated.
>
>        at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:468)
>
>        at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
>
>        at 
> org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259)
>
>        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)
>
>        at test.SolrAuthenticationTest.<init>(SolrAuthenticationTest.java:49)
>
>        at test.SolrAuthenticationTest.main(SolrAuthenticationTest.java:113)
>
> Caused by: org.apache.commons.httpclient.ProtocolException: Unbuffered
> entity enclosing request can not be repeated.
>
>        at 
> org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487)
>
>        at 
> org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
>
>        at 
> org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
>
>        at 
> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
>
>        at 
> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
>
>        at 
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
>
>        at 
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
>
>        at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:415)
>
>        ... 5 more
>
> Thanks and regards,
> Allahbaksh



--
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: ExtractRequestHandler - not properly indexing office docs?

2009-06-22 Thread cloax

I've tried 'text' (taken from the example config) and then tried creating a
new field called doc_content and using that. Neither has worked.

Grant Ingersoll-6 wrote:
> 
> What's your default search field?
> 
> On Jun 22, 2009, at 12:29 PM, cloax wrote:
> 
>>
>> Yep, I've tried both of those and still no joy. Here's both my curl  
>> statement
>> and the resulting Solr log output.
>>
>> curl "http://localhost:8983/solr/update/extract?ext.def.fl=text&ext.literal.id=1&ext.map.div=text&ext.capture=div" \
>> -F "myfile=@dj_character.doc"
>>
>> Curl's output:
>>
>> <response>
>> <lst name="responseHeader"><int name="status">0</int><int name="QTime">317</int></lst>
>> </response>
>>
>> Solr log:
>> Jun 22, 2009 12:21:42 PM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/update/extract params={ext.map.div=text&ext.def.fl=text&ext.capture=div&ext.literal.id=1} status=0 QTime=544
>> Jun 22, 2009 12:22:26 PM org.apache.solr.update.processor.LogUpdateProcessor finish
>> INFO: {add=[1]} 0 317
>> Jun 22, 2009 12:22:26 PM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/update/extract params={ext.map.div=text&ext.def.fl=text&ext.capture=div&ext.literal.id=1} status=0 QTime=317
>> Jun 22, 2009 12:22:37 PM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/select params={wt=standard&rows=10&start=0&explainOther=&hl.fl=&indent=on&q=kondel&fl=*,score&qt=standard&version=2.2} hits=0 status=0 QTime=2
>>
>> The submitted document has "kondel" in it numerous times, so Solr should
>> have a hit. Yet it returns nothing. I also made sure I committed, but that
>> didn't seem to help either.
>>
>>
>> Grant Ingersoll-6 wrote:
>>>
>>> Do you have a default field declared?  &ext.default.fl=
>>> Either that, or you need to explicitly capture the fields you are
>>> interested in using &ext.capture=
>>>
>>> You could add this to your curl statement to try out.
>>>
>>> -Grant
>>>
>>
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/ExtractRequestHandler---not-properly-indexing-office-docs--tp24120125p24150763.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
> using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/ExtractRequestHandler---not-properly-indexing-office-docs--tp24120125p24159267.html
Sent from the Solr - User mailing list archive at Nabble.com.