Re: How do I secure solr server?

2008-02-21 Thread matt davies

Hi Mel

One method is to limit the access to the web backend by only having it  
respond to 127.0.0.1.
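A minimal sketch of the loopback idea, using Python's standard `http.server` only as a stand-in for the servlet container. This is an assumption: in Jetty the equivalent would be a host setting on the connector in jetty.xml, which this code does not show. Binding the listening socket to 127.0.0.1 means only clients on the same machine can connect.

```python
from http.server import HTTPServer, BaseHTTPRequestHandler


class PingHandler(BaseHTTPRequestHandler):
    """Hypothetical trivial handler standing in for the Solr webapp."""

    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")


def make_local_only_server(port=0):
    # Binding to 127.0.0.1 (rather than 0.0.0.0) means the socket only
    # accepts connections that arrive on the loopback interface.
    # port=0 asks the OS for any free port; call serve_forever() to run it.
    return HTTPServer(("127.0.0.1", port), PingHandler)
```

Note that in Mel's setup, with Solr on a different machine from the application server, a loopback-only listener would also shut out the application server, so this fits best when something local, such as a proxy, forwards requests to Solr.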


I'm not certain here, but I think that to do that you need to add the
access-limiting configuration to your servlet container, which may be different from ours.


For instance, in our case we edited jetty.xml.

I hope this is of some help to get you started looking. I've probably
got a lot of the terminology wrong there, and some facts :-)


Might help though.

matt





On 21 Feb 2008, at 06:46, Mel Brand wrote:


Hi guys,

I run solr on a separate server from the application server and I'd
like to know how to protect it. I'd like to know how to prevent
someone from communicating to the server and also prevent unauthorized
access (through the web) to admin page.

Any help is extremely appreciated!! :)

Thanks,

Mel




Re: How do I secure solr server?

2008-02-21 Thread Thorsten Scherler
On Thu, 2008-02-21 at 01:46 -0500, Mel Brand wrote:
> Hi guys,
> 
> I run solr on a separate server from the application server and I'd
> like to know how to protect it. 

Best with a firewall.

> I'd like to know how to prevent
> someone from communicating to the server and also prevent unauthorized
> access (through the web) to admin page.

I would not expose http://yourServer:8983 at all. I would use an Apache
httpd server as a proxy and implement the access control there.
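The access-control decision such a proxy would make can be sketched as follows. This is a Python illustration of the policy only, not httpd configuration; the addresses, subnets, and path prefixes are hypothetical placeholders.

```python
import ipaddress

# Hypothetical policy values: replace with your own application server
# address and administrators' subnet.
APP_SERVERS = [ipaddress.ip_network("10.0.0.5/32")]
ADMIN_NETS = [ipaddress.ip_network("10.0.1.0/24")]


def is_allowed(client_ip: str, path: str) -> bool:
    """Decide whether the proxy should forward this request to Solr."""
    addr = ipaddress.ip_address(client_ip)

    def in_any(nets):
        return any(addr in net for net in nets)

    if path.startswith("/solr/admin"):
        # Admin pages: administrators' subnet only.
        return in_any(ADMIN_NETS)
    if path.startswith("/solr/"):
        # Search and update requests: the application server (and admins).
        return in_any(APP_SERVERS) or in_any(ADMIN_NETS)
    # Everything else is never forwarded.
    return False
```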

salu2

> 
> Any help is extremely appreciated!! :)
> 
> Thanks,
> 
> Mel
-- 
Thorsten Scherler thorsten.at.apache.org
Open Source Java  consulting, training and solutions



spellchecker and extendedResults

2008-02-21 Thread Suslik

Hi, I'm using the spellchecker from a Solr nightly build (2008-02-20), and it does
not show extended results when I set the extendedResults option. What may be the
reason? Also, it sorts matches in a different order depending on the
suggestionCount parameter. Is it normal that the sort order changes when I
change suggestionCount? And I think it does not take the frequency of the
suggested word into account at all.
The same happens with the 1.2 release.
What may be wrong?

Vic.
-- 
View this message in context: 
http://www.nabble.com/spellchecker-and-extendedResults-tp15607286p15607286.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: YAML update request handler

2008-02-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi,
The format over the wire is not of great significance, because it gets
unmarshalled into the corresponding language object as soon as it comes off
the wire. I would say XML/JSON should meet 99% of the requirements,
because all the platforms come with an unmarshaller for both of these.

But if it can offer a good performance improvement, it is worth trying.
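To illustrate the point that both formats end up as the same in-memory object, here is a small Python sketch. The XML follows Solr's `<add><doc><field name=...>` update layout; the flat JSON object is an illustrative assumption, not a Solr wire format, and multi-valued fields are ignored.

```python
import json
import xml.etree.ElementTree as ET

# The same made-up document in two wire formats.
XML_DOC = """<add><doc>
  <field name="id">doc1</field>
  <field name="title">Solr rocks</field>
</doc></add>"""
JSON_DOC = '{"id": "doc1", "title": "Solr rocks"}'


def doc_from_xml(text):
    # Collect the <field name="...">value</field> pairs of the first <doc>.
    # (Multi-valued fields would need lists; ignored in this sketch.)
    doc = ET.fromstring(text).find("doc")
    return {f.get("name"): f.text for f in doc.findall("field")}


def doc_from_json(text):
    return json.loads(text)
```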
--Noble

On Thu, Feb 21, 2008 at 3:41 AM, alexander lind <[EMAIL PROTECTED]> wrote:

> On Feb 20, 2008, at 9:31 AM, Doug Steigerwald wrote:
>
> > A few months back I wrote a YAML update request handler to see if we
> > could post documents faster than with XMl.  We did see some small
> > speed improvements (didn't write down the numbers), but the hacked
> > together code was probably making it slower as well.  Not sure if
> > there are faster YAML libraries out there either.
> >
> > We're not actually using it, since it was just a small proof of
> > concept type of project, but is this anything people might be
> > interested in?
> >
>
> Out of simple preference I would love to see a YAML request handler
> just because I like the YAML format. If it's also faster than XML, then
> all the better.
>
> Cheers
> Alec
>



-- 
--Noble Paul


Re: Solr in Windows XP + JDK 5 + Tomcat 6.0.13

2008-02-21 Thread Alejandro Valdez
Thanks a lot, it's running right now.

It seems that solr.solr.home should not point into the webapps
directory, maybe this tip should be included in the installation
guide...

Thanks again.


On Wed, Feb 20, 2008 at 10:50 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Wed, Feb 20, 2008 at 5:32 PM, Alejandro Valdez
>
> <[EMAIL PROTECTED]> wrote:
>
> > Hi, I changed that line to:
>  >
>  >  set JAVA_OPTS=-Dsolr.home=C:\xampp\tomcat\webapps\solr -Duser.language=en
>  >
>  >  But it STILL isn't working... I've almost given up :-(
>  >
>  >  When I try to open http://localhost:8080/solr/admin, I get:
>  >
>  > ---
>  >  HTTP Status 404 - /solr/admin
>  >  type Status report
>  >  message /solr/admin
>  >  description The requested resource (/solr/admin) is not available.
>  >  Apache Tomcat/6.0.13
>  >  ---
>  >
>  >
>  >  Someone should fix the page http://wiki.apache.org/solr/SolrTomcat;
>  >  it says -Dsolr.solr.home=... should be used:
>
>  solr.solr.home is the correct variable.
>  Try putting the solr home (the contents of solr/example) outside the
>  webapps directory.  Only solr.war should go inside webapps.
>
>  You could also try the "simple example install" from here:
>
>
> http://wiki.apache.org/solr/SolrTomcat
>
>  -Yonik
>


Different Filters

2008-02-21 Thread Owens, Martin
Hello all,

We have a requirement to be able to switch certain filters on and off for
different searches.

The problem is that these filters are defined in the schema, per field. We only
have one text field, so I was wondering if there is a way of setting the
filters in solrconfig.xml and creating a different search, a bit like the
example but with different filters.

Thoughts?

Best Regards, Martin Owens


Re: Different Filters

2008-02-21 Thread Yonik Seeley
On Thu, Feb 21, 2008 at 1:20 PM, Owens, Martin
<[EMAIL PROTECTED]> wrote:
>  We have a requirement for being able to switch on and off certain filters 
> for different searches.

Can the client send in which filters should be turned on and off, but
leave the definition of the filters in solrconfig.xml?

If so, you can get this effect with the new query parser plugin
framework.  Part of that includes what I call "local parameters" (not
really documented yet), which includes parameter dereferencing.

So you could add something like this to your query
fq=&fq=

Re: Solr in Windows XP + JDK 5 + Tomcat 6.0.13

2008-02-21 Thread David Pratt
Hi Alejandro. Since this gave you a bit of trouble, could you post the
steps you used to get it working (and/or any deviations from the wiki) to
summarize this thread? I've seen this thread on the list for some days now,
and a summary would leave something more useful than "I got it running"
for other folks with a similar issue in the future. Many thanks.


Regards
David


Companies Using Solr

2008-02-21 Thread Clay Webster
Hey Folks,

Reminder: http://wiki.apache.org/solr/PublicServers lists the sites using
Solr.  The listing is a bit thin.  I know many people don't know about the
list or don't have the time to add themselves to the list.  I'd like to be
able to promote open sourcing more systems (like Solr) and this information
would help show it is helping a large community.

Feel free to reply directly to me and I can add you.

Thanks.

--cw

Clay Webster
Associate VP, Platform Infrastructure
CNET, Inc. (Nasdaq:CNET)


RE: Different Filters

2008-02-21 Thread Owens, Martin

> Can the client send in which filters should be turned on and off, but
> leave the definition of the filters in solrconfig.xml?

The client must set the property, how solr deals with that is how I want it to 
work.

> If so, you can get this effect with the new query parser plugin
> framework.  Part of that includes what I call "local parameters" (not
> really documented yet), which includes parameter dereferencing.

In what version of Solr does this first appear? We're using a nightly build from
December which was heavily hacked to do database result returning and word
offset highlighting (and some other fixes), so we'd like to avoid using anything
newer.

> So you could add something like this to your query
> fq=&fq= and have the various filters be a default defined in a handler in 
> solrconfig.xml

How does this work? I'm still confused by your explanation. Are the query
options turning the filters on or off? What kind of handler would go into
solrconfig.xml?

Best Regards, Martin Owens


Re: Different Filters

2008-02-21 Thread Yonik Seeley
On Thu, Feb 21, 2008 at 1:49 PM, Owens, Martin
<[EMAIL PROTECTED]> wrote:
>  > So you could add something like this to your query
>  > fq=&fq=  > and have the various filters be a default defined in a handler in 
> solrconfig.xml
>
>  How does this work? I'm still confused by your explanation. Are the query
> options turning the filters on or off? What kind of handler would go into
> solrconfig.xml?

This feature was first committed 10/22/07

It's a simple indirection.
fq=myfield:myval
  is equivalent to
fq=&filter1=myfield:myval

Now put filter1 as a default in your handler (same as any other
default), and the client can turn on and off filter1 without knowing
what exactly it is.
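The local-params expression itself seems to have been lost from the archived text above. Assuming Solr's `{!parser v=$param}` dereferencing form, written here as `{!lucene v=$filter1}` (an assumption, not recovered from the original message), a client could build the two equivalent requests like this in Python:

```python
from urllib.parse import urlencode


def direct_filter(q, fq):
    # The client spells out the filter query itself.
    return urlencode([("q", q), ("fq", fq)])


def indirect_filter(q, param_name="filter1", value=None):
    # Assumed local-params syntax: the fq only points at another request
    # parameter via $param_name. If value is None, the client omits that
    # parameter and relies on a default configured in the request handler.
    params = [("q", q), ("fq", "{!lucene v=$%s}" % param_name)]
    if value is not None:
        params.append((param_name, value))
    return urlencode(params)
```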

-Yonik


Re: YAML update request handler

2008-02-21 Thread Walter Underwood
Python marshal format is worth a try. It is binary and can represent
the same data as JSON. It should be a good fit to Solr.

We benchmarked that against XML several years ago and it was 2X faster.
Of course, XML parsers are a lot faster now.
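A minimal Python sketch of the round trip Walter describes, using the standard `marshal` and `json` modules on a made-up result structure. It only shows that both encodings carry the same data; it makes no speed claim.

```python
import json
import marshal

# A made-up search result built from Python builtins only.
result = {
    "numFound": 2,
    "docs": [
        {"id": "doc1", "score": 1.5, "cat": ["electronics", "memory"]},
        {"id": "doc2", "score": 0.75, "cat": ["software"]},
    ],
}

as_marshal = marshal.dumps(result)            # binary bytes
as_json = json.dumps(result).encode("utf-8")  # text, encoded for the wire

# Both decode back to an equal Python object.
assert marshal.loads(as_marshal) == result
assert json.loads(as_json) == result
print(len(as_marshal), "bytes as marshal,", len(as_json), "bytes as JSON")
```

Keep in mind that the Python documentation describes marshal as Python-specific, subject to change between Python versions, and not safe for untrusted input, so any Solr-facing use would need its own stable definition of the format.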

wunder

On 2/21/08 10:50 AM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:

> XML can be a problem when it is really lengthy (lots of results, large
> results) such that a binary format could be useful in certain cases
> where we control both ends of the pipe (i.e. SolrJ.)  I've seen apps
> that deal with really large files wrapped in XML where the XML parsing
> takes a significant amount of time as compared to a more compact
> binary format.
> 
> I think it at least warrants profiling/testing.
> 
> -Grant
> 
> On Feb 21, 2008, at 12:07 PM, Noble Paul നോബിള്‍
> नोब्ळ् wrote:
> 
>> Hi,
>> The format over the wire is not of great significance, because it gets
>> unmarshalled into the corresponding language object as soon as it
>> comes off the wire. I would say XML/JSON should meet 99% of the
>> requirements, because all the platforms come with an unmarshaller for
>> both of these.
>> 
>> But if it can offer a good performance improvement, it is worth trying.
>> --Noble
>> 
>> On Thu, Feb 21, 2008 at 3:41 AM, alexander lind <[EMAIL PROTECTED]>
>> wrote:
>> 
>>> On Feb 20, 2008, at 9:31 AM, Doug Steigerwald wrote:
>>> 
>>>> A few months back I wrote a YAML update request handler to see if we
>>>> could post documents faster than with XML.  We did see some small
>>>> speed improvements (didn't write down the numbers), but the hacked
>>>> together code was probably making it slower as well.  Not sure if
>>>> there are faster YAML libraries out there either.
>>>>
>>>> We're not actually using it, since it was just a small proof of
>>>> concept type of project, but is this anything people might be
>>>> interested in?
>>>>
>>> 
>>> Out of simple preference I would love to see a YAML request handler
>>> just because I like the YAML format. If it's also faster than XML,
>>> then all the better.
>>> 
>>> Cheers
>>> Alec
>>> 
>> 
>> 
>> 
>> -- 
>> --Noble Paul
> 
> --
> Grant Ingersoll
> http://www.lucenebootcamp.com
> Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
> 
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
> 
> 
> 
> 
> 



Re: Solr in Windows XP + JDK 5 + Tomcat 6.0.13

2008-02-21 Thread Alejandro Valdez
Hello, yes of course.

I followed the instructions from
http://wiki.apache.org/solr/SolrTomcat (see below)
but instead of copying the example configuration files into the directory
c:\web\solr\ as explained on that page, I copied them into
c:\tomcat\webapps\solr and started Tomcat with:
-Dsolr.solr.home=c:\tomcat\webapps\solr

But it didn't work.


Apparently the directory used in the solr.solr.home variable MUST NOT
point inside Tomcat's webapps directory, or it will be ignored.

The environment I used was:
Windows XP Professional
XAMPP 1.6.4
Tomcat 6.0.13
Sun JDK 5


Updated content of http://wiki.apache.org/solr/SolrTomcat:

Tomcat on Windows
Single Solr app

1) Download and install Tomcat for Windows using the MSI
installer. Install it with the tcnative.dll file. Say you installed it
in c:\tomcat\
2) Check that Tomcat is installed correctly by going to
http://localhost:8080/
3) Change the c:\tomcat\conf\server.xml file to add the URIEncoding
attribute to the Connector element, as shown earlier on the wiki page.
4) Download and unzip the Solr distribution zip file into (say) c:\temp\solrZip\
5) Make a directory called solr where you intend the application
server to function, say c:\web\solr\ (Important: it must be outside
Tomcat's webapps directory)
6) Copy the contents of the example\solr directory,
c:\temp\solrZip\example\solr\, to c:\web\solr\
7) Stop the Tomcat service
8) Copy the *solr*.war file from c:\temp\solrZip\dist\ to the Tomcat
webapps directory c:\tomcat\webapps\
9) Rename the *solr*.war file to solr.war
10) Use the system tray icon to configure Tomcat to start with the
following Java option: -Dsolr.solr.home=c:\web\solr
11) Start the Tomcat service
12) Go to the Solr admin page to verify that the installation is
working. It will be at http://localhost:8080/solr/admin
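The key rule from this thread (the Solr home must live outside Tomcat's webapps directory) can be checked before starting Tomcat. A hypothetical Python helper, using the example paths from the steps above:

```python
from pathlib import PureWindowsPath


def solr_home_option(solr_home, webapps):
    """Hypothetical helper: build the -Dsolr.solr.home JVM option.

    Refuses any Solr home inside (or equal to) Tomcat's webapps directory,
    following the advice in this thread.
    """
    home = PureWindowsPath(solr_home)
    apps = PureWindowsPath(webapps)
    # PureWindowsPath compares case-insensitively, as Windows paths do.
    if home == apps or apps in home.parents:
        raise ValueError(f"Solr home {home} is inside {apps}")
    return f"-Dsolr.solr.home={home}"
```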


On Thu, Feb 21, 2008 at 4:38 PM, David Pratt <[EMAIL PROTECTED]> wrote:
> Hi Alejandro. Since this was a bit of trouble for you could you post the
>  steps you used to get it to work (and/or any deviation from the wiki) to
>  summarize this thread. It has been some days that I have seen the thread
>  on the list and it would leave something useful other than I got it
>  running for other folks with a similar issue in future. Many thanks.
>
>  Regards
>  David
>
>
>
>  Alejandro Valdez wrote:
>  > Thanks a lot, it's running right now.
>  >
>  > It seems that solr.solr.home should not point into the webapps
>  > directory, maybe this tip should be included in the installation
>  > guide...
>  >
>  > Thanks again.
>  >
>  >
>  > On Wed, Feb 20, 2008 at 10:50 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>  >> On Wed, Feb 20, 2008 at 5:32 PM, Alejandro Valdez
>  >>
>  >> <[EMAIL PROTECTED]> wrote:
>  >>
>  >>> Hi, I changed that line to:
>  >>  >
>  >>  >  set JAVA_OPTS=-Dsolr.home=C:\xampp\tomcat\webapps\solr 
> -Duser.language=en
>  >>  >
>  >>  >  But It STILL isn't working...I almost give up :-(
>  >>  >
>  >>  >  When I try to open http://localhost:8080/solr/admin, I get:
>  >>  >
>  >>  > ---
>  >>  >  HTTP Status 404 - /solr/admin
>  >>  >  type Status report
>  >>  >  message /solr/admin
>  >>  >  description The requested resource (/solr/admin) is not available.
>  >>  >  Apache Tomcat/6.0.13
>  >>  >  ---
>  >>  >
>  >>  >
>  >>  >  Someone should fix the page http://wiki.apache.org/solr/SolrTomcat,
>  >>  >  there says that should be used -Dsolr.solr.home=... :
>  >>
>  >>  solr.solr.home is the correct variable.
>  >>  Try putting the solr home (the contents of solr/example) outside the
>  >>  webapps directory.  Only solr.war should go inside webapps.
>  >>
>  >>  You could also try the "simple example install" from here:
>  >>
>  >>
>  >> http://wiki.apache.org/solr/SolrTomcat
>  >>
>  >>  -Yonik
>  >>
>  >
>


Re: Solr in Windows XP + JDK 5 + Tomcat 6.0.13

2008-02-21 Thread David Pratt
Hi Alejandro. Your summary is good and it should be of benefit to 
others. Thank you for taking the time to prepare it.


Regards,
David



Re: YAML update request handler

2008-02-21 Thread Grant Ingersoll
XML can be a problem when it is really lengthy (lots of results, large  
results) such that a binary format could be useful in certain cases  
where we control both ends of the pipe (i.e. SolrJ.)  I've seen apps  
that deal with really large files wrapped in XML where the XML parsing  
takes a significant amount of time as compared to a more compact  
binary format.


I think it at least warrants profiling/testing.
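A rough micro-benchmark along the lines Grant suggests, in Python. It is a sketch under clear assumptions: synthetic documents, Python's own ElementTree parser rather than the StAX parser Solr uses, and `marshal` standing in for "a more compact binary format". Its numbers say nothing definitive about Solr itself.

```python
import marshal
import timeit
import xml.etree.ElementTree as ET


def make_docs(n):
    # Synthetic documents; real field counts and sizes will change the numbers.
    return [{"id": f"doc{i}", "text": "lorem ipsum " * 50} for i in range(n)]


def to_xml(docs):
    # Solr-style <add><doc><field name="...">...</field></doc></add> payload.
    root = ET.Element("add")
    for d in docs:
        doc = ET.SubElement(root, "doc")
        for name, value in d.items():
            ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(root)


def compare(n=1000, repeat=5):
    """Best-of-`repeat` seconds to decode n documents from XML and from marshal."""
    docs = make_docs(n)
    xml_bytes = to_xml(docs)
    marshal_bytes = marshal.dumps(docs)

    def parse_xml():
        root = ET.fromstring(xml_bytes)
        return [{f.get("name"): f.text for f in doc} for doc in root]

    def parse_marshal():
        return marshal.loads(marshal_bytes)

    # Sanity check: both decoders must reproduce the same data.
    assert parse_xml() == parse_marshal() == docs
    return {
        "xml": min(timeit.repeat(parse_xml, number=1, repeat=repeat)),
        "marshal": min(timeit.repeat(parse_marshal, number=1, repeat=repeat)),
    }


if __name__ == "__main__":
    print(compare())
```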

-Grant


--
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ







RE: Different Filters

2008-02-21 Thread Owens, Martin


>> This feature was first committed 10/22/07

Great! It should be there, then.

> Now put filter1 as a default in your handler (same as any other
> default), and the client can turn on and off filter1 without knowing
> what exactly it is.

OK, so I have to add a new search handler into solrconfig.xml with a set name,
and I then use that in the query line to specify which field the search handler
should use?

Could you give an example, including the solrconfig or schema changes, that
shows the field and how it works with the English stemmer, for instance?

Sorry for being a dunce today, I'm just not sure I'm totally understanding 
everything.

Best Regards, Martin Owens


Re: Different Filters

2008-02-21 Thread Yonik Seeley
On Thu, Feb 21, 2008 at 2:17 PM, Owens, Martin
<[EMAIL PROTECTED]> wrote:
>  > Now put filter1 as a default in your handler (same as any other
>  > default), and the client can turn on and off filter1 without knowing
>  > what exactly it is.
>
>  OK, so I have to add a new search handler into solrconfig.xml with a set name,
>  and I then use that in the query line to specify which field the search handler
> should use?

What field??? or what filter?
I'm not really sure I still understand what you are trying to accomplish.
Perhaps if you have some explicit examples of what types of things
clients would send in as query parameters to Solr, and what types of
lucene queries you actually want to be generated.

-Yonik


Re: Different Filters

2008-02-21 Thread Yonik Seeley
OK, talk of different fields threw me.

To enable a client to turn on/off a specific filter without knowing
what that filter is,
add the following parameter to the query string when you want to turn
the filter on:
fq=

Then add a default for the filter1 param in lucene query syntax (like
+cat:electronics +inStock:true) to whatever handler you want to query
(refer to the examples in solrconfig.xml for how to do this).
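A toy Python model (not Solr's implementation) of the behavior described here: the client turns filter1 on by sending an fq that dereferences `$filter1`, and the handler's default supplies the actual query unless the request overrides it. The `{!lucene v=$filter1}` spelling is an assumption, since the exact expression was lost from the archived text.

```python
import re

# Matches the assumed local-params form {!parser v=$name}.
DEREF = re.compile(r"^\{!(\w+) v=\$(\w+)\}$")


def effective_filters(request, defaults):
    """Toy model: resolve each fq to the query text that would be applied.

    `request` maps parameter names to values, with "fq" as a list.
    Request parameters override handler defaults, like any other default.
    """
    merged = {**defaults, **{k: v for k, v in request.items() if k != "fq"}}
    filters = []
    for fq in request.get("fq", []):
        m = DEREF.match(fq)
        # Dereference $name, otherwise use the fq text as-is.
        filters.append(merged[m.group(2)] if m else fq)
    return filters
```

Usage: with `defaults = {"filter1": "+cat:electronics +inStock:true"}`, a request without the fq applies no filter, and a request with `fq={!lucene v=$filter1}` applies the default filter text without the client ever knowing it.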

-Yonik


On Thu, Feb 21, 2008 at 3:34 PM, Owens, Martin
<[EMAIL PROTECTED]> wrote:
>
>  > What field??? or what filter?
>  > I'm not really sure I still understand what you are trying to accomplish.
>  > Perhaps if you have some explicit examples of what types of things
>  > clients would send in as query parameters to Solr, and what types of
>  > lucene queries you actually want to be generated.
>
>  Oh dear, a complete breakdown.
>
>  OK, so our Perl-based software uses HTTP to send a request to Solr. We want
> our software to be able to control the query filters used with each
> search by modifying attributes in the HTTP query string, as I think you
> were suggesting. I need examples of how to implement what you were talking
> about.
>
>  Best Regards, Martin Owens
>


RE: Different Filters

2008-02-21 Thread Owens, Martin

> What field??? or what filter?
> I'm not really sure I still understand what you are trying to accomplish.
> Perhaps if you have some explicit examples of what types of things
> clients would send in as query parameters to Solr, and what types of
> lucene queries you actually want to be generated.

Oh dear, a complete breakdown.

OK, so our Perl-based software uses HTTP to send a request to Solr. We want
our software to be able to control the query filters used with each
search by modifying attributes in the HTTP query string, as I think you
were suggesting. I need examples of how to implement what you were talking
about.

Best Regards, Martin Owens


Re: wildcard query question

2008-02-21 Thread Chris Hostetter


: the record is found.  I was wondering how the colon character affects
: the search, and if there is another way to write a wildcard query.

Most likely the issue is that your analyzer is stripping out the colon
character, hence your normal phrase search works (because the colon is
stripped out both when indexing and querying) but your wildcards don't,
because the wildcard query string is not analyzed...

http://wiki.apache.org/lucene-java/LuceneFAQ#head-133cf44dd3dff3680c96c1316a663e881eeac35a
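A toy Python illustration of this explanation. The analyzer (lowercase, split on anything that is not a letter or digit) is an assumption, and the "phrase" check ignores term positions:

```python
import fnmatch
import re


def analyze(text):
    # Toy analyzer: lowercase, split on anything that is not a letter/digit,
    # so "ABC:123" is indexed as the terms "abc" and "123".
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


index_terms = set(analyze("Order ABC:123"))


def term_query(text):
    # Normal queries go through the same analyzer as the index
    # (positions ignored in this toy).
    return all(t in index_terms for t in analyze(text))


def wildcard_query(pattern):
    # Wildcard patterns are matched against indexed terms WITHOUT analysis.
    return any(fnmatch.fnmatchcase(term, pattern) for term in index_terms)
```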





-Hoss



Re: 2D Facet

2008-02-21 Thread Chris Hostetter

: say I have a parameter facet.field=STATE. For example we'll take 3D 
: faceting, so I'll need 2 more facet fields related to the first one. 
: Should we do something like this: 
: facet.field=STATE&f.STATE.facet.matrix=NAME&f.STATE.facet.matrix=INCOME
: Or for example we can have may be like this:
: facet.matrix=STATE,NAME,INCOME
: What would you suggest is better?

It's not something I've thought about too hard, but I was thinking along 
the lines of the first example.  So STATE is the main facet for the matrix, 
and the other facets are identified as values of the f.STATE.facet.matrix 
param ("matrix" isn't really the best word, it's more like a tree of 
facet values ... for each of the top N values in the "main" facet, you 
also get the top N values of the other facets listed).

That way you could have multiple facet trees, and a single facet could 
be part of more than one tree, it just couldn't be the main facet of 
more than one tree.  For example, imagine we want to facet cars...

  facet.limit=10 & 
  facet.field=STATE & 
  facet.field=MODEL & f.MODEL.facet.tree=COLOR & f.MODEL.facet.tree=YEAR &
  facet.field=TYPE  & f.TYPE.facet.tree=COLOR & f.TYPE.facet.tree=STATE

...that would give you completely independent facet counts for STATE, 
MODEL, and TYPE, but it would also tell you what the top 10 COLORs and 
YEARs are for each of the top 10 MODELs, and what the top 10 COLORs and 
STATEs are for each TYPE of car (even if not enough cars are in that state 
to show up in the main STATE facet)

...honestly: any permutation you want is possible, it's just a question of 
how to express it cleanly in key=val pair style input so it's easy to 
express over HTTP.
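One way to read Hoss's proposed syntax in code: a Python sketch that turns the key=val pairs into a map from each main facet to its nested facets. `f.<FIELD>.facet.tree` is the hypothetical parameter from this message, not an existing Solr parameter.

```python
from urllib.parse import parse_qsl


def facet_trees(query_string):
    """Map each main facet field to the fields nested under it.

    Parses the hypothetical f.<FIELD>.facet.tree syntax proposed above.
    """
    fields, trees = [], {}
    for key, value in parse_qsl(query_string):
        if key == "facet.field":
            fields.append(value)
        elif key.startswith("f.") and key.endswith(".facet.tree"):
            main = key[len("f."):-len(".facet.tree")]
            trees.setdefault(main, []).append(value)
    # A tree only makes sense for a field that is itself faceted on.
    return fields, {m: subs for m, subs in trees.items() if m in fields}
```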

: Also, where in Solr I could find something similar to take it as an 
: example? Where all this logic should be placed?

The logic could go in a custom RequestHandler, or a custom Component ... if 
you look at the FacetComponent class in the nightly builds of Solr you can 
see how the current Simple faceting code is handled ... the underlying 
methods (for getting counts using DocSet intersections) can still be 
reused, you just need to pass them additional "filter" DocSets from the 
"main" facet.




-Hoss
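The counting logic Hoss describes, where each top value of the main facet contributes an extra "filter" set that is intersected with each sub-facet value's set, can be sketched in Python with plain sets standing in for DocSets. The data and function names are hypothetical.

```python
# Hypothetical toy "index": doc id -> field values.
DOCS = {
    1: {"MODEL": "civic", "COLOR": "red", "STATE": "CA"},
    2: {"MODEL": "civic", "COLOR": "blue", "STATE": "NY"},
    3: {"MODEL": "civic", "COLOR": "red", "STATE": "NY"},
    4: {"MODEL": "golf", "COLOR": "blue", "STATE": "CA"},
}


def docset(field, value):
    # Stand-in for a DocSet: ids of docs where field == value.
    return {doc_id for doc_id, fields in DOCS.items() if fields[field] == value}


def top_values(field, base, limit):
    """Top `limit` (value, count) pairs for `field`, counting only docs in `base`."""
    counts = {}
    for value in {fields[field] for fields in DOCS.values()}:
        n = len(docset(field, value) & base)  # DocSet intersection
        if n:
            counts[value] = n
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


def facet_tree(main, subs, limit, base=None):
    """For each top value of `main`, also compute the top values of each sub-facet."""
    base = set(DOCS) if base is None else base
    tree = []
    for value, count in top_values(main, base, limit):
        # The main facet value acts as an extra "filter" DocSet.
        main_set = docset(main, value) & base
        tree.append((value, count, {s: top_values(s, main_set, limit) for s in subs}))
    return tree
```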



Re: Wild card searching not working properly

2008-02-21 Thread Chris Hostetter

Problems like this depend heavily on exactly what the fieldtype and 
"index" analyzer are for the field you are querying on.  It's important to 
keep in mind that wildcard and fuzzy queries are not "analyzed", so things 
like lowercasing and stemming have to be taken into account -- typically 
it's useful to use copyField to have a special version of your field with 
simplified analysis for doing wildcard searches on.
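A toy Python sketch of the copyField idea: the same text indexed into a stemmed field and a lowercase-only field, with wildcard patterns matched unanalyzed against each. The crude suffix-stripping "stemmer" and the field names `body` and `body_wild` are hypothetical, for illustration only.

```python
import fnmatch
import re


def stem(word):
    # Crude stand-in for a real stemmer (assumption): strip a few suffixes.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def index(text):
    """Index one text into two hypothetical fields, as a copyField would."""
    return {
        "body": {stem(t) for t in tokens(text)},  # lowercased + stemmed
        "body_wild": set(tokens(text)),           # lowercased only
    }


def wildcard(fields, field, pattern):
    # Wildcard patterns bypass analysis: no lowercasing, no stemming.
    return any(fnmatch.fnmatchcase(t, pattern) for t in fields[field])
```

With `f = index("Tomcat servers running")`, the pattern `runnin*` misses the stemmed `body` field (which holds `runn`) but matches `body_wild`, and `Tomcat*` matches neither because the pattern itself is never lowercased.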

As for your specific problem: given the limited information you've 
provided, no guesses immediately jump out at me as to what you should do 
to get things working the way you want ... it depends on your schema, and 
what the original text was in those 3 documents you want to match.

: For example, following are the search queries and the corresponding results
: tomcat* -> 3 results
: tomca* -> 0 results
: tom*at -> 0 results
: tom~at -> 0 results


-Hoss



Re: How to search multiphrase or a middle term in a word???

2008-02-21 Thread Chris Hostetter

Your "master" examples work part of the time because the 
WordDelimiterFilter can tell at index time that the capital M in the middle 
of the word is a good place to split on.
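The index-time hint mentioned here, splitting where a lowercase letter is followed by an uppercase one, can be sketched with a Python regular expression. It imitates only this one rule of WordDelimiterFilter and is not its implementation.

```python
import re


def split_on_case_change(word):
    """Split where a lowercase letter is directly followed by an uppercase one."""
    # Zero-width split: lookbehind for [a-z], lookahead for [A-Z].
    return re.split(r"(?<=[a-z])(?=[A-Z])", word)
```

Applied to the words in this question, `thomasMaster` yields `thomas` and `Master`, while `thomasmaster` offers no hint and stays a single token, which is why only one of them is found by a search for "master".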

Without hints like that at index time, the only way to do "middle of the 
word" searches is with wildcard-type queries -- which are really 
inefficient.

You might want to read a bit about N-Grams and consider using an 
NGramTokenizer to chunk up your input words into n-grams for easy searching 
on pieces of words.
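A minimal Python sketch of n-gram indexing over the words from this question. Gram sizes of 2 to 5 are arbitrary assumptions, the function names are hypothetical, and longer queries are handled by intersecting their 5-grams, which can over-match. Expect the index to grow considerably, since every word contributes many grams.

```python
def ngrams(word, min_n=2, max_n=5):
    """All substrings of length min_n..max_n, like an n-gram tokenizer emits."""
    word = word.lower()
    return {word[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(word) - n + 1)}


def build_index(words, min_n=2, max_n=5):
    # gram -> set of original words containing it
    index = {}
    for w in words:
        for g in ngrams(w, min_n, max_n):
            index.setdefault(g, set()).add(w)
    return index


def search(index, query, max_n=5):
    q = query.lower()
    if len(q) <= max_n:
        return set(index.get(q, set()))
    # Longer query: every max_n-gram of it must be present (may over-match).
    grams = [q[i:i + max_n] for i in range(len(q) - max_n + 1)]
    result = set(index.get(grams[0], set()))
    for g in grams[1:]:
        result &= index.get(g, set())
    return result
```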

:My index data is :
: srinivasan,sweetheart,thomasmaster,thomasMaster.(totally 4 words)

:Search data : vasan

:Like that, if i search for "hear", its return nothing. The result should

:Search data : Master or master

:But mainly, i am unable to search a middle term in a word. I have



-Hoss



Re: Indexing very large files.

2008-02-21 Thread David Thibault
All,
A while back I was running into an issue with a Java heap out of memory
error while indexing large files.  I figured out that was my own error due
to a misconfiguration of my Netbeans memory settings.

However, now that is fixed and I have stumbled upon a new error.  When
trying to upload files which include a Solr TextField value of 32MB or more
in size, I get the following error (uploading with SimplePostTool):


Solr returned an error: error reading input, returned 0
javax.xml.stream.XMLStreamException: error reading input, returned 0
    at com.bea.xml.stream.MXParser.fillBuf(MXParser.java:3709)
    at com.bea.xml.stream.MXParser.more(MXParser.java:3715)
    at com.bea.xml.stream.MXParser.nextImpl(MXParser.java:1936)
    at com.bea.xml.stream.MXParser.next(MXParser.java:1333)
    at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:318)
    at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)
    at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:902)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:280)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:237)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:613)

I suspect there's a setting somewhere that I'm overlooking that is causing
this, but after peering through solrconfig.xml and schema.xml I am
not seeing anything obvious (to me, anyway...=). The top of the stack
trace shows it failing in MXParser.fillBuf, which suggests that I'm
overloading the buffer (I assume due to too large a string).

Thanks in advance for any assistance,
Dave


Re: custom handler results don't seem to match manually entered query string

2008-02-21 Thread Chris Hostetter

Hmmm... everything seems right here.  

This may be a silly question, but 
you are calling rsp.add("response", docs_main.docList) in your custom 
handler, correct?

second question: how are you building up your query object?  the only 
thing I can think of is that you are constructing the TermQueries directly 
(without using the analyzer), so they don't match what's really in the 
index (ie: things aren't being lowercased or split on "." and "_"), 
but when you cut/paste the query string into the standard request handler it 
uses the QueryParser, which does the proper analysis.
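Here is a toy model of that mismatch in plain Java, with no Lucene classes. The analyze() method is a hypothetical stand-in for the field's real analyzer. It simply assumes that analyzer lowercases and splits on ".", "_" and whitespace, but the actual behaviour depends on the schema. A TermQuery built from the raw string looks for that exact term, and the index never contains it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalysisMismatch {

    // Hypothetical analyzer: lowercase, then split on '.', '_' and whitespace.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[._\\s]+")) {
            if (!t.isEmpty()) { // split() can yield a leading empty string
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> indexTerms = analyze("Travel Online_Archives CNN.com");
        // A term taken verbatim from the raw query text is not in the index.
        System.out.println(indexTerms.contains("Online_Archives"));
        // Analyzing the query text the same way yields terms that do exist.
        System.out.println(indexTerms.containsAll(analyze("Online_Archives")));
    }
}
```

In a real handler, the usual remedy is to build the query through the field's analyzer or a QueryParser rather than creating TermQuery objects from unanalyzed strings.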

what does debugQuery=true say about your query when you cut/paste the 
query string?

can you post the full code of your custom request handler?


: Hi,
: my problem is as follows: my request handler's code
: 
: filters = null;
: DocListAndSet docs_main = searcher.getDocListAndSet(query, filters, null,
: start, rows, flags);
: String querystr = query.toString();
: rsp.add("QUERY_main", querystr);
: 
: 
: gives zero responses:
: 
:  ((text:Travel text:Home text:Online_Archives
: text:Ireland text:Consumer_Information text:Regional text:Europe text:News
: text:Complaints text:CNN.com text:February text:Transport
: text:Airlines)^0.3)
:   
: 
: 
: While copying the "QUERY_main" string into the Solr admin returns plenty of results:
: 
: 
: (text:Travel text:Home text:Online_Archives text:Ireland
: text:Consumer_Information text:Regional text:Europe text:News
: text:Complaints text:CNN.com text:February text:Transport text:Airlines)^0.3
: 
: 10
: 2.2
: 
: 
:   
: 
: 
: 
: Please help me understand what's going on, I'm a bit confused atm. Thanks
: :-)

-Hoss


Re: Multi field queries

2008-02-21 Thread Chris Hostetter

: Documents in my solr index have three fields: name, content and summary.
: Suppose the user query is "java sky democratic". I want the resulting
: documents to contain all the terms in the query ("java sky democratic") in
: either name, content or summary (e.g., java and sky are in the
: content and democratic is in the summary).

take a look at the "dismax" request handler.  it is designed explicitly 
for this purpose.  

http://wiki.apache.org/solr/DisMaxRequestHandler

(NOTE: if you want all the input words to be required, set mm=100%)
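Here is a minimal sketch of building such a request URL from Java. It assumes the example Jetty setup at localhost:8983. It also assumes a dismax handler is registered in solrconfig.xml under the name "dismax" and is selected with the qt parameter. The class and helper names are hypothetical. The q, qf and mm parameters are the dismax parameters described on the wiki page. Note that the "%" in mm must be URL-encoded as %25.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class DismaxUrl {

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8"); // spaces become '+', '%' becomes %25
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Appends params to baseUrl in insertion order as a query string.
    static String buildUrl(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(enc(e.getKey())).append('=').append(enc(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("qt", "dismax");               // assumed handler name
        params.put("q", "java sky democratic");   // raw user input, no field prefixes
        params.put("qf", "name content summary"); // fields each word may match in
        params.put("mm", "100%");                 // require every word to match somewhere
        System.out.println(buildUrl("http://localhost:8983/solr/select", params));
    }
}
```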


-Hoss



Offsets in results?

2008-02-21 Thread Steve Suppe

Hi all,

I apologize if this question has been asked, but I've been unable to find 
the answer in the archives.


Is there a way to get the offsets for results from a search in JSON format, 
or even from SOLR in general (regardless of format, XML even)?  As in, I 
have fields set that I am searching over, and am returning those fields in 
my JSON object, and would like to know either a) the offsets of those 
fields or b) the offsets of the part that matched.


I understand that there is a highlighting plugin, but for various reasons 
I'd like to get my hands on the offsets themselves.  Is this something I 
need to hack up on my own?
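If a do-it-yourself route is acceptable, one naive client-side approach is to scan the stored field values returned in the response for the query terms. The sketch below is hypothetical and deliberately ignores Solr's analysis chain. It only finds case-insensitive literal matches, so stemmed forms, synonyms and tokenization differences are not reflected. It also assumes lowercasing does not change string length, which holds for ASCII text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class NaiveOffsets {

    /** Returns [start, end) offsets of each case-insensitive, non-overlapping occurrence of term. */
    static List<int[]> offsets(String text, String term) {
        List<int[]> result = new ArrayList<>();
        if (term.isEmpty()) {
            return result;
        }
        String hay = text.toLowerCase(Locale.ROOT);
        String needle = term.toLowerCase(Locale.ROOT);
        int from = 0;
        int idx;
        while ((idx = hay.indexOf(needle, from)) >= 0) {
            result.add(new int[] {idx, idx + needle.length()});
            from = idx + needle.length(); // continue after this match
        }
        return result;
    }

    public static void main(String[] args) {
        String field = "Solr is built on Lucene; solr adds an HTTP layer.";
        for (int[] span : offsets(field, "solr")) {
            System.out.println(span[0] + "-" + span[1]);
        }
    }
}
```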


Thanks,
Steve Suppe


Re: YAML update request handler

2008-02-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
For the case where we use Solrj (we control both ends), it is best to resort
to a custom binary format. It is the fastest and has the least cost/bandwidth.
We can use a lightweight custom object serialization/deserialization
mechanism (Java standard serialization is verbose).

I can create a patch which can be used for the same if you think it is
useful.
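Here is a hypothetical codec for a flat document of string fields that illustrates the kind of compact, length-prefixed encoding described above. It is not SolrJ's wire format, and the class and method names are made up. It writes a field count, then each name and value as a 4-byte length followed by UTF-8 bytes. Java's standard serialization (ObjectOutputStream) also writes class descriptors, which is part of why it is more verbose.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class CompactDocCodec {

    // 4-byte length prefix followed by the UTF-8 bytes (no 64KB limit, unlike writeUTF).
    static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(b.length);
        out.write(b);
    }

    static String readString(DataInputStream in) throws IOException {
        byte[] b = new byte[in.readInt()];
        in.readFully(b);
        return new String(b, StandardCharsets.UTF_8);
    }

    // Layout: [field count][name][value][name][value]...
    static byte[] encode(Map<String, String> doc) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(doc.size());
            for (Map.Entry<String, String> field : doc.entrySet()) {
                writeString(out, field.getKey());
                writeString(out, field.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Map<String, String> decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int fieldCount = in.readInt();
            Map<String, String> doc = new LinkedHashMap<>();
            for (int i = 0; i < fieldCount; i++) {
                String name = readString(in);
                doc.put(name, readString(in));
            }
            return doc;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("name", "YAML update handler");
        byte[] wire = encode(doc);
        System.out.println(wire.length + " bytes, round trip ok: " + doc.equals(decode(wire)));
    }
}
```

A real format would also need type tags for non-string values, multi-valued fields and some form of versioning.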

--Noble



On Fri, Feb 22, 2008 at 12:20 AM, Grant Ingersoll <[EMAIL PROTECTED]>
wrote:

> XML can be a problem when it is really lengthy (lots of results, large
> results) such that a binary format could be useful in certain cases
> where we control both ends of the pipe (i.e. SolrJ.)  I've seen apps
> that deal with really large files wrapped in XML where the XML parsing
> takes a significant amount of time as compared to a more compact
> binary format.
>
> I think it at least warrants profiling/testing.
>
> -Grant
>
> On Feb 21, 2008, at 12:07 PM, Noble Paul നോബിള്‍
> नोब्ळ् wrote:
>
> > hi,
> > The format over the wire is not of great significance because it gets
> > unmarshalled into the corresponding language object as soon as it
> > comes out
> > of the wire. I would say XML/JSON should meet 99% of the requirements
> > because all the platforms come with an unmarshaller for both of these.
> >
> > But if it offers a good performance improvement, it is worth trying.
> > --Noble
> >
> > On Thu, Feb 21, 2008 at 3:41 AM, alexander lind <[EMAIL PROTECTED]>
> > wrote:
> >
> >> On Feb 20, 2008, at 9:31 AM, Doug Steigerwald wrote:
> >>
> >>> A few months back I wrote a YAML update request handler to see if we
> >>> could post documents faster than with XML.  We did see some small
> >>> speed improvements (didn't write down the numbers), but the hacked
> >>> together code was probably making it slower as well.  Not sure if
> >>> there are faster YAML libraries out there either.
> >>>
> >>> We're not actually using it, since it was just a small proof of
> >>> concept type of project, but is this anything people might be
> >>> interested in?
> >>>
> >>
> >> Out of simple preference I would love to see a YAML request handler
> >> just because I like the YAML format. If it's also faster than XML,
> >> then
> >> all the better.
> >>
> >> Cheers
> >> Alec
> >>
> >
> >
> >
> > --
> > --Noble Paul
>
> --
> Grant Ingersoll
> http://www.lucenebootcamp.com
> Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ


-- 
--Noble Paul