Error When Switching to Tomcat

2010-11-14 Thread Eric Martin
Hi,

 

I have been using Jetty on my linux/apache webserver for about 3 weeks now.
I decided that I should change to Tomcat after realizing I will be indexing
a lot of URLs and Jetty is good for small production sites as noted in the
Wiki. I am running into this error:

 

org.apache.solr.common.SolrException: Schema Parsing Failed
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:656)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:95)
    at org.apache.solr.core.SolrCore.<init>

 

My localhost/solr.xml:

<Context docBase="..." privileged="true" allowLinking="true" crossContext="true">
   <Environment name="solr/home" type="java.lang.String" value="/tomcat/webapps/solr/" override="true" />
</Context>
 

My solrconfig.xml:

<dataDir>${solr.data.dir:/tomcat/webapps/solr/conf}</dataDir>

I can get to the 8080 Tomcat default page just fine.  I've gone over the
Wiki a couple of dozen times and verified that my solr.xml is configured
correctly based on trial and error and reading the error logs. I just can't
figure out where it is going wrong. I read there are three different ways to
do this. Can someone help me out?

 

I am using Solr 1.4.0 and Tomcat 5.5.30

 

Eric



Solr Negative query

2010-11-14 Thread Viswa S

Dear Solr/Lucene gurus,
I have run into a weird issue trying to use a negative condition in my query.

Parser: StandardQueryParser
My Query: Field1:Val1 NOT Field2:Val2
Resolved as: Field1:Val1 -Field2:Val2

The above query never returns any document, no matter how we use parentheses.
I did see some suggestions on LuceneQParser to use something like:

*:* Field1:Val1 -Field2:Val2

This seems to return some documents, however it seems to ignore the first
condition (Field1:Val1). Please help.

Thanks,
Vis

Re: Solr Negative query

2010-11-14 Thread Leonardo Menezes
try
Field1:Val1 AND (*:* NOT Field2:Val2), that should work ok

On Sun, Nov 14, 2010 at 9:02 AM, Viswa S  wrote:

>
> Dear Solr/Lucene gurus,
> I have run into a weird issue trying to use a negative condition in my query.
>
> Parser: StandardQueryParser
> My Query: Field1:Val1 NOT Field2:Val2
> Resolved as: Field1:Val1 -Field2:Val2
>
> The above query never returns any document, no matter how we use parentheses.
> I did see some suggestions on LuceneQParser to use something like:
>
> *:* Field1:Val1 -Field2:Val2
>
> This seems to return some documents, however it seems to ignore the first
> condition (Field1:Val1). Please help.
>
> Thanks,
> Vis


Re: A Newbie Question

2010-11-14 Thread K. Seshadri Iyer
Thanks for all the responses.

Govind: To answer your question, yes, all I want to search is plain text
files. They are located in NFS directories across multiple Solaris/Linux
storage boxes. The total storage is in hundreds of terabytes.

I have just got started with Solr and my understanding is that I will
somehow need Tika to help stream/upload files to Solr. I don't know anything
about Java programming, being a system admin. So far, I have read that the
autodetect parser in Tika will somehow detect the file type and I can use
the stream to populate Solr. How, that is still a mystery to me - working on
it. Any tips appreciated; thanks in advance.

Sesh



On 13 November 2010 15:24, Govind Kanshi  wrote:

> Another pov you might want to think about - what kind of search you want.
> Just plain - full text search or there is something more to those text
> files. Are they grouped in folders? Do the folders imply certain kind of
> grouping/hierarchy/tagging?
>
> I recently was trying to help somebody who had files across lot of places
> grouped by date/subject/author - he wanted to ensure these are "fields"
> which too can act as filters/navigators.
>
> Just an input - ignore it if you just want plain full text search.
>
> On Sat, Nov 13, 2010 at 11:25 AM, Lance Norskog  wrote:
>
> > About web servers: Solr is a servlet war file and needs a Java web server
> > "container" to run. The example/ folder in the Solr disribution uses
> > 'Jetty', and this is fine for small production-quality projects.  You can
> > just copy the example/ directory somewhere to set up your own running
> Solr;
> > that's what I always do.
> >
> > About indexing programs: if you know Unix scripting, it may be easiest to
> > walk the file system yourself with the 'find' program and create Solr
> input
> > XML files.
> >
> > But yes, you definitely want the Solr 1.4 Enterprise manual. I spent
> months
> > learning this stuff very slowly, and the book would have been great back
> > then.
> >
> > Lance
> >
> >
> > Erick Erickson wrote:
> >
> >> Think of the data import handler (DIH) as Solr pulling data to index
> >> from some source based on configuration. So, once you set up
> >> your DIH config to point to your file system, you issue a command
> >> to solr like "OK, do your data import thing". See the
> >> FileListEntityProcessor.
> >> http://wiki.apache.org/solr/DataImportHandler
> >>
> >> SolrJ is a client library you'd use to push data to Solr. Basically, you
> >> write a Java program that uses SolrJ to walk the file system, find
> >> documents, create a Solr document and send that to Solr. It's not
> >> nearly as complex as it sounds. See:
> >> http://wiki.apache.org/solr/Solrj
> >>
> >> It's probably worth your while to get a copy of
> >> "Solr 1.4, Enterprise Search Server" by Eric Pugh and David Smiley.
> >>
> >> Best
> >> Erick
> >>
> >> On Fri, Nov 12, 2010 at 8:37 AM, K. Seshadri Iyer wrote:
> >>
> >>> Hi Lance,
> >>>
> >>> Thank you very much for responding (not sure how I reply to the group,
> >>> so, writing to you).
> >>>
> >>> Can you please expand on your suggestion? I am not a web guy and so,
> >>> don't know where to start.
> >>>
> >>> What is the difference between SolrJ and DataImportHandler? Do I need
> >>> to set up web servers on all my storage boxes?
> >>>
> >>> Apologies for the basic level of questions, but hope I can get started
> >>> and implement this before the year end (you know why :o)
> >>>
> >>> Thanks,
> >>>
> >>> Sesh
> >>>
> >>> On 12 November 2010 13:31, Lance Norskog wrote:
> >>>
> >>>> Using 'curl' is fine. There is a library called SolrJ for Java and
> >>>> other libraries for other scripting languages that let you upload with
> >>>> more control. There is a thing in Solr called the DataImportHandler
> >>>> that lets you script walking a file system.
> >>>>
> >>>> On Thu, Nov 11, 2010 at 8:38 PM, K. Seshadri Iyer <seshadri...@gmail.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> Pardon me if this sounds very elementary, but I have a very basic
> >>>>> question regarding Solr search. I have about 10 storage devices running
> >>>>> Solaris with hundreds of thousands of text files (there are other files,
> >>>>> as well, but my target is these text files). The directories on the
> >>>>> Solaris boxes are exported and are available as NFS mounts.
> >>>>>
> >>>>> I have installed Solr 1.4 on a Linux box and have tested the
> >>>>> installation, using curl to post documents. However, the manual says
> >>>>> that curl is not the recommended way of posting documents to Solr.
> >>>>> Could someone please tell

Re: Error When Switching to Tomcat

2010-11-14 Thread Ahmet Arslan
Move the solr.war file and the solrhome directory somewhere else outside the tomcat
webapps, like /home/foo. Tomcat will generate webapps/solr automatically.

This is what I use, under catalinaHome/conf/Catalina/localhost/solr.xml:

<Context docBase="/home/foo/solr.war" debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String" value="/home/foo/solrhome" override="true"/>
</Context>

I also delete the <dataDir>...</dataDir> entry from solrconfig.xml, so that the data
dir is created under the solrhome directory.

http://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat


> I have been using Jetty on my linux/apache webserver for about 3 weeks now.
> I decided that I should change to Tomcat after realizing I will be indexing
> a lot of URLs and Jetty is good for small production sites as noted in the
> Wiki. I am running into this error:
>
> org.apache.solr.common.SolrException: Schema Parsing Failed
>     at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:656)
>     at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:95)
>     at org.apache.solr.core.SolrCore.<init>
>
> My localhost/solr.xml:
>
> <Context docBase="..." privileged="true" allowLinking="true" crossContext="true">
>    <Environment name="solr/home" type="java.lang.String" value="/tomcat/webapps/solr/" override="true" />
> </Context>
>
> My solrconfig.xml:
>
> <dataDir>${solr.data.dir:/tomcat/webapps/solr/conf}</dataDir>
>
> I can get to the 8080 Tomcat default page just fine. I've gone over the
> Wiki a couple of dozen times and verified that my solr.xml is configured
> correctly based on trial and error and reading the error logs. I just can't
> figure out where it is going wrong. I read there are three different ways to
> do this. Can someone help me out?





Solr TermsComponent: space in term

2010-11-14 Thread Parsa Ghaffari
Hi folks,

I'm using Solr 1.4.1 and I'm willing to use TermsComponent for AutoComplete.
The problem is, I can't get it to match strings with spaces in them. So to
say,

terms.fl=name&terms.lower=david&terms.prefix=david&terms.lower.incl=false&indent=true&wt=json

matches all strings starting with "david" but if I change it to:

terms.fl=name&terms.lower=david%20&terms.prefix=david%20&terms.lower.incl=false&indent=true&wt=json

it doesn't match all strings starting with "david ". Is it meant to be that
way? If so, are n-grams the way to go? And does anybody know whether
TermsComponent implements tries, DAWGs, or radix trees, and whether it's
efficient?

Cheers,
Parsa


Re: Solr TermsComponent: space in term

2010-11-14 Thread Ahmet Arslan
> terms.fl=name&terms.lower=david%20&terms.prefix=david%20&terms.lower.incl=false&indent=true&wt=json
> 
> it doesn't match all strings starting with "david ". Is it
> meant to be that
> way? 

This is about the fieldType of the name field. What is it? If it has
ShingleFilterFactory in it, then this is expected.


  


Re: Testing/packaging question

2010-11-14 Thread Bernhard Reiter
Hi,

and thanks for your hints. I've done some additional research and found
that there doesn't really seem to be any possibility of an embedded solr
server in solrpy.

Jetty, then. It'd all be probably kinda easy if it weren't for the way
things are unbundled in debian. I've recently posted to the debian-java
ML, but received no reply -- maybe because it's so solr-specific (so,
sorry for cross-posting here). It's kinda long as I've tried to capture
the issues I've run into so far. I'd really appreciate any help -- maybe
there are some solr ninjas here that know how to work around the
problems I'm experiencing.

> [...]
> The tests are basically a python file and a corresponding schema.xml.
> Normally, it's suggested to just use solr's example directory (with
> integrated jetty) and replace its schema.xml with the one provided by
> solrpy, and then run it via java -jar start.jar
> 
> 
> 
> On debian however, solr and jetty are unbundled (for good reason), and i
> can't of course just replace solr's example schema.xml (which gets
> installed to /etc/solr/conf).
> 
> I've started by looking for ways to run solr on jetty as non-root by
> basically running
> 
> java -jar /usr/share/jetty/start.jar
> 
> But that is highly non-trivial, as jetty's configuration files seem to
> point to solr's in a pretty hardcoded way.
> 
> I was able to change jetty's logs and home location by
> 
> java -jar -Djetty.home=/usr/share/jetty -Djetty.logs=/home/me/solr/logs /usr/share/jetty/start.jar /etc/jetty/jetty.xml
> 
> I can run jetty as a non-root user by issuing that command from
> within /usr/share/jetty, but it fails to start solr, as I haven't yet
> found a way to make solr use another datadir than /var/lib/solr/data --
> which is hardcoded in /etc/solr/conf/solrconfig.xml.
> 
> I think I might be able to change the latter by having the solr-common
> package change its /etc/solr/conf/solrconfig.xml to read
> 
> <dataDir>${solr.data.dir}</dataDir>
> 
> instead of
> 
> <dataDir>/var/lib/solr/data</dataDir>
> 
> and put a solrcore.properties into solr.home
> 
> #solrcore.properties
> data.dir=/data/solrindex
> 
> ( as seen over at
> http://wiki.apache.org/solr/SolrConfigXml#System_property_substitution )
> -- or just pass -Dsolr.data.dir=/data/dir from within some startup
> script.
> 
> Would such a modification of solr-common have any chance to be accepted
> by the java team?
> 
> ---
> 
> With schema.xml, I'm rather clueless. -Dsolr.solr.home=/home/me/solr/
> doesn't do the trick as this stuff is also hardcoded
> (in /usr/share/solr/WEB-INF/jetty-web.xml -- which is part of
> solr-jetty). Actually, that's the only purpose of jetty-web.xml, which
> is evaluated because of jetty's context settings (I think). Moreover,
> I'm trying not to duplicate the other contents of that directory, so
> setting solr.solr.home to a different location might not be what I want
> at all.
> 
> I think my most relevant questions are:
> 1. Is anyone able to successfully run jetty with solr
> via /usr/share/jetty/start.jar as a non-root user (ideally from within
> any directory, not just from within /usr/share/jetty)? If yes: how?
> 2. How can I make solr use my schema.xml instead of the one installed?
> 3. I haven't tried the whole thing in tomcat yet, so does anyone know if
> it would be easier there -- maybe because configuration files aren't
> that much entangled and hardcoded there? (In the end, I might have to do
> the whole thing also for tomcat anyway, as I'm build-depending on
> solr-jetty | solr-tomcat, which conflict with each other.)
> 
> I'm kinda desperate with the load of configuration files, so I'd really
> appreciate any help that gets me closer to get this thing done.
> Kind regards
> Bernhard

Am Donnerstag, den 04.11.2010, 23:57 +0100 schrieb Peter Karich:
> Hi,
> 
> don't know if the python package provides one, but solrj offers to start
> solr embedded (EmbeddedSolrServer) and
> setting up a different schema + config is possible. for this see:
> https://karussell.wordpress.com/2010/06/10/how-to-test-apache-solrj/
> 
> if you need an 'external solr' (via jetty and java -jar start.jar) while 
> tests running see this:
> http://java.dzone.com/articles/getting-know-solr
> 
> Regards,
> Peter.
> 
> 
> > Hi,
> >
> > I'm pretty much of a Solr newbie currently packaging solrpy for Debian;
> > see
> > http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/
> >
> > In order to run solrpy's supplied tests at build time, I'd need Solr to
> > know about the schema.xml that comes with the tests.
> > Can anyone tell me how to do that properly? I'd basically need Solr to
> > temporarily recognize that schema.xml without permanently installing it
> > -- is there any way to do this, eg via environment variables?
> >
> > TIA
> > Bernhard Reiter
> >
> 




Re: Solr TermsComponent: space in term

2010-11-14 Thread Ahmet Arslan
> I'm using Solr 1.4.1 and I'm willing to use TermsComponent
> for AutoComplete.
> The problem is, I can't get it to match strings with spaces
> in them. So to
> say,
> 
> terms.fl=name&terms.lower=david&terms.prefix=david&terms.lower.incl=false&indent=true&wt=json
> 
> matches all strings starting with "david" but if I change
> it to:
> 
> terms.fl=name&terms.lower=david%20&terms.prefix=david%20&terms.lower.incl=false&indent=true&wt=json
> 
> it doesn't match all strings starting with "david ". Is it
> meant to be that
> way? 

This is about the fieldType of name. What is it? If it has
ShingleFilterFactory in it, then this is expected.


  


RE: Testing/packaging question

2010-11-14 Thread Bernhard Reiter
Hi,

I have up to now focussed on Jetty as it's already bundled with solr.
The main issue there seems to be the way it's unbundled by Debian; I
figure things might be similar with Tomcat, depending on how entangled
configuration is there. 

Before I dig deeper into the Tomcat option: would you mind sending me
the script you mention? Maybe that could clear up things a bit.

Regards
Bernhard

Am Donnerstag, den 04.11.2010, 17:16 -0500 schrieb Turner, Robbin J:
> You can setup your own tomcat instance which would contain just 
> configurations you need.  You won't even have to recreate all the tomcat 
> configuration and binaries, just the ones that were not defaults.  So, if you 
> look up multiple tomcat configuration instances (google it), you'll 
> have a set of directories.  You'll need to have your own startup script that 
> points to your configurations.  You can use the current startup script as a 
> model; then, in your build procedures (I've done all this with a script), have 
> this added to the system so you can perform restarts.  You'd have to have a 
> couple of other environment variables set:
> 
> export CATALINA_BASE=/path/to/your/tomcat/instance/conf/files
> export CATALINA_HOME=/path/to/default/installation/bin/files
> export SOLR_HOME=/path/to/solr/dataNconf
> 
> Good luck
> 
> 
> From: Bernhard Reiter [ock...@raz.or.at]
> Sent: Thursday, November 04, 2010 5:49 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Testing/packaging question
> 
> Thanks for your instructions. Unfortunately, I need to do all that as
> part of my package's (python-solrpy) build procedure, so I can't change
> any global configuration, such as in the catalina subdirectories.
> 
> I've already sensed that restarting tomcat is also just too
> system-invasive and would include changing its (system-wide)
> configuration.
> 
> Are there any other ways to use solr for running the tests from
> http://pypi.python.org/packages/source/s/solrpy/solrpy-0.9.3.tar.gz
> without having to change any system configuration? Maybe via a user
> Tomcat instance such as provided by the tomcat6-user debian package?
> 
> Thanks for your help!
> Bernhard
> 
> Am Donnerstag, den 04.11.2010, 16:15 -0500 schrieb Turner, Robbin J:
> > You need to either add that to catalina.sh or create a setenv.sh in the 
> > CATALINA_HOME/bin directory.  Then you can restart tomcat.
> >
> > So, setenv.sh would contain the following:
> >
> >export JAVA_HOME="/path/to/jre"
>    export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml"
> >
> > If you were setting the export in your own environment and then issuing the 
> > restart, tomcat was not picking up your local environment because it's 
> > running as root.  You don't want to change root's environment.
> >
> > You could also create a context.xml in your 
> > CATALINA_HOME/conf/Catalina/localhost.  You should be able to find those 
> > instructions on/through the Solr FAQ.
> >
> > Hope this helps.
> > 
> > From: Bernhard Reiter [ock...@raz.or.at]
> > Sent: Thursday, November 04, 2010 4:49 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Testing/packaging question
> >
> > Hi,
> >
> > I'm now trying to
> >
> > export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml"
> >
> > and restarting tomcat (v6 package from ubuntu maverick) via
> >
> > sudo /etc/init.d/tomcat6 restart
> >
> > but solr still doesn't seem to find that schema.xml, as it complains
> > about unknown fields when running the tests that require that schema.xml
> >
> > Can someone please tell me what I'm doing wrong -- and what I should be
> > doing?
> >
> > TIA again,
> > Bernhard
> >
> > Am Montag, den 01.11.2010, 19:01 +0100 schrieb Bernhard Reiter:
> > > Hi,
> > >
> > > I'm pretty much of a Solr newbie currently packaging solrpy for Debian;
> > > see
> > > http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/
> > >
> > > In order to run solrpy's supplied tests at build time, I'd need Solr to
> > > know about the schema.xml that comes with the tests.
> > > Can anyone tell me how to do that properly? I'd basically need Solr to
> > > temporarily recognize that schema.xml without permanently installing it
> > > -- is there any way to do this, eg via environment variables?
> > >
> > > TIA
> > > Bernhard Reiter




Re: Solr TermsComponent: space in term

2010-11-14 Thread Parsa Ghaffari
Hi Ahmet,

This is the fieldType for "name":


  




  
  





  


and:



there's no ShingleFilterFactory. And also after changing parameters in the
schema, should one re-index the table?


On Sun, Nov 14, 2010 at 10:32 PM, Ahmet Arslan  wrote:

> > I'm using Solr 1.4.1 and I'm willing to use TermsComponent
> > for AutoComplete.
> > The problem is, I can't get it to match strings with spaces
> > in them. So to
> > say,
> >
> >
> terms.fl=name&terms.lower=david&terms.prefix=david&terms.lower.incl=false&indent=true&wt=json
> >
> > matches all strings starting with "david" but if I change
> > it to:
> >
> >
> terms.fl=name&terms.lower=david%20&terms.prefix=david%20&terms.lower.incl=false&indent=true&wt=json
> >
> > it doesn't match all strings starting with "david ". Is it
> > meant to be that
> > way?
>
> This is about fielyType of name? What is it? If it does have
> ShingleFilterFactory in it, then this is expected.
>
>
>
>


-- 
Parsa B. Ghaffari


Re: Solr TermsComponent: space in term

2010-11-14 Thread Ahmet Arslan

--- On Sun, 11/14/10, Parsa Ghaffari  wrote:

> From: Parsa Ghaffari 
> Subject: Re: Solr TermsComponent: space in term
> To: solr-user@lucene.apache.org
> Date: Sunday, November 14, 2010, 5:06 PM
> Hi Ahmet,
> 
> This is the fieldType for "name":
> 
>      class="solr.TextField"
> positionIncrementGap="100">
>       
>          class="solr.WhitespaceTokenizerFactory"/>
>          class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true"
> />
>          class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1"
> catenateWords="1"
> catenateNumbers="1" catenateAll="0"
> splitOnCaseChange="0"/>
>          class="solr.LowerCaseFilterFactory"/>
>       
>       
>          class="solr.WhitespaceTokenizerFactory"/>
>          class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>          class="solr.StopFilterFactory"
>                
> ignoreCase="true"
>                
> words="stopwords.txt"
>                
> enablePositionIncrements="true"
>                
> />
>          class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1"
> catenateWords="0"
> catenateNumbers="0" catenateAll="0"
> splitOnCaseChange="0"/>
>          class="solr.LowerCaseFilterFactory"/>
>       
>     
> 
> and:
> 
>  stored="true"/>
> 
> there's no ShingleFilterFactory. And also after changing
> parameters in the
> schema, should one re-index the table?

Yes yes, re-index and restart servlet container is required. What kind of 
values does name field take? Does it contains punctuations? Can you give some 
examples of that field's values?





Re: Solr TermsComponent: space in term

2010-11-14 Thread Parsa Ghaffari
Alphanumeric + "_" + "%" + "."

So to say: "John_Smith", "John Smith", "John_B._Smith" and "John 44 Smith"
are all possible values.

On Sun, Nov 14, 2010 at 11:46 PM, Ahmet Arslan  wrote:

>
> --- On Sun, 11/14/10, Parsa Ghaffari wrote:
>
> > From: Parsa Ghaffari
> > Subject: Re: Solr TermsComponent: space in term
> > To: solr-user@lucene.apache.org
> > Date: Sunday, November 14, 2010, 5:06 PM
> > Hi Ahmet,
> >
> > This is the fieldType for "name":
> >
> > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > and:
> >
> > <field name="name" type="..." indexed="true" stored="true"/>
> >
> > there's no ShingleFilterFactory. And also after changing parameters in the
> > schema, should one re-index the table?
>
> Yes, re-indexing and restarting the servlet container are required. What kind of
> values does the name field take? Does it contain punctuation? Can you give
> some examples of that field's values?


-- 
Parsa B. Ghaffari


Re: Solr Negative query

2010-11-14 Thread Yonik Seeley
On Sun, Nov 14, 2010 at 4:17 AM, Leonardo Menezes
 wrote:
> try
> Field1:Val1 AND (*:* NOT Field2:Val2), that should work ok

That should be equivalent to Field1:Val1 -Field2:Val2
You only need the *:* trick if all of the clauses of a boolean query
are negative.

-Yonik
http://www.lucidimagination.com


Re: full text search in multiple fields

2010-11-14 Thread PeterKerk

Ok, thanks, it works now for title and description fields. :)

But now I also need it for the city. And I can't get that to work, even
though I'm doing the exact same (or so I think).

I now have the code below for the city field.
(I'm defining the city field twice in my data-config and schema.xml, but that's
because I want the city field to be indexed both as string (whole value) and
as text. Though that's not the point now.)

data-config.xml

schema.xml

<field name="city" type="string" indexed="true" stored="true"/>
<field name="city_search" type="string" indexed="true" stored="true"/>
<field name="citytext_search" type="text" indexed="true" stored="true"/>
<copyField source="city_search" dest="citytext_search"/>
URL:
http://localhost:8983/solr/db/select/?q=amsterdam&defType=dismax&qf=citytext_search^10.0

The value in the db for the city field is "amsterdam"

but no results are found. And yes: I restarted the server, reloaded data-config,
and did a full import.


Re: full text search in multiple fields

2010-11-14 Thread Ahmet Arslan


--- On Sun, 11/14/10, PeterKerk  wrote:

> From: PeterKerk 
> Subject: Re: full text search in multiple fields
> To: solr-user@lucene.apache.org
> Date: Sunday, November 14, 2010, 8:52 PM
> 
> Ok, thanks. it works now for title and description fields.
> :)
> 
> But now I also need it for the city. And I cant get that to
> work, even
> though im doing the exact same (or so I think).
> 
> I now have the code below for the city field.
> (I'm defining the city field twice in my data-config and schema.xml, but that's
> because I want the city field to be indexed both as string (whole value) and
> as text. Though that's not the point now.)
>
> data-config.xml
>
> schema.xml
>
> <field name="city" type="string" indexed="true" stored="true"/>
> <field name="city_search" type="string" indexed="true" stored="true"/>
> <field name="citytext_search" type="text" indexed="true" stored="true"/>
> <copyField source="city_search" dest="citytext_search"/>
> 
> URL:
> http://localhost:8983/solr/db/select/?q=amsterdam&defType=dismax&qf=citytext_search^10.0
> 
> The value in the db for the city field is "amsterdam"

Everything seems fine. What happens when you do these two queries?

8983/solr/db/select/?q=citytext_search:amsterdam&defType=lucene
8983/solr/db/select/?q=citytext_search:[* TO *]&defType=lucene








Re: full text search in multiple fields

2010-11-14 Thread PeterKerk

both queries give me 0 results...


Re: full text search in multiple fields

2010-11-14 Thread Ahmet Arslan
 
> both queries give me 0 results...

Then your field(s) are not populated. You can debug on /admin/dataimport.jsp
or /admin/schema.jsp


  


Re: Default file locking on trunk

2010-11-14 Thread Lance Norskog
Ok, more detail: I was testing using the NoMergePolicy in Solr. As
Hoss pointed out in another thread, NoMergePolicy has no 0-argument
constructor, and so throws an exception during loading the core.

When there is no existing data/index/ directory, Solr creates a new
index/ directory at the beginning of loading the core, locks it, but
does not flush out an empty index. Here's the problem: when the core
fails while being loaded (in this case because the core configuration
was bogus) it left the index/ directory locked. It did not flush out
the new empty index (just the segment* files).

So, if a core has no index, and fails during loading, it should either
write out an empty index as it intended to, or remove the half-built
data/index/ directory. Or just not make the empty index until loading
completes?

Lance

On Wed, Nov 10, 2010 at 11:52 AM, Chris Hostetter
 wrote:
>
> : There is now a data/index with a write lock file in it. I have not
> : attempted to read the index, let alone add something to it.
> : I start solr again, and it cannot open the index because of the write lock.
>
> Lance, i can't reproduce using trunk r1033664 on Linux w/ext4 -- what OS &
> Filesystem are you using?
>
> If you load "http://localhost:8983/solr/admin/stats.jsp"; what does it list
> for the "reader" and "readerDir" in the "searcher" entry?
>
> : Why is there a write lock file when I have not tried to index anything?
>
> No idea ... i don't get any write locks until i actually attempt to index
> something.
>
>
>
> -Hoss
>



-- 
Lance Norskog
goks...@gmail.com


RE: Error When Switching to Tomcat

2010-11-14 Thread Eric Martin
Hi,

Thank you! I got it working after you jarred my brain. Of course, the
location of the solr instance is arbitrary/logical to tomcat. Sheesh, I feel
kind of small, now. Anyway, I was able to clearly see my mistake from your
information.

As with all help I get from here I posted my fix/walkthrough for others to
see here:

http://drupal.org/node/716632

Thanks a bunch! You helped me and anyone else coming to the Drupal site for
help with Tomcat and Solr :-)

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Sunday, November 14, 2010 2:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Error When Switching to Tomcat

Move the solr.war file and the solrhome directory somewhere else outside the tomcat
webapps, like /home/foo. Tomcat will generate webapps/solr automatically.

This is what I use, under catalinaHome/conf/Catalina/localhost/solr.xml:

<Context docBase="/home/foo/solr.war" debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String" value="/home/foo/solrhome" override="true"/>
</Context>

I also delete the <dataDir>...</dataDir> entry from solrconfig.xml, so that the
data dir is created under the solrhome directory.

http://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat


> I have been using Jetty on my linux/apache webserver for about 3 weeks now.
> I decided that I should change to Tomcat after realizing I will be indexing
> a lot of URLs and Jetty is good for small production sites as noted in the
> Wiki. I am running into this error:
>
> org.apache.solr.common.SolrException: Schema Parsing Failed
>     at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:656)
>     at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:95)
>     at org.apache.solr.core.SolrCore.<init>
>
> My localhost/solr.xml:
>
> <Context docBase="..." privileged="true" allowLinking="true" crossContext="true">
>    <Environment name="solr/home" type="java.lang.String" value="/tomcat/webapps/solr/" override="true" />
> </Context>
>
> My solrconfig.xml:
>
> <dataDir>${solr.data.dir:/tomcat/webapps/solr/conf}</dataDir>
>
> I can get to the 8080 Tomcat default page just fine. I've gone over the
> Wiki a couple of dozen times and verified that my solr.xml is configured
> correctly based on trial and error and reading the error logs. I just can't
> figure out where it is going wrong. I read there are three different ways to
> do this. Can someone help me out?


  



Re: full text search in multiple fields

2010-11-14 Thread PeterKerk

Ok, that makes sense ;)

but I don't understand why it's not indexed.
IMO, I've defined the "city_search" field the exact same as "city" in the
schema.xml:

<field name="city" type="string" indexed="true" stored="true"/>
<field name="city_search" type="string" indexed="true" stored="true"/>

So I checked the schema.jsp you suggested.

When under fields I click on the respective fields, I get this output:

Field: city
Field Type: string
Properties: Indexed, Stored, Omit Norms, Sort Missing Last
Schema: Indexed, Stored, Omit Norms, Sort Missing Last
Index: Indexed, Stored, Omit Norms
Index Analyzer: org.apache.solr.schema.FieldType$DefaultAnalyzer
Query Analyzer: org.apache.solr.schema.FieldType$DefaultAnalyzer
Docs: 7
Distinct: 6


Field: city_search
Field Type: string
Properties: Indexed, Stored, Omit Norms, Sort Missing Last
Copied Into: citytext_search
Index Analyzer: org.apache.solr.schema.FieldType$DefaultAnalyzer
Query Analyzer: org.apache.solr.schema.FieldType$DefaultAnalyzer 


Field: citytext_search
Field Type: text
Properties: Indexed, Tokenized, Stored
Copied From: city_search
Position Increment Gap: 100
Index Analyzer: org.apache.solr.analysis.TokenizerChain Details
Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory
Filters:
   1. org.apache.solr.analysis.StopFilterFactory args:{words:
stopwords_dutch.txt ignoreCase: true luceneMatchVersion: LUCENE_24 }
   2. org.apache.solr.analysis.WordDelimiterFilterFactory
args:{splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 1
luceneMatchVersion: LUCENE_24 generateWordParts: 1 catenateAll: 0
catenateNumbers: 1 }
   3. org.apache.solr.analysis.LowerCaseFilterFactory
args:{luceneMatchVersion: LUCENE_24 }
   4. org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected:
protwords.txt luceneMatchVersion: LUCENE_24 }
   5. org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory
args:{luceneMatchVersion: LUCENE_24 }
Query Analyzer: org.apache.solr.analysis.TokenizerChain Details


So you can see that the city field DOES index some data, whereas
city_search and citytext_search have NO data at all...

This debugging has confirmed that no data is indexed, but to me it doesn't
provide any more info on what I did wrong.

Do you have any suggestions?



Re: Deletes writing bytes len 0, corrupting the index

2010-11-14 Thread Jason Rutherglen
In addition, I had tried and since backed away from (on Solr) indexing
heavily while also searching on the same server.  This would lock up
segments and searchers longer than the disk space would allow.  I
think that part of Solr can be rewritten to better handle this N/RT
use case as there is no reason indexing and searching should not be
more possible (even with our somewhat large and sometimes long running
queries).  I've since gone back to using replication to avoid running
into indexing + searching and out of disk space exceptions.  In RT,
replication will probably not be useful because there will not be a
way to replicate analyzed documents across the wire.

> Can you post the exceptions you hit?  (Are these logged?).

Copied below are several.  We managed to experience a panoply of exceptions:

SEVERE: org.apache.lucene.index.MergePolicy$MergeException:
MergePolicy selected non-contiguous segments to merge (
_1pw:C4035->_1pb _1py:C19180->_1py _1pz:C22252->_1py _1q0:C23005->_1py
_1q1:C22051->_1py _1q2:C19520->_1py _1q3:C17
143->_1py _1q4:C18015->_1py _1q5:C19764->_1py _1q6:C18967->_1py vs
_v7:C10151578 _1pw:C9958546 _2kn:C10372070 _3fs:
C11462047 _4af:C11120971 _55i:C12402453 _60d:C11249698 _6v8:C11308887
_7py:C13299679 _8ku:C11369240 _sy:C12218112 _
1np:C11597131 _1ns:C65126 _1o3:C65375 _1oe:C63724 _1op:C60821
_1p0:C80242 _1pa:C118076 _1pl:C170005->_1pb _1px:C213
967->_1pb _1pw:C4035->_1pb _1py:C19180->_1py _1pz:C22252->_1py
_1q0:C23005->_1py _1q1:C22051->_1py _1q2:C19520->_1p
y _1q3:C17143->_1py _1q4:C18015->_1py _1q5:C19764->_1py
_1q6:C18967->_1py _1q7:C15903->_1py _1q8:C15061->_1py _1q9:
C17304->_1py _1qa:C16683->_1py _1qb:C16076->_1py _1qc:C15160->_1py
_1qd:C14191->_1py _1qe:C13448->_1py _1qf:C13002->_1py _1qg:C13040->_1py
_1qh:C13222->_1py _1qi:C12896->_1py _1qj:C12559->_1py _1qk:C12163->_1py),
which IndexWriter (currently) cannot handle

on solr04:
Nov 6, 2010 2:44:31 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
timed out: 
NativeFSLock@/mnt/solr/./data/index/lucene-5a92641e18d5832f54989a60e612116b-write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:85)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1402)
at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
at 
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
at 
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)

Exception in thread "Lucene Merge Thread #2"
org.apache.lucene.index.MergePolicy$MergeException:
java.io.IOException: No space left on device
at 
org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315)
Caused by: java.io.IOException: No space left on device
at java.io.RandomAccessFile.writeBytes(Native Method)
at java.io.RandomAccessFile.write(RandomAccessFile.java:499)
at 
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192)
at 
org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
at 
org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
at 
org.apache.lucene.store.BufferedIndexOutput.seek(BufferedIndexOutput.java:124)
at 
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.seek(SimpleFSDirectory.java:217)
at 
org.apache.lucene.index.TermInfosWriter.close(TermInfosWriter.java:220)
at 
org.apache.lucene.index.FormatPostingsFieldsWriter.finish(FormatPostingsFieldsWriter.java:70)
at 
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:589)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:154)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5029)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)

null -
java.lang.RuntimeException:
org.apache.lucene.index.CorruptIndexException: doc counts differ for
segment _v4: fieldsReader shows 117150 but segmentInfo shows 10331041
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1079) at
org.apache.solr.core.SolrCore.<init>(SolrCore.java:583) at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at

HTTP Status 500 - null java.lang.NullPointerException at
org.apache.solr.request.

Re: Rollback can't be done after committing?

2010-11-14 Thread Jason Rutherglen
The timed deletion policy is a bit too abstract, as is keeping a
numbered limit of commit points.  How would one know what they're
rolling back to when num limit is defined?

I think committing to a name and being able to roll back to it in Solr
is a good feature to add.

On Fri, Nov 12, 2010 at 2:47 AM, Michael McCandless
 wrote:
> In fact Lucene can rollback to a previous commit.
>
> You just need to use a deletion policy that preserves past commits
> (the default policy only keeps the most recent commit).
>
> Once you have multiple commits in the index you can do fun things like
> open an IndexReader on an old commit, rollback (open an IndexWriter on
> an old commit, deleting the "future" commits).  You can even open an
> IndexWriter on an old commit yet still preserve the newer commits, to
> "revert" changes to the index yet preserve the history.
>
> You can use IndexReader.listCommits to get all commits currently in the index.
>
> But I'm not sure if these capabilities are exposed yet through Solr.
>
> Mike
>
> On Thu, Nov 11, 2010 at 10:25 PM, Pradeep Singh  wrote:
>> In some cases you can rollback to a named checkpoint. I am not too sure but
>> I think I read in the lucene documentation that it supported named
>> checkpointing.
>>
>> On Thu, Nov 11, 2010 at 7:12 PM, gengshaoguang 
>> wrote:
>>
>>> Hi, Kouta:
>>> Any data store does not support rollback AFTER commit, rollback works only
>>> BEFORE.
>>>
>>> On Friday, November 12, 2010 12:34:18 am Kouta Osabe wrote:
>>> > Hi, all
>>> >
>>> > I have a question about Solr and SolrJ's rollback.
>>> >
>>> > I try to rollback like below
>>> >
>>> > try{
>>> > server.addBean(dto);
>>> > server.commit;
>>> > }catch(Exception e){
>>> >  if (server != null) { server.rollback();}
>>> > }
>>> >
>>> > I wonder if any Exception thrown, "rollback" process is run. so all
>>> > data would not be updated.
>>> >
>>> > but once commited, rollback would not be well done.
>>> >
>>> > rollback correctly will be done only when "commit" process will not?
>>> >
>>> > Solr and SolrJ's rollback system is not the same as any RDB's rollback?
>>>
>>>
>>
>


Re: full text search in multiple fields

2010-11-14 Thread Ahmet Arslan

> but I dont understand why its not indexed.

Probably something wrong with data-config.xml.

> So you can see, that the city field DOES index some data,
> whereas the
> city_search and citytext_search have NO data at all...

Then populate these two fields from city via copyField. It is 100% legal.





  


Re: Rollback can't be done after committing?

2010-11-14 Thread Lance Norskog
This feature would make the ReplicationHandler more robust in its own 
practice of "reserving" previous commit points, by pushing that code out 
into Solr proper.


Jason Rutherglen wrote:

The timed deletion policy is a bit too abstract, as is keeping a
numbered limit of commit points.  How would one know what they're
rolling back to when num limit is defined?

I think committing to a name and being able to roll back to it in Solr
is a good feature to add.

On Fri, Nov 12, 2010 at 2:47 AM, Michael McCandless
  wrote:
   

In fact Lucene can rollback to a previous commit.

You just need to use a deletion policy that preserves past commits
(the default policy only keeps the most recent commit).

Once you have multiple commits in the index you can do fun things like
open an IndexReader on an old commit, rollback (open an IndexWriter on
an old commit, deleting the "future" commits).  You can even open an
IndexWriter on an old commit yet still preserve the newer commits, to
"revert" changes to the index yet preserve the history.

You can use IndexReader.listCommits to get all commits currently in the index.

But I'm not sure if these capabilities are exposed yet through Solr.

Mike

On Thu, Nov 11, 2010 at 10:25 PM, Pradeep Singh  wrote:
 

In some cases you can rollback to a named checkpoint. I am not too sure but
I think I read in the lucene documentation that it supported named
checkpointing.

On Thu, Nov 11, 2010 at 7:12 PM, gengshaoguang wrote:

   

Hi, Kouta:
Any data store does not support rollback AFTER commit, rollback works only
BEFORE.

On Friday, November 12, 2010 12:34:18 am Kouta Osabe wrote:
 

Hi, all

I have a question about Solr and SolrJ's rollback.

I try to rollback like below

try{
server.addBean(dto);
server.commit;
}catch(Exception e){
  if (server != null) { server.rollback();}
}

I wonder if any Exception thrown, "rollback" process is run. so all
data would not be updated.

but once commited, rollback would not be well done.

rollback correctly will be done only when "commit" process will not?

Solr and SolrJ's rollback system is not the same as any RDB's rollback?
   


 
   
 


Re: Deletes writing bytes len 0, corrupting the index

2010-11-14 Thread Lance Norskog
Here is a separate configuration: use separate Solr instances for 
indexing and querying. Both point to the same data directory. A 'commit' 
to the query Solr reloads the index. It works in read-only mode - for
production mode, I would run the indexer and querier with different
permissions so that the query instance cannot possibly alter the index.


This seems like the natural design for shared-file-system deployments.

Jason Rutherglen wrote:

In addition, I had tried and since backed away from (on Solr) indexing
heavily while also searching on the same server.  This would lock up
segments and searchers longer than the disk space would allow.  I
think that part of Solr can be rewritten to better handle this N/RT
use case as there is no reason indexing and searching should not be
more possible (even with our somewhat large and sometimes long running
queries).  I've since gone back to using replication to avoid running
into indexing + searching and out of disk space exceptions.  In RT,
replication will probably not be useful because there will not be a
way to replicate analyzed documents across the wire.

   

Can you post the exceptions you hit?  (Are these logged?).
 

Copied below are several.  We managed to experience a panoply of exceptions:

SEVERE: org.apache.lucene.index.MergePolicy$MergeException:
MergePolicy selected non-contiguous segments to merge (
_1pw:C4035->_1pb _1py:C19180->_1py _1pz:C22252->_1py _1q0:C23005->_1py
_1q1:C22051->_1py _1q2:C19520->_1py _1q3:C17
143->_1py _1q4:C18015->_1py _1q5:C19764->_1py _1q6:C18967->_1py vs
_v7:C10151578 _1pw:C9958546 _2kn:C10372070 _3fs:
C11462047 _4af:C11120971 _55i:C12402453 _60d:C11249698 _6v8:C11308887
_7py:C13299679 _8ku:C11369240 _sy:C12218112 _
1np:C11597131 _1ns:C65126 _1o3:C65375 _1oe:C63724 _1op:C60821
_1p0:C80242 _1pa:C118076 _1pl:C170005->_1pb _1px:C213
967->_1pb _1pw:C4035->_1pb _1py:C19180->_1py _1pz:C22252->_1py
_1q0:C23005->_1py _1q1:C22051->_1py _1q2:C19520->_1p
y _1q3:C17143->_1py _1q4:C18015->_1py _1q5:C19764->_1py
_1q6:C18967->_1py _1q7:C15903->_1py _1q8:C15061->_1py _1q9:
C17304->_1py _1qa:C16683->_1py _1qb:C16076->_1py _1qc:C15160->_1py
_1qd:C14191->_1py _1qe:C13448->_1py _1qf:C13002->_1py _1qg:C13040->_1py
_1qh:C13222->_1py _1qi:C12896->_1py _1qj:C12559->_1py _1qk:C12163->_1py),
which IndexWriter (currently) cannot handle

on solr04:
Nov 6, 2010 2:44:31 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
timed out: 
NativeFSLock@/mnt/solr/./data/index/lucene-5a92641e18d5832f54989a60e612116b-write.lock
 at org.apache.lucene.store.Lock.obtain(Lock.java:85)
 at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545)
 at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1402)
 at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
 at 
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
 at 
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)

Exception in thread "Lucene Merge Thread #2"
org.apache.lucene.index.MergePolicy$MergeException:
java.io.IOException: No space left on device
 at 
org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351)
 at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315)
Caused by: java.io.IOException: No space left on device
 at java.io.RandomAccessFile.writeBytes(Native Method)
 at java.io.RandomAccessFile.write(RandomAccessFile.java:499)
 at 
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192)
 at 
org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
 at 
org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
 at 
org.apache.lucene.store.BufferedIndexOutput.seek(BufferedIndexOutput.java:124)
 at 
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.seek(SimpleFSDirectory.java:217)
 at 
org.apache.lucene.index.TermInfosWriter.close(TermInfosWriter.java:220)
 at 
org.apache.lucene.index.FormatPostingsFieldsWriter.finish(FormatPostingsFieldsWriter.java:70)
 at 
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:589)
 at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:154)
 at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5029)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614)
 at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
 at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)

null -
java.lang.RuntimeException:
org.apache.lu

Re: A Newbie Question

2010-11-14 Thread Lance Norskog

Yes, the ExtractingRequestHandler uses Tika to parse many file formats.

Solr 1.4.1 uses a previous version of Tika (0.6 or 0.7).

Here's the problem with Tika and extraction utilities in general: they 
are not perfect. They will fail on some files. In the 
ExtractingRequestHandler's case, there is no way to let it fail in 
parsing but save the document's metadata anyway with a notation: "sorry 
not parsed".  I would rather have the unix 'strings' command parse my 
documents (thanks to a co-worker for this).


K. Seshadri Iyer wrote:

Thanks for all the responses.

Govind: To answer your question, yes, all I want to search is plain text
files. They are located in NFS directories across multiple Solaris/Linux
storage boxes. The total storage is in hundreds of terabytes.

I have just got started with Solr and my understanding is that I will
somehow need Tika to help stream/upload files to Solr. I don't know anything
about Java programming, being a system admin. So far, I have read that the
autodetect parser in Tika will somehow detect the file type and I can use
the stream to populate Solr. How, that is still a mystery to me - working on
it. Any tips appreciated; thanks in advance.

Sesh

On 13 November 2010 15:24, Govind Kanshi wrote:

Another pov you might want to think about - what kind of search you want.
Just plain - full text search or there is something more to those text
files. Are they grouped in folders? Do the folders imply certain kind of
grouping/hierarchy/tagging?

I recently was trying to help somebody who had files across lot of places
grouped by date/subject/author - he wanted to ensure these are "fields"
which too can act as filters/navigators.

Just an input - ignore it if you just want plain full text search.

On Sat, Nov 13, 2010 at 11:25 AM, Lance Norskog wrote:

About web servers: Solr is a servlet war file and needs a Java web server
"container" to run. The example/ folder in the Solr distribution uses
'Jetty', and this is fine for small production-quality projects.  You can
just copy the example/ directory somewhere to set up your own running Solr;
that's what I always do.

About indexing programs: if you know Unix scripting, it may be easiest to
walk the file system yourself with the 'find' program and create Solr input
XML files.

But yes, you definitely want the Solr 1.4 Enterprise manual. I spent months
learning this stuff very slowly, and the book would have been great back
then.

Lance

Erick Erickson wrote:

Think of the data import handler (DIH) as Solr pulling data to index
from some source based on configuration. So, once you set up
your DIH config to point to your file system, you issue a command
to solr like "OK, do your data import thing". See the
FileListEntityProcessor.
http://wiki.apache.org/solr/DataImportHandler

SolrJ is a client library you'd use to push data to Solr. Basically, you
write a Java program that uses SolrJ to walk the file system, find
documents, create a Solr document and send that to Solr. It's not
nearly as complex as it sounds. See:
http://wiki.apache.org/solr/Solrj

It's probably worth your while to get a copy of
"Solr 1.4, Enterprise Search Server" by Eric Pugh and David Smiley.

Best
Erick

On Fri, Nov 12, 2010 at 8:37 AM, K. Seshadri Iyer wrote:

Hi Lance,

Thank you very much for responding (not sure how I reply to the group,
so, writing to you).

Can you please expand on your suggestion? I am not a web guy and so,
don't know where to start.

What is the difference between SolrJ and DataImportHandler? Do I need
to set up web servers on all my storage boxes?

Apologies for the basic level of questions, but hope I can get started
and implement this before the year end (you know why :o)

Thanks,

Sesh

On 12 November 2010 13:31, Lance Norskog wrote:

Using 'curl' is fine. There is a library called SolrJ for Java and
other libraries for other scripting languages that let you upload with
more control. There is a thing in Solr called the DataImportHandler
that lets you script walking a file system.

On Thu, Nov 11, 2010 at 8:38 PM, K. Seshadri Iyer <seshadri...@gmail.com> wrote:

Hi,

Pardon me if this sounds very elementary, but I have a very basic question
regarding Solr search. I have about 10 storage devices running Solaris with
hundreds of thousands of text files (there are other files, as well, but my
target is these text files). The directories on the Solaris boxes are
exported and are available as NFS mounts.

I have installed Solr 1.4 on a Linux box and have tested the installation,

Re: full text search in multiple fields

2010-11-14 Thread Erick Erickson
Nowhere (unless I overlooked it) do you ever populate city_search
in the first place; it's simply defined.

Also, I don't think (but check it) that <copyField> is chainable.
I don't *think* that

<copyField source="city" dest="city_search"/>
<copyField source="city_search" dest="citytext_search"/>

will populate citytext_search. Ahmet's suggestion to do two
<copyField>s with source="city" is spot-on.

Best
Erick

On Sun, Nov 14, 2010 at 5:48 PM, Ahmet Arslan  wrote:

>
> > but I dont understand why its not indexed.
>
> Probably something wrong with data-config.xml.
>
> > So you can see, that the city field DOES index some data,
> > whereas the
> > city_search and citytext_search have NO data at all...
>
> Then populate these two fields from city via copyField. It is 100% legal.
>
> <copyField source="city" dest="city_search"/>
> <copyField source="city" dest="citytext_search"/>
>
>
>
>


Re: A Newbie Question

2010-11-14 Thread Ken Krugler


On Nov 14, 2010, at 3:02pm, Lance Norskog wrote:

Yes, the ExtractingRequestHandler uses Tika to parse many file formats.

Solr 1.4.1 uses a previous version of Tika (0.6 or 0.7).

Here's the problem with Tika and extraction utilities in general:
they are not perfect. They will fail on some files. In the
ExtractingRequestHandler's case, there is no way to let it fail in
parsing but save the document's metadata anyway with a notation:
"sorry not parsed".


By "there is no way" do you mean in configuring the current  
ExtractingRequestHandler? Or is there some fundamental issue with how  
Solr uses Tika that prevents ExtractingRequestHandler from being  
modified to work this way (which seems like a useful configuration  
settings)?


Regards,

-- Ken

I would rather have the unix 'strings' command parse my documents  
(thanks to a co-worker for this).


K. Seshadri Iyer wrote:

Thanks for all the responses.

Govind: To answer your question, yes, all I want to search is plain text
files. They are located in NFS directories across multiple Solaris/Linux
storage boxes. The total storage is in hundreds of terabytes.

I have just got started with Solr and my understanding is that I will
somehow need Tika to help stream/upload files to Solr. I don't know anything
about Java programming, being a system admin. So far, I have read that the
autodetect parser in Tika will somehow detect the file type and I can use
the stream to populate Solr. How, that is still a mystery to me - working on
it. Any tips appreciated; thanks in advance.

Sesh

On 13 November 2010 15:24, Govind Kanshi wrote:

Another pov you might want to think about - what kind of search you want.
Just plain - full text search or there is something more to those text
files. Are they grouped in folders? Do the folders imply certain kind of
grouping/hierarchy/tagging?

I recently was trying to help somebody who had files across lot of places
grouped by date/subject/author - he wanted to ensure these are "fields"
which too can act as filters/navigators.

Just an input - ignore it if you just want plain full text search.

On Sat, Nov 13, 2010 at 11:25 AM, Lance Norskog wrote:

About web servers: Solr is a servlet war file and needs a Java web server
"container" to run. The example/ folder in the Solr distribution uses
'Jetty', and this is fine for small production-quality projects.  You can
just copy the example/ directory somewhere to set up your own running Solr;
that's what I always do.

About indexing programs: if you know Unix scripting, it may be easiest to
walk the file system yourself with the 'find' program and create Solr input
XML files.

But yes, you definitely want the Solr 1.4 Enterprise manual. I spent months
learning this stuff very slowly, and the book would have been great back
then.

Lance

Erick Erickson wrote:

Think of the data import handler (DIH) as Solr pulling data to index
from some source based on configuration. So, once you set up
your DIH config to point to your file system, you issue a command
to solr like "OK, do your data import thing". See the
FileListEntityProcessor.
http://wiki.apache.org/solr/DataImportHandler

SolrJ is a client library you'd use to push data to Solr. Basically, you
write a Java program that uses SolrJ to walk the file system, find
documents, create a Solr document and send that to Solr. It's not
nearly as complex as it sounds. See:
http://wiki.apache.org/solr/Solrj

It's probably worth your while to get a copy of
"Solr 1.4, Enterprise Search Server" by Eric Pugh and David Smiley.

Best
Erick

On Fri, Nov 12, 2010 at 8:37 AM, K. Seshadri Iyer wrote:

Hi Lance,

Thank you very much for responding (not sure how I reply to the group,
so, writing to you).

Can you please expand on your suggestion? I am not a web guy and so,
don't know where to start.

What is the difference between SolrJ and DataImportHandler? Do I need
to set up web servers on all my storage boxes?

Apologies for the basic level of questions, but hope I can get started
and implement this before the year end (you know why :o)

Thanks,

Sesh

On 12 November 2010 13:31, Lance Norskog wrote:

Using 'curl' is fine. There is a library called SolrJ for Java and
other libraries for other scripting languages that let you upload with
more control. There is a thing in Solr called the DataImportHandler
that lets you script walking a file system.

On Thu, Nov 11, 2010 at 8:38 PM, K. Seshadri Iyer <seshadri...@gmail.com> wrote:

Hi,

Pardon me if this sounds very elementary, but I have a very basic question
regarding Solr search. I have about 10 storage devices running Solaris with
hundreds of thousands of text files (there are other files, as well, but my
target is these text files). The directories on t

my index has 500 million docs ,how to improve solr search performance?

2010-11-14 Thread lu.rongbin

I split my docs into 100 indexes, and I deploy the 100 indexes on 10 EC2
m2.4xlarge instances for solr shards. That means each instance has 10 solr
cores. A search costs 4 to 10 seconds when I test with a hundred concurrent
threads, and now I have 1000 online users per second, so users must wait even
longer to view the result. I use solr filters to search, for example,
category:digital, price:[some price TO some price]. I don't know whether that
costs time?
Any way to improve the search performance? thanks.


Re: my index has 500 million docs ,ho w to improve solr search performance?

2010-11-14 Thread lu.rongbin

In addition, my index has only two stored fields, id and price; the other
fields are indexed only. I increased the document and query caches. The EC2
m2.4xlarge instance has 8 cores and 68G memory. All the indexes together are about 100G.


RE: Solr Negative query

2010-11-14 Thread Viswa S

Apologies for starting a new thread again; my mailing list subscription didn't
finalize until after Yonik's response.

Using "Field1:Val1 AND (*:* NOT Field2:Val2)" works, thanks.

Does my original query "Field1:Value1 AND (NOT Field2:Val2)" fall into the "need
the *:* trick if all of the clauses of a boolean query are negative" case? It
does seem to have both positive and negative match attributes. Can you please
elaborate?

Thanks for the quick responses, you guys are awesome.


From: Yonik Seeley (yon...@lucidimagination.com), Nov 14, 2010, 9:08:32 am:

On Sun, Nov 14, 2010 at 4:17 AM, Leonardo Menezes wrote:
> try
> Field1:Val1 AND (*:* NOT Field2:Val2), that should work ok

That should be equivalent to Field1:Val1 -Field2:Val2
You only need the *:* trick if all of the clauses of a boolean query
are negative.


-Yonik
http://www.lucidimagination.com

To: solr-user@lucene.apache.org
Subject: Solr Negative query
Date: Sun, 14 Nov 2010 13:32:51 +0530








Dear Solr/Lucene gurus,
I have run into a weird issue trying to use a negative condition in my query.

Parser: StandardQueryParser
My Query: Field1:Val1 NOT Field2:Val2
Resolved as: Field1:Val1 -Field2:Val2

The above query never returns any document, no matter how we use parentheses.
I did see some suggestions on LuceneQParser to use something like:

*:* Field1:Val1 -Field2:Val2

This seems to return some documents, however it seems to ignore the first
condition (Field1:Val1). Please help.

Thanks,
Vis

Searching with acronyms

2010-11-14 Thread sivaprasad

Hi,

I have a requirement where a user enters an acronym, and the search
results should come back for the expanded word. Let us say, if the user enters
'TV', the search results should come for 'Television'.

Is the synonyms filter the way to achieve this?

Any inputs.

Regards,
Siva
