Re: field collapse using 'adjacent' & 'includeCollapsedDocs' + 'sort' query field

2009-11-15 Thread Martijn v Groningen
Hi Michael,

What you are saying seems logical, but that is currently not the case
with the collapsedDocs functionality. This functionality was built
with computing aggregated statistics in mind, not really to return a
separate search result per collapse group. Although the collapsed
documents are collected in the order they appear in the search result
(only if the collapse type is adjacent), they are not saved in the order
they appear.

If you really need the collapse group search results in the order
they were collapsed, you will need to tweak the code. What you can do
is change the CollapsedDocumentCollapseCollector class in the
DocumentFieldsCollapseCollectorFactory.java source file. Currently the
document ids are stored in an OpenBitSet per collapse group. You
can change that into an ArrayList, for example. That way
the order in which the documents were collapsed is preserved.

I think the downside of this change will be an increase in memory
usage: an OpenBitSet is more memory-efficient than an ArrayList of
integers. I think that this will only be a real problem when the
collapse groups become very large.
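Roughly, the idea is something like this (just a sketch with approximated
names, not a drop-in patch against the actual collector):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: keep the doc ids per collapse group in an ArrayList instead of
// an OpenBitSet, so that insertion (= collapse) order is preserved.
class OrderedCollapseGroups {
    private final Map<String, List<Integer>> groups =
        new HashMap<String, List<Integer>>();

    void collect(String groupValue, int docId) {
        List<Integer> ids = groups.get(groupValue);
        if (ids == null) {
            ids = new ArrayList<Integer>();
            groups.put(groupValue, ids);
        }
        ids.add(docId); // unlike a bit set, this keeps collection order
    }

    List<Integer> docIdsFor(String groupValue) {
        return groups.get(groupValue);
    }
}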

I hope this answers your question.

Martijn

2009/11/14 michael8 :
>
> Hi,
>
> This almost seems like a bug, but I can't be sure, so I'm seeking
> confirmation.  Basically I am building a site that presents search results
> in reverse chronological order.  I am also leveraging the field collapse
> feature so that I can group results using 'adjacent' mode and have Solr
> return the collapsed results as well via 'includeCollapsedDocs'.  My
> collapsing field is a custom grouping_id that I have specified.
>
> What I'm noticing is that my search results are coming back in the correct
> order by descending time (via the 'sort' param in the main query) as expected.
> However, the results returned within the 'collapsedDocs' section via
> 'includeCollapsedDocs' are not in the same descending time order.
>
> My question is: shouldn't the collapsedDocs results also be in the same
> 'sort' order and key I have specified in the overall query, particularly
> since 'adjacent' mode is enabled, which would mean results that are
> 'adjacent' in the sort order of the results?
>
> I'm using Solr 1.4.0 + field collapse patch as of 10/27/2009
>
> Thanks,
> Michael
>
> --
> View this message in context: 
> http://old.nabble.com/field-collapse-using-%27adjacent%27---%27includeCollapsedDocs%27-%2B-%27sort%27-query-field-tp26351840p26351840.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Newbie Solr questions

2009-11-15 Thread yz5od2

Thanks for the reply:

I follow the schema.xml concept, but what if my requirement is more
dynamic in nature? I.e., I would like my developers to be able to
annotate a POJO and submit it to the Solr server (embedded) to be
indexed according to public properties OR annotations. Is that possible?

If that is not possible, can I programmatically define documents and
fields (and the field options) in straight Java? I.e., in the pseudo
code below...


// this is made up but this is what I would like to be able to do
SolrDoc document = new SolrDoc();
SolrField field = new SolrField();
field.isIndexed = true;
field.isStored = true;
field.name = "myField";

field.value = myPOJO.getValue();

solrServer.index(document);
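For reference, the closest thing I have found in the actual SolrJ API is
below (untested sketch; the URL is just the example server, and the
indexed/stored options still come from schema.xml rather than per field):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexOneDocument {
    public static void main(String[] args) throws Exception {
        SolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");               // field must exist in schema.xml
        doc.addField("myField", "some value"); // e.g. myPOJO.getValue()

        server.add(doc);   // send the document
        server.commit();   // make it searchable
    }
}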





On Nov 15, 2009, at 12:50 AM, Avlesh Singh wrote:



a) Since Solr is built on top of lucene, using SolrJ, can I still  
directly
create custom documents, specify the field specifics etc (indexed,  
stored

etc) and then map POJOs to those documents, similar to just using the
straight lucene API?

b) I took a quick look at the SolrJ javadocs but did not see  
anything in

there that allowed me to customize if a field is stored, indexed, not
indexed etc. How do I do that with SolrJ without having to go  
directly to

the lucene apis?

c) The SolrJ beans package. By annotating a POJO with @Field, how  
exactly
does SolrJ treat that field? Indexed/stored, or just indexed? Is  
there any

other way to control this?


The answer to all your questions above is the magical file called
schema.xml. For more read here - http://wiki.apache.org/solr/SchemaXml.
SolrJ is simply a java client to access (read and update from) the  
solr

server.

d) If I create a custom index outside of Solr using straight lucene,
is it

easy to import a pre-existing lucene index into a Solr Server?

As long as the Lucene index matches the definitions in your schema  
you can
use the same index. The data however needs to be copied into a
predictable

location inside SOLR_HOME.

Cheers
Avlesh

On Sun, Nov 15, 2009 at 9:26 AM, yz5od2 <outdo...@yahoo.com> wrote:



Hi,
I am new to Solr but fairly advanced with lucene.

In the past I have created custom Lucene search engines that indexed
objects in a Java application, so my background is coming from this
requirement

a) Since Solr is built on top of lucene, using SolrJ, can I still  
directly
create custom documents, specify the field specifics etc (indexed,  
stored

etc) and then map POJOs to those documents, similar to just using the
straight lucene API?

b) I took a quick look at the SolrJ javadocs but did not see  
anything in

there that allowed me to customize if a field is stored, indexed, not
indexed etc. How do I do that with SolrJ without having to go  
directly to

the lucene apis?

c) The SolrJ beans package. By annotating a POJO with @Field, how  
exactly
does SolrJ treat that field? Indexed/stored, or just indexed? Is  
there any

other way to control this?

d) If I create a custom index outside of Solr using straight
lucene, is it

easy to import a pre-existing lucene index into a Solr Server?

thanks!





Re: field collapse using 'adjacent' & 'includeCollapsedDocs' + 'sort' query field

2009-11-15 Thread michael8

Hi Martijn,

Thanks for your insight of collapsedDocs, and what I need to modify if I
need the functionality I want.

Michael


Martijn v Groningen wrote:
> 
> Hi Michael,
> 
> What you are saying seems logical, but that is currently not the case
> with the collapsedDocs functionality. This functionality was built
> with computing aggregated statistics in mind, not really to return a
> separate search result per collapse group. Although the collapsed
> documents are collected in the order they appear in the search result
> (only if the collapse type is adjacent), they are not saved in the order
> they appear.
> 
> If you really need the collapse group search results in the order
> they were collapsed, you will need to tweak the code. What you can do
> is change the CollapsedDocumentCollapseCollector class in the
> DocumentFieldsCollapseCollectorFactory.java source file. Currently the
> document ids are stored in an OpenBitSet per collapse group. You
> can change that into an ArrayList, for example. That way
> the order in which the documents were collapsed is preserved.
> 
> I think the downside of this change will be an increase in memory
> usage: an OpenBitSet is more memory-efficient than an ArrayList of
> integers. I think that this will only be a real problem when the
> collapse groups become very large.
> 
> I hope this answers your question.
> 
> Martijn
> 
> 2009/11/14 michael8 :
>>
>> Hi,
>>
>> This almost seems like a bug, but I can't be sure, so I'm seeking
>> confirmation.  Basically I am building a site that presents search
>> results
>> in reverse chronological order.  I am also leveraging the field
>> collapse
>> feature so that I can group results using 'adjacent' mode and have Solr
>> return the collapsed results as well via 'includeCollapsedDocs'.  My
>> collapsing field is a custom grouping_id that I have specified.
>>
>> What I'm noticing is that my search results are coming back in the
>> correct
>> order by descending time (via the 'sort' param in the main query) as
>> expected.
>> However, the results returned within the 'collapsedDocs' section via
>> 'includeCollapsedDocs' are not in the same descending time order.
>>
>> My question is: shouldn't the collapsedDocs results also be in the same
>> 'sort' order and key I have specified in the overall query, particularly
>> since 'adjacent' mode is enabled, which would mean results that are
>> 'adjacent' in the sort order of the results?
>>
>> I'm using Solr 1.4.0 + field collapse patch as of 10/27/2009
>>
>> Thanks,
>> Michael
>>
>> --
>> View this message in context:
>> http://old.nabble.com/field-collapse-using-%27adjacent%27---%27includeCollapsedDocs%27-%2B-%27sort%27-query-field-tp26351840p26351840.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/field-collapse-%27includeCollapsedDocs%27-doesn%27t-return-results-within-%27collapsedDocs%27-in-%27sort%27-order-specified-tp26351840p26360433.html
Sent from the Solr - User mailing list archive at Nabble.com.



solr stops running periodically

2009-11-15 Thread athir nuaimi
We have 4 machines running solr.  On one of the machines, every 2-3
days solr stops running.  By that I mean that the java/tomcat
process just disappears.  If I look at the catalina logs, I see
normal log entries and then nothing.  There are no shutdown messages
like you would normally see if you sent a SIGTERM to the process.

Obviously this is a problem. I'm new to solr/java, so if there are
more diagnostic things I can do I'd appreciate any tips/advice.


thanks in advance
Athir




Re: Why does BinaryRequestWriter force the path to be base URL + "/update/javabin"

2009-11-15 Thread Shalin Shekhar Mangar
On Tue, Nov 3, 2009 at 5:24 AM, Stuart Tettemer  wrote:

> Hi folks,
> First of all, thanks for Solr.  It is a great piece of work.
>
> I have a question about BinaryRequestWriter in the solrj project.  Why does
> it force the path of UpdateRequests to be "/update/javabin" (see
> BinaryRequestWriter.getPath(String) starting on line 109)?
>
> I am extending BinaryRequestWriter specifically to remove this requirement
> and am interested to know the reasoning behind the initial choice.
>
>
Stuart, please open an issue in the Jira and, if possible, give a patch.
This can go in right away.
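In the meantime, an override along these lines should work (untested
sketch, keyed to the getPath(String) signature quoted above):

import org.apache.solr.client.solrj.impl.BinaryRequestWriter;

// Untested sketch: lift the hard-coded "/update/javabin" restriction by
// overriding the getPath(String) method mentioned above.
public class AnyPathBinaryRequestWriter extends BinaryRequestWriter {
    @Override
    public String getPath(String path) {
        return path; // pass the request's own path through unchanged
    }
}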


-- 
Regards,
Shalin Shekhar Mangar.


Re: Spell check suggestion and correct way of implementation and some Questions

2009-11-15 Thread Shalin Shekhar Mangar
On Wed, Nov 4, 2009 at 12:31 AM, darniz  wrote:

>
> Thanks
>
> i included the buildOnCommit and buildOnOptimize options as true and indexed some
> documents and it automatically builds the dictionary.
>
> Are there any performance issues we should be aware of with this approach?
>
>
Well, it depends. Each commit/optimize will re-create the spell check index
with those options. So, it is best if you test it out with your index,
queries and load.


-- 
Regards,
Shalin Shekhar Mangar.


Newbie tips: migrating from mysql fulltext search / PHP integration

2009-11-15 Thread mbneto
Hi,

I am looking for alternatives to MySQL fulltext searches.  The combo
Lucene/Solr is one of my options and I'd like to gather as much information
as I can before choosing and even build a prototype.

My current needs do not seem to be unusual:

- fast response time (currently some searches can take more than 11sec)
- API to add/update/delete documents in the collection
- way to add synonyms or similar words for misspelled ones (ex. Sony =
Soni)
- way to define relevance of results (ex. if I search for LCD, return
products that belong to the LCD category, contain LCD in the product
definition or are marked as special offers)

I know that I may have to add external code, for example, to take the
results and apply some business logic to re-sort the results, but I'd like to
know, besides the wiki and the Solr 1.4 Enterprise Search Server book (which
I am considering buying), any tips for solr usage.


[OT] Webinar on spatial search using Lucene and Solr

2009-11-15 Thread Grant Ingersoll
From Here to There, You Can Find it Anywhere:
> Building Local/Geo-Search
> with Apache Lucene and Solr
 
Join us for a free webinar hosted by TechTarget / TheServerSide.com
> Wednesday, November 18th 2009
> 10:00 AM PST / 1:00 PM EST
 
Click here to sign up
> http://theserversidecom.bitpipe.com/detail/RES/1257457967_42.html&asrc=CL_PRM_Lucid_11_18_09_c&li=252934
 
With new advances in the flexibility and customizability of Apache Lucene/Solr 
open source search, the ubiquity of location-aware devices and vast amounts of 
spatial data, tremendous opportunities open up to deliver more powerful and 
effective geo-aware search results.
 
We'll hear from Grant Ingersoll, co-founder of Lucid Imagination and chairman 
of the Apache Lucene PMC, for an in-depth technical workshop on the potential 
and application of the newly released Lucene and Solr geo-search functions. 
Grant will be joined by thought leaders: Ryan McKinley, co-founder of Voyager 
GIS and Apache Lucene PMC member; and Sameer Maggon, of AT&T Interactive, which 
manages and delivers online and mobile advertising products across AT&T's media 
platforms.
 
> Features and benefits of using spatial data in a search engine
> Representing and leveraging spatial data in Lucene to empower Local Search
> Spatial search in action, a peek at Voyager GIS, a tool to index and search
>   geographic data
> How AT&T Interactive uses Solr/Lucene to power local search at YP.com
 
Click here to sign up
> http://theserversidecom.bitpipe.com/detail/RES/1257457967_42.html&asrc=CL_PRM_Lucid_11_18_09_c&li=252934

> About the presenters:
 
Grant Ingersoll
> Co-founder of Lucid Imagination
> Grant Ingersoll, co-founder of Lucid Imagination, is a published expert in 
> search and Natural Language Processing, with many articles published on 
> Lucene, Solr, findability, relevance, and is co-founder of the Apache Mahout 
> machine learning project. Grant's the author of the forthcoming book "Taming 
> Text", from Manning publications.
 
Ryan McKinley
> Co-founder of Voyager GIS
> Ryan McKinley, co-founder of Voyager GIS, works with technology to help find, 
> share, and distribute information. He has built many sites using solr, 
> including: ludb.clui.org and www.digitalcommonwealth.org. He was a partner at
> Squid Labs and co-founded www.instructables.com. Ryan is a member of Lucid 
> Imagination's Technical Advisory Board.
 
Sameer Maggon
> AT&T Interactive
> Sameer Maggon leads the Search Engineering Team at AT&T Interactive. He 
> helped the company launch YP.com that uses Solr underneath. Before joining 
> AT&T Interactive, he worked with Siderean (http://www.siderean.com) working 
> on an enterprise search and navigation product that used Lucene and was 
> ultimately responsible for delivering the technology behind their new 
> product. Sameer has been an active Lucene user since 2001.
>

Re: solr stops running periodically

2009-11-15 Thread Grant Ingersoll
Have you looked in other logs, like your syslogs?  I've never seen Solr/Tomcat 
just disappear w/o so much as a blip.  I'd think if a process just died from an 
error condition there would be some note of it somewhere.  I'd try to find some 
other events taking place at that time which might give a hint.

On Nov 15, 2009, at 1:45 PM, athir nuaimi wrote:

>> We have 4 machines running solr.  On one of the machines, every 2-3 days 
>> solr stops running.  By that I mean that the java/tomcat process just 
>> disappears.  If I look at the catalina logs, I see normal log entries and 
> >> then nothing.  There are no shutdown messages like you would normally see if
>> you sent a SIGTERM to the process.
>> 
> >> Obviously this is a problem. I'm new to solr/java so if there are more
>> diagnostic things I can do I'd appreciate any tips/advice.
>> 
>> thanks in advance
>> Athir
> 




RE: Segment file not found error - after replicating

2009-11-15 Thread Maduranga Kannangara
Yes. We have tried Solr 1.4 and so far it's been a great success.

Still I am investigating why Solr 1.3 gave an issue like before.

Currently it seems to me that
org.apache.lucene.index.SegmentInfos.FindSegmentFile.run() is not able to
figure out the correct segment file name. (Maybe an index replication issue --
leading to "not fully replicated".. but it's so hard to believe, as both master
and slave have 100% the same data now!)

Anyway.. will keep on trying till I find something useful.. and will let you 
know.


Thanks
Madu


-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Wednesday, 11 November 2009 10:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Segment file not found error - after replicating

It sounds like your index is not being fully replicated.  I can't tell why, but 
I can suggest you try the new Solr 1.4 replication.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Maduranga Kannangara 
> To: "solr-user@lucene.apache.org" 
> Sent: Tue, November 10, 2009 5:42:44 PM
> Subject: RE: Segment file not found error - after replicating
>
> Thanks Otis,
>
> I did the du -s for all three index directories as you said right after
> replicating and when I find errors.
>
> All three gave me the exact same value. This time I found the error in a 
> rather
> small index too (31Mb).
>
> BTW, if I copy the segment_x file to where Solr is looking for it, and restart the
> Solr web-app from the Tomcat manager, this resolves it. But it's just a workaround,
> never good enough for production deployments.
>
> My next plan is to do a remote debug to see what exactly is happening in the
> code.
>
> Any other things I should be looking at?
> Any help is really appreciated on this matter.
>
> Thanks
> Madu
>
>
> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Tuesday, 10 November 2009 1:14 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Segment file not found error - after replicating
>
> Madu,
>
> So are you saying that all slaves have the exact same index, and that index is
> exactly the same as the one on the master, yet only some of those slaves 
> exhibit
> this error, while others do not?  Mind listing index directories of 1) master 
> 2)
> slave without errors, 3) slave with errors and doing:
> du -s /path/to/index/on/master
> du -s /path/to/index/on/slave/without/errors
> du -s /path/to/index/on/slave/with/errors
>
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> - Original Message 
> > From: Maduranga Kannangara
> > To: "solr-user@lucene.apache.org"
> > Sent: Mon, November 9, 2009 7:47:04 PM
> > Subject: RE: Segment file not found error - after replicating
> >
> > Thanks Otis!
> >
> > Yes, I checked the index directories and they are 100% same, both timestamp
> and
> > size wise.
> >
> > Not all the slaves face this issue. I would say roughly 50% has this 
> > trouble.
> >
> > Logs do not have any errors too :-(
> >
> > Any other things I should do/look at?
> >
> > Cheers
> > Madu
> >
> >
> > -Original Message-
> > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> > Sent: Tuesday, 10 November 2009 9:26 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Segment file not found error - after replicating
> >
> > It's hard to troubleshoot blindly like this, but have you tried manually
> > comparing the contents of the index dir on the master and on the slave(s)?
> > If they are out of sync, have you tried forcing of replication to see if one
> of
> > the subsequent replication attempts gets the dirs in sync?
> > Do you have more than 1 slave and do they all start having this problem at 
> > the
> > same time?
> > Any errors in the logs for any of the scripts involved in replication in 
> > 1.3?
> >
> > Otis
> > --
> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >
> >
> >
> > - Original Message 
> > > From: Maduranga Kannangara
> > > To: "solr-user@lucene.apache.org"
> > > Sent: Sun, November 8, 2009 10:30:44 PM
> > > Subject: Segment file not found error - after replicating
> > >
> > > Hi guys,
> > >
> > > We use Solr 1.3 for indexing large amounts of data (50G avg) on Linux
> > > environment and use the replication scripts to make replicas those live in
> > load
> > > balancing slaves.
> > >
> > > The issue we face quite often (only in Linux servers) is that they tend to
> not
> >
> > > been able to find the segment file (segment_x etc) after the replicating
> > > completed. As this has become quite common, we started hitting a serious
> > issue.
> > >
> > > Below is a stack trace, if that helps and any help on this matter is 
> > > greatly
> > > appreciated.
> > >
> > > --

RE: Segment file not found error - after replicating

2009-11-15 Thread Maduranga Kannangara
Just found out the root cause:

* The segments.gen file does not get replicated to the slave all the time.

For some reason, this small (20-byte) file lives in memory and does not get
written to the master's hard disk. Therefore it is obviously not transferred to
the slaves.

The solution was to shut down the master web app (it must be a clean shut down!,
not a kill of Tomcat). Then do the replication.

Also, if the timestamp/size (the size won't change anyway!) is not changed, Rsync
does not seem to copy this file over either. So forcing the copy in the
replication scripts solved the problem.

Thanks Otis and everyone for all your support!

Madu


-Original Message-
From: Maduranga Kannangara
Sent: Monday, 16 November 2009 12:37 PM
To: solr-user@lucene.apache.org
Subject: RE: Segment file not found error - after replicating

Yes. We have tried Solr 1.4 and so far it's been a great success.

Still I am investigating why Solr 1.3 gave an issue like before.

Currently it seems to me that
org.apache.lucene.index.SegmentInfos.FindSegmentFile.run() is not able to
figure out the correct segment file name. (Maybe an index replication issue --
leading to "not fully replicated".. but it's so hard to believe, as both master
and slave have 100% the same data now!)

Anyway.. will keep on trying till I find something useful.. and will let you 
know.


Thanks
Madu


-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Wednesday, 11 November 2009 10:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Segment file not found error - after replicating

It sounds like your index is not being fully replicated.  I can't tell why, but 
I can suggest you try the new Solr 1.4 replication.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Maduranga Kannangara 
> To: "solr-user@lucene.apache.org" 
> Sent: Tue, November 10, 2009 5:42:44 PM
> Subject: RE: Segment file not found error - after replicating
>
> Thanks Otis,
>
> I did the du -s for all three index directories as you said right after
> replicating and when I find errors.
>
> All three gave me the exact same value. This time I found the error in a 
> rather
> small index too (31Mb).
>
> BTW, if I copy the segment_x file to where Solr is looking for it, and restart the
> Solr web-app from the Tomcat manager, this resolves it. But it's just a workaround,
> never good enough for production deployments.
>
> My next plan is to do a remote debug to see what exactly is happening in the
> code.
>
> Any other things I should be looking at?
> Any help is really appreciated on this matter.
>
> Thanks
> Madu
>
>
> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Tuesday, 10 November 2009 1:14 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Segment file not found error - after replicating
>
> Madu,
>
> So are you saying that all slaves have the exact same index, and that index is
> exactly the same as the one on the master, yet only some of those slaves 
> exhibit
> this error, while others do not?  Mind listing index directories of 1) master 
> 2)
> slave without errors, 3) slave with errors and doing:
> du -s /path/to/index/on/master
> du -s /path/to/index/on/slave/without/errors
> du -s /path/to/index/on/slave/with/errors
>
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> - Original Message 
> > From: Maduranga Kannangara
> > To: "solr-user@lucene.apache.org"
> > Sent: Mon, November 9, 2009 7:47:04 PM
> > Subject: RE: Segment file not found error - after replicating
> >
> > Thanks Otis!
> >
> > Yes, I checked the index directories and they are 100% same, both timestamp
> and
> > size wise.
> >
> > Not all the slaves face this issue. I would say roughly 50% has this 
> > trouble.
> >
> > Logs do not have any errors too :-(
> >
> > Any other things I should do/look at?
> >
> > Cheers
> > Madu
> >
> >
> > -Original Message-
> > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> > Sent: Tuesday, 10 November 2009 9:26 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Segment file not found error - after replicating
> >
> > It's hard to troubleshoot blindly like this, but have you tried manually
> > comparing the contents of the index dir on the master and on the slave(s)?
> > If they are out of sync, have you tried forcing of replication to see if one
> of
> > the subsequent replication attempts gets the dirs in sync?
> > Do you have more than 1 slave and do they all start having this problem at 
> > the
> > same time?
> > Any errors in the logs for any of the scripts involved in replication in 
> > 1.3?
> >
> > Otis
> > --
> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >
> >
> >
> > - Original Message 

Re: Segment file not found error - after replicating

2009-11-15 Thread Mark Miller
That's odd - that file is normally not used - it's a backup method to
figure out the current generation in case it cannot be determined with a
directory listing - it's basically for NFS.

Maduranga Kannangara wrote:
> Just found out the root cause:
>
> * The segments.gen file does not get replicated to the slave all the time.
>
> For some reason, this small (20-byte) file lives in memory and does not get
> written to the master's hard disk. Therefore it is obviously not transferred
> to the slaves.
>
> The solution was to shut down the master web app (it must be a clean shut down!,
> not a kill of Tomcat). Then do the replication.
>
> Also, if the timestamp/size (the size won't change anyway!) is not changed, Rsync
> does not seem to copy this file over either. So forcing the copy in the
> replication scripts solved the problem.
>
> Thanks Otis and everyone for all your support!
>
> Madu
>
>
> -Original Message-
> From: Maduranga Kannangara
> Sent: Monday, 16 November 2009 12:37 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Segment file not found error - after replicating
>
> Yes. We have tried Solr 1.4 and so far it's been a great success.
>
> Still I am investigating why Solr 1.3 gave an issue like before.
>
> Currently it seems to me that
> org.apache.lucene.index.SegmentInfos.FindSegmentFile.run() is not able to
> figure out the correct segment file name. (Maybe an index replication issue --
> leading to "not fully replicated".. but it's so hard to believe, as both master
> and slave have 100% the same data now!)
>
> Anyway.. will keep on trying till I find something useful.. and will let you 
> know.
>
>
> Thanks
> Madu
>
>
> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Wednesday, 11 November 2009 10:03 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Segment file not found error - after replicating
>
> It sounds like your index is not being fully replicated.  I can't tell why, 
> but I can suggest you try the new Solr 1.4 replication.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> - Original Message 
>   
>> From: Maduranga Kannangara 
>> To: "solr-user@lucene.apache.org" 
>> Sent: Tue, November 10, 2009 5:42:44 PM
>> Subject: RE: Segment file not found error - after replicating
>>
>> Thanks Otis,
>>
>> I did the du -s for all three index directories as you said right after
>> replicating and when I find errors.
>>
>> All three gave me the exact same value. This time I found the error in a 
>> rather
>> small index too (31Mb).
>>
>> BTW, if I copy the segment_x file to where Solr is looking for it, and restart
>> the Solr web-app from the Tomcat manager, this resolves it. But it's just a
>> workaround, never good enough for production deployments.
>>
>> My next plan is to do a remote debug to see what exactly is happening in the
>> code.
>>
>> Any other things I should be looking at?
>> Any help is really appreciated on this matter.
>>
>> Thanks
>> Madu
>>
>>
>> -Original Message-
>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>> Sent: Tuesday, 10 November 2009 1:14 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Segment file not found error - after replicating
>>
>> Madu,
>>
>> So are you saying that all slaves have the exact same index, and that index 
>> is
>> exactly the same as the one on the master, yet only some of those slaves 
>> exhibit
>> this error, while others do not?  Mind listing index directories of 1) 
>> master 2)
>> slave without errors, 3) slave with errors and doing:
>> du -s /path/to/index/on/master
>> du -s /path/to/index/on/slave/without/errors
>> du -s /path/to/index/on/slave/with/errors
>>
>>
>> Otis
>> --
>> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>>
>>
>>
>> - Original Message 
>> 
>>> From: Maduranga Kannangara
>>> To: "solr-user@lucene.apache.org"
>>> Sent: Mon, November 9, 2009 7:47:04 PM
>>> Subject: RE: Segment file not found error - after replicating
>>>
>>> Thanks Otis!
>>>
>>> Yes, I checked the index directories and they are 100% same, both timestamp
>>>   
>> and
>> 
>>> size wise.
>>>
>>> Not all the slaves face this issue. I would say roughly 50% has this 
>>> trouble.
>>>
>>> Logs do not have any errors too :-(
>>>
>>> Any other things I should do/look at?
>>>
>>> Cheers
>>> Madu
>>>
>>>
>>> -Original Message-
>>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>>> Sent: Tuesday, 10 November 2009 9:26 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Segment file not found error - after replicating
>>>
>>> It's hard to troubleshoot blindly like this, but have you tried manually
>>> comparing the contents of the index dir on the master and on the slave(s)?
>>> If they are out of sync, have you tried forcing of replication to see if one
>>>   
>> of
>> 
>>> the subseq

RE: Segment file not found error - after replicating

2009-11-15 Thread Maduranga Kannangara
Yes, I too believed so..

The logic in the method mentioned earlier does the "gen number calculation"
using the segment files available (genA) and the segments.gen file content
(genB). Whichever is larger would be the gen number used to look up the
segment file.

When the file is not properly replicated (because it is not written to the
hard disk, or not rsynced) and the gen number in the segments.gen file (genB)
is larger than the file-based calculation (genA), we hit the issue described above.
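In pseudo-Java, the selection boils down to this (a paraphrase of the
behaviour described above, not the actual Lucene source):

// Paraphrase of the generation selection -- not the real FindSegmentsFile
// code, just the part that matters for this bug.
class GenerationChoice {
    static long chooseGeneration(long genA, long genB) {
        // genA: highest generation found by listing segments_N files on disk
        // genB: generation recorded inside segments.gen
        // If a stale-but-larger genB wins, Solr looks for a segments_N file
        // that was never replicated -- hence "segment file not found".
        return Math.max(genA, genB);
    }
}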

Cheers
Madu


-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Monday, 16 November 2009 2:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Segment file not found error - after replicating

That's odd - that file is normally not used - it's a backup method to
figure out the current generation in case it cannot be determined with a
directory listing - it's basically for NFS.

Maduranga Kannangara wrote:
> Just found out the root cause:
>
> * The segments.gen file does not get replicated to the slave all the time.
>
> For some reason, this small (20-byte) file lives in memory and does not get
> written to the master's hard disk. Therefore it is obviously not transferred
> to the slaves.
>
> The solution was to shut down the master web app (it must be a clean shut down!,
> not a kill of Tomcat). Then do the replication.
>
> Also, if the timestamp/size (the size won't change anyway!) is not changed, Rsync
> does not seem to copy this file over either. So forcing the copy in the
> replication scripts solved the problem.
>
> Thanks Otis and everyone for all your support!
>
> Madu
>
>
> -Original Message-
> From: Maduranga Kannangara
> Sent: Monday, 16 November 2009 12:37 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Segment file not found error - after replicating
>
> Yes. We have tried Solr 1.4 and so far it's been a great success.
>
> Still I am investigating why Solr 1.3 gave an issue like before.
>
> Currently it seems to me that
> org.apache.lucene.index.SegmentInfos.FindSegmentFile.run() is not able to
> figure out the correct segment file name. (Maybe an index replication issue --
> leading to "not fully replicated".. but it's so hard to believe, as both master
> and slave have 100% the same data now!)
>
> Anyway.. will keep on trying till I find something useful.. and will let you 
> know.
>
>
> Thanks
> Madu
>
>
> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Wednesday, 11 November 2009 10:03 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Segment file not found error - after replicating
>
> It sounds like your index is not being fully replicated.  I can't tell why, 
> but I can suggest you try the new Solr 1.4 replication.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> - Original Message 
>
>> From: Maduranga Kannangara 
>> To: "solr-user@lucene.apache.org" 
>> Sent: Tue, November 10, 2009 5:42:44 PM
>> Subject: RE: Segment file not found error - after replicating
>>
>> Thanks Otis,
>>
>> I did the du -s for all three index directories as you said right after
>> replicating and when I find errors.
>>
>> All three gave me the exact same value. This time I found the error in a 
>> rather
>> small index too (31Mb).
>>
>> BTW, if I copy the segment_x file to where Solr is looking for it, and restart
>> the Solr web-app from the Tomcat manager, this resolves it. But it's just a
>> workaround, never good enough for production deployments.
>>
>> My next plan is to do a remote debug to see what exactly is happening in the
>> code.
>>
>> Any other things I should be looking at?
>> Any help is really appreciated on this matter.
>>
>> Thanks
>> Madu
>>
>>
>> -Original Message-
>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>> Sent: Tuesday, 10 November 2009 1:14 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Segment file not found error - after replicating
>>
>> Madu,
>>
>> So are you saying that all slaves have the exact same index, and that index 
>> is
>> exactly the same as the one on the master, yet only some of those slaves 
>> exhibit
>> this error, while others do not?  Mind listing index directories of 1) 
>> master 2)
>> slave without errors, 3) slave with errors and doing:
>> du -s /path/to/index/on/master
>> du -s /path/to/index/on/slave/without/errors
>> du -s /path/to/index/on/slave/with/errors
>>
>>
>> Otis
>> --
>> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>>
>>
>>
>> - Original Message 
>>
>>> From: Maduranga Kannangara
>>> To: "solr-user@lucene.apache.org"
>>> Sent: Mon, November 9, 2009 7:47:04 PM
>>> Subject: RE: Segment file not found error - after replicating
>>>
>>> Thanks Otis!
>>>
>>> Yes, I checked the index directories and they are 100% same, both timestamp
>>>
>> and
>>
>>> size wise.
>>>
>>> Not all the slaves face this issu

Re: Newbie Solr questions

2009-11-15 Thread Peter Wolanin
Take a look at the example schema - you can have dynamic fields that
are used, based on wildcard matching against the field name, if a field
doesn't match the name of an existing field.
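For example, with the stock example schema (which defines a "*_s"
dynamic field) something like the following untested sketch should work:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DynamicFieldSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");
        // the field name is invented at runtime; the "*_s" dynamicField
        // rule in schema.xml decides how it is indexed/stored
        doc.addField("myField_s", "some value");
        server.add(doc);
        server.commit();
    }
}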

-Peter

On Sun, Nov 15, 2009 at 10:50 AM, yz5od2  wrote:
> Thanks for the reply:
>
> I follow the schema.xml concept, but what if my requirement is more dynamic
> in nature? I.E. I would like my developers to be able to annotate a POJO and
> submit it to the Solr server (embedded) to be indexed according to public
> properties OR annotations. Is that possible?
>
> If that is not possible, can I programmatically define documents and fields
> (and the field options) in straight Java? I.E. in pseudo code below...
>
> // this is made up but this is what I would like to be able to do
> SolrDoc document = new SolrDoc();
> SolrField field = new SolrField()
> field.isIndexed=true;
> field.isStored=true;
> field.name = 'myField'
>
> field.value = myPOJO.getValue();
>
> solrServer.index(document);
>
>
>
>
>
> On Nov 15, 2009, at 12:50 AM, Avlesh Singh wrote:
>
>>>
>>> a) Since Solr is built on top of lucene, using SolrJ, can I still
>>> directly
>>> create custom documents, specify the field specifics etc (indexed, stored
>>> etc) and then map POJOs to those documents, similar to just using the
>>> straight lucene API?
>>>
>>> b) I took a quick look at the SolrJ javadocs but did not see anything in
>>> there that allowed me to customize if a field is stored, indexed, not
>>> indexed etc. How do I do that with SolrJ without having to go directly to
>>> the lucene apis?
>>>
>>> c) The SolrJ beans package. By annotating a POJO with @Field, how exactly
>>> does SolrJ treat that field? Indexed/stored, or just indexed? Is there
>>> any
>>> other way to control this?
>>>
>> The answer to all your questions above is the magical file called
>> schema.xml. For more read here - http://wiki.apache.org/solr/SchemaXml.
>> SolrJ is simply a java client to access (read and update from) the solr
>> server.
>>
>> d) If I create a custom index outside of Solr using straight lucene, is it
>>>
>>> easy to import a pre-existing lucene index into a Solr Server?
>>>
>> As long as the Lucene index matches the definitions in your schema you can
>> use the same index. The data however needs to be copied into a predictable
>> location inside SOLR_HOME.
>>
>> Cheers
>> Avlesh
>>
>> On Sun, Nov 15, 2009 at 9:26 AM, yz5od2
>> wrote:
>>
>>> Hi,
>>> I am new to Solr but fairly advanced with lucene.
>>>
>>> In the past I have created custom Lucene search engines that indexed
>>> objects in a Java application, so my background is coming from this
>>> requirement
>>>
>>> a) Since Solr is built on top of lucene, using SolrJ, can I still
>>> directly
>>> create custom documents, specify the field specifics etc (indexed, stored
>>> etc) and then map POJOs to those documents, similar to just using the
>>> straight lucene API?
>>>
>>> b) I took a quick look at the SolrJ javadocs but did not see anything in
>>> there that allowed me to customize if a field is stored, indexed, not
>>> indexed etc. How do I do that with SolrJ without having to go directly to
>>> the lucene apis?
>>>
>>> c) The SolrJ beans package. By annotating a POJO with @Field, how exactly
>>> does SolrJ treat that field? Indexed/stored, or just indexed? Is there
>>> any
>>> other way to control this?
>>>
>>> d) If I create a custom index outside of Solr using straight lucene, is
>>> it
>>> easy to import a pre-existing lucene index into a Solr Server?
>>>
>>> thanks!
>>>
>
>



-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Re: Newbie Solr questions

2009-11-15 Thread yz5od2
OK, so what I am hearing is that there is no way to create custom documents/
fields via the SolrJ client at runtime. Instead you have to use the
schema.xml ahead of time OR create a custom index via the lucene APIs
and then import the index into Solr for searching?




On Nov 15, 2009, at 9:16 PM, Peter Wolanin wrote:


Take a look at the example schema - you can have dynamic fields that
are used, based on wildcard matching against the field name, if a field
doesn't match the name of an existing field.

-Peter

On Sun, Nov 15, 2009 at 10:50 AM, yz5od2 <outdo...@yahoo.com> wrote:

Thanks for the reply:

I follow the schema.xml concept, but what if my requirement is more  
dynamic
in nature? I.E. I would like my developers to be able to annotate a  
POJO and
submit it to the Solr server (embedded) to be indexed according to  
public

properties OR annotations. Is that possible?

If that is not possible, can I programmatically define documents and
fields
(and the field options) in straight Java? I.E. in pseudo code  
below...


// this is made up but this is what I would like to be able to do
SolrDoc document = new SolrDoc();
SolrField field = new SolrField()
field.isIndexed=true;
field.isStored=true;
field.name = 'myField'

field.value = myPOJO.getValue();

solrServer.index(document);





On Nov 15, 2009, at 12:50 AM, Avlesh Singh wrote:



a) Since Solr is built on top of lucene, using SolrJ, can I still
directly
create custom documents, specify the field specifics etc  
(indexed, stored
etc) and then map POJOs to those documents, similar to just using
the

straight lucene API?

b) I took a quick look at the SolrJ javadocs but did not see  
anything in
there that allowed me to customize if a field is stored, indexed,  
not
indexed etc. How do I do that with SolrJ without having to go  
directly to

the lucene apis?

c) The SolrJ beans package. By annotating a POJO with @Field, how  
exactly
does SolrJ treat that field? Indexed/stored, or just indexed? Is  
there

any
other way to control this?


The answer to all your questions above is the magical file called
schema.xml. For more read here - http://wiki.apache.org/solr/SchemaXml.
SolrJ is simply a java client to access (read and update from) the  
solr

server.

d) If I create a custom index outside of Solr using straight
lucene, is it


easy to import a pre-existing lucene index into a Solr Server?

As long as the Lucene index matches the definitions in your schema  
you can
use the same index. The data however needs to be copied into a
predictable

location inside SOLR_HOME.

Cheers
Avlesh

On Sun, Nov 15, 2009 at 9:26 AM, yz5od2
wrote:


Hi,
I am new to Solr but fairly advanced with lucene.

In the past I have created custom Lucene search engines that  
indexed

objects in a Java application, so my background is coming from this
requirement

a) Since Solr is built on top of lucene, using SolrJ, can I still
directly
create custom documents, specify the field specifics etc  
(indexed, stored
etc) and then map POJOs to those documents, similar to just using
the

straight lucene API?

b) I took a quick look at the SolrJ javadocs but did not see  
anything in
there that allowed me to customize if a field is stored, indexed,  
not
indexed etc. How do I do that with SolrJ without having to go  
directly to

the lucene apis?

c) The SolrJ beans package. By annotating a POJO with @Field, how  
exactly
does SolrJ treat that field? Indexed/stored, or just indexed? Is  
there

any
other way to control this?

d) If I create a custom index outside of Solr using straight
lucene, is

it
easy to import a pre-existing lucene index into a Solr Server?

thanks!








--
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com





Re: solr stops running periodically

2009-11-15 Thread Otis Gospodnetic
Look for the HotSpot dump files that Sun's Java leaves on disk when it dies.  I 
think their names start with "hs".  Luckily, I don't have any of them handy to 
tell you the exact name pattern.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Grant Ingersoll 
> To: solr-user@lucene.apache.org
> Sent: Sun, November 15, 2009 8:15:47 PM
> Subject: Re: solr stops running periodically
> 
> Have you looked in other logs, like your syslogs?  I've never seen 
> Solr/Tomcat 
> just disappear w/o so much as a blip.  I'd think if a process just died from 
> an 
> error condition there would be some note of it somewhere.  I'd try to find 
> some 
> other events taking place at that time which might give a hint.
> 
> On Nov 15, 2009, at 1:45 PM, athir nuaimi wrote:
> 
> >> We have 4 machines running solr.  On one of the machines, every 2-3 days
> >> solr stops running.  By that I mean that the java/tomcat process just
> >> disappears.  If I look at the catalina logs, I see normal log entries and
> >> then nothing.  There are no shutdown messages like you would normally see
> >> if you sent a SIGTERM to the process.
> >>
> >> Obviously this is a problem. I'm new to solr/java so if there are more
> >> diagnostic things I can do I'd appreciate any tips/advice.
> >> 
> >> thanks in advance
> >> Athir
> > 



Re: Newbie tips: migrating from mysql fulltext search / PHP integration

2009-11-15 Thread Otis Gospodnetic
Hi,

I'm not sure if you have a specific question there.
But regarding "PHP integration" part, I just learned PHP now has native Solr 
(1.3 and 1.4) support:

  http://twitter.com/otisg/status/5757184282


Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: mbneto 
> To: solr-user@lucene.apache.org
> Sent: Sun, November 15, 2009 4:56:15 PM
> Subject: Newbie tips: migrating from mysql fulltext search / PHP integration
> 
> Hi,
> 
> I am looking for alternatives to MySQL fulltext searches.  The combo
> Lucene/Solr is one of my options and I'd like to gather as much information
> as I can before choosing and even build a prototype.
>
> My current needs do not seem to be unusual:
>
> - fast response time (currently some searches can take more than 11sec)
> - API to add/update/delete documents in the collection
> - way to add synonyms or similar words for misspelled ones (ex. Sony =
> Soni)
> - way to define relevance of results (ex. if I search for LCD, return
> products that belong to the LCD category, contain LCD in the product
> definition or are marked as special offers)
>
> I know that I may have to add external code, for example, to take the
> results and apply some business logic to re-sort the results, but I'd like to
> know, besides the wiki and the Solr 1.4 Enterprise Search Server book (which
> I am considering buying), any tips for solr usage.



Re: Is there a way to skip cache for a query

2009-11-15 Thread Otis Gospodnetic
I don't think that is supported today.  It might be useful, though (e.g. 
something I'd use with an external monitoring service, so that it doesn't 
always get fast results from the cache).


Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Bertie Shen 
> To: solr-user@lucene.apache.org
> Sent: Sat, November 14, 2009 9:43:25 PM
> Subject: Is there a way to skip cache for a query
> 
> Hey,
> 
>   I do not want to disable cache completely by changing the setting in
> solrconfig.xml. I just want to sometimes skip the cache for a query for testing
> purposes. So is there a parameter like skipcache=true to specify in
> select/?q=hot&version=2.2&start=0&rows=10&skipcache=true to skip cache for
> the query [hot]. skipcache can by default be false.
> 
> Thanks.



Re: converting over from sphinx

2009-11-15 Thread Otis Gospodnetic
Something doesn't sound right here.  Why do you need wildcards for queries in 
the first place?
Are you finding that with stopword removal and stemming you are not matching 
some docs that you think should be matched?  If so, we may be able to help if 
you provide a few examples.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Cory Ondrejka 
> To: solr-user@lucene.apache.org
> Sent: Sat, November 14, 2009 12:57:56 PM
> Subject: converting over from sphinx
> 
> I've been using Sphinx for full text search, but since I want to move my
> project over to Heroku, need to switch to Solr. Everything's up and running
> using the acts_as_solr plugin, but I'm curious if I'm using Solr the right
> way.  In particular, I'm doing phrase searching into a corpus of
> descriptions, such as "I need help with a foo" where I have a bunch of "foo:
> a foo is a subset of a bar often used to create briznatzes", etc.
> 
> With Sphinx, I could convert "I need help with a foo" into "*need* *help*
> *with* *foo*" and get pretty nice matches. With Solr, my understanding is
> that you can only do wildcard matches on the suffix. In addition, stemming
> only happens on non-wildcard terms. So, my first thought would be to convert
> "I need help with a foo" into "need need* help help* with with* foo foo*".
> 
> Thanks in advance for any help.
> 
> -- 
> Cory Ondrejka
> cory.ondre...@gmail.com
> http://ondrejka.net/



RE: Is there a way to skip cache for a query

2009-11-15 Thread Jake Brownell
See https://issues.apache.org/jira/browse/SOLR-1363 -- it's currently scheduled 
for 1.5.

Jake

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Sunday, November 15, 2009 11:17 PM
To: solr-user@lucene.apache.org
Subject: Re: Is there a way to skip cache for a query

I don't think that is supported today.  It might be useful, though (e.g. 
something I'd use with an external monitoring service, so that it doesn't 
always get fast results from the cache).


Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Bertie Shen 
> To: solr-user@lucene.apache.org
> Sent: Sat, November 14, 2009 9:43:25 PM
> Subject: Is there a way to skip cache for a query
> 
> Hey,
> 
>   I do not want to disable cache completely by changing the setting in
> solrconfig.xml. I just want to sometimes skip the cache for a query for testing
> purposes. So is there a parameter like skipcache=true to specify in
> select/?q=hot&version=2.2&start=0&rows=10&skipcache=true to skip cache for
> the query [hot]. skipcache can by default be false.
> 
> Thanks.



Re: Some guide about setting up local/geo search at solr

2009-11-15 Thread Otis Gospodnetic
Nota bene:
My understanding is the external versions of Local Lucene/Solr are eventually 
going to be "deprecated" in favour of what we have in contrib.  Here's a stub 
page with a link to the spatial JIRA issue: 
http://wiki.apache.org/solr/SpatialSearch

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Bertie Shen 
> To: solr-user@lucene.apache.org
> Sent: Sat, November 14, 2009 3:32:01 AM
> Subject: Some guide about setting up local/geo search at solr
> 
> Hey,
> 
> I spent some time figuring out how to set up local/geo/spatial search in
> Solr. I hope the following description can help, given the current status.
> 
> 1) Download localsolr. I download it from
> http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and put jar
> file (in my case, localsolr-1.5.jar) in your application's WEB_INF/lib
> directory of application server.
> 
> 2) Download locallucene. I download it from
> http://sourceforge.net/projects/locallucene/ and put jar file (in my case,
> locallucene.jar in the locallucene_r2.0/dist/ directory) in your application's
> WEB_INF/lib directory of application server. I also need to copy
> gt2-referencing-2.3.1.jar, geoapi-nogenerics-2.1-M2.jar, and jsr108-0.01.jar
> under locallucene_r2.0/lib/ directory to WEB_INF/lib. Do not copy
> lucene-spatial-2.9.1.jar under Lucene codebase. The namespace has been
> changed from com.pjaol.blah.blah.blah to org.apache.blah blah.
> 
> 3) Update your solrconfig.xml and schema.xml. I copy it from
> http://www.gissearch.com/localsolr.
> 
> 4) Restart application server and try a query
> /solr/select?&qt=geo&lat=xx.xx&long=yy.yy&q=abc&radius=zz.



Re: Newbie tips: migrating from mysql fulltext search / PHP integration

2009-11-15 Thread Mattmann, Chris A (388J)
WOW, +1!! Great job, PHP!

Cheers,
Chris



On 11/15/09 10:13 PM, "Otis Gospodnetic"  wrote:

Hi,

I'm not sure if you have a specific question there.
But regarding "PHP integration" part, I just learned PHP now has native Solr 
(1.3 and 1.4) support:

  http://twitter.com/otisg/status/5757184282


Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: mbneto 
> To: solr-user@lucene.apache.org
> Sent: Sun, November 15, 2009 4:56:15 PM
> Subject: Newbie tips: migrating from mysql fulltext search / PHP integration
>
> Hi,
>
> I am looking for alternatives to MySQL fulltext searches.  The combo
> Lucene/Solr is one of my options and I'd like to gather as much information
> as I can before choosing and even build a prototype.
>
> My current needs do not seem to be unusual:
>
> - fast response time (currently some searches can take more than 11sec)
> - API to add/update/delete documents in the collection
> - way to add synonyms or similar words for misspelled ones (ex. Sony =
> Soni)
> - way to define relevance of results (ex. if I search for LCD, return
> products that belong to the LCD category, contain LCD in the product
> definition or are marked as special offers)
>
> I know that I may have to add external code, for example, to take the
> results and apply some business logic to re-sort the results, but I'd like to
> know, besides the wiki and the Solr 1.4 Enterprise Search Server book (which
> I am considering buying), any tips for solr usage.



++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++




Re: exclude some fields from copying dynamic fields | schema.xml

2009-11-15 Thread Vicky_Dev

Thanks for the response.

Defining the field is not working :(

Is there any way to stop the copy task for a particular set of values?

Thanks
~Vikrant



Lance Norskog-2 wrote:
> 
> There is no direct way.
> 
> Let's say you have a "nocopy_s" and you do not want a copy
> "nocopy_str_s". This might work: declare "nocopy_str_s" as a field and
> make it not indexed and not stored. I don't know if this will work.
> 
> It requires two overrides to work: 1) that declaring a field name that
> matches a wildcard will override the default wildcard rule, and 2)
> that "stored=false indexed=false" works.
> 
> On Fri, Nov 13, 2009 at 3:23 AM, Vicky_Dev
>  wrote:
>>
>> Hi,
>> we are using the following entry in schema.xml to make a copy of one type
>> of
>> dynamic field to another :
>> 
>>
>> Is it possible to exclude some fields from copying.
>>
>> We are using Solr1.3
>>
>> ~Vikrant
>>
>> --
>> View this message in context:
>> http://old.nabble.com/exclude-some-fields-from-copying-dynamic-fields-%7C-schema.xml-tp26335109p26335109.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com
> 
> 

-- 
View this message in context: 
http://old.nabble.com/exclude-some-fields-from-copying-dynamic-fields-%7C-schema.xml-tp26335109p26367099.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: javabin in .NET?

2009-11-15 Thread Noble Paul നോബിള്‍ नोब्ळ्
For a client the marshal() part is not important. unmarshal() is
probably all you need.

On Sun, Nov 15, 2009 at 12:36 AM, Mauricio Scheffer
 wrote:
> Original code is here: http://bit.ly/hkCbI
> I just started porting it here: http://bit.ly/37hiOs
> It needs: tests/debugging, porting NamedList, SolrDocument, SolrDocumentList
> Thanks for any help!
>
> Cheers,
> Mauricio
>
> 2009/11/14 Noble Paul നോബിള്‍ नोब्ळ् 
>
>> OK. Is there anyone trying it out? where is this code ? I can try to help
>> ..
>>
>> On Fri, Nov 13, 2009 at 8:10 PM, Mauricio Scheffer
>>  wrote:
>> > I meant the standard IO libraries. They are different enough that the
>> code
>> > has to be manually ported. There were some automated tools back when
>> > Microsoft introduced .Net, but IIRC they never really worked.
>> >
>> > Anyway it's not a big deal, it should be a straightforward job. Testing
>> it
>> > thoroughly cross-platform is another thing though.
>> >
>> > 2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् 
>> >
>> >> The javabin format does not have many dependencies. It may have 3-4
>> >> classes and that is it.
>> >>
>> >> On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer
>> >>  wrote:
>> >> > Nope. It has to be manually ported. Not so much because of the
>> language
>> >> > itself but because of differences in the libraries.
>> >> >
>> >> >
>> >> > 2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् 
>> >> >
>> >> Is there any tool to directly port Java to .NET? Then we can extract
>> >> out the client part of the javabin code and convert it.
>> >> >>
>> >> >> On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher <
>> erik.hatc...@gmail.com>
>> >> >> wrote:
>> >> >> > Has anyone looked into using the javabin response format from .NET
>> >> >> (instead
>> >> >> > of SolrJ)?
>> >> >> >
>> >> >> > It's mainly a curiosity.
>> >> >> >
>> >> >> > How much better could performance/bandwidth/throughput be?  How
>> >> difficult
>> >> >> > would it be to implement some .NET code (C#, I'd guess being the
>> best
>> >> >> > choice) to handle this response format?
>> >> >> >
>> >> >> > Thanks,
>> >> >> >        Erik
>> >> >> >
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> -
>> >> >> Noble Paul | Principal Engineer| AOL | http://aol.com
>> >> >>
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> -
>> >> Noble Paul | Principal Engineer| AOL | http://aol.com
>> >>
>> >
>>
>>
>>
>> --
>> -
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: javabin in .NET?

2009-11-15 Thread Noble Paul നോബിള്‍ नोब्ळ्
Start with a JavabinDecoder only, so that the class is simple to start with.

2009/11/16 Noble Paul നോബിള്‍  नोब्ळ् :
> For a client the marshal() part is not important. unmarshal() is
> probably all you need.
>
> On Sun, Nov 15, 2009 at 12:36 AM, Mauricio Scheffer
>  wrote:
>> Original code is here: http://bit.ly/hkCbI
>> I just started porting it here: http://bit.ly/37hiOs
>> It needs: tests/debugging, porting NamedList, SolrDocument, SolrDocumentList
>> Thanks for any help!
>>
>> Cheers,
>> Mauricio
>>
>> 2009/11/14 Noble Paul നോബിള്‍ नोब्ळ् 
>>
>>> OK. Is there anyone trying it out? where is this code ? I can try to help
>>> ..
>>>
>>> On Fri, Nov 13, 2009 at 8:10 PM, Mauricio Scheffer
>>>  wrote:
>>> > I meant the standard IO libraries. They are different enough that the
>>> code
>>> > has to be manually ported. There were some automated tools back when
>>> > Microsoft introduced .Net, but IIRC they never really worked.
>>> >
>>> > Anyway it's not a big deal, it should be a straightforward job. Testing
>>> it
>>> > thoroughly cross-platform is another thing though.
>>> >
>>> > 2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् 
>>> >
>>> >> The javabin format does not have many dependencies. It may have 3-4
>>> >> classes and that is it.
>>> >>
>>> >> On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer
>>> >>  wrote:
>>> >> > Nope. It has to be manually ported. Not so much because of the
>>> language
>>> >> > itself but because of differences in the libraries.
>>> >> >
>>> >> >
>>> >> > 2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् 
>>> >> >
>>> >> Is there any tool to directly port Java to .NET? Then we can extract
>>> >> out the client part of the javabin code and convert it.
>>> >> >>
>>> >> >> On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher <
>>> erik.hatc...@gmail.com>
>>> >> >> wrote:
>>> >> >> > Has anyone looked into using the javabin response format from .NET
>>> >> >> (instead
>>> >> >> > of SolrJ)?
>>> >> >> >
>>> >> >> > It's mainly a curiosity.
>>> >> >> >
>>> >> >> > How much better could performance/bandwidth/throughput be?  How
>>> >> difficult
>>> >> >> > would it be to implement some .NET code (C#, I'd guess being the
>>> best
>>> >> >> > choice) to handle this response format?
>>> >> >> >
>>> >> >> > Thanks,
>>> >> >> >        Erik
>>> >> >> >
>>> >> >> >
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> -
>>> >> >> Noble Paul | Principal Engineer| AOL | http://aol.com
>>> >> >>
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> -
>>> >> Noble Paul | Principal Engineer| AOL | http://aol.com
>>> >>
>>> >
>>>
>>>
>>>
>>> --
>>> -
>>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>>
>>
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Solr date and string search problem

2009-11-15 Thread ashokcz

Hi Lance Norskog,
Thanks for your reply.

Let me first put down the config file details.
These are the fields I have defined:

  [schema.xml field definitions stripped by the mail archive]


And this is my requestHandler configuration:




<requestHandler name="..." class="solr.DisMaxRequestHandler">
  <lst name="defaults">
   <str name="echoParams">explicit</str>
   <float name="tie">0.01</float>
   <str name="qf">PlantSearch^1 GeographySearch^1 RegionSearch^1
    CountrySearch^1 BusUnitSearch^1 BusinessFunctionSearch^1
    Businessprocesses^1 LifecycleStatus^1 ApplicationNature^1 UploadedDate^1</str>
   <str name="pf">PlantSearch^1 GeographySearch^1 RegionSearch^1
    CountrySearch^1 BusUnitSearch^1 BusinessFunctionSearch^1
    Businessprocesses^1 LifecycleStatus^1 ApplicationNature^1 UploadedDate^1</str>
   <str name="fl">*,score</str>
   <str name="bf">ord(popularity)^0.5 recip(rord(popularity),1,1000,1000)^0.3</str>
   <str name="q.alt">*:*</str>
   <str name="mm">10&lt;50%</str>
  </lst>
</requestHandler>


And this is the query that's been fired:


 
facet.limit=-1&rows=10&start=0&facet=true&facet.mincount=1&facet.field=Geography&facet.field=Country&facet.field=Functionality&facet.field=BusinessFunction&facet.field=BusUnit&facet.field=Region&facet.field=PGServiceManager&facet.field=AppName&facet.field=Plant&facet.field=status&q=Behavior&facet.sort=true

I clearly understand where the problem is happening, but I don't know how
to resolve it.

I have defined UploadedDate as a date field, and I have told my request
handler to search in the UploadedDate field as well (UploadedDate^1).

But what happens is that every query that is fired gets parsed as a date
for that field, and it throws me an error.
If I remove UploadedDate from the request handler it works fine.

So I don't know how to have some string fields and some date fields
coexist in one request handler. According to the given query, Solr should
search in all the fields and give me the results back.
Is there a way to do that?
Sorry for such a long response :)

  thanks 
  ---
  Ashok
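
A common workaround for this (a sketch only, untested against this schema):
keep the date out of qf and search a string copy of it instead. The field
name UploadedDateSearch below is made up for illustration:

  <field name="UploadedDateSearch" type="string" indexed="true" stored="false"/>
  <copyField source="UploadedDate" dest="UploadedDateSearch"/>

Then list UploadedDateSearch^1 in qf in place of UploadedDate^1, and apply
any real date filtering with a separate fq range query on UploadedDate.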


  



Lance Norskog-2 wrote:
> 
> This line is the key:
>> SEVERE: org.apache.solr.core.SolrException: Invalid Date
>> String:'Behavior'
>>at org.apache.solr.schema.DateField.toInternal(DateField.java:108)
>>at
> 
> The string 'Behavior' is being parsed as a date, and fails. Your query
> is attempting to find this as a date. Please post your query, and the
> requestHandler configuration that it is used against.
> 
> On Sat, Nov 14, 2009 at 4:16 AM, ashokcz 
> wrote:
>>
>> Hi,
>> I have been using Solr 1.2 for a year and now I'm facing a weird problem.
>> Till now I have used only string and number Solr types for the search
>> fields, and whatever string the users search for is passed on to the
>> search engine, which finds it in the appropriate fields and returns me
>> the results.
>>
>> But now I have added another field with type DATE and made it a search
>> field.
>> So what happens is that whatever string I give to Solr, it tries to
>> convert it to a date and throws me an error.
>>
>>  Nov 14, 2009 4:36:05 PM org.apache.solr.core.SolrException log
>> SEVERE: org.apache.solr.core.SolrException: Invalid Date
>> String:'Behavior'
>>        at org.apache.solr.schema.DateField.toInternal(DateField.java:108)
>>        at
>> org.apache.solr.schema.FieldType$DefaultAnalyzer$1.next(FieldType.java:298)
>>        at
>> org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:437)
>>        at
>> org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:78)
>>        at
>> org.apache.solr.util.SolrPluginUtils$DisjunctionMaxQueryParser.getFieldQuery(SolrPluginUtils.java:774)
>>        at
>> org.apache.solr.util.SolrPluginUtils$DisjunctionMaxQueryParser.getFieldQuery(SolrPluginUtils.java:762)
>>        at
>> org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1092)
>>        at
>> org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:979)
>>        at
>> org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:907)
>>        at
>> org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:896)
>>        at
>> org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:146)
>>        at
>> org.apache.solr.request.DisMaxRequestHandler.handleRequestBody(DisMaxRequestHandler.java:238)
>>        at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:706)
>>        at
>> com.tcs.mighty.cb.service.CoreEntitySearcherImpl.solrSearch(CoreEntitySearcherImpl.java:896)
>>        at
>> com.tcs.mighty.cb.service.CoreEntitySearcherImpl.search(CoreEntitySearcherImpl.java:342)
>>        at
>> com.tcs.mighty.cb.service.CoreEntitySearcherImpl.handleGetSearchResults(CoreEntitySearcherImpl.java:52)
>>        at
>> com.tcs.mighty.cb.service.CoreEntitySearcherBase.getSearchResults(CoreEntitySearcherBase.java:122)
>>        at sun.reflect.NativeMethodAccessorImpl.invo

Re: Newbie tips: migrating from mysql fulltext search / PHP integration

2009-11-15 Thread Israel Ekpo
On Mon, Nov 16, 2009 at 12:34 AM, Mattmann, Chris A (388J) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> WOW, +1!! Great job, PHP!
>
> Cheers,
> Chris
>
>
>
> On 11/15/09 10:13 PM, "Otis Gospodnetic" 
> wrote:
>
> Hi,
>
> I'm not sure if you have a specific question there.
> But regarding "PHP integration" part, I just learned PHP now has native
> Solr (1.3 and 1.4) support:
>
>  http://twitter.com/otisg/status/5757184282
>
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> - Original Message 
> > From: mbneto 
> > To: solr-user@lucene.apache.org
> > Sent: Sun, November 15, 2009 4:56:15 PM
> > Subject: Newbie tips: migrating from mysql fulltext search / PHP
> integration
> >
> > Hi,
> >
> > I am looking for alternatives to MySQL fulltext searches.  The combo
> > Lucene/Solr is one of my options and I'd like to gather as much
> information
> > I can before choosing and even build a prototype.
> >
> > My current need does not seem to be different.
> >
> > - fast response time (currently some searches can take more than 11sec)
> > - API to add/update/delete documents to the collection
> > - way to add synonymous or similar words for misspelled ones (ex. Sony =
> > Soni)
> > - way to define relevance of results (ex. If I search for LCD return
> > products that belong to the LCD category, contains LCD in the product
> > definition or are marked as special offer)
> >
> > I know that I may have to add external code, for example, to take the
> > results and apply some business logic to re-sort the results but I'd like
> to
> > know, besides the wiki and the Solr 1.4 Enterprise Search Server book
> (which
> > I am considering buying) any tips for Solr usage.
>
>
>
> ++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.mattm...@jpl.nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>
>

Hi,

There is native support for Solr in PHP, but currently you have to build it
as a PECL extension.

It is currently not bundled with the PHP source, but it is downloadable
from the PECL project homepage:

http://pecl.php.net/package/solr

If you have PECL support built into your PHP installation, you can
install it by running the following command:

pecl install solr-beta

Some usage examples are available here

http://us3.php.net/manual/en/solr.examples.php

More details are available here

http://www.php.net/manual/en/book.solr.php

I use Solr with PHP 5.2

- In PHP, the SolrClient class has methods to add, update, delete and
rollback changes to the index made since the last commit.
- There are also built-in tools in Solr that allow you to analyze and modify
the data before indexing it and when searching for it.
- With Solr you can define synonyms (check the wiki for more details; see
the sketch at the end of this message)
- Solr also allows you to sort by score (relevance)
- You can specify the fields that you want either as (optional, required or
prohibited)

My last two points could take care of your last requirement.
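
For example, a minimal sketch using the extension's classes (the connection
options and field values here are made up for illustration):

  <?php
  // connection options for this example; adjust hostname/port/path to taste
  $options = array(
      'hostname' => 'localhost',
      'port'     => 8983,
      'path'     => '/solr',
  );
  $client = new SolrClient($options);

  // add (or update) a document
  $doc = new SolrInputDocument();
  $doc->addField('id', '12345');
  $doc->addField('name', 'Sony LCD TV');
  $client->addDocument($doc);
  $client->commit();

  // search, with results sorted by relevance score by default
  $query = new SolrQuery('LCD');
  $query->setStart(0);
  $query->setRows(10);
  $response = $client->query($query);
  print_r($response->getResponse());
  ?>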

Solr is awesome, and most of the searches I perform return sub-second
response times.

It's several hundredfold easier and more efficient than MySQL fulltext,
believe me.
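
On the synonyms point above, a minimal sketch of the standard analyzer
configuration (schema.xml) plus a synonyms.txt entry, untested here, for
the Sony/Soni example:

  <fieldType name="text" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
    </analyzer>
  </fieldType>

And in synonyms.txt, mapping the misspelling to the canonical term:

  Soni => Sony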
-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.