Re: Migration from Solr 1.2 to Solr 1.4

2011-02-02 Thread Vincent Chavelle
> if you don't have any custom components, you can probably just use
> your entire solr home dir as is -- just change the solr.war.  (you can't
> just copy the data dir though, you need to use the same configs)
>
> test it out, and note the "Upgrading" notes in the CHANGES.txt for the
> 1.3, 1.4, and 1.4.1 releases for "gotchas" that you might want to watch
> out for.
>
>
Hi Hoss,

Thank you for your reply. I've tried to copy the data and configuration
directories, without success:
SEVERE: Could not start SOLR. Check solr/home property
java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException:
Unknown format version: -10


Re: SOLR 1.4 and Lucene 3.0.3 index problem

2011-02-02 Thread Churchill Nanje Mambe
thanks guys
 I will try the trunk

as for unpacking the war and changing the Lucene jar... I am not an expert and
this may get complicated for me; maybe over time,
when I am comfortable

Mambe Churchill Nanje
237 33011349,
AfroVisioN Founder, President,CEO
http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
skypeID: mambenanje
www.twitter.com/mambenanje



On Wed, Feb 2, 2011 at 8:03 AM, Grijesh  wrote:

>
> You can extract the solr.war using Java's jar -xvf solr.war command,
>
> replace the lucene-2.9.jar with your lucene-3.0.3.jar in the WEB-INF/lib
> directory,
>
> then use jar -cvf solr.war * to pack the war again,
>
> and deploy that war. Hope that works.
>
> -
> Thanx:
> Grijesh
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SOLR-1-4-and-Lucene-3-0-3-index-problem-tp2396605p2403542.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr and Eclipse

2011-02-02 Thread Erlend Garåsen


Try to run the "svn co" command from the console (in case you're
running a UNIX-like OS). Add the following files for Solr (.project and
.classpath) into your solr folder:

http://markmail.org/message/yb5qgeamosvdscao

Then do an "import as an existing project" in Eclipse, and you're done.

Erlend

On 01.02.11 23.59, Eric Grobler wrote:

Hi

I am a newbie and I am trying to run solr in eclipse.

 From this url
http://wiki.apache.org/solr/HowToContribute#Development_Environment_Tips
there is a subclipse example:

I use Team ->  Share Project and this url:
   http://svn.apache.org/repos/asf/lucene/dev/trunk

but I get an "access forbidden for unknown reason" error.

I assume using readonly http I do not need credentials?

Also, would it make more sense to rather checkout the project with
the command-line svn and in Eclipse use
"Create project from existing source"?


Thanks
Ericz




--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050


Re: Solr and Eclipse

2011-02-02 Thread Robert Muir
On Tue, Feb 1, 2011 at 5:59 PM, Eric Grobler  wrote:
> Hi
>
> I am a newbie and I am trying to run solr in eclipse.
>
> From this url
> http://wiki.apache.org/solr/HowToContribute#Development_Environment_Tips
> there is a subclipse example:
>
> I use Team -> Share Project and this url:
>  http://svn.apache.org/repos/asf/lucene/dev/trunk
>
> but I get a "access forbidden for unknown reason error"
>
> I assume using readonly http I do not need credentials?
>
> Also, would it make more sense to rather checkout the project with
> the command-line svn and in Eclipse use
> "Create project from existing source"?
>

I always use New...Other...SVN...Checkout Projects from SVN


Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-02-02 Thread Ron Mayer

> [] ASF Mirrors (linked in our release announcements or via the Lucene website)
> 
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
> 
> [x] I/we build them from source via an SVN/Git checkout.
> 
> [x] Other (someone in your company mirrors them internally or via a 
> downstream project)

From the blacklight project originally; now using our own fork with some
patches from Jira.


Re: disappearing MBeans

2011-02-02 Thread matthew sporleder
Sorry to reply to myself, but I just wanted to see if anyone saw
this/had ideas why MBeans would be removed/re-added/removed.

I tried looking for this in the code but was unable to grok what
triggers bean removal.

Any hints?
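One way to watch registrations and unregistrations independently of Solr's own code is to subscribe to the MBean server's delegate notifications, which fire for every register/unregister on that server. A self-contained sketch using only the JDK; the demo:type=Dummy bean is purely illustrative, not anything Solr registers:

```java
import java.lang.management.ManagementFactory;

import javax.management.MBeanServer;
import javax.management.MBeanServerDelegate;
import javax.management.MBeanServerNotification;
import javax.management.NotificationListener;
import javax.management.ObjectName;

public class MBeanWatcher {
    // A trivial standard MBean, registered only to make the listener fire.
    public interface DummyMBean { }
    public static class Dummy implements DummyMBean { }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // The MBeanServerDelegate emits a notification for every
        // registration and unregistration on this server.
        NotificationListener listener = (notification, handback) -> {
            if (notification instanceof MBeanServerNotification) {
                MBeanServerNotification n = (MBeanServerNotification) notification;
                System.out.println(n.getType() + " " + n.getMBeanName());
            }
        };
        server.addNotificationListener(
                MBeanServerDelegate.DELEGATE_NAME, listener, null, null);
        ObjectName name = new ObjectName("demo:type=Dummy");
        server.registerMBean(new Dummy(), name);   // JMX.mbean.registered ...
        server.unregisterMBean(name);              // JMX.mbean.unregistered ...
    }
}
```

Attaching such a listener to the JVM running Solr would show the same JMX.mbean.registered/unregistered pairs visible in the FINER trace below, with a timestamp of your own choosing.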


On Thu, Jan 27, 2011 at 3:30 PM, matthew sporleder  wrote:
> I am using JMX to monitor my replication status and am finding that my
> MBeans are disappearing.  I turned on debugging for JMX and found that
> solr seems to be deleting the mbeans.
>
> Is this a bug?  Some trace info is below..
>
> here's me reading the mbean successfully:
> Jan 27, 2011 5:00:02 PM ServerCommunicatorAdmin reqIncoming
> FINER: Receive a new request.
> Jan 27, 2011 5:00:02 PM DefaultMBeanServerInterceptor getAttribute
> FINER: Attribute= indexReplicatedAt, obj=
> solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler
> Jan 27, 2011 5:00:02 PM Repository retrieve
> FINER: 
> name=solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler
> Jan 27, 2011 5:00:02 PM ServerCommunicatorAdmin reqIncoming
> FINER: Finish a request.
>
>
> a little while later it removes the mbean from the PM Repository
> (whatever that is) and then re-adds it:
> FINER: Send create notification of object
> solr/myapp-core:id=org.apache.solr.handler.component.SearchHandler,type=atlas
> Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor sendNotification
> FINER: JMX.mbean.registered
> solr/myapp-core:type=atlas,id=org.apache.solr.handler.component.SearchHandler
> Jan 27, 2011 5:16:14 PM Repository contains
> FINER: 
> name=solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler
> Jan 27, 2011 5:16:14 PM Repository retrieve
> FINER: 
> name=solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler
> Jan 27, 2011 5:16:14 PM Repository remove
> FINER: 
> name=solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler
> Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor unregisterMBean
> FINER: Send delete notification of object
> solr/myapp-core:id=org.apache.solr.handler.ReplicationHandler,type=/replication
> Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor sendNotification
> FINER: JMX.mbean.unregistered
> solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler
> Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor registerMBean
> FINER: ObjectName =
> solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler
> Jan 27, 2011 5:16:14 PM Repository addMBean
> FINER: 
> name=solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler
> Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor addObject
> FINER: Send create notification of object
> solr/myapp-core:id=org.apache.solr.handler.ReplicationHandler,type=/replication
> Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor sendNotification
> FINER: JMX.mbean.registered
> solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler
>
>
> And after a tons of messages but still in the same second it does:
> Jan 27, 2011 5:16:14 PM Repository contains
> FINER: 
> name=solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandler
> Jan 27, 2011 5:16:14 PM Repository retrieve
> FINER: 
> name=solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandler
> Jan 27, 2011 5:16:14 PM Repository remove
> FINER:
> name=solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandler
> Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor unregisterMBean
> FINER: Send delete notification of object
> solr/myapp-core:id=org.apache.solr.handler.ReplicationHandler,type=org.apache.solr.handler.ReplicationHandler
> Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor sendNotification
> FINER: JMX.mbean.unregistered
> solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandler
> Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor registerMBean
> FINER: ObjectName =
> solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandler
> Jan 27, 2011 5:16:14 PM Repository addMBean
> FINER: 
> name=solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandler
> Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor addObject
> FINER: Send create notification of object
> solr/myapp-core:id=org.apache.solr.handler.ReplicationHandler,type=org.apache.solr.handler.ReplicationHandler
> Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor sendNotification
> FINER: JMX.mbean.registered
> solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandler
>
>
> And then I don't know what this is about but it removes the bean again:
> Jan 27, 2011 5:16:15 PM Repository contains
> FINER: 
> name=so

value for maxFieldLength

2011-02-02 Thread McGibbney, Lewis John
Hello list,

I am aware that setting the value of maxFieldLength in solrconfig.xml too high
may result in out-of-memory errors. I wish to provide content extraction on a
number of PDF documents which are large; by large I mean 8-11MB (occasionally
more), and I am also not sure how many terms reside in each field once it is
indexed. My question is therefore: what is a sensible value to set in order to
include the majority/all terms within documents of this size?

Thank you

Lewis
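For reference, the cap lives in solrconfig.xml. A hedged sketch; the value below is Integer.MAX_VALUE, which effectively removes the cap, and whether your heap tolerates that for 8-11MB PDFs is something only testing can tell:

```xml
<!-- solrconfig.xml (Solr 1.4): maximum number of tokens indexed per field.
     2147483647 removes the cap in practice; if memory is a concern, raise
     the shipped default (10000) in steps while watching heap usage. -->
<maxFieldLength>2147483647</maxFieldLength>
```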


Glasgow Caledonian University is a registered Scottish charity, number SC021474

Winner: Times Higher Education's Widening Participation Initiative of the Year 
2009 and Herald Society's Education Initiative of the Year 2009.
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html

Winner: Times Higher Education's Outstanding Support for Early Career 
Researchers of the Year 2010, GCU as a lead with Universities Scotland partners.
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html


Index MS office

2011-02-02 Thread Thumuluri, Sai
Good Morning,

 I am planning to get started on indexing MS Office documents using Apache
Solr - can someone please direct me where I should start?

Thanks,
Sai Thumuluri




Re: Index MS office

2011-02-02 Thread Markus Jelsma
http://wiki.apache.org/solr/ExtractingRequestHandler

On Wednesday 02 February 2011 16:49:12 Thumuluri, Sai wrote:
> Good Morning,
> 
>  I am planning to get started on indexing MS office using ApacheSolr -
> can someone please direct me where I should start?
> 
> Thanks,
> Sai Thumuluri

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Index MS office

2011-02-02 Thread Jayendra Patil
http://wiki.apache.org/solr/ExtractingRequestHandler

Regards,
Jayendra

On Wed, Feb 2, 2011 at 10:49 AM, Thumuluri, Sai
 wrote:
> Good Morning,
>
>  I am planning to get started on indexing MS office using ApacheSolr -
> can someone please direct me where I should start?
>
> Thanks,
> Sai Thumuluri
>
>
>


Re: Index MS office

2011-02-02 Thread Sascha Szott

Hi,

have a look at Solr's ExtractingRequestHandler:

http://wiki.apache.org/solr/ExtractingRequestHandler

-Sascha

On 02.02.2011 16:49, Thumuluri, Sai wrote:

Good Morning,

  I am planning to get started on indexing MS office using ApacheSolr -
can someone please direct me where I should start?

Thanks,
Sai Thumuluri
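For context, the handler the replies point to is registered in solrconfig.xml. A minimal sketch following the wiki page; the fmap.content mapping and the target field name "text" are assumptions about the schema, not anything from this thread:

```xml
<!-- solrconfig.xml: register Solr Cell, which uses Tika to extract text
     from Word, PowerPoint, PDF, etc. -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- map Tika's extracted body into a schema field; "text" is assumed -->
    <str name="fmap.content">text</str>
  </lst>
</requestHandler>
```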


Re: Index MS office

2011-02-02 Thread Darx Oman
take a look at DIH
http://wiki.apache.org/solr/DataImportHandler


Re: Solr and Eclipse

2011-02-02 Thread Eric Grobler
> I always use New...Other...SVN...Checkout Projects from SVN

Thanks, that seemed to work perfectly :-)

On Wed, Feb 2, 2011 at 12:43 PM, Robert Muir  wrote:

> On Tue, Feb 1, 2011 at 5:59 PM, Eric Grobler 
> wrote:
> > Hi
> >
> > I am a newbie and I am trying to run solr in eclipse.
> >
> > From this url
> > http://wiki.apache.org/solr/HowToContribute#Development_Environment_Tips
> > there is a subclipse example:
> >
> > I use Team -> Share Project and this url:
> >  http://svn.apache.org/repos/asf/lucene/dev/trunk
> >
> > but I get a "access forbidden for unknown reason error"
> >
> > I assume using readonly http I do not need credentials?
> >
> > Also, would it make more sense to rather checkout the project with
> > the command-line svn and in Eclipse use
> > "Create project from existing source"?
> >
>
> I always use New...Other...SVN...Checkout Projects from SVN
>


Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-02-02 Thread Darx Oman
[x] ASF Mirrors (linked in our release announcements or via the Lucene
website)

[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

[x] I/we build them from source via an SVN/Git checkout.

[] Other (someone in your company mirrors them internally or via a
downstream project)


Open Too Many Files

2011-02-02 Thread Bing Li
Dear all,

I got an exception when querying the index within Solr. It told me that too
many files are open. How can I handle this problem?

Thanks so much!
LB

[java] org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Too many open files
 [java] at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
 [java] at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
 [java] at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
 [java] at
org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
 [java] at com.greatfree.Solr.Broker.Search(Broker.java:145)
 [java] at
com.greatfree.Solr.SolrIndex.SelectHubPageHashByHubKey(SolrIndex.java:116)
 [java] at com.greatfree.Web.HubCrawler.Crawl(Unknown Source)
 [java] at com.greatfree.Web.Worker.run(Unknown Source)
 [java] at java.lang.Thread.run(Thread.java:662)
 [java] Caused by: java.net.SocketException: Too many open files
 [java] at java.net.Socket.createImpl(Socket.java:397)
 [java] at java.net.Socket.<init>(Socket.java:371)
 [java] at java.net.Socket.<init>(Socket.java:249)
 [java] at
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
 [java] at
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
 [java] at
org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
 [java] at
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
 [java] at
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
 [java] at
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
 [java] at
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
 [java] at
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
 [java] at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
 [java] ... 8 more
 [java] Exception in thread "Thread-96" java.lang.NullPointerException
 [java] at
com.greatfree.Solr.SolrIndex.SelectHubPageHashByHubKey(SolrIndex.java:117)
 [java] at com.greatfree.Web.HubCrawler.Crawl(Unknown Source)
 [java] at com.greatfree.Web.Worker.run(Unknown Source)
 [java] at java.lang.Thread.run(Thread.java:662)
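A common first check for this exception is the process file-descriptor limit. A sketch; the 8192 value is an arbitrary example, and reusing a single CommonsHttpSolrServer instance rather than creating one per request also helps keep the socket count down:

```shell
# The JVM inherits the shell's per-process file-descriptor limit, and
# "Too many open files" usually means sockets + index files exceeded it.
echo "current open-file limit: $(ulimit -n)"
# To raise it for the session before starting Solr (example value):
#   ulimit -n 8192
# For a permanent change on Linux, edit /etc/security/limits.conf.
```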


Re: Controlling Tika's metadata

2011-02-02 Thread Grant Ingersoll

On Jan 28, 2011, at 5:38 PM, Andreas Kemkes wrote:

> Just getting my feet wet with the text extraction using both schema and 
> solrconfig settings from the example directory in the 1.4 distribution, so I 
> might miss something obvious.
> 
> Trying to provide my own title (and discarding the one received through 
> Tika's 
> metadata) wasn't straightforward. I had to use the following:
> 
> fmap.title=tika_title (to discard the Tika title)
> literal.attr_title=New Title (to provide the correct one)
> fmap.attr_title=title (to map it back to the field as I would like to use 
> title 
> in searches)
> 
> Is there anything easier than the above?
> 
> How can this best be generalized to other metadata provided by Tika (which in 
> our use case will be mostly ignored, as it is provided separately)?

You can provide your own ContentHandler (see the wiki docs).  I think it would
be reasonable to patch the ExtractingRequestHandler to have a no-metadata
option, and it wouldn't be that hard.

Re: Solr and Eclipse

2011-02-02 Thread Eric Grobler
> I always use New...Other...SVN...Checkout Projects from SVN

And how do you run jetty from the example folder in Eclipse?

Thanks for your help
Ericz


On Wed, Feb 2, 2011 at 12:43 PM, Robert Muir  wrote:

> On Tue, Feb 1, 2011 at 5:59 PM, Eric Grobler 
> wrote:
> > Hi
> >
> > I am a newbie and I am trying to run solr in eclipse.
> >
> > From this url
> > http://wiki.apache.org/solr/HowToContribute#Development_Environment_Tips
> > there is a subclipse example:
> >
> > I use Team -> Share Project and this url:
> >  http://svn.apache.org/repos/asf/lucene/dev/trunk
> >
> > but I get a "access forbidden for unknown reason error"
> >
> > I assume using readonly http I do not need credentials?
> >
> > Also, would it make more sense to rather checkout the project with
> > the command-line svn and in Eclipse use
> > "Create project from existing source"?
> >
>
> I always use New...Other...SVN...Checkout Projects from SVN
>


geodist and spacial search

2011-02-02 Thread Eric Grobler
Hi

In http://wiki.apache.org/solr/SpatialSearch
there is an example of a bbox filter and a geodist function.

Is it possible to do a bbox filter and sort by distance - combine the two?

Thanks
Ericz
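Yes: following the wiki's syntax (trunk/4.0-era spatial), the bbox filter and the distance sort can be combined in a single request. The field name store, the point, and the distance below are the wiki's example values, not anything from this thread:

```
q=*:*&fq={!bbox}&sfield=store&pt=45.15,-93.85&d=5&sort=geodist() asc
```

Here geodist() with no arguments reads the same pt/sfield parameters the bbox filter uses, so the filter and the sort stay in sync.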


Re: field names for solr spatial

2011-02-02 Thread Grant Ingersoll

On Jan 30, 2011, at 2:47 AM, Dennis Gearon wrote:

> I would love it if I could use 'latitude' and 'longitude' in all places. But
> it seems that the solr spatial plugin for 1.4 only works with lat/lng. Any
> way to change that?

What 1.4 plugin are you referring to?

> 
> Dennis Gearon
> 
> 
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a 
> better 
> idea to learn from others’ mistakes, so you do not have to make them 
> yourself. 
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> 
> 
> EARTH has a Right To Life,
> otherwise we all die.
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search



Re: Solr and Eclipse

2011-02-02 Thread Robert Muir
On Wed, Feb 2, 2011 at 11:15 AM, Eric Grobler  wrote:
>> I always use New...Other...SVN...Checkout Projects from SVN
>
> And how do you run jetty from the example folder in Eclipse?
>

you can always go to the command line and use the usual techniques,
e.g. ant run-example, or java -jar start.jar from the example
folder... this is what I do. Sorry I don't have a better suggestion, I
only use eclipse as a fancy text editor!


Re: CommonsHttpSolrServer and dynamic custom results filtering

2011-02-02 Thread Dave Troiano
Sorry to re-post, but can anyone help out on the question below of dynamic
custom results filtering using CommonsHttpSolrServer?  If anyone is doing
this sort of thing, any suggestions would be much appreciated.

Thanks!
Dave

On 1/31/11 2:47 PM, "Dave Troiano"  wrote:

> Hi,
> 
> I'm implementing custom dynamic results filtering to improve fuzzy /
> phonetic search support in my search application.  I use the
> CommonsHttpSolrServer object to connect remotely to Solr.  I would like to
> be able to index multiple fuzzy / phonetic match encodings, e.g. one of the
> packaged phonetic encodings, my own phonetic encoding, my own or a packaged
> q-gram encoding that will capture string overlap, etc., and then be able to
> filter out the results I consider "false positives" in a dynamic, custom
> way.  The general approaches I've seen for this are:
> 
> 1. Use Solr's fuzzy queries.  I haven't been able to achieve acceptable
> performance using fuzzy queries, and also the fuzzy queries lack the dynamic
> flexibility above.  e.g. whether or not I filter a phonetic match from
> results may depend on a lot of things (whether or not there were exact
> matches on relevant entities, who the user is, etc), and I can't achieve
> this flexibility with a fuzzy field query.
> 
> 2. Create an RMI-based client/server setup so that I can use the
> SolrIndexSearcher to pass in a customer Collector (as in Ch. 9 of Lucene in
> Action, but add in a custom Collector).  A custom Collector seems like
> exactly what I want but I don't see a way to achieve this using any of the
> packaged SolrServer implementations that support a remote setup like this.
> I also worry about the stability of the remote object framework since it's
> been moved over to contrib and it seems that there may be serialization
> issues or other instability
> (http://lucene.472066.n3.nabble.com/extending-SolrIndexSearcher-td472809.html).
> 
> 3. Continue to use the CommonsHttpSolrServer object for querying my index,
> but add in post-processing to dynamically filter results.  This seems doable
> but unnatural and potentially inefficient given that I need to worry about
> supporting pagination and facet counts in such a framework.
> 
> Is there an easier way to do custom dynamic results filtering (like via a
> custom Collector) while still using CommonsHttpSolrServer?  Do people have
> any other suggestions or insights about the approaches summarized above?
> 
> Thanks,
> Dave
> 
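Option 3 above (client-side post-filtering) can be sketched with plain collections. The list would in practice come from CommonsHttpSolrServer and the predicate would hold the custom phonetic-rejection logic; both are placeholders here, and facet counts would still need separate handling:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class PostFilterPager {
    // Over-fetch a window of results, drop false positives with a custom
    // predicate, then take one page locally.
    static <T> List<T> page(List<T> overFetched, Predicate<T> keep,
                            int start, int rows) {
        List<T> kept = new ArrayList<>();
        for (T doc : overFetched) {
            if (keep.test(doc)) {
                kept.add(doc);
            }
        }
        int from = Math.min(start, kept.size());
        int to = Math.min(start + rows, kept.size());
        return kept.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("smith", "smyth", "schmidt", "smithe");
        // Toy predicate: keep hits sharing a three-letter prefix with the query.
        System.out.println(page(docs, d -> d.startsWith("smi"), 0, 2));
    }
}
```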



Re: Solr and Eclipse

2011-02-02 Thread Eric Grobler
> I only use eclipse as a fancy text editor!
Eclipse will feel insulted :-)

I will just try to create hot keys to start/stop jetty manually.
Thanks for your feedback

Regards
Ericz


On Wed, Feb 2, 2011 at 4:26 PM, Robert Muir  wrote:

> On Wed, Feb 2, 2011 at 11:15 AM, Eric Grobler 
> wrote:
> >> I always use New...Other...SVN...Checkout Projects from SVN
> >
> > And how do run in eclipse jetty in the example folder?
> >
>
> you can always go to the commandline and use the usual techniques,
> e.g. ant run-example, or java -jar start.jar from the example
> folder... this is what i do, sorry I don't have a better suggestion, I
> only use eclipse as a fancy text editor!
>


Reg filter criteria on multivalued attribute

2011-02-02 Thread bbarani

Hi,

I have a question about filtering on multivalued attributes. Is there a way to
filter a multivalued attribute based on a particular value inside that
attribute?

Consider the below example.


<arr name="relationship">
  <str>DEF_BY</str>
  <str>BEL_TO</str>
</arr>


I want to do a search which returns only documents that have the
relationship DEF_BY and not BEL_TO. Currently if I do a normal search for
DEF_BY, documents which contain DEF_BY along with other relationships are
also returned; instead I want only the documents whose relationship field
contains DEF_BY alone. Also, is there a way to make Solr return documents
based on the number of elements in a multivalued attribute? If that's
possible I can first make Solr return those documents and then apply a
filter on top of the returned results for my search.

Is there a way to write a query to do this? Any pointers or help in this
regard would be appreciated..

Thanks,
Barani
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Reg-filter-criteria-on-multivalued-attribute-tp2406904p2406904.html
Sent from the Solr - User mailing list archive at Nabble.com.
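For the first half of the question, a pure query-syntax sketch is possible (assuming relationship is the indexed field name). The cardinality half has no direct query operator in Solr; the usual workaround is to compute a count field at index time (relationship_count below is a hypothetical name for such a field):

```
q=relationship:DEF_BY -relationship:BEL_TO     (exclude one known unwanted value)
fq=relationship_count:1                        (requires the extra index-time field)
```

Note the negation only excludes the values you name; guaranteeing "DEF_BY and nothing else" for arbitrary data is exactly what the count field buys you.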


OAI on SOLR already done?

2011-02-02 Thread Paul Libbrecht

Hello list,

I've come across a few Google matches indicating that some Solr-based servers
implement the Open Archive Initiative's Metadata Harvesting Protocol (OAI-PMH).

Is there something made to be re-usable that would be an add-on to solr?

thanks in advance

paul

Partial matches don't work (solr.NGramFilterFactory)

2011-02-02 Thread Script Head
Hello,

I have the following definitions in my schema.xml:

[schema.xml excerpt stripped by the list archive]

There is a document "Hippopotamus is fatter than a Platypus" indexed.
When I search for "Hippopotamus" I receive the expected result. When I
search for any partial such as "Hippo" or "potamu" I get nothing. I
could use some guidance.

Script Head
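Since the archive stripped the XML, here is a hedged reconstruction of the kind of schema this thread is discussing. The tokenizer, gram sizes, and field names are assumptions pieced together from the fragments surviving in the reply, not the poster's actual config:

```xml
<!-- schema.xml sketch: an n-gram analyzed field fed by copyField -->
<fieldType name="ngram_text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
</fieldType>

<field name="text"        type="text"       indexed="true" stored="true"/>
<field name="text_ngrams" type="ngram_text" indexed="true" stored="true"/>
<copyField source="text" dest="text_ngrams"/>
```

With a layout like this, the grams land in text_ngrams, which is why queries against text alone find no partial matches.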


Re: Partial matches don't work (solr.NGramFilterFactory)

2011-02-02 Thread Tomás Fernández Löbbe
About this:

[the quoted schema definition was stripped by the list archive]

The NGrams are going to be indexed on the field "text_ngrams", not on
"text". For the field "text", Solr will apply the text analysis (which I
guess doesn't have NGrams). You have to search on the "text_ngrams" field,
something like "text_ngrams:hippo" or "text_ngrams:potamu". Are you
searching like this?

Tomás

On Wed, Feb 2, 2011 at 4:07 PM, Script Head  wrote:

> Hello,
>
> I have the following definitions in my schema.xml:
>
> [schema.xml excerpt stripped by the list archive; surviving fragments
> mention maxGramSize="15" and stored="true"]
>
> There is a document "Hippopotamus is fatter than a Platypus" indexed.
> When I search for "Hippopotamus" I receive the expected result. When I
> search for any partial such as "Hippo" or "potamu" I get nothing. I
> could use some guidance.
>
> Script Head
>


Re: CUSTOM JSP FOR APACHE SOLR

2011-02-02 Thread Tomás Fernández Löbbe
Hi Paul, I don't fully understand what you want to do. The way SolrJ is
intended to be used, I think, is from a client application (outside Solr). If
what you want is something like what's done with Velocity, I think you could
implement a response writer that renders the JSP and sends it in the
response.

Tomás


On Mon, Jan 31, 2011 at 6:25 PM, Paul Libbrecht  wrote:

> Tomas,
>
> I also know velocity can be used and works well.
> I would be interested in a simpler way to have the objects of SOLR
> available in a jsp than writing a custom jsp processor as a request handler;
> indeed, this seems to be the way solrj is expected to be used in the wiki
> page.
>
> Actually I migrated to velocity (which I like less than jsp) just because I
> did not find a response to this question.
>
> paul
>
>
> Le 31 janv. 2011 à 21:53, Tomás Fernández Löbbe a écrit :
>
> > Hi John, you can use whatever you want for building your application,
> > using Solr on the backend (JSP included). You should find all the
> > information you need on Solr's wiki page:
> > http://wiki.apache.org/solr/
> >
> > including some client libraries to easily
> > integrate your application with Solr:
> > http://wiki.apache.org/solr/IntegratingSolr
> >
> > for fast prototyping you could
> > use Velocity:
> > http://wiki.apache.org/solr/VelocityResponseWriter
> >
> > Anyway, I recommend you
> > to start with Solr's tutorial:
> > http://lucene.apache.org/solr/tutorial.html
> >
> >
> > Good luck,
> > Tomás
> >
> > 2011/1/31 JOHN JAIRO GÓMEZ LAVERDE 
> >
> >>
> >>
> >> SOLR LUCENE
> >> DEVELOPERS
> >>
> >> Hi i am new to solr and i like to make a custom search page for
> enterprise
> >> users
> >> in JSP that takes the results of Apache Solr.
> >>
> >> - Where i can find some useful examples for that topic ?
> >> - Is JSP the correct approach to solve mi requirement ?
> >> - If not what is the best solution to build a customize search page for
> my
> >> users?
> >>
> >> Thanks
> >> from South America
> >>
> >> JOHN JAIRO GOMEZ LAVERDE
> >> Bogotá - Colombia
> >>
>
>


Re: OAI on SOLR already done?

2011-02-02 Thread Péter Király
Hi,

I don't know whether it fits your need, but we are building a tool
based on Drupal (the eXtensible Catalog Drupal Toolkit) which can harvest
with OAI-PMH and index the harvested records into Solr. The records are
harvested, processed, and stored in MySQL, then we index them into
Solr. We created some ways to manipulate the original values before
sending them to Solr. We created it in a modular way, so you can change
settings in an admin interface or write your own "hooks" (special
Drupal functions) to tailor the application to your needs. We support
only Dublin Core and our own FRBR-like schema (called the XC schema), but
you can add more schemas. Since this forum is about Solr, and not
applications using Solr, if you are interested in this tool, please write me a
private message, or visit http://eXtensibleCatalog.org, or the
module's page at http://drupal.org/project/xc.

Hope this helps,

Péter
eXtensible Catalog

2011/2/2 Paul Libbrecht :
>
> Hello list,
>
> I've met a few google matches that indicate that SOLR-based servers implement 
> the Open Archive Initiative's Metadata Harvesting Protocol.
>
> Is there something made to be re-usable that would be an add-on to solr?
>
> thanks in advance
>
> paul


Re: OAI on SOLR already done?

2011-02-02 Thread Paul Libbrecht
Peter,

I'm afraid your tool is a harvester, while I am looking for a PMH provider
service.

Your project appeared early in the Google matches.

paul


Le 2 févr. 2011 à 20:46, Péter Király a écrit :

> Hi,
> 
> I don't know whether it fits to your need, but we are builing a tool
> based on Drupal (eXtensible Catalog Drupal Toolkit), which can harvest
> with OAI-PMH and index the harvested records into Solr. The records is
> harvested, processed, and stored into MySQL, then we index them into
> Solr. We created some ways to manipulate the original values before
> sending to Solr. We created it in a modular way, so you can change
> settings in an admin interface or write your own "hooks" (special
> Drupal functions), to taylor the application to your needs. We support
> only Dublin Core, and our own FRBR-like schema (called XC schema), but
> you can add more schemas. Since this forum is about Solr, and not
> applications using Solr, if you interested this tool, plase write me a
> private message, or visit http://eXtensibleCatalog.org, or the
> module's page at http://drupal.org/project/xc.
> 
> Hope this helps,
> 
> Péter
> eXtensible Catalog
> 
> 2011/2/2 Paul Libbrecht :
>> 
>> Hello list,
>> 
>> I've met a few google matches that indicate that SOLR-based servers 
>> implement the Open Archive Initiative's Metadata Harvesting Protocol.
>> 
>> Is there something made to be re-usable that would be an add-on to solr?
>> 
>> thanks in advance
>> 
>> paul



Re: OAI on SOLR already done?

2011-02-02 Thread Péter Király
Hi Paul,

yes, you are right: the project is about harvesting, not about being harvestable.

Péter

2011/2/2 Paul Libbrecht :
> Peter,
>
> I'm afraid your service is harvesting and I am trying to look at a PMH 
> provider service.
>
> Your project appeared early in the goolge matches.
>
> paul
>
>
> Le 2 févr. 2011 à 20:46, Péter Király a écrit :
>
>> Hi,
>>
>> I don't know whether it fits to your need, but we are builing a tool
>> based on Drupal (eXtensible Catalog Drupal Toolkit), which can harvest
>> with OAI-PMH and index the harvested records into Solr. The records is
>> harvested, processed, and stored into MySQL, then we index them into
>> Solr. We created some ways to manipulate the original values before
>> sending to Solr. We created it in a modular way, so you can change
>> settings in an admin interface or write your own "hooks" (special
>> Drupal functions), to taylor the application to your needs. We support
>> only Dublin Core, and our own FRBR-like schema (called XC schema), but
>> you can add more schemas. Since this forum is about Solr, and not
>> applications using Solr, if you interested this tool, plase write me a
>> private message, or visit http://eXtensibleCatalog.org, or the
>> module's page at http://drupal.org/project/xc.
>>
>> Hope this helps,
>>
>> Péter
>> eXtensible Catalog
>>
>> 2011/2/2 Paul Libbrecht :
>>>
>>> Hello list,
>>>
>>> I've met a few google matches that indicate that SOLR-based servers 
>>> implement the Open Archive Initiative's Metadata Harvesting Protocol.
>>>
>>> Is there something made to be re-usable that would be an add-on to solr?
>>>
>>> thanks in advance
>>>
>>> paul
>
>


Re: OAI on SOLR already done?

2011-02-02 Thread Jonathan Rochkind
The trick is that you can't just have a generic black-box OAI-PMH 
provider on top of any Solr index. How would it know where to get the 
metadata elements it needs, such as title or last-updated date? 
Any given Solr index might not even have these in stored fields -- and a 
given app might want to look them up from somewhere other than stored 
fields.


If the Solr index does have them in stored fields, and you do want to 
get them from the stored fields, then it's, I think (famous last words) 
relatively straightforward code to write. A mapping from solr stored 
fields to metadata elements needed for OAI-PMH, and then simply 
outputting the XML template with those filled in.


I am not aware of anyone that has done this in a 
re-useable/configurable-for-your-solr tool. You could possibly do it 
solely using the built-in Solr 
JSP/XSLT/other-templating-stuff-I-am-not-familiar-with stuff, rather 
than as an external Solr client app, or it could be an external Solr 
client app.


This is actually a very similar problem to something someone else asked 
a few days ago "Does anyone have an OpenSearch add-on for Solr?"  Very 
very similar problem, just with a different XML template for output 
(usually RSS or Atom) instead of OAI-PMH.
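That "mapping plus template" idea can be sketched in a few lines of Java. The field names and the `OaiDcMapper` class below are hypothetical, not from any existing tool:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OaiDcMapper {
    // Hypothetical mapping from Solr stored-field names to Dublin Core elements
    static final Map<String, String> FIELD_TO_DC = new LinkedHashMap<>();
    static {
        FIELD_TO_DC.put("title_s", "title");
        FIELD_TO_DC.put("author_s", "creator");
        FIELD_TO_DC.put("last_modified_dt", "date");
    }

    // Fill the oai_dc record template from one document's stored fields
    static String toOaiDc(Map<String, String> storedFields) {
        StringBuilder xml = new StringBuilder("<oai_dc:dc>");
        for (Map.Entry<String, String> e : FIELD_TO_DC.entrySet()) {
            String value = storedFields.get(e.getKey());
            if (value != null) {
                xml.append("<dc:").append(e.getValue()).append('>')
                   .append(value)
                   .append("</dc:").append(e.getValue()).append('>');
            }
        }
        return xml.append("</oai_dc:dc>").toString();
    }
}
```

A real provider would also need the OAI-PMH envelope, resumption tokens, and XML escaping -- this only shows the per-record mapping step.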


On 2/2/2011 3:14 PM, Paul Libbrecht wrote:

Peter,

I'm afraid your service is harvesting and I am trying to look at a PMH provider 
service.

Your project appeared early in the goolge matches.

paul


Le 2 févr. 2011 à 20:46, Péter Király a écrit :


Hi,

I don't know whether it fits your need, but we are building a tool
based on Drupal (eXtensible Catalog Drupal Toolkit), which can harvest
with OAI-PMH and index the harvested records into Solr. The records are
harvested, processed, and stored in MySQL, then we index them into
Solr. We created some ways to manipulate the original values before
sending them to Solr. We created it in a modular way, so you can change
settings in an admin interface or write your own "hooks" (special
Drupal functions) to tailor the application to your needs. We support
only Dublin Core and our own FRBR-like schema (called XC schema), but
you can add more schemas. Since this forum is about Solr, and not
applications using Solr, if you are interested in this tool, please write me a
private message, or visit http://eXtensibleCatalog.org, or the
module's page at http://drupal.org/project/xc.

Hope this helps,

Péter
eXtensible Catalog

2011/2/2 Paul Libbrecht:

Hello list,

I've met a few google matches that indicate that SOLR-based servers implement 
the Open Archive Initiative's Metadata Harvesting Protocol.

Is there something made to be re-usable that would be an add-on to solr?

thanks in advance

paul




RE: OAI on SOLR already done?

2011-02-02 Thread Demian Katz
I already replied to the original poster off-list, but it seems that it may be 
worth weighing in here as well...

The next release of VuFind (http://vufind.org) is going to include OAI-PMH 
server support.  As you say, there is really no way to plug OAI-PMH directly 
into Solr...  but a tool like VuFind can provide a fairly generic, extensible, 
Solr-based platform for building an OAI-PMH server.  Obviously this is helpful 
for some use cases and not others...  but I'm happy to provide more information 
if anyone needs it.

- Demian

From: Jonathan Rochkind [rochk...@jhu.edu]
Sent: Wednesday, February 02, 2011 3:38 PM
To: solr-user@lucene.apache.org
Cc: Paul Libbrecht
Subject: Re: OAI on SOLR already done?

The trick is that you can't just have a generic black box OAI-PMH
provider on top of any Solr index. How would it know where to get the
metadata elements it needs, such as title, or last-updated date, etc.
Any given solr index might not even have this in stored fields -- and a
given app might want to look them up from somewhere other than stored
fields.

If the Solr index does have them in stored fields, and you do want to
get them from the stored fields, then it's, I think (famous last words)
relatively straightforward code to write. A mapping from solr stored
fields to metadata elements needed for OAI-PMH, and then simply
outputting the XML template with those filled in.

I am not aware of anyone that has done this in a
re-useable/configurable-for-your-solr tool. You could possibly do it
solely using the built-in Solr
JSP/XSLT/other-templating-stuff-I-am-not-familiar-with stuff, rather
than as an external Solr client app, or it could be an external Solr
client app.

This is actually a very similar problem to something someone else asked
a few days ago "Does anyone have an OpenSearch add-on for Solr?"  Very
very similar problem, just with a different XML template for output
(usually RSS or Atom) instead of OAI-PMH.

On 2/2/2011 3:14 PM, Paul Libbrecht wrote:
> Peter,
>
> I'm afraid your service is harvesting and I am trying to look at a PMH 
> provider service.
>
> Your project appeared early in the goolge matches.
>
> paul
>
>
> Le 2 févr. 2011 à 20:46, Péter Király a écrit :
>
>> Hi,
>>
>> I don't know whether it fits to your need, but we are builing a tool
>> based on Drupal (eXtensible Catalog Drupal Toolkit), which can harvest
>> with OAI-PMH and index the harvested records into Solr. The records is
>> harvested, processed, and stored into MySQL, then we index them into
>> Solr. We created some ways to manipulate the original values before
>> sending to Solr. We created it in a modular way, so you can change
>> settings in an admin interface or write your own "hooks" (special
>> Drupal functions), to taylor the application to your needs. We support
>> only Dublin Core, and our own FRBR-like schema (called XC schema), but
>> you can add more schemas. Since this forum is about Solr, and not
>> applications using Solr, if you interested this tool, plase write me a
>> private message, or visit http://eXtensibleCatalog.org, or the
>> module's page at http://drupal.org/project/xc.
>>
>> Hope this helps,
>>
>> Péter
>> eXtensible Catalog
>>
>> 2011/2/2 Paul Libbrecht:
>>> Hello list,
>>>
>>> I've met a few google matches that indicate that SOLR-based servers 
>>> implement the Open Archive Initiative's Metadata Harvesting Protocol.
>>>
>>> Is there something made to be re-usable that would be an add-on to solr?
>>>
>>> thanks in advance
>>>
>>> paul
>


Use Parallel Search

2011-02-02 Thread Gustavo Maia
Hello,

Let me give a brief description of my scenario.
Today I am only using Lucene 2.9.3. I have an index of 30 million documents
distributed across three machines, each machine with 6 HDs (15k RPM).
The server queries the search index using the remote search class, and each
machine searches its 6 HDs simultaneously with parallel search.
So a single search effectively uses the three machines and 18 HDs,
giving a very good response time.


Today I am studying SOLR and am interested in knowing more about
distributed and parallel search on the same machine. What SOLR setup
would be better than what I already have today with Lucene alone?
  Note: do I need to install 6 SOLR instances on each machine, one per
HD? Or is there some other way to use the 6 HDs without running 6 instances
of the SOLR server?

  Another question: does SOLR have some limiting index size per hard
drive? It would be interesting to avoid indexes that are too big, because the
bigger the index grows, the longer the search takes.
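This is not Solr code, but the scatter/gather pattern that both Lucene's parallel search and Solr's distributed search follow can be sketched generically: rank each "shard" concurrently, then merge the per-shard top results:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ScatterGather {
    // Each "shard" here is just a list of document scores.
    static List<Double> topK(List<List<Double>> shards, int k) {
        List<CompletableFuture<List<Double>>> futures = new ArrayList<>();
        for (List<Double> shard : shards) {
            futures.add(CompletableFuture.supplyAsync(() -> {
                List<Double> sorted = new ArrayList<>(shard);
                sorted.sort(Collections.reverseOrder());   // rank locally, in parallel
                return sorted.subList(0, Math.min(k, sorted.size()));
            }));
        }
        List<Double> merged = new ArrayList<>();
        for (CompletableFuture<List<Double>> f : futures) {
            merged.addAll(f.join());                       // gather per-shard tops
        }
        merged.sort(Collections.reverseOrder());           // final merge
        return merged.subList(0, Math.min(k, merged.size()));
    }
}
```

With one Solr core per HD on a machine, Solr's shards request parameter does this merge for you; the sketch only illustrates why per-shard ranking parallelizes cleanly.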

Thanks for everything.


from long to tlong, compatible?

2011-02-02 Thread Dan G
Hi, 


I'm using SOLR 1.4.1 and have a rather large index with 800+M docs.
Until now we have, erroneously I think, indexed a long field with the type:


Now the range queries have become slow, as there are many distinct terms in the
index.

My question is whether it would be possible to just change the field to the
preferred type "tlong" with a precisionStep of "8"?

Would this change be compatible with my indexed data, or should I re-index the
data (a pain with 800+M docs :))?

Thanks
/dise


  


Re: from long to tlong, compatible?

2011-02-02 Thread Yonik Seeley
On Wed, Feb 2, 2011 at 3:46 PM, Dan G  wrote:

> My question is if it would be possible to just change the field to the
> preferred
> type "tlong" with a precision of "8"?
>
> Would this change be compatible with my indexed data or should I re-indexed
> the
> date (a pain with 800+M docs :))?
>

I think you'll need to re-index, or range queries on that field will miss
many of the documents you've already indexed with precisionStep=0.
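The reason is how trie fields index values: with precisionStep=8 each long is indexed as several progressively coarser prefix terms, and trie range queries rely on those extra terms; values indexed with precisionStep=0 only have the single full-precision term. A simplified sketch (real Lucene NumericUtils also encodes the shift amount into each term):

```java
import java.util.ArrayList;
import java.util.List;

public class TrieTerms {
    // Terms a trie field would index for one long value (simplified).
    static List<Long> terms(long value, int precisionStep) {
        List<Long> out = new ArrayList<>();
        if (precisionStep <= 0) {
            out.add(value);                  // precisionStep=0: one full-precision term
            return out;
        }
        for (int shift = 0; shift < 64; shift += precisionStep) {
            out.add(value & (-1L << shift)); // zero the low bits: a coarser prefix
        }
        return out;
    }
}
```

A range query built for precisionStep=8 asks for those coarse prefix terms, which simply don't exist for documents indexed at precisionStep=0 -- hence the need to re-index.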

-Yonik
http://lucidimagination.com


copyFields, multiple terms -- IDF?

2011-02-02 Thread Martin J
Hi, I'm having a weirdness with indexing multiple terms to a single field
using a copyField. An example:

For document A
field:contents_1 is a multivalued field containing "cat", "dog" and "duck"
field:contents_2 is a multivalued field containing "cat", "horse", and
"flower"

For document B
field:contents_1 is a multivalued field containing "cat" and "fish"
field:contents_2 is a multivalued field containing "bear" and "turkey"

I have a copyField in my schema:

 

A query like contents_1:cat contents_2:cat returns document A first, and
then document B. I think that is the way it should work.

But a query like combined:cat returns document B first. In my mind, when I
am doing a copyField I am copying each of the terms in the multivalued
fields of contents_1 and contents_2 into combined, so that combined
internally has "cat", "dog", "duck", "cat", "horse", "flower" for document
A.

An explain on the query says something like (this is from a real query not
the fake one above)



4.0687284 = (MATCH) fieldWeight(combined:cat in 1663089), product of: 1.0 =
tf(termFreq(combined:cat)=1) 4.0687284 = idf(docFreq=135688,
maxDocs=2919285) 1.0 = fieldNorm(field=combined, doc=1663089)


0.8509077 = (MATCH) fieldWeight(combined:cat in 913171), product of:
2.236068 = tf(termFreq(combined:cat)=5) 4.0590663 = idf(docFreq=143689,
maxDocs=3061697) 0.09375 = fieldNorm(field=combined, doc=913171)


If I am reading this right, it is finding the higher TF in A (5 in this
case) but still scoring B higher. Shouldn't idf be exactly the same?

(Both fields are a solr.TextField:

 
  






  

)

Another piece of perhaps relevant information is that this is a query over 16
shards using distributed Solr.


Re: copyFields, multiple terms -- IDF?

2011-02-02 Thread Martin J
On closer review, I am noticing that the fieldNorm is what is killing
document A.
If I reindex with omitNorms=true, will this problem be "solved"?
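Plugging the numbers from the explain output into Lucene's fieldWeight = tf * idf * fieldNorm reproduces the scores, and shows it is indeed the norm (length normalization, quantized to a single byte) that sinks the longer document -- idf only differs slightly because docFreq/maxDocs differ per shard:

```java
public class FieldWeight {
    // fieldWeight as reported by Lucene's explain output: tf * idf * fieldNorm
    static double fieldWeight(double tf, double idf, double fieldNorm) {
        return tf * idf * fieldNorm;
    }
}
```

Document A: sqrt(5) * 4.0590663 * 0.09375 comes out around 0.851; document B: 1.0 * 4.0687284 * 1.0 = 4.0687. With omitNorms=true the 0.09375 factor disappears, so the ordering would change -- at the cost of losing length normalization entirely.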


On Wed, Feb 2, 2011 at 4:54 PM, Martin J  wrote:

> Hi, I'm having a weirdness with indexing multiple terms to a single field
> using a copyField. An example:
>
> For document A
> field:contents_1 is a multivalued field containing "cat", "dog" and "duck"
> field:contents_2 is a multivalued field containing "cat", "horse", and
> "flower"
>
> For document B
> field:contents_1 is a multivalued field containing "cat" and "fish"
> field:contents_2 is a multivalued field containing "bear" and "turkey"
>
> I have a copyField in my schema:
>
>  
>
> A query like contents_1:cat contents_2:cat returns document A first, and
> then document B. I think that is the way it should work.
>
> But a query like combined:cat returns document B first. In my mind, when I
> am doing a copyField I am copying each of the terms in the multivalued
> fields of contents_1 and contents_2 into combined, so that combined
> internally has "cat", "dog", "duck", "cat", "horse", "flower" for document
> A.
>
> An explain on the query says something like (this is from a real query not
> the fake one above)
>
> 
> 
> 4.0687284 = (MATCH) fieldWeight(combined:cat in 1663089), product of: 1.0 =
> tf(termFreq(combined:cat)=1) 4.0687284 = idf(docFreq=135688,
> maxDocs=2919285) 1.0 = fieldNorm(field=combined, doc=1663089)
> 
> 
> 0.8509077 = (MATCH) fieldWeight(combined:cat in 913171), product of:
> 2.236068 = tf(termFreq(combined:cat)=5) 4.0590663 = idf(docFreq=143689,
> maxDocs=3061697) 0.09375 = fieldNorm(field=combined, doc=913171)
> 
>
> If I am reading this right, it is finding the higher TF in A (5 in this
> case) but still scoring B higher. Shouldn't idf be exactly the same?
>
> (Both fields are a solr.TextField:
>
>  
>   
> 
> 
> 
> 
>  ignoreCase="true"/>
>  protected="protwords.txt"/>
>   
> 
> )
>
> Another piece of perhaps relevant information is that this a query over 16
> shards using distributed solr.
>


Re: OAI on SOLR already done?

2011-02-02 Thread Dennis Gearon
Does something like this work to extract dates, phone numbers, addresses across 
international formats and languages?

Or, just in the plain ol' USA?

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Demian Katz 
To: "solr-user@lucene.apache.org" 
Cc: Paul Libbrecht 
Sent: Wed, February 2, 2011 12:40:58 PM
Subject: RE: OAI on SOLR already done?

I already replied to the original poster off-list, but it seems that it may be 
worth weighing in here as well...

The next release of VuFind (http://vufind.org) is going to include OAI-PMH 
server support.  As you say, there is really no way to plug OAI-PMH directly 
into Solr...  but a tool like VuFind can provide a fairly generic, extensible, 
Solr-based platform for building an OAI-PMH server.  Obviously this is helpful 
for some use cases and not others...  but I'm happy to provide more information 
if anyone needs it.

- Demian

From: Jonathan Rochkind [rochk...@jhu.edu]
Sent: Wednesday, February 02, 2011 3:38 PM
To: solr-user@lucene.apache.org
Cc: Paul Libbrecht
Subject: Re: OAI on SOLR already done?

The trick is that you can't just have a generic black box OAI-PMH
provider on top of any Solr index. How would it know where to get the
metadata elements it needs, such as title, or last-updated date, etc.
Any given solr index might not even have this in stored fields -- and a
given app might want to look them up from somewhere other than stored
fields.

If the Solr index does have them in stored fields, and you do want to
get them from the stored fields, then it's, I think (famous last words)
relatively straightforward code to write. A mapping from solr stored
fields to metadata elements needed for OAI-PMH, and then simply
outputting the XML template with those filled in.

I am not aware of anyone that has done this in a
re-useable/configurable-for-your-solr tool. You could possibly do it
solely using the built-in Solr
JSP/XSLT/other-templating-stuff-I-am-not-familiar-with stuff, rather
than as an external Solr client app, or it could be an external Solr
client app.

This is actually a very similar problem to something someone else asked
a few days ago "Does anyone have an OpenSearch add-on for Solr?"  Very
very similar problem, just with a different XML template for output
(usually RSS or Atom) instead of OAI-PMH.

On 2/2/2011 3:14 PM, Paul Libbrecht wrote:
> Peter,
>
> I'm afraid your service is harvesting and I am trying to look at a PMH 
> provider 
>service.
>
> Your project appeared early in the goolge matches.
>
> paul
>
>
> Le 2 févr. 2011 à 20:46, Péter Király a écrit :
>
>> Hi,
>>
>> I don't know whether it fits to your need, but we are builing a tool
>> based on Drupal (eXtensible Catalog Drupal Toolkit), which can harvest
>> with OAI-PMH and index the harvested records into Solr. The records is
>> harvested, processed, and stored into MySQL, then we index them into
>> Solr. We created some ways to manipulate the original values before
>> sending to Solr. We created it in a modular way, so you can change
>> settings in an admin interface or write your own "hooks" (special
>> Drupal functions), to taylor the application to your needs. We support
>> only Dublin Core, and our own FRBR-like schema (called XC schema), but
>> you can add more schemas. Since this forum is about Solr, and not
>> applications using Solr, if you interested this tool, plase write me a
>> private message, or visit http://eXtensibleCatalog.org, or the
>> module's page at http://drupal.org/project/xc.
>>
>> Hope this helps,
>>
>> Péter
>> eXtensible Catalog
>>
>> 2011/2/2 Paul Libbrecht:
>>> Hello list,
>>>
>>> I've met a few google matches that indicate that SOLR-based servers 
>>> implement 
>>>the Open Archive Initiative's Metadata Harvesting Protocol.
>>>
>>> Is there something made to be re-usable that would be an add-on to solr?
>>>
>>> thanks in advance
>>>
>>> paul
>



Re: OAI on SOLR already done?

2011-02-02 Thread Jonathan Rochkind

On 2/2/2011 5:19 PM, Dennis Gearon wrote:

Does something like this work to extract dates, phone numbers, addresses across
international formats and languages?

Or, just in the plain ol' USA?


What are you talking about?  There is nothing discussed in this thread 
that does any 'extracting' of dates, phone numbers or addresses at all , 
whether in international or domestic formats.




Re: OAI on SOLR already done?

2011-02-02 Thread Paul Libbrecht
I would think OAI certainly has a trans-national format for dates.
And that probably maps well onto SOLR's own date format.

But all of that is non-user-oriented so... no culture dependency in principle.
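Indeed, OAI-PMH datestamps and Solr's date fields both use UTC ISO-8601 (e.g. 2011-02-02T20:46:00Z), so nothing locale-dependent is involved. In Java:

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

public class OaiDates {
    // Both OAI-PMH datestamps and Solr date fields use this UTC form.
    static String datestamp(Instant t) {
        return DateTimeFormatter.ISO_INSTANT.format(t);
    }
}
```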

paul


Le 2 févr. 2011 à 23:19, Dennis Gearon a écrit :

> Does something like this work to extract dates, phone numbers, addresses 
> across 
> international formats and languages?
> 
> Or, just in the plain ol' USA?



Solr inserting Multivalued filelds

2011-02-02 Thread rahul

Hi,

I am a newbie to Apache Solr.

We are using ContentStreamUpdateRequest to insert into Solr. For example:

ContentStreamUpdateRequest req = new ContentStreamUpdateRequest(
"/update/extract");
req.addContentStream(stream);
req.setParam("literal.name", name);

SolrServer server = new CommonsHttpSolrServer(URL);
server.request(req);

Here, in schema.xml, I have specified "name" as a multivalued text field.

Now I want to set multiple values for "name". Could anyone tell me how to do
this?
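For what it's worth, ExtractingRequestHandler accepts the same literal.&lt;field&gt; parameter repeated once per value for a multivalued field. A sketch of the resulting request URL (the base URL and values are hypothetical):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ExtractUrl {
    // Repeat literal.<field> once per value to fill a multivalued field.
    static String build(String base, String field, List<String> values) {
        StringBuilder sb = new StringBuilder(base).append("/update/extract");
        char sep = '?';
        for (String v : values) {
            sb.append(sep).append("literal.").append(field).append('=')
              .append(URLEncoder.encode(v, StandardCharsets.UTF_8));
            sep = '&';
        }
        return sb.toString();
    }
}
```

With SolrJ, the equivalent is adding the parameter multiple times via the request's ModifiableSolrParams.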

Additionally, assume I set commit maxTime as 1 in my solr
configuration file, ie (10 sec). If my application stops before
committing into Solr, will this information be lost?

Will I need to insert all these documents into Solr again?

thanks in Advance..
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-inserting-Multivalued-filelds-tp2406612p2406612.html
Sent from the Solr - User mailing list archive at Nabble.com.


Partial matches don't work (solr.NGramFilterFactory)

2011-02-02 Thread Script Head
Hello,

I have the following definitions in my schema.xml:


    
        
        
    
    
        
    

...

...


There is a document "Hippopotamus is fatter than a Platypus" indexed.
When I search for "Hippopotamus" I receive the expected result. When I
search for any partial such as "Hippo" or "potamu" I get nothing. I
could use some guidance.


Re: Partial matches don't work (solr.NGramFilterFactory

2011-02-02 Thread Script Head
Yes, I have tried searching on text_ngrams as well and it produces no results.

On a related note, since I have  wouldn't the ngrams produced by text_ngrams field
definition also be available within the text field?


2011/2/2 Tomás Fernández Löbbe :
> About this:
>
> 
>
> The NGrams are going to be indexed on the field "text_ngrams", not on
> "text". For the field "text", Solr will apply the text analysis (which I
> guess doesn't have NGrams). You have to search on the "text_ngrams" field,
> something like "text_ngrams:hippo" or "text_ngrams:potamu". Are you
> searching like this?
>
> Tomás
>
> On Wed, Feb 2, 2011 at 4:07 PM, Script Head  wrote:
>
>> Hello,
>>
>> I have the following definitions in my schema.xml:
>>
>> 
>>    
>>        
>>        > maxGramSize="15"/>
>>    
>>    
>>        
>>    
>> 
>> ...
>> > stored="true"/>
>> ...
>> 
>>
>> There is a document "Hippopotamus is fatter than a Platypus" indexed.
>> When I search for "Hippopotamus" I receive the expected result. When I
>> search for any partial such as "Hippo" or "potamu" I get nothing. I
>> could use some guidance.
>>
>> Script Head
>>
>


Re: OAI on SOLR already done?

2011-02-02 Thread Dennis Gearon
I guess I didn't understand 'meta data'. That's why I asked the question.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Jonathan Rochkind 
To: "solr-user@lucene.apache.org" 
Sent: Wed, February 2, 2011 2:26:32 PM
Subject: Re: OAI on SOLR already done?

On 2/2/2011 5:19 PM, Dennis Gearon wrote:
> Does something like this work to extract dates, phone numbers, addresses 
across
> international formats and languages?
> 
> Or, just in the plain ol' USA?

What are you talking about?  There is nothing discussed in this thread that 
does 
any 'extracting' of dates, phone numbers or addresses at all , whether in 
international or domestic formats.


Re: Use Parallel Search

2011-02-02 Thread Gustavo Maia
2011/2/2 Gustavo Maia 

> Hello,
>
> Let me give a brief description of my scenario.
> Today I am only using Lucene 2.9.3. I have an index of 30 million
> documents distributed on three machines and each machine with 6 hds (15k
> rmp).
> The server queries the search index using the remote class search. And
> each machine is made to search using the parallel search (search
> simultaneously in 6 hds).
> So during the search are simulating using the three machines and 18 hds,
> returning me to a very good response time.
>
>
> Today I am studying the SOLR and am interested in knowing more about the
> searches and use of distributed parallel search on the same machine. What
> would be the best scenario using SOLR that is better than I already am
> using today only with lucene?
>   Note: I need to have installed on each machine 6 SOLR instantiate from
> my server? One for each hd? Or would some other alternative way for me to use
> the 6 hds without having 6 instances of SORL server?
>
>   Another question would be if the SOLR would have some limiting size
> index for Hard drive? It would be interesting not index too big because
> when the index increased the longer the search.
>
> Thanks for everything.


Time fields

2011-02-02 Thread Dennis Gearon
For time of day fields, NOT unix timestamp/dates, what is the best way to do 
that?

I can think of seconds since beginning of day as integer
OR
string

Any other ideas? Assume that I'll be using range queries. TIA.
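Seconds since midnight as an integer works well for range queries (and a trie int field would keep the ranges fast); a zero-padded "HH:mm:ss" string also range-queries correctly, since it sorts lexicographically. A sketch of the conversion:

```java
public class TimeOfDay {
    // "HH:mm:ss" -> seconds since midnight (0 .. 86399)
    static int secondsSinceMidnight(String hhmmss) {
        String[] p = hhmmss.split(":");
        return Integer.parseInt(p[0]) * 3600
             + Integer.parseInt(p[1]) * 60
             + Integer.parseInt(p[2]);
    }
}
```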



 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



MANY thanks for help on path so far (first of 2 steps on 1000step path :-)

2011-02-02 Thread Dennis Gearon
Got my API to input into both the database and the Solr instance, and search
geographically/chronologically in Solr.

Next is Update and Delete. And then .. and then ... and then ..

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



DataImportHandler: no queries when using entity=something

2011-02-02 Thread Jon Drukman
So I'm trying to update a single entity in my index using DataImportHandler.

http://solr:8983/solr/dataimport?command=full-import&entity=games

It ends near-instantaneously without hitting the database at all, apparently.

Status shows:

0
0
0
0

Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.

2011-02-02 16:24:13
2011-02-02 16:24:13
0:0:0.20

The query isn't that extreme.  It returns 8771 rows in about 3 seconds.

How can I debug this?



Re: Time fields

2011-02-02 Thread Adam Estrada
If you're using a DIH you can configure it however you want. Here is a
snippet of my code. Note the DateTimeTransformer.


  
  


  
  
  
  
  
  
  
  
  
  
  
  
  


On Wed, Feb 2, 2011 at 7:28 PM, Dennis Gearon  wrote:
> For time of day fields, NOT unix timestamp/dates, what is the best way to do
> that?
>
> I can think of seconds since beginning of day as integer
> OR
> string
>
> Any other ideas? Assume that I'll be using range queries. TIA.
>
>
>
>  Dennis Gearon
>
>
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a 
> better
> idea to learn from others’ mistakes, so you do not have to make them yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.
>
>


Re: DataImportHandler: no queries when using entity=something

2011-02-02 Thread Gora Mohanty
On Thu, Feb 3, 2011 at 6:08 AM, Jon Drukman  wrote:
> So I'm trying to update a single entity in my index using DataImportHandler.
>
> http://solr:8983/solr/dataimport?command=full-import&entity=games
>
> It ends near-instantaneously without hitting the database at all, apparently.
[...]

* Does the data import work without the entity=... ?
* Please show us your data import configuration file, and schema.xml.

Regards,
Gora


Using terms and N-gram

2011-02-02 Thread openvictor Open
Dear all,

I am trying to implement an autocomplete system for search. But I am stuck
on some problems that I can't solve.

Here is my problem:
Given text like
"the cat is black", I want to explore all 1-grams to 8-grams for all the
texts that are passed in:
the, cat, is, black, the cat, cat is, is black, etc...

In order to do that I have defined the following fieldtype in my schema :



  


  
  


  



Then the following field :



Then I fed Solr some phrases and I was really surprised to see that
Solr didn't behave as expected.
I went to the schema browser to see the result for the very profound query:
"the cat is black and it rains"

The results are quite disappointing: first, the 1-grams are not found. Some
2-grams are found, like "the_cat", "and_it", etc... but not what I expected.
Is there something I am missing here? (By the way, I also tried removing
the minGramSize and maxGramSize, and even the words.)
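The underscore tokens like "the_cat" are what Lucene's ShingleFilter produces ("_" is its default token separator); whether single words also appear depends on its outputUnigrams setting. What the 1-to-8-gram expansion should yield can be sketched as:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordShingles {
    // All n-word shingles for min <= n <= max, joined with "_" like ShingleFilter.
    static List<String> shingles(String text, int min, int max) {
        String[] words = text.toLowerCase().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            for (int i = 0; i + n <= words.length; i++) {
                out.add(String.join("_", Arrays.copyOfRange(words, i, i + n)));
            }
        }
        return out;
    }
}
```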

Thank you,
Victor Kabdebon





Function Question

2011-02-02 Thread Bill Bell
This is posted as an enhancement on SOLR-2345.

I am willing to work on it, but I am stuck. I would like to loop through the
lat/long values when they are stored in a multiValued field, but I cannot
figure out how to do that. For example:


sort=geodist() asc

This should grab the closest point in the multiValued list and return the
distance so that it can be scored.

The problem is I cannot find a way to get the multiValued list.

In function: 
src/java/org/apache/solr/search/function/distance/HaversineConstFunction.java

Has code similar to:

VectorValueSource p2;
this.p2 = vs;
List<ValueSource> sources = p2.getSources();
ValueSource latSource = sources.get(0);
ValueSource lonSource = sources.get(1);
DocValues latVals = latSource.getValues(context1, readerContext1);
DocValues lonVals = lonSource.getValues(context1, readerContext1);
double latRad = latVals.doubleVal(doc) * DistanceUtils.DEGREES_TO_RADIANS;
double lonRad = lonVals.doubleVal(doc) * DistanceUtils.DEGREES_TO_RADIANS;
etc...

It would be good if I could loop through sources.get(), but it only returns 2
sources even when there are 2 pairs of lat/long. getSources() only returns
the following:

sources:[double(store_0_coordinate), double(store_1_coordinate)]

How do I just get the 4 values in the function?
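Outside of the ValueSource plumbing, what SOLR-2345 asks for is just "minimum haversine distance over the stored points". A standalone sketch of that computation (the function would still need to pull all the per-document coordinate values from the index):

```java
public class ClosestPoint {
    static final double EARTH_RADIUS_KM = 6371.0;

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // geodist() over a multivalued point list: keep the minimum distance.
    static double closestKm(double lat, double lon, double[][] points) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] p : points) {
            best = Math.min(best, haversineKm(lat, lon, p[0], p[1]));
        }
        return best;
    }
}
```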


Re: Partial matches don't work (solr.NGramFilterFactory)

2011-02-02 Thread Grijesh

Use analysis.jsp to see how your analysis is going.
Also, you can see the parsed queries by adding the parameter debugQuery=on to the
request URL.

-
Thanx:
Grijesh
http://lucidimagination.com
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-matches-don-t-work-solr-NGramFilterFactory-tp2409421p2411208.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Open Too Many Files

2011-02-02 Thread Grijesh

Increase the OS limit for the maximum number of open files; the default is set
quite low on some OSes.

-
Thanx:
Grijesh
http://lucidimagination.com
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Open-Too-Many-Files-tp2406289p2411217.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using terms and N-gram

2011-02-02 Thread Grijesh

Use analysis.jsp to see what is happening at index time and query time with your
input data. You can use highlighting to see whether a match was found.

-
Thanx:
Grijesh
http://lucidimagination.com
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-terms-and-N-gram-tp2410938p2411244.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr inserting Multivalued filelds

2011-02-02 Thread rahul

Nevermind.. got the details from here..

http://wiki.apache.org/solr/ExtractingRequestHandler

Thanks..
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-inserting-Multivalued-filelds-tp2406612p2411248.html
Sent from the Solr - User mailing list archive at Nabble.com.