RE: Writing custom data import handler for Solr.

2012-06-09 Thread ram anam

Thanks for the guidance. But is there any documentation that describes the
steps to implement a custom data source and integrate it with Solr? The data
source I am trying to integrate is similar to Amazon S3 buckets, but the
provider is different.

Thanks and regards, Ram Anam.

> Date: Fri, 8 Jun 2012 20:40:05 -0700
> Subject: Re: Writing custom data import handler for Solr.
> From: goks...@gmail.com
> To: solr-user@lucene.apache.org
> 
> The DataImportHandler is a toolkit in Solr. It has a few different
> kinds of plugins. It is very possible that you do not have to write
> any Java code.
> 
> If you have an unusual external data feed (database, file system,
> Amazon S3 buckets) then you would write a DataSource. The only
> examples are the source code in trunk/solr/contrib/dataimporthandler.
> 
> http://wiki.apache.org/solr/DataImportHandler
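
A minimal skeleton of such a DataSource, as a sketch only, assuming the
Solr 3.x DIH API; the class name, the "endpoint" property, and the fetch
logic below are placeholders, not anything Solr provides:

---snip---
package com.example;

import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.DataSource;

public class MyBucketDataSource extends DataSource<Reader> {

    private String endpoint;

    @Override
    public void init(Context context, Properties initProps) {
        // Attributes declared on the <dataSource> element in
        // data-config.xml arrive here as properties.
        this.endpoint = initProps.getProperty("endpoint");
        // ... open a connection to the external store here ...
    }

    @Override
    public Reader getData(String query) {
        // DIH calls this once per entity "query"; fetch the named
        // object from the endpoint and expose its contents as a Reader.
        return new StringReader("...object contents for " + query + "...");
    }

    @Override
    public void close() {
        // ... release the connection ...
    }
}
---snip---

It would then be wired up in data-config.xml by its fully qualified class
name, e.g. <dataSource type="com.example.MyBucketDataSource" endpoint="..."/>.
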
> 
> On Fri, Jun 8, 2012 at 8:35 PM, ram anam  wrote:
> >
> > Hi Erick,
> > I cannot disclose the data source which we are planning to index inside
> > Solr as it is confidential. But the client wants it to be in the form of
> > an import handler. We plan to install Solr and our custom data import
> > handlers so that the client can just consume it. Could you please provide
> > me pointers to examples of custom data import handlers?
> >
> > Thanks and regards, Ram Anam.
> >
> >> Date: Fri, 8 Jun 2012 13:59:34 -0400
> >> Subject: Re: Writing custom data import handler for Solr.
> >> From: erickerick...@gmail.com
> >> To: solr-user@lucene.apache.org
> >>
> >> You need to back up a bit and describe _why_ you want to do this,
> >> perhaps there's
> >> an easy way to do what you want. This could easily be an XY problem...
> >>
> >> For instance, you can write a SolrJ program to index data, which _might_ be
> >> what you want. It's a separate process runnable anywhere. See:
> >> http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/
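
For reference, indexing from SolrJ is only a few lines. A minimal sketch,
assuming Solr 3.6's HttpSolrServer; the URL and field names are placeholders:

---snip---
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SimpleIndexer {
    public static void main(String[] args) throws Exception {
        // A separate process: talks to Solr over HTTP, runnable anywhere.
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "An example document");
        server.add(doc);
        server.commit(); // make the document visible to searches
    }
}
---snip---
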
> >>
> >> Best
> >> Erick
> >>
> >> On Fri, Jun 8, 2012 at 1:29 PM, ram anam  wrote:
> >> >
> >> > Hi,
> >> >
> >> > I am planning to write a custom data import handler for Solr for some
> >> > data source. Could you give me some pointers to documentation and
> >> > examples on how to write a custom data import handler and how to
> >> > integrate it with Solr? Thank you for the help. Thanks and regards,
> >> > Ram Anam.
> >
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com
  

RE: Adding Custom-Parser to Tika

2012-06-09 Thread spring
> The doc is old. Tika hunts for parsers in the classpath now.
> 
> http://www.lucidimagination.com/search/link?url=https://issues.apache.org/jira/browse/SOLR-2116?focusedCommentId=12977072#action_12977072

"Re: tika-config.xml vs. META-INF/services/...; The service provider
mechanism [1] makes it easy to add custom parser implementations without
having to maintain a separate copy of the full Tika configuration file. You
could for example create a my-custom-parsers.jar file with a
META-INF/services/o.a.tika.parser.Parser file that lists only your custom
parser classes. When you add that jar to the classpath, Tika would then
automatically pick up those parsers in addition to the standard parser
classes from the tika-parsers jar."

This was exactly what I tried, but it did not work.

I'm using Tika 1.1
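
For reference, the packaging described in the quoted comment would look
roughly like the following sketch, assuming the Tika 1.x API; the class name
and MIME type are placeholders. The jar would additionally need a plain-text
file META-INF/services/org.apache.tika.parser.Parser whose single line is the
parser's fully qualified class name (here, com.example.MyCustomParser):

---snip---
package com.example;

import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.Set;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AbstractParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.XHTMLContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public class MyCustomParser extends AbstractParser {

    private static final Set<MediaType> TYPES =
            Collections.singleton(MediaType.parse("application/x-mycustom"));

    @Override
    public Set<MediaType> getSupportedTypes(ParseContext context) {
        return TYPES;
    }

    @Override
    public void parse(InputStream stream, ContentHandler handler,
                      Metadata metadata, ParseContext context)
            throws IOException, SAXException, TikaException {
        // A real parser would read the stream; this stub just emits a
        // fixed paragraph of "extracted" text.
        XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
        xhtml.startDocument();
        xhtml.element("p", "...extracted text...");
        xhtml.endDocument();
    }
}
---snip---
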



Re: What would cause: "SEVERE: java.lang.ClassCastException: com.company.MyCustomTokenizerFactory cannot be cast to org.apache.solr.analysis.TokenizerFactory"

2012-06-09 Thread Jack Krupansky
Make sure there are no stray jars/classes in your jar, especially any that
might contain BaseTokenizerFactory or TokenizerFactory. I notice that your
jar name says "-with-dependencies", which raises a little suspicion. The
exception looks as if your class is referring to a BaseTokenizerFactory
(which implements TokenizerFactory) that comes from your jar (or a contained
jar) rather than being resolved to Solr 3.6's own BaseTokenizerFactory and
TokenizerFactory.
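
One way to confirm where each class is coming from is a quick check like the
following sketch; the custom factory name is a placeholder, and the check is
only meaningful when run under the same classloader setup as Solr (for
example, logged temporarily from the custom factory's constructor):

---snip---
public class ClassOriginCheck {
    public static void main(String[] args) throws Exception {
        String[] names = {
            "org.apache.solr.analysis.TokenizerFactory",
            "com.company.MyCustomTokenizerFactory"
        };
        for (String name : names) {
            Class<?> c = Class.forName(name);
            // Prints the jar each class was actually loaded from; if the
            // interface resolves to the -with-dependencies jar instead of
            // Solr's own jar, a duplicate copy is the likely culprit.
            System.out.println(name + " <- "
                + c.getProtectionDomain().getCodeSource().getLocation());
        }
    }
}
---snip---
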


-- Jack Krupansky

-Original Message- 
From: Aaron Daubman

Sent: Saturday, June 09, 2012 12:03 AM
To: solr-user@lucene.apache.org
Subject: What would cause: "SEVERE: java.lang.ClassCastException: 
com.company.MyCustomTokenizerFactory cannot be cast to 
org.apache.solr.analysis.TokenizerFactory"


Greetings,

I am in the process of updating custom code and schema from Solr 1.4 to
3.6.0 and have run into the following issue with our two custom Tokenizer
and Token Filter components.

I've been banging my head against this one for far too long, especially
since it must be something obvious I'm missing.

I have custom Tokenizer and Token Filter components along with
corresponding factories. The code for all looks very similar to the
Tokenizer and TokenFilter (and Factory) code that is standard with 3.6.0
(and I have also read through
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters).

I have ensured my custom code is on the classpath; it is
in ENSolrComponents-1.0-SNAPSHOT-jar-with-dependencies.jar:
---output snip---
Jun 8, 2012 10:41:00 PM org.apache.solr.core.CoreContainer load
INFO: loading shared library: /opt/test_artists_solr/jetty-solr/lib/en
Jun 8, 2012 10:41:00 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/opt/test_artists_solr/jetty-solr/lib/en/ENSolrComponents-1.0-SNAPSHOT-jar-with-dependencies.jar'
to classloader
Jun 8, 2012 10:41:00 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/opt/test_artists_solr/jetty-solr/lib/en/ENUtil-1.0-SNAPSHOT-jar-with-dependencies.jar'
to classloader
Jun 8, 2012 10:41:00 PM org.apache.solr.core.CoreContainer create
---snip---

After successfully parsing the schema and creating many fields, etc., the
following is logged:
---snip---
Jun 8, 2012 10:41:00 PM org.apache.solr.util.plugin.AbstractPluginLoader
load
INFO: created : com.company.MyCustomTokenizerFactory
Jun 8, 2012 10:41:00 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ClassCastException: com.company.MyCustomTokenizerFactory
cannot be cast to org.apache.solr.analysis.TokenizerFactory
at org.apache.solr.schema.IndexSchema$5.init(IndexSchema.java:966)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:148)
at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:986)
at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:60)
at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:453)
at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:433)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:490)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:123)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:481)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:335)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:219)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:102)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at
org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:748)
at
org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:249)
at
org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1222)
at
org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:676)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:455)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at
org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36)
at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183)
at
org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491)
at
org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138)
at
org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142)
at
org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53)
at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604)
at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535)
at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398)
at or

Re: terms count in multivalued field

2012-06-09 Thread Jack Krupansky
I am not aware of any function query to get the value count for a 
multivalued field.


You could write a custom update processor which counted the values and then 
stored the count in another field. Then you could do a numeric range query 
on that other field. Or maybe even store a boolean in another field to 
indicate whether the field had a single value or more than a threshold count 
of values, or whatever custom calculation you wanted.


Of course, you could also count the values yourself and set a separate
count field before submitting the document to Solr.


See SOLR-2802 - "Toolkit of UpdateProcessors for modifying document values" 
for some possible base classes for such an update processor. This is for 
4.x, not 3.6. For older releases, you will have to do it all yourself.

https://issues.apache.org/jira/browse/SOLR-2802
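
A sketch of what such an update processor might look like, assuming the 4.x
UpdateRequestProcessor API; the "tags" and "tags_count" field names are
placeholders:

---snip---
import java.io.IOException;
import java.util.Collection;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class CountValuesProcessorFactory extends UpdateRequestProcessorFactory {

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
            SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
            @Override
            public void processAdd(AddUpdateCommand cmd) throws IOException {
                SolrInputDocument doc = cmd.getSolrInputDocument();
                Collection<Object> values = doc.getFieldValues("tags");
                // Store the value count in a companion field so it can
                // be range-queried later.
                doc.setField("tags_count", values == null ? 0 : values.size());
                super.processAdd(cmd);
            }
        };
    }
}
---snip---

With the count indexed, "all documents having more than one value" becomes a
plain range query, e.g. q=tags_count:[2 TO *].
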

-- Jack Krupansky

-Original Message- 
From: preetesh dubey

Sent: Friday, June 08, 2012 12:22 PM
To: solr-user@lucene.apache.org
Subject: terms count in multivalued field

Is it possible to get the number of entries present in a multivalued field
via a Solr query? Let's say I want to query Solr to get all documents where
the *count* of some multivalued field is >1. Is this possible in Solr?

--
Thanks & Regards
Preetesh Dubey 



Re: terms count in multivalued field

2012-06-09 Thread preetesh dubey
Thanks Jack!
Creating another field to store the count is one option. I just wanted to
know if this was possible without adding a new field to the schema. And I got
the answer: it's not possible, at least in Solr 3.x versions.
Thanks once again for your reply and the information about the "Toolkit of
UpdateProcessors" in the 4.x version.

On Sun, Jun 10, 2012 at 12:55 AM, Jack Krupansky wrote:

> I am not aware of any function query to get the value count for a
> multivalued field.
>
> You could write a custom update processor which counted the values and
> then stored the count in another field. Then you could do a numeric range
> query on that other field. Or maybe even store a boolean in another field
> to indicate whether the field had a single value or more than a threshold
> count of values, or whatever custom calculation you wanted.
>
> Of course, you could also count the values yourself and set a separate
> count field before submitting the document to Solr.
>
> See SOLR-2802 - "Toolkit of UpdateProcessors for modifying document
> values" for some possible base classes for such an update processor. This
> is for 4.x, not 3.6. For older releases, you will have to do it all
> yourself.
> https://issues.apache.org/jira/browse/SOLR-2802
>
> -- Jack Krupansky
>
> -Original Message- From: preetesh dubey
> Sent: Friday, June 08, 2012 12:22 PM
> To: solr-user@lucene.apache.org
> Subject: terms count in multivalued field
>
>
> Is it possible to get the number of entries present in a multivalued field
> via a Solr query? Let's say I want to query Solr to get all documents where
> the *count* of some multivalued field is >1. Is this possible in Solr?
>
>
> --
> Thanks & Regards
> Preetesh Dubey
>



-- 
Thanks & Regards
Preetesh Dubey


Re: timeAllowed flag in the response

2012-06-09 Thread Lance Norskog
Please file a JIRA. And a patch if you are so inclined.

On Fri, Jun 8, 2012 at 4:55 AM, Michael Kuhlmann  wrote:
> On 08.06.2012 at 11:55, Laurent Vaills wrote:
>
>> Hi Michael,
>>
>> Thanks for the details, which helped me take a deeper look into the
>> source code. I noticed that each time a TimeExceededException is caught,
>> the method setPartialResults(true) is called... which seems to be what
>> I'm looking for. I have to investigate, since this partialResults does
>> not seem to be set for the sharded queries.
>> I have to investigate, since this partialResults does not seem to be set
>> for the sharded queries.
>
>
> Ah, I simply was too blind! ;) The partial results flag is indeed set in
> the response header.
>
> Then I think it is a bug that it's not filled in for a sharded response,
> or it simply is not there when sharding.
>
> Greetings,
> Kuli



-- 
Lance Norskog
goks...@gmail.com


Re: Adding Custom-Parser to Tika

2012-06-09 Thread Lance Norskog
How do you add it to the classpath? And, is there an example somewhere
of how to package one of these external parsers?

If all else fails, the Tika code for loading external parsers is
available for viewing.

On Sat, Jun 9, 2012 at 3:00 AM,   wrote:
>> The doc is old. Tika hunts for parsers in the classpath now.
>>
>> http://www.lucidimagination.com/search/link?url=https://issues.apache.org/jira/browse/SOLR-2116?focusedCommentId=12977072#action_12977072
>
> "Re: tika-config.xml vs. META-INF/services/...; The service provider
> mechanism [1] makes it easy to add custom parser implementations without
> having to maintain a separate copy of the full Tika configuration file. You
> could for example create a my-custom-parsers.jar file with a
> META-INF/services/o.a.tika.parser.Parser file that lists only your custom
> parser classes. When you add that jar to the classpath, Tika would then
> automatically pick up those parsers in addition to the standard parser
> classes from the tika-parsers jar."
>
> This was exactly what I tried, but it did not work.
>
> I'm using Tika 1.1
>



-- 
Lance Norskog
goks...@gmail.com


Re: Writing custom data import handler for Solr.

2012-06-09 Thread Lance Norskog
Nope, the code is all you get.

On Sat, Jun 9, 2012 at 12:16 AM, ram anam  wrote:
>
> Thanks for the guidance. But is there any documentation that describes the
> steps to implement a custom data source and integrate it with Solr? The
> data source I am trying to integrate is similar to Amazon S3 buckets, but
> the provider is different.
>
> Thanks and regards, Ram Anam.
>
>> Date: Fri, 8 Jun 2012 20:40:05 -0700
>> Subject: Re: Writing custom data import handler for Solr.
>> From: goks...@gmail.com
>> To: solr-user@lucene.apache.org
>>
>> The DataImportHandler is a toolkit in Solr. It has a few different
>> kinds of plugins. It is very possible that you do not have to write
>> any Java code.
>>
>> If you have an unusual external data feed (database, file system,
>> Amazon S3 buckets) then you would write a DataSource. The only
>> examples are the source code in trunk/solr/contrib/dataimporthandler.
>>
>> http://wiki.apache.org/solr/DataImportHandler
>>
>> On Fri, Jun 8, 2012 at 8:35 PM, ram anam  wrote:
>> >
>> > Hi Erick,
>> > I cannot disclose the data source which we are planning to index inside
>> > Solr as it is confidential. But the client wants it to be in the form of
>> > an import handler. We plan to install Solr and our custom data import
>> > handlers so that the client can just consume it. Could you please
>> > provide me pointers to examples of custom data import handlers?
>> >
>> > Thanks and regards, Ram Anam.
>> >
>> >> Date: Fri, 8 Jun 2012 13:59:34 -0400
>> >> Subject: Re: Writing custom data import handler for Solr.
>> >> From: erickerick...@gmail.com
>> >> To: solr-user@lucene.apache.org
>> >>
>> >> You need to back up a bit and describe _why_ you want to do this,
>> >> perhaps there's
>> >> an easy way to do what you want. This could easily be an XY problem...
>> >>
>> >> For instance, you can write a SolrJ program to index data, which _might_ 
>> >> be
>> >> what you want. It's a separate process runnable anywhere. See:
>> >> http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Fri, Jun 8, 2012 at 1:29 PM, ram anam  wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > I am planning to write a custom data import handler for Solr for
>> >> > some data source. Could you give me some pointers to documentation
>> >> > and examples on how to write a custom data import handler and how to
>> >> > integrate it with Solr? Thank you for the help. Thanks and regards,
>> >> > Ram Anam.
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: deploy a brand new index in solrcloud

2012-06-09 Thread Jack Krupansky
How about maintaining two distinct SolrClouds with a switch in the load 
balancer?


Reindex the second cloud, test it, fully warm it, test it again, and then 
have the load balancer switch all new queries to the second cloud. Then take 
down the original cloud once all queries have completed. Rinse and repeat. 
Avoid any large-scale index changes in a cloud while serving production 
queries.


This still leaves open the question of how to take the old cloud and
initiate a re-index of it, such as what the recommended technique is for
deleting the existing data.
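
For the deletion step, one simple approach, as a sketch assuming a SolrJ
client; the URL is a placeholder:

---snip---
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ClearIndex {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        server.deleteByQuery("*:*"); // match and delete every document
        server.commit();             // make the deletion visible
    }
}
---snip---
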


-- Jack Krupansky

-Original Message- 
From: Anatoli Matuskova

Sent: Saturday, June 09, 2012 6:07 PM
To: solr-user@lucene.apache.org
Subject: deploy a brand new index in solrcloud

Which would be the best way to fully reindex the indexes in SolrCloud? I
mean, single inserts are issued to the leader replica and it sends the 'put'
to the other replicas. But how could I deploy a brand-new index to all the
replicas (let's say I've built the index using EmbeddedSolrServer somewhere
else)? If some inserts keep coming during the deployment, would they be
stored in the transaction log and issued once the new index is deployed?





Re: deploy a brand new index in solrcloud

2012-06-09 Thread Anatoli Matuskova
I've thought of setting up replication in SolrCloud:
http://www.searchworkings.org/forum/-/message_boards/view_message/339527#_19_message_339527
What I don't know is whether, while the replication is being handled, the
replica slaves (those that are not the master in the replication) can keep
handling puts via the transaction log.
