Re: DataImportHandler SolrEntityProcessor configuration for local copy

2020-02-06 Thread Mikhail Khludnev
Karl, what would you do if that own implementation stalls in GC, or smashes Solr over? On Thu, Feb 6, 2020 at 1:04 PM Karl Stoney wrote: > Spoke too soon, looks like it memory leaks. After about 1.3m the old gc > times went through the roof and solr was almost unresponsive, had to > abort. We'

Re: DataImportHandler SolrEntityProcessor configuration for local copy

2020-02-06 Thread Mikhail Khludnev
Egor, would you mind sharing some best practices regarding cursorMark in SolrEntityProcessor? On Thu, Feb 6, 2020 at 1:04 PM Karl Stoney wrote: > Spoke too soon, looks like it memory leaks. After about 1.3m the old gc > times went through the roof and solr was almost unresponsive, had to > abo

Re: DataImportHandler SolrEntityProcessor configuration for local copy

2020-02-06 Thread Karl Stoney
Spoke too soon, looks like it memory leaks. After about 1.3m the old gc times went through the roof and solr was almost unresponsive, had to abort. We're going to write our own implementation to copy data from one core to another that runs outside of solr. On 06/02/2020, 09:57, "Karl Stoney"

Re: DataImportHandler SolrEntityProcessor configuration for local copy

2020-02-06 Thread Karl Stoney
I cannot believe how much of a difference that cursorMark and sort order made. Previously it died about 800k docs, now we're at 1.2m without any slowdown. Thank you so much On 06/02/2020, 08:14, "Mikhail Khludnev" wrote: Hello, Karl. Please check these: https://eur03.safelinks.pro

Re: DataImportHandler SolrEntityProcessor configuration for local copy

2020-02-06 Thread Mikhail Khludnev
Hello, Karl. Please check these: https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#constraints-when-using-cursors https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#solrentityprocessor cursorMark="true" Good luck. On
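
For readers who end up copying documents outside of Solr, as discussed in this thread, here is a minimal Python sketch of the cursor protocol described on the linked pagination page: start with cursorMark=*, sort on the uniqueKey field, and stop when Solr returns the same mark that was sent. The function names, the "id" uniqueKey, the page size, and the example URL are assumptions for illustration; this is not the SolrEntityProcessor code.

```python
import json
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen


def http_fetcher(select_url: str) -> Callable[[Dict[str, str]], dict]:
    """Return a function that queries a /select URL and parses the JSON reply."""
    def fetch(params: Dict[str, str]) -> dict:
        with urlopen(f"{select_url}?{urlencode(params)}") as resp:
            return json.load(resp)
    return fetch


def iter_docs_with_cursor(
    fetch_page: Callable[[Dict[str, str]], dict],
    query: str = "*:*",
    unique_key: str = "id",   # assumed uniqueKey field name
    rows: int = 500,          # assumed page size
) -> Iterator[dict]:
    """Yield every matching document by following Solr cursor marks."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        page = fetch_page({
            "q": query,
            "rows": str(rows),
            # Cursors require a sort that includes the uniqueKey field.
            "sort": f"{unique_key} asc",
            "cursorMark": cursor,
            "wt": "json",
        })
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # same mark returned: nothing left to read
            break
        cursor = next_cursor
```

A caller would pass something like http_fetcher("http://localhost:8983/solr/source_core/select") (a hypothetical URL) and send each yielded document to the target core in batches.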

Re: DataImportHandler: full import of a single entity

2017-08-08 Thread bhargava ravali koganti
It is not working if a cron job is given. It is executing the other entities as well. Is there any solution? -- View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-full-import-of-a-single-entity-tp2258037p4349551.html Sent from the Solr - User mailing list archive

Re: DataImportHandler OutOfMemory Mysql

2017-04-02 Thread Shawn Heisey
On 4/1/2017 4:17 PM, marotosg wrote: > I am trying to load a big table into Solr using DataImportHandler and Mysql. > I am getting OutOfMemory error because Solr is trying to load the full > table. I have been reading different posts and tried batchSize="-1". > https://wiki.apache.org/solr/DataIm

Re: DataImportHandler OutOfMemory Mysql

2017-04-01 Thread Mikhail Khludnev
Hello, Sergio. Have you tried Integer.MIN_VALUE ? -2147483648 see https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html On Sun, Apr 2, 2017 at 1:17 AM, marotosg wrote: > Hi, > > I am trying to load a big table into Solr using DataImportHandler and > Mysql

RE: DataImportHandler - Unable to load Tika Config Processing Document # 1

2017-02-09 Thread Anatharaman, Srinatha (Contractor)
hawn Heisey Subject: RE: DataImportHandler - Unable to load Tika Config Processing Document # 1 > Thank you I will follow Erick's steps > BTW I am also trying to ingesting using Flume , Flume uses Morphlines > along with Tika Even Flume SolrSink will have the same issue? Yes, when usin

RE: DataImportHandler - Unable to load Tika Config Processing Document # 1

2017-02-08 Thread Markus Jelsma
> Thank you I will follow Erick's steps > BTW I am also trying to ingesting using Flume , Flume uses Morphlines along > with Tika > Even Flume SolrSink will have the same issue? Yes, when using Tika you run the risk of it choking on a document, eating CPU and/or RAM until everything dies. This i

RE: DataImportHandler - Unable to load Tika Config Processing Document # 1

2017-02-08 Thread Anatharaman, Srinatha (Contractor)
-user@lucene.apache.org Subject: Re: DataImportHandler - Unable to load Tika Config Processing Document # 1 On 2/8/2017 9:08 AM, Anatharaman, Srinatha (Contractor) wrote: > Thank you for your reply > Other archive message you mentioned is posted by me only I am new to > Solr, When you say proces

Re: DataImportHandler - Unable to load Tika Config Processing Document # 1

2017-02-08 Thread Shawn Heisey
On 2/8/2017 9:08 AM, Anatharaman, Srinatha (Contractor) wrote: > Thank you for your reply > Other archive message you mentioned is posted by me only > I am new to Solr, When you say process outside Solr program. What exactly I > should do? > > I am having lots of text document which I need to inde

RE: DataImportHandler - Unable to load Tika Config Processing Document # 1

2017-02-08 Thread Anatharaman, Srinatha (Contractor)
document I was able to successfully do this in Solr Core stand alone -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Wednesday, February 08, 2017 1:56 PM To: solr-user@lucene.apache.org Subject: RE: DataImportHandler - Unable to load Tika Config Processing

RE: DataImportHandler - Unable to load Tika Config Processing Document # 1

2017-02-08 Thread Allison, Timothy B.
>It is *strongly* recommended to *not* use >the Tika that's embedded within >Solr, but >instead to do the processing outside of Solr >in a program of your >own and index the results. +1 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201601.mbox/%3CBY2PR09MB11210EDFCFA297528940B07C

RE: DataImportHandler - Unable to load Tika Config Processing Document # 1

2017-02-08 Thread Anatharaman, Srinatha (Contractor)
Solr? Regards, ~Sri -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Wednesday, February 08, 2017 9:46 AM To: solr-user@lucene.apache.org Subject: Re: DataImportHandler - Unable to load Tika Config Processing Document # 1 On 2/6/2017 3:45 PM, Anatharaman, Srinatha

Re: DataImportHandler - Unable to load Tika Config Processing Document # 1

2017-02-08 Thread Shawn Heisey
On 2/6/2017 3:45 PM, Anatharaman, Srinatha (Contractor) wrote: > I am having below error while trying to index using dataImporthandler > > Data-Config file is mentioned below. zookeeper is not able to read > "tikaConfig.xml" on below statement > > processor="TikaEntityProcessor" tikaConfig="tika

RE: DataImportHandler | Query | performance

2016-12-23 Thread Prateek Jain J
Thanks a lot Shawn. Regards, Prateek Jain -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: 23 December 2016 01:36 PM To: solr-user@lucene.apache.org Subject: Re: DataImportHandler | Query | performance On 12/23/2016 5:15 AM, Prateek Jain J wrote: > We n

Re: DataImportHandler | Query | performance

2016-12-23 Thread Shawn Heisey
On 12/23/2016 5:15 AM, Prateek Jain J wrote: > We need some advice/views on the way we push our documents in SOLR (4.8.1). > So, here are the requirements: > > 1. Document could be from 5 to 100 KB in size. > > 2. 10-50 users actively querying solr with different sort of data. > > 3.

Re: DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Pierre Caserta
Thanks Alexandre, I solved the problem using the xslt transform and the /update handler. I attach the xsl that I put in conf/xslt/ (for documentation) Then the command: curl "http://192.168.99.100:8999/solr/solrexchange/update?commit=true&tr=updateXmlSolrExchange.xsl" -H "Content-Type: text/x

Re: DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Alexandre Rafalovitch
Seems you might be right, according to the source: https://github.com/apache/lucene-solr/blob/master/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java#L662 Sometimes, the magic (and schemaless is rather magical) fails when combined with older assumptions (an

Re: DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Pierre Caserta
I am rebuilding a new docker image with each change on the config file so solr starts fresh every time. add-unknown-fields-to-the-schema solr-data-config.xml still having document like such: "response":{"numFound":8,"start":0,"docs":[ { "id":"38

Re: DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Alexandre Rafalovitch
Ok, to reduce the magic, you can just stick the "update.chain" parameter inside the defaults of the dataimport handler directly. You can also pass it just as a URL parameter. That's what the 'defaults' section means. And, just to be paranoid, you did reload the core after each of those changes to test it?

Re: DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Pierre Caserta
It did not work, I tried many things and ended up trying this: solr-data-config.xml add-unknown-fields-to-the-schema Regards, Pierre > On 10 Aug 2016, at 18:08, Alexandre Rafalovitch wrote: > > Your initParams section does not apply to /dataimp

Re: DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Alexandre Rafalovitch
Your initParams section does not apply to /dataimport handler as defined. Try modifying it to say: path="/update/**,/dataimport" Hopefully, that's all it takes. Managed schema is enabled by default, but schemaless mode is the next layer on top. With managed schema, you can use the API to add yo

Re: DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Pierre Caserta
Hi Alex, thanks for your answer. Yes my solrconfig.xml contains the add-unknown-fields-to-the-schema. add-unknown-fields-to-the-schema I created my core using this command: curl http://192.168.99.100:8999/solr/admin/cores?action=CREATE&name=solrexchange&instanceDir=/opt/s

Re: DataImportHandler with a managed-schema only import id and version

2016-08-10 Thread Alexandre Rafalovitch
Do you have the actual fields defined? If not, then I am guessing that your 'post' test was against a different collection that had schemaless mode enabled and your DIH one is against one where schemaless mode is not enabled (look for 'add-unknown-fields-to-the-schema' in the solrconfig.xml to conf

Re: DataImportHandler - Automatic scheduling of delta imports in Solr in windows 7

2016-03-08 Thread B Weber
harshrossi gmail.com> writes: > > I am using *DeltaImportHandler* for indexing data in Solr. Currently I am > manually indexing the data into Solr by selecting commands full-import or > delta-import from the Solr Admin screen. > > I am using Windows 7 and would like to automate the process by

Re: DataImportHandler scheduling

2015-09-01 Thread Kevin Lee
While it may be useful to have a scheduler for simple cases, I think there are too many variables to make it useful for everyone's case. For example, I recently wrote a script that uses the data import handler api to get the status, kick off the import, etc. However, before allowing it to just
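
The kind of wrapper script described above can be sketched in Python as follows. The message is cut off before it says what the script checked, so the rule used here (only start an import when DIH reports itself idle) is an assumption, as are the function names and the example URL. The sketch relies on the DIH "status" command returning a "status" field that reads "idle" or "busy".

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def dih_call(base_url, command, fetch=None, **extra):
    """Send one command to a DIH handler URL and return the parsed JSON reply.

    `fetch` is an optional hook (url -> dict) so the logic can be exercised
    without a running Solr; by default the URL is requested over HTTP.
    """
    url = f"{base_url}?{urlencode({'command': command, 'wt': 'json', **extra})}"
    if fetch is not None:
        return fetch(url)
    with urlopen(url) as resp:
        return json.load(resp)


def start_import_if_idle(base_url, command="full-import", fetch=None):
    """Kick off an import only when no other import is running.

    Returns True if the import was started, False if DIH was not idle.
    """
    status = dih_call(base_url, "status", fetch=fetch)
    if status.get("status") != "idle":
        return False  # an import is already in progress; try again later
    dih_call(base_url, command, fetch=fetch)
    return True
```

A cron entry or Windows scheduled task could run a small script that calls start_import_if_idle("http://localhost:8983/solr/mycore/dataimport"), where the host and core name are placeholders.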

Re: DataImportHandler scheduling

2015-09-01 Thread William Bell
We should add a simple scheduler in the UI. It is very useful. To schedule various actions: - Full index - Delta Index - Replicate On Tue, Sep 1, 2015 at 12:41 PM, Shawn Heisey wrote: > On 9/1/2015 11:45 AM, Troy Edwards wrote: > > My initial thought was to use scheduling built with DIH: > >

Re: DataImportHandler scheduling

2015-09-01 Thread Shawn Heisey
On 9/1/2015 11:45 AM, Troy Edwards wrote: > My initial thought was to use scheduling built with DIH: > http://wiki.apache.org/solr/DataImportHandler#Scheduling > > But I think just a cron job should do the same for me. The dataimport scheduler does not exist in any Solr version. This is a propose

Re: DataImportHandler scheduling

2015-09-01 Thread Troy Edwards
te to see > who does the job - questions on how to do this should be directed to a > zookeeper users' mailing list. > > -Original Message- > From: Shawn Heisey [mailto:apa...@elyograg.org] > Sent: Monday, August 31, 2015 7:50 PM > To: solr-user@lucene.apache.org >

RE: DataImportHandler scheduling

2015-09-01 Thread Davis, Daniel (NIH/NLM) [C]
on how to do this should be directed to a zookeeper users' mailing list. -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Monday, August 31, 2015 7:50 PM To: solr-user@lucene.apache.org Subject: Re: DataImportHandler scheduling On 8/31/2015 11:26 AM, Troy Edwards

Re: DataImportHandler scheduling

2015-08-31 Thread Shawn Heisey
On 8/31/2015 11:26 AM, Troy Edwards wrote: > I am having a hard time finding documentation on DataImportHandler > scheduling in SolrCloud. Can someone please post a link to that? I have a > requirement that the DIH should be initiated at a specific time Monday > through Friday. Every modern operat

RE: DataImportHandler scheduling

2015-08-31 Thread Davis, Daniel (NIH/NLM) [C]
't "run once", but instead avoids overlap, so there's good reason to write something specific to that case. -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] Sent: Monday, August 31, 2015 1:35 PM To: solr-user@lucene.apache.org Subject: Re: DataImportH

Re: DataImportHandler scheduling

2015-08-31 Thread Ahmet Arslan
Hi Troy, I think folks use cron jobs (with curl utility) provided by the Operating System. Ahmet On Monday, August 31, 2015 8:26 PM, Troy Edwards wrote: I am having a hard time finding documentation on DataImportHandler scheduling in SolrCloud. Can someone please post a link to that? I have a

Re: dataImportHandler

2015-08-13 Thread Erick Erickson
I tend to approach these differently. DIH is a great tool for its purpose, but I find SolrJ/Tika to be more understandable. Which may only reflect that I've never spent enough time with DIH, but there it is So, why not use a simple SolrJ program with either Tika or your favorite HTML parser t

Re: DataImportHandler while Replication

2014-06-04 Thread rulinma
good. -- View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-while-Replication-tp4138763p4139774.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: DataImportHandler while Replication

2014-05-30 Thread Robin Woods
Erick, Thanks a bunch. Good to know the internals. Best, Robin -- View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-while-Replication-tp4138763p4138984.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: DataImportHandler while Replication

2014-05-30 Thread Erick Erickson
You don't need to do anything. Essentially, when replication starts the current state of the master's index is "frozen" in the sense that the _closed_ segments that make up the index at the time replication starts are the only ones that are replicated to the slave. All indexing happens into an _op

Re: DataImportHandler while Replication

2014-05-30 Thread Robin Woods
Hello Shalin, Appreciate your reply. I've not implemented DIH in production and now we are considering. hence the question, if we configure DIH on the master from which the Slave is replicating the index, we will need to control this externally? or is there any setting in DIH that will allow us to

Re: DataImportHandler while Replication

2014-05-30 Thread Shalin Shekhar Mangar
They are completely separate components in Solr. Are you seeing performance problems in replication due to indexing or vice versa? On Fri, May 30, 2014 at 10:10 AM, Robin Woods wrote: > Hi, > > What would happen to DataImportHandler that is setup on the master when the > slave is in the process

Re: DataImportHandler - Automatic scheduling of delta imports in Solr in windows 7

2014-04-11 Thread harshrossi
Yes that is all fine with me. Only thing that worries me is what needs to be coded in the batch file. I will just try a sample batch file and get back with queries if any. Thank you -- View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-Automatic-scheduling-of-d

Re: DataImportHandler - Automatic scheduling of delta imports in Solr in windows 7

2014-04-10 Thread William Bell
You can use PowerShell in windows to kick off a URL at a scheduled time. On Thu, Apr 10, 2014 at 11:02 PM, harshrossi wrote: > I am using *DeltaImportHandler* for indexing data in Solr. Currently I am > manually indexing the data into Solr by selecting commands full-import or > delta-import fr

Re: DataImportHandler - Automatic scheduling of delta imports in Solr in windows 7

2014-04-10 Thread Alexandre Rafalovitch
DataImportHandler is just a URL call. You can see the specific URL you want to call by opening debugger window in Chrome/Firefox and looking at the network tab. Then, you have a general problem of how to call a URL from Windows Scheduler. Google brings a lot of results for that, so you should be a
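
Since an import is just a URL call, the URL can also be assembled by hand instead of being copied from the browser's network tab. The Python sketch below builds such a URL; the helper name, host, core name, and "/dataimport" handler path are assumptions, while "command", "entity", and "clean" are request parameters that DIH accepts.

```python
from urllib.parse import urlencode


def dih_url(solr_base, core, command, handler="/dataimport", **params):
    """Build the URL that triggers a DataImportHandler command on one core."""
    query = urlencode({"command": command, **params})
    return f"{solr_base.rstrip('/')}/{core}{handler}?{query}"


# Delta import on a hypothetical core named "mycore".
print(dih_url("http://localhost:8983/solr", "mycore", "delta-import"))

# Full import of a single entity, without wiping the index first.
print(dih_url("http://localhost:8983/solr", "mycore", "full-import",
              entity="item", clean="false"))
```

The resulting string is what a scheduled task would request, for example with curl or PowerShell.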

Re: DataImportHandler on multi core - limiting concurrent runs on more than N cores

2013-11-20 Thread Shalin Shekhar Mangar
No, there is no synchronisation between data import handlers on different cores. You will have to implement this sort of queuing logic on your application's side. On Wed, Nov 20, 2013 at 2:23 PM, Patrice Monroe Pustavrh wrote: > Hi, > I am currently run Solr with 10 cores. It works fine with me,
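
One way to do that queuing on the application side is a bounded worker pool, sketched here in Python. The names run_imports and run_import are hypothetical; run_import is assumed to trigger the import for one core and to block until that import has finished (for example by polling the DIH status command), so that the pool size really caps the number of concurrent imports.

```python
from concurrent.futures import ThreadPoolExecutor


def run_imports(cores, run_import, max_concurrent=3):
    """Run run_import(core) for every core, at most max_concurrent at a time.

    Returns a dict mapping each core name to whatever run_import returned.
    """
    cores = list(cores)
    # The pool never runs more than max_concurrent calls at once; the
    # remaining cores wait in the executor's internal queue.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return dict(zip(cores, pool.map(run_import, cores)))
```

With ten cores and max_concurrent=3, seven imports wait until a slot frees up.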

Re: DataImportHandler oddity

2013-09-12 Thread Shalin Shekhar Mangar
Thanks. It'd be great if you can update this thread if you ever find a workaround. We will document it on the DataImportHandlerFaq wiki page. http://wiki.apache.org/solr/DataImportHandlerFaq On Thu, Sep 12, 2013 at 4:56 PM, Raymond Wiker wrote: > That sounds reasonable. I've done some more diggi

Re: DataImportHandler oddity

2013-09-12 Thread Raymond Wiker
That sounds reasonable. I've done some more digging, and found that the database instance in this case is an _OLD_ version of Oracle: 9.2.0.8.0. I also tried using the OCI driver (version 12), which refuses to even talk to this database. I have three other databases running on more recent versions

Re: DataImportHandler oddity

2013-09-12 Thread Shalin Shekhar Mangar
This is probably a bug with Oracle thin JDBC driver. Google found a similar issue: http://stackoverflow.com/questions/4168494/resultset-getstring-on-varchar2-column-returns-empty-string I don't think this is specific to DataImportHandler. On Thu, Sep 12, 2013 at 12:43 PM, Raymond Wiker wrote: >

Re: DataImportHandler oddity

2013-09-12 Thread Raymond Wiker
Followup: I just tried modifying the select with select CAST('APPLICATION' as varchar2(100)) as sourceid, ... and that caused the sourceid field to be empty. CASTing to char(100) gave me the expected value ('APPLICATION', right-padded to 100 characters). Meanwhile, google gave me this: http://bu

Re: DataImportHandler, BlobTransformer, FieldReaderDataSource and TikaEntityExtractor

2013-08-02 Thread Raymond Wiker
It appears that this is simpler than I thought: in SOLR 4.4, at least, there is a dataSource class named "FieldStreamDataSource" that I can use directly with the TikaEntityProcessor. Given a blob column named DOCIMAGE, I can use the following Tika entity: ...

Re: DataImportHandler, BlobTransformer, FieldReaderDataSource and TikaEntityExtractor

2013-07-30 Thread Shalin Shekhar Mangar
There's no BlobTransformer in DataImportHandler. You'll have to write one. Also, you'd probably need to write a FieldInputStreamDataSource instead of FieldReaderDataSource. On Tue, Jul 30, 2013 at 12:30 PM, Raymond Wiker wrote: > I have a case where I want to documents and metadata content from

RE: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor

2013-06-20 Thread Dyer, James
Instead of specifying CachedSqlEntityProcessor, you can specify SqlEntityProcessor with "cacheImpl='SortedMapBackedCache'". If you parameterize this, to have "SortedMapBackedCache" for full updates but blank for deltas I think it will cache only on the full import. Another option is to parame

Re: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor

2013-06-20 Thread Noble Paul നോബിള്‍ नोब्ळ्
ty. > > It would be great if you could just sketch the setup with the entities I > provided. Because currently I have no idea on how to do it. > > Regards > > Constantin > > > -Ursprüngliche Nachricht- > Von: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.p...@gmail.com

Re: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor

2013-06-20 Thread Noble Paul നോബിള്‍ नोब्ळ्
it is possible to create two separate root entities . one for full-import and another for delta. for the delta-import you can skip Cache that way On Thu, Jun 20, 2013 at 1:50 PM, Constantin Wolber < constantin.wol...@medicalcolumbus.de> wrote: > Hi, > > i searched for a solution for quite some

Re: DataImportHandler - Indexing xml content

2013-04-26 Thread Alexandre Rafalovitch
Have you looked at: http://wiki.apache.org/solr/DataImportHandler#FieldReaderDataSource ? Regards, Alex. On Fri, Apr 26, 2013 at 12:29 PM, Peri Subrahmanya wrote: > I have a column in my database that is of type long text and holds xml > content. I was wondering when I define the entity recor

Re: dataimporthandler does not distribute documents on solr cloud

2013-04-23 Thread Joel Bernstein
Actually, it is Solr 4.1+ where the implicit router will be used if numShards is not specified. On Tue, Apr 23, 2013 at 2:52 PM, Joel Bernstein wrote: > What version of Solr are you using? In Solr 4.2+ if you don't specify > numShards when creating the collection, the implicit document router w

Re: dataimporthandler does not distribute documents on solr cloud

2013-04-23 Thread Joel Bernstein
What version of Solr are you using? In Solr 4.2+ if you don't specify numShards when creating the collection, the implicit document router will be used. DIH running under the implicit document router most likely would not distribute documents. If this is the case you'll need to recreate the colle

Re: dataimporthandler: nested query is called multiple times

2013-03-21 Thread patrick
alex, thank you for the link. i enabled the trace for 'org.apache.solr.handler.dataimport' and it seems as if the database is only called once: 2013-03-21T09:40:43 1363855243889 50 org.apache.solr.handler.dataimport.JdbcDataSource FINE org.apache.solr.handler.dataimport.JdbcDataSou

Re: dataimporthandler: nested query is called multiple times

2013-03-20 Thread Alexandre Rafalovitch
There was something like this on Stack Overflow: http://stackoverflow.com/questions/15164166/solr-filelistentityprocessor-is-executing-sub-entities-multiple-times Upgrading Solr helped partially, but the conclusion was not fully satisfactory. Regards, Alex. Personal blog: http://blog.outerth

Re: DataImportHandler in Solr 1.4 bug?

2012-11-15 Thread Andy Lester
On Nov 15, 2012, at 8:02 AM, Sébastien Lorber wrote: > > > I don't know where you're getting the ${JOB_EXEC.JOB_INSTANCE_ID}. I believe that if you want to get parameters passed in, it looks like this: WHERE batchid = ${dataimporter.request.batchid} when I kick

Re: DataImportHandler WARNING: Unable to resolve variable

2012-08-10 Thread Jon Drukman
Swati > > -Original Message- > From: Swati Swoboda [mailto:sswob...@igloosoftware.com] > Sent: Thursday, August 09, 2012 11:09 PM > To: solr-user@lucene.apache.org > Subject: RE: DataImportHandler WARNING: Unable to resolve variable > > I am getting a similar issue

RE: DataImportHandler WARNING: Unable to resolve variable

2012-08-09 Thread Swati Swoboda
ose errors - null values are just not accepted, it seems. Swati -Original Message- From: Swati Swoboda [mailto:sswob...@igloosoftware.com] Sent: Thursday, August 09, 2012 11:09 PM To: solr-user@lucene.apache.org Subject: RE: DataImportHandler WARNING: Unable to resolve variable I am

RE: DataImportHandler WARNING: Unable to resolve variable

2012-08-09 Thread Swati Swoboda
I am getting a similar issue while using a Template Transformer. My fields *always* have a value as well - it is getting indexed correctly. Furthermore, the number of warnings I get seems arbitrary. I imported one document (debug mode) and I got roughly ~400 of those warning messages for th

Re: DataImportHandler only importing 3 fields on all entities

2012-05-03 Thread Parmeley, Michael
I discovered the schema.xml file about 2 minutes before I got your response. It was very enlightening:-) thanks for the tips about dynamicFields! On May 3, 2012, at 1:02 PM, Jack Krupansky wrote: > Those three field names are already in the Solr example schema. Either > manually add your desi

Re: DataImportHandler only importing 3 fields on all entities

2012-05-03 Thread Jack Krupansky
Those three field names are already in the Solr example schema. Either manually add your desired fields to the schema, change their names (column vs. sourceColName) to fields that do exist in your Solr schema, give them names that end with one of the dynamicField suffixes (such as "*_s"), or en

RE: dataImportHandler: delta query fetching data, not just ids?

2012-03-29 Thread Dyer, James
r/DataImportHandler#Special_Commands . James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: janne mattila [mailto:jannepostilis...@gmail.com] Sent: Thursday, March 29, 2012 12:45 AM To: solr-user@lucene.apache.org Subject: Re: dataImportHandler: delta query fetc

Re: dataImportHandler: delta query fetching data, not just ids?

2012-03-28 Thread janne mattila
> I'm not sure why deltas were implemented this way.  Possibly it was designed > to behave like some of our object-to-relational libraries?  In any case, > there are 2 ways to do deltas and you just have to take your pick based on > what will work best for your situation.  I wouldn't consider th

Re: DataImportHandler: backups prior to full-import

2012-03-28 Thread Bill Bell
You could use the Solr Command Utility SCU that runs from Windows and can be scheduled to run. https://github.com/justengland/Solr-Command-Utility This is a windows system that will index using a core, and swap it if it succeeds. It works it's Solr. Let me know if you have any questions. On

Re: DataImportHandler: backups prior to full-import

2012-03-28 Thread Shawn Heisey
On 3/28/2012 12:46 PM, Artem Shnayder wrote: Does anyone know of any work done to automatically run a backup prior to a DataImportHandler full-import? I've asked this question on #solr and was pointed to https://wiki.apache.org/solr/SolrReplication?highlight=%28backup%29#HTTP_API which is helpfu

Re: DataImportHandler: backups prior to full-import

2012-03-28 Thread Artem Shnayder
Dyer > E-Commerce Systems > Ingram Content Group > (615) 213-4311 > > > -Original Message- > From: Artem Shnayder [mailto:artem@gmail.com] > Sent: Wednesday, March 28, 2012 1:59 PM > To: solr-user@lucene.apache.org > Subject: Re: DataImportHandler: backups prio

RE: DataImportHandler: backups prior to full-import

2012-03-28 Thread Dyer, James
esday, March 28, 2012 1:59 PM To: solr-user@lucene.apache.org Subject: Re: DataImportHandler: backups prior to full-import My typical workflow is a once-a-day full-import with hourly delta-imports. Ideally, the backup would occur only during the full-import commits. Is there a way to differentiate i

Re: DataImportHandler: backups prior to full-import

2012-03-28 Thread Artem Shnayder
My typical workflow is a once-a-day full-import with hourly delta-imports. Ideally, the backup would occur only during the full-import commits. Is there a way to differentiate in the replication handler? On Wed, Mar 28, 2012 at 11:54 AM, Dyer, James wrote: > I don't know of any effort out there t

RE: DataImportHandler: backups prior to full-import

2012-03-28 Thread Dyer, James
I don't know of any effort out there to have DIH trigger a backup automatically. However, you can set the replication handler to automatically backup after each commit. This might solve your problem if you aren't committing frequently. James Dyer E-Commerce Systems Ingram Content Group (615)

RE: dataImportHandler: delta query fetching data, not just ids?

2012-03-28 Thread Dyer, James
Janne, You're correct on how the delta import works. You specify 3 queries: - deletedPkQuery = query should return all "id"s (only) of items that were deleted since the last run. - deltaQuery = query should return all "id"s (only) of items that were added/updated since the last run. - deltaImp
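
To make the two-step shape concrete, here is a small Python simulation using an in-memory SQLite table. It is not DIH code: the table, column names, and timestamps are invented, and since the description of the third query is cut off above, the sketch assumes deltaImportQuery is run once per id returned by deltaQuery to load the full row.

```python
import sqlite3


def delta_import(conn, last_index_time):
    """Mimic the two-step delta: collect changed ids, then load each row."""
    # Step 1 (deltaQuery): ids only, for rows changed since the last run.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM item WHERE last_modified > ? ORDER BY id",
        (last_index_time,))]
    docs = []
    for item_id in ids:
        # Step 2 (deltaImportQuery): one extra query per id for the full row.
        row = conn.execute(
            "SELECT id, name FROM item WHERE id = ?", (item_id,)).fetchone()
        docs.append({"id": row[0], "name": row[1]})
    return docs


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, last_modified TEXT)")
conn.executemany("INSERT INTO item VALUES (?, ?, ?)", [
    (1, "old", "2012-03-01 00:00:00"),
    (2, "changed", "2012-03-28 09:00:00"),
    (3, "new", "2012-03-28 10:00:00"),
])
print(delta_import(conn, "2012-03-27 00:00:00"))
```

The loop in step 2 is the per-id round trip that the question in this thread is about: a delta touching N rows costs N+1 queries in this shape.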

Re: dataImportHandler: delta query fetching data, not just ids?

2012-03-27 Thread janne mattila
How did it work before SOLR-811 update? I don't understand. Did it fetch delta data with two queries (1. gets ids, 2. gets data per each id) or did it fetch all delta data with a single query? On Tue, Mar 27, 2012 at 5:45 PM, Ahmet Arslan wrote: >> 2. If not - what's the reason delta import is im

Re: dataImportHandler: delta query fetching data, not just ids?

2012-03-27 Thread Ahmet Arslan
> 2. If not - what's the reason delta import is implemented > like it is? > Why split it in two queries? I would think having a single > delta query > that fetches the data would be kind of an "obvious" design > unless > there's something that calls for 2 separate queries...? I think this is it? h

Re: DataImportHandler running out of memory

2012-02-23 Thread Shawn Heisey
On 2/20/2012 6:49 AM, v_shan wrote: DIH still running out of memory for me, with Full Import on a database of size 1.5 GB. Solr version: 3_5_0 Note that I have already added batchSize="-1" but getting same error. A few questions: - How much memory have you given to the JVM running this Solr

Re: DataImportHandler running out of memory

2012-02-20 Thread v_shan
DIH still running out of memory for me, with Full Import on a database of size 1.5 GB. Solr version: 3_5_0 Note that I have already added batchSize="-1" but getting same error. Sharing my DIH config below.

Re: DataImportHandler fails silently

2012-01-28 Thread Erik Hatcher
On Jan 28, 2012, at 09:02 , mathieu lacage wrote: > This deserves an entry in > http://wiki.apache.org/solr/DataImportHandlerFaq which I would have > updated but it is immutable. *hint to those who have > edit powers there* You can make yourself a wiki account and then edit the page. An account i

Re: DataImportHandler fails silently

2012-01-28 Thread mathieu lacage
On Sat, Jan 28, 2012 at 10:35 AM, mathieu lacage wrote: > > (I have tried two different sqlite jdbc drivers so, I doubt it could > be a problem there, but, who knows). > I eventually screamed really loud when I read the source code of the sqlite jdbc drivers: they interpret the jdbcDataSource at

Re: DataImportHandler fails silently

2012-01-28 Thread mathieu lacage
On 1/28/12, mathieu lacage wrote: > > Le 28 janv. 2012 à 05:17, Lance Norskog a écrit : > >> Do all of the documents have unique id fields? > > yes. I have debugged this further with http://localhost:8080/solr/admin/dataimport.jsp?handler=/dataimport The returned xml file when I ask for verbose

Re: DataImportHandler fails silently

2012-01-27 Thread mathieu lacage
Le 28 janv. 2012 à 05:17, Lance Norskog a écrit : > Do all of the documents have unique id fields? yes. > > On Fri, Jan 27, 2012 at 10:44 AM, mathieu lacage > wrote: >> On Fri, Jan 27, 2012 at 7:39 PM, mathieu lacage >> wrote: >> >>> >>> It seems to work but the following command reports

Re: DataImportHandler fails silently

2012-01-27 Thread Lance Norskog
Do all of the documents have unique id fields? On Fri, Jan 27, 2012 at 10:44 AM, mathieu lacage wrote: > On Fri, Jan 27, 2012 at 7:39 PM, mathieu lacage > wrote: > >> >> It seems to work but the following command reports that only 499 documents >> were indexed (yes, there are many more documents

Re: DataImportHandler fails silently

2012-01-27 Thread mathieu lacage
On Fri, Jan 27, 2012 at 7:39 PM, mathieu lacage wrote: > > It seems to work but the following command reports that only 499 documents > were indexed (yes, there are many more documents in my database): > And before anyone asks: 1 499 0 2012-01-27 19:37:16 Indexing completed. Added/Updated: 499 d

RE: DataImportHandler in Solr 4.0

2012-01-18 Thread Dyer, James
al Message- From: Rob [mailto:rlusa...@gmail.com] Sent: Tuesday, January 17, 2012 6:38 PM To: solr-user@lucene.apache.org Subject: Re: DataImportHandler in Solr 4.0 Not a java pro, and the documentation hasn't been updated to include these instructions (at least that I could find). What do I n

Re: DataImportHandler in Solr 4.0

2012-01-17 Thread Rob
Not a java pro, and the documentation hasn't been updated to include these instructions (at least that I could find). What do I need to do to perform the steps that Alexandre is talking about? -- View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-in-Solr-4-0-tp2563

Re: DataImportHandler w/ multivalued fields

2011-12-01 Thread Briggs Thompson
Hey Rahul, Thanks for the response. I actually just figured it out, thankfully :). To answer your question, the raw_tag is indexed and not stored (tokenized), and then there is a copyField for raw_tag to "raw_tag_string" which would be used for facets. That *should have* been displayed in the results.

Re: DataImportHandler w/ multivalued fields

2011-12-01 Thread Rahul Warawdekar
Hi Briggs, By saying "multivalued fields are not getting indexed properly", do you mean to say that you are not able to search on those fields? Have you tried actually searching your Solr index for those multivalued terms to check whether it returns the search results? One possibility could be t

Re: DataImportHandler w/ multivalued fields

2011-12-01 Thread Briggs Thompson
In addition, I tried a query like below and changed the column definition to and still no luck. It is indexing the full content now but not multivalued. It seems like the "splitBy" isn't working properly. select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.* from

Re: DataImportHandler Streaming XML Parse

2011-11-21 Thread Chris Hostetter
: We're using DIH to import flat xml files. We're getting Heap memory : exceptions due to the file size. Is there any way to force DIH to do a : streaming parse rather than a DOM parse? I really don't want to chunk my : files up or increase the heap size. The XPathEntityProcessor is using a

Re: DataImportHandler using new connection on each query

2011-09-30 Thread Chris Hostetter
: > Noble? Shalin? what's the point of throwing away a connection that's been : > in use for more then 10 seconds? : Hoss, as others have noted, DIH throws away connections which have been idle : for more than the timeout value (10 seconds). The jdbc standard way of : checking for a valid connec

Re: DataImportHandler frequency

2011-09-30 Thread Lan
It's best to run the data import once per minute. Solr updates work best when updates are batched and commits are infrequent. Doing a post per document as a transaction would require a solr commit, which could cause the server to hang under update load. Of course you could not do the commit but your

Re: DataImportHandler using new connection on each query

2011-09-23 Thread Shalin Shekhar Mangar
On Sat, Sep 3, 2011 at 1:29 AM, Chris Hostetter wrote: > > : I am not sure if current version has this, but DIH used to reload > : connections after some idle time > : > : if (currTime - connLastUsed > CONN_TIME_OUT) { > : synchronized (this) { > :

Re: DataImportHandler using new connection on each query

2011-09-02 Thread eks dev
watch out, "running 10 hours" != "idling 10 seconds" and trying again. Those are different cases. It is not dropping *used* connections (good to know it works that good, thanks for reporting!), just not reusing connections more than 10 seconds idle On Fri, Sep 2, 2011 at 10:26 PM, Gora Mohanty

Re: DataImportHandler using new connection on each query

2011-09-02 Thread eks dev
take care, "running 10 hours" != "idling 10 seconds" and trying again. Those are different cases. It is not dropping *used* connections (good to know it works that good, thanks for reporting!), just not reusing connections more than 10 seconds idle On Fri, Sep 2, 2011 at 10:26 PM, Gora Mohanty

Re: DataImportHandler using new connection on each query

2011-09-02 Thread Gora Mohanty
On Sat, Sep 3, 2011 at 1:38 AM, Shawn Heisey wrote: [...] > I use DIH with MySQL.  When things are going well, a full rebuild will leave > connections open and active for over two hours.  This is the case with > 1.4.0, 1.4.1, 3.1.0, and 3.2.0.  Due to some kind of problem on the database > server,

Re: DataImportHandler using new connection on each query

2011-09-02 Thread Shawn Heisey
On 9/2/2011 1:59 PM, Chris Hostetter wrote: : I am not sure if current version has this, but DIH used to reload : connections after some idle time : : if (currTime - connLastUsed> CONN_TIME_OUT) { : synchronized (this) { : Connection tmpConn =

Re: DataImportHandler using new connection on each query

2011-09-02 Thread Chris Hostetter
: I am not sure if current version has this, but DIH used to reload : connections after some idle time : : if (currTime - connLastUsed > CONN_TIME_OUT) { : synchronized (this) { : Connection tmpConn = factory.call(); :

Re: DataImportHandler using new connection on each query

2011-09-02 Thread eks dev
I am not sure if current version has this, but DIH used to reload connections after some idle time if (currTime - connLastUsed > CONN_TIME_OUT) { synchronized (this) { Connection tmpConn = factory.call(); clos

Re: DataImportHandler using new connection on each query

2011-09-01 Thread Chris Hostetter
: However, I tested this against a slower SQL Server and I saw : dramatically worse results. Instead of re-using their database, each of : the sub-entities is recreating a connection each time the query runs. are you seeing any specific errors logged before these new connections are created?
