On 2021-01-20 6:26 PM, Joshua Wilder wrote:
Please reconsider the removal of the DIH from future versions. The repo
it's been moved to is a ghost town with zero engagement from Rohit (or
anyone). Not sure how 'moving' it caused it to now only support MariaDB but
that appears to be the case. The c
On 12/17/2020 4:05 PM, Alexandre Rafalovitch wrote:
Try with the explicit URP chain too. It may work as well.
Actually in this case we're just making sure uniqueKey is in fact unique
in all documents, so default is what we want.
For this particular dataset I may at some future point look int
Try with the explicit URP chain too. It may work as well.
Regards,
Alex.
On Thu, 17 Dec 2020 at 16:51, Dmitri Maziuk wrote:
>
> On 12/12/2020 4:36 PM, Shawn Heisey wrote:
> > On 12/12/2020 2:30 PM, Dmitri Maziuk wrote:
> >> Right, ```Every update request received by Solr is run through a chai
On 12/12/2020 4:36 PM, Shawn Heisey wrote:
On 12/12/2020 2:30 PM, Dmitri Maziuk wrote:
Right, ```Every update request received by Solr is run through a chain
of plugins known as Update Request Processors, or URPs.```
The part I'm missing is whether DIH's 'name="/dataimport"' counts as an "Upda
On 12/12/2020 2:30 PM, Dmitri Maziuk wrote:
Right, ```Every update request received by Solr is run through a chain
of plugins known as Update Request Processors, or URPs.```
The part I'm missing is whether DIH's 'name="/dataimport"' counts as an "Update Request", my reading is it
doesn't and U
On 12/12/2020 2:50 PM, Shawn Heisey wrote:
The only way I know of to use an update processor chain with DIH is to
set 'default="true"' when defining the chain.
I did manage to find an example with the default attribute, in javadocs:
https://lucene.apache.org/solr/5_0_0/solr-core/org/apache/so
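For reference, a minimal sketch of such a default chain in solrconfig.xml, assuming the
generated UUID should land in a field called "id" (the chain name and field name here are
only illustrative):

<!-- A chain marked default="true" runs for every update request,
     which is how it also applies to the documents DIH sends. -->
<updateRequestProcessorChain name="uuid-default" default="true">
  <!-- Fills the "id" field with a generated UUID when the document lacks one -->
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>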
On 12/12/2020 12:54 PM, Dmitri Maziuk wrote:
is there an easy way to use the stock UUID generator with DIH? We have a
hand-written one-liner class we use as DIH entity transformer but I
wonder if there's a way to use the built-in UUID generator class instead.
From the TFM it looks like there
Why not? You should be able to put a URP chain after DIH, the usual way.
Is there something about UUID that is special?
Regards,
Alex
On Sat., Dec. 12, 2020, 2:55 p.m. Dmitri Maziuk,
wrote:
> Hi everyone,
>
> is there an easy way to use the stock UUID generator with DIH? We have a
> hand-w
DIH should run fine from any node. It sends update requests like any other client,
and those are routed to the leader, wherever it is. It could be problematic if
node 2
gets overloaded by doing DIH work, Overseer work, and perhaps shard leader
work all at once, and an overloaded node gets into all kinds of p
Thank you for your quick reply.
Can I confirm that the indexing isn't performed on the node where the DIH is
executed, but rather on the leader node?
As far as I can see in the log, there are errors: failed connection attempts
from Node2 while in the Replica state during running
DIH is deprecated in current Solr versions. The general recommendation is to do
processing outside the Solr server and use the update handler (the normal one,
not Cell) to add documents to the index. So you should avoid using it, as it is
not future proof.
If you need more time to migrate to a
I think this is just an issue in the verbose/debug output. tcpdump
does not show the same issue.
On Wed, May 13, 2020 at 7:39 PM matthew sporleder wrote:
>
> I am attempting to use nested entities to populate documents from
> different tables and verbose/debug output is showing repeated queries
Hmm, I'll have a look, but the SELECT is a bit more involved, so the IDs from
the other DB will be OR'ed into the WHERE clause, i.e. added to those
selected from the other part of the WHERE clause, so it's not a pure join. I'll
think some more
--
Jan Høydahl, search solution architect
Cominvent A
Hello, Jan.
Have you considered join="zipper" ?
On Thu, Oct 31, 2019 at 12:52 AM Jan Høydahl wrote:
> I need a SELECT which filters IDS based on an ‘id’ list coming from
> another database, i.e. SELECT * FROM maindb.maintable WHERE id IN (SELECT
> myid FROM otherdb.other_table).
>
> The docs ar
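For anyone reading along, a rough sketch of how a zipper join is usually wired up in a
DIH data-config; the table, column, and attribute details below are from memory and only
illustrative (both result sets must be ordered by the join key), so check the DIH entity
documentation for the exact syntax:

<!-- Zipper join streams two ordered result sets in parallel and merges rows
     on the join key, instead of issuing a sub-query per parent row. -->
<entity name="main" query="SELECT id, title FROM maindb.maintable ORDER BY id">
  <entity name="other" join="zipper"
          query="SELECT myid FROM otherdb.other_table ORDER BY myid"
          where="myid=main.id">
    <field column="myid" name="other_id"/>
  </entity>
</entity>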
Hi,
thanks for all the feedback.
The context parameter in the ScriptTransformer is new to me - thanks for
this insight. I could not find it in any docs. So just for people that also
did not know it:
you can have the ScriptTransformer with 2 parameters, e.g.
function mytransformer(row,context){
...
Hello, Jörn.
Have you tried to find a parent doc in the context which is passed as a
second argument into ScriptTransformer?
On Wed, Sep 18, 2019 at 9:56 PM Jörn Franke wrote:
>
> Hi,
>
> I load a set of documents. Based on these documents some logic needs to be
> applied to split them into chapt
I fully agree. However, I am just curious to see the limits.
> On 18 Sep 2019 at 23:33, Erick Erickson wrote:
>
> When it starts getting complex, I usually move to SolrJ. You say
> you're loading documents, so I assume Tika is in the mix too.
>
> Here's a blog on the topic so you can see how to
When it starts getting complex, I usually move to SolrJ. You say
you're loading documents, so I assume Tika is in the mix too.
Here's a blog on the topic so you can see how to get started...
https://lucidworks.com/post/indexing-with-solrj/
Best,
Erick
On Wed, Sep 18, 2019 at 2:56 PM Jörn Franke
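For anyone following the SolrJ suggestion, here is a bare-bones sketch of the approach;
the URL, core name, and field names are placeholders, and the Tika/extraction step
covered in the blog post is left out:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SolrJIndexer {
  public static void main(String[] args) throws Exception {
    // Placeholder URL and core name; point this at your own Solr.
    try (SolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");
      doc.addField("title_txt", "Example title"); // illustrative field
      client.add(doc);   // send the document
      client.commit();   // make it searchable
    }
  }
}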
This looks like a problem with your select statement returning too many rows. I
doubt it has to do with the multiValued field; I don't think DIH is getting to
the point where it even tries to create a SolrInputDocument.
Depending on the driver, there are ways to limit the number of rows returned
Glad to help :)
On Fri, Oct 12, 2018 at 21:10, Martin Frank Hansen (MHQ) wrote:
> You sir just made my day!!!
>
> It worked!!! Thanks a million!
>
>
> Martin Frank Hansen,
>
> -Original Message-
> From: Kamuela Lau
> Sent: 12 October 2018 11:41
> To: solr-user@
Solr ships with a DIH Tika example that seems 90% identical to yours. Can you
get that to run? If it works, then you can focus on the 10% difference.
Perhaps it is explicit dataSource=null in the outer entity? Or maybe
format=text on the inner one.
Regards,
Alex
On Fri, Oct 12, 2018, 3:11 AM
Also, just wondering, have you have tried to specify dataSource="bin" for
read_file?
On Fri, Oct 12, 2018 at 6:38 PM Kamuela Lau wrote:
> Hi,
>
> I was unable to reproduce the error that you got with the information
> provided.
> Below are the data-config.xml and managed-schema fields I used; th
Hi,
I was unable to reproduce the error that you got with the information
provided.
Below are the data-config.xml and managed-schema fields I used; the
data-config is mostly the same
(I think that BinFileDataSource doesn't actually require a dataSource, so I
think it's safe to put dataSource="null
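For context, a rough sketch of the kind of data-config being discussed here, modeled on
the Tika example that ships with Solr; the paths, file-name regex, and field names are
placeholders:

<!-- Outer file-listing entity with no data source of its own (dataSource="null"),
     feeding each file into an inner Tika entity that reads via BinFileDataSource. -->
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/docs" fileName=".*\.pdf"
            rootEntity="false" dataSource="null">
      <entity name="tika" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text" dataSource="bin">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>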
If your ID field comes from one XML level and your record details from
another, they are processed as two separate records. Have a look at
atom example that ships with DIH example set. Specifically, at
commonField parameter, it may be useful for you:
https://lucene.apache.org/solr/guide/7_4/uploadi
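A small sketch of the commonField idea, loosely following the atom example mentioned
above; the URL, XPaths, and field names are only illustrative:

<!-- A value read at one XML level (the feed title) is carried over to the
     records produced at another level (each entry) via commonField="true". -->
<entity name="feed" processor="XPathEntityProcessor"
        url="http://example.com/feed.xml" forEach="/feed | /feed/entry">
  <field column="source"  xpath="/feed/title"     commonField="true"/>
  <field column="id"      xpath="/feed/entry/id"/>
  <field column="summary" xpath="/feed/entry/summary"/>
</entity>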
That sounds like a good option. So the Spark job will connect to MySQL and create Solr
documents, which are pushed into Solr using SolrJ, probably in batches.
On Thu, Apr 12, 2018 at 10:48 PM, Rahul Singh
wrote:
> If you want speed, Spark is the fastest easiest way. You can connect to
> relational tables direc
CSV -> Spark -> SolR
https://github.com/lucidworks/spark-solr/blob/master/docs/examples/csv.adoc
If speed is not an issue there are other methods. Spring Batch / Spring Data
might have all the tools you need to get speed without Spark.
--
Rahul Singh
rahul.si...@anant.us
Anant Corporation
On
If you want speed, Spark is the fastest, easiest way. You can connect to
relational tables directly and import, or export to CSV / JSON and import from a
distributed filesystem like S3 or HDFS.
Combining a dfs with Spark and a highly available SolR - you are maximizing all
threads.
--
Rahul Sing
Thanks Rahul. The data source is JdbcDataSource with a MySQL database. Data size
is around 100GB.
I am not very familiar with Spark, but are you suggesting that we should
create documents by merging distinct RDBMS tables using RDDs?
On Thu, Apr 12, 2018 at 10:06 PM, Rahul Singh
wrote:
> How much data
How much data and what is the database source? Spark is probably the fastest
way.
--
Rahul Singh
rahul.si...@anant.us
Anant Corporation
On Apr 12, 2018, 7:28 AM -0400, Sujay Bawaskar , wrote:
> Hi,
>
> We are using DIH with SortedMapBackedCache but as data size increases we
> need to provide mo
Stefan
There is at least one free Solr WP plugin. There are several Solr PHP
toolkits on github. Start with these unless your WP is wildly custo.. ..
cheers -- Rick
On 01/03/2018 11:50 AM, Erik Hatcher wrote:
Stefan -
If you pre-transform the XML, I’d personally recommend either transform
Stefan -
If you pre-transform the XML, I’d personally recommend either transforming it
into straight up Solr XML (docs/fields/values) or some other format or posting
directly to Solr. Avoid this DIH thing when things get complicated.
Erik
> On Jan 3, 2018, at 11:40 AM, Stefan Moises
Can,
I would like to learn many languages, but so far only two.
Shawn suggested you get help from a friend who knows English. As well, Google
translate is great for me, but I have not used it with Turkish.
Cheers -- Rick
On November 16, 2017 5:19:33 AM EST, Shawn Heisey wrote:
>On 11/15/2017 11
On 11/15/2017 11:59 PM, Can Ezgi Aydemir wrote:
I configured Solr and Cassandra. I am running a full data import but it does not
stop. Only the core loads during this process, so I stop it. I see that when DIH
stops, it does not write dataimport.properties.
In the data-config.xml file, I define the SimplePropertiesWriter type and filename. B
dde No:14, Beysukent 06800, Ankara, Türkiye
T : 0 312 233 50 00 .:. F : 0312 235 56 82
E : cayde...@islem.com.tr .:. W : http://www.islem.com.tr
-Original Message-
From: Sujay Bawaskar [mailto:sujaybawas...@gmail.com]
Sent: 16 November 2017 11:49
To: solr-user@lucene.apache.org
Subje
9} status=0 QTime=0
> 2017-11-16 07:21:36.076 INFO (qtp1638215613-14) [ x:cea2]
> o.a.s.c.S.Request [cea2] webapp=/solr path=/dataimport
> params={indent=on&wt=json&command=status&_=1510816148489} status=0 QTime=0
> 2017-11-16 07:21:38.064 INFO (qtp1638215613-14) [ x
75 INFO (qtp1638215613-43) [ x:cea2] o.a.s.c.S.Request
[cea2] webapp=/solr path=/dataimport
params={indent=on&wt=json&command=status&_=1510816148489} status=0 QTime=2
^C
Can Ezgi Aydemir
Oracle Database Administrator
İşlem Coğrafi Bilgi Sistemleri Müh. & Eğitim AŞ.
I experienced this problem recently with MySQL and after checking
solr.log found that there was a connection timeout from MySQL.
Please check solr.log for any Cassandra connection errors.
Thanks,
Sujay
On Thu, Nov 16, 2017 at 12:29 PM, Can Ezgi Aydemir
wrote:
> Hi all,
>
> I configured Solr
Giovanni,
Start with your search results page and work back from there. Decide what
fields you want to display in a results page, then plan for your Solr document
to contain all these fields. Now you will need a program to ingest the data
from whatever database, and create documents for Solr. Th
, 2017 2:12 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: DIH issue with streaming xml file
Thank you for your response. I will look into this link. Also, sorry I did
not specify the file type. I am working with XML files.
~~~
William Kevin Miller
ECS Fe
Hi,
Did not encounter this issue with Solr 6.x. But delta import with cache
executes the nested query for every element encountered in the parent query. Since
this select does not have a WHERE clause (because we are using the cache), it takes a
long time. So delta import with cache is very slow. My observation is
[mailto:arafa...@gmail.com]
Sent: Monday, June 12, 2017 1:26 PM
To: solr-user
Subject: Re: DIH issue with streaming xml file
Solr 6.5.1 DIH setup has - somewhat broken - RSS example (redone as ATOM
example in 6.6) that shows how to get stuff from https URL. You can see the
atom example here:
https
reate a custom entity processor?
>
>
>
>
> ~~~
> William Kevin Miller
>
> ECS Federal, Inc.
> USPS/MTSC
> (405) 573-2158
>
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Monday, June 12, 2017 12:57 PM
> To: solr-user
>
) 573-2158
-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Monday, June 12, 2017 12:57 PM
To: solr-user
Subject: Re: DIH issue with streaming xml file
How do you get a list of URLs for the files on the remote server? That's
probably the first issue. Once you have
How do you get a list of URLs for the files on the remote server? That's
probably the first issue. Once you have the URLs in an outside entity or
two, you can feed them one by one into the inner entity.
Regards,
Alex.
http://www.solr-start.com/ - Resources for Solr users, new and experien
Let me clarify -
DIH is running on Solr 6.5.0 and calls a different Solr instance running
on 4.5.0, which has 150M documents. If we try to fetch them using DIH onto the
new Solr cluster, wouldn't it result in deep paging on Solr 4.5.0 and
drastically slow down indexing on Solr 6.5.0?
On Thu, Apr 27,
On 4/27/2017 9:15 PM, Vijay Kokatnur wrote:
> Hey Shawn, Unfortunately, we can't upgrade the existing cluster. That
> was my first approach as well. Yes, SolrEntityProcessor is used so it
> results in deep paging after certain rows. I have observed that
> instead of importing for a larger period, i
:07 PM
*To:* solr-user@lucene.apache.org
*Subject:* Re: DIH Speed
On 4/27/2017 5:40 PM, Erick Erickson wrote:
> I'm unclear why DIH and deep paging are mixed. DIH is indexing and deep
paging is querying.
>
> If it's querying, consider cursorMark or the /export handler.
https://luc
On 4/27/2017 5:40 PM, Erick Erickson wrote:
> I'm unclear why DIH and deep paging are mixed. DIH is indexing and deep paging
> is querying.
>
> If it's querying, consider cursorMark or the /export handler.
> https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-
I'm unclear why DIH and deep paging are mixed. DIH is
indexing and deep paging is querying.
If it's querying, consider cursorMark or the /export handler.
https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
If it's DIH, please explain a bit
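For context, the Solr-to-Solr import under discussion uses DIH's SolrEntityProcessor,
which walks the source index in pages; a bare-bones sketch (URL, query, and rows value
are placeholders), and it is that start/rows style paging which raises the deep-paging
concern:

<entity name="oldSolr" processor="SolrEntityProcessor"
        url="http://old-solr-host:8983/solr/collection1"
        query="*:*" rows="1000" fl="*"/>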
> On Apr 25, 2017, at 10:28 AM, AJ Lemke wrote:
>
> Thanks for the thought Alex!
> The fields that have this happen most often are numeric and boolean fields.
> These fields have real data (id numbers, true/false, etc.)
>
> AJ
>
We had an identical problem a few months ago, and there was no
and boolean fields.
>> These fields have real data (id numbers, true/false, etc.)
>>
>> AJ
>>
>> -Original Message-
>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>> Sent: Tuesday, April 25, 2017 8:27 AM
>> To: solr-user
>
se, etc.)
>
> AJ
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Tuesday, April 25, 2017 8:27 AM
> To: solr-user
> Subject: Re: DIH Issues
>
> Maybe the content gets simplified away between the database and the Solr
>
: solr-user
Subject: Re: DIH Issues
Maybe the content gets simplified away between the database and the Solr
schema. For example if your field contains just spaces and you have
UpdateRequestProcessors to do trim and removal of empty fields?
Schemaless mode will remove empty fields, but will not
Maybe the content gets simplified away between the database and the
Solr schema. For example if your field contains just spaces and you
have UpdateRequestProcessors to do trim and removal of empty fields?
Schemaless mode will remove empty fields, but will not trim for example.
Regards,
Alex.
-
Thanks Alex. I will test it with 5.4 and 6.4 and let you know.
On Thu, Mar 16, 2017 at 7:40 PM, Alexandre Rafalovitch
wrote:
> You have nested entities and accumulate the content of the inner
> entities in the outer one with caching on an inner one. Your
> description sounds like the inner cache
You have nested entities and accumulate the content of the inner
entities in the outer one with caching on an inner one. Your
description sounds like the inner cache is not reset on the next
iteration of the outer loop.
This may be connected to
https://issues.apache.org/jira/browse/SOLR-7843 (Fixe
This behaviour is for delta import only. One document gets the field values of
all documents. These fields are child entities which map columns to
multivalued fields.
On Thu, Mar 16, 2017 at 6:35 PM, Alexandre Rafalovitch
wrote:
> Could you give a bit more details. Do y
Could you give a bit more details. Do you mean one document gets the
content of multiple documents? And only on delta?
Regards,
Alex
On 16 Mar 2017 8:53 AM, "Sujay Bawaskar"
wrote:
Hi,
We are using DIH with cache(SortedMapBackedCache) with solr 5.3.1. We have
around 2.8 million documents i
> adtype
> 2017-03-09 13:41:00.053 INFO (qtp2080166188-41928) [c:collectionXXX s:shard1
> r:core_node1 x:collectionXXX_shard1_replica2] o.a.s.c.S.Request
> [collectionXXX_shard1_replica2] webapp=/solr path=/schema params={wt=json}
> status=0 QTime=0
>
>
>
> AJ
>
params={wt=json}
status=0 QTime=0
AJ
-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Wednesday, March 8, 2017 9:33 AM
To: solr-user
Subject: Re: DIH Full Index Issue
Are you perhaps indexing at the same time from the source other than DIH?
Because th
Are you perhaps indexing at the same time from the source other than
DIH? Because the commit is global and all the changes from all the
sources will become visible.
Check the access logs perhaps to see the requests to /update handler or similar.
Regards,
Alex.
http://www.solr-start.com/
Seems like a legitimate request, if you can't find a JIRA feel free to open one.
And if you wanted to supply a patch, _well_ ;)
On Mon, Feb 27, 2017 at 10:37 AM, xavier jmlucjav wrote:
> Hi,
>
> After getting our interval for calling delta index shorter and shorter, I
> have found out that last_
Have you checked
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers
?
On 1 Feb 2017 at 10:42, "Kent Iversen" wrote:
> I'm a newbie to Solr and can't seem to get this to work properly. Gonna
> use Order with Orderlines as an example.
>
Resolved. My problem occurred because of case sensitivity.
I've read the source code of Solr 6.3 and found code referencing database
metadata,
so I finally noticed that Oracle Database returns *UPPERCASE* letters from
the metadata.
As the correct setting, in the WHERE clause of the query calle
Hi Shawn,
Thank you for helpful information and suggestions.
> Are you using the Oracle JVM? This is recommended. Version 1.8.x (Java
> 8)
> is required for Solr 6.3.0.
I'm using Oracle Java 8 (1.8.0_111).
In response to your advice, I've changed the logging level for
JdbcDataSource to DEBUG
On 1/20/2017 7:40 AM, Shawn Heisey wrote:
> One thing you might want to try doing is enclosing the property
> ${books.book_id} in single quotes. The example configs on the
> dataimport wiki page have the properties referenced from parent
> entities surrounded by single quotes:
A second look reveal
On 1/20/2017 5:45 AM, Keiichi MORITA wrote:
> DataImportHandler *can't* work with Oracle 12c and Solr 6.3.
> Queries in nested entities are called, but the mapped values are not in the child's
> WHERE clause.
> What is the cause of this error? I want some help.
>
>
> ## data-config.xml
>
>
>
I would set the times in the autoCommit to a large number (or -1, I
think). It's possible that there's a default there if the autocommit
section is found but nothing is specified; you'll have to look at the
code to be sure.
But what I would do is use aliasing (either core if you're in
stand-alone or c
hi,
On Tue, Nov 15, 2016 at 02:54:49AM +1100, Alexandre Rafalovitch wrote:
>>
>>
> Attribute names are case sensitive as far as I remember. Try
> 'dataSource' for the second definition.
oh wow... that's sneaky. in the old version the case didn't seem to matter,
but now it certainly d
On 15 November 2016 at 02:19, Peter Blokland wrote:
>
>
Attribute names are case sensitive as far as I remember. Try
'dataSource' for the second definition.
Regards,
Alex.
Solr Example reading group is starting November 2016, join us at
http://j.mp/SolrERG
Newsletter and
Thank you Kiran. Simple and nice. I lost a day today trying to make the
delta-import work.
--
View this message in context:
http://lucene.472066.n3.nabble.com/DIH-Delete-with-Full-Import-tp4040070p4279981.html
Sent from the Solr - User mailing list archive at Nabble.com.
On 4/21/2016 5:25 AM, Mahmoud Almokadem wrote:
> We have a cluster of Solr 4.8.1 installed on a Tomcat servlet container and
> we're able to use DIH Schedule by adding these lines to web.xml of the
> installation directory:
>
>
>
> org.apache.solr.handler.dataimport.scheduler.ApplicationListe
Given child="true", Solr 5.5 creates a document block with implicit
relations across parent and nested children. These are later retrievable via
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers
only. Given the fact that you run 4.10, I don't think you really
yer
Ingram Content Group
-Original Message-
From: Todd Long [mailto:lon...@gmail.com]
Sent: Wednesday, December 16, 2015 10:21 AM
To: solr-user@lucene.apache.org
Subject: RE: DIH Caching w/ BerkleyBackedCache
James,
I apologize for the late response.
Dyer, James-2 wrote
> With the DIH
James,
I apologize for the late response.
Dyer, James-2 wrote
> With the DIH request, are you specifying "cacheDeletePriorData=false"
We are not specifying that property (it looks like it defaults to "false").
I'm actually seeing this issue when running a full clean/import.
It appears that the
or
getting it to work.
James Dyer
Ingram Content Group
-Original Message-
From: Todd Long [mailto:lon...@gmail.com]
Sent: Tuesday, November 17, 2015 8:11 AM
To: solr-user@lucene.apache.org
Subject: Re: DIH Caching w/ BerkleyBackedCache
Mikhail Khludnev wrote
> It's worth to me
Mikhail Khludnev wrote
> It's worth mentioning that for a really complex relation scheme it might be
> challenging to organize all of them into parallel ordered streams.
This will most likely be the issue for us which is why I would like to have
the Berkley cache solution to fall back on, if possib
On Mon, Nov 16, 2015 at 5:08 PM, Todd Long wrote:
> Mikhail Khludnev wrote
> > "External merge" join helps to avoid boilerplate caching in such simple
> > cases.
>
> Thank you for the reply. I can certainly look into this though I would have
> to apply the patch for our version (i.e. 4.8.1). I re
Mikhail Khludnev wrote
> "External merge" join helps to avoid boilerplate caching in such simple
> cases.
Thank you for the reply. I can certainly look into this though I would have
to apply the patch for our version (i.e. 4.8.1). I really just simplified
our data configuration here which actually
Hello Todd,
"External merge" join helps to avoid boilerplate caching in such simple
cases.
it should be something
On Fri, Nov 13, 2015 at 10:54 PM, Todd Long wrote:
> We currently index using DIH along with the SortedMapBackedCache cache
> implementation which has worked wel
Erick Erickson wrote
> Have you considered using SolrJ instead of DIH? I've seen
> situations where that can make a difference for things like
> caching small tables at the start of a run, see:
>
> searchhub.org/2012/02/14/indexing-with-solrj/
Nice write-up. I think we're going to move to that ev
Have you considered using SolrJ instead of DIH? I've seen
situations where that can make a difference for things like
caching small tables at the start of a run, see:
searchhub.org/2012/02/14/indexing-with-solrj/
Best,
Erick
On Sat, Oct 24, 2015 at 6:17 PM, Todd Long wrote:
> Dyer, James-2 wrot
Dyer, James-2 wrote
> The DIH Cache feature does not work with delta import. Actually, much of
> DIH does not work with delta import. The workaround you describe is
> similar to the approach described here:
> https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport ,
> which in my op
The DIH Cache feature does not work with delta import. Actually, much of DIH
does not work with delta import. The workaround you describe is similar to the
approach described here:
https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport , which
in my opinion is the best way to i
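The wiki approach amounts to doing the "delta" with the full-import command by pushing
the timestamp check into the main query; roughly, following that page (table and column
names are placeholders):

<!-- Run full-import with clean=false; the query itself picks up only rows
     modified since the last run, or everything when clean is requested. -->
<entity name="item" pk="id"
        query="SELECT * FROM item
               WHERE '${dataimporter.request.clean}' != 'false'
                  OR last_modified &gt; '${dataimporter.last_index_time}'">
  <field column="id" name="id"/>
</entity>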
This is also what I have done, but I agree with the notion of using something
external to load the data.
-Original Message-
From: Dyer, James [mailto:james.d...@ingramcontent.com]
Sent: Thursday, October 15, 2015 9:24 AM
To: solr-user@lucene.apache.org
Subject: RE: DIH parallel
Nabil,
What we do is have multiple dih request handlers configured in solrconfig.xml.
Then in the sql query we put something like "where mod(id, ${partition})=0".
Then an external script calls a full import on each request handler at the same
time and monitors the response. This isn't the mo
On 15/10/2015 09:57, nabil Kouici wrote:
Hi All,
I'm using DIH to index more than 15M documents from SQL Server to Solr. This takes more
than 2 hours. A big amount of this time is consumed by fetching data from the
database. I'm thinking about a solution to have a parallel (threaded) load in the
same DIH. Each thr
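A rough sketch of the multi-handler partitioning described above; the handler names, the
partition count, and the exact way the partition value reaches the query are assumptions
on my part, the idea being simply one DIH handler per slice of the id space:

<!-- solrconfig.xml: several DIH handlers, each with a different partition number
     that the SQL query uses to select its slice of rows. -->
<requestHandler name="/dataimport0" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
    <str name="partition">0</str>
  </lst>
</requestHandler>
<requestHandler name="/dataimport1" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
    <str name="partition">1</str>
  </lst>
</requestHandler>
<!-- In db-data-config.xml the entity query would then reference the value,
     e.g. WHERE MOD(id, 2) = ${dataimporter.request.partition} -->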
On 9/9/2015 4:27 PM, Scott Derrick wrote:
> I can't seem to get delta-imports to work with a FileDataSource DIH
The information I have says delta-import won't work with that kind of
entity.
http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command-1
Also, please make note of this:
I have an autogenerated uuid for each document in Solr. It is not marked as the
unique field. I add
uuid
in the config to generate a uuid when I add a document from a client. But now
each time I update a document the uuid is changed.
--
View this message in context:
http://lucene.47
I don't use SQL now. I'm adding documents manually.
db_id_s
--
View this message in context:
http://lucene.472066.n3.nabble.com/DIH-delta-import-pk-tp4224342p4224762.html
Sent from the Solr - User mailing list archive at Nabble.com.
Send the SQL and Schema.xml. Also logs. Does it complain about _id_ or you
field in schema?
On Sun, Aug 23, 2015 at 4:55 AM, CrazyDiamond wrote:
> Now I set the db id as the unique field and a uuid field, which should be generated
> automatically as required. But when I add a document I have an error tha
Now I set the db id as the unique field and a uuid field, which should be generated
automatically as required. But when I add a document I have an error that my
required uuid field is missing.
--
View this message in context:
http://lucene.472066.n3.nabble.com/DIH-delta-import-pk-tp4224342p4224701.html
Sen
As far as I understand I can't use 2 unique fields. I need the db id and uuid
because I am moving data from the database to the Solr index entirely. And temporarily I
need it to be compatible with delta-import, but in the future I will use only the new
uuid.
--
View this message in context:
http://lucene.472066.n3.nabble
"use 2 unique fields" to do what? Solr replaces older docs with newer
docs based _solely_ on the defined in schema.xml.
There is no notion of "compound unique key" like there can be in a
database.
You could concatenate the PK and a uuid, but what would be the point?
Since the uuid (presumably) c
OK, can I use 2 unique fields, one with uuid and one with db id? What will
happen then?
--
View this message in context:
http://lucene.472066.n3.nabble.com/DIH-delta-import-pk-tp4224342p4224395.html
Sent from the Solr - User mailing list archive at Nabble.com.
On 8/20/2015 4:27 PM, CrazyDiamond wrote:
> I have a DIH delta-import query based on last_index_time. It works perfectly.
> But sometimes I add documents to Solr manually and I want DIH not to add
> them again. I have a UUID unique field and also I have "id" from the database which
> is marked as pk in DI
Just pick a node to run it on. I vastly prefer, though,
using a SolrJ client, here's a sample:
https://lucidworks.com/blog/indexing-with-solrj/
Best,
Erick
On Wed, Jul 29, 2015 at 4:37 AM, Midas A wrote:
> Hi,
>
> I have to create DIH with solr cloud shared with multi node architecture
> for s
On 7/17/2015 8:23 AM, Bill Au wrote:
> One of my database columns is a varchar containing a comma-delimited list of
> values. I would like to import these values into a multiValued field. I
> figure that I will need to write a ScriptTransformer to do that. Is there
> a better way?
DIH provides th
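The truncated reply presumably goes on to name a built-in option; one DIH feature that
handles this without a custom ScriptTransformer is the RegexTransformer's splitBy
attribute. A small sketch, with made-up column and field names, where the target field
is declared multiValued in the schema:

<entity name="item" transformer="RegexTransformer"
        query="SELECT id, tags FROM item">
  <!-- splitBy breaks the comma-delimited column into multiple values -->
  <field column="tags" splitBy=","/>
</entity>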
You were 100 percent right. I went back and checked the metadata looking for
multiple instances of the same file path. Both of the files had an extra set
of metadata with the same filepath. Thank you very much.
--
View this message in context:
http://lucene.472066.n3.nabble.com/DIH-Not-Indexin
My first guess is that somehow these two documents have
the same uniqueKey as some other documents, so later
docs are replacing earlier docs. Although not conclusive,
looking at the admin page for the cores in question may
show numDocs=278 and maxDoc=280 or some such in
which case that would be what's happenin