Re: DIH

2021-01-21 Thread dmitri maziuk
On 2021-01-20 6:26 PM, Joshua Wilder wrote: Please reconsider the removal of the DIH from future versions. The repo it's been moved to is a ghost town with zero engagement from Rohit (or anyone). Not sure how 'moving' it caused it to now only support MariaDB but that appears to b

DIH

2021-01-20 Thread Joshua Wilder
Please reconsider the removal of the DIH from future versions. The repo it's been moved to is a ghost town with zero engagement from Rohit (or anyone). Not sure how 'moving' it caused it to now only support MariaDB but that appears to be the case. The current implementation is fas

Re: Data Import Handler (DIH) - Installing and running

2020-12-23 Thread Erick Erickson
Have you done what the message says and looked at your Solr log? If so, what information is there? > On Dec 23, 2020, at 5:13 AM, DINSD | SPAutores > wrote: > > Hi, > > I'm trying to install the package "data-import-handler", since it was > discontinued from core Solr distro. > > https://git

Data Import Handler (DIH) - Installing and running

2020-12-23 Thread DINSD | SPAutores
Hi, I'm trying to install the package "data-import-handler", since it was discontinued from the core Solr distro. https://github.com/rohitbemax/dataimporthandler However, as soon as the first command is carried out solr -c -Denable.packages=true I get this screen in the web interface Has anyone be

Re: DIH and UUIDProcessorFactory

2020-12-17 Thread Dmitri Maziuk
On 12/17/2020 4:05 PM, Alexandre Rafalovitch wrote: Try with the explicit URP chain too. It may work as well. Actually in this case we're just making sure uniqueKey is in fact unique in all documents, so default is what we want. For this particular dataset I may at some future point look int

Re: DIH and UUIDProcessorFactory

2020-12-17 Thread Alexandre Rafalovitch
doesn't and URP chain applies only to ' > > > If you define an update chain as default, then it will be used for all > > updates made where a different chain is not specifically requested. > > > > I have used this personally to have my custom update chain apply even

Re: DIH and UUIDProcessorFactory

2020-12-17 Thread Dmitri Maziuk
o have my custom update chain apply even when the indexing comes from DIH.  I know for sure that this works on 4.x and 5.x versions; it should work on newer versions as well. Confirmed w/ 8.7.0: I finally got to importing the one DB where I need this, and UUIDs are there with the default URP chain. Thank you Dima

DIH from hive server 2

2020-12-16 Thread Jon Morisi
Hi, Could someone please tell me what the correct dataSource driver class is for HiveServer2 (to solr)? This isn't working for me: I've also copied these jar files and referenced them (via lib dir in solrconfig.xml): hive-jdbc-3.1.0.3.1.4.0-315-standalone.jar hadoop-auth.jar hadoop-common.jar

Re: DIH and UUIDProcessorFactory

2020-12-12 Thread Shawn Heisey
est", my reading is it doesn't and URP chain applies only to ' If you define an update chain as default, then it will be used for all updates made where a different chain is not specifically requested. I have used this personally to have my custom update chain apply even when

Re: DIH and UUIDProcessorFactory

2020-12-12 Thread Dmitri Maziuk
On 12/12/2020 2:50 PM, Shawn Heisey wrote: The only way I know of to use an update processor chain with DIH is to set 'default="true"' when defining the chain. I did manage to find an example with the default attribute, in javadocs: https://lucene.apache.org/solr/5_0_0/

Re: DIH and UUIDProcessorFactory

2020-12-12 Thread Shawn Heisey
On 12/12/2020 12:54 PM, Dmitri Maziuk wrote: is there an easy way to use the stock UUID generator with DIH? We have a hand-written one-liner class we use as DIH entity transformer but I wonder if there's a way to use the built-in UUID generator class instead. From the TFM it looks like

Re: DIH and UUIDProcessorFactory

2020-12-12 Thread Alexandre Rafalovitch
Why not? You should be able to put an URP chain after DIH, the usual way. Is that something about UUID that is special? Regards, Alex On Sat., Dec. 12, 2020, 2:55 p.m. Dmitri Maziuk, wrote: > Hi everyone, > > is there an easy way to use the stock UUID generator with DIH? We have

DIH and UUIDProcessorFactory

2020-12-12 Thread Dmitri Maziuk
Hi everyone, is there an easy way to use the stock UUID generator with DIH? We have a hand-written one-liner class we use as DIH entity transformer but I wonder if there's a way to use the built-in UUID generator class instead. From the TFM it looks like there isn't, is that cor

Solr DIH: empty child document transformer

2020-11-13 Thread Jordi Cabré
I will try to explain in as much detail as possible while isolating the problem from its context. In short, I'm trying to create a `DIH` configuration to ingest some documents as nested documents. I mean, I need to ingest a `one-to-many` relation and store it as nested documents. My `parents`

Child documents are not retrieved after DIH

2020-11-12 Thread Jordi Cabré
I will try to explain in as much detail as possible while isolating the problem from its context. In short, I'm trying to create a `DIH` configuration to ingest some documents as nested documents. I mean, I need to ingest a `one-to-many` relation and store it as nested documents. My `parents`

Re: DIH on SolrCloud

2020-08-14 Thread Jan Høydahl
DIH should run fine from any node. It sends update requests like any other client, and those are routed to the leader, wherever it is. It could be problematic if node 2 gets overloaded by doing DIH work, Overseer work, and perhaps shard leader work at once, and an overloaded node gets into all kinds of

Re: DIH on SolrCloud

2020-08-13 Thread Issei Nishigata
Thank you for your quick reply. Can I confirm that indexing is performed not on the node where DIH is executed but on the Leader node? As far as I can see in the log, there are errors: a failure to establish a connection occurred from Node2 on the state of Repli

Re: DIH on SolrCloud

2020-08-13 Thread Jörn Franke
DIH is deprecated in current Solr versions. The general recommendation is to do processing outside the Solr server and use the update handler (the normal one, not Cell) to add documents to the index. So you should avoid using it, as it is not future-proof. If you need more time to migrate to a

DIH on SolrCloud

2020-08-13 Thread Issei Nishigata
Hi all, I'm using Solr 4.10 in SolrCloud mode. I have 10 nodes: one of the nodes is the Leader node, and the others are Replicas. (I will call these Node1 to Node10 for convenience.) -> 1 Shard, 1 Leader (Node1), 9 Replicas (Node2-10). Indexing always uses DIH on Node2. Therefore, DIH may be executed when

Re: nested entities and DIH indexing time

2020-05-14 Thread Shawn Heisey
is: https://stackoverflow.com/questions/21006045/can-solr-dih-do-atomic-updates I have never used a ScriptTransformer, so I do not know how to actually do this. Your schema would have to be compatible with atomic updates. Thanks, Shawn

Re: nested entities and DIH indexing time

2020-05-14 Thread matthew sporleder
> > splitting on them (gross but faster). > > When you have nested entities, this is how DIH works. A separate SQL > query for the inner entity is made for each row returned on the outer > entity. Nested entities tend to be extremely slow for this reason. > > The best way to wo

Re: nested entities and DIH indexing time

2020-05-14 Thread Shawn Heisey
es to make things faster. Just looking for some tips. I prefer this architecture to the way we currently do it with complex SQL, inserting weird strings, and then splitting on them (gross but faster). When you have nested entities, this is how DIH works. A separate SQL query for the inner enti
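The per-row query pattern described in this thread can be reproduced outside Solr. The following is a minimal Python sketch, assuming a hypothetical `parent`/`child` schema in an in-memory SQLite database (none of these table or function names come from the thread, and this is not DIH code). It counts the SQL statements issued by a nested-entity-style loop versus a single JOIN that builds the same documents.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database with a hypothetical parent/child schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE parent (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE child (parent_id INTEGER, body TEXT);
        INSERT INTO parent VALUES (1, 'first'), (2, 'second'), (3, 'third');
        INSERT INTO child VALUES (1, 'a'), (1, 'b'), (2, 'c');
        """
    )
    return conn


def nested_style(conn):
    """One outer query plus one inner query per outer row (the slow pattern)."""
    queries = 1
    docs = {}
    outer = conn.execute("SELECT id, title FROM parent ORDER BY id").fetchall()
    for parent_id, title in outer:
        rows = conn.execute(
            "SELECT body FROM child WHERE parent_id = ? ORDER BY body",
            (parent_id,),
        ).fetchall()
        queries += 1
        docs[parent_id] = {"title": title, "bodies": [r[0] for r in rows]}
    return docs, queries


def joined_style(conn):
    """A single LEFT JOIN, grouped into documents on the client side."""
    queries = 1
    docs = {}
    rows = conn.execute(
        """
        SELECT p.id, p.title, c.body
        FROM parent p LEFT JOIN child c ON c.parent_id = p.id
        ORDER BY p.id, c.body
        """
    )
    for parent_id, title, body in rows:
        doc = docs.setdefault(parent_id, {"title": title, "bodies": []})
        if body is not None:  # parents without children yield a NULL body
            doc["bodies"].append(body)
    return docs, queries
```

With three parent rows the nested version issues four statements and the joined version one; the gap grows linearly with the number of outer rows, which matches the slowdown reported in the thread.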

nested entities and DIH indexing time

2020-05-14 Thread matthew sporleder
It appears that adding entities to my entities in my data import config is slowing down my import process by a lot. Is there a good way to speed this up? I see the IDs are queried individually instead of using IN() or similar standard techniques to make things faster. Just looking for some tips.

Re: DIH nested entity repeating query in verbose output

2020-05-14 Thread matthew sporleder
I think this is just an issue in the verbose/debug output. tcpdump does not show the same issue. On Wed, May 13, 2020 at 7:39 PM matthew sporleder wrote: > > I am attempting to use nested entities to populate documents from > different tables and verbose/debug output is showing repeated queries

DIH nested entity repeating query in verbose output

2020-05-13 Thread matthew sporleder
I am attempting to use nested entities to populate documents from different tables and verbose/debug output is showing repeated queries on import. The doc number repeats the sqls. "verbose-output": [ "entity:parent", .. [ "document#5", [ ... "entity:nested1", [ "query", "SELECT body AS nested1 FR

Solr 8.X _nest_path_ and DIH

2020-05-13 Thread James Greene
I've been trying to get the _nest_path_ queries working with no success. Does anyone have a link to an example of the configurations needed to get this to work? I'm using DIH to index my data and the child entities are getting indexed (I can query them directly). But using

Re: Solr indexing with Tika DIH - ZeroByteFileException

2020-04-23 Thread Charlie Hull
If users can upload any PDF, including broken or huge ones, and some cause a Tika error, you should decouple Tika from Solr and run it as a separate process to extract text before indexing with Solr. Otherwise some of what is uploaded *will* break Solr. https://lucidworks.com/post/indexing-with

Re: Solr indexing with Tika DIH - ZeroByteFileException

2020-04-22 Thread ravi kumar amaravadi
Hi, I am also facing the same issue. Does anyone have an update or solution for fixing this issue as part of DIH? Thanks. Regards, Ravi kumar -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

TikaEntityProcessor with DIH

2020-04-20 Thread Srinivas Kashyap
Hi, we were on Solr 5.2.1 and used TikaEntityProcessor to index PDF documents through DIH, and it was working fine. The jars were tika-core-1.4.jar and tika-parsers-1.4.jar. Below is my schema.xml: (P.S. All field types have been defined.) And my tika-data-config.xml

Re: entity in DIH for partial update?

2020-04-10 Thread matthew sporleder
Do you mean something along the lines of this (hackish?) https://stackoverflow.com/questions/21006045/can-solr-dih-do-atomic-updates method? On Fri, Apr 10, 2020 at 10:19 AM Jörn Franke wrote: > > You could use atomic updates in DIH. However, there is a bug in > current/potentially

Re: entity in DIH for partial update?

2020-04-10 Thread Jörn Franke
You could use atomic updates in DIH. However, there is a bug in current (and potentially also older) Solr versions where this leaks a searcher (which means the index data keeps growing until you restart the server). You can also export from the database to JSON Lines, post it to the JSON update
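The alternative mentioned here, posting atomic updates to the JSON update handler from outside DIH, might look roughly like the Python sketch below. The `{"set": ...}` modifier is Solr's documented atomic-update syntax. The host, collection name, field name, and the assumption that the uniqueKey field is called `id` are all hypothetical. The request is only constructed, not sent.

```python
import json
import urllib.request


def atomic_set(doc_id, field, value):
    """Build one atomic-update document that sets a single field.

    Assumes the uniqueKey field is named "id" (a hypothetical choice).
    """
    return {"id": doc_id, field: {"set": value}}


def build_update_request(base_url, collection, docs):
    """Prepare (but do not send) a POST to the collection's JSON update handler."""
    url = f"{base_url}/solr/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical rows read from the secondary database: (document id, new value).
rows = [("doc-1", "red"), ("doc-2", "blue")]
docs = [atomic_set(doc_id, "extra_field", value) for doc_id, value in rows]
request = build_update_request("http://localhost:8983", "mycollection", docs)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

As noted elsewhere in these threads, the schema has to be compatible with atomic updates for this to work.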

entity in DIH for partial update?

2020-04-10 Thread matthew sporleder
I have a field I would like to add to my schema which is stored in a different database from my primary data. Can I use a separate entity in my DIH to update a single field of my documents? Thanks, Matt

Re: How do I add multiple values for same field with DIH script?

2020-01-16 Thread O. Klein
Yes, the field is multivalued. I managed to add an array to the content_text field, and comma-separated values (e.g. "foo,bar"), but not a " list" like you normally see with a multivalued field. -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: How do I add multiple values for same field with DIH script?

2020-01-16 Thread Edward Ribeiro
Hi, Are you sure content_text is a multivalued field (i.e., field definition has multiValued="true" in managed-schema)? Edward On Thu, Jan 16, 2020 at 08:42, O. Klein wrote: > row.put('content_text', "hello"); > row.put('content_text', "this is a test"); > return row; > > will only retur

Re: How do I add multiple values for same field with DIH script?

2020-01-16 Thread Mikhail Khludnev
Hello. What about putting Arrays.asList("foo", "bar") ? On Thu, Jan 16, 2020 at 2:42 PM O. Klein wrote: > row.put('content_text', "hello"); > row.put('content_text', "this is a test"); > return row; > > will only return "this is a test" > > > > > -- > Sent from: https://lucene.472066.n3.nabble.c

How do I add multiple values for same field with DIH script?

2020-01-16 Thread O. Klein
row.put('content_text', "hello"); row.put('content_text', "this is a test"); return row; will only return "this is a test" -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
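The overwrite reported here is ordinary map behavior: a second `put` on the same key replaces the first value. The sketch below is a Python analogy using a plain `dict` (it is not DIH script code, and the helper names are made up for illustration). It contrasts plain assignment with collecting values into a list, which is the idea behind the `Arrays.asList("foo", "bar")` suggestion in the replies. Whether DIH then emits the list as separate field values is as reported in the thread, not verified here.

```python
def put_single(row, field, value):
    """Plain map assignment: a second call replaces the earlier value."""
    row[field] = value
    return row


def put_multi(row, field, value):
    """Collect values in a list so that repeated calls accumulate."""
    row.setdefault(field, []).append(value)
    return row


overwritten = {}
put_single(overwritten, "content_text", "hello")
put_single(overwritten, "content_text", "this is a test")

accumulated = {}
put_multi(accumulated, "content_text", "hello")
put_multi(accumulated, "content_text", "this is a test")

print(overwritten)
print(accumulated)
```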

Re: Solr 4 to Solr7 migration DIH behavior change

2019-11-25 Thread Mikhail Khludnev
It's worth increasing the log level for DIH categories in the Solr Admin UI. It's usually quite useful. On Mon, Nov 25, 2019 at 10:19 PM Shashank Bellary wrote: > That didn't make any difference. However, I upgraded to 8.0.15 version of > mysql-jdbc driver which fixed the problem of r

Re: Solr 4 to Solr7 migration DIH behavior change

2019-11-25 Thread Shashank Bellary
ok for iOS<https://aka.ms/o0ukef> > > From: Mikhail Khludnev > Sent: Sunday, November 24, 2019 2:51:40 PM > To: solr-user > Subject: Re: Solr 4 to Solr7 migration DIH behavior change > > Note - This message originated from outside Care.com - Please use

Re: Solr 4 to Solr7 migration DIH behavior change

2019-11-24 Thread Mikhail Khludnev
or iOS<https://aka.ms/o0ukef> > > From: Mikhail Khludnev > Sent: Sunday, November 24, 2019 2:51:40 PM > To: solr-user > Subject: Re: Solr 4 to Solr7 migration DIH behavior change > > Note - This message originated from outside Care.co

Re: Solr 4 to Solr7 migration DIH behavior change

2019-11-24 Thread Shashank Bellary
Thanks Mikhail, data config is on the thread above. I’ll share again if you can’t find it Get Outlook for iOS<https://aka.ms/o0ukef> From: Mikhail Khludnev Sent: Sunday, November 24, 2019 2:51:40 PM To: solr-user Subject: Re: Solr 4 to Solr7 migrati

Re: Solr 4 to Solr7 migration DIH behavior change

2019-11-24 Thread Mikhail Khludnev
> > > Hi Folks > > I migrated from Solr 4 to 7.5 and I see an issue with the way DIH is > working. I use `JdbcDataSource` and here the config file is attached > > 1) I started seeing OutOfMemory issue since MySQL JDBC driver has > that issue of not respecting `

Re: Solr 4 to Solr7 migration DIH behavior change

2019-11-24 Thread Shashank Bellary
e the Java version to 8? Did you upgrade the MySQL driver to the latest version? > On 22.11.2019 at 20:43, Shashank Bellary wrote: > > > > Hi Folks > I migrated from Solr 4 to 7.5 and I see an issue with the way DIH is working. I use `JdbcDataSource`

Re: Solr 4 to Solr7 migration DIH behavior change

2019-11-22 Thread Shashank Bellary
id you update the Java version to 8? Did you upgrade the MySQL driver to the latest version? > On 22.11.2019 at 20:43, Shashank Bellary wrote: > > > > Hi Folks > I migrated from Solr 4 to 7.5 and I see an issue with the way DIH

Re: Solr 4 to Solr7 migration DIH behavior change

2019-11-22 Thread Shashank Bellary
n to 8? Did you upgrade the MySQL driver to the latest version? > On 22.11.2019 at 20:43, Shashank Bellary wrote: > > > > Hi Folks > I migrated from Solr 4 to 7.5 and I see an issue with the way DIH is working. I use `JdbcDataSource` and here the c

Re: Solr 4 to Solr7 migration DIH behavior change

2019-11-22 Thread Jörn Franke
Did you update the Java version to 8? Did you upgrade the MySQL driver to the latest version? > On 22.11.2019 at 20:43, Shashank Bellary wrote: > > > > Hi Folks > I migrated from Solr 4 to 7.5 and I see an issue with the way DIH is working. > I use `JdbcDataSource

Solr 4 to Solr7 migration DIH behavior change

2019-11-22 Thread Shashank Bellary
Hi Folks, I migrated from Solr 4 to 7.5 and I see an issue with the way DIH is working. I use `JdbcDataSource`, and the config file is attached. 1) I started seeing OutOfMemory issues, since the MySQL JDBC driver has the issue of not respecting `batchSize` (though Solr 4 didn't show

FW: Solr 4 to Solr7 migration DIH behavior change

2019-11-22 Thread Shashank Bellary
Hi Folks, I migrated from Solr 4 to 7.5 and I see an issue with the way DIH is working. I use `JdbcDataSource`, and the config file is attached. 1) I started seeing OutOfMemory issues, since the MySQL JDBC driver has the issue of not respecting `batchSize` (though Solr 4 didn't show this beh

Re: DIH across two SQL DBs

2019-10-31 Thread Jan Høydahl
select? >> >> However since the list of IDs are UUID strings and there are a few >> thousand of them, I guess the SELECT becomes too large if you just send a >> huge OR clause to MySql. I have been thinking about a 2-stage solution, >> first create a temp table in MySql a

Re: DIH across two SQL DBs

2019-10-31 Thread Mikhail Khludnev
lause to MySql. I have been thinking about a 2-stage solution, > first create a temp table in MySql and INSERT all the IDs there, then > include the temp table in the WHERE as usual, and delete the tmp table > afterwards. Does DIH have a built-in and efficient feature for such an

DIH across two SQL DBs

2019-10-31 Thread Jan Høydahl
as usual, and delete the tmp table afterwards. Does DIH have a built-in and efficient feature for such an operation? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
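The two-stage approach discussed in this thread (load the IDs into a temporary table, join against it, then drop it) can be sketched outside DIH as follows. This is a minimal Python illustration using an in-memory SQLite database in place of MySQL. The `documents` and `wanted_ids` table names and both function names are hypothetical, and the temporary-table syntax differs slightly between database engines.

```python
import sqlite3
import uuid


def build_demo_db(count=5):
    """Create a hypothetical 'documents' table keyed by UUID strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, title TEXT)")
    ids = [str(uuid.uuid4()) for _ in range(count)]
    conn.executemany(
        "INSERT INTO documents (id, title) VALUES (?, ?)",
        [(doc_id, f"doc{n}") for n, doc_id in enumerate(ids)],
    )
    return conn, ids


def fetch_by_ids(conn, ids):
    """Select rows for many IDs via a temporary table instead of a huge OR clause."""
    conn.execute("CREATE TEMP TABLE wanted_ids (id TEXT PRIMARY KEY)")
    try:
        # Stage 1: bulk-insert the IDs (duplicates are ignored).
        conn.executemany(
            "INSERT OR IGNORE INTO wanted_ids (id) VALUES (?)",
            [(doc_id,) for doc_id in ids],
        )
        # Stage 2: join against the temporary table.
        rows = conn.execute(
            """
            SELECT d.id, d.title
            FROM documents d JOIN wanted_ids w ON w.id = d.id
            ORDER BY d.title
            """
        ).fetchall()
    finally:
        # Clean up so the function can be called again on the same connection.
        conn.execute("DROP TABLE wanted_ids")
    return rows
```

Calling `fetch_by_ids(conn, [ids[1], ids[3]])` on the demo database returns the two matching rows; the statement size stays constant however many IDs are supplied.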

Re: DIH: Create Child Documents in ScriptTransformer

2019-09-19 Thread Jörn Franke
){ } The following Javadoc gives some hints on what you can do with the context: https://lucene.apache.org/solr/8_2_0/solr-dataimporthandler/org/apache/solr/handler/dataimport/Context.html Despite all this, I came to the conclusion that adding child docs in a ScriptTransformer in DIH is not

Re: DIH: Create Child Documents in ScriptTransformer

2019-09-18 Thread Mikhail Khludnev
split them into chapters (this is done). One whole document is > loaded as a parent. Chapters of the whole document + metadata should be > loaded as child documents of this parent. > I want to now collect information on how this can be done: > * Use a custom loader - this is possible and wo

Re: DIH: Create Child Documents in ScriptTransformer

2019-09-18 Thread Jörn Franke
logic needs to be >> applied to split them into chapters (this is done). One whole document is >> loaded as a parent. Chapters of the whole document + metadata should be >> loaded as child documents of this parent. >> I want to now collect information on how this can be do

Re: DIH: Create Child Documents in ScriptTransformer

2019-09-18 Thread Erick Erickson
uments of this parent. > I want to now collect information on how this can be done: > * Use a custom loader - this is possible and works > * Use DIH and extract the chapters in a ScriptTransformer and add them as > child documents there. However, the scripttransformer receives as i

DIH: Create Child Documents in ScriptTransformer

2019-09-18 Thread Jörn Franke
information on how this can be done: * Use a custom loader - this is possible and works * Use DIH and extract the chapters in a ScriptTransformer and add them as child documents there. However, the ScriptTransformer receives only a HashMap as input, and while it works to transform field values etc., it does

Need help | NoNodeException | Could not read DIH properties

2019-09-05 Thread Pal Sumit
Hi, I am getting the below log very frequently and I can't find more details about it. ZKPropertiesWriter Could not read DIH properties from /configs//dataimport.properties :class org.apache.zookeeper.KeeperException$NoNodeException Details: We have a Solr cluster containing 2

Re: Idle Timeout while DIH indexing and implicit sharding in 7.4

2019-09-03 Thread Mikhail Khludnev
Tracked https://issues.apache.org/jira/browse/SOLR-13735 patches are welcome. On Mon, Sep 2, 2019 at 12:39 PM Vadim Ivanov < vadim.iva...@spb.ntk-intourist.ru> wrote: > Timeout causes DIH to finish with error message. So, If I check DIH > response to be sure > that DIH sessio

RE: Idle Timeout while DIH indexing and implicit sharding in 7.4

2019-09-02 Thread Vadim Ivanov
The timeout causes DIH to finish with an error message. So if I check the DIH response to be sure that the DIH session has finished without any mistakes, it causes some trouble :) I haven't checked yet whether all records were successfully imported to Solr. I suppose that after the timeout the shard does not a

Re: Idle Timeout while DIH indexing and implicit sharding in 7.4

2019-09-02 Thread Mikhail Khludnev
UpdateProcessor's chain from time to time and recreate it. However, that exception shouldn't cause any problem, or does it? Also, it's worth tracking as a Jira issue, or mentioning in the ticket regarding adjusting DIH for Cloud. On Mon, Sep 2, 2019 at 9:44 AM Vadim Ivanov < vadim.iva...@spb.ntk-i

Re: Idle Timeout while DIH indexing and implicit sharding in 7.4

2019-09-01 Thread Mikhail Khludnev
Given that org.apache.solr.common.util.FastInputStream.peek(FastInputStream.java:60) at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107) JavabinLoader hangs on Stream.peek(), awaiting -1, and hits the timeout. I guess it might be related to "closing so

RE: Idle Timeout while DIH indexing and implicit sharding in 7.4

2019-09-01 Thread swapna.minnaka...@copart.com
I am facing the exact same issue. We never had any issue with 6.5.1 when doing a full index (initial bulk load). After upgrading to 7.5.0, we are getting the exception below and indexing is taking a very long time: 2019-09-01 10:11:27.436 ERROR (qtp1650813924-22) [c:c_collection s:shard1 r:core_node3 x:c_collection_

RE: Idle Timeout while DIH indexing and implicit sharding in 7.4

2019-09-01 Thread swapna.minnaka...@copart.com
I am facing the exact same issue. We never had any issue with 6.5.1 when doing a full index (initial bulk load). After upgrading to 7.5.0, we are getting the exception below and indexing is taking a very long time: 2019-09-01 10:11:27.436 ERROR (qtp1650813924-22) [c:c_member_lots_a s:shard1 r:core_node3 x:c_collecti

Different DIH failure behavior on non-sharded and sharded collections

2019-08-26 Thread Jack Schlederer
ingle-shard collection, when the DIH encountered a document that was missing a required field or otherwise couldn't be indexed, it would throw a warning into the log and continue. Now, with a doubly-sharded collection, a similar event causes the entire DIH full import to f

Using DIH for dynamic list of mailboxes

2019-07-16 Thread Mr Havercamp
I am currently integrating Solr with a service which uses Zimbra for mail. I have been tasked with importing inbox content into Solr and have opted to use DIH. Because this service adds/removes mailboxes, what would be the best approach for managing a list of mailboxes for DIH? My main concern

Re: Solr 6.6.0 - DIH - Multiple entities - Multiple DBs

2019-07-11 Thread Shawn Heisey
On 7/11/2019 9:04 AM, Joseph_Tucker wrote: Looks like I've managed to get some semblance of this working. The indexes are much faster, but the RAM usage by SolrJ is quite high. Is it normal to see around 6GB of RAM usage? (My test is indexing 250,000 records with the 50 child entities) Whatever

Re: Solr 6.6.0 - DIH - Multiple entities - Multiple DBs

2019-07-11 Thread Joseph_Tucker
Thanks for the help. Looks like I've managed to get some semblance of this working. The indexes are much faster, but the RAM usage by SolrJ is quite high. Is it normal to see around 6GB of RAM usage? (My test is indexing 250,000 records with the 50 child entities) In short, I'm running through a

Re: Solr 6.6.0 - DIH - Multiple entities - Multiple DBs

2019-07-08 Thread Jörn Franke
t; Solr and the language of your choosing. >>> >>> I guess what I'm after is, how would using SolrJ improve performance when >>> indexing? >> >> It's not just about improving performance (although DIH is single >> threaded, so you could o

Re: Solr 6.6.0 - DIH - Multiple entities - Multiple DBs

2019-07-08 Thread Alexandre Rafalovitch
and the language of your choosing. > >> > >> I guess what I'm after is, how would using SolrJ improve performance when > >> indexing? > > > > It's not just about improving performance (although DIH is single > > threaded, so you could obtain a marked indexin

Re: Solr 6.6.0 - DIH - Multiple entities - Multiple DBs

2019-07-08 Thread Joseph_Tucker
d >> between >> Solr and the language of your choosing. >> >> I guess what I'm after is, how would using SolrJ improve performance when >> indexing? > > It's not just about improving performance (although DIH is single > threaded, so you could obtain

Re: Solr 6.6.0 - DIH - Multiple entities - Multiple DBs

2019-07-05 Thread Charlie Hull
d the language of your choosing. I guess what I'm after is, how would using SolrJ improve performance when indexing? It's not just about improving performance (although DIH is single threaded, so you could obtain a marked indexing performance gain using a client such as SolrJ).  With DIH y

Re: Solr 6.6.0 - DIH - Multiple entities - Multiple DBs

2019-07-05 Thread Joseph_Tucker
Thanks for your help / suggestion. I'm not sure I completely follow in this case. SolrJ looks like a method to allow Java applications to talk to Solr, or any other third party application would simply be a communication method between Solr and the language of your choosing. I guess what I'm aft

Re: Solr 6.6.0 - DIH - Multiple entities - Multiple DBs

2019-07-05 Thread Alexandre Rafalovitch
I don't think you should be designing this around DIH. It was never planned for complex scenarios, nor is it particularly fault tolerant, which you may need. Either use SolrJ or third-party tools that integrate with Solr. Regards, Alex On Fri, Jul 5, 2019, 7:43 AM Joseph_Tucker,

Solr 6.6.0 - DIH - Multiple entities - Multiple DBs

2019-07-05 Thread Joseph_Tucker
What is the best way - performance wise - to index data from multiple databases? I'm potentially going to have around 50 different data sources grabbing unique data Here's what I've roughly designed: ... I've excluded fields but each entity

Re: DIH import fails when importing multi-valued field

2019-06-27 Thread Erick Erickson
This looks like a problem with your select statement returning too many rows. I doubt it has to do with the multiValued field, I don’t think DIH is getting to the point where it even tries to create a SolrInputDocument. Depending on the driver, there are ways to limit the number of rows

DIH import fails when importing multi-valued field

2019-06-26 Thread Robert Dadzie
Hi All, I'm trying to use DIH to import about 150k documents to Solr. One of the multi-valued fields I need to import stores about 1500 unique IDs per record. I tried increasing the 'ramBufferSizeMB' setting but that didn't help. I get this ArrayIndexOutOfBoundsException er

indexing MongoDB using DIH

2019-06-17 Thread Wendy2
Hi, has anyone tried the following project to index MongoDB via DIH? I tried to use it, but could not add a filter in the find() method. Any suggestions? Thanks! https://github.com/james75/SolrMongoImporter -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Indexing MongoDB via DIH

2019-06-16 Thread Wendy2
Hi, I need to index several large MongoDB collections with filters via DIH. I have ruled out doing it via mongo-connector. Any recommendations? Thanks! -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Solr indexing with Tika DIH - ZeroByteFileException

2019-06-11 Thread neilb
Hi, while going through solr logs, I found data import error for certain documents. Here are details about the error. Exception while processing: file document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 7866 at org.apa

Re: multi-level Nested entities in dih

2019-04-30 Thread Alexandre Rafalovitch
Well, you can do Entities within Entities and all of that hierarchical matching will go into a single document (unless you use child=true flag). But there is nothing that supports random number of deep levels. And, given the complexity, I still would keep away from using DIH for such use case

RE: multi-level Nested entities in dih

2019-04-30 Thread Srinivas Kashyap
2019 05:06 PM To: solr-user Subject: Re: multi-level Nested entities in dih DIH may not be able to do arbitrary nesting. And it is not recommended for complex production cases. However, in general, you also have to focus on what your _search_ will look like. And only then think about the mapping

Re: multi-level Nested entities in dih

2019-04-30 Thread Alexandre Rafalovitch
DIH may not be able to do arbitrary nesting. And it is not recommended for complex production cases. However, in general, you also have to focus on what your _search_ will look like. And only then think about the mapping. For example, does that whole tree get mapped to and returned as a single

multi-level Nested entities in dih

2019-04-30 Thread Srinivas Kashyap
Hello, I'm using DIH to index the data using SQL. I have requirement as shown below: Parent entity Child1 Child2 Child3 CHILD4( child41, child42, CHILD43(child 431,child432,child433,CHILD434...) How to recursively iterat

Re: Solr indexing with Tika DIH local vs network share

2019-04-04 Thread neilb
Thank you Erick, this is very helpful! -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Solr indexing with Tika DIH local vs network share

2019-03-29 Thread Erick Erickson
So just try adding the autoCommit and autoSoftCommit settings. All of the example configs have these entries and you can copy/paste/change > On Mar 29, 2019, at 10:35 AM, neilb wrote: > > Hi Erick, I am using solrconfig.xml from samples only and has very few > entries. I have attached my config

Re: Solr indexing with Tika DIH local vs network share

2019-03-29 Thread neilb
Hi Erick, I am using solrconfig.xml from samples only and has very few entries. I have attached my config files for review along with reply. Thanks solrconfig.xml tika-data-config.xml

Re: Solr indexing with Tika DIH local vs network share

2019-03-29 Thread Erick Erickson
something like, say, 5 minutes (or even one minute). That would reduce the interval before docs become searchable. I think DIH issues a commit at the end of the run, so that would be why you didn’t see anything for so long if I’m right. Here’s more than you want to know about all this: https

Re: Solr indexing with Tika DIH local vs network share

2019-03-29 Thread neilb
you see anything wrong with the current setup of Solr and Tika DIH? All I am looking for is PDF full-text search results, integrated into a web app dashboard using AJAX queries. Also, this particular article <http://lets-share.senktas.net/2017/11/solr-as-a-service.html> was helpful to ge

Re: Solr indexing with Tika DIH local vs network share

2019-03-26 Thread Erick Erickson
Not quite an answer to your specific question, but… There are a number of reasons why it’s better to run your Tika process outside of Solr and DIH. Here’s the long form: https://lucidworks.com/2012/02/14/indexing-with-solrj/ Ignore the RDBMS parts. It’s somewhat old, but should be adaptable easily

Solr indexing with Tika DIH local vs network share

2019-03-26 Thread neilb
Hi, I am trying to set up Solr for our project so that it can return full-text search results on PDF documents. I am able to run the sample Tika DIH example locally on my Windows server machine. It can index all PDF documents recursively in the "baseDir" of the config XML. Presently "baseDir" po

Re: Help with a DIH config file

2019-03-16 Thread Jörn Franke
ycling through. The > directory > is full of folders that have the documents in them. Do I need an html or > other file sitting in there randomly to get it to start recursion through > the folders? I am attaching my dih config to see the single change I made > to the base directory. Am

Re: Help with a DIH config file

2019-03-16 Thread Jörn Franke
the documents in them. Do I need an html or >> other file sitting in there randomly to get it to start recursion through >> the folders? I am attaching my dih config to see the single change I made >> to the base directory. Am I just being impatient and it will eventually

Re: Help with a DIH config file

2019-03-15 Thread wclarke
it to start recursion through the folders? I am attaching my dih config to see the single change I made to the base directory. Am I just being impatient and it will eventually start going in the folders? Thanks! tika-data-config-2.xml <http://lucene.472066.n3.nabble.com/file/t494707/tika-data-

Re: Help with a DIH config file

2019-03-15 Thread wclarke
Thanks! that fixed it. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Help with a DIH config file

2019-03-15 Thread Tim Allison
Haha, looks like Jörn just answered this... onError="skip|continue" >greatly preferable if the indexing process could ignore exceptions Please, no. I'm 100% behind the sentiment that DIH should gracefully handle Tika exceptions, but the better option is to log the exc

Re: Help with a DIH config file

2019-03-15 Thread Jörn Franke
o downgrade to an > old enough Tika in her Solr installation to work around the problem that way. > > The bigger question, though, is whether there's a way to allow the DIH to > simply ignore errors and keep going. Whitney needs to index several terabytes > of arbitrary documents

RE: Help with a DIH config file

2019-03-15 Thread Demian Katz
o find a way to downgrade to an old enough Tika in her Solr installation to work around the problem that way. The bigger question, though, is whether there's a way to allow the DIH to simply ignore errors and keep going. Whitney needs to index several terabytes of arbitrary documents for her

Re: Help with a DIH config file

2019-03-15 Thread Jörn Franke
to ask. > On 15.03.2019 at 03:41, wclarke wrote: > > Thank you so much. You helped a great deal. I am running into one last > issue where the Tika DIH is stopping at a specific language and fails there > (Malayalam). Do you know of a work around? > > > > -- > S

Re: Help with a DIH config file

2019-03-15 Thread wclarke
Thank you so much. You helped a great deal. I am running into one last issue where the Tika DIH stops at a specific language and fails there (Malayalam). Do you know of a workaround? -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Help with a DIH config file

2019-03-14 Thread Jörn Franke
Sorry for my late reply. Thanks for sharing. Yes, this is possible. Maybe my last mail was confusing. I hope the examples below help. Alternative 1 - Use only DIH without update processor: tika-data-config-2.xml - add transformer in entity and the transformation in field (here done for id and for

Re: Help with a DIH config file

2019-03-13 Thread wclarke
Got each one working individually, but not multiples. Is it possible? Please see attached files. Thanks!!! tika-data-config-2.xml solrconfig.xml -- Sen
