Karl, what would you do if that own implementation of yours stalls in GC, or
knocks Solr over?
On Thu, Feb 6, 2020 at 1:04 PM Karl Stoney
wrote:
> Spoke too soon, looks like it leaks memory. After about 1.3m the old gc
> times went through the roof and solr was almost unresponsive, had to
> abort. We'
Egor, would you mind sharing some best practices regarding cursorMark in
SolrEntityProcessor?
On Thu, Feb 6, 2020 at 1:04 PM Karl Stoney
wrote:
> Spoke too soon, looks like it leaks memory. After about 1.3m the old gc
> times went through the roof and solr was almost unresponsive, had to
> abo
Spoke too soon, looks like it leaks memory. After about 1.3m the old gc times
went through the roof and solr was almost unresponsive, had to abort. We're
going to write our own implementation to copy data from one core to another
that runs outside of solr.
On 06/02/2020, 09:57, "Karl Stoney"
I cannot believe how much of a difference the cursorMark and sort order made.
Previously it died at about 800k docs; now we're at 1.2m without any slowdown.
Thank you so much
On 06/02/2020, 08:14, "Mikhail Khludnev" wrote:
Hello, Karl.
Please check these:
Hello, Karl.
Please check these:
https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#constraints-when-using-cursors
https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#solrentityprocessor
cursorMark="true"
Good luck.
On
It is not working when run from a cron job. It is executing the other entities
as well. Is there any solution?
On 4/1/2017 4:17 PM, marotosg wrote:
> I am trying to load a big table into Solr using DataImportHandler and Mysql.
> I am getting OutOfMemory error because Solr is trying to load the full
> table. I have been reading different posts and tried batchSize="-1".
> https://wiki.apache.org/solr/DataIm
Hello, Sergio.
Have you tried Integer.MIN_VALUE, i.e. -2147483648? See
https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
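For context, batchSize="-1" on the DIH JdbcDataSource is what gets translated to
a fetch size of Integer.MIN_VALUE for the MySQL driver and turns on row-by-row
streaming; a minimal sketch with placeholder connection details:

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="solr"
              password="secret"
              batchSize="-1"/>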
On Sun, Apr 2, 2017 at 1:17 AM, marotosg wrote:
> Hi,
>
> I am trying to load a big table into Solr using DataImportHandler and
> Mysql
Shawn Heisey
Subject: RE: DataImportHandler - Unable to load Tika Config Processing Document
# 1
> Thank you I will follow Erick's steps
> BTW I am also trying to ingest using Flume; Flume uses Morphlines
> along with Tika. Will Flume SolrSink have the same issue?
Yes, when usin
> Thank you I will follow Erick's steps
> BTW I am also trying to ingest using Flume; Flume uses Morphlines along
> with Tika.
> Will Flume SolrSink have the same issue?
Yes, when using Tika you run the risk of it choking on a document, eating CPU
and/or RAM until everything dies. This i
-user@lucene.apache.org
Subject: Re: DataImportHandler - Unable to load Tika Config Processing Document
# 1
On 2/8/2017 9:08 AM, Anatharaman, Srinatha (Contractor) wrote:
> Thank you for your reply
> The other archived message you mentioned was also posted by me. I am new to
> Solr. When you say proces
On 2/8/2017 9:08 AM, Anatharaman, Srinatha (Contractor) wrote:
> Thank you for your reply
> The other archived message you mentioned was also posted by me.
> I am new to Solr. When you say process outside of Solr in a program, what
> exactly should I do?
>
> I am having lots of text documents which I need to inde
I was able to do this successfully in a standalone Solr core.
-Original Message-
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Wednesday, February 08, 2017 1:56 PM
To: solr-user@lucene.apache.org
Subject: RE: DataImportHandler - Unable to load Tika Config Processing
>It is *strongly* recommended to *not* use the Tika that's embedded within
>Solr, but instead to do the processing outside of Solr in a program of your
>own and index the results.
+1
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201601.mbox/%3CBY2PR09MB11210EDFCFA297528940B07C
Solr?
Regards,
~Sri
-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Wednesday, February 08, 2017 9:46 AM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler - Unable to load Tika Config Processing Document
# 1
On 2/6/2017 3:45 PM, Anatharaman, Srinatha
On 2/6/2017 3:45 PM, Anatharaman, Srinatha (Contractor) wrote:
> I am getting the below error while trying to index using DataImportHandler
>
> The data-config file is given below. ZooKeeper is not able to read
> "tikaConfig.xml" in the statement below
>
> processor="TikaEntityProcessor" tikaConfig="tika
Thanks a lot Shawn.
Regards,
Prateek Jain
-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: 23 December 2016 01:36 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler | Query | performance
On 12/23/2016 5:15 AM, Prateek Jain J wrote:
> We n
On 12/23/2016 5:15 AM, Prateek Jain J wrote:
> We need some advice/views on the way we push our documents in SOLR (4.8.1).
> So, here are the requirements:
>
> 1. Document could be from 5 to 100 KB in size.
>
> 2. 10-50 users actively querying Solr with different sorts of data.
>
> 3.
Thanks Alexandre,
I solved the problem using the xslt transform and the /update handler.
I attach the xsl that I put in conf/xslt/ (for documentation)
Then the command:
curl
"http://192.168.99.100:8999/solr/solrexchange/update?commit=true&tr=updateXmlSolrExchange.xsl";
-H "Content-Type: text/x
Seems you might be right, according to the source:
https://github.com/apache/lucene-solr/blob/master/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java#L662
Sometimes, the magic (and schemaless is rather magical) fails when
combined with older assumptions (an
I am rebuilding a new docker image with each change on the config file so solr
starts fresh every time.
add-unknown-fields-to-the-schema
solr-data-config.xml
still getting documents like this:
"response":{"numFound":8,"start":0,"docs":[
{
"id":"38
Ok, to reduce the magic, you can just stick "update.chain" parameter
inside the defaults of the dataimport handler directly.
You can also pass it just as a URL parameter. That's what the 'defaults'
section means.
And, just to be paranoid, you did reload the core after each of those
changes to test it?
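A sketch of the first option, assuming the chain is the
add-unknown-fields-to-the-schema chain mentioned earlier in this thread:

  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">solr-data-config.xml</str>
      <str name="update.chain">add-unknown-fields-to-the-schema</str>
    </lst>
  </requestHandler>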
It did not work,
I tried many things and ended up trying this:
solr-data-config.xml
add-unknown-fields-to-the-schema
Regards,
Pierre
> On 10 Aug 2016, at 18:08, Alexandre Rafalovitch wrote:
>
> Your initParams section does not apply to /dataimp
Your initParams section does not apply to /dataimport handler as
defined. Try modifying it to say:
path="/update/**,/dataimport"
Hopefully, that's all it takes.
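For illustration, an initParams block widened to cover the DIH endpoint might
look like this sketch (the defaults shown are an assumption; keep whatever yours
already contains):

  <initParams path="/update/**,/dataimport">
    <lst name="defaults">
      <str name="update.chain">add-unknown-fields-to-the-schema</str>
    </lst>
  </initParams>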
Managed schema is enabled by default, but schemaless mode is the next
layer on top. With managed schema, you can use the API to add yo
Hi Alex,
thanks for your answer.
Yes my solrconfig.xml contains the add-unknown-fields-to-the-schema.
add-unknown-fields-to-the-schema
I created my core using this command:
curl
http://192.168.99.100:8999/solr/admin/cores?action=CREATE&name=solrexchange&instanceDir=/opt/s
Do you have the actual fields defined? If not, then I am guessing that
your 'post' test was against a different collection that had
schemaless mode enabled and your DIH one is against one where
schemaless mode is not enabled (look for
'add-unknown-fields-to-the-schema' in the solrconfig.xml to conf
harshrossi gmail.com> writes:
>
> I am using *DeltaImportHandler* for indexing data in Solr. Currently I am
> manually indexing the data into Solr by selecting commands full-import or
> delta-import from the Solr Admin screen.
>
> I am using Windows 7 and would like to automate the process by
While it may be useful to have a scheduler for simple cases, I think there are
too many variables to make it useful for everyone's case. For example, I
recently wrote a script that uses the data import handler api to get the
status, kick off the import, etc. However, before allowing it to just
We should add a simple scheduler in the UI. It is very useful. To schedule
various actions:
- Full index
- Delta Index
- Replicate
On Tue, Sep 1, 2015 at 12:41 PM, Shawn Heisey wrote:
> On 9/1/2015 11:45 AM, Troy Edwards wrote:
> > My initial thought was to use scheduling built with DIH:
> >
On 9/1/2015 11:45 AM, Troy Edwards wrote:
> My initial thought was to use scheduling built with DIH:
> http://wiki.apache.org/solr/DataImportHandler#Scheduling
>
> But I think just a cron job should do the same for me.
The dataimport scheduler does not exist in any Solr version. This is a
propose
te to see
> who does the job - questions on how to do this should be directed to a
> zookeeper users' mailing list.
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Monday, August 31, 2015 7:50 PM
> To: solr-user@lucene.apache.org
>
on how to do this should be directed to a zookeeper users' mailing
list.
-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Monday, August 31, 2015 7:50 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler scheduling
On 8/31/2015 11:26 AM, Troy Edwards
On 8/31/2015 11:26 AM, Troy Edwards wrote:
> I am having a hard time finding documentation on DataImportHandler
> scheduling in SolrCloud. Can someone please post a link to that? I have a
> requirement that the DIH should be initiated at a specific time Monday
> through Friday.
Every modern operat
x27;t "run once", but instead avoids overlap, so
there's good reason to write something specific to that case.
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
Sent: Monday, August 31, 2015 1:35 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImportH
Hi Troy,
I think folks use cron jobs (with the curl utility) provided by the Operating System.
Ahmet
On Monday, August 31, 2015 8:26 PM, Troy Edwards
wrote:
I am having a hard time finding documentation on DataImportHandler
scheduling in SolrCloud. Can someone please post a link to that? I have a
I tend to approach these differently. DIH is a great tool for its purpose,
but I find SolrJ/Tika to be more understandable. Which may only reflect
that I've never spent enough time with DIH, but there it is
So, why not use a simple SolrJ program with either Tika or your favorite
HTML parser t
good.
Erick,
Thanks a bunch. Good to know the internals.
Best,
Robin
You don't need to do anything. Essentially, when replication starts the
current state of the master's index is "frozen" in the sense that the
_closed_ segments that make up the index at the time replication starts are
the only ones that are replicated to the slave.
All indexing happens into an _op
Hello Shalin,
Appreciate your reply. I've not implemented DIH in production and we are now
considering it, hence the question: if we configure DIH on the master from
which the slave is replicating the index, will we need to control this
externally? Or is there any setting in DIH that will allow us to
They are completely separate components in Solr. Are you seeing performance
problems in replication due to indexing or vice versa?
On Fri, May 30, 2014 at 10:10 AM, Robin Woods wrote:
> Hi,
>
> What would happen to DataImportHandler that is setup on the master when the
> slave is in the process
Yes that is all fine with me. Only thing that worries me is what needs to be
coded in the batch file.
I will just try a sample batch file and get back with queries if any.
Thank you
You can use PowerShell in windows to kick off a URL at a scheduled time.
On Thu, Apr 10, 2014 at 11:02 PM, harshrossi wrote:
> I am using *DeltaImportHandler* for indexing data in Solr. Currently I am
> manually indexing the data into Solr by selecting commands full-import or
> delta-import fr
DataImportHandler is just a URL call. You can see the specific URL you
want to call by opening debugger window in Chrome/Firefox and looking
at the network tab.
Then, you have a general problem of how to call a URL from Windows
Scheduler. Google brings a lot of results for that, so you should be
a
No, there is no synchronisation between data import handlers on
different cores. You will have to implement this sort of queuing logic
on your application's side.
On Wed, Nov 20, 2013 at 2:23 PM, Patrice Monroe Pustavrh
wrote:
> Hi,
> I am currently running Solr with 10 cores. It works fine for me,
Thanks. It'd be great if you can update this thread if you ever find a
workaround. We will document it on the DataImportHandlerFaq wiki page.
http://wiki.apache.org/solr/DataImportHandlerFaq
On Thu, Sep 12, 2013 at 4:56 PM, Raymond Wiker wrote:
> That sounds reasonable. I've done some more diggi
That sounds reasonable. I've done some more digging, and found that the
database instance in this case is an _OLD_ version of Oracle: 9.2.0.8.0. I
also tried using the OCI driver (version 12), which refuses to even talk to
this database.
I have three other databases running on more recent versions
This is probably a bug with Oracle thin JDBC driver. Google found a
similar issue:
http://stackoverflow.com/questions/4168494/resultset-getstring-on-varchar2-column-returns-empty-string
I don't think this is specific to DataImportHandler.
On Thu, Sep 12, 2013 at 12:43 PM, Raymond Wiker wrote:
>
Followup: I just tried modifying the select with
select CAST('APPLICATION' as varchar2(100)) as sourceid, ...
and that caused the sourceid field to be empty. CASTing to char(100) gave
me the expected value ('APPLICATION', right-padded to 100 characters).
Meanwhile, google gave me this: http://bu
It appears that this is simpler than I thought: in SOLR 4.4, at least,
there is a dataSource class named "FieldStreamDataSource" that I can use
directly with the TikaEntityProcessor. Given a blob column named DOCIMAGE,
I can use the following Tika entity:
...
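(The entity itself was cut off in the archive; below is only a sketch of what
such a FieldStreamDataSource/TikaEntityProcessor pairing typically looks like,
with invented table, field and data source names.)

  <dataSource name="blob" type="FieldStreamDataSource"/>
  <entity name="doc" dataSource="db" query="select ID, DOCIMAGE from DOCS">
    <entity name="tika" dataSource="blob" processor="TikaEntityProcessor"
            dataField="doc.DOCIMAGE" format="text">
      <field column="text" name="content"/>
    </entity>
  </entity>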
There's no BlobTransformer in DataImportHandler. You'll have to write one.
Also, you'd probably need to write a FieldInputStreamDataSource instead of
FieldReaderDataSource.
On Tue, Jul 30, 2013 at 12:30 PM, Raymond Wiker wrote:
> I have a case where I want to index documents and metadata content from
Instead of specifying CachedSqlEntityProcessor, you can specify
SqlEntityProcessor with "cacheImpl='SortedMapBackedCache'". If you
parameterize this, to have "SortedMapBackedCache" for full updates but blank
for deltas, I think it will cache only on the full import.
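A sketch of the non-parameterized form (entity and column names invented; the
cacheImpl value is the part you would swap out, or leave blank, per request):

  <entity name="child" processor="SqlEntityProcessor"
          cacheImpl="SortedMapBackedCache"
          cacheKey="PARENT_ID" cacheLookup="parent.ID"
          query="select PARENT_ID, NAME from CHILD"/>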
Another option is to parame
ty.
>
> It would be great if you could just sketch the setup with the entities I
> provided. Because currently I have no idea on how to do it.
>
> Regards
>
> Constantin
>
>
> -Original Message-
> From: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com
It is possible to create two separate root entities: one for full-import
and another for delta. For the delta-import you can skip the cache that way.
On Thu, Jun 20, 2013 at 1:50 PM, Constantin Wolber <
constantin.wol...@medicalcolumbus.de> wrote:
> Hi,
>
> i searched for a solution for quite some
Have you looked at:
http://wiki.apache.org/solr/DataImportHandler#FieldReaderDataSource ?
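A rough sketch of that wiring, assuming a table MYTABLE whose long-text column
XMLCOL holds the XML (all names invented for illustration):

  <dataSource name="xmlfield" type="FieldReaderDataSource"/>
  <entity name="row" dataSource="db" query="select ID, XMLCOL from MYTABLE">
    <entity name="record" dataSource="xmlfield" processor="XPathEntityProcessor"
            dataField="row.XMLCOL" forEach="/record">
      <field column="title" xpath="/record/title"/>
    </entity>
  </entity>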
Regards,
Alex.
On Fri, Apr 26, 2013 at 12:29 PM, Peri Subrahmanya
wrote:
> I have a column in my database that is of type long text and holds xml
> content. I was wondering when I define the entity recor
Actually, it is Solr 4.1+ where the implicit router will be used if
numShards is not specified.
On Tue, Apr 23, 2013 at 2:52 PM, Joel Bernstein wrote:
> What version of Solr are you using? In Solr 4.2+ if you don't specify
> numShards when creating the collection, the implicit document router w
What version of Solr are you using? In Solr 4.2+ if you don't specify
numShards when creating the collection, the implicit document router will
be used. DIH running under the implicit document router most likely would
not distribute documents.
If this is the case you'll need to recreate the colle
Alex, thank you for the link.
I enabled the trace for 'org.apache.solr.handler.dataimport' and it
seems as if the database is only called once:
2013-03-21T09:40:43
1363855243889
50
org.apache.solr.handler.dataimport.JdbcDataSource
FINE
org.apache.solr.handler.dataimport.JdbcDataSou
There was something like this on Stack Overflow:
http://stackoverflow.com/questions/15164166/solr-filelistentityprocessor-is-executing-sub-entities-multiple-times
Upgrading Solr helped partially, but the conclusion was not fully
satisfactory.
Regards,
Alex.
Personal blog: http://blog.outerth
On Nov 15, 2012, at 8:02 AM, Sébastien Lorber
wrote:
I don't know where you're getting the ${JOB_EXEC.JOB_INSTANCE_ID}. I believe
that if you want to get parameters passed in, it looks like this:
WHERE batchid = ${dataimporter.request.batchid}
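So the entity reads the value from the request, and you supply it when you kick
off the import, e.g. /dataimport?command=full-import&batchid=42. A sketch with
an invented entity name:

  <entity name="job"
          query="select * from JOBS where BATCHID = '${dataimporter.request.batchid}'"/>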
when I kick
Swati
>
> -Original Message-
> From: Swati Swoboda [mailto:sswob...@igloosoftware.com]
> Sent: Thursday, August 09, 2012 11:09 PM
> To: solr-user@lucene.apache.org
> Subject: RE: DataImportHandler WARNING: Unable to resolve variable
>
> I am getting a similar issue
ose errors -
null values are just not accepted, it seems.
Swati
-Original Message-
From: Swati Swoboda [mailto:sswob...@igloosoftware.com]
Sent: Thursday, August 09, 2012 11:09 PM
To: solr-user@lucene.apache.org
Subject: RE: DataImportHandler WARNING: Unable to resolve variable
I am
I am getting a similar issue when while using a Template Transformer. My fields
*always* have a value as well - it is getting indexed correctly.
Furthermore, the number of warnings I get seems arbitrary. I imported one
document (debug mode) and I got roughly ~400 of those warning messages for th
I discovered the schema.xml file about 2 minutes before I got your response. It
was very enlightening:-)
thanks for the tips about dynamicFields!
On May 3, 2012, at 1:02 PM, Jack Krupansky wrote:
> Those three field names are already in the Solr example schema. Either
> manually add your desi
Those three field names are already in the Solr example schema. Either
manually add your desired fields to the schema, change their names (column
vs. sourceColName) to fields that do exist in your Solr schema, give them
names that end with one of the dynamicField suffixes (such as "*_s"), or
en
r/DataImportHandler#Special_Commands .
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
-Original Message-
From: janne mattila [mailto:jannepostilis...@gmail.com]
Sent: Thursday, March 29, 2012 12:45 AM
To: solr-user@lucene.apache.org
Subject: Re: dataImportHandler: delta query fetc
> I'm not sure why deltas were implemented this way. Possibly it was designed
> to behave like some of our object-to-relational libraries? In any case,
> there are 2 ways to do deltas and you just have to take your pick based on
> what will work best for your situation. I wouldn't consider th
You could use the Solr Command Utility SCU that runs from Windows and can be
scheduled to run.
https://github.com/justengland/Solr-Command-Utility
This is a Windows utility that will index using a core, and swap it in if it
succeeds. It works with Solr.
Let me know if you have any questions.
On
On 3/28/2012 12:46 PM, Artem Shnayder wrote:
Does anyone know of any work done to automatically run a backup prior to a
DataImportHandler full-import?
I've asked this question on #solr and was pointed to
https://wiki.apache.org/solr/SolrReplication?highlight=%28backup%29#HTTP_API
which
is helpfu
Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Artem Shnayder [mailto:artem@gmail.com]
> Sent: Wednesday, March 28, 2012 1:59 PM
> To: solr-user@lucene.apache.org
> Subject: Re: DataImportHandler: backups prio
esday, March 28, 2012 1:59 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler: backups prior to full-import
My typical workflow is a once-a-day full-import with hourly delta-imports.
Ideally, the backup would occur only during the full-import commits. Is
there a way to differentiate i
My typical workflow is a once-a-day full-import with hourly delta-imports.
Ideally, the backup would occur only during the full-import commits. Is
there a way to differentiate in the replication handler?
On Wed, Mar 28, 2012 at 11:54 AM, Dyer, James wrote:
> I don't know of any effort out there t
I don't know of any effort out there to have DIH trigger a backup
automatically. However, you can set the replication handler to automatically
backup after each commit. This might solve your problem if you aren't
committing frequently.
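A sketch of a replication handler configured that way (assuming your Solr
version supports the backupAfter option; values are illustrative):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="backupAfter">commit</str>
    </lst>
  </requestHandler>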
James Dyer
E-Commerce Systems
Ingram Content Group
(615)
Janne,
You're correct on how the delta import works. You specify 3 queries:
- deletedPkQuery = query should return all "id"s (only) of items that were
deleted since the last run.
- deltaQuery = query should return all "id"s (only) of items that were
added/updated since the last run.
- deltaImp
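Put together, a delta-capable root entity usually looks something like this
sketch (table and column names are assumptions):

  <entity name="item" pk="ID"
          query="select * from ITEM"
          deltaQuery="select ID from ITEM
                      where LAST_MODIFIED &gt; '${dataimporter.last_index_time}'"
          deletedPkQuery="select ID from ITEM_DELETED
                          where DELETED_AT &gt; '${dataimporter.last_index_time}'"
          deltaImportQuery="select * from ITEM where ID = '${dataimporter.delta.ID}'"/>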
How did it work before SOLR-811 update? I don't understand. Did it
fetch delta data with two queries (1. gets ids, 2. gets data per each
id) or did it fetch all delta data with a single query?
On Tue, Mar 27, 2012 at 5:45 PM, Ahmet Arslan wrote:
>> 2. If not - what's the reason delta import is im
> 2. If not - what's the reason delta import is implemented
> like it is?
> Why split it in two queries? I would think having a single
> delta query
> that fetches the data would be kind of an "obvious" design
> unless
> there's something that calls for 2 separate queries...?
I think this is it? h
On 2/20/2012 6:49 AM, v_shan wrote:
DIH still running out of memory for me, with Full Import on a database of
size 1.5 GB.
Solr version: 3_5_0
Note that I have already added batchSize="-1" but getting same error.
A few questions:
- How much memory have you given to the JVM running this Solr
DIH still running out of memory for me, with Full Import on a database of
size 1.5 GB.
Solr version: 3_5_0
Note that I have already added batchSize="-1" but getting same error.
Sharing my DIH config below.
On Jan 28, 2012, at 09:02 , mathieu lacage wrote:
> This deserves an entry in
> http://wiki.apache.org/solr/DataImportHandlerFaq which I would have
> updated but it is immutable. *hint to those who have
> edit powers there*
You can make yourself a wiki account and then edit the page. An account i
On Sat, Jan 28, 2012 at 10:35 AM, mathieu lacage wrote:
>
> (I have tried two different sqlite jdbc drivers so, I doubt it could
> be a problem there, but, who knows).
>
I eventually screamed really loud when I read the source code of the sqlite
jdbc drivers: they interpret the jdbcDataSource at
On 1/28/12, mathieu lacage wrote:
>
> On 28 Jan 2012, at 05:17, Lance Norskog wrote:
>
>> Do all of the documents have unique id fields?
>
> yes.
I have debugged this further with
http://localhost:8080/solr/admin/dataimport.jsp?handler=/dataimport
The returned xml file when I ask for verbose
On 28 Jan 2012, at 05:17, Lance Norskog wrote:
> Do all of the documents have unique id fields?
yes.
>
> On Fri, Jan 27, 2012 at 10:44 AM, mathieu lacage
> wrote:
>> On Fri, Jan 27, 2012 at 7:39 PM, mathieu lacage
>> wrote:
>>
>>>
>>> It seems to work but the following command reports
Do all of the documents have unique id fields?
On Fri, Jan 27, 2012 at 10:44 AM, mathieu lacage
wrote:
> On Fri, Jan 27, 2012 at 7:39 PM, mathieu lacage
> wrote:
>
>>
>> It seems to work but the following command reports that only 499 documents
>> were indexed (yes, there are many more documents
On Fri, Jan 27, 2012 at 7:39 PM, mathieu lacage
wrote:
>
> It seems to work but the following command reports that only 499 documents
> were indexed (yes, there are many more documents in my database):
>
And before anyone asks:
1
499
0
2012-01-27 19:37:16
Indexing completed. Added/Updated: 499 d
-Original Message-
From: Rob [mailto:rlusa...@gmail.com]
Sent: Tuesday, January 17, 2012 6:38 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler in Solr 4.0
Not a java pro, and the documentation hasn't been updated to include these
instructions (at least that I could find). What do I n
Not a java pro, and the documentation hasn't been updated to include these
instructions (at least that I could find). What do I need to do to perform
the steps that Alexandre is talking about?
Hey Rahul,
Thanks for the response. I actually just figured it out, thankfully :). To
answer your question, the raw_tag is indexed and not stored (tokenized),
and then there is a copyField for raw_tag to "raw_tag_string" which would
be used for facets. That *should have* been displayed in the results.
Hi Briggs,
By saying "multivalued fields are not getting indexed prperly", do you mean
to say that you are not able to search on those fields ?
Have you tried actually searching your Solr index for those multivalued
terms and make sure if it returns the search results ?
One possibility could be t
In addition, I tried a query like the one below and changed the column definition
to
and still no luck. It is indexing the full content now, but not as multivalued.
It seems like "splitBy" isn't working properly.
select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
from
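For what it's worth, splitBy only takes effect when the RegexTransformer is
declared on the entity; a sketch under that assumption (the separator in splitBy
has to match the one used in group_concat):

  <entity name="site" transformer="RegexTransformer"
          query="select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.* from site">
    <field column="raw_tag" splitBy=",\s*"/>
  </entity>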
: We're using DIH to import flat xml files. We're getting Heap memory
: exceptions due to the file size. Is there any way to force DIH to do a
: streaming parse rather than a DOM parse? I really don't want to chunk my
: files up or increase the heap size.
The XPathEntityProcessor is using a
: > Noble? Shalin? what's the point of throwing away a connection that's been
: > in use for more then 10 seconds?
: Hoss, as others have noted, DIH throws away connections which have been idle
: for more than the timeout value (10 seconds). The jdbc standard way of
: checking for a valid connec
It's best to run the data import once per minute. Solr updates work best when
updates are batched and commits are infrequent.
Doing a post per document as a transaction would require a solr commit,
which could cause the server to hang under update load. Of course you could
skip the commit, but your
On Sat, Sep 3, 2011 at 1:29 AM, Chris Hostetter wrote:
>
> : I am not sure if current version has this, but DIH used to reload
> : connections after some idle time
> :
> : if (currTime - connLastUsed > CONN_TIME_OUT) {
> : synchronized (this) {
> :
watch out, "running 10 hours" != "idling 10 seconds" and trying again.
Those are different cases.
It is not dropping *used* connections (good to know it works that
good, thanks for reporting!), just not reusing connections more than
10 seconds idle
On Fri, Sep 2, 2011 at 10:26 PM, Gora Mohanty
take care, "running 10 hours" != "idling 10 seconds" and trying again.
Those are different cases.
It is not dropping *used* connections (good to know it works that
good, thanks for reporting!), just not reusing connections more than
10 seconds idle
On Fri, Sep 2, 2011 at 10:26 PM, Gora Mohanty
On Sat, Sep 3, 2011 at 1:38 AM, Shawn Heisey wrote:
[...]
> I use DIH with MySQL. When things are going well, a full rebuild will leave
> connections open and active for over two hours. This is the case with
> 1.4.0, 1.4.1, 3.1.0, and 3.2.0. Due to some kind of problem on the database
> server,
On 9/2/2011 1:59 PM, Chris Hostetter wrote:
: I am not sure if current version has this, but DIH used to reload
: connections after some idle time
:
: if (currTime - connLastUsed> CONN_TIME_OUT) {
: synchronized (this) {
: Connection tmpConn =
: I am not sure if current version has this, but DIH used to reload
: connections after some idle time
:
: if (currTime - connLastUsed > CONN_TIME_OUT) {
: synchronized (this) {
: Connection tmpConn = factory.call();
:
I am not sure if current version has this, but DIH used to reload
connections after some idle time
if (currTime - connLastUsed > CONN_TIME_OUT) {
synchronized (this) {
Connection tmpConn = factory.call();
clos
: However, I tested this against a slower SQL Server and I saw
: dramatically worse results. Instead of re-using their database connection, each
: of the sub-entities is recreating a connection each time the query runs.
are you seeing any specific errors logged before these new connections are
created?