Re: Is there a way to create multiple <document> using DIH and access the data pertaining to a particular <document>?
(10/11/11 1:57), bbarani wrote:
> Hi,
>
> I have a peculiar situation where we are trying to use SOLR for indexing
> multiple tables (there is no relation between these tables). We are
> trying to use the SOLR index instead of the source tables, and hence we
> are trying to create the SOLR index to mirror the source tables.
>
> There are 3 tables which need to be indexed: table 1, table 2, and
> table 3.
>
> I am trying to index each table in a separate doc tag with a different
> doc tag name, and the tables share some common field names.

Barani,

You cannot have multiple documents in a data-config, but you can have
multiple entities in a document. And if your tables 1, 2, and 3 come from
different dataSources, you can have multiple data sources in a
data-config. In that case, use the dataSource attribute of the entity
element to refer to the name of the dataSource.

Koji
--
http://www.rondhuit.com/en/
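A sketch of the kind of data-config Koji describes, with multiple named
dataSources and one entity per table (all driver, connection, table, and
field names here are invented for illustration):

  <dataConfig>
    <dataSource name="ds1" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://host1/db1" user="u" password="p"/>
    <dataSource name="ds2" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://host2/db2" user="u" password="p"/>
    <document>
      <!-- each entity names its source via the dataSource attribute -->
      <entity name="table1" dataSource="ds1"
              query="select id, name from table1"/>
      <entity name="table2" dataSource="ds2"
              query="select id, name from table2"/>
    </document>
  </dataConfig>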
Re: Is there a way to create multiple <document> using DIH and access the data pertaining to a particular <document>?
Just curious, do these tables have the same schema, like a set of shards
would? If not, how do you map them to the index?

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a
better idea to learn from others’ mistakes, so you do not have to make
them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
EARTH has a Right To Life, otherwise we all die.

----- Original Message -----
From: Koji Sekiguchi
To: solr-user@lucene.apache.org
Sent: Sat, December 18, 2010 5:19:08 AM
Subject: Re: Is there a way to create multiple <document> using DIH and
access the data pertaining to a particular <document>?

> You cannot have multiple documents in a data-config, but you can have
> multiple entities in a document. [...]
RE: Memory use during merges (OOM)
Thanks Robert,

We will try the termIndexInterval as a workaround. I have also opened a
JIRA issue: https://issues.apache.org/jira/browse/SOLR-2290. Hope I found
the right sections of the Lucene code. I'm just now in the process of
looking at the Solr IndexReaderFactory, SolrIndexWriter, and
SolrIndexConfig, trying to better understand how solrconfig.xml gets
instantiated and how it affects the readers and writers.

Tom

From: Robert Muir [rcm...@gmail.com]

On Thu, Dec 16, 2010 at 4:03 PM, Burton-West, Tom wrote:
>>> Your setting isn't being applied to the reader IW uses during
>>> merging... it's only for readers Solr opens from directories
>>> explicitly. I think you should open a jira issue!
>
> Do I understand correctly that this setting in theory could be applied
> to the reader IW uses during merging but is not currently being applied?

Yes. I'm not really sure (especially given the "name=") if you can have,
or it was planned to have, multiple IR factories in Solr, e.g. a separate
one for spellchecking. So I'm not sure if we should (hackishly) steal this
parameter from the IR factory (it is common to all IR factories, not just
StandardIndexReaderFactory) and apply it to IW... but we could at least
expose the divisor param separately to the IW config so you have some way
of setting it.

>   <indexReaderFactory name="IndexReaderFactory"
>       class="org.apache.solr.core.StandardIndexReaderFactory">
>     <int name="setTermIndexDivisor">8</int>
>   </indexReaderFactory>
>
> I understand the tradeoffs for doing this during searching, but not the
> trade-offs for doing this during merging. Is the use during merging
> similar to the use during searching? i.e. Some process has to look up
> data for a particular term, as opposed to having to iterate through all
> the terms? (Haven't yet dug into the merging/indexing code.)

It needs it for applying deletes... As a workaround (if you are
reindexing), maybe instead of using a terms index divisor of 8 you could
set the terms index interval to 1024 (8 * 128)? This will solve your
merging problem, and have the same perf characteristics of divisor=8,
except you can't "go back down" like you can with the divisor without
reindexing with a smaller interval... If you've already tested that
performance with the divisor of 8 is acceptable, or in your case maybe
necessary, it sort of makes sense to 'bake it in' by setting your divisor
back to 1 and your interval to 1024 instead.
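For reference, a sketch of where that interval workaround would go in
solrconfig.xml (placement follows the stock example config of that era;
the value is the one from the thread):

  <indexDefaults>
    <!-- default is 128; 1024 = 8 * 128 bakes in roughly the memory
         profile of divisor=8, but cannot be lowered again without
         reindexing -->
    <termIndexInterval>1024</termIndexInterval>
  </indexDefaults>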
Re: how to config DataImport Scheduling
I think it should work with any version of Solr, because it works over the
URL (see the config file). Pay attention to this point: "Successfully
tested on Apache Tomcat v6 (should work on any other servlet container)".

From: Ahmet Arslan
To: solr-user@lucene.apache.org
Sent: Fri, December 17, 2010 3:22:37 AM
Subject: Re: how to config DataImport Scheduling

> I also have the same problem. I configured the dataimport.properties
> file as shown in
> http://wiki.apache.org/solr/DataImportHandler#dataimport.properties_example
> but no change occurs. Can anyone help me?

What version of Solr are you using? This seems to be a new feature, so it
won't work on Solr 1.4.1.
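For readers who cannot reach the wiki page: the scheduler described there
is driven by a dataimport.properties along these lines. This is a sketch
reconstructed from the wiki example; the values (cores, host, interval)
are illustrative only:

  # 1 - sync active; anything else - inactive
  syncEnabled=1
  # which cores to schedule (leave empty for a single-core setup)
  syncCores=core1,core2
  # Solr server name or IP address, port, and webapp name
  server=localhost
  port=8983
  webapp=solr
  # URL params passed to the import handler
  params=/select?qt=/dataimport&command=delta-import&clean=false&commit=true
  # schedule interval in minutes
  interval=30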
Re: Is there a way to create multiple <document> using DIH and access the data pertaining to a particular <document>?
You can have multiple documents generated by the same data-config. It's
the rootEntity="false" attribute that makes each child entity generate
documents.

On Sat, Dec 18, 2010 at 7:43 AM, Dennis Gearon wrote:
> Just curious, do these tables have the same schema, like a set of
> shards would? If not, how do you map them to the index?

--
Lance Norskog
goks...@gmail.com
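A sketch of a data-config along these lines, with rootEntity="false" on a
wrapper entity (all table, field, and connection names are invented):

  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://host/db"/>
    <document>
      <!-- rootEntity="false": rows from the child entities, not this
           wrapper, become Solr documents -->
      <entity name="tables" rootEntity="false" query="select 1">
        <entity name="table1" query="select id, name from table1"/>
        <entity name="table2" query="select id, name from table2"/>
        <entity name="table3" query="select id, name from table3"/>
      </entity>
    </document>
  </dataConfig>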
old index files not deleted on slave
I have set up index replication (triggered on optimize). The problem I am
having is that the old index files are not being deleted on the slave.
After each replication, I can see the old files still hanging around as
well as the files that have just been pulled. This causes the data
directory size to increase by the index size on every replication until
the disk fills up.

Checking the logs, I see the following error:

SEVERE: SnapPull failed
org.apache.solr.common.SolrException: Index fetch failed :
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:329)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:265)
        at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:
NativeFSLock@/var/solrhome/data/index/lucene-cdaa80c0fefe1a7dfc7aab89298c614c-write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:84)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1065)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:954)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:192)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:99)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
        at org.apache.solr.update.DirectUpdateHandler2.forceOpenWriter(DirectUpdateHandler2.java:376)
        at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:471)
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:319)
        ... 11 more

lsof reveals that the file is still opened by the java process.

I am running 4.0 rev 993367 with patch SOLR-1316. Otherwise, the setup is
pretty vanilla. The OS is Linux, the indexes are on local directories,
write permissions look OK, and nothing is unusual in the config (default
deletion policy, etc.).
Contents of the index data dir:

master:
-rw-rw-r-- 1 feeddo feeddo  191 Dec 14 01:06 _1lg.fnm
-rw-rw-r-- 1 feeddo feeddo  26M Dec 14 01:07 _1lg.fdx
-rw-rw-r-- 1 feeddo feeddo 1.9G Dec 14 01:07 _1lg.fdt
-rw-rw-r-- 1 feeddo feeddo 474M Dec 14 01:12 _1lg.tis
-rw-rw-r-- 1 feeddo feeddo  15M Dec 14 01:12 _1lg.tii
-rw-rw-r-- 1 feeddo feeddo 144M Dec 14 01:12 _1lg.prx
-rw-rw-r-- 1 feeddo feeddo 277M Dec 14 01:12 _1lg.frq
-rw-rw-r-- 1 feeddo feeddo  311 Dec 14 01:12 segments_1ji
-rw-rw-r-- 1 feeddo feeddo  23M Dec 14 01:12 _1lg.nrm
-rw-rw-r-- 1 feeddo feeddo  191 Dec 18 01:11 _24e.fnm
-rw-rw-r-- 1 feeddo feeddo  26M Dec 18 01:12 _24e.fdx
-rw-rw-r-- 1 feeddo feeddo 1.9G Dec 18 01:12 _24e.fdt
-rw-rw-r-- 1 feeddo feeddo 483M Dec 18 01:23 _24e.tis
-rw-rw-r-- 1 feeddo feeddo  15M Dec 18 01:23 _24e.tii
-rw-rw-r-- 1 feeddo feeddo 146M Dec 18 01:23 _24e.prx
-rw-rw-r-- 1 feeddo feeddo 283M Dec 18 01:23 _24e.frq
-rw-rw-r-- 1 feeddo feeddo  311 Dec 18 01:24 segments_1xz
-rw-rw-r-- 1 feeddo feeddo  23M Dec 18 01:24 _24e.nrm
-rw-rw-r-- 1 feeddo feeddo  191 Dec 18 13:15 _25z.fnm
-rw-rw-r-- 1 feeddo feeddo  26M Dec 18 13:16 _25z.fdx
-rw-rw-r-- 1 feeddo feeddo 1.9G Dec 18 13:16 _25z.fdt
-rw-rw-r-- 1 feeddo feeddo 484M Dec 18 13:35 _25z.tis
-rw-rw-r-- 1 feeddo feeddo  15M Dec 18 13:35 _25z.tii
-rw-rw-r-- 1 feeddo feeddo 146M Dec 18 13:35 _25z.prx
-rw-rw-r-- 1 feeddo feeddo 284M Dec 18 13:35 _25z.frq
-rw-rw-r-- 1 feeddo feeddo   20 Dec 18 13:35 segments.gen
-rw-rw-r-- 1 feeddo feeddo  311 Dec 18 13:35 segments_1y1
-rw-rw-r-- 1 feeddo feeddo  23M Dec 18 13:35 _25z.nrm

slave:
-rw-rw-r-- 1 feeddo feeddo   20 Dec 13 17:54 segments.gen
-rw-rw-r-- 1 feeddo feeddo  191 Dec 15 01:07 _1mk.fnm
-rw-rw-r-- 1 feeddo feeddo  26M Dec 15 01:08 _1mk.fdx
-rw-rw-r-- 1 feeddo feeddo 1.9G Dec 15 01:08 _1mk.fdt
-rw-rw-r-- 1 feeddo feeddo 476M Dec 15 01:18 _1mk.tis
-rw-rw-r-- 1 feeddo feeddo  15M Dec 15 01:18 _1mk.tii
-rw-rw-r-- 1 feeddo feeddo 144M Dec 15 01:18 _1mk.prx
-rw-rw-r-- 1 feeddo feeddo 278M Dec 15 01:18 _1mk.frq
-rw-rw-r-- 1 feeddo feeddo  312 Dec 15 01:18 segments_1kj
-rw-rw-r-- 1 feeddo feeddo  23M Dec 15 01:18 _1mk.nrm
-rw-rw-r-- 1 feeddo feeddo
Re: Is there a way to create multiple <document> using DIH and access the data pertaining to a particular <document>?
And, a use case: Tika blows up on some files, but we still want the other
data like file name etc., and an empty text field. So: both documents get
the same unique id. If the Tika auto-parser reads the PDF and the PDF
works, the second document overwrites the first. If the PDF blows up, the
second document is skipped and the first document goes in. Ugly, yes, but
a testament to the maturity of DIH that it had enough tools to work around
a Tika weakness.

Oh, and the AutoParser does not work: SOLR-2116:
https://issues.apache.org/jira/browse/SOLR-2116

In my previous example, the innermost entities below should be <entity>,
not <document>. Sorry for any confusion.

On Sat, Dec 18, 2010 at 4:22 PM, Lance Norskog wrote:
> You can have multiple documents generated by the same data-config.
> It's the rootEntity="false" attribute that makes each child entity
> generate documents.

--
Lance Norskog
goks...@gmail.com
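A rough sketch of the shape such a data-config might take. Every name,
path, and attribute choice here is invented, and version-specific details
(for example the TikaEntityProcessor attributes) may need adjusting:

  <dataConfig>
    <dataSource name="bin" type="BinFileDataSource"/>
    <document>
      <!-- pass 1: one document per file, metadata only (the real config
           would also add an empty text field) -->
      <entity name="meta" processor="FileListEntityProcessor"
              baseDir="/data/docs" fileName=".*\.pdf" rootEntity="true"
              transformer="TemplateTransformer">
        <field column="id" template="${meta.fileAbsolutePath}"/>
      </entity>
      <!-- pass 2: same ids. If Tika parses the PDF, this document
           overwrites the pass-1 document; if Tika blows up,
           onError="skip" drops the row and the pass-1 document
           survives. -->
      <entity name="files" processor="FileListEntityProcessor"
              baseDir="/data/docs" fileName=".*\.pdf" rootEntity="false"
              transformer="TemplateTransformer">
        <field column="id" template="${files.fileAbsolutePath}"/>
        <entity name="tika" processor="TikaEntityProcessor"
                dataSource="bin" url="${files.fileAbsolutePath}"
                format="text" onError="skip">
          <field column="text" name="text"/>
        </entity>
      </entity>
    </document>
  </dataConfig>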
Re: old index files not deleted on slave
This could be a quirk of the native locking feature. What's the file
system? Can you fsck it?

If this error keeps happening, please file a JIRA issue. It should not
happen. Add the text above, and also your solrconfigs if you can.

One thing you could try is to change from the native locking policy to
the simple locking policy, but only on the slave.

On Sat, Dec 18, 2010 at 4:44 PM, feedly team wrote:
> I have set up index replication (triggered on optimize). The problem I
> am having is that the old index files are not being deleted on the
> slave. [...]
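A sketch of that lockType switch as it might look in the slave's
solrconfig.xml (placement follows the stock example config):

  <mainIndex>
    <!-- "native" maps to NativeFSLockFactory; "simple" uses a plain
         lock file via SimpleFSLockFactory -->
    <lockType>simple</lockType>
  </mainIndex>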
DIH for sharded database?
I have a table that is broken up into many virtual shards. So basically I
have N identical tables:

Document1
Document2
.
.
Document36

Currently these tables all live in the same database, but in the future
they may be moved to different servers to scale out if the need arises.

Is there any way to configure a DIH for these tables so that it will
automatically loop through the 36 identical tables and pull data out for
indexing? Something like (pseudocode):

for (i = 1; i <= 36; i++) {
    ## retrieve data from the table Document{$i} and index the data
}

What's the best way to handle a situation like this?

Thanks
Re: DIH for sharded database?
You can have a file with 1, 2, 3... on separate lines. There is a
line-by-line file reader (LineEntityProcessor) that can pull these in as
separate drivers. Inside that entity, the JDBC URL has to be altered with
the incoming numbers. I don't know if this will work. It may also work
for single-threaded DIH but not with multiple threads. (Ignore this for
Solr 1.4; it has no threads feature.)

On Sat, Dec 18, 2010 at 6:20 PM, Andy wrote:
> I have a table that is broken up into many virtual shards. So basically
> I have N identical tables: Document1, Document2, ... Document36.
> Is there any way to configure a DIH for these tables so that it will
> automatically loop through the 36 identical tables and pull data out
> for indexing? What's the best way to handle a situation like this?

--
Lance Norskog
goks...@gmail.com
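A sketch of that approach (file, table, and connection names invented).
LineEntityProcessor reads shards.txt through a FileDataSource and exposes
each line as ${shard.rawLine}, which here is substituted into the child
entity's SQL query (the variant the follow-up message asks about) rather
than into the JDBC URL:

  <dataConfig>
    <dataSource name="db" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/mydb"/>
    <dataSource name="shardfile" type="FileDataSource"/>
    <document>
      <!-- shards.txt holds one shard number per line: 1, 2, ... 36 -->
      <entity name="shard" dataSource="shardfile"
              processor="LineEntityProcessor" url="shards.txt"
              rootEntity="false">
        <entity name="doc" dataSource="db"
                query="select * from Document${shard.rawLine}"/>
      </entity>
    </document>
  </dataConfig>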
Re: DIH for sharded database?
--- On Sat, 12/18/10, Lance Norskog wrote:
> You can have a file with 1, 2, 3 on separate lines. There is a
> line-by-line file reader that can pull these in as separate drivers.
> Inside that entity, the JDBC URL has to be altered with the incoming
> numbers. I don't know if this will work.

I'm not sure I understand. How will altering the JDBC URL change the name
of the table it is importing data from? Wouldn't I need to change the
actual SQL query itself?

"select * from Document1"
"select * from Document2"
...
"select * from Document36"