I recommend method 2 (a separate server pulls the files from the JBoss servers and then writes them to HDFS), also for security reasons.
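
A rough sketch of what the pull layout could look like, run from cron on the separate server. This is only an illustration under assumptions, not a tested setup: the host name, directories, and HDFS target path are hypothetical placeholders, and it assumes the separate server can reach the JBoss host over SSH, has rsync and the hadoop client installed, and that the "output" directory is flat and only ever contains fully written files.

```python
import subprocess
from pathlib import Path

# Hypothetical placeholders; none of these names come from the original setup.
JBOSS_HOST = "jboss01.example.internal"
REMOTE_OUTPUT_DIR = "/var/app/output/"
STAGING_DIR = Path("/data/staging")
HDFS_TARGET_DIR = "/incoming/jboss"


def build_pull_command(host, remote_dir, staging_dir):
    """Build an rsync-over-SSH command that is initiated by the separate server.

    --remove-source-files deletes each file on the JBoss side once it has been
    transferred, so this assumes files only appear in the output directory
    after they are completely written (for example via a rename).
    """
    return [
        "rsync", "-a", "--remove-source-files",
        "-e", "ssh",
        f"{host}:{remote_dir}",
        f"{staging_dir}/",
    ]


def build_put_command(local_file, hdfs_dir):
    """Build the command that copies one staged file into HDFS."""
    return ["hadoop", "fs", "-put", str(local_file), hdfs_dir]


def run_once():
    """Pull new files from the JBoss host, then push each one to HDFS."""
    STAGING_DIR.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        build_pull_command(JBOSS_HOST, REMOTE_OUTPUT_DIR, STAGING_DIR),
        check=True,
    )
    for staged in sorted(STAGING_DIR.iterdir()):
        if staged.is_file():
            # check=True raises if the put fails, so the staged copy is kept
            # and can be retried on the next run.
            subprocess.run(build_put_command(staged, HDFS_TARGET_DIR), check=True)
            staged.unlink()
```

With something like this, the JBoss servers never open a connection to HDFS; only the separate server does, and it initiates both transfers. Calling run_once() from a cron entry on that server would keep the 5-minute schedule from the first implementation.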
2012/5/12 financeturd financeturd <[email protected]>
> Hello,
>
> We have a large number of custom-generated files (not just web logs)
> that we need to move from our JBoss servers to HDFS. Our first
> implementation ran a cron job every 5 minutes to move our files from
> the "output" directory to HDFS.
>
> Is this recommended? We are being told by our IT team that our JBoss
> servers should not have access to HDFS for security reasons. The files
> must be "sucked" to HDFS by other servers that do not accept traffic
> from the outside. In essence, they are asking for a layer of
> indirection. Instead of:
> {JBoss server} --> {HDFS}
> it's being requested that it look like:
> {Separate server} <-- {JBoss server}
> and then
> {Separate server} --> {HDFS}
>
>
> While I understand in principle what is being said, the security of having
> processes on JBoss servers writing files to HDFS doesn't seem any worse
> than having Tomcat servers access a central database, which they do.
>
> Can anyone comment on what a recommended approach would be? Should our
> JBoss servers push their data to HDFS or should the data be pulled by
> another server and then placed into HDFS?
>
> Thank you!
> FT
--
Regards
Junyong