Best Practice: AWS FTP with file processing
I'm looking for some direction on an AWS architectural decision. My goal is to allow users to FTP a file to an EC2 instance and then run some analysis on the file. My focus is to build this in as service-oriented a way as possible, and in the future scale it out to multiple clients, where each would have their own FTP server and processing queue with no co-mingling of data.
Currently I have a dev EC2 instance with vsftpd installed and a Node.js process running Chokidar that continuously watches for new files to be dropped. When a file drops, I'd like another server (or group of servers) to be notified so it can fetch the file and process it.
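For context, the watcher side is roughly the sketch below; the upload directory and the awaitWriteFinish tuning are just what I'm experimenting with, not anything final:

    // watcher.js - rough sketch of the current Chokidar setup (paths are placeholders)
    const chokidar = require('chokidar');

    // Watch the vsftpd upload directory for new files
    const watcher = chokidar.watch('/home/ftpuser/uploads', {
      ignoreInitial: true,        // skip files already present at startup
      awaitWriteFinish: {         // wait until the FTP upload has finished writing
        stabilityThreshold: 2000,
        pollInterval: 500
      }
    });

    watcher.on('add', (filePath) => {
      console.log(`New file dropped: ${filePath}`);
      // TODO: hand the file off to the processing tier -- this is the part I'm unsure about
    });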
Should the FTP server move the file to S3 and then use SQS to let the pool of processing servers know that it's ready for processing? Or should I use SQS and have the pool of servers SSH into the FTP instance (or take some other approach) to get the file, rather than using S3 as an intermediary? Are there better approaches?
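To make the first option concrete, this is roughly the handoff I'm picturing inside the watcher callback; the bucket name, queue URL, region, and account number below are placeholders, not anything I've built yet:

    // handoff.js - sketch of the S3 + SQS handoff I'm considering (all names are placeholders)
    const fs = require('fs');
    const path = require('path');
    const AWS = require('aws-sdk');

    const s3 = new AWS.S3({ region: 'us-east-1' });
    const sqs = new AWS.SQS({ region: 'us-east-1' });

    const BUCKET = 'client-a-uploads';   // one bucket (or prefix) per client
    const QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/client-a-processing';

    async function handOff(filePath) {
      const key = path.basename(filePath);

      // 1. Copy the uploaded file from the FTP box into S3
      await s3.upload({
        Bucket: BUCKET,
        Key: key,
        Body: fs.createReadStream(filePath)
      }).promise();

      // 2. Tell the processing pool where to find it
      await sqs.sendMessage({
        QueueUrl: QUEUE_URL,
        MessageBody: JSON.stringify({ bucket: BUCKET, key: key })
      }).promise();
    }

    module.exports = { handOff };

The idea would be that the processing servers long-poll the queue, download the object from S3, and delete the message only once the analysis succeeds.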
Any guidance is very much appreciated. Feel free to school me on any
alternate ideas that might save money at high file volume.
Thank you for the help.