
amazon ec2 - AWS EC2 Mailserver Failover Strategies done right

I've been researching this topic intensively over the last few days and want to discuss it with a few specific questions. I didn't find any suitable thread here that covers my needs and is also reasonably current: most posts on this topic are from around 2010, which I guess was the last time AWS had a big failure (a whole US region was down, if I remember right).



The current state:



We're running a mailserver on Ubuntu with Postfix/Dovecot/Horde, reading all mail-related configuration from a MySQL database. It runs as an EC2 instance with EBS storage holding the OS and, currently, also the mails. So far so good, but we're a startup, not a private person who just needs this server for themselves: it is a mail service for our customers, super critical and very important for us. After a few failures and downtimes in the first year, I want to dramatically improve the setup, so basically I started thinking about "redundancy"...
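To illustrate the "configs out of MySQL" part: on the Dovecot side such a setup looks roughly like the fragment below. Table and column names here are placeholders, not our actual schema.

```
# /etc/dovecot/dovecot-sql.conf.ext (illustrative values only)
driver = mysql
connect = host=127.0.0.1 dbname=mailserver user=dovecot password=secret
default_pass_scheme = SHA512-CRYPT

# hypothetical virtual_users table with email/password columns
password_query = SELECT email AS user, password FROM virtual_users WHERE email = '%u'
user_query = SELECT '/var/vmail/%d/%n' AS home, 5000 AS uid, 5000 AS gid FROM virtual_users WHERE email = '%u'
```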



The requirement:



The server must be "redundant" in some way: a failure of a single EC2 instance should no longer break the whole service.




My research so far and the options I see:




  • Copy the instance into another region, for example, and build "real" redundancy. A little old-fashioned, but that's what I learned back in school: use the new server as a backup MX, configured through a second MX record in DNS with lower priority. The problem here is solving the data redundancy: I would need rsync and database replication, for example, to keep both servers in sync. Not the option I want to implement, because it can get super tricky...


  • A service-driven solution that uses the AWS building blocks properly: RDS for the database and S3 for storage. If all the mails live in the storage cloud (S3) and all the config data in the database cloud (RDS), the instance itself becomes super flexible. That would let me run several instances of this type at the same time, so I could use ELB to balance the load, start new instances, and detect failures when one instance dies! On top of that, my critical data stores, the database and the mail storage, would be managed services, so I would no longer have to think about failover, downtime and, most importantly, scalability! By far the best solution I can imagine so far, but I see some serious problems.
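For completeness, the DNS side of the backup-MX idea from the first option is the easy part. A zone file sketch with placeholder names and addresses (note that the *higher* preference number marks the backup, i.e. the lower priority):

```
; zone fragment for example.com -- all names and IPs are placeholders
example.com.  IN  MX  10  mx1.example.com.   ; primary mailserver
example.com.  IN  MX  20  mx2.example.com.   ; backup MX, tried only if mx1 is down
mx1           IN  A   203.0.113.10           ; primary EC2 instance (Elastic IP)
mx2           IN  A   198.51.100.20          ; standby instance in another region
```

The hard part, as noted, is that mx2 must also have the mail data and the MySQL config data, which is where rsync and replication come in.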




Final Questions:





  • I have never seen a good integration of S3 directly into the Ubuntu filesystem. My experience is that after a few days of continuous operation the mount can suddenly disappear for no apparent reason, and on top of that, multiple mounted S3 "drives" replicate their data very, very slowly. I can understand that, since it's a global cloud service, but how is this supposed to work? Imagine multiple running mailserver instances, each using the same S3 "drive": the mail data would have to replicate instantly! So how can we "implement" a service-driven mail storage that really works on AWS? Has anyone ever built something like this? Everywhere I just read "yeah, you have to use AWS services to solve that", but I can't find real implementations of it for mail.


  • Would an EBS-based solution be better? Each running instance would have its own dedicated volume, highly available and fast, and again I would set up rsync to keep them in sync. The big drawback here is huge costs: every instance needs a huge EBS volume, because each one has to store ALL the mails -> nonsense ^^
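One way to think about the S3 question above is to stop mounting S3 as a filesystem altogether and instead write every message as its own immutable object under a Maildir-style unique key, so concurrent instances never write to the same name and there is nothing to "replicate between drives". Below is a minimal sketch of such key generation; the bucket layout and mailbox naming are my assumptions, not an existing tool:

```python
import os
import socket
import time

def maildir_key(mailbox: str, counter: int) -> str:
    """Build a Maildir-style unique S3 object key for one delivered message.

    Each message is written exactly once under a name that no other
    delivery process can produce, so multiple mailserver instances can
    deliver into the same bucket without overwriting each other.
    """
    now = time.time()
    sec = int(now)
    usec = int((now - sec) * 1_000_000)
    # <sec>.M<usec>P<pid>Q<counter>.<host> mirrors the Maildir naming scheme
    unique = f"{sec}.M{usec}P{os.getpid()}Q{counter}.{socket.gethostname()}"
    return f"{mailbox}/new/{unique}"

# Example: the key an instance would PUT the raw message bytes under
key = maildir_key("customers/alice/INBOX", 1)
```

Each instance would then upload the raw message under that key and list the mailbox prefix to read; because objects are write-once, the slow cross-mount replication problem of a shared s3fs "drive" does not apply.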




Is there any other failover scenario with AWS that I don't know about yet? Sorry for the long text, but I wanted to share all my thoughts so far... Thanks for reading, if anyone does! :)
