This could potentially be very long post, but I am going to stick to the high-level objectives only. Leave a comment if you want more details and I will reply to you.
OBJECTIVE: Your users are stored in an ADAM database and your ASP.net website sits on top of this utilizing the ASP.Net membership framework to interface with ADAM to authenticate users. You want to make sure that you are covered in case of any disaster scenarios (like disk corruption, ADAM server blowing up, and manually unintended corruption by your system admins)
PROBLEM: If you notice carefully, we are talking two different things here.
- Hardware failures – i.e Poof!!!! and your ADAM instance just disappears. Panic your website is down!!!
- Manual data corruption – So, if your sysadmin does something foolish and say he updates all users with the same last name using a vbs script or something. This is more insidious because your website is not down, ADAM is not down but your user data is corrupt.
SOLUTION: So although we have two distinct ways of getting into trouble, the end result is the same and the solutions are also the same. But first let us talk about what is required at the minimum to recover from a failure scenario.
ADAM Replication – Fortunately, for us, ADAM (or LDS) comes out of the box with support for replication. What this means, is once your main ADAM instance is up and running you can install multiple ADAM instances on other servers as “replicated instances” and all these servers magically know how to talk to each other and keep their data in sync with each other.
Plus, it gives you the flexibility of turning on “two way” replication. i.e you change data on the replicated instance and the main ADAM server reflects these changes. You have the option of staggered replication. i.e the replicated instances will receive deltas from the main server only after XX minutes or hours. Any light bulbs yet on how you will use this to recover from bad things happening?
Windows Backups – I know, I know. Nobody uses Windows Backup and Restore. But this is the perfect place to use it. You simply setup a backup job which backs up your ADAM directory to a file server. We have done it where I work and this gives us nightly backups for every day going back to last 60 days. Also, ADAM has a lock on the files on disk, but Windows Backup used Volume Shadow Copy which takes care backing up files even if they are locked by a process. We use the append option, so backups are not overwritten every night, but are appended (Keep an eye on that backup file though…it can grow pretty fast !!!)
So, now that we know the proper way to protect us from bad things. Here is how you will apply it to various situations:
- Hardware failures – If your main ADAM server blows up, you could simply point your website to the replicated instance since it has the latest and greatest data.
- Data Corruption by SysAdmin – So, if your sysadmin writes a script which updates every one’s SSN to 000-00-0000 for example, you could either restore the ADAM data from last night’s backup (believe me it is amazing simple to overwrite ADAM data from backup and get up and running in no time). Or, if you had staggered replication setup (i.e replicated instances receive change deltas only after 1 hour and your sysadmin reports the data corruption to you in time, you can shutdown the main ADAM instance and point your website to the replicated instances because they still have good data.
Bottom-line, you can use replicated instances for instantly recovering from the failure scenario, while you are busy building the main instance from the backups. If you have two way replication setup then when you bring the main instance online, the replicated instances will send their deltas back to the main instance (so for example 100 users signed up before you could restore the main instance from backups), these 100 users are only in the replicated instance now. But the moment you bring back the main ADAM instance up, replication will send these users back to the main instance and you will be in sync.