I am considering a ZFS/iSCSI-based architecture for an HA, scale-out, shared-nothing database platform running on wimpy nodes of plain PC hardware under FreeBSD 9.
Will it work? What are possible drawbacks?
Architecture
Storage nodes have direct attached cheap SATA/SAS drives. Each disk is exported as a separate iSCSI LUN. Note that no RAID (neither HW nor SW), partitioning, volume management or anything like that is involved at this layer. Just 1 LUN per physical disk.
Database nodes run ZFS. A ZFS mirrored vdev is created from 3 iSCSI LUNs, each from a different storage node. A ZFS pool is created on top of the vdev, and within that a filesystem, which in turn backs a database.
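To make the layout concrete, here is a minimal sketch of the commands a database node might issue. It only builds the argument lists for `zpool create` and `zfs create` and does not run them. The pool name `dbpool`, the dataset name `pgdata`, and the device names `da1` to `da3` are assumptions for illustration; the actual device names depend on how the iSCSI initiator exposes the LUNs.

```python
from typing import List


def build_pool_commands(pool: str, dataset: str, luns: List[str]) -> List[List[str]]:
    """Return argv lists that would create a 3-way mirrored pool and one filesystem.

    `luns` are the local device names of the iSCSI LUNs (hypothetical names here),
    one per storage node.
    """
    if len(luns) != 3:
        raise ValueError("expected exactly 3 LUNs, one per storage node")
    if len(set(luns)) != len(luns):
        raise ValueError("each LUN must be a distinct device")
    return [
        # One mirror vdev made of all three LUNs, forming the whole pool.
        ["zpool", "create", pool, "mirror", *luns],
        # The filesystem that backs the database.
        ["zfs", "create", f"{pool}/{dataset}"],
    ]


if __name__ == "__main__":
    for argv in build_pool_commands("dbpool", "pgdata", ["da1", "da2", "da3"]):
        print(" ".join(argv))
```

The function deliberately knows nothing about which storage node a device comes from; checking that the 3 LUNs really sit on 3 different nodes would have to happen in whatever assigns the LUNs.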
When a disk or a storage node fails, the respective ZFS vdev continues to operate in degraded mode (but still with 2 mirrored disks). A different (new) disk is assigned to the vdev to replace the failed disk or storage node, and ZFS resilvering takes place. A failed storage node or disk is always completely recycled should it become available again.
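The replacement step could be sketched as follows. This is an illustration under assumptions, not a tested procedure: the bookkeeping of which device lives on which storage node (`members`, `free_luns`) is hypothetical and would have to be maintained outside ZFS. The only ZFS command used is `zpool replace`, which swaps a device in a pool and triggers resilvering.

```python
from typing import Dict, List


def plan_replacement(
    pool: str,
    members: Dict[str, str],    # device -> storage node, current mirror members
    failed: str,                # device that failed
    free_luns: Dict[str, str],  # device -> storage node, unassigned LUNs
) -> List[str]:
    """Pick a free LUN on a node not used by the surviving members and
    return the `zpool replace` argv for it."""
    if failed not in members:
        raise KeyError(f"{failed} is not a member of {pool}")
    surviving_nodes = {node for dev, node in members.items() if dev != failed}
    # Sorted only to make the choice deterministic.
    for dev, node in sorted(free_luns.items()):
        if node not in surviving_nodes:
            return ["zpool", "replace", pool, failed, dev]
    raise LookupError("no free LUN on a storage node outside the surviving mirror")


if __name__ == "__main__":
    members = {"da1": "storage1", "da2": "storage2", "da3": "storage3"}
    free = {"da4": "storage2", "da5": "storage4"}
    print(" ".join(plan_replacement("dbpool", members, "da3", free)))
```

Skipping LUNs on nodes that already hold a surviving mirror member keeps the "3 different storage nodes" property intact after the replacement.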
When a database node fails, the LUNs previously used by that node become free. A new database node is booted, which recreates the ZFS vdev/pool from the LUNs the failed database node left behind. There is no need for database-level replication for high-availability reasons.
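A sketch of the takeover step on the replacement database node, assuming the existing pool is picked up with `zpool import` rather than created anew. Since the failed node never exported the pool, the sketch assumes the `-f` (force) flag is needed. The pool name `dbpool` is hypothetical.

```python
from typing import List


def build_import_command(pool: str, force: bool = True) -> List[str]:
    """Return the argv for importing an existing pool on a new node.

    force=True adds -f, which is assumed to be required here because the
    failed node did not export the pool before it went away.
    """
    argv = ["zpool", "import"]
    if force:
        argv.append("-f")
    argv.append(pool)
    return argv


if __name__ == "__main__":
    print(" ".join(build_import_command("dbpool")))
```

A forced import is only safe if the old database node is really gone: ZFS is not a cluster filesystem, so two nodes importing the same pool at once would damage it.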
Possible Issues
How to detect the degradation of the vdev? Check every 5 s? Is there any notification mechanism available with ZFS?
Is it even possible to recreate a new pool from existing LUNs making up a vdev? Any traps?
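For the detection question, the polling variant could look roughly like this. It is a sketch under assumptions, not an answer to whether ZFS offers a push notification. It relies on `zpool list -H -o name,health`, which prints one line per pool without headers; the 5 s interval is the one floated above, and what to do on a degraded pool is left as a `print`.

```python
import subprocess
import time
from typing import List, Tuple


def unhealthy_pools(zpool_list_output: str) -> List[Tuple[str, str]]:
    """Parse `zpool list -H -o name,health` output and return every
    (pool, health) pair whose health is not ONLINE."""
    result = []
    for line in zpool_list_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        name, health = fields
        if health != "ONLINE":
            result.append((name, health))
    return result


def poll(interval_s: float = 5.0) -> None:
    """Poll pool health forever; runs the real zpool binary."""
    while True:
        out = subprocess.run(
            ["zpool", "list", "-H", "-o", "name,health"],
            capture_output=True, text=True, check=True,
        ).stdout
        for name, health in unhealthy_pools(out):
            print(f"pool {name} is {health}")
        time.sleep(interval_s)


if __name__ == "__main__":
    poll()
```

Keeping the parser separate from the loop makes it possible to test the detection logic without a ZFS host.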