DIY Network-Attached Storage on Linux

It’s nice to have a NAS at home for file sharing and automated backup, but commercial NASes are expensive for their performance, and not as flexible as a real PC.

This is a guide (mostly a memo for myself) to building a Linux-based NAS from scratch, both hardware and software. It assumes the reader has some understanding of PCs (knows how to assemble one, or select a pre-built one by its specifications), and at least some Linux knowledge.

The end result will be a NAS that is accessible over CIFS (“Network Neighbourhood”) and SFTP (useful for accessing over the internet), provides snapshot incremental backups (time travel!!), and runs on some kind of RAID.

There are nice commercial units that can do all those things, such as the Synology DS-211j, but they are slow for their price, you can’t re-purpose them after end of life, and you can’t make them do other non-NAS things (running a game server, etc.).

However, if, like me, you have some time on your hands and some skills, you can build a very high-performance NAS for cheap. Even better if you have an old PC lying around: it could be free!

Hardware

If you have a PC lying around, with SATA ports, you are all set. IDE is possible too, but if you don’t already have big enough IDE drives, you’d have to buy them, and I don’t recommend buying IDE drives: they are already obsolete, and no new computer can use them. If you do already have huge IDE drives, then by all means use them.

Otherwise, if you are like me, without a usable old computer (the P3 I’ve been using as a NAS for the past 5 years just decided to commit suicide, though of course, I was able to get all the data out), it’s shopping time!

Harddrives

First of all, you’ll need to decide how many harddrives you want, which is determined by how much space you need, and what level of redundancy you want (http://en.wikipedia.org/wiki/RAID). All standard RAID levels are supported by Linux’s software RAID implementation, which is what we will be using. I don’t recommend going with a single-drive system, because if the harddrive fails (I’ve personally had 3 fail on me), it doesn’t matter how much backup you have on that same harddrive…

Speed also doesn’t matter too much, because network speed will most likely be the bottleneck. Definitely don’t use SSDs in a NAS; that’s an epic waste of money for nothing.

In this example, I will be using 2x1TB WD Caviar Green drives I had in my old server, in RAID-1 configuration.

Motherboard + CPU + RAM + PSU

I recommend an Intel Atom (as of this writing), because CPU speed is not important unless you are doing RAID-5 (in which case it’s marginally important). Make sure the motherboard has enough SATA ports for your setup. I am using an Intel D525MW. Intel boards are known for their reliability, and I really like the fanless design, because fans do fail once in a while, and heatsinks never fail. Fans also increase dust build-up, increasing the need for maintenance.

Memory is also not very important. 256MB would probably do, but you can’t buy anything that small nowadays. I’ll be using 2x2GB DDR3 sticks from my old laptop (the D525MW takes laptop RAM). More free memory will increase performance somewhat, because Linux caches your most used files in RAM and serves them lightning fast when you need them again. In a NAS, though, the network is relatively slow, so this mostly helps when you access many small files.

Power supply is important. Pick one with good reviews. Anything above 200W will do, unless you are planning a 20-drive array or something, in which case you wouldn’t be reading this article because you already know what you are doing better than I do. The PSU is the one thing you shouldn’t cheap out on!!! A good PSU fails by shutting itself down. A bad PSU fails by sending 200V into your components, and optionally setting your house on fire. You want the former. Anything from Antec, Corsair, or Seasonic should be fine, but check reviews online. I will be using a Corsair 430W (CMPSU-430CXV2).

If the motherboard you picked doesn’t have gigabit ethernet, you’ll definitely want to pick up a gigabit card. They are very cheap nowadays and will make your NAS 5-6x faster. Intel cards are known to work very well in Linux, and are very fast.

Total cost for me (NCIX, with aggressive price matching):

  1. D525MW – $79
  2. Corsair 430W PSU – $40
  3. 4GB USB stick to install the OS on (to keep things simpler, I like to have my data drives dedicated to data, and have the OS somewhere else) – $5

Total is $125 without harddrives, but this depends on what hardware you have lying around that you can reuse.

Network

If your home network is already gigabit, you are all set. Otherwise, you’ll have to decide whether to upgrade. On a 10/100mbps network, typical large file transfer performance is around 10MB/s. On a gigabit network, 50+MB/s is typical, and 70-80MB/s is definitely attainable with some optimization. It’s up to you, but I would definitely try to get a 1gbps network running.

To run a 1gbps network, you need

  • A 1gbps card in every computer you want 1gbps on. A mixed environment is fine; communication between two 1gbps computers will still run at 1gbps.
  • Cat 6/5e cables. They look and work exactly like regular Cat 5 network cables, except they are built to higher standards to guarantee 1/10gbps operation. If you are lucky, plain Cat 5 will work at 1gbps too, but that’s not recommended and may not be reliable, though sometimes you don’t have a choice (if the cable is in the walls).
  • A gigabit router or switch. For some reason, gigabit wireless routers are still very expensive, even though gigabit switches are dirt cheap. If you already have a 10/100 router, you can just add a gigabit switch behind it, and connect all your computers to the switch. This way, you’ll have a 10/100 internet connection (which is fine, unless you have 100+mbps internet), and 1gbps within your LAN.

My victim:

[Photo: the assembled hardware]

Setup overview

I initially planned to run FreeNAS (an open source NAS system based on FreeBSD), but after evaluating it in a virtual machine, I don’t think I want to trust it with my data yet. It just underwent a big rewrite after being taken over by a company, it’s in a huge mess, and a bug as big as “the email subsystem doesn’t work at all” slipped past their QA and into the stable 8.0 release. I also encountered a few bugs in just 10 minutes of testing.

I decided to go back to good old Linux instead. For our purpose, practically any distribution will work. I picked Ubuntu Server 10.04 LTS because I’ve had some experience with it before, and LTS status means it will be supported till 2015 (updates, etc). This guide should be fairly independent of distro.

It will be installed on a bootable 4GB USB drive that’s permanently plugged in, and the two 1TB data drives will be in RAID-1, providing a total of 1TB of space. That will be divided into 2 CIFS/Samba/”Network Neighbourhood” shares: one for my stuff, and one for my parents’ office documents.

All the data will be stored on an ext4 partition on the RAID volume. Snapshot backups will be set up to provide rotating backups every few hours, using the rsnapshot utility (based on rsync and hardlinks).

There’s an advanced filesystem called ZFS that has built-in RAID and snapshots, but Linux support for ZFS is only through a barely maintained FUSE driver, so it’s probably not a good idea for a production system. Btrfs is Linux’s answer to ZFS, with most of the same features, but it’s still experimental, so also no. ReiserFS 4 was pretty promising, too, until the main developer got thrown into jail for murdering his wife… Ext4 is fast, well tested, and well supported (mature filesystem utilities) in case something bad happens. If you will mostly store huge files (disk images, movies), XFS may give you higher performance.

Software Setup

Start by assembling your victim and installing your Linux distro of choice (just do a basic install; we will do all the drive preparation and partitioning later). At this stage you’ll need a monitor and keyboard attached to the NAS.

Network setup

First we have to set up networking. If you didn’t configure it during installation, it probably defaults to DHCP. DHCP is bad here, because it means your server’s IP can change at any time. You can set a static IP in /etc/network/interfaces (on Debian/Ubuntu at least), and remember to exclude that IP from your router’s DHCP pool, otherwise the router may hand it out to another computer, and bad things will happen.

Here is my /etc/network/interfaces (note the unusual gateway address: I have a weird router; yours is probably at .1)

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
address 192.168.1.50
netmask 255.255.255.0
gateway 192.168.1.254

Then do

ifdown eth0
ifup eth0

to reset the network interface (if you are doing this over ssh, make sure you type them on one line, “ifdown eth0 && ifup eth0”, for obvious reasons).

Make sure the interface is running at 1000 mbps, full duplex:

matthew@nas:~$ dmesg|grep eth0
[ 1.713278] e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
[ 15.115988] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 15.145697] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 15.146953] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 25.840015] eth0: no IPv6 routers present
[ 229.880116] e1000: eth0 NIC Link is Down
[ 323.301869] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
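
If the dmesg output has scrolled away, ethtool (a small package you may need to install first) reports the negotiated link speed directly:

matthew@nas:~$ sudo ethtool eth0 | grep -i -e speed -e duplex
	Speed: 1000Mb/s
	Duplex: Full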

Everything from this point on can be done over the network with SSH!

Disk preparation

Since the disks are brand new (or only hold data you don’t care about), now is a good time to do a destructive read-write test to make sure the media is good. This is especially important if you are using old harddrives. It’s not required, just recommended.

In my case, my data drives are /dev/sda and /dev/sdb

sudo badblocks -w -v -s /dev/sda
sudo badblocks -w -v -s /dev/sdb

They can be run at the same time. This will take a few hours (7 hours for my 1TB drives), depending on your drives. Make sure no errors are reported.
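
Since the tests run for hours, if you’re doing this over SSH it’s worth launching them inside screen (or with nohup), so a dropped connection doesn’t kill them:

screen
sudo badblocks -w -v -s /dev/sda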

Then we can partition the data drives. They will each get one huge partition of the “Linux raid autodetect” type; this special partition type tells mdadm (Linux’s RAID manager) that the partition is part of an array.

Any Linux partitioning program will do: fdisk, cfdisk, parted, etc. Note that if you have an “Advanced Format” (4K-sector) disk like I do, you’ll want to make sure the program you use aligns your partition to a multiple of 8 sectors, since AF drives lie to the OS that they have 512-byte sectors for backward compatibility (http://wdc.custhelp.com/app/answers/detail/a_id/5655/~/how-to-install-a-wd-advanced-format-drive-on-a-non-windows-operating-system).

matthew@nas:~$ sudo fdisk /dev/sda
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xcacafadd.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won’t be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').

Command (m for help): u
Changing display/entry units to sectors

Command (m for help): p

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xcacafadd

Device Boot Start End Blocks Id System

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First sector (63-1953525167, default 63): 64
Last sector, +sectors or +size{K,M,G} (64-1953525167, default 1953525167): +1953525096

Command (m for help): p

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xcacafadd

Device Boot Start End Blocks Id System
/dev/sda1 64 1953525160 976762548+ 83 Linux

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): FD
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Same for the other disk.
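
As an aside, if you’d rather not drive fdisk interactively, parted can do the whole thing in one shot, and it handles alignment for you. A sketch (double-check the target device before running it):

sudo parted --script /dev/sdb mklabel msdos mkpart primary 2048s 100% set 1 raid on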

Then we can finally create the RAID array -

sudo mdadm --create /dev/md0 --level=mirror --raid-devices=2 /dev/sda1 /dev/sdb1

mdadm: array /dev/md0 started.

It will take a while to finish the initial sync, but the array is usable in the meantime; the sync just uses idle I/O bandwidth.

“mdadm --detail /dev/md0” will tell you all you need to know about your array.

matthew@nas:~$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90
  Creation Time : Fri Dec 16 13:02:58 2011
     Raid Level : raid1
     Array Size : 976762432 (931.51 GiB 1000.20 GB)
  Used Dev Size : 976762432 (931.51 GiB 1000.20 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Dec 16 13:03:05 2011
          State : active, resyncing
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

 Rebuild Status : 0% complete

           UUID : 20389aa5:944e1839:c7780c0e:bc15422d (local to host nas)
         Events : 0.3

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1

In this case, the array is still doing the initial sync.

Then,

root@nas:~# mdadm -Es >> /etc/mdadm/mdadm.conf

to make sure the array will be automatically assembled on reboot.

Make sure performance from the array is reasonable -

root@nas:~# hdparm -Tt /dev/md0

/dev/md0:
 Timing cached reads:   1776 MB in  2.00 seconds = 888.24 MB/sec
 Timing buffered disk reads:  284 MB in  3.02 seconds =  94.10 MB/sec

Create an ext4 filesystem (or your filesystem of choice) on the RAID device, and set the journal to writeback mode (higher performance, at the risk of losing recently written data on power loss):

root@nas:~# mkfs.ext4 /dev/md0
root@nas:/data# tune2fs -o journal_data_writeback /dev/md0

Add the mount point to /etc/fstab -

/dev/md0        /data           ext4    noatime,noexec,data=writeback   0       2
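
Then create the mount point and mount it (mount will pick up the options from the fstab entry we just added):

root@nas:~# mkdir /data
root@nas:~# mount /data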

You’ll probably also want to set up mdadm to email you when one of your disks fails -

http://ubuntuforums.org/showthread.php?t=1185134
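
The short version: the mdadm monitor daemon (which Debian/Ubuntu should start for you out of the box) mails whatever address is set in /etc/mdadm/mdadm.conf, so a line like this, plus a working local mail setup, is usually all it takes:

MAILADDR you@example.com

You can verify the alerts go through with mdadm’s built-in test message:

root@nas:~# mdadm --monitor --scan --test --oneshot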

Sharing

Then we can set up the CIFS shares. Assuming we have 2 directories, /data/user1 and /data/user2, to be shared to user1 and user2 respectively.
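
If Samba isn’t installed yet, install it first (package name on Debian/Ubuntu):

apt-get install samba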

Next, we need to create the users -

adduser user1
adduser user2
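
Samba keeps its own password database, separate from the system one (at least in the stock Debian/Ubuntu configuration), so give each user a Samba password, too:

smbpasswd -a user1
smbpasswd -a user2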

Then make them own the data directories

chown -R user1:users /data/user1
chown -R user2:users /data/user2

Then, in /etc/samba/smb.conf, make sure the [global] section contains

security = user

and add a share block like this:

[user1]
path = /data/user1
browseable = yes
read only = no
create mask = 0700
directory mask = 0700
valid users = user1

Repeat the block for each user.
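
After editing smb.conf, restart Samba so the changes take effect; depending on the release, that’s either

service smbd restart

or

/etc/init.d/samba restart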

Then the share should be accessible from Windows and Linux. From Windows, it will be “\\[server ip]\user1”, and can be mapped to a network drive.

If the performance is not satisfactory, there are many Samba settings you can try, but it becomes a bit of a black art, so I won’t cover that here. I get about 70-80MB/s (both reading and writing) out of the box, sending a big iso file from my computer, so I’m not going to bother optimizing. The theoretical maximum is 125MB/s on gigabit, but with protocol overheads, etc., the practical maximum is probably somewhere around 90MB/s. The defaults are pretty good.

Snapshot backup

Snapshot backups allow you to basically “look back in time”, and see a copy of everything an hour ago, 2 hours ago, 2 days ago, etc. Obviously, the naive approach would take way too much space.

However, most files will be the same between snapshots, and we can exploit that to save space. Files that haven’t changed are simply hardlinked to the previous snapshot (a hardlinked file looks and feels exactly like a duplicate of the original, except they actually refer to the same data on the harddrive). Files that have changed are stored in full again, but in practice the vast majority of files don’t change between snapshots, so each snapshot adds very little.

This way, for example, when you look into the hourly backup directory, you’ll see a directory for 1 hour ago, a directory for 2 hours ago, etc, and while they all look like real hard copies, they only take up the space of differences between then and now.

This sounds hard. And it is, if you have to implement this yourself. Fortunately, there’s a program called rsnapshot that will handle everything for you.
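
For the curious, the core trick rsnapshot automates looks roughly like this with plain rsync, where --link-dest makes every unchanged file a hardlink into the previous snapshot (a simplified sketch, not what rsnapshot literally runs):

rsync -a --delete --link-dest=/backup/snap.1 /data/user1/ /backup/snap.0/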

So install rsnapshot using your favourite method (apt-get on Debian/Ubuntu), and edit /etc/rsnapshot.conf to set up your backups. The file is very straightforward, so I won’t go through it in detail.

It’s better to send the backups to an external drive or another machine, but I’m just putting them on my /data partition. Remember to use the “exclude” option to exclude things you don’t want backed up (disk images, etc.); I personally use a “nobackup” directory under my share to hold files I don’t want backed up. In my case, I have 384GB in total, but only 11GB or so are really important and need to be backed up. See the config sketch below.
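
For reference, the relevant parts of a config matching this setup might look like the following (note that rsnapshot insists on tabs, not spaces, between fields, and the version shipped with Ubuntu 10.04 says “interval” where newer releases say “retain”):

snapshot_root	/data/.snapshots/
interval	hourly	6
interval	daily	7
exclude	nobackup/
backup	/data/user1/	localhost/
backup	/data/user2/	localhost/

“rsnapshot configtest” will catch syntax errors, including the classic spaces-instead-of-tabs mistake.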

The backup directories can be shared via CIFS if you want, though it’s probably a good idea to make that share read-only, since writing to a snapshot will corrupt the other snapshots (the files are all hardlinked together). Can’t change history…

In the end, you’ll want to have cron execute the backup jobs automatically. For example, add to /etc/crontab (note that /etc/crontab entries, unlike per-user crontabs, include a user field) -

0 */4 * * *       root    /usr/bin/rsnapshot hourly
30 23 * * *       root    /usr/bin/rsnapshot daily

This runs the “hourly” rotation every 4 hours, and keeps daily snapshots as well. (apt-get installs rsnapshot to /usr/bin; adjust the path if you installed it some other way.)

That’s it! Next time you accidentally delete or change something, just go pick it up from the last snapshot. The maximum amount of work you’ll lose depends on the interval setting. In my case, that’s 4 hours.
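
Restoring is just a copy out of the snapshot tree. For example, with the config sketched above (report.odt being a hypothetical victim of fat fingers):

cp -a /data/.snapshots/hourly.0/localhost/data/user1/report.odt /data/user1/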

Internet access

On the server side, nothing needs to be done (except installing the OpenSSH server, if your installer didn’t already). Internet access is done over SSH (SFTP). You’ll also need to set up port forwarding on your router (port 22, TCP), and maybe get a dynamic DNS name (eg. dyndns.org) so you don’t have to remember your IP.

On Linux, sshfs can be used to mount a remote filesystem – “sshfs username@yourserver:/data/user1 mount_point”

On Windows, there is a commercial program called ExpanDrive that’s pretty good but also pretty expensive (has trial). Any SFTP client will do.

On Mac, I have no idea. Sorry.

It’s possible to use SSH over the LAN, too, and not worry about Samba. However, from my testing, SSH performance on a gigabit network is very bad, probably due to the mandatory encryption, especially on a low-power CPU. That doesn’t matter over the internet, because you’ll be limited by your internet speed anyway. Samba, on the other hand, is not suitable for the internet, because it’s very latency-sensitive (small packets, wait for ack, etc.).
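
If you do want faster SSH transfers on the LAN, switching to a cheaper cipher helps a lot on a weak CPU. arcfour is cryptographically weak, so treat this as a LAN-only trick (a sketch):

sshfs -o Ciphers=arcfour -o Compression=no username@yourserver:/data/user1 mount_point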

Not covered: SMART monitoring. I personally find it pretty useless, because the false positive and false negative rates are both too high; I just rely on the RAID for hardware integrity. If you want to use it, googling “smartmontools” is a good start. Also not covered: offsite backups. You’ll want those for very important data, but they’re not practical in my case.

That’s all! Happy storing! (until you receive an email telling you a disk is failing)
