The File Server is Back
March 23, 2005 – 11:31 pm
Following my personal computer triumph over the weekend, I have now declared victory over the file system troubles I’ve been having with our lab’s cluster. Everyone in our lab (plus a couple of people not in our lab) shares a 21-node Linux cluster for running our simulations, and the file server (with a 2 TB capacity, as in 2000 GB) went down just under a month ago after an abrupt power loss. Somehow one of the initialization scripts got corrupted, and upon trying to rescue the system, I discovered that one of the RAIDs (each 1 TB in size) was running in degraded mode, meaning one of the drives had already failed. I alluded to all of this in a past post, and fortunately the problem has finally been fixed.
The length of the downtime was almost solely a function of the difficulty of finding an extra 2 TB of storage space to make backups of all of the data on the two RAIDs. I was quite fortunate that my adviser hired one of the guys from ITS as a consultant, and this guy really knew his stuff. I learned a ton about Linux (and about RAIDs in particular) during the process. Good stuff. The cluster is now running at the same capacity it had when the file server went down (with 3 nodes down for various reasons). I am now looking into repairing those 3 damaged nodes, and I will be making a trip to a handful of electronics stores throughout southern California tomorrow in search of various parts to build two more file servers (each with an approximate 1 TB capacity). We (my adviser and I) learned several lessons as a result of this fiasco, so we’re making some moves to prevent this sort of downtime in the future.
I’ll try to document my construction of the file servers and publish the write-up to the web (sending notifications to the various Linux newsgroups and such) so others in the future feel more comfortable doing this themselves. My appreciation for how nice Linux is in a server environment blossomed even further during this experience. Again, good stuff.
4 Responses to “The File Server is Back”
One might go so far as to say that you “own” linux, or equivalently, that linux is “owned” by you.
By Kristján on Mar 24, 2005 at 12:52 am
*droooool* Terabyte *droooool*
MMMMMmmmm… electronics stores…
By Adam on Mar 24, 2005 at 1:05 am
Yeah, I was thinking of sending Linus Torvalds an email, just in case he was curious, that I did in fact currently “own” Linux. He may have an interest in such a development.
I must echo the sentiments expressed at the end of one of Kristjan’s previous comments.
By jjk on Mar 24, 2005 at 1:36 am
Nice work! I had been idly thinking of turning our apartment file server (not quite a TB, but still) into a RAID array, so I’ll know who to speak to if we ever go ahead with that!
By paul.za on Mar 25, 2005 at 6:20 pm