Fun with SCSI tapes on Linux

I inherited an LTO-4 SCSI-attached tape drive recently (yes, old-school, I know, it's mostly just for fun), and have been fiddling with it to do some tape-based offsite backups (to complement my other offsite backups).  In doing so, I learned a bunch of things about the SCSI protocol (including its many many tendrils and offshoots), and about handy tools for interacting with SCSI devices.  Some of these things may be useful to others, so I'm putting them here.  Enjoy!

Talking to the tape drive natively is trivial; it just showed up as /dev/nst0 (st for "SCSI Tape", n for non-rewinding; /dev/st0 automatically rewinds whenever the device is closed, which in practice means after every command you run against it, and that can be a bit annoying at times).  However, I want encrypted backups, because my offsite tapes will spend a non-trivial amount of their life in places other than my house.  While I can afford to lose one or more of these (there will be many copies), I'd rather random strangers weren't able to trivially restore all my precious data and rifle through my digital belongings.  Yes, I'm aware of the slight silliness of being worried that a random stranger might have even a first clue what to do with an LTO-4 tape, let alone have access to a functioning LTO-4/5 tape drive.  But still, the principle is there.

Now I can roll my own encryption with openssl (surprisingly easy, basically pipe tar through openssl enc -aes256 and redirect the output of that to the tape drive; it prompts for a passphrase to derive a key from, and then just gets on with the job), but I was vaguely aware that some LTO tape drives could do encryption.  Some googling convinced me that any LTO-4 tape drive should be able to.  Incidentally, I was wrong: it's an optional feature (sort of - see https://en.wikipedia.org/wiki/Linear_Tape-Open#Encryption).  It seemed like a handy thing to try and achieve though; it would require no CPU on the backup machine, and should in theory be able to happen at wire speed, whereas for some reason putting openssl in my backup pipeline causes throughput to drop.

The next question was "how?".  There's surprisingly little information about it, but what there was suggested stenc was the software package I needed.  It's not available in the standard Ubuntu package repositories, so I had to download and compile it.  If you find yourself needing this, here are some tips:

  • If you just clone the git repo from https://github.com/scsitape/stenc then you're gonna have to figure out how to get it to create the configure script to use to create a Makefile so you can build it.  I couldn't (although I didn't try long).  My naive invocations of autoconf and automake couldn't get there, and it wasn't clearly documented.
  • The easier way is to go to https://github.com/scsitape/stenc/releases instead and download the source archive.  This has been prepared for building.  You can run ./configure, then make, then optionally make install.
  • The author of stenc, while a splendid person for releasing the code, is a white-space monster.  The indenting is a mix of tabs and spaces (not consistently so), and only formats nicely with tabstop set to 8.

Having compiled stenc, I run it.  It fails:

Sense Code: Illegal Request (0x05)
ASC: 0x24
ASCQ: 0x00

Running with --detail shows it's actually talking to my tape drive and knows what model it is, but still with the Illegal Request.  To be honest, I searched around a bit, couldn't find much at all, let alone a solution, and gave up for a week.  Coming back to it a week later I was refreshed.  Checking the code for stenc (the reason I love open source so much is being able to go to the source when necessary), it turns out there's a compile-time option to spit out all the SCSI commands/responses.  I was delighted; there's very little I like more than turning on debugging and gleaning clues from whatever torrent of data it spits out.  So, I run ./configure --with-scsi-debug; make clean; make and try again.  The command it was failing on was the ever delightful:

a22000200000000020040000

The stenc code gave me a mere hint of the structure of this block of bytes, with a couple of slightly helpfully named constants, and some hard coded numbers (zeros, and 0x20).  More helpfully, it gave me the acronym SSP which (checking stenc docs) I learned stands for SCSI Security Protocol.  SPIN was also relevant.  It took a bit of searching to find the following gem of a document: https://www.seagate.com/staticfiles/support/disc/manuals/Interface%20manuals/100293068c.pdf [1], the SCSI Commands Reference Manual, 446 pages of nerd delight.  It doesn't seem to be Seagate specific; they just happen to host the docs.  Command code 0xA2 is the SECURITY PROTOCOL IN command.  This explains the spin_ prefix in some of the code.  http://www.t10.org/lists/asc-num.htm tells me that ASC 0x24, ASCQ 0x00 means INVALID FIELD IN CDB.  CDB is the Command Descriptor Block, so one of the bytes in a22000200000000020040000 is 'wrong'.
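To make that block of bytes less opaque, here is a minimal Python sketch that splits a 12-byte SECURITY PROTOCOL IN CDB into named fields.  The layout follows my reading of the command reference (opcode, security protocol, protocol-specific field, an INC_512 flag, a 4-byte allocation length, control byte); the function name and dictionary keys are made up for illustration and have nothing to do with the stenc source.

```python
import struct


def decode_spin_cdb(hex_string):
    """Split a 12-byte SECURITY PROTOCOL IN CDB into named fields.

    decode_spin_cdb is a hypothetical helper for illustration only.
    """
    cdb = bytes.fromhex(hex_string)
    if len(cdb) != 12 or cdb[0] != 0xA2:
        raise ValueError("not a 12-byte SECURITY PROTOCOL IN CDB")
    # Big-endian: opcode, protocol, protocol-specific (2 bytes), flags,
    # one reserved byte, allocation length (4 bytes), one reserved byte,
    # control.
    opcode, protocol, specific, flags, alloc_len, control = struct.unpack(
        ">BBHBxIxB", cdb
    )
    return {
        "opcode": opcode,
        "security_protocol": protocol,
        "protocol_specific": specific,
        "inc_512": bool(flags & 0x80),
        "allocation_length": alloc_len,
        "control": control,
    }


if __name__ == "__main__":
    for name, value in decode_spin_cdb("a22000200000000020040000").items():
        print(f"{name}: {value:#x}" if not isinstance(value, bool)
              else f"{name}: {value}")
```

Run against the failing command, this shows stenc asking for security protocol 0x20, protocol-specific value 0x0020, with room for 0x2004 bytes of response.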

How to experiment from here?  Recompiling stenc to try and change the commands seems like an annoying proposition.  I wondered if there was a way to send arbitrary SCSI commands to a SCSI device from a Linux command line.  Turns out there is; it's the sg_raw program, from (on Ubuntu) the sg3-utils package.  Rapture!  Delight!  I run this:

sg_raw /dev/nst0 a2 20 00 20 00 00 00 00 20 04 00 00

And it tells me:

SCSI Status: Check Condition

Sense Information:
Fixed format, current; Sense key: Illegal Request
Additional sense: Invalid field in cdb
Field replaceable unit code: 48
Sense Key Specific: Error in Command: byte 2

Well that's handy.  Some tips:

  • sg_raw is much better at translating errors, and also seems to have some other information that tells me it's byte 2 that's at fault.
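  • The SCSI spec numbers CDB bytes from zero, so strictly speaking 'byte 2' should be the first 0x00.  From what I learned later, though, it's the 0x20 before it (byte 1, the security protocol) that's the real problem, so either this drive counts from one, or it's pointing at the field after the one it actually objects to.  Why do these people do this?  WHY????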
  • Don't run sg_raw on an active device like a SCSI disk drive, or a tape drive that's actively in use.  You will screw things up for whatever thinks it's in control of the state of the device.  But on a tape drive that's doing nothing?  Sure, go for it.  Probably worth being careful which commands you send (don't send random bytes and expect a good result), but I imagine it's pretty hard to break it in a way that couldn't be resolved by a reboot or cold power off/on, as long as you're generally paying attention.  It's probably a very good way to learn how SCSI works.
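For the curious, the "byte 2" that sg_raw reports lives in the sense-key-specific bytes of the fixed-format sense data.  Below is a rough Python sketch of how those bytes are laid out, as far as I understand the format.  The sample sense buffer is reconstructed by hand to match the decoded output above, not captured from the drive, and the function name is hypothetical; sg_raw did the real decoding.

```python
def parse_fixed_sense(sense):
    """Pull the interesting fields out of fixed-format SCSI sense data.

    parse_fixed_sense is a hypothetical helper for illustration only.
    """
    if len(sense) < 18 or (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data of at least 18 bytes")
    info = {
        "sense_key": sense[2] & 0x0F,   # 0x5 is Illegal Request
        "asc": sense[12],
        "ascq": sense[13],
        "fru_code": sense[14],
    }
    # Bytes 15-17 are the sense-key-specific field.  For Illegal Request,
    # bit 7 of byte 15 (SKSV) says the field is valid, bit 6 (C/D) says the
    # error is in the CDB rather than the data, and bytes 16-17 are the
    # field pointer.
    if info["sense_key"] == 0x5 and sense[15] & 0x80:
        info["error_in_cdb"] = bool(sense[15] & 0x40)
        info["field_pointer"] = int.from_bytes(sense[16:18], "big")
    return info


# Hand-built example matching the sg_raw output above (an assumption, not
# a capture).  0x30 assumes the FRU code 48 was printed in decimal.
sample = bytes.fromhex("70 00 05 00 00 00 00 0a 00 00 00 00 24 00 30 c0 00 02")

if __name__ == "__main__":
    print(parse_fixed_sense(sample))
```

The field pointer is just a two-byte big-endian number, which is why sg_raw can name the offending byte while stenc only printed the sense key, ASC and ASCQ.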

This all gave me some more clues to plug into a search engine, which got me to https://www.veritas.com/support/en_US/article.100037886.  I could probably have saved a lot of time by finding this page a week earlier, but no matter.  This suggested I might like to run:

sg_raw /dev/nst0 -r 44 a2 00 00 00 00 00 00 01 00 00 00 00

which I did, giving:

SCSI Status: Good

Received 9 bytes of data:
 00     00 00 00 00 00 00 00 01 00

The Veritas page then says that the 7th/8th bytes (00 01) are the length of the list that follows, meaning the drive supports 1 page, and the 9th byte (0x00) means it only supports page 00h.  If my tape drive actually supported encryption, the response would be longer, the 01 would be at least 02, and the bytes after that would contain a 0x20, indicating it supports page 0x20, which is the tape data encryption protocol that stenc talks to via SPIN/SPOUT.

Tips:

  • The 44 is fairly arbitrary, and just needs to be longer than the response we're expecting (maybe 44 is the longest it can ever be, I don't know for sure, and haven't looked).
  • The allocation length in the command (00010000, i.e. 65536 bytes) is just a "very large" number compared to what we expect to get back.  It could actually be as low as 0000002c (44 in hex, matching the -r value) and this would all work.  I think.
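The check described above is easy to script.  Here is a small Python sketch that parses the supported-protocol-list response and looks for 0x20.  It assumes the layout described above (six reserved bytes, a two-byte big-endian list length, then one byte per supported protocol); the function name is made up, and the second response is a hypothetical example of what an encryption-capable drive might return, not something I captured.

```python
TAPE_DATA_ENCRYPTION = 0x20  # the protocol stenc needs


def parse_supported_protocols(data):
    """Return the list of security protocol codes in a SPIN protocol-0 reply.

    parse_supported_protocols is a hypothetical helper for illustration only.
    """
    if len(data) < 8:
        raise ValueError("response too short to contain a list length")
    list_length = int.from_bytes(data[6:8], "big")
    return list(data[8:8 + list_length])


# The 9 data bytes my drive returned.
my_drive = bytes.fromhex("00 00 00 00 00 00 00 01 00")

# Hypothetical reply from a drive that does support encryption.
capable_drive = bytes.fromhex("00 00 00 00 00 00 00 02 00 20")

if __name__ == "__main__":
    for label, reply in (("mine", my_drive), ("capable", capable_drive)):
        protocols = parse_supported_protocols(reply)
        supported = TAPE_DATA_ENCRYPTION in protocols
        print(label, [hex(p) for p in protocols], "encryption:", supported)
```

Slicing by the advertised list length, rather than reading to the end of the buffer, means any padding after the list in a larger receive buffer gets ignored.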

I now have some fairly good proof that my tape drive doesn't natively support transparent magical encryption.  I sort of wish stenc could have told me this, rather than just giving me sense errors and Illegal Request messages.  It would have saved me some time, but then I wouldn't have learned as much, so it's not a complete loss.  Some further internet searching reveals that my model of IBM Tape Drive only supports 'Application Managed Encryption' in the SAS connection form factor, and mine connects by Ultra160 SCSI.  This seems arbitrary to me, but I'm sure there's a good reason for it (hah).

So that was my journey.  I learned about some fun tools (sg_raw) and got a lot more comfortable with SCSI in general. 

Oh, also, the SCSI command reference mentioned Security Protocol 0x41: IKEv2-SCSI.  I thought it might have been a coincidental name collision with the IKE from IPsec, but no, it turns out it's IKEv2 adapted for SCSI (see http://www.t10.org/ftp/t10/document.06/06-449r5.pdf).  It is, quite literally, IKEv2 from RFC 4306, adapted to SCSI, to provide transport-encryption of your SCSI bus.  Before today, I had no idea this could even possibly be a thing.  I'm not sure if I should be delighted, or horrified, but I'm tending towards the latter.  That might just be my IPsec experience speaking, mainly the horror of interop between heterogeneous endpoints.  Oh well.  The more you learn, the more you realise you know so little.

[1] Update 2020-06-20: The link is now https://www.seagate.com/files/staticfiles/support/docs/manual/Interface%..., and 518 pages; looks like they update it occasionally and increment the last letter (was 'c', now 'j')