skippy dot net

In Praise of PhotoRec

As I mentioned on Google+, I destroyed my laptop's filesystem. This was entirely due to my own carelessness. As I sat there looking at an unhelpful GRUB error message, I reviewed my options. I knew that I had destroyed the partition table on my laptop, and created two new partitions: one 1GB ext4 partition and one 512MB Linux swap partition. Both had been formatted, meaning that I wouldn't be able to easily get my old partitions back, and even if I could, the filesystem would be pretty corrupt.

I could spend some time using other computers in my house to research recovery options. This was a crap shoot at best, and I didn't really expect to be able to get back to a reliable state. I could cut my losses and install a fresh copy of Ubuntu atop a new partition table. Or I could try to tell my wife that the computer was beyond repair and that I'd need to purchase a brand new system.

After several deep breaths, I chose the second option. I quickly started the Ubuntu installation, and sat down to watch some television while waiting for the process to complete. I had performed a mental inventory of the data on my laptop, and felt reasonably comfortable with losing most of what was on there. After a couple of moments, though, I realized that I was losing all of the digital photos of my family. We've taken a lot of photos in the last two years, and only a small portion of those get published to Flickr. I had a tremendous sinking feeling in my gut as I realized the enormity of my loss.

The next day I made an effort to put a good face on things. It's just data. Just pictures. I still had my family, of course, and we could take new pictures. No big deal.

When relating all of the above to Mike, he shared his own digital loss experiences, and mentioned in passing PhotoRec, a "file data recovery software designed to recover lost files including video, documents and archives from hard disks, CD-ROMs, and lost pictures from digital camera memory." It's part of the TestDisk suite, which I had come across while briefly investigating my recovery options prior to re-installing Ubuntu. I thought to myself, what the heck: if it can recover some of the files, that's better than nothing.

So last night I booted the Ubuntu live CD, edited /etc/apt/sources.list to ensure that the Universe repository was enabled, and then executed sudo apt-get install testdisk. I inserted an empty USB thumb drive, and then invoked sudo photorec /dev/sda, instructing PhotoRec to look at my laptop's entire hard drive.

The entire process was alarmingly simple. By default, PhotoRec finds and recovers a whole lot of file types: tarballs, executables, text files, and more. My first pass with the default options quickly filled the USB stick because it was recovering a lot more than just the JPG files I wanted.

I purged the USB stick and ran PhotoRec again, this time instructing it to only recover JPG files. Again the USB stick quickly filled up! I inserted another stick with twice the capacity and that filled up, too. I attached a 500GB USB drive, carved out a 10GB partition -- thinking that that would be more than enough to finish the job -- and even that was filled! So I made a second partition on the USB drive for ~490GB and let PhotoRec run overnight.

This morning, several thousand JPG files had been recovered. PhotoRec can't restore the original file names, but a quick skim through the various directories it creates shows that my photos -- and a lot more -- have been salvaged. Now I can go through the recovered files at my leisure and organize them as necessary for import back into Shotwell on my laptop, or archive to DVD.
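Since the recovered files ended up spread across several pieces of media and include browser-cache images alongside the real photos, a first sorting pass could be scripted. What follows is only a minimal sketch in Python, not something PhotoRec provides: it assumes the recovered files sit in directories named recup_dir.N (PhotoRec's usual layout, but check your own output), and the function name, the destination folder, and the 50 KB size cutoff for skipping thumbnails are all hypothetical choices made for illustration.

```python
import hashlib
import shutil
from pathlib import Path

JPEG_SUFFIXES = {".jpg", ".jpeg"}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def collect_unique_jpegs(recovery_roots, dest, min_bytes=50_000):
    """Copy one copy of each distinct JPEG into dest.

    recovery_roots: folders that contain recup_dir.N directories
                    (assumed layout; adjust the glob if yours differs).
    min_bytes:      files smaller than this are skipped, as a crude
                    (hypothetical) way to drop icons and thumbnails.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    seen = set()
    copied = []
    for root in recovery_roots:
        for path in sorted(Path(root).glob("recup_dir.*/*")):
            if not path.is_file() or path.suffix.lower() not in JPEG_SUFFIXES:
                continue
            if path.stat().st_size < min_bytes:
                continue
            digest = file_digest(path)
            if digest in seen:
                continue  # byte-for-byte duplicate of a file already copied
            seen.add(digest)
            # Original names are gone anyway, so name the copy by its hash.
            target = dest / f"{digest[:16]}{path.suffix.lower()}"
            shutil.copy2(path, target)
            copied.append(target)
    return copied
```

Calling collect_unique_jpegs() with the mount points of each recovery disk and an empty destination folder leaves the originals untouched and only copies files. Hashing catches exact duplicates only; two different saves of the same picture would both be kept.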

A few quick thoughts:

  • PhotoRec does a superb job of dealing with out-of-space situations. It doesn't fail, it simply stops what it's doing and asks you where it should store future files. This allowed me to successively provide additional media without having to start over or duplicate the files recovered. I was thoroughly impressed.
  • PhotoRec recovered a bunch of images from the new installation of Ubuntu, as I expected. But it also found a lot of photos from my prior installation -- the photos I wanted. It also found a number of photo files from my browser cache, which shouldn't really surprise me but for some reason did. And I suspect that some of the files found were from an even earlier installation of Ubuntu that I had installed over some time ago!
  • Data recovery is surprisingly effective. The ease with which this data was recovered has me more convinced than ever that I should start encrypting at least some stuff, and I should definitely be using a secure delete function when possible. For any hard drives I dispose of, it is absolutely imperative that I run them through DBAN first.

I can't praise PhotoRec highly enough.

Aggregating

The value proposition of social media sites like Twitter has always been somewhat vague to me. I've stated before that I'm skeptical of social media, and that I'm not one to jump on social network bandwagons. I recently purged a bunch of people from the list that I follow on Twitter because I wasn't seeing any value in reading what they had to say. There are only so many hours in the day, and I'd prefer not to spend them reading about what other people had for lunch.

I know that part of my problem with aggregating too much information is the workflow I use. I'm extremely linear when I process things: I work from oldest to newest when reading news in Google Reader. It's only in the last couple of months that I've started marking whole categories as read, even if I hadn't read them: "if I'm not reading them, why am I aggregating them?" is the question I ask myself. When I reload the Twitter home page, I scroll down to the last thing I read (or the bottom of the page, if I'm that far behind) and then work my way up. I rarely page back to see items pushed off the home page. I use the Twitter home page because I haven't found a dedicated Twitter client I like.

But the thing that's really stuck in my craw right now is duplication of information. Most of the people I follow on Twitter are also people included in my list of feeds in Google Reader. Whenever someone posts a new blog entry, there's almost always a Twitter message declaring that fact (our software automates this for us). I almost never click the link from the Twitter message to the blog post, knowing that the post will eventually be picked up by Google Reader for me to review. Most of the people I follow on Twitter also tweet enough other stuff to make it worth continuing to follow them on that service. A notable exception is, interestingly, CrunchGear: the overwhelming bulk of the CrunchGear tweets are simply the new posts that have gone online. Since I'm aggregating CrunchGear in Google Reader anyway, what's the value in following them on Twitter?

I could, of course, aggregate the Twitter feed(s), so that Google Reader is my sole source of incoming information. But I've noticed a pretty big lag in Google Reader most days, such that a tweet posted early in the morning by someone might not be displayed to me in Google Reader until mid-afternoon. Most of the time, this might not be a big deal, but every now and again someone will tweet something that merits an immediate response: either a question for which I know the answer, or a request for a recommendation, or even an invitation. These things can be time sensitive, and I'll have missed the window of opportunity if I rely on Google Reader catching them and displaying them to me.

It's this delay that also prevents me from using something like Yahoo Pipes to create some kind of filter to weed out the extraneous bits, so that I can focus on the compelling data from each disparate service I use.
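Setting the delay aside, the filter itself is simple to describe: drop any tweet that merely announces a post the feed reader already has. The snippet below is a rough Python sketch of that idea rather than anything Yahoo Pipes or Twitter provides. The function names and sample data are hypothetical, tweets are treated as plain strings, and it assumes the tweet carries the same URL as the feed entry, which will not hold when a URL shortener is involved.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")


def normalize(url):
    """Strip trailing slashes and punctuation so near-identical URLs match."""
    return url.rstrip("/.,)")


def drop_announcement_tweets(tweets, feed_urls):
    """Return the tweets that do not link to a post already in the feed.

    tweets:    list of tweet texts (plain strings, for illustration).
    feed_urls: URLs of posts already aggregated in the feed reader.
    """
    known = {normalize(u) for u in feed_urls}
    kept = []
    for text in tweets:
        links = {normalize(u) for u in URL_PATTERN.findall(text)}
        if links & known:
            continue  # this tweet only repeats a post I'll read anyway
        kept.append(text)
    return kept


tweets = [
    "New post: http://example.com/blog/photorec/",
    "Anyone know a good Twitter client?",
]
feed_urls = ["http://example.com/blog/photorec"]
print(drop_announcement_tweets(tweets, feed_urls))
# prints ['Anyone know a good Twitter client?']
```

A real version would first have to resolve shortened links to their destinations, which adds network round trips on top of the delay already described.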

The thought that started this little tirade was the idea that I might integrate my Twitter posts directly into my blog, in a fashion similar to Chris' lifestream. Rather than a dedicated page, though, I would simply grab my tweets and store them as a new Habari content type for display alongside my normal posts. I could then also include my Flickr photos, and whatever else I wanted, making the front page of my site the complete clearinghouse for all my online activities. Then folks could simply aggregate one site to follow what I'm doing.

It's a nice idea, but it fails in execution. In addition to the delays noted above for feed readers acquiring new data, replying on Twitter becomes less convenient: a reader would have to see in my feed what I had posted to Twitter, then go compose their reply either at the Twitter site or in their Twitter client. The same goes for commenting on my blog, or on any Flickr photos I posted: following the lifestream is just one piece of the puzzle. Interacting with the information presented in that stream is the next hurdle.

What do you think? How would you like to simplify and integrate interactions with aggregated information?