Crashplan: Part Two

Since my last post about Crashplan there have been some developments.

In a nutshell (as before, I'll start with the conclusion for those who want the Cliff Notes version, and then provide the details), Code 42 (the developers who produce Crashplan):
  • Continued to ignore my concerns and questions until after I'd written the blog post, and started linking to it from the comment sections of various online reviews of the software.
  • Repeatedly claimed that I've got facts wrong, when I'm just reiterating what their support engineer told me.
  • Changed their story about what the root cause of the problem was.
  • Denigrated their competitors in their replies dealing with this topic.
  • Silently deleted comments from their support forum.
  • Continued to refuse to answer questions posed by customers (myself, and others).

To understand what happened you have to follow three distinct lines of communication that opened after I'd written the original post.

In the first, Matthew Dornquast, who identifies himself as "one of the developers/founding partners of Code 42 Software", opened a new Crashplan support ticket with me, and started a discussion there.

In the second, "Brian" started a thread on Crashplan's support forums, titled "Silently corrupted data", which links to my blog post.  This attracted numerous comments.

And in the third, several people left comments under my original blog post.  I noticed these somewhat after the fact, as I didn't have the "Send me e-mail when people comment" option turned on in Blogger.  That's now remedied.

What follows is the unedited exchange between myself and Matthew in the new support ticket, along with a description and links to relevant entries in the Crashplan support forum thread.

One thing I want to call out up front.  You'll note that midway through the exchange between myself and Matthew he asks that I keep his comments private, and not repost them.  I have chosen not to respect this request for several reasons.

First, in the support thread there was concern expressed several times that I may have been paraphrasing what Code 42 staff had said.  I want there to be no doubt that this is a full, unedited copy of the conversation.

Second, I don't believe it's appropriate for someone to say something and then note, after the fact, "Oh, by the way, that was off the record".  If you want to have those sorts of conversations then clearly establish that up front, not halfway through.

Third, Matthew's comments are clearly intended to represent Code 42 -- he's writing in his professional capacity, not a private one.

Fourth, Matthew's comments in private contradict what another Code 42 employee wrote, which had already been quoted.

So, with that said, Matthew's first message, sent on March 30th, is as follows (misspellings and other grammar oddities retained):
Hi Nik,

I'm one of the developers/founding partners of Code 42 Software, we make CrashPlan. I read your blog post and saw several inaccuracies based on your interaction with one of our support agents. I wanted to track down the source, so I re-read the correspondence between our support staff and yourself. As I suspected, it's a case of miscommunication, we're to blame.

Rather than follow up on your blog post directly, I'd rather write you in hopes you can revise/clean up your initial blog post. I'm not trying to do damage control, you can write what you want. What I'm trying to do is share facts with you in hopes your personal trust in CrashPlan is restored.

If you have any questions about what I've written below, please give me a call! I'd love to chat with you. It's much faster that way- [number redacted -- Nik].
If you can agree with my points below, I'd appreciate you updating your blog entry to properly reflect the situation.. what you saw is real, but the cause is misdirected and frankly, you're kicking dirt on the very feature that makes us better than everyone else! :-)

Here are my points with your blog post.

re:"Previous versions of Crashplan have silently corrupted data that has been backed up."

This is absolutely not true. CrashPlan does not have a bug that silently corrupts data. Previous version of CrashPlan failed to detect all types of corruption that occur "in the wild" – bad disks, corrupt volume info, rebooting a machine instead of cleanly shutting down, etc. Again - CrashPlan did NOT corrupt your data, it just failed to detect the TYPE of corruption that had occurred. If we had corruption in our product, all your archives would have been corrupted. (And frankly, yours wouldn't be the ONLY negative post EVER on the internet.) This misunderstanding isn't your fault - it's ours. (Read my comments on support ticket below) The support engineer on our side was not communicating well.

Not only are we the only company that's so paranoid we're verifying your data at destinations (and trying to heal!) but we're also one of the few that verify every aspect of the restore as well! That part worked - it logged each and every file that didn't work. It's another failsafe we have. Other backup products don't verify the integrity of the restores! We actually detected and communicated to you the issues. Most products just write them out. Imagine if we'd just written out all your data - you would have trusted it worked. Months, maybe never, you would have attributed those few files failure to something else. Verifying everything at the mathematical level we do is expensive and time consuming. It took a lot of engineering to do.

re:"The team at Crashplan are aware of this. More recent versions of the software do not have this problem."

This statement is false because the first one is. The software does not have a backup issue, and does not corrupt data. The team at CrashPlan is aware of the fact we've improved our healing technology to include scenarios we otherwise had not considered.

re:"However, more recent versions of the software do not fix, acknowledge, or in any way indicate that some of the files in the backup are corrupt."

That's actually not true either. Every version of our software fixes, acknowledges, and repairs files that are corrupt. Do they detect, repair and fix every conceivable corruption possible? We can't say that for certain. What we can say for certain is every day we heal hundreds of thousands of issues discovered do to bad disks, reboots, corrupted file systems, etc. No other backup product works as hard as CrashPlan to make the inherently unreliable, reliable.

re:"Crashplan support appear to wholly unconcerned with this in a manner that means I no longer have faith in the product or their support. I leave you to determine the course of action that's right for you."

I agree with this if you limit it to the one agent. It's unfair to say our entire support group isn't concerned. (Hey, I'm writing you!) The reality is, there are a lot of agents here. Some new, some experienced. You got a new agent that did not properly recognize the seriousness of your event. He should have escalated this internally. This misunderstanding could have been avoided. I apologize for that. I would like to point out that we hire really great people, and we haven't outsourced our support overseas like our two main competitors have. Typically, we receive accolades for our support. Your experience (which is absolutely valid) is the exception, not the rule.

Here are my points on your support ticket with us. (Sorry for verbosity, I'm just talking informally.)

1. Mis communication by support engineer.

There was a misunderstanding as to what the source/cause was. CrashPlan is not the cause of the corruption, however it failed to heal around it. Healing around every possible situation is difficult to imagine, every release we have improves this healing. The corruption that occurred was not due to crashplan, it's just because it was a really old version, it had not been healed. The proof in what I'm saying is look at your other archive, it was fine! Your drive, your computer, the cable, something caused data corruption. It was NOT crashplan. Again, we just failed to discover and heal around that particular form of corruption that affected a very small % of the archive. I'm not making light of it, we want to be bulletproof on our healing technology. It's been improved many times since the issue you faced occurred. I strongly suspect you have a corrupt VTOC on that drive, have you run a disk repair utility on it?

2. You would not be able to reproduce the situation unless you reproduced the failure at the exact same moment. (i.e. disconnected drive, corrupted a block of data, etc.) Since the corruption was very small, my guess is it was a perfectly timed reboot as it was writing to drive. There is a good chance if you check integrity of the filesystem on that disk, there are issues.

3. I don't feel our support engineer properly conveyed a sense of urgency around this issue. My guess is he felt it wasn't as big of a deal as you had another destination and you had the data, but that doesn't make it any less serious that if it were your only source. Our agent should have treated this with a greater sense of urgency and spent more time explaining the details of this to you. Your faith in CrashPlan was unnecessarily shaken.

CrashPlan is the only product that automatically verifies destinations and attempts to heal around issues discovered through bad hardware (i.e. disks, disconnected cables, etc.) We've learned a lot over the last 3 years, we're continuously improving the feature. Please don't confuse the failure of a defensive feature with the core backup/restore engine. Our engine is 100% solid. Unfortunately, you had some corruption from an older backup that we did not heal around. You can't reproduce this, as we improved the way we store & verify data several times since then.

In summary - I'm sorry you had to post a blog entry to get the attention this deserves. I always tell support - "support is our marketing. Each person has the power to undo years of hard work." Already, your blog entry was linked to a recommendation, where now the guy might use something else other than us.. which I believe is the wrong call. This person wont get a more reliable product than CrashPlan. Who else supports multiple destinations, multiple levels of integrity checks, and attempts to heal around any and all corruption automatically?

I sent the following reply on April 13th:
Matthew,

Sorry it's taken some time to reply to you -- a combination of training courses and vacation have left me away from the computer for an extended period of time.

On 30 March 2010 17:01, CrashPlan Support wrote:

Here are my points with your blog post.


re:"Previous versions of Crashplan have silently corrupted data that has been backed up."


This is absolutely not true. CrashPlan does not have a bug that silently corrupts data. Previous version of CrashPlan failed to detect all types of corruption that occur "in the wild" – bad disks, corrupt volume info, rebooting a machine instead of cleanly shutting down, etc. Again - CrashPlan did NOT corrupt your data, it just failed to detect the TYPE of corruption that had occurred.
This is semantics. It doesn't matter if it's the raw data that's corrupt or the checksum that is corrupt (and/or computed incorrectly).

It's also at complete odds with what your support engineer wrote.
re:"However, more recent versions of the software do not fix, acknowledge, or in any way indicate that some of the files in the backup are corrupt."


That's actually not true either. Every version of our software fixes, acknowledges, and repairs files that are corrupt. Do they detect, repair and fix every conceivable corruption possible? We can't say that for certain. What we can say for certain is every day we heal hundreds of thousands of issues discovered do to bad disks, reboots, corrupted file systems, etc.
You go out of your way to say this for certain on the Crashplan website. To quote this text again:

Once your files are backed up, CrashPlan continuously checks that your files are 100% healthy and ready to restore when you need them. If it finds any problems, CrashPlan fixes them. 
Not "99.9x% healthy".

Not "If it finds some problems".

If you can't stand by these statements in private then don't make them in public. And I don't expect to find out that files are unrestorable at the point when I do the restore -- note that the affected files date from 2008 and have not been modified since then, so there's been plenty of time for Crashplan's "continuous" file check to determine that they're not restorable and alert me.

To be clear -- software has bugs, hardware is not error free, I know this. However, if your promotional material tells me that using Crashplan means I don't have to worry about performing test restores then I'm going to be upset as a customer if it turns out that I have to.
Here are my points on your support ticket with us. (Sorry for verbosity, I'm just talking informally.)

1. Mis communication by support engineer.



There was a misunderstanding as to what the source/cause was. CrashPlan is not the cause of the corruption, however it failed to heal around it. Healing around every possible situation is difficult to imagine, every release we have improves this healing. The corruption that occurred was not due to crashplan, it's just because it was a really old version, it had not been healed.


The proof in what I'm saying is look at your other archive, it was fine!
No -- the other archive was never tested, I just copied the original files from the original PC.

Since the other (remote) archive was created by unplugging the USB drive that contained the corrupt archive and physically handing it to the person who hosts the remote archive, whereupon they imported it in to their Crashplan instance I don't see that there's any evidence for making any claims, positive or negative, about the health of the remote archive.
Your drive, your computer, the cable, something caused data corruption. It was NOT crashplan. Again, we just failed to discover and heal around that particular form of corruption that affected a very small % of the archive. I'm not making light of it, we want to be bulletproof on our healing technology. It's been improved many times since the issue you faced occurred. I strongly suspect you have a corrupt VTOC on that drive, have you run a disk repair utility on it?
Yes. No issues found.
2. You would not be able to reproduce the situation unless you reproduced the failure at the exact same moment. (i.e. disconnected drive, corrupted a block of data, etc.) Since the corruption was very small, my guess is it was a perfectly timed reboot as it was writing to drive. There is a good chance if you check integrity of the filesystem on that disk, there are issues.
Again, to be completely clear -- are you saying that the support engineer's statement that:
they were stored with an older version of the CrashPlan Application that has a known issue with incorrectly checksum-ing stored files in a backup archive.
is false, and that older versions of Crashplan did not have a known issue when checksumming stored files?

Matthew replied the same day:
no worries - i figured as much. I'm disappointed in your response - I was hoping you'd see the severity of your accusation and while a lot of your presumptions were based on a single miswritten statement from a support engineer, realize you jumped the gun on a few conclusions.

You're publicly saying "Crashplan corrupts data" and "they know about it" and "aren't communicating it."

That's ridiculous. That's false. You should have confirmed that understanding before writing them publicly as fact. Bloggers should fact check just like journalists if they're going to publish. (This is my opinion, hey, don't agree.. but as soon as you started promoting your blog by hijacking other threads, you crossed that line. IMHO.)

Why did you not try your other backup? It could have confirmed the checksum issue wasn't present in the product and was due to something else. Maybe we'll make progress on the source of the issue rather than assuming the worst?

Taking a junior support guys communication mistakes (he's a new hire, hasn't been here all that long) and blowing it up into "crashplan corrupts data" then hijacking our public positive threads about us is a bit over the top. If you had simply said, "Hey, can I talk to your supervisor? This doesn't make sense. It seems inconsistent with everything I'm reading/have heard including your website" it would have saved us a lot of pain and time.

I wouldn't hang google out to dry on what a single support person said in an email.. at least, not without agreement from their higher ups!

Finally, let's not loose perspective on how great CrashPlan is. Do you know of another backup product (free or otherwise) that backs up to multiple destinations, has multiple levels of checksums and protections, encrypts before transmission, and then ultimately tries to identify and heal around corruption at destinations asynchronously? One that proactively sends backup status reports to prevent silent failure? Do you seriously think spreading FUD about CrashPlan helps the consumer? What else will they use? Mozy? Carbonite? Give me a break.. While yours is the only negative thread I know of like this, literally thousands have failed to restore data with those products. They're not 1,000 times bigger.. or 50.. or even 10.

We're a market leader for a reason. We try really hard, and we care a ton. It's beyond frustrating that you take a junior guys mis-step in communication and blow it out like this. We're engineers that care, we're engineers that work really hard to make a great product, that which (I personally) think is far more reliable and secure than anything else out there. Where we fail, we have a culture of fixing things, of not accepting anything else less than perfect.

I'll post on our forums as much as I can about all the safeties we employ for your data.. hopefully you'll agree it's ridiculous how far we go to do a great job.. certainly farther than anyone else out there.. and ultimately.. what a disservice you're doing with your post.. again.. IMHO.

Sorry for the fragmented thread - jamming fast between meetings.

Also.. if I come across as harsh.. sorry.. I'm mostly frustrated at how this situation even developed.. it could have been avoided with better communication at the start. Had the support guy said, "we can't heal around all types of loss, you hit one, we have a theory on why and think we've improved it".. it would have went a better way.. you might have asked what do we do.. and then been satisfied with it.

And while you're free to post whatever you wont.. please don't? I'm writing you personally.. please respect my privacy.

As always, you're free to call me.. it can save a lot of time.. and confusion around typing.

And my reply, which is, at the time of writing, the final one on this ticket:
On 13 April 2010 22:29, CrashPlan Support wrote:

no worries - i figured as much. I'm disappointed in your response - I was hoping you'd see the severity of your accusation and while a lot of your presumptions were based on a single miswritten statement from a support engineer, realize you jumped the gun on a few conclusions.
You're publicly saying "Crashplan corrupts data" and "they know about it" and "aren't communicating it."


That's ridiculous. That's false. You should have confirmed that understanding before writing them publicly as fact.
I did confirm it -- with the support engineer. See my message of Mar 24 4:32,
Are you saying that backups that were started with the older version of Crashplan may have this problem, and that simply using the newer version is not sufficient to correct the issue -- the corrupt backups need to be wiped, and the backup started afresh?
Correct. [...]
See also my message of Mar 25 4:01, where I write "Are you saying that the following sequence of events [...] is sufficient to cause this corruption?"

To which your engineer replies "Yes, that is correct".

In two different messages there I said that Crashplan was causing corruption, and your engineer confirmed what I was saying, and did not quibble with my use of "corrupt" and "corruption".

While you may feel that this is one of your junior engineers speaking out of turn (I note that he was the third engineer that replied to the ticket, so from my perspective it's been escalated to more senior engineers twice) I think it is unreasonable of you to expect me to know the ins and outs of your staffing organisation.
Why did you not try your other backup?
Because it's at the other end of a very slow Internet connection, and the original files were sitting on a PC a few feet away from the one I was trying to restore the files to.
Taking a junior support guys communication mistakes (he's a new hire, hasn't been here all that long) and blowing it up into "crashplan corrupts data" then hijacking our public positive threads about us is a bit over the top. If you had simply said, "Hey, can I talk to your supervisor? This doesn't make sense. It seems inconsistent with everything I'm reading/have heard including your website" it would have saved us a lot of pain and time.
As I say:

1. Support agreed with the use of the term "corrupt".

2. This was the third person who'd chimed in on the support ticket -- from my perspective I was already talking to someone senior.

3. The final response to the ticket left several questions that I'd asked open, with a complete refusal to answer them. As far as I was concerned Crashplan was done talking to me.
Finally, let's not loose perspective on how great CrashPlan is. Do you know of another backup product (free or otherwise) that backs up to multiple destinations, has multiple levels of checksums and protections, encrypts before transmission, and then ultimately tries to identify and heal around corruption at destinations asynchronously? One that proactively sends backup status reports to prevent silent failure? Do you seriously think spreading FUD about CrashPlan helps the consumer? What else will they use? Mozy? Carbonite? Give me a break.. While yours is the only negative thread I know of like this, literally thousands have failed to restore data with those products. They're not 1,000 times bigger.. or 50.. or even 10.
With respect, this is irrelevant to the issue at hand. I also find it crass that a significant part of your response to my concerns involves denigrating your competitors.

And you still haven't answered the questions that I've asked. Again, to be completely clear -- are you saying that the support engineer's statement that:

they were stored with an older version of the CrashPlan Application that has a known issue with incorrectly checksum-ing stored files in a backup archive.
is false, and that older versions of Crashplan did not have a known issue when checksumming stored files?
And while you're free to post whatever you wont.. please don't? I'm writing you personally.. please respect my privacy.
I reserve the right to

a) Post copies of this to the existing support thread (https://crashplan.zendesk.com/entries/140286-silently-corrupted-data) and

b) Excerpt some or all of the text for what I'm sure will be a followup post to the blog. One, I hope, that I will be writing after this is resolved satisfactorily.

Having read that, it may be instructive to read the Crashplan support forum thread sparked by the initial blog post.  I didn't contribute to the thread until page two, because until that point I was unaware of its existence.  I posted some entries to the thread clarifying that the quote acknowledging that this was caused by a Crashplan bug is a direct quote from a Crashplan employee, and two further messages that were copies of the ongoing correspondence with Crashplan (the messages quoted above).

In response to this, Code 42 silently deleted those two messages (by which I mean that they were removed from the forum with no indication that they were ever there: no placeholder saying something like "This message removed by a moderator").  Handily, their forum software (optionally) e-mails participants in a thread a copy of new posts, so at least some other contributors there saw them (and quoted them in replies).

Matthew then posted a message headed "SITUATION SUMMARY" in the thread.  I can't link to it directly, as their forum software does not allow links to individual posts, but it's about halfway down page 2.  I'm not going to reproduce it here, partly because it would make a lengthy post even lengthier, and partly because you can see the responses and questions that other customers posted in that forum (assuming they haven't been deleted).

You'll also note that those questions haven't been answered.

So, I'm still looking for alternative backup software.  Some cursory searching has turned up Wuala, SpiderOak, and Jungle Disk, as well as those two competitors that Matthew was so quick to trash, Mozy and Carbonite.

Does anyone have any positive experiences with any of them to relate?