A myBalsamiq morning to forget: an apology

Hello friends.

For the second time in the history of Balsamiq, I am writing to you today to apologize for our mistakes.

This morning we started what was supposed to be a routine myBalsamiq update. We couldn't do a zero-downtime update because this one required a data migration in the database. We expected it to take only about 10 minutes, but we announced a 30-minute downtime just to be safe.

How wrong we were. MyBalsamiq was in maintenance mode for about 3 hours today. Given that we would like to compete on reliability for myBalsamiq, this is clearly really, really bad.

A number of things went wrong during the downtime. It was a nightmare. We ran out of disk space in the database, a machine got rebooted while running the data migration, and even our personal internet connection went down at some point. It was, simply, awful.

Some things were just unlucky, but we should have prepared for most of the others. This was our fault, no two ways around it.

In the end we reverted to the old build, so the 3 hours of downtime were totally wasted on your side. We'll make sure they're not wasted on our side, though: we've learned a bunch of lessons and will take them to heart.

First of all, we're going to start doing updates on Saturdays instead of Tuesday mornings. I didn't want to do this because it means that a few of us will have to work during the weekend, both to do the update and to man the support lines in case something goes wrong with the new version. As the CEO I hate to ask people to work weekends, but we all agree that your collective time is more important than our own. It's just the nature of the business we decided to get into, so we'll happily make the schedule change.

Other than that, we are improving our "things to check before a release" checklist with the lessons we've learned today, and we are going to make changes to our database structure so that data migration won't take nearly as long (in case you're interested, we're going to move the bmml data from the database to S3).

We also need to make sure our maintenance page embeds the @myBalsamiq Twitter feed, so that people can stay updated on our progress more easily. Plus I have ideas about automated backups emailed to you, desktop sync, Dropbox integration... all things that should mitigate your downtime in case this happens again.

If you were affected by today's outage, please email sales@balsamiq.com and we'll credit your myBalsamiq site for 3 months or extend your trial for 3 months. It's the least we can do, and we fully understand that it's not enough to regain your trust in our service.

We are committed to making myBalsamiq known for its uptime, but clearly we have a long way to go. We are learning, and I feel very sorry that our early adopters have to pay for our inexperience. :(

Alright, back to work for us. Again, I'm so sorry.

Peldi

Comments (35)

  1. Hi Peldi,
    I was not impacted as I use the standalone MacOS version, but I really appreciate the open, no bullshit communication.
    So sorry for your issues this morning, sorry for the customers that got impacted, but please, continue telling us the truth when problems happen!

    Olivier
  2. Only one word! You are simply the best! Congratulations on your open mind and your humility!

  3. I also do not use the web version, instead using the stand alone Mac OS X version.

    Just wanted to agree with what Olivier said. You earn my respect and admiration for your no bullshit approach to communication. I’m happy to use a Balsamiq product and tell everyone I can about it.

    Thanks for making a great product that I use daily.

    cheers!

  4. Peldi,

    Accidents can happen and only people who do something can do something wrong.

    In your post you point out the correct steps to make and it’s an open communication.

    Not much more you can do at this time. I’m sure the next update will only take 10 minutes ;)

    Grtz

  5. Thanks for the open communication!

    And Dropbox integration, YES! that would be killer!

  6. I’m proud to support Balsamiq. I too use the OSX desktop client and was not impacted by this outage. I echo Brett and Olivier with my respect and admiration for your open and honest communication.

    I will continue to support you guys and your wonderful product and hope once you’ve implemented some of your plans, you can enable your staff to have their weekend available.

    Thank you.

    Andrew

  7. I value the way you’ve been so open about this. Great stuff! Accidents do happen so don’t worry too much :-)

  8. Best. Communication. Ever! How can you be mad after seeing those pictures?

    In all seriousness, it is more than obvious that you and your team care about the people you are providing service to and we truly appreciate that. Keep up the hard work!

  9. Keep on working, guys! You’re simply awesome and we stay with you! Shit happens! :-)

  10. Another vote for my personal appreciation (and I would also imagine for many other users) of your open and honest communication in a world which tries to hide any mistakes and pretend bad things don’t happen. You certainly gain my respect that in the event of any problems in the future that your company has the best interests of your users at heart.

  11. To be honest with you I didn’t even know what your tool was until I read this post. Your honest communication will make me look into it. They say even bad press is good on the web :-) How about a 3 month trial on the Mac for a poor college student :-)

  12. Ouch. That is a rough morning but a great apology.

    I want to caution that it is still better to update on Tuesday morning in case the update fails. Updating before or on the weekend can lead to your releases spiraling out of control with no staff monitoring or just a skeleton crew.

    It is better to have your whole team available and awake. I also worry that you might foment resentment by making everyone work every release weekend. Releases will likely become more infrequent, and lead to bigger changes and bigger failures.

    Please consider that you did recover! Things went right as well.

  13. Appreciate the transparency!

  14. I’ve only ever dabbled with your trial product, but it is definitely a good product and one that I will be considering buying. I wasn’t affected by the downtime which you mentioned but I very much appreciate the honest and upfront apology to those that were. If only more companies and businesses were as open and transparent. I know what it is like to see servers and services misbehaving and going badly wrong, and it is very refreshing to see someone being honest about the mistakes, admitting to faults and bad practise when necessary but more importantly, re-assuring the stakeholders as to what lessons have been learned from the experience.
    Gold star for you guys :-)

    Andy Mac
  15. Peldi,

    Whilst I haven’t heard of your product yet, after this open and honest post I will be investigating it further now. Best of luck on your next attempt and in the future as well. Thank you.

  16. Watch out Balsamiq

    Why admit fault? You are not financially liable.

    In the world we live in today, it’s a sad truth to be told.

  17. Umm – I think you’re beating yourself up way too much about this. Crikey. It’s 3 hrs, not 3 days, and you’re not handling payroll. I can’t imagine crowds with pitchforks gathering around your HQ??

  18. I use Balsamiq because it fits my wireframe toolset perfectly. With this I feel even better about using it. Honesty, transparency, and humility has been mentioned, and I think those three words describe what makes this apology so personal and real. (and you get extra points for having a @rohdesign poster in your office!)

  19. Honesty really compensates for the inconvenience… thanks :)

    Shannon
  20. It takes a lot of courage to be so open about your own mistakes… you guys are awesome :)

  21. Keep all your communications open like this and we will be happy even if you are down for 24 hours :-)

    Downtime hurts me, yes, but after reading this honest and open post, I love you !

  22. Hey Peldi

    Like some users above I just use the Mac OS X version, but when I saw this I thought I’d add my voice to the supporters. As a web company MD, I know what it’s like to not want to ask staff to work weekends – and to overlook things that should be on a checklist. You feel gutted, you feel awful and you feel like you’ve failed more than just your clients.

    The honesty and transparency of the above is all part of the process and as others say – it was just 3 hours. Looking forward to a lot more fantastic things to come from you guys – but on a Saturday, not on a Tuesday.

    Three cheers for Balsamiq – a great team, a wonderful product and one hell of a leader.

    Dan

  23. Live, learn, forgive yourself.

    I’ve been in technology for nearly 30 years now and something always happens. Disc subsystems, network, fire sprinklers, bad power fail-over (with very expensive equipment to avoid the very event) and dropping a very large index on a database ;)

    All that being said, I really like having all hands on deck during the week for *most* upgrades. If something goes south, it is good to have people around instead of a skeleton crew, and not to have key players out on a boat, etc.

    You and your team are a class act.

    Thanks for being transparent.

    +1 on DropBox / Box.com integration.

    BradmanGA
  24. Peldi,
    It’s not about the errors, it’s about how you (I, we, all) handle the errors.
    And you guys did the right thing.
    I realize that the majority of people are ok with an honest error. What people can’t stand is: “it wasn’t our fault”

    You and your people are doing a great job, and if you have learned something from what happened, that’s what you have to get from this experience.

    Do the upgrade on the weekend using the checklist and then all go drink Lambrusco!

    All the best for you!

    Rui Barbosa
  25. I’m just happy to hear there is a web version! Can’t wait to try it out ;)

  26. Best business apology I’ve ever seen. Well done.

    Anonymous
  27. I’ve been an IT network gummy for years, now a developer. I get it. Things happen. I shake your hand for the honesty, but mess with my coffee and I break it! :P

  28. I can see how an application that just didn’t exist a few years ago takes the number 1 spot in the apps I work with daily (and I am including my mail client, google docs and photoshop, among a gazillion others) – a brilliant team, open and honest communications and well, plain ability to connect to the needs of your users. Mistakes happen, and you get over mistakes by learning what needs to be learnt.

    Hats off Peldi. This is simply an awesome product and it continues to get more awesome every passing day. Just ensure it doesn’t become bloatware a few months/years down the line.

    Cheers!

  29. I echo the others: I don’t know what your product is, but I will be investigating it just because of the apology. If your service/product is even half as good as the apology, you must be awesome.

    Darth Rudhra
  30. Come on, Peldi,
    don’t beat yourself up over three hours :-)
    You’re always number one
    Nicola

  31. Dear Peldi, I am not a Balsamiq user, but now I would like to be

  32. Pingback: Being able to say “sorry” – a great example by Balsamiq « As I learn …

  33. Everyone else has made the case – honesty and transparency beats bulls**t everytime…

    I’ve been an avid fan/advocate of balsamiq for a few years now, although I too use the mac desktop version.

    Without a doubt, my respect for your company has just jumped several notches…

  34. Pingback: Terminus Blog | Four Mistakes That Most Startups Make | Terminus Blog

  35. Pingback: 5 CEOs Share How Transparency Impacts Their Content Strategy - ripenn
