Save me, Google
In an ideal world, everyone doesn't make mistakes. Everyone's considerate. Everyone works hard. Everyone's a Democrat Republican woman good human being.
And everyone who has access to the database absolutely knows what to do with it.
But then, this is not the ideal world. We all make mistakes. And yes, we sometimes get access to things that we really are not that familiar with. When something bad happens, we really do find ourselves in situations where we just don't know what to do--situations that we should have prepared for but didn't.
The question then is, when push comes to shove, can we google our way out of trouble? Can we, for example, google our way out of a database that's marked suspect??
This post doesn't discuss how to fix a suspect database. It simply discusses some tips on what to do if you have one and Google's all you've got. (Note: when I say Google, I mean online resources. Google just sounds better
.)
Take Alex. Alex had a database that contained historical data. It was on an external drive. His company didn't access it frequently but there was definitely some substantial "select" activity. No new records were being added. The database was really just there for querying of historical data. One day, Alex had an A-HA moment and realized...if they lose that copy of the database, they're going to be in trouble. So he did the right thing and decided to back the database up. The database was 400 GB. He thought that he'd be better off just making a copy of the MDF vs backing it up via SQL Server. He thought it would take less time. He shut his server down and executed a file copy of the database's MDF to another external drive. Unfortunately, the copy failed. He received an error that said "Windows – Delayed Write Failed". Uh-oh. Shouldn't have been that bad right? Most of us have done file copies before that failed. Alex hurriedly restarted the server.
Uh-oh: the database had been marked suspect.
I know, I know. There are several actions above that should've been done differently. And yes, you who's screaming "what were you thinking??"--I hear you too. We can do the shoulda-woulda-coulda game here. But we won't. I remember one of my team members telling me about a problem he caused. He was so scared he just went on and on as to what caused it, how his excuses were valid and how the situation was unavoidable, etc. I finally just stopped him and asked, "Can you fix it?". He said "Yes, but...". I stopped him again and said--"Then fix it. Look, right now, I don't want to know the why. I just want you to fix the problem. If your excuses and reasons help you find the solution, fine. But right now, I don't want to hear them. Fix the mess you made. When you're done, then we'll discuss your reasons."
In the same way, we're going to forego the post-mortem until later (maybe the next post?). For now, let's put ourselves in Alex's shoes and try to google our way out of trouble. Let's be the DBA who made some wrong decisions. Let's be the DBA who didn't have the books, training, or experience to get out of this mess. Again, we don't care about the why. Maybe we're new to the job. Maybe we were busy and we were doing a million other things. Maybe, we actually thought we were doing the right thing. Right now, it doesn't matter. What matters is if we are to fix some problem via Google--we need to make sure we do it right.
Tip 1: Know that what you get online can be wrong.
Google's google. There are several false and half-baked information out there. So let's be careful. Let's not panic so much that we pick the first answer that we see. Who wrote it? What if Ashton Kutcher did? So we're a fan--it still doesn't make it right. What if you found it on aprilfoolsday.com?? Yes, it's May now--so it's probably not a joke anymore, right? Wrong.
Let's always be defensive with the information we find out there--just because it's there doesn't mean it's good. Or even if it is--it doesn't mean it's right.
With this in mind, let's browse through the the Google results when we search "sql server 2005 database suspect". Let's check this solution. It tells us to
-set the database to emode (emergency mode),
-execute a DBCC CHECKDB ('dbname'),
-set the database to single user mode,
-then execute a DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS
-set the database to multi-user mode
This is a popular one. It's easy...it has the script. It pretty much says copy, paste, and execute. It has good reviews. And different people have suggested it in so many online resources. Can it work? Yes. Does it always work? I don't think so. Do you know?
If you don't--STOP.
Tip 2: Do not implement solutions blindly.
Since we didn't protect ourselves before (correctly or otherwise), we should now. If we didn't plan before, we should plan now. We're in trouble already--let's not make it worse.
My main issue with this solution is a DBCC CHECKDB ('dbname') is immediately followed by steps to execute a repair. What is the point of executing "DBCC CHECKDB ('dbname')" if the results won't be read at all? If we don't know what DBCC CHECKDB is, let's find out first what it does and its impact before executing it. Also, let's check if there are preparations required to ensure that we can successfully undo whatever we're doing--even if undoing brings us back to the same hell that started it all. That hell can be heaven compared to the next hell we'll find ourselves in if we plunge head-on, yet again, without a strategy.
Let's check this other solution: http://www.karaszi.com/SQLServer/info_corrupt_suspect_db.asp. This was referenced by http://searchsqlserver.techtarget.com/answer/Suspect-SQL-Server-database-solution.
If this is the first solution we actually look at, we're in luck. It's not a copy-paste solution; it actually requires us to go through a few steps to see what our options are. The solution mentions this: "If you can, run DBCC CHECKDB against the database: Search Books Online and KB for the any error numbers returned. There might be specific recommendations for your error messages.". Notice how the first solution didn't even mention this?
Which bring us to the last tip...
Tip 3: Know whom to trust.
Really.
You can trust some people out there. Keep looking and you'll find them. I haven't been with the SQL Server community too long so I can't really make the best recommendations. Paul Randal is popular. His name is one of the first names that I heard when I started stalking people in the SQL Server community. There are more definitely. I'm not saying these people don't make mistakes. *Everybody* does. I'm just saying these experts have been around the block. They know what they're doing--and they're most likely not going to give out wrong information. If they do, they'll let you know.
They're also probably not going to tell you to be adventurous or exciting when dealing with your databases. On the contrary, they'll tell you to be boring and predictable. They'll tell you to look and verify and look and think and look and verify--and to repeat looking and thinking and verifying until you actually know what you're doing.
Conclusion
So, did Alex fix his database? No, not yet. The last I heard, he executed DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS which he said didn't fix the problem. Then he detached the database and he said he couldn't attach it back. You out there who's screaming yet again--I hear you. I hear a couple of "URGHS!!!". Easy there
. I won't discuss the details anymore. Fact is, the user did all these because Google told him to. He was in trouble, he panicked, and he trusted his salvation to Google. I'm not saying he shouldn't have, but, when information is already served on a silver platter, always do one more thing.
Verify.