A soft 404 is basically an error page that gets the technical part wrong: it shows an error message but doesn't send the matching error code. I'll spare you the tech stuff, which Google themselves explain well enough.
So, when Google visits a page, it checks for common words used to denote an error page: "page not found", "error", "missing" and similar. Historically, this has been fairly light-handed, since it's perfectly possible for a page to use the same words as an error page without actually being one. One example might be a page that discusses error pages, but there are more examples than you might think at first. Google seem to have been ramping up both the number of text strings they look for and the methods used to detect errors. And once you start turning dials at Google, there’s bound to be collateral damage!
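To make the problem concrete, here is a minimal sketch of the kind of word matching described above. It is not Google's actual method, which isn't public; the phrase list and the function name `looks_like_error_page` are my own assumptions, chosen purely for illustration.

```python
import re

# Hypothetical phrase list, based on the examples in this post.
# The real signals Google uses are not public.
ERROR_PHRASES = ["page not found", "error", "missing"]


def looks_like_error_page(title: str, heading: str) -> bool:
    """Naive check: flag a page if any error phrase appears
    as a whole word or phrase in its title or main heading."""
    text = f"{title} {heading}".lower()
    return any(
        re.search(rf"\b{re.escape(phrase)}\b", text)
        for phrase in ERROR_PHRASES
    )


# A genuine error page is flagged, as intended.
print(looks_like_error_page("Page not found", "Sorry!"))

# A perfectly valid article about error pages is flagged too.
print(looks_like_error_page("How to design a friendly error page", "Tips"))

# An unrelated page passes.
print(looks_like_error_page("Contact us", "Get in touch"))
```

The second call is the false positive: a check this simple can't tell a page that is an error from a page that merely talks about one.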
I stumbled across one such example recently while reviewing Webmaster Tools data for a particular website. I spent a while puzzling over why these pages were listed as errors, when they seemed perfectly valid, before I realised what was going on:
Notice all the 500s? It seems that Google has seen lots of mentions of the number 500 in key areas of the page, and mistakenly believed these pages to be 500 (internal server error) pages – the code you get when your server breaks and can’t deliver a page correctly.
But they’re not – they’re mostly pages about the rather dinky Fiat 500. Here’s an old one:
Ever noticed that some people think cars look a bit like faces? Well, this Fiat 500 has a sad face, because Google thinks he’s broken. Aaaaw.
What to do next
Make sure you check your soft 404 reports in Google, particularly if you have pages that:
- Discuss errors or use text that might be confused with an error message
- Frequently use numbers in the 400s or 500s, particularly in titles and headings
Otherwise, your pages might not appear in Google search at all – just as if they actually were errors.
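If you have a lot of pages, you could run a rough check over your own titles and headings before Google does. The sketch below is only an illustration: the phrase list, the number pattern and the names `risky_terms` and `audit` are my assumptions, and the URLs are made up. It only tells you which pages are worth comparing against your soft 404 report, not what Google will actually decide.

```python
import re

# Assumed list of risky phrases, taken from the examples in this post.
RISKY_PHRASES = ("page not found", "error", "missing")

# Standalone three-digit numbers in the 400s or 500s, e.g. "404" or "500".
STATUS_LIKE = re.compile(r"\b[45]\d{2}\b")


def risky_terms(text: str) -> list[str]:
    """Return the risky phrases and status-like numbers found in text."""
    lowered = text.lower()
    hits = [phrase for phrase in RISKY_PHRASES if phrase in lowered]
    hits += STATUS_LIKE.findall(text)
    return hits


def audit(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each URL to the risky terms found in its title and heading text.
    Pages with no hits are left out of the report."""
    report = {}
    for url, text in pages.items():
        hits = risky_terms(text)
        if hits:
            report[url] = hits
    return report


# Made-up pages: URL -> title and headings joined into one string.
pages = {
    "/reviews/fiat-500": "Fiat 500 review: the new Fiat 500 on test",
    "/about": "About us",
    "/help/error-pages": "What to do about a missing page",
}

for url, hits in audit(pages).items():
    print(url, hits)
```

Anything this turns up is a candidate, not a verdict. The pages that matter are the ones that also show up in your soft 404 report.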