If you're a regular reader of my blog, you've hopefully deleted any "digital dirt" that may be floating around on the Internet.
Cleaning up some of your digital dirt is easy. It's a matter of point-and-click to delete that photo in your Facebook profile showing you pole dancing in a thong. Or the MySpace entry providing a pictorial record of you passed out on the bathroom floor of your fraternity.
Problem is, it's now possible for researchers to find old versions of Web sites and extract information from them. You can view Web pages that were modified months or even years ago through the Internet Archive, also known as the Wayback Machine, at http://www.archive.org. When I tested the Wayback Machine to view old versions of my own Web site at http://www.nestmann.com, I found more than 100 now-obsolete pages.
With the Wayback Machine, old versions of your Web site may not look the same as they did when you first created them. And fortunately for those of you who may have posted pole-dancing photos, in many cases, images aren't archived—only text. However, links on archived pages usually will function.
For businesses, threats that are even more serious lurk in archived Web pages. Did you make a claim about a product or service you offered that you later retracted? Did you sell a product that you later discontinued due to potential exposure to lawsuits? If you did, the Wayback Machine probably has a permanent archive of it.
Fortunately, while it may be too late to prevent the Wayback Machine from archiving previous versions of your Web site, you can thwart future archiving. All you need to do is add a simple set of instructions to the root directory of your Web site. The instructions tell the various "Web crawlers" that troll the Internet, indexing and making copies of Web pages, that you don't want a particular Web page, directory, or entire Web site indexed.
The instructions are contained in a file called "robots.txt." To learn how a robots.txt file works, and how to create one, see http://www.robotstxt.org. Or, ask a Web designer for assistance.
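To give you a rough idea of what these instructions look like, here is a minimal sketch in Python. It holds a sample robots.txt as text and checks it with the standard library's robots.txt parser. The crawler name "ia_archiver" (the name the Internet Archive's crawler has historically answered to), the /private/ directory, and the example.com addresses are assumptions for illustration only. Check http://www.robotstxt.org and the Internet Archive's own documentation before relying on any of them.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt contents (illustrative assumptions, not a recommendation):
# - block the crawler named "ia_archiver" from the entire site
# - block every other crawler from a hypothetical /private/ directory only
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if these robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical crawler names and addresses, used only to exercise the rules.
    for agent, url in [
        ("ia_archiver", "http://www.example.com/index.html"),
        ("SomeOtherBot", "http://www.example.com/index.html"),
        ("SomeOtherBot", "http://www.example.com/private/page.html"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

The text between the triple quotes is what you would save as a plain file named robots.txt. Keep in mind that a crawler follows these rules voluntarily, so the check above only tells you what a well-behaved crawler should do.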
If you have information posted on the Internet that might come back to haunt you in the future, the best advice is to remove it as soon as you can. If for business or personal reasons you can't remove it, placing a robots.txt file in the root directory of your Web site is a useful way to ask well-behaved crawlers not to copy it, and so to keep information that might lead to embarrassment (or a lawsuit) from surfacing in the future.
Copyright © 2008 by Mark Nestmann



