Microsoft Corp. today announced beta availability of the newest
version of its industry-leading legacy integration solution, Host
Integration Server (HIS) 2004, part of Microsoft® Windows Server
System(TM). With innovative support for XML Web services, Visual
Studio® .NET and the Microsoft .NET Framework, HIS 2004 represents a
significant advance for customers seeking to connect to their IBM
mainframe and midrange systems in a more cost-effective,
industry-standard and efficient manner. Host Integration Server 2004
will include enhanced features for securing cross-platform access and
will offer improved application and data integration capabilities,
giving enterprise developers and administrators a complete, seamless
host integration experience.
Host.net of Boca Raton, Florida, a highly respected provider of
Internet transit, network transport, colocation and Voice-over-IP
(VoIP) services, has now added the premier Managed Security Solution
from ProtectPoint Security, Inc. of Fort Lauderdale, Florida to its
Managed Services Division. [PRWEB Apr 7, 2005]
It's raining and blowing like mad in the Bay Area today. I just had a
3.5 hour power outage. Yuck. Oh, well. It could be worse. At least it
doesn't snow here....
out·age (ou′tĭj) noun
- A quantity or portion of something lacking after delivery or
storage.
- A temporary suspension of operation, especially of electric
power.
When I woke up yesterday after a brief sleep, I started to log back
in to different services, and just as I was noticing that something
was funny with my server, Jim over at #mobitopia asked "is your site
down?".
Damn.
As I checked what was happening, I could see that all sorts of
things were not working on the server. I was starting to fear the
worst ("the worst" in abstract, nothing specific) when I remembered
that I had seen similar symptoms a couple of months ago, and back then
it had been a disk space problem. I run "df" and sure enough, the
mountpoint where a bunch of data related to the services (including
logs) is stored was full (since November the number of pageviews a
month has increased to over 200,000, which creates pretty big
logfiles). As with the last time, the logs were the culprits. Still
half-asleep, I start to compress, move things around and delete files,
when suddenly after a delete I stop cold: "No such file or
directory".
What? But I had just seen that file...
I look up the console history and four rm commands had
failed similarly.
Uh-oh.
I run "pwd". Look at the result. "That's not right...". I was
not where I thought I was.
At that point, I woke up completely. Nothing like adrenaline for
shaking off sleepiness.
I look through the command history. At some point in my switching
back and forth from one directory to another, I mistyped a "cd -"
command and it all went downhill from there. Adding to the confusion
was the fact that I used to keep parallel structures of the same data on
different partitions, "just in case". I stopped doing that once I got
DSL back in May last year, opting instead to download stuff to my home
machine, but the old structure, with old data, remained. And, even
more, my bash configuration for root doesn't display the current
directory (the first thing I did after I realized that was add $PWD to
the prompt, but of course by then it was too late).
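A minimal sketch of the kind of guard that would catch a delete issued from the wrong directory. It is an illustration, not what was actually run on the server: the function name `safe_delete` and the idea of passing an explicit expected root are assumptions made for this example.

```python
from pathlib import Path


def safe_delete(target, expected_root):
    """Delete a file only if it really lives under expected_root.

    Both paths are resolved to absolute form first, so a mistyped `cd`
    or a relative path cannot silently point the delete at a parallel
    directory tree on another partition.
    """
    root = Path(expected_root).resolve()
    path = Path(target).resolve()
    if root not in path.parents:
        raise ValueError(f"refusing to delete {path}: not under {root}")
    if not path.is_file():
        raise FileNotFoundError(path)
    path.unlink()
    return path
```

The check costs one extra argument per call, but it turns "I was not where I thought I was" into an error instead of a deleted file.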
I had just wiped out the Movable Type DB, the MT binaries
(actually, all the CGI scripts), the archives, and a bunch of other
stuff in my home directory.
I took a deep breath and finished creating space, and moved on.
First thing I did was restart the services, now that disk space
was no longer an issue. Then I reinstalled the binaries that I had
just wiped out, which I always keep in a separate directory with some
quick instructions on how to install them. That turned out to be a
lifesaver, one of the many in this little story.
After that I put up a simple page explaining the situation (here's
a copy for... err... "historical reference"), plus a hand-written
feed, and worked on the problem in breaks between work.
Then I realized that all the links that were coming in from the
outside (through other weblogs, google, etc) were getting a 404. So as
a temporary measure I redirected the archive traffic to the main page
through a mod_rewrite clause:
RewriteRule /d2r/archives/(.*) /d2r/ [R=307]
That would return a temporary redirect (code 307) while I got things
fixed (one fire out! 10 to go).
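To make the effect of that rule concrete, here is a small Python model of it. This is only a sketch of the matching logic, not Apache itself; the function name and the sample request paths are made up for illustration.

```python
import re

# Same pattern and target as the RewriteRule in the post: anything under
# the archives path is sent to the weblog's front page.
ARCHIVE_PATTERN = re.compile(r"/d2r/archives/(.*)")
FRONT_PAGE = "/d2r/"


def temporary_redirect(request_path):
    """Return (status, location) for a redirected path, or None otherwise."""
    if ARCHIVE_PATTERN.search(request_path):
        # 307 marks the redirect as temporary, so clients should keep
        # using the original archive URL once it works again.
        return 307, FRONT_PAGE
    return None


# Hypothetical request paths, for illustration only.
print(temporary_redirect("/d2r/archives/000123.html"))  # (307, '/d2r/')
print(temporary_redirect("/d2r/index.rdf"))             # None
```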
So what next? The data of course. When I came back to Ireland at
the beginning of January I started doing backups of different things
(a "new year, new backups" sort of thing), and I backed up all the
server data directories on Thursday, and then on Saturday I did what I
thought was a backup of my weblog data, through MovableType's "Export"
feature. As things turned out, the latter proved useless, and it was
the "binary" backup that saved the day.
Why? Well, as I started looking at things, I went to MT's "import"
command in cavalier fashion and was about to start when the word
"permalink" popped up in my head. Then it grew to a question: "What
about the permalinks?".
The question was valid because my permalinks are directly based on
the MT entry ids. Therefore, if an import changed the entry IDs, it
would also break all the permalinks. I started cursing for not
switching over to using entry-based strings for permalinks, but that
didn't help. So I did a little digging and I realized that I was
right. MT assigns entry IDs on a system-wide basis. So if you have
multiple weblogs on the same DB (which I have, some of them private,
some for testing, etc) OR if you have to recover the data from an
export (which I had to do) you're out of luck. More likely than not,
the permalinks will not work anymore. The exported file did not
include IDs. Re-importing would generate the IDs again. Different IDs.
Different links. Result: broken links all over the place, both within
the weblog and from external sources.
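The failure mode can be shown with a toy model. The sketch below is not MT's code; it only assumes what is described above (one system-wide ID counter, an export that carries no IDs), and the permalink format and entry titles are invented for the example.

```python
from itertools import count


def import_entries(titles, id_counter):
    """Assign the next system-wide ID to each imported entry."""
    return {next(id_counter): title for title in titles}


def permalink(entry_id):
    # Hypothetical ID-based permalink format, for illustration only.
    return f"/archives/{entry_id:06d}.html"


# Original install: two weblogs share one ID counter, so IDs interleave.
original = {1: "first post", 2: "private note", 3: "second post"}
public_titles = [original[1], original[3]]

# Re-importing only the public weblog into a clean database.
restored = import_entries(public_titles, count(1))

old_links = {t: permalink(i) for i, t in original.items() if t in public_titles}
new_links = {t: permalink(i) for i, t in restored.items()}
broken = [t for t in public_titles if old_links[t] != new_links[t]]
print(broken)  # ['second post']
```

Every entry written after the first interleaved ID gets a different number on re-import, so its old permalink now points at nothing or at the wrong entry.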
This is clearly an issue with the MT database design, which doesn't
seem too well adapted to the idea of recovery. To be fair, however, I
am not sure how other blogging software deals with this problem, if at
all. I think this is one big hole in the weblog infrastructure that we
haven't yet completely figured out, both for recovery and for
transitions between blog software (As Don noted recently).
This is when I started thinking that things would have been much
easier if I had written my own weblog software. :) That thought would
return a few times over the next 24 hours, but luckily I was busy
enough with other things not to indulge in it too much.
After looking online and finding nothing on the topic, I came to
the conclusion that my only chance was to do a direct restore of the
"binary" copy (that is, replacing the clean database with the backup
directly) I had from last Thursday. I did the upload, put everything
in place, and things seemed to go well: I could log in to MT and the
entries up to that point were right where they had to be. So far so
good. I was going to do a rebuild and I thought that maybe now was a
good time to close off all comment threads in all entries (to avoid
ever-increasing comment spam) and I spent some time trying to figure
out how to use the various MT tools to close comments on old entries.
However, they all seem to be ready
for MySQL rather than BerkeleyDB. It wasn't a hard decision to set it
aside and move on.
So I started a full rebuild. The first 40 entries went along fine,
albeit slowly. Then nothing happened. Then, failure. I thought for a
moment that, for some strange reason, the redirect I had set up
yesterday was causing the problem, so I removed it, restarted the
server, and tried again. Failed again. No apparent reason.
I got angry for a second but then I remembered that the "binary"
backup was of everything, including the published HTML files.
Aha! I uploaded those, crossed my fingers, and did a rebuild only of
the index files, and everything was up again. Actually, this was
important for another reason: since the uploaded images that are
linked from the entries end up by default in the archives directory,
you need a backup of that or the images (and whatever else you upload
into MT) will be gone if you lose the site.
So the solution up until this point had been a lot simpler than I
thought at the beginning.
But wait! All the entries after last Thursday were missing, and I
didn't have a backup for those. That was when RSS came to the rescue
in three different forms: 1) I download my own feeds into my
aggregator, so there I had a copy up to a point. 2) Some kind souls,
along with their condolences for the problem, sent along their own
copy of the latest entries (Thanks!!--and Thanks to those who sent
good wishes as well). 3) Search engines (Feedster was the most up to
date--btw, it was Matt who suggested yesterday, also on #mobitopia,
that I check out Feedster as a source of information, a great idea
that really applies to many search engines if their database is
properly updated) had cached
copies that I could use to check dates and content. So armed with all
that information I set out to recreate the missing entries.
Here the problem of the permalinks surfaced again. I had to be
careful on the sequencing, or the IDs wouldn't match. So I re-created
empty entries, one-by-one, to maintain the sequencing (leaving them
unpublished), actually posted a couple of updates on what was going
on, and then I published the recovered entries
as I entered the content and set the right dates.
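The sequencing step can be sketched as a small planning function. It assumes, as described above, that new IDs are handed out one after another starting right after the last restored entry; the function name and the sample IDs are hypothetical.

```python
def rebuild_plan(last_restored_id, recovered_ids):
    """List every ID to create, in order, so recovered entries keep their IDs.

    IDs that fall between recovered entries get an empty, unpublished
    placeholder; without it, every later entry would shift down by one.
    """
    if not recovered_ids:
        return []
    wanted = set(recovered_ids)
    plan = []
    for entry_id in range(last_restored_id + 1, max(wanted) + 1):
        kind = "recovered" if entry_id in wanted else "placeholder"
        plan.append((entry_id, kind))
    return plan


# Hypothetical IDs: the backup ends at 1498 and two later entries were recovered.
print(rebuild_plan(1498, [1499, 1501]))
# [(1499, 'recovered'), (1500, 'placeholder'), (1501, 'recovered')]
```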
So. All things are restored now (except for the comments from the
last week, which are truly lost--this makes me think that setting up
comment feeds would be a good idea. However, that doesn't address how
I would recreate the comments given what happened. Would I post them
myself under the submitter's name? That doesn't seem right at all.
Another problem with no obvious solution given the combination of
export/ID issues with MT).
What's strange is that there's been a slight breakdown in
continuity now, because I did "post" some updates to that temporary
index file, but it couldn't be part of the regular blogflow. Hopefully
this entry fixes that to the extent possible.
Okay, lessons learned?
- Backups do work. :) I am going to do another full backup today, and
I'll try to set up something automated to that effect. (Yes, I know I
should have done it before, but as usual there are no simple
solutions, and then you leave it for the next day... and the next...).
Plus, backups for MT installations should always cover both the DB and
the published data, to make recovery quick. (I have about 1500
entries, which amount to something like 20MB of generated
HTML--additionally, the images are posted directly in the archives
directory, so if you're not backing that up, you've lost them).
- For MovableType, the export feature is not so great as far as
backups are concerned. The single-ID-per-database problem is a big one
IMO, and I don't think MT is alone in this. We need to start looking
at recovery and transition in a big way if weblogs are going to hit
the mainstream (and we want permalinks to be really permanent).
- Solutions are often simpler than you think, if you have the right
data. Having a full backup makes recovery in this case easy and fast.
- This stuff is still too hard. What would a less
technically-oriented user do in this situation? Granted, it was my
knowledge (since I was fixing stuff directly on the server) that
actually created the problem in the first place, but there are
lots of ways in which the same result could have been "achieved",
starting from simple admin screwups, hardware failures, etc.
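As one possible shape for the "something automated" in the first lesson, here is a minimal sketch that archives the database and the published files together. The function name, the archive naming scheme and the directory arguments are assumptions for illustration, not the setup actually used on this server.

```python
import tarfile
import time
from pathlib import Path


def backup_site(db_dir, published_dir, dest_dir, stamp=None):
    """Write one .tar.gz holding both the database and the published files."""
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"weblog-backup-{stamp}.tar.gz"
    with tarfile.open(str(dest), "w:gz") as archive:
        # arcname keeps the two trees apart inside the archive, so a
        # restore can put each one back where it belongs.
        archive.add(str(db_dir), arcname="db")
        archive.add(str(published_dir), arcname="published")
    return dest
```

Keeping both trees in one timestamped file means a restore never mixes a database from one day with HTML and images from another. The destination should live outside both source directories, ideally on another machine.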
Overall, this has been a wake-up call in more than one sense, and
it has set off a number of ideas and questions in my head. How to
solve these problems? I'll have to think about it more.
Anyway. Back to work now, one less thing on my mind.
Where was I?
MIT power outage
05/04/2004 03:12 PM
real reporting, complete with charts!
LiveJournal Outage
02/01/2005 10:05 PM
Due to a power failure affecting all of Internap's data center,
LiveJournal is currently completely inaccessible, and we're waiting
on...
Yesterday's outage
04/14/2004 10:27 AM
My host's server died yesterday and didn't come back until this
morning. Sorry for the interruption. I don't know yet what will happen
to email you sent me yesterday. Apparently it's all going to arrive
soon. Sorry for the inconvenience....
Planned outage
03/25/2005 09:07 PM
NewsGator Online will be down for approximately 8 hours starting
Saturday, March 26 at 9:00am MST. We will be implementing a major
system upgrade to enhance our service...
Outage seen at Hotmail
05/07/2004 01:36 PM
CNET May 7 2004 5:13PM GMT
Google plays down outage
01/06/2005 07:24 AM
News.com.au - Thu Jan 6, 07:08 am GMT
Akamai DNS outage causes problems
06/15/2004 01:15 PM
DNS server problems at Akamai lead to several major sites being
unreachable.
Other News: Akamai Outage
06/16/2004 10:22 AM
Yesterday's blackout of Apple's and other major web sites was
apparently caused by a mysterious Internet attack on Akamai name
servers.
Comcast suffers DNS outage
04/08/2005 05:45 PM
Comcast said its DNS troubles yesterday were unrelated to recent
"cache poisoning" attacks on DNS servers. Service was restored around
midnight last night.
Net outage strikes Comcast
04/08/2005 12:57 AM
Blog: Comcast, the largest provider of broadband Internet access with
6.5 million customers, suffered a general outage Thursday evening.
...
Web outage blamed on zombies
06/17/2004 05:12 AM
ZDNet UK Jun 17 2004 9:03AM GMT
Temporary site outage
07/23/2004 02:43 PM
Linux.com is being re-launched. For several hours this afternoon,
neither Linux.com, IT Manager's Journal.com nor NewsForge.com will be
visible. We regret the inconvenience, but feel the new Linux.com will
be well worth it!
Comcast's Offer for Outage: $1.43 a Day
04/15/2005 12:36 PM
After experiencing three nights of network outages in less than a
week, BetaNews has learned that in at least one case in southeast
Michigan, a customer received a credit of $2.86 on their bill to
compensate for the two days of service he complained about.
We got heavily affected by this outage
05/05/2004 04:12 AM
On a Wing and a Wiki. When burglars brought down the
Internet link to Ziff-Davis' Manhattan offices, open-source
software and Sean Gallagher's personal Web server kept
eWEEK.com's stories flowing. [eWEEK.com Messaging and Collaboration]
Woe - this outage affected us!
We're trying to get this system done for E3 next week and all of a
sudden all of the net connections to NYC are down. Everyone's email is
out. Total outage on infrastructure, servers, data traffic, testing,
updates: it's all off-line.
Not a very conducive thing to have happen less than a week from
launch.
:-)
Impact of Outage Minimal
06/17/2004 04:38 PM
“Akamai Technologies (akamai.com) said yesterday that the
“sophisticated, large-scale” denial of service attack it
suffered earlier this week that impacted its naming functionality had
only a minimal impact on its customers.”
Akamai DNS Outage Messes up Net
06/15/2004 10:01 AM
Website contact form outage
03/14/2005 04:35 PM
Just a short note to those of you who may have tried to contact us
recently: Apparently, we've had some trouble with the hamsters[*] who
power our contact forms — most noticeably between Feb
28th-March 1st. If you tried to...
Note to Readers: Outage and Delay
04/14/2004 08:58 AM
Network problems this morning with SAVVIS, our Internet provider,
resulted in an unusual outage and will delay our normal news update.
T-Mobile Apologizes For Danger Outage...
But Not To Me
03/22/2005 10:07 PM
A few weeks back we noted that all users of the Danger Hiptop/T-Mobile
Sidekick had no data
service for about a week, for some unknown reason. While service
was eventually restored, the outage has still not been adequately
explained (though, it probably made Paris Hilton happy). As an
"apology," T-Mobile sent all Sidekick users a note promising a $20
credit and a special folder of "free" games and ringtones in a folder
labeled "Thank You" on the catalog part of the device. Seeing as I'm
a Sidekick user, I checked out the catalog. I use the Sidekick only
for data, so all of the free ringtones were useless to me (I wouldn't
use them anyway, even if I did use it as a phone). However, there
were two games offered, and considering the lameness of the existing
games on the device, I figured why not at least get something out of
this free offering. For the past few days I've been trying to
download the free games, and every time I'm told they're not
available. I finally called up T-Mobile and after waiting on hold for
a while and having to speak to two different representatives, I was
told that these games don't work on my particular Sidekick model and,
yes, T-Mobile should have filtered the emails and the catalog better
and, yes, the error message should have told me that these games
weren't compatible and, yes, they probably should have offered at
least some games that would work on my Sidekick, but otherwise, too
bad. Their only suggestion was to call up Danger Inc., and see if
maybe they would do something for me. I can certainly survive without
two free games, but it does seem particularly rude to offer a special
free "thank you," but make it so it wasn't actually available to many
of your users. It's sort of a "thank you, but screw you" response.
Amusingly, the catalogue clearly labels this as "customer appreciation
week." I don't feel particularly appreciated right now.
Comair Back in Air After Computer Outage
12/27/2004 03:53 PM
Internet News Dec 27 2004 7:06PM GMT
Snake Causes Transformer Fire, Outage
(AP)
06/04/2004 10:04 PM
AP - After avoiding power outages from recent storms, this community
was plunged into darkness by a snake searching for a place to nap.
L.A. Airport Outage Snarls Air Traffic
(AP)
04/12/2004 02:18 PM
AP - A brief failure of a power line shut down electrical service to
the Los Angeles International Airport tower and disrupted air traffic
Monday morning, authorities said.
New Outage Plagues Comcast Subscribers
04/13/2005 02:09 AM
For the second time in less than a week, Comcast High Speed Internet
customers found themselves without Internet service due to a nearly
nationwide failure of its domain name servers. According to posts on
Comcast's own help forums, problems began shortly after 10:30pm
Eastern Time Tuesday.
New Outage Hits Comcast Subscribers
04/13/2005 08:27 AM
For the second time in less than a week, Comcast High Speed Internet
customers found themselves without Internet service due to a nearly
nationwide failure of its domain name servers. According to posts on
Comcast's own help forums, problems began shortly after 10:30pm
Eastern Time Tuesday.
MIT Scheduled Power Outage 9-10 August
08/08/2002 10:56 AM
8 August 2002: On Friday, 9 August, power at the MIT Laboratory for
Computer Science (LCS) will be turned off at approximately 11:00 p.m.
UTC for about four hours. All services will be suspended and the W3C
site will be accessible in a read-only state. Mail sent to W3C
archives will be queued and posted when the power is restored. Power
is expected to return on Saturday, 10 August at 3:00 a.m. UTC. We
apologize for the inconvenience. (News archive)
Network Outage During Site Reset
08/03/2004 01:56 AM
Scheduled Systems Outage 7 August
08/06/2004 01:27 PM
2004-08-05: W3C's mailing lists are being moved to a new server on
Saturday, 7 August at 04:00 UTC. List service will be suspended for a
few hours but the majority of the W3C Web site will remain accessible.
Mail sent to W3C archives will be queued and posted when the move is
complete. The W3C Systems Team expects to have list service restored
on the same day. We appreciate your patience. (News archive)
MIT Scheduled Power Outage 28 December
12/18/2002 04:12 PM
18 December 2002: Due to construction at MIT, on Friday, 27 December,
power at the MIT Laboratory for Computer Science (LCS) will be turned
off at approximately 23:00 UTC for about twenty-six hours. All
services will be suspended and the W3C site will be accessible in a
read-only state. Mail sent to W3C archives will be queued and posted
when the power is restored. Power is expected to return on Sunday, 29
December at 01:00 UTC. We apologize for the inconvenience. (News
archive)
RIM Offers Few BlackBerry Outage Details
(AP)
06/24/2005 03:06 PM
AP - Research In Motion Ltd. is offering few details about two major
outages in a week with its popular BlackBerry service, which delivers
e-mail to wireless devices that many users affectionately call
CrackBerries.
Squirrel Blamed for Outage, Traffic Jam
(AP)
08/27/2004 01:59 PM
AP - A hungry squirrel has been blamed for a power outage that snarled
rush-hour traffic in this city north of Portland, Ore.
Zombie PCs caused Web outage, Akamai
says
06/21/2004 05:59 AM
CNET Jun 21 2004 10:25AM GMT
Akamai blames 'zombie' PCs on Web outage
06/16/2004 06:08 PM
ZDNet Jun 16 2004 9:40PM GMT
Orange outage hits 10,300 punters
11/04/2003 02:30 PM
No signal, no nuffink
'Zombie' PCs caused Web outage, Akamai
says
06/16/2004 04:21 PM
Attackers built a "bot net" of unknowing home PCs to bring down
Google and other sites, the company says.