Netgear CG3000D cable modem keeps resetting
- Published on Thursday, 25 April 2013 04:42
I published a previous blog entry about how my Netgear CG3000D cable modem had apparently failed. I wrote up a detailed description here about the problem. That blog entry was mistaken - the modem was fine all along.
I got an experienced Cox technician on site and we came up with what we think is a reasonable explanation for why the modem kept resetting. Apparently there was some user setting in the NVRAM (flash) which had been saved by a previous version of the modem's firmware. Cox upgraded the firmware in the background and when the new firmware started up, it read the settings, and found something incompatible. The Netgear programmers who wrote the firmware decided the best way to handle this situation was to reboot. As long as the NVRAM contained the same binary string of user settings, this cycle would repeat without ending.
The actual cable modem part was fine, but the router part wasn't able to work with the "corrupt" or incompatible settings. This is the main problem with integrated cable modem/router combo devices. If I had a standalone cable modem I would have been able to stay online and could have used my wizzy Linux server as the router. I'm quite capable of debugging problems with the Linux server. Not so much with "black box" devices like this Netgear device.
The only way to make the thing stable again was to clear the NVRAM with a full factory reset (hold down the reset pin for 30 seconds, no less than 30 seconds), then wait for it to come back up (the process takes 5+ minutes), then re-enter all the user settings. I do have to thank the only Tier 2 support person at Cox that I was able to speak with for suggesting this. Once I realized that the problem had to do with the user settings, it was easy to figure out.
In summary, here is how I got this problem and how to resolve it:
- The cable modem/router shipped with some firmware version from Netgear.
- I set several custom user settings for port forwarding, etc., using the original firmware version. These settings were saved to the NVRAM.
- Cox upgraded the firmware at some point. They do this invisibly and in the background and there is no notification or warning or anything. The only way to tell this happened is to check the router frequently to see if it has a new software version number.
- By the way, the Cox sooper secret login username/password for the Netgear CG3000D router is MSO/changeme. This information is not really that secret since I found it on a Cox technical support web page.
- The new firmware apparently uses a different "format" or byte layout for the binary data representing the user settings in the NVRAM. There is no way their software QA testing could try all of the possible combinations of bits that could be in the NVRAM, so this is an honest mistake, but if I were the designer I would have tried to make it backward compatible.
- When the new firmware was installed it caused a reboot. On reboot, the firmware read the user settings out of NVRAM and found them to be corrupt.
- The reboot cycle repeated endlessly until I cleared the NVRAM.
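The backward-compatibility idea in the list above can be sketched in a few lines. This is only an illustration of the general technique (a version tag on the stored settings, a migration step, and a fallback to defaults), not Netgear's actual firmware logic. The JSON encoding, the field names, and the `migrate_v1_to_v2` helper are all assumptions made up for the example.

```python
import json

CURRENT_VERSION = 2  # hypothetical schema version of the running firmware


def default_settings():
    """Fresh factory defaults (hypothetical fields)."""
    return {"version": CURRENT_VERSION, "port_forwards": [], "wifi_enabled": True}


def migrate_v1_to_v2(old):
    """Convert a hypothetical v1 settings dict to the v2 layout."""
    new = default_settings()
    # Assumption: v1 stored each forward as an "external:internal" string.
    new["port_forwards"] = [
        {"external": int(ext), "internal": int(inner)}
        for ext, inner in (item.split(":") for item in old.get("forwards", []))
    ]
    return new


def load_settings(raw):
    """Return usable settings for any stored blob instead of failing."""
    try:
        stored = json.loads(raw)
        version = stored.get("version")
        if version == CURRENT_VERSION:
            return stored
        if version == 1:
            return migrate_v1_to_v2(stored)
    except (ValueError, AttributeError, TypeError):
        pass  # unreadable or malformed blob: fall through to defaults
    return default_settings()
```

With this shape, settings written by an older version are converted, and anything unreadable falls back to defaults, so a bad blob costs the user their settings once rather than causing an endless restart loop.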
I guess the lesson learned from all this is that I should have tried a "factory reset" of the NVRAM the first time I noticed it rebooting.
The modem is fine and has been online with Cox ever since I resolved the NVRAM issue. Well, it's fine other than having really crappy (read: generally unusable) wifi hardware. Most people who have the CG3000D turn off the wifi and use another access point device.
Netgear CG3000D cable modem failure
- Published on Tuesday, 16 April 2013 06:07
UPDATE 2! This blog entry is incorrect and has been obsoleted by this one: Netgear CG3000D cable modem keeps resetting. Please refer to the updated blog entry for the new, correct information about this problem and how to resolve it.
UPDATE! The brand NEW-IN-BOX Netgear CG3000D cable modem/router that the technician installed yesterday is now doing the exact same thing as the supposedly failed unit! I guess the problem is something more complicated on Cox's end, and wasn't a cable modem failure after all. The following is an article I wrote yesterday while I still believed that my original cable modem had failed and the resolution to the problem was to install a new cable modem. I was wrong about that. Now I get to call Cox again for more technical support! (The ticket number is 1542531 for escalation to a higher tier technical support.)
Well I had a lengthy internet service outage that turned out to be a cable modem failure. This is the first outage of my Cox internet service that could arguably be my fault. I don't know what changed or what failed but the modem was working fine for about 6 months and then last Saturday night I started having a weird problem with my connection.
I have a Netgear CG3000D cable modem+router integrated device connected to Cox cable internet service. The signals are generally good. They are so strong, in fact, that after one of my previous outages the Cox service technician installed an attenuator to decrease the downlink signal power level, which also causes the modem to boost its uplink signal power.
During this lengthy outage, the modem/router would reset and my internet connection would drop and re-establish itself on a cycle repeating every couple of minutes. I was running a continuous ping and somewhere between 7 and 15 pings would get through before it would drop again. The modem repeated this cycle continuously for hours. The total outage time was about 2 days - 48 hours! Here is the sequence of events I observed the Netgear CG3000D cable modem/router continuously repeating:
- These 6 LEDs were on: power, uplink, downlink, "internet," one of the switch ports, and the wifi LED.
- The uplink and downlink LEDs would turn blue.
- All the LEDs would turn off.
- All the LEDs would flash for a while, then all turn off for a moment.
- The power LED would flash for a moment then turn on solid.
- A moment later (or at the same time?) I think the wifi LED would turn on solid.
- The downlink LED would flash for a few moments then turn on solid.
- The uplink LED would flash for a few moments then turn on solid.
- The internet LED would flash for a few moments then turn on solid.
- With all 6 LEDs on solid for maybe a minute, the connection would eventually come up and a few pings would get through.
- Then the uplink and downlink LEDs would turn blue, and this cycle would repeat.
The switch port LED is not really of interest so I didn't mention it much above. I had an ethernet cable connected to another ethernet switch, but there was no traffic so the light was on solid most of the time. I think the switch port LED would blink when the other LEDs were flashing but then was on solid after the switch ASIC was initialized.
The internet LED apparently indicates if the router was able to get a DHCP IP address from the network and if network access is allowed. Generally speaking, if that is on, then you can reasonably expect to have internet access through the router and cable modem, provided your system's networking is set up properly.
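One way to put numbers on a reset cycle like the one described above is to record each ping result and count how many replies arrive between drops. The following is a minimal sketch that assumes the ping results have already been collected as a list of booleans. The sample data in the usage note is illustrative only and is not my recorded log.

```python
from itertools import groupby


def success_runs(results):
    """Return the length of each consecutive run of successful pings.

    `results` is an iterable of booleans, True for a reply and False
    for a timeout, in the order the pings were sent.
    """
    return [sum(1 for _ in group) for ok, group in groupby(results) if ok]
```

For example, `success_runs([True] * 7 + [False] * 3 + [True] * 15 + [False] * 2)` gives one entry per period when the connection was up, which makes it easy to see whether the modem is dropping on a regular cycle.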
Unreliable Cox Internet Service, and Nagios Flap Detection
- Published on Monday, 11 March 2013 16:18
I have home internet service from Cox Communications in California. It's cheap but not very reliable. I often get random outages, and on average once a month there's a really long service outage that lasts from several hours to over a day. Obviously this is not the sort of internet service you want to run any kind of business on.
However, it's cheap ($30/month) and since I can use Verizon tethering as a backup internet connection I haven't taken the time to find another internet service provider. I do want to know whenever this connection goes down and when it comes back up. I use Nagios for enterprise service and systems monitoring. Today I discovered that a default setting in my Nagios installation was preventing it from notifying me about service outages.
The Nagios flap detection actually DISABLED notifications of the internet connectivity disruption BECAUSE the internet connectivity was unstable. This is exactly the opposite of what I wanted to have happen! Whenever the Cox internet connection drops, I want to know about it right away (the notifications come to my phone which is on Verizon 4G). Today I had intermittent disruptions in connectivity and it was so unstable that Nagios actually suppressed notifications!
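To see why an unstable connection trips flap detection, here is a simplified sketch of the idea: look at a history of recent check results and measure how often the state changed. It is not Nagios's exact algorithm. As I understand the Nagios documentation, it keeps the last 21 check results and weights recent changes more heavily. This sketch skips the weighting, and the 20% threshold is a parameter I picked for illustration, not something read from my configuration.

```python
def percent_state_change(states):
    """Unweighted percentage of transitions in a list of check states."""
    if len(states) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return 100.0 * changes / (len(states) - 1)


def is_flapping(states, high_threshold=20.0):
    """True when the state history changes more often than the threshold."""
    return percent_state_change(states) > high_threshold
```

A connection that drops once and stays down produces a single state change and stays under the threshold. A connection that bounces up and down every few checks goes well over it, which is when notifications get suppressed. Flap detection can be turned off per service with the `flap_detection_enabled 0` directive in the service definition.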
Mac OS X 10.6 - Clear DNS cache
- Published on Monday, 24 September 2012 21:56
Normally I wouldn't post something relatively trivial like this. However, it came to my attention today that Mac OS X may cache out-of-date DNS entries as long as the record is still valid according to its TTL. I didn't expect this behavior because the Mac is not running a DNS server and its resolv.conf points to an authoritative DNS server that is serving up-to-date data.
I would expect a copy of BIND or another caching DNS server to cache old records until the TTL expires, but not a normal network OS running a resolver client with a resolv.conf file etc. I'm running Mac OS X 10.6.8. Anyway, to clear the DNS cache on my Mac, I typed this:
$ dscacheutil -flushcache
Curiously, a Google search for "mac os x flush dns" did not turn up that simple command, so I'm putting it on my blog for posterity.
Recent server outage and VirtualBox
- Published on Friday, 11 May 2012 22:04
This server is a Linux x86_64 virtual machine hosted in a Linux x86_64 system which was running Oracle VirtualBox 4.1.8, which was the most current release the last time the server and all the VMs were rebooted. Another virtual machine also running under VirtualBox 4.1.8 apparently caused a resource exhaustion issue which took down all of the virtual machines on the physical server, and even caused some kernel application-crash tracebacks in the system log on the physical host server. This caused my web site to be down.
You don't usually see VIRTUAL MACHINES get SCSI, SATA, or IDE disk errors when the host hard drives are fine. In this case the resource-hogging VM apparently caused enough memory issues with the host that the other VMs started having disk, bus, or other issues, like these:
ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata3.00: failed command: WRITE FPDMA QUEUED
ata3.00: cmd 61/08:00:58:51:e6/00:00:01:00:00/40 tag 0 ncq 4096 out res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3: hard resetting link
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: qc timeout (cmd 0xec)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata3.00: revalidation failed (errno=-5)
ata3: hard resetting link
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: qc timeout (cmd 0xec)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata3.00: revalidation failed (errno=-5)
ata3: limiting SATA link speed to 1.5 Gbps
ata3: hard resetting link
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata3.00: qc timeout (cmd 0xec)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata3.00: revalidation failed (errno=-5)
ata3.00: disabled
ata3.00: device reported invalid CHS sector 0
ata3: hard resetting link
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata3: EH complete
sd 2:0:0:0: [sda] Unhandled error code
sd 2:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 01 e6 51 58 00 00 08 00
end_request: I/O error, dev sda, sector 31871320
Buffer I/O error on device sda3, logical block 3670315
lost page write due to I/O error on sda3
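When a guest's log fills up with messages like these, a quick tally of the main event types helps show how bad things got. A minimal sketch follows. The category names and the choice of patterns are my own, picked from the messages shown above, and are not a complete list of libata errors.

```python
import re
from collections import Counter

# Patterns taken from the log excerpt above; category names are made up.
PATTERNS = {
    "link_reset": re.compile(r"hard resetting link"),
    "identify_failed": re.compile(r"failed to IDENTIFY"),
    "io_error": re.compile(r"I/O error, dev (\w+), sector (\d+)"),
}


def summarize(log_text):
    """Count how many log lines match each known error pattern."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Feeding it the saved output of `dmesg` from inside the guest gives a count per category, which is easier to compare across VMs than scrolling through the raw messages.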
I tried to restart the virtual machines (which were all running under VBoxHeadless, not the GUI management tool) using "VBoxHeadless controlvm vm-name poweroff" and I got this error message (it was something like this, anyway):
VBoxHeadless: error: Invalid parameter: controlvm
According to a google search I did, there are no search results for anything like that error message. I guess I'm the only person in the world who's had their VirtualBox installation get into such a bad state that it couldn't even control the VMs to reset/poweroff; hence this blog post. If you get this error yourself you should probably check your VM host right away, because it's probably in a bad state and might need to be rebooted.
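For comparison, the VirtualBox command-line tools document `controlvm` as a subcommand of `VBoxManage`, while `VBoxHeadless` is the front end that starts a VM (with `--startvm`). Here is a minimal sketch of powering off a VM from a script, assuming `VBoxManage` is on the PATH. The function names are my own, and `vm-name` is a placeholder.

```python
import subprocess


def controlvm_command(vm_name, action="poweroff"):
    """Build the argument list for `VBoxManage controlvm <vm> <action>`."""
    return ["VBoxManage", "controlvm", vm_name, action]


def power_off(vm_name):
    """Run the poweroff command and return the completed process.

    The exit code and stderr are returned rather than raised, so the
    caller can log what VirtualBox said when the host is misbehaving.
    """
    return subprocess.run(
        controlvm_command(vm_name),
        capture_output=True,
        text=True,
        check=False,
    )
```

Calling `power_off("vm-name")` and checking `returncode` and `stderr` on the result shows whether VirtualBox accepted the request, which is worth logging when the host is already in a questionable state.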
I updated VirtualBox to version 4.1.14 because at the moment that is the most current release. I hope it turns out to be able to handle this very heavily loaded resource-intensive virtual machine without causing problems for the other VMs and the host system.

