EMC Unified VNX — Unisphere AWOL !

Unisphere can sometimes be a fickle beast, particularly with Java versions and compatibility. I had a very strange issue with Unisphere a couple of weeks ago where it would not allow any logins.

Whether I tried to log in to the Control Station or to either of the SPs directly, using LDAP, local or global users, the result was the same: stuck on the login splash screen indefinitely. It made no difference which workstation I used.

I was still able to log in normally over SSH, and naviseccli was still functional, so I could still do everything I needed to. Other admins who are less comfortable with anything other than the GUI would have been a bit hamstrung, though, and that’s not cool.

Of course, the first thing I do in a situation like this is make sure there is nothing more sinister going on with the arrays, such as a cascading fault or other issue, so I ran naviseccli faults -list to check.
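
For anyone wanting to repeat that check, here is a minimal sketch that runs faults -list against both storage processors. The SP names in the usage line are placeholders, the check_sp_faults function and NAVISECCLI variable are my own illustration, and it assumes credentials are already stored in a naviseccli security file (otherwise add -user, -password and -scope).

```shell
#!/bin/sh
# Sketch: run "faults -list" against each storage processor in turn.
# NAVISECCLI can be overridden if the binary is not on the PATH.
NAVISECCLI="${NAVISECCLI:-naviseccli}"

check_sp_faults() {
    # $@ = SP hostnames or IP addresses (placeholders, not real systems)
    rc=0
    for sp in "$@"; do
        echo "== $sp =="
        # Assumes saved credentials; add -user/-password/-scope if needed.
        if ! "$NAVISECCLI" -h "$sp" faults -list; then
            echo "naviseccli exited non-zero for $sp" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage (hypothetical SP names):
#   check_sp_faults spa.example.com spb.example.com
```

Checking both SPs separately is deliberate: if one SP is unreachable, you still get an answer from the other.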

The next step was to log in to the Control Station via PuTTY and get some analysis going. Checking disk usage with df, I could see that /dev/hda3 (mounted as /) was full and /nbsnas was at 75%. I searched for files bigger than 50MB and cleaned them up.
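
The large-file search can be sketched roughly as follows. The big_files function name and its arguments are my own illustration, not a Control Station tool, and the threshold is given in KB so it works with older versions of find.

```shell
#!/bin/sh
# Sketch: list regular files above a size threshold on one filesystem.

big_files() {
    # $1 = starting directory (default /)
    # $2 = minimum size in KB (default 51200, i.e. 50 MB)
    dir="${1:-/}"
    min_kb="${2:-51200}"
    # -xdev keeps find on the starting filesystem, so files on other
    # mounts (for example /nbsnas) are not mixed into the results for /.
    find "$dir" -xdev -type f -size +"${min_kb}"k -exec ls -ld {} \; 2>/dev/null
}

# Usage:
#   df -h               # per-filesystem usage
#   big_files / 51200   # files over 50 MB on the root filesystem
```

Review the list before deleting anything; old backups and rotated logs are usually safe candidates, anything else deserves a second look.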

After a quick cleanup of some backup files, messages and logs, disk usage was back to a normal state. However, the problem persisted.

Tailing /var/log/messages and /nas/http/logs/access_log didn’t show any smoking gun. I then ran 'top' and checked resource usage.

The output showed the http daemon (httpd), running under the apache user, monopolizing CPU resources. 'uptime' showed this had been going on for some considerable time.
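
If you prefer a non-interactive snapshot over top, something like the sketch below gives the same picture. The top_cpu function is purely illustrative, and it assumes a ps that accepts the standard -o format keywords.

```shell
#!/bin/sh
# Sketch: show the N processes using the most CPU, with their elapsed time.

top_cpu() {
    # $1 = number of processes to show (default 5)
    n="${1:-5}"
    ps -eo pid,user,pcpu,etime,comm | {
        # Print the header line unchanged, then sort the rest by %CPU.
        IFS= read -r header
        printf '%s\n' "$header"
        sort -k3,3 -nr | head -n "$n"
    }
}

# Usage:
#   top_cpu 5
#   uptime      # load averages over the last 1, 5 and 15 minutes
```

The etime column is handy here because it shows how long each process has been running, which helps tell a long-standing runaway from a short spike.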

Checking the running processes confirmed that httpd was there and running. I decided to restart the web services using the built-in ezyadm scripts.

I gave it 5 minutes to stabilize and tested. Same issue.

At this point I decided a reboot of the Control Station was the next step. I’ve rarely had to do this, but it’s safe and quick: /sbin/shutdown -r now

Again, I gave it some time to stabilize and made sure the various daemons had started successfully. The problem still persisted.

I checked the domain configuration in /nas/http/domain/domain_list to make sure it hadn’t become corrupted for some reason, and all looked OK.

Checking disk usage again, I noticed that /dev/hda3 was again at capacity and /nbsnas was approaching 80%. Interesting, and it looked like this was where I should concentrate my efforts.

Drilling down, I found /nas/site/failed_auth_record growing rapidly. I zeroed it out and it started growing again! Whatever was writing to it was also writing constantly to /var/log/messages and seemed to be forking a copy of the file into /tmp, which was filling up /dev/hda3. Nice.
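
To confirm that a file really is growing, and how fast, you can sample its size twice. The sketch below does that; file_growth is a hypothetical helper name and the interval is arbitrary.

```shell
#!/bin/sh
# Sketch: report how many bytes a file grows over a sampling interval.

file_growth() {
    # $1 = file to watch
    # $2 = seconds between the two samples (default 10)
    f="$1"
    wait_s="${2:-10}"
    before=$(wc -c < "$f") || return 1
    sleep "$wait_s"
    after=$(wc -c < "$f") || return 1
    # Negative output means the file shrank or was truncated meanwhile.
    echo "$((after - before))"
}

# Usage:
#   file_growth /nas/site/failed_auth_record 10
#   : > /nas/site/failed_auth_record   # truncate in place (zero it out)
```

Truncating only buys time, as it did for me: the file starts filling again until whatever is writing to it is fixed.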

It seemed like it was time to create an SR. I got an excellent engineer who was quickly able to diagnose a “failed_auth_record” bug, which he advised was covered by https://support.emc.com/kb/185780. Unfortunately it’s not customer viewable at this point, so I can’t detail what’s in it, apart from it involving a patch for the ticketlogin.pm file, which he promptly applied over ESRS. The offending /nas/site/failed_auth_record was removed, as was the copy of it under /tmp.

I was finally able to log in again, and CPU usage was back down to its normal, low level.

It was immediately evident that something had been wrong, given the 400 alerts filling up the dashboard. I cleared them but they kept returning. Another restart of the httpd process using the ezyadm script got rid of them too.

This VNX is also at the latest File and Block OE level, so it’s quite a fresh bug. My TAM advised that it was being fixed in the next release.

What did I learn from this? Sometimes you can be very close to solving an issue, and have zeroed in right on the cause, but still be a long way from a fix. Without seeking assistance and raising an SR, this issue would not have been fixed.

Don’t be stubborn. Know your limitations and seek help when you need it 🙂
