Issue

This issue was flagged as a Contact Center that was unable to receive calls. Shortly afterwards it was noted that users were unable to join or create meetings using the “Meet Now” option within the SFB client, and that the RGS service was failing to start.

Environment is Skype for Business 2015, Standard Edition.

Troubleshooting

All services were running with the exception of the Response Group Service. A quick dive into the event logs on the Front End server showed a number of errors, as follows:

Event ID 31147

Event ID 31147 LS Response Group Service – Cannot update active Match Making Server because SQL Server does not respond. Standard error for issues with connectivity to SQL.

That explains why the Response Group Service won't start. What else have we got here?

Event ID 32178

Event ID 32178 LS User Service – Failed to Sync Data for Routing Group from backup store.

The cause listed in the event points to a connectivity issue with the back-end database. Another SQL connectivity issue, by the looks of it. Digging further, I also found a few MCU errors like:

Event ID 61037

Event ID 61037 LS MCU Infrastructure – The Audio-Video Conferencing Server failed to send health notifications to the MCU factory.

And also:

Event ID 61043

Event ID 61043 LS MCU Infrastructure – The IM Conferencing Server failed to send health notifications to the MCU factory.

I almost missed this sneaky Information message (it should be an Error, right?).

Event ID 61029

Event ID 61029 LS MCU Infrastructure – stating that the certificate applied to the front end pool was somehow invalid.

OK so here is what I have learnt so far:

  • Response Group Service has an issue connecting to the SQL back-end (even though the back-end is local in Standard Edition)
  • MCU can't connect to the back-end either, and it's complaining about an invalid certificate
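To make that summary easier to see, here is a minimal Python sketch that groups the event IDs from this post by the component that logged them. The `group_by_source` helper is hypothetical and the list is typed in by hand from the entries above; it does not read the Windows event log.

```python
from collections import defaultdict

# Event IDs and sources exactly as listed in this post (entered by hand).
events = [
    (31147, "LS Response Group Service"),
    (32178, "LS User Service"),
    (61037, "LS MCU Infrastructure"),
    (61043, "LS MCU Infrastructure"),
    (61029, "LS MCU Infrastructure"),
]


def group_by_source(entries):
    """Group event IDs by the component that logged them (hypothetical helper)."""
    grouped = defaultdict(list)
    for event_id, source in entries:
        grouped[source].append(event_id)
    return dict(grouped)


by_source = group_by_source(events)
for source, ids in by_source.items():
    print(f"{source}: {ids}")
```

Three different components are complaining at once, which hints at a shared underlying cause rather than three separate faults.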

Resolution

My money is on certificates, even though technically speaking the event log entry referring to this was informational. A quick look at the certificates shows that they all look fine, with no expiring certs hanging around.

Double checked the trusted root CA and that too looks fine.

However...

When checking the Trusted Root Certification Authorities store more closely, I spotted something that wasn’t right. Usually the certificates in this store have identical Issued To and Issued By entries, because root certificates are self-signed.

Anything that deviates from that, as far as I have experienced, generally prevents the RTCSRV service from starting. Not the case this time around.
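The check itself is simple enough to write down as a filter. The Python sketch below is only an illustration: the `CertEntry` record, the `find_non_self_signed` helper and the sample names are all hypothetical, and it assumes the Issued To / Issued By values have already been copied out of the store by some other means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CertEntry:
    """One row from the Trusted Root store (hypothetical record type)."""
    issued_to: str
    issued_by: str


def find_non_self_signed(entries):
    """Return entries whose Issued To differs from Issued By.

    The comparison ignores case and surrounding whitespace.
    """
    def norm(name):
        return name.strip().casefold()

    return [e for e in entries if norm(e.issued_to) != norm(e.issued_by)]


# Made-up sample data, not taken from the affected server.
store = [
    CertEntry("Contoso Root CA", "Contoso Root CA"),
    CertEntry("fw01.contoso.local", "Contoso Root CA"),
]

suspects = find_non_self_signed(store)
for cert in suspects:
    print(f"Does not belong in Trusted Root: {cert.issued_to} (issued by {cert.issued_by})")
```

Anything this flags is worth a second look before deleting it; in my case the odd ones out were the firewall certs.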

Could this be the reason why I saw the invalid certificate message before? I asked the on-site engineer what the deal was with these certs. Turns out they had been added by an engineer for some firewall trusts he was mucking around with.

Removed these dodgy certs, gave the RGS Service another start and boom, running! A round of tests to confirm and we are cooking with gas.

Folks! Seriously, there is no need to drop certs issued by the SAME CA as your server's into the local store. They are from the same CA...

Lesson for today: if you don’t know what you are doing with certificates, be warned. You could accidentally take out a Contact Center.