Appendix A: Best Practices and Troubleshooting

A collection of expert recommendations, professional tips, and solutions to common problems to help you get the most out of EDCC.


Overview: Your Quick Reference Guide

This appendix serves two key purposes:

  • A Best Practices Checklist: It consolidates the most important recommendations discussed throughout this manual into a single place, helping you establish an efficient, secure, and scalable management workflow

  • A "First Aid" Troubleshooting Guide: It provides quick, step-by-step solutions to the most common problems and questions you may encounter. When something goes wrong, start here


1. Best Practices Checklist

This section consolidates best practices into a scannable checklist, organized by operational area.

Initial Setup & Planning

  • Plan Your Hierarchy: Before adding nodes, map out your Organization and POD structure. Name PODs logically (e.g., "Taipei-IDC-Aisle-5," "Production-SQL-Cluster")

  • Configure System Services Early: Configure the Mail Server and HTTPS File Server in System > Application Settings as a first step. They are prerequisites for user invitations and OS deployment

  • Secure Default Credentials: Immediately change the default BMC password for your PODs in CONFIGURE > General Settings

Daily Monitoring & Operations

  • Start with the Dashboard: Begin your daily checks on the Dashboard. The POD Health widget is your most critical indicator

  • Use Groups for Efficiency: Organize nodes into Groups in the Node List for easier filtering and bulk operations

  • Keep POD View Updated: Treat POD View as your logical "source of truth." Keep it updated to reflect your real-world rack layouts

Maintenance & Updates

  • Automate Firmware Updates: Use the Firmware Provisioning feature for routine, fleet-wide updates. Schedule the Maintenance Window carefully during off-peak hours

  • Backup Before Changes: Always create a Configuration Backup before making significant changes to a POD. Create a System Backup before making platform-level changes

  • Protect Your "Golden Images": Use the Protect feature on known-good, stable Configuration Backups to preserve a reliable rollback point

Security Management

  • Follow the Principle of Least Privilege: Grant users the minimum level of permissions required. Use granular POD-level roles instead of Organization-level roles whenever possible

  • Regularly Audit Events: Periodically review the System Event log to audit administrative actions and track changes


2. Troubleshooting Common Issues

This section provides solutions to the most common questions and problems, grouped by category.

Scope & Permission Issues

Q: Why can't I edit settings or perform an operation on my node?

Solution: You are most likely in the wrong management scope. Configuration can only be performed when a POD is selected.

How to fix: Open the Management Tree and click on the specific POD that contains the node you want to manage.

Q: I registered a new node, but I can't find it in the Node List.

Solution: The node is waiting in the global Inventory. It must be assigned to a POD before it can be managed.

How to fix: Go to System > Inventory. Find the node, select it, and click the Assign Device button to move it to your desired POD.

Hardware & Status Alerts

Q: The Dashboard shows a "CRITICAL" status for my POD. What should I do?

Solution: This status is a direct reflection of unresolved events in a node's BMC SEL.

Step-by-step fix:

  1. Go to MANAGE > Services (Redfish SEL Health tab) to quickly identify which node(s) are reporting a Critical status

  2. Use the BMC SEL shortcut for an affected node to jump directly to its event log

  3. In the BMC SEL tab, identify the hardware event. After resolving the physical issue, toggle the event's status to Resolved and click Apply
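For readers who script against a BMC directly, the triage in the steps above can be sketched as a simple filter. This is a minimal sketch, assuming the event log is available as Redfish-style log entries carrying the `Severity` and `Resolved` properties defined in the DMTF Redfish LogEntry schema. Whether a given BMC populates `Resolved`, and how EDCC stores its own Resolved flag, is not described in this manual. The function name and the sample entries are hypothetical.

```python
from typing import Any


def unresolved_critical(entries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return log entries that are Critical and not yet marked resolved."""
    return [
        e for e in entries
        if e.get("Severity") == "Critical" and not e.get("Resolved", False)
    ]


# Made-up sample data shaped like the "Members" array of a Redfish
# LogEntryCollection; real entries carry many more fields.
sample_members = [
    {"Id": "1", "Severity": "OK", "Message": "System boot completed"},
    {"Id": "2", "Severity": "Critical", "Message": "Fan 3 failure", "Resolved": False},
    {"Id": "3", "Severity": "Critical", "Message": "PSU 1 input lost", "Resolved": True},
]

for entry in unresolved_critical(sample_members):
    print(entry["Id"], entry["Message"])
```

Entries without a `Resolved` field are treated as unresolved, which errs on the side of showing an event rather than hiding it.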

Q: A specific node appears "Offline" in the Node List.

Solution: EDCC cannot communicate with the node's BMC.

Step-by-step diagnosis:

  1. Physical: Verify the node's BMC/management network port is physically connected

  2. Network: Ensure the BMC's IP address is reachable from the EDCC host server (e.g., using ping). Check that no firewall between the host and the BMC is blocking management traffic

  3. Credential: Go to the Node Detail > Summary page for that node and verify that the BMC Credential is correct
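A ping only shows that the address answers ICMP; a firewall can still block the management port. The network step above can be complemented with a TCP connection test run from the EDCC host. This is a minimal sketch: the function name is hypothetical, and port 443 is an assumption (the usual HTTPS port for BMC web and Redfish interfaces), not a value taken from this manual.

```python
import socket


def bmc_port_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, `bmc_port_reachable("192.0.2.10")` checks port 443 on a placeholder address from the documentation-only range; substitute the BMC's real IP address. A `False` result alongside a successful ping points toward a firewall rule or a disabled BMC service rather than cabling.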

Feature & Prerequisite Issues

Q: I invited a new user, but they never received the invitation email.

Solution: The SMTP server is likely misconfigured.

How to fix: Go to System > Application Settings > Mail Server. Verify all details are correct and use the Test button to confirm it's working.
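If the Test button fails and the settings look correct, it can help to rule out a network problem between the EDCC host and the mail server. The sketch below uses Python's standard `smtplib` to open a connection and send an SMTP NOOP. The function name and the default port 25 are assumptions; many servers expect port 587 for mail submission, so use whatever port is entered in the Mail Server settings. It does not test authentication or TLS.

```python
import smtplib


def smtp_reachable(host: str, port: int = 25, timeout: float = 5.0) -> bool:
    """Return True if an SMTP server accepts the connection and answers NOOP with 250."""
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as server:
            code, _ = server.noop()
            return code == 250
    except (OSError, smtplib.SMTPException):
        # Connection refused, timeout, DNS failure, or a non-SMTP response.
        return False
```

A `False` result from the EDCC host suggests a wrong host name, a wrong port, or a firewall, all of which would also prevent invitation emails from being sent.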

Q: The "Mount ISO Image" option doesn't work.

Solution: This feature depends on two other settings.

Step-by-step fix:

  1. Go to System > Application Settings > HTTPS File Server. Ensure File Sharing is enabled

  2. Go to CONFIGURE > OS Deployment. Ensure you have uploaded the necessary ISO file

Q: My POD-wide Service Profile changes don't affect a particular node.

Solution: The node has an individual configuration override enabled.

How to fix: Go to the Node Detail > Services page for that node. On the Subscription tab, disable the INDIVIDUAL SERVICES CONFIGURATION ENABLE toggle to make it inherit the POD policy again.

Performance & Platform Issues

Q: The EDCC web interface is slow or unresponsive.

Solution: Check host system resource usage.

Step-by-step diagnosis:

  1. Go to System > System Information > Summary tab

  2. Check the CPU, Memory, and Disk usage meters

  3. If any meter shows consistently high usage (>85%), consider upgrading host resources
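The word "consistently" matters in the last step: a single spike above 85% is not a reason to upgrade. One way to make that concrete is to collect several readings over time and flag the host only when all of them exceed the threshold. This is a minimal sketch run on the EDCC host itself, outside the product; the function names are hypothetical and only disk usage is measured here because it is available from the Python standard library.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return the percentage of space used on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def consistently_high(samples: list[float], threshold: float = 85.0) -> bool:
    """Return True only if there is at least one sample and all exceed threshold."""
    return bool(samples) and all(s > threshold for s in samples)


print(f"Disk usage: {disk_usage_percent():.1f}%")
print(consistently_high([91.0, 88.5, 93.2]))  # every reading is above 85
print(consistently_high([91.0, 40.0, 93.2]))  # one reading is below 85
```

The same `consistently_high` check applies to CPU or memory readings noted from the usage meters at different times of day.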

Q: Firmware updates fail to complete successfully.

Solution: Multiple potential causes need to be checked.

Step-by-step diagnosis:

  1. Verify that the Maintenance Window is correctly configured in CONFIGURE > Firmware Provisioning

  2. Check node connectivity and BMC communication status

  3. Ensure the firmware file is compatible with the target node model

  4. Review the BMC SEL on affected nodes for specific error details

Configuration & Setup Issues

Q: I can't see the CONFIGURE module in the menu.

Solution: You need to select a POD scope and have appropriate permissions.

How to fix:

  1. Ensure you have the POD Admin or Organization Admin role

  2. Use the Management Tree to select a specific POD (not Organization or Hierarchy View)

Q: Dashboard health status doesn't update after fixing hardware issues.

Solution: Events in the BMC SEL must be marked as resolved before the health status clears.

How to fix:

  1. Navigate to the affected node's Node Detail > BMC SEL tab

  2. Find the resolved hardware event and toggle its status to Resolved

  3. Click Apply to save the change

  4. The Dashboard status should update within a few minutes


Quick Reference: Permission Requirements

| Task | Required Permission | Access Location |
|---|---|---|
| View nodes and monitoring | POD Viewer or higher | Any scope with POD access |
| Configure nodes and PODs | POD Admin or Organization Admin | POD scope only |
| Manage users and permissions | Organization Admin only | Organization level |
| System settings and backups | Organization Admin only | System menu |
| Device inventory management | Organization Admin only | System > Inventory |
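The permission requirements above can be read as a minimum-role lookup. The sketch below is illustrative only: the role ordering (POD Viewer, then POD Admin, then Organization Admin) is inferred from the phrase "POD Viewer or higher" and is an assumption, the names `ROLE_RANK`, `MINIMUM_ROLE`, and `is_allowed` are hypothetical, and the scope restrictions in the Access Location column are not modeled.

```python
# Role ordering inferred from "POD Viewer or higher"; an assumption.
ROLE_RANK = {"POD Viewer": 1, "POD Admin": 2, "Organization Admin": 3}

# Minimum role per task, transcribed from the permission requirements above.
MINIMUM_ROLE = {
    "View nodes and monitoring": "POD Viewer",
    "Configure nodes and PODs": "POD Admin",
    "Manage users and permissions": "Organization Admin",
    "System settings and backups": "Organization Admin",
    "Device inventory management": "Organization Admin",
}


def is_allowed(role: str, task: str) -> bool:
    """Return True if role meets or exceeds the minimum role for task."""
    return ROLE_RANK[role] >= ROLE_RANK[MINIMUM_ROLE[task]]


print(is_allowed("POD Admin", "Configure nodes and PODs"))
print(is_allowed("POD Admin", "Manage users and permissions"))
```

Even when the role check passes, configuration tasks still require a POD to be selected in the Management Tree.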


Emergency Contact Information

If an issue cannot be resolved using this troubleshooting guide, escalate it to your support team. Before contacting support, gather this information:

  • EDCC Software Version (from System > System Information)

  • Node Serial Numbers (from Node Detail > Summary or System Information)

  • Error messages or screenshots of the problem

  • Steps taken to reproduce or resolve the issue

  • Recent changes made to the system before the problem occurred

This information will help support teams diagnose and resolve issues more quickly.