Appendix A: Best Practices and Troubleshooting
A collection of expert recommendations, professional tips, and solutions to common problems to help you get the most out of EDCC.
Overview: Your Quick Reference Guide
This appendix serves two key purposes:
A Best Practices Checklist: It consolidates the most important recommendations discussed throughout this manual into a single place, helping you establish an efficient, secure, and scalable management workflow.
A "First Aid" Troubleshooting Guide: It provides quick, step-by-step solutions to the most common problems and questions you may encounter. When something goes wrong, start here.
1. Best Practices Checklist
This section consolidates best practices into a scannable checklist, organized by operational area.
Initial Setup & Planning
Plan Your Hierarchy: Before adding nodes, map out your Organization and POD structure. Name PODs logically (e.g., "Taipei-IDC-Aisle-5," "Production-SQL-Cluster")
Configure System Services Early: Configure the Mail Server and HTTPS File Server in System > Application Settings as a first step. They are prerequisites for user invitations and OS deployment
Secure Default Credentials: Immediately change the default BMC password for your PODs in CONFIGURE > General Settings
Daily Monitoring & Operations
Start with the Dashboard: Begin your daily checks on the Dashboard. The POD Health widget is your most critical indicator
Use Groups for Efficiency: Organize nodes into Groups in the Node List for easier filtering and bulk operations
Keep POD View Updated: Treat POD View as your logical "source of truth." Keep it updated to reflect your real-world rack layouts
Maintenance & Updates
Automate Firmware Updates: Use the Firmware Provisioning feature for routine, fleet-wide updates. Schedule the Maintenance Window carefully during off-peak hours
Backup Before Changes: Always create a Configuration Backup before making significant changes to a POD. Create a System Backup before making platform-level changes
Protect Your "Golden Images": Use the Protect feature on known-good, stable Configuration Backups to preserve a reliable rollback point
Security Management
Follow the Principle of Least Privilege: Grant users the minimum level of permissions required. Use granular POD-level roles instead of Organization-level roles whenever possible
Regularly Audit Events: Periodically review the System Event log to audit administrative actions and track changes
2. Troubleshooting Common Issues
This section provides solutions to the most common questions and problems, grouped by category.
Scope & Permission Issues
Q: Why can't I edit settings or perform an operation on my node?
Solution: You are most likely in the wrong management scope. Configuration is only available when a POD is selected.
How to fix: Open the Management Tree and click on the specific POD that contains the node you want to manage.
Q: I registered a new node, but I can't find it in the Node List.
Solution: The node is waiting in the global Inventory. It must be assigned to a POD before it can be managed.
How to fix: Go to System > Inventory. Find the node, select it, and click the Assign Device button to move it to your desired POD.
Hardware & Status Alerts
Q: The Dashboard shows a "CRITICAL" status for my POD. What should I do?
Solution: This status is a direct reflection of unresolved events in a node's BMC SEL.
Step-by-step fix:
Go to MANAGE > Services (Redfish SEL Health tab) to quickly identify which node(s) are reporting a Critical status
Use the BMC SEL shortcut for an affected node to jump directly to its event log
In the BMC SEL tab, identify the hardware event. After resolving the physical issue, toggle the event's status to Resolved and click Apply
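If you prefer to cross-check outside the EDCC interface, the same triage can be sketched against raw Redfish data. The example below assumes you have already retrieved a Redfish log entry collection from the node's BMC as parsed JSON; the resource path for that collection varies by BMC vendor and is not shown. The function name and the sample payload are hypothetical and for illustration only; the only fields relied on are the standard Redfish Members list and each entry's Severity.

```python
from typing import Any


def critical_sel_entries(collection: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the log entries whose Redfish Severity is "Critical"."""
    members = collection.get("Members", [])
    return [entry for entry in members if entry.get("Severity") == "Critical"]


# Illustrative payload shaped like a Redfish log entry collection.
# The entries are made up for this sketch, not taken from a real BMC.
sample = {
    "Members": [
        {"Id": "1", "Severity": "OK", "Message": "Power supply inserted"},
        {"Id": "2", "Severity": "Critical", "Message": "Fan 3 failure"},
    ]
}

for entry in critical_sel_entries(sample):
    print(entry["Id"], entry["Message"])
```

This only helps you locate the events to investigate. Marking an event as Resolved is still done in the BMC SEL tab as described above.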
Q: A specific node appears "Offline" in the Node List.
Solution: EDCC cannot communicate with the node's BMC.
Step-by-step diagnosis:
Physical: Verify the node's BMC/management network port is physically connected
Network: Ensure the BMC's IP address is reachable from the EDCC host server (e.g., using ping), and check that no firewall blocks traffic between the two
Credential: Go to the Node Detail > Summary page for that node and verify that the BMC Credential is correct
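A ping can succeed while a firewall still blocks the management port, so a TCP connection test from the EDCC host is a useful complement to the network step above. The following is a minimal sketch, not an EDCC tool. It assumes the BMC accepts HTTPS connections on port 443, which you should confirm for your hardware, and the function name is hypothetical.

```python
import socket


def bmc_port_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run it from the EDCC host with the BMC's IP address, for example bmc_port_reachable("192.0.2.10"), where the address is a placeholder. A False result with a working ping suggests a firewall rule or a disabled BMC web service rather than a cabling problem.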
Feature & Prerequisite Issues
Q: I invited a new user, but they never received the invitation email.
Solution: The SMTP server is likely misconfigured.
How to fix: Go to System > Application Settings > Mail Server. Verify all details are correct and use the Test button to confirm it's working.
Q: The "Mount ISO Image" option doesn't work.
Solution: This feature has two prerequisites: File Sharing must be enabled on the HTTPS File Server, and the ISO file must already be uploaded.
Step-by-step fix:
Go to System > Application Settings > HTTPS File Server. Ensure File Sharing is enabled
Go to CONFIGURE > OS Deployment. Ensure you have uploaded the necessary ISO file
Q: My POD-wide Service Profile changes don't affect a particular node.
Solution: The node has an individual configuration override enabled.
How to fix: Go to the Node Detail > Services page for that node. On the Subscription tab, disable the INDIVIDUAL SERVICES CONFIGURATION ENABLE toggle to make it inherit the POD policy again.
Performance & Platform Issues
Q: The EDCC web interface is slow or unresponsive.
Solution: Check host system resource usage.
Step-by-step diagnosis:
Go to System > System Information > Summary tab
Check the CPU, Memory, and Disk usage meters
If any meter shows consistently high usage (>85%), consider upgrading host resources
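"Consistently high" is a judgment call. One way to make it concrete is to note the meter readings over a period and check what share of them exceed the 85% threshold. The sketch below assumes you record the readings yourself. The function name and the 80% cutoff for "consistently" are assumptions made for illustration, not EDCC settings.

```python
def sustained_high_usage(samples, threshold=85.0, min_fraction=0.8):
    """Return True if at least min_fraction of the samples exceed threshold.

    samples: usage percentages (0-100) noted at regular intervals.
    min_fraction: assumed cutoff for "consistently high" (not an EDCC setting).
    """
    if not samples:
        return False
    high = sum(1 for value in samples if value > threshold)
    return high / len(samples) >= min_fraction


# A single spike does not count as sustained load.
print(sustained_high_usage([90, 40, 35, 50, 45]))  # False
# Readings that stay above the threshold do.
print(sustained_high_usage([90, 92, 88, 95, 91]))  # True
```

Judging the trend instead of a single reading helps avoid upgrading host resources in response to a short-lived burst, such as a firmware rollout in progress.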
Q: Firmware updates fail to complete successfully.
Solution: Firmware updates can fail for several reasons. Work through the following checks in order.
Step-by-step diagnosis:
Verify that the Maintenance Window is correctly configured in CONFIGURE > Firmware Provisioning
Check node connectivity and BMC communication status
Ensure the firmware file is compatible with the target node model
Review BMC SEL on affected nodes for specific error details
Configuration & Setup Issues
Q: I can't see the CONFIGURE module in the menu.
Solution: You need to select a POD scope and have appropriate permissions.
How to fix:
Ensure you have the POD Admin or Organization Admin role
Use the Management Tree to select a specific POD (not Organization or Hierarchy View)
Q: Dashboard health status doesn't update after fixing hardware issues.
Solution: Events in BMC SEL need to be marked as resolved.
How to fix:
Navigate to the affected node's Node Detail > BMC SEL tab
Find the event for the hardware issue you fixed and toggle its status to Resolved
Click Apply to save the change
The Dashboard status should update within a few minutes
Quick Reference: Permission Requirements
Each entry lists the task, followed by the required role and the scope or location where the task is performed.
View nodes and monitoring: POD Viewer or higher; any scope with POD access
Configure nodes and PODs: POD Admin or Organization Admin; POD scope only
Manage users and permissions: Organization Admin only; Organization level
System settings and backups: Organization Admin only; System menu
Device inventory management: Organization Admin only; System > Inventory
Emergency Contact Information
If you need to escalate an issue that cannot be resolved using this troubleshooting guide, gather the following information before contacting support:
EDCC Software Version (from System > System Information)
Node Serial Numbers (from Node Detail > Summary or System Information)
Error messages or screenshots of the problem
Steps taken to reproduce or resolve the issue
Recent changes made to the system before the problem occurred
This information will help support teams diagnose and resolve issues more quickly.