Update on Geek.Zone’s GitHub Organisation Automation: A Lesson in Testing and Boundary Identification

Our recent venture into automating membership of the Geek.Zone GitHub organisation hit an unforeseen snag this morning: while streamlining member management, our automation accidentally removed almost all members from the organisation.

The Cause: Membermojo’s Maintenance Window

Unbeknownst to us, Membermojo, the platform from which our automation downloads membership data, has a daily maintenance window from 00:00 to 01:30. This is something we should have identified beforehand. The file we downloaded during this window contained no member data, and when OpenTofu processed it, it interpreted the absence of members as an instruction to remove everyone from the GitHub organisation. The result was the unexpected removal of almost all our members from GitHub!
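To see why an empty file is so dangerous, here is a simplified Python model of how a declarative tool reconciles the desired member list against the organisation's current members. It is only an illustration: it does not call OpenTofu or the GitHub API, and the `github_username` column and the function names are hypothetical.

```python
import csv
import io


def desired_members(csv_text: str) -> set[str]:
    """Return the GitHub usernames listed in a (hypothetical) membership export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # An empty file yields no rows, so the desired set is simply empty.
    return {
        row["github_username"].strip()
        for row in reader
        if row.get("github_username")
    }


def plan_changes(current: set[str], desired: set[str]) -> tuple[set[str], set[str]]:
    """Compute (to_add, to_remove) the way a declarative reconcile step would."""
    to_add = desired - current
    to_remove = current - desired
    return to_add, to_remove


current = {"alice", "bob", "carol"}
normal_export = "github_username\nalice\nbob\ncarol\n"
empty_export = ""  # what arrived during the maintenance window

_, removals = plan_changes(current, desired_members(normal_export))
print("normal export removes:", sorted(removals))  # normal export removes: []
_, removals = plan_changes(current, desired_members(empty_export))
print("empty export removes:", sorted(removals))  # empty export removes: ['alice', 'bob', 'carol']
```

Nothing in the reconcile step is "wrong": it faithfully converges on the state it was given. The fault lies in trusting input that had silently become empty.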

A Silver Lining: GitHub’s Safety Mechanism

Thankfully, GitHub has a safety mechanism in place which prevents the removal of the last organisation owner. This meant that we still retained at least one member with owner status, averting a complete lockout from our organisation.

Learning and Adapting: The Importance of Testing and Understanding Project Boundaries

This incident highlights the importance of thorough testing and of understanding the full scope and limitations of a project. In particular, we learned that we must account for external factors such as the maintenance windows of third-party services.

Our Solution: Adjusting the Automation Schedule and Adding Checks

To prevent a recurrence of this issue, we’ve made the following crucial adjustments to our automation process:

  1. Time Zone Confirmation with Membermojo
    Because time zones are a perennial source of confusion, we asked Membermojo to confirm the time zone of their maintenance window. The window runs from 00:00 to 01:30, though it sometimes overruns to as late as 06:00, and it follows UK time, shifting between Greenwich Mean Time (GMT) and British Summer Time (BST) with daylight saving time. This matters because our automation schedule operates on Coordinated Universal Time (UTC).
  2. Altering the Schedule
    In light of this, our automation now runs only between 02:00 and 22:30 UTC. In UTC terms, Membermojo’s primary maintenance window falls at 00:00 to 01:30 while the UK is on GMT and at 23:00 to 00:30 while it is on BST, so this schedule avoids it all year round. Consequently, if you join Geek.Zone overnight, expect your GitHub invitation by 07:00 UTC.
  3. Adding a Safety Check
    To make the process more reliable, we’ve added checks that improve its fault tolerance. The first confirms that the member data file lists at least one GitHub organisation owner before any updates are applied, which prevents the accidental removal of all members. The second verifies the spreadsheet’s column headings to confirm that we have received actual member data. If either check fails, the automation halts without making any changes.
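To make the time-zone reasoning in steps 1 and 2 concrete, here is a minimal Python sketch, assuming Python 3.9+ with the standard `zoneinfo` module and the `Europe/London` IANA time zone. It converts Membermojo’s UK-time window into UTC and checks that it falls outside a 02:00 to 22:30 UTC run schedule. The function and constant names are hypothetical and not taken from our actual automation.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

UK = ZoneInfo("Europe/London")  # follows GMT in winter and BST in summer
RUN_START_UTC = time(2, 0)      # first allowed run of the day
RUN_END_UTC = time(22, 30)      # last allowed run of the day


def window_in_utc(day: date, start: time, end: time) -> tuple[datetime, datetime]:
    """Convert a UK local-time window on `day` into UTC datetimes."""
    start_local = datetime.combine(day, start, tzinfo=UK)
    end_local = datetime.combine(day, end, tzinfo=UK)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


def window_avoids_runs(day: date, end: time = time(1, 30)) -> bool:
    """True if the 00:00-to-`end` UK window lies entirely between runs."""
    start_utc, end_utc = window_in_utc(day, time(0, 0), end)
    # The previous day's runs stop at 22:30 UTC; this day's resume at 02:00 UTC.
    prev_run_end = datetime.combine(day - timedelta(days=1), RUN_END_UTC, tzinfo=timezone.utc)
    next_run_start = datetime.combine(day, RUN_START_UTC, tzinfo=timezone.utc)
    return prev_run_end <= start_utc and end_utc <= next_run_start


print(window_avoids_runs(date(2024, 1, 15)))                # True (GMT)
print(window_avoids_runs(date(2024, 7, 15)))                # True (BST)
print(window_avoids_runs(date(2024, 1, 15), end=time(6)))   # False (overrun)
```

Both a January (GMT) and a July (BST) date clear the schedule, but an overrun to 06:00 UK time does not, which is why a schedule change alone is not enough and the data checks are still needed.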

Collectively, these steps aim to ensure that our automation only ever acts on a data file that is current and complete, keeping the automated system effective and accurate.
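The safety checks in step 3 amount to validating the downloaded file before OpenTofu ever sees it. The Python sketch below shows one way to do that. The column names `github_username` and `github_role` and the `owner` role value are assumptions for illustration, since the real export format isn’t described here.

```python
import csv
import io

# Hypothetical headings; the real Membermojo export may use different names.
EXPECTED_HEADINGS = {"github_username", "github_role"}


class MemberDataError(Exception):
    """Raised when the downloaded member file must not be applied."""


def validate_member_file(csv_text: str) -> list[dict[str, str]]:
    """Return the member rows, or raise MemberDataError if the file looks wrong."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Check 1: the expected headings must be present, proving we got real member data.
    headings = set(reader.fieldnames or [])
    missing = EXPECTED_HEADINGS - headings
    if missing:
        raise MemberDataError(f"missing expected headings: {sorted(missing)}")

    rows = list(reader)
    # Check 2: at least one organisation owner must be listed.
    has_owner = any(
        (row.get("github_role") or "").strip().lower() == "owner" for row in rows
    )
    if not has_owner:
        raise MemberDataError("no organisation owner in member data; refusing to apply")
    return rows


sample = "github_username,github_role\nalice,owner\nbob,member\n"
print(len(validate_member_file(sample)))  # 2

try:
    validate_member_file("")  # an empty download, as during maintenance
except MemberDataError as err:
    print("halted:", err)
```

In a pipeline, a script like this would run before `tofu apply`, and a raised `MemberDataError` would end the job with a non-zero exit status so that no changes reach the organisation.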

Conclusion: A Step Forward with Caution

While this incident was an inconvenience, it gave us invaluable insight into the intricacies of automating complex systems. We’ve strengthened our processes, not only technically but also in how we approach project planning and testing. The improved automation is now far more resilient to incomplete data, which should prevent this kind of oversight from happening again. This experience has reinforced the old adage: “Test twice, apply once.”
