Monday, November 21, 2011

GOC holiday schedule

From 24/Nov through 27/Nov the GOC will be operating on a holiday
schedule. Staff will be available to respond to emergencies, but
routine operations will resume at the start of business on Monday, 28/Nov.

The GOC wishes its users and OSG staff a happy and satisfying
Thanksgiving Holiday.

Thursday, November 3, 2011

Moving Services to Bloomington

As you know, the GOC updates services on the second and fourth Tuesday of each month.
The update scheduled for November 8th marks a milestone for the infrastructure team.
After this date, all GOC services (with one exception) will be hosted exclusively in
the Bloomington, Indiana data center.

Previously, most services had two instances, one physically hosted in Indianapolis
and the other in Bloomington. These instances were placed in DNS round robin, allowing
users of these services to transparently use either instance. The GOC will continue to
operate (at least) two instances and keep them in round robin, but both instances
will now be in Bloomington.
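As a rough illustration of the round-robin arrangement described above (the hostname and addresses below are placeholders, not actual GOC records): a round-robin name simply publishes multiple A records, and resolvers rotate their order so clients are spread across the instances.

```shell
# Illustrative sketch only: placeholder hostname and TEST-NET addresses.
# A DNS round-robin name returns multiple A records; most clients connect
# to the first address returned, and the rotating order spreads the load.
dig +short service.example.org A
# Typical output (order varies from query to query):
#   192.0.2.11
#   192.0.2.12
```

Note that with this scheme the rotation is performed by the DNS servers, which is why control rests with whoever administers the zone.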

So why the change? Originally, the Bloomington machine room was extremely unreliable.
Problems included a leaky roof, insufficient cooling and power, and a lack
of space. In short, the systems hosted there had outgrown the facility. The machine
room in Indianapolis was larger, newer, and considered more reliable. The old Bloomington
machine room went down during a thunderstorm, when it was discovered that both electrical
feeds were, at one point, hung from the same utility pole. (Care to guess where the
lightning struck?) Two weeks were required to restore power, during which many of the
university's enterprise services were unavailable. This situation was clearly unacceptable,
so the university decided to invest $37.2M in a new, state-of-the-art data center.

The 92,000 sq. ft. Bloomington data center is designed to withstand category 5 tornadoes.
The facility is secured with card-key access and 7 x 24 x 365 video surveillance.
Only staff with systems or network administration privileges have access to the machine room,
and entry requires biometric identity verification. Fire suppression is provided by a double-interlock
system accompanied by a Very Early Smoke Detection Apparatus (VESDA). Three circuits feed
the Data Center, traveling redundant physical paths from two different substations.
Any two circuits can fully power the building. A flywheel motor/generator set conditions
the power and provides protection against transient events and uninterruptible power
supplies protect against failures of moderate (~1 hour) duration. Dual diesel generators
can provide power for 24 hours in the event of a longer-term power failure. In-house
chillers provide cooling; externally supplied chilled water plus city water can be used
in the event of a failure of this system.

Hosting all instances in one location brings several advantages. Service failures
associated with the network between Indianapolis and Bloomington are avoided. Because both
instances share the same LAN, DNS round robin can be replaced with Linux Virtual Server (LVS),
giving control of round robin to the GOC rather than to the DNS administrators at Indiana
University. Failures associated with the loss of one of two data centers are also avoided.
And because the IP address of a VM does not change when it moves from host to host, virtual
machines can be migrated trivially, allowing fine-grained load balancing across all VM hosts.
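For readers unfamiliar with LVS, the switch from DNS round robin might look something like the sketch below. This is a minimal, hypothetical example using the standard ipvsadm tool with placeholder TEST-NET addresses; it is not the GOC's actual configuration.

```shell
# Minimal LVS sketch (hypothetical addresses, not the GOC's real config).
# ipvsadm configures the kernel's IP Virtual Server: clients connect to a
# single virtual IP, and the director round-robins connections across the
# real backend instances -- rotation now happens locally, not in DNS.
ipvsadm -A -t 192.0.2.10:80 -s rr              # define virtual service, round-robin scheduler
ipvsadm -a -t 192.0.2.10:80 -r 192.0.2.11:80 -m  # add real server 1 (NAT forwarding)
ipvsadm -a -t 192.0.2.10:80 -r 192.0.2.12:80 -m  # add real server 2
```

Because the director and both real servers sit on the same LAN, backends can be added, drained, or weighted on the spot without waiting for DNS changes to propagate.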

The GOC looks forward to continuing to provide services with the availability OSG users
have come to expect.

Tuesday, November 1, 2011

EGI Technical Forum

Wrote this for the OSG Newsletter, but thought it would be good to drop here also.

Henry Kissinger famously asked, “Who do I call if I want to call Europe?” I was reminded of this quote when I attended the EGI Technical Forum in Lyon, France and then visited CERN in September. While I don’t need to call Europe as a whole, as Operations Coordinator for OSG I may need to contact any one of the 30+ National Grid Infrastructures (NGIs) that make up the European Grid Infrastructure (EGI) in the event of an operational crisis.

Scott Teige, the OSG Operations Technical Lead, and I presented material on both technical and personal communications between OSG, WLCG, and EGI, including Global Grid User Support System (GGUS) ticket synchronization, availability and reliability reporting, Berkeley Database Information Index (BDII) information exchange, and various other ways to keep communication channels open between OSG and our European counterparts. We came away with several technical action items from the SAM team. In addition, OSG Operations took a seat as a non-voting member of the EGI Operations Management Board. We are also participating in the WLCG Operations Technical Evaluation Group, which will provide input to chart the future activities within the WLCG. We look forward to continued collaboration with WLCG and EGI on an operational level.

~ Rob Quick