Tuesday, October 16, 2012

OSG Operations at the EGI Technical Forum

Reflecting on the European Grid Initiative Technical Forum, which took place in Prague September 17-21, I am reminded of the maturity of the relationship that OSG Operations has with our European collaborators. We've had interoperating ticketing, information systems, topology, and monitoring tools for most of the life of the OSG project. We've recently added accounting to this list, and are now collaborating on communication and information dissemination efforts. Due to the diligent Operational effort on the parts of OSG, EGI, and WLCG, interoperability of these tools has become second nature.

I had the privilege of being accompanied by Bill Barnett, IU's Director of Science Community Tools, and of introducing him to our EGI and WLCG Operational peers. This gave him a chance to understand some of the technical and logistical challenges facing OSG Operations when working with international collaborators.

I gave presentations on current Operational status, future Operational plans, and OSG's plans for GLUE 2.0. While the current and future plans were in tune with EGI's and WLCG's direction, OSG is looking at new solutions for the information system and, unlike EGI Operations, is not immediately embracing the need for GLUE 2.0. Each presentation sparked welcome discussion and input from our EGI collaborators. I was also asked to do an interview with Grid Talk on the OSG and EGI interoperability relationship, available at http://gridtalk-project.blogspot.com/2012/09/rob-quick-from-open-science-grid-on.html.

Bill and I were also able to meet with the iSGTW European Editors and work out some of the initial plans for the new US-based iSGTW position that will be filled shortly. This new collaboration will sponsor a US Reporting Desk for iSGTW and will function as a satellite project to OSG.

Looking back at my presentations and several other presentations from our EGI and WLCG partners, the one idea that kept coming up in various fashions was how mature these relationships and tools have become over the several years I've been acting as the Operations Coordinator for OSG. This is due to continuous effort in the areas of evolution of services, change management, and constant communication. The effort of maintaining this ongoing relationship makes it possible to continue robust and reliable infrastructure services while expanding into new areas of collaboration.

Wednesday, August 8, 2012

Reflecting on the Ghana Grid School

As I sit in the final day of the African School for Fundamental Physics and its Applications (ASP) at Kwame Nkrumah University of Science and Technology in Kumasi, Ghana, I can't help but reflect on the grid portion of the course and my interactions with the students and my fellow instructors.

On arrival in Accra, Ghana, I met up with some longtime OSG and DOSAR VO collaborators: Horst Severini from Oklahoma University and Scot Kronenfeld from the University of Wisconsin Madison. I was also introduced to a few new colleagues who would be joining us during the Grid School: Julia Gray from CERN and Stony Brook University, and Pat Skubic from Oklahoma University. After a short flight from Accra to Kumasi, we spent several days configuring the machines in the classroom to act as a Condor cluster and working through the lessons one last time, looking for any final adjustments to the course material. We were joined by a few other well-known OSG faces at the end of the weekend, Dick Greenwood from Louisiana Tech and Jae Yu from the University of Texas Arlington, completing the instructional staff.

We were able to make our first visit to the classroom on Friday evening and found a modern computer classroom with a reliable, though not necessarily robust, network. After a few hours of tweaking the machines in the classroom, we were confident the school was ready to begin on Monday morning.

The weekend included a bit of free time. We did some shopping, visited the woodworking village Ahwaii on the outskirts of Kumasi on Saturday, and held a session with ASP students on Sunday, where each US school represented talked with students about graduate student opportunities.

Monday brought the start of class and, much to my surprise, a longtime friend and former South Padre Grid School (2004) roommate, Addy Tettah, who was in attendance to help with instruction. The lectures by Scot introducing Condor and HTC concepts were excellent, but the hands-on exercises were a challenge, mostly due to the students being unfamiliar with UNIX navigation.

Addy and Rob renewing a long time friendship.

It was my turn Tuesday, recapping day one and introducing the students to the Open Science Grid, BLAST, DHTC and Glide-Ins. Hands-on exercises went much faster, mostly due to the students quickly grasping basic UNIX. Horst took the afternoon session to introduce storage and SRM. Real science concepts such as BLAST seemed to excite and energize the learning process. I left day two both elated and exhausted.

This morning we are back to physics and examples based on the ROOT software. Time is flying and we have the afternoon scheduled to talk with students about their experience and answer any questions they have for us.

It's been a whirlwind trip, but I'm honored to have been a part of ASP 2012 and am very proud of all the hard-working students. Seeing the increase in knowledge and understanding in such a short time is extremely rewarding. I've greatly enjoyed my time in Ghana, and working with collaborators and future grid users has been gratifying. With the extra bonus of renewing an acquaintance after several long years, on his home turf, this has turned into an unforgettable experience I will treasure for a very long time. I look forward to hearing from these students as they begin their scientific careers and to returning for the next Grid School, no matter where it happens to be held.
Students and Instructors of the ASP2012 Grid School.

Rob Quick
OSG Operations Area Coordinator
Indiana University

Monday, May 7, 2012

CondorWeek 2012

During the week of May 1-4, I attended CondorWeek 2012 in Madison, WI.  The event was hosted in the Wisconsin Institutes for Discovery at the University of Wisconsin-Madison.

The first day consisted of tutorials explaining the basics of Condor usage, workflows, administration, and security.  Having spent more time doing general support with OSG and working with the support workflow, I haven't spent as much time as I'd like with Condor, but the tutorials did show me some things that I would like to try.  I think that next year, these tutorials would be even more useful for me after trying some new things with Condor in the coming months.

The final three days consisted of researchers, admins, and companies showing off and discussing the multitude of ways that they use Condor to enhance and extend their workflows.  Of particular interest to me was the way that DreamWorks used Condor to send jobs to their rendering software to create their animated films.  They showed the trailer for Madagascar 3, which is their first movie to be created using Condor from beginning to end in the rendering process.

It was interesting to hear other outfits explain how they have had to work around some things Condor currently can't do, and to see how curious the Condor team was about whether these things could be implemented in coming versions of Condor.  The team was very interested in expanding and enhancing Condor, which is already a very capable product.

Overall it was a good experience that made me aware of all the things that can be done with Condor.  I look forward to trying some of these things on my own, and hopefully returning next year with a much better understanding of Condor as a whole so that I can gain even more information from the talks and tutorials.

Thursday, March 29, 2012

RabbitMQ & CometD

We've been experimenting with the following new services over the last few months here at the GOC.
  • event.grid.iu.edu (RabbitMQ/AMQP Server)
  • comet.grid.iu.edu (CometD Server) 
RabbitMQ (event.grid.iu.edu) allows the GOC and OSG users to publish and/or subscribe to various messages generated by our services, and by OSG in general. Currently, we receive the following messages.
  • RSV status changes
  • OIM updates
  • GOC Ticket updates
  • GIP information changes (prototype)
Anyone can subscribe to these messages, which are in XML format, and be notified in real time using an AMQP messaging client. Client APIs are available in many languages, including Java, PHP, and Python.
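
As a rough illustration of what such a subscriber might look like in Python, here is a minimal sketch using the pika AMQP client. The exchange name, the routing key, and the XML layout in the sample are placeholders invented for this example and are not the actual configuration or message schema of event.grid.iu.edu. The sketch also assumes a topic exchange and leaves out any credentials, port, or TLS settings the server may require.

```python
import xml.etree.ElementTree as ET

# Hypothetical values: placeholders, not the real settings of event.grid.iu.edu.
EXCHANGE = "example_exchange"
ROUTING_KEY = "#"  # on a topic exchange, "#" matches every routing key


def parse_event(body):
    """Parse an XML message body into (root tag, {child tag: text})."""
    root = ET.fromstring(body)
    return root.tag, {child.tag: (child.text or "").strip() for child in root}


def on_message(channel, method, properties, body):
    """Callback invoked by pika for each delivered message."""
    tag, fields = parse_event(body)
    print(method.routing_key, tag, fields)


def subscribe(host="event.grid.iu.edu"):
    """Bind a temporary queue to the exchange and print events as they arrive."""
    import pika  # third-party client; imported here so parse_event works without it

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    # Server-named, exclusive queue that is removed when we disconnect.
    queue = channel.queue_declare(queue="", exclusive=True).method.queue
    channel.queue_bind(exchange=EXCHANGE, queue=queue, routing_key=ROUTING_KEY)
    channel.basic_consume(queue=queue, on_message_callback=on_message, auto_ack=True)
    channel.start_consuming()  # blocks until interrupted
```

Calling subscribe() would block and print each message as it arrives; parse_event can be tried on its own with any small XML document.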

CometD (comet.grid.iu.edu) allows us to push messages to our various web applications. For example, GOC Ticket uses it to display users who are currently viewing a ticket. If someone updates a ticket while someone else is viewing it, a page refresh request is sent to all viewers. CometD can also be used to implement features such as chat, shared editing, and other functionality.

CometD itself is a Java application framework in which we can implement various services that clients (web browsers) can make requests to. CometD acts as glue between RabbitMQ and the web browsers. For example, the "GOC event service" on comet.grid.iu.edu subscribes to RSV, OIM, and GOC Ticket events, and pools all recent events. A web browser can then request these events during the initial loading of a page, and then subscribe to the "new event" queue on comet in order to receive new events in real time until the user closes the page.
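
The pattern described above (pool recent events, serve them on page load, then push new ones until the page closes) can be sketched in a few lines. This is a plain Python illustration of the idea only, not CometD's Java API and not the code running on comet.grid.iu.edu; the class name, the pool size, and the event strings are all made up for the example.

```python
from collections import deque


class EventPool:
    """Keeps the most recent events and pushes new ones to subscribers."""

    def __init__(self, max_recent=50):
        self._recent = deque(maxlen=max_recent)  # oldest events fall off the front
        self._subscribers = []

    def publish(self, event):
        """Store an event and push it to every current subscriber."""
        self._recent.append(event)
        for callback in list(self._subscribers):
            callback(event)

    def recent(self):
        """What a page downloads during its initial load."""
        return list(self._recent)

    def subscribe(self, callback):
        """Register for new events; returns a function that unsubscribes."""
        self._subscribers.append(callback)
        return lambda: self._subscribers.remove(callback)


pool = EventPool(max_recent=2)
pool.publish("RSV status change")
pool.publish("OIM update")
pool.publish("GOC Ticket update")

page = pool.recent()                       # initial page load
unsubscribe = pool.subscribe(page.append)  # then listen for new events
pool.publish("RSV status change 2")        # pushed to the open page
unsubscribe()                              # user closes the page
pool.publish("OIM update 2")               # no longer delivered to that page
print(page)
```

Bounding the pool keeps memory use flat no matter how long the service runs, at the cost of new pages seeing only the latest few events.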

By using the Event & Comet services, we can implement interesting features such as the Realtime GOC event (prototype) in MyOSG. My current goal is to continue experimenting with RabbitMQ/CometD and see what I can (and cannot) accomplish using these tools.

If anyone has an idea about what we can do with these tools, please feel free to send me a message.

Wednesday, February 15, 2012

New Home for OSG Web Site

On February 28th the OSG web pages located at www.opensciencegrid.org will move to the OSG Twiki. This move corresponds with the upcoming conclusion of a contract with the Chicago-based web hosting service Tilted Planet.

During the scheduled production service update on the 28th, browsers will be redirected from the current location to the new location on the main OSG Twiki page. A mockup of the new page can be seen at twiki-itb.grid.iu.edu.

This is the first step, and likely an interim web page home, in a project that will affect the OSG Public Web Pages, the OSG Twiki, the DocDB, and possibly other OSG services with web UIs. Evaluation of content management systems, wikis, and documentation file database solutions has already begun and will continue over the next several months. Please contact the GOC (goc@opensciencegrid.org) if you have a suggestion for packages you think should be evaluated.

Monday, November 21, 2011

GOC holiday schedule

From 24/Nov through 27/Nov the GOC will be operating on a Holiday
schedule. Staff will be available to respond to emergencies but
routine operations will resume at start of business Monday 28/Nov.

The GOC wishes its users and OSG staff a happy and satisfying
Thanksgiving Holiday.

Thursday, November 3, 2011

Moving Services to Bloomington

As you know, the GOC updates services on the second and fourth Tuesday of each month.
The update scheduled for November 8th marks a milestone for the infrastructure team.
After this date all GOC services (with one exception) will be hosted exclusively in
the Bloomington, Indiana data center.
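
For anyone who wants to work out upcoming update days, the "second and fourth Tuesday"
rule is easy to compute. The snippet below is only a convenience sketch; the function
name is made up here and it is not a GOC tool.

```python
import calendar
from datetime import date


def goc_update_days(year, month):
    """Return the second and fourth Tuesdays of the given month."""
    cal = calendar.Calendar()
    tuesdays = [
        d for d in cal.itermonthdates(year, month)
        # itermonthdates pads with days from adjacent months, so filter them out
        if d.month == month and d.weekday() == calendar.TUESDAY
    ]
    return [tuesdays[1], tuesdays[3]]


print(goc_update_days(2011, 11))
```

For November 2011 this gives the 8th and the 22nd, matching the update date above.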

Previously, most services had two instances, one physically hosted in Indianapolis,
the other in Bloomington. These instances are in DNS round robin allowing users
of these services transparent use of either instance. The GOC will continue to
operate (at least) two instances and keep them in round robin, but both instances
will be in Bloomington.

So why the change? Originally, the Bloomington machine room was extremely unreliable.
Problems included a leaky roof, insufficient cooling and power and a lack
of space. In short, the systems hosted there had outgrown the facility. The machine
room in Indianapolis was larger, newer and considered more reliable. The old Bloomington
machine room went down during a thunderstorm when it was discovered that both electrical
feeds were, at one point, hung from the same utility pole. (Care to guess where the
lightning struck?) Two weeks were required to restore power during which many of the
university enterprise services were unavailable. This situation was clearly unacceptable
so the university decided to invest $37.2M in a new, state-of-the-art data center.

The 92,000 sq. ft. Bloomington data center is designed to withstand category 5 tornadoes.
The facility is secured with card-key access and 7 x 24 x 365 video surveillance.
Only staff with systems or network administration privileges have access to the machine room,
which requires biometric identity verification. Fire suppression is provided by a double interlock
system accompanied by a Very Early Smoke Detection Apparatus (VESDA). Three circuits feed
the Data Center, traveling redundant physical paths from two different substations.
Any two circuits can fully power the building. A flywheel motor/generator set conditions
the power and provides protection against transient events, and uninterruptible power
supplies protect against failures of moderate (~1 hour) duration. Dual diesel generators
can provide power for 24 hours in the event of a longer term power failure. In-house
chillers provide cooling. Externally supplied chilled water plus city water can be used
in the event of a failure of this system.

Several advantages are realized by hosting all instances in one location. Service failures
associated with the network between Indianapolis and Bloomington are avoided. By using the
same LAN, DNS round robin can be replaced with Linux Virtual Server (LVS) giving control
of round robin to the GOC rather than the DNS administrators at Indiana University. Also
avoided are failures associated with the loss of one of two data centers. It is trivial to
move virtual machines from host to host since the IP address of the VM does not change, a
property allowing detailed load balancing on all VM hosts.
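
To illustrate the kind of control this gives, here is a small sketch of round-robin
scheduling that skips instances marked as down. It is an illustration only: LVS does
its scheduling inside the Linux kernel and is configured by the administrator rather
than written as application code, and the class and instance names below are
hypothetical.

```python
from itertools import cycle


class RoundRobin:
    """Hands out servers in turn, skipping any that are marked down."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._ring = cycle(self._servers)

    def mark_down(self, server):
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def next_server(self):
        # Try each server at most once per call before giving up.
        for _ in range(len(self._servers)):
            server = next(self._ring)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")


rr = RoundRobin(["instance1", "instance2"])  # hypothetical instance names
before = [rr.next_server() for _ in range(4)]
rr.mark_down("instance1")                    # e.g. taken out for maintenance
after = [rr.next_server() for _ in range(2)]
print(before, after)
```

With plain DNS round robin, a client may keep using a cached address for a failed
instance until the record expires; a local scheduler can stop sending traffic to it
immediately.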

The GOC looks forward to continuing to provide services with the availability OSG users
have come to expect.