[Leaplist] 24-7 redundancy for poe folk

Bryan J Smith b.j.smith at ieee.org
Mon Jul 6 04:33:56 EDT 2009


I wasn't going to respond, as I could easily overload you and the group
with suggestions.  But I've noticed several people have hit on several
concepts that now lead into what I wanted to suggest (e.g., heartbeats,
monitoring, general clustering, etc...).  There are a lot of upstream
projects (Fedora/Fedora Hosted), Red Hat** products (e.g., Enterprise
Linux, EL), "EL rebuilds"** (e.g., CentOS) and related solutions,
because even the smallest of enterprises need to solve these problems
every day.  I mention the Fedora/Red Hat related details because I know
you've used them or related downstream projects, such as SME Server,
now based on CentOS.

Some things to be aware of ...
- Red Hat Cluster Suite (CS), including Conga, Luci and Ricci
- SNMP agents, Nagios and other, SNMP-based Monitoring
- Red Hat Network (RHN) IM (Jabber) and SSH-based Probing/Monitoring

I understand you're going to be monitoring these systems remotely.  I am
also the first to point out that sometimes clustering can complicate
things to the point it's more of a support issue than if the system goes
down.  I.e., free software services running on GNU/Linux platforms tend
to be darn stable, so sometimes it's more of a support burden to use
clustering.  At the same time, these projects/products/solutions exist
for a reason, even for one-off systems, remote offices, etc...

Just some ideas to throw at you that I wanted you to be aware of.  A lot
of these things will be "overkill," but it never hurts to start playing
with them when you get time, to see if they will work for you.  Things
that might give Domain-Logic and/or your clients, customers, friends,
non-profits, etc... some added capabilities.

[ **NOTE:  Remember, one doesn't always have to use an "EL rebuild" to
get "EL."  Red Hat realized long ago that it's "cheaper" to "give things
away" than to charge less for something and try to support it (hence
some of the changes during Red Hat Linux 7.x that eventually led to many
things beyond just the Fedora trademark).  So Red Hat does offer options
for education, charities, non-profit organizations, etc... as long as
they don't tap traditional support channels, such as service level
agreement (SLA) response times, etc...  Also, for home use, I always
remind people (especially those studying for the RHCE) that Red Hat does
offer a $99/year subscription with "the kitchen sink" known as JBoss
Developer Studio, which includes the high-end Red Hat Enterprise Linux
Server, Advanced Platform, and full RHN access.  Think of it as the
equivalent of MSDN, for those that know what MSDN is. ]


- Red Hat Cluster Suite (CS), including Conga, Luci and Ricci

To start, the Red Hat Cluster Suite (CS) includes a web-based management
system called Conga.  Luci is the server component and her servant,
Ricci, is the client.  ;)  Understand that Luci doesn't need to be on
any system in the cluster (although ricci instances do).  Conga is the
easiest way to set up heartbeat, monitoring and other details related to
ensuring services are running.  Doing it under Xen makes storage fencing
cake.
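
To make the heartbeat idea concrete, here is a rough Python sketch of
what a heartbeat monitor boils down to: remember when each node last
checked in, and flag any node that has been silent longer than a
timeout.  The class name, node names and timeout are all made up for
illustration.  This is not how Cluster Suite implements it, just the
general concept.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each node (hypothetical helper)."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.last_seen: Dict[str, float] = {}

    def record(self, node: str, now: Optional[float] = None) -> None:
        """Note that `node` just sent a heartbeat."""
        self.last_seen[node] = time.monotonic() if now is None else now

    def failed_nodes(self, now: Optional[float] = None) -> List[str]:
        """Return the nodes whose last heartbeat is older than the timeout."""
        current = time.monotonic() if now is None else now
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if current - seen > self.timeout_seconds
        )
```

A real cluster manager would act on the result of failed_nodes() by
fencing the node and relocating its services.  The sketch stops at
detection.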

If you really want it for software/instance failover, then 2-3 Xen domU
instances (2 if using a quorum disk) on the same, single Xen dom0 server
is cake.  That way the service will move between domU instances if one
instance fails.  If your hardware is redundant (power, disk, etc...),
then a single system should keep running the Xen dom0 without issue,
short of a mainboard or other solid state failure.
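
On the "2 if using a quorum disk" remark, the arithmetic is easier to
see worked out.  The sketch below assumes a plain majority-vote model
where every node and the quorum disk carry one vote each.  Actual
cluster software has its own rules and tunables, so treat the function
names and vote counts as illustrative assumptions only.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    if expected_votes < 1:
        raise ValueError("expected_votes must be at least 1")
    return expected_votes // 2 + 1


def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """True if the surviving votes still form a majority."""
    return votes_present >= quorum_threshold(expected_votes)


# Three instances, one vote each: losing one leaves 2 of 3 votes.
three_nodes_one_down = has_quorum(votes_present=2, expected_votes=3)

# Two instances, no quorum disk: losing one leaves 1 of 2 votes.
two_nodes_one_down = has_quorum(votes_present=1, expected_votes=2)

# Two instances plus a one-vote quorum disk: the survivor plus the
# disk hold 2 of 3 votes.
two_nodes_qdisk_one_down = has_quorum(votes_present=2, expected_votes=3)
```

Under this model a bare 2-instance cluster cannot survive the loss of
one instance, which is why the third vote (another instance or a quorum
disk) matters.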

If you want it for hardware failover, then you need 3 physical systems
(2 for the platforms running the service, 1 more as an iSCSI target).
Red Hat Enterprise Linux Release 5 Update 3 (EL 5.3) supports this
functionality and is mature (especially iSCSI targets, which weren't
mature enough before 5.3).
I would advise against GNBD for storage now that iSCSI targets are
mature in EL 5.3 (not that I would ever recommend GNBD for production
usage).  Again, Xen-based storage fencing is most ideal for instances
(even for multiple, physical systems).

Although, again, multiple, physical systems with shared storage may be
overkill in general.  Of course, NFS is also an option for read-only
data to multiple servers.  But iSCSI targets + GFS is mature in EL 5.3,
and trusted by many clients/customers.  YMMV on "EL rebuilds," although
most build the full "AS" (EL 2-4) / "Server, Advanced Platform" (EL 5)
versions with all components (as everything Red Hat releases is GPL and
provided in SRPMS form).

Related documentation:  
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Cluster_Administration/index.html  
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Global_File_System/index.html  


- SNMP agents, Nagios and other, SNMP-based Monitoring

If you are deploying EL, there are several SNMP agents, as well as the
Nagios monitoring framework (and even a few other SNMP options) in the
Fedora(TM) Project's Extra Packages for Enterprise Linux (EPEL):  
  http://fedoraproject.org/wiki/EPEL  

As always, the thing to remember about Fedora(TM) v. Red Hat(R) is that
Fedora is Red Hat's "community trademark" and Red Hat(R) is its
"commercial branding" (by and of a publicly traded company).  I.e.,
things that interface upstream or downstream, or have components that
are regularly utilized but are not supported under Service Level
Agreements (SLA) by Red Hat and are not related to Independent Hardware
/ Software Vendor (IHV/ISV) certification, get the Fedora(TM) branding.
It does not necessarily mean that different Red Hat employees work on
them than on the things with the Red Hat(R) branding.

EPEL is one of those projects where "Fedora(TM)" consolidates the
official Red Hat interest into a single repository for any "EL"
release.  So you
can safely tap EPEL on any EL system, right down to the EPEL GPG key for
validation of packages as authentic.

They are still on Nagios 2, although there have been some discussions on
upgrading to a newer version.  I suggested they append the new version
to the package name.  This is commonly done in EL product management as
well, to introduce a "new feature" while not making it the default.  So
be on the lookout for a new Nagios package.  There are countless
plug-ins available on EPEL as well.
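
If you end up writing your own plug-in, the convention is simple: print
one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or
3 (UNKNOWN).  Below is a minimal Python sketch of a disk check along
those lines.  The thresholds, function names and message wording are my
own choices for illustration, not taken from any packaged plug-in.

```python
import shutil
from typing import Tuple

# Exit codes used by the Nagios plug-in convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(percent_used: float, warn: float, crit: float) -> int:
    """Map a usage percentage onto a plug-in status code."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> Tuple[int, str]:
    """Return (status code, one-line message) for the filesystem at `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    percent = 100.0 * usage.used / usage.total
    status = classify(percent, warn, crit)
    return status, f"DISK {LABELS[status]} - {path} is {percent:.1f}% used"


def main() -> int:
    """Print the status line and return the exit code for the caller."""
    status, message = check_disk()
    print(message)
    return status
```

To use it as an actual plug-in script, finish the file with
"raise SystemExit(main())" so the status code becomes the process exit
code.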

I know there are other SNMP systems out there.  You also don't have to
use an SNMP management system to get SNMP-based traps/alerts.  You can
use SMTP.  I.e., if your clients don't like SNMP on their network,
remember you can make it local-only.  In fact, net-snmp services default
to local-only access.  You can then trap on SNMP agents and send out
SMTP e-mails.  There are many generic SNMP agents that can trap on
messages and other syslog entries as well.
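
As a rough illustration of the "trap locally, alert over SMTP" idea,
here is a Python sketch that turns the details of a trap into an e-mail.
How the trap details reach the script is left out, since that depends
on your agent setup.  The function names, addresses and the OID in the
usage note are placeholders I made up, and the sketch assumes a mail
server listening on localhost port 25.

```python
import smtplib
from email.message import EmailMessage


def build_alert(host: str, oid: str, value: str,
                sender: str, recipient: str) -> EmailMessage:
    """Build an alert e-mail from the details of a received trap."""
    msg = EmailMessage()
    msg["Subject"] = f"[SNMP alert] {host}: {oid}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Host:  {host}\nOID:   {oid}\nValue: {value}\n")
    return msg


def send_alert(msg: EmailMessage, smtp_host: str = "localhost",
               smtp_port: int = 25) -> None:
    """Hand the alert to an SMTP server (assumed to be local here)."""
    with smtplib.SMTP(smtp_host, smtp_port, timeout=10) as smtp:
        smtp.send_message(msg)
```

Usage would be something like
send_alert(build_alert("server1.example.com", "1.3.6.1.4.1.99999.1",
"95", "monitor@example.com", "admin@example.com")), so the only thing
leaving the box is ordinary mail.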


- Red Hat Network (RHN) IM (Jabber) and SSH-based Probing/Monitoring

And if you don't want to run SNMP at all, or your client doesn't even
allow it on the local system, I want to mention one other option.  The
Red Hat Network (RHN) standalone version, RHN Satellite Server, offers
IM-based (Jabber, part of the osad service) and SSH-based
Probing/Monitoring.  This is an option I utilize for clients/customers
who can't allow SNMP.  Ironically, even though RHN itself utilizes
Oracle, its SSH-based Monitoring uses a separate MySQL store (don't get
me started on the evolution of RHN, it's a decade old ;).
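
For a feel of what an SSH-based probe amounts to, here is a small Python
sketch: run a command on the remote system over ssh and treat a zero
exit status as healthy.  This is not how RHN Satellite implements its
probes.  The user name, host and remote command are hypothetical, and
it assumes key-based (non-interactive) login is already set up.

```python
import subprocess
from typing import List, Tuple


def build_probe_command(host: str, remote_command: str,
                        user: str = "monitor", timeout: int = 10) -> List[str]:
    """Build the ssh command line for a non-interactive probe."""
    return [
        "ssh",
        "-o", "BatchMode=yes",              # never prompt for a password
        "-o", f"ConnectTimeout={timeout}",  # give up on unreachable hosts
        f"{user}@{host}",
        remote_command,
    ]


def run_probe(host: str, remote_command: str,
              user: str = "monitor", timeout: int = 10) -> Tuple[bool, str]:
    """Run the probe and return (healthy, detail)."""
    cmd = build_probe_command(host, remote_command, user, timeout)
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout + 5)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"probe failed to run: {exc}"
    if result.returncode == 255:
        # ssh reports its own errors as 255 (a remote command could also
        # return 255, so this is only a best guess).
        return False, "ssh error: " + result.stderr.strip()
    return result.returncode == 0, result.stdout.strip()
```

E.g., run_probe("web1.example.com", "service httpd status") would
report whether the remote status command exited cleanly.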

I mention this because RHN is now open source as Spacewalk:  
  https://fedorahosted.org/spacewalk/  

It's what several "EL rebuild" projects are moving to as a management
platform as well.  E.g., you can import CentOS releases and trees and
kickstart from Spacewalk.  Cobbler/Koan and other Red Hat Emerging
Technologies (ET) projects are being integrated.  In a nutshell,
Spacewalk is the
upstream for RHN starting with 5.3.  One of the big changes coming is
Delta-RPM support (added to the newer version 4.6 of the RPM back-end),
which allows an "intelligent" deployment, management and update server
to dynamically generate and offer only the changes (drastically reducing
download size), something you can't do with static YUM repositories.
RHN and, therefore, Spacewalk use YUM 3.x with plug-ins, as will
Delta-RPM (up2date died in 2006, and is now only used for EL releases
in "maintenance-only" mode -- e.g., EL 2-4).

Although Spacewalk hasn't achieved the re-write to support PostgreSQL
instead of Oracle (long story short, RHN used stored procedures, all of
which are being moved out of the DB, only 4 will remain), one can run
the Oracle XE (Express Edition) for now (with limitations).  Many management
"probes" in the monitoring service already exist for various, standard
Linux services.  It's a nice, additional option to SNMP.  And with a
full Spacewalk server, you can remotely manage and schedule your
clients' server operations, ensure they stay current, deploy new
configuration files (although Spacewalk is likely to adopt Bcfg2, Puppet
or another, real configuration management engine in the near future,
instead of the 10 year old legacy RHN one ;).



-- 
Bryan J Smith          Professional, Technical Annoyance
b.j.smith at ieee.org    http://www.linkedin.com/in/bjsmith
--------------------------------------------------------
I don't have a "favorite Linux distro."  I use, develop
and support community efforts, often built around Linux.
Technology and solutions are my focus, not dragging in
assumptions, marketing and other concepts which dominate
non-community developed software, which I left long ago.

