[Leaplist] Any suggestions on resolving "Stale NFS file handle" errors?

Bryan J Smith b.j.smith at ieee.org
Sat Dec 13 08:46:48 EST 2008


On Sat, 2008-12-13 at 01:43 -0500, Damien McKenna wrote:
> The files which are causing the major problem are all in one  
> subdirectory (/opt/httpd/htdocs/mysite/coolstuff/), but the entire  
> directory is being mapped several levels up (/opt/httpd/htdocs) so I  
> don't know if it's going to be feasible to redo it given there are  
> several sites running on the same cluster, or if they're going to want  
> to.

My apologies in advance, but I'm not following your logic here ...  
  "but the entire directory is being mapped several levels up"  

Understand you can map directories upon directories with NFS.  So you
could map in separate rw directories for each system.

Heck, it's Apache, so aliases work too.  If it's software outside Apache
that accesses things directly on the filesystem, then NFS mount as
appropriate.  NOTE:  You _can_ mount the same NFS export several times
on the same system, in different or even the same subdirectories --
kernel 2.6 is damn good at it now (no issues).

As far as ...
  "so I don't know if it's going to be feasible to redo it given
   there are several sites running on the same cluster, or if
   they're going to want to."

You _must_ maintain coherency between clients.  The most likely reason
you're getting stale NFS handles is that the NFS clients _are_ trying
to enforce some coherency.  The most likely reason the Windows client is
not getting them is that it doesn't bother to honor locks, check state,
etc...

Some applications maintain their own coherency between nodes, but still
run over NFS.  Others use their own protocols.  In any case, you need to
do some sort of server-to-multi-client interaction for coherency.

Microsoft does this with Jet in client programs, so that several of them
can access the same file with row-level locking, which is some very
nasty stuff anyway.  (By the way, NFSv4 has byte-range locking now too,
but it still doesn't solve the client coherency issue completely.)  It's
totally unreliable, which is why Microsoft doesn't even recommend it for
enterprises and instead recommends SQL-type access (typically with their
MS SQL product or an embedded solution), where a server enforces
coherency.

Now if you mount "read only," you _remove_ the chance files are being
accessed read/write, being locked and the corresponding coherency
issues.  Is that possible?
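
One way to check what each web server is actually doing is to read its
mount table.  A minimal Python sketch, assuming the Linux /proc/mounts
field layout.  The export name san01:/export/htdocs and the sample text
are made up for illustration; only /opt/httpd/htdocs comes from this
thread.

```python
def mount_options(mounts_text, mount_point):
    """Return the option set for mount_point, or None if it is not mounted."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            # If the same mount point appears more than once, the last entry wins.
            found = set(fields[3].split(","))
    return found


def is_read_only(mounts_text, mount_point):
    """True only if mount_point is mounted and carries the 'ro' option."""
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and "ro" in opts


# Hypothetical sample in /proc/mounts format (not taken from a real system).
SAMPLE = (
    "san01:/export/htdocs /opt/httpd/htdocs nfs ro,vers=3,hard 0 0\n"
    "/dev/sda1 / ext3 rw 0 0\n"
)
```

On a real client you would pass open("/proc/mounts").read() instead of
SAMPLE and run the check on every server in the cluster.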

> The "coolstuff" directory is pulled down on one of the servers (say,  
> server A) and then through the glory of NFS is accessible to all of  
> the other servers, so maybe I could talk with them about moving the  
> cron-update task to the SAN.  Would that make any difference?

If they are mounting it read/write, and possibly using write locks on
the files, hell yes.  The servers are likely running into locking
issues.  But even then, you still haven't talked about how the clients
maintain coherency.  I'd really like to see this architecture drawn up,
the type of interactions, etc....

Again _read-only_ mounts are how 10 out of 10 HOWTOs I've seen detail
how to use NFS for fail-over in web servers.  Why are you mounting
read/write?  If you do that, then the HOWTOs are useless.

> Key reasons are that for our CMS (drupal):

BTW, Acquia Drupal is an official ISV offering which Red Hat supports
with SLAs, so I assume they must have a "reference architecture" for
clustering:
  http://rhx.redhat.com/rhx/catalog/productdetail.jspa?productId=1027  

I'll see if I can get some info on this internally.

> * admins upload content (in this case specifically images),
> * some parts of the static content (CSS, JS) are optimized & merged  
> into single files,
> * image thumbnails are generated dynamically as needed,
> and these actions can happen on any of the servers.

You have an atomicity problem then.  You have to solve that first and
foremost.  You are getting NFS stale handle issues because content is
changing, and it's happening mid-access as you get hits.  A _proper_ NFS
client will tell you that.  A _poorly_ designed NFS client will gladly
serve mixed-state content without telling you.
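
To make the "a proper client will tell you" point concrete: on Linux the
error reaches the application as errno ESTALE.  Below is a hedged Python
sketch of how a program can notice it and re-open the path.  It only
shows the detection and does not fix the underlying atomicity problem.
The function names are made up, and the flaky opener is a stand-in that
simulates a stale handle, not real NFS behavior.

```python
import errno
import io


def read_with_estale_retry(path, attempts=3, opener=open):
    """Read path, re-opening it if the handle has gone stale (ESTALE)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            # Re-open by path rather than reusing an old descriptor.
            with opener(path, "rb") as f:
                return f.read()
        except OSError as exc:
            if exc.errno != errno.ESTALE:
                raise  # any other error is passed straight through
            last_error = exc
    raise last_error


def make_flaky_opener(failures, payload):
    """Simulation only: fail with ESTALE `failures` times, then succeed."""
    state = {"left": failures}

    def opener(path, mode):
        if state["left"] > 0:
            state["left"] -= 1
            raise OSError(errno.ESTALE, "Stale NFS file handle")
        return io.BytesIO(payload)

    return opener
```

If the retries run out, the ESTALE error is re-raised, so the caller
still finds out that content changed underneath it.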

When it comes to production sites, 10 out of 10 solve this with a
combination of SQL and directory revisioning.  All dynamic content is
stored in SQL, and select static content is rotated (e.g., new
directory, change the alias -- version numbers, such as a version
control checkout revision, work well), and a "graceful" restart is done
to load the new static content.
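
The rotation idea can be sketched in Python.  This is an illustration
under stated assumptions, not anyone's production setup: it repoints a
"current" symlink instead of editing an Apache Alias, the names
publish_release, releases and current are made up, and the Apache
graceful restart is left out.

```python
import os
import tempfile


def publish_release(site_root, revision, files):
    """Write files into a new revision directory, then repoint 'current'."""
    release_dir = os.path.join(site_root, "releases", str(revision))
    os.makedirs(release_dir)  # raises if this revision already exists
    for name, data in files.items():
        with open(os.path.join(release_dir, name), "wb") as f:
            f.write(data)

    # Build the new link under a temporary name, then rename it over
    # 'current'.  rename() replaces the old link in a single step, so a
    # local reader sees either the old revision or the new one.
    current = os.path.join(site_root, "current")
    tmp_link = current + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(os.path.join("releases", str(revision)), tmp_link)
    os.replace(tmp_link, current)
    return release_dir


# Demo in a throwaway directory: publish revision 101, then 102.
demo_root = tempfile.mkdtemp()
publish_release(demo_root, 101, {"site.css": b"body { color: black }"})
publish_release(demo_root, 102, {"site.css": b"body { color: navy }"})
print(os.readlink(os.path.join(demo_root, "current")))
```

Old revisions stay on disk untouched, so anything still reading
revision 101 keeps a consistent view until it is cleaned up later.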

This sounds like a workflow and lifecycle issue that is introducing
atomicity problems when it should not be.

> What I could do is see about writing a plugin to pre-generate some of  
> the content on a specific server, so that in theory only one server  
> would be writing to the cluster for those instances, but I don't know  
> how feasible that is given the number of hits the site receives.
> 
> > 1.  How are you doing NFS serving/fencing on the storage-end?
> 
> I don't know, but I'll talk with the admin.

How their clustering architecture is designed would be of great interest
to me -- client, storage, etc...  Does it follow any "reference
design"?  Again, I'm going to see what we have with Acquia.

> > 2.  How are you preventing your NFS clients from stomping on each
> > other's write locks?
> 
> That I don't know.  The majority of the NFS errors are showing up for  
> a directory which is only written to on one server.  That said, given  
> that there are a few other sites on the server and several scenarios  
> where any of the servers could be trying to write to the directory  
> structure, I'll have to see if they can work out which directory /  
> files are triggering the failure.

Even if you generate content on only one server, if the operations
happen frequently enough, hits are probably still arriving so quickly
that some get served mid-commit.  NFS' rpc.lockd (you are running
rpc.lockd, the nfslock service on Fedora/CentOS/RHEL, correct?) and the
kernel lockd can only provide so much.  And I think the problem is that
you're mounting read/write.
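
For reference, this is roughly what taking those locks looks like from
an application.  A minimal sketch using Python's fcntl.lockf (POSIX
advisory locks); the function names are made up.  On an NFSv3 mount
these requests are normally what lockd carries to the server.  The locks
are advisory, so they only help if every reader and writer takes them,
and a web server handing out static files typically does not.

```python
import fcntl
import os
import tempfile


def write_locked(path, data):
    """Replace the file's contents while holding an exclusive lock."""
    # Open without truncating, so the file is not emptied before the
    # lock is actually held.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+b") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)  # blocks until the lock is granted
        try:
            f.seek(0)
            f.truncate()
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data out before unlocking
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


def read_locked(path):
    """Read the whole file while holding a shared lock."""
    with open(path, "rb") as f:
        fcntl.lockf(f, fcntl.LOCK_SH)
        try:
            return f.read()
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


# Demo on a local temporary file (hypothetical file name).
demo_path = os.path.join(tempfile.mkdtemp(), "thumb.bin")
write_locked(demo_path, b"first version")
write_locked(demo_path, b"v2")
```

A reader that skips read_locked() and just opens the file can still see
a half-written state, which is the limit of what locking can provide.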

No network filesystem solves client coherency.  SMB doesn't either, not
even with byte-range locking (which NFSv4 has as well), which is why
Microsoft invented Jet.  Jet's unreliability is also why Microsoft
strongly recommends enterprises develop apps with a SQL client and a SQL
backend (embedded or MS SQL).

That all said, here's your "plan of attack" ...

1.  Designate only 1 server, and _only_ mount the NFS export read/write
there.  If this causes your CMS system and/or other software to break,
then your CMS system was _only_ designed for a single server
(unfortunately).

2.  Find out what the "reference design" is for clustering with Drupal.
Then compare it to your client, server, storage, etc... layout.  That's
crucial.  Read/write mounts, locks, etc... cause hell on concurrent
access, especially with high hit rates and dynamic content, where data
could be accessed mid-commit.
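
Step 1 could be written down as a small Python helper that emits one
fstab line per host, read/write only for the designated writer.  This is
a sketch: the host names, the export path san01:/export/htdocs and the
`hard` option are placeholders, not a recommendation.

```python
def fstab_line(export, mount_point, writable):
    """Build one /etc/fstab entry for an NFS mount."""
    mode = "rw" if writable else "ro"
    # fstab fields: device, mount point, fs type, options, dump, pass
    return f"{export} {mount_point} nfs {mode},hard 0 0"


def plan_mounts(hosts, writer, export, mount_point):
    """Map each host to its fstab line; only `writer` mounts read/write."""
    if writer not in hosts:
        raise ValueError(f"writer {writer!r} is not one of the hosts")
    return {h: fstab_line(export, mount_point, h == writer) for h in hosts}


# Hypothetical cluster: three web servers, web-a does all the writing.
plan = plan_mounts(
    ["web-a", "web-b", "web-c"],
    writer="web-a",
    export="san01:/export/htdocs",
    mount_point="/opt/httpd/htdocs",
)
for host, line in plan.items():
    print(f"{host}: {line}")
```

If the CMS breaks with this layout, that is the signal from step 1 that
it expects every node to be able to write.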

When I was with a real-time distro company, there was one guy (a student
admin at Carnegie-Mellon I believe, if you can believe that), who
thought he could use a web server HOWTO, with read-only NFS mounts, for
NFS mounted home directories -- and it took me a while to find what HOWTO
he was using.  He wouldn't back away from, "well, it works for some
people" with my constantly trying to point out, "when they have
read-only mounts, yes."

At some point, you just give up and let people believe what they want to
believe.  ;)

I'm not a web server or content guy in the least bit, so I could be
wrong.  But I do know infrastructure and data serving concepts and
details.  That's why I'm interested in knowing more about the end-to-end
architecture here, and I'll see what I can dig up from Acquia, or even
the Drupal community sites.


-- 
Bryan J  Smith                Professional, Technical Annoyance
Mugshot Homepage:  http://mugshot.org/person?who=58wDcGKx6NcZAb
---------------------------------------------------------------
           Fission Power:  An Inconvenient Solution            

