EPP Grid - SSH configuration woes - problem and solution.


SSH configuration woes - problem and solution.

This is related to the TWiki topic "CE overloaded and jobs not running".

At the beginning of the week of 20 August 2007, I removed all pool accounts from charm-mgt and started afresh. The ATLAS VO wants separate pool accounts for production users, and I found some time to set them up.

Since then we have been failing. This is what happened.

At some point (I don't remember when) I noticed some strange behaviour when ssh'ing around as a pool user: sometimes the login was challenged and other times it wasn't. To get around this I created an ssh key pair in each account and added the public key to the account's authorized_keys file. When I regenerated the accounts last week, I didn't recreate the key pairs. It turns out that host-based authentication was not configured correctly, so without the per-account keys to mask the problem, the PBS script failed when it tried to stage files in via #PBS -W directives.
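As a first sanity check on a setup like this, one could confirm that host-based authentication is switched on at all in the sshd configuration. The sketch below is only an illustration and not what was done here: it assumes the standard OpenSSH `HostbasedAuthentication` keyword, assumes the first occurrence of a keyword wins, and only looks at the global section (it stops at the first `Match` line). The function name `hostbased_enabled` is made up for this example.

```python
def hostbased_enabled(config_text):
    """Return True if the global section sets HostbasedAuthentication to yes.

    Assumptions: first occurrence of the keyword wins, text after '#' is a
    comment, and anything from the first 'Match' line onwards is ignored.
    """
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Keywords are case-insensitive; "Keyword=value" is also accepted.
        parts = line.replace("=", " ").split()
        keyword = parts[0].lower()
        if keyword == "match":
            break
        if keyword == "hostbasedauthentication" and len(parts) > 1:
            return parts[1].lower() == "yes"
    # Keyword absent: OpenSSH defaults to "no".
    return False
```

It could be called as `hostbased_enabled(open("/etc/ssh/sshd_config").read())`. A positive result only says the option is enabled; as the rest of this page shows, name resolution and the known hosts file also have to be right.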

What I did:

  • Red Hat by default sets the long (fully qualified) hostname in /etc/sysconfig/network, which is not the usual Linux way, so I've put only the short hostname there
  • added an entry with the public IP / name of the host in /etc/hosts (charm-mgt, lcg-compute)
  • emptied out /etc/hosts on all hosts (pnets) - they only contain a localhost entry now
  • set /etc/nsswitch.conf to only use dns for host lookups
  • fixed up the forward and reverse lookups on charm-mgt's dns
  • set all hosts to use charm-mgt's dns (/etc/resolv.conf)
  • fixed up NIS on charm-mgt by adding "--no-limit-check" to the makedbm command in /var/yp/Makefile - NIS was unhappy with the length of the lines in the group file after remaking the accounts. I don't know the implications of this.
  • fixed up /etc/passwd, /etc/shadow, /etc/group, /etc/gshadow on all hosts and made sure they use NIS
    • seeing a strange problem though: ps -ef on the hosts shows UIDs instead of names for some pool accounts. I don't know why!
  • checked, and double checked, /etc/hosts.equiv, /etc/ssh/shosts.equiv and /etc/ssh/ssh_known_hosts for correctness
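Host-based authentication depends on host names resolving consistently in both directions, which is presumably why the DNS fixes above mattered. A minimal sketch of a forward/reverse consistency check follows. The function name and the injectable `forward` / `reverse` parameters are assumptions made for illustration (they allow testing without a live DNS); by default it uses the standard `socket` resolver calls.

```python
import socket


def forward_reverse_consistent(name, forward=socket.gethostbyname,
                               reverse=socket.gethostbyaddr):
    """Check that the reverse lookup of name's address gives back name.

    Returns (ok, ip, names), where names is every name reported by the
    reverse lookup (primary name first, then aliases).
    """
    ip = forward(name)
    primary, aliases, _addresses = reverse(ip)
    names = [primary] + list(aliases)
    ok = name.lower() in (n.lower() for n in names)
    return ok, ip, names
```

Running `forward_reverse_consistent(name)` for each host name (short and long forms) on each node would flag any name whose reverse lookup comes back as something else. Note that the default resolver calls follow /etc/nsswitch.conf, so the result reflects whatever lookup order is configured there.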

So - I have now remade the ssh_known_hosts file and everything seems to be working.

For reference - applicable to multi-homed hosts (charm-mgt, lcg-compute):

The ssh_known_hosts file has the format (names are comma-separated, with no spaces):

"short name,long name of local interface,local IP address" ssh-rsa key...
"long name of public interface,public IP address" ssh-rsa key...
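The two-line layout above can be sketched as a small helper that builds the entries for one multi-homed host. This is only an illustration: the function name, the example domain names, the IP addresses and the truncated key string are all hypothetical, not values from the actual hosts.

```python
def known_hosts_lines(short_name, local_fqdn, local_ip,
                      public_fqdn, public_ip, key_type, key_blob):
    """Build the two ssh_known_hosts lines for a multi-homed host.

    Line 1 covers the short name and the local interface,
    line 2 covers the public interface. Both carry the same host key.
    """
    local_names = ",".join([short_name, local_fqdn, local_ip])
    public_names = ",".join([public_fqdn, public_ip])
    return [
        f"{local_names} {key_type} {key_blob}",
        f"{public_names} {key_type} {key_blob}",
    ]


# Hypothetical values, for illustration only.
for line in known_hosts_lines("charm-mgt", "charm-mgt.local.example",
                              "192.168.0.1", "charm-mgt.example.org",
                              "192.0.2.10", "ssh-rsa", "AAAAB3..."):
    print(line)
```

The key itself would come from the host's public host key file; the helper only arranges the name lists.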

From the sshd manpage, section SSH_KNOWN_HOSTS FILE FORMAT:

-- snip --

When performing host authentication, authentication is accepted if any matching line has the proper key.  
It is thus permissible (but not recommended) to have several lines or different host keys for the same names.  
This will inevitably happen when short forms of host names from different domains are put in the file.  
-- snip --
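The "any matching line has the proper key" rule quoted above can be illustrated with a simplified matcher. This is a sketch, not sshd's actual implementation: it assumes plain comma-separated names and ignores wildcard patterns, negation, hashed host names and marker lines. The function name is made up for this example.

```python
def host_key_accepted(known_hosts_text, host, key_type, key_blob):
    """Return True if any line naming `host` carries the given key.

    Simplified: exact name matches only; no wildcards, negated
    patterns, hashed names or marker lines.
    """
    for raw in known_hosts_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        names = fields[0].split(",")
        if host in names and (fields[1], fields[2]) == (key_type, key_blob):
            return True
    return False
```

With two lines for the same short name but different keys, either key is accepted for that name, which is the "permissible (but not recommended)" situation the manpage describes.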

Revision:  r2 - 30 Aug 2007 - MarcoLaRosa
Authorised by:  Geoff Taylor (G.Taylor @ physics.unimelb.edu.au)