Monday, January 25, 2016

SSH and Multiplexing

Recently I worked on a project that required the transfer of data between servers using RSYNC at regular intervals. One of the things I noticed was that after a while the number of open SSH processes started to increase. This was because each new RSYNC session would open its own connection. I started wondering if those SSH connections could be reused or re-opened.

Then I discovered there was a way to configure SSH to reuse the open TCP connection rather than setting up a new one. This comes with a big advantage, as the overhead of creating a new TCP connection is eliminated. The result is faster connection times and faster transfer of data.

To configure this, edit the SSH config file for the respective user account (~/.ssh/config):
Host x.x.x.x
  ControlMaster auto
  ControlPath ~/.ssh/sockets/%r@%h-%p
  ControlPersist 1800 
What do these options mean?

  • Host x.x.x.x # The IP address or domain name of the server you are connecting to. You can use the wildcard * instead, but I prefer being specific, which also offers some security.
  • ControlMaster auto # The default is no, so you have to specify one of "yes", "auto", "ask" or "autoask".
  • ControlPath ~/.ssh/sockets/%r@%h-%p # Make sure the path exists: create the folder and secure it. %h is the target hostname, %r the remote username and %p the port. Other variables include %L, %l, %n and %u.
  • ControlPersist 1800 # How long the master connection remains open before timing out due to inactivity. The value defaults to seconds but can also be written with a unit, e.g. 30m or 1h.
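
The "create the folder and secure" step can be sketched as a small shell function. This is only a minimal sketch: the function name prepare_socket_dir is made up for illustration, and the default path simply mirrors the ControlPath used above.

```shell
# Create the directory that holds the control sockets and restrict it to the owner.
# prepare_socket_dir is a hypothetical helper name; the default path mirrors the
# ControlPath from the config above.
prepare_socket_dir() {
    socket_dir=${1:-"$HOME/.ssh/sockets"}
    mkdir -p "$socket_dir" && chmod 700 "$socket_dir"
}

# Usage:
#   prepare_socket_dir
#
# Once a first connection has been opened, the master can be inspected or closed
# (user@x.x.x.x is a placeholder for a host configured above):
#   ssh -O check user@x.x.x.x   # reports whether a master connection is running
#   ssh -O exit  user@x.x.x.x   # asks the master to exit, which removes the socket
```

Restricting the directory to mode 700 matters because anyone who can reach the socket can open sessions over the existing connection.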

You can have multiple blocks
Host 1.1.1.1
  ControlMaster auto
  ControlPath ~/.ssh/sockets/%r@%h-%p
  ControlPersist 1800 
Host 2.2.2.2
  ControlMaster auto
  ControlPath ~/.ssh/sockets/%r@%h-%p
  ControlPersist 1800 
All commands that use SSH, e.g. RSYNC, SCP and SFTP, will benefit from multiplexing. Note that multiplexing allows a maximum of 10 open sessions per connection by default. You can increase this number by changing the value of MaxSessions in the server's sshd_config. If you have a large number of connections you might also need to change MaxStartups there.
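
To see which limits are currently in effect on the server, something along these lines may help. It is a sketch, assuming OpenSSH's sshd is on the PATH of the user running it (usually root, since sshd -T reads the host keys); show_sshd_limits is a made-up helper name.

```shell
# Print the effective MaxSessions and MaxStartups values of the local sshd.
# "sshd -T" dumps the effective server configuration using lowercase keywords.
# show_sshd_limits is a hypothetical helper name.
show_sshd_limits() {
    sshd -T 2>/dev/null | grep -Ei '^(maxsessions|maxstartups) '
}

# Usage (on the server, typically as root):
#   show_sshd_limits
```

To raise a limit, set the keyword in the server's sshd_config (for example "MaxSessions 20", where 20 is an arbitrary value) and reload sshd.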


Monday, January 18, 2016

MySQL Replication and slave_net_timeout

In an article written in 2009, Jeremy Zawodny took on the MySQL default for slave_net_timeout, among other settings. Daniel Schneller, author of MySQL Admin Cookbook, had written about his experience with it in 2006.

What is slave_net_timeout

It is the time in seconds that the slave waits for more data from the master before considering the connection broken, after which it aborts the read and attempts to reconnect. The retry interval is determined by the MASTER_CONNECT_RETRY option of the CHANGE MASTER TO statement, while the maximum number of reconnection attempts is set by the --master-retry-count option. The first reconnect attempt takes place immediately.

Up to MySQL 5.7.6 the default value for slave_net_timeout is 3600 seconds (1 hour); as of MySQL 5.7.7 it is 60 seconds. MariaDB still has 3600 seconds as the default.

The problem is summed up in the following statement.
When the network connection between a master and slave database is interrupted in a way that neither side can detect (like a firewall or routing change), you must wait until slave_net_timeout seconds have passed before the slave realizes that something is wrong. It will then try to reconnect to the master and pick up where it left off. This is bad. This could be 1 hour if the slave_net_timeout is set to 1 hour (3600 seconds).
I discovered this issue when I noticed the slave was not up to date but there were no errors. However, if the slave was stopped (STOP SLAVE) and started (START SLAVE), replication suddenly started again and everything was up to date.

Before the restart the following would be noted. The Read_Master_Log_Pos and Exec_Master_Log_Pos values did not match the log position (SHOW MASTER STATUS) on the master server. On the slave, the Slave_IO_State was "Waiting for master to send event" and the Slave_IO_Running and Slave_SQL_Running values were both "Yes". The Master_Log_File and Relay_Master_Log_File matched. In essence, don't trust SHOW SLAVE STATUS alone.
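
The comparison described above can be sketched in shell. This is an illustration rather than a monitoring tool: status_field and positions_match are made-up helper names, the hostnames in the usage comment are placeholders, and on a busy master the two positions will briefly differ even when replication is healthy, so a mismatch is only meaningful if it persists.

```shell
# Extract one field from the vertical (\G) output of SHOW MASTER STATUS or
# SHOW SLAVE STATUS. Reads the status text on stdin; $1 is the field name.
status_field() {
    awk -F': ' -v key="$1" '
        { name = $1; sub(/^[ \t]+/, "", name) }
        name == key { print $2; exit }
    '
}

# Return 0 when the slave has read up to the master's current binlog position.
# $1 = output of SHOW MASTER STATUS\G, $2 = output of SHOW SLAVE STATUS\G
positions_match() {
    master_file=$(printf '%s\n' "$1" | status_field File)
    master_pos=$(printf '%s\n' "$1" | status_field Position)
    slave_file=$(printf '%s\n' "$2" | status_field Master_Log_File)
    slave_pos=$(printf '%s\n' "$2" | status_field Read_Master_Log_Pos)
    [ "$master_file" = "$slave_file" ] && [ "$master_pos" = "$slave_pos" ]
}

# Usage (placeholder hosts; credentials assumed to come from ~/.my.cnf):
#   master=$(mysql -h master.example.com -e 'SHOW MASTER STATUS\G')
#   slave=$(mysql -h slave.example.com -e 'SHOW SLAVE STATUS\G')
#   positions_match "$master" "$slave" || echo "slave is behind or stalled"
```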

The solution is to lower slave_net_timeout to a more reasonable value, between 60 and 300 seconds.
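
On a running slave the change could be applied roughly as follows. This is a sketch under a few assumptions: the mysql client can log in without prompting (for example via ~/.my.cnf), set_slave_net_timeout is a made-up helper name, and the replication threads are restarted on the assumption that the I/O thread only picks up the new value when it reconnects.

```shell
# Set slave_net_timeout on a running slave, restart the replication threads and
# show the resulting value. set_slave_net_timeout is a hypothetical helper name.
set_slave_net_timeout() {
    seconds=$1
    # Accept whole numbers only, since the value is placed into the SQL text.
    case $seconds in
        ''|*[!0-9]*)
            echo "usage: set_slave_net_timeout SECONDS" >&2
            return 1
            ;;
    esac
    mysql -e "SET GLOBAL slave_net_timeout = ${seconds}; STOP SLAVE; START SLAVE; SHOW GLOBAL VARIABLES LIKE 'slave_net_timeout';"
}

# Usage:
#   set_slave_net_timeout 60
```

SET GLOBAL does not survive a server restart, so the same value also needs to go into the [mysqld] section of the server's configuration file as slave_net_timeout = 60.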

Other settings worth looking at include:
  • skip-name-resolve - enable it and use IPs only in grants
  • connect_timeout - set higher than the default
  • max_connect_errors - set to a high value
  • interactive_timeout - set to 300 seconds, or at least lower than the default of 28800 seconds
  • wait_timeout - set to 300 seconds, or at least lower than the default of 28800 seconds
The last two have been of concern to Eliot Kristan since 2006.

As usual, make one configuration change (or related changes only) at a time in order to monitor and evaluate the effectiveness of the change.

References
  1. Fixing Poor MySQL Default Configuration Values
  2. Eliot Kristan - MySQL wait_timeout default is set too high!
  3. Some Reasonable Defaults for MySQL Settings 
  4. MySQL replication timeout trap
  5. MariaDB - Replication and Binary Log Server System Variables
  6. MySQL - Replication Slave Options and Variables
  7. MySQL lowering wait_timeout value to lower number of open connections
  8. Changing MASTER_CONNECT_RETRY - Anything pit falls to keep in mind? 
  9. MySQL replication hung after slave goes offline and comes back online again

Monday, January 11, 2016

Removing Directories older than x days

I have a setup where directories are created daily, named for the given date in the format yyyymmdd, and content is placed there for temporary access.

After a while this content becomes stale and can be deleted.

The holding folder is
/home/user/holding/

and the folders created are
/home/user/holding/20151221
/home/user/holding/20151222
/home/user/holding/20151223

I want to delete directories older than x days.

The following commands will delete files only and not the directories.
/usr/bin/find /home/user/holding/ -type f -mtime +5 | /usr/bin/xargs rm
/usr/bin/find /home/user/holding/ -type f -mtime +5  -exec rm -rf {} \;
To delete the directories along with the files in them, modify the above and change rm to rm -rf:
/usr/bin/find /home/user/holding/* -mtime +5 | /usr/bin/xargs rm -rf
Note the asterisk and the removal of  "-type f"

Testing

Checking with -mmin +1 i.e. files/directories modified over 1 min ago. Always test what you are going to delete before proceeding.

/usr/bin/find /home/user/holding/* -mmin +1
/home/user/holding/20151221
/home/user/holding/20151222
/home/user/holding/20151223

/usr/bin/find /home/user/holding/ -mmin +1
/home/user/holding/
/home/user/holding/20151221
/home/user/holding/20151222
/home/user/holding/20151223
Therefore to delete the folders/directories in /home/user/holding/ and not /home/user/holding/ itself use /usr/bin/find /home/user/holding/*
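
An alternative to the shell glob is to let find itself exclude the holding directory and stay at the top level. Without a depth limit, find also matches old files inside directories that are still recent. The following is a sketch, assuming a find that supports -mindepth and -maxdepth (GNU and BSD find do, POSIX does not require them); list_old_dirs and purge_old_dirs are made-up helper names.

```shell
# List top-level subdirectories of $1 last modified more than $2 days ago.
# -mindepth 1 skips the holding directory itself; -maxdepth 1 stops find from
# descending into the dated directories.
list_old_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +"$2" -print
}

# Remove the same set of directories, including their contents.
purge_old_dirs() {
    if [ -z "$1" ] || [ ! -d "$1" ]; then
        echo "purge_old_dirs: not a directory: $1" >&2
        return 1
    fi
    find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +"$2" -exec rm -rf {} +
}

# Usage: check the list first, then delete.
#   list_old_dirs /home/user/holding 5
#   purge_old_dirs /home/user/holding 5
```

Refusing to run when the first argument is empty or not a directory guards against an unset variable turning the command into a search from the current directory.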

Complete
To delete folders older than x days (in this case 5) from cron, use the following:
0 6 * * *  /usr/bin/find /home/user/holding/* -mtime +5 |  /usr/bin/xargs rm -rf  > /dev/null

Note: be very careful with rm -rf. It can wipe out your entire system if used incorrectly. Always test what you want to delete before attempting to do so.
