I have a Rails application that runs jobs on the background using the Resque adapter. I have noticed that once a couple of days my workers disappear (just stop), my jobs get stuck in the queue, and I have to restart the workers anew every time they stop.
I check using ps -e -o pid,command | grep [r]esque
and launch workers in the background using
(RAILS_ENV=production PIDFILE=./resque.pid BACKGROUND=yes bundle exec rake resque:workers QUEUE='*' COUNT='12') 2>&1 | tee -a log/resque.log
.
Then I stopped redis-server using /etc/init.d/redis-server stop
and again checked the worker processes. They disappeared.
This gives a reason to think that worker processes stop maybe because the redis server restarting because of some reason.
Is there any Rails/Ruby way solution to this problem? What comes to my mind is writing a simple Ruby code that would watch the worker processes with the period, say, 5 seconds, and restart them if they stop.
UPDATE: I don't want to use tools such as Monit, God, eye, and etc. They are not reliable. Then I will need to watch them too. Something like to install God to manage Resque workers, then install Monit to watch God, ...
UPDTAE This is what I am using and it is really working. I manually stoped redis-server and then started it again. This script successfully launched the workers.
require 'logger'
module Watch
def self.workers_dead?
processes = `ps -e -o pid,command | grep [r]esque`
return true if processes.empty?
false
end
def self.check(time_interval)
logger = Logger.new('watch.log', 'daily')
logger.info("Starting watch")
while(true) do
if workers_dead?
logger.warn("Workers are dead")
restart_workers(logger)
end
sleep(time_interval)
end
end
def self.restart_workers(logger)
logger.info("Restarting workers...")
`cd /var/www/agts-api && (RAILS_ENV=production PIDFILE=./resque.pid BACKGROUND=yes rake resque:workers QUEUE='*' COUNT='12') 2>&1 | tee -a log/resque.log`
end
end
Process.daemon(true,true)
pid_file = File.dirname(__FILE__) + "#{__FILE__}.pid"
File.open(pid_file, 'w') { |f| f.write Process.pid }
Watch.check 10