I have a Sidekiq worker that shouldn't take more than 30 seconds, but after a few days I'll find that the entire queue stops executing because all of the workers are locked up.
Here is my worker:
```ruby
class MyWorker
  include Sidekiq::Worker
  include Sidekiq::Status::Worker

  sidekiq_options queue: :my_queue, retry: 5, timeout: 4.minutes

  sidekiq_retry_in do |count|
    5
  end

  sidekiq_retries_exhausted do |msg|
    store({message: "Gave up."})
  end

  def perform(id)
    begin
      Timeout::timeout(3.minutes) do
        got_lock = with_semaphore("lock_#{id}") do
          # DO WORK
        end
      end
    rescue ActiveRecord::RecordNotFound => e
      # Handle
    rescue Timeout::Error => e
      # Handle
      raise e
    end
  end

  def with_semaphore(name, &block)
    Semaphore.get(name, {stale_client_timeout: 1.minute}).lock(1, &block)
  end
end
```
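For reference, jobs are enqueued the standard Sidekiq way with a record id (sketch; the id value is a stand-in):

```ruby
# Hypothetical enqueue -- 123 stands in for a real record id.
MyWorker.perform_async(123)
```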
And here is the semaphore class we use (from the redis-semaphore gem):
```ruby
class Semaphore
  def self.get(name, options = {})
    Redis::Semaphore.new(
      name.to_sym,
      redis: Application.redis,
      stale_client_timeout: options[:stale_client_timeout] || 1.hour,
    )
  end
end
```
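For clarity, this is my understanding of how the gem behaves when the class above is called directly (a sketch; the lock name is made up):

```ruby
# lock(1) waits up to 1 second to acquire the lock; with a block it runs the
# block while holding the lock and releases it afterwards, or returns false
# if the lock could not be acquired in time.
got_lock = Semaphore.get("lock_123").lock(1) do
  # critical section
end
```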
Basically, when I eventually stop the worker, it reports done: 10000 seconds, a runtime the worker should NEVER reach.
Anyone have any ideas on how to fix this or what is causing it? The workers are running on EngineYard.
Edit: One additional note. The # DO WORK section can fire off a PostgreSQL function, and I have noticed occasional PG::TRDeadlockDetected: ERROR: deadlock detected entries in the logs. Would this cause the worker to never complete, even with a timeout set?
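For illustration, the Postgres call inside # DO WORK is roughly of this shape (the function name here is a placeholder, not the real one):

```ruby
# Placeholder -- my_pg_function stands in for the actual stored function.
ActiveRecord::Base.connection.execute("SELECT my_pg_function(#{id})")
```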