Work pulling: unexpected RequestNext messages

I’m using the work pulling mechanism to implement a dynamic, distributed pool of workers. I’ve read the documentation and the relevant sample code, but while testing some failure modes I’ve encountered an unexpected behavior.

I’ve reduced my context to the smallest possible one:

One node
One worker: simplest possible one. It doesn’t use a child executor like the one in akka-sample-distributed-workers-scala.
One job

I’ve modified my worker so that it always fails by throwing a RuntimeException. Since it’s configured with the restart supervision strategy, the worker actor will always restart when it fails.

Reading the documentation this is what I expect:

System starts
The producer receives a RequestNext message for the only active worker.
I send a job to the producer, so that it’s sent to the idle worker.
The worker receives the job but fails and doesn’t send the confirm message to the ConsumerController.
Since the WorkPullingProducerController has tracked that job as unconfirmed, it’s informed by the infrastracture that the worker it has sent that job to has been restarted, and it doesn’t have other workers to send that message to, it re-sends the unconfirmed message to the restarted worker.

And this is precisely what happens. However I’d have also expected that during this process the producer wouldn’t receive any further RequestNext message, since there is only one worker and it’s working on something already. Instead, I see this behavior:

Producer: receives a RequestNext
Producer: uses RequestNext.sendNextTo to send the job to the worker.
Producer: almost immediately received another RequestNext
Consumer: fails, then restarts
Producer: receives another RequestNext
Consumer: receives the same job, fails and restarts again
Producer: receives another RequestNext

The producers received more and more RequestNext messages even if there is only one worker and it does use the sendNextTo ActorRef only of the first one. Why is that?

For the time being I’ve modified my code so that the worker never fails, always replies with the ConsumerController.Confirmed message and tracks the successful or failed execution elsewhere. However I’d like to really understand how the work pull pattern works in the above scenario.

1 post - 1 participant

Read full topic