We are using persistent actors to store actor state in PostgreSQL. We enabled the rememberEntities feature (with persistence as the state store for shard data) to keep all actors in memory. After enabling this feature, the shard coordinator pod was terminated due to a memory spike. Since then, the cluster forms, but the shard region is unable to register with the coordinator.
We are getting the following errors continuously, and no events are being processed:
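The setup described above corresponds roughly to the sharding settings below. This is a sketch, not the exact configuration file: the two keys are standard Akka Cluster Sharding settings, but the values are assumptions inferred from the description, and the journal plugin configuration for PostgreSQL is omitted.

```
akka.cluster.sharding {
  # Assumed value: restart previously known entities automatically
  # after a shard is rebalanced or a node restarts.
  remember-entities = on

  # Assumed value: the shard coordinator persists its state as events
  # in the configured Akka Persistence journal (here, PostgreSQL).
  # The alternative value is "ddata" (Distributed Data).
  state-store-mode = persistence
}
```

With `state-store-mode = persistence`, the coordinator replays its journal on startup, which is where the `receiveRecover` failure occurs.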
WARNING : Trying to register to coordinator at [ActorSelection[Anchor(akka://actor-system/), Path(/system/sharding/ActorSystemCoordinator/singleton/coordinator)]], but no acknowledgement. Total [3] buffered messages. [Coordinator [Member(address = akka://actor-system@ip:port, status = Up)] is reachable.]
ERROR : Exception in receiveRecover when replaying event type [akka.cluster.sharding.ShardCoordinator$Internal$ShardHomeDeallocated] with sequence number [12980] for persistenceId [/sharding/DeviceActorCoordinator].
Shard [-20] not allocated: State(Map())
For now, we have disabled rememberEntities and the persistence-based shard state store to keep the cluster stable.
Akka version used: V2.6.6
Split brain resolver: akka.cluster.sbr.SplitBrainResolverProvider
We also created this as an issue on GitHub (Shard region not getting registered to coordinator · Issue #30154 · akka/akka · GitHub).
The cluster sharding state in the persistence store somehow got corrupted, and the coordinator cannot recover from it.