Context: the database we’re using only handles insert batch sizes of up to 100. We’re getting into scenarios where our tag_writes buffer grows at a crazy rate and we’re unable to recover unless we restart our services.
We’ve set akka.persistence.cassandra.events-by-tag.max-message-batch-size to 100, and our flush interval is 50ms.
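For reference, the relevant slice of our configuration looks roughly like this (the max-message-batch-size path is as above; the flush-interval key is my reading of the plugin’s reference.conf, so correct me if that isn’t the right knob for our 50ms flush interval):

```
akka.persistence.cassandra {
  events-by-tag {
    # our DB only accepts insert batches of 100, so cap the tag write batches at that
    max-message-batch-size = 100
    # assuming this is the key behind our "50ms flush interval"
    flush-interval = 50ms
  }
}
```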
We are using akka-persistence-cassandra version 1.0.1. Would upgrading help? I see this PR that seems to touch some of this code: https://github.com/akka/akka-persistence-cassandra/pull/841/files
I think what’s happening is the following, though I could be completely wrong.
- We occasionally get rate limited by our DB and see some OverloadedExceptions.
- After enough of these exceptions, we start getting:
  Writing tags has failed. This means that any eventsByTag query will be out of date. The write will be retried. Reason com.datastax.oss.driver.api.core.servererrors.WriteFailureException: Cassandra failure during write query at consistency QUORUM (1 responses were required but only 0 replica responded, 1 failed)
- And then not long after, we start getting:
  Buffer for tagged events is getting too large (401)
  and it keeps building up (to over 10k in some instances!).
- In the debug logs leading up to that, I see lots of:
  Sequence nr > than write progress. Sending to TagWriter
- Since the buffer keeps increasing (and never seems to decrease), we’re unable to continue writing tags to our DB and have to manually restart things over and over to start fresh. Is there some sort of back pressure for when the buffer gets too large?
- After this, we keep getting timeouts, which prevents our actors from making forward progress.
- We are also getting a boatload of errors around recovery timeouts. Do we need to bump up the event-recovery-timeout config value (see the sketch after this list)?
  Supervisor RestartSupervisor saw failure: Exception during recovery. Last known sequence number [0]. PersistenceId [Aggregate|12345678-1234-5678-1234c630376bbcc0], due to: Replay timed out, didn't get event within [30000 milliseconds], highest sequence number seen [0]
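On the recovery-timeout question above, this is the kind of bump I had in mind. I’m not sure where event-recovery-timeout actually lives in the plugin’s reference.conf, so the nesting below is only my guess:

```
akka.persistence.cassandra {
  journal {
    # ASSUMPTION: guessing event-recovery-timeout sits under the journal section of the plugin config.
    # The idea is simply to raise it above the 30s we see in the
    # "Replay timed out, didn't get event within [30000 milliseconds]" errors.
    event-recovery-timeout = 60s
  }
}
```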
Any help here is highly appreciated. I can provide our configuration for different params if needed.