Describe the bug
A member occasionally crashes after a snapshot installation because it attempts to read a command for a message that has already been consumed. The reproduction steps are highly artificial, but this crash has been seen in the wild a couple of times and could happen when a follower member runs slowly on a node where consumers come and go.
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> ** [{lists,zipwith,
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> [#Fun<rabbit_fifo.60.126061837>,[],
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> [{1,[7352901|4]},{2,[7352904|4]}],
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> fail],
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> [{file,"lists.erl"},{line,844}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> {lists,zipwith,4,[{file,"lists.erl"},{line,845}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> {rabbit_fifo,'-delivery_effect/3-anonymous-5-',4,
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> [{file,"rabbit_fifo.erl"},{line,2062}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> {ra_server_proc,handle_effect,5,
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> [{file,"src/ra_server_proc.erl"},{line,1385}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> {lists,foldl,3,[{file,"lists.erl"},{line,2146}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> {ra_server_proc,handle_effects,5,
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> [{file,"src/ra_server_proc.erl"},{line,1301}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> {lists,foldl_1,3,[{file,"lists.erl"},{line,2151}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> {ra_server_proc,handle_effects,5,
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> [{file,"src/ra_server_proc.erl"},{line,1301}]}]
Reproduction steps
This is easiest to re-create on 4.0.x but can also happen on 3.13.x.
quorum_min_checkpoint_interval application config set to 1.
ra:stop_server(quorum_queues, {'%2F_q1', node()}).
ra:cast_aux_command({'%2F_q1', 'rabbit-1@HOST'}, force_checkpoint).
ra:restart_server(quorum_queues, {'%2F_q1', node()}).
The member may recover after step 9 - this is also, in fact, a bug.
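For anyone who prefers to drive the three `ra` calls quoted in the steps from a terminal rather than an Erlang shell, they could be wrapped with `rabbitmqctl eval`. This is only a rough sketch: the function names are made up for illustration, the choice of which node each call runs on is an assumption left to the caller, and it covers only the three calls that appear in the steps, not the full reproduction sequence.

```shell
#!/bin/sh
# Hypothetical wrappers around the ra calls from the reproduction steps.
# RABBITMQCTL can be overridden (for example with "echo") for a dry run.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

# Stop the local Ra member of queue q1 (vhost "/") on node $1.
ra_stop_member() {
    $RABBITMQCTL --node "$1" eval "ra:stop_server(quorum_queues, {'%2F_q1', node()})."
}

# From node $1, ask the member on node $2 to force a checkpoint.
force_checkpoint() {
    $RABBITMQCTL --node "$1" eval "ra:cast_aux_command({'%2F_q1', '$2'}, force_checkpoint)."
}

# Restart the local Ra member of queue q1 on node $1.
ra_restart_member() {
    $RABBITMQCTL --node "$1" eval "ra:restart_server(quorum_queues, {'%2F_q1', node()})."
}
```

Running with `RABBITMQCTL=echo` prints the commands instead of executing them, which is a cheap way to check the quoting before touching a real cluster.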
Expected behavior
No crash
Additional context
Currently, a queue that experiences this error can be fixed by removing the faulty member from the quorum queue cluster, waiting a bit, and then re-adding it using rabbitmq-queues delete_member and rabbitmq-queues add_member.
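The workaround could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: the `readd_member` function name is hypothetical, the 30 second default pause is an arbitrary stand-in for "a bit", and the example values (vhost `/`, queue `q1`, node `rabbit-1@HOST`) are taken from the reproduction steps.

```shell
#!/bin/sh
# Hypothetical helper: remove a crashed quorum queue member, wait, re-add it.
# RABBITMQ_QUEUES can be overridden (for example with "echo") for a dry run.
RABBITMQ_QUEUES="${RABBITMQ_QUEUES:-rabbitmq-queues}"

# Usage: readd_member <vhost> <queue> <node> [wait_seconds]
readd_member() {
    # Remove the faulty member from the quorum queue cluster.
    $RABBITMQ_QUEUES delete_member --vhost "$1" "$2" "$3" || return 1
    # Pause before re-adding; 30 seconds is an assumed default.
    sleep "${4:-30}"
    # Add the member back on the same node.
    $RABBITMQ_QUEUES add_member --vhost "$1" "$2" "$3"
}

# Example:
# readd_member / q1 rabbit-1@HOST
```

The function stops early if `delete_member` fails, so a member is never re-added on top of a removal that did not go through.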