Discussion:
[rabbitmq-users] Rabbitmqctl stop hanging
Michael Klishin
2015-07-29 19:12:21 UTC
+rabbitmq-users
I am hoping someone can help me with a current issue I am seeing
in a pre-production environment. We had an issue (still trying
to determine the cause) where each node in our four node rabbit
cluster began reporting a network partition. So, I attempted
to stop one of the nodes and it has been hanging for 30 minutes as
I write this.
$ sudo rabbitmqctl stop
I'm honestly afraid to force my way out of this state and leave something
corrupt. Any guidance would be greatly appreciated.
What RabbitMQ version do you run? On what Erlang and OS?
What’s in the log files?
What does

    rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'

output?
What about rabbitmqctl report (feel free to edit out vhost, user and queue names)?
--
MK

Staff Software Engineer, Pivotal/RabbitMQ
--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+***@googlegroups.com.
To post to this group, send an email to rabbitmq-***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Ryan Brown
2015-07-29 19:49:45 UTC
Rabbit version: 3.2.1
Erlang version: R16B02

Seeing a lot of these in the sasl log:

=SUPERVISOR REPORT==== 29-Jul-2015::12:34:14 ===
Supervisor: {<0.24246.415>,
rabbit_channel_sup_sup}
Context: shutdown_error
Reason: shutdown
Offender: [{nb_children,1},
{name,channel_sup},
{mfargs,{rabbit_channel_sup,start_link,[]}},
{restart_type,temporary},
{shutdown,infinity},
{child_type,supervisor}]

$ sudo rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'
...
Error: {undef,[{rabbit_diagnostics,maybe_stuck,[],[]},
{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,569}]},
{rpc,'-handle_call_call/6-fun-0-',5,
[{file,"rpc.erl"},{line,205}]}]}


$ sudo rabbitmqctl report
Reporting server status on {{2015,7,29},{19,36,7}}

...
Status of node ***@sflrs04sc ...
[{pid,14993},
{running_applications,[]},
{os,{unix,linux}},
{erlang_version,"Erlang R16B02 (erts-5.10.3) [source-b44b726] [64-bit] [smp:32:32] [async-threads:30] [kernel-poll:true]\n"},
{memory,[{total,1923874184},
{connection_procs,288632},
{queue_procs,1090616288},
{plugins,0},
{other_proc,16219816},
{mnesia,14032152},
{mgmt_db,0},
{msg_index,68649968},
{other_ets,16851960},
{binary,664887376},
{code,19681212},
{atom,703377},
{other_system,31943403}]},
{vm_memory_high_watermark,0.7},
{vm_memory_limit,47269060608},
{disk_free_limit,20000000000},
{disk_free,744729702400},
{file_descriptors,[{total_limit,255900},
{total_used,903},
{sockets_limit,230308},
{sockets_used,0}]},
{processes,[{limit,1048576},{used,5878}]},
{run_queue,0},
{uptime,5276819}]

Cluster status of node ***@sflrs04sc ...
[{nodes,[{disc,[***@sflrs01sc,***@sflrs02sc,***@sflrs03sc,
***@sflrs04sc]}]},
{running_nodes,[***@sflrs04sc]},
{partitions,[{***@sflrs04sc,[***@sflrs01sc,***@sflrs02sc,
***@sflrs03sc]}]}]

Application environment of node ***@sflrs04sc ...
[{auth_backends,[rabbit_auth_backend_internal]},
{auth_mechanisms,['PLAIN','AMQPLAIN']},
{backing_queue_module,rabbit_variable_queue},
{cluster_nodes,{['***@sflrs01sc',
'***@sflrs02sc',
'***@sflrs03sc',
'***@sflrs04sc'],
disc}},
{cluster_partition_handling,ignore},
{collect_statistics,fine},
{collect_statistics_interval,5000},
{default_permissions,[<<".*">>,<<".*">>,<<".*">>]},
{},
{},
...,
{delegate_count,16},
{disk_free_limit,20000000000},
{enabled_plugins_file,"/etc/rabbitmq/enabled_plugins"},
{error_logger,{file,"/var/log/rabbitmq/***@sflrs04sc.log"}},
{frame_max,131072},
{heartbeat,100},
{hipe_compile,false},
{hipe_modules,[rabbit_reader,rabbit_channel,gen_server2,rabbit_exchange,
rabbit_command_assembler,rabbit_framing_amqp_0_9_1,
rabbit_basic,rabbit_event,lists,queue,priority_queue,
rabbit_router,rabbit_trace,rabbit_misc,rabbit_binary_parser,
rabbit_exchange_type_direct,rabbit_guid,rabbit_net,
rabbit_amqqueue_process,rabbit_variable_queue,
rabbit_binary_generator,rabbit_writer,delegate,gb_sets,lqueue,
sets,orddict,rabbit_amqqueue,rabbit_limiter,gb_trees,
rabbit_queue_index,rabbit_exchange_decorator,gen,dict,ordsets,
file_handle_cache,rabbit_msg_store,array,
rabbit_msg_store_ets_index,rabbit_msg_file,
rabbit_exchange_type_fanout,rabbit_exchange_type_topic,mnesia,
mnesia_lib,rpc,mnesia_tm,qlc,sofs,proplists,credit_flow,pmon,
ssl_connection,tls_connection,ssl_record,tls_record,gen_fsm,
ssl]},
{included_applications,[]},
{log_levels,[{connection,info}]},
{msg_store_file_size_limit,16777216},
{msg_store_index_module,rabbit_msg_store_ets_index},
{plugins_dir,"/usr/lib/rabbitmq/lib/rabbitmq_server-3.2.1/sbin/../plugins"},
{plugins_expand_dir,"/data/rabbitmq/mnesia/***@sflrs04sc-plugins-expand"},
{queue_index_max_journal_entries,65536},
{reverse_dns_lookups,false},
{sasl_error_logger,{file,"/var/log/rabbitmq/***@sflrs04sc-sasl.log"}},
{server_properties,[]},
{ssl_apps,[asn1,crypto,public_key,ssl]},
{ssl_cert_login_from,distinguished_name},
{ssl_listeners,[]},
{ssl_options,[]},
{tcp_listen_options,[binary,
{packet,raw},
{reuseaddr,true},
{backlog,128},
{nodelay,true},
{linger,{true,0}},
{exit_on_close,false}]},
{tcp_listeners,[5672]},
{},
{vm_memory_high_watermark,0.7},
{vm_memory_high_watermark_paging_ratio,0.5}]

Connections:

Channels:


... and then that hangs...

Thank you.
Michael Klishin
2015-07-29 20:04:42 UTC
Post by Ryan Brown
Rabbit version: 3.2.1
Erlang version: R16B02
That version is not supported any more, even with commercial support.
There were 19 releases since 3.2.1, and a few contained shutdown deadlocks.
Post by Ryan Brown
=SUPERVISOR REPORT==== 29-Jul-2015::12:34:14 ===
Supervisor: {<0.24246.415>,
rabbit_channel_sup_sup}
Context: shutdown_error
Reason: shutdown
Offender: [{nb_children,1},
{name,channel_sup},
{mfargs,{rabbit_channel_sup,start_link,[]}},
{restart_type,temporary},
{shutdown,infinity},
{child_type,supervisor}]
This means a channel process failed to terminate in a timely manner.
Since you don’t have any open connections, it should be safe to simply
kill the node. 
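
A minimal sketch of that step, assuming a Linux host and that the node's OS process id is read from the `{pid,...}` entry that `rabbitmqctl status` prints (14993 in the report above). The helper name `extract_node_pid` is hypothetical, and the `kill` lines are left commented out so that nothing is terminated by accident.

```shell
# Hypothetical helper: read `rabbitmqctl status` output on stdin and print
# the OS pid from the first {pid,N} entry it finds.
extract_node_pid() {
  sed -n 's/.*{pid,\([0-9][0-9]*\)}.*/\1/p' | head -n 1
}

# Usage sketch (assumes rabbitmqctl still answers `status` on this node):
#   pid=$(sudo rabbitmqctl status | extract_node_pid)
#   sudo kill "$pid"       # send SIGTERM to the Erlang VM first
#   sudo kill -9 "$pid"    # last resort if the process is still running
```

Entries such as `{pid,<0.797.586>}` in crash reports are Erlang process ids, not OS pids; the pattern only matches a purely numeric value, so those are skipped.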
--
MK

Staff Software Engineer, Pivotal/RabbitMQ
Ryan Brown
2015-07-29 20:07:58 UTC
Thank you Michael. Informative as usual. I knew we were out-of-date. But
not that far!
Michael Klishin
2015-07-29 20:11:12 UTC
Post by Michael Klishin
There were 19 releases since 3.2.1, and a few contained shutdown deadlocks.
that should read: fixes for shutdown deadlocks
--
MK

Staff Software Engineer, Pivotal/RabbitMQ
Ryan Brown
2015-07-30 02:12:19 UTC
Unfortunately, no matter what I do I can't get it started again. It
actually hangs when I run sudo service rabbitmq-server start. If I do a ps
aux | grep rabbit I can see the process started. The management console is
up but doesn't show anything.
Michael Klishin
2015-07-30 08:44:31 UTC
If this is a single node, you can upgrade the package after backing up your database dir.

Otherwise see logs and standard stream output (captured by Debian package scripts).

Please always check logs first and post them to the list. It is absolutely impossible
to help otherwise, especially since your version doesn't have rabbit_diagnostics.
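
As a rough aid for that log check, the sketch below tallies the `Reason:` lines of the supervisor reports in a SASL log like the excerpts in this thread. The function name `count_report_reasons` and the `SASL_LOG` variable are hypothetical; the actual path is whatever the node's `sasl_error_logger` setting points to.

```shell
# Hypothetical helper: print "<count> <reason>" for every distinct
# "Reason:" value in the given log file, most frequent first.
count_report_reasons() {
  awk '$1 == "Reason:" { seen[$2]++ }
       END { for (r in seen) print seen[r], r }' "$1" | sort -rn
}

# Usage sketch:
#   count_report_reasons "$SASL_LOG"
```

A log dominated by one reason, such as `channel_termination_timeout`, is a more useful thing to post to the list than a single report picked at random.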
Ryan Brown
2015-07-30 17:15:46 UTC
Unfortunately, this is not a single node. It is part of a four node
cluster. And, when we attempted an upgrade a few months back to the latest
version of rmq our client stopped connecting. That's obviously a bug with
our code we need to fix. But, this is the more pressing matter at the
moment.

Not being able to use stop_app seriously hinders our efforts. We are unable
to join or remove from the cluster, reset, etc. When I attempt a stop_app I
get the following:

trap_exit: true
status: running
heap_size: 1598
stack_size: 27
reductions: 276648
neighbours:

=SUPERVISOR REPORT==== 30-Jul-2015::09:32:07 ===
Supervisor: {<0.17528.586>,
rabbit_connection_sup}
Context: shutdown_error
Reason: channel_termination_timeout
Offender: [{pid,<0.797.586>},
{name,reader},
{mfargs,{rabbit_reader,start_link,[<0.27180.585>]}},
{restart_type,intrinsic},
{shutdown,4294967295},
{child_type,worker}]


=CRASH REPORT==== 30-Jul-2015::09:32:07 ===
crasher:
initial call: rabbit_reader:init/2
pid: <0.28286.593>
registered_name: []
exception exit: channel_termination_timeout
in function rabbit_reader:wait_for_channel_termination/2
in call from rabbit_reader:handle_exception/3
in call from rabbit_reader:terminate/2
in call from rabbit_reader:handle_other/2
in call from rabbit_reader:mainloop/2
in call from rabbit_reader:run/1
in call from rabbit_reader:start_connection/5
ancestors: [<0.9043.593>,rabbit_tcp_client_sup,rabbit_sup,<0.149.0>]
messages: [{'EXIT',#Port<0.9560658>,normal}]
links: []
dictionary: [{{channel,1},
{<0.29620.593>,{method,rabbit_framing_amqp_0_9_1}}},
{credit_blocked,[<0.29620.593>]},
{{ch_pid,<0.29620.593>},{1,#Ref<0.0.36610.38601>}},
{{credit_from,<0.29620.593>},0}]
trap_exit: true
status: running
heap_size: 1598
stack_size: 27
reductions: 64144
neighbours:

=SUPERVISOR REPORT==== 30-Jul-2015::09:32:07 ===
Supervisor: {<0.9043.593>,rabbit_connection_sup}
Context: shutdown_error
Reason: channel_termination_timeout
Offender: [{pid,<0.28286.593>},
{name,reader},
{mfargs,{rabbit_reader,start_link,[<0.9051.593>]}},
{restart_type,intrinsic},
{shutdown,4294967295},
{child_type,worker}]


=CRASH REPORT==== 30-Jul-2015::09:32:07 ===
crasher:
initial call: rabbit_reader:init/2
pid: <0.30070.573>
registered_name: []
exception exit: channel_termination_timeout
in function rabbit_reader:wait_for_channel_termination/2
in call from rabbit_reader:handle_exception/3
in call from rabbit_reader:terminate/2
in call from rabbit_reader:handle_other/2
in call from rabbit_reader:mainloop/2
in call from rabbit_reader:run/1
in call from rabbit_reader:start_connection/5
ancestors: [<0.30064.573>,rabbit_tcp_client_sup,rabbit_sup,<0.149.0>]
messages: [{'EXIT',#Port<0.9556597>,normal}]
links: []
dictionary: [{{channel,1},
{<0.2457.574>,{method,rabbit_framing_amqp_0_9_1}}},
{credit_blocked,[<0.2457.574>]},
{{credit_from,<0.2457.574>},0},
{{ch_pid,<0.2457.574>},{1,#Ref<0.0.36616.182663>}}]
trap_exit: true
status: running
heap_size: 1598
stack_size: 27
reductions: 41037
neighbours:

=SUPERVISOR REPORT==== 30-Jul-2015::09:32:07 ===
Supervisor: {<0.30064.573>,
rabbit_connection_sup}
Context: shutdown_error
Reason: channel_termination_timeout
Offender: [{pid,<0.30070.573>},
{name,reader},
{mfargs,{rabbit_reader,start_link,[<0.1082.574>]}},
{restart_type,intrinsic},
{shutdown,4294967295},
{child_type,worker}]