-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/2961/
-----------------------------------------------------------
(Updated Dec. 4, 2013, 3:45 p.m.)
Status
------
This change has been marked as submitted.
Review request for Asterisk Developers.
Bugs: ASTERISK-22709
https://issues.asterisk.org/jira/browse/ASTERISK-22709
Repository: Asterisk
Description
-------
John Bigelow discovered that if the newly-created atxfer_threeway_nominal test
is run in a loop, eventually a test run will cause the UUT instance of Asterisk
to crash. Investigation of the crash showed that the crash was being caused
while trying to get the string representation of a channel's translation paths
while creating a channel snapshot. The attended transfer manager thread was
creating the channel snapshot as part of the attended transfer stasis message
while in another thread (in this case, the bridge channel thread) the
translation paths were being altered to make the channel compatible with others
in the bridge.
It's clear that the issue is that the channel's contents are not lock protected
while creating a channel snapshot. Luckily, the channel's lock is held already
when updating translation paths. By adding locking, I was able to run the test
over 450 times straight with no negative consequences.
When adding locking, there were two strategies to follow:
1) Take advantage of the fact that channel locks are recursive, and just add a
single SCOPED_CHANNELLOCK() call to ast_channel_snapshot_create() and call it a
day.
2) Make an audit of channel-related calls that end up creating a channel
snapshot and require locks be held prior to making those calls.
I like approach 2 more, even though it's harder to pull off. In investigating
what paths result in channel snapshots being created, I found that channel
locks should be required before calling the following functions:
* ast_channel_snapshot_create()
* ast_channel_blob_create()
* ast_channel_publish_snapshot()
* ast_publish_channel_state()
* ast_channel_publish_blob()
* ast_channel_publish_varset()
* ast_channel_amaflags_set()
* ast_channel_callid_set()
* ast_channel_whentohangup_set()
* ast_channel_callgroup_set()
* ast_channel_pickupgroup_set()
* ast_channel_internal_bridge_set()
* ast_channel_language_set()
* ast_channel_accountcode_set()
* ast_channel_peeraccount_set()
* ast_channel_linkedid_set()
* ast_channel_stage_snapshot()
* ast_setstate()
* ast_aoc_manager_event()
* ast_channel_setwhentohangup_tv()
* ast_channel_setwhentohangup()
* ast_set_variables()
All of these functions' documentation have been updated to indicate a
precondition of having the channel locked.
In addition, there are some more complex functions that create channel
snapshots but could not easily be made to have the precondition of having the
channel locked. These are:
* ast_bridge_blob_create()
* ast_publish_attended_transfer_fail()
* ast_publish_attended_transfer_bridge_merge()
* ast_publish_attended_transfer_threeway()
* ast_publish_attended_transfer_app()
* ast_publish_attended_transfer_link()
* ast_bridge_publish_enter()
* ast_bridge_publish_leave()
* ast_bridge_publish_blind_transfer()
These functions' documentation now have a precondition of having no channels or
bridges locked.
This review seeks only to make sure that channels are properly locked prior to
creating channel snapshots. A similar effort is likely needed to ensure that
bridges are locked during bridge snapshot creation.
Diffs
-----
/branches/12/tests/test_stasis_channels.c 402818
/branches/12/tests/test_cel.c 402818
/branches/12/tests/test_cdr.c 402818
/branches/12/res/res_stasis.c 402818
/branches/12/res/res_pjsip_refer.c 402818
/branches/12/res/res_agi.c 402818
/branches/12/res/parking/parking_manager.c 402818
/branches/12/res/parking/parking_bridge_features.c 402818
/branches/12/pbx/pbx_realtime.c 402818
/branches/12/main/stasis_channels.c 402818
/branches/12/main/stasis_bridges.c 402818
/branches/12/main/pickup.c 402818
/branches/12/main/pbx.c 402818
/branches/12/main/endpoints.c 402818
/branches/12/main/dial.c 402818
/branches/12/main/core_unreal.c 402818
/branches/12/main/core_local.c 402818
/branches/12/main/channel.c 402818
/branches/12/main/cel.c 402818
/branches/12/main/bridge_channel.c 402818
/branches/12/main/bridge.c 402818
/branches/12/include/asterisk/stasis_channels.h 402818
/branches/12/include/asterisk/stasis_bridges.h 402818
/branches/12/include/asterisk/channelstate.h 402818
/branches/12/include/asterisk/channel.h 402818
/branches/12/include/asterisk/aoc.h 402818
/branches/12/funcs/func_timeout.c 402818
/branches/12/channels/sig_pri.c 402818
/branches/12/channels/sig_analog.c 402818
/branches/12/channels/chan_vpb.cc 402818
/branches/12/channels/chan_unistim.c 402818
/branches/12/channels/chan_skinny.c 402818
/branches/12/channels/chan_sip.c 402818
/branches/12/channels/chan_pjsip.c 402818
/branches/12/channels/chan_phone.c 402818
/branches/12/channels/chan_oss.c 402818
/branches/12/channels/chan_nbs.c 402818
/branches/12/channels/chan_motif.c 402818
/branches/12/channels/chan_misdn.c 402818
/branches/12/channels/chan_mgcp.c 402818
/branches/12/channels/chan_jingle.c 402818
/branches/12/channels/chan_iax2.c 402818
/branches/12/channels/chan_h323.c 402818
/branches/12/channels/chan_gtalk.c 402818
/branches/12/channels/chan_dahdi.c 402818
/branches/12/channels/chan_console.c 402818
/branches/12/channels/chan_alsa.c 402818
/branches/12/apps/app_voicemail.c 402818
/branches/12/apps/app_userevent.c 402818
/branches/12/apps/app_queue.c 402818
/branches/12/apps/app_meetme.c 402818
/branches/12/apps/app_disa.c 402818
/branches/12/apps/app_dial.c 402818
/branches/12/apps/app_confbridge.c 402818
/branches/12/apps/app_agent_pool.c 402818
/branches/12/addons/chan_ooh323.c 402818
/branches/12/addons/chan_mobile.c 402818
Diff: https://reviewboard.asterisk.org/r/2961/diff/
Testing
-------
As stated in the description, I ran the atxfer_threeway_nominal test in a loop.
After approximately 450 test runs, there were no negative consequences. Prior
to this changeset, running the test 20-50 times would result in a crash.
Thanks,
Mark Michelson
--
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --
asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
http://lists.digium.com/mailman/listinfo/asterisk-dev