reshaped to 5 disks as the imsm format does not support a 5-disk raid10
representation. This requires the ->reshape_super method to check the
contents of the array and ask the user to run the reshape at container
-scope (if both subarrays are agreeable to the change), or report an
+scope (if all subarrays are agreeable to the change), or report an
error in the case where one subarray cannot support the change.
1.3 Monitoring / checkpointing
reshape-manager per subarray when the reshape is being carried out at the
container level. For these two reasons the ->manage_reshape() method is
introduced. This method in addition to base tasks mentioned above:
-1/ Spawns a manager per-subarray, when necessary
+1/ Processes each subarray one at a time in series, where appropriate.
2/ Uses either generic routines in Grow.c for md-style backup file
support, or uses the metadata-format specific location for storing
recovery data.
2.1 Freezing sync_action
+ Before making any attempt at a reshape we 'freeze' every array in
+ the container to ensure no spare assignment or recovery happens.
+ This involves writing 'frozen' to sync_action and changing the '/'
+ after 'external:' in metadata_version to a '-'. mdmon knows that
+ this means not to perform any management.
+
+ Before doing this we check that all sync_actions are 'idle', which
+ is racy but still useful.
+ Afterwards we check that all member arrays have no spares
+ or partial spares (recovery_start != 'none') which would indicate a
+ race. If they do, we unfreeze again.
+
+ Once this completes we know all the arrays are stable. They may
+ still have failed devices as devices can fail at any time. However
+ we treat those like failures that happen during the reshape.
+
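The freeze step described above can be sketched as follows. This is only an illustration, not mdadm's code: it operates on an ordinary directory that stands in for an array's sysfs 'md' directory, and the helper names (read_attr, write_attr, freeze_array, make_fake_array) as well as the sample value 'external:/md127/0' are assumptions made for the example. The later check for spares and partial spares, and the unfreeze path, are left out.

```python
import os
import tempfile


def read_attr(md_dir, name):
    """Read one attribute file and strip the trailing newline."""
    with open(os.path.join(md_dir, name)) as f:
        return f.read().strip()


def write_attr(md_dir, name, value):
    """Overwrite one attribute file with the given value."""
    with open(os.path.join(md_dir, name), "w") as f:
        f.write(value)


def freeze_array(md_dir):
    """Freeze one array; return False if it was not idle.

    The 'idle' check is racy, exactly as the text says: the array
    could start a recovery between the read and the write.
    """
    if read_attr(md_dir, "sync_action") != "idle":
        return False
    write_attr(md_dir, "sync_action", "frozen")
    # Turn 'external:/...' into 'external:-...' so that mdmon knows
    # not to perform any management on this array.
    version = read_attr(md_dir, "metadata_version")
    prefix = "external:/"
    if version.startswith(prefix):
        write_attr(md_dir, "metadata_version",
                   "external:-" + version[len(prefix):])
    return True


def make_fake_array(sync_action="idle", version="external:/md127/0"):
    """Hypothetical helper: build a temp directory mimicking sysfs."""
    md_dir = tempfile.mkdtemp()
    write_attr(md_dir, "sync_action", sync_action)
    write_attr(md_dir, "metadata_version", version)
    return md_dir
```

Calling freeze_array() on every member array, and undoing the change on all of them if any call returns False, would approximate the container-wide behaviour described in the text.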
2.2 Reshape size
1/ mdadm::Grow_reshape(): checks if mdmon is running and optionally
because only redundant raid levels can modify the number of raid disks
2/ mdadm::Grow_reshape(): calls ->reshape_super() to check that the level
change is allowed (being performed at proper scope / permissible
- geometry / proper spares available in the container) prepares a metadata
- update.
+ geometry / proper spares available in the container), chooses
+ the spares to use, and prepares a metadata update.
3/ mdadm::Grow_reshape(): Converts each subarray in the container to the
raid level that can perform the reshape and starts mdmon.
- 4/ mdadm::Grow_reshape(): Pushes the update to mdmon...
- 4a/ mdmon::process_update(): marks the array as reshaping
- 4b/ mdmon::manage_member(): adds the spares (without assigning a slot)
- 5/ mdadm::Grow_reshape(): Notes that mdmon has assigned spares and invokes
- ->manage_reshape()
- 5/ mdadm::<format>->manage_reshape(): (for each subarray) sets sync_max to
- zero, starts the reshape, and pings mdmon
- 5a/ mdmon::read_and_act(): notices that reshape has started and notifies
- the metadata handler to record the slots chosen by the kernel
- 6/ mdadm::<format>->manage_reshape(): saves data that will be overwritten by
+ 4/ mdadm::Grow_reshape(): Pushes the update to mdmon.
+ 5/ mdadm::Grow_reshape(): uses container_content to find details of
+ the spares and passes them to the kernel.
+ 6/ mdadm::Grow_reshape(): gives raid_disks update to the kernel,
+ sets sync_max, sync_min, suspend_lo, suspend_hi all to zero,
+ and starts the reshape by writing 'reshape' to sync_action.
+ 7/ mdmon::monitor notices the sync_action change and tells
+ managemon to check for new devices. managemon notices the new
+ devices, opens the relevant sysfs files, and passes them all to
+ monitor.
+ 8/ mdadm::Grow_reshape() calls ->manage_reshape to oversee the
+ rest of the reshape.
+
+ 9/ mdadm::<format>->manage_reshape(): saves data that will be overwritten by
the kernel to either the backup file or the metadata specific location,
advances sync_max, waits for reshape, ping mdmon, repeat.
- 6a/ mdmon::read_and_act(): records checkpoints
- 7/ mdadm::<format>->manage_reshape(): Once reshape completes changes the raid
+ Meanwhile mdmon::read_and_act() records checkpoints.
+ Specifically:
+
+ 9a/ if the 'next' stripe to be reshaped will over-write
+ itself during reshape then:
+ 9a.1/ increase suspend_hi to cover a suitable number of
+ stripes.
+ 9a.2/ backup those stripes safely.
+ 9a.3/ advance sync_max to allow those stripes to be reshaped
+ 9a.4/ when sync_completed indicates that those stripes have
+ been reshaped, manage_reshape must ping_manager
+ 9a.5/ when mdmon notices that sync_completed has been updated,
+ it records the new checkpoint in the metadata
+ 9a.6/ after the ping_manager, manage_reshape will increase
+ suspend_lo to allow access to those stripes again
+
+ 9b/ if the 'next' stripe to be reshaped will over-write unused
+ space during reshape then we apply the same process as above,
+ except that there is no need to back anything up.
+ Note that we *do* need to keep suspend_hi progressing as
+ it is not safe to write to the area-under-reshape. For
+ kernel-managed-metadata this protection is provided by
+ ->reshape_safe, but that does not protect us in the case
+ of user-space-managed-metadata.
+
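The ordering of steps 9a.1 to 9a.6 (and the 9b variant without backup) can be illustrated with a small simulation. It is a sketch under simplifying assumptions: positions are unit-free stripe counts rather than the sector values the sysfs files use, waiting on sync_completed is assumed to succeed immediately, and the function name reshape_in_windows and the event log are invented for the example.

```python
def reshape_in_windows(total_stripes, window, needs_backup):
    """Return the ordered list of actions for a windowed reshape.

    Each loop iteration handles one window of stripes:
      1. raise suspend_hi so no I/O reaches the window
      2. back the window up (only when it would over-write itself)
      3. raise sync_max so the kernel may reshape the window
      4. (simulated) wait for sync_completed, then ping the manager
         so that mdmon can record the checkpoint
      5. raise suspend_lo to allow access to the window again
    """
    log = []
    done = 0  # stripes already reshaped; equals suspend_lo and sync_max
    while done < total_stripes:
        end = min(done + window, total_stripes)
        log.append(("suspend_hi", end))
        if needs_backup:
            log.append(("backup", done, end))
        log.append(("sync_max", end))
        log.append(("ping_manager", end))
        log.append(("suspend_lo", end))
        done = end
    return log
```

With needs_backup set to False the suspend_hi entries are still produced for every window, which matches the note that suspend_hi must keep progressing even when nothing has to be backed up.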
+ 10/ mdadm::<format>->manage_reshape(): Once reshape completes changes the raid
level back to the nominal raid level (if necessary)
FIXME: native metadata does not have the capability to record the original
2.6 Reshape raid disks (shrink)
-3 TODO
+3 Interaction with metadata handler
+
+ The following calls are made into the metadata handler to assist
+ with initiating and monitoring a 'reshape'.
+
+ 1/ ->reshape_super is called quite early (after only minimal
+ checks) to make sure that the metadata can record the new shape
+ and any necessary transitions. It may be passed a 'container'
+ or an individual array within a container, and it should notice
+ the difference and act accordingly.
+ When a reshape is requested against a container it is expected
+ that it should be applied to every array in the container,
+ however it is up to the metadata handler to determine final
+ policy.
+
+ If the reshape is supportable, the internal copy of the metadata
+ should be updated, and a metadata update suitable for sending
+ to mdmon should be queued.
+
+ If the reshape will involve converting spares into array members,
+ this must be recorded in the metadata too.
+
+ 2/ ->container_content will be called to find out the new state
+ of the array, or of all arrays in the container. Any newly
+ added devices (with state==0 and raid_disk >= 0) will be added
+ to the array as spares with the relevant slot number.
+
+ It is likely that the info returned by ->container_content will
+ have ->reshape_active set, ->reshape_progress set to e.g. 0, and
+ new_* set appropriately. mdadm will use this information to
+ cause the correct reshape to start at an appropriate time.
+
+ 3/ ->set_array_state will be called by mdmon when reshape has
+ started and again periodically as it progresses. This should
+ record the ->last_checkpoint as the point where reshape has
+ progressed to. When the reshape finishes this will be called
+ again and it should notice that ->curr_action is no longer
+ 'reshape' and so should record that the reshape has finished
+ providing 'last_checkpoint' has progressed suitably.
+
+ 4/ ->manage_reshape will be called once the reshape has been set
+ up in the kernel but before sync_max has been moved from 0, so
+ no actual reshape will have happened.
+
+ ->manage_reshape should call progress_reshape() to allow the
+ reshape to progress, and should back-up any data as indicated
+ by the return value. See the documentation of that function
+ for more details.
+ ->manage_reshape will be called multiple times when a
+ container is being reshaped, once for each member array in
+ the container.
+
+
+ The progress of the metadata is as follows:
+ 1/ mdadm sends a metadata update to mdmon which marks the array
+ as undergoing a reshape. This is set up by
+ ->reshape_super and applied by ->process_update
+ For container-wide reshape, this happens once for the whole
+ container.
+ 2/ mdmon notices progress via the sysfs files and calls
+ ->set_array_state to update the state periodically
+ For container-wide reshape, this happens repeatedly for
+ one array, then repeatedly for the next, etc.
+ 3/ mdmon notices when reshape has finished and calls
+ ->set_array_state to record that the reshape is complete.
+ For container-wide reshape, this happens once for each
+ member array.
+
+
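The metadata progression above can be modelled as a tiny state machine. This is a hypothetical sketch, not any real metadata handler: the class FakeMetadata and its fields are invented, and "progressed suitably" is interpreted here as "the checkpoint has reached the end of the array", which is an assumption rather than something the text specifies.

```python
class FakeMetadata:
    """Toy model of one member array's reshape state in the metadata."""

    def __init__(self, array_size):
        self.array_size = array_size
        self.reshape_active = False
        self.last_checkpoint = 0

    def process_update(self):
        # Step 1: the update queued by ->reshape_super is applied and
        # the array is marked as undergoing a reshape.
        self.reshape_active = True
        self.last_checkpoint = 0

    def set_array_state(self, curr_action, last_checkpoint):
        if not self.reshape_active:
            return
        if curr_action == "reshape":
            # Step 2: periodic call; record how far the reshape got.
            self.last_checkpoint = max(self.last_checkpoint,
                                       last_checkpoint)
        elif last_checkpoint >= self.array_size:
            # Step 3: no longer reshaping and the checkpoint reached
            # the end, so record that the reshape is complete.
            self.last_checkpoint = last_checkpoint
            self.reshape_active = False
        # Otherwise the reshape stopped early; keep it marked active
        # so that the recorded checkpoint is not lost.
```

For a container-wide reshape, one such object per member array would see steps 2 and 3 in turn, one array after another.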
...