Assembling md arrays at boot time.
---------------------------------
December 2005

These notes apply to 2.6 kernels only and, in some cases,
to 2.6.15 or later.

Md arrays can be assembled at boot time using the 'autodetect' functionality
which is triggered by storing components of an array in partitions of type
'fd' - Linux Raid Autodetect.
They can also be assembled by specifying the component devices in a
kernel parameter such as
  md=0,/dev/sda,/dev/sdb
In this case, /dev/md0 will be assembled (because of the 0) from the listed
devices.
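The md= value is a comma-separated list: the md minor number first, then
the component devices.  As an illustration only, the hypothetical shell
function below splits such a value the way it is described above.  It
handles just this simple form, not any other forms the kernel may accept.

```shell
#!/bin/sh
# Hypothetical helper: split an md= boot parameter value such as
# "0,/dev/sda,/dev/sdb" into the md minor and its component devices.
# Only the simple "minor,device,device,..." form is handled.
parse_md_param() (
    value=$1
    minor=${value%%,*}     # text before the first comma
    devices=${value#*,}    # text after the first comma
    # A value with no comma names no components: reject it.
    [ "$minor" != "$value" ] || exit 1
    printf 'array: /dev/md%s\n' "$minor"
    set -f                 # no globbing while splitting the list
    IFS=,                  # split on commas (safe: this is a subshell)
    for dev in $devices; do
        printf 'component: %s\n' "$dev"
    done
)

parse_md_param "0,/dev/sda,/dev/sdb"
# array: /dev/md0
# component: /dev/sda
# component: /dev/sdb
```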

These mechanisms, while useful, do not provide complete functionality
and are unlikely to be extended.  The preferred way to assemble md
arrays at boot time is using 'mdadm' or 'mdassemble' (which is a
trimmed-down mdadm).  To assemble an array which contains the root
filesystem, mdadm needs to be run before that filesystem is mounted,
and so needs to be run from an initial-ram-fs.  How this can work is
the primary focus of this document.

It should be noted up front that only the array containing the root
filesystem should be assembled from the initramfs.  Any other arrays
should be assembled under the control of files on the main filesystem
as this enhances flexibility and maintainability.

A minimal initramfs for assembling md arrays can be created using one
directory, three files and one hard link.  These are:

/bin           Directory
/bin/mdadm     statically linked mdadm binary
/bin/busybox   statically linked busybox binary
/bin/sh        hard link to /bin/busybox
/init          a shell script which calls mdadm appropriately.
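A sketch of laying out that tree in a staging directory follows.  The
function name stage_initramfs and its arguments are hypothetical; it
assumes you already have static mdadm and busybox binaries and an /init
script, and it does not check that the binaries really are statically
linked.

```shell
#!/bin/sh
# Hypothetical helper: build the minimal initramfs tree described above.
# Arguments: staging dir, mdadm binary, busybox binary, init script.
stage_initramfs() (
    set -e
    stage=$1 mdadm_bin=$2 busybox_bin=$3 init_script=$4
    mkdir -p "$stage/bin"
    cp "$mdadm_bin" "$stage/bin/mdadm"
    cp "$busybox_bin" "$stage/bin/busybox"
    # /bin/sh is a hard link: busybox looks at the name it was run as
    # and behaves as a shell when invoked as "sh".
    ln -f "$stage/bin/busybox" "$stage/bin/sh"
    cp "$init_script" "$stage/init"
    chmod 755 "$stage/bin/mdadm" "$stage/bin/busybox" "$stage/init"
)
```

The staged directory can then be packed as a gzipped cpio archive in the
'newc' format the kernel expects, for example with
  (cd "$stage" && find . | cpio -o -H newc) | gzip -9 > init.cpio.gz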

An example init script is:

==============================================
#!/bin/sh

echo 'Auto-assembling boot md array'
mkdir /proc
mount -t proc proc /proc
if [ -n "$rootuuid" ]
then arg=--uuid=$rootuuid
elif [ -n "$mdminor" ]
then arg=--super-minor=$mdminor
else arg=--super-minor=0
fi
echo "Using $arg"
mdadm -Acpartitions $arg --auto=part /dev/mda
cd /
mount /dev/mda1 /root ||  mount /dev/mda /root
umount /proc
cd /root
exec chroot . /sbin/init < /dev/console > /dev/console 2>&1
=============================================

This could certainly be extended, or merged into a larger init script.
Though tested and in production use, it is not presented here as
"The Right Way" to do it, but as a useful example.
Some key points are:

  /proc needs to be mounted so that /proc/partitions can be accessed
  by mdadm, and so that /proc/filesystems can be accessed by mount.

  The uuid of the array can be passed in as a kernel parameter
  (rootuuid).  As the kernel doesn't use this value itself, it is made
  available in the environment for /init.  The md minor number can be
  passed the same way (mdminor).

  If neither a uuid nor a minor number is given, we default to md0
  (--super-minor=0), which is commonly used to store the root
  filesystem.  This may not work in all situations.

  We assemble the array as a partitionable array (/dev/mda) even if we
  end up using the whole array.  There is no cost in using the partitionable
  interface, and in this context it is simpler.

  We try mounting both /dev/mda1 and /dev/mda as they are the most likely
  parts of the array to contain the root filesystem.

  The --auto flag is given to mdadm so that it will create /dev/md*
  files automatically.  This is needed as /dev will not contain
  any md files, and udev will not create them (as udev only creates device
  files after the device exists, and mdadm needs the device file to create
  the device).  Note that the created md files may not exist in /dev
  of the mounted root filesystem.  This needs to be dealt with separately
  from mdadm, possibly using udev.

  We do not need to create device files for the components which will
  be assembled into /dev/mda.  mdadm finds the major/minor numbers from
  /proc/partitions and creates a temporary /dev file if one doesn't already
  exist.
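If you would rather not depend on the kernel passing rootuuid and mdminor
through the environment, /init could read them from /proc/cmdline itself
(/proc is already mounted at that point).  The function below is a
hypothetical sketch of that variant; it keeps the same precedence as the
example script: uuid first, then minor number, then md0.

```shell
#!/bin/sh
# Hypothetical variant of the argument selection in the example /init:
# read rootuuid= and mdminor= from a kernel command line string
# (normally the contents of /proc/cmdline) instead of the environment.
# Precedence is the same as the script: uuid, then minor, then md0.
select_md_arg() (
    rootuuid= mdminor=     # ignore any values inherited from the environment
    set -f                 # do not glob-expand command line words
    for word in $1; do
        case $word in
            rootuuid=*) rootuuid=${word#rootuuid=} ;;
            mdminor=*)  mdminor=${word#mdminor=} ;;
        esac
    done
    if [ -n "$rootuuid" ]; then
        printf '%s\n' "--uuid=$rootuuid"
    elif [ -n "$mdminor" ]; then
        printf '%s\n' "--super-minor=$mdminor"
    else
        printf '%s\n' "--super-minor=0"
    fi
)
```

In the example /init it could replace the if/elif block with
  arg=$(select_md_arg "$(cat /proc/cmdline)")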

The script "mkinitramfs" which is included with the mdadm distribution
can be used to create a minimal initramfs.  It creates a file called
'init.cpio.gz' which can be specified as an 'initrd' to lilo or grub
(or whatever boot loader is being used).
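As an illustration, the hypothetical function below prints a GRUB (legacy
menu.lst) entry that loads init.cpio.gz as the initrd and passes the root
array's uuid as the rootuuid parameter.  The kernel and initrd paths are
assumptions and are taken relative to GRUB's root device; the uuid can be
read from the UUID line of 'mdadm --detail' on the running array.

```shell
#!/bin/sh
# Hypothetical helper: print a GRUB legacy menu.lst entry for booting
# with the md initramfs.  Arguments: kernel path, initrd path, array uuid.
grub_entry() (
    kernel=$1 initrd=$2 uuid=$3
    printf 'title Linux (root on md)\n'
    # rootuuid is not used by the kernel, so it is handed on to /init.
    printf '\tkernel %s rootuuid=%s\n' "$kernel" "$uuid"
    printf '\tinitrd %s\n' "$initrd"
)
```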


Resume from an md array
-----------------------

If you want to make use of the suspend-to-disk/resume functionality in Linux,
and want to have swap on an md array, you will need to assemble the array
before resume is possible.
However, because the array is active in the resumed image, you do not want
anything written to any drives during the resume process, such as superblock
updates or array resync.

This can be achieved in 2.6.15-rc1 and later kernels using the
'start_ro' module parameter.
Simply include the command
  echo 1 > /sys/module/md_mod/parameters/start_ro
before assembling the array with 'mdadm'.
You can then echo the array's major:minor device number, such as
  9:0
for /dev/md0, or whatever is appropriate, to /sys/power/resume to
trigger the resume.
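The ordering above can be sketched as follows.  So that it can be tried
on a scratch directory, sysfs is reached through a hypothetical
SYSFS_ROOT variable (normally /sys), and the actual mdadm assembly is
left to a caller-supplied function, assemble_swap_array, which is also
hypothetical.  The major:minor pair is read from /sys/block/<name>/dev
rather than hard-coded.

```shell
#!/bin/sh
# Sketch of the resume sequence.  SYSFS_ROOT is a hypothetical override
# (default /sys) so the steps can be tried on a scratch directory, and
# assemble_swap_array is a hypothetical, caller-supplied function that
# runs mdadm to assemble the named array.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

resume_from_md() {
    md=$1    # kernel name of the swap array, e.g. md0
    # 1. Start arrays read-only, so no superblock updates or resync
    #    are written while the suspended image is being read back.
    echo 1 > "$SYSFS_ROOT/module/md_mod/parameters/start_ro" || return 1
    # 2. Only now assemble the array.
    assemble_swap_array "$md" || return 1
    # 3. sysfs publishes the array's major:minor (9:0 for md0) in
    #    block/<name>/dev; writing it to power/resume starts the resume.
    cat "$SYSFS_ROOT/block/$md/dev" > "$SYSFS_ROOT/power/resume"
}
```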