I’m trying to set up a RAID 1 array using mdadm. I have connected two WD Red 3TB drives to the SATA controller (supplied as part of the NAS perk), but running mdadm --create always fails after a little while: the array is created successfully, but during syncing sdb always gets marked as failed.
I have tried different drives and different SATA cables, but that does not help, so I’m thinking it’s either a dodgy SATA controller or some other hardware or driver issue.
Any suggestions? Thank you very much in advance!
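For reference, this is roughly how I’m creating the array (device and partition names are from my setup and may differ on yours):

```shell
# Create a two-disk RAID 1 array from the first partition on each drive.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# Watch the initial resync progress -- this is where sdb gets kicked out.
cat /proc/mdstat
```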
Here’s some dmesg output:
[49355.725523] ata2: hard resetting link
[49365.730597] ata2: softreset failed (1st FIS failed)
[49365.735494] ata2: hard resetting link
[49375.740606] ata2: softreset failed (1st FIS failed)
[49375.745494] ata2: hard resetting link
[49394.960619] ata1.00: exception Emask 0x0 SAct 0xc0 SErr 0x0 action 0x6 frozen
[49394.967786] ata1.00: cmd 60/08:30:08:08:00/00:00:00:00:00/40 tag 6 ncq 4096 in
[49394.967786] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[49394.982450] ata1.00: cmd 60/20:38:00:08:00/00:00:00:00:00/40 tag 7 ncq 16384 in
[49394.982450] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[49394.997201] ata1: hard resetting link
[49405.000600] ata1: softreset failed (1st FIS failed)
[49405.005498] ata1: hard resetting link
[49410.750598] ata2: softreset failed (1st FIS failed)
[49410.755495] ata2: limiting SATA link speed to 3.0 Gbps
[49410.755500] ata2: hard resetting link
[49415.010597] ata1: softreset failed (1st FIS failed)
[49415.015492] ata1: hard resetting link
[49415.760603] ata2: softreset failed (1st FIS failed)
[49415.765494] ata2: reset failed, giving up
[49415.769512] ata2.00: disabled
[49415.769550] ata2: EH complete
[49415.769601] sd 1:0:0:0: [sdb] tag#28 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[49415.769610] sd 1:0:0:0: [sdb] tag#28 CDB: opcode=0x8a 8a 00 00 00 00 00 00 d0 4d 80 00 00 05 00 00 00
[49415.769615] blk_update_request: I/O error, dev sdb, sector 13651328
[49415.775945] sd 1:0:0:0: [sdb] tag#29 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[49415.775952] sd 1:0:0:0: [sdb] tag#29 CDB: opcode=0x8a 8a 00 00 00 00 00 00 d0 4a 80 00 00 03 00 00 00
[49415.775956] blk_update_request: I/O error, dev sdb, sector 13650560
[49415.776505] md/raid1:md0: Disk failure on sdb1, disabling device.
[49415.776505] md/raid1:md0: Operation continuing on 1 devices.
[49415.776573] md: md0: recovery interrupted.
If smartmontools is installed, you could check whether the SMART data from the disks, or a SMART self-test, reveals anything. You could also check that the disks aren’t getting too hot if you don’t have a fan in the NAS box.
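A minimal sketch, assuming the smartctl tool from smartmontools is available:

```shell
# Dump SMART health status and attributes. Look especially at
# Reallocated_Sector_Ct, Current_Pending_Sector, UDMA_CRC_Error_Count
# and Temperature_Celsius.
smartctl -a /dev/sdb

# Start an extended (long) self-test in the background...
smartctl -t long /dev/sdb

# ...and read the result once the estimated time has passed.
smartctl -l selftest /dev/sdb
```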
Thanks for the suggestions - I do have a fan installed, so everything stays nice and cool. I also ran a long SMART test on sdb, and it completed without errors. I’m pretty sure it’s not the actual drive, as I swapped them around and it’s always the one connected to channel 2 (sdb) that fails - I confirmed this using the drives’ serial numbers. I have also tried different SATA cables, but that does not solve the problem either…
I don’t really have anything to add, only a tip: keep in mind the vibration of the hard disks. Try to put something soft or rubbery under the NAS case. I’m not sure whether this problem also occurs with just two hard disks, but with multiple drives you get vibration, which can damage them. If the drives are mounted firmly in the NAS box, that should not be a problem.
I found via Google that similar problems occur on PCs with an emulated IDE interface. Or maybe we should use a different partition type (I was trying “Linux RAID”); maybe we should do it differently.
Btw, the disks work fine outside the array: I created two folders (one for each drive) and started filling both of them with data at the same time. There were no errors, and I wrote almost 300GB without a problem.
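Roughly what that test looked like (mount points and sizes are just what I used):

```shell
# Mount each drive separately, outside any RAID.
mount /dev/sda1 /mnt/disk1
mount /dev/sdb1 /mnt/disk2

# Stream data to both drives in parallel to stress the controller.
dd if=/dev/zero of=/mnt/disk1/test.bin bs=1M count=10240 &
dd if=/dev/zero of=/mnt/disk2/test.bin bs=1M count=10240 &
wait
```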
I created a btrfs RAID 1 filesystem and copied some data to and from the device, and that seems to be working fine. I’m going to see what happens if I try to use dm-crypt…
I have a little problem with that - it looks like the mkfs.btrfs command doesn’t want to do the job for me…
Any help here?
root@turris:~# mkfs.btrfs -m raid1 -d raid1 -f /dev/sda1 /dev/sdb1
btrfs-progs v4.5.1
See http://btrfs.wiki.kernel.org for more information.
Warning, could not drop caches
Warning, could not drop caches
Label: (null)
UUID: c676dab7-744c-4901-94a5-a45c4b9b7738
Node size: 16384
Sector size: 4096
Filesystem size: 5.46TiB
Block group profiles:
Data: RAID1 1.01GiB
Metadata: RAID1 1.01GiB
System: RAID1 12.00MiB
SSD detected: no
Incompat features: extref, skinny-metadata
Number of devices: 2
Devices:
ID SIZE PATH
1 2.73TiB /dev/sda1
2 2.73TiB /dev/sdb1
Warning, could not drop caches
Warning, could not drop caches
I faced a similar issue with an I/O error on one drive. I bought two identical drives and connected them as usual. The first partitioning pass went okay, but creating the filesystems failed on sdb (during the ioctl at the end of the operation). So I checked with fdisk/cfdisk/sfdisk/partx to see what was wrong - nothing suspicious. I then connected the drives to a Windows machine and checked them there. The first one correctly showed GPT with the ext3 layout I wanted; the second one showed up as an MBR disk with a preformatted NTFS partition.
I reformatted both on Windows and plugged them back into the Turris. First drive okay, second drive got an I/O error.
I cleared everything again and tried one disk at a time on the first channel. No luck - the second drive always had issues, no matter which tool I used. I ran dd several times (zeroing, writing ones, writing from urandom, even cloning sda to sdb 1:1). I even reassembled the router (re-plugged everything, including the PCI SATA card). Later that disk started complaining about sector 0, sector 2048, 4096… something about an offset and/or a zero found.
So, what the hell - I ran the hdparm tool based on the info I had from the first drive. That actually went okay, after something like 12 hours of waiting. After that the disk was fine and I could get it running with 3x 1TB ext3 partitions. Then, while not even in use, it disconnected several times, so I had to hook it up to the Windows box again for a check - SeaTools, DiscWizard, GParted and so on. In the end the disk refused any GPT, so only MBR was possible, which left me with a 2TB partition on a 3TB disk. I put it back and ran cfdisk (changed everything to GPT + 2x 1.5TB NTFS).
Finally a working set, ready to mount and use.
After two days under load/testing, the sdb2 partition got sick. Another check, and voilà: “bad disk” and/or “read-only”.
I found that wonderful --yes-i-know-what-i-am-doing flag for hdparm and did a security-enhanced erase / factory reset. After a night and a few more hours the disk looked fine. To avoid repeating everything, I quickly made 3x 1TB NTFS partitions. For about 10 hours it was okay, but then it disconnected, and since then I can’t do any write operation after sector 4096 or at the end of the disk (anything in between was okay).
Bricked drive, doh …meh …mffffp…
(I can erase sectors 0-4095, and I can “repair” the partition table, but I can’t write the repair back; a low-level format tool fails as well; dd/shred/scrub always hit an I/O error…) I can also play within the DCO area a bit (read, clear security, set security…), but issuing the security-erase/security-enhanced-erase command again fails with an I/O error.
After a few days of playing with it, the drive was sent back for replacement. Obviously a defective drive.
That taught me a lesson, and forced me to read a bunch of man pages for those disk tools. For hacking on drives, sfdisk is really useful.
Btw: on the other hand, Windows’ built-in diskpart.exe (run as admin in cmd.exe) is also a very helpful tool for SATA drives - why the hell was I using third-party tools like PartitionMagic before…
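For reference, the ATA secure-erase sequence mentioned above goes roughly like this (destructive; device name and password are placeholders, check hdparm’s man page before running it):

```shell
# Check that the drive supports the ATA security feature set
# and is "not frozen" (otherwise re-plug or suspend/resume first).
hdparm -I /dev/sdb | grep -A8 "^Security:"

# Set a temporary user password to enable the security feature set.
hdparm --user-master u --security-set-pass Eins /dev/sdb

# Issue the enhanced secure erase -- this wipes the whole drive.
time hdparm --user-master u --security-erase-enhanced Eins /dev/sdb
```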
I finally figured out that it works even with those warnings.
So now I have RAID 1 running, created with mkfs.btrfs -m raid1 -d raid1 -f /dev/sda1 /dev/sdb1
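To use it, you mount either member device and can verify the RAID profile afterwards (mount point is an assumption):

```shell
# Either member device can be passed to mount; btrfs finds the other one.
mount /dev/sda1 /mnt/nas

# Confirm that both data and metadata use the RAID1 profile.
btrfs filesystem df /mnt/nas

# Show how the data is spread across the two devices.
btrfs filesystem show /mnt/nas
```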
If you put encryption between the hard drives and mdraid, you’ll end up encrypting the same data twice (and unless the same key is used in both cases, this may have security implications). To save CPU time, put the encryption between the RAID layer and the file system.
Whether to use btrfs’s RAID support is another matter. I would go with ext4 on top of mdraid.
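A minimal sketch of that stacking - dm-crypt on top of the md array, ext4 on top of dm-crypt (device names and the mapper name are assumptions):

```shell
# Encrypt the assembled RAID device, not the individual drives.
cryptsetup luksFormat /dev/md0
cryptsetup open /dev/md0 nas_crypt

# Put the file system on the decrypted mapping and mount it.
mkfs.ext4 /dev/mapper/nas_crypt
mount /dev/mapper/nas_crypt /mnt/nas
```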
Both my Omnia and the NAS box have finally arrived. I had exactly the same idea of buying two WD Red 3TB drives, but I have never had to configure hardware on Unix before. Do you have any good manual on how to set it all up, and ideally how to avoid this I/O error?
From my experience with my 2x 3TB drives, I recommend using GParted live (or any similar live distro with the parted/gparted tools) on the side to prepare the partitions in advance.
When I was trying to find out what went wrong, I was looking for the parted tool (and while looking into why it is not installed by default) I found an article (which, really, I can’t find again) stating that GPT is supported by Turris OS (i.e. for use on drives), but due to some limitations of the corresponding fork of OpenWrt, it is not recommended to create (and format) 2TB+ partitions directly over SSH.
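If you do prepare them from a live distro, the partitioning step could look roughly like this (destructive; device names assumed, run from the live environment):

```shell
# Fresh GPT label and one big partition flagged for Linux software RAID.
parted --script /dev/sda \
    mklabel gpt \
    mkpart primary 1MiB 100% \
    set 1 raid on

# Repeat for the second drive, then check the result.
parted --script /dev/sdb mklabel gpt mkpart primary 1MiB 100% set 1 raid on
parted --script /dev/sda print
```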