Using rsync to back up remote n00 files

I had trouble rsyncing remote Linux 600 files (rw-------) today. I knew I had come across this issue before but couldn’t remember how I resolved it, so I had to waste time looking for and verifying a solution. Hence this blog post.

This is the problem I had earlier:

rsync -zr userA@remoteServer:/var/www/website/ /home/user/Documents/webSiteBackup/website/www/
rsync: send_files failed to open "/var/www/website/wp-config.php": Permission denied (13)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [generator=3.1.0]

So the issue is that wp-config.php is a 600 file, meaning only its owner (root in this case) can read and write it. Although userA on remoteServer has sudo privileges, I still needed to edit the sudoers file with visudo so that sudo won’t ask for a password when this user runs rsync.

Here is the line I added to the sudoers file via visudo:

userA    ALL=(root) NOPASSWD: /usr/bin/rsync

And here is the slightly modified bash command to run:

rsync --rsync-path="sudo rsync" -zr userA@remoteServer:/var/www/website/ /home/user/Documents/webSiteBackup/website/www/
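If you want to confirm the password-less sudo entry works before kicking off a large transfer, a quick check along these lines (same user and host as above) should print the remote rsync version without any password prompt:

# sudo -n fails instead of prompting, so this surfaces sudoers problems early
ssh userA@remoteServer 'sudo -n rsync --version'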

Hope it helps you as well.

Grant full permission to Windows folders and files

For a Windows application or service, it is often necessary for either the user or the service startup account to have full control of its folders and files. In most cases, this is transparent to the user; it is generally taken care of by various delegation mechanisms behind the scenes. However, this can be a problem during application upgrade or migration, when one rearranges disk drive letters or mount point names, or moves things from one server to another. To resolve this, one could manually go to each file and folder property page through Windows Explorer, then the Security tab, then the Edit… button, then the Add… button, and so on. This isn’t too bad if only a handful of files and folders are involved, but it is easy to see that the manual method gets old really quickly, is error-prone, and does not scale.

For example, say you are migrating a SQL Server data warehouse instance with multiple mount points and hundreds of database files due to filegroup usage and table partitions. During the migration you’ll move those database files to a new location and attach them. Ensuring the SQL Server service SID has proper permissions is no small task, so an automated approach is needed.

Given this is Windows, I started searching around for a PowerShell solution. You’d think this is a common problem and a solution wouldn’t be hard to find, but that wasn’t the case, so I wrote my own. Have fun with it!

Notes:

  • Make sure you start PowerShell with “Run as administrator”. This is to get around the issue that the creator and the permission granter are different identities;
  • The first thing the script does is give ownership to the Administrators group. During execution, you’ll see output like:
    SUCCESS: The file (or folder): "x:\MountPoints\A1" now owned by the administrators group.
    That is to be expected;
  • The script’s execution is recursive, meaning all sub-folders and files are covered.

Without further ado, here is the PowerShell function:

function grantFullPermission ($folderPath, $userAccount) {
    # Take ownership first (recursively, assigning it to the Administrators group);
    # the path is quoted so folder names containing spaces still work
    Invoke-Expression "takeown /f `"$folderPath`" /r /a /d Y"
    # Grant the account full control, inherited by all child folders and files
    $aclWork = (Get-Item $folderPath).GetAccessControl("Access")
    $ssaf = New-Object System.Security.AccessControl.FileSystemAccessRule($userAccount, "FullControl", "ContainerInherit, ObjectInherit", "None", "Allow")
    $aclWork.SetAccessRule($ssaf)
    Set-Acl $folderPath $aclWork }

As an example, to grant proper permissions to the three folders listed in the code, you can run the following script as administrator:

function grantFullPermission ($folderPath, $userAccount) {
    Invoke-Expression "takeown /f `"$folderPath`" /r /a /d Y"
    $aclWork = (Get-Item $folderPath).GetAccessControl("Access")
    $ssaf = New-Object System.Security.AccessControl.FileSystemAccessRule($userAccount, "FullControl", "ContainerInherit, ObjectInherit", "None", "Allow")
    $aclWork.SetAccessRule($ssaf)
    Set-Acl $folderPath $aclWork }

$folders = @("x:\MountPoints\A1", "x:\MountPoints\A2\", "x:\MountPoints\B1")

$folders | foreach {
    grantFullPermission $_ 'NT SERVICE\MSSQLSERVER' }

You could also turn this into a short script that takes two parameters: a folder path and an account name. Remember to quote the parameters if they contain spaces.

param(
    [parameter(Mandatory=$true)]$DirectoryPath,
    [parameter(Mandatory=$true)]$AccountName)

function grantFullPermission ($folderPath, $userAccount) {
    Invoke-Expression "takeown /f `"$folderPath`" /r /a /d Y"
    $aclWork = (Get-Item $folderPath).GetAccessControl("Access")
    $ssaf = New-Object System.Security.AccessControl.FileSystemAccessRule($userAccount, "FullControl", "ContainerInherit, ObjectInherit", "None", "Allow")
    $aclWork.SetAccessRule($ssaf)
    Set-Acl $folderPath $aclWork }

grantFullPermission $DirectoryPath $AccountName

sed tricks

Over the last two days, I helped a charity rebuild a MySQL server and restore a database containing a lot of longblob data. Fortunately there was a dump backup file for the database in question.

However, the tables with longblob column(s) were not defined with “ROW_FORMAT=COMPRESSED”. I wanted to restore the database with row compression before inserting the data, so I needed to modify the dump SQL file. The problem is that the file is 2.5 GB and the server only has 4 GB of memory, so editing it in a normal editor is a challenge. Fortunately, Linux has sed to save the day. Don’t you love free, open source software?

I am a power Vi/Vim user, so I am familiar with sed and have used it in the past. But there were still a few things I had to search around for, so I’ll record the noteworthy points here. I’ve lost count of how many times my own past blog entries have helped me over the years, and I hope you’ll find this helpful too!

  • The -n switch is important. sed is a stream editor, and in many cases you’d like to suppress the data streaming to stdout, which is what -n does. This was especially important in my case, because a) the file is large, and b) it contains blob data that may or may not be “fit to print”;
  • To see a particular line, say line a, use the p (print) command: sed -n 'ap' file
  • To see all lines between line a and b: sed -n 'a,bp' file
  • To see multiple, non-adjacent lines, say line a, e, g: sed -n 'ap;ep;gp' file
  • To edit big files, you’d like to make in-place changes. Therefore the -i switch. For example, to put in InnoDB row compression, this is the command I used: sed -i 's/CHARSET=utf8;/CHARSET=utf8 ROW_FORMAT=COMPRESSED;/' file
  • Similarly, to delete line a: sed -i 'ad' file. You can do range deletes as well (see the worked sketch after this list).
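Here is a short worked sketch tying these together. The dump file name and the line numbers are made up for illustration; note that sed -i works through a temporary copy of the file, so it needs free disk space but very little memory:

# Peek at line 42, then at lines 40 through 45, without streaming the whole file
sed -n '42p' dump.sql
sed -n '40,45p' dump.sql

# Show a few non-adjacent lines (say 12, 87 and 230)
sed -n '12p;87p;230p' dump.sql

# Add InnoDB row compression to every table definition, editing the file in place
sed -i 's/CHARSET=utf8;/CHARSET=utf8 ROW_FORMAT=COMPRESSED;/' dump.sql

# Delete a single bad line (say 87), or a range (87 through 90), in place
sed -i '87d' dump.sql
sed -i '87,90d' dump.sql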

By the way, when restoring an InnoDB database with a lot of blob data, it makes a lot of sense to enable the following settings in my.cnf, if they are not enabled already. It’ll make administration much easier down the road:
innodb_file_per_table
innodb_file_format = Barracuda

You may also need to tweak the max_allowed_packet and innodb_log_file_size parameters for successful restore.
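For reference, here is a minimal sketch of what that might look like in my.cnf. The option file path and the two size values are assumptions and should be tuned for your server and MySQL version:

# Append the settings to the MySQL option file (path and sizes are assumptions)
cat >> /etc/mysql/my.cnf <<'EOF'
[mysqld]
innodb_file_per_table = 1
innodb_file_format    = Barracuda
max_allowed_packet    = 256M
innodb_log_file_size  = 512M
EOF

# Restart MySQL so the settings take effect; on older MySQL versions, resizing
# the redo log also requires a clean shutdown and removing the old ib_logfile*
# files first
sudo service mysql restart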

Something else to pay attention to:
If you use:

mysql -uuser -p database < dump.sql

to restore the database, the program may report the wrong line number when it hits a loading problem. In most cases, you need to search the surrounding lines to find where the problem actually is.

Additionally, if you are in a hurry and want to load the data without troubleshooting loading issues, you can try adding the -f switch to the command above so the restore ignores the errors it encounters and moves on to the next statement, as shown below.
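For example, with the same placeholder user and database names as above:

# -f (--force) keeps going after SQL errors instead of aborting the restore
mysql -f -uuser -p database < dump.sql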

Script to generate index rebuild with PAGE compression

For BI data warehouse databases, the data does not change much but typically requires a lot of space, so it makes a lot of sense to compress the indexes to save space.

I came across some BI databases whose indexes were created without compression. We are in the process of migrating those databases to a new server so I took this opportunity to completely rebuild those indexes with PAGE compression.

Two things are of interest:

  • Index rebuilds go from the smallest index to the largest. The rationale is that a rebuild needs roughly twice the size of the index being rebuilt. If we start with the largest index, the data files may have to grow unnecessarily. If we start with the smallest one, there is more likely to be enough free space already inside the files to accommodate the rebuild. Once that rebuild is done, more space is freed up, leaving more room for the next one. This way we can rebuild the indexes with little or no additional space;
  • Re-indexing is done on a new server with no or very few connections to it, so the script sets the MAXDOP option in the generated statements to hopefully make the process faster.
-- Generate one ALTER INDEX ... REBUILD statement per index, with PAGE
-- compression, ordered from the smallest index to the largest
SELECT 
    s.Name AS SchemaName,
    t.NAME AS TableName,
    i.name AS IndexName,
    'ALTER INDEX ' + i.name + ' ON ' + s.name + '.' + t.name + ' REBUILD WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE, MAXDOP = 20);' AS AlterIndex,
    SUM(a.total_pages) * 8 AS TotalSpaceKB, 
    SUM(a.used_pages) * 8 AS UsedSpaceKB, 
    (SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM 
    sys.tables t
INNER JOIN 
    sys.schemas s ON s.schema_id = t.schema_id
INNER JOIN      
    sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN 
    sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN 
    sys.allocation_units a ON p.partition_id = a.container_id
WHERE 
    t.NAME NOT LIKE 'dt%' 
    AND t.is_ms_shipped = 0
    AND i.OBJECT_ID > 255 
    AND i.index_id > 0
GROUP BY 
    s.Name, t.Name, i.name
ORDER BY 
    UsedSpaceKB
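The query only generates the statements; they still need to be executed against the migrated database in the order returned (smallest first). One way is to copy the AlterIndex column into a new query window and run it. Alternatively, as a rough sketch from a shell, assuming the statements were saved to a file and sqlcmd is available (the server, database, and file names are placeholders):

# Run the saved ALTER INDEX statements over a trusted connection and keep a log
sqlcmd -S NewBIServer -d DataWarehouseDB -E -i rebuild_indexes.sql -o rebuild_indexes.log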

JiMetrics now gathers SQL Server startup account

During the last few days, I’ve refactored JiMetrics and added new functionality:

  • Used Pester to create more test cases for PowerShell functions I wrote;
  • Enhanced the design and code so JiMetrics also gathers SQL Server instance’s startup account, which can be useful.

If you don’t know what JiMetrics is, go to this page to find out. It is a tool that uses SQL Server itself to gather important SQL Server metrics across your enterprise. No additional software to install, no registry changes, no files to copy around and pollute your system: it only uses SQL Server, which you already have, and it just works! It has been pretty useful to me and my co-workers. JiMetrics is open source and free to use. Check it out and let me know what you think.

I will keep improving this. My next objective is to play with SQL Server 2014 columnstore indexes and see if I can convert the collection database tables to columnstore.