Convert Excel file with hyperlinks to CSV


It seems the venerable file format CSV (Character/Comma-separated values) never goes out of style. According to CSV wikipedia page, IBM Fortran started supporting this format in 1967, before many of us were born. CSV has been with us, through thick and thin, silently but steadfastly, ready to spring into action when duty calls. For sure, it’s one of data professionals’ best friends! Often times, we’d convert spreadsheet files or dump data inside a database into a CSV before it can be distributed and consumed downstream.

Major league scripting languages, such as Perl, Python, and Ruby, all have their own way of converting Excel spreadsheet files into CSVs. Here I list their most popular libraries, based on my research: for Perl, there is Spreadsheet::ParseExcel; for Python, there is xlrd; for Ruby, there is roo.

However, none of these addressed a problem I had recently.

Here is my use case:
Given Excel files, in both xls and xlsx format, that have hyperlink columns in them, convert them to CSV. For hyperlink columns, save the text value (also known as Friendly_name) but not its URL. None of the libraries mentioned above can handle it.

So I ended up trying PHP, and found a PHP library called PHPExcel that addressed my needs. Below is a quick CLI PHP program I wrote.

Follow steps below to use it:

  1. Download PHPExcel library;
  2. Save the program below. On Linux, you can save it as excel2csv. On Windows, save it as excel2csv.php. Modify as needed so it points to the correct directory where PHPExcel is located;
  3. On Linux, you may want to run
    chmod +x excel2csv
    On Windows you should be ok if your system knows to use PHP when it sees a .php extension;
  4. To use it, on command line, run
    excel2csv inputExcel outputCsv
    Remember to replace the parameters to your liking!

Hope it helps!

[code language=”php”]
#!/usr/bin/php -q
< ?php require_once('/Directory2PHPExcel/PHPExcel/Classes/PHPExcel.php'); $inputFile = $argv[1]; $outputFile = $argv[2]; Xls2Csv($inputFile,$outputFile); function Xls2Csv($infile,$outfile) { $fileType = PHPExcel_IOFactory::identify($infile); $objReader = PHPExcel_IOFactory::createReader($fileType); $objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($infile);

$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, ‘CSV’);
$objWriter->save($outfile);
}
?>
[/code]

,

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.