SPL FileObject & LimitIterator

Over the last couple of weeks I’ve come to use the SPL far more than I have in the past. SplFileObject is far more convenient for reading CSV files than fgetcsv() and the file-handling code that has to surround it. Wrapping it in a LimitIterator let me skip the first row of the CSV, which held headers whose format I already knew.

$csv = new \SplFileObject("some-file.csv");
$csv->setFlags(\SplFileObject::READ_CSV | \SplFileObject::READ_AHEAD |
               \SplFileObject::SKIP_EMPTY | \SplFileObject::DROP_NEW_LINE);
foreach (new \LimitIterator($csv, 1) as $row) {
    /** process the rows **/
}

That made short work of iterating the CSV. The code I had before handled the header row more verbosely, which made it a little more error prone. Take a look at the setCsvControl() method on SplFileObject for more fine-grained control over the delimiter, enclosure, and escape characters.
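
As a rough sketch of what setCsvControl() might look like in practice, assume a semicolon-delimited file whose fields are enclosed in single quotes. The sample data is made up, and SplTempFileObject is used only so the example runs without a file on disk:

```php
<?php
// Hypothetical sample data: ';' as delimiter, single quote as enclosure.
$csv = new \SplTempFileObject();
$csv->fwrite("id;name\n1;'Smith; John'\n2;'Doe; Jane'\n");
$csv->rewind();

$csv->setFlags(\SplFileObject::READ_CSV | \SplFileObject::READ_AHEAD |
               \SplFileObject::SKIP_EMPTY | \SplFileObject::DROP_NEW_LINE);
// delimiter, enclosure, escape (passed explicitly rather than relying on defaults)
$csv->setCsvControl(';', "'", '\\');

$rows = [];
// offset 1 skips the header row, as in the snippet above
foreach (new \LimitIterator($csv, 1) as $row) {
    $rows[] = $row;
}
```

The semicolon inside the quoted name stays part of the field because it sits inside the configured enclosure.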

The SPL LimitIterator has also come in handy in another project. This time, though, I was using a SQL UNION to merge two separate datasets, and I only wanted to list the first 12 of a potentially larger number of objects.

// `date` has to be selected in both halves for the ORDER BY to resolve
$sql = '(SELECT id, somestuff, `date` FROM somedb.sometable)
        UNION
        (SELECT id, otherstuff, `date` FROM otherdb.othertable) ORDER BY `date` DESC';
$data = [];

// $db is a ZF1 adapter object
foreach ($db->query($sql) as $row) {
    $data[$row['id']][] = $row;
}

foreach ($data as $id => $rows) {
    // cap each id's rows at the first 12
    foreach (new \LimitIterator(new \ArrayIterator($rows), 0, 12) as $row) {
        /** process the rows **/
    }
}
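
To show the offset and count arguments of LimitIterator on their own, here is a small self-contained sketch. The $rows array is made-up data standing in for a query result, and the "second page" slice is only there to illustrate the offset:

```php
<?php
// Hypothetical rows standing in for the result of a query.
$rows = [];
for ($i = 1; $i <= 30; $i++) {
    $rows[] = ['id' => $i, 'title' => "item $i"];
}

// First 12 entries: offset 0, count 12.
$firstTwelve = iterator_to_array(
    new \LimitIterator(new \ArrayIterator($rows), 0, 12),
    false // discard the original keys and re-index from 0
);

// Next 12 entries: offset 12, count 12.
$secondPage = iterator_to_array(
    new \LimitIterator(new \ArrayIterator($rows), 12, 12),
    false
);
```

No counter variable is needed in either case; the iterator stops on its own once the count is reached.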

I didn’t use the following code in the end, but it did work. I suspected, but didn’t confirm, that MySQL’s GROUP BY might introduce subtle bugs into the imported data, so I used the method above instead. Here is an example of how the SPL MultipleIterator could be used:

$sql = 'SELECT id, GROUP_CONCAT(`type`) `type`,
               GROUP_CONCAT(`date`) `date`,
               GROUP_CONCAT(`path`) `path`,
               GROUP_CONCAT(`synced`) `synced`
        FROM somedb.sometable
        GROUP BY id, `date` DESC';
foreach ($db->query($sql) as $row) {
    $id = $row['id'];
    // build our array iterators
    $typeArray = new \ArrayIterator(explode(',', $row['type']));
    $dateArray = new \ArrayIterator(explode(',', $row['date']));
    $pathArray = new \ArrayIterator(explode(',', $row['path']));
    $syncArray = new \ArrayIterator(explode(',', $row['synced']));

    // build the main iterator
    $iterator = new \MultipleIterator(\MultipleIterator::MIT_KEYS_ASSOC);

    // make the array iterators combinable into a single array entry
    $iterator->attachIterator($typeArray, 'type');
    $iterator->attachIterator($dateArray, 'date');
    $iterator->attachIterator($pathArray, 'path');
    $iterator->attachIterator($syncArray, 'synced');

    // The array keys of $data will be as set in the attachIterator()
    // method of the MultipleIterator
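    foreach (new \LimitIterator($iterator, 0, 12) as $data) {
        /** process the data **/
        if ($data['synced'] == 1) {
            // skip entries flagged as synced
            continue;
        }
        // each field is available under the key given to attachIterator()
        $type = $data['type'];
        $date = $data['date'];
        $path = $data['path'];
    }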
}
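
To make the shape of each combined entry concrete, here is a minimal sketch that runs without a database. The $row array is invented data shaped like one result of the GROUP_CONCAT query, and adding MIT_NEED_ALL is my own choice here, so that iteration stops as soon as any one column runs out of values:

```php
<?php
// Hypothetical row, shaped like one result of the GROUP_CONCAT query.
$row = [
    'type'   => 'img,doc,img',
    'path'   => '/a.png,/b.pdf,/c.png',
    'synced' => '1,0,0',
];

$iterator = new \MultipleIterator(
    \MultipleIterator::MIT_NEED_ALL | \MultipleIterator::MIT_KEYS_ASSOC
);
foreach (['type', 'path', 'synced'] as $column) {
    // the second argument becomes the array key of each combined entry
    $iterator->attachIterator(new \ArrayIterator(explode(',', $row[$column])), $column);
}

$pending = [];
foreach (new \LimitIterator($iterator, 0, 12) as $entry) {
    if ($entry['synced'] === '1') {
        continue; // skip entries flagged as synced
    }
    $pending[] = $entry;
}
```

Each $entry is an array with the keys type, path and synced, so the three exploded lists are walked in lockstep without any index bookkeeping.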

Take a look at the Standard PHP Library for ideas on using the other iterators, data structures, and objects that bring potential for more readable code into your life. They’ve simplified my life a bit because now I don’t have to keep track of variables just to handle counts and other things for state.