Working with large CSV files memory-efficiently in PHP

When we have to work with a large CSV file of 500K or 1 million records, there is always one thing to be careful about: memory usage. If our program consumes a lot of memory, that is not good for the physical server we are using. Besides that, our program should also perform well. Today I would like to share some tips for working with large CSV files in PHP and Laravel.

Configuration

Firstly, we have to make sure our PHP settings are configured properly. Please check the settings below and adjust them as needed (a small sketch for checking them from PHP follows the list). But keep in mind: don't raise the memory_limit value if it's not required.

- memory_limit
- post_max_size
- upload_max_filesize
- max_execution_time
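
If you want to check these values from inside PHP, the sketch below prints them with ini_get(). Note that memory_limit and max_execution_time can also be changed at runtime with ini_set(), while post_max_size and upload_max_filesize can only be changed in php.ini (or the FPM pool configuration); the example values here are placeholders, not recommendations.

// print the current values so we know what we are working with
foreach (['memory_limit', 'post_max_size', 'upload_max_filesize', 'max_execution_time'] as $setting) {
    echo $setting . ' = ' . ini_get($setting) . PHP_EOL;
}

// memory_limit and max_execution_time can be adjusted at runtime if really needed
// (the values below are only examples)
ini_set('memory_limit', '256M');
ini_set('max_execution_time', '300');

// post_max_size and upload_max_filesize have to be set in php.ini, e.g.
// post_max_size = 64M
// upload_max_filesize = 64M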

Code

Once our PHP configuration is done, you might need to restart the web server or PHP itself. The next step is our code. We have to write it in a way that does not run out of memory.

Normally, we read CSV files like this:

$handle = fopen($filepath, "r"); // open our csv file for reading
while (($csv = fgetcsv($handle, 1000, ",")) !== false) { // loop through each record
    // validate the csv row
    // insert into the database
    // etc.
}
fclose($handle); // close the file when we are done

The code above might be fine for a few records, say 1,000 to 5,000 or so. But if you are working with 100K or 500K records, the while loop will consume a lot of memory. So we have to chunk the work and separate the loop, so our program gets some rest time.

$handle = fopen($filepath, "r"); // open our csv file for reading
$data = [];
while (($csv = fgetcsv($handle, 1000, ",")) !== false) { // loop through each record
   $data[] = $csv; // you can customize the array as you want

   // collect 1000 records at a time, then do the operations
   if (count($data) >= 1000) {
      // do the operations here
      // insert into the database (if you already prepared the array above,
      // you can add it to the db directly, no extra loop needed)
      // etc.

      // reset the data array
      $data = [];
   }
}

// handle the leftover rows (fewer than 1000) after the loop finishes
if (count($data) > 0) {
   // do the operations here
}

fclose($handle); // close the file when we are done

The above is a simple prototype for keeping the program from running out of memory: our program gets a rest after every 1,000 records.
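
For the "do the operations here" part, one common option is to flush each 1,000-row chunk to the database with a single batch insert. Below is a minimal sketch using PDO; the $pdo connection, the users table and its name/email column mapping are assumptions for illustration only, so map them to your own schema.

// minimal sketch: insert one chunk of rows with a single query
// $pdo, the "users" table and the column mapping are assumed for this example
$placeholders = [];
$values = [];
foreach ($data as $row) {
    $placeholders[] = '(?, ?)';
    $values[] = $row[0]; // e.g. the name column of the csv
    $values[] = $row[1]; // e.g. the email column of the csv
}
$sql = 'INSERT INTO users (name, email) VALUES ' . implode(', ', $placeholders);
$pdo->prepare($sql)->execute($values);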

Here is another way, using array_chunk and the file function.

$csvArray = file($filepath); // this reads our csv file into an array of lines
// chunking the array by 1000 records
$chunks = array_chunk($csvArray, 1000);

// then let's store the chunked data as separate files somewhere
foreach ($chunks as $key => $chunk) {
   // give each chunk its own file name so the previous one is not overwritten
   file_put_contents($path . "/chunk_" . $key . ".csv", $chunk);
}

// get the files we have stored and loop through them
$files = glob('path/path' . "/*.csv");

foreach ($files as $key => $file) {
   $filer = fopen($file, "r");
   while (($csv = fgetcsv($filer, 1000, ",")) !== false) {
      // do the operations here
   }
   fclose($filer); // close the chunk file before deleting it

   // delete the file afterwards
   unlink($file);
}

Please don't forget to close the files with fclose() once you have finished the file operations.

Looping content

One more thing to keep in mind is that we have to take care of the code we put inside loops. If there are any

  • Database calls or
  • Third party API calls,

they will surely slow down performance and consume even more memory. So if you can move these calls outside of the loops, our program will be much more efficient.
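
As a small illustration, the sketch below loads a lookup table once before the loop instead of querying it for every row; the categories table, the column layout and the $pdo connection are assumptions I made up for this example.

// instead of running a query inside the loop for every row,
// load the lookup data once before the loop starts
$categories = $pdo->query('SELECT name, id FROM categories')
                  ->fetchAll(PDO::FETCH_KEY_PAIR); // name => id map

$handle = fopen($filepath, "r");
while (($csv = fgetcsv($handle, 1000, ",")) !== false) {
    // resolve the category from the in-memory map, no query per row
    $categoryId = $categories[$csv[2]] ?? null;
    // collect the rows and insert them in chunks as shown above
}
fclose($handle);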

I am sure there are also other workarounds, or packages that handle this issue behind the scenes.

Yuuma


