{"id":11511,"date":"2022-01-31T10:00:00","date_gmt":"2022-01-31T01:00:00","guid":{"rendered":"https:\/\/www.gigas-jp.com\/appnews\/?p=11511"},"modified":"2022-01-28T19:55:43","modified_gmt":"2022-01-28T10:55:43","slug":"working-with-large-csv-file-memory-efficiently-in-php","status":"publish","type":"post","link":"https:\/\/www.gigas-jp.com\/appnews\/archives\/11511","title":{"rendered":"Working with large csv file memory efficiently in PHP"},"content":{"rendered":"\n<p>When we are supposed to work with large csv file with 500K or 1mil records, there is always a thing to be careful about the memory usage. If our program consumes lot of memory , its not good for the physical server we are using. Beside from that our program should also performed well. Today I would like to share some tips working with laravel csv file.<\/p>\n\n\n\n<h3>Configuration<\/h3>\n\n\n\n<p>Firstly , we have to make sure our PHP setting is configured. Please check the below settings and configure as you need. But keep in mind, don&#8217;t enlarge the memory_limit value if it&#8217;s not required.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>- memory_limit\n- post_max_size\n- upload_max_filesize\n- max_execution_time<\/code><\/pre>\n\n\n\n<h3>Code<\/h3>\n\n\n\n<p>Once our PHP configuration is done, you might to restart the server or PHP itself. The next step is our code. We have to write a proper way not to run out of memory.<\/p>\n\n\n\n<p>Normally we used to read the csv files like this.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$handle = fopen($filepath, \"r\"); \/\/getting our csv file\nwhile ($csv = fgetcsv($handle, 1000, \",\")) { \/\/looping through each records\n\/\/making csv rows validation\n\/\/ inserting to database\n\/\/ etc.\n}<\/code><\/pre>\n\n\n\n<p>The above code might be ok for a few records like 1000 to 5000 and so on. But if you are working with 100K 500K records , the while loop will consume lot of memory. So we have to chunk and separate the loop to get some rest time for our program.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$handle = fopen($filepath, \"r\"); \/\/getting our csv file\n$data = &#091;]; \nwhile ($csv = fgetcsv($handle, 1000, \",\")) { \/\/looping through each records\n   $data&#091;] = $csv;\/\/ you can customize the array as you want\n   \/\/we will only collect each 1000 records and do the operations\n   if(count($data) &gt;= 1000){\n\u3000 \/\/ do the operations here\n   \/\/ inserting to database (If you already prepared the array in above, can directly add to db, no need loops)\n   \/\/ etc.\n   \n   \/\/resetting the data array\n   $data = &#091;];\n   }\n\n   \/\/if there is any rows less than 1000, keep going for it\n   if(count($data) &gt; 0){\n      \/\/ do the operations here\n   }\n}<\/code><\/pre>\n\n\n\n<p>Above one is a simple protype to run the program not to run out of the memory, our program will get rest time for each 1000records. 
<p>Here is another way, using the <strong>array_chunk</strong> and <strong>file</strong> functions. Keep in mind that file() loads the whole file into memory as an array of lines, so this approach only suits files that still fit within memory_limit.</p>

<pre class="wp-block-code"><code>$csvArray = file($filepath); // read our csv file into an array of lines
// chunk the array into groups of 1000 records
$chunks = array_chunk($csvArray, 1000);

// store each chunked batch as its own file somewhere
foreach ($chunks as $key => $chunk) {
    // file() keeps the newlines, so the lines can be written back as-is
    file_put_contents($path . "/chunk_" . $key . ".csv", $chunk);
}

// pick the stored files back up and loop through them one by one
$files = glob($path . "/*.csv");

foreach ($files as $file) {
    $filer = fopen($file, "r");
    while (($csv = fgetcsv($filer, 1000, ",")) !== false) {
        // do the operations here
    }
    fclose($filer);

    // delete the chunk file once we are done with it
    unlink($file);
}</code></pre>

<p>Please don't forget to close files with <code>fclose</code> once you have finished the file operations.</p>

<h3>Looping content</h3>

<p>One more thing to keep in mind: take care with the code we put inside loops. If there are any</p>

<ul><li>database calls or</li><li>third-party API calls,</li></ul>

<p>they will surely slow down performance and consume even more memory. If you can move these calls outside of the loops, for example by batching database inserts as sketched below, our program will be much more efficient.</p>
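<p>To make that concrete, here is a minimal sketch of batching inserts so the database is called once per 1000 rows instead of once per row. The PDO connection $pdo, the users table, and its name and email columns are assumptions for illustration; adapt them to your own schema.</p>

<pre class="wp-block-code"><code>// One multi-row INSERT per batch instead of one INSERT per row:
// far fewer round trips to the database.
// $pdo and the `users` table are hypothetical, not from the original post.
function insertBatch(PDO $pdo, array $rows): void
{
    if ($rows === []) {
        return;
    }
    // build "(?, ?), (?, ?), ..." with one placeholder group per row
    $placeholders = implode(', ', array_fill(0, count($rows), '(?, ?)'));
    $stmt = $pdo->prepare("INSERT INTO users (name, email) VALUES {$placeholders}");
    // flatten [[name, email], ...] into one flat parameter list
    $stmt->execute(array_merge(...$rows));
}

$handle = fopen($filepath, "r");
$data = [];
while (($csv = fgetcsv($handle, 1000, ",")) !== false) {
    $data[] = [$csv[0], $csv[1]]; // pick out the columns you need
    if (count($data) >= 1000) {
        insertBatch($pdo, $data); // one database call per batch
        $data = [];
    }
}
insertBatch($pdo, $data); // leftover rows
fclose($handle);</code></pre>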
<p>I am sure there are also some other workarounds, or some packages that handle this issue behind the scenes.</p>

<p>Yuuma</p>