Nov 06 2012

Generating a List of Affected Files from Git Patch Files with PHP

Category: PHP, Programming
ksg91 @ 8:44 pm

No doubt, Git is one of the best tools you can use for managing your code. But sometimes you put yourself in such a bad situation that you start cursing yourself for using it. I am working in a team on a fairly large project, and I had to switch to another branch because the code base I had differed greatly from the one in Git, yet for various reasons I had to keep working with that source. Since my base code was not the same as Git’s master, I committed it and started working on top of that. I could not simply merge, because there were 20k+ files in total and it was impractical to go through them to resolve the conflicts.

I then thought of applying patches for all the later commits, but that gave me lots of conflicts too. The files I had worked on were not touched by anyone else, so the simple thing to do was to take all the files I had edited and use them as they were. I am no Git expert, and I was too lazy to find out how to do that properly. There may be several nice ways to deal with this situation using Git alone, but as I said, I am no Git expert, and that is not what this post is about.

So I created a simple PHP script that reads the patches and identifies the files affected by those commits. I thought I would share the code, hence this post.

As explained above, I had already created a numbered series of patches, so I could read the affected files from them. Being short on time, I wrote some quick code, which could certainly be improved in efficiency and accuracy.

Here is the script I wrote:

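<?php
$re = '~[\w./-]+\.(?:php|js|css)\b~'; // assumed pattern (original lost): paths ending in .php, .js or .css
$result = array();
for ($i = 1; $i < 89; $i++) {
    $files = glob('patch/' . sprintf('%04d', $i) . '*'); // patch/0001..., patch/0002..., and so on
    if (!$files) continue;
    $entry = $files[0];
    // Only the text before the first "diff --" lists the changed files.
    $content = explode('diff --', file_get_contents($entry));
    $content = $content[0];
    preg_match_all($re, $content, $out, PREG_PATTERN_ORDER);
    foreach ($out[0] as $val) $result[$val] = true; // file name as key keeps the list unique
}
foreach ($result as $key => $val) {
    echo $key . "<br />";
}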



The simple thing to do was to read each patch and look for file names. I looked for three extensions: .php, .js and .css. I put everything in one directory, with the patches in the subdirectory patch/
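To make the file-name search concrete, here is a minimal sketch of how such a pattern behaves on the kind of header a patch carries before its first diff. The sample header, its paths and the exact pattern are assumptions made for illustration; they are not taken from my real patches.

```php
<?php
// Sample header in the shape a patch has before its first "diff --" line.
// All paths here are made up for illustration.
$header = <<<'EOT'
Subject: [PATCH 3/88] Fix cart totals

---
 app/cart/Total.php      | 12 ++++++------
 public/js/cart.js       |  4 ++--
 public/css/cart.css     |  2 +-
 docs/notes.txt          |  1 +
 4 files changed, 10 insertions(+), 9 deletions(-)

EOT;

// Assumed pattern: a run of path characters ending in .php, .js or .css.
$re = '~[\w./-]+\.(?:php|js|css)\b~';

preg_match_all($re, $header, $out, PREG_PATTERN_ORDER);
$files = $out[0]; // with PREG_PATTERN_ORDER, index 0 holds the full matches

print_r($files);
```

The .txt file is skipped because its extension is not in the list. A pattern this loose will also pick up file names mentioned in a commit message, which is one of the accuracy problems I mentioned.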

Each patch file name began with a serial number, starting from 0001 and running up to the number of files. I simply generated the names myself, with the help of glob(). I am sure you will ask why I didn’t simply read the directory. Actually, my initial code caused the script to exceed the timeout, and I was not sure whether the cause was the number of files or something else, so I wanted to control how many files were processed. That is why the code looks like this.
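The name lookup could be sketched roughly like this. The helper name findPatch() is hypothetical, and I am assuming names of the form 0001-some-subject.patch, which is what git format-patch produces; adjust the glob pattern if your patches are named differently.

```php
<?php
// Hypothetical helper: return the path of the patch whose name starts with
// the zero-padded serial number, or null when no such file exists.
function findPatch(string $dir, int $serial): ?string
{
    $prefix = sprintf('%04d', $serial);                  // 1 becomes "0001"
    $matches = glob($dir . '/' . $prefix . '-*.patch');  // assumed naming scheme
    // glob() gives an empty array (or false on some systems) when nothing matches.
    return $matches ? $matches[0] : null;
}
```

Calling findPatch('patch', $i) inside the for loop lets you raise or lower the upper bound to control how many patches are processed in one run, which is what helped me narrow down the timeout.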

The reason for the long execution time was preg_match_all running over very long patch files (a couple of them were 20 MB!). We don’t need to search the changes themselves for file names; only the portion containing the list of files matters. So I split the contents on "diff --" and searched only the first element of the resulting array.
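My script still loads the whole file with file_get_contents() before splitting it. A variation, which is not what I used, would be to read line by line and stop at the first line that starts with "diff --", so a 20 MB patch is never held in memory. The helper name readPatchHeader() is hypothetical, and note that it only stops on a line that begins with the marker, whereas explode() splits wherever the marker appears.

```php
<?php
// Hypothetical helper: return the text of a patch file up to (not including)
// the first line that starts with "diff --", reading one line at a time.
function readPatchHeader(string $path): string
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return '';                       // unreadable file: nothing to search
    }
    $header = '';
    while (($line = fgets($handle)) !== false) {
        if (strncmp($line, 'diff --', 7) === 0) {
            break;                       // the file list is over, stop reading
        }
        $header .= $line;
    }
    fclose($handle);
    return $header;
}
```

In the loop, $content = readPatchHeader($entry); would take the place of the explode() call and the line after it.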

To keep the list unique, I used the file name as the array index, so no memory is wasted on duplicates. I believe the rest makes sense without further explanation.
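Here is the key trick in isolation, as a small sketch with made-up file names. Assigning to a key that already exists just overwrites it, so each name ends up in the array once no matter how many patches mention it.

```php
<?php
// Made-up matches, as if collected from several patches.
$matches = array('a.php', 'b.js', 'a.php', 'c.css', 'b.js');

$result = array();
foreach ($matches as $name) {
    $result[$name] = true;   // an existing key is overwritten, not duplicated
}

$unique = array_keys($result);
sort($unique);               // optional: alphabetical order for easier reading

print_r($unique);
```

Calling array_unique() on a plain list would give the same set, but then every duplicate is stored first and removed afterwards.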

The above code was enough, and efficient enough, for my case. Let me know what you think about this.
