Generating a List of Affected Files from Git Patch Files with PHP

No doubt, Git is one of the best tools you can use for managing your code. But sometimes you put yourself in such a bad situation that you start cursing yourself for using it. I am working on a fairly large project in a team. I had to switch to another branch because the code base I had differed greatly from the one on Git, but I had to work with that source for various reasons. Since my base code was not the same as Git's master, I made a commit and started working on top of it. I could not simply merge, because there were 20k+ files in total and it was impractical to go through them to resolve the conflicts.

I then thought of applying patches for all the later commits, but that gave me lots of conflicts too. The files I worked on had not been touched by anyone else, so the simplest thing was to take all the files I had edited and use them. I am no Git expert and was too lazy to find the proper way to do that. There may be several nice ways to handle this with Git alone, but as I said, I am no Git expert, and that is not what this post is about.

So I wrote a simple PHP script that reads the patches and identifies the files affected by those commits. I thought I would share the code, hence this post.

As I explained, I had already created a numbered series of patches, so I could read the affected files from them. Being short on time, I wrote some quick code, which could certainly be improved in efficiency and accuracy.

Here is the script I wrote:


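<?php
// Collect the names of files touched by the patches in patch/ (0001... to 0088...).
$result = array(); // file name => true, so each name is stored only once
for ($i = 1; $i < 89; $i++) {
    // Patch file names start with a zero-padded serial number, e.g. 0001...
    $fname = glob('patch/' . substr("0000" . $i, -4) . "*");
    if (empty($fname)) {
        continue; // no patch with this number
    }
    $entry = $fname[0];

    // Only the part before the first "diff --" lists the changed files.
    $content = explode('diff --', file_get_contents($entry));
    $content = $content[0];
    $re = "/([a-zA-Z0-9_]+)(\.php|\.js|\.css)/";
    preg_match_all($re, $content, $out, PREG_PATTERN_ORDER);
    $out[0] = array_unique($out[0]);
    foreach ($out[0] as $val) {
        $result[$val] = true;
    }
}
foreach ($result as $key => $val) {
    echo $key . "<br />";
}

?>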

Explanation

The simple approach was to read each patch and look for file names. I looked for three extensions: .php, .js and .css. I put the script in a directory and the patches in its patch/ subdirectory.
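To see what the pattern actually picks up, here is a small sketch run against a made-up patch header. The sample text and the second pattern ($pathRe) are my own assumptions for illustration, not part of the script above. It shows two things worth knowing: the original pattern returns base names only, and it also matches the ".js" inside ".json".

```php
<?php
// Hypothetical sample of a patch header (the part before the first "diff --").
$header = <<<'TXT'
Subject: [PATCH] Fix login form

---
 app/controllers/login.php | 4 ++--
 assets/js/site.js         | 2 +-
 assets/data/config.json   | 1 +
 3 files changed, 4 insertions(+), 3 deletions(-)
TXT;

// Pattern from the script: base names only, and ".js" also matches inside ".json".
$basenameRe = '/([a-zA-Z0-9_]+)(\.php|\.js|\.css)/';
preg_match_all($basenameRe, $header, $base, PREG_PATTERN_ORDER);

// Assumed variant: keep the directory part and require the extension to end the name.
$pathRe = '~[\w./-]+\.(?:php|js|css)\b~';
preg_match_all($pathRe, $header, $paths, PREG_PATTERN_ORDER);

print_r($base[0]);  // base names, including a false "config.js"
print_r($paths[0]); // full relative paths, without the .json entry
```

One caveat with the variant: when a path is very long, Git may shorten it in this file list with a leading ".../", so the captured path would be shortened too.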

The patch file names began with a serial number, from 0001 up to the number of patches. I built each name prefix myself and passed it to glob(). You may ask why I did not simply read the directory. My initial code made the script exceed the timeout, and I was not sure whether the cause was the number of files or something else, so I wanted control over how many files were processed. That is why the code looks like this.
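As a side note, the zero-padded prefix can also be built with sprintf(), and the file count can still be capped when the directory is listed in one go. This is only a sketch: patchPrefix() is a hypothetical helper name, and the cap of 88 simply mirrors the loop bound in the script.

```php
<?php
// patchPrefix() is a hypothetical helper name used only for this illustration.
function patchPrefix(int $i): string
{
    return sprintf('%04d', $i); // same result as substr('0000' . $i, -4) for 1..9999
}

foreach ([1, 42, 88] as $i) {
    echo patchPrefix($i), ' ', substr('0000' . $i, -4), PHP_EOL;
}

// Assumed alternative: list every numbered patch once, then cap the count.
// glob() returns names sorted alphabetically unless GLOB_NOSORT is passed.
$files = glob('patch/[0-9][0-9][0-9][0-9]*') ?: [];
$files = array_slice($files, 0, 88);
echo count($files), " patch file(s) selected", PHP_EOL;
```

Capping with array_slice() gives the same control over the number of files as the counted loop, with a single glob() call instead of one per patch.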

The reason for the long execution time was running preg_match_all on very long patch files (a couple of them were 20 MB!). We do not need to search for file names in the changes themselves; only the portion containing the list of files matters. So I split the contents on diff -- with explode() and searched only the first element of the resulting array.
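The script still loads each whole file with file_get_contents() before splitting it. If memory becomes a concern with the 20 MB patches, one possible variation is to read line by line and stop at the first diff line. This is a sketch under my own assumptions: readPatchHeader() is a hypothetical helper, and unlike explode() it only stops where a line starts with "diff --".

```php
<?php
// Return the text of a patch file up to (not including) the first line
// that starts with "diff --". Hypothetical helper, not in the original script.
function readPatchHeader(string $path): string
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return ''; // unreadable file: treat it as having no header
    }
    $header = '';
    while (($line = fgets($handle)) !== false) {
        if (strpos($line, 'diff --') === 0) {
            break; // the file list ends where the first diff starts
        }
        $header .= $line;
    }
    fclose($handle);
    return $header;
}

// Usage with a small temporary file standing in for a real patch.
$tmp = tempnam(sys_get_temp_dir(), 'patch');
file_put_contents($tmp, "Subject: x\n---\n a.php | 2 +-\n\ndiff --git a/a.php b/a.php\n+mentions b.css\n");
$headerOnly = readPatchHeader($tmp);
unlink($tmp);
echo $headerOnly;
```

The result can be passed to preg_match_all() in place of $content[0]; the b.css mentioned inside the diff body is never read.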

To keep the list unique, I used the file name as the array index, so no memory is wasted on duplicates. I believe the rest of the code makes sense without further explanation.

The above code was sufficient and fast enough in my case. Let me know what you think about this.
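The key trick in isolation looks like this. The file names below are invented for the example; the point is only that assigning the same key twice leaves a single entry.

```php
<?php
// Hypothetical file names collected from two patches, with one overlap.
$perPatch = [
    ['login.php', 'site.js'],
    ['site.js', 'style.css'],
];

$seen = []; // file name => true
foreach ($perPatch as $names) {
    foreach ($names as $name) {
        $seen[$name] = true; // assigning an existing key again adds nothing
    }
}

$unique = array_keys($seen);
sort($unique);
echo implode(PHP_EOL, $unique), PHP_EOL;
```

Because the keys themselves form the set, no separate de-duplication pass over the combined list is needed at the end.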

Published by ksg91
