Nov 06 2012

Generating a List of Affected Files from Git Patch Files with PHP

Category: PHP, Programming | ksg91 @ 8:44 pm

No doubt, Git is one of the best tools you can use for managing your code. But sometimes you put yourself in such a bad situation that you start cursing yourself for using Git. I am working on a fairly large project in a team, and I had to switch to another branch because the code base I had differed greatly from the one on Git, yet I had to work with that source for some reasons. Since my base code was not the same as Git's master, I made a commit and started working on top of that. I could not simply merge, because there were 20k+ files of code in total and it was impractical to go through most of them to resolve the conflicts. I then thought of applying patches for all the later commits, but that gave me lots of conflicts too. The files I worked on had not been touched by others, so the simple thing was to take all the files I had edited and use them. I am no Git expert, and I was too lazy to find the proper way to do so. There may be several nice ways to deal with this situation using Git alone, but as I said, I am no Git expert, and that is not what this post is about.

So I created a simple PHP script that reads the patches and identifies the files that were affected by those commits. I thought I would share the code, hence this post.

As I explained already, I had created a series of patches, so I could read the affected files from them. Being quite short on time, I wrote some quick code, which could be improved in both efficiency and accuracy.

Here is the script I wrote:

[php]

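<?php
// Collect the unique file names mentioned in a series of numbered patch files.
$result = array();
for ($i = 1; $i < 89; $i++) {
    // Patch names start with a zero-padded serial number: 0001, 0002, ...
    $fname = glob('patch/' . substr("0000" . $i, -4) . "*");
    if (empty($fname)) {
        continue; // no patch with this serial number
    }
    $entry = $fname[0];

    // Keep only the part before the first "diff --"; it holds the file list.
    $content = explode('diff --', file_get_contents($entry));
    $content = $content[0];
    $re = "/([a-zA-Z0-9_]+)(\.php|\.js|\.css)/";
    preg_match_all($re, $content, $out, PREG_PATTERN_ORDER);
    $out[0] = array_unique($out[0]);
    // Using the file name as the array key keeps the overall list unique.
    foreach ($out[0] as $val) {
        $result[$val] = true;
    }
}
foreach ($result as $key => $val) {
    echo $key . "<br />";
}

?>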

[/php]

Explanation

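The simple thing to do was to read each patch and look for file names. I looked for three extensions: .php, .js and .css. The pattern matches only the base name (letters, digits and underscores), so directory paths are not included. I put the script in a directory and the patches in its subdirectory patch/.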

The patch file names began with a serial number, from 0001 up to the number of files, so I simply generated each name prefix myself and let glob() find the file. I am sure you will ask why I did not simply read the directory. Actually, my initial code caused the script to exceed the timeout, and I was not sure whether it was the number of files or something else, so I decided to control the number of files processed. That is why the code looks like this.
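As a rough sketch of the naming trick, the serial prefix can also be built with sprintf(), which gives the same result as the substr() approach for numbers up to 9999. The helper names patchPrefix() and findPatch() are made up for this illustration and are not part of my script.

```php
<?php
// Hypothetical helper: zero-padded prefix for a patch serial number.
function patchPrefix(int $n): string
{
    // sprintf('%04d', 7) gives "0007", like substr("0000" . 7, -4).
    return sprintf('%04d', $n);
}

// Hypothetical helper: first file in $dir whose name starts with the
// serial prefix, or null when nothing matches.
function findPatch(string $dir, int $n): ?string
{
    $matches = glob($dir . '/' . patchPrefix($n) . '*');
    return $matches ? $matches[0] : null;
}

echo patchPrefix(7), "\n";
```

Returning null for a missing serial number lets the calling loop skip gaps instead of failing on an empty glob() result.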

The reason for the long execution time was running preg_match_all on very long patch files (a couple of them were 20 MB!). We do not need to search for file names inside the changes themselves; only the portion containing the list of files matters. So I split the contents on "diff --" with explode() and searched only the first array element.
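To make the idea concrete, here is a minimal sketch run against a tiny made-up patch. The function name filesInPatchHeader() and the sample patch text are assumptions for illustration only; the pattern is the same one my script uses.

```php
<?php
// Hypothetical helper: file names mentioned before the first "diff --".
function filesInPatchHeader(string $patch): array
{
    // Limit 2 stops splitting after the header, so the body is never scanned.
    $header = explode('diff --', $patch, 2)[0];
    preg_match_all('/[a-zA-Z0-9_]+(?:\.php|\.js|\.css)/', $header, $m);
    return array_values(array_unique($m[0]));
}

// Made-up patch text, shaped like the patches described in this post.
$sample = <<<'PATCH'
From 1234 Mon Sep 17 00:00:00 2001
Subject: [PATCH] Fix login

---
 app/login.php    | 4 ++--
 assets/site.css  | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/app/login.php b/app/login.php
--- a/app/login.php
+++ b/app/login.php
@@ -1 +1 @@
-include 'old_helper.php';
+include 'new_helper.php';
PATCH;

print_r(filesInPatchHeader($sample));
```

The helper names inside the diff body (old_helper.php, new_helper.php) are not reported, because only the header is searched.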

To keep the list unique, I used the file name as the array index, so no memory is wasted on duplicates. I believe the rest makes sense without any explanation.

The above code was sufficient and efficient enough in my case. Let me know what you think about this.
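A small sketch of the key-as-index trick, using made-up file names rather than output from real patches:

```php
<?php
// Made-up names, as they might come out of several patches.
$found = ['login.php', 'site.css', 'login.php', 'menu.js', 'site.css'];

$seen = [];
foreach ($found as $name) {
    // Assigning to a key that already exists just overwrites it.
    $seen[$name] = true;
}

// The keys are the unique names, in order of first appearance.
$unique = array_keys($seen);
print_r($unique);
```

Because array keys cannot repeat, duplicates never accumulate across patches and no separate de-duplication pass is needed at the end.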
