Correcting Rotated Text Images with OpenCV: Principle and Code

Original article; last modified 2022-05-06 09:45:07

This content is licensed under CC 4.0 BY-SA.

First published 2016-06-23 13:11:55

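The key to correcting a rotated image is knowing the rotation angle. Once the angle is known, the image can be corrected with an affine transform. For code that rotates an image, see my other blog post:
https://blog.csdn.net/wenhao_ir/article/details/51469085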

How do we obtain the rotation angle? It can be derived from the Fourier transform of the image. The rough principle is as follows.

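A distinctive feature of a text image is the regular spacing between lines of text. The gray level changes less sharply across the gaps between lines than it does within and between the characters themselves, so the line structure corresponds to the low-frequency part of the spectrum; after all, the Fourier transform describes how quickly the intensity varies across the image. Because the lines repeat along one direction, this low-frequency energy concentrates along a straight line through the center of the spectrum, and that line rotates together with the text.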
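
To make this concrete, here is a minimal sketch in plain C++ (no OpenCV) that computes a naive 2-D DFT of a tiny striped image. The names dftMagnitude and makeHorizontalStripes are hypothetical helpers introduced only for this illustration, and the horizontal stripes are a crude stand-in for lines of text.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Image = std::vector<std::vector<double>>;

// Naive O(N^4) 2-D DFT magnitude; only suitable for tiny demo images.
// mag[u][v] is |F(u, v)|, where u is the vertical and v the horizontal frequency index.
Image dftMagnitude(const Image& img) {
    const std::size_t rows = img.size(), cols = img[0].size();
    const double twoPi = 2.0 * std::acos(-1.0);
    Image mag(rows, std::vector<double>(cols, 0.0));
    for (std::size_t u = 0; u < rows; ++u)
        for (std::size_t v = 0; v < cols; ++v) {
            std::complex<double> sum(0.0, 0.0);
            for (std::size_t y = 0; y < rows; ++y)
                for (std::size_t x = 0; x < cols; ++x) {
                    const double phase =
                        -twoPi * (double(u * y) / rows + double(v * x) / cols);
                    sum += img[y][x] * std::complex<double>(std::cos(phase), std::sin(phase));
                }
            mag[u][v] = std::abs(sum);
        }
    return mag;
}

// Hypothetical test pattern: rows alternate dark (0) and bright (1), like lines of text.
Image makeHorizontalStripes(std::size_t rows, std::size_t cols) {
    Image img(rows, std::vector<double>(cols, 0.0));
    for (std::size_t y = 0; y < rows; ++y)
        for (std::size_t x = 0; x < cols; ++x)
            img[y][x] = (y % 2 == 0) ? 0.0 : 1.0;
    return img;
}
```

For an 8x8 striped image, all the energy falls in the column v = 0, that is, on a vertical line through the zero-frequency term, perpendicular to the horizontal stripes. Rotating the stripes would rotate this line, which is the line the later steps try to detect.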

The specific steps are as follows:

⑴ Read the image and convert it to grayscale while loading. The code is as follows:

The image used in the code can be downloaded from: https://pan.baidu.com/s/1I3QM2Grl99J8KyqBsI_WLA?pwd=hhor

//Read the image and convert it to grayscale
const char* filename = "F:/material/images/P0040-rotating_text_correction-02.jpg";
Mat srcImg = imread(filename, IMREAD_GRAYSCALE);
if (srcImg.empty())
	return -1;
imshow("source", srcImg);

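⑵ Resize the image before computing the DFT. The DFT of a sequence is normally computed with the FFT algorithm, and the FFT is fastest when the image dimensions are products of powers of 2, 3 and 5. The source image should therefore first be padded to such an optimal DFT size, which OpenCV provides through the function getOptimalDFTSize(). The relevant code is as follows: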

//Expand image to an optimal size, for faster processing speed
//Set widths of borders in four directions
//If borderType==BORDER_CONSTANT, fill the borders with (0,0,0)
Mat padded;
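int opWidth = getOptimalDFTSize(srcImg.rows);   //padded number of rows (despite the name)
int opHeight = getOptimalDFTSize(srcImg.cols);  //padded number of columns (despite the name)
cout<<"Rows in the source image: "<<srcImg.rows<<endl;
cout<<"Columns in the source image: "<<srcImg.cols<<endl;
cout<<"Rows after adjusting for the DFT: "<<opWidth<<endl;
cout<<"Columns after adjusting for the DFT: "<<opHeight<<endl<<endl;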
copyMakeBorder(srcImg, padded, 0, opWidth - srcImg.rows, 0, opHeight - srcImg.cols, BORDER_CONSTANT, Scalar::all(0));

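The function copyMakeBorder() used above extends the borders of an image. Its arguments after the source and destination are the border widths at the top, bottom, left and right, followed by the border type and the fill value, so here the padding is added only at the bottom and on the right.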
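
As a rough illustration of what an "optimal DFT size" means, the sketch below searches for the smallest integer not less than n whose only prime factors are 2, 3 and 5. This is only meant to show the idea; it is not how OpenCV implements getOptimalDFTSize() internally, and the function names are hypothetical.

```cpp
#include <cassert>

// True if n has no prime factors other than 2, 3 and 5.
bool isFiveSmooth(int n) {
    assert(n > 0);
    const int primes[] = {2, 3, 5};
    for (int p : primes)
        while (n % p == 0) n /= p;
    return n == 1;
}

// Smallest integer >= n of the form 2^a * 3^b * 5^c (brute-force search).
int smallestFiveSmoothAtLeast(int n) {
    assert(n > 0);
    int candidate = n;
    while (!isFiveSmooth(candidate)) ++candidate;
    return candidate;
}
```

For example, a dimension of 481 pixels would be padded to 486 (2 x 3^5), while 512 would be left unchanged.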

The image sizes before and after the adjustment are printed by the cout statements above.

⑶ Compute the DFT of the image and shift the spectrum

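To better understand the code below, I recommend first reading these three posts of mine:

The ins and outs of the discrete Fourier transform (DFT) of a sequence

Points to note when computing a 2-D Fourier transform with the MATLAB function fft2()

The purpose of spectrum shifting in the 2-D discrete Fourier transform (MATLAB::fft2shift)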

The code is as follows:

Mat planes[] = { Mat_<float>(padded), Mat::zeros(padded.size(), CV_32F) };//Mat_<float>(padded) converts the data of the Mat object padded to float
	Mat comImg;
	//Merge into a double-channel image
	merge(planes, 2, comImg);

	//Use the same image as input and output,
	//so that the results can fit in Mat well
	dft(comImg, comImg);

	//Compute the magnitude
	//planes[0]=Re(DFT(I)), planes[1]=Im(DFT(I))
	//magnitude=sqrt(Re^2+Im^2)
	split(comImg, planes);
	magnitude(planes[0], planes[1], planes[0]);

	//Switch to logarithmic scale, for better visual results
	//M2=log(1+M1)
	Mat magMat = planes[0];
	magMat += Scalar::all(1);
	log(magMat, magMat);

	//Crop the spectrum
	//Width and height of magMat should be even, so that they can be divided by 2
	//-2 is 11111110 in binary system, operator & make sure width and height are always even
	magMat = magMat(Rect(0, 0, magMat.cols & -2, magMat.rows & -2));

	//Rearrange the quadrants of Fourier image,
	//so that the origin is at the center of image,
	//and move the high frequency to the corners
	int cx = magMat.cols / 2;
	int cy = magMat.rows / 2;

	Mat q0(magMat, Rect(0, 0, cx, cy));
	Mat q1(magMat, Rect(0, cy, cx, cy));
	Mat q2(magMat, Rect(cx, cy, cx, cy));
	Mat q3(magMat, Rect(cx, 0, cx, cy));

	Mat tmp;
	q0.copyTo(tmp);
	q2.copyTo(q0);
	tmp.copyTo(q2);

	q1.copyTo(tmp);
	q3.copyTo(q1);
	tmp.copyTo(q3);

	//Normalize the magnitude to [0,1], then to[0,255]
	normalize(magMat, magMat, 0, 1, NORM_MINMAX);
	Mat magImg(magMat.size(), CV_8UC1);
	magMat.convertTo(magImg, CV_8UC1, 255, 0);
	imshow("magnitude", magImg);

Running this code displays the centered magnitude spectrum in the "magnitude" window.
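
The quadrant rearrangement in the code above can be sketched without OpenCV as a swap of diagonally opposite quadrants. The sketch assumes a non-empty grid with an even number of rows and columns (the original code crops the spectrum to even dimensions for the same reason); the names Grid, shiftQuadrants and shifted are hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<float>>;

// Swap diagonally opposite quadrants in place so that the zero-frequency
// term moves from the top-left corner to the center.
// Assumes rows and cols are even and non-zero.
void shiftQuadrants(Grid& g) {
    const std::size_t cy = g.size() / 2, cx = g[0].size() / 2;
    for (std::size_t y = 0; y < cy; ++y)
        for (std::size_t x = 0; x < cx; ++x) {
            std::swap(g[y][x], g[y + cy][x + cx]);  // top-left <-> bottom-right
            std::swap(g[y][x + cx], g[y + cy][x]);  // top-right <-> bottom-left
        }
}

// Convenience wrapper that returns a shifted copy.
Grid shifted(Grid g) {
    shiftQuadrants(g);
    return g;
}
```

On a 4x4 grid, the value stored at index [0][0] (the zero-frequency term) ends up at index [2][2], the center of the shifted spectrum.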

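⑷ Skew angle detection. After the spectrum has been centered, the magnitude image shows that we only need to detect the inclination of the slanted straight line in it to correct the rotated text. There are several ways to compute the inclination of a line; here we use Hough transform line detection. First, binarize the spectrum image with a fixed threshold. The right threshold depends heavily on the scene and has to be tuned for the actual application. Then detect the lines in the image with the Hough transform. Finally, compute the angle of the detected line and use it to correct the source image with an affine transform.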
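
The voting idea behind Hough line detection can be sketched in a few lines of plain C++. This is a simplified illustration with 1-pixel and 1-degree resolution, not OpenCV's HoughLines() implementation; strongestLine, segmentPixels and HoughPeak are hypothetical names. As in HoughLines(), a line is described by ρ = x·cos θ + y·sin θ, where θ is the angle of the line's normal.

```cpp
#include <cmath>
#include <utility>
#include <vector>

struct HoughPeak { int rho; int thetaDeg; int votes; };

// Let every white pixel (x, y) vote for all (rho, theta) pairs it could lie on,
// then return the accumulator cell with the most votes.
HoughPeak strongestLine(const std::vector<std::pair<int, int>>& points, int maxRho) {
    const double degToRad = std::acos(-1.0) / 180.0;
    const int rhoBins = 2 * maxRho + 1;  // rho may be negative
    std::vector<std::vector<int>> acc(180, std::vector<int>(rhoBins, 0));
    for (const auto& p : points)
        for (int t = 0; t < 180; ++t) {
            const double rho = p.first * std::cos(t * degToRad) +
                               p.second * std::sin(t * degToRad);
            const int bin = static_cast<int>(std::lround(rho)) + maxRho;
            if (bin >= 0 && bin < rhoBins) ++acc[t][bin];
        }
    HoughPeak best{0, 0, 0};
    for (int t = 0; t < 180; ++t)
        for (int r = 0; r < rhoBins; ++r)
            if (acc[t][r] > best.votes) best = HoughPeak{r - maxRho, t, acc[t][r]};
    return best;
}

// Hypothetical helper: count pixels starting at (x0, y0) and stepping by (dx, dy).
std::vector<std::pair<int, int>> segmentPixels(int x0, int y0, int dx, int dy, int count) {
    std::vector<std::pair<int, int>> pts;
    for (int i = 0; i < count; ++i) pts.emplace_back(x0 + i * dx, y0 + i * dy);
    return pts;
}
```

A vertical line at x = 5 peaks at θ = 0° with ρ = 5, and a horizontal line at y = 3 peaks at θ = 90° with ρ = 3. The vote threshold passed to HoughLines() plays the role of a minimum value for the votes field here.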

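The skew detection code is as follows (GRAY_THRESH and HOUGH_VOTE are macros defined in the complete listing at the end):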

	//Turn into binary image
	threshold(magImg, magImg, GRAY_THRESH, 255, THRESH_BINARY);
	imshow("mag_binary", magImg);
	//imwrite("imageText_bin.jpg",magImg);

	//Find lines with Hough Transformation
	vector<Vec2f> lines;
	float pi180 = (float)CV_PI / 180;
	Mat linImg = Mat::zeros(magImg.size(), CV_8UC3); //black canvas for drawing the detected lines
	HoughLines(magImg, lines, 1, pi180, HOUGH_VOTE, 0, 0);
	int numLines = lines.size();
	int L = 1000;
	for (int l = 0; l<numLines; l++)
	{
		float rho = lines[l][0], theta = lines[l][1];
		Point pt1, pt2;
		double a = cos(theta), b = sin(theta);
		double x0 = a*rho, y0 = b*rho;
		pt1.x = cvRound(x0 + L * (-b));
		pt1.y = cvRound(y0 + L * (a));
		pt2.x = cvRound(x0 - L * (-b));
		pt2.y = cvRound(y0 - L * (a));
		line(linImg, pt1, pt2, Scalar(255, 0, 0), 3, 8, 0);
	}
	imshow("lines", linImg);
	//imwrite("imageText_line.jpg",linImg);
	if (lines.size() == 3){
		cout << "found three angles:" << endl;
		cout << lines[0][1] * 180 / CV_PI << endl << lines[1][1] * 180 / CV_PI << endl << lines[2][1] * 180 / CV_PI << endl << endl;
	}

	//Find the proper angle among the detected angles
	float angel = 0;
	float piThresh = (float)CV_PI / 90;
	float pi2 = CV_PI / 2;
	for (int l = 0; l<numLines; l++)
	{
		float theta = lines[l][1];
		if (abs(theta) < piThresh || abs(theta - pi2) < piThresh)
			continue;
		else{
			angel = theta;
			break;
		}
	}

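Running this code displays the binarized spectrum in the "mag_binary" window and the detected lines in the "lines" window.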

Notes on the code above:

① Why are pt1.x, pt1.y, pt2.x and pt2.y given by the expressions below?

pt1.x = cvRound(x0 + L * (-b));
pt1.y = cvRound(y0 + L * (a));
pt2.x = cvRound(x0 - L * (-b));
pt2.y = cvRound(y0 - L * (a));

Please refer to my other blog post: How to draw a line from the ρ and θ values returned by OpenCV's Hough line detection function HoughLines(), explained in detail
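
In short, (ρ·cos θ, ρ·sin θ) is the point on the line closest to the origin, and (-sin θ, cos θ) is a unit vector along the line, so stepping L pixels in both directions from that point gives two endpoints. A minimal sketch without OpenCV follows; lineEndpoints, PixelPoint and Segment are hypothetical names, and std::lround stands in for cvRound.

```cpp
#include <cmath>

struct PixelPoint { long x; long y; };
struct Segment { PixelPoint p1; PixelPoint p2; };

// Convert a line given as (rho, theta) into two endpoints that lie
// halfLength pixels on either side of the point closest to the origin.
Segment lineEndpoints(double rho, double theta, double halfLength) {
    const double a = std::cos(theta), b = std::sin(theta);
    const double x0 = a * rho, y0 = b * rho;  // foot of the perpendicular from the origin
    // (-b, a) is a unit vector along the line.
    return Segment{
        {std::lround(x0 + halfLength * (-b)), std::lround(y0 + halfLength * a)},
        {std::lround(x0 - halfLength * (-b)), std::lround(y0 - halfLength * a)}};
}
```

With θ = 0 and ρ = 5 this yields a vertical segment through x = 5, and with θ = π/2 and ρ = 3 a horizontal segment through y = 3.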

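② The code below discards horizontal and vertical lines, as well as lines that are close to horizontal or vertical (θ within π/90, i.e. 2°, of 0 or π/2).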

	//Find the proper angle among the detected angles
	float angel = 0;
	float piThresh = (float)CV_PI / 90;
	float pi2 = CV_PI / 2;
	for (int l = 0; l<numLines; l++)
	{
		float theta = lines[l][1];
		if (abs(theta) < piThresh || abs(theta - pi2) < piThresh)
			continue;
		else{
			angel = theta;
			break;
		}
	}
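
The same selection logic can be written as a small standalone function, sketched below. One detail is an assumption that goes beyond the original code: since θ can take values in [0, π), a nearly vertical line may also be reported with θ just below π, so this sketch rejects that case too. The name pickSkewTheta is hypothetical.

```cpp
#include <cmath>
#include <vector>

// Return the first theta (radians, expected in [0, pi)) that is not within
// tol of a vertical line (theta near 0 or pi) or a horizontal line (theta near pi/2).
// Returns 0 if every candidate is rejected, matching the default in the code above.
float pickSkewTheta(const std::vector<float>& thetas, float tol) {
    const float pi = std::acos(-1.0f);
    for (float theta : thetas) {
        // The check against pi is an added assumption, not part of the original code.
        const bool nearVertical = std::fabs(theta) < tol || std::fabs(theta - pi) < tol;
        const bool nearHorizontal = std::fabs(theta - pi / 2) < tol;
        if (!nearVertical && !nearHorizontal) return theta;
    }
    return 0.0f;
}
```

With a tolerance of about 0.035 rad (π/90), the candidates 0.01 and 1.57 are skipped and 0.5 is selected.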

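⑸ Correction by affine transform. Compute a rotation matrix from the detected line angle and apply an affine transform to correct the rotated text.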

The code is as follows:

	//Calculate the rotation angle
	//The image has to be square,
	//so that the rotation angle can be calculated correctly
	angel = angel<pi2 ? angel : angel - CV_PI;
	if (angel != pi2){
		float angelT = srcImg.rows*tan(angel) / srcImg.cols;
		angel = atan(angelT);
	}
	float angelD = angel * 180 / (float)CV_PI;
	cout << "the rotation angle to be applied:" << endl << angelD << endl << endl;

	//Rotate the image to recover
	//center is the image center, Point center(srcImg.cols / 2, srcImg.rows / 2), defined in the complete listing
	Mat rotMat = getRotationMatrix2D(center, angelD, 1.0);
	Mat dstImg;
	warpAffine(srcImg, dstImg, rotMat, srcImg.size(), INTER_LINEAR, BORDER_CONSTANT, Scalar(255, 255, 255));
	imshow("result", dstImg);

Notes on the code above:

① The following statement needs some explanation:

angel = angel<pi2 ? angel : angel - CV_PI;

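This statement determines the second argument of getRotationMatrix2D(), which is the rotation angle of the image in degrees, with positive values meaning counter-clockwise rotation. When the detected angle (the variable angel) is less than π/2, the rotation angle is angel itself. When it is greater than π/2 (the case where it equals π/2 was already excluded by the earlier code), the rotation angle is angel - π, which is what the code computes. Why is this so? I wrote a separate document on this question, which you can download and read here: https://download.csdn.net/download/wenhao_ir/85298269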

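② Because we extended the width and height of the image when computing the DFT, the angle of the detected line deviates from the true angle, and this deviation has to be removed. More generally, the horizontal and vertical frequency axes of the spectrum are scaled differently unless the image is square, which is why the code comment says the image has to be square. The code below compensates by rescaling the tangent of the angle by the ratio of rows to columns.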

	angel = angel<pi2 ? angel : angel - CV_PI;
	if (angel != pi2){
		float angelT = srcImg.rows*tan(angel) / srcImg.cols;
		angel = atan(angelT);
	}
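
The two adjustments discussed in ① and ② can be combined into one standalone function, sketched below under the same assumptions as the original code (θ in [0, π), rescaling by the source image's rows and columns). The name rotationDegrees is hypothetical. After the fold, the angle lies in [-π/2, π/2), so it can never equal π/2 and the sketch does not need a separate guard for that case.

```cpp
#include <cmath>

// Convert a Hough theta (radians, in [0, pi)) into a rotation angle in degrees,
// compensating for a non-square image of size rows x cols.
double rotationDegrees(double theta, int rows, int cols) {
    const double pi = std::acos(-1.0);
    // Fold theta into [-pi/2, pi/2): angles at or above pi/2 become negative.
    const double angle = (theta < pi / 2) ? theta : theta - pi;
    // Rescale the tangent by the aspect ratio, as in the original code.
    const double scaled = rows * std::tan(angle) / cols;
    return std::atan(scaled) * 180.0 / pi;
}
```

For a square image, θ = π/4 gives 45° and θ = 3π/4 gives -45°. For a 100x200 image (rows x cols), θ = π/4 gives about 26.57°.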

Running this code displays the corrected image in the "result" window.

As you can see, the program corrects the rotated text.
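
For readers curious about what the rotation matrix contains, the sketch below builds a 2x3 matrix using the formula given in the OpenCV documentation for getRotationMatrix2D(), with α = scale·cos(angle) and β = scale·sin(angle). It is a plain C++ illustration rather than OpenCV's own code; rotationMatrix, mapX and mapY are hypothetical names.

```cpp
#include <array>
#include <cmath>

using Affine2x3 = std::array<std::array<double, 3>, 2>;

// 2x3 matrix for a rotation by angleDeg (positive = counter-clockwise in
// image coordinates, where y points down) about the point (cx, cy).
Affine2x3 rotationMatrix(double cx, double cy, double angleDeg, double scale) {
    const double rad = angleDeg * std::acos(-1.0) / 180.0;
    const double alpha = scale * std::cos(rad), beta = scale * std::sin(rad);
    Affine2x3 m{};
    m[0][0] = alpha;  m[0][1] = beta;   m[0][2] = (1 - alpha) * cx - beta * cy;
    m[1][0] = -beta;  m[1][1] = alpha;  m[1][2] = beta * cx + (1 - alpha) * cy;
    return m;
}

// Apply the matrix to a point (x, y) and return the transformed coordinates.
double mapX(const Affine2x3& m, double x, double y) { return m[0][0] * x + m[0][1] * y + m[0][2]; }
double mapY(const Affine2x3& m, double x, double y) { return m[1][0] * x + m[1][1] * y + m[1][2]; }
```

The rotation center maps to itself, and a 90° rotation about the origin sends (1, 0) to (0, -1), which is upward on screen and therefore counter-clockwise.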

The complete code is attached below:

The image used in the code can be downloaded from: https://pan.baidu.com/s/1I3QM2Grl99J8KyqBsI_WLA?pwd=hhor

//OpenCV version: 3.0
//VS version: 2012

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgcodecs/imgcodecs.hpp>
#include <opencv2/imgproc/imgproc.hpp>

#include <iostream>

using namespace cv;
using namespace std;

#define GRAY_THRESH 150
#define HOUGH_VOTE 100


int main(int argc, char **argv)
{
	//Read the image and convert it to grayscale
	const char* filename = "F:/material/images/P0040-rotating_text_correction-02.jpg";
	Mat srcImg = imread(filename, IMREAD_GRAYSCALE);
	if (srcImg.empty())
		return -1;
	imshow("source", srcImg);

	Point center(srcImg.cols / 2, srcImg.rows / 2);

	//Expand image to an optimal size, for faster processing speed
	//Set widths of borders in four directions
	//If borderType==BORDER_CONSTANT, fill the borders with (0,0,0)
	Mat padded;
	int opWidth = getOptimalDFTSize(srcImg.rows);
	int opHeight = getOptimalDFTSize(srcImg.cols);
	//cout<<"Rows in the source image: "<<srcImg.rows<<endl;
	//cout<<"Columns in the source image: "<<srcImg.cols<<endl;
	//cout<<"Rows after adjusting for the DFT: "<<opWidth<<endl;
	//cout<<"Columns after adjusting for the DFT: "<<opHeight<<endl<<endl;
	copyMakeBorder(srcImg, padded, 0, opWidth - srcImg.rows, 0, opHeight - srcImg.cols, BORDER_CONSTANT, Scalar::all(0));

	Mat planes[] = { Mat_<float>(padded), Mat::zeros(padded.size(), CV_32F) };//Mat_<float>(padded) converts the data of the Mat object padded to float
	Mat comImg;
	//Merge into a double-channel image
	merge(planes, 2, comImg);

	//Use the same image as input and output,
	//so that the results can fit in Mat well
	dft(comImg, comImg);

	//Compute the magnitude
	//planes[0]=Re(DFT(I)), planes[1]=Im(DFT(I))
	//magnitude=sqrt(Re^2+Im^2)
	split(comImg, planes);
	magnitude(planes[0], planes[1], planes[0]);

	//Switch to logarithmic scale, for better visual results
	//M2=log(1+M1)
	Mat magMat = planes[0];
	magMat += Scalar::all(1);
	log(magMat, magMat);

	//Crop the spectrum
	//Width and height of magMat should be even, so that they can be divided by 2
	//-2 is 11111110 in binary system, operator & make sure width and height are always even
	magMat = magMat(Rect(0, 0, magMat.cols & -2, magMat.rows & -2));

	//Rearrange the quadrants of Fourier image,
	//so that the origin is at the center of image,
	//and move the high frequency to the corners
	int cx = magMat.cols / 2;
	int cy = magMat.rows / 2;

	Mat q0(magMat, Rect(0, 0, cx, cy));
	Mat q1(magMat, Rect(0, cy, cx, cy));
	Mat q2(magMat, Rect(cx, cy, cx, cy));
	Mat q3(magMat, Rect(cx, 0, cx, cy));

	Mat tmp;
	q0.copyTo(tmp);
	q2.copyTo(q0);
	tmp.copyTo(q2);

	q1.copyTo(tmp);
	q3.copyTo(q1);
	tmp.copyTo(q3);

	//Normalize the magnitude to [0,1], then to[0,255]
	normalize(magMat, magMat, 0, 1, NORM_MINMAX);
	Mat magImg(magMat.size(), CV_8UC1);
	magMat.convertTo(magImg, CV_8UC1, 255, 0);
	imshow("magnitude", magImg);
	//imwrite("imageText_mag.jpg",magImg);

	//Turn into binary image
	threshold(magImg, magImg, GRAY_THRESH, 255, THRESH_BINARY);
	imshow("mag_binary", magImg);
	//imwrite("imageText_bin.jpg",magImg);

	//Find lines with Hough Transformation
	vector<Vec2f> lines;
	float pi180 = (float)CV_PI / 180;
	Mat linImg = Mat::zeros(magImg.size(), CV_8UC3); //black canvas for drawing the detected lines
	HoughLines(magImg, lines, 1, pi180, HOUGH_VOTE, 0, 0);
	int numLines = lines.size();
	int L = 1000;
	for (int l = 0; l<numLines; l++)
	{
		float rho = lines[l][0], theta = lines[l][1];
		Point pt1, pt2;
		double a = cos(theta), b = sin(theta);
		double x0 = a*rho, y0 = b*rho;
		pt1.x = cvRound(x0 + L * (-b));
		pt1.y = cvRound(y0 + L * (a));
		pt2.x = cvRound(x0 - L * (-b));
		pt2.y = cvRound(y0 - L * (a));
		line(linImg, pt1, pt2, Scalar(255, 0, 0), 3, 8, 0);
	}
	imshow("lines", linImg);
	//imwrite("imageText_line.jpg",linImg);
	if (lines.size() == 3){
		cout << "found three angles:" << endl;
		cout << lines[0][1] * 180 / CV_PI << endl << lines[1][1] * 180 / CV_PI << endl << lines[2][1] * 180 / CV_PI << endl << endl;
	}

	//Find the proper angle among the detected angles
	float angel = 0;
	float piThresh = (float)CV_PI / 90;
	float pi2 = CV_PI / 2;
	for (int l = 0; l<numLines; l++)
	{
		float theta = lines[l][1];
		if (abs(theta) < piThresh || abs(theta - pi2) < piThresh)
			continue;
		else{
			angel = theta;
			break;
		}
	}

	//Calculate the rotation angle
	//The image has to be square,
	//so that the rotation angle can be calculated correctly
	angel = angel<pi2 ? angel : angel - CV_PI;
	if (angel != pi2){
		float angelT = srcImg.rows*tan(angel) / srcImg.cols;
		angel = atan(angelT);
	}
	float angelD = angel * 180 / (float)CV_PI;
	cout << "the rotation angle to be applied:" << endl << angelD << endl << endl;

	//Rotate the image to recover
	Mat rotMat = getRotationMatrix2D(center, angelD, 1.0);
	Mat dstImg;
	warpAffine(srcImg, dstImg, rotMat, srcImg.size(), INTER_LINEAR, BORDER_CONSTANT, Scalar(255, 255, 255));
	imshow("result", dstImg);
	//imwrite("imageText_D.jpg",dstImg);

	waitKey(0);

	return 0;
}

The results were already shown above, so they are not repeated here.