Java Tesseract OCR Engine

07-10, 1522 views

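Tess4J is a Java library that wraps the Tesseract OCR engine. Tesseract was originally developed at Hewlett-Packard Laboratories starting in 1985; HP released it as open source in 2005, and Google has sponsored its development since 2006. In my tests the recognition quality was poor and it was slow, but it is a very old project.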

1. POM dependencies

   
        
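        <dependency>
            <groupId>net.sourceforge.tess4j</groupId>
            <artifactId>tess4j</artifactId>
            <version>5.12.0</version>
        </dependency>
        <dependency>
            <groupId>net.java.dev.jna</groupId>
            <artifactId>jna</artifactId>
            <version>5.14.0</version>
        </dependency>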

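Remember to download the trained data file chi_sim.traineddata and put it in the folder that the configuration below passes to setDatapath.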
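
Tesseract looks for a file named `<language>.traineddata` inside the data path, so a missing or empty file is a common cause of startup failures. The following is a minimal sketch, not part of the original project, that checks for the file before the engine is used. The class name `TrainedDataCheck` and the method `isAvailable` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class TrainedDataCheck {

    private TrainedDataCheck() {
    }

    /**
     * Returns true if "<language>.traineddata" exists in the given folder,
     * is a regular file, and is not empty.
     */
    static boolean isAvailable(Path tessdataDir, String language) {
        Path file = tessdataDir.resolve(language + ".traineddata");
        try {
            return Files.isRegularFile(file) && Files.size(file) > 0;
        } catch (IOException e) {
            // Treat an unreadable file the same as a missing one
            return false;
        }
    }
}
```

Calling `TrainedDataCheck.isAvailable(Paths.get(datapath), "chi_sim")` at startup, where `datapath` is whatever folder you configure, would let the application fail early with a clear message instead of failing on the first OCR request.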

2. Configuration class TesseractOcrConfiguration

import cn.cakeerp.util.StrUtil;
import net.sourceforge.tess4j.Tesseract;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class TesseractOcrConfiguration {
    @Bean
    public Tesseract tesseract() {
        Tesseract tesseract = new Tesseract();
        // Set the path of the folder that holds the trained data files
        tesseract.setDatapath(StrUtil.erpConfig.getTraineddata());
        // Use Simplified Chinese
        tesseract.setLanguage("chi_sim");
        return tesseract;
    }
}

3. Usage

    @Resource
    private Tesseract tesseract;
    // Recognize text directly from a file on disk
    System.out.println(tesseract.doOCR(new File("d:\\2.jpg")));
    // Or recognize text from an uploaded MultipartFile imageFile
        try (InputStream is = new ByteArrayInputStream(imageFile.getBytes())) {
            BufferedImage bufferedImage = ImageIO.read(is);
            if (null == bufferedImage) {
                // ImageIO.read returns null when no registered reader can decode the data
                return JsonResult.failed("Recognition failed: unsupported image format.");
            }
            String textStr = tesseract.doOCR(bufferedImage);
            if (null == textStr || textStr.trim().isEmpty()) {
                return JsonResult.failed("Recognition failed: the result is empty.");
            }
            log.info("Recognized text: {}", textStr);
            return JsonResult.failed("Order number not recognized", -1);
        } catch (Exception e) {
            return JsonResult.failed(e.getMessage(), -1);
        }
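
The snippet above ends by reporting that no order number was recognized, but the matching step itself is not shown. As a rough sketch of what such a step could look like, the code below searches the OCR text with a regular expression. The order number format used here (a run of 12 to 20 digits) is purely an assumption for illustration, as are the names `OrderNumberExtractor` and `find`; the real format depends on your documents.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OrderNumberExtractor {

    // Hypothetical format: 12 to 20 consecutive digits that are not part of a longer digit run
    private static final Pattern ORDER_NO = Pattern.compile("(?<!\\d)\\d{12,20}(?!\\d)");

    private OrderNumberExtractor() {
    }

    /** Returns the first substring of the OCR text that matches the assumed format. */
    static Optional<String> find(String ocrText) {
        if (ocrText == null) {
            return Optional.empty();
        }
        Matcher matcher = ORDER_NO.matcher(ocrText);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }
}
```

OCR output often contains stray spaces or misread characters, so a production version would probably need to normalize the text before matching.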

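Later I downloaded the latest trained data file (50.2 MB) from the official repository, but the results barely changed in my tests, and I did not find any obvious parameters to tune. I am not sure how other people get good results with it.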

tessdoc/Data-Files.md at main · tesseract-ocr/tessdoc · GitHub
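
When swapping the trained data does not help, the input image is another thing worth looking at. Tess4J's `Tesseract` class also exposes setters such as `setPageSegMode` and `setOcrEngineMode`, though I have not verified that they help here. The sketch below is an untested suggestion rather than something from the original project: it converts an image to 8-bit grayscale and scales it up before it is handed to `doOCR`. The class name `OcrPreprocess`, the method `toScaledGray`, and the scale factor are all assumptions.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

final class OcrPreprocess {

    private OcrPreprocess() {
    }

    /** Converts the image to 8-bit grayscale and scales it by the given factor. */
    static BufferedImage toScaledGray(BufferedImage src, double factor) {
        int width = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int height = Math.max(1, (int) Math.round(src.getHeight() * factor));
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = out.createGraphics();
        try {
            // Bicubic interpolation keeps character edges smoother when enlarging
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            // Drawing into a gray image performs the color conversion
            g.drawImage(src, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

A call such as `tesseract.doOCR(OcrPreprocess.toScaledGray(bufferedImage, 2.0))` would slot into the usage code above. Whether it improves recognition depends on the images, so compare results with and without it.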
