Text Recognition and Machine Learning: For Impaired Robots and Humans
DOI:
https://doi.org/10.29173/aar42
Keywords:
text recognition, optical character recognition (OCR), machine learning, neural networks (NN), convolutional neural networks (CNN)
Abstract
As robots and machines become more reliable, developing tools that exploit their potential in manufacturing and beyond is an important step being addressed by many, including the LIMDA team at the University of Alberta. A common and effective way to improve artificial performance is optical character recognition. Within classification, a branch of machine learning under artificial intelligence, this research focussed on the benefits of convolutional neural networks (CNNs) and the improvement they provide over their parent method, neural networks. The serious flaw of plain neural networks is memorization and a failure to learn what the images actually contain; CNNs combat these issues. CNNs are designed to connect the information received by the network and begin to closely mimic how humans learn. Using the programming language Python and machine learning libraries such as TensorFlow and Keras, different versions of CNNs were tested on datasets containing low-resolution images of handwritten characters. The first two CNNs were trained on the MNIST database of handwritten digits 0 through 9. The results of these tests illustrated the benefits of elements such as max-pooling and additional convolutional layers. Building on that knowledge, a final CNN was written to prove the accuracy of the algorithm on alphabet characters. After training and testing were complete, the network showed an average accuracy of 99.34% and a loss of 2.23%. Time-consuming training epochs that do not yield higher or more impressive results could also be eliminated. These and similar CNNs have proven to yield positive results and could, in future research, be implemented in the laboratory to improve safety. Continuing to develop this work will lead to better language translators, reliable conversion of written text to digital text, and aids for the visually and speech impaired.
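The workflow described above can be reproduced in outline with a short Keras/TensorFlow script. The sketch below is illustrative only: the layer sizes, filter counts, optimizer, and epoch count are assumptions rather than the exact architecture reported in the paper, but it shows a small CNN with stacked convolutional layers and max-pooling trained on the MNIST digit set.

# Minimal sketch of a small convolutional network for MNIST digit
# recognition with Keras/TensorFlow, in the spirit of the networks
# described in the abstract. Layer sizes, filter counts, and the epoch
# count are illustrative assumptions, not the authors' exact setup.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Load the MNIST handwritten-digit dataset (28x28 grayscale images, labels 0-9).
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
x_train = x_train[..., None]  # add a channel dimension -> (28, 28, 1)
x_test = x_test[..., None]

# A small CNN: convolutional layers with max-pooling, then a dense classifier.
model = keras.Sequential([
    layers.Conv2D(32, kernel_size=3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),  # 10 digit classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# A handful of epochs is usually enough; extra epochs add training time
# without a meaningful gain in accuracy.
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.1)
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}, test loss: {test_loss:.4f}")

Extending the same sketch to alphabet characters would only require swapping the dataset (for example, a letters variant of EMNIST) and widening the final Dense layer to the number of letter classes.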
License
Copyright (c) 2019 Nadia Gifford, Rafiq Ahmad
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.