Augmented Reality has long been a topic of interest, but earlier implementations were not as successful as expected and often required expensive hardware. Advancements in technology over the years have led to the incorporation of Augmented Reality in various fields, including mobile, web, standalone devices and gaming. Mobile computing power has evolved to accommodate processor-intensive tasks, so augmented reality can now be delivered to consumers in the form of mobile applications. Language translation software has been implemented in various ways, such as text-to-text, speech-to-speech and, more recently, augmented reality language translators that use the mobile device camera to perform translation. Chapter 1 of this proposal describes the area, goals and approach of the proposed system; Chapter 2 details the background and literature review; and Chapter 3 describes the fact-finding methodology and the tools required to implement the proposed system.
Chapter 1. Introduction
1.1 Augmented Reality Language Translation
Augmented Reality (AR) has advanced considerably over the years, and various projects have been implemented with it. Mobile computing power has also developed to the point where it can support high-end software applications that demand substantial memory and processing power. This means AR is no longer constrained to high-end systems; it has numerous applications on mobile devices (for example, TapMeasure, an augmented reality application used to take virtual measurements [1]). Language translation has come a long way from the days when it relied solely on human translators: devices have been developed for the sole purpose of translation, some as stand-alone hardware and others as software applications. Mobile language translation applications have been implemented in various ways, such as text-to-text and speech-to-speech.
Communication has always been a problem for many people who travel to foreign countries for education, business or vacation. Asking for information can pose a real problem. Although self-explanatory signs and text indications are prominently displayed in most big cities, understanding them can be very difficult for individuals who are not native speakers of the local language. Asking locals for information can also be a tedious experience, because the traveller either cannot clearly state the question or cannot clearly understand the response.
Many applications offer translation services that are widely used. However, most translation applications provide only a single way to interact with the system, such as text-to-text or speech-to-speech. This tackles the translation problem, but it is often slow and inefficient. Some improvements have adopted newer technology such as augmented reality; however, most existing applications do not offer automatic language detection. The proposed software will implement extra features that will benefit travellers. The augmented reality feature will use the phone camera to let individuals understand signs and text-based instructions in real time, while the automatic language detection feature detects the source language and translates it into the user's chosen language.
1.2 Aim and Objectives
The aim of this project is to design and develop a multimodal, ergonomic augmented reality system that automatically detects the source language and translates it into the destination language in real time using the consumer's mobile camera. The objectives are:
· To develop an AR mobile language translator.
· To implement Optical Character recognition with the Vuforia SDK.
· To implement language translation using the Microsoft Translate API.
· To overlay the destination(translated) text on the source text.
· To develop a multimodal system that enables the user to interact with the system in multiple ways.
1.3 Overview of Approach
This application will be developed using Android Studio, the official IDE for developing Android software applications. Android applications are primarily written in Java, so this application will be developed in the Java programming language, with XML used to design the user interface.
The Vuforia SDK, developed by Qualcomm, supports various Augmented Reality functionalities, including text recognition that can be used for Optical Character Recognition (OCR). Vuforia is actively developed and has an active community; its updates include a Unity extension, a native Android SDK and a native iOS SDK. Vuforia also offers cloud services that make it possible to track and identify different images [2].
The Microsoft Translate API will be used for translation services. This API translates text quickly, with near-instant results, and it offers automatic language detection. The service is free up to a limit of 2M characters per month; beyond that, a paid subscription is required [3].
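As a rough illustration, the request to the translation service could be assembled as follows. This is a hedged sketch, not the final implementation: the endpoint URL, `api-version` value and JSON body shape follow Microsoft's documented REST interface for the Translator Text API, but the exact version and parameters in use should be confirmed against the official documentation. Note that omitting the source-language parameter is what asks the service to auto-detect the source language, the behaviour the proposed system relies on.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of building a request to a Translator-style REST API.
 * Endpoint and parameter names are assumptions based on Microsoft's
 * published interface; confirm against the official documentation.
 */
public class TranslatorRequest {

    // Assumed base endpoint for the Translator Text API.
    static final String ENDPOINT =
            "https://api.cognitive.microsofttranslator.com/translate";

    /**
     * Builds the request URL. Leaving out a "from" parameter requests
     * automatic detection of the source language.
     */
    public static String buildUrl(String toLanguage) {
        return ENDPOINT + "?api-version=3.0&to="
                + URLEncoder.encode(toLanguage, StandardCharsets.UTF_8);
    }

    /** Builds the JSON body: an array of objects with a "Text" field. */
    public static String buildBody(String sourceText) {
        // Minimal JSON escaping (backslashes and quotes only).
        String escaped = sourceText.replace("\\", "\\\\").replace("\"", "\\\"");
        return "[{\"Text\":\"" + escaped + "\"}]";
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("en"));
        System.out.println(buildBody("Bonjour"));
    }
}
```

The actual HTTP POST (with the subscription key in a request header) would be layered on top of these helpers.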
Firebase will be used to keep track of devices that are online with this service.
1.4 Document Structure
The rest of this document is structured as follows. Chapter 2 provides a literature review of augmented reality and language translation, covering the sources consulted for this project as well as related work. Chapter 3 describes the project methodology. Chapter 4 presents conclusions and future work.
Chapter 2. Background
2.1 Literature Review
2.1.1 What is Augmented Reality?
Augmented reality has a long history that can be traced back to the 1960s, when Sutherland wrote an article on an AR project and later implemented the first Head-Mounted Display (HMD) [4]. This laid the foundation for the technology we know today as Augmented Reality (AR). Augmented reality is the seamless display of computer-generated 3D imagery alongside the real world, enabling interaction with that world; Virtual Reality, by contrast, immerses the user in an entirely virtual world. Augmented reality was not widely adopted at first because most early implementations required very high processing power, which limited the technology to high-end applications [5]. Over the past decade, hardware and software have developed to meet the needs of augmented reality applications [6], and mobile computing power has grown drastically to support processor-intensive tasks, so AR, a concept that was once not widely adopted, can now be delivered to users on mobile devices [7].
2.1.2 Language Translation
Language translation devices have seen early implementations; most of the first were stand-alone devices such as the Franklin TGA-470 Global Translator [8]. Advancements in mobile computing power then enabled applications to be developed for specific purposes, including mobile language translators. Some offer speech-to-speech translation [9] while others offer text-to-text translation; iTranslate, for example, is an iPhone application that provides text-to-text translation [8].
2.1.3 Optical Character Recognition
Optical Character Recognition (OCR) has been a research topic for a long while; it involves locating, recognizing and translating text. Traditional OCR was developed for digitizing scanned documents [10]. Later developments enabled text detection with high accuracy, but these systems require a high signal-to-noise ratio and correctly oriented, distortion-free text. Tesseract is an SDK that performs OCR and can easily identify black text on a white background and vice versa; it was once considered computationally expensive, but its results are well regarded [8]. ABBYY is another OCR SDK; it uses a cloud service that supports recognition of images. ABBYY's implementation meets important requirements such as low processing demands and recognition of low-resolution text and images, and it works on several mobile computing platforms [11]. The Vuforia SDK, a robust SDK that supports numerous augmented reality features, will be used to develop the proposed system. Qualcomm released a real-time OCR capability on the Vuforia platform that enables the user to detect, recognize and track text in the environment through the mobile device camera [2].
2.2 Related Work
2.2.1 TranslatAR
TranslatAR is an AR language translator developed on the Nokia N900. It used the phone's camera and touchscreen input to implement augmented reality features, combined with Tesseract for Optical Character Recognition (OCR) and the Google Translate API for translation. The application worked well when the text was not distorted in any form, and it assumed the user was focusing on a single object [8]. It required the user to record a video of the region of interest (ROI) and capture a frame; the translation was then overlaid on the captured area [8].
2.2.2 Google Translate
Google Translate is a multimodal mobile language translator that implements OCR features using augmented reality. It provides various ways to interact with the application, such as text, voice and camera input. For the AR (camera) input, the user must have some knowledge of the source language, because the source language has to be set manually for the translation to work correctly; Google Translate does not implement automatic language detection for this mode. It does, however, offer a wider frame area, unlike TranslatAR, which assumes the user focuses on a single object [13].
The proposed system will be modelled after Google Translate but with a distinguishing feature: automatic language detection. With this system, the user does not need prior knowledge of the unknown language; the user simply sets the desired destination language and focuses the mobile camera on the region of interest (ROI). The application will detect the source language in the ROI, translate it into the user's set language, and overlay the translated text on the ROI. The user also has the option of manually entering text, through either speech or text input, for scenarios where focus cannot be attained or the image is heavily distorted.
Chapter 3. Methodology
3.1 Fact Finding
Two approaches were used for fact finding during the development of this application. A survey was conducted, and the following conclusions were drawn from it:
· Approximately 73% of the participants had a positive reaction to the proposed system.
· Approximately 4% consider the application to not be that innovative.
· Approximately 20% of the participants were neutral about the usefulness of the application, while the remainder thought it was useful.
· All participants had some level of interest in text and signs that they did not understand.
· Only 10% of the participants had used augmented reality translation services; other forms such as text-to-speech had been used, but text-to-text translation proved to be the most common.
· Approximately 66% of the participants would recommend this app to friends.
3.2 Development Tools
The proposed system will be developed for the Android operating system, one of the leading mobile operating systems; numerous mobile phone companies build their devices on Android. Android Studio is a freely available IDE used to develop Android applications for mobile, TV and wearable devices. This environment will be used together with the Vuforia SDK to implement the proposed system. Android Studio's native language is Java, and XML is the mark-up language used to define the user interface.
The Android operating system is widely used in the mobile computing market, and developing for it is relatively cheaper than for its biggest competitor, iOS. The Vuforia SDK has an Android extension that supports OCR; a native iOS extension with OCR support also exists, but Android will be used for this project.
Java, the native language for development on the Android platform, will be used to build the client-side version of this application.
Mobile application monetization will be implemented using AdMob, which is trusted by more application developers than any other ad platform and is designed to maximize application revenue through advertising [12].
3.3 Frame Recognition and display
With frame translation, the user does not need to define text areas. This method uses the camera view and processes it in real time, looking for text. All the user has to do is point the camera at the region of interest and wait for the output to be overlaid on it. This recognition method will be implemented with Vuforia's text recognition system. The aim of the application is to provide translated text using augmented reality.
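The per-frame flow just described (detect text regions, translate each one, overlay the result on its region of interest) can be sketched as below. This is an illustrative design sketch only: `OcrEngine` and `Translator` are hypothetical stand-ins for the Vuforia text recognizer and the Microsoft Translator client, and none of the method names are real SDK calls.

```java
import java.util.List;

/**
 * Sketch of the per-frame recognition-and-overlay loop.
 * OcrEngine and Translator are hypothetical interfaces standing in for
 * the real Vuforia text recognizer and translation client.
 */
public class FramePipeline {

    /** A piece of text found in the camera frame, with its bounding box. */
    public record DetectedText(String text, int x, int y, int width, int height) {}

    /** A translated region ready to be drawn over the source text. */
    public record Overlay(String translatedText, int x, int y, int width, int height) {}

    public interface OcrEngine {
        List<DetectedText> detect(byte[] frame); // find text regions in a frame
    }

    public interface Translator {
        String translate(String sourceText, String toLanguage); // auto-detects source
    }

    private final OcrEngine ocr;
    private final Translator translator;
    private final String targetLanguage;

    public FramePipeline(OcrEngine ocr, Translator translator, String targetLanguage) {
        this.ocr = ocr;
        this.translator = translator;
        this.targetLanguage = targetLanguage;
    }

    /**
     * Processes one camera frame: run OCR, translate each detected region,
     * and produce overlays positioned on the original regions of interest.
     */
    public List<Overlay> processFrame(byte[] frame) {
        return ocr.detect(frame).stream()
                .map(d -> new Overlay(
                        translator.translate(d.text(), targetLanguage),
                        d.x(), d.y(), d.width(), d.height()))
                .toList();
    }
}
```

Keeping OCR and translation behind interfaces also makes the loop testable with fakes before either SDK is wired in.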
Figure 3.1. Gantt Chart
3.4 Business Plan
This application will need two developers to implement the system:
· An android native developer
· An Augmented reality developer
The team of two will work 6 hours per day, 5 days per week, for 5 weeks at a rate of $12 per hour. This comes to $360 per week for each developer ($12 × 6 hours × 5 days), $720 per week for both, and a total of $3,600 for both developers over the five-week period.
Two tables will be kept in the database: a user table and a query table.
The user table logs details about each user, such as the type of phone used, OS, location, etc.
The query table logs each translation request and the corresponding response from the Microsoft Translator API.
This allows better analysis of application usage and faster responses to repeated requests, since a translation already stored in the database can be returned without calling the API again.
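The lookup-before-request behaviour could be sketched as a simple cache keyed by source text and target language. The class and method names below are illustrative only; a production version would be backed by the query table rather than an in-memory map, and would also log to the user table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

/**
 * Illustrative sketch of the query-table lookup: before calling the
 * translation API, check whether the same (source text, target language)
 * pair has already been answered. Names are hypothetical; a real version
 * would persist entries in the query table instead of a HashMap.
 */
public class QueryCache {

    private final Map<String, String> cache = new HashMap<>();
    private final BiFunction<String, String, String> apiCall; // (text, lang) -> translation
    private int apiCalls = 0; // counts how often the remote API is actually hit

    public QueryCache(BiFunction<String, String, String> apiCall) {
        this.apiCall = apiCall;
    }

    public String translate(String sourceText, String toLanguage) {
        String key = toLanguage + "\u0000" + sourceText; // composite key
        return cache.computeIfAbsent(key, k -> {
            apiCalls++;
            return apiCall.apply(sourceText, toLanguage);
        });
    }

    public int apiCallCount() {
        return apiCalls;
    }
}
```

Repeating a request for the same text and language then returns the stored result without a second API call, which is the speed-up the design aims for.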
The application will be free to download and will offer unlimited translation, subject only to the user's storage and internet connection; revenue will be generated through the AdMob service.
The initial capital will be completely self-funded.
Chapter 4. Conclusion
Language translation applications have become a useful tool because the majority of the population uses mobile devices capable of running them. To deal with the overhead of traditional text-to-text translation, this system offers a multimodal approach: the user can translate text and signs through the mobile camera, text input or speech input. In the survey conducted, more than half of the participants had a positive reaction to this application and showed willingness to use it. Based on this positive response, developing the application appears worthwhile.
[1] Leswing, K. (2017). This new virtual tape measure app is perfect for people who obsess over the tiny details in their home. [online] Business Insider. Available at: http://uk.businessinsider.com/occipital-tapmeasure-app-arkit-2017-9?r=US&IR=T [Accessed 9 Jan. 2018].
[2] Developer.vuforia.com. (2018). Vuforia Developer Portal. [online] Available at: https://developer.vuforia.com [Accessed 9 Jan. 2018].
[3] Microsoft.com. (2018). Translator API – Microsoft Translator. [online] Available at: https://www.microsoft.com/en-us/translator/translatorapi.aspx [Accessed 22 Jan. 2018].
[4] Sutherland, I. (1964). Sketchpad: A Man-Machine Graphical Communication System. Transactions of the Society for Computer Simulation, 2(5), pp. R-3–R-20.
[5] Kostaras, N. and Xenos, M. (2012). Usability evaluation of Augmented Reality systems. Intelligent Decision Technologies, 6(2), pp. 139–149.
[6] Rankohi, S. and Waugh, L. (2013). Review and analysis of augmented reality literature for construction industry. Visualization in Engineering, 1(1), p. 9.
[7] Dunleavy, M., Dede, C. and Mitchell, R. (2009). Affordances and limitations of immersive participatory Augmented Reality simulations for teaching and learning. Journal of Science Education and Technology, 18(1), pp. 7–22.
[8] Fragoso, V., Gauglitz, S., Zamora, S., Kleban, J. and Turk, M. (2011). TranslatAR: A mobile augmented reality translator. In: IEEE Workshop on Applications of Computer Vision (WACV), pp. 497–502.
[9] Paul, M., Okuma, H., Yamamoto, H., Sumita, E., Matsuda, S., Shimizu, T. and Nakamura, S. (2008). Multilingual mobile-phone translation services for world travelers. In: Coling 2008: Companion volume: Demonstrations, Manchester, UK, pp. 165–168.
[10] Mori, S., Nishida, H. and Yamada, H. (1999). Optical Character Recognition. New York, NY: John Wiley & Sons, Inc.
[11] Abbyy.com. (2018). ABBYY OCR SDK. [online] Available at: https://www.abbyy.com/en-au/sdk/ [Accessed 9 Jan. 2018].
[12] Google.com. (2018). Google AdMob – Mobile App Monetization & In App Advertising. [online] Available at: https://www.google.com/admob/index.html [Accessed 9 Jan. 2018].
[13] Translate.google.com. (2018). Google Translate. [online] Available at: https://translate.google.com [Accessed 9 Jan. 2018].