Habit formation is the process through which new (and positive) behaviours can be made automatic. Old habits can be hard to break, but it’s possible to form and maintain new habits through repetition and by setting goals.
So where does Machine Translation (MT) come into this?
Our MT engineers at the KantanMT Professional Services team found out that building an effective, high-quality MT engine is akin to building a new and positive habit. We need to refrain from repeating some of our older ineffective models, and build upon newer proven methods.
At the ATC Conference (2016), I spoke about the 7th of the 7 key habits of building effective MT systems. I will briefly mention the first 6 habits before moving on to the 7th, and most important, habit.
- Upload Quality Materials: When it comes to uploading training data, there’s often a tendency to put quantity before quality or relevance, or vice versa. It is important to remember that both these elements are equally important and need adequate attention.
- Embrace the Loveable Triplets: F-Measure, BLEU and TER scores are the lynchpin of quality estimation for your MT engine. Score your engines against these measurements, and re-train until the desired quality is achieved.
- Training Rejects: The 3rd habit follows on from the last: work with detailed training data analysis to determine the suitability and relevancy of training data.
- Unknown Words: Make it a habit to always run Gap Analysis and resolve unknown words rapidly to improve translation quality.
- Glossary: Use glossaries to train your engine. This habit is often neglected in the engine building process, resulting in lower quality MT output.
- Predictive Quality Estimation: Always use predictive quality estimation (KantanAnalytics™) of your engine to train and improve your engines. Make this a habit!
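Two of the habits above, scoring an engine and checking for unknown words, can be illustrated with a short sketch. This is only a simplified illustration and not how KantanMT computes its scores or runs Gap Analysis: the function names `f_measure` and `unknown_words` are hypothetical, the F-Measure is a plain bag-of-words version computed on a single segment, and tokenisation is a naive whitespace split.

```python
from collections import Counter


def tokenize(text):
    # Naive tokenisation (assumption): lowercase and split on whitespace.
    # Punctuation is left attached to words.
    return text.lower().split()


def f_measure(hypothesis, reference):
    """Bag-of-words F-Measure between an MT output and a reference translation."""
    hyp = Counter(tokenize(hypothesis))
    ref = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())  # share of MT tokens found in the reference
    recall = overlap / sum(ref.values())     # share of reference tokens found in the MT output
    return 2 * precision * recall / (precision + recall)


def unknown_words(source_text, training_vocabulary):
    """Return source tokens missing from the training vocabulary, in order of first appearance."""
    seen = set()
    unknown = []
    for token in tokenize(source_text):
        if token not in training_vocabulary and token not in seen:
            seen.add(token)
            unknown.append(token)
    return unknown


if __name__ == "__main__":
    print(f_measure("the cat sat", "the cat sat on the mat"))
    print(unknown_words("Install the flux capacitor", {"install", "the"}))
```

A low score on a test set, or a long list of unknown words, would both be signals to revisit the training data and glossaries before re-training.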
Finally, and most importantly, perfect the language quality review process for MT. Traditionally this involves the Project Managers (PMs) sending copies of static spreadsheets to a team of translators. These spreadsheets contain lines of source and target segments, with additional columns where the reviewers score the translated segments according to a set of predefined parameters.
Once the spreadsheets are sent off to the reviewers, PMs are completely in the dark – with no idea how the reviewers are progressing, when they might complete the review, or if they have even started the project.
I suggested during my presentation that we need to automate this process. It’s time to give up our old habit of using emails and a succession of spreadsheets, and use a more streamlined approach, which formalises the workflow. This will not only help reduce frustrating, repetitive MT errors, but also substantially increase translation productivity.
Productivity tools like KantanLQR™ help bring the language quality review process into the twenty-first century, by using technology to make the human evaluation of MT faster, more seamless and more efficient. This dramatically reduces the chances of errors creeping into the MT engine, and allows production-ready engines to be created faster.
As I have said many times before, MT is here to stay. While the basic technology will evolve and improve, it is important to make it a habit to embrace the latest developments and make them a part of the engine building process.
If you have not already begun to use MT within your localization workflow, you are losing your business to your competitors. Read my new eBook, ‘The Buyers’ Guide to Machine Translation Systems,’ co-authored with my colleague Louise Irwin, which will help you ask the right questions and make the right choice when adopting a new translation technology.
Finally, I would like to convey my sincere thanks to everyone who organised and attended the ATC Conference this year. It was a very successful event, and it wouldn’t have been possible without you.