We have developed an easy and efficient way to deploy speech-to-text technology within your organization.
Wiip-transcribe is a stand-alone solution that includes extensive domain adaptation tools.
We can help your organization get the best accuracy in your domain. You can customize your own dictionary and language model to gain better accuracy than your competitors. We respect your business and your privacy, so you can deploy and train Wiip-transcribe on your own servers. You can also use your own audio files and transcriptions to fine-tune your acoustic models.
Every language is spoken with many different accents across the world. Traditionally, ASR vendors sell a separate package for each accent of a language. For English, you need to choose between Australian English, American English, Jamaican English, and South African English. For Spanish, you need to choose between Argentina, Bolivia, Chile, Colombia, Costa Rica, Cuba, Ecuador, El Salvador, Guatemala, Honduras, México, Nicaragua, Panamá, Paraguay, Perú, Puerto Rico, República Dominicana, Uruguay, Venezuela, Spain and Guinea Ecuatorial variants. For French, you need to choose between Canadian French and European French.
Our technology has been trained to handle multiple accents. Just choose one of our language packs: it supports all major accents of that language for speech-to-text transcription. We use thousands of hours of spoken data and billions of words from different domains to build the best language models.
The only software requirement is Docker. Everything else is bundled and provided within our Docker containers.
– CPU: Intel Core i5-2500K
– RAM: 5 GB
– HDD: 10 GB of free space
Just copy any speech file you would like to transcribe into your ‘wiip-audio’ folder. Any common audio file format is supported. For each audio file, two transcription files are generated with the same name as the audio file: a plain .txt file and a more detailed .ctm file that includes meta-information about the transcription process (timestamps, lattice…).
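As a rough sketch of this workflow, the Python snippet below copies an audio file into the ‘wiip-audio’ folder and computes the two output names to look for. The helper names `stage_audio` and `expected_outputs` are hypothetical and not part of Wiip-transcribe. It also assumes the outputs keep the audio file's base name with only the extension replaced and appear next to the audio file, which is not confirmed here.

```python
import shutil
from pathlib import Path

AUDIO_DIR = Path("wiip-audio")  # folder name taken from the description above


def expected_outputs(audio_path: Path) -> tuple[Path, Path]:
    """Return the assumed .txt and .ctm paths for an audio file.

    Assumption: outputs keep the audio file's base name and only the
    extension changes (e.g. call.wav -> call.txt and call.ctm).
    """
    return audio_path.with_suffix(".txt"), audio_path.with_suffix(".ctm")


def stage_audio(source: Path, audio_dir: Path = AUDIO_DIR) -> Path:
    """Copy an audio file into the audio folder and return its new path."""
    audio_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, audio_dir / source.name))
```

For example, `stage_audio(Path("call.wav"))` would place the file at `wiip-audio/call.wav`, and `expected_outputs` on that path would yield `wiip-audio/call.txt` and `wiip-audio/call.ctm` under the naming assumption above.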
The system can reach a real-time factor (RTF) of 0.5 per core, meaning that one core can decode 1 second of audio in just 0.5 seconds.
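To make the arithmetic concrete, here is a minimal Python sketch that estimates decoding time from the real-time factor. The function `estimate_decode_seconds` is a hypothetical helper, and the linear speed-up across cores is an assumption (it holds only if the audio can be split evenly into independent jobs); actual throughput will depend on your hardware and audio.

```python
def estimate_decode_seconds(audio_seconds: float, rtf: float = 0.5, cores: int = 1) -> float:
    """Estimate wall-clock decoding time in seconds.

    rtf is processing time divided by audio duration for one core.
    Assumption: work is spread evenly over cores, so time scales as 1/cores.
    """
    if audio_seconds < 0 or rtf <= 0 or cores < 1:
        raise ValueError("audio_seconds must be >= 0, rtf > 0 and cores >= 1")
    return audio_seconds * rtf / cores


# One hour of audio on a single core at RTF 0.5 takes about 30 minutes.
print(estimate_decode_seconds(3600))  # 1800.0
```

Under the same assumptions, four cores would bring that hour of audio down to roughly 450 seconds.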