Quadmore Software Services
 
 
 



Open Source Sun Microsystems Java to Microsoft Speech SAPI 5.1
Quadmore Java to Microsoft SAPI bridge for Windows version 2.5
Conceived / coded / copyright 2004 by Bert Szoghy, email: webmaster@quadmore.com

Last code update: January 24 2005
Page last modified January 27, 2007

The Quadmore Java to Microsoft SAPI bridge for Windows consists of one Windows Dynamic link library (DLL) file which allows a (Sun Microsystems) Java program to access text-to-speech and speech recognition functionality provided by the Microsoft Speech API (SAPI) version 5.1, on the Windows operating system. Supported Windows platform product versions are: Windows 98 (Second Edition), Windows NT4, Windows Millennium, Windows 2000 and Windows XP.

Microsoft SAPI 5.1 is currently the latest version of that API. Version 5.2 is going to be binary compatible with 5.1 and will ship in Windows "Longhorn", which is generally expected in 2006. The home page for SAPI can be visited by clicking here.

All versions of the Quadmore DLL are Open Source freeware, as are the provided demo Java programs that use this bridge DLL. These programs are free software; you can redistribute them and/or modify them under the terms of the GNU General Public License version 2 as published by the Free Software Foundation. These programs are distributed in the hope that they will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details, at: http://www.gnu.org/licenses/licenses.html

If you are a company in need of extended functionality, please contact us for our consulting rates and availability.

The Java programs have been compiled and tested with the Sun Microsystems J2SE v 1.4.2_03 SDK and Java Native Interface (JNI). The Version 1 DLL was compiled using Microsoft Visual C++ .NET and Microsoft Foundation Classes (MFC) version 7. The version 2 DLLs were compiled with Microsoft Visual C++ 6 SP5 and Microsoft Foundation Classes (MFC) version 4.

The Quadmore Java to Microsoft SAPI bridge DLL depends on the specific version of MFC it was compiled with. That version is provided through two additional required DLLs, which are also in the download zip file. These two MFC DLLs should simply be left in the same directory as the bridge DLL. The combination of DLLs has been tested to work on Windows 98, NT4, ME, 2000 and XP without any surprises or gotchas.



NEWS

(January 2007) Our Java-to-SAPI DLL is used prominently in the new book "Definitive Guide To Building Java Robots" by Scott Preston (Apress). To read a sample chapter in .PDF format mentioning the Quadmore DLLs, you can click here.



Readers of this book, please note: we were not consulted for this book and it uses an older version of the Quadmore DLL. Please read the FAQ section!

(September 2005) The new BabyTalk Web version 1.6 implements the latest Quadmore DLLs for text-to-speech, modified for UNICODE and Java Packages. See the BabyTalk Web page of this web site for details. We intentionally went with this design to provide an example for Java packages. Full Java and C++ source code is provided, so you can easily figure out how to substitute our Java package name with yours before recompiling the DLLs.

(January 24, 2005) Added an interesting code review comment to QuadTTS.h and fixed a bug in the SWING demo where you could bypass the app validation by not selecting anything in the voice drop-down. The fix added two methods to TTSVoiceGetter.java.

(January 2005) Quadmore Java to SAPI Version 2.5 is now available. Full source code, demo programs and binaries can be downloaded by clicking here.

New this version:
  • "Get Voice" and "Set Voice" methods for text-to-speech: it is now possible to obtain a list of installed voices (SAPI uses the term "tokens"), and select one of these voices programmatically from Java code.


  • To demonstrate the "Get Voice" and "Set Voice" methods, we are providing a new Java SWING demo application for text to speech with this release.




  • You select a SAPI voice which was found on your system and passed to the Java SWING application:



    You enter a sentence to be read out loud by the voice you selected and you click the button:



  • The Java classes calling JNI have been cleaned up and the demo programs instantiate them.


  • The TTS and SR functionalities have been split up into three DLLs.


  • MFC is initialized globally, providing a noticeable performance improvement for speech recognition.


  • The Java class names have been revised.


  • The code was compiled on Windows NT4 using Visual C++ 6 SP5 to allow more developers to participate. You can of course import these projects into Visual C++ .NET, which will create a solution (.SLN) file. If you do, just remember to replace the older MFC DLLs with the latest and greatest.


  • A much smaller download after jettisoning all pre-compiled headers. In the same breath we would like to thank Stefan Jetchick for his Visual C++ coaching and his stern code reviewing for this version.


The Unicode version of these enhancements is not yet available as we are focusing on cleaner code and more features at this time. The version 1.1 Unicode DLL and source are still available below.

(January 2005) Microsoft has released the free Speech Recognition Profile Manager Tool, which exports and imports speech recognition desktop profiles.

(May 2004) The Open Source project Hermes from France is using the Quadmore DLL to build a free communication tool with speech abilities for disabled persons. The Hermes team tested the DLL successfully with the Microsoft TTS Reader's synthesizer with French voices, as well as demo versions of IBM ViaVoice and SayItPro.


(May 2004) The Quadmore DLL was used by computer science students of the University of North Carolina to create an educational computer game for the visually impaired. Click here to go to the project page. The technical design document can be found here.


(April 2004) The Quadmore Java to Microsoft SAPI bridge for Windows version 1.1 is currently used by the Retail Division of the German company Wincor Nixdorf International GmbH (wincor-nixdorf.com) to interface with the AT&T Speech program.



REQUIREMENTS

1) A sound card on your computer along with speakers turned on.

2) A Sun Java Runtime version 1.4 or greater needs to be installed on your machine. We recommend the Sun Java Runtime version 1.4.2 (or later).

3) A decent computer with at least 64 megs of unused RAM.

4) A version of Windows that supports SAPI 5.1, i.e. Windows 98, Windows ME, Windows NT, Windows 2000 or Windows XP. EXCLUDED: Tablet PC Windows, Pocket PC and Windows Vista.
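
A Java program could check requirement 4 before trying to use the bridge. The following is only a minimal sketch, not part of the download: the class name PlatformCheckSketch is hypothetical, and the strings it compares against are our assumption of what the JVM reports in the os.name system property on each supported Windows version. It does not attempt to detect Tablet PC editions.

```java
// Sketch: decide from the os.name system property whether the platform
// is one of the Windows versions listed in the requirements above.
// The exact strings are assumptions about what the JVM reports.
final class PlatformCheckSketch {

    static final String[] SUPPORTED = {
        "Windows 98", "Windows Me", "Windows NT", "Windows 2000", "Windows XP"
    };

    // Returns true when osName matches one of the supported names,
    // ignoring case; anything else (Vista, non-Windows, null) is false.
    static boolean isSupported(String osName) {
        if (osName == null) {
            return false;
        }
        for (String name : SUPPORTED) {
            if (name.equalsIgnoreCase(osName.trim())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        System.out.println(os + " supported: " + isSupported(os));
    }
}
```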

LIMITATIONS

1) A known Java program issue was encountered on Windows 98 while testing. It was necessary to add the following line to the CONFIG.SYS file for the Java programs:
shell=<path>command.com /e:8192 /p

2) If you wish to call the DLL from a Java class that is part of a package, you will need to modify the C++ source code and recompile. Thank you to Ronnie Tsz Kit Cheung, final year diploma student of software engineering in Hong Kong, both for pointing out the limitation and for coming up with the workaround, which is detailed in a separate section below.

3) There is a major performance improvement to be gained for the speech recognition. Professor Richard Fateman of the Computer Science Department at the University of California, Berkeley has aptly pointed out that the object was being instantiated and destroyed in each function call, following an "on my computer it's slow / on my computer it's fast" dialogue which ended up in an informal code review. This is at the top of our to-do list. If you beat us to the punch, be kind and forward us a copy of the fixed code.
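
The problem described in item 3 lives in the C++ code of the DLL, but the general idea can be illustrated in Java. The sketch below is an analogy only: Engine is a hypothetical stand-in for an expensive object such as a speech recognizer, and none of these names exist in the Quadmore source. It contrasts creating the object on every call with creating it once and reusing it.

```java
// Sketch: per-call instantiation versus a single shared instance.
// Engine is a hypothetical stand-in for an expensive native object.
final class ReuseSketch {

    static final class Engine {
        static int created = 0;          // counts constructions for the demo

        Engine() {
            created++;                   // imagine costly setup happening here
        }

        String recognize(String audio) {
            return "heard: " + audio;
        }
    }

    // The slow pattern: a new Engine is built and discarded on each call.
    static String recognizePerCall(String audio) {
        return new Engine().recognize(audio);
    }

    private static Engine shared;

    // The faster pattern: the Engine is built on first use and then reused.
    static synchronized String recognizeShared(String audio) {
        if (shared == null) {
            shared = new Engine();
        }
        return shared.recognize(audio);
    }

    public static void main(String[] args) {
        int start = Engine.created;
        for (int i = 0; i < 3; i++) {
            recognizePerCall("hello");
        }
        System.out.println("engines created per call: " + (Engine.created - start));

        start = Engine.created;
        for (int i = 0; i < 3; i++) {
            recognizeShared("hello");
        }
        System.out.println("engines created when shared: " + (Engine.created - start));
    }
}
```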

PURPOSE

This project aims to stimulate Java development for speech technologies.

In our opinion, the Sun Microsystems text-to-speech project FreeTTS has the better architecture as it allows cross-platform Java text-to-speech applications (please see our BabyTalk project page for an example). However, the voices available right now to FreeTTS are ordinary, especially when compared to some of the better SAPI-compliant ones available for the Windows operating system.

We have provided this interim solution while waiting for sexier voices for FreeTTS. Because SAPI also provides speech recognition, and the next-generation Open Source cmuSphinx 4 is also still in the works, support for speech recognition was provided as well in the Quadmore DLL.



FAQ

Why do I get linker errors using any of the following free compilers: MINGW, Cygwin, Bloodshed, Borland, Microsoft Visual C++ Toolkit 2003, Microsoft Visual C++ 2005 Express?
Answer: none of these natively support the Microsoft Foundation Classes (MFC) and ATL, two proprietary Microsoft technologies. The Borland compiler, version 5.5 and up, provides a wrapper for MFC functions, but I have seen no successful recompile of the DLLs using it. Please let me know if you succeed by providing a complete working project that others can reuse.

What C++ compilers can I use?
Answer: Microsoft Visual C++ 6, Microsoft Visual Studio .NET (2002), and Microsoft Visual Studio 2003. The free versions of the Microsoft VC++ compilers DO NOT INCLUDE support for MFC and WILL NOT WORK.

Your demo Java program works great, but when I put the code in my app it gives an error saying it can't find the DLL.
Answer: When you use the Quadmore Java to SAPI bridge in a Java package, you MUST RECOMPILE the Visual C++ code to change the method names to reflect the Java package name. See the section "USING THE DLL FROM A JAVA PACKAGE" below for details.
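
When diagnosing this error, it helps to know that Java throws an UnsatisfiedLinkError in two different situations: when the DLL file itself cannot be found at load time, and when the DLL loads but exports no function matching the name Java expects for a native method (the package case described above). The sketch below only covers the first situation. It is not part of the download, and the library name "QuadTTS" used in main is an assumption, so substitute the name of the DLL you are actually loading.

```java
// Sketch: try to load a native library and report where Java looked.
// The class name and the library name passed in main are hypothetical.
final class BridgeLoaderSketch {

    // Returns null when the library loaded, otherwise a diagnostic message.
    static String tryLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return null;
        } catch (UnsatisfiedLinkError e) {
            return "Could not load '" + libraryName + "' (file name "
                + System.mapLibraryName(libraryName) + "). Searched java.library.path = "
                + System.getProperty("java.library.path");
        }
    }

    public static void main(String[] args) {
        String problem = tryLoad("QuadTTS");   // assumed DLL name
        if (problem == null) {
            System.out.println("Library loaded.");
        } else {
            System.out.println(problem);
        }
    }
}
```

If the library loads but the error appears on the first call to a native method, the exported function names most likely do not match the package of the calling class.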

I trained a speech recognition profile but the DLL does not seem to use it. Why?
Answer: Not sure. This is a new issue and I haven't looked into it yet. My best guess is that the DLL will use whichever speech recognition profile is the default one, but this has yet to be confirmed.

Can you add a new Java method for volume? For speed? For saving the voice to a WAV file or MP3 file?
Answer: I'm very busy! Once in a while I will try to help out, but your chances are slim. My goal in creating this web site was not to compete with commercial products, but to provide a clear example of the very under-documented technique of how to do a Java to MFC method. I was hoping to attract a COM programmer clientele to discuss further under-documented techniques. Those guys are hard to get ahold of.

Unfortunately, most of the mail I receive consists of requests for new features from Java developers, usually with some kind of robotics project. Not that there is anything wrong with robotics, but it is simply out of scope of what I am trying to achieve. I keep repeating to all that any junior developer can modify and recompile the C++ code to add whatever is needed: all the difficult hurdles have been worked out, and the code is clear and commented. 100 percent of the time, these Java developers respond that they do not want to get their hands dirty with C++. This conversation has happened to me dozens of times. My feeling is that Java developers are much like the Visual Basic 6 developers I worked with a decade ago! Perhaps that's not very charitable, but it has to be said.

I have tried from time to time to provide new and improved demos. Usually these demos are completely new and different, and not incremental. Most likely I will do new demos as time permits and they will be once again completely new and different, and not incremental.

What about Windows Vista?
Answer: Microsoft has been cagey for years about SAPI 5.3, and there is no SDK that I know of that can be downloaded (as of January 27, 2007). Please advise me when that happens.



DOWNLOADING AND INSTALLING STEPS

1) For all versions of Windows, you will need to download and run the Microsoft SAPI 5.1 installer. This is a free download (70 MB) available directly from Microsoft by clicking here. Windows XP has text-to-speech installed out of the box, but not speech recognition. The SAPI 5.1 installer will give you two additional voices on top of the one that comes with XP.

If you are not sure whether Microsoft SAPI is installed on your system already, check if there is a Speech icon in your Control Panel. If so, double-click it: if you have both a Text to Speech tab and a Speech Recognition tab, you already have it installed and can skip to step 3.

2) Install Microsoft SAPI 5.1 by following the instructions. Reboot afterwards for good measure.

3) Turn on your computer speakers and plug in your microphone.

The microphone should be of decent quality, as are most computer headset microphones above the $15 range. We use a professional Shure SM-58 microphone plugged in to an Audio Buddy pre-amplifier which in turn is plugged into the sound card input. This purchase was made for recording music, not for doing speech recognition, and so is not necessary or recommended for the purposes here.

4) Go to your Control Panel and double-click the Speech icon. You should see a text-to-speech tab and a speech recognition tab.

In the Text To Speech tab, select a voice in the drop down list and click the Preview Voice button. Make sure you hear something in your speakers.



In the Speech Recognition tab, click the Train Profile button and follow the instructions to allow the system to set the expected volume input for your voice.



5) Download the Quadmore Java to Microsoft SAPI bridge for Windows Version 2.5 by clicking here (for all versions of Windows).

Download the Quadmore Java to Microsoft SAPI bridge for Windows Version 1.1 zip file by clicking here (for all versions of Windows).

Download the Unicode Version 1.1 zip file by clicking here (for Windows NT4, Windows 2000, Windows XP Pro and Windows 2003 only).

6) Unzip to a new file directory on your hard drive and you are ready to play.



DEMO PROGRAMS PROVIDED IN THE DOWNLOAD

One command line program is provided for text-to-speech and another for speech recognition. For TTS, there is a brand new SWING application using the bridge DLL version 2.5.

We recommend you try both command line programs first, because:

1) The code is bare bones and is basically all you need to code your own application using the bridge DLL;

2) Your mileage will vary for speech recognition performance depending on your hardware... For this reason a "Please begin dictation..." command line prompt produced by the DLL will be displayed in DOS to indicate the system is ready for the next sentence, and will help you to gauge the responsiveness of your computer.

To run the demos, first open a command prompt and change to the new directory where you unzipped the files:
cd (full directory path)\MyDirectory

For the text-to-speech application type:
(full directory path to java executable)\java QuadmoreJNI

For the speech recognition application type:
(full directory path to java executable)\java SR



SOURCE CODE

Version 2.5 currently has NO UNICODE SUPPORT and includes source code: it can be downloaded by clicking here. However, version 1.6 of BabyTalk Web demonstrates how to correct the code for Unicode with just some light editing.

Version 1.1 had two flavors: regular and Unicode. The complete source code for the "regular" version of the then-single Quadmore DLL can be found by clicking here. The complete source code for the Unicode version of the 1.1 DLL can be found by clicking here.



MODIFYING THE SOURCE CODE

Version 2.5 of the project has intentionally made a leap backward in compilers to Microsoft Visual C++ 6, to allow more developers to pitch in.

We are currently struggling with C++, ATL, MFC and straight SAPI. If you are willing to help, you are quite welcome and we will credit your contribution. Next on our radar is adding methods for speech recognition and speech recognition profiles.

As far as we know, to modify the C++ source code you will absolutely need one of the following specific compilers: Microsoft Visual C++ 6, Microsoft Visual Studio .NET (2002), or Microsoft Visual Studio 2003. Please read the FAQ section above concerning this.

You will need to add the path to the necessary include files and libraries in Visual C++ 6 using the menu Tools > Options, as pictured below.







USING THE DLL FROM A JAVA PACKAGE

Many thanks to Ronnie Tsz Kit Cheung for providing the following tutorial steps.

The following example uses the DLL Version 1.1 code, but the technique is easily imitated for later versions.

1) Edit the QuadmoreJNI.java and/or SR.java file to add the package "com.quadmore.dev" (for example) on the first line;

2) Compile the .java file(s) into a .class the regular way;

3) Open a DOS command prompt and change to the directory to where the file QuadmoreJNI.class is;

4) Use the command: (path)\javah.exe -jni QuadmoreJNI

5) Change to the directory where SR.class is;

6) Use the command: (path)\javah.exe -jni SR

7) Copy the generated QuadmoreJNI.h and SR.h files into the VC++.NET project directory;

8) Open the solution file in VC++.NET;

9) Edit the QuadmoreJNI.h file, modifying Java_QuadmoreJNI_SpeakDarling into Java_com_quadmore_dev_QuadmoreJNI_SpeakDarling
where 'com.quadmore.dev' is the package name and underscore is the separator (reference: http://java.sun.com/docs/books/tutorial/native1.1/integrating/declare.html)

10) Similarly, modify the file SR.h to replace Java_SR_TakeDictationDear with
Java_com_quadmore_dev_SR_TakeDictationDear

11) Since Quadmore.h has included both QuadmoreJNI.h and SR.h, similar modifications are needed there: change
Java_QuadmoreJNI_SpeakDarling
into:
Java_com_quadmore_dev_QuadmoreJNI_SpeakDarling
and replace:
Java_SR_TakeDictationDear
with:
Java_com_quadmore_dev_SR_TakeDictationDear

12) Rebuild the VC++.NET project and place the DLL in your library path (normally %systemroot%\system32 on Windows machines).

13) Now the methods 'SpeakDarling' and 'TakeDictationDear' can be invoked from within the 'com.quadmore.dev' package.
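
The renaming done by hand in steps 9 to 11 follows the JNI naming rule for native methods: the prefix Java_, then the fully qualified class name with dots replaced by underscores, then an underscore and the method name. The sketch below computes the expected C function name so you can check your edits. It is our own illustration, not part of the download: the class JniNameSketch is hypothetical, and it handles only the short form of the name (overloaded native methods get an extra signature suffix, which is not covered here).

```java
// Sketch: compute the short C function name JNI expects for a native method.
final class JniNameSketch {

    // Escapes one name component following the JNI rules:
    // '_' becomes "_1", ';' becomes "_2", '[' becomes "_3", and any other
    // character outside ASCII letters and digits becomes "_0" plus four
    // lower-case hex digits.
    static String escape(String part) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < part.length(); i++) {
            char c = part.charAt(i);
            boolean plain = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9');
            if (plain) {
                out.append(c);
            } else if (c == '_') {
                out.append("_1");
            } else if (c == ';') {
                out.append("_2");
            } else if (c == '[') {
                out.append("_3");
            } else {
                String hex = Integer.toHexString(c);
                out.append("_0");
                for (int pad = hex.length(); pad < 4; pad++) {
                    out.append('0');
                }
                out.append(hex);
            }
        }
        return out.toString();
    }

    // Builds "Java_<package>_<class>_<method>"; pass "" for the default package.
    static String nativeName(String packageName, String className, String methodName) {
        StringBuilder out = new StringBuilder("Java_");
        if (packageName != null && packageName.length() > 0) {
            for (String part : packageName.split("\\.")) {
                out.append(escape(part)).append('_');
            }
        }
        out.append(escape(className)).append('_').append(escape(methodName));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(nativeName("", "QuadmoreJNI", "SpeakDarling"));
        System.out.println(nativeName("com.quadmore.dev", "QuadmoreJNI", "SpeakDarling"));
        System.out.println(nativeName("com.quadmore.dev", "SR", "TakeDictationDear"));
    }
}
```

Note that an underscore inside a package or class name is escaped as "_1", so a package such as my_pkg does not map to a plain my_pkg in the function name.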



FEEDBACK

You can email us here

OUR OTHER SOFTWARE

BabyTalk: Open Source text-to-speech in Java

BabyTalk Web: Open Source text-to-speech in Java SWING, with JAXB
and Java Web Services, along with the Simpletext SQL database


Johanne's Time Organizer: Open Source time tracking in Powerbuilder