Text-to-speech (TTS) produces output with a relatively flat affect; that is, it's mostly free of emotion. TTS engines account somewhat for the ends of sentences by changing the inflection of the last word before a period or question mark and pausing after it. There are slight pauses for other punctuation within a sentence, but for the most part TTS does little to interpret what a sentence means.
I'd love to be able to separate text from its presentation, in the same way Cascading Style Sheets (CSS) let web designers separate written content from the way it is displayed. I'd like CSS for TTS. I'd like to control the speed, pitch, intensity and stress of spoken text by tagging it, and then writing style sheets that recognize the tags and shape the TTS output accordingly. It would be a big step toward being able to define the persona of an agent implemented entirely in TTS.
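Here is a minimal sketch of the idea, under some stated assumptions. The tag names (aside, excited, key), the simple non-nested tag syntax, and the dictionary format of the style sheet are all hypothetical, invented for illustration. The output target is SSML (the W3C Speech Synthesis Markup Language), whose prosody element takes rate, pitch and volume attributes and whose emphasis element expresses stress; whether a particular engine honors every value is up to that engine. The point is only to show tags in the text and a separate sheet deciding how each tag sounds.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical "speech style sheet": each tag name maps to voice settings.
# rate/pitch/volume mirror attributes of SSML's <prosody> element;
# "stress" is rendered as SSML's <emphasis level="...">.
STYLE_SHEET = {
    "aside":   {"rate": "fast", "pitch": "low", "volume": "soft"},
    "excited": {"rate": "medium", "pitch": "high", "volume": "loud"},
    "key":     {"stress": "strong"},
}

# Hypothetical author markup: <tag>text</tag>, with no nesting supported.
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def render_span(tag, text, sheet):
    """Wrap one tagged span in the SSML its style-sheet rule asks for."""
    style = sheet.get(tag, {})  # unknown tags fall back to plain speech
    body = escape(text)
    stress = style.get("stress")
    if stress:
        body = f'<emphasis level="{stress}">{body}</emphasis>'
    attrs = " ".join(
        f'{name}="{style[name]}"'
        for name in ("rate", "pitch", "volume")
        if name in style
    )
    if attrs:
        body = f"<prosody {attrs}>{body}</prosody>"
    return body


def to_ssml(tagged_text, sheet):
    """Convert tagged text to an SSML document using the given style sheet."""
    parts = []
    pos = 0
    for match in TAG_PATTERN.finditer(tagged_text):
        # Untagged text between spans is spoken with the default voice.
        parts.append(escape(tagged_text[pos:match.start()]))
        parts.append(render_span(match.group(1), match.group(2), sheet))
        pos = match.end()
    parts.append(escape(tagged_text[pos:]))
    return "<speak>" + "".join(parts) + "</speak>"


print(to_ssml(
    "Welcome back. <excited>You did it!</excited> The <key>deadline</key> moved.",
    STYLE_SHEET,
))
```

Because the voice settings live only in the sheet, swapping in a different sheet (say, a calmer persona where "excited" is merely a slightly higher pitch) changes how the same tagged text is spoken without touching the text itself, which is the separation CSS gives web pages.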
This should be perfectly feasible, and I'm surprised it hasn't been done. If anyone with a technical background wants to work on this and needs some direction, drop me a line.