Need random people for technical experiment
 


Muz



Registered
  14/02/2002
Points
  6499

27th May, 2010 at 21:58:11 -

Hey, got my thesis due next week and I'm just picking up some random people on the Internet to evaluate it. It's about transforming a speech signal with no emotion into something with emotion, most likely just anger in this case.

What I need is just some people to tell me what they think about the speech and how much anger you think it contains. I'll give like 8 different samples, all with different modifications, and you'd rate it from 1-5. Then I'd take the best overall result and do some modifications to it to find which parameters are the most important, and you rate that one from 1-5.

There's a bunch of last-minute stuff I'd like to do, but might not have time for, like converting to sadness, surprise, etc., so there could be more. Also, this is probably the first attempt in the world to do something like this and I cut a hell of a lot of corners, so speech quality is rather poor, IMO.

Let me know if you're interested, and I'll drop you a PM to avoid any spoilers in the thread contaminating results from others.

Oh, and BTW, I kinda need this done ASAP, next week is a bit too late

Edited by Muz

 
Disclaimer: Any sarcasm in my posts will not be mentioned as that would ruin the purpose. It is assumed that the reader is intelligent enough to tell the difference between what is sarcasm and what is not.


OMC

What a goofball

Registered
  21/05/2007
Points
  3516

27th May, 2010 at 23:05:31 -

I don't know that I'd be much use rating them correctly, but I couldn't help but say it sounds incredibly intriguing.

Somewhat scary applications are coming to mind though.

 

  		
  		

Marko

I like you You like you

Registered
  08/05/2008
Points
  2804

28th May, 2010 at 05:20:24 -

I would, but I won't be able to take a look at it till next Monday.

 

Subliminal Dreams. . ., daily gaming news and the home of Mooneyman Studios!
www.mooneyman-studios.webs.com

Muz



28th May, 2010 at 20:38:44 -

Doesn't take much time at all, there's like 10 samples, each less than a second. Anyone who's not a troll should qualify and should be able to do it in less than 5 minutes

Technology's still a bit primitive, it's still a prototype, first attempt of this kind, so can't really do anything scary with it yet. But theoretically, it should work in real time with some optimizations.

 

Sketchy

Cornwall UK

Registered
  06/11/2004
Points
  1971

28th May, 2010 at 20:47:18 -

Sounds interesting.
I'd be happy to help, although I probably won't know what I'm talking about.

 
n/a

Muz



3rd June, 2010 at 18:21:47 -

All done! Thanks a lot to everyone who participated!

Idea:
To synthesize emotions into speech. Started with only anger here because I got it working literally half a week before submission and only had enough time to clone one emotion. Why? Because synthesized speech sucks. You've all probably heard Stephen Hawking's voice or one of those that come with Windows. The idea here is that putting in some emotion would make it sound a lot better, more human and less robot.

Conclusion:
It worked. Sorta. Rated 2.5 out of 5 anger, 2.4 out of 5 quality. Whether it's bad or not depends on what you're doing with it. I'd compare it to well, nice pixel art. You know what the picture's supposed to be, but it's not exactly photorealistic. It's got a few artifacts in the speech, but that buzz actually sounds good when you're used to it.

If you're pulling a prank on someone, it'd work very well on someone who didn't expect it (kinda like Photoshop). For speech synthesizers, it works great at making them sound less boring: just raise the pitch contour to give it a happier sound, or lower it to give it a sadder one.
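
For anyone who wants to picture the "higher contour = happier, lower = sadder" idea, here's a minimal sketch in Python. It is not the thesis code: the function name, the convention that 0.0 marks an unvoiced frame, and the 1.2 / 0.85 factors are all assumptions made up for illustration.

```python
def scale_pitch_contour(f0_hz, factor):
    """Scale every voiced frame of a pitch contour by a constant factor.

    f0_hz  : per-frame pitch values in Hz; 0.0 marks an unvoiced frame
             (an assumed convention, not taken from the thesis).
    factor : > 1.0 raises the contour, < 1.0 lowers it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    # Unvoiced frames have no pitch, so they are left at 0.0.
    return [f * factor if f > 0 else 0.0 for f in f0_hz]


# Hypothetical neutral contour: two voiced stretches with a gap between them.
neutral = [0.0, 110.0, 115.0, 120.0, 0.0, 105.0]
happier = scale_pitch_contour(neutral, 1.2)   # contour moved up
sadder = scale_pitch_contour(neutral, 0.85)   # contour moved down
```

The scaled contour is only a target; something like PSOLA still has to resynthesize the audio so that it actually follows it.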

Also, anger has these spikes in the pitch and energy contours. That's pretty much all there is to it. It's difficult to simulate just because they change so much more than the transformer can handle. Almost any other emotion has more subtle differences, so it should work much better for those.

It's also basically a functional pitch contour transformer, i.e. it can correct you if you're singing out of tune. It's sort of like Photoshop for voice in that sense. But it can't really fix your voice if you suck at singing, and if you sing out of key by around 50 Hz, it'd have a techno-ish effect. 50 Hz is still a huge range to change... you shouldn't be singing that badly.
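
To put the "50 Hz is huge" remark in numbers, here's a rough sketch of the tuning side of it: snap a sung frequency to the nearest equal-tempered note and measure the distance in cents. The A4 = 440 Hz reference and both helper names are assumptions for the example, not anything from the thesis.

```python
import math

A4_HZ = 440.0  # assumed reference tuning


def nearest_note_hz(f_hz):
    """Return the equal-tempered note frequency closest to f_hz."""
    semitones_from_a4 = round(12 * math.log2(f_hz / A4_HZ))
    return A4_HZ * 2 ** (semitones_from_a4 / 12)


def cents_between(f_from, f_to):
    """Signed distance from f_from to f_to in cents (100 cents = 1 semitone)."""
    return 1200 * math.log2(f_to / f_from)


sung = 226.0                              # hypothetical, slightly sharp A3
target = nearest_note_hz(sung)            # snaps to A3
small_error = cents_between(target, sung)
big_error = cents_between(220.0, 270.0)   # 50 Hz off around A3
```

Around A3, being 50 Hz off works out to more than three semitones, which is why a correction that large would sound so obviously processed.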

Compared to what other people have done, well... it's the most successful emotional transformation so far, unless someone's put some top secret research into something better.


Implementation:
Anyway, while all these PhD students were taking huge piles of statistics, hidden Markov models, and basically trying to invert whatever knowledge they got from emotion detection, I took a cruder, game-designer approach. I tried to simply quantify emotions as a bunch of numbers.

So, I split it down into three variables - energy contour, duration modification, and pitch contour. There's a bunch of theories I had around these. One was to imitate it exactly - which didn't work out so well, because it just doesn't go up higher than a certain pitch.
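
Here's one way the "emotion as a bunch of numbers" idea could look in code. This is a sketch under heavy assumptions: the thesis details aren't public, so the class, the field names, and the anger values below are all hypothetical, and reducing each variable to a single scale factor is a simplification.

```python
from dataclasses import dataclass


@dataclass
class EmotionParams:
    """One emotion described by three scale factors (hypothetical model)."""
    pitch_scale: float     # multiplies the pitch contour
    energy_scale: float    # multiplies the energy contour
    duration_scale: float  # multiplies the frame duration


# Made-up numbers for illustration only, not measured values.
ANGER = EmotionParams(pitch_scale=1.3, energy_scale=1.5, duration_scale=0.9)


def transform_targets(f0_hz, energy, frame_ms, params):
    """Return the target pitch contour, energy contour and frame length.

    Frames with f0 == 0.0 are treated as unvoiced and left untouched.
    """
    new_f0 = [f * params.pitch_scale if f > 0 else 0.0 for f in f0_hz]
    new_energy = [e * params.energy_scale for e in energy]
    new_frame_ms = frame_ms * params.duration_scale
    return new_f0, new_energy, new_frame_ms


new_f0, new_energy, new_frame_ms = transform_targets(
    [0.0, 100.0], [1.0, 2.0], 10.0, ANGER
)
```

A real system would need contours that vary over time (like the spikes mentioned for anger) rather than one constant factor per variable.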

The others were kinda meh, proven wrong. One of them was proven kinda true. What was right is that people don't really notice a lot of the bad effects. I guess we're used to listening to horribly compressed music/videos/phone speech. It's fine to just mess it up.

Technical stuff:
Well, I'm not sure what to say about this. I'm not going to give 100% details until the thesis is officially published by the uni - the whole patent possibilities and all.

The stuff I could say is common knowledge. It uses standard PSOLA (Pitch Synchronous Overlap and Add, a pitch-modifying algorithm). It's just a basic pitch modifier in essence, with modifications to allow it to change time even though that was theoretically a stupid thing to do. I think everyone was skeptical about that, lol.
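
For readers who haven't met PSOLA, here's a bare-bones time-domain sketch of the textbook idea: cut two-period grains around the pitch marks, window them, and overlap-add them at a new spacing. It is not the thesis implementation and leaves out the time-changing modifications. It also assumes the whole signal is voiced with a constant, already-known pitch period, which real speech never gives you. All names are made up.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def psola_pitch_shift(signal, period, new_period):
    """Move the pitch period from `period` to `new_period` samples.

    Duration is kept the same. Assumes constant pitch, so analysis pitch
    marks simply sit at multiples of `period` (a big simplification).
    """
    if period <= 0 or new_period <= 0:
        raise ValueError("periods must be positive")
    n = len(signal)
    window = hann(2 * period)
    # Analysis marks: centres of two-period grains that fit in the signal.
    marks = list(range(period, n - period + 1, period))
    if not marks:
        raise ValueError("signal shorter than two pitch periods")
    out = [0.0] * n
    norm = [0.0] * n  # summed window weights, used to even out the gain
    centre = new_period
    while centre < n:
        # Reuse the grain whose analysis mark is nearest in time.
        k = min(range(len(marks)), key=lambda i: abs(marks[i] - centre))
        start_in = marks[k] - period
        start_out = centre - period
        for i in range(2 * period):
            j = start_out + i
            if 0 <= j < n:
                out[j] += signal[start_in + i] * window[i]
                norm[j] += window[i]
        centre += new_period
    return [o / w if w > 1e-6 else 0.0 for o, w in zip(out, norm)]


# Test tone with an 80-sample period, shifted up to a 60-sample period.
signal = [math.sin(2 * math.pi * t / 80) for t in range(1600)]
shifted = psola_pitch_shift(signal, period=80, new_period=60)
```

Because the grains are placed closer together, the output repeats every 60 samples instead of 80, so the pitch goes up while the length stays the same.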

And uh.. yeah. I don't think any of you really play around with this stuff, so a detailed technical explanation doesn't help. But if you've got questions, ask


Why it shouldn't work:
I did take a hell of a lot of shortcuts. If it were a mechanical thing, it'd probably be duct-taped all over the place. Surprisingly, it held together, and while I was asking my supervisor why it didn't work... it did. It worked so well that he asked me if the synthesized speech was the original. I'm still scratching my head about it working at all, but it does.

1. Never used any of the formulas or stuff suggested by technical papers. I looked at them for like 4 months and went all "screw this" and wrote some random code based on the pictures.
2. PSOLA doesn't use interpolation. In English, it's got big goddamn chunks in the pitch contour and nobody noticed.
3. Pitch detector doesn't work reliably. It needs to know the pitch before deciding what to change it to. It's sort of like a plane autoflying and landing without vision but not sure how high it's flying.
4. Pitch correction method is stupid. If you had someone screaming over a range of 40 to 400 Hz, it would just assume an error and assume that you're screaming at 90 Hz for that whole range. The "angry" speech shouldn't work at all. That's the first speech file for you guys who heard it.
5. It mixes voiced, silent, and unvoiced speech, which is epically stupid. They're very different things (in design, not theory). I think some of you heard a big 'pop' in the middle of the second speech file. That seems to be the only noticeable one. Theoretically, it'd be 'popping' all over the place.
6. There's like 20 pages written on how to do duration modification properly. My system uses a "choose it at random" approach. Both work almost equally well, but my algorithm messes up epically when it increases duration by a factor of over 1.5.
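
Point 6 can be pictured with a small sketch like the one below. The post only says frames are chosen "at random", so this is one guess at what that could mean: repeat randomly picked frames to lengthen, drop randomly picked frames to shorten. The function name and the `seed` parameter are hypothetical.

```python
import random


def stretch_frames_randomly(frames, factor, seed=None):
    """Change the frame count by `factor`, keeping the original order.

    Lengthening repeats randomly chosen frames; shortening keeps a random
    subset. This is a guess at a "choose it at random" scheme, not the
    thesis algorithm.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not frames:
        return []
    rng = random.Random(seed)
    n = len(frames)
    target = max(1, round(n * factor))
    if target >= n:
        # Pick extra frames (with replacement) to be played more than once.
        extra = [rng.randrange(n) for _ in range(target - n)]
        indices = sorted(list(range(n)) + extra)
    else:
        # Keep a random subset of frames, still in time order.
        indices = sorted(rng.sample(range(n), target))
    return [frames[i] for i in indices]


frames = list(range(10))  # stand-ins for pitch-period-sized chunks of audio
longer = stretch_frames_randomly(frames, 1.5, seed=1)
shorter = stretch_frames_randomly(frames, 0.5, seed=1)
```

Sorting the chosen indices keeps the speech in its original order, so only the timing changes.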

Anyway, it raises some big questions about why they worked at all, and accidentally unlocked another branch of research into this stuff.
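
Points 3 and 4 in the list above (shaky pitch detection, plus falling back to 90 Hz when the reading looks wrong) could be sketched roughly like this. The detector is a plain autocorrelation peak picker, which is a common textbook method and only an assumption about what was used. The 60 to 250 Hz "plausible" band is made up; only the 90 Hz fallback comes from the post.

```python
import math


def estimate_f0(frame, sample_rate, min_hz=40.0, max_hz=400.0):
    """Crude autocorrelation pitch estimate; returns 0.0 if nothing is found."""
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(len(frame) - 1, int(sample_rate / min_hz))
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the frame and itself shifted by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def with_fallback(f0_hz, low_hz=60.0, high_hz=250.0, default_hz=90.0):
    """Treat estimates outside the assumed band as errors and use the default."""
    return f0_hz if low_hz <= f0_hz <= high_hz else default_hz


# A clean 100 Hz tone at 8 kHz is easy; real speech is much messier.
frame = [math.sin(2 * math.pi * 100.0 * t / 8000) for t in range(800)]
f0 = estimate_f0(frame, 8000)
trusted = with_fallback(f0)         # inside the band, kept as is
replaced = with_fallback(380.0)     # outside the band, becomes 90 Hz
```

With a fallback like this, anything the detector gets badly wrong is treated as 90 Hz, which matches the "it shouldn't work at all" complaint for speech that really does swing that widely.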

Edited by Muz

 

OMC

3rd June, 2010 at 18:27:22 -

Interesting that more work hasn't been put into good speech synthesizers with emotion. Or at least "publicly". Does this have commercial uses or do you have any plans for practical uses?

Remember us when you're famous and rich.

 

  		
  		

Muz



4th June, 2010 at 06:58:03 -

There's been like a bit of "work", but no actual work. As in, a lot of people are summarizing what others did, but never took any steps in any particular direction. The only significant previous one is a master's paper and it's all about synthesizers, not transformation. I know Microsoft puts a huge pile of money into speech technology, they're the top researchers in the world on it, but they keep all their stuff secret.

Commercial uses... not really. Like a bunch of research stuff, it's about just trying to find another way to do something, and the idea of any good research is to raise more questions. Emotion detection has a straight-off commercial use, that is, handling angrier calls at call centers. But this is like two steps away from that.


Heh, I'm a little surprised how many academics are thinking of using this to pull pranks on people

 

Marko

4th June, 2010 at 20:56:49 -

Bloody hell, I forgot all about this - sorry Muz, but very interesting all the same.

 

Muz



5th June, 2010 at 16:34:20 -

Eh, no worries. I got the data right where I wanted it... saying that it worked, lol. More people make it more accurate, but great that you offered to help anyway

 
   




 


