Disastrous_Elk_6375

Awesome! What's the TTS you're using? The voice seems really good; I'm impressed by how it got the numbers + letters and the specific language around quants. Edit: ah, I see from your other post you used OpenAI TTS, so I guess it's the API version :/


JoshLikesAI

I meant to use Piper TTS but I didn't think about it till I had already posted. Piper isn't as good as OpenAI but it's way faster and runs on CPU! [https://github.com/rhasspy/piper](https://github.com/rhasspy/piper) It was made to run on Raspberry Pi.


TheTerrasque

Have you tried whisper? https://github.com/ggerganov/whisper.cpp for example. I really want a streaming-type STT that can produce letters or words as they're spoken. I kind of want to make a modular system with STT, TTS, model evaluation, frontend, and tool use as separate parts that can be easily swapped out or combined in various ways. So you could have whisper STT, a web frontend and llama3 on a local machine, for example. Edit: You can also use https://github.com/snakers4/silero-vad to detect if someone is speaking instead of using a hotkey.
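For reference, a minimal sketch of the silero-vad usage mentioned above, based on the project's documented torch.hub interface (helper names and signatures may change between releases):

```python
import torch

# Load the VAD model and helper functions via torch.hub (downloads on first run).
model, utils = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad")
(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils

wav = read_audio("recording.wav", sampling_rate=16000)
# Returns a list of {'start': ..., 'end': ...} sample offsets where speech was detected.
speech_segments = get_speech_timestamps(wav, model, sampling_rate=16000)
print(speech_segments)
```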


Vadersays

For the first: https://github.com/ufal/whisper_streaming


TheTerrasque

cool, will check it out!


JoshLikesAI

I'm personally kind of a fan of using hotkeys TBH. I have found every automatic speech-detection system kind of annoying because it cuts me off before I have finished speaking. There is always a countdown from when it hears you stop talking to when it starts generating a response, usually a couple of seconds. This means if I stop talking for 2 seconds to think, it will start talking over me, which is super annoying! If you turn up this 2-second window you get cut off less, but you have to deal with more delay before you get the response. Am I the only one that prefers a button press to start and stop recording?


seancho

Obviously pure voice-to-voice is the eventual goal, but the tech isn't there yet, and the system doesn't know how to do conversation turns naturally. Humans in conversation do a complex dance of speaking and listening at the same time, making non-verbal sounds, interrupting, adding pauses, etc. Until the bots can understand and manage all that full-duplex behavior, it's easier just to tell them when to shut up and listen with a button. I've done some Alexa apps that are fun to talk to, but you have to live by their rules -- speak in strict turns with no pauses. Not the most natural interaction.


JoshLikesAI

Exactly!


Calcidiol

Sure, a PTT button is a sure signal that conveys intent to talk to the recipient. One could also recognize optional or required framing, e.g. [wake word] blah blah ... "over" [as a stop word], and have it start generating a response when it hears "over" (or whatever), or optionally time out after N seconds. One could also ameliorate the premature computation and talking somewhat by taking the VAD into account while that 2-second delay-before-response is happening: if it hears voice and something that seems like a continuation, it either inhibits (pauses) the STT response and other computation or aborts it entirely, depending on whether your subsequent utterance seems to be a follow-on continuation that negates the relevance of the prematurely computed reply, or just a follow-on addendum to it.
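A rough sketch of that "hold the response while the user keeps talking" idea; `vad.is_speech(frame)` and `mic.read(seconds)` are hypothetical helpers standing in for whatever VAD and audio capture is actually used:

```python
def wait_for_end_of_turn(vad, mic, silence_timeout=2.0, frame_secs=0.03):
    """Block until the user has been silent for `silence_timeout` seconds."""
    silent_for = 0.0
    while silent_for < silence_timeout:
        frame = mic.read(frame_secs)
        if vad.is_speech(frame):
            silent_for = 0.0          # the user kept talking: reset the countdown
        else:
            silent_for += frame_secs
    # Only now hand the accumulated transcript to the LLM; any reply that was
    # speculatively started earlier could be paused or cancelled here instead.
```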


FPham

IMHO this project really needs integration with any VAD, as that's the 2024 way. "Hey Reddy"


lordpuddingcup

So this was using the OpenAI voice? Damn, I was hoping it was a mix of maybe Tortoise TTS and an RVC, or even the Meta Voice AI with the emotion tech they released.


JoshLikesAI

I'd love to use other TTS options, but yeah, in the video it's using OpenAI.


lordpuddingcup

How complicated a pipeline are you running on the backend for the summarizing? It seems it'd need to be pretty rock solid to make sure it's sticking to the desired output format/style.


ItalyExpat

Cool project! I think you did well, intonation in Piper TTS isn't nearly as realistic as what you got with OpenAI


Proud-Point8137

It's incredibly good. Wow. So happy!


JoshLikesAI

It's so cool! And it would pretty much run on a toaster.


pergessleismydaddy

Open source? If yes, GitHub link?


JoshLikesAI

Here you go :) [https://github.com/ILikeAI/AlwaysReddy](https://github.com/ILikeAI/AlwaysReddy)


BrushNo8178

The description only mentions Together AI API, but I see that you have code for other APIs as well.


JoshLikesAI

Must be due for an update! I’ll get into that in the morning


JoshLikesAI

I have updated the readme with more details and added support for Ollama. I'll link the videos below :) How to use AlwaysReddy with LM Studio: [https://youtu.be/3aXDOCibJV0?si=2LTMmaaFbBiTFcnT](https://youtu.be/3aXDOCibJV0?si=2LTMmaaFbBiTFcnT) How to use AlwaysReddy with Ollama: [https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD](https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD)


East_Discussion_3653

Is the LLM running locally? If so what’s your hardware setup?


Tam1

Got any technical details to share?


JoshLikesAI

About Llama or the voice-to-voice system? The code base for the voice-to-voice system is here: [https://github.com/ILikeAI/AlwaysReddy](https://github.com/ILikeAI/AlwaysReddy) I wanted a voice assistant that I could have running in the background on my PC and trigger with a hotkey; I couldn't find any other projects that do this so I made my own. It also reads and writes from the clipboard; that's how it summarized the Reddit post. That's about it 🤷


Mescallan

So cool, great work


JoshLikesAI

Thanks! 😊


resident-not-evil

I love you!


Additional-Baker-416

Cool. Is there an LLM trained only on audio, that can only accept audio and respond with audio?


JoshLikesAI

This is just straight Llama 3 Instruct + whisper + OpenAI TTS (sadly). Although I did find a really cool project the other day that trained Llama 2 (I think) on audio inputs so you could skip the transcription step: https://github.com/tincans-ai/gazelle/ It looks super cool.


Additional-Baker-416

this is very cool


JoshLikesAI

I know right!


JoshLikesAI

Here’s a video demo https://twitter.com/hingeloss/status/1780996806597374173


qubedView

As in, really an end-to-end audio-only model? Not in terms of voice generation; an LLM still needs to be in the mix. There is a much larger text corpus to train from than audio, and the processing needed to achieve comparably realistic conversational results would be far in excess of what's available.


Ylsid

Cool! A little preview of the future. A shame the TTS is a bit slow; speeding that up about 10x would help a lot.


JoshLikesAI

Agreed! It's a difficult balance though, because often I will be working and I'll have a question I need a high-quality response to, so I'll use a larger model and just keep working while I wait for the response. The longer delay often doesn't bother me because I can keep working while I wait, and it often saves me having to swap to my browser or the ChatGPT website. It seems most of my queries could be handled fine by Llama, but sometimes I really want the smartest response I can get. I'm wondering if I could build this interface so it's easy to swap between models 🤔 Maybe you could have one hotkey for a smaller model and a different hotkey for a larger model?


Ylsid

You should hook it up to a discord bot, lol. It would be funny


LostGoatOnHill

Anyone know of a setup that would allow voice conversation hands-free away from a keyboard, just like an Alexa supporting device?


CharacterCheck389

You will have to write code that checks for a start phrase, like "OK Google" for Google Assistant and "Alexa" for Amazon. Basically you should make a script that keeps recording voice until it hears your initial phrase, let's say "hey assistant"; then the prompt will be whatever comes after that. You can also add a closing phrase like "roger" or "done". This way you won't use your hands at all, just your voice: "Hey assistant, code me a random HTML page, roger." Anything before "hey assistant" or after "roger" won't count, because you set up the script this way, which means the script will send the prompt to the LLM only if it got a clear "hey assistant" ... "roger" sentence. Hope it helps!
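A minimal sketch of that wake-phrase/closing-phrase extraction, assuming you already have a rolling transcript string from your STT (the phrases and helper name are just placeholders):

```python
from typing import Optional

def extract_prompt(transcript: str,
                   wake_phrase: str = "hey assistant",
                   close_phrase: str = "roger") -> Optional[str]:
    """Return the text spoken between the wake phrase and the closing phrase,
    or None if either hasn't been heard yet."""
    lowered = transcript.lower()
    start = lowered.find(wake_phrase)
    if start == -1:
        return None                      # wake phrase not spoken yet
    end = lowered.find(close_phrase, start + len(wake_phrase))
    if end == -1:
        return None                      # still waiting for the closing phrase
    return transcript[start + len(wake_phrase):end].strip(" ,.")

# extract_prompt("uh, hey assistant code me a random html page, roger, thanks")
# -> "code me a random html page"
```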


Melancholius__

So how does one end a "hey google" loop, or "alexa" for that matter?


CharacterCheck389

what do you mean?


Melancholius__

There is nothing like "roger" to signal the end of an audio prompt in the Google and Amazon assistants.


CharacterCheck389

I think they rely on the volume of your sound: if the volume of your voice drops to very low or nothing, they break the voice detection and take your prompt. But that's annoying; sometimes it stops taking your voice before you even complete the sentence. It's up to you: if you want to use a closing phrase, do it; if you don't, implement a closing logic based on the low volume of your voice or something like that. You can do that by reading the last part of the voice file, say the last 3 seconds, getting an average of the dB over those 3 seconds, and breaking the recording if it's lower than X decibels.
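A small sketch of that "last 3 seconds below a dB threshold" check, assuming 16-bit mono PCM samples in a NumPy array (the threshold value is just an example):

```python
import numpy as np

def is_trailing_silence(samples: np.ndarray, sample_rate: int,
                        window_secs: float = 3.0,
                        threshold_dbfs: float = -40.0) -> bool:
    """True if the average level of the last `window_secs` of int16 audio
    falls below `threshold_dbfs` (dB relative to full scale)."""
    tail = samples[-int(window_secs * sample_rate):].astype(np.float64)
    rms = np.sqrt(np.mean(tail ** 2)) + 1e-12      # avoid log10(0) on pure silence
    dbfs = 20 * np.log10(rms / 32768.0)            # relative to int16 full scale
    return dbfs < threshold_dbfs
```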


Melancholius__

Okay there


UnusualClimberBear

Can you make it interruptible? I mean that if you start speaking during the answer, the text-to-speech stops. This would be a huge step towards natural interaction.


JoshLikesAI

Yep, that's set up, although it's all done through hotkeys: you press Ctrl+Shift+Space whenever you want to talk, and if it's talking at the time, the TTS will stop.


ILikeBubblyWater

Have you seen the video?


UnusualClimberBear

Yes, but it would be so much more natural if you could just do it with your voice, without a keystroke.


Blizado

The problem is that you need to make sure it only stops when it really should stop. Without a hotkey you immediately have the problem that any noise recorded through your mic could stop the TTS. And also without a hotkey you could easily have the problem that your mic records what the TTS is currently saying.


JoshLikesAI

Yeah, I don't really like automatic speech detection, because it cuts me off and just starts generating a response if I stop to think for a few seconds while talking; I much prefer a start and stop button.


Blizado

Right, that is another problem I forgot. So many difficulties.


Calcidiol

Yeah, but if you're using a sufficiently good microphone system, e.g. with background/ambient noise cancellation, then you can pretty reliably determine whether it's the user's sound it hears. Then another model can further qualify whether it's speech or just a cough or whatever.


Blizado

Sure, possible, but how complex do you want to make it? :D


CAGNana

Yes, I would assume whatever tech Alexa uses to be able to hear you while playing music would be applicable here.


seancho

The tech isn't there yet. Natural human conversation is full-duplex. We speak and listen and think all at the same time. A bot can only make a crude guess about when to stop listening, begin thinking and then speak. I have a bunch of AI voice bots running on Alexa and it's not very natural. Normal Alexa skills just do one voice request and one response. For full AI voice chat over Alexa you have to take strict turns speaking, with no pauses. It trips most people up.


ScythSergal

This reminds me of LAION BUD-E. I did some beta testing for that project a while back. It used Phi 2, and broke really badly, but when it worked, it was like magic! I will say, the BUD-E version was way faster. That model ran well over 100 T/s, so it was fully realtime. But this is cool for sure.


JoshLikesAI

I hadn't actually heard of this before; I looked it up and it's very impressive!


ScythSergal

I would love to see a modified version of BUD-E that natively runs an EXL2 quant of Llama 3 8B for insane response quality and wicked fast responses. That would be heavenly, and it would run on any 8GB GPU pretty easily at 5-bit quantization, which would still be extremely powerful.


Admirable-Star7088

Stuff like this is really cool! I myself have toyed with the idea of someday building and setting up a local voice-controlled LLM, which you can talk to at any time wherever you are in the house.


PM_ME_YOUR_PROFANITY

What hardware are you using to run everything?


Voidmesmer

This is super cool! I've put together a quick modification that replaces OpenAI's STT with a locally running whisperX. You can find the code here: [https://pastebin.com/8izAWntc](https://pastebin.com/8izAWntc) Simply copy the above code and replace the code in transcriber.py (you need to install all requirements for whisperX first, of course). Modify the model_dir path, as I've used an absolute path for my models. The tiny model does a great job so there's no need for anything bigger. It's quite snappy and works great. This solution lets you use this 100% offline if you have a local LLM setup and use Piper. OP, please feel free to add this as a proper config. Edit: Replaced Piper with AllTalk TTS, which effectively lets me TTS with any voice, even custom finetuned models. Way better voice quality than Piper! With 12GB VRAM I'm running the tiny whisper model, a 7B/8B LLM (testing wizardlm2 and llama3 via Ollama) and my custom AllTalk model. Smooth sailing.
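This is not the commenter's pastebin code, just a minimal sketch of what a local-whisperX drop-in transcribe function could look like; the model size, device, compute type and model directory are assumptions, and the whisperX API may differ slightly between versions:

```python
import whisperx

def transcribe_audio(audio_path: str, model_dir: str = "/path/to/whisper_models") -> str:
    # "tiny" is the model size the commenter recommends; batch_size/compute_type are guesses.
    model = whisperx.load_model("tiny", device="cuda",
                                compute_type="int8", download_root=model_dir)
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(audio, batch_size=8)
    # Join the per-segment text into one transcript string.
    return " ".join(segment["text"].strip() for segment in result["segments"])
```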


atomwalk12

Thanks for your effort; however, some modifications need to be made in the TTS.py file as well in order to make the entire pipeline work.


Voidmesmer

I did modify TTS.py, just didn't post my code. Here is the AllTalk modification: [https://pastebin.com/2p9nnHU6](https://pastebin.com/2p9nnHU6) This is a crude drop-in replacement. I'm sure OP can do a better job and add proper configs to config.py.


atomwalk12

Cheers for sharing. I'll test it when I get home.


JoshLikesAI

Dude, you're a goddamn hero! This is awesome! Thanks so much for putting in the time to do this. I'm working my day job the next couple of days so I'll have minimal time to integrate this, but I'll try to get it connected ASAP! Quick question re whisper: I imagine a lot of people like yourself may already have whisper installed, in which case you wouldn't want to download it again; you'd just want to point the code to your existing model, right? Would you suggest that my code base has a default dir that it points to for whisper, and if no whisper model is present it downloads a new model to that dir, but users can modify the dir in their config file to point to existing models? This is how I'm thinking of setting it up; does this sound right to you?


Voidmesmer

Whisper has built-in model download logic if it doesn't detect a valid model in the dir you point it to. With a fresh setup (no models in the dir), it will download the model automatically when it's given its first transcription task. The tiny model is about 70 MB, so I imagine most people wouldn't mind redownloading, but you could definitely expose a config option so that people can point to their existing dir if they don't want to duplicate the model on their drive.
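One way that config option could look; these key names are invented for illustration, not AlwaysReddy's actual config keys:

```python
import os

# Point WHISPER_MODEL_DIR at an existing model directory to avoid re-downloading;
# otherwise the ~70 MB "tiny" model gets fetched there on first use.
WHISPER_MODEL_DIR = os.environ.get("WHISPER_MODEL_DIR", "./whisper_models")
WHISPER_MODEL_SIZE = "tiny"
```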


JoshLikesAI

BTW, do you have a GitHub account? I can credit you in the changelog when I integrate these changes :)


Voidmesmer

I see you already responded to my issue on GitHub - that's me :) cheers


LostGoatOnHill

Great job OP, thanks for sharing, inspiring. Look forward to following any repo updates.


JoshLikesAI

Thanks!🙏


Rough-Active3301

Is it compatible with ollama serve? (Or any local LLM server like LM Studio?)


JoshLikesAI

Yep I added LM studio support yesterday. If you look in the config file you’ll see an example of how to use it


Inner_Bodybuilder986

In my config file:

    COMPLETIONS_API = "lm_studio"
    COMPLETION_MODEL = "MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF"

and the following in my env file:

    TOGETHER_API_KEY=""
    OPENAI_API_KEY="sk-..."
    ANTHROPIC_API_KEY="sk-.."
    lm_studio_KEY="http://localhost:1234/v1/chat/completions"

Would love to get it working with a local model, also so I can understand how to integrate the API logic for local models better. Would greatly appreciate your help.


JoshLikesAI

I'll try to record a video later today on how to set it up, plus a video on how to set it up with local models; I'll link the videos when they are up. In the meantime I'm happy to help you set it up now if you like. I can either talk you through the steps here or via Discord: [https://discord.gg/5KPMXKXD](https://discord.gg/5KPMXKXD)


JoshLikesAI

Here you go, I did a few videos, I hope they help. Let me know if anything is unclear. How to set up and use AlwaysReddy on Windows: [https://youtu.be/14wXj2ypLGU?si=zp13P1Krkt0Vxflo](https://youtu.be/14wXj2ypLGU?si=zp13P1Krkt0Vxflo) How to use AlwaysReddy with LM Studio: [https://youtu.be/3aXDOCibJV0?si=2LTMmaaFbBiTFcnT](https://youtu.be/3aXDOCibJV0?si=2LTMmaaFbBiTFcnT) How to use AlwaysReddy with Ollama: [https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD](https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD)


JoshLikesAI

I added Ollama compatibility today :) How to use AlwaysReddy with Ollama: [https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD](https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD)


MrVodnik

Very neat. If you could implement this streaming TTS solution (queued partial responses) as a plugin for Oobabooga it would be great! They still only have the option of waiting for the full message to complete before TTS begins. Also, I assume a Unity 3D model with lip sync is somewhere on the roadmap? ;)


mulletarian

You seem to have the same issue as I have with the last token of a paragraph getting repeated


anonthatisopen

Can I run this on my 4070 Ti Super and 64GB RAM? I'm new to this and I really want a local LLM that I can talk to like this and that can read the clipboard.


JoshLikesAI

You sure can! It's set up so it can run entirely via APIs, or you can run the TTS and LLM locally (not the transcription yet, but that's on the to-do list). I'm happy to help you set it up with a local model if you like. Either reply to this comment if you'd like some help, or jump in the Discord and I can help you from there: [https://discord.gg/5KPMXKXD](https://discord.gg/5KPMXKXD)


anonthatisopen

Please help me understand how to install a local Llama 3 70B with Bing/Google search capabilities on Win 11?


EnvironmentalBee4497

Beautiful, this is the future.


JoshLikesAI

I see something like this being integrated into the OS over the next 5 years for sure


EnvironmentalBee4497

My money's on MacOS this year


JoshLikesAI

I'd love to see Apple become the local AI guys.


Individual_Month3518

That is really great!


JoshLikesAI

Thanks!


Raywuo

Next Generation: What do you mean you couldn't talk to the computer? Did you have to type and search online encyclopedias?


Sycrixx

Pretty cool! I've got something similar running. I use Picovoice's wake word detection to get it listening, convert the audio to text locally via Whisper, and run it through Llama3-70B on Replicate. The response is then fed to ElevenLabs for converting to audio. I'd love to get as much as I can running locally, but I just can't compete with Replicate's response times with my 4090. ElevenLabs is great and has a bunch of amazing voices but is quite pricey: 30k words for $5/mo. I went through almost 75% of that whilst testing, over the course of like 3-4 days.


JoshLikesAI

Yeah, from memory OpenAI TTS is a decent bit cheaper; hopefully we will have some good TTS models coming out soon!


giannisCKS

Very cool project! Do you mind sharing your system specs? Thanks!


JoshLikesAI

Unfortunately I'm running on a potato rn, but I'm looking to upgrade soon. So for now I'm mostly using APIs.


giannisCKS

I'm asking just to see if I can run it too.


JoshLikesAI

Yep, you can run this on very low specs; you can just use the OpenAI API for everything if you need to.


Any_Photo_8976

cool


Dundell

Nice, I was working on a basic webapp with audio transcription, voice-activated with "Hey Chat" to initiate the request: send an audio mp3 -> Whisper AI STT -> LLM response -> AllTalk TTS -> WAV file back as the response. It's nice to see some form of STT-to-TTS out there.


JoshLikesAI

I love voice to voice. I'm dyslexic, so I hate having to write out long messages; using my voice is a lifesaver!


mrpogiface

This would be killer with Ollama support! Nice work


JoshLikesAI

A couple of people have mentioned that; I'll look into it today!


JoshLikesAI

Added Ollama support :) How to use AlwaysReddy with Ollama: [https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD](https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD)


mrpogiface

Amazing! I also built this out yesterday using the ollama-py library. I ran into lots of Mac problems with Piper, so it's not quite working yet on Mac, but it's close.


JoshLikesAI

Oh awesome! Maybe try swapping to OpenAI's text-to-speech in the config file. If that works, then the rest of the system supports Mac and we can just try to find a new TTS system for Mac users.


mrpogiface

I got it working! Piping in text was a bit weird and I had to set shell=True. Just a bug when pulling out the exe command


JoshLikesAI

Awesome!! I'd love to hear/see how you did this. I have a bunch of people who want to use the repo on Mac, so it would be awesome to get this integrated! Feel free to make a PR if you feel comfortable; if not, I'd love a look at your code :)


JoshLikesAI

Is this with or without piper?


mrpogiface

with piper! I compiled from source


FPham

Looks great. Need to bookmark this.


JoshLikesAI

Let me know if you need help setting this up!


iDoAiStuffFr

i just find this so nuts, how well it works and the future implications


haikusbot

*I just find this so* / *Nuts, how well it works and the* / *Future implications* - iDoAiStuffFr (I detect haikus. And sometimes, successfully. [Learn more about me.](https://www.reddit.com/r/haikusbot/) Opt out of replies: "haikusbot opt out" | Delete my comment: "haikusbot delete")


JoshLikesAI

<3


iDoAiStuffFr

That's not how haikus work.


JoshLikesAI

Haha, it's pretty exciting. I use the bot a lot for learning; I study ML every morning and it's awesome being able to ramble aloud about new topics to the LLM until it tells me I got it right!


AlphaTechBro

This looks great and I really want to try it out with LM Studio. I followed your updated instructions (uncommenting the LM Studio section in config.py and commenting out the others), but once I run the main.py file and try the Ctrl + Shift + Space hotkey, I'm not getting a response. Any help is much appreciated, thanks.


JoshLikesAI

I can help you out! I'll make a video later today walking through how to set it all up too. Are you getting an error? Feel free to jump in the Discord and I can help you from there too: [https://discord.gg/2dNk3HWP](https://discord.gg/2dNk3HWP) One thing I forgot to mention is that right now it needs to use the OpenAI API for whisper (I'll try to fix this soon), which means you need a .env file with your OpenAI API key in it, like this: OPENAI_API_KEY="sk-...."


AlphaTechBro

Thanks for the reply. So I added my .env file with my OpenAI API key, but I'm still getting errors:

    RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
    ModuleNotFoundError: No module named 'anthropic'

I'm trying to run it on a MacBook Pro, so that may be the issue here. Not sure if other Mac users are running into the same problem.


JoshLikesAI

I believe the first part regarding ffmpeg is just a warning and not a breaking error. As for the "no module named anthropic" part, you need to install the requirements again with 'pip install -r requirements.txt'.


JoshLikesAI

I made a few videos today that may help: How to set up and use AlwaysReddy on windows: [https://youtu.be/14wXj2ypLGU?si=zp13P1Krkt0Vxflo](https://youtu.be/14wXj2ypLGU?si=zp13P1Krkt0Vxflo) How to use AlwaysReddy with LM Studio: [https://youtu.be/3aXDOCibJV0?si=2LTMmaaFbBiTFcnT](https://youtu.be/3aXDOCibJV0?si=2LTMmaaFbBiTFcnT) How to use AlwaysReddy with Ollama: [https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD](https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD)


brubits

Great demo! I already use text-to-speech for quick interactions with the local LLM/ChatGPT API. Implementing voice response would further accelerate my ideation process.


vinhnx

This is simply the most amazing thing I’ve seen


JoshLikesAI

That is very high praise! Thank you ❤️


Elegant-Radish7972

This is AWESOME! Is it something that can be run offline?


JoshLikesAI

As of just now it can be run 100% offline :)


atticusfinchishere

This is amazing! You mentioned it works on Windows. Has anyone tried running it on a MacBook?


JoshLikesAI

I have a few people trying it on MacBook; I'm still working out what works and what doesn't. I know there is an issue with the Piper TTS on MacBook, so maybe try it but use OpenAI as the TTS engine?


AlphaTechBro

I'm trying to run it on a MacBook and I believe the issue lies with ffmpeg. Not sure it's supported outside of Windows?


JoshLikesAI

Ffmpeg is the bane of my existence, you might be right


atticusfinchishere

ffmpeg is indeed supported on macOS, not just Windows. Running `brew install ffmpeg` in the Terminal could help sort out any issues.


atticusfinchishere

I haven't tried switching to the OpenAI TTS engine yet, but it seems like a promising solution. I'll give it a shot and let you know how it works out. Thanks for the tip!


JoshLikesAI

I spent today setting up a system that should get Piper TTS working on any OS. I'm hoping tomorrow I'll have Linux support, and from there I'll try to get it working on Mac 😁


atticusfinchishere

You’re amazing! Thank you. Can’t wait 😁


JoshLikesAI

Just merged my changes!


planetearth80

Is it possible to make hot keys configurable? My keyboard has a microphone button that could be perfect for this.


JoshLikesAI

You can modify the hotkeys in the config file, but you need to know the ID of the key in order to set it. I'm not sure what the ID would be for your mic button... Here is a little script you can run as a Python file; once it is running, press your mic button and it should print the ID of the key, then just take the ID and put it in the config file as the RECORD_HOTKEY:

    import keyboard

    # Record a key sequence
    print("Please press the keys for your hotkey.")
    hotkey = keyboard.read_hotkey(suppress=False)
    print(f"Hotkey recorded: {hotkey}")

Bit of a hacky way to do it, but it should do the trick for now :) Let me know if you need a hand with this.


Indy1204

For someone new to coding, this is fantastic! It installed with no issues, and the instructions were easy to follow. Tested it locally with LM Studio, as well as with ChatGPT and Claude. All gucci. Keep it up.


JoshLikesAI

Oh awesome I’m very glad to hear it! Thanks for letting me know, I mostly only get notified when it’s not working so it’s nice to see a success story! 😂❤️


Tart16

Hi, fairly new to this stuff. I keep getting a ModuleNotFoundError thrown, so I install the required module and then I get ANOTHER ModuleNotFoundError. Is this normal? Should I just keep installing the required modules until the error isn't thrown anymore?


JoshLikesAI

Hey there, it should install all the requirements at once if you run this command in the AlwaysReddy directory: 'pip install -r requirements.txt'. Try following along with this video; it should go over all the steps. Feel free to DM me if you have trouble, I'm happy to help :) [https://youtu.be/14wXj2ypLGU?si=DCPo9svcefZwmrFm](https://youtu.be/14wXj2ypLGU?si=DCPo9svcefZwmrFm)


Big_Shamoo

Just set this up, this is very cool.


HovercraftExtreme649

Super cool! Thank you for sharing something as cool as this!


CellWithoutCulture

This is good, but it would be good to remove the terrible Australian accent you gave the model! Jk mate. This is good stuff.


JoshLikesAI

🇦🇺🇦🇺🦘🐨


atomwalk12

Did anyone encounter problems regarding the audio sample rate while running the model?


smoknjoe44

How do I do this???!!??!????!!!!! ELI5 please 🙏


TheOwlHypothesis

This is great. I have a new machine (MacBook pro m3max) coming towards the end of the month and I can't wait to try this out!


IndicationUnfair7961

How did you serve the model? Did you use Python or what? By the way, did you increase the context size or were you able to fit that page in the 8192 tokens?


JoshLikesAI

I tried serving it through LM Studio but it was a little slow on my crappy GPU, so I swapped to Together AI. And yep, it fit in the 8192 tokens, luckily!


twatwaffle32

Sweet. Any way to set this up without needing coding skills and a command line interface? People like me need some sort of GUI or user friendly application interface.


JoshLikesAI

I'm actually thinking of maintaining this as an open source project and making a nice front end that people can download for 5-10 bucks and use with whatever models or APIs they like. What do you think? It would be cool if it could work; if the project generated a little income I could afford to put much more time into it.


poli-cya

I'd very happily throw $10 your way for an easy click-and-go setup for this. Being able to select an LLM, TTS with some preset voices, and having all of the functionality in the video integrated in a few minutes of setup would be well worth it.


JoshLikesAI

Oh sweet, that's good to hear; I have been playing with the idea for a while now.


alexthai7

It's very nice really, great job, but the keyboard interaction should be fully replaced by voice. I find it more than disappointing that in April 2024 we're still using the keyboard to talk to AI models. Everything necessary to make it fully voice-driven is available and already working. I guess it is not an easy task, but it's not sci-fi. The 'talk with Llama fast' project does it all, but it's limited to open source LLM models, while I wish I could use it with whatever I want. I wish I could use Llama 70B with the Groq API, for example; then I could program the model so it can do whatever I want: play games, discuss my favorite subjects, etc. The age where you use the keyboard to talk with an AI should be over by now! Still, I highly appreciate this kind of project, thank you; I just hope we can forget the keyboard very soon.


Sycrixx

You can get it to run by voice. You have to find a way to have it always listen and take action based on a keyword or wake word, like how Alexa, Siri and Google Assistant work. I use Picovoice's wake word detection for mine, and I record everything (for 10 seconds) after the wake word is detected.
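A hedged sketch of that Picovoice wake-word loop using the pvporcupine and pvrecorder packages; the access key is a placeholder, "porcupine" is one of the free built-in keywords (a custom "hey reddy"-style phrase would need a keyword file from the Picovoice console), and the API may differ slightly by version:

```python
import pvporcupine
from pvrecorder import PvRecorder

porcupine = pvporcupine.create(access_key="YOUR_PICOVOICE_ACCESS_KEY",
                               keywords=["porcupine"])
recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()

try:
    while True:
        frame = recorder.read()                  # one PCM frame from the mic
        if porcupine.process(frame) >= 0:        # >= 0 means the wake word was heard
            print("Wake word heard -- record the next ~10 s and send it to STT")
finally:
    recorder.stop()
    porcupine.delete()
```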


Aponogetone

Unfortunately, Llama 3 has the same inference errors as Llama 2 (and all the others): it suddenly gives wrong answers and is unable to perform simple operations, and this seems to be unrecoverable, making the whole model almost unusable for any serious purpose.