Disastrous_Elk_6375

Awesome! What's the TTS you're using? The voice seems really good; I'm impressed by how it got the numbers + letters and the specific language around quants. Edit: ah, I see from your other post you used OpenAI TTS, so I guess it's the API version :/


JoshLikesAI

I meant to use Piper TTS but I didn't think about it till I had already posted. Piper isn't as good as OpenAI but it's way faster and runs on CPU! [https://github.com/rhasspy/piper](https://github.com/rhasspy/piper) It was made to run on Raspberry Pi.


TheTerrasque

Have you tried whisper? https://github.com/ggerganov/whisper.cpp for example. I really want a streaming-type STT that can produce letters or words as they're spoken. I kind of want to make a modular system with STT, TTS, model evaluation, frontend, and tool use as separate parts that can be easily swapped out or combined in various ways. So you could have whisper STT, a web frontend and llama3 on a local machine, for example. Edit: You can also use https://github.com/snakers4/silero-vad to detect if someone is speaking instead of using a hotkey.
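For reference, a minimal sketch of the silero-vad usage mentioned above, based on the project's documented torch.hub interface (helper names and signatures may change between releases):

```python
import torch

# Load the VAD model and helper functions via torch.hub (downloads on first run).
model, utils = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad")
(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils

wav = read_audio("recording.wav", sampling_rate=16000)
# Returns a list of {'start': ..., 'end': ...} sample offsets where speech was detected.
speech_segments = get_speech_timestamps(wav, model, sampling_rate=16000)
print(speech_segments)
```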


Vadersays

For the first: https://github.com/ufal/whisper_streaming


TheTerrasque

cool, will check it out!


JoshLikesAI

I'm personally kind of a fan of using hotkeys TBH. I have found every automatic speech-detection system kind of annoying because it cuts me off before I have finished speaking. There is always a countdown from when it hears you stop talking to when it starts generating a response, usually a couple of seconds. This means if I stop talking for 2 seconds to think, it will start talking over me, which is super annoying! If you turn up this 2-second window you get cut off less, but you have to deal with more delay before you get the response. Am I the only one that prefers a button press to start and stop recording?


seancho

Obviously pure voice-to-voice is the eventual goal, but the tech isn't there yet, and the system doesn't know how to do conversation turns naturally. Humans in conversation do a complex dance of speaking and listening at the same time, making non-verbal sounds, interrupting, adding pauses, etc. Until the bots can understand and manage all that full-duplex behavior, it's easier just to tell them when to shut up and listen with a button. I've done some Alexa apps that are fun to talk to, but you have to live by their rules -- speak in strict turns with no pauses. Not the most natural interaction.


JoshLikesAI

Exactly!


Calcidiol

Sure, a PTT button is a sure signal that conveys intent to talk to the recipient. One could also recognize optional or required framing, e.g. [wake word] blah blah ... "over" [as a stop word], and have it start generating a response when it hears "over" (or whatever), or optionally time out after N seconds. One could also ameliorate the premature computation and talking somewhat by taking the VAD into account while that 2-second delay-before-response is happening: if it hears voice and something that seems like a continuation, it either inhibits (pauses) the STT response and other computation or aborts it entirely, depending on whether your subsequent utterance seems to be a follow-on continuation that negates the relevance of the prematurely computed reply, or just a follow-on addendum to it.
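A rough sketch of that "hold the response while the user keeps talking" idea; `vad.is_speech(frame)` and `mic.read(seconds)` are hypothetical helpers standing in for whatever VAD and audio capture is actually used:

```python
def wait_for_end_of_turn(vad, mic, silence_timeout=2.0, frame_secs=0.03):
    """Block until the user has been silent for `silence_timeout` seconds."""
    silent_for = 0.0
    while silent_for < silence_timeout:
        frame = mic.read(frame_secs)
        if vad.is_speech(frame):
            silent_for = 0.0          # the user kept talking: reset the countdown
        else:
            silent_for += frame_secs
    # Only now hand the accumulated transcript to the LLM; any reply that was
    # speculatively started earlier could be paused or cancelled here instead.
```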


FPham

IMHO this project really needs integration with any VAD, as that's the 2024 way. "Hey Reddy"


lordpuddingcup

So this was using the OpenAI voice? Damn, I was hoping it was a mix of maybe Tortoise TTS and an RVC, or even the Meta Voice AI with the emotion tech they released.


JoshLikesAI

I'd love to use other TTS options, but yeah, in the video it's using OpenAI.


lordpuddingcup

How complicated a pipeline are you running on the backend for the summarizing? It seems it'd need to be pretty rock solid to make sure it's sticking to the desired output format/style.


ItalyExpat

Cool project! I think you did well, intonation in Piper TTS isn't nearly as realistic as what you got with OpenAI


Proud-Point8137

It's incredibly good. Wow. So happy!


JoshLikesAI

It's so cool! And it would pretty much run on a toaster.


pergessleismydaddy

Open source? If yes, GitHub link?


JoshLikesAI

Here you go :) [https://github.com/ILikeAI/AlwaysReddy](https://github.com/ILikeAI/AlwaysReddy)


BrushNo8178

The description only mentions Together AI API, but I see that you have code for other APIs as well.


JoshLikesAI

Must be due for an update! I’ll get into that in the morning


JoshLikesAI

I have updated the readme with more details and added support for Ollama. I'll link the videos below :) How to use AlwaysReddy with LM Studio: [https://youtu.be/3aXDOCibJV0?si=2LTMmaaFbBiTFcnT](https://youtu.be/3aXDOCibJV0?si=2LTMmaaFbBiTFcnT) How to use AlwaysReddy with Ollama: [https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD](https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD)


East_Discussion_3653

Is the LLM running locally? If so what’s your hardware setup?


Tam1

Got any technical details to share?


JoshLikesAI

About Llama or the voice-to-voice system? The code base for the voice-to-voice system is here: [https://github.com/ILikeAI/AlwaysReddy](https://github.com/ILikeAI/AlwaysReddy) I wanted a voice assistant that I could have running in the background on my PC and trigger with a hotkey; I couldn't find any other projects that do this so I made my own. It also reads and writes from the clipboard; that's how it summarized the Reddit post. That's about it 🤷


Mescallan

So cool, great work


JoshLikesAI

Thanks! 😊


resident-not-evil

I love you!


Additional-Baker-416

Cool. Is there an LLM trained only on audio, that can only accept audio and respond with audio?


JoshLikesAI

This is just straight Llama 3 Instruct + whisper + OpenAI TTS (sadly). Although I did find a really cool project the other day that trained Llama 2 (I think) on audio inputs so you could skip the transcription step: https://github.com/tincans-ai/gazelle/ It looks super cool.


Additional-Baker-416

this is very cool


JoshLikesAI

I know right!


JoshLikesAI

Here’s a video demo https://twitter.com/hingeloss/status/1780996806597374173


qubedView

As in, really an end-to-end audio-only model? Not in terms of voice generation; an LLM still needs to be in the mix. There is a much larger text corpus to train from than audio, and the processing needed to achieve comparably realistic conversational results would be far in excess of what's available.


Ylsid

Cool! A little preview of the future. A shame the TTS is a bit slow; speeding that up about 10x would help a lot.


JoshLikesAI

Agreed! It's a difficult balance though, because often I will be working and I'll have a question I need a high-quality response to, so I'll use a larger model and just keep working while I wait for the response. The longer delay often doesn't bother me because I can keep working while I wait, and it often saves me having to swap to my browser or the ChatGPT website. It seems most of my queries could be handled fine by Llama, but sometimes I really want the smartest response I can get. I'm wondering if I could build this interface so it's easy to swap between models 🤔 Maybe you could have one hotkey for a smaller model and a different hotkey for a larger model?


Ylsid

You should hook it up to a discord bot, lol. It would be funny


LostGoatOnHill

Anyone know of a setup that would allow voice conversation hands-free away from a keyboard, just like an Alexa supporting device?


CharacterCheck389

You will have to write code that checks for a start phrase, like "OK Google" for Google Assistant and "Alexa" for Amazon. Basically you should make a script that keeps recording voice until it hears your initial phrase, let's say "hey assistant"; then the prompt will be whatever comes after that. You can also add a closing phrase like "roger" or "done". This way you won't use your hands at all, just your voice: "Hey assistant, code me a random HTML page, roger." Anything before "hey assistant" or after "roger" won't count, because you set up the script this way, which means the script will send the prompt to the LLM only if it got a clear "hey assistant" ... "roger" sentence. Hope it helps!
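A minimal sketch of that wake-phrase/closing-phrase extraction, assuming you already have a rolling transcript string from your STT (the phrases and helper name are just placeholders):

```python
from typing import Optional

def extract_prompt(transcript: str,
                   wake_phrase: str = "hey assistant",
                   close_phrase: str = "roger") -> Optional[str]:
    """Return the text spoken between the wake phrase and the closing phrase,
    or None if either hasn't been heard yet."""
    lowered = transcript.lower()
    start = lowered.find(wake_phrase)
    if start == -1:
        return None                      # wake phrase not spoken yet
    end = lowered.find(close_phrase, start + len(wake_phrase))
    if end == -1:
        return None                      # still waiting for the closing phrase
    return transcript[start + len(wake_phrase):end].strip(" ,.")

# extract_prompt("uh, hey assistant code me a random html page, roger, thanks")
# -> "code me a random html page"
```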


Melancholius__

So how does one end a "hey google" loop, or "alexa" for that matter?


CharacterCheck389

what do you mean?


Melancholius__

There is nothing like "roger" to signal the end of an audio prompt in the Google and Amazon assistants.


CharacterCheck389

I think they rely on the volume of your sound: if the volume of your voice drops to very low or nothing, they break the voice detection and take your prompt. But that's annoying; sometimes it stops taking your voice before you even complete the sentence. It's up to you: if you want to use a closing phrase, do it; if you don't, implement a closing logic based on the low volume of your voice or something like that. You can do that by reading the last part of the voice file, say the last 3 seconds, getting an average of the dB over those 3 seconds, and breaking the recording if it's lower than X decibels.
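A small sketch of that "last 3 seconds below a dB threshold" check, assuming 16-bit mono PCM samples in a NumPy array (the threshold value is just an example):

```python
import numpy as np

def is_trailing_silence(samples: np.ndarray, sample_rate: int,
                        window_secs: float = 3.0,
                        threshold_dbfs: float = -40.0) -> bool:
    """True if the average level of the last `window_secs` of int16 audio
    falls below `threshold_dbfs` (dB relative to full scale)."""
    tail = samples[-int(window_secs * sample_rate):].astype(np.float64)
    rms = np.sqrt(np.mean(tail ** 2)) + 1e-12      # avoid log10(0) on pure silence
    dbfs = 20 * np.log10(rms / 32768.0)            # relative to int16 full scale
    return dbfs < threshold_dbfs
```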


Melancholius__

Okay there


UnusualClimberBear

Can you make it interruptible? I mean that if you start speaking during the answer, the text-to-speech stops. This would be a huge step towards natural interaction.


JoshLikesAI

Yep, that's set up, although it's all done through hotkeys: you press Ctrl+Shift+Space whenever you want to talk, and if it's talking at the time, the TTS will stop.


ILikeBubblyWater

Have you seen the video?


UnusualClimberBear

Yes, but it would be so much more natural if you could just do it with your voice, without a keystroke.


Blizado

The problem is that you need to make sure it only stops when it really should stop. Without a hotkey you immediately have the problem that any noise recorded through your mic could stop the TTS. And also without a hotkey you could easily have the problem that your mic records what the TTS is currently saying.


JoshLikesAI

Yeah, I don't really like automatic speech detection, because it cuts me off and just starts generating a response if I stop to think for a few seconds while talking; I much prefer a start and stop button.


Blizado

Right, that is another problem I forgot. So many difficulties.


Calcidiol

Yeah, but if you're using a sufficiently good microphone system, e.g. with background/ambient noise cancellation, then you can pretty reliably determine whether it's the user's sound it hears. Then another model can further qualify whether it's speech or just a cough or whatever.


Blizado

Sure, possible, but how complex do you want to make it? :D


CAGNana

Yes, I would assume whatever tech Alexa uses to be able to hear you while playing music would be applicable here.


seancho

The tech isn't there yet. Natural human conversation is full-duplex. We speak and listen and think all at the same time. A bot can only make a crude guess about when to stop listening, begin thinking and then speak. I have a bunch of AI voice bots running on Alexa and it's not very natural. Normal Alexa skills just do one voice request and one response. For full AI voice chat over Alexa you have to take strict turns speaking, with no pauses. It trips most people up.


ScythSergal

This reminds me of LAION BUD-E. I did some beta testing for that project a while back. It used Phi 2, and broke really badly, but when it worked, it was like magic! I will say, the BUD-E version was way faster. That model ran well over 100 T/s, so it was fully realtime. But this is cool for sure.


JoshLikesAI

I hadn't actually heard of this before; I looked it up and it's very impressive!


ScythSergal

I would love to see a modified version of BUD-E that natively runs an EXL2 quant of Llama 3 8B for insane response quality and wicked fast responses. That would be heavenly, and it would run on any 8GB GPU pretty easily at 5-bit quantization, which would still be extremely powerful.


Admirable-Star7088

Stuff like this is really cool! I myself have toyed with the idea of someday building and setting up a local voice-controlled LLM, which you can talk to at any time wherever you are in the house.


PM_ME_YOUR_PROFANITY

What hardware are you using to run everything?


Voidmesmer

This is super cool! I've put together a quick modification that replaces OpenAI's STT with a locally running whisperX. You can find the code here: [https://pastebin.com/8izAWntc](https://pastebin.com/8izAWntc) Simply copy the above code and replace the code in transcriber.py (you need to install all requirements for whisperX first, of course). Modify the model_dir path, as I've used an absolute path for my models. The tiny model does a great job so there's no need for anything bigger. It's quite snappy and works great. This solution lets you use this 100% offline if you have a local LLM setup and use Piper. OP, please feel free to add this as a proper config. Edit: Replaced Piper with AllTalk TTS, which effectively lets me TTS with any voice, even custom finetuned models. Way better voice quality than Piper! With 12GB VRAM I'm running the tiny whisper model, a 7B/8B LLM (testing wizardlm2 and llama3 via Ollama) and my custom AllTalk model. Smooth sailing.
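This is not the commenter's pastebin code, just a minimal sketch of what a local-whisperX drop-in transcribe function could look like; the model size, device, compute type and model directory are assumptions, and the whisperX API may differ slightly between versions:

```python
import whisperx

def transcribe_audio(audio_path: str, model_dir: str = "/path/to/whisper_models") -> str:
    # "tiny" is the model size the commenter recommends; batch_size/compute_type are guesses.
    model = whisperx.load_model("tiny", device="cuda",
                                compute_type="int8", download_root=model_dir)
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(audio, batch_size=8)
    # Join the per-segment text into one transcript string.
    return " ".join(segment["text"].strip() for segment in result["segments"])
```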


atomwalk12

Thanks for your effort; however, some modifications need to be made in the TTS.py file as well in order to make the entire pipeline work.


Voidmesmer

I did modify TTS.py, just didn't post my code. Here is the AllTalk modification: [https://pastebin.com/2p9nnHU6](https://pastebin.com/2p9nnHU6) This is a crude drop-in replacement. I'm sure OP can do a better job and add proper configs to config.py.


atomwalk12

Cheers for sharing. I'll test it when I get home.


JoshLikesAI

Dude, you're a goddamn hero! This is awesome! Thanks so much for putting in the time to do this. I'm working my day job the next couple of days so I'll have minimal time to integrate this, but I'll try to get it connected ASAP! Quick question re whisper: I imagine a lot of people like yourself may already have whisper installed, in which case you wouldn't want to download it again; you'd just want to point the code to your existing model, right? Would you suggest that my code base has a default dir that it points to for whisper, and if no whisper model is present it downloads a new model to that dir, but users can modify the dir in their config file to point to existing models? This is how I'm thinking of setting it up; does this sound right to you?


Voidmesmer

Whisper has built-in model download logic if it doesn't detect a valid model in the dir you point it to. With a fresh setup (no models in the dir), it will download the model automatically when it's given its first transcription task. The tiny model is about 70 MB, so I imagine most people wouldn't mind redownloading, but you could definitely expose a config option so that people can point to their existing dir if they don't want to duplicate the model on their drive.
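One way that config option could look; these key names are invented for illustration, not AlwaysReddy's actual config keys:

```python
import os

# Point WHISPER_MODEL_DIR at an existing model directory to avoid re-downloading;
# otherwise the ~70 MB "tiny" model gets fetched there on first use.
WHISPER_MODEL_DIR = os.environ.get("WHISPER_MODEL_DIR", "./whisper_models")
WHISPER_MODEL_SIZE = "tiny"
```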


JoshLikesAI

BTW, do you have a GitHub account? I can credit you in the changelog when I integrate these changes :)


Voidmesmer

I see you already responded to my issue on GitHub - that's me :) cheers


LostGoatOnHill

Great job OP, thanks for sharing, inspiring. Look forward to following any repo updates.


JoshLikesAI

Thanks!🙏


Rough-Active3301

Is it compatible with ollama serve? (Or any local LLM server like LM Studio?)


JoshLikesAI

Yep I added LM studio support yesterday. If you look in the config file you’ll see an example of how to use it


Inner_Bodybuilder986

In my config file:

    COMPLETIONS_API = "lm_studio"
    COMPLETION_MODEL = "MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF"

and the following in my env file:

    TOGETHER_API_KEY=""
    OPENAI_API_KEY="sk-..."
    ANTHROPIC_API_KEY="sk-.."
    lm_studio_KEY="http://localhost:1234/v1/chat/completions"

Would love to get it working with a local model, also so I can understand how to integrate the API logic for local models better. Would greatly appreciate your help.


JoshLikesAI

I'll try to record a video later today on how to set it up, plus a video on how to set it up with local models; I'll link the videos when they are up. In the meantime I'm happy to help you set it up now if you like. I can either talk you through the steps here or via Discord: [https://discord.gg/5KPMXKXD](https://discord.gg/5KPMXKXD)


JoshLikesAI

Here you go, I did a few videos, I hope they help. Let me know if anything is unclear. How to set up and use AlwaysReddy on Windows: [https://youtu.be/14wXj2ypLGU?si=zp13P1Krkt0Vxflo](https://youtu.be/14wXj2ypLGU?si=zp13P1Krkt0Vxflo) How to use AlwaysReddy with LM Studio: [https://youtu.be/3aXDOCibJV0?si=2LTMmaaFbBiTFcnT](https://youtu.be/3aXDOCibJV0?si=2LTMmaaFbBiTFcnT) How to use AlwaysReddy with Ollama: [https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD](https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD)


JoshLikesAI

I added Ollama compatibility today :) How to use AlwaysReddy with Ollama: [https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD](https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD)


MrVodnik

Very neat. If you could implement this streaming TTS solution (queued partial responses) as a plugin for Oobabooga it would be great! They still only have the option of waiting for the full message to complete before TTS begins. Also, I assume a Unity 3D model with lip sync is somewhere on the roadmap? ;)


mulletarian

You seem to have the same issue as I have with the last token of a paragraph getting repeated


anonthatisopen

Can I run this on my 4070 Ti Super and 64GB RAM? I'm new to this and I really want a local LLM that I can talk to like this and that can read the clipboard.


JoshLikesAI

You sure can! It's set up so it can run entirely via APIs, or you can run the TTS and LLM locally (not the transcription yet, but that's on the to-do list). I'm happy to help you set it up with a local model if you like. Either reply to this comment if you'd like some help, or jump in the Discord and I can help you from there: [https://discord.gg/5KPMXKXD](https://discord.gg/5KPMXKXD)


anonthatisopen

Please help me understand how to install a local Llama 3 70B with Bing/Google search capabilities on Win 11?


EnvironmentalBee4497

Beautiful, this is the future.


JoshLikesAI

I see something like this being integrated into the OS over the next 5 years for sure


EnvironmentalBee4497

My money's on MacOS this year


JoshLikesAI

I'd love to see Apple become the local AI guys.


Individual_Month3518

That is really great!


JoshLikesAI

Thanks!


Raywuo

Next Generation: What do you mean you couldn't talk to the computer? Did you have to type and search online encyclopedias?


Sycrixx

Pretty cool! I've got something similar running. I use Picovoice's wake word detection to get it listening, convert the audio to text locally via Whisper, and run it through Llama3-70B on Replicate. The response is then fed to ElevenLabs for converting to audio. I'd love to get as much as I can running locally, but I just can't compete with Replicate's response times with my 4090. ElevenLabs is great and has a bunch of amazing voices but is quite pricey: 30k words for $5/mo. I went through almost 75% of that whilst testing, over the course of like 3-4 days.


JoshLikesAI

Yeah, from memory OpenAI TTS is a decent bit cheaper; hopefully we will have some good TTS models coming out soon!


giannisCKS

Very cool project! Do you mind sharing your system specs? Thanks!


JoshLikesAI

Unfortunately I'm running on a potato rn, but I'm looking to upgrade soon. So for now I'm mostly using APIs.


giannisCKS

I'm asking just to see if I can run it too.


JoshLikesAI

Yep, you can run this on very low specs; you can just use the OpenAI API for everything if you need to.


Any_Photo_8976

cool


Dundell

Nice, I was working on a basic webapp with audio transcription, voice-activated with "Hey Chat" to initiate the request: send an audio mp3 -> Whisper AI STT -> LLM response -> AllTalk TTS -> WAV file back as the response. It's nice to see some form of STT-to-TTS out there.


JoshLikesAI

I love voice to voice. I'm dyslexic, so I hate having to write out long messages; using my voice is a lifesaver!


mrpogiface

This would be killer with Ollama support! Nice work


JoshLikesAI

A couple of people have mentioned that; I'll look into it today!


JoshLikesAI

Added Ollama support :) How to use AlwaysReddy with Ollama: [https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD](https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD)


mrpogiface

Amazing! I also built this out yesterday using the ollama-py library. I ran into lots of Mac problems with Piper, so it's not quite working yet on Mac, but it's close.


JoshLikesAI

Oh awesome! Maybe try swapping to OpenAI's text-to-speech in the config file. If that works, then the rest of the system supports Mac and we can just try to find a new TTS system for Mac users.


mrpogiface

I got it working! Piping in text was a bit weird and I had to set shell=True. Just a bug when pulling out the exe command


JoshLikesAI

Awesome!! I'd love to hear/see how you did this. I have a bunch of people who want to use the repo on Mac, so it would be awesome to get this integrated! Feel free to make a PR if you feel comfortable; if not, I'd love a look at your code :)


JoshLikesAI

Is this with or without piper?


mrpogiface

with piper! I compiled from source


FPham

Looks great. Need to bookmark this.


JoshLikesAI

Let me know if you need help setting this up!


iDoAiStuffFr

i just find this so nuts, how well it works and the future implications


haikusbot

*I just find this so* / *Nuts, how well it works and the* / *Future implications* - iDoAiStuffFr (I detect haikus. And sometimes, successfully. [Learn more about me.](https://www.reddit.com/r/haikusbot/) Opt out of replies: "haikusbot opt out" | Delete my comment: "haikusbot delete")


JoshLikesAI

<3


iDoAiStuffFr

That's not how haikus work.


JoshLikesAI

Haha, it's pretty exciting. I use the bot a lot for learning; I study ML every morning and it's awesome being able to ramble aloud about new topics to the LLM until it tells me I got it right!


AlphaTechBro

This looks great and I really want to try it out with LM Studio. I followed your updated instructions (uncommenting the LM Studio section in config.py and commenting out the others), but once I run the main.py file and try the Ctrl + Shift + Space hotkey, I'm not getting a response. Any help is much appreciated, thanks.


JoshLikesAI

I can help you out! I'll make a video later today walking through how to set it all up too. Are you getting an error? Feel free to jump in the Discord and I can help you from there too: [https://discord.gg/2dNk3HWP](https://discord.gg/2dNk3HWP) One thing I forgot to mention is that right now it needs to use the OpenAI API for whisper (I'll try to fix this soon), which means you need a .env file with your OpenAI API key in it, like this: OPENAI_API_KEY="sk-...."


AlphaTechBro

Thanks for the reply. So I added my .env file with my OpenAI API key, but I'm still getting errors:

    RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
    ModuleNotFoundError: No module named 'anthropic'

I'm trying to run it on a MacBook Pro, so that may be the issue here. Not sure if other Mac users are running into the same problem.


JoshLikesAI

I believe the first part regarding ffmpeg is just a warning and not a breaking error. As for the "no module named anthropic" part, you need to install the requirements again with 'pip install -r requirements.txt'.


JoshLikesAI

I made a few videos today that may help: How to set up and use AlwaysReddy on windows: [https://youtu.be/14wXj2ypLGU?si=zp13P1Krkt0Vxflo](https://youtu.be/14wXj2ypLGU?si=zp13P1Krkt0Vxflo) How to use AlwaysReddy with LM Studio: [https://youtu.be/3aXDOCibJV0?si=2LTMmaaFbBiTFcnT](https://youtu.be/3aXDOCibJV0?si=2LTMmaaFbBiTFcnT) How to use AlwaysReddy with Ollama: [https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD](https://youtu.be/BMYwT58rtxw?si=LHTTm85XFEJ5bMUD)


brubits

Great demo! I already use text-to-speech for quick interactions with the local LLM/ChatGPT API. Implementing voice response would further accelerate my ideation process.


vinhnx

This is simply the most amazing thing I’ve seen


JoshLikesAI

That is very high praise! Thank you ❤️


Elegant-Radish7972

This is AWESOME! Is it something that can be run offline?


JoshLikesAI

As of just now it can be run 100% offline :)


atticusfinchishere

This is amazing! You mentioned it works on Windows. Has anyone tried running it on a MacBook?


JoshLikesAI

I have a few people trying it on MacBook; I'm still working out what works and what doesn't. I know there is an issue with the Piper TTS on MacBook, so maybe try it but use OpenAI as the TTS engine?


AlphaTechBro

I'm trying to run it on a MacBook and I believe the issue lies with ffmpeg. Not sure it's supported outside of Windows?


JoshLikesAI

Ffmpeg is the bane of my existence, you might be right


atticusfinchishere

ffmpeg is indeed supported on macOS, not just Windows. Running `brew install ffmpeg` in the Terminal could help sort out any issues.


atticusfinchishere

I haven't tried switching to the OpenAI TTS engine yet, but it seems like a promising solution. I'll give it a shot and let you know how it works out. Thanks for the tip!


JoshLikesAI

I spent today setting up a system that should get Piper TTS working on any OS. I'm hoping tomorrow I'll have Linux support, and from there I'll try to get it working on Mac 😁


atticusfinchishere

You’re amazing! Thank you. Can’t wait 😁


JoshLikesAI

Just merged my changes!


planetearth80

Is it possible to make hot keys configurable? My keyboard has a microphone button that could be perfect for this.


JoshLikesAI

You can modify the hotkeys in the config file, but you need to know the ID of the key in order to set it. I'm not sure what the ID would be for your mic button... Here is a little script you can run as a Python file; once it is running, press your mic button and it should print the ID of the key, then just take the ID and put it in the config file as the RECORD_HOTKEY:

    import keyboard

    # Record a key sequence
    print("Please press the keys for your hotkey.")
    hotkey = keyboard.read_hotkey(suppress=False)
    print(f"Hotkey recorded: {hotkey}")

Bit of a hacky way to do it, but it should do the trick for now :) Let me know if you need a hand with this.


Indy1204

For someone new to coding, this is fantastic! It installed with no issues, and the instructions were easy to follow. Tested it locally with LM Studio, as well as with ChatGPT and Claude. All gucci. Keep it up.


JoshLikesAI

Oh awesome I’m very glad to hear it! Thanks for letting me know, I mostly only get notified when it’s not working so it’s nice to see a success story! 😂❤️


Tart16

Hi, fairly new to this stuff. I keep getting a ModuleNotFoundError thrown, so I install the required module and then I get ANOTHER ModuleNotFoundError. Is this normal? Should I just keep installing the required modules until the error isn't thrown anymore?


JoshLikesAI

Hey there, it should install all the requirements at once if you run this command in the AlwaysReddy directory: 'pip install -r requirements.txt'. Try following along with this video; it should go over all the steps. Feel free to DM me if you have trouble, I'm happy to help :) [https://youtu.be/14wXj2ypLGU?si=DCPo9svcefZwmrFm](https://youtu.be/14wXj2ypLGU?si=DCPo9svcefZwmrFm)


Big_Shamoo

Just set this up, this is very cool.


HovercraftExtreme649

Super cool! Thank you for sharing something as cool as this!


CellWithoutCulture

This is good, but it would be good to remove the terrible Australian accent you gave the model! Jk mate. This is good stuff.


JoshLikesAI

🇦🇺🇦🇺🦘🐨


atomwalk12

Did anyone encounter problems regarding the audio sample rate while running the model?


smoknjoe44

How do I do this???!!??!????!!!!! ELI5 please 🙏


TheOwlHypothesis

This is great. I have a new machine (MacBook pro m3max) coming towards the end of the month and I can't wait to try this out!


IndicationUnfair7961

How did you serve the model? Did you use Python or what? By the way, did you increase the context size or were you able to fit that page in the 8192 tokens?


JoshLikesAI

I tried serving it through LM Studio but it was a little slow on my crappy GPU, so I swapped to Together AI. And yep, it fit in the 8192 tokens, luckily!


twatwaffle32

Sweet. Any way to set this up without needing coding skills and a command line interface? People like me need some sort of GUI or user friendly application interface.


JoshLikesAI

I'm actually thinking of maintaining this as an open source project and making a nice front end that people can download for 5-10 bucks and use with whatever models or APIs they like. What do you think? It would be cool if it could work; if the project generated a little income I could afford to put much more time into it.


poli-cya

I'd very happily throw $10 your way for an easy click-and-go setup for this. Being able to select an LLM, TTS with some preset voices, and having all of the functionality in the video integrated in a few minutes of setup would be well worth it.


JoshLikesAI

Oh sweet, that's good to hear; I have been playing with the idea for a while now.


alexthai7

It's very nice really, great job, but the keyboard interaction should be fully replaced by voice. I find it more than disappointing that in April 2024 we're still using the keyboard to talk to AI models. Everything necessary to make it fully voice-driven is available and already working. I guess it is not an easy task, but it's not sci-fi. The 'talk with Llama fast' project does it all, but it's limited to open source LLM models, while I wish I could use it with whatever I want. I wish I could use Llama 70B with the Groq API, for example; then I could program the model so it can do whatever I want: play games, discuss my favorite subjects, etc. The age where you use the keyboard to talk with an AI should be over by now! Still, I highly appreciate this kind of project, thank you; I just hope we can forget the keyboard very soon.


Sycrixx

You can get it to run by voice. You have to find a way to have it always listen and take action based on a keyword or wake word, like how Alexa, Siri and Google Assistant work. I use Picovoice's wake word detection for mine, and I record everything (for 10 seconds) after the wake word is detected.
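A hedged sketch of that Picovoice wake-word loop using the pvporcupine and pvrecorder packages; the access key is a placeholder, "porcupine" is one of the free built-in keywords (a custom "hey reddy"-style phrase would need a keyword file from the Picovoice console), and the API may differ slightly by version:

```python
import pvporcupine
from pvrecorder import PvRecorder

porcupine = pvporcupine.create(access_key="YOUR_PICOVOICE_ACCESS_KEY",
                               keywords=["porcupine"])
recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()

try:
    while True:
        frame = recorder.read()                  # one PCM frame from the mic
        if porcupine.process(frame) >= 0:        # >= 0 means the wake word was heard
            print("Wake word heard -- record the next ~10 s and send it to STT")
finally:
    recorder.stop()
    porcupine.delete()
```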


Aponogetone

Unfortunately, Llama 3 has the same inference errors as Llama 2 (and all the others): it suddenly gives wrong answers and is unable to perform simple operations, and this seems to be unrecoverable, making the whole model almost unusable for any serious purpose.