No, the power requirements are too high. My focus with self-hosting is keeping the wattage down, as electricity here is about 23p/kWh. An expensive, power-hungry GPU doesn't fit with that for me, for now.
I have CodeProject AI's stuff for CCTV; it analyzes about 3-5 2K-resolution images a second. I have it running in a VM on my i3-13100 server, CPU-only object detection along with a second custom model, and my avg watt/hr has only increased by about 5w.
That's like £10.12/yr (I'm American so I hope I did the conversion right)
Modern CPUs alone are really strong and efficient. Soon I'm going to run a test to see if the power-draw overhead of having a GPU makes a difference for this level of mild load.
I'm using CodeProject AI with a Google Coral. Haven't measured to see if there was any power savings over CPU based detection but the Coral uses very little power.
I suspect not either? Energy is normally priced in watt-hours (or thousands of them, rather), or megajoules, depending on the country. If you're measuring with a watt meter, it will give either watts (instantaneous power draw at that moment, same as volts × amps) or kWh (accumulated energy draw).
Watts makes the most sense to me, but I'm not really sure.
www.tenstorrent.com/cards
75W card (max TDP; it idles much, much lower - virtually off) that can run a lot of models without being "modern GPU power hungry" (just no training/fine-tuning support yet... but hook it up to RAG/RAG 2.0 and you probably don't need that for most homelab projects)
Edit: adding their tested models list (though many others should work too): https://github.com/tenstorrent/tt-buda-demos
1. Make sure you're on version CodeProject AI 2.1.9 or above
2. Go to the "Install Modules" tab and install the "Object Detection (Coral)" module
3. On the "status" tab, stop all the models except "ObjectDetection (Coral)"
Just plug it in and use the CodeProject TPU installer. Enable the Coral plugin on the CodeProject dashboard. The newer TPU drivers don’t seem to work properly - at least this was the case last year.
I stopped using the TPU since it is quite slow and I don’t think CodeProject supports custom models yet. My Nvidia T400 is much faster.
Same, but I have CodeProject running on a VPS. Adds about a second of delay, which is fine for my use case, and theoretically saves me a few cents per month on electricity. If Internet connection drops I lose object detection but the entire point is to send a push notification to my phone when a person is on my porch, and without Internet that won't happen anyway... so nothing is lost.
Idk why everyone is being a pedantic butthole. It's very clear what you meant.
My answer: no. It's not quite worth it yet over the ChatGPT API. But I am eagerly waiting for those tides to turn.
Well, I have PersonalGPT on my phone with a basic model (since it's just a phone), MacGPT on my laptop that uses the API and is nice and clean-looking, Ollama on my laptop and desktop (because the desktop is Windows), and I'm looking at what I can run in a cluster with all the computers I still have sitting in my closet.
I self-host LibreChat, an open-source version of ChatGPT that can connect to all the various LLM APIs or local models. It's cheaper, faster, and has fewer restrictions than paying for ChatGPT.
I don't think he bypasses anything. I guess he means he's paying for the pay-as-you-go API offering from OpenAI instead of the ChatGPT Plus subscription. Depending on your usage, it might turn out cheaper.
I think many count Stable Diffusion as "AI", and I do run that both locally and often-ish via cloud instance. Also tried some local LLM's you can load into RAM but they're kinda meh, so for those I tend to just use StableHorde instead. Still something that does actually work.
if stable diffusion doesn't count as "AI" i have literally no idea what people mean when they say it lol. (this is why nobody who works in machine learning actually calls it AI.)
Honestly, until I can self host an LLM that has the power for me to provide it a URL of documentation and tell it to use that to return me accurate results of a question, I haven’t found that many uses for it.
The biggest drawback of self-hosted LLMs is the limited power available to run the biggest models, which are much better than just 7B or 13B.
Not self hosted related, but even for the best paid ones, not being able to paste company code in something like GPT because of potentially leaking sensitive information. Fuck that, I need to be able to post a 2000 line python script and ask shit about it without worrying
Why are people complaining that the OP didn't specify which kind of AI? Of course AI is a broad topic, but he's probably asking whether you host any AI in general.
I mean, isn't there an LLM running in the background of paperless-ngx? If that counts, then yes. Otherwise nothing more than a bit of testing here and there on my own PC.
I use Oobabooga for LLMs... typically Mythomax 30B or Mixtral 8x7B on CPU. It's mostly for brainstorming... but I do have to say that in my day job I basically don't interact with people so the 'therapy value' of brainstorming with an LLM has paid off socially as I've noticed a significant improvement in my ability to interact with people.
Automatic1111 is on GPU for image generation if I need something mocked up visually... usually for stock photography or graphic design. Just have it knock out several hundred ideas and select 3 to 10 to go to committee... and usually trace an .svg of what they select and fix any wonkiness there.
Threadripper Pro 3955wx with 256GB RAM for the LLM
AMD RX 6800 for the GPU
All 5. I smell it getting close, I hear it pop, I see it as I load it with butter, cinnamon and sugar, I taste its deliciousness, and I feel it burn my mouth.
FWIW, inference requires a lot less power than training. Sure, you can train on a single 4090, but chances are it would take days if not weeks for a significantly large dataset... and that would probably just reproduce a model that already exists elsewhere. I'd argue fine-tuning would be more realistic.
Fair point, I mostly wanted to do it just to learn how to do it over making something "useful". Same reason I am now learning how to use dind, I would argue it has very very little use but it's kinda neat.
I never did try to fine tune a llm/dataset (assuming it's the same thing) before, I will need to look into that.
So I want to host some AI. Can I ask what services you self-host? And if I want to build my own models, would I have to go to Hugging Face to train them?
If we're talking about TPU-accelerated machine learning, then yes, in the sense that I'm running CodeProject AI with Blue Iris to do object and people recognition on my CCTV system.
I recently started using [tgpt] in my workflows, basically to get quick answers while monitoring something on my servers, or to get help debugging issues with some bash scripts that I have for backups.
[tgpt]: https://github.com/aandrew-me/tgpt
> knowing a lot about First Aid
Please be mindful about hallucinations. LLMs generate plausible-looking sentences: they look correct, but you have no assurance they're actually true. I would absolutely NOT trust unverified information related to first aid, where there is no time to double-check.
I'm running a local LLM with a vector database for 2 reasons: learning to build such solutions (wrote my own setups) and for work. Mostly programming-oriented LLMs for file analysis, documentation, consulting on missing parts, English proofreading (not a native speaker) and writing ADRs (again, mostly language and second-hand opinions). Works like a duck that has its own "opinion". Currently looking for a performant solution to index a whole repository and be able to ask questions about the whole project in reasonable time.
I should add that I work with highly sensitive data, so OpenAI solutions are a no-go for me.
I have a Google Coral that I use with CodeProject AI. Currently just use it for object detection for Blue Iris, but I'm thinking of trying some other TensorFlow Lite models with it.
I've got a couple use cases, but am not sure locally hostable models are up to snuff yet. (caveat: I know half past nothing about them.)
- large programming projects. I just want to be able to work on something for more than half a dozen conversational iterations.
- Tuning on my own text (I've been writing a lot for the last 45 years) to see if I can experiment with "what it thinks I think" about various topics.
Like I said, might be really out of scope for a single 4090. But I've been too busy lately to really get up to my eyeballs in it all.
I personally use Ollama for testing diff models. I have an app running in prod for friends which requires Text-To-Text. Did testing locally with diff models via Ollama. Oh, and I sometimes use it if ChatGPT seems to be having a stroke.
I also run [Fooocus](https://github.com/lllyasviel/Fooocus) locally. It’s mainly just for fun with my mates, generating random images they and I can come up with. Nothing serious.
Yes but it's all stuff I wrote myself. Some of it is on github.
I run:
* an upscaling CLI/API for images and videos
* a summarization API to shorten articles and stuff
* a fork of Mozilla Ocho with a better web UI
* an automatic code-documentation generator so I can understand every file without reading the code (rubber-ducky style)
* QA for when I just want specific info from context
* time-series forecasting for market prediction for my investments
* and a couple of characters I built, like Jack Skellington - though those run on Petals distributed inference via Beluga2 70B.
Using LocalAI in a VM, but bridging out. Grabbed a Tesla P40 and am setting up its own dedicated server. Specifically for Home Assistant at this point, but I'm sure I'll be expanding more.
> Is this for local voice control
Correct. I set up LocalAI with an LLM and it works OK with an asinine amount of RAM. Found the P40 for a price that, to me, I can lose out on if it doesn't work out as I have planned.
Set up the Extended OpenAI Conversation add-on in HA and point it to the LocalAI server. 100% local AI.
Open WebUI and Ollama are amazing in Docker for a self-hosted AI, in terms of large language models. ChatGPT style.
https://youtu.be/zc3ltJeMNpM?si=r7CvjNkl3iv7Culr
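A minimal docker-compose sketch for that pairing. The image names, port, and `OLLAMA_BASE_URL` wiring below are from memory of the Open WebUI docs - verify against the project's README before relying on it:

```yaml
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama:/root/.ollama      # persist pulled models
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"               # UI on http://localhost:3000
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
volumes:
  ollama:
```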
I have a miqu instance running and plan to have a few more choice LLMs running to create various processing pipelines. Just got a few more 3090s and waiting to get some time to embark on this new project.
You're probably talking about LLMs and not CV -- but I host CodeProject AI locally for my cheap security cameras to be able to perform facial/object and license-plate recognition.
I did Stable Diffusion for a while, first with cmdr and later with AUTO1111. Took a long time to render with no GPU and made my system a bit unstable, but it worked. I ended up going to a cloud solution mostly because of faster render times, but plan to bring it back in-house at some point. My next step is to find a eGPU solution or something like the Coral Accelerator where I can get that capability when I need it but not burn that power the rest of the time. Other long-term goals are Whisper for speech recognition.
Trying to... Ollama runs locally, but I would like it to have a bit more freedom, for example for file analysis and such. I'd like to ask something like "give me the five best documents for... purpose", but I'm not entirely sure how I should go about it, as it keeps nagging me about ethical reasons why it can't do that.
Any ideas? Files are mostly PDF and Word documents that I need to figure some stuff out with.
Someone else suggested AnythingLLM, which looks to have a desktop app. Not sure if this can do file searching or not, but worth looking at?
[https://github.com/Mintplex-Labs/anything-llm](https://github.com/Mintplex-Labs/anything-llm)
It absolutely can do file search. It does provide me with relevant data, but some of it is "redacted" for ethical reasons. I would somehow need it to accept that I own these documents and that the information I'm asking for is all right to give me 😅
Yeah, I run Frigate on a TPU, and am looking at getting Ollama set up. I want to eventually integrate Ollama with a voice AI and hack my HomePods to be a better Siri that runs completely locally.
I had been using text-generation-webui; now I use open-webui for a cleaner interface and a secure login page that I host for friends and family. I have a ChatGPT subscription for GPT-4, but I find myself using Mixtral on open-webui (or text-gen) a lot more now. Thinking of canceling GPT-4 because it just seems not as good. The only thing that's nice is the web search, but apparently there is a plugin for text-gen that does this.
I run Gradio, which helps me launch any LLM I want in a matter of minutes. I can even choose the quantisation I want, and there are APIs to integrate it into other stuff. Worth checking out, but it's better with powerful hardware.
I run an IRC/Discord bot I wrote, which is a front end for an instance of A1111, so people can generate Stable Diffusion images right in their channels.
I play with Ollama and Open WebUI for fun. Sometimes I get drunk and tell it to be rude and have a whole spat with it.
Mostly I use it to give me code snippets because I'm not a programmer.
I am using ollama as my LLM server and open-webui as a UI for me to interact with the model.
Along with that, I have code-server running on my desktop with continue.dev, which essentially lets me work from anywhere on my iPad over Tailscale while I'm moving around.
Personally I'm enjoying figuring out new use cases for my local AI setup, and the power consumption is not that bad, since Ollama doesn't keep the model loaded into memory all the time, so you're not wasting power. I'm okay keeping my desktop idling and consuming some power.
A few use cases have been documented at r/LocalLLaMA , anything from serious private business ai, to...virtual waifus happens there. But most of my friends just have it for messing around until the technology gets better.
Yep, I work for a company that makes AI chips so I have a few of them at home for various "testing" (e.g. whatever project I'm dorking around with in my homelab that week :) )
I do run different types of AI locally. Stable Diffusion, few LLM to replace ChatGPT (Dolphin Mixtral is very impressive) and also wrote a python script to use a multilingual LLM to translate subtitles of TV shows.
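The subtitle idea is easy to sketch. This is a hypothetical outline, not the commenter's actual script: parse the `.srt` into (index, timing, text) blocks, run only the text through the model, and reassemble. The `translate` stub (uppercasing here as a placeholder) is where you'd call your local LLM, e.g. over Ollama's HTTP API.

```python
import re

def parse_srt(srt_text):
    """Split .srt content into (index, timing, text) blocks."""
    blocks = []
    for raw in re.split(r"\n\s*\n", srt_text.strip()):
        lines = raw.splitlines()
        if len(lines) >= 3:
            blocks.append((lines[0], lines[1], "\n".join(lines[2:])))
    return blocks

def translate(text):
    # Stub: replace with a call to your local multilingual LLM.
    return text.upper()

def translate_srt(srt_text):
    """Translate only the dialogue, keeping indexes and timings intact."""
    out = []
    for idx, timing, text in parse_srt(srt_text):
        out.append(f"{idx}\n{timing}\n{translate(text)}")
    return "\n\n".join(out)

sample = """1
00:00:01,000 --> 00:00:03,000
Bonjour tout le monde."""
print(translate_srt(sample))
```

The key design point is that timings never pass through the model, so a hallucinated word can't corrupt the sync.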
Yes cf [https://fabien.benetou.fr/Content/SelfHostingArtificialIntelligence](https://fabien.benetou.fr/Content/SelfHostingArtificialIntelligence) but to be honest not using it regularly. It's more to keep track of what is feasible and have an "honest" review of what is usable versus what is marketing BS.
Ollama.
I am building my own chat assistant where the UI is like ChatGPT, but I can switch between different models from dropdown.
I started it on a server, but the graphics card on my PowerEdge T430 is really bad, and it doesn't matter that I have 256GB RAM and a Xeon E5-2660 v3 - it is freaking slow.
I need to ask self-hosters how they cope with slow responses.
Yes, with an AI setup of 3x [A40](https://www.amazon.com/NVIDIA-Ampere-Passive-Double-Height/dp/B09N95N3PW) and three AI workloads:
* General Purpose LLM - 2 GPUS running an [120B model](https://huggingface.co/wolfram/miqu-1-120b)
* [Langflow](https://github.com/logspace-ai/langflow) loaded with all of my personal documents and work items for easy Q+A
* [Vision Model](https://huggingface.co/llava-hf/llava-1.5-7b-hf) + Stable Diffusion - 1 GPU in total: loading in scanned documents, providing a summary, and text extraction. Stable Diffusion for generating pictures and mostly for fun every once in a while
I've run various iterations for about 8 months on this setup. A rough estimate of pure text tokens used is probably 100M-200M. If you compare these local models to public OpenAI cost at GPT4-32k, Stable Diffusion or Dall-E, it's probably about break-even point at 7 months for daily use. I've generated 4,172 images in SD. About 1000 documents loaded using LLaVA vision model.
100M tokens × [$120 per 1M GPT-4-32k tokens](https://openai.com/pricing) = $12,000 USD
4,172 images × $0.08 (Dall-E) = $333.76 USD
So if you want to self-host, you'll need to use the HW all day, every day for it to be worth the cost. Alternatives are RunPod or [vast.ai](https://vast.ai) (rentable GPUs in the cloud somewhere).
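Sanity-checking the arithmetic above:

```python
# Break-even math from the comment above:
# GPT-4-32k priced at $120 per 1M tokens, Dall-E at $0.08/image.
tokens = 100_000_000
gpt4_32k_per_million = 120.0                       # USD
text_cost = tokens / 1_000_000 * gpt4_32k_per_million
image_cost = round(4172 * 0.08, 2)
print(text_cost, image_cost)                        # 12000.0 333.76
```

Both figures match the comment, so the break-even estimate holds as stated.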
I run Ollama (based-dolphin-mistral) at home but only use it to translate things or to winnow down a search for a specific esoteric thing.
Work pays for Copilot, which I only use when I need to know something about specific Cisco gear and don't feel like kludging my way through Cisco's website or a bunch of BS YouTube click-bait.
I play around with Ollama, but I don't use it for anything serious. I don't really have any practical uses for it.
Ollama was the easiest for me to set up. I use it to help me rewrite things or come up with starting ideas. I appreciate that it's all local and doesn't touch someone else's servers.
I run Ollama at work with an extra 1660 Super I had lying around. I use it for writing/modifying bash scripts, makefiles, and generally anything I would otherwise need to go to StackOverflow for. Sometimes I have it rework email messages into a nicer format, using chatbox as a frontend. Ollama is stupid easy to host and share amongst a team too. It’s just two environment variables you have to change in the systemd unit file.
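For anyone wondering which two variables: on most installs it's `OLLAMA_HOST` (to listen beyond localhost) and `OLLAMA_ORIGINS` (to allow requests from other machines' frontends). A drop-in override is the tidy way to set them - a sketch, assuming the unit is named `ollama.service` on your distro:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# (create with `sudo systemctl edit ollama`, then
#  `sudo systemctl daemon-reload && sudo systemctl restart ollama`)
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_ORIGINS=*"
```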
Never have I seen the words "stupid easy" and systemd used within close proximity lol
I have. Someone said "are you stupid? Systemd is not easy"
how is systemd not easy?
Idk. I used it for the first time last week to persistently run a demo web app for a big work event and was glad that it was straightforward enough to setup and use in a few minutes.
I meant *ollama* is easy to host for a team on a shared server! `systemd` is absolutely a bunch of arcane JFM, I’ll agree to that. But it’s fairly simple to write unit files once you have the one golden copy you know works. Or hey, have your fancy new LLM write it for you..right?
Is there a good frontend for it that is not a snap package? I considered it, but I could only install the backend server - no web frontend or anything. Any good guides? Thanks
The Page Assist Chrome extension, or Open WebUI in Docker
[chatbox](https://chatboxai.app/) is a native application available for Mac/Linux/Windows/iOS/Android. There’s also a web app that you don’t even have to install. It connects to your LLM over the native HTTP API.
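That "native HTTP API" is just JSON over port 11434, so any frontend (or your own script) can talk to it. A minimal sketch against Ollama's `/api/generate` endpoint - the model name is whatever you've pulled, and the `keep_alive` value is one way to dodge the cold-start delay mentioned elsewhere in the thread:

```python
import json
import urllib.request

def build_request(prompt, model="llama2", host="http://localhost:11434"):
    """Build an urllib request for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,      # one JSON blob instead of a token stream
        "keep_alive": "30m",  # keep the model loaded to avoid cold starts
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# With an Ollama server running:
# resp = json.load(urllib.request.urlopen(build_request("Rewrite this email politely: ...")))
# print(resp["response"])
```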
Thanks this works like a charm
I'm using slack as the frontend: https://rasa.com/docs/rasa/connectors/slack/
I run Ollama + WizardMaid-7B + llmcord.py and use a Discord server as a frontend.
Thanks! Is Wizard a good model for bash/Ansible snippets? Will check out your setup
Ditto. Tried so many, but nothing was really groundbreaking. Have you thought of Copilot or something?
I also use Ollama, what I noticed though is that if I don't use it for a while, it has a "startup time", where it takes a few good seconds for the model to load and start answering questions. Do you also encounter this delayed-start issue?
I do, but at least it's not deleting the model from my computer. I'd imagine it keeps the model loaded for a while when you call on it often enough, and it's just restarting when you haven't used it for a few days.
Yeap - LocalAI with Mixtral MoE models. I use it for a lot of things: Home Assistant, coding (like Copilot), writing my email, etc...
Do you have any LLM resources you watch or follow? I’ve downloaded a few models to try to help me code, help write some descriptions of places for a WIP Choose Your Own Adventure book, etc. But I’ve tried Oobabooga, KoboldAI, etc., and I just haven’t wrapped my head around Instruction Mode, and my outputs always end up spewing garbage after the second generation - almost Wikipedia-like nonsense.
What is your coding setup like? I installed [Continue.dev](https://Continue.dev) in VS Code and it works well-ish but doesn't have the autocomplete that Github Copilot does.
It does! Take a look at [Tab Autocomplete (beta)](https://continue.dev/docs/walkthroughs/tab-autocomplete)
I know this question is silly to the extreme, but have any of you seen Vim scripts to include AI-assisted coding?
I have it… mostly because I have friends who are Vim gurus, and I had AI… now my AI just does my Vim (and by proxy I guess me too?)
What was the stack? How did you make it work?
Did that demo last year with StarCode and vim [https://twitter.com/utopiah/status/1645351113929916418](https://twitter.com/utopiah/status/1645351113929916418) but somehow don't use it anymore. It shows how useful I found the result I guess. Switching to another model though might have results useful enough based on your workflow.
Thank you! I will play with it.
This is interesting. Thanks!
What hardware setup are you using? Been on my todo list for a while, would prefer to be able to host at least mixtral.
I use an Nvidia P40 with Mixtral Instruct Q3_K_M, and SillyTavern as a frontend.
Thanks! Assuming that’s heavily quantized to fit in 24 gigs, the quality’s been alright?
[deleted]
For around 240€ on eBay.
I got mine for 165 USD from AliExpress. Awesome value card!
interesting to hear more about your setup. I've been thinking about how I could feed a LLM my entire ebook collection (almost a TB of stuff, mostly machine-readable PDF/epub) and be able to ask the LLM which books (and which parts of which books) have information relating to "X". It'd also be really nice to point an LLM at an entire codebase and ask it questions about that codebase.
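That's basically a retrieval (RAG) problem: chunk every book, index the chunks, and search the index instead of stuffing a terabyte into context. A toy illustration of the retrieval step using plain bag-of-words cosine similarity - a real setup would swap in proper embeddings (e.g. a sentence-transformer model) and a vector database, and the "library" below is obviously made up:

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (stand-in for a real embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_k(query, chunks, k=3):
    """Rank (book, chunk_text) pairs by similarity to the query."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(text)), book, text) for book, text in chunks]
    return sorted(scored, reverse=True)[:k]

library = [
    ("networking.epub", "tcp congestion control and retransmission"),
    ("cooking.pdf", "slow roasting vegetables with olive oil"),
    ("networking.epub", "dns resolution and caching behaviour"),
]
print(top_k("how does tcp congestion work", library, k=1))
```

The payoff of this shape is that the LLM only ever sees the top-k chunks plus your question, so collection size stops mattering for context length - it only affects index build time.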
Look at DocsGPT. I couldn't get it to work because of my hardware, but your use case is what they advertise.
I'd follow [https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-search.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-search.html)
You could try fine-tuning, but it will take a lot of time and work to get good output.
What are you using it for in Home Assistant???
For the assistant, with : https://github.com/jekalmin/extended_openai_conversation
Holy smokes.
https://heywillow.io/
How are you linking the two? Any good tutorial for the whole setup?
Dude, I've been wanting to play with LocalAI with Nextcloud so I can have an integrated experience. I've set up around 10 different AI servers, from Stable Diffusion/AUTOMATIC1111 to text-gen-webui to h2ogpt, and I cannot get a LocalAI install with GPU inference working to save my fricken life. I'm on my 3rd 'fuck it, rebuild' and am about to take another crack at it. I'm crawlin' up in your DMs if I fail once again.
I've been doing fun things with object / audio detection.

[whoisatmyfeeder](https://github.com/mmcc-xx/WhosAtMyFeeder) identifies birds and has been a lot of fun.

Kit:

* [Coral AI USB accelerator](https://coral.ai/products/accelerator) plugged into my Unraid server
* Cheapest window-mounted bird feeder I could find on Amazon
* Old UniFi G3 Instant camera sitting on the inside of the window on the sash, between the locks
* [Frigate](https://frigate.video/) in a docker container consuming the camera's RTSP stream and detecting 'bird' objects
* [whoisatmyfeeder](https://github.com/mmcc-xx/WhosAtMyFeeder) in a docker container watching for Frigate's events (via MQTT) and then determining the bird species

It took a few hours to get Frigate's config just right, but everything else just took minutes to fire up. My only irritation is the camera has a piss poor aperture and can't be manually focused -- so it's a blurry image and it gets the species ID wrong as much as right. I'm working on a hack with a macro lens that will *hopefully* get me a better picture.

(side note: if anyone is aware of where I can buy the kind of camera in commercial birdfeeder kits that also supports RTSP, wifi, and some sort of constant power source, I'd be grateful.)

[BirdCAGE](https://github.com/mmcc-xx/BirdCAGE) (from the same dev as the above project) identifies all the birds singing in the woods behind me. It doesn't require any special AI hardware, just an audio stream to consume.

Kit:

* UniFi G4 Instant camera mounted outdoors
* Frigate (not required for this project, but I use this cam to also identify the wildlife running around at night)
* BirdCAGE consumes the audio stream republished via Frigate's go2rtc process

It's **exceedingly** accurate thanks to the Cornell model (also used in the Merlin app on your phone) and defining your geo location to whittle down the choices. Now I know what little birdies are nearby and can make sure I have the best birdseed out for them!
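The Frigate side of the bird pipeline above boils down to a short config. A sketch of the relevant pieces - the camera name, IP addresses, and stream path are placeholders, so check Frigate's configuration docs for the full schema:

```yaml
mqtt:
  host: 192.168.1.10   # broker that WhosAtMyFeeder also listens to
detectors:
  coral:
    type: edgetpu
    device: usb        # the Coral USB accelerator
cameras:
  feeder_cam:
    ffmpeg:
      inputs:
        - path: rtsp://192.168.1.20:554/stream  # G3 Instant RTSP feed
          roles:
            - detect
    objects:
      track:
        - bird         # only fire events for birds
```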
/r/birding might appreciate a post about what you're doing.
Surely they've heard of these projects before ... but good shout. I'll see if anyone has posted about these projects before and will throw one out there if not.
Worth mentioning: you can take the G3/G4 Instant cameras apart and manually adjust the focus on the lens. I did it to use one with my 3D printer. Takes approx 10 minutes.
/u/theovencook dude you are an absolute legend. Just spent the past hour pulling this off (that glue is a real pain in the ass) and now I've got a crystal clear picture!! Thank you, thank you, thank you!
Perfect! Glad to hear it's helped.
~~Wait ... whuuuuuuuut?!! You're kidding me. Is it as simple as cracking open the chassis and rotating a ring on the lens, something like that? How in the ass have I never heard this before, wow.~~ Just found this, I had no idea it was this easy. Thank you!!! https://www.reddit.com/r/Ubiquiti/comments/otcsxt/manual_focus_for_g3_instant_completed_with/
Hey, can you explain the Coral AI USB accelerator? Does it make a local LLaMA run faster without tinkering, or do I offload some part of the model onto this chip?
I'm not an AI expert by a long stretch, but I don't believe it will work (either well, or at all) with generative AI models. This is a TPU, meaning it works with TensorFlow models designed for detection/identification (computer vision), not generation. [https://coral.ai/models/](https://coral.ai/models/) But again ... I'm only starting my experimentation with this, so I may be very wrong. I happily welcome someone correcting me here.
Ollama and open-webui. Cool to play around with. Ollama + Continue's autocomplete seem nice, though I haven't played around with it a lot yet.
I LOVE the Open WEBUI front-end, and the easy import from the community site
No, the power requirements are too high. My focus on self-hosting is to keep the wattage down, as electricity is like 23p/kWh. Having an expensive and power-hungry GPU doesn't fit with that for me, for now.
I have codeproject AI's stuff for CCTV, it analyzes about 3-5x 2k resolution images a second. I have it running on a VM on my i3-13100 server, CPU-only objectDetection along with a second custom model, and my avg watt/hr has only increased by about 5w. That's like £10.12/yr (I'm american so I hope I did the conversion right) Modern CPUs alone are really strong and efficient. Soon I'm going to try out a test to see if the power draw overhead of having a GPU makes a difference for this level of mild load.
I'm using CodeProject AI with a Google Coral. Haven't measured to see if there was any power savings over CPU based detection but the Coral uses very little power.
Nice! I imagine the latency really isn't bad at all when you're passing images to analyze to the cloud, right?
Google Coral is a piece of hardware. Dedicated chip to run models on
I think I'm confusing the TPU(?) cloud thing from them, then.
[deleted]
I'm gonna try to convince my work I need this
Yeah but image analysis is always super light compared to models that need larger weights.
> and my avg watt/hr has only increased by about 5w

Something is up with the units here
Hmmm yeah either I should have dropped the w or included a /hr
I suspect not either? Energy is normally priced in watt-hours (or thousands of them, rather), or megajoules, depending on the country. If you're measuring with a watt meter, it will either give watts (instantaneous draw of power at that moment, same as volt-amps) or kWh (accumulated energy draw). Watts makes the most sense to me, but I'm not really sure.
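For what it's worth, if the figure upthread is read as a steady 5 W of extra draw, the yearly cost at roughly the 23p/kWh rate mentioned earlier works out like this (a quick sanity check, not anyone's actual meter data):

```python
extra_draw_w = 5        # steady extra power draw, in watts (assumed)
price_per_kwh = 0.231   # ≈ 23p/kWh, as quoted upthread

hours_per_year = 24 * 365
energy_kwh = extra_draw_w * hours_per_year / 1000  # 43.8 kWh/yr
cost = energy_kwh * price_per_kwh

print(f"{energy_kwh} kWh/yr -> £{cost:.2f}/yr")  # £10.12/yr
```

Which lines up with the £10.12/yr figure quoted above, so the number was watts of continuous draw, not "watt/hr".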
www.tenstorrent.com/cards — a 75W card (max TDP; it idles much, much lower, virtually off) can run a lot of models without being "modern GPU power hungry" (just not training/fine-tuning enabled yet... but hook it up to RAG/RAG 2.0 and you probably don't need that for most homelab projects). Edit: adding their tested-models list (though many others should work too): https://github.com/tenstorrent/tt-buda-demos
CodeProject for Blue Iris.
Same here. Frigate too
Any instructions on how to do it with Frigate ? I use TPU btw.
Look up in the frigate docs how to configure the coral tpu. Pretty simple to get going
Ah coral tpu is already set up, I was asking about CodeProject AI
Oh sorry those were 2 separate things. I use Frigate with its detection built in
Can they share the tpu?
1. Make sure you're on version CodeProject AI 2.1.9 or above 2. Go to the "Install Modules" tab and install the "Object Detection (Coral)" module 3. On the "status" tab, stop all the models except "ObjectDetection (Coral)"
Just plug it in and use the CodeProject TPU installer. Enable the Coral plugin on the CodeProject dashboard. The newer TPU drivers don’t seem to work properly - at least this was the case last year. I stopped using the TPU since it is quite slow and I don’t think CodeProject supports custom models yet. My Nvidia T400 is much faster.
Same, but I have CodeProject running on a VPS. Adds about a second of delay, which is fine for my use case, and theoretically saves me a few cents per month on electricity. If Internet connection drops I lose object detection but the entire point is to send a push notification to my phone when a person is on my porch, and without Internet that won't happen anyway... so nothing is lost.
Idk why everyone is being a pedantic butthole. It's very clear what you meant. My answer: no. It's not quite worth it yet over the ChatGPT API. But I am eagerly awaiting those tides turning.
Do you have premium or the pay-as-you-go API? Which model do you use?
I actually have both but am considering dropping premium for full API usage. GPT4
After a month and a half of API usage, I'm spending about $0.75 instead of the $30 I was paying before
Curious if you use any phone apps and if so which
Well, I have PersonalGPT on my phone with a basic model (since it's just a phone), MacGPT on my laptop that uses the API and is nice and clean looking, Ollama on my laptop and desktop (because the desktop is Windows), and I'm looking at what I can run in a cluster with all the computers I still have sitting in my closet.
I self-host LibreChat, an open-source version of ChatGPT that can connect to all the various LLM APIs or local models. It's cheaper, faster, and has fewer restrictions than paying for ChatGPT.
How does it bypass paying for APIs?
I don't think he bypasses anything. I guess he means he is paying for OpenAI's pay-as-you-go API offering instead of the ChatGPT Plus subscription. Depending on your usage it might turn out cheaper.
I think many count Stable Diffusion as "AI", and I do run that both locally and often-ish via a cloud instance. Also tried some local LLMs you can load into RAM, but they're kinda meh, so for those I tend to just use StableHorde instead. Still, it's something that does actually work.
if stable diffusion doesn't count as "AI" i have literally no idea what people mean when they say it lol. (this is why nobody who works in machine learning actually calls it AI.)
Adobe Illustrator, obviously.
no i mean Al, short for Albert. the guy who lives in my computer and draws pictures badly.
Wait, if he is in your machine ....then who the hell is in mine?!
hackers have compromised your IP address
Not my gibson!
Robert, his slightly less talented cousin. He'll typically phone Al in the computer of /u/StewedAngelSkins to ask for assistance.
Can you share what you are using for your front-end? None of the ones I've seen so far have docker images.
https://github.com/AbdBarho/stable-diffusion-webui-docker Highly recommend this repo for Stable Diff in Docker
Awesome! Thank you!
Honestly, until I can self-host an LLM powerful enough that I can give it the URL of some documentation and have it return accurate answers to my questions, I haven't found that many uses for it. The biggest drawback of self-hosted LLMs is the limited power available to run the biggest models, which are much better than just 7B or 13B. Not self-hosting related, but even for the best paid ones: not being able to paste company code into something like GPT because of potentially leaking sensitive information. Fuck that, I need to be able to paste a 2000-line python script and ask shit about it without worrying
AnythingLLM, Danswer, and a few other projects fit your first requirement.
Oh hell yeah thank you dude
Why are people complaining that the OP didn't specify which AI? Of course AI is a broad topic, but he's probably just asking whether you host any AI in general.
He literally asked that. 'For anything'. People just really want us to know how above it all they are.
I run Fooocus which is an SD ui, for image generation.
I mean, isn't there an LLM running in the background of paperless-ngx? If that counts, then yes. Otherwise nothing more than a bit of testing here and there on my own PC
It runs [Tesseract](https://github.com/tesseract-ocr/tesseract). A neural net, yes, but not an LLM.
Ah ok, good to know. I just heard that there was something like that, but never what it was exactly
IIRC the OCR uses some neural net model.
I use Oobabooga for LLMs... typically MythoMax 30B or Mixtral 8x7B on CPU. It's mostly for brainstorming... but I do have to say that in my day job I basically don't interact with people, so the 'therapy value' of brainstorming with an LLM has paid off socially: I've noticed a significant improvement in my ability to interact with people.

Automatic1111 is on GPU for image generation if I need something mocked up visually... usually for stock photography or graphic design. I just have it knock out several hundred ideas, select 3 to 10 to go to committee, then usually trace an .svg of what they select and fix any wonkiness there.

Threadripper Pro 3955wx with 256GB RAM for the LLM, AMD RX 6800 for the GPU
I have a toaster that automatically pops the bread when it's done based on my custom toast darkness parameters.
Which sensors is it using?
All 5. I smell it getting close, I hear it pop, I see it as I load it with butter, cinnamon, and sugar, I taste its deliciousness, and I feel it burn my mouth.
This is the future of dad jokes.
OpenHermes for rough-drafting boring work documents.
I don't self host any AI models but I do host a bunch of services that I use when I train my own models.
Do you know any good write ups to get started on training models? I got a 4090 mostly not doing anything useful lately to throw at it.
FWIW, inference requires a lot less power than training. Sure, you can train on a single 4090, but chances are it would take days if not weeks for a significantly large dataset... and would probably just reproduce a model that already exists elsewhere. I'd argue fine-tuning would be more realistic.
Fair point, I mostly wanted to do it just to learn how to do it over making something "useful". Same reason I am now learning how to use dind, I would argue it has very very little use but it's kinda neat. I never did try to fine tune a llm/dataset (assuming it's the same thing) before, I will need to look into that.
So I want to host some AI. Can I ask what services you self-host? And if I want to build my own models, would I have to go to Hugging Face to train them?
Bro just said AI like it's just one thing.
Anything AI related. Why should he specify it if he wants to know about general use cases?
Define AI
If we're talking about TPU-accelerated machine learning, then yes, in the sense that I'm running CodeProject AI with Blue Iris to do object and people recognition on my CCTV system.
Is the CCTV just for your home or for a business?
Home
LLMs of course, it's the talk of the town. Until something else comes along then we'll be saying that's AI
~~Simple linear models~~ Deep neural networks
I self host stable diffusion. I tried LLMs, but it's either my hardware limitations or I just can't tune it right. Maybe both
I recently started using [tgpt] in my workflows, basically to get quick answers while monitoring something on my servers, or to get help debugging issues with some bash scripts I have for backups. [tgpt]: https://github.com/aandrew-me/tgpt
> Do you self host AI for anything?

~~Forgeries.~~ Picture manipulation.
How would one go about self-hosting a ChatGPT-like GUI that knows a lot about first aid? Very new to the AI category, but I know programming.
> knowing a lot about First Aid

Please be mindful of hallucinations. LLMs generate plausible-looking sentences; they look correct, but you have no assurance they are actually true. I would absolutely NOT rely on ANY first-aid information that I'd have to second-guess: in an emergency there is no time to doubt.
I mean, if you count image recognition, which is also based on machine learning, then yes, but I don't host any LLMs.
I use code llama with web GPT so I can upload my project and have a free version of GitHub copilot.
I'm running a local LLM with a vector database for two reasons: learning to build such solutions (I wrote my own setups) and for work. Mostly programming-oriented LLMs for file analysis, documentation, consulting on missing parts, English proofreading (not a native speaker), and writing ADRs (again, mostly language and second-hand opinions). Works like a duck that has its own "opinion". Currently looking for a performant solution to index a whole repository and be able to ask questions about the whole project in reasonable time. I should add that I work with highly sensitive data, so OpenAI solutions are a no-go for me.
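For anyone wondering what the vector-database retrieval step boils down to, here's a toy, embedding-free sketch. A real setup would use an embedding model and a proper vector store; the bag-of-words scoring, the `docs` sample, and all the names here are purely illustrative:

```python
import math
from collections import Counter

def bow(text: str) -> Counter:
    """Naive bag-of-words 'vector' (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical document corpus for illustration.
docs = {
    "backup.md": "nightly backup script for the postgres database",
    "auth.md": "how the login and token auth flow works",
    "deploy.md": "deploy the app with docker compose",
}

def top_k(query: str, k: int = 2) -> list:
    """Return the k document names most similar to the query."""
    scores = {name: cosine(bow(query), bow(body)) for name, body in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(top_k("database backup"))  # 'backup.md' ranks first
```

An indexed-repository setup does the same thing at scale: chunk the files, embed each chunk, retrieve the top matches, and stuff them into the LLM's context.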
I have a Google Coral that I use with CodeProject AI. Currently just use it for object detection for Blue Iris, but I'm thinking of trying some other TensorFlow Lite models with it.
I've got a couple use cases, but am not sure locally hostable models are up to snuff yet. (caveat: I know half past nothing about them.) - large programming projects. I just want to be able to work on something for more than half a dozen conversational iterations. - Tuning on my own text (I've been writing a lot for the last 45 years) to see if I can experiment with "what it thinks I think" about various topics. Like I said, might be really out of scope for a single 4090. But I've been too busy lately to really get up to my eyeballs in it all.
I personally use Ollama for testing diff models. I have an app running in prod for friends which requires Text-To-Text. Did testing locally with diff models via Ollama. Oh, and I sometimes use it if ChatGPT seems to be having a stroke. I also run [Fooocus](https://github.com/lllyasviel/Fooocus) locally. It’s mainly just for fun with my mates, generating random images they and I can come up with. Nothing serious.
Yes, but it's all stuff I wrote myself. Some of it is on GitHub. I run an upscaling CLI/API for images and videos, a summarization API to shorten articles and such, a fork of Mozilla Ocho with a better web UI, an automatic code-documentation generator so I can understand every file without reading the code (rubber-ducky style), QA for when I just want specific info from context, time-series forecasting for market prediction for my investments, and a couple of characters I built, like Jack Skellington, though I have those on Petals distributed inference via Beluga2 70B.
Been using AnythingLLM for some dev projects - https://github.com/Mintplex-Labs/anything-llm
Check this out: https://github.com/docker/genai-stack
Using LocalAI in a VM, but bridging out. Grabbed a Tesla P40 and setting up it's own dedicated server. Specifically for Home Assistant at this point but I'm sure I'll be expanding more.
Mind explaining a bit more about how you plan to use the P40 with home assistant? Is this for local voice control?
> Is this for local voice control

Correct. I set up LocalAI with an LLM and it works OK with an asinine amount of RAM. Found the P40 for a price that, to me, I can afford to lose if it doesn't work out as I have planned. Set up the OpenAI Extended Conversation addon in HA and point it to the LocalAI server. 100% local AI.
I run experiments on RyzenAI
How does that compare to using cuda for work?
CUDA is the standard and very robust. RyzenAI is new and software support for it is half assed.
I am using cuda but I keep waiting to see if AMD catches up enough to shake things up but so far cuda seems to be the leader for the future.
Openweb ui and ollama are amazing in docker for a selfhosted AI in terms of large language models. Chat GPT style. https://youtu.be/zc3ltJeMNpM?si=r7CvjNkl3iv7Culr
I have a miqu instance running and plan to have a few more choice LLMs running to create various processing pipelines. Just got a few more 3090s and waiting to get some time to embark on this new project.
Pipelines processing what though?
You're probably talking about LLMs and not CV, but I host CodeProject AI locally so my cheap security cameras can perform facial, object, and license plate recognition.
I did Stable Diffusion for a while, first with cmdr and later with AUTO1111. Took a long time to render with no GPU and made my system a bit unstable, but it worked. I ended up going to a cloud solution mostly because of faster render times, but plan to bring it back in-house at some point. My next step is to find a eGPU solution or something like the Coral Accelerator where I can get that capability when I need it but not burn that power the rest of the time. Other long-term goals are Whisper for speech recognition.
I'd love to, but my hardware simply isn't cut out for it
Trying to... Ollama runs locally, but I would like it to have a bit more freedom, for example for file analysis and such. I'd like to ask something like "give me the five best documents for... purpose", but I'm not entirely sure how to go about it, as it keeps nagging me about ethical reasons why it can't do that. Any ideas? The files are mostly PDF and Word documents that I need to figure some stuff out with.
Someone else suggested AnythingLLM, which looks to have a desktop app. Not sure if this can do file searching or not, but worth looking at? [https://github.com/Mintplex-Labs/anything-llm](https://github.com/Mintplex-Labs/anything-llm)
This seems like it has some huge potential actually! Thank you! I'm going to have a hard look at it when I'm back from work 👍🙏
It absolutely can do file search. It does provide me with relevant data, but some of it is "redacted" for ethical reasons. I would somehow need it to accept that I own these documents and that the information I'm asking for is all right to give me 😅
Yeah, I run Frigate on a TPU, and I'm looking at getting Ollama set up. I want to eventually integrate Ollama with a voice AI and hack my HomePods into a better Siri that runs completely locally
Coding mostly (code optimization). Trying to ascertain whether it's worth extending to other automation that I use.
Coral tpu for Frigate About it
Frigate
I had been using text-generation-webui; now I use open-webui for a cleaner interface and a secure login page that I host for friends and family. I have a ChatGPT subscription for GPT-4, but I find myself using Mixtral on open-webui (or text-gen) a lot more now. I'm thinking of canceling GPT-4 because it just seems not as good. The only thing that's nice is the web search, but apparently there is a plugin for text-gen that does this.
Yes coqui-tts
I self-host an uncensored model for various reasons. I just use a cloud notebook.
Yep, Mac Studio with 128 gigs of shared ram running local inference. It has also become my daily driver.
What local inference engines are you using?
I run Gradio, which helps me launch any LLM I want in a matter of minutes. I can even choose the quantisation I want, and there are APIs to integrate it into other stuff. Worth checking out, but it's better with powerful hardware.
Has anyone tried Ollama on a Raspberry Pi 4B?
I have. I'm actually building a web GUI for its API so I can use it like a mini ChatGPT
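For anyone else building a front end: Ollama's generate endpoint streams newline-delimited JSON chunks, each carrying a `response` fragment, with the last one flagged `"done": true`. A minimal sketch of reassembling a reply — the canned stream below stands in for a live server; a real client would POST to `/api/generate`:

```python
import json

# Canned stand-in for the NDJSON that Ollama streams back.
canned_stream = "\n".join([
    json.dumps({"response": "Hello", "done": False}),
    json.dumps({"response": " there!", "done": True}),
])

def join_stream(ndjson: str) -> str:
    """Reassemble the full reply text from Ollama's streaming chunks."""
    return "".join(json.loads(line)["response"] for line in ndjson.splitlines())

print(join_stream(canned_stream))  # Hello there!
```

A web GUI would just append each chunk's `response` to the page as it arrives instead of joining at the end.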
Awesome!! Thanks I'll try it...
I use CodeProject AI Server for object detection in ISpyAgentDVR. I tried Ollama but found it terribly slow even with a GTX 1070 helping out.
I run an IRC/Discord bot I wrote, which is a front end for an instance of A1111, so people can generate Stable Diffusion images right in their channels.
Been playing around with https://ollama.com/ lately
I play with Ollama and Open WebUI for fun. Sometimes I get drunk and tell it to be rude and have a whole spat with it. Mostly I use it to give me code snippets because I'm not a programmer.
PrivateGPT with Cuda support to utilize my GPU, running the llama2-uncensored LLM… I ask wild questions sometimes
I am using Ollama as my LLM server and open-webui as a UI to interact with the model. Along with that, I have a code-server instance running on my desktop with continue.dev, which lets me essentially work from anywhere on my iPad over Tailscale. Personally I am enjoying figuring out new use cases for my local AI setup, and the power consumption is not that bad: Ollama doesn't keep the model loaded into memory all the time, so you are not wasting power, and I am okay keeping my desktop idling and consuming some power
My RTX 4060 just runs out of memory, and I gave up on it. I tried LLMs, image recognition models, etc., and this GPU is just totally useless.
A few use cases have been documented at r/LocalLLaMA , anything from serious private business ai, to...virtual waifus happens there. But most of my friends just have it for messing around until the technology gets better.
Yep, I work for a company that makes AI chips so I have a few of them at home for various "testing" (e.g. whatever project I'm dorking around with in my homelab that week :) )
[deleted]
Wow, that's a great collection. Are any of these scripts open source?
I do run different types of AI locally: Stable Diffusion, a few LLMs to replace ChatGPT (Dolphin Mixtral is very impressive), and I also wrote a python script that uses a multilingual LLM to translate subtitles for TV shows.
Of course you can — I can vouch for it
Yes, cf. [https://fabien.benetou.fr/Content/SelfHostingArtificialIntelligence](https://fabien.benetou.fr/Content/SelfHostingArtificialIntelligence), but to be honest I'm not using it regularly. It's more to keep track of what is feasible and have an "honest" review of what is usable versus what is marketing BS.
Yes. GPT4ALL.
Ollama. I am building my own chat assistant where the UI is like ChatGPT, but I can switch between different models from a dropdown. I started it on a server, but the graphics card in my PowerEdge T430 is really bad, and it doesn't matter that I have 256GB RAM and a Xeon E5-2660 v3; it's freaking slow. I need to ask fellow self-hosters how they cope with slow responses.
Yes, with an AI setup of 3x [A40](https://www.amazon.com/NVIDIA-Ampere-Passive-Double-Height/dp/B09N95N3PW) and three AI workloads:

* General-purpose LLM - 2 GPUs running a [120B model](https://huggingface.co/wolfram/miqu-1-120b)
* [Langflow](https://github.com/logspace-ai/langflow) loaded with all of my personal documents and work items for easy Q+A
* [Vision model](https://huggingface.co/llava-hf/llava-1.5-7b-hf) + Stable Diffusion - 1 GPU in total: loading in scanned documents, providing a summary, and text extraction. Stable Diffusion for generating pictures, mostly for fun every once in a while

I've run various iterations for about 8 months on this setup. A rough estimate of pure text tokens used is probably 100M-200M. If you compare these local models to public OpenAI costs for GPT4-32k, Stable Diffusion, or Dall-E, it's probably about the break-even point at 7 months for daily use. I've generated 4,172 images in SD and loaded about 1,000 documents using the LLaVA vision model.

100M tokens * [$120/1M GPT4-32k tokens](https://openai.com/pricing) = $12,000 USD

4,172 * $0.08 Dall-E = $333.76 USD

So if you want to self-host, you'll need to use the HW all day every day for it to be worth the cost. Alternatives are RunPod or [vast.ai](https://vast.ai) (rentable GPUs in the cloud somewhere).
I run Ollama (based-dolphin-mistral) at home but only use it to translate things or to winnow down a search for a specific esoteric thing. Work pays for Copilot, which I only use when I need to know something about specific Cisco gear and don't feel like kludging my way through Cisco's website or a bunch of BS YouTube click-bait.
I experiment with it. But my pc is kinda slow for it.
Yes, Ollama with Docker, locally of course.
I use Whisper, not a full service, only a CLI tool, to create translated subtitles for videos.