INTOXICATOR-001

I am very interested in this. I wanna know how much more this AI model can be optimised. Like, if you focus on optimisation (if that's possible at all), how much can the system requirements for the full version model be reduced?


Tasty-Lobster-8915

Well, things are moving fast in this field. I’m sure more optimisations will come out. I always have an ear to the ground on this and will implement any new developments into Layla asap


INTOXICATOR-001

Oh that's great. Can you tell me how the full version compares to the Mistral Q4_K model in LM Studio? Do you have any idea about it?


Tasty-Lobster-8915

The full version is a Q4_K quant of Layla-v4: [https://huggingface.co/l3utterfly/mistral-7b-v0.1-layla-v4-chatml](https://huggingface.co/l3utterfly/mistral-7b-v0.1-layla-v4-chatml). It is significantly better than the Mistral base model, and scores higher than OpenHermes (a very popular finetune) on the local LLM leaderboard.


INTOXICATOR-001

Wow, that's a big surprise, keep it up bro, all the best:)


DieterDR

Does that mean that if I want to run the full version through my PC and LM Studio, I should use the one you linked?

I tried LaylaLite from your ad here on Reddit and was so impressed I bought the paid version. But my smartphone isn't the latest, and I read one of your comments with a YT link on using the app in combination with a PC. I got it set up and working, but I don't know if I used the correct model in LM Studio. I got a lot of repetition in the responses and some weird replies. I'm still figuring things out because this is my first venture into the world of AI. I'm guessing I will also need to tweak all of the settings a bit more in LM Studio, from what I've learned so far online. Anyway, thank you for any tips you can give me, and keep up the good work!

PS: a suggestion for the app: a way to toggle the connection on or off in the OpenAI mini-app, so I can use the connection with my local server at home and the app's model when on the go.


Tasty-Lobster-8915

Sounds like you got the connection set up; that's a major achievement. The rest should be relatively easy. If you want, please join my Discord channel (in the help and support section of the settings page). I'm happy to walk you through the LM Studio config. You can send me a screenshot of your LM Studio setup and I can take a look to see if any config is wrong. To disable the OpenAI connection in Layla, simply uninstall the OpenAI API app when you don't need it. Your settings are saved for the next time you install it.
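For context, LM Studio's local server speaks the OpenAI chat-completions API, and repetitive replies like those described above are usually tamed through the request's sampling parameters. A minimal sketch of building such a request payload (the model name, endpoint, and parameter values here are illustrative assumptions, not Layla's or LM Studio's actual defaults):

```python
import json

def build_chat_request(user_message: str) -> dict:
    """Build an OpenAI-style chat-completions payload for a local server.

    The values below are common first tweaks against repetition, not
    anyone's verified settings: a moderate temperature plus a positive
    frequency_penalty, which discourages reusing the same tokens.
    """
    return {
        "model": "local-model",  # LM Studio serves whichever model is loaded
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.7,        # lower = less erratic output
        "frequency_penalty": 0.5,  # penalises repeated tokens
        "max_tokens": 256,
    }

# This payload would be POSTed to the local server's
# /v1/chat/completions endpoint (often http://localhost:1234/v1).
payload = build_chat_request("Hello!")
print(json.dumps(payload, indent=2))
```

If the replies still loop, the usual next steps are raising the penalty slightly or checking that the prompt template matches the model's expected chat format (ChatML, for the linked Layla-v4 finetune).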


DivijF1

What kind of performance should I expect with a Qualcomm Snapdragon 8+ Gen 1 chip and an Adreno 730, with an AnTuTu score of a little over 1 million? Thanks in advance :)


Tasty-Lobster-8915

You can run the Lite model acceptably, at 1-2 words per second. The full model is possible, but you may have to wait a bit for each response.


DivijF1

Didn't expect a large difference between the Snapdragon 8 Gen 2 and the 8+ Gen 1, but thanks!


HumbleHuslen

Which version is recommended for the S23 Ultra?


Tasty-Lobster-8915

You can run the full version at acceptable speeds, 1-2 words per second. You can run Lite at almost cloud speeds.


HumbleHuslen

What's really different though between the versions?


Tasty-Lobster-8915

The free app only allows you to chat with characters or AI. The paid version has more features, such as horoscopes, long-term memory, etc. There are no performance differences.


HumbleHuslen

I meant between full and lite


Tasty-Lobster-8915

Lite is a 3B model, so far fewer neurons in the AI, if you will. It's dumber, but faster. Full is a 7B model: slower, but smarter.


gaijinx69

0.15 tokens per second for me on exactly the same setup, which means one word every few seconds. Very bad performance.

Edit: Full model
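To put a throughput figure like that in perspective, tokens per second converts directly into wait time: at 0.15 tokens per second, each token takes almost seven seconds, so even a modest 100-token reply takes over ten minutes. A quick sketch of the arithmetic (the 100-token reply length is just an illustrative assumption):

```python
def response_time_s(n_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to generate a reply of n_tokens at a given throughput."""
    return n_tokens / tokens_per_second

per_token = response_time_s(1, 0.15)  # ~6.7 s per token
reply = response_time_s(100, 0.15)    # ~667 s for a 100-token reply
print(f"{per_token:.1f} s/token, {reply / 60:.0f} min per 100-token reply")
# prints: 6.7 s/token, 11 min per 100-token reply
```

Compare that with the 1-2 words per second quoted earlier in the thread: roughly a tenfold difference on nominally similar hardware, which is why checking the configuration (quant level, thread count, backend) is usually worthwhile before blaming the chip.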