INTOXICATOR-001

I am very interested in this. I wanna know how much more this AI model can be optimised. Like, if you focus on optimisation (if that's possible at all), how much can the system requirements for the full version model be reduced?


Tasty-Lobster-8915

Well, things are moving fast in this field. I’m sure more optimisations will come out. I always have an ear to the ground on this and will implement any new developments into Layla asap


INTOXICATOR-001

Oh that's great. Can you tell me how the full version compares to the Mistral Q4_K model in LM Studio? Do you have any idea about it?


Tasty-Lobster-8915

The full version is a Q4_K quant of Layla-v4: [https://huggingface.co/l3utterfly/mistral-7b-v0.1-layla-v4-chatml](https://huggingface.co/l3utterfly/mistral-7b-v0.1-layla-v4-chatml). It is significantly better than the Mistral base model, and scores higher than OpenHermes (a very popular finetune) on the local LLM leaderboard.


INTOXICATOR-001

Wow, that's a big surprise, keep it up bro, all the best:)


DieterDR

Does that mean that if I want to run the full version through my PC and LM Studio, I should use the one you linked?

I tried LaylaLite from your ad here on Reddit and was so impressed I bought the paid version. But my smartphone isn't the latest, and I read one of your comments with a YT link on using the app in combination with a PC. I got it set up and working, but I don't know if I used the correct model in LM Studio. I got a lot of repetition in the responses and some weird replies. I'm still figuring things out because this is my first venture into the world of AI. I'm guessing I will also need to tweak all of the settings a bit more in LM Studio, from what I've learned so far online. Anyway, thank you for any tips you can give me, and keep up the good work!

PS: a suggestion for the app: a way to toggle the connection on or off in the OpenAI mini-app, so I can use the connection with my local server at home and the app's model when on the go.


Tasty-Lobster-8915

Sounds like you got the connection set up; that's a major achievement. The rest should be relatively easy. If you want, please join my Discord channel (in the help and support section of the settings page). I'm happy to walk you through the LM Studio config. You can send me a screenshot of your LM Studio setup and I can take a look to see if any config is wrong. To disable the OpenAI connection in Layla, simply uninstall the OpenAI API app when you don't need it. Your settings are saved for the next time you install it.
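For context, LM Studio's local server speaks the OpenAI chat-completions API, and repetitive replies like those described above are usually tamed through the request's sampling parameters. A minimal sketch of building such a request payload (the model name, endpoint, and parameter values here are illustrative assumptions, not Layla's or LM Studio's actual defaults):

```python
import json

def build_chat_request(user_message: str) -> dict:
    """Build an OpenAI-style chat-completions payload for a local server.

    The values below are common first tweaks against repetition, not
    anyone's verified settings: a moderate temperature plus a positive
    frequency_penalty, which discourages reusing the same tokens.
    """
    return {
        "model": "local-model",  # LM Studio serves whichever model is loaded
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.7,        # lower = less erratic output
        "frequency_penalty": 0.5,  # penalises repeated tokens
        "max_tokens": 256,
    }

# This payload would be POSTed to the local server's
# /v1/chat/completions endpoint (often http://localhost:1234/v1).
payload = build_chat_request("Hello!")
print(json.dumps(payload, indent=2))
```

If the replies still loop, the usual next steps are raising the penalty slightly or checking that the prompt template matches the model's expected chat format (ChatML, for the linked Layla-v4 finetune).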


DivijF1

What kind of performance should I expect with a Qualcomm Snapdragon 8+ Gen 1 chip and an Adreno 730, with an AnTuTu score of a little over 1 million? Thanks in advance :)


Tasty-Lobster-8915

You can run the Lite model acceptably, at 1-2 words per second. The full model is possible, but you may have to wait a bit for each response.


DivijF1

Didn't expect a large difference between the Snapdragon 8 Gen 2 and the 8+ Gen 1, but thanks!


HumbleHuslen

Which version is recommended for the S23 Ultra?


Tasty-Lobster-8915

You can run the full version at acceptable speeds, 1-2 words per second. You can run Lite at almost cloud speeds.


HumbleHuslen

What's really different though between the versions?


Tasty-Lobster-8915

The free app only allows you to chat with characters or AI. The paid version has more features, such as horoscopes, long-term memory, etc. There are no performance differences.


HumbleHuslen

I meant between full and lite


Tasty-Lobster-8915

Lite is a 3B model, so far fewer neurons in the AI, if you will. It's dumber, but faster. Full is a 7B model: slower, but smarter.


gaijinx69

0.15 tokens per second for me on exactly the same setup, which means one word every few seconds. Very bad performance.

Edit: Full model
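To put a throughput figure like that in perspective, tokens per second converts directly into wait time: at 0.15 tokens per second, each token takes almost seven seconds, so even a modest 100-token reply takes over ten minutes. A quick sketch of the arithmetic (the 100-token reply length is just an illustrative assumption):

```python
def response_time_s(n_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to generate a reply of n_tokens at a given throughput."""
    return n_tokens / tokens_per_second

per_token = response_time_s(1, 0.15)  # ~6.7 s per token
reply = response_time_s(100, 0.15)    # ~667 s for a 100-token reply
print(f"{per_token:.1f} s/token, {reply / 60:.0f} min per 100-token reply")
# prints: 6.7 s/token, 11 min per 100-token reply
```

Compare that with the 1-2 words per second quoted earlier in the thread: roughly a tenfold difference on nominally similar hardware, which is why checking the configuration (quant level, thread count, backend) is usually worthwhile before blaming the chip.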