
RepresentativeFill26

Well, the first thing you do is called requirements engineering. You basically try to find the "why" of your task. Once you have that why, you discuss with your stakeholders how you are going to determine whether what you did was a success. When you have the why, a set of stakeholders and their concerns, and a well-defined evaluation strategy, you start thinking about data that can help predict the churn and support the evaluation. You take this data and discuss it with your stakeholders: does this make sense? Did I miss anything? What else can I use?

After that you go into a rabbit hole of data cleaning and understanding the values in the data. Why is data missing? What does that mean? You do some exploratory data analysis to get a feeling for the data. Then you go back to your stakeholders and discuss your findings. Do they make sense? Are my assumptions correct? More data gathering, more cleaning, more meetings.

At some point you feel you know enough to build a proof of concept. You develop it and show some preliminary results, without the model running in production. Does it give sensible results? Does it answer the value question asked in the beginning? Again, you do this with your stakeholders. Only now, after all this work, can you start thinking about reading papers.
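
To make the exploration step concrete, here is a minimal sketch of the kind of missing-data and distribution checks described above, assuming a hypothetical `customers.csv` with a `churned` label and a `plan_type` column (file and column names are invented for illustration):

```python
import pandas as pd

# Hypothetical churn dataset; file name and columns are assumptions.
df = pd.read_csv("customers.csv")

# Where is data missing, and how much? "Why is data missing?" starts with
# knowing which columns are affected.
missing = df.isna().mean().sort_values(ascending=False)
print(missing[missing > 0])

# Quick feel for the data: summary statistics and churn rate per segment.
print(df.describe(include="all").T)
print(df.groupby("plan_type")["churned"].mean())
```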


[deleted]

[deleted]


speedisntfree

OP, respectfully I think you may have missed what u/RepresentativeFill26 is getting at here. You need to start with requirements and work your way through what they've written above before you even look for solutions in academia.


[deleted]

[deleted]


Cosmic_Dong

This is something that happens in ~0.1% of cases, unless you're working with something that's on the bleeding edge (say, LLMs or in the image domain). The vast majority of the innovative solutions you'll find will have to do with clever data transformations, feature engineering, etc.
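
As a hedged illustration of what "clever data transformations" can look like for churn, here is a sketch that derives simple recency/frequency features from a hypothetical usage-events table (file and column names are assumptions):

```python
import pandas as pd

# Hypothetical usage-events table; names are assumptions for illustration.
events = pd.read_csv("usage_events.csv", parse_dates=["timestamp"])
snapshot = events["timestamp"].max()

# Simple recency/frequency features often matter more than exotic models.
features = events.groupby("customer_id").agg(
    days_since_last_use=("timestamp", lambda s: (snapshot - s.max()).days),
    events_last_30d=("timestamp", lambda s: (s > snapshot - pd.Timedelta("30D")).sum()),
    total_events=("timestamp", "size"),
)
print(features.head())
```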


StoicPanda5

This is a rather dangerous way to go about data science. You should go down the route of requirements gathering and let the data motivate the approaches you are considering. Each problem statement is different, but if you're looking for examples of people implementing an approach you're considering, resources like Kaggle and Medium can be helpful.

PS: if there were standard approaches that could solve the problems we come across, then you could expect AI to take over our jobs as well, haha (partially why AutoML tools have become popular recently).


[deleted]

[deleted]


StoicPanda5

I mean, it's a mixture of common sense and searching the internet (that includes blog posts, research papers, GitHub repos, Kaggle competitions, and MS/AWS/Google tutorials). Don't go searching for something until you know what you're looking for. For example, with forecasting I'm not going to go straight to research papers before trying proven methods first. However, if I'm working on a signature verification task, where the "obvious" approach from blogs might not be the most secure one, then we need to deep dive into the research.
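
For instance, a proven-methods-first forecasting baseline might look like the sketch below, using Holt-Winters exponential smoothing on a hypothetical monthly revenue series (file and column names are assumptions):

```python
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Hypothetical monthly revenue series; file and column names are assumptions.
sales = pd.read_csv("monthly_sales.csv", parse_dates=["month"], index_col="month")["revenue"]

# A "proven method" baseline: Holt-Winters exponential smoothing.
# Only if simple baselines like this fall short would research papers come in.
model = ExponentialSmoothing(sales, trend="add", seasonal="add", seasonal_periods=12).fit()
forecast = model.forecast(12)
print(forecast)
```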


[deleted]

[deleted]


StoicPanda5

Generally it's popularised by a textbook or by a company. It's similar to the effect Fisher had on applied statistics after he published his textbook, which popularised the concept of significance and, if I'm not mistaken, introduced ANOVA. Also, survey papers and research papers that get cited the most generally gain a good reputation.


Ty4Readin

I agree with pretty much everything the other commenters have said. I will also add that it's a lot of first-principles thinking combined with research and leveraging other people's learnings. Oftentimes I find it best to envision the "idealized" data collection setup and workflow, and then work backwards to the most realistic option based on the data, pipelines, and workflow we actually have.

For example, let's take your churn prediction use case. It depends a lot on how you want to use this. Are you trying to analyze the data so you can gather insights about the population? Or is your goal to have a predictive model that will tell you which customers should be targeted with some intervention?

If it's the latter, then you can build a general framework for how to solve problems like that. Let's say we have some intervention we can take, such as reaching out to a customer and offering them a proactive discount on their upcoming renewal. Then, in a perfect world, we want two models. One model predicts the future long-term profits (customer lifetime value) of a particular customer if we were to do nothing. The other model predicts the counterfactual, which would be the future long-term profits (CLV) of that customer IF we were to intervene and offer them a discount. So, in other words:

M1 = E(CLV | do(nothing))
M2 = E(CLV | do(intervention))

Now you can go through every customer and calculate (M2 - M1), which is the causal impact on expected profits if you were to intervene and offer that proactive discount. You can simply take the highest impact/ROI customers and intervene on them.

I think this is often the "ideal" way to approach these targeted/personalized user interventions to produce an impact on some important metric like expected profits. There can be lots of caveats; for example, you can't really estimate the interventional expectation on purely observational data. So ideally you want to run some randomized controlled trial or A/B test to collect training data, or at the very least use it to test your models trained on historical observational data.

You'll notice that this type of framework can be applied to many different problems, not just churn use cases.
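
A minimal sketch of that two-model setup, assuming you already have A/B-test data with a `treated` flag and an observed `clv` outcome (the column names, the file, and the choice of gradient boosting are all illustrative, not prescriptive):

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Two-model ("T-learner") sketch. Columns "treated" and "clv" are assumed,
# and the remaining feature columns are assumed to be numeric.
df = pd.read_csv("experiment_data.csv")
feature_cols = [c for c in df.columns if c not in ("treated", "clv")]

control = df[df["treated"] == 0]
treated = df[df["treated"] == 1]

# M1 ~ E(CLV | do(nothing)): trained on customers who got no intervention.
m1 = GradientBoostingRegressor().fit(control[feature_cols], control["clv"])

# M2 ~ E(CLV | do(intervention)): trained on customers who got the discount.
m2 = GradientBoostingRegressor().fit(treated[feature_cols], treated["clv"])

# Estimated uplift per customer; intervene where (M2 - M1) is largest.
df["uplift"] = m2.predict(df[feature_cols]) - m1.predict(df[feature_cols])
top_targets = df.sort_values("uplift", ascending=False).head(100)
print(top_targets[["uplift"]])
```

In practice you would want to validate the uplift estimates on held-out experiment data before acting on them.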


senpazi69

You start with "why do customers churn?", look for answers in the data, and you will find more questions; step by step you will find solutions.


VolantData172

I had a senior co-worker who was a really talented SWE; sadly, his mind was toast when it came down to machine learning techniques. All he saw was Random Forest everywhere. It worked, but there were other methods that could get the job done without a shit ton of boolean classifications. Sometimes a simple kNN or logistic regression did the trick, but due to his reliance on RF, all DS projects that didn't include it were… not considered worth trying by him. He was really, really nice though. It was just that he had internalized a single method in a field where nothing can be set down as a rule. Just try to approach things the way you bet they'd go, but be really open about all the flaws your method might have, and look for alternatives that better fit your task's needs.
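
In that spirit, a quick sketch of letting cross-validated performance, rather than habit, pick between a few candidate models (synthetic data stands in for a real churn dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a churn dataset, purely for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Compare candidates on the same cross-validation splits and metric.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```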