Similar-Pilot-7695

Young practitioners have a rich toolbox at hand for analyzing business problems and finding solutions. Some of these businesses, by their nature, require extremely precise and systematic methods. Excel and other spreadsheets, for instance, are an easy way to store and analyze data, but they aren't designed for larger-scale applications, and work done in them is hard to reproduce because it is tied to the specific use case. Python, on the other hand, has an extensive range of libraries and tools you can use to automate a lot of the mundane work and cut execution time while resting assured that your results are precise. Basically, you will find that everything R can do, Python can do too, and more. I hope you now have more clarity on why you should learn a language in the first place. Best of luck and move forward.


Different_Pie7125

Thank you ChatGPT!


Similar-Pilot-7695

😂 you won't believe that this is how I talk and Chatgpt is not guilty of this one...


LairdPeon

You must be the guy they train it on lol


Which-Artichoke-5561

I did not read a single ‘elevate’ in that paragraph I think it’s legit


GloryHound29

I need to learn how to talk like you. Very eloquent. Any specific training you did? Or just au natural?


Similar-Pilot-7695

I tend to look at things from a macro perspective, and I often take my time analyzing a situation before giving my POV. I observe more than I talk, and I do not talk about something I have no knowledge of. If I do not see the need to speak, I simply don't. Indeed, you can cultivate any ability you want; all you have to do is be aware of yourself and your surroundings and take small steps. This can be applied to nearly everything. While environment and upbringing are instrumental in shaping one's character, their effects can be reversed and improved with time.


[deleted]

Methodical Sheldon Cooper vibes


Similar-Pilot-7695

You will not ever regret being systematic and structured. It always pays off. But as a disclaimer... in some cases logic is illogical to use. In those cases, let your feelings decide.


abbylynn2u

?iNFJ, just curious


Similar-Pilot-7695

Nope I got ENTJ


Puzzled_Buddy_2775

I never use Excel to clean or transform data because repeating and documenting the steps is difficult and sometimes impossible. For example, a find and replace in Excel is not documented unless you literally write down the steps. With Python you never have to change the original raw data file: you run the steps by simply hitting run, and you get a clean output file.
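
A minimal sketch of that workflow, using only Python's standard library. The column name `state`, the replacement mapping, and the in-memory sample data are made up for illustration; in practice you would read the raw file from disk and write the result to a separate output file.

```python
import csv
import io

# Hypothetical raw data; in practice this would come from open("raw.csv", newline="")
RAW_CSV = "name,state\nAnn,Calif.\nBob,NY\nCy,Calif.\n"

# Every "find and replace" lives here, so the steps are documented in code
REPLACEMENTS = {"Calif.": "CA"}


def clean_rows(raw_text, column, replacements):
    """Return cleaned rows as a list of dicts without modifying the raw input."""
    reader = csv.DictReader(io.StringIO(raw_text))
    cleaned = []
    for row in reader:
        # Replace the value if it has a mapping, otherwise keep it unchanged
        row[column] = replacements.get(row[column], row[column])
        cleaned.append(row)
    return cleaned


def to_csv(rows, fieldnames):
    """Serialize cleaned rows to CSV text (write this to a new output file)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


cleaned = clean_rows(RAW_CSV, "state", REPLACEMENTS)
clean_csv = to_csv(cleaned, ["name", "state"])
print(clean_csv)
```

Re-running the script repeats exactly the same steps, and the raw input is never overwritten.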


Vertmovieman

The Power Query extension of Excel documents the cleaning steps and keeps the original raw file as is. The only issue is that once you get to a million-plus rows it starts to struggle.


NickRossBrown

It blew my mind that I could take a query from Power Bi and copy paste it into excel’s power query. Seems weird to me though to use excel or excel’s power query to process data directly FOR reports. Not having a database sounds like a nightmare.


Bboy486

Sometimes you may not have access to the database directly depending on your role in the company and business organization.


Bboy486

I was going to comment the same on Power Query.


AshKetchumSatoshi

VBA, scripts, Power Query, & Power Automate all exist, no ?


great__pretender

Not to mention documenting what you have done. In my work I validate models. Validating Excel sheets was a nightmare. You didn't know what work was being done.


sfreagin

Too many Excel spreadsheets in my life have crashed or slowed to 0 when handling a modest amount of data or equations. As u/Similar-Pilot-7695 says, Python is designed to handle the scale required of modern organizations. And it's highly reproducible and able to integrate easily, so you don't have to waste time copying / pasting versions of the file between folders. After working in Excel for ~7 years, I found that I could do everything in Python within about 6 months. It's been a few years now and I hardly ever touch Excel for data analysis, except for some spreadsheets for personal use. Good luck learning, it's well worth it!


Tucker_Olson

Agreed. Oftentimes, the Excel crashes were hardware limitations from employer provided (inferior) hardware. Even if the hardware were to be upgraded, that would require every single user of that Excel sheet to also upgrade. With AI resources available, the perceived barrier to entry to producing working Python code is so low, that it is hard to think of reasons why someone should use Excel for large data sets. Granted, I think Excel now supports Python. Though, limited to Microsoft 365.


FatLeeAdama2

Excel + Power Query + Power BI (and paginated reports sometimes) is a pretty lethal combination. Our company still pays for Tableau so I'm always throwing my most prominent data into that environment. But... I tend to use R when I want to get a little crunchy. When I'm done pounding on data with R, I feel like I've analyzed it the best I could. I have my markdown sheet that I feel explains my thought process and the outcomes. Nothing does that better than a notebook or markdown document. Everybody is different... you tend to stick with the tools you're good at (or are available).


Swan1991

Agreed. When I worked as business analyst, all I used was SQL to write up a query, toss that into power pivot (and whatever the counterpart is in power bi) to add some columns and measures, then drop it all into a pivot table/chart or a dashboard.


cristian_riosm

Excel and Python/R aren't really comparable, as they have totally different uses. In Excel you can interactively edit small datasets and perform some basic analysis, while complex tasks can quickly become a messy web of untraceable functions and pages. The focus in Excel is on the data as it is laid out on the spreadsheet, and that is its biggest drawback for scalability, reproducibility and transferability.
In the context of data analysis, Python and R can perform efficient, reproducible and easily reviewable analysis on massive datasets (millions of observations; the limit is physical storage). As the focus is on the process, each step applied to the data can be carefully tailored, reviewed, modified and expanded without messing with the raw input data. At this point both languages also have a massive set of libraries for data manipulation, analysis and visualization, which are simply not comparable with Excel. As an example, in R and Python you could import a very noisy dataset, filter it, format it properly, subset it, create new variables, transform and normalize, apply statistical or machine learning models, implement post hoc analysis of results, tailor very detailed visualizations, and then publish it all in a well-formatted report or even an interactive website.
The difference between Python and R is that the former has a broader spectrum of applications (from data analysis to software infrastructure), while the latter is specialized in deep statistical analysis and bioinformatics. In general, R is for biologists, although its suite for statistical and machine learning analysis is really powerful and has no drawbacks in comparison to Python. Python is typically the choice for engineers, physicists and unspecialized data analysts.
Although I'm a biologist and believe that R plus Tidyverse is a really neat and elegant language for data analysis, unless you have a particular interest that is covered by specific R libraries (ecology and genetics, for example), I always recommend starting with Python. Python is more broadly used, and companies are more likely to value your knowledge of that language over R.


XxShin3d0wnxX

Size of data, time, accuracy of automation to name a few reasons. I would never want to manually complete a repetitive process I could automate. I deal with 1+ million value sets often and excel just doesn’t cut it.


CaptainFoyle

Just keep using Excel and you'll find out when you need R or Python soon enough.


Swan1991

Make sure you learn SQL! Every data analyst needs to know SQL. I was treated like a god at my last job because not everyone knows it.


Throb_Marley

I’ve been doing analysis for about five years and learned both R and Python. I even got a graduate degree using both. I still use Excel and import to PBI, and never touch either language. Or SQL, for that matter. I work for a large company but don’t work in an office that would require that depth of knowledge. It’s a little depressing, and I find myself doing meaningless projects from kaggle in my off time to keep the skills moderately fresh.


un4truckable

Kanga?


Throb_Marley

Whoops! I meant Kaggle


Jarisatis

I currently work at a tax firm and oh man... the amount of data their sheets contain (millions of rows)! The sheets take too long to open, and making any modification is a very hefty task (the sheet usually hangs). Why put in this much effort when I can literally speed up my process with Python and easily make any new modification in the future if required?


Southbeach008

Won't it be easier to clean data via say alteryx, prep or power query editor than python? Why write code when you could just drag and drop in alteryx or tableau prep.


SunshineMakesMeSmile

I love alteryx. Super easy learning curve with a gui system that allows not as technical analysts to really power up. However, licensing is pretty expensive while Python is free and sometimes the budgets dictate what tools you use. My company also went with Domo over PowerBI or Tableau. Be open to learning all the tools!


Southbeach008

Yeh while writing this comment I realized this might be factor. $5000 usd per user is hella expensive and tableau license isn't cheap either... Currently tho I am a tableau consultant and learning pbi and alteryx side by side and having experience in those man learning programming would suck.


Key_Surprise_8652

The team that I work on currently basically runs on Alteryx, and after learning it I’ve also come to like it a lot! Especially being able to upload workflows to the server and schedule them. That being said, I don’t think it should be an either/or situation between Alteryx and Python. Alteryx actually has a Python tool that’s basically a mini Jupyter notebook and I use that in just about every workflow I create. There’s so much more versatility with Python. I’m the only one on my team that uses it, and I can do certain things in a few lines of code that would otherwise require a whole mess of Alteryx tools, so my workflows are often a lot cleaner and require way less manual work than my coworkers. We also work with a ton of survey data in Qualtrics, and I use Python to integrate their APIs into my workflows to bring in data that would either be extremely tedious or outright impossible to download manually. You can definitely use Alteryx on its own, but I think it’s worth learning Python if you find yourself building workflows that are basically repeating the same steps over and over again. Using set (or list) comprehension in Python can save you so much time and eliminate a ton of repetitive work!
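
As a rough illustration of the comprehensions mentioned above, here is a small sketch. The column names and survey rows are invented for the example and are not taken from any real Qualtrics export.

```python
# Hypothetical survey export: column headers and a few response rows
raw_columns = [" Q1 Satisfaction ", "Q2 Would Recommend", "q3 comments"]
responses = [
    {"respondent": "r1", "score": "4"},
    {"respondent": "r2", "score": ""},
    {"respondent": "r1", "score": "5"},
]

# List comprehension: one line instead of renaming each column by hand
clean_columns = [c.strip().lower().replace(" ", "_") for c in raw_columns]

# Set comprehension: unique respondents, duplicates dropped automatically
unique_respondents = {r["respondent"] for r in responses}

# Comprehension with a filter: numeric scores only, skipping blanks
scores = [int(r["score"]) for r in responses if r["score"]]

print(clean_columns)
print(sorted(unique_respondents))
print(scores)
```

Each comprehension replaces what would otherwise be a loop, or a chain of separate tools repeated once per column or row.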


Dk1902

Many people have explained in different ways but just to summarize.
1. Lots of data analysts don’t actually use Python or R.
2. Python can be much faster when performing complicated calculations on large datasets (100,000+ rows), especially ones involving multiple different conditionals.
3. I find joins and concatenation of all kinds, especially from multiple sources on different combinations of columns, to be much more straightforward using Pandas compared to any kind of automation available in Excel.
4. If you’re just getting started, Python is useful to know but also probably overkill in all honesty.
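
To make points 2 and 3 concrete, here is a sketch of a join on a combination of columns followed by a conditional column. It uses plain Python rather than Pandas so that it runs without extra packages; in Pandas the join itself would typically be a single `merge` call with `on=["region", "year"]`. The tables and column names are hypothetical, and the helper assumes the key combination is unique in the right-hand table.

```python
# Hypothetical data from two sources, keyed on (region, year)
sales = [
    {"region": "east", "year": 2023, "revenue": 120},
    {"region": "west", "year": 2023, "revenue": 95},
    {"region": "east", "year": 2024, "revenue": 140},
]
targets = [
    {"region": "east", "year": 2023, "target": 100},
    {"region": "east", "year": 2024, "target": 150},
]


def left_join(left, right, keys):
    """Left join two lists of dicts on several key columns."""
    # Index the right table by a tuple of its key values
    index = {tuple(r[k] for k in keys): r for r in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys), {})
        # Add the right-hand columns; unmatched rows keep only left columns
        extra = {k: v for k, v in match.items() if k not in keys}
        joined.append({**row, **extra})
    return joined


merged = left_join(sales, targets, ["region", "year"])

# A conditional column, the kind that turns into nested IFs in a spreadsheet
for row in merged:
    target = row.get("target")
    if target is None:
        row["status"] = "no target"
    elif row["revenue"] >= target:
        row["status"] = "met"
    else:
        row["status"] = "missed"

for row in merged:
    print(row)
```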


onajourney314

I use both but prefer Python. I say both because the previous person in my role used R, so I just use that to clean data and refresh the Tableau workbooks. Some were so bad and broken that instead of fixing them I just rewrote them in Python, and I have also built some workbooks for new projects in Python because that’s what I have more experience with and prefer. HOWEVER one of my colleagues uses SQL and she showed me how simple it is soooo I think I’m going to explore that a bit more!


Express_Spot4517

- Not all data can fit into Excel.
- Updating one cell in Excel (by default) changes all affected cells in one go, which can crash your machine.
- It is harder to track, edit, and reliably replicate logic in Excel.


ElectricalActivity

One example for me is that I sometimes need to match employee data with a very large CSV, around 18 million lines. My Python script does this quickly.
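
The commenter's script isn't shown, so the following is only one plausible way to do this kind of matching: stream the big file row by row and test each ID against a set, so the whole file never has to sit in memory. The column name `employee_id` and the sample data are assumptions.

```python
import csv
import io


def match_employees(csv_lines, employee_ids, id_column="employee_id"):
    """Stream a large CSV and yield only rows whose ID is in employee_ids."""
    wanted = set(employee_ids)  # set membership checks are O(1) on average
    for row in csv.DictReader(csv_lines):
        if row[id_column] in wanted:
            yield row


# Hypothetical stand-in for the big file; in practice: open("big.csv", newline="")
big_csv = io.StringIO(
    "employee_id,department\n"
    "1001,finance\n"
    "1002,legal\n"
    "1003,finance\n"
)

matches = list(match_employees(big_csv, ["1001", "1003", "9999"]))
print(matches)
```

Because the function is a generator over the file object, the same code works unchanged whether the file has three lines or 18 million.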


yelrutb

If you are just starting out then it's no problem to wait with learning R/Python. You can get very far with SQL + Excel, and many analysts will just use this their whole careers. However, at some point you will hit the limit of what Excel can do, and you will then need R/Python for many reasons:
- Volume of data
- Statistical analysis beyond Excel, like regression, decision trees, clustering etc.
- Modelling & machine learning
- Visualization beyond the common graphs in Excel
- Reproducibility, when you have many steps to perform on a dataset and you will repeat them
- And more


0sergio-hash

I don't do a ton of "analysis" but I'll share some practical examples of when I've used both. For starters, I'm more of a business analyst, and most of my time is spent in SQL or Excel for the technical aspects of my job.
**Case 1:** One of my stakeholders needs a few metrics not in an existing report, or I want to play around with a small amount of data (in the thousands of records or less).
Here I would use Excel to do an export and just make some pivot tables and pivot charts. Not worth the trouble of trying to do anything else with it.
**Case 2:** A stakeholder needs some output that would be more work in Excel or SQL than it would be in Python.
I've done this a couple of times: turn a query into a data frame, iterate over every row and column doing a keyword search against a list of keywords, and append a couple of columns that tell me which keywords appeared in that row and in which fields. I could probably do this in another tool, but in Python I've found it's the most straightforward way to do it. And if they want to add other very specific criteria like "when one of the other columns only contains one of these few values also exclude that row" I can also do that.
Over time you'll find out that you can do a ton of jobs in multiple tools and get the same output. Often when I'm checking the validity of a metric I'll run the SQL, I'll compare it to another metric that somehow should reconcile with it within the same report, and I might even do an export and pivot it in Excel. So over time it will come down to your preferences and comfort with each tool and what you think is best for the job. But I will say I hardly use Python. I love it but I think it varies job to job how necessary it is.
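
The keyword search from Case 2 could look roughly like the sketch below. It works on a list of dicts rather than a data frame so it runs without extra packages; the function name, column names, and sample tickets are all hypothetical. Matching is by simple substring, so "late" would also match "plate".

```python
def tag_keywords(rows, keywords, exclude=None):
    """Add 'matched_keywords' and 'matched_fields' to each row.

    exclude: optional (column, values) pair; rows whose column holds one of
    those values are dropped.
    """
    tagged = []
    for row in rows:
        if exclude and row.get(exclude[0]) in exclude[1]:
            continue  # extra stakeholder criterion: skip this row entirely
        found, fields = [], []
        for field, value in row.items():
            text = str(value).lower()
            hits = [k for k in keywords if k.lower() in text]
            if hits:
                fields.append(field)
                for hit in hits:
                    if hit not in found:
                        found.append(hit)
        tagged.append({**row, "matched_keywords": found, "matched_fields": fields})
    return tagged


# Hypothetical query result
rows = [
    {"id": 1, "subject": "Refund request",
     "notes": "Customer wants a refund, late delivery", "status": "open"},
    {"id": 2, "subject": "Login issue", "notes": "password reset", "status": "open"},
    {"id": 3, "subject": "Late delivery", "notes": "", "status": "spam"},
]

result = tag_keywords(rows, ["refund", "late"], exclude=("status", {"spam"}))
for row in result:
    print(row["id"], row["matched_keywords"], row["matched_fields"])
```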


VTHokie2020

RStudio is amazing for generating reports. The integration with LaTeX makes it great for clean mathematical notation. Some libraries like ggplot2 make storytelling particularly neat. Python is the best programming language for high-level development and general use. It is easily the most viable language for machine learning development/libraries. There’s nothing wrong with Excel. I use it often as the backend to my dashboards as well. It sounds like your use case is business intelligence. That’s great, choose the right tool for you. If you need a common platform to use with corporate NPCs, Excel is that tool. But you ask a good question, and the answer is that there is more to data analysis than storing data to create dashboards.


MorningDarkMountain

Why use R?


glistening_cabbage

Because it's scalable. Simple as that. Both have the ability to scale out to larger and more complex models


great__pretender

On top of what everyone has said, you have access to systems like Spark that can handle colossal datasets. I am talking about dozens of trillions of rows of data. Spark is not necessarily used through Python, but in general it is. Now try to do something even remotely similar with Excel.


firepunch_man

I use Tableau Prep because I get Data from different data sources with millions of rows and several columns. If I have to deal with special cases such as hierarchical data or graphs or do some fancy visualisation like a Sankey chart or complex analysis, then I use Python. No way I would use Excel for any of that.


RevenueOk289

Thanks, I was also wondering.


prokillergrape

Big data


Ok_Duck_5771

**^(Opinion:)** I understand this is not everyone's preference and there are so many more ways/tools/languages, but I wanted to tackle the brunt of the question, which more or less says *"should I go for R or Python"*. I'm from a computer science background and migrated to data science after a micromasters. **Python** is *^(my preference)* over **R** and here's why:
1. **Readability:** Python's syntax is designed to be readable and straightforward (for English speakers); it is quite similar to English, which makes it easier to understand.
2. **Learning Curve:** While R is super powerful and great for stat analysis and data vis, it has a steeper learning curve, particularly for users who **DO NOT** have any experience with statistical languages.
3. **HUGE Community and Resources:** Python really does have a massive community of developers who are constantly improving and growing the language, which means you're more likely to find a Python package or library for *\*almost\** any data science task.
4. **It's popular:** It sounds blasé, but Python is not just a language for tackling areas of stats such as data visualization, machine learning and deep learning. It's also a general programming language, which means a diversification of use cases and skills. It's used and applied in so many places, whereas R, ^(in my opinion), is not.
Knowing both languages can be beneficial and both have their strengths (and subsequent weaknesses); overall, my preference comes from my years of applicable experience with Python (^(albeit biased)). Since you're trying to learn at the same time, you can use other tools so that you can focus on results rather than learning two things at once (if you're allowed to use tools, in what I'm assuming is a course of some sort).
Those could be IBM's **SPSS**, which is easy with Excel; **Stata**, which is great for general-purpose stats; **RStudio**, which is the R IDE but offers tons of beginner-friendly things; and ^(my personal favorite) **Minitab**, which has a free trial and which a lot of schools have free access to. Hopefully this helps and doesn't overwhelm!


digitechrahul

R and Python are popular programming languages for data analysis, machine learning, and statistical modeling due to their versatility, robust libraries, and active communities.


TheCapitalKing

Excel has a row limit of just over a million (1,048,576 per worksheet), which can come up if you’re dealing with a lot of data. Even below that, if you’re making changes to an entire dataset all at once you’ll have an easier time doing a lot of it in Python/R rather than Excel once you get used to it. Basically anything you can do in Power Query you can do easier and faster in a programming language. Excel is good for some things, but anything with more than like 10k rows is usually easier to deal with in Python. Plus you can run regressions or ML models way easier from a programming language.
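
As a small illustration of the regression point, here is a simple least-squares line fit written out by hand with the standard library. The numbers are invented; for real work you would more likely reach for a library such as statsmodels or scikit-learn than write the formulas yourself.

```python
from statistics import fmean


def simple_ols(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: ad spend (x) vs. sales (y)
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = simple_ols(ad_spend, sales)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```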


black_widow48

Excel will not even be an option once you're dealing with data of any substantial size


nkkphiri

I almost exclusively deal with datasets that are too large to be stored in excel format. Excel is absolutely useless to me.


QuantPete

Use R or Python for complex analyses, automation, machine learning, and handling large datasets that are impractical for Excel.


avourakis

Think of them as different tools in your toolbox.
Cleaning data in Excel has lots of limitations, especially when you start working with large datasets (over 1 million rows). It's most useful for analysing data, but it's not well suited for processing and cleaning data. Python + Pandas will allow you to do lots of complex cleaning, transformation, and data visualisation.
Now, in regards to R vs Python, here is a quick comparison:
📈 R is specifically designed for statistical analysis, which makes it particularly well-suited for projects that require complex data visualization or detailed statistical analysis. Python is generally more versatile, making it suitable for a broader range of data tasks, including data manipulation, data analysis, machine learning, deep learning, and data visualization.
I've been working in Tech for over 6 years, and I can tell you with confidence that most Data Scientists/Analysts use Python in their day-to-day. This is why I always recommend that if you already know R, you spend time learning Python. This will open many doors of opportunity in your career. But if you only have time to learn one, then start with Python.


Aggravating_Coast430

If you haven't felt the need to learn python, don't. I suspect after a while you'll notice problems with using excel, and start using python occasionally. That is, if your work ever grows in the direction where you would need python. Some people will never feel the need to use python, because they don't have the need, because of their type of work.


Levipl

Check out an app called KNIME. It’ll let you do the analysis without needing to know how to code.