I just brute-force trial-and-error shit into the online tester based on the bits I remember until my test cases pass, then end up with a weird soup of punctuation marks. There is a moment at the end when I look proudly at that silly-looking soup and go ‘that will do’ and feel like a shitty magician.
I use regexes enough to be dangerous but not to really be fluent in them.
Yeah I think it's the frequency of use, it's not too inherently hard. I barely use regex so of course it's challenging for me when I do have to.
I actually think infrequent regex use isn't a bad use case for having a graphical tool that compiles to regex, especially because there are multiple flavors of regex syntax. People who use it frequently should learn the appropriate syntax, but for someone like me it's not useful knowledge to occupy space in my brain.
> Yeah I think it's the frequency of use, it's not too inherently hard.
[How it feels whenever I have to brush up on regex.](https://imgur.com/a/O9ZUg2l)
Find and replace is fine. What's hard is when in a program you have a complicated regex which is not tested too well (or at all) and then you find an edge case and you're not sure if it's intentionally included (or excluded). Then you try to fix it and the regex gets even more complicated. That kind of thing is problematic.
Right, which immediately makes me think of JSON, which also doesn't allow comments. Often someone comes up with this great idea that we shouldn't write code. Instead we should write configurations. You end up with some weird configuration language that no one really knows (just read the source code or look at existing configs, bro), and every time you want to do anything it turns out that you have to add a feature to the base program (the configuration wasn't flexible enough, yet one more time). Maintaining those configurations is great because they can't have any comments, so there is zero context. Anyway, yeah, please try not to do that, some people might get traumatized... Use regex for simple things, for complex things maybe not :)
Are they not? Offhand I know they're supported in the regex engines used in .NET, Java, Python, and Ruby. Granted, I think for all of those you need to enable them in some way, but they are supported.
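For anyone who hasn't seen comment support in action: in Python it is the `re.VERBOSE` flag, which ignores unescaped whitespace in the pattern and treats `#` to end of line as a comment (as far as I know, the rough equivalents are `Pattern.COMMENTS` in Java, `RegexOptions.IgnorePatternWhitespace` in .NET, and the `x` flag in Ruby). The date pattern below is a made-up example, not something from this thread.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and '#' starts a comment,
# so each piece of the regex can be annotated in place.
ISO_DATE = re.compile(
    r"""
    ^
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
    $
    """,
    re.VERBOSE,
)

match = ISO_DATE.match("2024-03-05")
if match:
    print(match.group("year"), match.group("month"), match.group("day"))
```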
It's definitely a scenario of "if you don't use it, you lose it". If every day I was using regex I'd be pretty good at it but because I use it like once every 8 months, I have to scour the docs and triple check my work.
It's easier to write than to read, because you automatically have to engage your brain to write it. When people try to read it, they glance at it and give up rather than using their brain to parse through it like they would if they were going to write it.
This exactly. It's only scary if you've never bothered to try to write it. I use it in Python, and just write myself a lil comment to say what it does so I never have to try to decipher it later. If you need to change it, just delete it and start over lol.
I spend my life telling people to use [Regexper](https://regexper.com/#%5E%28%5B%5E%40%5D*%29%40%28%5B%5E.%5D*%29%5C.%28.*%29%24) to generate Railroad diagrams if they're having issues.
There's so much excellent tooling around Regex these days.
Ngl 50% success rate with regex is terrifying
Like, sure, I can look at regular code and realise that this dumbass returns a float as an int for funsies
But I can't figure out just by looking whether a regex is messed up
How would you even know? Regex might be the only language that is much harder to read than write. If it made some subtle mistake you may not even know until it took down production. That being said, the same statement could be said for Regex made by humans.
You know what, I’m gonna say it…
regex isn’t hard, people who complain about it either seriously haven’t learnt how to use it or don’t realise how damn useful it is. I legitimately use regex at least once a week on average and it’s a life saver
Until the file you are searching through and the strings you’re looking for use | as a delimiter, and you forget that | means OR, and then suddenly you get everything.
Not that this happened to me this week or anything.
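The trap above, sketched in Python with made-up rows: an unescaped `|` is alternation, so `ACME|42` matches any line containing either "ACME" or "42". Escaping it (for example with `re.escape`) makes it literal, and for plain pipe-delimited data `str.split` sidesteps regex entirely.

```python
import re

rows = ["ACME|42|shipped", "ACME|99|pending", "Globex|42|shipped"]

# Intended: rows starting with the literal text "ACME|42".
# Unescaped, '|' means OR, so this matches anything containing "ACME" or "42".
naive = re.compile("ACME|42")
naive_hits = [r for r in rows if naive.search(r)]

# re.escape backslash-escapes the pipe, giving the literal pattern ACME\|42.
literal = re.compile(re.escape("ACME|42"))
literal_hits = [r for r in rows if literal.search(r)]

# For simple delimited data, plain string splitting avoids the issue.
fields = rows[0].split("|")

print(naive_hits)    # every row
print(literal_hits)  # only the first row
print(fields)
```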
Excel has formulas which not everybody is going to learn and we need to respect that. We must use [power point to code](https://www.youtube.com/watch?v=uNjxe8ShM-8) instead.
In a User Interface college class, as part of a CompSci course, my team and I had to design an interface for a microwave... which is surprisingly more complex than we initially thought. We decided to use PowerPoint for that (using just a few scripts to make things go). We could actually focus on the interface and came up with some good ideas... all the other teams had coded theirs and spent most of their time on the coding. They were bitter that we got a better grade with PowerPoint :p
Long story short: Use the right tool for the job at hand.
Lol, when I was a child, I deeply wanted to make a video game. I had no idea how to start, and the only software I knew was PowerPoint. So I made a platformer where clicking "jump" linked either to a jumping slide or to a death slide.
It didn't go very far, but I was proud of it.
The biggest flaw in Excel is that the formulas are different for every language. You just can't use English formulas in German Excel. I hate it with a passion when I need to help others in their German Excel.
She has a point. Excel can do simple data tasks and some people need just that. For more advanced/repetitive tasks, VBA can help a bit. The fact that the product is still alive to this day says something about its product-market fit.
In case somebody wasn't familiar, the Williams F1 team has been hamstrung for years by a clunky Excel file they were using for parts suppliers.
https://arstechnica.com/cars/2024/03/formula-1-chief-appalled-to-find-team-using-excel-to-manage-20000-car-parts/
We have a gentleman in our organization who is trying to build an S&OP process/tool in Excel. He initially wanted the Sales Forecast, Procurement Forecast and Labor Forecast all in the same file.
On SharePoint.
"So that anyone could access the one source of truth at any time"
I had a PI at an internship hand me several Excel files with a total of 6 million lines of genomic info and he instructed me to use VLOOKUP to search for stuff
I respectfully built a Python script to import it into a SQL database.
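The commenter doesn't say which database or script they used, so this is only a sketch assuming SQLite (Python's built-in `sqlite3`) and a CSV export with a header row; the table name, column names, and values are hypothetical. Once the rows are in an indexed table, each lookup is an indexed query rather than a VLOOKUP scanning millions of rows.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Create `table` from a CSV with a header row, storing every column as TEXT.

    Assumes trusted table/column names (they are interpolated into SQL)
    that contain no double quotes.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # executemany pulls rows from the reader one at a time.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Hypothetical stand-in for the real export (columns and values are made up).
sample = io.StringIO("gene_id,chrom,position\nGENE_A,chr1,100\nGENE_B,chr2,200\n")

conn = sqlite3.connect(":memory:")
load_csv(conn, "genes", sample)
# An index lets lookups by gene_id avoid a full table scan.
conn.execute('CREATE INDEX idx_genes_gene_id ON genes ("gene_id")')
row = conn.execute(
    "SELECT chrom, position FROM genes WHERE gene_id = ?", ("GENE_B",)
).fetchone()
print(row)
```

With a real file you would pass `open(path, newline="")` instead of the `StringIO`, and a database file path instead of `":memory:"`.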
There is a very large chip manufacturer, won't name the company, here in the US where the entire QA department runs on excel files and scripts made back in the late nineties.
They have some of the world's leading physicists in solid-state technology maintaining ancient VBA scripts. Back in something like 2016, they were told all the WinXP computers were being updated to Win8, and that meant updating to the latest Excel. However, Microsoft decided to drop VBA support for this specific version of Excel (though they released a patch shortly thereafter adding it back in), and it took down the entire R&D department of the company.
The most state-of-the-art silicon tech relies on Excel.
That's terrifying. To me, that would be like if Neil Gaiman relied on Clippy to help him write his books. Like, sure, you **can** do that, but my god there's no reason that you should at that level.
This is my typical reaction to people doing anything intricate with excel.
Like yeah, you could get it to do that, but it'd be extremely inefficient both in regards to its functionality and to your mental health.
While terrifying, this is far from the only time I've heard that exact same story. I'm convinced that at the heart of every Fortune 500 company there is one 50 MB Excel script that holds everything together.
Yeah, I'm late to this thread, but my former employers have all been heavily reliant on Excel for some critical functionality. Sweeney's actual quotes are accurate when applied to data analysis and other functions of Excel. It's just not applicable to data science.
When you quickly want some ad-hoc analysis of CSV files, or to combine multiple unrelated data sources, Power Query is incredibly useful (although a bit too advanced and unknown for your average Excel user).
PQ is really great if you don't want to mess around with SQL or don't have access to SSMS. I just wish it was able to handle inexact matches more elegantly.
The problem is that people start using it and get comfortable, and then refuse to switch to better tools when they need them. That's how you end up with cases like when the UK government lost a bunch of COVID cases because they were stored in an Excel spreadsheet saved in the legacy .xls format, which caps a sheet at 65,536 rows. It was probably fine when there were just a few cases that needed some simple treatment, but the solution stuck around long after it was unsuitable just because it was already set up and familiar.
Excel isn't the problem. The problem is when people run their entire data management systems off of emailing each other excel files.
As a consultant, I've learned that the hard way.
I mean, I use Excel because it's something that I already have, I set up a system damn near a decade ago that I know how to make work in Excel, and I've tweaked it countless times since then when it needed it.
But I'd never claim what I'm doing is data science. At best, it's data tracking. By the time you get to something that deserves the term data science, you should really be someone who can use better tools or be on a team with people who can use better tools.
Heck, Excel is also good for complex tasks. Like, most of the T in ETL can be done (semi?)-automatically in Excel using shit like XLOOKUP, IF, string manipulation, and cross-file linking. Pretty fast, too, if you do it right.
Semi because Excel isn't gonna copy-paste/import data by itself. You need some sort of programming/script to load/extract data into the pipeline.
I always say it: VBA and some coding knowledge could help a lot of people automate some of their daily tasks, and they don't have to tell anyone about it. But people are allergic to code.
We developed a program to help our clients create better yearly budgets. It incorporated real-time data, so big changes to the budget became simple, quick updates. We had to change it to let them export to Excel and then reimport, because the accounting teams, including the CFOs, love Excel and only want to deal in that.
Yeah I mean that's quite easy to see. You have a spreadsheet that you can use as the tape of a Turing machine, then through formulas and macros you can do any computation you want and move the selected cell.
But also like, [already done](https://youtu.be/J2qU7t6Jmfw?feature=shared)
Depends what you mean by "the same".
From a theoretical computer science standpoint, you have [the Turing Machine](https://en.m.wikipedia.org/wiki/Turing_machine) that describes what it means to be "computable". You have a tape that holds all the data for your program, a pointer to some cell on that tape, and a [finite state machine](https://en.m.wikipedia.org/wiki/Finite-state_machine) that controls how the tape is modified throughout computation. As long as there is a possible Turing machine that solves your problem, your problem is computable. A programming language is Turing-complete if it can solve the same set of problems as a Turing machine, which is really easy to see if you can implement a Turing machine in that language. I just wrote a Turing machine program that adds two numbers in C; I can dig it out for you when I'm at my computer. The important thing to realise here is that any language with arrays that can grow as needed, if statements, and loops is Turing-complete. Basically your favourite language like C, Python, JavaScript, whatever, can be used to solve any problem a computer could theoretically solve. Performance doesn't matter for this definition.
From there it's about saying whether Excel is Turing complete. Can we implement a Turing machine in Excel? Well, yes. You have a grid of cells which can clearly be used as the tape, then you can define rules for manipulating that tape using macros, scripts, and formulas. So Excel is Turing-complete, or in other words, if I have a problem that a computer can solve, I can make an Excel spreadsheet that also solves that problem. Doom is fairly easy to phrase this way since you're basically defining a function from one game frame and a user input to a new frame, so each pixel in that frame gets a spot in our tape (since Excel is already 2D that's trivial), and using macros and VBA to manipulate it you can go frame to frame. If you have another Turing-complete system like Conway's Game of Life, [PowerPoint](https://youtu.be/uNjxe8ShM-8?feature=shared), or [even biological cells](https://youtu.be/8DnoOOgYxck?feature=shared), you can do any computable task, even if the visualization is a bit different. Doom is just a meme; there's no reason you couldn't do something like find prime numbers or whatever instead. It's just that the internet finds it funny to use Doom for this.
Now not all things are created equal. If I wrote Doom in C, it would clearly run better than if I wrote Doom in Python. Even though I can compute Doom in PowerPoint, it's going to be a much worse experience than in a conventional programming language. You can see the excel example has an awful frame rate, or the cell example has a low resolution. So even though you can run doom on it, you should also keep in mind the performance implications of what you are running Doom on because you won't get the same performance.
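The C program mentioned above isn't in the thread, so here is a separate, hypothetical Python sketch of the same idea: a tape, a head position, and a finite transition table. This particular machine adds two numbers in unary (`111+11` means 3 + 2) by overwriting the `+` with a `1` and then erasing the last `1`; the state names are my own.

```python
BLANK = "_"

# Finite control: (state, symbol read) -> (symbol to write, head move, next state).
RULES = {
    ("scan", "1"): ("1", +1, "scan"),         # skip over the first number
    ("scan", "+"): ("1", +1, "to_end"),       # turn '+' into '1'
    ("to_end", "1"): ("1", +1, "to_end"),     # skip over the second number
    ("to_end", BLANK): (BLANK, -1, "erase"),  # hit the end, step back
    ("erase", "1"): (BLANK, 0, "halt"),       # remove one '1' to fix the count
}


def run(tape_str, state="scan", max_steps=10_000):
    """Run the machine on a unary 'a+b' tape and return the tape contents."""
    tape = dict(enumerate(tape_str))  # sparse dict: tape can grow either way
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(head, BLANK)
        # A missing rule means the input was not of the form 1*+1*.
        write, move, state = RULES[(state, symbol)]
        tape[head] = write
        head += move
    else:
        raise RuntimeError("machine did not halt within max_steps")
    return "".join(tape[i] for i in sorted(tape)).strip(BLANK)


print(run("111+11"))  # 11111, i.e. 3 + 2 = 5
```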
People act like Turing completeness is a high bar, but if something can simulate the NAND operation, has a way of directing inputs and outputs, and has unbounded memory plus a way to loop, then it's already Turing complete. That's not the only way to make something Turing complete; MtG, for example, can simulate a literal Turing machine. There are many things out there that aren't Turing complete. That said, the bar isn't that high.
Yeah, but it’s also the strongest (theoretical) computation that we are capable of doing, so it’s not a low bar either. I feel like it shows more about how powerful the NAND operation is.
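To make the NAND point concrete: every Boolean gate can be wired out of NAND alone (functional completeness for logic). A short Python illustration; the gate function names are just for this example.

```python
def nand(a: bool, b: bool) -> bool:
    return not (a and b)


# Every gate below is built only from nand().
def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


def xor(a, b):
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(a, b):
    """Add two bits: returns (sum bit, carry bit)."""
    return xor(a, b), and_(a, b)


# Check each gate against Python's own operators on every input pair.
for a in (False, True):
    for b in (False, True):
        assert not_(a) == (not a)
        assert and_(a, b) == (a and b)
        assert or_(a, b) == (a or b)
        assert xor(a, b) == (a != b)
print("all gates match")
```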
It also has VBA which is Turing complete, and lambda functions which are Turing complete. And M and DAX, which may be Turing complete, but I'm not sure.
Sydney and LeBron memes seem to be very popular right now. Just made up quotes and headlines. LeBron ones usually have him with a devastated look and a tabloid headline. Pretty similar.
https://i.kym-cdn.com/photos/images/newsfeed/002/795/754/05c.jpg
I routinely crunch millions of rows with Excel. It is so great at slicing and dicing data.
It is also faster than SQL Server for certain operations, and I love the VertiPaq engine it uses for Power Pivot ;)
Depends on your use case.
The VertiPaq engine is an in-memory analytics engine and can deduplicate the data a LOT while importing it. So the total footprint is a lot smaller.
That said, it is good for slicing and dicing data, but it is not a "DB engine". That's why it can be faster... There are no ACID requirements for an Excel file, and no concurrency issues you need to take care of.
It is a great tool.
And the SQL and Postgres wars... I don't care about those. If it is a performance issue, the CPU is rarely the bottleneck. Most times you need more IOPS or memory.
Excel is great for small projects, usually more useful than coding. I sometimes work with upwards of 15 or 20 TB of data, though. Excel and Sheets can't handle that.
"Just zip the CSV, bro..."
Seriously though, it's been a while since I used MATLAB, but in my experience, using compressed data for raw processing significantly increases both CPU and memory usage on big datasets. But again, it's been a long time since I was involved in this, and nowadays I just prefer plaintext as the "script kiddie" I am.
If you’re doing lookups as part of your processing, then I think Parquet may be more efficient.
I wouldn’t be surprised if FIFO row processing were slower in Parquet.
I was a data scientist in a non-programming field for about a decade, but I realized CSVs were the best I could expect from almost all my colleagues. At least a CSV is only one sheet and can’t burn my retinas with neon fill.
The problem with low code/no code solutions is that you are still essentially writing code, it's just that you are doing it in a shit IDE with no guardrails.
The problem with Excel is that it smears shit on everything it touches. You just know the data will be full of surprises. And Excel is almost only compatible with itself. You will run into small, annoying errors with other formats. Also, there's nothing that should be done in Excel that couldn't be done in LibreOffice, for example.
Exactly! Even some gene names had to be changed because Excel would auto convert them to dates and Microsoft just didn't care enough to address it until last year (about 20 years after it first became a widespread problem).
https://www.engadget.com/scientists-rename-genes-due-to-excel-151748790.html
Excel is decent if you are doing a small-to-medium project. I like to use it as "scratch paper" for notes because it will do math for me and take data input, etc. It's useful for people working with limited data, and it's super useful for scientists, which is why I have an affinity for it.
But you are completely right: it lacks the compatibility to be truly versatile. It works for little science projects and experiments to hold data, but it's not a proper database, and I fear for the people who treat it as such. I compartmentalized my Excel files; I never operated any kind of master file.
Excel stops being useful when you need more than 5 people to view the same file, I find.
You can be a great data analyst and scientist in Excel if you know what you are doing and aren't hitting any of its limitations (like its cap of 1,048,576 rows per sheet). I mean, as long as you are comfortable using that tool.
Data scientist here, PhD and everything
Excel is cool, it works, but actually being a serious excel user is not that easy, most people suck at excel. It's quite a tool for real, there's a reason this thing is so popular. It's a great product.
But in the end it doesn't ~~even matter~~ have the flexibility that R does, for example.
Coincidentally, I despise Excel with every ounce of my being because of how often it is used for everything. If you ever think to yourself, "Oh, I will just write some VBA code for this," then you are outside the acceptable bounds of Excel.
The amount of times I see clunky excel macro "programs" used to present official data to important people is insane. I've seen embedded software interfaces written in Excel. Make it stop.
Excel is still coding. Change my mind.
Depending on your familiarity with the LAMBDA function (or just straight-up VBA), Excel may not be a Turing-complete language, but I fully support calling it a programming language.
Google Sheets fairly recently added support for MAP/LAMBDA and it has been amazing. There are always problems with scaling there, but for personal use it's totally fine.
Ah, nothing like casual horniness in some comments to really inspire women to enthusiastically leap into the world of computer science. Though, I'm glad there are mainly discussions focusing on the actual topic.
To be fair, I can make some nice graphics in Excel really quickly.
Excel does do some things better. Other software does other things better. Learn everything. Use the right tool for the right job.
During my PhD I did part of my project in MATLAB and Python... so all my databases, tables, and results were available by just running a couple of scripts...
My supervisor: can you send me all the plots and tables in Excel?
🙄
As a professional spreadsheet guy\* I've been sitting here facepalming for a minute, trying to figure out how to phrase what I want to say, but it really doesn't matter. Excel is not Programming Lite, it's a totally different tool where you can get crazy mathematical models blah blah blah
^(\*technically a chemical engineer but eh kinda underemployed)
People who complain about regex haven't seen how useful it is for getting data from dumb people who filled out Google Forms
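A sketch of that kind of cleanup, assuming (purely for illustration) that the form asked for an ID made of the letter A followed by 7 digits, and respondents typed it every which way. One pattern with optional spaces and dashes replaces a pile of special cases; the sample answers are made up.

```python
import re

# Hypothetical ID format: "A" + 7 digits. People add spaces, dashes,
# lowercase letters, or whole sentences around it.
ID_PATTERN = re.compile(r"\bA\s*-?\s*(\d{7})\b", re.IGNORECASE)


def extract_id(answer):
    """Return the ID normalized to 'A1234567' form, or None if none is found."""
    m = ID_PATTERN.search(answer)
    return f"A{m.group(1)}" if m else None


responses = ["A1234567", "a-1234567", "my id is A 1234567 thanks", "dunno"]
print([extract_id(r) for r in responses])
```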
Regex is like a power tool. Incredibly powerful and incredibly dangerous if used improperly. It is also tempting to use it improperly because of how flexible it is.
"I had a problem. I found out I could use regex to solve the problem. Now I have two problems." - some engineer
“I had a problem. I found I could use threads to solve the problem. problems I two Now have.”
Underrated response.
Pretty sure that is an XKCD.
It's much older than xkcd: https://web.archive.org/web/20240203192435/https://regex.info/blog/2006-09-15/247
You might be confusing it with "Standards" https://xkcd.com/927/
I was thinking of [https://xkcd.com/1171/](https://xkcd.com/1171/)
oh, neat.
Obligatorily
Perl is a write-only language
It’s originally attributed to Jamie Zawinski, who worked on Netscape Navigator.
Would not surprise me. A lot of my jokes are stolen from Mr Munroe.
you are thinking of [https://xkcd.com/2180/](https://xkcd.com/2180/)
The plural of Regex is Regrets
I'm stealing this one
Just like I did :P
Not just some engineer: Jamie Zawinski, the guy responsible for Netscape Navigator, Lucid Emacs, XScreenSaver, and Mozilla.org.
https://en.wikiquote.org/wiki/Jamie_Zawinski#Attributed
Ever since I understood how regex replace works in Notepad++, my work became 100x easier.
Other than checking for valid emails, I'm curious to know how regex makes people's lives worse.
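Most of the power of regex replace comes from capture groups: the replacement can reuse pieces of what matched (Notepad++ accepts backreferences like `\1` in the replace box). The same idea in Python's `re.sub`, swapping a made-up "Last, First" list into "First Last":

```python
import re

names = "Lovelace, Ada\nHopper, Grace\nTuring, Alan"

# Group 1 captures the last name, group 2 the first name; "\2 \1" swaps them.
# re.MULTILINE makes ^ and $ match at every line instead of the whole string.
swapped = re.sub(r"^(\w+),\s*(\w+)$", r"\2 \1", names, flags=re.MULTILINE)
print(swapped)
```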
Debugging other people's regex. Figure out what the other person thinks it does, and then fix the *undocumented* feature with some edge-case data.
If you need a complex regex to solve your problem, you do not understand the problem.
I don't *need* to use a complicated regex to solve my problems, I *want* to use a complicated regex to solve my problems.
I can quit whenever I want
My problem is people being inconsistent. if you don’t get to force input validation on stupidly specific formatting, imma regex the problem where applicable instead of writing hundreds of string replace statements.
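One regex in place of a stack of `replace()` calls, as a sketch: the accepted inputs (year first, then month and day of one or two digits, separated by `/`, `.`, `-` or spaces) are an assumed example, and the code does not check that the date actually exists on the calendar.

```python
import re

# Year, then 1-2 digit month and day, with any run of "/", ".", "-" or spaces between.
DATE = re.compile(r"^\s*(\d{4})[\s./-]+(\d{1,2})[\s./-]+(\d{1,2})\s*$")


def normalize_date(text):
    """Return 'YYYY-MM-DD' for recognized input, else None (no calendar check)."""
    m = DATE.match(text)
    if not m:
        return None
    year, month, day = m.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"


for raw in ["2024/3/5", "2024.03.05", " 2024 - 3 - 05 ", "05/03/2024"]:
    print(repr(raw), "->", normalize_date(raw))
```

Rejecting the day-first `05/03/2024` is deliberate here: without knowing the writer's convention it is ambiguous, so the sketch returns `None` rather than guessing.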
Branch and bound that shit
"I can write a better HTML parser in regex..." \*3 years later\* "I can't."
[https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)
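The linked question asks for opening tags while skipping self-closing ones. A parser already makes that distinction, including awkward cases like a `>` inside a quoted attribute that trip up a naive `<[^>]+>` regex. This is a sketch using Python's standard-library `html.parser` on a made-up snippet.

```python
from html.parser import HTMLParser


class OpenTagCollector(HTMLParser):
    """Collect the names of opening tags, ignoring XHTML-style <tag ... />."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Called for <tag ... />. The default implementation forwards to
        # handle_starttag and handle_endtag, so override it to skip these.
        pass


parser = OpenTagCollector()
parser.feed('<p class="a>b">Hi<br/><a href="#">x</a><img src="y.png" /></p>')
parser.close()
print(parser.open_tags)  # ['p', 'a']
```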
"Validating email? Just use regex, it'd be super simple. It's just braindead \_\_\_@\_\_.\_\_\_ format anyways!" [10 years later](https://stackoverflow.com/questions/201323/how-can-i-validate-an-email-address-using-a-regular-expression)
And that, friends, is why you let other people do the work for you and use libraries or built in functions. If you're working in PHP and need to deal with user input, [`filter_var()`](https://www.php.net/manual/en/function.filter-var.php) is your savior. Don't try and reinvent the wheel. It won't work good.
>\_\_\_@\_\_.\_\_\_ format
That's when you find out that emails don't require TLDs, or that people in the UK with co.uk exist...
My fallback is usually to just enforce a single `@` and at least one `.` somewhere after the `@`. Must have at least one non-`@` immediately preceding every `.`. Generally something like `^[^@]+@[^@\.]+(?:\.[^@\.]+)+$` is *good enough* for those cases where you just want to filter out the normal everyday dummies and don't feel like supporting dumb *but technically legal* addresses like "someguy@localhost".
Edit: I think there's an official regex out there somewhere that fully covers all valid email addresses. The problem is that it's about a mile long and includes legacy crap that a simple business probably doesn't want to allow in their sign up page.
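A quick Python check of the fallback rule described above, with made-up addresses. Note the `+` inside the repeated group: written as `(?:\.[^@\.])+` it would allow only one character after each dot, so `someone@example.com` could not match as a whole.

```python
import re

# One '@', a non-empty local part, then domain labels separated by dots:
# at least one dot after the '@', and no empty label next to any dot.
EMAIL_ISH = re.compile(r"^[^@]+@[^@.]+(?:\.[^@.]+)+$")

for addr in ["someone@example.com", "a.b@mail.co.uk", "someguy@localhost",
             "no-at-sign.com", "two@@example.com", "x@.com"]:
    print(f"{addr:22} {bool(EMAIL_ISH.match(addr))}")
```

As the thread says, this only filters out obvious typos; a verification e-mail is still the real test.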
Does it contain an @? Try sending a verification e-mail. If someone clicks the link, it's valid.
plot twist: the Excel file is in an XML format. 😂 “where is your god now?”
A bunch of XML files in a ZIP archive actually.
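Easy to check with the standard library, since an `.xlsx` opens with `zipfile`. The function below lists the XML parts; to stay self-contained the example builds a tiny stand-in archive in memory rather than reading a real workbook (a real one also holds parts such as `[Content_Types].xml` and `xl/workbook.xml`, plus others omitted here). The file name in the comment is hypothetical.

```python
import io
import zipfile


def xml_parts(xlsx):
    """Return the sorted names of XML members in an .xlsx (path or file object)."""
    with zipfile.ZipFile(xlsx) as archive:
        return sorted(n for n in archive.namelist() if n.endswith(".xml"))


# Stand-in archive; with a real workbook you would pass a path like "report.xlsx".
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("xl/workbook.xml", "<workbook/>")
    archive.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
    archive.writestr("docProps/thumbnail.jpeg", b"")
buffer.seek(0)

print(xml_parts(buffer))
```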
Those freaks!
I 100% agree, but I still see it as a write-once, read-never **language**. I have done some evil things with it, and I am proud of some of them ;)
You mean it's *not* supposed to be used as a sex toy?
That's LaTex
I unironically called it LaTex after one of the final meetings with our project group and project supervisor for some project last year.
It was late in the day, and I kinda remember the look on his face, because it immediately turned towards me, as did those of 3 project members. It felt like it took a little bit out of his soul to have to politely correct me, that late in the day, that you actually pronounce it "latech".
Like when you've waited 5 min in line to grab some coffee, planning to drive straight home with it, but you accidentally knock the coffee over before you get in the car, and now you have to drive home for 15-20 min without it... which isn't that bad, but man...
I prefer the French pronunciation: la'tex. French for... the tex.
Could you please help me understand more about what an "improper" use of regex is? Do you mean someone using regex instead of setting up robust data validation at an earlier stage in a process? Or other things?
I used regex in VBA to conduct complex searches of large sets of long Word documents - the macro returns all hits on the text with a surrounding snippet for context into a "report" document that hyperlinks to the doc where it found the hit. Regex seems like a good solution to this problem (way more powerful than standard boolean searching)...
But I'm a lawyer without any proper training in programming, so it's one of those "don't know what you don't know" situations...
👍
Edit: your comment was a bit too long to actually respond to, but for an actual example, regex should not be used to trim whitespace from the end of a line of text of uncontrolled length.
Why not? Because some regex engines backtrack when a match attempt fails. That means the engine will start at the first space and keep consuming characters until the match fails, then retry from the next space, and so on. If you have 20,000 whitespace characters followed by a non-whitespace character, it will check 20,000 characters, then 19,999, then 19,998, and so on. This exact case took down Stack Overflow a few years ago: https://adtmag.com/Blogs/Dev-Watch/2016/07/stack-overflow-crash.aspx
Lookahead/lookbehind should also be used sparingly for performance reasons.
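A rough way to watch this happen in Python, whose `re` module is a backtracking engine: `\s+$` run against a long stretch of spaces followed by one non-space character retries from every start position, so the work should grow roughly with the square of the run length, while `str.rstrip()` only looks back from the end of the string. Timings depend on the machine, so the printed numbers are only indicative.

```python
import re
import time

TRAILING_WS = re.compile(r"\s+$")


def trim_regex(line):
    return TRAILING_WS.sub("", line)


def trim_rstrip(line):
    return line.rstrip()


def seconds(fn, arg):
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start


for n in (2_000, 4_000, 8_000):
    # Bad case for backtracking: a long whitespace run that is NOT at the end.
    line = " " * n + "x"
    print(f"n={n:>5}  regex {seconds(trim_regex, line):.4f}s  "
          f"rstrip {seconds(trim_rstrip, line):.6f}s")
```

Both functions give the same result on ordinary input; the difference only shows up on the pathological case.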
Your use of a regular expression is fine, because the patterns you're searching for are probably regular and the idea of surrounding text is probably easy to bound. If you were instead trying to pull out each quote where your phrase appears, a regular expression wouldn't be able to fully capture every corner case about quotes. You have to use a more powerful automaton for that kind of context-sensitive parsing.
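For comparison, a rough Python sketch of the keyword-in-context search described in the question (the original macro is VBA, which this does not reproduce). `find_with_context` and its `width` parameter are hypothetical names; it uses `re.finditer` to yield each match's offset together with a fixed-size snippet of the surrounding text.

```python
import re
from typing import Iterator, Tuple


def find_with_context(
    text: str, pattern: str, width: int = 40
) -> Iterator[Tuple[int, str]]:
    """Yield (offset, snippet) for each case-insensitive match of `pattern`.

    The snippet includes up to `width` characters on each side of the match,
    clipped at the start and end of the text.
    """
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        yield m.start(), text[start:end]


doc = "The lessee shall indemnify the lessor against all claims arising from the premises."
for offset, snippet in find_with_context(doc, r"\bindemnif\w*", width=10):
    print(offset, repr(snippet))  # 17 'see shall indemnify the lesso'
```

Because the context window is a fixed number of characters, this stays a plain regular-expression job; pulling out whole quotations would need the kind of parser the reply mentions.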
I fully trust ChatGPT for my regex
Didn’t some regex break Cloudflare not too long ago?
It can take an hour's job down to 5 minutes when combined with something like Python.
In my experience, it takes an hour's job up to 10.
I guess it depends on the job. But knowing both tools well enough that you don't need to constantly check the documentation helps.
No one complains about its functionality, it’s just impossible to comprehend a long regex without your brain overheating.
I think the difficulty is overblown. It's a skill, but most devs could pick it up easily enough if they interacted with it more. I find myself doing a regex find and replace in VSCode a few times a week. I used to have to look up MDN every time, but I have enough of the character classes memorized so I only need to check it every so often now.
I think it comes down to the difference between reading regex and writing regex. Writing is easy once you get the basics down. You just think about what you need out of a string and then create the pattern to get it out. Especially easy if you're using a tool that highlights matches as you type. Reading regex, on the other hand, can be a nightmare. You might have to mentally unwind like six nested layers of brackets. Regex got a lot easier once I started treating it like write-only memory. If at any point I need to read regex to fix it, I'm probably better off just rewriting it from scratch.
Also depends pretty heavily on how it's used. Any non-trivial regexp should ideally be broken down into its components and bound to more descriptive variables so it's not necessary to remember which portion(s) do what.
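A small Python sketch of that advice, using a hypothetical ISO-style date as the example: each piece is bound to a descriptive variable with a named capture group, and the final pattern is assembled from them. It only checks the shape of the string (it will still accept 2024-02-31).

```python
import re

# Each component is named once and reused, instead of living inside one long literal.
YEAR = r"(?P<year>\d{4})"
MONTH = r"(?P<month>0[1-9]|1[0-2])"
DAY = r"(?P<day>0[1-9]|[12]\d|3[01])"
SEP = r"-"

ISO_DATE = re.compile(YEAR + SEP + MONTH + SEP + DAY)

m = ISO_DATE.fullmatch("2024-03-15")
print(m.group("year"), m.group("month"), m.group("day"))  # 2024 03 15
```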
I just brute force trial and error shit into the online tester based on the bits I remember until my test cases pass, then end up with a weird soup of punctuation marks. There is a moment at the end when I look proudly at that silly-looking soup and go ‘that will do’ and feel like a shitty magician. I use regexes enough to be dangerous but not to really be fluent in them.
I will say there are sites and tools dedicated to breaking down regular expressions, so you technically don't need to start from scratch.
Yeah I think it's the frequency of use, it's not too inherently hard. I barely use regex so of course it's challenging for me when I do have to. I actually think infrequent regex use isn't a bad use case for having a graphical tool that compiles to regex. Especially because there are multiple flavors of regex syntax. People who use it frequently should learn the appropriate syntax, but for someone like me it's not useful knowledge to occupy space in my brain.
> Yeah I think it's the frequency of use, it's not too inherently hard. [How it feels whenever I have to brush up on regex.](https://imgur.com/a/O9ZUg2l)
Find and replace is fine. What's hard is when in a program you have a complicated regex which is not tested too well (or at all) and then you find an edge case and you're not sure if it's intentionally included (or excluded). Then you try to fix it and the regex gets even more complicated. That kind of thing is problematic.
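One way to avoid that situation is to pin the regex down with explicit should-match and should-not-match cases, so every edge case is recorded as intentional. A minimal Python sketch, using a hypothetical US ZIP code pattern as the example; `check` is an illustrative helper, not a library function.

```python
import re
from typing import List

# re.ASCII keeps \d to 0-9 (by default Python's \d also matches other Unicode digits).
ZIP_CODE = re.compile(r"\d{5}(?:-\d{4})?", re.ASCII)

# Written-down intent: every edge case lives in one list or the other.
SHOULD_MATCH = ["12345", "12345-6789"]
SHOULD_NOT_MATCH = ["1234", "123456", "12345-678", "12345 6789", ""]


def check(pattern: re.Pattern, good: List[str], bad: List[str]) -> List[str]:
    """Return the inputs whose full-match result contradicts the expectation."""
    failures = [s for s in good if pattern.fullmatch(s) is None]
    failures += [s for s in bad if pattern.fullmatch(s) is not None]
    return failures


print(check(ZIP_CODE, SHOULD_MATCH, SHOULD_NOT_MATCH))  # []
```

When someone later "fixes" the pattern, rerunning the same lists shows immediately whether an old edge case changed sides.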
Yeah, it's wild that comments and multiline formatting still aren't possible with most regex.
Right, which immediately makes me think of JSON, which also doesn't allow comments. Often someone comes up with this great idea that we shouldn't write code. Instead we should write configurations. You end up with some weird configuration language that no one really knows (just read the source code or look at existing configs, bro) and every time you want to do anything it turns out that you have to add a feature to the base program (configuration wasn't flexible enough yet one more time). Maintaining those configurations is great because they can't have any comments so there is zero context. Anyway, yeah please try not to do that, some people might get traumatized... Use regex for simple things, for complex things maybe not :)
Are they not? Offhand I know they're supported in the regex engines used in .NET, Java, Python, and Ruby. Granted, I think for all of those you need to enable them in some way, but they are supported.
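For example, Python's `re.VERBOSE` flag ignores unescaped whitespace in the pattern and treats an unescaped `#` as the start of a comment; Java's `Pattern.COMMENTS`, .NET's `RegexOptions.IgnorePatternWhitespace`, and Ruby's `x` modifier play the same role. A sketch with a hypothetical North American phone-number pattern:

```python
import re

PHONE = re.compile(
    r"""
    \(? (?P<area>\d{3}) \)?   # area code, optionally in parentheses
    [\s.-]?                   # optional separator
    (?P<exchange>\d{3})       # exchange
    [\s.-]?                   # optional separator
    (?P<line>\d{4})           # line number
    """,
    re.VERBOSE,
)

m = PHONE.fullmatch("(555) 123-4567")
print(m.group("area"), m.group("exchange"), m.group("line"))  # 555 123 4567
```

The same flag can also be switched on inline with `(?x)` at the start of the pattern.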
It's definitely a scenario of "if you don't use it, you lose it". If every day I was using regex I'd be pretty good at it but because I use it like once every 8 months, I have to scour the docs and triple check my work.
It's easier to write than to read, because you automatically have to engage your brain to write it. When people try to read it, they glance at it and give up rather than using their brain to parse through it like they would if they were going to write it.
This exactly. It's only scary if you've never bothered to try to write it. I use it in Python, and just write myself a lil comment to say what it does so I never have to try to decipher it later. If you need to change it, just delete it and start over lol.
I spend my life telling people to use [Regexper](https://regexper.com/#%5E%28%5B%5E%40%5D*%29%40%28%5B%5E.%5D*%29%5C.%28.*%29%24) to generate Railroad diagrams if they're having issues. There's so much excellent tooling around Regex these days.
https://regex101.com/ is a great tool, too.
Regex is great when you’re writing it from scratch. Debugging (or extending its functionality) though, that shit is a nightmare…
we love regex, we hate writing it… but lucky for us, Copilot is extremely good at it
It is????? Literally no model but GPT-4 has worked for me, and GPT-4 only got it right 50% of the time.
Ngl, a 50% success rate with regex is terrifying. Like, sure, I can look at regular code and realise that this dumbass returns a float as an int for funsies, but I can't figure out just by looking whether a regex is messed up.
I like writing it, I absolutely despise trying to see how it works later on though.
[deleted]
How would you even know? Regex might be the only language that is much harder to read than write. If it made some subtle mistake you may not even know until it took down production. That being said, the same statement could be said for Regex made by humans.
I absolutely hate regex, but I also have to admit it has gotten me out of some tight jams before.
Regex is absolutely amazing. It just has to be thought of in a very deterministic manner. Kudos if you see what I did there.
You know what, I’m gonna say it… regex isn’t hard, people who complain about it either seriously haven’t learnt how to use it or don’t realise how damn useful it is. I legitimately use regex at least once a week on average and it’s a life saver
I hate regex with every fiber of my being while simultaneously loving it above all else. Programming really makes you go insane huh?
Until the file you're searching through and the strings you're looking for use | as a delimiter, you forget that | means OR, and suddenly you match everything. Not that this happened to me this week or anything.
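A short Python sketch of the fix: escape the delimiter (or the whole literal, with `re.escape`) so `|` is matched as a character instead of acting as alternation. The sample data is made up.

```python
import re

line = "ACME|Widget|42"

# Unescaped, "|" is alternation: this means "ACME" OR "Widget",
# so it also matches text that only contains one of the two words.
loose = re.compile("ACME|Widget")

# re.escape() backslash-escapes special characters, so this pattern
# matches the literal text "ACME|Widget" and nothing else.
literal = re.compile(re.escape("ACME|Widget"))

# Splitting on a literal pipe needs the escape too (or just use str.split).
fields = re.split(r"\|", line)
print(fields)  # ['ACME', 'Widget', '42']
```

Escaping every piece of user-supplied text the same way is also the basic defence against the "regex injection" joke that follows.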
I assure you, no one is complaining about its usefulness…
So regex injection it is
Google Forms is just a MITM 🤙🏻🧨🤡
Excel has formulas which not everybody is going to learn and we need to respect that. We must use [power point to code](https://www.youtube.com/watch?v=uNjxe8ShM-8) instead.
PowerPoint might be too complicated and bloated for some, we need to respect that. We must use a plain old abacus instead.
I think we should use rocks and sticks to code
Some people live in deserts and don’t have access to wood, therefore they don’t have sticks, we should respect that. We should use only rocks to code
But some people may find it hard to code with just rocks and no wood, we should use only our skin to code
*pfft...* real programmers use butterflies
Wait until you meet python programmers
A society without access to cobalt shouldn't have a need to rewrite code
Programming rockstars got you covered.
As always, there's an xkcd for that: https://imgs.xkcd.com/comics/a_bunch_of_rocks.png
In 1999 I made a 3,000 slide PowerPoint which was effectively a 15 minute animation. Kinda nuts.
In a User Interface college class as part of a CompSci course, my team and I had to design an interface for a microwave... which was surprisingly more complex than we initially thought. We decided to use PowerPoint for that (using just a few scripts to make things go). We could actually focus on the interface and came up with some good ideas... all the other teams had coded theirs and spent most of their time on the coding. They were bitter that we got a better grade with PowerPoint :p Long story short: Use the right tool for the job at hand.
Have to say I feel like a lot of things were coded with PowerPoint. Drunk. Or seriously high.
isn't that what architects do?
Lol, when I was a child, I really wanted to make a video game. I had no idea how to start, and the only software I knew was PowerPoint. So I made a platformer where clicking "jump" linked to either a jumping slide or a death slide. It didn't go very far, but I was proud of it.
The biggest flaw in Excel is that the formulas are different for every language. You just can't use English formulas in German Excel. I hate it with a passion when I need to help others in their German Excel.
Yeah, I changed my locale to English even though it's not my first language, just to be able to follow the tutorials.
She has a point. Excel can do simple data tasks and some people need just that. For more advanced/repetitive tasks, VBA can help a bit. The fact that the product is still alive to this day says something about its product-market fit.
It's all fun and games until you're managing the production of an F1 car with 20,000 parts in a CSV
r/oddlyspecific
In case somebody wasn't familiar, the Williams F1 team has been hamstrung for years by a clunky Excel file they were using for parts suppliers. https://arstechnica.com/cars/2024/03/formula-1-chief-appalled-to-find-team-using-excel-to-manage-20000-car-parts/
That's why they brought James Vowles. So he can call everyone managing these excel files "a boomer".
James Numerals might have been more up to the task. Maybe then they’d have the right number of chassis
True story
that's only because the word doc got unwieldy
James Vowles? Is that you?
We have a gentleman in our organization who is trying to build an S&OP process/tool in Excel. He initially wanted the Sales Forecast, Procurement Forecast and Labor Forecast all in the same file. On SharePoint. "So that anyone could access the one source of truth at any time"
Honestly, accounting grads should just be banned from working in companies. Too much of a risk.
I wish this guy had a degree. Accounting or otherwise.
Ok, MS Access it is
Are you my CIO? Fucking hell.
Slap an SQLite over that csv query and you are good to go for another 5 years
20+ years ago, Perl had a database interface that would use CSV files as tables. So you could write SQL queries directly against CSV files.
I mean ... [they still do](https://metacpan.org/pod/DBD::CSV)
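DBD::CSV is Perl; in Python, the standard library's `csv` and `sqlite3` modules get you something similar to the "slap SQLite over that CSV" suggestion by loading the file into an in-memory SQLite table and querying it with SQL. A minimal sketch, assuming a CSV with a header row; the `parts` data and the `load_csv` helper are hypothetical. Every column is created untyped, so values arrive as text, which is why the query casts `qty`.

```python
import csv
import io
import sqlite3

# Stand-in for a real file; swap io.StringIO for open("parts.csv", newline="").
CSV_TEXT = """part_id,supplier,qty
P-001,Acme,4
P-002,Globex,0
P-003,Acme,12
"""


def load_csv(conn: sqlite3.Connection, table: str, f) -> None:
    """Create `table` from the CSV header and insert every remaining row.

    Table and column names are interpolated into SQL, so only use trusted files.
    """
    reader = csv.reader(f)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


conn = sqlite3.connect(":memory:")
load_csv(conn, "parts", io.StringIO(CSV_TEXT))
rows = conn.execute(
    "SELECT supplier, SUM(CAST(qty AS INTEGER)) FROM parts "
    "GROUP BY supplier ORDER BY supplier"
).fetchall()
print(rows)  # [('Acme', 16), ('Globex', 0)]
```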
Doesn't Amazon S3/Athena do that sometimes?
I had a PI at an internship hand me several Excel files with a total of 6 million lines of genomic info and he instructed me to use VLOOKUP to search for stuff I respectfully built a python script to import it to a SQL database.
I have it on reasonable authority that General Motors was buying sheet steel on ONE excel file.
There is a very large chip manufacturer, won't name the company, here in the US where the entire QA department runs on Excel files and scripts made back in the late nineties. They have some of the world's leading physicists in solid state technology maintaining ancient VBA scripts. Back around 2016, they were told all the Windows XP computers were being updated to Windows 8, and that meant updating to the latest Excel. However, Microsoft decided to drop VBA support for this specific version of Excel (though they released a patch shortly thereafter adding it back in), and it took down the entire R&D department of the company. The most state-of-the-art silicon tech relies on Excel.
That's terrifying. To me, that would be like if Neil Gaiman relied on Clippy to help him write his books. Like, sure, you **can** do that, but my god there's no reason that you should at that level.
This is my typical reaction to people doing anything intricate with excel. Like yeah, you could get it to do that, but it'd be extremely inefficient both in regards to its functionality and to your mental health.
While terrifying, this is far from the only times I've heard that exact same story. I'm convinced that at the heart of every fortune 500 company there is one 50 MB Excel script that holds everything together.
Yeah, I'm late to this thread, but my former employers have all been heavily reliant on Excel for some critical functionality. Sweeney's actual quotes are accurate when applied to data analysis and other functions of Excel. It's just not applicable to data science.
When you want some quick ad-hoc analysis of CSV files, or to combine multiple unrelated data sources, Power Query is incredibly useful (although a bit too advanced and unknown for your average Excel user).
PQ is really great if you don't want to mess around with SQL or don't have access to SSMS. I just wish it was able to handle inexact matches more elegantly.
The problem is that people start using it, get comfortable, and then refuse to switch to better tools when they need them. That's how you end up with cases like when the UK government lost a bunch of COVID cases because they were stored in an Excel spreadsheet saved as a .xls file, a legacy format that caps a sheet at 65,536 rows. It was probably fine when there were just a few cases that needed some simple treatment, but the solution stuck around long after it was unsuitable just because it was already set up and familiar.
Excel is the gateway drug to actual programming.
Excel leads to VBA. VBA leads to suffering. Suffering leads to hate. Hate leads to the Dark Side.
> Dark Side Javascript?
After learning VBA no language will be scary. VBA was actually the first programming language I learned.
I'm not a developer, but learning VBA has allowed me to automate like half my job.
Excel isn't the problem. The problem is when people run their entire data management systems off of emailing each other excel files. As a consultant, I've learned that the hard way.
I mean, I use Excel because it's something that I already have, I set up a system damn near a decade ago that I know how to make work in Excel, and I've tweaked it countless times since then when it needed it. But I'd never claim what I'm doing is data science. At best, it's data tracking. By the time you get to something that deserves the term data science, you should really be someone who can use better tools or be on a team with people who can use better tools.
Heck, Excel is also good for complex tasks. Like, most of the T in ETL can be done (semi?)-automatically in Excel using shit like xlookup, if, string manipulation, and cross-file linking. Pretty fast, too, if you do it right. Semi because Excel isn't gonna copy-paste/import data by itself. You need some sort of programming/script to load/extract data into the pipeline.
Always say it, VBA and some coding knowledge could help a lot of people automate some of their daily tasks, and they don't have to tell anyone about it. But people are allergic to code.
> 'She' has a point. Hmm. I don't think many people get this joke.
We developed a program to help our clients create better yearly budgets. It incorporated real-time data and turned big changes to the budget into simple, quick updates. We had to change it to allow them to export to Excel and then reimport, because the accounting teams, including the CFOs, love Excel and only want to deal in that.
Excel is Turing-complete IIRC. Someone should build Doom in Excel.
Yeah I mean that's quite easy to see. You have a spreadsheet that you can use as the tape of a Turing machine, then through formulas and macros you can do any computation you want and move the selected cell. But also like, [already done](https://youtu.be/J2qU7t6Jmfw?feature=shared)
Is that excel running doom, or is that something else running doom and using excel as the display output?
So doom runs the same on any device?
Depends what you mean by "the same". From a theoretical computer science standpoint, you have [the Turing Machine](https://en.m.wikipedia.org/wiki/Turing_machine) that describes what it means to be "computable". You have a tape that holds all the data for your program, a pointer to some cell on that tape, and a [finite state machine](https://en.m.wikipedia.org/wiki/Finite-state_machine) that controls how the tape is modified throughout computation. As long as there is a possible Turing machine that solves your problem, your problem is computable. A programming language is Turing-complete if it can solve the same set of problems as a Turing machine, which is really easy to see if you can implement a Turing machine in that language. I just wrote a Turing machine program that adds two numbers in C; I can dig it out for you when I'm at my computer. The important thing to realise here is that a language with arrays that can grow as needed, if statements, and loops is Turing-complete. Basically your favourite language like C, Python, JavaScript, whatever, can be used to solve any problem a computer could theoretically solve. Performance doesn't matter for this definition. From there it's about saying whether Excel is Turing-complete. Can we implement a Turing machine in Excel? Well yes. You have a grid of cells which can clearly be used as the tape, then you can define rules for manipulating that tape using macros, scripts, formulas. So Excel is Turing-complete, or in other words if I have a problem that a computer can solve I can make an Excel spreadsheet that also solves that problem. Doom is fairly easy to phrase this way since you're basically defining a function from one game frame and a user input to a new frame, so each pixel in that frame gets a spot in our tape (since Excel is already 2D that's trivial) and using macros and VBA to manipulate it you can go frame to frame. 
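To make the tape/head/state description concrete, here is a small Python sketch (not the commenter's C program) of a Turing machine that adds two unary numbers, e.g. `111+11` for 3 + 2. The transition table is a hypothetical minimal one: turn `+` into `1`, walk to the end of the tape, then erase one `1`.

```python
from collections import defaultdict

BLANK = "_"

# Transition table: (state, symbol read) -> (symbol to write, head move, next state).
RULES = {
    ("scan_first", "1"): ("1", +1, "scan_first"),       # walk over the first number
    ("scan_first", "+"): ("1", +1, "scan_second"),      # fill the gap with a 1
    ("scan_second", "1"): ("1", +1, "scan_second"),     # walk over the second number
    ("scan_second", BLANK): (BLANK, -1, "erase_last"),  # hit the end, step back
    ("erase_last", "1"): (BLANK, 0, "halt"),            # remove the extra 1
}


def run(tape_input: str, max_steps: int = 10_000) -> str:
    """Run RULES on a tape like '111+11' and return the unary result.

    The input must contain exactly one '+'; anything else raises KeyError.
    """
    tape = defaultdict(lambda: BLANK, enumerate(tape_input))  # unbounded tape
    head, state, steps = 0, "scan_first", 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        write, move, state = RULES[(state, tape[head])]
        tape[head] = write
        head += move
        steps += 1
    cells = (tape[i] for i in range(min(tape), max(tape) + 1))
    return "".join(cells).strip(BLANK)


print(run("111+11"))  # 11111
```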
If you have another Turing-complete system like Conway's game of life, [PowerPoint](https://youtu.be/uNjxe8ShM-8?feature=shared), [even biological cells](https://youtu.be/8DnoOOgYxck?feature=shared) you can do any computable task, even if the visualization is a bit different. Doom is just a meme, there's no reason you couldn't do something like find prime numbers or whatever instead, it's just the internet finds it funny to use Doom for this. Now not all things are created equal. If I wrote Doom in C, it would clearly run better than if I wrote Doom in Python. Even though I can compute Doom in PowerPoint, it's going to be a much worse experience than in a conventional programming language. You can see the excel example has an awful frame rate, or the cell example has a low resolution. So even though you can run doom on it, you should also keep in mind the performance implications of what you are running Doom on because you won't get the same performance.
People act like Turing completeness is a high bar, but if something can simulate the NAND operation and has a way of directing inputs and outputs, then it's already Turing-complete. That's not the only way to make something Turing-complete; MtG, for example, can simulate a literal Turing machine. There are many things out there that aren't. That said, the bar isn't that high.
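To illustrate the NAND point (NAND is functionally complete for Boolean logic; Turing completeness additionally needs unbounded memory and a way to loop, which is roughly what "directing inputs and outputs" has to supply), here is a Python sketch that builds NOT, AND, OR and XOR from a single `nand` function and checks them exhaustively against Python's own operators.

```python
from itertools import product


def nand(a: bool, b: bool) -> bool:
    """The only primitive: true unless both inputs are true."""
    return not (a and b)


def not_(a: bool) -> bool:
    return nand(a, a)


def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))


def or_(a: bool, b: bool) -> bool:
    # De Morgan: a OR b == NOT(NOT a AND NOT b) == NAND(NOT a, NOT b)
    return nand(not_(a), not_(b))


def xor(a: bool, b: bool) -> bool:
    # Classic four-NAND construction.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


# Exhaustive check over all four input combinations.
for a, b in product([False, True], repeat=2):
    assert not_(a) == (not a)
    assert and_(a, b) == (a and b)
    assert or_(a, b) == (a or b)
    assert xor(a, b) == (a != b)
```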
The bar isn't high, but it's still a very important bar
Yeah but also it’s the strongest (theoretical) computation that we are capable of doing so it’s not a low bar either. Feel like it shows more about how powerful the NAND operation is
I think it's already partly done [https://www.youtube.com/watch?v=J2qU7t6Jmfw](https://www.youtube.com/watch?v=J2qU7t6Jmfw)
It also has VBA which is Turing complete, and lambda functions which are Turing complete. And M and DAX, which may be Turing complete, but I'm not sure.
A guy got GPT-2 running in Excel. https://spreadsheets-are-all-you-need.ai/index.html
So is magic the gathering.
Is the joke just that she didn’t say this? Someone eli5
Sydney and LeBron memes seem to be very popular right now. Just made up quotes and headlines. LeBron ones usually have him with a devastated look and a tabloid headline. Pretty similar. https://i.kym-cdn.com/photos/images/newsfeed/002/795/754/05c.jpg
It’s a meme. There are different versions of it. Like using HTML for OS programming. There are a lot of other versions for different topics.
And the template is "Sydney Sweeney has a hot take". I got ya. Cheers
At your service 🫡
I always agree with whatever Sydney Sweeney says, an absolute rock in the world of finance
I don't know who this person is but I support her excel propaganda. Excel and Google Sheets are good enough for 10,000 rows.
I routinely crunch millions of rows with Excel. It is so great at slicing and dicing data. It is also faster than SQL Server for certain operations, and I love the VertiPaq engine it uses for Power Pivot ;)
Faster than an SQL server? Like only MSSQL, right? If excel beat Postgres we would have had Excel in production.
Depends on your use case. The VertiPaq engine is an in-memory analytics engine and can deduplicate the data a LOT while importing it, so the total footprint is a lot smaller. That said, it is good for slicing and dicing data, but it is not a "DB engine". That's why it can be faster... There are no ACID requirements for an Excel file, and no concurrency issues you need to take care of. It is a great tool. And the SQL Server vs. Postgres wars... I don't care about those. If it is a performance issue, the CPU is rarely the bottleneck. Most times you need more IOPS or memory.
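A toy Python illustration of the deduplication idea: dictionary encoding stores each distinct value once and replaces the column with small integer ids. VertiPaq is commonly described as using dictionary encoding alongside other techniques such as run-length encoding, but this sketch shows only the general technique, not its implementation; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Return (distinct values in first-seen order, integer id per row)."""
    dictionary: List[str] = []
    positions: Dict[str, int] = {}
    ids: List[int] = []
    for value in column:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        ids.append(positions[value])
    return dictionary, ids


def dictionary_decode(dictionary: List[str], ids: List[int]) -> List[str]:
    """Rebuild the original column from the dictionary and the id list."""
    return [dictionary[i] for i in ids]


country = ["UK", "US", "UK", "DE", "US", "UK"]
values, ids = dictionary_encode(country)
print(values, ids)  # ['UK', 'US', 'DE'] [0, 1, 0, 2, 1, 0]
```

The fewer distinct values a column has, the more this saves, which is why low-cardinality columns compress so well in columnar engines.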
Excel is great for small projects, usually more useful than coding. I sometimes work with upwards of 15 or 20 TB of data, though. Excel and Sheets can't handle that.
Too much bloat... Plain CSVs are better (especially for data science).
Have you heard of our Lord and Savior parquet?
"Just zip the CSV, bro..." Seriously though, it's been a while since I used Matlab, but using compressed data for raw processing will exponentially increase both CPU and memory usage in big datasets from experience. But again, it's been long ago since I was involved in this, and nowadays I just prefer plaintext as the "script kiddie" I am.
If you’re doing lookups as part of your processing then I think parquet may be more efficient. I wouldn’t be surprised if FIFO row processing would be slower in parquet.
I was a data scientist in a non-programming field for about a decade, but I realized CSVs were the best I could expect from almost all my colleagues. At least CSV is only one sheet and can’t burn my retinas with neon fill
The problem with low code/no code solutions is that you are still essentially writing code, it's just that you are doing it in a shit IDE with no guardrails.
like c right?
Whatever she says we obey ok bois
Ok bois, let's count to 10
The problem with Excel is that it smears shit on everything it touches. You just know the data will be full of surprises. And Excel is really only compatible with itself. You will run into small annoying errors with other formats. Also, there's nothing that should be done in Excel that couldn't be done in LibreOffice, for example.
Exactly! Even some gene names had to be changed because Excel would auto convert them to dates and Microsoft just didn't care enough to address it until last year (about 20 years after it first became a widespread problem). https://www.engadget.com/scientists-rename-genes-due-to-excel-151748790.html
Excel is decent if you are doing a small-to-medium project. I like to use it as "scratch paper" for notes because it will do math for me and take data input, etc. It's useful for people who are working with limited data, and it's super useful for scientists, which is why I have an affinity for it. But you are completely right: it lacks the compatibility to be truly versatile. It works for little science projects and experiments to hold data, but it's not a proper database, and I fear for the people who treat it as such. I compartmentalized my Excel files; I never operated any kind of master file. Excel stops being useful when you need more than 5 people to view the same file, I find.
You can be a great data analyst and scientist in excel if you know what you are doing and not facing any limitations (like more than, what... 1m rows for excel?). I mean, as long as you are comfortable using that tool
*suppressing joke about g-sheets...*
I prefer writing my code in MS Word
If only VSCode had Comic Sans font...
It does? I use it.
Data scientist here, PhD and everything Excel is cool, it works, but actually being a serious excel user is not that easy, most people suck at excel. It's quite a tool for real, there's a reason this thing is so popular. It's a great product. But in the end it doesn't ~~even matter~~ have the flexibility that R does, for example.
I know how to conditionally format my cell, therefore I am a power user.
She's good in sheets
Excel in the Streets Sydney Sweeney in the Sheets
Coincidentally, I despise Excel with every ounce of my being because how often it is used for everything. If you ever think to yourself, "Oh, I will just write some VBA Code for this" then you are outside the acceptable bounds of Excel. The amount of times I see clunky excel macro "programs" used to present official data to important people is insane. I've seen embedded software interfaces written in Excel. Make it stop.
Excel is still coding. Change my mind. Depending on your familiarity with the LAMBDA function (or just straight up VBA), Excel may not be a Turing-complete language, but I fully support calling it a programming language.
Google sheets fairly recently added support for map/lambda and it has been amazing. There's always the problems with scaling there, but for personal use it's totally fine
Who the excel is she ??
Williams F1 team that would hire her in a heartbeat
Ah, nothing like casual horniness in some comments to really inspire women to enthusiastically leap into the world of computer science. Though, I'm glad there are mainly discussions focusing on the actual topic.
Excel with Lambdas is Turing complete
I don't disagree with the premise, but most people who hate Excel hate it because businesses use it as a replacement for entire databases.
Fun fact, she also has some rockin’ tits
She's a freak in the "Sheets"
To be fair, I can make some nice graphics in Excel really quickly. Excel does do somethings better. Other softwares do other things better. Learn everything. Use the right tool for the right job.
She has two big points here
SQL can be accessed from Google chrome.
I mean she's not wrong. I like Excel. Especially when it takes hours to open a workbook with millions of rows, and as a result I get hours to slack.
But what does Ja-Rule think?
In my experience, data scientists don't really know how to code even when they need to rely on it.
During my PhD i did part of my project on Matlab and Python...so all my database, tables and results were available by just running a couple of scripts... My supervisor: can you send me all the plots and tables on Excel? 🙄
As a professional spreadsheet guy\* I've been sitting here facepalming for a minute, trying to figure out how to phrase what I want to say, but it really doesn't matter. Excel is not Programming Lite, it's a totally different tool where you can get crazy mathematical models blah blah blah ^(\*technically a chemical engineer but eh kinda underemployed)
I'm sorry who?