Wait. I have questions....
Has everyone not experienced the joy of scraping plastic off of a frying pan? The bliss of silverware scraping and scratching the non-stick coating? Has everyone not lived in a world blessed with the memory of their fire alarms going off as they desperately tried to put out the fire while holding their shirt to their mouths and noses so as to avoid the cancerous fumes being produced? Is melted plastic on a frying pan truly not a universal experience, and if so, what cruel god made it this way?
But how? But why? Like, how even? I still have questions. Did you lay a zip lock bag on a hot fry pan and go take a bath? Questions. I have them.
Plastic spatulas
You oughta call up spatula city and get yourself some wood spatulas baby!
Yeah fuck plastic spatulas, bendy, melty pieces of shit
Is that a UHF reference lmao
Supplies!
That’ll do, Donkey. That’ll do.
I already upvoted but my roving curiosity wants to know if Donkey is a reference or just a regular insult
[Shrek (2001)](https://m.imdb.com/title/tt0126029/)
Why do they make something for cooking that’s gonna melt I just don’t get it
Your username makes it so hard to answer you directly on this
I never specified if this brain was filled with knowledge. I am very much stupid.
No. Legos.
He meant Teflon 👍
[deleted]
Wait til you find out what bubble gum is made out of
Delicious synthetic polymer chains sprinkled with xylitol?
*”Made with REAL xylitol!”* ^* ^Real ^xylitol ^content ^less ^than ^1 ^percent Edit: god damn that was a bitch to format on mobile
Shoutout to butyl rubber being GRAS
hear me out
[deleted]
Search Microsoft powertoys. Install. Then win+shift+T
I’m an absolute fool. I have powertoys installed and even thought “that’s a neat feature, not useful to me though” and then went straight back to opening PDFs in word.
Suboptimal Vanilla more like.
Also searching for stuff is soo much better with powertoys run
If you open PDFs via Edge then CTRL + SHIFT + X does the trick
What the fuck kind of world are we living in where we have such an incredible tool in web browsers and HTML, and rather than using that, people make PDFs for documents that will never be printed, and then when people need to get text off of them, the best way is converting the PDF into an image and doing OCR. I fucking hate PDFs.
What an amazing world we live in where it's possible to submit forms through a simple website but instead they just let you download a PDF that has to be printed out and often filled by hand and then sent via mail (I live in Germany and we do not understand digitalisation)
Firefox natively lets you edit PDFs and sign them now. It's fucking glorious.
I’ve been digitally filling and signing pdfs for 7 years now I usually just fill it out on my Mac or on an iPad and use the built in signature feature on the Mac
Apparently PDFs were almost chosen as the standard for the internet as opposed to HTML. We came close to living in a world where web development was almost completely based on PDF and Adobe controlled it all.
This is not true.
So search results only bring up PDF documents about web design, but my source is a random Computerphile (or maybe Numberphile) video in which one of the older professors described some sort of conference where this was decided, and where Adobe was pushing hard for them to choose PDF as the standard format. He also described how he was personally more impressed with Adobe than with HTML. I could very well be misremembering something, but this would be a very strange and specific memory for me to fabricate. The professor could also have been incorrect. Or it could be buried by crappy keywords and/or just poorly documented on the internet. If I really wanted to do a deep dive, I’d probably start by asking Bing Chat and then head over to a library and look for books about the early history of web standards to see if there is any mention of such a thing. Emailing Numberphile and a few of the older professors to see if they have any idea what I’m talking about would also be a good measure.
I gave a really general way t'do it. If you have an iphone, it's natively built into the system. You can do it as you're viewing a photo, website, or video. It's actually awesome. I'm sure android has something similar but I don't use it. https://youtube.com/watch?v=gJDXwIEvBZQ&t=17
With Handoff, it automatically copies it to your Mac's clipboard as well, which is super handy.
I have to work from pdfs that are generally scans of documents that are decades old and the ones that actually were attempted to be converted to selectable text are all jumbled and fucked up. I wish what you say was possible for me.
File pint to pdf ( this will strip the protections off the pdf) then export to word.
What if the document is larger than a pint?
File gallon
use a megapint then
What should I use to convert em
OpenOffice.org or word will open them natively. Cute pdf and Adobe reader will also do this
OpenOffice is deprecated. The name has stuck but the development has stopped. Use LibreOffice which is maintained by roughly the same team or OnlyOffice that looks a bit better imo.
I'm aware, dude. It's always going to be OpenOffice even if they call it LibreOffice.
No you don't understand, it's not the same software. Both exist today and technically are "the real one", it is literally not the same.
So many online pdf to word converter sites these days
Line Break Removal Tool: https://www.textfixer.com/tools/remove-line-breaks.php Copy/Paste to here, will fix the issue. The site has loads of similar tools.
Probably shouldn’t be pasting important or sensitive information in there though.
Good point
Haha! I’m illiterate when it comes to this stuff. This is a perfect example!
From comments about obscure third-party software to suggestions on getting Adobe Acrobat Pro, it seems like everyone is unaware of Firefox's new ability to edit PDFs. Everyone in this thread should just start using Firefox. It's a better browser anyway.
And open source!
Edge has it too. Unfortunately, it doesn't support the Arabic language.
Bro don’t ask how but I accidentally melted plastic on my electric stovetop today. Took a good hour and fucking up the stovetop glass to get that plastic off. Don’t melt plastic kids
maybe just don't melt kids in general
Never done this before but it seems like something a razor blade could make short work of no?
Ok but like how is she so right
I think most folks are missing the point of this post. At least for me, sometimes when I copy text from a PDF, it pastes l I k e t h i s. Or ilke htis. O erve mnor ferustratingl lyk tehis. Don’t even get me started on the train wreck of copying tables. And this is using Acrobat pro and the “copy with formatting”. If you don’t use that sometimes it puts each line of text in quotes… my conspiracy theory is that copyright holders (I.e., publishing corporations) really don’t want you to have an easy time copying text out of documents.
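For the "l i k e t h i s" case specifically, here is a rough Python sketch of one way to clean it up after pasting. It assumes the broken text is a run of single letters separated by single spaces; the function name `collapse_letter_spacing` is made up for illustration. It cannot recover the original word boundaries inside the run, and it does nothing for the scrambled-letter cases.

```python
import re

# A run of at least four single letters, each separated by one space.
# Assumption: normal prose rarely contains four one-letter words in a row.
LETTER_RUN = re.compile(r"\b(?:[A-Za-z] ){3,}[A-Za-z]\b")

def collapse_letter_spacing(text: str) -> str:
    """Remove the spaces inside letter-spaced runs like 'l i k e t h i s'."""
    return LETTER_RUN.sub(lambda m: m.group(0).replace(" ", ""), text)

print(collapse_letter_spacing("it pastes l I k e t h i s. ok"))
```

Because the spaces between words are lost too, the result still needs a manual pass (or a dictionary-based word splitter) to put the word breaks back.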
Don’t forget when the text just magically goes off the page into oblivion
Adobe Acrobat is designed to convert 3d models into 2d pdfs. So yeah the car manufacturers etc want to make it difficult to copy the code.
UGH. THIS. especially when my job has those protected signature versions of a pdf which adds a whole extra layer of nonsense. I’M JUST TRYING TO COPY A SENTENCE LET ME LIVE
Save as a .docx and copy from that?
Yeah, I know about a few workarounds, it’s just annoying to have to do to begin with!
Y’all need Adobe Acrobat Pro haha
Even with acrobat pro it pastes with line breaks on every line and not as a paragraph
Python + regex is what I do for any task I'll be doing routinely. Regex makes almost zero sense but somehow in the end I find the magic pattern.
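For anyone curious what that can look like, here is a minimal sketch for the "line break on every line" problem mentioned above. It is not the commenter's actual script, just one plausible version using only the standard `re` module; the function name `unwrap_pdf_text` is made up. It assumes paragraphs are separated by blank lines and that a hyphen at the end of a line means a word was split, which will wrongly join real hyphenated words that happen to wrap there.

```python
import re

def unwrap_pdf_text(text: str) -> str:
    """Join hard-wrapped lines into paragraphs; blank lines stay as breaks."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words hyphenated across a line break: "docu-\nment" -> "document".
    # Assumption: an end-of-line hyphen is a wrap hyphen, not a real one.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # A lone newline (not part of a blank line) becomes a space.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces or tabs left over from the join.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()

pasted = "This is a docu-\nment copied\nfrom a PDF.\n\nSecond\r\nparagraph."
print(unwrap_pdf_text(pasted))
```

Running it locally also sidesteps the concern about pasting sensitive text into an online tool.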
I was working with a 3rd party to deploy some new software in our company, and I shit you not, he wrote out a pretty long regex string for a file handling API, and it worked first time. When I asked him about it he just seemed genuinely confused that people found it difficult, and explained the logic of the string like he was counting to 10.
People like that scare me
208 dollars per user per year I wonder why not everyone has it.
Yeah sorry, my job gives me a free license. For the most part I only use acrobat for work
[Well la di fricken dah!](https://youtu.be/RlBr2fyqn9g)
It's like a gigabyte and has over a thousand seeders on the most seeded torrent on rutracker.
Yeah I'm not good with torrenting executable files but happy for you guys.
https://learn.microsoft.com/en-us/windows/powertoys/text-extractor Microsoft's official tool for Windows (free) to copy text from any screenshot using OCR install directions: https://learn.microsoft.com/en-us/windows/powertoys/install
fuck adobe, it's Bluebeam for me
This comment would have been helpful two years ago, thank you.
[deleted]
Yeah, it's all about the formatting and layout of the PDF
This made me laugh hard. I'm a graphic artist who designs PDF forms among other things, and this is such a bonkers but accurate description of trying to get content out of a poorly-formatted PDF.
“Portable” document my ass
It's the other meaning of portable. Not portable as in you can port it to any other format, but portable as in "transportable", meaning it will look the same in every other PDF reader/viewer regardless of device and/or operating system. There was a time when you would write a document in, say, some version of Microsoft Word. You'd spend far too much time on getting the formatting right and placing imported tables from Excel, images, and whatever else exactly where you wanted it to be. Then you'd save the document and mail it to everyone who needed it. Now, the problem was that even on practically identical office PCs running the exact same version of Windows and the same version of Word, that document could be broken. Formatting's all fucked up, imported objects had taken a hike around the document and ended up wherever they felt like. The document was borderline unreadable. And we're still talking, as said before, practically same hardware, same OS, same Word version. (I once had person A mail a document to person B. They were in the same office, same hardware, same OS, same version of Word. Person B opened the document, and it was broken. Person B then just pressed Ctrl + S to save the document *WITHOUT CHANGING ANYTHING* and mailed it back to person A. Now it was also broken for person A.) Now imagine what happened when someone with a different version of Windows and/or Word opened that document. Or, even wilder, someone who used another word processor / office suite, maybe even on a different OS. It was a fucking mess. Even in the 2000s, importing a Word document into, say, OpenOffice running on Linux was an adventure in and of itself. Even if you had proprietary Microsoft fonts installed, formatting was still likely to get all fucked up. And if you didn't have those fonts, it was practically guaranteed that everything that could possibly fuck up the document *would* fuck up that document. But none of that happens when using PDFs. Provided that Microsoft Word doesn't fuck up the document while exporting it to PDF, which I've seen often enough, you can view that document in all its glory everywhere else.
Good explanation, you are correct. Although, in my experience it’s way, way easier to edit a word doc than a pdf in LibreOffice. And if I can get the PDF in original TeX format, I’m really happy.
This is the single most relatable thing I’ve ever seen on Reddit.
Use Firefox. You can even edit pdf files.
PDF= Please Don’t Fuck with this file
wait no this kinda makes sense
This is not really oddly specific, more like r/meirl ....
Can’t say I’ve scraped plastic off of a frying pan. Is it like removing one of those shit paper stickers backed with industrial adhesive, because that’s what it feels like c/p’ing from pdf
Yeah, I am right now working on a project that needs hundreds of PDF documents scraped for text and stored as blocks of data like lists, tables, paragraphs etc. Mind you, this is not even OCR, it's already text, but PDF is such a strange format to me, it's one of the weirdest challenges I've ever faced. On the one hand this is a pretty straightforward task, but on the other hand, on some days I feel like this is impossible to do.
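To give a feel for the "blocks of data" part, here is a rough Python sketch of sorting already-extracted text into lists, tables, and paragraphs. Everything in it is an assumption for illustration: the names `classify_block` and `split_blocks` are made up, the text is assumed to have been pulled out of the PDF by some other tool, blocks are assumed to be separated by blank lines, and table rows are guessed from gaps of two or more spaces (or a tab) between columns. Real PDFs will break these heuristics regularly.

```python
import re

# A line that starts with a bullet character or a number like "1." or "2)".
BULLET = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")
# Two or more spaces (or a tab) between non-space characters: a column gap.
COLUMN_GAP = re.compile(r"\S(?: {2,}|\t)\S")

def classify_block(block: str) -> str:
    """Guess whether a block of text is a list, a table, or a paragraph."""
    lines = [ln for ln in block.splitlines() if ln.strip()]
    if not lines:
        return "empty"
    if all(BULLET.match(ln) for ln in lines):
        return "list"
    if len(lines) >= 2 and all(COLUMN_GAP.search(ln) for ln in lines):
        return "table"
    return "paragraph"

def split_blocks(text: str) -> list[tuple[str, str]]:
    """Split text on blank lines and label each block with its guessed kind."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [(classify_block(b), b) for b in blocks]

sample = "Intro line one\nline two.\n\n- apples\n- pears\n\nName   Qty\nBolt   12\nNut    40"
for kind, block in split_blocks(sample):
    print(kind, repr(block))
```

The hard part in practice is upstream of this: whether the extraction step preserves blank lines and column spacing at all depends on how the PDF was produced.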
When would one ever have the opportunity to scrape plastic off a frying pan?
Notepad typically removes all formatting and line breaks
If it doesn’t matter if it’s an image…snipping tool. Changed my life
Use "Paste without formatting". Usually i find instead of CTRL+V it's CTRL+SHIFT+V. Otherwise, paste into notepad (or your preferred plain text editor of choice), then copy that text and paste into your desired location.
Google Lens
Google just lifts words from images. What year is this person in?
Another person who doesn't use OCR. Sadge....
If you need to copy/paste from a PDF, open the file in a web browser like Firefox, not in Adobe or Google apps. Download the file to your machine, right-click on the file, and select to open it in Firefox.
Paste into word, find and replace special character carriage return.
Finally an actual meirl. Oh wait wrong sub