Channel: Open Codex by Joseph Reagle

Improving PDF Annotations from GoodReader

December 17, 2019, 9:00 pm

For many years now, I’ve printed out PDFs and scribbled annotations on them. I then dictate my annotations (i.e., excerpts and comments) into a text file that I can transform and include in my bibliographic mindmap system (see de.py in thunderdell).

With the purchase of an iPad—I gave up on waiting for a decent Android tablet—I’m now annotating PDFs via the GoodReader app. Of course, the accuracy of the text highlighted is only as good as the PDF. The copyable text, generated by OCR, can have conjoined words or suffer from errors resulting from misunderstood ligatures, accents, or cruft. Also, the actual page number of the PDF probably doesn’t correspond to the document’s pagination.

With the short Python script gr-fix.py, I use a dictionary to correct OCR errors and transform the GoodReader format into the one used by de.py. This doesn’t correct everything (e.g., words with capitals) and can introduce a few errors itself, but it greatly improves on the original OCR. The --number argument also lets you correct the page numbers by an offset.
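To make the idea concrete, here is a minimal sketch of dictionary-based cleanup. It is not the code in gr-fix.py: the function names, the tiny `VOCAB` word set, and the sign convention for the page offset are all assumptions made for illustration, and it works on plain strings rather than on the actual GoodReader or de.py formats.

```python
import re

# Hypothetical stand-in for a real word list (e.g., a system dictionary file).
VOCAB = {"the", "final", "draft", "of", "report", "was", "filed"}

# Unicode ligature characters that PDF text extraction can leave behind.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def expand_ligatures(text):
    """Replace single-character ligatures with their plain letters."""
    for ligature, plain in LIGATURES.items():
        text = text.replace(ligature, plain)
    return text


def split_conjoined(word, vocab=VOCAB):
    """Split an unknown lowercase word into two known words, if possible.

    Words containing capitals are returned unchanged, so proper nouns
    are never split (and conjoined capitalized words stay uncorrected).
    """
    if not word.islower() or word in vocab:
        return word
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in vocab and right in vocab:
            return f"{left} {right}"
    return word


def fix_text(text, vocab=VOCAB):
    """Expand ligatures, then try to split each alphabetic run."""
    text = expand_ligatures(text)
    return re.sub(r"[A-Za-z]+", lambda m: split_conjoined(m.group(0), vocab), text)


def fix_page(pdf_page, offset):
    """Shift a PDF page index by an offset (assumed here to be added)."""
    return pdf_page + offset


if __name__ == "__main__":
    print(fix_text("The \ufb01naldraft ofthe report was \ufb01led."))
    print(fix_page(14, -12))
```

With a real word list, the first-match split can pick the wrong boundary, which is one way a fixer like this can introduce errors of its own.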
