How to read a table from a PDF
I have a PDF that contains text within a table.
I am able to read the text into a variable, but I then get a single string with all the text in it.
I use extractFileText to read it into a string.
How can I then turn this text into a table?
I've pasted a sample of the string I read in below. It has no table column names, just the actual data.
What I want to do is ignore the first two rows below; after them you see the records (lines).
Each line needs to be a row in the table, and the delimiter between column values is the three arrows (which I think are newlines).
Weekly Gazettes 1 ↵↵↵
NEW SOUTH WALES WEEKLY ISSUE ↵ ↵↵↵
3 RIVERS ESTATE, 140 001 976 ↵↵↵374 KALKITE RD KALKITE NSW 2627 ↵↵↵Creditor: CONSULT SURVEY GRA PTY LTD ↵↵↵DEFAULT JUDGEMENT (NSW) 02/11/2020 ↵↵↵00262008/20/163, $113,237.00 ↵↵↵
ABCD PROJECTS, 618 354 331 ↵↵↵8 17 GARTMORE AVE BANKSTOWN NSW 2200 ↵↵↵Creditor: WORKERS COMPENSATION NOMINAL I ↵↵↵DEFAULT JUDGEMENT (NSW) 03/11/2020 ↵↵↵00063818/20/METN, $2,553.00 ↵↵↵
ABOUT CONCRETE CONSTRUCTIONS, 156 080 241 ↵↵↵46 NEW HORIZON AVE BAHRS SCRUB QLD 4207 ↵↵↵Creditor: HUSQVARNA AUSTRALIA PTY LTD ↵↵↵DEFAULT JUDGEMENT (NSW) 03/11/2020 ↵↵↵00223837/20/3, $1,298.00 ↵↵↵
AC SHOPFITTING SPECIALIST, 635 292 376 ↵↵↵12 CURTIN ST CABRAMATTA NSW 2166 ↵↵↵Creditor: WORKERS COMPENSATION NOMINAL I ↵↵↵DEFAULT JUDGEMENT (NSW) 06/11/2020 ↵↵↵00266709/20/METN, $5,191.00 ↵↵↵
ACN 607735080, 607 735 080 ↵↵↵14 BARNES ST WOOLGOOLGA NSW 2456 ↵↵↵Creditor: BIDFOOD AUSTRALIA LTD ↵↵↵DEFAULT JUDGEMENT (NSW) 02/11/2020 ↵↵↵00271889/20/METN, $9,891.00 ↵↵↵
6 Comments
dpb
on 22 Nov 2020
It would be much easier for somebody to work with this if you attached a .mat file with the variable as you've read it. That would also eliminate anything that may have been introduced by the translation to text in the edit box...
Rizwan Khan
on 23 Nov 2020
dpb
on 23 Nov 2020
You didn't attach what was asked for: the array your script returns as the variable str, saved as a .mat file, so we can see what you really get.
I don't have the Text Analytics Toolbox, so I can't run your code to produce the data you're trying to work with.
I don't trust the pasted text to be an accurate enough representation of the actual variable content to make it worth the effort.
Rizwan Khan
on 24 Nov 2020
Rizwan Khan
on 25 Nov 2020
Answers (1)
Mathieu NOE
on 23 Nov 2020
Hello,
I don't know where the function extractFileText comes from.
So I did it my way: I converted the PDF to an Excel file (using an online converter), and then it was very easy:
T = readtable('weekly-gazettes-12-11-20-converti.xlsx');
C = table2cell(T)
C =
133×2 cell array
{'ABCD PROJECTS, 618 …'} {'DEFAULT JUDGEMENT (…'}
{'ABOUT CONCRETE CONS…'} {'DEFAULT JUDGEMENT (…'}
{'AC SHOPFITTING SPEC…'} {'DEFAULT JUDGEMENT (…'}
{'ACN 607735080, 607 …'} {'DEFAULT JUDGEMENT (…'}
{'ACP ACCOUNTANTS & C…'} {'DEFAULT JUDGEMENT (…'}
etc......
9 Comments
Stephen23
on 23 Nov 2020
"I don't know where the function extractFileText comes from"
Mathieu NOE
on 23 Nov 2020
Stephen,
thanks for the info.
Rizwan Khan
on 23 Nov 2020
Rizwan Khan
on 23 Nov 2020
Mathieu NOE
on 24 Nov 2020
Hello,
I just googled "API pdf to excel" and got results like:
Tabex Pdf to Excel Api is a powerful tool for data extraction and data capture from pdf to one of the Excel formats. The API allow to identify tabular structures within pdf documents, being them scanned or editable, and export these tabular structure to Excel. The API available output Excel formats are XLS and XLSX.
pdfextractoronline.com/pdf-to-excel-api/
Rizwan Khan
on 25 Nov 2020
Mathieu NOE
on 25 Nov 2020
Hello,
Use regexp to extract / split the content of each cell.
see :
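As a rough illustration of the regexp suggestion, here is a minimal sketch. It assumes each cell holds text shaped like the first field in the question's sample (a name, a comma, then three groups of three digits); the variable names C, name and num are hypothetical, and the real cell contents from the converted Excel file may differ.

```matlab
% Hypothetical cell contents, copied from the sample in the question.
C = {'ABCD PROJECTS, 618 354 331'; ...
     'ABOUT CONCRETE CONSTRUCTIONS, 156 080 241'};

% Capture the text before the last comma as the name, and the three
% groups of three digits after it as the number.
tok = regexp(C, '^(.*),\s*(\d{3} \d{3} \d{3})\s*$', 'tokens', 'once');

% Each element of tok is a 1x2 cell; pull the two pieces apart.
% Note: a cell that does not match the pattern gives an empty tok
% entry, and t{1} would then error, so real data may need a check.
name = cellfun(@(t) t{1}, tok, 'UniformOutput', false);
num  = cellfun(@(t) t{2}, tok, 'UniformOutput', false);
```

The pattern is anchored at both ends so that a name containing a comma still splits at the last one.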
Rizwan Khan
on 26 Nov 2020
dpb
on 27 Nov 2020
Sure. See split
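A minimal sketch of how split could be applied to the extracted string, under several assumptions: that the arrows in the pasted sample are newline characters (char(10)), that the first two non-empty lines are headers, and that every record has exactly five fields. The stand-in string and the column names are hypothetical, since the actual variable was never attached.

```matlab
% Stand-in for the extracted text, built from the sample in the question.
% Assumption: each arrow in the pasted sample is a newline character.
nl3 = repmat(newline, 1, 3);
str = string(['Weekly Gazettes 1 ' nl3 ...
    'NEW SOUTH WALES WEEKLY ISSUE ' newline ' ' nl3 ...
    '3 RIVERS ESTATE, 140 001 976 ' nl3 ...
    '374 KALKITE RD KALKITE NSW 2627 ' nl3 ...
    'Creditor: CONSULT SURVEY GRA PTY LTD ' nl3 ...
    'DEFAULT JUDGEMENT (NSW) 02/11/2020 ' nl3 ...
    '00262008/20/163, $113,237.00 ' nl3 ...
    'ABCD PROJECTS, 618 354 331 ' nl3 ...
    '8 17 GARTMORE AVE BANKSTOWN NSW 2200 ' nl3 ...
    'Creditor: WORKERS COMPENSATION NOMINAL I ' nl3 ...
    'DEFAULT JUDGEMENT (NSW) 03/11/2020 ' nl3 ...
    '00063818/20/METN, $2,553.00 ' nl3]);

parts = strip(split(str, newline));  % split at every newline, trim blanks
parts = parts(parts ~= "");          % drop the empty pieces
parts = parts(3:end);                % skip the two header lines
rows  = reshape(parts, 5, []).';     % five fields per record, one row each

% Hypothetical column names; the source has none.
T = array2table(rows, 'VariableNames', ...
    {'Name', 'Address', 'Creditor', 'Judgement', 'Reference'});
```

The reshape step errors if the number of remaining pieces is not a multiple of five, which is a useful early warning that a record in the real data is missing a field.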