How to read a table from a PDF

Question

I have a PDF that contains text within a table.

I am able to read the text into a variable, but then I get one string with all the text in it.

I use extractFileText to read it into a string.

How can I then turn this text into a table?

I've pasted a sample of the string I read in. It has no column names; it's just the actual data.

What I want to do is ignore the first two rows below; from there on, each line is one record.

Each line needs to become a row in the table, and the delimiter between column values is the group of three arrows (which I think are newline characters).

Weekly Gazettes   1 ↵↵↵
NEW SOUTH WALES WEEKLY ISSUE ↵ ↵↵↵
3 RIVERS ESTATE, 140 001 976 ↵↵↵374 KALKITE RD KALKITE NSW 2627 ↵↵↵Creditor: CONSULT SURVEY GRA PTY LTD ↵↵↵DEFAULT JUDGEMENT (NSW) 02/11/2020 ↵↵↵00262008/20/163, $113,237.00 ↵↵↵
ABCD PROJECTS, 618 354 331 ↵↵↵8 17 GARTMORE AVE BANKSTOWN NSW 2200 ↵↵↵Creditor: WORKERS COMPENSATION NOMINAL I ↵↵↵DEFAULT JUDGEMENT (NSW) 03/11/2020 ↵↵↵00063818/20/METN, $2,553.00 ↵↵↵
ABOUT CONCRETE CONSTRUCTIONS, 156 080 241 ↵↵↵46 NEW HORIZON AVE BAHRS SCRUB QLD 4207 ↵↵↵Creditor: HUSQVARNA AUSTRALIA PTY LTD ↵↵↵DEFAULT JUDGEMENT (NSW) 03/11/2020 ↵↵↵00223837/20/3, $1,298.00 ↵↵↵
AC SHOPFITTING SPECIALIST, 635 292 376 ↵↵↵12 CURTIN ST CABRAMATTA NSW 2166 ↵↵↵Creditor: WORKERS COMPENSATION NOMINAL I ↵↵↵DEFAULT JUDGEMENT (NSW) 06/11/2020 ↵↵↵00266709/20/METN, $5,191.00 ↵↵↵
ACN 607735080, 607 735 080 ↵↵↵14 BARNES ST WOOLGOOLGA NSW 2456 ↵↵↵Creditor: BIDFOOD AUSTRALIA LTD ↵↵↵DEFAULT JUDGEMENT (NSW) 02/11/2020 ↵↵↵00271889/20/METN, $9,891.00 ↵↵↵
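A minimal sketch of one way to do this, assuming the arrows are newline characters (as suspected above) and that every record has exactly five fields, as in the sample. The sample is rebuilt by hand (two records only) so the snippet runs on its own; with the real file, `str` would come from `extractFileText` (Text Analytics Toolbox), and the file name in the comment is a placeholder. The table column names are hypothetical.

```matlab
% With the real PDF: str = extractFileText("yourfile.pdf");  % placeholder name
% Here a small part of the sample is rebuilt by hand instead.
sep = string(repmat(newline, 1, 3));      % the "three arrows"
str = "Weekly Gazettes   1 " + sep + ...
      "NEW SOUTH WALES WEEKLY ISSUE " + newline + " " + sep + ...
      "3 RIVERS ESTATE, 140 001 976 " + sep + ...
      "374 KALKITE RD KALKITE NSW 2627 " + sep + ...
      "Creditor: CONSULT SURVEY GRA PTY LTD " + sep + ...
      "DEFAULT JUDGEMENT (NSW) 02/11/2020 " + sep + ...
      "00262008/20/163, $113,237.00 " + sep + ...
      "ABCD PROJECTS, 618 354 331 " + sep + ...
      "8 17 GARTMORE AVE BANKSTOWN NSW 2200 " + sep + ...
      "Creditor: WORKERS COMPENSATION NOMINAL I " + sep + ...
      "DEFAULT JUDGEMENT (NSW) 03/11/2020 " + sep + ...
      "00063818/20/METN, $2,553.00 " + sep;

% Split at every newline, trim surrounding blanks, and drop the empty
% pieces produced by consecutive newlines.
parts = strtrim(split(str, newline));
parts(parts == "") = [];

% Skip the two header lines, then put every five fields into one row.
parts(1:2) = [];
nFields = 5;
assert(mod(numel(parts), nFields) == 0, 'Unexpected number of fields.')
fields = reshape(parts, nFields, []).';

% Hypothetical column names.
T = array2table(fields, 'VariableNames', ...
    {'Debtor', 'Address', 'Creditor', 'Judgement', 'CaseAndAmount'});
disp(T)
```

The `mod` check fails loudly if some field wraps onto an extra line in the PDF, which is preferable to silently shifting columns; in that case the records would need to be grouped by another rule, such as ending each one at the `$` amount.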

6 Comments (4 older comments not shown)

Stephen23 on 24 Nov 2020

"i guess this remains an open issue, and unsure how to resolve."

You could upload a .mat file of the imported data, just as dpb requested here.
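For reference, saving the imported text to a MAT-file takes a single `save` call, and the file keeps the exact characters (including the newlines shown as arrows). In this sketch the file name is a placeholder and `str` stands in for whatever variable holds the `extractFileText` output.

```matlab
% Placeholder text standing in for the extractFileText output.
str = "Weekly Gazettes   1 " + string(repmat(newline, 1, 3));

% Save only that variable to a MAT-file that can be attached to a post.
save('importedText.mat', 'str');

% Loading it back gives an identical variable, newlines included.
S = load('importedText.mat');
assert(isequal(S.str, str))
```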

Rizwan Khan on 25 Nov 2020

Dear Sir,

If we look at the text I pasted from the variable,

then each of those arrows marks a new variable.

How can I loop through them using that arrow (left arrow) as a delimiter?

A record is complete after the currency amount.

So the problem is no longer how to read the PDF; I am already doing that. The problem now is how to loop through the string that holds all the PDF content.
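If an explicit loop is preferred, one option is to split the text at newlines, discard the empty pieces, and then walk through the remaining pieces, closing a record whenever a piece contains a `$` sign, since a record is complete after the currency amount. This is a sketch under two assumptions: the arrows are newline characters, and `$` appears only in the amount field. The names `records` and `current` are illustrative.

```matlab
% Hypothetical input: the two header lines plus one record from the sample.
sep = string(repmat(newline, 1, 3));
str = "Weekly Gazettes   1 " + sep + ...
      "NEW SOUTH WALES WEEKLY ISSUE " + newline + " " + sep + ...
      "AC SHOPFITTING SPECIALIST, 635 292 376 " + sep + ...
      "12 CURTIN ST CABRAMATTA NSW 2166 " + sep + ...
      "Creditor: WORKERS COMPENSATION NOMINAL I " + sep + ...
      "DEFAULT JUDGEMENT (NSW) 06/11/2020 " + sep + ...
      "00266709/20/METN, $5,191.00 " + sep;

pieces = strtrim(split(str, newline));
pieces(pieces == "") = [];           % drop blanks from repeated newlines
pieces(1:2) = [];                    % skip the two header lines

records = strings(0, 5);             % one row per record, five fields each
current = strings(1, 0);             % fields collected for the open record
for k = 1:numel(pieces)
    current(end+1) = pieces(k);      %#ok<AGROW>
    if contains(pieces(k), "$")      % the amount closes the record
        records(end+1, :) = current; %#ok<AGROW> errors if not 5 fields
        current = strings(1, 0);
    end
end
disp(records)
```

The row assignment raises an error if a record does not have exactly five fields, which points directly at any record whose layout differs from the sample.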

Answer 1

Mathieu NOE on 23 Nov 2020

weekly-gazettes-12-11-20-converti.xlsx

Hello,

I don't know where the function extractFileText comes from.

So I did it my way: I converted the PDF to an Excel file (with an online converter), and then it was very easy:

T = readtable('weekly-gazettes-12-11-20-converti.xlsx');
C = table2cell(T)
C =
  133×2 cell array
    {'ABCD PROJECTS, 618 …'}    {'DEFAULT JUDGEMENT (…'}
    {'ABOUT CONCRETE CONS…'}    {'DEFAULT JUDGEMENT (…'}
    {'AC SHOPFITTING SPEC…'}    {'DEFAULT JUDGEMENT (…'}
    {'ACN 607735080, 607 …'}    {'DEFAULT JUDGEMENT (…'}
    {'ACP ACCOUNTANTS & C…'}    {'DEFAULT JUDGEMENT (…'}
    
    etc.

9 Comments (7 older comments not shown)

Rizwan Khan on 26 Nov 2020

Thanks Mathieu,

Why do I need to use regular expressions if I have a common delimiter between each variable?

Can I somehow just use the delimiter?

dpb on 27 Nov 2020

Sure. See split.
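As a sketch of that suggestion: `split` accepts the whole three-newline group as its delimiter, so no regular expression is needed for the splitting step. This assumes the arrows are newline characters and that each record has five fields. It then goes one step further and turns the text into typed columns. The column names are hypothetical, dates are assumed to be day/month/year (usual in Australian documents), and cutting the debtor field at its first comma assumes the company name itself contains no comma.

```matlab
sep = string(repmat(newline, 1, 3));   % the three arrows as one delimiter
str = "Weekly Gazettes   1 " + sep + ...
      "NEW SOUTH WALES WEEKLY ISSUE " + newline + " " + sep + ...
      "ABOUT CONCRETE CONSTRUCTIONS, 156 080 241 " + sep + ...
      "46 NEW HORIZON AVE BAHRS SCRUB QLD 4207 " + sep + ...
      "Creditor: HUSQVARNA AUSTRALIA PTY LTD " + sep + ...
      "DEFAULT JUDGEMENT (NSW) 03/11/2020 " + sep + ...
      "00223837/20/3, $1,298.00 " + sep;

f = strtrim(split(str, sep));          % strtrim also removes stray newlines
f(f == "") = [];                       % trailing delimiter leaves an empty piece
f(1:2) = [];                           % skip the two header lines
f = reshape(f, 5, []).';               % one row per record

% Convert the text fields into typed columns.
debtor    = extractBefore(f(:, 1), ",");
acn       = strtrim(extractAfter(f(:, 1), ","));
creditor  = strtrim(extractAfter(f(:, 3), "Creditor:"));
judgeDate = datetime(extractAfter(f(:, 4), ") "), 'InputFormat', 'dd/MM/yyyy');
amount    = str2double(erase(extractAfter(f(:, 5), "$"), ","));

T = table(debtor, acn, f(:, 2), creditor, judgeDate, amount, 'VariableNames', ...
    {'Debtor', 'ACN', 'Address', 'Creditor', 'JudgementDate', 'Amount'});
disp(T)
```

Keeping `Amount` numeric and `JudgementDate` as a `datetime` means the table can be sorted or filtered directly, for example by amount or by date range.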
