
How to find the similarity between two text documents

Jothi on 19 Dec 2012
Commented: info info on 20 Mar 2020
I have two text documents.
For example, the file a.txt contains 'Hai How R U'
and the file b.txt contains 'Hai How are U'.
How can I calculate the cosine similarity or the Euclidean distance for these two documents (text files)?
Thanks in advance.
2 Comments
Jan on 19 Dec 2012
The Euclidean distance requires vectors of the same size. There are different edit distances, but I do not know the cosine distance. Perhaps it is better that you explain the details than that we search Wikipedia.
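One common way to get vectors of the same size from two texts is to count words over a shared vocabulary. The sketch below is in Python rather than MATLAB and only illustrates the idea; the helper names (`word_counts`, `to_vectors`) and the choice to lowercase and split on non-alphanumeric characters are assumptions, not something stated in this thread. The two strings stand in for the contents of a.txt and b.txt.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text and count its words (assumed tokenization)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def to_vectors(text_a, text_b):
    """Build two count vectors over the union of both vocabularies."""
    counts_a, counts_b = word_counts(text_a), word_counts(text_b)
    vocab = sorted(set(counts_a) | set(counts_b))
    vec_a = [counts_a[w] for w in vocab]  # a missing word counts as 0
    vec_b = [counts_b[w] for w in vocab]
    return vocab, vec_a, vec_b

def cosine_similarity(u, v):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def euclidean_distance(u, v):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Contents of a.txt and b.txt from the question, given inline here.
vocab, vec_a, vec_b = to_vectors("Hai How R U", "Hai How are U")
cos_sim = cosine_similarity(vec_a, vec_b)
euc_dist = euclidean_distance(vec_a, vec_b)
print(vocab, vec_a, vec_b, cos_sim, euc_dist)
```

With this tokenization the two texts share three of their four words each, so the cosine similarity is 0.75 and the Euclidean distance is the square root of 2. Treating 'R' and 'are' as the same word would need an extra normalization step that is not shown here.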
info info on 20 Mar 2020
I think the best way to measure text similarity is "shingling".
Shingling is a common technique for representing documents as sets. Given a document, its k-shingles are all the possible consecutive subsequences of length k found within it. An example with k = 3, counted in words, is given below:
## $Original
## [1] "The sky is blue and the sun is bright."
##
## $Shingled
## [1] "the sky is" "sky is blue" "is blue and" "blue and the"
## [5] "and the sun" "the sun is" "sun is bright"
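The listing above looks like R console output. As a rough sketch of how such word-level shingles could be produced, here is a Python version; the function name `word_shingles` and the lowercasing and punctuation stripping are assumptions chosen so that the result matches the listing.

```python
import re

def word_shingles(text, k=3):
    """Return all runs of k consecutive words, lowercased, in order."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # A text with fewer than k words yields an empty list.
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]

shingles = word_shingles("The sky is blue and the sun is bright.")
for s in shingles:
    print(s)
```

The sentence has nine words, so k = 3 gives seven shingles, from "the sky is" to "sun is bright".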
Then we check which shingles occur in each of our texts:
##                doc_1 doc_2 doc_3
## the sky is         1     1     1
## sky is blue        1     0     1
## is blue and        1     0     0
## blue and the       1     0     0
## and the sun        1     0     0
## the sun is         1     0     0
## sun is bright      1     0     1
## the sun in         0     1     0
## sun in the         0     1     0
## in the sky         0     1     0
## sky is bright      0     1     0
## we can see         0     0     1
## can see sun        0     0     1
## see sun is         0     0     1
## is bright the      0     0     1
## bright the sky     0     0     1
Then calculate the similarity for each pair of documents and take the largest value.
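The comment does not say which similarity measure to use. A common choice for shingle sets is the Jaccard similarity (shared shingles divided by all distinct shingles), so the Python sketch below assumes that measure. The three sets are copied from the columns of the table above; the names `docs`, `jaccard` and `best_pair` are only illustrative.

```python
from itertools import combinations

# Shingle sets read off the table: a shingle belongs to a document
# when its column holds a 1.
docs = {
    "doc_1": {"the sky is", "sky is blue", "is blue and", "blue and the",
              "and the sun", "the sun is", "sun is bright"},
    "doc_2": {"the sky is", "the sun in", "sun in the", "in the sky",
              "sky is bright"},
    "doc_3": {"the sky is", "sky is blue", "sun is bright", "we can see",
              "can see sun", "see sun is", "is bright the", "bright the sky"},
}

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Score every pair of documents, then pick the pair with the largest score.
scores = {(x, y): jaccard(docs[x], docs[y]) for x, y in combinations(docs, 2)}
best_pair = max(scores, key=scores.get)
print(scores)
print(best_pair, scores[best_pair])
```

Under this assumption doc_1 and doc_3 are the most similar pair: they share 3 of 12 distinct shingles, which gives 0.25, while each of the other two pairs shares only "the sky is".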


Answers (1)

Jan on 19 Dec 2012
