Assignment 1: Python Basics
Q1. Document Term Matrix
1. Define a function called compute_dtm as follows:
Take a list of documents, say docs, as a parameter
Tokenize each document into lower-cased words without any leading and trailing
punctuations (Hint: you can refer to the solution to the Review Exercise at the end of
Python_II lecture notes)
Let words denote the list of unique words in docs
Compute dtm (i.e. document-term matrix), which is a 2-dimensional array created from
the documents as follows:
Each row (say i) represents a document
Each column (say j) represents a unique word in words
Each cell (i, j) is the count of word j in document i. Fill 0 if word j does not
appear in document i
Return dtm and words.
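The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference solution: tokens are split on whitespace, punctuation means the characters in string.punctuation, words is sorted alphabetically so the column order is reproducible, and dtm is built as a plain list of lists (wrap it in numpy.array if a NumPy array is wanted). The helper name tokenize is hypothetical.

```python
import string


def tokenize(doc):
    """Lower-case a document, split on whitespace, and strip leading/trailing punctuation.

    Assumption: "punctuation" means the characters in string.punctuation.
    """
    tokens = [tok.strip(string.punctuation) for tok in doc.lower().split()]
    # Drop tokens that were nothing but punctuation (they become empty strings).
    return [tok for tok in tokens if tok]


def compute_dtm(docs):
    """Return (dtm, words) for a list of document strings."""
    tokenized = [tokenize(doc) for doc in docs]

    # Unique words across all documents; sorting is an assumption made
    # only to keep the column order deterministic.
    words = sorted({tok for toks in tokenized for tok in toks})
    column = {word: j for j, word in enumerate(words)}

    # Start from all zeros so words absent from a document stay 0.
    dtm = [[0] * len(words) for _ in docs]
    for i, toks in enumerate(tokenized):
        for tok in toks:
            dtm[i][column[tok]] += 1

    return dtm, words


if __name__ == "__main__":
    dtm, words = compute_dtm(["Hello, world! Hello.", "The world is big."])
    print(words)
    for row in dtm:
        print(row)
```

Building a word-to-column dictionary once avoids calling words.index() for every token, which would make the counting loop quadratic in the vocabulary size.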
Q2. Performance Analysis
1. Suppose your machine learning model returns a one-dimensional array of probabilities as the
output. Write a function “performance_analysis” to do the following:
Take three input parameters: probability array, ground-truth label array, and a threshold
If a probability > the threshold, the prediction is positive; otherwise, negative
Compare the predictions with the ground truth labels to calculate the confusion matrix as
shown in the figure, where:
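The thresholding and comparison steps could look like the sketch below. It rests on several assumptions that the text does not confirm: labels are encoded as 1 (positive) and 0 (negative), the parameter names prob, truth, and th are hypothetical, and the result is returned as a dictionary of the four standard confusion-matrix counts rather than in the layout of the figure.

```python
def performance_analysis(prob, truth, th):
    """Count confusion-matrix cells for predictions thresholded at th.

    Assumptions: truth uses 1 for positive and 0 for negative, and the
    four counts are returned as a dict (the layout is not taken from the figure).
    """
    if len(prob) != len(truth):
        raise ValueError("prob and truth must have the same length")

    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p, label in zip(prob, truth):
        # Strictly greater than the threshold means positive.
        predicted_positive = p > th
        if predicted_positive and label == 1:
            counts["TP"] += 1
        elif predicted_positive and label == 0:
            counts["FP"] += 1
        elif not predicted_positive and label == 1:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


if __name__ == "__main__":
    print(performance_analysis([0.9, 0.4, 0.6, 0.2, 0.5], [1, 1, 0, 0, 1], 0.5))
```

Note that a probability exactly equal to the threshold is classified as negative, because the rule uses a strict inequality.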

Attachments:

Assignment-1-….pdf