# Text Clustering

* Clustering is a typical **numerical analysis** technique: it groups similar observations together based on the commonality or closeness of their data points.
* In text analysis, we apply much the same operation: clusters help determine the relationships between word usage across documents. **Text clustering** refers to the task of identifying the clustering structure of a corpus of text documents and assigning documents to the identified cluster(s).
* Common methods: there are two typical types of clustering algorithms, connectivity-based clustering (a.k.a. hierarchical clustering) and centroid-based clustering (e.g., k-means clustering).
* `k-means` clustering: partitions the observations (here, the rows of `dtm`) into a specified number of clusters by minimizing the within-cluster sum of squared distances between each observation and its cluster center. In the example below, the number of clusters is set to 5.

```r
library(stats)  # provides kmeans(); attached by default in R
# dtm: a document-term matrix built earlier (e.g., with the tm package)
mymeans <- kmeans(dtm, 5)  # partition the rows of dtm into 5 clusters
mymeans
```

```r
# Terms that occur at least 10 times in dtm (findFreqTerms() is from tm)
freq <- findFreqTerms(dtm, 10)
```

#### Multivariate Statistics
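
To make the two families of methods mentioned on this page more concrete, here is a minimal sketch in base R that runs both k-means (centroid-based) and hierarchical (connectivity-based) clustering. It assumes a small hand-made matrix, `dtm_toy`, as a hypothetical stand-in for a real document-term matrix; the term names and counts are invented for illustration, and the choices of 2 clusters and Ward linkage are assumptions, not something the page prescribes.

```r
# Hypothetical toy document-term matrix: rows are documents, columns are
# term counts. All names and values are made up for illustration.
dtm_toy <- matrix(
  c(5, 4, 0, 0,
    4, 5, 1, 0,
    6, 5, 0, 1,
    0, 1, 5, 4,
    1, 0, 4, 5,
    0, 0, 6, 5),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("doc", 1:6),
                  c("cluster", "kmeans", "corpus", "token"))
)

set.seed(42)  # k-means starts from random centers, so fix the seed

# Centroid-based: partition the documents into 2 clusters.
# nstart = 10 reruns the algorithm from 10 random starts and keeps the best.
km <- kmeans(dtm_toy, centers = 2, nstart = 10)
km$cluster       # cluster label assigned to each document
km$tot.withinss  # total within-cluster sum of squares for this solution

# Connectivity-based: agglomerative hierarchical clustering on
# Euclidean distances between documents (Ward linkage is an assumption).
hc <- hclust(dist(dtm_toy), method = "ward.D2")
hc_groups <- cutree(hc, k = 2)  # cut the tree into 2 groups
hc_groups
```

On data this cleanly separated, both methods should place `doc1` to `doc3` in one group and `doc4` to `doc6` in the other, although the numeric labels themselves are arbitrary. With a real corpus, the number of clusters usually has to be chosen by inspection, for example by plotting the dendrogram with `plot(hc)`.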