Iramuteq DIR326.2
Importing a Corpus into Iramuteq
Steps to Import the Document
- Begin by clicking the red button with a "T" at the top of the Iramuteq interface to import your corpus.
- Select the folder containing your corpus file and click "Open" to proceed with the import process.
- In the import window, set character encoding to UTF-8, which is essential for proper text processing.
- Choose Portuguese as the language of your text in the language settings and enable the default dictionary option.
- After confirming these settings, click "OK." A confirmation window should display 60 documents if successful.
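A minimal Python sketch of preparing a corpus file that matches the import settings above. It assumes Iramuteq's plain-text corpus layout, in which each document starts with a line of four asterisks followed by a starred variable such as *doc_1. The function name, the variable name "doc", the file name, and the sample texts are hypothetical.

```python
import tempfile
from pathlib import Path


def build_corpus(texts, var_name="doc"):
    """Join documents into one corpus string, one '****' header line per document."""
    parts = []
    for number, text in enumerate(texts, start=1):
        # Header line: four asterisks, a space, then a starred variable.
        parts.append(f"**** *{var_name}_{number}")
        # Collapse internal whitespace so each document body sits on one line.
        parts.append(" ".join(text.split()))
    return "\n".join(parts) + "\n"


# Hypothetical sample decisions, in Portuguese like the tutorial corpus.
sample = ["O reclamante pleiteia o pagamento de horas extras.",
          "A reclamada alega que a indenização não é devida."]
corpus = build_corpus(sample)

# Save as UTF-8, the encoding selected in the import window, then read it back.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "corpus.txt"
    path.write_text(corpus, encoding="utf-8")
    saved = path.read_text(encoding="utf-8")
```

Writing the file with an explicit encoding avoids depending on the operating system default, which is a common reason accented characters break on import.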
Analyzing Text Segments
Understanding Document Segmentation
- The imported corpus consists of 60 documents divided into 285 segments; each segment is a text fragment of up to three lines.
- Occurrences are the total number of words in the text, while forms are the distinct word forms found in it.
- The next step involves generating graphs for analysis; access this feature through "Text Analysis" on the interface.
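The difference between occurrences and forms can be illustrated with a rough Python sketch. It assumes that occurrences are total word tokens and forms are distinct tokens; it does not lemmatize, so its numbers will not necessarily match what Iramuteq reports. The function name and sample sentence are hypothetical.

```python
import re
from collections import Counter


def corpus_stats(text):
    """Return (occurrences, forms): total tokens and distinct tokens."""
    # Lowercase, then keep runs of letters (accented ones included) as tokens.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    return len(tokens), len(counts)


occurrences, forms = corpus_stats(
    "O reclamante pede o pagamento. O pagamento é devido."
)
print(occurrences, forms)  # 9 tokens, 6 distinct forms
```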
Configuring Analysis Settings
Customizing Word Classes for Analysis
- In properties settings, you can customize which word classes are included in your analysis based on relevance.
- Input zero for any word class that does not interest you (e.g., definite articles, pronouns), effectively excluding them from analysis.
- This customization simplifies data interpretation by focusing only on relevant linguistic elements before finalizing settings.
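The effect of entering zero for a word class can be sketched in Python as a simple filter. The word-class table and the settings dictionary below are hypothetical stand-ins for Iramuteq's dictionary and properties window, and the value 1 for kept classes is an assumption of this sketch.

```python
# Hypothetical word-class table; Iramuteq takes this information from its dictionary.
WORD_CLASS = {"o": "article", "a": "article", "ele": "pronoun",
              "reclamante": "noun", "pagamento": "noun", "pleitear": "verb"}

# Setting per class, mirroring the idea of entering zero to exclude a class.
CLASS_SETTING = {"article": 0, "pronoun": 0, "noun": 1, "verb": 1}


def keep_relevant(tokens):
    """Drop tokens whose word class is set to zero; unknown words are kept."""
    kept = []
    for token in tokens:
        word_class = WORD_CLASS.get(token)
        if word_class is None or CLASS_SETTING.get(word_class, 1) != 0:
            kept.append(token)
    return kept


filtered = keep_relevant(["o", "reclamante", "pleitear", "o", "pagamento", "ele"])
print(filtered)  # ['reclamante', 'pleitear', 'pagamento']
```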
Interpreting Graphical Data
Understanding Bidimensional Graph Representation
- The generated graph is bidimensional: X-axis shows word count while Y-axis indicates frequency of each word's appearance.
Analysis of Textual Data
Understanding Specificity in Text Analysis
- The analysis begins with a focus on the frequency of specific words, emphasizing the importance of understanding their occurrence within documents.
- Users are guided to open a pre-configured window for analyzing text specificity, ensuring that their document is selected before proceeding.
- A generated table displays word occurrences across different documents, indicating how often each word appears and its relevance to each document.
- The concept of "aderência" (adherence) is introduced, explaining that higher numbers indicate greater frequency and relevance of a word in a particular document.
- This analysis allows users to discern which documents discuss certain themes more prominently, providing insights into content focus.
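A word-by-document table like the one described above can be approximated in Python. This sketch uses raw counts only; the score Iramuteq displays in its specificity table is computed differently, so the numbers here are illustrative. The function name and sample documents are hypothetical.

```python
from collections import Counter


def occurrence_table(documents, words):
    """Map each word to a list of its counts, one count per document."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    return {word: [c[word] for c in counts] for word in words}


docs = ["dano moral e dano material", "pagamento de horas extras", "dano moral"]
table = occurrence_table(docs, ["moral", "dano", "pagamento"])
for word, row in table.items():
    print(word, row)
```

Reading a row left to right shows which documents use a word most, which is the kind of comparison the specificity table supports.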
Contextual Analysis of Word Usage
- Users can double-click on specific words to reveal their context within the text, enhancing understanding of how terms are applied by authors or judges.
- By examining contexts where terms like "moral" are used frequently, users gain insight into judicial reasoning and decision-making processes.
- Understanding these contexts aids future legal writing by aligning arguments with established judicial language and preferences.
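The context lookup behind a double-click can be sketched as a keyword-in-context listing in Python. This is an illustration of the idea, not Iramuteq's own concordance routine; the function name, the window width, and the sample text are hypothetical.

```python
def concordance(text, keyword, width=3):
    """Return each occurrence of keyword with `width` words of context per side."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # Compare case-insensitively and ignore surrounding punctuation.
        if word.lower().strip(".,;:") == keyword.lower():
            left = words[max(0, i - width):i]
            right = words[i + 1:i + 1 + width]
            lines.append(" ".join(left + [word] + right))
    return lines


text = ("O juiz reconheceu o dano moral sofrido pelo reclamante. "
        "A indenização por dano moral foi fixada.")
hits = concordance(text, "moral", width=2)
for line in hits:
    print(line)
```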
Classification Methodology
- For the classification analysis, which uses the Reinert method, users click through the familiar settings windows without needing extensive reconfiguration.
- The tool generates graphical representations based on document similarities, showcasing how texts cluster based on shared characteristics.
Insights from Graphical Representations
- The speaker reflects on personal experiences learning this analytical tool over a year, highlighting initial challenges faced in utilizing it effectively.
- The generated graphs categorize 60 judicial decisions into five groups based on textual similarity, streamlining data interpretation significantly.
Value of Automated Document Analysis
- Each group’s size is represented visually; larger bars indicate more documents sharing similar content or themes.
- Percentages within these groups show the proportion of documents discussing similar topics, information that would otherwise require manual sorting and reading.
Understanding Document Similarity Analysis
Hierarchical Reading of Classes
- The tool provides guidance on how to read documents, emphasizing a logical sequence rather than random access.
- It suggests starting with Class 2, then moving to Class 3, and subsequently returning to Class 1 before proceeding to Class 4.
- This structured approach highlights the interconnectedness of classes, indicating a hierarchy in document reading.
Visualizing Data with Graphs
- A button within the tool generates vertical graphs that display frequent words within each class.
- For instance, in Class 2, the most common word is "pleitear," alongside others like "alegar" and "pagamento," hinting at themes related to labor law claims.
- These frequent terms help users understand the context and content of each class more effectively.
Grouping Documents by Themes
- The analysis separates documents into groups based on shared themes but does not initially identify which documents belong to which group.
- By using specific color codes (e.g., red for one theme and green for another), users can isolate and read documents that discuss similar topics within labor law.
Conducting Similarity Analysis
- Users initiate a similarity analysis by selecting relevant words and their frequencies from a provided list.
- A graph generated from all words may appear cluttered; thus, filtering for words appearing more than a specified frequency (e.g., greater than 20 times) is recommended.
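The frequency filter recommended above can be sketched in Python. The threshold of 20 comes from the example in the text; the function name and the sample token list are hypothetical.

```python
from collections import Counter


def frequent_words(tokens, threshold=20):
    """Return words occurring more than `threshold` times, most frequent first."""
    counts = Counter(tokens)
    return [(word, n) for word, n in counts.most_common() if n > threshold]


tokens = (["pagamento"] * 25 + ["reclamante"] * 21
          + ["empresa"] * 20 + ["prazo"] * 3)
selected = frequent_words(tokens, threshold=20)
print(selected)  # [('pagamento', 25), ('reclamante', 21)]
```

A word that appears exactly 20 times is left out, because the filter keeps only words that appear more than the threshold.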
Finalizing Insights from Judicial Decisions
- The analysis focuses on extracting insights from judicial decisions regarding labor law issues such as overtime claims.
Analysis of Judicial Decisions and Graphical Representation
Downloading Judicial Decisions
- The speaker discusses downloading judicial decisions from a tribunal's website to create a corpus for analysis.
- The speaker emphasizes that such a corpus can be used to analyze an individual judge's profile or a specific branch of law, which shows how versatile the approach is.
Understanding Graphical Connections
- Introduces a graph showing connections between words, where thicker lines indicate stronger relationships among terms.
- Notes that the word "reclamante" (claimant) appears frequently alongside "pagamento" (payment), suggesting a strong correlation in legal texts.
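One way to picture the line thickness is as a co-occurrence count between two words. The Python sketch below counts, for each pair of words, how many segments contain both. It assumes a plain co-occurrence count as the edge weight, which may differ from the index selected in Iramuteq's similarity settings; the function name and sample segments are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrences(segments):
    """Count, for each word pair, the number of segments containing both words."""
    pairs = Counter()
    for segment in segments:
        # A sorted set counts each pair once per segment, in a fixed order.
        words = sorted(set(segment.lower().split()))
        pairs.update(combinations(words, 2))
    return pairs


segments = ["reclamante pede pagamento",
            "pagamento devido ao reclamante",
            "empresa contesta pagamento"]
weights = cooccurrences(segments)
print(weights[("pagamento", "reclamante")])  # 2
```

In this toy data the pair "pagamento" and "reclamante" would be drawn with a thicker line than "empresa" and "pagamento", which share only one segment.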
Payment Contextualization
- Discusses various types of payments related to claims, such as additional payments, indemnities, and moral damages.
- Refers to 60 judicial decisions previously analyzed, indicating an ongoing exploration of these cases.
Enhancing Graphical Analysis
- The speaker encourages improving the graphical representation by preserving previous configurations for ease of analysis.
- Explains how adjusting settings can help visualize connections more clearly without needing to reconfigure everything.
Grouping Data Insights
- Introduces a feature that groups related terms within the graph based on their connections and contexts.
- Highlights how different groups are formed around concepts like payment types and company structures (e.g., LTDA).
Final Thoughts on Data Visualization
- Suggestion to disable certain functions while maintaining community grouping for clearer insights into data relationships.
Graph Analysis Techniques
Introduction to Graph Configuration
- The speaker emphasizes the importance of preserving previous settings when generating a new graph by clicking a specific button.
- After selecting "Communities," the graph displays groups with distinct colors, enhancing visual differentiation.
Understanding Graph Features
- The final configuration involves disabling certain options to focus solely on community representation, which simplifies the analysis.
- The thickness of connections in the graph indicates relationship intensity; thicker lines represent stronger relationships while thinner lines indicate weaker ones.
Word Cloud Analysis
- Transitioning to text analysis, the speaker introduces word clouds and explains how to set dimensions for better visualization (height: 400, width: 400).
- A maximum of 100 words is recommended for clarity; too many words can lead to a cluttered display.
Interpreting Word Clouds
- The size of words in the cloud reflects their frequency in the analyzed documents; larger words appear more frequently.
- Specific terms related to labor law are highlighted as examples, demonstrating how word clouds can reveal thematic relevance in texts.
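The two word cloud ideas above, a cap on the number of words and sizes that follow frequency, can be sketched in Python. The linear scaling and the font size range are assumptions of this sketch, not Iramuteq's actual rendering rule; the function name and sample tokens are hypothetical.

```python
from collections import Counter


def cloud_sizes(tokens, max_words=100, min_size=10, max_size=60):
    """Pick the most frequent words and scale a font size linearly with frequency."""
    top = Counter(tokens).most_common(max_words)
    if not top:
        return {}
    highest, lowest = top[0][1], top[-1][1]
    span = max(highest - lowest, 1)  # avoid division by zero when all counts match
    return {word: min_size + (n - lowest) * (max_size - min_size) / span
            for word, n in top}


tokens = ["pagamento"] * 30 + ["reclamante"] * 20 + ["empresa"] * 10
sizes = cloud_sizes(tokens, max_words=2)
print(sizes)  # {'pagamento': 60.0, 'reclamante': 10.0}
```

With max_words set to 2, the least frequent word is dropped from the cloud entirely, which is the decluttering effect the 100-word limit aims for.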
Final Steps and Submission Guidelines
- Participants are instructed to select graphs for their analyses and submit them by Monday of the following week.
- Emphasis is placed on analyzing five key graphs for sufficient understanding before moving forward with written analyses.
Discussion on Team Rivalries
Light-hearted Banter about Football Teams
- A humorous exchange occurs regarding football teams, particularly Corinthians and Flamengo, showcasing regional rivalries.
Cultural Commentary on Fan Identity
- Discussion touches upon fan identities and perceptions among supporters of different teams, highlighting passionate affiliations.
Recipe for Improvement
Introduction to the Session
- The speaker humorously addresses a colleague, mentioning their shared interest in football (Flamengo), indicating a light-hearted atmosphere.
- The speaker expresses concern about managing their responsibilities as a teacher while being on medical leave, highlighting the importance of student progress.
Preparing for Analysis
- The speaker discusses the need to select specific graphs for analysis, emphasizing organization and clarity in data presentation.
- Instructions are given on how to visualize graphs better by adjusting icon sizes within folders, enhancing user experience during analysis.
Graph Utilization
- The first graph is introduced; it separates documents and is intended for use in Word. The process of copying and pasting is outlined clearly.
- Emphasis is placed on finding additional necessary graphs within the same folder, reiterating the copy-paste method for transferring data into Word.
Navigating Folders
- The speaker then moves to another folder, named "sim TXT," which contains the similarity analysis results, and explains how to view these files effectively.
- A specific graph from this folder is deemed sufficient for transfer to Word, reinforcing efficiency in selecting relevant data.
Final Steps in Data Transfer