DevGex Search

Methods for Listening to Changes in MongoDB Collections

MongoDB Change Streams Tailable Cursors

This technical article discusses approaches to monitor real-time changes in MongoDB collections, essential for applications like job queues. It covers the use of Capped Collections with Tailable Cursors and the modern Change Streams feature, with code examples in various programming languages. The article compares both methods and provides recommendations for implementation.
Comprehensive Guide to Cron Jobs: Scheduling Tasks Twice Daily at Specific Times

Cron Jobs Linux Scheduling Time Configuration

This technical article provides an in-depth exploration of Cron job scheduling in Linux systems, focusing on configuring tasks to run at specific times such as 10:30 AM and 2:30 PM. Through detailed code examples and 24-hour time format explanations, readers will learn precise scheduling techniques including using comma-separated time lists for multiple daily executions.
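
As a rough illustration of the comma-separated hour list mentioned above, the sketch below expands the minute and hour fields of a cron expression into the times it would fire each day. The expression `30 10,14 * * *` is an assumed way of writing "10:30 AM and 2:30 PM" and is not quoted from the article. The helper `expand_daily_schedule` is hypothetical and handles only plain numbers and comma lists, not ranges, steps, or wildcards in the first two fields.

```python
from datetime import time


def expand_daily_schedule(cron_expr: str) -> list[time]:
    """Expand the minute and hour fields of a cron expression into run times.

    Hypothetical helper: only plain numbers and comma-separated lists are
    supported in the minute and hour fields.
    """
    minute_field, hour_field, *_rest = cron_expr.split()
    minutes = [int(m) for m in minute_field.split(",")]
    hours = [int(h) for h in hour_field.split(",")]
    return sorted(time(hour=h, minute=m) for h in hours for m in minutes)


# Assumed expression: minute 30, hours 10 and 14 on the 24-hour clock.
schedule = "30 10,14 * * *"
run_times = [t.strftime("%H:%M") for t in expand_daily_schedule(schedule)]
print(run_times)
```

The point of the sketch is that 2:30 PM has to be written as hour 14, and that a single crontab line with `10,14` in the hour field covers both runs.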
Understanding GitLab CI Tags: A Guide to Distinguishing and Using Tags in CI/CD

Git GitLab CI/CD Tags

This article delves into the concept of tags in GitLab CI, emphasizing the distinction between Git tags and GitLab CI tags. It covers key aspects such as setting up runner tags, configuring job tags in .gitlab-ci.yml, and leveraging Git tags to trigger CI/CD pipelines, with clear examples and steps to optimize workflows.
Practices and Optimization for Checking Out Multiple Git Repositories into Subdirectories in Jenkins Pipeline

Jenkins Pipeline Git Repository Checkout Multi-Repository Management

This article explains how to check out multiple Git repositories into different subdirectories within the same Jenkins job using pipelines. With the deprecation of the Multiple SCM plugin, developers need to migrate to pipeline-based approaches. The article first analyzes the limitations of traditional methods, then details two core solutions: wrapping the checkout in a dir step and using the RelativeTargetDirectory extension of the checkout step. By comparing the implementation details, applicable scenarios, and performance considerations of both methods, it provides clear migration guidelines and best practices to help developers build more stable and maintainable multi-repository build processes.
Deep Analysis and Solutions for Spark Jobs Failing with MetadataFetchFailedException in Speculation Mode Due to Memory Issues

Apache Spark Speculation Mode Memory Management Shuffle Error Performance Optimization

This paper thoroughly investigates the root cause of the org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 error in Apache Spark jobs under speculation mode. The error typically occurs when tasks fail to complete shuffle outputs due to insufficient memory, especially when processing large compressed data files. Based on real-world cases, the paper analyzes how improper memory configuration leads to shuffle data loss and provides multiple solutions, including adjusting memory allocation, optimizing storage levels, and adding swap space. With code examples and configuration recommendations, it helps developers effectively avoid such failures and ensure stable Spark job execution.
Viewing RDD Contents in PySpark: A Comprehensive Guide to foreach and collect Methods

PySpark RDD foreach collect distributed debugging

This article provides an in-depth exploration of methods to view RDD contents in Apache Spark's Python API (PySpark). By analyzing a common error case, it explains the limitations of the foreach action in distributed environments, particularly the differences between print statements in Python 2 and Python 3. The focus is on the standard approach using the collect method to retrieve data to the driver node, with comparisons to alternatives like take and foreach. The discussion also covers output visibility issues in cluster mode, offering a complete solution from basic concepts to practical applications to help developers avoid common pitfalls and optimize Spark job debugging.
Complete Guide to Jenkins Data Migration: Smooth Transition from Development to Dedicated Server

Jenkins migration JENKINS_HOME continuous integration

This article provides a comprehensive guide for migrating Jenkins from a development PC to a dedicated server. By analyzing the core role of the JENKINS_HOME directory, it presents standard migration methods based on file copying and discusses alternative approaches using the ThinBackup plugin for large directories. The article covers key steps including environment preparation, permission settings, and configuration verification, ensuring the integrity of build history, job configurations, and plugin settings for reliable continuous integration environment migration.
A Comprehensive Guide to Executing Shell Commands in Background from Bash Scripts

Bash scripting background execution command variables

This article provides an in-depth analysis of executing commands stored in string variables in the background within Bash scripts. By examining best practices, it explains core concepts such as variable expansion, command execution order, and job control, offering multiple implementation approaches and important considerations to help developers avoid common pitfalls.
Diagnosing and Resolving Symbol Lookup Errors: Undefined Symbol Issues in Cluster Environments

Cluster Computing Dynamic Linking Symbol Resolution GDAL Python

This paper provides an in-depth analysis of symbol lookup errors encountered when using Python and GDAL in cluster environments, focusing on the undefined symbol H5Eset_auto2 error. By comparing dynamic linker debug output between interactive SSH sessions and qsub job submissions, it identifies inconsistent shared library versions as the root cause. The article explains the dynamic linking process and symbol resolution mechanisms, and offers systematic diagnostic methods and solutions, including using tools like nm and md5sum to verify library consistency, along with best practices for environment variable configuration.
Spark Performance Tuning: Deep Analysis of spark.sql.shuffle.partitions vs spark.default.parallelism

Apache Spark Performance Tuning Partition Configuration

This article provides an in-depth exploration of two critical configuration parameters in Apache Spark: spark.sql.shuffle.partitions and spark.default.parallelism. Through detailed technical analysis, code examples, and performance tuning practices, it helps developers understand how to properly configure these parameters in different data processing scenarios to improve Spark job execution efficiency. The article combines Q&A data with official documentation to offer comprehensive technical guidance from basic concepts to advanced tuning.
Jenkins CI with Git Integration: Optimized Build Triggering on Master Branch Pushes

Jenkins Git Integration Continuous Integration Build Triggering Master Branch

This technical article provides a comprehensive guide to configuring Jenkins CI systems for build triggering exclusively on pushes to the master branch in Git repositories. By analyzing limitations of traditional polling methods, it introduces an efficient hook-based triggering mechanism covering Jenkins job configuration, GitHub webhook setup, and URL parameterization. Complete implementation steps and code examples help developers establish precise continuous integration pipelines while avoiding unnecessary resource consumption.
Implementing Parallel Program Execution in Bash Scripts

Bash scripting parallel execution process management background processes wait command

This technical article provides a comprehensive exploration of methods for parallel program execution in Bash scripts. Through detailed analysis of background process management, job control, signal handling, and process synchronization, it systematically introduces implementation approaches using the & operator, wait command, subshells, and GNU Parallel. With concrete code examples, the article deeply examines the applicable scenarios, advantages, disadvantages, and implementation details of each method, offering complete guidance for developers to efficiently manage concurrent tasks in practical projects.
Configuring Multi-Repository Access in GitLab CI: A Comprehensive Guide to Deploy Keys

GitLab CI Deploy Keys Multi-Repository Access SSH Authentication Continuous Integration

This article provides an in-depth exploration of solutions for accessing multiple private repositories during GitLab CI builds, with a focus on the deploy keys method. By generating SSH key pairs, adding public keys as project deploy keys, and configuring private keys on GitLab Runners, secure automated cloning operations can be achieved. The article also compares the CI_JOB_TOKEN method as a supplementary approach, analyzing application scenarios and configuration details for both methods to offer practical guidance for continuous integration in complex projects.
Practical Techniques for Killing Background Tasks in Linux: Using the $! Variable

Linux Bash background_tasks process_management kill_command

This article provides an in-depth exploration of effective methods for terminating the most recently started background task in Linux systems. It explains how the Bash shell's special variable $! works and how to apply it in practice. The article covers basic usage examples, compares other task management approaches such as the job specifier %%, and discusses the differences between process IDs and job numbers. Through practical code demonstrations and scenario analysis, it helps readers manage background tasks efficiently from the command line.
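
The article itself works in Bash. The sketch below is only a Python analogue of the same idea, swapped in for illustration: start a process without waiting for it, keep the process ID of the thing just started (the role $! plays in Bash), and send it a termination signal. The 60-second sleeping child is an assumption made for the example, and the return-code check relies on POSIX behaviour.

```python
import signal
import subprocess
import sys

# Start a long-running child without waiting for it, similar to `command &` in Bash.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# child.pid plays the role that $! plays in Bash: the PID of the process just started.
background_pid = child.pid

# Send SIGTERM (the default signal of `kill`), then reap the child.
child.send_signal(signal.SIGTERM)
exit_status = child.wait(timeout=10)

# On POSIX, a negative return code means the child was ended by that signal number.
terminated_by_sigterm = exit_status == -signal.SIGTERM
print(background_pid, terminated_by_sigterm)
```

Note that a PID identifies a process system-wide, whereas a job number such as the one behind %% only has meaning inside the shell that started the job, which is why the Python version has no direct counterpart to job specifiers.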
Deep Analysis of File Change-Based Build Triggering Mechanisms in Jenkins Git Plugin

Jenkins Git plugin build triggering file change detection continuous integration

This article provides an in-depth exploration of how to trigger builds based on changes to specific files using the included region feature of the Jenkins Git plugin. It details the 'included region' functionality introduced in Git plugin version 1.16, compares alternative approaches such as changeset conditions in declarative pipelines and multi-job solutions, and offers comprehensive configuration examples and best practices. Through practical code demonstrations and architectural analysis, it helps readers choose an appropriate solution for each scenario and control continuous integration workflows precisely.
Distinguishing Git and GitHub Usernames: Technical Implementation and Identity Differences

Git GitHub username

This article explores the distinctions between Git and GitHub usernames, analyzing their roles in version control systems. The Git username, set via git config, serves as metadata for local commits; the GitHub username is a unique identifier on the platform, used for login, authentication when pushing over HTTPS, and profile and repository URLs. Through technical details and practical scenarios, it explains why the two need not match and recommends using the GitHub username in formal contexts like job applications.
Technical Analysis and Practical Guide to Obtaining the Current Number of Partitions in a DataFrame

Apache Spark DataFrame Partition Count

This article provides an in-depth exploration of methods for obtaining the current number of partitions in a DataFrame within Apache Spark. By analyzing the relationship between DataFrame and RDD, it details how to accurately retrieve partition information using the df.rdd.getNumPartitions() method. Starting from the underlying architecture, the article explains the partitioning mechanism of DataFrame as a distributed dataset and offers complete code examples in Python, Scala, and Java. Additionally, it discusses the impact of partition count on Spark job performance and how to optimize partitioning strategies based on data scale and cluster configuration in practical applications.
Core Differences Between Java and Core Java: Technical Definitions and Application Scenarios

Java Core Java Java SE Programming Language Differences Technical Definitions

This article provides an in-depth analysis of the technical distinctions between Java and Core Java, based on Oracle's official definitions and practical application contexts. Core Java specifically refers to Java Standard Edition (Java SE) and its core technological components, including the Java Virtual Machine, CORBA, and fundamental class libraries, primarily used for desktop and server application development. In contrast, Java as a broader concept encompasses multiple editions such as J2SE, J2EE, and J2ME (the earlier names for Java SE, Java EE, and Java ME), supporting development from embedded systems to enterprise-level applications. Through technical comparisons and code examples, the article elaborates on their differences in architecture, application scope, and development ecosystems, helping developers accurately interpret technical terminology in job requirements.
PowerShell Dynamic Parameter Passing: Complete Solution from Configuration to Script Execution

PowerShell Parameter Passing Script Invocation Invoke-Expression Dynamic Parameters

This article provides an in-depth exploration of dynamic script invocation and parameter passing in PowerShell. By analyzing common error scenarios, it explains the correct usage of Invoke-Expression, focusing in particular on escaping paths that contain spaces. The article also compares other ways of passing parameters, including Start-Job, Invoke-Command, and splatting, offering comprehensive technical guidance for script invocation in various scenarios.
Research on Methods for Detecting Last Update Time of Oracle Database Tables

Oracle Database Data Change Detection ORA_ROWSCN System Change Number Batch Processing Optimization

This paper explores multiple technical solutions for detecting the last update time of tables in an Oracle 10g environment. It focuses on the working mechanism of the ORA_ROWSCN pseudocolumn, the differences between block-level and row-level tracking, and the configuration and application of the Change Data Capture (CDC) mechanism. Through detailed code examples and performance comparisons, it provides practical data change detection strategies that C++ OCI applications can use to optimize batch job execution.