DevGex Search

Found 280 relevant articles

The Fundamental Differences Between Concurrency and Parallelism in Computer Science

Concurrency Parallelism Multithreading System Design Performance Optimization

This paper provides an in-depth analysis of the core distinctions between concurrency and parallelism in computer science. Concurrency emphasizes the ability of tasks to execute in overlapping time periods through time-slicing, while parallelism requires genuine simultaneous execution relying on multi-core or multi-processor architectures. Through technical analysis, code examples, and practical scenario comparisons, the article systematically explains the different application values of these concepts in system design, performance optimization, and resource management.
Concurrency, Parallelism, and Asynchronous Methods: Conceptual Distinctions and Implementation Mechanisms

Concurrency Programming Parallel Computing Asynchronous Methods

This article provides an in-depth exploration of the distinctions and relationships between three core concepts: concurrency, parallelism, and asynchronous methods. By analyzing task execution patterns in multithreading environments, it explains how concurrency achieves apparent simultaneous execution through task interleaving, while parallelism relies on multi-core hardware for truly simultaneous execution. The article focuses on the non-blocking nature of asynchronous methods and their mechanisms for achieving concurrent effects in single-threaded environments, using practical scenarios like database queries to illustrate the advantages of asynchronous programming. It also discusses the practical applications of these concepts in software development and provides clear code examples demonstrating implementation approaches in different patterns.
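As a rough illustration of the single-threaded concurrency this abstract describes, the sketch below overlaps two simulated database queries on one event loop. It is not taken from the article: the names `fake_query`, `run_queries`, and `timed_run` are hypothetical, and `asyncio.sleep` merely stands in for a non-blocking database call.

```python
import asyncio
import time

async def fake_query(name, delay):
    # asyncio.sleep stands in for a non-blocking database call (assumption).
    await asyncio.sleep(delay)
    return f"{name} done"

async def run_queries():
    # gather runs both coroutines on a single event-loop thread;
    # their waiting periods overlap instead of adding up.
    return await asyncio.gather(
        fake_query("users", 0.2),
        fake_query("orders", 0.2),
    )

def timed_run():
    # Returns the results plus the wall-clock time the whole batch took.
    start = time.perf_counter()
    results = asyncio.run(run_queries())
    return results, time.perf_counter() - start

if __name__ == "__main__":
    results, elapsed = timed_run()
    print(results, f"{elapsed:.2f}s")
```

Because both waits overlap, the batch should take roughly as long as the slowest query rather than the sum of the two, even though only one thread is involved.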
Spark Performance Tuning: Deep Analysis of spark.sql.shuffle.partitions vs spark.default.parallelism

Apache Spark Performance Tuning Partition Configuration

This article provides an in-depth exploration of two critical configuration parameters in Apache Spark: spark.sql.shuffle.partitions and spark.default.parallelism. Through detailed technical analysis, code examples, and performance tuning practices, it helps developers understand how to properly configure these parameters in different data processing scenarios to improve Spark job execution efficiency. The article combines Q&A data with official documentation to offer comprehensive technical guidance from basic concepts to advanced tuning.
Principles and Applications of Parallel.ForEach in C#: Converting from foreach to Parallel Loops

C# Parallel.ForEach Multithreading Data Parallelism Performance Optimization

This article provides an in-depth exploration of how Parallel.ForEach works in C# and its differences from traditional foreach loops. Through detailed code examples and performance analysis, it explains when using Parallel.ForEach can improve program execution efficiency and best practices for CPU-intensive tasks. The article also discusses thread safety and data parallelism concepts, offering comprehensive technical guidance for developers.
Best Practices for Asynchronous Programming in ASP.NET Core Web API Controllers: Evolution from Task to async/await

ASP.NET Core Asynchronous Programming Web API async/await Performance Optimization

This article provides an in-depth exploration of optimal asynchronous programming patterns for handling parallel I/O operations in ASP.NET Core Web API controllers. By comparing traditional Task-based parallelism with the async/await pattern, it analyzes the differences in performance, scalability, and resource utilization. Based on practical development scenarios, the article demonstrates how to refactor synchronous service methods into asynchronous ones and provides complete code examples illustrating the efficient concurrent execution of multiple independent service calls using Task.WhenAll. Additionally, it discusses common pitfalls and best practices in asynchronous programming to help developers build high-performance, scalable Web APIs.
CUDA Thread Organization and Execution Model: From Hardware Architecture to Image Processing Practice

CUDA Thread Organization GPU Parallel Computing

This article provides an in-depth analysis of thread organization and execution mechanisms in CUDA programming, covering hardware-level multiprocessor parallelism limits and the software-level grid-block-thread hierarchy. Through a concrete case study of 512×512 image processing, it details how to design thread block and grid dimensions, with complete index calculation code examples to help developers optimize GPU parallel computing performance.
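The index arithmetic behind a grid-block-thread layout for a 512×512 image can be sketched in plain Python. This is an illustration only, not the article's code: the 16×16 block shape and the helper names `grid_dims` and `pixel_index` are assumptions made for this example.

```python
WIDTH, HEIGHT = 512, 512    # image size named in the abstract
BLOCK_X, BLOCK_Y = 16, 16   # assumed block shape (256 threads per block)

def grid_dims(width, height, block_x, block_y):
    # Ceiling division so partial tiles at the image edge still get a block.
    return ((width + block_x - 1) // block_x,
            (height + block_y - 1) // block_y)

def pixel_index(block_idx, thread_idx, block_dim, width):
    # Mirrors the usual CUDA pattern:
    #   x = blockIdx.x * blockDim.x + threadIdx.x   (and likewise for y)
    # followed by a row-major offset into the image buffer.
    x = block_idx[0] * block_dim[0] + thread_idx[0]
    y = block_idx[1] * block_dim[1] + thread_idx[1]
    return x, y, y * width + x

if __name__ == "__main__":
    print(grid_dims(WIDTH, HEIGHT, BLOCK_X, BLOCK_Y))
    print(pixel_index((31, 31), (15, 15), (BLOCK_X, BLOCK_Y), WIDTH))
```

With these assumed dimensions the grid is 32×32 blocks, and the last thread of the last block maps to the final pixel of the image.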
Deep Analysis of Apache Spark Standalone Cluster Architecture: Worker, Executor, and Core Coordination Mechanisms

Apache Spark Standalone Cluster Worker Process Executor Process Core Resource Management Distributed Computing Architecture Task Scheduling Fault Tolerance Mechanism

This article provides an in-depth exploration of the core components in Apache Spark standalone cluster architecture—Worker, Executor, and core resource coordination mechanisms. By analyzing Spark's Master/Slave architecture model, it details the communication flow and resource management between Driver, Worker, and Executor. The article systematically addresses key issues including Executor quantity control, task parallelism configuration, and the relationship between Worker and Executor, demonstrating resource allocation logic through specific configuration examples. Additionally, combined with Spark's fault tolerance mechanism, it explains task scheduling and failure recovery strategies in distributed computing environments, offering theoretical guidance for Spark cluster optimization.
Canonical Methods for Error Checking in CUDA Runtime API: From Macro Wrapping to Exception Handling

CUDA error checking runtime API macro wrapping kernel launch exception handling

This paper delves into the canonical methods for error checking in the CUDA runtime API, focusing on macro-based wrapper techniques and their extension to kernel launch error detection. By analyzing best practices, it details the design principles and implementation of the gpuErrchk macro, along with its application in synchronous and asynchronous operations. As a supplement, it explores C++ exception-based error recovery mechanisms using thrust::system_error for more flexible error handling strategies. The paper also covers adaptations for CUDA Dynamic Parallelism and CUDA Fortran, providing developers with a comprehensive and reliable error-checking framework.
Controlling Concurrent Processes in Python: Using multiprocessing.Pool to Limit Simultaneous Process Execution

Python multiprocessing concurrency control multiprocessing.Pool process pool

This article explores how to effectively control the number of simultaneously running processes in Python, particularly when dealing with variable numbers of tasks. By analyzing the limitations of multiprocessing.Process, it focuses on the multiprocessing.Pool solution, including setting pool size, using apply_async for asynchronous task execution, and dynamically adapting to system core counts with cpu_count(). Complete code examples and best practices are provided to help developers achieve efficient task parallelism on multi-core systems.
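A minimal sketch of the pool-based approach summarized above, assuming a placeholder task: `work`, `pool_size`, and `run_limited` are hypothetical names chosen for illustration, while `multiprocessing.Pool`, `apply_async`, and `cpu_count` are the standard-library calls the abstract mentions.

```python
import multiprocessing as mp

def work(n):
    # Placeholder CPU-bound task (assumption); must live at module level
    # so worker processes can import it.
    return n * n

def pool_size(requested=None):
    # Fall back to the machine's core count when no limit is given.
    return max(1, requested or mp.cpu_count())

def run_limited(tasks, max_procs=None):
    # The pool caps how many processes run at once; remaining tasks queue
    # until a worker becomes free.
    with mp.Pool(processes=pool_size(max_procs)) as pool:
        pending = [pool.apply_async(work, (t,)) for t in tasks]
        return [p.get() for p in pending]

if __name__ == "__main__":
    print(run_limited(range(10), max_procs=4))
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by spawning a fresh interpreter, since the module is re-imported in each child.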
Running Two Async Tasks in Parallel and Collecting Results in .NET 4.5

asynchronous programming parallel execution Task.WhenAll

This article provides an in-depth exploration of how to leverage the async/await pattern in .NET 4.5 to execute multiple asynchronous tasks in parallel and efficiently collect their results. By comparing traditional Task.Run approaches with modern async/await techniques, it analyzes the differences between Task.Delay and Thread.Sleep, and demonstrates the correct implementation using Task.WhenAll to await multiple task completions. The discussion covers common pitfalls in asynchronous programming, such as the impact of blocking calls on parallelism, and offers complete code examples and best practices to help developers maximize the performance benefits of .NET 4.5's asynchronous features.
False Data Dependency of _mm_popcnt_u64 on Intel CPUs: Analyzing Performance Anomalies from 32-bit to 64-bit Loop Counters

false data dependency popcnt performance Intel microarchitecture compiler optimization loop variable type

This paper investigates the phenomenon where changing a loop variable from 32-bit unsigned to 64-bit uint64_t causes a 50% performance drop when using the _mm_popcnt_u64 instruction on Intel CPUs. Through assembly analysis and microarchitectural insights, it reveals a false data dependency in the popcnt instruction that propagates across loop iterations, severely limiting instruction-level parallelism. The article details the effects of compiler optimizations, constant vs. non-constant buffer sizes, and the role of the static keyword, providing solutions via inline assembly to break dependency chains. It concludes with best practices for writing high-performance hot loops, emphasizing attention to microarchitectural details and compiler behaviors to avoid such hidden performance pitfalls.
Comprehensive Analysis of Multiprocessing vs Threading in Python

Python Multiprocessing Python Threading Global Interpreter Lock Concurrent Programming Performance Optimization

This technical article provides an in-depth comparison between Python's multiprocessing and threading models, examining core differences in memory management, GIL impact, and performance characteristics. Based on authoritative Q&A data and experimental validation, the article details how multiprocessing bypasses the Global Interpreter Lock for true parallelism while threading excels in I/O-bound scenarios. Practical code examples illustrate optimal use cases for both concurrency models, helping developers make informed choices based on specific requirements.
How to Limit Concurrency in C# Parallel.ForEach

C# Parallel.ForEach Concurrency Limitation MaxDegreeOfParallelism Parallel Programming

This article provides an in-depth exploration of limiting thread concurrency in C#'s Parallel.ForEach method using the ParallelOptions.MaxDegreeOfParallelism property. It covers the fundamental concepts of parallel processing, the importance of concurrency control in real-world scenarios such as network requests and resource constraints, and detailed implementation guidelines. Through comprehensive code examples and performance analysis, developers will learn how to effectively manage parallel execution to prevent resource contention and system overload.
Implementing and Optimizing Multi-threaded Loop Operations in Python

Python Multi-threading Loop Parallelization ThreadPoolExecutor

This article provides an in-depth exploration of optimizing loop operation efficiency through multi-threading in Python 2.7. Focusing on I/O-bound tasks, it details the use of ThreadPoolExecutor and ProcessPoolExecutor, including exception handling, task batching strategies, and executor sharing configurations. By comparing thread and process applicability scenarios, it offers practical code examples and performance optimization advice, helping developers select appropriate parallelization solutions based on specific requirements.
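One way the loop parallelization and exception handling described here might look is sketched below. It assumes Python 3 syntax rather than the Python 2.7 setup the abstract mentions, and `fetch` and `run_loop` are hypothetical stand-ins for an I/O-bound call and its driver loop.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(item):
    # Stand-in for an I/O-bound call (assumption); odd inputs fail on purpose
    # so the error path is exercised.
    if item % 2:
        raise ValueError(f"bad item {item}")
    return item * 10

def run_loop(items, workers=4):
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to its input so failures can be attributed.
        futures = {pool.submit(fetch, i): i for i in items}
        for fut in as_completed(futures):
            item = futures[fut]
            try:
                results[item] = fut.result()
            except ValueError as exc:
                # result() re-raises the worker's exception in the caller.
                errors[item] = str(exc)
    return results, errors

if __name__ == "__main__":
    print(run_loop(range(4)))
```

Collecting errors per item keeps one failing task from discarding the results of the others.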
Implementing Custom Thread Pools for Java 8 Parallel Streams: Principles and Practices

Java 8 Parallel Streams Custom Thread Pool ForkJoinPool Multithreaded Programming

This paper provides an in-depth analysis of specifying custom thread pools for Java 8 parallel streams. By examining the workings of ForkJoinPool, it details how to isolate parallel stream execution environments through task submission to custom ForkJoinPools, preventing performance issues caused by shared thread pools. With code examples, the article explains the implementation rationale and its practical value in multi-threaded server applications, while also discussing supplementary approaches like system property configuration.
Service vs IntentService in Android: A Comprehensive Comparison

Android Service IntentService Multithreading Background Tasks

This article provides an in-depth comparison between Service and IntentService in Android, covering threading models, lifecycle management, use cases, and code implementations. It includes rewritten examples and recommendations for modern alternatives to help developers choose the right component for background tasks.
Thread Pools in Python: An In-Depth Analysis of ThreadPool and ThreadPoolExecutor

Python Thread Pool Multithreading ThreadPool ThreadPoolExecutor

This article examines the implementation of thread pools in Python, focusing on ThreadPool from multiprocessing.dummy and ThreadPoolExecutor from concurrent.futures. It compares their principles, usage, and scenarios, providing code examples to efficiently parallelize IO-bound tasks without process creation overhead. Based on Q&A data and official documentation, the content is reorganized logically to help developers choose appropriate concurrency tools.
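The two thread pool interfaces named in this abstract can be compared side by side in a short sketch. The task `text_length` and the wrappers `with_thread_pool` and `with_executor` are hypothetical; the pool classes themselves come from the standard library.

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.dummy import Pool as ThreadPool

def text_length(text):
    # Trivial stand-in for an I/O-bound task (assumption).
    return len(text)

def with_thread_pool(items):
    # multiprocessing.dummy exposes the Pool API backed by threads;
    # map() blocks and returns a list in input order.
    with ThreadPool(4) as pool:
        return pool.map(text_length, items)

def with_executor(items):
    # Executor.map() returns a lazy iterator, so it is materialized here.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(text_length, items))

if __name__ == "__main__":
    words = ["a", "bb", "ccc"]
    print(with_thread_pool(words), with_executor(words))
```

Both variants preserve input order and avoid process creation, so the choice mostly comes down to which API fits the surrounding code.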
Viewing Assembly Code Generated from Source in Visual C++: Methods and Technical Analysis

Visual C++ Assembly Language Code Optimization Debugging Techniques Disassembly

This technical paper comprehensively examines three core methods for viewing assembly instructions corresponding to high-level language code in Visual C++ development environments: real-time viewing through debuggers, generating assembly listing files, and utilizing third-party disassembly tools. Structured as a rigorous academic analysis, the article delves into the implementation principles, applicable scenarios, and operational procedures for each approach, with specific configuration guidelines for Visual Studio IDE. By comparing the advantages and limitations of different methods, it assists developers in selecting the most appropriate assembly code viewing strategy based on practical needs, while briefly addressing similar technical implementations for other languages like Visual Basic.
Comprehensive Guide to SparkSession Configuration Options: From JSON Data Reading to RDD Transformation

SparkSession Configuration Options JSON Data Processing

This article provides an in-depth exploration of SparkSession configuration options in Apache Spark, with a focus on optimizing JSON data reading and RDD transformation processes. It begins by introducing the fundamental concepts of SparkSession and its central role in the Spark ecosystem, then details methods for retrieving configuration parameters, common configuration options and their application scenarios, and finally demonstrates proper configuration setup through practical code examples for efficient JSON data handling. The content covers multiple APIs including Scala, Python, and Java, offering configuration best practices to help developers leverage Spark's powerful capabilities effectively.
Diagnosing and Optimizing SQL Server 100% CPU Utilization Issues

SQL Server CPU utilization performance optimization

This article addresses the common performance issue of SQL Server instances experiencing sustained near-100% CPU utilization. Based on a real-world case study, it analyzes memory management, query execution plan caching, and recompilation mechanisms. By integrating Dynamic Management Views (DMVs) and diagnostic tools like sp_BlitzCache, it provides a systematic diagnostic workflow and optimization strategies. The article emphasizes the cumulative impact of short-duration queries and offers multilingual technical guidance to help database administrators effectively identify and resolve CPU bottlenecks.