Jason Chi

xiangqi-game-scraper 2024-09-25

Picture of xiangqi-game-scraper

Description:

DPXQ is a well-known website that holds the most up-to-date xiangqi game records. I have seen players study their opponents' games before a match as pre-match preparation. To open a game record in xiangqi software, you would normally have to copy and paste the text record into it manually. So I came up with the idea of scraping the game records of a specific opponent and converting them into PGN files, a format that most xiangqi software can parse, using my own Xiangqi NuGet package.

Insights gained from the Xiangqi Opponent Game Scraper:

Demo of the scraper

Leveraging Async and Parallelism

While developing my xiangqi opponent game scraper, I relied heavily on asynchronous programming and parallelism in C#. Writing a large number of records to PGN files efficiently was critical, and async/await let me perform that I/O without blocking the main thread, so the process stayed responsive. By running multiple scraping tasks concurrently, I significantly reduced the time it took to gather the data. This experience reinforced my understanding of concurrency and taught me how to structure I/O-heavy workflows efficiently.
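
As an illustration of the pattern described above, here is a minimal sketch of writing many PGN files concurrently with async/await. It is not the scraper's actual code: the `PgnBatchWriter` class, the `WriteAllAsync` method, and the `maxConcurrency` parameter are hypothetical names chosen for this example, and it assumes the PGN text has already been produced for each game.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class PgnBatchWriter
{
    // Writes each (FileName, PgnText) pair into outputDir, allowing at most
    // maxConcurrency file writes to be in flight at the same time.
    public static async Task WriteAllAsync(
        IEnumerable<(string FileName, string PgnText)> games,
        string outputDir,
        int maxConcurrency = 4)
    {
        Directory.CreateDirectory(outputDir);
        using var gate = new SemaphoreSlim(maxConcurrency);

        // ToList() materializes the sequence, which starts every task.
        var writes = games.Select(async game =>
        {
            await gate.WaitAsync();
            try
            {
                var path = Path.Combine(outputDir, game.FileName);
                await File.WriteAllTextAsync(path, game.PgnText);
            }
            finally
            {
                gate.Release();
            }
        }).ToList();

        // Completes once every write has finished (or rethrows a failure).
        await Task.WhenAll(writes);
    }
}
```

A caller would await it with something like `await PgnBatchWriter.WriteAllAsync(games, "pgn-output");`. The semaphore is one way to cap concurrency so that a large batch does not open every file at once; the right limit depends on the machine and the disk.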

Using Playwright for Web Scraping

To interact with web pages dynamically, I used Playwright to automate the scraping process. Playwright’s ability to handle JavaScript-heavy sites made it a powerful tool for my scraper. It allowed me to programmatically navigate the xiangqi game website, perform actions such as clicking and entering data, and extract the information I needed. This made scraping much more reliable than traditional approaches such as parsing static HTML, and helped me build a robust solution for gathering opponent data.
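
The navigate, enter data, click, and extract flow described above might look roughly like the following with Playwright for .NET (the `Microsoft.Playwright` package). This is a sketch under assumptions, not the scraper's real code: the `GameListScraper` class is hypothetical, and the CSS selectors are placeholders that do not reflect DPXQ's actual markup. The caller supplies the search page URL.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Playwright;

// Hypothetical helper, for illustration only.
public static class GameListScraper
{
    // Opens a search page, submits a player name, and returns the text
    // of each result link. All selectors below are placeholders.
    public static async Task<IReadOnlyList<string>> GetGameTitlesAsync(
        string searchUrl, string playerName)
    {
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync(
            new BrowserTypeLaunchOptions { Headless = true });

        var page = await browser.NewPageAsync();
        await page.GotoAsync(searchUrl);

        // Enter the opponent's name and submit the search form.
        await page.Locator("input[name='player']").FillAsync(playerName);
        await page.Locator("button[type='submit']").ClickAsync();

        // Wait until at least one result link is rendered, then read them all.
        var results = page.Locator(".game-list a");
        await results.First.WaitForAsync();
        return await results.AllInnerTextsAsync();
    }
}
```

Because Playwright drives a real browser, content rendered by JavaScript is available to the locators once it appears, which is the main advantage over fetching and parsing raw HTML.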

Integrating My Own NuGet Package

An exciting aspect of this project was incorporating my own Xiangqi core NuGet package to handle the game logic and data parsing. This allowed me to seamlessly integrate functionalities I had already developed, reducing redundancy and ensuring consistency across the project. It was rewarding to see how modularizing code through a NuGet package facilitated cleaner, more reusable code while enhancing the overall efficiency of the scraper.