Testing DeepSeek V4 Pro on 3D Geometry Processing


I came up with an absolutely brilliant idea. Unfortunately, I do not have many tokens left today, so I will stop writing for now.
------Cyber Fermat

Before We Begin

DeepSeek V4 Pro’s coding ability does not look the same to every user. A thousand users will raise a thousand little whales. Everyone uses different languages, tasks, requirements, domains, and prompts. Of course, one job of an agent is to smooth out those differences as much as possible, but the differences still exist, and they are not small.

As a researcher working in 3D geometry, I was naturally very curious about what it could do in 3D data processing. I wanted to probe its ceiling, partly so I could make better use of it later, and partly to build some working chemistry with “Mr. D.” Then May happened to bring another token sale, and I had a bit of free time, so I ran a fairly thorough test. That is where this document came from. I used the last quiet window at the beginning of June to polish it and publish it.

The Answer Sheet

First, let us look at what DeepSeek produced. Building on Dear ImGui, Easy3D, CGAL, GLFW, and OpenGL, DeepSeek implemented a 3D geometry processing application called 3D Claw. The name came from Mr. D after 109 seconds of thought. The software can load point clouds and meshes, perform basic geometry processing, run a batch of geometry processing algorithms with process visualization, and integrate AI into the workflow.

The source code is here: https://github.com/tangtangtang1995/3D-Claw

I also prebuilt Windows and Linux clients, so they are ready to use after download. Pull requests and issues on GitHub are very welcome, especially bug reports. That gives me another way to test how DeepSeek responds to issues. I am also curious how AI agents trained by different users will read this codebase.

Software Interface

The default 3D Claw interface has a model tree on the left, a 3D viewport in the middle, properties/logs/health report/history at the bottom, and AI Chat plus AI 3D Generation on the right. AI Chat reads the current model and panel context, including model type, vertex and face counts, bounding box, normals, UVs, algorithm parameters, operation history, and health report information. Before running an algorithm, you can ask for parameter advice. After running it, you can pass result metrics to the AI for evaluation. The current model’s operation history and related information are automatically attached as context.
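
To make the context sharing more concrete, here is a minimal sketch of how model facts and panel parameters might be flattened into text before they are attached to a chat prompt. It is written in Python for brevity (3D Claw itself is C++), and `ModelContext`, `build_prompt_context`, and every field name are hypothetical, not taken from the 3D Claw source.

```python
from dataclasses import dataclass, field


@dataclass
class ModelContext:
    """Hypothetical summary of the active model; field names are illustrative."""
    model_type: str                      # e.g. "mesh" or "point_cloud"
    vertices: list                       # list of (x, y, z) tuples
    face_count: int = 0
    has_normals: bool = False
    has_uvs: bool = False
    history: list = field(default_factory=list)  # operation names, oldest first


def bounding_box(vertices):
    """Return ((min_x, min_y, min_z), (max_x, max_y, max_z))."""
    xs, ys, zs = zip(*vertices)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def build_prompt_context(ctx, params, max_history=5):
    """Render model facts and panel parameters as plain text for a chat prompt."""
    lo, hi = bounding_box(ctx.vertices)
    lines = [
        f"model_type: {ctx.model_type}",
        f"vertex_count: {len(ctx.vertices)}",
        f"face_count: {ctx.face_count}",
        f"bbox_min: {lo}",
        f"bbox_max: {hi}",
        f"has_normals: {ctx.has_normals}",
        f"has_uvs: {ctx.has_uvs}",
    ]
    # Panel parameters are sorted so the prompt text is stable between runs.
    lines += [f"param.{name}: {value}" for name, value in sorted(params.items())]
    # Only the most recent operations are attached to keep the prompt short.
    lines += [f"history: {op}" for op in ctx.history[-max_history:]]
    return "\n".join(lines)
```

Calling `build_prompt_context(ctx, {"depth": 8})` returns one `key: value` line per fact, which can be prepended to the user's question. Keeping the context as plain text rather than free-form prose makes it easy to log exactly what the model was told.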

Still the bunny.

At startup, you can also configure the AI Chat key and response language directly. AI Chat shares context with several algorithm panels; it is not an isolated chat box.

Concrete Examples

The demo video below collects the main visual workflows: region growing, skeleton extraction, mesh simplification, mesh smoothing, and AI 3D generation. I keep the written examples here as a quick map of what to look for in the video.

Watch the demo on Bilibili

Region growing algorithm demo. The region growing implementation in the software makes seeds, region expansion, planar-region coloring, progress state, and result evaluation observable. Based on the current model, current panel parameters, and output results, the AI explains parameter meanings, warns about over-segmentation or under-segmentation risks, and suggests the next round of tuning.
Skeleton extraction algorithm demo. Skeleton extraction takes a long time, and parameter effects are not intuitive. If you only see the final skeleton, it is hard to tell whether the problem came from preprocessing, contraction, topology changes, or final curve extraction. The goal here is to expose as much intermediate state as possible, so users can watch in the viewport as the algorithm gradually converges to a curve skeleton.
Mesh simplification algorithm demo. Simplification is a good test of AI’s post-processing evaluation ability. The software produces the simplified geometry, and AI can combine face-count changes, shape preservation, visual quality, and potential risks to judge whether the simplification is conservative or aggressive. It can then suggest next steps, such as whether to simplify again, smooth first, or inspect boundaries and normals.
Mesh smoothing algorithm demo. In smoothing, displacement, area/volume change, triangle quality, and visual smoothness all matter. Here, the algorithm result and quality metrics are passed to AI, which explains the effect of the operation and warns about possible over-smoothing, detail loss, or follow-up repair needs.
AI 3D generation. This feature connects to image-to-3D and text-to-3D generation workflows, using hunyuan3d-api. Generated results are imported into 3D Claw, and then the software’s internal tools diagnose non-manifold geometry, floating fragments, UV/texture chaos, local detail collapse, and similar problems. AI evaluation is triggered automatically. As a side complaint: Hunyuan3D’s secret-id and secret-key are hidden way too deep. I spent half an hour finding them.
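
The diagnosis step for generated models can be illustrated with a small sketch. The following Python function (illustrative only; 3D Claw is C++ and its actual health checks are not shown in this post) counts boundary edges, non-manifold edges, and connected components from a triangle list. `mesh_health` is a hypothetical name, and components are defined here by shared edges, so fragments touching only at a vertex count as separate.

```python
from collections import defaultdict


def mesh_health(faces):
    """Count boundary edges, non-manifold edges, and connected components.

    `faces` is a list of vertex-index triples. An edge used by one face is a
    boundary edge; an edge used by more than two faces is non-manifold.
    """
    edge_faces = defaultdict(list)
    for f_idx, (a, b, c) in enumerate(faces):
        for u, v in ((a, b), (b, c), (c, a)):
            edge_faces[(min(u, v), max(u, v))].append(f_idx)

    boundary = sum(1 for fs in edge_faces.values() if len(fs) == 1)
    non_manifold = sum(1 for fs in edge_faces.values() if len(fs) > 2)

    # Union-find over faces: faces sharing an edge belong to one component.
    parent = list(range(len(faces)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for fs in edge_faces.values():
        for other in fs[1:]:
            parent[find(other)] = find(fs[0])

    components = len({find(i) for i in range(len(faces))})
    return {"boundary_edges": boundary,
            "non_manifold_edges": non_manifold,
            "components": components}
```

A closed tetrahedron reports zero boundary and non-manifold edges and one component. Adding a stray triangle raises the component count to two, which is the kind of "floating fragment" signal that could be handed to the AI evaluation.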

The video gives a compact pass over the main visualization effects. For more details, download the software and play with it directly. Pull requests and issues on GitHub are welcome: https://github.com/tangtangtang1995/3D-Claw

Cost

This test round was mainly concentrated in May. DeepSeek’s token cost is shown below. The last two days mostly depended on Codex for emergency refactoring and firefighting, but most of the feature code really was written by DeepSeek.

What Else Is Inside?

Besides the features above and basic geometry processing capabilities, the project also includes AI Chat, context-menu explanations, Health Report, operation history, model selection and measurement, texture/UV diagnosis, Poisson reconstruction, RANSAC primitive extraction, Alpha Wrapping 3D, ACVD, VSA, Planar Patch Remeshing, Geodesic Distance, ARAP Deformation, UV Parameterization, and more.

Not every feature reached what I would call “product-level completeness,” but as a test of a model’s capability ceiling, reaching this level in about 150 hours is, at least to me, already quite nice.

The Assignment

This software was not created by AI in one magical leap. I am very curious whether the newly released target mode can do that. In practice, this followed a plain development process: first, give a rough direction, including a brief requirements description, system design, task breakdown, and requirement dispatch; second, move forward and improve step by step, including high-level design, implementation, testing, and review for each module; third, complete system integration, system testing, acceptance, release, and so on.

Code is cheap, show me the talk. The earliest prompt was roughly like this:

I want to develop a 3D geometry processing application. Something like MeshLab, since CloudCompare feels too old. It should implement basic data processing functions, including but not limited to moving, rotating, selecting, adding and deleting, measuring, distance measurement, sampling, denoising, distance computation, and so on. Then it should include some cutting-edge geometry processing algorithms, such as algorithms added to CGAL in the last five years. To make it differentiated, it needs the following:

1. Embed AI-assisted functionality into the software: Many algorithms have many parameters to tune. Even after carefully reading the papers, facing different data does not necessarily tell you how to find the best parameters. Usually, this requires repeated quantitative and qualitative evaluation. Introduce AI assistance so it can explain parameter effects, evaluate output data, and suggest further parameter adjustment directions based on errors. Public papers, documentation, and open-source implementations may already be in large-model training corpora, so the model may be able to provide useful parameter explanations. But its suggestions must be constrained by the current model context and result metrics.

2. Visualize intermediate algorithm processes: Many geometry processing algorithms have long-running processes. I hope some intermediate states can also be rendered, such as exposing seeds, candidates, iterations, snapshots, metrics, and worker states, then passing them to the UI for drawing. On one hand, this helps us see early whether the algorithm has gone wrong. On the other, it can show which stage is affected by different parameters. More importantly, it may inspire algorithm improvements.

3. Real-time interaction with intermediate algorithm processes: Allow users to adjust scene elements or parameters while the algorithm is running, affecting the final output. Imagine a game cheat tool: essentially, this means transforming a traditional processing algorithm into a runtime system that can be paused, inserted into, replayed, and even intervened in.

4. Add AI generation features: There are now many image-to-3D large models. Integrate their APIs into the software as well, and call tools inside the software to evaluate quality.
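
Requirements 2 and 3 above can be sketched with a generator that yields one event per step, so a caller can draw each state, pause by simply not asking for the next step, or send in a new parameter mid-run. This is a simplified Python illustration, not the project's C++ code and not CGAL's region growing: it compares each candidate normal against the seed normal instead of fitting a plane, and the event names are hypothetical.

```python
import math
from collections import deque


def region_growing_steps(normals, neighbors, max_angle_deg=10.0):
    """Yield one event per step so a UI can draw, pause, or retune mid-run.

    normals:   list of unit normals (nx, ny, nz), one per point
    neighbors: list of neighbor-index lists, one per point
    Sending a number into the generator replaces the angle threshold.
    """
    cos_limit = math.cos(math.radians(max_angle_deg))
    labels = [-1] * len(normals)
    region = 0
    for seed in range(len(normals)):
        if labels[seed] != -1:
            continue
        labels[seed] = region
        queue = deque([seed])
        new_angle = yield ("seed", region, seed)
        if new_angle is not None:
            cos_limit = math.cos(math.radians(new_angle))
        while queue:
            current = queue.popleft()
            for n in neighbors[current]:
                if labels[n] != -1:
                    continue
                # Simplification: compare against the seed normal only.
                dot = sum(a * b for a, b in zip(normals[seed], normals[n]))
                if dot >= cos_limit:
                    labels[n] = region
                    queue.append(n)
                    new_angle = yield ("grow", region, n)
                    if new_angle is not None:
                        cos_limit = math.cos(math.radians(new_angle))
        region += 1
    yield ("done", region, labels)
```

Iterating with a plain `for` loop replays the whole run, while `gen.send(95.0)` after the first event loosens the threshold for everything that follows. That is the "pause, insert, intervene" idea in miniature; a real implementation would also need snapshots for replay.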

There is no doubt that AI can quickly produce a runnable GUI geometry processing application, because the corpus is too rich. So to create a meaningful distinction, I added the requirements above. They correspond to:

1. Understanding of related geometry processing algorithms: can it combine the current model context and give reasonable parameter suggestions?
2. A combination of coding ability and algorithm understanding: intermediate-state rendering requires hacking source code, adding hooks, writing visitors, adding callbacks, coordinating and synchronizing threads, and decoupling frontend and backend architecture.
3. The third point was too hard, so I later gave up on it.
4. The final point tests external API integration, asynchronous network tasks, credential management, generated-result import, format compatibility, and the ability to feed generated models back into a geometry quality evaluation workflow.
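
The hooks, callbacks, and thread coordination mentioned for intermediate-state rendering can be sketched as a worker thread that posts events to a queue, with the UI draining that queue once per frame. This is a generic Python pattern under my own naming (`run_in_worker`, `drain`), not the 3D Claw implementation, which is C++ and may be organized differently.

```python
import queue
import threading


def run_in_worker(step_source, events, cancel):
    """Drain a step iterator on a worker thread, posting events to a queue."""
    def work():
        for event in step_source:
            if cancel.is_set():
                events.put(("cancelled",))
                return
            events.put(event)
        events.put(("finished",))

    thread = threading.Thread(target=work, daemon=True)
    thread.start()
    return thread


def drain(events, max_per_frame=64):
    """Called once per UI frame: take what is ready without blocking."""
    batch = []
    while len(batch) < max_per_frame:
        try:
            batch.append(events.get_nowait())
        except queue.Empty:
            break
    return batch
```

The algorithm side never touches UI state, and the UI side never blocks on the algorithm. Capping the batch with `max_per_frame` keeps a burst of events from stalling a single frame, and the `cancel` flag gives the user a way to stop a long run between steps.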

At the very beginning, I only wanted to run a simple test and make a small demo. I did not plan to publish it on GitHub. Later, I found that the level of completion was decent, so I released it.

Configuration: OpenCode + DeepSeek V4 Pro. Emergency captain: Codex on x-high.

Time: May 1 to May 31. I spent about three to five hours per day on this project.

The Solution

I directly asked Mr. D to post its own development log:

- 05-02 to 05-03: Finished planning discussion, goal alignment, high-level architecture decisions, design-pattern choices, candidate algorithm selection, and third-party library evaluation.
- 05-03 to 05-04: Built the initial ImGui docking interface, GLFW main window, FBO-based 3D viewport, menu bar, status bar, and basic dock panels; drafted 16 algorithm dialogs and 6 dock widgets.
- 05-05 to 05-06: Fixed the first batch of ImGui/OpenGL issues, including GPU idle-loop problems, VAO destruction errors, Model List ID-stack assertions, Fit Screen behavior, log refresh, and status-bar updates.
- 05-07 to 05-08: Implemented asynchronous large-file loading to prevent the UI from freezing when loading large PLY files; introduced the loading/uploading/idle state machine.
- 05-09: Added the AI Chat panel with DeepSeek API integration, SSE streaming, Markdown rendering, copy support, and basic text interaction.
- 05-09: Refined the UI into a CloudCompare-style layout, including model-tree connection lines, selected bounding boxes, and right-docked AI panels.
- 05-10: Implemented the unified selection system, measurement tools, clipping, extraction, and other basic workspace operations.
- 05-11: Converted Poisson reconstruction, simplification, smoothing, hole filling, remeshing, and other algorithms to asynchronous execution to avoid blocking the UI.
- 05-11: Used Poisson reconstruction as the first AI-assisted algorithm prototype, adding parameter explanation, model metadata injection, and an algorithm-specific prompt workflow.
- 05-12: Added quality evaluation for Poisson results, including point-cloud/mesh distance analysis and scalar-field visualization, then passed the result metrics back to AI for tuning suggestions.
- 05-14: Implemented Geodesic / Distance Field tools, including front propagation, exact shortest path, Heat Method, vertex picking, and AI-assisted parameter advice.
- 05-14: Integrated CGAL Alpha Wrapping 3D and established a DLL/POD boundary between the GUI and CGAL runners; investigated why AW3 still stalled the UI despite running in a worker thread, and traced it to Debug CRT heap behavior and CGAL's large number of small allocations.
- 05-15: Implemented AW3 process visualization with event logs, timeline replay, Steiner points, gates, surface snapshots, and AI quality analysis; hardened AW3 with error events, async wakeups, large-input guardrails, cancellation, and status feedback.
- 05-15: Started RANSAC Primitive Extraction visualization, adding sampled points, candidate planes, pause/resume, and step-by-step execution.
- 05-16: Improved RANSAC visualization and fixed plane placement, camera jumps, model-tree hierarchy, and CH/AS result display issues.
- 05-16: Started Region Growing visualization, adding seeds, region expansion, plane-region coloring, and AI analysis.
- 05-16: Iteratively fixed Region Growing visual artifacts, including non-smooth color transitions, patch popping, z-fighting, ring-like artifacts, and mouse-interaction refresh issues.
- 05-17: Investigated Region Growing getting stuck at 87.1% on the barn dataset; added a debug repro, timing dashboard, and comparison against native CGAL behavior.
- 05-17: Split part of the dialog code, fixed Region Growing final handoff, and improved AI Chat with movable windows, font scaling, and Ctrl + mouse-wheel zoom.
- 05-18: Integrated CGAL Simplification and ACVD visualization workflows, handling slow mode, cluster overlays, seed display, and related UI details.
- 05-19: Continued visual workflow integration for VSA, Planar Patch Remeshing, and CGAL Smoothing.
- 05-21: Continued ARAP Deformation and Parameterization / UV workflows, including distortion heatmaps and linked 3D/UV point selection.
- 05-22: Integrated 3D AI generation, supporting text/image-to-3D generation, glTF/GLB/textured OBJ loading, and generated-model quality evaluation.
- 05-23: Completed the first project phase and shifted the focus toward an AI-assisted 3D processing workspace; added Health Report, AI Context, and menu-organization improvements.
- 05-24: Added operation history, texture diagnostics, Transform / Point-Pair Align / ICP, and global UI scaling; injected operation history into AI prompt context.
- 05-26: Started systematic source-size and long-function analysis, discovering that MainWindow, dialogs, and overlays had entered an architecture pressure zone.
- 05-26: Performed the first large-scale architecture audit and split major components such as paint canvas, dialogs, widgets, menus, and overlays.
- 05-29: Prepared the public release version, including README, BUILDING, third-party notices, vendor overrides, optional CGAL builds, and public export cleanup.
- 05-30: Continued architecture cleanup in the public tree, adding layering rules and check_layering tooling to clarify the app/services/algorithms/common boundaries.
- 05-31: Prepared the GitHub public release with README GIFs, release assets, Ubuntu minimal CI, documentation links, and build instructions.
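
The check_layering tooling from the 05-30 entry is not shown in this post. As a rough idea of what such a check might do, here is a minimal Python sketch that flags `#include` lines pointing from a lower layer to a higher one. The dependency order (app above services above algorithms above common), the directory-per-layer layout, and the function name are all assumptions for illustration.

```python
import re

# Assumed dependency order, highest layer first; the real rules may differ.
LAYERS = ["app", "services", "algorithms", "common"]
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"/]+)/[^"]*"', re.MULTILINE)


def layering_violations(sources):
    """Return (file, included_layer) pairs where a file includes a higher layer.

    `sources` maps a path such as "services/ai_client.cpp" to its text.
    """
    rank = {name: i for i, name in enumerate(LAYERS)}
    violations = []
    for path, text in sorted(sources.items()):
        own = path.split("/", 1)[0]
        if own not in rank:
            continue  # files outside the known layers are not checked
        for target in INCLUDE_RE.findall(text):
            if target in rank and rank[target] < rank[own]:
                violations.append((path, target))
    return violations
```

With this rule, `algorithms/smoothing.cpp` including `"app/main_window.h"` is reported, while `app/` including `"services/..."` is allowed. A check like this is cheap enough to run in CI, so layering drift shows up as a failing build instead of at the next manual review.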

Comments

What did I participate in? All research discussions, design reports, action-plan discussions, and reviews. Also manual acceptance: whether intermediate-process visualization looked right, whether point picking felt good, and whether the UI interaction behaved normally.

Some process documents.

How did it actually work? Plain and unglamorous pair programming. Each round usually had three to five terminals. One did research, produced the next plan, and discussed with me how to implement the next step, how to break down the task, how to self-test, and what the manual acceptance points and final deliverables should be. One reviewed code. One coded the current task. Later I realized that DS probably should not review itself. Codex should have been supervising.
Why choose an immediate-mode UI such as ImGui? First, there was a lot of work around visualizing intermediate algorithm processes, so I wanted every frame to refresh. At that time, I was also thinking about interaction during algorithm execution. Second, from the beginning I only wanted a quick test and a small demo. I did not expect Mr. D to produce such an impressive result. Third, Qt was simply too heavy for a small demo.
In the early stage, DeepSeek V4 Pro felt very good. The architecture already had a reference: I directly asked it to imitate the architecture of the demo GUI from my paper last year, the last glory of old-school hand-rolled programming. The early tasks were clear, so there was little chance for false consensus. The software code was also still shallow, and honestly it never became that deep in the end. Aside from Mr. D’s rather unconstrained coding style, such as doing almost no input protection, everything else was fine. Most things passed in one shot, which gave me the confidence to keep testing. The early Poisson reconstruction, AW3 wrapping algorithm, and region growing algorithm basically contained the design patterns for all later visualized algorithms. I kept settling architectural details and intervening manually. Once the template was done, integrating the other algorithms became much faster.
Why are the logs for the 27th and 28th missing?

Fine, time to complain.

After the earlier architecture had been settled, I stopped following the code closely. Technical debt always comes due. During code review on Friday night, I found that Mr. D had gone off the rails: five files were over 7,000 lines, more than ten functions were over 200 lines, and the longest function exceeded 2,000 lines. In a project of around 50,000 lines, this is absolutely not acceptable. The earlier code architecture had also been distorted by then. To be fair, ImGui played a role here too. Immediate-mode rendering requires a lot of code. Every component is a code block, and once everything is written serially, it becomes very long.
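
A size audit like the one that exposed this problem is easy to automate. The sketch below is a hypothetical Python helper, not part of the 3D Claw repository, that lists source files above a line limit. The default of 2,000 lines mirrors the file-length rule mentioned later in this post. Detecting overlong functions reliably needs a real C++ parser, so this sketch covers files only.

```python
from pathlib import Path


def oversized_files(root, max_lines=2000, suffixes=(".cpp", ".h", ".hpp")):
    """Return (relative_path, line_count) for source files above max_lines."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            with path.open(encoding="utf-8", errors="replace") as handle:
                count = sum(1 for _ in handle)
            if count > max_lines:
                hits.append((path.relative_to(root).as_posix(), count))
    # Largest files first, so the worst offenders are at the top.
    return sorted(hits, key=lambda item: -item[1])
```

Running `oversized_files("src")` after every large agent session would have flagged the 7,000-line files long before a Friday-night review did.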

The funny part is that when I pointed out to Mr. D that the code did not follow the standards, emphasized returning to the previous Pimpl design pattern, and asked it to preserve the original frontend/middle/backend architecture, Mr. D thought for a long time and then basically gave up: “How about we switch to another problem?” I do not remember the exact wording, but that was roughly the meaning.

There was no choice. Only two days remained before the May 31 milestone, so I had to ask Mr. O to rescue the situation. Fortunately, Altman resets things every so often, so I still had quota left. I went straight to x-high at 1.5x speed and sprinted through the weekend. Over those two days, Codex roughly pulled the architecture back while I watched it edit. Even so, the code only reached the point of being barely readable. You can imagine how painful it was before. Many parts were still not quite satisfactory. Codex really likes writing defensive code, almost too steady, and it also over-designs in places. The architecture was already messy, and then it added redundant code on top. Those two days alone produced 155 local git commits.

Refactoring records.

How did it feel?

At first, excitement. In the last few years, a lot of my excitement has come from AI. Looking back at the memorable points in my own large-model usage history: if we exclude models like BERT, I started using OpenAI’s web chat at the end of 2022; in 2024, I hand-built an agent-like tool with a community friend; at the end of 2024, DS R1 was released; in 2025, I began using Codex; and in 2026, I tried OpenCode + DeepSeek V4 Pro. My first impression was that it was no worse than Codex. I admit I have a filter for DeepSeek.

Then, exhaustion. May was my first time doing this kind of intensive, extreme programming with multiple AI agents. It felt as if AI had wrung me dry. Whenever a fresh model appears, I always test it immediately. Once I touch its capability ceiling, I can relax. I am looking forward to DS 4.1 and its own code agent. I believe the ability will go up another level, and by then I will probably get that fresh feeling again.

Then, confusion. Generally speaking, before development begins, product design, research reports, system design, high-level design, detailed design, and all kinds of documents are indispensable. They need layers of approval, alignment, and discussion. If one part is not done well, a landmine gets buried. But today’s agents have completely changed that paradigm. An app can go online in a week, and tens of thousands of lines of code are no big deal. Human review speed cannot keep up at all. It is hard to control the boundary of agent usage, because the agents are developing so quickly. I worry that the tasks assigned to AI are too light and fail to squeeze out its capability, but I also worry that if the tasks are too heavy, the AI will run away.

So where is DS V4’s ceiling? Disclaimer first: I can only speak about the task of developing GUI visualization software in the 3D vision domain. In this task, Mr. D was already exhausted at around 50,000 lines of code, which was exactly the point when all planned software features had just been implemented. The architecture depth was only about three layers; most of the code volume came from breadth. After 50,000 lines, any modification I asked it to make failed tests, and no matter how I prompted it, it could not solve the problems. Even more absurdly, when I asked it to review the code after Codex refactored it, it would complain and give wrong suggestions.

Maybe a small project of around 50,000 lines is already close to DS’s current ceiling. It can build a runnable demo, but that is completely different from software that can be released and code that is friendly to human maintainers. The last stretch is often the hardest, and Codex really can cover that last stretch quite steadily. Of course, Codex has also crashed in other projects. I will talk about that when I have time, haha.

Do we really need coding standards? Do we need camelCase and PascalCase, functions shorter than 200 lines, fewer than six parameters, files shorter than 2,000 lines, forbidden goto, and all those other rules? These are human-made constraints, because our five-watt brains can only absorb about half a screen of function body at once. But AI does not have that problem. Context limits aside, a one-megabyte function would be fine as long as the hardware supported it, even if the function were as long as Dream of the Red Chamber. What limits AI is actually hardware configuration: stack size, stack overflow warnings, L1/L2 cache-miss warnings, and, most seriously, compiler optimization capability.
My personal view: top-tier large models are a zero-sum game. Right now, Codex is clearly in a class of its own. But in my eyes, the money I pay and the goods it delivers do not quite match; its pricing is a bit high. The most frustrating part is the intelligence swing: sometimes high, sometimes low. Mr. D does not care. DeepSeek’s selling point is making sure there are no expensive tokens under heaven. Its capability is not top-tier, but it is genuinely usable.

At this point, my personal setup is basically Codex plus DeepSeek. Finding chemistry with a large model is not easy. Different agents have very different compatibility. You need to know which tasks you can safely hand over and how much credibility the agent has earned, and that all comes from repeated testing, repeated use, and repeated tasting. I suspect this is also part of large-model stickiness, or sunk cost. It is also why I said earlier that a GPT-5.5 with unstable highs and lows would be so frustrating: it deviates from expectation. Persistent memory is, of course, also an indispensable part of the loop.

Afterword

I keep feeling that today’s large models are like computer vision ten years ago: hyped to a height that does not quite belong to them, like the last capital carnival of an economic downturn. As someone who, in a certain sense, benefits from AI and works in a related field, I feel conflicted. I do not know when the sugar coating will melt away, or when the shell underneath will land.

We are both the cause and the result of this AI wave.