Vadim Smirnov

Forum Replies Created

Viewing 15 posts - 451 through 465 (of 1,476 total)
  • Vadim Smirnov
    Keymaster

      Hi!

      There is a sample https://github.com/wiresock/ndisapi/tree/master/examples/cpp/socksify which redirects a specified local application to a local TCP proxy and then to the specified SOCKS proxy. If I understood you correctly, this is what you are doing in your application. I have also used a similar approach in a couple of commercial projects, and I can confirm that it works just fine.

      To figure out what is going wrong in your case, I would capture and save the traffic for analysis. Maybe the packet you modified has an incorrect checksum or length and is therefore dropped by the stack.
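
A quick way to test the checksum/length hypothesis on a captured packet is to recompute the IPv4 header checksum and compare the total-length field with the number of bytes actually captured. The following is only a minimal sketch: `internet_checksum` and `ipv4_header_looks_valid` are hypothetical helper names, not part of ndisapi, and the pointer is assumed to point at the start of the IPv4 header (after the Ethernet header).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One's-complement Internet checksum (RFC 1071) over a byte buffer.
inline std::uint16_t internet_checksum(const std::uint8_t* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i + 1 < len; i += 2)
        sum += (static_cast<std::uint32_t>(data[i]) << 8) | data[i + 1];
    if (len & 1)  // odd trailing byte is padded with zero
        sum += static_cast<std::uint32_t>(data[len - 1]) << 8;
    while (sum >> 16)  // fold carries back into the low 16 bits
        sum = (sum & 0xFFFF) + (sum >> 16);
    return static_cast<std::uint16_t>(~sum);
}

// Returns true when the header checksum verifies and the total-length
// field fits inside the bytes that were actually captured.
inline bool ipv4_header_looks_valid(const std::uint8_t* ip, std::size_t captured_len) {
    if (captured_len < 20 || (ip[0] >> 4) != 4) return false;
    const std::size_t ihl = (ip[0] & 0x0F) * 4u;  // header length in bytes
    if (ihl < 20 || ihl > captured_len) return false;
    const std::size_t total_len = (static_cast<std::size_t>(ip[2]) << 8) | ip[3];
    if (total_len < ihl || total_len > captured_len) return false;
    // Summing a valid header including its checksum field yields zero.
    return internet_checksum(ip, ihl) == 0;
}
```

Keep in mind that when checksum offload is enabled, outbound packets captured before they reach the NIC may legitimately carry an unfilled checksum, so a mismatch there is not necessarily the cause of the drop.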

      in reply to: WindowsPacketFilter/Tools/ebridge not working #11774
      Vadim Smirnov
      Keymaster

        Unfortunately, I don’t have TB adapters to test with, but TB probably differs somehow from ‘normal’ Ethernet. Technically it is an emulation of 802.3 media over the TB bus, so I would not be surprised if a TB adapter simply ignores any network packets carrying a MAC address from another TB adapter.

        To some approximation, this could be similar to the situation where the bridge works between wired Ethernet and WiFi, where I had to translate wired Ethernet MAC addresses to WiFi and vice versa so that packets from the wired segment would not be rejected by the access point.

        But these are just rough ideas based on my previous experience; I don’t have the relevant hardware to test with.

        Vadim Smirnov
        Keymaster

          P.S. BTW, if you don’t need the SMB traffic to be processed in user mode, you could load a filter into the driver to pass it through without redirection.

          Vadim Smirnov
          Keymaster

            I think 3 threads are good to go:

            1. ReadPackets thread, which forms re-injection lists, signals the re-inject threads and waits for the re-injection to complete or, even better, proceeds to read using a secondary set of buffers
            2. SendPacketsToMstcp thread, which waits for the ReadPackets signal, re-injects, notifies the ReadPackets thread and returns to waiting
            3. SendPacketsToAdapter thread, which waits for the ReadPackets signal, re-injects, notifies the ReadPackets thread and returns to waiting
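
The three-thread layout above can be sketched with standard C++ threads and two blocking queues. This is an illustration only and does not call the driver: `run_pipeline`, `BatchQueue` and `Tagged` are hypothetical names, the reads are simulated by a pre-filled vector, and the two injection steps are plain callbacks standing in for SendPacketsToMstcp and SendPacketsToAdapter. It follows the "proceeds to read" variant, where the reader hands off each list and continues without waiting.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

using Packet = std::vector<std::uint8_t>;
using Batch  = std::vector<Packet>;
using Inject = std::function<void(const Batch&)>;

// A packet from the (simulated) read step, tagged with its direction.
struct Tagged {
    Packet data;
    bool   to_stack;  // true: indicate up to the stack; false: send to the adapter
};

// Blocking queue that hands re-injection lists from the reader to one injector.
class BatchQueue {
public:
    void push(Batch b) {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(!closed_);  // pushing after close() is a programming error
            q_.push(std::move(b));
        }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Returns the next batch, or std::nullopt once closed and drained.
    std::optional<Batch> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return std::nullopt;
        Batch b = std::move(q_.front());
        q_.pop();
        return b;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Batch> q_;
    bool closed_ = false;
};

void run_pipeline(const std::vector<std::vector<Tagged>>& reads,
                  const Inject& inject_to_mstcp,
                  const Inject& inject_to_adapter) {
    BatchQueue mstcp_q, adapter_q;
    // Each injector thread waits for a list, re-injects it, then waits again.
    auto injector = [](BatchQueue& q, const Inject& inject) {
        while (auto batch = q.pop()) inject(*batch);
    };
    std::thread mstcp_thread(injector, std::ref(mstcp_q), std::cref(inject_to_mstcp));
    std::thread adapter_thread(injector, std::ref(adapter_q), std::cref(inject_to_adapter));
    // Reader thread: splits every read into two per-direction lists.
    std::thread reader([&] {
        for (const auto& read : reads) {
            Batch up, down;
            for (const auto& p : read) (p.to_stack ? up : down).push_back(p.data);
            if (!up.empty())   mstcp_q.push(std::move(up));
            if (!down.empty()) adapter_q.push(std::move(down));
        }
        mstcp_q.close();
        adapter_q.close();
    });
    reader.join();
    mstcp_thread.join();
    adapter_thread.join();
}
```

Because each direction has exactly one injector thread consuming a FIFO queue, packet order within a direction is preserved.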
            Vadim Smirnov
            Keymaster

              Here is the CPU breakdown of SMB download:

              Function Name                    Total CPU [unit, %]   Self CPU [unit, %]   Module         Category
              CNdisApi::SendPacketsToMstcp     2858 (56.58%)         3 (0.06%)            dnstrace.exe   IO | Kernel
              CNdisApi::SendPacketsToAdapter   1495 (29.60%)         2 (0.04%)            dnstrace.exe   IO | Kernel
              CNdisApi::ReadPackets            349 (6.91%)           6 (0.12%)            dnstrace.exe   IO | Kernel

              As you may notice, splitting reading and re-injection does not make much sense, but splitting SendPacketsToMstcp and SendPacketsToAdapter across two threads will definitely have an effect.

              I can’t see how the OSR post is related; the author’s problem is about repackaging packets due to the reduced MTU.

              Vadim Smirnov
              Keymaster

                This is the result on a i7-2600 @ 3.4GHz Win10x64:

                100MB/s -> 40MB/s

                CPU: 52%
                Memory: 30%
                Disk: 10%

                The test system was a receiver, right?

                In my test above I was sending the file from the test system. When I changed the direction, I experienced a more noticeable throughput degradation.

                What is important here is that in both cases it was the maximum performance achievable by the single-threaded dnstrace application (Resource Monitor showed 25% CPU load over 4 vCPUs). This is the bottleneck… Inbound packet injection is more expensive than outbound, and this explains the throughput difference between inbound and outbound traffic that I see on the i3-3217U. On the other hand, the single-threaded performance of the Ryzen 7 4800H is good enough to avoid any throughput degradation at all, regardless of the traffic direction.

                It is worth noting that Fast I/O won’t be of much help here; it was primarily designed for a customer who uses the driver in a trading platform and needed the fastest possible way to fetch packets from the network to the application, bypassing the Windows TCP/IP stack.

                The first idea to consider is to improve dnstrace performance by splitting its operations across two threads, e.g. one thread to read packets from the driver and a second thread to re-inject them.

                I also think some optimization is possible for the packet re-injection itself, e.g. scaling packet re-injection over all available vCPUs in the kernel. However, it is not as easy as it sounds: breaking packet order within a TCP connection may result in retransmits and other undesired behavior. So adding Fast I/O for re-injection may be a better choice (currently packets are re-injected in the context of dnstrace; with Fast I/O they would be re-injected from a kernel thread).
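
One common way to scale work over several workers without reordering a TCP connection is to pick the worker from a hash of the connection's addresses and ports, so that all packets of one flow direction land on the same worker. The sketch below illustrates that general idea only; it is not how the driver works, and `FlowKey` and `worker_for_flow` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Addresses and ports identifying one direction of a TCP connection.
struct FlowKey {
    std::uint32_t src_ip, dst_ip;
    std::uint16_t src_port, dst_port;
};

// Maps a flow to a worker index with 64-bit FNV-1a; any stable hash would do.
inline std::size_t worker_for_flow(const FlowKey& k, std::size_t workers) {
    assert(workers > 0);
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a offset basis
    auto mix = [&h](std::uint64_t v, int bytes) {
        for (int i = 0; i < bytes; ++i) {
            h ^= (v >> (8 * i)) & 0xFF;
            h *= 0x100000001b3ULL;            // FNV-1a prime
        }
    };
    mix(k.src_ip, 4);
    mix(k.dst_ip, 4);
    mix(k.src_port, 2);
    mix(k.dst_port, 2);
    return static_cast<std::size_t>(h % workers);
}
```

Every packet with the same key maps to the same index, so each worker sees the packets of a given flow in their original order; the cost is that a single heavy flow (such as one SMB copy) still runs on one worker.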

                Vadim Smirnov
                Keymaster

                  P.S. As an example, when I tested the same machine but with the target file located on the HDD (in the screenshot above the file is on the SSD), I got about 3x-4x lower throughput with 100% HDD load.

                  Vadim Smirnov
                  Keymaster

                    I have tested an 8-year-old Core i3-3217U (I don’t have anything slower with Windows installed), sending a file to another machine over SMB with and without dnstrace running. Here are the results:

                    Core i3-3217U Test results

                    You can notice some slowdown (8-9%), but it is nowhere near the 50% throughput reduction you reported. What was the bottleneck in your tests?

                    Vadim Smirnov
                    Keymaster

                      Yes, sorry, it is my fault… Saturday evening 😉… In that case the traffic passed over the virtual network. Here is the test over the cable:

                      Test over the cable

                      You can notice some bandwidth degradation (900 Mbps vs 976 Mbps without filtering, i.e. roughly 8%) and extra CPU load.

                      Vadim Smirnov
                      Keymaster

                        Some samples use Fast I/O, others don’t, but it is very easy to switch a sample between the fast and the old model by changing one line of code:

                        For the Fast I/O:

                        auto ndis_api = std::make_unique<ndisapi::fastio_packet_filter>(

                        For the ordinary I/O:

                        auto ndis_api = std::make_unique<ndisapi::simple_packet_filter>(

                        And yes, Fast I/O does not support WOW64…

                        the cpu I’m testing with is core i7 5500

                        It is fast enough; in the post we discussed above I tested much older models. But you mentioned that you use a VM, while I tested on real hardware over a real 1 Gbps cable.

                        Vadim Smirnov
                        Keymaster

                          I guess one reason could be because of how powerful the underlying CPU is.

                          It is easy to verify: just start Task Manager (or Resource Monitor) while you copy the file and check the CPU load with and without dnstrace running. If your CPU peaks even without dnstrace, then it is no wonder that you get throughput degradation when you add extra work…

                          I want to measure how much overhead this project has if we use it to send every packet to user to check (block or not), and then send those that are OK based on user mode decision. So this dnstrace seems to do exactly this right?

                          Yes, it filters (takes from the kernel to user space and then re-injects back into the kernel) all packets passing through the specified network interface, selects DNS responses, decodes and dumps them.
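
To give an idea of what "selects DNS responses" can look like on a raw frame, here is a minimal sketch. It is not the actual dnstrace code: `is_dns_response` and `be16` are hypothetical helpers, and the sketch assumes an untagged Ethernet II frame carrying IPv4 and UDP with source port 53 (no IPv6, no VLAN tags, no DNS over TCP).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads a 16-bit big-endian (network order) value.
inline std::uint16_t be16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// True for an Ethernet II / IPv4 / UDP frame sent from port 53 whose
// DNS header has the QR (response) bit set.
inline bool is_dns_response(const std::uint8_t* frame, std::size_t len) {
    assert(frame != nullptr || len == 0);
    constexpr std::size_t eth_len = 14, udp_len = 8, dns_hdr_len = 12;
    if (len < eth_len + 20) return false;
    if (be16(frame + 12) != 0x0800) return false;       // EtherType: IPv4
    const std::uint8_t* ip = frame + eth_len;
    if ((ip[0] >> 4) != 4 || ip[9] != 17) return false; // version 4, protocol UDP
    const std::size_t ihl = (ip[0] & 0x0F) * 4u;
    if (ihl < 20 || len < eth_len + ihl + udp_len + dns_hdr_len) return false;
    if ((be16(ip + 6) & 0x1FFF) != 0) return false;     // skip non-first fragments
    const std::uint8_t* udp = ip + ihl;
    if (be16(udp) != 53) return false;                  // UDP source port
    const std::uint8_t* dns = udp + udp_len;
    return (dns[2] & 0x80) != 0;                        // QR bit: 1 = response
}
```

All other packets would simply be re-injected unchanged, which is why the sample is a reasonable way to measure the pure filtering overhead.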

                          Can you also try to use the code that was shared on the blog that you mentioned to see if you still don’t get any reduced performance?

                          Well, dnstrace is good enough to test with. Also, if you would like to test with the Fast I/O option, you can take sni_inspector. It also filters all the traffic for the selected interface, but instead of DNS responses it selects and dumps the SNI field from the TLS handshake.

                          Vadim Smirnov
                          Keymaster

                            The only idea I have is that you have some other third-party software (which includes an NDIS/WFP filter) that somehow causes a software conflict… Try to set up two fresh Windows installations connected over a switch or a direct cable.

                            P.S. To be sure, I retested with the copy direction reversed, and the result is still the same…

                            Vadim Smirnov
                            Keymaster

                              I’ve just taken the dnstrace sample (it dumps DNS packets and passes everything else without any special handling) and tried to copy one large file from the system running dnstrace to another one. Here is the result with and without dnstrace running:

                              SMB Test.

                              Vadim Smirnov
                              Keymaster

                                The sample code you tested was designed for demo purposes only. If you are interested in performance tests, you can check this post.

                                Vadim Smirnov
                                Keymaster

                                  The capture tool was primarily designed for testing/debugging purposes, and it uses relatively slow file stream I/O, i.e. each intercepted packet is delayed for the time needed to write it to the file, resulting in increased latency and decreased bandwidth.

                                  If you need a high-speed traffic capture solution, you have to implement in-memory packet caching and write captured packets to the file using a dedicated thread instead of doing this in the packet filtering one. Or you could use a memory-mapped file and let the Windows cache manager do the rest 🤔
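
The first option (in-memory caching plus a dedicated writer thread) could look roughly like the sketch below. It is an illustration under stated assumptions, not the capture tool's code: `AsyncCaptureWriter` is a hypothetical class, the on-disk format is a made-up length-prefixed record rather than pcap, and the queue is unbounded, so a real implementation would need a memory limit.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

class AsyncCaptureWriter {
public:
    explicit AsyncCaptureWriter(const std::string& path)
        : out_(path, std::ios::binary | std::ios::trunc),
          worker_([this] { run(); }) {}

    ~AsyncCaptureWriter() { close(); }

    // Called from the packet filtering thread: copies into memory only,
    // so filtering never waits for file I/O.
    void enqueue(std::vector<std::uint8_t> packet) {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(!done_);  // enqueue after close() is a programming error
            pending_.push_back(std::move(packet));
        }
        cv_.notify_one();
    }

    // Drains everything still queued, flushes the file and stops the writer.
    void close() {
        { std::lock_guard<std::mutex> lock(m_); done_ = true; }
        cv_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

    // Meaningful only after close(), when the writer thread has finished.
    std::uint64_t bytes_written() const { return written_; }

private:
    void run() {
        std::deque<std::vector<std::uint8_t>> batch;
        for (;;) {
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !pending_.empty(); });
                if (pending_.empty()) break;  // closed and fully drained
                batch.swap(pending_);         // take everything queued so far
            }
            for (const auto& p : batch) {     // slow file I/O happens unlocked
                const std::uint32_t len = static_cast<std::uint32_t>(p.size());
                out_.write(reinterpret_cast<const char*>(&len), sizeof len);
                out_.write(reinterpret_cast<const char*>(p.data()),
                           static_cast<std::streamsize>(p.size()));
                written_ += sizeof len + p.size();
            }
            batch.clear();
        }
        out_.flush();
    }

    std::ofstream out_;
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::vector<std::uint8_t>> pending_;
    bool done_ = false;
    std::uint64_t written_ = 0;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

The filtering thread would call `enqueue` with a copy of each intercepted packet and re-inject the original immediately; swapping the whole queue under the lock keeps the time the filter thread can be blocked very short.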

                                  P.S. And yes, Fast I/O may improve the performance even further…
