
PLUG-4678: Microsoft Defender: Split Device import into segments #80

Open
jame2O wants to merge 2 commits into main from work/jd/PLUG-4678

Conversation

@jame2O jame2O commented Jul 2, 2026

Contributor

📋 Summary

  • This PR splits the Defender device import into four separate steps to fix payloads that exceed the 6MB limit (Davies Group imports 6000+ devices). We fetch devices through the Advanced Hunting API, so no pagination is supported.
  • The proposal is to split devices into four buckets based on the hash() function. No devices should be dropped. The only risk is hitting the same issue again if a single bucket exceeds 6MB, but since device IDs are split mostly evenly, this is unlikely (it would require ~24MB of data in total).
  • Testing in our environment gave a fairly even split, and this should hold in a larger environment too.

Example: [image]
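
For illustration, here is a minimal Python sketch of the bucketing idea described above. It is not the query from this PR: the PR relies on the `hash()` function in the device query, while this sketch swaps in `hashlib.sha256` as a stable stand-in. The names `bucket_for` and `split_into_segments` and the generated device IDs are hypothetical.

```python
import hashlib

BUCKETS = 4  # the PR splits the import into four steps


def bucket_for(device_id: str, buckets: int = BUCKETS) -> int:
    """Map a device ID to a bucket number in [0, buckets)."""
    # Assumption: a stable hash stands in for the query-side hash() function.
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def split_into_segments(device_ids, buckets: int = BUCKETS):
    """Assign every device ID to exactly one segment."""
    segments = [[] for _ in range(buckets)]
    for device_id in device_ids:
        segments[bucket_for(device_id, buckets)].append(device_id)
    return segments


# Hypothetical device IDs, sized like the 6000+ device case in the summary.
device_ids = [f"device-{i:05d}" for i in range(6000)]
segments = split_into_segments(device_ids)
sizes = [len(segment) for segment in segments]

# Each step would import only its own segment; together they cover every device.
print(sizes, sum(sizes))
```

Because each ID maps to exactly one bucket, the segments form a partition: nothing is dropped and nothing is imported twice, as long as every step uses the same hash and the same bucket count.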


🧩 Plugin details

  • Plugin name:
  • Type of change:
    • Bug fix
    • New datastream
    • Enhancement to existing datastream
    • Performance improvement
    • Documentation / metadata / logo
    • Other (please describe):

⚠️ Breaking changes

Does this PR introduce any breaking changes?

  • No
  • Yes (please describe):

📚 Documentation

  • Documentation updated
  • No documentation changes needed

✅ Checklist

  • No secrets or credentials included
  • Plugin, datastream and UI naming follow SquaredUp guidelines
  • I agree to the Code of Conduct

@jame2O jame2O requested review from a team, AnEvilPenguin and vinbab July 2, 2026 11:41
coderabbitai Bot commented Jul 2, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d74b443d-1868-417d-801a-e4953a71bbc7

📥 Commits

Reviewing files that changed from the base of the PR and between 05aeeab and 470cf36.

📒 Files selected for processing (3)
  • plugins/MicrosoftDefender/v1/dataStreams/listDevices.json
  • plugins/MicrosoftDefender/v1/indexDefinitions/default.json
  • plugins/MicrosoftDefender/v1/metadata.json
👮 Files not reviewed due to content moderation or server errors (3)
  • plugins/MicrosoftDefender/v1/indexDefinitions/default.json
  • plugins/MicrosoftDefender/v1/metadata.json
  • plugins/MicrosoftDefender/v1/dataStreams/listDevices.json

📝 Walkthrough

[!WARNING]

Walkthrough skipped

File diffs could not be summarized.


Comment @coderabbitai help to get the list of available commands.

github-actions Bot commented Jul 2, 2026


🧩 Plugin PR Summary

📦 Modified Plugins

  • plugins/MicrosoftDefender/v1

📋 Results

| Step | Status |
| --- | --- |
| Validation | ✅ Passed |
| Deployment | 🚀 Deployed |

🔍 Validation Details

microsoft-defender
{
  "valid": true,
  "pluginName": "microsoft-defender",
  "pluginType": "hybrid",
  "summary": {
    "Data Streams": 12,
    "Import Definitions": 1,
    "UI Configuration": true,
    "Has Icon": true,
    "Has Default Content": true,
    "Config Validation": true,
    "Custom Types": true
  }
}

jame2O commented Jul 2, 2026

Contributor Author

@claude review once

AnEvilPenguin commented

Seems like a clever way of dealing with the limitations.

Is there genuinely no way of passing back a paging variable in LCP? In a plugin I'd probably order by deviceId, take 100 and then on subsequent runs filter by deviceIds greater than whatever the last deviceId I'd seen was.
Or is this one of those weird dialects of KQL that doesn't support things like that?
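
The paging approach suggested above (order by deviceId, take a fixed page, then filter for IDs greater than the last one seen) is often called keyset pagination. A minimal Python sketch of that loop follows, assuming an in-memory list in place of the real query. `fetch_page` and `fetch_all` are hypothetical helpers, and whether LCP can carry the `after` value between runs is exactly the open question in this comment.

```python
from typing import List, Optional


def fetch_page(all_devices: List[str], after: Optional[str], page_size: int = 100) -> List[str]:
    """Stand-in for one query: filter by deviceId > after, order by deviceId, take page_size."""
    candidates = sorted(d for d in all_devices if after is None or d > after)
    return candidates[:page_size]


def fetch_all(all_devices: List[str], page_size: int = 100) -> List[str]:
    """Walk the whole data set one page at a time."""
    collected: List[str] = []
    last_seen: Optional[str] = None  # the paging variable carried between runs
    while True:
        page = fetch_page(all_devices, last_seen, page_size)
        if not page:
            break
        collected.extend(page)
        last_seen = page[-1]  # the next run starts strictly after this ID
    return collected


# Hypothetical device IDs for demonstration only.
devices = [f"device-{i:04d}" for i in range(250)]
result = fetch_all(devices, page_size=100)
print(len(result))
```

This only works if the framework can pass `last_seen` from one run into the next query, which is the limitation the hash-bucket split works around.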

AnEvilPenguin commented

The other thing we need to be careful of here is that the '6MB limit' is a theoretical limit. On top of that the platform needs some space to add in data, and AWS adds its own things in too.
In plugins we would normally limit things to 3-3.5 MB, as this is a much more realistic limit. Usually I would lean towards 3MB to give some weasel room, but I think the most I was able to get out with a CSV payload was around 4MB.

jame2O commented Jul 3, 2026

Contributor Author

> The other thing we need to be careful of here is that the '6MB limit' is a theoretical limit. On top of that the platform needs some space to add in data, and AWS adds its own things in too. In plugins we would normally limit things to 3-3.5 MB, as this is a much more realistic limit. Usually I would lean towards 3MB to give some weasel room, but I think the most I was able to get out with a CSV payload was around 4MB.

I understand. I'm not sure there's much we can do here though, as we have to play by the LCP framework. Still, since we run 4 steps, a customer would need device data exceeding ~12-16MB to start causing issues, so I think it's fine?
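
The headroom estimate above can be checked with some quick arithmetic. This is a rough Python sketch, assuming the buckets are evenly sized apart from an optional `skew` factor. `capacity`, `steps_needed` and `skew` are hypothetical names, and the 3MB and 4MB budgets are the figures mentioned in this thread, not measured limits.

```python
import math

MB = 1024 * 1024


def capacity(steps: int, per_step_budget: int) -> int:
    """Total data the import can carry if every step stays within its budget."""
    return steps * per_step_budget


def steps_needed(total_bytes: int, per_step_budget: int, skew: float = 1.0) -> int:
    """Steps required for a given total size.

    skew > 1.0 models the largest bucket being that much bigger than average.
    """
    return math.ceil(total_bytes * skew / per_step_budget)


# Four steps at a realistic 3-4MB each give roughly 12-16MB of headroom.
print(capacity(4, 3 * MB) // MB, capacity(4, 4 * MB) // MB)  # 12 16

# With the conservative 3MB budget, 16MB of device data would already need 6 steps.
print(steps_needed(16 * MB, 3 * MB))  # 6
```

Under these assumptions, the lower end of the ~12-16MB range is the safer number to plan around, and adding more steps is the fallback if a customer grows past it.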

@AnEvilPenguin AnEvilPenguin left a comment

> > The other thing we need to be careful of here is that the '6MB limit' is a theoretical limit. On top of that the platform needs some space to add in data, and AWS adds its own things in too. In plugins we would normally limit things to 3-3.5 MB, as this is a much more realistic limit. Usually I would lean towards 3MB to give some weasel room, but I think the most I was able to get out with a CSV payload was around 4MB.
>
> I understand. I'm not sure there's much we can do here though, as we have to play by the LCP framework. Still, since we run 4 steps, a customer would need device data exceeding ~12-16MB to start causing issues, so I think it's fine?

That's annoying, but fair enough. If there's no ability to actually manipulate the data then this seems like a reasonable workaround.

As long as the calculations didn't rely on getting a full 6MB out, that's cool. I don't know how much data is actually getting thrown around here and how much weasel room you gave yourself. Worst case scenario you just throw more steps at it and accept that any imports going on whilst the change is deployed (would be super unlucky for this though) would be a bit dodgy.

@vinbab vinbab left a comment

Contributor

@jame2O can you confirm (did you test) that existing plugin instance will continue to work as normal (it does not change the IDs of objects already indexed) so existing indexed objects remain unchanged as far as the user can see?

jame2O commented Jul 3, 2026

Contributor Author

> @jame2O can you confirm (did you test) that existing plugin instance will continue to work as normal (it does not change the IDs of objects already indexed) so existing indexed objects remain unchanged as far as the user can see?

yes, correct

andrewmumblebee commented Jul 3, 2026

Contributor

> I understand. I'm not sure there's much we can do here though, as we have to play by the LCP framework. Still, since we run 4 steps, a customer would need device data exceeding ~12-16MB to start causing issues, so I think it's fine?

This feels like something we should solve in the framework potentially, this is a smart way around it for now though.

Not sure how that would look, some way to batch the import steps & generate a variable to send to the import based on the batch number.

Similarly a way to paginate a data stream without relying on the response/headers, i.e. using a script that can do time interval batching, etc

One to consider @clarkd

@vinbab vinbab left a comment

Contributor

@jame2O Approved


4 participants