PLUG-4678: Microsoft Defender: Split Device import into segments #80
jame2O wants to merge 2 commits into
@claude review once
Seems like a clever way of dealing with the limitations. Is there genuinely no way of passing back a paging variable in LCP? In a plugin I'd probably order by deviceId, take 100, and then on subsequent runs filter for deviceIds greater than the last deviceId I'd seen.
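For reference, the paging approach described in this comment (order by deviceId, take a fixed page, then filter on the last ID seen) could look roughly like the sketch below. It works on an in-memory list and is not LCP or plugin code; `fetch_page` and `iterate_devices` are hypothetical names used only for illustration.

```python
from typing import Dict, Iterator, List, Optional

PAGE_SIZE = 100  # "take 100" from the comment above


def fetch_page(devices: List[Dict[str, str]],
               last_seen_id: Optional[str],
               page_size: int = PAGE_SIZE) -> List[Dict[str, str]]:
    """Return the next page of devices, ordered by deviceId, after last_seen_id."""
    ordered = sorted(devices, key=lambda d: d["deviceId"])
    if last_seen_id is not None:
        # Only keep devices strictly greater than the last ID already seen.
        ordered = [d for d in ordered if d["deviceId"] > last_seen_id]
    return ordered[:page_size]


def iterate_devices(devices: List[Dict[str, str]],
                    page_size: int = PAGE_SIZE) -> Iterator[List[Dict[str, str]]]:
    """Yield pages until no devices remain, carrying the cursor between runs."""
    last_seen_id: Optional[str] = None
    while True:
        page = fetch_page(devices, last_seen_id, page_size)
        if not page:
            return
        yield page
        last_seen_id = page[-1]["deviceId"]
```

The catch raised in the question is that something has to carry `last_seen_id` from one run to the next, which is the paging variable being asked about.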
AnEvilPenguin left a comment
The other thing we need to be careful of here is that the '6MB limit' is a theoretical limit. On top of that, the platform needs some space to add in data, and AWS adds its own things in too. In plugins we would normally limit things to 3-3.5MB, as this is a much more realistic limit. Usually I would lean towards 3MB to give some weasel room, but I think the most I was able to get out with a CSV payload was around 4MB.
I understand. I'm not sure there's much we can do here, though, as we have to play by the LCP framework. Still, since we run 4 steps, a customer would need device data exceeding ~12-16MB to start causing issues, so I think it's fine?
That's annoying, but fair enough. If there's no ability to actually manipulate the data then this seems like a reasonable workaround.
As long as the calculations didn't rely on getting a full 6MB out, that's cool. I don't know how much data is actually getting thrown around here and how much weasel room you gave yourself. Worst case scenario you just throw more steps at it and accept that any imports going on whilst the change is deployed (would be super unlucky for this though) would be a bit dodgy.
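The figures in this thread follow from multiplying the number of import steps by the per-step payload limit. A rough sketch of that arithmetic, assuming data splits evenly across steps (which hashing only approximates); the function names are made up for illustration:

```python
import math


def max_total_mb(steps: int, per_step_limit_mb: float) -> float:
    """Total data the import can carry if every step is filled to the limit."""
    return steps * per_step_limit_mb


def steps_needed(total_mb: float, per_step_limit_mb: float) -> int:
    """Smallest number of steps that keeps each (evenly split) step under the limit."""
    return max(1, math.ceil(total_mb / per_step_limit_mb))


# 4 steps at the realistic 3-4MB limits give the ~12-16MB quoted above;
# 4 steps at the theoretical 6MB limit give ~24MB.
realistic_low = max_total_mb(4, 3.0)
realistic_high = max_total_mb(4, 4.0)
theoretical = max_total_mb(4, 6.0)
```

Under the same assumption, `steps_needed(14, 3.0)` gives 5, which is the "throw more steps at it" fallback expressed as a number.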
vinbab left a comment
@jame2O can you confirm (did you test?) that existing plugin instances will continue to work as normal, i.e. that this does not change the IDs of objects already indexed, so existing indexed objects remain unchanged as far as the user can see?
Yes, correct.
This feels like something we should potentially solve in the framework; this is a smart way around it for now, though. Not sure how that would look: maybe some way to batch the import steps and generate a variable to send to the import based on the batch number. Similarly, a way to paginate a data stream without relying on the response/headers, e.g. using a script that can do time-interval batching. One to consider, @clarkd.
📋 Summary
hash() function: no devices should be dropped. The only risk is running into the same issue if a single bucket exceeds 6MB, but since device IDs are split mostly evenly, this is unlikely (it would require ~24MB of data).
Example:
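As an illustration only (not the PR's actual query), the bucketing idea can be sketched as a stable hash of the device ID modulo the number of segments. CRC32 stands in here for whatever `hash()` the plugin really uses, and `SEGMENTS = 4` is taken from the "4 steps" mentioned in the conversation; `bucket_for` and `split_into_segments` are hypothetical names.

```python
import zlib
from typing import Iterable, List

SEGMENTS = 4  # assumption: one bucket per import step


def bucket_for(device_id: str, segments: int = SEGMENTS) -> int:
    """Map a device ID to a bucket index using a stable (unsalted) hash."""
    return zlib.crc32(device_id.encode("utf-8")) % segments


def split_into_segments(device_ids: Iterable[str],
                        segments: int = SEGMENTS) -> List[List[str]]:
    """Assign every device ID to exactly one bucket, so none are dropped."""
    buckets: List[List[str]] = [[] for _ in range(segments)]
    for device_id in device_ids:
        buckets[bucket_for(device_id, segments)].append(device_id)
    return buckets
```

Because each ID always hashes to the same bucket, every device is imported by exactly one step. How evenly the buckets fill depends on the hash and the IDs, so the even split is an expectation, not a guarantee.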

🧩 Plugin details
Does this PR introduce any breaking changes?
📚 Documentation
✅ Checklist