Collected data-- alert or report when collected data is not being synced to server?

Matt Rogers
GfK Mediamark Research & Intelligence, LLC

We are on MobiControl Cloud 14.3.2.1171 using Samsung Tab S2 and Tab S4 Android tablets. I have a data collection rule set to gather available memory, battery percentage, location, signal strength, and a few of the other configurable items.

When troubleshooting user complaints, we go to see where the tablet has been, what the connectivity was like, etc., and we frequently find that collected data is missing for the previous week or longer, even though the device has been checking in during that time.

Our data collection rule is set to run every 20 minutes and, due to server activity, we have put the devices into different groups, each with its own schedule, to stagger the collections so we aren't running the rule on every device at the same time.

The device logs often reveal that the device is connecting and disconnecting frequently, so my theory is that the devices are seldom online when the rule actually runs, or that they disconnect before the rule completes.

We have tried to leverage the out of contact rule, but that only tells us whether the device connected at all, not whether it was persistently connected and able to do its routine housework. Ideally we could make use of an alert that tells us if there has been no collected data for n days, and run it only against devices that were also "active" (connected at all) during that same interval. Such a tool would allow us not only to proactively ensure we aren't missing any data, but also to know soon after the device goes dark that we need to take corrective action. We just ran into a situation where a device was lost or stolen, and after the fact we found that we hadn't had any location data for 10 days. This is just one example, of course.

I don't see where the current array of alerts or reports allows us to get this information. Can anyone suggest a way for us to do this?
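To make the check described above concrete, here is a minimal sketch of the logic in Python. It assumes that a last check-in time and a newest collected-data timestamp can be exported per device (for example from a report); the `DeviceStatus` fields and the `silent_but_active` function are hypothetical names for illustration, not part of MobiControl.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class DeviceStatus:
    # Hypothetical export format: one record per device.
    name: str
    last_check_in: Optional[datetime]        # last time the device connected at all
    last_collected_data: Optional[datetime]  # timestamp of the newest collected sample


def silent_but_active(devices: List[DeviceStatus], now: datetime, n_days: int) -> List[str]:
    """Names of devices that checked in within n_days but sent no collected data in that window."""
    cutoff = now - timedelta(days=n_days)
    flagged = []
    for d in devices:
        active = d.last_check_in is not None and d.last_check_in >= cutoff
        has_data = d.last_collected_data is not None and d.last_collected_data >= cutoff
        if active and not has_data:
            flagged.append(d.name)
    return flagged
```

A device that has not checked in at all during the window is deliberately left out, since the out of contact rule already covers that case.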

Many thanks,

Matt

6 years ago
Android
ANSWERS
Raymond Chan Diamond Contributor
6 years ago (edited 6 years ago)

What is the size of the data collection buffer you set in your data collection rule? If it is big enough, collected data will not be truncated because the buffer is full, and you should be able to get the data (especially the battery percentage level, which is not sensitive to RF signal level or connection status) as long as the device agent is active.

If the connection channel quality is poor, retransmission will be attempted and the collected data will not be lost except when truncation is needed because the DC buffer is full.

You probably have to analyse the previously captured collected data and the device log from your web console to see if either of the following causes the missing data problem:

1. the device agent gets pre-empted by the task manager or power-saving mode (causing no data collection activity at all)

2. the RSSI level of the Wifi and/or cellular connections is consistently low on problematic devices most of the time (causing frequent truncation of collected data in the data collection buffer)

In the short run, maybe you can do the following to see if the situation improves:

1. set a bigger data collection buffer size and truncation time threshold (disabling truncation by setting it to 0 may not work until your server is upgraded to v14.3.4+, as documented in bug-fix item MCMR-17149 of the release notes)

2. adjust the position and signal level of the Wifi access point

3. turn off unnecessary low-power mode

4. include an out-of-contact alert rule with shorter OOC time threshold to signal lost/stolen devices

Matt Rogers
6 years ago

Thanks for your reply, Raymond. Here are my settings for the data collection rule:

Unless I am missing something, I don't see where an out of contact alert will help me, as the devices check in consistently; sometimes a device just doesn't stay connected long enough for the on-device collected data to be copied up to the server.

We are using Android Plus enrollment, and the device agent is 13.7.3.1045. I sent the "Disable Doze" script to all devices in the hope that it would keep any battery-saving measures on the device from making the agent unavailable. Even so, I would have thought that the fact that the devices do check in means the agent is not dozing, or at least not dozing enough to cause this problem:

writeprivateprofstring DeviceFeature DisableDozeMode 1
apply featurecontrol

Of course I would like to prevent this loss of data collection from happening, but until the cause and fix can be determined I need a way to know if and when a device that should be sending collected data is not doing so. Then we can a) take action with the user to investigate and remediate so we are not missing data, and b) identify any patterns that may explain why this happens, whether it is specific to a particular device, configuration, or usage pattern.

Do you think there is a way to be notified that collected data is not being received? Can you think of any other reasons why this fails sometimes?

To answer your other questions:

1) I don't know how to tell from the logs whether power saving mode is preempting the data collection task on the device. Based on how our users utilize the devices (they do field survey work in different regions across the US), they are connected continuously to LTE when signal is available, but there are periods during the day when they may be in low or no signal areas. In those cases they are instructed to connect to their hotel or other wifi access point at least nightly, while on charger, so they can sync their survey data. The devices are to remain on and connected overnight so we can push updates and such, so data collection should at least be happening during these times if it isn't happening according to the schedule during the day.

2) Because the users connect to different sources during their day (different LTE towers as they move within that day's region, McDonald's, Starbucks, hotels, etc) I would think that even though signal strength might at times affect data collection the connection should be good enough most of the time (especially at night) that we should get several good data collections daily.

As always I appreciate your help.

Matt

Raymond Chan Diamond Contributor
6 years ago

The simplest analysis logic is as follows:

1. check if there are any missing samples for battery level every 20 minutes for an extended period of time (say 1 or 2 days)

2. check if the Wifi/Cellular RSSI samples are missing, across many of the 20-minute samples, while the corresponding battery level sample with the same timestamp is present

3. check from the device log tab of the web-console if the device gets connected to the MobiControl server at each scheduled update (every 2 hours by default if you have not changed it to another value)

4. check from the device log tab of the web-console if there are any data collection related entries within the same extended period as in (1)
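Steps 1 and 2 can be sketched in Python, assuming the collected-data timestamps for each item have been exported from the web console into lists of `datetime` values. The function names and the 5-minute tolerance are illustrative assumptions, not MobiControl features.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, interval=timedelta(minutes=20), tolerance=timedelta(minutes=5)):
    """Return (earlier, later) pairs where consecutive samples are
    further apart than interval + tolerance (step 1)."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > interval + tolerance]


def present_only_in_first(first, second):
    """Timestamps that have a sample in `first` (e.g. battery level)
    but none in `second` (e.g. RSSI) (step 2)."""
    return sorted(set(first) - set(second))
```

If `find_gaps` on the battery samples reports long gaps, the agent was probably not collecting at all; if battery samples are present but `present_only_in_first(battery, rssi)` is large, only the signal items are being lost.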

Matt Rogers
6 years ago

Hi Raymond, thanks for the quick reply.

1. check if there are any missing samples for battery level every 20 minutes for an extended period of time (say 1 or 2 days)

When the data fails to collect, it fails to collect for every value that we specify in the rule. If there is no battery percentage, there is no cellular signal strength, no RSSI data, no location, nothing...we get all we ask for, or we get none.

2. check if the Wifi/Cellular RSSI samples are missing, across many of the 20-minute samples, while the corresponding battery level sample with the same timestamp is present

See response to (1). Data collection is all or nothing when this happens.

3. check from the device log tab of the web-console if the device gets connected to the MobiControl server at each scheduled update (every 2 hours by default if you have not changed it to another value)

The device checks in but often disconnects very quickly, sometimes remaining connected for only a minute or two. I can't tell if it is connecting via cellular or wifi at each check in, only the last one recorded in the console-- you can see that it was connected via LTE for only 12 seconds, certainly not long enough to upload collected data:

4. check from the device log tab of the web-console if there are any data collection related entries within the same extended period as in (1)

There are no such entries.

If this were the only device with this issue I would suspect some sort of hardware problem, but this has happened to multiple devices at different times, in the hands of different users, in different geographic locations. I don't know if this is related or not, but the devices do seem to slowly lose available memory. I believe that may be due to an app we use that doesn't seem to do garbage collection very well, so memory is either leaking or fragmenting. We will sometimes see devices reporting less than 200 MB free RAM, though I know MobiControl only reports on RAM available for apps, not system memory (our 4 GB tablets only show that they have 3.2 GB total, suggesting that some is reserved). However, not every device with data collection failures seems to have low memory conditions reported at its last check-ins.

I would love it if there was some alerting mechanism to indicate that a rule was not successfully completed--it is one thing to know if the rule failed to fire but I want to know if the output was completed, too.

Thanks as always for your insights.

Matt

Raymond Chan Diamond Contributor
6 years ago

If there is a memory leak problem caused by any apps on a problematic device, it can be quite tricky to debug remotely.

If the problem comes from the device agent itself, then maybe you can first check:

- if there is any change in the collected data process if you reduce your data collection buffer size to, say, 8192KB rather than 999999KB. 8MB is actually very big if you just collect data at 20-minute intervals.

- if your problematic device is configured in "persistent" mode in the device's Advanced Settings->Connection Settings.

- if your problematic device tries to connect to the server every time you request a device check-in from the web-console

- if your problematic device reports any successful file-sync or data-collection related events in the device log tab of your web console if you initiate a "sync file now" action on the device from the web-console
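As a rough back-of-the-envelope check on why 8MB is large for a 20-minute interval, the arithmetic can be sketched as below. How the agent actually stores samples is not stated in this thread, so the number of collected items and the bytes per sample are purely hypothetical inputs.

```python
def samples_per_day(interval_minutes):
    """Number of collection runs in 24 hours at the given interval."""
    return (24 * 60) // interval_minutes


def days_until_full(buffer_kb, interval_minutes, items, bytes_per_sample):
    """Days of samples a buffer could hold.

    `items` and `bytes_per_sample` are assumptions for illustration;
    the real on-device record size is unknown here.
    """
    bytes_per_day = samples_per_day(interval_minutes) * items * bytes_per_sample
    return (buffer_kb * 1024) / bytes_per_day


# 20-minute interval gives 72 runs per day; 6 items and 64 bytes each are made-up values.
print(samples_per_day(20))
print(round(days_until_full(8192, 20, items=6, bytes_per_sample=64), 1))
```

Even if the assumed record size is off by an order of magnitude, an 8192KB buffer would still hold weeks of samples under these assumptions.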

Matt Rogers
6 years ago

Hi Raymond:

- if there is any change in the collected data process if you reduce your data collection buffer size to, say, 8192KB rather than 999999KB. 8MB is actually very big if you just collect data at 20-minute intervals.

I have changed the size from 99999 to 8192 and will keep an eye on it for changes in behavior.

- if your problematic device is configured in "persistent" mode in the device's Advanced Settings->Connection Settings.

We are using Persistent; our workflow requires it. We have very non-technical users in the field using these tablets to conduct consumer surveys, and we need to be able to remote to them at any time to monitor their process silently, as well as to access the system after hours when the device is not being used. If a persistent connection is causing problems we will need to reconsider some of our approach, but that will need to be a last resort.

- if your problematic device tries to connect to the server every time you request a device check-in from the web-console

If the device is showing as being online in the console we have not yet had a problem where it didn't check in upon request.

- if your problematic device reports any successful file-sync or data-collection related events in the device log tab of your web console if you initiate a "sync file now" action on the device from the web-console

I tried to "sync now" with several devices and did not see any errors directly attributable to this command but in reviewing logs from many devices I see quite a few entries that I cannot associate with a specific activity except that they seem to occur (when they do occur) on the 20 minute sync schedule:

For the specific device that led to the creation of this post, right around the time that we stopped getting data collection from this device there was an uptick in the number and type of these errors (I have filtered out all others, leaving just the errors), types 80, 23, and 22:

Thanks again Raymond for all your help,

Matt

Raymond Chan Diamond Contributor
6 years ago

Based on your reply on the data-collection settings, received/missing data pattern, communication mode, buffer size, etc., I guess the most likely cause of your problem is the communication link quality. There may be failing/problematic hardware or poor device position. A signal strength (LTE RSSI) of -127 seems a bit low to me, but I am not sure whether it is in the typical range for your country and your cellular operator. I suggest you check the range of values on similar device models with no data-collection problem, and see if the problematic devices have much lower signal strength values (on average) than those that work.
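The comparison suggested above could be sketched as follows, assuming the signal strength samples for each group of devices have been exported as plain lists of numbers. The function names are hypothetical, and the optional `ignore` argument is an assumption added so that placeholder readings can be excluded if some values turn out not to be real measurements.

```python
from statistics import mean


def mean_signal(samples, ignore=()):
    """Average signal strength, skipping any values listed in `ignore`.
    Returns None if no usable samples remain."""
    usable = [s for s in samples if s not in ignore]
    return mean(usable) if usable else None


def compare_groups(working, problematic, ignore=()):
    """Difference (problematic minus working) in average signal strength.
    A negative result means the problematic devices are weaker on average."""
    w = mean_signal(working, ignore)
    p = mean_signal(problematic, ignore)
    if w is None or p is None:
        return None
    return p - w
```

A clearly negative difference would support the link-quality theory; a difference near zero would point back toward the agent or the device configuration.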

For error types 80, 23, and 22, you will have to check with the SOTI support team to find out what they refer to.

Matt Rogers
6 years ago

Hi Raymond,

-127 for RSSI is what is always shown for signal strength if the device is actually connected via LTE. I think "-127" just shows what SOTI considers to be the floor for signal strength, and if the main connection is LTE then of course the wifi signal would be zero:

This is for a randomly selected device that has been connected via LTE for the last few hours, showing the corresponding collected data for LTE and RSSI for the same collection period.