# Marcio Valerio Silva

I am a specialist in safety management, with experience in implementing and maintaining safety, health and environment management systems, in addition to risk management.

Location: Rio de Janeiro, Brazil

## Activity

• Marcio Valerio Silva made a comment

When we are plotting several related series so that we can compare the patterns in them, what are the strengths and the weaknesses of a plot that puts all of the series on the same graph?
The big advantage is that everything is on the same scale. The big disadvantage is that information from all countries with a much smaller number of arrivals are all...

• Marcio Valerio Silva made a comment

I agree that predictions can only work well if the historical relationships between variables continue to hold.

• Which better summarizes what you see in the data?
The multiplicative option.

For the additive plot, the residuals tend to be small in the middle and large toward the edges. Why do you think this is?
Because the swings are bigger when the trend is higher.

• When we have a multiplicative decomposition, how do we adjust the trend value to incorporate the seasonal effect at each time point?
We adjust by multiplying the trend by the seasonal swing.

What is the no-change value when we are making additive adjustments?
It is zero, the point where the additive seasonal effect crosses zero. Adding zero makes no change.

What is the no-change...
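The two kinds of adjustment described in this comment can be sketched in a few lines of Python. This is only an illustration: the function names and the numbers are hypothetical, not taken from the course. It also shows the arithmetic fact that multiplying by 1 leaves a value unchanged, just as adding 0 does.

```python
def additive_adjust(trend, seasonal_effect):
    """Additive model: the seasonal effect is added to the trend."""
    return trend + seasonal_effect


def multiplicative_adjust(trend, seasonal_factor):
    """Multiplicative model: the trend is multiplied by the seasonal factor."""
    return trend * seasonal_factor


trend = 200.0  # hypothetical trend value at one time point

# Adding 0 or multiplying by 1 leaves the trend unchanged.
print(additive_adjust(trend, 0.0))         # 200.0
print(multiplicative_adjust(trend, 1.0))   # 200.0

# A seasonal swing above the trend in each model.
print(additive_adjust(trend, 15.0))        # 215.0
print(multiplicative_adjust(trend, 1.25))  # 250.0
```

With a multiplicative adjustment the size of the swing grows with the trend, which matches the remark above that the swings are bigger when the trend is higher.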

• What is the basic idea behind an additive model (or additive seasonal decomposition)?
To find out seasonal swings, assuming that they are all the same apart from purely random differences.

What is the basic idea behind a multiplicative model (or multiplicative seasonal decomposition)?
To find out seasonal swings that are a lot smaller towards the...

• What is the basic idea behind an additive model (or additive seasonal decomposition)?
To find out seasonal swings, assuming that they are all the same apart from purely random differences.

Why do we want to find stable structures in our time series?
Because we need them for projecting into the future to form forecasts.

• Marcio Valerio Silva made a comment

Holes in a series can be filled with guesses; we then change these guesses and see how that affects the results (“sensitivity analysis”).
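A minimal sketch of this kind of sensitivity analysis, assuming a made-up quarterly series with one hole and using the mean as the result of interest (the data, the guesses and the `fill_hole` helper are all hypothetical):

```python
from statistics import mean

# Hypothetical series with one missing value (None).
series = [120, 135, None, 150, 160, 172]


def fill_hole(values, guess):
    """Return a copy of the series with every hole replaced by the guess."""
    return [guess if v is None else v for v in values]


# Try several guesses and see how much the result moves.
for guess in (130, 140, 150):
    filled = fill_hole(series, guess)
    print(guess, round(mean(filled), 2))
```

If the result barely changes across reasonable guesses, the hole matters little; if it changes a lot, conclusions that depend on it deserve caution.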

• Marcio Valerio Silva made a comment

What is time-series data?
It is data collected over time.

Why are people interested in time-series data?
Because it can help us understand the past and, even more, predict the future.

What is quarterly data?
It is data reported four times a year, covering periods of three months.

Why do people plot time-series data with points joined up by...

• Marcio Valerio Silva made a comment

It was amazing to test whether simply using randomly chosen groups can cause the same deviations as the factor being studied.

• VIT is very good at showing that an effect in experimental data can be produced by the randomisation alone.

• Marcio Valerio Silva made a comment

When we look at a plot of experimental data that compares two treatment groups, why is it not always just obvious which treatment is better? What question goes through our minds?
Because it could just be "the luck of the draw". The question is: "Do the effects seen in a randomisation experiment really demonstrate treatment differences?"

What is the basic idea...

• Marcio Valerio Silva made a comment

What is randomisation variation and why is it a problem?
It is the variation caused simply by using randomly chosen groups. The problem is that we can't state the true cause.

How can we reduce the levels of randomisation variation?
By having more observations in the samples.
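The idea can be tried out with a small simulation. The sketch below is an assumption-laden illustration, not the course's VIT tool: it takes made-up values with no treatment effect at all, splits them into two random groups many times, and records the difference between the group means each time.

```python
import random
from statistics import mean


def rerandomised_differences(values, group_size, repeats, seed=0):
    """Differences in group means produced by random allocation alone."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(repeats):
        shuffled = values[:]
        rng.shuffle(shuffled)
        group_a = shuffled[:group_size]
        group_b = shuffled[group_size:]
        diffs.append(mean(group_a) - mean(group_b))
    return diffs


def spread(diffs):
    """Average absolute size of the differences."""
    return mean(abs(d) for d in diffs)


# Hypothetical measurements; no treatment is applied to anyone.
rng = random.Random(1)
small = [rng.gauss(50, 10) for _ in range(10)]
large = [rng.gauss(50, 10) for _ in range(200)]

print(spread(rerandomised_differences(small, 5, 1000)))
print(spread(rerandomised_differences(large, 100, 1000)))
```

Under these assumptions the differences produced by chance alone are noticeably smaller with the larger groups.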

• Marcio Valerio Silva made a comment

The short readings are very good references.

• Marcio Valerio Silva made a comment

Why do we want to do randomised experiments? What is the point of them?
Because we need to be sure which factors are really causing the effect, without confounding. The true cause could be something else entirely.

What are the two elements we need to have in place to be able to have a fair test?
We have to intervene in what cause is confounding and we have to use a balanced...

• Marcio Valerio Silva made a comment

This statement is very clear:
"With 95% confidence, the population quantity for variable for the subgroup is somewhere between ci.lower and ci.upper."

• Marcio Valerio Silva made a comment

How is a “nest of trend curves” obtained for putting onto a scatterplot?
Bootstrap resamples are taken from the data, and for each resample, the same type of curve is computed and added to the graph.

How do we interpret such a “nest of trend curves”?
We shouldn't have any confidence in the fitted trend in regions where the curves are far apart.

• What uncertainties are being conveyed by the confidence intervals drawn around means on a graph?
Each mean is only known to lie within a plausible range, so any difference we read off between the means is uncertain too.

What does the overlap between confidence intervals drawn about means suggest visually?
The comparison might be difficult because the values may be...

• Marcio Valerio Silva made a comment

What are the two main ways of approaching the problem of obtaining confidence intervals?
Mathematical theory (e.g. normal distribution) and computer-intensive methods (e.g. bootstrap).

Why are methods based on mathematical theory the default methods in most packages for long-standing problems?
Because, historically, they came first due to a...

• Marcio Valerio Silva made a comment

What proportion of participants in the NHANES-1000 population do we expect to be classified as “Obese”?
The simulation indicated a value between 8% and 24.5%.

I am impressed by VIT. I could really see the confidence interval.

• Marcio Valerio Silva made a comment

What is the basic idea of how a bootstrap confidence interval is constructed to capture the true population value of some quantity (e.g. a mean, a median, a percentage, ..)?
The basic idea is to find the lower and upper limits of a confidence interval by calculating the same quantity for a large number of bootstrap resamples.

What do we do to find out...
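One common way to turn the bootstrap estimates into an interval is the percentile method: sort them and take the values that cut off 2.5% in each tail. The sketch below assumes that method (the comment above does not say which one the course uses), and the function name and sample data are hypothetical.

```python
import random
from statistics import mean


def bootstrap_interval(sample, statistic=mean, resamples=2000,
                       level=0.95, seed=0):
    """Percentile bootstrap interval for a statistic of one sample."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample has the same size as the sample and is drawn
    # WITH replacement, so values can repeat or be left out.
    estimates = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(resamples)
    )
    tail = (1 - level) / 2
    lower = estimates[int(round(tail * resamples))]
    upper = estimates[int(round((1 - tail) * resamples)) - 1]
    return lower, upper


# Hypothetical sample of measurements.
sample = [23.1, 25.4, 22.8, 30.2, 27.5, 24.9, 26.3, 28.8, 21.7, 29.4]
low, high = bootstrap_interval(sample)
print(round(low, 2), round(high, 2))
```

Passing `statistic=median` (from the `statistics` module) would give an interval for the median in the same way.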

• Marcio Valerio Silva made a comment

Looking at the new terms, we can expect good learning ahead.

• What is bootstrap re-sampling? How do we generate a bootstrap-resample?
It is a method for estimating the margins of error of a small sample. We generate a resample by sampling from the sample with replacement.

What would happen if we took our re-samples using the ordinary way of sampling (without replacement)?
The samples will be identical so there will be no...

• What is the most reliable way we know of obtaining data about populations without misleading biases? Why is this method not perfect?
Using random sampling. It's not perfect because there will always be an error due to sampling.

What happens whenever we use data from a sample to estimate a population quantity?
We might find errors, which get smaller as the...

• As the sample size and the number of repetitions increase, the error tends toward its minimum.

• The bigger the sample size, the smaller the sampling error.
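A small simulation can illustrate this. It is a sketch under stated assumptions: the population is made up (normally distributed values), and "sampling error" is measured as the average absolute gap between a sample mean and the population mean.

```python
import random
from statistics import mean


def typical_sampling_error(population, sample_size, repeats=500, seed=0):
    """Average absolute gap between the sample mean and the true mean."""
    rng = random.Random(seed)
    true_mean = mean(population)
    errors = [
        abs(mean(rng.sample(population, sample_size)) - true_mean)
        for _ in range(repeats)
    ]
    return mean(errors)


# Hypothetical population of 10,000 values.
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(10_000)]

for n in (10, 100, 1000):
    print(n, round(typical_sampling_error(population, n), 3))
```

The error shrinks as the sample grows, but it never reaches exactly zero unless the whole population is sampled.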

• Marcio Valerio Silva made a comment

What effect does sample size have on sampling error?
The bigger the sample, the smaller the sampling error.

For what two reasons are non-random selection mechanisms worse than random selection mechanisms?
Random selection minimizes the influence of bias, and it lets us get a good idea of how reliable the estimates are.

What were the 5 “take home messages” from this...

• Marcio Valerio Silva made a comment

Do the problems caused by bad measurement systems and biased selection mechanisms go away when we get huge amounts of data?
No, these problems don't go away as we get more data.

Do the problems caused by confounding go away when we get huge amounts of data?
No, the influence of confounders doesn't change with the amount of data.

Do the problems caused...

• Marcio Valerio Silva made a comment

What is a lurking variable?
It is an alternative name for a confounder, something that causes changes in both the outcome and the predictor variables.

We have methods for adjusting for confounders, so why can we still not reliably draw causal conclusions from observational data alone?
Because there is always the chance that effects we think we are seeing are...

• Marcio Valerio Silva made a comment

What is a confounder?
It is something that causes changes in both the outcome and the predictor of interest.

What is a lurking variable?
See confounder

How can we adjust for a lack of balance on a known confounder?
We must make comparisons within groups that have similar values of the confounder.
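A minimal sketch of such a within-group comparison, using made-up records in which treatment A was given mostly to one level of the confounder and treatment B mostly to the other (all names and numbers are hypothetical):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (confounder level, treatment, outcome).
records = [
    ("young", "A", 10), ("young", "A", 12), ("young", "A", 11),
    ("young", "B", 13),
    ("old", "A", 20),
    ("old", "B", 22), ("old", "B", 24), ("old", "B", 23),
]


def naive_difference(rows):
    """Mean outcome of B minus A, ignoring the confounder."""
    a = [outcome for _, treatment, outcome in rows if treatment == "A"]
    b = [outcome for _, treatment, outcome in rows if treatment == "B"]
    return mean(b) - mean(a)


def within_group_differences(rows):
    """Mean outcome of B minus A inside each level of the confounder.

    Assumes every level contains at least one A and one B record.
    """
    outcomes = defaultdict(lambda: defaultdict(list))
    for level, treatment, outcome in rows:
        outcomes[level][treatment].append(outcome)
    return {
        level: mean(by_treatment["B"]) - mean(by_treatment["A"])
        for level, by_treatment in outcomes.items()
    }


print(naive_difference(records))          # 7.25
print(within_group_differences(records))  # {'young': 2, 'old': 3}
```

In this toy data the unadjusted comparison suggests a much larger difference than either within-group comparison, because the groups are not balanced on the confounder.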

• Marcio Valerio Silva made a comment

When is a variable a cause of changes in the outcome?
When purposefully changing its value leads to a change in the pattern of outcomes.

What is an observational study?
It is a study in which the data result from observing conditions as they are in the world.

When do we have positive association between variables? Negative association?
Positive, when things tend to occur...

• Marcio Valerio Silva made a comment

We have learned some ways of dealing with bad data:
- checking back against original sources.
- setting suspicious data as missing.
- checking if values of each numeric variable lie within believable limits.
- checking if values of each categorical variable correspond to what is expected.
- looking for suspicious points in dot plots or scatter plots.
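Two of these checks (believable limits for numeric variables and expected values for categorical ones) can be sketched as below, with failing values set to missing. The variable names, limits and allowed categories are hypothetical and would have to come from knowledge of the real data.

```python
# Hypothetical records and rules for this illustration only.
records = [
    {"age": 34, "sex": "F"},
    {"age": 210, "sex": "M"},   # age outside believable limits
    {"age": 51, "sex": "X?"},   # unexpected category
]
NUMERIC_LIMITS = {"age": (0, 120)}
ALLOWED_CATEGORIES = {"sex": {"F", "M"}}


def clean(rows):
    """Return copies of the rows with suspicious values set to None."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy, so the original data stays untouched
        for name, (low, high) in NUMERIC_LIMITS.items():
            value = row.get(name)
            if value is not None and not (low <= value <= high):
                row[name] = None
        for name, allowed in ALLOWED_CATEGORIES.items():
            if row.get(name) not in allowed:
                row[name] = None
        cleaned.append(row)
    return cleaned


for row in clean(records):
    print(row)
```

Keeping the original records unchanged makes it possible to check flagged values back against the original sources later.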

• Marcio Valerio Silva made a comment

In terms of selection biases, there are important items to consider:
- Biases or errors are generally biggest in the “non-scientific” polls or surveys that do not use sampling.
- There are a host of things that can have a significant influence on the way people answer a question, such as information in the survey about why it is being done, differences in...

• Marcio Valerio Silva made a comment

We have discussed validity (measuring the right thing) and reliability (when you measure the same thing over and over again, you get pretty much the same answer).

• Marcio Valerio Silva made a comment

It's always good to learn new concepts.

• Marcio Valerio Silva made a comment

What are artefacts?
Artificial patterns caused by deficiencies in the data-collection process.

What are the two main ways that systematic biases get into data?
Bad measurement processes and biased selection processes.

Why can missing values cause biases?
Because the data that we have values for could show trends different from those whose values have...

• What is the first law of data analysis?
"Garbage in, garbage out"

Can sophisticated data analysis turn bad data into reliable conclusions?
No. If data is really bad, we should just walk away from it.

In terms of the patterns we see in data, what is the difference between facts and artefacts?
Facts are patterns that reflect the way things really are in...

• Marcio Valerio Silva made a comment

I have had some problems because English is not my native language. Sometimes I was confused by the meaning of variables used in the iNZight software. After all, I can say that I now understand the features of the software for showing data trends and relationships. I would like to highlight the "Overcoming perceptual problems" session. I became surprised with the...

• It was very nice to explore many possibilities for visualizing trends and other relationships.

• Marcio Valerio Silva made a comment

iNZight is a great tool for analyzing big data sets.

• Marcio Valerio Silva made a comment

What are we looking for when we colour by a (third) numeric variable?
How the pattern in the scatterplot behaves across separate ranges (groups) of the third numeric variable.

What are we looking for when we colour by a (third) categorical variable?
The spectrum ranging from completely separated to totally mixed up, related to a third variable.

What are...

• Marcio Valerio Silva made a comment

I chose the smoother because it fits the dataset better. Until 20, there is a positive slope. Then the weight stays stable until 60, after which it decreases slowly. In the smoother graph, the dotted lines show the spread of points between the 25th and 75th percentiles of the weight data.

• Marcio Valerio Silva made a comment

In large data sets, what is emphasized visually by a low transparency setting? by a high transparency setting?
In low transparency, the concentration of values is shown. In high transparency, the bulk of the data is clearly shown.

What are running quantiles and why are they useful for large data sets?
Running quantiles are curves drawn and labelled as...

• Marcio Valerio Silva made a comment

What is overprinting and why does it cause problems for us?
Overprinting is a situation where a second point is plotted directly on top of the first point plotted. It makes it difficult to see how many points are sitting at a given position.

What is jittering and how can the use of jittering help us?
Jittering is a way to add a little bit of random...

• Marcio Valerio Silva made a comment

It is important to understand that a strong correlation doesn't mean that changes in the predictor are actually causing changes in the outcome.

• Marcio Valerio Silva made a comment

We have learned new and important parameters of lines and curves. I am excited to move forward.

• Marcio Valerio Silva made a comment

A lot of new concepts... I think it's better to wait until they appear during the week.

• Marcio Valerio Silva made a comment

What shapes can be captured by each of linear, quadratic and cubic trend curves?
A line captures a trend that looks like a straight line. A quadratic captures a trend that looks like a single bend of a curve. A cubic captures two bends of the curve.

What advantage does a smoother have over quadratic or cubic curves?
They are more flexible and take on an even...

• Marcio Valerio Silva made a comment