Marcio Valerio Silva

I am a specialist in safety management, with experience in implementing and maintaining safety, health and environment management systems, in addition to risk management.

Location Rio de Janeiro, Brazil



  • When we are plotting several related series so that we can compare the patterns in them, what are the strengths and the weaknesses of a plot that puts all of the series on the same graph?
    The big advantage is that everything is on the same scale. The big disadvantage is that information from all countries with a much smaller number of arrivals are all...

  • I agree that predictions can only work well if the historical relationships between variables continue to hold.

  • Which better summarizes what you see in the data?
    The multiplicative option.

    For the additive plot, the residuals tend to be small in the middle and large toward the edges. Why do you think this is?
    Because the swings are bigger when the trend is higher.

  • When we have a multiplicative decomposition, how do we adjust the trend value to incorporate the seasonal effect at each time point?
    We adjust by multiplying the trend value by the seasonal effect.

    What is the no-change value when we are making additive adjustments?
    It is the point where the additive seasonal effect crosses zero. Adding zero makes no change.

    What is the no-change...
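A minimal Python sketch of the two kinds of adjustment, using made-up trend values and seasonal effects (all numbers and function names here are hypothetical). In the additive case the seasonal effect is added to the trend; in the multiplicative case the trend is multiplied by it. Adding 0 or multiplying by 1 leaves the trend unchanged.

```python
# Hypothetical trend values and seasonal effects for four quarters (made-up numbers).
trend = [100.0, 110.0, 120.0, 130.0]
additive_effects = [-5.0, 0.0, 10.0, -5.0]        # added to the trend
multiplicative_effects = [0.95, 1.0, 1.10, 0.95]  # multiplied by the trend


def adjust_additive(trend, effects):
    """Trend plus seasonal effect at each time point."""
    return [t + e for t, e in zip(trend, effects)]


def adjust_multiplicative(trend, effects):
    """Trend times seasonal effect at each time point."""
    return [t * e for t, e in zip(trend, effects)]


print(adjust_additive(trend, additive_effects))
print(adjust_multiplicative(trend, multiplicative_effects))
```

Note that the same multiplicative factor produces a bigger swing when the trend is higher, which is why this form suits series whose swings grow with the trend.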

  • What is the basic idea behind an additive model (or additive seasonal decomposition)?
    To find out seasonal swings, assuming that they are all the same apart from purely random differences.

    What is the basic idea behind a multiplicative model (or multiplicative seasonal decomposition)?
    To find out seasonal swings that are a lot smaller towards the...

  • What is the basic idea behind an additive model (or additive seasonal decomposition)?
    To find out seasonal swings, assuming that they are all the same apart from purely random differences.

    Why do we want to find stable structures in our time series?
    Because we need them for projecting into the future to form forecasts.

  • Holes in a series can be filled with guesses. We then change these guesses and see how that affects the results (“sensitivity analysis”).

  • What is time-series data?
    It is data collected over time.

    Why are people interested in time-series data?
    Because it can help us understand the past and, even more, predict the future.

    What is quarterly data?
    It is data reported four times a year, covering periods of three months.

    Why do people plot time-series data with points joined up by...

  • It was amazing to test whether simply using randomly chosen groups can cause the same deviations as the factor being studied.

  • VIT is very good at showing that an effect in experimental data can be produced by the randomisation alone.

  • When we look at a plot of experimental data that compares two treatment groups, why is it not always just obvious which treatment is better? What question goes through our minds?
    Because it could just be "the luck of the draw". The question is: "Do the effects seen in a randomised experiment really demonstrate treatment differences?".

    What is the basic idea...

  • What is randomisation variation and why is it a problem?
    It is the variation caused simply by using randomly chosen groups. The problem is that we can't be sure of the true cause of a difference between groups.

    How can we reduce the levels of randomisation variation?
    By having more observations in the samples.
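A small Python sketch of randomisation variation, assuming made-up measurements with no treatment applied at all (the function name and the numbers are hypothetical). The same values are split into two random groups many times; any difference between the group means is then due to the random split alone. The spread of those differences shrinks when the groups are larger.

```python
import random
from statistics import fmean, stdev


def randomisation_differences(values, n_repeats=1000, seed=0):
    """Split the same values into two random halves many times and
    record the difference in group means for each split."""
    rng = random.Random(seed)
    half = len(values) // 2
    diffs = []
    for _ in range(n_repeats):
        shuffled = values[:]
        rng.shuffle(shuffled)
        diffs.append(fmean(shuffled[:half]) - fmean(shuffled[half:]))
    return diffs


# Made-up data: no treatment effect exists in either data set.
rng = random.Random(1)
small = [rng.gauss(50, 10) for _ in range(20)]
large = [rng.gauss(50, 10) for _ in range(400)]

spread_small = stdev(randomisation_differences(small))
spread_large = stdev(randomisation_differences(large))
print(spread_small, spread_large)
```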

  • The short readings are very good references.

  • Why do we want to do randomised experiments? What is the point of them?
    Because we need to be sure which factors are really causing the effect. With confounding, the true cause could be some other way in which the groups differ.

    What are the two elements we need to have in place to be able to have a fair test?
    We have to intervene in what cause is confounding and we have to use a balanced...

  • This statement is very clear:
    "With 95% confidence, the population quantity for variable for the subgroup is somewhere between ci.lower and ci.upper."

  • How is a “nest of trend curves” obtained for putting onto a scatterplot?
    Bootstrap resamples are taken from the data, and for each resample, the same type of curve is computed and added to the graph.

    How do we interpret such a “nest of trend curves”?
    We shouldn't have any confidence in the fitted trend in regions where the curves are far apart.
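The idea can be sketched in Python with a straight line as the "same type of curve" (an assumption for brevity; the data, seed and function names are hypothetical). Each bootstrap resample gets its own least-squares line, and the vertical gap between the highest and lowest line at a given x shows how far apart the nest is there.

```python
import random


def fit_line(points):
    """Least-squares straight line through (x, y) points; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_lines(points, n_resamples=200, seed=0):
    """Fit one line per bootstrap resample (sampling points with replacement)."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_resamples):
        resample = rng.choices(points, k=len(points))
        if len({x for x, _ in resample}) < 2:
            continue  # a line cannot be fitted to a single x value
        lines.append(fit_line(resample))
    return lines


def spread_at(lines, x):
    """Vertical distance between the highest and lowest fitted line at x."""
    predictions = [slope * x + intercept for slope, intercept in lines]
    return max(predictions) - min(predictions)


# Made-up data: x from 0 to 29, y roughly 2x plus noise.
rng = random.Random(1)
points = [(x, 2.0 * x + rng.gauss(0, 3)) for x in range(30)]
lines = bootstrap_lines(points)
print(spread_at(lines, 15), spread_at(lines, 100))
```

In this sketch the lines stay close together in the middle of the data and fan out far beyond it, which is where the fitted trend deserves the least confidence.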

  • What uncertainties are being conveyed by the confidence intervals drawn around means on a graph?
    Each mean is only known to lie within a probable range, so comparing means by looking at the difference between intervals adds further uncertainty.

    What does the overlap between confidence intervals drawn about means suggest visually?
    The comparison might be difficult because the values may be...

  • What are the two main ways of approaching the problem of obtaining confidence intervals?
    Mathematical theory (e.g. normal distribution) and computer-intensive methods (e.g. bootstrap).

    Why are methods based on mathematical theory the default methods in most packages for long-standing problems?
    Because, historically, they came first due to a...

  • What proportion of participants in the NHANES-1000 population do we expect to be classified as “Obese”?
    The simulation indicated a value between 8% and 24.5%.

    I am amazed by VIT. I could really see the confidence interval.

  • What is the basic idea of how a bootstrap confidence interval is constructed to capture the true population value of some quantity (e.g. a mean, a median, a percentage, ..)?
    The basic idea is to find the lower and upper limits of a confidence interval by calculating the same quantity for a large number of bootstrap resamples.

    What do we do to find out...
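A minimal Python sketch of that idea, assuming the interval is taken as the central 95% of the bootstrap estimates (the percentile approach; the course tool may differ in detail). The sample values and the function name `bootstrap_ci` are hypothetical.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap interval: compute the statistic on many resamples
    and keep the central `level` share of the sorted results."""
    rng = random.Random(seed)
    estimates = sorted(
        stat(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    )
    k = round((1 - level) / 2 * n_resamples)  # number of estimates cut from each tail
    return estimates[k], estimates[n_resamples - 1 - k]


sample = [12, 15, 9, 20, 14, 18, 11, 16, 13, 17]  # made-up measurements
lower, upper = bootstrap_ci(sample)
print(lower, upper)
```

Passing `stat=statistics.median` gives an interval for the median in the same way.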

  • Judging by the new terms, there is good knowledge ahead.

  • What is bootstrap re-sampling? How do we generate a bootstrap-resample?
    It is a method for estimating the margins of error of a small sample. We generate a bootstrap resample by sampling from the original sample with replacement, taking the same number of observations.

    What would happen if we took our re-samples using the ordinary way of sampling (without replacement)?
    The samples will be identical so there will be no...
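The difference is easy to see in a short Python sketch with made-up values. Sampling all n values without replacement only reshuffles the original sample, whereas sampling with replacement usually repeats some values and leaves others out.

```python
import random

rng = random.Random(0)
sample = [3, 7, 8, 12, 15, 21]  # hypothetical data

# With replacement: values may repeat and others may be left out.
with_replacement = rng.choices(sample, k=len(sample))

# Without replacement, taking all n values: just a reshuffle of the sample.
without_replacement = rng.sample(sample, k=len(sample))

print(sorted(with_replacement))
print(sorted(without_replacement))
```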

  • What is the most reliable way we know of obtaining data about populations without misleading biases? Why is this method not perfect?
    Using random sampling. It's not perfect because there will always be an error due to sampling.

    What happens whenever we use data from a sample to estimate a population quantity?
    We might find errors, which get smaller as the...

  • As the sample size and the number of repetitions increase, the error tends towards its minimum.

  • The bigger the sample size is, the smaller the sampling error gets.
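This can be checked with a small simulation. The sketch below assumes a made-up population of heights (the numbers and the function name `sampling_error` are hypothetical); it draws many random samples of each size and measures how much the sample mean varies from sample to sample.

```python
import random
from statistics import fmean, stdev


def sampling_error(population, sample_size, n_repeats=500, seed=0):
    """Standard deviation of the sample mean across repeated random samples."""
    rng = random.Random(seed)
    means = [fmean(rng.sample(population, sample_size)) for _ in range(n_repeats)]
    return stdev(means)


rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(10_000)]  # made-up heights in cm

for n in (10, 100, 1000):
    print(n, round(sampling_error(population, n), 2))
```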

  • What effect does sample size have on sampling error?
    The bigger the sample is, the smaller the sampling error gets.

    For what two reasons are non-random selection mechanisms worse than random selection mechanisms?
    Random selection minimizes the influence of bias and lets us get a good idea of how reliable the estimates are.

    What were the 5 “take home messages” from this...

  • Do the problems caused by bad measurement systems and biased selection mechanisms go away when we get huge amounts of data?
    No, these problems don't go away as we get more data.

    Do the problems caused by confounding go away when we get huge amounts of data?
    No, the influence of confounders doesn't change with the amount of data.

    Do the problems caused...

  • What is a lurking variable?
    It is an alternative name for a confounder, something that causes changes in both the outcome and the predictor variables.

    We have methods for adjusting for confounders, so why can we still not reliably draw causal conclusions from observational data alone?
    Because there is always the chance that effects we think we are seeing is...

  • What is a confounder?
    It is something that causes changes in both the outcome and the predictor of interest.

    What is a lurking variable?
    See confounder

    How can we adjust for a lack of balance on a known confounder?
    We must make comparisons within groups that have similar values of the confounder.
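A short Python sketch of comparing within groups, using made-up records in which age group plays the role of the known confounder (all names and numbers are hypothetical). The overall comparison mixes young and old people in different proportions; the within-group comparison does not.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical records: (age_group, exposed, outcome). All numbers are made up.
records = [
    ("young", True, 10), ("young", True, 12), ("young", True, 11), ("young", False, 9),
    ("old", True, 20), ("old", False, 19), ("old", False, 21), ("old", False, 20),
]


def mean_difference(rows):
    """Mean outcome of exposed rows minus mean outcome of unexposed rows."""
    exposed = [y for _, e, y in rows if e]
    unexposed = [y for _, e, y in rows if not e]
    return fmean(exposed) - fmean(unexposed)


def within_group_differences(rows):
    """The same comparison, made separately inside each confounder group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return {group: mean_difference(members) for group, members in groups.items()}


print(mean_difference(records))           # overall comparison
print(within_group_differences(records))  # comparison within each age group
```

In this made-up data the overall difference is negative only because most exposed people are young; within each age group the difference is zero or positive.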

  • When is a variable a cause of changes in the outcome?
    When purposefully changing its value leads to a change in the pattern of outcomes.

    What is an observational study?
    A study in which the data result from observing conditions as they are in the world.

    When do we have positive association between variables? negative association?
    Positive, when things tend to occur...

  • We have learned some ways of dealing with bad data:
    - checking back against original sources.
    - setting suspicious data as missing.
    - checking if values of each numeric variable lie within believable limits.
    - checking if values of each categorical variable correspond to what is expected.
    - looking for suspicious points in dot plots or scatter plots.
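Two of these checks, believable limits for numeric variables and expected values for categorical variables, can be sketched in Python as follows. The column names, limits and rows are hypothetical, and the functions are an illustration rather than anything provided by the course software.

```python
def find_suspicious(rows, numeric_limits, allowed_categories):
    """Return (row_index, column, value) for every value that fails a check."""
    problems = []
    for i, row in enumerate(rows):
        for column, (low, high) in numeric_limits.items():
            value = row.get(column)
            if value is not None and not (low <= value <= high):
                problems.append((i, column, value))
        for column, allowed in allowed_categories.items():
            value = row.get(column)
            if value is not None and value not in allowed:
                problems.append((i, column, value))
    return problems


def set_missing(rows, problems):
    """Return a copy of the rows with each suspicious value replaced by None."""
    cleaned = [dict(row) for row in rows]
    for i, column, _ in problems:
        cleaned[i][column] = None
    return cleaned


rows = [
    {"age": 34, "sex": "F"},
    {"age": 340, "sex": "M"},  # age outside believable limits
    {"age": 51, "sex": "X"},   # unexpected category
]
problems = find_suspicious(rows, {"age": (0, 120)}, {"sex": {"F", "M"}})
print(problems)
print(set_missing(rows, problems))
```

Flagged values are best checked against the original source before being set as missing.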

  • In terms of selection biases, there are important items to consider:
    - Biases or errors are generally biggest in the “non-scientific” polls or surveys that do not use sampling.
    - There are a host of things that can have a significant influence on the way people answer a question, such as information in the survey about why it is being done, differences in...

  • We have discussed validity (measuring the right thing) and reliability (when you measure the same thing over and over again, you get pretty much the same answer).

  • It's always good to learn new concepts.

  • What are artefacts?
    Artificial patterns caused by deficiencies in the data-collection process.

    What are the two main ways that systematic biases get into data?
    Bad measurement processes and biased selection processes.

    Why can missing values cause biases?
    Because the data that we have values for could show trends different from those whose values have...

  • What is the first law of data analysis?
    "Garbage in, garbage out"

    Can sophisticated data analysis turn bad data into reliable conclusions?
    No. If data is really bad, we should just walk away from it.

    In terms of the patterns we see in data, what is the difference between facts and artefacts?
    Facts are patterns that reflect the way things really are in...

  • I have had some problems because English is not my native language. Sometimes I got confused about the meaning of the variables used in the iNZight software. All in all, I can say that I understand the features of the software for showing data trends and relationships. I would like to highlight the "Overcoming perceptual problems" session. I was surprised by the...

  • It was very nice to visit many possibilities for visualizing trends and other relationships.

  • iNZight is a great tool for analyzing big data sets.

  • What are we looking for when we colour by a (third) numeric variable?
    The behavior in separated ranges or different groups of a third numerical variable, in the scatterplot.

    What are we looking for when we colour by a (third) categorical variable?
    The spectrum ranging from completely separated to totally mixed up, related to a third variable.

    What are...

  • I chose the smoother because it fits the dataset better. Until 20, there is a positive slope. Then the weight stays stable until 60, when it starts to decrease slowly. In the smoother graph, the dotted lines show the spread of points between the 25% and 75% quantiles of the weight data.

  • In large data sets, what is emphasized visually by a low transparency setting? by a high transparency setting?
    With low transparency, the concentration of values is shown. With high transparency, the bulk of the data is clearly shown.

    What are running quantiles and why are they useful for large data sets?
    Running quantiles are curves drawn and labelled as...

  • What is overprinting and why does it cause problems for us?
    Overprinting is a situation where a second point is plotted directly on top of the first point plotted. It makes it difficult to see how many points are sitting at a given position.

    What is jittering and how can the use of jittering help us?
    Jittering is a way to add a little bit of random...
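A minimal Python sketch of jittering, assuming a small uniform random offset is added to each value before plotting (the `amount` parameter and the function name are hypothetical choices). Points that would otherwise be overprinted end up at slightly different positions.

```python
import random


def jitter(values, amount=0.2, seed=0):
    """Add a small uniform random offset in [-amount, amount] to each value."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


ratings = [3, 3, 3, 4, 4, 5]  # made-up values with ties that would overprint
print(jitter(ratings))
```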

  • It is important to understand that a strong correlation doesn't mean that changes in the predictor are actually causing changes in the outcome.

  • We have learned new and important parameters of lines and curves. I am excited to move forward

  • A lot of new concepts... I think it's better to wait until they appear during the week.

  • What shapes can be captured by each of linear, quadratic and cubic trend curves?
    A line captures a trend that looks like a straight line. A quadratic captures a trend that looks like one segment of a curve, with a single bend. A cubic captures a trend with two bends.

    What advantage does a smoother have over quadratic or cubic curves?
    They are more flexible and take on an even...

  • I am excited to learn more about data analytics

  • This week was amazing. I explored the relationships between variables, whether numeric or categorical. I have learned the concepts of scatter, trend and outliers, features that offer new insights for understanding data visualization. I have struggled with the interpretation of separate bars against side-by-side bars.

  • Using iNZight is amazing!

  • What must we assume when we use existing data to predict a new outcome?
    That the data are representative of the way things behave in the setting in which we want to make the prediction.

    In what type of region is it particularly dangerous to make predictions?
    The region where data don't exist.

    How can we check visually whether a trend line or curve is...

  • How do we use a scatter plot to predict a new outcome at a given value of the predictor variable?
    By using a trend curve fitted to the plotted points, together with the scatter around the trend.

    How can we find a range of values that is likely to contain the new outcome?
    We draw a vertical line through the value of the predictor variable of interest.

    When will the...

  • What is the standard way of displaying the relationship between 2 numeric variables?
    The standard way is a scatter plot.

    What sort of variable is plotted against the vertical scale and what against the horizontal scale?
    Outcome variables are plotted against the vertical scale and predictor variables against the horizontal scale.

    It is often useful to...

  • The interesting change I have seen in gender differences was in the High School category. The proportion of female graduates becomes higher than that of male graduates starting in the 60-69 age group.

  • What basic ideas allow us to explore how a relationship changes with a 3rd (and perhaps a 4th) variable?
    The pattern of the relationship between two variables can vary when we stratify the data using a third (or 4th) variable.

    Why is stepping through a set of graphs (like playing a movie) useful?
    Because changes often jump out at us much...

  • We have learned new useful concepts

  • Which type of graph should you look at?
    We should look at many types of graphs because one can show what another does not.

    In terms of separate bar graphs for each predictor group, what would you expect to see if there was no relationship between the outcome variable and the predictor variable?
    All the plots must have the same shape.

    What are you looking for...

  • We distinguished between an outcome variable and predictor variables. What is an outcome variable and what are predictor variables?
    The outcome variable is the one of primary interest. Predictor variables are those that might help predict the outcome.

    What are the strengths and weaknesses of using separate bar graphs of an outcome variable for each predictor...

  • Why do people care about relationships between variables?
    Because of risk factors for diseases. The variables can help tell us how much more, or less, likely it is that a person would get a disease.

    We distinguished between an outcome variable and predictor variables. What is an outcome variable and what are predictor variables?
    Outcome variable is that...

  • It was very interesting to use iNZight with real data sets. I could visualize many results that can be extracted from a data set. The concept of a box plot was new to me. I am excited to start another week.

  • What technique was introduced here for investigating the effect of a third variable?
    Breaking the data out by a categorical version of a variable (e.g. year) and looking at this separately for each level of a third variable (e.g. region).

    Why is it useful to play through a set of graphs (like a movie)?
    It is important to visualize how a variable gradually oscillates compared to...

  • When looking at the extent of a difference between group centres, what else should we be relating that to when trying to gauge how important a change is?
    We should relate it to the background variability, which is shown by the spread of values we see.

    When do we pay attention to changes in spread or variability?
    When we need to make better predictions or...

  • When we compared children per woman, between regions, how many variables were involved and what were they?
    Two variables: a numeric variable, children per woman, and a categorical variable, region.

    What are the main things we look for when comparing groups using dot plots?
    The main things we look for are changes between groups, differences in centre,...

  • Thanks

  • The more I progress through the content, the more motivated I am.

  • Oddities - we should look for data points that are far from the general pattern. We should ask if those values are real or mistakes.

    The median is the point that divides the data in half, with half of the observations above it and half below it. The first quartile is the data point that divides the bottom half in half. The third quartile is the point that divides the...
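These definitions can be sketched in Python as follows. The sketch assumes one common convention, in which the middle value is left out of both halves when the number of observations is odd; software packages may use slightly different rules. The function name and the data are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (first quartile, median, third quartile) using the
    'median of each half' convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # With an odd count, skip the middle value so both halves have equal size.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)


print(quartiles([1, 3, 4, 6, 7, 8, 10, 12]))
```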

  • Congrats. Those basic concepts were very well explained. I'm excited to learn how to deal with skewed data sets.

  • @Chris, I would like your comments. Thanks!

    1. Each value is plotted against the scale using a dot. When two entities have the same value, we stack them one above the other.
    2. Centre, spread, shape and oddities.
    3. Centre
    4. It relates to the balance point of the dot plot
    5. Median divides the data in half, with half of the observations above it and...

  • This session was amazing. It shows the different types and possibilities to draw graphs

  • Most software uses alphabetical order because it's easy to locate a particular item. There is frequency order, which shows relative popularity by the height of the bars. There is natural order, which shows the natural shape of the distribution of the data.

  • A glossary always helps a lot

  • The basic graph for displaying the data for a categorical variable is the bar chart, because it shows the most common group as the highest bar. It lets us see the relative proportion of each group.

  • I am excited to remember statistics. Let's go!

  • I joined after finishing another course on the same subject. I learned the iNZight software, which can produce impressive things. I didn't have many problems with understanding, except for the definitions I had to memorize. This week introduced some insights that I have to reflect on to produce solutions in my job.

  • The potential of this software is very impressive.

  • We are now getting to apply the knowledge obtained so far.

  • Nice content so far.

  • This step has brought some concepts of rectangular data: entities, row, column, numeric values, categorical values.

  • Let's start using iNZight.

  • I didn't know about Hans Rosling's story. It is amazing!

  • Enjoying a lot. Learning relevant content

  • I have used the PDCA cycle all of my working life. I am excited to see examples of PPDAC implementation.

  • Data analytics can help organisations achieve better results.

  • Hi, everybody! I am from Rio de Janeiro, Brazil. I am a safety engineer. I am interested in working with big data analytics for safety management applications. My purpose is to build a tool for predicting losses from safety deviation data.

  • Great to start with free software!

  • I'm so excited to understand how we can deal with data and to realize what we can extract from them.

  • I can say that this course has given me a better understanding of big data analytics. This experience has affected my life positively. My studies in this subject will stop now. I'm encouraged to continue gaining knowledge in data science.

  • I have gained knowledge about big data that has made me more conscious of this matter. Now I can understand how the process of using big data works. It is very important to clarify what problem is to be solved. In this way, the data analytics cycle can start and end with success. The potential of big data analytics cannot be scaled, in...

  • Big data analytics has brought challenges but also insights for improvement in areas like sports, health and business.
    I believe that, in the future, we will see disruptive and currently unimaginable forms of social relations, work and business.

  • All the time, I am asked to allow cookies when visiting new sites.

  • Some vendor sites track individual interests so that they can send targeted advertisements to the individuals they track.

  • All the time, people study the data and identify ways of re-identifying unencrypted information, so it is difficult to prevent that.

  • I don't believe that machines and algorithms will be able to replace doctors, nurses, or other health practitioners in the near future. Even with the improvements in deep learning, decisions on new challenges must be made by humans.

  • I would decide to store my data within an EHR. I believe that my action might help other people.

  • The representation using big data analytics allows a comprehensive understanding of the behavior of the fans.

  • I tried to combine offence, defence and shooting ability. My proposal is to be offensive as the best way to defend

  • I agree that the more recent emphasis on analytics is removing the ‘mythology’ of sport. Such analysis promotes a way of understanding how different each is from the other.

  • A citizen science idea that I would volunteer for is one involving pictures of birds.

  • Inconsistent data is really a problem for them

  • Packaging disposal analysis can provide inputs for producers to review the design of packages, for instance. If we could cross-reference the disposal frequency data, the collection service could be better planned.